Statistical semantics

Statistical Semantics is the study of "how the statistical patterns of human word usage can be used to figure out what people mean, at least to a level sufficient for information access" (Furnas, 2006). How can we figure out what words mean, simply by looking at patterns of words in huge collections of text? What are the limits to this approach to understanding words?


History

The term Statistical Semantics was first used by Weaver (1955) in his well-known paper on machine translation. He argued that word sense disambiguation for machine translation should be based on the co-occurrence frequency of the context words near a given target word. The underlying assumption, which J.R. Firth (1957) expressed as "You shall know a word by the company it keeps", is known in linguistics as the Distributional Hypothesis. Delavenay (1960) defined Statistical Semantics as "Statistical study of meanings of words and their frequency and order of recurrence." Furnas et al. (1983) is frequently cited as a foundational contribution to Statistical Semantics. An early success in the field was Latent Semantic Analysis (LSA), introduced in 1990 by Scott Deerwester, Susan Dumais, George Furnas, Thomas Landauer, and Richard Harshman.
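The Distributional Hypothesis can be made concrete with a small sketch. It assumes a whitespace-tokenized toy corpus, a symmetric window of two words, raw co-occurrence counts, and cosine similarity as the comparison measure; none of these choices come from Weaver or Firth, and the corpus is invented. Each word is represented by the counts of the words that appear near it, so words used in similar contexts end up with similar vectors.

```python
import math
from collections import Counter, defaultdict


def context_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, target in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:  # skip the target word itself
                    vectors[target][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(count * b[word] for word, count in a.items())  # missing keys count as 0
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Invented toy corpus for illustration only.
corpus = [
    "the cat drinks milk",
    "the dog drinks water",
    "the cat chases the mouse",
    "the dog chases the ball",
    "stocks fell on the market today",
]
vecs = context_vectors(corpus, window=2)
print(round(cosine(vecs["cat"], vecs["dog"]), 3))     # → 0.917
print(round(cosine(vecs["cat"], vecs["stocks"]), 3))  # → 0.0
```

Here "cat" and "dog" share the contexts "the", "drinks", and "chases", so their vectors are close, while "stocks" shares none of them. Practical systems generally weight the raw counts and use far larger corpora.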


Applications of statistical semantics

Research in Statistical Semantics has produced a wide variety of algorithms that apply statistical techniques to large corpora, using the Distributional Hypothesis to discover many aspects of semantics:

  • Measuring the similarity in word relations (Turney, 2006)
  • Discovering words with a given relation (Hearst, 1992)
  • Classifying relations between words (Turney and Littman, 2005)
  • Extracting keywords from documents (Frank et al., 1999; Turney, 2000)
  • Measuring the cohesiveness of text (Turney, 2003)
  • Discovering the different senses of words (Pantel and Lin, 2002)
  • Distinguishing the different senses of words (Turney, 2004)
  • Answering questions about subcognitive aspects of words (Turney, 2001)
  • Distinguishing praise from criticism (Turney and Littman, 2003)
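Several of these applications rest on simple association statistics. The sketch below loosely follows the idea behind Turney and Littman (2003): a word's semantic orientation is its association with positive seed words minus its association with negative seed words, measured here by pointwise mutual information (PMI). Turney and Littman estimated PMI from Web search hit counts; this sketch instead counts co-occurrence within sentences of an invented six-sentence corpus, and the seed lists, the smoothing constant of 0.01, and the function names are assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def cooccurrence_counts(sentences):
    """Count how many sentences contain each word and each unordered word pair."""
    word_df = Counter()
    pair_df = Counter()
    for sentence in sentences:
        words = set(sentence.lower().split())
        word_df.update(words)
        pair_df.update(frozenset(pair) for pair in combinations(sorted(words), 2))
    return word_df, pair_df, len(sentences)


def pmi(x, y, word_df, pair_df, n, smoothing=0.01):
    """Smoothed PMI: log2 of P(x, y) / (P(x) * P(y)), probabilities per sentence."""
    p_xy = (pair_df[frozenset((x, y))] + smoothing) / n
    p_x = (word_df[x] + smoothing) / n
    p_y = (word_df[y] + smoothing) / n
    return math.log2(p_xy / (p_x * p_y))


def semantic_orientation(word, positive, negative, counts):
    """Sum of PMI with positive seeds minus sum of PMI with negative seeds."""
    word_df, pair_df, n = counts
    pos = sum(pmi(word, p, word_df, pair_df, n) for p in positive)
    neg = sum(pmi(word, q, word_df, pair_df, n) for q in negative)
    return pos - neg


# Invented review snippets and seed words, for illustration only.
reviews = [
    "the food was excellent and the staff were good",
    "an excellent film with a superb cast",
    "superb acting and a good story",
    "the service was poor and the room was bad",
    "a poor plot and dreadful acting",
    "dreadful food and bad service",
]
counts = cooccurrence_counts(reviews)
positive_seeds = ["good", "excellent"]
negative_seeds = ["bad", "poor"]

so_superb = semantic_orientation("superb", positive_seeds, negative_seeds, counts)
so_dreadful = semantic_orientation("dreadful", positive_seeds, negative_seeds, counts)
print(f"superb: {so_superb:+.2f}  dreadful: {so_dreadful:+.2f}")
```

A positive score suggests praise and a negative score criticism. With a corpus this small the magnitudes mean little; only the signs should be read.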


Related fields

Statistical Semantics focuses on the meanings of common words and the relations between common words, unlike text mining, which tends to focus on whole documents, document collections, or named entities (names of people, places, and organizations). Statistical Semantics is a subfield of computational linguistics and natural language processing.


Many of the applications of Statistical Semantics (listed above) can also be addressed by lexicon-based algorithms, which rely on hand-built resources such as dictionaries and thesauri, instead of the corpus-based algorithms of Statistical Semantics. One advantage of corpus-based algorithms is that they are typically less labour-intensive than lexicon-based algorithms. Another advantage is that they are usually easier to adapt to new languages. However, the best performance on an application is often achieved by combining the two approaches (Turney et al., 2003).
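One simple way to combine the two approaches is to mix their scores. The sketch below answers a multiple-choice synonym question using a hypothetical hand-built thesaurus (standing in for a lexicon-based module) and a table of corpus similarity scores (standing in for a corpus-based module). The scores are invented, and the linear weighting is an assumption, not the combination rule used by Turney et al. (2003). It shows why combining can help: distributional measures can rate antonyms such as "big" and "small" as similar because they occur in similar contexts, while a thesaurus does not list them as synonyms.

```python
def lexicon_score(stem, choice, thesaurus):
    """1.0 if the (hypothetical) thesaurus lists `choice` as a synonym of `stem`, else 0.0."""
    return 1.0 if choice in thesaurus.get(stem, set()) else 0.0


def combined_answer(stem, choices, thesaurus, corpus_sim, weight=0.5):
    """Return the choice with the highest weighted mix of lexicon and corpus scores.

    weight=1.0 uses only the lexicon; weight=0.0 uses only the corpus scores.
    """
    def score(choice):
        lexical = lexicon_score(stem, choice, thesaurus)
        distributional = corpus_sim.get((stem, choice), 0.0)
        return weight * lexical + (1 - weight) * distributional

    return max(choices, key=score)


# Invented data for illustration only.
thesaurus = {"big": {"large", "huge"}}
corpus_sim = {("big", "small"): 0.71, ("big", "large"): 0.62, ("big", "blue"): 0.10}
choices = ["small", "large", "blue"]

print(combined_answer("big", choices, thesaurus, corpus_sim, weight=0.0))  # → small
print(combined_answer("big", choices, thesaurus, corpus_sim))              # → large
```

For a word the thesaurus does not list, the lexical score is zero for every choice and the corpus scores decide; the weight itself could be tuned on held-out questions.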


See also

  • Latent semantic analysis
  • Text mining
  • Information retrieval
  • Natural language processing
  • Computational linguistics
  • Web mining
  • Semantic similarity
  • Co-occurrence
  • Text corpus


References

  • Delavenay, E. (1960). An Introduction to Machine Translation. New York, NY: Thames and Hudson.
  • Firth, J.R. (1957). A synopsis of linguistic theory 1930-1955. In Studies in Linguistic Analysis, pp. 1-32. Oxford: Philological Society. Reprinted in F.R. Palmer (ed.), Selected Papers of J.R. Firth 1952-1959. London: Longman (1968).
  • Frank, E., Paynter, G.W., Witten, I.H., Gutwin, C., and Nevill-Manning, C.G. (1999). Domain-specific keyphrase extraction. In Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence (IJCAI-99), pp. 668-673. California: Morgan Kaufmann.
  • Furnas, G.W., Landauer, T.K., Gomez, L.M., and Dumais, S.T. (1983). Statistical semantics: Analysis of the potential performance of keyword information systems. Bell System Technical Journal, 62(6), 1753-1806.
  • Hearst, M.A. (1992). Automatic acquisition of hyponyms from large text corpora. In Proceedings of the Fourteenth International Conference on Computational Linguistics, pp. 539-545. Nantes, France.
  • Landauer, T.K., and Dumais, S.T. (1997). A solution to Plato's problem: The latent semantic analysis theory of the acquisition, induction, and representation of knowledge. Psychological Review, 104(2), 211-240.
  • Lund, K., Burgess, C., and Atchley, R.A. (1995). Semantic and associative priming in high-dimensional semantic space. In Proceedings of the 17th Annual Conference of the Cognitive Science Society, pp. 660-665.
  • Pantel, P., and Lin, D. (2002). Discovering word senses from text. In Proceedings of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp. 613-619.
  • Terra, E., and Clarke, C.L.A. (2003). Frequency estimates for statistical word similarity measures. In Proceedings of the Human Language Technology and North American Chapter of the Association for Computational Linguistics Conference 2003 (HLT/NAACL 2003), pp. 244-251.
  • Turney, P.D. (2000). Learning algorithms for keyphrase extraction. Information Retrieval, 2(4), 303-336. arXiv:cs/0212020.
  • Turney, P.D. (2001). Answering subcognitive Turing Test questions: A reply to French. Journal of Experimental and Theoretical Artificial Intelligence, 13(4), 409-419. arXiv:cs/0212015.
  • Turney, P.D. (2003). Coherent keyphrase extraction via Web mining. In Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence (IJCAI-03), pp. 434-439. Acapulco, Mexico. arXiv:cs/0308033.
  • Turney, P.D. (2004). Word sense disambiguation by Web mining for word co-occurrence probabilities. In Proceedings of the Third International Workshop on the Evaluation of Systems for the Semantic Analysis of Text (SENSEVAL-3), pp. 239-242. Barcelona, Spain. arXiv:cs/0407065.
  • Turney, P.D. (2006). Similarity of semantic relations. Computational Linguistics, 32(3), 379-416. arXiv:cs/0608100.
  • Turney, P.D., and Littman, M.L. (2003). Measuring praise and criticism: Inference of semantic orientation from association. ACM Transactions on Information Systems (TOIS), 21(4), 315-346. arXiv:cs/0309034.
  • Turney, P.D., and Littman, M.L. (2005). Corpus-based learning of analogies and semantic relations. Machine Learning, 60(1-3), 251-278. arXiv:cs/0508103.
  • Turney, P.D., Littman, M.L., Bigham, J., and Shnayder, V. (2003). Combining independent modules to solve multiple-choice synonym and analogy problems. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP-03), pp. 482-489. Borovets, Bulgaria. arXiv:cs/0309035.
  • Weaver, W. (1955). Translation. In W.N. Locke and A.D. Booth (eds.), Machine Translation of Languages. Cambridge, MA: MIT Press. ISBN 0-8371-8434-7.


 

The Wikipedia article included on this page is licensed under the GFDL.