WordNet

WordNet is a semantic lexicon for the English language. It groups English words into sets of synonyms called synsets, provides short, general definitions, and records the various semantic relations between these synonym sets. The purpose is twofold: to produce a combination of dictionary and thesaurus that is more intuitively usable, and to support automatic text analysis and artificial intelligence applications. The database and software tools have been released under a BSD-style license and can be downloaded and used freely. The database can also be browsed online.


WordNet was created and is being maintained at the Cognitive Science Laboratory of Princeton University under the direction of psychology professor George A. Miller. Development began in 1985. Over the years, the project received about $3 million of funding, mainly from government agencies interested in machine translation.

Database contents

As of 2006, the database contains about 150,000 words organized in over 115,000 synsets for a total of 207,000 word-sense pairs; in compressed form, it is about 12 megabytes in size.[1]


WordNet distinguishes between nouns, verbs, adjectives and adverbs because they follow different grammatical rules. Every synset contains a group of synonymous words or collocations (a collocation is a sequence of words that go together to form a specific meaning, such as "car pool"); different senses of a word are in different synsets. The meaning of the synsets is further clarified with short defining glosses (definitions and/or example sentences). A typical example synset with gloss is:

good, right, ripe -- (most suitable or right for a particular purpose; "a good time to plant tomatoes"; "the right time to act"; "the time is ripe for great sociological changes")
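As a rough illustration of the structure just described, a synset can be modelled as a record holding its synonymous words, a definition and example sentences. The class name, field names and part-of-speech label below are hypothetical and do not reflect WordNet's actual database format; only the words and gloss come from the example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Synset:
    """A toy synset: one sense shared by several words (hypothetical layout)."""
    pos: str            # part-of-speech label, e.g. "adjective" (our own labelling)
    lemmas: tuple       # the synonymous words or collocations
    definition: str     # the defining part of the gloss
    examples: tuple = ()  # example sentences from the gloss

    def render(self) -> str:
        """Format the synset in the 'words -- (gloss)' style shown above."""
        parts = [self.definition] + [f'"{e}"' for e in self.examples]
        return f"{', '.join(self.lemmas)} -- ({'; '.join(parts)})"


good = Synset(
    pos="adjective",
    lemmas=("good", "right", "ripe"),
    definition="most suitable or right for a particular purpose",
    examples=(
        "a good time to plant tomatoes",
        "the right time to act",
        "the time is ripe for great sociological changes",
    ),
)
print(good.render())
```

Because a word with several senses appears in several such records, looking a word up amounts to collecting every synset whose lemmas include it.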

Most synsets are connected to other synsets via a number of semantic relations. These relations vary based on the type of word, and include:

  • Nouns
    • hypernyms: Y is a hypernym of X if every X is a (kind of) Y
    • hyponyms: Y is a hyponym of X if every Y is a (kind of) X
    • coordinate terms: Y is a coordinate term of X if X and Y share a hypernym
    • holonyms: Y is a holonym of X if X is a part of Y
    • meronyms: Y is a meronym of X if Y is a part of X
  • Verbs
    • hypernyms: the verb Y is a hypernym of the verb X if the activity X is a (kind of) Y (example: travel is a hypernym of movement)
    • troponyms: the verb Y is a troponym of the verb X if the activity Y is doing X in some manner (example: to lisp is a troponym of to talk)
    • entailment: the verb Y is entailed by X if by doing X you must be doing Y (example: sleeping is entailed by snoring)
    • coordinate terms: those verbs sharing a common hypernym
  • Adjectives
    • related nouns
    • participle of verb
  • Adverbs
    • root adjectives

Semantic relations apply to all members of a synset, because the members share a meaning and are mutual synonyms. Individual words can also be connected to other words through lexical relations, including antonymy (words that are opposites of each other) and derivational relatedness.
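The noun relations listed above come in inverse pairs, so a minimal sketch only needs to store one direction and derive the other. The data below is a toy example invented for illustration, not taken from the WordNet database, and the sketch assumes that "sharing a hypernym" means sharing a direct hypernym.

```python
from collections import defaultdict

# Toy data (illustrative only, not WordNet content).
HYPERNYMS = {            # X -> direct hypernyms: every X is a kind of Y
    "dog": ["canine"],
    "wolf": ["canine"],
    "canine": ["carnivore"],
}
PART_OF = {              # X -> holonyms: X is a part of Y
    "wheel": ["car"],
    "engine": ["car"],
}


def invert(relation):
    """Turn X -> [Y] into Y -> [X] (hypernym -> hyponym, holonym -> meronym)."""
    inverse = defaultdict(list)
    for x, ys in relation.items():
        for y in ys:
            inverse[y].append(x)
    return dict(inverse)


HYPONYMS = invert(HYPERNYMS)
MERONYMS = invert(PART_OF)


def coordinate_terms(word):
    """Words that share at least one direct hypernym with `word`."""
    result = set()
    for parent in HYPERNYMS.get(word, []):
        result.update(HYPONYMS[parent])
    result.discard(word)
    return sorted(result)


print(HYPONYMS["canine"])        # ['dog', 'wolf']
print(MERONYMS["car"])           # ['wheel', 'engine']
print(coordinate_terms("dog"))   # ['wolf']
```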


WordNet also provides the polysemy count of a word: the number of synsets that contain the word. If a word participates in several synsets (i.e. has several senses), then typically some senses are much more common than others. WordNet quantifies this with a frequency score: several sample texts have had all their words semantically tagged with the corresponding synset, and a count indicates how often a word appears in a specific sense.
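Both quantities reduce to simple counting. The sketch below assumes a hypothetical sense inventory and a hypothetical hand-tagged sample; the synset identifiers such as "bank.n.01" are made up for the example and are not claimed to match the real database.

```python
from collections import Counter

# Hypothetical sense inventory: word -> identifiers of the synsets containing it.
SENSES = {
    "bank": ["bank.n.01", "bank.n.02", "bank.v.01"],
    "river": ["river.n.01"],
}

# Hypothetical sense-tagged sample text: (word, synset chosen by an annotator).
TAGGED = [
    ("bank", "bank.n.01"),
    ("river", "river.n.01"),
    ("bank", "bank.n.01"),
    ("bank", "bank.n.02"),
]


def polysemy_count(word):
    """Number of synsets that contain the word."""
    return len(SENSES.get(word, []))


def frequency_scores(word):
    """How often the word was tagged with each of its senses in the sample."""
    counts = Counter(sense for w, sense in TAGGED if w == word)
    return {sense: counts[sense] for sense in SENSES.get(word, [])}


print(polysemy_count("bank"))     # 3
print(frequency_scores("bank"))   # {'bank.n.01': 2, 'bank.n.02': 1, 'bank.v.01': 0}
```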


The morphology functions of the software distributed with the database try to deduce the lemma or root form of a word from the user's input; only the root form is stored in the database unless it has irregular inflected forms.
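The text does not spell out how the lemma is deduced, so the following is only one plausible sketch: check a table of irregular forms first, then strip common suffixes and accept the first candidate found in the lexicon. The lexicon, exception table and suffix rules are all assumptions made up for this example.

```python
# Illustrative data only; none of it is taken from the WordNet distribution.
LEXICON = {"dog", "church", "fly", "run", "goose"}      # stored root forms
EXCEPTIONS = {"geese": "goose", "ran": "run"}           # irregular inflected forms
SUFFIX_RULES = [("ies", "y"), ("es", ""), ("s", ""), ("ing", ""), ("ed", "")]


def lemma(word):
    """Return the root form of `word`, or None if no rule produces a known word."""
    word = word.lower()
    if word in EXCEPTIONS:          # irregular forms are looked up directly
        return EXCEPTIONS[word]
    if word in LEXICON:             # already a root form
        return word
    for suffix, replacement in SUFFIX_RULES:
        if word.endswith(suffix):
            candidate = word[: -len(suffix)] + replacement
            if candidate in LEXICON:
                return candidate
    return None


print(lemma("churches"))  # church
print(lemma("flies"))     # fly
print(lemma("geese"))     # goose
```

Checking each candidate against the lexicon keeps the suffix rules from producing non-words, at the cost of returning nothing for words the lexicon does not contain.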


Knowledge structure

Both nouns and verbs are organized into hierarchies, defined by hypernym or IS-A relationships. For instance, the first sense of the word dog has the following hypernym hierarchy. The words at the same level are synonyms of each other: some sense of dog is synonymous with some senses of domestic dog and Canis familiaris, and so on. Each set of synonyms (synset) has a unique index and shares its properties, such as a gloss (or dictionary definition).

 dog, domestic dog, Canis familiaris
   => canine, canid
     => carnivore
       => placental, placental mammal, eutherian, eutherian mammal
         => mammal
           => vertebrate, craniate
             => chordate
               => animal, animate being, beast, brute, creature, fauna
                 => ...
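Walking such a hierarchy is a matter of following hypernym links upward. The sketch below encodes the listing above, keyed by the first word of each synset, and assumes for simplicity that every synset has at most one hypernym and that there are no cycles; the listing is truncated after "animal", so the chain stops there.

```python
# Direct hypernym of each synset, keyed by its first word (from the listing above).
HYPERNYM_OF = {
    "dog": "canine",
    "canine": "carnivore",
    "carnivore": "placental",
    "placental": "mammal",
    "mammal": "vertebrate",
    "vertebrate": "chordate",
    "chordate": "animal",
}


def hypernym_chain(word):
    """Return the word followed by all of its hypernyms, most specific first."""
    chain = [word]
    while chain[-1] in HYPERNYM_OF:
        chain.append(HYPERNYM_OF[chain[-1]])
    return chain


def is_a(word, ancestor):
    """True if `ancestor` appears somewhere above `word` in the hierarchy."""
    return ancestor in hypernym_chain(word)[1:]


print(" => ".join(hypernym_chain("dog")))
print(is_a("dog", "mammal"))   # True
print(is_a("mammal", "dog"))   # False
```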

At the top level, these hierarchies are organized into base types: 25 primitive groups for nouns and 15 for verbs. These groups form lexicographic files at a maintenance level. The primitive groups are connected to an abstract root node that has, for some time, been assumed by various applications that use WordNet.


In the case of adjectives, the organization is different. Two opposite 'head' senses work as binary poles, while 'satellite' synonyms connect to each of the heads via synonymy relations. Thus, the hierarchies, and the concept involved with lexicographic files, do not apply here the same way they do for nouns and verbs.


The network of nouns is far deeper than that of the other parts of speech. Verbs have a far bushier structure, and adjectives are organized into many distinct clusters. Adverbs are defined in terms of the adjectives they are derived from, and thus inherit their structure from that of the adjectives.


Psychological justification

The goal of WordNet was to develop a system that would be consistent with the knowledge acquired over the years about how human beings process language. Anomic aphasia, for example, creates a condition that seems to selectively encumber individuals' ability to name objects; this makes the decision to partition the parts of speech into distinct hierarchies more of a principled decision than an arbitrary one.


In the case of hyponymy, psychological experiments revealed that individuals can access properties of nouns more quickly depending on when a characteristic becomes a defining property. That is, individuals can quickly verify that canaries can sing because a canary is a songbird (only one level of hyponymy), but require slightly more time to verify that canaries can fly (two levels of hyponymy) and even more time to verify that canaries have skin (multiple levels of hyponymy). This suggests that we too store semantic information in a way that is much like WordNet, because we only retain the most specific information needed to differentiate one particular concept from similar concepts.
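The canary example can be mimicked with a small hierarchy in which each property is stored once, at the most general concept it applies to, and lookup climbs the hierarchy until the property is found. The hierarchy below is a simplification assumed for illustration: in particular, placing "has skin" exactly three levels up is our choice, whereas the text only says "multiple levels".

```python
# Toy hierarchy and property placement (assumed for this example).
PARENT = {"canary": "songbird", "songbird": "bird", "bird": "animal"}
PROPERTIES = {
    "songbird": {"can sing"},
    "bird": {"can fly"},
    "animal": {"has skin"},
}


def levels_to_verify(concept, prop):
    """Count how many levels must be climbed before `prop` is found, else None."""
    levels = 0
    node = concept
    while node is not None:
        if prop in PROPERTIES.get(node, set()):
            return levels
        node = PARENT.get(node)   # move one level up; None at the top
        levels += 1
    return None


for prop in ("can sing", "can fly", "has skin"):
    print(prop, levels_to_verify("canary", prop))
```

The number of levels climbed stands in for the verification time reported in the experiments: the further up a property is stored, the longer the lookup.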


WordNet as an ontology

The hypernym/hyponym relationships among the noun synsets can be interpreted as specialization relations between conceptual categories. In other words, WordNet can be interpreted and used as a lexical ontology in the computer science sense. However, such an ontology should normally be corrected before being used since it contains hundreds of basic semantic inconsistencies such as (i) the existence of common specializations for exclusive categories and (ii) redundancies in the specialization hierarchy. Furthermore, transforming WordNet into a lexical ontology usable for knowledge representation should normally also involve (i) distinguishing the specialization relations into subtypeOf and instanceOf relations, and (ii) associating intuitive unique identifiers to each category. Although such corrections and transformations have been performed and documented as part of the integration of WordNet 1.7 into the cooperatively updatable knowledge base of WebKB-2, most projects claiming to re-use WordNet for knowledge-based applications (typically, knowledge-oriented information retrieval) simply re-use it as such.
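One kind of redundancy in a specialization hierarchy can be detected mechanically: a direct link from a category to a parent is redundant if that parent is already reachable through another direct parent. The sketch below is a minimal illustration on invented data, not the procedure used by WebKB-2 or any other project, and the direct link from "dog" to "animal" is hypothetical.

```python
# Toy hierarchy with one deliberately redundant link (invented data).
HYPERNYMS = {
    "dog": ["canine", "animal"],   # "animal" is already implied via "canine"
    "canine": ["carnivore"],
    "carnivore": ["animal"],
    "animal": [],
}


def ancestors(node):
    """All categories reachable by following hypernym links upward from `node`."""
    found = set()
    stack = list(HYPERNYMS.get(node, []))
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(HYPERNYMS.get(current, []))
    return found


def redundant_links():
    """Direct links (child, parent) that are implied by another direct parent."""
    redundant = []
    for child, parents in HYPERNYMS.items():
        for parent in parents:
            others = [p for p in parents if p != parent]
            if any(parent in ancestors(p) for p in others):
                redundant.append((child, parent))
    return redundant


print(redundant_links())   # [('dog', 'animal')]
```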


A prominent example of using WordNet, as it is, as an ontology is to determine the similarity between words. Various algorithms have been proposed, and these include considering the distance between the conceptual categories of these words, as well as considering the hierarchical structure of the WordNet ontology. A number of these WordNet-based word similarity algorithms are implemented in a Perl package called WordNet::Similarity.
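A distance-based measure of the kind mentioned above can be sketched by counting the links on the shortest path between two categories and mapping shorter paths to higher scores. The formula 1 / (1 + path length) is one simple choice assumed here for illustration; it is not claimed to be the measure implemented by any particular package, and the hierarchy is toy data.

```python
from collections import deque

# Toy hypernym links (invented for illustration).
HYPERNYM_OF = {
    "dog": "canine",
    "wolf": "canine",
    "canine": "carnivore",
    "cat": "feline",
    "feline": "carnivore",
}


def neighbours(node):
    """Categories one link away, going either up or down the hierarchy."""
    up = [HYPERNYM_OF[node]] if node in HYPERNYM_OF else []
    down = [child for child, parent in HYPERNYM_OF.items() if parent == node]
    return up + down


def path_length(a, b):
    """Number of links on the shortest path between a and b (breadth-first search)."""
    seen = {a}
    queue = deque([(a, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == b:
            return dist
        for nxt in neighbours(node):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None   # the two categories are not connected


def path_similarity(a, b):
    """Score in (0, 1]: identical categories score 1, distant ones approach 0."""
    dist = path_length(a, b)
    return None if dist is None else 1.0 / (1 + dist)


print(path_length("dog", "wolf"))      # 2
print(path_similarity("dog", "cat"))   # 0.2
```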


See the related projects section for more.


Limitations

Unlike other dictionaries, WordNet does not include information about etymology, pronunciation and the forms of irregular verbs and contains only limited information about usage.


The actual lexicographical and semantic information is maintained in lexicographer files, which are then processed by a tool called grind to produce the distributed database. Both grind and the lexicographer files are freely available in a separate distribution, but modifying and maintaining the database requires expertise.


Because it groups similar words together under a single, general definition, the definitions WordNet provides for most individual words are not accurate.


Though WordNet contains a sufficiently wide range of common words, it does not cover special domain vocabulary. Since it is primarily designed to act as an underlying database for different applications, those applications cannot be used in specific domains that are not covered by WordNet.


Interfaces

The Jawbone project provides a Java API to the WordNet 2.1 and 3.0 data. The source code is released under the MIT license.


The Natural Language Toolkit provides a Python API to WordNet 3.0.


Related projects

A project at Brown University called Applied Cognition Lab, started by Jeff Stibel, James A. Anderson, Steve Reiss and others, created a disambiguator using WordNet in 1998.[2] The project later morphed into a company called Simpli, which is now owned by ValueClick. George Miller joined the company as a member of the Advisory Board. Simpli built an Internet search engine that utilized a knowledge base principally based on WordNet to disambiguate and expand keywords and synsets to help retrieve information online. WordNet was expanded upon to add increased dimensionality, such as intentionality (used for x), people (Britney Spears) and colloquial terminology more relevant to Internet search (e.g., blogging, ecommerce). Neural network algorithms searched the expanded WordNet for related terms to disambiguate search keywords (Java, in the sense of coffee) and expand the search synset (Coffee, Drink, Joe) to improve search engine results.[3] Before the company was acquired, it performed searches across search engines such as Google, Yahoo!, Ask.com and others.[4]


The EuroWordNet project has produced WordNets for several European languages and linked them together; these are not freely available, however. The Global Wordnet project attempts to coordinate the production and linking of wordnets for all languages. Oxford University Press, the publisher of the Oxford English Dictionary, has voiced plans to produce its own online WordNet.


The eXtended WordNet is a project at the University of Texas at Dallas which aims to improve WordNet by semantically parsing the glosses, thus making the information contained in these definitions available for automatic knowledge processing systems. It is also freely available under a license similar to WordNet's.


The GCIDE project produces a dictionary by combining a public domain Webster's Dictionary from 1913 with some WordNet definitions and material provided by volunteers. It is released under the copyleft license GPL.


WordNet is also commonly re-used via mappings between the WordNet categories and the categories from other ontologies. Most often, only the top-level categories of WordNet are mapped. However, the authors of the SUMO ontology have produced a mapping between all of the WordNet synsets (including nouns, verbs, adjectives and adverbs) and SUMO classes. The most recent addition of the mappings provides links to all of the more specific terms in the Mid-Level Ontology (MILO), which extends SUMO. The OpenCyc upper ontology is also linked to some of WordNet.


In most works that claim to have integrated WordNet into other ontologies, the content of WordNet has not simply been corrected when semantic problems have been encountered; instead, WordNet has been used as a source of inspiration but heavily re-interpreted and updated whenever suitable. This was the case when, for example, the top-level ontology of WordNet was re-structured according to the OntoClean-based approach or when WordNet was used as a primary source for constructing the lower classes of the SENSUS ontology.


FrameNet is a project similar to WordNet. It consists of a lexicon which is based on annotating over 100,000 sentences with their semantic properties. The unit in focus is the lexical frame, a type of state or event together with the properties associated with it.


An independent project titled wordNet, with an initial lowercase w, is an ongoing project to link words and phrases via a custom Web crawler.


References

  • WordNet: An Electronic Lexical Database. Christiane Fellbaum (editor). 1998. The MIT Press, Cambridge, MA
  • Five Papers on WordNet by Miller, George A., Christiane Fellbaum, Katherine J. Miller. August, 1993, retrieved May 4, 2005

See also

  • Semantic Web
  • Taxonomy
  • Synonym ring (synset)
  • ThoughtTreasure

The Wikipedia article included on this page is licensed under the GFDL.