Stemming

A stemmer is a program or algorithm that determines the morphological root of a given inflected (or, sometimes, derived) word form, generally a written one.


A stemmer for English, for example, should identify the string "cats" (and possibly "catlike", "catty" etc.) as based on the root "cat", and "stemmer", "stemming", "stemmed" as based on "stem".
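The behaviour described above can be illustrated with a minimal suffix-stripping sketch. The suffix list, the minimum stem length, and the rule for undoubling a final consonant are all assumptions made for this illustration; they are not taken from any published stemmer.

```python
# Hypothetical suffix list, checked in order; chosen only to cover the
# examples in the text ("cats", "catlike", "catty", "stemmer", ...).
SUFFIXES = ("like", "ing", "ed", "er", "y", "s")

# Assumed lower bound on stem length, so short words are left untouched.
MIN_STEM = 3


def naive_stem(word):
    """Strip the first matching suffix and undouble a final consonant."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM:
            stem = word[:-len(suffix)]
            # Heuristic: "stemm" -> "stem", "catt" -> "cat".
            # Doubled l and s are left alone ("fall", "pass").
            if len(stem) >= 2 and stem[-1] == stem[-2] and stem[-1] not in "aeiouls":
                stem = stem[:-1]
            return stem
    return word


for form in ("cats", "catlike", "catty", "stemmer", "stemming", "stemmed"):
    print(form, "->", naive_stem(form))
```

A stripper this crude over-stems freely (it reduces "this" to "thi", for example), which is why practical stemmers attach conditions to each rule.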


English stemmers are fairly trivial, with only occasional problems: "dries" is the third-person singular present form of the verb "dry", and "axes" is the plural of both "ax" and "axis". Stemmers become harder to design as the morphology, orthography, and character encoding of the target language become more complex. For example, an Italian stemmer is more complex than an English one (more possible verb inflections), a Russian one is more complex still (more possible noun declensions), a Hebrew one is even more complex (a more intricate writing system), and so on.
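The two English problem cases can be made concrete with a small sketch that proposes every root a few plural-style rules allow and keeps those found in a word list. The word list and the three rules are hypothetical and exist only for this illustration.

```python
# Hypothetical word list; a real system would use a full lexicon.
LEXICON = {"dry", "ax", "axis", "cat"}


def candidate_roots(word):
    """Return every known root that some rule can derive from the word."""
    word = word.lower()
    candidates = set()
    if word.endswith("ies"):
        candidates.add(word[:-3] + "y")   # dries -> dry
    if word.endswith("es"):
        candidates.add(word[:-2])         # axes -> ax
        candidates.add(word[:-2] + "is")  # axes -> axis
    if word.endswith("s"):
        candidates.add(word[:-1])         # cats -> cat
    return sorted(candidates & LEXICON)


print(candidate_roots("dries"))  # ['dry']
print(candidate_roots("axes"))   # ['ax', 'axis']
```

For "axes" the rules alone cannot choose between the two roots; resolving the ambiguity needs information from outside the word itself, such as its context.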


Stemmers are common elements in query systems, since a user who runs a query on "daffodils" probably cares about documents that contain the word "daffodil" (without the s).
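A rough sketch of how a query system might use a stemmer: documents are indexed under the stem of each word, and the query is stemmed the same way before lookup. The function names, the sample documents, and the one-rule stemmer (drop a trailing "s") are assumptions for illustration only.

```python
from collections import defaultdict


def stem(word):
    """Toy stemmer (assumption): drop one trailing "s" from longer words."""
    word = word.lower()
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def build_index(docs):
    """Map each stem to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.split():
            # Strip surrounding punctuation before stemming.
            index[stem(token.strip(".,;:!?\"'"))].add(doc_id)
    return index


def search(index, query):
    """Look up a single-word query by its stem."""
    return sorted(index.get(stem(query), set()))


# Hypothetical sample documents.
docs = {
    1: "A daffodil grows by the lake.",
    2: "Daffodils bloom in spring.",
    3: "Tulips bloom later.",
}
index = build_index(docs)
print(search(index, "daffodils"))  # [1, 2]
```

Because both the indexed words and the query pass through the same stemmer, the stems need not be real words; they only need to be consistent.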


A more complex approach to the problem of determining a morphological root of a word is lemmatisation. This process involves first determining the part of speech of a word, and applying different normalisation rules for each part of speech.
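The difference from plain stemming can be sketched by making the normalisation rules depend on the part of speech. In this illustration the part-of-speech tag is supplied by the caller, which sidesteps the tagging step entirely; the tag names and the rule table are hypothetical.

```python
# Hypothetical rule table: for each part of speech, (suffix, replacement)
# pairs that are tried in order.
RULES = {
    "noun": [("ies", "y"), ("s", "")],
    "verb": [("ies", "y"), ("ing", ""), ("ed", ""), ("s", "")],
    "adjective": [("est", ""), ("er", "")],
}


def lemmatise(word, pos):
    """Apply the first matching rule for the given part of speech."""
    word = word.lower()
    for suffix, replacement in RULES.get(pos, []):
        # Require at least two characters to remain before the suffix.
        if word.endswith(suffix) and len(word) > len(suffix) + 1:
            return word[:-len(suffix)] + replacement
    return word


print(lemmatise("meeting", "verb"))  # meet
print(lemmatise("meeting", "noun"))  # meeting
print(lemmatise("dries", "verb"))    # dry
```

The same string yields different results depending on its tag: "meeting" loses its ending as a verb but is kept whole as a noun, a distinction that a stemmer working on the word form alone cannot make.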


The first ever published stemmer was written by Julie Beth Lovins: Lovins JB (1968) Development of a stemming algorithm, Mechanical Translation and Computational Linguistics, 11: 22-31. This paper was remarkable for its early date, and had great influence on later work in this area.


A later stemmer was written by Martin Porter, and published in Program, Vol 14 no. 3 pp 130-137, July 1980. This stemmer was very widely used and became the de facto standard algorithm for English stemming. Dr Porter received the Tony Kent Strix award in 2000 for his work on stemming and information retrieval.


Many implementations of this algorithm were written and freely distributed. Unfortunately, many of these implementations contained subtle flaws, and as a result systems using these stemmers performed less well than they ought. To eliminate this source of error, around the year 2000 Martin Porter released an official free-software implementation of the algorithm. Over the next few years, he extended this work by building Snowball, a framework for writing stemming algorithms, and implemented an improved English stemmer together with stemmers for several other languages.

This article was originally based on material from the Free On-line Dictionary of Computing, which is licensed under the GFDL.

The Wikipedia article included on this page is licensed under the GFDL.
Images may be subject to relevant owners' copyright.
All other elements are (c) copyright NationMaster.com 2003-5. All Rights Reserved.
Usage implies agreement with terms.