Mass lexical comparison

Mass lexical comparison or mass comparison is a highly controversial method developed by the well-known linguist Joseph Greenberg to find genetic relationships among languages in the remote past, beyond the limits of the traditional comparative method, or in situations where there are too many languages to practically apply the latter without many generations of work.



Traditional historical linguistics

The comparative method

Since the development of comparative linguistics in the 19th century, a linguist who claims that two languages are related, in the absence of historical evidence, is expected to back up that claim by presenting general rules that describe the differences between their lexicons, morphologies, and grammars. The procedure is described in detail in the comparative method article.


For instance, one could prove that Spanish is related to Italian by showing that many words of the former can be mapped to corresponding words of the latter by a relatively small set of replacement rules, such as replacing initial es- with s-, final -os with -i, and so on. Many similar correspondences exist between the grammars of the two languages. Since such systematic correspondences are extremely unlikely to be random coincidences, by far the most likely explanation is that the two languages evolved from a single ancestral tongue (Latin, in this case).
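The rule-mapping test can be sketched in Python. This is a minimal illustration that assumes ordinary spelling can stand in for pronunciation. The rule list `RULES`, the helper names and the five word pairs are chosen here for illustration; they are not a published set of correspondences.

```python
import re

# Hypothetical correspondence rules (regex pattern, replacement), applied in order.
# They encode the two rules mentioned in the text, applied to spelling, not sound.
RULES = [
    (r"^es", "s"),  # initial es- -> s-
    (r"os$", "i"),  # final -os -> -i
]

def apply_rules(word: str) -> str:
    """Map a Spanish word to a predicted Italian form using RULES."""
    for pattern, replacement in RULES:
        word = re.sub(pattern, replacement, word)
    return word

def count_regular(pairs):
    """Return the (Spanish, Italian) pairs that the rules map exactly."""
    return [(es, it) for es, it in pairs if apply_rules(es) == it]

PAIRS = [
    ("espada", "spada"),   # sword
    ("escudos", "scudi"),  # shields
    ("esposos", "sposi"),  # spouses
    ("estado", "stato"),   # state: would need a further d -> t rule
    ("perro", "cane"),     # dog: not inherited from Latin, so no rule maps it
]

if __name__ == "__main__":
    matched = count_regular(PAIRS)
    print(f"{len(matched)} of {len(PAIRS)} pairs mapped by the rules")
    # prints "3 of 5 pairs mapped by the rules"
```

The proof rests on the number of pairs that a small, fixed rule set explains. A pair like estado/stato asks for another rule. A pair like perro/cane resists any rule.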


Most prehistoric language groupings that are widely accepted today, such as the Indo-European, Algonquian, and Bantu families, have been proved in this way. Many others, such as Niger-Congo and, until quite recently, Afro-Asiatic and Sino-Tibetan, have not. Some families whose proponents claim to have proved them in this way (e.g. Nostratic) have not been widely accepted.


Limitations of the comparative method

However, besides systematic changes, languages are also subject to random mutations (such as borrowings from other languages, irregular inflections, compounding, and abbreviation) that affect one word at a time, or small subsets of words. For example, Spanish perro, which does not come from Latin, cannot be rule-mapped to its Italian equivalent cane.


As those sporadic changes accumulate, they increasingly obscure the systematic ones, just as enough dirt and scratches on a photograph will eventually make the face unrecognizable. Presumably for this reason, the comparative method has not been able to provide reliable evidence of genetic relationship between languages that split more than 10,000 years ago. Humans have probably been speaking fully developed languages for at least 60,000 years (since Australia was first populated). It is therefore hardly surprising that many languages and language families still have no known relationship with other groups.


Mass lexical comparison

Lexical similarity

In an effort to extend comparative linguistics beyond its present limits, and to arrive at his broad super-family groupings, Greenberg invented a new statistical method, mass lexical comparison. In this method, one simply compares a large sample of words from one language A with their equivalents in another language B, looking for similar sound patterns. For example, Spanish cabeza and Italian capo ("head") are similar to the extent that both contain the same consonant [k], the same vowel [a], and similar consonants ([b] and [p]), in the same sequence.


Departing from the traditional criterion, Greenberg did not look for any systematic trend in these similarities, trusting that a sufficiently large percentage S(A,B) of sufficiently similar pairs among the samples would be enough to prove a common origin for the two languages. This assumption is valid in principle, because S is expected to be higher for languages that have split off more recently, and decrease as the split recedes into the past. The difficult part is deciding what constitutes "sufficient" similarity.
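The following Python is a rough sketch of how S(A,B) might be computed. It treats two words as "similar" when their first two consonants fall into the same coarse place-of-articulation class. The criterion, the class table `CLASSES` and the function names are assumptions made for illustration. They are not Greenberg's own procedure, and spelling stands in for pronunciation.

```python
from itertools import groupby

# Hypothetical, deliberately coarse classes (labial, dental/alveolar, velar)
# assigned to spelled consonants; letters not listed here are ignored.
CLASSES = {
    **dict.fromkeys("pbmfvw", "LAB"),
    **dict.fromkeys("tdnszlr", "DEN"),
    **dict.fromkeys("kgcqx", "VEL"),
}

def skeleton(word: str, n: int = 2) -> tuple:
    """First n consonant classes of a word, merging adjacent repeats ('cq', 'tt')."""
    classes = [CLASSES[ch] for ch in word.lower() if ch in CLASSES]
    merged = [cls for cls, _ in groupby(classes)]
    return tuple(merged[:n])

def similar(a: str, b: str) -> bool:
    """Hypothetical criterion: identical, non-empty consonant-class skeletons."""
    sa, sb = skeleton(a), skeleton(b)
    return bool(sa) and sa == sb

def lexical_similarity(lex_a: dict, lex_b: dict) -> float:
    """S(A,B): fraction of shared concepts whose words are judged similar."""
    shared = lex_a.keys() & lex_b.keys()
    if not shared:
        return 0.0
    hits = sum(similar(lex_a[c], lex_b[c]) for c in shared)
    return hits / len(shared)

SPANISH = {"head": "cabeza", "water": "agua", "dog": "perro", "night": "noche", "moon": "luna"}
ITALIAN = {"head": "capo", "water": "acqua", "dog": "cane", "night": "notte", "moon": "luna"}

print(lexical_similarity(SPANISH, ITALIAN))  # 0.6
```

Here "head", "water" and "moon" count as similar, while "dog" (perro/cane) and "night" (noche/notte) do not, so S = 3/5. The noche/notte miss shows that a crude criterion can also overlook genuine cognates.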


From similarity to phylogeny

Assuming that the similarity measures are statistically significant, they can be used to decide the branching order of the languages on their presumed genetic tree. That is, if the computed similarity S(A,B) is greater than S(A,C) and S(B,C), one can take it as indication that A and B separated from C before separating from each other. In other words, there is a single branch of the tree that includes A and B but not C.
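The three-language rule can be written directly as `closest_pair`. Repeating it yields a full tree: merge the most similar pair, then compare the merged group with the rest. The sketch below does this with average-linkage (UPGMA-style) clustering on similarities. That is a standard technique swapped in for illustration, not a procedure described in this article. The languages A to D and their similarity values are invented.

```python
from itertools import combinations

def closest_pair(sim, langs):
    """Return the pair of languages with the highest similarity.
    By the rule in the text, that pair forms a branch excluding the others."""
    return max(combinations(sorted(langs), 2), key=lambda p: sim[frozenset(p)])

def average_linkage_tree(sim, langs):
    """Repeatedly merge the two most similar clusters (UPGMA-style) and
    return the resulting tree as nested tuples."""
    # Each cluster is (subtree, set of leaf languages).
    clusters = [(lang, {lang}) for lang in langs]

    def cluster_sim(c1, c2):
        # Mean similarity over every cross pair of leaves.
        values = [sim[frozenset((a, b))] for a in c1[1] for b in c2[1]]
        return sum(values) / len(values)

    while len(clusters) > 1:
        i, j = max(combinations(range(len(clusters)), 2),
                   key=lambda ij: cluster_sim(clusters[ij[0]], clusters[ij[1]]))
        merged = ((clusters[i][0], clusters[j][0]), clusters[i][1] | clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0]

# Invented similarity values for four hypothetical languages A-D.
SIM = {frozenset(pair.split("-")): s for pair, s in {
    "A-B": 0.60, "A-C": 0.45, "B-C": 0.50,
    "A-D": 0.20, "B-D": 0.22, "C-D": 0.25,
}.items()}

print(closest_pair(SIM, ["A", "B", "C"]))               # ('A', 'B')
print(average_linkage_tree(SIM, ["A", "B", "C", "D"]))  # ('D', ('C', ('A', 'B')))
```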


Mass comparison

Greenberg also observed that, on purely statistical grounds, the similarity computed between the lexicons of two sets of closely related languages would be more reliable than that computed from two languages alone. (This justifies the "mass" in the method's name.)
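A small simulation can illustrate the statistical intuition. It makes an unrealistic assumption: every pairwise similarity measurement equals a common true value plus independent Gaussian noise. Under that assumption, averaging over all cross pairs of two groups of n languages shrinks the spread of the estimate by roughly a factor of n. Real measurements within a family are correlated, so the actual gain would be smaller. The function names and parameter values are illustrative.

```python
import random
import statistics

def group_similarity(sim, group1, group2):
    """Mean of sim(a, b) over every cross pair a in group1, b in group2."""
    values = [sim(a, b) for a in group1 for b in group2]
    return sum(values) / len(values)

def estimate_spread(n_per_group, trials=2000, true_value=0.30, noise=0.10, seed=0):
    """Standard deviation of the group-similarity estimate across repeated
    simulated measurements (idealized independent Gaussian noise)."""
    rng = random.Random(seed)

    def noisy_sim(a, b):
        # The language labels are ignored: every pair has the same true value.
        return true_value + rng.gauss(0.0, noise)

    group1 = range(n_per_group)
    group2 = range(n_per_group, 2 * n_per_group)
    estimates = [group_similarity(noisy_sim, group1, group2) for _ in range(trials)]
    return statistics.stdev(estimates)

# Theory under these assumptions: noise / n, i.e. about 0.10 for n = 1
# and about 0.025 for n = 4 (16 cross pairs averaged).
print(estimate_spread(1), estimate_spread(4))
```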


Thus, paradoxically, the lexical comparison method becomes more accurate as the investigation moves further into the past and takes in broader groups of languages. This offsets, to some extent, the increased level of statistical noise in the measurements. It stands in contrast to the traditional comparative method, which becomes less reliable as it is applied to broader language groups, because the structural comparisons must then be applied to increasingly dubious, inaccurate, and incomplete reconstructed proto-languages.


The mass lexical comparison method also has the advantage that it can reconstruct the broad phylogeny of a large set of languages directly from raw lexical samples. It does not need to wait for detailed morphological studies of each language, or for the reconstruction of proto-languages for each branch, which in the case of the Native American languages, for example, would take an enormous amount of work.


Choosing the sample lexicon

Ideally, the sample lexicons should contain only words that are likely to have survived in both languages since the time of their hypothetical common origin, and that are unlikely to have been replaced by borrowed or reinvented words. For studies that extend more than 5,000 years into the past, that criterion leaves only a few hundred concepts, such as body parts, close family relations, common animals and plants, water, fire, sky, stone, spear, etc.


Words for "modern" concepts, such as "wine", "horse", and "steel", may show spurious similarities between unrelated languages, because a culture imported the name together with the thing. An example is Spanish pan and Japanese pan ("bread"), the latter borrowed from Portuguese. Alternatively, the names of recently imported concepts may be coined separately in related languages, such as computadora ("computer") in Latin American Spanish and ordinateur in French. Either way, such words would only add noise and bias to the comparison.
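Building the sample lexicon amounts to a filter. The sketch below keeps only concepts on a hypothetical list of stable concepts, modeled on the examples above. It also drops any entry the analyst has flagged as a suspected loan. `STABLE_CONCEPTS`, `sample_lexicon` and the word lists are illustrative assumptions. Deciding which entries to flag remains the hard, subjective part.

```python
# Hypothetical list of concepts assumed to resist replacement over long periods.
STABLE_CONCEPTS = {"hand", "eye", "mother", "dog", "water", "fire", "sky", "stone"}

def sample_lexicon(lexicon, loans=frozenset()):
    """Keep concept -> word entries whose concept is on the stable list and
    whose entry has not been flagged (by concept) as a suspected loan."""
    return {concept: word for concept, word in lexicon.items()
            if concept in STABLE_CONCEPTS and concept not in loans}

SPANISH = {"water": "agua", "fire": "fuego", "bread": "pan",
           "computer": "computadora", "dog": "perro"}
JAPANESE = {"water": "mizu", "fire": "hi", "bread": "pan", "dog": "inu"}

print(sample_lexicon(SPANISH))   # {'water': 'agua', 'fire': 'fuego', 'dog': 'perro'}
print(sample_lexicon(JAPANESE))  # {'water': 'mizu', 'fire': 'hi', 'dog': 'inu'}
```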


Weaknesses of the method

Significance of the similarity

In theory, the reliability of Greenberg's method could be settled by statistical analysis, namely by computing the probability that a given similarity level S could have arisen through chance coincidences between totally unrelated languages. Two languages should then be considered similar only if the observed value of S is significantly greater than this "baseline" level.
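Suppose each of n concept pairs matched by chance, independently, with a known probability p. Then the chance of seeing at least k similar pairs would be a binomial tail probability, computed in the sketch below. Both the independence assumption and the value of p are illustrative, and justifying them in practice is where the real difficulty lies.

```python
from math import comb

def chance_probability(k, n, p):
    """P(at least k of n pairs are similar) under a binomial model in which
    each pair resembles its partner by chance with probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Hypothetical numbers: 100 concepts and a 10% chance-resemblance rate.
for k in (10, 15, 20):
    print(k, chance_probability(k, 100, 0.10))
```

An observed S would count as significant only when this probability is small.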


Unfortunately, this computation is very difficult to do. For one thing, the similarity level S is expected to depend on the phonetic repertoires of the two languages. For instance, one expects more chance resemblances between two languages that both have few vowels and many consonants than between a vowel-rich and a vowel-poor language. Similar biases can be expected when comparing languages that allow consonant clusters with those that don't, or polysyllabic languages with monosyllabic ones. It follows that deciding what constitutes a significant level of similarity would require a stochastic model of a "random lexicon". Such a model would have to take into account phoneme frequencies, syllable structure, and many other similar statistics.
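A permutation (shuffle) test is one way to avoid building an explicit "random lexicon" model. It is a technique sketched here for illustration and not discussed above. The test re-pairs the words of language B with the wrong concepts many times and counts how often the shuffled pairings score as high as the real one. Because shuffling reuses the actual words, the baseline keeps each language's sounds, word shapes and letter frequencies. The similarity criterion (same first consonant letter) and the word lists are assumptions.

```python
import random

def first_consonant(word):
    """First non-vowel letter of a word, or '' if there is none."""
    return next((ch for ch in word if ch not in "aeiou"), "")

def similar(a, b):
    """Hypothetical criterion: same first consonant letter."""
    fa = first_consonant(a)
    return fa != "" and fa == first_consonant(b)

def score(words_a, words_b):
    """Number of concept-aligned pairs judged similar."""
    return sum(similar(a, b) for a, b in zip(words_a, words_b))

def permutation_p_value(words_a, words_b, trials=10_000, seed=0):
    """Share of random re-pairings scoring at least as high as the real one
    (with the usual +1 correction so the estimate is never exactly zero)."""
    rng = random.Random(seed)
    observed = score(words_a, words_b)
    shuffled = list(words_b)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        if score(words_a, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (trials + 1)

# Concepts: hand, foot, night, moon, two, three, fire, life.
SPANISH = ["mano", "pie", "noche", "luna", "dos", "tres", "fuego", "vida"]
ITALIAN = ["mano", "piede", "notte", "luna", "due", "tre", "fuoco", "vita"]

print(score(SPANISH, ITALIAN), permutation_p_value(SPANISH, ITALIAN))
```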


At the same time, the correspondences used in the method are often tenuous, to say the least. At times they require agreement in only one phoneme, or even in only one phonetic feature (labial, dental, etc.). A wide semantic range is also allowed: in his book on the languages of the Americas, for example, Greenberg compared words meaning "arm", "shoulder", "armpit", "forearm", "elbow", etc. Using this method, Lyle Campbell, a linguist specializing in the languages of the Americas and author of a review of Greenberg's book, was able to establish a correspondence between the proposed Amerind language and Finnish. Others were able to do the same with Latin and many other languages obviously unrelated to those of the Americas.
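A back-of-the-envelope calculation shows why loose matching inflates S. For illustration, assume a match on one consonant class out of three roughly equally common classes happens by chance with probability 1/3. Assume also that each of m admissible meanings (arm, shoulder, armpit, ...) gives an independent chance at a match. The probability of finding at least one match is then 1 - (1 - p)^m.

```python
def chance_match(p_single, candidates):
    """Probability that at least one of `candidates` independent comparisons
    matches by chance, each with probability p_single."""
    return 1 - (1 - p_single) ** candidates

# Matching on one consonant class out of three (p = 1/3, an assumed rate):
print(round(chance_match(1 / 3, 1), 3))  # 0.333: one fixed meaning
print(round(chance_match(1 / 3, 5), 3))  # 0.868: five admissible meanings
```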


Onomatopoeic forms

Also, some of the "ancient" concepts that are most suitable for inclusion in the sample lexicons may originally have been denoted by onomatopoeic words that imitate a natural sound associated with the concept. (Examples of originally onomatopoeic words in English include "crack", "crow", "cough", "gurgle", etc.) The independent use of this principle in two languages will tend to create similar word pairs, which contribute to the similarity measure S but are not due to common origin.


Ideally, such words ought to be excluded from the sample lexicon, but the onomatopoeic origin of a word may be hard to recognize in its present form. Even basic words like "milk" or "wind" have been claimed to reflect the corresponding sounds (those of sucking and blowing, respectively). Unfortunately, the impact of these "natural false cognates" on the similarity measure is hard to estimate.


Semantic drift and subjectivity

Finally, in every language the same concept can often be expressed by two or more different words; and the meanings of words are known to drift over centuries just as much as their forms. Thus, for example, the meanings of corn and grain in English overlap to a large extent; and corn, which originally referred to cereals like wheat and barley, has come to mean chiefly "maize" in the United States.


As a consequence of these semantic shifts and synonymies, the construction of the representative lexicon for a language typically involves many choices that must often be made on subjective criteria. These choices may be unconsciously biased towards words that are similar to those previously chosen for other languages, thus artificially inflating the similarity measure S. Unfortunately, the impact of this factor, too, is hard to quantify.


Assessment

Empirical validation

In spite of its conceptual difficulties, the basic soundness of the mass comparison method has been validated by many empirical tests. When applied to European languages, for example, it does reproduce all the classifications already established by traditional methods, namely the Indo-European classification.


More importantly, the method allowed Greenberg to arrive at his new classification of the African languages, which has been widely accepted by linguists and confirmed afterwards by traditional methods. His later Amerind and Eurasiatic classifications have been strongly rejected by most linguists (chiefly out of mistrust for the method), but they still have some supporters.


Acceptance

The statistical difficulties inherent in Greenberg's method of mass lexical comparison are compounded by the fact that many historical linguists are unfamiliar with statistical analysis, and therefore are at a disadvantage when it comes to evaluating or criticizing its conclusions.


Presumably for all these reasons, and also for its revolutionary character, the method remains highly controversial. Although it continues to have ardent scholarly proponents, it is still rejected by most historical linguists, who view the comparative method as the only legitimate way to establish pre-historical common ancestry for languages.


External links

  • How likely are chance resemblances between languages? A critique of the mass comparison method, investigating the likelihood of chance resemblances with statistical methods.

The Wikipedia article included on this page is licensed under the GFDL.
Images may be subject to relevant owners' copyright.
All other elements are (c) copyright NationMaster.com 2003-5. All Rights Reserved.