Stylometry

Stylometry is the application of the study of linguistic style, usually to written language. In recent years it has also been applied successfully to music and to fine-art paintings.


Stylometry is often used to attribute authorship to anonymous or disputed documents. It has legal as well as academic and literary applications, ranging from the question of the authorship of Shakespeare's works to forensic linguistics.


An early example is Lorenzo Valla's 1439 proof that the Donation of Constantine was a forgery, an argument based partly on a comparison of the document's Latin with that used in authentic 4th-century documents.



Methods

Modern stylometry relies heavily on computers for statistical analysis, for artificial intelligence techniques, and for access to the growing corpus of texts available via the Internet.


Whereas earlier stylometry emphasized the rarest or most striking elements of a text, contemporary techniques can isolate identifying patterns even in common parts of speech.


Frequency in chunks

In one method of determining style, the text is analyzed to find its 50 most common words. The text is then broken into chunks of 5,000 words, and the frequency of each of those 50 words is measured in every chunk. This gives each chunk a 50-number profile, which places it at a point in a 50-dimensional space. Principal components analysis (PCA), a statistical technique for simplifying a dataset, then projects these points onto a plane, typically the one spanned by the first two principal components, so that chunks with similar word-frequency profiles appear close together. If two literary works are plotted on the same plane, the resulting pattern may indicate whether they were written by the same author or by different authors.
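A minimal sketch of the chunk-frequency method, assuming simple lower-cased word tokens (no stemming, punctuation dropped). To stay self-contained it computes the first two principal components by power iteration with deflation rather than calling a numerical library; the function names `tokenize`, `chunk_vectors` and `pca_2d` are hypothetical, and a real study would normally use a library PCA implementation.

```python
from collections import Counter
import math
import re


def tokenize(text):
    """Lower-case word tokens (an assumption: no stemming, punctuation dropped)."""
    return re.findall(r"[a-z']+", text.lower())


def chunk_vectors(text, n_words=50, chunk_size=5000):
    """Return the n_words most common words and, for each full chunk of
    chunk_size tokens, the relative frequency of each of those words."""
    tokens = tokenize(text)
    top = [w for w, _ in Counter(tokens).most_common(n_words)]
    vectors = []
    for start in range(0, len(tokens) - chunk_size + 1, chunk_size):
        counts = Counter(tokens[start:start + chunk_size])
        vectors.append([counts[w] / chunk_size for w in top])
    return top, vectors


def pca_2d(vectors, iters=200):
    """Project each vector onto the first two principal components, found by
    power iteration with deflation. Needs at least two vectors."""
    n, d = len(vectors), len(vectors[0])
    means = [sum(v[j] for v in vectors) / n for j in range(d)]
    centred = [[v[j] - means[j] for j in range(d)] for v in vectors]
    # Sample covariance matrix of the centred chunk vectors.
    cov = [[sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(d)]
           for i in range(d)]
    components = []
    for _ in range(2):
        vec = [1.0 / (j + 1) for j in range(d)]  # arbitrary non-zero start
        for _ in range(iters):
            w = [sum(cov[i][j] * vec[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w)) or 1.0
            vec = [x / norm for x in w]
        eigval = sum(vec[i] * sum(cov[i][j] * vec[j] for j in range(d))
                     for i in range(d))
        components.append(vec)
        # Deflate so the next pass converges to the second component.
        cov = [[cov[i][j] - eigval * vec[i] * vec[j] for j in range(d)]
               for i in range(d)]
    # Each chunk becomes a 2-D point: its coordinates on the two components.
    return [[sum(r[j] * pc[j] for j in range(d)) for pc in components]
            for r in centred]
```

The 2-D points returned by `pca_2d` are what would be plotted; chunks from two works can be pooled before projection so that both appear on the same plane.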


Neural networks

Neural networks have also been used to analyze the authorship of texts. In one approach, a network starts with links of random strengths and is presented with training texts of known authorship. Whenever it attributes a text incorrectly, the strengths of its links are adjusted, and training continues until the network identifies the known texts correctly. After training, the network can attribute new texts, but only among the authors it was trained on.
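A minimal sketch of that training loop, assuming each text has already been reduced to a numeric feature vector (for example, word-frequency rates). It uses the simplest possible network, a single-layer multi-class perceptron, not any specific published architecture; `train_network` and `predict` are hypothetical names.

```python
import random


def predict(weights, x):
    """Return the author whose output unit has the largest weighted sum."""
    xb = list(x) + [1.0]  # constant input acting as a bias
    return max(weights, key=lambda a: sum(w * xi for w, xi in zip(weights[a], xb)))


def train_network(samples, authors, epochs=100, rate=0.1, seed=0):
    """samples: (feature_vector, author) pairs of known authorship.
    Returns one weight vector (link strengths plus bias) per author."""
    rng = random.Random(seed)
    d = len(samples[0][0])
    # Links start with random strengths.
    weights = {a: [rng.uniform(-0.5, 0.5) for _ in range(d + 1)] for a in authors}
    for _ in range(epochs):
        mistakes = 0
        for x, author in samples:
            guess = predict(weights, x)
            if guess != author:
                mistakes += 1
                xb = list(x) + [1.0]
                # Adjust strengths: toward the true author, away from the wrong guess.
                weights[author] = [w + rate * xi for w, xi in zip(weights[author], xb)]
                weights[guess] = [w - rate * xi for w, xi in zip(weights[guess], xb)]
        if mistakes == 0:  # every known text is now attributed correctly
            break
    return weights
```

As the text notes, `predict` can only ever answer with one of the authors in `authors`; a text by anyone else is still forced onto one of them.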


Genetic algorithms

The genetic algorithm is another artificial intelligence technique used in stylometry. One method starts with a set of rules, for example: "If 'but' appears more than 1.7 times per thousand words, then the text is by author X." The program applies the rules to texts to determine authorship. Each rule is tested against a set of texts of known authorship and given a fitness score. In a population of 100 rules, the 50 with the lowest scores are discarded; the remaining 50 are given small changes, and 50 new rules are introduced. This cycle is repeated until the evolved rules correctly attribute the known texts.
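A minimal sketch of the evolutionary loop described above, under several assumptions not taken from the source: each known text is summarised as a dictionary of per-thousand-word rates; a rule is a hypothetical `(word, threshold, author)` triple meaning "if word occurs more than threshold times per thousand words, the text is by author"; a rule's fitness is the number of known texts it classifies correctly (declining to attribute a text not by its author counts as correct); and a "small change" only nudges the threshold.

```python
import random


def rate_per_thousand(tokens, word):
    """Occurrences of word per thousand tokens."""
    return 1000 * tokens.count(word) / len(tokens)


def fitness(rule, known):
    """Number of known (rates, author) texts the rule classifies correctly."""
    word, threshold, author = rule
    return sum((rates.get(word, 0.0) > threshold) == (true_author == author)
               for rates, true_author in known)


def random_rule(rng, words, authors, max_rate):
    return (rng.choice(words), rng.uniform(0.0, max_rate), rng.choice(authors))


def mutate(rng, rule):
    """A small change: nudge the threshold, keeping it non-negative."""
    word, threshold, author = rule
    return (word, max(0.0, threshold + rng.gauss(0.0, 0.5)), author)


def evolve(known, words, authors, max_rate=10.0, pop_size=100,
           max_generations=200, seed=0):
    rng = random.Random(seed)
    population = [random_rule(rng, words, authors, max_rate) for _ in range(pop_size)]
    for _ in range(max_generations):
        population.sort(key=lambda r: fitness(r, known), reverse=True)
        if fitness(population[0], known) == len(known):
            break  # a rule now attributes every known text correctly
        survivors = population[:pop_size // 2]  # discard the lowest-scoring half
        population = ([mutate(rng, r) for r in survivors]  # small changes
                      + [random_rule(rng, words, authors, max_rate)  # new rules
                         for _ in range(pop_size - len(survivors))])
    best = max(population, key=lambda r: fitness(r, known))
    return best, fitness(best, known)
```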


Rare pairs

One method for identifying style, called "rare pairs", relies on an individual's habits of collocation: for a particular author, the use of certain words may idiosyncratically entail the use of other, predictable words.
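A minimal sketch of how such collocation habits might be detected, assuming that only adjacent word pairs count as collocations and that a reference corpus stands in for general usage. The function `rare_pairs` and its thresholds `min_count` and `max_ref_rate` are hypothetical illustrations, not a published definition of the method.

```python
from collections import Counter
import re


def word_and_pair_counts(text):
    """Counts of single words and of adjacent word pairs (lower-cased)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens), Counter(zip(tokens, tokens[1:]))


def rare_pairs(author_text, reference_text, min_count=3, max_ref_rate=0.01):
    """Adjacent pairs (w1, w2) that the author uses at least min_count times,
    but that follow w1 in at most max_ref_rate of its occurrences in the
    reference corpus (or w1 never occurs there). Each pair is returned with
    how often w1 is followed by w2 in the author's text, highest first."""
    a_words, a_pairs = word_and_pair_counts(author_text)
    r_words, r_pairs = word_and_pair_counts(reference_text)
    found = {}
    for (w1, w2), n in a_pairs.items():
        if n < min_count:
            continue
        ref_rate = r_pairs[(w1, w2)] / r_words[w1] if r_words[w1] else 0.0
        if ref_rate <= max_ref_rate:
            found[(w1, w2)] = n / a_words[w1]
    return sorted(found.items(), key=lambda item: item[1], reverse=True)
```

A high score means that, for this author, the first word reliably predicts the second, while the pair is rare in general usage.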


The Wikipedia article included on this page is licensed under the GFDL.