|
Stylometry is the application of the study of linguistic style, usually to written language. In recent years it has also been applied successfully to music and to fine-art paintings.
Stylometry is often used to attribute authorship to anonymous or disputed documents. It has legal as well as academic and literary applications, ranging from the question of the authorship of Shakespeare's works to forensic linguistics.
An early example is Lorenzo Valla's 1439 proof that the Donation of Constantine was a forgery, an argument based partly on a comparison of the Latin with that used in authentic 4th-century documents.
Methods
Modern stylometry draws heavily on computers for statistical analysis, on artificial intelligence, and on access to the growing corpus of texts available via the Internet.
Whereas in the past, stylometry emphasized the rarest or most striking elements of a text, contemporary techniques can isolate identifying patterns even in common parts of speech.
Frequency in chunks
In one method of determining style, the text is analyzed to find its 50 most common words. The text is then broken into 5,000-word chunks, and each chunk is analyzed to find the frequency of those 50 words within it. This yields a 50-number profile for each chunk, which places the chunk at a point in a 50-dimensional space. That space is flattened onto a plane using principal components analysis (PCA), a statistical technique for simplifying a dataset. The resulting display of points corresponds to an author's style. If two literary works are plotted on the same plane, the resulting pattern may show whether both works are by the same author or by different authors.
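The chunk-and-project procedure can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the function names (tokenize, chunk_profiles, pca_2d) are hypothetical, the tokenizer is deliberately crude, PCA is approximated with plain power iteration rather than a statistics library, and the text is assumed to contain at least two distinct words.

```python
import math
import random
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (a crude tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def chunk_profiles(tokens, n_words=50, chunk_size=5000):
    """Return the most common words and one frequency vector per full chunk."""
    vocab = [word for word, _ in Counter(tokens).most_common(n_words)]
    profiles = []
    for start in range(0, len(tokens) - chunk_size + 1, chunk_size):
        counts = Counter(tokens[start:start + chunk_size])
        profiles.append([counts[word] / chunk_size for word in vocab])
    return vocab, profiles


def _unit(vec, previous):
    """Remove any part of vec along earlier components, then normalize."""
    for prev in previous:
        dot = sum(a * b for a, b in zip(vec, prev))
        vec = [a - dot * b for a, b in zip(vec, prev)]
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 1e-12 else None


def _component(cov, previous, iterations=300):
    """Power iteration for the strongest direction orthogonal to `previous`."""
    rng = random.Random(0)  # fixed seed so the sketch is repeatable
    vec = _unit([rng.uniform(-1.0, 1.0) for _ in cov], previous)
    for _ in range(iterations):
        product = [sum(c * v for c, v in zip(row, vec)) for row in cov]
        nxt = _unit(product, previous)
        if nxt is None:  # no variance found along this direction
            break
        vec = nxt
    return vec


def pca_2d(profiles):
    """Project each chunk profile onto the first two principal components."""
    n, dim = len(profiles), len(profiles[0])
    means = [sum(row[j] for row in profiles) / n for j in range(dim)]
    centered = [[row[j] - means[j] for j in range(dim)] for row in profiles]
    cov = [[sum(r[i] * r[j] for r in centered) / max(n - 1, 1)
            for j in range(dim)] for i in range(dim)]
    components = []
    for _ in range(2):
        components.append(_component(cov, components))
    return [[sum(x * c for x, c in zip(row, comp)) for comp in components]
            for row in centered]
```

Calling chunk_profiles on the tokens of two concatenated works and passing the profiles to pca_2d gives one (x, y) point per chunk. Chunks by the same author would be expected to cluster together, though the sketch makes no claim about how reliable that separation is on real texts.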
Neural networks
Neural networks are used to analyze the authorship of texts. One such network was built with links of random initial strength and was presented with training texts of known authorship. Whenever the network guessed incorrectly, it adjusted the strengths of its links, and this continued until it could correctly identify the known texts. Once training was complete, the network could be used to attribute other texts by the authors it had been trained on.
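The train-until-correct loop described above can be illustrated with the simplest possible network, a single-layer perceptron that distinguishes two authors. This is a stand-in chosen for brevity, not the network the text refers to, whose architecture is not given. The feature list FUNCTION_WORDS, the function names, and the learning rate are all assumptions made for the example.

```python
import random
import re
from collections import Counter

# Hypothetical feature set: relative frequencies of a few common words.
FUNCTION_WORDS = ["the", "and", "of", "to", "but", "a", "in", "that"]


def features(text):
    """Relative frequency of each function word in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = max(len(tokens), 1)
    return [counts[word] / total for word in FUNCTION_WORDS]


def score(model, x):
    """Weighted sum of the inputs plus the bias."""
    weights, bias = model
    return sum(w * v for w, v in zip(weights, x)) + bias


def train(samples, epochs=200, rate=1.0, seed=0):
    """samples: list of (text, label) pairs, label 1 for author X, 0 for Y."""
    rng = random.Random(seed)
    # Links start with random strengths, as in the description above.
    weights = [rng.uniform(-0.5, 0.5) for _ in FUNCTION_WORDS]
    bias = rng.uniform(-0.5, 0.5)
    data = [(features(text), label) for text, label in samples]
    for _ in range(epochs):
        mistakes = 0
        for x, label in data:
            guess = 1 if score((weights, bias), x) > 0 else 0
            if guess != label:
                # Wrong guess: nudge every link toward the right answer.
                mistakes += 1
                step = rate * (label - guess)
                weights = [w + step * v for w, v in zip(weights, x)]
                bias += step
        if mistakes == 0:  # every known text is now identified correctly
            break
    return weights, bias


def predict(model, text):
    """Return 1 if the model attributes the text to author X, else 0."""
    return 1 if score(model, features(text)) > 0 else 0
```

A single perceptron can only separate authors whose feature vectors are linearly separable. Networks with hidden layers lift that restriction but need a more involved training rule than the one sketched here.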
Genetic algorithms
The genetic algorithm is another artificial intelligence technique used in stylometry. The method starts out with a set of rules. An example rule might be, "If but appears more than 1.7 times in every thousand words, then the text is by author X". The program is presented with text and uses the rules to determine authorship. The rules are tested against a set of known texts, and each rule is given a fitness score. The 50 rules with the lowest scores are thrown out. The remaining 50 rules are given small changes, and 50 new rules are introduced. This is repeated until the evolved rules correctly attribute the texts.
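The evolve-and-cull cycle might look like the following Python sketch. It assumes a population of 100 rules (implied by discarding 50 and keeping 50), rules of the single form "word appears more than T times per thousand words, so the author is A", and a fitness score equal to the share of known texts a rule gets right. The word list, the threshold range, the mutation size, and all function names are hypothetical.

```python
import random
import re

WORDS = ["but", "and", "the", "of", "to"]  # hypothetical candidate words
AUTHORS = ["X", "Y"]                       # hypothetical author labels


def rate_per_thousand(text, word):
    """Occurrences of word per thousand words of text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return 1000.0 * tokens.count(word) / max(len(tokens), 1)


def random_rule(rng):
    """A rule is (word, threshold, author)."""
    return (rng.choice(WORDS), rng.uniform(0.0, 100.0), rng.choice(AUTHORS))


def fitness(rule, known):
    """Share of known texts on which the rule gives the right verdict."""
    word, threshold, author = rule
    correct = 0
    for text, true_author in known:
        fires = rate_per_thousand(text, word) > threshold
        # Right when it fires on its own author, or stays silent otherwise.
        if fires == (true_author == author):
            correct += 1
    return correct / len(known)


def mutate(rule, rng):
    """Give a rule a small change by shifting its threshold."""
    word, threshold, author = rule
    return (word, max(0.0, threshold + rng.gauss(0.0, 5.0)), author)


def evolve(known, population=100, generations=50, seed=0):
    """Return the fittest rule found for the (text, author) pairs in known."""
    rng = random.Random(seed)
    rules = [random_rule(rng) for _ in range(population)]
    half = population // 2
    for _ in range(generations):
        rules.sort(key=lambda r: fitness(r, known), reverse=True)
        if fitness(rules[0], known) == 1.0:
            break  # a rule already attributes every known text correctly
        # Drop the weakest half, perturb the survivors, add fresh rules.
        survivors = [mutate(rule, rng) for rule in rules[:half]]
        rules = survivors + [random_rule(rng) for _ in range(population - half)]
    return max(rules, key=lambda r: fitness(r, known))
```

The sketch stops as soon as one rule attributes every known text, which is a simplification. A fuller system would combine many rules and hold back some known texts to check that the evolved rules generalize.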
Rare pairs
One method for identifying style is called "rare pairs", and relies upon individual habits of collocation. For a particular author, the use of certain words may idiosyncratically entail the use of other, predictable words.
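The text does not spell out how rare pairs are computed, so the following is only a rough sketch of one way to quantify collocation habits, not the rare-pairs method itself. For each word, it measures how often another given word follows within a short window. The function name, the window size, and the minimum count are assumptions.

```python
import re
from collections import Counter


def collocation_habits(text, window=3, min_count=2):
    """Map (first, second) to the share of first's occurrences that are
    followed by second within `window` words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    word_counts = Counter(tokens)
    pair_counts = Counter()
    for i, first in enumerate(tokens):
        # set() counts each following word once per occurrence of first.
        for second in set(tokens[i + 1:i + 1 + window]):
            if second != first:
                pair_counts[(first, second)] += 1
    # Keep only pairs seen at least min_count times.
    return {pair: count / word_counts[pair[0]]
            for pair, count in pair_counts.items() if count >= min_count}
```

Pairs with a high share in one author's texts and a low share in another's would be candidate markers of the kind of habit described above. How such pairs are actually selected and weighted is left open here.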
|