Machine translation (MT) is a sub-field of computational linguistics that investigates the use of computer software to translate text or speech from one natural language to another. At its basic level, MT performs simple substitution of atomic words in one natural language for words in another. Using corpus techniques, more complex translations may be attempted, allowing for better handling of differences in linguistic typology, phrase recognition, and translation of idioms, as well as the isolation of anomalies.
Current machine translation software often allows for customisation by domain or profession (such as weather reports), improving output by limiting the scope of allowable substitutions. This technique is particularly effective in domains where formal or formulaic language is used. It follows that machine translation of government and legal documents more readily produces usable output than machine translation of conversation or less standardised text.
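A minimal sketch of how domain restriction narrows the choice of substitutions, assuming a hypothetical English-to-Spanish lexicon in which every candidate translation is tagged with the domains where it is allowed. The entries, domain tags and the function name `allowed_translations` are illustrative only and are not taken from any real MT product.

```python
# Hypothetical lexicon: each English word lists candidate Spanish translations,
# each tagged with the set of domains in which that substitution is allowed.
LEXICON = {
    "shower": [("ducha", {"general"}), ("chubasco", {"general", "weather"})],
    "front": [("parte delantera", {"general"}), ("frente", {"general", "weather"})],
}


def allowed_translations(word: str, domain: str) -> list[str]:
    """Return only the candidate translations permitted in the given domain."""
    return [target for target, domains in LEXICON.get(word, []) if domain in domains]


if __name__ == "__main__":
    print(allowed_translations("shower", "general"))  # two candidates: still ambiguous
    print(allowed_translations("shower", "weather"))  # one candidate: a rain shower
```

In the general domain the system still has to choose between a bathroom shower and a rain shower; restricting it to weather reports leaves a single allowed substitution.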
Improved output quality can also be achieved by human intervention: for example, some systems are able to translate more accurately if the user has unambiguously identified which words in the text are names. With the assistance of these techniques, MT has proven useful as a tool to assist human translators, and in some cases can even produce output that can be used "as is". However, current systems are unable to produce output of the same quality as a human translator, particularly where the text to be translated uses casual language.
Introduction
The translation process may be stated as:
- Decoding the meaning of the source text; and
- Re-encoding this meaning in the target language.
Behind this ostensibly simple procedure lies a complex cognitive operation. To decode the meaning of the source text in its entirety, the translator must interpret and analyse all the features of the text, a process which requires in-depth knowledge of the grammar, semantics, syntax, idioms and the like of the source language, as well as of the culture of its speakers. The translator needs the same in-depth knowledge to re-encode the meaning in the target language.
Therein lies the challenge in machine translation: how to program a computer to "understand" a text as a person does and also to "create" a new text in the target language that "sounds" as if it has been written by a person. This problem may be approached in a number of ways.
Approaches
Figure: Pyramid showing comparative depths of intermediary representation, with interlingual machine translation at the peak, followed by transfer-based, then direct translation.
Machine translation can use a method based on linguistic rules, which means that words are translated in a linguistic way: the most suitable words of the target language replace the ones in the source language.
It is often argued that the success of machine translation requires the problem of natural language understanding to be solved first.
Generally, rule-based methods parse a text, usually creating an intermediary, symbolic representation, from which the text in the target language is generated. According to the nature of the intermediary representation, an approach is described as interlingual machine translation or transfer-based machine translation. These methods require extensive lexicons with morphological, syntactic, and semantic information, and large sets of rules.
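The analyse, transfer, generate pipeline of a transfer-based system can be sketched on a toy scale. This is a minimal illustration, assuming a hypothetical English-to-Spanish lexicon with part-of-speech tags; the intermediary representation here is only a tagged word sequence rather than a full parse tree, and the single transfer rule (move an adjective after its noun) stands in for the large rule sets real systems need.

```python
# Hypothetical lexicon: English word -> (part of speech, Spanish equivalent).
LEXICON = {
    "the": ("DET", "el"),
    "red": ("ADJ", "rojo"),
    "black": ("ADJ", "negro"),
    "car": ("NOUN", "coche"),
    "cat": ("NOUN", "gato"),
    "sleeps": ("VERB", "duerme"),
}


def analyse(sentence: str) -> list[tuple[str, str]]:
    """Analysis: build the intermediary representation, a list of (tag, word) pairs."""
    return [(LEXICON[word][0], word) for word in sentence.lower().split()]


def transfer(tagged: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Transfer: apply one structural rule, ADJ NOUN -> NOUN ADJ (Spanish order)."""
    result = list(tagged)
    i = 0
    while i < len(result) - 1:
        if result[i][0] == "ADJ" and result[i + 1][0] == "NOUN":
            result[i], result[i + 1] = result[i + 1], result[i]
            i += 2  # skip past the pair that was just reordered
        else:
            i += 1
    return result


def generate(tagged: list[tuple[str, str]]) -> str:
    """Generation: substitute target-language words in the transferred order."""
    return " ".join(LEXICON[word][1] for _, word in tagged)


if __name__ == "__main__":
    print(generate(transfer(analyse("the black cat sleeps"))))  # el gato negro duerme
```

Keeping analysis, transfer and generation as separate steps means a new structural difference between the two languages is handled by adding a rule to `transfer`, without touching the lexicon.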
Given enough data, machine translation programs often work well enough for a native speaker of one language to get the approximate meaning of what is written by a native speaker of another. The difficulty is getting enough data of the right kind to support the particular method. For example, the large multilingual corpus of data needed for statistical methods to work is not necessary for the grammar-based methods. But then, the grammar methods need a skilled linguist to carefully design the grammar that they use.
To translate between closely related languages, a technique referred to as shallow-transfer machine translation may be used.
Dictionary-based machine translation
Main article: Dictionary-based machine translation
Machine translation can use a method based on dictionary entries, which means that the words will be translated as a dictionary does: word by word, usually without much correlation of meaning between them.
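A minimal word-by-word sketch, assuming a small hypothetical English-to-Spanish dictionary. Each word is looked up independently, so the output keeps English word order; unknown words are passed through in brackets, an arbitrary choice made for this illustration.

```python
# Hypothetical bilingual dictionary: one fixed Spanish entry per English word.
DICTIONARY = {
    "the": "el",
    "red": "rojo",
    "car": "coche",
    "is": "es",
    "fast": "rápido",
}


def translate_word_by_word(sentence: str) -> str:
    """Replace each word with its dictionary entry, ignoring context and word order."""
    return " ".join(DICTIONARY.get(word, f"[{word}]") for word in sentence.lower().split())


if __name__ == "__main__":
    print(translate_word_by_word("The red car is fast"))
    # el rojo coche es rápido  (idiomatic Spanish would be "el coche rojo es rápido")
```

The misplaced adjective in the output shows the lack of correlation between words that the paragraph describes: each substitution is correct in isolation, but the sentence as a whole is not.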
Statistical machine translation
Statistical machine translation tries to generate translations using statistical methods based on bilingual text corpora, such as the Canadian Hansard corpus (the English-French record of the Canadian parliament) and EUROPARL (the record of the European Parliament). Where such corpora are available, impressive results can be achieved translating texts of a similar kind, but such corpora are still very rare. The first statistical machine translation software was CANDIDE from IBM. Google currently uses SYSTRAN, but is working on a statistical translation method for most of its machine translation in the future. Recently, Google improved its translation capabilities by training its system on approximately 200 billion words from United Nations materials, and the accuracy of its translations has improved. [1] Whether statistical or not, because perfect and useful parallel text pairs are rare compared with the variety of naturally occurring language, machine translation of text will at best remain useful for gist comprehension only.
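To make "parameters learned from bilingual corpora" concrete, here is a minimal sketch of expectation-maximization training for IBM Model 1, the simplest of the word-based models developed in IBM's statistical translation work. It learns word translation probabilities t(e|f) from sentence pairs alone, with no dictionary. Simplifications assumed for brevity: no NULL word, no smoothing, and a three-sentence German-English toy corpus in place of a real corpus such as Hansard or EUROPARL; the function names are hypothetical.

```python
from collections import defaultdict

Pair = tuple[list[str], list[str]]  # (foreign tokens, English tokens)


def train_ibm_model1(corpus: list[Pair], iterations: int = 10) -> dict[tuple[str, str], float]:
    """Estimate t(e|f) with EM; keys of the returned dict are (e, f) pairs."""
    english_vocab = {e for _, english in corpus for e in english}
    # Uniform start: every co-occurring (e, f) pair gets 1/|English vocabulary|.
    t = {(e, f): 1.0 / len(english_vocab)
         for foreign, english in corpus for f in foreign for e in english}
    for _ in range(iterations):
        count: dict[tuple[str, str], float] = defaultdict(float)
        total: dict[str, float] = defaultdict(float)
        for foreign, english in corpus:
            for e in english:
                # E-step: split one unit of alignment mass for e across the
                # foreign words of the same sentence, in proportion to t(e|f).
                norm = sum(t[(e, f)] for f in foreign)
                for f in foreign:
                    share = t[(e, f)] / norm
                    count[(e, f)] += share
                    total[f] += share
        # M-step: renormalise expected counts into probabilities per foreign word.
        t = {(e, f): c / total[f] for (e, f), c in count.items()}
    return t


def best_translation(t: dict[tuple[str, str], float], f: str) -> str:
    """Return the English word e that maximises t(e|f)."""
    return max(((e, p) for (e, ff), p in t.items() if ff == f), key=lambda item: item[1])[0]


if __name__ == "__main__":
    corpus = [
        ("das haus".split(), "the house".split()),
        ("das buch".split(), "the book".split()),
        ("ein buch".split(), "a book".split()),
    ]
    t = train_ibm_model1(corpus)
    for word in ["das", "haus", "buch", "ein"]:
        print(word, "->", best_translation(t, word))
```

No word pairing is given in advance; the co-occurrence pattern alone (das appears twice with "the", Buch twice with "book") is enough for EM to favour the expected pairings. Full systems build reordering and target-language models on top of such word probabilities.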
Example-based machine translation
Main article: Example-based machine translation
The example-based machine translation (EBMT) approach is often characterised by its use of a bilingual corpus as its main knowledge base at run-time. It is essentially translation by analogy and can be viewed as an implementation of the case-based reasoning approach to machine learning.
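Translation by analogy can be sketched as follows, assuming a hypothetical example base of English-Spanish sentence pairs and a small word lexicon. Given an input that differs from a stored example in exactly one word, the sketch reuses the example's translation and swaps in the corresponding target word; otherwise it returns `None`. Real EBMT systems match sub-sentential fragments and combine several examples; this only shows the core idea.

```python
from typing import Optional

# Hypothetical example base of aligned English-Spanish sentence pairs.
EXAMPLES = [
    ("i want a coffee", "quiero un café"),
    ("where is the station", "dónde está la estación"),
]

# Hypothetical word lexicon used to adapt an example to a new input.
LEXICON = {"coffee": "café", "tea": "té", "station": "estación", "beach": "playa"}


def translate_by_analogy(sentence: str) -> Optional[str]:
    words = sentence.lower().split()
    for source, target in EXAMPLES:
        example_words = source.split()
        if len(example_words) != len(words):
            continue
        diffs = [i for i, (a, b) in enumerate(zip(example_words, words)) if a != b]
        if not diffs:
            return target  # exact match: reuse the stored translation
        if len(diffs) == 1:
            old, new = example_words[diffs[0]], words[diffs[0]]
            target_words = target.split()
            if old in LEXICON and new in LEXICON and LEXICON[old] in target_words:
                # Analogy: replace the target word aligned with the changed source word.
                target_words[target_words.index(LEXICON[old])] = LEXICON[new]
                return " ".join(target_words)
    return None  # no sufficiently similar example in the base
```

The "where is" example deliberately uses feminine nouns: substituting a masculine noun would leave the article "la" unchanged, the kind of agreement error naive analogy can produce.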
Interlingual machine translation
Interlingual machine translation is one instance of rule-based machine translation. In this approach, the source-language text, i.e. the text to be translated, is transformed into an interlingua, a representation independent of both the source and the target language. The target-language text is then generated from the interlingua.
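A toy sketch of the interlingual route, assuming hypothetical concept labels (`DOG`, `SLEEP`, and so on) and tiny lexicons. Analysis maps an English sentence of the form "the noun verb" to a language-independent frame; generation then produces Spanish or French from that frame alone, without ever consulting the English words.

```python
# Hypothetical analysis lexicon: English words -> language-independent concepts.
EN_TO_CONCEPT = {"dog": "DOG", "cat": "CAT", "sleeps": "SLEEP", "eats": "EAT"}

# Hypothetical generation lexicons: concepts -> target-language phrases.
GENERATORS = {
    "es": {"DOG": "el perro", "CAT": "el gato", "SLEEP": "duerme", "EAT": "come"},
    "fr": {"DOG": "le chien", "CAT": "le chat", "SLEEP": "dort", "EAT": "mange"},
}


def analyse_english(sentence: str) -> dict[str, str]:
    """Map 'the <noun> <verb>' to an interlingua frame; other shapes are not handled."""
    _article, noun, verb = sentence.lower().split()
    return {"event": EN_TO_CONCEPT[verb], "agent": EN_TO_CONCEPT[noun]}


def generate(frame: dict[str, str], language: str) -> str:
    """Produce agent-then-event word order from the frame for the chosen language."""
    lexicon = GENERATORS[language]
    return f"{lexicon[frame['agent']]} {lexicon[frame['event']]}"


if __name__ == "__main__":
    frame = analyse_english("The dog sleeps")
    print(frame)
    for lang in ("es", "fr"):
        print(lang, generate(frame, lang))
```

Because the frame is shared, supporting another target language means adding one more generator rather than a new set of rules for every source-target pair.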
Major issues
Word sense disambiguation
Word sense disambiguation concerns finding a suitable translation when a word can have more than one meaning. The problem was first raised in the 1950s by Yehoshua Bar-Hillel [2]. He pointed out that without a "universal encyclopaedia", a machine would never be able to distinguish between the two meanings of a word. Today there are numerous approaches designed to overcome this problem. They can be approximately divided into "shallow" approaches and "deep" approaches.
Shallow approaches assume no knowledge of the text. They simply apply statistical methods to the words surrounding the ambiguous word. Deep approaches presume a comprehensive knowledge of the word. So far, shallow approaches have been more successful.
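A shallow approach can be sketched by looking only at the words around the ambiguous word. This minimal version assumes a hypothetical sense inventory for English "bank", with hand-picked cue words and a Spanish translation per sense, and picks the sense whose cue words overlap most with a small context window. The overlap count, in the spirit of a simplified Lesk heuristic, stands in here for a trained statistical model.

```python
# Hypothetical sense inventory: each sense of "bank" has a Spanish translation
# and a set of cue words that tend to appear near it.
SENSES = {
    "bank": [
        ("banco", {"money", "loan", "account", "deposit", "cash"}),
        ("orilla", {"river", "water", "fishing", "shore", "boat"}),
    ],
}


def disambiguate(tokens: list[str], index: int, window: int = 3) -> str:
    """Choose a translation for tokens[index] from the surrounding words only."""
    word = tokens[index]
    context = set(tokens[max(0, index - window):index] + tokens[index + 1:index + 1 + window])
    # Score each sense by how many of its cue words appear in the window;
    # on a tie, max() keeps the first sense listed.
    translation, _cues = max(SENSES[word], key=lambda sense: len(sense[1] & context))
    return translation


if __name__ == "__main__":
    s1 = "i opened an account at the bank".split()
    s2 = "we went fishing on the river bank".split()
    print(disambiguate(s1, s1.index("bank")))  # banco
    print(disambiguate(s2, s2.index("bank")))  # orilla
```

Nothing here "knows" what a bank is, which is the sense in which shallow approaches assume no knowledge of the text.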
Named entities
This problem is related to named entity recognition (NER) in information extraction, which seeks to locate atomic elements in text and classify them into predefined categories such as the names of persons, organizations and locations, expressions of time, quantities, monetary values and percentages.
History
The history of machine translation generally starts in the 1950s, after the Second World War. The Georgetown experiment in 1954 involved fully automatic translation of more than sixty Russian sentences into English. The experiment was a great success and ushered in an era of significant funding for machine translation research. The authors claimed that within three to five years, machine translation would be a solved problem.
However, real progress was much slower, and after the ALPAC report of 1966 found that ten years of research had failed to fulfil expectations, funding was dramatically reduced. Starting in the late 1980s, as computational power increased and became less expensive, more interest began to be shown in statistical models for machine translation.
Today there are many software programs for translating natural language, several of them online, such as the SYSTRAN system, which powers both Google Translate and AltaVista's Babelfish. Although no system provides the holy grail of "fully automatic high-quality machine translation" (FAHQMT), many systems provide reasonable output.
Real world applications
Despite their inherent limitations, MT programs are currently used by various organisations around the world. Probably the largest institutional user is the European Commission, which uses a highly customised version of the commercial MT system SYSTRAN to handle the automatic translation of a large volume of preliminary drafts of documents for internal use.
A Danish translation agency, Lingtech A/S [3], has been translating patent applications from English to Danish since 1993 using a proprietary rule-based machine translation system, PaTrans, together with Trados, a commercial CAT tool based on translation memory. The system requires both manual pre-editing and post-editing, but the monthly output is still approximately 400,000 words per operator.[citation needed] The Spanish daily newspaper Periódico de Catalunya is translated from Spanish into Catalan with an MT system [4].
Google has claimed that promising results were obtained using a proprietary statistical machine translation engine [5]. In tests conducted by the National Institute of Standards and Technology (summer 2006), the statistical engine used in Google's translation tools for Arabic <-> English and Chinese <-> English achieved an overall BLEU-4 score of 0.4281, ahead of the runner-up, IBM, with 0.3954. [6] [7] [8] Uwe Muegge has implemented a demo website [9] that uses a controlled language in combination with the Google engine to produce fully automatic, high-quality machine translations of his English, German, and French web sites.
With the recent focus on terrorism, US military sources have been investing significant amounts of money in natural language engineering. In-Q-Tel [10] (a venture capital fund, largely funded by the US Intelligence Community, that stimulates new technologies through private-sector entrepreneurs) has backed companies such as Language Weaver. Currently the military community is interested in the translation and processing of languages such as Arabic, Pashto, and Dari. [citation needed] The Information Processing Technology Office at DARPA hosts programs such as TIDES and Babylon Translator. The US Air Force has awarded a $1 million contract to develop language translation technology. [11]
Evaluation
There are various means for evaluating the performance of machine translation systems. The oldest is the use of human judges to assess a translation's quality. More recent, automated means of evaluation include BLEU, NIST and METEOR.
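BLEU compares the n-grams of a machine translation with those of human reference translations. The sketch below is a minimal, unsmoothed sentence-level version with a single reference, assuming whitespace tokenisation: it computes clipped n-gram precisions for n = 1 to 4, takes their geometric mean, and multiplies by a brevity penalty. Published BLEU scores are normally computed over a whole test corpus, often with several references, so numbers from this sketch are not comparable with them.

```python
import math
from collections import Counter


def ngram_counts(tokens: list[str], n: int) -> Counter:
    """Count every contiguous n-gram in the token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate: str, reference: str, max_n: int = 4) -> float:
    cand, ref = candidate.split(), reference.split()
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_ngrams = ngram_counts(cand, n)
        ref_ngrams = ngram_counts(ref, n)
        # Clip each candidate n-gram count by its count in the reference.
        matches = sum(min(count, ref_ngrams[g]) for g, count in cand_ngrams.items())
        total = sum(cand_ngrams.values())
        if matches == 0:
            return 0.0  # unsmoothed: any zero precision makes the geometric mean zero
        log_precision_sum += math.log(matches / total)
    # Brevity penalty: candidates shorter than the reference are penalised.
    if len(cand) > len(ref):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(ref) / len(cand))
    return brevity_penalty * math.exp(log_precision_sum / max_n)


if __name__ == "__main__":
    print(sentence_bleu("the cat sat on the mat", "the cat sat on the mat"))  # 1.0
    print(sentence_bleu("the the the the the the", "the cat sat on the mat", max_n=1))
```

Clipping is what stops the degenerate candidate "the the the ..." from scoring perfectly on unigrams: "the" is credited at most twice because it occurs twice in the reference, giving a precision of 2/6.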
Current machine translation products all point to a state of impasse. Most cannot escape being robotic and are, at best, useful for "gisting translations": translations that allow the reader to "get the gist" of the source text but are unlikely to convey a perfect understanding of it. An experienced reader, especially one with a good knowledge of the two languages involved, may find such a raw translation useful. The flaw in indiscriminate reliance on machine translation is that human language is context-embedded and that, inescapably, it takes a human being to adequately comprehend a human context. Even purely human-generated translations are prone to error. One approach that builds on machine translation software helps a foreign reader understand an original text by engaging the reader's own active, knowledgeable mind in the translation process.
Quotes
In the words of the European Association for Machine Translation (EAMT):
- Machine translation (MT) is the application of computers to the task of translating texts from one natural language to another. One of the very earliest pursuits in computer science, MT has proved to be an elusive goal, but today a number of systems are available which produce output which, if not perfect, is of sufficient quality to be useful in a number of specific domains. [12] (1997)
See also
- Cognitive science is usually defined as the scientific study either of mind or of intelligence.
- Computer-assisted translation (computer-aided translation, or CAT) is a form of translation wherein a human translator translates texts using computer software designed to support and facilitate the translation process.
- Distributed Language Translation (DLT) was a project to develop a machine translation system for twelve European languages.
- Eurotra was an ambitious machine translation project established and funded by the European Commission from the late 1970s until 1994.
- The history of machine translation generally starts in the 1950s, although work can be found from earlier periods.
- Language acquisition is the process by which the language capability develops in a human.
- If one is translating texts of unknown origin, the first order of business is to recognize the language of the text. This is known as language identification, which is a kind of text categorization.
- Machine translation software is available in a variety of formats and licenses.
- A parallel text is a text in one language together with its translation in another language.
- Universal grammar is a theory of linguistics postulating principles of grammar shared by all languages, thought to be innate to humans.
- In machine translation, Universal Networking Language (UNL) is an artificial pivot language that relies on the semi-automatic translation of an initial text in a natural language into its pivot equivalent.
- The universal translator is a fictional device common to many science fiction works.
- List of research laboratories that focus on machine translation.
Notes

References
- Hutchins, W. John; Somers, Harold L. (1992). An Introduction to Machine Translation. London: Academic Press. ISBN 0-12-362830-X.
External links