
Frequency analysis

In mathematics, physics and signal processing, frequency analysis is a method of decomposing a function, wave, or signal into its frequency components in order to obtain its frequency spectrum.


A typical distribution of letters in English language text. Weak ciphers do not sufficiently mask the distribution, and this might be exploited by a cryptanalyst to read the message.

In cryptanalysis, frequency analysis is the study of the frequency of letters or groups of letters in a ciphertext. The method is used as an aid to breaking classical ciphers.


Frequency analysis is based on the fact that, in any given stretch of written language, certain letters and combinations of letters occur with varying frequencies. Moreover, there is a characteristic distribution of letters that is roughly the same for almost all samples of that language. For instance, in a given section of English text, E tends to be very common, while X is very rare. Likewise, ST, NG, TH, and QU are common pairs of letters (termed bigrams or digraphs), while NZ and QJ are rare. The mnemonic phrase "ETAOIN SHRDLU" encodes the 12 most frequent letters in typical English language text.


In some ciphers, such properties of the natural language plaintext are preserved in the ciphertext, and these patterns can potentially be exploited in a ciphertext-only attack.



Frequency analysis for simple substitution ciphers

In a simple substitution cipher, each letter of the plaintext is replaced with another, and any particular letter in the plaintext will always be transformed into the same letter in the ciphertext. For instance, all e's will turn into X's. A ciphertext message containing lots of X's would suggest to a cryptanalyst that X represented e.
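As an illustration of the property just described, the following minimal sketch builds a simple substitution cipher from a 26-letter key. The function name make_cipher and the key itself are assumptions chosen for illustration; this key is not the one used in the worked example later in the article, and it happens to map e to T rather than X.

```python
import string


def make_cipher(key):
    """Build translation tables for a simple substitution cipher.

    `key` is a hypothetical permutation of the uppercase alphabet:
    key[0] replaces plaintext 'a', key[1] replaces 'b', and so on.
    """
    if sorted(key) != list(string.ascii_uppercase):
        raise ValueError("key must be a permutation of A-Z")
    encrypt_table = str.maketrans(string.ascii_lowercase, key)
    decrypt_table = str.maketrans(key, string.ascii_lowercase)
    return encrypt_table, decrypt_table


# Illustrative key only (the rows of a QWERTY keyboard).
KEY = "QWERTYUIOPASDFGHJKLZXCVBNM"
encrypt_table, decrypt_table = make_cipher(KEY)

ciphertext = "meet me here".translate(encrypt_table)
recovered = ciphertext.translate(decrypt_table)

print(ciphertext)  # DTTZ DT ITKT
print(recovered)   # meet me here
```

Every e in the plaintext becomes the same ciphertext letter (T under this key), so the relative frequency of e survives encryption unchanged.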


The basic use of frequency analysis is to first count the frequency of ciphertext letters and then associate guessed plaintext letters with them. More X's in the ciphertext than anything else suggests that X corresponds to e in the plaintext, but this is not certain; t and a are also very common in English, so X might stand for either of them instead. It is unlikely to be a plaintext z or q, which are less common. Thus the cryptanalyst may need to try several combinations of mappings between ciphertext and plaintext letters.
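The counting step and the resulting shortlist of guesses can be sketched as follows. The names ranked_letters and candidates are hypothetical, and ENGLISH_ORDER is an assumption: its first twelve letters follow the "ETAOIN SHRDLU" mnemonic, while the remainder is one commonly quoted ordering that differs between corpora.

```python
from collections import Counter

# Approximate English letter order, most to least frequent (assumed; the
# tail of the ordering varies between sources and text samples).
ENGLISH_ORDER = "etaoinshrdlucmfwypvbgkqjxz"


def ranked_letters(ciphertext):
    """Return the ciphertext letters ordered from most to least frequent."""
    counts = Counter(ch for ch in ciphertext.upper() if "A" <= ch <= "Z")
    return [letter for letter, _ in counts.most_common()]


def candidates(ciphertext, cipher_letter, spread=2):
    """List plaintext letters whose English frequency rank lies within
    `spread` places of the ciphertext letter's own rank."""
    rank = ranked_letters(ciphertext).index(cipher_letter)
    low = max(0, rank - spread)
    return list(ENGLISH_ORDER[low:rank + spread + 1])


# A toy ciphertext in which X is by far the most common letter.
sample = "XQXXZXQX"
print(candidates(sample, "X"))  # ['e', 't', 'a']
```

Returning a shortlist rather than a single letter reflects the uncertainty described above: the most frequent ciphertext letter is probably e, but t and a remain plausible.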


More complex uses of statistics can be conceived, such as considering counts of pairs of letters (bigrams), triplets (trigrams), and so on. This provides more information to the cryptanalyst. For instance, Q and U nearly always occur together in that order in English, even though Q itself is rare.
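Counting groups of letters is a small extension of counting single letters. The sketch below is one possible approach; the function name ngram_counts and the sample sentence are assumptions made for illustration. Non-letters are discarded first, so groups may span word boundaries, as they do in a ciphertext written without spaces.

```python
from collections import Counter


def ngram_counts(text, n):
    """Count every run of n consecutive letters in text, ignoring non-letters."""
    letters = [ch for ch in text.upper() if "A" <= ch <= "Z"]
    return Counter(
        "".join(letters[i:i + n]) for i in range(len(letters) - n + 1)
    )


sample = "the queen quietly quit the quest"
bigrams = ngram_counts(sample, 2)
singles = ngram_counts(sample, 1)

print(bigrams.most_common(1))        # [('QU', 4)]
print(singles["Q"], bigrams["QU"])   # 4 4
```

In this sample every Q is followed by U, so the count of the bigram QU equals the count of Q itself.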


An example

Suppose Eve has intercepted the cryptogram below, and it is known to be encrypted using a simple substitution cipher:

 LIVITCSWPIYVEWHEVSRIQMXLEYVEOIEWHRXEXIPFEMVEWHKVSTYLXZIXLIKIIXPIJVSZEYPERRGERIM WQLMGLMXQERIWGPSRIHMXQEREKIETXMJTPRGEVEKEITREWHEXXLEXXMZITWAWSQWXSWEXTVEPMRXRSJ GSTVRIEYVIEXCVMUIMWERGMIWXMJMGCSMWXSJOMIQXLIVIQIVIXQSVSTWHKPEGARCSXRWIEVSWIIBXV IZMXFSJXLIKEGAEWHEPSWYSWIWIEVXLISXLIVXLIRGEPIRQIVIIBGIIHMWYPFLEVHEWHYPSRRFQMXLE PPXLIECCIEVEWGISJKTVWMRLIHYSPHXLIQIMYLXSJXLIMWRIGXQEROIVFVIZEVAEKPIEWHXEAMWYEPP XLMWYRMWXSGSWRMHIVEXMSWMGSTPHLEVHPFKPEZINTCMXIVJSVLMRSCMWMSWVIRCIGXMWYMX 

For this example, uppercase letters are used to denote ciphertext, lowercase letters are used to denote plaintext (or guesses at such), and X~t is used to express a guess that ciphertext letter X represents the plaintext letter t.


Eve could use frequency analysis to help solve the message along the following lines: counts of the letters in the cryptogram show that I is the most common single letter, XL is the most common bigram, and XLI is the most common trigram. In the English language, e is the most common letter, th is the most common bigram, and the is the most common trigram. This strongly suggests that X~t, L~h and I~e. The second most common letter in the cryptogram is E; since the first and second most frequent letters in the English language, e and t, are already accounted for, Eve guesses that E~a, the third most frequent letter. Tentatively making these assumptions, the following partially decrypted message is obtained.

 heVeTCSWPeYVaWHaVSReQMthaYVaOeaWHRtatePFaMVaWHKVSTYhtZetheKeetPeJVSZaYPaRRGaReM WQhMGhMtQaReWGPSReHMtQaRaKeaTtMJTPRGaVaKaeTRaWHatthattMZeTWAWSQWtSWatTVaPMRtRSJ GSTVReaYVeatCVMUeMWaRGMeWtMJMGCSMWtSJOMeQtheVeQeVetQSVSTWHKPaGARCStRWeaVSWeeBtV eZMtFSJtheKaGAaWHaPSWYSWeWeaVtheStheVtheRGaPeRQeVeeBGeeHMWYPFhaVHaWHYPSRRFQMtha PPtheaCCeaVaWGeSJKTVWMRheHYSPHtheQeMYhtSJtheMWReGtQaROeVFVeZaVAaKPeaWHtaAMWYaPP thMWYRMWtSGSWRMHeVatMSWMGSTPHhaVHPFKPaZeNTCMteVJSVhMRSCMWMSWVeRCeGtMWYMt 
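The counting and substitution steps Eve performs here can be reproduced with a short sketch. The helper names top_ngram and apply_guesses are hypothetical; the cryptogram is the one given above with the spaces between its rows removed, and the convention of uppercase ciphertext and lowercase guessed plaintext follows the article.

```python
from collections import Counter

# The intercepted cryptogram, joined into one continuous string.
CRYPTOGRAM = (
    "LIVITCSWPIYVEWHEVSRIQMXLEYVEOIEWHRXEXIPFEMVEWHKVSTYLXZIXLIKIIXPIJVSZEYPERRGERIM"
    "WQLMGLMXQERIWGPSRIHMXQEREKIETXMJTPRGEVEKEITREWHEXXLEXXMZITWAWSQWXSWEXTVEPMRXRSJ"
    "GSTVRIEYVIEXCVMUIMWERGMIWXMJMGCSMWXSJOMIQXLIVIQIVIXQSVSTWHKPEGARCSXRWIEVSWIIBXV"
    "IZMXFSJXLIKEGAEWHEPSWYSWIWIEVXLISXLIVXLIRGEPIRQIVIIBGIIHMWYPFLEVHEWHYPSRRFQMXLE"
    "PPXLIECCIEVEWGISJKTVWMRLIHYSPHXLIQIMYLXSJXLIMWRIGXQEROIVFVIZEVAEKPIEWHXEAMWYEPP"
    "XLMWYRMWXSGSWRMHIVEXMSWMGSTPHLEVHPFKPEZINTCMXIVJSVLMRSCMWMSWVIRCIGXMWYMX"
)


def top_ngram(text, n):
    """Return the most frequent run of n consecutive characters in text."""
    counts = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    return counts.most_common(1)[0][0]


def apply_guesses(text, guesses):
    """Substitute guessed letters (lowercase) and leave the rest untouched."""
    return "".join(guesses.get(ch, ch) for ch in text)


# The article reports I, XL and XLI as the most common letter, bigram and trigram.
most_common = [top_ngram(CRYPTOGRAM, n) for n in (1, 2, 3)]
print(most_common)

# Eve's first four guesses: X~t, L~h, I~e, E~a.
guesses = {"X": "t", "L": "h", "I": "e", "E": "a"}
partial = apply_guesses(CRYPTOGRAM, guesses)
print(partial[:33])  # heVeTCSWPeYVaWHaVSReQMthaYVaOeaWH
```

Adding further entries to the guesses dictionary and rerunning apply_guesses gives the later stages of the solution in the same way.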

Using these initial guesses, Eve can spot patterns that confirm her choices, such as "that". Moreover, other patterns suggest further guesses. "Rtate" might be "state", which would mean R~s. Similarly "atthattMZe" could be guessed as "atthattime", yielding M~i and Z~m. Furthermore, "heVe" might be "here", giving V~r. Filling in these guesses, Eve gets:

 hereTCSWPeYraWHarSseQithaYraOeaWHstatePFairaWHKrSTYhtmetheKeetPeJrSmaYPassGasei WQhiGhitQaseWGPSseHitQasaKeaTtiJTPsGaraKaeTsaWHatthattimeTWAWSQWtSWatTraPistsSJ GSTrseaYreatCriUeiWasGieWtiJiGCSiWtSJOieQthereQeretQSrSTWHKPaGAsCStsWearSWeeBtr emitFSJtheKaGAaWHaPSWYSWeWeartheStherthesGaPesQereeBGeeHiWYPFharHaWHYPSssFQitha PPtheaCCearaWGeSJKTrWisheHYSPHtheQeiYhtSJtheiWseGtQasOerFremarAaKPeaWHtaAiWYaPP thiWYsiWtSGSWsiHeratiSWiGSTPHharHPFKPameNTCiterJSrhisSCiWiSWresCeGtiWYit 

In turn, these guesses suggest still others (for example, "remarA" could be "remark", implying A~k) and so on, and it is relatively straightforward to deduce the rest of the letters, eventually yielding the plaintext.


In this example, Eve's guesses were all correct. This would not always be the case, however; the variation in statistics for individual plaintexts can mean that initial guesses are incorrect. It may be necessary to backtrack from incorrect guesses or to analyse the available statistics in much more depth than the somewhat simplified justifications given in the above example.


It is also possible that the plaintext does not exhibit the expected distribution of letter frequencies. Shorter messages are likely to show more variation. It is also possible to construct artificially skewed texts. For example, entire novels have been written that omit the letter "e" altogether, a form of literature known as a lipogram.


History and usage

First page of Al-Kindi's 9th century Manuscript on Deciphering Cryptographic Messages

The first known recorded explanation of frequency analysis (indeed, of any kind of cryptanalysis) was given by the 9th century Arab polymath Abu Yusuf Yaqub ibn Ishaq al-Sabbah Al-Kindi in A Manuscript on Deciphering Cryptographic Messages (Ibraham, 1992). It has been suggested that close textual study of the Qur'an first brought to light that Arabic has a characteristic letter frequency. Use of the technique spread, and by the Renaissance it was so widely used by European states that cryptographers invented several schemes to defeat it. These included:

  • use of homophones, that is, several alternatives to the most common letters in otherwise monoalphabetic substitution ciphers (for example, for English, both X and Y ciphertext might mean plaintext E);
  • polyalphabetic substitution, that is, the use of several alphabets, chosen in assorted, more or less devious, ways (Leone Alberti seems to have been the first to propose this); and
  • polygraphic substitution, that is, schemes where pairs or triplets of plaintext letters are treated as units for substitution, rather than single letters (for example, the Playfair cipher invented by Charles Wheatstone in the mid-1800s).

A disadvantage of all these attempts to defeat frequency counting attacks is that they complicate both enciphering and deciphering, leading to mistakes. Famously, a British Foreign Secretary is said to have rejected the Playfair cipher because, even if schoolboys could cope successfully, as Wheatstone and Playfair had shown, 'our attachés could never learn it!'.


The rotor machines of the first half of the 20th century (for example, the Enigma machine) were essentially immune to straightforward frequency analysis. However, other kinds of analysis ("attacks") successfully decoded messages from some of those machines.


Frequency analysis requires only a basic understanding of the statistics of the plaintext language and some problem-solving skills, and, if performed by hand, some tolerance for extensive letter bookkeeping. During World War II (WWII), both the British and the Americans recruited codebreakers by placing crossword puzzles in major newspapers and running contests to see who could solve them the fastest. Several of the ciphers used by the Axis powers were breakable using frequency analysis (for example, some of the consular ciphers used by the Japanese). Mechanical methods of letter counting and statistical analysis (generally IBM card type machinery) were first used in WWII, possibly by the US Army's SIS. There are lurid tales of midnight expeditions by the cryptographers to machines in another Department. Today, the hard work of letter counting and analysis has been replaced by computer software, which can carry out such analyses in seconds. With modern computing power, classical ciphers are unlikely to provide any real protection for confidential data.


Frequency analysis in fiction

Frequency analysis has been described in fiction. Edgar Allan Poe's The Gold Bug and Arthur Conan Doyle's Sherlock Holmes tale The Adventure of the Dancing Men are examples of stories which describe the use of frequency analysis to attack simple substitution ciphers. The cipher in the Poe story is encrusted with several deception measures, but this is more a literary device than anything significant cryptographically.



References

  • Helen Fouché Gaines, "Cryptanalysis", 1939, Dover. ISBN 0486200973
  • Ibraham A. “Al-Kindi: The origins of cryptology: The Arab contributions”, Cryptologia, 16(2) (April 1992) pp. 97–126.
  • Abraham Sinkov, "Elementary Cryptanalysis : A Mathematical Approach", The Mathematical Association of America, 1966. ISBN 0883856220.

The Wikipedia article included on this page is licensed under the GFDL.
Images may be subject to relevant owners' copyright.
All other elements are (c) copyright NationMaster.com 2003-5. All Rights Reserved.
Usage implies agreement with terms.