Entropy encoding

In information theory, an entropy encoding is a data compression scheme that assigns codes to symbols so that code lengths match the probabilities of the symbols. Typically, entropy encoders compress data by replacing symbols represented by equal-length codes with variable-length codewords whose length is approximately proportional to the negative logarithm of the symbol's probability. The most common symbols therefore receive the shortest codes.


According to Shannon's source coding theorem, the optimal code length for a symbol is −log_b P, where b is the number of symbols used to make output codes (b = 2 for binary output) and P is the probability of the input symbol.
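As a rough illustration of the formula above, the following Python sketch computes the ideal length −log_b P for each symbol and the resulting average length (the entropy) of a distribution. The function names and the example distribution are assumptions made for this illustration and do not come from the article.

```python
import math

def optimal_code_lengths(probs, base=2):
    """Return the ideal length -log_b(P) for every symbol with P > 0."""
    return {s: -math.log(p) / math.log(base) for s, p in probs.items() if p > 0}

def entropy(probs, base=2):
    """Average ideal code length, weighted by symbol probability."""
    lengths = optimal_code_lengths(probs, base)
    return sum(probs[s] * lengths[s] for s in lengths)

# Hypothetical four-symbol source, chosen so the lengths come out as integers.
example = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}

if __name__ == "__main__":
    for symbol, length in sorted(optimal_code_lengths(example).items()):
        print(symbol, round(length, 3))
    print("average bits per symbol:", round(entropy(example), 3))
```

With this distribution a fixed-length binary code would need 2 bits per symbol, while the ideal lengths average to less than that because the most probable symbol gets the shortest code.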


Two of the most common entropy encoding techniques are Huffman coding and arithmetic coding. If the approximate entropy characteristics of a data stream are known in advance (especially for signal compression), a simpler static code such as unary coding, Elias gamma coding, Fibonacci coding, Golomb coding, or Rice coding may be useful.
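To make the first of these techniques concrete, here is a minimal Python sketch of Huffman coding. It is one possible way to build the code, not a reference implementation: the helper names (huffman_code, encode, decode) are assumptions made for this illustration, and ties between equal weights are broken arbitrarily, so the exact codewords may differ from those of other implementations while the total encoded length stays the same.

```python
import heapq
from collections import Counter

def huffman_code(freqs):
    """Build a binary prefix code from a {symbol: weight} mapping."""
    if not freqs:
        return {}
    if len(freqs) == 1:
        # A single symbol still needs a one-bit codeword.
        return {next(iter(freqs)): "0"}
    # Heap entries: (weight, tie_breaker, {symbol: partial codeword}).
    # The tie_breaker keeps Python from ever comparing the dicts.
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(sorted(freqs.items()))]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        # Merge the two lightest subtrees, prefixing their codewords.
        w0, _, c0 = heapq.heappop(heap)
        w1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in c0.items()}
        merged.update({s: "1" + code for s, code in c1.items()})
        heapq.heappush(heap, (w0 + w1, next_id, merged))
        next_id += 1
    return heap[0][2]

def encode(text, code):
    """Concatenate the codewords of every symbol in text."""
    return "".join(code[s] for s in text)

def decode(bits, code):
    """Read bits left to right; a prefix code needs no separators."""
    inverse = {v: k for k, v in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return "".join(out)

if __name__ == "__main__":
    message = "abracadabra"
    code = huffman_code(Counter(message))
    bits = encode(message, code)
    print(code)
    print(len(bits), "bits instead of", 8 * len(message))
    print(decode(bits, code) == message)
```

The most frequent symbol in the sample message ends up with the shortest codeword, which is the behaviour described at the start of the article.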


Entropy as a measure of similarity



Besides compressing (and losslessly recovering) digital data, an entropy encoder can also be used to measure how similar streams of data are. An entropy coder/compressor is built for each class of data; unknown data is then classified by feeding it, uncompressed, to each compressor and seeing which one achieves the highest compression. The coder that compresses best is probably the one trained on the data most similar to the unknown data.
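The classification procedure described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions: instead of running a real compressor, it uses an order-0 byte-frequency model per class with add-one smoothing and computes the number of bits an ideal entropy coder driven by that model would spend. The function names, the smoothing choice, and the training samples are all hypothetical.

```python
import math
from collections import Counter

def train_model(sample, alphabet_size=256):
    """Order-0 byte model with add-one smoothing (an assumption of this sketch)."""
    counts = Counter(sample)
    total = len(sample) + alphabet_size
    return lambda byte: (counts[byte] + 1) / total

def ideal_code_length(data, model):
    """Bits an ideal entropy coder using this model would spend on data."""
    return sum(-math.log2(model(b)) for b in data)

def classify(data, models):
    """Pick the class whose model yields the shortest encoding of data."""
    return min(models, key=lambda name: ideal_code_length(data, models[name]))

# Hypothetical training data for two classes.
models = {
    "text": train_model(b"the quick brown fox jumps over the lazy dog " * 20),
    "digits": train_model(b"3141592653589793238462643383279 " * 20),
}

if __name__ == "__main__":
    for unknown in (b"hello world", b"2718281828"):
        print(unknown, "->", classify(unknown, models))
```

A shorter ideal code length corresponds to higher compression, so choosing the minimum here plays the role of choosing the compressor that compresses best.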



External links

  • On-line textbook: Information Theory, Inference, and Learning Algorithms, by David MacKay. Gives an accessible introduction to Shannon theory and data compression, including Huffman coding and arithmetic coding.
  • Spam Filtering using Statistical Data Compression Models by Andrej Bratko, Gordon V. Cormack, Bogdan Filipic, Thomas R. Lynam and Blaz Zupan, Journal of Machine Learning Research, Vol 7(Dec), 2006.
  • Anatomy of Range Encoder

An earlier (open content) version of the above article was posted on PlanetMath.


The Wikipedia article included on this page is licensed under the GFDL.