Mutual information

In probability theory and, in particular, information theory, the mutual information, or transinformation, of two random variables is a quantity that measures the mutual dependence of the two variables. The most common unit of measurement of mutual information is the bit, in which case the logarithms below should be taken to the base 2.


Intuitively, mutual information measures the information that X and Y share. If X and Y are independent, then X contains no information about Y and vice versa, so their mutual information is zero. If X and Y are identical, then all information conveyed by X is shared with Y: knowing X leaves nothing further to learn about Y, and vice versa. The mutual information is then the same as the information conveyed by X (or Y) alone, namely the entropy of X. In a specific sense (see below), mutual information quantifies the distance between the joint distribution of X and Y and the product of their marginal distributions.


Formally, the mutual information of two discrete random variables X and Y can be defined as:

I(X;Y) = \sum_{y \in Y} \sum_{x \in X} p(x,y) \log \frac{p(x,y)}{f(x)\,g(y)},

where p is the joint probability distribution function of X and Y, and f and g are the marginal probability distribution functions of X and Y respectively.
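
The discrete definition can be evaluated directly from a table of joint probabilities. The following is a minimal sketch, not a reference implementation: the function name mutual_information, the base parameter, and the example table are illustrative assumptions, and terms with p(x,y) = 0 are skipped under the usual convention that 0 log 0 = 0.

```python
import math

def mutual_information(joint, base=2.0):
    """Mutual information of a discrete joint distribution.

    joint[i][j] is p(x_i, y_j); entries are assumed nonnegative and sum to 1.
    """
    f = [sum(row) for row in joint]          # marginal of X (row sums)
    g = [sum(col) for col in zip(*joint)]    # marginal of Y (column sums)
    total = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0.0:                      # 0 * log 0 is taken to be 0
                total += p * math.log(p / (f[i] * g[j]), base)
    return total

# Hypothetical joint distribution of two binary variables (not from the article).
joint = [[0.4, 0.1],
         [0.1, 0.4]]
mi_bits = mutual_information(joint)
print(mi_bits)  # about 0.278 bits
```

With base 2 the result is in bits; passing base=math.e would give the same quantity in nats.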


In the continuous case, we replace summation by a definite double integral:

I(X;Y) = \int_Y \int_X p(x,y) \log \frac{p(x,y)}{f(x)\,g(y)} \; dx \, dy,

where p is now the joint probability density function of X and Y, and f and g are the marginal probability density functions of X and Y respectively.
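
As an illustration of the continuous definition, the double integral can be approximated numerically. The sketch below assumes a pair of standard normal variables with correlation coefficient 0.5, an example chosen here and not taken from the article. For such a bivariate Gaussian the mutual information is known in closed form as -(1/2) ln(1 - ρ²) nats, which gives a value to compare against. The grid limits and resolution are arbitrary choices.

```python
import math

RHO = 0.5  # assumed correlation coefficient of the illustrative Gaussian pair

def marginal(t):
    """Standard normal density, used for both f and g."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def joint(x, y, rho=RHO):
    """Bivariate standard normal density with correlation rho."""
    norm = 2.0 * math.pi * math.sqrt(1.0 - rho * rho)
    quad = (x * x - 2.0 * rho * x * y + y * y) / (1.0 - rho * rho)
    return math.exp(-0.5 * quad) / norm

def mi_numeric(lo=-6.0, hi=6.0, n=240):
    """Midpoint-rule approximation of the double integral, in nats."""
    h = (hi - lo) / n
    pts = [lo + (k + 0.5) * h for k in range(n)]
    dens = [marginal(t) for t in pts]        # marginal density on the grid
    total = 0.0
    for x, fx in zip(pts, dens):
        for y, gy in zip(pts, dens):
            p = joint(x, y)
            if p > 0.0:
                total += p * math.log(p / (fx * gy))
    return total * h * h

closed_form = -0.5 * math.log(1.0 - RHO * RHO)
approx = mi_numeric()
print(approx, closed_form)
```

The two printed values should agree closely; the small remaining difference comes from truncating the integration range and from the finite grid.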


Mutual information is a measure of independence in the following sense: I(X;Y) = 0 if and only if X and Y are independent random variables. This is easy to see in one direction: if X and Y are independent, then p(x,y) = f(x) × g(y), and therefore:

\log \frac{p(x,y)}{f(x)\,g(y)} = \log 1 = 0.

Moreover, mutual information is nonnegative (i.e. I(X;Y) ≥ 0; see below) and symmetric (i.e. I(X;Y) = I(Y;X)).
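
These properties can be checked numerically on small examples. The sketch below uses hypothetical probability tables chosen for illustration: one built as a product of marginals (independent), and one that is not. Swapping the roles of X and Y corresponds to transposing the table.

```python
import math

def mutual_information(joint):
    """Mutual information in bits; joint[i][j] is p(x_i, y_j)."""
    f = [sum(row) for row in joint]          # marginal of X
    g = [sum(col) for col in zip(*joint)]    # marginal of Y
    return sum(p * math.log2(p / (f[i] * g[j]))
               for i, row in enumerate(joint)
               for j, p in enumerate(row) if p > 0.0)

# Independent case: the joint table is the product of assumed marginals.
f = [0.3, 0.7]
g = [0.2, 0.5, 0.3]
independent = [[fx * gy for gy in g] for fx in f]

# Dependent case: an assumed table that does not factorize.
dependent = [[0.10, 0.25, 0.05],
             [0.10, 0.25, 0.25]]
swapped = [list(col) for col in zip(*dependent)]  # table of (Y, X)

mi_independent = mutual_information(independent)  # zero up to rounding
mi_dependent = mutual_information(dependent)      # strictly positive
mi_swapped = mutual_information(swapped)          # equals mi_dependent
print(mi_independent, mi_dependent, mi_swapped)
```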


Several generalizations of mutual information to more than two random variables have been proposed, but a widely agreed on definition has not yet emerged.


Relation to other quantities

Mutual information can be equivalently expressed as

I(X;Y) = H(X) - H(X|Y)
       = H(Y) - H(Y|X)
       = H(X) + H(Y) - H(X,Y),

where H(X) and H(Y) are the entropies, H(X|Y) and H(Y|X) are the conditional entropies, and H(X,Y) is the joint entropy of X and Y. Since H(X) ≥ H(X|Y), this characterization is consistent with the nonnegativity property stated above.
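
The three entropy expressions can be compared with the defining sum on a small example. In this sketch the joint table is a hypothetical choice, and the conditional entropies are computed directly as probability-weighted averages of the entropies of the conditional distributions, so that the comparison is not circular.

```python
import math

def entropy(probs):
    """Shannon entropy in bits of a list of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0.0)

# Hypothetical joint table: joint[i][j] = p(x_i, y_j).
joint = [[0.30, 0.10],
         [0.05, 0.55]]
p_x = [sum(row) for row in joint]
p_y = [sum(col) for col in zip(*joint)]

h_x, h_y = entropy(p_x), entropy(p_y)
h_xy = entropy([p for row in joint for p in row])   # joint entropy H(X,Y)

# H(X|Y) = sum_y p(y) H(X | Y = y), and likewise for H(Y|X).
h_x_given_y = sum(py * entropy([joint[i][j] / py for i in range(len(p_x))])
                  for j, py in enumerate(p_y))
h_y_given_x = sum(px * entropy([p / px for p in joint[i]])
                  for i, px in enumerate(p_x))

# Mutual information from the defining double sum.
mi_direct = sum(p * math.log2(p / (p_x[i] * p_y[j]))
                for i, row in enumerate(joint)
                for j, p in enumerate(row) if p > 0.0)

forms = [h_x - h_x_given_y, h_y - h_y_given_x, h_x + h_y - h_xy]
print(mi_direct, forms)
```

All four printed numbers should coincide up to floating-point rounding.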


Note that H(X|X) = 0 and therefore H(X) = I(X;X). This is the reason why entropy is often called self-information. Since H(X|Y) ≥ 0, it follows that I(X;X) ≥ I(X;Y), and one can formulate the basic principle that a variable contains at least as much information about itself as any other variable can provide.
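
A small numerical check of this statement, under assumed example distributions: the joint distribution of the pair (X, X) puts all of its mass on the diagonal, and a second variable Y is taken to be a noisy copy of X with the same marginal for X.

```python
import math

def mutual_information(joint):
    """Mutual information in bits; joint[i][j] is p(x_i, y_j)."""
    f = [sum(row) for row in joint]
    g = [sum(col) for col in zip(*joint)]
    return sum(p * math.log2(p / (f[i] * g[j]))
               for i, row in enumerate(joint)
               for j, p in enumerate(row) if p > 0.0)

p_x = [0.5, 0.25, 0.25]                       # assumed distribution of X
h_x = -sum(p * math.log2(p) for p in p_x)     # entropy H(X) = 1.5 bits

# Joint table of (X, X): probability p(x) on the diagonal, zero elsewhere.
joint_xx = [[p if i == j else 0.0 for j in range(len(p_x))]
            for i, p in enumerate(p_x)]

# Hypothetical noisy copy Y of X; the row sums reproduce p_x.
joint_xy = [[0.40, 0.05, 0.05],
            [0.05, 0.15, 0.05],
            [0.05, 0.05, 0.15]]

i_xx = mutual_information(joint_xx)           # equals H(X)
i_xy = mutual_information(joint_xy)           # smaller than H(X)
print(h_x, i_xx, i_xy)
```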


Mutual information can also be expressed as the Kullback-Leibler divergence of the product p(x) × p(y) of the marginal distributions of the two random variables X and Y from their joint distribution p(x,y), where p(x) and p(y) denote the marginals written f(x) and g(y) above:

I(X;Y) = D_{\mathrm{KL}}(p(x,y) \,\|\, p(x)\,p(y)).

Furthermore, let p(x|y) = p(x, y) / p(y). Then

I(X;Y) = \sum_y p(y) \sum_x p(x|y) \log_2 \frac{p(x|y)}{p(x)}
       = \sum_y p(y) \, D_{\mathrm{KL}}(p(x|y) \,\|\, p(x))
       = \mathbb{E}_Y\{D_{\mathrm{KL}}(p(x|y) \,\|\, p(x))\}.

Mutual information can thus also be understood as the expectation, taken over Y, of the Kullback-Leibler divergence of the univariate distribution p(x) of X from the conditional distribution p(x|y) of X given Y: the more the distributions p(x|y) and p(x) differ, the greater the information gain.
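
Both Kullback-Leibler formulations can be compared numerically. The following sketch assumes a small hypothetical joint table and a helper named kl_divergence defined here for illustration; it computes the divergence between the joint distribution and the product of the marginals, and separately the average over Y of the divergence between p(x|y) and p(x).

```python
import math

def kl_divergence(p, q):
    """D_KL(p || q) in bits for two discrete distributions on the same support."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

# Hypothetical joint table: joint[i][j] = p(x_i, y_j).
joint = [[0.30, 0.10],
         [0.05, 0.55]]
p_x = [sum(row) for row in joint]
p_y = [sum(col) for col in zip(*joint)]

# Divergence between the joint distribution and the product of the marginals.
flat_joint = [p for row in joint for p in row]
flat_product = [px * py for px in p_x for py in p_y]
mi_joint = kl_divergence(flat_joint, flat_product)

# Expectation over Y of the divergence between p(x|y) and p(x).
mi_expected = 0.0
for j, py in enumerate(p_y):
    conditional = [joint[i][j] / py for i in range(len(p_x))]  # p(x | y_j)
    mi_expected += py * kl_divergence(conditional, p_x)

print(mi_joint, mi_expected)
```

The two values should agree up to rounding, since the second expression is only a regrouping of the terms of the first.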


Applications of mutual information

In many applications, one wants to maximize mutual information (thus increasing dependencies), which is often equivalent to minimizing conditional entropy. Examples include:

  • Discriminative training procedures for hidden Markov models have been proposed based on the maximum mutual information (MMI) criterion.
  • Mutual information is used in medical imaging for image registration. Given a reference image (for example, a brain scan) and a second image which needs to be put into the same coordinate system as the reference image, the second image is deformed until the mutual information between it and the reference image is maximized.
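
The registration idea can be illustrated with a deliberately simplified toy. This is a sketch under strong assumptions and not a description of any actual registration software: the "images" are short one-dimensional lists of discrete intensities, the only allowed deformation is a circular shift, and mutual information is estimated from the joint histogram of paired samples. The names best_shift, reference, moving and remap are hypothetical.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Histogram (plug-in) estimate in bits from paired discrete samples."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    cx, cy = Counter(xs), Counter(ys)
    # p(x,y) / (p(x) p(y)) = c * n / (cx * cy) in terms of counts
    return sum((c / n) * math.log2(c * n / (cx[x] * cy[y]))
               for (x, y), c in joint.items())

def best_shift(reference, moving, max_shift):
    """Circular shift of `moving` that maximizes MI with `reference`."""
    n = len(moving)

    def shifted(s):
        return [moving[(k - s) % n] for k in range(n)]

    return max(range(-max_shift, max_shift + 1),
               key=lambda s: mutual_information(reference, shifted(s)))

# The moving signal is the reference displaced by 3 samples, with its
# intensities remapped, as can happen between different imaging modalities.
reference = [0, 0, 1, 3, 2, 2, 1, 0, 3, 3, 1, 2, 0, 1, 2, 3]
remap = {0: 7, 1: 5, 2: 9, 3: 4}
moving = [remap[reference[(k + 3) % len(reference)]]
          for k in range(len(reference))]

shift = best_shift(reference, moving, max_shift=5)
print(shift)  # 3
```

The intensity remapping shows why mutual information suits this task: the criterion depends only on how predictable one signal is from the other, not on the intensity values being equal.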



The Wikipedia article included on this page is licensed under the GFDL.