Differential entropy

Differential entropy (also referred to as continuous entropy) is a concept in information theory which tries to extend the idea of (Shannon) entropy, a measure of average surprisal of a random variable, to continuous probability distributions.


Definition

Let X be a random variable with a probability density function f whose support is a set $\mathbb{X}$. The differential entropy h(X) or h(f) is defined as

$$h(X) = -\int_{\mathbb{X}} f(x) \log f(x)\,dx.$$

As with its discrete analog, the units of differential entropy depend on the base of the logarithm, which is usually 2 (i.e., the units are bits). See logarithmic units for logarithms taken in different bases. Related concepts such as joint differential entropy, conditional differential entropy, and relative entropy are defined in a similar fashion. One must take care in trying to apply properties of discrete entropy to differential entropy, since probability density functions can be greater than 1, which allows differential entropy to be negative. For example, Uniform(0, 1/2) has density 2 on its support and differential entropy

$$\int_0^{\frac{1}{2}} -2 \log 2 \,dx = -\log 2,$$

that is, -1 bit when the logarithm is taken to base 2.
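The Uniform(0, 1/2) example can be checked with a minimal Python sketch. The helper name uniform_entropy_bits is hypothetical (introduced here for illustration, not taken from any library), and base-2 logarithms are assumed so that the result is in bits.

```python
import math

def uniform_entropy_bits(a, b):
    """Differential entropy of Uniform(a, b) in bits (hypothetical helper)."""
    width = b - a
    density = 1.0 / width  # constant value of the pdf on [a, b]
    # -∫ f(x) log2 f(x) dx over [a, b]; the integrand is constant,
    # so the integral is just width * (-density * log2(density)).
    return -width * density * math.log2(density)

# Intervals narrower than 1 have density > 1 and negative entropy;
# intervals wider than 1 have positive entropy.
for a, b in [(0.0, 0.5), (0.0, 1.0), (0.0, 4.0)]:
    print(f"Uniform({a}, {b}): {uniform_entropy_bits(a, b):+.3f} bits")
```

The sign of the result changes at interval width 1, which is the point where the density crosses 1.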


The definition of differential entropy above can be obtained by partitioning the range of X into bins of length Δ with associated sample points iΔ within the bins, for f Riemann integrable. This gives a quantized version of X, defined by $X_\Delta = i\Delta$ if $i\Delta \leq X < (i+1)\Delta$. Each bin then has probability approximately $f(i\Delta)\Delta$, so the entropy of $X_\Delta$ is

$$H(X_\Delta) = -\sum_i f(i\Delta)\Delta \log f(i\Delta) - \sum_i f(i\Delta)\Delta \log \Delta.$$

The first term approximates the differential entropy, while the second term is approximately $-\log \Delta$, since $\sum_i f(i\Delta)\Delta \approx 1$. Note that this procedure suggests that the differential entropy of a discrete random variable should be $-\infty$.
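The quantization argument can be illustrated numerically. The sketch below is only an illustration under stated assumptions: it uses a standard normal X (a choice made here, not in the text), computes exact bin probabilities from the normal CDF rather than the approximation f(iΔ)Δ, and truncates the range to [-10, 10]. The function names are hypothetical.

```python
import math

def normal_cdf(x):
    """CDF of the standard normal, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def quantized_entropy(delta, lo=-10.0, hi=10.0):
    """Discrete entropy (nats) of a standard normal quantized into bins of width delta."""
    n_bins = int(round((hi - lo) / delta))
    total = 0.0
    for i in range(n_bins):
        left = lo + i * delta
        p = normal_cdf(left + delta) - normal_cdf(left)  # probability of bin i
        if p > 0.0:
            total -= p * math.log(p)
    return total

# Differential entropy of N(0, 1) in nats: (1/2) ln(2*pi*e).
h_exact = 0.5 * math.log(2.0 * math.pi * math.e)

for delta in (0.5, 0.1, 0.01):
    H = quantized_entropy(delta)
    # H grows without bound as delta shrinks, but H + ln(delta) settles near h_exact.
    print(f"delta={delta}: H={H:.4f}, H + ln(delta)={H + math.log(delta):.4f}, h={h_exact:.4f}")
```

As Δ shrinks, the discrete entropy diverges like -log Δ while the corrected quantity H + log Δ approaches the differential entropy.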


Note that the continuous mutual information I(X;Y) has the distinction of retaining its fundamental significance as a measure of discrete information since it is actually the limit of the discrete mutual information of partitions of X and Y as these partitions become finer and finer. Thus it is invariant under quite general transformations of X and Y, and still represents the amount of discrete information that can be transmitted over a channel that admits a continuous space of values.
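The invariance of mutual information can be seen in a simple special case. The following sketch assumes a jointly Gaussian pair (X, Y), for which the entropies have the closed form (1/2) ln[(2πe)^n det K], and writes I(X;Y) = h(X) + h(Y) - h(X,Y). The function names and the chosen numbers are illustrative assumptions.

```python
import math

def gaussian_entropy_1d(var):
    """Differential entropy (nats) of a 1-D Gaussian with variance var."""
    return 0.5 * math.log(2.0 * math.pi * math.e * var)

def gaussian_entropy_2d(var_x, var_y, cov_xy):
    """Differential entropy (nats) of a 2-D Gaussian with the given covariance entries."""
    det = var_x * var_y - cov_xy ** 2
    return 0.5 * math.log((2.0 * math.pi * math.e) ** 2 * det)

def gaussian_mutual_information(var_x, var_y, cov_xy):
    """I(X;Y) = h(X) + h(Y) - h(X,Y) for a jointly Gaussian pair."""
    return (gaussian_entropy_1d(var_x) + gaussian_entropy_1d(var_y)
            - gaussian_entropy_2d(var_x, var_y, cov_xy))

rho = 0.8  # correlation between X and Y (unit variances)
a = 5.0    # rescale X to aX

mi = gaussian_mutual_information(1.0, 1.0, rho)
# aX has variance a^2 and covariance a*rho with Y.
mi_scaled = gaussian_mutual_information(a * a, 1.0, a * rho)

print("h(X)      :", gaussian_entropy_1d(1.0))
print("h(aX)     :", gaussian_entropy_1d(a * a))  # shifted by ln|a|
print("I(X;Y)    :", mi)
print("I(aX;Y)   :", mi_scaled)                   # unchanged by the rescaling
```

Rescaling X shifts h(X) and h(X,Y) by the same amount, so the shifts cancel in the mutual information.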


Properties of differential entropy

  • For two densities f and g, $D(f\|g) \geq 0$ with equality if and only if f = g almost everywhere. Similarly, for two random variables X and Y, $I(X;Y) \geq 0$ and $h(X|Y) \leq h(X)$ with equality if and only if X and Y are independent.
  • The chain rule for differential entropy holds as in the discrete case:
$$h(X_1, \ldots, X_n) = \sum_{i=1}^{n} h(X_i|X_1, \ldots, X_{i-1}) \leq \sum_{i=1}^{n} h(X_i).$$
  • Differential entropy is translation invariant, i.e., h(X + c) = h(X) for a constant c.
  • Differential entropy is in general not invariant under arbitrary invertible maps. In particular, for a nonzero constant a, $h(aX) = h(X) + \log|a|$. For a vector-valued random variable $\mathbf{X}$ and an invertible matrix A, $h(A\mathbf{X}) = h(\mathbf{X}) + \log|\det A|$.
  • If a random vector $\mathbf{X} \in \mathbb{R}^{n}$ has mean zero and covariance matrix K, then $h(\mathbf{X}) \leq \frac{1}{2} \log[(2\pi e)^n \det K]$ with equality if and only if $\mathbf{X}$ is jointly Gaussian.
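Two of the properties above, the scaling rule and the Gaussian bound, can be spot-checked in one dimension using standard closed-form entropies in nats. This is a small sketch rather than a proof; the helper names are hypothetical, and the variance facts used (a uniform of width w has variance w²/12, a Laplace with scale λ has variance 2λ²) are standard results assumed here.

```python
import math

def h_normal(sigma):
    """Entropy (nats) of a normal distribution with standard deviation sigma."""
    return math.log(sigma * math.sqrt(2.0 * math.pi * math.e))

def h_uniform(width):
    """Entropy (nats) of a uniform distribution on an interval of the given width."""
    return math.log(width)

def h_laplace(scale):
    """Entropy (nats) of a Laplace distribution with the given scale."""
    return 1.0 + math.log(2.0 * scale)

# Three distributions, each with variance 1.
unit_variance_entropies = {
    "normal": h_normal(1.0),
    "uniform": h_uniform(math.sqrt(12.0)),       # width w has variance w^2 / 12
    "laplace": h_laplace(1.0 / math.sqrt(2.0)),  # scale s has variance 2 s^2
}
for name, h in unit_variance_entropies.items():
    print(f"{name:8s} h = {h:.4f} nats")

# Scaling by a shifts the entropy by ln|a|, whatever the distribution.
a = 3.0
print("normal shift :", h_normal(a * 1.0) - h_normal(1.0))
print("uniform shift:", h_uniform(a * 2.0) - h_uniform(2.0))
print("ln(a)        :", math.log(a))
```

Among the three unit-variance distributions, the normal has the largest entropy, in line with the Gaussian bound.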


Example: Exponential distribution

Let X be an exponentially distributed random variable with parameter λ, that is, with probability density function

$$f(x) = \lambda e^{-\lambda x} \mbox{ for } x \geq 0.$$

Its differential entropy is then

$$\begin{aligned}
h_e(X) &= -\int_0^\infty \lambda e^{-\lambda x} \log (\lambda e^{-\lambda x})\,dx \\
&= -\left(\int_0^\infty (\log \lambda)\lambda e^{-\lambda x}\,dx + \int_0^\infty (-\lambda x) \lambda e^{-\lambda x}\,dx\right) \\
&= -\log \lambda \int_0^\infty f(x)\,dx + \lambda E[X] \\
&= -\log\lambda + 1.
\end{aligned}$$

Here, $h_e(X)$ was used rather than h(X) to make it explicit that the logarithm was taken to base e, to simplify the calculation.
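The result can be checked by simulation, because the differential entropy is the expectation of -ln f(X). The sketch below is a rough Monte Carlo estimate under assumptions chosen here for illustration (sample size, seed, and function names are all hypothetical); it uses random.expovariate from the Python standard library, whose argument is the rate λ.

```python
import math
import random

def exponential_entropy_closed_form(lam):
    """Closed form from the derivation: 1 - ln(lambda), in nats."""
    return 1.0 - math.log(lam)

def exponential_entropy_monte_carlo(lam, n_samples=200_000, seed=0):
    """Estimate E[-ln f(X)] for X ~ Exponential(rate=lam) by sampling."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    total = 0.0
    for _ in range(n_samples):
        x = rng.expovariate(lam)
        # -ln f(x) = -ln(lam) + lam * x, matching the two integrals in the derivation.
        total += -math.log(lam) + lam * x
    return total / n_samples

for lam in (0.5, 1.0, 2.0):
    exact = exponential_entropy_closed_form(lam)
    estimate = exponential_entropy_monte_carlo(lam)
    print(f"lambda={lam}: closed form={exact:.4f}, Monte Carlo={estimate:.4f}")
```

The estimate carries sampling noise, so agreement is expected only to a couple of decimal places at this sample size.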


Differential entropies for various distributions

In the table below, $\Gamma(x) = \int_0^{\infty} e^{-t} t^{x-1}\,dt$ is the gamma function, $\psi(x) = \frac{d}{dx} \ln\Gamma(x)$ is the digamma function, $B(p,q) = \frac{\Gamma(p)\Gamma(q)}{\Gamma(p+q)}$ is the beta function, and γ is Euler's constant.

Table of differential entropies.

| Distribution name | Probability density function (pdf) | Entropy in nats |
|---|---|---|
| Uniform | $f(x) = \frac{1}{b-a}$ for $a \leq x \leq b$ | $\ln(b - a)$ |
| Normal | $f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$ | $\ln\left(\sigma\sqrt{2\pi e}\right)$ |
| Exponential | $f(x) = \lambda \exp\left(-\lambda x\right)$ | $1 - \ln \lambda$ |
| Rayleigh | $f(x) = \frac{x}{b^2} \exp\left(-\frac{x^2}{2b^2}\right)$ | $1 + \ln \frac{b}{\sqrt{2}} + \frac{\gamma}{2}$ |
| Beta | $f(x) = \frac{x^{p-1}(1-x)^{q-1}}{B(p,q)}$ for $0 \leq x \leq 1$ | $\ln B(p,q) - (p-1)[\psi(p) - \psi(p + q)] - (q-1)[\psi(q) - \psi(p + q)]$ |
| Cauchy | $f(x) = \frac{\lambda}{\pi} \frac{1}{\lambda^2 + x^2}$ | $\ln(4\pi\lambda)$ |
| Chi | $f(x) = \frac{2}{2^{n/2} \sigma^n \Gamma(n/2)} x^{n-1} \exp\left(-\frac{x^2}{2\sigma^2}\right)$ | $\ln\frac{\sigma\Gamma(n/2)}{\sqrt{2}} - \frac{n-1}{2} \psi\left(\frac{n}{2}\right) + \frac{n}{2}$ |
| Chi-squared | $f(x) = \frac{1}{2^{n/2} \sigma^n \Gamma(n/2)} x^{\frac{n}{2} - 1} \exp\left(-\frac{x}{2\sigma^2}\right)$ | $\ln\left(2\sigma^{2}\Gamma\left(\frac{n}{2}\right)\right) + \left(1 - \frac{n}{2}\right)\psi\left(\frac{n}{2}\right) + \frac{n}{2}$ |
| Erlang | $f(x) = \frac{\beta^n}{(n-1)!} x^{n-1} \exp(-\beta x)$ | $(1-n)\psi(n) + \ln \frac{\Gamma(n)}{\beta} + n$ |
| F | $f(x) = \frac{n_1^{\frac{n_1}{2}} n_2^{\frac{n_2}{2}}}{B(\frac{n_1}{2},\frac{n_2}{2})} \frac{x^{\frac{n_1}{2} - 1}}{(n_2 + n_1 x)^{\frac{n_1 + n_2}{2}}}$ | $\ln \frac{n_1}{n_2} B\left(\frac{n_1}{2},\frac{n_2}{2}\right) + \left(1 - \frac{n_1}{2}\right) \psi\left(\frac{n_1}{2}\right) - \left(1 + \frac{n_2}{2}\right)\psi\left(\frac{n_2}{2}\right) + \frac{n_1 + n_2}{2} \psi\left(\frac{n_1 + n_2}{2}\right)$ |
| Gamma | $f(x) = \frac{x^{\alpha - 1} \exp(-\frac{x}{\beta})}{\beta^\alpha \Gamma(\alpha)}$ | $\ln(\beta \Gamma(\alpha)) + (1 - \alpha)\psi(\alpha) + \alpha$ |
| Laplace | $f(x) = \frac{1}{2\lambda} \exp\left(-\frac{\lvert x - \theta \rvert}{\lambda}\right)$ | $1 + \ln(2\lambda)$ |
| Logistic | $f(x) = \frac{e^{-x}}{(1 + e^{-x})^2}$ | $2$ |
| Lognormal | $f(x) = \frac{1}{\sigma x \sqrt{2\pi}} \exp\left(-\frac{(\ln x - m)^2}{2\sigma^2}\right)$ | $m + \frac{1}{2} \ln(2\pi e \sigma^2)$ |
| Maxwell-Boltzmann | $f(x) = 4 \pi^{-\frac{1}{2}} \beta^{\frac{3}{2}} x^{2} \exp(-\beta x^2)$ | $\frac{1}{2} \ln \frac{\pi}{\beta} + \gamma - \frac{1}{2}$ |
| Generalized normal | $f(x) = \frac{2 \beta^{\frac{\alpha}{2}}}{\Gamma(\frac{\alpha}{2})} x^{\alpha - 1} \exp(-\beta x^2)$ | $\ln\frac{\Gamma(\alpha/2)}{2\beta^{\frac{1}{2}}} - \frac{\alpha - 1}{2} \psi\left(\frac{\alpha}{2}\right) + \frac{\alpha}{2}$ |
| Pareto | $f(x) = \frac{a k^a}{x^{a+1}}$ | $\ln \frac{k}{a} + 1 + \frac{1}{a}$ |
| Student's t | $f(x) = \frac{(1 + x^2/n)^{-\frac{n+1}{2}}}{\sqrt{n}B(\frac{1}{2},\frac{n}{2})}$ | $\frac{n+1}{2}\left[\psi\left(\frac{n+1}{2}\right) - \psi\left(\frac{n}{2}\right)\right] + \ln\left(\sqrt{n}\, B\left(\frac{1}{2},\frac{n}{2}\right)\right)$ |
| Triangular | $f(x) = \begin{cases} \frac{2x}{a} & 0 \leq x \leq a \\ \frac{2(1-x)}{1-a} & a \leq x \leq 1 \end{cases}$ | $\frac{1}{2} - \ln 2$ |
| Weibull | $f(x) = \frac{c}{\alpha} x^{c-1} \exp\left(-\frac{x^c}{\alpha}\right)$ | $\frac{(c-1)\gamma}{c} + \ln \frac{\alpha^{1/c}}{c} + 1$ |
| Multivariate normal | $f_X(x_1, \dots, x_N) = \frac{1}{(2\pi)^{N/2} \lvert\Sigma\rvert^{1/2}} \exp\left( -\frac{1}{2} (x - \mu)^\top \Sigma^{-1} (x - \mu) \right)$ | $\frac{1}{2}\ln\{(2\pi e)^{N} \det(\Sigma)\}$ |
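Individual table entries can be spot-checked by numerical integration of -f ln f. The sketch below does this for three rows (logistic, Rayleigh with b = 1, and triangular with a = 0.3) using a plain midpoint rule. The integration limits, step count, parameter values, and function names are assumptions made for this illustration, and the value of Euler's constant is hard-coded.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler's constant, hard-coded

def entropy_nats(pdf, lo, hi, n=200_000):
    """Midpoint-rule approximation of -∫ f(x) ln f(x) dx over [lo, hi]."""
    dx = (hi - lo) / n
    total = 0.0
    for i in range(n):
        fx = pdf(lo + (i + 0.5) * dx)
        if fx > 0.0:  # f ln f tends to 0 as f tends to 0
            total -= fx * math.log(fx) * dx
    return total

def logistic_pdf(x):
    return math.exp(-x) / (1.0 + math.exp(-x)) ** 2

def rayleigh_pdf(x, b=1.0):
    return x / b ** 2 * math.exp(-x * x / (2.0 * b * b))

def triangular_pdf(x, a=0.3):
    return 2.0 * x / a if x <= a else 2.0 * (1.0 - x) / (1.0 - a)

# (numerical estimate, closed form from the table) for each distribution.
checks = {
    "logistic": (entropy_nats(logistic_pdf, -40.0, 40.0), 2.0),
    "rayleigh": (entropy_nats(rayleigh_pdf, 0.0, 12.0),
                 1.0 + math.log(1.0 / math.sqrt(2.0)) + EULER_GAMMA / 2.0),
    "triangular": (entropy_nats(triangular_pdf, 0.0, 1.0), 0.5 - math.log(2.0)),
}
for name, (numeric, closed_form) in checks.items():
    print(f"{name:10s} numeric={numeric:.5f}  table={closed_form:.5f}")
```

Heavy-tailed rows such as Cauchy would need a wider range or a change of variables, so they are left out of this simple check.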

See also

  • Information entropy
  • Information theory
  • Self-information
  • Kullback-Leibler divergence

References

  • Cover, Thomas M., and Joy A. Thomas. Elements of Information Theory. New York: Wiley, 1991. ISBN 0-471-06259-6.
  • Lazo, A., and P. Rathie. "On the entropy of continuous probability distributions." IEEE Transactions on Information Theory, 1978. 24(1): pp. 120-122.

External links

  • Differential entropy on PlanetMath


The Wikipedia article included on this page is licensed under the GFDL.