Fisher information

In statistics and information theory, the Fisher information (denoted \mathcal{I}(\theta)) is the variance of the score. It is named in honor of its inventor, the statistician R.A. Fisher.


Definition

The Fisher information is the amount of information that an observable random variable X carries about an unobservable parameter θ upon which the likelihood function of X, L(θ) = f(X; θ), depends. The likelihood function is the joint probability of the data, the Xs, conditional on the value of θ, as a function of θ. The score is the derivative of the log of the likelihood function with respect to θ. Since the expectation of the score is zero, its variance is simply its second moment. Hence the Fisher information can be written

\mathcal{I}(\theta) = \mathrm{E} \left\{ \left. \left[ \frac{\partial}{\partial\theta} \ln f(X;\theta) \right]^2 \right| \theta \right\},

which implies 0 \leq \mathcal{I}(\theta) < \infty. The Fisher information is thus the expectation of the squared score. A random variable carrying high Fisher information implies that the absolute value of the score is often high.


The Fisher information is not a function of a particular observation, as the random variable X has been averaged out. The concept of information is useful when comparing two methods of observing a given random process.
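As a rough illustration of the definition above, the following sketch evaluates the first two moments of the score for a Poisson variable with rate lam. The Poisson model, the function names and the truncation point k_max are assumptions chosen for the example and do not appear in the article. For this model the score is k/lam - 1, so the sums should give a mean close to 0 and a second moment close to 1/lam.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson variable with rate lam (computed in log space)."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def score(k, lam):
    """d/d(lam) of ln f(k; lam), where ln f = k*ln(lam) - lam - ln(k!)."""
    return k / lam - 1.0

def score_moments(lam, k_max=200):
    """Return (E[score], E[score^2]) by summing over k = 0..k_max.

    The sum is truncated at k_max (an assumption); the neglected tail is
    negligible for small rates such as the one used below.
    """
    mean = sum(score(k, lam) * poisson_pmf(k, lam) for k in range(k_max + 1))
    second = sum(score(k, lam) ** 2 * poisson_pmf(k, lam) for k in range(k_max + 1))
    return mean, second

mean, info = score_moments(lam=3.0)
print(mean, info)  # mean should be close to 0, info close to 1/3
```

Note that the result depends only on lam and not on any observed value, in line with the remark that X has been averaged out.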


If the following regularity condition is met:

\int \frac{\partial^2}{\partial\theta^2} f(x;\theta) \, dx = 0,

then the Fisher information may also be written as:

\mathcal{I}(\theta) = -\mathrm{E} \left[ \frac{\partial^2}{\partial\theta^2} \ln f(X;\theta) \right].

Thus Fisher information is the negative of the expectation of the second derivative of the log of f with respect to θ. Information may thus be seen to be a measure of the "sharpness" of the support curve (the log-likelihood) near the maximum likelihood estimate of θ. A "blunt" support curve (one with a shallow maximum) would have an expected second derivative of small magnitude, and thus low information; while a sharp one would have an expected second derivative of large magnitude and thus high information.
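The "sharpness" reading can be sketched numerically. The example below assumes a normal density with unknown mean mu and known standard deviation sigma (a model chosen for illustration, not taken from the article) and estimates the negative second derivative of the log density with a central finite difference. For this model the curvature is the same for every x, so it equals its own expectation, 1/sigma^2.

```python
import math

def log_density(x, mu, sigma):
    """ln f(x; mu) for a normal density with known standard deviation sigma."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)

def negative_curvature(x, mu, sigma, h=1e-3):
    """Central finite-difference estimate of -d^2/d(mu)^2 ln f(x; mu)."""
    second = (log_density(x, mu + h, sigma)
              - 2 * log_density(x, mu, sigma)
              + log_density(x, mu - h, sigma)) / h ** 2
    return -second

# A small sigma gives a sharply peaked support curve, a large sigma a blunt one.
sharp = negative_curvature(x=1.3, mu=1.0, sigma=0.5)
blunt = negative_curvature(x=1.3, mu=1.0, sigma=2.0)
print(sharp, blunt)  # roughly 1/0.5**2 and 1/2.0**2
```

The step size h is an arbitrary choice; because the log density is quadratic in mu, the finite difference is accurate up to rounding error.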


Information is additive, in that the information yielded by two independent experiments is the sum of the information from each experiment separately:

\mathcal{I}_{X,Y}(\theta) = \mathcal{I}_X(\theta) + \mathcal{I}_Y(\theta).

This result follows from the elementary fact that if random variables are independent, the variance of their sum is the sum of their variances. Hence the information in a random sample of size n is n times that in a sample of size 1 (if observations are independent).
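Additivity can be checked by brute force on a small case. The sketch below assumes n independent Bernoulli trials with success probability theta (an illustrative choice; the function names are hypothetical) and computes the expected squared score of the joint sample by enumerating all 2^n outcomes.

```python
import itertools

def bernoulli_score(x, theta):
    """Score of one Bernoulli trial: d/d(theta) ln[theta^x (1 - theta)^(1 - x)]."""
    return x / theta - (1 - x) / (1 - theta)

def joint_information(n, theta):
    """E[(score of the joint sample)^2], enumerating all 2^n outcomes."""
    total = 0.0
    for outcome in itertools.product((0, 1), repeat=n):
        prob = 1.0
        joint_score = 0.0
        for x in outcome:
            prob *= theta if x == 1 else 1 - theta
            # Independence makes the joint log-likelihood a sum, so scores add.
            joint_score += bernoulli_score(x, theta)
        total += prob * joint_score ** 2
    return total

single = joint_information(1, 0.3)
five = joint_information(5, 0.3)
print(single, five)  # five should be close to 5 * single
```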


The information provided by a sufficient statistic is the same as that of the sample X. This may be seen by using Fisher's factorization criterion for a sufficient statistic. If T(X) is sufficient for θ, then

f(X;\theta) = g(T(X);\theta) \, h(X)

for some functions g and h. See sufficient statistic for a more detailed explanation. The equality of information then follows from the following fact:

\frac{\partial}{\partial\theta} \ln \left[ f(X;\theta) \right] = \frac{\partial}{\partial\theta} \ln \left[ g(T(X);\theta) \right]

which holds because h(X) does not depend on θ, so the term ln h(X) vanishes on differentiation; the two scores are identical, and by the definition of Fisher information so are the two informations. More generally, if T = t(X) is a statistic, then

\mathcal{I}_T(\theta) \leq \mathcal{I}_X(\theta)

with equality if and only if T is a sufficient statistic.
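The inequality can be illustrated on a small discrete case. The sketch below assumes a sample of n independent Bernoulli trials and computes the information carried by a statistic T from its probability mass function p_T(t; theta), using the fact that the expected squared score of a discrete variable is the sum over t of (dp_T/dtheta)^2 / p_T. The function name and the three statistics compared are illustrative choices.

```python
import itertools
from collections import defaultdict

def statistic_information(statistic, n, theta):
    """Fisher information about theta carried by statistic(sample)."""
    pmf = defaultdict(float)    # p_T(t; theta)
    dpmf = defaultdict(float)   # d/d(theta) of p_T(t; theta)
    for outcome in itertools.product((0, 1), repeat=n):
        a = sum(outcome)        # number of successes
        b = n - a               # number of failures
        p = theta ** a * (1 - theta) ** b
        dp = p * (a / theta - b / (1 - theta))
        t = statistic(outcome)
        pmf[t] += p
        dpmf[t] += dp
    return sum(dpmf[t] ** 2 / pmf[t] for t in pmf)

full = statistic_information(lambda x: x, n=3, theta=0.3)      # the whole sample
total = statistic_information(sum, n=3, theta=0.3)             # sufficient statistic
first = statistic_information(lambda x: x[0], n=3, theta=0.3)  # first trial only
print(full, total, first)
```

The number of successes should carry the same information as the full sample, while keeping only the first trial should carry strictly less.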


The Cramér-Rao inequality states that the reciprocal of the Fisher information is a lower bound on the variance of any unbiased estimator of θ.


Single-parameter Bernoulli experiment

A Bernoulli trial is a random variable with two possible outcomes, "success" and "failure", with "success" having a probability of θ. The outcome can be thought of as determined by a coin toss, with the probability of obtaining a "head" being θ and the probability of obtaining a "tail" being 1 - θ.


The Fisher information contained in n independent Bernoulli trials may be calculated as follows. In the following, A represents the number of successes, B the number of failures, and n = A + B is the total number of trials.

\mathcal{I}(\theta) = -\mathrm{E} \left[ \frac{\partial^2}{\partial\theta^2} \ln(f(A;\theta)) \right] \qquad (1)
= -\mathrm{E} \left[ \frac{\partial^2}{\partial\theta^2} \ln \left[ \theta^A (1-\theta)^B \frac{(A+B)!}{A!B!} \right] \right] \qquad (2)
= -\mathrm{E} \left[ \frac{\partial^2}{\partial\theta^2} \left[ A \ln(\theta) + B \ln(1-\theta) \right] \right] \qquad (3)
= -\mathrm{E} \left[ \frac{\partial}{\partial\theta} \left[ \frac{A}{\theta} - \frac{B}{1-\theta} \right] \right] \quad \text{(on differentiating } \ln x \text{)} \qquad (4)
= +\mathrm{E} \left[ \frac{A}{\theta^2} + \frac{B}{(1-\theta)^2} \right] \qquad (5)
= \frac{n\theta}{\theta^2} + \frac{n(1-\theta)}{(1-\theta)^2} \quad \text{(as the expected value of } A \text{ is } n\theta \text{, etc.)} \qquad (6)
= \frac{n}{\theta(1-\theta)} \qquad (7)

(1) defines Fisher information. (2) invokes the fact that the information in a sufficient statistic is the same as that of the sample itself. (3) expands the log term and drops a constant. (4) and (5) differentiate with respect to θ. (6) replaces A and B with their expectations. (7) is algebra.
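The derivation can be cross-checked numerically. The sketch below takes the expectation in step (5) exactly, by summing over every possible number of successes A under the binomial distribution, and compares it with the closed form of step (7). The function name and the values of n and theta are arbitrary choices for the example.

```python
import math

def binomial_information(n, theta):
    """E[A/theta^2 + B/(1 - theta)^2] for A ~ Binomial(n, theta), B = n - A."""
    total = 0.0
    for a in range(n + 1):
        b = n - a
        prob = math.comb(n, a) * theta ** a * (1 - theta) ** b
        total += prob * (a / theta ** 2 + b / (1 - theta) ** 2)
    return total

n, theta = 10, 0.3
step5 = binomial_information(n, theta)
step7 = n / (theta * (1 - theta))
print(step5, step7)  # the two values should agree up to rounding
```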


The end result, namely,

\mathcal{I}(\theta) = \frac{n}{\theta(1-\theta)},

is the reciprocal of the variance of the mean number of successes (the sample proportion A/n) in n Bernoulli trials, as expected (see last sentence of the preceding section).
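This reciprocal relationship can be verified directly. The sketch below computes the exact variance of the sample proportion A/n from the binomial probabilities and multiplies it by the information n/(θ(1-θ)); the product should be close to 1. The helper name and parameter values are assumptions made for the example.

```python
import math

def proportion_variance(n, theta):
    """Exact variance of A/n for A ~ Binomial(n, theta)."""
    probs = [math.comb(n, a) * theta ** a * (1 - theta) ** (n - a)
             for a in range(n + 1)]
    mean = sum(p * a / n for a, p in enumerate(probs))
    return sum(p * (a / n - mean) ** 2 for a, p in enumerate(probs))

n, theta = 10, 0.3
variance = proportion_variance(n, theta)
information = n / (theta * (1 - theta))
print(variance * information)  # should be close to 1
```

Since A/n is an unbiased estimator of θ, this is a case in which the Cramér-Rao bound is attained.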


Matrix form

When there are N parameters, so that θ is an N×1 vector \theta = \begin{bmatrix} \theta_1, \theta_2, \cdots, \theta_N \end{bmatrix}, then the Fisher information takes the form of an N×N matrix, the Fisher information matrix (FIM), with typical element:

{\left( \mathcal{I}(\theta) \right)}_{i,j} = \mathrm{E} \left[ \frac{\partial}{\partial\theta_i} \ln f(X;\theta) \, \frac{\partial}{\partial\theta_j} \ln f(X;\theta) \right].

The FIM is an N×N positive semidefinite symmetric matrix. When it is positive definite, it defines a metric on the N-dimensional parameter space. Exploring this topic requires differential geometry.
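As a small sketch of the matrix form, the example below assumes a univariate normal density with two parameters, the mean mu and the standard deviation sigma (a model chosen for illustration), and approximates each element E[score_i * score_j] with a midpoint rule. The integration range and step count are arbitrary assumptions. For this model the result should be close to a diagonal matrix with entries 1/sigma^2 and 2/sigma^2.

```python
import math

def normal_pdf(x, mu, sigma):
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def score_vector(x, mu, sigma):
    """Partial derivatives of ln f(x; mu, sigma) with respect to mu and sigma."""
    d_mu = (x - mu) / sigma ** 2
    d_sigma = -1.0 / sigma + (x - mu) ** 2 / sigma ** 3
    return (d_mu, d_sigma)

def fisher_matrix(mu, sigma, half_width=12.0, steps=20000):
    """Midpoint-rule estimate of E[score_i * score_j] on mu +/- half_width * sigma."""
    lo = mu - half_width * sigma
    dx = 2 * half_width * sigma / steps
    fim = [[0.0, 0.0], [0.0, 0.0]]
    for k in range(steps):
        x = lo + (k + 0.5) * dx
        s = score_vector(x, mu, sigma)
        w = normal_pdf(x, mu, sigma) * dx   # probability mass of this cell
        for i in range(2):
            for j in range(2):
                fim[i][j] += w * s[i] * s[j]
    return fim

fim = fisher_matrix(mu=1.0, sigma=2.0)
print(fim)  # expected to be close to [[0.25, 0], [0, 0.5]]
```

The off-diagonal elements agree with each other up to rounding, reflecting the symmetry of the matrix.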


Multivariate normal distribution

The FIM for an N-variate multivariate normal distribution has a special form. Let \mu(\theta) = \begin{bmatrix} \mu_1(\theta), \mu_2(\theta), \cdots, \mu_N(\theta) \end{bmatrix} be the mean vector, and let \Sigma(\theta) be the corresponding covariance matrix. Then the typical element \mathcal{I}_{m,n}, 0 \leq m, n < N, of the FIM for X \sim N(\mu(\theta), \Sigma(\theta)) is:

\mathcal{I}_{m,n} = \frac{\partial\mu}{\partial\theta_m} \Sigma^{-1} \frac{\partial\mu^\top}{\partial\theta_n} + \frac{1}{2} \mathrm{tr} \left( \Sigma^{-1} \frac{\partial\Sigma}{\partial\theta_m} \Sigma^{-1} \frac{\partial\Sigma}{\partial\theta_n} \right),

where (\cdot)^\top denotes the transpose of a vector, \mathrm{tr}(\cdot) denotes the trace of a square matrix, and:

  • \frac{\partial\mu}{\partial\theta_m} = \begin{bmatrix} \frac{\partial\mu_1}{\partial\theta_m} & \frac{\partial\mu_2}{\partial\theta_m} & \cdots & \frac{\partial\mu_N}{\partial\theta_m} \end{bmatrix};
  • \frac{\partial\Sigma}{\partial\theta_m} = \begin{bmatrix} \frac{\partial\Sigma_{1,1}}{\partial\theta_m} & \frac{\partial\Sigma_{1,2}}{\partial\theta_m} & \cdots & \frac{\partial\Sigma_{1,N}}{\partial\theta_m} \\ \frac{\partial\Sigma_{2,1}}{\partial\theta_m} & \frac{\partial\Sigma_{2,2}}{\partial\theta_m} & \cdots & \frac{\partial\Sigma_{2,N}}{\partial\theta_m} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial\Sigma_{N,1}}{\partial\theta_m} & \frac{\partial\Sigma_{N,2}}{\partial\theta_m} & \cdots & \frac{\partial\Sigma_{N,N}}{\partial\theta_m} \end{bmatrix}.
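The formula above can be sketched for a small case in plain Python. The example assumes a bivariate normal whose two components share one mean m and one variance s, so that the mean vector is (m, m), the covariance matrix is s times the identity, and the parameters are (m, s). This model, the helper names and the restriction to 2×2 matrices are assumptions made for the illustration. Under these assumptions the formula should give a diagonal FIM with entries 2/s and 1/s^2.

```python
def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def inverse_2x2(m):
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]

def trace(m):
    return sum(m[i][i] for i in range(len(m)))

def fim_element(dmu_m, dmu_n, sigma, dsigma_m, dsigma_n):
    """One FIM element: the mean term plus half the trace term."""
    sigma_inv = inverse_2x2(sigma)
    row = matmul([dmu_m], sigma_inv)        # 1x2 row vector times inverse
    col = [[v] for v in dmu_n]              # transpose of the row vector dmu_n
    mean_term = matmul(row, col)[0][0]
    cov_term = 0.5 * trace(matmul(matmul(sigma_inv, dsigma_m),
                                  matmul(sigma_inv, dsigma_n)))
    return mean_term + cov_term

s = 4.0
sigma = [[s, 0.0], [0.0, s]]
dmu = [[1.0, 1.0], [0.0, 0.0]]              # d(mu)/dm and d(mu)/ds
dsigma = [[[0.0, 0.0], [0.0, 0.0]],         # d(Sigma)/dm
          [[1.0, 0.0], [0.0, 1.0]]]         # d(Sigma)/ds
fim = [[fim_element(dmu[m], dmu[n], sigma, dsigma[m], dsigma[n])
        for n in range(2)] for m in range(2)]
print(fim)  # diagonal entries should be close to 2/s and 1/s**2
```

This agrees with additivity: two independent normal observations carry twice the single-observation information 1/s about the mean and 1/(2 s^2) about the variance.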

The Wikipedia article included on this page is licensed under the GFDL.