|
In statistics and information theory, the Fisher information (denoted $\mathcal{I}(\theta)$) is the variance of the score. It is named in honor of its inventor, the statistician R.A. Fisher.
Definition
The Fisher information is the amount of information that an observable random variable X carries about an unobservable parameter θ upon which the likelihood function of X, L(θ) = f(X; θ), depends. The likelihood function is the joint probability of the data, the Xs, conditional on the value of θ, regarded as a function of θ. The score is the derivative of the log of the likelihood function with respect to θ. Since the expectation of the score is zero, its variance is simply its second moment. Hence the Fisher information can be written
$$\mathcal{I}(\theta) = \mathrm{E}\left\{\left.\left[\frac{\partial}{\partial\theta}\ln f(X;\theta)\right]^2\,\right|\,\theta\right\},$$
which implies $\mathcal{I}(\theta) \ge 0$. The Fisher information is thus the expectation of the squared score. A random variable carrying high Fisher information implies that the absolute value of the score is often high. The Fisher information is not a function of a particular observation, as the random variable X has been averaged out. The concept of information is useful when comparing two methods of observing a given random process.
If the following regularity condition is met:
$$\int \frac{\partial^2}{\partial\theta^2} f(x;\theta)\,dx = 0,$$
then the Fisher information may also be written as:
$$\mathcal{I}(\theta) = -\mathrm{E}\left[\frac{\partial^2}{\partial\theta^2}\ln f(X;\theta)\right].$$
Thus Fisher information is the negative of the expectation of the second derivative of the log of f with respect to θ. Information may thus be seen as a measure of the "sharpness" of the support curve near the maximum likelihood estimate of θ. A "blunt" support curve (one with a shallow maximum) has a second derivative of small expected magnitude, and thus low information; a sharp one has a second derivative of large expected magnitude, and thus high information.
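The agreement between the squared-score form and the second-derivative form can be checked numerically. The sketch below is only an illustration under an assumption not made in the article: it takes X to be Poisson with rate `lam`, for which the score is x/λ - 1 and the second derivative of the log-likelihood is -x/λ². The function names and the truncation point `x_max` are hypothetical choices.

```python
import math

def poisson_pmf(x, lam):
    # f(x; lam) = lam^x e^{-lam} / x!, evaluated in log space for stability
    return math.exp(x * math.log(lam) - lam - math.lgamma(x + 1))

def score(x, lam):
    # d/d(lam) of ln f(x; lam)
    return x / lam - 1.0

def second_derivative(x, lam):
    # d^2/d(lam)^2 of ln f(x; lam)
    return -x / lam**2

def fisher_information_both_forms(lam, x_max=200):
    # Expectations are sums over the support, truncated at x_max (assumed
    # large enough that the neglected tail mass is negligible).
    support = range(x_max + 1)
    squared_score_form = sum(score(x, lam) ** 2 * poisson_pmf(x, lam) for x in support)
    curvature_form = -sum(second_derivative(x, lam) * poisson_pmf(x, lam) for x in support)
    return squared_score_form, curvature_form

lam = 3.0
squared_score_form, curvature_form = fisher_information_both_forms(lam)
print(squared_score_form, curvature_form, 1.0 / lam)
```

Both forms should agree with each other and with the known Poisson value 1/λ up to floating-point error.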
Information is additive, in that the information yielded by two independent experiments is the sum of the information from each experiment separately:
$$\mathcal{I}_{X,Y}(\theta) = \mathcal{I}_X(\theta) + \mathcal{I}_Y(\theta).$$
This result follows from the elementary fact that if random variables are independent, the variance of their sum is the sum of their variances. Hence the information in a random sample of size n is n times that in a sample of size 1 (if the observations are independent and identically distributed).
The information provided by a sufficient statistic is the same as that of the sample X. This may be seen by using Fisher's factorization criterion for a sufficient statistic. If T(X) is sufficient for θ, then
$$f(X;\theta) = g(T(X),\theta)\,h(X)$$
for some functions g and h. See sufficient statistic for a more detailed explanation. The equality of information then follows from the following fact:
$$\frac{\partial}{\partial\theta}\ln\left[f(X;\theta)\right] = \frac{\partial}{\partial\theta}\ln\left[g(T(X),\theta)\right],$$
which follows from the factorization and the independence of h(X) from θ. Since the two scores coincide, so do their variances. More generally, if T = t(X) is a statistic, then
$$\mathcal{I}_T(\theta) \le \mathcal{I}_X(\theta),$$
with equality if and only if T is a sufficient statistic.
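Additivity and the sufficiency result can both be illustrated by exact enumeration. The following is a minimal sketch, assuming n independent Bernoulli(θ) observations, with T the number of successes as the sufficient statistic. The function names and the values `theta = 0.3`, `n = 5` are illustrative choices, not part of the article.

```python
import itertools
import math

def info_single_trial(theta):
    # E[score^2] for one Bernoulli trial; score = x/theta - (1-x)/(1-theta)
    return sum(
        (x / theta - (1 - x) / (1 - theta)) ** 2 * (theta if x == 1 else 1 - theta)
        for x in (0, 1)
    )

def info_full_sample(theta, n):
    # E[score^2] over all 2^n outcome sequences of the full sample X
    total = 0.0
    for xs in itertools.product((0, 1), repeat=n):
        a = sum(xs)
        prob = theta**a * (1 - theta) ** (n - a)
        score = a / theta - (n - a) / (1 - theta)
        total += score**2 * prob
    return total

def info_sufficient_statistic(theta, n):
    # E[score^2] using only T = number of successes (binomially distributed)
    total = 0.0
    for a in range(n + 1):
        prob = math.comb(n, a) * theta**a * (1 - theta) ** (n - a)
        score = a / theta - (n - a) / (1 - theta)
        total += score**2 * prob
    return total

theta, n = 0.3, 5
one = info_single_trial(theta)
full = info_full_sample(theta, n)
suff = info_sufficient_statistic(theta, n)
print(one, full, suff)
```

Under these assumptions the full-sample information equals n times the single-trial information, and the sufficient statistic carries exactly the same amount.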
The Cramér-Rao inequality states that the reciprocal of the Fisher information is a lower bound on the variance of any unbiased estimator of θ.
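The bound can be illustrated with two unbiased estimators of a Bernoulli success probability. This is a sketch under stated assumptions: n independent Bernoulli(θ) trials with Fisher information n/(θ(1-θ)), the sample mean A/n as one estimator, and the first trial alone as a deliberately wasteful second estimator. All names and the values `n = 20`, `theta = 0.3` are hypothetical.

```python
import math

def binomial_pmf(a, n, theta):
    return math.comb(n, a) * theta**a * (1 - theta) ** (n - a)

def cramer_rao_bound(n, theta):
    # Reciprocal of the Fisher information of n Bernoulli trials
    fisher_information = n / (theta * (1 - theta))
    return 1.0 / fisher_information

def variance_of_sample_mean(n, theta):
    # Exact variance of the estimator A/n, computed from the binomial pmf
    mean = sum((a / n) * binomial_pmf(a, n, theta) for a in range(n + 1))
    return sum((a / n - mean) ** 2 * binomial_pmf(a, n, theta) for a in range(n + 1))

def variance_of_first_trial_only(theta):
    # Estimator that reports X_1 and ignores the remaining trials.
    # It is unbiased (E[X_1] = theta) but uses a single observation.
    return (1 - theta) ** 2 * theta + (0 - theta) ** 2 * (1 - theta)

n, theta = 20, 0.3
bound = cramer_rao_bound(n, theta)
var_mean = variance_of_sample_mean(n, theta)
var_first = variance_of_first_trial_only(theta)
print(bound, var_mean, var_first)
```

In this setting the sample mean attains the bound, while the single-trial estimator has a variance n times larger.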
Single-parameter Bernoulli experiment
A Bernoulli trial is a random variable with two possible outcomes, "success" and "failure", with "success" having a probability of θ. The outcome can be thought of as determined by a coin toss, with the probability of obtaining a "head" being θ and the probability of obtaining a "tail" being 1 - θ.
The Fisher information contained in n independent Bernoulli trials may be calculated as follows. In the following, A represents the number of successes, B the number of failures, and n = A + B is the total number of trials.
$$
\begin{aligned}
\mathcal{I}(\theta) &= -\mathrm{E}\left[\frac{\partial^2}{\partial\theta^2}\ln f(A;\theta)\right] &\qquad (1)\\
&= -\mathrm{E}\left[\frac{\partial^2}{\partial\theta^2}\ln\left[\theta^A(1-\theta)^B\frac{(A+B)!}{A!\,B!}\right]\right] &\qquad (2)\\
&= -\mathrm{E}\left[\frac{\partial^2}{\partial\theta^2}\left[A\ln(\theta)+B\ln(1-\theta)\right]\right] &\qquad (3)\\
&= -\mathrm{E}\left[\frac{\partial}{\partial\theta}\left[\frac{A}{\theta}-\frac{B}{1-\theta}\right]\right] &\qquad (4)\\
&= +\mathrm{E}\left[\frac{A}{\theta^2}+\frac{B}{(1-\theta)^2}\right] &\qquad (5)\\
&= \frac{n\theta}{\theta^2}+\frac{n(1-\theta)}{(1-\theta)^2} &\qquad (6)\\
&= \frac{n}{\theta(1-\theta)} &\qquad (7)
\end{aligned}
$$
(1) defines Fisher information. (2) invokes the fact that the information in a sufficient statistic is the same as that of the sample itself. (3) expands the log term and drops a constant. (4) and (5) differentiate with respect to θ (on differentiating ln x, see logarithm). (6) replaces A and B with their expectations (the expected value of A is nθ, and that of B is n(1 - θ)). (7) is algebra.
The end result, namely, $\mathcal{I}(\theta) = \frac{n}{\theta(1-\theta)}$, is the reciprocal of the variance of the mean number of successes in n Bernoulli trials (the variance of A/n is θ(1 - θ)/n), as expected (see last sentence of the preceding section).
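The derivation for n Bernoulli trials can be cross-checked numerically. The sketch below approximates the second derivative of the log-likelihood kernel A ln θ + B ln(1 - θ) by a central finite difference, averages it over the binomial distribution of A, and compares the result with the intermediate expectation E[A/θ² + B/(1 - θ)²] and with the closed form n/(θ(1 - θ)). The step size `h`, the function names, and the values `n = 10`, `theta = 0.3` are illustrative assumptions.

```python
import math

def binomial_pmf(a, n, theta):
    return math.comb(n, a) * theta**a * (1 - theta) ** (n - a)

def log_likelihood_kernel(a, n, theta):
    # A ln(theta) + B ln(1 - theta), with B = n - A; the constant
    # combinatorial term is dropped because it does not depend on theta
    return a * math.log(theta) + (n - a) * math.log(1 - theta)

def information_by_finite_difference(n, theta, h=1e-4):
    # -E[d^2/d(theta)^2 ln f], with the derivative approximated numerically
    total = 0.0
    for a in range(n + 1):
        second = (
            log_likelihood_kernel(a, n, theta + h)
            - 2.0 * log_likelihood_kernel(a, n, theta)
            + log_likelihood_kernel(a, n, theta - h)
        ) / h**2
        total += -second * binomial_pmf(a, n, theta)
    return total

def information_from_expectation(n, theta):
    # E[A/theta^2 + B/(1-theta)^2], summed exactly over the binomial pmf
    return sum(
        (a / theta**2 + (n - a) / (1 - theta) ** 2) * binomial_pmf(a, n, theta)
        for a in range(n + 1)
    )

def information_closed_form(n, theta):
    return n / (theta * (1 - theta))

n, theta = 10, 0.3
numeric = information_by_finite_difference(n, theta)
expectation = information_from_expectation(n, theta)
closed = information_closed_form(n, theta)
print(numeric, expectation, closed)
```

The finite-difference value is only approximate, so it should match the other two up to a small discretization error rather than exactly.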
Matrix form
When there are N parameters, so that θ is an N×1 vector $\theta = [\theta_1, \theta_2, \ldots, \theta_N]^{\mathsf{T}}$, the Fisher information takes the form of an N×N matrix, the Fisher information matrix (FIM), with typical element:
$$\left(\mathcal{I}(\theta)\right)_{i,j} = \mathrm{E}\left[\frac{\partial}{\partial\theta_i}\ln f(X;\theta)\,\frac{\partial}{\partial\theta_j}\ln f(X;\theta)\right].$$
The FIM is an N×N symmetric positive semidefinite matrix. Where it is positive definite, it defines a metric on the N-dimensional parameter space. Exploring this topic requires differential geometry.
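As an illustration of the matrix form, the sketch below evaluates the expectation of the outer product of the score vector by numerical integration. It assumes, purely for the example, a univariate normal distribution with parameter vector θ = (μ, σ), whose FIM is known to be diag(1/σ², 2/σ²). The midpoint rule, the integration width, and all names are hypothetical choices.

```python
import math

def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def score_vector(x, mu, sigma):
    # Partial derivatives of ln f(x; mu, sigma) with respect to mu and sigma
    d_mu = (x - mu) / sigma**2
    d_sigma = -1.0 / sigma + (x - mu) ** 2 / sigma**3
    return (d_mu, d_sigma)

def fisher_information_matrix(mu, sigma, steps=20000, width=12.0):
    # Midpoint-rule integration of score_i * score_j * pdf over
    # [mu - width*sigma, mu + width*sigma]; the tails beyond are negligible
    lo = mu - width * sigma
    dx = 2.0 * width * sigma / steps
    fim = [[0.0, 0.0], [0.0, 0.0]]
    for k in range(steps):
        x = lo + (k + 0.5) * dx
        weight = normal_pdf(x, mu, sigma) * dx
        s = score_vector(x, mu, sigma)
        for i in range(2):
            for j in range(2):
                fim[i][j] += s[i] * s[j] * weight
    return fim

fim = fisher_information_matrix(mu=1.0, sigma=2.0)
print(fim)
```

The result is symmetric by construction, and the off-diagonal entries vanish here because the two score components are uncorrelated for this parameterization.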
Multivariate normal distribution
The FIM for an N-variate multivariate normal distribution has a special form. Let μ(θ) be the mean vector and let Σ(θ) be the covariance matrix of $X \sim N(\mu(\theta), \Sigma(\theta))$. Then the typical element $\mathcal{I}_{m,n}$ of the FIM for X, with m and n indexing the components of θ, is:
$$\mathcal{I}_{m,n} = \frac{\partial\mu^{\mathsf{T}}}{\partial\theta_m}\,\Sigma^{-1}\,\frac{\partial\mu}{\partial\theta_n} + \frac{1}{2}\,\mathrm{tr}\left(\Sigma^{-1}\frac{\partial\Sigma}{\partial\theta_m}\Sigma^{-1}\frac{\partial\Sigma}{\partial\theta_n}\right),$$
where $(\cdot)^{\mathsf{T}}$ denotes the transpose of a vector, tr(..) denotes the trace of a square matrix, and $\partial\mu/\partial\theta_m$ and $\partial\Sigma/\partial\theta_m$ denote the vector and the matrix of element-wise partial derivatives of μ and Σ with respect to $\theta_m$.
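The formula for the multivariate normal case can be sketched in code for a small example. The version below is restricted to a bivariate X so that a hand-written 2×2 inverse suffices. It assumes a hypothetical model with θ = (m, s), mean μ = (m, m) and covariance Σ = s² times the identity, which corresponds to two independent observations from N(m, s²). The helper names are illustrative and not taken from any library.

```python
def mat_inv_2x2(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def mat_mul(p, q):
    return [
        [sum(p[i][k] * q[k][j] for k in range(len(q))) for j in range(len(q[0]))]
        for i in range(len(p))
    ]

def trace(m):
    return sum(m[i][i] for i in range(len(m)))

def gaussian_fim(d_mu, d_sigma, sigma):
    # d_mu[m]    : list holding the vector d(mu)/d(theta_m)
    # d_sigma[m] : 2x2 matrix d(Sigma)/d(theta_m)
    sigma_inv = mat_inv_2x2(sigma)
    n_params = len(d_mu)
    fim = [[0.0] * n_params for _ in range(n_params)]
    for m in range(n_params):
        for n in range(n_params):
            # Mean term: (d mu/d theta_m)^T Sigma^{-1} (d mu/d theta_n)
            row = mat_mul([d_mu[m]], sigma_inv)
            col = [[v] for v in d_mu[n]]
            mean_term = mat_mul(row, col)[0][0]
            # Trace term: 1/2 tr(Sigma^{-1} dSigma_m Sigma^{-1} dSigma_n)
            left = mat_mul(sigma_inv, d_sigma[m])
            right = mat_mul(sigma_inv, d_sigma[n])
            trace_term = 0.5 * trace(mat_mul(left, right))
            fim[m][n] = mean_term + trace_term
    return fim

s = 1.5
sigma = [[s * s, 0.0], [0.0, s * s]]
d_mu = [[1.0, 1.0], [0.0, 0.0]]          # derivatives of mu w.r.t. m and s
d_sigma = [
    [[0.0, 0.0], [0.0, 0.0]],            # Sigma does not depend on m
    [[2 * s, 0.0], [0.0, 2 * s]],        # d(s^2 I)/ds = 2 s I
]
fim = gaussian_fim(d_mu, d_sigma, sigma)
print(fim)
```

For this model the result works out to diag(2/s², 4/s²), which is twice the single-observation information for N(m, s²), as additivity over independent observations would suggest.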
See also
Other measures employed in information theory:
|