Maximum a posteriori

In statistics, the method of maximum a posteriori (MAP, or posterior mode) estimation can be used to obtain a point estimate of an unobserved quantity on the basis of empirical data. It is closely related to Fisher's method of maximum likelihood (ML), but employs an augmented optimization objective which incorporates a prior distribution over the quantity one wants to estimate. MAP estimation can therefore be seen as a regularization of ML estimation.


Introduction

Assume that we want to estimate an unobserved population parameter θ on the basis of observations x. Let f be the sampling distribution of x, so that f(x | θ) is the probability (or probability density) of x when the underlying population parameter is θ. Then the function

$$\theta \mapsto f(x \mid \theta)$$

is known as the likelihood function and the estimate

$$\hat{\theta}_{\mathrm{ML}}(x) = \arg\max_{\theta} f(x \mid \theta)$$

as the maximum likelihood estimate of θ.


Now assume that a prior distribution g over θ exists. This allows us to treat θ as a random variable as in Bayesian statistics. Then the posterior distribution of θ is as follows:

$$\theta \mapsto \frac{f(x \mid \theta)\, g(\theta)}{\int_{\Theta} f(x \mid \theta')\, g(\theta')\, d\theta'}$$

where g is the density function of θ and Θ is the domain of g. This is a straightforward application of Bayes' theorem.


The method of maximum a posteriori estimation then estimates θ as the mode of the posterior distribution of this random variable:

$$\hat{\theta}_{\mathrm{MAP}}(x) = \arg\max_{\theta} \frac{f(x \mid \theta)\, g(\theta)}{\int_{\Theta} f(x \mid \theta')\, g(\theta')\, d\theta'} = \arg\max_{\theta} f(x \mid \theta)\, g(\theta)$$

The denominator of the posterior distribution does not depend on θ and therefore plays no role in the optimization. Observe that the MAP estimate of θ coincides with the ML estimate when the prior g is uniform (that is, a constant function).
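
Because the logarithm is strictly increasing, the same maximizer can also be written in additive form, which may make the regularization reading of MAP more concrete. The following is a sketch, assuming f(x | θ) g(θ) > 0 wherever the maximum is attained:

```latex
\hat{\theta}_{\mathrm{MAP}}(x) = \arg\max_{\theta} \left[ \log f(x \mid \theta) + \log g(\theta) \right]
```

The first term is the log-likelihood that ML maximizes on its own; the second is a penalty contributed by the prior, and it is constant when g is uniform.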


MAP estimates can be computed in several ways:

  1. Analytically, when the mode(s) of the posterior distribution can be given in closed form. This is the case when conjugate priors are used.
  2. Via numerical optimization such as the conjugate gradient method or Newton's method. This usually requires first or second derivatives, which have to be evaluated analytically or numerically.
  3. Via a modification of an expectation-maximization algorithm. This does not require derivatives of the posterior density.
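
The first two approaches can be compared on a small case. The sketch below is an illustration only and is not taken from the article: it assumes a Bernoulli likelihood with k successes in n trials and a Beta(a, b) prior, a conjugate pair whose posterior is Beta(a + k, b + n - k). The function names `map_closed_form` and `map_newton` are hypothetical.

```python
def map_closed_form(k, n, a, b):
    """Mode of the Beta(a + k, b + n - k) posterior.

    Valid when a + k > 1 and b + n - k > 1, so that the mode is interior.
    """
    return (a + k - 1) / (a + b + n - 2)


def map_newton(k, n, a, b, theta=0.5, tol=1e-12, max_iter=100):
    """Newton's method on the log posterior, using analytic derivatives.

    Assumes the same interior-mode condition as map_closed_form.
    """
    A = a + k - 1        # exponent of theta in the unnormalized posterior
    B = b + n - k - 1    # exponent of (1 - theta)
    for _ in range(max_iter):
        grad = A / theta - B / (1 - theta)            # first derivative
        hess = -A / theta**2 - B / (1 - theta)**2     # second derivative
        step = grad / hess
        # Clamp the update so the iterate stays inside (0, 1).
        theta = min(max(theta - step, 1e-9), 1 - 1e-9)
        if abs(step) < tol:
            break
    return theta


if __name__ == "__main__":
    # Made-up data: 7 successes in 10 trials, Beta(2, 2) prior.
    print(map_closed_form(7, 10, 2, 2), map_newton(7, 10, 2, 2))
    # With a uniform Beta(1, 1) prior the MAP estimate equals the ML estimate k / n.
    print(map_closed_form(7, 10, 1, 1))
```

The clamp is a minimal safeguard; a production optimizer would normally add a line search or fall back to a bracketing method.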

While MAP estimation shares the use of a prior distribution with Bayesian statistics, it is not generally seen as a Bayesian method. This is because MAP estimates are point estimates, whereas Bayesian methods are characterized by the use of distributions to summarize data and draw inferences. Bayesian methods tend to report the posterior mean or median together with posterior intervals, rather than the posterior mode. This is especially so when the posterior distribution does not have a simple analytic form: in this case, the posterior distribution can be simulated using Markov chain Monte Carlo techniques, while optimization to find its mode(s) may be difficult or impossible.


Example

Suppose that we are given a sequence $(x_1, \dots, x_n)$ of IID $N(\mu, \sigma_v^2)$ random variables and that the a priori distribution of μ is given by $N(0, \sigma_m^2)$. We wish to find the MAP estimate of μ.


The function to be maximized is then given by

$$\pi(\mu) L(\mu) = \frac{1}{\sqrt{2\pi}\,\sigma_m} \exp\left(-\frac{1}{2}\left(\frac{\mu}{\sigma_m}\right)^2\right) \prod_{j=1}^n \frac{1}{\sqrt{2\pi}\,\sigma_v} \exp\left(-\frac{1}{2}\left(\frac{x_j - \mu}{\sigma_v}\right)^2\right),$$

which is equivalent to minimizing the following function of μ:

$$\sum_{j=1}^n \left(\frac{x_j - \mu}{\sigma_v}\right)^2 + \left(\frac{\mu}{\sigma_m}\right)^2.$$
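
The minimization step is left implicit in the text. One way to fill it in is to set the derivative of this sum with respect to μ to zero; the sum is a convex quadratic in μ, so the stationary point is the minimum. A sketch of the calculation, using only the symbols defined above:

```latex
\begin{aligned}
\frac{d}{d\mu}\left[\sum_{j=1}^n \left(\frac{x_j - \mu}{\sigma_v}\right)^2 + \left(\frac{\mu}{\sigma_m}\right)^2\right]
  &= -\frac{2}{\sigma_v^2}\sum_{j=1}^n (x_j - \mu) + \frac{2\mu}{\sigma_m^2} = 0 \\
\Longrightarrow\quad \mu\left(\frac{n}{\sigma_v^2} + \frac{1}{\sigma_m^2}\right)
  &= \frac{1}{\sigma_v^2}\sum_{j=1}^n x_j \\
\Longrightarrow\quad \mu
  &= \frac{\sigma_m^2}{n\sigma_m^2 + \sigma_v^2}\sum_{j=1}^n x_j
\end{aligned}
```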

Thus, we see that the MAP estimator for μ is given by

$$\hat{\mu}_{\mathrm{MAP}} = \frac{\sigma_m^2}{n \sigma_m^2 + \sigma_v^2} \sum_{j=1}^n x_j.$$

The case $\sigma_m \to \infty$ is called a non-informative prior and leads to an ill-defined a priori probability distribution; in this case $\hat{\mu}_{\mathrm{MAP}} \to \hat{\mu}_{\mathrm{MLE}}$.
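
The estimator and its limiting behaviour can be checked numerically. The following is a minimal sketch on made-up data; the names `map_gaussian_mean` and `objective` are hypothetical, and both standard deviations are assumed known.

```python
def map_gaussian_mean(xs, sigma_v, sigma_m):
    """Closed-form MAP estimate of mu under a N(0, sigma_m^2) prior."""
    n = len(xs)
    return sigma_m**2 * sum(xs) / (n * sigma_m**2 + sigma_v**2)


def objective(mu, xs, sigma_v, sigma_m):
    """The quantity minimized by the MAP estimate: data misfit plus prior penalty."""
    return sum(((x - mu) / sigma_v) ** 2 for x in xs) + (mu / sigma_m) ** 2


if __name__ == "__main__":
    xs = [1.2, 0.8, 1.5, 1.1]          # made-up observations
    sigma_v, sigma_m = 1.0, 2.0        # assumed known standard deviations

    mu_map = map_gaussian_mean(xs, sigma_v, sigma_m)
    mu_mle = sum(xs) / len(xs)         # the ML estimate is the sample mean

    # The prior is centred at 0, so the MAP estimate is pulled from the
    # sample mean toward 0.
    print("MAP:", mu_map, "MLE:", mu_mle)

    # A very wide prior makes the MAP estimate approach the ML estimate.
    print("MAP with sigma_m = 1e6:", map_gaussian_mean(xs, sigma_v, 1e6))
```

Evaluating `objective` at values slightly above and below `mu_map` is a quick way to confirm that the closed form sits at the minimum.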



The Wikipedia article included on this page is licensed under the GFDL.