Likelihood principle

In statistics, the likelihood principle is a controversial principle of statistical inference which asserts that all of the information in a sample is contained in the likelihood function.


A likelihood function arises from a conditional probability distribution considered as a function of its second argument, holding the first fixed. For example, consider a model which gives the probability density function of observable random variable X as a function of a parameter θ. Then for a specific value x of X, the function L(θ | x) = P(X=x | θ) is a likelihood function of θ: it gives a measure of how "likely" any particular value of θ is, if we know that X has the value x. Two likelihood functions are equivalent if one is a scalar multiple of the other; according to the likelihood principle, all information from the data relevant to inferences about the value of θ is found in the equivalence class.


Example

Suppose

  • X is the number of successes in twelve independent Bernoulli trials with probability θ of success on each trial, and
  • Y is the number of independent Bernoulli trials needed to get three successes, again with probability θ of success on each trial.

Then the observation that X = 3 induces the likelihood function

L(\theta \mid X=3) = \binom{12}{3}\,\theta^3(1-\theta)^9 = 220\,\theta^3(1-\theta)^9

and the observation that Y = 12 induces the likelihood function

L(\theta \mid Y=12) = \binom{11}{2}\,\theta^3(1-\theta)^9 = 55\,\theta^3(1-\theta)^9.

These are equivalent because each is a scalar multiple of the other. The likelihood principle therefore says the inferences drawn about the value of θ should be the same in both cases.
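
As a quick numerical check of this equivalence, the following Python sketch evaluates both likelihoods at a handful of θ values and collects their ratios. The function names and the grid of θ values are illustrative choices, not part of the original example.

```python
from fractions import Fraction
from math import comb

def binomial_likelihood(theta, successes=3, trials=12):
    """L(theta | X = successes): number of successes in a fixed number of trials."""
    failures = trials - successes
    return comb(trials, successes) * theta**successes * (1 - theta)**failures

def negative_binomial_likelihood(theta, successes=3, trials=12):
    """L(theta | Y = trials): number of trials needed to reach the given successes.

    The last trial must be a success, so only the first trials - 1 outcomes
    can be arranged freely.
    """
    failures = trials - successes
    return comb(trials - 1, successes - 1) * theta**successes * (1 - theta)**failures

# Illustrative grid of parameter values (exact rational arithmetic).
thetas = [Fraction(k, 10) for k in range(1, 10)]
ratios = {binomial_likelihood(t) / negative_binomial_likelihood(t) for t in thetas}
print(ratios)  # {Fraction(4, 1)}
```

The ratio is the constant 220/55 = 4 at every θ, which is what it means for the two likelihood functions to be scalar multiples of each other.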


The difference between observing X = 3 and observing Y = 12 is only in the design of the experiment: in one case, one has decided in advance to try twelve times; in the other, to keep trying until three successes are observed. The outcome is the same in both cases. Therefore the likelihood principle is sometimes stated by saying:

The inference should depend only on the outcome of the experiment, and not on the design of the experiment.

The law of likelihood

A related concept is the law of likelihood, the notion that the extent to which the evidence supports one parameter value or hypothesis against another is equal to the ratio of their likelihoods. That is,

\Lambda = \frac{L(a \mid X=x)}{L(b \mid X=x)} = \frac{P(X=x \mid a)}{P(X=x \mid b)}

is the degree to which the observation x supports parameter value or hypothesis a against b. If this ratio is 1, the evidence is indifferent; if it is greater than 1, the evidence supports a against b; and if it is less than 1, the evidence supports b against a. The use of Bayes factors can extend this by taking account of the complexity of different hypotheses.
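
To make the ratio concrete, the sketch below applies it to the data of the earlier example (3 successes in 12 Bernoulli trials) and compares θ = 1/4 with θ = 1/2. The choice of these two parameter values and the helper names are assumptions made for illustration only.

```python
from fractions import Fraction
from math import comb

def likelihood(theta, successes=3, trials=12):
    """Binomial likelihood of theta for the observed counts."""
    failures = trials - successes
    return comb(trials, successes) * theta**successes * (1 - theta)**failures

def likelihood_ratio(a, b):
    """Lambda = L(a | x) / L(b | x); values above 1 favour a over b."""
    return likelihood(a) / likelihood(b)

ratio = likelihood_ratio(Fraction(1, 4), Fraction(1, 2))
print(ratio, float(ratio))  # 19683/4096, roughly 4.8
```

Because the binomial coefficient cancels in the ratio, the same value would be obtained from the negative binomial likelihood for Y = 12.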


Combining the likelihood principle with the law of likelihood yields the consequence that the parameter value which maximizes the likelihood function is the value which is most strongly supported by the evidence. This is the basis for the widely-used method of maximum likelihood.
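
For the Bernoulli example above, the maximizing value can be found by a simple grid search over θ^3 (1-θ)^9. This is only a sketch: the grid resolution and the function name are arbitrary assumptions, and in practice the maximum would usually be found analytically or with a numerical optimizer.

```python
from fractions import Fraction

def kernel(theta, successes=3, failures=9):
    """Likelihood up to a constant factor: theta^successes * (1 - theta)^failures."""
    return theta**successes * (1 - theta)**failures

# Candidate values 0.001, 0.002, ..., 0.999 in exact rational arithmetic.
grid = [Fraction(k, 1000) for k in range(1, 1000)]
mle = max(grid, key=kernel)
print(mle)  # 1/4
```

The result, 3/12 = 1/4, is the observed proportion of successes, and it is the same whichever of the two experimental designs produced the data, since the constant factor does not affect the location of the maximum.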


Historical remarks

The likelihood principle was first identified by that name in print in 1962 (Barnard et al., Birnbaum, and Savage et al.), but arguments for the same principle, unnamed, and the use of the principle in applications go back to the works of R.A. Fisher in the 1920s. The law of likelihood was identified by that name by I. Hacking (1965). More recently the likelihood principle as a general principle of inference has been championed by A. W. F. Edwards. The likelihood principle has been applied to the philosophy of science by R. Royall.


Birnbaum proved that the likelihood principle follows from two more primitive and seemingly reasonable principles, the conditionality principle and the sufficiency principle. The conditionality principle says that if an experiment is chosen by a random process independent of the states of nature θ, then only the experiment actually performed is relevant to inferences about θ. The sufficiency principle says that if T(X) is a sufficient statistic for θ, and if in two experiments with data x1 and x2 we have T(x1) = T(x2), then the evidence about θ given by the two experiments is the same.


Arguments for and against the likelihood principle

The likelihood principle is not universally accepted. Some widely-used methods of conventional statistics, for example many significance tests, are not consistent with the likelihood principle. Let us briefly consider some of the arguments for and against the likelihood principle.


Experimental design arguments on the likelihood principle

Unrealized events do play a role in some common statistical methods. For example, the result of a significance test depends on the probability of a result as extreme or more extreme than the observation, and that probability may depend on the design of the experiment. Thus, to the extent that such methods are accepted, the likelihood principle is denied.


Some classical significance tests are not based on the likelihood. A commonly cited example is the optional stopping problem. Suppose I tell you that I tossed a coin 12 times and in the process observed 3 heads. You might make some inference about the probability of heads and whether the coin was fair. Suppose now I tell you that I tossed the coin until I observed 3 heads, and that it took 12 tosses. Will you now make some different inference?


The likelihood function is the same in both cases: it is proportional to

p^3 \, (1-p)^9.

According to the likelihood principle, the inference should be the same in either case.


Suppose a number of scientists are assessing the probability of a certain outcome (which we shall call 'success') in experimental trials. Conventional wisdom suggests that if there is no bias towards success or failure then the success probability would be one half. Adam, a scientist, conducted 12 trials and obtained 3 successes and 9 failures. Then he dropped dead.


Bill, a colleague in the same lab, continued Adam's work and published Adam's results, along with a significance test. He tested the null hypothesis H0 that p, the success probability, is equal to one half, against the alternative p < 0.5. If H0 is true, the probability of a result at least as extreme as the one observed, that is, 3 or fewer successes (equivalently, 9 or more failures) out of 12 trials, is

\left(\binom{12}{9}+\binom{12}{10}+\binom{12}{11}+\binom{12}{12}\right)\left(\frac{1}{2}\right)^{12}

which is 299/4096 = 7.3%. Thus the null hypothesis is not rejected at the 5% significance level.
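
Bill's calculation can be reproduced with exact arithmetic. The sketch below assumes the fixed design of 12 trials and sums the binomial probabilities of 0 to 3 successes under p = 1/2; the function name is hypothetical.

```python
from fractions import Fraction
from math import comb

def binomial_lower_tail(max_successes, trials, p=Fraction(1, 2)):
    """P(X <= max_successes) for X ~ Binomial(trials, p)."""
    return sum(
        comb(trials, k) * p**k * (1 - p)**(trials - k)
        for k in range(max_successes + 1)
    )

# One-sided p-value for 3 successes in a fixed design of 12 trials.
p_value_fixed = binomial_lower_tail(3, 12)
print(p_value_fixed, float(p_value_fixed))  # 299/4096, about 0.073
```

Counting 3 or fewer successes gives the same sum as counting 9 or more failures, because the binomial coefficients are symmetric.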


Charlotte, another scientist, reads Bill's paper and writes a letter, saying that it is possible that Adam kept trying until he obtained 3 successes, in which case the probability of needing to conduct 12 or more experiments is given by

1-\left(\binom{10}{2}\left(\frac{1}{2}\right)^{11}+\binom{9}{2}\left(\frac{1}{2}\right)^{10}+\cdots+\binom{2}{2}\left(\frac{1}{2}\right)^{3}\right)

which is 134/4096 = 3.27%. Now the result is statistically significant at the 5% level.
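
Charlotte's figure can be checked in the same way. The sketch below assumes the design "keep going until 3 successes", uses the negative binomial probability of stopping at exactly n trials, and subtracts the probabilities of stopping at trials 3 through 11 from one. The helper names are illustrative.

```python
from fractions import Fraction
from math import comb

def prob_stop_at(n, successes=3, p=Fraction(1, 2)):
    """P(Y = n): the final success arrives on trial n."""
    return comb(n - 1, successes - 1) * p**successes * (1 - p)**(n - successes)

# P(Y >= 12) = 1 - P(Y <= 11)
p_value_sequential = 1 - sum(prob_stop_at(n) for n in range(3, 12))

# Cross-check: needing 12 or more trials means at most 2 successes in the first 11.
at_most_two_in_eleven = Fraction(sum(comb(11, k) for k in range(3)), 2**11)

print(p_value_sequential, float(p_value_sequential))  # 67/2048 (= 134/4096), about 0.0327
```

Both routes give 134/4096, which falls below the 5% threshold even though the data are the same 3 successes and 9 failures.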


To these scientists, whether a result is significant or not seems to depend on the original design of the experiment, not just the likelihood of the outcome.


Apparently paradoxical results of this kind are considered by some as arguments against the likelihood principle; for others they exemplify the value of the principle, which resolves the paradox.


The voltmeter story

An argument in favor of the likelihood principle is given by Edwards in his book Likelihood. He cites the following story from J.W. Pratt, slightly condensed here. Note that the likelihood function depends only on what actually happened, and not on what could have happened.

An engineer draws a random sample of electron tubes and measures their voltage. The measurements range from 75 to 99 volts. A statistician computes the sample mean and a confidence interval for the true mean. Later the statistician discovers that the voltmeter reads only as far as 100, so the population appears to be 'censored'. This necessitates a new analysis, if the statistician is orthodox. However, the engineer says he has another meter reading to 1000 volts, which he would have used if any voltage had been over 100. This is a relief to the statistician, because it means the population was effectively uncensored after all. But, the next day the engineer informs the statistician that this second meter was not working at the time of the measuring. The statistician ascertains that the engineer would not have held up the measurements until the meter was fixed, and informs him that new measurements are required. The engineer is astounded. "Next you'll be asking about my oscilloscope".

One might extend this story by considering that, in general, the circumstances could have been different. For instance, high-range voltmeters do not break at predictable moments, but at unpredictable ones. So the second meter could have been broken, with some probability, and the distribution of the measurements depends on this probability.


This story can be translated to Adam's stopping rule above, as follows. Adam stopped immediately after 3 successes, because his boss Bill had instructed him to do so. Adam did not die. After the publication of the statistical analysis by Bill, Adam discovers that he has missed a second instruction from Bill to conduct 12 trials instead, and that Bill's paper is based on this second instruction. Adam is very glad that he got his 3 successes after exactly 12 trials, and explains to his friend Charlotte that by coincidence he executed the second instruction. Later, he is astonished to hear about Charlotte's letter explaining that now the result is significant.


Optional stopping in clinical trials

The fact that Bayesian and frequentist arguments differ on the subject of optional stopping has a major impact on the way that clinical trial data can be analysed. In a frequentist setting there is a major difference between a design which is fixed and one which is sequential, i.e. consisting of a sequence of analyses. Bayesian statistics is inherently sequential and so there is no such distinction.


In a clinical trial it is strictly not valid to conduct an unplanned interim analysis of the data by frequentist methods, whereas this is permissible by Bayesian methods. Similarly, if funding is withdrawn part way through an experiment, and the analyst must work with incomplete data, this is a possible source of bias for classical methods but not for Bayesian methods, which do not depend on the intended design of the experiment. Furthermore, as mentioned above, frequentist analysis is open to unscrupulous manipulation if the experimenter is allowed to choose the stopping point, whereas Bayesian methods are immune to such manipulation.
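
The effect of unplanned interim looks on a frequentist test can be illustrated with a small exact calculation. The sketch below is a toy setting, not a clinical trial design: it assumes a fair coin (so the null hypothesis p = 1/2 is true), a one-sided exact binomial test applied after every single trial, and an experimenter who stops as soon as the p-value falls below 5%. All names and the choice of test are assumptions made for illustration.

```python
from fractions import Fraction
from math import comb

def lower_tail(k, n):
    """One-sided p-value P(X <= k) for X ~ Binomial(n, 1/2)."""
    return Fraction(sum(comb(n, i) for i in range(k + 1)), 2**n)

def prob_ever_significant(max_trials, alpha=Fraction(1, 20)):
    """Probability, under the null, of p < alpha at some look up to max_trials."""
    # alive[k]: probability of k successes so far with no rejection at any earlier look
    alive = {0: Fraction(1)}
    rejected = Fraction(0)
    for n in range(1, max_trials + 1):
        step = {}
        for k, prob in alive.items():
            for k_next in (k, k + 1):  # failure or success, each with probability 1/2
                step[k_next] = step.get(k_next, Fraction(0)) + prob / 2
        alive = {}
        for k, prob in step.items():
            if lower_tail(k, n) < alpha:
                rejected += prob   # experimenter stops and reports significance
            else:
                alive[k] = prob    # experiment continues
    return rejected

for looks in (5, 8, 50):
    print(looks, float(prob_ever_significant(looks)))
```

With a single look at any fixed sample size the false-positive probability of this test stays below 5% by construction, but with a look after every trial it already reaches 13/256 (about 5.1%) by the eighth trial and can only grow as more looks are allowed.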


References

  • G.A. Barnard, G.M. Jenkins, and C.B. Winsten. "Likelihood Inference and Time Series", J. Royal Statistical Society, series A, 125:321-372, 1962.
  • Allan Birnbaum. "On the foundations of statistical inference". J. Amer. Statist. Assoc. 57(298):269–326, 1962. (With discussion.)
  • Anthony W.F. Edwards. Likelihood. 1st edition 1972 (Cambridge University Press), 2nd edition 1992 (Johns Hopkins University Press).
  • Anthony W.F. Edwards. "The history of likelihood". Int. Statist. Rev. 42:9-15, 1974.
  • Ronald A. Fisher. "On the Mathematical Foundations of Theoretical Statistics", Phil. Trans. Royal Soc., series A, 222:326, 1922.
  • Ian Hacking. Logic of Statistical Inference. Cambridge University Press, 1965.
  • James O. Berger and Robert L. Wolpert. The Likelihood Principle. Institute of Mathematical Statistics, Hayward, CA, 1988.
  • Harold Jeffreys, The Theory of Probability. The Oxford University Press, 1961.
  • Richard M. Royall. Statistical Evidence: A Likelihood Paradigm. London: Chapman & Hall, 1997.
  • Leonard J. Savage et al. The Foundations of Statistical Inference. 1962.
