
Stein's example, also known as Stein's paradox (after Charles Stein), is an important example in decision theory, celebrated because it contradicts a mathematician's natural intuition.


This article assumes you are familiar with basic decision theory, including the definitions of a decision rule and of an inadmissible decision rule.


Stein's example

Let $X = (X_1, \dots, X_n)$ with $n \ge 3$ be a multivariate Gaussian random variable with distribution

$$X \sim N(\theta, I_n).$$

Then the decision rule $d(x) = x$ for the parameter $\theta$ is inadmissible under the loss function $L(\theta, d) = |\theta - d|^2$.


The significance of Stein's example

Decision theory is about estimating a distributional parameter in such a way as to minimise the cost of mistakes. Stein's example is significant because it is so surprising: the decision rule in question, which estimates the mean by the observed value itself, is the maximum likelihood estimator, a decision rule which one would very naturally expect to be admissible (that is, not dominated by any other rule).


This becomes even more surprising when one considers it in a real-world context. For example (assuming the distributions are approximately Gaussian), the first vector component may be the US wheat yield for 1993, the second the number of spectators at the Wimbledon tennis tournament in London in 2001, the third the weight of a randomly chosen bag of sugar from the supermarket, and so on. Stein's example tells us that we will get a better estimate for the multivariate mean by scaling the vector of observations by an amount which depends on all of these completely independent sample values.


Sketched proof

The risk function of the decision rule $d(x) = x$ is

$$R(\theta, d) = E_\theta\left[\,|\theta - X|^2\,\right] = n.$$

Now consider the decision rule

$$d'(x) = x - \frac{\alpha}{|x|^2}\, x,$$

where $\alpha$ is a constant; the choice $\alpha = n - 2$ will turn out to be optimal. We will show that, with this choice, $d'$ is a better decision rule than $d$. The risk function is

$$R(\theta, d') = E_\theta\left[\,\left|\theta - X + \frac{\alpha}{|X|^2} X\right|^2\,\right] = E_\theta\left[\,|\theta - X|^2\,\right] + 2\alpha\, E_\theta\left[\frac{(\theta - X)^T X}{|X|^2}\right] + \alpha^2\, E_\theta\left[\frac{1}{|X|^2}\right],$$

which is a quadratic in $\alpha$. We may simplify the middle term by considering a general sufficiently "well behaved" function $h : \mathbb{R}^n \to \mathbb{R}$ and using integration by parts. For any such $h$, for all $1 \le i \le n$:

$$E_\theta\left[(\theta_i - X_i)\, h(X)\right] = -E_\theta\left[\frac{\partial h}{\partial x_i}(X)\right].$$

(This result is known as Stein's lemma.)
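As a rough numerical sanity check of Stein's lemma, the sketch below estimates both sides by Monte Carlo. It assumes the identity takes the standard form $E_\theta[(\theta_i - X_i)\,h(X)] = -E_\theta[\partial h / \partial x_i\,(X)]$ for $X \sim N(\theta, I_n)$. The test function `h`, the mean vector `theta`, the sample size and the seed are all hypothetical choices made only for illustration.

```python
import math
import random

def h(x):
    # Hypothetical smooth, bounded test function (not from the article).
    return math.tanh(x[0] + x[1])

def dh_dx0(x):
    # Exact partial derivative of h with respect to the first component.
    return 1.0 - math.tanh(x[0] + x[1]) ** 2

def check_stein_lemma(theta, trials=50000, seed=1):
    """Monte Carlo estimates of both sides of the lemma for component 0."""
    rng = random.Random(seed)
    lhs = 0.0
    rhs = 0.0
    for _ in range(trials):
        # Draw X ~ N(theta, I): independent unit-variance components.
        x = [rng.gauss(t, 1.0) for t in theta]
        lhs += (theta[0] - x[0]) * h(x)
        rhs += -dh_dx0(x)
    return lhs / trials, rhs / trials

theta = [0.5, -1.0, 2.0]  # arbitrary illustrative mean vector
lhs, rhs = check_stein_lemma(theta)
print("E[(theta_0 - X_0) h(X)] ~", lhs)
print("-E[dh/dx_0 (X)]        ~", rhs)
```

The two printed numbers should agree up to Monte Carlo error, which shrinks as `trials` grows.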


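Thus, if we set

$$h(x) = \frac{x_i}{|x|^2},$$

then assuming $h$ meets the "well behaved" condition (see end of proof), we have

$$\frac{\partial h}{\partial x_i} = \frac{1}{|x|^2} - \frac{2 x_i^2}{|x|^4},$$

and so

$$E_\theta\left[\frac{(\theta - X)^T X}{|X|^2}\right] = \sum_{i=1}^{n} E_\theta\left[(\theta_i - X_i)\,\frac{X_i}{|X|^2}\right] = -\sum_{i=1}^{n} E_\theta\left[\frac{1}{|X|^2} - \frac{2 X_i^2}{|X|^4}\right] = -(n - 2)\, E_\theta\left[\frac{1}{|X|^2}\right].$$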

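Then returning to the risk function of $d'$:

$$R(\theta, d') = n - 2\alpha (n - 2)\, E_\theta\left[\frac{1}{|X|^2}\right] + \alpha^2\, E_\theta\left[\frac{1}{|X|^2}\right].$$

This quadratic in $\alpha$ is minimised at

$$\alpha = n - 2,$$

giving

$$R(\theta, d') = R(\theta, d) - (n - 2)^2\, E_\theta\left[\frac{1}{|X|^2}\right],$$

which of course satisfies, for every $\theta$:

$$R(\theta, d') < R(\theta, d),$$

making $d$ an inadmissible decision rule.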
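The conclusion of the proof can be illustrated with a small simulation. This is a minimal sketch, assuming $X \sim N(\theta, I_n)$, the usual rule $d(x) = x$, the shrinkage rule $d'(x) = x - (\alpha / |x|^2)\,x$ with $\alpha = n - 2$, and squared-error loss. The function names, the particular `theta`, the number of trials and the seed are illustrative assumptions, not part of the original article.

```python
import random

def d_prime(x, alpha):
    """Shrinkage rule d'(x) = x - (alpha / |x|^2) * x."""
    norm_sq = sum(v * v for v in x)
    factor = 1.0 - alpha / norm_sq
    return [factor * v for v in x]

def squared_error(theta, estimate):
    """Loss L(theta, d) = |theta - d|^2."""
    return sum((t - e) ** 2 for t, e in zip(theta, estimate))

def estimate_risks(theta, trials=20000, seed=0):
    """Monte Carlo estimates of the risks of d(x) = x and of d'."""
    rng = random.Random(seed)
    alpha = len(theta) - 2
    loss_d = 0.0
    loss_d_prime = 0.0
    for _ in range(trials):
        # Draw one observation X ~ N(theta, I).
        x = [rng.gauss(t, 1.0) for t in theta]
        loss_d += squared_error(theta, x)
        loss_d_prime += squared_error(theta, d_prime(x, alpha))
    return loss_d / trials, loss_d_prime / trials

theta = [1.0, -2.0, 0.5, 0.0, 3.0]  # arbitrary true mean, n = 5
risk_d, risk_d_prime = estimate_risks(theta)
print("estimated risk of d :", risk_d)
print("estimated risk of d':", risk_d_prime)
```

The estimated risk of $d$ should come out close to $n$, and the estimated risk of $d'$ should be smaller. Trying other values of `theta` gives a feel for how the gap narrows as $|\theta|$ grows.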


It remains to justify the use of

$$h(x) = \frac{x_i}{|x|^2}.$$

This function is not in fact very "well behaved", since it is singular at $x = 0$. However, the function

$$h(x) = \frac{x_i}{\varepsilon + |x|^2}$$

is "well behaved", and after following the algebra through and letting $\varepsilon \to 0$ one obtains the same result. The reader may either take this on trust or prove it as an exercise.
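For readers who prefer to see the limiting step, here is a sketch of the key computation. It assumes the regularised function has the form $h_{\varepsilon,i}(x) = x_i / (\varepsilon + |x|^2)$ with $\varepsilon > 0$; the name $h_{\varepsilon,i}$ is introduced here purely for illustration.

```latex
\begin{align*}
\frac{\partial h_{\varepsilon,i}}{\partial x_i}
  &= \frac{1}{\varepsilon + |x|^2} - \frac{2 x_i^2}{(\varepsilon + |x|^2)^2}, \\
\sum_{i=1}^{n} \frac{\partial h_{\varepsilon,i}}{\partial x_i}
  &= \frac{n}{\varepsilon + |x|^2} - \frac{2 |x|^2}{(\varepsilon + |x|^2)^2}
   = \frac{(n-2)\,|x|^2 + n\varepsilon}{(\varepsilon + |x|^2)^2}
   \;\longrightarrow\; \frac{n-2}{|x|^2}
   \quad \text{as } \varepsilon \to 0 .
\end{align*}
```

The sum is bounded by $n / |x|^2$, which has finite expectation under the Gaussian distribution when $n \ge 3$, so dominated convergence is one way to justify passing the limit through the expectation.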


Interpreting Stein's example

At first sight (or to the naïve reader) it appears that somehow we get a better estimate for US wheat yield (by way of example) by measuring some other unrelated statistics such as (by way of example) the size of the Egyptian President's brother's camel's left foot and the time taken by a randomly chosen member of the public to hop 100m whilst wearing a pink fluffy cardigan. This is of course absurd; we have not obtained a better estimate for US wheat yield alone, but we have produced an estimate for the means of all of the random variables which has a reduced total risk. So the cost of a bad estimate in one component can be compensated by a better estimate in another component.
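The trade-off between components can be seen in a simulation that tracks the mean squared error of each component separately. This is a sketch under the same assumptions as the proof (unit-variance independent Gaussian components, shrinkage rule $d'(x) = x - (\alpha/|x|^2)\,x$ with $\alpha = n - 2$). The mean vector, with one large component and nine zero components, is a hypothetical choice made to make the effect visible.

```python
import random

def component_risks(theta, trials=20000, seed=2):
    """Per-component mean squared error of d(x) = x and of d'."""
    rng = random.Random(seed)
    n = len(theta)
    alpha = n - 2
    mse_d = [0.0] * n
    mse_d_prime = [0.0] * n
    for _ in range(trials):
        x = [rng.gauss(t, 1.0) for t in theta]
        norm_sq = sum(v * v for v in x)
        factor = 1.0 - alpha / norm_sq  # common shrinkage factor
        for i in range(n):
            mse_d[i] += (x[i] - theta[i]) ** 2
            mse_d_prime[i] += (factor * x[i] - theta[i]) ** 2
    return ([m / trials for m in mse_d],
            [m / trials for m in mse_d_prime])

theta = [5.0] + [0.0] * 9  # one large mean, nine zero means
per_d, per_d_prime = component_risks(theta)
print("component 0 (d, d'):", per_d[0], per_d_prime[0])
print("total risk  (d, d'):", sum(per_d), sum(per_d_prime))
```

With this choice of `theta`, the first component is estimated worse by $d'$ than by $d$, while the total over all ten components is lower for $d'$: the reduction in risk is a property of the whole vector, not of any single coordinate.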


External resources

  • Stein, Charles (1956). Inadmissibility of the usual estimator for the mean of a multivariate distribution. Proc. Third Berkeley Symp. Math. Statist. Prob. 1, 197-206.
  • Lehmann, E. L., and Casella, George (1998). Theory of Point Estimation, 2nd ed., ch. 5.

The Wikipedia article included on this page is licensed under the GFDL.