Law of large numbers

The law of large numbers (LLN) is any of several theorems in probability.


Roughly speaking, the law of large numbers can be stated as follows. Consider an experiment, whose outcome is a random variable, that is performed repeatedly, the outcomes on different repetitions being independent. In one example, a die is thrown repeatedly, the outcome each time being one of the six numbers 1, 2, 3, 4, 5, or 6. The law of large numbers says that as the number of independent repetitions of the experiment grows, the average of the observed outcomes approaches the expected value, which for equally likely outcomes is the average of all possible outcomes. In the example of the die, the average of all possible outcomes is just the average of those six numbers:

(1 + 2 + 3 + 4 + 5 + 6)/6 = 3.5.

As the die is thrown repeatedly, the average observed outcome approaches 3.5, in the sense that it can be made as close as desired to 3.5 by making the number of trials big enough.
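
The die example can be tried out with a short simulation. The following is only a minimal sketch in Python: the function name average_of_die_rolls and the fixed seed are illustrative assumptions, not part of the article, and a pseudo-random generator stands in for a physical die.

```python
import random

def average_of_die_rolls(num_rolls, seed=0):
    """Return the average of num_rolls simulated throws of a fair die."""
    rng = random.Random(seed)  # fixed seed (an assumption) so runs are repeatable
    total = 0
    for _ in range(num_rolls):
        total += rng.randint(1, 6)  # one throw: an integer from 1 to 6 inclusive
    return total / num_rolls

# The averages for larger numbers of throws should tend to lie closer to 3.5.
for n in (10, 1_000, 100_000):
    print(n, average_of_die_rolls(n))
```

Any single run can still stray from 3.5; the law only says that large deviations become increasingly unlikely as the number of throws grows.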


The law of large numbers takes a simple form for a random variable that has two possible outcomes, say failure and success. If failure is assigned a value of 0 and success a value of 1, then the population mean equals the probability of success p, and the observed average value (sample mean) equals the observed proportion of successes. For such a two-valued (Bernoulli) random variable the LLN says:

If an event of probability p is observed repeatedly during independent repetitions, the ratio of the observed frequency of that event to the total number of repetitions converges towards p as the number of repetitions becomes arbitrarily large.

More simply, as an experiment is repeated over and over, the observed proportion of successes approaches the proportion in the whole population being sampled. For example, if a fair die is thrown repeatedly, the proportion of times that an ace (i.e. a "1") appears approaches 1/6 as the number of tosses grows.


The proposition, as it is used in both probability and statistics, has practical applications in science, economics, agriculture, production, business, and other important activities. For example, if we do not know the probability of some natural event (say the chance that it will rain), or if we do not know the fraction of some population that meets some condition (such as how many defective parts were produced on an assembly line), we can estimate that probability or that percentage through sufficiently numerous observations and experiments.


Origins of the term

Jacob Bernoulli first described the law of large numbers as so simple that even the stupidest man instinctively knows it is true.[1] Despite this, it took him over 20 years to develop a sufficiently rigorous mathematical proof, which was published in Ars Conjectandi (The Art of Conjecturing) in 1713. He named this his "Golden Theorem", but it became generally known as "Bernoulli's Theorem". In 1835, S.D. Poisson further described it under the name "La loi des grands nombres" (The law of large numbers).[2] Thereafter, it was known under both names, but the "Law of large numbers" is most frequently used.


After Bernoulli and Poisson published their efforts, other mathematicians also contributed to refinement of the law, including Chebyshev, Markov, Borel, Cantelli, Kolmogorov, Vapnik and Chervonenkis. These further studies have given rise to two prominent forms of the law of large numbers. One is called the "weak" law and the other the "strong" law. These forms do not describe different laws but instead refer to different ways of describing the convergence of the observed or measured probability to the actual probability.


Probability

The law of large numbers is called "the first fundamental theorem of probability". It was derived by analysis of games of chance, such as the drawing of lots or the casting of dice, which are governed by probability. For example, a fair, six-sided die may come up "1", "2", "3", "4", "5" or "6" dots on a single throw, and if these dots are counted as numbers, it is possible to calculate the value of an "average" roll.


On a fair die each face has probability 1/6, so over many rolls about one roll in six will result in a "1". Likewise, about one roll in six will result in "2", and so on through all 6 possible rolls. Weighting each face value by its probability gives:

\frac{1}{6} \times 1 + \frac{1}{6} \times 2 + \frac{1}{6} \times 3 + \frac{1}{6} \times 4 + \frac{1}{6} \times 5 + \frac{1}{6} \times 6 = \frac{1+2+3+4+5+6}{6} = \frac{21}{6} = 3.5

Of course, there is no single side of the die that has 3.5 dots, and so, no single roll of the die will result in a value of "3.5". But after a large number of rolls are recorded, the average score of all rolls will approach 3.5.
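
The weighted sum for the "average" roll can be checked with exact rational arithmetic. This is a small illustrative sketch; the variable names are assumptions chosen for readability.

```python
from fractions import Fraction

faces = range(1, 7)            # the six face values of the die
probability = Fraction(1, 6)   # each face is equally likely on a fair die

# Expected value: each face value weighted by its probability.
expected_value = sum(probability * face for face in faces)

print(expected_value)          # 7/2
print(float(expected_value))   # 3.5
```

Using Fraction avoids floating-point rounding, so the result is exactly 21/6 reduced to 7/2.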


Furthermore, as the number of rolls grows, the ratio of the number of times a particular result occurs ("1", "2", "3", "4", "5" or "6") to the total number of rolls will approach 1/6.


Misunderstanding this law may lead to the belief that if an event has not occurred in many trials, the probability of it occurring in a subsequent trial is increased. For example, the probability of a fair die turning up a 3 is 1 in 6. The LLN says that over a large number of throws, the observed frequency of 3s will be close to 1 in 6 (16 2/3%). This, however, does not mean that if the first 5 throws of the die do not turn up a 3, the sixth throw is more likely to produce a 3. Each roll is independent, the probability of rolling a 3 remains exactly the same from roll to roll, and the value of any one individual observation cannot be predicted from past observations. Such erroneous predictions are known as the gambler's fallacy.
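
The independence of successive throws can be illustrated empirically. The sketch below is a simulation under stated assumptions (a pseudo-random generator in place of a die, a fixed seed, and the hypothetical function name frequency_of_three_after_drought): it looks only at throws that come directly after five or more throws without a 3 and reports how often those throws are a 3.

```python
import random

def frequency_of_three_after_drought(num_throws, drought=5, seed=1):
    """Share of 3s among throws preceded by at least `drought` non-3 throws."""
    rng = random.Random(seed)
    run_without_three = 0  # length of the current streak of non-3 throws
    eligible = 0           # throws that follow a long enough streak
    hits = 0               # eligible throws that came up 3
    for _ in range(num_throws):
        throw = rng.randint(1, 6)
        if run_without_three >= drought:
            eligible += 1
            if throw == 3:
                hits += 1
        # Update the streak after the current throw has been counted.
        run_without_three = 0 if throw == 3 else run_without_three + 1
    return hits / eligible

print(frequency_of_three_after_drought(300_000))  # compare with 1/6 = 0.1666...
```

If a 3 were "due" after a drought, the reported share would sit noticeably above 1/6; in a fair simulation it should not.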


The "law of large numbers" is sometimes invoked to refer to the notion that even very improbable events may occur when a sufficiently large number of instances are given.


Statistics

The law of large numbers was derived through analysis of probability. Statistics builds on probability theory, and in statistics the law of large numbers means that a large sample is more likely than a smaller sample to have the characteristics of the whole population.


To illustrate, picture a water bottling plant producing 10,000 bottles of water a day. The plant manager measures the volume of water in a large number (say 200) of the bottles produced that day, and finds that the average is 0.997 liters. In this case, the plant manager may conclude that the average volume of all bottles produced that day is close to that figure, that is, not quite 1 liter.
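
The bottling example can be mimicked in code. Everything numerical below apart from the counts 10,000 and 200 is a made-up assumption for illustration: the fill volumes are drawn from a normal distribution with mean 0.997 liters and standard deviation 0.01 liters, which the article does not specify.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed (an assumption) for repeatability

# Hypothetical day's production: 10,000 fill volumes in liters.
population = [rng.gauss(0.997, 0.01) for _ in range(10_000)]

# The manager measures 200 bottles chosen at random, without replacement.
sample = rng.sample(population, 200)

population_mean = statistics.fmean(population)
sample_mean = statistics.fmean(sample)
print(population_mean, sample_mean)
```

Under these assumptions the two means usually agree to within a few milliliters, which is why the sample of 200 is informative about the whole day's output.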


Forms and proofs

The weak law of large numbers

The weak law of large numbers states that if X_1, X_2, X_3, ... is an infinite sequence of random variables, where all the random variables have the same expected value μ and variance σ², and are uncorrelated (i.e., the correlation between any two of them is zero), then the sample average

\overline{X}_n = (X_1 + \cdots + X_n)/n

converges in probability to μ.


Or, somewhat more tersely:


For any positive number ε, no matter how small, we have

\lim_{n\rightarrow\infty}\operatorname{P}\left(\left|\overline{X}_n-\mu\right|<\varepsilon\right)=1.

Proof

Chebyshev's inequality is used to prove this result. Finite variance Var(X_i) = σ² (for all i) and no correlation yield that

\operatorname{Var}(\overline{X}_n) = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}.

The common mean μ of the sequence is the mean of the sample average:

E(\overline{X}_n) = \mu.

Using Chebyshev's inequality on the sample average \overline{X}_n results in

\operatorname{P}\left( \left| \overline{X}_n-\mu \right| \geq \varepsilon \right) \leq \frac{\sigma^2}{n\varepsilon^2}.

This may be used to obtain the following:

\operatorname{P}\left( \left| \overline{X}_n-\mu \right| < \varepsilon \right) = 1 - \operatorname{P}\left( \left| \overline{X}_n-\mu \right| \geq \varepsilon \right) \geq 1 - \frac{\sigma^2}{n\varepsilon^2}.

As n approaches infinity, the expression approaches 1.


Proof ends here
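
To make the Chebyshev bound from the proof concrete, it can be evaluated for the fair die. The sketch below is an illustration only; the helper names chebyshev_lower_bound and throws_needed are hypothetical, and the inputs are passed as exact fractions to avoid rounding.

```python
from fractions import Fraction
import math

# Mean and variance of a single throw of a fair die (variance is 35/12).
mean = Fraction(7, 2)
variance = sum(Fraction(1, 6) * (face - mean) ** 2 for face in range(1, 7))

def chebyshev_lower_bound(n, eps):
    """Lower bound 1 - sigma^2 / (n * eps^2) on P(|sample mean - mu| < eps)."""
    return 1 - variance / (n * eps ** 2)

def throws_needed(eps, confidence):
    """Smallest n for which the Chebyshev lower bound reaches `confidence`."""
    return math.ceil(variance / ((1 - confidence) * eps ** 2))

print(chebyshev_lower_bound(1000, Fraction(1, 10)))        # 17/24
print(throws_needed(Fraction(1, 10), Fraction(95, 100)))   # 5834
```

The bound is valid but usually far from tight: it guarantees that 5834 throws suffice for the average to be within 0.1 of 3.5 with probability at least 95%, while in practice fewer throws are typically enough.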


The result also holds in the 'infinite variance' case, provided the X_i are mutually independent and their (finite) mean μ exists.


A consequence of the weak law of large numbers is the asymptotic equipartition property.


The strong law of large numbers

The strong law of large numbers states that if X_1, X_2, X_3, ... is an infinite sequence of random variables that are pairwise independent and identically distributed with

E(X_i) = \mu \quad \mbox{ and } \quad E(|X_i|) < \infty,

then

\operatorname{P}\left(\lim_{n\rightarrow\infty}\overline{X}_n=\mu\right)=1,

i.e., the sample average converges almost surely to μ.


If we replace the finite expectation condition with a finite second moment condition, E(X_i^2) < ∞ (which is the same as assuming that X_i has finite variance), then we obtain both almost sure convergence and convergence in mean square. In either case, these conditions also imply the consequent weak law of large numbers, since almost sure convergence implies convergence in probability (as, indeed, does convergence in mean square).


This law justifies the intuitive interpretation of the expected value of a random variable as the "long-term average when sampling repeatedly".


A weaker law and proof

Proofs of the above weak and strong laws of large numbers are rather involved. The consequent of the slightly weaker form below is implied by the weak law above (since convergence in distribution is implied by convergence in probability), but has a simpler proof.


Theorem. Let X_1, X_2, X_3, ... be a sequence of random variables, independent and identically distributed with finite common mean μ, and define the partial sum S_n := X_1 + X_2 + ... + X_n. Then S_n / n converges in distribution to μ.


Proof. (See [1], p. 174) By Taylor's theorem for complex functions, the characteristic function of any random variable, X, with finite mean μ, can be written as

\varphi(t) = 1 + it\mu + o(t), \quad t \rightarrow 0.

Then, since the characteristic function of the sum of independent random variables is the product of their characteristic functions, the characteristic function of S_n / n is

\left[\varphi\left({t \over n}\right)\right]^n = \left[1 + i\mu{t \over n} + o\left({t \over n}\right)\right]^n \, \rightarrow \, e^{it\mu}, \quad \textrm{as} \quad n \rightarrow \infty.

The limit e^{itμ} is the characteristic function of the constant random variable μ, and hence by the Lévy continuity theorem, S_n / n converges in distribution to μ. Note that the proof of the central limit theorem, which tells us more about the convergence of the average to μ (when the variance σ² is finite), follows a very similar approach.
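
The convergence of the characteristic functions in this proof can be observed numerically for the fair die, whose mean is 3.5. The following is a minimal sketch; the function names and the choice t = 1.0 are assumptions made for illustration.

```python
import cmath

def die_characteristic_function(t):
    """phi(t) = E[exp(i t X)] for one throw X of a fair die."""
    return sum(cmath.exp(1j * t * face) for face in range(1, 7)) / 6

def averaged_characteristic_function(t, n):
    """Characteristic function of S_n / n, namely [phi(t / n)]^n."""
    return die_characteristic_function(t / n) ** n

def gap_to_limit(t, n, mu=3.5):
    """Distance between [phi(t / n)]^n and the limit exp(i t mu)."""
    return abs(averaged_characteristic_function(t, n) - cmath.exp(1j * t * mu))

# The gap should shrink towards 0 as n grows.
for n in (1, 10, 100, 10_000):
    print(n, gap_to_limit(1.0, n))
```

This checks only one value of t; the continuity theorem requires the pointwise limit for every real t, which the Taylor expansion in the proof supplies.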


Applications and activities

This collection of hands-on interactive activities demonstrates both the meaning of the LLN and its practical ramifications. The activity consists of several exercises, built on SOCR experiments, that show the asymptotic behavior of different stochastic processes, where one studies the empirical and theoretical properties of the underlying distributions. It also illustrates the two main misconceptions about the LLN in practical terms.


References

  1. ^ Jakob Bernoulli, Ars Conjectandi: Usum & Applicationem Praecedentis Doctrinae in Civilibus, Moralibus & Oeconomicis, 1713, Chapter 4 (translated into English by Oscar Sheynin).
  2. ^ Hacking, Ian. (1983) "19th-century Cracks in the Concept of Determinism"
  • Grimmett, G. R. and Stirzaker, D. R. (1992). Probability and Random Processes, 2nd Edition. Clarendon Press, Oxford. ISBN 0-19-853665-8. 
  • Richard Durrett (1995). Probability: Theory and Examples, 2nd Edition. Duxbury Press. 

See also

  • Gambler's fallacy
  • Law of averages

The Wikipedia article included on this page is licensed under the GFDL.