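Normal
Probability density function
Probability density function for the Normal distribution
The green line is the standard normal distribution
Cumulative distribution function
Cumulative distribution function for the Normal distribution
Colors match the pdf above
Parameters    μ location (real)
              σ² > 0 squared scale (real)
Support       $x \in (-\infty, +\infty)$
pdf           $\frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$
cdf           $\frac{1}{2}\left[1 + \operatorname{erf}\left(\frac{x-\mu}{\sigma\sqrt{2}}\right)\right]$
Mean          μ
Median        μ
Mode          μ
Variance      σ²
Skewness      0
Kurtosis      0 (excess kurtosis)
Entropy       $\ln\left(\sigma\sqrt{2\pi e}\right)$
mgf           $\exp\left(\mu t + \frac{\sigma^2 t^2}{2}\right)$
Char. func.   $\exp\left(i\mu t - \frac{\sigma^2 t^2}{2}\right)$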

The normal distribution, also called Gaussian distribution, is an extremely important probability distribution in many fields, especially in physics and engineering. It is a family of distributions of the same general form, differing in their location and scale parameters: the mean ("average") and standard deviation ("variability"), respectively. The standard normal distribution is the normal distribution with a mean of zero and a standard deviation of one (the green curves in the plots to the right). It is often called the bell curve because the graph of its probability density resembles a bell.


Overview

The normal distribution is a convenient model of quantitative phenomena in the natural and behavioral sciences. A variety of psychological test scores and physical phenomena like photon counts have been found to approximately follow a normal distribution. While the underlying causes of these phenomena are often unknown, the use of the normal distribution can be theoretically justified in situations where many small effects are added together into a score or variable that can be observed. The normal distribution also arises in many areas of statistics: for example, the sampling distribution of the mean is approximately normal, even if the distribution of the population the sample is taken from is not normal. In addition, the normal distribution maximizes information entropy among all distributions with known mean and variance, which makes it the natural choice of underlying distribution for data summarized in terms of sample mean and variance. The normal distribution is the most widely used family of distributions in statistics and many statistical tests are based on the assumption of normality. In probability theory, normal distributions arise as the limiting distributions of several continuous and discrete families of distributions.


History

The normal distribution was first introduced by de Moivre in an article in 1733 (reprinted in the second edition of his The Doctrine of Chances, 1738) in the context of approximating certain binomial distributions for large n. His result was extended by Laplace in his book Analytical Theory of Probabilities (1812), and is now called the Theorem of de Moivre-Laplace.


Laplace used the normal distribution in the analysis of errors of experiments. The important method of least squares was introduced by Legendre in 1805. Gauss, who claimed to have used the method since 1794, justified it rigorously in 1809 by assuming a normal distribution of the errors.


The name "bell curve" goes back to Jouffret who used the term "bell surface" in 1872 for a bivariate normal with independent components. The name "normal distribution" was coined independently by Charles S. Peirce, Francis Galton and Wilhelm Lexis around 1875. This terminology is unfortunate, since it reflects and encourages the fallacy that many or all probability distributions are "normal". (See the discussion of "occurrence" below.)


That the distribution is called the normal or Gaussian distribution is an instance of Stigler's law of eponymy: "No scientific discovery is named after its original discoverer."


Specification of the normal distribution

There are various ways to specify a random variable. The most visual is the probability density function (plot at the top), which represents how likely each value of the random variable is. The cumulative distribution function is a conceptually cleaner way to specify the same information, but to the untrained eye its plot is much less informative (see below). Equivalent ways to specify the normal distribution are: the moments, the cumulants, the characteristic function, the moment-generating function, and the cumulant-generating function. Some of these are very useful for theoretical work, but not intuitive. See probability distribution for a discussion.


All of the cumulants of the normal distribution are zero, except the first two.


Probability density function

Probability density function for 4 different parameter sets (green line is the standard normal)

The probability density function of the normal distribution with mean μ and variance σ² (equivalently, standard deviation σ) is an example of a Gaussian function,

$$f(x;\mu,\sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right).$$

(See also exponential function and pi.)


If a random variable X has this distribution, we write X ~ N(μ, σ²). If μ = 0 and σ = 1, the distribution is called the standard normal distribution and the probability density function reduces to

$$f(x) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{x^2}{2}\right).$$

The image to the right gives the graph of the probability density function of the normal distribution for various parameter values.


Some notable qualities of the normal distribution:

  • The density function is symmetric about its mean value.
  • The mean is also its mode and median.
  • 68.27% of the area under the curve is within one standard deviation of the mean.
  • 95.45% of the area is within two standard deviations.
  • 99.73% of the area is within three standard deviations.
  • The inflection points of the curve occur at one standard deviation away from the mean.
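
The area percentages in the list above can be checked numerically. The following is a minimal sketch, assuming only the standard identity P(|X − μ| ≤ kσ) = erf(k/√2); the function name `prob_within` is chosen here for illustration.

```python
import math


def prob_within(k: float) -> float:
    """Probability that a normal variable lies within k standard
    deviations of its mean: erf(k / sqrt(2)), independent of mu and sigma."""
    return math.erf(k / math.sqrt(2.0))


# Print the area within one, two and three standard deviations.
for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {100 * prob_within(k):.2f}%")
```

Because the result does not depend on μ or σ, the same three numbers apply to every member of the normal family.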


Cumulative distribution function

Cumulative distribution function of the above pdf

The cumulative distribution function (cdf) is defined as the probability that a variable X has a value less than or equal to x, and it is expressed in terms of the density function as

$$F(x;\mu,\sigma) = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{x} \exp\left(-\frac{(u-\mu)^2}{2\sigma^2}\right) du.$$

The standard normal cdf, conventionally denoted Φ, is just the general cdf evaluated with μ = 0 and σ = 1,

$$\Phi(x) = F(x;0,1) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} \exp\left(-\frac{u^2}{2}\right) du.$$

The standard normal cdf can be expressed in terms of a special function called the error function, as

$$\Phi(z) = \frac{1}{2}\left[1 + \operatorname{erf}\left(\frac{z}{\sqrt{2}}\right)\right].$$

The inverse cumulative distribution function, or quantile function, can be expressed in terms of the inverse error function:

$$\Phi^{-1}(p) = \sqrt{2}\,\operatorname{erf}^{-1}(2p - 1).$$
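
As an illustration of the cdf and quantile function, here is a minimal sketch using the relation Φ(z) = ½[1 + erf(z/√2)]. Python's standard library has `math.erf` but no inverse error function, so the sketch assumes `statistics.NormalDist.inv_cdf` as a stand-in for the quantile function; the names `phi` and `quantile` are illustrative.

```python
import math
from statistics import NormalDist


def phi(z: float) -> float:
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def quantile(p: float) -> float:
    """Standard normal quantile function for 0 < p < 1.
    Delegates to the standard library instead of an explicit erf inverse."""
    return NormalDist(0.0, 1.0).inv_cdf(p)


# Round trip: the quantile function undoes the cdf.
p = phi(1.96)
print(p, quantile(p))
```

The round trip at the end shows the two functions acting as inverses of each other, up to floating-point error.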

There is no elementary primitive for f(x) above. This is not to say merely that none is known, but rather that the non-existence of such a function has been proved.


Values of Φ(x) may be approximated very accurately by a variety of methods, such as by use of the Taylor series, or by geometric means.


Generating functions

Moment generating function

The moment generating function is defined as the expected value of exp(tX). For a normal distribution, it can be shown that the moment generating function is

$$M_X(t) = E\left[e^{tX}\right] = \int_{-\infty}^{\infty} \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) e^{tx}\,dx = \exp\left(\mu t + \frac{\sigma^2 t^2}{2}\right),$$

as can be seen by completing the square in the exponent.
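
The completing-the-square step can be written out as follows. This is a sketch of the standard derivation, using the same symbols μ, σ and t as the text; the shifted mean μ + σ²t is the only new quantity.

```latex
\begin{aligned}
tx - \frac{(x-\mu)^2}{2\sigma^2}
  &= -\frac{\bigl(x - (\mu + \sigma^2 t)\bigr)^2}{2\sigma^2}
     + \mu t + \frac{\sigma^2 t^2}{2}, \\[4pt]
E\left[e^{tX}\right]
  &= \exp\left(\mu t + \frac{\sigma^2 t^2}{2}\right)
     \int_{-\infty}^{\infty} \frac{1}{\sigma\sqrt{2\pi}}
     \exp\left(-\frac{\bigl(x - (\mu + \sigma^2 t)\bigr)^2}{2\sigma^2}\right) dx \\[4pt]
  &= \exp\left(\mu t + \frac{\sigma^2 t^2}{2}\right).
\end{aligned}
```

The remaining integral equals 1 because its integrand is the density of a normal distribution with mean μ + σ²t and variance σ².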


Characteristic function

The characteristic function is defined as the expected value of exp(itX), where i is the imaginary unit. For a normal distribution, the characteristic function is

$$\varphi_X(t) = E\left[e^{itX}\right] = \exp\left(i\mu t - \frac{\sigma^2 t^2}{2}\right).$$

The characteristic function is obtained by replacing t with it in the moment-generating function.


Properties

Some of the properties of the normal distribution:

  1. If $X \sim N(\mu, \sigma^2)$ and a and b are real numbers, then $aX + b \sim N(a\mu + b, (a\sigma)^2)$ (see expected value and variance).
  2. If $X \sim N(\mu_X, \sigma_X^2)$ and $Y \sim N(\mu_Y, \sigma_Y^2)$ are independent normal random variables, then:
    • Their sum is normally distributed with $U = X + Y \sim N(\mu_X + \mu_Y, \sigma_X^2 + \sigma_Y^2)$.
    • Their difference is normally distributed with $V = X - Y \sim N(\mu_X - \mu_Y, \sigma_X^2 + \sigma_Y^2)$.
    • U and V are independent of each other if and only if $\sigma_X^2 = \sigma_Y^2$, since $\operatorname{Cov}(U, V) = \sigma_X^2 - \sigma_Y^2$.
  3. If $X \sim N(0, \sigma_X^2)$ and $Y \sim N(0, \sigma_Y^2)$ are independent normal random variables, then:
    • Their product XY has the density $p(z) = \frac{1}{\pi \sigma_X \sigma_Y} K_0\left(\frac{|z|}{\sigma_X \sigma_Y}\right)$, where $K_0$ is a modified Bessel function.
    • Their ratio follows a Cauchy distribution with $X/Y \sim \mathrm{Cauchy}(0, \sigma_X/\sigma_Y)$.
  4. If $X_1, \ldots, X_n$ are independent standard normal variables, then $X_1^2 + \cdots + X_n^2$ has a chi-squared distribution with n degrees of freedom.

Standardizing normal random variables

As a consequence of Property 1, it is possible to relate all normal random variables to the standard normal.


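If X ~ N(μ, σ²), then

$$Z = \frac{X - \mu}{\sigma}$$

is a standard normal random variable: Z ~ N(0,1). An important consequence is that the cdf of a general normal distribution is therefore

$$\Pr(X \le x) = \Phi\left(\frac{x-\mu}{\sigma}\right) = \frac{1}{2}\left[1 + \operatorname{erf}\left(\frac{x-\mu}{\sigma\sqrt{2}}\right)\right].$$

Conversely, if Z ~ N(0,1), then

X = σZ + μ

is a normal random variable with mean μ and variance σ².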


The standard normal distribution has been tabulated, and the other normal distributions are simple transformations of the standard one. Therefore, one can use tabulated values of the cdf of the standard normal distribution to find values of the cdf of a general normal distribution.
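
The table-lookup procedure described above can be sketched in code, with `math.erf` standing in for the printed table of the standard normal cdf. The function names and the example values μ = 12, σ = 3 are illustrative assumptions.

```python
import math


def std_normal_cdf(z: float) -> float:
    """Plays the role of the tabulated standard normal cdf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """cdf of N(mu, sigma^2), obtained by standardizing x first."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma  # standardize: Z = (X - mu) / sigma
    return std_normal_cdf(z)


# P(X <= 15) for X ~ N(12, 3^2) is the standard normal cdf at z = 1.
print(normal_cdf(15.0, 12.0, 3.0))
```

Only the standard normal cdf is ever evaluated; the general case reduces to it through the single subtraction and division in `normal_cdf`.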


Moments

Some of the first few moments of the normal distribution are:

Number   Raw moment            Central moment   Cumulant
0        1                     1                0
1        μ                     0                μ
2        μ² + σ²               σ²               σ²
3        μ³ + 3μσ²             0                0
4        μ⁴ + 6μ²σ² + 3σ⁴      3σ⁴              0

All cumulants of the normal distribution beyond the second are zero.
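
The raw-moment formulas for orders 0 to 4 (1, μ, μ² + σ², μ³ + 3μσ², μ⁴ + 6μ²σ² + 3σ⁴) can be checked by numerical integration of x^k against the density. The sketch below uses a simple midpoint rule; the function names, the integration window of 12 standard deviations, and the test values μ = 1, σ = 2 are all assumptions made for illustration.

```python
import math


def pdf(x: float, mu: float, sigma: float) -> float:
    """Normal probability density with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def raw_moment(k: int, mu: float, sigma: float,
               half_width: float = 12.0, steps: int = 20000) -> float:
    """Midpoint-rule estimate of E[X^k] over mu +/- half_width * sigma."""
    a = mu - half_width * sigma
    h = 2.0 * half_width * sigma / steps
    total = 0.0
    for i in range(steps):
        x = a + (i + 0.5) * h
        total += x ** k * pdf(x, mu, sigma)
    return total * h


mu, sigma = 1.0, 2.0
closed_form = {
    0: 1.0,
    1: mu,
    2: mu**2 + sigma**2,
    3: mu**3 + 3 * mu * sigma**2,
    4: mu**4 + 6 * mu**2 * sigma**2 + 3 * sigma**4,
}

# Compare the numerical estimate with the closed-form raw moment.
for k, value in closed_form.items():
    print(k, raw_moment(k, mu, sigma), value)
```

The two printed columns should agree closely for each order, since the density is negligible outside the integration window.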


Generating normal random variables

For computer simulations, it is often useful to generate values that have a normal distribution. There are several methods and the most basic is to invert the standard normal cdf. More efficient methods are also known, one such method being the Box-Muller transform.
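
The inversion method can be sketched as follows: draw a uniform value u in (0, 1) and return the quantile at u. This is a minimal illustration that assumes Python's `statistics.NormalDist.inv_cdf` as the inverse cdf; the function name `sample_by_inversion` and its parameters are hypothetical.

```python
import random
from statistics import NormalDist, fmean, stdev


def sample_by_inversion(n, mu=0.0, sigma=1.0, seed=None):
    """Draw n values from N(mu, sigma^2) by inverting the cdf."""
    rng = random.Random(seed)
    dist = NormalDist(mu, sigma)
    samples = []
    while len(samples) < n:
        u = rng.random()      # uniform on [0, 1)
        if u > 0.0:           # the inverse cdf is not defined at exactly 0
            samples.append(dist.inv_cdf(u))
    return samples


xs = sample_by_inversion(10000, mu=5.0, sigma=2.0, seed=1)
print(fmean(xs), stdev(xs))  # should be close to 5 and 2
```

Inversion uses exactly one uniform value per normal value, which is convenient when the uniform stream must be controlled, but each draw costs an evaluation of the inverse cdf.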


The Box-Muller transform takes two uniformly distributed values as input and maps them to two normally distributed values. This requires generating values from a uniform distribution, for which many methods are known. See also random number generators.


The Box-Muller transform is a consequence of the fact that the chi-square distribution with two degrees of freedom (see property 4 above) is an easily-generated exponential random variable.
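
A minimal sketch of the basic (trigonometric) form of the Box-Muller transform follows. The squared radius −2 ln(u₁) is exponentially distributed with mean 2, which is the chi-square distribution with two degrees of freedom mentioned above. The function name `box_muller` and the use of `random.Random` as the uniform source are assumptions for illustration.

```python
import math
import random


def box_muller(rng: random.Random):
    """Map two independent uniform values to two independent
    standard normal values."""
    u1 = 1.0 - rng.random()   # lies in (0, 1], so log(u1) is finite
    u2 = rng.random()
    r = math.sqrt(-2.0 * math.log(u1))   # r^2 is exponential with mean 2
    theta = 2.0 * math.pi * u2           # uniform angle on [0, 2*pi)
    return r * math.cos(theta), r * math.sin(theta)


rng = random.Random(42)
z1, z2 = box_muller(rng)
print(z1, z2)
```

A value from N(μ, σ²) is then obtained as σz + μ for either output z.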


The central limit theorem

Plot of the pdf of a normal distribution with μ = 12 and σ = 3, approximating the pmf of a binomial distribution with n = 48 and p = 1/4

The normal distribution has the very important property that under certain conditions, the distribution of a sum of a large number of independent variables is approximately normal. This is the central limit theorem.


The practical importance of the central limit theorem is that the normal distribution can be used as an approximation to some other distributions.

  • A binomial distribution with parameters n and p is approximately normal for large n and p not too close to 1 or 0 (some books recommend using this approximation only if np and n(1 − p) are both at least 5; in this case, a continuity correction should be applied). The approximating normal distribution has mean μ = np and variance σ² = np(1 − p).
  • A Poisson distribution with parameter λ is approximately normal for large values of λ. The approximating normal distribution has mean μ = λ and variance σ² = λ.
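
The binomial case can be illustrated with the parameters from the figure caption, n = 48 and p = 1/4, for which μ = np = 12 and σ = 3. The sketch below compares the exact binomial cdf with the normal approximation, with and without the continuity correction (evaluating the normal cdf at k + 0.5 instead of k). The function names are illustrative.

```python
import math


def binom_cdf(k: int, n: int, p: float) -> float:
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, j) * p**j * (1 - p) ** (n - j)
               for j in range(k + 1))


def normal_approx_cdf(k: int, n: int, p: float,
                      continuity: bool = True) -> float:
    """Normal approximation to P(X <= k), using mu = n*p and
    sigma^2 = n*p*(1 - p)."""
    mu = n * p
    sigma = math.sqrt(n * p * (1 - p))
    x = k + 0.5 if continuity else float(k)
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


n, p, k = 48, 0.25, 12
print("exact:             ", binom_cdf(k, n, p))
print("with correction:   ", normal_approx_cdf(k, n, p))
print("without correction:", normal_approx_cdf(k, n, p, continuity=False))
```

For these parameters the corrected approximation is noticeably closer to the exact value than the uncorrected one, which is why the correction is recommended for moderate n.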


Whether these approximations are sufficiently accurate depends on the purpose for which they are needed, and the rate of convergence to the normal distribution. It is typically the case that such approximations are less accurate in the tails of the distribution.


Infinite divisibility

The normal distributions are infinitely divisible probability distributions.


Standard deviation

Dark blue is less than one standard deviation from the mean. For the normal distribution, this accounts for 68% of the set while two standard deviations from the mean (blue and brown) account for 95% and three standard deviations (blue, brown and green) account for 99.7%.

In practice, one often assumes that data are from an approximately normally distributed population. If that assumption is justified, then about 68% of the values lie within one standard deviation of the mean, about 95% lie within two standard deviations, and about 99.7% lie within three standard deviations. This is known as the "68-95-99.7 rule".


Related distributions

  • R has a Rayleigh distribution with parameter σ if R = √(X² + Y²), where X ~ N(0, σ²) and Y ~ N(0, σ²) are independent.
  • Y has a chi-square distribution with n degrees of freedom if Y = X₁² + ... + Xₙ², where X₁, ..., Xₙ are independent standard normal variables.
  • Y has a Cauchy distribution with location 0 and scale 1 if Y = X₁/X₂, where X₁ and X₂ are independent standard normal variables.
  • Y has a log-normal distribution if Y = exp(X) and X ~ N(μ, σ²).

Occurrence

Approximately normal distributions occur in many situations, as a result of the central limit theorem. When there is reason to suspect the presence of a large number of small effects acting additively and independently, it is reasonable to assume that observations will be approximately normal. There are statistical methods to empirically test that assumption, for example the Kolmogorov-Smirnov test.
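
The additive mechanism can be illustrated with a small simulation. This is only a sketch under stated assumptions: each "small effect" is taken to be an independent uniform variable on [0, 1] (an arbitrary choice), and the function name standardized_sums and the sample sizes are illustrative.

```python
import random
import statistics

def standardized_sums(n_terms: int, n_samples: int, seed: int = 0) -> list:
    """Sums of n_terms independent uniform(0, 1) effects, centred and scaled to unit variance."""
    rng = random.Random(seed)
    mean = n_terms * 0.5                # mean of the sum
    sd = (n_terms / 12.0) ** 0.5        # standard deviation of the sum
    return [
        (sum(rng.random() for _ in range(n_terms)) - mean) / sd
        for _ in range(n_samples)
    ]

sums = standardized_sums(n_terms=30, n_samples=20000)
# Fraction of standardized sums within one standard deviation of the mean.
frac_within_1sd = sum(abs(z) <= 1.0 for z in sums) / len(sums)
print(round(statistics.fmean(sums), 3), round(frac_within_1sd, 3))
```

If the sums are close to normal, the printed fraction should be close to the 68% expected for a normal variable, even though the individual effects are uniform rather than normal.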


Effects can also act as multiplicative (rather than additive) modifications. In that case, the assumption of normality is not justified, and it is the logarithm of the variable of interest that is normally distributed. The distribution of the directly observed variable is then called log-normal.
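
A rough simulation of the multiplicative case is sketched below. The factor range (0.8 to 1.25), the number of factors and the helper name skewness are assumptions made for illustration only. A normal distribution has skewness 0, so a clearly positive skewness for the raw product and a skewness near zero for its logarithm is what the paragraph above would lead one to expect.

```python
import math
import random
import statistics

def skewness(xs: list) -> float:
    """Sample skewness: mean of the cubed standardized deviations."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return statistics.fmean([((x - m) / s) ** 3 for x in xs])

rng = random.Random(1)
products = []
for _ in range(5000):
    value = 1.0
    for _ in range(50):
        # Each effect multiplies the current value by a random factor.
        value *= rng.uniform(0.8, 1.25)
    products.append(value)

raw_skew = skewness(products)
log_skew = skewness([math.log(p) for p in products])
print(round(raw_skew, 2), round(log_skew, 2))
```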


Finally, if there is a single external influence which has a large effect on the variable under consideration, the assumption of normality is not justified either. This is true even if, when the external variable is held constant, the resulting conditional distributions are indeed normal. The full distribution will be a superposition of normal variables, which is not in general normal. This is related to the theory of errors (see below).
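
As a small worked case, consider a hypothetical external influence that shifts the mean to either -a or +a with equal probability, while the variable is N(±a, 1) once the influence is fixed. The sketch below uses the central moments of that equal-weight mixture (variance 1 + a², fourth central moment 3 + 6a² + a⁴); the setup and the function name are illustrative assumptions. Since a normal distribution has excess kurtosis 0, any nonzero value shows the mixture is not normal.

```python
def mixture_excess_kurtosis(a: float) -> float:
    """Excess kurtosis of an equal-weight mixture of N(-a, 1) and N(+a, 1)."""
    variance = 1.0 + a * a                  # second central moment
    fourth = 3.0 + 6.0 * a * a + a ** 4     # fourth central moment
    return fourth / variance ** 2 - 3.0

print(mixture_excess_kurtosis(0.0))  # no external influence: a single normal
print(mixture_excess_kurtosis(2.0))  # strong external influence
```

Algebraically the expression simplifies to -2a⁴/(1 + a²)², which is zero only when a = 0.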


To summarize, here's a list of situations where approximate normality is sometimes assumed. For a fuller discussion, see below.

  • In counting problems (so the central limit theorem includes a discrete-to-continuum approximation) where reproductive random variables are involved, such as
    • Binomial random variables, associated to yes/no questions;
    • Poisson random variables, associated to rare events;
  • In physiological measurements of biological specimens:
    • The logarithm of measures of size of living tissue (length, height, skin area, weight);
    • The length of inert appendages (hair, claws, nails, teeth) of biological specimens, in the direction of growth; presumably the thickness of tree bark also falls under this category;
    • Other physiological measures may be normally distributed, but there is no reason to expect that a priori;
  • Measurement errors are assumed to be normally distributed, and any deviation from normality must be explained;
  • Financial variables
    • The logarithm of interest rates, exchange rates, and inflation; these variables behave like compound interest, not like simple interest, and so are multiplicative;
    • Stock-market indices are supposed to be multiplicative too, but some researchers claim that they are Lévy-distributed variables instead of lognormal;
    • Other financial variables may be normally distributed, but there is no reason to expect that a priori;
  • Light intensity
    • The intensity of laser light is normally distributed;
    • Thermal light has a Bose-Einstein distribution on very short time scales, and a normal distribution on longer timescales due to the central limit theorem.

Of relevance to biology and economics is the fact that complex systems tend to display power laws rather than normality.


Photon counting

Light intensity from a single source varies with time, as thermal fluctuations can be observed if the light is analyzed at sufficiently high time resolution. The intensity is usually assumed to be normally distributed. In the classical theory of optical coherence, light is modelled as an electromagnetic wave, and correlations are observed and analyzed up to the second order, consistent with the assumption of normality. (See Gaussian stochastic process.)


However, non-classical correlations are sometimes observed. Quantum mechanics interprets measurements of light intensity as photon counting. The natural assumption in this setting is the Poisson distribution. When light intensity is integrated over times longer than the coherence time and is large, the Poisson-to-normal limit is appropriate. Correlations are interpreted in terms of "bunching" and "anti-bunching" of photons with respect to the expected Poisson behaviour. Anti-bunching requires a quantum model of light emission.
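
The Poisson-to-normal limit can be made concrete by comparing the Poisson probability of each count with the density of a normal distribution that has the same mean and variance. This is a sketch only; the function names and the choice to scan counts within five standard deviations of the mean are assumptions for illustration.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """Poisson probability of observing k counts when the mean count is lam."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

def normal_pdf(x: float, mean: float, var: float) -> float:
    """Normal density with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def max_abs_gap(lam: float) -> float:
    """Largest |Poisson pmf - normal pdf| over counts within about 5 sd of the mean."""
    half_width = int(5 * math.sqrt(lam)) + 1
    lo = max(0, int(lam) - half_width)
    hi = int(lam) + half_width
    return max(abs(poisson_pmf(k, lam) - normal_pdf(k, lam, lam)) for k in range(lo, hi + 1))

for lam in (1.0, 10.0, 1000.0):
    print(lam, max_abs_gap(lam))
```

The gap shrinks as the mean count grows, which matches the statement that the normal approximation is appropriate only when the integrated intensity is large.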


Ordinary light sources producing light by thermal emission display a so-called blackbody spectrum (of intensity as a function of frequency), and the number of photons at each frequency follows a Bose-Einstein distribution (a geometric distribution). The coherence time of thermal light is exceedingly short, and so a Poisson distribution is appropriate in most cases, even when the intensity is so low as to preclude the approximation by a normal distribution.


Laser light has an exactly Poisson intensity distribution and long coherence times. The large intensities make it appropriate to use the normal distribution.


It is interesting that the classical model of light correlations applies only to laser light, which is a macroscopic quantum phenomenon. On the other hand, "ordinary" light sources do not follow the "classical" model or the normal distribution.


Measurement errors

Normality is the central assumption of the mathematical theory of errors. Similarly, in statistical model-fitting, an indicator of goodness of fit is that the residuals (as the errors are called in that setting) be independent and normally distributed. Any deviation from normality needs to be explained. In that sense, both in model-fitting and in the theory of errors, normality is the only observation that need not be explained, being expected.


Repeated measurements of the same quantity are expected to yield results which are clustered around a particular value. If all major sources of errors have been taken into account, it is assumed that the remaining error must be the result of a large number of very small additive effects, and hence normal. Deviations from normality are interpreted as indications of systematic errors which have not been taken into account.


Physical characteristics of biological specimens

The overwhelming biological evidence is that bulk growth processes of living tissue proceed by multiplicative, not additive, increments, and that therefore measures of body size should at most follow a lognormal rather than normal distribution. Despite common claims of normality, the sizes of plants and animals are approximately lognormal. The evidence and an explanation based on models of growth were first published in the classic book

Huxley, Julian: Problems of Relative Growth (1932)

Differences in size due to sexual dimorphism, or other polymorphisms like the worker/soldier/queen division in social insects, further make the joint distribution of sizes deviate from lognormality.


The assumption that linear size of biological specimens is normal leads to a non-normal distribution of weight (since weight/volume is roughly the 3rd power of length, and Gaussian distributions are only preserved by linear transformations), and conversely assuming that weight is normal leads to non-normal lengths. This is a problem, because there is no a priori reason why one of length, or body mass, and not the other, should be normally distributed. Lognormal distributions, on the other hand, are preserved by powers so the "problem" goes away if lognormality is assumed.
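
The closure of the lognormal family under powers can be written out in one line. The following is a short worked step, assuming a length $X$ whose logarithm is normal with mean $\mu$ and variance $\sigma^2$, and a weight proportional to $X^3$ with a positive constant $c$ (the symbols $X$, $c$ and the exponent 3 are illustrative):

```latex
\log X \sim N(\mu, \sigma^2)
\quad\Longrightarrow\quad
\log\left(c X^{3}\right) = \log c + 3 \log X \sim N\left(\log c + 3\mu,\; 9\sigma^{2}\right)
```

The logarithm turns the power into a linear transformation, and normal distributions are preserved by linear transformations, so $cX^3$ is again lognormal. No such step is available for $X^3$ when $X$ itself is normal.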


On the other hand, there are some biological measures where normality is assumed or expected:

  • Blood pressure of adult humans is supposed to be normally distributed, but only after separating males and females into different populations (each of which is normally distributed).
  • The length of inert appendages such as hair, nails, teeth, claws and shells is expected to be normally distributed if measured in the direction of growth. This is because the growth of inert appendages depends on the size of the root, and not on the length of the appendage, and so proceeds by additive increments. Hence, we have an example of a sum of very many small increments (possibly lognormal) approaching a normal distribution. Another plausible example is the width of tree trunks, where a new thin ring is produced every year whose width is affected by a large number of factors.

Financial variables

Because of the exponential nature of interest and inflation, financial indicators such as interest rates, stock values, or commodity prices make good examples of multiplicative behavior. As such, they should not be expected to be normal, but lognormal.


Benoît Mandelbrot, the popularizer of fractals, has claimed that even the assumption of lognormality is flawed, and advocates the use of log-Lévy distributions.


It is accepted that financial indicators deviate from lognormality. The distribution of price changes on short time scales is observed to have "heavy tails", so that very small or very large price changes are more likely to occur than a lognormal model would predict. Deviation from lognormality indicates that the assumption of independence of the multiplicative influences is flawed.


Lifetime

Other examples of variables that are not normally distributed include the lifetimes of humans or mechanical devices. Examples of distributions used in this connection are the exponential distribution (memoryless) and the Weibull distribution. In general, there is no reason that waiting times should be normal, since they are not directly related to any kind of additive influence.
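
The "memoryless" property of the exponential distribution means that the chance of surviving a further time t does not depend on the age s already reached. A minimal numerical check is sketched below, assuming the standard survival function P(T > t) = exp(-rate · t); the function names and the numbers used are illustrative.

```python
import math

def survival(t: float, rate: float) -> float:
    """P(T > t) for an exponentially distributed lifetime T with the given rate."""
    return math.exp(-rate * t)

def conditional_survival(s: float, t: float, rate: float) -> float:
    """P(T > s + t | T > s): survival for a further time t, given survival to age s."""
    return survival(s + t, rate) / survival(s, rate)

# Both values agree up to rounding: the age s = 5.0 has no influence.
print(conditional_survival(5.0, 2.0, 0.3), survival(2.0, 0.3))
```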


Test scores

A great deal of confusion exists over whether or not IQ test scores and intelligence are normally distributed.


As a deliberate result of test construction, IQ scores are always and obviously normally distributed for the majority of the population. Whether intelligence is normally distributed is less clear. The difficulty and number of questions on an IQ test are decided based on which combinations will yield a normal distribution. This does not mean, however, that the information is in any way being misrepresented, or that there is any kind of "true" distribution that is being artificially forced into the shape of a normal curve. Intelligence tests can be constructed to yield any kind of score distribution desired. All true IQ tests have a normal distribution of scores as a result of test design; otherwise IQ scores would be meaningless without knowing what test produced them. Intelligence tests in general, however, can produce any kind of distribution.


For an example of how arbitrary the distribution of intelligence test scores really is, imagine a 20-item multiple-choice test entirely composed of problems that consist mostly of finding the areas of circles. Such a test, if given to a population of high-school students, would likely yield a U-shaped distribution, with the bulk of the scores being very high or very low, instead of a normal curve. If a student understands how to find the area of a circle, he can likely do so repeatedly and with few errors, and thus would get a perfect or high score on the test, whereas a student who has never had geometry lessons would likely get every question wrong, possibly with a few right due to guessing luck. If a test is composed mostly of easy questions, then most of the test-takers will have high scores and very few will have low scores. If a test is composed entirely of questions so easy or so hard that every person gets either a perfect score or a zero, it fails to make any kind of statistical discrimination at all and yields a rectangular distribution. These are just a few examples of the many varieties of distributions that could theoretically be produced by carefully designing intelligence tests.


Whether intelligence itself is normally distributed has been at times a matter of some debate. Some critics maintain that the choice of a normal distribution is entirely arbitrary. Brian Simon once claimed that the normal distribution was specifically chosen by psychometricians to falsely support the idea that superior intelligence is only held by a small minority, thus legitimizing the rule of a privileged elite over the masses of society. Historically, though, intelligence tests were designed without any concern for producing a normal distribution, and scores came out approximately normally distributed anyway. American educational psychologist Arthur Jensen claims that any test that contains "a large number of items," "a wide range of item difficulties," "a variety of content or forms," and "items that have a significant correlation with the sum of all other scores" will inevitably produce a normal distribution. Furthermore, there exist a number of correlations between IQ scores and other human characteristics that are more provably normally distributed, such as nerve conduction velocity and the glucose metabolism rate of a person's brain, supporting the idea that intelligence is normally distributed.


Some critics, such as Stephen Jay Gould in his book The Mismeasure of Man, question the validity of intelligence tests in general, not just the claim that intelligence is normally distributed. For further discussion see the article IQ.


The Bell Curve is a controversial book on the topic of the heritability of intelligence. However, despite its title, the book does not primarily address whether IQ is normally distributed.


Estimation of parameters

Maximum likelihood estimation of parameters

Suppose

$$X_1, \dots, X_n$$

are independent and identically distributed, and are normally distributed with expectation μ and variance σ2. In the language of statisticians, the observed values of these random variables make up a "sample from a normally distributed population." It is desired to estimate the "population mean" μ and the "population standard deviation" σ, based on observed values of this sample. The joint probability density function of these random variables is

$$f(x_1, \dots, x_n; \mu, \sigma) \propto \sigma^{-n} \prod_{i=1}^{n} \exp\left(-\frac{1}{2}\left(\frac{x_i - \mu}{\sigma}\right)^2\right).$$

(Nota bene: Here the proportionality symbol means proportional as a function of μ and σ, not proportional as a function of $x_1, \dots, x_n$. That may be considered one of the differences between the statistician's point of view and the probabilist's point of view. The reason why this is important will appear below.)


As a function of μ and σ this is the likelihood function

$$L(\mu, \sigma) \propto \sigma^{-n} \exp\left(-\frac{\sum_{i=1}^{n} (x_i - \mu)^2}{2\sigma^2}\right).$$

In the method of maximum likelihood, the values of μ and σ that maximize the likelihood function are taken to be estimates of the population parameters μ and σ.


Usually in maximizing a function of two variables one might consider partial derivatives. But here we will exploit the fact that the value of μ that maximizes the likelihood function with σ fixed does not depend on σ. Therefore, we can find that value of μ, then substitute it for μ in the likelihood function, and finally find the value of σ that maximizes the resulting expression.


It is evident that the likelihood function is a decreasing function of the sum

$$\sum_{i=1}^{n} (x_i - \mu)^2.$$

So we want the value of μ that minimizes this sum. Let

$$\bar{x} = \frac{x_1 + \cdots + x_n}{n}$$

be the "sample mean". Observe that

$$\sum_{i=1}^{n} (x_i - \mu)^2 = \sum_{i=1}^{n} (x_i - \bar{x})^2 + n(\bar{x} - \mu)^2.$$

Only the last term depends on μ and it is minimized by

$$\hat{\mu} = \bar{x}.$$

That is the maximum-likelihood estimate of μ. Substituting that for μ in the sum above makes the last term vanish. Consequently, when we substitute that estimate for μ in the likelihood function, we get

$$L(\bar{x}, \sigma) \propto \sigma^{-n} \exp\left(-\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{2\sigma^2}\right).$$

It is conventional to denote the "loglikelihood function", i.e., the logarithm of the likelihood function, by a lower-case $\ell$, and we have

$$\ell(\bar{x}, \sigma) = \text{constant} - n \log \sigma - \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{2\sigma^2}$$

and then

$$\frac{\partial}{\partial \sigma} \ell(\bar{x}, \sigma) = -\frac{n}{\sigma} + \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{\sigma^3} = -\frac{n}{\sigma^3}\left(\sigma^2 - \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2\right).$$

This derivative is positive, zero, or negative according as σ2 is between 0 and

$$\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2,$$

or equal to that quantity, or greater than that quantity.


Consequently this average of squares of residuals is the maximum-likelihood estimate of σ2, and its square root is the maximum-likelihood estimate of σ.
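
The two estimates are straightforward to compute: the sample mean for μ, and the average of the squared residuals about the sample mean for σ2. The sketch below assumes a small made-up data set; the function name normal_mle and the numbers are illustrative only.

```python
import statistics

def normal_mle(sample: list) -> tuple:
    """Maximum-likelihood estimates (mean, variance) for a normal sample."""
    n = len(sample)
    mu_hat = sum(sample) / n
    # Average of squared residuals about the sample mean (divides by n).
    var_hat = sum((x - mu_hat) ** 2 for x in sample) / n
    return mu_hat, var_hat

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical observations
mu_hat, var_hat = normal_mle(data)
sigma_hat = var_hat ** 0.5
print(mu_hat, var_hat, sigma_hat)  # → 5.0 4.0 2.0
```

The variance estimate agrees with statistics.pvariance(data) from the Python standard library, which also divides by n.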


Surprising generalization

The derivation of the maximum-likelihood estimator of the covariance matrix of a multivariate normal distribution is perhaps surprisingly subtle and elegant. It involves the spectral theorem and the reason why it can be better to view a scalar as the trace of a 1×1 matrix than as a mere scalar. See estimation of covariance matrices.


Unbiased estimation of parameters

The maximum likelihood estimator of the population mean μ from a sample is an unbiased estimator of the mean, as is the maximum likelihood estimator of the variance when the mean of the population is known a priori. However, if we are faced with a sample and have no knowledge of the mean or the variance of the population from which it is drawn, the unbiased estimator of the variance σ2 is:

$$s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2,$$

where $\bar{x} = (x_1 + \cdots + x_n)/n$ is the sample mean.
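
The difference between dividing by n and dividing by n - 1 shows up when the estimates are averaged over many samples. The following simulation is a sketch under stated assumptions: samples of size 5 from a standard normal population (true variance 1), a fixed seed, and the illustrative function name average_variance_estimates.

```python
import random

def average_variance_estimates(n: int, trials: int, seed: int = 0) -> tuple:
    """Average the divide-by-n and divide-by-(n - 1) variance estimates over many samples."""
    rng = random.Random(seed)
    total_mle = 0.0
    total_unbiased = 0.0
    for _ in range(trials):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        mean = sum(xs) / n
        ss = sum((x - mean) ** 2 for x in xs)  # sum of squared residuals
        total_mle += ss / n
        total_unbiased += ss / (n - 1)
    return total_mle / trials, total_unbiased / trials

avg_mle, avg_unbiased = average_variance_estimates(n=5, trials=20000)
print(round(avg_mle, 3), round(avg_unbiased, 3))
```

With n = 5 the divide-by-n average should settle near (n - 1)/n = 0.8 of the true variance, while the divide-by-(n - 1) average should settle near 1.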

The Wikipedia article included on this page is licensed under the GFDL.
Images may be subject to relevant owners' copyright.
All other elements are (c) copyright NationMaster.com 2003-5. All Rights Reserved.
Usage implies agreement with terms.