FACTOID # 50: Libya is the only country with a single-coloured flag.
 
 Home   Encyclopedia   Statistics   Countries A-Z   Flags   Maps   Education   Forum   FAQ   About 
 
WHAT'S NEW
RECENT ARTICLES
More Recent Articles »
 

FACTS & STATISTICS    Simple view

  1. Select countries to view: (hold down Control key and click to select several)

     

     

    Compare:

     

     

  1. Select fact or statistic: (* = graphable)

     

     

     

  2. (OPTIONAL) Compare to statistic: (both need to be graphable)

     

     

     

  3. View result as:

     

       
(OR) SEARCH ALL encyclopedia, stats & forums:   

Encyclopedia > Algorithms for calculating variance

Algorithms for calculating variance play a minor role in statistical computing. A key problem in the design of good algorithms for this problem is that formulas for the variance may involve sums of squares, which can lead to numerical instability as well as to arithmetic overflow when dealing with large values. Flowcharts are often used to represent algorithms. ... In probability theory and statistics, the variance of a random variable is a measure of its statistical dispersion, indicating how far from the expected value its values typically are. ... A graph of a bell curve in a normal distribution showing statistics used in educational assessment, comparing various grading methods. ... The term arithmetic overflow or simply overflow has the following meanings. ...

Contents


Algorithm I

A formula for calculating the variance of a population of size n is: In mathematics and in the sciences, a formula is a concise way of expressing information symbolically (as in a mathematical or chemical formula), or a general relationship between quantities. ...

sigma^2 = frac {sum_{i=1}^{n} x_i^2 - (sum_{i=1}^{n} x_i)^2/n}{n}. !

A formula for calculating an unbiased estimate of the population variance from a finite sample of n observations is: In statistics, the term bias is used for two different concepts. ... A sample is that part of a population which is actually observed. ...

s^2 = frac {sum_{i=1}^{n} x_i^2 - (sum_{i=1}^{n} x_i)^2/n}{n-1}. !

Therefore a naive algorithm to calculate the estimated variance is given by the following pseudocode: Pseudocode (i. ...

 long n = 0 double sum = 0 double sum_sqr = 0 foreach x in data: n += 1 sum += x sum_sqr += x * x end for double mean = sum / n double variance = (sum_sqr - sum * mean) / (n - 1) 

This algorithm can easily be adapted to compute the variance of a finite population: simply divide by n instead of n − 1 on the last line.


Because sum_sqr and sum * mean are very similar numbers, the precision of the result can be much less than the inherent precision of the floating-point arithmetic used to perform the computation. This is particularly bad if the variance is small relative to the sum of the numbers. The precision of a measurement or value describes the number of digits that are used to express that value. ... A floating-point number is a digital representation for a number in a certain subset of the rational numbers, and is often used to approximate an arbitrary real number on a computer. ...


Algorithm II

The following formulas can be used to update the mean and (estimated) variance of the sequence, for an additional element xnew. Here, m denotes the estimate of the population mean, s2 the estimate of the population variance, and n the number of elements in the sequence before the addition. In statistics, mean has two related meanings: the average in ordinary English, which is more correctly called the arithmetic mean, to distinguish it from geometric mean or harmonic mean. ...

m_{mathrm{new}} = frac{n ; m_{mathrm{old}} + x_{mathrm{new}}}{n+1} = m_{mathrm{old}} + frac{x_{mathrm{new}} - m_{mathrm{old}}}{n+1} !
s^2_{mathrm{new}} = frac{(n-1) ; s^2_{mathrm{old}} + (x_{mathrm{new}} - m_{mathrm{new}}) , (x_{mathrm{new}} - m_{mathrm{old}})}{n} !

A numerically stable algorithm is given below. It also computes the mean. This algorithm is due to Knuth[1], who cites Welford[2].

 long n = 0 double mean = 0 double S = 0 foreach x in data: n += 1 double delta = x - mean mean += delta / n S += delta * (x - mean) // This expression uses the new value of mean end for double variance = S / (n - 1) 

As above, the population variance can be computed instead of the sample variance by changing the last division.


This algorithm is much less prone to loss of precision due to massive cancellation. For a particularly robust two-pass algorithm for computing the variance, first compute and subtract an estimate of the mean, and then use this algorithm on the residuals.


Example

Assume that all floating point operations use the standard IEEE 754 double-precision arithmetic. Consider the sample (4, 7, 13, 16) from an infinite population. Based on this sample, the estimated population mean is 10, and the estimated population variance is 30. Both algorithms compute these values correctly. Next consider the sample (108 + 4,108 + 7,108 + 13,108 + 16), which gives rise to the same estimated variance as the first sample. Algorithm II computes this variance estimate correctly, but Algorithm I returns 29.333333333333332 instead of 30. While this loss of precision may be tolerable and viewed as a minor flaw of Algorithm I, it is easy to find data that reveal a major flaw in the naive algorithm: Take the sample to be (109 + 4,109 + 7,109 + 13,109 + 16). Again the estimated population variance of 30 is computed correctly by Algorithm II, but the naive algorithm now computes it as −170.66666666666666. This is a serious problem with Algorithm I, since the variance can, by definition, never be negative. The IEEE Standard for Binary Floating-Point Arithmetic (IEEE 754) is the most widely-used standard for floating-point computation, and is followed by many CPU and FPU implementations. ...


References

  1. Donald E. Knuth (1998). The Art of Computer Programming, volume 2: Seminumerical Algorithms, 3rd edn., p. 232. Boston: Addison-Wesley.
  2. B. P. Welford (1962). "Note on a method for calculating corrected sums of squares and products". Technometrics 4(3):419–420.

Donald Knuth Donald Ervin Knuth (born January 10, 1938) is a renowned computer scientist and Professor Emeritus at Stanford University. ... Cover of books The Art of Computer Programming is a comprehensive monograph written by Donald Knuth which covers many kinds of programming algorithms and their analysis. ...

External links



 

COMMENTARY     


Share your thoughts, questions and commentary here
Your name
Your comments
Please enter the 5-letter protection code

Want to know more?
Search encyclopedia, statistics and forums:

 


Lesson Plans | Student Area | Student FAQ | Reviews | Press Releases |  Feeds | Contact
The Wikipedia article included on this page is licensed under the GFDL.
Images may be subject to relevant owners' copyright.
All other elements are (c) copyright NationMaster.com 2003-5. All Rights Reserved.
Usage implies agreement with terms.