In statistics and optimization, the concepts of error and residual are easily confused with each other.
"Error" is a misnomer: an error is the amount by which an observation differs from its expected value, the latter being based on the whole population from which the statistical unit was chosen randomly. The expected value, being the average of the entire population, is typically unobservable. If the average height in a population of 21-year-old men is 5 feet 9 inches, and one randomly chosen man is 5 feet 11 inches tall, then the "error" is 2 inches; if the randomly chosen man is 5 feet 7 inches tall, then the "error" is −2 inches. The nomenclature arose from random measurement errors in astronomy. It is as if the measurement of the man's height were an attempt to measure the population average, so that any difference between the man's height and the average would be a measurement error.
A residual, on the other hand, is an observable estimate of the unobservable error. The simplest case involves a random sample of n men whose heights are measured. The sample average is used as an estimate of the population average. Then we have:
- The difference between the height of each man in the sample and the unobservable population average is an error, and
- The difference between the height of each man in the sample and the observable sample average is a residual.
- Residuals are observable; errors are not.
Note that the sum of the residuals within a random sample is necessarily zero, and thus the residuals are necessarily not independent. The sum of the errors need not be zero; the errors are independent random variables if the individuals are chosen from the population independently.
- Errors are often independent of each other; residuals are not independent of each other (at least in the simple situation described above, and in many others).
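The distinction drawn in the points above can be illustrated with a small simulation. This is only a sketch: the population mean of 69 inches matches the 5 feet 9 inches used earlier, but the standard deviation of 3 inches, the sample size of 10, and the names `POPULATION_MEAN`, `POPULATION_SD` and `errors_and_residuals` are assumptions made for illustration. In real data the population mean would be unknown, so the errors could not actually be computed.

```python
import random

# Assumed population parameters, for illustration only.
POPULATION_MEAN = 69.0  # inches (5 feet 9 inches)
POPULATION_SD = 3.0     # inches (hypothetical spread)


def errors_and_residuals(sample, population_mean):
    """Return (errors, residuals) for a list of measurements.

    Errors are deviations from the population mean (normally unobservable);
    residuals are deviations from the sample mean (always observable).
    """
    sample_mean = sum(sample) / len(sample)
    errors = [x - population_mean for x in sample]
    residuals = [x - sample_mean for x in sample]
    return errors, residuals


rng = random.Random(0)  # fixed seed so the run is repeatable
heights = [rng.gauss(POPULATION_MEAN, POPULATION_SD) for _ in range(10)]
errors, residuals = errors_and_residuals(heights, POPULATION_MEAN)

# The residuals sum to zero up to floating-point rounding;
# the errors generally do not.
print("sum of errors:   ", sum(errors))
print("sum of residuals:", sum(residuals))
```

Each error differs from the corresponding residual by the same amount, namely the sample mean minus the population mean, which is why knowing all but one residual determines the last one.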
An example, with some of the mathematical theory

If we assume a normally distributed population with mean μ and standard deviation σ, and choose individuals independently, then we have

$$X_1, \dots, X_n \sim N(\mu, \sigma^2)$$

and the sample mean

$$\overline{X} = \frac{X_1 + \cdots + X_n}{n}$$

is a random variable distributed thus:

$$\overline{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right).$$

The errors are then

$$\varepsilon_i = X_i - \mu,$$

whereas the residuals are

$$\widehat{\varepsilon}_i = X_i - \overline{X}.$$

(As is often done, the "hat" over the letter ε indicates an observable estimate of an unobservable quantity called ε.)

The sum of squares of the errors, divided by σ², has a chi-square distribution with n degrees of freedom:

$$\frac{1}{\sigma^2} \sum_{i=1}^{n} \varepsilon_i^2 \sim \chi^2_n.$$

This quantity, however, is not observable. The sum of squares of the residuals, on the other hand, is observable. The quotient of that sum by σ² has a chi-square distribution with only n − 1 degrees of freedom:

$$\frac{1}{\sigma^2} \sum_{i=1}^{n} \widehat{\varepsilon}_i^{\,2} \sim \chi^2_{n-1}.$$

It is remarkable that the sum of squares of the residuals and the sample mean can be shown to be independent of each other. That fact and the normal and chi-square distributions given above form the basis of confidence interval calculations relying on Student's t-distribution. In those calculations one encounters the quotient

$$\frac{(\overline{X} - \mu) / (\sigma / \sqrt{n})}{\sqrt{\dfrac{1}{\sigma^2} \sum_{i=1}^{n} \widehat{\varepsilon}_i^{\,2} \Big/ (n - 1)}} = \frac{\overline{X} - \mu}{S / \sqrt{n}}, \qquad \text{where } S^2 = \frac{1}{n - 1} \sum_{i=1}^{n} \widehat{\varepsilon}_i^{\,2},$$

in which the σ appears in both the numerator and the denominator and cancels. That is fortunate because in practice one would not know the value of σ².
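The degrees-of-freedom statements and the cancellation of σ can be checked numerically. The sketch below is not a proof: it compares only simulated averages with n and n − 1 (the mean of a chi-square distribution equals its degrees of freedom), and the sample size, σ, number of trials, and the helper names `t_statistic` and `average_scaled_sums` are assumptions chosen for illustration.

```python
import math
import random


def t_statistic(sample, hypothesized_mean):
    """Studentized distance of the sample mean from a hypothesized mean.

    Only the data and the hypothesized mean are needed; sigma does not appear.
    """
    n = len(sample)
    sample_mean = sum(sample) / n
    residuals = [x - sample_mean for x in sample]
    # Dividing by n - 1 matches the n - 1 degrees of freedom of the residuals.
    sample_variance = sum(r * r for r in residuals) / (n - 1)
    return (sample_mean - hypothesized_mean) / math.sqrt(sample_variance / n)


def average_scaled_sums(n, sigma, trials, seed=0):
    """Monte Carlo averages of the sums of squared errors and of squared
    residuals, each divided by sigma**2, over many simulated samples."""
    rng = random.Random(seed)
    mu = 0.0  # assumed population mean; any value gives the same averages
    total_errors = 0.0
    total_residuals = 0.0
    for _ in range(trials):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(xs) / n
        total_errors += sum((x - mu) ** 2 for x in xs) / sigma ** 2
        total_residuals += sum((x - xbar) ** 2 for x in xs) / sigma ** 2
    return total_errors / trials, total_residuals / trials


avg_err, avg_res = average_scaled_sums(n=5, sigma=2.0, trials=20000)
print("average scaled sum of squared errors:   ", avg_err)  # close to n = 5
print("average scaled sum of squared residuals:", avg_res)  # close to n - 1 = 4

# Rescaling the data and the hypothesized mean by the same factor
# leaves the statistic unchanged, reflecting the cancellation of sigma.
print(t_statistic([1, 2, 3, 4, 5], 2))
print(t_statistic([10, 20, 30, 40, 50], 20))
```

Because `t_statistic` uses only the observable residuals, it can be computed in practice, whereas the scaled sums in `average_scaled_sums` require the simulation's known μ and σ.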
References
- R. Dennis Cook, Residuals and Influence in Regression, New York: Chapman and Hall, 1982.
See also
- Studentized residual: a residual adjusted by dividing it by an estimate of its standard deviation, named in honor of William Sealey Gosset, who wrote under the pseudonym Student.
- Sampling error: the way observed samples differ, in such things as means and standard deviations, from the population from which they are taken.
- Margin of error
- Propagation of uncertainty (or propagation of error): the effect of the uncertainties of variables on the uncertainty of a function based on them.
External links
- VIAS Science Cartoons: residuals from the humorous perspective.