In statistics, the term bias is used for two different concepts. A biased sample is a statistical sample in which members of the statistical population are not equally likely to be chosen. A biased estimator is one that, on average, over- or underestimates the quantity being estimated.
While the term bias sounds pejorative, it is not necessarily used in that way in statistics. A biased sample can be difficult to analyze or may lead to inaccurate or wrong inference if the bias is ignored, but biased estimators may have desirable properties, such as small variance, depending on the situation.
Biased sample
A sample is biased if some members of the population are more likely to be chosen in the sample than others. A biased sample will generally give a misestimate of the quantity being estimated. For example, if the sample over-represents members with high (or low) values of that quantity, the estimate will tend to be higher (or lower) than the true value.
A famous case of what can go wrong when using a biased sample is found in the 1936 US presidential election polls. The Literary Digest held a poll that forecast that Alfred M. Landon would defeat Franklin Delano Roosevelt by 57% to 43%. George Gallup, using a much smaller sample (300,000 rather than 2,000,000), predicted Roosevelt would win, and he was right. What went wrong with the Literary Digest poll? The magazine had used lists of telephone and automobile owners to select its sample. In those days these were luxuries, so the sample consisted mainly of middle- and upper-class citizens. A majority of these voted for Landon, but the lower classes voted for Roosevelt. Because the sample was biased towards wealthier citizens, the result was incorrect.
This kind of bias is usually regarded as a worse problem than statistical noise: problems with statistical noise can be lessened by enlarging the sample, but the bias of a biased sample does not shrink as the sample grows. In particular, a meta-analysis can distill good data from studies that themselves suffer from statistical noise, but a meta-analysis of biased studies will itself be biased.
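The contrast between noise and bias can be illustrated with a small simulation. This is a minimal sketch under invented assumptions: the two-group population, the support rates, the sampling weights and the function name `poll` are all hypothetical and are not taken from the 1936 polls.

```python
import random

def poll(sample_size, support_by_group, sampling_weights, rng):
    """Simulate a poll and return the fraction of respondents backing candidate A.

    support_by_group: probability that a member of each group backs A.
    sampling_weights: relative chance that a respondent is drawn from each group.
    """
    groups = rng.choices(range(len(support_by_group)),
                         weights=sampling_weights, k=sample_size)
    votes = sum(rng.random() < support_by_group[g] for g in groups)
    return votes / sample_size

rng = random.Random(0)

# Hypothetical population: 30% wealthier voters (40% back A), 70% others (70% back A).
support = [0.40, 0.70]
true_share = 0.30 * 0.40 + 0.70 * 0.70      # about 0.61

# Small sample drawn in proportion to the population: noisy but centred on the truth.
fair_small = poll(2_000, support, [0.30, 0.70], rng)

# Large sample that over-represents wealthier voters: precise but centred elsewhere.
biased_large = poll(200_000, support, [0.80, 0.20], rng)
biased_target = 0.80 * 0.40 + 0.20 * 0.70   # about 0.46, the value it converges to

print(f"true share        {true_share:.3f}")
print(f"fair, n=2000      {fair_small:.3f}")
print(f"biased, n=200000  {biased_large:.3f}")
```

Under these assumptions the large biased poll settles near 0.46 rather than 0.61, so adding respondents only makes the wrong answer more precise.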
Biased estimator
Another kind of bias in statistics does not involve biased samples, but does involve the use of a statistic whose average value differs from the value of the quantity being estimated. Suppose we are trying to estimate the parameter $\theta$ using an estimator $\hat{\theta}$ (that is, some function of the observed data). Then the bias of $\hat{\theta}$ is defined to be
$$\operatorname{bias}(\hat{\theta}) = \operatorname{E}(\hat{\theta}) - \theta.$$
In words, this would be "the expected value of the estimator minus the true value $\theta$". This may be rewritten as
$$\operatorname{E}(\hat{\theta} - \theta),$$
which would read "the expected value of the difference between the estimator and the true value" (the expected value of the constant $\theta$ is $\theta$).
For example, suppose $X_1, \ldots, X_n$ are independent and identically distributed random variables with expectation $\mu$ and variance $\sigma^2$. Let
$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$$
be the "sample average", and let
$$S^2 = \frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})^2$$
be a "sample variance". Then $S^2$ is a "biased estimator" of $\sigma^2$ because
$$\operatorname{E}(S^2) = \frac{n-1}{n}\,\sigma^2 \neq \sigma^2.$$
But if the sample comes from a normally distributed population, then this biased estimator is, by the commonly used criterion of "mean squared error", actually better (but only very slightly) than the unbiased estimator that results from putting $n - 1$ in the denominator where $n$ appears in the definition of $S^2$ above.
Even then, the square root of the unbiased estimator of the population variance is not an unbiased estimator of the population standard deviation; for a non-linear function $f$ and an unbiased estimator $U$ of a parameter $p$, $f(U)$ is usually not an unbiased estimator of $f(p)$.
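The bias of the sample variance can be checked exactly on a small case. The sketch below assumes a hypothetical population, a fair six-sided die, and samples of size 3. It enumerates every possible sample and averages the two variance estimates with exact fractions. The helper name `variance_estimate` is illustrative only.

```python
from fractions import Fraction
from itertools import product

def variance_estimate(sample, denominator):
    """Sum of squared deviations from the sample mean, divided by `denominator`."""
    n = len(sample)
    mean = Fraction(sum(sample), n)
    return sum((x - mean) ** 2 for x in sample) / denominator

faces = range(1, 7)          # hypothetical population: one roll of a fair die
n = 3                        # sample size

mu = Fraction(sum(faces), 6)
sigma2 = sum((x - mu) ** 2 for x in faces) / 6    # population variance

# Every ordered sample of size n is equally likely, so a plain average is the expectation.
samples = list(product(faces, repeat=n))
expected_biased = sum(variance_estimate(s, n) for s in samples) / len(samples)
expected_unbiased = sum(variance_estimate(s, n - 1) for s in samples) / len(samples)

print("population variance       ", sigma2)
print("E[S^2] with denominator n  ", expected_biased)
print("E[S^2] with denominator n-1", expected_unbiased)
```

With the denominator n, the expectation equals (n − 1)/n times the population variance. With n − 1 it equals the population variance itself. A die is not normally distributed, so this illustrates only the bias, not the mean squared error comparison.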
A far more extreme case of a biased estimator being better than any unbiased estimator is well-known: Suppose $X$ has a Poisson distribution with expectation $\lambda$. It is desired to estimate
$$\operatorname{P}(X = 0)^2 = e^{-2\lambda}.$$
The only function of the data constituting an unbiased estimator is
$$\delta(X) = (-1)^X.$$
If the observed value of $X$ is 100, then the estimate is 1, although the true value of the quantity being estimated is obviously very likely to be near 0, which is the opposite extreme. And if $X$ is observed to be 101, then the estimate is even more absurd: it is −1, although the quantity being estimated obviously must be positive. The (biased) maximum-likelihood estimator
$$e^{-2X}$$
is better than this unbiased estimator in the sense that the mean squared error
$$e^{-4\lambda} - 2e^{\lambda(1/e^2 - 3)} + e^{\lambda(1/e^4 - 1)}$$
is smaller. Compare the unbiased estimator's MSE of
$$1 - e^{-4\lambda}.$$
The MSE is a function of the true value $\lambda$. The bias of the maximum-likelihood estimator is
$$e^{\lambda(1/e^2 - 1)} - e^{-2\lambda}.$$
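The two mean squared errors can be compared numerically. The sketch below takes the quantity being estimated to be e^(−2λ), the unbiased estimator to be (−1)^X and the maximum-likelihood estimator to be e^(−2X). It evaluates the closed-form expressions and cross-checks them by summing directly over the Poisson probabilities. The function names are illustrative.

```python
import math

def mse_unbiased(lam):
    """Closed-form MSE of the unbiased estimator (-1)**X."""
    return 1 - math.exp(-4 * lam)

def mse_mle(lam):
    """Closed-form MSE of the maximum-likelihood estimator exp(-2*X)."""
    return (math.exp(-4 * lam)
            - 2 * math.exp(lam * (math.exp(-2) - 3))
            + math.exp(lam * (math.exp(-4) - 1)))

def bias_mle(lam):
    """Expected value of exp(-2*X) minus the target exp(-2*lam)."""
    return math.exp(lam * (math.exp(-2) - 1)) - math.exp(-2 * lam)

def mse_by_summation(estimator, lam, terms=100):
    """MSE computed from the Poisson probabilities, truncated after `terms` terms."""
    target = math.exp(-2 * lam)
    pmf = math.exp(-lam)                 # P(X = 0)
    total = 0.0
    for k in range(terms):
        total += pmf * (estimator(k) - target) ** 2
        pmf *= lam / (k + 1)             # P(X = k + 1) from P(X = k)
    return total

for lam in (0.1, 1.0, 5.0):
    print(f"lambda={lam}: unbiased MSE={mse_unbiased(lam):.4f}, "
          f"MLE MSE={mse_mle(lam):.4f}, MLE bias={bias_mle(lam):.4f}")
```

The summation is truncated, which is harmless for moderate λ because the Poisson tail beyond 100 terms is negligible there.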
The bias of maximum-likelihood estimators can be substantial. Consider a case where n tickets numbered from 1 through n are placed in a box and one is selected at random, giving a value X. If n is unknown, then the maximum-likelihood estimator of n is X, even though the expectation of X is only (n + 1)/2; we can only be certain that n is at least X, and it is probably larger. In this case, the natural unbiased estimator is 2X − 1.
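The ticket example is small enough to verify exactly. The sketch below assumes a box of n = 10 tickets, an arbitrary illustrative choice, and averages each estimator over the n equally likely draws. The helper name `expected_value` is hypothetical.

```python
from fractions import Fraction

def expected_value(estimator, n):
    """Exact expectation of estimator(X) when X is uniform on 1, ..., n."""
    return sum(Fraction(estimator(x)) for x in range(1, n + 1)) / n

n = 10                                                   # illustrative box size
e_mle = expected_value(lambda x: x, n)                   # maximum-likelihood estimator X
e_unbiased = expected_value(lambda x: 2 * x - 1, n)      # estimator 2X - 1

print("E[X]      =", e_mle)
print("E[2X - 1] =", e_unbiased)
```

The first expectation equals (n + 1)/2 and the second equals n, so X underestimates n by roughly half on average, while 2X − 1 is unbiased.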
See also
Confirmation bias is a type of statistical bias describing the tendency to search for or interpret information in a way that confirms one's preconceptions.
Publication bias, also called positive outcome bias, is the tendency for researchers to publish experimental results that are positive (something was found) while not publishing findings that are negative (something was found not to happen).
Selection bias is the error of distorting a statistical analysis by pre- or post-selecting the samples.
Recall bias occurs when the way a survey respondent answers a question is affected not just by the correct answer, but also by the respondent's memory.
Response bias can affect survey results if respondents answer questions in the way they think the questioner wants them to answer rather than according to their true beliefs.