Sample size

The sample size of a statistical sample is the number of repeated measurements (observations) that constitute it. It is typically denoted n, and is a positive integer (natural number).


Typically, different sample sizes lead to different accuracies of measurement. This can be seen in such statistical rules as the law of large numbers and the central limit theorem. All else being equal, a larger sample size n leads to increased precision in estimates of various properties of the population.


A typical example would be when a statistician wishes to estimate the arithmetic mean of a continuous random variable (for example, the height of a person). Assuming a random sample with independent observations, if the variability of the population (as measured by the standard deviation σ) is known, then the standard error of the sample mean is given by the formula:

\sigma/\sqrt{n}

It is easy to show that as n becomes large, this standard error becomes very small. This yields more sensitive hypothesis tests with greater statistical power, and narrower confidence intervals.
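As a rough illustration of how the standard error shrinks, the following Python sketch evaluates σ/√n for a few sample sizes. The function name `standard_error` and the value σ = 10 are assumptions chosen for the example; they do not come from the article.

```python
import math


def standard_error(sigma: float, n: int) -> float:
    """Standard error of the sample mean for n independent observations."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    return sigma / math.sqrt(n)


# Hypothetical population standard deviation, for illustration only.
sigma = 10.0
for n in (25, 100, 400, 10_000):
    print(f"n = {n:>6}: standard error = {standard_error(sigma, n):.3f}")
```

Quadrupling n halves the standard error, so cutting the standard error by a factor of ten requires a hundredfold increase in sample size.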


With more complicated sampling techniques, such as stratified sampling, the sample can often be split up into sub-samples. Typically, if there are k such sub-samples (from k different strata) then each of them will have a sample size n_i, i = 1, 2, ..., k. These n_i must conform to the rule that n_1 + n_2 + ... + n_k = n (i.e. that the total sample size is given by the sum of the sub-sample sizes). Selecting these n_i optimally can be done in various ways, using (for example) Neyman's optimal allocation.
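One way to picture the allocation step is the sketch below, which assumes the usual statement of Neyman allocation: each n_i is proportional to N_i σ_i, where N_i is the stratum's population size and σ_i its standard deviation. The function name, the largest-remainder rounding, and the example strata are all hypothetical choices made for illustration.

```python
def neyman_allocation(n, stratum_sizes, stratum_sds):
    """Split a total sample size n across strata in proportion to N_i * sigma_i.

    Rounds with the largest-remainder method so the n_i sum to exactly n.
    """
    weights = [N * s for N, s in zip(stratum_sizes, stratum_sds)]
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one stratum needs positive size and spread")
    exact = [n * w / total for w in weights]
    alloc = [int(x) for x in exact]  # floor of each exact share
    # Hand the leftover units to the strata with the largest fractional parts.
    leftover = n - sum(alloc)
    by_remainder = sorted(
        range(len(exact)), key=lambda i: exact[i] - alloc[i], reverse=True
    )
    for i in by_remainder[:leftover]:
        alloc[i] += 1
    return alloc


# Hypothetical strata: population sizes N_i and standard deviations sigma_i.
sizes = [5000, 3000, 2000]
sds = [4.0, 10.0, 25.0]
print(neyman_allocation(200, sizes, sds))
```

The smallest stratum receives the largest share here because its observations are the most variable. The sketch does not cap n_i at N_i, which a practical implementation would need to handle.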

Further examples

Central limit theorem

The central limit theorem is a significant result which depends on sample size: for independent observations from a distribution with finite variance, the distribution of the sample mean approaches a normal distribution as n grows.
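A crude way to see the normal approximation at work is to simulate many samples and count how often the sample mean lands within two standard errors of the true mean; if the approximation holds, that share should be close to 95%. The sketch below is only an illustration: the function name, the choice of a skewed Exponential(1) population, the number of replicates, and the seed are all assumptions.

```python
import random
import statistics


def fraction_within_two_se(n, replicates=2000, seed=0):
    """Share of simulated sample means lying within 2 standard errors of the true mean.

    Observations are Exponential(1) draws (true mean 1, standard deviation 1),
    a deliberately skewed distribution.
    """
    rng = random.Random(seed)
    two_se = 2 * 1.0 / n ** 0.5
    hits = 0
    for _ in range(replicates):
        mean = statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        if abs(mean - 1.0) <= two_se:
            hits += 1
    return hits / replicates


for n in (2, 10, 50, 200):
    print(f"n = {n:>3}: fraction within 2 standard errors = {fraction_within_two_se(n):.3f}")
```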


Estimating proportions

Another typical statement is to say that one can be 95% sure the true value of a parameter is within ±B of the estimate, where B is an error bound that decreases with increasing n. Such an interval is referred to as a confidence interval for that parameter.


For example, a simple situation is estimating a proportion in a population. To do so, a statistician will estimate the bounds of a 95% confidence interval for an unknown proportion.


The rule of thumb for a maximum (or 'conservative') B for a proportion derives from the fact that the estimator of a proportion, \hat{p} = X/n (where X is the number of 'positive' observations), has a (scaled) binomial distribution and is also a form of sample mean (from a Bernoulli distribution on {0, 1}, which has a maximum variance of 0.25, attained at parameter p = 0.5). So the sample mean X/n has maximum variance 0.25/n. For sufficiently large n (usually this means that we need to have observed at least 10 positive and 10 negative responses), this distribution will be closely approximated by a normal distribution with the same mean and variance.


Using this approximation, it can be shown that ~95% of this distribution's probability lies within 2 standard deviations of the mean. Because of this, an interval of the form

(\hat{p} - 2\sqrt{0.25/n},\ \hat{p} + 2\sqrt{0.25/n}) = (\hat{p} - B,\ \hat{p} + B)

will form a 95% confidence interval for the true proportion.
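The interval above can be computed directly. The following is a minimal sketch, assuming the conservative bound B = 2√(0.25/n); the function name, the clipping to [0, 1], and the poll figures are hypothetical additions for the example.

```python
import math


def conservative_interval(successes, n):
    """Approximate 95% interval p_hat +/- 2*sqrt(0.25/n) for a proportion."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p_hat = successes / n
    bound = 2 * math.sqrt(0.25 / n)  # equals 1/sqrt(n)
    # Clip to [0, 1], since a proportion cannot fall outside that range
    # (the clipping is an addition, not part of the formula above).
    return max(0.0, p_hat - bound), min(1.0, p_hat + bound)


# Hypothetical poll: 220 'positive' responses out of 400.
low, high = conservative_interval(220, 400)
print(f"95% interval: ({low:.3f}, {high:.3f})")
```

Because the bound uses the worst-case variance 0.25, the interval is wider than necessary when the true proportion is far from 0.5.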


If we require the sampling error ε to be no larger than some bound B, we can solve the equation

\varepsilon \approx B = 2\sqrt{0.25/n} = 1/\sqrt{n}

to give us

1/\varepsilon^2 \approx 1/B^2 = n

So, n = 100 ⇔ B = 10%, n = 400 ⇔ B = 5%, n = 1000 ⇔ B ≈ 3%, and n = 10000 ⇔ B = 1%. One sees these numbers quoted often in news reports of opinion polls and other sample surveys.
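The same relationship can be turned around to ask how many respondents a given margin requires. The sketch below assumes the conservative rule n ≥ 1/B²; the function name and the small floating-point tolerance are illustrative choices.

```python
import math


def sample_size_for_margin(bound):
    """Smallest n with 1/sqrt(n) <= bound, i.e. n >= 1/bound**2."""
    if not 0 < bound < 1:
        raise ValueError("bound must be a proportion strictly between 0 and 1")
    # The small tolerance guards against floating-point noise just above an integer.
    return math.ceil(1 / bound ** 2 - 1e-9)


for bound in (0.10, 0.05, 0.03, 0.01):
    print(f"B = {bound:.0%}: n >= {sample_size_for_margin(bound)}")
```

A margin of exactly 3% needs slightly more than 1000 observations under this rule, which is why n = 1000 is quoted with an approximate margin of about 3%.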


Extension to other cases

In general, if a population mean is estimated using the sample mean from n observations from a distribution with variance σ², then if n is large enough (typically n > 30) the central limit theorem can be applied to obtain an approximate 95% confidence interval of the form

(\bar{x} - B,\ \bar{x} + B), \quad B = 2\sigma/\sqrt{n}

If the sampling error ε is required to be no larger than bound B, as above, then

4\sigma^2/\varepsilon^2 \approx 4\sigma^2/B^2 = n
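For a mean, the required sample size follows from the formula above in the same way. This is a minimal sketch assuming σ is known; the function name and the example values of σ and B are hypothetical.

```python
import math


def sample_size_for_mean(sigma, bound):
    """Smallest n with 2*sigma/sqrt(n) <= bound, i.e. n >= 4*sigma**2/bound**2."""
    if sigma <= 0 or bound <= 0:
        raise ValueError("sigma and bound must be positive")
    # The small tolerance guards against floating-point noise just above an integer.
    return math.ceil(4 * sigma ** 2 / bound ** 2 - 1e-9)


# Hypothetical values: sigma = 15 units, desired bound B = 3 units.
print(sample_size_for_mean(15, 3))
# Halving the bound quadruples the required sample size.
print(sample_size_for_mean(15, 1.5))
```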

Note, if the mean is to be estimated using P parameters that must first be estimated themselves from the same sample, then to preserve sufficient "degrees of freedom," the sample size should be at least n + P.


Required sample sizes for hypothesis tests

A common problem facing statisticians is calculating the sample size required to yield a certain power for a test, given a predetermined Type I error rate α. A typical example for this is as follows:


Let X_i, i = 1, 2, ..., n be independent observations taken from a normal distribution with mean μ and variance σ². Let us consider two hypotheses, a null hypothesis:

H_0: \mu = 0

and an alternative hypothesis:

H_a: \mu = \mu^*

for some 'smallest significant difference' μ* > 0. This is the smallest value for which we care about observing a difference. Now, if we wish to (1) reject H0 with a probability of at least 1-β when Ha is true (i.e. a power of 1-β), and (2) reject H0 with probability α when H0 is true, then we need the following:


If z_α is the upper α percentage point of the standard normal distribution, then

\Pr(\bar{x} > z_{\alpha}\sigma/\sqrt{n} \mid H_0 \text{ true}) = \alpha

and so

'Reject H0 if our sample average (\bar{x}) is more than z_{\alpha}\sigma/\sqrt{n}'

is a decision rule which satisfies (2). (Note, this is a 1-tailed test)
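The decision rule can be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library to compute the rejection threshold and confirm that it gives the intended Type I error rate; the function name and the settings α = 0.05, σ = 1, n = 25 are assumptions for the example.

```python
import math
from statistics import NormalDist


def rejection_threshold(alpha, sigma, n):
    """Critical value z_alpha * sigma / sqrt(n) for the one-tailed test of H0: mu = 0."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # upper alpha point of N(0, 1)
    return z_alpha * sigma / math.sqrt(n)


# Hypothetical settings: alpha = 0.05, sigma = 1, n = 25.
alpha, sigma, n = 0.05, 1.0, 25
c = rejection_threshold(alpha, sigma, n)

# Under H0 the sample mean is N(0, sigma^2 / n); check the Type I error rate.
type_one = 1 - NormalDist(0.0, sigma / math.sqrt(n)).cdf(c)
print(f"reject H0 if the sample mean exceeds {c:.4f}; Type I error = {type_one:.4f}")
```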


Now we wish for this to happen with a probability at least 1-β when Ha is true. In this case, our sample average will come from a normal distribution with mean μ* and variance σ²/n. Therefore we require

\Pr(\bar{x} > z_{\alpha}\sigma/\sqrt{n} \mid H_a \text{ true}) \geq 1 - \beta

Through careful manipulation, this can be shown to happen when

n \geq \left(\frac{\Phi^{-1}(1-\beta) + z_{\alpha}}{\mu^{*}/\sigma}\right)^2

where Φ is the normal cumulative distribution function.
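Standardising the sample average under Ha shows that the power of the test is Φ(μ*√n/σ − z_α), and requiring this to be at least 1−β leads to the bound on n. A minimal sketch of the calculation, using `statistics.NormalDist` from the Python standard library, is given below. The function names and the design values (α = 0.05, power 0.80, μ*/σ = 0.5) are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_for_power(alpha, beta, mu_star, sigma):
    """Smallest n with n >= ((Phi^-1(1 - beta) + z_alpha) / (mu_star / sigma))**2."""
    if mu_star <= 0 or sigma <= 0:
        raise ValueError("mu_star and sigma must be positive")
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha)
    z_beta = std_normal.inv_cdf(1 - beta)
    return math.ceil(((z_beta + z_alpha) / (mu_star / sigma)) ** 2)


def power(n, alpha, mu_star, sigma):
    """Probability of rejecting H0 when the true mean is mu_star."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha)
    return std_normal.cdf(mu_star * math.sqrt(n) / sigma - z_alpha)


# Hypothetical design: alpha = 0.05, power 0.80, smallest difference of half
# a standard deviation.
n = sample_size_for_power(alpha=0.05, beta=0.20, mu_star=0.5, sigma=1.0)
print(n, round(power(n, 0.05, 0.5, 1.0), 3))
```

Since the required n scales with (σ/μ*)², halving the smallest difference of interest roughly quadruples the sample size.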


See also

  • Design of experiments
  • Sampling (statistics)
  • Statistical power
  • Stratified sampling

External links

  • NIST: Selecting Sample Sizes
  • Raven Analytics: Sample Size Calculations

The Wikipedia article included on this page is licensed under the GFDL.