In statistics, Spearman's rank correlation coefficient, named after Charles Spearman and often denoted by the Greek letter ρ (rho) or as r_s, is a non-parametric measure of correlation: it assesses how well an arbitrary monotonic function could describe the relationship between two variables, without making any assumptions about the frequency distribution of the variables.
Assumptions and alternatives
Unlike the Pearson product-moment correlation coefficient, Spearman's rank correlation coefficient does not require the assumption that the relationship between the variables is linear, nor does it require the variables to be measured on interval scales; it can be used for variables measured at the ordinal level.
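To illustrate the difference, the sketch below compares Pearson's r computed on raw values with Pearson's r computed on ranks (which is Spearman's ρ) for a relationship that is strictly increasing but far from linear. The helper names `pearson` and `ranks` are hypothetical, and the ranking helper assumes there are no tied values.

```python
import math

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Ascending ranks 1..n (this sketch assumes no tied values)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result

x = [1, 2, 3, 4, 5, 6]
y = [math.exp(v) for v in x]  # strictly increasing, but far from linear

r = pearson(x, y)                  # Pearson on raw values: noticeably below 1
rho = pearson(ranks(x), ranks(y))  # Pearson on ranks, i.e. Spearman's rho: exactly 1
print(round(r, 3), rho)
```

Because the ranks of x and y coincide, ρ = 1 regardless of how curved the monotonic relationship is, whereas r penalises the departure from a straight line.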
However, Spearman's rho does assume that consecutive ranks indicate equidistant positions on the variable measured. For example, using Spearman's rho for Likert scales, which are often used in psychology, sociology, biology and related disciplines, assumes that the (psychologically) "felt distances" between adjacent scale points are the same across the whole Likert scale used.
Where equidistance cannot be justified, correlation between ordinal-level variables can be calculated using Kendall's τ (tau).
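Kendall's τ depends only on whether each pair of observations is ordered the same way on both variables, not on the spacing between ranks. A minimal sketch of the simplest variant, τ-a, follows; it applies no tie correction (variants such as τ-b adjust for ties), and the function name is a hypothetical choice for illustration.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant pairs - discordant pairs) / total pairs.
    Pairs tied on either variable count as neither concordant nor discordant."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length n >= 2")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1   # pair ordered the same way on both variables
        elif s < 0:
            discordant += 1   # pair ordered in opposite ways
    return (concordant - discordant) / (n * (n - 1) / 2)

# Ordinal codes (e.g. Likert responses) for five subjects on two items
print(kendall_tau_a([1, 2, 3, 4, 5], [1, 3, 2, 4, 5]))  # 9 concordant, 1 discordant -> 0.8
```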
Calculation

In principle, ρ is simply a special case of the Pearson product-moment coefficient in which the data are converted to rankings before calculating the coefficient.[1] In practice, however, a simpler procedure is normally used to calculate ρ. The raw scores are converted to ranks, and the differences d between the ranks of each observation on the two variables are calculated.
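The conversion step can be sketched as follows: each raw score is replaced by its ascending rank, and d is taken observation by observation. This is an illustration that assumes no tied values; the function names and the sample data are hypothetical.

```python
def ranks(values):
    """Ascending ranks 1..n; this sketch assumes all values are distinct (no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result

def rank_differences(x, y):
    """Differences d_i between the ranks of each observation on the two variables."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return [rx - ry for rx, ry in zip(ranks(x), ranks(y))]

# Hypothetical raw scores for four observations
x = [3.1, 7.4, 5.0, 9.9]
y = [12, 30, 41, 55]
print(ranks(x))                # [1, 3, 2, 4]
print(ranks(y))                # [1, 2, 3, 4]
print(rank_differences(x, y))  # [0, 1, -1, 0]
```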
If there are no tied ranks, then ρ is given by:

$$\rho = 1 - \frac{6 \sum_i d_i^2}{n(n^2 - 1)}$$

where:
- $d_i$ = the difference between the ranks of the corresponding values $x_i$ and $y_i$, and
- $n$ = the number of pairs of values.
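A direct transcription of the short-cut formula is sketched below. It refuses input containing ties, since the formula only holds for untied ranks; the function name is hypothetical.

```python
def spearman_rho_no_ties(x, y):
    """rho = 1 - 6 * sum(d_i^2) / (n (n^2 - 1)); valid only when neither variable has ties."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length n >= 2")
    if len(set(x)) != n or len(set(y)) != n:
        raise ValueError("tied values present: use Pearson's r on averaged ranks instead")
    # Map each value to its ascending rank (1 = smallest)
    rank_x = {v: r for r, v in enumerate(sorted(x), start=1)}
    rank_y = {v: r for r, v in enumerate(sorted(y), start=1)}
    sum_d2 = sum((rank_x[a] - rank_y[b]) ** 2 for a, b in zip(x, y))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))

print(spearman_rho_no_ties([1, 2, 3, 4], [1, 3, 2, 4]))  # sum d^2 = 2 -> 0.8
print(spearman_rho_no_ties([1, 2, 3], [3, 2, 1]))        # reversed order -> -1.0
```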
If tied ranks exist, Pearson's correlation coefficient between the ranks should be used instead of this formula.[1] Each of the tied values is assigned the same rank, equal to the average of their positions in the ordered list of values. The ordering may be ascending or descending, provided both variables are ranked in the same direction; the example below uses descending order.

An example of averaging ranks:

| Variable | Position in descending order | Rank |
|---|---|---|
| 0.8 | 5 | 5 |
| 1.2 | 4 | 3.5 |
| 1.2 | 3 | 3.5 |
| 2.3 | 2 | 2 |
| 18 | 1 | 1 |

Spearman's rank correlation coefficient is equivalent to Pearson's correlation on the ranks. The formula above is a short-cut to this product-moment form that assumes there are no ties; the product-moment form can be used in both the tied and the untied case. A variant of this correlation is also called Spearman's rho: ranks are calculated as above, but the standard deviations in Pearson's formula are computed as if there were no ties.
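The averaging rule and the product-moment form can be sketched together: tied values share the mean of the positions they occupy, and ρ is then Pearson's r on those ranks. The function names and the `descending` flag are hypothetical choices for this illustration.

```python
import math

def average_ranks(values, descending=False):
    """Rank values (1 = first in sort order); tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=descending)
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run of tied values as far as it goes
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_position = (start + end) / 2 + 1  # 0-based positions start..end -> 1-based mean
        for k in range(start, end + 1):
            ranks[order[k]] = mean_position
        start = end + 1
    return ranks

def pearson(x, y):
    """Pearson product-moment correlation; undefined if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)

def spearman_rho(x, y):
    """Spearman's rho as Pearson's r on averaged ranks; valid with or without ties."""
    return pearson(average_ranks(x), average_ranks(y))

print(average_ranks([0.8, 1.2, 1.2, 2.3, 18], descending=True))  # [5.0, 3.5, 3.5, 2.0, 1.0]
print(spearman_rho([1, 2, 2, 3], [1, 2, 3, 4]))  # sqrt(0.9), about 0.9487
```

The first call reproduces the rank column of the table above.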
Example

The raw data used in this example are shown below.

| IQ | Hours of TV per week |
|---|---|
| 106 | 7 |
| 86 | 0 |
| 100 | 27 |
| 101 | 50 |
| 99 | 28 |
| 103 | 29 |
| 97 | 20 |
| 113 | 12 |
| 112 | 6 |
| 110 | 17 |

The first step is to sort the data by the first column. Next, two more columns are created to hold the ranks of the values in the first two columns. Where values are tied, each receives the mean of the ranks they would otherwise occupy (this data set has no ties). Then a column d is created to hold the differences between the two rank columns, and finally a column d² holding the square of d. Here d is shown as an absolute difference; its sign does not matter because only d² enters the formula. After this process the example data should look like this:

| IQ (i) | Hours of TV per week (t) | rank(i) | rank(t) | d | d² |
|---|---|---|---|---|---|
| 86 | 0 | 1 | 1 | 0 | 0 |
| 97 | 20 | 2 | 6 | 4 | 16 |
| 99 | 28 | 3 | 8 | 5 | 25 |
| 100 | 27 | 4 | 7 | 3 | 9 |
| 101 | 50 | 5 | 10 | 5 | 25 |
| 103 | 29 | 6 | 9 | 3 | 9 |
| 106 | 7 | 7 | 3 | 4 | 16 |
| 110 | 17 | 8 | 5 | 3 | 9 |
| 112 | 6 | 9 | 2 | 7 | 49 |
| 113 | 12 | 10 | 4 | 6 | 36 |

Adding the values in the d² column gives $\sum d_i^2 = 194$, and n = 10. Substituting these values into the formula gives

$$\rho = 1 - \frac{6 \times 194}{10(10^2 - 1)} = 1 - \frac{1164}{990} \approx -0.175758.$$

If the original values contain ties, this formula should not be used. Instead, the Pearson correlation coefficient should be calculated on the ranks, with tied values given averaged ranks as described above.
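The worked example can be reproduced with a few lines of code, which makes it easy to check each column of the table. This sketch uses a simple ranking helper (a hypothetical name) that is only valid because neither column contains ties.

```python
iq = [106, 86, 100, 101, 99, 103, 97, 113, 112, 110]
tv = [7, 0, 27, 50, 28, 29, 20, 12, 6, 17]

def ascending_ranks(values):
    """1 = smallest; valid here because neither column has tied values."""
    sorted_values = sorted(values)
    return [sorted_values.index(v) + 1 for v in values]

rank_i = ascending_ranks(iq)
rank_t = ascending_ranks(tv)
d = [a - b for a, b in zip(rank_i, rank_t)]  # signed differences; only d^2 is used
sum_d2 = sum(di * di for di in d)
n = len(iq)
rho = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

print(sum_d2)         # 194
print(round(rho, 6))  # -0.175758
```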
Determining significance

The modern approach to testing whether an observed value of ρ is significantly different from zero (ρ always satisfies −1 ≤ ρ ≤ 1) is to use a permutation test: calculate the probability, under the null hypothesis, of obtaining a value of ρ greater than or equal to the observed one. This approach is almost always superior to traditional methods, unless the data set is so large that computing power is insufficient to generate the permutations, or unless it is difficult to devise an algorithm for creating permutations that are logical under the null hypothesis for the particular case (although such algorithms are usually straightforward).
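A minimal sketch of a one-sided permutation test (alternative: ρ > 0), assuming no ties, is shown below. Under the null hypothesis every pairing of x with a re-ordering of y is equally likely, so the p-value is the fraction of re-orderings whose ρ is at least the observed value. The exact version enumerates all n! orderings; the Monte Carlo version samples them instead. Function names, the `n_resamples` and `seed` parameters, and the "+1" correction in the Monte Carlo estimate are choices made for this illustration.

```python
import itertools
import random

def ranks(values):
    """Ascending ranks 1..n; assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result

def sum_d2(rx, ry):
    return sum((a - b) ** 2 for a, b in zip(rx, ry))

def rho_from(s, n):
    return 1 - 6 * s / (n * (n * n - 1))

def permutation_test_exact(x, y):
    """Exact one-sided test of H0 (no association) against rho > 0.
    Enumerates all n! orderings, so only feasible for small n."""
    n = len(x)
    rx, ry = ranks(x), ranks(y)
    observed = sum_d2(rx, ry)
    # rho falls as sum d^2 grows, so "rho >= observed rho" is the same as
    # "sum d^2 <= observed sum d^2", an exact integer comparison.
    hits = total = 0
    for perm in itertools.permutations(ry):
        total += 1
        if sum_d2(rx, perm) <= observed:
            hits += 1
    return rho_from(observed, n), hits / total

def permutation_test_monte_carlo(x, y, n_resamples=10_000, seed=0):
    """Same one-sided test using random permutations instead of all n! of them."""
    n = len(x)
    rx, ry = ranks(x), ranks(y)
    observed = sum_d2(rx, ry)
    rng = random.Random(seed)
    shuffled = list(ry)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(shuffled)
        if sum_d2(rx, shuffled) <= observed:
            hits += 1
    # Counting the observed arrangement itself keeps the estimate away from zero
    return rho_from(observed, n), (hits + 1) / (n_resamples + 1)

rho, p = permutation_test_exact([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(rho, p)  # 1.0 and 1/120: only the original ordering reaches rho = 1
```

For the IQ and TV example (n = 10) there are 3,628,800 orderings; enumerating them all is possible but slow in pure Python, which is why sampling a few thousand random permutations is a common alternative.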
Although the permutation test is often trivial to perform for anyone with computing resources and programming experience, traditional methods for determining significance are still widely used. The most basic approach is to compare the observed ρ with published tables for various levels of significance. This is a simple solution if the significance only needs to be known within a certain range, or below a certain value, as long as tables covering the desired ranges are available. A reference to such a table is given below. However, generating these tables is computationally intensive, and complicated mathematical tricks have been used over the years to extend them to larger and larger sample sizes, so it is not practical for most people to extend existing tables.

An alternative approach, available for sufficiently large samples, is an approximation based on Student's t-distribution. For sample sizes above about 20, the variable

$$t = \rho \sqrt{\frac{n - 2}{1 - \rho^2}}$$

approximately has a Student's t-distribution with n − 2 degrees of freedom in the null case (zero correlation). In the non-null case (i.e. to test whether an observed ρ is significantly different from a theoretical value, or whether two observed values of ρ differ significantly), tests are much less powerful, though the t-distribution can again be used.

A generalization of the Spearman coefficient is useful when there are three or more conditions, a number of subjects are each observed under all of them, and the observations are predicted to follow a particular order. For example, each subject might be given three trials at the same task, with performance predicted to improve from trial to trial. A test of the significance of the trend between conditions in this situation was developed by E. B. Page and is usually referred to as Page's trend test for ordered alternatives.
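The large-sample t approximation for the null case can be sketched as a small function returning the statistic and its degrees of freedom. Python's standard library has no t-distribution CDF, so the p-value step is left to tables or a statistics package (SciPy's `scipy.stats.t.sf`, for instance, gives the upper-tail probability); the function name here is hypothetical.

```python
import math

def spearman_t_statistic(rho, n):
    """t = rho * sqrt((n - 2) / (1 - rho^2)); under H0 (zero correlation) this is
    approximately Student's t with n - 2 degrees of freedom for larger n (about 20+)."""
    if n < 3:
        raise ValueError("need at least 3 pairs")
    if not -1 < rho < 1:
        raise ValueError("|rho| must be strictly less than 1")
    df = n - 2
    return rho * math.sqrt(df / (1 - rho * rho)), df

# Applied to the worked example above (n = 10 is below the suggested ~20, so only rough)
t, df = spearman_t_statistic(-0.175758, 10)
print(t, df)
# A two-sided p-value could then be read from t tables, or computed with SciPy as
# 2 * scipy.stats.t.sf(abs(t), df)
```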
See also
- Kendall tau rank correlation coefficient
- Rank correlation
- Chebyshev's sum inequality
- Rearrangement inequality
- Pearson product-moment correlation coefficient
References
- [1] Myers, Jerome L.; Well, Arnold D. (2003). Research Design and Statistical Analysis, 2nd ed. Lawrence Erlbaum, p. 508. ISBN 0805840370.
- Spearman, C. (1904). "The proof and measurement of association between two things". American Journal of Psychology, 15, pp. 72–101.
- Kendall, M. G. (1962). Rank Correlation Methods. Griffin.
- Hollander, M.; Wolfe, D. A. (1973). Nonparametric Statistical Methods. Wiley.