Classical test theory

Classical test theory is a body of related psychometric theory that predicts outcomes of psychological testing such as the difficulty of items or the ability of test-takers. Generally speaking, the aim of classical test theory is to understand and improve the reliability of psychological tests.


Classical test theory may be regarded as roughly synonymous with true score theory. The term "classical" refers to the chronology of these models and also contrasts them with the more recent psychometric theories, generally referred to collectively as item response theory, which sometimes bear the appellation "modern", as in "modern latent trait theory".

True and error scores

Classical test theory is based on the decomposition of observed scores into true and error scores. The theory views the observed score x of person i, denoted $x_i$, as a realization of a random variable $X_i$. The person is characterized by a probability distribution over the possible realizations of this random variable. This distribution is called a "propensity distribution". Person i's true score, $t_i$, is axiomatically defined as the expectation of this propensity distribution. Writing $\varepsilon(\cdot)$ for expectation, this definition is formally stated as


(Eq. 1)     $\varepsilon(X_i) = t_i.$


Secondly, the so-called error score for person i, $E_i$, is defined as the difference between i's observed score and true score:


(Eq. 2)     $E_i = X_i - t_i.$


Note that $X_i$ and $E_i$ are random variables, but $t_i$ is a constant. Also note that it follows directly from these definitions that the error score has expectation zero:


(Eq. 3)     $\varepsilon(E_i) = \varepsilon(X_i - t_i) = \varepsilon(X_i) - \varepsilon(t_i) = t_i - t_i = 0.$
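The definitions in Eq. 1 to Eq. 3 can be illustrated with a small simulation. The following is a minimal sketch, not part of the theory itself: it assumes, purely for illustration, that person i's propensity distribution is normal, and the function name `simulate_observed_scores`, the true score of 100 and the error standard deviation of 5 are hypothetical choices.

```python
import random
import statistics

def simulate_observed_scores(true_score, error_sd, n_replications, seed=0):
    """Draw hypothetical repeated observed scores X_i for one person.

    Assumption for illustration only: the propensity distribution is
    normal with mean t_i (the true score) and standard deviation error_sd.
    """
    rng = random.Random(seed)
    return [rng.gauss(true_score, error_sd) for _ in range(n_replications)]

t_i = 100.0                      # hypothetical true score of person i
x_i = simulate_observed_scores(t_i, error_sd=5.0, n_replications=100_000)
e_i = [x - t_i for x in x_i]     # Eq. 2: error score = observed - true

mean_x = statistics.fmean(x_i)   # approximates the expectation in Eq. 1
mean_e = statistics.fmean(e_i)   # approximates Eq. 3, so it is close to zero
print(f"mean observed score: {mean_x:.2f}")
print(f"mean error score:    {mean_e:.2f}")
```

In practice such repeated, independent measurements of one person are not available, which is why the theory moves on to populations of persons.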


Relation to population

The above equations represent the assumptions that classical test theory makes at the level of the individual person. However, the theory is never used to analyze individual test scores; rather, the focus of the theory is on properties of test scores relative to populations of persons. Hence, the next step is to introduce a population-sampling scheme into the structure of classical test theory. When we assume that people are randomly sampled from a population, the true score becomes a random variable too, so that we get the (in)famous equation


(Eq. 4)     $X = T + E.$


Classical test theory is concerned with the relations between the three variables X, T, and E in the population. These relations are used to say something about the quality of test scores. In this regard, the most important concept is that of reliability. The reliability of the observed test scores X, which is denoted as $\rho^2_{XT}$, is defined as the ratio of true score variance $\sigma^2_T$ to the observed score variance $\sigma^2_X$:


(Eq. 5)     $\rho^2_{XT} = \frac{\sigma^2_T}{\sigma^2_X}.$


Because the variance of the observed scores can be shown to equal the sum of the variance of true scores and the variance of error scores (true and error scores are uncorrelated in the population), this is equivalent to


(Eq. 6)     $\rho^2_{XT} = \frac{\sigma^2_T}{\sigma^2_X} = \frac{\sigma^2_T}{\sigma^2_T + \sigma^2_E}.$
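The variance decomposition behind Eq. 6 can be sketched as follows. The symbol $\sigma_{TE}$ for the population covariance of true and error scores is introduced here for illustration and does not appear elsewhere in the article. The key step is that the error score has expectation zero for every person (Eq. 3), so both $\varepsilon(E)$ and $\varepsilon(TE)$ vanish in the population.

```latex
\begin{align*}
\sigma^2_X  &= \sigma^2_{T+E} = \sigma^2_T + \sigma^2_E + 2\,\sigma_{TE}, \\
\sigma_{TE} &= \varepsilon(TE) - \varepsilon(T)\,\varepsilon(E) = 0 - \varepsilon(T)\cdot 0 = 0, \\
\sigma^2_X  &= \sigma^2_T + \sigma^2_E .
\end{align*}
```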


This equation, which expresses reliability as the ratio of signal variance to the sum of signal and noise variance, has intuitive appeal: the reliability of test scores becomes higher as the proportion of error variance in the test scores becomes lower, and vice versa. The reliability is equal to the proportion of the variance in the test scores that we could explain if we knew the true scores. The square root of the reliability is the correlation between true and observed scores.
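The relations in Eq. 4 to Eq. 6 can be checked numerically. The sketch below is illustrative only: it assumes normally distributed true scores (variance 225) and independent normal errors (variance 25), so that the population reliability is 225 / 250 = 0.9 by construction. The helper functions `variance` and `correlation` are hypothetical names written for this example.

```python
import math
import random

def variance(values):
    """Population variance (divides by n)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def correlation(a, b):
    """Pearson correlation between two equal-length lists."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / len(a)
    return cov / math.sqrt(variance(a) * variance(b))

rng = random.Random(1)
n_persons = 50_000
true_scores = [rng.gauss(100, 15) for _ in range(n_persons)]   # T, variance 225
error_scores = [rng.gauss(0, 5) for _ in range(n_persons)]     # E, variance 25
observed = [t + e for t, e in zip(true_scores, error_scores)]  # Eq. 4: X = T + E

reliability = variance(true_scores) / variance(observed)       # Eq. 5
r_xt = correlation(observed, true_scores)                      # correlation of X and T

print(f"reliability (Eq. 5): {reliability:.3f}")
print(f"squared corr(X, T):  {r_xt ** 2:.3f}")
```

Both printed values should lie close to 0.9, reflecting that the squared correlation between observed and true scores equals the reliability. The true scores are available here only because the data are simulated.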


Reliability

Note that reliability is not, as is often suggested in textbooks, a fixed property of tests, but a property of test scores that is relative to a particular population. This is because test scores will not be equally reliable in every population. For instance, as is the case for any correlation, the reliability of test scores will be lowered by restriction of range. Thus, IQ-test scores that are highly reliable in the general population will be less reliable in a population of college students. Also note that test scores are perfectly unreliable for any given individual i, because, as has been noted above, the true score is a constant at the level of the individual, which implies it has zero variance, so that the ratio of true score variance to observed score variance, and hence reliability, is zero. The reason for this is that, in the classical test theory model, all observed variability in i's scores is random error by definition (see Eq. 2). Classical test theory is relevant only at the level of populations, not at the level of individuals.
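The effect of restriction of range can be sketched with simulated data. The example below is an assumption-laden illustration: true scores are normal with mean 100 and standard deviation 15, errors are normal with standard deviation 5, and the "restricted" population is defined by the hypothetical rule of keeping only persons whose true score exceeds 110.

```python
import random

def variance(values):
    """Population variance (divides by n)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def reliability(true_scores, observed_scores):
    """Ratio of true score variance to observed score variance (Eq. 5)."""
    return variance(true_scores) / variance(observed_scores)

rng = random.Random(2)
n_persons = 100_000
true_scores = [rng.gauss(100, 15) for _ in range(n_persons)]
observed = [t + rng.gauss(0, 5) for t in true_scores]

# Hypothetical selection rule: keep only persons with a true score above 110.
selected = [(t, x) for t, x in zip(true_scores, observed) if t > 110]
selected_true = [t for t, _ in selected]
selected_observed = [x for _, x in selected]

full_reliability = reliability(true_scores, observed)
restricted_reliability = reliability(selected_true, selected_observed)

print(f"reliability, full population:       {full_reliability:.3f}")
print(f"reliability, restricted population: {restricted_reliability:.3f}")
```

The error variance is the same in both groups, but the true score variance is much smaller in the selected group, so the same test yields less reliable scores there.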


Reliability cannot be estimated directly since that would require one to observe the true scores, which according to classical test theory is impossible. However, estimates of reliability can be obtained by various means. One way of estimating reliability is by constructing a so-called parallel test. A parallel test is a test that has the property that, for every individual, it yields the same true score and the same observed score variance as the original test. If we have parallel tests $X$ and $X'$, then this means that


(Eq. 7)     $\varepsilon(X_i) = \varepsilon(X'_i)$


and


(Eq. 8)     $\sigma^2_{E_i} = \sigma^2_{E'_i}.$


Under these assumptions, it follows that the correlation between parallel test scores equals reliability (see Lord & Novick, 1968, Ch. 2, for a proof):


(Eq. 9)     $\rho_{XX'} = \frac{\sigma_{XX'}}{\sigma_X \sigma_{X'}} = \frac{\sigma^2_T}{\sigma^2_X} = \rho^2_{XT}.$
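The result in Eq. 9 can be illustrated by simulating two parallel test forms. This is a minimal sketch under stated assumptions: both forms share the same true scores (Eq. 7), their errors are independent normal draws with the same standard deviation (Eq. 8), and the population reliability is 0.9 by construction. The names `form_a` and `form_b` are hypothetical.

```python
import math
import random

def correlation(a, b):
    """Pearson correlation between two equal-length lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / n
    var_a = sum((x - mean_a) ** 2 for x in a) / n
    var_b = sum((y - mean_b) ** 2 for y in b) / n
    return cov / math.sqrt(var_a * var_b)

rng = random.Random(3)
n_persons = 50_000
true_scores = [rng.gauss(100, 15) for _ in range(n_persons)]   # variance 225

# Two parallel forms: same true score, independent errors with equal variance 25.
form_a = [t + rng.gauss(0, 5) for t in true_scores]            # X
form_b = [t + rng.gauss(0, 5) for t in true_scores]            # X'

r_parallel = correlation(form_a, form_b)
print(f"correlation between parallel forms: {r_parallel:.3f}")
```

The correlation is computed from observed scores only, which is what makes the parallel-test approach an estimate of reliability that does not require access to true scores.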


The estimation of reliability by the use of parallel tests is cumbersome, because parallel tests are very hard to come by. In practice the method is rarely used. Instead, researchers use a measure of internal consistency known as Cronbach's α. Consider a test consisting of k items $u_j$, $j = 1, \ldots, k$. The total test score is defined as the sum of the individual item scores, so that for individual i


(Eq. 10)     $X_i = \sum_{j=1}^{k} U_{ij}.$


Then Cronbach's α equals


(Eq. 11)     $\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{j=1}^{k} \sigma^2_{U_j}}{\sigma^2_X}\right).$


Cronbach's α can be shown to provide a lower bound for reliability under rather mild assumptions. Thus, the reliability of test scores in a population is at least as high as the value of Cronbach's α in that population. Because α can be computed from a single administration of a test, this method is empirically feasible and, as a result, very popular among researchers.
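A computation of Cronbach's α might be sketched as follows, using its standard form, k/(k-1) times one minus the ratio of summed item variances to total score variance. The function name `cronbach_alpha` and the data layout (one list of k item scores per person) are assumptions made for this example. The simulated data assume k = 5 items that each equal a common true score plus independent error of equal variance, in which case α should be close to the reliability of the total score, 25 / 30.

```python
import random
import statistics

def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of persons, each a list of k item scores."""
    k = len(item_scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Sample variance of each item across persons.
    item_variances = [
        statistics.variance([row[j] for row in item_scores]) for j in range(k)
    ]
    # Sample variance of the total score (Eq. 10: sum of item scores).
    total_variance = statistics.variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)

rng = random.Random(4)
k_items = 5
data = []
for _ in range(20_000):
    t = rng.gauss(0, 1)                                    # person's true score per item
    data.append([t + rng.gauss(0, 1) for _ in range(k_items)])

alpha = cronbach_alpha(data)
print(f"Cronbach's alpha: {alpha:.3f}")
```

With items that are less homogeneous than in this simulation, α would generally fall below the reliability of the total score, in line with its role as a lower bound.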


As has been noted above, the entire exercise of classical test theory is done to arrive at a suitable definition of reliability. Reliability is supposed to say something about the general quality of the test scores in question. The general idea is that the higher the reliability, the better. Classical test theory does not say how high reliability is supposed to be. In the literature a value over .80 appears to be deemed 'acceptable'; a value over .90 is 'good'. Values between .70 and .80 are seen as mediocre but still defensible; values below .70 are bad. It must be noted that these 'criteria' are not based on reasoned arguments but are the result of convention. Whether they make any sense is unclear.


Alternatives

Classical test theory is by far the most influential theory of test scores in the social sciences. In psychometrics, the theory has been superseded by the more sophisticated models of item response theory (IRT). IRT models, however, are catching on very slowly in mainstream research. One of the main reasons is the lack of widely available, user-friendly software; in addition, IRT is not included in standard statistical packages like SPSS, whereas these packages routinely provide estimates of Cronbach's α. As long as this problem is not solved, classical test theory will probably remain the theory of choice for many researchers.



The Wikipedia article included on this page is licensed under the GFDL.
Images may be subject to relevant owners' copyright.
All other elements are (c) copyright NationMaster.com 2003-5. All Rights Reserved.