Factor analysis

Factor analysis is a statistical method used to explain variability among observed random variables in terms of fewer unobserved random variables called factors. The observed variables are modeled as linear combinations of the factors, plus "error" terms. The information gained about the interdependencies can be used later to reduce the set of variables in a dataset. Factor analysis originated in psychometrics, and is used in behavioral sciences, social sciences, marketing, product management, operations research, and other applied sciences that deal with large quantities of data.


Example

The following example is a simplification for expository purposes, and should not be taken to be realistic.


Suppose a psychologist proposes a theory that there are two kinds of intelligence, "verbal intelligence" and "mathematical intelligence". Note that these are inherently unobservable. Evidence for the theory is sought in the examination scores of 1000 students in each of 10 different academic fields. If each student is chosen randomly from a large population, then each student's 10 scores are random variables. The psychologist's theory may say that, for each of the 10 academic fields, the score averaged over the group of all students who share some common pair of values for verbal and mathematical "intelligences" is some constant times their level of verbal intelligence plus another constant times their level of mathematical intelligence, i.e., it is a linear combination of those two "factors". The numbers, for this particular subject, by which the two kinds of intelligence are multiplied to obtain the expected score, are posited by the theory to be the same for all intelligence level pairs, and are called "factor loadings" for this subject. For example, the theory may hold that the average student's aptitude in the field of amphibiology is

{ 10 × the student's verbal intelligence } + { 6 × the student's mathematical intelligence }.

The numbers 10 and 6 are the factor loadings associated with amphibiology. Other academic subjects may have different factor loadings.


Two students having identical degrees of verbal intelligence and identical degrees of mathematical intelligence may have different aptitudes in amphibiology because individual aptitudes differ from average aptitudes. That difference is called the "error": a statistical term that means the amount by which an individual differs from what is average for his or her levels of intelligence (see errors and residuals in statistics).
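The amphibiology example can be written out directly. This short Python sketch uses the illustrative loadings 10 and 6 from the text; the function names and the intelligence levels 1.5 and 2.0 are hypothetical values, chosen only to show how the expected score and the "error" relate.

```python
# Illustrative amphibiology loadings from the example: 10 (verbal), 6 (mathematical).
AMPHIBIOLOGY_LOADINGS = (10.0, 6.0)


def expected_score(verbal, mathematical, loadings=AMPHIBIOLOGY_LOADINGS):
    """Average score of all students who share these two intelligence levels."""
    l_verbal, l_math = loadings
    return l_verbal * verbal + l_math * mathematical


def error_term(observed, verbal, mathematical, loadings=AMPHIBIOLOGY_LOADINGS):
    """How far one student's score departs from the average for his or her levels."""
    return observed - expected_score(verbal, mathematical, loadings)


# Two hypothetical students with identical levels (verbal 1.5, mathematical 2.0)
# share the same expected score but have different errors.
print(expected_score(1.5, 2.0))    # 27.0  (10 * 1.5 + 6 * 2.0)
print(error_term(30.0, 1.5, 2.0))  # 3.0
print(error_term(25.0, 1.5, 2.0))  # -2.0
```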


The observable data that go into factor analysis would be 10 scores of each of the 1000 students, a total of 10,000 numbers. The factor loadings and levels of the two kinds of intelligence of each student must be inferred from the data. Even the number of factors (two, in this example) must be inferred from the data.


Mathematical model of the same example

In the example above, for i = 1, ..., 1000, the ith student's scores are

$$
\begin{matrix}
x_{1,i} & = & \mu_1 & + & \ell_{1,1} v_i & + & \ell_{1,2} m_i & + & \varepsilon_{1,i} \\
\vdots & & \vdots & & \vdots & & \vdots & & \vdots \\
x_{10,i} & = & \mu_{10} & + & \ell_{10,1} v_i & + & \ell_{10,2} m_i & + & \varepsilon_{10,i}
\end{matrix}
$$

where

  • $x_{k,i}$ is the ith student's score for the kth subject,
  • $\mu_k$ is the mean of the students' scores for the kth subject (assumed to be zero, for simplicity, in the example as described above, which would amount to a simple shift of the scale used),
  • $v_i$ is the ith student's "verbal intelligence",
  • $m_i$ is the ith student's "mathematical intelligence",
  • $\ell_{k,j}$ are the factor loadings for the kth subject, for j = 1, 2,
  • $\varepsilon_{k,i}$ is the difference between the ith student's score in the kth subject and the average score in the kth subject of all students whose levels of verbal and mathematical intelligence are the same as those of the ith student.

In matrix notation, we have

X = μ + LF + ε

where

  • X is a 10 × 1,000 matrix of observable random variables,
  • μ is a 10 × 1 column vector of unobservable constants, added to every column (in this case "constants" are quantities not differing from one individual student to the next; and "random variables" are those assigned to individual students; the randomness arises from the random way in which the students are chosen),
  • L is a 10 × 2 matrix of factor loadings (unobservable constants),
  • F is a 2 × 1,000 matrix of unobservable random variables,
  • ε is a 10 × 1,000 matrix of unobservable random variables.
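A minimal NumPy sketch of the matrix model, simulating data with the shapes listed above. All parameter values (means, loadings, error variances, random seed) are hypothetical. μ is stored as a 10 × 1 column and broadcast across the 1,000 columns, and the factors are drawn with unit variance and no correlation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_subjects, n_students, n_factors = 10, 1000, 2

# Hypothetical parameter values, for illustration only.
mu = rng.uniform(50, 70, size=(n_subjects, 1))        # one mean per subject (10 x 1)
L = rng.uniform(0, 10, size=(n_subjects, n_factors))  # factor loadings (10 x 2)
F = rng.standard_normal((n_factors, n_students))      # rows: v_i and m_i (2 x 1000)
psi = rng.uniform(1, 4, size=n_subjects)              # error variance of each subject
eps = rng.standard_normal((n_subjects, n_students)) * np.sqrt(psi)[:, None]  # 10 x 1000

X = mu + L @ F + eps  # mu is broadcast across all 1000 columns
print(X.shape)        # (10, 1000)
```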

Observe that doubling the scale on which "verbal intelligence" (the first component in each column of F) is measured, while simultaneously halving the factor loadings for verbal intelligence, makes no difference to the model. Thus, no generality is lost by assuming that the standard deviation of verbal intelligence is 1. Likewise for mathematical intelligence. Moreover, for similar reasons, no generality is lost by assuming the two factors are uncorrelated with each other. (However, since any rotation of a solution is also a solution, this makes interpreting the factors difficult. See disadvantages below. In this particular example, if we do not know ex ante that the two types of intelligence are uncorrelated, then we cannot interpret the two factors as the two different types of intelligence. Even if they are uncorrelated, we cannot tell which factor corresponds to verbal intelligence and which corresponds to mathematical intelligence without an outside argument.) The "errors" ε are taken to be independent of each other. The variances of the "errors" associated with the 10 different subjects are not assumed to be equal.
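The indeterminacy described above can be checked numerically. This NumPy sketch uses hypothetical random loadings and factor values. It shows that two transformations leave LF unchanged: rescaling a factor while inversely rescaling its loadings, and rotating the factors by an orthogonal matrix R. The rotation also leaves LLᵀ unchanged, and with it the covariance the model implies for X, so the data alone cannot choose between the solutions.

```python
import numpy as np

rng = np.random.default_rng(1)
L = rng.uniform(0, 10, size=(10, 2))  # hypothetical loadings
F = rng.standard_normal((2, 1000))    # hypothetical factor values

# Scale: double the verbal scale (row 0 of F) and halve the verbal loadings (column 0 of L).
D = np.diag([2.0, 1.0])
L_scaled, F_scaled = L @ np.linalg.inv(D), D @ F
print(np.allclose(L @ F, L_scaled @ F_scaled))  # True

# Rotation: any orthogonal 2 x 2 matrix R gives an equally good solution.
theta = np.pi / 6
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta), np.cos(theta)]])
L_rot, F_rot = L @ R, R.T @ F
print(np.allclose(L @ F, L_rot @ F_rot))        # True
print(np.allclose(L @ L.T, L_rot @ L_rot.T))    # True: same implied covariance
```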


The values of the loadings L, the averages μ, and the variances of the "errors" ε must be estimated from the observed data X.
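One possible estimator is iterated principal-axis factoring, sketched below. It estimates μ by the per-subject averages. It estimates the loadings by the leading eigenvectors of the sample covariance matrix, with the diagonal replaced by communality estimates. The error variances are what remains of each subject's variance. Maximum likelihood is a common alternative. This NumPy version is only an illustration: the function name, iteration limit, tolerance and starting values are assumptions, not a standard API.

```python
import numpy as np


def principal_axis_factoring(X, n_factors, n_iter=500, tol=1e-8):
    """Estimate mu, loadings L and error variances psi from X (variables x cases).

    Iterated principal-axis factoring: one of several possible estimators.
    """
    mu = X.mean(axis=1, keepdims=True)  # per-subject averages (variables x 1)
    S = np.cov(X)                       # sample covariance between subjects
    # Starting communalities: the part of each subject's variance explained
    # by regressing it on all the remaining subjects.
    h2 = np.diag(S) - 1.0 / np.diag(np.linalg.inv(S))
    for _ in range(n_iter):
        reduced = S.copy()
        np.fill_diagonal(reduced, h2)   # keep only shared variance on the diagonal
        eigvals, eigvecs = np.linalg.eigh(reduced)
        top = np.argsort(eigvals)[::-1][:n_factors]
        L = eigvecs[:, top] * np.sqrt(np.clip(eigvals[top], 0.0, None))
        new_h2 = (L ** 2).sum(axis=1)   # communalities implied by the loadings
        converged = np.max(np.abs(new_h2 - h2)) < tol
        h2 = new_h2
        if converged:
            break
    psi = np.diag(S) - h2               # estimated "error" variances
    return mu, L, psi
```

For a 10 × 1,000 array X of scores, `mu_hat, L_hat, psi_hat = principal_axis_factoring(X, n_factors=2)` returns estimates of μ, L and the error variances. Because of the rotational indeterminacy discussed above, L_hat is determined only up to an orthogonal rotation. The product L_hat @ L_hat.T is the quantity to compare with the true LLᵀ.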


Factor analysis in psychometrics

History

Charles Spearman spearheaded the use of factor analysis in the field of psychology and is sometimes credited with the invention of factor analysis. He discovered that schoolchildren's scores on a wide variety of seemingly unrelated subjects were positively correlated, which led him to postulate that a general mental ability, or g, underlies and shapes human cognitive performance. His postulate now enjoys broad support in the field of intelligence research, where it is known as the g theory.


Raymond Cattell expanded on Spearman's idea of a two-factor theory of intelligence after performing his own tests and factor analysis. He used a multi-factor theory to explain intelligence. Cattell's theory addressed alternate factors in intellectual development, including motivation and psychology. Cattell also developed several mathematical methods for adjusting psychometric graphs, such as his "scree" test and similarity coefficients. His research led to the development of his theory of fluid and crystallized intelligence, as well as his 16 Personality Factors theory of personality. Cattell was a strong advocate of factor analysis and psychometrics. He believed that all theory should be derived from research, which supports the continued use of empirical observation and objective testing to study human intelligence.


Applications in psychology

Factor analysis is used to identify "factors" that explain a variety of results on different tests. For example, intelligence research found that people who get a high score on a test of verbal ability are also good on other tests that require verbal abilities. Researchers explained this by using factor analysis to isolate one factor, often called crystallized intelligence or verbal intelligence, that represents the degree to which someone is able to solve problems involving verbal skills.


Factor analysis in psychology is most often associated with intelligence research. However, it has also been used to find factors in a broad range of domains such as personality, attitudes and beliefs. It is linked to psychometrics, as it can assess the validity of an instrument by determining whether the instrument indeed measures the postulated factors.


Advantages

  • Reduction of the number of variables, by combining two or more variables into a single factor. For example, performance at running, ball throwing, batting, jumping and weight lifting could be combined into a single factor such as general athletic ability. Usually, in an item-by-people matrix, factors are selected by grouping related items. In the Q factor analysis technique, the matrix is transposed and factors are created by grouping related people. For example, liberals, libertarians, conservatives and socialists could form separate groups.
  • Identification of groups of inter-related variables, to see how they are related to each other. For example, Carroll used factor analysis to build his Three Stratum Theory. He found that a factor called "broad visual perception" relates to how good an individual is at visual tasks. He also found a "broad auditory perception" factor, relating to auditory task capability. Furthermore, he found a global factor, called "g" or general intelligence, that relates to both "broad visual perception" and "broad auditory perception". This means someone with a high "g" is likely to have both a high "visual perception" capability and a high "auditory perception" capability, and that "g" therefore explains a good part of why someone is good or bad in both those domains.
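The difference between the usual approach and the Q technique comes down to which way the data matrix is read before correlations are computed. In this NumPy sketch the 6 × 40 item-by-people matrix is random, hypothetical data. Factoring `r_mode` would group items; factoring `q_mode` would group people.

```python
import numpy as np

rng = np.random.default_rng(2)
data = rng.standard_normal((6, 40))  # hypothetical: 6 items (rows) x 40 people (columns)

r_mode = np.corrcoef(data)    # 6 x 6: correlations between items (usual approach)
q_mode = np.corrcoef(data.T)  # 40 x 40: correlations between people (Q technique)
print(r_mode.shape, q_mode.shape)  # (6, 6) (40, 40)
```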


Disadvantages

  • "...each orientation is equally acceptable mathematically. But different factorial theories proved to differ as much in terms of the orientations of factorial axes for a given solution as in terms of anything else, so that model fitting did not prove to be useful in distinguishing among theories." (Sternberg, 1977). This means all rotations represent different underlying processes, but all rotations are equally valid outcomes of standard factor analysis optimization. Therefore, it is impossible to pick the proper rotation using factor analysis alone.
  • Factor analysis can be only as good as the data allows. In psychology, where researchers have to rely on more or less valid and reliable measures such as self-reports, this can be problematic.
  • Interpreting factor analysis is based on using a "heuristic", which is a solution that is "convenient even if not absolutely true" (Richard B. Darlington). More than one interpretation can be made of the same data factored the same way, and factor analysis cannot identify causality.

Factor analysis in marketing

The basic steps are:

  • Identify the salient attributes consumers use to evaluate products in this category.
  • Use quantitative marketing research techniques (such as surveys) to collect data from a sample of potential customers concerning their ratings of all the product attributes.
  • Input the data into a statistical program and run the factor analysis procedure. The computer will yield a set of underlying attributes (or factors).
  • Use these factors to construct perceptual maps and other product positioning devices.
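The last step can be illustrated with a small sketch. One simple way to place products on a two-factor perceptual map is to average the factor scores of all ratings of each product. This is an assumption made here, not the only construction. The function name, the scores and the product labels below are hypothetical.

```python
import numpy as np


def perceptual_map_coordinates(factor_scores, product_labels):
    """Mean factor score of each product across all of its ratings.

    factor_scores: array of shape (n_ratings, n_factors), one row per
    respondent-product rating; product_labels: one label per row.
    """
    labels = np.asarray(product_labels)
    return {str(p): factor_scores[labels == p].mean(axis=0) for p in np.unique(labels)}


# Hypothetical scores on two factors for four ratings of two products.
scores = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0], [-3.0, 1.0]])
coords = perceptual_map_coordinates(scores, ["A", "A", "B", "B"])
print(coords["A"].tolist(), coords["B"].tolist())  # [0.5, 0.5] [-2.0, 0.0]
```

Each product's coordinates can then be plotted as a point on the two factor axes.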


Information collection

The data collection stage is usually done by marketing research professionals. Survey questions ask the respondent to rate a product sample or descriptions of product concepts on a range of attributes. Anywhere from five to twenty attributes are chosen. They could include things like ease of use, weight, accuracy, durability, colourfulness, price, or size. The attributes chosen will vary depending on the product being studied. The same question is asked about all the products in the study. The data for multiple products are coded and input into a statistical program such as SPSS, SAS, Stata, or SYSTAT.


Analysis

The analysis will isolate the underlying factors that explain the data. Factor analysis is an interdependence technique: the complete set of interdependent relationships is examined, and there is no specification of dependent variables, independent variables, or causality. Factor analysis assumes that all the rating data on different attributes can be reduced to a few important dimensions. This reduction is possible because the attributes are related. The rating given to any one attribute is partially the result of the influence of other attributes. The statistical algorithm deconstructs the rating (called a raw score) into its various components, and reconstructs the partial scores into underlying factor scores. The degree of correlation between the initial raw score and the final factor score is called a factor loading. There are two approaches to factor analysis: "principal component analysis" (the total variance in the data is considered) and "common factor analysis" (the common variance is considered).
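For principal component analysis, a loading is exactly the correlation between a raw score and a component score, and this can be verified numerically. In this NumPy sketch the ratings are simulated, hypothetical data on four attributes driven by one shared influence. Loadings are computed as eigenvector × √eigenvalue of the correlation matrix. They match the correlations between each standardized attribute and the first component score. The sketch covers the principal-component case only.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500
g = rng.standard_normal(n)  # hypothetical shared influence
# Hypothetical ratings on 4 attributes, each partly driven by g.
ratings = np.column_stack([g + 0.5 * rng.standard_normal(n) for _ in range(4)])

Z = (ratings - ratings.mean(axis=0)) / ratings.std(axis=0)  # standardized raw scores
R = np.corrcoef(Z, rowvar=False)                            # 4 x 4 correlation matrix
eigvals, eigvecs = np.linalg.eigh(R)                        # ascending eigenvalues
v, lam = eigvecs[:, -1], eigvals[-1]                        # first principal component
loadings = v * np.sqrt(lam)                                 # PCA loadings
component = Z @ v / np.sqrt(lam)                            # standardized component score

# Correlation of each standardized attribute with the component score.
corr = np.array([np.corrcoef(Z[:, j], component)[0, 1] for j in range(4)])
print(np.allclose(loadings, corr))  # True
```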


Note that principal component analysis and common factor analysis differ in their conceptual underpinnings. The factors produced by principal component analysis are conceptualized as linear combinations of the variables, whereas the factors produced by common factor analysis are conceptualized as latent variables. Computationally, the only difference is that in common factor analysis the diagonal of the relationship matrix is replaced with communalities (the portion of each variable's variance that it shares with the other variables). As a result, common factor scores are indeterminate and differ depending on the method used to compute them, whereas principal component scores do not depend on the method of computation. Although there have been heated debates over the merits of the two methods, a number of leading statisticians have concluded that in practice there is little difference (Velicer and Jackson, 1990). This makes sense, since the computations are quite similar despite the differing conceptual bases. The similarity is greatest for datasets where communalities are high and/or there are many variables, which reduces the influence of the diagonal of the relationship matrix on the final result (Gorsuch, 1983).
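The computational difference can be made concrete with a single step on a small, hypothetical correlation matrix. This NumPy sketch extracts first-factor loadings twice. The first pass puts 1s on the diagonal (principal components, total variance). The second puts squared multiple correlations on the diagonal, a common starting estimate of the communalities (common variance only). A full common factor analysis would iterate the communalities; this is a one-step illustration, and the function name is hypothetical.

```python
import numpy as np


def first_factor_loadings(R, diagonal):
    """Loadings on the first factor after putting `diagonal` on R's diagonal."""
    M = R.copy()
    np.fill_diagonal(M, diagonal)
    eigvals, eigvecs = np.linalg.eigh(M)  # ascending order
    v = eigvecs[:, -1]
    v = v if v.sum() >= 0 else -v         # fix the arbitrary eigenvector sign
    return v * np.sqrt(eigvals[-1])


# Hypothetical 3 x 3 correlation matrix.
R = np.array([[1.0, 0.6, 0.5],
              [0.6, 1.0, 0.4],
              [0.5, 0.4, 1.0]])

pca = first_factor_loadings(R, diagonal=1.0)        # total variance
smc = 1.0 - 1.0 / np.diag(np.linalg.inv(R))         # squared multiple correlations
common = first_factor_loadings(R, diagonal=smc)     # common variance only
print(pca.round(3), common.round(3))
```

Because the diagonal entries are smaller in the second case, the common-factor loadings account for less total variance than the principal-component loadings. With many variables, the diagonal matters less, which is the effect Gorsuch describes.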


The use of principal components in a semantic space can vary somewhat because the components may only "predict" but not "map" to the vector space. This produces a statistical principal component use where the most salient words or themes represent the preferred basis.


Advantages

  • Both objective and subjective attributes can be used
  • Factor analysis can be used to identify the hidden dimensions or constructs which may or may not be apparent from direct analysis.
  • It is fairly easy to do, inexpensive, and accurate
  • It is based on direct inputs from customers
  • There is flexibility in naming and using dimensions

Disadvantages

  • Usefulness depends on the researchers' ability to develop a complete and accurate set of product attributes - If important attributes are missed the value of the procedure is reduced accordingly.
  • Naming of the factors can be difficult - multiple attributes can be highly correlated with no apparent reason.
  • If the observed variables are completely unrelated, factor analysis is unable to produce a meaningful pattern (though the eigenvalues will highlight this: suggesting that each variable should be given a factor in its own right).
  • If sets of observed variables are highly similar to each other but distinct from other items, factor analysis will assign a factor to them, even though this factor will essentially capture true variance of a single item. In other words, it is not possible to know what the "factors" actually represent; only theory can help inform the researcher on this.
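The eigenvalue behaviour mentioned above can be seen on simulated data, as can the sorted eigenvalues that Cattell's scree test inspects. In this NumPy sketch (hypothetical data, fixed seed), six unrelated variables give correlation-matrix eigenvalues that are all near 1. Six variables sharing one common influence give one large eigenvalue and five small ones.

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 2000, 6

unrelated = rng.standard_normal((n, p))                # no shared influence
shared = rng.standard_normal((n, 1))
related = shared + 0.6 * rng.standard_normal((n, p))   # one common influence


def sorted_eigenvalues(data):
    """Correlation-matrix eigenvalues, largest first (the values a scree plot shows)."""
    return np.sort(np.linalg.eigvalsh(np.corrcoef(data, rowvar=False)))[::-1]


print(sorted_eigenvalues(unrelated).round(2))  # all roughly 1
print(sorted_eigenvalues(related).round(2))    # one large value, the rest well below 1
```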

Factor analysis in physical sciences

Factor analysis has also been widely used in physical sciences such as geochemistry, ecology, and hydrochemistry [1].


In groundwater quality management, it is important to relate the spatial distribution of different chemical parameters to different possible sources, which have different chemical signatures. For example, a sulfide mine is likely to be associated with high levels of acidity, dissolved sulfates and transition metals. These signatures can be identified as factors through R-mode factor analysis, and the location of possible sources can be suggested by contouring the factor scores [2].
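The factor scores are the values that get contoured in such a study. One standard way to compute them is the regression (Thomson) method, sketched here in NumPy: F̂ = Lᵀ Σ⁻¹ (x − μ) with Σ = LLᵀ + Ψ. The method assumes standardized, uncorrelated factors. The function name is hypothetical. The loadings L, error variances psi and means mu are assumed to come from a previously fitted R-mode factor analysis, and each column of X is taken to be one water sample.

```python
import numpy as np


def regression_factor_scores(X, mu, L, psi):
    """Regression-method factor scores for each case (column) of X.

    Assumes factors with zero mean, unit variance and no correlation, so the
    model covariance is Sigma = L L^T + diag(psi).
    X: variables x cases, mu: variables x 1, L: variables x factors,
    psi: one error variance per variable.
    """
    sigma = L @ L.T + np.diag(psi)
    return L.T @ np.linalg.solve(sigma, X - mu)  # factors x cases
```

Each row of the result holds one factor's scores across all samples. Those scores can be mapped and contoured at the sample locations to suggest where a source with that chemical signature lies.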


In geochemistry, different factors can correspond to different mineral associations, and thus to mineralisation [3].


See also

  • Q methodology
  • Product positioning
  • Perceptual mapping
  • List of marketing topics
  • Recommendation system
  • Principal components analysis
  • Louis Leon Thurstone
  • The Mismeasure of Man

Footnotes

  1. Subbarao, C., Subbarao, N.V. and Chandu, S.N. 1995. Characterisation of groundwater contamination using factor analysis. Environmental Geology, 28, 175-180.
  2. Love, D., Hallbauer, D.K., Amos, A. and Hranova, R.K. 2004. Factor analysis as a tool in groundwater quality management: two southern African case studies. Physics and Chemistry of the Earth, 29, 1135-1143.
  3. Barton, E.S. and Hallbauer, D.K. 1996. Trace-element and U-Pb isotope compositions of pyrite types in the Proterozoic Black Reef, Transvaal Sequence, South Africa: Implications on genesis and age. Chemical Geology, 133, 173-199.

References

  • Charles Spearman. Retrieved July 22, 2004, from http://www.indiana.edu/~intell/spearman.shtml
  • Exploratory Factor Analysis - A Book Manuscript by Tucker, L. & MacCallum R. (1993). Retrieved June 8, 2006, from: http://www.unc.edu/~rcm/book/factornew.htm
  • Factor Analysis. (2004). Retrieved July 22, 2004, from http://comp9.psych.cornell.edu/Darlington/factor.htm
  • Factor Analysis. Retrieved July 23, 2004, from http://www2.chass.ncsu.edu/garson/pa765/factor.htm
  • Raymond Cattell. Retrieved July 22, 2004, from http://www.indiana.edu/~intell/rcattell.shtml
  • Sheppard, A. G. (1996). The sequence of factor analysis and cluster analysis: Differences in segmentation and dimensionality through the use of raw and factor scores. Tourism Analysis, 1(Inaugural Volume), 49-57.
  • Sternberg, R. J. (1990). The geographic metaphor. In R. J. Sternberg, Metaphors of mind: Conceptions of the nature of intelligence (pp. 85-111). New York: Cambridge.
  • Stills, D.L. (Ed.). (1989). International encyclopedia of the social sciences: Biographical supplement (Vol. 18). New York: Macmillan.
  • Gorsuch, R. L. (1983) Factor Analysis. Hillsdale, NJ: Lawrence Erlbaum.
  • Velicer, W. F. & Jackson, D. N. (1990). Component analysis versus common factor analysis: Some issues in selecting an appropriate procedure. Multivariate Behavioral Research, 25(1), 1-28.

The Wikipedia article included on this page is licensed under the GFDL.