
Instrumental variable

In statistics, an instrumental variable (IV, or instrument) can be used in regression analysis to produce a consistent estimator when the explanatory variables (covariates) are correlated with the error term. Such correlation can arise from endogeneity, from omitted covariates, or from measurement error in the covariates. In this situation, ordinary linear regression generally produces biased and inconsistent estimates. However, if an instrument is available, consistent estimates may still be obtained. An instrument is a variable that does not itself belong in the regression, that is correlated with the suspect explanatory variable, and that is uncorrelated with the error term.


There are three main requirements for using an IV:

  • The instrument must be correlated with the model's predicting variable.
  • The instrument cannot be correlated with the error term in the second stage model (that is, the instrument cannot suffer from the same problem as the original predicting variable).
  • The instrument must act on the outcome only through the predicting variable, not directly.

Mathematics

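Suppose the data are generated by the linear model y_i = x_i β + ε_i, where x_i is the explanatory variable and ε_i is an unobserved error term. The ordinary least squares (OLS) estimator of β is

\[ \widehat{\beta}_\mathrm{OLS} = \frac{\sum_i x_i y_i}{\sum_i x_i^2} = \frac{\sum_i x_i (x_i \beta + \epsilon_i)}{\sum_i x_i^2} = \beta + \frac{\sum_i x_i \epsilon_i}{\sum_i x_i^2}. \]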

When x and ε are uncorrelated, the second term goes to zero in the limit and the estimator is consistent. When x and ε are correlated, however, the second term does not vanish, and the estimator is biased and inconsistent.
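The limiting argument can be made explicit. As a sketch, assuming (this is not stated above) that the observations are independent and identically distributed with finite second moments and E[x^2] > 0, dividing the numerator and denominator of the second term by the sample size n and applying the law of large numbers gives:

```latex
\[
\mathrm{plim}_{n \to \infty} \, \widehat{\beta}_\mathrm{OLS}
  = \beta + \frac{\mathrm{plim} \, \frac{1}{n} \sum_i x_i \epsilon_i}
                 {\mathrm{plim} \, \frac{1}{n} \sum_i x_i^2}
  = \beta + \frac{\mathrm{E}[x \epsilon]}{\mathrm{E}[x^2]}.
\]
```

Under these assumptions the OLS estimator is consistent exactly when E[xε] = 0.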


An instrumental variable z is one that is correlated with the independent variable x but not with the error term ε. The IV estimator is

\[ \widehat{\beta}_\mathrm{IV} = \frac{\sum_i z_i y_i}{\sum_i z_i x_i} = \frac{\sum_i z_i (x_i \beta + \epsilon_i)}{\sum_i z_i x_i} = \beta + \frac{\sum_i z_i \epsilon_i}{\sum_i z_i x_i}. \]

When z and ε are uncorrelated, the final term vanishes in the limit, so the estimator is consistent. Note that when x is uncorrelated with the error term, x is itself a valid instrument; in that case the OLS estimator is a special case of the IV estimator.
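The contrast between the two estimators can be seen in a small simulation. The following is a minimal sketch, not taken from the article: the data-generating process, the function names simulate and dot, and all parameter values are assumptions chosen for illustration. An unobserved confounder u enters both x and the error term, so x is endogenous, while z shifts x but is independent of the error.

```python
import random


def simulate(n=20000, beta=2.0, seed=0):
    """Draw n observations from y = beta * x + eps with an endogenous x.

    Hypothetical design: u is an unobserved confounder that enters both
    x and eps, so Cov(x, eps) = 1; z affects x but is independent of eps.
    """
    rng = random.Random(seed)
    xs, ys, zs = [], [], []
    for _ in range(n):
        z = rng.gauss(0, 1)
        u = rng.gauss(0, 1)
        x = z + u + rng.gauss(0, 1)   # Var(x) = 3
        eps = u + rng.gauss(0, 1)     # correlated with x through u
        xs.append(x)
        ys.append(beta * x + eps)
        zs.append(z)
    return xs, ys, zs


def dot(a, b):
    """Sum of elementwise products, i.e. sum_i a_i * b_i."""
    return sum(p * q for p, q in zip(a, b))


xs, ys, zs = simulate()
beta_ols = dot(xs, ys) / dot(xs, xs)   # sum x_i y_i / sum x_i^2
beta_iv = dot(zs, ys) / dot(zs, xs)    # sum z_i y_i / sum z_i x_i

print("OLS estimate:", beta_ols)
print("IV estimate: ", beta_iv)
```

In this design the OLS estimate converges to β + Cov(x, ε)/Var(x) = 2 + 1/3 rather than to the true value 2, while the IV estimate should settle near 2 as the sample grows.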


The approach above generalizes in a straightforward way to a regression with multiple explanatory variables. Suppose X is the T x K matrix of explanatory variables resulting from T observations on K variables. Let Z be a T x K matrix of instruments. Then,

\[ \widehat{\beta}_\mathrm{IV} = (Z'X)^{-1}Z'Y = (Z'X)^{-1}Z'(X\beta+\epsilon) = \beta + (Z'X)^{-1}Z'\epsilon. \]
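The matrix formula translates almost directly into NumPy. The sketch below is illustrative only: the simulated data, the choice K = 2 (an intercept plus one endogenous regressor) and the variable names are assumptions, not part of the article. The intercept column is exogenous, so it serves as its own instrument in Z.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 20000                       # number of observations
beta = np.array([1.0, 2.0])     # hypothetical true intercept and slope

z = rng.normal(size=T)          # instrument
u = rng.normal(size=T)          # unobserved confounder
x = z + u + rng.normal(size=T)  # endogenous regressor
eps = u + rng.normal(size=T)    # error term, correlated with x via u

X = np.column_stack([np.ones(T), x])   # T x K matrix of regressors
Z = np.column_stack([np.ones(T), z])   # T x K matrix of instruments
Y = X @ beta + eps

# beta_IV = (Z'X)^{-1} Z'Y, computed by solving the linear system
# rather than forming the inverse explicitly.
beta_iv = np.linalg.solve(Z.T @ X, Z.T @ Y)

# OLS for comparison: (X'X)^{-1} X'Y.
beta_ols = np.linalg.solve(X.T @ X, X.T @ Y)

print("IV: ", beta_iv)
print("OLS:", beta_ols)
```

Solving the system Z'X b = Z'Y is numerically preferable to inverting Z'X, and it requires Z'X to be nonsingular, which fails if an instrument is uncorrelated with every regressor.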

One computational method often used to implement the technique is two-stage least squares (2SLS). One advantage of this approach is that it can efficiently combine information from multiple instruments in over-identified regressions, where there are more instruments than covariates. In the first stage, each endogenous covariate (predictor variable) is regressed on all valid instruments, including the full set of exogenous covariates in the main regression. Since the instruments are exogenous, the fitted values from these regressions are asymptotically uncorrelated with the error term, so intuitively they provide a way to analyze the relationship between the outcome variable and the endogenous covariates. In the second stage, the regression of interest is estimated as usual, except that each endogenous covariate is replaced with its fitted value from the first stage. The slope estimator thus obtained is consistent, although it is generally biased in finite samples. The standard errors reported by a naive second-stage regression are incorrect, because its residuals are computed from the fitted values rather than the original covariates; the sum of squared residuals must be corrected for the standard errors to be computed correctly.

Stage 1: \( \widehat{X} = Z(Z'Z)^{-1}Z'X \)
Stage 2: \( \widehat{\beta}_\mathrm{IV} = (\widehat{X}'\widehat{X})^{-1}\widehat{X}'Y \)

Mathematically, this estimator is identical to the single stage estimator presented above when the number of instruments is the same as the number of covariates.
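The two stages, and the equivalence with the single-stage estimator in the exactly identified case, can be checked numerically. This is a minimal sketch under assumed simulated data; the function name two_stage_least_squares and the design with two instruments z1 and z2 are hypothetical. It computes point estimates only and does not apply the standard-error correction.

```python
import numpy as np


def two_stage_least_squares(Y, X, Z):
    """Point estimates from 2SLS; Z may have more columns than X."""
    # Stage 1: fitted values X_hat = Z (Z'Z)^{-1} Z'X via least squares.
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
    # Stage 2: regress Y on X_hat.
    return np.linalg.lstsq(X_hat, Y, rcond=None)[0]


rng = np.random.default_rng(1)
T = 20000
z1 = rng.normal(size=T)
z2 = rng.normal(size=T)
u = rng.normal(size=T)                      # unobserved confounder
x = z1 + 0.5 * z2 + u + rng.normal(size=T)  # endogenous regressor
Y = 1.0 + 2.0 * x + u + rng.normal(size=T)  # true slope is 2

X = np.column_stack([np.ones(T), x])
Z_just = np.column_stack([np.ones(T), z1])       # as many instruments as covariates
Z_over = np.column_stack([np.ones(T), z1, z2])   # over-identified

b_just = two_stage_least_squares(Y, X, Z_just)
b_over = two_stage_least_squares(Y, X, Z_over)
b_iv = np.linalg.solve(Z_just.T @ X, Z_just.T @ Y)  # single-stage (Z'X)^{-1} Z'Y

print("2SLS, exactly identified:", b_just)
print("single-stage IV:         ", b_iv)
print("2SLS, over-identified:   ", b_over)
```

With Z_just the two estimates agree up to floating-point error; with Z_over the single-stage formula is not defined because Z'X is no longer square, while 2SLS still applies.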


Applications and problems

The instrumental variables estimation (IVE) technique often provides a useful, convenient and ethical alternative to the classical randomized experiment. In a randomized experiment, exogenous variation in treatment is provided by the random assignment of participants to the treatment and control conditions, which requires the investigator to deny the treatment to the control participants. Using IVE, participants can be permitted to self-select into treatment and control, and the investigator can subsequently tease out the exogenous component of the treatment variation using the instrument. Of course, one does not get anything for nothing: the IVE technique is only as good as the instruments it employs.


In comparison to randomized experiments, IV estimates local average treatment effects (LATE) rather than average treatment effects (ATE). The effect of a program is only identified for the subpopulation that is affected by the instrument. For example, using financial aid as an instrument for college attendance (assuming financial aid changed exogenously due to a policy change) only identifies the returns to education for students who attend college solely because of financial aid. Students who would attend college regardless of aid, such as those from wealthy families, are not affected by the instrument, so their returns are not identified.


The technique is useful for solving the errors-in-variables problem and for recovering structural parameters from simultaneous equations models such as supply and demand. Unfortunately, there is no way to prove that a candidate instrument is uncorrelated with the error term, since the error is by definition unobservable. Consequently, one problem lies in the selection and defense of suitable instruments. Good instruments are often created by exogenous policy changes (e.g., the cancellation of a federal student-aid scholarship program), geographic differences in the application of standards (e.g., different states implementing different passing standards for a common exam), or naturally occurring randomness (e.g., the Vietnam draft lottery), each of which produces exogenous variation in the explanatory variable of interest.


Another problem is caused by the selection of "weak" instruments. These are instruments that are very poor predictors of the endogenous explanatory variable in the first-stage equation. In this case, the fitted values obtained from the first stage will have very little variation. Consequently, they are unlikely to have much success in predicting the ultimate outcome when they replace the endogenous variable in the second-stage equation, and the resulting estimates are imprecise.
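The practical effect of a weak instrument can be illustrated by repeating the estimation over many simulated samples. The sketch below is an assumption-laden illustration, not part of the article: the function names iv_estimates and iqr, the first-stage coefficient pi, and the sample sizes are all hypothetical choices. It compares the sampling distribution of the simple IV estimator when the instrument is a strong predictor of x (pi = 1.0) and when it is a very poor one (pi = 0.05).

```python
import numpy as np


def iv_estimates(pi, n=200, reps=2000, beta=2.0, seed=0):
    """IV estimates from `reps` independent samples of size n.

    pi is the (hypothetical) first-stage coefficient of x on z; a value
    near zero makes z a weak instrument.
    """
    rng = np.random.default_rng(seed)
    z = rng.normal(size=(reps, n))
    u = rng.normal(size=(reps, n))               # unobserved confounder
    x = pi * z + u + rng.normal(size=(reps, n))  # first stage
    y = beta * x + u + rng.normal(size=(reps, n))
    # One estimate per row: sum_i z_i y_i / sum_i z_i x_i.
    return (z * y).sum(axis=1) / (z * x).sum(axis=1)


def iqr(a):
    """Interquartile range, a spread measure robust to extreme draws."""
    q75, q25 = np.percentile(a, [75, 25])
    return q75 - q25


strong = iv_estimates(pi=1.0)
weak = iv_estimates(pi=0.05)

print("strong instrument: median", np.median(strong), "IQR", iqr(strong))
print("weak instrument:   median", np.median(weak), "IQR", iqr(weak))
```

The interquartile range is used instead of the standard deviation because, with a weak instrument, the denominator is often close to zero and individual estimates can be extremely large. In this design the weak-instrument estimates are far more dispersed, and their median drifts away from the true value of 2 toward the inconsistent OLS limit.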


The Wikipedia article included on this page is licensed under the GFDL.