In statistics, analysis of variance (ANOVA) is a collection of statistical models and associated procedures that compare means by partitioning the overall observed variance into components. The initial techniques of the analysis of variance were pioneered by the statistician and geneticist Ronald Fisher in the 1920s and 1930s, and the method is sometimes known as Fisher's ANOVA or Fisher's analysis of variance.
There are three conceptual classes of such models:
Fixed-effects models assume that the data come from normal populations which differ in their means.
Random-effects models assume that the data describe a hierarchy of different populations whose differences are constrained by the hierarchy.
Mixed models describe situations where both fixed and random effects are present.
The fundamental technique is a partitioning of the total sum of squares into components related to the effects in the model used. For example, we show the model for a simplified ANOVA with one type of treatment at different levels. (If the treatment levels are quantitative and the effects are linear, a linear regression analysis may be appropriate.)
SS_Total = SS_Error + SS_Treatments
The number of degrees of freedom (abbreviated df) can be partitioned in the same way; it specifies the chi-squared distribution that describes the associated sum of squares.
df_Total = df_Error + df_Treatments
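As a minimal sketch of this partition, the Python snippet below computes the total, error and treatment sums of squares, together with their degrees of freedom, for a one-way layout. The data and the level names (low, medium, high) are invented for illustration only.

```python
# Hypothetical responses for one treatment applied at three levels.
groups = {
    "low": [3.0, 4.0, 5.0],
    "medium": [6.0, 7.0, 8.0],
    "high": [9.0, 10.0, 11.0],
}


def mean(values):
    return sum(values) / len(values)


all_obs = [y for ys in groups.values() for y in ys]
grand_mean = mean(all_obs)

# Total: squared deviations of every observation from the grand mean.
ss_total = sum((y - grand_mean) ** 2 for y in all_obs)
# Error: squared deviations of each observation from its own group mean.
ss_error = sum((y - mean(ys)) ** 2 for ys in groups.values() for y in ys)
# Treatments: squared deviations of group means from the grand mean,
# each weighted by the number of observations in the group.
ss_treatments = sum(
    len(ys) * (mean(ys) - grand_mean) ** 2 for ys in groups.values()
)

df_total = len(all_obs) - 1
df_error = len(all_obs) - len(groups)
df_treatments = len(groups) - 1

print(ss_total, ss_error, ss_treatments)  # 60.0 6.0 54.0
print(df_total, df_error, df_treatments)  # 8 6 2
```

Both identities hold for this data: 60 = 6 + 54 for the sums of squares and 8 = 6 + 2 for the degrees of freedom.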
Fixed-effects model
The fixed-effects model of analysis of variance applies to situations in which the experimenter has subjected the experimental material to several treatments, each of which affects only the mean of the underlying normal distribution of the "response variable".
Random-effects model
Random-effects models are used to describe situations in which incomparable differences in experimental material occur. The simplest example is that of estimating the unknown mean of a population whose individuals differ from each other. In this case, the variation between individuals is confounded with that of the observing instrument.
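One common way to separate the two sources of variation is to measure each individual more than once and apply the method-of-moments (ANOVA) estimator for a balanced one-way layout. The sketch below illustrates that idea. The repeated measurements, the data and the variable names are assumptions added for illustration and are not part of the description above.

```python
# Hypothetical data: three individuals, each measured twice with the
# same instrument (a balanced design is assumed).
measurements = {
    "individual_1": [9.0, 11.0],
    "individual_2": [14.0, 16.0],
    "individual_3": [19.0, 21.0],
}


def mean(values):
    return sum(values) / len(values)


k = len(measurements)                       # number of individuals
n = len(next(iter(measurements.values())))  # measurements per individual
all_obs = [y for ys in measurements.values() for y in ys]
grand_mean = mean(all_obs)

# Within-individual mean square: variation of the observing instrument.
ss_within = sum((y - mean(ys)) ** 2 for ys in measurements.values() for y in ys)
ms_within = ss_within / (k * n - k)

# Between-individual mean square: mixes both sources of variation.
ss_between = sum(n * (mean(ys) - grand_mean) ** 2 for ys in measurements.values())
ms_between = ss_between / (k - 1)

# Method-of-moments estimates of the two variance components
# (the between-individual estimate is truncated at zero).
var_instrument = ms_within
var_individuals = max((ms_between - ms_within) / n, 0.0)

print(var_instrument, var_individuals)  # 2.0 24.0
```

With a single measurement per individual there is no within-individual mean square, so the two components cannot be estimated separately, which is the confounding described above.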
Degrees of freedom
The degrees of freedom indicate the effective number of observations that contribute to a sum of squares in an ANOVA: the total number of observations minus the number of linear constraints in the data.
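A small sketch, using made-up observations, shows the single linear constraint behind the total degrees of freedom: deviations from the sample mean always sum to zero, so once all but one of them are known, the last one is determined.

```python
# Hypothetical observations.
observations = [3.0, 4.0, 5.0, 8.0]
n = len(observations)
sample_mean = sum(observations) / n

deviations = [y - sample_mean for y in observations]

# The linear constraint: deviations from the mean sum to zero.
constraint = sum(deviations)

# The last deviation can therefore be recovered from the other n - 1.
recovered_last = -sum(deviations[:-1])

# n observations minus one linear constraint.
degrees_of_freedom = n - 1

print(deviations)          # [-2.0, -1.0, 0.0, 3.0]
print(recovered_last)      # 3.0
print(degrees_of_freedom)  # 3
```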
The key statistic in ANOVA is the F statistic, which is used to test for differences among group means: it tests whether the means of the groups formed by values of the independent variable (or combinations of values for multiple independent variables) differ by more than would be expected by chance.
That is, the ANOVA F-test is a function of the variance of the set of group means, the overall mean of all observations, and the variances of the observations in each group, weighted by group sample size.
Thus some key ANOVA assumptions are that the groups formed by the independent variable(s) are relatively equal in size and have similar variances on the dependent variable ("homogeneity of variances").
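The F ratio for a one-way layout can be sketched as follows, with deliberately unequal group sizes so that the weighting by sample size is visible. The data are invented, and the snippet stops at the statistic itself rather than computing a p-value.

```python
# Hypothetical groups of unequal size.
groups = [
    [2.0, 4.0],
    [5.0, 6.0, 7.0],
    [11.0, 12.0, 12.0, 13.0],
]


def mean(values):
    return sum(values) / len(values)


n_total = sum(len(g) for g in groups)
grand_mean = sum(sum(g) for g in groups) / n_total

# Between-group variation: each squared deviation of a group mean from
# the overall mean is weighted by that group's sample size.
ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
# Within-group variation: pooled squared deviations inside each group.
ss_within = sum((y - mean(g)) ** 2 for g in groups for y in g)

ms_between = ss_between / (len(groups) - 1)
ms_within = ss_within / (n_total - len(groups))

f_statistic = ms_between / ms_within
print(f_statistic)  # 63.0
```

To judge whether such a value could have occurred by chance, it would be compared against an F distribution with (2, 6) degrees of freedom. A library routine such as SciPy's `scipy.stats.f.sf` could be used for that step, which is outside this sketch.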
When Model is omitted, as in anova() or anova(,...), the model from the most recent GLM command, such as anova(), regress() or poisson(), is used.
For example, the A main effect sum of squares in a two-way unbalanced ANOVA is the sum of squares for 'a' from anova("y=b+a"), and the B main effect sum of squares is the sum of squares for 'b' from anova("y=a+b").
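The dependence on term order can be illustrated in Python, under the assumption that the sums of squares in question are sequential: each term is credited with the drop in residual sum of squares when it is added to the terms before it. The sketch below is not the software's own implementation. The data, the helper names `solve` and `rss`, and the restriction to two-level factors (so that one 0/1 column per factor suffices) are all hypothetical.

```python
def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    aug = [row[:] + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * solution[j] for j in range(i + 1, n))
        solution[i] = (aug[i][n] - tail) / aug[i][i]
    return solution


def rss(columns, y):
    """Residual sum of squares of the least-squares fit of y on columns."""
    p = len(columns)
    xtx = [[sum(u * v for u, v in zip(columns[i], columns[j]))
            for j in range(p)] for i in range(p)]
    xty = [sum(u * v for u, v in zip(columns[i], y)) for i in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(beta[k] * columns[k][i] for k in range(p))
              for i in range(len(y))]
    return sum((obs - fit) ** 2 for obs, fit in zip(y, fitted))


# Hypothetical unbalanced 2 x 2 layout: cell counts are 3, 1, 1 and 3.
y = [1.0, 2.0, 3.0, 4.0, 5.0, 7.0, 8.0, 9.0]
a = [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]
b = [0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 1.0]
one = [1.0] * len(y)

rss_null = rss([one], y)
rss_a = rss([one, a], y)
rss_b = rss([one, b], y)
rss_ab = rss([one, a, b], y)

ss_a_first = rss_null - rss_a  # 'a' entered first, as in "y=a+b"
ss_a_last = rss_b - rss_ab     # 'a' entered last, as in "y=b+a"
ss_b_first = rss_null - rss_b  # 'b' entered first, as in "y=b+a"
ss_b_last = rss_a - rss_ab     # 'b' entered last, as in "y=a+b"

print(round(ss_a_first, 3), round(ss_a_last, 3))  # 45.125 18.375
print(round(ss_b_first, 3), round(ss_b_last, 3))  # 36.125 9.375
```

Because the cell counts are unequal, the sum of squares credited to each factor changes with its position in the model. In a balanced layout the two orderings would give the same values.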
For example, even when a, b, and c are factors, the commands Cmd> regress("y=a+b+c",weights:w); anova() print a summary of the weighted multiple regression, followed by a weighted regression ANOVA table with 1 degree of freedom for each of a, b and c.