ANOVA and Regression
In this video we'll discuss the link between ANOVA and multiple regression.
You may have noticed that the between-group and within-group variances are
expressed in terms of sums of squares and mean squares, which we also use
in multiple regression. It's also no coincidence that the table of sums of
squares and mean squares in multiple regression is often referred to as
the ANOVA table. In fact, multiple regression and ANOVA are technically the
same: a one-way ANOVA is a multiple regression in which group membership is
coded with dummy variables.
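To make the shared vocabulary concrete, here is a minimal Python sketch of the one-way ANOVA quantities computed by hand. The health scores are made-up numbers for three hypothetical diet groups, not data from the video. The sketch shows that the total sum of squares splits into a between-group and a within-group part, and how the mean squares and the F-statistic follow from them.

```python
# Made-up health scores for three hypothetical diet groups (illustration only).
groups = {
    "dry": [5.0, 6.0, 7.0, 6.0],
    "raw": [8.0, 10.0, 9.0],
    "canned": [7.0, 6.0, 8.0, 7.0, 7.0],
}


def mean(xs):
    return sum(xs) / len(xs)


all_scores = [y for ys in groups.values() for y in ys]
grand_mean = mean(all_scores)

# Total SS: every observation around the grand mean.
ss_total = sum((y - grand_mean) ** 2 for y in all_scores)
# Between-group SS: group means around the grand mean, weighted by group size.
ss_between = sum(len(ys) * (mean(ys) - grand_mean) ** 2 for ys in groups.values())
# Within-group SS: every observation around its own group mean.
ss_within = sum((y - mean(ys)) ** 2 for ys in groups.values() for y in ys)

k = len(groups)      # number of groups
n = len(all_scores)  # total number of observations
ms_between = ss_between / (k - 1)
ms_within = ss_within / (n - k)
f_statistic = ms_between / ms_within

print(f"SS_total={ss_total:.3f}  SS_between={ss_between:.3f}  SS_within={ss_within:.3f}")
print(f"MS_between={ms_between:.3f}  MS_within={ms_within:.3f}  F={f_statistic:.3f}")
```

These sums of squares and mean squares are exactly the entries that appear in a regression ANOVA table.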
When the groups are coded with two dummy variables, one for raw food and one
for canned food, the intercept represents the population mean of the last
group, the cats fed on dry food, which is called the reference group. From
this it follows that the regression coefficient $\beta_{raw}$ represents the
population mean of the raw food group minus the population mean of the dry
food group, the reference group. Likewise, $\beta_{canned}$ represents the
population mean of the canned food group minus the population mean of the
dry food group.
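A minimal sketch of this interpretation, assuming made-up health scores and a hand-rolled least-squares fit. The helpers `det3` and `ols3` are hypothetical, written only for this illustration: they solve the normal equations with Cramer's rule, which is fine for three coefficients but is not how statistical software fits models. With dry food as the reference group, the fitted intercept should equal the dry food sample mean, and each dummy coefficient the difference between that group's mean and the dry food mean.

```python
# Made-up health scores (hypothetical data for illustration only).
dry = [5.0, 6.0, 7.0, 6.0]
raw = [8.0, 10.0, 9.0]
canned = [7.0, 6.0, 8.0, 7.0, 7.0]

# Dummy coding with dry food as the reference group.
# Each row of X is [intercept, x_raw, x_canned].
X = ([[1.0, 0.0, 0.0] for _ in dry]
     + [[1.0, 1.0, 0.0] for _ in raw]
     + [[1.0, 0.0, 1.0] for _ in canned])
y = dry + raw + canned


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def ols3(X, y):
    """Hypothetical helper: least squares for a 3-column design matrix.

    Builds the normal equations X'X b = X'y and solves them with Cramer's rule.
    """
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(3)]
    d = det3(xtx)
    # Coefficient k: det of X'X with column k replaced by X'y, divided by det(X'X).
    return [det3([[xty[i] if j == k else xtx[i][j] for j in range(3)]
                  for i in range(3)]) / d
            for k in range(3)]


def mean(xs):
    return sum(xs) / len(xs)


b0, b_raw, b_canned = ols3(X, y)
print(b0, mean(dry))                        # intercept vs. dry food mean
print(b_raw, mean(raw) - mean(dry))         # raw minus dry
print(b_canned, mean(canned) - mean(dry))   # canned minus dry
```

With these numbers the group means are 6, 9 and 7, so the fit returns an intercept of 6, $b_{raw} = 3$ and $b_{canned} = 1$.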
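With an overall F-test we test the null hypothesis that all regression
coefficients except the intercept are zero ($\beta_{raw} = \beta_{canned} = 0$).
So we're testing whether the difference between the raw and dry food groups
is zero and whether the difference between the canned and dry food groups is
zero ($\mu_{raw} - \mu_{dry} = \mu_{canned} - \mu_{dry} = 0$). If we rewrite
this equation, we can see that we also implicitly test whether the difference
between the raw and canned food groups is zero, so that all three population
means are equal ($\mu_{raw} = \mu_{canned} = \mu_{dry}$). This is exactly the
null hypothesis of the ANOVA F-test.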
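The equivalence of the two F-tests can be checked numerically. This sketch uses the same kind of made-up data and the same hypothetical `ols3` helper (Cramer's rule on the normal equations), computes the overall regression F from the regression and residual sums of squares with $p = 2$ dummy predictors, and compares it with the one-way ANOVA F computed directly from the groups. It computes only the F-statistic; the p-value would come from an F distribution with $(2, n - 3)$ degrees of freedom.

```python
# Made-up health scores for three hypothetical diet groups (illustration only).
dry = [5.0, 6.0, 7.0, 6.0]
raw = [8.0, 10.0, 9.0]
canned = [7.0, 6.0, 8.0, 7.0, 7.0]
groups = [dry, raw, canned]
y = dry + raw + canned
# Dummy coding with dry food as the reference group: [intercept, x_raw, x_canned].
X = ([[1.0, 0.0, 0.0] for _ in dry]
     + [[1.0, 1.0, 0.0] for _ in raw]
     + [[1.0, 0.0, 1.0] for _ in canned])


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def ols3(X, y):
    """Hypothetical helper: solve X'X b = X'y for 3 coefficients via Cramer's rule."""
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(3)]
    d = det3(xtx)
    return [det3([[xty[i] if j == k else xtx[i][j] for j in range(3)]
                  for i in range(3)]) / d
            for k in range(3)]


def mean(xs):
    return sum(xs) / len(xs)


# Overall regression F-test.
b0, b_raw, b_canned = ols3(X, y)
fitted = [b0 + b_raw * r[1] + b_canned * r[2] for r in X]
y_bar = mean(y)
n, p = len(y), 2  # p = number of dummy predictors
ss_regression = sum((f - y_bar) ** 2 for f in fitted)
ss_residual = sum((yi - f) ** 2 for yi, f in zip(y, fitted))
f_regression = (ss_regression / p) / (ss_residual / (n - p - 1))

# One-way ANOVA F computed directly from the groups, for comparison.
k = len(groups)
ss_between = sum(len(g) * (mean(g) - y_bar) ** 2 for g in groups)
ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
f_anova = (ss_between / (k - 1)) / (ss_within / (n - k))

print(f_regression, f_anova)
```

For these numbers both statistics come out at 11.75. The degrees of freedom line up because with $k = p + 1$ groups, $k - 1 = p$ and $n - k = n - p - 1$.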
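Since we have only two indicators, we can represent the data visually in a
three-dimensional graph, with the two dummy variables on the horizontal axes
and the health score on the vertical axis. As you can see, the three groups
are located at the corresponding values zero and one of the dummy variables:
writing points as $(x_{raw}, x_{canned})$, the dry food group sits at (0, 0),
the raw food group at (1, 0) and the canned food group at (0, 1). The
regression plane goes through the means of these groups.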
The observations are scattered around these means. In multiple regression,
the null hypothesis corresponds to a flat plane: the group means are all the
same, so the regression coefficients of the dummy variables are zero. As soon
as one or more means differ from the rest, the plane is tilted.
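The geometry can also be written out as a worked equation. This is a sketch in notation assumed here (not necessarily the video's): $b$ denotes the estimated coefficients, $\bar{y}$ the sample group means, and points are written as $(x_{raw}, x_{canned})$. Evaluating the fitted plane at the three group locations reproduces the three group means, and under the null hypothesis the population plane has the same height $\beta_0$ everywhere.

```latex
\begin{aligned}
\hat{y} &= b_0 + b_{raw}\,x_{raw} + b_{canned}\,x_{canned} \\
(0, 0) &:\quad \hat{y} = b_0 = \bar{y}_{dry} \\
(1, 0) &:\quad \hat{y} = b_0 + b_{raw} = \bar{y}_{raw} \\
(0, 1) &:\quad \hat{y} = b_0 + b_{canned} = \bar{y}_{canned} \\
H_0: \beta_{raw} = \beta_{canned} = 0
  &\;\Rightarrow\; \mu_{dry} = \mu_{raw} = \mu_{canned} = \beta_0
\end{aligned}
```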
In regression, the predicted health score for cats that eat raw food is the
mean health score in the raw food group. The same goes for the other
groups. In multiple regression, the variation in the residuals, or prediction
errors, is therefore the variation of the observations in each group around
their group mean. So the residual (error) mean square in multiple regression
is the within-group variance in ANOVA.
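A sketch of this last point, again with made-up scores and the hypothetical `ols3` helper (Cramer's rule on the normal equations): the fitted values for each group equal that group's sample mean, and the residual mean square (residual sum of squares divided by $n - 3$, for three estimated coefficients) matches the pooled within-group variance computed from Python's standard `statistics.variance`.

```python
import statistics

# Made-up health scores for three hypothetical diet groups (illustration only).
dry = [5.0, 6.0, 7.0, 6.0]
raw = [8.0, 10.0, 9.0]
canned = [7.0, 6.0, 8.0, 7.0, 7.0]
groups = [dry, raw, canned]
y = dry + raw + canned
# Dummy coding with dry food as the reference group: [intercept, x_raw, x_canned].
X = ([[1.0, 0.0, 0.0] for _ in dry]
     + [[1.0, 1.0, 0.0] for _ in raw]
     + [[1.0, 0.0, 1.0] for _ in canned])


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def ols3(X, y):
    """Hypothetical helper: solve X'X b = X'y for 3 coefficients via Cramer's rule."""
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(3)]
    d = det3(xtx)
    return [det3([[xty[i] if j == k else xtx[i][j] for j in range(3)]
                  for i in range(3)]) / d
            for k in range(3)]


b0, b_raw, b_canned = ols3(X, y)
# Predicted score for each observation from the fitted plane.
fitted = [b0 + b_raw * r[1] + b_canned * r[2] for r in X]
residuals = [yi - f for yi, f in zip(y, fitted)]

n = len(y)
ms_residual = sum(e ** 2 for e in residuals) / (n - 3)  # 3 estimated coefficients

# Pooled within-group variance: group sample variances weighted by (n_j - 1).
ms_within = sum((len(g) - 1) * statistics.variance(g) for g in groups) / (n - len(groups))

print(ms_residual, ms_within)
```

With these numbers both quantities equal 6/9, about 0.667.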