
Lecture 3

This document discusses measuring the goodness of fit of a regression line. It introduces the coefficient of determination (R-squared) as a measure of how well the regression line fits the data. R-squared compares the regression line to the mean line, and indicates what proportion of the variation in the dependent variable is explained by the independent variable. The document explains how to calculate R-squared and provides an example using data on mean consumption and income. It finds an R-squared value of 0.98, suggesting income explains 98% of the variation in consumption. The document also introduces the F-test as a way to test if the R-squared value is statistically significant.

Uploaded by

Watani Bidami

Introductory Econometrics for Finance

3. Measuring the goodness of fit of the regression line

Siraj M. Sep, 2023


Goodness of fit

• Having calculated the regression line, we now ask whether it provides a good fit for the data.
• Do the observations tend to lie close to, or far away from, the line?
• Even though we have fitted a regression line, by itself this tells us nothing about the closeness of the fit.
• If the fit is poor, perhaps the effect of X upon Y is not so strong after all.

• Even if X has no true effect on Y, we can still calculate a regression line and its slope b. In that case b is likely to be small, but it is unlikely to be exactly zero.
• Measuring the goodness of fit of the data to the line helps us to distinguish between good and bad regressions.
• Generally, some residuals $\hat{\mu}_i$ will be positive and some negative.
• What we hope for is that these residuals around the regression line are as small as possible in absolute value.


• The coefficient of determination is a summary measure that tells how well the sample regression line fits the data.


• The goodness of fit is calculated by comparing two lines: the regression line and the 'mean line' (i.e. a horizontal line drawn at the mean value of Y).
• The regression line must fit the data at least as well as the mean line (if the mean line were the best fit, that is also where the regression line would be), but the question is how much better.

Goodness of fit: Calculation of R squared

This is illustrated in Figure 3.1, which demonstrates the principle behind the calculation of the coefficient of determination, denoted by $R^2$ and usually more simply referred to as 'R squared'.


• The figure shows the mean value of Y, the calculated sample regression line and an arbitrarily chosen sample observation $(X_i, Y_i)$.
• The difference between $Y_i$ and $\bar{Y}$ (length $Y_i - \bar{Y}$) can be divided up into:
  - That part 'explained' by the regression line, $\hat{Y}_i - \bar{Y}$ (i.e. explained by the value of $X_i$).
  - The error term, $Y_i - \hat{Y}_i$.
• In algebraic terms,
$$Y_i - \bar{Y} = (Y_i - \hat{Y}_i) + (\hat{Y}_i - \bar{Y}) \qquad (3.1)$$
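The decomposition in (3.1) can be checked numerically. The following is a minimal Python sketch on made-up data (the `xs` and `ys` values and the helper `fit_line` are illustrative assumptions, not the lecture's consumption data). It fits the least-squares line and shows, observation by observation, that the total deviation equals the error part plus the explained part.

```python
# Made-up data for illustration only (not the lecture's consumption data).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]


def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line of ys on xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope


intercept, slope = fit_line(xs, ys)
y_bar = sum(ys) / len(ys)

rows = []
for x, y in zip(xs, ys):
    y_hat = intercept + slope * x   # fitted value on the regression line
    total = y - y_bar               # Y_i - Y-bar
    error = y - y_hat               # Y_i - Y-hat_i
    explained = y_hat - y_bar       # Y-hat_i - Y-bar
    rows.append((total, error, explained))
    print(f"x={x:.0f}: total {total:+.2f} = error {error:+.2f} + explained {explained:+.2f}")
```

Note that individual terms can be negative, which matters when the decomposition is summed over the sample.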

• A good regression model should 'explain' a large part of the differences between the $Y_i$ values and $\bar{Y}$, i.e. the length $\hat{Y}_i - \bar{Y}$ should be large relative to $Y_i - \hat{Y}_i$.
• A measure of fit could therefore be:
$$\frac{\hat{Y}_i - \bar{Y}}{Y_i - \bar{Y}}$$


• We need to apply this to all observations rather than just a single one; hence we could sum this expression over all the sample observations.
• A problem with this is that some of the terms would take a negative value and offset the positive terms.
• To measure the goodness of fit, we do not want the positive and negative terms to cancel each other out.
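To see the cancellation concretely: for a line fitted by least squares with an intercept, each of the three deviations in (3.1) sums to zero over the sample, so unsquared sums say nothing about fit. A small sketch, on made-up data (all names and values here are illustrative assumptions):

```python
# Made-up data for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Least-squares slope and intercept for a single explanatory variable.
slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
intercept = y_bar - slope * x_bar
fitted = [intercept + slope * x for x in xs]

# Unsquared sums of the three deviations in (3.1).
sum_total = sum(y - y_bar for y in ys)
sum_explained = sum(f - y_bar for f in fitted)
sum_error = sum(y - f for y, f in zip(ys, fitted))

# Each sum is zero up to floating-point rounding.
print(sum_total, sum_explained, sum_error)
```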

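• Hence, to get round this problem, we square each of the terms in equation (3.1) to make them all non-negative, and then sum over the observations. This gives:
$$\sum (Y_i - \bar{Y})^2 \quad \text{the total sum of squares (TSS)}$$
$$\sum (\hat{Y}_i - \bar{Y})^2 \quad \text{the regression sum of squares (RSS)}$$
$$\sum (Y_i - \hat{Y}_i)^2 \quad \text{the error sum of squares (ESS)}$$
• For a regression line fitted by least squares, these satisfy TSS = RSS + ESS.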


• The measure of goodness of fit, $R^2$, is then defined as the ratio of the regression sum of squares to the total sum of squares, i.e.
$$R^2 = \frac{RSS}{TSS} \qquad (3.2)$$
• The better the divergences between $Y_i$ and $\bar{Y}$ are explained by the regression line, the better the goodness of fit, and the higher the calculated value of $R^2$.
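As a hedged illustration of (3.2), the sketch below computes the three sums of squares and $R^2$ for made-up data. It follows the lecture's labels (RSS for the regression sum of squares, ESS for the error sum of squares); the function name `sums_of_squares` and the data are assumptions introduced for the example.

```python
def sums_of_squares(xs, ys):
    """Return (TSS, RSS, ESS) for the least-squares line of ys on xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    fitted = [intercept + slope * x for x in xs]
    tss = sum((y - y_bar) ** 2 for y in ys)               # total
    rss = sum((f - y_bar) ** 2 for f in fitted)           # regression (explained)
    ess = sum((y - f) ** 2 for y, f in zip(ys, fitted))   # error
    return tss, rss, ess


# Made-up data for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

tss, rss, ess = sums_of_squares(xs, ys)
r_squared = rss / tss   # equation (3.2)
print(f"TSS={tss:.2f} RSS={rss:.2f} ESS={ess:.2f} R^2={r_squared:.2f}")
```

For this toy sample the line explains 60% of the variation in Y, and TSS equals RSS plus ESS up to rounding.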

• A value of $R^2 = 1$ (and hence ESS = 0) indicates that all the sample observations lie exactly on the regression line (equivalent to perfect correlation).
• If $R^2 = 0$, then the regression line is of no use at all: X does not influence Y (linearly) at all, and to predict a value of Y one might as well use the mean $\bar{Y}$ rather than the value $X_i$ inserted into the sample regression equation.
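The two limiting cases can be reproduced with a short sketch. The data are invented for the purpose: one series lies exactly on a straight line, the other is an exact U-shaped function of X with no linear component. The second case shows why the qualifier "linearly" matters, since $R^2 = 0$ there even though Y depends entirely on X. The helper `r_squared` is an illustrative name, not something defined in the lecture.

```python
def r_squared(xs, ys):
    """R^2 = RSS / TSS for the least-squares line of ys on xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    rss = sum((intercept + slope * x - y_bar) ** 2 for x in xs)
    tss = sum((y - y_bar) ** 2 for y in ys)
    return rss / tss


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
on_a_line = [1.0 + 2.0 * x for x in xs]    # every point lies on Y = 1 + 2X
u_shaped = [(x - 3.0) ** 2 for x in xs]    # exact, but purely nonlinear, relation

r2_line = r_squared(xs, on_a_line)
r2_u = r_squared(xs, u_shaped)
print(r2_line, r2_u)
```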

• We illustrate the econometric theory developed so far by considering the data given in Lecture 1, which relate mean consumption (Y) to income (X).
• Basic economic theory tells us that, among many variables, income is an important determinant of consumption.
• From the data given in Lecture 1, we obtain the estimated regression line as follows:
$$\hat{Y}_i = 124.316 + 0.6086 X_i \qquad (3.3)$$

• Geometrically, the estimated regression line is as shown in the following figure.


• As we know, each point on the regression line gives an estimate of the mean value of Y corresponding to the chosen X value.
• The value of $\hat{\beta}_2 = 0.6086$, which measures the slope of the line, shows that, within the sample range of X values, as X increases by 1 unit, the estimated increase in mean consumption is about 0.61 units (about 61 cents).
• That is, each additional unit of income is associated, on average, with an increase in personal consumption of about 61 cents.
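The slope interpretation can be made concrete by evaluating equation (3.3) at two income levels one unit apart. In this sketch the coefficients come from (3.3); the income level 1500 is a hypothetical value chosen for illustration, and whether it lies inside the sample range is not known from the slides.

```python
INTERCEPT = 124.316   # estimated intercept from equation (3.3)
SLOPE = 0.6086        # estimated slope from equation (3.3)


def predicted_mean_consumption(income):
    """Estimated mean consumption at a given income, per equation (3.3)."""
    return INTERCEPT + SLOPE * income


income = 1500.0   # hypothetical income level, for illustration only
at_income = predicted_mean_consumption(income)
at_income_plus_one = predicted_mean_consumption(income + 1.0)
change = at_income_plus_one - at_income

print(f"estimated mean consumption at {income:.0f}: {at_income:.3f}")
print(f"change for one extra unit of income: {change:.4f}")
```

Because the fitted relation is linear, the change is the same at every income level; the caveat about the sample range concerns how far the line can be trusted, not the arithmetic.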

• The $R^2$ value of about 0.98 suggests that income explains about 98 percent of the variation in personal consumption.
• Consumption varies around its overall mean value of 1087, and 98% of this variation is explained by variation in income.
• This is quite a respectable figure to obtain, leaving only 2% of the variation in Y to be explained by other factors (or pure random variation).

• The regression seems to make a worthwhile contribution to explaining why consumption differs across observations.
• However, it does not explain the mechanism by which higher income leads to higher consumption.

Testing the significance of $R^2$: the F test

• Another check of the quality of the regression equation is to test whether the $R^2$ value, calculated earlier, is significantly greater than zero.
• This is a test using the F distribution.
• The null hypothesis for the test is $H_0: R^2 = 0$, implying once again that X does not influence Y (hence equivalent to $\beta_2 = 0$).


• The test statistic is:
$$F = \frac{R^2 / 1}{(1 - R^2) / (n - 2)}$$
or, equivalently,
$$F = \frac{RSS / 1}{ESS / (n - 2)}$$


• The F statistic is therefore the ratio of the regression sum of squares to the error sum of squares, each divided by its degrees of freedom (for the RSS there is one degree of freedom because of the one explanatory variable; for the ESS there are n - 2 degrees of freedom).
• A high value of the F statistic (i.e. RSS is large relative to ESS) leads us to reject $H_0$ in favor of the alternative hypothesis, $H_1: R^2 > 0$.
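A minimal sketch of the two equivalent forms of the F statistic follows. The inputs are made-up values (a toy sample with n = 5, RSS = 3.6 and ESS = 2.4, hence $R^2 = 0.6$), not the lecture's consumption data, and the function names are illustrative. The 5% critical value quoted in the comment is the standard tabulated value for 1 and 3 degrees of freedom.

```python
def f_from_r_squared(r_squared, n):
    """F = (R^2 / 1) / ((1 - R^2) / (n - 2)) for one explanatory variable."""
    if n <= 2:
        raise ValueError("need more than two observations")
    if not 0.0 <= r_squared < 1.0:
        raise ValueError("R^2 must lie in [0, 1)")
    return (r_squared / 1) / ((1.0 - r_squared) / (n - 2))


def f_from_sums(rss, ess, n):
    """F = (RSS / 1) / (ESS / (n - 2)), using the lecture's RSS/ESS labels."""
    if n <= 2:
        raise ValueError("need more than two observations")
    return (rss / 1) / (ess / (n - 2))


# Made-up toy values for illustration only.
n = 5
rss, ess = 3.6, 2.4
r_squared = rss / (rss + ess)

f_a = f_from_r_squared(r_squared, n)
f_b = f_from_sums(rss, ess, n)

# Tabulated 5% critical value of F with (1, 3) degrees of freedom.
critical_5pct = 10.13
reject_h0 = f_a > critical_5pct
print(f"F = {f_a:.2f} (both forms agree: {abs(f_a - f_b) < 1e-9}), reject H0: {reject_h0}")
```

With so few observations, even an $R^2$ of 0.6 does not reach significance at the 5% level, which shows that the test depends on the sample size as well as on $R^2$.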


• Evaluating the test for the consumption data used so far gives:

End of Lecture 3
