
OLS/SLR Assessment I: Goodness-of-fit

Key terms: SST, SSE, SSR, MSE, RMSE, R²

• How close? Goodness-of-Fit (GOF) v. Precision/Inference
• Bring on the ANOVA Table! (SST, SSE and SSR)
• Goodness-of-Fit (GOF) metrics
• GOF I: Mean Squared Error (MSE)… and Root MSE (RMSE)
• GOF II: R-squared
• Thinking about R-squared
• Applications
• Comparing SLR models using Goodness-of-Fit (GOF) metrics

How did we do? Goodness-of-Fit (GOF) v. Precision/Inference
 Assessment: How well did we do? How close are the estimated coefficients to the true
parameters, β₀ and β₁? We'll have several answers. None will be entirely satisfactory…
though they will be informative, nonetheless.
 Goodness-of-Fit : Goodness-of-Fit metrics tell us something about the quality of the overall
model, about how well the predicteds fit the actuals. They may not tell us much about how
precisely we've estimated the true parameters. But if we have a lot of data and the Goodness-
of-Fit metrics look good, maybe we should feel pretty good about our estimated coefficients.
 Precision/Inference: While goodness-of-fit metrics tell us something about how well our
estimated model fits the data, they don't directly tell us anything about how precisely we
have estimated the unknown parameters, the true β ' s . Later on, we will have lots to say
about precision of estimation… but that discussion awaits the development of the tools of
statistical inference, including Confidence Intervals and Hypothesis Tests.
 Who knew? They are related! At first glance, Goodness-of-Fit and Precision/Inference
look to be completely unrelated, as one looks at how well a SLR model fits the data whilst
the other considers the precision of estimation of individual parameters. But quite the
contrary!
Stay tuned!
Bring on the ANOVA Table! (SSTs, SSEs and SSRs)

• SST (Total Sum of Squares)

 SST = Σᵢ (yᵢ − ȳ)² = (n − 1)·S_yy, (n − 1) times the sample variance of the actuals

• SSE (Explained Sum of Squares)

 SSE = Σᵢ (ŷᵢ − ȳ)² = (n − 1)·S_ŷŷ, (n − 1) times the sample variance of the predicteds

• SSR (Residual Sum of Squares)

 SSR = Σᵢ (yᵢ − ŷᵢ)² = Σᵢ ûᵢ² = (n − 1)·S_ûû, (n − 1) times the sample variance of the residuals

• SST = SSE + SSR (if there is a constant term in the model)


 Put differently: SST/(n − 1) = SSE/(n − 1) + SSR/(n − 1), or S_yy = S_ŷŷ + S_ûû.

The sample variance of the actuals is the sum of the sample variances of the predicteds and of the residuals.
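The decomposition is easy to verify numerically. A minimal sketch in Python/NumPy (the data-generating process and seed below are made up purely for illustration):

```python
import numpy as np

# Simulated data (hypothetical DGP, for illustration only)
rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = 2.0 + 3.0 * x + rng.normal(size=50)

# OLS slope and intercept for the SLR model with a constant term
b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x      # predicteds
u_hat = y - y_hat        # residuals

SST = np.sum((y - y.mean()) ** 2)      # total sum of squares
SSE = np.sum((y_hat - y.mean()) ** 2)  # explained sum of squares
SSR = np.sum(u_hat ** 2)               # residual sum of squares

print(np.isclose(SST, SSE + SSR))  # True: SST = SSE + SSR
```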
Goodness-of-Fit (GOF) metrics

GOF I: Mean Squared Error (MSE/RMSE)


• Mean Squared Error: MSE = SSR/(n − 2) (in squared units of the y variable)

• Root Mean Squared Error: RMSE = √MSE = √(SSR/(n − 2)) (in units of the y variable)
GOF II: R-squared
• The Coefficient of Determination is defined by: R² = 1 − SSR/SST
• So long as there is a constant term in the model (so the mean predicted value is the
same as the mean actual value), SSE = SST − SSR and:

 R² = 1 − SSR/SST = SSE/SST = [SSE/(n − 1)] / [SST/(n − 1)] = Sample Var(predicted) / Sample Var(actual) = S_ŷŷ / S_yy.
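Both metrics follow directly from the sums of squares. A hedged sketch in Python/NumPy (simulated data; the seed and coefficients are illustrative assumptions):

```python
import numpy as np

# Simulated data (hypothetical, for illustration only)
rng = np.random.default_rng(1)
n = 40
x = rng.normal(size=n)
y = 1.0 + 0.5 * x + rng.normal(scale=0.3, size=n)

# OLS fit with a constant term
b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

SST = np.sum((y - y.mean()) ** 2)
SSR = np.sum((y - y_hat) ** 2)

MSE = SSR / (n - 2)      # in squared units of y
RMSE = np.sqrt(MSE)      # in units of y
R2 = 1 - SSR / SST       # = SSE/SST when a constant is included

# R2 also equals Sample Var(predicted) / Sample Var(actual)
print(np.isclose(R2, np.var(y_hat, ddof=1) / np.var(y, ddof=1)))  # True
```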

Thinking about R-squared

• R² is bounded: By construction, 0 ≤ R² ≤ 1 (if there is a constant term in the model)…
higher values mean that you've done a better job explaining the variation in the actuals.
Don't get too excited if R² is close to 1, or too depressed if it's close to 0. Doing good
econometrics is way more than just maximizing R².
• R² as the Ratio of Variances. Given the results above, R-squared is the ratio of the
Sample Variance of the predicteds to the Sample Variance of the actuals… the percent
of the variation of the actuals explained by the model. This is the most common, and
perhaps the most insightful, interpretation of R².
• R² as the squared correlation between predicteds and actuals. R² is also the square of the
sample correlation between the independent and dependent variables, as well as the square of the
sample correlation between the actuals and predicteds: ρ²_xy = ρ²_yŷ = R².
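The correlation identity is easy to check numerically. A small sketch (made-up data, illustrative seed) in Python/NumPy:

```python
import numpy as np

# Simulated data (hypothetical, for illustration only)
rng = np.random.default_rng(2)
x = rng.normal(size=30)
y = -1.0 + 2.0 * x + rng.normal(size=30)

# OLS fit with a constant term
b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
y_hat = (y.mean() - b1 * x.mean()) + b1 * x

R2 = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
rho_xy = np.corrcoef(x, y)[0, 1]       # corr(independent, dependent)
rho_yyhat = np.corrcoef(y, y_hat)[0, 1]  # corr(actuals, predicteds)

# Both squared correlations reproduce R-squared
print(np.isclose(R2, rho_xy ** 2), np.isclose(R2, rho_yyhat ** 2))
```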

OLS/SLR estimation … more generally

• The two goodness-of-fit metrics (R-squared and MSE/RMSE) tell you something
about how well your model captures/explains the variation in the dependent variable,
y. They alone, however, do not tell you how well you've estimated the unknown
parameter values β₀ and β₁. In some cases, R-squared will be high and MSE/RMSE
will be low, and your parameter estimates will be quite poor… and vice-versa.
• Some examples:
 Suppose you have a sample of size two. With just two data points, R² = 1 and
MSE = 0… and in all likelihood you have miserable estimates of the unknown
parameter values.
 Here are two examples with just five observations randomly generated using a
true relationship given by the solid red line… and the dashed black line shows
the OLS estimated SLR relationship for the given dataset. In both cases, the
R² is above 0.5, and the estimated relationship is all wrong. So n matters too!
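The two-observation case can be seen directly: the fitted line passes through both points exactly, whatever the true relationship. A toy sketch (the two points below are arbitrary, made up for illustration):

```python
import numpy as np

# Any two points with distinct x values (hypothetical data)
x = np.array([1.0, 2.0])
y = np.array([5.0, 3.0])

# The OLS line through two points interpolates them exactly
b1 = (y[1] - y[0]) / (x[1] - x[0])
b0 = y[0] - b1 * x[0]
y_hat = b0 + b1 * x

SSR = np.sum((y - y_hat) ** 2)          # residual sum of squares
SST = np.sum((y - y.mean()) ** 2)       # total sum of squares

# SSR = 0 and R-squared = 1, regardless of the (unknown) true parameters
print(np.isclose(SSR, 0.0), np.isclose(1 - SSR / SST, 1.0))  # True True
```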

Comparing SLR models using Goodness-of-Fit (GOF) metrics

• You can use R 2 and MSE/RMSE to compare the performance of different SLR models…
but only to a limited extent. And you must be careful!
• If the different models all have the same LHS data (so the y's are the same in the different
models… both in terms of number and in terms of values), then the SSTs and S_yy's will be
the same across the models, and you can compare R²'s and MSE/RMSE's. Under these
conditions the R²'s and the MSE/RMSE's will move in opposite directions, since

 R₁² > R₂² ⇔ 1 − SSR₁/SST > 1 − SSR₂/SST ⇔ SSR₁ < SSR₂ ⇔ SSR₁/(n − 2) < SSR₂/(n − 2) ⇔ MSE₁ < MSE₂.
• So under these conditions, models with higher R²'s (and lower MSE/RMSE's) do a better job
of fitting the data, and in that sense are preferable.
• But: If the y's are not the same across the different models, then R²'s and MSE/RMSE's are
not directly comparable and accordingly, they won't tell you much unless you make some
adjustments.
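A sketch of the same-LHS comparison (simulated data; the two regressors and the seed are illustrative assumptions): with a shared y, the model with the higher R² necessarily has the lower MSE.

```python
import numpy as np

# Simulated data: y truly depends on x1; x2 is noise (hypothetical DGP)
rng = np.random.default_rng(3)
n = 60
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 + rng.normal(size=n)

def slr_predict(x, y):
    """Fitted values from an SLR of y on x with a constant term."""
    b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
    b0 = y.mean() - b1 * x.mean()
    return b0 + b1 * x

SST = np.sum((y - y.mean()) ** 2)       # same for both models
ssr = [np.sum((y - slr_predict(x, y)) ** 2) for x in (x1, x2)]
r2 = [1 - s / SST for s in ssr]
mse = [s / (n - 2) for s in ssr]

# With the same y's, the R2 and MSE rankings must agree (in opposite directions)
print((r2[0] > r2[1]) == (mse[0] < mse[1]))  # True
```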
OLS/SLR Assessment I - GOFs: Takeaways
• Goodness of Fit metrics tell you something about how well your OLS/SLR model fits the data… about the
relationship between predicteds and actuals.
• SSTs, SSEs and SSRs capture the variances in the actuals, predicteds and residuals, respectively. SST = SSE +
SSR
• Two standard GOF metrics are (Root) Mean Squared Error (MSE = SSR/(n − 2)) and the Coefficient of
Determination (R-sq = 1 − SSR/SST = SSE/SST = Var(predicteds)/Var(actuals))
• MSE is essentially an average squared deviation of predicteds from actuals… RMSE is the square root thereof.
MSE (RMSE) magnitudes tell you little about how well your model has performed, as they have no uniform scale.
• R-sq is essentially the variance of the predicteds relative to the variance of the actuals… or, the percent of the
variation in the actuals explained by the model. 0 ≤ R-sq ≤ 1; the closer to 1, the more of the variation in the
dependent variable is explained by the model. High R-sq is terrific if nObs is high as well… but maybe not so
much otherwise.
• R-sq is also equal to the square of correlation between the LHS and RHS variables… as well as the square of the
correlation between predicteds and actuals.
• It's OK to compare MSE's and R-sq's across models with the same LHS variable… but if there are changes to the
LHS variable, such comparisons are meaningless without adjustments.
onwards… to OLS/SLR Examples
