Week 9 Lecture Slides - T
Regression
Outline
1. Regression Basics
2. Simple Regression Model
3. Goodness of Fit
Limitations
• The correlation coefficient is a good starting point for analysing the
relationship between variables, but it has limitations:
• It only allows analysis of the relationship between two variables at a
time
• It does not indicate the direction of causality (which variable drives
the other)
• It does not quantify marginal effects (how much one variable changes
when the other changes by one unit)
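The point about marginal effects can be illustrated with a small sketch. The data below are hypothetical (not the lecture's data set), and the helper names `correlation` and `slope` are made up for this example. Rescaling y leaves the correlation unchanged, while the least-squares slope, which measures the change in y per unit of x, scales with it.

```python
from math import sqrt

# Hypothetical illustrative data (not from the lecture).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]


def mean(values):
    return sum(values) / len(values)


def correlation(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def slope(x, y):
    """Least-squares slope of y on x: change in y per one-unit change in x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


# Same data, with y measured in units ten times smaller.
y_rescaled = [10 * b for b in y]

# The correlation is the same (up to rounding) for both versions of y.
print(correlation(x, y), correlation(x, y_rescaled))
# The slope for the rescaled data is ten times larger.
print(slope(x, y), slope(x, y_rescaled))
```

The correlation only tells us how tightly the points cluster around a line; the slope is what tells us how large the effect is.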
Regression analysis produces a line of ‘best fit’ through the data points.
Associated with this line is the regression equation, Yi = β0 + β1Xi + εi,
where β0 is the intercept and β1 is the slope. We need to estimate
the parameters β0 and β1.
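A minimal sketch of how β0 and β1 could be estimated, assuming the line of best fit is found by ordinary least squares (the slides do not name the method, so this is an assumption). The function name `fit_line` and the data are hypothetical; the data lie exactly on the line y = 3 − 2x so the result is easy to check by hand.

```python
def fit_line(x, y):
    """Return (b0, b1), the intercept and slope that minimise the
    sum of squared vertical distances between the points and the line."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope: co-variation of x and y divided by the variation of x.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    b0 = mean_y - b1 * mean_x
    return b0, b1


# Hypothetical data lying exactly on y = 3 - 2x (not the lecture data).
x = [0.0, 1.0, 2.0, 3.0]
y = [3.0, 1.0, -1.0, -3.0]

b0, b1 = fit_line(x, y)
print(b0, b1)  # prints 3.0 -2.0
```

With real data the points do not sit exactly on the line, and the estimates are the values that make the squared errors as small as possible.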
Regression Line
With intercept 40.71 and slope −2.7, the estimated equation is:
Yi = 40.71 − 2.7Xi + εi
We do not have to use just X and Y. The equation is easier to understand if we use more
meaningful letters:
Bri = 40.71 − 2.7Gri + εi
where Bri is the birth rate of country i and Gri is the growth rate of country i.
i is an index that corresponds to a specific country. As we have 12 countries, i = 1, ..., 12.
What is the error term εi?
The error term (in the population) or the residual (in the sample) is the vertical distance between
the data point and the regression line, i.e. the actual value minus the predicted value.
E.g. for Brazil, where GNP growth is 5.1, the model predicts a birth rate of
40.71 − 2.7 × 5.1 = 26.94. As the actual value is 30, the error here is
30 − 26.94 = +3.06 (errors can also be negative).
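The Brazil calculation can be reproduced with a short sketch. The coefficients and the Brazil figures are taken from the slides; the variable and function names are chosen here for illustration only.

```python
# Estimated coefficients from the regression line in the slides.
INTERCEPT = 40.71
SLOPE = -2.7


def predict_birth_rate(growth_rate):
    """Birth rate predicted by the fitted line for a given GNP growth rate."""
    return INTERCEPT + SLOPE * growth_rate


brazil_growth = 5.1   # GNP growth for Brazil
brazil_actual = 30.0  # observed birth rate for Brazil

predicted = predict_birth_rate(brazil_growth)
residual = brazil_actual - predicted  # actual minus predicted

print(round(predicted, 2), round(residual, 2))  # prints 26.94 3.06
```

A positive residual means the actual value lies above the line; a country whose actual birth rate was below its predicted value would have a negative residual.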
R² shows what proportion of the total variation in the data is explained by the model.
The Components of R²
[Figure: an actual observation, what our model predicts, and the regression line]
• If all the points are on the line then the model is perfect and the R² is 1.
• If the points are randomly scattered and there is no linear relationship then the R² will be close to 0.
• For a ‘reasonable’ model we are normally looking for an R² of between 0.6 and 0.8.
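The two extreme cases can be checked with a small sketch. It assumes the common definition R² = 1 − (unexplained variation / total variation), which matches "proportion of variation explained" for a least-squares line with an intercept. The function name `r_squared` and the data are hypothetical.

```python
def r_squared(actual, predicted):
    """Share of the total variation in `actual` explained by `predicted`."""
    mean_actual = sum(actual) / len(actual)
    # Total variation: squared distances of the observations from their mean.
    ss_total = sum((a - mean_actual) ** 2 for a in actual)
    # Unexplained variation: squared residuals (actual minus predicted).
    ss_resid = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return 1 - ss_resid / ss_total


# Hypothetical observations (not the lecture data).
actual = [2.0, 4.0, 6.0, 8.0]

# Case 1: every point lies on the line, so predictions equal the observations.
perfect = [2.0, 4.0, 6.0, 8.0]
print(r_squared(actual, perfect))  # prints 1.0

# Case 2: the model explains nothing and just predicts the mean (5.0).
mean_only = [5.0, 5.0, 5.0, 5.0]
print(r_squared(actual, mean_only))  # prints 0.0
```

Real models fall between these two extremes: the smaller the residuals relative to the total variation, the closer R² is to 1.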