Week 9 Lecture Slides - T

Uploaded by

anmolsinghvirk15
Copyright © All Rights Reserved
Week 9: Covariance, Correlation and Regression
Outline
1. Regression Basics
2. Simple Regression Model
3. Goodness of Fit
Limitations
• The correlation coefficient is a good starting point for analysing the
relationship between variables, but it has limitations:
• It only allows analysis of the relationship between two variables at a
time
• It does not provide information on the direction of the relationship
(causality)
• It does not quantify marginal effects

Solution? Regression analysis!


Regression Analysis
Definition
Regression measures the impact of X upon Y. For example, how the birth
rate (Y) depends on the rate of growth in GNP (X), or how income (Y)
depends on the level of education (X).
Regression also allows several explanatory variables to influence Y. For
example, we could examine how the birth rate (Y) depends on both GNP
growth (X1) and the level of education (X2).
The linear regression model with two covariates has the following form:
Y = β0 + β1 X1 + β2 X2 + ε
Regression Basics
Data
Y - a quantitative dependent variable
X - a quantitative explanatory variable, also called a covariate or
independent variable
Example
Y = annual income, X = number of years of education
Y = micro exam mark, X = seminars attended
Simple Linear Regression Model
The simple linear regression model has the following form:
Y = β0 + β1 X + ε

• Ignoring ε for a moment, this equation expresses Y as a linear function of X
with slope β1 and y-intercept β0.
• If you plot this equation, you will get a straight line (remember linear
demand and supply curves in Microeconomics)
• β1 > 0: Line slopes upward (positive relationship)
• β1 = 0: Horizontal line (Y does not depend on X)
• β1 < 0: Line slopes downward (negative relationship)
• For each 1-unit increase in X, Y changes by β1 units
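The role of the slope can be checked with a short Python sketch. The coefficient values here are assumed purely for illustration and do not come from the lecture.

```python
def line(x, beta0, beta1):
    """Deterministic part of the model: Y = beta0 + beta1 * X (epsilon ignored)."""
    return beta0 + beta1 * x

# Illustrative coefficients only (assumed, not from the lecture).
for beta1 in (2.0, 0.0, -2.0):
    # Change in Y when X rises by one unit, from 0 to 1.
    change = line(1, 10.0, beta1) - line(0, 10.0, beta1)
    print(f"beta1 = {beta1:+.1f}: a 1-unit increase in X changes Y by {change:+.1f}")
```

In each case the change in Y equals β1, whatever the intercept is.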
Simple Linear Regression Model
Example
Birth rate = β0 + β1 Growth Rate + ε

β0 shows the birth rate when the economy experiences zero growth

β1 shows the marginal effect of a 1 percentage point higher growth rate on
the birth rate

Regression Analysis produces a line of ‘best fit’ between the data points.
Associated with this line is the regression equation. We need to estimate
the parameters β0 and β1
Regression Line

Note how the slope is negative. This shows that as the growth rate rises,
the birth rate falls.
Demographic Economic Paradox
• At the 1974 UN population conference in Bucharest, Karan Singh, a former
minister of population in India, illustrated this trend by stating “Development
is the best contraceptive”.

• It is hypothesized that the observed trend has come about as a response to
increased life expectancy, reduced childhood mortality, improved female
literacy and independence, and urbanization.
Line of Best Fit
• The regression line is not placed randomly in the data cloud
• The most popular method of selecting the ‘right’ position of the regression line
is to minimise the ‘error’
• This method is called Ordinary Least Squares (OLS) and leads to the
standard equations for the slope and intercept
• OLS minimises the sum of the squared errors, also referred to as the
Residual Sum of Squares (RSS)
• You will discuss the formal derivation of the slope and intercept of the
regression line in much more detail next year in Econometrics I
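The idea of minimising the RSS can be sketched by brute force: compute the RSS for many candidate lines and keep the smallest. The data and the search grid in this Python sketch are invented for illustration and are not the lecture's data set. OLS finds the minimum exactly with a formula instead of by searching.

```python
def rss(xs, ys, beta0, beta1):
    """Residual sum of squares for the candidate line Y = beta0 + beta1 * X."""
    return sum((y - (beta0 + beta1 * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data, invented for illustration (not the lecture's data set).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [9.1, 7.2, 4.8, 3.1, 0.9]

# Candidate lines on a coarse grid: intercepts 0.0..15.0, slopes -4.00..0.00.
candidates = [(i / 10, j / 100) for i in range(0, 151) for j in range(-400, 1, 5)]

# Keep the candidate with the smallest RSS.
best_b0, best_b1 = min(candidates, key=lambda c: rss(xs, ys, c[0], c[1]))

print("best intercept:", best_b0, "best slope:", best_b1)
print("RSS at best line:", rss(xs, ys, best_b0, best_b1))
print("RSS at a flat line through Y = 5:", rss(xs, ys, 5.0, 0.0))
```

The best candidate is only as precise as the grid, which is why the exact OLS formulas are used in practice.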
Estimation
How do we calculate values for β0 and β1?
Slope :

Intercept :

Important: if we swap X and Y, the results will change!
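For reference, the standard textbook OLS formulas are slope β̂1 = Σ(Xi − X̄)(Yi − Ȳ) / Σ(Xi − X̄)² and intercept β̂0 = Ȳ − β̂1 X̄. It is an assumption that the slides used this form, although any equivalent form gives the same numbers. The Python sketch below applies them to hypothetical data (not the lecture's 12 countries) and shows that regressing X on Y is not simply the inverse of regressing Y on X.

```python
def ols_fit(xs, ys):
    """Return (intercept, slope) of the OLS line for ys regressed on xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Hypothetical data, invented for illustration only.
growth = [1.0, 2.0, 3.0, 4.0, 5.0]       # X
birth = [38.0, 36.5, 31.0, 30.5, 27.0]   # Y

b0_yx, b1_yx = ols_fit(growth, birth)    # Y regressed on X
b0_xy, b1_xy = ols_fit(birth, growth)    # X regressed on Y (roles swapped)

print("Y on X: intercept", b0_yx, "slope", b1_yx)
print("X on Y: intercept", b0_xy, "slope", b1_xy)
# Simply inverting the Y-on-X line would give a slope of 1 / b1_yx instead.
print("1 / slope of Y on X:", 1 / b1_yx)
```

The two slopes would coincide after inversion only if every point lay exactly on the line.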


Example
We now have the full regression equation:

Yi = 40.71 − 2.7Xi + εi

We do not have to use just X and Y. The equation is easier to understand if we use more
meaningful letters:

Bri = 40.71 − 2.7Gri + εi

where Bri is the birth rate of country i and Gri is the growth rate of country i.
The index i corresponds to a specific country. As we have 12 countries, i = 1, …, 12.
What is the error term εi ?
The error term (in the population) or the residual (in the sample) is the vertical difference
between the data point and the regression line.
E.g. for Brazil, where GNP growth is 5.1, the model predicts a birth rate of 26.94. As the
actual value is 30, the error here is 30 − 26.94 = +3.06 (errors can also be negative).

Bri = 40.71 − 2.70Gri + εi → 30 = 40.71 − 2.70 × 5.1 + 3.06
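The Brazil calculation can be reproduced with a few lines of Python. The coefficients and the Brazil figures come from the example above, and the function name is a hypothetical choice made for this sketch.

```python
BETA0 = 40.71   # estimated intercept from the example
BETA1 = -2.70   # estimated slope from the example

def predict_birth_rate(growth):
    """Fitted value on the regression line for a given GNP growth rate."""
    return BETA0 + BETA1 * growth

actual_brazil = 30.0
predicted_brazil = predict_birth_rate(5.1)

# Residual = actual value minus the value on the line.
residual_brazil = actual_brazil - predicted_brazil

print(round(predicted_brazil, 2), round(residual_brazil, 2))
```

A country whose data point lies below the line would have a negative residual.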


Prediction
The regression line may be used for prediction: to predict the birth rate for a country with GNP
growing at 3% p.a., we insert this value into the regression equation and set the error term to zero.

Bri = 40.71 − 2.70Gri + εi

B̂ri = 40.71 − 2.70 × 3 = 32.61 ≈ 32.6

The ‘hat’ (ˆ) above Bri means estimated or predicted

The predicted birth rate in this case is 32.6
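As a minimal Python sketch, the same prediction can be computed for any growth rate. The coefficients are those of the fitted equation, while the variable names and the extra growth rates in the loop are assumptions chosen for illustration.

```python
# Coefficients of the fitted equation from the lecture example.
intercept, slope = 40.71, -2.70

def predicted_birth_rate(growth):
    """Point prediction: the error term is set to zero."""
    return intercept + slope * growth

prediction_at_3 = predicted_birth_rate(3.0)
print(round(prediction_at_3, 1))

# Illustrative growth rates (assumed values, not from the lecture's data).
for g in (0.0, 1.5, 3.0, 4.5):
    print(g, round(predicted_birth_rate(g), 2))
```

At zero growth the prediction equals the intercept, which matches the interpretation of β0.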


How good is the model? Measuring Goodness of Fit
To measure how good the model is we use the coefficient of determination, R²:

R² = ESS / TSS = explained variation / total variation

• 0 ≤ R² ≤ 1
• ESS: Explained (or Regression) sum of squares
• TSS: Total sum of squares
• RSS: Residual sum of squares

R² shows what proportion of the total variation in the data is explained by the model
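Written out in the usual textbook notation, the three sums of squares are as follows. The symbols Ŷi (fitted value) and Ȳ (sample mean of Y) are standard, but using them here is an assumption, since the slide formulas are not reproduced in the text.

```latex
\begin{aligned}
\text{TSS} &= \sum_{i=1}^{n} (Y_i - \bar{Y})^2 \\
\text{ESS} &= \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2 \\
\text{RSS} &= \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 \\
R^2 &= \frac{\text{ESS}}{\text{TSS}} = 1 - \frac{\text{RSS}}{\text{TSS}}
\end{aligned}
```

The last equality relies on the decomposition TSS = ESS + RSS, which holds for an OLS regression that includes an intercept.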
The Components of R²

[Figure: for one data point, the actual observation, the value our model predicts on the
regression line, and the average birth rate]

TSS = ESS + RSS
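The decomposition can be verified numerically. The Python sketch below uses hypothetical data (not the lecture's 12 countries), fits the line with the standard OLS formulas, and computes the three sums of squares and R².

```python
# Hypothetical data, invented for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [38.0, 36.5, 31.0, 30.5, 27.0]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# OLS slope and intercept (standard textbook formulas).
slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
    (x - x_bar) ** 2 for x in xs
)
intercept = y_bar - slope * x_bar
fitted = [intercept + slope * x for x in xs]

tss = sum((y - y_bar) ** 2 for y in ys)                  # total variation
ess = sum((f - y_bar) ** 2 for f in fitted)              # explained variation
rss = sum((y - f) ** 2 for y, f in zip(ys, fitted))      # unexplained variation

r_squared = ess / tss

print("TSS:", tss, "ESS + RSS:", ess + rss)
print("R squared:", r_squared)
```

Up to floating-point rounding, TSS and ESS + RSS agree, so ESS / TSS and 1 − RSS / TSS give the same R².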


Calculating R²

68% of the variation in Y can be explained by the variation in X!


Final Notes on R²

• If all the points are on the line, then the model fits perfectly and R² is 1.
• If the points are randomly scattered and there is no linear relationship, then R² will be close to 0.
• As a rule of thumb, for a ‘reasonable’ model we normally look for an R² between 0.6 and 0.8.
