The Linear Regression Model
Felix Wisnu Handoyo
PR-EPS
Badan Riset dan Inovasi Nasional
Scatter Plot
[Figure: scatter plot of wage (vertical axis) against educ (horizontal axis)]
What is the graph?
It is a scatterplot of Wage against Education.
What is a scatterplot?
A scatterplot shows each observation as a point in two-dimensional space, here pairing each person's Wage with their Education.
Scatter Plot with Estimated Regression Function
[Figure: scatter plot of wage against educ with the fitted values (estimated regression line)]
• What are the mechanics of fitting a linear regression?
• Use the ordinary least squares (OLS) method.
• What is the criterion for obtaining an OLS estimate?
• For the data at hand, we would like to minimize the sum of squared prediction errors.
• Construct a probabilistic model to describe the properties of the fitted line and the corresponding predictions.
• We assume that Wage is linearly related to an exogenously determined variable, Educ.
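The OLS criterion above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data are synthetic (the wage data behind the slides are not reproduced here), and the names `educ`, `wage`, `sse`, `b0_hat`, and `b1_hat` are hypothetical.

```python
import numpy as np

# Synthetic data (an assumption for illustration, not the slides' data set)
rng = np.random.default_rng(0)
educ = rng.integers(0, 19, size=200).astype(float)
wage = 1.0 + 0.5 * educ + rng.normal(0.0, 2.0, size=200)


def sse(b0, b1):
    """Sum of squared prediction errors for the line b0 + b1 * educ."""
    return float(np.sum((wage - b0 - b1 * educ) ** 2))


# Closed-form OLS estimates for a regression with one regressor
dx = educ - educ.mean()
b1_hat = np.sum(dx * (wage - wage.mean())) / np.sum(dx ** 2)
b0_hat = wage.mean() - b1_hat * educ.mean()

print("intercept:", b0_hat, "slope:", b1_hat)
print("SSE at the OLS estimates:", sse(b0_hat, b1_hat))
print("SSE after nudging the slope:", sse(b0_hat, b1_hat + 0.05))
```

Any other choice of intercept and slope gives a larger sum of squared errors than the OLS pair, which is what the last two printed lines compare.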
The Linear Regression
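$\text{Wage}_i = \beta_0 + u_i$
$\hat{\beta}_0$ is the average value of Wage.
$\text{Wage}_i = \beta_0 + \beta_1 \text{female}_i + u_i$
$\hat{\beta}_1$ is the difference in average wage between females and males; $\hat{\beta}_0$ is now the average wage for males.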
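Both interpretations can be checked numerically. The sketch below uses synthetic data (an assumption; the numbers are not the slides' estimates), codes `female` as 1 for women and 0 for men, and uses hypothetical variable names throughout.

```python
import numpy as np

# Synthetic wages with a 0/1 female indicator (illustrative assumption)
rng = np.random.default_rng(1)
n = 300
female = rng.integers(0, 2, size=n).astype(float)
wage = 7.0 - 2.5 * female + rng.normal(0.0, 3.0, size=n)

# Regression on a constant only: the estimate is the sample mean of wage
X_const = np.ones((n, 1))
b_const = np.linalg.lstsq(X_const, wage, rcond=None)[0]

# Regression on a constant and the female dummy
X_dummy = np.column_stack([np.ones(n), female])
b_dummy = np.linalg.lstsq(X_dummy, wage, rcond=None)[0]

mean_male = wage[female == 0].mean()
mean_female = wage[female == 1].mean()

print("intercept-only estimate:", b_const[0], "sample mean:", wage.mean())
print("dummy model intercept:", b_dummy[0], "male mean:", mean_male)
print("dummy coefficient:", b_dummy[1], "female minus male mean:", mean_female - mean_male)
```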
The Linear Regression
The regression t-test for the significance of the coefficient on female is identical to the two-sample t-test (with equal variances assumed) for a difference in mean wage between the female group and the male group: the estimated difference in means, the t-statistic, and the p-value are the same.
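A minimal sketch of this equivalence, assuming synthetic data, the usual homoskedastic OLS standard errors, and the pooled-variance (equal-variance) form of the two-sample t-test. All names are hypothetical. Because both statistics share the same value and the same n - 2 degrees of freedom, the p-values coincide as well.

```python
import numpy as np

# Synthetic data (illustrative assumption)
rng = np.random.default_rng(2)
n = 300
female = rng.integers(0, 2, size=n).astype(float)
wage = 7.0 - 2.5 * female + rng.normal(0.0, 3.0, size=n)

# OLS of wage on a constant and female, with classical standard errors
X = np.column_stack([np.ones(n), female])
xtx_inv = np.linalg.inv(X.T @ X)
beta = xtx_inv @ X.T @ wage
resid = wage - X @ beta
sigma2 = resid @ resid / (n - 2)          # residual variance, n - 2 d.o.f.
se_beta = np.sqrt(sigma2 * np.diag(xtx_inv))
t_reg = beta[1] / se_beta[1]

# Pooled-variance two-sample t-test on the same groups
w_f, w_m = wage[female == 1], wage[female == 0]
n_f, n_m = w_f.size, w_m.size
s2_pooled = ((n_f - 1) * w_f.var(ddof=1) + (n_m - 1) * w_m.var(ddof=1)) / (n - 2)
t_two_sample = (w_f.mean() - w_m.mean()) / np.sqrt(s2_pooled * (1 / n_f + 1 / n_m))

print("regression t-statistic:", t_reg)
print("two-sample t-statistic:", t_two_sample)
```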
The Linear Regression
[Figure, two panels: scatter plot of wage against female; scatter plot of wage against female with the fitted values]
Dummy Trap
• The dummy variable trap refers to the problem that not all categories can be included in a regression that also has an intercept: one category needs to be left out, and it serves as the base or reference category.
• For example, male and female cannot both be included in the regression because of perfect collinearity.
• Using Stata, if we include both male and female, one of them is omitted because of collinearity.
[Table: regressions of wage, columns (1) reg_female, (2) reg_male, (3) reg_female_male; surviving entry: female, -2.512***]
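The perfect collinearity behind the dummy variable trap shows up as a rank-deficient design matrix. The sketch below uses a tiny made-up indicator vector (an assumption for illustration); `X_trap` and `X_ok` are hypothetical names.

```python
import numpy as np

# Made-up indicator for eight people (illustrative assumption)
female = np.array([1, 0, 1, 1, 0, 0, 1, 0], dtype=float)
male = 1.0 - female
const = np.ones_like(female)

# Intercept plus both dummies: const = female + male, so the columns are linearly dependent
X_trap = np.column_stack([const, female, male])
# Intercept plus one dummy: male becomes the base category
X_ok = np.column_stack([const, female])

rank_trap = np.linalg.matrix_rank(X_trap)
rank_ok = np.linalg.matrix_rank(X_ok)

print("columns:", X_trap.shape[1], "rank:", rank_trap)  # 3 columns, rank 2
print("columns:", X_ok.shape[1], "rank:", rank_ok)      # 2 columns, rank 2
```

With three columns but rank two, the normal equations have no unique solution, which is why one category has to be dropped.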
$i = 1, 2, 3, \ldots, n$, with $n = 526$
How do we predict $\text{Wage}_i$ given $\text{Educ}_i$?
For simplicity, denote $\text{Wage}_i$ as $y_i$ and $\text{Educ}_i$ as $x_i$.
The Linear Regression
What is the expectation of $y_i$ given $x_i = x^*$, that is, $E(y_i \mid x_i = x^*)$?
We want to find the values of $\beta_0$ and $\beta_1$ that minimize the sum of squared residuals.
What are residuals?
What are fitted values?
Given a set of estimates $\hat{\beta}_0$ and $\hat{\beta}_1$ of $\beta_0$ and $\beta_1$, what are the fitted values of $y$?
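For reference, the minimization problem can be written out as follows. This is the standard textbook derivation for a single regressor, stated with the $y_i$, $x_i$ notation introduced above; the symbols $S$, $\bar{x}$, and $\bar{y}$ are introduced here for illustration and do not appear on the slides.

```latex
% Objective: sum of squared residuals as a function of candidate coefficients
S(\beta_0, \beta_1) = \sum_{i=1}^{n} \left( y_i - \beta_0 - \beta_1 x_i \right)^2

% First-order conditions at the minimizers \hat{\beta}_0, \hat{\beta}_1
\sum_{i=1}^{n} \left( y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i \right) = 0,
\qquad
\sum_{i=1}^{n} x_i \left( y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i \right) = 0

% Solving the two equations (requires some variation in x)
\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}
```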
The Linear Regression
The variable $\hat{y}_i$ represents the fitted value of $y_i$ when the disturbance term is set to its expected value, zero.
Is the fitted value $\hat{y}_i$ the same as the actual value of $y_i$?
We define the difference between the actual value $y_i$ and the fitted value $\hat{y}_i$ as the residual (or in-sample prediction error), $\hat{u}_i = y_i - \hat{y}_i$.
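Fitted values and residuals can be computed directly once the estimates are in hand. The sketch below assumes synthetic data and hypothetical names (`y_hat`, `u_hat`); it also prints two properties that follow from the OLS first-order conditions when an intercept is included: the residuals sum to zero and are uncorrelated with the regressor in the sample.

```python
import numpy as np

# Synthetic data (illustrative assumption)
rng = np.random.default_rng(5)
x = rng.integers(0, 19, size=150).astype(float)
y = 1.0 + 0.5 * x + rng.normal(0.0, 2.0, size=150)

# OLS estimates of the intercept and slope
X = np.column_stack([np.ones_like(x), x])
b0_hat, b1_hat = np.linalg.lstsq(X, y, rcond=None)[0]

y_hat = b0_hat + b1_hat * x   # fitted values: disturbance set to zero
u_hat = y - y_hat             # residuals: actual minus fitted

print("largest |residual|:", np.abs(u_hat).max())   # fitted and actual values differ
print("sum of residuals:", u_hat.sum())             # zero up to rounding error
print("sum of x * residual:", (x * u_hat).sum())    # zero up to rounding error
```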
The Linear Regression
$\text{Wage}_i = \beta_0 + \beta_1 \text{Educ}_i$
$y_i = \beta_0 + \beta_1 x_i$
Fitted value?
Interaction terms with indicator variable
• Interaction terms for the variables female and married can be specified in two different ways.
1. Include female, married, and female*married in the regression.
2. Create four categories (female*single, male*single, female*married, and male*married) and include three of them in the regression model; the fourth (omitted) category serves as the base/reference category.
Interaction terms with indicator variable
[Figure: scatter plots of wage against educ with fitted values, and with separate fitted lines for females (female_educ) and males (male_educ)]
                 (1)                (2)                (3)
VARIABLES        interaction_term   model for female   model for male
                                    wage               wage
educ             0.539***           0.453***           0.539***
                 (0.0642)           (0.0580)           (0.0774)
female           -1.199
                 (1.325)
femaleXeduc      -0.0860
                 (0.104)
Constant         0.200              -0.998             0.200
                 (0.844)            (0.729)            (1.016)
Observations     526                252                274
R-squared        0.260              0.197              0.152
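The three columns are linked by simple arithmetic: the female slope equals the educ coefficient plus the femaleXeduc coefficient (0.539 + (-0.0860) = 0.453), and the female intercept equals the constant plus the female coefficient (0.200 + (-1.199) = -0.999, which matches -0.998 up to rounding of the reported values). The sketch below reproduces this relationship on synthetic data; the data and all names are assumptions for illustration, not the slides' data set.

```python
import numpy as np

# Synthetic data (illustrative assumption)
rng = np.random.default_rng(4)
n = 500
educ = rng.integers(6, 19, size=n).astype(float)
female = rng.integers(0, 2, size=n).astype(float)
wage = (0.2 + 0.55 * educ - 1.2 * female
        - 0.1 * female * educ + rng.normal(0.0, 3.0, size=n))


def ols(X, y):
    """Least-squares coefficients for y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


# Fully interacted model: constant, educ, female, female*educ
X_int = np.column_stack([np.ones(n), educ, female, female * educ])
c_const, c_educ, c_female, c_fx = ols(X_int, wage)

# Separate regressions of wage on educ for each group
is_f = female == 1
b_f = ols(np.column_stack([np.ones(is_f.sum()), educ[is_f]]), wage[is_f])
b_m = ols(np.column_stack([np.ones((~is_f).sum()), educ[~is_f]]), wage[~is_f])

print("male model:  ", b_m, "vs", (c_const, c_educ))
print("female model:", b_f, "vs", (c_const + c_female, c_educ + c_fx))
```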
A Chow test checks for significant differences in coefficients between females and males. Here the coefficients are jointly significantly different for females and males, so two separate models for females and males should be estimated.
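A minimal sketch of the Chow F-statistic, assuming synthetic data and the classical form of the test: the pooled regression of wage on educ is the restricted model, and the two group-specific regressions together form the unrestricted model. The names `ssr`, `k`, and `chow_f` are hypothetical, and the statistic would be compared with an F distribution with k and n - 2k degrees of freedom.

```python
import numpy as np

# Synthetic data (illustrative assumption)
rng = np.random.default_rng(6)
n = 500
educ = rng.integers(6, 19, size=n).astype(float)
female = rng.integers(0, 2, size=n).astype(float)
wage = (0.2 + 0.55 * educ - 1.2 * female
        - 0.1 * female * educ + rng.normal(0.0, 3.0, size=n))


def ssr(X, y):
    """Sum of squared residuals from an OLS fit of y on the columns of X."""
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    r = y - X @ b
    return float(r @ r)


X = np.column_stack([np.ones(n), educ])
k = X.shape[1]                      # coefficients per group (constant, educ)
is_f = female == 1

ssr_pooled = ssr(X, wage)                                         # restricted model
ssr_groups = ssr(X[is_f], wage[is_f]) + ssr(X[~is_f], wage[~is_f])  # unrestricted model

chow_f = ((ssr_pooled - ssr_groups) / k) / (ssr_groups / (n - 2 * k))
print("Chow F-statistic:", chow_f)
```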
Thank You