Lecture 2 2025
This table presents OLS estimates of the relationship between log employment growth
(1998-2005) and treatment, defined as having received a completed PMGSY road by
2005. The sample is all locations that received a PMGSY road before 2012. Column 1
presents the estimate controlling only for 1998 (log) employment and village population.
Column 2 introduces state fixed effects. Column 3 introduces standard village-level controls:
share of land irrigated, log land area, distance from the nearest town, and number of non-farm
industries present in 1998. Column 4 limits the sample to villages in which the largest habitation
had fewer than 1500 people. Standard errors are clustered at the district level.
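A rough sketch of a Column 1 style specification (treatment plus baseline log employment and a population control) with district-clustered standard errors is below. It runs on simulated data, not the PMGSY data; every variable name, the true effect of 0.1, and the choice of log population are assumptions made for illustration, and state fixed effects are left out. The clustered variance is the basic CR0 "sandwich" without a small-sample correction, so packaged routines may report slightly larger standard errors.

```python
import numpy as np

def ols_cluster(y, X, groups):
    """OLS coefficients with cluster-robust (CR0, no small-sample correction) standard errors."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    k = X.shape[1]
    meat = np.zeros((k, k))
    for g in np.unique(groups):
        idx = groups == g
        score = X[idx].T @ resid[idx]  # sum of x_i * u_i within cluster g
        meat += np.outer(score, score)
    vcov = XtX_inv @ meat @ XtX_inv   # "sandwich": bread @ meat @ bread
    return beta, np.sqrt(np.diag(vcov))

# Hypothetical simulated data: villages nested in districts (NOT the PMGSY data).
rng = np.random.default_rng(0)
n_districts, villages_per = 100, 30
district = np.repeat(np.arange(n_districts), villages_per)
n = district.size
log_emp_1998 = rng.normal(5.0, 1.0, n)
log_pop = rng.normal(7.0, 0.5, n)
treated = rng.integers(0, 2, n).astype(float)          # 1 = completed road by 2005 (hypothetical)
district_shock = rng.normal(0, 0.2, n_districts)[district]  # common shock within a district
growth = (0.1 * treated - 0.05 * log_emp_1998 + 0.02 * log_pop
          + district_shock + rng.normal(0, 0.3, n))

X = np.column_stack([np.ones(n), treated, log_emp_1998, log_pop])
beta, se = ols_cluster(growth, X, district)
print(f"treatment effect: {beta[1]:.3f} (clustered SE {se[1]:.3f})")
```

Clustering by district allows the errors of villages in the same district to be correlated (here through `district_shock`), which is the reason for clustering given in the table note.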
Interpreting “Coefficients”
Dependent Variable: Health
Regression Coefficients:
Income: 2.5
Education: 1.5
Health Infr.: 4.0
Dust Particles: -0.7
• Income (2.5): a one-unit increase in Income, everything else the same, increases health by 2.5 units.
• Dust Particles (-0.7): one additional unit of Dust Particles, everything else the same, decreases health by 0.7 units.
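The "everything else the same" reading can be checked numerically. The sketch below simulates hypothetical data whose true coefficients are the slide's values (2.5, 1.5, 4.0, -0.7); the distributions and the link between Education and Income are assumptions. It fits OLS, then raises Income by one unit for one hypothetical person while holding the other regressors fixed: the change in predicted health is exactly the estimated Income coefficient.

```python
import numpy as np

# Hypothetical simulated data; only the true coefficients come from the slide.
rng = np.random.default_rng(1)
n = 5000
income = rng.normal(10, 2, n)
education = 0.5 * income + rng.normal(8, 2, n)   # assumed: correlated with income
health_infr = rng.normal(3, 1, n)
dust = rng.normal(20, 5, n)
health = (2.5 * income + 1.5 * education + 4.0 * health_infr
          - 0.7 * dust + rng.normal(0, 1, n))

X = np.column_stack([np.ones(n), income, education, health_infr, dust])
beta, *_ = np.linalg.lstsq(X, health, rcond=None)

# Predicted health for one (hypothetical) person, then +1 unit of income, rest unchanged.
person = np.array([1.0, 10.0, 12.0, 3.0, 20.0])
bumped = person.copy()
bumped[1] += 1.0
change = bumped @ beta - person @ beta
print(np.round(beta[1:], 2))
print(f"change in predicted health: {change:.2f}")
```

Because Education is included as a regressor, the Income coefficient stays close to 2.5 even though the two variables are correlated.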
Error Term
• The error term cannot be partialled out, because it is not observed.
• So the interpretation of coefficients holds only under the following scenario:
“When you increase, say, Income by one unit, the
error term should not change with the change in
income.”
Equivalently: “When you increase, say, Income by one unit, no
unobservable variable relevant to explaining health
should change with the change in income.”
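One standard way to write this condition formally is the zero-conditional-mean assumption. The sketch below uses the slide's variables, with X_i collecting the four regressors; the notation is assumed rather than taken from the lecture.

```latex
\begin{aligned}
\text{Health}_i &= \beta_0 + \beta_1\,\text{Income}_i + \beta_2\,\text{Education}_i
  + \beta_3\,\text{HealthInfr}_i + \beta_4\,\text{Dust}_i + u_i \\
E[\text{Health}_i \mid X_i] &= \beta_0 + \beta_1\,\text{Income}_i + \dots
  + \beta_4\,\text{Dust}_i + E[u_i \mid X_i] \\
\text{if } E[u_i \mid X_i] = 0:\quad
\frac{\partial\, E[\text{Health}_i \mid X_i]}{\partial\, \text{Income}_i} &= \beta_1 = 2.5
\end{aligned}
```

If E[u_i | X_i] moved with Income, the derivative would pick up that movement as well, and β_1 alone would no longer be the effect of Income.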
“Consistent Estimation”
• The estimated coefficient will be correct only if
the error term does not co-vary with any of the observable
independent variables!
• In statistics, “correct” means tending towards the
true value as the sample grows; such an estimator is
called a “consistent estimator”.
• If any variable co-varies with the error term,
that variable is called “endogenous” and the
estimation procedure is incorrect (inconsistent).
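For the simplest case of a single regressor x, the link between co-variation with the error term and consistency can be written out directly. This is a textbook derivation offered as illustration, not a formula from the lecture.

```latex
\begin{aligned}
y_i &= \beta_0 + \beta_1 x_i + u_i \\
\hat\beta_1 &= \frac{\widehat{\operatorname{Cov}}(x, y)}{\widehat{\operatorname{Var}}(x)}
  = \beta_1 + \frac{\widehat{\operatorname{Cov}}(x, u)}{\widehat{\operatorname{Var}}(x)} \\
\operatorname*{plim}_{n \to \infty} \hat\beta_1 &= \beta_1 + \frac{\operatorname{Cov}(x, u)}{\operatorname{Var}(x)}
\end{aligned}
```

The estimator tends to the true β_1 exactly when Cov(x, u) = 0; an endogenous x leaves a bias term that does not shrink as the sample grows.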
Inconsistent Estimation
Dependent Variable: Health
Regression Coefficients:
Income: 3.5
Health Infr.: 4.0
Dust Particles: -0.7
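Since Education is not included in the observable part (perhaps the
information was not collected), it is now captured by the error term.
Since Education and Income are correlated, the error term and Income
are correlated: Income is now endogenous, and its coefficient (3.5 here,
versus 2.5 when Education was included) no longer measures the effect of Income alone.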
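A small simulation illustrates this omitted-variable problem. It is a sketch under assumed numbers: the true coefficients are the slide's 2.5 (Income) and 1.5 (Education), and Education is assumed to rise by 2/3 of a unit per unit of Income. With that hypothetical slope, the omitted-variable-bias formula gives 2.5 + 1.5 × 2/3 = 3.5. The slide does not say how its 3.5 was obtained; this only shows one setting that produces it.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
income = rng.normal(0, 1, n)
# Hypothetical link: education rises by 2/3 unit per unit of income, plus independent noise.
education = (2 / 3) * income + rng.normal(0, 1, n)
health = 2.5 * income + 1.5 * education + rng.normal(0, 1, n)

def ols(y, *regressors):
    """OLS coefficients [constant, b1, b2, ...] via least squares."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    return np.linalg.lstsq(X, y, rcond=None)[0]

long_reg = ols(health, income, education)   # education observed
short_reg = ols(health, income)             # education pushed into the error term

# Omitted-variable-bias identity: short slope = long income coef + long education coef * delta,
# where delta is the slope from regressing education on income.
delta = ols(education, income)[1]
print("with education:   ", np.round(long_reg[1:], 2))
print("without education:", round(short_reg[1], 2))
print("formula:          ", round(long_reg[1] + long_reg[2] * delta, 2))
```

The short regression settles near 3.5 however large n becomes, which is what "inconsistent" means here.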
Inconsistent Estimation (Contd.)
• If you think even one variable is endogenous,
the whole regression result is WRONG
• Why: recall that we interpret each coefficient as
the effect of that variable after partialling out all
other variables. So a problem in any one variable
can spread to the coefficients of the other
variables.
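The spreading can be seen in a simulation. In the hypothetical setup below (all distributions are assumptions; the true coefficients are the slide's 2.5, 1.5 and -0.7), Dust is uncorrelated with Education but correlated with Income through a shared factor. When Education is omitted, Income becomes endogenous, and the Dust coefficient also moves, from about -0.7 to about -1.2 in this setup, even though Dust itself is uncorrelated with the omitted variable.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000
education = rng.normal(0, 1, n)        # omitted from the short regression
common = rng.normal(0, 1, n)           # hypothetical factor shared by income and dust
income = education + common
dust = common + rng.normal(0, 1, n)    # correlated with income, NOT with education
health = 2.5 * income + 1.5 * education - 0.7 * dust + rng.normal(0, 1, n)

def ols(y, *regressors):
    """OLS coefficients [constant, b1, b2, ...] via least squares."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    return np.linalg.lstsq(X, y, rcond=None)[0]

full = ols(health, income, education, dust)   # all relevant variables included
short = ols(health, income, dust)             # education left in the error term
print("corr(dust, education):", round(np.corrcoef(dust, education)[0, 1], 3))
print("with education:   ", np.round(full[1:], 2))
print("without education:", np.round(short[1:], 2))
```

Partialling out Income from Dust uses the contaminated Income variable, so part of the omitted Education effect leaks into the Dust coefficient.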
Detecting Inconsistent Estimation
• Try to think of any variable that is
– relevant to explaining the dependent variable, and
– not included in the regression, but which you expect to be
correlated with some included independent variable
Example
Dependent Variable: Hours of Schooling
DummyMale: 2.4
DummySC: -1.3
DummyST: -2.4
DummyOBC: -2.0
Constant: 4.5
• DummyMale (2.4): everything else the same, the male child goes to school 2.4 hours more than the female child.
• DummySC (-1.3): everything else the same, children from SC households spend 1.3 hours less in school than children from (reference) General Category households.
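How the dummy columns behind DummyMale, DummySC, DummyST and DummyOBC could be built is sketched below. The child records are hypothetical. Female and General Category are the omitted (reference) categories, so they get no column; including a dummy for every caste group alongside the constant would make the columns perfectly collinear (the "dummy variable trap").

```python
# Hypothetical records; category labels follow the slide, the children are made up.
children = [
    {"sex": "male", "caste": "SC"},
    {"sex": "female", "caste": "General"},
    {"sex": "female", "caste": "OBC"},
    {"sex": "male", "caste": "ST"},
]

def make_dummies(row):
    """Return [DummyMale, DummySC, DummyST, DummyOBC].

    Female and General are the omitted (reference) categories: a
    General Category female child gets all zeros.
    """
    return [
        1 if row["sex"] == "male" else 0,
        1 if row["caste"] == "SC" else 0,
        1 if row["caste"] == "ST" else 0,
        1 if row["caste"] == "OBC" else 0,
    ]

rows = [make_dummies(c) for c in children]
for c, r in zip(children, rows):
    print(c["sex"], c["caste"], r)
```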
• Constant (4.5): the constant is the predicted hours of schooling for the group in which every dummy equals zero, i.e. the combination of all omitted (reference) categories. In this example the reference group is a General Category female child; her predicted schooling is 4.5 hours.
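Predicted hours for every group follow by adding the relevant dummy coefficients to the constant. The sketch uses the slide's coefficients; the function and table names are illustrative. Because the model has no interaction terms, the male-female gap is the same 2.4 hours in every caste group; for example, an SC male child is predicted at 4.5 + 2.4 - 1.3 = 5.6 hours.

```python
# Coefficients as reported on the slide (hours of schooling).
coef = {"Constant": 4.5, "DummyMale": 2.4, "DummySC": -1.3,
        "DummyST": -2.4, "DummyOBC": -2.0}

def predicted_hours(male, caste):
    """Predicted hours for a child; caste is one of "General", "SC", "ST", "OBC"."""
    hours = coef["Constant"]            # reference group: General Category female
    if male:
        hours += coef["DummyMale"]
    if caste != "General":              # General is the omitted category
        hours += coef["Dummy" + caste]
    return round(hours, 2)

table = {(sex, caste): predicted_hours(sex == "male", caste)
         for sex in ("female", "male")
         for caste in ("General", "SC", "ST", "OBC")}
for (sex, caste), hours in table.items():
    print(f"{sex:6s} {caste:7s} {hours}")
```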