
Class 2:
OLS Multiple Regression and Assumptions about the Error Term
Multiple Regression & Introduction to Econometrics
Order of Topics
I. Why Multiple Regression?
II. Multiple Regression and OLS Estimator
III. Classical Linear Assumptions
IV. Statistical Properties of OLS Estimators
V. Variance of Error and Slope Coefficients
I. WHY MULTIPLE REGRESSION?
Multiple Regression Reduces Bias
• Example 1: Test score regression with CA district data:

  TestScore-hat = β0-hat − 2.28·STR

• Is this causal?
  • Probably not
• Why?
  • Likely omitted variable bias – variables that we left out of the regression
  • Depends on whether the left-out variables are correlated with both the average test score and the student-teacher ratio (STR)
• What are some variables that fit this condition?
  • Percent free-lunch eligible (lowers scores; in the error term; correlated with STR?)
  • Percent English Language Learners (brings down scores; in the error term for now; correlated with STR?)
• These bias the STR coefficient downward.
What does bias “look” like?
[Figure: an estimated regression line with bias, diverging from the population or “true” relationship]
Why is there bias?
• OLS is an unbiased estimator, so why is there bias?
  • The unbiasedness of OLS relies on certain assumptions, including the minimal influence of omitted variables
  • We must control for – or hold constant – those variables if they are correlated with both the dependent and independent variable(s) in the model
  • When we leave these variables out, they end up in the error term and the estimator is biased
• What can we do to get rid of bias?
  • Use multiple regression to control for omitted variables
Example 2: Earnings and years of education
• What is the likely relationship between earnings and years of education?
  • Positive
• If we estimated a regression with earnings as the outcome and years of education as the independent variable, would it be causal?
  • Probably not – unbiasedness of OLS requires minimal influence of omitted variables
  • There are other variables, left out here, that are correlated with both the independent and dependent variable(s)
• What other variables are important?
  • Academic ability (higher earnings and more years of education?), occupation, experience, age, etc.
What does bias “look” like, again?
[Figure: earnings (Earn) vs. years of education (Yrs Educ), showing a line estimated with bias alongside the population or “true” relationship]
Example 3: Omitted Variable Bias, actual study
[Study excerpt not reproduced]

Example 4: Omitted Variable Bias, actual study
[Study excerpt not reproduced]
Summary
• Examples 1 through 4 illustrate “omitted variable bias”
• This is a violation of OLS Assumption 3, that independent variables are uncorrelated with the error term
• This is one form of what is called endogeneity
• We use multiple regression because it allows us to include many variables in the model and reduce the likelihood of omitted variable bias
II. MULTIPLE REGRESSION AND OLS ESTIMATOR
Example: California School Districts
• Original bivariate regression (Stata: reg testscr str)
• Let’s add a variable: % of students qualifying for free or reduced-price lunch (FRL)

Multiple regression
• Adding %FRL changes the STR coefficient (Stata: reg testscr str frl)
• How does this compare to the coefficient from the original bivariate regression?
  • It’s roughly half the size (−1.12 vs. −2.28)
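As a minimal sketch, the two regressions side by side in Stata (the dataset file name is illustrative; testscr, str, and frl are the variable names used above):

  * Load the CA district data (hypothetical file name)
  use caschool.dta, clear

  * Bivariate regression: slope on str is about -2.28
  reg testscr str

  * Add %FRL: the slope on str falls to about -1.12
  reg testscr str frl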
Ceteris Paribus
• In the bivariate regression, we estimated:

  TestScore-hat = β0-hat − 2.28·STR

• We said this means that each one unit increase in the student-teacher ratio is associated with a 2.28 unit decrease in avg. test scores
• Our multivariate model is:

  TestScore-hat = β0-hat − 1.12·STR + β2-hat·FRL

• How do we interpret the coefficient on STR in the multivariate model?
  • Each one unit increase in the student-teacher ratio is associated with a 1.12 unit decrease in average test scores, holding constant the % receiving free or reduced-price lunch
• Ceteris paribus = all else held constant
Population Regression Function

  Yi = β0 + β1·X1i + β2·X2i + … + βk·Xki + εi

• “True” relationship between dependent and independent variables
• What does i indicate?
  • An index for the ith observation
• What do 1, 2, . . . k indicate?
  • The first independent variable, second independent variable, and so on
  • There are a total of k independent variables
Sample Regression Function

  Yi-hat = β0-hat + β1-hat·X1i + β2-hat·X2i + … + βk-hat·Xki

• Estimated relationship between dependent and independent variables
• Based on the sample you observe
• Degrees of freedom are n – k – 1
  • The number of observations minus the number of parameters estimated in the regression
• Degrees of freedom example with the mean: if n is 3, the mean is 4, and the first two observations are 5 and 3, the last observation must be 4, since 5 + 3 + 4 = 12 and 12/3 = 4. Only two observations are free to vary once the mean is fixed.
OLS Multiple Regression
• Same idea as bivariate, generalized to k variables; we want to minimize

  Σi (Yi − β0-hat − β1-hat·X1i − … − βk-hat·Xki)²

• OLS beta (for k > 1):

  βk-hat = Σi (rki-hat · Yi) / Σi (rki-hat)²

• What is rki-hat?
  • It is the residual you get from regressing Xk on all the other X’s
  • This is called partialling out – it removes the part of each Xk that is explained by the other independent variables, keeping only the part that is uncorrelated with them
Example of partialling
Model: TestScore = β0 + β1·STR + β2·PFL + β3·PPE + ε
STR = student-teacher ratio, PFL = % free lunch, PPE = per-pupil expenditures

  rSTR-hat = STR − (a0-hat + a1-hat·PFL + a2-hat·PPE)

rSTR-hat is the portion of the variation in STR that is not explained by PFL or PPE

  rPFL-hat = PFL − (c0-hat + c1-hat·STR + c2-hat·PPE)

rPFL-hat is the portion of the variation in PFL that is not explained by STR or PPE

*This is what Stata does in the background to estimate the coefficients for your model. In this case, it uses rSTR-hat to estimate β1-hat and rPFL-hat to estimate β2-hat, etc.
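A minimal Stata sketch of this partialling-out logic (the Frisch-Waugh result), assuming the CA data are in memory with variables testscr, str, frl, and a per-pupil expenditure variable here called ppe (that name is an assumption):

  * Partial STR out of the other regressors
  reg str frl ppe
  predict str_resid, residuals

  * Regressing the outcome on the residualized STR reproduces
  * the STR coefficient from the full multiple regression
  reg testscr str_resid
  reg testscr str frl ppe    // compare: _b[str] equals _b[str_resid]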
BREAK AND REVIEW QUESTIONS: Please answer and be ready to contribute answers to class when we return

1. Write the equation for the residual in a bivariate regression equation in terms of:

a. Yi and Yi-hat

b. Yi, β0-hat, β1-hat, Xi

2. How can adding more independent (control) variables to a regression model help improve the causal interpretation of a relationship?

3. Murders per year_i = 400 + 0.2 × (% population with handguns)_i
   i = city (40 cities in 2000)

a. You are now informed that the percent of the population age 16 to 30 is positively related to murders per year and negatively correlated with the % population with handguns.

Does this tell you anything about whether the coefficient on % population with handguns is biased?

If yes, does it tell you anything about the direction of the bias?

b. You are further informed that the percent of the population who have hunting licenses is negatively correlated with the % population with handguns and is unrelated to murders per year. What does this tell you about bias?

c. You are yet further informed that the percent of the population that is unemployed is positively related to the number of murders, but is uncorrelated with the % population with handguns. What does this tell you about bias?
1. Write the equation for the residual in a bivariate regression equation in terms of:

a. Yi and Yi-hat

   ei = Yi − Yi-hat

b. Yi, β0-hat, β1-hat, Xi

   ei = Yi − β0-hat − β1-hat·Xi

2. How can adding more independent (control) variables to a regression model help improve the causal interpretation of a relationship?

By cutting down on the bias of the coefficients of interest.

3. Murders per year_i = 400 + 0.2 × (% population with handguns)_i
   i = city (40 cities in 2000)

a. You are now informed that the percent of the population age 16 to 30 is positively related to murders per year and negatively correlated with the % population with handguns. Does this tell you anything about whether the coefficient on % population with handguns is biased? If yes, does it tell you anything about the direction of the bias?

Yes, it is biased, since X and e are correlated.

Direction – the estimate is too low (biased downward; it should be higher).

b. You are further informed that the percent of the population who have hunting licenses is negatively correlated with the % population with handguns and is unrelated to murders per year. What does this tell you about bias?

It has no effect on the other coefficients. It is unrelated to the dependent variable and thus is NOT in the error term; X and e are not correlated. It does not matter if something unrelated to murders is correlated with the independent variable.

c. You are yet further informed that the percent of the population that is unemployed is positively related to the number of murders, but is uncorrelated with the % population with handguns. What does this tell you about bias?

It has no effect on bias, since it has no relationship to the independent variable. It is in the error term, but e and X are not correlated.
III. CLASSICAL LINEAR MODEL (CLM) ASSUMPTIONS
Gauss-Markov Theorem: First Go-Around
• According to the Gauss-Markov Theorem, subject to certain assumptions, the Ordinary Least Squares (OLS) regression estimator is the Best, Linear, Unbiased Estimator
  • Best – minimum variance
  • Linear – linear in the parameters
  • Unbiased – the sampling distribution of β-hat is centered on β (i.e. E(β-hat) = β)
  • Estimator – a formula, or rule, for computing an estimate
• In other words, of all linear estimators that are unbiased, OLS is the one with the lowest variance (i.e. the most precise)

Lack of bias and efficiency (having a low standard error) are both desirable properties for estimators.

Would you ever prefer a biased estimator with a small variance to an unbiased estimator with a large variance?
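One standard way to frame that question (a textbook identity, not shown on the slide) is mean squared error, which combines both properties:

  \operatorname{MSE}(\hat{\beta}) = E\left[(\hat{\beta}-\beta)^2\right] = \operatorname{Var}(\hat{\beta}) + \left[\operatorname{Bias}(\hat{\beta})\right]^2

By this criterion, a slightly biased estimator with a much smaller variance can have lower MSE than an unbiased estimator with a large variance.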
CLM (a.k.a. Gauss-Markov) Assumptions for OLS to be BLUE
• Assumption 1: The model is linear in coefficients (parameters), correctly specified, and has an additive error term
• Assumption 2: The error term has zero (population) mean
• Assumption 3: All explanatory variables are uncorrelated with the error term
• Assumption 4: Observations of the error term are uncorrelated with each other (no autocorrelation)
• Assumption 5: The error term has constant variance (homoskedasticity)
• Assumption 6: No perfect multicollinearity
• Assumption 7 (optional): The error term is normally distributed

NOTE: These assumptions apply to the population regression function, but we must use the sample regression function to test them, where testing is possible (it is not always possible).
Assumption 1
• Assumption 1: The model is linear in coefficients (parameters), correctly specified, and has an additive error term
• Part 1: Model is linear in the coefficients…
  • Slope coefficients (β’s) do not appear in an exponent, get multiplied together, etc.
  • We can model non-linear data with a linear model
• Which of the following models are not linear in the coefficients?
  [The four example models on the slide are not reproduced here; three were labeled Linear and one Not Linear]
Assumption 1 (Page 2)
• Part 2: …correctly specified…
  • This can be violated if
    • Relevant variables are excluded (omitted variable bias)
    • Irrelevant variables are included (overspecification)
    • The relationship between the dependent and independent variables is non-linear and you are estimating a linear model (or vice versa)
    • Subgroup differences exist – e.g. the impact of a variable is different for men & women – but this has not been taken into account in the model
• Part 3: …additive error term…
  • The error term gets added on at the end of the function (it doesn’t multiply the function, etc.)
Assumption 2: Error Has Zero Mean
• The error term has zero (population) mean: E(εi) = 0
• Our model is:

  Yi = β0 + β1·X1i + … + βk·Xki + εi

• The Y-intercept (β0) in the population regression function ensures this assumption is met
Assumption 3: No Omitted Variables
• All explanatory variables are uncorrelated with the error term
• Known as the Zero Conditional Mean Assumption
  • The expected value of the error term, given X, is 0: E(εi | X1i, …, Xki) = 0
• An implication: no omitted variable bias
Assumption 4: No Autocorrelation
• Observations of the error term are uncorrelated with each other
  • No autocorrelation/serial correlation
  • Observations of the residual must be independent of each other
• Especially relevant for time series and panel estimators
• Important for making valid statistical inferences, such as confidence intervals
Assumption 5: Homoskedasticity (No Heteroskedasticity)
• The error term has constant variance, or is homoskedastic
• Formally: Var(εi) = σ² for all i
• Our hypothesis tests are constructed based on this assumption
• If it is violated, we cannot make valid statistical inferences using the normal OLS standard errors
• This assumption is often violated (when this happens, the errors are called heteroskedastic), but there are ways to deal with this
An example (from the Stata tutorial)

  Fare_i = β0 + β1·Dist_i + β2·Passen_i + ε_i

[Scatter plot: residuals (roughly −100 to 300) against avg. passengers per day (0 to 8,000)]

Estimated equation: Fare-hat = 108.79 + 0.075·Dist − 0.007·Passen

Does the chart above show evidence of heteroskedasticity?
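A short Stata sketch of how one might check and deal with this, assuming the tutorial's airfare data are loaded with variables named fare, dist, and passen (names assumed from the equation above):

  reg fare dist passen
  rvfplot                  // residuals vs. fitted values, to eyeball the spread
  estat hettest            // Breusch-Pagan test for heteroskedasticity

  * If heteroskedasticity is present, robust (Huber-White) standard
  * errors give valid inference without changing the coefficients
  reg fare dist passen, vce(robust)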
Assumption 6: No Perfect Multicollinearity
• No perfect multicollinearity
• Perfect multicollinearity happens if one variable is a linear combination of other variables in the model
• Why do the following situations lead to perfect multicollinearity? (See the sketch after this list.)
  • You run a regression with % White, % Black, and % Other Race as variables (the three shares sum to 100%)
  • You run a regression with Total Sales and Sales Tax Receipts for a sample of stores in New York State (receipts are proportional to sales)
  • You run a regression that includes indicators for all four regions of a country (the indicators sum to 1)
• When this happens, the OLS estimator cannot distinguish between the variables
• Stata will drop variables and/or observations
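A minimal sketch of the first situation in Stata, with hypothetical variable names for race shares that sum to 100:

  * pct_white + pct_black + pct_other = 100 in every district,
  * so any one is a linear combination of the other two
  reg testscr pct_white pct_black pct_other

  * Stata notes that one variable is "omitted because of collinearity";
  * dropping one share yourself makes the remaining coefficients
  * interpretable relative to the omitted group
  reg testscr pct_white pct_black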
Assumption 7: Error Term Normally Distributed; CLT Takes Care of It
• The error term is normally distributed
• “Optional” because it is not required for OLS to be BLUE; it is only required for valid hypothesis testing
• This is commonly assumed to hold because:
  • The error term is most likely a combination of many small influences and, as the number of these influences grows, normality is more likely
  • But, just as important, the CLT will usually hold for our sample sizes (n > 100)
• It is important to assume normality for the t-test and F-test to work in small samples
Summary
• Assumption 1 – correctly specified linear model
• Assumption 2 – error term has 0 mean
• Assumption 3 – no omitted variables
• Assumption 4 – no autocorrelation
• Assumption 5 – homoskedastic errors
  (Assumptions 3–5 together say the error term is i.i.d.)
• Assumption 6 – no perfect multicollinearity
• Assumption 7 – normality of error term

• i.i.d. = independent and identically distributed
• Shorthand for the error-term assumptions: εi ~ i.i.d. N(0, σ²)
IV. STATISTICAL PROPERTIES OF OLS ESTIMATORS
Again: OLS Estimator is BLUE
• Estimator: it is a formula or “recipe” for estimating the association between Y and Xk
• Linear: linear in parameters
• Unbiased: the expected value of the sampling distribution of β-hat is β
  • β-hat comes from the house where β lives
  • Given repeated calculations of the coefficient (using a different sample of the same size each time), the average of these repeated calculations would be the value of the population parameter, or “true” β
• Best: minimum variance – among linear unbiased estimators, the OLS estimator (which minimizes the sum of squared residuals) has the smallest variance
Sampling Distribution
• What is the difference between a sample distribution and a sampling distribution?
  • Sample distribution: the probability distribution of raw data
  • Sampling distribution: the probability distribution of a statistic calculated from raw data, such as X-bar or β-hat
• In regression, we estimate two statistics for the sampling distribution of β-hat:
  • The mean of β-hat (which is equal to β if the regression is unbiased)
  • The standard error of β-hat
Sampling Distribution Example
• In the California school district example, imagine that the true relationship for all 420 districts is given by:

  TestScore_i = β0 − 2.28·STR_i + ε_i

• Let’s take a random sample of 30 districts and estimate this relationship
• If the estimator is unbiased, will our estimate of β1 be −2.28?
  • Probably not
• If not, what is meant by the estimator being unbiased?
  • It means that E(β1-hat) = −2.28. If we take a large number of repeated samples of size 30, the mean of the estimates should equal −2.28
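A sketch of that thought experiment in Stata, assuming the 420-district file is in memory (the program name and seed are illustrative):

  capture program drop strsamp
  program define strsamp, rclass
      preserve
      sample 30, count          // keep a random sample of 30 districts
      reg testscr str
      return scalar b_str = _b[str]
      restore
  end

  * Draw 500 repeated samples and collect the estimated STR slope
  simulate b_str = r(b_str), reps(500) seed(12345): strsamp
  summarize b_str               // mean should be close to -2.28 if unbiased
  histogram b_str               // the sampling distribution of the slope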
Example: Population Relationship
[Scatter plot: testscr (600–720) vs. str (14–26) for the full population of districts, with fitted values line]

Example: Sample 1
[Scatter plot: testscr vs. str for one random sample, with fitted values line]

Sample 2
[Scatter plot: testscr vs. str for a second random sample, with fitted values line]
10 Samples
• We can continue this simulation as many times as we like and collect the coefficients
[Histogram “10 Simulations”: the 10 estimated coefficients, in bins from < −10 to 8+, with the mean marked]

100 Samples
• We can continue this simulation as many times as we like and collect the coefficients
[Histogram “100 Simulations”: the 100 estimated coefficients, in bins from < −10 to 8+, with the mean marked]
500 Samples
• We can continue this simulation as many times as we like and collect the coefficients
[Histogram “500 Simulations”: the 500 estimated coefficients, with the mean marked]
• As we include more random samples in our simulation:
  1) The mean gets closer to β1 (i.e. −2.28)
  2) The distribution looks more like a normal distribution
Central Limit Theorem: Review
• What is it?
  • When large samples of the same size (e.g. n = 100) are repeatedly drawn from the population, the statistics calculated from them (such as means or regression coefficients) will be distributed approximately normally
  • This is true even if the variable being measured is not normally distributed in the population
• Why is it important?
  • It allows us to conduct hypothesis tests no matter the shape of a variable’s underlying distribution
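A quick self-contained Stata sketch of the CLT, using a deliberately skewed variable (the program name, seed, and chi-squared choice are illustrative):

  capture program drop cltdemo
  program define cltdemo, rclass
      clear
      set obs 100
      gen x = rchi2(2)          // a skewed, non-normal variable
      summarize x
      return scalar xbar = r(mean)
  end

  * Collect 1,000 sample means from samples of size 100
  simulate xbar = r(xbar), reps(1000) seed(42): cltdemo
  histogram xbar, normal        // the sampling distribution is close to normal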
V. VARIANCE OF ERROR AND SLOPE COEFFICIENTS
Variance of Error and Slope Coefficients
• If estimated regression slopes have a sampling distribution, they have a mean and a variance
• The variances of regression slopes and their square roots (called standard errors or SEs) can be calculated
• What is the difference between a SE and a standard deviation (SD)?
  • Standard errors measure the dispersion of a sampling distribution
  • Standard deviation is a measure of dispersion for a sample distribution
Variance of β-hat
• Var(β1-hat) is a direct function of σ² (the variance of the error term) because (in the bivariate case):

  Var(β1-hat) = σ² / Σi (Xi − X-bar)²

• OLS minimizes Var(β1-hat)
• Relationship between variance and standard error:

  SE(β1-hat) = √Var(β1-hat)
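For reference, a compact derivation of that bivariate formula (a textbook result, not shown on the slide), treating the X's as fixed and using Assumptions 2, 4, and 5:

  \hat{\beta}_1 = \beta_1 + \sum_i w_i \varepsilon_i ,
  \qquad w_i = \frac{X_i - \bar{X}}{\sum_j (X_j - \bar{X})^2}

  \operatorname{Var}(\hat{\beta}_1)
    = \sigma^2 \sum_i w_i^2
    = \frac{\sigma^2}{\sum_i (X_i - \bar{X})^2} ,
  \qquad
  \operatorname{SE}(\hat{\beta}_1) = \sqrt{\operatorname{Var}(\hat{\beta}_1)}

The second line uses homoskedasticity and no autocorrelation to pull σ² out of the variance of the weighted sum.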
BREAK: Please answer these questions and, if there is time, we will go over them in this class.

4. What is attractive about the B and U qualities of the OLS BLUE estimator?

5. What does i.i.d. stand for?

6. To which three assumptions of the OLS model does i.i.d. relate?

7. What is a sampling distribution?

8. What is the central limit theorem?
4. What is attractive about the B and U qualities of the OLS BLUE estimator?

These estimators are unbiased (on average, the sample estimated coefficients will equal the population coefficients), and they produce minimum variance coefficients, meaning that the distribution of sample coefficients around their population value is the smallest possible (among linear unbiased estimators).

5. What does i.i.d. stand for?

Independently and identically distributed.

6. To which three assumptions of the OLS model does i.i.d. relate?

Independent – no autocorrelation between successive error terms (from successive observations) [Assumption 4]

Independent – no correlation between the error term and any X [Assumption 3]

Identical – no heteroskedasticity, i.e. no unequal variances of error terms (as X differs, for example) [Assumption 5]

7. What is a sampling distribution?

The sampling distribution is the distribution of a sample statistic (such as the sample mean or an estimated regression coefficient) that one would obtain from an infinite number of samples of the same size taken from one population.

Example – take an infinite number of samples of size 900 from the population of the US and calculate the relationship between weight and number of fast food meals per week. The sample coefficient on fast food meals will differ in each sample, but there will be a distribution of the estimated coefficients, called the sampling distribution of the estimated coefficient.

8. What is the central limit theorem?

With a large enough sample size (usually just 100), the sampling distribution (of a sample statistic) is normally distributed even if the underlying distribution of the variable is not.
