
ECON 7310 Elements of Econometrics

Lecture 2: Linear Regression with One Regressor

1 / 28
Outline:

▶ The population linear regression model (LRM)
▶ The ordinary least squares (OLS) estimator and the sample regression line
▶ Measures of fit of the sample regression
▶ The least squares assumptions
▶ The sampling distribution of the OLS estimator

2 / 28
Linear Regression

▶ Linear regression lets us estimate the slope of the population regression line.
▶ The slope of the population regression line is the expected effect on Y of a unit change in X.
▶ Ultimately our aim is to estimate the causal effect on Y of a unit change in X – but for now, just think of the problem of fitting a straight line to data on two variables, Y and X.

3 / 28
Linear Regression

▶ The problem of statistical inference for linear regression is, at a general level, the same as for estimation of the mean or of the difference between two means.
▶ Statistical, or econometric, inference about the slope entails:
▶ Estimation: How should we draw a line through the data to estimate the population slope? Answer: ordinary least squares (OLS). What are the advantages and disadvantages of OLS?
▶ Hypothesis testing: How do we test whether the slope is zero?
▶ Confidence intervals: How do we construct a confidence interval for the slope?

4 / 28
The Linear Regression Model (SW Section 4.1)

▶ The population regression line:

Test Score = β0 + β1 STR


▶ β1 = slope of population regression line
= change in test score for a unit change in student-teacher ratio (STR)
▶ Why are β0 and β1 “population” parameters?
▶ We would like to know the population value of β1 .
▶ We don’t know β1 , so must estimate it using data.

5 / 28
The Population Linear Regression Model

Consider
Yi = β0 + β1 Xi + ui
for i = 1, . . . , n
▶ We have n observations, (Xi, Yi), i = 1, . . . , n.
▶ X is the independent variable or regressor or right-hand-side variable
▶ Y is the dependent variable or left-hand-side variable
▶ β0 = intercept
▶ β1 = slope
▶ ui = the regression error
▶ The regression error consists of omitted factors. In general, these
omitted factors are other factors that influence Y , other than the variable
X . The regression error also includes error in the measurement of Y .
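
A minimal R sketch of this setup on simulated data (the parameter values, sample size, and error distribution below are purely illustrative assumptions, not estimates from any real data set):

  # Simulate n observations from the population linear regression model Y = b0 + b1*X + u
  set.seed(7310)                    # arbitrary seed for reproducibility
  n  <- 100
  b0 <- 2;  b1 <- 0.5               # hypothetical population intercept and slope
  X  <- runif(n, 0, 10)             # regressor
  u  <- rnorm(n, mean = 0, sd = 1)  # regression error (stands in for omitted factors)
  Y  <- b0 + b1 * X + u             # dependent variable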

6 / 28
The population regression model in a picture

▶ [Figure: observations on Y and X (n = 7), the population regression line, and the regression error (the “error term”).]
7 / 28
The Ordinary Least Squares Estimator (SW Section 4.2)

▶ How can we estimate β0 and β1 from data? Recall that the least squares estimator of µY solves

    min_m  Σ_{i=1}^n (Yi − m)²

▶ By analogy, we will focus on the least squares (“ordinary least squares” or “OLS”) estimator of the unknown parameters β0 and β1. The OLS estimator solves

    min_{b0, b1}  Σ_{i=1}^n [Yi − (b0 + b1 Xi)]²

▶ In fact, we estimate the conditional expectation function E[Y|X] under the assumption that E[Y|X] = β0 + β1 X.
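
As a quick numerical check of the first minimization problem (a sketch on simulated data, not the textbook data), the minimizer over m is the sample mean:

  # The sample mean solves min_m sum_i (Yi - m)^2
  set.seed(1)
  Y <- rnorm(50, mean = 10, sd = 3)
  sse <- function(m) sum((Y - m)^2)            # objective function
  optimise(sse, interval = range(Y))$minimum   # numerical minimizer ...
  mean(Y)                                      # ... agrees with the sample mean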

8 / 28
Mechanics of OLS

▶ The population regression line:

Test Score = β0 + β1 STR

9 / 28
Mechanics of OLS

▶ The OLS estimator minimizes the average squared difference between the actual values of Yi and the prediction (“predicted value”) based on the estimated line.
▶ This minimization problem can be solved using calculus (Appendix 4.2).
▶ The result is the OLS estimators of β0 and β1.

10 / 28
OLS estimator, predicted values, and residuals

▶ The OLS estimators are

    β̂1 = Σ_{i=1}^n (Xi − X̄)(Yi − Ȳ) / Σ_{i=1}^n (Xi − X̄)²

    β̂0 = Ȳ − β̂1 X̄

▶ These are estimates of the unknown population parameters β0 and β1.
▶ The OLS predicted (fitted) values Ŷi and residuals ûi are

    Ŷi = β̂0 + β̂1 Xi
    ûi = Yi − Ŷi

▶ The estimated intercept β̂0, slope β̂1, and residuals ûi are computed from a sample of n observations (Xi, Yi), i = 1, . . . , n.
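
A sketch of these formulas in R on simulated data (the data-generating process below is a made-up example); the hand-computed estimates match coef(lm(...)):

  set.seed(2)
  X <- runif(200, 10, 30)
  Y <- 700 - 2 * X + rnorm(200, sd = 15)   # illustrative data-generating process
  beta1_hat <- sum((X - mean(X)) * (Y - mean(Y))) / sum((X - mean(X))^2)
  beta0_hat <- mean(Y) - beta1_hat * mean(X)
  c(beta0_hat, beta1_hat)
  coef(lm(Y ~ X))                          # same two numbers
  Y_hat <- beta0_hat + beta1_hat * X       # predicted (fitted) values
  u_hat <- Y - Y_hat                       # residuals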

11 / 28
Predicted values & residuals

▶ One of the districts in the data set is Antelope, CA, for which
STR = 19.33 and TestScore = 657.8

predicted value: 698.9 − 2.28 × 19.33 = 654.8

residual: 657.8 − 654.8 = 3.0
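
The same arithmetic in R, plugging Antelope's STR into the estimated line:

  y_hat_antelope <- 698.9 - 2.28 * 19.33   # predicted value, about 654.8
  657.8 - y_hat_antelope                   # residual, about 3.0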

12 / 28
OLS regression: R output

TestScore = 698.93 − 2.28 × STR


We will discuss the rest of this output later.
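
The R output itself is not reproduced here. A minimal sketch of how it can be generated, assuming the California school district data are available as the CASchools data set in the AER package (the variable construction below follows SW):

  # install.packages("AER")   # if needed
  library(AER)
  data("CASchools")
  CASchools$STR       <- CASchools$students / CASchools$teachers   # student-teacher ratio
  CASchools$TestScore <- (CASchools$read + CASchools$math) / 2     # average test score
  fit <- lm(TestScore ~ STR, data = CASchools)
  summary(fit)   # intercept about 698.93, slope about -2.28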

13 / 28
Measures of Fit (SW Section 4.3)

▶ Two regression statistics provide complementary measures of how well the regression line “fits” or explains the data:
▶ The regression R² measures the fraction of the variance of Y that is explained by X; it is unit free and ranges between zero (no fit) and one (perfect fit)
▶ The standard error of the regression (SER) measures the magnitude of a typical regression residual in the units of Y.

14 / 28
Regression R²

▶ The sample variance of Yi is (1/n) Σ_{i=1}^n (Yi − Ȳ)².
▶ The sample variance of Ŷi is (1/n) Σ_{i=1}^n (Ŷi − Ȳ)², where in fact the sample mean of Ŷi equals Ȳ.
▶ R² is simply the ratio of those two sample variances.
▶ Formally, we define R² as follows (two equivalent definitions):

    R² := Explained Sum of Squares (ESS) / Total Sum of Squares (TSS) = Σ_{i=1}^n (Ŷi − Ȳ)² / Σ_{i=1}^n (Yi − Ȳ)²

    R² := 1 − Residual Sum of Squares (RSS) / Total Sum of Squares (TSS) = 1 − Σ_{i=1}^n ûi² / Σ_{i=1}^n (Yi − Ȳ)²

▶ R² = 0 ⇐⇒ ESS = 0 and R² = 1 ⇐⇒ ESS = TSS. Also, 0 ≤ R² ≤ 1.
▶ For regression with a single X, R² = the square of the sample correlation coefficient between X and Y.
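
A sketch of both definitions in R on simulated data (illustrative values only); they agree with each other, with summary(fit)$r.squared, and, for a single regressor, with cor(X, Y)²:

  set.seed(3)
  X <- rnorm(150);  Y <- 1 + 2 * X + rnorm(150, sd = 3)   # illustrative data
  fit   <- lm(Y ~ X)
  Y_hat <- fitted(fit);  u_hat <- resid(fit)
  ESS <- sum((Y_hat - mean(Y))^2)
  RSS <- sum(u_hat^2)
  TSS <- sum((Y - mean(Y))^2)
  c(ESS / TSS, 1 - RSS / TSS, summary(fit)$r.squared, cor(X, Y)^2)   # all equal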

15 / 28
The Standard Error of the Regression (SER)

▶ The SER measures the spread of the distribution of u. The SER is (almost) the sample standard deviation of the OLS residuals:

    SER := √[ (1/(n − 2)) Σ_{i=1}^n ûi² ]

▶ The SER:
▶ has the units of ui, which are the units of Yi
▶ measures the average “size” of the OLS residual (the average “mistake” made by the OLS regression line)
▶ The root mean squared error (RMSE) is closely related to the SER:

    RMSE := √[ (1/n) Σ_{i=1}^n ûi² ]

▶ When n is large, SER ≈ RMSE.¹

¹ Here, n − 2 is the degrees of freedom – we need to subtract 2 because there are two parameters to estimate. For details, see SW Section 18.4.
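
A sketch computing the SER and RMSE from the OLS residuals in R (simulated, illustrative data); the SER coincides with summary(fit)$sigma:

  set.seed(4)
  X <- runif(120);  Y <- 5 + 3 * X + rnorm(120, sd = 2)   # illustrative data
  fit   <- lm(Y ~ X)
  u_hat <- resid(fit)
  n     <- length(u_hat)
  SER  <- sqrt(sum(u_hat^2) / (n - 2))
  RMSE <- sqrt(sum(u_hat^2) / n)
  c(SER, summary(fit)$sigma, RMSE)   # SER equals sigma; RMSE is close when n is large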
16 / 28
Example of the R 2 and the SER

▶ TestScore = 698.9 − 2.28 × STR, R² = 0.05, SER = 18.6


▶ STR explains only a small fraction of the variation in test scores.
▶ Does this make sense?
▶ Does this mean the STR is unimportant in a policy sense?

17 / 28
Least Squares Assumptions (SW Section 4.4)

▶ What, in a precise sense, are the properties of the sampling distribution of the OLS estimator? When will it be unbiased? What is its variance?
▶ To answer these questions, we need to make some assumptions about how Y and X are related to each other, and about how they are collected (the sampling scheme).
▶ These assumptions – there are three – are known as the Least Squares Assumptions.

18 / 28
Least Squares Assumptions (SW Section 4.4)

Yi = β0 + β1 Xi + ui , i = 1, . . . , n

1. The conditional distribution of u given X has mean zero, that is, E(u|X = x) = 0.
▶ This implies that OLS estimators are unbiased
2. (Xi, Yi), i = 1, . . . , n, are i.i.d.
▶ This is true if (X, Y) are collected by simple random sampling
▶ This delivers the sampling distribution of β̂0 and β̂1
3. Large outliers in X and/or Y are rare.
▶ Technically, X and Y have finite fourth moments
▶ Outliers can result in meaningless values of β̂1

19 / 28
Least squares assumption #1: E(u|X = x) = 0.

For any given value of X , the mean of u is zero:

Example: TestScorei = β0 + β1 STRi + ui , ui = other factors


▶ What are some of these “other factors”?
▶ Is E(u|X = x) = 0 plausible for these other factors?

20 / 28
Least squares assumption #1: E(u|X = x) = 0 (continued)

▶ A benchmark for thinking about this assumption is to consider an ideal randomized controlled experiment:
▶ X is randomly assigned to people (students randomly assigned to
different size classes; patients randomly assigned to medical
treatments). Randomization is done by computer – using no information
about the individual.
▶ Because X is assigned randomly, all other individual characteristics –
the things that make up u – are distributed independently of X , so u and
X are independent
▶ Thus, in an ideal randomized controlled experiment, E(u|X = x) = 0
(that is, LSA #1 holds)
▶ In actual experiments, or with observational data, we will need to think
hard about whether E(u|X = x) = 0 holds.

21 / 28
Least squares assumption #2: (Xi , Yi ), i = 1, · · · , n are i.i.d.

▶ This arises automatically if the entity (individual, district) is sampled by simple random sampling:
▶ The entities are selected from the same population, so (Xi , Yi ) are
identically distributed for all i = 1, . . . , n.
▶ The entities are selected at random, so the values of (X , Y ) for different
entities are independently distributed.
▶ The main place we will encounter non-i.i.d. sampling is when data are
recorded over time for the same entity (panel data and time series data)
– we will deal with that complication when we cover panel data.

22 / 28
Least squares assumption #3: Large outliers are rare
Technical statement: E(X⁴) < ∞ and E(Y⁴) < ∞

▶ A large outlier is an extreme value of X or Y.
▶ On a technical level, if X and Y are bounded, then they have finite fourth moments. (Standardized test scores automatically satisfy this; STR, family income, etc. satisfy this too.)
▶ The substance of this assumption is that a large outlier can strongly
influence the results – so we need to rule out large outliers.
▶ Look at your data! If you have a large outlier, is it a typo? Does it belong
in your data set? Why is it an outlier?

23 / 28
OLS can be sensitive to an outlier

▶ Is the lone point an outlier in X or Y?
▶ In practice, outliers are often data glitches (coding or recording problems). Sometimes they are observations that really shouldn’t be in your data set. Plot your data before running regressions!
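
A small illustration of this sensitivity on simulated data (the outlier below is artificial, added only for the example): a single extreme point noticeably changes the OLS slope.

  set.seed(5)
  X <- rnorm(50);  Y <- 1 + 2 * X + rnorm(50)
  coef(lm(Y ~ X))["X"]                     # slope close to the true value 2
  X_out <- c(X, 10);  Y_out <- c(Y, -30)   # append one artificial large outlier
  coef(lm(Y_out ~ X_out))["X_out"]         # slope pulled far away from 2
  # plot(X_out, Y_out)                     # always look at the data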

24 / 28
The Sampling Distribution of the OLS Estimator (SW Section 4.5)

The OLS estimator is computed from a sample of data. A different sample yields a different value of β̂1. This is the source of the “sampling uncertainty” of β̂1. We want to:
▶ quantify the sampling uncertainty associated with β̂1
▶ use β̂1 to test hypotheses such as β1 = 0
▶ construct a confidence interval for β1
▶ All these require figuring out the sampling distribution of the OLS estimator.

25 / 28
Sampling Distribution of β̂1

▶ We can show that β̂1 is unbiased, i.e., E[β̂1] = β1. Similarly for β̂0.
▶ We do not derive V(β̂1), as it requires some tedious algebra; moreover, we do not need to memorize its formula. Here, we just emphasize two aspects of V(β̂1).
▶ First, V(β̂1) is inversely proportional to n, just like the variance of the sample mean, V(Ȳn). Combined with E[β̂1] = β1, this suggests that β̂1 converges in probability to β1, i.e., β̂1 is consistent. That is, as the sample size grows, β̂1 gets closer to β1.
▶ Second, V(β̂1) is inversely proportional to the variance of X; see the graphs and the simulation sketch below.
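
A Monte Carlo sketch of these two points (all numbers below are illustrative assumptions): the spread of β̂1 across repeated samples falls as n grows and as the variance of X grows.

  set.seed(6)
  # Standard deviation of the slope estimate across 2000 simulated samples
  sd_beta1 <- function(n, x_sd) {
    sd(replicate(2000, {
      X <- rnorm(n, sd = x_sd)
      Y <- 1 + 2 * X + rnorm(n, sd = 3)
      coef(lm(Y ~ X))["X"]
    }))
  }
  sd_beta1(n = 50,  x_sd = 1)   # baseline
  sd_beta1(n = 200, x_sd = 1)   # larger n          -> smaller spread
  sd_beta1(n = 50,  x_sd = 3)   # more X variation  -> smaller spread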

26 / 28
Sampling Distribution of β̂1

[Figure: two panels – low X variation ⇒ low precision; high X variation ⇒ high precision.]

▶ Intuitively, if there is more variation in X, then there is more information in the data that you can use to fit the regression line.

27 / 28
Sampling Distribution of β̂1

▶ The exact sampling distribution is complicated – it depends on the population distribution of (Y, X) – but when n is large we get some simple (and good) approximations:
▶ Let SE(β̂1) be the standard error (SE) of β̂1, i.e., a consistent estimator of the standard deviation of β̂1, which is √V(β̂1).
▶ Then, it turns out that

    (β̂1 − β1) / SE(β̂1)  ∼  N(0, 1)   (approximately)

▶ Using this approximate distribution, we can conduct statistical inference about β1, i.e., hypothesis testing and confidence intervals ⇒ Ch. 5.
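
A sketch of this approximation via simulation (all values below are illustrative assumptions): across repeated samples, the standardized slope estimate behaves roughly like a standard normal draw, with SE(β̂1) read off summary(fit)$coefficients.

  set.seed(7)
  beta1 <- 2                                  # assumed true slope
  tstats <- replicate(2000, {
    X <- runif(200, 10, 30)
    Y <- 5 + beta1 * X + rnorm(200, sd = 4)
    est <- summary(lm(Y ~ X))$coefficients    # columns: Estimate, Std. Error, ...
    (est["X", "Estimate"] - beta1) / est["X", "Std. Error"]
  })
  c(mean(tstats), sd(tstats))                 # approximately 0 and 1
  # hist(tstats, freq = FALSE); curve(dnorm(x), add = TRUE)   # visual check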

28 / 28
