Week 3-4

The document discusses the principles of simple linear regression, focusing on the population model, ordinary least squares (OLS) estimation, and the assumptions required for unbiasedness and efficiency of the estimators. It explains the relationship between dependent and independent variables, the importance of error terms, and the conditions under which OLS provides the best linear unbiased estimators (BLUE). Additionally, it covers statistical inference, including hypothesis testing using t-tests under the classical linear model assumptions.


Simple Linear Regression
Population Model
• Cross-sectional analysis.
• Assume that the sample is collected randomly from the population.
• We want to know how y varies with changes in x.
• What if y is affected by factors other than x?
• What is the functional form?
• How can we distinguish causality from correlation?
• Consider the following model, which holds in the population:

y = β0 + β1x + u
Population Model
• We allow for other factors to affect y by including u (the error term).
• If the other factors in u are held fixed, ∆u = 0, then x has a linear effect on y: ∆y = β1∆x.
• Linearity: a one-unit change in x has the same effect on y, regardless of the starting value of x.

• The goal of empirical work is to estimate β0 and β1 (the population parameters).
• β0 and β1 are not directly observable.
• We estimate β0 and β1 using data and ASSUMPTIONS.
A simple assumption

The average value of u, the error term, in the population is 0: E(u) = 0.

This is not a restrictive assumption, since we can always use β0 to normalize E(u) to 0.

Show this!
Zero conditional mean / Mean independence

We need to make a crucial assumption about how u and x are related.

We want it to be the case that knowing something about x does not give us any information about u, so that they are completely unrelated:

E(u|x) = E(u) = 0, which implies

E(y|x) = β0 + β1x

This is the most crucial, and most challenging, assumption for the interpretation of β1 as a causal parameter.
[Figure: E(y|x) = β0 + β1x as a linear function of x; for any value of x (e.g. x1, x2), the distribution of y, f(y), is centered about E(y|x).]
Ordinary Least Squares

The basic idea of regression is to estimate the population parameters from a sample.

Let {(xi, yi): i = 1, …, n} denote a random sample of size n from the population.

For each observation in this sample, it will be the case that: yi = β0 + β1xi + ui

ui is unobserved.
To calculate the estimates of the coefficients that minimize the differences between the data points and the line, use the formulas:

b1 = cov(X, Y) / s²x
b0 = ȳ − b1x̄

The regression equation that estimates the equation of the first-order linear model is:

ŷ = b0 + b1x
Example 17.1 Relationship between odometer reading and a used car’s selling price.
• A car dealer wants to find the relationship between the odometer reading and the selling price of used cars.
• A random sample of 100 cars is selected, and the data recorded.
• Find the regression line.

Car | Odometer | Price
1 | 37388 | 5318
2 | 44758 | 5061
3 | 45833 | 5008
4 | 30862 | 5795
5 | 31705 | 5784
6 | 34010 | 5359
… | … | …

Independent variable: x = odometer reading. Dependent variable: y = selling price.
Solution
• Solving by hand
• To calculate b0 and b1 we need to calculate several statistics first (n = 100):

x̄ = 36,009.45;  s²x = Σ(xi − x̄)² / (n − 1) = 43,528,688

ȳ = 5,411.41;  cov(X, Y) = Σ(xi − x̄)(yi − ȳ) / (n − 1) = −1,356,256

b1 = cov(X, Y) / s²x = −1,356,256 / 43,528,688 = −.0312

b0 = ȳ − b1x̄ = 5,411.41 − (−.0312)(36,009.45) = 6,533

ŷ = b0 + b1x = 6,533 − .0312x
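As a quick arithmetic check, the slope and intercept above can be reproduced from the published summary statistics (a minimal Python sketch; the raw data for the 100 cars are not shown in the slides, so only the summary statistics are used):

```python
# Reproduce b1 and b0 from the summary statistics in Example 17.1.
x_bar = 36_009.45        # sample mean of odometer readings
s2_x = 43_528_688        # sample variance of x
y_bar = 5_411.41         # sample mean of selling prices
cov_xy = -1_356_256      # sample covariance between x and y

b1 = cov_xy / s2_x       # slope: cov(X, Y) / s_x^2
b0 = y_bar - b1 * x_bar  # intercept: y_bar - b1 * x_bar

print(f"b1 = {b1:.4f}")  # approx -0.0312
print(f"b0 = {b0:.0f}")  # approx 6,533
```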
Alternative approach in deriving OLS
Estimates

To derive the OLS estimates we need to realize that our
main assumption of E(u|x) = E(u) = 0 also implies that

Cov(x,u) = E(xu) = 0

Why? Remember from basic probability that Cov(X,Y) =
E(XY) – E(X)E(Y).

We can write our 2 restrictions just in terms of x, y, β0 and β1, since u = y − β0 − β1x.
Alternate approach, continued

If one uses calculus to solve the minimization problem for the two parameters, you obtain the following first-order conditions, which are the same as we obtained before, multiplied by n:

Σ (yi − β̂0 − β̂1xi) = 0
Σ xi(yi − β̂0 − β̂1xi) = 0
Deriving OLS continued

We can write our 2 restrictions just in terms of x, y, β0 and β1, since u = y − β0 − β1x:

E(y − β0 − β1x) = 0
E[x(y − β0 − β1x)] = 0

These are called moment restrictions.
• β̂0 and β̂1 are the estimates from the data.
More Derivation

Plug β̂0 = ȳ − β̂1x̄ (from the first equation) into the second equation!

Solving for β̂1 gives:

β̂1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)²
Summary of OLS slope
estimate

The slope estimate is the sample covariance between x and
y divided by the sample variance of x.

If x and y are positively correlated, the slope will be positive

If x and y are negatively correlated, the slope will be
negative

Only need x to vary in our sample
More OLS

Intuitively, OLS is fitting a line through the sample points
such that the sum of squared residuals is as small as
possible, hence the term least squares.

The residual, û, is an estimate of the error term, u, and is the difference between the sample point and the fitted line (sample regression function).
Sample regression line, sample data points
and the associated estimated error terms
A short simulation
Residuals and fitted values are uncorrelated, by construction!
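The slide's own simulation code is not shown; below is a minimal Python sketch of what such a check might look like, with an assumed data-generating process (x and u standard normal, β0 = 1, β1 = 0.5):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a simple data set (assumed DGP; the slide's own code is not shown).
n = 500
x = rng.normal(size=n)
u = rng.normal(size=n)
y = 1.0 + 0.5 * x + u

# OLS slope and intercept from the sample covariance / variance formulas.
b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
b0 = y.mean() - b1 * x.mean()

fitted = b0 + b1 * x
resid = y - fitted

# Algebraic properties: residuals sum to (numerically) zero and are
# uncorrelated with both x and the fitted values, by construction.
print(resid.sum())                  # ~ 0
print(np.cov(x, resid)[0, 1])       # ~ 0
print(np.cov(fitted, resid)[0, 1])  # ~ 0
```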
Algebraic Properties of OLS

The sum of the OLS residuals is zero; the coefficients were optimally chosen to ensure that the residuals sum to zero.

Thus, the sample average of the OLS residuals is zero as well.

The sample covariance (and correlation) between the regressors and the OLS residuals is zero.

Because the fitted values are linear functions of the xi, the fitted values and residuals are uncorrelated too.

The OLS regression line always goes through the mean of the sample.

If we plug in x̄ we predict ȳ; that is, the point (x̄, ȳ) is on the OLS regression line:

ȳ = β̂0 + β̂1x̄
Algebraic Properties of OLS
• Residuals sum to zero: Σ ûi = 0.
• The average residual is zero, since Σ ûi = 0 implies (1/n) Σ ûi = 0.

• The sample covariance between x and the residuals is always zero: Σ xiûi = 0.

• The fitted values and residuals are uncorrelated too.

• The OLS regression line always goes through the mean of the sample: ȳ = β̂0 + β̂1x̄.
Goodness of Fit

How do we think about how well our sample regression line fits our sample data?

Can compute the fraction of the total sum of squares (SST) that is explained by the model; call this the R-squared of the regression.

We can think of each observation as being made up of an unexplained part and an explained part.
• We then define the following:

SST = Σ(yi − ȳ)² is the total sum of squares.

SSE = Σ(ŷi − ȳ)² is the explained sum of squares.

SSR = Σ ûi² is the residual sum of squares.

Proving SST = SSE + SSR
Goodness of Fit
• Then SST = SSE + SSR

R² = SSE/SST = 1 − SSR/SST

This is the coefficient of determination. It is interpreted as the fraction of the sample variation in y that is explained by x.
• An R² of zero means no linear relationship between y and x.
• An R² of one means a perfect linear relationship.
• As R² increases, the points (xi, yi) are closer and closer to falling on the OLS regression line.
• R² never decreases when another x is added to the model.
• R² is a useful summary measure, but it does not tell us about causality.
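A minimal sketch of the two equivalent R-squared computations, using the same assumed data-generating process as the simulation sketch above:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
y = 1.0 + 0.5 * x + rng.normal(size=n)   # assumed DGP, as in the sketch above

b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
b0 = y.mean() - b1 * x.mean()
fitted = b0 + b1 * x
resid = y - fitted

SST = ((y - y.mean()) ** 2).sum()        # total sum of squares
SSE = ((fitted - y.mean()) ** 2).sum()   # explained sum of squares
SSR = (resid ** 2).sum()                 # residual sum of squares

print(SSE / SST, 1 - SSR / SST)          # the two R-squared formulas agree
```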
Unbiasedness of OLS
• So far, when we apply OLS to a sample, the residuals always average to zero, regardless of any underlying model.
• Now, we will study the statistical properties of the OLS estimator, referring to a population model and assuming random sampling.
• How do estimators behave across different samples of data?
• Will we get the right answer if we repeatedly sample?
• We need to find the expected value of β̂1 across all possible random samples, and determine whether we are right, on average.
• Unbiasedness: E(β̂1) = β1.
Unbiasedness of OLS
• β̂1 is the estimate from a specific sample.
• Different samples will generate different β̂1.
• Unbiasedness means that if we could take as many random samples as we want and compute β̂1 each time, the average of the estimates would be β1.
Unbiasedness of OLS

Assume the population model is linear in parameters: y = β0 + β1x + u.

Assume a random sample of size n, {(xi, yi): i = 1, 2, …, n}, from the population.

Thus we can write the sample model as yi = β0 + β1xi + ui.

Assume E(u|x) = 0 and thus E(ui|xi) = 0.

Assume there is variation in the xi.

How do we show the OLS estimator is unbiased, E(β̂1) = β1?

Writing β̂1 = β1 + Σ(xi − x̄)ui / Σ(xi − x̄)², the last term is the slope coefficient from a regression of ui on xi. But this is an imaginary regression, since ui is unobserved.
Monte Carlo Simulation

• Suppose we have the following population model:

y = 3 + 2x + u

where x and u are independent.

• We will estimate OLS 1000 times (a sketch of this exercise follows below).

Monte Carlo Simulation
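A minimal Python sketch of this Monte Carlo exercise. The slides do not show the distributions of x and u or the per-sample size, so x ∼ Normal(0, 1), u ∼ Normal(0, 1) and n = 100 per replication are assumptions here:

```python
import numpy as np

rng = np.random.default_rng(42)

beta0, beta1 = 3.0, 2.0     # true population parameters: y = 3 + 2x + u
n, reps = 100, 1000         # assumed sample size per replication; 1000 replications

b1_draws = np.empty(reps)
for r in range(reps):
    x = rng.normal(size=n)  # assumed distribution of x
    u = rng.normal(size=n)  # assumed Normal(0, 1) errors, independent of x
    y = beta0 + beta1 * x + u
    b1_draws[r] = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)

# Unbiasedness: the average of the 1000 slope estimates is close to beta1 = 2.
print(b1_draws.mean(), b1_draws.std())
```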
Unbiasedness of OLS
• Unbiasedness is a property of the estimation procedure (the rule), not a property of any particular estimate!
• The proof of unbiasedness depends on all four assumptions.
Variance of the OLS estimators
• Now, we need to capture the uncertainty in the sampling process:
• the dispersion in the sampling distribution of the estimators.
• The assumptions so far are not sufficient to tell us anything about the variance of the estimator.
• Assume, to simplify calculation: homoscedasticity/constant variance.
• u has the same variance given any value of x: V(u|x) = σ².
• σ² is the variance of the factors other than x that influence y.
[Figure: Homoskedastic case. f(y|x) has the same spread at every value of x (e.g. x1, x2), centered about E(y|x) = β0 + β1x.]
[Figure: Heteroskedastic case. The spread of f(y|x) changes with x (e.g. x1, x2, x3), while y is still centered about E(y|x) = β0 + β1x.]
Variance of OLS estimators

The average value of y is allowed to change with x.


The variance does not change with x (homoscedastic).
Sampling variance of OLS

Under the five assumptions, Var(β̂1) = σ² / SSTx, where SSTx = Σ(xi − x̄)².

Read Wooldridge, Section 2-5B (variance of the OLS estimators), to derive the above result.
Sampling variance of β̂1
• The formula Var(β̂1) = σ² / SSTx is not valid if the homoscedasticity assumption does not hold.

• Remember, homoscedasticity is not used to show unbiasedness!!

• As σ² increases, so does Var(β̂1); the more noise in the relationship between y and x (i.e. the larger the variability in u), the harder it is to learn something about β1.

• As SSTx rises, Var(β̂1) decreases; more variation in xi is good.

• Now, we need to estimate σ², the error variance.

Estimating σ²
• Replace each ui with its estimate ûi.

Note that ûi = yi − β̂0 − β̂1xi and ui = yi − β0 − β1xi.

The unbiased estimator of σ² under the FIVE ASSUMPTIONS uses a degrees-of-freedom adjustment:

σ̂² = SSR / (n − 2) = (1/(n − 2)) Σ ûi²

The standard error of the regression (an estimate of the standard deviation of the error in the regression) is:

σ̂ = √σ̂²

STATA calls it the root mean square error (RMSE).

Given σ̂, we can now estimate sd(β̂0) and sd(β̂1).
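A minimal sketch of these calculations on a simulated sample (assumed data-generating process), using the n − 2 degrees-of-freedom adjustment and se(β̂1) = σ̂/√SSTx:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x = rng.normal(size=n)
y = 3.0 + 2.0 * x + rng.normal(size=n)   # assumed DGP for illustration

b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
b0 = y.mean() - b1 * x.mean()
resid = y - (b0 + b1 * x)

SSR = (resid ** 2).sum()
sigma2_hat = SSR / (n - 2)    # unbiased estimator of sigma^2 (df adjustment)
ser = np.sqrt(sigma2_hat)     # standard error of the regression (Stata's root MSE)

SST_x = ((x - x.mean()) ** 2).sum()
se_b1 = ser / np.sqrt(SST_x)                     # se(beta1-hat) = sigma-hat / sqrt(SST_x)
se_b0 = ser * np.sqrt((x ** 2).mean() / SST_x)   # se(beta0-hat)

print(sigma2_hat, ser, se_b1, se_b0)
```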
Gauss-Markov Assumptions
1. Linear in parameters
2. Random sampling
3. Sample variation in x: the xi are not all the same value
4. Zero conditional mean/conditional independence: E(u|x) = 0
5. Homoscedasticity
Under the five assumptions, OLS estimators are Best Linear Unbiased
Estimators (BLUE)
- Best: in the class of linear unbiased estimators (LUE), OLS has the smallest variance.
- No other linear unbiased estimator will be better than OLS.
Robust Standard Errors
• Homoscedasticity is the exception rather than the rule; in real life, errors are often heteroscedastic.
• Unbiasedness does not depend on the assumption about the variance of the error.
• If errors are heteroscedastic, the usual formula Var(β̂1) = σ²/SSTx is no longer valid.
Robust Standard Errors
• A valid estimator of Var(β̂1) under heteroscedasticity of any form (including homoscedasticity) is

Var̂(β̂1) = Σ (xi − x̄)² ûi² / SSTx²

• Option “robust” in Stata.
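A minimal sketch comparing the usual and the heteroskedasticity-robust variance estimates of the slope for a simulated sample with an assumed heteroskedastic data-generating process:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
x = rng.normal(size=n)
u = rng.normal(size=n) * (0.5 + np.abs(x))   # heteroskedastic errors (assumed DGP)
y = 1.0 + 0.5 * x + u

xd = x - x.mean()
SST_x = (xd ** 2).sum()
b1 = (xd * y).sum() / SST_x
b0 = y.mean() - b1 * x.mean()
resid = y - (b0 + b1 * x)

# Usual (homoskedasticity-based) variance of b1: sigma2_hat / SST_x.
var_usual = (resid ** 2).sum() / (n - 2) / SST_x

# Robust variance of b1: sum of (xi - xbar)^2 * uhat_i^2, divided by SST_x^2.
var_robust = (xd ** 2 * resid ** 2).sum() / SST_x ** 2

print(np.sqrt(var_usual), np.sqrt(var_robust))  # robust se is typically larger here
```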


Statistical Inference
Assumptions of the Classical Linear Model (CLM)
• So far, we know that given the Gauss-Markov assumptions, OLS is BLUE.
• To do classical hypothesis testing, we need to add another assumption (beyond the Gauss-Markov assumptions). Why?
• Under the Gauss-Markov assumptions alone, the distribution of β̂1 can be any shape.
• Assume that u is independent of x and that u is normally distributed with zero mean and variance σ²: u ∼ Normal(0, σ²).
• So the CLM assumptions are: the Gauss-Markov assumptions + the normality assumption.
CLM Assumptions (cont’d)
• Under CLM, OLS is not only BLUE, but is also the minimum-variance estimator among ALL unbiased estimators (not just among linear estimators).
• We can summarize the population assumptions of CLM as follows:
• Conditional on x, y has a normal distribution with mean β0 + β1x (linear in x) and constant variance σ²:

y|x ∼ Normal(β0 + β1x, σ²)

• Normality sometimes fails in practice.
• Nonnormality of the errors is not a serious problem with large sample sizes.
Normal Sampling Distributions
• Under the CLM assumptions, β̂j ∼ Normal(βj, Var(β̂j)), so (β̂j − βj)/sd(β̂j) ∼ Normal(0, 1).
The t test
• Under the CLM assumptions: (β̂j − βj)/se(β̂j) ∼ t(n − k − 1).
• Note this is a t distribution (vs. normal) because we estimate σ by σ̂.
• Knowing the sampling distribution for the standardized estimator allows us to carry out hypothesis tests.
• Start with a null hypothesis.
• For example, H0: βj = 0.
• If we fail to reject the null, then xj has no effect on y, controlling for the other x’s.
T-test (cont’d)
• To perform our test we first need to form the t statistic for β̂j:

t = β̂j / se(β̂j)

• We will then use our t statistic along with a rejection rule to determine whether to “accept” the null hypothesis.
T-test: One-sided alternatives
• Besides our null, H0, we need an alternative hypothesis, H1, and a significance level.
• H1 may be one-sided or two-sided.
• H1: βj > 0 and H1: βj < 0 are one-sided.
• H1: βj ≠ 0 is two-sided.
• If we want to have only a 5% probability of rejecting H0 if it is true, then we say our significance level is 5%.
T-test: One sided alternatives
• Having picked a significance level, we look up the (1 − α)th percentile
in a t distribution with n – k – 1 df and call this c, the critical value.
• We can reject the null hypothesis if the t statistic is greater than the
critical value.
• If the t statistic is less than the critical value then we fail to reject the
null.
One-Sided Alternatives (cont)
One-sided vs two-sided
• Because the t distribution is symmetric, testing H1: βj < 0 is straightforward. The critical value is just the negative of the one before.
• We can reject the null if the t-stat < −c, and if the t-stat > −c then we fail to reject the null.
• For a two-sided test, we set the critical value based on α/2 and reject H0 if the absolute value of the t-stat > c.
Two-Sided Alternatives
Summary for H0: βj = 0
• Unless otherwise stated, the alternative is assumed to be two-sided.
• If we reject the null, we typically say “xj is statistically significant at the α% level”.
• If we fail to reject the null, we typically say “xj is statistically insignificant at the α% level”.
Computing p-values for t tests
• An alternative to the classical approach is to ask, “what is the smallest significance level at which the null would be rejected?”
• So, compute the t statistic, and then look up what percentile it is in the appropriate t distribution – this is the p-value.
• The p-value is the probability of observing a t statistic as extreme as the one we did, if the null were true.
Confidence Interval
• Another way to use classical statistical testing is to construct a confidence interval, using the same critical value as was used for a two-sided test.
• A (1 − α)% confidence interval is defined as β̂j ± c · se(β̂j), where c is the (1 − α/2) percentile in a t(n − k − 1) distribution.
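A minimal sketch of the two-sided p-value and 95% confidence interval calculations, using hypothetical values for the estimate, its standard error, and the degrees of freedom (none of these numbers come from the slides):

```python
from scipy import stats

# Hypothetical numbers for illustration (not from the slides).
b1_hat = 0.042      # estimated slope
se_b1 = 0.015       # its standard error
df = 98             # n - k - 1, e.g. n = 100 with one regressor

t_stat = b1_hat / se_b1                            # test of H0: beta_j = 0
p_value = 2 * (1 - stats.t.cdf(abs(t_stat), df))   # two-sided p-value

c = stats.t.ppf(0.975, df)                         # critical value for a 95% CI
ci = (b1_hat - c * se_b1, b1_hat + c * se_b1)

print(t_stat, p_value, ci)
```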
Testing Other Hypotheses
• A more general form of the t statistic recognizes that we may want to test something like H0: βj = aj.
• In this case, the appropriate t statistic is

t = (β̂j − aj) / se(β̂j),

where aj = 0 for the standard test.
Stata and p-values, t tests, etc.
• Most computer packages will compute the p-value for you, assuming
a two-sided test.
• If you really want a one-sided alternative, just divide the two-sided p-
value by 2.
• Stata provides the t statistic, p-value, and 95% confidence interval for each coefficient, in columns labeled “t”, “P > |t|” and “[95% Conf. Interval]”, respectively.
Regression with Stata
The F-stat

In a regression model with k independent variables:
H0: β1 = β2 = … = βk = 0
H1: H0 is not true (at least one of the βj is different from zero).

How to proceed?
- A t-stat tests a hypothesis that puts no restrictions on the other parameters.
- Further, we would have k separate t-stats. What constitutes a rejection at the 5% level?
The F-stat
• Run the restricted and unrestricted models, and find the SSR of each.
• How much does SSR increase when we drop q variables from the model (the restricted model)?
• Is the increase in SSR large enough relative to the SSR in the model with all of the variables (the unrestricted model)?

• The F stat measures the relative increase in the SSR when moving from the unrestricted to the restricted model:

F = [(SSRr − SSRur)/q] / [SSRur/(n − k − 1)]
F-stat from R-squared
• Sometimes it is more convenient to compute the F-stat using R-squareds rather than SSRs.
• Since SSR = SST(1 − R²),

F = [(R²ur − R²r)/q] / [(1 − R²ur)/(n − k − 1)]
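A minimal sketch of the R-squared form of the F statistic, with hypothetical R-squared values, q restrictions and n − k − 1 denominator degrees of freedom (the numbers are illustrative, not from the slides):

```python
from scipy import stats

# Hypothetical values for illustration (not from the slides).
r2_ur = 0.35          # R-squared of the unrestricted model
r2_r = 0.30           # R-squared of the restricted model
n, k, q = 200, 5, 3   # sample size, regressors in unrestricted model, restrictions

F = ((r2_ur - r2_r) / q) / ((1 - r2_ur) / (n - k - 1))
p_value = 1 - stats.f.cdf(F, q, n - k - 1)   # p-value from the F(q, n-k-1) distribution

print(F, p_value)
```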
F-stat (cont’d)

Just as with t statistics, p-values can be calculated by looking
up the percentile in the appropriate F distribution

If only one exclusion is being tested, then F = t², and the p-values will be the same.

If H0 fails to be rejected, this means that we must look for other variables to explain y.
The F-statistic for Overall Significance of a Regression
• We use the F statistic with H0: β1 = β2 = … = βk = 0.

A small R-squared can sometimes still produce a highly significant F stat.

That’s why we must look at the F-stat for joint significance in addition to the R-squared.
