Lecture 2. Simple Linear Regression Model
REGRESSION
Regression is probably the single most important tool at the econometrician’s
disposal.
But what is regression analysis?
It is concerned with describing and evaluating the relationship between a given
variable (usually called the dependent variable) and one or more other variables
(usually known as the independent variable(s)).
Some Notation
Denote the dependent variable by y and the independent variable(s) by x1, x2, ... , xk
where there are k independent variables.
Some alternative names for the y and x variables:
y                     x
dependent variable    independent variables
regressand            regressors
effect variable       causal variables
explained variable    explanatory variables
Note that there can be many x variables but we will limit ourselves to the case where
there is only one x variable to start with.
REGRESSION IS DIFFERENT FROM CORRELATION
If we say y and x are correlated, it means that we are treating y
and x in a completely symmetrical way.
In regression, we treat the dependent variable (y) and the
independent variable(s) (x’s) very differently.
The y variable is assumed to be random or “stochastic” in
some way, i.e. to have a probability distribution.
The x variables are, however, assumed to have fixed (“non-
stochastic”) values in repeated samples.
SIMPLE REGRESSION
For simplicity, say k=1. This is the situation where y depends on only one x variable.
Examples of the kind of relationship that may be of interest include:
How asset returns vary with their level of market risk
Measuring the long-term relationship between stock prices and
dividends.
Constructing an optimal hedge ratio
Example: suppose that we have the following data on the excess returns on
a fund manager’s portfolio (“fund XXX”) together with the excess returns
on a market index:
Year, t    Excess return on fund XXX    Excess return on market index
           = rXXX,t - rft               = rmt - rft
1          17.8                         13.7
2          39.0                         23.2
3          12.8                          6.9
4          24.2                         16.8
5          17.2                         12.3
We have some intuition that the beta on this fund is positive, and we
therefore want to find whether there appears to be a relationship between x
and y given the data that we have. The first stage would be to form a scatter
plot of the two variables.
GRAPH (SCATTER DIAGRAM)
[Scatter plot: excess return on fund XXX (vertical axis, 0 to 45) against
excess return on market portfolio (horizontal axis, 0 to 25).]
FINDING A LINE OF BEST FIT
We can use the general equation for a straight line,
y = a + bx, to get the line that best "fits" the data.
However, this equation (y = a + bx) is completely deterministic.
Is this realistic? No. So we add a random disturbance term, u, to the equation:

$y_t = \alpha + \beta x_t + u_t$, where $t = 1, 2, \ldots, T$
ORDINARY LEAST SQUARES
The most common method used to fit a line to the data is known as OLS (ordinary
least squares).
What we actually do is take each vertical distance from the data point to the
fitted line, square it (i.e. take the area of each of the squares in the
diagram), and minimise the total sum of the squares (hence least squares).
Tightening up the notation, let
yt denote the actual data point t
ŷt denote the fitted value from the regression line
ût denote the residual, yt - ŷt
[Diagram, actual vs fitted: for observation i, the actual value $y_i$, the
fitted value $\hat{y}_i$ on the regression line, and the residual
$\hat{u}_i = y_i - \hat{y}_i$.]
HOW OLS WORKS
So we minimise $\hat{u}_1^2 + \hat{u}_2^2 + \hat{u}_3^2 + \hat{u}_4^2 + \hat{u}_5^2$
(for the five observations above), or equivalently minimise
$\sum_{t=1}^{5} \hat{u}_t^2$. This is known as the residual sum of squares (RSS).
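The residual sum of squares can be illustrated with the fund data from the earlier table. A minimal sketch (the candidate coefficient values here are only for illustration):

```python
# RSS for a candidate line y_hat = a + b*x, evaluated on the
# fund/market excess-return data from the earlier table.
x = [13.7, 23.2, 6.9, 16.8, 12.3]   # excess return on market index
y = [17.8, 39.0, 12.8, 24.2, 17.2]  # excess return on fund XXX

def rss(a, b):
    """Residual sum of squares for the line y_hat = a + b*x."""
    return sum((yt - (a + b * xt)) ** 2 for xt, yt in zip(x, y))

# The OLS choice of intercept and slope (about a = -1.74, b = 1.64 for these
# data) gives a smaller RSS than any other line, e.g. the line a = 0, b = 1:
print(rss(-1.74, 1.64) < rss(0.0, 1.0))  # True
```

Trying other values of a and b shows the same pattern: moving away from the least-squares choice always increases the RSS.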
DERIVING THE OLS ESTIMATOR
But $\hat{y}_t = \hat{\alpha} + \hat{\beta} x_t$, so let

$L = \sum_t (y_t - \hat{y}_t)^2 = \sum_t (y_t - \hat{\alpha} - \hat{\beta} x_t)^2$

Minimising L with respect to $\hat{\alpha}$ and $\hat{\beta}$, we set the partial derivatives to zero:

$\frac{\partial L}{\partial \hat{\alpha}} = -2 \sum_t (y_t - \hat{\alpha} - \hat{\beta} x_t) = 0$   (1)

$\frac{\partial L}{\partial \hat{\beta}} = -2 \sum_t x_t (y_t - \hat{\alpha} - \hat{\beta} x_t) = 0$   (2)

From (1), $\sum_t (y_t - \hat{\alpha} - \hat{\beta} x_t) = 0 \Rightarrow \sum_t y_t - T\hat{\alpha} - \hat{\beta} \sum_t x_t = 0$

But $\sum_t y_t = T\bar{y}$ and $\sum_t x_t = T\bar{x}$.
DERIVING THE OLS ESTIMATOR (CONT’D)
So we can write $T\bar{y} - T\hat{\alpha} - T\hat{\beta}\bar{x} = 0$ or $\bar{y} - \hat{\alpha} - \hat{\beta}\bar{x} = 0$   (3)

From (2), $\sum_t x_t (y_t - \hat{\alpha} - \hat{\beta} x_t) = 0$   (4)

From (3), $\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x}$, and substituting into (4):

$\sum_t x_t (y_t - \bar{y} + \hat{\beta}\bar{x} - \hat{\beta} x_t) = 0$

$\sum_t x_t y_t - \bar{y} \sum_t x_t + \hat{\beta}\bar{x} \sum_t x_t - \hat{\beta} \sum_t x_t^2 = 0$

$\sum_t x_t y_t - T\bar{x}\bar{y} + \hat{\beta} T \bar{x}^2 - \hat{\beta} \sum_t x_t^2 = 0$
DERIVING THE OLS ESTIMATOR (CONT’D)
Rearranging for $\hat{\beta}$,

$\hat{\beta} (T\bar{x}^2 - \sum_t x_t^2) = T\bar{x}\bar{y} - \sum_t x_t y_t$

So overall we have:

$\hat{\beta} = \frac{\sum_t x_t y_t - T\bar{x}\bar{y}}{\sum_t x_t^2 - T\bar{x}^2}$   and   $\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x}$
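As a sketch, these OLS formulas can be applied to the fund and market excess-return data from the earlier table (five observations):

```python
# Applying the OLS formulas to the fund/market excess-return data above.
x = [13.7, 23.2, 6.9, 16.8, 12.3]   # excess return on market index
y = [17.8, 39.0, 12.8, 24.2, 17.2]  # excess return on fund XXX

T = len(x)
xbar = sum(x) / T
ybar = sum(y) / T

# beta_hat = (sum x*y - T*xbar*ybar) / (sum x^2 - T*xbar^2)
beta_hat = (sum(xi * yi for xi, yi in zip(x, y)) - T * xbar * ybar) / \
           (sum(xi ** 2 for xi in x) - T * xbar ** 2)
alpha_hat = ybar - beta_hat * xbar

print(round(beta_hat, 2), round(alpha_hat, 2))  # 1.64 -1.74
```

The estimated beta is positive (about 1.64), consistent with the earlier intuition that the beta on this fund is positive.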
THE POPULATION AND THE SAMPLE
The population is the total collection of all objects or people to be studied.
A sample is a selection of just some items from the population; we use the
sample to make inferences about the population.
THE DATA GENERATING PROCESS (DGP) AND THE PRF
The population regression function (PRF) is a description of the model
that is thought to be generating the actual data and the true relationship
between the variables (i.e. the true values of α and β).
LINEARITY
In order to use OLS, we need a model which is linear in the parameters
(α and β). It does not necessarily have to be linear in the variables (y
and x).

Linear in the parameters means that the parameters are not multiplied
together, divided, squared, cubed, etc.

For example, the model $Y_t = e^{\alpha} X_t^{\beta} e^{u_t}$ is non-linear in the
variables, but taking logs gives $\ln Y_t = \alpha + \beta \ln X_t + u_t$.
Letting $y_t = \ln Y_t$ and $x_t = \ln X_t$, this is linear in the parameters:
$y_t = \alpha + \beta x_t + u_t$.
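A model of the form $Y_t = e^{\alpha} X_t^{\beta} e^{u_t}$ is non-linear in the variables but linear in the parameters after taking logs. A minimal numerical sketch (the data here are made up for illustration and are not from the slides):

```python
import math

# Generate noise-free data from Y_t = e^alpha * X_t^beta (u_t = 0 here,
# purely for illustration), then estimate by OLS on the logged variables.
alpha_true, beta_true = 0.5, 1.5
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [math.exp(alpha_true) * xi ** beta_true for xi in X]

# ln Y_t = alpha + beta * ln X_t is linear in the parameters
y = [math.log(yi) for yi in Y]
x = [math.log(xi) for xi in X]
T = len(x)
xbar, ybar = sum(x) / T, sum(y) / T
beta_hat = (sum(a * b for a, b in zip(x, y)) - T * xbar * ybar) / \
           (sum(a ** 2 for a in x) - T * xbar ** 2)
alpha_hat = ybar - beta_hat * xbar
# With noise-free data, OLS on the logs recovers alpha = 0.5, beta = 1.5
```

Because the logged model is linear in α and β, ordinary OLS applies directly even though the original model is non-linear in Y and X.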
LINEAR AND NON-LINEAR MODELS
Estimator or Estimate?
Estimators are the formulae used to calculate the coefficients
Estimates are the actual numerical values for the coefficients.
THE ASSUMPTIONS UNDERLYING THE
CLASSICAL LINEAR REGRESSION MODEL (CLRM)
The model which we have used is known as the classical linear
regression model.
We observe data for xt, but since yt also depends on ut, we must be
specific about how the ut are generated.
We usually make the following set of assumptions about the ut’s (the
unobservable error terms):
Assumptions Interpretation
1. E(ut) = 0 The errors have zero mean
2. Var(ut) = σ²  The variance of the errors is constant and finite
over all values of xt (homoskedasticity)
3. Cov (ui, uj)=0 The errors are statistically independent of
one another
4. Cov (ut, xt)=0 No relationship between the error and
corresponding x variate
THE ASSUMPTIONS UNDERLYING THE CLRM AGAIN
Additional Assumption
5. ut is normally distributed
PROPERTIES OF THE OLS ESTIMATOR
CONSISTENCY/UNBIASEDNESS/EFFICIENCY
Consistent
The least squares estimators $\hat{\alpha}$ and $\hat{\beta}$ are consistent,
i.e. the estimates will converge to their true values as the sample size
increases to infinity.

Unbiased
The least squares estimators $\hat{\alpha}$ and $\hat{\beta}$ are unbiased:
$E(\hat{\alpha}) = \alpha$ and $E(\hat{\beta}) = \beta$.
Unbiasedness is a stronger condition than consistency.

Efficiency
An estimator $\hat{\beta}$ of parameter $\beta$ is said to be efficient if it is
unbiased and no other unbiased estimator has a smaller variance. If the
estimator is efficient, we are minimising the probability that it is a
long way off from the true value of $\beta$.
PRECISION AND STANDARD ERRORS
Recall that the estimators of α and β from the sample parameters
($\hat{\alpha}$ and $\hat{\beta}$) are given by:

$\hat{\beta} = \frac{\sum x_t y_t - T\bar{x}\bar{y}}{\sum x_t^2 - T\bar{x}^2}$   and   $\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x}$

The precision of these estimators is measured by their standard errors (SE):

$SE(\hat{\alpha}) = s \sqrt{\frac{\sum x_t^2}{T \sum (x_t - \bar{x})^2}} = s \sqrt{\frac{\sum x_t^2}{T(\sum x_t^2 - T\bar{x}^2)}}$

$SE(\hat{\beta}) = s \sqrt{\frac{1}{\sum (x_t - \bar{x})^2}} = s \sqrt{\frac{1}{\sum x_t^2 - T\bar{x}^2}}$

where s is the estimated standard deviation of the residuals.
We could estimate σ², the variance of the disturbances, using the sample
variance of the residuals, but this would be a biased estimator of σ².
An unbiased estimator of σ² is given by

$s = \sqrt{\frac{\sum \hat{u}_t^2}{T - 2}}$

where $\sum \hat{u}_t^2$ is the residual sum of squares; s is also known as
the standard error of the regression.
EXAMPLE: HOW TO CALCULATE THE PARAMETERS AND STANDARD ERRORS
Assume we have the following data calculated from a regression of y on a
single variable x and a constant over 22 observations:

$\sum x_t y_t = 830102$,  $T = 22$,  $\bar{x} = 416.5$,  $\bar{y} = 86.65$,  $\sum x_t^2 = 3919654$,  $\sum \hat{u}_t^2 = 130.6$

Calculations:

$\hat{\beta} = \frac{830102 - 22 \times 416.5 \times 86.65}{3919654 - 22 \times (416.5)^2} = 0.35$

$\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x} = 86.65 - 0.35 \times 416.5 = -59.12$

SE(regression), $s = \sqrt{\frac{\sum \hat{u}_t^2}{T - 2}} = \sqrt{\frac{130.6}{20}} = 2.55$

$SE(\hat{\alpha}) = 2.55 \times \sqrt{\frac{3919654}{22 \times (3919654 - 22 \times 416.5^2)}} = 3.35$

$SE(\hat{\beta}) = 2.55 \times \sqrt{\frac{1}{3919654 - 22 \times 416.5^2}} = 0.0079$

We now write the results as:

$\hat{y}_t = -59.12 + 0.35 x_t$
          (3.35)    (0.0079)
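These calculations can be reproduced directly from the quoted summary statistics. A sketch:

```python
import math

# Reproducing the example's calculations from the summary statistics it
# quotes (T = 22 observations; sums, means, and residual sum of squares).
T = 22
sum_xy = 830102.0        # sum of x_t * y_t
sum_x2 = 3919654.0       # sum of x_t squared
xbar, ybar = 416.5, 86.65
rss = 130.6              # residual sum of squares

beta_hat = (sum_xy - T * xbar * ybar) / (sum_x2 - T * xbar ** 2)
alpha_hat = ybar - beta_hat * xbar
s = math.sqrt(rss / (T - 2))                                        # ~ 2.55
se_alpha = s * math.sqrt(sum_x2 / (T * (sum_x2 - T * xbar ** 2)))   # ~ 3.35
se_beta = s * math.sqrt(1.0 / (sum_x2 - T * xbar ** 2))             # ~ 0.0079
```

Small differences from the quoted figures arise only from when intermediate values are rounded.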
STATISTICAL INFERENCE
We want to make inferences about the likely population values from the
regression parameters.
HYPOTHESIS TESTING
We can use the information in the sample to make inferences about the
population.
We will always have two hypotheses that go together, the null hypothesis
(denoted by H0) and the alternative hypothesis (denoted by H1).
The null hypothesis is the statement or the statistical hypothesis that is
actually being tested.
The alternative hypothesis represents the remaining outcomes of interest.
For example, suppose that, given the regression results above, we are interested
in the hypothesis that the true value of β is in fact 0.5.
We would use the notation:

H0: β = 0.5
H1: β ≠ 0.5

This would be known as a two-sided test.
ONE-SIDED HYPOTHESIS TESTS
Sometimes we may have some prior information that, for example, we
would expect β > 0.5 rather than β < 0.5.
In this case, we would do a one-sided test:

H0: β = 0.5
H1: β > 0.5

or we could have had

H0: β = 0.5
H1: β < 0.5

There are two ways to conduct a hypothesis test:
via the test of significance approach or
via the confidence interval approach.
DISTRIBUTION OF THE LEAST SQUARES ESTIMATORS
The least squares estimators are linear combinations of the random
variables $y_t$, i.e. $\hat{\beta} = \sum w_t y_t$, and a weighted sum of
normal random variables is itself normally distributed, so:

$\hat{\alpha} \sim N(\alpha, Var(\hat{\alpha}))$   and   $\hat{\beta} \sim N(\beta, Var(\hat{\beta}))$
DISTRIBUTION OF THE LEAST SQUARES ESTIMATORS (CONT’D)
In practice we must replace the true variances with their sample estimates;
standardising with the estimated standard errors gives statistics that follow
a t-distribution with T − 2 degrees of freedom:

$\frac{\hat{\alpha} - \alpha}{SE(\hat{\alpha})} \sim t_{T-2}$   and   $\frac{\hat{\beta} - \beta}{SE(\hat{\beta})} \sim t_{T-2}$
TESTING HYPOTHESES: SIGNIFICANCE APPROACH
1. Estimate $\hat{\alpha}$, $\hat{\beta}$ and their standard errors in the usual way.

2. Calculate the test statistic, $\frac{\hat{\beta} - \beta^*}{SE(\hat{\beta})}$,
where $\beta^*$ is the value of β under the null hypothesis.
THE TEST OF SIGNIFICANCE APPROACH (CONT’D)
3. We need some tabulated distribution with which to compare the
estimated test statistics.
Test statistics derived in this way can be shown to follow a t-distribution with T-2
degrees of freedom.
As the number of degrees of freedom increases, we need to be less cautious in our
approach since we can be more sure that our results are robust.
THE REJECTION REGION FOR A 1-SIDED TEST
[Figures: for an upper-tail test, the 5% rejection region lies in the upper
tail of f(x), with a 95% non-rejection region; for a lower-tail test, the
5% rejection region lies in the lower tail.]
TEST OF SIGNIFICANCE APPROACH: DRAWING CONCLUSIONS
THE CONFIDENCE INTERVAL APPROACH TO HYPOTHESIS TESTING
HOW TO CARRY OUT A HYPOTHESIS TEST USING CONFIDENCE INTERVALS
3. Use the t-tables to find the appropriate critical value, which will again
have T-2 degrees of freedom.
4. The confidence interval for β is given by
$(\hat{\beta} - t_{crit} \cdot SE(\hat{\beta}),\ \hat{\beta} + t_{crit} \cdot SE(\hat{\beta}))$
5. Perform the test: if the hypothesised value of β (β*) lies outside the
confidence interval, then reject the null hypothesis that β = β*; otherwise,
do not reject the null.
CONFIDENCE INTERVALS VERSUS TESTS OF SIGNIFICANCE
Note that the test of significance and confidence interval approaches
always give the same answer.

Under the test of significance approach, we would not reject H0: β = β* if

$-t_{crit} \le \frac{\hat{\beta} - \beta^*}{SE(\hat{\beta})} \le +t_{crit}$

Rearranging, we would not reject if

$-t_{crit} \cdot SE(\hat{\beta}) \le \hat{\beta} - \beta^* \le +t_{crit} \cdot SE(\hat{\beta})$

$\hat{\beta} - t_{crit} \cdot SE(\hat{\beta}) \le \beta^* \le \hat{\beta} + t_{crit} \cdot SE(\hat{\beta})$

But this is just the rule under the confidence interval approach.
EXAMPLE
Using the regression results above,

$\hat{y}_t = 20.3 + 0.5091 x_t$
          (14.38)  (0.2561),  T = 22

Using both the test of significance and confidence interval approaches, test
the hypothesis that β = 1 against a two-sided alternative.

[Figure: t-distribution with 2.5% rejection regions in each tail; critical
values −2.086 and +2.086.]
PERFORMING THE TEST
test stat $= \frac{\hat{\beta} - \beta^*}{SE(\hat{\beta})} = \frac{0.5091 - 1}{0.2561} = -1.917$

Since −2.086 < −1.917 < +2.086, the test statistic lies within the
non-rejection region, so we do not reject H0 at the 5% level.
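Both testing approaches can be sketched in code, using the coefficient, standard error, and 5% critical value quoted above (T − 2 = 20 degrees of freedom; critical value from standard t-tables):

```python
# Test of H0: beta = 1 vs H1: beta != 1 for the example regression above.
beta_hat, se_beta = 0.5091, 0.2561
beta_star = 1.0

test_stat = (beta_hat - beta_star) / se_beta   # ~ -1.917
t_crit = 2.086                                 # 5% two-sided, 20 d.o.f.

# Test of significance approach: reject if |test stat| > critical value
reject = abs(test_stat) > t_crit               # False: do not reject

# Confidence interval approach: reject if beta* lies outside the interval
ci = (beta_hat - t_crit * se_beta, beta_hat + t_crit * se_beta)
in_interval = ci[0] <= beta_star <= ci[1]      # True: do not reject
```

Both approaches agree: at the 5% level the null hypothesis that β = 1 is not rejected.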
CHANGING THE SIZE OF THE TEST
But note that we looked at only a 5% size of test. In marginal cases
(e.g. H0: β = 1, where the test statistic is close to the critical value),
we may get a completely different answer if we use a different size of test.
This is where the test of significance approach is better than a confidence
interval.
For example, say we wanted to use a 10% size of test. Using the test of
significance approach,

test stat $= \frac{\hat{\beta} - \beta^*}{SE(\hat{\beta})} = \frac{0.5091 - 1}{0.2561} = -1.917$

as above. The only thing that changes is the critical t-value.

[Figure: t-distribution with 5% rejection regions in each tail; critical
values −1.725 and +1.725.]
CHANGING THE SIZE OF THE TEST: THE CONCLUSION
t20;10% = 1.725. So now, as the test statistic lies in the rejection region, we
would reject H0.
                                     Reality
                                     H0 is true          H0 is false
Result of    Significant
test         (reject H0)             Type I error = α
             Insignificant
             (do not reject H0)                          Type II error = β
THE TRADE-OFF BETWEEN TYPE I AND TYPE II ERRORS
So there is always a trade-off between Type I and Type II errors when
choosing a significance level. The only way we can reduce the chances of
both is to increase the sample size.
A SPECIAL TYPE OF HYPOTHESIS TEST: THE T-RATIO
Recall that the formula for a test of significance approach to hypothesis
testing using a t-test is:

test statistic $= \frac{\hat{\beta}_i - \beta_i^*}{SE(\hat{\beta}_i)}$

If the test is

H0: βi = 0
H1: βi ≠ 0

i.e. a test that the population coefficient is zero against a two-sided
alternative, this is known as a t-ratio test. Since βi* = 0, the test
statistic collapses to:

test stat $= \frac{\hat{\beta}_i}{SE(\hat{\beta}_i)}$
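Applied to the example regression above, ŷ = 20.3 + 0.5091x with standard errors 14.38 and 0.2561, the t-ratios can be sketched as:

```python
# t-ratio tests (H0: coefficient = 0) for the example regression above,
# with T - 2 = 20 degrees of freedom.
t_crit = 2.086  # 5% two-sided critical value from t-tables, 20 d.o.f.

t_ratio_alpha = 20.3 / 14.38     # ~ 1.41
t_ratio_beta = 0.5091 / 0.2561   # ~ 1.99

# Neither |t-ratio| exceeds the critical value, so neither coefficient is
# significantly different from zero at the 5% level.
alpha_significant = abs(t_ratio_alpha) > t_crit  # False
beta_significant = abs(t_ratio_beta) > t_crit    # False
```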
EXAMPLE OF T-RATIO
WHAT DOES THE T-RATIO TELL US?
See lab exercise 1 for a practical lesson!