Ordinary Least Squares
Introduction
• Describe the nature of financial data.
• Assess the concepts underlying regression analysis.
• Describe some examples of financial models.
• Examine the Ordinary Least Squares (OLS) technique and hypothesis testing.
Time Series Data
• Examples of problems that could be tackled using a time series regression:
- How the value of a country's stock index has varied with that country's macroeconomic fundamentals.
- How the value of a company's stock price has varied when it announced the value of its dividend payment.
- The effect on a country's currency of an increase in its interest rate.
Cross-Sectional Data
• Cross-sectional data are data on one or more
variables collected at a single point in time, e.g.
- A poll of usage of internet stock broking
services
- Cross-section of stock returns on the New
York Stock Exchange
- A sample of bond credit ratings for UK banks
Model Estimation
[Flow diagram: Economic or Financial Theory (Previous Studies) → Collection of Data → Model Estimation → is the model adequate? (No / Yes)]
Example data:
Interest rate (x)   Market index (y)
0                   18
3                   21
4                   20
3                   23
4                   25
6                   27
4                   26
6                   28
5                   30
6                   32
Regression
[Scatter plot of the market index against the interest rate, with fitted line y = 2.4024x + 15.606]
Econometric Model
y_t = α + β x_t + u_t
where:
y_t = dependent variable
α = constant (intercept)
β = slope parameter
x_t = explanatory variable
u_t = error term
Estimates
ŷ_t = 0.7 + 0.8 x_t
A 1-unit rise in x_t gives a 0.8-unit rise in y_t.
The Residual Term
• (Also called the error term or disturbance term.)
• It captures the random component of the regression, which arises from:
- Omission of explanatory variables
- The aggregation of the variables
- Mis-specification of the model
- Incorrect functional form of the model
- Measurement error
Least Squares Approach
• The aim of this approach is to minimize the sum of squared residuals across all observations
• Squaring the residuals before minimizing prevents positive and negative residuals from cancelling out
• The intercept and slope parameters can then be derived using basic calculus (sketched below)
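As a concrete illustration, here is a minimal Python/NumPy sketch of the closed-form least squares calculation, using the ten interest-rate/market-index observations tabulated earlier (the variable names are ours):

import numpy as np

# Ten observations from the example table: interest rate (x), market index (y)
x = np.array([0, 3, 4, 3, 4, 6, 4, 6, 5, 6], dtype=float)
y = np.array([18, 21, 20, 23, 25, 27, 26, 28, 30, 32], dtype=float)

# Slope: b2 = sum((x - xbar)(y - ybar)) / sum((x - xbar)^2)
b2 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
# Intercept: b1 = ybar - b2 * xbar
b1 = y.mean() - b2 * x.mean()

residuals = y - (b1 + b2 * x)
print(b1, b2, np.sum(residuals ** 2))  # estimates and the minimized sum of squared residuals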
Regression
• Regression measures the degree of dependency of the dependent variable on the explanatory variables
• Correlation measures the strength of a linear association between two variables
• Causation suggests that the dependent variable depends on previous values of the explanatory variable
• Regression does not imply causation
R-Squared Statistic
• This statistic gives the proportion of the total variation in the dependent variable that is explained by the regression
• It summarizes the explanatory power of the regression and measures how well the model fits the data
• Its value lies between 0 and 1; it equals 1 when all the points lie exactly on the regression line
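A short sketch of the calculation in Python (NumPy assumed), reusing the example data from above:

import numpy as np

x = np.array([0, 3, 4, 3, 4, 6, 4, 6, 5, 6], dtype=float)
y = np.array([18, 21, 20, 23, 25, 27, 26, 28, 30, 32], dtype=float)
b2 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b1 = y.mean() - b2 * x.mean()

y_hat = b1 + b2 * x
sst = np.sum((y - y.mean()) ** 2)   # total variation in y
sse = np.sum((y - y_hat) ** 2)      # residual (unexplained) variation
r2 = 1 - sse / sst                  # lies between 0 and 1
print(r2)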
Interpretation of Results
• Consider the type of model being estimated
• Note the units of measurement of the variables (unless all the variables are in logarithmic form)
• Consider the range of the observations
• Do the signs of the coefficients accord with the theoretical model?
• Are the magnitudes of the parameters plausible?
• Remember it is only a model: the parameters are estimates describing average values, so individual cases may vary
Significance Testing
ŷ_t = 0.7 + 1.2 x_t
     (0.7)  (0.4)
(standard errors in parentheses)
Hypothesis Test
H_0: β = 0
H_1: β ≠ 0
t = (β̂ − 0) / SE(β̂) = (1.2 − 0) / 0.4 = 3
Critical value is 1.98
Since 3 > 1.98, reject H_0
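A sketch of this test in Python (SciPy assumed). The degrees of freedom below are hypothetical, chosen so the two-tailed 5% critical value comes out near the 1.98 quoted above:

from scipy import stats

beta_hat, se, beta_null = 1.2, 0.4, 0.0
t_stat = (beta_hat - beta_null) / se     # = 3.0

df = 120                                 # hypothetical degrees of freedom (n - k - 1)
crit = stats.t.ppf(0.975, df)            # two-tailed 5% critical value, about 1.98
print(t_stat, crit, abs(t_stat) > crit)  # True -> reject H0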
Hypothesis Testing
• Test the significance of the constant term in the same way as the slope parameter
• Although the conventional test for significance is at the 5% level, we also test at the 1% and 10% levels
• Use of the t-distribution tells us what to expect 'by chance'
• For finite samples, when applying the t-test, we need to allow for degrees of freedom
• The t-test can be applied as either a one-tailed or a two-tailed test
• For a two-tailed test the t-statistic is compared in absolute value, so its sign can be ignored
The t-Test
• If the t-test statistic exceeds the critical value, reject the null hypothesis; when testing whether a coefficient equals zero, this means the coefficient is significant
• If the test statistic is below the critical value, do not reject the null hypothesis
• To find the critical value, you need to know the degrees of freedom, which equal n − k − 1 (n observations, k explanatory variables)
Interpretation of Coefficients, b1 and b2
Model and assumptions:
1. y_t = β1 + β2 x_t + ε_t
2. E(ε_t) = 0 ⇔ E(y_t | x_t) = β1 + β2 x_t
3. var(ε_t) = σ² = var(y_t)
The intercept estimator is b1 = ȳ − b2 x̄, and E(b1) = β1, so b1 is unbiased.
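The unbiasedness claim follows in one line, using E(b2) = β2 and E(ȳ) = β1 + β2 x̄ (which holds because E(ε_t) = 0); in LaTeX:

\[
E(b_1) = E(\bar{y} - b_2 \bar{x})
       = (\beta_1 + \beta_2 \bar{x}) - E(b_2)\,\bar{x}
       = \beta_1 + \beta_2 \bar{x} - \beta_2 \bar{x}
       = \beta_1 .
\]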
Equivalent expressions for b2:
b2 = Σ(x_t − x̄)(y_t − ȳ) / Σ(x_t − x̄)²
var(b2) = σ² / Σ(x_t − x̄)²
Given b1 = ȳ − b2 x̄, the variance of the estimator b1 is:
var(b1) = σ² Σx_t² / (T Σ(x_t − x̄)²)
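A quick Monte Carlo check of the var(b2) formula in Python (the parameter values are illustrative, not from the source):

import numpy as np

rng = np.random.default_rng(0)
T, beta1, beta2, sigma = 50, 1.0, 0.5, 2.0   # illustrative values
x = np.linspace(0.0, 10.0, T)                # regressors held fixed across samples

# Theoretical variance of b2: sigma^2 / sum((x - xbar)^2)
theoretical = sigma**2 / np.sum((x - x.mean())**2)

# Re-estimate b2 on many simulated samples and compare the empirical variance
b2_draws = []
for _ in range(10_000):
    y = beta1 + beta2 * x + rng.normal(0.0, sigma, T)
    b2 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)
    b2_draws.append(b2)

print(theoretical, np.var(b2_draws))  # the two values should be close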
Covariance of b1 and b2
cov(b1, b2) = −σ² x̄ / Σ(x_t − x̄)²
What factors determine the variance and covariance of b1 and b2?
1. The larger σ² is, the greater the uncertainty about b1, b2 and their relationship.
2. The more spread out the x_t values are, the more confidence we have in b1, b2, etc.
3. The larger the sample size, T, the smaller the variances and covariances.
4. The variance of b1 is large when the (squared) x_t values are far from zero (in either direction).
5. Changing the slope, b2, has no effect on the intercept, b1, when the sample mean is zero. But if the sample mean is positive, the covariance between b1 and b2 will be negative, and vice versa (a numerical check follows this list).
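A small numerical check of point 5 in Python, evaluating the covariance formula for two regressor configurations (illustrative values):

import numpy as np

sigma = 2.0                          # illustrative error standard deviation
x_pos = np.linspace(1.0, 10.0, 50)   # regressor with a positive sample mean
x_zero = x_pos - x_pos.mean()        # same spread, sample mean shifted to zero

def cov_b1_b2(x, sigma):
    # cov(b1, b2) = -sigma^2 * xbar / sum((x - xbar)^2)
    return -sigma**2 * x.mean() / np.sum((x - x.mean())**2)

print(cov_b1_b2(x_pos, sigma))   # negative: positive mean -> negative covariance
print(cov_b1_b2(x_zero, sigma))  # zero: changing the slope leaves the intercept unaffected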
Gauss-Markov Theorem
b1 ~ N(β1, σ² Σx_t² / (T Σ(x_t − x̄)²))
b2 ~ N(β2, σ² / Σ(x_t − x̄)²)
Consistency
ε̂_t = y_t − b1 − b2 x_t
σ̂² = Σ_{t=1}^{T} ε̂_t² / (T − 2)
σ̂² is an unbiased estimator of σ²
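In Python, with the example data from earlier (note the divisor T − 2 rather than T, which is what makes the estimator unbiased):

import numpy as np

x = np.array([0, 3, 4, 3, 4, 6, 4, 6, 5, 6], dtype=float)
y = np.array([18, 21, 20, 23, 25, 27, 26, 28, 30, 32], dtype=float)
T = len(y)

b2 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)
b1 = y.mean() - b2 * x.mean()
resid = y - b1 - b2 * x

sigma2_hat = np.sum(resid**2) / (T - 2)  # divide by T - 2, not T
print(sigma2_hat)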
The Least Squares Predictor, ŷ₀
b1 ~ N(β1, σ² Σx_t² / (T Σ(x_t − x̄)²))
b2 ~ N(β2, σ² / Σ(x_t − x̄)²)
b2 ~ N(β2, σ² / Σ(x_t − x̄)²)
hence
(b2 − β2) / √var(b2) ~ N(0, 1)
Coefficient of Determination
What proportion of the variation in y_t is explained?
0 ≤ R² ≤ 1
R² = SSR / SST
Coefficient of Determination
SST = SSR + SSE
Dividing by SST:
1 = SSR/SST + SSE/SST
R² = SSR/SST = 1 − SSE/SST
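A sketch verifying the decomposition numerically in Python, again with the earlier example data:

import numpy as np

x = np.array([0, 3, 4, 3, 4, 6, 4, 6, 5, 6], dtype=float)
y = np.array([18, 21, 20, 23, 25, 27, 26, 28, 30, 32], dtype=float)
b2 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)
b1 = y.mean() - b2 * x.mean()
y_hat = b1 + b2 * x

sst = np.sum((y - y.mean())**2)      # total variation
ssr = np.sum((y_hat - y.mean())**2)  # explained variation
sse = np.sum((y - y_hat)**2)         # unexplained variation
print(np.isclose(sst, ssr + sse))    # True: SST = SSR + SSE
print(ssr / sst, 1 - sse / sst)      # two equivalent R-squared expressions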
Coefficient of Determination
• R² is only a descriptive measure.
• R² does not measure the quality of the regression model.
• Focusing solely on maximizing R² is not a good idea.
In simple linear regression models, there are two ways to test
H_0: β2 = 0 vs H_A: β2 ≠ 0
Note that the two tests are equivalent: it can be shown that [t_(T−2)]² = F_(1, T−2).
se(b1) = √var̂(b1) = √490.12 = 22.1387
se(b2) = √var̂(b2) = √0.0009326 = 0.0305
t = b1 / se(b1) = 40.7676 / 22.1387 ≈ 1.84
t = b2 / se(b2) = 0.1283 / 0.0305 ≈ 4.20
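The same arithmetic in a few lines of Python, with the values taken from the slide above:

import math

b1, var_b1 = 40.7676, 490.12
b2, var_b2 = 0.1283, 0.0009326

se_b1 = math.sqrt(var_b1)  # about 22.14
se_b2 = math.sqrt(var_b2)  # about 0.0305
print(b1 / se_b1)          # t-statistic for b1, about 1.84
print(b2 / se_b2)          # t-statistic for b2, about 4.20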
Regression Computer Output
Table 6 — sources of variation in the dependent variable:
Source: Explained (SSR), Unexplained (SSE), Total (SST)
Regression Computer Output
SST = Σ(y_t − ȳ)² = 79532
SSR = Σ(ŷ_t − ȳ)² = 25221
SSE = Σε̂_t² = 54311
σ̂² = SSE / (T − 2)
R² = SSR/SST = 1 − SSE/SST
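A two-line arithmetic check of the figures reported in this output:

sst, ssr, sse = 79532.0, 25221.0, 54311.0
print(ssr / sst)      # about 0.317
print(1 - sse / sst)  # the same value via the complementary form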
Reporting Regression Results
R² = 0.317
This R² value may seem low, but it is typical in studies involving cross-sectional data analyzed at the individual level.
A considerably higher R² value would be expected in studies involving time series data analyzed at an aggregate or macro level.