MFIN 305 - Lecture 1
Methods of Finance
A Brief Overview of the Classical Linear Regression Model
1. What is a Regression Model?
A regression model describes the relationship between a given variable, y, and one or more other variables, x.
4. Line of Best Fit
• The equation of a straight line is used to get the line of best fit:
𝑦 = 𝛼 + 𝛽𝑥 (an exact, perfect-fit relationship)
• 𝛼 represents the value of 𝑦 when 𝑥 is zero (the intercept)
• Interpretation of 𝛽: if 𝑥 increases by 1 unit, 𝑦 is expected, all else being equal, to increase by 𝛽 units
• However, it is unrealistic that 𝑥 alone explains 𝑦
4. Line of Best Fit
• "Line of best fit" is too broad as a definition ⇒ it is imprecise: there is room for disagreement about what the "best" fit is
• The values of 𝛼 and 𝛽 that place the line as close as possible to all the data points are chosen
4. Line of Best Fit
y_t = α + βx_t + u_t,  t = 1, 2, 3, …, T
• α and β should minimize the vertical distance from the data points to the fitted line
5. Ordinary Least Squares
• The most common method of fitting a line to the data is Ordinary
Least Squares (OLS)
• OLS is the workhorse of econometrics/data science!
• OLS entails measuring the vertical distance from a point to the line,
squaring it and then minimizing the sum of squares (hence “least
squares”)
5. Ordinary Least Squares
Let:
• y_t denote the actual data point
• ŷ_t denote the fitted value from the regression line: the value of y predicted by the model for a given value of x
• ^ (a hat) denote estimated or fitted values (any value calculated from the sample)
• û_t denote the residual: û_t = y_t − ŷ_t
• We minimize the residual sum of squares L = Σ û_t² = Σ (y_t − α̂ − β̂x_t)² over α̂ and β̂ to find the values that give the line closest to the data
• From (2), the first-order condition with respect to α̂:
Σ (y_t − α̂ − β̂x_t) = 0 (4)
Σ y_t − Tα̂ − β̂ Σ x_t = 0 (5)
Use Σ y_t = Tȳ and Σ x_t = Tx̄ to write:
Tȳ − Tα̂ − Tβ̂x̄ = 0 (6)
Or: ȳ − α̂ − β̂x̄ = 0 (7)
6. Deriving the OLS Estimator
• From (3), the first-order condition with respect to β̂:
Σ x_t (y_t − α̂ − β̂x_t) = 0 (8)
• From (7): α̂ = ȳ − β̂x̄ (9)
• Substituting into (8) for α̂ from (9) yields:
Σ x_t (y_t − ȳ + β̂x̄ − β̂x_t) = 0 (10)
Σ x_t y_t − ȳ Σ x_t + β̂x̄ Σ x_t − β̂ Σ x_t² = 0 (11)
Σ x_t y_t − Tx̄ȳ + β̂Tx̄² − β̂ Σ x_t² = 0 (12)
6. Deriving the OLS Estimator
• Rearranging for β̂:
β̂ (Tx̄² − Σ x_t²) = Tx̄ȳ − Σ x_t y_t (13)
β̂ = (Σ x_t y_t − Tx̄ȳ) / (Σ x_t² − Tx̄²)
• More intuitively:
β̂ = Σ (x_t − x̄)(y_t − ȳ) / Σ (x_t − x̄)² = sample cov(x, y) / sample variance of x
6. Deriving the OLS Estimator
• Note: sample covariance σ̂_xy = Σ (x_t − x̄)(y_t − ȳ) / (T − 1)
• Example data (excess returns, %):

Year   Excess return on fund XXX, y_t   Excess return on market, x_t
1      17.8                             13.7
2      39.0                             23.2
3      12.8                             6.9
4      24.2                             16.8
5      17.2                             12.3
7. What do we use 𝛼ො and 𝛽መ For?
[Scatter plot: excess return on fund XXX (vertical axis, 0-45%) against excess return on the market portfolio (horizontal axis, 0-25%)]
7. What do we use 𝛼ො and 𝛽መ For?
• Using the data, calculate α̂ and β̂
• We will get α̂ = −1.74 and β̂ = 1.64
• The fitted line is therefore ŷ_t = α̂ + β̂x_t = −1.74 + 1.64 x_t
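The estimates α̂ = −1.74 and β̂ = 1.64 can be reproduced from the five-observation table earlier, applying the formulas derived above; a minimal sketch, assuming (as the reconstruction of that table suggests) that the fund's excess return is y and the market's excess return is x:

```python
# OLS from the derived formulas:
# beta_hat = sum((x_t - x_bar)(y_t - y_bar)) / sum((x_t - x_bar)^2)
# alpha_hat = y_bar - beta_hat * x_bar
y = [17.8, 39.0, 12.8, 24.2, 17.2]   # excess returns on fund XXX (%)
x = [13.7, 23.2, 6.9, 16.8, 12.3]    # excess returns on market portfolio (%)

T = len(x)
x_bar = sum(x) / T
y_bar = sum(y) / T

beta_hat = sum((xt - x_bar) * (yt - y_bar) for xt, yt in zip(x, y)) \
         / sum((xt - x_bar) ** 2 for xt in x)
alpha_hat = y_bar - beta_hat * x_bar

print(round(alpha_hat, 2), round(beta_hat, 2))  # -1.74 1.64
```

The same numbers come out of any OLS routine; computing them by hand simply makes the cov(x, y)/var(x) structure of β̂ visible.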
10. Linearity
• In a double logarithmic model, ln y_t = α + β ln x_t + u_t, β is an elasticity (strictly, the effect of unit changes on a logarithmic scale)
• Say β̂ = 1.2; then a rise in x of 1% will lead, on average and everything else being equal, to a rise in y of 1.2%
• If theory suggests that y and x should be inversely related, a model of the form
  y_t = α + β/x_t + u_t
  can still be estimated by OLS by setting z_t = 1/x_t and regressing y on a constant and z_t
• Some models are intrinsically non-linear, e.g. y_t = α + βx_t^δ + u_t
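The reciprocal transformation can be sketched as follows. The data here are synthetic (hypothetical α = 2, β = 3, no noise) purely to show that regressing y on z_t = 1/x_t recovers the parameters of y_t = α + β/x_t + u_t with the ordinary OLS formulas:

```python
# Transform x into z = 1/x, then apply the usual OLS formulas to (z, y).
# Synthetic, noise-free data with hypothetical alpha = 2 and beta = 3.
x = [1.0, 2.0, 4.0, 5.0, 10.0]
y = [2 + 3 / xt for xt in x]   # y_t = alpha + beta / x_t

z = [1 / xt for xt in x]       # the transformed regressor
T = len(z)
z_bar = sum(z) / T
y_bar = sum(y) / T

beta_hat = sum((zt - z_bar) * (yt - y_bar) for zt, yt in zip(z, y)) \
         / sum((zt - z_bar) ** 2 for zt in z)
alpha_hat = y_bar - beta_hat * z_bar

print(round(alpha_hat, 6), round(beta_hat, 6))  # 2.0 3.0
```

The model is non-linear in x but linear in the parameters, which is all OLS requires.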
11. Estimator or Estimate?
Estimator — the formulae used to calculate the coefficients
Estimate — the actual numerical values for the coefficients, obtained from the sample
12. The Classical Linear Regression Model
𝑦𝑡 = 𝛼 + 𝛽𝑥𝑡 + 𝑢𝑡
• After assumptions are made about 𝑢_t, this becomes the classical linear regression model (CLRM)

Assumptions and their interpretation:
(1) E(u_t) = 0 — the errors have zero mean
(2) var(u_t) = σ² < ∞ — the variance of the errors is constant and finite over all values of x_t
(3) cov(u_i, u_j) = 0 for i ≠ j — the errors are linearly independent of one another
(4) cov(u_t, x_t) = 0 — there is no relationship between the error and the corresponding x variate
12. The Classical Linear Regression Model
• If assumption (1) holds, assumption (4) can be written as E(x_t u_t) = 0
(i) This implies that the regressor is orthogonal to (i.e. unrelated to) the error term
(ii) An alternative assumption to (4), which is slightly stronger, is that the x_t are non-stochastic or fixed in repeated samples (i.e. there is no sampling variation in x_t, and its value is determined outside the model)
[Figure: sampling distribution of the estimator for T = 100, 200 and 300, concentrating around the true value as T increases]
13. Properties of the OLS Estimator
Unbiasedness
• OLS estimates are unbiased:
  E(α̂) = α and E(β̂) = β
  On average, the estimated values of the coefficients will equal their true values
• There is no systematic over- or under-estimation of the true coefficients
• Proving unbiasedness requires that cov(x_t, u_t) = 0
• It is a stronger condition than consistency, since it holds for small as well as large samples
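Unbiasedness can be illustrated with a small Monte Carlo experiment: the true values α = 1 and β = 2 and the error scale below are arbitrary simulation choices, but averaging β̂ over many samples drawn from the same model lands very close to the true β.

```python
import random

random.seed(0)

alpha_true, beta_true = 1.0, 2.0
T, n_reps = 50, 2000
x = [float(i) for i in range(T)]          # regressor held fixed in repeated samples
x_bar = sum(x) / T
sxx = sum((xt - x_bar) ** 2 for xt in x)

beta_hats = []
for _ in range(n_reps):
    # Draw a fresh sample of errors and form y under the true model
    y = [alpha_true + beta_true * xt + random.gauss(0, 1) for xt in x]
    y_bar = sum(y) / T
    beta_hats.append(sum((xt - x_bar) * (yt - y_bar)
                         for xt, yt in zip(x, y)) / sxx)

mean_beta_hat = sum(beta_hats) / n_reps
print(round(mean_beta_hat, 2))  # close to 2.0: no systematic over/under-estimation
```

Individual β̂'s scatter around 2, but their average does not drift above or below it.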
13. Properties of the OLS Estimator
Efficiency
• An estimator of a parameter β is said to be efficient if no other estimator has a smaller variance
• Efficiency minimizes the probability that the estimate will be far away from the true value
• An efficient ("best") estimator has a probability distribution that is narrowly dispersed around the true value
14. Precision and Standard Errors
• The estimates α̂ and β̂ are specific to the sample used in their estimation
• Different sample → different (x_t, y_t) → different α̂ and β̂
• How good are these estimates? We need to measure the reliability of α̂ and β̂
• Precision (confidence in the estimates) → standard error
• Would the estimates vary much from one sample to another (drawn from the same population)?
• Precision can be calculated using the available data
14. Precision and Standard Errors
Given assumptions (1) to (4), valid estimators of the standard errors are given by:

se(α̂) = s √( Σ x_t² / (T Σ (x_t − x̄)²) ) = s √( Σ x_t² / (T Σ x_t² − T² x̄²) )

se(β̂) = s √( 1 / Σ (x_t − x̄)² ) = s √( 1 / (Σ x_t² − T x̄²) )

where s estimates the standard deviation of the errors, σ² = var(u_t) = E(u_t²)
• u_t is unobservable (these are the population disturbances), so we work with the residuals û_t:
  s² = (1/T) Σ û_t² is a biased, but consistent, estimator of σ²
  s² = Σ û_t² / (T − 2) is unbiased, and s = √( Σ û_t² / (T − 2) ) is called the standard error of the regression, or the standard error of the estimate
• s is a broad measure of the fit of the regression equation: the smaller s is, the closer the fit of the line to the actual data
16. Comments on the Standard Error Estimators
se(α̂) = s √( Σ x_t² / (T Σ (x_t − x̄)²) ) = s √( Σ x_t² / (T (Σ x_t² − T x̄²)) )

1. As the sample size T rises, se(β̂) and se(α̂) fall
2. As s² rises, se(β̂) and se(α̂) rise → the larger the residuals are, the worse the fit of the line
3. The larger Σ (x_t − x̄)² is, the smaller the coefficient variances → more variation in the x's around their mean is better
The importance of the deviation of x from its mean
[Figure: scatter plots of y against x illustrating how the dispersion of the x's around their mean affects the precision of the fitted line]
Aside: Accuracy of the intercept estimate
[Figure: fitted line extrapolated back to x = 0 to obtain the intercept estimate]
17. How to Calculate the Parameters and Standard Errors
Assume the following quantities have been calculated from a regression of y on a single variable x and a constant over 22 observations:

Σ x_t y_t = 830102,  T = 22,  x̄ = 416.5,  ȳ = 86.65
Σ x_t² = 3919654,  RSS = 130.6

• β̂ = (Σ x_t y_t − T x̄ ȳ) / (Σ x_t² − T x̄²) = (830102 − 22 × 416.5 × 86.65) / (3919654 − 22 × 416.5²) = 0.35
• α̂ = ȳ − β̂ x̄ = 86.65 − 0.35 × 416.5 = −59.12
• s = √( RSS / (T − 2) ) = √( 130.6 / 20 ) ≈ 2.55
• se(α̂) = s √( Σ x_t² / (T Σ x_t² − T² x̄²) ) = 2.55 × √( 3919654 / (22 × (3919654 − 22 × 416.5²)) ) = 3.35
• se(β̂) = s √( 1 / (Σ x_t² − T x̄²) ) = 2.55 × √( 1 / (3919654 − 22 × 416.5²) ) = 0.0079
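The arithmetic in this worked example can be checked directly from the summary statistics; this sketch uses only the quantities given above:

```python
import math

# Summary statistics from the worked example
sum_xy = 830102.0
T = 22
x_bar, y_bar = 416.5, 86.65
sum_x2 = 3919654.0
rss = 130.6

sxx = sum_x2 - T * x_bar ** 2                 # sum of squared deviations of x
beta_hat = (sum_xy - T * x_bar * y_bar) / sxx
alpha_hat = y_bar - beta_hat * x_bar
s = math.sqrt(rss / (T - 2))                  # standard error of the regression

se_alpha = s * math.sqrt(sum_x2 / (T * sxx))
se_beta = s * math.sqrt(1 / sxx)

print(round(beta_hat, 2), round(s, 2), round(se_alpha, 2), round(se_beta, 4))
```

Note that the slide's 3.35 and 0.0079 use s rounded to 2.55 before the final step; carrying full precision through gives about 3.36 and 0.0080, a reminder to round only at the end.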
• var(α̂) and var(β̂) are unknown. Thus, their sample counterparts, the calculated standard errors of the coefficient estimates se(α̂) and se(β̂), are used:

(α̂ − α) / se(α̂) ~ t_{T−2}  and  (β̂ − β) / se(β̂) ~ t_{T−2}
21. A Note on the t and the Normal Distribution
• The normal distribution is "bell shaped": it is symmetric around the mean
(1) Estimate α̂, β̂ and se(α̂), se(β̂)
(2) State the null and alternative hypotheses, H0: β = β* versus H1: β ≠ β*, and compute the test statistic:

test statistic = (β̂ − β*) / se(β̂)

[Figure: t distribution for a one-sided test, with the 5% rejection region in the lower tail to the left of −(t critical value), the 95% non-rejection region, and β* = 0]
22. The Test of Significance Approach
Rejection region for a one-sided hypothesis test of H0: β = β* against H1: β > β*:

[Figure: t distribution with the 5% rejection region in the upper tail to the right of the t critical value, the 95% non-rejection region, and β* = 0]
22. The Test of Significance Approach
test statistic = (β̂ − β*) / se(β̂)

• If: (1) se(β̂) is small, the test statistic is large (we are confident in the estimate)
      (2) β̂ is far away from β*, the test statistic is large
• Degrees of freedom: the number of pieces of information beyond the minimum required
• If two parameters are estimated, a minimum of two data points is needed, leaving T − 2 degrees of freedom
• As the degrees of freedom rise, the critical value falls, and one can be more confident of the results of the hypothesis test
• α = significance level = size of the test
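The claim that more degrees of freedom mean smaller critical values can be checked numerically; this sketch assumes SciPy is available:

```python
from scipy import stats

# 5% two-sided critical values: 2.5% in each tail
crit = {df: stats.t.ppf(0.975, df) for df in (5, 20, 120)}

# Critical values shrink toward the normal value 1.96 as df grows
print({df: round(c, 3) for df, c in crit.items()})
# {5: 2.571, 20: 2.086, 120: 1.98}
```

With only a few degrees of freedom the t distribution has fat tails, so a much larger test statistic is needed to reject.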
23. The Confidence Interval Approach to Hypothesis Testing
• Say β̂ = 0.93 and the 95% confidence interval for β is (0.77, 1.09)
• This means that over many repeated samples, 95% of the intervals constructed in this way will contain the true but unknown value of β
• Confidence intervals are always for true but unknown population parameters. Given a significance level α, we have a (1 − α) confidence interval; 1 − α is the level of confidence:
  α = 1%  → 99% CI
  α = 5%  → 95% CI
  α = 10% → 90% CI
23. The Confidence Interval Approach to Hypothesis Testing
• The confidence interval and test of significance approaches always give the same conclusion for a given α
• H0: β = β* vs H1: β ≠ β* — CIs are always used for two-sided hypotheses
• The null hypothesis is not rejected if the test statistic lies in the non-rejection region (equivalently, if β* lies inside the confidence interval)
23. The Confidence Interval Approach to Hypothesis Testing
• The confidence interval contains the same information (merely rearranged) as the test of significance approach:

−t_crit ≤ (β̂ − β*) / se(β̂) ≤ +t_crit

β̂ − t_crit · se(β̂) ≤ β* ≤ β̂ + t_crit · se(β̂)

[Figure: t_{20} distribution with 5% two-sided critical values of ±2.086]
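The rearrangement above turns an estimate and its standard error into a confidence interval. A sketch using the slide's β̂ = 0.93 and t_crit = 2.086 (20 degrees of freedom, 5% two-sided); the standard error 0.0767 is a hypothetical value chosen so the interval matches the slide's (0.77, 1.09):

```python
beta_hat = 0.93
se_beta = 0.0767      # hypothetical: the slide does not report se(beta_hat)
t_crit = 2.086        # t_20 critical value, 5% two-sided

# CI: beta_hat +/- t_crit * se(beta_hat)
lower = beta_hat - t_crit * se_beta
upper = beta_hat + t_crit * se_beta
print(round(lower, 2), round(upper, 2))  # 0.77 1.09
```

Any β* inside (0.77, 1.09) would not be rejected at the 5% level; any β* outside it would be.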
26. Hypothesis Testing: Example
What if, in the example before, α = 10%?
• t_{20,10%} = ±1.725
• Given that the test statistic = 1.917 > 1.725, the null is rejected
Note:
• The test of significance approach is better if the size of the test α changes
• The CI approach is useful for testing many hypotheses at a given α
26. The Exact Level of Significance Approach
• Commonly known as the p-value approach
• It gives the marginal significance level at which one would be indifferent between rejecting and not rejecting the null hypothesis
• If the test statistic is large in absolute value, the p-value will be small, and vice versa
26. The Exact Level of Significance Approach
• Informally, the p-value is often referred to as the probability of being wrong when the null hypothesis is rejected
• A p-value of 0.05 or less typically leads the researcher to reject the null hypothesis
• The p-value is also termed the plausibility of the null hypothesis
• Decision rule: if p-value < α, reject H0
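For the running example (test statistic 1.917, 20 degrees of freedom), the exact significance level can be computed directly, assuming SciPy is available; it lands between 5% and 10%, which is exactly why the null was rejected at α = 10% but not at α = 5%.

```python
from scipy import stats

test_stat = 1.917
df = 20

# Two-sided p-value: probability of a |t| at least this large under H0
p_value = 2 * (1 - stats.t.cdf(abs(test_stat), df))
print(round(p_value, 3))  # about 0.07: reject at the 10% level, not at 5%
```

The decision rule "reject if p-value < α" gives the same answer as comparing the test statistic with critical values, with no table lookup.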
27. Terminology: Significance and t-ratio
• If we reject H0 at α = 5%, we say the result of the test, or the estimate of the parameter, is statistically significant
• For the most common test, H0: β = 0 against H1: β ≠ 0, the test statistic is β̂ / se(β̂)
• This ratio of the coefficient to its standard error is known as the t-ratio
28. The Errors That We Can Make Using Hypothesis Tests
• We usually reject 𝐻0 if the test statistic is statistically significant at a
chosen significance level
29. Frequency Distribution of t-ratios of Mutual Fund Alphas (gross of transactions costs)
29. Frequency Distribution of t-ratios of Mutual Fund Alphas (net of transactions costs)
30. Workspace Application
• Obtaining alpha and beta for a mutual fund/ETF
• Obtaining other quantitative measures for a mutual fund/ETF