
Bedrocks of Quantitative Finance: The Linear Regression

Riley Dunnaway (03/28/24)

Abstract
An exploration of linear regression and several methods of deriving its coefficients: ordinary least squares, the method of moments, and the method of maximum likelihood. These estimation methods are staples in the tool belt of every quantitative analyst and warrant deep mathematical understanding.

1 Motivation for Linear Regression


Suppose we have two data sets corresponding to variables x and y we suspect are linearly
related. The general equation for a line is given by

y = α + βx (1)

The goal of linear regression is to fit a line of form (1) to our data. In the real world this line will never fit our data perfectly, so we must introduce an error variable, ϵ, into equation (1):

y = α + βx + ϵ    (2)

The methods we will discuss focus on minimizing this error term, yielding the most accurate linear representation of our data.
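As a concrete illustration, here is a minimal sketch that generates synthetic data according to equation (2); the parameter values, sample size, and noise level are assumptions chosen purely for the example.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Assumed example values for the "true" line: alpha = 1.5, beta = 0.8
alpha, beta = 1.5, 0.8

x = rng.uniform(0.0, 10.0, size=200)    # observed x values
eps = rng.normal(0.0, 1.0, size=200)    # error term with mean 0
y = alpha + beta * x + eps              # equation (2)
```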

2 Ordinary Least Squares


Note that the error ϵ̂i for each observation xi is equal to the difference between the actual data value yi and the model output ŷi. One method involves minimizing the sum of all ϵ̂i². Notice we square the errors so every term is positive and negative and positive errors do not cancel out.
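A tiny numeric illustration of why the squaring matters: two residuals of +2 and −2 sum to 0, which would falsely suggest a perfect fit, while their squares do not cancel.

```python
residuals = [2.0, -2.0]
print(sum(residuals))                   # 0.0 -- signed errors cancel, hiding the misfit
print(sum(e ** 2 for e in residuals))   # 8.0 -- squared errors accumulate
```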
The residual sum of squares is given by:
\sum_{i=1}^{N} \hat{\epsilon}_i^2 = \sum_{i=1}^{N} (y_i - \hat{y}_i)^2    (3)

Where N is equal to the number of data observations.


\sum_{i=1}^{N} \hat{\epsilon}_i^2 = \sum_{i=1}^{N} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{N} (y_i - \hat{\alpha} - \hat{\beta} x_i)^2    (4)

To minimize this residual sum of squares, we take the first derivatives with respect to α̂ and β̂ and set them equal to 0; since the sum of squares is convex, the resulting critical point is the minimum.

\frac{\partial}{\partial \hat{\alpha}} \sum_{i=1}^{N} (y_i - \hat{\alpha} - \hat{\beta} x_i)^2 = -2 \sum_{i=1}^{N} (y_i - \hat{\alpha} - \hat{\beta} x_i) = 0    (5)

\frac{\partial}{\partial \hat{\beta}} \sum_{i=1}^{N} (y_i - \hat{\alpha} - \hat{\beta} x_i)^2 = -2 \sum_{i=1}^{N} (y_i - \hat{\alpha} - \hat{\beta} x_i) x_i = 0    (6)

From equation (5):


-2 \sum_{i=1}^{N} (y_i - \hat{\alpha} - \hat{\beta} x_i) = 0 \;\Longrightarrow\; \sum_{i=1}^{N} y_i - N\hat{\alpha} - \hat{\beta} \sum_{i=1}^{N} x_i = N\bar{y} - N\hat{\alpha} - N\hat{\beta}\bar{x} = 0    (7)

Where the bar notation indicates the mean of all y or x data values. Dividing equation (7) by N and rearranging shows α̂ = ȳ − β̂x̄; substituting this into equation (6):
\sum_{i=1}^{N} (y_i - \hat{\alpha} - \hat{\beta} x_i) x_i = \sum_{i=1}^{N} (y_i - \bar{y} + \hat{\beta}\bar{x} - \hat{\beta} x_i) x_i = \sum_{i=1}^{N} x_i y_i - N\bar{x}\bar{y} + \hat{\beta} N \bar{x}^2 - \hat{\beta} \sum_{i=1}^{N} x_i^2 = 0    (8)

Solving equation (8) for β̂ gives:


\hat{\beta} = \frac{\sum_{i=1}^{N} x_i y_i - N\bar{x}\bar{y}}{\sum_{i=1}^{N} x_i^2 - N\bar{x}^2}    (9)

And there we have it! Equation (9), together with α̂ = ȳ − β̂x̄ from equation (7), gives us the values of β̂ and α̂, i.e. the equation of the line that minimizes the squared error, in terms of the means of the x and y data sets.
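As a sketch of how these closed-form estimates look in practice, the helper below (ols_fit is a hypothetical name chosen for illustration) computes β̂ from equation (9) and α̂ from the rearranged equation (7), using, say, the synthetic x and y generated earlier:

```python
import numpy as np

def ols_fit(x, y):
    """Closed-form OLS: equation (9) for beta-hat, then alpha-hat = ybar - beta-hat * xbar."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    x_bar, y_bar = x.mean(), y.mean()
    # Equation (9)
    beta_hat = (np.sum(x * y) - n * x_bar * y_bar) / (np.sum(x ** 2) - n * x_bar ** 2)
    # Rearranged equation (7)
    alpha_hat = y_bar - beta_hat * x_bar
    return alpha_hat, beta_hat
```

The result should agree with a standard library routine such as np.polyfit(x, y, 1), which fits the same least-squares line.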

3 Method of Moments
This same result can be achieved by using moments with one additional condition. In statistics a moment is a quantitative summary of a distribution; the first few moments correspond to the mean, variance, skewness, and kurtosis. In this case, we will be working with means.
Return to the problem set up in Section 1:

y = α + βx + ϵ (10)

Our assumption for the method of moments will be that the error is normally distributed
with a mean of 0, i.e.:
ϵ ∼ N(0, σ²)    (11)
Given this assumption, the expected value of each ϵi is 0:

E(ϵi) = 0    (12)

Rearranging equation (10), we see that y − α − βx = ϵ, so by taking the expected value and applying equation (12) we see

E(yi − α − βxi) = E(ϵi) = 0    (13)
Since ϵ is assumed to have mean 0 independently of x (the additional condition mentioned above), we also see

E(ϵi xi) = 0    (14)
Then substituting ϵi = yi − α − βxi into equation (14),

E((yi − α − βxi)xi) = 0    (15)
Lastly, notice that since the error has mean 0, the expected value of ϵ² is, by definition, the variance, i.e. the standard deviation squared:

E(ϵi²) = σ²    (16)
Now by rewriting the expected values in equations (13), (15), and (16) as means over our data, we get the following. Notice the introduction of hat notation to distinguish the coefficients estimated from sample means from the population parameters.
\frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{\alpha} - \hat{\beta} x_i) = 0

\frac{1}{N} \sum_{i=1}^{N} x_i (y_i - \hat{\alpha} - \hat{\beta} x_i) = 0    (17)

\frac{1}{N} \sum_{i=1}^{N} \hat{\epsilon}_i^2 = \hat{\sigma}^2

Notice the similarity between the first two conditions in (17) and equations (5) and (6) derived in the least squares method. By following the results of the previous section, we see that the method of moments yields the same coefficients as ordinary least squares, while the variance estimator σ̂² is biased (it divides by N rather than N − 1).
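As a sketch, the first two sample moment conditions in (17) are linear in α̂ and β̂ and can be solved as a 2×2 system; mom_fit is a hypothetical name used here for illustration:

```python
import numpy as np

def mom_fit(x, y):
    """Solve the sample moment conditions in (17) for alpha-hat, beta-hat, sigma2-hat."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # First condition:  alpha + beta * mean(x)            = mean(y)
    # Second condition: alpha * mean(x) + beta * mean(x^2) = mean(x * y)
    A = np.array([[1.0,      x.mean()],
                  [x.mean(), np.mean(x ** 2)]])
    b = np.array([y.mean(), np.mean(x * y)])
    alpha_hat, beta_hat = np.linalg.solve(A, b)
    # Third condition: the (biased) variance estimate, dividing by N
    resid = y - alpha_hat - beta_hat * x
    sigma2_hat = np.mean(resid ** 2)
    return alpha_hat, beta_hat, sigma2_hat
```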

4 Method of Maximum Likelihood


One final method for determining linear regression coefficients uses the likelihood function, or joint density function, from statistics. Given the same set of assumptions used in Section 3, we get the likelihood function
L = \prod_{i=1}^{N} \frac{1}{(2\pi)^{1/2}\sigma} e^{-\frac{1}{2\sigma^2}\epsilon_i^2}
  = \frac{1}{(2\pi)^{N/2}\sigma^N} e^{-\frac{1}{2\sigma^2}\sum_{i=1}^{N}\epsilon_i^2}
  = \frac{1}{(2\pi)^{N/2}\sigma^N} e^{-\frac{1}{2\sigma^2}\sum_{i=1}^{N}(y_i - \alpha - \beta x_i)^2}    (18)
To maximize this likelihood function, we want the sum of squares in the exponent to be minimized. To do this we take the partial derivatives of the exponent with respect to α and β and set them equal to 0.
\frac{\partial}{\partial \alpha} \sum_{i=1}^{N} (y_i - \alpha - \beta x_i)^2 = 0

\frac{\partial}{\partial \beta} \sum_{i=1}^{N} (y_i - \alpha - \beta x_i)^2 = 0    (19)

Notice again the similarity to the least squares equations (5) and (6). By solving (19), we get our linear regression coefficients.
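For error distributions where no closed form exists, the likelihood can also be maximized numerically. Below is a minimal sketch (mle_fit is a hypothetical name) that minimizes the negative logarithm of equation (18) with a general-purpose optimizer; for normally distributed errors it should recover the least-squares estimates:

```python
import numpy as np
from scipy.optimize import minimize

def mle_fit(x, y):
    """Maximize the likelihood (18) by minimizing the negative log-likelihood."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)

    def neg_log_likelihood(params):
        alpha, beta, log_sigma = params      # optimize log(sigma) so sigma stays positive
        sigma2 = np.exp(2.0 * log_sigma)
        resid = y - alpha - beta * x
        # -log L = (N/2) log(2*pi*sigma^2) + sum(resid^2) / (2*sigma^2)
        return 0.5 * n * np.log(2.0 * np.pi * sigma2) + np.sum(resid ** 2) / (2.0 * sigma2)

    result = minimize(neg_log_likelihood, x0=np.zeros(3))
    alpha_hat, beta_hat, log_sigma_hat = result.x
    return alpha_hat, beta_hat, np.exp(log_sigma_hat)
```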

5 Comparisons and Conclusion


Given the similarities that appear in the above derivations, one may wonder about the differences between the three methods. Firstly, it is important to notice that the method of moments and the method of maximum likelihood are only equivalent to the least squares estimation under the normality assumption. When working with non-normal error distributions, one cannot assume equivalence between the three derivations.

So when is one method more appropriate than another? Least squares is viewed as accessible and easily applied in most cases, especially with normally distributed errors. However, the method of moments and maximum likelihood may be more desirable with alternative error distributions. While the method of moments is also fairly accessible, the method of maximum likelihood is generally viewed as the most desirable when computation allows for it. When comparing maximum likelihood to least squares, one must consider the end goal before choosing a regression method: one returns the line with the least squared error, while the other returns the most statistically likely line. Both can be useful, but one may be more appropriate depending on the distribution of the data.
