Linear Regression
- θ = (w, b): parameters
- w: weights
- b: bias
b can be absorbed into w by defining w = [b, w1, ..., wD] and x = [1, x1, ..., xD], so that

f(x; θ) = wᵀx

x can also be replaced by a non-linear function of the inputs φ(x), called a basis expansion:

f(x; θ) = wᵀφ(x)
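A minimal NumPy sketch of this idea (the function name phi and the polynomial degree are illustrative assumptions, not from the slides):

```python
import numpy as np

def phi(x, degree=3):
    """Polynomial basis expansion of a scalar input:
    phi(x) = [1, x, x^2, ..., x^degree].
    The leading 1 absorbs the bias b into the weight vector w."""
    return np.array([x ** d for d in range(degree + 1)])

w = np.array([0.5, -1.0, 2.0, 0.3])   # example weights; w[0] plays the role of b
x = 1.5
y_hat = w @ phi(x)                    # f(x; θ) = wᵀ φ(x)
```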
The general form of the linear regression model, written for all observations at once:

ŷ = Xw + b

- N: number of observations
- D: number of features
- ŷ ∈ ℝ^N: predictions
- X ∈ ℝ^(N×D): inputs (design matrix)
- w ∈ ℝ^D: weights
- b ∈ ℝ: bias

When the bias b is absorbed into w (and a column of ones is prepended to X):

ŷ = Xw
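A short sketch of this matrix form with the bias absorbed (all data here is randomly generated purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 5, 3
X = rng.normal(size=(N, D))            # design matrix: one row per observation
X = np.hstack([np.ones((N, 1)), X])    # prepend a column of ones to absorb b
w = rng.normal(size=D + 1)             # w = [b, w1, ..., wD]
y_hat = X @ w                          # ŷ = Xw, shape (N,)
```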
Loss function - Least squares
Goal:
Find the parameters w that minimize the residual sum of squares (loss)
RSS(w) = (1/2) Σ_{i=1}^{N} (yᵢ − f(xᵢ))² = (1/2) Σ_{i=1}^{N} (yᵢ − wᵀxᵢ)²
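In code, the RSS is a one-liner (a sketch assuming a design matrix X with the bias column and targets y as above):

```python
import numpy as np

def rss(w, X, y):
    """RSS(w) = 1/2 * sum_i (y_i - wᵀx_i)^2 = 1/2 * ||Xw - y||^2."""
    r = X @ w - y
    return 0.5 * (r @ r)
```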
Per observation, the model can also be written as

y = wᵀx + ε

where ε is the residual error between the predictions and the true response (unmodeled effects/random noise). We assume ε has a Gaussian distribution, ε ∼ N(0, σ²), so the parameters become θ = (w, σ²).
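A sketch of sampling synthetic data from this noise model (the true weights and σ below are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
N, D, sigma = 100, 2, 0.5
w_true = np.array([1.0, -2.0, 0.7])                       # [b, w1, w2], bias absorbed
X = np.hstack([np.ones((N, 1)), rng.normal(size=(N, D))])
y = X @ w_true + rng.normal(scale=sigma, size=N)          # y = Xw + ε, ε ~ N(0, σ²)
```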
We estimate the parameters using Maximum Likelihood Estimation: we look for the parameters that maximize the likelihood ∏_{i=1}^{N} p(yᵢ | xᵢ; θ). It is easier to minimize the negative log likelihood (NLL)

NLL(θ) = − Σ_{i=1}^{N} log p(yᵢ | xᵢ; θ)
It can be shown that minimizing the NLL is equivalent to minimizing the RSS.
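Sketch of why: under the Gaussian noise assumption, p(yᵢ | xᵢ; θ) = N(yᵢ | wᵀxᵢ, σ²), so

NLL(θ) = (N/2) log(2πσ²) + (1/(2σ²)) Σ_{i=1}^{N} (yᵢ − wᵀxᵢ)² = (N/2) log(2πσ²) + (1/σ²) RSS(w)

The first term does not depend on w, so the w that minimizes the NLL is exactly the w that minimizes RSS(w).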
Ordinary Least Squares
Our loss function is
J(w) = RSS(w) = (1/2) Σ_{i=1}^{N} (yᵢ − wᵀxᵢ)² = (1/2) ||Xw − y||₂² = (1/2) (Xw − y)ᵀ(Xw − y)
Setting the gradient ∇J(w) = Xᵀ(Xw − y) to zero gives the normal equations XᵀXw = Xᵀy, whose solution is ŵ = (XᵀX)⁻¹Xᵀy. The inverse should not be computed directly, since XᵀX can be singular or ill-conditioned. There are better alternatives:
- SVD
- QR decomposition
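A sketch comparing these approaches in NumPy (the data is synthetic; np.linalg.lstsq is SVD-based, and the explicit normal-equations solve is shown only for contrast):

```python
import numpy as np

rng = np.random.default_rng(2)
X = np.hstack([np.ones((50, 1)), rng.normal(size=(50, 2))])
y = X @ np.array([1.0, -2.0, 0.7]) + rng.normal(scale=0.1, size=50)

# SVD-based least squares
w_svd, *_ = np.linalg.lstsq(X, y, rcond=None)

# QR decomposition: X = QR, then solve the triangular system R w = Qᵀ y
Q, R = np.linalg.qr(X)
w_qr = np.linalg.solve(R, Q.T @ y)

# Normal equations with an explicit solve of XᵀX w = Xᵀ y (avoid when XᵀX is ill-conditioned)
w_ne = np.linalg.solve(X.T @ X, X.T @ y)

print(np.allclose(w_svd, w_qr), np.allclose(w_svd, w_ne))
```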
Explore further
- Polynomial regression (other basis expansions)
- Weighted linear regression
- Bayesian linear regression