CS464 Ch9 LinearRegression
Linear Regression
Regression
• Assume the data is generated by a function that produces a
value for the outcome variable (y) from the values of the
features (x), plus some error: y = f(x) + ε
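As a rough sketch of this generative assumption (with made-up coefficients and noise level), data for a single feature could be simulated as:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical generating function: y = 2 + 3*x plus Gaussian error
n = 100
x = rng.uniform(0, 10, size=n)
error = rng.normal(0, 1, size=n)   # the error term
y = 2.0 + 3.0 * x + error          # outcome = f(x) + error
```

Regression then tries to recover f from the observed (x, y) pairs.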
Regression
[Figure: scatter plots of Target Y vs. Feature X]
• Linear regression model: ŷ = w0 + w1⋅x1 + … + wd⋅xd (predicted value)
• Loss on n training examples: SSL = Σi (y(i) − ŷ(i))²,
where y(i) is the actual value and ŷ(i) the predicted value
Ordinary Least Squares (OLS)
Loss Function on Two Features
Least Squares Linear Fit to Data
• Most popular estimation method is least squares:
– Determine linear coefficients w0, w that minimize the sum
of squared loss (SSL)
– Use standard (multivariate) differential calculus:
• differentiate SSL with respect to w0, w
• set each partial derivative to zero
• solve for w0, w
Minimize the Squared Loss
• Minimize the empirical squared loss:
SSL(w) = Σi (y(i) − w0 − wᵀx(i))²
Direct Minimization
• Minimize the empirical squared loss:
SSL(w) = Σi (y(i) − w0 − wᵀx(i))²
• Setting the gradient to zero gives the closed-form solution
w = (XᵀX)⁻¹Xᵀy
• This is also the maximum likelihood
setting of the parameters under Gaussian noise,
i.e. it minimizes the mean squared prediction error.
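A minimal sketch of this direct (closed-form) solution in NumPy, on synthetic data with made-up weights (note: `solve` is used rather than an explicit inverse for numerical stability):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))                 # 200 examples, 3 features
true_w = np.array([4.0, 1.0, -2.0, 0.5])      # intercept + 3 weights (assumed)
y = true_w[0] + X @ true_w[1:] + rng.normal(0, 0.1, 200)

Xb = np.hstack([np.ones((200, 1)), X])        # prepend a bias column for w0
# Solve the normal equations (X^T X) w = X^T y
w = np.linalg.solve(Xb.T @ Xb, Xb.T @ y)
```

The estimated `w` should be close to `true_w`, up to the noise level.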
Numerical Solution
• Matrix inversion is computationally very expensive
– Θ(n³) for n features
• Alternative: gradient descent
– Repeat w ← w − η⋅∇SSL(w) until a stopping
condition is met (e.g. the change in loss falls below a threshold)
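A sketch of this iterative solution, with an assumed learning rate and stopping threshold (both are tuning choices, not prescribed values):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 2))
y = 1.0 + X @ np.array([2.0, -1.0]) + rng.normal(0, 0.05, 100)
Xb = np.hstack([np.ones((100, 1)), X])   # bias column for w0

w = np.zeros(3)
eta = 0.05                                # learning rate (assumed)
prev_loss = np.inf
for step in range(10000):
    resid = Xb @ w - y
    loss = (resid ** 2).mean()
    if prev_loss - loss < 1e-10:          # stopping condition: loss change tiny
        break
    prev_loss = loss
    grad = 2 * Xb.T @ resid / len(y)      # gradient of the mean squared loss
    w -= eta * grad
```

This avoids forming or inverting XᵀX, at the cost of many cheap iterations.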
Comments on gradient descent
Variations on gradient descent
• Batch Gradient Descent
– Update weights after computing the error over all training
examples (one epoch)
– Pros: Stable convergence, computationally efficient
– Cons: Might get stuck in a local minimum
• Stochastic Gradient Descent
– Recalculate the weights after each sample
– Pros: Might escape local minima
– Cons: Error jumps around, computationally expensive
• Mini-batch Gradient Descent
– Hybrid approach: update weights after each small batch of examples
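All three variants can be expressed with one loop parameterized by batch size; this is a sketch on synthetic data, with assumed learning rate and epoch count:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(256, 2))
y = 0.5 + X @ np.array([1.5, -2.0]) + rng.normal(0, 0.1, 256)
Xb = np.hstack([np.ones((256, 1)), X])

def gradient_descent(batch_size, epochs=300, eta=0.02):
    """batch_size=1 is stochastic GD, batch_size=len(y) is batch GD,
    anything in between is mini-batch GD."""
    w = np.zeros(3)
    n = len(y)
    for _ in range(epochs):
        order = rng.permutation(n)            # shuffle examples each epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            resid = Xb[idx] @ w - y[idx]
            w -= eta * 2 * Xb[idx].T @ resid / len(idx)
    return w

w_batch = gradient_descent(batch_size=256)    # one update per epoch
w_mini = gradient_descent(batch_size=32)      # several noisier updates per epoch
```

Both should land near the same solution; the mini-batch path just gets there with noisier steps.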
Extending Application of Linear Regression
• The inputs X for linear regression can be:
– Original quantitative inputs
– Transformation of quantitative inputs, e.g. log, exp,
square root, square, etc.
– Polynomial transformation
• Example: y = w0 + w1⋅x + w2⋅x² + w3⋅x³
– Dummy coding of categorical inputs
• Binary variable for each value of the categorical variable
– Interactions between variables
• Example: x3 = x1 ⋅ x2 (this is what statisticians
call an interaction)
• This allows use of linear regression techniques to fit much
more complicated non-linear datasets.
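A short sketch of building such inputs by hand, with made-up raw features:

```python
import numpy as np

# Hypothetical raw inputs: two quantitative features and one categorical
x1 = np.array([1.0, 2.0, 3.0])
x2 = np.array([0.5, 1.0, 1.5])
color = np.array(["red", "green", "red"])

# Transformations of quantitative inputs
log_x1 = np.log(x1)
sq_x2 = x2 ** 2

# Interaction term: x3 = x1 * x2
x3 = x1 * x2

# Dummy coding: one binary column per value of the categorical variable
categories = np.unique(color)
dummies = (color[:, None] == categories[None, :]).astype(float)
```

All of these columns can then be stacked into the design matrix X and fed to ordinary linear regression unchanged.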
Non-linear functions
Basis Functions
Different Basis Functions
Example of fitting polynomial curve with linear model
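As a sketch of that example: a polynomial basis expansion turns a non-linear curve-fitting problem into one that is still linear in the weights (the target function and noise level here are made up):

```python
import numpy as np

rng = np.random.default_rng(4)
x = np.linspace(-1, 1, 50)
y = np.sin(np.pi * x) + rng.normal(0, 0.05, 50)   # non-linear target

# Polynomial basis functions: phi(x) = [1, x, x^2, x^3]
Phi = np.vander(x, 4, increasing=True)

# The model is linear in w, so ordinary least squares still applies
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
pred = Phi @ w
```

Swapping in other basis functions (e.g. Gaussians or splines for Phi's columns) follows the same pattern.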