TMI04.2 Linear Regression

Linear regression

with gradient descent

Ingmar Schuster
Patrick Jähnichen
using slides by Andrew Ng

Institut für Informatik


This lecture covers

● Linear Regression
● Hypothesis formulation, hypothesis space
● Optimizing Cost with Gradient Descent
● Using multiple input features with Linear Regression
● Feature Scaling
● Nonlinear Regression
● Optimizing Cost using derivatives

Linear regression w. gradient descent 2


Linear Regression



Price for buying a flat in Berlin

● Supervised learning problem
● Expected answer available for each example in the data
● Regression problem
● Prediction of a continuous output
Training data of flat prices

● m is the number of training examples
● x is the input (predictor) variable ("features" in ML-speak)
● y is the output (response) variable
● Notation

Square meters  Price in 1000€
73             174
146            367
38             69
124            257
...            ...


Learning procedure

● Training data is fed to a Learning Algorithm, which produces a
  hypothesis h (a mapping between input and output)
● The hypothesis maps the size of a flat to an estimated price
● Hypothesis parameters: linear regression with
  one input variable (univariate)
● How to choose the parameters?


Optimization objective

● Purpose of the learning algorithm is expressed in an
  optimization objective and a cost function (often called J)
  ● Fit data well
  ● Few false positives
  ● Few false negatives
  ● ...
Fitting data well: least squares cost function

● In regression we almost always want to fit the data well
  ● smallest average distance to the points in the training data
    (h(x) close to y for (x, y) in the training data)
● Cost function often named J; m denotes the number of
  training instances
● Squaring
  – makes the penalty for positive and negative deviations the same
  – penalizes large deviations more strongly
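The cost-function formula on this slide was an image and did not survive extraction. As a sketch, the standard least-squares cost for a univariate hypothesis h(x) = theta0 + theta1·x can be written as follows (function and variable names are illustrative, not from the slides):

```python
def cost(theta0, theta1, xs, ys):
    """Least-squares cost J = 1/(2m) * sum((h(x_i) - y_i)^2)."""
    m = len(xs)  # number of training instances
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

# A hypothesis that fits the data perfectly has zero cost:
print(cost(0.0, 2.0, [1, 2, 3], [2, 4, 6]))  # 0.0
```

The 1/(2m) factor is conventional: the 1/m averages over training instances, and the extra 1/2 cancels when differentiating the square.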
Optimizing Cost with Gradient Descent


Gradient Descent Outline

● Want to minimize the cost J
● Start with random parameter values
● Keep changing the parameters to reduce J
  until we end up at a minimum


3D plots and contour plots

[plot by Andrew Ng: stepwise descent towards the minimum]


Gradient descent

● Derivatives work only for few parameters
● Update each parameter using the partial derivative of the cost
● Beware: incremental update incorrect!
  (compute every update before changing any parameter)
● Steps become smaller without changing the learning rate
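One univariate gradient-descent step can be sketched as below. Computing both partial derivatives before touching either parameter is the simultaneous update; reusing an already-updated theta0 when computing theta1's step is the incorrect "incremental update" the slide warns against. Names and toy data are illustrative:

```python
def gradient_step(theta0, theta1, xs, ys, alpha):
    """One simultaneous gradient-descent update for h(x) = theta0 + theta1*x."""
    m = len(xs)
    # Both gradients are computed from the SAME (old) parameters.
    errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
    grad0 = sum(errors) / m
    grad1 = sum(e * x for e, x in zip(errors, xs)) / m
    return theta0 - alpha * grad0, theta1 - alpha * grad1

theta0, theta1 = 0.0, 0.0
for _ in range(2000):
    theta0, theta1 = gradient_step(theta0, theta1, [1, 2, 3], [2, 4, 6], 0.1)
# theta1 approaches 2 and theta0 approaches 0, recovering y = 2x
```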
Learning Rate considerations

● A small learning rate leads to slow convergence
● An overly large learning rate may prevent convergence,
  or even cause divergence
● Often


Checking convergence

● Gradient descent works correctly if J decreases
  with every step
● Possible convergence criterion: converged if J
  decreases by less than a small constant
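The convergence criterion above can be sketched as a stopping rule. Everything here is illustrative (a hypothetical `step` callable performs one gradient-descent update):

```python
def run_until_converged(step, theta, cost, eps=1e-9, max_iters=100_000):
    """Stop when the cost decreases by less than a small constant eps."""
    prev = cost(theta)
    for _ in range(max_iters):
        theta = step(theta)
        cur = cost(theta)
        if prev - cur < eps:   # J barely decreased: declare convergence
            break
        prev = cur
    return theta

# Toy example: minimize J(t) = (t - 3)^2 with learning rate 0.1.
step = lambda t: t - 0.1 * 2 * (t - 3)
theta = run_until_converged(step, 0.0, lambda t: (t - 3) ** 2)
# theta ends up very close to the minimizer 3
```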


Local Minima

● Gradient descent can get stuck at local minima
  (e.g. when J is not the squared-error cost of a
  regression with only one variable)
● Remedy: random restart with different parameter(s)


Variants of Gradient Descent

Using multiple input features



Multiple features

Square meters  Bedrooms  Floors  Age of building (years)  Price in 1000€
x1             x2        x3      x4                       y
200            5         1       45                       460
131            3         2       40                       232
142            3         2       30                       315
756            2         1       36                       178
…              …         …       …                        …

● Notation


Hypothesis representation

● More compact

with definition

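The compact representation referred to above was an image on the slide; under the usual convention it is presumably:

```latex
h_\theta(x) = \theta_0 x_0 + \theta_1 x_1 + \dots + \theta_n x_n
            = \theta^{\mathsf{T}} x,
\qquad \text{with the definition } x_0 = 1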


Gradient descent for multiple variables

● Generalized cost function


● Generalized gradient descent

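The generalized cost function and update rule were images on the slide; reconstructed in the standard form (assuming the squared-error cost), they read:

```latex
J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\bigl(h_\theta(x^{(i)}) - y^{(i)}\bigr)^2,
\qquad
\theta_j := \theta_j - \alpha\,\frac{\partial}{\partial\theta_j}J(\theta)
\quad \text{(simultaneously for all } j\text{)}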


Partial derivative of cost function for multiple variables

● Calculating the partial derivative

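The calculation itself was an image; for the squared-error cost J the partial derivative evaluates to (reconstructed, with the same notation as above):

```latex
\frac{\partial}{\partial\theta_j}J(\theta)
  = \frac{1}{m}\sum_{i=1}^{m}\bigl(h_\theta(x^{(i)}) - y^{(i)}\bigr)\,x_j^{(i)}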


Gradient descent for multiple variables

● Simplified gradient descent

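For multiple variables the update is most naturally written in vector form; a hedged NumPy sketch (function name and toy data are mine, not from the slides), where the matrix product computes all partial derivatives at once and therefore updates every parameter simultaneously:

```python
import numpy as np

def gradient_descent(X, y, alpha=0.1, iters=1000):
    """Batch gradient descent for h(x) = X @ theta; X includes a bias column of ones."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iters):
        grad = X.T @ (X @ theta - y) / m   # all partial derivatives at once
        theta -= alpha * grad              # simultaneous update of every theta_j
    return theta

X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])  # bias column + one feature
y = np.array([1.0, 3.0, 5.0])                        # data generated by y = 1 + 2x
theta = gradient_descent(X, y)
```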


Conversion considerations for multiple variables

● With multiple variables, direct comparison of the variance in the
  data is lost (scales can vary strongly):

  Square meters  30 - 400
  Bedrooms       1 - 10
  Price          80 000 - 2 000 000

● Gradient descent converges faster for features on a similar scale


Feature Scaling



Feature scaling

● Different approaches for converting features to a comparable scale
● Min-max scaling makes all data fall into the range [0, 1]
  (for a single data point of feature j)
● Z-score conversion
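Both approaches can be sketched in a few lines (function names are illustrative; the z-score variant here uses the population standard deviation):

```python
def min_max_scale(values):
    """Map values linearly into [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def z_score(values):
    """Center on the mean (mu) and divide by the standard deviation (sigma)."""
    mu = sum(values) / len(values)
    sigma = (sum((v - mu) ** 2 for v in values) / len(values)) ** 0.5
    return [(v - mu) / sigma for v in values]

sizes = [73, 146, 38, 124]       # square meters from the training data
print(min_max_scale(sizes))      # every value now lies in [0, 1]
print(z_score(sizes))            # centered on 0, mostly within [-1, 1]
```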


Z-score conversion

● Center the data on 0
● Scale the data so the majority falls into the range [-1, 1]
  – mean / empirical expected value (mu)
  – empirical standard deviation (sigma)
● Z-score conversion of a single data point for feature j


Visualizing standard deviation



Nonlinear Regression (by cheap trickery)


Nonlinear Regression Problems



Nonlinear Regression Problems (linear approximation)



Nonlinear Regression Problems (nonlinear hypothesis)



Nonlinear Regression with cheap trickery

● Linear regression can be used for nonlinear problems
● Choose a nonlinear hypothesis space
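The "cheap trickery" can be sketched as follows: build polynomial columns from a single input and run ordinary linear least squares on them. The hypothesis is nonlinear in x but still linear in the parameters. Function names and data are illustrative:

```python
import numpy as np

def polynomial_design(x, degree):
    """Columns [1, x, x^2, ..., x^degree]: a nonlinear hypothesis space,
    fitted by plain *linear* regression in the parameters theta."""
    return np.column_stack([x ** d for d in range(degree + 1)])

x = np.array([0.0, 1.0, 2.0, 3.0])
y = x ** 2                                      # a clearly nonlinear target
X = polynomial_design(x, 2)
theta, *_ = np.linalg.lstsq(X, y, rcond=None)   # ordinary least squares
# theta recovers [0, 0, 1], i.e. h(x) = x^2
```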


Optimizing cost using derivatives


Comparison Gradient Descent vs. Setting derivative = 0

● Instead of gradient descent, solve

  ∂J/∂θ_i = 0  for all i
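For the squared-error cost, setting the derivative to zero leads to the closed-form normal equation. A hedged NumPy sketch (the helper name is mine, not from the slides):

```python
import numpy as np

def normal_equation(X, y):
    """Solve dJ/dtheta = 0 directly: theta = (X^T X)^{-1} X^T y.
    No learning rate and no iterations, but it scales poorly
    with the number of features (roughly cubic)."""
    return np.linalg.solve(X.T @ X, X.T @ y)

X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])  # bias column + one feature
y = np.array([1.0, 3.0, 5.0])                        # data generated by y = 1 + 2x
theta = normal_equation(X, y)                        # recovers [1, 2]
```

Using `np.linalg.solve` rather than explicitly inverting X^T X is the standard numerically safer choice.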


Comparison Gradient Descent vs. Setting derivative = 0

Gradient Descent
● Need to choose the learning rate
● Needs many iterations, random restarts etc.
● Works well for many features

Setting the derivative = 0
● No need to choose a learning rate
● No iterations
● Slow for many features


This lecture covers

● Linear Regression
● Hypothesis formulation, hypothesis space
● Optimizing Cost with Gradient Descent
● Using multiple input features with Linear Regression
● Feature Scaling
● Nonlinear Regression
● Optimizing Cost using derivatives


Pictures

● Some public domain plots from en.wikipedia.org and de.wikipedia.org
