TMI04.2 Linear Regression

Linear regression

with gradient descent

Ingmar Schuster
Patrick Jähnichen
using slides by Andrew Ng

Institut für Informatik


This lecture covers

● Linear Regression
● Hypothesis formulation, hypothesis space
● Optimizing Cost with Gradient Descent
● Using multiple input features with Linear Regression
● Feature Scaling
● Nonlinear Regression
● Optimizing Cost using derivatives

Linear regression w. gradient descent 2


Linear Regression



Price for buying a flat in Berlin

● Supervised learning problem
● Expected answer available for each example in the data
● Regression problem
● Prediction of a continuous output
Training data of flat prices

● m is the number of training examples
● x is the input (predictor) variable ("features" in ML-speak)
● y is the output (response) variable
● Notation

Square meters  Price in 1000€
73             174
146            367
38             69
124            257
...            ...


Learning procedure

● Training data is fed to a Learning Algorithm, which produces a
  hypothesis h (a mapping between input and output)
● The hypothesis maps the size of a flat to an estimated price
● Hypothesis parameters: linear regression with
  one input variable (univariate)
● How to choose the parameters?


Optimization objective

● Purpose of the learning algorithm is expressed in an
  optimization objective and a cost function (often called J)
  ● Fit data well
  ● Few false positives
  ● Few false negatives
  ● ...
Fitting data well: least squares cost function

● In regression we almost always want to fit the data well
  ● smallest average distance to the points in the training data
    (h(x) close to y for (x, y) in the training data)
● Cost function often named J; m denotes the number of
  training instances
● Squaring
  – makes the penalty for positive and negative deviations the same
  – penalizes large deviations more strongly
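The cost-function formula on this slide was an image and did not survive extraction. As a sketch, the standard least-squares cost for a univariate hypothesis h(x) = theta0 + theta1·x can be written as follows (function and variable names are illustrative, not from the slides):

```python
def cost(theta0, theta1, xs, ys):
    """Least-squares cost J = 1/(2m) * sum((h(x_i) - y_i)^2)."""
    m = len(xs)  # number of training instances
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

# A hypothesis that fits the data perfectly has zero cost:
print(cost(0.0, 2.0, [1, 2, 3], [2, 4, 6]))  # 0.0
```

The 1/(2m) factor is conventional: the 1/m averages over training instances, and the extra 1/2 cancels when differentiating the square.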
Optimizing Cost with Gradient Descent


Gradient Descent Outline

● Want to minimize the cost J
● Start with random parameter values
● Keep changing the parameters to reduce J
  until we end up at a minimum


3D plots and contour plots

[plot by Andrew Ng: stepwise descent towards the minimum]


Gradient descent

● Derivatives work only for few parameters
● Update each parameter using the partial derivative of the cost
● Beware: incremental update incorrect!
  (compute every update before changing any parameter)
● Steps become smaller without changing the learning rate
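One univariate gradient-descent step can be sketched as below. Computing both partial derivatives before touching either parameter is the simultaneous update; reusing an already-updated theta0 when computing theta1's step is the incorrect "incremental update" the slide warns against. Names and toy data are illustrative:

```python
def gradient_step(theta0, theta1, xs, ys, alpha):
    """One simultaneous gradient-descent update for h(x) = theta0 + theta1*x."""
    m = len(xs)
    # Both gradients are computed from the SAME (old) parameters.
    errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
    grad0 = sum(errors) / m
    grad1 = sum(e * x for e, x in zip(errors, xs)) / m
    return theta0 - alpha * grad0, theta1 - alpha * grad1

theta0, theta1 = 0.0, 0.0
for _ in range(2000):
    theta0, theta1 = gradient_step(theta0, theta1, [1, 2, 3], [2, 4, 6], 0.1)
# theta1 approaches 2 and theta0 approaches 0, recovering y = 2x
```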
Learning Rate considerations

● A small learning rate leads to slow convergence
● An overly large learning rate may prevent convergence,
  or even cause divergence
● Often


Checking convergence

● Gradient descent works correctly if J decreases
  with every step
● Possible convergence criterion: converged if J
  decreases by less than a small constant
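The convergence criterion above can be sketched as a stopping rule. Everything here is illustrative (a hypothetical `step` callable performs one gradient-descent update):

```python
def run_until_converged(step, theta, cost, eps=1e-9, max_iters=100_000):
    """Stop when the cost decreases by less than a small constant eps."""
    prev = cost(theta)
    for _ in range(max_iters):
        theta = step(theta)
        cur = cost(theta)
        if prev - cur < eps:   # J barely decreased: declare convergence
            break
        prev = cur
    return theta

# Toy example: minimize J(t) = (t - 3)^2 with learning rate 0.1.
step = lambda t: t - 0.1 * 2 * (t - 3)
theta = run_until_converged(step, 0.0, lambda t: (t - 3) ** 2)
# theta ends up very close to the minimizer 3
```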


Local Minima

● Gradient descent can get stuck at local minima
  (e.g. when J is not the squared-error cost of a
  regression with only one variable)
● Remedy: random restart with different parameter(s)


Variants of Gradient Descent

Using multiple input features



Multiple features

Square meters  Bedrooms  Floors  Age of building (years)  Price in 1000€
x1             x2        x3      x4                       y
200            5         1       45                       460
131            3         2       40                       232
142            3         2       30                       315
756            2         1       36                       178
…              …         …       …                        …

● Notation


Hypothesis representation

● More compact

with definition

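The compact representation referred to above was an image on the slide; under the usual convention it is presumably:

```latex
h_\theta(x) = \theta_0 x_0 + \theta_1 x_1 + \dots + \theta_n x_n
            = \theta^{\mathsf{T}} x,
\qquad \text{with the definition } x_0 = 1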


Gradient descent for multiple variables

● Generalized cost function


● Generalized gradient descent

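The generalized cost function and update rule were images on the slide; reconstructed in the standard form (assuming the squared-error cost), they read:

```latex
J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\bigl(h_\theta(x^{(i)}) - y^{(i)}\bigr)^2,
\qquad
\theta_j := \theta_j - \alpha\,\frac{\partial}{\partial\theta_j}J(\theta)
\quad \text{(simultaneously for all } j\text{)}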


Partial derivative of cost function for multiple variables

● Calculating the partial derivative

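The calculation itself was an image; for the squared-error cost J the partial derivative evaluates to (reconstructed, with the same notation as above):

```latex
\frac{\partial}{\partial\theta_j}J(\theta)
  = \frac{1}{m}\sum_{i=1}^{m}\bigl(h_\theta(x^{(i)}) - y^{(i)}\bigr)\,x_j^{(i)}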


Gradient descent for multiple variables

● Simplified gradient descent

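For multiple variables the update is most naturally written in vector form; a hedged NumPy sketch (function name and toy data are mine, not from the slides), where the matrix product computes all partial derivatives at once and therefore updates every parameter simultaneously:

```python
import numpy as np

def gradient_descent(X, y, alpha=0.1, iters=1000):
    """Batch gradient descent for h(x) = X @ theta; X includes a bias column of ones."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iters):
        grad = X.T @ (X @ theta - y) / m   # all partial derivatives at once
        theta -= alpha * grad              # simultaneous update of every theta_j
    return theta

X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])  # bias column + one feature
y = np.array([1.0, 3.0, 5.0])                        # data generated by y = 1 + 2x
theta = gradient_descent(X, y)
```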


Conversion considerations for multiple variables

● With multiple variables, direct comparison of the variance in the
  data is lost (scales can vary strongly):

  Square meters  30 - 400
  Bedrooms       1 - 10
  Price          80 000 - 2 000 000

● Gradient descent converges faster for features on a similar scale


Feature Scaling



Feature scaling

● Different approaches for converting features to a comparable scale
● Min-max scaling makes all data fall into the range [0, 1]
  (for a single data point of feature j)
● Z-score conversion
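Both approaches can be sketched in a few lines (function names are illustrative; the z-score variant here uses the population standard deviation):

```python
def min_max_scale(values):
    """Map values linearly into [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def z_score(values):
    """Center on the mean (mu) and divide by the standard deviation (sigma)."""
    mu = sum(values) / len(values)
    sigma = (sum((v - mu) ** 2 for v in values) / len(values)) ** 0.5
    return [(v - mu) / sigma for v in values]

sizes = [73, 146, 38, 124]       # square meters from the training data
print(min_max_scale(sizes))      # every value now lies in [0, 1]
print(z_score(sizes))            # centered on 0, mostly within [-1, 1]
```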


Z-score conversion

● Center the data on 0
● Scale the data so the majority falls into the range [-1, 1]
  – mean / empirical expected value (mu)
  – empirical standard deviation (sigma)
● Z-score conversion of a single data point for feature j


Visualizing standard deviation



Nonlinear Regression (by cheap trickery)


Nonlinear Regression Problems



Nonlinear Regression Problems (linear approximation)



Nonlinear Regression Problems (nonlinear hypothesis)



Nonlinear Regression with cheap trickery

● Linear regression can be used for nonlinear problems
● Choose a nonlinear hypothesis space
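The "cheap trickery" can be sketched as follows: build polynomial columns from a single input and run ordinary linear least squares on them. The hypothesis is nonlinear in x but still linear in the parameters. Function names and data are illustrative:

```python
import numpy as np

def polynomial_design(x, degree):
    """Columns [1, x, x^2, ..., x^degree]: a nonlinear hypothesis space,
    fitted by plain *linear* regression in the parameters theta."""
    return np.column_stack([x ** d for d in range(degree + 1)])

x = np.array([0.0, 1.0, 2.0, 3.0])
y = x ** 2                                      # a clearly nonlinear target
X = polynomial_design(x, 2)
theta, *_ = np.linalg.lstsq(X, y, rcond=None)   # ordinary least squares
# theta recovers [0, 0, 1], i.e. h(x) = x^2
```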


Optimizing cost using derivatives


Comparison Gradient Descent vs. Setting derivative = 0

● Instead of gradient descent, solve

  ∂J/∂θ_i = 0  for all i
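For the squared-error cost, setting the derivative to zero leads to the closed-form normal equation. A hedged NumPy sketch (the helper name is mine, not from the slides):

```python
import numpy as np

def normal_equation(X, y):
    """Solve dJ/dtheta = 0 directly: theta = (X^T X)^{-1} X^T y.
    No learning rate and no iterations, but it scales poorly
    with the number of features (roughly cubic)."""
    return np.linalg.solve(X.T @ X, X.T @ y)

X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])  # bias column + one feature
y = np.array([1.0, 3.0, 5.0])                        # data generated by y = 1 + 2x
theta = normal_equation(X, y)                        # recovers [1, 2]
```

Using `np.linalg.solve` rather than explicitly inverting X^T X is the standard numerically safer choice.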


Comparison Gradient Descent vs. Setting derivative = 0

Gradient Descent
● Need to choose the learning rate
● Needs many iterations, random restarts etc.
● Works well for many features

Setting the derivative = 0
● No need to choose a learning rate
● No iterations
● Slow for many features


This lecture covers

● Linear Regression
● Hypothesis formulation, hypothesis space
● Optimizing Cost with Gradient Descent
● Using multiple input features with Linear Regression
● Feature Scaling
● Nonlinear Regression
● Optimizing Cost using derivatives


Pictures

● Some public domain plots from en.wikipedia.org and de.wikipedia.org
