
Linear Regression

These slides were assembled by Eric Eaton, with grateful acknowledgement of the many others who made
their course materials freely available online. Feel free to reuse or adapt these slides for your own academic
purposes, provided that you include proper attribution. Please send comments and corrections to Eric.
Robot Image Credit: Viktoriya Sukhanova © 123RF.com
Regression
Given:
– Data $X = \left\{ x^{(1)}, \ldots, x^{(n)} \right\}$ where $x^{(i)} \in \mathbb{R}^d$
– Corresponding labels $y = \left\{ y^{(1)}, \ldots, y^{(n)} \right\}$ where $y^{(i)} \in \mathbb{R}$
[figure: September Arctic Sea Ice Extent (1,000,000 sq km) vs. Year, 1970–2020, with linear regression and quadratic regression fits]
Prostate Cancer Dataset
• 97 samples, partitioned into 67 train / 30 test
• Eight predictors (features):
– 6 continuous (4 log transforms), 1 binary, 1 ordinal
• Continuous outcome variable:
– lpsa: log(prostate specific antigen level)

Based on slide by Jeff Howbert


Linear Regression
• Hypothesis: $y = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \ldots + \theta_d x_d = \sum_{j=0}^{d} \theta_j x_j$ (assume $x_0 = 1$)
• Fit model by minimizing the sum of squared errors

Figures are courtesy of Greg Shakhnarovich


Least Squares Linear Regression
• Cost function:
$$J(\theta) = \frac{1}{2n} \sum_{i=1}^{n} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right)^2$$
• Fit by solving $\min_\theta J(\theta)$
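For concreteness, here is a minimal NumPy sketch of this cost function (the names `cost`, `X`, and `y` are illustrative, not from the slides); it assumes the design matrix X already includes the column of ones for $x_0 = 1$.

```python
import numpy as np

def cost(theta, X, y):
    """Least-squares cost J(theta) = 1/(2n) * sum((X @ theta - y)^2).

    X : (n, d+1) design matrix whose first column is all ones (x_0 = 1)
    y : (n,) vector of labels
    theta : (d+1,) parameter vector
    """
    n = X.shape[0]
    residuals = X @ theta - y          # h_theta(x^(i)) - y^(i) for all i
    return (residuals @ residuals) / (2 * n)

# Tiny example: points (1, 1), (2, 2), (3, 3) and theta = [0, 0.5]
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 2.0, 3.0])
print(cost(np.array([0.0, 0.5]), X, y))   # ~0.58, matching the slide example below
```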
Intuition Behind Cost Function
$$J(\theta) = \frac{1}{2n} \sum_{i=1}^{n} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right)^2$$

For insight on $J(\theta)$, let's assume $x^{(i)} \in \mathbb{R}$ so $\theta = [\theta_0, \theta_1]$

Based on example by Andrew Ng
Intuition Behind Cost Function
$$J(\theta) = \frac{1}{2n} \sum_{i=1}^{n} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right)^2$$

For insight on $J(\theta)$, let's assume $x^{(i)} \in \mathbb{R}$ so $\theta = [\theta_0, \theta_1]$

(for fixed $\theta$, $h_\theta(x)$ is a function of x)          (function of the parameter $\theta$)
[figure: training points in the x–y plane (left); the corresponding value of $J(\theta)$ (right)]

Based on example by Andrew Ng
Intuition Behind Cost Function
$$J(\theta) = \frac{1}{2n} \sum_{i=1}^{n} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right)^2$$

For insight on $J(\theta)$, let's assume $x^{(i)} \in \mathbb{R}$ so $\theta = [\theta_0, \theta_1]$

(for fixed $\theta$, $h_\theta(x)$ is a function of x)          (function of the parameter $\theta$)
[figure: data points (1, 1), (2, 2), (3, 3) with the line $h_\theta(x) = 0.5x$ (left); the resulting point on the $J(\theta)$ curve (right)]

Based on example by Andrew Ng:
$$J([0, 0.5]) = \frac{1}{2 \cdot 3} \left[ (0.5 - 1)^2 + (1 - 2)^2 + (1.5 - 3)^2 \right] \approx 0.58$$
Intuition Behind Cost Function
$$J(\theta) = \frac{1}{2n} \sum_{i=1}^{n} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right)^2$$

For insight on $J(\theta)$, let's assume $x^{(i)} \in \mathbb{R}$ so $\theta = [\theta_0, \theta_1]$

(for fixed $\theta$, $h_\theta(x)$ is a function of x)          (function of the parameter $\theta$)
[figure: the flat line $h_\theta(x) = 0$ plotted against the data (left); the point $J([0, 0]) \approx 2.333$ on the cost curve (right)]

$J(\theta)$ is convex

Based on example by Andrew Ng
Intuition Behind Cost Function

(for fixed $\theta$, $h_\theta(x)$ is a function of x)          (function of the parameters $\theta$)
[figure sequence: candidate hypotheses plotted against the data (left), each paired with its location on the contour plot of $J(\theta_0, \theta_1)$ (right)]

Slides by Andrew Ng
Basic Search Procedure
• Choose an initial value for $\theta$
• Until we reach a minimum:
  – Choose a new value for $\theta$ to reduce $J(\theta)$

[figure sequence: the surface $J(\theta_0, \theta_1)$, with successive steps descending toward a minimum]

Figures by Andrew Ng
Gradient Descent
• Initialize $\theta$
• Repeat until convergence (simultaneous update for $j = 0 \ldots d$):
$$\theta_j \leftarrow \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta)$$
  where α is the learning rate (small), e.g., α = 0.05

[figure: $J(\theta)$ plotted against $\theta$, with gradient steps moving downhill toward the minimum]
Gradient Descent
• Initialize $\theta$
• Repeat until convergence (simultaneous update for $j = 0 \ldots d$):
$$\theta_j \leftarrow \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta)$$

For linear regression:
$$
\begin{aligned}
\frac{\partial}{\partial \theta_j} J(\theta)
&= \frac{\partial}{\partial \theta_j} \, \frac{1}{2n} \sum_{i=1}^{n} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right)^2 \\
&= \frac{\partial}{\partial \theta_j} \, \frac{1}{2n} \sum_{i=1}^{n} \left( \sum_{k=0}^{d} \theta_k x_k^{(i)} - y^{(i)} \right)^{\!2} \\
&= \frac{1}{n} \sum_{i=1}^{n} \left( \sum_{k=0}^{d} \theta_k x_k^{(i)} - y^{(i)} \right) \times \frac{\partial}{\partial \theta_j} \left( \sum_{k=0}^{d} \theta_k x_k^{(i)} - y^{(i)} \right) \\
&= \frac{1}{n} \sum_{i=1}^{n} \left( \sum_{k=0}^{d} \theta_k x_k^{(i)} - y^{(i)} \right) x_j^{(i)}
\end{aligned}
$$
Gradient Descent for Linear Regression
• Initialize $\theta$
• Repeat until convergence (simultaneous update for $j = 0 \ldots d$):
$$\theta_j \leftarrow \theta_j - \alpha \frac{1}{n} \sum_{i=1}^{n} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right) x_j^{(i)}$$
• To achieve a simultaneous update:
  – At the start of each GD iteration, compute $h_\theta\!\left(x^{(i)}\right)$
  – Use this stored value in the update step loop
• Assume convergence when $\left\| \theta_{\text{new}} - \theta_{\text{old}} \right\|_2 < \epsilon$
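The following is a minimal NumPy sketch of this loop under the update rule above (names such as `gradient_descent`, `alpha`, and `eps` are illustrative, not from the slides); it computes all predictions once per iteration and updates every $\theta_j$ simultaneously.

```python
import numpy as np

def gradient_descent(X, y, alpha=0.05, eps=1e-6, max_iters=10000):
    """Batch gradient descent for least-squares linear regression.

    X : (n, d+1) design matrix with a leading column of ones (x_0 = 1)
    y : (n,) label vector
    """
    n, d_plus_1 = X.shape
    theta = np.zeros(d_plus_1)
    for _ in range(max_iters):
        predictions = X @ theta                 # h_theta(x^(i)) for all i, computed once
        gradient = X.T @ (predictions - y) / n  # 1/n * sum_i (h - y) * x_j^(i), all j at once
        theta_new = theta - alpha * gradient    # simultaneous update of every theta_j
        if np.linalg.norm(theta_new - theta) < eps:   # ||theta_new - theta_old||_2 < eps
            return theta_new
        theta = theta_new
    return theta
```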
Gradient Descent

(for fixed $\theta$, $h_\theta(x)$ is a function of x)          (function of the parameters $\theta$)
[figure sequence: the fitted line, starting near $h(x) = -900 - 0.1x$, improving over successive gradient descent iterations as $\theta$ moves across the contour plot of $J(\theta)$ toward its minimum]

Slides by Andrew Ng
Choosing α
• α too small: slow convergence
• α too large: increasing value of $J(\theta)$
  – May overshoot the minimum
  – May fail to converge
  – May even diverge

To see if gradient descent is working, print out $J(\theta)$ each iteration (see the sketch below)
• The value should decrease at each iteration
• If it doesn't, adjust α
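As a sketch of this diagnostic (again assuming the `X`, `y`, and `cost` conventions from the earlier sketches, which are illustrative rather than part of the slides), one can print $J(\theta)$ at every iteration and check that it decreases:

```python
# Monitor J(theta) during gradient descent to check that alpha is reasonable.
# Assumes X, y, and cost(theta, X, y) as defined in the earlier sketches.
import numpy as np

theta = np.zeros(X.shape[1])
alpha = 0.05
for it in range(100):
    theta -= alpha * X.T @ (X @ theta - y) / X.shape[0]
    print(f"iter {it:3d}  J(theta) = {cost(theta, X, y):.6f}")  # should decrease each iteration
```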
Extending Linear Regression to
More Complex Models
• The inputs X for linear regression can be:
  – Original quantitative inputs
  – Transformations of quantitative inputs
    • e.g., log, exp, square root, square, etc.
  – Polynomial transformations
    • example: $y = \theta_0 + \theta_1 x + \theta_2 x^2 + \theta_3 x^3$
  – Basis expansions
  – Dummy coding of categorical inputs
  – Interactions between variables
    • example: $x_3 = x_1 \times x_2$

This allows the use of linear regression techniques to fit non-linear datasets, as in the sketch below.
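As one illustration of the polynomial case (function and variable names are my own, not from the slides), the following sketch builds a polynomial design matrix and fits it with ordinary least squares:

```python
import numpy as np

def polynomial_design_matrix(x, degree):
    """Map a 1-D input x to rows [1, x, x^2, ..., x^degree]."""
    return np.vander(x, N=degree + 1, increasing=True)

# Fit y = theta_0 + theta_1 x + theta_2 x^2 + theta_3 x^3 to noisy cubic data.
rng = np.random.default_rng(0)
x = np.linspace(-2, 2, 50)
y = 1.0 - 2.0 * x + 0.5 * x**3 + rng.normal(scale=0.2, size=x.shape)

X_poly = polynomial_design_matrix(x, degree=3)
theta, *_ = np.linalg.lstsq(X_poly, y, rcond=None)   # ordinary least squares on the expanded inputs
print(theta)   # approximately [1, -2, 0, 0.5]
```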
Linear Basis Function Models
• Generally,
$$h_\theta(x) = \sum_{j=0}^{d} \theta_j \phi_j(x)$$
  where the $\phi_j(x)$ are basis functions

• Typically, $\phi_0(x) = 1$ so that $\theta_0$ acts as a bias

• In the simplest case, we use linear basis functions: $\phi_j(x) = x_j$

Based on slide by Christopher Bishop (PRML)


Linear Basis Function Models
• Polynomial basis functions:
  – These are global; a small change in x affects all basis functions

• Gaussian basis functions:
  – These are local; a small change in x only affects nearby basis functions. $\mu_j$ and s control location and scale (width).

Based on slide by Christopher Bishop (PRML)
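Below is a minimal sketch of a Gaussian basis expansion, assuming the common form $\phi_j(x) = \exp\!\left(-\frac{(x - \mu_j)^2}{2s^2}\right)$ used in Bishop's PRML; the function name and choice of centers are illustrative, not from the slides.

```python
import numpy as np

def gaussian_basis(x, centers, s):
    """Gaussian basis expansion: phi_j(x) = exp(-(x - mu_j)^2 / (2 s^2)).

    Returns an (n, len(centers) + 1) design matrix with a leading bias column.
    """
    x = np.asarray(x).reshape(-1, 1)          # shape (n, 1)
    mu = np.asarray(centers).reshape(1, -1)   # shape (1, m)
    phi = np.exp(-((x - mu) ** 2) / (2 * s**2))
    return np.hstack([np.ones((x.shape[0], 1)), phi])   # phi_0(x) = 1 acts as the bias

# Example: 9 Gaussian bumps spread over [0, 1] with width s = 0.1
Phi = gaussian_basis(np.linspace(0, 1, 20), centers=np.linspace(0, 1, 9), s=0.1)
print(Phi.shape)   # (20, 10)
```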
Linear Basis Function Models
• Sigmoidal basis functions: $\phi_j(x) = \sigma\!\left(\frac{x - \mu_j}{s}\right)$, where $\sigma(a) = \frac{1}{1 + \exp(-a)}$
  – These are also local; a small change in x only affects nearby basis functions. $\mu_j$ and s control location and scale (slope).

Based on slide by Christopher Bishop (PRML)


Example of Fitting a Polynomial Curve
with a Linear Model

$$y = \theta_0 + \theta_1 x + \theta_2 x^2 + \ldots + \theta_p x^p = \sum_{j=0}^{p} \theta_j x^j$$
Linear Basis Function Models
• Basic linear model:
$$h_\theta(x) = \sum_{j=0}^{d} \theta_j x_j$$

• Generalized linear model:
$$h_\theta(x) = \sum_{j=0}^{d} \theta_j \phi_j(x)$$

• Once we have replaced the data by the outputs of the basis functions, fitting the generalized model is exactly the same problem as fitting the basic model
  – Unless we use the kernel trick – more on that when we cover support vector machines
  – Therefore, there is no point in cluttering the math with basis functions

Based on slide by Geoff Hinton
Linear Algebra Concepts
• A vector in $\mathbb{R}^d$ is an ordered set of d real numbers
  – e.g., $v = [1, 6, 3, 4]$ is in $\mathbb{R}^4$
  – "[1, 6, 3, 4]" is a column vector: $\begin{bmatrix} 1 \\ 6 \\ 3 \\ 4 \end{bmatrix}$
  – as opposed to a row vector: $\begin{bmatrix} 1 & 6 & 3 & 4 \end{bmatrix}$

• An m-by-n matrix is an object with m rows and n columns, e.g., the 3-by-3 matrix $\begin{bmatrix} 1 & 32 & 8 \\ 4 & 78 & 6 \\ 9 & 43 & 2 \end{bmatrix}$

Based on slides by Joseph Bradley
Linear Algebra Concepts
• Transpose: reflect a vector/matrix across the diagonal:
$$\begin{bmatrix} a \\ b \end{bmatrix}^{\!\top} = \begin{bmatrix} a & b \end{bmatrix}, \qquad \begin{bmatrix} a & b \\ c & d \end{bmatrix}^{\!\top} = \begin{bmatrix} a & c \\ b & d \end{bmatrix}$$
  – Note: $(Ax)^\top = x^\top A^\top$ (we'll define multiplication soon...)

• Vector norms:
  – The $L_p$ norm of $v = (v_1, \ldots, v_k)$ is $\left( \sum_i |v_i|^p \right)^{1/p}$
  – Common norms: $L_1$, $L_2$
  – $L_\infty = \max_i |v_i|$
• The length of a vector v is $L_2(v)$

Based on slides by Joseph Bradley
Linear Algebra Concepts

• Vector dot product: $u \cdot v = \begin{bmatrix} u_1 & u_2 \end{bmatrix} \cdot \begin{bmatrix} v_1 & v_2 \end{bmatrix} = u_1 v_1 + u_2 v_2$
  – Note: the dot product of u with itself is $\text{length}(u)^2 = \|u\|_2^2$

• Matrix product:
$$A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}, \quad B = \begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{bmatrix}$$
$$AB = \begin{bmatrix} a_{11}b_{11} + a_{12}b_{21} & a_{11}b_{12} + a_{12}b_{22} \\ a_{21}b_{11} + a_{22}b_{21} & a_{21}b_{12} + a_{22}b_{22} \end{bmatrix}$$

Based on slides by Joseph Bradley

Linear Algebra Concepts
• Vector products:
  – Dot product: $u \cdot v = u^\top v = \begin{bmatrix} u_1 & u_2 \end{bmatrix} \begin{bmatrix} v_1 \\ v_2 \end{bmatrix} = u_1 v_1 + u_2 v_2$
  – Outer product: $u v^\top = \begin{bmatrix} u_1 \\ u_2 \end{bmatrix} \begin{bmatrix} v_1 & v_2 \end{bmatrix} = \begin{bmatrix} u_1 v_1 & u_1 v_2 \\ u_2 v_1 & u_2 v_2 \end{bmatrix}$

Based on slides by Joseph Bradley

Vectorization
• Benefits of vectorization
  – More compact equations
  – Faster code (using optimized matrix libraries)
• Consider our model:
$$h(x) = \sum_{j=0}^{d} \theta_j x_j$$
• Let
$$\theta = \begin{bmatrix} \theta_0 \\ \theta_1 \\ \vdots \\ \theta_d \end{bmatrix} \qquad x^\top = \begin{bmatrix} 1 & x_1 & \ldots & x_d \end{bmatrix}$$
• Can write the model in vectorized form as $h_\theta(x) = \theta^\top x$
Vectorization
• Consider our model for n instances:
$$h_\theta\!\left(x^{(i)}\right) = \sum_{j=0}^{d} \theta_j x_j^{(i)}$$
• Let
$$\theta = \begin{bmatrix} \theta_0 \\ \theta_1 \\ \vdots \\ \theta_d \end{bmatrix} \in \mathbb{R}^{(d+1) \times 1}
\qquad
X = \begin{bmatrix} 1 & x_1^{(1)} & \ldots & x_d^{(1)} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_1^{(i)} & \ldots & x_d^{(i)} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_1^{(n)} & \ldots & x_d^{(n)} \end{bmatrix} \in \mathbb{R}^{n \times (d+1)}$$
• Can write the model in vectorized form as $h_\theta(x) = X\theta$


Vectorization
• For the linear regression cost function:
$$
\begin{aligned}
J(\theta) &= \frac{1}{2n} \sum_{i=1}^{n} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right)^2 \\
&= \frac{1}{2n} \sum_{i=1}^{n} \left( \theta^\top x^{(i)} - y^{(i)} \right)^2 \\
&= \frac{1}{2n} (X\theta - y)^\top (X\theta - y)
\end{aligned}
$$
  where $X \in \mathbb{R}^{n \times (d+1)}$, $\theta \in \mathbb{R}^{(d+1) \times 1}$, and
$$y = \begin{bmatrix} y^{(1)} \\ y^{(2)} \\ \vdots \\ y^{(n)} \end{bmatrix}$$
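To make the equivalence concrete, here is a brief check (illustrative, not from the slides) that the summation form and the matrix form of $J(\theta)$ compute the same value, using the same design-matrix conventions:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5, 3
X = np.hstack([np.ones((n, 1)), rng.normal(size=(n, d))])   # leading column of ones
y = rng.normal(size=n)
theta = rng.normal(size=d + 1)

J_loop = sum((theta @ X[i] - y[i]) ** 2 for i in range(n)) / (2 * n)   # summation form
J_vec = (X @ theta - y) @ (X @ theta - y) / (2 * n)                    # (X theta - y)^T (X theta - y) / (2n)
print(np.isclose(J_loop, J_vec))   # True
```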
Closed Form Solution
• Instead of using GD, solve for the optimal $\theta$ analytically
  – Notice that the solution is where $\frac{\partial}{\partial \theta} J(\theta) = 0$
• Derivation:
$$
\begin{aligned}
J(\theta) &= \frac{1}{2n} (X\theta - y)^\top (X\theta - y) \\
&\propto \theta^\top X^\top X \theta - y^\top X \theta - \theta^\top X^\top y + y^\top y \\
&\propto \theta^\top X^\top X \theta - 2\theta^\top X^\top y + y^\top y
\end{aligned}
$$
Take the derivative, set it equal to 0, then solve for $\theta$:
$$
\begin{aligned}
\frac{\partial}{\partial \theta} \left( \theta^\top X^\top X \theta - 2\theta^\top X^\top y + y^\top y \right) &= 0 \\
(X^\top X)\theta - X^\top y &= 0 \\
(X^\top X)\theta &= X^\top y
\end{aligned}
$$
Closed Form Solution: $\theta = (X^\top X)^{-1} X^\top y$
Closed Form Solution
• Can obtain $\theta$ by simply plugging X and y into
$$\theta = (X^\top X)^{-1} X^\top y, \qquad
X = \begin{bmatrix} 1 & x_1^{(1)} & \ldots & x_d^{(1)} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_1^{(n)} & \ldots & x_d^{(n)} \end{bmatrix}, \qquad
y = \begin{bmatrix} y^{(1)} \\ y^{(2)} \\ \vdots \\ y^{(n)} \end{bmatrix}$$
• If $X^\top X$ is not invertible (i.e., singular), may need to:
  – Use the pseudo-inverse instead of the inverse
    • In Python, numpy.linalg.pinv(a)
  – Remove redundant (not linearly independent) features
  – Remove extra features to ensure that d ≤ n
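A minimal sketch of this computation, assuming the same design-matrix conventions as before; it uses the pseudo-inverse mentioned above, so it also tolerates a singular $X^\top X$.

```python
import numpy as np

def closed_form_solution(X, y):
    """Solve theta = (X^T X)^{-1} X^T y via the pseudo-inverse.

    X : (n, d+1) design matrix with a leading column of ones
    y : (n,) label vector
    """
    return np.linalg.pinv(X.T @ X) @ X.T @ y

# Equivalent and numerically preferable: np.linalg.lstsq(X, y, rcond=None)[0]
```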
Gradient Descent vs Closed Form
Gradient Descent:
• Requires multiple iterations
• Need to choose α
• Works well when n is large
• Can support incremental learning

Closed Form Solution:
• Non-iterative
• No need for α
• Slow if n or d is large
  – Forming $X^\top X$ costs O(nd²), and inverting the resulting (d+1)×(d+1) matrix is roughly O(d³)
Improving Learning:
Feature Scaling
• Idea: ensure that features have similar scales

[figure: contours of $J(\theta_1, \theta_2)$ before feature scaling (elongated ellipses) and after feature scaling (roughly circular)]

• Makes gradient descent converge much faster
Feature Standardization
• Rescales features to have zero mean and unit variance
  – Let $\mu_j$ be the mean of feature j: $\mu_j = \frac{1}{n} \sum_{i=1}^{n} x_j^{(i)}$
  – Replace each value with:
$$x_j^{(i)} \leftarrow \frac{x_j^{(i)} - \mu_j}{s_j} \quad \text{for } j = 1 \ldots d \text{ (not } x_0\text{!)}$$
    • $s_j$ is the standard deviation of feature j
    • Could also use the range of feature j ($\max_j - \min_j$) for $s_j$

• Must apply the same transformation to instances for both training and prediction
• Outliers can cause problems
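A short sketch of this transformation (the helper names are illustrative, not from the slides); note how the statistics computed on the training set are reused at prediction time, as required above.

```python
import numpy as np

def fit_standardizer(X_train):
    """Compute per-feature mean and standard deviation on the training set.

    Assumes X_train excludes the bias column x_0 (which must not be rescaled).
    """
    mu = X_train.mean(axis=0)
    s = X_train.std(axis=0)
    return mu, s

def standardize(X, mu, s):
    """Apply x_j <- (x_j - mu_j) / s_j using training-set statistics."""
    return (X - mu) / s

# Usage: fit on the training data, then apply the same mu, s to test data.
# mu, s = fit_standardizer(X_train)
# X_train_std = standardize(X_train, mu, s)
# X_test_std = standardize(X_test, mu, s)
```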
Quality of Fit

[figure: three fits of productivity vs. time spent — underfitting (high bias), correct fit, and overfitting (high variance)]

Overfitting:
• The learned hypothesis may fit the training set very well ($J(\theta) \approx 0$)
• ...but fails to generalize to new examples

Based on example by Andrew Ng


Regularization
• A method for automatically controlling the complexity of the learned hypothesis
• Idea: penalize large values of $\theta_j$
  – Can incorporate this penalty into the cost function
  – Works well when we have a lot of features, each of which contributes a bit to predicting the label

• Can also address overfitting by eliminating features (either manually or via model selection)
Regularization
• Linear regression objective function:
$$J(\theta) = \underbrace{\frac{1}{2n} \sum_{i=1}^{n} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right)^2}_{\text{model fit to data}} + \underbrace{\frac{\lambda}{2} \sum_{j=1}^{d} \theta_j^2}_{\text{regularization}}$$
  – λ is the regularization parameter (λ ≥ 0)
  – No regularization on $\theta_0$!
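A minimal sketch of this regularized cost (illustrative names, same design-matrix conventions as before); `theta[0]` is excluded from the penalty, matching the note above.

```python
import numpy as np

def regularized_cost(theta, X, y, lam):
    """J(theta) = 1/(2n) * sum((X theta - y)^2) + (lam/2) * sum_{j>=1} theta_j^2."""
    n = X.shape[0]
    residuals = X @ theta - y
    penalty = (lam / 2) * np.sum(theta[1:] ** 2)   # no regularization on theta_0
    return residuals @ residuals / (2 * n) + penalty
```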
Understanding Regularization
$$J(\theta) = \frac{1}{2n} \sum_{i=1}^{n} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right)^2 + \frac{\lambda}{2} \sum_{j=1}^{d} \theta_j^2$$

• Note that $\sum_{j=1}^{d} \theta_j^2 = \|\theta_{1:d}\|_2^2$
  – This is the squared magnitude of the feature coefficient vector!

• We can also think of this as:
$$\sum_{j=1}^{d} (\theta_j - 0)^2 = \|\theta_{1:d} - \vec{0}\|_2^2$$

• L2 regularization pulls the coefficients toward 0
Understanding Regularization
$$J(\theta) = \frac{1}{2n} \sum_{i=1}^{n} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right)^2 + \frac{\lambda}{2} \sum_{j=1}^{d} \theta_j^2$$

• What happens as $\lambda \to \infty$?

$$\theta_0 + \theta_1 x + \theta_2 x^2 + \theta_3 x^3 + \theta_4 x^4$$
[figure: a wiggly degree-4 fit of productivity vs. time spent on work]
Understanding Regularization
$$J(\theta) = \frac{1}{2n} \sum_{i=1}^{n} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right)^2 + \frac{\lambda}{2} \sum_{j=1}^{d} \theta_j^2$$

• What happens as $\lambda \to \infty$? The penalty drives $\theta_1, \ldots, \theta_4$ toward 0, leaving only $\theta_0$:

$$\theta_0 + \underbrace{\theta_1}_{\approx 0} x + \underbrace{\theta_2}_{\approx 0} x^2 + \underbrace{\theta_3}_{\approx 0} x^3 + \underbrace{\theta_4}_{\approx 0} x^4$$
[figure: the fit flattens toward the constant $\theta_0$ (productivity vs. time spent on work)]
Regularized Linear Regression
• Cost function:
$$J(\theta) = \frac{1}{2n} \sum_{i=1}^{n} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right)^2 + \frac{\lambda}{2} \sum_{j=1}^{d} \theta_j^2$$
• Fit by solving $\min_\theta J(\theta)$
• Gradient update:
$$\theta_0 \leftarrow \theta_0 - \alpha \frac{1}{n} \sum_{i=1}^{n} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right) x_0^{(i)} \qquad \left( \tfrac{\partial}{\partial \theta_0} J(\theta) \right)$$
$$\theta_j \leftarrow \theta_j - \alpha \frac{1}{n} \sum_{i=1}^{n} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right) x_j^{(i)} - \alpha \lambda \theta_j \qquad \left( \tfrac{\partial}{\partial \theta_j} J(\theta) \text{; the last term comes from the regularization} \right)$$
Regularized Linear Regression

• We can rewrite the gradient step as:
$$\theta_j \leftarrow \theta_j (1 - \alpha\lambda) - \alpha \frac{1}{n} \sum_{i=1}^{n} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right) x_j^{(i)}$$
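A brief sketch of this rewritten update (illustrative names, same conventions as the earlier gradient descent sketch); the bias term $\theta_0$ is deliberately left unshrunk.

```python
import numpy as np

def regularized_gd_step(theta, X, y, alpha, lam):
    """One regularized gradient step: theta_j <- theta_j * (1 - alpha*lam) - alpha * grad_j.

    theta_0 is updated without the (1 - alpha*lam) shrinkage factor.
    """
    n = X.shape[0]
    gradient = X.T @ (X @ theta - y) / n
    shrink = np.full_like(theta, 1.0 - alpha * lam)
    shrink[0] = 1.0                       # no regularization on theta_0
    return shrink * theta - alpha * gradient
```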
Regularized Linear Regression
• To incorporate regularization into the closed form solution:
$$\theta = \left( X^\top X + \lambda \begin{bmatrix} 0 & 0 & 0 & \cdots & 0 \\ 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \end{bmatrix} \right)^{-1} X^\top y$$
• Can derive this the same way, by solving $\frac{\partial}{\partial \theta} J(\theta) = 0$
• Can prove that for λ > 0, the inverse in the equation above always exists
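A minimal sketch of this regularized closed form (illustrative names); the identity-like matrix has a zero in its top-left entry so that $\theta_0$ is not penalized.

```python
import numpy as np

def regularized_closed_form(X, y, lam):
    """theta = (X^T X + lam * D)^{-1} X^T y, where D is the identity with D[0, 0] = 0."""
    d_plus_1 = X.shape[1]
    D = np.eye(d_plus_1)
    D[0, 0] = 0.0                          # do not regularize the bias term theta_0
    return np.linalg.solve(X.T @ X + lam * D, X.T @ y)
```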
