Linear Regression
Fundamentals of Data Science
29 October, 3, 5, 10 November 2020
Prof. Fabio Galasso
Motivating example: ALVINN '97 (An Autonomous Land Vehicle In a Neural Network), an early system that learned to steer a vehicle from driving data.
Outline
• Linear regression
One or multiple variables
Cost function
• Gradient descent, incl. batch/stochastic GD
• Normal equation
• MSE and Correlation
• Locally-Weighted Regression
• Probabilistic Interpretation of Least Squares
Linear Regression
[Figure: Housing prices (Portland, OR) — price (in 1000s of dollars) vs. size (feet²), with a regression line through the data points.]
Supervised learning: the "right answer" is given for each example in the data.
Regression problem: predict a real-valued output.
Training set of housing prices (Portland, OR):

    Size in feet² (x)    Price ($) in 1000's (y)
    2104                 460
    1416                 232
    1534                 315
    852                  178
    …                    …
Notation:
m = Number of training examples
x’s = “input” variable / features
y’s = “output” variable / “target” variable
Training Set → Learning Algorithm → hypothesis h.
Size of house (x) → h → estimated price (y).
How do we represent h?
Linear regression with one variable (univariate linear regression): h maps from x's to y's, with

    h_θ(x) = θ₀ + θ₁x
Multiple features (variables)
    Size (feet²)   Price ($1000)
    2104           460
    1416           232
    1534           315
    852            178
    …              …
Multiple features (variables)
    Size (feet²)   Number of bedrooms   Number of floors   Age of home (years)   Price ($1000)
    2104           5                    1                  45                    460
    1416           3                    2                  40                    232
    1534           3                    2                  30                    315
    852            2                    1                  36                    178
    …              …                    …                  …                     …
Notation:
n = number of features
x^(i) = input (features) of the i-th training example
x_j^(i) = value of feature j in the i-th training example
Hypothesis:
With one variable: h_θ(x) = θ₀ + θ₁x
With multiple variables: h_θ(x) = θ₀ + θ₁x₁ + θ₂x₂ + … + θₙxₙ
For convenience of notation, define x₀ = 1, so that h_θ(x) = θ₀x₀ + θ₁x₁ + … + θₙxₙ = θᵀx.
Multivariate linear regression.
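As a small illustration (not part of the original slides), the multivariate hypothesis is a single dot product in NumPy; the feature values come from the table above and the θ values are made up:

    import numpy as np

    # One training example from the table above: size, bedrooms, floors, age
    x = np.array([1.0, 2104.0, 5.0, 1.0, 45.0])        # x0 = 1 prepended for convenience
    theta = np.array([80.0, 0.1, 10.0, 3.0, -2.0])     # made-up parameter values

    h = theta @ x                                      # hypothesis h_theta(x) = theta^T x
    print(h)                                           # predicted price in $1000s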
Linear Regression:
Cost Function
Training Set:

    Size in feet² (x)    Price ($) in 1000's (y)
    2104                 460
    1416                 232
    1534                 315
    852                  178
    …                    …
Hypothesis: h_θ(x) = θ₀ + θ₁x
θ₀, θ₁: parameters
How to choose θ₀, θ₁?
[Figure: three example hypothesis lines h_θ(x) = θ₀ + θ₁x for three different choices of (θ₀, θ₁).]
Idea: choose θ₀, θ₁ so that h_θ(x) is close to y for our training examples (x, y).
Simplified setting (θ₀ = 0):
Hypothesis: h_θ(x) = θ₁x
Parameter: θ₁
Cost function: J(θ₁) = (1/2m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))²
Goal: minimize J(θ₁) over θ₁
(For fixed θ₁, h_θ(x) is a function of x; J(θ₁) is a function of the parameter θ₁.)

[Figures: left, the training points and the line h_θ(x) for a particular value of θ₁; right, the corresponding value of the cost J(θ₁) plotted against θ₁. Repeating this for several values of θ₁ traces out a bowl-shaped J(θ₁) curve whose minimum corresponds to the best-fitting line.]
Hypothesis: h_θ(x) = θ₀ + θ₁x
Parameters: θ₀, θ₁
Cost function: J(θ₀, θ₁) = (1/2m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))²
Goal: minimize J(θ₀, θ₁) over θ₀, θ₁
(For fixed θ₀, θ₁, h_θ(x) is a function of x; J(θ₀, θ₁) is a function of the parameters θ₀, θ₁.)

[Figures: left, the housing data (price ($) in 1000's vs. size in feet²) with the line h_θ(x) for particular values of θ₀, θ₁; right, J(θ₀, θ₁) shown as a 3-D surface and as contour plots. Each choice of θ₀, θ₁ corresponds to one line on the left and one point on the contour plot; the centre of the contours is the minimizing pair.]
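A minimal sketch (assumed, not from the slides) of computing J(θ₀, θ₁) on the training set above with NumPy:

    import numpy as np

    # Training set from the slides: size in feet^2 (x) and price in $1000s (y)
    x = np.array([2104.0, 1416.0, 1534.0, 852.0])
    y = np.array([460.0, 232.0, 315.0, 178.0])

    def cost(theta0, theta1, x, y):
        """Squared-error cost J(theta0, theta1) = (1/2m) * sum_i (h(x_i) - y_i)^2."""
        m = len(y)
        h = theta0 + theta1 * x                 # hypothesis evaluated on every example
        return np.sum((h - y) ** 2) / (2 * m)

    # Evaluate J for a few (arbitrary) parameter choices and compare
    for t0, t1 in [(0.0, 0.0), (0.0, 0.2), (50.0, 0.1)]:
        print(t0, t1, cost(t0, t1, x, y))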
Gradient Descent
Have some function J(θ₀, θ₁).
Want min over θ₀, θ₁ of J(θ₀, θ₁).

Outline:
• Start with some θ₀, θ₁ (e.g. θ₀ = 0, θ₁ = 0)
• Keep changing θ₀, θ₁ to reduce J(θ₀, θ₁), until we hopefully end up at a minimum

[Figures: surface plots of J(θ₀, θ₁); depending on where the descent starts, the downhill path can end up in different local minima.]
Gradient descent algorithm:

    repeat until convergence {
        θ_j := θ_j − α (∂/∂θ_j) J(θ₀, θ₁)    (for j = 0 and j = 1)
    }

Correct (simultaneous update):
    temp0 := θ₀ − α (∂/∂θ₀) J(θ₀, θ₁)
    temp1 := θ₁ − α (∂/∂θ₁) J(θ₀, θ₁)
    θ₀ := temp0
    θ₁ := temp1

Incorrect (sequential update):
    temp0 := θ₀ − α (∂/∂θ₀) J(θ₀, θ₁)
    θ₀ := temp0
    temp1 := θ₁ − α (∂/∂θ₁) J(θ₀, θ₁)    (this already uses the updated θ₀)
    θ₁ := temp1
Gradient descent algorithm: in the update θ_j := θ_j − α (∂/∂θ_j) J(θ), α is the learning rate.
If α is too small, gradient descent can be slow.
If α is too large, gradient descent can overshoot the minimum; it may fail to converge, or even diverge.
At a local optimum the derivative is zero, so the update leaves the current value of θ unchanged.
Gradient descent can converge to a local minimum even with the learning rate α fixed: as we approach a local minimum, the derivative shrinks and gradient descent automatically takes smaller steps, so there is no need to decrease α over time.
Gradient Descent:
for Linear Regression
Gradient descent algorithm applied to the linear regression model, h_θ(x) = θ₀ + θ₁x with J(θ₀, θ₁) = (1/2m) Σ_i (h_θ(x^(i)) − y^(i))². Computing the partial derivatives gives:

    repeat until convergence {
        θ₀ := θ₀ − α (1/m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))
        θ₁ := θ₁ − α (1/m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i)) x^(i)
    }

update θ₀ and θ₁ simultaneously.
[Figures: successive gradient-descent iterations shown both as hypothesis lines h_θ(x) over the housing data (for the current θ₀, θ₁) and as points moving across the contour plot of J(θ₀, θ₁) towards its minimum.]
“Batch” Gradient Descent
“Batch”: Each step of gradient descent
uses all the training examples.
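An illustrative sketch of batch gradient descent for the one-variable model on the housing data (assumptions: the 1/(2m) cost convention, an arbitrary learning rate and iteration count, and a scaled feature):

    import numpy as np

    x = np.array([2104.0, 1416.0, 1534.0, 852.0])      # size in feet^2
    y = np.array([460.0, 232.0, 315.0, 178.0])         # price in $1000s

    # Scale the feature so that a simple fixed learning rate works
    # (feature scaling is discussed later in these slides)
    x = (x - x.mean()) / x.std()

    theta0, theta1 = 0.0, 0.0
    alpha, m = 0.1, len(y)

    for _ in range(500):                               # fixed iteration budget for the sketch
        h = theta0 + theta1 * x                        # predictions on ALL m examples (batch)
        grad0 = np.sum(h - y) / m                      # dJ/dtheta0
        grad1 = np.sum((h - y) * x) / m                # dJ/dtheta1
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1   # simultaneous update

    print(theta0, theta1)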
Gradient Descent
• The gradient is computed over all the samples in the dataset
• This may be very slow
• Can we do something more efficient?
Stochastic Gradient Descent (SGD)
• For large training sets, evaluating the gradient over all samples may be expensive
• Stochastic or "online" gradient descent approximates the true gradient by the gradient at a single example

Pseudocode:
- Choose an initial vector of parameters θ^(0) and a learning rate α
- Repeat until convergence:
    • Randomly shuffle the examples in the training set
    • For k = 1, 2, …, m, do:
        θ^(i+1) = θ^(i) − α ∇J^(k)(θ^(i))
      where J^(k) is the cost on the k-th example alone and i counts the parameter updates
• Normally preferable: mini-batch gradient descent (see the sketch below)
    Consider a mini-batch of examples at each step
    This normally results in smoother convergence
    This is normally faster, thanks to vectorization libraries
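A rough NumPy sketch of the pseudocode above, extended to mini-batches (assumptions: X is a design matrix that already includes the x₀ = 1 column; α, batch size and epoch count are arbitrary):

    import numpy as np

    def minibatch_sgd(X, y, alpha=0.01, batch_size=2, epochs=100, seed=0):
        """Mini-batch SGD for linear regression; batch_size=1 gives plain SGD."""
        rng = np.random.default_rng(seed)
        m, n = X.shape
        theta = np.zeros(n)
        for _ in range(epochs):
            perm = rng.permutation(m)                       # randomly shuffle the examples
            for start in range(0, m, batch_size):
                idx = perm[start:start + batch_size]
                Xb, yb = X[idx], y[idx]
                grad = Xb.T @ (Xb @ theta - yb) / len(idx)  # gradient on this mini-batch only
                theta -= alpha * grad                       # theta := theta - alpha * grad J_k(theta)
        return theta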
Gradient Descent:
for Multiple Variables
Hypothesis: h_θ(x) = θᵀx = θ₀x₀ + θ₁x₁ + … + θₙxₙ (with x₀ = 1)
Parameters: θ = (θ₀, θ₁, …, θₙ)
Cost function: J(θ) = (1/2m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))²
Gradient descent:

    repeat {
        θ_j := θ_j − α (∂/∂θ_j) J(θ)
    }
    (simultaneously update for every j = 0, …, n)
Previously (n = 1):

    repeat {
        θ₀ := θ₀ − α (1/m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))
        θ₁ := θ₁ − α (1/m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i)) x^(i)
    }
    (simultaneously update θ₀ and θ₁)

New algorithm (n ≥ 1):

    repeat {
        θ_j := θ_j − α (1/m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i)) x_j^(i)
    }
    (simultaneously update θ_j for j = 0, …, n)
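With the design matrix X (one row per example, first column x₀ = 1), the update of all θ_j can be written in one vectorized line; a sketch under the same 1/(2m) convention:

    import numpy as np

    def gradient_descent(X, y, alpha=0.01, iterations=1000):
        """Batch gradient descent for multivariate linear regression.

        X: (m, n+1) design matrix with a leading column of ones (x0 = 1); y: (m,) targets.
        """
        m, n = X.shape
        theta = np.zeros(n)
        for _ in range(iterations):
            grad = X.T @ (X @ theta - y) / m    # (1/m) sum_i (h(x_i) - y_i) x_i, all j at once
            theta -= alpha * grad               # simultaneous update of every theta_j
        return theta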
Feature Scaling
Idea: make sure features are on a similar scale.
E.g. x₁ = size (0–2000 feet²), x₂ = number of bedrooms (1–5). On the raw scale the contours of J(θ) are very elongated and gradient descent zig-zags slowly; after scaling the contours become more circular and it converges faster.
Feature Scaling
Get every feature into approximately a −1 ≤ x_j ≤ 1 range.

Mean normalization: replace x_j with x_j − μ_j so that features have approximately zero mean (do not apply this to x₀ = 1). In practice x_j := (x_j − μ_j) / s_j, where μ_j is the mean of feature j on the training set and s_j is its range (max − min) or its standard deviation. E.g. x₁ := (size − average size) / (range of sizes).
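A small sketch of mean normalization (assumed implementation; the standard deviation is used as s_j here, the range works as well):

    import numpy as np

    def scale_features(X):
        """Mean-normalize every column: x_j := (x_j - mu_j) / s_j (s_j = std here)."""
        mu = X.mean(axis=0)
        sigma = X.std(axis=0)
        return (X - mu) / sigma, mu, sigma      # keep mu, sigma to scale new inputs the same way

    # Housing features from the earlier table: size, bedrooms, floors, age
    X = np.array([[2104.0, 5.0, 1.0, 45.0],
                  [1416.0, 3.0, 2.0, 40.0],
                  [1534.0, 3.0, 2.0, 30.0],
                  [ 852.0, 2.0, 1.0, 36.0]])
    X_scaled, mu, sigma = scale_features(X)
    X_design = np.hstack([np.ones((X_scaled.shape[0], 1)), X_scaled])   # prepend x0 = 1, unscaled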
Gradient Descent:
Learning Rate
Gradient descent update: θ_j := θ_j − α (∂/∂θ_j) J(θ). Two practical questions:
- "Debugging": how to make sure gradient descent is working correctly.
- How to choose the learning rate α.
Making sure gradient descent is working correctly.
Plot J(θ) against the number of iterations: in a working run, J(θ) decreases on every iteration and flattens out as it converges. [Figure: J(θ) vs. number of iterations (0–400), decreasing and levelling off.]
Example automatic convergence test: declare convergence if J(θ) decreases by less than some small threshold ε (e.g. 10⁻³) in one iteration.
Making sure gradient descent is working correctly.
[Figures: J(θ) increasing with the number of iterations, or oscillating up and down.] In these cases gradient descent is not working: use a smaller α.
- For sufficiently small α, J(θ) should decrease on every iteration.
- But if α is too small, gradient descent can be slow to converge.
Summary:
- If α is too small: slow convergence.
- If α is too large: J(θ) may not decrease on every iteration; it may not converge.
To choose α, try a range of values spaced roughly by factors of 3–10 (e.g. …, 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1, …) and pick the largest value for which J(θ) still decreases steadily.
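One possible way (an assumed sketch, not prescribed by the slides) to record J(θ) per iteration and apply the convergence test while trying several learning rates:

    import numpy as np

    def run_gd(X, y, alpha, iterations=400, eps=1e-3):
        """Gradient descent that records J per iteration and applies the convergence test."""
        m, n = X.shape
        theta = np.zeros(n)
        history = []
        for _ in range(iterations):
            residual = X @ theta - y
            history.append(residual @ residual / (2 * m))      # J(theta) at this iteration
            theta -= alpha * (X.T @ residual) / m
            if len(history) > 1 and 0 <= history[-2] - history[-1] < eps:
                break                                          # J decreased by less than eps
        return theta, history

    # Try learning rates roughly a factor of 3 apart and compare the J curves, e.g.:
    # for alpha in [0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1.0]:
    #     theta, history = run_gd(X_design, y, alpha)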
Linear Regression:
Features and Polynomial regression
Housing prices prediction: features can be combined or created, e.g. instead of using a plot's frontage and depth as two separate features, define the single feature area = frontage × depth and regress on that.
Polynomial regression: treat powers of a feature as additional features and fit a linear model in them, e.g. h_θ(x) = θ₀ + θ₁x + θ₂x² (quadratic) or h_θ(x) = θ₀ + θ₁x + θ₂x² + θ₃x³ (cubic). Feature scaling then becomes important, since x, x² and x³ have very different ranges. [Figure: price (y) vs. size (x) with polynomial fits.]
Choice of features: other nonlinear features are possible too, e.g. h_θ(x) = θ₀ + θ₁x + θ₂√x, which keeps increasing with x instead of eventually bending back down as a quadratic does. [Figure: price (y) vs. size (x) with the fitted curve.]
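A sketch of polynomial regression with NumPy's polyfit (which the slides mention later for least squares), using the (size, price) points from the earlier table and an arbitrarily chosen degree:

    import numpy as np

    # (size, price) points from the earlier housing table
    size = np.array([852.0, 1416.0, 1534.0, 2104.0])
    price = np.array([178.0, 232.0, 315.0, 460.0])

    # Quadratic fit h(x) = theta2*x^2 + theta1*x + theta0 (polyfit returns highest degree first)
    coeffs = np.polyfit(size, price, deg=2)
    h = np.poly1d(coeffs)
    print(h(1800.0))        # prediction for an 1800 feet^2 house under this quadratic model

    # Equivalently, x and x^2 can be treated as two features of a multivariate linear model;
    # feature scaling then matters, because x and x^2 have very different ranges.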
Dangers of (Polynomial) Regression
Overfitting and underfitting: a polynomial of very high degree can pass through every training point yet generalize poorly to new inputs (overfitting), while a model that is too simple cannot capture the underlying trend (underfitting).
Normal Equation
Gradient Descent

    Θ := Θ − α ∇_Θ J,    where  ∇_Θ J = [ ∂J/∂Θ₀, …, ∂J/∂Θₙ ]ᵀ ∈ ℝⁿ⁺¹

Normal equation: what about solving for Θ analytically?
Useful notation
• For f : ℝ^{m×n} → ℝ (so f(A) ∈ ℝ for A ∈ ℝ^{m×n}), define ∇_A f(A) as the m×n matrix of partial derivatives ∂f/∂A_{ij}.
• Trace: if A ∈ ℝ^{n×n}, tr A = Σ_{i=1}^{n} A_{ii}.
Facts
• Some facts of matrix derivatives (without proof):

    tr AB = tr BA
    tr ABC = tr CAB = tr BCA
    If f(A) = tr AB, then ∇_A tr AB = Bᵀ
    tr A = tr Aᵀ
    If a ∈ ℝ, tr a = a
    ∇_A tr ABAᵀC = CAB + CᵀABᵀ
m examples (x^(i), y^(i)); n features.

Cost function in matrix form: let X ∈ ℝ^{m×(n+1)} be the design matrix whose i-th row is (x^(i))ᵀ (with x₀^(i) = 1), and let y = [y^(1), …, y^(m)]ᵀ. Then

    XΘ − y = [ h(x^(1)) − y^(1) ; … ; h(x^(m)) − y^(m) ],    where Θᵀx^(i) = Σ_{j=0}^{n} Θ_j x_j^(i)

Recall that zᵀz = Σ_i z_i². Hence

    (1/2) (XΘ − y)ᵀ (XΘ − y) = (1/2) Σ_{i=1}^{m} (h_Θ(x^(i)) − y^(i))² = J(Θ)
Intuition: in 1D, J(θ) is a parabola in θ, so the minimum is attained where dJ/dθ = 0. In general, we solve for Θ analytically by setting ∇_Θ J(Θ) = 0.

Expanding J(Θ):

    J(Θ) = (1/2) (XΘ − y)ᵀ(XΘ − y) = (1/2) (ΘᵀXᵀXΘ − ΘᵀXᵀy − yᵀXΘ + yᵀy)

Taking the gradient (recall ∇_A tr ABAᵀC = CAB + CᵀABᵀ and ∇_A tr AB = Bᵀ):

    ∇_Θ J(Θ) = XᵀXΘ − Xᵀy

Normal equation: setting ∇_Θ J(Θ) = 0 gives

    XᵀXΘ = Xᵀy    ⟹    Θ = (XᵀX)⁻¹ Xᵀy
Examples: m = 4 training examples, n = 4 features.

    x₀   Size (feet²)   Number of bedrooms   Number of floors   Age of home (years)   Price ($1000)
    1    2104           5                    1                  45                    460
    1    1416           3                    2                  40                    232
    1    1534           3                    2                  30                    315
    1    852            2                    1                  36                    178

X is the 4×5 matrix formed by the feature columns (including x₀ = 1), y is the 4×1 vector of prices, and Θ = (XᵀX)⁻¹Xᵀy.
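A sketch of the normal equation on this example (assumption: the pseudo-inverse is used instead of a plain inverse, because with m = 4 examples and n + 1 = 5 parameters XᵀX is singular, the case discussed below):

    import numpy as np

    # Design matrix X (with the x0 = 1 column) and targets y from the table above
    X = np.array([[1.0, 2104.0, 5.0, 1.0, 45.0],
                  [1.0, 1416.0, 3.0, 2.0, 40.0],
                  [1.0, 1534.0, 3.0, 2.0, 30.0],
                  [1.0,  852.0, 2.0, 1.0, 36.0]])
    y = np.array([460.0, 232.0, 315.0, 178.0])

    # Literal normal equation, with the pseudo-inverse since X^T X is singular here
    theta_ne = np.linalg.pinv(X.T @ X) @ (X.T @ y)

    # Numerically preferable in practice: solve the least-squares problem directly
    theta_ls, *_ = np.linalg.lstsq(X, y, rcond=None)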
Comparison, with m training examples and n features:

Gradient Descent:
• Need to choose α.
• Needs many iterations.
• Works well even when n is large.

Normal Equation:
• No need to choose α.
• No need to iterate.
• Need to compute (XᵀX)⁻¹, roughly an O(n³) operation.
• Slow if n is very large.
Normal equation
- What if XᵀX is non-invertible (singular / degenerate)?

Common causes:
• Redundant features (linearly dependent), e.g. x₁ = size in feet² and x₂ = size in m² (then x₁ ≈ 10.76 · x₂).
• Too many features, e.g. m ≤ n.
Remedies: delete some features, or use regularization; numerically, a pseudo-inverse (pinv) still returns a solution (see the sketch below).
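A small illustration (with an assumed conversion factor) of the redundant-feature case: size in m² is just a rescaling of size in feet², so XᵀX is singular, yet the pseudo-inverse still returns a usable minimum-norm solution:

    import numpy as np

    size_ft2 = np.array([2104.0, 1416.0, 1534.0, 852.0])
    size_m2 = size_ft2 * 0.092903                 # exactly a rescaling of size_ft2
    y = np.array([460.0, 232.0, 315.0, 178.0])

    X = np.column_stack([np.ones_like(size_ft2), size_ft2, size_m2])
    print(np.linalg.matrix_rank(X.T @ X))         # 2 < 3, so X^T X is singular
    theta = np.linalg.pinv(X.T @ X) @ (X.T @ y)   # the pseudo-inverse still returns a solution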
Linear Regression and Correlation
Summary So Far
▪ Given: Set of known (x,y) points
▪ Find: function f(x)=ax+b that “best fits” the
known points, i.e., f(x) is close to y
▪ Use function to predict y values for new x’s
➢ Also can be used to test correlation
Correlation and Causation
Correlation – Values track each other
• Height and Shoe Size
• Grades and Entrance Exam Scores
Causation – One value directly influences another
• Education Level → Starting Salary
• Temperature → Cold Drink Sales
Correlation and Causation (from Overview)
Correlation – Values track each other
• Height and Shoe Size
• Grades and Entrance Exam Scores
Find: function f(x)=ax+b that “best fits” the
known points, i.e., f(x) is close to y
The better the function
fits the points, the more
correlated x and y are
Regression and Correlation
The better the function fits the points,
the more correlated x and y are
▪ Linear functions only
▪ Correlation – Values track each other
Positively – when one goes up the other goes up
▪ Also negative correlation
When one goes up the other goes down
• Latitude versus temperature
• Car weight versus gas mileage
• Class absences versus final grade
Calculating Simple Linear Regression
Method of least squares
▪ Given a point and a line, the error for the point
is its vertical distance d from the line, and the
squared error is d 2
▪ Given a set of points and a line, the sum of squared errors (SSE) is the sum of the squared errors for all the points
▪ Goal: Given a set of points, find the line that
minimizes the SSE
Calculating Simple Linear Regression
Method of least squares

[Figure: five data points and a candidate line, with the vertical distances d₁ … d₅ from each point to the line marked.]

SSE = d₁² + d₂² + d₃² + d₄² + d₅²
Calculating Simple Linear Regression
Method of least squares

Goal: find the line that minimizes the SSE = d₁² + d₂² + d₃² + d₄² + d₅².

How to find it:
- Gradient descent
- Normal equation
- Software packages, e.g. NumPy polyfit (see the sketch below)
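A sketch of the "software packages" route with NumPy's polyfit (degree 1 gives the least-squares line; the data values are made up for illustration):

    import numpy as np

    # Made-up (x, y) points for illustration
    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])

    a, b = np.polyfit(x, y, deg=1)        # least-squares line f(x) = a*x + b
    residuals = y - (a * x + b)           # signed vertical distances d_i
    sse = np.sum(residuals ** 2)          # the SSE that this line minimizes
    print(a, b, sse)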
Measuring Correlation
More help from software packages…
Pearson's Product Moment Correlation (PPMC)
• "Pearson coefficient", "correlation coefficient"
• A value r between −1 and 1:
    1    maximum positive correlation
    0    no correlation
    −1   maximum negative correlation
• Swapping the x and y axes yields the same value
Coefficient of determination ("the better the function fits the points, the more correlated x and y are")
• r², R², "R squared"
• Measures the fit of any line/curve to a set of points
• Usually between 0 and 1
• For simple linear regression, R² = Pearson²
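A sketch of computing the Pearson coefficient and R² with NumPy (scipy.stats.pearsonr would work as well; the data points are the made-up ones from the sketch above):

    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])

    r = np.corrcoef(x, y)[0, 1]                   # Pearson coefficient, between -1 and 1

    # Coefficient of determination of the least-squares line
    a, b = np.polyfit(x, y, deg=1)
    ss_res = np.sum((y - (a * x + b)) ** 2)       # residual sum of squares (SSE)
    ss_tot = np.sum((y - y.mean()) ** 2)          # total sum of squares
    r_squared = 1.0 - ss_res / ss_tot
    print(r, r ** 2, r_squared)                   # for simple linear regression, R^2 = r^2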
Correlation Game
http://aionet.eu/corguess (*)
Try to get:
Right answers ≥ 10, Guesses ≤ Right answers × 2
Anti-cheating: Pictures = Right answers + 1
(*) Improved version of the "Wilderdom correlation guessing game", thanks to participant Marcin Piotrowski from Poland
Other correlation games:
http://guessthecorrelation.com/
http://www.rossmanchance.com/applets/GuessCorrelation.html
http://www.istics.net/Correlations/
Locally-Weighted Regression
Recap
m examples (x^(i), y^(i)); n features; y^(i) ∈ ℝ; x₀ = 1.

    h_Θ(x) = Σ_{j=0}^{n} Θ_j x_j = Θᵀx        J(Θ) = (1/2) Σ_{i=1}^{m} (h_Θ(x^(i)) − y^(i))²
Recap: choice of features, e.g.
    Θ₀ + Θ₁x
    Θ₀ + Θ₁x + Θ₂x²
    Θ₀ + Θ₁x + Θ₂√x + Θ₃ log(x)
[Figure: price (y) vs. size (x) with the corresponding fitted curves.]
Recap: dangers of polynomial regression
Overfitting and Underfitting
Locally-weighted regression
• "Parametric learning algorithm": a fixed set of parameters is fit to the data.
• "Non-parametric learning algorithm": the number of parameters grows with the data (more data to keep in memory).
• Locally-weighted regression, also named Loess or Lowess, is non-parametric.
[Figure: data y vs. x with a curve fitted locally around each query point.]
Locally-weighted regression: to make a prediction at a query point x, fit Θ to minimize Σᵢ w^(i) (y^(i) − Θᵀx^(i))², where the weights w^(i) = exp(−(x^(i) − x)² / (2τ²)) are close to 1 for training points near x and close to 0 for points far away (τ is the bandwidth parameter); then output Θᵀx. The fit is redone for every query point, which is why the whole training set must be kept in memory.
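A minimal sketch of locally-weighted linear regression with the Gaussian weights above (the bandwidth τ and the synthetic data are assumptions for illustration; library implementations such as statsmodels' lowess differ in details):

    import numpy as np

    def lwr_predict(x_query, x, y, tau=0.5):
        """Locally-weighted linear regression prediction at a single query point."""
        w = np.exp(-(x - x_query) ** 2 / (2.0 * tau ** 2))   # ~1 near x_query, ~0 far away
        X = np.column_stack([np.ones_like(x), x])            # [1, x] design matrix
        W = np.diag(w)
        # Weighted least squares: minimize sum_i w_i * (y_i - theta^T x_i)^2
        theta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
        return theta[0] + theta[1] * x_query                 # the fit is redone per query point

    rng = np.random.default_rng(0)
    x = np.sort(rng.uniform(0.0, 10.0, 50))
    y = np.sin(x) + 0.1 * rng.standard_normal(50)
    y_hat = np.array([lwr_predict(q, x, y, tau=0.5) for q in x])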
Probabilistic Interpretation of Least Squares
Why Least Squares
• Assume y^(i) = Θᵀx^(i) + ε^(i), where ε^(i) captures unmodelled effects and random noise, and the ε^(i) are assumed i.i.d. Gaussian: ε^(i) ~ N(0, σ²).
Why Least Squares
• Likelihood of the parameters: L(Θ) = P(y⃗ | X; Θ) = Π_{i=1}^{m} P(y^(i) | x^(i); Θ), by the independence of the ε^(i).
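A sketch of the standard derivation that the slide builds towards (assuming i.i.d. Gaussian noise ε^(i) ~ N(0, σ²), as above), written out in LaTeX:

    % Under the Gaussian noise assumption,
    % p(y^{(i)} \mid x^{(i)}; \Theta)
    %   = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\Big(-\frac{(y^{(i)} - \Theta^{T} x^{(i)})^{2}}{2\sigma^{2}}\Big).
    \begin{align*}
      L(\Theta) &= \prod_{i=1}^{m} p\big(y^{(i)} \mid x^{(i)}; \Theta\big) \\
      \ell(\Theta) = \log L(\Theta)
        &= m \log\frac{1}{\sqrt{2\pi}\,\sigma}
         - \frac{1}{2\sigma^{2}} \sum_{i=1}^{m} \big(y^{(i)} - \Theta^{T} x^{(i)}\big)^{2}
    \end{align*}
    % Maximizing \ell(\Theta) over \Theta is therefore the same as minimizing
    % \tfrac{1}{2}\sum_{i}(y^{(i)} - \Theta^{T}x^{(i)})^{2} = J(\Theta):
    % least squares is maximum likelihood under i.i.d. Gaussian noise
    % (the value of \sigma does not affect the argmax).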
References
• Chapter 3.1 in [Bishop, 2006. Pattern Recognition and Machine Learning]
Thank you
Acknowledgements: slides and material from Andrew Ng, Eric Xing, Matthew R. Gormley, Jessica Wu