Introduction To Machine Learning: Slides Credit: CMU AI, Zico Kolter, Pat Virtue
Linear regression
But it is relatively easy to record past days of consumption, plus additional features
that affect consumption (e.g., weather)
Peak_Demand ≈ 𝜃1 ⋅ High_Temperature + 𝜃2
If we know the high temperature will be 72 degrees (ignoring for now that this is
also a prediction), then we can predict peak demand to be:
Predicted_Peak_Demand = 𝜃1 ⋅ 72 + 𝜃2 = 1.821 GW
Predicted_Peak_Demand = 𝜃1 ⋅ High_Temperature + 𝜃2
Many possibilities, but a natural objective is to minimize some difference between this line
and the observed data, e.g., the squared loss:
E(θ) = ∑_{i∈days} (Predicted_Peak_Demand^(i) − Peak_Demand^(i))²
     = ∑_{i∈days} (θ1 ⋅ High_Temperature^(i) + θ2 − Peak_Demand^(i))²
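As a concrete illustration, here is a minimal sketch of this objective in Python/NumPy; the data arrays and the θ values are made-up placeholders, not the lecture's actual dataset.

import numpy as np

def squared_error(theta1, theta2, high_temp, peak_demand):
    """E(theta): sum of squared prediction errors over the observed days."""
    predicted = theta1 * high_temp + theta2
    return np.sum((predicted - peak_demand) ** 2)

# hypothetical observations: daily high temperature (°F) and peak demand (GW)
high_temp = np.array([68.0, 75.0, 82.0, 90.0])
peak_demand = np.array([1.60, 1.85, 2.05, 2.40])

print(squared_error(0.03, -0.5, high_temp, peak_demand))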
How do we find parameters?
How do we find the parameters 𝜃1, 𝜃2 that minimize the function
E(θ) = E(θ1, θ2) = ∑_{i∈days} (θ1 ⋅ High_Temperature^(i) + θ2 − Peak_Demand^(i))²
                 ≡ ∑_{i∈days} (θ1 ⋅ x^(i) + θ2 − y^(i))²
[Figure: Peak_Demand (y) versus High_Temperature (x), with a candidate line of slope θ1 and intercept θ2]
[Figure: repeat of the previous slide, now also showing the error surface E(θ) as a function of (θ1, θ2)]
Gradient descent
How do we find the parameters 𝜃1, 𝜃2 that minimize:
E(θ) = E(θ1, θ2) = ∑_{i∈days} (θ1 ⋅ x^(i) + θ2 − y^(i))²
[Figure: the error surface E(θ) over (θ1, θ2) and its contour plot]
Gradient descent
To find a good value of θ, we can repeatedly take steps in the direction of the
negative derivative with respect to each parameter
Repeat:
  θ1 := θ1 − α ⋅ ∂E(θ1, θ2)/∂θ1
  θ2 := θ2 − α ⋅ ∂E(θ1, θ2)/∂θ2
where E(θ1, θ2) ≡ ∑_{i∈days} (θ1 ⋅ x^(i) + θ2 − y^(i))²
B. f(x) = (3 − 5x)²,  df/dx = ?
C. f(x, z) = 2x + 3z + 5x²z,  ∂f/∂z = ?
D. f(x, z) = 2x + 3z + 5x²z,  ∂f/∂x = ?
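(Aside, not from the slides: one way to check answers like these is symbolic differentiation with sympy; the snippet below is just a sanity check.)

import sympy as sp

x, z = sp.symbols('x z')

# B: f(x) = (3 - 5x)^2
print(sp.diff((3 - 5*x)**2, x))      # 2*(3 - 5*x)*(-5), i.e. 50*x - 30

# C and D: f(x, z) = 2x + 3z + 5*x^2*z
f = 2*x + 3*z + 5*x**2*z
print(sp.diff(f, z))                 # 5*x**2 + 3
print(sp.diff(f, x))                 # 10*x*z + 2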
Computing the derivatives
Assume we just have m = 2 points, (x^(1), y^(1)) and (x^(2), y^(2)).
[Figure: the two points (x^(1), y^(1)) and (x^(2), y^(2)) plotted in the (x, y) plane]
∂E(θ)/∂θ1 = ∂/∂θ1 ∑_{i=1}^{m} (θ1 ⋅ x^(i) + θ2 − y^(i))²
Computing the derivatives
What are the derivatives of the error function with respect to each parameter 𝜃1 and 𝜃2?
∂E(θ)/∂θ1 = ∂/∂θ1 ∑_{i∈days} (θ1 ⋅ x^(i) + θ2 − y^(i))²
          = ∑_{i∈days} ∂/∂θ1 (θ1 ⋅ x^(i) + θ2 − y^(i))²
          = ∑_{i∈days} 2 (θ1 ⋅ x^(i) + θ2 − y^(i)) ⋅ ∂/∂θ1 (θ1 ⋅ x^(i))
          = ∑_{i∈days} 2 (θ1 ⋅ x^(i) + θ2 − y^(i)) ⋅ x^(i)

∂E(θ)/∂θ2 = ∑_{i∈days} 2 (θ1 ⋅ x^(i) + θ2 − y^(i))
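To build confidence in these formulas, a quick numerical check against finite differences can be run; the data below are made up for illustration.

import numpy as np

# made-up data: x = daily high temperature, y = peak demand
x = np.array([68.0, 75.0, 82.0, 90.0])
y = np.array([1.60, 1.85, 2.05, 2.40])

def E(theta1, theta2):
    return np.sum((theta1 * x + theta2 - y) ** 2)

def grad(theta1, theta2):
    r = theta1 * x + theta2 - y                 # residuals
    return np.sum(2 * r * x), np.sum(2 * r)     # dE/dtheta1, dE/dtheta2

t1, t2, eps = 0.05, -1.5, 1e-6
g1, g2 = grad(t1, t2)
# central differences should closely agree with the analytic derivatives
print(g1, (E(t1 + eps, t2) - E(t1 - eps, t2)) / (2 * eps))
print(g2, (E(t1, t2 + eps) - E(t1, t2 - eps)) / (2 * eps))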
Gradient descent
To find a good value of θ, we can repeatedly take steps in the direction of the
negative derivative with respect to each parameter
Repeat:
  θ1 := θ1 − α ⋅ ∂E(θ1, θ2)/∂θ1
  θ2 := θ2 − α ⋅ ∂E(θ1, θ2)/∂θ2
Substituting in the derivatives:
Repeat:
  θ1 := θ1 − α ∑_{i∈days} 2 (θ1 ⋅ x^(i) + θ2 − y^(i)) ⋅ x^(i)
  θ2 := θ2 − α ∑_{i∈days} 2 (θ1 ⋅ x^(i) + θ2 − y^(i))
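Putting the two updates together, a minimal sketch of this loop might look as follows; the data, initial θ, step size, and iteration count are illustrative assumptions (the slides run with α = 0.001 on their own dataset).

import numpy as np

# illustrative data: x = daily high temperature (°F), y = peak demand (GW)
x = np.array([68.0, 75.0, 82.0, 90.0])
y = np.array([1.60, 1.85, 2.05, 2.40])

theta1, theta2, alpha = 0.0, 0.0, 1e-5   # small step size keeps these updates stable
for it in range(1000):
    r = theta1 * x + theta2 - y                  # residuals
    theta1 -= alpha * np.sum(2 * r * x)          # step along -dE/dtheta1
    theta2 -= alpha * np.sum(2 * r)              # step along -dE/dtheta2

print(theta1, theta2, np.sum((theta1 * x + theta2 - y) ** 2))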
Gradient descent – Iterations 1–10
[Figures: contour plots of E(θ) over (θ1, θ2) tracing the gradient-descent trajectory with step size α = 0.001; the error drops from E(θ) = 1427.53 at the starting point through 292.18, 64.31, 18.58, and 9.40 to 7.09 by iteration 10]
Fitted line in “original” coordinates
[Figure: the fitted line shown in the original (High_Temperature, Peak_Demand) coordinates, alongside the E(θ) contours over (θ1, θ2)]
Extensions
What if we want to add additional features, e.g. day of week, instead of just
temperature?
What if we want to use a different loss function instead of squared error (e.g.,
absolute error)?
We can easily reason about all these things by adopting some additional notation…
Outline
Least squares regression: a simple example and gradient descent
Prediction
[Diagram: a new input x^(new) is passed to the hypothesis function hθ, producing the predicted output hθ(x^(new))]
Terminology
Input features: x^(i) ∈ ℝⁿ, i = 1, …, m
  E.g.: x^(i) = (High_Temperature^(i), Is_Weekday^(i), 1)
Outputs: y^(i) ∈ 𝒴, i = 1, …, m
  E.g.: y^(i) ∈ ℝ = Peak_Demand^(i)
Model parameters: θ ∈ ℝⁿ
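To make this notation concrete, here is a small sketch of how the features and outputs might be laid out as NumPy arrays; the particular numbers are invented, except that each row follows the (High_Temperature, Is_Weekday, 1) layout used later in the lecture.

import numpy as np

# one row per day: x^(i) = (High_Temperature, Is_Weekday, 1)
X = np.array([
    [77.0, 1.0, 1.0],
    [80.0, 0.0, 1.0],
    [68.0, 1.0, 1.0],
])
# y^(i) = Peak_Demand (GW) for the same days; values made up for illustration
y = np.array([2.08, 2.00, 1.65])

m, n = X.shape   # m examples, n features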
Virtually every machine learning algorithm has this form; we just need to specify:
• What is the hypothesis function?
• What is the loss function?
• How do we solve the optimization problem?
Example machine learning algorithms
Note: we (machine learning researchers) have not been consistent in naming conventions;
many machine learning algorithms actually only specify some of these three elements
• Least squares: {linear hypothesis, squared loss, (usually) analytical solution}
• Linear regression: {linear hypothesis, *, *}
• Support vector machine: {linear or kernel hypothesis, hinge loss, *}
• Neural network: {composed non-linear function, *, (usually) gradient descent}
• Decision tree: {hierarchical axis-aligned half-planes, *, greedy optimization}
• Naïve Bayes: {linear hypothesis, joint probability under certain independence assumptions, analytical solution}
Outline
Least squares regression: a simple example and gradient descent
Setup:
• Linear hypothesis function: hθ(x) = ∑_{j=1}^{n} θj ⋅ xj
• Squared error loss: ℓ(hθ(x), y) = (hθ(x) − y)²
• Resulting machine learning optimization problem: minimize over θ the objective ∑_{i=1}^{m} (hθ(x^(i)) − y^(i))²
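A minimal NumPy sketch of this hypothesis function; the θ values are illustrative, chosen close to the fitted parameters printed later in the lecture.

import numpy as np

def h(theta, x):
    """Linear hypothesis h_theta(x) = sum_j theta_j * x_j."""
    return float(np.dot(theta, x))

# x = (High_Temperature, Is_Weekday, 1); theta roughly matches the later fit
print(h(np.array([0.047, 0.225, -1.80]), np.array([77.0, 1.0, 1.0])))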
Derivative of the least squares objective
Compute the partial derivative with respect to an arbitrary model parameter θj:
∂E(θ)/∂θj = ∑_{i=1}^{m} 2 (hθ(x^(i)) − y^(i)) ⋅ xj^(i)
Gradient descent algorithm
1. Initialize θk := 0, k = 1, …, n
2. Repeat:
   • For k = 1, …, n:
       θk := θk − α ∑_{i=1}^{m} 2 (hθ(x^(i)) − y^(i)) ⋅ xk^(i)
Note: do not actually implement it like this; you'll want to use the matrix/vector
notation we will cover soon
Outline
Least squares regression: a simple example and gradient descent
Gradient in vector notation
We can actually simplify the gradient computation (both notationally and
computationally) substantially using matrix/vector notation:
∇θ f(θ) = (∂f(θ)/∂θ1, …, ∂f(θ)/∂θn)ᵀ ∈ ℝⁿ
Putting things in this form also makes it clearer how to analytically find the
optimal solution for least squares
Matrix notation, one level deeper
Let's define the matrices
X = [ x^(1)ᵀ ; x^(2)ᵀ ; ⋮ ; x^(m)ᵀ ] ∈ ℝ^{m×n},   y = (y^(1), y^(2), …, y^(m))ᵀ ∈ ℝ^m
i.e., the i-th row of X is x^(i)ᵀ, and y stacks the outputs.
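With X and y stacked this way, the objective can be written as ‖Xθ − y‖², and its gradient takes the 2Xᵀ(Xθ − y) form used in the optimality condition below. A sketch of the vectorized gradient and one descent step (function names are mine, not the slides'):

import numpy as np

def gradient(theta, X, y):
    """Gradient of sum((X @ theta - y)**2) with respect to theta."""
    return 2 * X.T @ (X @ theta - y)

def gd_step(theta, X, y, alpha):
    # one gradient-descent update, written in matrix/vector form
    return theta - alpha * gradient(theta, X, y)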
Solving least squares
Gradient also gives a condition for optimality:
• Gradient must equal zero:
  2Xᵀ(Xθ − y) = 0  ⟹  XᵀX θ = Xᵀy
Compute solution:
import numpy as np
# solve least squares via the normal equations X.T @ X @ theta = X.T @ y
# (assumes X has one row per day with columns [High_Temperature, Is_Weekday, 1],
#  and y holds the corresponding Peak_Demand values)
theta = np.linalg.solve(X.T @ X, X.T @ y)
print(theta)
# [ 0.04747948 0.22462824 -1.80260016]
Make predictions:
# predict on new data: each row is [High_Temperature, Is_Weekday, 1]
Xnew = np.array([[77, 1, 1], [80, 0, 1]])
ypred = Xnew @ theta
print(ypred)
# [ 2.07794778 1.99575797]
Scikit-learn
By far the most popular machine learning library in Python is the scikit-learn library
(http://scikit-learn.org/)
Important: you need to understand the very basics of how these algorithms work in
order to use them effectively
if self.fit_intercept:
    # the last weight corresponds to the constant-ones feature:
    # report it as the intercept and keep the rest as coefficients
    self.intercept_ = self.coef_[-1]
    self.coef_ = self.coef_[:-1]
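For comparison, a minimal sketch of fitting the same kind of model with scikit-learn's LinearRegression; here the constant-ones column is dropped and the intercept is handled by the library, and the data values are made up for illustration.

import numpy as np
from sklearn.linear_model import LinearRegression

# features without the constant column: [High_Temperature, Is_Weekday]
X_feat = np.array([[77.0, 1.0], [80.0, 0.0], [68.0, 1.0]])
y = np.array([2.08, 2.00, 1.65])         # made-up Peak_Demand values (GW)

model = LinearRegression(fit_intercept=True).fit(X_feat, y)
print(model.coef_, model.intercept_)     # slope terms and intercept
print(model.predict(np.array([[77.0, 1.0], [80.0, 0.0]])))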