
Introduction to Machine Learning

Ch2_Lec3: Linear Regression and Gradient Descent

By Kassahun Tamir
Outline
 Review
 Linear Models
 Optimization
 Loss Function
 Gradient Descent
Review
 Supervised learning focuses on making predictions using labeled data to train a model.

 The model learns to identify patterns between the features and the outputs. Once trained, the model can then predict the output value for new, unseen data.

Review
 Regression

Regression is a type of supervised learning that is used to predict continuous values, such as house prices, stock prices, or customer lifetime value. Regression algorithms learn a function that maps from the input features to the output value.

Linear Models

 Building block for many complex machine learning algorithms, including deep neural networks

 Linear models predict the target variable using a linear function of the input features

 Linear models:
   - Linear Regression
   - Logistic Regression

Good Model?!
 Accurate Prediction: Predicted Value ≈ Actual Value

Optimization
 The process of adjusting a model’s internal settings (parameters) to
get the best accuracy

 Involves minimizing the error between the model’s prediction and the
actual data

Example (figures)
Goal of Optimization
 Minimize the Loss (Cost) Function

 Loss Function: a function that simply quantifies the error of the model

 Loss function for Linear Regression: Mean Square Error (MSE)

Linear Regression Example
 Imagine you have data on study hours and test scores of 10 students.
By using linear regression, we will draw a straight line that shows how
much scores tend to increase as study hours go up.

Linear Regression Example (figures: study hours vs. test scores with the fitted line)
Loss Function

MSE = Σ(yi - ŷi)² / n

 yi: actual value for the i-th data point.
 ŷi (pronounced y-hat): predicted value for the i-th data point by the linear regression model.
 n: total number of data points.
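As a quick illustration (a small sketch, not code from the slides), MSE can be computed directly from this definition:

    def mse(actual, predicted):
        """Mean Squared Error: the average of the squared residuals."""
        n = len(actual)
        return sum((yi - yhat) ** 2 for yi, yhat in zip(actual, predicted)) / n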

Loss Function

(Figures: actual value yi vs. predicted value ŷi on the regression line)
Gradient Descent
 Gradient descent is an optimization technique used in machine
learning to minimize the loss function by iteratively descending
(moving in the direction opposite to the gradient) towards the
function’s minimum.
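In code, the whole idea fits in a few lines. The following is a generic sketch (the gradient function, learning rate, and stopping rule are placeholders that the rest of the lecture makes concrete):

    def gradient_descent(grad, theta, learning_rate=0.01, max_iters=1000, tol=0.001):
        """Repeatedly step opposite to the gradient until the step becomes tiny."""
        for _ in range(max_iters):
            step = learning_rate * grad(theta)   # step size = slope x learning rate
            theta = theta - step                 # move against the gradient
            if abs(step) < tol:                  # stop when the step size is very small
                break
        return theta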

Gradient Descent: the previous example

Y=
αx + b

Where,
Y – predicted value
x – input value

α - Slope
b – intercept
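In code, this model is nothing more than a one-line function (an illustrative sketch):

    def predict(x, slope, intercept):
        """Linear model: predicted value for a given input."""
        return slope * x + intercept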

Gradient Descent
 So if our goal is to optimize this model, we have to optimize its parameters, i.e. the slope and the intercept.

Optimization could involve:
- One parameter
- Two parameters

Optimizing One Parameter
 Let's fix the slope (α) and adjust only the intercept.

Assume α = 4.725:
Y = 4.725x + b

 Take a random value for the intercept (b).

 Let's take an initial hypothesis of b = 50:
Y = 4.725x + 50

Optimizing One Parameter (figure)

Model Prediction (figure)
Loss Function: Mean Squared Error
= Σ(yi - ŷi)² / n

Loss Function: Mean Squared Error

Intercept Loss
50 233.463

Loss too large!!

Let’s try again!

Loss Function: Mean Squared Error
Let us update b
b = 60

Loss Function: Mean Squared Error

Intercept Loss
50 233.463
60 128.713
Still large !!

So we continue updating

Loss Function: Mean Squared Error
Let us update b
b = 70

Loss Function: Mean Squared Error
Intercept Loss
50 233.463
60 128.713
70 223.963

Oops!! The loss increased instead of decreasing
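This manual search is easy to reproduce in a short script. The sketch below assumes the x/y lists are the study-hours/test-score pairs that appear in the expanded loss function later in the lecture:

    # Study hours (x) and test scores (y) for the 10 students
    x = [1, 2, 3.5, 6, 8, 2.5, 5.5, 4, 6.5, 6]
    y = [55, 60, 90, 85, 100, 95, 75, 70, 85, 100]

    def loss(intercept, slope=4.725):
        """MSE of the line y = slope*x + intercept over the 10 data points."""
        return sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y)) / len(x)

    for b in (50, 60, 70):
        print(b, round(loss(b), 3))   # 233.463, 128.713, 223.963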

Intercept vs. Loss

(Figures: the loss plotted against candidate intercept values)
Intercept vs. Loss
 From the above example we might conclude that the step we took (step size = 10) is too large, resulting in overshooting.

 So let us take smaller steps this time (step size = 1).

Intercept   Loss
50          233.463
51          213.988
52          196.513
53          181.038
…           …
59          130.188
60          128.713
61          129.238
…           …
70          223.963
This is better, but it takes more steps and hence more computation time.

A much smaller step size, such as 0.1 or 0.01, would take practically forever to reach the minimum.
Intercept vs. Loss
 So how do we determine the perfect step size?

 Fortunately, Gradient Descent gives us an answer to this.

 As a principle:

“Do big steps when far from the optimal value and do baby steps when closer to the optimal value.”

How?
Back to our loss function:

Loss = Sum of squared residuals
     = ∑(Actual – Predicted)²,  where Predicted = Slope x input + intercept
     = ∑(Actual – (Slope x input + intercept))²

We plug in the values for all 10 data points and sum them:

= (55 – ((4.725 x 1) + intercept))² + (60 – ((4.725 x 2) + intercept))² + (90 – ((4.725 x 3.5) + intercept))² + (85 – ((4.725 x 6) + intercept))² + (100 – ((4.725 x 8) + intercept))² + (95 – ((4.725 x 2.5) + intercept))² + (75 – ((4.725 x 5.5) + intercept))² + (70 – ((4.725 x 4) + intercept))² + (85 – ((4.725 x 6.5) + intercept))² + (100 – ((4.725 x 6) + intercept))²

This is a parabolic function of the intercept.

What does this mean?
By taking the derivative of this function, we can determine the slope at any value of the intercept.

So let's take the derivative of the loss function we obtained with respect to the intercept.

We will apply standard differentiation rules, such as the sum rule and the chain rule.

d(loss function) / d(intercept) = ?

Result of Derivation
= - 2 (55 – ((4.725 x 1) + intercept))
+ - 2 (60 – ((4.725 x 2) + intercept))
+ - 2 (90 – ((4.725 x 3.5) + intercept))
+ - 2 (85 – ((4.725 x 6) + intercept))
+ - 2 (100 – ((4.725 x 8) + intercept))
+ - 2 (95 – ((4.725 x 2.5) + intercept))
+ - 2 (75 – ((4.725 x 5.5) + intercept))
+ - 2 (70 – ((4.725 x 4) + intercept))
+ - 2 (85 – ((4.725 x 6.5) + intercept))
+ - 2 (100 – ((4.725 x 6) + intercept))
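The same derivative can be written compactly as a function of the intercept. A sketch using the 10 data points and the fixed slope of 4.725:

    x = [1, 2, 3.5, 6, 8, 2.5, 5.5, 4, 6.5, 6]
    y = [55, 60, 90, 85, 100, 95, 75, 70, 85, 100]

    def d_loss_d_intercept(intercept, slope=4.725):
        """Derivative of the sum of squared residuals with respect to the intercept."""
        return sum(-2 * (yi - (slope * xi + intercept)) for xi, yi in zip(x, y))

    print(round(d_loss_d_intercept(50), 2))   # -204.75, as computed on the next slide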

Result of Derivation

Now that we have the derivative, Gradient Descent will use it to find where the loss function is at its minimum.

How?
The optimal intercept is where the slope of the loss curve equals 0, or approximately 0.

So let us start by taking a random intercept value again and plug it into the derivative.

Result of Derivation
Intercept = 50
= - 2 (55 – ((4.725 x 1) + 50))
+ - 2 (60 – ((4.725 x 2) + 50))
+ - 2 (90 – ((4.725 x 3.5) + 50))
+ - 2 (85 – ((4.725 x 6) + 50))
+ - 2 (100 – ((4.725 x 8) + 50))
+ - 2 (95 – ((4.725 x 2.5) + 50))
+ - 2 (75 – ((4.725 x 5.5) + 50))
+ - 2 (70 – ((4.725 x 4) + 50))
+ - 2 (85 – ((4.725 x 6.5) + 50))
+ - 2 (100 – ((4.725 x 6) + 50))

Slope = -204.75
Result of Derivation
So what is the next value of the intercept?

Determined by step size

Step Size = Slope x Learning Rate

New intercept = Old intercept – Step Size

Let’s take the Learning rate to be 0.01

Step Size = -204.75 x 0.01 = -2.0475


New intercept = 50 – (-2.0475) = 52.0475
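The same update, written out as a couple of lines of code (a sketch using the numbers above):

    learning_rate = 0.01
    intercept = 50
    slope_of_loss = -204.75                      # d(loss)/d(intercept) at intercept = 50

    step_size = slope_of_loss * learning_rate    # -2.0475
    intercept = intercept - step_size            # 50 - (-2.0475) = 52.0475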

Result of Derivation
Iteration I – Intercept = 52.0475

= - 2 (55 – ((4.725 x 1) + 52.0475))
+ - 2 (60 – ((4.725 x 2) + 52.0475))
+ - 2 (90 – ((4.725 x 3.5) + 52.0475))
+ - 2 (85 – ((4.725 x 6) + 52.0475))
+ - 2 (100 – ((4.725 x 8) + 52.0475))
+ - 2 (95 – ((4.725 x 2.5) + 52.0475))
+ - 2 (75 – ((4.725 x 5.5) + 52.0475))
+ - 2 (70 – ((4.725 x 4) + 52.0475))
+ - 2 (85 – ((4.725 x 6.5) + 52.0475))
+ - 2 (100 – ((4.725 x 6) + 52.0475))

Slope = -163.8
Result of Derivation
Step Size = -163.8 x 0.01 = -1.638
New intercept = 52.0475 – (-1.638) = 53.6855

Iteration II – Intercept = 53.6855
Slope = -131.04
Step Size = -131.04 x 0.01 = -1.3104
New intercept = 53.6855 – (-1.3104) = 54.9959

Iteration III – Intercept = 54.9959
Slope = -104.832
Step Size = -104.832 x 0.01 = -1.04832
New intercept = 54.9959 – (-1.04832) = 56.04422
Result of Derivation
 Until when do these steps repeat?
 Until we reach a step size that is very small (e.g. 0.001), or until we reach the maximum number of iterations (usually 1000).

Iteration IV – Intercept = 56.04422
Slope = -83.8656
Step Size = -83.8656 x 0.01 = -0.838656
New intercept = 56.04422 – (-0.838656) = 56.882876

Iteration  Intercept  Slope      Step size  New intercept
…          …          …          …          …
V          56.88288   -67.0924   -0.67092   57.5538
VI         57.5538    -53.674    -0.53674   58.09054
VII        58.09054   -42.9392   -0.42939   58.51993
VIII       58.51993   -34.3514   -0.34351   58.86344
IX         58.86344   -27.4812   -0.27481   59.13825
X          59.13825   -21.985    -0.21985   59.3581
XI         59.3581    -17.588    -0.17588   59.53398
XII        59.53398   -17.2362   -0.17236   59.70634
XIII       59.70634   -10.6232   -0.10623   59.81257
XIV        59.81257   -8.4986    -0.08499   59.89756
XV         59.89756   -6.7988    -0.06799   59.96555
XVI        59.96555   -5.439     -0.05439   60.01994
XVII       60.01994   -4.3512    -0.04351   60.06345
XVIII      60.06345   -3.481     -0.03481   60.09826
Iteration  Intercept  Slope     Step size  New intercept
XIX        60.09826   -2.7848   -0.02785   60.12611
XX         60.12611   -2.2278   -0.02228   60.14839
XXI        60.14839   -1.7822   -0.01782   60.16621
XXII       60.16621   -1.4258   -0.01426   60.18047
XXIII      60.18047   -1.1406   -0.01141   60.19188
XXIV       60.19188   -0.9124   -0.00912   60.201
XXV        60.201     -0.73     -0.0073    60.2083
XXVI       60.2083    -0.584    -0.00584   60.21414
XXVII      60.21414   -0.4672   -0.00467   60.21881
XXVIII     60.21881   -0.3738   -0.00374   60.22255
XXIX       60.22255   -0.299    -0.00299   60.22554
XXX        60.22554   -0.2392   -0.00239   60.22793
XXXI       60.22793   -0.1914   -0.00191   60.22984
XXXII      60.22984   -0.1532   -0.00153   60.23137
XXXIII     60.23137   -0.1226   -0.00123   60.2326
XXXIV      60.2326    -0.098    -0.00098   60.23358
Procedure
 As we can see, we stop the iteration now because the step size is very small (0.00098), so our optimal intercept will be:
Intercept = 60.23358 ≈ 60.24
 So our equation will be Y = 4.725*X + 60.24

 Procedure for Gradient Descent
1. Pick a random guess for the parameter to be optimized.
2. Plug the parameter value into the derivative of the loss function to find the slope.
3. Calculate the step size using: Step size = slope x Learning rate.
4. Calculate the new parameter value by subtracting the step size from the old value.
5. Repeat steps 2–4 until the step size is less than 0.001 or you have reached 1000 iterations.
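The whole procedure can be put into one short loop. Below is a sketch under the lecture's assumptions (slope fixed at 4.725, learning rate 0.01, stop when the step drops below 0.001 or after 1000 iterations):

    x = [1, 2, 3.5, 6, 8, 2.5, 5.5, 4, 6.5, 6]
    y = [55, 60, 90, 85, 100, 95, 75, 70, 85, 100]

    def d_loss_d_intercept(b, slope=4.725):
        return sum(-2 * (yi - (slope * xi + b)) for xi, yi in zip(x, y))

    b, learning_rate = 50.0, 0.01          # random initial guess and learning rate
    for _ in range(1000):                  # at most 1000 iterations
        step = d_loss_d_intercept(b) * learning_rate
        b = b - step
        if abs(step) < 0.001:              # stop once the step size is very small
            break
    print(round(b, 3))                     # ≈ 60.234, close to the optimal intercept ≈ 60.24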
Trained Model

(Figure: the fitted line Y = 4.725*X + 60.24 over the study-hours data)
Optimizing Two Parameters (Intercept & Slope)
Loss function = Sum of squared residuals
= Σ(Actual – Predicted)²
= Σ(Actual – (mx + b))²

= (55 – ((slope x 1) + intercept))² + (60 – ((slope x 2) + intercept))² + (90 – ((slope x 3.5) + intercept))² + (85 – ((slope x 6) + intercept))² + (100 – ((slope x 8) + intercept))² + (95 – ((slope x 2.5) + intercept))² + (75 – ((slope x 5.5) + intercept))² + (70 – ((slope x 4) + intercept))² + (85 – ((slope x 6.5) + intercept))² + (100 – ((slope x 6) + intercept))²

When you have two or more derivatives of the same function, they are called a gradient.
Derivation
First take the derivative with respect to the intercept:
= - 2 (55 – ((slope x 1) + intercept)) - 2 (60 – ((slope x 2) + intercept)) - 2 (90 – ((slope x 3.5) + intercept)) - 2 (85 – ((slope x 6) + intercept)) - 2 (100 – ((slope x 8) + intercept)) - 2 (95 – ((slope x 2.5) + intercept)) - 2 (75 – ((slope x 5.5) + intercept)) - 2 (70 – ((slope x 4) + intercept)) - 2 (85 – ((slope x 6.5) + intercept)) - 2 (100 – ((slope x 6) + intercept))

Then take the derivative with respect to the slope:
= - 2 x 1 (55 – ((slope x 1) + intercept)) - 2 x 2 (60 – ((slope x 2) + intercept)) - 2 x 3.5 (90 – ((slope x 3.5) + intercept)) - 2 x 6 (85 – ((slope x 6) + intercept)) - 2 x 8 (100 – ((slope x 8) + intercept)) - 2 x 2.5 (95 – ((slope x 2.5) + intercept)) - 2 x 5.5 (75 – ((slope x 5.5) + intercept)) - 2 x 4 (70 – ((slope x 4) + intercept)) - 2 x 6.5 (85 – ((slope x 6.5) + intercept)) - 2 x 6 (100 – ((slope x 6) + intercept))
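These two partial derivatives together form the gradient. A sketch of the same thing in code, using the 10 data points:

    x = [1, 2, 3.5, 6, 8, 2.5, 5.5, 4, 6.5, 6]
    y = [55, 60, 90, 85, 100, 95, 75, 70, 85, 100]

    def gradient(slope, intercept):
        """Partial derivatives of the sum of squared residuals."""
        d_intercept = sum(-2 * (yi - (slope * xi + intercept)) for xi, yi in zip(x, y))
        d_slope = sum(-2 * xi * (yi - (slope * xi + intercept)) for xi, yi in zip(x, y))
        return d_slope, d_intercept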

Steps
1. Start by picking a random number for the intercept and slope (intercept = 60, slope = 4).
2. Plug these values into the derivatives to find the slopes of the curve:
i.  - 2 (55 – ((4 x 1) + 60)) - 2 (60 – ((4 x 2) + 60)) - 2 (90 – ((4 x 3.5) + 60)) - 2 (85 – ((4 x 6) + 60)) - 2 (100 – ((4 x 8) + 60)) - 2 (95 – ((4 x 2.5) + 60)) - 2 (75 – ((4 x 5.5) + 60)) - 2 (70 – ((4 x 4) + 60)) - 2 (85 – ((4 x 6.5) + 60)) - 2 (100 – ((4 x 6) + 60))
    = -70
ii. - 2 x 1 (55 – ((4 x 1) + 60)) - 2 x 2 (60 – ((4 x 2) + 60)) - 2 x 3.5 (90 – ((4 x 3.5) + 60)) - 2 x 6 (85 – ((4 x 6) + 60)) - 2 x 8 (100 – ((4 x 8) + 60)) - 2 x 2.5 (95 – ((4 x 2.5) + 60)) - 2 x 5.5 (75 – ((4 x 5.5) + 60)) - 2 x 4 (70 – ((4 x 4) + 60)) - 2 x 6.5 (85 – ((4 x 6.5) + 60)) - 2 x 6 (100 – ((4 x 6) + 60))
    = 0

Steps
3. Calculate the step size for both.
   We assume our learning rate to be 0.001:
   Step size = -0.07 for the intercept
   Step size = 0 for the slope
4. Calculate the new intercept and slope:
   New intercept = Old value – step size = 60.07
   The slope stays the same.
Repeat these until the step size is super small.

Iteration  Intercept  Curve slope  Step size    New intercept  Slope  Curve slope  Step size  New slope
1          60         -70          -0.07        60.07          4      0            0          4
2          60.07      -68.6        -0.0686      60.1386        4      0            0          4
3          60.1386    -67.228      -0.067228    60.20583       4      0            0          4
4          60.20583   -65.8834     -0.0658834   60.27171       4      0            0          4
5          60.27171   -65.75164    -0.06575164  60.33746       4      0            0          4
Steps

Iteration  Intercept  Curve slope  Step size    New intercept  Slope  Curve slope  Step size  New slope
6          60.3376    -63.248      -0.063248    60.40085       4      0            0          4
7          60.40085   -61.983      -0.061983    60.46283       4      0            0          4
8          60.46283   -60.7434     -0.0607434   60.52357       4      0            0          4
9          60.52357   -59.5286     -0.0595286   60.5831        4      0            0          4
10         60.5831    -58.338      -0.058338    60.64144       4      0            0          4
…          …          …            …            …              …      …            …          …
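A compact sketch of the full two-parameter loop, updating both the slope and the intercept each iteration (same data as before; the learning rate of 0.001 is taken from the slides, and the small-step stopping test is omitted so the loop simply runs the maximum 1000 iterations):

    x = [1, 2, 3.5, 6, 8, 2.5, 5.5, 4, 6.5, 6]
    y = [55, 60, 90, 85, 100, 95, 75, 70, 85, 100]

    slope, intercept, learning_rate = 4.0, 60.0, 0.001   # initial guesses and learning rate
    for _ in range(1000):                                # run the maximum 1000 iterations
        d_b = sum(-2 * (yi - (slope * xi + intercept)) for xi, yi in zip(x, y))
        d_m = sum(-2 * xi * (yi - (slope * xi + intercept)) for xi, yi in zip(x, y))
        intercept = intercept - learning_rate * d_b      # update both parameters together
        slope = slope - learning_rate * d_m
    print(round(slope, 3), round(intercept, 3))          # ≈ 4.73 and ≈ 60.23, close to the best-fit line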
Gradient Descent Formula
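Consistent with the procedure above, the general update rule for any parameter θ (slope or intercept) can be written as:

θ_new = θ_old – (Learning Rate x d(Loss)/dθ)

That is, the step size is the slope of the loss curve times the learning rate, and the new parameter value is the old value minus the step size.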

Key Takeaways
 Optimization is crucial in machine learning.

 One of the most common optimization techniques is Gradient Descent, which uses the gradient to iteratively descend to the lowest point of the loss function, hence the name Gradient Descent.

Questions?

