
Machine Learning

Optimization and Gradient Descent Algorithm


Lecture – 3 to 4

Instructor: Qamar Askari


Lecture Headlines
• What is optimization?
• Optimization types and algorithms
• Gradient descent intuition
• Univariate and bivariate GD with examples
• Gradient Descent Algorithm and implementation
• Problem requirements for GD
• Challenges for GD
What is optimization?

• TSP: minimize the total distance travelled
• 3SAT: maximize the number of satisfied clauses
• n-Queen: minimize the number of attacking queens
What is optimization?

Optimization has thousands of applications in every field, including biology, chemistry, electronics, gaming, machine learning, and music.

For example
• Timetabling
• Feature selection
• Wireless sensor network optimization
• Vehicle routing
• Clustering
• Game strategy planning
• Circuit designing
• Bioinformatics
• Watermarking
• …
Mathematical Formulation of Optimization Problems
• Optimization can be in terms of minimization or maximization
• Optimization problems can be formulated mathematically
• For example:

f(x, y) = −(x² + y²) + 4

find x & y to maximize f   (optimal point: (0, 0))
Another example

f(x) = A·n + Σᵢ₌₁ⁿ [xᵢ² − A·cos(2π·xᵢ)], where A = 10

find x to minimize f   (optimal point: (0, 0))

Rastrigin function for n = 2
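A quick sketch of evaluating this function in Python (using NumPy; the test points are arbitrary):

import numpy as np

def rastrigin(x, A=10):
    # Rastrigin function for an n-dimensional point x
    x = np.asarray(x, dtype=float)
    return A * x.size + np.sum(x**2 - A * np.cos(2 * np.pi * x))

print(rastrigin([0.0, 0.0]))   # 0.0 -- the global minimum for n = 2
print(rastrigin([1.0, 2.0]))   # 5.0 -- larger, away from the optimum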


Optimization Algorithms/Techniques
• Greedy Algorithms
• Dynamic Programming
• Mathematical optimization algorithms
• Linear programming, quadratic programming, convex programming, etc.
• Approximation-based algorithms
• Nature-inspired algorithms
Gradient Descent Algorithm
Gradient Descent Algorithm
• A commonly used algorithm in ML and DL
• It was proposed long before the modern era of computers (by Cauchy, in 1847)
Intuition
• Suppose you are standing on some hilly terrain, blindfolded, and you need to get as low as possible.

What will you do?


Step 1: Figure out which way is downhill (the slope)
Step 2: Take a step in that direction

Keep repeating these two steps


Intuition
Step 1: Figure out which way is downhill – find the derivative df(x)/dx
Step 2: Take a step in that direction – subtract the derivative from x

Keep repeating these two steps

[Figure: height plotted against position x]

Note:
We need some mathematical formula f(x) to calculate the height from the current position x.
Example
• Assume that the height can be computed from x using the following function:
Height = f(x) = 2x – 3
Example
• Assume that the height can be computed from x using the following function:
Height = f(x) = 2x – 3

Step 1: df/dx = 2
Step 2: x = x – 2 = 6 – 2 = 4   (starting from x = 6)
Example
• Assume that the height can be computed from x using the following function:
Height = f(x) = 2x – 3

Step 1: df/dx = 2
Step 2: x = x – 2 = 4 – 2 = 2
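A minimal sketch of these repeated updates (starting from x = 6 as in the steps above; note that a linear function has no minimum, so the raw update just keeps stepping):

x = 6
for i in range(3):
    x = x - 2     # df/dx = 2, so each raw update subtracts 2
    print(x)      # prints 4, 2, 0 -- the first two match the hand-worked steps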
Example – If slope is negative
• Assume that the height can be computed from x using the following function:
Height = f(x) = -2x + 10
Example – If slope is negative
• Assume that the height can be computed from x using the following function:
Height = f(x) = -2x + 10

Step 1: df/dx = -2
Step 2: x = x – (-2) = 2 + 2 = 4   (starting from x = 2)
Example – If the function is non-linear
• Assume that the height can be computed from x using the following function:
Height = f(x) = x² – 2x + 1

Step 1: df/dx = 2x - 2
Step 2: x = x – (2x - 2) = 4 - 6 = -2   (starting from x = 4)
Example – If the function is non-linear
• Assume that the height can be computed from x using the following function:
Height = f(x) = x² – 2x + 1

Step 1: df/dx = 2x - 2
Step 2: x = x – (2x - 2) = 4 - 6 = -2

What will happen if one more iteration is executed?
It will move back to location 4: the raw update oscillates between 4 and -2 and never reaches the minimum at x = 1.
Example – If the function is non-linear
• Solution: we can control the step size by multiplying the derivative by a small fraction before subtracting it from x.
Height = f(x) = x² – 2x + 1

Step 1: df/dx = 2x - 2
Step 2: x = x – 0.1 * (2x - 2) = 4 – 0.6 = 3.4
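A short sketch contrasting the raw update (which oscillates between 4 and -2) with the damped update (which converges toward the minimum at x = 1):

# f(x) = x**2 - 2x + 1, with derivative df/dx = 2x - 2
def grad(x):
    return 2 * x - 2

x = 4.0
for _ in range(4):
    x = x - grad(x)          # raw update: oscillates 4 -> -2 -> 4 -> -2 ...
print("without step size:", x)

x = 4.0
for _ in range(50):
    x = x - 0.1 * grad(x)    # damped update: converges toward x = 1
print("with step size 0.1:", x)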
Finalized Gradient Descent Rule

x_new = x_old – α · df/dx

where α controls the step size of the algorithm
Little about α
• The smaller α is, the longer GD takes to converge – it may hit the maximum number of iterations before reaching the optimum point
• If α is too big, the algorithm may not converge to the optimal point (it jumps around) or may even diverge completely
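A small sketch of both failure modes on f(x) = x² – 2x + 1; the specific α values are illustrative assumptions:

def step(x, alpha, iters=20):
    for _ in range(iters):
        x = x - alpha * (2 * x - 2)   # df/dx = 2x - 2
    return x

print(step(4.0, 0.001))  # too small: after 20 steps, still far from x = 1
print(step(4.0, 0.1))    # reasonable: close to the optimum x = 1
print(step(4.0, 1.1))    # too big: |x - 1| grows every step -- diverges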
Gradient Descent for multi-variate functions
• What if the height is a multi-variate function?

Height = f(x, y) = 0.5x² + 0.5y² + 1

• We’ll compute partial derivatives w.r.t. each variable

Updating x (starting from x = 5, with α = 0.2):
Step 1: ∂f/∂x = 2 · 0.5x = x
Step 2: x = x – 0.2 · x = 5 – 1 = 4

Updating y (starting from y = 4):
Step 1: ∂f/∂y = 2 · 0.5y = y
Step 2: y = y – 0.2 · y = 4 – 0.8 = 3.2
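A minimal sketch of these updates run in a loop (the starting point (5, 4) and α = 0.2 come from the worked step above):

# Gradient descent on f(x, y) = 0.5*x**2 + 0.5*y**2 + 1.
# Partial derivatives: df/dx = x, df/dy = y.
x, y = 5.0, 4.0
alpha = 0.2

for _ in range(30):
    dx, dy = x, y          # gradient at the current point
    x = x - alpha * dx     # first iteration: 5 -> 4
    y = y - alpha * dy     # first iteration: 4 -> 3.2

print(x, y)   # both approach 0; the minimum of f is at (0, 0) with value 1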
Demo: Gradient Descent for multi-variate functions
Gradient Descent Algorithm – Steps
1. Choose/randomly initialize a starting point
2. Calculate the gradient at the current point
3. Make a scaled step in the opposite direction to the gradient if minimizing (for maximization, the step is taken in the same direction as the gradient)
4. Repeat steps 2 and 3 until one of the stopping criteria is met:
• maximum number of iterations reached
• step size is smaller than the tolerance
Python code
• A full implementation of the GD algorithm in Python is provided separately; the function used there as an example is shown in a figure (not reproduced here).
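Since the separate implementation isn’t reproduced in this document, here is a minimal sketch of the steps above; f(x) = x² – 2x + 1 from the earlier slides stands in for the missing example function (an assumption):

import random

def gradient_descent(grad, alpha=0.1, max_iters=1000, tol=1e-8):
    x = random.uniform(-10, 10)      # step 1: random starting point
    for _ in range(max_iters):       # stop: maximum iterations reached
        step = alpha * grad(x)       # step 2: gradient at current point, scaled
        x = x - step                 # step 3: move against the gradient
        if abs(step) < tol:          # stop: step smaller than the tolerance
            break
    return x

# f(x) = x**2 - 2x + 1 has df/dx = 2x - 2 and its minimum at x = 1
print(gradient_descent(lambda x: 2 * x - 2))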
Problem requirements
• Gradient Descent works best if the problem/function is:
• Differentiable
• Convex
Problem requirements – Differentiability

[Figure: a few examples of differentiable functions]
Problem requirements – Differentiability

[Figure: a few examples of non-differentiable functions]
Problem requirements – Convexity
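The convexity illustration from the slide isn’t reproduced here; for reference, the standard definition (a general fact, not taken from the slide):

% f is convex if, for all x, y in its domain and all \lambda \in [0, 1]:
f\bigl(\lambda x + (1 - \lambda)\, y\bigr) \le \lambda f(x) + (1 - \lambda) f(y)
% Geometrically: the chord between any two points on the graph lies on or
% above the graph, so any local minimum GD finds is also the global minimum.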
Challenges for Gradient Descent:
Local optima and saddle points
Challenges for Gradient Descent:
Dependence on starting point
Challenges for Gradient Descent:
α / learning rate tuning
Applications of Gradient Descent Algorithm
• Applicability to Machine Learning problems
• Error/loss/cost minimization in many algorithms such as ANN, Linear Regression, Logistic Regression, etc.
Case study: Positioning Class Invigilator
• To be discussed on the board, if time permits
Gradient Descent variations
• Stochastic Gradient Descent
• Batch Gradient Descent
