PCA and Convex Optimization and Bias, Variance-2

Convex Optimization

Convex Optimization
• Convex optimization is a powerful tool for solving optimization problems in various fields such as finance,
engineering, and machine learning.

• In a convex optimization problem, the goal is to find a point that minimizes a convex objective
function over a convex feasible set.

• Linear functions are convex, so linear programming problems are convex problems.

• A convex function is a function whose graph curves upward, meaning that the line segment
connecting any two points on the graph lies on or above the graph itself.

• Convex optimization is critical in training machine learning models, which involves finding the optimal
parameters that minimize a given loss function. In machine learning, convex optimization underlies
problems such as linear regression, logistic regression, and support vector machines; neural-network
training, by contrast, is generally non-convex, although the same gradient-based machinery is used.
Convex Optimization
• A real-valued function is called convex if the line segment
between any two distinct points on the graph of the function
lies on or above the graph between the two points.
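The definition above can be checked numerically. The sketch below is an illustrative Python example (not from the slides): it tests the segment-above-graph condition for f(x) = x² at many points between a and b.

```python
# Numerically check the convexity definition for f(x) = x**2:
# f(t*a + (1-t)*b) <= t*f(a) + (1-t)*f(b) for all t in [0, 1].
def f(x):
    return x * x

def is_convex_on_segment(f, a, b, steps=100):
    for i in range(steps + 1):
        t = i / steps
        point = t * a + (1 - t) * b          # point between a and b
        chord = t * f(a) + (1 - t) * f(b)    # height of the line segment
        if f(point) > chord + 1e-12:         # graph must lie on or below the chord
            return False
    return True

print(is_convex_on_segment(f, -3.0, 5.0))            # True: x**2 is convex
print(is_convex_on_segment(lambda x: -x * x, -1, 1))  # False: -x**2 is not
```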
Convex Optimization

Optimization problem in standard form

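The slide's figure is not reproduced here; the standard form it most likely showed is the usual one, where f₀ and the fᵢ are convex functions and the hⱼ are affine:

  minimize    f₀(x)
  subject to  fᵢ(x) ≤ 0,   i = 1, …, m
              hⱼ(x) = 0,   j = 1, …, p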

Gradient-Based Optimization

Dr. Selva Kumar S (SCOPE)


Gradient-Based Optimization
• Optimization refers to the task of either minimizing or maximizing some function
f(x) by altering x.

• Optimization problems are usually stated in terms of minimizing f(x).

• Maximization may be accomplished via a minimization algorithm by minimizing −f (x)

• The function we want to minimize or maximize is called the objective function or
criterion. When we are minimizing it, we may also call it the cost function, loss
function, or error function.

• The value that minimizes or maximizes a function is denoted with a superscript ∗;

• for example, we might write x∗ = arg min f(x).
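As a small illustration of the arg min notation (a hypothetical example, not from the slides), a coarse grid search over f(x) = (x − 3)² recovers the minimizer x∗ = 3:

```python
# Hypothetical example: f(x) = (x - 3)**2 has its minimizer at x* = 3.
def f(x):
    return (x - 3) ** 2

# Coarse grid search for x* = arg min f(x) over [-10, 10].
xs = [i / 10 for i in range(-100, 101)]
x_star = min(xs, key=f)
print(x_star)  # 3.0
```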



Gradient Descent

• Gradient descent is one of the most commonly used optimization
algorithms for training machine learning models.

• Gradient descent is also used to train Neural Networks.

• It minimizes errors between actual and expected results.



Gradient Descent Cont’d



Gradient Descent Cont’d

The Formula of the Gradient Descent Algorithm:

• The gradient is a vector whose entries are the partial
derivatives of a function.
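The update rule can be sketched in a few lines of Python (an illustrative example; the function f(x) = (x − 3)², the step size, and the iteration count are assumptions chosen here):

```python
# Minimal gradient descent on f(x) = (x - 3)**2, whose gradient is 2*(x - 3).
def grad(x):
    return 2 * (x - 3)

x = 0.0            # initial guess
eta = 0.1          # learning rate
for _ in range(100):
    x = x - eta * grad(x)   # the gradient descent update rule

print(round(x, 4))  # close to the minimizer x = 3
```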
Understanding Gradient Descent



Learning rate Difference

(a) Large learning rate, (b) Small learning rate, (c) Optimum learning rate
Learning rate Difference
So the important points to remember are

•Positive derivative -> decrease x

•Negative derivative -> increase x

•Large absolute derivative -> large step

•Small absolute derivative -> small step
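The effect of the learning rate can be seen numerically. The sketch below (illustrative Python; the rates and step count are assumptions) runs gradient descent on f(x) = x², whose gradient is 2x, with three different learning rates:

```python
# Compare step behavior for different learning rates on f(x) = x**2 (grad = 2x).
def run(eta, steps=20, x=1.0):
    for _ in range(steps):
        x = x - eta * 2 * x
    return x

print(run(0.01))   # small eta: slow progress, x still far from 0
print(run(0.4))    # near-optimum eta: converges quickly
print(run(1.1))    # too-large eta: x overshoots and diverges
```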



Global Minimum
In the case of the linear regression model, there is only one minimum, and it is the global
minimum.

Which local minimum is reached depends on the initial coefficients taken into
consideration. Here, points A and B are local minima and point C is the
global minimum.



Different Types of Gradient Descent Algorithms
• Batch gradient descent: When the weight update is calculated based on all
examples in the training dataset, it is called batch gradient descent.

• Stochastic gradient descent: When the weight update is calculated
incrementally after each training example (or a small group of training
examples), it is called stochastic gradient descent.

• Mini-batch gradient descent is a gradient descent modification that divides
the training dataset into small batches that are used to compute model error
and update model coefficients.
Issues that might occur
• When training a deep neural network with gradient descent and
backpropagation, we calculate the partial derivatives by moving across
the network from the final output layer to the initial layer.

• With the chain rule, layers that are deeper in the network go through
continuous matrix multiplications to compute their derivatives.

• Due to this process, vanishing gradients, exploding gradients, and saddle
points occur.
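The effect of those repeated multiplications can be shown with a toy sketch (illustrative Python; using one constant per-layer derivative is an assumption made purely for illustration):

```python
# The chain rule multiplies per-layer derivatives; repeated factors below 1
# shrink the gradient (vanishing), factors above 1 blow it up (exploding).
def backprop_factor(per_layer_derivative, num_layers):
    product = 1.0
    for _ in range(num_layers):
        product *= per_layer_derivative
    return product

print(backprop_factor(0.5, 30))   # ~9e-10: vanishing gradient
print(backprop_factor(1.5, 30))   # ~1.9e5: exploding gradient
```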



Saddle point
• A saddle point injects confusion into the learning process.

• Learning of the model becomes slow near it.

• A saddle point is a critical point where the gradient is zero, yet it is neither a minimum nor a maximum of the cost.

• Saddle points become common when gradient descent operates in many dimensions.



Solutions
• Changing the architecture

• This solution can be used for both the exploding and vanishing gradient problems, but it requires a
good understanding of the likely outcomes of the change.

• For example, if we reduce the number of layers in our network, model complexity is reduced.

• Gradient Clipping for Exploding Gradients

• Carefully monitoring and limiting the size of the gradients whilst our model trains is yet another
solution. This requires some deep knowledge of how the changes could impact the overall
performance.

• Careful Weight Initialization

• A more careful initialization of the model parameters for our network is a partial solution since
it does not solve the problem completely.
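Gradient clipping by global norm can be sketched as follows (an illustrative Python example, not tied to any particular framework; the name `clip_by_norm` and the threshold are choices made here):

```python
import math

# Sketch of gradient clipping by global norm: if the gradient vector's norm
# exceeds max_norm, rescale it so the norm equals max_norm.
def clip_by_norm(grads, max_norm):
    norm = math.sqrt(sum(g * g for g in grads))
    if norm > max_norm:
        scale = max_norm / norm
        return [g * scale for g in grads]
    return grads

print(clip_by_norm([3.0, 4.0], 1.0))  # norm 5 -> rescaled to [0.6, 0.8]
print(clip_by_norm([0.3, 0.4], 1.0))  # norm 0.5 -> unchanged
```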



Limitations
• Good generalization requires a large training set, which comes
with a huge computational cost.

• That is, as the training set grows to billions of examples, the time taken to
compute a single gradient step becomes long.



Choosing a Gradient Descent Variant



Batch Gradient Descent



Batch Gradient Descent Cont’d
• In batch gradient descent, we use all our training data in a single
iteration of the algorithm.

• So, we first pass all the training data through the network and
compute the gradient of the loss function for each sample. Then,
we take the average of the gradients and update the parameters
using the computed average.
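The procedure above can be sketched for one-dimensional linear regression (illustrative Python; the dataset, learning rate, and iteration count are assumptions made for the example):

```python
# Batch gradient descent for 1-D linear regression y ~ w*x (no bias),
# with mean squared error; the gradient is averaged over ALL samples.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]   # true relationship: y = 2x

w, eta = 0.0, 0.05
for _ in range(200):
    # average gradient of (w*x - y)**2 over the whole dataset
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    w -= eta * grad

print(round(w, 3))  # converges toward 2.0
```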



Stochastic Gradient Descent
• Stochastic gradient descent (SGD) is a variant of gradient descent that saves
both time and memory while still searching for an optimal solution.

• The process takes one randomly chosen training example, computes the gradient,
and updates the parameters before moving on to the next random example.

• However, because it takes and iterates one example at a time, it tends to
result in noisier updates than we would normally like.
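The one-example-at-a-time loop can be sketched on the same toy regression problem (illustrative Python; the dataset, learning rate, seed, and step count are assumptions):

```python
import random

# Stochastic gradient descent: update from ONE randomly chosen sample at a time.
random.seed(0)
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]   # y = 2x

w, eta = 0.0, 0.02
for _ in range(500):
    i = random.randrange(len(xs))           # pick one random example
    grad = 2 * (w * xs[i] - ys[i]) * xs[i]  # gradient from that example only
    w -= eta * grad

print(round(w, 2))  # noisy path, but ends near 2.0
```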



Stochastic Gradient Descent



One sample will be used



Mini-Batch Gradient Descent
• Instead of going through the complete dataset or choosing one random
example, mini-batch gradient descent divides the entire dataset into
randomly picked batches and optimizes over each in turn.

• A mini-batch is a fixed number of training examples that is smaller than the
full dataset. So, in each iteration, we train the network on a different group
of samples until all samples of the dataset are used.
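The shuffle-and-batch loop can be sketched as follows (illustrative Python; the dataset, batch size, learning rate, and epoch count are assumptions):

```python
import random

# Mini-batch gradient descent: shuffle the data, split it into small batches,
# and update once per batch using the batch-average gradient.
random.seed(0)
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2 * x for x in xs]    # y = 2x
batch_size = 2

w, eta = 0.0, 0.03
for epoch in range(100):
    order = list(range(len(xs)))
    random.shuffle(order)                     # new random batches each epoch
    for start in range(0, len(order), batch_size):
        batch = order[start:start + batch_size]
        grad = sum(2 * (w * xs[i] - ys[i]) * xs[i] for i in batch) / len(batch)
        w -= eta * grad

print(round(w, 3))  # approaches 2.0
```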



Mini Batch Gradient



An issue with gradient descent is accidentally getting stuck in a local minimum, where our loss
can still be HUGE.
Momentum
• Momentum adds to gradient descent by considering previous gradients
(the slope of the hill before the ball's current position).

• So in the previous case, instead of stopping when the gradient is 0 at the
first local minimum, momentum will continue to move the ball forward
because it takes into consideration how steep the slope before it was.



Momentum Cont’d
• Momentum is all about speeding up and smoothing the process of
gradient descent.

• Notice how the ball speeds up after steeper slopes.

• That's momentum taking previous steep gradients into consideration and
convincing itself to continue moving, regardless of the local minimum.

• Momentum is a good way to prevent getting stuck in local minima.

• Since momentum constantly considers previous gradients, we can say
that momentum computes a moving average of past gradients.
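A common form of the momentum update keeps a velocity term that accumulates past gradients (illustrative Python; the test function, learning rate, and momentum coefficient are assumptions chosen for the sketch):

```python
# Momentum sketch: the velocity accumulates a decaying sum of past gradients,
# letting the update keep moving through shallow or flat regions.
def grad(x):
    return 2 * (x - 3)   # gradient of f(x) = (x - 3)**2

x, v = 0.0, 0.0
eta, beta = 0.05, 0.9    # learning rate and momentum coefficient
for _ in range(200):
    v = beta * v + grad(x)   # moving average of gradients
    x = x - eta * v          # step along the accumulated velocity

print(round(x, 3))  # close to the minimizer x = 3
```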
