ML Lec-6
LECTURE-6
BY
Dr. Ramesh Kumar Thakur
Assistant Professor (II)
School Of Computer Engineering
v In Batch Gradient Descent, all the training data is taken into consideration to take a single step.
v We take the average of the gradients of all the training examples and then use that mean gradient to update
our parameters. So that’s just one step of gradient descent in one epoch.
v In Batch Gradient Descent, since we are using the entire training set, the parameters are updated only
once per epoch.
v Batch Gradient Descent is great for convex or relatively smooth error manifolds.
v In this case, we move somewhat directly towards an optimum solution.
v The graph of cost vs. epochs is also quite smooth, because we are averaging over all the gradients of the
training data for a single step. The cost keeps on decreasing over the epochs.
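v As a concrete illustration, below is a minimal NumPy sketch of Batch Gradient Descent for linear regression with a mean-squared-error cost: one averaged gradient and one parameter update per epoch. The names (X, y, lr, n_epochs) are illustrative and not taken from the lecture.

```python
import numpy as np

def batch_gradient_descent(X, y, lr=0.01, n_epochs=100):
    """One parameter update per epoch, using the gradient averaged over ALL examples."""
    m, n = X.shape
    w = np.zeros(n)
    b = 0.0
    for epoch in range(n_epochs):
        y_pred = X @ w + b              # predictions for the whole training set
        error = y_pred - y
        grad_w = (X.T @ error) / m      # mean gradient over all m examples
        grad_b = error.mean()
        w -= lr * grad_w                # a single step per epoch
        b -= lr * grad_b
    return w, b
```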
v In Batch Gradient Descent we were considering all the examples for every step of Gradient Descent. But
what if our dataset is very large?
v Suppose our dataset has 5 million examples; then, just to take one step, the model has to calculate the
gradients of all 5 million examples.
v This is not an efficient approach. To tackle this problem, we have Stochastic Gradient Descent.
v In Stochastic Gradient Descent (SGD), we consider just one example at a time to take a single step.
v Since we are considering just one example at a time, the cost will fluctuate over the training examples and
will not necessarily decrease. But in the long run, you will see the cost decreasing, with fluctuations.
v Because the cost fluctuates so much, it may never reach the minimum exactly but will keep oscillating around it.
v SGD can be used for larger datasets. It converges faster when the dataset is large, as it updates the
parameters more frequently.
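v A matching sketch of SGD, under the same illustrative setup as the batch version above: the parameters are updated after every single example, which is why the cost jumps around between steps.

```python
import numpy as np

def stochastic_gradient_descent(X, y, lr=0.01, n_epochs=10):
    """One parameter update per training example, so the cost fluctuates between steps."""
    m, n = X.shape
    w = np.zeros(n)
    b = 0.0
    for epoch in range(n_epochs):
        for i in np.random.permutation(m):   # visit the examples in a random order
            xi, yi = X[i], y[i]
            error = xi @ w + b - yi          # gradient from a single example
            w -= lr * error * xi
            b -= lr * error
    return w, b
```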
v Batch Gradient Descent converges directly to the minimum, while SGD converges faster for larger datasets. But
since in SGD we use only one example at a time, we cannot use a vectorized implementation.
v This can slow down the computations. To tackle this problem, a mixture of Batch Gradient Descent and
SGD is used.
v We neither use the entire dataset at once nor a single example at a time.
v Instead, we use a batch of a fixed number of training examples, smaller than the full dataset, and call it a
mini-batch. Doing this gives us the advantages of both of the former variants (Batch GD and SGD).
v Just like in SGD, the cost over the epochs in mini-batch gradient descent fluctuates, because we are
averaging over only a small number of examples at a time.
v When we use mini-batch gradient descent, we update our parameters frequently and can also use a
vectorized implementation for faster computation (see the sketch below).
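v A sketch of mini-batch gradient descent in the same illustrative setup; batch_size is a hypothetical choice (32 here), not a value from the lecture.

```python
import numpy as np

def minibatch_gradient_descent(X, y, lr=0.01, n_epochs=10, batch_size=32):
    """One update per mini-batch: frequent updates AND vectorized gradient computation."""
    m, n = X.shape
    w = np.zeros(n)
    b = 0.0
    for epoch in range(n_epochs):
        idx = np.random.permutation(m)
        for start in range(0, m, batch_size):
            batch = idx[start:start + batch_size]
            Xb, yb = X[batch], y[batch]
            error = Xb @ w + b - yb
            w -= lr * (Xb.T @ error) / len(batch)   # mean gradient over the mini-batch
            b -= lr * error.mean()
    return w, b
```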
v In practice, we often encounter different types of variables in the same dataset.
v A significant issue is that the range of the variables may differ a lot.
v Using the original scale may put more weight on the variables with a large range.
v In order to deal with this problem, we need to apply feature rescaling to the independent variables
(features) of the data in the data pre-processing step.
v The terms normalisation and standardisation are sometimes used interchangeably, but they usually refer to
different things.
v The goal of applying Feature Scaling is to make sure the features are on almost the same scale, so that each
feature is equally important and easier for most ML algorithms to process.
v Consider a dataset that contains a dependent variable (Purchased) and 3 independent variables (Country, Age,
and Salary). We can easily notice that the variables are not on the same scale, because the range of Age is
from 27 to 50, while the range of Salary goes from 48K to 83K. The range of Salary is much wider than
the range of Age. This will cause issues in our models, since many machine learning models, such
as k-means clustering and nearest-neighbour classification, are based on the Euclidean distance.
v When we calculate the Euclidean distance, the Salary term (x2 - x1)² is much bigger than the Age term
(y2 - y1)², which means the Euclidean distance will be dominated by Salary if we do not
apply feature scaling.
v The difference in Age contributes much less to the overall distance (see the numeric check below).
v Therefore, we should use Feature Scaling to bring all values to the same order of magnitude and, thus, solve this
issue.
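v A quick numeric check, using made-up (Age, Salary) values in the spirit of the dataset above, shows how Salary dominates the unscaled Euclidean distance:

```python
import numpy as np

# Two illustrative people: (Age, Salary). The values are made up for the example.
a = np.array([27, 48000.0])
b = np.array([50, 83000.0])

print(np.linalg.norm(a - b))   # ~35000: the distance is almost entirely the Salary gap
print((a[0] - b[0]) ** 2)      # Age contribution:    529
print((a[1] - b[1]) ** 2)      # Salary contribution: 1.225e+09
```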
v To do this, there are primarily two methods called Standardisation and Normalisation.
v The result of standardisation (or Z-score normalisation) is that the features are rescaled so that their
mean and standard deviation are 0 and 1, respectively. The equation is: x' = (x - μ) / σ, where μ is the
mean and σ is the standard deviation of the feature.
v Rescaling feature values in this way is useful for the optimization algorithms, such as gradient
descent, that are used within machine learning algorithms that weight inputs (e.g., regression and neural
networks).
v Rescaling is also used for algorithms that use distance measurements, for example, K-Nearest-Neighbours
(KNN).
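v A minimal sketch of Z-score standardisation with NumPy (scikit-learn's StandardScaler computes the same transform); the small Age/Salary array is illustrative only.

```python
import numpy as np

# Illustrative Age and Salary columns (one row per person)
X = np.array([[27.0, 48000.0],
              [35.0, 58000.0],
              [50.0, 83000.0]])

# Z-score standardisation: x' = (x - mean) / std, applied column-wise
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

print(X_std.mean(axis=0))   # ~[0, 0]
print(X_std.std(axis=0))    # [1, 1]
```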
v Another common approach is the so-called Max-Min Normalization (Min-Max scaling).
v This technique re-scales the features so that their values lie between 0 and 1.
v For every feature, the minimum value of that feature gets transformed into 0, and the maximum value gets
transformed into 1. The general equation is: x' = (x - min(x)) / (max(x) - min(x)).
v In contrast to standardisation, we obtain smaller standard deviations through Max-Min
Normalisation. Let me illustrate this using the dataset above.
v Max-Min Normalisation typically allows us to transform data with varying scales so that no specific
dimension dominates the statistics, and it does not require making a very strong assumption about the
distribution of the data; this suits algorithms such as k-nearest neighbours and artificial neural networks.
v However, Normalisation does not handle outliers very well.
v In contrast, standardisation handles outliers better and facilitates convergence for some
computational algorithms, such as gradient descent.
v Therefore, we usually prefer standardisation over Min-Max Normalisation.
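v For comparison, a minimal sketch applying both rescalings to the same illustrative Age/Salary array (scikit-learn's MinMaxScaler and StandardScaler give equivalent results):

```python
import numpy as np

X = np.array([[27.0, 48000.0],
              [35.0, 58000.0],
              [50.0, 83000.0]])

# Max-Min normalisation: x' = (x - min) / (max - min), column-wise -> values in [0, 1]
X_minmax = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

# Standardisation: x' = (x - mean) / std, column-wise -> mean 0, std 1 per column
X_standard = (X - X.mean(axis=0)) / X.std(axis=0)

print(X_minmax)     # every column now lies between 0 and 1
print(X_standard)   # every column has mean 0 and standard deviation 1
```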