Gradient Descent
Abstract—The gradient descent algorithm is a type of optimization algorithm that is widely used to solve the model parameters of machine learning algorithms. Through continuous iteration, it obtains the gradient of the objective function, gradually approaches the optimal solution of the objective function, and finally obtains the minimum loss function and the related parameters. The gradient descent algorithm is frequently used in the solution process of logistic regression, which is a common binary classification approach. This paper compares and analyzes, through experiments, the differences between batch gradient descent and its derivative algorithms, the stochastic gradient descent algorithm and the mini-batch gradient descent algorithm, in terms of iteration number and loss function, and provides some suggestions on how to pick the best algorithm for the logistic regression binary classification task in machine learning.

Keywords—Gradient Descent, Logistic Regression, Machine Learning

I. INTRODUCTION

Gradient descent is an iterative optimization algorithm that finds the smallest value of a function. Through continuous iteration, it obtains the gradient of the objective function, gradually approaches the optimal solution of the objective function, and finally obtains the smallest loss function and the related parameters. The conventional gradient descent algorithm must traverse all training samples in every iteration, which leads to a large amount of computation and a slow training effect. As a result, two new gradient descent algorithms were developed: the stochastic gradient descent algorithm and the mini-batch gradient descent algorithm [1].

In a general sense, a logistic regression analysis model is a linear regression analysis model that is commonly used in data mining, economic forecasting, medicine, and other fields. It is essentially a conditional probability-based discrimination model. At present, logistic regression is mainly used in the medical field, for example to explore the main factors leading to a certain disease or to predict the possibility of the occurrence of the disease according to the factors affecting it [2].
II. GRADIENT DESCENT ALGORITHM

The gradient descent method is the most basic method for solving unconstrained optimization problems: it takes the negative gradient direction as the search direction towards the minimum of the objective function [3]. During the training of algorithms such as neural networks and regression analysis, gradient descent is commonly used to minimize the loss function.

A. Gradient descent algorithm derivation

The general linear regression problem can be expressed as:

$$h_\theta(x) = \theta_0 x_0 + \theta_1 x_1 + \cdots + \theta_n x_n = \sum_{i=0}^{n} \theta_i x_i = \theta^{T} x \tag{1}$$
The corresponding loss function is defined as:

$$J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2 \tag{4}$$

where m is the number of training samples, $x^{(i)}$ is the attribute (feature) vector of the i-th sample, and $y^{(i)}$ is the actual value of the i-th sample.
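As a minimal NumPy sketch, Equations (1) and (4) can be computed as follows; the function names and the assumption that the m samples are stacked as rows of a matrix X are ours, not the paper's:

```python
import numpy as np

def hypothesis(theta, X):
    # Equation (1): h_theta(x) = theta^T x, evaluated for every row (sample) of X
    return X @ theta

def loss(theta, X, y):
    # Equation (4): J(theta) = 1/(2m) * sum_i (h_theta(x^(i)) - y^(i))^2
    m = len(y)
    residual = hypothesis(theta, X) - y
    return residual @ residual / (2 * m)
```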
Take the partial derivative of the loss function $J(\theta)$ with respect to $\theta_j$ and set the partial derivative equal to zero:

$$\frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)} = 0 \tag{5}$$
According to Equation (5), the calculation formula for solving the parameter vector is:

$$\theta_j^{*} = \theta_j + \frac{1}{m} \sum_{i=1}^{m} \left( y^{(i)} - h_\theta(x^{(i)}) \right) x_j^{(i)} \tag{6}$$

The iterative algorithm of Equation (6) is also known as the batch gradient descent algorithm.
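A minimal sketch of one batch gradient descent update, under the same assumed NumPy layout as above; the explicit learning rate alpha is our addition for illustration, since Equation (6) folds the step size into the 1/m factor:

```python
def batch_gd_step(theta, X, y, alpha=0.01):
    # Equation (6): every training sample contributes to a single parameter update.
    m = len(y)
    gradient = X.T @ (X @ theta - y) / m   # (1/m) * sum_i (h_theta(x^(i)) - y^(i)) x^(i)
    return theta - alpha * gradient        # step against the gradient
```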
B. Improved gradient descent algorithm

The batch gradient descent algorithm needs to consider all samples in the training process. As a result, this algorithm will select the overall best path in each iteration, but it becomes computationally expensive and slow when the training set is large. The stochastic gradient descent algorithm instead updates the parameters with a single randomly selected sample per iteration. The advantages of batch gradient descent and stochastic gradient descent are thoroughly considered in the mini-batch gradient descent algorithm, in which a required number of samples is chosen from all the samples for training in each iteration. The iterative formula of mini-batch gradient descent is:

$$\theta_j^{*} = \theta_j + \frac{1}{64} \sum_{k=i}^{i+63} \left( y^{(k)} - h_\theta(x^{(k)}) \right) x_j^{(k)} \tag{8}$$
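A sketch of one mini-batch update with the batch size of 64 used in Equation (8); the consecutive-slice convention and the learning rate are assumptions for illustration:

```python
def minibatch_gd_step(theta, X, y, start, batch_size=64, alpha=0.01):
    # Equation (8): only 64 consecutive samples are used for this update,
    # instead of the full training set of m samples.
    Xb = X[start:start + batch_size]
    yb = y[start:start + batch_size]
    gradient = Xb.T @ (Xb @ theta - yb) / len(yb)
    return theta - alpha * gradient
```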
III. LOGISTIC REGRESSION

Logistic regression is a probabilistic regression model, which is a kind of generalized linear regression model [8]-[9]. It mainly reflects a mapping relationship between independent variables and dependent variables with dichotomous properties; that is, the known independent variables can be used to predict the values of a group of discrete variables.

1) Sigmoid function

Logistic regression maps any input to the interval [0, 1] by introducing the Sigmoid function. The regression predicted value is mapped through the Sigmoid function to complete the transformation from a numerical value to a probability value, resulting in the classification prediction. The Sigmoid function is as follows:
$$g(z) = \frac{1}{1 + e^{-z}} \tag{9}$$

It can be seen from the Sigmoid function that the range of the independent variable z is $(-\infty, +\infty)$ and the range of the function value is $[0, 1]$. The relationship between the independent and dependent variables is:

$$g(z)\begin{cases} \to 1, & z \to +\infty \\ = 0.5, & z = 0 \\ \to 0, & z \to -\infty \end{cases} \tag{10}$$
Because of the Sigmoid function's unique relationship between the independent and dependent variables, when making classification predictions we can consider all samples with a function value greater than or equal to 0.5 to be positive, and all samples with a function value less than 0.5 are classified as negative.
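A small sketch of this decision rule; the names and the NumPy layout are our assumptions:

```python
import numpy as np

def sigmoid(z):
    # Equation (9): g(z) = 1 / (1 + e^(-z)), maps any real z into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def predict(theta, X):
    # g(theta^T x) >= 0.5 is classified as positive (1), otherwise negative (0)
    return (sigmoid(X @ theta) >= 0.5).astype(int)
```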
2) Logistic regression solving

The prediction function $h_\theta(x)$ is obtained by mapping the linear regression output through the Sigmoid function:

$$h_\theta(x) = g(\theta^{T} x) = \frac{1}{1 + e^{-\theta^{T} x}} \tag{11}$$
Therefore, the probability of obtaining the positive class is $h_\theta(x)$ and the probability of obtaining the negative class is $1 - h_\theta(x)$, which can be written compactly as:

$$p(y \mid x; \theta) = \left( h_\theta(x) \right)^{y} \left( 1 - h_\theta(x) \right)^{1-y} \tag{12}$$

So the likelihood function over the m training samples is:

$$L(\theta) = \prod_{i=1}^{m} \left( h_\theta(x^{(i)}) \right)^{y^{(i)}} \left( 1 - h_\theta(x^{(i)}) \right)^{1-y^{(i)}} \tag{13}$$
Taking the logarithm on both sides of Equation (13) and applying the gradient descent method, the iterative formula for the parameters is obtained:

$$\theta_j^{*} = \theta_j - \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)} \tag{14}$$
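Putting Equations (11) and (14) together, a minimal training loop might look like the following sketch; the zero initialization, learning rate, and iteration count are assumed values for illustration, not settings prescribed by the paper:

```python
import numpy as np

def train_logistic_regression(X, y, alpha=0.001, iterations=8000):
    # X: (m, n) feature matrix (include a column of ones if theta_0 is wanted)
    # y: (m,) vector of 0/1 labels
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iterations):
        h = 1.0 / (1.0 + np.exp(-(X @ theta)))   # Equation (11)
        gradient = X.T @ (h - y) / m             # (1/m) * sum_i (h_theta(x^(i)) - y^(i)) x^(i)
        theta -= alpha * gradient                # Equation (14) update
    return theta
```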
IV. CASE ANALYSIS

The selected data set is one that predicts whether a student is admitted. The student's two test scores are the input independent variables, and the probability of admission is the output dependent variable. The data set consists of 100 samples; partial data are shown in Table I.

TABLE I. PARTIAL SAMPLE DATA

No.  Score 1            Score 2            Admitted
1    95.8615507093572   38.22527805795094  0
2    75.01365838958247  30.60326323428011  0
3    82.30705337399482  76.48196330235604  1
4    69.36458875970939  97.71869196188608  1
5    39.53833914367223  76.03681085115882  0

where 0 indicates that the student was not admitted and 1 indicates that the student was admitted.

1) Comparison of convergence speed of different gradient descent algorithms with the same number of iterations

Suppose the linear fitting function is:

$$h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 \tag{15}$$

According to the parameter update formulas of the three different gradient descent algorithms, 8000 iterations were carried out, and the output results are shown in Table II.

TABLE II. PARAMETER VALUES AFTER 8000 ITERATIONS

Algorithm   θ0           θ1          θ2
BGD         -0.00048126  0.00777571  0.00305656
SGD         -0.00049396  0.00782276  0.00268035
Mini-BGD    -0.00048275  0.00770747  0.00294887

Fig. 2. Diagram of loss versus number of iterations under SGD.

Fig. 5. SGD, iterations = 15000, learning rate = 0.000002.

With stochastic gradient descent, the number of iterations is nearly 10,000 times lower.
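The comparison in Table II can be reproduced with a short script along the following lines; the file name admission.csv, the random seed, and the learning rate (taken from the Fig. 5 caption) are our assumptions rather than details given in the paper:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grad(theta, Xb, yb):
    # Logistic-regression gradient averaged over the chosen samples
    return Xb.T @ (sigmoid(Xb @ theta) - yb) / len(yb)

def run(X, y, mode, alpha=0.000002, iters=8000, batch=64):
    rng = np.random.default_rng(0)
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        if mode == "bgd":                          # all samples per update, Equation (6)
            idx = np.arange(len(y))
        elif mode == "sgd":                        # one random sample per update
            idx = rng.integers(0, len(y), size=1)
        else:                                      # mini-batch of 64 samples, Equation (8)
            idx = rng.integers(0, len(y), size=batch)
        theta -= alpha * grad(theta, X[idx], y[idx])
    return theta

# Assumed file layout: one "score1,score2,admitted" row per sample, as in Table I.
data = np.loadtxt("admission.csv", delimiter=",")
X = np.column_stack([np.ones(len(data)), data[:, :2]])   # prepend x0 = 1 for theta_0
y = data[:, 2]
for mode in ("bgd", "sgd", "mini"):
    print(mode, run(X, y, mode))
```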
V. CONCLUSIONS

Different gradient descent methods can be used in realistic applications depending on the data set. The stochastic gradient descent method has more noise points when the training data set is small, and the fluctuation near the optimal point is evident, making it easy to fall into a local optimal solution. When the sample size of the training data set is large, the batch gradient descent algorithm's complexity is high and the convergence rate of the parameter iteration is slow; when the stochastic gradient descent method is used to fit the parameters, however, it typically takes only a few iterations to achieve a better fitting effect. Furthermore, the mini-batch gradient descent algorithm is clearly superior to the other algorithms in terms of loss function and convergence speed.

When using the gradient descent method to optimize parameters, the selection of the learning rate and the number of iterations is particularly important. The loss function will converge slowly if the learning rate is too low; if the learning rate is too high, it is likely to overshoot the global optimum. Likewise, if the number of iterations is too large, the loss function will fall into a local optimal solution; if the number of iterations is too low, the loss function will fluctuate a lot, which is not good for the final result.

ACKNOWLEDGMENT

This research is supported by the foundation project: Xi'an Shiyou University's Graduate Innovation and Practical Ability Training Program.

REFERENCES

[1] Chen X W, Lin X. Big data deep learning: challenges and perspectives[J]. IEEE Access, 2014, 2: 514-525.
[2] Langer D L, Van der Kwast T H, Evans A J, et al. Prostate cancer detection with multiparametric MRI: Logistic regression analysis of quantitative T2, diffusion-weighted imaging, and dynamic contrast-enhanced MRI[J]. Journal of Magnetic Resonance Imaging: An Official Journal of the International Society for Magnetic Resonance in Medicine, 2009, 30(2): 327-334.
[3] Qu Q, Zhang Y, Eldar Y C, et al. Convolutional phase retrieval via gradient descent[J]. IEEE Transactions on Information Theory, 2019, 66(3): 1785-1821.
[4] Zhou F, Cong G. On the convergence properties of a K-step averaging stochastic gradient descent algorithm for nonconvex optimization[J]. arXiv preprint arXiv:1708.01012, 2017.
[5] Manogaran G, Lopez D. Health data analytics using scalable logistic regression with stochastic gradient descent[J]. International Journal of Advanced Intelligence Paradigms, 2018, 10(1-2): 118-132.
[6] Huo Z, Huang H. Asynchronous mini-batch gradient descent with variance reduction for non-convex optimization[C]//Proceedings of the AAAI Conference on Artificial Intelligence. 2017, 31(1).
[7] Yazan E, Talu M F. Comparison of the stochastic gradient descent based optimization techniques[C]//2017 International Artificial Intelligence and Data Processing Symposium (IDAP). IEEE, 2017: 1-5.
[8] Sur P, Candès E J. A modern maximum-likelihood theory for high-dimensional logistic regression[J]. Proceedings of the National Academy of Sciences, 2019, 116(29): 14516-14525.
[9] Kuha J, Mills C. On group comparisons with logistic regression models[J]. Sociological Methods & Research, 2020, 49(2): 498-525.