Comparative Analysis of Optimizers in Deep Neural Networks
Abstract:- The choice of optimizer in a deep neural network impacts the accuracy of the model. Deep learning comes under the umbrella of parametric approaches; however, it tries to relax as many assumptions as possible. The process of obtaining parameters from the data is gradient descent, and gradient descent is the optimizer of choice in neural networks and in many other machine learning algorithms. The classical stochastic gradient descent (SGD) and SGD with momentum used in deep neural networks had several challenges, which adaptive learning optimizers attempted to resolve. Adaptive learning algorithms such as RMSprop, Adagrad and Adam, wherein a learning rate is computed for each parameter, were further developments towards a better optimizer. The Adam optimizer, a combination of RMSprop and momentum, has recently been observed to be a default choice for deep neural networks. Though Adam has gained popularity since its introduction, there are claims that report convergence problems with the Adam optimizer, and it is also advocated that SGD with momentum gives better performance compared to Adam. This paper presents a comparative analysis of the SGD, SGD with momentum, RMSprop, Adagrad and Adam optimizers on the Seattle weather dataset. The dataset was processed assuming that the Adam optimizer, being the default choice preferred by many, would prove to be the better optimizer; however, SGD with momentum proved to be the unsurpassed optimizer for this particular dataset.

Keywords:- Gradient Descent, SGD with Momentum, RMSprop, Adagrad, Adam.

I. INTRODUCTION

Training deep learning models is iterative and requires an initial point to be specified to start with, and it is this initialization that strongly affects most algorithms [2]. The classical Stochastic Gradient Descent (SGD) [3] and SGD with momentum have a proven track record of their suitability for learning deep neural networks. Enhancement of existing techniques is inevitable, and so came the set of adaptive learning methods.

Adaptive learning methods were developed over a period of time to claim their supremacy over classical SGD and SGD with momentum. However, several studies [4][5][6] show that SGD with momentum proved comparatively better than the adaptive learning methods, in particular the Adam optimizer, which tends to be a default choice.

The paper aims at analyzing the performance of a deep neural network by applying different optimizers to the chosen dataset.

The dataset is divided into a training set and a test set. The deep neural network is trained on the training data and tested on the test data.

The paper does not cover the underlying data pre-processing and deep neural network in detail; the focus here is on minimizing the training and validation loss and observing the test loss while changing optimizers. The optimizers used for the comparative study in this paper are SGD, RMSprop, Adagrad, SGD with momentum and Adam.

II. DATA AND DATA PRE-PROCESSING
The elaborated details of the data pre-processing for the Seattle weather dataset and of the deep neural architecture presented in the next section can be referred to at [8].

III. DNN ARCHITECTURE DESIGN
The overall structure of the deep neural network, organized into layers to study the impact of different optimizers, is presented here. A deep sequential model is used; its summary is shown in Table 2. There are six input features for the Seattle weather dataset. The shape of the weights depends upon the shape of the input. The target variable is binary, with output either 0 or 1. The ReLU [9] activation function is used at the hidden layers and the sigmoid function is used at the output layer. Weights are initialized using a uniform initializer.
The model is compiled by setting the learning rate to 0.001, which is chosen by observing the learning curve obtained by plotting the objective function as a function of time. As the problem belongs to the class of binary classification, the loss is calculated using cross entropy. The batch size is set to 64 and the number of epochs to 10.
The data is scaled and split into training and test data. The model is initialized, and then the model is fit with the different optimizers to analyze the performance with respect to each optimizer under study.
TABLE 2: MODEL SUMMARY

Layer (type)       Output Shape    Param #
dense_1 (Dense)    (None, 6)       42
dense_2 (Dense)    (None, 4)       28
dense_3 (Dense)    (None, 1)       5

Total params: 75
Trainable params: 75
Non-trainable params: 0
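The paper does not list code, but the model summary above matches the Keras Sequential format; the following is a minimal sketch consistent with Table 2 and the compilation settings described in this section, assuming TensorFlow/Keras (the framework is not named in the paper) and using placeholder random data in place of the pre-processed Seattle weather features.

```python
# Minimal sketch of the DNN summarized in Table 2 (assumes TensorFlow/Keras).
# Placeholder random arrays stand in for the pre-processed Seattle weather data
# (6 input features, binary target); they are not the paper's actual data.
import numpy as np
import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense

X_train = np.random.rand(800, 6)
y_train = np.random.randint(0, 2, size=(800, 1))
X_test = np.random.rand(200, 6)
y_test = np.random.randint(0, 2, size=(200, 1))

def build_model(optimizer):
    # Dense(6) -> Dense(4) -> Dense(1): 42 + 28 + 5 = 75 trainable parameters.
    model = Sequential([
        Dense(6, activation="relu", kernel_initializer="random_uniform", input_shape=(6,)),
        Dense(4, activation="relu", kernel_initializer="random_uniform"),
        Dense(1, activation="sigmoid", kernel_initializer="random_uniform"),
    ])
    # Binary classification, so cross-entropy loss.
    model.compile(optimizer=optimizer, loss="binary_crossentropy", metrics=["accuracy"])
    return model

# Learning rate 0.001, batch size 64, 10 epochs, as described above;
# the 20% validation split is an assumption, not stated in the paper.
model = build_model(tf.keras.optimizers.SGD(learning_rate=0.001))
model.fit(X_train, y_train, validation_split=0.2, batch_size=64, epochs=10)
test_loss, test_acc = model.evaluate(X_test, y_test)
```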
IV. GRADIENT DESCENT

Given a function y = f(x), where x and y are real numbers, the derivative dy/dx of the function f(x) gives the slope of f(x) at a point x. The derivative is useful in minimizing the function as it tells how a small change in the input x makes a corresponding change in the output y. To reduce f(x), we can move x in small steps in the opposite direction of the derivative. This technique is known as Gradient Descent [10].

Gradient descent minimizes an objective function by updating the model's parameters in the opposite direction of the gradient. When the derivative of the function f(x) is zero, it provides no information about the direction to move; such a point is known as a critical point [2]. So, a critical point is a point with slope zero. When the critical point is lower than the neighboring points, it is a local minimum. When the critical point is higher than the neighboring points, it is a local maximum. When the critical point has both higher and lower points in its neighborhood, it is called a saddle point.

Gradient descent is effective for training neural networks, making small local moves towards the global solution. In gradient descent the weights are updated incrementally after each epoch. There are limits on the performance of any optimization algorithm designed for neural networks [11]. There are variants of gradient descent [12], and in this paper we discuss the SGD, SGD with momentum, RMSprop, Adagrad and Adam optimizers, analyzing their performance in terms of test accuracy.

V. OPTIMIZERS

For a large training set, Stochastic Gradient Descent (SGD) [13] is considered a good learning algorithm to train neural networks [10]. It updates the parameters using a single or very few training examples, where the new parameter value is given by eq. (1); here xi and yi are drawn from the training set and α is the learning rate. SGD helps to reduce the variance and leads to stable convergence.

θ = θ − α∇J(θ; xi, yi)        (1)

If the objective is shallow, SGD may tend to oscillate. This problem can be overcome by adding momentum to SGD, where v is the current velocity and γ ∈ (0, 1] determines how many of the previous gradients are incorporated into the current update.

v = γv + α∇J(θ; xi, yi)        (2)

θ = θ − v        (3)

While implementing SGD with momentum, the value of momentum is set to 0.9 during the experiment.
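To make eqs. (1)-(3) concrete, here is a small NumPy sketch of the plain SGD step and the momentum step; grad_J, the toy objective and the step size are illustrative placeholders, while the momentum value 0.9 mirrors the setting used in the experiment.

```python
# Sketch of the SGD update (eq. 1) and SGD with momentum (eqs. 2-3).
# grad_J is a placeholder for the gradient of the loss J with respect to the
# parameters theta, evaluated on one training example (or mini-batch) (xi, yi).
import numpy as np

def sgd_step(theta, grad_J, xi, yi, alpha=0.01):
    return theta - alpha * grad_J(theta, xi, yi)              # eq. (1)

def sgd_momentum_step(theta, v, grad_J, xi, yi, alpha=0.01, gamma=0.9):
    v = gamma * v + alpha * grad_J(theta, xi, yi)             # eq. (2)
    theta = theta - v                                         # eq. (3)
    return theta, v

# Toy check on J(theta) = ||theta||^2 / 2, whose gradient is theta itself.
grad_J = lambda theta, xi, yi: theta
theta, v = np.ones(3), np.zeros(3)
for _ in range(500):
    theta, v = sgd_momentum_step(theta, v, grad_J, None, None)
print(theta)   # approaches the minimum at the origin
```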
Adagrad [14] adapts the learning rates of all model parameters by scaling them inversely proportional to the square root of the sum of all the historical squared values of the gradient. While training DNN models, this accumulation of squared gradients from the beginning of training can cause a premature and excessive decrease in the effective learning rate.
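A NumPy sketch of the Adagrad accumulation described above (a standard textbook form of the update, shown only for illustration; the step size, epsilon and toy objective are placeholder values, not settings from the paper):

```python
# Adagrad sketch: accumulate squared gradients and scale each parameter's
# step inversely proportional to the square root of that running sum.
import numpy as np

def adagrad_step(theta, grad, accum, alpha=0.01, eps=1e-8):
    accum = accum + grad ** 2                      # running sum of squared gradients
    theta = theta - alpha * grad / (np.sqrt(accum) + eps)
    return theta, accum

theta = np.ones(3)
accum = np.zeros(3)
for _ in range(100):
    grad = theta                                   # gradient of J(theta) = ||theta||^2 / 2
    theta, accum = adagrad_step(theta, grad, accum)
```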
The Adam optimizer [16], short for 'adaptive moments', is considered a variant of RMSprop and momentum with a few variations. It is computationally efficient and requires little memory.
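For illustration, the standard Adam update from [16] can be sketched in NumPy as follows; it combines a momentum-like first-moment estimate with an RMSprop-like second-moment estimate. The hyperparameter values shown are the commonly cited defaults, not settings reported in this paper.

```python
# Adam sketch: exponentially decaying averages of the gradient (first moment)
# and of the squared gradient (second moment), with bias correction.
import numpy as np

def adam_step(theta, grad, m, v, t, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    m = beta1 * m + (1 - beta1) * grad             # first moment (momentum-like)
    v = beta2 * v + (1 - beta2) * grad ** 2        # second moment (RMSprop-like)
    m_hat = m / (1 - beta1 ** t)                   # bias-corrected estimates
    v_hat = v / (1 - beta2 ** t)
    theta = theta - alpha * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

theta = np.ones(3)
m, v = np.zeros(3), np.zeros(3)
for t in range(1, 101):
    grad = theta                                   # gradient of J(theta) = ||theta||^2 / 2
    theta, m, v = adam_step(theta, grad, m, v, t)
```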
A. SGD and SGD with Momentum
Table 3 shows the results for SGD and SGD with momentum.

Optimizer            Learning Rate   Momentum   Test Loss   Test Accuracy   Model Training Time (Sec)
SGD                  0.01            -          0.2160      0.9337          2.54
SGD                  0.001           -          0.2007      0.9364          2.34
SGD with Momentum    0.01            0.9        0.0115      0.9992          2.11
It is observed that there is no significant change in model performance with the change in learning rate from 0.01 to 0.001 in SGD. The time taken to train the model is comparatively less with learning rate 0.001.

In the case of SGD with momentum, a significant increase in test accuracy is observed compared to SGD, and the time taken for training is also the lowest. Figures 1, 2 and 3 show the accuracy and loss with respect to the training and validation data over 10 epochs while the model is being trained.
FIGURE 2: MODEL ACCURACY AND MODEL LOSS WITH LEARNING RATE 0.001

Figures 4, 5 and 6 show the accuracy and loss for the training and validation data for the Adagrad, RMSprop and Adam optimizers respectively. The blue line indicates the training data and the orange line indicates the validation data in each of the figures above.
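As a sketch of how the comparison reported in Table 3 and the figures could be reproduced, the loop below fits the same network with each optimizer under study and records test loss, test accuracy and wall-clock training time. It reuses build_model and the placeholder data arrays from the earlier sketch, and the learning rates are assumptions based on the values stated in this paper.

```python
# Fit the same network with each optimizer under study and record
# test loss, test accuracy and wall-clock training time (cf. Table 3).
import time
import tensorflow as tf

optimizers = {
    "SGD": tf.keras.optimizers.SGD(learning_rate=0.01),
    "SGD with momentum": tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9),
    "RMSprop": tf.keras.optimizers.RMSprop(learning_rate=0.001),
    "Adagrad": tf.keras.optimizers.Adagrad(learning_rate=0.001),
    "Adam": tf.keras.optimizers.Adam(learning_rate=0.001),
}

for name, opt in optimizers.items():
    model = build_model(opt)                       # fresh weights for each run
    start = time.time()
    model.fit(X_train, y_train, validation_split=0.2,
              batch_size=64, epochs=10, verbose=0)
    elapsed = time.time() - start
    loss, acc = model.evaluate(X_test, y_test, verbose=0)
    print(f"{name}: test loss {loss:.4f}, test accuracy {acc:.4f}, time {elapsed:.2f}s")
```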
REFERENCES