
Milestone 1

Computational Intelligence
Names
Omar Emad EL-din Helmy 1900808

Aml Gamal Mohamed 1900339

Aya Ahmed Helmy 1900201

Under the supervision of
Dr. Hossam Eldin Hassan
Eng. Yehia Zakr
CONTENTS

Names
Problem definition and importance
Methods and Algorithms
    1- Stochastic Gradient Descent
    2- Gradient Descent with Momentum
    3- Adagrad optimizer
    4- Adam optimizer
    5- Gradient descent with adaptive learning rate
Experimental Results
    1- SGD
    2- Gradient descent with momentum
    3- Adagrad
    4- Adam
    5- Gradient descent with adaptive learning rate
Discussions
    Test accuracy with different optimizers
    Training loss
    Optimizer performance
    Learning rate adaptation
Appendix A
PROBLEM DEFINITION AND IMPORTANCE

The concept of optimization is integral to machine learning. Most machine learning


models use training data to learn the relationship between input and output data.
The models can then be used to make predictions about trends or classify new
input data. This training is a process of optimization, as each iteration aims to
improve the model’s accuracy and lower the margin of error.

The process of optimization aims to lower the risk of errors or loss from these
predictions and improve the accuracy of the model. Without the process of
optimization, there would be no learning and development of algorithms. So, the very
premise of machine learning relies on a form of function optimization.

Hyperparameters, such as the learning rate or the number of clusters in a classification task, are a way of refining a model to fit a specific dataset, and setting them well is vital to achieving an accurate model. The selection of the right model configuration has a direct impact on the accuracy of the model and its ability to achieve specific tasks.

METHODS AND ALGORITHMS
1-STOCHASTIC GRADIENT DESCENT

Stochastic Gradient Descent (SGD) is a variant of the Gradient Descent algorithm that is used for optimizing machine learning models. It addresses the computational inefficiency of traditional Gradient Descent methods when dealing with large datasets in machine learning projects.
Advantages: SGD is faster than other variants of Gradient Descent; it is memory-efficient and can handle large datasets.
Disadvantages: The updates in SGD are noisy and have high variance, so SGD may require more iterations to converge to the minimum.
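As a minimal illustration (not the report's code), a single SGD update can be sketched in NumPy as follows; the quadratic toy loss used here is purely an assumption for demonstration:

import numpy as np

def sgd_step(w, grad, lr=0.01):
    """One plain SGD update: move the weights a small step against the gradient."""
    return w - lr * grad

# Toy usage on f(w) = ||w||^2, whose gradient is 2w; the minimum is at the origin.
w = np.array([1.0, -2.0])
for _ in range(100):
    w = sgd_step(w, 2 * w, lr=0.1)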
2-GRADIENT DESCENT WITH MOMENTUM

Momentum-based gradient descent adds a momentum term to the update rule. The momentum term is computed as a moving average of the past gradients, which helps to accelerate the optimization process.
Advantages: Helps escape local minima and saddle points; smooths the noisy updates of plain SGD and speeds up convergence.
Disadvantage: If the momentum is too large, the updates can overshoot and oscillate back and forth around the minimum.
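A rough sketch of one common form of the momentum update (the velocity convention below is an assumption; libraries differ slightly in how the learning rate enters):

import numpy as np

def momentum_step(w, v, grad, lr=0.01, beta=0.9):
    """Momentum update: v is an exponentially weighted sum of past gradients."""
    v = beta * v + grad   # accumulate gradient history
    w = w - lr * v        # step along the smoothed direction
    return w, v

# Toy usage on f(w) = ||w||^2 (gradient 2w), starting with zero velocity.
w, v = np.array([1.0, -2.0]), np.zeros(2)
for _ in range(100):
    w, v = momentum_step(w, v, 2 * w, lr=0.05)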
3-ADAGRAD OPTIMIZER

Adagrad (Adaptive Gradient Algorithm) is an optimization algorithm that adapts the learning rate of each parameter during training based on the historical gradient information.
Advantages: Adjusts the learning rate individually for each parameter.
Disadvantages: Tends to reduce the learning rates over time, potentially leading to slow convergence, and its performance can be sensitive to the choice of the initial learning rate.
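A minimal sketch of the per-parameter Adagrad update; the accumulator variable and toy loss are illustrative, not the report's implementation:

import numpy as np

def adagrad_step(w, acc, grad, lr=0.001, eps=1e-7):
    """Adagrad: scale each parameter's step by its accumulated squared gradients."""
    acc = acc + grad ** 2                      # per-parameter history of squared gradients
    w = w - lr * grad / (np.sqrt(acc) + eps)   # parameters with large past gradients get smaller steps
    return w, acc

# Toy usage on f(w) = ||w||^2 (gradient 2w).
w, acc = np.array([1.0, -2.0]), np.zeros(2)
for _ in range(100):
    w, acc = adagrad_step(w, acc, 2 * w, lr=0.1)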

4-ADAM OPTIMIZER

The Adam optimizer, short for Adaptive Moment Estimation, is a powerful optimization algorithm that automatically adjusts the learning rate for each parameter individually.
Advantages: Faster convergence; well suited to large models.
Disadvantages: Higher memory usage, and the updates made by Adam are less interpretable compared to other optimizers.
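A compact sketch of the Adam update with bias correction; the variable names (m, v, t) follow the usual presentation of the algorithm rather than the report's notebook:

import numpy as np

def adam_step(w, m, v, grad, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-7):
    """Adam: bias-corrected moving averages of the gradient and its square."""
    m = beta1 * m + (1 - beta1) * grad        # first moment (mean of gradients)
    v = beta2 * v + (1 - beta2) * grad ** 2   # second moment (mean of squared gradients)
    m_hat = m / (1 - beta1 ** t)              # bias correction; t counts steps from 1
    v_hat = v / (1 - beta2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

# Toy usage on f(w) = ||w||^2 (gradient 2w).
w, m, v = np.array([1.0, -2.0]), np.zeros(2), np.zeros(2)
for t in range(1, 101):
    w, m, v = adam_step(w, m, v, 2 * w, t, lr=0.1)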
5-GRADIENT DESCENT WITH ADAPTIVE LEARNING RATE.
Gradient descent is an iterative first-order optimization algorithm used to find a local minimum/maximum of a given function. This method is commonly used in machine learning (ML) and deep learning (DL) to minimize a cost/loss function (e.g. in a linear regression), due to its importance and ease of implementation. Gradient descent depends on several parameters, one of them being the learning rate. One way to update the learning rate throughout the training process is the adaptive learning rate:

It is an optimization technique for gradient descent that aims to reach a suitable learning rate. The whole idea is to shrink the learning rate whenever the loss starts to increase, so the iterates settle toward the minimum. It can be done with the following steps (a code sketch of this procedure follows the steps):

1- First, set a large initial value for the learning rate η and choose a decay factor β = 0.7.

2- Then compute the new gradient descent step:

Xₙ₊₁ = Xₙ − η ∇f(Xₙ)

3- Check whether the loss of the new step is larger than the previous one; if so, multiply the learning rate by β, update it, and go back to step 2, repeating until convergence:

if f(Xₙ₊₁) > f(Xₙ), then η = β · η and go to step 2 again.
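The steps above can be sketched as the loop below. This is one reading of the procedure (the step is always taken and only the learning rate is decayed), with a toy quadratic used purely for illustration:

import numpy as np

def adaptive_lr_gd(f, grad_f, x, lr=1.5, beta=0.7, steps=100):
    """Gradient descent that multiplies the learning rate by beta whenever the loss increases."""
    prev_loss = np.inf
    for _ in range(steps):
        x = x - lr * grad_f(x)      # step 2: gradient descent update
        loss = f(x)
        if loss > prev_loss:        # step 3: loss went up, so decay the learning rate
            lr = beta * lr
        prev_loss = loss
    return x, lr

# Toy usage: minimize f(x) = x^2, deliberately starting from an oversized learning rate.
x_opt, final_lr = adaptive_lr_gd(lambda x: x * x, lambda x: 2 * x, x=5.0)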

EXPERIMENTAL RESULTS
All models were trained with:
• num_epochs = 5
• batch_size = 16

1- SGD
Using the default parameters of the optimizer function, which are listed below (a sketch of the full training setup follows the list):
• learning_rate = 0.01
• momentum = 0.0
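The actual notebook is linked in Appendix A; as a stand-in, the sketch below shows how such a trial could be set up in Keras, which the parameter names above suggest was used. The MLP architecture, preprocessing, and MNIST loading here are illustrative assumptions, not the report's exact code:

import tensorflow as tf

# Illustrative data and model; the report's actual architecture may differ.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])

# SGD with the listed defaults; later trials swap in a different optimizer here.
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.0)

model.compile(optimizer=optimizer,
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=5, batch_size=16,
          validation_data=(x_test, y_test))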

TRIAL 1:

TRIAL 2:

RESULT:

2- GRADIENT DESCENT WITH MOMENTUM
Using the default learning rate of the optimizer function:
• learning_rate = 0.01
and changing the momentum to 0.9 (see the optimizer snippet below):
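Assuming the Keras setup sketched in the SGD section, this trial would only change the optimizer line, for example:

import tensorflow as tf

# Same SGD optimizer as before, but with the momentum term switched on.
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9)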

TRIAL 1:

TRIAL 2:

RESULT:

3- ADAGRAD
Using the default parameters of the optimizer function, which are listed below (see the optimizer snippet after the list):
• learning_rate=0.001
• initial_accumulator_value=0.1
• epsilon=1e-7
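Under the same assumed Keras setup, these defaults correspond to an instantiation like:

import tensorflow as tf

# Adagrad with the listed default hyperparameters.
optimizer = tf.keras.optimizers.Adagrad(learning_rate=0.001,
                                        initial_accumulator_value=0.1,
                                        epsilon=1e-7)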

TRIAL 1:

TRIAL 2:

RESULT:

4- ADAM
Using the default parameters of the optimizer function, which are listed below (see the optimizer snippet after the list):
• learning_rate=0.001
• beta_1=0.9
• beta_2=0.999
• epsilon=1e-7
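Again assuming the Keras setup from the SGD section, the listed defaults map onto:

import tensorflow as tf

# Adam with the listed default hyperparameters.
optimizer = tf.keras.optimizers.Adam(learning_rate=0.001, beta_1=0.9,
                                     beta_2=0.999, epsilon=1e-7)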

TRIAL 1:

TRIAL 2:

RESULT:

5- GRADIENT DESCENT WITH ADAPTIVE LEARNING RATE


Setting the initial_learning_rate to 0.01, the previous loss to infinity, and beta to 0.7 (the training loop is sketched below) leads to the following results:
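One way this schedule could be wired into training is a manual loop like the sketch below. It reuses the model and data from the SGD setup sketch, and the batch-wise loss comparison is an assumption about how the notebook applies the rule:

import numpy as np
import tensorflow as tf

# Reuses `model`, `x_train`, `y_train` from the setup sketch in the SGD section.
train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train)).batch(16)
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()

lr, beta, prev_loss = 0.01, 0.7, np.inf
optimizer = tf.keras.optimizers.SGD(learning_rate=lr)

for epoch in range(5):
    for x_batch, y_batch in train_dataset:
        with tf.GradientTape() as tape:
            loss = loss_fn(y_batch, model(x_batch, training=True))
        grads = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(grads, model.trainable_variables))

        if float(loss) > prev_loss:              # loss increased: decay the learning rate
            lr *= beta
            optimizer.learning_rate.assign(lr)
        prev_loss = float(loss)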

TRIAL 1:

RESULT:

DISCUSSIONS
TEST ACCURACY WITH DIFFERENT OPTIMIZERS:
• SGD: 97.03%
• GD with momentum: 98.12%
• Adagrad: 93.86%
• Adam: 97.91%
• GD with adaptive learning rate: 97.28%

TRAINING LOSS:
• GD with momentum seems to have the lowest final loss.

OPTIMIZER PERFORMANCE:
• All optimizers achieved high accuracy on the MNIST dataset (>93%), indicating the
model's effectiveness.
• GD with momentum reached the highest accuracy (98.12%) and lowest final loss,
suggesting better optimization compared to others.

• Adam and GD with adaptive learning rate also achieved similar high accuracy.
• Adagrad showed slightly lower accuracy.

LEARNING RATE ADAPTATION:


• Implementing a custom learning rate scheduler with adaptive decay helped to improve
optimization for SGD.
• This resulted in faster training time and slightly better accuracy compared to standard
SGD.

APPENDIX A:
The code is in the notebook at the following link:
https://colab.research.google.com/drive/1PFgIRXACd-9X-O_oWbMPnNNSLqo9UFJX?usp=sharing

