MLP Encoder Decoder
Computational Intelligence
Names
Omar Emad EL-din Helmy 1900808
Under the supervision of
Dr. Hossam Eldin Hassan
Eng. Yehia Zakr
CONTENTS
Names
Problem Definition and Importance
Methods and Algorithms
Experimental Results
1- SGD
2- Gradient Descent with Momentum
3- Adagrad
4- Adam
5- GD with Adaptive Learning Rate
Discussions
Appendix A
PROBLEM DEFINITION AND IMPORTANCE
Optimization aims to reduce the error, or loss, of a model's predictions and to improve the
model's accuracy. Without optimization there would be no learning and no improvement of
these algorithms; the very premise of machine learning relies on some form of function
optimization.
METHODS AND ALGORITHMS
1-STOCHASTIC GRADIENT DESCENT
Stochastic Gradient Descent (SGD) is a variant of the Gradient Descent algorithm used for
optimizing machine learning models. It addresses the computational inefficiency of traditional
Gradient Descent on large datasets, because each update is computed from a single sample or a
small mini-batch rather than the full training set.
Advantages: SGD is faster than other variants of Gradient Descent, is memory-efficient, and can
handle large datasets.
Disadvantages: The updates in SGD are noisy and have high variance, so SGD may require more
iterations to converge to the minimum.
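To make the update concrete, the following is a minimal NumPy sketch of one SGD step on a
mini-batch (this is not the report's notebook code; the linear model, squared-error loss, and
variable names are illustrative assumptions):

    import numpy as np

    def sgd_step(w, x_batch, y_batch, lr=0.01):
        # Gradient of the mean squared error of a linear model on this mini-batch
        preds = x_batch @ w
        grad = 2.0 * x_batch.T @ (preds - y_batch) / len(y_batch)
        # Plain SGD: move against the mini-batch gradient
        return w - lr * grad

    # Toy usage with random data, just to show the call
    rng = np.random.default_rng(0)
    w = rng.normal(size=3)
    x_batch, y_batch = rng.normal(size=(16, 3)), rng.normal(size=16)
    w = sgd_step(w, x_batch, y_batch)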
2-GRADIENT DESCENT WITH MOMENTUM
Momentum-based gradient descent adds a momentum term to the update rule. The momentum
term is computed as a moving average of the past gradients, and it helps to accelerate the
optimization process.
Advantages: It helps the optimizer escape local minima and saddle points and damps the
oscillations caused by noisy updates.
Disadvantage: If the momentum is too large, the updates can overshoot and swing back and forth
around a local minimum.
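A minimal sketch of the momentum update, assuming the common formulation v = beta*v + grad
and w = w - lr*v (the toy quadratic loss and all names are illustrative):

    import numpy as np

    def momentum_step(w, v, grad, lr=0.01, beta=0.9):
        # The velocity is an exponentially decaying moving average of past gradients
        v = beta * v + grad
        # Parameters move along the accumulated velocity instead of the raw gradient
        w = w - lr * v
        return w, v

    # Toy quadratic loss f(w) = ||w||^2, whose gradient is 2w
    w, v = np.array([1.0, -2.0]), np.zeros(2)
    for _ in range(100):
        w, v = momentum_step(w, v, grad=2 * w)
    print(w)  # approaches the minimum at the origin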
3-ADAGRAD OPTIMIZER
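Adagrad adapts the learning rate of each parameter individually by dividing the global learning
rate by the square root of the accumulated sum of that parameter's past squared gradients, so
parameters that have received large gradients take smaller steps. A minimal sketch of the update
(variable names are illustrative):

    import numpy as np

    def adagrad_step(w, acc, grad, lr=0.001, eps=1e-7):
        # Accumulate the squared gradients seen so far for each parameter
        acc = acc + grad ** 2
        # Parameters with a large accumulated history get a smaller effective step
        w = w - lr * grad / (np.sqrt(acc) + eps)
        return w, acc

    # Toy usage; 0.1 mirrors the initial_accumulator_value used in the experiments
    w, acc = np.array([1.0, -2.0]), np.full(2, 0.1)
    w, acc = adagrad_step(w, acc, grad=2 * w)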
4-ADAM OPTIMIZER
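Adam combines a momentum-style moving average of the gradients (first moment) with an
RMSProp-style moving average of the squared gradients (second moment), plus bias correction
for their zero initialization. A minimal sketch of one Adam step (variable names are illustrative):

    import numpy as np

    def adam_step(w, m, v, grad, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-7):
        # First moment: exponential moving average of the gradients
        m = beta1 * m + (1 - beta1) * grad
        # Second moment: exponential moving average of the squared gradients
        v = beta2 * v + (1 - beta2) * grad ** 2
        # Bias correction compensates for m and v starting at zero
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
        return w, m, v

    # Toy quadratic loss f(w) = ||w||^2, gradient 2w
    w, m, v = np.array([1.0, -2.0]), np.zeros(2), np.zeros(2)
    for t in range(1, 11):
        w, m, v = adam_step(w, m, v, grad=2 * w, t=t)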
5-GD WITH ADAPTIVE LEARNING RATE
The parameters are updated with the standard gradient descent rule
X_{n+1} = X_n - η∇f(X_n)
and after every step we check whether the loss of the new step is larger than the previous loss;
if it is, we multiply the learning rate η by a decay factor beta and repeat the step, continuing
until the loss converges.
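One possible reading of this procedure, sketched on a toy quadratic loss (the starting learning
rate and beta = 0.5 are illustrative assumptions, not values from the report):

    import numpy as np

    def loss(w):
        return float(np.sum(w ** 2))        # toy quadratic loss, gradient 2w

    w, lr, beta = np.array([3.0, -1.0]), 1.5, 0.5
    prev_loss = loss(w)
    for _ in range(50):
        w_new = w - lr * 2 * w              # plain gradient descent step
        new_loss = loss(w_new)
        if new_loss > prev_loss:
            lr *= beta                      # loss went up: shrink the learning rate
        else:
            w, prev_loss = w_new, new_loss  # loss went down: accept the step
    print(w, lr)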
EXPERIMENTAL RESULTS
All models were trained with the following settings (a sketch of a matching training setup follows this list):
• num_epochs = 5
• batch_size = 16
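As a hedged illustration of a training setup matching these settings, a Keras MLP on MNIST
could be compiled and fitted as below; only epochs = 5, batch_size = 16, and the MNIST dataset
come from the report, while the architecture, loss, and variable names are assumptions (the
actual notebook is linked in Appendix A):

    import tensorflow as tf

    (x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
    x_train, x_test = x_train / 255.0, x_test / 255.0

    # Illustrative MLP classifier; the report's encoder-decoder model may differ
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(28, 28)),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])

    model.compile(optimizer=tf.keras.optimizers.SGD(),  # swapped per experiment
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.fit(x_train, y_train, epochs=5, batch_size=16,
              validation_data=(x_test, y_test))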
1- SGD
Using the default parameters of the optimizer, which are:
• learning_rate = 0.01
• momentum = 0.0
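These match the defaults of tf.keras.optimizers.SGD, so written out explicitly the optimizer for
this experiment would be constructed roughly as:

    import tensorflow as tf

    # Plain SGD: the Keras defaults are learning_rate=0.01 and momentum=0.0
    optimizer = tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.0)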
TRIAL 1:
TRIAL 2:
RESULT:
2- GRADIENT DESCENT WITH MOMENTUM
Using the default parameters of the optimizer, which are:
• learning_rate=0.01
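In Keras this is again tf.keras.optimizers.SGD but with a non-zero momentum; the report does
not state the momentum value, so the 0.9 below is an assumption:

    import tensorflow as tf

    # learning_rate comes from the report; momentum=0.9 is a common choice (assumed)
    optimizer = tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9)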
TRIAL 1:
TRIAL 2:
RESULT:
3- ADAGRAD
Using the default parameters of the optimizer, which are:
• learning_rate=0.001
• initial_accumulator_value=0.1
• epsilon=1e-7
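These are exactly the defaults of tf.keras.optimizers.Adagrad, so the optimizer would be
constructed roughly as:

    import tensorflow as tf

    optimizer = tf.keras.optimizers.Adagrad(learning_rate=0.001,
                                            initial_accumulator_value=0.1,
                                            epsilon=1e-7)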
TRIAL 1:
TRIAL 2:
RESULT:
4- ADAM
Using the default parameters of the optimizer, which are:
• learning_rate=0.001
• beta_1=0.9
• beta_2=0.999
• epsilon=1e-7
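These likewise match the defaults of tf.keras.optimizers.Adam; spelled out:

    import tensorflow as tf

    optimizer = tf.keras.optimizers.Adam(learning_rate=0.001,
                                         beta_1=0.9, beta_2=0.999,
                                         epsilon=1e-7)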
TRIAL 1:
TRIAL 2:
RESULT:
5- GD WITH ADAPTIVE LEARNING RATE
TRIAL 1:
RESULT:
DISCUSSIONS
TEST ACCURACY WITH DIFFERENT OPTIMIZERS:
• SGD: 97.03%
• GD with momentum: 98.12%
• Adagrad: 93.86%
• Adam: 97.91%
• GD with adaptive learning rate: 97.28%
TRAINING LOSS:
• GD with momentum seems to have the lowest final loss.
OPTIMIZER PERFORMANCE:
• All optimizers achieved high accuracy on the MNIST dataset (>93%), indicating the
model's effectiveness.
• GD with momentum reached the highest accuracy (98.12%) and lowest final loss,
suggesting better optimization compared to others.
• Adam and GD with adaptive learning rate also achieved similarly high accuracy (97.91% and 97.28%).
• Adagrad showed the lowest accuracy of the five optimizers (93.86%).
APPENDIX A:
The code is in the notebook at the following link:
https://colab.research.google.com/drive/1PFgIRXACd-9X-O_oWbMPnNNSLqo9UFJX?usp=sharing