
Optimizers: Lion vs Adam

Introduction
Deep Learning is a sub-field of machine learning that allows machines to process data in a manner loosely inspired by the human brain. The backbone of DL is a network of nodes connected to each other in layers, and a stack of these layers forms a neural network. Input data passes through several layers and is progressively refined to make accurate predictions. Ideally, a neural network takes data (features) in through an input layer and produces results from an output layer. Between these two layers is where the major processing and fine-tuning takes place, in the hidden layers. In this post, I am focusing on understanding the algorithms responsible for this fine-tuning between layers. These algorithms are called Optimizers.
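
To picture that flow of data, here is a minimal sketch in plain NumPy; the layer sizes, random inputs and ReLU activation are illustrative assumptions, not something prescribed by this post:

    import numpy as np

    # Toy forward pass: input layer -> one hidden layer -> output layer.
    rng = np.random.default_rng(0)
    x = rng.normal(size=(4,))            # input features
    W1 = rng.normal(size=(8, 4))         # weights: input -> hidden
    b1 = np.zeros(8)                     # biases for the hidden layer
    W2 = rng.normal(size=(1, 8))         # weights: hidden -> output
    b2 = np.zeros(1)                     # bias for the output layer

    hidden = np.maximum(0, W1 @ x + b1)  # hidden layer with ReLU activation
    output = W2 @ hidden + b2            # prediction from the output layer
    print(output)

The weights and biases in this sketch are exactly the attributes an optimizer adjusts during training.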

What are Optimizers?


Optimizers are algorithms that tweak certain attributes of your neural network, such as the weights and the learning rate, in order to reduce losses. We are touching four very technical terms from the Deep Learning world, so let's try and understand what they mean.
1. Weights: They control the strength of the connection between two consecutive nodes and decide how much one layer affects the next. This helps us understand how the input layer contributed to the results produced by the output layer.
2. Biases: These are constants that shift the level at which an activation function is triggered; that function is responsible for whether a neuron is activated or not. A bias acts like the constant term in a linear equation: an additional parameter that adjusts the output.
3. Learning Rate: It is important for deep learning models to be able to take in new and updated data and train on it. The learning rate is the step size of each parameter update, and it controls how quickly a model can adapt to change.
4. Losses: A loss measures how far off our predicted value is from the target value. It helps us understand the accuracy of the model.
Optimizers are used to minimize these differences, called losses, by adjusting the parameters discussed above.
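
To make those four terms concrete, here is a minimal sketch of a plain gradient descent update fitting a straight line; the data, the learning rate and the step count are illustrative assumptions chosen for this example:

    import numpy as np

    # Fit y = 2x + 1 with a single weight and bias.
    x = np.array([0.0, 1.0, 2.0, 3.0])
    y_true = 2.0 * x + 1.0

    w, b = 0.0, 0.0          # weight and bias, the parameters being tuned
    learning_rate = 0.1      # how big each update step is

    for step in range(100):
        y_pred = w * x + b
        loss = np.mean((y_pred - y_true) ** 2)       # how far off we are (mean squared error)
        grad_w = np.mean(2 * (y_pred - y_true) * x)  # gradient of the loss w.r.t. the weight
        grad_b = np.mean(2 * (y_pred - y_true))      # gradient of the loss w.r.t. the bias
        w -= learning_rate * grad_w                  # adjust the parameters
        b -= learning_rate * grad_b                  # to reduce the loss

    print(w, b, loss)  # w and b approach 2 and 1 as the loss shrinks

Every optimizer discussed below is, at heart, a smarter version of that final "adjust the parameters" step.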
There are several optimizers to pick from; some commonly used ones are the Gradient Descent optimizer, the Adam optimizer, and Stochastic Gradient Descent with Momentum. As a beginner I always thought that simply increasing the number of epochs would yield better results, but that is not true. Optimizers need to be picked keeping in mind which parameters we are tuning and how well they adapt to the amount of data we plan to feed the model. We are going to discuss one such optimizer, called "Adam".

What does Adam do?


Adam is an abbreviation for Adaptive Moment Estimation. It is a combination of two other optimizers, Stochastic Gradient Descent with Momentum and RMSprop. To understand what these two algorithms bring to the table, imagine a hill: if we are trying to get to its lowest point, the gradient descent algorithm does the job. We aim for the lowest point because that is where the losses are small. On the way down we may have to cross high ground as well; RMSprop decides how big each step must be for us to reach that point, neither too big nor too small. With SGD we navigate the hill in smaller portions and use momentum to keep us pointed in the right direction. Adam combines RMSprop's ability to adapt the learning rate with SGD with Momentum's ability to keep moving in the right direction. Its primary focus is to adjust per-parameter learning rates for better model accuracy. While Adam does a good job with clean data, it does not do very well with noisy data, i.e., the effective learning rate fluctuates heavily when it encounters noisy gradients. Researchers may have found just the right solution for that.

What does Lion do?


While Adam might seem like an ideal optimizer despite its negatives, researchers have recently come up with a new optimizer called the Lion optimizer (EvoLved Sign Momentum), which addresses the disadvantages of the Adam optimizer. This algorithm was discovered by Google Brain together with the University of California, Los Angeles (UCLA), and it has proven to be better than Adam in several ways. The Lion optimizer focuses on tracking momentum while leveraging the sign operation (which reduces every update to plus or minus one), helping the algorithm keep moving in a consistent direction despite noisy data. The simplicity of this algorithm also makes it memory efficient, since it keeps only a single momentum buffer. But don't let that simplicity make you question its accuracy; in several instances it has proven to perform better than Adam.
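
The Lion update is short enough to sketch directly. Below is a minimal NumPy version; the toy gradient, learning rate and weight decay value are illustrative assumptions, while the sign-of-momentum step and the single momentum buffer follow the algorithm as its authors describe it:

    import numpy as np

    def lion_step(theta, grad, m, lr=1e-4, beta1=0.9, beta2=0.99, weight_decay=0.0):
        # Every entry of the update is +1 or -1, regardless of gradient magnitude,
        # which keeps the step size uniform even when the gradients are noisy.
        update = np.sign(beta1 * m + (1 - beta1) * grad)
        theta = theta - lr * (update + weight_decay * theta)
        m = beta2 * m + (1 - beta2) * grad       # the single momentum buffer Lion maintains
        return theta, m

    # Toy usage on the same assumed quadratic loss as in the Adam sketch.
    theta = np.zeros(3)
    m = np.zeros_like(theta)
    for _ in range(100):
        grad = 2 * (theta - np.array([1.0, -2.0, 0.5]))
        theta, m = lion_step(theta, grad, m, lr=0.01)
    print(theta)

Compared with the Adam sketch, Lion stores one buffer instead of two and never divides by a squared-gradient estimate, which is where its memory savings and noise tolerance come from.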

Conclusion
In conclusion, selecting the right optimizer depends on more than one factor. Each kind of dataset may call for a specific type of optimizer, and a good deal of trial and error helps us understand how these algorithms behave and make better choices.
