Optimizers and Activation functions in Deep Learning
What is an optimizer?
Optimizers are algorithms or methods used to minimize an error function (loss function) or to maximize a measure of model performance. They are mathematical procedures that operate on the model's learnable parameters, i.e. the weights and biases, and they determine how the weights and the learning rate of a neural network should be changed to reduce the loss.
This post will walk you through the optimizers and some popular approaches.
Types of optimizers
Let’s learn about the different types of optimizers and how exactly they work to minimize the loss function.
1. Gradient Descent
2. Stochastic Gradient Descent (SGD)
3. Mini Batch Stochastic Gradient Descent (MB-SGD)
4. SGD with momentum
5. Nesterov Accelerated Gradient (NAG)
6. Adaptive Gradient (AdaGrad)
7. AdaDelta
8. RMSprop
9. Adam
1. Gradient Descent
Gradient descent is an optimization algorithm that tweaks the parameters iteratively to drive a given function towards a (local) minimum. It reduces the loss function step by step by moving in the direction opposite to that of steepest ascent, and it relies on the derivatives of the loss function to find the minimum. Plain (batch) gradient descent uses the entire training set to calculate the gradient of the cost function with respect to the parameters, which requires a large amount of memory and slows down the process.
Gradient descent update rule: θ = θ − η · ∇J(θ), where η is the learning rate and ∇J(θ) is the gradient of the loss with respect to the parameters θ.
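For illustration, here is a minimal NumPy sketch of one gradient descent step on a toy quadratic loss; the function names, the learning rate, and the loss are assumptions of this example, not part of the original post:

```python
import numpy as np

def gradient_descent_step(params, grad_fn, lr=0.1):
    """One batch gradient descent step: move against the gradient."""
    grad = grad_fn(params)          # gradient of the loss w.r.t. the parameters
    return params - lr * grad       # theta = theta - eta * grad

# Toy example: minimize J(theta) = ||theta||^2, whose gradient is 2 * theta
theta = np.array([3.0, -2.0])
for _ in range(100):
    theta = gradient_descent_step(theta, lambda p: 2 * p, lr=0.1)
print(theta)  # very close to [0, 0]
```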
SGD with momentum
Momentum formula: v_t = γ · v_(t−1) + η · ∇J(θ) and θ = θ − v_t, where γ is the momentum coefficient (typically around 0.9) and v_t is the running velocity of the updates.
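A hedged sketch of the momentum update on the same toy loss; again, all names and values are illustrative assumptions:

```python
import numpy as np

def momentum_step(params, velocity, grad_fn, lr=0.1, gamma=0.9):
    """SGD with momentum: the velocity accumulates an exponentially weighted average of past gradients."""
    grad = grad_fn(params)
    velocity = gamma * velocity + lr * grad   # v_t = gamma * v_{t-1} + eta * grad
    return params - velocity, velocity        # theta = theta - v_t

theta = np.array([3.0, -2.0])
v = np.zeros_like(theta)
for _ in range(100):
    theta, v = momentum_step(theta, v, lambda p: 2 * p)
print(theta)  # converges towards [0, 0], with smoother updates than plain SGD
```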
Advantages of SGD with momentum
1. Momentum helps to reduce the noise in the updates.
2. An exponentially weighted average is used to smooth the curve of the updates.
Disadvantage of SGD with momentum
1. An extra hyperparameter (the momentum coefficient γ) is added.
Both NAG and SGD with momentum algorithms work equally well and share the same advantages and
disadvantages.
RMSprop
Advantages of RMSprop
1. In RMSprop the learning rate is adjusted automatically, and a different effective learning rate is chosen for each parameter.
Disadvantages of RMSprop
1. Slow learning
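A minimal sketch of the RMSprop update rule, assuming the usual formulation with a decay rate beta and a small epsilon for numerical stability; all names, values, and the toy loss are illustrative:

```python
import numpy as np

def rmsprop_step(params, avg_sq_grad, grad_fn, lr=0.01, beta=0.9, eps=1e-8):
    """RMSprop: scale each parameter's step by a running average of its squared gradients."""
    grad = grad_fn(params)
    avg_sq_grad = beta * avg_sq_grad + (1 - beta) * grad ** 2
    params = params - lr * grad / (np.sqrt(avg_sq_grad) + eps)
    return params, avg_sq_grad

theta = np.array([3.0, -2.0])
s = np.zeros_like(theta)
for _ in range(500):
    theta, s = rmsprop_step(theta, s, lambda p: 2 * p)
print(theta)  # each parameter effectively gets its own step size
```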
AdaDelta
Adadelta is an extension of Adagrad that tries to fix Adagrad's aggressive, monotonically decreasing learning rate and thus removes the decaying learning rate problem. In Adadelta we do not need to set a default learning rate, as the update is scaled by the ratio of the running average of the previous parameter updates to the running average of the current gradients.
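A hedged sketch of the AdaDelta update, following the standard formulation with two running averages; the variable names, the decay rate rho, and the toy loss are assumptions of this example:

```python
import numpy as np

def adadelta_step(params, avg_sq_grad, avg_sq_delta, grad_fn, rho=0.95, eps=1e-6):
    """AdaDelta: no explicit learning rate; the step is the ratio of two RMS terms."""
    grad = grad_fn(params)
    avg_sq_grad = rho * avg_sq_grad + (1 - rho) * grad ** 2
    delta = -np.sqrt(avg_sq_delta + eps) / np.sqrt(avg_sq_grad + eps) * grad
    avg_sq_delta = rho * avg_sq_delta + (1 - rho) * delta ** 2
    return params + delta, avg_sq_grad, avg_sq_delta

theta = np.array([3.0, -2.0])
s_grad = np.zeros_like(theta)
s_delta = np.zeros_like(theta)
for _ in range(2000):
    theta, s_grad, s_delta = adadelta_step(theta, s_grad, s_delta, lambda p: 2 * p)
print(theta)
```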
Advantages of Adadelta
1. The main advantage of AdaDelta is that we do not need to set a default learning rate.
Disadvantages of Adadelta
1. Computationally expensive
Adam
Advantages of Adam
1. Easy to implement.
2. Computationally efficient.
3. Low memory requirements.
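A minimal sketch of the Adam update with bias-corrected first and second moment estimates; the hyperparameter values shown are the commonly used defaults and, like the toy loss, are assumptions of this example:

```python
import numpy as np

def adam_step(params, m, v, t, grad_fn, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Adam: momentum-like first moment plus RMSprop-like second moment, both bias-corrected."""
    grad = grad_fn(params)
    m = beta1 * m + (1 - beta1) * grad            # first moment (mean of gradients)
    v = beta2 * v + (1 - beta2) * grad ** 2       # second moment (uncentered variance)
    m_hat = m / (1 - beta1 ** t)                  # bias correction for the warm-up phase
    v_hat = v / (1 - beta2 ** t)
    params = params - lr * m_hat / (np.sqrt(v_hat) + eps)
    return params, m, v

theta = np.array([3.0, -2.0])
m = np.zeros_like(theta)
v = np.zeros_like(theta)
for t in range(1, 5001):
    theta, m, v = adam_step(theta, m, v, t, lambda p: 2 * p)
print(theta)
```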
Linear activation function
As a starting point, and for better comparability with the later functions, we begin with the simplest possible activation function. The linear activation function returns the input value unchanged and is described by the following formula:
f(x) = x
Although it appears that this function does not make any changes to the data, it does have an important
influence on how the network functions. It ensures that the neural network can only recognize linear
relationships in the data. This limits its performance immensely, as no more complex structures can be
learned from the data. For this reason, this simple activation function is rarely used in deep neural networks,
but only in simpler, linear models or in the output layer for regressions.
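As a small illustration of this limitation, stacking several layers with a linear (identity) activation collapses into a single linear transformation; the matrices below are arbitrary example values:

```python
import numpy as np

rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(4, 3)), rng.normal(size=(2, 4))
x = rng.normal(size=3)

# Two "layers" with the identity activation ...
hidden = W1 @ x            # linear activation: f(z) = z
out = W2 @ hidden

# ... are equivalent to one linear layer with weights W2 @ W1
print(np.allclose(out, (W2 @ W1) @ x))  # True
```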
Sigmoid function
The sigmoid function is one of the oldest non-linear activation functions and has been used in the field of machine learning for many years. It is described by the following mathematical formula:
σ(x) = 1 / (1 + e^(−x))
This function ensures that the input value is mapped to a range between 0 and 1. The graph follows the
characteristic S-curve, which ensures that small values lie in a range close to 0 and high values are
transformed into a range close to 1.
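A short sketch of the sigmoid and its characteristic squashing behaviour, written with NumPy purely for illustration:

```python
import numpy as np

def sigmoid(x):
    """Maps any real input to the range (0, 1) along the S-shaped curve."""
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
print(sigmoid(x))  # approx. [0.00005, 0.269, 0.5, 0.731, 0.99995]
```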
This range of values makes the sigmoid function particularly suitable for applications in which binary values
are to be predicted so that the output can then be interpreted as the probability of membership. For this
reason, the sigmoid function is primarily used in the last layer of a network if a binary prediction is to be
made. This can be useful, for example, in the area of object recognition in images or in medical diagnoses
where a patient is to be classified as healthy or ill.
However, the sigmoid also has notable drawbacks. For very large or very small inputs the function saturates, so its gradient becomes vanishingly small and the weights in the early layers of a deep network are barely updated anymore (the vanishing gradient problem). It can also lead to problems that the output values of the sigmoid function are not centered around zero but lie between 0 and 1. As a result, the gradients for the weights of a layer always share the same sign, which further slows down the convergence of the model.
Due to these disadvantages, the sigmoid function is increasingly being replaced in modern network
architectures by other activation functions that enable more efficient training, which is particularly
important in deep architectures.
Tanh function
The hyperbolic tangent function, or tanh, is another non-linear activation function used in neural networks to learn more complex relationships in the data. It is based on the following mathematical formula:
tanh(x) = (e^x − e^(−x)) / (e^x + e^(−x))
The tanh function transforms the input value into the range between -1 and 1. In contrast to Sigmoid, the
values are therefore distributed around zero. This results in some advantages compared to the previously
presented activation functions, as the centering around zero helps to improve the training effect and the
weight adjustments move faster in the right direction.
It is also advantageous that the tanh function scales smaller input values more strongly in the output range, so that the values can be separated from each other more easily, especially when the inputs lie close together.
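For illustration, a brief NumPy sketch contrasting the output ranges of tanh and the sigmoid (the input values are arbitrary examples):

```python
import numpy as np

x = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
print(np.tanh(x))                # approx. [-0.995, -0.462, 0.0, 0.462, 0.995], centered around zero
print(1.0 / (1.0 + np.exp(-x)))  # approx. [0.047, 0.378, 0.5, 0.622, 0.953], shifted into (0, 1)
```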
Due to these properties, the hyperbolic tangent is often used in recurrent neural networks where temporal
sequences and dependencies play an important role. By using positive and negative values, the state changes
in an RNN can be represented much more precisely.
The hyperbolic tangent, however, struggles with the same problems as the sigmoid function. The vanishing gradient problem can also occur with this activation function, especially for extremely large or small input values. In very deep neural networks it then becomes difficult to keep the gradients in the early layers of the network strong enough to make sufficient weight adjustments. In addition, saturation effects occur in these value ranges, so that the gradient almost vanishes when the output is close to 1 or -1.
ReLU
The Rectified Linear Unit (ReLU for short) is a piecewise linear activation function that was introduced to address the vanishing gradient problem and has become increasingly popular in recent years. In short, it keeps positive values and sets negative input values equal to zero. Mathematically, this is expressed by the following term:
f(x) = x for x > 0 and f(x) = 0 for x ≤ 0
In simpler terms, it can be represented using the max function:
f(x) = max(0, x)
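A minimal sketch of the ReLU computation and its gradient behaviour; setting the gradient at exactly zero to 0 is an implementation convention chosen for this example, not something specified in the post:

```python
import numpy as np

def relu(x):
    """Keep positive values, clamp negative values to zero."""
    return np.maximum(0.0, x)

def relu_grad(x):
    """Gradient is 1 for positive inputs, 0 for negative inputs (and 0 at x = 0 by convention)."""
    return (x > 0).astype(float)

x = np.array([-2.0, -0.1, 0.0, 0.1, 2.0])
print(relu(x))       # [0.   0.   0.   0.1  2. ]
print(relu_grad(x))  # [0. 0. 0. 1. 1.]
```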
The ReLU activation function has established itself primarily due to the following advantages:
Simple calculation: Compared to the other options, the ReLU function is very easy to calculate and
therefore saves a lot of computing power, especially for large networks. This is reflected either in
lower costs or a shorter training time.
No vanishing gradient problem: Due to the piecewise linear structure, there are no regions where the function asymptotically flattens out parallel to the x-axis. For positive inputs the gradient therefore does not become vanishingly small, and the error signal propagates through all layers, even in large networks. This ensures that the network learns structures and that the learning process is significantly accelerated.
Better results for new model architectures: Compared to the other activation functions, ReLU can
set values to zero as soon as they are negative. With the sigmoid, softmax, and tanh functions, on the
other hand, the values only approach zero asymptotically, but never become zero. However, this
leads to problems in newer models, such as autoencoders, as real zeros are required in the so-called
code layer in order to achieve good results.
Economy: The ability of the activation function to set certain input values to zero makes the model much more economical with computing power. If many neurons output zero for a given input, the activations become sparse. This reduces the effective complexity of the model and can lead to better generalization.
However, there are also problems with this simple activation function. Because negative values are consistently set to zero, it can happen that individual neurons output zero for every input, receive a zero gradient, make no further contribution to the learning process, and therefore "die off". This may not initially be a problem for individual neurons, but it has been shown that in some cases as many as 20–50 % of neurons can "die off" as a result of ReLU.
This problem occurs more frequently if too high a learning rate has been defined so that the weights of the
neuron can change in such a way that the neuron only receives negative values. In the long term, these
neurons remain dead because they no longer generate a gradient and are no longer capable of learning. This
means that models with ReLU as an activation function are also highly dependent on a well-chosen learning
rate, which should be carefully considered in advance.
Furthermore, it is problematic that the ReLU function is not limited and can theoretically assume infinitely
large, positive values. Particularly in applications where the output range is limited, such as the prediction of
probabilities, the ReLU function must then be supplemented with another activation function such as the
softmax so that interpretable results are output.
The ReLU function is primarily used in deep neural networks, as convergence can be significantly
accelerated due to efficient gradient processing. In addition, computational effort can be saved, increasing
the efficiency of the entire model. A central application here is the training of autoencoders, which are used to learn compressed representations of the data. The sparse activations help such models find an efficient and compressed representation.
Leaky ReLU
To eliminate this disadvantage and make the ReLU function more robust, an optimization of the function has
been developed, which is known as Leaky ReLU. Compared to the conventional version of the function,
negative values are not set to zero but are given an (albeit small) positive slope. Mathematically, this looks
like this:
f(x) = x for x > 0 and f(x) = α · x for x ≤ 0
The parameter α is a positive constant that must be determined before training and can be 0.01, for
example. This ensures that even if a neuron receives negative values, its output does not become exactly zero and it can therefore still generate a small gradient. This prevents the neurons from dying, as they still make a small contribution to learning.
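A short sketch of the Leaky ReLU with the example value α = 0.01 mentioned above (the function name is illustrative):

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    """Like ReLU, but negative inputs keep a small slope instead of being zeroed out."""
    return np.where(x > 0, x, alpha * x)

x = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])
print(leaky_relu(x))  # [-0.05 -0.01  0.    1.    5.  ]
```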
Beyond this, the Leaky ReLU is also characterized by an increased learning ability of the model, as learning is possible even for negative values and their information is not lost. This property can lead to faster convergence, as more neurons remain active and participate in the learning process. In addition, despite the small change compared to ReLU, this activation function can be calculated with similar efficiency.
A possible disadvantage is that α introduces another hyperparameter, which must be chosen before training and has a major influence on the quality of the training. A value that is too small can slow down learning, because the affected neurons no longer die completely but produce outputs close to zero and therefore contribute very little to training.
Softmax
The softmax is a mathematical function that takes a vector as input and converts its values into probabilities depending on their size. A high numerical value leads to a high probability in the resulting vector.
In other words, each value of the vector is exponentiated and then divided by the sum of the exponentials of all values, and the result is stored in the new vector. In purely mathematical terms, this formula looks like this:
softmax(x_i) = e^(x_i) / Σ_j e^(x_j)
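A minimal, numerically stable softmax sketch; subtracting the maximum before exponentiating is a standard implementation trick added here for illustration, not something stated in the post:

```python
import numpy as np

def softmax(x):
    """Convert a vector of scores into probabilities that sum to 1."""
    shifted = x - np.max(x)     # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / np.sum(exp)

scores = np.array([2.0, 1.0, 0.1])
probs = softmax(scores)
print(probs)        # approx. [0.659, 0.242, 0.099]
print(probs.sum())  # 1.0
```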
At first glance, the sigmoid and softmax functions appear relatively similar, as both map their inputs to the numerical range between 0 and 1. The key difference is that the sigmoid acts on a single value and always passes through 0.5 at x = 0, whereas the softmax acts on a whole vector, so its output for a given entry also depends on all the other entries.
The difference between the functions lies in the application. The sigmoid function can be used for binary
classifications, i.e. for models in which a decision is to be made between two different classes. Softmax, on
the other hand, can also be used for classifications that predict more than two classes. The function
ensures that the probabilities of all classes sum to 1.
The advantages of softmax are that the outputs are interpretable and represent probabilities, which is particularly helpful in classification problems. In addition, the exponential function amplifies differences between the scores, so the function can handle large differences in the input data.
Disadvantages include overconfidence, which describes the tendency of the predicted probabilities to look very confident even though the model is actually quite uncertain. Measures for uncertainty estimation should therefore be included to mitigate this problem. Furthermore, although softmax is suitable for multi-class classification, the number of classes should not grow too large, as the exponential calculation for each class then becomes time-consuming and computationally intensive. In addition, the model may become unstable as the probabilities for individual classes become very small.