06 AIS302 ANN Backpropagation

This lecture focuses on the optimization of artificial neural networks through backpropagation, detailing the steps involved in training a neural network, including forward propagation, loss computation, and weight updates. It discusses various activation functions, their advantages and disadvantages, and emphasizes the importance of choosing appropriate functions for effective training. Additionally, it covers advanced optimization techniques like momentum and Adam to enhance learning efficiency.


AIS302: ANN (Artificial Neural Networks)

Lecture 6: Optimization - Backpropagation


Spring 2025

Dr. Ensaf Hussein


Associate Professor, Artificial Intelligence,
School of Information Technology and Computer Science,
Nile University.
Course Map: Selected Topics in Deep Learning

- Foundation concepts
- Shallow NN: training, parameters
- Deep Computer Vision: convolutional NN, object detection, pre-trained models, transfer learning
- Deep Sequence Modeling: recurrent NN, LSTM, Transformers
- Deep Generative Models: VAE, GAN
- Deep Reinforcement Learning
2
Lecture 6
Optimization - Backpropagation
and
Evaluation Metrics

Lectures are based on:


• Traditional Learning: Machine Learning, Andrew Ng [full course]
• Stanford University CS231n, Deep Learning for Computer Vision
• MIT Introduction to Deep Learning | 6.S191
3
Recap: Linear Classifier Training Steps
• Step 1: Start with a random W and b
• Step 2: Calculate the score function. Given our input feature vectors, the score function takes these data points, applies some function f (our score function), and returns the predicted class labels. ➔ Forward Propagation
• Step 3: Calculate the loss function (i.e. the error). It quantifies what it means to have a "good" W and measures our unhappiness with the outcomes.
• Step 4: Optimization step. This is the process of finding the set of parameters W that minimizes the loss function, taken in small steps (scaled by the learning rate). ➔ Backward Propagation
• Step 5: Repeat Steps 2 to 4 for a specified number of iterations (i.e. number of epochs) or until the loss is near zero (until convergence).

4
Perceptron &
Activation Functions
Non-linear classifier

5
The Perceptron: Forward Propagation

ŷ = g(w0 + Σ_{i=1..m} wi · xi)

where:
- m is the number of inputs to the node (cell)
- wi is the weight of input i
- xi is the input sample (feature)
- g is the activation function
- ŷ is the hypothesis (output)

6
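A minimal sketch of this forward pass in NumPy (the sigmoid choice for g and all numeric values here are illustrative assumptions, not values from the slides):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def perceptron_forward(x, w, w0, g=sigmoid):
    """Single-neuron forward pass: y_hat = g(w0 + sum_i w_i * x_i)."""
    z = w0 + np.dot(w, x)   # weighted sum of the m inputs plus bias
    return g(z)

# Example with m = 3 inputs (illustrative values)
x = np.array([0.5, -1.2, 3.0])
w = np.array([0.1, 0.4, -0.2])
print(perceptron_forward(x, w, w0=0.3))
```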
Activation Function

7
Activation Functions

Common choices:
- Sigmoid
- tanh
- ReLU
- Leaky ReLU
- Maxout
- ELU

An important property of an activation function is that it should be differentiable.
8
Activation Functions: Sigmoid (logistic function)

σ(x) = 1 / (1 + e^(-x))

- Squashes numbers to the range [0, 1]
- Historically popular since it has a nice interpretation as a saturating "firing rate" of a neuron

Three problems:
1. Saturated neurons "kill" the gradients (vanishing gradient).
   A neuron is saturated when it outputs values near 0 or 1: there the local gradient is nearly zero, so upstream gradients get multiplied by a tiny number and cannot backpropagate through the network, and earlier layers stop learning. Gradients only flow while the input stays in the "active region" near the center of the sigmoid.
2. Sigmoid outputs are not zero-centered.
   This causes a zigzag path of gradient updates on the way to the minimum.
3. exp() is somewhat expensive to compute.

11
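A quick numerical illustration of the saturation problem (a sketch; the probe values of z are arbitrary):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)   # local gradient dσ/dz

for z in [0.0, 2.0, 5.0, 10.0]:
    print(f"z = {z:5.1f}  sigmoid(z) = {sigmoid(z):.5f}  grad = {sigmoid_grad(z):.2e}")
# At z = 10 the local gradient is ~5e-05: anything multiplied by it
# during backprop is effectively "killed".
```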
Activation Functions: tanh(x)

- Squashes numbers to the range [-1, 1]
- Zero-centered (nice)
- Still kills gradients when saturated :(

[LeCun et al., 1991]

12
Activation Functions: ReLU (Rectified Linear Unit)
[Krizhevsky et al., 2012]

Computes f(x) = max(0, x)

- Does not saturate (in the + region)
- Very computationally efficient
- Converges much faster than sigmoid/tanh in practice (e.g. 6x)

- Not zero-centered output
- Dying ReLU problem: a form of the vanishing gradient problem
  (hint: what is the gradient when x < 0?)

13
ReLU (Rectified Linear Unit) Activation Function

Features:
1. Does not saturate (in the + region): for x > 0, ReLU has a constant gradient of 1, avoiding the vanishing gradient issue in this region.
2. Very computationally efficient: just a simple comparison (max(0, x)), much cheaper than sigmoid/tanh.
3. Converges much faster than sigmoid/tanh: ReLU helps networks train up to 6x faster due to its strong gradients and simple computation.

Problems:
1. Not zero-centered output: outputs are in [0, ∞), which can still cause imbalanced weight updates.
2. Dying ReLU problem: if x < 0, the gradient is 0, meaning the neuron stops updating its weights and becomes inactive ("dead"). This is a form of the vanishing gradient problem.

14
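A short sketch of ReLU and its gradient, illustrating why a neuron whose pre-activations are always negative stops learning (the input values are illustrative):

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def relu_grad(z):
    return (z > 0).astype(float)   # 1 for z > 0, 0 otherwise

z = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
print(relu(z))        # [0.  0.  0.  0.5 3. ]
print(relu_grad(z))   # [0. 0. 0. 1. 1.] -- zero gradient for z <= 0
# If a neuron's pre-activation is negative for every training input,
# its weights receive zero gradient and never update: a "dead" ReLU.
```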
Reasons for Dead ReLU

- Dying ReLU problem: when inputs approach zero or are negative, the gradient of the function becomes zero, so the network cannot perform backpropagation and cannot learn.

The issue can happen when:
(1) Very unlucky initialization of your network may cause the neuron to only activate in a region outside of your data cloud; such a dead ReLU will never become activated and will never update.
=> people like to initialize ReLU neurons with slightly positive biases (e.g. 0.01)
(2) A high learning rate can cause saturated (dead) neurons => they never update.

[Figure: an active ReLU crosses the data cloud; a dead ReLU lies outside it and will never activate]
15
Activation Functions: Leaky ReLU
[Mass et al., 2013] [He et al., 2015]

f(x) = max(0.01x, x)

- Does not saturate
- Computationally efficient
- Converges much faster than sigmoid/tanh in practice! (e.g. 6x)
- Will not "die"

Parametric Rectifier (PReLU): f(x) = max(αx, x)
- backprop into α (a learned parameter)
16
Activation Functions: Exponential Linear Units (ELU)
[Clevert et al., 2015]

- All benefits of ReLU
- Closer-to-zero mean outputs
- Negative saturation regime (compared with Leaky ReLU) adds some robustness to noise

- Computation requires exp()
17
Maxout "Neuron"
[Goodfellow et al., 2013]

- Does not have the basic form of dot product -> nonlinearity
- Generalizes ReLU and Leaky ReLU: max(w1ᵀx + b1, w2ᵀx + b2)
- Linear regime! Does not saturate! Does not die!

Problem: doubles the number of parameters per neuron :(
18
TLDR: In practice:

- Use ReLU. Be careful with your learning rates


- Try out Leaky ReLU / Maxout / ELU
- Try out tanh but don’t expect much
- Don’t use sigmoid

19
Importance of Activation Function

20
Forward-Propagation of a Neuron

21
Backward-Propagation of a Neuron

22
ANN Steps
Forward and Backward propagation

23
Notation

- a_i^(j) = "activation" of unit i in layer j
- Θ^(j) = matrix of weights controlling the function mapping from layer j to layer j+1
- h_Θ(x) = the network output (hypothesis)

What is the dimension of Θ^(1)?  = 4 × 3

In general, if layer j has s_j units (not counting the bias unit) and layer j+1 has s_{j+1} units, the size of Θ^(j) is s_{j+1} × (s_j + 1).

[Figure: network diagram showing the input layer, hidden-layer activations, and weights Θ_10 … Θ_33]

Forward Propagation

[Figure: network with bias units x0 = a0^(2) = a0^(3) = 1; each layer computes net inputs z_i^(j) and activations a_i^(j), ending in the output a^(4) = h_Θ(x)]

One sample: (x^(i), y^(i))
26
Forward Propagation

For one sample (x^(i), y^(i)), unit 2 of layer 2 computes:

z_2^(2) = Θ_20^(1) x_0 + Θ_21^(1) x_1 + Θ_22^(1) x_2

a_2^(2) = g(z_2^(2)) = 1 / (1 + e^(-z_2^(2)))
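A vectorized sketch of forward propagation under these conventions (the layer sizes, and therefore the Θ^(j) shapes, are illustrative and follow the s_{j+1} × (s_j + 1) rule):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, thetas):
    """Forward-propagate one sample through fully connected sigmoid layers.
    thetas[j] has shape (s_{j+1}, s_j + 1): one row per unit in the next
    layer, one column per unit in the current layer plus the bias."""
    a = x
    for theta in thetas:
        a = np.concatenate(([1.0], a))  # prepend bias unit a_0 = 1
        z = theta @ a                   # z^(j+1) = Θ^(j) a^(j)
        a = sigmoid(z)                  # a^(j+1) = g(z^(j+1))
    return a                            # h_Θ(x)

rng = np.random.default_rng(0)
thetas = [rng.normal(size=(4, 3)),   # Θ^(1): 2 inputs + bias -> 4 hidden
          rng.normal(size=(1, 5))]   # Θ^(2): 4 hidden + bias -> 1 output
print(forward(np.array([0.5, -0.2]), thetas))
```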
Cost Function

• Neural network: use the logistic (cross-entropy) loss, summed over all output units:

  J(Θ) = -(1/m) Σ_{i=1..m} Σ_{k=1..K} [ y_k^(i) log(h_Θ(x^(i)))_k + (1 - y_k^(i)) log(1 - (h_Θ(x^(i)))_k) ]

  m = number of samples
  K = number of classes
Gradient Computation

Need code to compute:
- J(Θ)
- the partial derivatives ∂J(Θ)/∂Θ_ij^(l)

(Don't initialize the weights with zero; start from random values)
Training a neural network (Cont.)

34
• Initialize Weights – start with random values for the model's parameters.
• Forward Propagation – compute predictions by passing inputs through the network.
• Compute Cost Function – measure how far predictions are from actual values using a loss function.
• Backpropagation – calculate how the weights should be adjusted by computing gradients.
• Gradient Checking – verify that the backpropagation gradients are correct using numerical estimation.
• Optimization – update the weights using gradient descent (or advanced methods) to minimize the cost.

The goal is to minimize the cost function J(Θ) by updating the weights Θ.

Gradient Descent:
• Update the weights using the computed gradients:

  Θ := Θ - α · ∂J(Θ)/∂Θ

  where α is the learning rate.
• This gradually moves the parameters toward values that reduce the error.

Advanced Optimization Methods (see the sketch below):
• Momentum: accelerates learning by considering past gradients.
• Adam (Adaptive Moment Estimation): adapts the learning rate for each parameter dynamically.
• RMSprop: normalizes gradients to stabilize learning.
35
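A minimal sketch of these three update rules for a single weight vector (the hyperparameter defaults shown are common choices, an assumption rather than values from the slides):

```python
import numpy as np

def sgd_momentum(w, grad, v, lr=0.01, beta=0.9):
    """Momentum: accumulate a velocity from past gradients."""
    v = beta * v - lr * grad
    return w + v, v

def rmsprop(w, grad, s, lr=0.001, decay=0.9, eps=1e-8):
    """RMSprop: normalize by a running average of squared gradients."""
    s = decay * s + (1 - decay) * grad**2
    return w - lr * grad / (np.sqrt(s) + eps), s

def adam(w, grad, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """Adam: momentum + RMSprop with bias correction (t = step count, from 1)."""
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad**2
    m_hat = m / (1 - b1**t)   # bias-corrected first moment
    v_hat = v / (1 - b2**t)   # bias-corrected second moment
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v
```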
In the Next Example

Focusing on a single example (x^(i), y^(i)) and ignoring regularization (λ = 0):

cost(i) = y^(i) log h_Θ(x^(i)) + (1 - y^(i)) log(1 - h_Θ(x^(i)))

(Think of cost(i) ≈ (h_Θ(x^(i)) - y^(i))²)
Numerical Example

Given: the initial weights, the biases, and the training inputs/outputs.

• https://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example/
37
Numerical Example: Forward Pass - Hidden Layer

• Calculate the total net input for h1:

  net_h1 = w1 · i1 + w2 · i2 + b1 · 1

• Apply the sigmoid function to get the output of h1:

  out_h1 = σ(net_h1)

• Carrying out the same process for h2, we get out_h2.
38
Numerical Example: Forward Pass - Output Layer

Repeat the same process for the output layer:
• Calculate the total net input for o1:

  net_o1 = w5 · out_h1 + w6 · out_h2 + b2 · 1

• Apply the sigmoid function to get the output of o1:

  out_o1 = σ(net_o1)

• Carrying out the same process for o2, we get out_o2.
39
Numerical Example: Calculate the Total Error

• Calculate the error for each output neuron using the squared error function and sum them to get the total error:

  E_total = Σ ½ (target - output)²

  (used as an approximate cost for one sample)

• For example, the target output for o1 is 0.01 but the neural network output 0.75136507, therefore its error is:

  E_o1 = ½ (0.01 - 0.75136507)² = 0.274811083

• Repeating this process for o2 (remembering that the target is 0.99) we get E_o2 = 0.023560026.

• The total error for the neural network is the sum of these errors:

  E_total = E_o1 + E_o2 = 0.274811083 + 0.023560026 = 0.298371109
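A sketch reproducing this forward pass in Python; the initial weights w1..w8 and biases b1, b2 are taken from the linked Mazur walkthrough rather than from the slide text:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Initial values from the linked Mazur example (assumed, not shown on the slides)
i1, i2 = 0.05, 0.10
w1, w2, w3, w4 = 0.15, 0.20, 0.25, 0.30
w5, w6, w7, w8 = 0.40, 0.45, 0.50, 0.55
b1, b2 = 0.35, 0.60
t1, t2 = 0.01, 0.99   # targets

# Forward pass: hidden layer
out_h1 = sigmoid(w1*i1 + w2*i2 + b1)   # 0.593269992
out_h2 = sigmoid(w3*i1 + w4*i2 + b1)   # 0.596884378

# Forward pass: output layer
out_o1 = sigmoid(w5*out_h1 + w6*out_h2 + b2)   # 0.75136507
out_o2 = sigmoid(w7*out_h1 + w8*out_h2 + b2)   # 0.772928465

# Total squared error
E_total = 0.5*(t1 - out_o1)**2 + 0.5*(t2 - out_o2)**2
print(E_total)   # 0.298371109
```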
Numerical Example: Backward Pass - Output Layer

• Consider w5: we want to know how much a change in w5 affects the total error. By applying the chain rule we know that:

  ∂E_total/∂w5 = ∂E_total/∂out_o1 · ∂out_o1/∂net_o1 · ∂net_o1/∂w5
Numerical Example: Backward Pass - Output Layer

1- The change of the total error with respect to the output o1

• Error definition:
  - The total error is calculated using the squared difference between the target output and the actual output.
  - This measures how far the network's prediction (out_o1) is from the desired value (target_o1).
• Derivative meaning:
  - The derivative tells us how much the total error changes when the output o1 changes.
  - Since the error is ½(target - output)², taking the derivative follows the power rule.
• Step-by-step derivation:
  - The squared error function for one output: E_o1 = ½(target_o1 - out_o1)²
  - Differentiate with respect to the output: ∂E_total/∂out_o1 = 2 · ½(target_o1 - out_o1) · (-1)
  - Simplifies to: ∂E_total/∂out_o1 = -(target_o1 - out_o1) = out_o1 - target_o1
  - The negative sign means that if the output is too large, the error will decrease if we lower it.
• Why is this useful?
  - This derivative helps adjust the weights in backpropagation to reduce the error.
  - It tells us whether we need to increase or decrease the output to get closer to the target.
43
Numerical Example: Backward Pass - Output Layer

2- The change of the output o1 with respect to its total net input (the sigmoid derivative):

  ∂out_o1/∂net_o1 = out_o1 · (1 - out_o1)
Numerical Example: Backward Pass - Output Layer

3- The change of the total net input of o1 with respect to w5:

  ∂net_o1/∂w5 = out_h1
Numerical Example: Backward Pass - Output Layer

4- Putting it all together:

  ∂E_total/∂w5 = (out_o1 - target_o1) · out_o1 · (1 - out_o1) · out_h1 = 0.082167041
Numerical Example: Backward Pass - Output Layer

• 5- To decrease the error, we then subtract this value from the current weight (optionally multiplied by some learning rate α, which we'll set to 0.5):

  w5⁺ = w5 - α · ∂E_total/∂w5 = 0.4 - 0.5 · 0.082167041 = 0.35891648

• We can repeat this process for all the weights of the output layer to get the new weights w6, w7 and w8.
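Continuing the same sketch (using the variables from the forward-pass code above), the gradient and update for w5:

```python
# Backward pass for w5 (continuing the forward-pass sketch above)
dE_douto1   = out_o1 - t1                  # step 1: -(target - output)
douto1_dnet = out_o1 * (1.0 - out_o1)      # step 2: sigmoid derivative
dnet_dw5    = out_h1                       # step 3
dE_dw5 = dE_douto1 * douto1_dnet * dnet_dw5
print(dE_dw5)            # 0.082167041

alpha = 0.5              # learning rate
w5_new = w5 - alpha * dE_dw5
print(w5_new)            # 0.35891648
```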
Numerical Example: Backward Pass - Hidden Layer

• We'll continue the backward pass by calculating new values for w1, w2, w3 and w4:

  ∂E_total/∂w1 = ∂E_total/∂out_h1 · ∂out_h1/∂net_h1 · ∂net_h1/∂w1

  where  ∂E_total/∂out_h1 = ∂E_o1/∂out_h1 + ∂E_o2/∂out_h1
Numerical Example: Backward Pass - Hidden Layer

∂E_total/∂w1 = ∂E_total/∂out_h1 · ∂out_h1/∂net_h1 · ∂net_h1/∂w1

where  ∂E_total/∂out_h1 = ∂E_o1/∂out_h1 + ∂E_o2/∂out_h1

1- Starting with:

  ∂E_o1/∂out_h1 = ∂E_o1/∂out_o1 · ∂out_o1/∂net_o1 · ∂net_o1/∂out_h1

  (using values calculated earlier; note that ∂net_o1/∂out_h1 = w5, the original, not-yet-updated weight)
Numerical Example: Backward Pass - Hidden Layer

∂E_total/∂w1 = ∂E_total/∂out_h1 · ∂out_h1/∂net_h1 · ∂net_h1/∂w1

where  ∂E_total/∂out_h1 = ∂E_o1/∂out_h1 + ∂E_o2/∂out_h1

2- ∂out_h1/∂net_h1 = out_h1 · (1 - out_h1) = 0.241300709

3- ∂net_h1/∂w1 = i1 = 0.05
Numerical Example: Backward Pass - Hidden Layer

4- Putting it all together:

  ∂E_total/∂w1 = ∂E_total/∂out_h1 · ∂out_h1/∂net_h1 · ∂net_h1/∂w1
               = 0.036350306 · 0.241300709 · 0.05 = 0.000438568
Numerical Example: Backward Pass - Hidden Layer

5- We can update w1:

  w1⁺ = w1 - α · ∂E_total/∂w1

We can repeat this process for all the weights of the hidden layer to get the new weights w2, w3 and w4.
Finally,
➢ When we fed forward the 0.05 and 0.1 inputs originally,
  • the total error was 0.298371109.
➢ After this first round of backpropagation,
  • the total error is now 0.291027924.
➢ After repeating this process 10,000 times, for example,
  • the total error drops to 0.0000351085.
➢ At this point, when we feed forward 0.05 and 0.1, the two output neurons generate
  • 0.015912196 (vs 0.01 target) and
  • 0.984065734 (vs 0.99 target).
Hypothesis Evaluation

55
To Evaluate hypothesis
• One way to break down our dataset into the three sets is:
• Training set: 60%
• Cross validation set: 20% (This validation set is essentially used as a fake test
set to tune the hyper-parameters)
• Test set: 20%

60% 20% 20%

56
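A sketch of the 60/20/20 split using scikit-learn's train_test_split (the data here are placeholders):

```python
from sklearn.model_selection import train_test_split
import numpy as np

X, y = np.arange(1000).reshape(-1, 1), np.arange(1000)

# First carve off 40% of the data, then split that part half-and-half
# into validation and test sets -> 60% / 20% / 20%.
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)
print(len(X_train), len(X_val), len(X_test))   # 600 200 200
```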
Why is a validation set important?

1. Fit models of increasing complexity (e.g. polynomial degree d = 1, 2, 3, ..., 10).
2. Choose the degree d whose model gives the lowest error.
3. How well does the model generalize? Report the test set error.

Problem: if d is chosen using the test set, this error is likely to be an optimistic estimate of the generalization error; i.e. our extra parameter (d = degree of polynomial) is fit to the test set.
57
Model Selection

• We can now calculate three separate error values for the three different sets using the following method:
1. Optimize the parameters Θ using the training set for each polynomial degree.
2. Find the polynomial degree d with the least error using the cross-validation set.
3. Estimate the generalization error using the test set with J_test(Θ^(d)), where Θ^(d) are the parameters of the polynomial degree that had the lowest cross-validation error.
• This way, the degree of the polynomial d has not been trained using the test set.

59
Cross-Validation

• For small datasets, we sometimes use a more sophisticated technique for hyperparameter tuning called cross-validation.
• Instead of arbitrarily picking the first data points to be the validation set and the rest the training set,
• we get a better and less noisy estimate of how well hyperparameters work by iterating over different validation sets and averaging the performance across them.
61
Cross-Validation
• For example: 5-fold cross-validation
1. Split the training data into 5 equal folds (parts),
2. Use 4 of them for training, and 1 for validation.
3. Iterate over which fold is the validation fold, and evaluate the performance,
4. Finally average the performance across the different folds.

62
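A sketch of 5-fold cross-validation with scikit-learn's KFold (the model and synthetic data are illustrative):

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

kf = KFold(n_splits=5, shuffle=True, random_state=0)
scores = []
for train_idx, val_idx in kf.split(X):
    model = LogisticRegression().fit(X[train_idx], y[train_idx])  # train on 4 folds
    scores.append(model.score(X[val_idx], y[val_idx]))            # validate on the 5th
print(np.mean(scores))   # average performance across the folds
```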
Performance Metrics

63
Accuracy in a Classification Model

• Accuracy is measured as the percentage of predicted results that match the expected results.

• Ex: if there are 1000 results and 850 predicted results match the expected results, then the accuracy is 85%.
64
Problem with the accuracy metric (measure): Skewed classes

• Skewed classes refer to a dataset wherein the number of training examples belonging to one class heavily outnumbers the number of training examples belonging to the other.

• Consider a binary classification (cancer is labelled 1 and not-cancer is labelled 0), where a cancerous patient is to be detected based on some features:
  - only 1% of the data provided is cancer-positive.
• If a system naively predicts all 0's, the prediction accuracy will still be 99%.
65
Commonly used Metrics
• Accuracy is only one metric.

• Other metrics commonly used are:


- Precision
- Recall (Sensitivity)
- Specificity
- F1-score
- ROC AUC

67
Confusion Matrix
• The confusion matrix is a performance measurement technique that
visualizes the accuracy of a classifier by comparing the actual and
predicted classes.
• It is called a confusion matrix because it shows how confused the
model is between the classes.
• The class of interest is commonly called the positive class, and the rest the negative class.

68
Binary Confusion Matrix

                       Predicted class
                       Positive   Negative
Actual    Positive     TP         FN
class     Negative     FP         TN
69
Example of a Confusion Matrix

• If class "Daisy" is the positive class (y = 1), then:
  TP = 9, FN = 1, FP = 2, TN = 8

                     Predicted Label
                     Daisy   Tulip
True      Daisy      9       1
Label     Tulip      2       8
70
Accuracy

Accuracy = (TP + TN) / (TP + TN + FP + FN)

                     Predicted Label
                     1     0
True      1          9     1
Label     0          2     8

▪ Accuracy = (9 + 8) / 20 = 0.85
71
Precision

Precision = TP / (TP + FP)   (out of all predicted positives)

                     Predicted Label
                     1     0
True      1          9     1
Label     0          2     8

▪ Precision = 9 / (9 + 2) ≈ 0.818
72
Recall (Sensitivity)

Recall = TP / (TP + FN)   (out of all true positives)

                     Predicted Label
                     1     0
True      1          9     1
Label     0          2     8

▪ Recall = 9 / (9 + 1) = 0.9
73
Specificity

• Specificity = True Negative Rate (out of all true negatives)

Specificity = TN / (TN + FP)

                     Predicted Label
                     1     0
True      1          9     1
Label     0          2     8

▪ Specificity = 8 / (8 + 2) = 0.8
74
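A quick check of these four metrics on the Daisy/Tulip counts above (TP = 9, FN = 1, FP = 2, TN = 8):

```python
TP, FN, FP, TN = 9, 1, 2, 8   # counts from the Daisy/Tulip example

accuracy    = (TP + TN) / (TP + TN + FP + FN)
precision   = TP / (TP + FP)
recall      = TP / (TP + FN)      # a.k.a. sensitivity, true positive rate
specificity = TN / (TN + FP)      # true negative rate

print(accuracy, precision, recall, specificity)
# 0.85 0.8181818181818182 0.9 0.8
```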
Precision/Recall for Skewed Data
(in the presence of a rare class, e.g. "has cancer", that we want to detect)

                      Actual Class
                      1                0
Predicted    1        True Positive    False Positive
Class        0        False Negative   True Negative

Precision (of all patients where we predicted y = 1, what fraction actually has cancer?)

  Precision = True positives / # predicted positives = True positives / (True positives + False positives)

Recall (of all patients that actually have cancer, what fraction did we correctly detect as having cancer?)

  Recall = True positives / # actual positives = True positives / (True positives + False negatives)

Now, if we evaluate a scenario where the classifier predicts all 0's, then TP = 0 and the recall of the model will be 0, which points out the inability of the system.
75
Trading off Precision and Recall

Logistic regression: predict 1 if h_θ(x) ≥ threshold, predict 0 if h_θ(x) < threshold (threshold = 0.5, 0.7, 0.9, 0.3, ...).

• Suppose we want to predict 1 (cancer) only if very confident → raise the threshold (e.g. 0.7 or 0.9):
  higher precision, lower recall
• Suppose we want to avoid missing too many cases of cancer (avoid false negatives) → lower the threshold (e.g. 0.3):
  higher recall, lower precision

[Figure: precision vs. recall curve traced out as the threshold varies between 0 and 1]
76
Threshold

• More generally: predict 1 if h_θ(x) ≥ threshold.
77
F1 Score (F score)

How do we compare precision/recall numbers?

               Precision (P)   Recall (R)   Average   F1 Score
Algorithm 1    0.5             0.4          0.45      0.444
Algorithm 2    0.7             0.1          0.4       0.175
Algorithm 3    0.02            1.0          0.51      0.0392

Average: (P + R) / 2

F1 Score: 2 · P · R / (P + R)
78
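A quick check of the table's numbers, showing why the F1 score (a harmonic mean) is preferred over the plain average:

```python
def f1(p, r):
    return 2 * p * r / (p + r)   # harmonic mean of precision and recall

for name, p, r in [("Algorithm 1", 0.5, 0.4),
                   ("Algorithm 2", 0.7, 0.1),
                   ("Algorithm 3", 0.02, 1.0)]:
    print(name, round((p + r) / 2, 4), round(f1(p, r), 4))
# The harmonic mean punishes extreme precision/recall imbalances
# that the plain average hides (compare Algorithm 3's 0.51 vs 0.0392).
```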
ROC Curve

79
ROC Curve

80
How to Plot ROC Curve?

81
How to Plot ROC Curve?

82
AUC

The ROC curve is a useful tool for a few reasons:

• The curves of different models can be compared directly, either overall or at specific thresholds.
• The area under the curve (AUC) can be used as a summary of the model's skill.

83
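A sketch of plotting an ROC curve and computing AUC with scikit-learn (roc_curve and roc_auc_score are standard scikit-learn functions; the labels and scores here are made up for illustration):

```python
from sklearn.metrics import roc_curve, roc_auc_score
import matplotlib.pyplot as plt

y_true  = [0, 0, 1, 1, 0, 1, 1, 0, 1, 0]                        # actual labels
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.7, 0.5, 0.6, 0.3]   # model scores

fpr, tpr, thresholds = roc_curve(y_true, y_score)  # sweep the threshold
print("AUC =", roc_auc_score(y_true, y_score))

plt.plot(fpr, tpr)
plt.plot([0, 1], [0, 1], linestyle="--")   # chance line
plt.xlabel("False Positive Rate (1 - specificity)")
plt.ylabel("True Positive Rate (recall)")
plt.title("ROC curve")
plt.show()
```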
Model Diagnosis

95
Debugging a Learning Algorithm

• Suppose that when you test your hypothesis on a new test set, you find that it makes unacceptably large errors in its predictions. What should you try next?

- Get more training examples
- Try smaller sets of features
- Try getting additional features
- Try adding polynomial features
- Try decreasing the regularization parameter λ
- Try increasing the regularization parameter λ
96
Machine Learning Diagnostic
• Diagnostic: A test that you can run to gain insight into what
is/isn’t working with a learning algorithm, and gain guidance
as to how best to improve its performance.

• Diagnostics can take time to implement, but doing so can be


a very good use of your time.

97
Bias/Variance as a Function of the Degree of Polynomial

Suppose your learning algorithm is performing less well than you were hoping (J_cv(Θ) or J_test(Θ) is high). Is it a bias problem or a variance problem?

- Bias (underfit): the training error is high, and the cross-validation error is close to the training error.
- Variance (overfit): the training error is low, and the cross-validation error is much higher than the training error.

[Figure: training error and cross-validation error plotted against the degree of polynomial d; training error falls as d grows, while cross-validation error is U-shaped]
98
Debugging a Learning Algorithm: What Each Fix Addresses

- Get more training examples → fixes high variance
- Try smaller sets of features → fixes high variance
- Try getting additional features → fixes high bias
- Try adding polynomial features → fixes high bias
- Try decreasing λ → fixes high bias
- Try increasing λ → fixes high variance
99
Thanks
100
