Deep Neural Networks
Topics: the Vanishing/Exploding Gradients Problem, avoiding overfitting
through regularization, and Dropout Regularization.
Introduction:
• A neural network with two or more hidden layers can be called a Deep
Neural Network (DNN).
• When handling a complex problem, such as detecting hundreds of
types of objects in high-resolution images, you may need to train a
much deeper DNN, perhaps with 10 layers, each containing hundreds of
neurons and hundreds of thousands of connections between them.
• Training such a deep network leads to the problem of vanishing gradients.
Training a Neural Network
• Training a neural network involves updating its parameters (weights and biases) to
minimize a loss function based on the difference between predicted outputs and the
actual labels (in supervised learning).
• This is typically done using an optimization algorithm like gradient descent or one of its
variants.
• The general steps involved in training a neural network are outlined below:
1. Forward Propagation
2. Loss Function
3. Backpropagation
4. Gradient Descent
5. Evaluation
6. Stopping Criteria
1. Forward Propagation:
• In forward propagation, the input data is passed through the neural
network to generate the output (predictions).
• This involves calculating the weighted sum of inputs and applying an
activation function at each layer.
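As a rough sketch (assuming NumPy is available), the snippet below shows forward propagation through two dense layers: each layer computes a weighted sum of its inputs and applies a sigmoid activation. The layer sizes, weights, and input values are arbitrary examples, not values from the text.

import numpy as np

def sigmoid(z):
    # Squashes each value into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, layers):
    # layers is a list of (W, b) pairs; each layer computes sigmoid(W @ a + b)
    a = x
    for W, b in layers:
        z = W @ a + b          # weighted sum of the layer's inputs
        a = sigmoid(z)         # activation function applied element-wise
    return a

# Tiny example: 3 inputs -> 4 hidden neurons -> 2 outputs
rng = np.random.default_rng(0)
layers = [(rng.normal(size=(4, 3)), np.zeros(4)),
          (rng.normal(size=(2, 4)), np.zeros(2))]
x = np.array([0.5, -1.2, 3.0])
print(forward(x, layers))      # the network's predictions for this input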
2. Loss Function
• The loss function, L, measures how well the neural network’s
predictions match the actual target values.
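For illustration, two commonly used loss functions, mean squared error for regression and cross-entropy for classification, can be written directly from their definitions (assuming NumPy); the tiny example values below are arbitrary.

import numpy as np

def mse(y_true, y_pred):
    # Mean squared error, typical for regression
    return np.mean((y_true - y_pred) ** 2)

def cross_entropy(y_true, y_pred, eps=1e-12):
    # Cross-entropy for one-hot targets, typical for classification
    return -np.sum(y_true * np.log(y_pred + eps))

print(mse(np.array([1.0, 2.0]), np.array([1.1, 1.8])))                # 0.025
print(cross_entropy(np.array([0, 1, 0]), np.array([0.2, 0.7, 0.1])))  # ~0.357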
3. Back propagation
• Backpropagation is the essence of neural network training. It is the
method of fine-tuning the weights of a neural network based on the
error rate obtained in the previous epoch (i.e., iteration).
• Proper tuning of the weights allows you to reduce error rates and
make the model reliable by increasing its generalization.
• Backpropagation is short for "backward propagation of errors."
• It is the standard method of training artificial neural networks: it
calculates the gradient of the loss function with respect to all the
weights in the network.
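The following minimal sketch (assuming NumPy) applies the chain rule by hand to a one-hidden-layer network with a squared-error loss, to show how the error gradient flows backwards from the output to every weight. The network sizes and values are arbitrary illustrations.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Tiny network: 2 inputs -> 3 hidden (sigmoid) -> 1 output (linear)
rng = np.random.default_rng(1)
W1, b1 = rng.normal(size=(3, 2)), np.zeros(3)
W2, b2 = rng.normal(size=(1, 3)), np.zeros(1)
x, y = np.array([0.4, -0.7]), np.array([1.0])

# Forward pass (intermediate values are kept for the backward pass)
z1 = W1 @ x + b1; a1 = sigmoid(z1)
z2 = W2 @ a1 + b2; y_hat = z2             # linear output
loss = 0.5 * np.sum((y_hat - y) ** 2)

# Backward pass: chain rule from the loss back to every weight
d_z2 = y_hat - y                          # dL/dz2
d_W2 = np.outer(d_z2, a1)                 # dL/dW2
d_b2 = d_z2
d_a1 = W2.T @ d_z2                        # error propagated to the hidden layer
d_z1 = d_a1 * a1 * (1 - a1)               # sigmoid'(z1) = a1 * (1 - a1)
d_W1 = np.outer(d_z1, x)                  # dL/dW1
d_b1 = d_z1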
4. Gradient Descent
• After computing the gradients of the loss function with respect to the
weights and biases, the parameters are updated using gradient
descent.
• Gradient descent minimizes the loss function by iteratively adjusting
the parameters in the opposite direction of the gradient.
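As a minimal illustration of the update rule itself, the sketch below runs plain gradient descent on a toy one-parameter loss L(w) = (w - 3)^2; the learning rate, starting point, and number of steps are arbitrary choices.

# Gradient descent on a toy loss L(w) = (w - 3)^2, whose gradient is 2*(w - 3)
w = 0.0                             # initial parameter value
learning_rate = 0.1

for step in range(50):
    grad = 2 * (w - 3)              # dL/dw
    w -= learning_rate * grad       # step in the opposite direction of the gradient

print(w)   # converges toward 3, the minimizer of the loss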
5. Evaluation
• After each epoch (a complete pass through the training data), the
performance of the neural network is evaluated on a validation set to
monitor generalization.
• Metrics like accuracy (for classification) or RMSE (for regression) are
used to check the performance.
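For example, both metrics can be computed in one line each (assuming NumPy); the prediction and label arrays below are made up for illustration.

import numpy as np

def accuracy(y_true, y_pred):
    # Fraction of predictions that exactly match the labels (classification)
    return np.mean(y_true == y_pred)

def rmse(y_true, y_pred):
    # Root mean squared error (regression)
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

print(accuracy(np.array([1, 0, 1, 1]), np.array([1, 0, 0, 1])))   # 0.75
print(rmse(np.array([2.0, 4.0]), np.array([2.5, 3.5])))           # 0.5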
6. Stopping Criteria
• Training continues until a predefined stopping criterion is met.
Common stopping criteria include:
• A fixed number of epochs.
• Early stopping, where training is stopped when the performance on
the validation set stops improving (to avoid overfitting).
Vanishing Gradient
• The backpropagation algorithm works by going from the output layer to
the input layer, propagating the error gradient along the way.
• Gradients often get smaller and smaller as the algorithm progresses
down to the lower layers.
• As a result, the gradient descent updates leave the lower-layer weights
virtually unchanged.
• Consequently, training never converges to a good solution.
• This is called the vanishing gradients problem.
Causes of the Vanishing Gradient Problem
1. Activation Functions:
• Sigmoid and tanh functions can squash input values into a small range (0 to 1 for
sigmoid, -1 to 1 for tanh), leading to derivatives (gradients) that are small.
2. Weight Initialization:
• Poor initialization of weights can lead to outputs that are either very large or very
small, which when fed into activation functions, result in small gradients.
3. Deep Architectures:
• As the number of layers increases, the gradients become increasingly
likely to diminish exponentially.
• Each layer's small gradients compound across many layers, leading to
vanishing gradients (see the numeric sketch below).
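A quick numeric sketch of this compounding effect: the sigmoid's derivative never exceeds 0.25, so multiplying one such factor per layer drives the gradient toward zero as depth grows. The layer counts below are arbitrary.

import numpy as np

def sigmoid_derivative(z):
    s = 1.0 / (1.0 + np.exp(-z))
    return s * (1 - s)

print(sigmoid_derivative(0.0))    # 0.25, the largest value the derivative can take

# Even in this best case, multiplying one such factor per layer makes the
# gradient shrink exponentially with depth:
for depth in (5, 10, 20):
    print(depth, 0.25 ** depth)   # 0.25 ** 20 is about 9e-13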
Exploding Gradients
• Sometimes the gradients can grow bigger and bigger, so many layers
get very large weight updates and the algorithm diverges.
• This is called exploding gradients, and it is most commonly seen in
recurrent neural networks.
Causes of the Exploding Gradient Problem
1. Weight Initialization:
• Poor initialization of weights can lead to very large gradient values if
the weights are not scaled appropriately.
2. Deep Architectures and Long Sequences:
• Deep neural networks and RNNs with many layers or long sequences
exacerbate the problem due to the compounding effect of gradients.
Solutions to Vanishing/Exploding Gradients
• Xavier/He Initialization: Proper weight initialization methods like Xavier
initialization (for sigmoid/tanh) or He initialization (for ReLU) can help
prevent the gradients from shrinking or exploding. They normalize the
variance of the inputs and outputs at each layer, making learning more
stable.
• Batch Normalization: This technique normalizes the input to each layer,
ensuring that the distribution of input values stays consistent across layers,
preventing gradients from either vanishing or exploding. Batch normalization
also helps speed up training and can have a slight regularization effect.
• Gradient Clipping: In cases where exploding gradients occur, gradient
clipping can be applied. This technique scales down the gradients if they
exceed a certain threshold to keep the updates under control.
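A rough sketch of two of these remedies in plain NumPy is given below: He initialization draws weights with variance 2/fan_in, and gradient clipping rescales a gradient whose L2 norm exceeds a threshold. The layer sizes and the threshold value are illustrative assumptions; real frameworks provide these features built in.

import numpy as np

rng = np.random.default_rng(42)

def he_init(fan_in, fan_out):
    # He initialization: zero-mean weights with variance 2 / fan_in (ReLU layers)
    return rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_out, fan_in))

def clip_gradient(grad, threshold=1.0):
    # Gradient clipping: rescale the gradient if its L2 norm is too large
    norm = np.linalg.norm(grad)
    if norm > threshold:
        grad = grad * (threshold / norm)
    return grad

W = he_init(256, 128)                       # weights for a 256 -> 128 layer
g = clip_gradient(rng.normal(size=W.shape) * 10.0)
print(np.linalg.norm(g))                    # <= 1.0 after clipping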
Avoiding Overfitting Through Regularization
• Deep neural networks typically have tens of thousands of parameters.
• With so many parameters, the network is prone to overfitting the
training set.
• Overfitting can be reduced by using "regularization" techniques.
• Some of the popular regularization techniques are:
• Early Stopping
• Dropout
• Max-Norm Regularization and
• Data Augmentation.
Early Stopping
• A good solution to avoid overfitting the training set is early stopping:
interrupt training when the model's performance on the validation set
starts dropping.
• Evaluate the model on a validation set at regular intervals.
• If the performance has not improved for some number of intervals,
roll back to the previously saved best parameters and stop training.
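A minimal sketch of this early-stopping loop is shown below. The model, its training step, and its validation metric are hypothetical stand-ins (a single parameter, a random update, and a toy score), and the patience value is an arbitrary choice; only the stopping logic itself reflects the description above.

import copy
import random

# Hypothetical stand-ins for a real model and its training/validation routines
model_params = {"w": 0.0}
def train_one_epoch(params): params["w"] += random.uniform(-0.1, 0.1)
def evaluate(params): return -abs(params["w"] - 0.3)   # higher is better

best_score, best_params = float("-inf"), None
patience, stale_epochs, max_epochs = 5, 0, 100

for epoch in range(max_epochs):
    train_one_epoch(model_params)
    score = evaluate(model_params)
    if score > best_score:                         # validation performance improved
        best_score, stale_epochs = score, 0
        best_params = copy.deepcopy(model_params)  # snapshot the best model
    else:
        stale_epochs += 1
        if stale_epochs >= patience:               # no improvement for a while
            model_params = best_params             # roll back to the best snapshot
            break                                  # stop training early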
Dropout
• Arguably the most popular regularization technique for deep neural
networks is dropout.
• At every training step, every neuron has a probability p of being
temporarily "dropped out," meaning it will be entirely ignored during
this training step.
• It may be active again during the next step.
• The hyperparameter p is called the dropout rate and is typically set
to 50%.
• After training, neurons are no longer dropped.
• In practice, this simple technique has been found to work remarkably well.
Max-Norm Regularization
• Another regularization technique that is quite popular for neural
networks is called max-norm regularization.
• It constrains the weights w of the incoming connections such that
∥w∥₂ ≤ r, where r is the max-norm hyperparameter and ∥·∥₂ is the ℓ2
norm.
• It is typically implemented by computing ∥w∥₂ after each training
step and rescaling w if needed (w ← w · r / ∥w∥₂).
• Reducing r increases the amount of regularization and helps reduce
overfitting.
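A minimal sketch of the max-norm constraint, assuming the incoming weights of each neuron are stored as one row of a NumPy matrix; r is the max-norm hyperparameter from the text and the matrix size is arbitrary.

import numpy as np

def apply_max_norm(W, r=1.0):
    # Rescale each neuron's incoming weight vector (one row of W) so that
    # its L2 norm never exceeds r; applied after each training step.
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    scale = np.minimum(1.0, r / np.maximum(norms, 1e-12))
    return W * scale

W = np.random.default_rng(0).normal(size=(4, 8)) * 3.0
W = apply_max_norm(W, r=2.0)
print(np.linalg.norm(W, axis=1))   # every row norm is now <= 2.0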
Data Augmentation
• One last regularization technique is data augmentation.
• It consists of generating new training instances from existing ones,
artificially boosting the size of the training set.
• This reduces overfitting, which makes it a regularization technique.
• The trick is to generate realistic training instances: ideally, a human
should not be able to tell which instances were generated and which
ones were not.
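For image data, even very simple transformations such as horizontal flips and small shifts multiply the number of training instances. The sketch below uses plain NumPy; the 28x28 image size is an arbitrary example, and the wrap-around shift is used only to keep the code short.

import numpy as np

def augment(image, shift=2):
    # Return a few variants of one training image: the original,
    # a horizontal flip, and small vertical/horizontal shifts.
    variants = [image, np.fliplr(image)]
    variants.append(np.roll(image, shift, axis=0))   # shift down (wraps around)
    variants.append(np.roll(image, shift, axis=1))   # shift right (wraps around)
    return variants

image = np.random.default_rng(0).random((28, 28))    # e.g. one grayscale digit
augmented = augment(image)
print(len(augmented), "training instances from 1 original")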
3. Dropout Regularization
• Dropout is a popular and highly effective regularization technique
used to avoid overfitting in neural networks.
• It works by randomly "dropping out" (i.e., setting to zero) a fraction of
neurons in the network during each forward pass in training.
• The neurons that are dropped are chosen randomly, and they do not
participate in either the forward pass or the backward pass during that
iteration.
How Dropout Works:
During Training:
• Each neuron has a probability p (commonly p = 0.5) of being dropped.
• The network is effectively trained on a different architecture at each iteration,
forcing it to learn more robust features and preventing it from relying too
much on any particular neurons.
During Testing:
• No neurons are dropped out.
• Instead, the activations (or equivalently the outgoing weights) are scaled
by the keep probability 1 − p to account for the larger number of active
neurons.
• This ensures that the expected outputs remain consistent between training and
inference.
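A minimal sketch of dropout applied to one layer's activations, assuming NumPy. It uses the "inverted dropout" convention, scaling the surviving activations by 1/(1 − p) during training so that nothing needs rescaling at test time; this is a common, equivalent alternative to the test-time scaling described above. The array size, dropout rate, and seed are arbitrary.

import numpy as np

rng = np.random.default_rng(0)

def dropout(activations, p=0.5, training=True):
    # p is the probability of dropping each neuron (the dropout rate)
    if not training:
        return activations                      # no dropout at test time
    mask = rng.random(activations.shape) >= p   # keep each unit with probability 1 - p
    return activations * mask / (1.0 - p)       # inverted dropout scaling

a = rng.normal(size=10)
print(dropout(a, p=0.5, training=True))   # roughly half the units are zeroed
print(dropout(a, training=False))         # unchanged at inference time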
Advantages:
• Improved Generalization: By forcing the network to work with various
subsets of neurons, dropout helps the model generalize better to
unseen data.
• Efficient and Simple: Dropout is easy to implement and computationally
efficient. It has been found to be effective across a wide range of tasks.
Limitations:
• Extended Training Time: Since the model is effectively learning several
architectures simultaneously, it may require more epochs to converge
compared to a model without dropout.