Deep Learning

Deep Neural Networks (DNNs) are advanced artificial neural networks with multiple layers that excel at complex problem-solving, particularly with high-dimensional data. Training DNNs presents challenges such as vanishing gradients, overfitting, and computational complexity, but modern strategies like batch normalization and transfer learning help address these issues. Additionally, activation functions such as ReLU play a crucial role in DNN performance, while recurrent architectures such as LSTMs are used to manage sequential data.


Deep Neural Networks (DNNs) are a class of artificial neural networks (ANNs) that consist of multiple layers of interconnected neurons. These networks are particularly powerful in solving complex problems involving high-dimensional data, such as image recognition, natural language processing, and speech recognition. However, training deep neural networks presents significant challenges due to their complexity and depth. Here's an elaboration:

Deep Learning and Deep Neural Networks

Deep Learning
Deep learning is a subset of machine learning that utilizes deep neural networks to model
and solve problems requiring abstraction and representation. It is inspired by the structure
and function of the human brain, aiming to automatically learn feature hierarchies from raw
data.

Deep Neural Networks (DNNs)


DNNs are composed of an input layer, multiple hidden layers, and an output layer. The
hidden layers consist of neurons with nonlinear activation functions, allowing the network to
model complex, nonlinear relationships. The "deep" in DNN refers to the number of hidden
layers, which can range from a few to hundreds or even thousands in advanced
architectures like Transformers.

Challenges in Training Deep Neural Networks

Training deep neural networks involves optimizing the weights of neurons to minimize a loss
function. This process, while conceptually simple, encounters several difficulties:

1. Vanishing and Exploding Gradients


● Issue: During backpropagation, gradients can become exceedingly small (vanishing)
or excessively large (exploding) as they are propagated through layers.
● Impact:
○ Vanishing gradients hinder learning in the earlier layers because the weights
do not update significantly.
○ Exploding gradients lead to instability and can result in numerical overflows.
● Solutions:
○ Use activation functions like ReLU or its variants instead of sigmoid or tanh.
○ Apply weight initialization techniques like Xavier or He initialization.
○ Use gradient clipping for exploding gradients.
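
A minimal PyTorch sketch of two of these remedies, He initialization for the weights and gradient clipping before the update (layer sizes, data, and the clipping threshold are illustrative choices):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))

# He (Kaiming) initialization is designed for ReLU-family activations.
for layer in model:
    if isinstance(layer, nn.Linear):
        nn.init.kaiming_normal_(layer.weight, nonlinearity="relu")
        nn.init.zeros_(layer.bias)

optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
x, y = torch.randn(32, 128), torch.randint(0, 10, (32,))
loss = nn.CrossEntropyLoss()(model(x), y)
loss.backward()

# Clip the global gradient norm to curb exploding gradients before stepping.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
```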

2. Overfitting

● Issue: DNNs often have a large number of parameters, leading to a high capacity for
memorizing the training data instead of generalizing to unseen data.
● Impact: Poor performance on test data despite excellent performance on training
data.
● Solutions:
○ Regularization techniques like L1/L2 regularization.
○ Dropout: Randomly deactivating neurons during training to reduce
co-dependencies.
○ Data augmentation: Expanding the training dataset by applying
transformations to the existing data.
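
A short PyTorch sketch of the first two remedies, dropout between layers and L2 regularization via the optimizer's weight decay (the rates here are illustrative defaults, not recommendations):

```python
import torch
import torch.nn as nn

# Dropout randomly zeroes 50% of activations during training
# (it is automatically disabled in eval mode).
model = nn.Sequential(
    nn.Linear(128, 64),
    nn.ReLU(),
    nn.Dropout(p=0.5),
    nn.Linear(64, 10),
)

# weight_decay adds an L2 penalty on the weights to each update.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
```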

3. Computational Complexity

● Issue: Training DNNs requires substantial computational power and memory due to
their high number of parameters and operations.
● Impact: Long training times and high resource costs.
● Solutions:
○ Use specialized hardware like GPUs or TPUs.
○ Implement efficient algorithms and libraries (e.g., TensorFlow, PyTorch).
○ Leverage parallel processing and distributed training.

4. Difficulty in Hyperparameter Tuning

● Issue: Training deep networks involves choosing many hyperparameters, including learning rate, batch size, number of layers, and architecture design.
● Impact: Poor choices can lead to suboptimal performance or non-convergence.
● Solutions:
○ Grid search or random search.
○ Bayesian optimization.
○ Use automated machine learning (AutoML) frameworks for hyperparameter
tuning.
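
A toy sketch of random search over two hyperparameters; the scoring function below is a stand-in for a real training run, and the search space is arbitrary:

```python
import random

random.seed(0)

def validation_score(lr, batch_size):
    # Stand-in: a real implementation would train the model with this
    # configuration and return its validation accuracy.
    return -abs(lr - 1e-3) - abs(batch_size - 64) / 1000.0

# Sample 20 random configurations and keep the best-scoring one.
candidates = [
    (10 ** random.uniform(-5, -1), random.choice([16, 32, 64, 128]))
    for _ in range(20)
]
best = max(candidates, key=lambda cfg: validation_score(*cfg))
print("best (learning rate, batch size):", best)
```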

5. Lack of Interpretability

● Issue: DNNs are often considered "black boxes," making it difficult to understand
how decisions are made.
● Impact: Reduced trust and challenges in debugging models.
● Solutions:
○ Use visualization tools (e.g., saliency maps).
○ Apply explainability techniques like SHAP or LIME.

Modern Strategies for Effective Training

1. Batch Normalization: Normalizes inputs to each layer, reducing internal covariate shift and accelerating training (see the sketch after this list).
2. Pretraining and Transfer Learning: Leveraging pretrained models to initialize
weights and fine-tune for specific tasks.
3. Optimizers: Advanced optimizers like Adam, RMSprop, and AdaGrad adapt learning
rates for better convergence.
4. Architectural Innovations: Use architectures like ResNet (residual networks) to
address degradation problems in very deep networks.
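
As a concrete illustration of strategy 1, a minimal PyTorch sketch that inserts batch normalization between layers (the sizes are arbitrary):

```python
import torch.nn as nn

# BatchNorm1d normalizes each of the 64 features across the batch,
# stabilizing the distribution the next layer sees.
model = nn.Sequential(
    nn.Linear(128, 64),
    nn.BatchNorm1d(64),
    nn.ReLU(),
    nn.Linear(64, 10),
)
```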

While deep neural networks have transformed numerous fields, their training remains a
challenging yet rewarding task. Continuous advancements in techniques and tools are
helping mitigate these difficulties, making DNNs more accessible and powerful for real-world
applications.

An activation function is like a switch or filter in a neural network that decides what
information to pass forward. It takes the input coming into a neuron, applies a rule or
condition, and produces an output that goes to the next layer. This helps the network decide
which signals are important and which to ignore, allowing it to learn and solve complex
problems.
It can also be defined as a transformation that maps a neuron's input signals to the output signals required by the rest of the network.

Activation Functions in CNNs (Convolutional Neural Networks) Explained

1. Tanh Activation Function


○ What it does: Converts input into a range between -1 and 1.
○ Good for: Problems where outputs need to be balanced (zero-centered).
○ Issues: It can slow learning due to the vanishing gradient problem when
values get stuck near -1 or 1.
○ Main Use: Often used in classification tasks with two classes.


2. Sigmoid Activation Function
○ What it does: Converts input into a range between 0 and 1.
○ Good for: Simple problems or the output layer for probabilities.
○ Issues:
■ Not zero-centered, so neurons can only send positive signals, causing
zig-zag behavior during training.
■ Vanishing Gradient Problem: At extreme values (close to 0 or 1),
gradients almost vanish, slowing or stopping learning.

3. ReLU (Rectified Linear Unit)
○ What it does: Outputs the input if it's positive; otherwise, outputs 0.
○ Good for: Speeding up training and avoiding the vanishing gradient problem.
○ Issues: Can face the Dying ReLU Problem, where some neurons stop
working if weights are updated poorly or learning rates are too high.
○ Advantages:
■ Simple and fast to compute.
■ Creates sparse activations (only some neurons activate at a time).
■ Works well for deep networks.
4. Leaky ReLU
○ What it does: Similar to ReLU, but allows a small negative slope when input
is less than 0 (e.g., outputs 0.01 × input).
○ Good for: Fixing the Dying ReLU Problem, ensuring neurons always
contribute a little.
○ Advantages: Speeds up training and avoids "dead neurons."
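
All four functions can be written in a few lines of NumPy; a minimal sketch (the 0.01 slope for Leaky ReLU is a common but arbitrary choice):

```python
import numpy as np

def tanh(x):
    return np.tanh(x)                     # output in (-1, 1), zero-centered

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))       # output in (0, 1)

def relu(x):
    return np.maximum(0.0, x)             # 0 for x <= 0, x otherwise

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)  # small slope instead of a hard 0

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x))        # [0.  0.  0.  0.5 2. ]
print(leaky_relu(x))  # [-0.02  -0.005  0.     0.5    2.   ]
```
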
Differences Between Activation Functions
Feature | Tanh | Sigmoid | ReLU | Leaky ReLU
Output Range | [-1, 1] | [0, 1] | [0, ∞) | x for x > 0, small negative slope for x < 0
Zero-Centered | Yes | No | No | No (but allows negative outputs)
Gradient Saturation | Yes | Yes | No | No
Computation Cost | Moderate | Moderate | Low | Low
Sparsity | No | No | Yes | Yes
Common Problem | Vanishing Gradient | Vanishing Gradient | Dying ReLU | None (fixes Dying ReLU)
Best Use Case | Binary Classification | Probabilistic Outputs | Deep Networks | Deep Networks (avoiding dead neurons)
In summary, ReLU is widely used for hidden layers because it’s fast and avoids common
issues. For output layers, choose sigmoid (for probabilities) or tanh (for balanced outputs).
Leaky ReLU is a good fallback if you encounter the Dying ReLU Problem.

Parameters vs Hyperparameters in Deep Learning

1. Parameters

● Definition: Parameters are the values that the deep learning model learns
automatically during training by optimizing the loss function.
● Examples in Deep Learning:
○ Weights: Connections between neurons in different layers.
○ Biases: Additional values added to neuron outputs to allow flexibility in
predictions.
● Characteristics:
○ Learned through training using algorithms like gradient descent.
○ Directly impact how the model processes data and makes predictions.

2. Hyperparameters

● Definition: Hyperparameters are the settings or configurations defined before training begins. These values are set manually and influence the training process itself.
● Examples in Deep Learning:
○ Learning Rate: Controls how much the weights are adjusted during training.
○ Batch Size: Number of samples processed before the model updates
parameters.
○ Number of Layers: Determines the depth of the neural network.
○ Activation Function: Defines how neuron outputs are calculated.
○ Epochs: Number of times the model sees the entire training dataset.
● Characteristics:
○ Not learned during training; set by the user or through optimization techniques
like grid search or random search.
○ Impact how effectively and quickly the model learns.
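
A small PyTorch sketch of the distinction; the specific values are arbitrary:

```python
import torch
import torch.nn as nn

# Hyperparameters: chosen by the user before training begins.
learning_rate = 1e-3
batch_size = 32
num_epochs = 5

# Parameters: created inside the model and learned by the optimizer.
model = nn.Linear(10, 1)  # 10 weights + 1 bias = 11 parameters
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

print(sum(p.numel() for p in model.parameters()))  # 11
```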

Key Differences

Aspect | Parameters | Hyperparameters
Learned During Training | Yes (automatically optimized by the model) | No (manually set before training starts)
Examples | Weights, biases | Learning rate, batch size, number of layers
Role | Define the model's behavior for predictions | Control the training process
Adjustability | Adjusted by training algorithms | Adjusted by the user or optimization tools

In summary, parameters define the model's structure and outputs, while hyperparameters
determine how the model learns.

Greedy Layer-Wise Training in Deep Learning

Greedy layer-wise training is a training approach used in deep neural networks to train one
layer at a time in a sequential manner. This method helps initialize the network effectively,
especially in deep architectures, and mitigates challenges like the vanishing gradient
problem.

How It Works
1. Train One Layer at a Time:
Each layer is trained independently, starting from the first layer and moving upward,
without training the entire network all at once.
2. Freeze Lower Layers:
After training a layer, its weights are frozen (fixed), and the next layer is trained
based on the output of the previously trained layer.
3. Stack Layers Gradually:
Layers are added and trained one by one, forming a deeper network step by step.
4. Fine-Tuning (Optional):
Once all layers are trained, the entire network can be fine-tuned end-to-end using
backpropagation to adjust weights across all layers together.
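
A simplified sketch of this procedure in the style of a stacked autoencoder, where each layer is pretrained to reconstruct its input and then frozen (the sizes, epochs, and reconstruction loss are illustrative):

```python
import torch
import torch.nn as nn

def pretrain_layer(encoder, data, epochs=10):
    # Train one encoder layer as an autoencoder on the current representation.
    decoder = nn.Linear(encoder.out_features, encoder.in_features)
    params = list(encoder.parameters()) + list(decoder.parameters())
    opt = torch.optim.Adam(params, lr=1e-3)
    for _ in range(epochs):
        opt.zero_grad()
        recon = decoder(torch.relu(encoder(data)))
        loss = nn.functional.mse_loss(recon, data)
        loss.backward()
        opt.step()
    return encoder

data = torch.randn(256, 100)            # toy unlabeled dataset
sizes = [100, 64, 32]
layers = []
for in_f, out_f in zip(sizes, sizes[1:]):
    layer = pretrain_layer(nn.Linear(in_f, out_f), data)
    for p in layer.parameters():        # freeze before training the next layer
        p.requires_grad = False
    data = torch.relu(layer(data)).detach()  # input for the next layer
    layers.append(layer)
# Optional fine-tuning would now unfreeze everything and train end-to-end.
```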

Why Use Greedy Layer-Wise Training?

● Efficient Initialization:
Helps initialize weights in deep networks, avoiding poor starting points caused by
random initialization.
● Reduces Vanishing Gradient Issues:
Since each layer is trained independently, gradients do not have to propagate through the full depth of the network, so they don't diminish as they do in end-to-end backpropagation.
● Improves Stability:
Training one layer at a time is computationally simpler and reduces the chances of
unstable updates.
● Historical Context:
Used in early deep learning methods like deep belief networks (DBNs) and
autoencoders.

Applications

● Pretraining for Deep Networks:
Often used as a pretraining step for initializing weights before fine-tuning a deep neural network.
● Unsupervised Learning:
In models like stacked autoencoders or deep belief networks, each layer can be
pretrained in an unsupervised manner.

Limitations

● Time-Consuming:
Training layers sequentially can take longer compared to end-to-end training.
● Modern Alternatives:
Techniques like better weight initializations (e.g., Xavier, He initialization) and
advanced optimizers (e.g., Adam) often make greedy layer-wise training
unnecessary in modern neural networks.

In summary, greedy layer-wise training builds deep networks layer by layer, stabilizing the
learning process and making it easier to train deep architectures, especially when
computational resources or data are limited.

Recurrent Neural Network (RNN)

A Recurrent Neural Network (RNN) is a type of neural network designed specifically to handle sequential data, such as time-series, text, or speech. Unlike traditional feedforward networks, RNNs have connections that form loops, allowing them to "remember" information from previous steps in a sequence. Here's an overview:

How RNNs Work

1. Hidden State: At each time step, the network combines the current input with the hidden state carried over from the previous step, which is how information from earlier in the sequence influences later outputs.
2. Shared Weights: The weights of the network remain the same for every time step, making RNNs efficient for sequential tasks.
Structure

1. Repeating Unit: The core component of an RNN is a small neural network (often a
fully connected layer) that is repeated for each time step.
2. Unfolding Over Time: An RNN processes a sequence by "unrolling" itself, where
each time step has its own copy of the repeating unit but shares the same weights.
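
A bare-bones NumPy sketch of this unrolling; the dimensions are arbitrary, and a real implementation would add an output layer and training:

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    # One repeating unit: mix the current input with the previous hidden state.
    return np.tanh(x_t @ W_xh + h_prev @ W_hh + b_h)

rng = np.random.default_rng(0)
W_xh = rng.normal(size=(8, 16))    # input-to-hidden weights
W_hh = rng.normal(size=(16, 16))   # hidden-to-hidden (recurrent) weights
b_h = np.zeros(16)

h = np.zeros(16)
for x_t in rng.normal(size=(5, 8)):  # 5 time steps, same weights every step
    h = rnn_step(x_t, h, W_xh, W_hh, b_h)
```
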
Advantages

● Sequential Understanding: RNNs are great for tasks where the order of data
matters, such as language translation or stock price prediction.
● Memory: They can carry information forward across time steps, allowing them to
understand context in sequences.

Challenges

1. Vanishing/Exploding Gradient Problem: Gradients may shrink or grow excessively during training, especially for long sequences, making it hard for the network to learn dependencies across distant time steps.
2. Short-Term Memory: Basic RNNs struggle to capture long-term dependencies in
sequences.

Applications

● Text and Language Processing: Sentiment analysis, language modeling, machine translation.
● Time-Series Data: Stock price prediction, weather forecasting.
● Speech and Audio: Speech recognition, music generation.
Extensions

To address the limitations of basic RNNs, advanced architectures like LSTMs (Long
Short-Term Memory) and GRUs (Gated Recurrent Units) were developed to better
capture long-term dependencies.
Backpropagation Through Time (BPTT)

What is BPTT?
BPTT is a method used to train Recurrent Neural Networks (RNNs). Unlike regular neural
networks where data flows in one direction, RNNs process sequences, meaning the output
at a certain time depends not just on the input at that time but also on previous time steps.

BPTT works by "unfolding" the RNN across time steps, turning it into a series of
interconnected layers (like a deep neural network) with shared weights. Errors are
backpropagated through this unfolded structure, and weights are updated using techniques
like gradient descent.

How It Works (Example)

1. Start with a sequence of data, e.g., {x1, x2, x3, ...}.
2. Pass x1 into the RNN to get a prediction, say y1.
3. Compare y1 to the actual next value (x2) to calculate the error.
4. Use BPTT to backpropagate the error and update the weights.
5. Repeat this for the whole sequence (x2 → x3, x3 → x4, etc.).
6. Test the RNN on a validation set and adjust hyperparameters.

This process helps the RNN learn patterns over time, making it good at tasks like predicting
the next value in a sequence.
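
A hedged PyTorch sketch of this loop on a toy sine-wave sequence; calling backward() on the unrolled graph is what performs BPTT (the sequence, sizes, and epoch count are illustrative):

```python
import torch
import torch.nn as nn

# Toy task: predict x_{t+1} from x_t along a sine wave.
seq = torch.sin(torch.linspace(0, 10, 101))
inputs = seq[:-1].view(1, -1, 1)    # shape: (batch, time, features)
targets = seq[1:].view(1, -1, 1)

rnn = nn.RNN(input_size=1, hidden_size=16, batch_first=True)
head = nn.Linear(16, 1)
params = list(rnn.parameters()) + list(head.parameters())
optimizer = torch.optim.Adam(params, lr=1e-2)

for epoch in range(100):
    optimizer.zero_grad()
    out, _ = rnn(inputs)                       # forward over all time steps
    loss = nn.functional.mse_loss(head(out), targets)
    loss.backward()                            # BPTT through the unrolled graph
    optimizer.step()
```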

Where Is BPTT Used?

1. Speech Recognition: Recognizing spoken words by understanding the sequence of sounds.
2. Language Modeling: Predicting the next word in a sentence, useful for tasks like
text generation.
3. Time Series Prediction: Forecasting future values based on past data, like stock
prices or weather.

Challenges of BPTT

1. Vanishing Gradients: Gradients shrink as they move backward through time, making it hard to learn long-term dependencies.
Solution: Use techniques like truncated BPTT (sketched below) or models like LSTMs and GRUs.
2. Exploding Gradients: Gradients grow too large, causing instability.
Solution: Apply gradient clipping.
3. High Memory Usage: Storing activations for long sequences requires a lot of
memory.
4. Slow Training: BPTT is sequential, so it’s hard to parallelize, making it
computationally expensive.
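
A minimal sketch of the first two solutions combined, truncated BPTT with gradient clipping in PyTorch: the hidden state is detached between chunks so gradients and memory stay bounded to a fixed window (the chunk length and the loss are placeholders):

```python
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=1, hidden_size=16, batch_first=True)
optimizer = torch.optim.SGD(rnn.parameters(), lr=1e-2)

seq = torch.randn(1, 100, 1)         # one long sequence of 100 steps
h = None
for chunk in seq.split(20, dim=1):   # process 5 chunks of 20 steps each
    optimizer.zero_grad()
    out, h = rnn(chunk, h)
    loss = out.pow(2).mean()         # placeholder loss for the sketch
    loss.backward()                  # gradients flow only within this chunk
    torch.nn.utils.clip_grad_norm_(rnn.parameters(), max_norm=1.0)
    optimizer.step()
    h = h.detach()                   # cut the graph between chunks
```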

Summary

BPTT helps RNNs learn from sequential data by "rewinding" through time to adjust weights.
While it's powerful for tasks involving patterns over time, it has limitations like high memory
needs and gradient issues. With improvements like truncated BPTT and specialized
architectures (e.g., LSTMs), these challenges can be mitigated.

Long Short-Term Memory Networks

What is LSTM?

LSTM (Long Short-Term Memory) is a type of Recurrent Neural Network (RNN) designed to
handle long-term dependencies. It was introduced in 1997 by Sepp Hochreiter and Jürgen
Schmidhuber. Unlike traditional RNNs, which struggle with problems like vanishing and
exploding gradients, LSTMs can efficiently learn and remember important information over
extended sequences of data.


Key Features of LSTMs

1. Capability to Remember Long-Term Dependencies


○ Useful for tasks like language modeling where predictions depend on both
recent and past data.
○ Example: Predicting the next word in a sentence like "The clouds are in the
sky."
2. Overcoming RNN Challenges
○ RNNs face vanishing/exploding gradient issues, making them ineffective for
long-term dependencies.
○ LSTMs mitigate this through their unique architecture.
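
For reference, a minimal PyTorch sketch of an LSTM layer; the gates and the cell state that carries long-term information are handled internally (the sizes are arbitrary):

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)
x = torch.randn(4, 20, 8)               # 4 sequences, 20 steps, 8 features
out, (h_n, c_n) = lstm(x)               # h_n: hidden state, c_n: cell state
print(out.shape, h_n.shape, c_n.shape)  # (4, 20, 16), (1, 4, 16), (1, 4, 16)
```
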
Variants of LSTM Explained in Simple Language

LSTM (Long Short-Term Memory) is a type of neural network, but there are variations to suit
different needs. Let’s simplify each variant:

1. Peephole Connections

● In normal LSTMs, gates (like the forget and input gates) decide what information to keep or discard. However, they don't directly "peek" at the current memory state (C_t).
● With peephole connections, gates are allowed to "look" at the current memory state while making decisions.
● Think of it as giving the gate extra context:
○ It can see the entire notebook (cell state) before deciding what to add or
erase.
2. Coupled Forget and Input Gates

● Normally, the forget gate and the input gate work separately:
○ Forget gate decides what to erase.
○ Input gate decides what new information to add.
● In coupled gates, these decisions happen together:
○ Forget something only if you’re replacing it with something new.
○ If you’re not adding anything new, you keep the old information intact.
● Think of it like: "I’ll clean the space (forget) only if I’m putting new notes there (input)."
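
In equation form, a standard way to write this coupling ties the input gate to the forget gate, so the cell update becomes

C_t = f_t * C_{t-1} + (1 - f_t) * C̃_t

where f_t is the forget gate's output and C̃_t is the candidate new memory: whatever fraction is forgotten is exactly replaced by new content.
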
3. GRU (Gated Recurrent Unit)

● GRU is a simpler alternative to LSTM, introduced by Cho et al. (2014).


● Key Differences from LSTM:
1. Combines Forget and Input Gates into an Update Gate:
■ Instead of two separate gates, GRU has one gate to decide both
forgetting and updating.
2. Merges Cell State and Hidden State:
■ LSTM has two parts: the memory (C_t) and the visible hidden state (h_t). GRU merges these into one state.
3. Simpler Design:
■ Fewer gates and parameters make GRU faster to train.
● Why GRU?
It’s simpler and solves the vanishing gradient problem effectively (a common issue
in training neural networks).
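
A quick sketch of the "fewer parameters" point: for the same sizes, PyTorch's GRU carries three gates' worth of weights against the LSTM's four (the sizes are arbitrary):

```python
import torch.nn as nn

def count_params(m):
    return sum(p.numel() for p in m.parameters())

lstm = nn.LSTM(input_size=8, hidden_size=16)
gru = nn.GRU(input_size=8, hidden_size=16)
print(count_params(lstm), count_params(gru))  # 1664 vs 1248 (~25% smaller)
```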

Summary of Variants

1. Peephole Connections: Gates can peek at the current memory.
2. Coupled Gates: Forget and add decisions are made together.
3. GRU: A simpler version of LSTM, with combined gates and states.
