Deep Learning Module-02

The document provides an overview of feedforward neural networks, including their architecture, activation functions, and historical context. It discusses gradient-based learning, cost functions, and various optimization techniques such as batch normalization and dropout. Additionally, it covers practical implementation considerations and common issues faced in deep learning, along with solutions.

Uploaded by

sanjana sm

21CS743 | DEEP LEARNING

Module-02

Feedforward Networks and Deep Learning

Introduction to Feedforward Neural Networks

1.1 Basic Concepts

• A feedforward neural network is the simplest form of artificial neural network (ANN)

• Information moves in only one direction: forward, from input nodes through hidden nodes to output nodes

• No cycles or loops exist in the network structure

1.2 Historical Context

1. Origins

o Inspired by biological neural networks

o First proposed by Warren McCulloch and Walter Pitts (1943)

o Significant advancement with perceptron by Frank Rosenblatt (1958)

2. Evolution

o Single-layer to multi-layer networks

o Popularization of backpropagation by Rumelhart, Hinton, and Williams (1986)

o Modern deep learning revolution (2012-present)

1.3 Network Architecture

1. Input Layer

o Receives raw input data

o No computation performed

o Number of neurons equals number of input features

o Standardization/normalization often applied here

2. Hidden Layers

o Perform intermediate computations

o Can have multiple hidden layers

o Each neuron connected to all neurons in previous layer

o Feature extraction and transformation occur here

3. Output Layer

o Produces final network output

o Number of neurons depends on problem type

o Classification: typically one neuron per class

o Regression: usually one neuron

1.4 Activation Functions

1. Sigmoid (Logistic)

o Formula: σ(x) = 1/(1 + e^(-x))

o Range: (0,1)

o Used in binary classification

o Properties:

▪ Smooth gradient

▪ Clear prediction probability

▪ Suffers from vanishing gradient

2. Hyperbolic Tangent (tanh)

o Formula: tanh(x) = (e^x - e^(-x))/(e^x + e^(-x))

o Range: (-1,1)

o Often performs better than sigmoid

o Properties:

▪ Zero-centered

▪ Stronger gradients

▪ Still has vanishing gradient issue

3. ReLU (Rectified Linear Unit)

o Formula: f(x) = max(0,x)

o Most commonly used

o Helps mitigate the vanishing gradient problem

o Properties:

▪ Computationally efficient

▪ No saturation in positive region

▪ Dying ReLU problem

4. Leaky ReLU

o Formula: f(x) = max(0.01x, x)

o Addresses dying ReLU problem

o Small negative slope

o Properties:

▪ Never completely dies

▪ Allows for negative values

▪ More robust than standard ReLU
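The four activation functions above can be written in a few lines. This is a minimal sketch assuming NumPy; the function names and the sample input are illustrative choices, not part of the original notes.

```python
import numpy as np

def sigmoid(x):
    """Logistic sigmoid: 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    """Hyperbolic tangent; output is zero-centered."""
    return np.tanh(x)

def relu(x):
    """max(0, x), applied element-wise."""
    return np.maximum(0.0, x)

def leaky_relu(x, slope=0.01):
    """max(slope * x, x); slope=0.01 matches the formula above."""
    return np.maximum(slope * x, x)

x = np.array([-2.0, 0.0, 2.0])   # illustrative input
print(sigmoid(x))
print(tanh(x))
print(relu(x))
print(leaky_relu(x))
```

Note how ReLU returns exactly zero for the negative input while Leaky ReLU keeps a small negative value, which is what lets its gradient stay non-zero.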

2. Gradient-Based Learning

2.1 Understanding Gradients

1. Definition

o Gradient is a vector of partial derivatives

o Points in direction of steepest increase

o Used to minimize loss function

2. Properties

o Direction indicates fastest increase

o Magnitude indicates steepness

o Negative gradient used for minimization
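To make the direction property concrete, the sketch below uses a hypothetical two-parameter loss (chosen only for illustration) and compares a small step along the gradient with a small step against it.

```python
import numpy as np

def f(w):
    # Hypothetical loss: a simple bowl with its minimum at (0, 0)
    return w[0] ** 2 + 3.0 * w[1] ** 2

def grad_f(w):
    # Vector of partial derivatives: [df/dw0, df/dw1]
    return np.array([2.0 * w[0], 6.0 * w[1]])

w = np.array([1.0, 1.0])
g = grad_f(w)
step = 0.1
w_down = w - step * g   # move against the gradient
w_up = w + step * g     # move along the gradient

print(f(w), f(w_down), f(w_up))
```

The loss drops when moving against the gradient and rises when moving along it, which is why the update rule subtracts the gradient.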

2.2 Cost Functions

1. Mean Squared Error (MSE)

o Used for regression problems

o Formula: MSE = (1/n)Σ(y_true - y_pred)²

o Properties:

▪ Always non-negative

▪ Penalizes larger errors more

▪ Differentiable

2. Cross-Entropy Loss

o Used for classification problems

o Formula: -Σ(y_true * log(y_pred))

o Properties:

▪ Measures probability distribution difference

▪ Better for classification than MSE

▪ Provides stronger gradients

3. Huber Loss

o Combines MSE and MAE

o Less sensitive to outliers

o Formula:

▪ L = 0.5(y - f(x))² if |y - f(x)| ≤ δ

▪ L = δ|y - f(x)| - 0.5δ² otherwise
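The three cost functions can be sketched as follows, assuming NumPy arrays as inputs. Two choices here are assumptions rather than part of the notes: the `eps` clipping that guards against log(0), and averaging the per-sample Huber loss over the batch.

```python
import numpy as np

def mse(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)

def cross_entropy(y_true, y_pred, eps=1e-12):
    # y_true is one-hot, y_pred holds class probabilities; eps avoids log(0)
    return -np.sum(y_true * np.log(np.clip(y_pred, eps, 1.0)))

def huber(y_true, y_pred, delta=1.0):
    err = np.abs(y_true - y_pred)
    quadratic = 0.5 * err ** 2                 # used where |error| <= delta
    linear = delta * err - 0.5 * delta ** 2    # used otherwise
    return np.mean(np.where(err <= delta, quadratic, linear))

print(mse(np.array([1.0, 2.0, 3.0]), np.array([1.0, 2.0, 5.0])))
print(cross_entropy(np.array([0.0, 1.0, 0.0]), np.array([0.2, 0.7, 0.1])))
print(huber(np.array([0.0, 0.0]), np.array([0.5, 3.0])))
```

In the Huber call, the error of 0.5 falls in the quadratic branch and the error of 3.0 falls in the linear branch, so the outlier contributes far less than it would under MSE.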

2.3 Gradient Descent Types

1. Batch Gradient Descent

o Uses entire dataset for each update

o More stable but slower

o Formula: θ = θ - α∇J(θ)

o Memory intensive for large datasets

2. Stochastic Gradient Descent (SGD)

o Updates parameters after each sample

o Faster but less stable

o Better for large datasets

o High variance in parameter updates

3. Mini-batch Gradient Descent

o Compromise between batch and SGD

o Updates parameters after small batches

o Most commonly used in practice

o Typical batch sizes: 32, 64, 128

4. Advanced Optimizers

a) Adam (Adaptive Moment Estimation)

o Combines momentum and RMSprop

o Adaptive learning rates

o Formula includes first and second moments

b) RMSprop

o Adaptive learning rates

o Divides by running average of gradient magnitudes

c) Momentum

o Adds fraction of previous update

o Helps escape local minima

o Reduces oscillation
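The gradient descent variants differ mainly in how many samples feed each update, so one routine with a `batch_size` argument can cover all three. The sketch below is an illustration under stated assumptions: the toy data (y = 2x + 1), the function names, and the hyperparameter values are hypothetical, and `adam_step` follows the commonly published Adam update with bias correction rather than anything spelled out in these notes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical noise-free toy data: y = 2x + 1
x = rng.uniform(-1.0, 1.0, size=256)
y = 2.0 * x + 1.0

def gradient_descent(x, y, batch_size, lr=0.1, beta=0.0, epochs=200):
    """Fit y = w*x + b by minimizing MSE.

    batch_size = len(y) gives batch gradient descent, 1 gives SGD, and
    anything in between gives mini-batch. beta > 0 adds momentum.
    """
    theta = np.zeros(2)              # [w, b]
    v = np.zeros(2)                  # momentum "velocity"
    n = len(y)
    for _ in range(epochs):
        order = rng.permutation(n)   # reshuffle the samples every epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            err = theta[0] * x[idx] + theta[1] - y[idx]
            # Gradient of the batch MSE with respect to [w, b]
            grad = 2.0 * np.array([np.mean(err * x[idx]), np.mean(err)])
            v = beta * v - lr * grad   # v = beta*v - alpha*grad J(theta)
            theta = theta + v          # equals theta - alpha*grad when beta = 0
    return theta

def adam_step(theta, grad, m, s, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; m and s are the running first and second moments."""
    m = beta1 * m + (1 - beta1) * grad
    s = beta2 * s + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)     # bias correction for zero initialization
    s_hat = s / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(s_hat) + eps)
    return theta, m, s

print(gradient_descent(x, y, batch_size=len(y)))         # batch
print(gradient_descent(x, y, batch_size=32))             # mini-batch
print(gradient_descent(x, y, batch_size=32, beta=0.9))   # mini-batch + momentum
```

On the first Adam step the bias-corrected moments reduce to the gradient and its square, so each parameter moves by roughly `lr` in the direction opposite to its gradient sign, regardless of gradient magnitude. That is the sense in which the learning rate is adaptive per parameter.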

3. Backpropagation and Chain Rule

3.1 Chain Rule Fundamentals

1. Mathematical Basis

o df/dx = df/dy * dy/dx

o Allows computation of composite function derivatives

o Essential for neural network training

2. Application in Neural Networks

o Computes gradients layer by layer

o Propagates error backwards

o Updates weights based on contribution to error
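As a worked instance of the chain rule, consider a single sigmoid neuron with a squared-error loss. The setup and the numeric values below are hypothetical; the point is that multiplying the local derivatives gives the same number as a finite-difference estimate.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(w, b, x, y):
    a = sigmoid(w * x + b)
    return 0.5 * (a - y) ** 2

def grad_w(w, b, x, y):
    z = w * x + b
    a = sigmoid(z)
    dL_da = a - y            # derivative of 0.5 * (a - y)^2 with respect to a
    da_dz = a * (1.0 - a)    # derivative of the sigmoid
    dz_dw = x                # derivative of w * x + b with respect to w
    return dL_da * da_dz * dz_dw   # chain rule: multiply the local derivatives

w, b, x, y = 0.5, -0.2, 1.5, 1.0   # hypothetical values
h = 1e-6
numeric = (loss(w + h, b, x, y) - loss(w - h, b, x, y)) / (2 * h)
print(grad_w(w, b, x, y), numeric)
```

Backpropagation applies this same product layer by layer, reusing the factors already computed for later layers.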

3.2 Forward Pass

1. Input Processing

o Data normalization

o Weight initialization

o Bias addition

2. Layer Computation

# Pseudo-code for forward pass
# network: list of (W, b, activation) tuples; X: input with one column per sample
A = X
for W, b, activation in network:
    Z = W @ A + b      # Linear transformation (matrix product)
    A = activation(Z)  # Apply activation function

3. Output Generation

o Final layer activation

o Prediction computation

o Error calculation
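The forward-pass steps can be sketched end to end for a small network. Everything concrete here is an assumption for illustration: the layer sizes, the random weights, the ReLU and sigmoid choices, and the use of MSE for the error calculation.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, layers):
    """X has shape (n_features, m); layers is a list of (W, b, activation)."""
    A = X
    for W, b, activation in layers:
        Z = W @ A + b        # linear transformation, b broadcasts over columns
        A = activation(Z)    # element-wise activation
    return A

rng = np.random.default_rng(0)
layers = [
    (rng.normal(size=(4, 3)), np.zeros((4, 1)), relu),     # hidden layer
    (rng.normal(size=(1, 4)), np.zeros((1, 1)), sigmoid),  # output layer
]
X = rng.normal(size=(3, 5))          # 3 features, 5 samples
Y = np.array([[1, 0, 1, 0, 1]])      # hypothetical targets
A_out = forward(X, layers)
error = np.mean((A_out - Y) ** 2)    # error calculation with MSE
print(A_out.shape, error)
```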

3.3 Backward Pass

1. Error Calculation

o Compare output with target

o Calculate loss using cost function

o Initialize gradient computation

2. Weight Updates

o Calculate gradients using chain rule

o Update weights: w_new = w_old - learning_rate * gradient

o Update biases similarly

3. Detailed Steps

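# Pseudo-code for backward pass (m = number of samples, one column per sample)
import numpy as np

# Output layer
dZ = A - Y  # Holds for sigmoid/softmax + cross-entropy, or linear output + 0.5 * squared error
dW = (1/m) * dZ @ A_prev.T
db = (1/m) * np.sum(dZ, axis=1, keepdims=True)

# Hidden layers
dA = W_next.T @ dZ_next  # Error propagated back from the following layer
dZ = dA * activation_derivative(Z)
dW = (1/m) * dZ @ A_prev.T
db = (1/m) * np.sum(dZ, axis=1, keepdims=True)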
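A common way to gain confidence in a backward pass is to compare one analytic gradient against a finite-difference estimate. The sketch below does this for a hypothetical two-layer network (tanh hidden layer, sigmoid output, binary cross-entropy); the parameter dictionary `p`, the layer sizes, and the random data are assumptions made for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, p):
    A1 = np.tanh(p["W1"] @ X + p["b1"])
    A2 = sigmoid(p["W2"] @ A1 + p["b2"])
    return A1, A2

def loss(X, Y, p):
    _, A2 = forward(X, p)
    # Binary cross-entropy averaged over the m samples
    return -np.mean(Y * np.log(A2) + (1 - Y) * np.log(1 - A2))

def backward(X, Y, p):
    m = X.shape[1]
    A1, A2 = forward(X, p)
    dZ2 = A2 - Y                          # sigmoid output + cross-entropy
    dW2 = dZ2 @ A1.T / m
    db2 = np.sum(dZ2, axis=1, keepdims=True) / m
    dA1 = p["W2"].T @ dZ2                 # propagate error to the hidden layer
    dZ1 = dA1 * (1.0 - A1 ** 2)           # derivative of tanh is 1 - tanh^2
    dW1 = dZ1 @ X.T / m
    db1 = np.sum(dZ1, axis=1, keepdims=True) / m
    return {"W1": dW1, "b1": db1, "W2": dW2, "b2": db2}

rng = np.random.default_rng(0)
X = rng.normal(size=(2, 6))
Y = (rng.uniform(size=(1, 6)) > 0.5).astype(float)
p = {"W1": rng.normal(size=(3, 2)) * 0.5, "b1": np.zeros((3, 1)),
     "W2": rng.normal(size=(1, 3)) * 0.5, "b2": np.zeros((1, 1))}
grads = backward(X, Y, p)

# Numerical check on a single weight using central differences
h = 1e-6
p["W1"][0, 0] += h
loss_plus = loss(X, Y, p)
p["W1"][0, 0] -= 2 * h
loss_minus = loss(X, Y, p)
p["W1"][0, 0] += h                        # restore the original value
numeric = (loss_plus - loss_minus) / (2 * h)
print(grads["W1"][0, 0], numeric)
```

If the two printed numbers disagree by more than a small tolerance, the usual suspects are a missing 1/m factor or a wrong activation derivative.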

4. Regularization for Deep Learning

4.1 L1 Regularization

1. Mathematical Form

o Adds absolute value of weights to loss

o Formula: L1 = λΣ|w|

o Promotes sparsity

2. Properties

o Feature selection capability

o Produces sparse models

o Less sensitive to outliers

4.2 L2 Regularization

1. Mathematical Form

o Adds squared weights to loss

o Formula: L2 = λΣw²

o Prevents large weights

2. Properties

o Smooth weight decay

o No sparse solutions

o More stable training
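The difference between the two penalties shows up most clearly in their gradients. The following sketch uses an illustrative weight vector and λ value; the function names are assumptions.

```python
import numpy as np

def l1_penalty(w, lam):
    return lam * np.sum(np.abs(w))

def l2_penalty(w, lam):
    return lam * np.sum(w ** 2)

def l1_grad(w, lam):
    # Subgradient: constant magnitude lam, which can push weights to exactly zero
    return lam * np.sign(w)

def l2_grad(w, lam):
    # Proportional to w, so large weights shrink faster than small ones
    return 2.0 * lam * w

w = np.array([0.5, -2.0, 0.0])   # illustrative weights
lam = 0.1
print(l1_penalty(w, lam), l2_penalty(w, lam))
print(l1_grad(w, lam), l2_grad(w, lam))
```

The L1 gradient has the same size for the small and the large weight, while the L2 gradient is four times larger for the weight of -2.0. This is one way to see why L1 tends toward sparse solutions and L2 toward smooth weight decay.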

4.3 Dropout

1. Basic Concept

o Randomly deactivate neurons

o Probability p of keeping neurons

o Different network for each training batch

2. Implementation Details

# Pseudo-code for dropout (inverted dropout; p = probability of keeping a neuron)
import numpy as np

mask = np.random.binomial(1, p, size=A.shape)
A = A * mask
A = A / p  # Scale to maintain expected value

3. Training vs. Testing

o Used only during training

o Scaled appropriately during inference (with the division by p shown above, no extra rescaling is needed at test time)

o Acts as model ensemble
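The training versus testing behavior can be captured with a single flag. This is a minimal sketch of inverted dropout; the function signature, the `training` flag, and the all-ones test activations are assumptions for illustration.

```python
import numpy as np

def dropout(A, keep_prob, training, rng):
    """Inverted dropout: scale at training time, identity at inference."""
    if not training:
        return A
    mask = rng.binomial(1, keep_prob, size=A.shape)   # 1 = keep, 0 = drop
    return A * mask / keep_prob

rng = np.random.default_rng(0)
A = np.ones((1000, 100))
train_out = dropout(A, keep_prob=0.8, training=True, rng=rng)
test_out = dropout(A, keep_prob=0.8, training=False, rng=rng)
print(train_out.mean(), test_out.mean())
```

Both means come out close to 1, which is the purpose of the scaling: the expected activation seen by the next layer is the same in training and at inference.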

4.4 Early Stopping

1. Implementation

o Monitor validation error

o Save best model

o Stop when validation error stops improving (often after a few epochs of patience)

2. Benefits

o Prevents overfitting

o Reduces training time

o Automatic model selection
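The monitoring logic can be sketched independently of any particular model by running it over a list of validation losses. The `patience` parameter and the loss values below are hypothetical; setting `patience=1` stops at the first increase.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return (best_epoch, stop_epoch) for a sequence of validation losses."""
    best_loss = float("inf")
    best_epoch = -1
    waited = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch   # "save best model" happens here
            waited = 0
        else:
            waited += 1
            if waited >= patience:
                return best_epoch, epoch          # stop training
    return best_epoch, len(val_losses) - 1

# Hypothetical validation curve: improves, then starts to overfit
losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.56, 0.60]
print(early_stopping_epoch(losses))
```

Here the best model is the one from epoch 3, and training stops at epoch 5 after two epochs without improvement.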

5. Advanced Concepts

5.1 Batch Normalization

1. Purpose

o Normalizes layer inputs

o Reduces internal covariate shift

o Speeds up training

2. Algorithm

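# Pseudo-code for batch normalization (x: batch of inputs, one row per sample)
import numpy as np

eps = 1e-5  # Small constant for numerical stability
mean = np.mean(x, axis=0)
var = np.var(x, axis=0)
x_norm = (x - mean) / np.sqrt(var + eps)
out = gamma * x_norm + beta  # gamma, beta: learnable scale and shift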
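A quick way to see what the scale and shift parameters do is to wrap the algorithm in a function and inspect the output statistics. The input distribution and the values gamma = 2 and beta = 0.5 are arbitrary choices for this sketch, and it covers only the training-time computation.

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Normalize each feature (column) over the batch, then scale and shift."""
    mean = np.mean(x, axis=0)
    var = np.var(x, axis=0)
    x_norm = (x - mean) / np.sqrt(var + eps)
    return gamma * x_norm + beta

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=3.0, size=(64, 4))   # batch of 64, 4 features
out = batch_norm_forward(x, gamma=2.0 * np.ones(4), beta=0.5 * np.ones(4))
print(out.mean(axis=0), out.std(axis=0))
```

Whatever the input mean and spread, each output feature ends up with mean close to beta and standard deviation close to gamma, so the following layer sees a stable input distribution.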

5.2 Weight Initialization

1. Xavier/Glorot Initialization

o Variance = 2/(n_in + n_out)

o Suitable for tanh activation

2. He Initialization

o Variance = 2/n_in

o Better for ReLU activation
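Both schemes amount to drawing weights with a prescribed variance. The sketch below assumes zero-mean normal draws (uniform variants also exist) and a weight matrix of shape (n_out, n_in); the layer sizes are illustrative.

```python
import numpy as np

def xavier_init(n_in, n_out, rng):
    # Variance = 2 / (n_in + n_out)
    std = np.sqrt(2.0 / (n_in + n_out))
    return rng.normal(0.0, std, size=(n_out, n_in))

def he_init(n_in, n_out, rng):
    # Variance = 2 / n_in
    std = np.sqrt(2.0 / n_in)
    return rng.normal(0.0, std, size=(n_out, n_in))

rng = np.random.default_rng(0)
W_xavier = xavier_init(400, 600, rng)
W_he = he_init(400, 600, rng)
print(W_xavier.var(), W_he.var())   # targets: 2/1000 = 0.002 and 2/400 = 0.005
```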

6. Practical Implementation

6.1 Network Design Considerations

1. Architecture Choices

o Number of layers

o Neurons per layer

o Activation functions

2. Hyperparameter Selection

o Learning rate

o Batch size

o Regularization strength

6.2 Training Process

1. Data Preparation

o Splitting data

o Normalization

o Augmentation

2. Training Loop

o Forward pass

o Loss computation

o Backward pass

o Parameter updates
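The four steps of the training loop can be put together in a short script. This sketch trains a small network on XOR with plain batch gradient descent; the architecture (4 tanh hidden units, sigmoid output), learning rate, epoch count, and random seed are all assumptions chosen for illustration, not tuned values.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0, 1, 1], [0, 1, 0, 1]], dtype=float)   # XOR inputs, shape (2, 4)
Y = np.array([[0, 1, 1, 0]], dtype=float)                 # XOR targets

W1, b1 = rng.normal(size=(4, 2)), np.zeros((4, 1))
W2, b2 = rng.normal(size=(1, 4)), np.zeros((1, 1))
lr, m = 0.5, X.shape[1]
history = []

for epoch in range(2000):
    # Forward pass
    A1 = np.tanh(W1 @ X + b1)
    A2 = 1.0 / (1.0 + np.exp(-(W2 @ A1 + b2)))
    # Loss computation (binary cross-entropy)
    loss = -np.mean(Y * np.log(A2) + (1 - Y) * np.log(1 - A2))
    history.append(loss)
    # Backward pass
    dZ2 = A2 - Y
    dW2 = dZ2 @ A1.T / m
    db2 = dZ2.sum(axis=1, keepdims=True) / m
    dZ1 = (W2.T @ dZ2) * (1.0 - A1 ** 2)
    dW1 = dZ1 @ X.T / m
    db1 = dZ1.sum(axis=1, keepdims=True) / m
    # Parameter updates
    W1 -= lr * dW1
    b1 -= lr * db1
    W2 -= lr * dW2
    b2 -= lr * db2

print(history[0], history[-1])
```

Plotting `history` is a simple first diagnostic: a curve that flattens early or oscillates usually points to the learning rate or the initialization.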

Practice Problems and Exercises

1. Basic Concepts

o Explain the role of activation functions in neural networks

o Compare and contrast different types of gradient descent

o Describe the vanishing gradient problem

2. Mathematical Problems

o Calculate gradients for a simple 2-layer network

o Implement batch normalization equations

o Compute different loss functions

3. Implementation Challenges

o Design a network for MNIST classification

o Implement dropout in Python

o Create a custom loss function

Key Formulas Reference Sheet

1. Activation Functions

o Sigmoid: σ(x) = 1/(1 + e^(-x))

o tanh(x) = (e^x - e^(-x))/(e^x + e^(-x))

o ReLU: f(x) = max(0,x)

2. Loss Functions

o MSE = (1/n)Σ(y_true - y_pred)²

o Cross-Entropy = -Σ(y_true * log(y_pred))

3. Regularization

o L1 = λΣ|w|

o L2 = λΣw²

4. Gradient Descent

o Update: w = w - α∇J(w)

o Momentum: v = βv - α∇J(w), then w = w + v

Common Issues and Solutions

1. Vanishing Gradients

o Use ReLU activation

o Implement batch normalization

o Try residual connections

2. Overfitting

o Add dropout

o Use regularization

o Implement early stopping

3. Poor Convergence

o Adjust learning rate

o Try different optimizers

o Check data normalization
