Deep Learning Module-02
• Information moves in only one direction: forward, from input nodes through hidden nodes to output nodes
1. Origins
o Inspired by biological neural networks
2. Evolution
1.3 Network Architecture
1. Input Layer
o No computation performed
2. Hidden Layers
3. Output Layer
o Regression: usually one neuron
1.4 Activation Functions
1. Sigmoid (Logistic)
o Formula: f(x) = 1 / (1 + e^(-x))
o Range: [0, 1]
o Used in binary classification
o Properties:
▪ Smooth gradient
2. Tanh (Hyperbolic Tangent)
o Formula: f(x) = (e^x - e^(-x)) / (e^x + e^(-x))
o Range: [-1, 1]
o Properties:
▪ Zero-centered
▪ Stronger gradients than sigmoid
3. ReLU (Rectified Linear Unit)
o Formula: f(x) = max(0, x)
o Helps solve the vanishing gradient problem
o Properties:
▪ Computationally efficient
4. Leaky ReLU
o Addresses the dying ReLU problem
o Formula: f(x) = max(0.01x, x)
o Properties:
▪ Allows a small, non-zero gradient for negative inputs
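The four activations above can be written as one-line NumPy functions. This is a minimal sketch for reference; the function names and the alpha parameter are choices made here, not taken from the notes.
python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))    # squashes inputs into (0, 1)

def tanh(x):
    return np.tanh(x)                  # zero-centered, range (-1, 1)

def relu(x):
    return np.maximum(0.0, x)          # f(x) = max(0, x)

def leaky_relu(x, alpha=0.01):
    return np.maximum(alpha * x, x)    # f(x) = max(0.01x, x) when alpha = 0.01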
2. Gradient-Based Learning
1. Definition
2. Properties
o Magnitude indicates steepness
o Direction points toward the steepest ascent
Loss Functions
1. Mean Squared Error (MSE)
o Formula: L = (1/n)Σ(y - ŷ)²
o Used for regression problems
o Properties:
▪ Always positive
▪ Differentiable
2. Cross-Entropy Loss
o Formula: L = -Σ y log(ŷ)
o Used for classification problems
o Properties:
▪ Heavily penalizes confident wrong predictions
3. Huber Loss
o Formula:
▪ L = 0.5(y - f(x))² if |y - f(x)| ≤ δ
▪ L = δ|y - f(x)| - 0.5δ² otherwise
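A minimal NumPy sketch of the three losses above, assuming y and the prediction y_hat are arrays of matching shape (one-hot rows for cross-entropy); the names and default values are illustrative only.
python
import numpy as np

def mse(y, y_hat):
    # Mean squared error: average of (y - f(x))^2
    return np.mean((y - y_hat) ** 2)

def cross_entropy(y, y_hat, eps=1e-12):
    # Categorical cross-entropy over one-hot targets; eps avoids log(0)
    return -np.mean(np.sum(y * np.log(y_hat + eps), axis=1))

def huber(y, y_hat, delta=1.0):
    # Quadratic for small errors, linear for large ones
    err = np.abs(y - y_hat)
    quad = 0.5 * err ** 2
    lin = delta * err - 0.5 * delta ** 2
    return np.mean(np.where(err <= delta, quad, lin))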
Gradient Descent Variants
a) Batch Gradient Descent
o Uses the entire dataset for each update
o Formula: θ = θ - α∇J(θ)
Optimization Algorithms
a) Adam
o Formula includes first and second moments of the gradient
b) RMSprop
o Scales each update by a running average of squared gradients
c) Momentum
o Adds a fraction of the previous update to the current one
o Helps escape local minima
o Reduces oscillation
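The update rules above can be sketched as small NumPy helpers; the function signatures, state variables, and default hyperparameters below are assumptions made for illustration, not part of the notes.
python
import numpy as np

def gd_step(w, grad, lr=0.01):
    # Plain gradient descent: θ = θ - α∇J(θ)
    return w - lr * grad

def momentum_step(w, v, grad, lr=0.01, beta=0.9):
    # v = βv - α∇J(θ); θ = θ + v
    v = beta * v - lr * grad
    return w + v, v

def rmsprop_step(w, s, grad, lr=0.01, rho=0.9, eps=1e-8):
    # Divide the step by a running average of squared gradients
    s = rho * s + (1 - rho) * grad ** 2
    return w - lr * grad / (np.sqrt(s) + eps), s

def adam_step(w, m, s, grad, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    # First moment m and second moment s, with bias correction (t is the 1-based step)
    m = b1 * m + (1 - b1) * grad
    s = b2 * s + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)
    s_hat = s / (1 - b2 ** t)
    return w - lr * m_hat / (np.sqrt(s_hat) + eps), m, s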
3. Forward and Backward Propagation
1. Mathematical Basis
3.1 Forward Propagation
1. Input Processing
o Data normalization
o Weight initialization
o Bias addition
2. Layer Computation
python
Z = np.dot(W, A) + b  # Linear transformation: weights times previous activations, plus bias
3. Output Generation
o Prediction computation
o Error calculation
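A minimal forward pass for a two-layer network, under assumed conventions (columns are examples, ReLU hidden layer, sigmoid output); the shapes and initial values are made up for illustration.
python
import numpy as np

def forward(X, W1, b1, W2, b2):
    Z1 = np.dot(W1, X) + b1            # hidden layer linear step
    A1 = np.maximum(0.0, Z1)           # ReLU activation
    Z2 = np.dot(W2, A1) + b2           # output layer linear step
    A2 = 1.0 / (1.0 + np.exp(-Z2))     # sigmoid prediction
    return Z1, A1, Z2, A2

# Illustrative shapes: 4 features, 8 hidden units, 1 output, 32 examples
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 32))
W1, b1 = 0.1 * rng.standard_normal((8, 4)), np.zeros((8, 1))
W2, b2 = 0.1 * rng.standard_normal((1, 8)), np.zeros((1, 1))
Z1, A1, Z2, A2 = forward(X, W1, b1, W2, b2)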
3.2 Backward Propagation
1. Error Calculation
2. Weight Updates
o Move weights opposite to the gradient (w = w - α·dW)
o Update biases similarly
3. Detailed Steps
python
# Output layer: dZ = A - Y (holds for a sigmoid/softmax output with cross-entropy,
# and for a linear output with the ½·MSE convention)
dZ = A - Y
dW = (1.0 / m) * np.dot(dZ, A_prev.T)
db = (1.0 / m) * np.sum(dZ, axis=1, keepdims=True)

# Hidden layers: push the incoming gradient dA back through the activation
dZ = dA * activation_derivative(Z)
dW = (1.0 / m) * np.dot(dZ, A_prev.T)
db = (1.0 / m) * np.sum(dZ, axis=1, keepdims=True)
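Combined with the forward pass sketched earlier, the backward pass for that hypothetical two-layer network (ReLU hidden layer, sigmoid output, cross-entropy loss, so dZ2 = A2 - Y) could be written as:
python
import numpy as np

def backward(X, Y, Z1, A1, A2, W2):
    m = X.shape[1]
    dZ2 = A2 - Y                                          # output-layer error
    dW2 = (1.0 / m) * np.dot(dZ2, A1.T)
    db2 = (1.0 / m) * np.sum(dZ2, axis=1, keepdims=True)
    dA1 = np.dot(W2.T, dZ2)                               # propagate error backwards
    dZ1 = dA1 * (Z1 > 0)                                  # ReLU derivative
    dW1 = (1.0 / m) * np.dot(dZ1, X.T)
    db1 = (1.0 / m) * np.sum(dZ1, axis=1, keepdims=True)
    return dW1, db1, dW2, db2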
4. Regularization Techniques
4.1 L1 Regularization
1. Mathematical Form
o Formula: L1 = λΣ|w|
o Promotes sparsity
2. Properties
o Feature selection capability
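A short sketch of how the L1 term and its subgradient enter training (the variable names and the lam parameter are assumptions made here):
python
import numpy as np

def l1_penalty(weights, lam):
    # L1 = λ Σ|w|, summed over every weight matrix in the network
    return lam * sum(np.sum(np.abs(W)) for W in weights)

def l1_grad(W, lam):
    # Subgradient λ·sign(w); it keeps pushing small weights to exactly zero,
    # which is what produces sparsity and implicit feature selection
    return lam * np.sign(W)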
4.2 L2 Regularization
1. Mathematical Form
o Adds squared weights to the loss
o Formula: L2 = λΣw²
2. Properties
o Shrinks weights toward zero (weight decay)
o No sparse solutions
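A matching sketch for L2 (again with assumed names), showing why it behaves as weight decay rather than producing sparse weights:
python
import numpy as np

def l2_penalty(weights, lam):
    # L2 = λ Σw², summed over every weight matrix in the network
    return lam * sum(np.sum(W ** 2) for W in weights)

def l2_update(W, dW, lr, lam):
    # Gradient of λΣw² is 2λw, so every step shrinks the weights toward zero
    return W - lr * (dW + 2.0 * lam * W)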
4.3 Dropout
1. Basic Concept
2. Implementation Details
python
# p is the keep probability; each unit is kept with probability p and dropped otherwise
mask = np.random.binomial(1, p, size=layer_size)
A = A * mask
o Used only during training
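A self-contained variant using inverted dropout; the division by the keep probability is an addition here (not stated in the notes) that keeps expected activations unchanged at test time.
python
import numpy as np

def dropout_forward(A, keep_prob=0.8, training=True):
    if not training:
        return A                            # dropout is a no-op at test time
    mask = np.random.rand(*A.shape) < keep_prob
    return A * mask / keep_prob             # rescale the surviving activations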
1. Implementation
2. Benefits
o Prevents overfitting
5. Advanced Concepts
5.1 Batch Normalization
1. Purpose
o Speeds up training
2. Algorithm
python
# Pseudo-code for batch normalization
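# The body below is an illustrative completion added to this pseudo-code, assuming
# numpy is imported as np and that gamma, beta, eps are the usual learnable
# scale/shift parameters and a small stabilising constant.
def batch_norm_forward(Z, gamma, beta, eps=1e-5):
    mu = Z.mean(axis=0)                      # per-feature mean over the mini-batch
    var = Z.var(axis=0)                      # per-feature variance over the mini-batch
    Z_norm = (Z - mu) / np.sqrt(var + eps)   # normalize to zero mean, unit variance
    return gamma * Z_norm + beta             # learnable scale and shift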
5.2 Weight Initialization
1. Xavier/Glorot Initialization
2. He Initialization
o Variance = 2/n_in
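Both schemes can be sketched as normal-distribution initializers (the shapes and the random generator are illustrative assumptions):
python
import numpy as np

rng = np.random.default_rng(0)

def xavier_init(n_in, n_out):
    # Glorot/Xavier: variance 2 / (n_in + n_out)
    return rng.normal(0.0, np.sqrt(2.0 / (n_in + n_out)), size=(n_out, n_in))

def he_init(n_in, n_out):
    # He: variance 2 / n_in, suited to ReLU layers
    return rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_out, n_in))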
6. Practical Implementation
1. Architecture Choices
o Number of layers
o Activation functions
2. Hyperparameter Selection
o Learning rate
o Batch size
o Regularization strength
1. Data Preparation
o Splitting data
o Normalization
o Augmentation
2. Training Loop
o Forward pass
o Loss computation
o Backward pass
o Parameter updates
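A tiny end-to-end example of that loop on made-up data (a single logistic-regression layer is used here just to keep the sketch complete and runnable):
python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((2, 200))                  # 2 features, 200 examples
Y = (X[0:1] + X[1:2] > 0).astype(float)            # synthetic labels
W, b, lr = np.zeros((1, 2)), np.zeros((1, 1)), 0.1

for epoch in range(100):
    A = 1.0 / (1.0 + np.exp(-(np.dot(W, X) + b)))                             # forward pass
    loss = -np.mean(Y * np.log(A + 1e-12) + (1 - Y) * np.log(1 - A + 1e-12))  # loss computation
    dZ = A - Y                                                                # backward pass
    dW = np.dot(dZ, X.T) / X.shape[1]
    db = np.sum(dZ, axis=1, keepdims=True) / X.shape[1]
    W, b = W - lr * dW, b - lr * db                                           # parameter updates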
Practice Questions
1. Basic Concepts
2. Mathematical Problems
o Calculate gradients for a simple 2-layer network
3. Implementation Challenges
o Design a network for MNIST classification
Key Formulas Summary
1. Activation Functions
o Sigmoid: f(x) = 1 / (1 + e^(-x))
o ReLU: f(x) = max(0, x)
2. Loss Functions
o MSE: L = (1/n)Σ(y - ŷ)²
o Cross-Entropy: L = -Σ y log(ŷ)
3. Regularization
o L1 = λΣ|w|
o L2 = λΣw²
4. Gradient Descent
o Update: w = w - α∇J(w)
o Momentum: v = βv - α∇J(w), then w = w + v
Common Issues and Solutions
1. Vanishing Gradients
o Try residual connections
2. Overfitting
o Add dropout
o Use regularization
3. Poor Convergence