Deep-Learning-Assignment-01
● CNNs excel in tasks involving grid-like data (e.g., images). Convolutional layers
use filters to detect local patterns (edges, textures) by sliding over input regions,
preserving spatial relationships.
● Key Components:
○ Convolutional Layers: Extract hierarchical features (e.g., edges → shapes
→ objects).
○ Pooling Layers (Max/Average): Reduce spatial dimensions, improving
computational efficiency and translational invariance.
○ ReLU Activation: Introduces non-linearity after convolutions.
● Difference from Fully Connected Networks: CNNs exploit spatial locality,
drastically reducing parameters (weight sharing) compared to dense layers that
treat pixels as independent.
● Real-World Application: Beyond self-driving cars, CNNs are used in medical
imaging (e.g., detecting tumors in MRI scans).
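The components above can be combined in a few lines. The following is a minimal sketch, assuming PyTorch as the framework (the assignment does not prescribe one) and illustrative layer sizes for 28×28 grayscale images such as MNIST:

    import torch
    import torch.nn as nn

    # Minimal CNN sketch: conv -> ReLU -> pool twice, then a dense classifier.
    # The input shape (1x28x28) and number of classes (10) are illustrative assumptions.
    class SmallCNN(nn.Module):
        def __init__(self, num_classes=10):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(1, 16, kernel_size=3, padding=1),   # local filters detect edges/textures
                nn.ReLU(),
                nn.MaxPool2d(2),                              # 28x28 -> 14x14
                nn.Conv2d(16, 32, kernel_size=3, padding=1),  # higher-level shape features
                nn.ReLU(),
                nn.MaxPool2d(2),                              # 14x14 -> 7x7
            )
            self.classifier = nn.Linear(32 * 7 * 7, num_classes)

        def forward(self, x):
            x = self.features(x)
            return self.classifier(x.flatten(1))

    model = SmallCNN()
    logits = model(torch.randn(8, 1, 28, 28))  # batch of 8 dummy images -> shape (8, 10)

Note how the same small filters are shared across every spatial position, which is where the parameter savings over a fully connected layer come from.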
● RNNs process sequential data (text, time series) using loops to pass hidden
states across time steps, capturing temporal dependencies.
● Variants:
○ LSTM: Addresses vanishing gradients with gated mechanisms, retaining
long-term memory.
○ GRU: Simplified version of LSTM with fewer parameters.
● Difference from Fully Connected Networks: Unlike FC networks, RNNs handle
variable-length sequences (e.g., sentences) by updating hidden states iteratively.
● Real-World Application: Beyond speech recognition, RNNs power machine
translation (e.g., Google Translate).
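A minimal sketch of an LSTM-based sequence classifier, again assuming PyTorch; the input size, hidden size, and sequence lengths are illustrative assumptions. It shows how the same recurrent weights process sequences of different lengths by updating the hidden state step by step:

    import torch
    import torch.nn as nn

    # Minimal LSTM sketch: the hidden state is carried across time steps,
    # so one set of weights handles sequences of arbitrary length.
    class SmallLSTMClassifier(nn.Module):
        def __init__(self, input_size=50, hidden_size=64, num_classes=2):
            super().__init__()
            self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
            self.out = nn.Linear(hidden_size, num_classes)

        def forward(self, x):                 # x: (batch, seq_len, input_size)
            _, (h_n, _) = self.lstm(x)        # h_n: final hidden state
            return self.out(h_n[-1])          # classify from the last hidden state

    model = SmallLSTMClassifier()
    short_out = model(torch.randn(4, 10, 50))   # 10-step sequences
    long_out = model(torch.randn(4, 35, 50))    # 35-step sequences: same weights, no change needed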
● Formula: f(x) = max(0, x)
● Advantages:
○ Avoids vanishing gradients (non-saturating for x > 0).
○ Computationally cheap (no exponential operations).
● Limitations: "Dying ReLU" issue (neurons stuck at zero for negative inputs).
● Usage: Default choice in CNNs and deep networks.
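A small NumPy sketch of the definition and of why persistently negative inputs lead to dead neurons (values are illustrative):

    import numpy as np

    def relu(x):
        # f(x) = max(0, x), applied element-wise
        return np.maximum(0.0, x)

    x = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
    print(relu(x))             # [0. 0. 0. 1.5 3.]

    # The gradient is 1 for x > 0 and 0 for x < 0: a neuron whose pre-activations
    # stay negative receives zero gradient and stops learning ("dying ReLU").
    grad = (x > 0).astype(float)
    print(grad)                # [0. 0. 0. 1. 1.]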
● Formula: f(x) = tanh(x) = (e^x − e^(−x)) / (e^x + e^(−x))
● ReLU is simpler but risks dead neurons; Tanh avoids this but saturates. Leaky
ReLU (f(x) = max(0.01x, x)) is a common ReLU variant to prevent neuron death.
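This trade-off can be seen numerically in a short NumPy sketch (inputs chosen for illustration): Tanh saturates at the extremes, while Leaky ReLU keeps a small, nonzero response for negative inputs.

    import numpy as np

    def tanh(x):
        return np.tanh(x)                      # saturates toward -1/+1 for large |x|

    def leaky_relu(x, alpha=0.01):
        return np.where(x > 0, x, alpha * x)   # small slope keeps negative inputs "alive"

    x = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
    print(tanh(x))          # ≈ [-1, -0.76, 0, 0.76, 1]  -> gradients vanish at the ends
    print(leaky_relu(x))    # [-0.1, -0.01, 0, 1, 10]    -> nonzero output/gradient for x < 0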
1. Mean Squared Error (MSE):
● Formula: MSE = (1/n) Σᵢ₌₁ⁿ (yᵢ − ŷᵢ)²
● Usage: Regression tasks (e.g., predicting house prices).
● Why Suitable: Smooth and convex, enabling gradient-based optimization.
Penalizes large errors quadratically.
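A direct NumPy computation of the formula on made-up regression targets:

    import numpy as np

    y_true = np.array([3.0, -0.5, 2.0, 7.0])   # made-up ground-truth values
    y_pred = np.array([2.5,  0.0, 2.0, 8.0])   # made-up predictions

    mse = np.mean((y_true - y_pred) ** 2)       # (1/n) * sum of squared errors
    print(mse)                                  # 0.375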
2. Cross-Entropy Loss (Multi-Class):
● Formula:
● −Σᵢ₌₁ⁿ yᵢ log(ŷᵢ)
● Usage: Classification (e.g., MNIST digit recognition).
● Why Suitable: Aligns with softmax outputs, minimizing divergence between
predicted and true probability distributions.
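A short NumPy sketch pairing softmax outputs with the cross-entropy formula (the logits and the one-hot label are made up for illustration):

    import numpy as np

    def softmax(z):
        e = np.exp(z - z.max())        # shift by the max for numerical stability
        return e / e.sum()

    logits = np.array([2.0, 1.0, 0.1])     # made-up scores for 3 classes
    y_true = np.array([1.0, 0.0, 0.0])     # one-hot label: true class is class 0

    y_hat = softmax(logits)
    loss = -np.sum(y_true * np.log(y_hat))  # -sum_i y_i * log(y_hat_i)
    print(y_hat)   # ≈ [0.659, 0.242, 0.099]
    print(loss)    # ≈ 0.417 (low, since the true class already has the highest probability)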
Experiment Details: