0% found this document useful (0 votes)
5 views5 pages

Deep-Learning-Assignment-01

The document discusses various neural network architectures, focusing on Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs), detailing their components, differences from fully connected networks, and real-world applications. It also covers activation functions like ReLU and Tanh, highlighting their advantages and limitations, as well as loss functions such as Mean Squared Error and Cross-Entropy Loss. Additionally, it includes observations from an interactive practice session that emphasizes the importance of balancing model complexity and regularization.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
5 views5 pages

Deep-Learning-Assignment-01

The document discusses various neural network architectures, focusing on Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs), detailing their components, differences from fully connected networks, and real-world applications. It also covers activation functions like ReLU and Tanh, highlighting their advantages and limitations, as well as loss functions such as Mean Squared Error and Cross-Entropy Loss. Additionally, it includes observations from an interactive practice session that emphasizes the importance of balancing model complexity and regularization.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 5

Deep Learning Assignment 01

Question 1: Exploring Neural Network Architectures

1. Convolutional Neural Networks (CNNs):

●​ CNNs excel in tasks involving grid-like data (e.g., images). Convolutional layers
use filters to detect local patterns (edges, textures) by sliding over input regions,
preserving spatial relationships.
●​ Key Components:
○​ Convolutional Layers: Extract hierarchical features (e.g., edges → shapes
→ objects).
○​ Pooling Layers (Max/Average): Reduce spatial dimensions, improving
computational efficiency and translational invariance.
○​ ReLU Activation: Introduces non-linearity after convolutions.
●​ Difference from Fully Connected Networks: CNNs exploit spatial locality,
drastically reducing parameters (weight sharing) compared to dense layers that
treat pixels as independent.
●​ Real-World Application: Beyond self-driving cars, CNNs are used in medical
imaging (e.g., detecting tumors in MRI scans).

2. Recurrent Neural Networks (RNNs):

●​ RNNs process sequential data (text, time series) using loops to pass hidden
states across time steps, capturing temporal dependencies.
●​ Variants:
○​ LSTM: Addresses vanishing gradients with gated mechanisms, retaining
long-term memory.
○​ GRU: Simplified version of LSTM with fewer parameters.
●​ Difference from Fully Connected Networks: Unlike FC networks, RNNs handle
variable-length sequences (e.g., sentences) by updating hidden states iteratively.
●​ Real-World Application: Beyond speech recognition, RNNs power machine
translation (e.g., Google Translate).

Question 2: Beyond Sigmoid: Activation Functions


1. Rectified Linear Unit (ReLU):

●​ Formula:
●​ f(x)=max⁡(0,x)
●​ f(x)=max(0,x)
●​ Advantages:
○​ Avoids vanishing gradients (non-saturating for
○​ x>0
○​ x>0).
○​ Computationally cheap (no exponential operations).
●​ Limitations: "Dying ReLU" issue (neurons stuck at zero for negative inputs).
●​ Usage: Default choice in CNNs and deep networks.

2. Hyperbolic Tangent (Tanh):

●​ Formula:
●​ f(x)=ex−e−xex+e−x
●​ f(x)=
●​ e
●​ x

●​ +e
●​ −x

●​ e
●​ x

●​ −e
●​ −x

●​ ​

●​ (outputs between -1 and 1).


●​ Advantages:
○​ Zero-centered outputs aid faster convergence.
○​ Mitigates vanishing gradients better than Sigmoid.
●​ Limitations: Saturates for extreme inputs (gradients near zero).
●​ Usage: Preferred in RNNs for balanced gradient flow.
Comparison:

●​ ReLU is simpler but risks dead neurons; Tanh avoids this but saturates. Leaky
ReLU (
●​ f(x)=max⁡(0.01x,x)
●​ f(x)=max(0.01x,x)) is a common ReLU variant to prevent neuron death.

Question 3: Exploring Loss Functions

1. Mean Squared Error (MSE):

●​ Formula:
●​ MSE=1n∑i=1n(yi−y^i)2
●​ MSE=
●​ n
●​ 1
●​ ​

●​ ∑
●​ i=1
●​ n
●​ ​

●​ (y
●​ i
●​ ​

●​ −
●​ y
●​ ^
●​ ​

●​ i
●​ ​

●​ )
●​ 2
●​ Usage: Regression tasks (e.g., predicting house prices).
●​ Why Suitable: Smooth and convex, enabling gradient-based optimization.
Penalizes large errors quadratically.
2. Cross-Entropy Loss (Multi-Class):

●​ Formula:
●​ −∑i=1nyilog⁡(y^i)
●​ −∑
●​ i=1
●​ n
●​ ​

●​ y
●​ i
●​ ​

●​ log(
●​ y
●​ ^
●​ ​

●​ i
●​ ​

●​ )
●​ Usage: Classification (e.g., MNIST digit recognition).
●​ Why Suitable: Aligns with softmax outputs, minimizing divergence between
predicted and true probability distributions.

Bonus Activity: Interactive Practice

Experiment Details:

●​ Dataset: Tested on TensorFlow Playground’s "Spiral" dataset.


●​ Observations:
1.​ With 4 hidden layers (5 neurons each), accuracy improved from 72% to
89%, but training took 2x longer.
2.​ ReLU achieved 85% accuracy in 200 epochs vs. Sigmoid’s 60% (gradients
vanished early).
3.​ Overfitting occurred with 8 neurons/layer (99% train vs. 75% test).
Reduced neurons to 3/layer and added L2 regularization, improving test
accuracy to 82%.
Conclusion: Balancing model complexity and regularization is critical. ReLU’s efficiency

makes it ideal for deeper networks

You might also like