Gen AI
Malaika Hussain
March 2025
Figure 1: Visualization of the generated dataset
Figure 3: Trained perceptron decision boundary with test data
2 Question 2: Implementing Convolution from Scratch
2.1 Objective
The purpose of this assignment is to develop a deeper understanding of convolution operations by manually implementing them without using built-in deep learning libraries. This will help analyze how different kernels influence image processing.
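As an illustration of the kind of manual implementation described above, a minimal NumPy sketch of a 2D convolution with stride and zero padding could look as follows; the function name conv2d and the NumPy-based approach are assumptions for illustration, not the report's actual code.

import numpy as np

def conv2d(image, kernel, stride=1, padding=0):
    # Convolution flips the kernel before sliding it over the image.
    k = np.flipud(np.fliplr(kernel))
    if padding > 0:
        image = np.pad(image, padding, mode="constant")
    h, w = image.shape
    kh, kw = k.shape
    out_h = (h - kh) // stride + 1
    out_w = (w - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Multiply the flipped kernel with the current patch and sum the result.
            patch = image[i * stride:i * stride + kh, j * stride:j * stride + kw]
            out[i, j] = np.sum(patch * k)
    return out

# Example: a 3x3 averaging kernel applied to a random 6x6 image gives a 4x4 output.
print(conv2d(np.random.rand(6, 6), np.ones((3, 3)) / 9.0).shape)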
2.5 Comparison of Convolution vs. Correlation
To compare convolution and correlation, two different kernels were used:
• A symmetric 2×2 averaging kernel:
[ 0.25  0.25 ]
[ 0.25  0.25 ]
• A non-symmetric 3×3 kernel (horizontal Sobel):
[  1   2   1 ]
[  0   0   0 ]
[ -1  -2  -1 ]
The results were analyzed to observe how convolution and correlation yield different outcomes.
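One way such a comparison can be run is sketched below, using SciPy's reference routines in place of the manual implementation (an assumption made for brevity). For the symmetric kernel the two operations give identical results; for the Sobel-style kernel they do not.

import numpy as np
from scipy.signal import convolve2d, correlate2d

# The two kernels used in the comparison.
avg_kernel = np.array([[0.25, 0.25],
                       [0.25, 0.25]])         # symmetric averaging kernel
sobel_kernel = np.array([[ 1,  2,  1],
                         [ 0,  0,  0],
                         [-1, -2, -1]])       # non-symmetric horizontal Sobel kernel

image = np.random.rand(8, 8)                  # stand-in for the test image

for name, k in [("symmetric", avg_kernel), ("non-symmetric", sobel_kernel)]:
    conv = convolve2d(image, k, mode="valid")    # kernel is flipped internally
    corr = correlate2d(image, k, mode="valid")   # kernel is applied as-is
    print(f"{name}: max |conv - corr| = {np.abs(conv - corr).max():.4f}")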
2.10 Observations on Convolution vs. Correlation
• Convolution flips the kernel before applying it, while correlation does not (see the definitions after this list).
• The difference in results is more pronounced when using asymmetric kernels.
• Edge detection kernels, such as Sobel, yield different results when used in
convolution versus correlation.
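To make the first observation precise, the standard definitions (textbook formulas, not quoted from the report) are

\[
(I * K)(x, y) = \sum_{s}\sum_{t} I(x - s,\; y - t)\, K(s, t) \qquad \text{(convolution)}
\]
\[
(I \star K)(x, y) = \sum_{s}\sum_{t} I(x + s,\; y + t)\, K(s, t) \qquad \text{(correlation)}
\]

The two coincide exactly when K(s, t) = K(-s, -t), i.e. when the kernel is unchanged by a 180-degree flip, which is why only asymmetric kernels produce visibly different results.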
2.12 Conclusion
Through manual implementation, we have gained insights into how convolution
operations function and how different kernel types affect image processing. The
analysis of stride, padding, and kernel size provided a better understanding of
convolution’s impact on image transformation.
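For reference, the standard relation between these quantities (a textbook formula, not stated in the report) is

\[
\text{output size} = \left\lfloor \frac{N - K + 2P}{S} \right\rfloor + 1,
\]

where N is the input height or width, K the kernel size, P the padding, and S the stride. For example, a 32×32 input convolved with a 3×3 kernel, padding 1, and stride 1 keeps its 32×32 size.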
Figure 5: Original image before processing.
Figure 6: Converted grayscale image.
Figure 8: Edge detection using the Sobel filter.
Figure 11: Demonstration of different kernel applications.
3 Question 3: Implementation of a Convolutional Neural Network for CIFAR-10
3.1 Introduction
The objective of this report is to implement a Convolutional Neural Network (CNN) for image classification using the CIFAR-10 dataset. The study includes dataset preparation, CNN model implementation, evaluation, data augmentation, and an ablation study to analyze hyperparameter effects.
3.2 Dataset
The CIFAR-10 dataset consists of 60,000 32x32 RGB images belonging to 10
classes. The dataset is split into 50,000 training images and 10,000 testing
images.
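A minimal sketch of loading and normalizing this dataset, assuming a Keras/TensorFlow workflow (the report does not state which framework was used):

import tensorflow as tf

# Load CIFAR-10: 50,000 training and 10,000 test images of size 32x32x3.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()

# Scale pixel values to [0, 1] and one-hot encode the 10 class labels.
x_train, x_test = x_train / 255.0, x_test / 255.0
y_train = tf.keras.utils.to_categorical(y_train, 10)
y_test = tf.keras.utils.to_categorical(y_test, 10)

print(x_train.shape, x_test.shape)   # (50000, 32, 32, 3) (10000, 32, 32, 3)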
3.7 Pooling Layers
Pooling layers are introduced after groups of convolutional layers to reduce the
spatial dimensions of feature maps. Max pooling is employed to retain the most
significant features while reducing computational complexity. This helps make
the model more robust to small translations and distortions in images.
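A sketch of a convolution/pooling block of the kind described above, written with Keras; the filter counts and layer arrangement are illustrative assumptions rather than the report's exact architecture.

from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(32, 32, 3)),
    # First group of convolutional layers followed by max pooling.
    layers.Conv2D(32, (3, 3), activation="relu", padding="same"),
    layers.Conv2D(32, (3, 3), activation="relu", padding="same"),
    layers.MaxPooling2D((2, 2)),   # halves the spatial dimensions: 32x32 -> 16x16
    # Second group with more filters, again followed by max pooling.
    layers.Conv2D(64, (3, 3), activation="relu", padding="same"),
    layers.Conv2D(64, (3, 3), activation="relu", padding="same"),
    layers.MaxPooling2D((2, 2)),   # 16x16 -> 8x8
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
model.summary()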
The trained models are evaluated using the following metrics (a sketch of how they can be computed follows this list):
• Accuracy
• Precision
• Recall
• F1-Score
• Confusion Matrix
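As noted above, these metrics can be computed with scikit-learn along the lines of the sketch below; the random arrays are placeholders for the model's actual test-set predictions.

import numpy as np
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix)

# Placeholder labels and predictions; in practice y_pred comes from
# np.argmax(model.predict(x_test), axis=1) on the CIFAR-10 test set.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 10, size=1000)
y_pred = rng.integers(0, 10, size=1000)

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred, average="macro", zero_division=0))
print("Recall   :", recall_score(y_true, y_pred, average="macro", zero_division=0))
print("F1-Score :", f1_score(y_true, y_pred, average="macro", zero_division=0))
print(confusion_matrix(y_true, y_pred))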
3.13 Comparison of Models
Two models were trained and evaluated:
• Model without Data Augmentation
• Model with Data Augmentation (flipping, rotation, shifting; a sketch of the augmentation setup follows below)
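A sketch of the augmentation setup referenced above, assuming Keras' ImageDataGenerator; the specific parameter values are illustrative assumptions.

from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Augmentation matching the listed transformations:
# horizontal flipping, small rotations, and width/height shifts.
datagen = ImageDataGenerator(
    horizontal_flip=True,
    rotation_range=15,        # rotate by up to 15 degrees
    width_shift_range=0.1,    # shift horizontally by up to 10% of the width
    height_shift_range=0.1,   # shift vertically by up to 10% of the height
)

# The augmented model is then trained on batches generated on the fly, e.g.
# model.fit(datagen.flow(x_train, y_train, batch_size=64), epochs=20,
#           validation_data=(x_test, y_test))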
3.18 Conclusion
This report presented the implementation of a CNN for CIFAR-10 classification.
The effect of data augmentation and hyperparameters on model performance
was analyzed. Results show that data augmentation improves generalization,
while hyperparameters significantly impact accuracy.
3.19 Images and Results
Figure 13: Confusion Matrix with Augmentation
4 Question 4: Implementation of a Vanilla RNN for Next-Word Prediction
4.1 Introduction
The objective of this project is to implement a Vanilla Recurrent Neural Network
(RNN) trained on a Shakespeare text dataset for next-word prediction. Instead
of using pre-trained embeddings (e.g., Word2Vec or GloVe), the model will learn
its own embeddings using a trainable embedding layer.
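A minimal sketch of such a model in Keras; the vocabulary size, sequence length, and layer widths below are assumptions, and the report's actual architecture may differ.

from tensorflow.keras import layers, models

vocab_size = 10000   # assumed vocabulary size
seq_length = 20      # assumed number of input tokens per training example

model = models.Sequential([
    layers.Input(shape=(seq_length,), dtype="int32"),
    # Trainable embedding layer learned from scratch (no Word2Vec/GloVe).
    layers.Embedding(input_dim=vocab_size, output_dim=128),
    # Vanilla (simple) recurrent layer.
    layers.SimpleRNN(256),
    # Probability distribution over the next word.
    layers.Dense(vocab_size, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()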
4.2 Dataset
The dataset used for training is the publicly available Shakespeare text dataset
from Hugging Face. The dataset consists of a large corpus of Shakespeare’s
works, and preprocessing involves tokenization and vocabulary creation.
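A sketch of this preprocessing step; the file name shakespeare.txt is a placeholder (the exact Hugging Face dataset identifier is not given in the report), and the simple word-level tokenization is an assumption.

import re
from collections import Counter

# "shakespeare.txt" is a placeholder for the downloaded corpus.
with open("shakespeare.txt", encoding="utf-8") as f:
    text = f.read().lower()

# Simple word-level tokenization.
tokens = re.findall(r"[a-z']+", text)

# Build the vocabulary from the most frequent words; index 0 is the unknown token.
vocab_size = 10000
counts = Counter(tokens)
vocab = ["<unk>"] + [w for w, _ in counts.most_common(vocab_size - 1)]
word_to_id = {w: i for i, w in enumerate(vocab)}
ids = [word_to_id.get(w, 0) for w in tokens]

print(f"{len(tokens)} tokens, vocabulary size {len(vocab)}")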
4.7 Evaluation and Performance Metrics
4.8 Generated Text Sequences
To test the model, a seed phrase such as “To be or not to” is provided, and the
model generates the next words iteratively.
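A sketch of this iterative generation loop, assuming the model and vocabulary from the sketches above and simple greedy decoding (sampling from the predicted distribution would work equally well).

import numpy as np

def generate(model, word_to_id, id_to_word, seed, n_words=10, seq_length=20):
    # Repeatedly predict the next word and append it to the running context.
    words = seed.lower().split()
    for _ in range(n_words):
        ids = [word_to_id.get(w, 0) for w in words][-seq_length:]
        ids = [0] * (seq_length - len(ids)) + ids          # left-pad short contexts
        probs = model.predict(np.array([ids]), verbose=0)[0]
        next_id = int(np.argmax(probs))                    # greedy decoding
        words.append(id_to_word[next_id])
    return " ".join(words)

# Example usage with the RNN and vocabulary built earlier:
# id_to_word = {i: w for w, i in word_to_id.items()}
# print(generate(model, word_to_id, id_to_word, "To be or not to"))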
4.12 Conclusion
This report presents the implementation of a Vanilla RNN for next-word prediction using a Shakespeare text dataset. The model successfully learns word embeddings and generates text sequences. An ablation study is conducted to compare learned embeddings with random embeddings, showing the effect of word representation on model performance. The test results indicate that the model struggles with long-term dependencies, leading to high perplexity and low accuracy.
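For reference, the perplexity mentioned here is the standard exponentiated mean negative log-likelihood of the next-word predictions (a textbook definition, not quoted from the report):

\[
\mathrm{PPL} = \exp\!\left( -\frac{1}{N} \sum_{i=1}^{N} \log p(w_i \mid w_1, \ldots, w_{i-1}) \right)
\]

A model that spreads probability over many candidate words, as happens when long-range context is lost, therefore shows a high perplexity.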
Figure 15: Pretrained Embeddings Confusion matrix.
5 Question 5: Hyperparameter Search for CNN and RNN
5.1 Introduction
Hyperparameter tuning is crucial for optimizing deep learning models. In this
task, we perform a random search over a predefined set of hyperparameters for
both Convolutional Neural Networks (CNNs) and Recurrent Neural Networks
(RNNs). The best configurations are selected based on validation accuracy, and
the models are evaluated on the test dataset.
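A minimal sketch of the random-search loop described here; the search-space values and the train_and_evaluate stub are illustrative assumptions, not the assignment's exact setup.

import random

random.seed(0)

# Illustrative search space; the assignment's exact ranges may differ.
search_space = {
    "learning_rate": [1e-2, 1e-3, 1e-4],
    "batch_size": [32, 64, 128],
    "dropout": [0.3, 0.5],
    "optimizer": ["adam", "sgd"],
    "activation": ["relu", "tanh", "sigmoid"],
}

def train_and_evaluate(config):
    # Stand-in for the real training loop: build the CNN or RNN from `config`,
    # train it, and return the validation accuracy.
    return random.random()

best_acc, best_config = 0.0, None
for trial in range(20):
    # Draw one random configuration from the search space.
    config = {name: random.choice(values) for name, values in search_space.items()}
    val_acc = train_and_evaluate(config)
    if val_acc > best_acc:
        best_acc, best_config = val_acc, config

print("Best validation accuracy:", round(best_acc, 4))
print("Best configuration:", best_config)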
The best hyperparameter configuration found by the random search is summarized below:

Hyperparameter            Best Value
Weight Initialization     Glorot Uniform
Stride                    2
Optimizer                 SGD
Number of Units (RNN)     128
Number of Layers          2
Number of Filters (CNN)   128
Learning Rate             0.0001
Kernel Size (CNN)         3
Dropout                   0.5
Batch Size                32
Activation Function       Sigmoid
Test-set classification report:

Class          Precision   Recall   F1-Score   Support
0              0.85        0.91     0.88       310
1              0.96        0.93     0.94       690
Accuracy                            0.93
Macro Avg      0.91        0.92     0.91       1000
Weighted Avg   0.93        0.93     0.93       1000
5.8 Conclusion
The hyperparameter search yielded optimal configurations for both CNN and
RNN models. The RNN model achieved slightly better test accuracy (92.70%)
than the CNN model (92.50%). Both models demonstrated high classification performance, confirming the effectiveness of hyperparameter tuning.