Assignment-6 STC-DL

The document discusses the architecture and components of Convolutional Neural Networks (CNNs), including convolutional, pooling, and fully connected layers, and their roles in image processing. It emphasizes the importance of data preprocessing techniques like normalization and data augmentation for improving model performance, as well as how to utilize AWS services like Amazon S3 and AWS SageMaker for efficient data handling and preprocessing. Additionally, it outlines various CNN models and their applications, advantages, and disadvantages.

Uploaded by

shirisha edikoju
Copyright
© All Rights Reserved

DEEP LEARNING

ASSIGNMENT-6
1Q) Explain the architecture of a simple Convolutional Neural Network (CNN) model.
Describe the roles of the convolutional layer, pooling layer, and fully connected layer in the
model. Additionally, discuss how you would use AWS SageMaker to build, train, and deploy
this model.
Ans.: Convolutional Neural Networks (CNNs) are a specialized class of neural networks
designed to process grid-like data, such as images. They are particularly well-suited for image
recognition and processing tasks.
Inspired by the visual processing mechanisms in the human brain, CNNs excel at
capturing hierarchical patterns and spatial dependencies within images.
Key Components of a Convolutional Neural Network
1. Convolutional Layers: These layers apply convolutional operations to input images, using
filters (also known as kernels) to detect features such as edges, textures, and more complex
patterns. Convolutional operations help preserve the spatial relationships between pixels.
2. Pooling Layers: They downsample the spatial dimensions of the input, reducing the
computational complexity and the number of parameters in the network. Max pooling is a
common pooling operation, selecting the maximum value from a group of neighboring
pixels.
3. Activation Functions: They introduce non-linearity to the model, allowing it to learn more
complex relationships in the data.
4. Fully Connected Layers: These layers are responsible for making predictions based on the
high-level features learned by the previous layers. They connect every neuron in one layer
to every neuron in the next layer.
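To make the first three components concrete, here is a minimal pure-Python sketch of a convolution, a ReLU activation, and 2x2 max pooling on a toy image. The image, the hand-picked edge kernel, and the helper names are assumptions made for illustration; in a real CNN the kernel weights are learned and a framework such as PyTorch does this work.

```python
def conv2d(image, kernel):
    """Valid (no padding), stride-1 2D cross-correlation, as used in CNN layers."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(fmap):
    """Element-wise ReLU: negative responses are clipped to zero."""
    return [[max(0, v) for v in row] for row in fmap]


def max_pool2x2(fmap):
    """2x2 max pooling with stride 2 (assumes even height and width)."""
    return [
        [max(fmap[r][c], fmap[r][c + 1], fmap[r + 1][c], fmap[r + 1][c + 1])
         for c in range(0, len(fmap[0]), 2)]
        for r in range(0, len(fmap), 2)
    ]


# Toy 5x5 image: dark on the left (0), bright on the right (1)
image = [[0, 0, 1, 1, 1] for _ in range(5)]
# Hand-picked vertical-edge kernel (hypothetical; real kernels are learned)
kernel = [[-1, 1], [-1, 1]]

features = relu(conv2d(image, kernel))  # 4x4 feature map
pooled = max_pool2x2(features)          # 2x2 map after pooling
```

The feature map responds only in the column where the dark-to-bright edge sits, and pooling halves each spatial dimension while keeping that response.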
How CNNs Work
1. Input Image: The CNN receives an input image, which is typically preprocessed to ensure
uniformity in size and format.
2. Convolutional Layers: Filters are applied to the input image to extract features like edges,
textures, and shapes.
3. Pooling Layers: The feature maps generated by the convolutional layers are downsampled
to reduce dimensionality.
4. Fully Connected Layers: The downsampled feature maps are passed through fully
connected layers to produce the final output, such as a classification label.
5. Output: The CNN outputs a prediction, such as the class of the image.
Convolutional Neural Network Training
CNNs are trained using a supervised learning approach. This means that the CNN is given a set
of labeled training images. The CNN then learns to map the input images to their correct
labels.
The training process for a CNN involves the following steps:
1. Data Preparation: The training images are preprocessed to ensure that they are all in the
same format and size.
2. Loss Function: A loss function measures how far the CNN's predictions are from the
actual labels of the training images. For classification, a common choice is the
cross-entropy between the predicted class probabilities and the actual labels.
3. Optimizer: An optimizer is used to update the weights of the CNN in order to minimize
the loss function.
4. Backpropagation: Backpropagation is a technique used to calculate the gradients of the
loss function with respect to the weights of the CNN. The gradients are then used to update
the weights of the CNN using the optimizer.
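The loss, backpropagation, and optimizer steps can be sketched on a model small enough to differentiate by hand. The example below is an illustration only: it trains a one-weight logistic classifier rather than a CNN, and the toy data, learning rate, and function names are assumptions.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def bce_loss(w, b, data):
    """Mean binary cross-entropy of the model sigmoid(w*x + b) on (x, y) pairs."""
    total = 0.0
    for x, y in data:
        p = sigmoid(w * x + b)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(data)


def gradients(w, b, data):
    """Backpropagation by hand: for sigmoid + cross-entropy, dLoss/dlogit = p - y."""
    gw = sum((sigmoid(w * x + b) - y) * x for x, y in data) / len(data)
    gb = sum((sigmoid(w * x + b) - y) for x, y in data) / len(data)
    return gw, gb


# Toy labeled data: negative inputs are class 0, positive inputs are class 1
data = [(-2.0, 0), (-1.0, 0), (1.0, 1), (2.0, 1)]
w, b, lr = 0.0, 0.0, 0.5

losses = []
for step in range(20):
    losses.append(bce_loss(w, b, data))   # loss function
    gw, gb = gradients(w, b, data)        # backpropagation
    w, b = w - lr * gw, b - lr * gb       # optimizer step (plain gradient descent)
```

The recorded losses shrink from step to step, which is the behavior the training loop of a CNN aims for on a much larger set of weights.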
CNN Evaluation
After training, the CNN is evaluated on a held-out test set, a collection of images that the
CNN has not seen during training. How well the CNN performs on the test set is a good
indicator of how well it will perform on real-world data.
The performance of a CNN on image classification tasks can be measured with several
metrics. Among the most popular are:
• Accuracy: Accuracy is the percentage of test images that the CNN correctly classifies.
• Precision: Precision is the percentage of test images predicted as a particular class
that actually belong to that class.
• Recall: Recall is the percentage of test images belonging to a particular class that the
CNN predicts as that class.
• F1 Score: The F1 Score is the harmonic mean of precision and recall. It is a good metric for
evaluating the performance of a CNN when the classes are imbalanced.
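These definitions translate directly into counts of true positives, false positives, and false negatives. The sketch below computes them for a made-up set of predictions; the labels and the helper name are illustrative assumptions.

```python
def per_class_metrics(y_true, y_pred, positive):
    """Precision, recall, and F1 for one class, treating it as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Made-up test labels and predictions for a two-class problem
y_true = ["cat", "cat", "cat", "dog", "dog", "dog", "dog", "dog"]
y_pred = ["cat", "cat", "dog", "cat", "dog", "dog", "dog", "dog"]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
precision, recall, f1 = per_class_metrics(y_true, y_pred, positive="cat")
```

Here six of eight predictions are correct, while the "cat" class has two true positives, one false positive, and one false negative.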
Different Types of CNN Models
1. LeNet
LeNet, developed by Yann LeCun and his colleagues in the late 1990s, was one of the first
successful CNNs designed for handwritten digit recognition. It laid the foundation for modern
CNNs and achieved high accuracy on the MNIST dataset, which contains 70,000 images of
handwritten digits (0-9).
2. AlexNet
AlexNet is a CNN architecture that was developed by Alex Krizhevsky, Ilya Sutskever, and
Geoffrey Hinton in 2012. It was the first CNN to win the ImageNet Large Scale Visual
Recognition Challenge (ILSVRC), a major image recognition competition, and it helped to
establish CNNs as a powerful tool for image recognition.
AlexNet stacks convolutional and pooling layers, followed by fully connected layers. The
architecture includes five convolutional layers, three max-pooling layers, and three fully
connected layers.
3. ResNet
ResNets (Residual Networks) are designed for image recognition and processing tasks. They
are known for making very deep networks trainable: plain networks become harder to
optimize as layers are added (vanishing gradients and accuracy degradation), and ResNets
mitigate this.
They introduce skip connections that let each block learn a residual function relative to its
input, making it easier to train deep architectures.
4. GoogLeNet
GoogLeNet, also known as InceptionNet, is known for achieving high accuracy in image
classification while using fewer parameters and computational resources than other
state-of-the-art CNNs of its time.
Inception modules, the core component of GoogLeNet, apply filters of several sizes in
parallel, allowing the network to learn features at different scales simultaneously.
5. VGG
VGG networks, developed by the Visual Geometry Group at Oxford, use small 3×3 convolutional
filters stacked in multiple layers, creating a deep and uniform structure. Popular variants
like VGG-16 and VGG-19 achieved state-of-the-art performance on the ImageNet dataset,
demonstrating the power of depth in CNNs.
Applications of CNN
• Image classification: CNNs are widely used for image classification. They can assign
images to categories, for example cats versus dogs, cars versus trucks, or different
flower species.
• Object detection: CNNs can be used to detect objects in images, such as people, cars, and
buildings. They can also localize objects, which means that they identify where in the
image each object is.
• Image segmentation: CNNs can be used to segment images, which means that they assign a
label to each pixel or region of an image. This is useful for applications such as
medical imaging and robotics.
• Video analysis: CNNs can be used to analyze videos, such as tracking objects in a video or
detecting events in a video. This is useful for applications such as video surveillance and
traffic monitoring.
Advantages of CNN
• High Accuracy: CNNs achieve state-of-the-art accuracy in various image recognition
tasks.
• Efficiency: Weight sharing keeps the number of parameters far below that of a comparable
fully connected network, and convolutions parallelize well on GPUs.
• Robustness: CNNs tolerate small shifts and moderate noise in the input, particularly when
trained with data augmentation.
• Adaptability: CNNs can be adapted to different tasks by modifying their architecture.
Disadvantages of CNN
• Complexity: CNNs can be complex and difficult to train, especially for large datasets.
• Resource-Intensive: CNNs require significant computational resources for training and
deployment.
• Data Requirements: CNNs need large amounts of labeled data for training.
• Interpretability: CNNs can be difficult to interpret, making it challenging to understand
their predictions.

2Q) Discuss the importance of data preprocessing in training a CNN model. How do
techniques such as normalization and data augmentation contribute to the performance of
the model?

Ans.: Data preprocessing is a crucial step in training a Convolutional Neural Network (CNN)
model. It helps to improve the quality of the input data and ensures the network learns
efficiently, leading to better generalization on unseen data. Here's a breakdown of why
preprocessing is important and how specific techniques like normalization and data
augmentation contribute to CNN performance:

1. Normalization

Normalization is the process of scaling input data into a range that is more conducive to model
training, often to a [0, 1] or [-1, 1] range. CNNs typically perform better when input values are
normalized because:

• Faster convergence: When data features are on similar scales, the gradient descent algorithm
(used for training) converges faster. Without normalization, certain features could dominate
during training, leading to slow or unstable convergence.
• Improved stability: Neural networks are sensitive to the scale of input values. By normalizing,
we ensure that the model doesn't get "stuck" in certain regions of the loss function due to high
values or large disparities among features.
• Consistent weight updates: CNNs rely on backpropagation, which calculates gradients for
weight updates. If features are on vastly different scales, it can lead to inconsistent gradients,
causing issues in learning. Normalization makes sure the gradients are more balanced.

Common normalization techniques include min-max scaling (scaling to [0, 1]) and Z-score
normalization (scaling based on mean and standard deviation).
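Both techniques are a few lines of arithmetic. The sketch below applies them to a handful of 8-bit pixel values; the sample values and function names are assumptions, and in practice the mean and standard deviation would be computed over the whole training set (often per channel).

```python
from statistics import mean, pstdev


def min_max_scale(values, lo=0.0, hi=255.0):
    """Min-max scaling: map the range [lo, hi] onto [0, 1]."""
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values):
    """Z-score normalization: subtract the mean, divide by the standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


pixels = [0, 51, 102, 153, 204, 255]   # made-up 8-bit pixel intensities

scaled = min_max_scale(pixels)         # values in [0, 1]
standardized = z_score(pixels)         # zero mean, unit standard deviation
```

Whichever statistics are used during training must be reused unchanged at inference time, otherwise the model sees inputs on a different scale than it learned from.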

2. Data Augmentation
Data augmentation involves artificially increasing the size and diversity of the training dataset by
applying various transformations to the existing data, such as rotations, translations, flips, and
color adjustments. This technique helps in several ways:

• Prevents overfitting: CNNs, especially deep ones, have a large number of parameters,
which makes them prone to overfitting. By augmenting the dataset, we introduce
variations in the training data, making it harder for the model to memorize specific details
of the training set, thus improving generalization.
• Simulates real-world variability: Data augmentation simulates real-world variations
that a model might encounter in production. For instance, images could be captured from
different angles or lighting conditions. By training on a variety of augmented images, the
CNN becomes more robust to such variations.
• Enhances model robustness: Augmentation helps the model learn features that are
invariant to certain transformations, such as recognizing an object even when it's rotated
or scaled. This makes the model more adaptable to a wide range of real-world conditions.

Common data augmentation techniques include:

• Rotation: Rotating images by a certain angle.
• Flipping: Horizontal or vertical flips.
• Shifting: Translating images by a few pixels in different directions.
• Zooming: Zooming in or out within the image.
• Brightness/Contrast adjustments: Altering the lighting and contrast of the image to mimic
different environments.
• Cropping: Randomly cropping portions of the image.
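
Three of the techniques above (flipping, shifting, and cropping) can be sketched on a tiny image stored as nested lists. This is an illustration under assumed helper names and a toy 3x3 image; real pipelines would use a library such as torchvision or Keras on full-size image arrays.

```python
import random


def horizontal_flip(img):
    """Mirror each row left to right."""
    return [row[::-1] for row in img]


def shift_right(img, pixels, fill=0):
    """Translate the image to the right, padding exposed columns with `fill`."""
    return [[fill] * pixels + row[:len(row) - pixels] for row in img]


def random_crop(img, size, rng):
    """Cut a random size x size window out of the image."""
    top = rng.randint(0, len(img) - size)
    left = rng.randint(0, len(img[0]) - size)
    return [row[left:left + size] for row in img[top:top + size]]


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
rng = random.Random(0)  # seeded so the sketch is reproducible

flipped = horizontal_flip(image)
shifted = shift_right(image, 1)
cropped = random_crop(image, 2, rng)
```

Each call yields a new training example with the same label as the original image, which is how augmentation enlarges the dataset without new annotation work.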

3. Other Preprocessing Techniques

In addition to normalization and augmentation, there are other preprocessing techniques that can
enhance the performance of a CNN:

• Resizing: CNNs generally require input images of a fixed size. Resizing images to the required
dimensions ensures that the network can process them consistently.
• Grayscale conversion: For simpler tasks, like digit recognition, converting color images to
grayscale can reduce the complexity and speed up processing while retaining essential features.
• Noise reduction: Removing noise from images can help CNNs focus on relevant patterns and
avoid learning unnecessary details.

Overall Impact on Model Performance

• Efficiency: Normalization makes training more stable, so the model usually converges in
fewer epochs. Augmentation can lengthen training, because the model sees more varied inputs,
but it pays this back in better generalization.
• Generalization: Augmentation, combined with preprocessing that is applied consistently at
training and inference time, improves the model's ability to generalize to unseen data, which
is critical for real-world applications.
• Robustness: These techniques help the CNN adapt to a wider variety of inputs and perform
better in diverse situations, making the model more reliable.
3Q) Explain how you would handle data preprocessing and augmentation using AWS
services like Amazon S3.

Ans.: Handling data preprocessing and augmentation using AWS services like Amazon S3 can
be a seamless and efficient process due to AWS's robust cloud infrastructure and specialized
machine learning services. Below is a step-by-step explanation of how you could leverage AWS
to preprocess and augment data for training a CNN model:

1. Storing Data on Amazon S3

Amazon S3 (Simple Storage Service) is widely used to store large amounts of unstructured data,
such as images, videos, or text files. Here's how you would store and manage your data:

• Data Ingestion: Upload the raw dataset to an S3 bucket. You can use the AWS Management
Console, AWS CLI, or SDKs to upload large batches of data. For large datasets, you may want to
use Amazon S3 Transfer Acceleration to speed up uploads.
• Bucket Organization: Organize the data in S3 with clear key prefixes (e.g., train/,
validation/, test/, raw/, processed/). S3 has no real folders; a prefix is simply the start of
an object key, so it is written without a leading slash. A clear layout makes it easier to
manage and reference the data during the preprocessing steps.

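Example (uploading a local folder of images, here assumed to be ./dataset/, so that each image
becomes its own object under raw/):

aws s3 cp ./dataset/ s3://your-bucket-name/raw/ --recursive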

2. Data Preprocessing and Augmentation Pipeline with AWS Lambda and AWS
SageMaker

Once the data is in S3, you can use various AWS services to handle preprocessing and
augmentation. Two popular services are AWS Lambda (for serverless data transformations) and
AWS SageMaker (for scalable model training and preprocessing).

Using AWS Lambda for Preprocessing:

AWS Lambda allows you to run code without managing servers, which is ideal for lightweight
preprocessing tasks such as image resizing, format conversion, and normalization. You can
create a Lambda function that is triggered by events in S3 (e.g., new files uploaded).

Steps:

1. Set Up Lambda: Create a Lambda function that processes images, such as resizing,
cropping, or normalizing. Lambda supports various programming languages (e.g.,
Python, Node.js).

Example Python code to resize images:

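import os
from urllib.parse import unquote_plus

import boto3
# Pillow is not part of the default Lambda Python runtime; package it or add a layer
from PIL import Image

s3_client = boto3.client('s3')

def lambda_handler(event, context):
    record = event['Records'][0]
    bucket_name = record['s3']['bucket']['name']
    # Object keys arrive URL-encoded in S3 event notifications
    file_key = unquote_plus(record['s3']['object']['key'])

    # Download image from S3. Only /tmp is writable, and the key's prefix
    # (e.g. raw/) does not exist there, so keep just the file name.
    local_file_path = os.path.join('/tmp', os.path.basename(file_key))
    s3_client.download_file(bucket_name, file_key, local_file_path)

    # Preprocess image (e.g., resizing)
    with Image.open(local_file_path) as img:
        resized = img.resize((224, 224))  # Resize image to 224x224
        resized.save(local_file_path)

    # Upload the processed image back to S3, swapping the first prefix for processed/
    processed_key = 'processed/' + file_key.split('/', 1)[-1]
    s3_client.upload_file(local_file_path, bucket_name, processed_key)

    return {'statusCode': 200, 'body': 'Processing complete'}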

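2. Trigger Lambda on S3 Event: Set up an S3 event notification to trigger the Lambda
function when new data is uploaded under the raw dataset prefix in S3.

Example S3 event notification:

• Event type: s3:ObjectCreated:* (this covers single-request uploads as well as the
multipart uploads that the AWS CLI uses for large files; s3:ObjectCreated:Put alone would
miss the latter)
• Prefix: raw/
• Destination: Lambda function

Restricting the trigger to the raw/ prefix matters: the function writes its output to the same
bucket, and without the filter each processed image would trigger the function again.

Once the Lambda function processes the image, it stores the preprocessed image back in the S3
bucket under the processed/ prefix.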
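
The key handling inside such a function is easy to get wrong, so here is a small standalone sketch of it that runs without AWS. The function names, the prefix names, and the list of image extensions are assumptions for illustration; only the fact that S3 event notifications URL-encode object keys comes from the S3 event format.

```python
import posixpath
from urllib.parse import unquote_plus

IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png")  # assumed set of accepted formats


def processed_key_for(event_key, raw_prefix="raw/", processed_prefix="processed/"):
    """Map the URL-encoded key from an S3 event to its output key, or None to skip."""
    key = unquote_plus(event_key)
    if not key.startswith(raw_prefix):
        return None  # ignore objects outside raw/, so outputs are never re-processed
    if not key.lower().endswith(IMAGE_EXTENSIONS):
        return None  # ignore non-image uploads such as archives
    return processed_prefix + key[len(raw_prefix):]


def local_tmp_path(key):
    """Lambda can only write under /tmp; S3 keys always use '/' as separator."""
    return posixpath.join("/tmp", posixpath.basename(key))


example_key = "raw/cats/my+cat%281%29.jpg"  # as it would appear in an event record
output_key = processed_key_for(example_key)
tmp_path = local_tmp_path(unquote_plus(example_key))
```

Keeping this logic in small pure functions makes it testable locally before the handler is deployed.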

Using AWS SageMaker for More Advanced Preprocessing and Augmentation:

For more complex preprocessing tasks, such as applying data augmentation techniques (rotation,
flipping, color adjustments, etc.), you can use AWS SageMaker. SageMaker offers a managed
environment for training machine learning models and running preprocessing pipelines.

1. Create a SageMaker Processing Job: SageMaker allows you to define preprocessing
steps in a containerized environment. You can use the SageMaker Python SDK to
create a processing job that reads images from S3, applies augmentation, and stores the
output back in S3.

Example SageMaker Processing Script for Augmentation:

# ProcessingInput and ProcessingOutput live in sagemaker.processing
from sagemaker.processing import ScriptProcessor, ProcessingInput, ProcessingOutput

processor = ScriptProcessor(
    image_uri='your-docker-image-uri',
    command=['python3'],
    role='your-iam-role',
    instance_count=1,
    instance_type='ml.m5.large'
)

# Inputs are copied from S3 into the container before the script runs;
# whatever the script writes to the output path is uploaded to S3 afterwards.
processor.run(
    code='augmentation_script.py',
    inputs=[ProcessingInput(source='s3://your-bucket-name/raw/',
                            destination='/opt/ml/processing/input')],
    outputs=[ProcessingOutput(source='/opt/ml/processing/output',
                              destination='s3://your-bucket-name/processed/')]
)

2. Data Augmentation Techniques: The script (augmentation_script.py) can apply
various image transformations like rotations, flips, zooms, etc. Using libraries like
TensorFlow, Keras, or OpenCV, you can implement augmentation logic.

Example Augmentation Code:

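import tensorflow as tf
# Note: ImageDataGenerator is marked as deprecated in recent TensorFlow releases
from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    rotation_range=30,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
    fill_mode='nearest'
)

# Apply augmentation to images.
# image_list is assumed to be an iterable of H x W x C NumPy arrays
# loaded from /opt/ml/processing/input.
for image in image_list:
    augmented_image = datagen.random_transform(image)
    # Save augmented image under /opt/ml/processing/output (uploaded to S3
    # by the processing job) or to other storage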

3. Store Augmented Data: After augmentation, the output images can be saved back to the
processed folder in S3, which is then ready for model training.

3. Training the Model on Amazon SageMaker

Once the data is preprocessed and augmented, you can directly use Amazon SageMaker for
training the CNN model. SageMaker offers managed Jupyter notebooks, GPU instances, and pre-
built machine learning containers for easy training. You can also create a training pipeline to
automate the entire process.
Steps:

1. Prepare the Data: Specify the S3 location of the processed and augmented images.
2. Configure the Model: Use a built-in TensorFlow, PyTorch, or MXNet container, or bring your
own custom container.
3. Train the Model: Train the model using SageMaker’s managed training environment, where you
can scale based on your requirements (e.g., using GPU instances for faster training).

4. Monitoring and Scaling with AWS

To ensure everything runs smoothly, you can monitor your processing and training jobs using
Amazon CloudWatch. Processing and training jobs are scaled by choosing the instance type and
instance count when the job is created; automatic scaling applies to deployed SageMaker
inference endpoints, which can add or remove instances as traffic changes.

4Q) Explain the architecture and working of a Generative Adversarial Network (GAN).
Describe the roles of the Generator and Discriminator, and how they interact during the
training process. Discuss the challenges and potential solutions in training GANs.
Ans.: Generative Adversarial Networks (GANs) were introduced by Ian Goodfellow and his
colleagues in 2014. GANs are a class of neural networks that autonomously learn patterns in
the input data to generate new examples resembling the original dataset.
GAN’s architecture consists of two neural networks:
1. Generator: creates synthetic data from random noise to produce data so realistic that the
discriminator cannot distinguish it from real data.
2. Discriminator: acts as a critic, evaluating whether the data it receives is real or fake.
They use adversarial training to produce artificial data that closely resembles actual data.

The two networks engage in a continuous game of cat and mouse: the Generator improves its
ability to create realistic data, while the Discriminator becomes better at detecting fakes. Over
time, this adversarial process leads to the generation of highly realistic and high-quality data.
Detailed Architecture of GANs
Let’s explore the generator and discriminator model of GANs in detail:
1. Generator Model
The generator is a deep neural network that takes random noise as input to generate realistic
data samples (e.g., images or text). It learns the underlying data distribution by adjusting its
parameters through backpropagation.
The generator’s objective is to produce samples that the discriminator classifies as real. The
loss function is:
$$J_G = -\frac{1}{m} \sum_{i=1}^{m} \log D(G(z_i))$$
Where,
• $J_G$ measures how well the generator is fooling the discriminator, and $m$ is the number
of generated samples in the batch.
• $\log D(G(z_i))$ is the log probability that the discriminator assigns to the generated
sample $G(z_i)$ being real, that is, the log probability of the discriminator being fooled.
• The generator aims to minimize this loss, which pushes $D(G(z_i))$ toward 1 so that the
discriminator classifies generated samples as real.
2. Discriminator Model
The discriminator acts as a binary classifier, distinguishing between real and generated data.
It learns to improve its classification ability through training, refining its parameters to detect
fake samples more accurately.
When dealing with image data, the discriminator often employs convolutional layers or other
relevant architectures suited to the data type. These layers help extract features and enhance the
model’s ability to differentiate between real and generated samples.
The discriminator minimizes the negative log-likelihood of correctly classifying both generated
and real samples. This loss encourages the discriminator to label real samples as real and
generated samples as fake:
$$J_D = -\frac{1}{m} \sum_{i=1}^{m} \log D(x_i) - \frac{1}{m} \sum_{i=1}^{m} \log\bigl(1 - D(G(z_i))\bigr)$$
• $J_D$ assesses the discriminator's ability to discern between generated and actual samples.
• $\log D(x_i)$ is the log-likelihood that the discriminator correctly classifies the real
sample $x_i$ as real.
• $\log(1 - D(G(z_i)))$ is the log-likelihood that the discriminator correctly classifies the
generated sample $G(z_i)$ as fake.
By minimizing this loss, the discriminator becomes more effective at distinguishing between
real and generated samples.
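To see how the two losses react to each other, the sketch below evaluates the generator loss and the discriminator loss on made-up discriminator outputs. The probabilities and function names are assumptions chosen for illustration, not results from a trained model.

```python
import math


def generator_loss(d_on_fake):
    """J_G = -(1/m) * sum(log D(G(z_i)))."""
    return -sum(math.log(p) for p in d_on_fake) / len(d_on_fake)


def discriminator_loss(d_on_real, d_on_fake):
    """J_D = -(1/m) * sum(log D(x_i)) - (1/m) * sum(log(1 - D(G(z_i))))."""
    real_term = -sum(math.log(p) for p in d_on_real) / len(d_on_real)
    fake_term = -sum(math.log(1 - p) for p in d_on_fake) / len(d_on_fake)
    return real_term + fake_term


# Made-up discriminator outputs (probability that the input is real)
d_winning = {"real": [0.9, 0.8], "fake": [0.1, 0.2]}  # D separates the classes
g_winning = {"real": [0.6, 0.5], "fake": [0.7, 0.8]}  # D is being fooled

jd_when_d_wins = discriminator_loss(d_winning["real"], d_winning["fake"])
jd_when_g_wins = discriminator_loss(g_winning["real"], g_winning["fake"])
jg_when_d_wins = generator_loss(d_winning["fake"])
jg_when_g_wins = generator_loss(g_winning["fake"])
```

When the discriminator separates the two groups its loss is small and the generator's is large; when it is fooled the situation reverses, which is the adversarial tension the training loop exploits.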
Minimax Loss
GANs follow a minimax optimization where the generator and discriminator are adversaries:
$$\min_G \max_D V(G, D) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$$
Where,
• G is the generator network and D is the discriminator network.
• Actual data samples obtained from the true data distribution $p_{data}(x)$ are
represented by x.
• Random noise sampled from a prior distribution $p_z(z)$ (usually a normal or uniform
distribution) is represented by z.
• D(x) is the probability that the discriminator assigns to a real sample x being real.
• D(G(z)) is the probability that the discriminator assigns to a generated sample being
real.
The discriminator tries to maximize this value (classify both kinds of sample correctly),
while the generator tries to minimize it. In practice the generator is often trained with the
loss $J_G$ given earlier, which maximizes $\log D(G(z))$ instead of minimizing
$\log(1 - D(G(z)))$ because it gives stronger gradients early in training.
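As a small numeric check of the minimax objective, the sketch below estimates the value function from batches of discriminator outputs. The probabilities are made up; the function name is an assumption. A discriminator that outputs 0.5 everywhere, which is what happens when generated and real data are indistinguishable, gives a value of -log 4.

```python
import math


def value_function(d_on_real, d_on_fake):
    """Batch estimate of E[log D(x)] + E[log(1 - D(G(z)))]."""
    real_term = sum(math.log(p) for p in d_on_real) / len(d_on_real)
    fake_term = sum(math.log(1 - p) for p in d_on_fake) / len(d_on_fake)
    return real_term + fake_term


# Discriminator that cannot tell real from fake: 0.5 for every input
v_undecided = value_function([0.5] * 4, [0.5] * 4)

# Discriminator that separates the two batches well: value approaches 0 from below
v_confident = value_function([0.99] * 4, [0.01] * 4)
```

The discriminator's updates push the value up toward 0, while the generator's updates pull it back down toward -log 4.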

How does a GAN work?
Let’s understand how the generator (G) and discriminator (D) compete to improve each other
over time:
1. Generator’s First Move
G takes a random noise vector as input. This noise vector contains random values and acts as
the starting point for G’s creation process. Using its internal layers and learned patterns, G
transforms the noise vector into a new data sample, like a generated image.
2. Discriminator’s Turn
D receives two kinds of inputs:
• Real data samples from the training dataset.
• The data samples generated by G in the previous step.
D’s job is to analyze each input and determine whether it’s real data or something G cooked
up. It outputs a probability score between 0 and 1. A score of 1 indicates the data is likely real,
and 0 suggests it’s fake.
3. Adversarial Learning
• When the discriminator classifies real data as real and fake data as fake, its loss is low
and its weights change only slightly.
• When the generator fools the discriminator, the generator's loss is low, while the
discriminator's loss is high and its weights receive a larger corrective update.
4. Generator’s Improvement
The generator is updated with gradients that flow back through the discriminator, so it learns
which changes make its samples more likely to be judged real.
Over multiple iterations, the generator produces more convincing synthetic samples.
5. Discriminator’s Adaptation
The discriminator continuously refines its ability to distinguish real from fake data. This
ongoing duel between the generator and discriminator enhances the overall model’s learning
process.
6. Training Progression
• As training continues, the generator becomes highly proficient at producing realistic data.
• Eventually, the discriminator struggles to distinguish real from fake (its outputs drift
toward 0.5 for both), indicating that the GAN has reached a well-trained state.
• At this point, the generator can be used to generate high-quality synthetic data for various
applications.
Types of GANs
1. Vanilla GAN
Vanilla GAN is the simplest type of GAN. It consists of:
• A generator and a discriminator, both built using multi-layer perceptrons (MLPs).
• The minimax objective is optimized using stochastic gradient descent (SGD).
While Vanilla GANs serve as the foundation for more advanced GAN models, they often
struggle with issues like mode collapse and unstable training.
2. Conditional GAN (CGAN)
Conditional GANs (CGANs) introduce an additional conditional parameter to guide the
generation process. Instead of generating data randomly, CGANs allow the model to produce
specific types of outputs.
Working of CGANs:
• A conditional variable (y) is fed into both the generator and the discriminator.
• This ensures that the generator creates data corresponding to the given condition (e.g.,
generating images of specific objects).
• The discriminator also receives the labels to help distinguish between real and fake data.
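One common way to feed the condition to the generator is to concatenate a one-hot encoding of the label with the noise vector. The sketch below shows only that input construction; the toy sizes and function names are assumptions, and other conditioning schemes (such as learned label embeddings) are also used.

```python
import random


def one_hot(label, num_classes):
    """Encode an integer class label as a one-hot list of floats."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


def generator_input(noise, label, num_classes):
    """Concatenate the noise vector z with a one-hot encoding of the condition y."""
    return noise + one_hot(label, num_classes)


rng = random.Random(0)        # seeded so the sketch is reproducible
latent_dim, num_classes = 4, 3  # toy sizes, chosen for illustration

z = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
g_in = generator_input(z, label=2, num_classes=num_classes)
```

The generator's first layer then takes latent_dim + num_classes inputs, and the discriminator receives the same label alongside each image it judges.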
3. Deep Convolutional GAN (DCGAN)
Deep Convolutional GANs (DCGANs) are among the most popular and widely used types of
GANs, particularly for image generation.
What Makes DCGAN Special?
• Uses Convolutional Neural Networks (CNNs) instead of simple multi-layer perceptrons
(MLPs).
• Max pooling layers are replaced with strided convolutions (and transposed convolutions in
the generator), so the network learns its own downsampling and upsampling.
• Fully connected hidden layers are removed, allowing for better spatial understanding of images.
DCGANs have been highly successful in generating high-quality images, making them a go-to
choice for deep learning researchers.
4. Laplacian Pyramid GAN (LAPGAN)
Laplacian Pyramid GAN (LAPGAN) is designed to generate higher-quality images by
leveraging a multi-resolution, coarse-to-fine approach.
Working of LAPGAN:
• Uses multiple generator-discriminator pairs at different levels of the Laplacian pyramid.
• For training, images are downsampled to build the pyramid. For generation, a low-resolution
image is upscaled level by level, with a Conditional GAN (CGAN) at each level adding the
missing detail.
• This process allows the image to gradually refine details, reducing noise and improving
clarity.
Because each level only has to add detail to an existing image, LAPGAN was an early way to
produce sharper, more detailed images than a single GAN of its time.
5. Super Resolution GAN (SRGAN)
Super-Resolution GAN (SRGAN) is specifically designed to increase the resolution of low-quality
images while preserving details.
Working of SRGAN:
• Uses a deep neural network trained with a perceptual loss that combines a content loss with
an adversarial loss.
• Enhances low-resolution images by adding finer details, making them appear sharper and
more realistic.
• Helps reduce common image upscaling errors, such as blurriness and pixelation.
Implementation of Generative Adversarial Network (GAN) using PyTorch
Let’s explore the implementation of a Generative Adversarial Network (GAN). Our GAN will
be trained on the CIFAR-10 dataset to generate realistic images.
Step 1: Importing Required Libraries
First, we import the necessary libraries for building and training our GAN.
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
from torchvision import datasets, transforms
import matplotlib.pyplot as plt
import numpy as np

# Set device
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
The model will utilize a GPU if available; otherwise, it will default to CPU.
Step 2: Defining Image Transformations
We use PyTorch’s transforms to normalize and convert images into tensors before feeding
them into the model.
# Define a basic transform
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])
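The effect of Normalize with mean 0.5 and standard deviation 0.5 can be checked with plain arithmetic: it maps pixel values from [0, 1] to [-1, 1]. The sketch below reproduces that per-value formula without PyTorch; the function names and sample values are assumptions for illustration.

```python
def normalize(x, mean=0.5, std=0.5):
    """Per-value arithmetic of Normalize: (x - mean) / std."""
    return (x - mean) / std


def denormalize(y, mean=0.5, std=0.5):
    """Inverse mapping, useful before displaying generated images."""
    return y * std + mean


values = [0.0, 0.25, 0.5, 1.0]   # pixel values after ToTensor(), in [0, 1]
normalized = [normalize(v) for v in values]
restored = [denormalize(v) for v in normalized]
```

A range of [-1, 1] is convenient for a GAN whose generator ends in a Tanh activation, because real and generated images then share the same value range.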
Step 3: Loading the CIFAR-10 Dataset
The CIFAR-10 dataset is loaded with predefined transformations. A DataLoader is created to
process the dataset in mini-batches of 32 images, shuffled for randomness.
train_dataset = datasets.CIFAR10(root='./data', train=True,
                                 download=True, transform=transform)
dataloader = torch.utils.data.DataLoader(train_dataset,
                                         batch_size=32, shuffle=True)
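As a quick sanity check on what the DataLoader will produce, the number of batches per epoch follows from the dataset size and the batch size. The sketch assumes the standard CIFAR-10 training split of 50,000 images and the DataLoader default of keeping the final partial batch.

```python
import math

num_train_images = 50_000   # size of the CIFAR-10 training split
batch_size = 32

# With the final partial batch kept, the count is rounded up
batches_per_epoch = math.ceil(num_train_images / batch_size)
last_batch_size = num_train_images - (batches_per_epoch - 1) * batch_size
```

Because the last batch is smaller than the rest, code that builds per-batch label tensors should size them from the actual batch rather than from batch_size.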
Step 4: Defining GAN Hyperparameters
Key hyperparameters are defined:
• latent_dim – Dimensionality of the noise vector.
• lr – Learning rate of the optimizer.
• beta1, beta2 – Adam optimizer coefficients.
• num_epochs – Number of training epochs (full passes over the dataset).
# Hyperparameters
latent_dim = 100
lr = 0.0002
beta1 = 0.5
beta2 = 0.999
num_epochs = 10
Step 5: Building the Generator
The generator takes a random latent vector (z) as input and transforms it into an image through
a linear layer followed by upsampling, convolutional, and batch normalization layers. The final
output uses Tanh activation so that pixel values fall in [-1, 1], the range of the normalized
training images.
# Define the generator
class Generator(nn.Module):
    def __init__(self, latent_dim):
        super(Generator, self).__init__()

        # Note: in PyTorch, BatchNorm momentum is the weight given to the
        # new batch statistics (default 0.1).
        self.model = nn.Sequential(
            nn.Linear(latent_dim, 128 * 8 * 8),
            nn.ReLU(),
            nn.Unflatten(1, (128, 8, 8)),                   # -> 128 x 8 x 8
            nn.Upsample(scale_factor=2),                    # -> 128 x 16 x 16
            nn.Conv2d(128, 128, kernel_size=3, padding=1),
            nn.BatchNorm2d(128, momentum=0.78),
            nn.ReLU(),
            nn.Upsample(scale_factor=2),                    # -> 128 x 32 x 32
            nn.Conv2d(128, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64, momentum=0.78),
            nn.ReLU(),
            nn.Conv2d(64, 3, kernel_size=3, padding=1),     # -> 3 x 32 x 32
            nn.Tanh()                                       # values in [-1, 1]
        )

    def forward(self, z):
        img = self.model(z)
        return img
Step 6: Building the Discriminator
The discriminator is a binary classifier that distinguishes between real and generated images. It
consists of convolutional layers, batch normalization, dropout, and LeakyReLU activation to
improve learning stability.
# Define the discriminator
class Discriminator(nn.Module):
    def __init__(self):
        super(Discriminator, self).__init__()

        self.model = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1),     # -> 32 x 16 x 16
            nn.LeakyReLU(0.2),
            nn.Dropout(0.25),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1),    # -> 64 x 8 x 8
            nn.ZeroPad2d((0, 1, 0, 1)),                               # -> 64 x 9 x 9
            nn.BatchNorm2d(64, momentum=0.82),
            nn.LeakyReLU(0.25),
            nn.Dropout(0.25),
            nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1),   # -> 128 x 5 x 5
            nn.BatchNorm2d(128, momentum=0.82),
            nn.LeakyReLU(0.2),
            nn.Dropout(0.25),
            nn.Conv2d(128, 256, kernel_size=3, stride=1, padding=1),  # -> 256 x 5 x 5
            nn.BatchNorm2d(256, momentum=0.8),
            nn.LeakyReLU(0.25),
            nn.Dropout(0.25),
            nn.Flatten(),                                             # -> 6400
            nn.Linear(256 * 5 * 5, 1),
            nn.Sigmoid()                                              # probability of "real"
        )

    def forward(self, img):
        validity = self.model(img)
        return validity
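The input size of the final linear layer (256 * 5 * 5) has to match the feature map that reaches it. The sketch below traces the spatial size through the discriminator's layers using the usual convolution output-size formula; it is a standalone check written for this walkthrough, with helper names that are assumptions.

```python
def conv_out(size, kernel, stride, padding):
    """Convolution output size: floor((size + 2*padding - kernel) / stride) + 1."""
    return (size + 2 * padding - kernel) // stride + 1


# (layer description, function mapping input size to output size)
steps = [
    ("Conv2d(3, 32, stride=2)", lambda s: conv_out(s, 3, 2, 1)),
    ("Conv2d(32, 64, stride=2)", lambda s: conv_out(s, 3, 2, 1)),
    ("ZeroPad2d((0, 1, 0, 1))", lambda s: s + 1),
    ("Conv2d(64, 128, stride=2)", lambda s: conv_out(s, 3, 2, 1)),
    ("Conv2d(128, 256, stride=1)", lambda s: conv_out(s, 3, 1, 1)),
]

size = 32  # CIFAR-10 images are 32 x 32
trace = [size]
for name, step in steps:
    size = step(size)
    trace.append(size)

flattened_features = 256 * size * size  # channels * height * width
```

If the input resolution or any stride changes, rerunning a trace like this shows what the linear layer's input size needs to become.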
Step 7: Initializing GAN Components
• Generator and Discriminator are initialized on the available device (GPU or CPU).
• Binary Cross-Entropy (BCE) Loss is chosen as the loss function.
• Adam optimizers are defined separately for the generator and discriminator with specified
learning rates and betas.
# Initialize generator and discriminator
generator = Generator(latent_dim).to(device)
discriminator = Discriminator().to(device)
# Loss function
adversarial_loss = nn.BCELoss()
# Optimizers
optimizer_G = optim.Adam(generator.parameters(), lr=lr, betas=(beta1, beta2))
optimizer_D = optim.Adam(discriminator.parameters(), lr=lr, betas=(beta1, beta2))
Step 8: Training the GAN
• The discriminator is trained to differentiate between real and fake images.
• The generator is trained to produce realistic images that fool the discriminator.
• The loss is backpropagated using Adam optimizers, and the model updates its parameters.
• Progress tracking: Loss values for both networks are printed, and generated images are
displayed every 10 epochs for visual inspection.
# Training loop
for epoch in range(num_epochs):
    for i, batch in enumerate(dataloader):
        # batch is (images, labels); only the images are needed
        real_images = batch[0].to(device)
        # Adversarial ground truths
        valid = torch.ones(real_images.size(0), 1, device=device)
        fake = torch.zeros(real_images.size(0), 1, device=device)

        # ---------------------
        #  Train Discriminator
        # ---------------------
        optimizer_D.zero_grad()
        # Sample noise as generator input
        z = torch.randn(real_images.size(0), latent_dim, device=device)
        # Generate a batch of images
        fake_images = generator(z)

        # Measure discriminator's ability to classify real and fake images
        real_loss = adversarial_loss(discriminator(real_images), valid)
        # detach() keeps this step from backpropagating into the generator
        fake_loss = adversarial_loss(discriminator(fake_images.detach()), fake)
        d_loss = (real_loss + fake_loss) / 2
        # Backward pass and optimize
        d_loss.backward()
        optimizer_D.step()

        # -----------------
        #  Train Generator
        # -----------------
        optimizer_G.zero_grad()
        # Generate a batch of images
        gen_images = generator(z)
        # Adversarial loss: the generator wants these classified as real
        g_loss = adversarial_loss(discriminator(gen_images), valid)
        # Backward pass and optimize
        g_loss.backward()
        optimizer_G.step()

        # ---------------------
        #  Progress Monitoring
        # ---------------------
        if (i + 1) % 100 == 0:
            print(
                f"Epoch [{epoch+1}/{num_epochs}] "
                f"Batch {i+1}/{len(dataloader)} "
                f"Discriminator Loss: {d_loss.item():.4f} "
                f"Generator Loss: {g_loss.item():.4f}"
            )

    # Display generated images every 10 epochs
    if (epoch + 1) % 10 == 0:
        with torch.no_grad():
            z = torch.randn(16, latent_dim, device=device)
            generated = generator(z).detach().cpu()
            grid = torchvision.utils.make_grid(generated, nrow=4, normalize=True)
            # make_grid returns C x H x W; imshow expects H x W x C
            plt.imshow(np.transpose(grid.numpy(), (1, 2, 0)))
            plt.axis("off")
            plt.show()
Output:
Epoch [10/10] Batch 1300/1563 Discriminator Loss: 0.4473 Generator Loss: 0.9555
Epoch [10/10] Batch 1400/1563 Discriminator Loss: 0.6643 Generator Loss: 1.0215
Epoch [10/10] Batch 1500/1563 Discriminator Loss: 0.4720 Generator Loss: 2.5027
