Assignment-6 STC-DL
ASSIGNMENT-6
1Q) Explain the architecture of a simple Convolutional Neural Network (CNN) model.
Describe the roles of the convolutional layer, pooling layer, and fully connected layer in the
model. Additionally, discuss how you would use AWS SageMaker to build, train, and deploy
this model.
Ans.: Convolutional Neural Networks (CNNs) are a specialized class of neural networks
designed to process grid-like data, such as images. They are particularly well-suited for image
recognition and processing tasks.
Inspired by the visual processing mechanisms of the human brain, CNNs excel at
capturing hierarchical patterns and spatial dependencies within images.
Key Components of a Convolutional Neural Network
1. Convolutional Layers: These layers apply convolutional operations to input images, using
filters (also known as kernels) to detect features such as edges, textures, and more complex
patterns. Convolutional operations help preserve the spatial relationships between pixels.
2. Pooling Layers: They downsample the spatial dimensions of the input, reducing the
computational complexity and the number of parameters in the network. Max pooling is a
common pooling operation, selecting the maximum value from a group of neighboring
pixels.
3. Activation Functions: They introduce non-linearity to the model, allowing it to learn more
complex relationships in the data.
4. Fully Connected Layers: These layers are responsible for making predictions based on the
high-level features learned by the previous layers. They connect every neuron in one layer
to every neuron in the next layer.
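To make these components concrete, here is a minimal sketch of a simple CNN in PyTorch; the architecture (channel sizes, 32×32 RGB input, 10 classes) is illustrative, not prescribed by the question:

import torch.nn as nn

class SimpleCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),   # convolutional layer: detects edges/textures
            nn.ReLU(),                                     # activation: adds non-linearity
            nn.MaxPool2d(2),                               # pooling: 32x32 -> 16x16
            nn.Conv2d(16, 32, kernel_size=3, padding=1),   # deeper features
            nn.ReLU(),
            nn.MaxPool2d(2),                               # pooling: 16x16 -> 8x8
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 8 * 8, num_classes),            # fully connected layer: final prediction
        )

    def forward(self, x):
        return self.classifier(self.features(x))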
How CNNs Work
1. Input Image: The CNN receives an input image, which is typically preprocessed to ensure
uniformity in size and format.
2. Convolutional Layers: Filters are applied to the input image to extract features like edges,
textures, and shapes.
3. Pooling Layers: The feature maps generated by the convolutional layers are downsampled
to reduce dimensionality.
4. Fully Connected Layers: The downsampled feature maps are passed through fully
connected layers to produce the final output, such as a classification label.
5. Output: The CNN outputs a prediction, such as the class of the image.
Convolutional Neural Network Training
CNNs are trained using a supervised learning approach. This means that the CNN is given a set
of labeled training images. The CNN then learns to map the input images to their correct
labels.
The training process for a CNN involves the following steps:
1. Data Preparation: The training images are preprocessed to ensure that they are all in the
same format and size.
2. Loss Function: A loss function measures how well the CNN is performing on the
training data. For classification it is typically cross-entropy, which quantifies the
discrepancy between the predicted labels and the actual labels of the training images.
3. Optimizer: An optimizer is used to update the weights of the CNN in order to minimize
the loss function.
4. Backpropagation: Backpropagation is a technique used to calculate the gradients of the
loss function with respect to the weights of the CNN. The gradients are then used to update
the weights of the CNN using the optimizer.
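As a minimal sketch, these four steps look like the following in PyTorch (SimpleCNN is the illustrative model above; train_loader is a hypothetical DataLoader of labeled images):

import torch
import torch.nn as nn

model = SimpleCNN()
criterion = nn.CrossEntropyLoss()                          # loss function
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)   # optimizer

for images, labels in train_loader:                        # prepared, labeled batches
    optimizer.zero_grad()
    outputs = model(images)                                # forward pass
    loss = criterion(outputs, labels)                      # measure the prediction error
    loss.backward()                                        # backpropagation: compute gradients
    optimizer.step()                                       # update weights to minimize the loss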
CNN Evaluation
After training, the CNN can be evaluated on a held-out test set: a collection of images
that the CNN has not seen during training. Performance on the test set is a good
predictor of how well the model will perform on real-world data.
A variety of metrics can be used to evaluate a CNN on image classification tasks.
Among the most popular are:
Accuracy: Accuracy is the percentage of test images that the CNN correctly classifies.
Precision: Precision is the percentage of test images that the CNN predicts as a particular
class and that are actually of that class.
Recall: Recall is the percentage of test images that are of a particular class and that the
CNN predicts as that class.
F1 Score: The F1 score is the harmonic mean of precision and recall. It is a good metric for
evaluating the performance of a CNN on imbalanced classes.
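All four metrics can be computed with scikit-learn; y_true and y_pred below are hypothetical test-set labels and CNN predictions:

from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true = [0, 1, 1, 0, 1]   # hypothetical ground-truth labels
y_pred = [0, 1, 0, 0, 1]   # hypothetical model predictions

print("Accuracy: ", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall:   ", recall_score(y_true, y_pred))
print("F1 Score: ", f1_score(y_true, y_pred))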
Different Types of CNN Models
1. LeNet
LeNet, developed by Yann LeCun and his colleagues in the late 1990s, was one of the first
successful CNNs designed for handwritten digit recognition. It laid the foundation for modern
CNNs and achieved high accuracy on the MNIST dataset, which contains 70,000 images of
handwritten digits (0-9).
2. AlexNet
AlexNet is a CNN architecture that was developed by Alex Krizhevsky, Ilya Sutskever, and
Geoffrey Hinton in 2012. It was the first CNN to win the ImageNet Large Scale Visual
Recognition Challenge (ILSVRC), a major image recognition competition, and it helped to
establish CNNs as a powerful tool for image recognition.
AlexNet consists of several layers of convolutional and pooling layers, followed by fully
connected layers. The architecture includes five convolutional layers, three pooling layers, and
three fully connected layers.
3. ResNet
ResNets (Residual Networks) are designed for image recognition and processing tasks. They
are renowned for enabling very deep networks to be trained without degradation in
accuracy, making them highly effective for complex tasks.
They introduce skip connections that allow the network to learn residual functions, making
deep architectures much easier to train.
4. GoogLeNet
GoogLeNet, also known as InceptionNet, is renowned for achieving high accuracy in image
classification while using fewer parameters and computational resources than other
state-of-the-art CNNs.
Its core component, the Inception module, allows the network to learn features at
different scales simultaneously, enhancing performance.
5. VGG
VGG, developed by the Visual Geometry Group at Oxford, uses small 3×3 convolutional
filters stacked in multiple layers, creating a deep and uniform structure. Popular variants
like VGG-16 and VGG-19 achieved state-of-the-art performance on the ImageNet dataset,
demonstrating the power of depth in CNNs.
Applications of CNN
Image classification: CNNs are the state-of-the-art models for image classification. They
can be used to classify images into different categories, such as cats and dogs, cars and
trucks, and flowers and animals.
Object detection: CNNs can be used to detect objects in images, such as people, cars, and
buildings. They can also be used to localize objects in images, which means that they can
identify the location of an object in an image.
Image segmentation: CNNs can be used to segment images, which means that they can
identify and label different objects in an image. This is useful for applications such as
medical imaging and robotics.
Video analysis: CNNs can be used to analyze videos, such as tracking objects in a video or
detecting events in a video. This is useful for applications such as video surveillance and
traffic monitoring.
Advantages of CNN
High Accuracy: CNNs achieve state-of-the-art accuracy in various image recognition
tasks.
Efficiency: CNNs are efficient, especially when implemented on GPUs.
Robustness: CNNs are robust to noise and variations in input data.
Adaptability: CNNs can be adapted to different tasks by modifying their architecture.
Disadvantages of CNN
Complexity: CNNs can be complex and difficult to train, especially for large datasets.
Resource-Intensive: CNNs require significant computational resources for training and
deployment.
Data Requirements: CNNs need large amounts of labeled data for training.
Interpretability: CNNs can be difficult to interpret, making it challenging to understand
their predictions.
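Finally, the question asks how AWS SageMaker would be used to build, train, and deploy such a model. In short: build the CNN in a training script, train it on managed (optionally GPU) instances reading data from S3, and deploy the trained model to a real-time endpoint. A hedged sketch with the SageMaker Python SDK; the script name, IAM role, and S3 path are placeholders:

from sagemaker.pytorch import PyTorch

# Build: package the CNN in a training script (hypothetical train_cnn.py)
estimator = PyTorch(
    entry_point='train_cnn.py',         # your training script
    role='your-iam-role',               # IAM role with SageMaker permissions
    framework_version='1.13',
    py_version='py39',
    instance_count=1,
    instance_type='ml.p3.2xlarge',      # GPU instance for training
)

# Train: SageMaker provisions the instances and reads the data from S3
estimator.fit({'training': 's3://your-bucket-name/train/'})

# Deploy: create a managed HTTPS endpoint for real-time inference
predictor = estimator.deploy(initial_instance_count=1, instance_type='ml.m5.large')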
2Q) Discuss the importance of data preprocessing in training a CNN model. How do
techniques such as normalization and data augmentation contribute to the performance of
the model?
Ans.: Data preprocessing is a crucial step in training a Convolutional Neural Network (CNN)
model. It helps to improve the quality of the input data and ensures the network learns
efficiently, leading to better generalization on unseen data. Here's a breakdown of why
preprocessing is important and how specific techniques like normalization and data
augmentation contribute to CNN performance:
1. Normalization
Normalization is the process of scaling input data into a range that is more conducive to model
training, often to a [0, 1] or [-1, 1] range. CNNs typically perform better when input values are
normalized because:
Faster convergence: When data features are on similar scales, the gradient descent algorithm
(used for training) converges faster. Without normalization, certain features could dominate
during training, leading to slow or unstable convergence.
Improved stability: Neural networks are sensitive to the scale of input values. By normalizing,
we ensure that the model doesn't get "stuck" in certain regions of the loss function due to high
values or large disparities among features.
Consistent weight updates: CNNs rely on backpropagation, which calculates gradients for
weight updates. If features are on vastly different scales, it can lead to inconsistent gradients,
causing issues in learning. Normalization makes sure the gradients are more balanced.
Common normalization techniques include min-max scaling (scaling to [0, 1]) and Z-score
normalization (scaling based on mean and standard deviation).
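Both techniques take only a few lines of NumPy; x below is a hypothetical image array with pixel values in [0, 255]:

import numpy as np

x = np.random.randint(0, 256, size=(32, 32, 3)).astype(np.float32)  # hypothetical image

x_minmax = x / 255.0                    # min-max scaling to [0, 1]
x_zscore = (x - x.mean()) / x.std()     # Z-score normalization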
2. Data Augmentation
Data augmentation involves artificially increasing the size and diversity of the training dataset by
applying various transformations to the existing data, such as rotations, translations, flips, and
color adjustments. This technique helps in several ways:
Prevents overfitting: CNNs, especially deep ones, have a large number of parameters,
which makes them prone to overfitting. By augmenting the dataset, we introduce
variations in the training data, making it harder for the model to memorize specific details
of the training set, thus improving generalization.
Simulates real-world variability: Data augmentation simulates real-world variations
that a model might encounter in production. For instance, images could be captured from
different angles or lighting conditions. By training on a variety of augmented images, the
CNN becomes more robust to such variations.
Enhances model robustness: Augmentation helps the model learn features that are
invariant to certain transformations, such as recognizing an object even when it's rotated
or scaled. This makes the model more adaptable to a wide range of real-world conditions.
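As an illustrative sketch, a typical augmentation pipeline in torchvision (the particular transforms and parameters are assumptions, not requirements):

from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomHorizontalFlip(),                  # random flips
    transforms.RandomRotation(15),                      # random rotations up to ±15°
    transforms.RandomAffine(0, translate=(0.1, 0.1)),   # random translations
    transforms.ColorJitter(brightness=0.2),             # color adjustments
    transforms.ToTensor(),
])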
In addition to normalization and augmentation, there are other preprocessing techniques that can
enhance the performance of a CNN:
Resizing: CNNs generally require input images of a fixed size. Resizing images to the required
dimensions ensures that the network can process them consistently.
Grayscale conversion: For simpler tasks, like digit recognition, converting color images to
grayscale can reduce the complexity and speed up processing while retaining essential features.
Noise reduction: Removing noise from images can help CNNs focus on relevant patterns and
avoid learning unnecessary details.
Taken together, these preprocessing techniques improve a model in three ways:
Efficiency: With preprocessing, especially normalization and data augmentation, the model
learns faster, converges more quickly, and requires fewer epochs to achieve good performance.
Generalization: Both normalization and augmentation improve the model's ability to generalize
to unseen data, which is critical for real-world applications.
Robustness: These techniques help the CNN adapt to a wider variety of inputs and perform
better in diverse situations, making the model more reliable.
3Q) Explain how you would handle data preprocessing and augmentation using AWS
services like Amazon S3
Ans.: Handling data preprocessing and augmentation using AWS services like Amazon S3 can
be a seamless and efficient process due to AWS's robust cloud infrastructure and specialized
machine learning services. Below is a step-by-step explanation of how you could leverage AWS
to preprocess and augment data for training a CNN model:
1. Data Storage and Organization with Amazon S3
Amazon S3 (Simple Storage Service) is widely used to store large amounts of unstructured data,
such as images, videos, or text files. Here's how you would store and manage your data:
Data Ingestion: Upload the raw dataset to an S3 bucket. You can use the AWS Management
Console, AWS CLI, or SDKs to upload large batches of data. For large datasets, you may want to
use AWS S3 Transfer Acceleration to speed up uploads.
Bucket Organization: Organize the data in S3 with clear folder structures (e.g., /train/,
/validation/, /test/, /raw/, /processed/). This way, it’s easier to manage and reference
the data during the preprocessing steps.
Example:
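An illustrative layout consistent with the folders above:

s3://your-bucket-name/
├── raw/            # original uploaded images
├── processed/      # preprocessed and augmented output
├── train/
├── validation/
└── test/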
2. Data Preprocessing and Augmentation Pipeline with AWS Lambda and AWS
SageMaker
Once the data is in S3, you can use various AWS services to handle preprocessing and
augmentation. Two popular services are AWS Lambda (for serverless data transformations) and
AWS SageMaker (for scalable model training and preprocessing).
AWS Lambda allows you to run code without managing servers, which is ideal for lightweight
preprocessing tasks such as image resizing, format conversion, and normalization. You can
create a Lambda function that is triggered by events in S3 (e.g., new files uploaded).
Steps:
1. Set Up Lambda: Create a Lambda function that processes images, such as resizing,
cropping, or normalizing. Lambda supports various programming languages (e.g.,
Python, Node.js).
# Inside the Lambda function: create an S3 client for reading and writing objects
s3_client = boto3.client('s3')
Once the Lambda function processes the image, it stores the preprocessed image back in the S3
bucket (in the /processed/ folder).
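A minimal sketch of such a Lambda handler; it assumes the bucket layout above and that the Pillow imaging library is packaged as a Lambda layer:

import io
import boto3
from PIL import Image   # assumes a Pillow Lambda layer

s3_client = boto3.client('s3')

def lambda_handler(event, context):
    # Triggered by an S3 upload event; locate the new object
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = event['Records'][0]['s3']['object']['key']

    # Download, resize, and re-encode the image
    obj = s3_client.get_object(Bucket=bucket, Key=key)
    img = Image.open(io.BytesIO(obj['Body'].read()))
    img = img.resize((224, 224))        # illustrative target size
    buffer = io.BytesIO()
    img.save(buffer, format='PNG')

    # Store the preprocessed image under /processed/
    out_key = key.replace('raw/', 'processed/', 1)
    s3_client.put_object(Bucket=bucket, Key=out_key, Body=buffer.getvalue())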
For more complex preprocessing tasks, such as applying data augmentation techniques (rotation,
flipping, color adjustments, etc.), you can use AWS SageMaker. SageMaker offers a managed
environment for training machine learning models and running preprocessing pipelines.
from sagemaker.processing import ScriptProcessor, ProcessingInput, ProcessingOutput

processor = ScriptProcessor(
    image_uri='your-docker-image-uri',
    command=['python3'],
    role='your-iam-role',
    instance_count=1,
    instance_type='ml.m5.large'
)

processor.run(
    code='augmentation_script.py',
    inputs=[ProcessingInput(source='s3://your-bucket-name/raw/',
                            destination='/opt/ml/processing/input')],
    outputs=[ProcessingOutput(source='/opt/ml/processing/output',
                              destination='s3://your-bucket-name/processed/')]
)
# augmentation_script.py (the script passed to processor.run above):
# define the Keras augmentation pipeline
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    rotation_range=30,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
    fill_mode='nearest'
)
3. Store Augmented Data: After augmentation, the output images can be saved back to the
processed folder in S3, which is then ready for model training.
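One hedged way to do this inside augmentation_script.py is Keras's flow_from_directory with save_to_dir; the paths match the ProcessingInput/ProcessingOutput configured above, and the target size is illustrative:

# Stream images from the processing input and write augmented copies to the output
gen = datagen.flow_from_directory(
    '/opt/ml/processing/input',
    target_size=(224, 224),
    batch_size=32,
    save_to_dir='/opt/ml/processing/output',
    save_format='png'
)
for _ in range(len(gen)):   # one pass over the data writes augmented batches
    next(gen)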
3. Model Training with Amazon SageMaker
Once the data is preprocessed and augmented, you can use Amazon SageMaker directly to train
the CNN model. SageMaker offers managed Jupyter notebooks, GPU instances, and pre-built
machine learning containers for easy training. You can also create a training pipeline to
automate the entire process.
Steps:
1. Prepare the Data: Specify the S3 location of the processed and augmented images.
2. Configure the Model: Use a built-in TensorFlow, PyTorch, or MXNet container, or bring your
own custom container.
3. Train the Model: Train the model using SageMaker’s managed training environment, where you
can scale based on your requirements (e.g., using GPU instances for faster training).
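A hedged sketch of these three steps with the SageMaker Python SDK; the script name and IAM role are placeholders, and the data channel points at the processed folder populated earlier:

from sagemaker.tensorflow import TensorFlow

# Configure the model: a built-in TensorFlow container running your script
estimator = TensorFlow(
    entry_point='train_cnn.py',        # hypothetical training script
    role='your-iam-role',
    framework_version='2.11',
    py_version='py39',
    instance_count=1,
    instance_type='ml.p3.2xlarge',     # GPU instance for faster training
)

# Train on the preprocessed and augmented data in S3
estimator.fit({'training': 's3://your-bucket-name/processed/'})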
To ensure everything runs smoothly, you can monitor your processing and training jobs using
Amazon CloudWatch. For large-scale data processing and training, you can leverage AWS
Auto Scaling to dynamically adjust resources as needed.
4Q) Explain the architecture and working of a Generative Adversarial Network (GAN).
Describe the roles of the Generator and Discriminator, and how they interact during the
training process. Discuss the challenges and potential solutions in training GANs.
Ans.: Generative Adversarial Networks (GANs) were introduced by Ian Goodfellow and his
colleagues in 2014. GANs are a class of neural networks that autonomously learn patterns in
the input data to generate new examples resembling the original dataset.
GAN’s architecture consists of two neural networks:
1. Generator: creates synthetic data from random noise, aiming to produce samples so
realistic that the discriminator cannot distinguish them from real data.
2. Discriminator: acts as a critic, evaluating whether the data it receives is real or fake.
The two networks are trained adversarially, so that the artificial data comes to closely
resemble the real data.
The two networks engage in a continuous game of cat and mouse: the Generator improves its
ability to create realistic data, while the Discriminator becomes better at detecting fakes. Over
time, this adversarial process leads to the generation of highly realistic and high-quality data.
Detailed Architecture of GANs
Let’s explore the generator and discriminator model of GANs in detail:
1. Generator Model
The generator is a deep neural network that takes random noise as input to generate realistic
data samples (e.g., images or text). It learns the underlying data distribution by adjusting its
parameters through backpropagation.
The generator’s objective is to produce samples that the discriminator classifies as real. The
loss function is:
$$J_G = -\frac{1}{m} \sum_{i=1}^{m} \log D(G(z_i))$$
Where,
$J_G$ measures how well the generator is fooling the discriminator.
$\log D(G(z_i))$ is the log probability that the discriminator classifies the generated
sample $G(z_i)$ as real.
The generator aims to minimize this loss, encouraging it to produce samples that the
discriminator classifies as real ($D(G(z_i))$ close to 1).
2. Discriminator Model
The discriminator acts as a binary classifier, distinguishing between real and generated data.
It learns to improve its classification ability through training, refining its parameters to detect
fake samples more accurately.
When dealing with image data, the discriminator often employs convolutional layers or other
relevant architectures suited to the data type. These layers help extract features and enhance the
model’s ability to differentiate between real and generated samples.
The discriminator minimizes the negative log-likelihood of correctly classifying both real
and generated samples. This loss incentivizes the discriminator to classify real samples as
real and generated samples as fake:
$$J_D = -\frac{1}{m} \sum_{i=1}^{m} \log D(x_i) - \frac{1}{m} \sum_{i=1}^{m} \log\left(1 - D(G(z_i))\right)$$
$J_D$ assesses the discriminator's ability to distinguish generated samples from real ones.
$\log D(x_i)$ is the log-likelihood that the discriminator correctly classifies real data
as real.
$\log(1 - D(G(z_i)))$ is the log-likelihood that the discriminator correctly classifies
generated samples as fake.
By minimizing this loss, the discriminator becomes more effective at distinguishing between
real and generated samples.
MinMax Loss
GANs follow a minimax optimization where the generator and discriminator are adversaries:
$$\min_G \max_D V(G, D) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$$
Where,
$G$ is the generator network and $D$ is the discriminator network.
$x$ represents real data samples drawn from the true data distribution $p_{data}(x)$.
$z$ represents random noise sampled from a prior distribution $p_z(z)$ (usually a normal
or uniform distribution).
D(x) represents the discriminator’s likelihood of correctly identifying actual data as real.
D(G(z)) is the likelihood that the discriminator will identify generated data coming from
the generator as authentic.
The generator aims to minimize the loss, while the discriminator tries to maximize its
classification accuracy.
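The answer below walks through a PyTorch implementation of a GAN trained on CIFAR-10.

Step 1: Importing Libraries
A minimal import block, assuming the standard PyTorch stack used by the code that follows:

import torch
import torch.nn as nn
import torchvision
from torchvision import datasets, transforms
import matplotlib.pyplot as plt
import numpy as np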
# Set device
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
The model will utilize a GPU if available; otherwise, it will default to CPU.
Step 2: Defining Image Transformations
We use PyTorch’s transforms to normalize and convert images into tensors before feeding
them into the model.
# Define a basic transform
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])
Step 3: Loading the CIFAR-10 Dataset
The CIFAR-10 dataset is loaded with predefined transformations. A DataLoader is created to
process the dataset in mini-batches of 32 images, shuffled for randomness.
train_dataset = datasets.CIFAR10(root='./data', train=True,
                                 download=True, transform=transform)
dataloader = torch.utils.data.DataLoader(train_dataset,
                                         batch_size=32, shuffle=True)
Step 4: Defining GAN Hyperparameters
Key hyperparameters are defined:
latent_dim – Dimensionality of the noise vector.
lr – Learning rate of the optimizer.
beta1, beta2 – Adam optimizer coefficients.
num_epochs – Total number of training epochs.
# Hyperparameters
latent_dim = 100
lr = 0.0002
beta1 = 0.5
beta2 = 0.999
num_epochs = 10
Step 5: Building the Generator
The generator takes a random latent vector (z) as input and transforms it into an image through
convolutional, batch normalization, and upsampling layers. The final output uses Tanh
activation to ensure pixel values are within the expected range.
# Define the generator
class Generator(nn.Module):
    def __init__(self, latent_dim):
        super(Generator, self).__init__()
        self.model = nn.Sequential(
            nn.Linear(latent_dim, 128 * 8 * 8),
            nn.ReLU(),
            nn.Unflatten(1, (128, 8, 8)),
            nn.Upsample(scale_factor=2),
            nn.Conv2d(128, 128, kernel_size=3, padding=1),
            nn.BatchNorm2d(128, momentum=0.78),
            nn.ReLU(),
            nn.Upsample(scale_factor=2),
            nn.Conv2d(128, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64, momentum=0.78),
            nn.ReLU(),
            nn.Conv2d(64, 3, kernel_size=3, padding=1),
            nn.Tanh()
        )

    def forward(self, z):
        return self.model(z)
Step 6: Building the Discriminator
The discriminator is a binary classifier built from strided convolutional layers with
LeakyReLU activations and dropout; a final sigmoid outputs the probability that the input
image is real.

# Define the discriminator
class Discriminator(nn.Module):
    def __init__(self):
        super(Discriminator, self).__init__()
        self.model = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1),
            nn.LeakyReLU(0.2),
            nn.Dropout(0.25),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1),
            nn.ZeroPad2d((0, 1, 0, 1)),
            nn.BatchNorm2d(64, momentum=0.82),
            nn.LeakyReLU(0.25),
            nn.Dropout(0.25),
            nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(128, momentum=0.82),
            nn.LeakyReLU(0.2),
            nn.Dropout(0.25),
            nn.Conv2d(128, 256, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(256, momentum=0.8),
            nn.LeakyReLU(0.25),
            nn.Dropout(0.25),
            nn.Flatten(),
            nn.Linear(256 * 5 * 5, 1),
            nn.Sigmoid()
        )

    def forward(self, img):
        return self.model(img)
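Step 7: Defining the Loss Function and Optimizers
Before the training loop, the networks, loss, and optimizers must be instantiated. A minimal sketch consistent with the hyperparameters defined earlier:

# Instantiate the networks on the selected device
generator = Generator(latent_dim).to(device)
discriminator = Discriminator().to(device)

# Binary cross-entropy loss for the real-vs-fake classification game
adversarial_loss = nn.BCELoss()

# Separate Adam optimizers for the generator and the discriminator
optimizer_G = torch.optim.Adam(generator.parameters(), lr=lr, betas=(beta1, beta2))
optimizer_D = torch.optim.Adam(discriminator.parameters(), lr=lr, betas=(beta1, beta2))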
Step 8: Training the GAN
Each iteration first updates the discriminator on a real batch and a generated batch, then
updates the generator to fool the discriminator:

for epoch in range(num_epochs):
    for i, (real_images, _) in enumerate(dataloader):
        real_images = real_images.to(device)

        # Adversarial ground-truth labels
        valid = torch.ones(real_images.size(0), 1, device=device)
        fake = torch.zeros(real_images.size(0), 1, device=device)

        # ---------------------
        #  Train Discriminator
        # ---------------------
        optimizer_D.zero_grad()
        # Sample noise as generator input
        z = torch.randn(real_images.size(0), latent_dim, device=device)
        # Generate a batch of images
        fake_images = generator(z)
        # Discriminator loss on the real batch and the generated batch
        real_loss = adversarial_loss(discriminator(real_images), valid)
        fake_loss = adversarial_loss(discriminator(fake_images.detach()), fake)
        d_loss = (real_loss + fake_loss) / 2
        d_loss.backward()
        optimizer_D.step()

        # -----------------
        #  Train Generator
        # -----------------
        optimizer_G.zero_grad()
        # Generate a batch of images
        gen_images = generator(z)
        # Adversarial loss
        g_loss = adversarial_loss(discriminator(gen_images), valid)
        # Backward pass and optimize
        g_loss.backward()
        optimizer_G.step()

        # ---------------------
        #  Progress Monitoring
        # ---------------------
        if (i + 1) % 100 == 0:
            print(
                f"Epoch [{epoch+1}/{num_epochs}] "
                f"Batch {i+1}/{len(dataloader)} "
                f"Discriminator Loss: {d_loss.item():.4f} "
                f"Generator Loss: {g_loss.item():.4f}"
            )

    # Save a grid of generated images every 10th epoch
    if (epoch + 1) % 10 == 0:
        with torch.no_grad():
            z = torch.randn(16, latent_dim, device=device)
            generated = generator(z).detach().cpu()
        grid = torchvision.utils.make_grid(generated, nrow=4, normalize=True)
        plt.imshow(np.transpose(grid, (1, 2, 0)))
        plt.axis("off")
        plt.show()
Output:
Epoch [10/10] Batch 1300/1563 Discriminator Loss: 0.4473 Generator Loss: 0.9555
Epoch [10/10] Batch 1400/1563 Discriminator Loss: 0.6643 Generator Loss: 1.0215
Epoch [10/10] Batch 1500/1563 Discriminator Loss: 0.4720 Generator Loss: 2.5027