
DEEP LEARNING

MODULE V

Classical Supervised Tasks with Deep Learning


At its core, much of deep learning's practical utility stems from its mastery of
classical supervised tasks, where models learn directly from labeled data to
make accurate predictions. Classification is a prime example, enabling
machines to categorize inputs into predefined classes. Whether it involves
distinguishing between spam and legitimate emails in a binary classification
scenario, or identifying multiple types of objects in an image through multi-
class classification, deep learning models—particularly Convolutional Neural
Networks (CNNs) for images and Transformers for language—have achieved
unparalleled accuracy. Complementing this is regression, where deep
learning models are adept at predicting continuous numerical values. This is
crucial for applications ranging from forecasting real estate prices to
predicting stock market trends or even estimating a person's age from a
photograph, allowing models to discern intricate, non-linear relationships
within complex datasets.
Beyond simple categorization, deep learning has revolutionized image
analysis. Object detection goes a step further than mere classification by not
only identifying what objects are present but also precisely locating them
with bounding boxes. This capability is indispensable for autonomous
vehicles recognizing pedestrians and traffic signs, or for security systems
monitoring specific items. Even more granular is image segmentation, which
achieves a pixel-level understanding of an image by assigning a semantic
label to every single pixel. This fine-grained analysis is invaluable in medical
imaging for pinpointing anomalies or in self-driving cars for delineating road
boundaries from surrounding environments. Finally, sequence-to-sequence
(Seq2Seq) tasks have seen monumental advancements thanks to deep
learning. Models like Recurrent Neural Networks (RNNs) and Transformers
excel at translating input sequences into output sequences of varying
lengths, enabling breakthroughs in machine translation, text summarization,
and speech recognition, fundamentally altering how we interact with and
process linguistic information.
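
Sequence-to-Sequence Sketch (TensorFlow/Keras):


A minimal encoder-decoder sketch of the Seq2Seq idea described above. The vocabulary size and hidden dimension (num_tokens, latent_dim) are illustrative assumptions, not values from the text; a real system would also need tokenized training data and a step-by-step inference loop.

from tensorflow.keras import layers, models

# Toy sizes, assumed only for illustration
num_tokens, latent_dim = 64, 256

# Encoder: reads the input sequence and keeps only its final LSTM state
encoder_inputs = layers.Input(shape=(None, num_tokens))
_, state_h, state_c = layers.LSTM(latent_dim, return_state=True)(encoder_inputs)

# Decoder: generates the output sequence, initialized with the encoder state
decoder_inputs = layers.Input(shape=(None, num_tokens))
decoder_outputs = layers.LSTM(latent_dim, return_sequences=True)(
    decoder_inputs, initial_state=[state_h, state_c])
decoder_outputs = layers.Dense(num_tokens, activation='softmax')(decoder_outputs)

model = models.Model([encoder_inputs, decoder_inputs], decoder_outputs)
model.compile(optimizer='adam', loss='categorical_crossentropy')
model.summary()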

Image Classification with CNN (TensorFlow/Keras):


import tensorflow as tf
from tensorflow.keras import datasets, layers, models

# Load dataset (CIFAR-10)
(train_images, train_labels), (test_images, test_labels) = datasets.cifar10.load_data()

# Normalize pixel values to be between 0 and 1
train_images, test_images = train_images / 255.0, test_images / 255.0

# Build CNN model
model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dense(10, activation='softmax')
])

# Compile and train
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(train_images, train_labels, epochs=10)

# Evaluate
test_loss, test_acc = model.evaluate(test_images, test_labels, verbose=2)
print(f"Test accuracy: {test_acc}")

Example - Regression Model (TensorFlow/Keras):


from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from tensorflow.keras import layers, models

# Load dataset (California Housing; load_boston was removed from scikit-learn 1.2+)
housing = fetch_california_housing()
X_train, X_test, y_train, y_test = train_test_split(housing.data, housing.target,
                                                    test_size=0.2)

# Build the regression model
model = models.Sequential([
    layers.Dense(64, activation='relu', input_dim=X_train.shape[1]),
    layers.Dense(32, activation='relu'),
    layers.Dense(1)  # Output layer for regression
])

# Compile and train
model.compile(optimizer='adam', loss='mean_squared_error')
model.fit(X_train, y_train, epochs=10, batch_size=32)

# Evaluate
mse = model.evaluate(X_test, y_test)
print(f"Mean Squared Error: {mse}")
Image Denoising
Image denoising is a crucial practical application where deep learning
significantly enhances image quality by intelligently removing unwanted
noise while meticulously preserving the image's inherent details and
structures. Noise, often introduced during image acquisition in low-light
conditions or due to sensor limitations, can severely degrade visual clarity.
While traditional methods like spatial or transform domain filtering existed,
they frequently struggled to balance noise reduction with detail preservation,
often leading to blurry results. Deep learning, primarily through
Convolutional Neural Networks (CNNs), has transformed this field. These
networks are rigorously trained on extensive datasets of paired noisy and
clean images, allowing them to learn intricate noise patterns and discern
them from true image content. Architectures often resemble encoder-
decoder structures (similar to U-Net) or employ residual networks (like
DnCNN), strategically incorporating skip connections to ensure that low-level,
fine-grained details are retained throughout the denoising process. The
profound advantage of deep learning in denoising lies in its superior ability to
adapt to diverse noise characteristics, its efficacy in preserving subtle
textures and sharp edges, and its impressive speed once trained.
Consequently, it's an indispensable tool for enhancing the clarity of medical
scans, improving photographs taken in challenging lighting conditions, and
restoring historical or damaged visual media.

Image Denoising with a Simple CNN (PyTorch):


import torch
import torch.nn as nn
import torchvision.transforms as transforms
from torchvision import datasets
import matplotlib.pyplot as plt

# Define simple CNN for denoising
class DnCNN(nn.Module):
    def __init__(self):
        super(DnCNN, self).__init__()
        self.conv1 = nn.Conv2d(1, 64, 3, padding=1)
        self.conv2 = nn.Conv2d(64, 64, 3, padding=1)
        self.conv3 = nn.Conv2d(64, 1, 3, padding=1)

    def forward(self, x):
        x = torch.relu(self.conv1(x))
        x = torch.relu(self.conv2(x))
        x = self.conv3(x)
        return x

# Load image (MNIST); ToTensor keeps pixel values in [0, 1]
transform = transforms.ToTensor()
dataset = datasets.MNIST(root='./data', train=False, download=True,
                         transform=transform)
image, _ = dataset[0]

# Add Gaussian noise to the image and clip back to the valid [0, 1] range
noisy_image = image + torch.randn_like(image) * 0.5
noisy_image = torch.clamp(noisy_image, 0, 1)

# Forward pass through the (here untrained) CNN; in practice the network
# would first be trained on pairs of noisy and clean images
model = DnCNN()
denoised_image = model(noisy_image.unsqueeze(0))

# Plot images
plt.subplot(1, 3, 1)
plt.imshow(image.squeeze(), cmap='gray')
plt.title("Original")
plt.subplot(1, 3, 2)
plt.imshow(noisy_image.squeeze(), cmap='gray')
plt.title("Noisy")
plt.subplot(1, 3, 3)
plt.imshow(denoised_image.squeeze().detach(), cmap='gray')
plt.title("Denoised")
plt.show()
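
The block above only pushes a noisy digit through an untrained network. A minimal training-loop sketch for the same DnCNN, assuming the class defined above and noisy inputs synthesized on the fly from clean MNIST digits (the noise level of 0.5 and two epochs are illustrative choices):

import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Reuse the DnCNN class defined above
model = DnCNN()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.MSELoss()

train_set = datasets.MNIST(root='./data', train=True, download=True,
                           transform=transforms.ToTensor())
loader = DataLoader(train_set, batch_size=128, shuffle=True)

for epoch in range(2):  # a couple of epochs just to show the loop
    for clean, _ in loader:
        noisy = torch.clamp(clean + 0.5 * torch.randn_like(clean), 0, 1)
        output = model(noisy)            # predicted clean image
        loss = criterion(output, clean)  # compare against the clean target
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    print(f"Epoch {epoch + 1}, loss: {loss.item():.4f}")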

Semantic Segmentation
Semantic segmentation represents a sophisticated leap in computer vision's
understanding of images, moving beyond simple object identification to
perform pixel-level classification. This means that instead of just drawing
boxes around objects, the model assigns a specific semantic label (like
"road," "sky," "person," or "car") to every single pixel in an image. This
provides an exceptionally granular and precise understanding of the scene.
While traditional methods relied on handcrafted features and conventional
classifiers, deep learning, particularly with the advent of Fully Convolutional
Networks (FCNs) and encoder-decoder architectures such as U-Net and
DeepLab, has led to revolutionary progress. These specialized networks are
designed to process an input image and generate an output segmentation
map of identical dimensions, where each pixel corresponds to a predicted
class label. The encoder part of the network extracts high-level semantic
features by progressively reducing spatial resolution, while the decoder then
reconstructs the full-resolution segmentation map. Skip connections are vital
in this process, transferring fine-grained spatial information from the encoder
directly to the decoder, which helps in preserving sharp boundaries and
intricate details that might otherwise be lost during downsampling.
Furthermore, techniques like dilated convolutions are employed to expand
the receptive field of filters without sacrificing resolution, allowing the
network to capture broader contextual information. Semantic segmentation
is foundational for critical applications like autonomous driving, where
accurate pixel-level environmental understanding is paramount for safe
navigation; medical image analysis, enabling precise delineation of organs or
pathological structures for diagnosis and treatment planning; and remote
sensing, for detailed classification of land cover in satellite imagery.

Semantic Segmentation using U-Net (TensorFlow/Keras):


from tensorflow.keras import layers, models

# Define a simple U-Net architecture
def unet(input_size=(128, 128, 3)):
    inputs = layers.Input(input_size)

    # Encoder
    conv1 = layers.Conv2D(64, 3, activation='relu', padding='same')(inputs)
    conv1 = layers.Conv2D(64, 3, activation='relu', padding='same')(conv1)
    pool1 = layers.MaxPooling2D(pool_size=(2, 2))(conv1)

    conv2 = layers.Conv2D(128, 3, activation='relu', padding='same')(pool1)
    conv2 = layers.Conv2D(128, 3, activation='relu', padding='same')(conv2)
    pool2 = layers.MaxPooling2D(pool_size=(2, 2))(conv2)

    # Decoder with skip connections at matching resolutions
    up1 = layers.UpSampling2D(size=(2, 2))(pool2)
    up1 = layers.Concatenate()([conv2, up1])
    conv3 = layers.Conv2D(128, 3, activation='relu', padding='same')(up1)
    conv3 = layers.Conv2D(128, 3, activation='relu', padding='same')(conv3)

    up2 = layers.UpSampling2D(size=(2, 2))(conv3)
    up2 = layers.Concatenate()([conv1, up2])
    conv4 = layers.Conv2D(64, 3, activation='relu', padding='same')(up2)
    conv4 = layers.Conv2D(64, 3, activation='relu', padding='same')(conv4)

    outputs = layers.Conv2D(1, 1, activation='sigmoid')(conv4)

    model = models.Model(inputs=inputs, outputs=outputs)
    return model

# Create and compile the model
model = unet()
model.compile(optimizer='adam', loss='binary_crossentropy',
              metrics=['accuracy'])

# Assume X_train, Y_train are your training images and labels
model.fit(X_train, Y_train, epochs=10)
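
The discussion above also mentions dilated (atrous) convolutions, which are not part of this small U-Net. A brief illustrative sketch (layer widths and input shape are arbitrary assumptions) showing how stacking increasing dilation rates grows the receptive field while the spatial resolution stays fixed:

from tensorflow.keras import layers, models

# A 3x3 convolution with dilation_rate=2 covers a 5x5 neighbourhood with the
# same number of weights; padding='same' keeps the output resolution unchanged
dilated_block = models.Sequential([
    layers.Conv2D(64, 3, dilation_rate=1, padding='same', activation='relu',
                  input_shape=(128, 128, 64)),
    layers.Conv2D(64, 3, dilation_rate=2, padding='same', activation='relu'),
    layers.Conv2D(64, 3, dilation_rate=4, padding='same', activation='relu'),
])
dilated_block.summary()  # spatial size stays 128x128 throughout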

Object Detection
Object detection is one of the most dynamic areas of deep learning, focused
on both identifying the classes of objects in an image and precisely localizing
them by drawing bounding boxes around the detected objects. Unlike simple
image classification, where the task is to assign a single label to an entire
image, object detection requires the model to simultaneously solve two
problems: classification (identifying what objects are present) and
localization (determining where the objects are). Early object detection
models relied on traditional computer vision techniques, but deep learning,
particularly through architectures like YOLO (You Only Look Once) and Faster
R-CNN, has fundamentally transformed the space. These models utilize CNN-
based feature extractors that scan the image at multiple scales and locations
to identify objects, followed by region proposal networks (RPN) to
hypothesize bounding boxes. As a result, object detection is now fast,
accurate, and capable of working in real-time. Applications abound in various
fields: autonomous vehicles for recognizing pedestrians and other vehicles,
surveillance systems for identifying and tracking individuals or objects of
interest, and robotics for interacting with and manipulating objects in
complex environments.

Object Detection using YOLO (via PyTorch):

# Clone YOLOv5 repository and install requirements


!git clone https://github.com/ultralytics/yolov5
%cd yolov5
!pip install -r requirements.txt

# Run object detection with pre-trained YOLOv5 model


!python detect.py --source test_images/ --weights yolov5s.pt
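
The detect.py script runs from the command line; YOLOv5 can also be used programmatically through torch.hub. A short inference sketch (the image path is a placeholder, and the model weights are downloaded on first use):

import torch

# Load a small pre-trained YOLOv5 model from the ultralytics hub
model = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)

# Run inference on an image (path is a placeholder)
results = model('test_images/example.jpg')

# Print detected classes, confidences and bounding boxes
results.print()
print(results.pandas().xyxy[0])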

Generative Modeling with Deep Learning


Generative models are designed to learn the underlying distribution of data
in order to generate new instances that are indistinguishable from the
original data. Unlike discriminative models, which focus on classification or
regression, generative models aim to capture the full data distribution.
Popular approaches to generative modeling include Generative Adversarial
Networks (GANs), Variational Autoencoders (VAEs), and, more recently,
Diffusion Models. GANs, introduced by Ian Goodfellow, employ a two-player
game between a generator network, which attempts to create realistic data,
and a discriminator network, which tries to distinguish between real and
generated data. Over time, the generator improves at producing realistic
data that eventually fools the discriminator. VAEs, on the other hand, take a
probabilistic approach, encoding data into a lower-dimensional latent space
from which new data samples can be drawn. Diffusion models, gaining
popularity in recent years, work by progressively adding noise to data and
then learning how to reverse this noise process to recover original data,
showing impressive results in tasks like image generation. These generative
models have profound implications for various applications, including image
synthesis, drug discovery, and even art generation.
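
The learned denoising network of a diffusion model is beyond the scope of this module, but the forward noising process it is trained to invert can be sketched directly. A small numerical sketch, assuming a simple linear beta schedule (values chosen only for illustration):

import numpy as np

# Forward diffusion: progressively mix a clean sample x0 with Gaussian noise.
T = 1000
betas = np.linspace(1e-4, 0.02, T)          # illustrative linear noise schedule
alphas_cumprod = np.cumprod(1.0 - betas)

def q_sample(x0, t, noise):
    """Sample x_t ~ q(x_t | x_0) in closed form."""
    return np.sqrt(alphas_cumprod[t]) * x0 + np.sqrt(1.0 - alphas_cumprod[t]) * noise

x0 = np.random.rand(28, 28)                 # stand-in for a clean image
noise = np.random.randn(28, 28)
x_500 = q_sample(x0, 500, noise)            # heavily noised version of x0
print(x_500.shape)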

Generating Images with a Simple GAN (TensorFlow/Keras):


import tensorflow as tf
from tensorflow.keras import layers, models

# Define a simple GAN (Generator and Discriminator)
def build_generator():
    model = models.Sequential([
        layers.Dense(128, activation='relu', input_dim=100),
        layers.Dense(784, activation='sigmoid'),
        layers.Reshape((28, 28, 1))
    ])
    return model

def build_discriminator():
    model = models.Sequential([
        layers.Flatten(input_shape=(28, 28, 1)),
        layers.Dense(128, activation='relu'),
        layers.Dense(1, activation='sigmoid')
    ])
    return model

# Compile models
generator = build_generator()
discriminator = build_discriminator()
discriminator.compile(optimizer='adam', loss='binary_crossentropy')

# GAN Model (combined model): the discriminator is frozen while the generator trains
discriminator.trainable = False
gan_input = layers.Input(shape=(100,))
x = generator(gan_input)
gan_output = discriminator(x)
gan = models.Model(gan_input, gan_output)
gan.compile(optimizer='adam', loss='binary_crossentropy')

# Sample images from random noise (the networks are still untrained here)
import numpy as np

random_noise = np.random.randn(32, 100)
fake_images = generator.predict(random_noise)

# Visualize generated images
import matplotlib.pyplot as plt
plt.imshow(fake_images[0, :, :, 0], cmap='gray')
plt.show()
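
The code above only wires the generator and discriminator together and samples from the untrained generator. A minimal adversarial training loop on MNIST, assuming the generator, discriminator and gan models defined above (batch size and number of steps are illustrative):

import numpy as np
import tensorflow as tf

# Real MNIST digits scaled to [0, 1] to match the generator's sigmoid output
(x_train, _), _ = tf.keras.datasets.mnist.load_data()
x_train = np.expand_dims(x_train, -1).astype("float32") / 255.

batch_size = 32
for step in range(1000):
    # 1) Train the discriminator on a mix of real and generated images
    noise = np.random.randn(batch_size, 100)
    fake = generator.predict(noise, verbose=0)
    real = x_train[np.random.randint(0, len(x_train), batch_size)]
    d_loss_real = discriminator.train_on_batch(real, np.ones((batch_size, 1)))
    d_loss_fake = discriminator.train_on_batch(fake, np.zeros((batch_size, 1)))

    # 2) Train the generator through the combined model, trying to make the
    #    frozen discriminator output "real" for generated images
    noise = np.random.randn(batch_size, 100)
    g_loss = gan.train_on_batch(noise, np.ones((batch_size, 1)))

    if step % 200 == 0:
        print(f"step {step}: d_real={d_loss_real:.3f}, "
              f"d_fake={d_loss_fake:.3f}, g={g_loss:.3f}")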

Variational Autoencoders (VAEs)


Variational Autoencoders (VAEs) are a powerful class of generative models
that extend the traditional autoencoder architecture by introducing a
probabilistic framework. VAEs are designed to model the underlying
distribution of input data in a more structured and continuous latent space.
Unlike regular autoencoders, which aim to reconstruct data by compressing it
into a fixed latent representation, VAEs model the latent variables as
distributions rather than fixed values. This is achieved by encoding input
data into a probability distribution over the latent space, typically a Gaussian
distribution, with mean and variance as parameters. By sampling from this
distribution, VAEs can generate new data that is similar to the training data,
making them useful for tasks like image generation, data interpolation, and
anomaly detection.
In training a VAE, the goal is to minimize two objectives: the reconstruction
error, which ensures that the generated data is similar to the original data,
and the KL divergence loss, which ensures that the learned latent space
has a smooth, continuous structure. This combination of losses makes VAEs
more stable and easier to train compared to other generative models like
GANs. Furthermore, because VAEs learn a continuous latent space, they
allow for meaningful operations such as interpolation between data points in
the latent space. This has many applications, including image generation,
denoising, and even style transfer. One of the key strengths of VAEs is their
ability to produce coherent and diverse samples, even with relatively small
datasets. Variational Autoencoders have found applications in fields such as
healthcare, finance, and creative arts, where they are used for tasks like
generating new molecules, generating synthetic medical images, or creating
new artistic designs.
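
For the diagonal Gaussian encoder used in practice, q(z|x) = N(mu, diag(sigma^2)), and a standard normal prior p(z) = N(0, I), the KL term has a simple closed form:

D_{KL}\big(q(z\mid x)\,\|\,p(z)\big) = -\tfrac{1}{2}\sum_{j=1}^{d}\left(1 + \log\sigma_j^{2} - \mu_j^{2} - \sigma_j^{2}\right)

This is the expression implemented in the code below; note that the code averages rather than sums over the d latent dimensions, which only rescales the term.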

Simple VAE in Python (TensorFlow/Keras):


import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras import backend as K
import numpy as np
import matplotlib.pyplot as plt

# Sampling function for the reparameterization trick
def sampling(args):
    z_mean, z_log_var = args
    batch = K.shape(z_mean)[0]
    dim = K.int_shape(z_mean)[1]
    epsilon = K.random_normal(shape=(batch, dim))
    return z_mean + K.exp(0.5 * z_log_var) * epsilon

# Encoder: learns to map inputs to latent space distributions
def build_encoder(latent_dim):
    inputs = layers.Input(shape=(28, 28, 1))
    x = layers.Flatten()(inputs)
    x = layers.Dense(128, activation='relu')(x)
    z_mean = layers.Dense(latent_dim, name='z_mean')(x)
    z_log_var = layers.Dense(latent_dim, name='z_log_var')(x)

    # Reparameterization trick
    z = layers.Lambda(sampling, output_shape=(latent_dim,))([z_mean, z_log_var])

    return models.Model(inputs, [z_mean, z_log_var, z], name="encoder")

# Decoder: reconstructs the image from the latent representation
def build_decoder(latent_dim):
    latent_inputs = layers.Input(shape=(latent_dim,))
    x = layers.Dense(128, activation='relu')(latent_inputs)
    x = layers.Dense(28 * 28, activation='sigmoid')(x)
    x = layers.Reshape((28, 28, 1))(x)

    return models.Model(latent_inputs, x, name="decoder")

# Build VAE model
latent_dim = 2
encoder = build_encoder(latent_dim)
decoder = build_decoder(latent_dim)

# Define the VAE as a model
inputs = layers.Input(shape=(28, 28, 1))
z_mean, z_log_var, z = encoder(inputs)
reconstructed = decoder(z)

vae = models.Model(inputs, reconstructed, name="vae")

# Loss function (reconstruction loss + KL divergence loss)
reconstruction_loss = tf.keras.losses.binary_crossentropy(
    K.flatten(inputs), K.flatten(reconstructed)) * 28 * 28
kl_loss = -0.5 * K.mean(1 + z_log_var - K.square(z_mean) - K.exp(z_log_var), axis=-1)
vae_loss = K.mean(reconstruction_loss + kl_loss)

vae.add_loss(vae_loss)
vae.compile(optimizer='adam')

# Load MNIST dataset for training
(x_train, _), (x_test, _) = tf.keras.datasets.mnist.load_data()
x_train = np.expand_dims(x_train, -1).astype("float32") / 255.
x_test = np.expand_dims(x_test, -1).astype("float32") / 255.

# Train the VAE model
vae.fit(x_train, epochs=10, batch_size=128)

# Generate new samples
random_latent_vectors = np.random.normal(size=(10, latent_dim))
generated_images = decoder.predict(random_latent_vectors)

# Plot the generated images
for i in range(10):
    plt.subplot(1, 10, i + 1)
    plt.imshow(generated_images[i].reshape(28, 28), cmap='gray')
    plt.axis('off')
plt.show()

1. Encoder Model: The encoder maps input data to a latent space. It
   consists of a dense layer that outputs two vectors: the mean (z_mean)
   and log-variance (z_log_var) of a Gaussian distribution in the latent
   space. The reparameterization trick ensures that gradients can be
   backpropagated through this sampling process.
2. Decoder Model: The decoder takes samples from the latent space
and reconstructs the original data. It consists of dense layers followed
by reshaping to the original image shape.
3. VAE Loss Function: The loss function consists of two components:
o Reconstruction loss: Measures how well the decoder
reconstructs the input.
o KL Divergence loss: Ensures that the latent space follows a
normal distribution.
4. Training: We use the MNIST dataset for training the VAE model. The
model learns to encode the images into a lower-dimensional latent
space and then decode them back into the original images.
5. Generating New Data: Once the model is trained, we can sample
from the latent space and generate new images by passing the
samples through the decoder.
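
Because the latent space is continuous, the interpolation mentioned earlier can be performed directly on encoded digits. A short sketch, assuming the trained encoder, decoder and x_test arrays from the VAE example above (ten interpolation steps is an arbitrary choice):

import numpy as np
import matplotlib.pyplot as plt

# Encode two test digits and walk linearly between their latent means
z_a = encoder.predict(x_test[:1])[0]   # z_mean of the first test image
z_b = encoder.predict(x_test[1:2])[0]  # z_mean of the second test image

steps = 10
for i, t in enumerate(np.linspace(0.0, 1.0, steps)):
    z_interp = (1 - t) * z_a + t * z_b
    image = decoder.predict(z_interp)
    plt.subplot(1, steps, i + 1)
    plt.imshow(image.reshape(28, 28), cmap='gray')
    plt.axis('off')
plt.show()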

Object Recognition
Object recognition is a fundamental task in computer vision that involves
identifying and classifying objects within an image or video. Unlike traditional
image classification, which assigns a single label to an entire image, object
recognition involves both the localization (finding the position of the object)
and classification (assigning a category label to the object). Deep learning
models, particularly Convolutional Neural Networks (CNNs), have
significantly advanced the field of object recognition, enabling applications in
diverse areas such as autonomous driving, surveillance systems, and
robotics. Object recognition tasks typically rely on bounding boxes, which
highlight the objects in the image and label them with a specific class (such
as “cat,” “dog,” “car,” etc.).
Over the years, different approaches to object recognition have been
proposed, including region-based methods like R-CNN (Region-based
Convolutional Neural Networks) and Faster R-CNN, which use a two-
step process: first generating potential object regions and then classifying
them. More recently, single-stage detectors like YOLO (You Only Look
Once) and SSD (Single Shot MultiBox Detector) have been developed,
which directly predict bounding boxes and class labels in a single forward
pass through the network. YOLO is known for its speed and ability to perform
real-time object detection, while Faster R-CNN offers higher accuracy but at
the cost of slower processing speed. Object recognition models are trained
on large datasets like COCO (Common Objects in Context) and PASCAL VOC
to learn features that are useful for identifying and localizing objects in
diverse environments.
These advancements in object recognition have had wide-reaching
implications. For example, in autonomous vehicles, object recognition is
crucial for detecting pedestrians, traffic signs, and other vehicles in real-
time. In surveillance, it helps in tracking suspicious activities or monitoring
specific areas. Moreover, in retail, object recognition is used to monitor
inventory, while in healthcare, it aids in analyzing medical scans by
detecting abnormalities like tumors.

Object Recognition with YOLO in Python (Using OpenCV):


import cv2
import numpy as np

# Load YOLO pre-trained weights and configuration
net = cv2.dnn.readNet("yolov3.weights", "yolov3.cfg")
layer_names = net.getLayerNames()
output_layers = [layer_names[i - 1] for i in net.getUnconnectedOutLayers()]

# Load COCO class labels
with open("coco.names", "r") as f:
    classes = [line.strip() for line in f.readlines()]

# Load the input image
image = cv2.imread("test_image.jpg")
height, width, channels = image.shape

# Prepare image for YOLO network
blob = cv2.dnn.blobFromImage(image, 0.00392, (416, 416), (0, 0, 0), True, crop=False)
net.setInput(blob)
outs = net.forward(output_layers)

# Process detections
class_ids, confidences, boxes = [], [], []
for out in outs:
    for detection in out:
        scores = detection[5:]
        class_id = np.argmax(scores)
        confidence = scores[class_id]
        if confidence > 0.5:
            center_x = int(detection[0] * width)
            center_y = int(detection[1] * height)
            w = int(detection[2] * width)
            h = int(detection[3] * height)
            x = center_x - w // 2
            y = center_y - h // 2
            boxes.append([x, y, w, h])
            confidences.append(float(confidence))
            class_ids.append(class_id)

# Non-Maximum Suppression to remove redundant boxes
indices = cv2.dnn.NMSBoxes(boxes, confidences, 0.5, 0.4)

# Draw bounding boxes and class labels
for i in indices.flatten():
    x, y, w, h = boxes[i]
    label = str(classes[class_ids[i]])
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.putText(image, label, (x, y - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.6,
                (0, 255, 0), 2)

# Display the output image
cv2.imshow("Object Detection", image)
cv2.waitKey(0)
cv2.destroyAllWindows()

1. Load YOLO Model: We start by loading the pre-trained YOLO model
using the cv2.dnn.readNet() function, which loads both the model
weights (yolov3.weights) and the configuration (yolov3.cfg).
2. Class Labels: We load the class labels from the coco.names file, which
contains the names of the 80 object categories YOLO can detect.
3. Image Preprocessing: We load an image (test_image.jpg) and
convert it into a format that can be passed into the YOLO model using
cv2.dnn.blobFromImage(). The image is resized to 416x416 and
normalized for the model.
4. Forward Pass: The image is passed through the network using
net.forward(). The output consists of bounding box predictions, class
labels, and confidence scores.
5. Bounding Box Processing: We iterate over the outputs and extract
the bounding boxes, class labels, and confidence scores. If the
confidence is above a threshold (0.5), we save the bounding box
coordinates.
6. Non-Maximum Suppression (NMS): After obtaining multiple
bounding boxes, we use Non-Maximum Suppression (NMS) to filter
out redundant boxes. This step helps in removing multiple detections
of the same object.
7. Display Results: Finally, we draw the bounding boxes and labels on
the image and display the result using OpenCV's imshow function.
