
UNIT-4

Convolutional Neural Networks (CNNs) in Deep Learning:


Convolutional Neural Networks (CNNs) are widely used in deep learning for
processing image and video data. They are designed to automatically learn and
extract features from input data, making them particularly effective for tasks
such as image classification, object detection, and segmentation. CNNs consist
of several key components, including convolutional layers that apply filters to
the input, pooling layers that reduce the spatial dimensions, and fully
connected layers used for making predictions. These layers work together to
capture hierarchical patterns and features from data, helping the model
understand complex structures.
Key Components
 Convolutional Layers: Apply filters to extract features from the input data, typically images.
 Pooling Layers: Downsample the data by reducing the spatial dimensions while preserving important features (see the pooling sketch after this list).
 Fully Connected Layers: Used at the end to make predictions or classifications based on the extracted features.
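For instance, here is a minimal NumPy sketch (independent of any library) of how 2x2 max pooling halves the spatial dimensions while keeping the strongest activation in each window:

import numpy as np

def max_pool_2x2(feature_map):
    # 2x2 max pooling with stride 2 on one channel (a minimal sketch)
    h, w = feature_map.shape
    pooled = np.zeros((h // 2, w // 2))
    for i in range(0, h - 1, 2):
        for j in range(0, w - 1, 2):
            pooled[i // 2, j // 2] = feature_map[i:i + 2, j:j + 2].max()
    return pooled

fmap = np.array([[1, 3, 2, 4],
                 [5, 6, 1, 2],
                 [7, 2, 8, 3],
                 [1, 4, 6, 5]], dtype=float)
print(max_pool_2x2(fmap))  # 4x4 input -> 2x2 output: [[6. 4.] [7. 8.]]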
Working Principle
CNNs utilize local connectivity and hierarchical feature extraction. They
automatically learn to extract features such as edges, textures, and patterns
through successive layers.
Advantages
 Ability to automatically learn complex patterns from data without
manual feature extraction.
 Performs well with spatial and temporal data, especially images and
videos.
Applications
 Image Classification: Categorizing images into different classes.
 Object Detection: Locating objects within images or videos.
 Segmentation: Dividing images into meaningful segments or regions.
Architecture
A typical CNN consists of multiple layers (a minimal Keras sketch of this stack follows the list):
1. Input Layer
2. Convolutional Layers
3. Pooling Layers
4. Fully Connected Layers
5. Output Layer
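The following is a minimal, hypothetical Keras sketch of this five-layer stack; the input shape and layer sizes are arbitrary choices for illustration:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

model = Sequential([
    Conv2D(16, (3, 3), activation='relu', input_shape=(28, 28, 1)),  # 1-2. Input + convolutional layer
    MaxPooling2D((2, 2)),                                            # 3. Pooling layer
    Flatten(),
    Dense(64, activation='relu'),                                    # 4. Fully connected layer
    Dense(10, activation='softmax')                                  # 5. Output layer (10 classes)
])
model.summary()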
Training Process
 CNNs are trained using backpropagation and optimization techniques
such as stochastic gradient descent (SGD).
 They require a large amount of labeled data for supervised learning.
Feature Maps
 Each layer generates feature maps, which represent different levels of abstraction and complexity (they can be inspected as sketched below).
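For instance, the feature maps of an intermediate Keras layer can be read out by wrapping that layer's output in a new model (a sketch using a tiny stand-in CNN with the same shapes as the architecture example above):

import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

# A tiny stand-in CNN (untrained, for demonstration only)
model = tf.keras.Sequential([
    layers.Conv2D(16, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    layers.MaxPooling2D((2, 2)),
])

# Expose the first layer's feature maps as a model output
feature_extractor = tf.keras.Model(inputs=model.inputs, outputs=model.layers[0].output)
maps = feature_extractor(np.random.rand(1, 28, 28, 1))  # One dummy 28x28 image
print(maps.shape)  # (1, 26, 26, 16): sixteen 26x26 feature maps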
Challenges
 Require large computational power for training and inference.
 Vulnerable to adversarial attacks with small perturbations causing
significant misclassifications.
Representation Learning
Representation learning is a machine learning technique that automatically
discovers useful features from data, eliminating the need for manual feature
engineering. These features, often in the form of vectors or embeddings,
capture important data characteristics and can be used for tasks like
classification, clustering, and retrieval.
Unlike traditional methods that require domain expertise to design features,
representation learning learns directly from data. It uses techniques like
unsupervised learning (e.g., autoencoders and generative models like GANs
and VAEs), supervised learning (e.g., CNNs trained on labeled datasets), and
semi-supervised learning, which combines labeled and unlabeled data.
Autoencoders compress data into a simpler form during training, creating
feature vectors, while generative models like GANs learn to generate similar
data samples. Supervised learning is widely applied in areas like computer
vision, where CNNs trained on datasets like ImageNet produce robust features
for tasks such as image classification and object detection.
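As a small illustration of the autoencoder idea, the Keras sketch below (with arbitrary sizes) compresses 784-dimensional inputs into 32-dimensional feature vectors by learning to reconstruct its own input:

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

autoencoder = Sequential([
    Dense(32, activation='relu', input_shape=(784,)),  # Encoder: compress to a 32-dim embedding
    Dense(784, activation='sigmoid')                   # Decoder: reconstruct the input
])
autoencoder.compile(optimizer='adam', loss='mse')

X = np.random.rand(100, 784)                # Dummy data standing in for flattened images
autoencoder.fit(X, X, epochs=5, verbose=0)  # Target equals input: learn to reconstruct

embeddings = autoencoder.layers[0](X)  # The learned 32-dim representations (feature vectors)
print(embeddings.shape)  # (100, 32)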
Representation learning is crucial in fields like natural language processing,
speech recognition, and computer vision, as it enhances the performance of
machine learning systems by automatically extracting meaningful patterns
from raw data.

Convolutional Layers in CNN


Convolutional layers are the foundation of Convolutional Neural Networks
(CNNs) in deep learning. Their primary function is to extract meaningful
features from input data, such as images or videos, by detecting patterns like
edges, textures, or more complex shapes. These features are used to help the
network understand and classify the data.
How Convolutional Layers Work
In a convolutional layer, a small matrix called a filter or kernel slides over the
input data (like an image) in a process called convolution. At each position, the
filter performs element-wise multiplication with the overlapping part of the
input, and the results are summed to produce a single value. Each such value
becomes one entry in the output, called a feature map.
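This operation can be written in a few lines of NumPy; the sketch below performs a stride-1, no-padding ("valid") convolution of one filter over one channel:

import numpy as np

def conv2d(image, kernel):
    # Slide the kernel over the image: element-wise multiply and sum at each position
    ih, iw = image.shape
    kh, kw = kernel.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.array([[1., 0.], [0., -1.]])  # A tiny edge-like filter
print(conv2d(image, kernel))  # 4x4 input, 2x2 kernel -> 3x3 feature map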
Filters and Feature Extraction
Each filter is designed to detect a specific feature.
 Filters in the early layers of a CNN learn basic features, such as edges,
lines, and corners.
 Filters in the deeper layers learn more complex features, like shapes,
patterns, or even entire objects.
Multiple filters are applied in one layer to capture different types of
features from the input.
Key Concepts
1. Stride: The step size of the filter as it slides over the input. A larger stride
reduces the size of the output feature map.
2. Padding: Extra pixels added around the input's border to control the size
of the output. Padding ensures that the filter can also cover the edges of
the input (the resulting output size is sketched after this list).
3. Non-linearity (Activation Function): After convolution, activation
functions like ReLU (Rectified Linear Unit) are applied to the feature map
to introduce non-linearity, enabling the model to learn complex patterns.
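The effect of stride and padding on the output size follows the standard formula output = (input + 2*padding - kernel) / stride + 1 (rounded down); a quick sketch:

def conv_output_size(n, k, p, s):
    # Spatial output size for input size n, kernel k, padding p, stride s
    return (n + 2 * p - k) // s + 1

print(conv_output_size(32, 3, 0, 1))  # 30: no padding shrinks the map
print(conv_output_size(32, 3, 1, 1))  # 32: "same" padding preserves the size
print(conv_output_size(32, 3, 1, 2))  # 16: stride 2 roughly halves the map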
Advantages of Convolutional Layers
 Local Connectivity: Filters focus on small regions, capturing local details
and reducing the number of parameters compared to fully connected
layers.
 Parameter Sharing: The same filter is applied across the input, allowing
the model to learn efficiently.
 Translation Invariance: CNNs can recognize patterns regardless of their
position in the input, making them robust for tasks like image
recognition.
Output of Convolutional Layers
The result of a convolutional layer is a stack of feature maps that represent
different aspects of the input data. These feature maps are then passed to
other layers (e.g., pooling layers or additional convolutional layers) for further
processing.
Applications
Convolutional layers are used in various tasks:
 Image Classification: Assigning labels to images (e.g., cat, dog).
 Object Detection: Locating and identifying objects within images.
 Image Segmentation: Dividing an image into meaningful parts.
 Feature Extraction: Used in domains like medical imaging, video analysis,
and facial recognition.
Convolutional layers are a powerful tool in CNNs, as they enable the network to
automatically learn features from raw input data, making them essential for
computer vision and many other deep learning applications.

Multichannel Convolutional Operation


In Convolutional Neural Networks (CNNs), a multichannel convolutional
operation involves convolving multiple filters with input data that has multiple
channels, such as RGB images. It extends the single-channel convolution
process by handling multiple input channels simultaneously, allowing the
network to learn complex spatial features.
How It Works
For an input with multiple channels (e.g., red, green, and blue in an RGB
image), a single filter operates on all channels. The results from each channel
are summed to produce one output feature map. Multiple filters are used to
generate multiple output feature maps, which are then stacked to form the
final output tensor.
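A minimal NumPy sketch of this channel-wise summation (stride 1, no padding, a single 3D filter over an RGB-like input):

import numpy as np

def multichannel_conv(x, w):
    # x: (channels, H, W) input; w: (channels, kh, kw) filter.
    # Each channel is convolved with its slice of the filter; the results are summed.
    c, h, width = x.shape
    _, kh, kw = w.shape
    out = np.zeros((h - kh + 1, width - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[:, i:i + kh, j:j + kw] * w)
    return out

x = np.random.rand(3, 5, 5)   # An RGB-like input: 3 channels, 5x5
w = np.random.rand(3, 2, 2)   # One 3D filter: a 2x2 kernel per channel
print(multichannel_conv(x, w).shape)  # (4, 4): one output feature map per filter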
Key Concepts
 Filters or Kernels: Filters in multichannel convolutions are 3D tensors.
The first two dimensions represent the filter size (height and width),
while the third corresponds to the number of input channels. Filters are
trained to extract meaningful features during training.
 Stride and Padding:
o Stride determines the step size by which the filter moves across
the input.
o Padding involves adding zeros around the input edges to control
the output size.
 Convolution Operation: Each filter is applied to all input channels, and
the results are summed. After convolution, an activation function (e.g.,
ReLU) is applied to produce the output feature map.
 Multiple Filters: Multiple filters are used to capture diverse features.
Each filter generates a unique output feature map, and these maps are
combined to form the output tensor.
Input and Output Feature Maps
 Input feature maps represent the multiple input channels (e.g., color
channels or learned features).
 Output feature maps represent the features learned by the filters and
are stacked to form the output tensor.
Applications
Multichannel convolutional operations are crucial in image recognition, video
processing, and segmentation. They enable CNNs to analyze multi-channel data
(e.g., color images) and learn complex features for tasks like object detection
and classification.

Recurrent Neural Network (RNN)


Introduction
Recurrent Neural Networks (RNNs) are a type of neural network designed for
sequential data, where the current output depends on previous inputs. Unlike
traditional neural networks, RNNs have a feedback loop that allows them to
remember information from previous steps. This makes them suitable for tasks
involving time series, text, audio, or any data with temporal dependencies.
How RNNs Work
RNNs process input sequentially, one step at a time. At each step, they take the
current input and combine it with the information (hidden state) from the
previous step. This hidden state acts as a memory, allowing the network to
retain context across the sequence. The output at each step depends on both
the current input and the memory from prior steps. However, traditional RNNs
struggle with long sequences due to issues like vanishing gradients, which can
weaken the memory over time.
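Concretely, the hidden-state update at each step is typically h_t = tanh(Wx * x_t + Wh * h_(t-1) + b); here is a minimal NumPy sketch of this recurrence:

import numpy as np

num_features, hidden_size, timesteps = 5, 8, 10
Wx = np.random.randn(hidden_size, num_features) * 0.1  # Input-to-hidden weights
Wh = np.random.randn(hidden_size, hidden_size) * 0.1   # Hidden-to-hidden weights (the feedback loop)
b = np.zeros(hidden_size)

xs = np.random.randn(timesteps, num_features)  # One input sequence
h = np.zeros(hidden_size)                      # Initial hidden state (the "memory")
for x_t in xs:
    h = np.tanh(Wx @ x_t + Wh @ h + b)  # Combine the current input with the previous state
print(h.shape)  # (8,): final state summarizing the whole sequence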
5 Main Components of RNNs
1. Input Layer: Processes sequential data, such as text (one word at a time)
or audio signals.
2. Hidden Layer: Maintains the memory by storing information from
previous steps and updating it with new inputs.
3. Weights: Shared across all steps to ensure consistent learning across the
sequence.
4. Activation Function: Non-linear functions (e.g., tanh or ReLU) that help
process and transform the hidden state.
5. Output Layer: Produces predictions or classifications, either for each
step or after processing the entire sequence.

Training through RNN


1. A single time step of the input is provided to the network.
2. The current state is calculated from the current input and the previous state.
3. The current state ht becomes ht-1 for the next time step.
4. One can go through as many time steps as the problem requires, joining the
information from all the previous states.
5. Once all the time steps are completed, the final current state is used to
calculate the output.
6. The output is then compared to the actual output, i.e., the target output, and
the error is generated.
7. The error is then back-propagated through the network to update the weights,
and hence the network (RNN) is trained.
Applications
1. Natural Language Processing: Used in tasks like text generation, machine
translation, and sentiment analysis.
2. Speech Recognition: Converts spoken language into text by analyzing
audio sequences.
3. Time-Series Prediction: Forecasts stock prices, weather, or energy
consumption.
4. Video Analysis: Helps in activity recognition and video captioning.
5. Music Composition: Generates music sequences based on learned
patterns.
RNNs are powerful for handling sequential and temporal data but face
limitations with long-term dependencies. Advanced variants like LSTM
(Long Short-Term Memory) and GRU (Gated Recurrent Unit) address
some of these issues, making RNNs even more effective.
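In Keras, for example, switching from a plain RNN to an LSTM is usually a one-line change (a minimal sketch using the same input shapes as the code further below):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

model = Sequential([
    LSTM(32, input_shape=(10, 5)),   # Replaces SimpleRNN(32, ...); gating mitigates vanishing gradients
    Dense(2, activation='softmax')
])
model.compile(optimizer='adam', loss='categorical_crossentropy')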
Advantages of Recurrent Neural Network
1. An RNN retains information across time steps, which is what makes it
useful for time-series prediction; variants such as Long Short-Term Memory
strengthen this ability to remember previous inputs.
2. Recurrent neural networks can even be combined with convolutional layers
to extend the effective pixel neighbourhood.
Disadvantages of Recurrent Neural Network
1. Gradient vanishing and exploding problems.
2. Training an RNN is a very difficult task.
3. It struggles to process very long sequences when tanh or ReLU is used as
the activation function.
RNN Code
Here is a simple implementation of a Recurrent Neural Network (RNN) using
Python and TensorFlow/Keras. This example demonstrates a basic RNN for a
sequence classification task: predicting the class of a synthetic sequence.


import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import SimpleRNN, Dense
from tensorflow.keras.utils import to_categorical

# Generate synthetic sequential data
# For demonstration, we use sequences of length 10 with 2 classes
num_samples = 1000
sequence_length = 10
num_features = 5  # Number of features per time step
num_classes = 2   # Output classes

# Create random input data, shape: (samples, timesteps, features)
X = np.random.rand(num_samples, sequence_length, num_features)

# Generate random labels and one-hot encode them
y = np.random.randint(0, num_classes, num_samples)
y = to_categorical(y, num_classes)  # Shape: (samples, num_classes)

# Build the RNN model
model = Sequential([
    SimpleRNN(32, activation='relu', input_shape=(sequence_length, num_features)),
    Dense(num_classes, activation='softmax')  # Output layer
])

# Compile the model
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Train the model
model.fit(X, y, epochs=10, batch_size=32, validation_split=0.2)

# Evaluate the model
loss, accuracy = model.evaluate(X, y)
print(f"Loss: {loss:.4f}, Accuracy: {accuracy:.4f}")

Explanation of the Code


1. Input Data: The input is a 3D array (samples, timesteps, features).
o samples: Number of sequences.
o timesteps: Length of each sequence.
o features: Number of features per time step.
2. RNN Layer:
o SimpleRNN(32): Adds an RNN with 32 hidden units, which processes
sequential data and outputs a single feature vector per sequence.
o activation='relu': Introduces non-linearity to the RNN layer.
3. Dense Layer:
o Dense(num_classes, activation='softmax'): A fully connected layer for
classification, using softmax for probability outputs.
4. Compilation:
o optimizer='adam': Optimizes the model parameters.
o loss='categorical_crossentropy': Calculates the difference between predicted
and true labels for multi-class classification.
5. Training: The model is trained on synthetic data for 10 epochs.
6. Evaluation: Outputs the model's performance on the dataset.
For real-world tasks like text or time-series analysis, replace synthetic data with
actual data and tune the model as needed!

Deep Learning with PyTorch


Introduction
PyTorch is a popular open-source deep learning framework that provides flexibility,
speed, and ease of use. It allows researchers and developers to build and train neural
networks with simple and readable code. PyTorch is built on Python and offers a
dynamic computation graph, which makes it ideal for experimentation.

How Deep Learning Works in PyTorch


1. Tensors: PyTorch uses tensors (multi-dimensional arrays) as its core data structure.
Tensors are similar to NumPy arrays but can run on GPUs for faster computation.
2. Model Definition: Neural networks in PyTorch are created using the torch.nn
module.
3. Training Process:
o Define a model architecture.
o Specify a loss function (to measure prediction errors).
o Choose an optimizer (to adjust model weights).
o Train the model by feeding data and updating weights.

Example: Simple Neural Network in PyTorch


Let’s build a neural network to classify data into two classes.
import torch
import torch.nn as nn
import torch.optim as optim

# 1. Create dummy data
X = torch.rand((100, 2))         # 100 samples, each with 2 features
y = torch.randint(0, 2, (100,))  # 100 labels (binary classification: 0 or 1)

# 2. Define the Neural Network
class SimpleNN(nn.Module):
    def __init__(self):
        super(SimpleNN, self).__init__()
        self.layer1 = nn.Linear(2, 4)   # Input: 2 features, Output: 4 neurons
        self.layer2 = nn.Linear(4, 1)   # Output: 1 neuron (binary classification)
        self.activation = nn.Sigmoid()  # Sigmoid activation for binary output

    def forward(self, x):
        x = torch.relu(self.layer1(x))       # Apply ReLU activation on the first layer
        x = self.activation(self.layer2(x))  # Sigmoid on the second layer
        return x

model = SimpleNN()  # Create the model

# 3. Define Loss and Optimizer
criterion = nn.BCELoss()                            # Binary Cross-Entropy Loss
optimizer = optim.SGD(model.parameters(), lr=0.01)  # Stochastic Gradient Descent

# 4. Train the Model
for epoch in range(20):  # Train for 20 epochs
    optimizer.zero_grad()                           # Clear gradients
    outputs = model(X)                              # Forward pass
    loss = criterion(outputs.squeeze(), y.float())  # Compute loss
    loss.backward()                                 # Backward pass
    optimizer.step()                                # Update weights
    if (epoch + 1) % 5 == 0:
        print(f"Epoch [{epoch+1}/20], Loss: {loss.item():.4f}")

# 5. Test the Model
with torch.no_grad():
    predictions = (model(X).squeeze() > 0.5).int()  # Convert probabilities to binary predictions
    accuracy = (predictions == y).sum().item() / y.size(0)
    print(f"Accuracy: {accuracy * 100:.2f}%")

Explanation of the Code


1. Data:
o X: 2D input data with 2 features per sample.
o y: Labels for binary classification.
2. Neural Network:
o nn.Linear(2, 4): Fully connected layer with 2 inputs and 4 outputs.
o torch.relu: ReLU activation for non-linearity.
o nn.Sigmoid: Outputs a probability between 0 and 1 for classification.
3. Training:
o criterion: Measures prediction errors (binary cross-entropy loss).
o optimizer.step(): Adjusts the model weights to minimize the loss.
4. Testing:
o The model predicts binary outputs, and accuracy is calculated by comparing
predictions with labels.

Why PyTorch?
 Dynamic Graphs: Easy debugging and flexible model design.
 GPU Support: Built-in support for faster computation on GPUs (a short sketch follows this list).
 Large Community: Extensive documentation and resources.
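A minimal sketch of moving data (and a model) onto a GPU when one is available:

import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
x = torch.rand(100, 2).to(device)  # Tensors can be moved to the GPU...
# model = SimpleNN().to(device)    # ...and so can models (SimpleNN from the example above)
print(device)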
PyTorch simplifies deep learning, making it easy for both beginners and researchers
to experiment with advanced models.
Convolutional Neural Networks (CNN) with PyTorch
CNNs are widely used for image-related tasks like classification, object detection, and
segmentation. Below is a simple example to classify images using CNN in PyTorch.
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

# 1. Generate tiny dummy data (10 images, 4x4 size, 2 classes)
X = torch.rand(10, 1, 4, 4)      # 10 samples, 1 channel (grayscale), 4x4 pixels
y = torch.randint(0, 2, (10,))   # 10 labels (0 or 1 for binary classification)

# 2. Create DataLoader
dataset = TensorDataset(X, y)
dataloader = DataLoader(dataset, batch_size=2, shuffle=True)

# 3. Define a simple CNN
class SimpleCNN(nn.Module):
    def __init__(self):
        super(SimpleCNN, self).__init__()
        self.conv1 = nn.Conv2d(1, 2, kernel_size=2)  # 1 input channel, 2 filters, 2x2 kernel
        self.fc1 = nn.Linear(2 * 3 * 3, 2)           # Flattened output size 2 * 3 * 3, 2 classes

    def forward(self, x):
        x = torch.relu(self.conv1(x))  # Apply convolution and ReLU activation
        x = x.view(-1, 2 * 3 * 3)      # Flatten for the fully connected layer
        x = self.fc1(x)                # Fully connected layer for output
        return x

model = SimpleCNN()

# 4. Define loss function and optimizer
criterion = nn.CrossEntropyLoss()                   # For multi-class classification
optimizer = optim.SGD(model.parameters(), lr=0.01)  # Stochastic Gradient Descent

# 5. Train the model
for epoch in range(5):  # Train for 5 epochs
    for inputs, labels in dataloader:
        optimizer.zero_grad()              # Clear gradients
        outputs = model(inputs)            # Forward pass
        loss = criterion(outputs, labels)  # Compute loss
        loss.backward()                    # Backward pass
        optimizer.step()                   # Update weights
    print(f"Epoch {epoch+1}, Loss: {loss.item():.4f}")

# 6. Test the model
with torch.no_grad():
    test_output = model(X)  # Use the same dummy data for testing
    predictions = torch.argmax(test_output, axis=1)
    print("Predictions:", predictions)

Explanation
1. Data:
o X: Represents 10 grayscale images of size 4x4 pixels. Each image has only 1
channel.
o y: Binary labels (0 or 1) representing the class of each image.
o Data is loaded using TensorDataset and DataLoader.
2. CNN Architecture:
o conv1: A convolutional layer with:
 1 input channel (grayscale images).
 2 filters (or kernels) that extract 2 feature maps.
 2x2 kernel size, which scans the image.
o fc1: A fully connected layer that takes the flattened features (from conv1) and
predicts probabilities for 2 classes.
3. Training:
o Forward pass: The input passes through conv1 and fc1 to make predictions.
o Backward pass: Gradients are calculated and weights are updated to reduce
loss.
4. Testing:
o Predictions are obtained by applying the trained model to input data. The
predicted class is determined using torch.argmax().

What’s Happening?
1. The convolutional layer (conv1) extracts spatial features like edges or patterns from
the input images.
2. The ReLU activation function introduces non-linearity, helping the model learn
complex features.
3. The fully connected layer (fc1) combines the extracted features and makes a final
prediction.
4. The loss function (CrossEntropyLoss) measures how far the predicted output is from
the actual labels.
5. The model learns through optimization (SGD in this case).

Key Points
 CNN Advantage: It automatically learns spatial features, making it ideal for image-related tasks.
 Tiny Example: A small input (4x4 images) and simple architecture (1 conv layer, 1
dense layer) keep the example easy to follow.
 Scalability: The same principles apply to more complex models for larger datasets
like CIFAR or ImageNet.
