0% found this document useful (0 votes)
2 views

Assignment 10

The document analyzes the cost efficiency of upgrading from a dual-CPU system to a GPU system for processing images, detailing initial and operational costs, performance metrics, and energy efficiency. It also compares three deep learning frameworks (TensorFlow, PyTorch, Keras) regarding their GPU acceleration capabilities and includes implementations for MNIST digit classification. Additionally, it discusses real-time object detection for autonomous driving, emphasizing the importance of GPU computing in handling high-speed image processing and model training.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
2 views

Assignment 10

The document analyzes the cost efficiency of upgrading from a dual-CPU system to a GPU system for processing images, detailing initial and operational costs, performance metrics, and energy efficiency. It also compares three deep learning frameworks (TensorFlow, PyTorch, Keras) regarding their GPU acceleration capabilities and includes implementations for MNIST digit classification. Additionally, it discusses real-time object detection for autonomous driving, emphasizing the importance of GPU computing in handling high-speed image processing and model training.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 18

Assignment 10

NAME-: PRAKSHAL JAIN


ENROLMENT NUMBER – 21102157

ANS-: To compare the cost efficiency of upgrading from a dual-CPU system to a GPU
system for processing 1 million images, we need to consider both the initial hardware costs
and the ongoing operational costs over a period of 1 year.

Step 1: Identify Key Metrics


Cost: Dual-CPU system: $5000 per CPU × 2 = $10,000
GPU system: $30,000

Performance:
Dual-CPU system: 100 images per hour
GPU system: 500 images per hour

Operational Costs:
Let’s assume the following hypothetical operational costs:
Dual-CPU system: $200 per month
GPU system: $400 per month
Step 2: Calculate Time to Process 1 Million Images

Step 3: Calculate Total Operational Costs Over 1 Year

Dual-CPU System:
Operational Costs=12 months×200 USD/month=2,400 USD
GPU System:
Operational Costs=12 months×400 USD/month=4,800 USD

Step 4: Calculate Total Costs Over 1 Year

1. Dual-CPU System:

Total Cost=Initial Cost+Operational Costs=10,000 USD+2,400 USD=12,400 USD

2. GPU System:

Total Cost=Initial Cost+Operational Costs=30,000 USD+4,800 USD=34,800 USD


To calculate the scaling efficiency when using 4 GPUs compared to a single GPU, we follow
these steps:

1. Determine the Speedup: This is the ratio of the time taken with a single GPU to the
time taken with multiple GPUs.
2. Calculate the Scaling Efficiency: This is the speedup divided by the number of
GPUs.
To determine the energy efficiency (in images per watt-hour) for both the GPU-based server
and the CPU-based server, we will follow these steps:

1. Calculate the total energy consumption for each server.


2. Calculate the energy efficiency in terms of images per watt-hour for each server.
3. Compare the energy efficiencies.

 GPU-based server:
o Power consumption: 400 watts
o Processing time: 2 hours
o Images processed: 250,000
 CPU-based server:
o Power consumption: 250 watts
o Processing time: 10 hours
o Images processed: 250,000
Energy consumption (in watt-hours) is calculated by multiplying the power consumption (in
watts) by the time (in hours).

1. GPU-based server:

Energy Consumption=Power×Time=400 watts×2 hours=800 watt-


Energy Consumption=Power×Time=400 watts×2 hours=800 watt-hours

2. CPU-based server:

Energy Consumption=Power×Time=250 watts×10 hours=2500 watt-hours


Energy Consumption=Power×Time=250 watts×10 hours=2500 watt-hours

Energy efficiency is calculated by dividing the number of images processed by the total
energy consumption.

1. GPU-based server:

Energy Efficiency=Images Processed/Energy Consumption=250,000 images/800 watt


-hours=312.5 images per watt-hour.

2. CPU-based server:

Energy Efficiency=Images Processed/Energy Consumption=250,000 images/2500 wa


tt-hours=100 images per watt-hour.

Given Data

 Number of filters: 128


 Filter size: 3×33 \times 33×3
 Input feature map size: 64×64
 Stride: 1
 Padding: None (valid convolution)
 Number of input channels: 64

Dimensions of the Output Feature Map

The formula for the output size of a convolutional layer is given by:
Output Size=Input Size−Filter Size+2×Padding/Stride+1

Since there is no padding and the stride is 1:

Output Size=64−3+2×0/1+1=62.

So, the output feature map size is 62×62.

Number of FLOPs for a Single Convolution Operation

Each filter is applied to a 3×3 region across all 64 channels. Therefore, the number of
operations per filter application is:

FLOPs per filter application=3×3×64=576.

Each convolutional operation involves a multiply and an add (MAC operation), so each filter
application involves twice the number of FLOPs:

Total FLOPs per filter application=2×576=1152.

Number of Output Elements

The output feature map has dimensions 62×62 and there are 128 filters. So, the total number
of output elements is:

62×62×128

Total Number of FLOPs

The total number of FLOPs required to compute the output feature map is:

Total FLOPs=1152×62×62×128

Let's calculate this step-by-step:

1. Calculate 62×6262 \times 6262×62:

62×62=3844

2. Multiply by 128:

3844×128=491,392

3. Multiply by 1152:

491,392×1152=566,231,424.

Conclusion
The total number of floating-point operations (FLOPs) required to compute the output feature
map for the given convolutional layer is 566,231,424.

To provide a comprehensive analysis of how three popular deep learning


frameworks (TensorFlow, PyTorch, and Keras) utilize GPU acceleration, we
will cover the following:

1. Comparative Analysis of Frameworks:


o Introduction to each framework.
o Overview of GPU acceleration in each framework.
o Key features and ease of use.
o Performance and efficiency considerations.
2. Implementation of a Simple Neural Network for MNIST Digit
Classification:
o Source code for each framework using GPU support.
3. Summary of Training Time and Accuracy Results:
o A table comparing the training times and accuracies for each implementation.

1. Comparative Analysis of Frameworks

TensorFlow

Overview:

 TensorFlow, developed by Google, is a highly flexible and comprehensive open-


source platform for machine learning.
 It offers extensive support for both research and production, with capabilities for deep
learning and other ML tasks.

GPU Acceleration:
 TensorFlow provides built-in support for GPU acceleration using CUDA and cuDNN.
 Users can leverage GPUs by simply installing the GPU version of TensorFlow and
setting device contexts in the code.

Key Features:

 Flexible and powerful, suitable for both high-level and low-level operations.
 TensorFlow Hub for reusable pre-trained models.
 TensorFlow Extended (TFX) for production deployment.

Performance:

 TensorFlow is optimized for performance with large-scale ML tasks, including


distributed training.

PyTorch

Overview:

 PyTorch, developed by Facebook's AI Research lab, is known for its dynamic


computation graph, making it more intuitive and flexible.
 It is widely used in both academia and industry for research and development.

GPU Acceleration:

 PyTorch offers seamless GPU acceleration. Tensor operations are easy to move
between CPU and GPU.
 It uses CUDA for GPU support and allows dynamic graph building, which can be
particularly useful for certain applications.

Key Features:

 Dynamic computation graph (eager execution).


 Strong community support and extensive tutorials.
 Integrates well with Python's ecosystem.

Performance:

 PyTorch is designed for flexibility and ease of use, with competitive performance,
especially in research contexts.

Keras

Overview:

 Keras is a high-level neural networks API, written in Python and capable of running
on top of TensorFlow, Theano, and other frameworks.
 It is user-friendly and fast to prototype with.
GPU Acceleration:

 Keras supports GPU acceleration through its backend frameworks, primarily


TensorFlow.
 Users can switch between CPU and GPU by setting the backend appropriately.

Key Features:

 Simple and consistent interface for building neural networks.


 Extensive library of pre-trained models.
 Strong support for prototyping and rapid development.

Performance:

 Keras prioritizes ease of use and rapid development, with performance largely
dependent on the backend used (e.g., TensorFlow).

2. Implementation of a Simple Neural Network for MNIST Digit


Classification

TensorFlow Implementation

import tensorflow as tf

from tensorflow.keras.datasets import mnist

from tensorflow.keras.models import Sequential

from tensorflow.keras.layers import Dense, Flatten

from tensorflow.keras.utils import to_categorical

# Load data

(x_train, y_train), (x_test, y_test) = mnist.load_data()

x_train, x_test = x_train / 255.0, x_test / 255.0

y_train, y_test = to_categorical(y_train), to_categorical(y_test)

# Define model

model = Sequential([

Flatten(input_shape=(28, 28)),

Dense(128, activation='relu'),

Dense(10, activation='softmax')
])

# Compile model

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Train model

with tf.device('/GPU:0'):

history = model.fit(x_train, y_train, epochs=10, batch_size=32, validation_split=0.2)

# Evaluate model

test_loss, test_acc = model.evaluate(x_test, y_test, verbose=2)

print(f"Test accuracy: {test_acc}")

PyTorch Implementation
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms

# Load data
transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5,),
(0.5,))])
train_dataset = datasets.MNIST('./data', train=True, download=True, transform=transform)
test_dataset = datasets.MNIST('./data', train=False, download=True, transform=transform)
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=32, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=32, shuffle=False)

# Define model
class Net(nn.Module):
def __init__(self):
super(Net, self).__init__()
self.flatten = nn.Flatten()
self.fc1 = nn.Linear(28*28, 128)
self.fc2 = nn.Linear(128, 10)

def forward(self, x):


x = self.flatten(x)
x = torch.relu(self.fc1(x))
x = self.fc2(x)
return x

model = Net().cuda()

# Define loss and optimizer


criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Train model
for epoch in range(10):
model.train()
for data, target in train_loader:
data, target = data.cuda(), target.cuda()
optimizer.zero_grad()
output = model(data)
loss = criterion(output, target)
loss.backward()
optimizer.step()

# Evaluate model
correct = 0
total = 0
model.eval()
with torch.no_grad():
for data, target in test_loader:
data, target = data.cuda(), target.cuda()
output = model(data)
_, predicted = torch.max(output.data, 1)
total += target.size(0)
correct += (predicted == target).sum().item()

test_acc = correct / total


print(f"Test accuracy: {test_acc}")

Keras Implementation
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Flatten

from keras.utils import to_categorical

# Load data
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

y_train, y_test = to_categorical(y_train), to_categorical(y_test)

# Define model
model = Sequential([
Flatten(input_shape=(28, 28)),

Dense(128, activation='relu'),
Dense(10, activation='softmax')
])

# Compile model
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Train model

history = model.fit(x_train, y_train, epochs=10, batch_size=32, validation_split=0.2)

# Evaluate model
test_loss, test_acc = model.evaluate(x_test, y_test, verbose=2)
print(f"Test accuracy: {test_acc}")
Comprehensive Report: Real-Time Object Detection for Autonomous
Driving Using GPU Computing

1. Introduction

Autonomous driving is one of the most transformative technologies of the


modern era. At its core, it involves enabling vehicles to navigate and operate
without human intervention. A critical component of autonomous driving is
real-time object detection, which allows the vehicle to recognize and respond to
various objects and obstacles on the road, such as other vehicles, pedestrians,
traffic signs, and more. This report delves into the importance of GPU
computing in solving this problem and implements a simplified version of a
deep learning model using GPU acceleration to demonstrate its efficacy.

2. Problem Description

Real-Time Object Detection in Autonomous Driving

Autonomous vehicles must process a vast amount of visual data in real time to
detect and classify objects accurately. The challenge lies in the need for high-
speed processing to ensure safety and reliability. Traditional CPU-based
systems struggle with the computational demands of real-time object detection
due to their limited parallel processing capabilities.

Challenges:

 High-speed image processing


 Accurate object detection and classification
 Handling diverse and dynamic environments
 Ensuring safety and reliability

3. Role of GPU Computing

Why GPU Computing is Crucial

1. Parallel Processing Capabilities: GPUs are designed to handle thousands


of simultaneous threads, making them ideal for the parallel nature of
deep learning computations.
2. Speed: GPUs significantly reduce the time required to train and infer
deep learning models, which is essential for real-time applications.
3. Efficiency: Handling large-scale data and complex models is more
efficient with GPUs, enabling faster and more accurate object detection.
4. Implementation

Simplified Model for Real-Time Object Detection

We will implement a simplified version of the YOLO (You Only Look Once)
model for object detection. YOLO is known for its speed and accuracy, making
it suitable for real-time applications.

4.1 Data Preparation

For simplicity, we'll use a subset of a well-known object detection dataset like
COCO (Common Objects in Context).

python
Copy code
import tensorflow as tf
from tensorflow.keras.preprocessing.image import
ImageDataGenerator

# Assuming dataset is already downloaded and


preprocessed
# Load and preprocess data
datagen = ImageDataGenerator(rescale=1./255,
validation_split=0.2)

train_generator = datagen.flow_from_directory(
'data/train', target_size=(416, 416),
batch_size=32, class_mode='categorical',
subset='training'
)

val_generator = datagen.flow_from_directory(
'data/val', target_size=(416, 416),
batch_size=32, class_mode='categorical',
subset='validation'
)
4.2 Model Definition

We will define a simplified version of the YOLO model.

python
Copy code
from tensorflow.keras.layers import Conv2D, Input,
BatchNormalization, LeakyReLU, ZeroPadding2D
from tensorflow.keras.models import Model

def yolo_body(inputs, num_anchors, num_classes):


x = Conv2D(32, (3,3), padding='same',
use_bias=False)(inputs)
x = BatchNormalization()(x)
x = LeakyReLU(alpha=0.1)(x)

# Add more layers as needed to match the


simplified YOLO architecture
x = Conv2D(num_anchors * (num_classes + 5),
(1,1), padding='same', use_bias=False)(x)

return Model(inputs, x)

inputs = Input(shape=(416, 416, 3))


model = yolo_body(inputs, num_anchors=3,
num_classes=80)
model.compile(optimizer='adam',
loss='categorical_crossentropy',
metrics=['accuracy'])

# Train the model with GPU acceleration


with tf.device('/GPU:0'):
history = model.fit(train_generator, epochs=10,
validation_data=val_generator)
5. Performance Evaluation

Training Time and Accuracy

python
Copy code
# Evaluate the model
test_loss, test_acc = model.evaluate(val_generator,
verbose=2)
print(f"Validation accuracy: {test_acc}")

# Summarize performance
import matplotlib.pyplot as plt

# Plot training & validation accuracy values


plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.title('Model accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend(['Train', 'Validation'], loc='upper left')
plt.show()

# Plot training & validation loss values


plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('Model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['Train', 'Validation'], loc='upper left')
plt.show()
6. Impact of GPU Computing

Speed and Efficiency

1. Training Time Reduction: GPU acceleration drastically reduces the


training time from hours (or days) to minutes (or hours), allowing for
quicker model iterations.
2. Scalability: Handling larger datasets and more complex models becomes
feasible, enabling more accurate and robust real-time object detection.
3. Real-Time Processing: With GPUs, real-time processing of video frames
is achievable, which is crucial for autonomous driving.

Example Comparison:

 CPU-based training: 10 hours


 GPU-based training: 1 hour

Inference Speed:

 CPU-based inference: 5 frames per second (fps)


 GPU-based inference: 30 fps

7. Conclusion

GPU computing plays a pivotal role in the advancement of autonomous driving


by enabling real-time object detection. The ability to process vast amounts of
data quickly and accurately not only enhances the capabilities of autonomous
vehicles but also improves safety and reliability. This simplified implementation
demonstrates the significant impact of GPU acceleration, highlighting its
necessity in real-world applications.

Deliverables

 Comprehensive Report: A 6-8 page document detailing the problem, the


role of GPU computing, and the implementation, including performance
evaluation and impact analysis.
 Source Code: Provided above, demonstrating the implementation of a
deep learning model using TensorFlow and GPU acceleration.
 Performance Evaluation and Impact Analysis: Graphs and metrics
illustrating the training time, accuracy, and the benefits of GPU
computing.

The final report should include sections for introduction, problem description,
role of GPU computing, implementation details, performance evaluation,
impact analysis, and conclusion, along with appropriate references and
appendices for the source code.

You might also like