Laboratory Practice V
Course Objectives:
CO/PO  PO1  PO2  PO3  PO4  PO5  PO6  PO7  PO8  PO9  PO10  PO11  PO12
CO1     1    -    1    1    -    2    1    -    -    -     -     -
CO2     1    2    1    -    -    1    -    -    -    -     -     1
CO3     -    1    1    1    1    1    -    -    -    -     -     -
CO4     3    3    3    -    3    -    -    -    -    -     -     -
CO5     3    3    3    3    3    -    -    -    -    -     -     -
CO6     3    3    3    3    3    -    -    -    -    -     -     -
CO7     3    3    3    3    3    -    -    -    -    -     -
Title of the Assignment: Design and implement Parallel Breadth First Search and Depth First Search based on existing
algorithms using OpenMP. Use a Tree or an undirected graph for BFS and DFS
Objective of the Assignment: Students should be able to write a program to implement Parallel Breadth First Search and
Depth First Search based on existing algorithms using OpenMP
Prerequisite:
1. Basics of a programming language
2. Concept of BFS and DFS
3. Concept of Parallelism
Contents for Theory:
1. What is BFS?
2. What is DFS?
3. Concept of OpenMP
What is BFS?
BFS stands for Breadth-First Search. It is a graph traversal algorithm used to explore all the nodes of a graph or tree
systematically, starting from the root node or a specified starting point, and visiting all the neighboring nodes at the
current depth level before moving on to the next depth level. The algorithm uses a queue data structure to keep track
of the nodes that need to be visited, and marks each visited node to avoid processing it again. The basic idea of the BFS
algorithm is to visit all the nodes at a given level before moving on to the next level, which ensures that all the nodes
are visited in breadth-first order. BFS is commonly used in many applications, such as finding the shortest path between
two nodes, solving puzzles, and searching through a tree or graph.
Now let’s take a look at the steps involved in traversing a graph by using Breadth-First Search:
Step 1: Take an Empty Queue.
Step 2: Select a starting node (visiting a node) and insert it into the Queue.
Step 3: Provided that the Queue is not empty, extract the node from the Queue and insert its child nodes (exploring a
node) into the Queue.
Step 4: Print the extracted node.
What is DFS?
DFS stands for Depth-First Search. It is a popular graph traversal algorithm that explores as far as possible along each
branch before backtracking. It can be used to detect cycles, check connectivity, or traverse a graph in a systematic way.
The algorithm starts at the root node and explores as far as possible along each branch before backtracking. The
backtracking is done to explore the next branch that has not been explored yet.
DFS can be implemented using either a recursive or an iterative approach. The recursive approach is simpler to
implement but can lead to a stack overflow error for very large graphs. The iterative approach uses a stack to keep track
of nodes to be explored and is preferred for larger graphs.
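For illustration, a minimal recursive DFS over an adjacency-list graph might look like the following sketch (the small example graph is arbitrary and only meant to show the idea):
#include <iostream>
#include <vector>
using namespace std;

// Recursive depth-first search over an adjacency-list graph.
void dfs_recursive(int node, const vector<vector<int>>& graph, vector<bool>& visited) {
    visited[node] = true;
    cout << node << " ";
    for (int adj : graph[node])               // go as deep as possible along each unvisited branch
        if (!visited[adj])
            dfs_recursive(adj, graph, visited);
}

int main() {
    // Small undirected example graph: edges 0-1, 0-2, 1-3
    vector<vector<int>> graph = {{1, 2}, {0, 3}, {0}, {1}};
    vector<bool> visited(graph.size(), false);
    dfs_recursive(0, graph, visited);
    cout << endl;
    return 0;
}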
DFS can also be used to detect cycles in a graph. If a cycle exists in a graph, the DFS algorithm will eventually reach a
node that has already been visited, indicating that a cycle exists. A standard DFS
implementation puts each vertex of the graph into one of two categories: 1. Visited 2. Not Visited The purpose of the
algorithm is to mark each vertex as visited while avoiding cycles.
Concept of OpenMP
● OpenMP (Open Multi-Processing) is an application programming interface (API) that supports shared-memory
parallel programming in C, C++, and Fortran. It is used to write parallel programs that can run on multicore
processors, multiprocessor systems, and parallel computing clusters.
● OpenMP provides a set of directives and functions that can be inserted into the source code of a program to parallelize
its execution. These directives are simple and easy to use, and they can be applied to loops, sections, functions, and other
program constructs. The compiler then generates parallel code that can run on multiple processors concurrently.
● OpenMP programs are designed to take advantage of the shared-memory architecture of modern processors, where
multiple processor cores can access the same memory. OpenMP uses a fork-join model of parallel execution, where a
master thread forks multiple worker threads to execute a parallel region of the code, and then waits for all threads to
complete before continuing with the sequential part of the code.
● OpenMP is widely used in scientific computing, engineering, and other fields that require high-performance computing. It is supported by most modern compilers and is available on a wide range of platforms, including desktops, servers, and supercomputers.
How Parallel BFS Works
● Parallel BFS (Breadth-First Search) is an algorithm used to explore all the nodes of a graph or tree systematically in
parallel. It is a popular parallel algorithm used for graph traversal in distributed computing, shared-memory systems,
and parallel clusters.
● The parallel BFS algorithm starts by selecting a root node or a specified starting point, and then assigning it to a
thread or processor in the system. Each thread maintains a local queue of nodes to be visited and marks each visited
node to avoid processing it again.
● The algorithm then proceeds in levels, where each level represents a set of nodes that are at a certain distance from
the root node. Each thread processes the nodes in its local queue at the current level, and then exchanges the nodes
that are adjacent to the current level with other threads or processors. This is done to ensure that the nodes at the next
level are visited by the next iteration of the algorithm.
● The parallel BFS algorithm uses two phases: the computation phase and the communication phase. In the computation
phase, each thread processes the nodes in its local queue, while in the communication phase, the threads exchange the
nodes that are adjacent to the current level with other threads or processors.
● The parallel BFS algorithm terminates when all nodes have been visited or when a specified node has been found.
The result of the algorithm is the set of visited nodes or the shortest path from the root node to the target node.
● Parallel BFS can be implemented using different parallel programming models, such as OpenMP, MPI, CUDA, and
others. The performance of the algorithm depends on the number of threads or processors used, the size of the graph,
and the communication overhead between the threads or processors.
Code to implement BFS using OpenMP:
#include<iostream>
#include<stdlib.h>
#include<queue>
using namespace std;
class node
{
public:
    node *left, *right;
    int data;
};
// insert() builds the tree level by level and bfs() performs the breadth-first
// traversal using OpenMP (definitions omitted here).
int main()
{
    node *root = NULL;
    int data;
    char ans;
    do
    {
        cout << "\n enter data=> "; cin >> data;
        root = insert(root, data);
        cout << "do you want to insert one more node? "; cin >> ans;
    } while (ans == 'y' || ans == 'Y');
    bfs(root);
    return 0;
}
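Since the listing above is only an excerpt, the following self-contained sketch (one possible formulation, not the only one) shows a level-synchronous parallel BFS over an undirected graph with OpenMP, following the computation and communication phases described earlier; each thread keeps a local queue of newly discovered nodes:
#include <iostream>
#include <vector>
#include <omp.h>
using namespace std;

int main() {
    int n, m, start_node;
    cout << "Enter number of nodes, edges, and the start node: ";
    cin >> n >> m >> start_node;

    vector<vector<int>> graph(n);
    cout << "Enter pairs of edges (u v): ";
    for (int i = 0; i < m; i++) {
        int u, v;
        cin >> u >> v;
        graph[u].push_back(v);
        graph[v].push_back(u);
    }

    vector<bool> visited(n, false);
    vector<int> current_level = {start_node};
    visited[start_node] = true;

    // Level-synchronous BFS: nodes of the current level are processed in parallel
    // (computation phase); the next level is then assembled (communication phase).
    while (!current_level.empty()) {
        vector<int> next_level;
        #pragma omp parallel
        {
            vector<int> local_next;   // thread-local queue of newly discovered nodes
            #pragma omp for nowait
            for (int i = 0; i < (int)current_level.size(); i++) {
                int node = current_level[i];
                #pragma omp critical
                cout << node << " ";
                for (int adj : graph[node]) {
                    #pragma omp critical
                    if (!visited[adj]) { visited[adj] = true; local_next.push_back(adj); }
                }
            }
            #pragma omp critical
            next_level.insert(next_level.end(), local_next.begin(), local_next.end());
        }
        current_level = next_level;
    }
    cout << endl;
    return 0;
}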
Run Commands:
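For example, if the BFS program is saved as bfs.cpp and the DFS program below as dfs.cpp (the file names are only illustrative), they can typically be compiled with g++ -fopenmp bfs.cpp -o bfs and g++ -fopenmp dfs.cpp -o dfs, and then run with ./bfs and ./dfs.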
Code to implement DFS using OpenMP:
#include <iostream>
#include <vector>
#include <stack>
#include <omp.h>
using namespace std;
const int MAX = 100000;
vector<int> graph[MAX];
bool visited[MAX];
void dfs(int node) {
stack<int>s;
s.push(node);
while (!s.empty())
{
int curr_node = s.top();
s.pop();
if (!visited[curr_node])
{
visited[curr_node] = true;
cout << curr_node << " ";
// Examine the neighbours of the current node in parallel
#pragma omp parallel for
for (int i = 0; i < (int)graph[curr_node].size(); i++)
{
int adj_node = graph[curr_node][i];
if (!visited[adj_node])
{
// The stack is shared by all threads, so pushes must be serialized
#pragma omp critical
s.push(adj_node);
}
}
}
}
}
int main() {
int n, m, start_node;
cout << "Enter No of Node,Edges,and startnode:" ;
cin >> n >> m >> start_node;
//n: node,m:edges
cout << "Enter Pair of edges:" ;
for (int i = 0; i < m; i++)
{ int u, v;
cin >> u >> v;
//u and v: Pair of edges
graph[u].push_back(v);
graph[v].push_back(u);
}
#pragma omp parallel for
for (int i = 0; i < n; i++)
{
visited[i] = false;
}
dfs(start_node);
/*for (int i = 0; i < n; i++)
{
if (visited[i])
{
cout << i << " ";
}
}*/
return 0;
}
Conclusion:
In this way we can achieve parallelism while implementing Breadth First Search and Depth First Search
Title of the Assignment: Write a program to implement Parallel Bubble Sort. Use existing algorithms and measure the
performance of sequential and parallel algorithms.
Objective of the Assignment: Students should be able to write a program to implement Parallel Bubble Sort and
measure the performance of sequential and parallel algorithms.
Prerequisite:
1. Basics of a programming language
2. Concept of Bubble Sort
3. Concept of Parallelism
Contents for Theory:
1. What is Bubble Sort? Use of Bubble Sort
2. Example of Bubble sort?
3. Concept of OpenMP
4. How Parallel Bubble Sort Work
5. How to measure the performance of sequential and parallel algorithms?
What is Bubble Sort?
Bubble Sort is a simple sorting algorithm that works by repeatedly swapping adjacent elements if they are in the
wrong order. It is called "bubble" sort because the algorithm moves the larger elements towards the end of the array in a
manner that resembles the rising of bubbles in a liquid.
The basic algorithm of Bubble Sort is as follows:
1. Start at the beginning of the array.
2. Compare the first two elements. If the first element is greater than the second element, swap them.
3. Move to the next pair of elements and repeat step 2.
Consider sorting the numbers 5, 3, 4, 1, and 2.
First Iteration of the Sorting
Step 1: The algorithm compares the first two values, 5 and 3. 5 is greater than 3, so they are swapped and the numbers become 3, 5, 4, 1, and 2.
Step 2: The algorithm now has 3, 5, 4, 1, and 2 to compare. This time around, it compares the next two values, which are 5 and 4. 5 is greater than 4, so 5 takes the index of 4 and the numbers become 3, 4, 5, 1, and 2.
Step 3: The algorithm now has 3, 4, 5, 1, and 2 to compare. It compares the next two values, which are 5 and 1. 5 is greater than 1, so 5 takes the index of 1 and the numbers become 3, 4, 1, 5, and 2.
Step 4: The algorithm now has 3, 4, 1, 5, and 2 to compare. It compares the next two values, which are 5 and 2. 5 is
greater than 2, so 5 takes the index of 2 and the numbers become 3, 4, 1, 2, and 5.
That’s the first iteration. And the numbers are now arranged as 3, 4, 1, 2, and 5 – from the initial 5, 3, 4, 1, and 2. As
you might realize, 5 should be the last number if the numbers are sorted in ascending order, which means the first
iteration is complete.
Second Iteration of the Sorting and the Rest
The algorithm starts the second iteration with the last result of 3, 4, 1, 2, and 5. This time around, 3 is smaller than
4, so no swapping happens. This means the numbers will remain the same.
The algorithm proceeds to compare 4 and 1. 4 is greater than 1, so 4 is swapped for 1 and the numbers become 3, 1, 4,
2, and 5.
The algorithm now proceeds to compare 4 and 2. 4 is greater than 2, so 4 is swapped for 2 and the numbers become
3, 1, 2, 4, and 5.
4 is now in the right place, so no swapping occurs between 4 and 5 because 4 is smaller than 5.
That’s how the algorithm continues to compare the numbers until they are arranged in ascending order of
1, 2, 3, 4, and 5.
Concept of OpenMP
● OpenMP (Open Multi-Processing) is an application programming interface (API) that supports shared-memory
parallel programming in C, C++, and Fortran. It is used to write parallel programs that can run on multicore
processors, multiprocessor systems, and parallel computing clusters.
● OpenMP provides a set of directives and functions that can be inserted into the source code of a program to
parallelize its execution.
These directives are simple and easy to use, and they can be applied to loops, sections, functions, and other program
constructs.
● OpenMP programs are designed to take advantage of the shared-memory architecture of modern processors,
where multiple processor cores can access the same memory. OpenMP uses a fork-join model of parallel
execution, where a master thread forks multiple worker threads to execute a parallel region of the code, and
then waits for all threads to complete before continuing with the sequential part of the code.
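A minimal illustration of this fork-join model (a sketch only; the printed text is arbitrary) is shown below: the master thread forks a team of threads for the parallel region and joins them before the sequential part resumes.
#include <iostream>
#include <omp.h>

int main() {
    std::cout << "Sequential part: master thread only" << std::endl;

    // Fork: the master thread creates a team of worker threads for this region.
    #pragma omp parallel
    {
        #pragma omp critical   // serialize output so the lines do not interleave
        std::cout << "Hello from thread " << omp_get_thread_num()
                  << " of " << omp_get_num_threads() << std::endl;
    }   // Join: all threads synchronize here before the master continues.

    std::cout << "Sequential part resumes after the join" << std::endl;
    return 0;
}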
How Parallel Bubble Sort Works
● Parallel Bubble Sort is a modification of the classic Bubble Sort algorithm that takes advantage of parallel
processing to speed up the sorting process.
● In parallel Bubble Sort, the list of elements is divided into multiple sublists that are sorted concurrently by
multiple threads. Each thread sorts its sublist using the regular Bubble Sort algorithm. When all sublists have
been sorted, they are merged together to form the final sorted list.
● The parallelization of the algorithm is achieved using OpenMP, a programming API that supports parallel
processing in C++, Fortran, and other programming languages. OpenMP provides a set of compiler directives
that allow developers to specify which parts of the code can be executed in parallel.
● In the parallel Bubble Sort algorithm, the main loop that iterates over the list of elements is divided into multiple
iterations that are executed concurrently by multiple threads. Each thread sorts a subset of the list, and the threads
synchronize their work at the end of each iteration to ensure that the elements are properly ordered.
● Parallel Bubble Sort can provide a significant speedup over the regular Bubble Sort algorithm, especially when
sorting large datasets on multi-core processors. However, the speedup is
limited by the overhead of thread creation and synchronization, and it may not be worth the effort for small
datasets or when using a single-core processor.
To measure the performance of sequential Bubble sort and parallel Bubble sort algorithms, you can follow
these steps:
2. Choose a range of test cases, such as arrays of different sizes and different degrees of sortedness, to
test the performance of both algorithms.
When measuring the performance of the parallel Bubble sort algorithm, you will need to specify the number of threads
to use. You can experiment with different numbers of threads to find the optimal value for your system.
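As a concrete sketch of these ideas (illustrative only: it uses the odd-even transposition formulation of parallel Bubble Sort in C++ with OpenMP, omp_get_wtime() for timing, and an arbitrary array size), the sequential and parallel versions can be compared as follows:
#include <iostream>
#include <vector>
#include <algorithm>
#include <cstdlib>
#include <omp.h>
using namespace std;

// Sequential bubble sort
void bubble_sort(vector<int>& a) {
    for (size_t i = 0; i < a.size(); i++)
        for (size_t j = 0; j + 1 < a.size(); j++)
            if (a[j] > a[j + 1]) swap(a[j], a[j + 1]);
}

// Parallel (odd-even transposition) bubble sort: in each phase, disjoint
// adjacent pairs are compared and swapped concurrently by the threads.
void parallel_bubble_sort(vector<int>& a) {
    int n = (int)a.size();
    for (int phase = 0; phase < n; phase++) {
        int start = phase % 2;   // even phase: pairs (0,1),(2,3),...; odd phase: (1,2),(3,4),...
        #pragma omp parallel for
        for (int j = start; j + 1 < n; j += 2)
            if (a[j] > a[j + 1]) swap(a[j], a[j + 1]);
    }
}

int main() {
    const int N = 100000;        // arbitrary test size
    vector<int> data(N);
    for (int i = 0; i < N; i++) data[i] = rand();
    vector<int> copy1 = data, copy2 = data;

    double t1 = omp_get_wtime();
    bubble_sort(copy1);
    double t2 = omp_get_wtime();
    parallel_bubble_sort(copy2);
    double t3 = omp_get_wtime();

    cout << "Sequential bubble sort time: " << (t2 - t1) << " s" << endl;
    cout << "Parallel bubble sort time:   " << (t3 - t2) << " s" << endl;
    return 0;
}
Running the program with different values of the OMP_NUM_THREADS environment variable shows how the speedup changes with the number of threads.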
How to check CPU utilization and memory consumption in ubuntu
In Ubuntu, you can use a variety of tools to check CPU utilization and memory consumption. Here are some common
tools:
1. top: The top command provides a real-time view of system resource usage, including CPU utilization and
memory consumption. To use it, open a terminal window and type top. The output will display a list of processes
sorted by resource usage, with the most resource-intensive processes at the top.
2. htop: htop is a more advanced version of top that provides additional features, such as interactive process
filtering and a color-coded display. To use it, open a terminal window and type htop.
3. ps: The ps command provides a snapshot of system resource usage at a particular moment in time. To use it,
open a terminal window and type ps aux. This will display a list of all running processes and their resource
usage.
4. free: The free command provides information about system memory usage, including total, used, and free
memory. To use it, open a terminal window and type free -h.
5. vmstat: The vmstat command provides a variety of system statistics, including CPU utilization, memory usage,
and disk activity. To use it, open a terminal window and type vmstat.
Code to Implement parallel bubble sort using OpenMP
import numpy as np
import time
import random
import omp
def parallel_bubble_sort(arr):
n = len(arr)
for i in range(n):
# This guarantees that the array is fully sorted before the loop ends
if __name__ == '__main__':
Output:
import numpy as np
import time
import random
import omp
def parallel_merge_sort(arr):
n = len(arr)
# Base case
if n == 1:
    return arr
# Use the parallel construct to distribute the loop iterations among the threads
# Each thread merges a portion of the array
with omp.parallel(num_threads=omp.get_max_threads(), default_shared=False, private=['k']):
for k in range(n1+n2):
if i == n1:
merged_arr[k:] = right_sorted[j:]
break
elif j == n2:
merged_arr[k:] = left_sorted[i:]
break
elif left_sorted[i] <= right_sorted[j]:
merged_arr[k] = left_sorted[i]
i += 1
else:
    merged_arr[k] = right_sorted[j]
    j += 1
return merged_arr
if __name__ == '__main__':
sorted_arr = parallel_merge_sort(arr)
end_time = time.time()
Output:
Conclusion: In this way we can implement Bubble Sort in a parallel way using OpenMP and also come to know how to
measure the performance of serial and parallel algorithms.
Title of the Assignment: Implement Min, Max, Sum and Average operations using Parallel Reduction.
Objective of the Assignment: Students should be able to learn about how to perform min, max, sum, and average
operations on a large set of data using parallel reduction technique in CUDA. The program defines four kernel
functions, reduce_min, reduce_max, reduce_sum, and reduce_avg.
Prerequisite:
1. Knowledge of parallel programming concepts and techniques, such as shared memory, threads, and
synchronization.
2. Familiarity with a parallel programming library or framework, such as OpenMP, MPI, or CUDA.
4. A programming language that supports parallel programming constructs, such as C, C++, Fortran, or
Python.
Contents of Theory :
Parallel reduction is a common technique used in parallel computing to perform a reduction operation on a large
dataset. A reduction operation combines a set of values into a single value, such as computing the sum, maximum,
minimum, or average of the values. Parallel reduction exploits the parallelism available in modern multicore
processors, clusters of computers, or GPUs to speed up the computation.
The parallel reduction algorithm works by dividing the input data into smaller chunks that can be processed independently
in parallel. Each thread or process computes the reduction operation on its local chunk of data, producing a partial result.
The partial results are then combined in a hierarchical manner until a single result is obtained.
The most common parallel reduction algorithm is the binary tree reduction algorithm, which has a logarithmic time
complexity and can achieve optimal parallel efficiency. In this algorithm, the input
data is initially divided into chunks of size n, where n is the number of parallel threads or processes. Each thread or
process computes the reduction operation on its chunk of data, producing n partial results.
The partial results are then recursively combined in a binary tree structure, where each internal node represents the reduction
operation of its two child nodes. The tree structure is built in a bottom-up manner, starting from the leaf nodes and ending
at the root node. Each level of the
tree reduces the number of partial results by a factor of two, until a single result is obtained at the root node.
The binary tree reduction algorithm can be implemented using various parallel programming models, such as OpenMP,
MPI, or CUDA. In OpenMP, the algorithm can be implemented using the parallel and for directives for parallelizing
the computation, and the reduction clause for combining the partial results. In MPI, the algorithm can be implemented
using the MPI_Reduce function for performing the reduction operation, and the MPI_Allreduce function for
distributing the result to all processes. In CUDA, the algorithm can be implemented using the parallel reduction kernel,
which uses shared memory to store the partial results and reduce the memory access latency.
Parallel reduction has many applications in scientific computing, machine learning, data analytics, and computer
graphics. It can be used to compute the sum, maximum, minimum, or average of large datasets, to perform data
filtering, feature extraction, or image processing, to solve optimization problems, or to accelerate numerical
simulations. Parallel reduction can also be combined with other parallel algorithms, such as parallel sorting, searching,
or matrix operations, to achieve higher performance and scalability.
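To make the hierarchical (binary-tree) combine step concrete, the following minimal OpenMP sketch sums an array by repeatedly combining adjacent pairs of partial results (the array size is assumed to be a power of two purely for simplicity; this is an illustration, not part of the assignment listings below):
#include <iostream>
#include <vector>
#include <omp.h>
using namespace std;

int main() {
    const int N = 1024;                   // assumed to be a power of two for simplicity
    vector<double> data(N, 1.0);          // sample data: N ones, so the sum should equal N

    // Binary-tree reduction: in each step, element i absorbs element i + stride,
    // halving the number of partial results until only data[0] remains.
    for (int stride = 1; stride < N; stride *= 2) {
        #pragma omp parallel for
        for (int i = 0; i < N; i += 2 * stride)
            data[i] += data[i + stride];
    }

    cout << "Sum = " << data[0] << endl;  // the final result sits at the root of the tree
    return 0;
}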
Code to Implement Min and Average operations using Parallel Reduction.
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>
#define DATA_SIZE 1000000
// Computes the minimum and the average of data[] using OpenMP parallel reduction.
void parallel_reduction_min_avg(int *data, int n, int *min_val_ptr, double *avg_val_ptr) {
    int min_val = data[0];
    long long sum_val = 0;
    // Each thread reduces its own portion of the data; OpenMP combines the partial results.
    #pragma omp parallel for reduction(min:min_val) reduction(+:sum_val)
    for (int i = 0; i < n; i++) {
        if (data[i] < min_val) min_val = data[i];
        sum_val += data[i];
    }
    // the final minimum value is the minimum value of the entire dataset
    *min_val_ptr = min_val;
    // the final average value is the sum of the entire dataset divided by its size
    *avg_val_ptr = (double)sum_val / n;
}
int main() {
    int data_size = DATA_SIZE;
    int *data = (int *)malloc(data_size * sizeof(int));
    for (int i = 0; i < data_size; i++) data[i] = rand() % 1000;
    int min_val;
    double avg_val;
    parallel_reduction_min_avg(data, data_size, &min_val, &avg_val);
    printf("Minimum value: %d\n", min_val);
    printf("Average value: %lf\n", avg_val);
    free(data);
    return 0;
}
Code to Implement Max and Sum operations using Parallel Reduction.
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>
#define DATA_SIZE 1000000
// Computes the maximum and the sum of data[] using OpenMP parallel reduction.
void parallel_reduction_max_sum(int *data, int n, int *max_val_ptr, long long *sum_val_ptr) {
    *max_val_ptr = data[0];
    *sum_val_ptr = 0;
    #pragma omp parallel
    {
        // Each thread computes a partial maximum and a partial sum over its share of the data.
        int thread_max_val = data[0];
        long long thread_sum_val = 0;
        #pragma omp for nowait
        for (int i = 0; i < n; i++) {
            if (data[i] > thread_max_val) thread_max_val = data[i];
            thread_sum_val += data[i];
        }
        // Combine the partial results; the critical section serializes the updates.
        #pragma omp critical
        {
            if (thread_max_val > *max_val_ptr) *max_val_ptr = thread_max_val;
            *sum_val_ptr += thread_sum_val;
        }
    }
}
int main() {
    int data_size = DATA_SIZE;
    int *data = (int *)malloc(data_size * sizeof(int));
    for (int i = 0; i < data_size; i++) data[i] = rand() % 1000;
    int max_val;
    long long sum_val;
    parallel_reduction_max_sum(data, data_size, &max_val, &sum_val);
    printf("Maximum value: %d\n", max_val);
    printf("Sum value: %lld\n", sum_val);
    free(data);
    return 0;
}
This program uses CUDA to add two large vectors of size 1000000. The vectors are initialized on the host, and then
copied to the device memory. A kernel function is defined to perform the vector addition, and then launched on the
device. The result is copied back to the host memory and verified. Finally, the device and host memories are
freed.
CUDA Program for Matrix Multiplication:
This program multiplies two matrices of size n using CUDA. It first allocates host memory for the matrices and
initializes them. Then it allocates device memory and copies the matrices to the device. It sets the kernel launch
configuration and launches the kernel function matrix_multiply. The kernel function performs the matrix multiplication
and stores the result in matrix c. Finally, it copies the result back to the host and frees the device and host memory.
The kernel function calculates the row and column indices of the output matrix using the block index and thread
index. It then uses a for loop to calculate the sum of the products of the corresponding elements in the input
matrices. The result is stored in the output matrix.
Note that in this program, we use CUDA events to measure the elapsed time of the kernel function. This is because
the kernel function runs asynchronously on the GPU, so we need to use events to synchronize the host and device
and measure the time accurately.
#include <stdio.h>
#define BLOCK_SIZE 16
__global__ void matrix_multiply(float *a, float *b, float *c, int n)
{
int row = blockIdx.y * blockDim.y + threadIdx.y;
int col = blockIdx.x * blockDim.x + threadIdx.x;
float sum = 0;
if (row < n && col < n) {
for (int i = 0; i < n; ++i) {
sum += a[row * n + i] * b[i * n + col];
}
c[row * n + col] = sum;
}
}
int main()
{
int n = 1024;
size_t size = n * n * sizeof(float);
float *a, *b, *c;
float *d_a, *d_b, *d_c;
cudaEvent_t start, stop;
float elapsed_time;
// Allocate host memory
a = (float *)malloc(size);
b = (float *)malloc(size);
c = (float *)malloc(size);
// Initialize matrices
Objective of the Assignment: Students get hands-on experience in developing high-performance computing
applications for the AI/ML domain. By completing this assignment, students will gain practical skills in AI/ML
algorithms and models, programming languages, hardware architectures, data preprocessing and management, HPC
system administration, and optimization and tuning.
Prerequisite:
1. Knowledge of AI/ML algorithms and models: A deep understanding of AI/ML algorithms and models is
essential to design and implement an HPC application that can efficiently perform large-scale training and
inference. This requires knowledge of statistical methods, linear algebra, optimization techniques, and deep
learning frameworks such as TensorFlow, PyTorch, and MXNet.
2. Proficiency in programming languages: Proficiency in programming languages such as C++, Python, and
CUDA is essential to develop an HPC application for AI/ML. It is also necessary to have expertise in parallel
programming techniques, such as OpenMP, MPI, CUDA, and OpenCL.
3. Knowledge of hardware architectures: Knowledge of different hardware architectures, such as CPU, GPU,
FPGA, and ASIC, is essential to select the most suitable hardware platform for the HPC application. It is also
necessary to have expertise in optimizing the HPC application for specific hardware architectures.
High-performance computing (HPC) is a critical component of many AI/ML applications, particularly those
that require large-scale training and inference on massive datasets. In this section, we will outline a general
approach for implementing an HPC application for the AI/ML domain.
Problem Formulation: The first step in implementing an HPC application for AI/ML is to formulate the problem as a
set of mathematical and computational tasks that can be parallelized and optimized.
This involves defining the problem domain, selecting appropriate algorithms and models, and determining the
computational and memory requirements.
Hardware Selection: The next step is to select the appropriate hardware platform for the HPC application. This
involves considering the available hardware options, such as CPU, GPU, FPGA, and ASIC, and selecting the most
suitable option based on the performance, cost, power consumption, and scalability requirements.
Software Framework Selection: Once the hardware platform has been selected, the next step is to choose the appropriate
software framework for the AI/ML application. This involves considering the available options, such as TensorFlow,
PyTorch, MXNet, and Caffe, and selecting the most suitable framework based on the programming language,
performance, ease of use, and community support.
Data Preparation and Preprocessing: Before training or inference can be performed, the data must be prepared and
preprocessed. This involves cleaning the data, normalizing and scaling the data, and splitting the data into training,
validation, and testing sets. The data must also be stored in a format that is compatible with the selected software
framework.
Model Training or Inference: The main computational task in an AI/ML application is model training or inference.
In an HPC application, this task is parallelized and optimized to take advantage of the available hardware resources.
This involves breaking the model into smaller tasks that can be parallelized, using techniques such as data parallelism,
model parallelism, or pipeline parallelism. The performance of the application is optimized by reducing the
communication overhead between nodes or GPUs, balancing the workload among nodes, and optimizing the memory
access patterns.
Model Evaluation: After the model has been trained or inference has been performed, the performance of the model
must be evaluated. This involves computing the accuracy, precision, recall, and other metrics on the validation and
testing sets. The performance of the HPC application is evaluated by measuring the speedup, scalability, and efficiency
of the parallelized tasks.
Optimization and Tuning: Finally, the HPC application must be optimized and tuned to achieve the best possible
performance. This involves profiling the code to identify bottlenecks and optimizing the code using techniques such
as loop unrolling, vectorization, and cache optimization. The performance of the application is also affected by the
choice of hyperparameters, such as the learning rate, batch size, and regularization strength, which must be tuned using
techniques such as grid search or Bayesian optimization.
Approach: We will use TensorFlow to define and train the neural network and use a parallel computing framework to parallelize the training process.
Requirements:
Steps:
Code:
import tensorflow as tf
model = tf.keras.models.Sequential([
tf.keras.layers.MaxPooling2D((2, 2)),
tf.keras.layers.Flatten(), tf.keras.layers.Dense(10,
activation='softmax')
])
mnist = tf.keras.datasets.mnist
end = n
x_train_chunk = x_train[start:end]
y_train_chunk = y_train[start:end]
# Compile the model
model.compile(optimizer='adam',
loss='sparse_categorical_crossentropy',
metrics=['accuracy'])
print(f"Epoch {epoch + 1}: Train accuracy = {train_acc:.4f}, Test accuracy = {test_acc /size:.4f}")
Output:
Epoch 1: Train accuracy = 0.9773, Test accuracy = 0.9745
Epoch 2: Train accuracy = 0.9859, Test accuracy = 0.9835
Epoch 3: Train accuracy = 0.9887, Test accuracy = 0.9857
Epoch 4: Train accuracy = 0.9905, Test accuracy = 0.9876
Epoch 5: Train accuracy = 0.9919, Test accuracy = 0.9880
Conclusion:
Implementing an HPC application for the AI/ML domain involves formulating the problem, selecting the hardware
and software frameworks, preparing and preprocessing the data, parallelizing and optimizing the model training
or inference tasks, evaluating the model performance, and optimizing and tuning the HPC application for
maximum performance. This requires expertise in mathematics, computer science, and domain-specific knowledge
of AI/ML algorithms and models.
Mini Project: Evaluate performance enhancement of parallel Quicksort Algorithm using MPI
Objective: Sort a large dataset of numbers using parallel Quicksort Algorithm with MPI and compare its performance
with the serial version of the algorithm.
Approach: We will use Python and MPI to implement the parallel version of Quicksort Algorithm and compare its
performance with the serial version of the algorithm.
Requirements:
Python 3.x
mpi4py
Theory :
Similar to mergesort, QuickSort uses a divide-and-conquer strategy and is one of the fastest sorting algorithms; it can
be implemented in a recursive or iterative fashion. The divide and conquer is a general algorithm design paradigm and
key steps of this strategy can be summarized as follows:
• Divide: Divide the input data set S into disjoint subsets S1, S2, S3…Sk.
• Recursion: Solve the sub-problems associated with S1, S2, S3…Sk.
• Conquer: Combine the solutions for S1, S2, S3…Sk. into a solution for S.
• Base case: The base case for the recursion is generally subproblems of size 0 or 1.
Many studies [2] have revealed that in order to sort N items, QuickSort takes an average running time of O(N log N).
The worst-case running time for QuickSort occurs when the pivot is a unique minimum or maximum element, and
as stated in [2], the worst-case running time for QuickSort on N items is O(N^2). These different running times can be
influenced by the input distribution (uniform, sorted or semi-sorted, unsorted, duplicates) and the choice of the pivot
element. Here is a simple pseudocode of the QuickSort algorithm adapted from Wikipedia [1].
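A standard recursive formulation can also be sketched as follows (a minimal C++ illustration using the last element as the pivot; the sample input is arbitrary):
#include <iostream>
#include <vector>
#include <utility>
using namespace std;

// Partition around the last element and return the pivot's final index.
int partition_last(vector<int>& a, int low, int high) {
    int pivot = a[high];
    int i = low - 1;
    for (int j = low; j < high; j++)
        if (a[j] < pivot) swap(a[++i], a[j]);
    swap(a[i + 1], a[high]);
    return i + 1;
}

// Divide: split around the pivot; Recursion: sort each side; Base case: size 0 or 1.
void quicksort(vector<int>& a, int low, int high) {
    if (low < high) {
        int p = partition_last(a, low, high);
        quicksort(a, low, p - 1);
        quicksort(a, p + 1, high);
    }
}

int main() {
    vector<int> v = {29, 4, 7, 15, 1, 42};
    quicksort(v, 0, (int)v.size() - 1);
    for (int x : v) cout << x << " ";
    cout << endl;
    return 0;
}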
We have made use of Open MPI as the backbone library for parallelizing the QuickSort algorithm. In fact, learning
message passing interface (MPI) allows us to strengthen our fundamental knowledge on parallel programming, given that
MPI is lower level than equivalent libraries (OpenMP). As simple as its name means, the basic idea behind MPI is that
messages can be passed or exchanged among different processes in order to perform a given task. An illustration can be a
communication and coordination by a master process which splits a huge task into chunks and shares them to its slave
processes. Open MPI is developed and maintained by a consortium of academic, research and industry partners; it
combines the expertise, technologies and resources all across the high performance computing community [11]. As
elaborated in [4], MPI has two types of communication routines: point-to-point communication routines and collective
communication routines. Collective routines as explained in the implementation section have been used in this study.
Algorithm :
In general, the overall algorithm used here to perform QuickSort with MPI works as followed:
iii. Divide the input size SIZE by the number of participating processes npes to get each chunk size, local size.
1. Initialize MPI:
from mpi4py import MPI
comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()
def quicksort_parallel(arr):
    if len(arr) <= 1:
        return arr
    pivot = arr[len(arr) // 2]
    left = []
    middle = []
    right = []
    for x in arr:
        if x < pivot:
            left.append(x)
        elif x == pivot:
            middle.append(x)
        else:
            right.append(x)
comm.barrier()
comm.Scatter(left, chunk_left, root=0)
comm.Scatter(middle, chunk_middle, root=0)
comm.Scatter(right, chunk_right, root=0)
start_time = time.time()
quicksort_serial(arr)
5. Compare the performance of the serial and parallel versions of the algorithm:
if rank == 0:
print(f"Serial Quicksort Algorithm time: {serial_time:.4f} seconds") print(f"Parallel Quicksort
Algorithm time: {parallel_time:.4f} seconds")
Output:
Theory - Huffman Encoding is a lossless data compression algorithm that works by assigning variable-length codes
to the characters in a given text or data stream based on their frequency of occurrence. This encoding scheme can
be implemented on GPU to speed up the encoding process.
The variable-length codes assigned to input characters are Prefix Codes, means the codes (bit sequences) are assigned
in such a way that the code assigned to one character is not the prefix of code assigned to any other character. This is how
Huffman Coding makes sure that there is no ambiguity when decoding the generated bitstream.
Let us understand prefix codes with a counter example. Let there be four characters a, b, c and d, and their corresponding
variable length codes be 00, 01, 0 and 1. This coding leads to ambiguity because code assigned to c is the prefix of
codes assigned to a and b. If the compressed bit stream is 0001, the de-compressed output may be “cccd” or “ccb” or
“acd” or “ab”.
To optimize the implementation for GPU, we can use parallel programming techniques such as CUDA, OpenCL, or
HIP to parallelize the calculation of character frequencies, construction of the Huffman tree, and generation of Huffman
codes.
Here are some specific optimizations that can be applied to each step:
1.Calculating character frequencies:
Use parallelism to split the input text into chunks and count the frequencies of each character in parallel on different
threads.
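Although the assignment targets a GPU implementation, the same chunk-based counting idea can be illustrated with a CPU sketch using OpenMP (illustrative only; a CUDA kernel would apply the same per-chunk counting followed by a combine step):
#include <iostream>
#include <string>
#include <vector>
#include <omp.h>
using namespace std;

int main() {
    string text = "Hello, world!";                 // example input text
    vector<long long> freq(256, 0);                // global histogram, one counter per byte value

    #pragma omp parallel
    {
        vector<long long> local(256, 0);           // per-thread partial histogram

        // Split the text into chunks; each thread counts the characters in its chunk.
        #pragma omp for nowait
        for (int i = 0; i < (int)text.size(); i++)
            local[(unsigned char)text[i]]++;

        // Combine the partial histograms into the global one.
        #pragma omp critical
        for (int c = 0; c < 256; c++)
            freq[c] += local[c];
    }

    for (int c = 0; c < 256; c++)
        if (freq[c] > 0)
            cout << "'" << (char)c << "': " << freq[c] << "\n";
    return 0;
}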
Source Code -
Output -
Input text: Hello, world!
Encoded text: 01000110 11010110 10001011 10101110 11110100 11011111 00101101 01000000
11111010
Theory -
Query processing is the process through which a Database Management System (DBMS) parses, verifies, and optimizes a
given query before creating low-level code that the DB understands.
Query Processing in DBMS, like any other High-Level Language (HLL) where code is first generated and then executed
to perform various operations, has two phases: compile-time and runtime.
The use of declarative query languages and query optimization is one of the main factors contributing to the success of
RDBMS technology. Any database allows users to create queries to request specific data, and the database then uses
effective methods to locate the requested data.
A database optimization approach based on CMP has been studied by numerous other academics. But the majority of
their effort was on optimizing join operations while taking into account the L2-cache and the parallel buffers of the
shared main memory.
The following techniques can be used to make a query parallel
• I/O parallelism
• Internal parallelism of queries
• Parallelism among queries
• Within-operation parallelism
• Parallelism in inter-operation
I/O parallelism :
This type of parallelism involves partitioning the relationships among the discs in order to speed up the retrieval of
relationships from the disc.
The inputted data is divided within, and each division is processed simultaneously. After processing all of the partitioned
data, the results are combined. Another name for it is data partitioning.
Hash partitioning is best suited for point queries that are based on the partitioning attribute and have the benefit of
offering an even distribution of data across the discs.
It should be mentioned that partitioning is beneficial for sequential scans of a full table stored on "n" discs and for the
speed at which the table may be scanned: compared to a single-disc system, scanning the relation takes around 1/n of the
time needed to scan the table. In I/O parallelism, there are four different methods of partitioning:
Range partitioning :
Each disc receives continuous attribute value ranges while using range partitioning. For instance, if we are range
partitioning across three discs numbered 0, 1, and 2, tuples with a value of less than 5 are written to disc 0, values from
5 to 40 are sent to disc 1, and values above 40 are written to disc 2.
It has several benefits, such as keeping tuples whose attribute values fall within a specified range together on the same disc.
Round-robin partitioning :
Any order can be used to study the relationships in this method. It sends the ith tuple to the disc number (i% n).
Therefore, new rows of data are received by discs in turn. For applications that want to read the full relation sequentially
for each query, this strategy assures an even distribution of tuples across drives.
Schema Partitioning :
Various tables inside a database are put on different discs using a technique called schema partitioning.
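To summarise how a tuple's target disc is chosen under round-robin, hash, and range partitioning, here is a small illustrative sketch (the disc count and the value ranges simply mirror the examples above):
#include <iostream>
#include <functional>
#include <string>

// Round-robin: the i-th tuple is written to disc (i % n).
int round_robin_disc(int tuple_index, int n) {
    return tuple_index % n;
}

// Hash partitioning: the partitioning attribute is hashed onto one of n discs,
// giving an even spread of data and fast point queries on that attribute.
int hash_disc(const std::string& key, int n) {
    return (int)(std::hash<std::string>{}(key) % n);
}

// Range partitioning with the ranges from the example above:
// values below 5 go to disc 0, values 5 to 40 to disc 1, values above 40 to disc 2.
int range_disc(int value) {
    if (value < 5) return 0;
    if (value <= 40) return 1;
    return 2;
}

int main() {
    std::cout << round_robin_disc(7, 3) << " "   // 7 % 3 = 1
              << hash_disc("MH-12-AB-1234", 3) << " "
              << range_disc(17) << std::endl;    // 17 falls in the 5..40 range, so disc 1
    return 0;
}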
Intra-query parallelism :
Using a shared-nothing paralleling architecture technique, intra-query parallelism refers to the processing of a single
query in a parallel process on many CPUs.
This employs two different strategies:
First method — In this method, a duplicate task can be executed on a small amount of data by each CPU.
Second method — Using this method, the task can be broken up into various sectors, with each CPU carrying out a separate
subtask.
Inter-query parallelism
Each CPU executes numerous transactions when inter-query parallelism is used. Parallel transaction processing is what it
is known as. To support inter-query parallelism, DBMS leverages transaction dispatching.
We can also employ a variety of techniques, such as effective lock management. This technique runs each query
sequentially, which slows down the running time.
In such circumstances, DBMS must be aware of the locks that various transactions operating on various processes
have acquired. When simultaneous transactions don’t access the same data, inter-query parallelism on shared
storage architecture works well.
Intra-operation parallelism :
In this type of parallelism, we execute each individual operation of a task, such as sorting, joins, projections, and so forth,
in parallel. Intra-operation parallelism has a very high parallelism level. Database systems naturally employ this kind of
parallelism. Consider the following SQL example: SELECT * FROM vehicles ORDER BY model_no;
Since a relation might contain a high number of records, the relational operation in the aforementioned
query is sorting.
Because this operation can be done on distinct subsets of the relation in several processors, it takes less time to sort the
data.
Inter-operation parallelism :
This term refers to the concurrent execution of many operations within a query expression. They come in two varieties:
Pipelined parallelism — In pipeline parallelism, a second operation consumes a row of the first operation’s output before
the first operation has finished producing the whole set of rows in its output. Additionally, it is feasible to perform these
two processes concurrently on several CPUs, allowing one operation to consume tuples concurrently with another operation
and thereby reduce them.
It is advantageous for systems with a limited number of CPUs and prevents the storage of interim results on a disc.
Independent parallelism- In this form of parallelism, operations contained within query phrases that are independent of
one another may be carried out concurrently. This analogy is extremely helpful when dealing with parallelism of a
lower degree.
The development of parallel database systems is an example of how database management and parallel processing techniques can be combined. The main steps involved in parallelizing query processing are:
1. Partitioning: The first step is to partition the data into smaller subsets. The partitioning can be done based on different
criteria, such as range partitioning, hash partitioning, or list partitioning. This can be done in parallel by assigning
different processors or nodes to handle different parts of the partitioning process.
2. Query optimization: Once the data is partitioned, the next step is to optimize the queries. Query optimization involves
finding the most efficient way to execute the query by considering factors such as index usage, join methods, and
filtering. This can also be done in parallel by assigning different processors or nodes to handle different parts of the
query optimization process.
3. Query execution: After the queries are optimized, the final step is to execute the queries. The execution can be done
in parallel by assigning different processors or nodes to handle different parts of the execution process. The results can
then be combined to generate the final result set.
To implement parallelization of database query optimization, we can use parallel programming frameworks such as
OpenMP or CUDA. These frameworks provide a set of APIs and tools to distribute the workload among multiple
processors or nodes and to manage the synchronization and communication between them.
Here's an example of how we can parallelize the query optimization process using OpenMP:
// C++
// Partition the data in parallel: each thread builds one partition.
int num_partitions = omp_get_max_threads();
std::vector<std::vector<int>> partitions(num_partitions);
#pragma omp parallel for
for (int i = 0; i < num_partitions; i++) {
    // partition_data() is assumed to return the i-th subset of the input data
    partitions[i] = partition_data(data, i, num_partitions);
}
In this example, we first partition the data into smaller subsets using OpenMP parallelism. Then we optimize each
query in parallel by assigning different processors or nodes to handle different parts of the optimization process. Finally,
we execute the queries in parallel by assigning different processors or nodes to handle different parts of the execution
process.
Parallelization of database query optimization can significantly improve the performance of database operations and
reduce query execution time. However, it requires careful consideration of the workload distribution, synchronization,
and communication between processors or nodes.
Theory -
Here's an example code that implements non-serial polyadic dynamic programming with GPU parallelization using
CUDA:
#include <iostream>
#include <cuda.h>
int main() {
// Allocate memory for the input arrays on the CPU
float* x = new float[N];
float* y = new float[M];
float* z = new float[K];
float* d_z;
cudaMalloc(&d_z, K * sizeof(float));
cudaMemcpy(d_z, z, K * sizeof(float), cudaMemcpyHostToDevice);
// Compute the dp array on the GPU
dim3 blocksPerGrid((N + BLOCK_SIZE - 1) / BLOCK_SIZE, (M + BLOCK_SIZE - 1) / BLOCK_SIZE, (K +
BLOCK_SIZE - 1) / BLOCK_SIZE);
dim3 threadsPerBlock(BLOCK_SIZE, BLOCK_SIZE, BLOCK_SIZE);
Objective of the Assignment: Students should be able to implement linear regression by using deep neural networks.
Students should know about neural networks and its importance over machine learning models.
Prerequisite:
Linear Regression : Linear regression is a basic and commonly used type of predictive analysis. The overall idea of
regression is to examine two things: (1) does a set of predictor variables do a good job in predicting an outcome
(dependent) variable? (2) Which variables in particular are significant predictors of the outcome variable, and in what
way do they–indicated by the magnitude and sign of the beta estimates–impact the outcome variable? These regression
estimates are used to explain the relationship between one dependent variable and one or more independent variables.
The simplest form of the regression equation with one dependent and one independent variable is defined by the formula
y = c + b*x, where y = estimated dependent variable score, c = constant, b = regression coefficient, and x = score on the
independent variable.
The basic unit of the brain is known as a neuron; there are approximately 86 billion neurons in our nervous system
which are connected to 10^14-10^15 synapses. Each neuron receives a signal from
the synapses and gives output after processing the signal. This idea is drawn from the brain to build a neural network.
Each neuron performs a dot product between the inputs and weights, adds biases, applies an activation function, and
gives out the outputs. When a large number of neurons are present together to give out a large number of outputs, it forms
a neural layer. Finally, multiple layers combine to form a neural network.
Neural networks are formed when multiple neural layers combine with each other to give out a network, or we can say
that there are some layers whose outputs are inputs for other layers. The most common type of layer to construct a basic
neural network is the fully connected layer, in which the adjacent layers are fully connected pairwise and neurons in a
single layer are not connected to each other.
Naming conventions. When we say an N-layer neural network, we do not count the input layer. Therefore, a single-layer neural
network describes a network with no hidden layers (input directly mapped to output). In the case of our code, we’re
going to use a single-layer neural network, i.e. We do not have a hidden layer.
Output layer. Unlike all layers in a Neural Network, the output layer neurons most commonly do not have an activation
function (or you can think of them as having a linear identity activation function). This is because the last output layer
is usually taken to represent the class scores (e.g. in classification), which are arbitrary real-valued numbers, or some
kind of real-valued target (e.g. In
regression). Since we’re performing regression using a single layer, we do not have any activation function.
The Boston Housing Dataset is a popular dataset in machine learning and contains information about various attributes
of houses in Boston. The goal of using deep neural networks on this dataset is to predict the median value of owner-
occupied homes.
The Boston Housing Dataset contains 13 input variables or features, such as crime rate, average number of rooms per
dwelling, and distance to employment centers. The target variable is the median value of owner-occupied homes. The
dataset has 506 rows, which is not very large, but still sufficient to train a deep neural network.
To implement a deep neural network on the Boston Housing Dataset, we can follow these steps:
Load the dataset: We can load the dataset using libraries like pandas or numpy.
Preprocess the data: We need to preprocess the data by scaling the input features so that they have zero mean and unit
variance. This step is important because it helps the neural network to converge faster.
Split the dataset: We split the dataset into training and testing sets. We can use a 70/30 or 80/20 split for training and
testing, respectively.
Define the model architecture: We need to define the architecture of our deep neural network. We can use libraries
like Keras or PyTorch to define our model. The architecture can include multiple hidden layers with various activation
functions and regularization techniques like dropout.
Compile the model: We need to compile the model by specifying the loss function, optimizer, and evaluation metrics.
For regression problems like this, we can use mean squared error as the loss function and adam optimizer.
Train the model: We can train the model using the training data. We can use techniques like early stopping to prevent
overfitting.
Evaluate the model: We can evaluate the model using the testing data. We can calculate the mean squared error or the
mean absolute error to evaluate the performance of the model.
Overall, using a deep neural network on the Boston Housing Dataset can result in accurate predictions of the median
value of owner-occupied homes. By following the above steps, we can implement a deep neural network and fine-tune it to achieve good predictive performance.
Practical Implementation of Boston Dataset and prediction using deep neural network.
import pandas as pd
df = pd.read_csv('boston_housing.csv')
Step 3: Split the dataset
from sklearn.model_selection import train_test_split
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
model.add(Dense(32, activation='relu'))
model.add(Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam', metrics=['mean_absolute_error'])
callbacks=[early_stopping])
# Plot the training and validation loss over epochs
import matplotlib.pyplot as plt
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend(['Training', 'Validation'])
plt.show()
Conclusion : In this way we are able to learn about the Deep Neural Network and its implementation for linear regression on the Boston Housing dataset.
Title of the Assignment: Classification using Deep neural network. Binary classification using Deep Neural Networks.
Example: Classify movie reviews into "positive" reviews and "negative" reviews, just based on the text content of the
reviews. Use the IMDB dataset.
Objective of the Assignment: Students should be able to implement deep neural networks on textual data and they
should know the basics of natural language processing and its applications in the real world.
Prerequisite:
Classification using deep neural networks is a popular approach to solve various supervised learning problems
such as image classification, text classification, speech recognition, and many more. In this approach, the neural
network is trained on labeled data to learn a mapping between the input features and the corresponding output
labels.
Binary classification is a type of classification problem in which the task is to classify the input data into one of
the two classes. In the example of classifying movie reviews as positive or negative, the input data is the text
content of the reviews, and the output labels are either positive or negative.
The deep neural network used for binary classification consists of multiple layers of interconnected neurons, which
are capable of learning complex representations of the input data. The first layer of the neural network is the input
layer, which takes the input data and passes it to the hidden layers.
The hidden layers perform non-linear transformations on the input data to learn more complex features. Each hidden
layer consists of multiple neurons, which are connected to the neurons of the previous and next layers. The activation
function of the neurons in the hidden layers introduces non-linearity into the network and allows it to learn complex
representations of the input data.
The last layer of the neural network is the output layer, which produces the classification result. In binary
classification, the output layer consists of one neuron, which produces the probability of the input data belonging to
the positive class. The probability of the input data belonging to the negative class can be calculated as (1 - probability
of positive class).
The training of the neural network involves optimizing the model parameters to minimize the loss function. The loss
function measures the difference between the predicted output and the actual output. In binary classification, the
commonly used loss function is binary cross-entropy loss.
The IMDB dataset is a popular dataset used for binary classification of movie reviews. It contains 50,000 movie
reviews, which are split into 25,000 reviews for training and 25,000 reviews for testing. The reviews are
preprocessed and encoded as sequences of integers, where each integer represents a word in the review. The deep
neural network can be trained on this dataset to classify the movie reviews into positive or negative categories.
In summary, binary classification using deep neural networks involves designing a neural network architecture with
multiple layers of interconnected neurons, training the network on labeled data using a suitable loss function, and
using the trained network to classify new data. The IMDB dataset provides a suitable example to implement and
test this approach on movie review classification.
Dataset information:
The IMDB (Internet Movie Database) dataset is a popular dataset used for sentiment analysis, particularly binary
classification of movie reviews into positive or negative categories. It consists of 50,000 movie reviews, which are
evenly split into a training set and a testing set, each containing 25,000 reviews.
The reviews are encoded as sequences of integers, where each integer represents a word in the review. The words are
indexed based on their frequency in the dataset, with the most frequent word
assigned the index 1, the second most frequent word assigned the index 2, and so on. The indexing is capped at a maximum vocabulary size (for example, the 10,000 most frequent words), and words outside this vocabulary are replaced by a special out-of-vocabulary token.
The reviews are preprocessed to remove punctuations and convert all the letters to lowercase. The reviews are also
padded or truncated to a fixed length, typically 250 words, to ensure all the input sequences have the same length.
Padding involves adding zeros to the end of the review sequence to make it of the fixed length, while truncating
involves cutting off the sequence at the maximum length.
The reviews are labeled as positive or negative based on the overall sentiment expressed in the review. The labels are
assigned as follows: reviews with a score of 7 or higher on a scale of
1-10 are labeled as positive, while reviews with a score of less than 4 are labeled as negative. Reviews with a score
between 4 and 7 are excluded from the dataset to ensure clear distinction between positive and negative categories.
The IMDB dataset is a popular benchmark dataset for sentiment analysis and has been used extensively to evaluate
various machine learning and deep learning models. Its popularity is attributed to the large size of the dataset, the
balanced distribution of positive and negative reviews, and the preprocessed format of the reviews.
1. Load the IMDB dataset using Keras' built-in imdb.load_data() function. This function loads the dataset and
preprocesses it as sequences of integers, with the labels already converted to binary (0 for negative, 1 for
positive).
2. Pad or truncate the sequences to a fixed length of 250 words using Keras'
pad_sequences() function.
3. Define a deep neural network architecture, consisting of an embedding layer to learn the word embeddings,
followed by multiple layers of bidirectional LSTM (Long Short-Term Memory) cells, and a final output layer
with a sigmoid activation function to output the binary classification.
4. Compile the model using binary cross-entropy loss and the Adam optimizer.
5. Train the model on the training set and validate on the validation set.
import numpy as np
The compile() method is used to compile the model with binary cross-entropy loss and the Adam optimizer. The fit()
method is used to train the model on the training set for 10 epochs with a
batch size of 128. The evaluate() method is used to evaluate the trained model on the test set and compute the accuracy and
loss.
This example demonstrates how deep neural networks can be used for binary classification on text data, specifically for
classifying movie reviews as positive or negative based on the text content.
Conclusion :
In this way we are able to learn about the Deep Neural Network and its implementation on the IMDB
dataset.Learn about sentiment analysis.
Title of the Assignment: Convolutional neural network (CNN). Use MNIST Fashion Dataset and create a classifier to
classify fashion clothing into categories.
Objective of the Assignment: Students should be able to implement Convolution Neural Network.
Implement classification of clothing categories on the basis of MNIST dataset.
Prerequisite:
Convolutional Neural Networks (CNNs) are a class of artificial neural networks that are specially designed to analyze and
classify images, videos, and other types of multidimensional data. They are widely used in computer vision tasks such as
image classification, object detection, and image segmentation.
The main idea behind CNNs is to perform convolutions, which are mathematical operations that apply a filter to an
image or other input data. The filter slides over the input data and performs a dot product between the filter weights and
the input values at each position, producing a new output value. By applying different filters at each layer, the network
learns to detect different features in the input data, such as edges, shapes, and textures.
CNNs typically consist of several layers that perform different operations on the input data. The most common
types of layers are:
Convolutional Layers: These layers perform convolutions on the input data using a set of filters. Each filter produces
a feature map, which represents the presence of a specific feature in the input data.
Pooling Layers: These layers reduce the spatial dimensions of the feature maps by taking the maximum or average
value within a small region of the feature map. This reduces the amount of computation needed in the subsequent
layers and makes the network more robust to small translations in the input data.
Activation Layers: These layers apply a nonlinear activation function, such as ReLU (Rectified Linear Unit), to the
output of the previous layer. This introduces nonlinearity into the network and allows it to learn more complex features.
Fully-Connected Layers: These layers connect all the neurons in the previous layer to all the neurons in the current
layer, similar to a traditional neural network. They are typically used at the end of the network to perform the final
classification.
The architecture of a CNN is typically organized in a series of blocks, each consisting of one or more convolutional
layers followed by pooling and activation layers. The output of the final block is then passed through one or more fully-
connected layers to produce the final output.
CNNs are trained using backpropagation, a process that updates the weights of the network based on the difference
between the predicted output and the true output, as measured by a loss function such as cross-entropy loss. In summary,
CNNs are a powerful class of neural networks that are specially designed for analyzing and classifying images and
other types of multidimensional data.
They achieve this by performing convolutions on the input data using a set of filters, and by using different types of
layers to reduce the spatial dimensions of the feature maps, introduce nonlinearity, and perform the final classification.
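As an illustration of this block structure, a small CNN can be sketched in Keras as follows. The layer counts, filter sizes, and the 28x28 grayscale input shape are assumptions, chosen here to match the Fashion-MNIST images discussed below.

from tensorflow import keras
from tensorflow.keras import layers

# Convolution + pooling blocks, followed by fully-connected layers
model = keras.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),  # convolution + ReLU activation
    layers.MaxPooling2D((2, 2)),                                            # downsample the feature maps
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),                                                       # flatten feature maps for the dense layers
    layers.Dense(64, activation='relu'),
    layers.Dense(10, activation='softmax')                                  # final 10-way classification
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])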
The MNIST Fashion Dataset is a widely used benchmark dataset in the field of computer vision and machine learning.
It consists of 70,000 grayscale images of clothing items, including dresses, shirts, sneakers, sandals, and more. The
dataset is split into 60,000 training images and 10,000 test images, with each image being a 28x28 pixel square.
The dataset is often used as a benchmark for classification tasks in computer vision, particularly for image recognition and
classification using neural networks. The dataset is considered relatively easy compared to other image datasets such as
ImageNet, but it is still a challenging task due to the variability in the clothing items and the low resolution of the images.
The goal of the MNIST Fashion Dataset is to correctly classify the clothing items into one of the ten categories: T-shirt/top,
Trouser, Pullover, Dress, Coat, Sandal, Shirt, Sneaker, Bag, and Ankle boot.
The dataset was created as a replacement for the original MNIST handwritten digit dataset, which was becoming too
easy for machine learning algorithms to classify accurately. The MNIST Fashion Dataset was created to provide a more
challenging classification task while still being a relatively small dataset that can be used for experimentation and
testing.
One common approach to tackling the MNIST Fashion Dataset is to use convolutional neural networks (CNNs), which
are specifically designed to process images. CNNs consist of multiple layers, including convolutional layers, pooling
layers, and fully connected layers. The convolutional layers extract features from the images, while the pooling layers
downsample the features to reduce the computational complexity. The fully connected layers perform the final
classification of the images.
Other approaches to tackling the MNIST Fashion Dataset include using other types of neural networks such as
recurrent neural networks (RNNs) and deep belief networks (DBNs), as well as using other machine learning
algorithms such as decision trees, support vector machines (SVMs), and k-nearest neighbor (KNN) classifiers.
Overall, the MNIST Fashion Dataset is a valuable benchmark dataset in the field of computer vision and machine
learning, and its popularity is likely to continue as new algorithms and models are developed and tested.
from tensorflow import keras
import matplotlib.pyplot as plt

# Load the Fashion-MNIST data and scale pixel values to [0, 1]
(train_images, train_labels), (test_images, test_labels) = keras.datasets.fashion_mnist.load_data()
train_images, test_images = train_images / 255.0, test_images / 255.0

# Define the classifier (the Dense layers and optimizer are assumed details not shown in the source)
model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dense(10, activation='softmax')
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# Train the model
model.fit(train_images, train_labels, epochs=10)

# Plotting fragments from the source (grid of sample predictions)
num_cols = 5
plt.ylim([0, 1])
plt.tight_layout()
Conclusion:
In this way we are able to implement a Convolutional Neural Network (CNN) using the MNIST Fashion Dataset.
Title of the Assignment: Recurrent Neural Network (RNN). Use the Google stock prices dataset and design a time
series analysis and prediction system using RNN.
Objective of the Assignment: Students should be able to implement a Recurrent Neural Network and design a
time series analysis and prediction system using RNN.
Prerequisite:
A recurrent neural network (RNN) is a type of neural network that is designed to work with sequential data. Unlike
traditional feedforward neural networks that only process input data in a single pass, RNNs maintain an internal state
or memory that allows them to process sequences of input data.
This makes RNNs well-suited for tasks such as natural language processing, speech recognition, and time series
analysis.
RNNs operate by passing the current input and their internal state through a set of interconnected nodes or "hidden
units." Each hidden unit takes in both the current input and the previous hidden state, and produces a new hidden state
as output. This process is repeated for each time step in the input sequence, with the output from the final hidden unit
being used as the network's overall output.
One of the key advantages of RNNs is their ability to handle variable-length input sequences, which makes them
particularly useful for tasks where the length of the input is not fixed. However, RNNs can also be difficult to train,
particularly when dealing with long input sequences or complex dependencies between inputs. To address these issues,
a number of modifications to the basic RNN architecture have been developed, including long short-term memory
(LSTM) networks and gated recurrent units (GRUs).
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Load the Google stock prices dataset
data = pd.read_csv('GOOG.csv')
dataset = data[['Close']].values                    # assumed: the closing price is the series to model

# Scale the values to the range [0, 1] (the scaler definition is assumed; the source shows only the transform call)
scaler = MinMaxScaler(feature_range=(0, 1))
dataset = scaler.fit_transform(dataset)

# Create the training and testing datasets
training_data_len = int(len(dataset) * 0.8)
training_data = dataset[:training_data_len]
testing_data = dataset[training_data_len:]

# Convert the series into supervised-learning samples of length time_step
def create_dataset(dataset, time_step=1):
    X, Y = [], []
    for i in range(len(dataset) - time_step - 1):
        X.append(dataset[i:(i + time_step), 0])
        Y.append(dataset[i + time_step, 0])
    return np.array(X), np.array(Y)

# Create the training and testing datasets with a time step of 60 days
time_step = 60
X_train, y_train = create_dataset(training_data, time_step)   # assumed calls; truncated in the source
X_test, y_test = create_dataset(testing_data, time_step)

# Stacked LSTM layers with dropout (the surrounding Sequential model definition and training code are truncated here)
model.add(Dropout(0.2))
model.add(LSTM(units=50, return_sequences=True))
model.add(Dropout(0.2))

Output:
Epoch 100/100
33/33 [==============================] - 7s 39ms/step - loss: 0.0013
Deep learning is a subset of machine learning that is inspired by the structure and function of the human brain. It has been
successfully applied to various computer vision tasks, including human face recognition. There are several theories related to
human face recognition using deep learning:
Convolutional Neural Network (CNN) Theory: This theory proposes that humans recognize faces by processing the visual
information in a hierarchical manner, similar to how a convolutional neural network operates. According to this theory, the
brain first detects low-level features, such as edges and corners, and then gradually builds up to higher-level features, such as
facial contours and expressions.
Autoencoder Theory: This theory suggests that humans learn to recognize faces by compressing the information into a lower-
dimensional space and then reconstructing it back to its original form, similar to how an autoencoder operates. According to
this theory, the brain uses a hierarchical representation of faces that allows for efficient processing and recognition.
Generative Adversarial Network (GAN) Theory: This theory proposes that humans recognize faces by learning to discriminate
between real and fake faces, similar to how a generative adversarial network operates. According to this theory, the brain
learns to distinguish between genuine facial features and artifacts caused by noise or distortions in the image.
Attention Mechanism Theory: This theory suggests that humans selectively attend to specific facial features, similar to how
an attention mechanism operates in deep learning. According to this theory, the brain focuses on salient features, such
as the eyes and mouth, while ignoring less important features, such as the nose or ears.
Overall, deep learning has shown great promise in advancing our understanding of human face recognition and has led to the
development of highly accurate face recognition systems.
Pretrained model:
A pre-trained model is a machine learning model that has already been trained on a large dataset and saved to disk, typically
using a supervised learning approach. The weights of the model are the learned parameters that have been optimized during
the training process to minimize the loss function, which measures the difference between the predicted outputs and the true
outputs.
Pre-trained models are useful in deep learning because they can be used as a starting point for transfer learning, where the
learned features of the pre-trained model are fine-tuned on a new dataset to improve its performance on a specific task.
The VGG16 model is a deep convolutional neural network that was first introduced in the 2014 ImageNet competition. It was
developed by the Visual Geometry Group (VGG) at the University of Oxford and consists of 16 weight layers (13 convolutional
and 3 fully connected). The model has a fixed input size of 224x224 pixels and takes an RGB image as input. The VGG16 model
achieved
state-of-the-art performance in the ImageNet classification task and has become a popular choice for transfer learning in
various computer vision applications. The pre-trained VGG16 model is available in several deep learning libraries, including
TensorFlow, Keras, and PyTorch.
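As an illustration, a pre-trained VGG16 can be loaded in Keras and used as a fixed feature extractor. This is a minimal sketch: the choice to drop the classification head and the example file name 'face.jpg' are assumptions for illustration.

import numpy as np
from tensorflow.keras.applications import VGG16
from tensorflow.keras.applications.vgg16 import preprocess_input
from tensorflow.keras.preprocessing import image

# Load VGG16 with ImageNet weights, without the fully-connected classification head
base_model = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))

# Extract features for a single (hypothetical) image file
img = image.load_img('face.jpg', target_size=(224, 224))
x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
features = base_model.predict(x)   # feature map of shape (1, 7, 7, 512)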
# Extract features from the dataset using the VGG16 model
def extract_features(directory):
    features = {}
    for file in os.listdir(directory + subdir):
        img_path = directory + subdir + '/' + file
        img = cv2.imread(img_path)
        # (the VGG16 forward pass that fills features[...] is truncated in the source)
    return features

# Split the extracted features and labels into training and testing sets
train_features, train_labels, test_features, test_labels = [], [], [], []
for key in features:
    test_features.append(features[key])
    test_labels.append(key.split('_')[0])

# Convert labels to one-hot encoding
unique_labels = list(set(train_labels))
label_map = {label: i for i, label in enumerate(unique_labels)}
train_labels = [label_map[label] for label in train_labels]
test_labels = [label_map[label] for label in test_labels]

# Training call (only the epochs argument survives in the source)
epochs=10,
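One way to complete the training step is to fit a small fully-connected classifier on the extracted features for 10 epochs. The sketch below is only illustrative: the layer sizes, batch size, and the assumption that the features have been flattened to equal-length vectors are not taken from the source.

import numpy as np
from tensorflow import keras

# Convert feature lists and integer labels to arrays (assumes features are flattened, equal-length vectors)
train_x, test_x = np.array(train_features), np.array(test_features)
train_y = keras.utils.to_categorical(train_labels)
test_y = keras.utils.to_categorical(test_labels)

# Small dense classifier head on top of the pre-extracted VGG16 features
clf = keras.Sequential([
    keras.layers.Dense(256, activation='relu', input_shape=(train_x.shape[1],)),
    keras.layers.Dense(train_y.shape[1], activation='softmax')
])
clf.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
clf.fit(train_x, train_y, epochs=10, batch_size=32, validation_data=(test_x, test_y))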
Title - Implement Gender and Age Detection: predict if a person is a male or female and also their age
Theory -
Collect and prepare the dataset: In this case, we can use the "UTKFace" dataset which contains images of faces with
their corresponding gender and age labels. We need to preprocess the data, such as resizing the images to a uniform size,
shuffling the dataset, and capping the age at a certain value (such as 100 years).
Split the dataset: Split the dataset into training, validation, and testing sets. The usual split ratio is 80%, 10%, and
10%, respectively.
Define data generators: Define data generators for training, validation, and testing sets using the
"ImageDataGenerator" class in Keras. This class provides data augmentation techniques that can improve the model's
performance, such as rotation, zoom, and horizontal flip.
Define the neural network model: Define a convolutional neural network (CNN) model that takes the face images as
input and outputs two values - the probability of being male and the predicted age. The model can have multiple
convolutional and pooling layers followed by some dense layers.
Compile the model: Compile the model with appropriate loss and metrics for each output (gender and age). In this
case, we can use binary cross-entropy loss for gender and mean squared error (MSE) for age, as in the sketch after these steps.
Train the model: Train the model using the fit method of the model object. We need to pass the data generators for
the training and validation sets, as well as the number of epochs and batch size.
Evaluate the model: Evaluate the model's performance on the testing set using the evaluate method of the model
object. This will give us the accuracy and mean absolute error (MAE) of the model.
Predict the gender and age of a sample image: Load a sample image and preprocess it. We can use the "cv2" library
to read the image, resize it to the same size as the training images, and normalize it. Then, we can use the "predict"
method of the model object to get the predicted gender and age.
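A minimal sketch of such a two-output CNN in Keras is shown below. The input size, layer widths, and output names are assumptions for illustration, not values fixed by the assignment.

from tensorflow import keras
from tensorflow.keras import layers

# Shared convolutional trunk
inputs = keras.Input(shape=(128, 128, 3))
x = layers.Conv2D(32, (3, 3), activation='relu')(inputs)
x = layers.MaxPooling2D()(x)
x = layers.Conv2D(64, (3, 3), activation='relu')(x)
x = layers.MaxPooling2D()(x)
x = layers.Flatten()(x)
x = layers.Dense(128, activation='relu')(x)

# Two output heads: gender (binary classification) and age (regression)
gender_output = layers.Dense(1, activation='sigmoid', name='gender')(x)
age_output = layers.Dense(1, activation='linear', name='age')(x)

model = keras.Model(inputs=inputs, outputs=[gender_output, age_output])
model.compile(optimizer='adam',
              loss={'gender': 'binary_crossentropy', 'age': 'mse'},
              metrics={'gender': 'accuracy', 'age': 'mae'})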
Title - Implement Colorizing Old B&W Images: color old black and white images to colorful images. Project title:
Colorizing Old B&W Images using CNN.
Theory - Colorizing black and white images into colorful images involves a complex process that requires expertise
and specialized software. However, here are some general steps involved in the process:
Scan the black and white image: The first step is to scan the black and white image and convert it into a digital format.
Preprocess the image: The image needs to be preprocessed to remove any scratches, dust, or other defects. This can
be done using image editing software like Photoshop or GIMP.
Convert the image to grayscale: The black and white image needs to be converted to grayscale. This can be done using
image editing software or programming languages like Python.
Collect training data: The next step is to collect training data for the colorization model. This can include a dataset
of colorful images with their corresponding grayscale versions.
Train the colorization model: A deep learning model can be trained to colorize grayscale images using a dataset of
colorful images. This model can be trained using frameworks like TensorFlow, PyTorch, or Keras.
Apply the colorization model to the black and white image: Once the model is trained, it can be applied to the black
and white image to generate a colorized version. This can be done using programming languages like Python (see the
sketch at the end of this section).
Refine the colorized image: The colorized image may need some manual refinement to ensure that the colors are
accurate and the image looks natural. This can be done using image editing software like Photoshop or GIMP.
Save the final image: Once the image has been colorized and refined, it can be saved in a digital format for printing
or online use.
Note that the quality of the colorized image will depend on the quality of the original black and white image, the
accuracy of the colorization model, and the manual refinement process.
img_colorized.save('colorized_image.jpg')
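To illustrate the automated colorization step, the following is a minimal sketch using OpenCV's DNN module with the publicly available pre-trained Caffe colorization model of Zhang et al. The model file names and the input file name are assumptions and the files must be downloaded separately; the sketch ends by saving the colorized result, as in the line above.

import cv2
import numpy as np

# Load the pre-trained colorization network (file names are assumptions; download the files separately)
net = cv2.dnn.readNetFromCaffe('colorization_deploy_v2.prototxt', 'colorization_release_v2.caffemodel')
pts = np.load('pts_in_hull.npy')

# Populate the cluster centers as 1x1 convolution kernels
class8 = net.getLayerId('class8_ab')
conv8 = net.getLayerId('conv8_313_rh')
net.getLayer(class8).blobs = [pts.transpose().reshape(2, 313, 1, 1).astype('float32')]
net.getLayer(conv8).blobs = [np.full([1, 313], 2.606, dtype='float32')]

# Read the black and white photo and convert it to the Lab colour space
image = cv2.imread('old_bw_photo.jpg')
scaled = image.astype('float32') / 255.0
lab = cv2.cvtColor(scaled, cv2.COLOR_BGR2LAB)

# Feed the resized L (lightness) channel to the network and predict the ab colour channels
L_small = cv2.split(cv2.resize(lab, (224, 224)))[0] - 50
net.setInput(cv2.dnn.blobFromImage(L_small))
ab = net.forward()[0, :, :, :].transpose((1, 2, 0))
ab = cv2.resize(ab, (image.shape[1], image.shape[0]))

# Recombine with the original-resolution L channel, convert back to BGR, and save the result
L = cv2.split(lab)[0]
colorized = np.concatenate((L[:, :, np.newaxis], ab), axis=2)
colorized = np.clip(cv2.cvtColor(colorized, cv2.COLOR_LAB2BGR), 0, 1)
cv2.imwrite('colorized_image.jpg', (255 * colorized).astype('uint8'))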