0% found this document useful (0 votes)

32 views81 pages

LP-III Lab Manual

Uploaded by

shrikantbhor20

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

32 views81 pages

LP-III Lab Manual

Uploaded by

shrikantbhor20

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 81

Laboratory Practice III Lab Manual Dept of Computer Engineering

LABORATORY MANUAL
FOR
Laboratory Practice III (2019) (BE SEM I)

Department of Computer Engineering

Genba Sopanrao Moze College of Engineering, Balewadi

Genba Sopanrao Moze College of Engineering, Balewadi 1

Laboratory Practice III Lab Manual Dept of Computer Engineering

Work
Exam Schemes
Load

Term Work Practical Oral

Practical

04
50 50 --
Hours/Week

List Of Assignments

Time Span
Sr. No. Title of Assignment
(No. of
weeks)

"Write a program non-recursive and recursive program to calculate Fibonacci

1)
numbers and analyze their time and space complexity

2) "Write a program to implement Huffman Encoding using a greedy strategy"

"Write a program to solve a fractional Knapsack problem using a greedy

3)
method."

"Write a program to solve a 0-1 Knapsack problem using dynamic

4)
programming or branch and bound strategy."

"Design n-Queens matrix having first Queen placed. Use backtracking to place
5)
remaining Queens to generate the final n-queen‘s matrix"

"Mini Project - Write a program to implement matrix multiplication. Also

6) implement multithreaded matrix multiplication with either one thread per row
or one thread per cell. Analyze and compare their performance."

Predict the price of the Uber ride from a given pickup point to the agreed drop-
off location. Perform following tasks: 1. Pre-process the dataset. 2. Identify
outliers. 3. Check the correlation. 4. Implement linear regression and random
7) 02
forest regression models. 5. Evaluate the models and compare their respective
scores like R2, RMSE, etc. Dataset link:
https://www.kaggle.com/datasets/yasserh/uber-fares-dataset.

Implement K-Means clustering/ hierarchical clustering on

sales_data_sample.csv dataset. Determine the number of clusters using the
8) elbow method. Dataset link: 01
https://www.kaggle.com/datasets/kyanyoga/sample-sales-data

Genba Sopanrao Moze College of Engineering, Balewadi 2

Laboratory Practice III Lab Manual Dept of Computer Engineering

Implement K-Nearest Neighbors algorithm on diabetes.csv dataset. Compute

9) confusion matrix, accuracy, error rate, precision and recall on the given dataset. 01
Dataset link: https://www.kaggle.com/datasets/abdallamahgoub/diabetes

Given a bank customer, build a neural network-based classifier that can

determine whether they will leave or not in the next 6 months.

Dataset Description: The case study is from an open-source dataset from

Kaggle. The dataset contains 10,000 sample points with 14 distinct features
such as CustomerId, Credit Score, Geography, Gender, Age, Tenure, Balance,
10) etc. Link to the Kaggle project: 01
https://www.kaggle.com/barelydedicated/bank-customer-churn-modeling
Perform following steps: 1. Read the dataset. 2. Distinguish the feature and
target set and divide the data set into training and test sets. 3. Normalize the
train and test data. 4. Initialize and build the model. Identify the points of
improvement and implement the same. 5. Print the accuracy score and
confusion matrix (5 points).

Classify the email using the binary classification method. Email Spam
detection has two states: a) Normal State – Not Spam, b) Abnormal State –
Spam. Use K-Nearest Neighbors and Support Vector Machine for
11) 02
classification. Analyze their performance. Dataset link: The emails.csv dataset
on the Kaggle
ttps://www.kaggle.com/datasets/balaka18/email-spam-classification-dataset-csv

Mini Project - Use the following dataset to analyze ups and downs in the
market and predict future stock price returns based on Indian Market data from
12) 02
2000 to 2020. Dataset Link:
ttps://www.kaggle.com/datasets/sagara9595/stock-data

13) Installation of MetaMask and study spending Ether per transaction. 01

14) Create your own wallet using Metamask for crypto transactions. 01

Write a smart contract on a test network, for Bank account of a customer for
following operations:

 Deposit money
15) 02
 Withdraw Money

 Show balance

Genba Sopanrao Moze College of Engineering, Balewadi 3

Laboratory Practice III Lab Manual Dept of Computer Engineering

Write a program in solidity to create Student data. Use the following

constructs:

 Structures

16)  Arrays 02

 Fallback

Deploy this as smart contract on Ethereum and Observe the transaction fee and
Gas values.

17) Write a survey report on types of Blockchains and its real time use cases 02

Mini Project - Develop a Blockchain based application for health-related

18) medical records

Reference Sites:

https://builtin.com/machine-learning/how-to-preprocess-data-python

https://levelup.gitconnected.com/random-forest-regression

Virtual Laboratory:
1. http://cse01-iiith.vlabs.ac.in/
2. http://vlabs.iitb.ac.in/vlabs-dev/labs/blockchain/labs/index.php
3. http://vlabs.iitb.ac.in/vlabs-dev/labs/machine_learning/labs/index.php

Genba Sopanrao Moze College of Engineering, Balewadi 4

Laboratory Practice III Lab Manual Dept of Computer Engineering

Problem statement: To calculate Fibonacci numbers using recursive and non-recursive function and
analyze their time and space complexity.

Objective: Analyzing time and space complexity

Theory: Fibonacci series is a series of natural numbers where next number is equivalent to the sum of
previous two numbers i.e. fn = fn-1 + fn-2.In fibonacci sequence each item is the sum of the previous two.
So, you wrote a recursive algorithm, for example, recursive function example for up to 5

fibonacci(5) = fibonacci(4) + fibonacci(3) fibonacci(3) = fibonacci(2) + fibonacci(1)

fibonacci(4) = fibonacci(3) + fibonacci(2) fibonacci(2) = fibonacci(1) + fibonacci(0)

The first two numbers in the Fibonacci sequence are either 1 and 1, or 0 and 1, and each
subsequent number is the sum of the previous two numbers.

Fibonacci Series Logic:

Initializing number: the nth term is the sum of (n-1)th term and (n-2)th term Call Same Function Until
Origin Condition

Algorithm : Iterative Algorithm

Genba Sopanrao Moze College of Engineering, Balewadi 5

Laboratory Practice III Lab Manual Dept of Computer Engineering

Analysis :

Two cases (1) n = 0 or 1 and (2) n > 1.

1) When n = 0 or 1, lines 4 and 5 get executed once each. Since each line has an s/e of 1, the
total step count for this case is 2.

2) When n > 1, lines 4, 8, and 14 are each executed once. Line 9 gets executed n times, and
lines 11 and 12 get executed n-1 times each. Line 8 has an s/e of 2, line 12 has an s/e of 2,
and line 13 has an s/e of 0. The remaining lines that get executed have s/e’s of 1. The total
steps for the case n > 1 is therefore 4n + 1.

Iterative Program:

# Program to display the Fibonacci sequence up to n-th term

nterms = int(input("Enter number of terms ")) #
first two terms

n1, n2 = 0, 1

count = 0

# check if the number of terms is valid

if nterms <= 0:

print("Please enter a positive integer") #

if there is only one term, return n1

elif nterms == 1:

print("Fibonacci sequence upto", nterms,":")

print(n1)

#generate fibonacci
sequence else:

print("Fibonacci sequence:")
while count <
nterms:print(n1)

nth = n1 + n2

n2 = nth

n1=n2

count+=1

Genba Sopanrao Moze College of Engineering, Balewadi 6

Laboratory Practice III Lab Manual Dept of Computer Engineering

Output : Enter number of terms 4 Fibonacci sequence:

Analysis :

T(n) = T(n-1) + T(n-2) + c

= 2T(n-1) + c //from the approximation T(n-1) ~ T(n-2)

= 2*(2T(n-2) + c) + c

= 4T(n-2) + 3c

= 8T(n-3) + 7c

= 2k * T(n - k) + (2k - 1)*c

Let's find the value of k for which: n - k

=0k=n

T(n) = 2n * T(0) + (2n - 1)*c

Conclusion: Hence, we have studied recursive and non-recursive functions to calculate Fibonacci
numbers.

Genba Sopanrao Moze College of Engineering, Balewadi 7

Laboratory Practice III Lab Manual Dept of Computer Engineering

Problem Statement:

To implement Huffman Encoding for data compression using a greedy Algorithm

Objective:

To understand and implement:

1. Huffman Encoding using Greedy search Algorithm

Theory:

Huffman coding is a lossless data compression algorithm. The idea is to assign variable-length codes to input
characters, lengths of the assigned codes are based on the frequencies of corresponding characters. The most
frequent fractional knapsack problem using greedy algorithm character fractional knapsack problem using
greedy algorithm gets the smallest code and least frequent character gets the largest code. The variable-
length codes assigned to input characters are:

Prefix Codes, means the codes (bit sequences) are assigned in such a way that the code assigned to one
character is not the prefix of code assigned to any other character. This is how Huffman Coding makes sure
that there is no ambiguity when decoding the generated bitstream. Let us understand prefix codes with a
counter example. Let there be four characters a, b, c and d, and their corresponding variable length codes be
00, 01, 0 and 1. This coding leads to ambiguity because code assigned to c is the prefix of codes assigned to
a and b. If the compressed bit stream is 0001, the de-compressed output may be “cccd” or “ccb” or “acd” or
“ab”. See this for applications of Huffman Coding. There are mainly two major parts in Huffman Coding
Build a Huffman Tree from input characters.

Traverse the Huffman Tree and assign codes to characters.

Steps to build Huffman Tree

● Input is an array of unique characters along with their frequency of occurrences and output is
Huffman Tree.
● Create a leaf node for each unique character and build a min heap of all leaf nodes (Min Heap is
used as a priority queue.
● The value of frequency field is used to compare two nodes in min heap. Initially, the least frequent
character is at root).
● Extract two nodes with the minimum frequency from the min heap.
● Create a new internal node with a frequency equal to the sum of the two nodes frequencies. Make the
first extracted node as its left child and the other extracted node as its right child. Add this node to
the min heap.
● Repeat steps#2 and #3 until the heap contains only one node. The remaining node is the root node
and the tree is complete
● Let us understand the algorithm with an example:

Genba Sopanrao Moze College of Engineering, Balewadi 8

Laboratory Practice III Lab Manual Dept of Computer Engineering

Algorithm :

def printNodes(node,
val=''):
newVal = val
+
str(node.huff)
if(node.left):

printNodes(node.left,
newVal) if(node.right):

printNodes(node.right,
newVal) if(not node.left and not
node.right):

print(f"{node.symbol} -> {newVal}")

# characters for huffman tree

chars = ['a', 'b', 'c', 'd', 'e', 'f', 'g']

# frequency of characters

freq = [ 4, 7, 12, 14, 17, 43, 54]

# list containing unused nodes

Genba Sopanrao Moze College of Engineering, Balewadi 9

Laboratory Practice III Lab Manual Dept of Computer Engineering

nodes = []

# converting characters and frequencies into huffman tree nodes

for x in range(len(chars)):

nodes.append(node(freq[x], chars[x]))

while len(nodes) > 1:

# sort all the nodes in ascending order based on their frequency

nodes = sorted(nodes, key=lambda x: x.freq)

# pick 2 smallest nodes

left = nodes[0]

right = nodes[1]

# assign directional value to these nodes

left.huff = 0

right.huff = 1

# combine the 2 smallest nodes to create new node as their parent

newNode = node(left.freq+right.freq, left.symbol+right.symbol, left,

right)

# remove the 2 nodes and add their parent as new node among others

nodes.remove(left)

nodes.remove(right)

nodes.append(newNode)

# Huffman Tree is ready!

printNodes(nodes[0])

Output

a -> 0000 b -> 0001 c -> 001

d -> 010

Genba Sopanrao Moze College of Engineering, Balewadi 10

Laboratory Practice III Lab Manual Dept of Computer Engineering

e -> 011

f -> 10

g -> 11

Conclusion: Hence, We have successfully implemented huffman encoding using a greedy strategy.

Genba Sopanrao Moze College of Engineering, Balewadi 11

Laboratory Practice III Lab Manual Dept of Computer Engineering

Problem Statement: To solve a fractional Knapsack problem using a greedy method.

Objective:

Interpret Fractional Knapsack Problem

THEORY:

What is the Knapsack Problem?

A thief went to a store to steal some items. There are multiple items available of different weights &
profits.Let suppose there are ‘n’ No. of items & weight of these are W1, W2,………Wn respectively,
and the profit of these items are P1, P2, Pn respectively.

Knapsack Problem may be of 2 types:

[A] 0/1 Knapsack problem

[B] Fractional Knapsack problem

0/1 Knapsack problem

● In this problem, either a whole item is selected(1) or the whole item not to be
selected(0).
● Here, the thief can’t carry a fraction of the item.
● In the LPP(Linear programming problem) form, it can be described as:

Genba Sopanrao Moze College of Engineering, Balewadi 12

Laboratory Practice III Lab Manual Dept of Computer Engineering

ALGORITHM

def fractional_knapsack(value, weight, capacity):

# index = FRACTIONAL KNAPSACK PROBLEM USING GREEDY ALGORITHM [0, 1, 2,

..., n - 1

# contains ratios of values to weight

ratio = [v/w for v, w in zip(value, weight)]

# index is sorted according to value-to-weight ratio in decreasing order

index.sort(key=lambda i: ratio[i], reverse=True)

max_value = 0

fractions = [0]*len(value)

for i in index:

if weight[i] <= capacity:

fractions[i] = 1

max_value += value[i]

capacity -= weight[i]

else:

fractions[i] = capacity/weight[i]

max_value += value[i]*capacity/weight[i]

break

return max_value, fractions

n = int(input('Enter number of items: '))

value = input('Enter the values of the {} item(s) in order: '.format(n)).split()

value = [int(v) for v in value]

weight = input('Enter the positive weights of the {} item(s) in order:

'.format(n)).split()

weight = [int(w) for w in weight]

Genba Sopanrao Moze College of Engineering, Balewadi 13

Laboratory Practice III Lab Manual Dept of Computer Engineering

capacity = int(input('Enter maximum weight: '))

max_value, fractions = fractional_knapsack(value, weight, capacity)

print('The maximum value of items that can be carried:', max_value)

print('The fractions in which the items should be taken:', fractions)

Program Output : Enter number of items: 3

Enter the values of the 3 item(s) in order: 24 15 25

Enter the positive weights of the 3 item(s) in order: 15 10 18 Enter maximum weight: 20

The maximum value of items that can be carried: 31.5

The fractions in which the items should be taken: [1, 0.5, 0]

CONCLUSION : : Hence, We have successfully studied fractional knapsack problem using greedy
algorithm.

Genba Sopanrao Moze College of Engineering, Balewadi 14

Laboratory Practice III Lab Manual Dept of Computer Engineering

Problem Statement :To solve a 0-1 Knapsack problem using dynamic programming or branch and
bound strategy.

THEORY:

Branch and bound is an algorithm design paradigm which is generally used for solving combinatorial
optimization problems. These problems typically exponential in terms of time complexity and may require
exploring all possible permutations in worst case. Branch and Bound solve these problems relatively
quickly.

Let us consider below 0/1 Knapsack problem to understand Branch and Bound. Given two integer arrays
val[0..n-1] and wt[0..n-1] that represent values and weights associated with n items respectively.
Find out the maximum value subset of val[] such that sum of the weights of this subset is smaller than or
equal to Knapsack capacity W. Let us explore all approaches for this problem.
1. A Greedy approach is to pick the items in decreasing order of value per unit weight. The Greedy
approach works only for fractional knapsack problem and may not produce correct result for 0/1
knapsack.

2. We can use Dynamic Programming (DP) for 0/1 Knapsack problem. In DP, we use a 2D table of
size n x W. The DP Solution doesn’t work if item weights are not integers.

3. Since DP solution doesn’t always work, a solution is to use Brute Force. With n items, there are 2n
solutions to be generated, check each to see if they satisfy the constraint, save maximum solution
that satisfies constraint. This solution can be expressed as tree.

Genba Sopanrao Moze College of Engineering, Balewadi 15

Laboratory Practice III Lab Manual Dept of Computer Engineering

4. We can use Backtracking to optimize the Brute Force solution. In the tree representation,
we can do DFS of tree. If we reach a point where a solution no longer is feasible, there is
no need to continue exploring. In the given example, backtracking would be much more
effective if we had even more items or a smaller knapsack capacity.

CONCLUSION : Hence, We have successfully implemented 0-1 Knapsack problem using dynamic
programming or branch and bound strategy.

Genba Sopanrao Moze College of Engineering, Balewadi 16

Laboratory Practice III Lab Manual Dept of Computer Engineering

Problem statement:

To implement Queens matrix having first Queen placed. Also, Use backtracking to place
remaining Queens to generate the final 8-queen’s matrix.
Objective:

Implement 8-Queens problem using Backtracking

THEORY

Backtracking Algorithm

A backtracking algorithm is a problem-solving algorithm that uses a brute force approach for
finding the desired output. The Brute force approach tries out all the possible solutions and
chooses the desired/best solutions. The term backtracking suggests that if the current solution is
not suitable, then backtrack and try other solutions.

The idea is to place queens one by one in different columns, starting from the leftmost column.
When we place a queen in a column, we check for clashes with already placed queens. In the
current column, if we find a row for which there is no clash, we mark this row and column as part
of the solution. If we do not find such a row due to clashes, then we backtrack and return false.

Algorithm :

1) Start in the leftmost column

2) If all queens are placed return true

3) Try all rows in the current column. Do following for every tried row.

a) If the queen can be placed safely in this row then mark this [row, column] as part of the
solution and recursively check if placing queen here leads to a solution.

b) If placing the queen in [row, column] leads to a solution then return true.

Genba Sopanrao Moze College of Engineering, Balewadi 17

Laboratory Practice III Lab Manual Dept of Computer Engineering

c) If placing queen doesn't lead to a solution then unmark this [row, column] (Backtrack)
and go to step (a) to try other rows.

4) If all rows have been tried and nothing worked, return false to trigger backtracking.

/* C++ program to solve N Queen Problem using backtracking */

#include
<bits/stdc++.h> #define
N4

using namespace;

bool isSafe(int board[N][N], int row, int col){int i, j;

/* Check this row on left side

*/ for (i = 0; i < col; i++)if

(board[row][i]) return false;

/* Check upper diagonal on left side

for (i = row, j = col; i >= 0 && j >= 0; i--, j--) if

(board[i][j])

return false;

/* Check lower diagonal on left side */

for (i = row, j = col; j >= 0 && i < N; i++, j--) if

(board[i][j])

return false;

return true;

}/* A recursive utility function to solve N Queen problem */

bool solveNQUtil(int board[N][N], int col)

{/* base case: If all queens are placed then return true */

if (col >= N)

return true;

Genba Sopanrao Moze College of Engineering, Balewadi 18

Laboratory Practice III Lab Manual Dept of Computer Engineering

/* Consider this column and try placing this

queen in all rows one by one */

for (int i = 0; i < N; i++) {

/* Check if the q: if (isSafe(board, i, col)) {

/* Place this queen in board[i][col] */ board[i][col] = 1;

/* recur to place rest of the queens */ if (solveNQUtil(board,

col + 1))

return true;

/* If placing queen in board[i][col]doesn't lead to a solution,

then remove queen from board[i][col] */ board[i][col] = 0;
// BACKTRACK

}}/* If the queen cannot be placed in any row in this column col then return false */return false;

}/* This function solves the N Queen problem using Backtracking. It mainly uses solveNQUtil() to solve
the problem. It returns false if queens cannot be placed, otherwise, return true and prints placementof
queens in the form of 1s. Please note that there may be more than one solutions, this function prints one of
the feasible solutions.*/

bool solveNQ(){

int board[N][N] = { { 0, 0, 0, 0 },

{ 0, 0, 0, 0 },

{ 0, 0, 0, 0 } };

if (solveNQUtil(board, 0) == false) {
cout<<"Solution does not exist"; return false;

printSolution(board); return true;}

Output :

0010

1000

Genba Sopanrao Moze College of Engineering, Balewadi 19

Laboratory Practice III Lab Manual Dept of Computer Engineering

0001

0100

CONCLUSION : Hence, We have successfully implemented N-Queen’s problem using backtracking

algorithm.

Genba Sopanrao Moze College of Engineering, Balewadi 20

Laboratory Practice III Lab Manual Dept of Computer Engineering

NO.1

Problem Statement:

Predict the price of the Uber ride from a given pickup point to the agreed drop-off location.
Perform following tasks: 1. Pre-process the dataset. 2. Identify outliers. 3. Check the correlation.
4. Implement linear regression and random forest regression models. 5. Evaluate the models and
compare their respective scores like R2, RMSE, etc. Dataset link:
https://www.kaggle.com/datasets/yasserh/uber-fares-dataset.

Objective:

1. Pre-process the dataset

2. Identify outliers
3. Check the correlation
4. Implement linear regression and random forest regression models
5. Predict the price of the uber ride from a given pickup point to the agreed drop-off location
6. Evaluate the models and compare their respective scores like R2, RMSE

Theory:

What Is Data Preprocessing and Why Do We Need It?

For machine learning algorithms to work, it’s necessary to convert raw data into a clean data set, which
means we must convert the data set to numeric data. We do this by encoding all the categorical labels to
column vectors with binary values. Missing values, or NaNs (not a number) in the data set is an annoying
problem. You must either drop the missing rows or fill them up with a mean or interpolated values.

HOW TO PREPROCESS DATA IN PYTHON STEP-BY-STEP

1. Load data in Pandas.
2. Drop columns that aren’t useful.
3. Drop rows with missing values.
4. Create dummy variables.
5. Take care of missing data.
6. Convert the data frame to NumPy.
7. Divide the data set into training data and test data.

Genba Sopanrao Moze College of Engineering, Balewadi 21

Laboratory Practice III Lab Manual Dept of Computer Engineering

Identify outliers
An Outlier is a data-item/object that deviates significantly from the rest of the (so-called normal)
objects. They can be caused by measurement or execution errors. The analysis for outlier detection is
referred to as outlier mining. There are many ways to detect the outliers, and the removal process is the
data frame same as removing a data item from the panda’s data frame.

Here pandas’ data frame is used for a more realistic approach as in real-world project need to detect the
outliers arouse during the data analysis step, the same approach can be used on lists and series-type
objects.

Detecting the outliers

Outliers can be detected using visualization, implementing mathematical formulas on the dataset, or
using the statistical approach.
1. Visualization a)Using Box Plot b)Using Scatterplot.

2. Z-score
Z- Score is also called a standard score. This value/score helps to understand that how far is the data
point from the mean. And after setting up a threshold value one can utilize z score values of data points
to define the outlier
Zscore = (data_point -mean) / std. deviation

# Z score
from scipy import stats
import numpy as np

z = np.abs(stats.zscore(df_boston['DIS']))
print(z)

3. IQR (Inter Quartile Range)

IQR (Inter Quartile Range) Inter Quartile Range approach to finding the outliers is the most used and
most trusted approach used in the research field.
IQR = Quartile3 – Quartile1

What is Correlation?
Variables within a dataset can be related for lots of reasons.

For example:

● One variable could cause or depend on the values of another variable.

● One variable could be lightly associated with another variable.
● Two variables could depend on a third unknown variable.

Genba Sopanrao Moze College of Engineering, Balewadi 22

Laboratory Practice III Lab Manual Dept of Computer Engineering

It can be useful in data analysis and modeling to better understand the relationships between variables.
The statistical relationship between two variables is referred to as their correlation.

● Positive Correlation: both variables change in the same direction.

● Neutral Correlation: No relationship in the change of the variables.
● Negative Correlation: variables change in opposite directions.

The performance of some algorithms can deteriorate if two or more variables are tightly related, called
multicollinearity. An example is linear regression, where one of the offending correlated variables should
be removed to improve the skill of the model.

Covariance
Variables can be related by a linear relationship. This is a relationship that is consistently additive across
the two data samples.
This relationship can be summarized between two variables, called the covariance. It is calculated as the
average of the product between the values from each sample, where the values haven been centered (had
their mean subtracted).
The calculation of the sample covariance is as follows:
cov(X, Y) = (sum (x - mean(X)) * (y - mean(Y)) ) * 1/(n-1)
The use of the mean in the calculation suggests the need for each data sample to have a Gaussian or
Gaussian-like distribution.
The sign of the covariance can be interpreted as whether the two variables change in the same direction
(positive) or change in different directions (negative). The magnitude of the covariance is not easily
interpreted. A covariance value of zero indicates that both variables are completely independent.
The cov() NumPy function can be used to calculate a covariance matrix between two or more variables.
There are several statistics that you can use to quantify correlation. Three correlation coefficients are:
● Pearson’s r
● Spearman’s rho
● Kendall’s tau
Pearson’s coefficient measures linear correlation, while the Spearman and Kendall coefficients compare
the ranks of data. There are several NumPy, SciPy, and Pandas correlation functions and methods that you
can use to calculate these coefficients.
Pearson’s Correlation
The Pearson correlation coefficient (named for Karl Pearson) can be used to summarize the strength of the
linear relationship between two data samples.
The Pearson’s correlation coefficient is calculated as the covariance of the two variables divided by the
product of the standard deviation of each data sample. It is the normalization of the covariance between
the two variables to give an interpretable score.
Pearson's correlation coefficient = covariance(X, Y) / (stdv(X) * stdv(Y))

Genba Sopanrao Moze College of Engineering, Balewadi 23

Laboratory Practice III Lab Manual Dept of Computer Engineering

Pandas Correlation Calculation

Pandas is, in some cases, more convenient than NumPy and SciPy for calculating statistics. It offers
statistical methods for Series and DataFrame instances. For example, given two Series objects with the
same number of items, you can call .corr() on one of them with the other as the first argument:

3. Implement linear regression and random forest regression models

What is Regression?
Regression analysis is a statistical method that helps us to understand the relationship between
dependent and one or more independent variables,
Dependent Variable
This is the Main Factor that we are trying to predict.
Independent Variable
These are the variables that have a relationship with the dependent variable.
Types of Regression Analysis
There are many types of regression analysis, but in this article, we will deal with,
1. Simple Linear Regression
2. Multiple Linear Regression

What is Linear Regression?

In Machine Learning lingo, Linear Regression (LR) means simply finding the best fitting line that
explains the variability between the dependent and independent features very well or we can say it
describes the linear relationship between independent and dependent features, and in linear regression,
the algorithm predicts the continuous features (e.g. Salary, Price ), rather than deal with the categorical
features (e.g. cat, dog).
Simple Linear Regression
Simple Linear Regression uses the slope-intercept (weight-bias) form, where our model needs to find
the optimal value for both slope and intercept. So with the optimal values, the model can find the
variability between the independent and dependent features and produce accurate results. In simple
linear regression, the model takes a single independent and dependent variable.

There are many equations to represent a straight line, we will stick with the common equation,

Here, y and x are the dependent variables, and independent variables respectively. b1(m) and b0(c) are
slope and y-Intercept.

Genba Sopanrao Moze College of Engineering, Balewadi 24

Laboratory Practice III Lab Manual Dept of Computer Engineering

Slope(m) tells, for one unit of increase in x, How many units does it increase in y. When the line is
steep, the slope will be higher, the slope will be lower for the less steep line.
Constant(c) means, what is the value of y when the x is zero.
How the Model will Select the Best Fit Line?
First, our model will try a bunch of different straight lines from that it finds the optimal line that predicts
our data points well.

From the above picture, you can notice there are 4 lines, and any guess which will be our best fit line?
Ok, For finding the best fit line our model uses the cost function. In machine learning, every algorithm
has a cost function, and in simple linear regression, the goal of our algorithm is to find a minimal value
for the cost function.
And in linear regression (LR), we have many cost functions, but mostly used cost function is
MSE(Mean Squared Error). It is also known as a Least Squared Method.

Yi – Actual value,
Y^i – Predicted value,
n – number of records.
( yi – yi_hat ) is a Loss Function. And you can find in most times people will interchangeably use the
word loss and cost function. But they are different, and we are squaring the terms to neglect the negative
value.
Loss Function
It is a calculation of loss for single training data.
Cost Function
It is a calculation of average loss over the entire dataset.

Genba Sopanrao Moze College of Engineering, Balewadi 25

Laboratory Practice III Lab Manual Dept of Computer Engineering

Multiple Linear Regression

In multiple linear regression instead of having a single independent variable, the model has multiple
independent variables to predict the dependent variable.

here bo is the y-intercept, b1,b2,b3,b4…,bn are slopes of the independent variables x1,x2,x3,x4…,xn and y is
the dependent variable.
Here instead of finding a line, our model will find the best plane in 3-Dimension, and in n-Dimension,
our model will find the best hyperplane. The below diagram is for demonstration purposes.

Python code for linear regression

After splitting the data into training and testing sets, finally, the time is to train our algorithm. For that,
we need to import Linear Regression class, instantiate it, and call the fit() method along with our
training data.
regressor = LinearRegression()
regressor.fit(X_train, y_train) #training the algorithm
The linear regression model basically finds the best value for the intercept and slope, which results in a
line that best fits the data. To see the value of the intercept and slop calculated by the linear regression
algorithm for our dataset, execute the following code.
#To retrieve the intercept:
print(regressor.intercept_)

#For retrieving the slope:

print(regressor.coef_)

To make predictions on the test data, execute the following script:

Genba Sopanrao Moze College of Engineering, Balewadi 26

Laboratory Practice III Lab Manual Dept of Computer Engineering

y_pred = regressor.predict(X_test)
Let's plot our straight line with the test data:

plt.scatter(X_test, y_test, color='gray')

plt.plot(X_test, y_pred, color='red', linewidth=2)
plt.show()

The final step is to evaluate the performance of the algorithm. This step is particularly important to
compare how well different algorithms perform on a particular dataset. For regression algorithms, three
evaluation metrics are commonly used:

1. Mean Absolute Error (MAE) is the mean of the absolute value of the errors. It is calculated as:

2. Mean Squared Error (MSE) is the mean of the squared errors and is calculated as:

3. Root Mean Squared Error (RMSE) is the square root of the mean of the squared errors:

we don’t have to perform these calculations manually. The Scikit-Learn library comes with pre-built
functions that can be used to find out these values for us.
Let’s find the values for these metrics using our test data.
print('Mean Absolute Error:', metrics.mean_absolute_error(y_test, y_pred))
print('Mean Squared Error:', metrics.mean_squared_error(y_test, y_pred))
print('Root Mean Squared Error:', np.sqrt(metrics.mean_squared_error(y_test, y_pred)))

Random Forest Regression Models

Decision Trees are used for both regression and classification problems. They visually flow like trees,
hence the name, and in the regression case, they start with the root of the tree and follow splits based on
variable outcomes until a leaf node is reached and the result is given. An example of a decision tree is
below

Genba Sopanrao Moze College of Engineering, Balewadi 27

Laboratory Practice III Lab Manual Dept of Computer Engineering

Here we see a basic decision tree diagram which starts with the Var_1 and splits based off of specific
criteria. When ‘yes’, the decision tree follows the represented path, when ‘no’, the decision tree goes
down the other path. This process repeats until the decision tree reaches the leaf node and the resulting
outcome is decided. For the example above, the values of a, b, c, or d could be representative of any
numeric or categorical value.
Ensemble learning is the process of using multiple models, trained over the same data, averaging the
results of each model ultimately finding a more powerful predictive/classification result. Our hope, and
the requirement, for ensemble learning is that the errors of each model (in this case decision tree) are
independent and different from tree to tree.
Bootstrapping is the process of randomly sampling subsets of a dataset over a given number of
iterations and a given number of variables. These results are then averaged together to obtain a more
powerful result. Bootstrapping is an example of an applied ensemble model.
The bootstrapping Random Forest algorithm combines ensemble learning methods with the decision
tree framework to create multiple randomly drawn decision trees from the data, averaging the results to
output a new result that often leads to strong predictions/classifications.

Random Forest Regression is a supervised learning algorithm that uses ensemble learning method for
regression. Ensemble learning method is a technique that combines predictions from multiple machine
learning algorithms to make a more accurate prediction than a single model.

Genba Sopanrao Moze College of Engineering, Balewadi 28

Laboratory Practice III Lab Manual Dept of Computer Engineering

The diagram above shows the structure of a Random Forest. You can notice that the trees run in parallel
with no interaction amongst them. A Random Forest operates by constructing several decision trees during
training time and outputting the mean of the classes as the prediction of all the trees. To get a better
understanding of the Random Forest algorithm, let’s walk through the steps:
1. Pick at random k data points from the training set.
2. Build a decision tree associated to these k data points.
3. Choose the number N of trees you want to build and repeat steps 1 and 2.
4. For a new data point, make each one of your N-tree trees predict the value of y for the data
point in question and assign the new data point to the average across all of the
predicted y values.
A Random Forest Regression model is powerful and accurate. It usually performs great on many problems,
including features with non-linear relationships. Disadvantages, however, include the following: there is no
interpretability, overfitting may easily occur, we must choose the number of trees to include in the model.

Random Forest Regression Model:

We will use the sklearn module for training our random forest regression model, specifically the
RandomForestRegressor function. The RandomForestRegressor documentation shows many different
parameters we can select for our model. Some of the important parameters are highlighted below:
● n_estimators — the number of decision trees you will be running in the model
● criterion — this variable allows you to select the criterion (loss function) used to determine
model outcomes. We can select from loss functions such as mean squared error (MSE) and
mean absolute error (MAE). The default value is MSE.
● max_depth — this sets the maximum possible depth of each tree
● max_features — the maximum number of features the model will consider when determining
a split
● bootstrap — the default value for this is True, meaning the model follows bootstrapping
principles (defined earlier)
● max_samples — This parameter assumes bootstrapping is set to True, if not, this parameter
doesn’t apply. In the case of True, this value sets the largest size of each sample for each tree.
● Other important parameters are min_samples_split, min_samples_leaf, n_jobs, and others
that can be read in the sklearn’s.
● Let’s see Random Forest Regression in action!
●
After importing the libraries, importing the dataset, addressing null values, and dropping any
necessary columns, we are ready to create our Random Forest Regression model!

Step 1: Identify your dependent (y) and independent variables (X)

Step 2: Split the dataset into the Training set and Test set

Genba Sopanrao Moze College of Engineering, Balewadi 29

Laboratory Practice III Lab Manual Dept of Computer Engineering

Step 3: Training the Random Forest Regression model on the whole dataset

rf = RandomForestRegressor(n_estimators = 300, max_features = 'sqrt', max_depth = 5, random_state =

18).fit(x_train, y_train)

Looking at our base model above, we are using 300 trees; max_features per tree is equal to the squared
root of the number of parameters in our training dataset. The max depth of each tree is set to 5. And
lastly, the random_state was set to 18 just to keep everything standard.

Step 4: Predicting the Test set results

when we solve classification problems, we can view our performance using metrics such as accuracy,
precision, recall, etc. When viewing the performance metrics of a regression model, we can use factors
such as mean squared error, root mean squared error, R², adjusted R², and others.

R² score tells us how well our model is fitted to the data by comparing it to the average line of the
dependent variable. If the score is closer to 1, then it indicates that our model performs well versus if the
score is farther from 1, then it indicates that our model does not perform so well.

Conclusion: We studied how to Predict the price of the Uber ride from a given pickup point to the agreed
drop-off location using linear regression and random forest regression models. Also Evaluated the models
and compare their respective scores like R2, RMSE

Genba Sopanrao Moze College of Engineering, Balewadi 30

Laboratory Practice III Lab Manual Dept of Computer Engineering

Problem Statement:

Implement K-Means clustering/ hierarchical clustering on sales_data_sample.csv dataset. Determine the

number of clusters using the elbow method. Dataset link:
https://www.kaggle.com/datasets/kyanyoga/sample-sales-data

Objective:

To understand and implement:

1. K-Means clustering
2. Determine the number of clusters using the elbow method

Theory:

What is K-Means Algorithm?

K-Means Clustering is an Unsupervised Learning algorithm, which groups the unlabeled dataset into
different clusters. Here K defines the number of pre-defined clusters that need to be created in the process,
as if K=2, there will be two clusters, and for K=3, there will be three clusters, and so on.

It is an iterative algorithm that divides the unlabeled dataset into k different clusters in such a way that each
dataset belongs only one group that has similar properties.
It allows us to cluster the data into different groups and a convenient way to discover the categories of
groups in the unlabeled dataset on its own without the need for any training.
It is a centroid-based algorithm, where each cluster is associated with a centroid. The main aim of this
algorithm is to minimize the sum of distances between the data point and their corresponding clusters.

The algorithm takes the unlabeled dataset as input, divides the dataset into k-number of clusters, and
repeats the process until it does not find the best clusters. The value of k should be predetermined in this
algorithm.

The k-means clustering algorithm mainly performs two tasks:

o Determines the best value for K center points or centroids by an iterative process.
o Assigns each data point to its closest k-center. Those data points which are near to the k-center,
create a cluster.

Hence each cluster has datapoints with some commonalities, and it is away from other clusters.

Genba Sopanrao Moze College of Engineering, Balewadi 31

Laboratory Practice III Lab Manual Dept of Computer Engineering

The below diagram explains the working of the K-means Clustering Algorithm:

Algorithms:

How does the K-Means Algorithm Work?

The working of the K-Means algorithm is explained in the below steps:
Step-1: Select the number K to decide the number of clusters.
Step-2: Select random K points or centroids. (It can be other from the input dataset).
Step-3: Assign each data point to their closest centroid, which will form the predefined K clusters.
Step-4: Calculate the variance and place a new centroid of each cluster.
Step-5: Repeat the third steps, which means reassign each datapoint to the new closest centroid of each
cluster.
Step-6: If any reassignment occurs, then go to step-4 else go to FINISH.
Step-7: The model is ready.

How to choose the value of "K number of clusters" in K-means Clustering?

The performance of the K-means clustering algorithm depends upon highly efficient clusters that it forms.
But choosing the optimal number of clusters is a big task. There are some different ways to find the
optimal number of clusters, but here we are discussing the most appropriate method to find the number of
clusters or value of K. The method is given below:

Elbow Method

The Elbow method is one of the most popular ways to find the optimal number of clusters. This method
uses the concept of WCSS value. WCSS stands for Within Cluster Sum of Squares, which defines the

Genba Sopanrao Moze College of Engineering, Balewadi 32

Laboratory Practice III Lab Manual Dept of Computer Engineering

total variations within a cluster. The formula to calculate the value of WCSS (for 3 clusters) is given below:

WCSS= ∑PiinCluster1 distance (Pi C1)2 +∑PiinCluster2distance (Pi C2)2+∑Pi in CLuster3 distance(Pi C3)2

In the above formula of WCSS,

∑Pi in Cluster1 distance (Pi C1)2: It is the sum of the square of the distances between each data point and
its centroid within a cluster1 and the same for the other two terms.

To measure the distance between data points and centroid, we can use any method such as Euclidean
distance or Manhattan distance.

To find the optimal value of clusters, the elbow method follows the below steps:

o It executes the K-means clustering on a given dataset for different K values (ranges from 1-10).
o For each value of K, calculates the WCSS value.
o Plots a curve between calculated WCSS values and the number of clusters K.
o The sharp point of bend or a point of the plot looks like an arm, then that point is considered as the
best value of K.

Since the graph shows the sharp bend, which looks like an elbow, hence it is known as the elbow method.
The graph for the elbow method looks like the below image:

Applications of K-means Clustering

The concern of the fact is that the data is always complicated, mismanaged, and noisy. Let’s learn where we
can implement k-means clustering among various
1. K-means clustering is applied in the Call Detail Record (CDR) Analysis. It gives in-depth vision
about customer requirements and satisfaction based on call-traffic during the time of the day and
demographic of a particular location.

Genba Sopanrao Moze College of Engineering, Balewadi 33

Laboratory Practice III Lab Manual Dept of Computer Engineering

2. It is used in the clustering of documents to identify the compatible documents in the same place.
3. It is deployed to classify the sounds based on their identical patterns and segregate malformation
in them.
4. It serves as the model of lossy images compression technique, in the confinement of images,
K-means makes clusters pixels of an image in order to decrease the total size of it.
5. It is helpful in the business sector for recognizing the portions of purchases made by customers,
also to cluster movements on apps and websites.
6. In the field of insurance and fraud detection on the basis of prior data, it is plausible to cluster
fraudulent consumers to demand based on their proximity to clusters as the patterns indicate.

K-means vs Hierarchical Clustering

1. K-means clustering produces a specific number of clusters for the disarranged and flat dataset,
where Hierarchical clustering builds a hierarchy of clusters, not for just a partition of objects
under various clustering methods and applications.
2. K-means can be used for categorical data and first converted into numeric by assigning rank, where
Hierarchical clustering was selected for categorical data but due to its complexity, a new technique
is considered to assign rank value to categorical features.
3. K-means are highly sensitive to noise in the dataset and perform well than Hierarchical clustering
where it is less sensitive to noise in a dataset.
4. Performance of the K-Means algorithm increases as the RMSE decreases and the RMSE
decreases as the number of clusters increases so the time of execution increases, in contrast to
this, the performance of Hierarchical clustering is less.
5. K-means are good for a large dataset and Hierarchical clustering is good for small datasets.

Conclusion: Thus, we have understood what Clustering is, K-mean clustering. We also Determine the
number of clusters using the elbow method and Implemented K-mean clustering.

Genba Sopanrao Moze College of Engineering, Balewadi 34

Laboratory Practice III Lab Manual Dept of Computer Engineering

Problem Statement:

Implement K-Nearest Neighbors algorithm on diabetes.csv dataset. Compute confusion matrix, accuracy,
error rate, precision and recall on the given dataset. Dataset link:
https://www.kaggle.com/datasets/abdallamahgoub/diabetes

Objective:

To understand and implement:

1. K-Nearest Neighbors.
2. Compute confusion matrix, accuracy, error rate, precision and recall on the given dataset.

Theory:

What is K-Nearest Neighbors Algorithm?

K-Nearest Neighbour (KNN) Algorithm for Machine Learning

o K-Nearest Neighbour is one of the simplest Machine Learning algorithms based on Supervised
Learning technique.
o K-NN algorithm assumes the similarity between the new case/data and available cases and put the
new case into the category that is most like the available categories.
o K-NN algorithm stores all the available data and classifies a new data point based on the similarity.
This means when new data appears then it can be easily classified into a well suite category by
using K- NN algorithm.
o K-NN algorithm can be used for Regression as well as for Classification but mostly it is used for
the Classification problems.
o K-NN is a non-parametric algorithm, which means it does not make any assumption on
underlying data.
o It is also called a lazy learner algorithm because it does not learn from the training set
immediately instead it stores the dataset and at the time of classification, it performs an action on
the dataset.
o KNN algorithm at the training phase just stores the dataset and when it gets new data, then it
classifies that data into a category that is much like the new data.

Genba Sopanrao Moze College of Engineering, Balewadi 35

Laboratory Practice III Lab Manual Dept of Computer Engineering

Why do we need a K-NN Algorithm?

Suppose there are two categories, i.e., Category A and Category B, and we have a new data point x1, so
this data point will lie in which of these categories. To solve this type of problem, we need a K-NN
algorithm. With the help of K-NN, we can easily identify the category or class of a particular dataset.
Consider the below diagram:

How does K-NN work?

The K-NN working can be explained based on the below algorithm:

o Step-1: Select the number K of the neighbors

o Step-2: Calculate the Euclidean distance of K number of neighbors
o Step-3: Take the K nearest neighbors as per the calculated Euclidean distance.
o Step-4: Among these k neighbors, count the number of the data points in each category.
o Step-5: Assign the new data points to that category for which the number of the neighbor is
maximum.
o Step-6: Our model is ready.

How to select the value of K in the K-NN Algorithm?

o There is no particular way to determine the best value for "K", so we need to try some values to
find the best out of them. The most preferred value for K is 5.
o A very low value for K such as K=1 or K=2, can be noisy and lead to the effects of outliers in the
model.
o Large values for K are good, but it may find some difficulties.

Genba Sopanrao Moze College of Engineering, Balewadi 36

Laboratory Practice III Lab Manual Dept of Computer Engineering

Advantages of KNN Algorithm:

o It is simple to implement.
o It is robust to the noisy training data
o It can be more effective if the training data is large.
Disadvantages of KNN Algorithm:

o Always needs to determine the value of K which may be complex some time.
o The computation cost is high because of calculating the distance between the data points for all the
training samples.

Steps to implement the K-NN algorithm:

o Data Pre-processing step

o Fitting the K-NN algorithm to the Training set
o Predicting the test result
o Test accuracy of the result (Creation of Confusion matrix)
o Visualizing the test set result.

Python Code for K-NN

Fitting K-NN classifier to the Training data:

Now we will fit the K-NN classifier to the training data. To do this we will import
the KNeighborsClassifier class of Sklearn Neighbors library. After importing the class, we will create
the Classifier object of the class. The Parameter of this class will be
o n_neighbors: To define the required neighbors of the algorithm. Usually, it takes 5.
o metric='minkowski': This is the default parameter and it decides the distance between the points.
o p=2: It is equivalent to the standard Euclidean metric.

#Fitting K-NN classifier to the training set

from sklearn.neighbors import KNeighborsClassifier
classifier= KNeighborsClassifier(n_neighbors=5, metric='minkowski', p=2 )
classifier.fit(x_train, y_train)
Predicting the Test Result:
To predict the test set result, we will create a y_pred vector. Below is the code for it:

#Predicting the test set result

y_pred= classifier.predict(x_test

Genba Sopanrao Moze College of Engineering, Balewadi 37

Laboratory Practice III Lab Manual Dept of Computer Engineering

Creating the Confusion Matrix:

Now we will create the Confusion Matrix for our K-NN model to see the accuracy of the classifier. Below
is the code for it:

#Creating the Confusion matrix

from sklearn.metrics import confusion_matrix
cm= confusion_matrix(y_test, y_pred)
Compute confusion matrix, accuracy, error rate, precision and recall on the given dataset.

What is Confusion Matrix and why you need it?

Confusion Matrix is a performance measurement for machine learning classification problem where output
can be two or more classes. It is a table with 4 different combinations of predicted and actual values. In this
case, TN = 55, FP = 5, FN = 10, TP = 30. The confusion matrix is as follows.

It is extremely useful for measuring Recall, Precision, Specificity, Accuracy, and most importantly AUC-
ROC curves.

Let’s understand TP, FP, FN, TN

True Positive:
Interpretation: You predicted positive and it’s true.
You predicted that a Man is terrorist, and he is.

Genba Sopanrao Moze College of Engineering, Balewadi 38

Laboratory Practice III Lab Manual Dept of Computer Engineering

True Negative:
Interpretation: You predicted negative and it’s true.
You predicted that a man is not terrorist, and he is not.

False Positive: (Type 1 Error)

Interpretation: You predicted positive and it’s false.
You predicted that a man is terrorist, but he is not.

False Negative: (Type 2 Error)

Interpretation: You predicted negative and it’s false.
You predicted that a man is not terrorist, but he is.

Accuracy is the given as

What is the accuracy of the machine learning model for this classification task?

Accuracy represents the number of correctly classified data instances over the total number of data
instances.
In this example, Accuracy = (55 + 30)/(55 + 5 + 30 + 10 ) = 0.85 and in percentage the accuracy will be
85%.

Is accuracy the best measure?

Accuracy may not be a good measure if the dataset is not balanced (both negative and positive classes have
different number of data instances).
Even if data is imbalanced, we can figure out that our model is working well or not. For that, the values of
TPR and TNR should be high, and FPR and FNR should be as low as possible.
With the help of TP, TN, FN, and FP, other performance metrics can be calculated.

Genba Sopanrao Moze College of Engineering, Balewadi 39

Laboratory Practice III Lab Manual Dept of Computer Engineering

Precision, Recall
Both precision and recall are crucial for information retrieval, where positive class mattered the most as
compared to negative. Why?
While searching something on the web, the model does not care about something irrelevant and not
retrieved (this is the true negative case). Therefore, only TP, FP, FN are used in Precision and Recall.
Precision
What does precision mean?
Precision should ideally be 1 (high) for a good classifier. Precision becomes 1 only when the numerator
and denominator are equal i.e TP = TP +FP, this also means FP is zero. As FP increases the value of
denominator becomes greater than the numerator and precision value decreases (which we don’t want).
So in the TERRORIST example,
precision = 30/(30+ 5) = 0.857

Recall

Recall is also known as sensitivity or true positive rate and is defined as follows:

Recall should ideally be 1 (high) for a good classifier. Recall becomes 1 only when the numerator and
denominator are equal i.e TP = TP +FN, this also means FN is zero. As FN increases the value of
denominator becomes greater than the numerator and recall value decreases (which we don’t want).
So in the TERRORIST example let us see what will be the recall.

Recall = 30/ (30+ 10) = 0.75

Conclusion: Thus, we Implemented K-Nearest Neighbors algorithm on given data set and to evaluate the
performance we Computed confusion matrix, accuracy, error rate, precision and recall on the given
dataset.

Genba Sopanrao Moze College of Engineering, Balewadi 40

Laboratory Practice III Lab Manual Dept of Computer Engineering

Problem Statement:

Given a bank customer, build a neural network-based classifier that can determine whether they will leave
or not in the next 6 months. Dataset Description: The case study is from an open-source dataset from
Kaggle. The dataset contains 10,000 sample points with 14 distinct features such as Customer Id, Credit
Score, Geography, Gender, Age, Tenure, Balance, etc. Link to the Kaggle project:
https://www.kaggle.com/barelydedicated/bank-customer-churn-modeling Perform following steps: 1. Read
the dataset. 2. Distinguish the feature and target set and divide the data set into training and test sets. 3.
Normalize the train and test data. 4. Initialize and build the model. Identify the points of improvement and
implement the same. 5. Print the accuracy score and confusion matrix (5 points).

Objective:

1 To build a neural network-based classifier

2. Compute confusion matrix, accuracy score on the given dataset.

Theory:

What is Neural Network?

Neural Network is a series of algorithms that are trying to mimic the human brain and find the relationship
between the sets of data. It is being used in various use-cases like in regression, classification, Image
Recognition and many more.

As we have talked above that neural networks tries to mimic the human brain then there might be the
difference as well as the similarity between them. Let us talk in brief about it.

Some major differences between them are biological neural network does parallel processing whereas the
Artificial neural network does series processing also in the former one processing is slower (in millisecond)
while in the latter one processing is faster (in a nanosecond).

Genba Sopanrao Moze College of Engineering, Balewadi 41

Laboratory Practice III Lab Manual Dept of Computer Engineering

Architecture of ANN

A neural network has many layers, and each layer performs a specific function, and as the complexity of
the model increases, the number of layers also increases that why it is known as the multi-layer perceptron.

The purest form of a neural network has three layers input layer, the hidden layer, and the output layer. The
input layer picks up the input signals and transfers them to the next layer and finally, the output layer gives
the final prediction, and these neural networks must be trained with some training data as well like machine
learning algorithms before providing a particular problem. Now, let’s understand more about perceptron.

Perceptron

As discussed above multi-layered perceptron these are basically the hidden or the dense layers. They are
made up of many neurons and neurons are the primary unit that works together to form perceptron. In
simple words, as you can see in the above picture each circle represents neurons and a vertical combination
of neurons represents perceptron’s which is basically a dense layer.

Now in the above picture, you can see each neuron’s detailed view. Here, each neurons have some weights
(in above picture w1, w2, w3) and biases and based on this computations are done as, combination = bias +
weights * input (F = w1*x1 + w2*x2 + w3*x3) and finally activation function is applied output =
activation(combination) in above picture activation is sigmoid represented by 1/(1 + e-F). There are
some other activation functions as well like ReLU, Leaky ReLU, tanh, and many more.

Genba Sopanrao Moze College of Engineering, Balewadi 42

Laboratory Practice III Lab Manual Dept of Computer Engineering

Working Of ANN

At First, information is feed into the input layer which then transfers it to the hidden layers, and
interconnection between these two layers assign weights to each input randomly at the initial point. and
then bias is added to each input neuron and after this, the weighted sum which is a combination of weights
and bias is passed through the activation function. Activation Function has the responsibility of which node
to fire for feature extraction and finally output is calculated. This whole process is known as Forward
Propagation. After getting the output model to compare it with the original output and the error is known
and finally, weights are updated in backward propagation to reduce the error and this process continues for
a certain number of epochs (iteration). Finally, model weights get updated, and prediction is done.

Advantages

1. ANN has the ability to learn and model non-linear and complex relationships as many relationships
between input and output are non-linear.
2. After training, ANN can infer unseen relationships from unseen data, and hence it is generalized.
3. Unlike many machines learning models, ANN does not have restrictions on datasets like data
should be Gaussian distributed or nay other distribution.

Applications

There are many applications of ANN. Some of them are:

1. Image Preprocessing and Character Recognition.

2. Forecasting.
3. Credit rating.
4. Fraud Detection.
5. Portfolio Management

Python code for ANN

1. Understanding and Loading the datasets

import numpy as np
import pandas as pd
from keras.models import Sequential
from keras.layers import Dense

Genba Sopanrao Moze College of Engineering, Balewadi 43

Laboratory Practice III Lab Manual Dept of Computer Engineering

2. Defining the Keras Model

model = Sequential()
model.add(Dense(12, input_dim=8, activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(1, activation='sigmoid'))

Models in Keras are defined as a sequence of layers in which each layer is added one after another. The
input should contain input features and is specified when creating the first layer with the
input_dims argument. Here inputs_dims will be 8.
A fully connected network with a three-layer is used which is defined using the Dense Class. The first
argument takes the number of neurons in that layer and, and the activation argument takes the activation
function as an input. Here ReLU is used as an activation function in the first two layers and sigmoid in the
last layer as it is a binary classification problem.

3. Compile Keras Model

model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

While compiling we must specify the loss function to calculate the errors, the optimizer for updating the
weights and any metrics.
In this case, we will use “binary_crossentropy“as the loss argument as it is a binary classification problem.
Here we will take optimizer as “adam“as it automatically tunes itself and gives good results in a wide range
of problems and finally we will collect and report the classification accuracy through metrics argument.

4. Fitting the Keras Model.

model.fit(X, y, epochs=150, batch_size=10)

Now we will fit our model on the loaded data by calling the fit() function on the model.
The training process will run for a fixed number of iterations through the dataset which is specified using
the epochs argument. The number of dataset rows should be and are updated within each epoch and set
using the batch_size argument.
Here, We will run for 150 epochs and a batch size of 10.

5. Evaluate Keras Model

_, accuracy = model.evaluate(X, y)
print('Accuracy: %.2f' % (accuracy*100))

Genba Sopanrao Moze College of Engineering, Balewadi 44

Laboratory Practice III Lab Manual Dept of Computer Engineering

The evaluation of the model on the dataset can be done using the evaluate() function. It takes two
arguments i.e, input and output. It will generate a prediction for each input and output pair and collect
scores, including the average loss and any metrics such as accuracy.
The evaluate () function will return a list with two values first one is the loss of the model and the second
will be the accuracy of the model on the dataset. We are only interested in reporting the accuracy and hence
we ignored the loss value.

6. Make Predictions

predictions = model.predict(X)
rounded = [round(x[0]) for x in predictions]

Prediction can be done by calling the predict () function on the model. Here sigmoid activation function is
used on the output layer, so the predictions will be a probability in the range between 0 and 1.

Performance evaluation is done using Confusion metrics which is covered in previous assignments.

Conclusion: we have learned how to build a neural network-based classifier that can determine whether
bank customer will leave or not in the next 6 months.

Genba Sopanrao Moze College of Engineering, Balewadi 45

Laboratory Practice III Lab Manual Dept of Computer Engineering

Problem Statement:

Classify the email using the binary classification method. Email Spam detection has two states: a) Normal
State – Not Spam, b) Abnormal State – Spam. Use K-Nearest Neighbors and Support Vector Machine for
classification. Analyze their performance. Dataset link: The emails.csv dataset on the Kaggle
ttps://www.kaggle.com/datasets/balaka18/email-spam-classification-dataset-csv

Objective:

1. To read the dataset and classify Email is Spam or Not Spam using K-NN
2. To read the dataset and classify Email is Spam or Not Spam using SVM
3. Compute confusion matrix, accuracy score on the given dataset.

Theory:

What is SVM?

Support Vector Machine (SVM) is a robust classification and regression technique that maximizes the
predictive accuracy of a model without overfitting the training data. SVM is particularly suited to analyzing
data with very large numbers (for example, thousands) of predictor fields.

SVM has applications in many disciplines, including customer relationship management (CRM), facial and
other image recognition, bioinformatics, text mining concept extraction, intrusion detection, protein
structure prediction, and voice and speech recognition.

SVM works by mapping data to a high-dimensional feature space so that data points can be categorized,
even when the data are not otherwise linearly separable. A separator between the categories is found, then
the data are transformed in such a way that the separator could be drawn as a hyperplane. Following this,
characteristics of new data can be used to predict the group to which a new record should belong.

For example, consider the following figure, in which the data points fall into two different categories.

Figure 1. Original dataset

Genba Sopanrao Moze College of Engineering, Balewadi 46

Laboratory Practice III Lab Manual Dept of Computer Engineering

The two categories can be separated with a curve, as shown in the following figure.

Figure 2. Data with separator added

After the transformation, the boundary between the two categories can be defined by a hyperplane, as
shown in the following figure.

Figure 3. Transformed data

The mathematical function used for the transformation is known as the kernel function. SVM in
Modeler supports the following kernel types:

● Linear
● Polynomial
● Radial basis function (RBF)
● Sigmoid

A linear kernel function is recommended when linear separation of the data is straightforward. In other
cases, one of the other functions should be used. You will need to experiment with the different functions to
obtain the best model in each case, as they each use different algorithms and parameters. Below figure
shows how kernel functions on data

Genba Sopanrao Moze College of Engineering, Balewadi 47

Laboratory Practice III Lab Manual Dept of Computer Engineering

Tuning Hyperparameters

● Kernel: The main function of the kernel is to transform the given dataset input data into the
required form. There are various types of functions such as linear, polynomial, and radial basis
function (RBF). Polynomial and RBF are useful for non-linear hyperplane. Polynomial and RBF
kernels compute the separation line in the higher dimension. In some of the applications, it is
suggested to use a more complex kernel to separate the classes that are curved or nonlinear. This
transformation can lead to more accurate classifiers.

● Regularization: Regularization parameter in python's Scikit-learn C parameter used to maintain

regularization. Here C is the penalty parameter, which represents misclassification or error term.
The misclassification or error term tells the SVM optimization how much error is bearable. This is
how you can control the trade-off between decision boundary and misclassification term. A smaller
value of C creates a small-margin hyperplane, and a larger value of C creates a larger-margin
hyperplane.

● Gamma: A lower value of Gamma will loosely fit the training dataset, whereas a higher value of
gamma will exactly fit the training dataset, which causes over-fitting. In other words, you can say a
low value of gamma considers only nearby points in calculating the separation line, while a value
of gamma considers all the data points in the calculation of the separation line.

Advantages of SVM:
● Effective in high dimensional cases
● Its memory efficient as it uses a subset of training points in the decision function called support
vectors
● Different kernel functions can be specified for the decision functions and its possible to specify
custom kernels
● SVM Classifiers offer good accuracy and perform faster prediction compared to Naïve Bayes
algorithm.

Genba Sopanrao Moze College of Engineering, Balewadi 48

Laboratory Practice III Lab Manual Dept of Computer Engineering

The disadvantages of support vector machines include:

● If the number of features is much greater than the number of samples, avoid over-fitting in
choosing Kernel functions and regularization term is crucial.

● SVMs do not directly provide probability estimates, these are calculated using an
expensive five-fold cross-validation.

● SVM is not suitable for large datasets because of its high training time, and it also takes
more time in training compared to Naïve Bayes. It works poorly with overlapping classes and is
also sensitive to the type of kernel used.

SVM Kernel:
The SVM kernel is a function that takes low dimensional input space and transforms it into higher-
dimensional space, i.e., it converts not separable problem to separable problem. It is mostly useful in non-
linear separation problems. Simply put the kernel, it does some extremely complex data transformations
then finds out the process to separate the data based on the labels or outputs defined.

SVM implementation in python:

Objective: Predict if cancer is beningn or malignant.

Using historical data about patients diagnosed with cancer, enable the doctors to differentiate
malignant cases and benign given the independent attributes.

# import libraries
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
# Importing Data file
data = pd.read_csv('bc2.csv')
dataset = pd.DataFrame(data)
dataset.columns
dataset = dataset.replace('?', np.nan)
dataset = dataset.apply(lambda x: x.fillna(x.median()),axis=0)

# converting the hp column from object 'Bare Nuclei'/ string type to float
dataset['Bare Nuclei'] = dataset['Bare Nuclei'].astype('float64')
dataset.isnull().sum()

Genba Sopanrao Moze College of Engineering, Balewadi 49

Laboratory Practice III Lab Manual Dept of Computer Engineering

from sklearn.model_selection import train_test_split

# To calculate the accuracy score of the model

from sklearn.metrics import accuracy_score, confusion_matrix
target = dataset["Class"]
features = dataset.drop(["ID","Class"], axis=1)
X_train, X_test, y_train, y_test = train_test_split(features,target, test_size = 0.2, random_state = 10)
from sklearn.svm import SVC

# Building a Support Vector Machine on train data

svc_model = SVC(C= .1, kernel='linear', gamma= 1)
svc_model.fit(X_train, y_train)
prediction = svc_model .predict(X_test)
# check the accuracy on the training set
print(svc_model.score(X_train, y_train))
print(svc_model.score(X_test, y_test))
Output:
0.9749552772808586

0.9642857142857143

print("Confusion Matrix:\n",confusion_matrix(prediction,y_test))

Confusion Matrix:
[[95 2]

[ 3 40]]

# Building a Support Vector Machine on train data

svc_model = SVC(kernel='rbf')
svc_model.fit(X_train, y_train)
print(svc_model.score(X_train, y_train))
print(svc_model.score(X_test, y_test))
Output:
0.998211091234347

0.9571428571428572

#Building a Support Vector Machine on train data (changing the kernel)

svc_model = SVC(kernel='poly')
svc_model.fit(X_train, y_train)
prediction = svc_model.predict(X_test)
print(svc_model.score(X_train, y_train))
print(svc_model.score(X_test, y_test))

Genba Sopanrao Moze College of Engineering, Balewadi 50

Laboratory Practice III Lab Manual Dept of Computer Engineering

Output:
1.0

0.9357142857142857
svc_model = SVC(kernel='sigmoid')
svc_model.fit(X_train, y_train)

prediction = svc_model.predict(X_test)

print(svc_model.score(X_train, y_train))
print(svc_model.score(X_test, y_test))

Output:
0.3434704830053667
0.32857142857142857

K-NN code for Email Classification

df=pd.read_csv('/content/emails.csv',error_bad_lines=False)
df.dropna(inplace = True)
df.drop(['Email No.'],axis=1,inplace=True)
X = df.drop(['Prediction'],axis = 1)
y = df['Prediction']
from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier(n_neighbors=7)

knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)
print("Prediction",y_pred)
print("KNN accuracy = ",metrics.accuracy_score(y_test,y_pred))

KNN accuracy = 0.8009020618556701

print("Confusion matrix",metrics.confusion_matrix(y_test,y_pred))
Confusion matrix
[[804 293]
[ 16 439]]

SVM code Email Classification

# cost C = 1
model = SVC(C = 1)

Genba Sopanrao Moze College of Engineering, Balewadi 51

Laboratory Practice III Lab Manual Dept of Computer Engineering

# fit
model.fit(X_train, y_train)

# predict
y_pred = model.predict(X_test)
print("SVM accuracy = ",metrics.accuracy_score(y_test,y_pred))
SVM accuracy = 0.9381443298969072
metrics.confusion_matrix(y_true=y_test, y_pred=y_pred)
array(
[[1091, 6], '
[ 90, 365]])

Conclusion: We understood what SVM is, different kernels used for hyper parameter tuning.
We also implemented K-NN and SVM for Email classification and found that SVM gives good result
compared to K-NN for the given data set.

Genba Sopanrao Moze College of Engineering, Balewadi 52

Laboratory Practice III Lab Manual Dept of Computer Engineering

GNMENT NO.1

Problem Statement:

Installation of MetaMask and study spending Ether per transaction.

Objective:

● Understand What is MetaMask

● Installation of MetaMask
● Understand what Ether is
● Creating wallet and spending Ether per transaction

Theory:

What is MetaMask?

Blockchain offers privacy, transparency, and immutability. You will be powered to use applications,
transact anywhere, and do a lot more without anyone watching (read Google, Governments).

But there are various blockchains, each one coded for a different purpose.

However, Ethereum, a gigantic, decentralized ecosystem, is for the masses. And MetaMask is a free,
open-source, hot wallet to get you rolling with Ethereum.

Hot wallets are free and can be used from any internet-connected device. This also brings in the single most
vulnerability of such wallets–security. While a single low-key user can be quite safe with a hot wallet, you
should beware of crypto exchanges using them–and better look for cold wallets supported exchanges.Let’s
look at some of the prominent features of this wallet:

Ease of use

Starting with MetaMask is easy, quick, and anonymous. You don’t even need an email address. Just set up a
password and remember (and store) the secret recovery phrase, and you’re done.

Security

Your information is encrypted in your browser that nobody has access to. In the event of a lost password,
you have the 12-word secret recovery phase (also called a seed phrase) for recovery. Notably, it’s essential
to keep the seed phrase safe, as even MetaMask has no information about it. Once lost, it can’t be retrieved.

Genba Sopanrao Moze College of Engineering, Balewadi 53

Laboratory Practice III Lab Manual Dept of Computer Engineering

Built-In Crypto Store

If you’re wondering, no, you can’t buy Bitcoin with MetaMask. It only supports Ether and other Ether-
related tokens, including the famous ERC-20 tokens. Cryptocurrencies (excluding Ether) on Ethereum are
built as ERC-20 tokens.

Backup and Restore

MetaMask stores your information locally. So, in case you switch browsers or machines, you can restore
your MetaMask wallet with your secret recovery phrase.

Community Support

As of August 2021, MetaMask was home to 10 million monthly active users around the world. Its simple
and intuitive user interface keeps pushing these numbers with a recorded 1800% increase from July 2020.

Conclusively, try MetaMask if hot wallets are your pick. Let’s begin with the installation before moving to
its use cases.

How to Get Started

1. Install Metamask

Install Metamask from the project’s official website or an app store. It is safest to install from the official
website because app stores have accidentally hosted fake Metamask apps in the past.

Chrome, Firefox, Brave, and Edge all support Metamask. Opera users can use Metamask through Chrome
extensions, though issues have been reported. Apple and Android devices also support the app.

Genba Sopanrao Moze College of Engineering, Balewadi 54

Laboratory Practice III Lab Manual Dept of Computer Engineering

Once you have installed Metamask, click on the Metamask icon in your browser’s toolbar to open the app.
Then, click on “Get Started.”

2. Set up a New Wallet

Click on “Create a Wallet” to make a new Ethereum wallet. (Or, if you have created one already,
follow these instructions to restore your wallet and access your existing funds.)

Agree or disagree with feedback sharing to continue.

Create a password for your wallet. Though you should safely store this password, you can recover your
wallet even if you lose your password.

Genba Sopanrao Moze College of Engineering, Balewadi 55

Laboratory Practice III Lab Manual Dept of Computer Engineering

Click on the grey area to unlock your seed phrase. Be sure to store this seed phrase safely.

Anyone who knows it will be able to access your Ethereum wallet, and you will not be able to recover your
wallet without it. You can safely store your seed phrase by making multiple backups or storing the phrase
in a durable metal wallet.

Click on your seed phrase’s words in the right order (1) to prove that you have written them down correctly.
Then, click “Confirm” (2). Your Metamask wallet is ready for use in transactions.

Genba Sopanrao Moze College of Engineering, Balewadi 56

Laboratory Practice III Lab Manual Dept of Computer Engineering

3. Buy or Deposit Cryptocurrency

Now you can add funds to your wallet. Click on your wallet address (1) to copy it, then send ETH to that
address from an exchange.

Or, if you have not yet purchased ETH, click on “Buy” (2) to buy funds from Metamask’s built-in
exchange.

You will see your balance in the lower portion of the wallet (3) once your funds have been deposited. ETH
will show up by default. If custom ERC-20 tokens do not show up, click “Add Token” to add those tokens
to the list.

It may take several minutes for your ETH to arrive, depending on the amount of traffic that Ethereum is
experiencing.

4. Send Cryptocurrency

Once you own cryptocurrency, you can send your funds to other users, merchants, or your own additional
ETH wallets. To do so, click on the “Send” button in Metamask’s main panel. Enter the amount of ETH
you want to send (1) and the amount of transaction fees you want to pay (2).

Then, click “Next” (3).

A higher fee will help your transaction get confirmed faster. Metamask automatically sets a fee by default.
However, sites like EthGasStation can help you find an ideal fee manually.

Genba Sopanrao Moze College of Engineering, Balewadi 57

Laboratory Practice III Lab Manual Dept of Computer Engineering

Click “Next” (3) to finalize the details of your transaction. Then, click “Confirm.” Once again, it may take
some time for your transaction to be confirmed.

You can check the status of your transaction in Metamask’s “Activity” panel. If your transaction stalls,
see this page.

5. Spend Crypto in a dApp

If you want to spend your ETH in a DApp, chose an app from DAppRadar. In this example, we will use
Kyber, a decentralized exchange (DEX) that allows you to swap Ethereum for altcoins easily.

Visit KyberSwap’s web page. Enter the amount of crypto you want to buy and choose the tokens you want
to swap (1). Then, link your Metamask wallet to Kyberswap by clicking on “Connect Wallet” and choosing
Metamask (2).

A Metamask panel will pop up. Connect to Kyber if it is your first visit. Click “Next,” then click
“Confirm.”

Genba Sopanrao Moze College of Engineering, Balewadi 58

Laboratory Practice III Lab Manual Dept of Computer Engineering

Return to Kyber’s web page and click “Swap Now” (1). Then, in the popup, click “Confirm” (2) to perform
the transaction.

Once you have done that, Kyber will broadcast your transaction to miners. You do not need to wait for the
transaction to be mined before closing the window. You can check the status later in Metamask’s activity
panel or on Etherscan.

What Can You Do with Metamask?

Now that you know the basics of Metamask, you can use all of Ethereum’s features, including:

Genba Sopanrao Moze College of Engineering, Balewadi 59

Laboratory Practice III Lab Manual Dept of Computer Engineering

● Sending and receiving transactions between standard Ethereum addresses, such as those
owned by individuals and merchants.
● Paying for transactions in DApps such as games, gambling apps, DeFi apps, and
decentralized exchanges.
● Storing ETH and custom tokens (i.e., ERC-20 tokens).
● Storing collectibles and non-fungible tokens (NFTs).
● Connecting to Ledger and Trezor hardware wallets.

Conclusion:

We learned how to do Installation of MetaMask and study spending Ether per transaction.

Genba Sopanrao Moze College of Engineering, Balewadi 60

Laboratory Practice III Lab Manual Dept of Computer Engineering

IGNMENT NO.1

Problem Statement:

Create your own wallet using Metamask for crypto transactions.

Objective:

● Creating Own wallet using Metamask for crypto transactions.

Theory:

1. Set up a New Wallet

Click on “Create a Wallet” to make a new Ethereum wallet. (Or, if you have created one already,
follow these instructions to restore your wallet and access your existing funds.)

Agree or disagree with feedback sharing to continue.

Create a password for your wallet. Though you should safely store this password, you can recover your
wallet even if you lose your password.

Genba Sopanrao Moze College of Engineering, Balewadi 61

Laboratory Practice III Lab Manual Dept of Computer Engineering

Click on the grey area to unlock your seed phrase. Be sure to store this seed phrase safely.

Click on your seed phrase’s words in the right order (1) to prove that you have written them down correctly.
Then, click “Confirm” (2). Your Metamask wallet is ready for use in transactions.

2. Buy or Deposit Cryptocurrency

Genba Sopanrao Moze College of Engineering, Balewadi 62

Laboratory Practice III Lab Manual Dept of Computer Engineering

Now you can add funds to your wallet. Click on your wallet address (1) to copy it, then send ETH to that
address from an exchange.

Or, if you have not yet purchased ETH, click on “Buy” (2) to buy funds from Metamask’s built-in
exchange.

It may take several minutes for your ETH to arrive, depending on the amount of traffic that Ethereum is
experiencing.

3. Send Cryptocurrency

Then, click “Next” (3).

A higher fee will help your transaction get confirmed faster. Metamask automatically sets a fee by default.
However, sites like EthGasStation can help you find an ideal fee manually.

Click “Next” (3) to finalize the details of your transaction. Then, click “Confirm.” Once again, it may take
some time for your transaction to be confirmed.

Genba Sopanrao Moze College of Engineering, Balewadi 63

Laboratory Practice III Lab Manual Dept of Computer Engineering

You can check the status of your transaction in Metamask’s “Activity” panel. If your transaction stalls,
see this page.

4. Spend Crypto in a dApp

If you want to spend your ETH in a DApp, chose an app from DAppRadar. In this example, we will use
Kyber, a decentralized exchange (DEX) that allows you to swap Ethereum for altcoins easily.

A Metamask panel will pop up. Connect to Kyber if it is your first visit. Click “Next,” then click
“Confirm.”

Genba Sopanrao Moze College of Engineering, Balewadi 64

Laboratory Practice III Lab Manual Dept of Computer Engineering

Return to Kyber’s web page and click “Swap Now” (1). Then, in the popup, click “Confirm” (2) to perform
the transaction.

Conclusion: We learn how to Create our own wallet using Metamask for crypto transactions.

Genba Sopanrao Moze College of Engineering, Balewadi 65

Laboratory Practice III Lab Manual Dept of Computer Engineering

Genba Sopanrao Moze College of Engineering, Balewadi 66

Laboratory Practice III Lab Manual Dept of Computer Engineering

GNMENT NO.1

Problem Statement:

Write a smart contract on a test network, for Bank account of a customer for following operations:
Deposit money Withdraw Money Show balance

Objective:

● To learn Solidity.
● To create Smart Contract using Solidity
● Write smart contract on a test network.

Theory:

First, we need to understand the differences between a paper contract and a smart contract and the reason
why smart contracts become increasingly popular and important in recent years. A contract, by definition,
is a written or spoken (mostly written) law-enforced agreement containing the rights and duties of the
parties. Because most of business contracts are complicated and tricky, the parties need to hire professional
agents or lawyers for protecting their own rights. However, if we hire those professionals every time we
sign contracts, it is going to be extremely costly and inefficient. Smart contracts perfectly solve this by
working on ‘If-Then’ principle and as escrow services. All participants need to put their money, ownership
right or other tradable assets into smart contracts before any successful transaction. If all participating
parties meet the requirement, smart contracts will simultaneously distribute stored assets to recipients and
the distribution process will be witnessed and verified by the nodes on Ethereum network.
There are a couple of languages we can use to program smart contract. Solidity, an object-oriented and
high-level language, is by far the most famous and well maintained one. We can use Solidity to create
various smart contracts which can be used in different scenarios, including voting, blind auctions and safe
remote purchase. In this lab, we will discuss the semantics and syntax of Solidity with specific explanation,
examples, and practices.
After deciding the coding language, we need to pick an appropriate compiler. Among various compilers
like Visual Code Studio, we will use Remix IDE in this and following labs because it can be directly
accessed from browser where we can test, debug, and deploy smart contracts without any installation.
Genba Sopanrao Moze College of Engineering, Balewadi 67
Laboratory Practice III Lab Manual Dept of Computer Engineering

Steps to Execute Solidity Smart Contract using Remix IDE Remix IDE is generally used to compile and
run Solidity smart contracts. Below are the steps for the compilation, execution, and debugging of the smart
contract.

Step 1: Open Remix IDE on any of your browsers, select on the New File and click on Solidity to choose
the environment.

Step 2: Write the Smart contract in the code section, and click the Compile button under the

Compiler window to compile the contract.

Genba Sopanrao Moze College of Engineering, Balewadi 68

Laboratory Practice III Lab Manual Dept of Computer Engineering

Step 3: To execute the code, click on the Deploy button under Deploy and Run Transactions

window. After deploying the code click on the drop-down on the console.

Code

//SPDX-License-Identifier: MIT
pragma solidity ^0.6;

contract banking
{
mapping(address=>uint) public user_account;
mapping(address=>bool) public user_exists;

Genba Sopanrao Moze College of Engineering, Balewadi 69

Laboratory Practice III Lab Manual Dept of Computer Engineering

function create_account() public payable returns(string memory)

{
require(user_exists[msg.sender]==false,'Account already created');
if(msg.value==0)
{
user_account[msg.sender]=0;
user_exists[msg.sender]=true;
return "Account created";
}
require(user_exists[msg.sender]==false,"Account already created");
user_account[msg.sender]=msg.value;
user_exists[msg.sender]=true;
return "Account created";
}

function deposit() public payable returns(string memory)

{
require(user_exists[msg.sender]==true,"Account not created");
require(msg.value>0,"Value for deposit is Zero");
user_account[msg.sender]=user_account[msg.sender]+msg.value;
return "Deposited Successfully";
}

function withdraw(uint amount) public payable returns(string memory)

{

require(user_account[msg.sender]>amount,"Insufficient Balance");
require(user_exists[msg.sender]==true,"Account not created");
require(amount>0,"Amount should be more than zero");
user_account[msg.sender]=user_account[msg.sender]-amount;
msg.sender.transfer(amount);
return "Withdrawl Successful";
}

function transfer(address payable userAddress, uint amount) public returns(string memory)

{
require(user_account[msg.sender]>amount,"Insufficient balance in Bank account");
require(user_exists[msg.sender]==true,"Account is not created");
require(user_exists[userAddress]==true,"Transfer account does not exist");
require(amount>0,"Amount should be more than zero");
user_account[msg.sender]=user_account[msg.sender]-amount;
user_account[userAddress]=user_account[userAddress]+amount;
return "Transfer Successful";
}

function send_amt(address payable toAddress, uint256 amount) public payable returns(string

memory)
{

Genba Sopanrao Moze College of Engineering, Balewadi 70

Laboratory Practice III Lab Manual Dept of Computer Engineering

require(user_account[msg.sender]>amount,"Insufficeint balance in Bank account");

require(user_exists[msg.sender]==true,"Account is not created");
require(amount>0,"Amount should be more than zero");
user_account[msg.sender]=user_account[msg.sender]-amount;
toAddress.transfer(amount);
return "Transfer Success";
}

function user_balance() public view returns(uint)

{
return user_account[msg.sender];
}
function account_exist() public view returns(bool)
{
return user_exists[msg.sender];
}
}

Sample Output
After deploying the contact successful you can observe following buttonscreate_account, deposit,
send_amt, transfer, account_exist, user_account, user_balance and user_exists.
Refer the following output
● Create account

● Deposit Amount

Genba Sopanrao Moze College of Engineering, Balewadi 71

Laboratory Practice III Lab Manual Dept of Computer Engineering

● Check Account Exists

● Check User Account Exists
● Check User Balance
● Check User Exists

● Send Amount

Genba Sopanrao Moze College of Engineering, Balewadi 72

Laboratory Practice III Lab Manual Dept of Computer Engineering

● Check User Account Balance

● Transfer Amount and Check User Account Balance

Genba Sopanrao Moze College of Engineering, Balewadi 73

Laboratory Practice III Lab Manual Dept of Computer Engineering

● Withdraw Amount and Check User Account Balance

Conclusion:
Thus, we have studied a smart contract on a test network for Bank account of a customer.

Genba Sopanrao Moze College of Engineering, Balewadi 74

Laboratory Practice III Lab Manual Dept of Computer Engineering

Genba Sopanrao Moze College of Engineering, Balewadi 75

Laboratory Practice III Lab Manual Dept of Computer Engineering

Problem Statement:

Write a program in solidity to create Student data. Use the following constructs for following operations:

● Structures

● Arrays

● Fallback

Objective:

● Understand and explore the working of Blockchain technology and its applications

Theory:

Step 1: Open Remix IDE on any of your browsers, select on the New File and click on Solidity to
choose the environment.

Step 2: Write the Student Management code in the code section, and click the Compile

Genba Sopanrao Moze College of Engineering, Balewadi 76

Laboratory Practice III Lab Manual Dept of Computer Engineering

button under the Compiler window to compile the contract

Step 3: To execute the code, click on the Deploy button under Deploy and Run Transactions
window. After deploying the code click on the drop-down on the console.

Code

pragma solidity ^0.6.

Genba Sopanrao Moze College of Engineering, Balewadi 77

Laboratory Practice III Lab Manual Dept of Computer Engineering

contract Student_management
{
struct Student {
int stud_id;
string name;
string department.
}
Student [] Students.
function add_stud(intstud_id,string memory name, string memory department) public{
Student memory stud = Student(stud_id,name,department);
Students.push(stud);
}
function getStudent(int stud_id) public view returns (string memory, string memory){
for (uinti=0; i<Students.length;i++)
{
Student memory stud = Students[i];
if(stud.stud_id==stud_id){
return(stud.name,stud.department);
}
}
return ("Not Found", "Not Found");
}
}

Sample Output

After deploying the contact successful you can observe two buttons add_stud and getStudents. Give the
input stud_id, name dept and click on getStudents button, enter the stud_id which you have given as an
Input and get the information of Students name and department.

Refer the following output

Genba Sopanrao Moze College of Engineering, Balewadi 78

Laboratory Practice III Lab Manual Dept of Computer Engineering

Conclusion: Hence, we have studied a program in solidity to create Student data.

Genba Sopanrao Moze College of Engineering, Balewadi 79

Laboratory Practice III Lab Manual Dept of Computer Engineering

Problem Statement:

Write a survey report on types of Blockchain and its real time use cases.

Objective:

● Understand and the types of Blockchain technology and its real time applications

Theory:

A blockchain is a digital ledger of all cryptocurrency transactions. It is constantly growing as "completed"

blocks are added to it with a new set of recordings. Each block contains a cryptographic hash of the
previous block, a timestamp, and transaction data. Bitcoin nodes use the block chain to differentiate
legitimate Bitcoin transactions from attempts to re-spend coins that have already been spent elsewhere.
Types of Blockchain
There are three types of blockchains- public, private and consortium.
1. Public Blockchain: A public blockchain has absolutely no access restrictions. Anyone with an internet
connection can send transactions to it as well as become a validator (i.e., Participate in the consensus
process). Bitcoin is the best example of a public blockchain.
2. Private Blockchain: A private blockchain is a little more centralized. Here, a central authority controls
who can access the network and who can become a validator. Validators on a private blockchain are
typically vetted by the central authority. Permissioned blockchains are often used in enterprise settings
where centralized control is necessary. An example of a private blockchain is Hyperledger Fabric.
3. Consortium Blockchain: A consortium blockchain is a hybrid of the public and private blockchain.
Here, a group of companies or organizations control who can access the network and who can become a
validator. The selection of validators is typically done through a voting process. Consortium blockchains
are often used in cases where multiple parties need to collaborate, but no party is fully trusted. An example
of a consortium blockchain is the R3 Corda platform.
Real-Time Use Cases of Blockchain
1. Supply Chain Management Blockchain can be used to create an immutable record of all
the transactions in a supply chain. This can help to increase transparency and traceability
in the supply chain.
Genba Sopanrao Moze College of Engineering, Balewadi 80
Laboratory Practice III Lab Manual Dept of Computer Engineering

2. Identity Management Blockchain can be used to create a digital identity for individuals,

organizations, and devices. This can be used for KYC (know your customer) and AML (anti-money
laundering) compliance.
3. Payments Blockchain can be used to process payments. This can be done using cryptocurrencies or fiat
currencies.
4. Data Management Blockchain can be used to store data in a tamper-proof and decentralized manner.
This can be used for data sharing and data security.
5. IoT Blockchain can be used to create a decentralized network of IoT devices. This can be used for data
sharing and data security.
6. Predictive Analytics Blockchain can be used to create a decentralized network of predictive analytics
models. This can be used for data sharing and data security.

Conclusion:
Hence, we have studied to write a survey report on types of Blockchains and its real time use cases.

Genba Sopanrao Moze College of Engineering, Balewadi 81

Lab Lec 1a - Laboratory Rules and Safety Precautions
No ratings yet
Lab Lec 1a - Laboratory Rules and Safety Precautions
52 pages
CTRF
No ratings yet
CTRF
2 pages
LP3 Lab Manual
No ratings yet
LP3 Lab Manual
174 pages
ANALOGY Bank Without Password
No ratings yet
ANALOGY Bank Without Password
8 pages
Hands On Practice
50% (2)
Hands On Practice
66 pages
Atp3 34x40
No ratings yet
Atp3 34x40
228 pages
HFHDJSJWDJNDNDKWM
No ratings yet
HFHDJSJWDJNDNDKWM
81 pages
The Rise of Bioceramics
No ratings yet
The Rise of Bioceramics
6 pages
Surveys (Tunneling)
No ratings yet
Surveys (Tunneling)
66 pages
Group 1 - FC1 G12 01 STEM - 1st Draft of RRL
No ratings yet
Group 1 - FC1 G12 01 STEM - 1st Draft of RRL
10 pages
CAPE Computer Science Unit 1 - Proposal
No ratings yet
CAPE Computer Science Unit 1 - Proposal
2 pages
2 Staad Analysis Output
No ratings yet
2 Staad Analysis Output
7 pages
Morphology of Flowering Plants Learn Cbse
No ratings yet
Morphology of Flowering Plants Learn Cbse
6 pages
Intro: Fibonacci Numbers: Daniel Kane
No ratings yet
Intro: Fibonacci Numbers: Daniel Kane
44 pages
Session 2. Legal, Technological, Accounting, Political Environments and The Role of Culture
No ratings yet
Session 2. Legal, Technological, Accounting, Political Environments and The Role of Culture
25 pages
INSIDE OUT - Reaction Paper
No ratings yet
INSIDE OUT - Reaction Paper
1 page
LP 3 Updated
No ratings yet
LP 3 Updated
43 pages
LP1 1
No ratings yet
LP1 1
129 pages
Top Bar Beekeeping (Text)
No ratings yet
Top Bar Beekeeping (Text)
5 pages
SoftwareLab Manual Aug2019
No ratings yet
SoftwareLab Manual Aug2019
33 pages
Python Lab File - Copy 1
No ratings yet
Python Lab File - Copy 1
71 pages
All Aboard Unit 1
No ratings yet
All Aboard Unit 1
7 pages
Programming Assignment 2: Algorithmic Warm-Up
No ratings yet
Programming Assignment 2: Algorithmic Warm-Up
14 pages
Programming Assignment 2: Algorithmic Warm-Up
No ratings yet
Programming Assignment 2: Algorithmic Warm-Up
14 pages
Lp3 Updated
No ratings yet
Lp3 Updated
43 pages
Lec02 Efficient Fibonacci Number Model of Computation
No ratings yet
Lec02 Efficient Fibonacci Number Model of Computation
28 pages
Programming Assignment 2: Algorithmic Warm-Up
No ratings yet
Programming Assignment 2: Algorithmic Warm-Up
11 pages
Week2 Algorithmic Warmup
No ratings yet
Week2 Algorithmic Warmup
14 pages
Resume Piping Superintendent Gedeandi
No ratings yet
Resume Piping Superintendent Gedeandi
5 pages
Jennifer Bridges
No ratings yet
Jennifer Bridges
3 pages
58 Circular 2023
No ratings yet
58 Circular 2023
8 pages
Assignment 1: 1 Asymptotic Complexity
No ratings yet
Assignment 1: 1 Asymptotic Complexity
6 pages
Thomasyl CV
No ratings yet
Thomasyl CV
7 pages
Othello Analysis
No ratings yet
Othello Analysis
2 pages
4MS Year Lesson Plan 1 Seq 1 2018-2019
No ratings yet
4MS Year Lesson Plan 1 Seq 1 2018-2019
3 pages
Soft Computing Laboratory Lab Manual
100% (1)
Soft Computing Laboratory Lab Manual
32 pages
Practical Research 1 Module 1 Performance Task
100% (2)
Practical Research 1 Module 1 Performance Task
5 pages
Root Cause Analysis Through 5 Whys
No ratings yet
Root Cause Analysis Through 5 Whys
27 pages
01 Introduction Problems
No ratings yet
01 Introduction Problems
16 pages
AIML-lab-manual Final CSE
No ratings yet
AIML-lab-manual Final CSE
43 pages
Week2 Algorithmic Warmup
No ratings yet
Week2 Algorithmic Warmup
10 pages
Narrative Tenses - Docx - Google Dokumenti
No ratings yet
Narrative Tenses - Docx - Google Dokumenti
2 pages
3dsro CompanyPresentation
No ratings yet
3dsro CompanyPresentation
10 pages
Daa Lp-Iii Lab Manual 2022-23
0% (1)
Daa Lp-Iii Lab Manual 2022-23
17 pages
Mirza Kayesh Begg - 250274290 - CompleteReport
No ratings yet
Mirza Kayesh Begg - 250274290 - CompleteReport
12 pages
UpdatedNew Lp3LabManual
No ratings yet
UpdatedNew Lp3LabManual
118 pages
AI&ML Lab Manual
No ratings yet
AI&ML Lab Manual
31 pages
(Laboratory No. 3.2: Fibonacci) : Objectives
No ratings yet
(Laboratory No. 3.2: Fibonacci) : Objectives
5 pages
Week 1 - Lecture Prinsip Perakaunan Principles of Accounting (Bt11003)
No ratings yet
Week 1 - Lecture Prinsip Perakaunan Principles of Accounting (Bt11003)
30 pages
GAD Activity Design Template
No ratings yet
GAD Activity Design Template
2 pages
24 Efficiency 4pp
No ratings yet
24 Efficiency 4pp
5 pages
Mic by Ritik
No ratings yet
Mic by Ritik
12 pages
Algorithms - A Simple Introduction in Python: Part Six
No ratings yet
Algorithms - A Simple Introduction in Python: Part Six
3 pages
Preparation of Blood Films For Malaria Detection
No ratings yet
Preparation of Blood Films For Malaria Detection
10 pages
FPT Lab Manual 2024 - Nep 2
No ratings yet
FPT Lab Manual 2024 - Nep 2
15 pages
LP 3 - UCP Lab Manual
No ratings yet
LP 3 - UCP Lab Manual
210 pages
Task 2
No ratings yet
Task 2
2 pages
BSIOTR COMP LP-III Lab Manual
No ratings yet
BSIOTR COMP LP-III Lab Manual
172 pages
lp3 NBN Manual
No ratings yet
lp3 NBN Manual
156 pages
cs3230 Lec01b Beamer
No ratings yet
cs3230 Lec01b Beamer
46 pages
Assignment No 1 (DAA)
No ratings yet
Assignment No 1 (DAA)
6 pages
LP-III Lab Manual (Daa+Ml+Bct)
No ratings yet
LP-III Lab Manual (Daa+Ml+Bct)
173 pages
Ada 13
No ratings yet
Ada 13
91 pages
LP - III Lab Manual
No ratings yet
LP - III Lab Manual
54 pages
Fod Lab
No ratings yet
Fod Lab
21 pages
Daa Lab
No ratings yet
Daa Lab
66 pages
Rescue Asd
No ratings yet
Rescue Asd
12 pages
Fibonacci Sequence Generator 2
No ratings yet
Fibonacci Sequence Generator 2
10 pages
Be LP Iii Lab Manual
No ratings yet
Be LP Iii Lab Manual
173 pages
Algorithm Lab Manual Updated
No ratings yet
Algorithm Lab Manual Updated
66 pages
Python Lab
No ratings yet
Python Lab
21 pages
IPReport
No ratings yet
IPReport
7 pages
Lecture 2 Growth of Algorithms
No ratings yet
Lecture 2 Growth of Algorithms
75 pages
CS102 Lab1 Spring2025
No ratings yet
CS102 Lab1 Spring2025
13 pages
(Ebook PDF) Linear Algebra With Applications 9th Edition Download
100% (2)
(Ebook PDF) Linear Algebra With Applications 9th Edition Download
50 pages
Mic Microproject - pdf123
No ratings yet
Mic Microproject - pdf123
16 pages
Mic Microproject - pdf12345
No ratings yet
Mic Microproject - pdf12345
16 pages
Mic Microproject - pdf12345
No ratings yet
Mic Microproject - pdf12345
15 pages
Mic Microproject - pdf123
No ratings yet
Mic Microproject - pdf123
13 pages
Uday 12345
No ratings yet
Uday 12345
14 pages
Fibonacci Series 1740243350
No ratings yet
Fibonacci Series 1740243350
19 pages
ML 512
No ratings yet
ML 512
41 pages
For Studnets Updated Labmanual After Approval
No ratings yet
For Studnets Updated Labmanual After Approval
20 pages
01 Intro 4 1 Runtimes
No ratings yet
01 Intro 4 1 Runtimes
20 pages
Lab 2 - Recursion - DATA STRUCTURE & ALGORITHMS
No ratings yet
Lab 2 - Recursion - DATA STRUCTURE & ALGORITHMS
10 pages
BE - LP III Lab Manual
No ratings yet
BE - LP III Lab Manual
54 pages
CS3401 Lab Manual Final
No ratings yet
CS3401 Lab Manual Final
37 pages
DATA MINING and MACHINE LEARNING: CLUSTER ANALYSIS and kNN CLASSIFIERS. Examples with MATLAB
From Everand
DATA MINING and MACHINE LEARNING: CLUSTER ANALYSIS and kNN CLASSIFIERS. Examples with MATLAB
César Pérez López
No ratings yet
DEEP LEARNING TECHNIQUES: CLUSTER ANALYSIS and PATTERN RECOGNITION with NEURAL NETWORKS. Examples with MATLAB
From Everand
DEEP LEARNING TECHNIQUES: CLUSTER ANALYSIS and PATTERN RECOGNITION with NEURAL NETWORKS. Examples with MATLAB
César Pérez López
No ratings yet
DATA MINING AND MACHINE LEARNING. PREDICTIVE TECHNIQUES: REGRESSION, GENERALIZED LINEAR MODELS, SUPPORT VECTOR MACHINE AND NEURAL NETWORKS
From Everand
DATA MINING AND MACHINE LEARNING. PREDICTIVE TECHNIQUES: REGRESSION, GENERALIZED LINEAR MODELS, SUPPORT VECTOR MACHINE AND NEURAL NETWORKS
César Pérez López
No ratings yet