
A Laboratory Manual for

Introduction to Machine Learning


(3171114)

B.E. Semester 7
(Electronics and Communication)

Vishwakarma Government Engineering College, Chandkheda

Directorate of Technical Education, Gandhinagar, Gujarat

Year: 2023-24
Vishwakarma Government Engineering College, Chandkheda

Certificate

This is to certify that Mr./Ms. ___________________________________


Enrollment No. _______________ of B.E. Semester _____ Electronics &
Communication Engineering of this Institute (GTU Code: _____ ) has
satisfactorily completed the Practical / Tutorial work for the subject of
Introduction to Machine Learning (3171114) for the academic year 2023-24.

Place: __________
Date: __________

Name and Sign of Faculty member                              Head of the Department



Preface

The main motto of any laboratory/practical/field work is to enhance required skills as well as to create the ability amongst students to solve real-time problems by developing relevant competencies in the psychomotor domain. Keeping this in view, GTU has designed a competency-focused, outcome-based curriculum for engineering degree programmes in which sufficient weightage is given to practical work. It underlines the importance of enhancing skills amongst students and pays attention to utilizing every second of the time allotted for practicals amongst students, instructors and faculty members to achieve the relevant outcomes by performing experiments rather than merely studying them. For the effective implementation of a competency-focused, outcome-based curriculum, every practical must be keenly designed to serve as a tool to develop and enhance, in every student, the relevant competencies required by industry. These psychomotor skills are very difficult to develop through the traditional chalk-and-board content delivery method in the classroom. Accordingly, this lab manual is designed to focus on the industry-defined relevant outcomes, rather than the old practice of conducting practicals merely to prove concepts and theory.

By using this lab manual, students can go through the relevant theory and procedure in advance of the actual performance, which creates interest, and students can form a basic idea prior to the performance. This in turn enhances the predetermined outcomes amongst students. Each experiment in this manual begins with the competency, course outcomes and practical outcomes (objectives). Students will also learn the safety measures and necessary precautions to be taken while performing the practical.

This manual also provides guidelines to faculty members to facilitate student-centric lab activities through each experiment by arranging and managing the necessary resources, so that the students follow the procedures with the required safety and necessary precautions to achieve the outcomes. It also gives an idea of how students will be assessed by providing rubrics.

This lab manual focuses on the development of computer programs for machine learning that can change when exposed to new data. In this manual we will see the basics of machine learning and the implementation of simple machine-learning algorithms using Python.
Machine learning is a method of teaching computers to learn from data without being explicitly programmed. Python is a popular programming language for machine learning because it has a large number of powerful libraries and frameworks that make it easy to implement machine-learning algorithms.
To get started with machine learning using Python, you will need a basic understanding of Python programming and some knowledge of mathematical concepts such as probability, statistics and linear algebra.

Utmost care has been taken while preparing this lab manual; however, there is always a chance for improvement. We therefore welcome constructive suggestions for improvement and the reporting of any errors.

Practical – Course Outcome matrix

Course Outcomes (COs):


CO1: Understand basic concepts of machine learning as well as challenges involved.
CO2: Learn and implement various basic machine learning algorithms.
CO3: Study the dimensionality reduction concept and its role in machine learning techniques.
CO4: Realize concepts of advanced machine learning algorithms.
CO5: Comprehend basic concepts of neural networks and their use in machine learning.

Sr. No. | Aim / Objective(s) of Experiment | Relevant CO(s)
1 | Introduction to libraries used for machine learning. | CO1
2 | Create a dataset of 12 samples each having 10 features; split it into training and testing datasets. | CO1, CO2
3 | Exploratory data analysis on the Iris dataset. | CO2
4 | Write a program to implement simple linear regression (A) from a manual dataset and (B) using the scikit-learn Python library. | CO2
5 | Write a program to implement multiple linear regression. | CO2
6 | Write a program to implement a decision tree on the Iris dataset. | CO2
7 | Write a program to implement logistic regression. | CO2
8 | Write a program to implement an SVM classifier using a linear kernel on the Iris dataset. | CO4
9 | Write a program to implement an SVM classifier using an RBF kernel on the Iris dataset. | CO4
10 | Write a program for dimensionality reduction using PCA. | CO3
11 | Write a program to implement Random Forest feature extraction on the Iris dataset. | CO3
12 | Write a program to implement K-Means clustering on the Iris dataset. | CO4
13 | Write a program to implement the Naive Bayes algorithm. | CO4
14 | Write a program to implement and visualize the working of activation functions. | CO5
15 | Implement a convolutional neural network and find out its total number of parameters. | CO5

Industry Relevant Skills

The following industry-relevant competencies are expected to be developed in the students by
undertaking the practical work of this laboratory.
1.
2.

Guidelines for Faculty members


1. The teacher should provide guidelines and a demonstration of the practical, with all its features, to the students.
2. The teacher shall explain the basic concepts/theory related to the experiment to the students before starting each practical.
3. Involve all the students in the performance of each experiment.
4. The teacher is expected to share the skills and competencies to be developed in the students and to ensure that these skills and competencies are developed after completion of the experimentation.
5. Teachers should give students the opportunity for hands-on experience after the demonstration.
6. The teacher may impart additional knowledge and skills, even though not covered in the manual, that are expected of the students by the concerned industry.
7. Give practical assignments and assess the performance of students based on the assigned tasks, checking whether the work is as per the instructions or not.
8. The teacher is expected to refer to the complete curriculum of the course and follow the guidelines for implementation.

Instructions for Students


1. Students are expected to listen carefully to all the theory classes delivered by the faculty members and understand the COs, the content of the course, the teaching and examination scheme, the skill set to be developed, etc.
2. Students shall organize the work in groups and keep a record of all observations.
3. Students shall attempt to develop the related hands-on skills and build confidence.
4. Students shall develop the habit of evolving more ideas, innovations, skills, etc., apart from those included in the scope of the manual.
5. Students shall refer to technical magazines and data books.
6. Students should develop the habit of submitting the experimentation work as per the schedule, and they should be well prepared for the same.

Common Safety Instructions


1. Students are expected to properly shut down the computer after performing the experiment.
2.

Index
(Progressive Assessment Sheet)

(Columns: Sr. No. | Objective(s) of Experiment | Page No. | Date of performance | Date of submission | Assessment Marks | Sign. of Teacher with date | Remarks)

1. Introduction to libraries used for machine learning.
2. Create a dataset of 12 samples each having 10 features; split it into training and testing datasets.
3. Exploratory data analysis on the Iris dataset.
4. Write a program to implement simple linear regression: (A) from a manual dataset, (B) using the scikit-learn Python library.
5. Write a program to implement multiple linear regression.
6. Write a program to implement a decision tree on the Iris dataset.
7. Write a program to implement logistic regression.
8. Write a program to implement an SVM classifier using a linear kernel on the Iris dataset.
9. Write a program to implement an SVM classifier using an RBF kernel on the Iris dataset.
10. Write a program for dimensionality reduction using PCA.
11. Write a program to implement Random Forest feature extraction on the Iris dataset.
12. Write a program to implement K-Means clustering on the Iris dataset.
13. Write a program to implement the Naive Bayes algorithm.
14. Write a program to implement and visualize the working of activation functions.
15. Implement a convolutional neural network and find out its total number of parameters.

Total

Experiment No: 1

Introduction to Libraries Used for Machine Learning

Date:

Competency and Practical Skills: Basic knowledge of Machine Learning

Relevant CO: Understand basic concepts of machine learning as well as challenges involved.

Objectives: Introduction to Libraries used for Machine Learning.

TensorFlow Library:

TensorFlow is an open-source end-to-end platform for creating machine learning applications. It was created and is maintained by Google. It is a symbolic math library that uses dataflow and differentiable programming to perform various tasks focused on training and inference of deep neural networks. It allows developers to create machine learning applications using various tools, libraries, and community resources.

TensorFlow is at present the most popular software library of its kind. There are several real-world applications of deep learning that make TensorFlow popular. Being an open-source library for deep learning and machine learning, TensorFlow finds a role to play in text-based applications, image recognition, voice search, and many more. DeepFace, Facebook's image recognition system, uses TensorFlow for image recognition. Every Google app that you use has made good use of TensorFlow to make your experience better.

TensorFlow's name is directly derived from its core component: the tensor. A tensor is a vector or matrix of n dimensions that represents data. Values in a tensor hold identical data types with a known shape, and this shape is the dimensionality of the matrix. A vector is a one-dimensional tensor; a matrix is a two-dimensional tensor.

A tensor is an object with three properties:


(1) a unique label (name),
(2) a dimension (shape), and
(3) a data type (dtype).

Here is a first TensorFlow example. It shows how you can define constants and
perform computations with those constants.

# Import `tensorflow`
import tensorflow as tf

# Initialize two constants


x1 = tf.constant([1,2,3,4])
x2 = tf.constant([5,6,7,8])

# Multiply
result = tf.multiply(x1, x2)

# Print the result


print(result)

Output: tf.Tensor([5 12 21 32], shape=(4,), dtype=int32)

Scikit-learn Library:

Scikit-learn is an open-source machine learning library that supports supervised and unsupervised learning. It also provides various tools for model fitting, data preprocessing, model selection, model evaluation, and many other utilities. Scikit-learn provides dozens of built-in machine learning algorithms and models, called estimators. Each estimator can be fitted to some data using its fit method.
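As a quick illustration of the estimator API, here is a minimal sketch (the toy data below is made up purely for demonstration and is not part of the prescribed experiments):

# Every scikit-learn estimator is trained with fit() and used with predict().
from sklearn.ensemble import RandomForestClassifier

clf = RandomForestClassifier(random_state=0)  # an example estimator
X = [[1, 2, 3], [11, 12, 13]]                 # two samples, three features each
y = [0, 1]                                    # class label of each sample
clf.fit(X, y)                                 # fit the estimator to the data
print(clf.predict([[4, 5, 6]]))               # predict the class of a new sample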

The library is built upon SciPy (Scientific Python), which must be installed before you can use scikit-learn. This stack includes:

 NumPy: Base n-dimensional array package.


 SciPy: Fundamental library for scientific computing.
 Matplotlib: Comprehensive 2D/3D plotting.
 IPython: Enhanced interactive console.
 Sympy: Symbolic mathematics.
 Pandas: Data structures and analysis.

Extensions or modules for SciPy are conventionally named SciKits. As the module provides learning algorithms, it is named scikit-learn.

Scikit-learn provides the following groups of models to the user.

 Clustering − This model is used for grouping unlabeled data.


 Cross Validation − It is used to check the accuracy of supervised models on unseen data (splitting the dataset into a test dataset and a training dataset).

 Dimensionality Reduction − It is used for reducing the number of attributes in


data which can be further used for summarisation, visualisation and feature
selection.
 Ensemble methods − As the name suggests, it is used for combining the predictions of multiple supervised models.
 Feature extraction − It is used to extract the features from data to define the
attributes in image and text data.
 Feature selection − It is used to identify useful attributes to create supervised
models.
 Open Source − It is an open-source library and is also commercially usable under the BSD (Berkeley Software Distribution) license.

PyTorch library:

PyTorch is a Python-based scientific computing package serving two broad purposes:

 A replacement for NumPy to use the power of GPUs and other accelerators.
 An automatic differentiation library that is useful to implement neural networks.

PyTorch is closely related to the Lua-based Torch framework, which is actively used in Facebook.

Features of PyTorch: The major features of PyTorch are mentioned below.

Easy interface: PyTorch offers an easy-to-use API; hence it is considered very simple to operate, and it runs on Python. Code execution in this framework is quite easy.
Python usage: This library is considered to be Pythonic and smoothly integrates with the Python data science stack. Thus, it can leverage all the services and functionalities offered by the Python environment.
Computational graphs: PyTorch provides an excellent platform which offers dynamic computational graphs; a user can change them during runtime. This is highly useful when a developer has no idea how much memory is required for creating a neural network model.
PyTorch tensors: Tensors are a specialized data structure very similar to arrays and matrices. Tensors are similar to NumPy's ndarrays, except that tensors can run on GPUs or other specialized hardware to accelerate computing.
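As a small illustration of the automatic differentiation mentioned above, here is a minimal sketch (not part of the prescribed experiments):

import torch

# Gradients are recorded dynamically as operations run on the tensor.
x = torch.tensor(2.0, requires_grad=True)
y = x**2 + 3*x   # the computational graph for y is built on the fly
y.backward()     # compute dy/dx by backpropagation
print(x.grad)    # prints tensor(7.) since dy/dx = 2x + 3 = 7 at x = 2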

Interoperability with NumPy:

NumPy is a popular open-source library used for mathematical and scientific computing in Python. Instead of reinventing the wheel, PyTorch interoperates really well with NumPy to leverage its existing ecosystem of tools and libraries.

Here's how we create an array using NumPy:

import numpy as np
x = np.array([[1, 2], [3, 4.]])

>>> x
array([[1., 2.],
       [3., 4.]])

We can convert a NumPy array into a tensor using torch.from_numpy:

# Convert the numpy array to a torch tensor.
import torch
y = torch.from_numpy(x)

>>> y
tensor([[1., 2.],
        [3., 4.]], dtype=torch.float64)

Conclusion: (Sufficient space to be provided)

Question: (Sufficient space to be provided)

1. What is TensorFlow? List the properties of a tensor.
2. What are the functions of the scikit-learn library?
3. Define: (1) NumPy (2) SciPy (3) Matplotlib (4) IPython (5) SymPy (6) Pandas.
4. What is PyTorch?
5. What are the functions of PyTorch?

Rubrics:

Terminology:
- Poor (0-2 marks): Student cannot understand the library tools and is not able to implement basic functions.
- Average (3-5 marks): Student can understand the library tools but is not able to implement basic functions.
- Good (5-7 marks): Student can understand the library tools and is able to implement some basic functions.
- Excellent (8-10 marks): Student can understand all the libraries and tools and implement all the basic functions.

Experiment No: 2

Create train and test dataset

Date:

Competency and Practical Skills: Python, Scikit-learn

Relevant CO: Understand basic concepts of machine learning as well as challenges involved; learn and implement various basic machine learning algorithms.

Objectives:

1) To learn train and test data generation


2) To create dataset of 12 samples each having 10 features.
3) Split it into training and testing dataset using Scikit-learn.

Assumption: Assume the output labels of the 12 samples are [0,1,1,0,1,0,0,1,1,0,1,0].

Code:

import numpy as np
from sklearn.model_selection import train_test_split

# 12 samples with 10 features


inp=np.arange(1,121).reshape(12,10)

# output classification
out=np.array([0,1,1,0,1,0,0,1,1,0,1,0])

#sklearn.model_selection.train_test_split(*arrays, **options) -> list
#test_size=3 holds out 3 of the 12 samples for testing (an absolute
#count); random_state=4 makes the shuffled split reproducible.
x_train,x_test,y_train,y_test=train_test_split(inp,out,test_size=3,random_state=4)

print("\n Train Samples are:\n ",x_train)


print("\nOutput classification of train sample is: \n",y_train)
print("\n Test Samples are: \n",x_test)
print("\n Output classification of Test sample is: \n",y_test)

Output:

Train Samples are:


[[ 91 92 93 94 95 96 97 98 99 100]
[ 81 82 83 84 85 86 87 88 89 90]
[ 21 22 23 24 25 26 27 28 29 30]
[111 112 113 114 115 116 117 118 119 120]
[ 1 2 3 4 5 6 7 8 9 10]
[ 11 12 13 14 15 16 17 18 19 20]
[ 51 52 53 54 55 56 57 58 59 60]
[ 71 72 73 74 75 76 77 78 79 80]
[101 102 103 104 105 106 107 108 109 110]]

Output classification of train sample is:


[0 1 1 0 0 1 0 1 1]

Test Samples are:


[[31 32 33 34 35 36 37 38 39 40]
[41 42 43 44 45 46 47 48 49 50]
[61 62 63 64 65 66 67 68 69 70]]

Output classification of Test sample is:


[0 1 0]

Conclusion: (Sufficient space to be provided)

Question :(Sufficient space to be provided)


1. Define feature, training data and testing data.
2. What is the difference between training data and validation data?
3. How do you handle missing or corrupted data in dataset?
4. Write a function to split training and testing dataset using scikit learn.

Rubrics:

Rubrics 1 2 3 4 5

Experiment No: 3

Analysis of Iris Dataset


Date:

Competency and Practical Skills: Python, Scikit-learn

Relevant CO: Learn and implement various basic machine learning algorithms.

Objectives: Exploratory Data Analysis on Iris Dataset.

Exploratory Data Analysis (EDA) is a technique to analyze data using some visual techniques. With this technique, we can get detailed information about the statistical summary of the data. We will also be able to deal with duplicate values and outliers, and see some trends or patterns present in the dataset.

Iris Dataset
Iris Dataset is considered as the Hello World for data science. It contains five columns
namely – Petal Length, Petal Width, Sepal Length, Sepal Width, and Species Type.

Note: This dataset can be downloaded from:


URL: https://datahub.io/machine-learning/iris

You can download the Iris.csv file from the above link. Now we will use the Pandas library to load this CSV file and convert it into a dataframe. The read_csv() method is used to read CSV files.
Example:

import pandas as pd

# Reading the CSV file


df = pd.read_csv("Iris.csv")

# Printing top 5 rows


df.head()

Getting Information about the Dataset


We will use the shape attribute to get the shape of the dataset. We can see that the data frame contains 6 columns and 150 rows. Now, let's also look at the columns and their data types. For this, we will use the info() method.

Example:

df.shape
(150, 6)

df.info()

The describe() function applies basic statistical computations on the dataset, such as extreme values, count of data points, standard deviation, etc. Any missing or NaN value is automatically skipped. The describe() function gives a good picture of the distribution of the data.
Example:

df.describe()

Checking Missing Values


We will check if our data contains any missing values or not. Missing values can
occur when no information is provided for one or more items or for a whole unit. We
will use the isnull() method.
Example:

df.isnull().sum()

Checking Duplicates
Let’s see if our dataset contains any duplicates or not.
Pandas drop_duplicates() method helps in removing duplicates from the data frame.

Example:

data = df.drop_duplicates(subset="Species")
data

We can see that there are only three unique species. Let's see if the dataset is balanced or not, i.e. whether all the species contain an equal number of rows. We will use the Series.value_counts() function. This function returns a Series containing counts of unique values.

Example:

df.value_counts("Species")

Data Visualization

Visualizing the target column

We will use Matplotlib and Seaborn library for the data visualization.

Example:

# importing packages
import seaborn as sns
import matplotlib.pyplot as plt

sns.countplot(x='Species', data=df, )
plt.show()

Relation between variables

We will see the relationship between the sepal length and sepal width and also
between petal length and petal width.

Example 1: Comparing Sepal Length and Sepal Width

# importing packages
import seaborn as sns
import matplotlib.pyplot as plt

sns.scatterplot(x='SepalLengthCm', y='SepalWidthCm', hue='Species', data=df)

# Placing Legend outside the Figure
plt.legend(bbox_to_anchor=(1, 1), loc=2)

plt.show()

From the above plot, we can infer that –


 Species Setosa has smaller sepal lengths but larger sepal widths.
 Versicolor Species lies in the middle of the other two species in terms of sepal
length and width
 Species Virginica has larger sepal lengths but smaller sepal widths.

Example 2: Comparing Petal Length and Petal Width

# importing packages
import seaborn as sns
import matplotlib.pyplot as plt

sns.scatterplot(x='PetalLengthCm', y='PetalWidthCm', hue='Species', data=df)

# Placing Legend outside the Figure
plt.legend(bbox_to_anchor=(1, 1), loc=2)
plt.show()

From the above plot, we can infer that –


 Species Setosa has smaller petal lengths and widths.
 Versicolor Species lies in the middle of the other two species in terms of petal
length and width
 Species Virginica has the largest of petal lengths and widths.

Let’s plot all the column’s relationships using a pairplot. It can be used for
multivariate analysis.
Example:

# importing packages
import seaborn as sns
import matplotlib.pyplot as plt

sns.pairplot(df.drop(['Id'], axis=1), hue='Species', height=2)

Histograms

Histograms allow seeing the distribution of data for various columns. It can be used
for uni as well as bi-variate analysis.

Example:

# importing packages
import seaborn as sns
import matplotlib.pyplot as plt

fig, axes = plt.subplots(2, 2, figsize=(10, 10))

axes[0,0].set_title("Sepal Length")
axes[0,0].hist(df['SepalLengthCm'], bins=7)
axes[0,1].set_title("Sepal Width")
axes[0,1].hist(df['SepalWidthCm'], bins=5);
axes[1,0].set_title("Petal Length")
axes[1,0].hist(df['PetalLengthCm'], bins=6);
axes[1,1].set_title("Petal Width")
axes[1,1].hist(df['PetalWidthCm'], bins=6);

From the above plot, we can see that –


 The highest frequency of the sepal length is between 30 and 35 which is
between 5.5 and 6
 The highest frequency of the sepal Width is around 70 which is between 3.0
and 3.5
 The highest frequency of the petal length is around 50 which is between 1 and
2
 The highest frequency of the petal width is between 40 and 50 which is
between 0.0 and 0.5

Histograms with Distplot Plot

Distplot is used basically for the univariant set of observations and visualizes it
through a histogram i.e. only one observation and hence we choose one particular
column of the dataset.

Example:

# importing packages
import seaborn as sns
import matplotlib.pyplot as plt

plot = sns.FacetGrid(df, hue="Species")
plot.map(sns.distplot, "SepalLengthCm").add_legend()

plot = sns.FacetGrid(df, hue="Species")
plot.map(sns.distplot, "SepalWidthCm").add_legend()

plot = sns.FacetGrid(df, hue="Species")
plot.map(sns.distplot, "PetalLengthCm").add_legend()

plot = sns.FacetGrid(df, hue="Species")
plot.map(sns.distplot, "PetalWidthCm").add_legend()

plt.show()

From the above plots, we can see that –


 In the case of Sepal Length, there is a huge amount of overlapping.
 In the case of Sepal Width also, there is a huge amount of overlapping.
 In the case of Petal Length, there is a very little amount of overlapping.
 In the case of Petal Width also, there is a very little amount of overlapping.
So we can use Petal Length and Petal Width as the classification feature.

Handling Correlation

Pandas dataframe.corr() is used to find the pairwise correlation of all columns in the
dataframe. Any NA values are automatically excluded. For any non-numeric data type
columns in the dataframe it is ignored.
Example:

data.corr(method='pearson')

Heatmaps

The heatmap is a data visualization technique that is used to analyze the dataset as
colors in two dimensions. Basically, it shows a correlation between all numerical
variables in the dataset. In simpler terms, we can plot the above-found correlation
using the heatmaps.
Example:

# importing packages
import seaborn as sns
import matplotlib.pyplot as plt

sns.heatmap(df.corr(method='pearson').drop(['Id'], axis=1).drop(['Id'], axis=0), annot=True);

plt.show()

From the above graph, we can see that –


 Petal width and petal length have high correlations.
 Petal length and sepal width have good correlations.
 Petal Width and Sepal length have good correlations.

Box Plots

We can use boxplots to see how the categorical value os distributed with other
numerical values.
Example:

# importing packages
import seaborn as sns
import matplotlib.pyplot as plt

def graph(y):
    sns.boxplot(x="Species", y=y, data=df)

plt.figure(figsize=(10, 10))

# Adding the subplot at the specified grid position
plt.subplot(221)
graph('SepalLengthCm')

plt.subplot(222)
graph('SepalWidthCm')

plt.subplot(223)
graph('PetalLengthCm')

plt.subplot(224)
graph('PetalWidthCm')

plt.show()

From the above graph, we can see that –


 Species Setosa has the smallest features and less distributed with some outliers.
 Species Versicolor has the average features.
 Species Virginica has the highest features

Handling Outliers
An outlier is a data item/object that deviates significantly from the rest of the (so-called normal) objects. Outliers can be caused by measurement or execution errors. The analysis for outlier detection is referred to as outlier mining.

There are many ways to detect outliers, and the removal process is the same as removing a data item from the pandas dataframe.

Let’s consider the iris dataset and let’s plot the boxplot for the SepalWidthCm column.

Example:

# importing packages
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Load the dataset
df = pd.read_csv('Iris.csv')

sns.boxplot(x='SepalWidthCm', data=df)

In the above graph, the values above 4 and below 2 are acting as outliers.

Removing Outliers

To remove an outlier, one must follow the same process as removing any entry from the dataset, using its exact position, because all of the detection methods above ultimately produce the list of data items that satisfy the outlier definition according to the method used.

Example:

We will detect the outliers using IQR and then we will remove them. We will also
draw the boxplot to see if the outliers are removed or not.

# Importing libraries
import numpy as np
import pandas as pd
import seaborn as sns

# Load the dataset
df = pd.read_csv('Iris.csv')

# IQR
Q1 = np.percentile(df['SepalWidthCm'], 25, interpolation='midpoint')
Q3 = np.percentile(df['SepalWidthCm'], 75, interpolation='midpoint')
IQR = Q3 - Q1
print("Old Shape: ", df.shape)

# Upper bound
upper = np.where(df['SepalWidthCm'] >= (Q3 + 1.5*IQR))
# Lower bound
lower = np.where(df['SepalWidthCm'] <= (Q1 - 1.5*IQR))

# Removing the outliers
df.drop(upper[0], inplace=True)
df.drop(lower[0], inplace=True)
print("New Shape: ", df.shape)

sns.boxplot(x='SepalWidthCm', data=df)

Conclusion:(Sufficient space to be provided)

Question : (Sufficient space to be provided)

1. What is EDA?
2. Write code for comparing sepal length and sepal width.
3. Explain histograms.
4. Define the following: (i) heatmaps (ii) box plots (iii) describe() (iv) checking duplicates.
5. Explain the libraries used for data visualization.

Experiment No: 4 (A)

Linear Regression Algorithm


Date:

Competency and Practical Skills: Python, scikit-learn, machine learning

Relevant CO: Learn and implement various basic machine learning algorithms.

Objectives:
1. Implement simple linear regression algorithm
2. Plot Regression line.
3. Find the coefficients of the linear regression.
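For reference, the coefficients computed by estimate_coef() in the code below are the ordinary least-squares estimates

b_1 = SS_xy / SS_xx = (Σ xᵢyᵢ − n·x̄·ȳ) / (Σ xᵢ² − n·x̄²),   b_0 = ȳ − b_1·x̄,

where n is the number of observations and x̄, ȳ are the sample means.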

Code:

import numpy as np
import matplotlib.pyplot as plt

def estimate_coef(x, y):
    # number of observations/points
    n = np.size(x)

    # mean of x and y vector
    m_x = np.mean(x)
    m_y = np.mean(y)

    # calculating cross-deviation and deviation about x
    SS_xy = np.sum(y*x) - n*m_y*m_x
    SS_xx = np.sum(x*x) - n*m_x*m_x

    # calculating regression coefficients
    b_1 = SS_xy / SS_xx
    b_0 = m_y - b_1*m_x

    return (b_0, b_1)

def plot_regression_line(x, y, b):
    # plotting the actual points as scatter plot
    plt.scatter(x, y, color="m", marker="o", s=30)

    # predicted response vector
    y_pred = b[0] + b[1]*x

    # plotting the regression line
    plt.plot(x, y_pred, color="g")

    # putting labels
    plt.xlabel('x')
    plt.ylabel('y')

    # function to show plot
    plt.show()

def main():
    # observations / data
    x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
    y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])

    # estimating coefficients
    b = estimate_coef(x, y)
    print("Estimated coefficients:\nb_0 = {} \
          \nb_1 = {}".format(b[0], b[1]))

    # plotting regression line
    plot_regression_line(x, y, b)

if __name__ == "__main__":
    main()

Output:
Estimated coefficients:
b_0 = 1.2363636363636363
b_1 = 1.1696969696969697

Conclusion:(Sufficient space to be provided)



Experiment No: 4 (B)


Linear Regression Algorithm
Date:

Competency and Practical Skills: Python, scikit-learn, machine learning

Relevant CO: Learn and implement various basic machine learning algorithms.

Objective:
1. Implement simple linear regression algorithm.
2. Display scatter plot of linear regression.
3. Find coefficient, mean squared error and variance score of linear regression

Code:

#First, we will start with importing necessary packages as follows −
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets, linear_model
from sklearn.metrics import mean_squared_error, r2_score

#Next, we will load the diabetes dataset and create its object −
diabetes = datasets.load_diabetes()

#As we are implementing SLR, we will be using only one feature as follows −
X = diabetes.data[:, np.newaxis, 2]

#Next, we need to split the data into training and testing sets as follows −
X_train = X[:-30]
X_test = X[-30:]

#Next, we need to split the target into training and testing sets as follows −
y_train = diabetes.target[:-30]
y_test = diabetes.target[-30:]

#Now, to train the model we need to create a linear regression object as follows −
regr = linear_model.LinearRegression()

#Next, train the model using the training sets as follows −
regr.fit(X_train, y_train)

#Next, make predictions using the testing set as follows −
y_pred = regr.predict(X_test)

#Next, we will print some metrics like the MSE, variance score, etc. as follows −
print('Coefficients: \n', regr.coef_)
print("Mean squared error: %.2f" % mean_squared_error(y_test, y_pred))
print('Variance score: %.2f' % r2_score(y_test, y_pred))

#Now, plot the outputs as follows −
plt.scatter(X_test, y_test, color='blue')
plt.plot(X_test, y_pred, color='red', linewidth=3)
plt.xticks(())
plt.yticks(())
plt.show()

Output:
Coefficients:
[941.43097333]
Mean squared error: 3035.06
Variance score: 0.41

Conclusion:(Sufficient space to be provided)

Question: (Sufficient space to be provided)

1. What is linear regression?
2. What are the common types of error in linear regression?
3. Define the following terms: (1) mean squared error (2) variance.

Experiment No: 5

Multiple Linear Regression Algorithm


Date:

Competency and Practical Skills: Python, scikit-learn, machine learning

Relevant CO: Learn and implement various basic machine learning algorithms.

Objective:
1. Implement multiple linear regression algorithm.
2. Display scatter plot of multiple linear regression.
3. Find the difference between actual and predicted values.
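Multiple linear regression models the target as a linear combination of several features, ŷ = b_0 + b_1·x_1 + b_2·x_2 + … + b_k·x_k. In the dataset used below, the four columns other than PE serve as the features and PE is the target.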

Code:

# import libraries
import pandas as pd
import numpy as np

# Import dataset
data_df = pd.read_csv('/content/drive/MyDrive/test.csv')
# (if you are using Python rather than Google Colab, you can
# upload the CSV file from your computer)
data_df.head()

# define x and y
x = data_df.drop(['PE'], axis=1).values
y = data_df['PE'].values
print(x)
print(y)

# split dataset into training set and test set
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=0)

# train the model on the training set
from sklearn.linear_model import LinearRegression
ml = LinearRegression()
ml.fit(x_train, y_train)

# predict the test set results
y_pred = ml.predict(x_test)
print(y_pred)
ml.predict([[14.96, 41.76, 1024.07, 73.17]])

#evaluate the model
from sklearn.metrics import r2_score
r2_score(y_test, y_pred)

# plot the result
import matplotlib.pyplot as plt
plt.figure(figsize=(15, 10))
plt.scatter(y_test, y_pred)
plt.xlabel('Actual')
plt.ylabel('Predicted')

# predicted values
pred_y_df = pd.DataFrame({'Actual value': y_test, 'predicted value': y_pred, 'Difference': y_test - y_pred})

pred_y_df[0:20]

Output:

[[14.96 41.76 1024.07 73.17]


[ 25.18 62.96 1020.04 59.08]
[ 5.11 39.4 1012.16 92.14]
...
[ 31.32 74.33 1012.92 36.48]
[ 24.48 69.45 1013.86 62.39]
[ 21.6 62.52 1017.23 67.87]]
[463.26 444.37 488.56 ... 429.57 435.74 453.28]
[431.40245096 458.61474119 462.81967423 ... 432.47380825 436.16417243
439.00714594]
Actual value predicted value Difference

0 431.23 431.402451 -0.172451

1 460.01 458.614741 1.395259

2 461.14 462.819674 -1.679674

3 445.90 448.601237 -2.701237

4 451.29 457.879479 -6.589479

5 432.68 429.676856 3.003144

6 477.50 473.017115 4.482885

7 459.68 456.532373 3.147627

8 477.50 474.342524 3.157476

9 444.99 446.364396 -1.374396



10 444.37 441.946411 2.423589

11 437.04 441.452599 -4.412599

12 442.34 444.746375 -2.406375

13 440.74 440.874598 -0.134598

14 436.55 438.374490 -1.824490

15 460.24 454.370315 5.869685

16 448.66 444.904201 3.755799

17 432.94 437.370808 -4.430808

18 452.82 451.306760 1.513240

19 432.20 427.453009 4.746991

Conclusion:(Sufficient space to be provided)

Question: (Sufficient space to be provided)

1. Explain simple and multiple linear regression.
2. What is the difference between linear and non-linear regression?
3. Define actual and predicted values.

Experiment No: 6
Decision Tree Algorithm
Date:

Competency and Practical Skills: Python, scikit-learn, machine learning

Relevant CO: Learn and implement various basic machine learning algorithms.

Objective:
1. Implement Decision tree algorithm on IRIS datasets.
2. Plot tree.
3. Find classification report, confusion matrix and accuracy.
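As a reference for the third objective, accuracy can be read directly off the confusion matrix C: it is the sum of the diagonal entries (correct predictions) divided by the total number of test samples, i.e. accuracy = trace(C) / ΣᵢΣⱼ Cᵢⱼ.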

Code:

#Import library
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import tree

#load file
#col_names=['Id','SepalLengthCm','SepalWidthCm','PetalLengthCm','PetalWidthCm','Species']
iris = pd.read_csv('/content/drive/MyDrive/finalIris.csv')
iris.head()

# split dataset into features and target variable
feature_cols = ['SepalLengthCm', 'SepalWidthCm', 'PetalLengthCm', 'PetalWidthCm']
x = iris[feature_cols]
y = iris.Species
x.head()

# Next, we will divide the data into train and test split.
# The following code will split the dataset into 70% training data and 30% testing data −
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=0)

# Next, train the model with the help of the DecisionTreeClassifier class of sklearn as follows −
clf = DecisionTreeClassifier()
clf = clf.fit(x_train, y_train)

# At last we need to make predictions.
y_pred = clf.predict(x_test)

# find accuracy, confusion matrix, classification report
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score

result = confusion_matrix(y_test, y_pred)
print("Confusion Matrix:")
print(result)

result1 = classification_report(y_test, y_pred)
print("Classification Report:")
print(result1)

result2 = accuracy_score(y_test, y_pred)
print("Accuracy:")
print(result2)

# plot decision tree
fn = ['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']
cn = ['setosa', 'versicolor', 'virginica']
fig, axes = plt.subplots(nrows=1, ncols=1, figsize=(4, 4), dpi=300)
tree.plot_tree(clf, feature_names=fn, class_names=cn, filled=True);
fig.savefig('imagename.png')

Output:

Conclusion:(Sufficient space to be provided)

Question:(Sufficient space to be provided)

1. Explain the decision tree algorithm.
2. Explain classification in machine learning.
3. Define the confusion matrix and its importance.
4. How can we plot a decision tree using Python?

Experiment No: 7
Logistic Regression Algorithm
Date:

Competency and Practical Skills: Python, scikit-learn, machine learning

Relevant CO: Learn and implement various basic machine learning algorithms.

Objective:
1. Implement Logistic regression algorithm.
2. Find classification report, confusion matrix and accuracy.
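Logistic regression models the probability of the positive class through the sigmoid of a linear function, p(y = 1 | x) = 1 / (1 + e^−(b_0 + b_1·x)); the intercept and coefficient printed in the output below are the fitted values of b_0 and b_1.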

Code:

# import libraries
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.metrics import classification_report

#get data
x = np.arange(10).reshape(-1, 1)
y = np.array([0, 1, 0, 0, 1, 1, 1, 1, 1, 1])

#create a model and train it
model = LogisticRegression(solver='liblinear', C=10.0, random_state=0)
model.fit(x, y)

#evaluate model
p_pred = model.predict_proba(x)
y_pred = model.predict(x)
score = model.score(x, y)
conf_m = confusion_matrix(y, y_pred)
report = classification_report(y, y_pred)

print('x:', x, sep='\n')
print('y:', y, sep='\n', end='\n\n')
print('intercept:', model.intercept_)
print('coef:', model.coef_, end='\n\n')
print('y_pred:', y_pred, end='\n\n')
print('score:', score, end='\n\n')
print('conf_m:', conf_m, sep='\n', end='\n\n')
print('report:', report, sep='\n')

Output:

x:
[[0]
[1]
[2]
[3]
[4]
[5]
[6]
[7]
[8]
[9]]
y:
[0 1 0 0 1 1 1 1 1 1]

intercept: [-1.51632619]
coef: [[0.703457]]

y_pred: [0 0 0 1 1 1 1 1 1 1]

score: 0.8

conf_m:
[[2 1]
[1 6]]

report:
precision recall f1-score support

0 0.67 0.67 0.67 3


1 0.86 0.86 0.86 7

accuracy 0.80 10
macro avg 0.76 0.76 0.76 10
weighted avg 0.80 0.80 0.80 10

Conclusion:(Sufficient space to be provided)

Question: (Sufficient space to be provided)

1. What is logistic regression?
2. Compare linear and logistic regression.
3. What are the different types of logistic regression?
4. What are the advantages of logistic regression?

Experiment No: 8
SVM Classifier using Linear Kernel
Date:

Competency and Practical Skills: Python, scikit-learn, machine learning

Relevant CO: Realize concepts of advanced machine learning algorithms.

Objective:
1. Implement SVM classifier using linear kernel
2. Plot scatter plot for classification of IRIS data
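The linear kernel computes the similarity between samples as a plain dot product, K(x, x′) = x·x′, so the resulting decision boundaries between the classes are straight lines in the two-feature space used here.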

Code:

# importing files
import pandas as pd
import numpy as np
from sklearn import svm, datasets
import matplotlib.pyplot as plt

#load input data
iris = datasets.load_iris()

#taking two features
x = iris.data[:, :2]
y = iris.target

#svm boundaries
x_min, x_max = x[:, 0].min() - 1, x[:, 0].max() + 1
y_min, y_max = x[:, 1].min() - 1, x[:, 1].max() + 1
h = (x_max - x_min) / 100   # mesh step size

xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
x_plot = np.c_[xx.ravel(), yy.ravel()]

#regularization parameter
C = 1.0

#SVM classifier object
svc_classifier = svm.SVC(kernel='linear', C=C).fit(x, y)
z = svc_classifier.predict(x_plot)
z = z.reshape(xx.shape)
plt.figure(figsize=(15, 5))
plt.subplot(121)
plt.contour(xx, yy, z, cmap=plt.cm.tab10, alpha=0.3)
plt.scatter(x[:, 0], x[:, 1], c=y, cmap=plt.cm.Set1)
plt.xlabel('sepal length')
plt.ylabel('sepal width')
plt.xlim(xx.min(), xx.max())
plt.title('support vector classifier with linear kernel')

Output:

Text(0.5, 1.0, 'support vector classifier with linear kernel')

Conclusion:(Sufficient space to be provided)

Question: (Sufficient space to be provided)

1. Explain the support vector machine algorithm.
2. Explain the different types of kernel functions.
3. What do you know about hard-margin SVM and soft-margin SVM?
4. What is hinge loss?

Experiment No: 9

SVM Classifier using RBF Kernel


Date:

Competency and Practical Skills: Python, scikit-learn, machine learning

Relevant CO: Realize concepts of advanced machine learning algorithms.

Objective:
1. Implement SVM classifier using rbf kernel
2. Plot scatter plot for classification of IRIS data
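The RBF (radial basis function) kernel computes similarity as K(x, x′) = exp(−γ‖x − x′‖²), which lets the classifier form curved, non-linear decision boundaries; gamma='auto' in the code below sets γ = 1/n_features.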

Code:

# importing files
import pandas as pd
import numpy as np
from sklearn import svm, datasets
import matplotlib.pyplot as plt

#load input data
iris = datasets.load_iris()

#taking two features
x = iris.data[:, :2]
y = iris.target

#svm boundaries
x_min, x_max = x[:, 0].min() - 1, x[:, 0].max() + 1
y_min, y_max = x[:, 1].min() - 1, x[:, 1].max() + 1
h = (x_max - x_min) / 100   # mesh step size

xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
x_plot = np.c_[xx.ravel(), yy.ravel()]

#regularization parameter
C = 1.0

#SVM classifier object
svc_classifier = svm.SVC(kernel='rbf', gamma='auto', C=C).fit(x, y)
z = svc_classifier.predict(x_plot)
z = z.reshape(xx.shape)
plt.figure(figsize=(15, 5))
plt.subplot(121)
plt.contour(xx, yy, z, cmap=plt.cm.tab10, alpha=0.3)
plt.scatter(x[:, 0], x[:, 1], c=y, cmap=plt.cm.Set1)
plt.xlabel('sepal length')
plt.ylabel('sepal width')
plt.xlim(xx.min(), xx.max())
plt.title('support vector classifier with rbf kernel')

Output:

Text(0.5, 1.0, 'support vector classifier with rbf kernel')

Conclusion:(Sufficient space to be provided)

Question: (Sufficient space to be provided)

1. What are some applications of SVM?
2. List out some advantages of SVM.
3. What is a hyperplane in SVM?

Experiment No: 10
Principal Component Analysis Algorithm
Date:

Competency and Practical Skills: Python, scikit-learn, machine learning

Relevant CO: Study dimensionality reduction concept and its role in machine
learning techniques.

Objective:
1. Implement dimensionality reduction using PCA.
2. Learn the StandardScaler and plot the scaled output.
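As background, PCA projects the standardized data onto the directions of maximum variance: the principal components are the eigenvectors of the covariance matrix of the data, ordered by decreasing eigenvalue, and each sample x is transformed as z = Wᵀx, where the columns of W are the top-k eigenvectors (here k = 3).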
Code:

# Import necessary libraries
from sklearn import datasets                      # to retrieve the iris dataset
import pandas as pd                               # to load the dataframe
from sklearn.preprocessing import StandardScaler  # to standardize the features
from sklearn.decomposition import PCA             # to apply PCA
import seaborn as sns                             # to plot the heat maps

#Load the Dataset
iris = datasets.load_iris()
#convert the dataset into a pandas data frame
df = pd.DataFrame(iris['data'], columns=iris['feature_names'])
#display the head (first 5 rows) of the dataset
df.head()

#Standardize the features
#Create an object of StandardScaler, which is present in sklearn.preprocessing
scalar = StandardScaler()
scaled_data = pd.DataFrame(scalar.fit_transform(df))  # scaling the data
scaled_data

#Check the correlation between features without PCA
sns.heatmap(scaled_data.corr())

#Applying PCA
#Taking the number of principal components as 3
pca = PCA(n_components=3)
pca.fit(scaled_data)
data_pca = pca.transform(scaled_data)
data_pca = pd.DataFrame(data_pca, columns=['PC1', 'PC2', 'PC3'])
data_pca.head()

#Checking the correlation between features after PCA
sns.heatmap(data_pca.corr())

Output:

sepal length (cm) sepal width (cm) petal length (cm) petal width (cm)

0 5.1 3.5 1.4 0.2

1 4.9 3.0 1.4 0.2

2 4.7 3.2 1.3 0.2

3 4.6 3.1 1.5 0.2

4 5.0 3.6 1.4 0.2

0 1 2 3

0 -0.900681 1.019004 -1.340227 -1.315444

1 -1.143017 -0.131979 -1.340227 -1.315444

2 -1.385353 0.328414 -1.397064 -1.315444

3 -1.506521 0.098217 -1.283389 -1.315444

4 -1.021849 1.249201 -1.340227 -1.315444

... ... ... ... ...

145 1.038005 -0.131979 0.819596 1.448832

146 0.553333 -1.282963 0.705921 0.922303

147 0.795669 -0.131979 0.819596 1.053935

148 0.432165 0.788808 0.933271 1.448832

149 0.068662 -0.131979 0.762758 0.790671

150 rows × 4 columns



        PC1       PC2       PC3
0 -2.264703  0.480027 -0.127706
1 -2.080961 -0.674134 -0.234609
2 -2.364229 -0.341908  0.044201
3 -2.299384 -0.597395  0.091290
4 -2.389842  0.646835  0.015738

Conclusion:(Sufficient space to be provided)

Question :(Sufficient space to be provided)

1. What is dimensionality reduction?
2. What is PCA? What does PCA do?
3. What are the advantages of dimensionality reduction?
4. List down the steps of a PCA algorithm.

Experiment No: 11

Random Forest Algorithm

Competency and Practical Skills: Python, scikit-learn, machine learning

Relevant CO: Study dimensionality reduction concept and its role in machine
learning techniques.

Objective:
1. Implement random forest algorithm using python.
2. Visualize important feature on bar plot.
3. Find the accuracy of the model.
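The feature_importances_ attribute used below gives scikit-learn's impurity-based importances: the total reduction in node impurity contributed by each feature, averaged over all trees in the forest and normalized to sum to 1.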

Code:

#Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.model_selection import train_test_split

# load the iris dataset
from sklearn.datasets import load_iris
iris = load_iris()

# store the feature matrix (X) and response vector (y)
x = iris.data
y = iris.target

# Creating a DataFrame of the given iris dataset.
data = pd.DataFrame({
    'sepal length': iris.data[:, 0],
    'sepal width': iris.data[:, 1],
    'petal length': iris.data[:, 2],
    'petal width': iris.data[:, 3],
    'species': iris.target
})
data.head()
X = data[['sepal length', 'sepal width', 'petal length', 'petal width']]  # Features
y = data['species']  # Labels

# Split dataset into training set and test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)  # 70% training and 30% test

#Import the Random Forest model
from sklearn.ensemble import RandomForestClassifier

#Create a Random Forest classifier
clf = RandomForestClassifier(n_estimators=100)

#Train the model using the training sets
clf.fit(X_train, y_train)

#Make predictions on the test set
y_pred = clf.predict(X_test)

#Import the scikit-learn metrics module for accuracy calculation
from sklearn import metrics
# Model Accuracy: how often is the classifier correct?
print("Accuracy:", metrics.accuracy_score(y_test, y_pred))

#Predict the species of a new sample
species_idx = clf.predict([[3, 5, 4, 2]])[0]
iris.target_names[species_idx]

#Impurity-based feature importances, sorted in descending order
feature_imp = pd.Series(clf.feature_importances_, index=iris.feature_names).sort_values(ascending=False)
feature_imp

import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

# Creating a bar plot
sns.barplot(x=feature_imp, y=feature_imp.index)

# Add labels to your graph
plt.xlabel('Feature Importance Score')
plt.ylabel('Features')
plt.title("Visualizing Important Features")
plt.show()

Output:
Accuracy: 0.9555555555555556

Conclusion:(Sufficient space to be provided)

Question: (Sufficient space to be provided)

1. What do you mean by the random forest algorithm?
2. What do you mean by bagging?
3. What does "random" refer to in "Random Forest"?
4. List down the advantages and disadvantages of the Random Forest algorithm.

Experiment No: 12

K-Means Clustering
Date:

Competency and Practical Skills: Python, scikit-learn, machine learning

Relevant CO: Realize concepts of advanced machine learning algorithms.

Objective:
1. Implement K-Means Clustering using IRIS Dataset.
2. To find frequency distribution of species using pair plot.
3. Visualize correlation using a heat map.

Code:

#import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import MinMaxScaler

#Reading the dataset
iris = pd.read_csv("/content/drive/MyDrive/finalIris.csv")
x = iris.iloc[:, [1, 2, 3, 4]].values  # the four measurement columns (column 0 is Id)

#information of datatypes
iris.info()
iris[0:10]

#Frequency distribution of species
iris_outcome = pd.crosstab(index=iris["Species"], columns="count")
iris_outcome
iris_setosa = iris.loc[iris["Species"] == "Iris-setosa"]
iris_virginica = iris.loc[iris["Species"] == "Iris-virginica"]
iris_versicolor = iris.loc[iris["Species"] == "Iris-versicolor"]

#distribution plot using pair plot
sns.pairplot(iris, hue="Species", height=3)

#we can visualise this correlation using a heatmap.
fig = plt.figure(figsize=(15, 9))
sns.heatmap(iris.corr(), cmap='Blues', annot=True);
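The listing above imports KMeans and silhouette_score but stops at the exploratory plots. The following is a minimal sketch (not in the original listing) of actually fitting K-Means on the feature matrix x, assuming the code above has already run; the final choice of 3 clusters mirrors the three Iris species:

# Fit K-Means for several values of k and record the inertia
# (within-cluster sum of squares) and the silhouette score.
wcss = []
for k in range(2, 7):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(x)
    wcss.append(km.inertia_)
    print(k, silhouette_score(x, km.labels_))

# Final model with 3 clusters; plot the clusters over two of the features.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(x)
plt.scatter(x[:, 0], x[:, 1], c=labels)
plt.xlabel('SepalLengthCm')
plt.ylabel('SepalWidthCm')
plt.show()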

Output:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 6 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Id 150 non-null int64
1 SepalLengthCm 150 non-null float64
2 SepalWidthCm 150 non-null float64
3 PetalLengthCm 150 non-null float64
4 PetalWidthCm 150 non-null float64
5 Species 150 non-null object
dtypes: float64(4), int64(1), object(1)
memory usage: 7.2+ KB
   Id  SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm      Species
0   1            5.1           3.5            1.4           0.2  Iris-setosa
1   2            4.9           3.0            1.4           0.2  Iris-setosa
2   3            4.7           3.2            1.3           0.2  Iris-setosa
3   4            4.6           3.1            1.5           0.2  Iris-setosa
4   5            5.0           3.6            1.4           0.2  Iris-setosa
5   6            5.4           3.9            1.7           0.4  Iris-setosa
6   7            4.6           3.4            1.4           0.3  Iris-setosa
7   8            5.0           3.4            1.5           0.2  Iris-setosa
8   9            4.4           2.9            1.4           0.2  Iris-setosa
9  10            4.9           3.1            1.5           0.1  Iris-setosa

Conclusion:(Sufficient space to be provided)

Question: (Sufficient space to be provided)

1. Explain the K-means clustering algorithm.
2. Why do you prefer Euclidean distance over Manhattan distance in the K-means algorithm?
3. List out the advantages and disadvantages of K-means clustering.
4. What is the difference between K-means clustering and the KNN algorithm?

Experiment No: 13
Naive Bayes Algorithm
Date:

Competency and Practical Skills: Python, scikit-learn, machine learning

Relevant CO: Realize concepts of advanced machine learning algorithms.

Objective:
1. Implement Naive Bayes algorithm.
2. Find accuracy.
3. To visualize the confusion matrix using a heat map.
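Naive Bayes applies Bayes' theorem with the "naive" assumption that the features are conditionally independent given the class: P(y | x_1, …, x_n) ∝ P(y)·Πᵢ P(xᵢ | y). The Gaussian variant used below additionally models each P(xᵢ | y) as a normal distribution.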

Code:

#Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# load the iris dataset
from sklearn.datasets import load_iris
iris = load_iris()

# store the feature matrix (X) and response vector (y)
x = iris.data
y = iris.target

# splitting X and y into training and testing sets
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.4, random_state=1)

# training the model on the training set
from sklearn.naive_bayes import GaussianNB
classifier = GaussianNB()
classifier.fit(x_train, y_train)

# making predictions on the testing set
y_pred = classifier.predict(x_test)

# comparing actual response values (y_test) with predicted response values (y_pred)
from sklearn import metrics
print("Gaussian Naive Bayes model accuracy(in %):", metrics.accuracy_score(y_test, y_pred)*100)

# Making the confusion matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)

import seaborn as sns
sns.heatmap(cm, annot=True)
plt.show()

Output:

Gaussian Naive Bayes model accuracy(in %): 95.0

Conclusion:(Sufficient space to be provided)

Question: (Sufficient space to be provided)

1. What is Naive Bayes? Why is it called the "naive" Bayes algorithm?
2. List out the important characteristics of Naive Bayes.
3. What are the main types of Naive Bayes classifiers?
4. List out the advantages, disadvantages and limitations of the Naive Bayes algorithm.

Experiment No: 14

Activation Function

Date:

Competency and Practical Skills: Python, scikit-learn, machine learning

Relevant CO: Comprehend basic concepts of Neural network and its use in machine
learning.

Objectives:
1. To implement activation functions.
2. Visualize the working of activation functions.
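The three activation functions plotted below are the sigmoid σ(x) = 1/(1 + e^−x), the hyperbolic tangent tanh(x), and the rectified linear unit ReLU(x) = max(0, x).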
Code:

import matplotlib.pyplot as plt


import numpy as np
inp = np.linspace(-10, 10, 40)
out_sigm = 1/(1 + np.exp(-inp))
plt.subplot(1,3,1)
plt.plot(inp, out_sigm)
plt.title("Sigmoid")
plt.grid()
out_tanh = np.tanh(inp)
plt.subplot(1,3,2)
plt.plot(inp, out_tanh)
plt.title("TanH")
plt.grid()
out_relu = [max(0, i) for i in inp]
plt.subplot(1,3,3)
plt.plot(inp, out_relu)
plt.title("ReLU")
plt.grid()
plt.show()

Output:

Conclusion:(Sufficient space to be provided)

Questions:(Sufficient space to be provided)

1. What is the role of activation functions in neural networks?
2. List down the names of some popular activation functions in neural networks.
3. What is the difference between forward propagation and backward propagation in neural networks?
4. Why is ReLU the most commonly used activation function?

Experiment No: 15

Convolutional Neural Network (CNN)

Date:

Competency and Practical Skills: Python, PyTorch, machine learning

Relevant CO: Comprehend basic concepts of Neural network and its use in machine
learning.

Objectives:
1. To implement a convolutional neural network.
2. Find out the total number of parameters of the convolutional neural network.

Code:

import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 5, 3)  # 3x3 kernel, 1 input channel, 5 output channels
        self.conv2 = nn.Conv2d(5, 5, 3)  # 3x3 kernel, 5 input channels, 5 output channels
        self.fc1 = nn.Linear(5*3*3, 10)  # assumes a 7x7 input image, so the conv output is 5x3x3

    def forward(self, x):
        x = self.conv1(x)
        x = self.conv2(x)
        x = torch.flatten(x, 1)  # flatten the conv output before the fully connected layer
        x = self.fc1(x)
        return x

net = Net()
print("Model is ", net)
model_params = sum(p.numel() for p in net.parameters())
print("\n Model parameter is ", model_params)

Output:

Model is Net(
(conv1): Conv2d(1, 5, kernel_size=(3, 3), stride=(1, 1))
(conv2): Conv2d(5, 5, kernel_size=(3, 3), stride=(1, 1))
(fc1): Linear(in_features=45, out_features=10, bias=True)
)
Model parameter is 740
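As a check, the printed total can be computed by hand: conv1 has (3·3·1 + 1)·5 = 50 parameters (weights plus biases), conv2 has (3·3·5 + 1)·5 = 230, and fc1 has 45·10 + 10 = 460, giving 50 + 230 + 460 = 740 in total.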

Conclusion:(Sufficient space to be provided)

Question: (Sufficient space to be provided)

1. What do you mean by a convolutional neural network?
2. List out the different layers in a CNN.
3. Briefly explain the two major steps of a CNN (1. feature learning and 2. classification).
4. Explain the significance of "parameter sharing" and "sparsity of connections" in CNNs.
