
A Laboratory Manual for

Introduction to Machine Learning


(3171114)

B.E. Semester 7
(Electronics and Communication)

Vishwakarma Government Engineering College, Chandkheda

Directorate of Technical Education, Gandhinagar, Gujarat

Year: 2023-24
Vishwakarma Government Engineering College, Chandkheda

Certificate

This is to certify that Mr./Ms. ___________________________________


Enrollment No. _______________ of B.E. Semester _____ Electronics &
Communication Engineering of this Institute (GTU Code: _____ ) has
satisfactorily completed the Practical / Tutorial work for the subject of
Introduction to Machine Learning (3171114) for the academic year 2023-24.

Place: __________
Date: __________

Name and Sign of Faculty member                              Head of the Department



Preface

The main motto of any laboratory/practical/field work is to enhance required skills as well as to create the ability amongst students to solve real-time problems by developing relevant competencies in the psychomotor domain. Keeping this in view, GTU has designed a competency-focused, outcome-based curriculum for engineering degree programmes in which sufficient weightage is given to practical work. It underlines the importance of enhancing skills amongst students and pays attention to utilizing every second of the time allotted for practicals amongst students, instructors and faculty members to achieve the relevant outcomes by performing experiments rather than merely studying them. For the effective implementation of a competency-focused, outcome-based curriculum, every practical must be keenly designed to serve as a tool to develop and enhance, in every student, the relevant competencies required by industry. These psychomotor skills are very difficult to develop through the traditional chalk-and-board content delivery method in the classroom. Accordingly, this lab manual is designed to focus on the industry-defined relevant outcomes, rather than the old practice of conducting practicals merely to prove concepts and theory.

By using this lab manual, students can go through the relevant theory and procedure in advance of the actual performance, which creates interest, and students can form a basic idea prior to the performance. This in turn enhances the predetermined outcomes amongst students. Each experiment in this manual begins with the competency, course outcomes and practical outcomes (objectives). Students will also learn the safety measures and necessary precautions to be taken while performing the practical.

This manual also provides guidelines to faculty members to facilitate student-centric lab activities through each experiment by arranging and managing the necessary resources, so that the students follow the procedures with the required safety and necessary precautions to achieve the outcomes. It also gives an idea of how students will be assessed by providing rubrics.

This lab manual focuses on the development of computer programs for machine learning that can change when exposed to new data. In this manual we will see the basics of machine learning and the implementation of simple machine-learning algorithms using Python.
Machine learning is a method of teaching computers to learn from data without being explicitly programmed. Python is a popular programming language for machine learning because it has a large number of powerful libraries and frameworks that make it easy to implement machine-learning algorithms.
To get started with machine learning using Python, you will need a basic understanding of Python programming and some knowledge of mathematical concepts such as probability, statistics and linear algebra.

Utmost care has been taken while preparing this lab manual; however, there is always a chance for improvement. We therefore welcome constructive suggestions for improvement and the reporting of any errors.

Practical – Course Outcome matrix

Course Outcomes (COs):


CO1: Understand basic concepts of machine learning as well as challenges involved.
CO2: Learn and implement various basic machine learning algorithms.
CO3: Study the dimensionality reduction concept and its role in machine learning techniques.
CO4: Realize concepts of advanced machine learning algorithms.
CO5: Comprehend basic concepts of neural networks and their use in machine learning.

Sr. No. | Aim / Objective(s) of Experiment | Relevant CO(s)
1 | Introduction to libraries used for machine learning. | CO1
2 | Create a dataset of 12 samples each having 10 features; split it into training and testing datasets. | CO1, CO2
3 | Exploratory data analysis on the Iris dataset. | CO2
4 | Write a program to implement simple linear regression (A) from a manual dataset and (B) using the scikit-learn Python library. | CO2
5 | Write a program to implement multiple linear regression. | CO2
6 | Write a program to implement a decision tree on the Iris dataset. | CO2
7 | Write a program to implement logistic regression. | CO2
8 | Write a program to implement an SVM classifier using a linear kernel on the Iris dataset. | CO4
9 | Write a program to implement an SVM classifier using an RBF kernel on the Iris dataset. | CO4
10 | Write a program for dimensionality reduction using PCA. | CO3
11 | Write a program to implement Random Forest feature extraction on the Iris dataset. | CO3
12 | Write a program to implement K-Means clustering on the Iris dataset. | CO4
13 | Write a program to implement the Naive Bayes algorithm. | CO4
14 | Write a program to implement and visualize the working of activation functions. | CO5
15 | Implement a convolutional neural network and find out its total number of parameters. | CO5

Industry Relevant Skills

The following industry-relevant competencies are expected to be developed in the students by
undertaking the practical work of this laboratory.
1.
2.

Guidelines for Faculty members


1. The teacher should provide guidelines and a demonstration of the practical, with all its features, to the students.
2. The teacher shall explain the basic concepts/theory related to the experiment to the students before starting each practical.
3. Involve all the students in the performance of each experiment.
4. The teacher is expected to share the skills and competencies to be developed in the students and to ensure that these skills and competencies are developed after completion of the experimentation.
5. Teachers should give students the opportunity for hands-on experience after the demonstration.
6. The teacher may impart additional knowledge and skills, even though not covered in the manual, that are expected of the students by the concerned industry.
7. Give practical assignments and assess the performance of students based on the assigned tasks, checking whether the work is as per the instructions or not.
8. The teacher is expected to refer to the complete curriculum of the course and follow the guidelines for implementation.

Instructions for Students


1. Students are expected to listen carefully to all the theory classes delivered by the faculty members and understand the COs, the content of the course, the teaching and examination scheme, the skill set to be developed, etc.
2. Students shall organize the work in groups and keep a record of all observations.
3. Students shall attempt to develop the related hands-on skills and build confidence.
4. Students shall develop the habit of evolving more ideas, innovations, skills, etc., apart from those included in the scope of the manual.
5. Students shall refer to technical magazines and data books.
6. Students should develop the habit of submitting the experimentation work as per the schedule, and they should be well prepared for the same.

Common Safety Instructions


1. Students are expected to properly shut down the computer after performing the experiment.
2.

Index
(Progressive Assessment Sheet)

(Columns: Sr. No. | Objective(s) of Experiment | Page No. | Date of performance | Date of submission | Assessment Marks | Sign. of Teacher with date | Remarks)

1. Introduction to libraries used for machine learning.
2. Create a dataset of 12 samples each having 10 features; split it into training and testing datasets.
3. Exploratory data analysis on the Iris dataset.
4. Write a program to implement simple linear regression: (A) from a manual dataset, (B) using the scikit-learn Python library.
5. Write a program to implement multiple linear regression.
6. Write a program to implement a decision tree on the Iris dataset.
7. Write a program to implement logistic regression.
8. Write a program to implement an SVM classifier using a linear kernel on the Iris dataset.
9. Write a program to implement an SVM classifier using an RBF kernel on the Iris dataset.
10. Write a program for dimensionality reduction using PCA.
11. Write a program to implement Random Forest feature extraction on the Iris dataset.
12. Write a program to implement K-Means clustering on the Iris dataset.
13. Write a program to implement the Naive Bayes algorithm.
14. Write a program to implement and visualize the working of activation functions.
15. Implement a convolutional neural network and find out its total number of parameters.

Total

Experiment No: 1

Introduction to Libraries Used for Machine Learning

Date:

Competency and Practical Skills: Basic knowledge of Machine Learning

Relevant CO: Understand basic concepts of machine learning as well as challenges involved.

Objectives: Introduction to Libraries used for Machine Learning.

TensorFlow Library:

TensorFlow is an open-source end-to-end platform for creating machine learning applications. It was created and is maintained by Google. It is a symbolic math library that uses dataflow and differentiable programming to perform various tasks focused on training and inference of deep neural networks. It allows developers to create machine learning applications using various tools, libraries, and community resources.

TensorFlow is at present the most popular software library of its kind. There are several real-world applications of deep learning that make TensorFlow popular. Being an open-source library for deep learning and machine learning, TensorFlow finds a role to play in text-based applications, image recognition, voice search, and many more. DeepFace, Facebook's image recognition system, uses TensorFlow for image recognition. Every Google app that you use has made good use of TensorFlow to make your experience better.

TensorFlow's name is directly derived from its core component: the tensor. A tensor is a vector or matrix of n dimensions that represents data. Values in a tensor hold identical data types with a known shape, and this shape is the dimensionality of the matrix. A vector is a one-dimensional tensor; a matrix is a two-dimensional tensor.

A tensor is an object with three properties:


(1) a unique label (name),
(2) a dimension (shape), and
(3) a data type (dtype).

Here is a first TensorFlow example. It shows how you can define constants and
perform computations with those constants.

# Import `tensorflow`
import tensorflow as tf

# Initialize two constants


x1 = tf.constant([1,2,3,4])
x2 = tf.constant([5,6,7,8])

# Multiply
result = tf.multiply(x1, x2)

# Print the result


print(result)

Output: tf.Tensor([5 12 21 32], shape=(4,), dtype=int32)

Scikit-learn Library:

Scikit-learn is an open-source machine learning library that supports supervised and unsupervised learning. It also provides various tools for model fitting, data preprocessing, model selection, model evaluation, and many other utilities. Scikit-learn provides dozens of built-in machine learning algorithms and models, called estimators. Each estimator can be fitted to some data using its fit method.
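As a quick illustration of the estimator API, here is a minimal sketch (the toy data below is made up purely for demonstration and is not part of the prescribed experiments):

# Every scikit-learn estimator is trained with fit() and used with predict().
from sklearn.ensemble import RandomForestClassifier

clf = RandomForestClassifier(random_state=0)  # an example estimator
X = [[1, 2, 3], [11, 12, 13]]                 # two samples, three features each
y = [0, 1]                                    # class label of each sample
clf.fit(X, y)                                 # fit the estimator to the data
print(clf.predict([[4, 5, 6]]))               # predict the class of a new sample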

The library is built upon SciPy (Scientific Python), which must be installed before you can use scikit-learn. This stack includes:

 NumPy: Base n-dimensional array package.


 SciPy: Fundamental library for scientific computing.
 Matplotlib: Comprehensive 2D/3D plotting.
 IPython: Enhanced interactive console.
 Sympy: Symbolic mathematics.
 Pandas: Data structures and analysis.

Extensions or modules for SciPy are conventionally named SciKits. As the module provides learning algorithms, it is named scikit-learn.

Scikit-learn provides the following groups of models to the user.

 Clustering − This model is used for grouping unlabeled data.


 Cross Validation − It is used to check the accuracy of supervised models on unseen data (splitting the dataset into a test dataset and a training dataset).

 Dimensionality Reduction − It is used for reducing the number of attributes in


data which can be further used for summarisation, visualisation and feature
selection.
 Ensemble methods − As the name suggests, it is used for combining the predictions of multiple supervised models.
 Feature extraction − It is used to extract the features from data to define the
attributes in image and text data.
 Feature selection − It is used to identify useful attributes to create supervised
models.
 Open Source − It is an open-source library and is also commercially usable under the BSD (Berkeley Software Distribution) license.

PyTorch library:

PyTorch is a Python-based scientific computing package serving two broad purposes:

 A replacement for NumPy to use the power of GPUs and other accelerators.
 An automatic differentiation library that is useful to implement neural networks.

PyTorch is closely related to the Lua-based Torch framework, which is actively used in Facebook.

Features of PyTorch: The major features of PyTorch are mentioned below.

Easy interface: PyTorch offers an easy-to-use API; hence it is considered very simple to operate, and it runs on Python. Code execution in this framework is quite easy.
Python usage: This library is considered to be Pythonic and smoothly integrates with the Python data science stack. Thus, it can leverage all the services and functionalities offered by the Python environment.
Computational graphs: PyTorch provides an excellent platform which offers dynamic computational graphs; a user can change them during runtime. This is highly useful when a developer has no idea how much memory is required for creating a neural network model.
PyTorch tensors: Tensors are a specialized data structure very similar to arrays and matrices. Tensors are similar to NumPy's ndarrays, except that tensors can run on GPUs or other specialized hardware to accelerate computing.
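As a small illustration of the automatic differentiation mentioned above, here is a minimal sketch (not part of the prescribed experiments):

import torch

# Gradients are recorded dynamically as operations run on the tensor.
x = torch.tensor(2.0, requires_grad=True)
y = x**2 + 3*x   # the computational graph for y is built on the fly
y.backward()     # compute dy/dx by backpropagation
print(x.grad)    # prints tensor(7.) since dy/dx = 2x + 3 = 7 at x = 2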

Interoperability with NumPy:

NumPy is a popular open-source library used for mathematical and scientific computing in Python. Instead of reinventing the wheel, PyTorch interoperates really well with NumPy to leverage its existing ecosystem of tools and libraries.

Here's how we create an array using NumPy:

import numpy as np
x = np.array([[1, 2], [3, 4.]])

>>> x
array([[1., 2.],
       [3., 4.]])

We can convert a NumPy array into a tensor using torch.from_numpy:

# Convert the numpy array to a torch tensor.
import torch
y = torch.from_numpy(x)

>>> y
tensor([[1., 2.],
        [3., 4.]], dtype=torch.float64)

Conclusion: (Sufficient space to be provided)

Question: (Sufficient space to be provided)

1. What is TensorFlow? List the properties of a tensor.
2. What are the functions of the scikit-learn library?
3. Define: (1) NumPy (2) SciPy (3) Matplotlib (4) IPython (5) SymPy (6) Pandas.
4. What is PyTorch?
5. What are the functions of PyTorch?

Rubrics:

Terminology:
- Poor (0-2 marks): Student cannot understand the library tools and is not able to implement basic functions.
- Average (3-5 marks): Student can understand the library tools but is not able to implement basic functions.
- Good (5-7 marks): Student can understand the library tools and is able to implement some basic functions.
- Excellent (8-10 marks): Student can understand all the libraries and tools and implement all the basic functions.

Experiment No: 2

Create train and test dataset

Date:

Competency and Practical Skills: Python, Scikit-learn

Relevant CO: Understand basic concepts of machine learning as well as challenges involved; learn and implement various basic machine learning algorithms.

Objectives:

1) To learn train and test data generation


2) To create dataset of 12 samples each having 10 features.
3) Split it into training and testing dataset using Scikit-learn.

Assumption: Assume the output labels of the 12 samples are [0,1,1,0,1,0,0,1,1,0,1,0].

Code:

import numpy as np
from sklearn.model_selection import train_test_split

# 12 samples with 10 features


inp=np.arange(1,121).reshape(12,10)

# output classification
out=np.array([0,1,1,0,1,0,0,1,1,0,1,0])

#sklearn.model_selection.train_test_split(*arrays, **options) -> list
#test_size=3 holds out 3 of the 12 samples for testing (an absolute
#count); random_state=4 makes the shuffled split reproducible.
x_train,x_test,y_train,y_test=train_test_split(inp,out,test_size=3,random_state=4)

print("\n Train Samples are:\n ",x_train)


print("\nOutput classification of train sample is: \n",y_train)
print("\n Test Samples are: \n",x_test)
print("\n Output classification of Test sample is: \n",y_test)

Output:

Train Samples are:


[[ 91 92 93 94 95 96 97 98 99 100]
[ 81 82 83 84 85 86 87 88 89 90]
[ 21 22 23 24 25 26 27 28 29 30]
[111 112 113 114 115 116 117 118 119 120]
[ 1 2 3 4 5 6 7 8 9 10]
[ 11 12 13 14 15 16 17 18 19 20]
[ 51 52 53 54 55 56 57 58 59 60]
[ 71 72 73 74 75 76 77 78 79 80]
[101 102 103 104 105 106 107 108 109 110]]

Output classification of train sample is:


[0 1 1 0 0 1 0 1 1]

Test Samples are:


[[31 32 33 34 35 36 37 38 39 40]
[41 42 43 44 45 46 47 48 49 50]
[61 62 63 64 65 66 67 68 69 70]]

Output classification of Test sample is:


[0 1 0]

Conclusion: (Sufficient space to be provided)

Question :(Sufficient space to be provided)


1. Define feature, training data and testing data.
2. What is the difference between training data and validation data?
3. How do you handle missing or corrupted data in dataset?
4. Write a function to split training and testing dataset using scikit learn.

Rubrics:

Rubrics 1 2 3 4 5

Experiment No: 3

Analysis of Iris Dataset


Date:

Competency and Practical Skills: Python, Scikit-learn

Relevant CO: Learn and implement various basic machine learning algorithms.

Objectives: Exploratory Data Analysis on Iris Dataset.

Exploratory Data Analysis (EDA) is a technique to analyze data using some visual techniques. With this technique, we can get detailed information about the statistical summary of the data. We will also be able to deal with duplicate values and outliers, and see some trends or patterns present in the dataset.

Iris Dataset
Iris Dataset is considered as the Hello World for data science. It contains five columns
namely – Petal Length, Petal Width, Sepal Length, Sepal Width, and Species Type.

Note: This dataset can be downloaded from:


URL: https://datahub.io/machine-learning/iris

You can download the Iris.csv file from the above link. Now we will use the Pandas library to load this CSV file and convert it into a dataframe. The read_csv() method is used to read CSV files.
Example:

import pandas as pd

# Reading the CSV file


df = pd.read_csv("Iris.csv")

# Printing top 5 rows


df.head()

Getting Information about the Dataset


We will use the shape attribute to get the shape of the dataset. We can see that the data frame contains 6 columns and 150 rows. Now, let's also look at the columns and their data types. For this, we will use the info() method.

Example:

df.shape
(150, 6)

df.info()

The describe() function applies basic statistical computations on the dataset, such as extreme values, count of data points, standard deviation, etc. Any missing or NaN value is automatically skipped. The describe() function gives a good picture of the distribution of the data.
Example:

df.describe()

Checking Missing Values


We will check if our data contains any missing values or not. Missing values can
occur when no information is provided for one or more items or for a whole unit. We
will use the isnull() method.
Example:

df.isnull().sum()

Checking Duplicates
Let’s see if our dataset contains any duplicates or not.
Pandas drop_duplicates() method helps in removing duplicates from the data frame.

Example:

data = df.drop_duplicates(subset="Species")
data

We can see that there are only three unique species. Let's see if the dataset is balanced or not, i.e. whether all the species contain an equal number of rows. We will use the Series.value_counts() function. This function returns a Series containing counts of unique values.

Example:

df.value_counts("Species")

Data Visualization

Visualizing the target column

We will use Matplotlib and Seaborn library for the data visualization.

Example:

# importing packages
import seaborn as sns
import matplotlib.pyplot as plt

sns.countplot(x='Species', data=df, )
plt.show()

Relation between variables

We will see the relationship between the sepal length and sepal width and also
between petal length and petal width.

Example 1: Comparing Sepal Length and Sepal Width

# importing packages
import seaborn as sns
import matplotlib.pyplot as plt

sns.scatterplot(x='SepalLengthCm', y='SepalWidthCm', hue='Species', data=df)

# Placing Legend outside the Figure
plt.legend(bbox_to_anchor=(1, 1), loc=2)

plt.show()

From the above plot, we can infer that –


 Species Setosa has smaller sepal lengths but larger sepal widths.
 Versicolor Species lies in the middle of the other two species in terms of sepal
length and width
 Species Virginica has larger sepal lengths but smaller sepal widths.

Example 2: Comparing Petal Length and Petal Width

# importing packages
import seaborn as sns
import matplotlib.pyplot as plt

sns.scatterplot(x='PetalLengthCm', y='PetalWidthCm', hue='Species', data=df)

# Placing Legend outside the Figure
plt.legend(bbox_to_anchor=(1, 1), loc=2)
plt.show()

From the above plot, we can infer that –


 Species Setosa has smaller petal lengths and widths.
 Versicolor Species lies in the middle of the other two species in terms of petal
length and width
 Species Virginica has the largest of petal lengths and widths.

Let’s plot all the column’s relationships using a pairplot. It can be used for
multivariate analysis.
Example:

# importing packages
import seaborn as sns
import matplotlib.pyplot as plt

sns.pairplot(df.drop(['Id'], axis=1), hue='Species', height=2)

Histograms

Histograms allow seeing the distribution of data for various columns. It can be used
for uni as well as bi-variate analysis.

Example:

# importing packages
import seaborn as sns
import matplotlib.pyplot as plt

fig, axes = plt.subplots(2, 2, figsize=(10, 10))

axes[0,0].set_title("Sepal Length")
axes[0,0].hist(df['SepalLengthCm'], bins=7)
axes[0,1].set_title("Sepal Width")
axes[0,1].hist(df['SepalWidthCm'], bins=5);
axes[1,0].set_title("Petal Length")
axes[1,0].hist(df['PetalLengthCm'], bins=6);
axes[1,1].set_title("Petal Width")
axes[1,1].hist(df['PetalWidthCm'], bins=6);

From the above plot, we can see that –


 The highest frequency of the sepal length is between 30 and 35 which is
between 5.5 and 6
 The highest frequency of the sepal Width is around 70 which is between 3.0
and 3.5
 The highest frequency of the petal length is around 50 which is between 1 and
2
 The highest frequency of the petal width is between 40 and 50 which is
between 0.0 and 0.5

Histograms with Distplot Plot

Distplot is used basically for the univariant set of observations and visualizes it
through a histogram i.e. only one observation and hence we choose one particular
column of the dataset.

Example:

# importing packages
import seaborn as sns
import matplotlib.pyplot as plt

plot = sns.FacetGrid(df, hue="Species")
plot.map(sns.distplot, "SepalLengthCm").add_legend()

plot = sns.FacetGrid(df, hue="Species")
plot.map(sns.distplot, "SepalWidthCm").add_legend()

plot = sns.FacetGrid(df, hue="Species")
plot.map(sns.distplot, "PetalLengthCm").add_legend()

plot = sns.FacetGrid(df, hue="Species")
plot.map(sns.distplot, "PetalWidthCm").add_legend()

plt.show()

From the above plots, we can see that –


 In the case of Sepal Length, there is a huge amount of overlapping.
 In the case of Sepal Width also, there is a huge amount of overlapping.
 In the case of Petal Length, there is a very little amount of overlapping.
 In the case of Petal Width also, there is a very little amount of overlapping.
So we can use Petal Length and Petal Width as the classification feature.

Handling Correlation

Pandas dataframe.corr() is used to find the pairwise correlation of all columns in the
dataframe. Any NA values are automatically excluded. For any non-numeric data type
columns in the dataframe it is ignored.
Example:

data.corr(method='pearson')

Heatmaps

The heatmap is a data visualization technique that is used to analyze the dataset as
colors in two dimensions. Basically, it shows a correlation between all numerical
variables in the dataset. In simpler terms, we can plot the above-found correlation
using the heatmaps.
Example:

# importing packages
import seaborn as sns
import matplotlib.pyplot as plt

sns.heatmap(df.corr(method='pearson').drop(['Id'], axis=1).drop(['Id'], axis=0), annot=True);

plt.show()

From the above graph, we can see that –


 Petal width and petal length have high correlations.
 Petal length and sepal width have good correlations.
 Petal Width and Sepal length have good correlations.

Box Plots

We can use boxplots to see how the categorical value os distributed with other
numerical values.
Example:

# importing packages
import seaborn as sns
import matplotlib.pyplot as plt

def graph(y):
    sns.boxplot(x="Species", y=y, data=df)

plt.figure(figsize=(10, 10))

# Adding the subplot at the specified grid position
plt.subplot(221)
graph('SepalLengthCm')

plt.subplot(222)
graph('SepalWidthCm')

plt.subplot(223)
graph('PetalLengthCm')

plt.subplot(224)
graph('PetalWidthCm')

plt.show()

From the above graph, we can see that –


 Species Setosa has the smallest features and less distributed with some outliers.
 Species Versicolor has the average features.
 Species Virginica has the highest features

Handling Outliers
An outlier is a data item/object that deviates significantly from the rest of the (so-called normal) objects. Outliers can be caused by measurement or execution errors. The analysis for outlier detection is referred to as outlier mining.

There are many ways to detect outliers, and the removal process is the same as removing a data item from the pandas dataframe.

Let’s consider the iris dataset and let’s plot the boxplot for the SepalWidthCm column.

Example:

# importing packages
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Load the dataset
df = pd.read_csv('Iris.csv')

sns.boxplot(x='SepalWidthCm', data=df)

In the above graph, the values above 4 and below 2 are acting as outliers.

Removing Outliers

To remove an outlier, one must follow the same process as removing any entry from the dataset, using its exact position, because all of the detection methods above ultimately produce the list of data items that satisfy the outlier definition according to the method used.

Example:

We will detect the outliers using IQR and then we will remove them. We will also
draw the boxplot to see if the outliers are removed or not.

# Importing libraries
import numpy as np
import pandas as pd
import seaborn as sns

# Load the dataset
df = pd.read_csv('Iris.csv')

# IQR
Q1 = np.percentile(df['SepalWidthCm'], 25, interpolation='midpoint')
Q3 = np.percentile(df['SepalWidthCm'], 75, interpolation='midpoint')
IQR = Q3 - Q1
print("Old Shape: ", df.shape)

# Upper bound
upper = np.where(df['SepalWidthCm'] >= (Q3 + 1.5*IQR))
# Lower bound
lower = np.where(df['SepalWidthCm'] <= (Q1 - 1.5*IQR))

# Removing the outliers
df.drop(upper[0], inplace=True)
df.drop(lower[0], inplace=True)
print("New Shape: ", df.shape)

sns.boxplot(x='SepalWidthCm', data=df)

Conclusion:(Sufficient space to be provided)

Question : (Sufficient space to be provided)

1. What is EDA?
2. Write code for comparing sepal length and sepal width.
3. Explain histograms.
4. Define the following: (i) heatmaps (ii) box plots (iii) describe() (iv) checking duplicates.
5. Explain the libraries used for data visualization.

Experiment No: 4 (A)

Linear Regression Algorithm


Date:

Competency and Practical Skills: Python, scikit-learn, machine learning

Relevant CO: Learn and implement various basic machine learning algorithms.

Objectives:
1. Implement simple linear regression algorithm
2. Plot Regression line.
3. Find the coefficients of the linear regression.
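For reference, the coefficients computed by estimate_coef() in the code below are the ordinary least-squares estimates

b_1 = SS_xy / SS_xx = (Σ xᵢyᵢ − n·x̄·ȳ) / (Σ xᵢ² − n·x̄²),   b_0 = ȳ − b_1·x̄,

where n is the number of observations and x̄, ȳ are the sample means.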

Code:

import numpy as np
import matplotlib.pyplot as plt

def estimate_coef(x, y):
    # number of observations/points
    n = np.size(x)

    # mean of x and y vector
    m_x = np.mean(x)
    m_y = np.mean(y)

    # calculating cross-deviation and deviation about x
    SS_xy = np.sum(y*x) - n*m_y*m_x
    SS_xx = np.sum(x*x) - n*m_x*m_x

    # calculating regression coefficients
    b_1 = SS_xy / SS_xx
    b_0 = m_y - b_1*m_x

    return (b_0, b_1)

def plot_regression_line(x, y, b):
    # plotting the actual points as scatter plot
    plt.scatter(x, y, color="m", marker="o", s=30)

    # predicted response vector
    y_pred = b[0] + b[1]*x

    # plotting the regression line
    plt.plot(x, y_pred, color="g")

    # putting labels
    plt.xlabel('x')
    plt.ylabel('y')

    # function to show plot
    plt.show()

def main():
    # observations / data
    x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
    y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])

    # estimating coefficients
    b = estimate_coef(x, y)
    print("Estimated coefficients:\nb_0 = {} \
          \nb_1 = {}".format(b[0], b[1]))

    # plotting regression line
    plot_regression_line(x, y, b)

if __name__ == "__main__":
    main()

Output:
Estimated coefficients:
b_0 = 1.2363636363636363
b_1 = 1.1696969696969697

Conclusion:(Sufficient space to be provided)



Experiment No: 4 (B)


Linear Regression Algorithm
Date:

Competency and Practical Skills: Python, scikit-learn, machine learning

Relevant CO: Learn and implement various basic machine learning algorithms.

Objective:
1. Implement simple linear regression algorithm.
2. Display scatter plot of linear regression.
3. Find coefficient, mean squared error and variance score of linear regression

Code:

#First, we will start with importing necessary packages as follows −
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets, linear_model
from sklearn.metrics import mean_squared_error, r2_score

#Next, we will load the diabetes dataset and create its object −
diabetes = datasets.load_diabetes()

#As we are implementing SLR, we will be using only one feature as follows −
X = diabetes.data[:, np.newaxis, 2]

#Next, we need to split the data into training and testing sets as follows −
X_train = X[:-30]
X_test = X[-30:]

#Next, we need to split the target into training and testing sets as follows −
y_train = diabetes.target[:-30]
y_test = diabetes.target[-30:]

#Now, to train the model we need to create a linear regression object as follows −
regr = linear_model.LinearRegression()

#Next, train the model using the training sets as follows −
regr.fit(X_train, y_train)

#Next, make predictions using the testing set as follows −
y_pred = regr.predict(X_test)

#Next, we will print some metrics like the MSE, variance score, etc. as follows −
print('Coefficients: \n', regr.coef_)
print("Mean squared error: %.2f" % mean_squared_error(y_test, y_pred))
print('Variance score: %.2f' % r2_score(y_test, y_pred))

#Now, plot the outputs as follows −
plt.scatter(X_test, y_test, color='blue')
plt.plot(X_test, y_pred, color='red', linewidth=3)
plt.xticks(())
plt.yticks(())
plt.show()

Output:
Coefficients:
[941.43097333]
Mean squared error: 3035.06
Variance score: 0.41

Conclusion:(Sufficient space to be provided)

Question: (Sufficient space to be provided)

1. What is linear regression?
2. What are the common types of error in linear regression?
3. Define the following terms: (1) mean squared error (2) variance.

Experiment No: 5

Multiple Linear Regression Algorithm


Date:

Competency and Practical Skills: Python, scikit-learn, machine learning

Relevant CO: Learn and implement various basic machine learning algorithms.

Objective:
1. Implement multiple linear regression algorithm.
2. Display scatter plot of multiple linear regression.
3. Find the difference between actual and predicted values.
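Multiple linear regression models the target as a linear combination of several features, ŷ = b_0 + b_1·x_1 + b_2·x_2 + … + b_k·x_k. In the dataset used below, the four columns other than PE serve as the features and PE is the target.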

Code:

# import libraries
import pandas as pd
import numpy as np

# Import dataset
data_df = pd.read_csv('/content/drive/MyDrive/test.csv')
# (if you are using Python rather than Google Colab, you can
# upload the CSV file from your computer)
data_df.head()

# define x and y
x = data_df.drop(['PE'], axis=1).values
y = data_df['PE'].values
print(x)
print(y)

# split dataset into training set and test set
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=0)

# train the model on the training set
from sklearn.linear_model import LinearRegression
ml = LinearRegression()
ml.fit(x_train, y_train)

# predict the test set results
y_pred = ml.predict(x_test)
print(y_pred)
ml.predict([[14.96, 41.76, 1024.07, 73.17]])

#evaluate the model
from sklearn.metrics import r2_score
r2_score(y_test, y_pred)

# plot the result
import matplotlib.pyplot as plt
plt.figure(figsize=(15, 10))
plt.scatter(y_test, y_pred)
plt.xlabel('Actual')
plt.ylabel('Predicted')

# predicted values
pred_y_df = pd.DataFrame({'Actual value': y_test, 'predicted value': y_pred, 'Difference': y_test - y_pred})

pred_y_df[0:20]

Output:

[[14.96 41.76 1024.07 73.17]


[ 25.18 62.96 1020.04 59.08]
[ 5.11 39.4 1012.16 92.14]
...
[ 31.32 74.33 1012.92 36.48]
[ 24.48 69.45 1013.86 62.39]
[ 21.6 62.52 1017.23 67.87]]
[463.26 444.37 488.56 ... 429.57 435.74 453.28]
[431.40245096 458.61474119 462.81967423 ... 432.47380825 436.16417243
439.00714594]
Actual value predicted value Difference

0 431.23 431.402451 -0.172451

1 460.01 458.614741 1.395259

2 461.14 462.819674 -1.679674

3 445.90 448.601237 -2.701237

4 451.29 457.879479 -6.589479

5 432.68 429.676856 3.003144

6 477.50 473.017115 4.482885

7 459.68 456.532373 3.147627

8 477.50 474.342524 3.157476

9 444.99 446.364396 -1.374396



10 444.37 441.946411 2.423589

11 437.04 441.452599 -4.412599

12 442.34 444.746375 -2.406375

13 440.74 440.874598 -0.134598

14 436.55 438.374490 -1.824490

15 460.24 454.370315 5.869685

16 448.66 444.904201 3.755799

17 432.94 437.370808 -4.430808

18 452.82 451.306760 1.513240

19 432.20 427.453009 4.746991

Conclusion:(Sufficient space to be provided)

Question: (Sufficient space to be provided)

1. Explain simple and multiple linear regression.
2. What is the difference between linear and non-linear regression?
3. Define actual and predicted values.

Experiment No: 6
Decision Tree Algorithm
Date:

Competency and Practical Skills: Python, scikit-learn, machine learning

Relevant CO: Learn and implement various basic machine learning algorithms.

Objective:
1. Implement Decision tree algorithm on IRIS datasets.
2. Plot tree.
3. Find classification report, confusion matrix and accuracy.
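As a reference for the third objective, accuracy can be read directly off the confusion matrix C: it is the sum of the diagonal entries (correct predictions) divided by the total number of test samples, i.e. accuracy = trace(C) / ΣᵢΣⱼ Cᵢⱼ.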

Code:

#Import library
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import tree

#load file
#col_names=['Id','SepalLengthCm','SepalWidthCm','PetalLengthCm','PetalWidthCm','Species']
iris = pd.read_csv('/content/drive/MyDrive/finalIris.csv')
iris.head()

# split dataset into features and target variable
feature_cols = ['SepalLengthCm', 'SepalWidthCm', 'PetalLengthCm', 'PetalWidthCm']
x = iris[feature_cols]
y = iris.Species
x.head()

# Next, we will divide the data into train and test split.
# The following code will split the dataset into 70% training data and 30% testing data −
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=0)

# Next, train the model with the help of the DecisionTreeClassifier class of sklearn as follows −
clf = DecisionTreeClassifier()
clf = clf.fit(x_train, y_train)

# At last we need to make predictions.
y_pred = clf.predict(x_test)

# find accuracy, confusion matrix, classification report
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score

result = confusion_matrix(y_test, y_pred)
print("Confusion Matrix:")
print(result)

result1 = classification_report(y_test, y_pred)
print("Classification Report:")
print(result1)

result2 = accuracy_score(y_test, y_pred)
print("Accuracy:")
print(result2)

# plot decision tree
fn = ['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']
cn = ['setosa', 'versicolor', 'virginica']
fig, axes = plt.subplots(nrows=1, ncols=1, figsize=(4, 4), dpi=300)
tree.plot_tree(clf, feature_names=fn, class_names=cn, filled=True);
fig.savefig('imagename.png')

Output:

Conclusion:(Sufficient space to be provided)

Question:(Sufficient space to be provided)

1. Explain the decision tree algorithm.
2. Explain classification in machine learning.
3. Define the confusion matrix and its importance.
4. How can we plot a decision tree using Python?

Experiment No: 7
Logistic Regression Algorithm
Date:

Competency and Practical Skills: Python, scikit-learn, machine learning

Relevant CO: Learn and implement various basic machine learning algorithms.

Objective:
1. Implement Logistic regression algorithm.
2. Find classification report, confusion matrix and accuracy.
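Logistic regression models the probability of the positive class through the sigmoid of a linear function, p(y = 1 | x) = 1 / (1 + e^−(b_0 + b_1·x)); the intercept and coefficient printed in the output below are the fitted values of b_0 and b_1.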

Code:

# import libraries
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.metrics import classification_report

#get data
x = np.arange(10).reshape(-1, 1)
y = np.array([0, 1, 0, 0, 1, 1, 1, 1, 1, 1])

#create a model and train it
model = LogisticRegression(solver='liblinear', C=10.0, random_state=0)
model.fit(x, y)

#evaluate model
p_pred = model.predict_proba(x)
y_pred = model.predict(x)
score = model.score(x, y)
conf_m = confusion_matrix(y, y_pred)
report = classification_report(y, y_pred)

print('x:', x, sep='\n')
print('y:', y, sep='\n', end='\n\n')
print('intercept:', model.intercept_)
print('coef:', model.coef_, end='\n\n')
print('y_pred:', y_pred, end='\n\n')
print('score:', score, end='\n\n')
print('conf_m:', conf_m, sep='\n', end='\n\n')
print('report:', report, sep='\n')

Output:

x:
[[0]
[1]
[2]
[3]
[4]
[5]
[6]
[7]
[8]
[9]]
y:
[0 1 0 0 1 1 1 1 1 1]

intercept: [-1.51632619]
coef: [[0.703457]]

y_pred: [0 0 0 1 1 1 1 1 1 1]

score: 0.8

conf_m:
[[2 1]
[1 6]]

report:
precision recall f1-score support

0 0.67 0.67 0.67 3


1 0.86 0.86 0.86 7

accuracy 0.80 10
macro avg 0.76 0.76 0.76 10
weighted avg 0.80 0.80 0.80 10

Conclusion:(Sufficient space to be provided)

Question: (Sufficient space to be provided)

1. What is logistic regression?
2. Compare linear and logistic regression.
3. What are the different types of logistic regression?
4. What are the advantages of logistic regression?

Experiment No: 8
SVM Classifier using Linear Kernel
Date:

Competency and Practical Skills: Python, scikit-learn, machine learning

Relevant CO: Realize concepts of advanced machine learning algorithms.

Objective:
1. Implement SVM classifier using linear kernel
2. Plot scatter plot for classification of IRIS data
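The linear kernel computes the similarity between samples as a plain dot product, K(x, x′) = x·x′, so the resulting decision boundaries between the classes are straight lines in the two-feature space used here.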

Code:

# importing files
import pandas as pd
import numpy as np
from sklearn import svm, datasets
import matplotlib.pyplot as plt

#load input data
iris = datasets.load_iris()

#taking two features
x = iris.data[:, :2]
y = iris.target

#svm boundaries
x_min, x_max = x[:, 0].min() - 1, x[:, 0].max() + 1
y_min, y_max = x[:, 1].min() - 1, x[:, 1].max() + 1
h = (x_max - x_min) / 100   # mesh step size

xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
x_plot = np.c_[xx.ravel(), yy.ravel()]

#regularization parameter
C = 1.0

#SVM classifier object
svc_classifier = svm.SVC(kernel='linear', C=C).fit(x, y)
z = svc_classifier.predict(x_plot)
z = z.reshape(xx.shape)
plt.figure(figsize=(15, 5))
plt.subplot(121)
plt.contour(xx, yy, z, cmap=plt.cm.tab10, alpha=0.3)
plt.scatter(x[:, 0], x[:, 1], c=y, cmap=plt.cm.Set1)
plt.xlabel('sepal length')
plt.ylabel('sepal width')
plt.xlim(xx.min(), xx.max())
plt.title('support vector classifier with linear kernel')

Output:

Text(0.5, 1.0, 'support vector classifier with linear kernel')

Conclusion:(Sufficient space to be provided)

Question: (Sufficient space to be provided)

1. Explain the support vector machine algorithm.
2. Explain the different types of kernel functions.
3. What do you know about hard-margin SVM and soft-margin SVM?
4. What is hinge loss?

Experiment No: 9

SVM Classifier using RBF Kernel


Date:

Competency and Practical Skills: Python, scikit-learn, machine learning

Relevant CO: Realize concepts of advanced machine learning algorithms.

Objective:
1. Implement SVM classifier using rbf kernel
2. Plot scatter plot for classification of IRIS data
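The RBF (radial basis function) kernel computes similarity as K(x, x′) = exp(−γ‖x − x′‖²), which lets the classifier form curved, non-linear decision boundaries; gamma='auto' in the code below sets γ = 1/n_features.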

Code:

# importing files
import pandas as pd
import numpy as np
from sklearn import svm, datasets
import matplotlib.pyplot as plt

#load input data
iris = datasets.load_iris()

#taking two features
x = iris.data[:, :2]
y = iris.target

#svm boundaries
x_min, x_max = x[:, 0].min() - 1, x[:, 0].max() + 1
y_min, y_max = x[:, 1].min() - 1, x[:, 1].max() + 1
h = (x_max - x_min) / 100   # mesh step size

xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
x_plot = np.c_[xx.ravel(), yy.ravel()]

#regularization parameter
C = 1.0

#SVM classifier object
svc_classifier = svm.SVC(kernel='rbf', gamma='auto', C=C).fit(x, y)
z = svc_classifier.predict(x_plot)
z = z.reshape(xx.shape)
plt.figure(figsize=(15, 5))
plt.subplot(121)
plt.contour(xx, yy, z, cmap=plt.cm.tab10, alpha=0.3)
plt.scatter(x[:, 0], x[:, 1], c=y, cmap=plt.cm.Set1)
plt.xlabel('sepal length')
plt.ylabel('sepal width')
plt.xlim(xx.min(), xx.max())
plt.title('support vector classifier with rbf kernel')

Output:

Text(0.5, 1.0, 'support vector classifier with rbf kernel')

Conclusion:(Sufficient space to be provided)

Question: (Sufficient space to be provided)

1. What are some applications of SVM?
2. List out some advantages of SVM.
3. What is a hyperplane in SVM?

Experiment No: 10
Principal Component Analysis Algorithm
Date:

Competency and Practical Skills: Python, scikit-learn, machine learning

Relevant CO: Study dimensionality reduction concept and its role in machine
learning techniques.

Objective:
1. Implement dimensionality reduction using PCA.
2. Learn the StandardScaler and plot the scaled output.
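As background, PCA projects the standardized data onto the directions of maximum variance: the principal components are the eigenvectors of the covariance matrix of the data, ordered by decreasing eigenvalue, and each sample x is transformed as z = Wᵀx, where the columns of W are the top-k eigenvectors (here k = 3).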
Code:

# Import necessary libraries
from sklearn import datasets                      # to retrieve the iris dataset
import pandas as pd                               # to load the dataframe
from sklearn.preprocessing import StandardScaler  # to standardize the features
from sklearn.decomposition import PCA             # to apply PCA
import seaborn as sns                             # to plot the heat maps

#Load the Dataset
iris = datasets.load_iris()
#convert the dataset into a pandas data frame
df = pd.DataFrame(iris['data'], columns=iris['feature_names'])
#display the head (first 5 rows) of the dataset
df.head()

#Standardize the features
#Create an object of StandardScaler, which is present in sklearn.preprocessing
scalar = StandardScaler()
scaled_data = pd.DataFrame(scalar.fit_transform(df))  # scaling the data
scaled_data

#Check the correlation between features without PCA
sns.heatmap(scaled_data.corr())

#Applying PCA
#Taking the number of principal components as 3
pca = PCA(n_components=3)
pca.fit(scaled_data)
data_pca = pca.transform(scaled_data)
data_pca = pd.DataFrame(data_pca, columns=['PC1', 'PC2', 'PC3'])
data_pca.head()

#Checking the correlation between features after PCA
sns.heatmap(data_pca.corr())

Output:

sepal length (cm) sepal width (cm) petal length (cm) petal width (cm)

0 5.1 3.5 1.4 0.2

1 4.9 3.0 1.4 0.2

2 4.7 3.2 1.3 0.2

3 4.6 3.1 1.5 0.2

4 5.0 3.6 1.4 0.2

0 1 2 3

0 -0.900681 1.019004 -1.340227 -1.315444

1 -1.143017 -0.131979 -1.340227 -1.315444

2 -1.385353 0.328414 -1.397064 -1.315444

3 -1.506521 0.098217 -1.283389 -1.315444

4 -1.021849 1.249201 -1.340227 -1.315444

... ... ... ... ...

145 1.038005 -0.131979 0.819596 1.448832

146 0.553333 -1.282963 0.705921 0.922303

147 0.795669 -0.131979 0.819596 1.053935

148 0.432165 0.788808 0.933271 1.448832

149 0.068662 -0.131979 0.762758 0.790671

150 rows × 4 columns



        PC1       PC2       PC3
0 -2.264703  0.480027 -0.127706
1 -2.080961 -0.674134 -0.234609
2 -2.364229 -0.341908  0.044201
3 -2.299384 -0.597395  0.091290
4 -2.389842  0.646835  0.015738

Conclusion:(Sufficient space to be provided)

Question :(Sufficient space to be provided)

1. What is dimensionality reduction?
2. What is PCA? What does PCA do?
3. What are the advantages of dimensionality reduction?
4. List down the steps of a PCA algorithm.

Experiment No: 11

Random Forest Algorithm

Competency and Practical Skills: Python, scikit-learn, machine learning

Relevant CO: Study dimensionality reduction concept and its role in machine
learning techniques.

Objective:
1. Implement random forest algorithm using python.
2. Visualize important feature on bar plot.
3. Find the accuracy of the model.
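The feature_importances_ attribute used below gives scikit-learn's impurity-based importances: the total reduction in node impurity contributed by each feature, averaged over all trees in the forest and normalized to sum to 1.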

Code:

#Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.model_selection import train_test_split

# load the iris dataset
from sklearn.datasets import load_iris
iris = load_iris()

# store the feature matrix (X) and response vector (y)
x = iris.data
y = iris.target

# Creating a DataFrame of the given iris dataset.
data = pd.DataFrame({
    'sepal length': iris.data[:, 0],
    'sepal width': iris.data[:, 1],
    'petal length': iris.data[:, 2],
    'petal width': iris.data[:, 3],
    'species': iris.target
})
data.head()
X = data[['sepal length', 'sepal width', 'petal length', 'petal width']]  # Features
y = data['species']  # Labels

# Split dataset into training set and test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)  # 70% training and 30% test

#Import the Random Forest model
from sklearn.ensemble import RandomForestClassifier

#Create a Random Forest classifier
clf = RandomForestClassifier(n_estimators=100)

#Train the model using the training sets
clf.fit(X_train, y_train)

#Make predictions on the test set
y_pred = clf.predict(X_test)

#Import the scikit-learn metrics module for accuracy calculation
from sklearn import metrics
# Model Accuracy: how often is the classifier correct?
print("Accuracy:", metrics.accuracy_score(y_test, y_pred))

#Predict the species of a new sample
species_idx = clf.predict([[3, 5, 4, 2]])[0]
iris.target_names[species_idx]

#Impurity-based feature importances, sorted in descending order
feature_imp = pd.Series(clf.feature_importances_, index=iris.feature_names).sort_values(ascending=False)
feature_imp

import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

# Creating a bar plot
sns.barplot(x=feature_imp, y=feature_imp.index)

# Add labels to your graph
plt.xlabel('Feature Importance Score')
plt.ylabel('Features')
plt.title("Visualizing Important Features")
plt.show()

Output:
Accuracy: 0.9555555555555556

Conclusion:(Sufficient space to be provided)

Question: (Sufficient space to be provided)

1. What do you mean by the random forest algorithm?
2. What do you mean by bagging?
3. What does "random" refer to in "Random Forest"?
4. List down the advantages and disadvantages of the Random Forest algorithm.

Experiment No: 12

K-Means Clustering
Date:

Competency and Practical Skills: Python, scikit-learn, machine learning

Relevant CO: Realize concepts of advanced machine learning algorithms.

Objective:
1. Implement K-Means Clustering using IRIS Dataset.
2. To find frequency distribution of species using pair plot.
3. Visualize correlation using a heat map.

Code:

#import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import MinMaxScaler

#Reading the dataset
iris = pd.read_csv("/content/drive/MyDrive/finalIris.csv")
x = iris.iloc[:, [1, 2, 3, 4]].values  # the four measurement columns (column 0 is Id)

#information of datatypes
iris.info()
iris[0:10]

#Frequency distribution of species
iris_outcome = pd.crosstab(index=iris["Species"], columns="count")
iris_outcome
iris_setosa = iris.loc[iris["Species"] == "Iris-setosa"]
iris_virginica = iris.loc[iris["Species"] == "Iris-virginica"]
iris_versicolor = iris.loc[iris["Species"] == "Iris-versicolor"]

#distribution plot using pair plot
sns.pairplot(iris, hue="Species", height=3)

#we can visualise this correlation using a heatmap.
fig = plt.figure(figsize=(15, 9))
sns.heatmap(iris.corr(), cmap='Blues', annot=True);
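The listing above imports KMeans and silhouette_score but stops at the exploratory plots. The following is a minimal sketch (not in the original listing) of actually fitting K-Means on the feature matrix x, assuming the code above has already run; the final choice of 3 clusters mirrors the three Iris species:

# Fit K-Means for several values of k and record the inertia
# (within-cluster sum of squares) and the silhouette score.
wcss = []
for k in range(2, 7):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(x)
    wcss.append(km.inertia_)
    print(k, silhouette_score(x, km.labels_))

# Final model with 3 clusters; plot the clusters over two of the features.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(x)
plt.scatter(x[:, 0], x[:, 1], c=labels)
plt.xlabel('SepalLengthCm')
plt.ylabel('SepalWidthCm')
plt.show()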

Output:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 6 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Id 150 non-null int64
1 SepalLengthCm 150 non-null float64
2 SepalWidthCm 150 non-null float64
3 PetalLengthCm 150 non-null float64
4 PetalWidthCm 150 non-null float64
5 Species 150 non-null object
dtypes: float64(4), int64(1), object(1)
memory usage: 7.2+ KB
   Id  SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm      Species
0   1            5.1           3.5            1.4           0.2  Iris-setosa
1   2            4.9           3.0            1.4           0.2  Iris-setosa
2   3            4.7           3.2            1.3           0.2  Iris-setosa
3   4            4.6           3.1            1.5           0.2  Iris-setosa
4   5            5.0           3.6            1.4           0.2  Iris-setosa
5   6            5.4           3.9            1.7           0.4  Iris-setosa
6   7            4.6           3.4            1.4           0.3  Iris-setosa
7   8            5.0           3.4            1.5           0.2  Iris-setosa
8   9            4.4           2.9            1.4           0.2  Iris-setosa
9  10            4.9           3.1            1.5           0.1  Iris-setosa

Conclusion:(Sufficient space to be provided)

Question: (Sufficient space to be provided)

1. Explain the K-means clustering algorithm.
2. Why do you prefer Euclidean distance over Manhattan distance in the K-means algorithm?
3. List out the advantages and disadvantages of K-means clustering.
4. What is the difference between K-means clustering and the KNN algorithm?

Experiment No: 13
Naive Bayes Algorithm
Date:

Competency and Practical Skills: Python, scikit-learn, machine learning

Relevant CO: Realize concepts of advanced machine learning algorithms.

Objective:
1. Implement Naive Bayes algorithm.
2. Find accuracy.
3. To visualize the confusion matrix using a heat map.
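Naive Bayes applies Bayes' theorem with the "naive" assumption that the features are conditionally independent given the class: P(y | x_1, …, x_n) ∝ P(y)·Πᵢ P(xᵢ | y). The Gaussian variant used below additionally models each P(xᵢ | y) as a normal distribution.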

Code:

#Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# load the iris dataset
from sklearn.datasets import load_iris
iris = load_iris()

# store the feature matrix (X) and response vector (y)
x = iris.data
y = iris.target

# splitting X and y into training and testing sets
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.4, random_state=1)

# training the model on the training set
from sklearn.naive_bayes import GaussianNB
classifier = GaussianNB()
classifier.fit(x_train, y_train)

# making predictions on the testing set
y_pred = classifier.predict(x_test)

# comparing actual response values (y_test) with predicted response values (y_pred)
from sklearn import metrics
print("Gaussian Naive Bayes model accuracy(in %):", metrics.accuracy_score(y_test, y_pred)*100)

# Making the confusion matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)

import seaborn as sns
sns.heatmap(cm, annot=True)
plt.show()

Output:

Gaussian Naive Bayes model accuracy(in %): 95.0

Conclusion:(Sufficient space to be provided)

Question: (Sufficient space to be provided)

1. What is Naive Bayes? Why is it called the "naive" Bayes algorithm?
2. List out the important characteristics of Naive Bayes.
3. What are the main types of Naive Bayes classifiers?
4. List out the advantages, disadvantages and limitations of the Naive Bayes algorithm.

Experiment No: 14

Activation Function

Date:

Competency and Practical Skills: Python, scikit-learn, machine learning

Relevant CO: Comprehend basic concepts of Neural network and its use in machine
learning.

Objectives:
1. To implement activation functions.
2. Visualize the working of activation functions.
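The three activation functions plotted below are the sigmoid σ(x) = 1/(1 + e^−x), the hyperbolic tangent tanh(x), and the rectified linear unit ReLU(x) = max(0, x).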
Code:

import matplotlib.pyplot as plt


import numpy as np
inp = np.linspace(-10, 10, 40)
out_sigm = 1/(1 + np.exp(-inp))
plt.subplot(1,3,1)
plt.plot(inp, out_sigm)
plt.title("Sigmoid")
plt.grid()
out_tanh = np.tanh(inp)
plt.subplot(1,3,2)
plt.plot(inp, out_tanh)
plt.title("TanH")
plt.grid()
out_relu = [max(0, i) for i in inp]
plt.subplot(1,3,3)
plt.plot(inp, out_relu)
plt.title("ReLU")
plt.grid()
plt.show()

Output:

Conclusion:(Sufficient space to be provided)

Questions:(Sufficient space to be provided)

1. What is the role of activation functions in neural networks?
2. List down the names of some popular activation functions in neural networks.
3. What is the difference between forward propagation and backward propagation in neural networks?
4. Why is ReLU the most commonly used activation function?

Experiment No: 15

Convolutional Neural Network (CNN)

Date:

Competency and Practical Skills: Python, PyTorch, machine learning

Relevant CO: Comprehend basic concepts of Neural network and its use in machine
learning.

Objectives:
1. To implement a convolutional neural network.
2. Find out the total number of parameters of the convolutional neural network.

Code:

import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 5, 3)  # 3x3 kernel, 1 input channel, 5 output channels
        self.conv2 = nn.Conv2d(5, 5, 3)  # 3x3 kernel, 5 input channels, 5 output channels
        self.fc1 = nn.Linear(5*3*3, 10)  # assumes a 7x7 input image, so the conv output is 5x3x3

    def forward(self, x):
        x = self.conv1(x)
        x = self.conv2(x)
        x = torch.flatten(x, 1)  # flatten the conv output before the fully connected layer
        x = self.fc1(x)
        return x

net = Net()
print("Model is ", net)
model_params = sum(p.numel() for p in net.parameters())
print("\n Model parameter is ", model_params)

Output:

Model is Net(
(conv1): Conv2d(1, 5, kernel_size=(3, 3), stride=(1, 1))
(conv2): Conv2d(5, 5, kernel_size=(3, 3), stride=(1, 1))
(fc1): Linear(in_features=45, out_features=10, bias=True)
)
Model parameter is 740
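As a check, the printed total can be computed by hand: conv1 has (3·3·1 + 1)·5 = 50 parameters (weights plus biases), conv2 has (3·3·5 + 1)·5 = 230, and fc1 has 45·10 + 10 = 460, giving 50 + 230 + 460 = 740 in total.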

Conclusion:(Sufficient space to be provided)

Question: (Sufficient space to be provided)

1. What do you mean by a convolutional neural network?
2. List out the different layers in a CNN.
3. Briefly explain the two major steps of a CNN (1. feature learning and 2. classification).
4. Explain the significance of "parameter sharing" and "sparsity of connections" in CNNs.
