
P.S.R.

ENGINEERING COLLEGE
(An Autonomous Institution, Affiliated to Anna University, Chennai)
Sevalpatti (P.O), Sivakasi - 626140.
Tamilnadu State

LABORATORY MANUAL

191EC67
MACHINE LEARNING LAB

III YEAR / VI SEMESTER

DEPARTMENT OF
ELECTRONICS AND COMMUNICATION
ENGINEERING

Prepared by:
Mrs. M. Vimala, AP/ECE
Mrs. P. Krishna Leela, AP/ECE

Approved by:
Dr. K. Valarmathi, HOD/ECE
DEPARTMENT OF ELECTRONICS AND COMMUNICATION ENGINEERING

VISION

 The vision of the Electronics and Communication Engineering Department is to produce graduates with sound knowledge for the betterment of society and to meet the dynamic demands of industry and research.

MISSION

 Offering undergraduate and postgraduate programmes with an effective and balanced curriculum that equips students to meet the ethical challenges awaiting them.
 Providing the technical, research and intellectual resources that will enable the students to have a successful career in the field of Electronics and Communication Engineering.
 Providing need-based training and professional skills to satisfy the needs of society and industry.

PROGRAMME OUTCOMES (POs)

Engineering Graduates will be able to

PO1: Engineering Knowledge: Apply the knowledge of mathematics, science, engineering fundamentals, and an engineering specialization to the solution of complex engineering problems.
PO2: Problem Analysis: Identify, formulate, review research literature, and analyse complex engineering problems reaching substantiated conclusions using first principles of mathematics, natural sciences and engineering sciences.
PO3: Design / Development of Solutions: Design solutions for complex engineering problems and design system components or processes that meet the specified needs with appropriate consideration for the public health and safety, and the cultural, societal, and environmental considerations.
PO4: Conduct Investigations of Complex Problems: Use research-based knowledge and research methods including design of experiments, analysis and interpretation of data and synthesis of the information to provide valid conclusions.
PO5: Modern Tool Usage: Create, select and apply appropriate techniques, resources and modern engineering and IT tools including prediction and modelling to complex engineering activities with an understanding of the limitations.
PO6: The Engineer and Society: Apply reasoning informed by the contextual knowledge to assess societal, health, safety, legal and cultural issues and the consequent responsibilities relevant to the professional engineering practice.
PO7: Environment and Sustainability: Understand the impact of the professional engineering solutions in societal and environmental contexts and demonstrate the knowledge of and need for sustainable development.
PO8: Ethics: Apply ethical principles and commit to professional ethics and responsibilities and norms of the engineering practice.
PO9: Individual and Team Work: Function effectively as an individual and as a member or leader in diverse teams, and in multidisciplinary settings.
PO10: Communication: Communicate effectively on complex engineering activities with the engineering community and with society at large, such as being able to comprehend and write effective reports and design documentation, make effective presentations, and give and receive clear instructions.
PO11: Project Management and Finance: Demonstrate knowledge and understanding of the engineering and management principles and apply these to one's own work, as a member and leader in a team, to manage projects and in multidisciplinary environments.
PO12: Life-long Learning: Recognize the need for, and have the preparation and ability to engage in, independent and life-long learning in the broadest context of technological change.

PROGRAMME SPECIFIC OUTCOMES (PSOs):


PSO1: Design, simulate and analyze diverse problems in the field of telecommunication.
PSO2: Able to design and analyze varied electronic circuits for applications.
PSO3: Apply signal and image processing techniques to analyze a system for applications.
PSO4: Construct, test and evaluate an embedded system and control systems with real-time constraints.
COURSE OUTCOMES:

The students will be able to


CO1: Construct suitable algorithms for finding the most specific hypothesis based on given training
data. (AP)
CO2: Build Artificial Neural Network and test for the given dataset. (AP)
CO3: Make use of the Naïve Bayes classifier to classify the given data. (AP)
CO4: Apply clustering algorithms to cluster the given set of data using python or Java. (AP)
CO5: Design regression algorithm to fit the given data points. (AP)
CO6: Apply machine learning algorithms to solve real world problems. (AP)

CO PO MAPPING
Course Outcomes mapped against Program Outcomes (PO1–PO12) and Program Specific Outcomes (PSO1–PSO4):

CO1 3 2 1 3 3 1 3
CO2 3 2 1 3 3 1 2
CO3 3 2 1 3 3 1 3
CO4 3 2 1 3 3 1 3
CO5 3 2 1 3 3 1 2
CO6 3 2 1 3 3 2 3
LIST OF EXPERIMENTS:

1. Implement FIND-S Algorithm for finding the most specific hypothesis based on a given set of
training samples.
2. Build an Artificial Neural Network by implementing the back propagation algorithm
3. Implement Naïve Bayesian Classifier
4. Implement classifier using Support Vector Machine
5. Implement K Nearest Neighbor Classifier
6. Implement K Means segmentation
7. Implement linear regression
8. Implement dimensionality reduction algorithm (PCA)
9. Implement random forest classifier.
10. Implement deep learning algorithm
11. Implement prediction of heart disease with machine learning
12. Implement machine learning project for any real world dataset

CONTENTS

S.No Name of the Experiments

USING ANACONDA SOFTWARE PACKAGE

1. Implement FIND-S Algorithm for finding the most specific hypothesis based on a given set of training samples.
2. Build an Artificial Neural Network by implementing the back
propagation algorithm
3. Implement Naïve Bayesian Classifier
4. Implement classifier using Support Vector Machine
5. Implement K Nearest Neighbor Classifier
6. Implement K Means segmentation
7. Implement linear regression
8. Implement dimensionality reduction algorithm (PCA)
9. Implement random forest classifier
10. Implement deep learning algorithm
11. Implement prediction of heart disease with machine learning
12. Implement machine learning project for any real world dataset
P.S.R. ENGINEERING COLLEGE
(An Autonomous Institution, Affiliated to Anna University, Chennai)
Sevalpatti (P.O), Sivakasi - 626140.
Tamilnadu State

COURSE PLAN

COURSE CODE: 191EC67                 YEAR & SEM: III & VI
COURSE NAME: MACHINE LEARNING LAB    BRANCH: ECE
FACULTY NAME: Mrs. M. Vimala, AP/ECE
              Mrs. P. Krishna Leela, AP/ECE

EX.NO  EXPERIMENT NAME  WEEK

CYCLE-I (USING ANACONDA SOFTWARE PACKAGE)
1. Implement FIND-S Algorithm for finding the most specific hypothesis based on a given set of training samples. - Week I
2. Build an Artificial Neural Network by implementing the back propagation algorithm - Week II
3. Implement Naïve Bayesian Classifier - Week III
4. Implement classifier using Support Vector Machine - Week IV
5. Implement K Nearest Neighbor Classifier - Week V
6. Implement K Means segmentation - Week VI

CYCLE-II (USING ANACONDA SOFTWARE PACKAGE)
7. Implement linear regression - Week VII
8. Implement dimensionality reduction algorithm (PCA) - Week VIII
9. Implement random forest classifier - Week IX
10. Implement deep learning algorithm - Week X
11. Implement prediction of heart disease with machine learning - Week XI
12. Implement machine learning project for any real world dataset - Week XII
INTRODUCTION

Artificial Intelligence, Machine Learning and Deep Learning

• Artificial intelligence is an effort to mimic human behavior or automate any task

• Machine learning involves building mathematical models to understand data. ML systems are
trained on the data and once these models fit on these data, they can be used to predict unknown
data.

• Deep learning performs this through a deep sequence of data transformations, from lower levels of abstraction to higher levels.

Categories of Machine Learning

Unsupervised Machine Learning


Unsupervised learning is a type of self-organized learning that helps find previously unknown patterns
in data set without pre-existing labels.
Supervised Machine Learning
In Supervised learning, a model can learn from the input provided as a labeled dataset. The trained
model is then used for prediction
Reinforcement Learning
Reinforcement Learning (RL) enables an agent to learn in an interactive environment by trial and error using feedback from its own actions and experiences. In robotics and industrial automation, RL is used to enable the robot to create an efficient adaptive control system for itself which learns from its own experience and behavior.
Tensor
In machine learning algorithms, the input and output are both represented as vectors. A vector is a collection of numbers.
Example: A=[1 2 3 4]
• Scalar – Single number

• Vector – Array of numbers

• Matrix – 2D array of numbers

• Tensors – array of numbers with dimension greater than 2
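
A short illustration of these objects in NumPy (a sketch; the array values are arbitrary):

import numpy as np

scalar = np.array(5)                     # single number, 0 dimensions
vector = np.array([1, 2, 3, 4])          # 1-D array of numbers
matrix = np.array([[1, 2], [3, 4]])      # 2-D array of numbers
tensor = np.ones((2, 3, 4))              # 3-D array (dimension greater than 2)

for name, obj in [("scalar", scalar), ("vector", vector),
                  ("matrix", matrix), ("tensor", tensor)]:
    print(name, "shape:", obj.shape, "ndim:", obj.ndim)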

Python Packages

• NumPy provides support for highly optimized multidimensional arrays. These are the basic data structure of most state-of-the-art algorithms.

• SciPy uses these arrays to provide a set of fast numerical recipes. SciPy contains modules for optimization, linear algebra, integration, interpolation, special functions, FFT, and signal and image processing.

• matplotlib is a feature-rich library for plotting high-quality graphs using Python.

• OpenCV is a popular library for Computer Vision.

• Pandas – a Python data analysis and manipulation tool; a great tool for working with Excel-style tabular data in Python.

• Keras – the high-level API (Application Program Interface) of TensorFlow.

• TensorFlow is a library for dataflow and differentiable programming across a range of tasks. It is a symbolic math library, and is also used for machine learning applications such as neural networks.

Classification

Classification is the process of finding a function which helps in dividing the dataset into classes based on different parameters. In classification, the labels are discrete quantities.
• Example – Email Spam Detection
Consider two-dimensional data with two features for each point, represented by (x, y), and two class labels, blue and red. From these features and labels, a model is created that decides whether a new point should be labeled blue or red. The best model for this problem is a straight line separating the classes. The model parameters are the particular numbers describing the location and orientation of that line for the data. The optimal values for these parameters are learned from the data, which is called training the model. The machine learning approach can generalize to much larger data sets in many more dimensions.
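
A minimal sketch of this idea, assuming synthetic two-dimensional data generated with scikit-learn (the two generated classes stand in for the "blue" and "red" points):

from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression

# two clusters of 2-D points, one per class
X, y = make_blobs(n_samples=200, centers=2, random_state=0)

# a linear model learns the line separating the classes (training the model)
model = LogisticRegression()
model.fit(X, y)

# predict the label of a new point
print(model.predict([[0.0, 2.0]]))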

Regression
Regression is a process of finding the correlations between dependent and independent variables. It helps in predicting continuous variables such as market trends, weather forecasts, etc. Here, the labels are continuous quantities.
Clustering

Clustering is the process of grouping similar entities together. The goal of this unsupervised machine learning technique is to find similarities in the data points and group similar data points together.

Dimensionality reduction

The higher the number of features, the harder it is to work with the dataset. Many of these features are correlated, and hence redundant. The two components of dimensionality reduction are feature selection and feature extraction. Feature selection refers to choosing a subset of the features. Feature extraction refers to the reduction of data in a high dimensional space to a lower dimensional space.
Examples of high dimensional datasets are videos, emails, satellite observations, etc. The unnecessary and noisy dimensions should be removed, keeping the informative ones. The goal of PCA is to project input data onto a lower dimensional subspace, preserving as much variance within the data as possible.
Expt. 1 Implementation of FIND-S Algorithm Date:

Aim:
To implement FIND-S Algorithm for finding the most specific hypothesis based on a
given set of training samples.

Theory:
The FIND-S algorithm is a basic concept learning algorithm in machine learning. The FIND-S algorithm finds the most specific hypothesis that fits all the positive examples; note that the algorithm considers only the positive training examples. The FIND-S algorithm starts with the most specific hypothesis and generalizes this hypothesis each time it fails to classify an observed positive training example. Hence, the FIND-S algorithm moves from the most specific hypothesis to the most general hypothesis.

Algorithm

 Initialize h to the most specific hypothesis in H


 For each positive training instance x
For each attribute constraint ai in h
If the constraint ai is satisfied by x
Then do nothing
Else replace ai in h by the next more general constraint that is satisfied by x
 Output hypothesis h

Training Examples:

Create an enjoysport.csv file containing the training examples, one example per row, with the last column holding the class label (yes/no).

Program:
import csv

a = []
with open('enjoysport.csv', 'r') as csvfile:
    for row in csv.reader(csvfile):
        a.append(row)

print(a)
print("\n The total number of training instances are : ", len(a))

num_attribute = len(a[0]) - 1
print("\n The initial hypothesis is : ")
hypothesis = ['0'] * num_attribute
print(hypothesis)

for i in range(0, len(a)):
    if a[i][num_attribute] == 'yes':
        for j in range(0, num_attribute):
            if hypothesis[j] == '0' or hypothesis[j] == a[i][j]:
                hypothesis[j] = a[i][j]
            else:
                hypothesis[j] = '?'
    print("\n The hypothesis for the training instance {} is : \n".format(i + 1), hypothesis)

print("\n The Maximally specific hypothesis for the training instance is ")
print(hypothesis)

Output:
Result:
Expt.2 Building of Artificial Neural Network Date:

Aim:
To build an Artificial Neural Network by implementing the back propagation algorithm

Theory:

An artificial neural network is an attempt to simulate the network of neurons that make up a human brain
so that the computer will be able to learn things and make decisions in a humanlike manner. ANNs are
created by programming regular computers to behave as though they are interconnected brain cells.

The neural network is constructed from three types of layers:

1. Input layer — initial data for the neural network.
2. Hidden layers — intermediate layers between the input and output layer, where all the computation is done.
3. Output layer — produces the result for the given inputs.

In a neural network, the activation function defines whether a given node should be "activated" or not based on the weighted sum of its inputs.
The unit step activation function is defined as

    f(x) = 1 if x > threshold (activate the node), and f(x) = 0 if x <= threshold (do not activate the node)

The most widely used activation function is the sigmoid function, which is defined as

    sigmoid(x) = 1 / (1 + e^(-x))
Back Propagation Algorithm

The algorithm is used to effectively train a neural network using the chain rule. In simple terms, after each forward pass through the network, backpropagation performs a backward pass while adjusting the model's parameters (weights and biases).

Each training example is a pair of the form (x, t), where x is the vector of network input values and t is the vector of target network output values. η is the learning rate (e.g. 0.5), n_in is the number of network inputs, n_hidden is the number of units in the hidden layer and n_out is the number of output units. The input from unit i into unit j is denoted x_ji and the weight from unit i into unit j is denoted w_ji.

 Create a feed-forward network with n_in inputs, n_hidden hidden units, and n_out output units.

 Initialize all network weights to small random numbers.

 Until the termination condition is met, Do

For each (x, t) in the training examples, Do

Propagate the input forward through the network:

1. Input the instance x to the network and compute the output o_u of every unit u in the network.

Propagate the errors backward through the network:

2. For each network output unit k, calculate its error term δ_k:

    δ_k = o_k (1 − o_k)(t_k − o_k)

3. For each hidden unit h, calculate its error term δ_h:

    δ_h = o_h (1 − o_h) Σ_{k ∈ outputs} w_kh δ_k

4. Update each network weight w_ji:

    w_ji = w_ji + Δw_ji, where Δw_ji = η δ_j x_ji

Program:

# Import libraries
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt

# Load dataset
data = load_iris()

# Get features and target
X = data.data
y = data.target

# One-hot encode the target
y = pd.get_dummies(y).values

# Split data into train and test data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=20, random_state=4)

# Initialize variables
learning_rate = 0.1
iterations = 10000
N = y_train.size

# number of input features
input_size = 4
# number of hidden layer neurons
hidden_size = 2
# number of neurons at the output layer
output_size = 3

results = pd.DataFrame(columns=["mse", "accuracy"])

# Initialize weights
np.random.seed(10)
# initializing weights for the hidden layer
W1 = np.random.normal(scale=0.5, size=(input_size, hidden_size))
# initializing weights for the output layer
W2 = np.random.normal(scale=0.5, size=(hidden_size, output_size))

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def mean_squared_error(y_pred, y_true):
    return ((y_pred - y_true) ** 2).sum() / (2 * y_pred.size)

def accuracy(y_pred, y_true):
    acc = y_pred.argmax(axis=1) == y_true.argmax(axis=1)
    return acc.mean()

for itr in range(iterations):
    # feedforward propagation: hidden layer
    Z1 = np.dot(X_train, W1)
    A1 = sigmoid(Z1)
    # feedforward propagation: output layer
    Z2 = np.dot(A1, W2)
    A2 = sigmoid(Z2)

    # calculate error and accuracy on the training data
    mse = mean_squared_error(A2, y_train)
    acc = accuracy(A2, y_train)
    results = pd.concat([results, pd.DataFrame([{"mse": mse, "accuracy": acc}])],
                        ignore_index=True)

    # backpropagation
    E1 = A2 - y_train            # error at the output layer
    dW1 = E1 * A2 * (1 - A2)     # delta for the output layer
    E2 = np.dot(dW1, W2.T)       # error propagated back to the hidden layer
    dW2 = E2 * A1 * (1 - A1)     # delta for the hidden layer

    # weight updates
    W2_update = np.dot(A1.T, dW1) / N
    W1_update = np.dot(X_train.T, dW2) / N
    W2 = W2 - learning_rate * W2_update
    W1 = W1 - learning_rate * W1_update

results.mse.plot(title="Mean Squared Error")
results.accuracy.plot(title="Accuracy")

# evaluate on the test data
Z1 = np.dot(X_test, W1)
A3 = sigmoid(Z1)
Z2 = np.dot(A3, W2)
A4 = sigmoid(Z2)
acc = accuracy(A4, y_test)
print("Accuracy: {}".format(acc))

Output:

Result:
Expt. 3 Implementation of Naïve Bayesian Classifier Date:

Aim:
To implement Naïve Bayesian Classifier

Theory:
It requires preliminary mathematical knowledge of conditional probability and Bayes theorem. Suppose we have input data on the characteristics of the weather (outlook, temperature, humidity, windy) and whether golf was played or not (i.e., the last column). Naive Bayes essentially compares the proportion between each input variable and the categories in the output variable; these proportions are listed below.

Consider that we have a new day with the following characteristics:

 Outlook: sunny

 Temperature: mild

 Humidity: normal

 Windy: false

First, we’ll calculate the probability that you will play golf given X, P(yes|X), followed by the
probability that you won’t play golf given X, P(no|X).
P(yes)=9/14

P(outlook = sunny/yes) = 2/9

P(temperature = mild/yes) = 4/9

P(humidity = normal/yes) = 6/9

P(windy = FALSE/yes) = 6/9

P(No)=5/14

P(outlook = sunny/no) = 3/5

P(temperature = mild/no) = 2/5

P(humidity = normal/no) = 1/5

P(windy = FALSE/no) = 2/5

P(yes/X)=2/9 x 4/9 x 6/9 x 6/9 x 9/14 = 0.0282

P(no/X)=3/5 x 2/5 x 1/5 x 2/5 x 5/14 = 0.0069

Because P(yes|X) > P(no|X), then you can predict that this person would play golf given that the outlook
is sunny, the temperature is mild, the humidity is normal, and it’s not windy.
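
The same arithmetic can be checked with a few lines of Python (a sketch using the counts above):

# likelihoods and priors taken from the frequency counts above
p_yes = (2/9) * (4/9) * (6/9) * (6/9) * (9/14)
p_no = (3/5) * (2/5) * (1/5) * (2/5) * (5/14)
print(round(p_yes, 4), round(p_no, 4))   # 0.0282 0.0069
print("play golf" if p_yes > p_no else "do not play golf")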
Confusion Matrix:

                        Actual Positive      Actual Negative
Predicted Positive      True Positive        False Positive
Predicted Negative      False Negative       True Negative

True Positives – Data points labelled as positive that are actually positive
False Positives - Data points labelled as positive that are actually negative
True Negatives – Data points labelled as negative that are actually negative
False Negatives – Data points labelled as negative that are actually positive

Recall = True Positives / (True Positives + False Negatives)

Precision = True Positives / (True Positives + False Positives)

Accuracy = (True Positives + True Negatives) / Total
Dataset

Program:

import pandas as pd
msg=pd.read_csv('naivetext.csv',names=['message','label'])
print('The dimensions of the dataset',msg.shape)
msg['labelnum']=msg.label.map({'pos':1,'neg':0})
X=msg.message
y=msg.labelnum
print(X)
print(y)
#splitting the dataset into train and test data
from sklearn.model_selection import train_test_split
xtrain,xtest,ytrain,ytest=train_test_split(X,y)
print ('\n the total number of Training Data :',ytrain.shape)
print ('\n the total number of Test Data :',ytest.shape)
#output of the words or Tokens in the text documents
from sklearn.feature_extraction.text import CountVectorizer
count_vect = CountVectorizer()
xtrain_dtm = count_vect.fit_transform(xtrain)
xtest_dtm = count_vect.transform(xtest)
print('\n The words or Tokens in the text documents \n')
print(count_vect.get_feature_names_out())
df = pd.DataFrame(xtrain_dtm.toarray(), columns=count_vect.get_feature_names_out())
# Training Naive Bayes (NB) classifier on training data.
from sklearn.naive_bayes import MultinomialNB
clf = MultinomialNB().fit(xtrain_dtm,ytrain)
predicted = clf.predict(xtest_dtm)
#printing accuracy, Confusion matrix, Precision and Recall
from sklearn import metrics
print('\n Accuracy of the classifier is',metrics.accuracy_score(ytest,predicted))
print('\n Confusion matrix')
print(metrics.confusion_matrix(ytest,predicted))
print('\n The value of Precision', metrics.precision_score(ytest,predicted))
print('\n The value of Recall', metrics.recall_score(ytest,predicted))

Output:
Result:
Expt.4 Implement classifier using Support Vector Machine Date:

Aim:
To implement Support Vector Machine classifier

Theory:

Support Vector Machines

A Support Vector Machine (SVM) is a supervised machine learning algorithm that can be employed for both classification and regression purposes. SVMs are based on the idea of finding a hyperplane that best divides a dataset into two classes.

For a classification task with only two features, the hyperplane can be thought of as a line that linearly separates and classifies a set of data. Intuitively, the further from the hyperplane our data points lie, the more confident we are that they have been correctly classified. We therefore want our data points to be as far away from the hyperplane as possible, while still being on the correct side of it. So when new testing data is added, whichever side of the hyperplane it lands on decides the class that we assign to it.

The distance between the hyperplane and the nearest data point from either set is known as the margin. The goal is to choose a hyperplane with the greatest possible margin between the hyperplane and any point within the training set, giving a greater chance of new data being classified correctly.

In order to classify a dataset that is not linearly separable in two dimensions, it may be necessary to move from a 2D view of the data to a 3D view.
Define the hyperplanes H1 and H2 such that

    H1 : w · x_i + b ≥ +1 when y_i = +1

    H2 : w · x_i + b ≤ −1 when y_i = −1

where w is the weight vector, x_i is the input vector and b is the bias.

The points on the planes H1 and H2 are the support vectors.

The plane H0 is the median between them, where w · x_i + b = 0.

d+ = the shortest distance to the closest positive point

d− = the shortest distance to the closest negative point

The margin of the separating hyperplane is d+ + d−.


Program
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

iris = datasets.load_iris()
X = iris.data      # all four iris features are used
y = iris.target

model = SVC(kernel='linear', C=1E10)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42)
model.fit(X_train, y_train)
print(model.score(X_test, y_test))
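
Since a linear kernel is used, the learned weight vectors w and biases b can be read back from the fitted model and the margin 2/||w|| computed. A short follow-up sketch using the model above (for the 3-class iris data, coef_ holds one row per pair of classes):

import numpy as np

w = model.coef_          # weight vectors of the linear SVM (one per class pair)
b = model.intercept_     # corresponding biases
print("w:", w)
print("b:", b)
# margin of each separating hyperplane is 2 / ||w||
print("margins:", 2 / np.linalg.norm(w, axis=1))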

Output:
Result:
Expt. 5 Implementation of K Nearest Neighbor Classifier Date:

Aim:

To implement K Nearest Neighbor Classifier

Theory:

First, we start off with data that is already classified (e.g., red and blue data points). Then, when a new data point is added, it is classified by looking at the k nearest classified points. Whichever class gets the most votes determines what the new point is classified as.

In this case, if we set k = 1 and the single nearest point to the new sample is a red data point, the new point would be classified as red.
Program:

from matplotlib import pyplot as plt
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.model_selection import KFold
from sklearn.neighbors import KNeighborsClassifier

data = load_iris()
features = data.data
feature_names = data.feature_names
target = data.target
target_names = data.target_names
labels = target_names[target]

classifier = KNeighborsClassifier(n_neighbors=1)
kf = KFold(n_splits=5, shuffle=True)

X = features
y = target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42)

classifier.fit(X_train, y_train)
print(classifier.score(X_test, y_test))
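
The KFold splitter created above can also be used for cross-validation, which gives a less optimistic estimate of accuracy than a single train/test split. A small sketch using scikit-learn's cross_val_score:

from sklearn.model_selection import cross_val_score

# evaluate the 1-nearest-neighbour classifier on 5 shuffled folds
scores = cross_val_score(KNeighborsClassifier(n_neighbors=1), X, y, cv=kf)
print("fold accuracies:", scores)
print("mean accuracy :", scores.mean())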
Output:

Result:
Expt.6 Implementation of K Means segmentation Date:

Aim:
To implement K Means segmentation algorithm for images.

Theory:

K-Means Clustering

Example of K-Means clustering

K = 2, X = [20, 40, 32, 96]
Initial cluster centres: C1 = 20, C2 = 40

1st iteration
    Class 1: 20            Class 2: 40, 32, 96
    New cluster centres: C1 = 20, C2 = 56 (average of 40, 32, 96)

2nd iteration
    Class 1: 20, 32        Class 2: 40, 96
    New cluster centres: C1 = 26 (average of 20, 32), C2 = 68 (average of 40, 96)

3rd iteration
    Class 1: 20, 40, 32    Class 2: 96
    New cluster centres: C1 = 31 (average of 20, 40, 32), C2 = 96

4th iteration
    Class 1: 20, 40, 32    Class 2: 96

Since the cluster assignments of successive iterations do not change, the algorithm stops.
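
The hand computation above can be checked with scikit-learn's KMeans by starting from the same initial centres. A small sketch (the data are reshaped to a column because the estimator expects 2-D input):

import numpy as np
from sklearn.cluster import KMeans

X = np.array([20, 40, 32, 96], dtype=float).reshape(-1, 1)
init_centres = np.array([[20.0], [40.0]])            # C1 = 20, C2 = 40

km = KMeans(n_clusters=2, init=init_centres, n_init=1).fit(X)
print("final centres:", km.cluster_centers_.ravel())  # approx. 31 and 96
print("labels       :", km.labels_)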

Program:

import cv2
import numpy as np
import matplotlib.pyplot as plt

image = cv2.imread("image.jpg")
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

# reshape the image to a 2D array of pixels and 3 color values (RGB)
pixel_values = image.reshape((-1, 3))   # -1 indicates an unknown dimension
# convert to float
pixel_values = np.float32(pixel_values)
print(pixel_values.shape)

# stopping criteria: 100 iterations or an epsilon of 0.2
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 100, 0.2)

# number of clusters (K)
k = 3
_, labels, (centers) = cv2.kmeans(pixel_values, k, None, criteria, 10,
                                  cv2.KMEANS_RANDOM_CENTERS)

# convert back to 8 bit values
centers = np.uint8(centers)
# flatten the labels array
labels = labels.flatten()
# convert all pixels to the color of their centroids
segmented_image = centers[labels]
# reshape back to the original image dimension
segmented_image = segmented_image.reshape(image.shape)

# show the original and the segmented image
plt.imshow(image)
plt.show()
plt.imshow(segmented_image)
plt.show()

np.set_printoptions(threshold=np.inf)
print(labels)
Output:

Result:
Expt.7 Implementation of Linear Regression Date:

Aim
To implement linear regression algorithm

Theory:

Linear Regression is one of the most fundamental algorithms used to model relationships
between a dependent variable and one or more independent variables. In simpler terms, it
involves finding the ‘line of best fit’ that represents two or more variables.

The line of best fit is found by minimizing the squared distances between the points and the line
of best fit — this is known as minimizing the sum of squared residuals. A residual is simply
equal to the predicted value minus the actual value.

Comparing the green line of best fit to the red line, the vertical lines (the residuals) are much
bigger for the green line than the red line. This makes sense because the green line is so far away
from the points that it isn’t a good representation of the data at all.

    y = mx + b
Logistic Regression

First, a score is calculated using an equation similar to the equation for the line of best fit in linear regression. Logistic regression then has one additional step: the score is fed into the sigmoid function below, so that a probability is obtained in return. This probability can then be converted to a binary output, either 1 or 0.

    sigmoid(x) = 1 / (1 + e^(-x))

To find the weights of the initial equation and calculate the score, methods like gradient descent
or maximum likelihood are used.

Program

from IPython import get_ipython
get_ipython().magic('reset -sf')

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

dataset = pd.read_csv('Position_Salaries.csv')
X = dataset.iloc[:, 1:2].values
y = dataset.iloc[:, 2].values

from sklearn.tree import DecisionTreeRegressor
regressor = DecisionTreeRegressor(random_state=0)
regressor.fit(X, y)

n = np.array([6.5]).reshape(1, 1)
y_pred = regressor.predict(n)

plt.scatter(X, y, color='red')
plt.plot(X, regressor.predict(X), color='blue')
plt.title('Regression Model')
plt.xlabel('Position level')
plt.ylabel('Salary')
plt.show()

X_grid = np.arange(min(X), max(X), 0.01)
X_grid = X_grid.reshape((len(X_grid), 1))
plt.scatter(X, y, color='red')
plt.plot(X_grid, regressor.predict(X_grid), color='blue')
plt.title('Example of Decision Regression Model')
plt.xlabel('Position level')
plt.ylabel('Salary')
plt.show()

print(y_pred)
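
The program above fits a decision-tree regressor to the data; a plain least-squares line of best fit could instead be obtained with scikit-learn's LinearRegression. A sketch, assuming the same Position_Salaries.csv columns and the X, y arrays loaded above:

from sklearn.linear_model import LinearRegression

lin_reg = LinearRegression()
lin_reg.fit(X, y)

# slope m and intercept b of the fitted line y = m*x + b
print("slope     :", lin_reg.coef_)
print("intercept :", lin_reg.intercept_)

plt.scatter(X, y, color='red')
plt.plot(X, lin_reg.predict(X), color='green')
plt.title('Linear Regression Model')
plt.xlabel('Position level')
plt.ylabel('Salary')
plt.show()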

Output:
Result:
Expt.8 Implementation of dimensionality reduction algorithm (PCA) Date:

Aim:
To implement dimensionality reduction algorithm – Principal Component Analysis

Theory:

Principal Component Analysis (PCA)

Consider an example of classifying emails as spam or not spam. Construct a mathematical representation of each email as a bag-of-words vector, where each entry is the number of times the corresponding word appears in the email (0 if it does not appear). Suppose the bag-of-words vectors are (x1, x2, ..., xm). Not all dimensions (words) of the vectors are informative for spam / not-spam classification; better features are "lottery", "credit" and "pay" than "dog", "cat" or "tree". Hence, to reduce the dimension, PCA is used. Construct the covariance matrix from the samples (x1, x2, ..., xm) and compute its eigenvectors and eigenvalues. Project the vectors onto the eigenvectors corresponding to the top p eigenvalues; the dimension is then reduced to p.

The eigenvectors have the special property that they point towards the directions of the most variance within the data. The first eigenvector points in the direction of highest variance, and each subsequent eigenvector points towards the highest variance in the subspace orthogonal to the previous ones.
Projecting onto the top eigenvectors preserves the maximum variance, and capturing more variance means capturing more information to analyze. Plot the eigenvalues and find the point where they start to decay rapidly; the components before that point are retained. After the low dimensional PCA projection of the bag-of-words is computed, classification algorithms may be used to classify mails as spam / not spam.
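
The covariance / eigenvector procedure described above can be written directly in NumPy; scikit-learn's PCA, used in the program below, wraps the same idea. A sketch on random data:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))             # 100 samples, 5 features

X_centred = X - X.mean(axis=0)            # centre the data
cov = np.cov(X_centred, rowvar=False)     # 5 x 5 covariance matrix

eig_vals, eig_vecs = np.linalg.eigh(cov)  # eigen decomposition (ascending order)
order = np.argsort(eig_vals)[::-1]        # sort eigenvalues, largest first

p = 2                                     # keep the top p components
top_vecs = eig_vecs[:, order[:p]]
X_reduced = X_centred @ top_vecs          # project onto the top eigenvectors
print(X_reduced.shape)                    # (100, 2)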
Program

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# importing or loading the dataset
dataset = pd.read_csv('wine.csv')

# distributing the dataset into two components X and Y
X = dataset.iloc[:, 1:13].values
y = dataset.iloc[:, 0].values

# Splitting X and Y into the Training set and Testing set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# performing the preprocessing part
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

# Applying the PCA function on the training and testing set of the X component
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
pca = PCA(n_components=2)
X1_train = pca.fit_transform(X_train)
X1_test = pca.transform(X_test)
print(X.shape)
print(X1_train.shape)
variance = pca.explained_variance_ratio_

# logistic regression on the 2-component PCA projection
classifier = LogisticRegression(random_state=0)
classifier.fit(X1_train, y_train)
y_pred = classifier.predict(X1_test)
print(classifier.score(X1_test, y_test))

# logistic regression on the full feature set, for comparison
classifier.fit(X_train, y_train)
y1_pred = classifier.predict(X_test)
print(classifier.score(X_test, y_test))

print(np.shape(X_train))
print(np.shape(X1_train))

plt.figure(figsize=(8, 6))
plt.scatter(X1_train[:, 0], X1_train[:, 1], s=10, c=y_train, cmap='rainbow')
plt.xlabel('First principal component')
plt.ylabel('Second Principal Component')

# making the confusion matrix between the test set of Y and the predicted values
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
Output:

Result:
Expt.9 Implementation of Random Forest Classifier Date:

Aim:

To implement random forest classifier

Theory:

Decision Tree

 Ensemble learning is a method where multiple learning algorithms are used in conjunction. The purpose of doing so is that it allows you to achieve higher predictive performance than if you were to use an individual algorithm by itself.

 Random forests are an ensemble learning technique that builds off of decision trees.
Random forests involve creating multiple decision trees using bootstrapped datasets of
the original data and randomly selecting a subset of variables at each step of the decision
tree. The model then selects the mode of all of the predictions of each decision tree
(bagging). By relying on a “majority wins” model, it reduces the risk of error from an
individual tree.

For example, if we created one decision tree, the third one, it would predict 0. But if we
relied on the mode of all 4 decision trees, the predicted value would be 1.

Program

from matplotlib import pyplot as plt
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.model_selection import KFold
from sklearn.ensemble import RandomForestClassifier

data = load_iris()
features = data.data
feature_names = data.feature_names
target = data.target
target_names = data.target_names
labels = target_names[target]

classifier = RandomForestClassifier(n_estimators=100)
kf = KFold(n_splits=5, shuffle=True)

X = features
y = target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42)

classifier.fit(X_train, y_train)
print(classifier.score(X_test, y_test))
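
Because a random forest is built from many trees, it also reports how much each feature contributed to the splits, which is often a useful by-product. A short follow-up sketch using the fitted classifier above:

# relative importance of each iris feature in the trained forest
for name, importance in zip(feature_names, classifier.feature_importances_):
    print("{:25s} {:.3f}".format(name, importance))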

Output:

Result:
Expt.10 Implementation of Deep Learning Algorithm Date:

Aim:
To implement deep learning algorithm for classification of images

Theory:
Deep Learning
Deep learning is a specific subfield of machine learning: a new take on learning representations from data that puts an emphasis on learning successive layers of increasingly meaningful representations. The number of layers contributing to a model of the data is called the depth of the model. These layered representations are learned via models called neural networks.

Applications of deep learning

 Image classification
 Speech recognition
 Handwriting transcription
 Machine translation
 Improved text-to-speech conversion
 Digital assistants such as Google Now and Amazon Alexa
 Autonomous driving
 Improved search results on the web
 Ability to answer natural-language questions
Initially, the network output is far from target and the loss score is accordingly very high. With
every example the network processes, the weights are adjusted a little in the correct direction,
and the loss score decreases. The training loop when repeated a sufficient number of times yields
weight values that minimize the loss function. A network with a minimal loss is one for which
the outputs are as close as they can be to the targets: a trained network.
Increasingly complex representations are developed in this incremental, layer-by-layer way, and the intermediate representations are learned jointly.

Keras to learn to classify handwritten digits

• To classify grayscale images of handwritten digits (28 × 28 pixels) into their 10 categories (0 through 9).

• MNIST dataset - 60,000 training images, plus 10,000 test images, assembled by the
National Institute of Standards and Technology (the NIST in MNIST)

• The core building block of neural networks is the layer, a data-processing module which acts as a filter for data.

• Some data goes in, and it comes out in a more useful form. Specifically, layers extract representations out of the data fed into them.
• The network in the program below consists of two dense (fully connected) layers, the second of which is a 10-way softmax layer that returns 10 probability scores summing to 1.

• A loss function—How the network will be able to measure its performance on the training
data, and thus how it will be able to steer itself in the right direction.

• An optimizer—The mechanism through which the network will update itself based on the
data it sees and its loss function.

Convolutional Neural Network

The CNN consists of series of convolutional and pooling layers, followed by fully connected
layers and activation layer. The convolution layer is employed for filtering operations and
pooling is used to reduce the dimensionality. CNN is trained through back propagation with
gradient descent.
The two stages of the CNN are the feed-forward stage and the back propagation stage. In the feed-forward stage, the input images are fed to the network. Each convolution layer in the network comprises a set of independent filters, and each filter is convolved with the image to generate feature maps.

The filter weights are initialized randomly and the output is computed. The network output is
compared with the desired output using loss function to compute the error rate. Based on the
error rate, in the back propagation, the weights are updated. The process is repeated for sufficient
number of iterations. The convolutional layer is followed by pooling layer whose function is to
reduce the size of the feature map thereby reducing the computational cost.

An activation function such as the Rectified Linear Unit (ReLU) is applied after convolution, which increases the training speed. The ReLU activation function is expressed as

    ReLU(x) = x if x > 0, and 0 if x ≤ 0

The output of the convolutional and pooling layers represent high level features of the input
image. The fully connected layer uses these features to classify the input images into different
classes. It uses softmax activation function in the output layer which assigns probability to each
class.

Program for digit classification


import numpy as np
from matplotlib import pyplot as plt
from keras.datasets import mnist

(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

from keras import models
from keras import layers

network = models.Sequential()
network.add(layers.Dense(512, activation='relu', input_shape=(28 * 28,)))
network.add(layers.Dense(10, activation='softmax'))
network.compile(optimizer='rmsprop',
                loss='categorical_crossentropy',
                metrics=['accuracy'])

train_images = train_images.reshape((60000, 28 * 28))
train_images = train_images.astype('float32') / 255
test_images = test_images.reshape((10000, 28 * 28))
test_images = test_images.astype('float32') / 255

from keras.utils import to_categorical
train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)

network.fit(train_images, train_labels,
            validation_data=(test_images, test_labels), epochs=5, batch_size=128)

# reload the raw images for display
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()
x = range(1, 10)
for n in x:
    plt.subplot(9, 1, n)
    plt.imshow(train_images[n], cmap=plt.get_cmap('gray'))
plt.show()
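
A short follow-up sketch for measuring the trained network's accuracy on the held-out test images (the raw data reloaded above are reshaped and scaled again, exactly as before training):

test_images = test_images.reshape((10000, 28 * 28)).astype('float32') / 255
test_labels = to_categorical(test_labels)
test_loss, test_acc = network.evaluate(test_images, test_labels)
print('test accuracy:', test_acc)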
Program for Plant leaf disease classification
from __future__ import print_function
from IPython import get_ipython
get_ipython().magic('reset -sf')

import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
from keras import optimizers
from keras.preprocessing.image import ImageDataGenerator, load_img
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation, Dropout, Flatten, \
    Conv2D, MaxPooling2D, BatchNormalization, LayerNormalization

# Give the paths of the training and testing image folders
train_dir = "D:\\PSREC-ECE\\Machine learning concepts workshop\\ML workshop programs\\training"
test_dir = "D:\\PSREC-ECE\\Machine learning concepts workshop\\ML workshop programs\\testing"

nTrain = 32
nVal = 8
datagen = ImageDataGenerator(rescale=1./255)

model = Sequential()
model.add(Conv2D(filters=16, input_shape=(224, 224, 3), kernel_size=(3, 3),
                 padding='same'))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2), padding='same'))

model.add(Conv2D(filters=32, kernel_size=(3, 3), padding='same'))
model.add(Activation('relu'))
# Pooling
model.add(MaxPooling2D(pool_size=(2, 2), padding='same'))

# Passing it to a dense layer
model.add(Flatten())
# 1st Dense layer
model.add(Dense(256))
model.add(Activation('relu'))
# Output layer: 4 plant-leaf classes
model.add(Dense(4))
model.add(Activation('softmax'))
model.summary()

batch_size = 2
train_generator = datagen.flow_from_directory(
    directory=train_dir,
    target_size=(224, 224),
    batch_size=batch_size,
    class_mode='categorical',
    shuffle=True)
validation_generator = datagen.flow_from_directory(
    directory=test_dir,
    target_size=(224, 224),
    batch_size=batch_size,
    class_mode='categorical',
    shuffle=True)

# categorical cross-entropy matches the 4-class softmax output
model.compile(optimizer=optimizers.Adam(learning_rate=2e-4),
              loss='categorical_crossentropy',
              metrics=['acc'])
# the generators already supply batches, so no batch_size is passed to fit
history = model.fit(train_generator,
                    epochs=20,
                    validation_data=validation_generator)

Output:
Result:
Expt. 11 Implementation of prediction of heart disease with machine learning Date:

Aim:
To implement prediction of heart disease with machine learning algorithm
Theory:
Program:
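
A minimal sketch of one possible solution, assuming a heart-disease CSV (for example the UCI Cleveland data saved as heart.csv) whose column named target is 1 for disease and 0 for no disease:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix

# load the dataset; the features are all columns except the binary 'target' label
data = pd.read_csv('heart.csv')
X = data.drop('target', axis=1).values
y = data['target'].values

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# scale the features and fit a logistic-regression classifier
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)
print('Accuracy :', accuracy_score(y_test, y_pred))
print('Confusion matrix:\n', confusion_matrix(y_test, y_pred))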
Output:

Result:
Expt. 12 Implementation of machine learning project for any real world dataset Date:

Aim:
To implement machine learning algorithm project for any real world dataset
Theory:
Program:
Output:

Result:
