ML Lab Manual Session
Year: 3rd Year
INDEX
S.NO   CONTENTS
1.     VISION / MISSION
2.     PEO
3.     PO's
4.     MAPPING OF PEO & PO
7.     SYLLABUS
8.     BOOKS
9.     INSTRUCTIONAL METHODS
14.    EXPERIMENTS
1. VISION / MISSION
VISION:
To become a renowned centre of excellence in computer science and engineering and produce
competent engineers and professionals with high ethical values, prepared for lifelong learning.
MISSION:
To impart outcome-based education for emerging technologies in the field of computer
science and engineering.
To provide opportunities for interaction between academia and industry.
To provide a platform for lifelong learning by accepting changes in technologies.
To develop the aptitude for fulfilling social responsibilities.
2. PEO
3. PROGRAM OUTCOMES
1. Engineering knowledge: Apply the knowledge of mathematics, science, engineering
fundamentals, and Computer Science & Engineering specialization to the solution of
complex Computer Science & Engineering problems.
2. Problem analysis: Identify, formulate, research literature, and analyze complex
Computer Science and Engineering problems reaching substantiated conclusions
using first principles of mathematics, natural sciences, and engineering sciences.
3. Design/development of solutions: Design solutions for complex Computer Science
and Engineering problems and design system components or processes that meet the
specified needs with appropriate consideration for the public health and safety, and
the cultural, societal, and environmental considerations.
4. Conduct investigations of complex problems: Use research-based knowledge and
research methods including design of Computer Science and Engineering
experiments, analysis and interpretation of data, and synthesis of the information to
provide valid conclusions.
5. Modern tool usage: Create, select, and apply appropriate techniques, resources, and
modern engineering and IT tools including prediction and modeling to complex
Computer Science and Engineering activities with an understanding of the limitations.
6. The engineer and society: Apply reasoning informed by the contextual knowledge to
assess societal, health, safety, legal and cultural issues and the consequent
responsibilities relevant to the professional Computer Science and Engineering
practice.
7. Environment and sustainability: Understand the impact of the professional
Computer Science and Engineering solutions in societal and environmental contexts,
and demonstrate the knowledge of, and need for sustainable development.
8. Ethics: Apply ethical principles and commit to professional ethics and
responsibilities and norms of the Computer Science and Engineering practice.
9. Individual and team work: Function effectively as an individual, and as a member
or leader in diverse teams, and in multidisciplinary settings in Computer Science and
Engineering.
10. Communication: Communicate effectively on complex Computer Science and
Engineering activities with the engineering community and with society at large, such
as, being able to comprehend and write effective reports and design documentation,
make effective presentations, and give and receive clear instructions.
11. Project management and finance: Demonstrate knowledge and understanding of
the Computer Science and Engineering and management principles and apply these to
one’s own work, as a member and leader in a team, to manage projects and in
multidisciplinary environments.
12. Life-long learning: Recognize the need for, and have the preparation and ability to
engage in independent and life-long learning in the broadest context of technological
change in Computer Science and Engineering.
4. MAPPING OF PEO & PO
PEO   PO1  PO2  PO3  PO4  PO5  PO6  PO7  PO8  PO9  PO10  PO11  PO12
I     3    2    1    3    3    2    1    1    2    1     2     2
II    3    3    2    3    3    2    1    1    1    1     2     2
III   2    2    2    2    1    1    2    3    3    2     2     1
IV    2    1    1    1    1    1    2    2    2    2     1     2
V     3    2    1    2    1    2    1    1    2    1     1     1
5. COURSE OUTCOMES
CO1. Understand the implementation procedures for machine learning algorithms.
CO2. Apply appropriate data sets to train and implement learning algorithms.
CO3. Implement machine learning algorithms to solve real-world problems.
MAPPING OF CO & PO
CO-PO Mapping
ML LAB 6CS4-22
CO                                               PO1  PO2  PO3  PO4  PO5  PO6  PO7  PO8  PO9  PO10  PO11  PO12
CO1: Understand the implementation
procedures for machine learning algorithms.      3    3    3    3    3    2    1    1    1    2     3     3
CO2: Apply appropriate data sets to train
and implement learning algorithms.               3    3    3    2    2    1    1    1    1    2     2     3
CO3: Implement machine learning algorithms
to solve real-world problems.                    3    3    3    3    3    2    1    2    2    2     3     3
PSO1: Ability to interpret and analyze network-specific and cyber-security issues, and
automation in a real-world environment.
PSO2: Ability to design and develop mobile and web-based applications under realistic
constraints.
CO-PSO Mapping
CO’s PSO1 PSO2
7. SYLLABUS
6CS4-22 MACHINE LEARNING LAB
Class: VI Sem. B.Tech.                            Evaluation:
Branch: Computer Science & Engineering            Examination Time = Two (2) Hours
Schedule (Per Week Practical Hrs.): Three (3)     Maximum Marks = 75
                                                  [Sessional/Mid-term (45) & End-term (30)]
Objectives: At the end of the semester, the students should have clearly
understood and implemented the following:
List of exercises:
1. Implement and demonstrate the FIND-S algorithm for finding the most specific
hypothesis based on a given set of training data samples. Read the training data from a .CSV
file
2. For a given set of training data examples stored in a .CSV file, implement and
demonstrate the Candidate-Elimination algorithm to output a description of the set of all
hypotheses consistent with the training examples.
3. Write a program to demonstrate the working of the decision tree based ID3 algorithm.
Use an appropriate data set for building the decision tree and apply this knowledge to classify
a new sample
4. Build an Artificial Neural Network by implementing the Backpropagation algorithm and
test the same using appropriate data sets.
5. Write a program to implement the naïve Bayesian classifier for a sample training data set
stored as a .CSV file. Compute the accuracy of the classifier, considering a few test data sets.
6. Assuming a set of documents that need to be classified, use the naïve Bayesian Classifier
model to perform this task. Built-in Java classes/API can be used to write the program.
Calculate the accuracy, precision, and recall for your data set.
7. Write a program to construct a Bayesian network considering medical data. Use this
model to demonstrate the diagnosis of heart patients using standard Heart Disease Data Set.
You can use Java/Python ML library classes/API.
8. Apply EM algorithm to cluster a set of data stored in a .CSV file. Use the same data set
for clustering using k-Means algorithm. Compare the results of these two algorithms and
comment on the quality of clustering. You can add Java/Python ML library classes/API in
the program.
9. Write a program to implement k-Nearest Neighbour algorithm to classify the iris data set.
Print both correct and wrong predictions. Java/Python ML library classes can be used for this
problem.
10. Implement the non-parametric Locally Weighted Regression algorithm in order to fit data
points. Select appropriate data set for your experiment and draw graphs.
Outcomes:
At the end of the semester, the students should have clearly understood and implemented the
following:
• Write programs in Python.
• Implement machine learning algorithms.
8. BOOKS
9. INSTRUCTIONAL METHODS:-
9.1. Direct Instructions:
I. White board presentation
9.3.Indirect Instructions:
I. Problem solving
DON'TS
1. No one is allowed to bring storage devices like pen drives, floppies, etc. into the lab.
2. Don't mishandle the system.
3. Don't leave the system running unattended for long.
4. Don't bring any external material into the lab.
5. Don't make noise in the lab.
6. Don't bring mobile phones into the lab. If extremely necessary, keep the ringer off.
7. Don't enter the lab without the permission of the lab in-charge.
8. Don't litter in the lab.
9. Don't delete or modify system files.
10. Don't carry any lab equipment outside the lab.
We need your full support and cooperation for the smooth functioning of the lab.
All students are expected to prepare the theory for the next program in advance.
Students must bring the practical file and the lab copy.
Previous programs should be written in the practical file.
Any student not following these instructions will be denied entry to the lab.
14. EXPERIMENTS
EXPERIMENT 1
AIM: Implement and demonstrate the FIND-S algorithm for finding the most specific hypothesis
based on a given set of training data samples. Read the training data from a .CSV file
FIND-S Algorithm:
1. Load the data set.
2. Initialize h to the most specific hypothesis in H.
3. For each positive training instance x:
   • For each attribute constraint ai in h:
     if the constraint ai in h is satisfied by x, then do nothing;
     else replace ai in h by the next more general constraint that is satisfied by x.
4. Output hypothesis h.
Source Code:
import csv

with open('tennis.csv', 'r') as f:
    reader = csv.reader(f)
    your_list = list(reader)

# Start from the most specific hypothesis: every attribute is '0'
h = [['0', '0', '0', '0', '0', '0']]

for i in your_list:
    print(i)
    if i[-1] == "True":           # consider positive examples only
        j = 0
        for x in i:
            if x != "True":       # skip the class label itself
                if x != h[0][j] and h[0][j] == '0':
                    h[0][j] = x   # first positive example: copy the value
                elif x != h[0][j] and h[0][j] != '0':
                    h[0][j] = '?' # conflicting values: generalize
                j = j + 1

print("Most specific hypothesis:", h)
Output:
EXPERIMENT 2
AIM: For a given set of training data examples stored in a .CSV file, implement and demonstrate the
Candidate-Elimination algorithm to output a description of the set of all hypotheses consistent with
the training examples.
Candidate-Elimination Algorithm:
1. Initialize the specific boundary S to the first positive training example and the general boundary G to the maximally general hypothesis (all '?').
2. For each positive example, remove from G any hypothesis inconsistent with the example, and minimally generalize S until it covers the example.
3. For each negative example, remove from S any hypothesis inconsistent with the example, and minimally specialize the members of G so that they exclude the example.
4. Output the version space bounded by S and G.
Source Code:
import numpy as np
import pandas as pd

data = pd.DataFrame(data=pd.read_csv('finds1.csv'))
concepts = np.array(data.iloc[:, 0:-1])
target = np.array(data.iloc[:, -1])

def learn(concepts, target):
    specific_h = concepts[0].copy()
    print("initialization of specific_h and general_h")
    print(specific_h)
    general_h = [["?" for i in range(len(specific_h))] for i in range(len(specific_h))]
    print(general_h)
    for i, h in enumerate(concepts):
        if target[i] == "Yes":       # positive example: generalize specific_h
            for x in range(len(specific_h)):
                if h[x] != specific_h[x]:
                    specific_h[x] = '?'
                    general_h[x][x] = '?'
        if target[i] == "No":        # negative example: specialize general_h
            for x in range(len(specific_h)):
                if h[x] != specific_h[x]:
                    general_h[x][x] = specific_h[x]
                else:
                    general_h[x][x] = '?'
        print("steps of Candidate Elimination Algorithm", i + 1)
        print("Specific_h", i + 1, "\n")
        print(specific_h)
        print("general_h", i + 1, "\n")
        print(general_h)
    # Drop rows of general_h that stayed fully general
    indices = [i for i, val in enumerate(general_h) if val == ['?', '?', '?', '?', '?', '?']]
    for i in indices:
        general_h.remove(['?', '?', '?', '?', '?', '?'])
    return specific_h, general_h

s_final, g_final = learn(concepts, target)
print("Final Specific_h:")
print(s_final)
print("Final General_h:")
print(g_final)
Output:
initialization of specific_h and general_h
['Cloudy' 'Cold' 'High' 'Strong' 'Warm' 'Change']
[['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'],
['?', '?', '?', '?', '?', '?']]
steps of Candidate Elimination Algorithm 8
Specific_h 8
['?' '?' '?' 'Strong' '?' '?']
general_h 8
[['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', 'Strong', '?', '?'], ['?', '?', '?', '?',
'?', '?'], ['?', '?', '?', '?', '?', '?']]
Final Specific_h:
['?' '?' '?' 'Strong' '?' '?']
Final General_h:
[['?', '?', '?', 'Strong', '?', '?']]
EXPERIMENT 3
AIM: Write a program to demonstrate the working of the decision tree based ID3 algorithm. Use an
appropriate data set for building the decision tree and apply this knowledge to classify a new sample
ID3 Algorithm:
The following terminology is used in this algorithm.
Entropy: a measure of impurity. For a binary class with values a/b it is defined as:
Entropy = - p(a)*log2(p(a)) - p(b)*log2(p(b))
Information Gain: the expected reduction in entropy obtained by splitting on attribute A:
Gain(S, A) = Entropy(S) - Σv (|Sv|/|S|) * Entropy(Sv), summed over each value v of A
THE PROCEDURE
1) Begin with the original data set S as the root node.
2) On each iteration, iterate through every unused attribute and calculate its entropy (or information gain).
3) Select the attribute with the smallest entropy (or largest information gain).
4) Split the remaining data by the selected attribute to produce subsets of the data.
5) Recurse on each subset, considering only attributes never selected before.
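As a quick numeric check of the entropy definition above: the standard PlayTennis data set has 9 positive and 5 negative examples at the root, and the snippet below reproduces the usual 0.940-bit figure.

import math

def entropy_binary(pos, neg):
    # Entropy = -p(a)*log2(p(a)) - p(b)*log2(p(b))
    total = pos + neg
    pa, pb = pos / total, neg / total
    return -pa * math.log2(pa) - pb * math.log2(pb)

print(round(entropy_binary(9, 5), 3))  # 0.94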
Source Code:
import math
import numpy as np
from data_loader import read_data   # helper module returning (metadata, traindata) from a CSV

class Node:
    def __init__(self, attribute):
        self.attribute = attribute
        self.children = []
        self.answer = ""

    def __str__(self):
        return self.attribute

def subtables(data, col, delete):
    # Partition the rows of `data` on the distinct values of column `col`
    dict = {}
    items = np.unique(data[:, col])
    count = np.zeros((items.shape[0], 1), dtype=np.int32)
    for x in range(items.shape[0]):
        for y in range(data.shape[0]):
            if data[y, col] == items[x]:
                count[x] += 1
    for x in range(items.shape[0]):
        dict[items[x]] = np.empty((int(count[x]), data.shape[1]), dtype="|S32")
        pos = 0
        for y in range(data.shape[0]):
            if data[y, col] == items[x]:
                dict[items[x]][pos] = data[y]
                pos += 1
        if delete:
            dict[items[x]] = np.delete(dict[items[x]], col, 1)
    return items, dict

def entropy(S):
    items = np.unique(S)
    if items.size == 1:
        return 0
    counts = np.zeros((items.shape[0], 1))
    sums = 0
    for x in range(items.shape[0]):
        counts[x] = sum(S == items[x]) / (S.size * 1.0)
    for count in counts:
        sums += -1 * count * math.log(count, 2)
    return sums

def gain_ratio(data, col):
    items, dict = subtables(data, col, delete=False)
    total_size = data.shape[0]
    entropies = np.zeros((items.shape[0], 1))
    intrinsic = np.zeros((items.shape[0], 1))
    for x in range(items.shape[0]):
        ratio = dict[items[x]].shape[0] / (total_size * 1.0)
        entropies[x] = ratio * entropy(dict[items[x]][:, -1])
        intrinsic[x] = ratio * math.log(ratio, 2)
    total_entropy = entropy(data[:, -1])
    iv = -1 * sum(intrinsic)
    for x in range(entropies.shape[0]):
        total_entropy -= entropies[x]
    return total_entropy / iv

def create_node(data, metadata):
    # TODO: If information gain is zero?
    if (np.unique(data[:, -1])).shape[0] == 1:
        node = Node("")
        node.answer = np.unique(data[:, -1])[0]
        return node
    gains = np.zeros((data.shape[1] - 1, 1))
    for col in range(data.shape[1] - 1):
        gains[col] = gain_ratio(data, col)
    split = np.argmax(gains)
    node = Node(metadata[split])
    metadata = np.delete(metadata, split, 0)
    items, dict = subtables(data, split, delete=True)
    for x in range(items.shape[0]):
        child = create_node(dict[items[x]], metadata)
        node.children.append((items[x], child))
    return node

def empty(size):
    s = ""
    for x in range(size):
        s += "   "
    return s

def print_tree(node, level):
    if node.answer != "":
        print(empty(level), node.answer)
        return
    print(empty(level), node.attribute)
    for value, n in node.children:
        print(empty(level + 1), value)
        print_tree(n, level + 2)

# Driver (the data file name tennis.csv is assumed)
metadata, traindata = read_data("tennis.csv")
data = np.array(traindata)
node = create_node(data, metadata)
print_tree(node, 0)
Output:
outlook
overcast
b'yes'
rain
wind
b'strong'
b'no'
b'weak'
b'yes'
sunny
humidity
b'high'
b'no'
b'normal'
b'yes'
OR
import pandas as pd
import numpy as np

dataset = pd.read_csv('playtennis.csv', names=['outlook', 'temperature', 'humidity', 'wind', 'class'])

def entropy(target_col):
    elements, counts = np.unique(target_col, return_counts=True)
    entropy = np.sum([(-counts[i] / np.sum(counts)) * np.log2(counts[i] / np.sum(counts))
                      for i in range(len(elements))])
    return entropy

def InfoGain(data, split_attribute_name, target_name="class"):
    total_entropy = entropy(data[target_name])
    vals, counts = np.unique(data[split_attribute_name], return_counts=True)
    Weighted_Entropy = np.sum([(counts[i] / np.sum(counts)) *
                               entropy(data.where(data[split_attribute_name] == vals[i]).dropna()[target_name])
                               for i in range(len(vals))])
    Information_Gain = total_entropy - Weighted_Entropy
    return Information_Gain
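# The printed listing never defines ID3(), which the driver below calls.
# A minimal sketch in the same pandas/numpy style; it builds the tree as
# nested dictionaries, matching the Output shown below.
def ID3(data, originaldata, features, target_attribute_name="class", parent_node_class=None):
    # All remaining examples share one class: return it as a leaf
    if len(np.unique(data[target_attribute_name])) <= 1:
        return np.unique(data[target_attribute_name])[0]
    # No examples left: return the majority class of the original data
    elif len(data) == 0:
        values, counts = np.unique(originaldata[target_attribute_name], return_counts=True)
        return values[np.argmax(counts)]
    # No features left to split on: return the parent's majority class
    elif len(features) == 0:
        return parent_node_class
    else:
        values, counts = np.unique(data[target_attribute_name], return_counts=True)
        parent_node_class = values[np.argmax(counts)]
        # Split on the feature with the highest information gain
        item_values = [InfoGain(data, feature, target_attribute_name) for feature in features]
        best_feature = features[np.argmax(item_values)]
        tree = {best_feature: {}}
        features = [f for f in features if f != best_feature]
        for value in np.unique(data[best_feature]):
            sub_data = data.where(data[best_feature] == value).dropna()
            tree[best_feature][value] = ID3(sub_data, originaldata, features,
                                            target_attribute_name, parent_node_class)
        return tree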
tree = ID3(dataset, dataset, dataset.columns[:-1])
print('\nDisplay Tree\n', tree)
Output:
Display Tree
{'outlook': {'Overcast': 'Yes', 'Rain': {'wind': {'Strong': 'No', 'Weak': 'Yes'}}, 'Sunny': {'humidity': {'High':
'No', 'Normal': 'Yes'}}}}
EXPERIMENT 4
AIM: Build an Artificial Neural Network by implementing the Backpropagation algorithm and test
the same using appropriate data sets.
Below is a small contrived dataset that we can use to test out training our neural network.
X1 X2 Y
2.7810836 2.550537003 0
1.465489372 2.362125076 0
3.396561688 4.400293529 0
1.38807019 1.850220317 0
3.06407232 3.005305973 0
7.627531214 2.759262235 1
5.332441248 2.088626775 1
6.922596716 1.77106367 1
8.675418651 -0.242068655 1
7.673756466 3.508563011 1
Below is the complete example. We will use 2 neurons in the hidden layer. It is a binary
classification problem (2 classes), so there will be two neurons in the output layer. The network
will be trained for 20 epochs with a learning rate of 0.5, which is high because we are training
for so few iterations.
Source Code:
import random
from math import exp
from random import seed

# Initialize a network: one hidden layer and one output layer,
# weights drawn uniformly from [-0.5, 0.5]
def initialize_network(n_inputs, n_hidden, n_outputs):
    network = list()
    hidden_layer = [{'weights': [random.uniform(-0.5, 0.5) for i in range(n_inputs + 1)]}
                    for i in range(n_hidden)]
    network.append(hidden_layer)
    output_layer = [{'weights': [random.uniform(-0.5, 0.5) for i in range(n_hidden + 1)]}
                    for i in range(n_outputs)]
    network.append(output_layer)
    return network

# Backpropagate the error and store a 'delta' in each neuron
def backward_propagate_error(network, expected):
    for i in reversed(range(len(network))):
        layer = network[i]
        errors = list()
        if i != len(network) - 1:
            # Hidden layer: error is the delta-weighted sum from the next layer
            for j in range(len(layer)):
                error = 0.0
                for neuron in network[i + 1]:
                    error += (neuron['weights'][j] * neuron['delta'])
                errors.append(error)
        else:
            # Output layer: error is (expected - output)
            for j in range(len(layer)):
                neuron = layer[j]
                errors.append(expected[j] - neuron['output'])
        for j in range(len(layer)):
            neuron = layer[j]
            neuron['delta'] = errors[j] * transfer_derivative(neuron['output'])
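# NOTE: the printed listing omits the helpers that the driver below relies on
# (forward propagation, the sigmoid transfer and its derivative, the weight
# update and the training loop). A minimal sketch, following the same
# conventions as the functions above:
def activate(weights, inputs):
    # Neuron activation; the last weight acts as the bias
    activation = weights[-1]
    for i in range(len(weights) - 1):
        activation += weights[i] * inputs[i]
    return activation

def transfer(activation):
    # Sigmoid transfer function
    return 1.0 / (1.0 + exp(-activation))

def forward_propagate(network, row):
    # Push one input row through every layer; returns the output-layer values
    inputs = row
    for layer in network:
        new_inputs = []
        for neuron in layer:
            neuron['output'] = transfer(activate(neuron['weights'], inputs))
            new_inputs.append(neuron['output'])
        inputs = new_inputs
    return inputs

def transfer_derivative(output):
    # Derivative of the sigmoid, expressed in terms of the neuron output
    return output * (1.0 - output)

def update_weights(network, row, l_rate):
    # Gradient-descent weight update using the stored deltas
    for i in range(len(network)):
        inputs = row[:-1]
        if i != 0:
            inputs = [neuron['output'] for neuron in network[i - 1]]
        for neuron in network[i]:
            for j in range(len(inputs)):
                neuron['weights'][j] += l_rate * neuron['delta'] * inputs[j]
            neuron['weights'][-1] += l_rate * neuron['delta']

def train_network(network, train, l_rate, n_epoch, n_outputs):
    # Train for a fixed number of epochs, printing the summed squared error
    for epoch in range(n_epoch):
        sum_error = 0
        for row in train:
            outputs = forward_propagate(network, row)
            expected = [0 for i in range(n_outputs)]
            expected[row[-1]] = 1
            sum_error += sum([(expected[i] - outputs[i]) ** 2 for i in range(len(outputs))])
            backward_propagate_error(network, expected)
            update_weights(network, row, l_rate)
        print('>epoch=%d, lrate=%.3f, error=%.3f' % (epoch, l_rate, sum_error))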
seed(1)
dataset = [[2.7810836, 2.550537003, 0],
           [1.465489372, 2.362125076, 0],
           [3.396561688, 4.400293529, 0],
           [1.38807019, 1.850220317, 0],
           [3.06407232, 3.005305973, 0],
           [7.627531214, 2.759262235, 1],
           [5.332441248, 2.088626775, 1],
           [6.922596716, 1.77106367, 1],
           [8.675418651, -0.242068655, 1],
           [7.673756466, 3.508563011, 1]]
n_inputs = len(dataset[0]) - 1
n_outputs = len(set([row[-1] for row in dataset]))
network = initialize_network(n_inputs, 2, n_outputs)
train_network(network, dataset, 0.5, 20, n_outputs)
i = 1
for layer in network:
    j = 1
    for sub in layer:
        print("\n Layer[%d] Node[%d]:\n" % (i, j), sub)
        j = j + 1
    i = i + 1
Output:
>epoch=0, lrate=0.500,error=4.763
>epoch=1, lrate=0.500,error=4.558
>epoch=2, lrate=0.500,error=4.316
>epoch=3, lrate=0.500,error=4.035
>epoch=4, lrate=0.500,error=3.733
>epoch=5, lrate=0.500,error=3.428
>epoch=6, lrate=0.500,error=3.132
>epoch=7, lrate=0.500,error=2.850
>epoch=8, lrate=0.500,error=2.588
>epoch=9, lrate=0.500,error=2.348
>epoch=10, lrate=0.500,error=2.128
>epoch=11, lrate=0.500,error=1.931
>epoch=12, lrate=0.500,error=1.753
>epoch=13, lrate=0.500,error=1.595
>epoch=14, lrate=0.500,error=1.454
>epoch=15, lrate=0.500,error=1.329
>epoch=16, lrate=0.500,error=1.218
>epoch=17, lrate=0.500,error=1.120
>epoch=18, lrate=0.500,error=1.033
>epoch=19, lrate=0.500,error=0.956
Layer[1] Node[1]:
{'weights': [-1.435239043819221, 1.8587338175173547, 0.7917644224148094],
'output': 0.029795197360175857, 'delta': -0.006018730117768358}
Layer[1] Node[2]:
Layer[2] Node[1]:
{'weights': [2.223584933362892, 1.2428928053374768, -1.3519548925527454],
'output': 0.23499833662766154, 'delta': -0.042246618795029306}
Layer[2] Node[2]:
{'weights': [-2.509732251870173, -0.5925943219491905, 1.259965727484093],
'output': 0.7543931062537561, 'delta': 0.04550706392557862}
Predict
Making predictions with a trained neural network is easy enough. We have already seen
how to forward-propagate an input pattern to get an output; this is all we need to do to
make a prediction. We can use the output values directly as the probability of a pattern
belonging to each output class. It may be more useful to turn this output back into a crisp
class prediction by selecting the class value with the larger probability; this is also called
the arg max function. Below is a function named predict() that implements this procedure.
It returns the index in the network output that has the largest probability. It assumes that
class values have been converted to integers starting at 0.
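The predict() function itself is missing from the printed listing; a minimal sketch matching the description above, reusing the forward_propagate() helper:

def predict(network, row):
    # Forward-propagate and take the arg max of the output layer
    outputs = forward_propagate(network, row)
    return outputs.index(max(outputs))

for row in dataset:
    prediction = predict(network, row)
    print('Expected=%d, Got=%d' % (row[-1], prediction))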
Output:
Expected=0, Got=0
Expected=0, Got=0
Expected=0, Got=0
Expected=0, Got=0
Expected=0, Got=0
Expected=1, Got=1
Expected=1, Got=1
Expected=1, Got=1
Expected=1, Got=1
Expected=1, Got=1
EXPERIMENT 5
AIM: Write a program to implement the naïve Bayesian classifier for a sample training data set
stored as a .CSV file. Compute the accuracy of the classifier, considering few test data sets.
Bayesian Theorem:
Bayes' rule gives the posterior probability of class Cj given a feature vector X:
P(Cj | X) = P(X | Cj) * P(Cj) / P(X)
Naive Bayes extends this rule with an independence assumption: since the conditional
probabilities of the individual variables are assumed statistically independent given the
class, the likelihood decomposes into a product of per-feature terms:
P(X | Cj) = P(x1 | Cj) * P(x2 | Cj) * ... * P(xn | Cj)
Using Bayes' rule above, we label a new case X with the class Cj that achieves the highest posterior
probability.
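A small worked example with hypothetical numbers: suppose P(C1) = 0.6, P(C2) = 0.4, and for a new case X the likelihoods are P(X | C1) = 0.1 and P(X | C2) = 0.3. The unnormalized posteriors are 0.6 * 0.1 = 0.06 and 0.4 * 0.3 = 0.12, so X is labeled C2 even though C1 is the more common class.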
Naive Bayes can be modeled in several different ways, including normal (Gaussian), lognormal,
gamma and Poisson density functions.
Types:
• Gaussian Naive Bayes: used when the features are continuous and assumed to follow a
normal distribution. For example, in the Iris dataset the features are sepal width, petal
width, sepal length and petal length.
• Multinomial Naive Bayes: used when we have discrete data (e.g. movie ratings ranging
from 1 to 5, where each rating has a certain frequency). In text learning we use the
count of each word to predict the class or label.
• Bernoulli Naive Bayes: used when the features take binary values, where 0 can represent
"word does not occur in the document" and 1 "word occurs in the document".
Source Code:
import csv
import random
import math

# 1. Data Handling
# 1.1 Loading the data from the CSV file of the Pima Indians diabetes dataset
def loadcsv(filename):
    lines = csv.reader(open(filename, "r"))
    dataset = list(lines)
    for i in range(len(dataset)):
        # converting the attributes from strings to floating point numbers
        dataset[i] = [float(x) for x in dataset[i]]
    return dataset

# 1.2 Splitting the data into train and test sets at the given ratio
def splitDataset(dataset, splitRatio):
    trainSize = int(len(dataset) * splitRatio)
    trainSet = []
    copy = list(dataset)
    while len(trainSet) < trainSize:
        index = random.randrange(len(copy))
        trainSet.append(copy.pop(index))
    return [trainSet, copy]

# 2. Summarize Data
# The naive Bayes model is comprised of a summary of the data in the
# training dataset: the mean and the standard deviation for each attribute,
# by class value. This summary is then used when making predictions.
def separateByClass(dataset):
    separated = {}
    for i in range(len(dataset)):
        vector = dataset[i]
        if vector[-1] not in separated:
            separated[vector[-1]] = []
        separated[vector[-1]].append(vector)
    return separated

# The mean is the middle or central tendency of the data, and we will use it
# as the middle of our Gaussian distribution when calculating probabilities.
def mean(numbers):
    return sum(numbers) / float(len(numbers))

# The standard deviation describes the variation or spread of the data, and
# we will use it to characterize the expected spread of each attribute in
# our Gaussian distribution when calculating probabilities.
def stdev(numbers):
    avg = mean(numbers)
    variance = sum([pow(x - avg, 2) for x in numbers]) / float(len(numbers) - 1)
    return math.sqrt(variance)

def summarize(dataset):
    summaries = [(mean(attribute), stdev(attribute)) for attribute in zip(*dataset)]
    del summaries[-1]
    return summaries

def summarizeByClass(dataset):
    separated = separateByClass(dataset)
    summaries = {}
    for classValue, instances in separated.items():
        summaries[classValue] = summarize(instances)
    return summaries
# 3. Make Prediction
# 3.1 Calculate the Gaussian probability density function
def calculateProbability(x, mean, stdev):
    exponent = math.exp(-(math.pow(x - mean, 2) / (2 * math.pow(stdev, 2))))
    return (1 / (math.sqrt(2 * math.pi) * stdev)) * exponent

# 3.2 Calculate class probabilities: multiply the per-attribute densities
def calculateClassProbabilities(summaries, inputVector):
    probabilities = {}
    for classValue, classSummaries in summaries.items():
        probabilities[classValue] = 1
        for i in range(len(classSummaries)):
            mean, stdev = classSummaries[i]
            x = inputVector[i]
            probabilities[classValue] *= calculateProbability(x, mean, stdev)
    return probabilities

# 3.3 Prediction: look for the largest probability and return the associated class
def predict(summaries, inputVector):
    probabilities = calculateClassProbabilities(summaries, inputVector)
    bestLabel, bestProb = None, -1
    for classValue, probability in probabilities.items():
        if bestLabel is None or probability > bestProb:
            bestProb = probability
            bestLabel = classValue
    return bestLabel

# 4. Make Predictions
# Return a prediction for each instance in the test set
def getPredictions(summaries, testSet):
    predictions = []
    for i in range(len(testSet)):
        result = predict(summaries, testSet[i])
        predictions.append(result)
    return predictions

# 5. Computing Accuracy
def getAccuracy(testSet, predictions):
    correct = 0
    for i in range(len(testSet)):
        if testSet[i][-1] == predictions[i]:
            correct += 1
    return (correct / float(len(testSet))) * 100.0
# Main Function
def main():
    filename = 'C:\\Users\\Dr.Thyagaraju\\Desktop\\Data\\pima-indians-diabetes.csv'
    splitRatio = 0.67
    dataset = loadcsv(filename)
    print("\n The Data Set Splitting into Training and Testing \n")
    trainingSet, testSet = splitDataset(dataset, splitRatio)
    # prepare model
    summaries = summarizeByClass(trainingSet)
    print("\n Model Summaries:\n", summaries)
    # test model
    predictions = getPredictions(summaries, testSet)
    print("\nPredictions:\n", predictions)
    accuracy = getAccuracy(testSet, predictions)
    print('Accuracy:', accuracy, '%')

main()
Output:
Model Summaries:
Predictions:
[0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0, 1.0,0.0, 0.0, 0.0, 1.0, 1.0, 1.0,
0.0, 1.0, 0.0, 1.0, 0.0,1.0,1.0, 0.0, 0.0,0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0,1.0, 1.0, 0.0,
0.0, 0.0, 0.0,
1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0,
1.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 1.0,
1.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0,
1.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 0.0, 1.0, 0.0,
0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0,
1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0,
0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0,
0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 0.0,
0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0,
1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 1.0, 0.0,
0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0]
Accuracy: 80.31496062992126 %
EXPERIMENT 6
AIM: Assuming a set of documents that need to be classified, use the naïve Bayesian Classifier model
to perform this task. Built-in Java classes/API can be used to write the program. Calculate the
accuracy, precision, and recall for your data set.
Source Code:
import pandas as pd

msg = pd.read_csv('naivetext1.csv', names=['message', 'label'])
print('The dimensions of the dataset', msg.shape)
msg['labelnum'] = msg.label.map({'pos': 1, 'neg': 0})
X = msg.message
y = msg.labelnum
print(X)
print(y)
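The printed listing stops after preparing X and y; the train/test split, the count-vector features, the classifier and the metrics that produce the output below are missing. A minimal sketch using scikit-learn (the manual permits built-in classes/APIs; the default split proportions are an assumption):

from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn import metrics

# Split the messages into training and test sets
xtrain, xtest, ytrain, ytest = train_test_split(X, y)
print(xtest.shape, xtrain.shape, ytest.shape, ytrain.shape)

# Turn each message into a bag-of-words count vector
count_vect = CountVectorizer()
xtrain_dtm = count_vect.fit_transform(xtrain)
xtest_dtm = count_vect.transform(xtest)

# Train a multinomial naive Bayes classifier and predict the test set
clf = MultinomialNB().fit(xtrain_dtm, ytrain)
predicted = clf.predict(xtest_dtm)

# Accuracy, confusion matrix, recall and precision
print('Accuracy metrics')
print('Accuracy of the classifier is', metrics.accuracy_score(ytest, predicted))
print('Confusion matrix')
print(metrics.confusion_matrix(ytest, predicted))
print('Recall and Precision')
print(metrics.recall_score(ytest, predicted), metrics.precision_score(ytest, predicted))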
Output:
3 1
4 1
5 0
6 0
7 0
8 0
9 0
10 1
11 0
12 1
13 0
14 1
15 0
16 1
17 0
Name: labelnum, dtype: int64
(5,) (13,) (5,) (13,)
Accuracy metrics
Accuracy of the classifier is 0.8
Confusion matrix
[[3 1] [0 1]]
Recall and Precision
1.0 0.5
EXPERIMENT 7
AIM: Write a program to construct a Bayesian network considering medical data. Use this model to
demonstrate the diagnosis of heart patients using standard Heart Disease Data Set. You can use
Java/Python ML library classes/API.
Attribute Information (only 14 of the database's attributes are used):
-- 1. #3 (age)
-- 2. #4 (sex)
-- 3. #9 (cp)
-- 4. #10 (trestbps)
-- 5. #12 (chol)
-- 6. #16 (fbs)
-- 7. #19 (restecg)
-- 8. #32 (thalach)
-- 9. #38 (exang)
-- 10. #40 (oldpeak)
-- 11. #41 (slope)
-- 12. #44 (ca)
-- 13. #51 (thal)
-- 14. #58 (num)
Source Code:
import numpy as np
import urllib
from urllib.request import urlopen
import pandas as pd
from pgmpy.models import BayesianModel
from pgmpy.estimators import MaximumLikelihoodEstimator, BayesianEstimator
from pgmpy.inference import VariableElimination

names = ['age', 'sex', 'cp', 'trestbps', 'chol', 'fbs', 'restecg', 'thalach',
         'exang', 'oldpeak', 'slope', 'ca', 'thal', 'heartdisease']
heartDisease = pd.read_csv('heart.csv', names=names)
heartDisease = heartDisease.replace('?', np.nan)

# The network structure is not shown in the printed listing; the edges below
# are one plausible hand-built structure over these attributes (an assumption,
# not prescribed by the manual)
model = BayesianModel([('age', 'trestbps'), ('age', 'fbs'), ('sex', 'trestbps'),
                       ('exang', 'trestbps'), ('trestbps', 'heartdisease'),
                       ('fbs', 'heartdisease'), ('heartdisease', 'restecg'),
                       ('heartdisease', 'thalach'), ('heartdisease', 'chol')])

# Learn the conditional probability tables from the data
model.fit(heartDisease, estimator=MaximumLikelihoodEstimator)

HeartDisease_infer = VariableElimination(model)
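The query that produces the distribution below is also missing from the listing; a sketch, conditioning on an illustrative evidence value (age = 28 is an assumption, and older pgmpy versions may require indexing the result as q['heartdisease']):

# Query the posterior over heartdisease given the evidence
q = HeartDisease_infer.query(variables=['heartdisease'], evidence={'age': 28})
print(q)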
Output:
╒════════════════╤═════════════════════╕
│ heartdisease   │   phi(heartdisease) │
╞════════════════╪═════════════════════╡
│ heartdisease_0 │              0.5593 │
├────────────────┼─────────────────────┤
│ heartdisease_1 │              0.4407 │
╘════════════════╧═════════════════════╛
EXPERIMENT 8
AIM: Apply EM algorithm to cluster a set of data stored in a .CSV file. Use the same data set for
clustering using k-Means algorithm. Compare the results of these two algorithms and comment on
the quality of clustering. You can add Java/Python ML library classes/API in the program.
K-Means Algorithm:
1. Load the data set.
2. Choose k, the number of clusters, in advance.
3. Select k points at random as initial cluster centers.
4. Assign each object to its closest cluster center according to the Euclidean distance function.
5. Recalculate the centroid (mean) of all objects in each cluster.
6. Repeat steps 4 and 5 until the same points are assigned to each cluster in consecutive rounds.
Source Code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

X = pd.read_csv("kmeansdata.csv")
x1 = X['Distance_Feature'].values
x2 = X['Speeding_Feature'].values
X = np.array(list(zip(x1, x2))).reshape(len(x1), 2)

# Plot the raw dataset
plt.plot()
plt.xlim([0, 100])
plt.ylim([0, 50])
plt.title('Dataset')
plt.scatter(x1, x2)
plt.show()

# Code for EM (Gaussian mixture)
gmm = GaussianMixture(n_components=3)
gmm.fit(X)
em_predictions = gmm.predict(X)
print("\nEM predictions")
print(em_predictions)
print("mean:\n", gmm.means_)
print('\n')
print("Covariances\n", gmm.covariances_)
print(X)
plt.title('Expectation Maximization')
plt.scatter(X[:, 0], X[:, 1], c=em_predictions, s=50)
plt.show()

# Code for k-means
kmeans = KMeans(n_clusters=3)
kmeans.fit(X)
print(kmeans.cluster_centers_)
print(kmeans.labels_)
plt.title('KMEANS')
plt.scatter(X[:, 0], X[:, 1], c=kmeans.labels_, cmap='rainbow')
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1], color='black')
plt.show()
Output:
EM predictions
[0 0 0 1 0 1 1 1 2 1 2 2 1 1 2 1 2 1 0 1 0 1 1]
mean:
[[57.70629058 25.73574491]
[52.12044022 22.46250453]
[46.4364858 39.43288647]]
Covariances
[[[83.51878796 14.926902 ]
[14.926902 2.70846907]]
[[29.95910352 15.83416554]
[15.83416554 67.01175729]]
[[79.34811849 29.55835938]
[29.55835938 18.17157304]]]
[[71.24 28. ]
[52.53 25. ]
[64.54 27. ]
[55.69 22. ]
[54.58 25. ]
[41.91 10. ]
[58.64 20. ]
[52.02 8. ]
[31.25 34. ]
[44.31 19. ]
[49.35 40. ]
[58.07 45. ]
[44.22 22. ]
[55.73 19. ]
[46.63 43. ]
[52.97 32. ]
[46.25 35. ]
[51.55 27. ]
[57.05 26. ]
[58.45 30. ]
[43.42 23. ]
[55.68 37. ]
[55.15 18. ]]
EXPERIMENT 9
AIM: Write a program to implement k-Nearest Neighbour algorithm to classify the iris data set.
Print both correct and wrong predictions. Java/Python ML library classes can be used for this
problem.
Algorithm:
Principle: points (documents) that are close together in feature space are likely to belong to the same class.
(Figure: a test point X classified under (a) 1-nearest-neighbor, (b) 2-nearest-neighbor and (c) 3-nearest-neighbor rules.)
Distance Metrics
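The metric illustrations that followed are not reproduced here. The most common choice, and the one scikit-learn's KNeighborsClassifier uses by default (Minkowski with p = 2), is Euclidean distance; a small illustration with hypothetical points:

import math

# Euclidean distance between two feature vectors
def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Two hypothetical iris-like measurements
print(euclidean([5.1, 3.5, 1.4, 0.2], [4.9, 3.0, 1.4, 0.2]))  # ~0.539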
Source Code:
# The printed listing begins mid-program; the imports, data loading, split and
# classifier construction below are reconstructed (k = 5 is an assumed, typical choice)
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score

dataset = load_iris()
X_train, X_test, y_train, y_test = train_test_split(dataset.data, dataset.target, test_size=0.25)
classifier = KNeighborsClassifier(n_neighbors=5)

classifier.fit(X_train, y_train)
# predict the test results
y_pred = classifier.predict(X_test)
cm = confusion_matrix(y_test, y_pred)
print('Confusion matrix is as follows\n', cm)
print('Accuracy Metrics')
print(classification_report(y_test, y_pred))
print(" correct prediction", accuracy_score(y_test, y_pred))
print(" wrong prediction", (1 - accuracy_score(y_test, y_pred)))
Output:
Accuracy Metrics
precision recall f1-score support
correct prediction 0.9736842105263158
wrong prediction 0.02631578947368418
EXPERIMENT 10
AIM: Implement the non-parametric Locally Weighted Regression algorithm in order to fit data
points. Select appropriate data set for your experiment and draw graphs.
Regression:
Regression is a technique from statistics used to predict values of a desired target quantity
when the target quantity is continuous.
In regression, we seek to identify (or estimate) a continuous variable y associated with a given
input vector x.
y is called the dependent variable.
x is called the independent variable.
Lowess Algorithm: Locally weighted regression is a very powerful non-parametric model used
in statistical learning. Given a dataset X, y, we attempt to find a model parameter β(x) that
minimizes the residual sum of weighted squared errors. The weights are given by a kernel
function (k or w), which can be chosen arbitrarily.
Algorithm:
1. Read the given data sample into X and the curve (linear or nonlinear) into Y.
2. Set the value of the smoothening (free) parameter τ.
3. Set the bias / point-of-interest set x0, which is a subset of X.
4. Determine the weight matrix W using the Gaussian kernel w(x, x0) = exp(-(x - x0)² / (2τ²)), then solve β = (XᵀWX)⁻¹XᵀWy.
5. Prediction = x0 * β (see the sketch below).
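Source Code:
No listing for this experiment survives in the printed copy; a minimal sketch of the algorithm above, assuming a synthetic one-dimensional data set (the bandwidth τ = 0.3, the noisy curve and the test grid are illustrative choices, not prescribed by the manual):

import numpy as np
import matplotlib.pyplot as plt

def local_weighted_fit(x0, X, y, tau):
    # Augment with a bias term so beta carries an intercept
    x0v = np.r_[1, x0]
    Xa = np.c_[np.ones(len(X)), X]
    # Gaussian kernel weights centred on the point of interest x0
    w = np.exp(-np.sum((Xa - x0v) ** 2, axis=1) / (2 * tau * tau))
    W = np.diag(w)
    # beta = (X^T W X)^-1 X^T W y, the weighted least-squares solution
    beta = np.linalg.pinv(Xa.T @ W @ Xa) @ Xa.T @ W @ y
    return x0v @ beta

n = 200
X = np.linspace(-3, 3, n)
y = np.log(np.abs(X ** 2 - 1) + 0.5) + np.random.normal(0, 0.1, n)  # noisy nonlinear curve
domain = np.linspace(-3, 3, 300)
preds = [local_weighted_fit(x0, X, y, tau=0.3) for x0 in domain]

# Draw the data and the locally weighted fit
plt.scatter(X, y, alpha=0.4, label='data')
plt.plot(domain, preds, color='red', label='LWR fit')
plt.legend()
plt.title('Locally Weighted Regression')
plt.show()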