
LAB MANUAL

Lab Name : Machine Learning Lab

Lab Code : 6CS4-22

Branch : Computer Science & Engineering

Year : 3rd Year

Jaipur Engineering College and Research Centre, Jaipur


Department of Computer Science & Engineering
(Rajasthan Technical University, KOTA)
Jaipur Engineering College and Research Centre,
Shri Ram ki Nangal, via Sitapura, RIICO, Jaipur - 302 022.
Academic Year 2020-21

INDEX

S.NO. CONTENTS
1. VISION/MISSION
2. PEO
3. PO's
4. MAPPING OF PEO & PO
5. COURSE OUTCOMES & CO-PO MAPPING
6. PROGRAM SPECIFIC OUTCOMES & PSO-PO MAPPING
7. SYLLABUS
8. BOOKS
9. INSTRUCTIONAL METHODS
10. LEARNING MATERIALS
11. ASSESSMENT OF OUTCOMES
12. OUTCOMES WILL BE ACHIEVED THROUGH
13. DO's, DON'Ts AND INSTRUCTIONS
14. EXPERIMENTS

Exp. 1 (CO1): Implement and demonstrate the FIND-S algorithm for finding the most specific hypothesis based on a given set of training data samples. Read the training data from a .CSV file.

Exp. 2 (CO1): For a given set of training data examples stored in a .CSV file, implement and demonstrate the Candidate-Elimination algorithm to output a description of the set of all hypotheses consistent with the training examples.

Exp. 3 (CO1): Write a program to demonstrate the working of the decision tree based ID3 algorithm. Use an appropriate data set for building the decision tree and apply this knowledge to classify a new sample.

Exp. 4 (CO2): Build an Artificial Neural Network by implementing the Backpropagation algorithm and test the same using appropriate data sets.

Exp. 5 (CO2): Write a program to implement the naïve Bayesian classifier for a sample training data set stored as a .CSV file. Compute the accuracy of the classifier, considering a few test data sets.

Exp. 6 (CO2): Assuming a set of documents that need to be classified, use the naïve Bayesian Classifier model to perform this task. Built-in Java classes/API can be used to write the program. Calculate the accuracy, precision, and recall for your data set.

Exp. 7 (CO3): Write a program to construct a Bayesian network considering medical data. Use this model to demonstrate the diagnosis of heart patients using the standard Heart Disease Data Set. You can use Java/Python ML library classes/API.

Exp. 8 (CO3): Apply the EM algorithm to cluster a set of data stored in a .CSV file. Use the same data set for clustering using the k-Means algorithm. Compare the results of these two algorithms and comment on the quality of clustering. You can add Java/Python ML library classes/API in the program.

Exp. 9 (CO3): Write a program to implement the k-Nearest Neighbour algorithm to classify the iris data set. Print both correct and wrong predictions. Java/Python ML library classes can be used for this problem.

Exp. 10 (CO3): Implement the non-parametric Locally Weighted Regression algorithm in order to fit data points. Select an appropriate data set for your experiment and draw graphs.

JAIPUR ENGINEERING COLLEGE AND RESEARCH CENTRE


Department of Computer Science and Engineering
Branch: Computer Science and Engineering Semester: 6th
Course Name: MACHINE LEARNING LAB Code: 6CS4-22
External Marks: 30 Practical hrs: 3 hrs/week
Internal Marks: 45 Total Marks: 75

1. VISION & MISSION

VISION:
To become a renowned centre of excellence in computer science and engineering and make competent engineers and professionals with high ethical values, prepared for lifelong learning.

MISSION:
 To impart outcome-based education for emerging technologies in the field of computer science and engineering.
 To provide opportunities for interaction between academia and industry.
 To provide a platform for lifelong learning by accepting changes in technologies.
 To develop an aptitude for fulfilling social responsibilities.

2. PEO

1. PEO1: To provide students with the fundamentals of engineering sciences, with more emphasis on Computer Science & Engineering, by way of analyzing and exploiting engineering challenges.
2. PEO2: To train students with good scientific and engineering knowledge so as to comprehend, analyze, design, and create novel products and solutions for real-life problems in Computer Science and Engineering.
3. PEO3: To inculcate a professional and ethical attitude, effective communication skills, teamwork skills, a multidisciplinary approach, entrepreneurial thinking and an ability to relate engineering issues with social issues in Computer Science & Engineering.
4. PEO4: To provide students with an academic environment aware of excellence, leadership, written ethical codes and guidelines, and the self-motivated life-long learning needed for a successful professional career in Computer Science & Engineering.
5. PEO5: To prepare students to excel in industry and higher education by educating them with high moral values and knowledge in Computer Science & Engineering.


3. PROGRAM OUTCOMES
1. Engineering knowledge: Apply the knowledge of mathematics, science, engineering
fundamentals, and Computer Science & Engineering specialization to the solution of
complex Computer Science & Engineering problems.
2. Problem analysis: Identify, formulate, research literature, and analyze complex
Computer Science and Engineering problems reaching substantiated conclusions
using first principles of mathematics, natural sciences, and engineering sciences.
3. Design/development of solutions: Design solutions for complex Computer Science
and Engineering problems and design system components or processes that meet the
specified needs with appropriate consideration for the public health and safety, and
the cultural, societal, and environmental considerations.
4. Conduct investigations of complex problems: Use research-based knowledge and
research methods including design of Computer Science and Engineering
experiments, analysis and interpretation of data, and synthesis of the information to
provide valid conclusions.
5. Modern tool usage: Create, select, and apply appropriate techniques, resources, and
modern engineering and IT tools including prediction and modeling to complex
Computer Science Engineering activities with an understanding of the limitations.
6. The engineer and society: Apply reasoning informed by the contextual knowledge to
assess societal, health, safety, legal and cultural issues and the consequent
responsibilities relevant to the professional Computer Science and Engineering
practice.
7. Environment and sustainability: Understand the impact of the professional
Computer Science and Engineering solutions in societal and environmental contexts,
and demonstrate the knowledge of, and need for sustainable development.
8. Ethics: Apply ethical principles and commit to professional ethics and
responsibilities and norms of the Computer Science and Engineering practice.
9. Individual and team work: Function effectively as an individual, and as a member
or leader in diverse teams, and in multidisciplinary settings in Computer Science and
Engineering.
10. Communication: Communicate effectively on complex Computer Science and
Engineering activities with the engineering community and with society at large, such
as, being able to comprehend and write effective reports and design documentation,
make effective presentations, and give and receive clear instructions.
11. Project management and finance: Demonstrate knowledge and understanding of
the Computer Science and Engineering and management principles and apply these to
one’s own work, as a member and leader in a team, to manage projects and in
multidisciplinary environments.
12. Life-long learning: Recognize the need for, and have the preparation and ability to
engage in independent and life-long learning in the broadest context of technological
change in Computer Science and Engineering.


4. MAPPING OF PEOs & POs

PEO PROGRAM OUTCOMES

1 2 3 4 5 6 7 8 9 10 11 12

I 3 2 1 3 3 2 1 1 2 1 2 2

II 3 3 2 3 3 2 1 1 1 1 2 2

III 2 2 2 2 1 1 2 3 3 2 2 1

IV 2 1 1 1 1 1 2 2 2 2 1 2

V 3 2 1 2 1 2 1 1 2 1 1 1

5. COURSE OUTCOMES

Graduates would be able to:


CO1. Understand the implementation procedures for machine learning algorithms.

CO2. Apply appropriate data sets to train and implement learning algorithms.

CO3. Implement machine learning algorithms to solve real world problems


MAPPING OF CO & PO

CO-PO Mapping
ML LAB 6CS4-22

                                      PO1 PO2 PO3 PO4 PO5 PO6 PO7 PO8 PO9 PO10 PO11 PO12
CO1: Understand the implementation
procedures for machine learning
algorithms.                            3   3   3   3   3   2   1   1   1   2    3    3
CO2: Apply appropriate data sets to
train and implement learning
algorithms.                            3   3   3   2   2   1   1   1   1   2    2    3
CO3: Implement machine learning
algorithms to solve real world
problems.                              3   3   3   3   3   2   1   2   2   2    3    3

6. PROGRAM SPECIFIC OUTCOME (PSO):

PSO1: Ability to interpret and analyze network-specific and cyber security issues, and
automation in a real-world environment.
PSO2: Ability to Design and Develop Mobile and Web-based applications under realistic
constraints.

CO-PSO Mapping

CO's                                                               PSO1  PSO2
CO1: Understand the implementation procedures for machine
learning algorithms.                                                2     2
CO2: Apply appropriate data sets to train and implement
learning algorithms.                                                2     1
CO3: Implement machine learning algorithms to solve real
world problems.                                                     2     2


7. SYLLABUS
6CS4-22 MACHINE LEARNING LAB
Class: VI Sem. B.Tech.                         Evaluation
Branch: Computer Science & Engineering         Examination Time = Two (2) Hours
Schedule (Per Week Practical Hrs.): Three (3)  Maximum Marks = 75
                                               [Sessional/Mid-term (45) & End-term (30)]

Objectives: At the end of the semester, the students should have clearly
understood and implemented the following:

List of exercises:
1. Implement and demonstrate the FIND-S algorithm for finding the most specific
hypothesis based on a given set of training data samples. Read the training data from a .CSV
file

2. For a given set of training data examples stored in a .CSV file, implement and
demonstrate the Candidate-Elimination algorithm to output a description of the set of all
hypotheses consistent with the training examples.

3. Write a program to demonstrate the working of the decision tree based ID3 algorithm.
Use an appropriate data set for building the decision tree and apply this knowledge to classify
a new sample

4. Build an Artificial Neural Network by implementing the Backpropagation algorithm and


test the same using appropriate data sets.

5. Write a program to implement the naïve Bayesian classifier for a sample training data set
stored as a .CSV file. Compute the accuracy of the classifier, considering a few test data sets.

6. Assuming a set of documents that need to be classified, use the naïve Bayesian Classifier
model to perform this task. Built-in Java classes/API can be used to write the program.
Calculate the accuracy, precision, and recall for your data set.

7. Write a program to construct a Bayesian network considering medical data. Use this
model to demonstrate the diagnosis of heart patients using standard Heart Disease Data Set.
You can use Java/Python ML library classes/API.

8. Apply EM algorithm to cluster a set of data stored in a .CSV file. Use the same data set
for clustering using k-Means algorithm. Compare the results of these two algorithms and
comment on the quality of clustering. You can add Java/Python ML library classes/API in
the program.


9. Write a program to implement k-Nearest Neighbour algorithm to classify the iris data set.
Print both correct and wrong predictions. Java/Python ML library classes can be used for this
problem.

10. Implement the non-parametric Locally Weighted Regression algorithm in order to fit data
points. Select appropriate data set for your experiment and draw graphs.

Outcomes:
At the end of the semester, the students should have clearly understood and implemented the
following:
• Perform the programming by writing programs in Python
• Implement and execute machine learning algorithms

8. BOOKS

Text and Reference books

1. Tom Mitchell: Machine Learning, ISBN 0070428077, McGraw Hill, 1997.
2. Shai Shalev-Shwartz and Shai Ben-David: Understanding Machine Learning: From Theory
   to Algorithms, Cambridge University Press, 2014.
3. Sebastian Raschka: Python Machine Learning, ISBN 978-1-78355-513-0, Packt Publishing.
4. NPTEL Video Lectures

9. INSTRUCTIONAL METHODS:-
9.1. Direct Instructions:
I. White board presentation

9.2. Interactive Instruction:


I. Algorithms

9.3. Indirect Instructions:
I. Problem solving

10. LEARNING MATERIALS:-


1. Text/Lab Manual
2. https://www.jecrcfoundation.com/student-corner/lab-videos


11. ASSESSMENT OF OUTCOMES:-


1. End term Practical exam (Conducted by RTU, KOTA)
2. Daily Lab interaction.

12. OUTCOMES WILL BE ACHIEVED THROUGH FOLLOWING:-


1. Lab Teaching (through marker and white board).
2. Discussion on Algorithms.
3. Lab Experiment Execution.

13. INSTRUCTIONS OF LAB


DO's
1. Please switch off your mobile/cell phone before entering the lab.
2. Enter the lab with complete source code and data.
3. Check whether all peripherals are available at your desktop before proceeding with the
program.
4. Intimate the lab in-charge whenever you face difficulty in using the system or in case
software gets corrupted/infected by a virus.
5. Arrange all the peripherals and seats before leaving the lab.
6. Properly shut down the system before leaving the lab.
7. Keep your bag outside in the racks.
8. Enter the lab on time and leave at the proper time.
9. Maintain the decorum of the lab.
10. Utilize lab hours for the corresponding experiment.
11. Get your CD/pen drive checked by the lab in-charge before using it in the lab.

DON'Ts
1. No one is allowed to bring storage devices like pen drives/floppies etc. into the lab.
2. Don't mishandle the system.
3. Don't leave the system running unattended for long.
4. Don't bring any external material into the lab.
5. Don't make noise in the lab.
6. Don't bring mobiles into the lab. If extremely necessary, keep the ringer off.
7. Don't enter the lab without the permission of the lab in-charge.
8. Don't litter in the lab.
9. Don't delete or make any modifications in system files.
10. Don't carry any lab equipment outside the lab.
We need your full support and cooperation for the smooth functioning of the lab.


INSTRUCTIONS FOR STUDENT

BEFORE ENTERING IN THE LAB

 All students are supposed to prepare the theory regarding the next program.
 Students are supposed to bring the practical file and the lab copy.
 Previous programs should be written in the practical file.
 Any student not following these instructions will be denied entry to the lab.

WHILE WORKING IN THE LAB

 Adhere to the experimental schedule as instructed by the lab in-charge.
 Get the previously executed program signed by the instructor.
 Get the output of the current program checked by the instructor in the lab copy.
 Each student should work on his/her assigned computer at each turn of the lab.
 Take responsibility for valuable accessories.
 Concentrate on the assigned practical and do not play games.
 Anyone caught red-handed carrying any lab equipment will face serious
consequences.


14. EXPERIMENTS

EXPERIMENT 1

AIM: Implement and demonstrate the FIND-S algorithm for finding the most specific hypothesis
based on a given set of training data samples. Read the training data from a .CSV file

FIND-S Algorithm:
1. Load Data set
2. Initialize h to the most specific hypothesis in H
3. For each positive training instance x
• For each attribute constraint ai in h
If the constraint ai in h is satisfied by x then do nothing
else replace ai in h by the next more general constraint that is satisfied by x
4. Output hypothesis h

Source Code:

import csv

# read the training examples from the CSV file
with open('tennis.csv', 'r') as f:
    reader = csv.reader(f)
    your_list = list(reader)

# start from the most specific hypothesis
h = [['0', '0', '0', '0', '0', '0']]

for i in your_list:
    print(i)
    if i[-1] == "True":  # consider only positive examples
        j = 0
        for x in i:
            if x != "True":
                if x != h[0][j] and h[0][j] == '0':
                    h[0][j] = x
                elif x != h[0][j] and h[0][j] != '0':
                    h[0][j] = '?'
                else:
                    pass
                j = j + 1

print("Most specific hypothesis is")
print(h)

Output:

'Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same',True


'Sunny', 'Warm', 'High', 'Strong', 'Warm', 'Same',True
'Rainy', 'Cold', 'High', 'Strong', 'Warm', 'Change',False
'Sunny', 'Warm', 'High', 'Strong', 'Cool', 'Change',True

Maximally Specific set


[['Sunny', 'Warm', '?', 'Strong', '?', '?']]
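
Note that the negative (False) row never changes h: FIND-S generalizes only on the three positive rows, and the attributes on which those positives disagree (the third, fifth and sixth) are replaced by '?', giving ['Sunny', 'Warm', '?', 'Strong', '?', '?'].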


EXPERIMENT 2

AIM: For a given set of training data examples stored in a .CSV file, implement and demonstrate the
Candidate-Elimination algorithm to output a description of the set of all hypotheses consistent with
the training examples.
Candidate-Elimination Algorithm:

1. Load data set
2. G <- maximally general hypotheses in H
3. S <- maximally specific hypotheses in H
4. For each training example d = <x, c(x)>
   Case 1: If d is a positive example
     Remove from G any hypothesis that is inconsistent with d
     For each hypothesis s in S that is not consistent with d
       • Remove s from S.
       • Add to S all minimal generalizations h of s such that
         o h is consistent with d
         o some member of G is more general than h
       • Remove from S any hypothesis that is more general than another hypothesis in S
   Case 2: If d is a negative example
     Remove from S any hypothesis that is inconsistent with d
     For each hypothesis g in G that is not consistent with d
       • Remove g from G.
       • Add to G all minimal specializations h of g such that
         o h is consistent with d
         o some member of S is more specific than h
       • Remove from G any hypothesis that is less general than another hypothesis in G

Source Code:

import numpy as np
import pandas as pd

data = pd.DataFrame(data=pd.read_csv('finds1.csv'))
concepts = np.array(data.iloc[:, 0:-1])
target = np.array(data.iloc[:, -1])

def learn(concepts, target):
    specific_h = concepts[0].copy()
    print("initialization of specific_h and general_h")
    print(specific_h)
    general_h = [["?" for i in range(len(specific_h))] for i in range(len(specific_h))]
    print(general_h)
    for i, h in enumerate(concepts):
        if target[i] == "Yes":
            for x in range(len(specific_h)):
                if h[x] != specific_h[x]:
                    specific_h[x] = '?'
                    general_h[x][x] = '?'
        if target[i] == "No":
            for x in range(len(specific_h)):
                if h[x] != specific_h[x]:
                    general_h[x][x] = specific_h[x]
                else:
                    general_h[x][x] = '?'
        print(" steps of Candidate Elimination Algorithm", i + 1)
        print("Specific_h ", i + 1, "\n ")
        print(specific_h)
        print("general_h ", i + 1, "\n ")
        print(general_h)
    indices = [i for i, val in enumerate(general_h) if val == ['?', '?', '?', '?', '?', '?']]
    for i in indices:
        general_h.remove(['?', '?', '?', '?', '?', '?'])
    return specific_h, general_h

s_final, g_final = learn(concepts, target)
print("Final Specific_h:", s_final, sep="\n")
print("Final General_h:", g_final, sep="\n")

Output:
initialization of specific_h and general_h
['Cloudy' 'Cold' 'High' 'Strong' 'Warm' 'Change']
[['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'],
['?', '?', '?', '?', '?', '?']]
steps of Candidate Elimination Algorithm 8
Specific_h 8
['?' '?' '?' 'Strong' '?' '?']
general_h 8
[['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', 'Strong', '?', '?'], ['?', '?', '?', '?',
'?', '?'], ['?', '?', '?', '?', '?', '?']]
Final Specific_h:
['?' '?' '?' 'Strong' '?' '?']
Final General_h:
[['?', '?', '?', 'Strong', '?', '?']]

EXPERIMENT 3

AIM: Write a program to demonstrate the working of the decision tree based ID3 algorithm. Use an
appropriate data set for building the decision tree and apply this knowledge to classify a new sample

ID3 Algorithm:
Following terminologies are used in this algorithm
 Entropy : Entropy is a measure of impurity
It is defined for a binary class with values a/b as:
Entropy = - p(a)*log(p(a)) – p(b)*log(p(b))
 Information Gain : measuring the expected reduction in Entropy
Gain(S,A)= Entropy(S) - Sum for v from 1 to n of (|Sv|/|S|) * Entropy(Sv)
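For example, a collection with 9 positive and 5 negative samples has Entropy = -(9/14)*log2(9/14) - (5/14)*log2(5/14) ≈ 0.940.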
THE PROCEDURE
1) In the ID3 algorithm, begin with the original set of attributes as the root node.
2) On each iteration of the algorithm, iterate through every unused attribute of the remaining set and
calculate the entropy (or information gain) of that attribute.
3) Then, select the attribute which has the smallest entropy (or largest information gain) value.
4) The set of remaining attributes is then split by the selected attribute to produce subsets of the data.
5) The algorithm continues to recurse on each subset, considering only attributes never selected before.

Source Code:

import numpy as np
import math
from data_loader import read_data

class Node:
    def __init__(self, attribute):
        self.attribute = attribute
        self.children = []
        self.answer = ""

    def __str__(self):
        return self.attribute

def subtables(data, col, delete):
    dict = {}
    items = np.unique(data[:, col])
    count = np.zeros((items.shape[0], 1), dtype=np.int32)

    for x in range(items.shape[0]):
        for y in range(data.shape[0]):
            if data[y, col] == items[x]:
                count[x] += 1

    for x in range(items.shape[0]):
        dict[items[x]] = np.empty((int(count[x]), data.shape[1]), dtype="|S32")
        pos = 0
        for y in range(data.shape[0]):
            if data[y, col] == items[x]:
                dict[items[x]][pos] = data[y]
                pos += 1
        if delete:
            dict[items[x]] = np.delete(dict[items[x]], col, 1)

    return items, dict

def entropy(S):
    items = np.unique(S)

    if items.size == 1:
        return 0

    counts = np.zeros((items.shape[0], 1))
    sums = 0

    for x in range(items.shape[0]):
        counts[x] = sum(S == items[x]) / (S.size * 1.0)

    for count in counts:
        sums += -1 * count * math.log(count, 2)
    return sums

def gain_ratio(data, col):
    items, dict = subtables(data, col, delete=False)

    total_size = data.shape[0]
    entropies = np.zeros((items.shape[0], 1))
    intrinsic = np.zeros((items.shape[0], 1))

    for x in range(items.shape[0]):
        ratio = dict[items[x]].shape[0] / (total_size * 1.0)
        entropies[x] = ratio * entropy(dict[items[x]][:, -1])
        intrinsic[x] = ratio * math.log(ratio, 2)

    total_entropy = entropy(data[:, -1])
    iv = -1 * sum(intrinsic)

    for x in range(entropies.shape[0]):
        total_entropy -= entropies[x]

    return total_entropy / iv

def create_node(data, metadata):
    # TODO: If information gain is zero?
    if (np.unique(data[:, -1])).shape[0] == 1:
        node = Node("")
        node.answer = np.unique(data[:, -1])[0]
        return node

    gains = np.zeros((data.shape[1] - 1, 1))

    for col in range(data.shape[1] - 1):
        gains[col] = gain_ratio(data, col)

    split = np.argmax(gains)

    node = Node(metadata[split])
    metadata = np.delete(metadata, split, 0)

    items, dict = subtables(data, split, delete=True)

    for x in range(items.shape[0]):
        child = create_node(dict[items[x]], metadata)
        node.children.append((items[x], child))

    return node

def empty(size):
    s = ""
    for x in range(size):
        s += " "
    return s

def print_tree(node, level):
    if node.answer != "":
        print(empty(level), node.answer)
        return

    print(empty(level), node.attribute)

    for value, n in node.children:
        print(empty(level + 1), value)
        print_tree(n, level + 2)

metadata, traindata = read_data("tennis.data")
data = np.array(traindata)
node = create_node(data, metadata)
print_tree(node, 0)

Output:

outlook
overcast
b'yes'
rain
wind
b'strong'
b'no'
b'weak'
b'yes'
sunny
humidity
b'high'
b'no'
b'normal'
b'yes'

OR

import pandas as pd
import numpy as np

dataset = pd.read_csv('playtennis.csv', names=['outlook', 'temperature', 'humidity', 'wind', 'class'])

def entropy(target_col):
    elements, counts = np.unique(target_col, return_counts=True)
    entropy = np.sum([(-counts[i] / np.sum(counts)) * np.log2(counts[i] / np.sum(counts))
                      for i in range(len(elements))])
    return entropy

def InfoGain(data, split_attribute_name, target_name="class"):
    total_entropy = entropy(data[target_name])
    vals, counts = np.unique(data[split_attribute_name], return_counts=True)
    Weighted_Entropy = np.sum([(counts[i] / np.sum(counts)) *
                               entropy(data.where(data[split_attribute_name] == vals[i]).dropna()[target_name])
                               for i in range(len(vals))])
    Information_Gain = total_entropy - Weighted_Entropy
    return Information_Gain

def ID3(data, originaldata, features, target_attribute_name="class", parent_node_class=None):
    if len(np.unique(data[target_attribute_name])) <= 1:
        return np.unique(data[target_attribute_name])[0]
    elif len(data) == 0:
        return np.unique(originaldata[target_attribute_name])[
            np.argmax(np.unique(originaldata[target_attribute_name], return_counts=True)[1])]
    elif len(features) == 0:
        return parent_node_class
    else:
        parent_node_class = np.unique(data[target_attribute_name])[
            np.argmax(np.unique(data[target_attribute_name], return_counts=True)[1])]
        # Return the information gain values for the features in the dataset
        item_values = [InfoGain(data, feature, target_attribute_name) for feature in features]
        best_feature_index = np.argmax(item_values)
        best_feature = features[best_feature_index]
        tree = {best_feature: {}}
        features = [i for i in features if i != best_feature]
        for value in np.unique(data[best_feature]):
            sub_data = data.where(data[best_feature] == value).dropna()
            subtree = ID3(sub_data, dataset, features, target_attribute_name, parent_node_class)
            tree[best_feature][value] = subtree
        return tree

tree = ID3(dataset, dataset, dataset.columns[:-1])
print(' \nDisplay Tree\n', tree)

Output:

Display Tree
{'outlook': {'Overcast': 'Yes', 'Rain': {'wind': {'Strong': 'No', 'Weak': 'Yes'}}, 'Sunny': {'humidity': {'High':
'No', 'Normal': 'Yes'}}}}


EXPERIMENT 4

AIM: Build an Artificial Neural Network by implementing the Backpropagation algorithm and test
the same using appropriate data sets.

Source Code:

Below is a small contrived dataset that we can use to test out training our neural network.

X1            X2            Y
2.7810836     2.550537003   0
1.465489372   2.362125076   0
3.396561688   4.400293529   0
1.38807019    1.850220317   0
3.06407232    3.005305973   0
7.627531214   2.759262235   1
5.332441248   2.088626775   1
6.922596716   1.77106367    1
8.675418651   -0.242068655  1
7.673756466   3.508563011   1

Below is the complete example. We will use 2 neurons in the hidden layer. It is a binary
classification problem (2 classes), so there will be two neurons in the output layer. The network
will be trained for 20 epochs with a learning rate of 0.5, which is high because we are training
for so few iterations.

import random
from math import exp
from random import seed

# Initialize a network
def initialize_network(n_inputs, n_hidden, n_outputs):
    network = list()
    hidden_layer = [{'weights': [random.uniform(-0.5, 0.5) for i in range(n_inputs + 1)]}
                    for i in range(n_hidden)]
    network.append(hidden_layer)
    output_layer = [{'weights': [random.uniform(-0.5, 0.5) for i in range(n_hidden + 1)]}
                    for i in range(n_outputs)]
    network.append(output_layer)
    return network

# Calculate neuron activation for an input
def activate(weights, inputs):
    activation = weights[-1]
    for i in range(len(weights) - 1):
        activation += weights[i] * inputs[i]
    return activation

# Transfer neuron activation
def transfer(activation):
    return 1.0 / (1.0 + exp(-activation))

# Forward propagate input to a network output
def forward_propagate(network, row):
    inputs = row
    for layer in network:
        new_inputs = []
        for neuron in layer:
            activation = activate(neuron['weights'], inputs)
            neuron['output'] = transfer(activation)
            new_inputs.append(neuron['output'])
        inputs = new_inputs
    return inputs

# Calculate the derivative of a neuron output
def transfer_derivative(output):
    return output * (1.0 - output)

# Backpropagate error and store in neurons
def backward_propagate_error(network, expected):
    for i in reversed(range(len(network))):
        layer = network[i]
        errors = list()
        if i != len(network) - 1:
            for j in range(len(layer)):
                error = 0.0
                for neuron in network[i + 1]:
                    error += (neuron['weights'][j] * neuron['delta'])
                errors.append(error)
        else:
            for j in range(len(layer)):
                neuron = layer[j]
                errors.append(expected[j] - neuron['output'])
        for j in range(len(layer)):
            neuron = layer[j]
            neuron['delta'] = errors[j] * transfer_derivative(neuron['output'])

# Update network weights with error
def update_weights(network, row, l_rate):
    for i in range(len(network)):
        inputs = row[:-1]
        if i != 0:
            inputs = [neuron['output'] for neuron in network[i - 1]]
        for neuron in network[i]:
            for j in range(len(inputs)):
                neuron['weights'][j] += l_rate * neuron['delta'] * inputs[j]
            neuron['weights'][-1] += l_rate * neuron['delta']

# Train a network for a fixed number of epochs
def train_network(network, train, l_rate, n_epoch, n_outputs):
    for epoch in range(n_epoch):
        sum_error = 0
        for row in train:
            outputs = forward_propagate(network, row)
            expected = [0 for i in range(n_outputs)]
            expected[row[-1]] = 1
            sum_error += sum([(expected[i] - outputs[i]) ** 2 for i in range(len(expected))])
            backward_propagate_error(network, expected)
            update_weights(network, row, l_rate)
        print('>epoch=%d, lrate=%.3f, error=%.3f' % (epoch, l_rate, sum_error))

# Test training backprop algorithm
seed(1)
dataset = [[2.7810836, 2.550537003, 0],
           [1.465489372, 2.362125076, 0],
           [3.396561688, 4.400293529, 0],
           [1.38807019, 1.850220317, 0],
           [3.06407232, 3.005305973, 0],
           [7.627531214, 2.759262235, 1],
           [5.332441248, 2.088626775, 1],
           [6.922596716, 1.77106367, 1],
           [8.675418651, -0.242068655, 1],
           [7.673756466, 3.508563011, 1]]
n_inputs = len(dataset[0]) - 1
n_outputs = len(set([row[-1] for row in dataset]))
network = initialize_network(n_inputs, 2, n_outputs)
train_network(network, dataset, 0.5, 20, n_outputs)

# for layer in network:
#     print(layer)

i = 1
for layer in network:
    j = 1
    for sub in layer:
        print("\n Layer[%d] Node[%d]:\n" % (i, j), sub)
        j = j + 1
    i = i + 1

Output:

>epoch=0, lrate=0.500,error=4.763
>epoch=1, lrate=0.500,error=4.558
>epoch=2, lrate=0.500,error=4.316
>epoch=3, lrate=0.500,error=4.035
>epoch=4, lrate=0.500,error=3.733
>epoch=5, lrate=0.500,error=3.428
>epoch=6, lrate=0.500,error=3.132
>epoch=7, lrate=0.500,error=2.850
>epoch=8, lrate=0.500,error=2.588
>epoch=9, lrate=0.500,error=2.348
>epoch=10, lrate=0.500,error=2.128
>epoch=11, lrate=0.500,error=1.931
>epoch=12, lrate=0.500,error=1.753
>epoch=13, lrate=0.500,error=1.595
>epoch=14, lrate=0.500,error=1.454
>epoch=15, lrate=0.500,error=1.329
>epoch=16, lrate=0.500,error=1.218
>epoch=17, lrate=0.500,error=1.120
>epoch=18, lrate=0.500,error=1.033
>epoch=19, lrate=0.500,error=0.956

Layer[1] Node[1]:
{'weights': [-1.435239043819221, 1.8587338175173547, 0.7917644224148094],
'output': 0.029795197360175857, 'delta': -0.006018730117768358}

Layer[1] Node[2]:
{'weights': [-0.7704959899742789, 0.8257894037467045, 0.21154103288579731],
'output': 0.06771641538441577, 'delta': -0.005025585510232048}

Layer[2] Node[1]:
{'weights': [2.223584933362892, 1.2428928053374768, -1.3519548925527454],
'output': 0.23499833662766154, 'delta': -0.042246618795029306}

Layer[2] Node[2]:
{'weights': [-2.509732251870173, -0.5925943219491905, 1.259965727484093],
'output': 0.7543931062537561, 'delta': 0.04550706392557862}

Predict
Making predictions with a trained neural network is easy enough. We have already seen
how to forward-propagate an input pattern to get an output. This is all we need to do to
make a prediction. We can use the output values themselves directly as the probability of
a pattern belonging to each output class. It may be more useful to turn this output back
into a crisp class prediction. We can do this by selecting the class value with the larger
probability. This is also called the arg max function. Below is a function named predict()
that implements this procedure. It returns the index in the network output that has the
largest probability. It assumes that class values have been converted to integers starting at
0.

from math import exp

# Calculate neuron activation for an input
def activate(weights, inputs):
    activation = weights[-1]
    for i in range(len(weights) - 1):
        activation += weights[i] * inputs[i]
    return activation

# Transfer neuron activation
def transfer(activation):
    return 1.0 / (1.0 + exp(-activation))

# Forward propagate input to a network output
def forward_propagate(network, row):
    inputs = row
    for layer in network:
        new_inputs = []
        for neuron in layer:
            activation = activate(neuron['weights'], inputs)
            neuron['output'] = transfer(activation)
            new_inputs.append(neuron['output'])
        inputs = new_inputs
    return inputs

# Make a prediction with a network
def predict(network, row):
    outputs = forward_propagate(network, row)
    return outputs.index(max(outputs))

# Test making predictions with the network
dataset = [[2.7810836, 2.550537003, 0],
           [1.465489372, 2.362125076, 0],
           [3.396561688, 4.400293529, 0],
           [1.38807019, 1.850220317, 0],
           [3.06407232, 3.005305973, 0],
           [7.627531214, 2.759262235, 1],
           [5.332441248, 2.088626775, 1],
           [6.922596716, 1.77106367, 1],
           [8.675418651, -0.242068655, 1],
           [7.673756466, 3.508563011, 1]]
network = [[{'weights': [-1.482313569067226, 1.8308790073202204, 1.078381922048799]},
            {'weights': [0.23244990332399884, 0.3621998343835864, 0.40289821191094327]}],
           [{'weights': [2.5001872433501404, 0.7887233511355132, -1.1026649757805829]},
            {'weights': [-2.429350576245497, 0.8357651039198697, 1.0699217181280656]}]]
for row in dataset:
    prediction = predict(network, row)
    print('Expected=%d, Got=%d' % (row[-1], prediction))

Output:

Expected=0, Got=0
Expected=0, Got=0
Expected=0, Got=0
Expected=0, Got=0
Expected=0, Got=0
Expected=1, Got=1
Expected=1, Got=1
Expected=1, Got=1
Expected=1, Got=1
Expected=1, Got=1


EXPERIMENT 5

AIM: Write a program to implement the naïve Bayesian classifier for a sample training data set
stored as a .CSV file. Compute the accuracy of the classifier, considering a few test data sets.

Bayesian Theorem:

Bayes' rule relates the posterior probability of a class to its likelihood and prior:

P(C | X) = P(X | C) * P(C) / P(X)

Given a set of variables, X = {x1, x2, ..., xd}, we want to construct the posterior probability
for the event Cj among a set of possible outcomes C = {c1, c2, ..., cd}. By Bayes' rule:

P(Cj | x1, x2, ..., xd) = P(x1, x2, ..., xd | Cj) * P(Cj) / P(x1, x2, ..., xd)

Since Naive Bayes assumes that the conditional probabilities of the independent variables
are statistically independent, we can decompose the likelihood into a product of terms:

P(X | Cj) = P(x1 | Cj) * P(x2 | Cj) * ... * P(xd | Cj)

and rewrite the posterior as:

P(Cj | X) ∝ P(Cj) * P(x1 | Cj) * P(x2 | Cj) * ... * P(xd | Cj)

Using Bayes' rule above, we label a new case X with the class level Cj that achieves the highest posterior
probability.


Naive Bayes can be modeled in several different ways, including normal, lognormal, gamma and
Poisson density functions.

Types:

• Gaussian: It is used in classification and it assumes that features follow a normal
distribution. Gaussian Naive Bayes is used in cases when all our features are continuous.
For example, in the Iris dataset the features are sepal width, petal width, sepal length and
petal length.

• Multinomial Naive Bayes: It is used when we have discrete data (e.g. movie ratings ranging
from 1 to 5, as each rating will have a certain frequency to represent). In text learning we have
the count of each word to predict the class or label.

• Bernoulli Naive Bayes: It assumes that all our features are binary, such that they take only two
values. 0s can represent "word does not occur in the document" and 1s "word
occurs in the document".

Source Code:

import csv
import random
import math

# 1. Data Handling
# 1.1 Loading the Data from csv file of Pima Indians diabetes dataset.
def loadcsv(filename):
    lines = csv.reader(open(filename, "r"))
    dataset = list(lines)
    for i in range(len(dataset)):
        # converting the attributes from string to floating point numbers
        dataset[i] = [float(x) for x in dataset[i]]
    return dataset

# 1.2 Splitting the Data set into Training Set
def splitDataset(dataset, splitRatio):
    trainSize = int(len(dataset) * splitRatio)
    trainSet = []
    copy = list(dataset)
    while len(trainSet) < trainSize:
        index = random.randrange(len(copy))  # random index
        trainSet.append(copy.pop(index))
    return [trainSet, copy]

# 2. Summarize Data
# The naive bayes model is comprised of a summary of the data in the training dataset.
# This summary is then used when making predictions.
# It involves the mean and the standard deviation for each attribute, by class value.

# 2.1: Separate Data By Class
# Function to categorize the dataset in terms of classes.
# The function assumes that the last attribute (-1) is the class value.
# The function returns a map of class values to lists of data instances.
def separateByClass(dataset):
    separated = {}
    for i in range(len(dataset)):
        vector = dataset[i]
        if (vector[-1] not in separated):
            separated[vector[-1]] = []
        separated[vector[-1]].append(vector)
    return separated

# The mean is the middle or central tendency of the data,
# and we will use it as the middle of our gaussian distribution
# when calculating probabilities.

# 2.2: Calculate Mean
def mean(numbers):
    return sum(numbers) / float(len(numbers))

# The standard deviation describes the variation of spread of the data,
# and we will use it to characterize the expected spread of each attribute
# in our Gaussian distribution when calculating probabilities.

# 2.3: Calculate Standard Deviation
def stdev(numbers):
    avg = mean(numbers)
    variance = sum([pow(x - avg, 2) for x in numbers]) / float(len(numbers) - 1)
    return math.sqrt(variance)

# 2.4: Summarize Dataset
# Summarize Data Set for a list of instances (for a class value).
# The zip function groups the values for each attribute across our data instances
# into their own lists so that we can compute the mean and standard deviation
# values for the attribute.
def summarize(dataset):
    summaries = [(mean(attribute), stdev(attribute)) for attribute in zip(*dataset)]
    del summaries[-1]
    return summaries

# 2.5: Summarize Attributes By Class
# We can pull it all together by first separating our training dataset into
# instances grouped by class. Then calculate the summaries for each attribute.
def summarizeByClass(dataset):
    separated = separateByClass(dataset)
    summaries = {}
    for classValue, instances in separated.items():
        summaries[classValue] = summarize(instances)
    return summaries

# 3. Make Prediction
# 3.1 Calculate Probability Density Function
def calculateProbability(x, mean, stdev):
    exponent = math.exp(-(math.pow(x - mean, 2) / (2 * math.pow(stdev, 2))))
    return (1 / (math.sqrt(2 * math.pi) * stdev)) * exponent

# 3.2 Calculate Class Probabilities
def calculateClassProbabilities(summaries, inputVector):
    probabilities = {}
    for classValue, classSummaries in summaries.items():
        probabilities[classValue] = 1
        for i in range(len(classSummaries)):
            mean, stdev = classSummaries[i]
            x = inputVector[i]
            probabilities[classValue] *= calculateProbability(x, mean, stdev)
    return probabilities

# 3.3 Prediction: look for the largest probability and return the associated class
def predict(summaries, inputVector):
    probabilities = calculateClassProbabilities(summaries, inputVector)
    bestLabel, bestProb = None, -1
    for classValue, probability in probabilities.items():
        if bestLabel is None or probability > bestProb:
            bestProb = probability
            bestLabel = classValue
    return bestLabel

# 4. Make Predictions
# Function which returns predictions for a list of test instances.
def getPredictions(summaries, testSet):
    predictions = []
    for i in range(len(testSet)):
        result = predict(summaries, testSet[i])
        predictions.append(result)
    return predictions

# 5. Computing Accuracy
def getAccuracy(testSet, predictions):
    correct = 0
    for i in range(len(testSet)):
        if testSet[i][-1] == predictions[i]:
            correct += 1
    return (correct / float(len(testSet))) * 100.0

# Main Function
def main():
    filename = 'C:\\Users\\Dr.Thyagaraju\\Desktop\\Data\\pima-indians-diabetes.csv'
    splitRatio = 0.67
    dataset = loadcsv(filename)

    # print("\n The Data Set :\n", dataset)
    print("\n The length of the Data Set : ", len(dataset))

    print("\n The Data Set Splitting into Training and Testing \n")
    trainingSet, testSet = splitDataset(dataset, splitRatio)

    print('\n Number of Rows in Training Set:{0} rows'.format(len(trainingSet)))
    print('\n Number of Rows in Testing Set:{0} rows'.format(len(testSet)))

    print("\n First Five Rows of Training Set:\n")
    for i in range(0, 5):
        print(trainingSet[i], "\n")

    print("\n First Five Rows of Testing Set:\n")
    for i in range(0, 5):
        print(testSet[i], "\n")

    # prepare model
    summaries = summarizeByClass(trainingSet)
    print("\n Model Summaries:\n", summaries)

    # test model
    predictions = getPredictions(summaries, testSet)
    print("\nPredictions:\n", predictions)

    accuracy = getAccuracy(testSet, predictions)
    print('\n Accuracy: {0}%'.format(accuracy))

main()

Output:

The length of the DataSet: 768

The Data Set Splitting into Training and Testing Number


of Rows in Training Set:514 rows
Number of Rows in Testing Set:254 rows
First Five Rows of Training Set:

[4.0, 116.0, 72.0, 12.0, 87.0, 22.1, 0.463, 37.0, 0.0]


[0.0, 84.0, 64.0, 22.0, 66.0, 35.8, 0.545, 21.0,0.0]
[0.0, 162.0, 76.0, 36.0, 0.0, 49.6, 0.364, 26.0,1.0]
[10.0, 101.0, 86.0, 37.0, 0.0, 45.6, 1.136, 38.0, 1.0]
[5.0, 78.0, 48.0, 0.0, 0.0, 33.7, 0.654, 25.0, 0.0]


First Five Rows of Testing Set:

[1.0, 85.0, 66.0, 29.0, 0.0, 26.6, 0.351, 31.0,0.0]


[8.0, 183.0, 64.0, 0.0, 0.0, 23.3, 0.672, 32.0,1.0]
[4.0, 110.0, 92.0, 0.0, 0.0, 37.6, 0.191, 30.0,0.0]
[10.0, 139.0, 80.0, 0.0, 0.0, 27.1, 1.441, 57.0, 0.0]
[7.0, 100.0, 0.0, 0.0, 0.0, 30.0, 0.484, 32.0, 1.0]

Model Summaries:

{0.0:[(3.3474320241691844, 3.045635385378286), (111.54380664652568,


26.040069054720693), (68.45921450151057, 18.15540652389224),(19.94561933534743,
14.709615608767137), (71.50151057401813, 101.04863439385403),(30.863141993957708,
7.207208162103949), (0.4341842900302116, 0.2960911906946818),(31.613293051359516,
12.100651311117689)],1.0:[(4.469945355191257,3.7369440851983082),(139.387978142076533.
733070931373234),(71.14754098360656,20.694403393963842),(22.92896174863388,18.1519950
92528765),(107.97814207650273,14692526156736633),(35.28633879781422,7.78334226034858
3),(0.5569726775956286,0.394224334398509),(36.78688524590164,11.174610282702282)]}

Predictions:

[0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0, 1.0,0.0, 0.0, 0.0, 1.0, 1.0, 1.0,
0.0, 1.0, 0.0, 1.0, 0.0,1.0,1.0, 0.0, 0.0,0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0,1.0, 1.0, 0.0,
0.0, 0.0, 0.0,
1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0,
1.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 1.0,
1.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0,
1.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 0.0, 1.0, 0.0,
0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0,
1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 0.0,
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0,
0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0,
0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 0.0,
0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0,
1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0,
1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 1.0, 0.0,
0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0]

Accuracy: 80.31496062992126%


EXPERIMENT 6
AIM: Assuming a set of documents that need to be classified, use the naïve Bayesian Classifier model
to perform this task. Built-in Java classes/API can be used to write the program. Calculate the
accuracy, precision, and recall for your data set.

Source Code:

import pandas as pd

msg = pd.read_csv('naivetext1.csv', names=['message', 'label'])
print('The dimensions of the dataset', msg.shape)
msg['labelnum'] = msg.label.map({'pos': 1, 'neg': 0})
X = msg.message
y = msg.labelnum
print(X)
print(y)

from sklearn.model_selection import train_test_split
xtrain, xtest, ytrain, ytest = train_test_split(X, y)
print(xtest.shape)
print(xtrain.shape)
print(ytest.shape)
print(ytrain.shape)

# build the document-term matrix
from sklearn.feature_extraction.text import CountVectorizer
count_vect = CountVectorizer()
xtrain_dtm = count_vect.fit_transform(xtrain)
xtest_dtm = count_vect.transform(xtest)

from sklearn.naive_bayes import MultinomialNB
clf = MultinomialNB().fit(xtrain_dtm, ytrain)
predicted = clf.predict(xtest_dtm)

from sklearn import metrics
print('Accuracy metrics')
print('Accuracy of the classifier is', metrics.accuracy_score(ytest, predicted))
print('Confusion matrix')
print(metrics.confusion_matrix(ytest, predicted))
print('Recall and Precision')
print(metrics.recall_score(ytest, predicted))
print(metrics.precision_score(ytest, predicted))

Output:

The dimensions of the dataset (18, 2)


0 I love this sandwich
1 This is an amazing place
2 I feel very good about these beers
3 This is my best work
4 What an awesome view
5 I do not like this restaurant
6 I am tired of this stuff
7 I can't deal with this
8 He is my sworn enemy
9 My boss is horrible
10 This is an awesome place
11 I do not like the taste of this juice
12 I love to dance
13 I am sick and tired of this place
14 What a great holiday
15 That is a bad locality to stay
16 We will have good fun tomorrow
17 I went to my enemy's house today
Name: message, dtype: object
0     1
1     1
2     1
3     1
4     1
5     0
6     0
7     0
8     0
9     0
10    1
11    0
12    1
13    0
14    1
15    0
16    1
17    0
Name: labelnum, dtype: int64
(5,)
(13,)
(5,)
(13,)
Accuracy metrics
Accuracy of the classifier is 0.8
Confusion matrix
[[3 1]
 [0 1]]
Recall and Precision
1.0
0.5
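
As a quick check on these numbers: with the confusion matrix above (rows = actual class, columns = predicted class), the positive class has TP = 1, FP = 1 and FN = 0, so recall = TP/(TP+FN) = 1/1 = 1.0 and precision = TP/(TP+FP) = 1/2 = 0.5, matching the printed values; accuracy = (3+1)/5 = 0.8.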


EXPERIMENT 7

AIM: Write a program to construct a Bayesian network considering medical data. Use this model to
demonstrate the diagnosis of heart patients using standard Heart Disease Data Set. You can use
Java/Python ML library classes/API.

Attribute Information (only 14 attributes used):
1. #3 (age)
2. #4 (sex)
3. #9 (cp)
4. #10 (trestbps)
5. #12 (chol)
6. #16 (fbs)
7. #19 (restecg)
8. #32 (thalach)
9. #38 (exang)
10. #40 (oldpeak)
11. #41 (slope)
12. #44 (ca)
13. #51 (thal)
14. #58 (num)

Source Code:

import numpy as np
import pandas as pd
from pgmpy.models import BayesianModel
from pgmpy.estimators import MaximumLikelihoodEstimator
from pgmpy.inference import VariableElimination

names = ['age', 'sex', 'cp', 'trestbps', 'chol', 'fbs', 'restecg', 'thalach',
         'exang', 'oldpeak', 'slope', 'ca', 'thal', 'heartdisease']
heartDisease = pd.read_csv('heart.csv', names=names)
heartDisease = heartDisease.replace('?', np.nan)

model = BayesianModel([('age', 'trestbps'), ('age', 'fbs'), ('sex', 'trestbps'),
                       ('exang', 'trestbps'), ('trestbps', 'heartdisease'),
                       ('fbs', 'heartdisease'), ('heartdisease', 'restecg'),
                       ('heartdisease', 'thalach'), ('heartdisease', 'chol')])

model.fit(heartDisease, estimator=MaximumLikelihoodEstimator)
HeartDisease_infer = VariableElimination(model)

q = HeartDisease_infer.query(variables=['heartdisease'], evidence={'age': 37, 'sex': 0})
print(q['heartdisease'])

Output:

╒════════════════╤═════════════════════╕
│ heartdisease   │   phi(heartdisease) │
╞════════════════╪═════════════════════╡
│ heartdisease_0 │              0.5593 │
├────────────────┼─────────────────────┤
│ heartdisease_1 │              0.4407 │
╘════════════════╧═════════════════════╛
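
The table reads as the posterior distribution P(heartdisease | age = 37, sex = 0): the model assigns probability 0.5593 to heartdisease_0 and 0.4407 to heartdisease_1 for this evidence.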


EXPERIMENT 8

AIM: Apply EM algorithm to cluster a set of data stored in a .CSV file. Use the same data set for
clustering using k-Means algorithm. Compare the results of these two algorithms and comment on
the quality of clustering. You can add Java/Python ML library classes/API in the program.

K Means Algorithm :
1. Load Data Sets
2. Clusters the data into k groups where k is predefined.
3. Select k points at random as cluster centers.
4. Assign objects to their closest cluster center according to the Euclidean distance function
5. Calculate the centroid or mean of all objects in each cluster.
6. Repeat steps 3, 4 and 5 until the same points are assigned to each cluster in consecutive rounds.


Source Code:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

X = pd.read_csv("kmeansdata.csv")
x1 = X['Distance_Feature'].values
x2 = X['Speeding_Feature'].values
X = np.array(list(zip(x1, x2))).reshape(len(x1), 2)

# plot the raw data
plt.plot()
plt.xlim([0, 100])
plt.ylim([0, 50])
plt.title('Dataset')
plt.scatter(x1, x2)
plt.show()

# code for EM
gmm = GaussianMixture(n_components=3)
gmm.fit(X)
em_predictions = gmm.predict(X)
print("\nEM predictions")
print(em_predictions)
print("mean:\n", gmm.means_)
print('\n')
print("Covariances\n", gmm.covariances_)
print(X)
plt.title('Expectation Maximization')
plt.scatter(X[:, 0], X[:, 1], c=em_predictions, s=50)
plt.show()

# code for Kmeans
import matplotlib.pyplot as plt1
kmeans = KMeans(n_clusters=3)
kmeans.fit(X)
print(kmeans.cluster_centers_)
print(kmeans.labels_)
plt.title('KMEANS')
plt1.scatter(X[:, 0], X[:, 1], c=kmeans.labels_, cmap='rainbow')
plt1.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1], color='black')
plt1.show()

Output:


EM predictions
[0 0 0 1 0 1 1 1 2 1 2 2 1 1 2 1 2 1 0 1 0 1 1]

mean:
[[57.70629058 25.73574491]
[52.12044022 22.46250453]
[46.4364858 39.43288647]]

Covariances
[[[83.51878796 14.926902 ]
[14.926902 2.70846907]]
[[29.95910352 15.83416554]
[15.83416554 67.01175729]]
[[79.34811849 29.55835938]
[29.55835938 18.17157304]]]
[[71.24 28. ]
[52.53 25. ]
[64.54 27. ]
[55.69 22. ]
[54.58 25. ]
[41.91 10. ]
[58.64 20. ]
[52.02 8. ]
[31.25 34. ]
[44.31 19. ]
[49.35 40. ]
[58.07 45. ]
[44.22 22. ]
[55.73 19. ]
[46.63 43. ]
[52.97 32. ]
[46.25 35. ]
[51.55 27. ]
[57.05 26. ]
[58.45 30. ]
[43.42 23. ]
[55.68 37. ]
[55.15 18. ]]

Centroids and predictions

[[57.74090909 24.27272727]
[48.6 38. ]
[45.176 16.4 ]]
[0 0 0 0 0 2 0 2 1 2 1 1 2 0 1 1 1 0 0 0 2 1 0]
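
The listing above only prints and plots the two clusterings; to comment on quality numerically, one option (a sketch assuming the same X, kmeans and em_predictions objects from the program above) is to compare silhouette scores:

from sklearn.metrics import silhouette_score

# silhouette ranges from -1 to 1; higher means tighter, better-separated clusters
print("k-Means silhouette :", silhouette_score(X, kmeans.labels_))
print("EM (GMM) silhouette:", silhouette_score(X, em_predictions))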


EXPERIMENT 9

AIM: Write a program to implement k-Nearest Neighbour algorithm to classify the iris data set.
Print both correct and wrong predictions. Java/Python ML library classes can be used for this
problem.

Algorithm:

Principle: points (documents) that are close in the space belong to the same class.

Definition of Nearest Neighbor:

[Figure: the k nearest neighbors of a query point x for (a) 1-nearest neighbor, (b) 2-nearest neighbor and (c) 3-nearest neighbor]


Distance Metrics

Commonly used distance metrics for k-NN are:
 Euclidean distance: d(x, y) = sqrt(sum over k of (xk - yk)^2)
 Manhattan distance: d(x, y) = sum over k of |xk - yk|
 Minkowski distance: d(x, y) = (sum over k of |xk - yk|^p)^(1/p), which reduces to the Euclidean distance for p = 2


Source Code:

from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
import pandas as pd

dataset = pd.read_csv("iris.csv")
# the first four columns are the features, the last column is the class label
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0, test_size=0.25)
classifier = KNeighborsClassifier(n_neighbors=8, p=3, metric='euclidean')
classifier.fit(X_train, y_train)

# predict the test results
y_pred = classifier.predict(X_test)
cm = confusion_matrix(y_test, y_pred)
print('Confusion matrix is as follows\n', cm)
print('Accuracy Metrics')
print(classification_report(y_test, y_pred))
print(" correct prediction", accuracy_score(y_test, y_pred))
print(" wrong prediction", (1 - accuracy_score(y_test, y_pred)))

Output:

Confusion matrix is as follows


[[13 0 0]
[ 0 15 1]
[ 0 0 9]]

Accuracy Metrics
precision recall f1-score support

Iris-setosa 1.00 1.00 1.00 13

Iris-versicolor 1.00 0.94 0.97 16

Iris-virginica 0.90 1.00 0.95 9

avg / total 0.98 0.97 0.97 38

correct prediction 0.9736842105263158

wrong prediction 0.02631578947368418


EXPERIMENT 10

AIM: Implement the non-parametric Locally Weighted Regression algorithm in order to fit data
points. Select appropriate data set for your experiment and draw graphs.

Regression:
 Regression is a technique from statistics that is used to predict values of a desired target quantity
when the target quantity is continuous.
 In regression, we seek to identify (or estimate) a continuous variable y associated with a given input
vector x.
 y is called the dependent variable.
 x is called the independent variable.

Loess/Lowess Regression: Loess regression is a nonparametric technique that uses local
weighted regression to fit a smooth curve through points in a scatter plot.

Lowess Algorithm: Locally weighted regression is a very powerful non-parametric model used
in statistical learning. Given a dataset X, y, we attempt to find a model parameter β(x) that
minimizes the residual sum of weighted squared errors. The weights are given by a kernel function
(k or w) which can be chosen arbitrarily.


Algorithm

 Read the given data sample to X and the curve (linear or non-linear) to Y
 Set the value for the smoothening parameter or free parameter, say τ
 Set the bias / point of interest set x0, which is a subset of X
 Determine the weight matrix W using a kernel such as:
   w(x, x0) = exp(-(x - x0)^2 / (2τ^2)), placed along the diagonal of W
 Determine the value of the model term parameter β using:
   β = (X^T W X)^(-1) X^T W y
 Prediction = x0 * β
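
Unlike the earlier experiments, no reference listing is provided here; the following is a minimal sketch of the algorithm above (the noisy sine data and τ = 0.5 are assumptions chosen for illustration):

import numpy as np
import matplotlib.pyplot as plt

def local_weighted_fit(x0, X, y, tau):
    # augment with a bias term so the local model is a line
    x0_aug = np.r_[1, x0]
    X_aug = np.c_[np.ones(len(X)), X]
    # diagonal weight matrix: points near x0 get weight close to 1
    W = np.diag(np.exp(-(X - x0) ** 2 / (2 * tau ** 2)))
    # beta = (X^T W X)^(-1) X^T W y
    beta = np.linalg.pinv(X_aug.T @ W @ X_aug) @ X_aug.T @ W @ y
    return x0_aug @ beta

# assumed demo data: a noisy sine curve
np.random.seed(0)
X = np.linspace(-3, 3, 100)
y = np.sin(X) + 0.1 * np.random.randn(100)

tau = 0.5  # assumed smoothening parameter
domain = np.linspace(-3, 3, 200)
preds = [local_weighted_fit(x0, X, y, tau) for x0 in domain]

plt.scatter(X, y, s=10, label='data')
plt.plot(domain, preds, color='red', label='LWR fit (tau=0.5)')
plt.legend()
plt.show()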

