
Academic Year 2021–2022 (ODD Semester)

LAB MANUAL

ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING


LABORATORY
18CSL76
VII Semester CS

Prepared By
A Rosline Mary, S Suma
Asst. Professor
Dept. of CSE


Vision and Mission of the Institute


Vision
To become a leading institute for quality technical education and research with ethical values.
Mission
M1: To continually improve a quality education system that produces thinking engineers having good technical capabilities with human values.
M2: To nurture a good ecosystem that encourages faculty and students to engage in meaningful research and development.
M3: To strengthen the industry-institute interface for promoting teamwork, internships and entrepreneurship.
M4: To enhance educational opportunities for the rural and weaker sections of society, equipping them with practical skills to face the challenges of life.

Vision and Mission of the Department


Vision
To become a leading department engaged in quality education and research in the field of computer science
and engineering.

Mission
M1: To nurture a positive environment with state-of-the-art facilities conducive to deep learning and meaningful research and development.

M2: To enhance interaction with industry for promoting collaborative research in emerging technologies.
M3: To strengthen the learning experiences enabling the students to become ethical professionals with good
interpersonal skills, capable of working effectively in multi-disciplinary teams.


PROGRAM EDUCATIONAL OBJECTIVES (PEOS)


PEO 1: Successful and ethical professionals in IT and ITES industries contributing to societal progress.
PEO 2: Engaged in life-long learning, adapting to changing technological scenarios.
PEO 3: Communicate and work effectively in diverse teams and exhibit leadership qualities.

PROGRAM SPECIFIC OUTCOMES (PSOs)


PSO 1: Analyze, design, implement and test innovative application software systems to meet the specified
requirements.
PSO 2: Understand and use systems software packages.
PSO 3: Understand the organization and architecture of digital computers, embedded systems and computer
networks.


SYLLABUS
ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING LABORATORY
[As per Choice Based Credit System (CBCS) scheme]
(Effective from the academic year 2018 -2019)
SEMESTER – VII
Subject Code: 18CSL76 IA Marks: 40
Number of Lecture Hours/Week: 01I + 02P Exam Marks: 60
Total Number of Lecture Hours: 36 Exam Hours: 03
CREDITS – 02
Course objectives: This course will enable students to

 Implement and evaluate AI and ML algorithms in Java and Python programming languages.


PART - A

Description:

 The programs can be implemented in either JAVA or Python.


 For Problems 1 to 6 and 10, programs are to be developed without using the built-in classes or
APIs of Java/Python.
 Data sets can be taken from standard repositories
(https://archive.ics.uci.edu/ml/datasets.html) or constructed by the students.

Lab Experiments:
 Implement A* Search algorithm.
 Implement AO* Search algorithm.
 For a given set of training data examples stored in a .CSV file, implement and demonstrate the
Candidate-Elimination algorithm to output a description of the set of all hypotheses consistent with
the training examples.
 Write a program to demonstrate the working of the decision tree based ID3 algorithm. Use an
appropriate data set for building the decision tree and apply this knowledge to classify a new
sample.
 Build an Artificial Neural Network by implementing the Backpropagation algorithm and test the
same using appropriate data sets.
 Write a program to implement the naïve Bayesian classifier for a sample training data set stored as
a .CSV file. Compute the accuracy of the classifier, considering few test data sets.


 Apply EM algorithm to cluster a set of data stored in a .CSV file. Use the same data set for
clustering using k-Means algorithm. Compare the results of these two algorithms and comment on
the quality of clustering. You can add Java/Python ML library classes/API in the program.
 Write a program to implement k-Nearest Neighbour algorithm to classify the iris data set. Print
both correct and wrong predictions. Java/Python ML library classes can be used for this problem.
 Implement the non-parametric Locally Weighted Regression algorithm to fit data points. Select
appropriate data set for your experiment and draw graphs.

Conduction of Practical Examination:


 All laboratory experiments are to be included for practical examination.
 Students are allowed to pick one experiment from the lot.
 Strictly follow the instructions as printed on the cover page of the answer script.
 Marks distribution: Procedure + Execution + Viva-Voce: 15+70+15 = 100 Marks
Change of experiment is allowed only once, and the marks allotted to the procedure part will be made
zero.

Course outcomes: The students should be able to:


 Implement and demonstrate AI and ML algorithms.
 Evaluate different algorithms.


Table of contents

Exp. No.  Experiment
1  Implement A* Search algorithm.
2  Implement AO* Search algorithm.
3  For a given set of training data examples stored in a .CSV file, implement and demonstrate the Candidate-Elimination algorithm to output a description of the set of all hypotheses consistent with the training examples.
4  Write a program to demonstrate the working of the decision tree based ID3 algorithm. Use an appropriate data set for building the decision tree and apply this knowledge to classify a new sample.
5  Build an Artificial Neural Network by implementing the Backpropagation algorithm and test the same using appropriate data sets.
6  Write a program to implement the naïve Bayesian classifier for a sample training data set stored as a .CSV file. Compute the accuracy of the classifier, considering few test data sets.
7  Apply EM algorithm to cluster a set of data stored in a .CSV file. Use the same data set for clustering using k-Means algorithm. Compare the results of these two algorithms and comment on the quality of clustering. You can add Java/Python ML library classes/API in the program.
8  Write a program to implement k-Nearest Neighbour algorithm to classify the iris data set. Print both correct and wrong predictions. Java/Python ML library classes can be used for this problem.
9  Implement the non-parametric Locally Weighted Regression algorithm in order to fit data points. Select appropriate data set for your experiment and draw graphs.


1. Implement A* Search algorithm.


def aStarAlgo(start_node, stop_node):
    open_set = {start_node}      # keeps track of unvisited nodes
    closed_set = set()           # keeps track of visited nodes
    g = {}                       # stores distance from the starting node
    parents = {}                 # parents contains an adjacency map of all nodes
    g[start_node] = 0            # distance of the starting node from itself is zero
    # start_node is the root node, i.e. it has no parent nodes,
    # so start_node is set as its own parent node
    parents[start_node] = start_node

    while len(open_set) > 0:
        n = None
        # the node with the lowest f() = g() + h() is found
        for v in open_set:
            if n == None or g[v] + heuristic(v) < g[n] + heuristic(n):
                n = v
        if n == stop_node or Graph_nodes[n] == None:
            pass
        else:
            for (m, weight) in get_neighbors(n):
                # nodes 'm' in neither the open nor the closed set are added to open,
                # and n is set as their parent
                if m not in open_set and m not in closed_set:
                    open_set.add(m)
                    parents[m] = n
                    g[m] = g[n] + weight
                # for each node m, compare its distance from start, i.e. g(m),
                # to the distance from start through node n
                else:
                    if g[m] > g[n] + weight:
                        # update g(m)
                        g[m] = g[n] + weight
                        # change parent of m to n
                        parents[m] = n
                        # if m is in the closed set, remove it and add it to open
                        if m in closed_set:
                            closed_set.remove(m)
                            open_set.add(m)
        if n == None:
            print('Path does not exist!')
            return None
        # if the current node is the stop_node,
        # then we begin reconstructing the path from it to the start_node
        if n == stop_node:
            path = []
            while parents[n] != n:
                path.append(n)
                n = parents[n]
            path.append(start_node)
            path.reverse()
            print('Path found: {}'.format(path))
            return path
        # remove n from the open set and add it to the closed set,
        # because all of its neighbors were inspected
        open_set.remove(n)
        closed_set.add(n)
    print('Path does not exist!')
    return None

# define a function to return the neighbors of the passed node
# together with their distances
def get_neighbors(v):
    if v in Graph_nodes:
        return Graph_nodes[v]
    else:
        return None

#INPUT 1
# for simplicity we'll consider the heuristic distances as given;
# this function returns the heuristic distance for all nodes
def heuristic(n):
    H_dist = {
        'A': 11,
        'B': 6,
        'C': 99,
        'D': 1,
        'E': 7,
        'G': 0,
    }
    return H_dist[n]

# describe your graph here
Graph_nodes = {
    'A': [('B', 2), ('E', 3)],
    'B': [('C', 1), ('G', 9)],
    'C': None,
    'E': [('D', 6)],
    'D': [('G', 1)],
}
aStarAlgo('A', 'G')


OUTPUT:
Path found: ['A', 'E', 'D', 'G']
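As a quick check against the graph above, the returned path A → E → D → G has total cost 3 + 6 + 1 = 10, whereas the alternative A → B → G costs 2 + 9 = 11, so the search correctly prefers the cheaper route even though B initially has the lower f() value.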

#INPUT 2
def heuristic(n):
    H_dist = {
        'A': 11,
        'B': 6,
        'C': 5,
        'D': 7,
        'E': 3,
        'F': 6,
        'G': 5,
        'H': 3,
        'I': 1,
        'J': 0
    }
    return H_dist[n]

# describe your graph here
Graph_nodes = {
    'A': [('B', 6), ('F', 3)],
    'B': [('A', 6), ('C', 3), ('D', 2)],
    'C': [('B', 3), ('D', 1), ('E', 5)],
    'D': [('B', 2), ('C', 1), ('E', 8)],
    'E': [('C', 5), ('D', 8), ('I', 5), ('J', 5)],
    'F': [('A', 3), ('G', 1), ('H', 7)],
    'G': [('F', 1), ('I', 3)],
    'H': [('F', 7), ('I', 2)],
    'I': [('E', 5), ('G', 3), ('H', 2), ('J', 3)],
}
aStarAlgo('A', 'J')


OUTPUT:
Path found: ['A', 'F', 'G', 'I', 'J']

2. Implement AO* Search algorithm.

AO STAR SEARCH:
class Graph:
    def __init__(self, graph, heuristicNodeList, startNode):
        # instantiate graph object with graph topology, heuristic values, start node
        self.graph = graph
        self.H = heuristicNodeList
        self.start = startNode
        self.parent = {}
        self.status = {}
        self.solutionGraph = {}

    def applyAOStar(self):          # starts the recursive AO* algorithm
        self.aoStar(self.start, False)

    def getNeighbors(self, v):      # gets the neighbors of a given node
        return self.graph.get(v, '')

    def getStatus(self, v):         # returns the status of a given node
        return self.status.get(v, 0)

    def setStatus(self, v, val):    # sets the status of a given node
        self.status[v] = val

    def getHeuristicNodeValue(self, n):
        return self.H.get(n, 0)     # always return the heuristic value of a given node

    def setHeuristicNodeValue(self, n, value):
        self.H[n] = value           # set the revised heuristic value of a given node

    def printSolution(self):
        print("FOR GRAPH SOLUTION, TRAVERSE THE GRAPH FROM THE START NODE:", self.start)
        print(" ")
        print(self.solutionGraph)
        print(" ")

    def computeMinimumCostChildNodes(self, v):
        # computes the minimum cost of the child nodes of a given node v
        minimumCost = 0
        costToChildNodeListDict = {}
        costToChildNodeListDict[minimumCost] = []
        flag = True
        for nodeInfoTupleList in self.getNeighbors(v):   # iterate over all sets of child node/s
            cost = 0
            nodeList = []
            for c, weight in nodeInfoTupleList:
                cost = cost + self.getHeuristicNodeValue(c) + weight
                nodeList.append(c)
            if flag == True:   # initialize minimum cost with the cost of the first set of child node/s
                minimumCost = cost
                costToChildNodeListDict[minimumCost] = nodeList   # set the minimum cost child node/s
                flag = False
            else:              # compare the current cost with the minimum cost found so far
                if minimumCost > cost:
                    minimumCost = cost
                    costToChildNodeListDict[minimumCost] = nodeList   # set the minimum cost child node/s
        return minimumCost, costToChildNodeListDict[minimumCost]
        # returns the minimum cost and the minimum cost child node/s

    def aoStar(self, v, backTracking):
        # AO* algorithm for a start node and a backTracking status flag
        print("HEURISTIC VALUES :", self.H)
        print("SOLUTION GRAPH :", self.solutionGraph)
        print("PROCESSING NODE :", v)
        print(" ")
        if self.getStatus(v) >= 0:
            # if the status of node v is >= 0, compute the minimum cost nodes of v
            minimumCost, childNodeList = self.computeMinimumCostChildNodes(v)
            print(minimumCost, childNodeList)
            self.setHeuristicNodeValue(v, minimumCost)
            self.setStatus(v, len(childNodeList))
            solved = True   # check whether the minimum cost nodes of v are solved
            for childNode in childNodeList:
                self.parent[childNode] = v
                if self.getStatus(childNode) != -1:
                    solved = solved & False
            if solved == True:
                # if the minimum cost nodes of v are solved, set the current node status to solved (-1)
                self.setStatus(v, -1)
                # update the solution graph with the solved nodes, which may be part of the solution
                self.solutionGraph[v] = childNodeList
            if v != self.start:
                # if the current node is not the start node, backtrack the current node value,
                # calling aoStar on the parent with the backtracking status set to true
                self.aoStar(self.parent[v], True)
            if backTracking == False:   # check that the current call is not for backtracking
                for childNode in childNodeList:      # for each minimum cost child node
                    self.setStatus(childNode, 0)     # set the status of the child node to 0 (needs exploration)
                    # the minimum cost child node is further explored with backtracking status as false
                    self.aoStar(childNode, False)

# for simplicity we'll consider the heuristic distances as given
print("Graph - 1")
h1 = {'A': 1, 'B': 6, 'C': 2, 'D': 12, 'E': 2, 'F': 1, 'G': 5, 'H': 7, 'I': 7, 'J': 1}
graph1 = {
    'A': [[('B', 1), ('C', 1)], [('D', 1)]],
    'B': [[('G', 1)], [('H', 1)]],
    'C': [[('J', 1)]],
    'D': [[('E', 1), ('F', 1)]],
    'G': [[('I', 1)]]
}
G1 = Graph(graph1, h1, 'A')
G1.applyAOStar()
G1.printSolution()
print ("Graph - 2")
h2 = {'A': 1, 'B': 6, 'C': 12, 'D': 10, 'E': 4, 'F': 4, 'G': 5, 'H': 7} # Heuristic values of Nodes
graph2 = { # Graph of Nodes and Edges
'A': [[('B', 1), ('C', 1)], [('D', 1)]], # Neighbors of Node 'A', B, C & D with repective weights
'B': [[('G', 1)], [('H', 1)]], # Neighbors are included in a list of lists
'D': [[('E', 1), ('F', 1)]] # Each sublist indicate a "OR" node or "AND" nodes
}
G2 = Graph(graph2, h2, 'A') # Instantiate Graph object with graph, heuristic values and start Node
G2.applyAOStar() # Run the AO* algorithm
G2.printSolution() # Print the solution graph as output of the AO* algorithm search
OUTPUT:
Graph - 1
HEURISTIC VALUES : {'A': 1, 'B': 6, 'C': 2, 'D': 12, 'E': 2, 'F': 1, 'G': 5, 'H': 7, 'I': 7, 'J': 1}
SOLUTION GRAPH : {}
PROCESSING NODE : A
10 ['B', 'C']
HEURISTIC VALUES : {'A': 10, 'B': 6, 'C': 2, 'D': 12, 'E': 2, 'F': 1, 'G': 5, 'H': 7, 'I': 7, 'J': 1}
SOLUTION GRAPH : {}
PROCESSING NODE : B
6 ['G']
HEURISTIC VALUES : {'A': 10, 'B': 6, 'C': 2, 'D': 12, 'E': 2, 'F': 1, 'G': 5, 'H': 7, 'I': 7, 'J': 1}
SOLUTION GRAPH : {}
PROCESSING NODE : A
10 ['B', 'C']
HEURISTIC VALUES : {'A': 10, 'B': 6, 'C': 2, 'D': 12, 'E': 2, 'F': 1, 'G': 5, 'H': 7, 'I': 7, 'J': 1}
SOLUTION GRAPH : {}


PROCESSING NODE : G
8 ['I']
HEURISTIC VALUES : {'A': 10, 'B': 6, 'C': 2, 'D': 12, 'E': 2, 'F': 1, 'G': 8, 'H': 7, 'I': 7, 'J': 1}
SOLUTION GRAPH : {}
PROCESSING NODE : B
8 ['H']
HEURISTIC VALUES : {'A': 10, 'B': 8, 'C': 2, 'D': 12, 'E': 2, 'F': 1, 'G': 8, 'H': 7, 'I': 7, 'J': 1}
SOLUTION GRAPH : {}
PROCESSING NODE : A
12 ['B', 'C']
HEURISTIC VALUES : {'A': 12, 'B': 8, 'C': 2, 'D': 12, 'E': 2, 'F': 1, 'G': 8, 'H': 7, 'I': 7, 'J': 1}
SOLUTION GRAPH : {}
PROCESSING NODE : I
0 []
HEURISTIC VALUES : {'A': 12, 'B': 8, 'C': 2, 'D': 12, 'E': 2, 'F': 1, 'G': 8, 'H': 7, 'I': 0, 'J': 1}
SOLUTION GRAPH : {'I': []}
PROCESSING NODE : G
1 ['I']
HEURISTIC VALUES : {'A': 12, 'B': 8, 'C': 2, 'D': 12, 'E': 2, 'F': 1, 'G': 1, 'H': 7, 'I': 0, 'J': 1}
SOLUTION GRAPH : {'I': [], 'G': ['I']}
PROCESSING NODE : B
2 ['G']
HEURISTIC VALUES : {'A': 12, 'B': 2, 'C': 2, 'D': 12, 'E': 2, 'F': 1, 'G': 1, 'H': 7, 'I': 0, 'J': 1}
SOLUTION GRAPH : {'I': [], 'G': ['I'], 'B': ['G']}
PROCESSING NODE : A
6 ['B', 'C']
HEURISTIC VALUES : {'A': 6, 'B': 2, 'C': 2, 'D': 12, 'E': 2, 'F': 1, 'G': 1, 'H': 7, 'I': 0, 'J': 1}
SOLUTION GRAPH : {'I': [], 'G': ['I'], 'B': ['G']}
PROCESSING NODE : C
2 ['J']
HEURISTIC VALUES : {'A': 6, 'B': 2, 'C': 2, 'D': 12, 'E': 2, 'F': 1, 'G': 1, 'H': 7, 'I': 0, 'J': 1}
SOLUTION GRAPH : {'I': [], 'G': ['I'], 'B': ['G']}
PROCESSING NODE : A
6 ['B', 'C']
HEURISTIC VALUES : {'A': 6, 'B': 2, 'C': 2, 'D': 12, 'E': 2, 'F': 1, 'G': 1, 'H': 7, 'I': 0, 'J': 1}
SOLUTION GRAPH : {'I': [], 'G': ['I'], 'B': ['G']}
PROCESSING NODE : J
0 []
HEURISTIC VALUES : {'A': 6, 'B': 2, 'C': 2, 'D': 12, 'E': 2, 'F': 1, 'G': 1, 'H': 7, 'I': 0, 'J': 0}
SOLUTION GRAPH : {'I': [], 'G': ['I'], 'B': ['G'], 'J': []}


PROCESSING NODE : C
1 ['J']
HEURISTIC VALUES : {'A': 6, 'B': 2, 'C': 1, 'D': 12, 'E': 2, 'F': 1, 'G': 1, 'H': 7, 'I': 0, 'J': 0}
SOLUTION GRAPH : {'I': [], 'G': ['I'], 'B': ['G'], 'J': [], 'C': ['J']}
PROCESSING NODE : A
5 ['B', 'C']
FOR GRAPH SOLUTION, TRAVERSE THE GRAPH FROM THE START NODE: A
{'I': [], 'G': ['I'], 'B': ['G'], 'J': [], 'C': ['J'], 'A': ['B', 'C']}

INPUT 2
print ("Graph - 2")
h2 = {'A': 1, 'B': 6, 'C': 12, 'D': 10, 'E': 4, 'F': 4, 'G': 5, 'H': 7} # Heuristic values of Nodes
graph2 = { # Graph of Nodes and Edges
'A': [[('B', 1), ('C', 1)], [('D', 1)]], # Neighbors of Node 'A', B, C & D with repective weights
'B': [[('G', 1)], [('H', 1)]], # Neighbors are included in a list of lists
'D': [[('E', 1), ('F', 1)]] # Each sublist indicate a "OR" node or "AND" nodes
}

G2 = Graph(graph2, h2, 'A') # Instantiate Graph object with graph, heuristic values and start Node
G2.applyAOStar() # Run the AO* algorithm
G2.printSolution() # Print the solution graph as output of the AO* algorithm search

OUTPUT:
Graph - 2
HEURISTIC VALUES : {'A': 1, 'B': 6, 'C': 12, 'D': 10, 'E': 4, 'F': 4, 'G': 5, 'H': 7}
SOLUTION GRAPH : {}
PROCESSING NODE : A
11 ['D']
HEURISTIC VALUES : {'A': 11, 'B': 6, 'C': 12, 'D': 10, 'E': 4, 'F': 4, 'G': 5, 'H': 7}
SOLUTION GRAPH : {}
PROCESSING NODE : D
10 ['E', 'F']
HEURISTIC VALUES : {'A': 11, 'B': 6, 'C': 12, 'D': 10, 'E': 4, 'F': 4, 'G': 5, 'H': 7}
SOLUTION GRAPH : {}


PROCESSING NODE : A
11 ['D']
HEURISTIC VALUES : {'A': 11, 'B': 6, 'C': 12, 'D': 10, 'E': 4, 'F': 4, 'G': 5, 'H': 7}
SOLUTION GRAPH : {}
PROCESSING NODE : E
0 []
HEURISTIC VALUES : {'A': 11, 'B': 6, 'C': 12, 'D': 10, 'E': 0, 'F': 4, 'G': 5, 'H': 7}
SOLUTION GRAPH : {'E': []}
PROCESSING NODE : D
6 ['E', 'F']
HEURISTIC VALUES : {'A': 11, 'B': 6, 'C': 12, 'D': 6, 'E': 0, 'F': 4, 'G': 5, 'H': 7}
SOLUTION GRAPH : {'E': []}
PROCESSING NODE : A
7 ['D']
HEURISTIC VALUES : {'A': 7, 'B': 6, 'C': 12, 'D': 6, 'E': 0, 'F': 4, 'G': 5, 'H': 7}
SOLUTION GRAPH : {'E': []}
PROCESSING NODE : F
0 []
HEURISTIC VALUES : {'A': 7, 'B': 6, 'C': 12, 'D': 6, 'E': 0, 'F': 0, 'G': 5, 'H': 7}
SOLUTION GRAPH : {'E': [], 'F': []}
PROCESSING NODE : D
2 ['E', 'F']
HEURISTIC VALUES : {'A': 7, 'B': 6, 'C': 12, 'D': 2, 'E': 0, 'F': 0, 'G': 5, 'H': 7}
SOLUTION GRAPH : {'E': [], 'F': [], 'D': ['E', 'F']}
PROCESSING NODE : A
3 ['D']
FOR GRAPH SOLUTION, TRAVERSE THE GRAPH FROM THE START NODE: A
{'E': [], 'F': [], 'D': ['E', 'F'], 'A': ['D']}
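The final heuristic values above can be verified against the AND/OR cost rule cost(v) = sum over the cheapest child set of (h(child) + edge weight): once E and F are solved (h = 0), the AND arc D → {E, F} costs (0 + 1) + (0 + 1) = 2, and A's best arc A → D then costs 2 + 1 = 3, matching the values printed for D and A.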

3. For a given set of training data examples stored in a .CSV file, implement and demonstrate the Candidate-Elimination algorithm to output a description of the set of all hypotheses consistent with the training examples.
import numpy as np
import pandas as pd

data = pd.read_csv('enjoysport.csv')
concepts = np.array(data.iloc[:, 0:-1])
print(concepts)
target = np.array(data.iloc[:, -1])
print(target)

def learn(concepts, target):
    specific_h = concepts[0].copy()
    print("initialization of specific_h and general_h")
    print(specific_h)
    general_h = [["?" for i in range(len(specific_h))] for i in range(len(specific_h))]
    print(general_h)
    for i, h in enumerate(concepts):
        print("For Loop Starts")
        if target[i] == "yes":
            print("If instance is Positive ")
            for x in range(len(specific_h)):
                if h[x] != specific_h[x]:
                    specific_h[x] = '?'
                    general_h[x][x] = '?'
        if target[i] == "no":
            print("If instance is Negative ")
            for x in range(len(specific_h)):
                if h[x] != specific_h[x]:
                    general_h[x][x] = specific_h[x]
                else:
                    general_h[x][x] = '?'
        print(" steps of Candidate Elimination Algorithm", i + 1)
        print(specific_h)
        print(general_h)
        print("\n")
    # drop fully general hypotheses from the general boundary
    indices = [i for i, val in enumerate(general_h) if val == ['?', '?', '?', '?', '?', '?']]
    for i in indices:
        general_h.remove(['?', '?', '?', '?', '?', '?'])
    return specific_h, general_h

s_final, g_final = learn(concepts, target)
print("Final Specific_h:", s_final, sep="\n")
print("Final General_h:", g_final, sep="\n")

OUTPUT:

[['sunny' 'warm' 'normal' 'strong' 'warm' 'same']


['sunny' 'warm' 'high' 'strong' 'warm' 'same']
['rainy' 'cold' 'high' 'strong' 'warm' 'change']
['sunny' 'warm' 'high' 'strong' 'cool' 'change']]
['yes' 'yes' 'no' 'yes']
initialization of specific_h and general_h
['sunny' 'warm' 'normal' 'strong' 'warm' 'same']
[['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'],
['?', '?', '?', '?', '?', '?']]

For Loop Starts


If instance is Positive
steps of Candidate Elimination Algorithm 1


['sunny' 'warm' 'normal' 'strong' 'warm' 'same']


[['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'],
['?', '?', '?', '?', '?', '?']]

For Loop Starts


If instance is Positive
steps of Candidate Elimination Algorithm 2
['sunny' 'warm' '?' 'strong' 'warm' 'same']
[['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'],
['?', '?', '?', '?', '?', '?']]

For Loop Starts


If instance is Negative
steps of Candidate Elimination Algorithm 3
['sunny' 'warm' '?' 'strong' 'warm' 'same']
[['sunny', '?', '?', '?', '?', '?'], ['?', 'warm', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', 'same']]

For Loop Starts


If instance is Positive
steps of Candidate Elimination Algorithm 4
['sunny' 'warm' '?' 'strong' '?' '?']
[['sunny', '?', '?', '?', '?', '?'], ['?', 'warm', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?', '?']]

Final Specific_h:
['sunny' 'warm' '?' 'strong' '?' '?']
Final General_h:
[['sunny', '?', '?', '?', '?', '?'], ['?', 'warm', '?', '?', '?', '?']]
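Reading the result: the final specific boundary S = ⟨sunny, warm, ?, strong, ?, ?⟩ and general boundary G = {⟨sunny, ?, ?, ?, ?, ?⟩, ⟨?, warm, ?, ?, ?, ?⟩} delimit the version space; every hypothesis at least as general as S and at least as specific as some member of G is consistent with all four training examples.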

4. Write a program to demonstrate the working of the decision tree based ID3 algorithm. Use an
appropriate data set for building the decision tree and apply this knowledge to classify a new
sample.

Import Play Tennis Data


import pandas as pd
from pandas import DataFrame
df_tennis = pd.read_csv('PlayTennis.csv')   # DataFrame.from_csv is deprecated; use pd.read_csv
print("\n Given Play Tennis Data Set:\n\n", df_tennis)
Given Play Tennis Data Set:
PlayTennis Outlook Temperature Humidity Windy


0 NO sunny hot high weak


1 NO sunny hot high strong
2 YES overcast hot high weak
3 YES rainy mild high weak
4 YES rainy cool normal weak
5 NO rainy cool normal strong
6 YES overcast cool normal strong
7 NO sunny mild high weak
8 YES sunny cool normal weak
9 YES rainy mild normal weak
10 YES sunny mild normal strong
11 YES overcast mild high strong
12 YES overcast hot normal weak
13 NO rainy mild high strong

df_tennis.keys()[0]

'PlayTennis'

Entropy of the Training Data Set


# Function to calculate the entropy of a probability distribution of observations
#   entropy = sum(-p * log2(p))
def entropy(probs):
    import math
    return sum([-prob * math.log(prob, 2) for prob in probs])

# Function to calculate the entropy of the given data set/list with respect to the target attribute
def entropy_of_list(a_list):
    from collections import Counter
    cnt = Counter(x for x in a_list)   # Counter calculates the proportion of each class
    print("\nClasses:", cnt)
    num_instances = len(a_list)        # = 14 for the full data set
    print("\n Number of Instances of the Current Sub Class is {0}:".format(num_instances))
    probs = [x / num_instances for x in cnt.values()]   # x is the count of YES/NO
    print(probs)
    print("\n Classes:", list(cnt.keys()))
    print(" \n Probabilities of Class {0} is {1}:".format(min(cnt), min(probs)))
    print(" \n Probabilities of Class {0} is {1}:".format(max(cnt), max(probs)))
    return entropy(probs)   # call entropy

# The initial entropy of the YES/NO attribute for our data set.
print("\n INPUT DATA SET FOR ENTROPY CALCULATION:\n", df_tennis['PlayTennis'])
total_entropy = entropy_of_list(df_tennis['PlayTennis'])
print("\n Total Entropy of PlayTennis Data Set:", total_entropy)
INPUT DATA SET FOR ENTROPY CALCULATION:
0 NO
1 NO
2 YES
3 YES
4 YES
5 NO
6 YES
7 NO
8 YES
9 YES
10 YES
11 YES
12 YES
13 NO
Name: PlayTennis, dtype: object
Classes: Counter({'YES': 9, 'NO': 5})
Number of Instances of the Current Sub Class is 14:
[0.35714285714285715, 0.6428571428571429]
Classes: ['NO', 'YES']
Probabilities of Class NO is 0.35714285714285715:
Probabilities of Class YES is 0.6428571428571429:


Total Entropy of PlayTennis Data Set: 0.9402859586706309
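This value can be checked by hand. With 9 YES and 5 NO examples out of 14:

Entropy(S) = -(9/14) log2(9/14) - (5/14) log2(5/14) ≈ 0.940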

Information Gain of Attributes


def information_gain(df, split_attribute_name, target_attribute_name, trace=0):
    print("Information Gain Calculation of ", split_attribute_name)
    # split data by the possible values of the attribute:
    df_split = df.groupby(split_attribute_name)
    print("split:", type(df_split))
    for name, group in df_split:
        print("Name:\n", name)
        print("Group:\n", group)
    # calculate the entropy for the target attribute, as well as
    # the proportion of observations in each data split
    nobs = len(df.index)
    # define the aggregation based on the target attribute name
    df_agg_ent = df_split.agg({target_attribute_name: [entropy_of_list,
                               lambda x: len(x) / nobs]})[target_attribute_name]
    print(df_agg_ent.columns)
    print("DFAGGENT", df_agg_ent)
    df_agg_ent.columns = ['Entropy', 'PropObservations']
    # calculate information gain:
    new_entropy = sum(df_agg_ent['Entropy'] * df_agg_ent['PropObservations'])
    old_entropy = entropy_of_list(df[target_attribute_name])
    return old_entropy - new_entropy

print('Info-gain for Outlook is :' + str(information_gain(df_tennis, 'Outlook', 'PlayTennis')), "\n")
print('\n Info-gain for Humidity is: ' + str(information_gain(df_tennis, 'Humidity', 'PlayTennis')), "\n")
print('\n Info-gain for Windy is:' + str(information_gain(df_tennis, 'Windy', 'PlayTennis')), "\n")
print('\n Info-gain for Temperature is:' + str(information_gain(df_tennis, 'Temperature', 'PlayTennis')), "\n")
Information Gain Calculation of Outlook
split: <class 'pandas.core.groupby.groupby.DataFrameGroupBy'>
Name:
overcast
Group:
PlayTennis Outlook Temperature Humidity Windy
2 YES overcast hot high weak
6 YES overcast cool normal strong
11 YES overcast mild high strong
12 YES overcast hot normal weak
Name:
rainy
Group:
PlayTennis Outlook Temperature Humidity Windy
3 YES rainy mild high weak
4 YES rainy cool normal weak
5 NO rainy cool normal strong
9 YES rainy mild normal weak
13 NO rainy mild high strong
Name:
sunny
Group:
PlayTennis Outlook Temperature Humidity Windy
0 NO sunny hot high weak
1 NO sunny hot high strong
7 NO sunny mild high weak
8 YES sunny cool normal weak
10 YES sunny mild normal strong
Classes: Counter({'YES': 4})
Number of Instances of the Current Sub Class is 4:
[1.0]


Classes: ['YES']
Probabilities of Class YES is 1.0:
Probabilities of Class YES is 1.0:
Classes: Counter({'YES': 3, 'NO': 2})
Number of Instances of the Current Sub Class is 5:
[0.6, 0.4]
Classes: ['YES', 'NO']
Probabilities of Class NO is 0.4:
Probabilities of Class YES is 0.6:
Classes: Counter({'NO': 3, 'YES': 2})
Number of Instances of the Current Sub Class is 5:
[0.6, 0.4]
Classes: ['NO', 'YES']
Probabilities of Class NO is 0.4:
Probabilities of Class YES is 0.6:
Index(['entropy_of_list', '<lambda>'], dtype='object')
DFAGGENT entropy_of_list <lambda>
Outlook
overcast 0.000000 0.285714
rainy 0.970951 0.357143
sunny 0.970951 0.357143
Classes: Counter({'YES': 9, 'NO': 5})
Number of Instances of the Current Sub Class is 14:
[0.35714285714285715, 0.6428571428571429]
Classes: ['NO', 'YES']
Probabilities of Class NO is 0.35714285714285715:
Probabilities of Class YES is 0.6428571428571429:
Info-gain for Outlook is :0.2467498197744391
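This figure follows directly from the entropies printed above:

Gain(S, Outlook) = Entropy(S) - sum over values v of (|Sv|/|S|) * Entropy(Sv)
                 = 0.9403 - (0.2857*0.0000 + 0.3571*0.9710 + 0.3571*0.9710)
                 ≈ 0.2467

using the per-value entropies and proportions for overcast, rainy and sunny from the DFAGGENT table.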
Information Gain Calculation of Humidity
split: <class 'pandas.core.groupby.groupby.DataFrameGroupBy'>
Name:
high
Group:


PlayTennis Outlook Temperature Humidity Windy


0 NO sunny hot high weak
1 NO sunny hot high strong
2 YES overcast hot high weak
3 YES rainy mild high weak
7 NO sunny mild high weak
11 YES overcast mild high strong
13 NO rainy mild high strong
Name:
normal
Group:
PlayTennis Outlook Temperature Humidity Windy
4 YES rainy cool normal weak
5 NO rainy cool normal strong
6 YES overcast cool normal strong
8 YES sunny cool normal weak
9 YES rainy mild normal weak
10 YES sunny mild normal strong
12 YES overcast hot normal weak
Classes: Counter({'NO': 4, 'YES': 3})
Number of Instances of the Current Sub Class is 7:
[0.5714285714285714, 0.42857142857142855]
Classes: ['NO', 'YES']
Probabilities of Class NO is 0.42857142857142855:
Probabilities of Class YES is 0.5714285714285714:
Classes: Counter({'YES': 6, 'NO': 1})
Number of Instances of the Current Sub Class is 7:
[0.8571428571428571, 0.14285714285714285]
Classes: ['YES', 'NO']
Probabilities of Class NO is 0.14285714285714285:
Probabilities of Class YES is 0.8571428571428571:
Index(['entropy_of_list', '<lambda>'], dtype='object')
DFAGGENT entropy_of_list <lambda>


Humidity
high 0.985228 0.5
normal 0.591673 0.5
Classes: Counter({'YES': 9, 'NO': 5})
Number of Instances of the Current Sub Class is 14:
[0.35714285714285715, 0.6428571428571429]
Classes: ['NO', 'YES']
Probabilities of Class NO is 0.35714285714285715:
Probabilities of Class YES is 0.6428571428571429:
Info-gain for Humidity is: 0.15183550136234136
Information Gain Calculation of Windy
split: <class 'pandas.core.groupby.groupby.DataFrameGroupBy'>
Name:
strong
Group:
PlayTennis Outlook Temperature Humidity Windy
1 NO sunny hot high strong
5 NO rainy cool normal strong
6 YES overcast cool normal strong
10 YES sunny mild normal strong
11 YES overcast mild high strong
13 NO rainy mild high strong
Name:
weak
Group:
PlayTennis Outlook Temperature Humidity Windy
0 NO sunny hot high weak
2 YES overcast hot high weak
3 YES rainy mild high weak
4 YES rainy cool normal weak
7 NO sunny mild high weak
8 YES sunny cool normal weak
9 YES rainy mild normal weak


12 YES overcast hot normal weak


Classes: Counter({'NO': 3, 'YES': 3})
Number of Instances of the Current Sub Class is 6:
[0.5, 0.5]
Classes: ['NO', 'YES']
Probabilities of Class NO is 0.5:
Probabilities of Class YES is 0.5:
Classes: Counter({'YES': 6, 'NO': 2})
Number of Instances of the Current Sub Class is 8:
[0.25, 0.75]
Classes: ['NO', 'YES']
Probabilities of Class NO is 0.25:
Probabilities of Class YES is 0.75:
Index(['entropy_of_list', '<lambda>'], dtype='object')
DFAGGENT entropy_of_list <lambda>
Windy
strong 1.000000 0.428571
weak 0.811278 0.571429
Classes: Counter({'YES': 9, 'NO': 5})
Number of Instances of the Current Sub Class is 14:
[0.35714285714285715, 0.6428571428571429]
Classes: ['NO', 'YES']
Probabilities of Class NO is 0.35714285714285715:
Probabilities of Class YES is 0.6428571428571429:
Info-gain for Windy is:0.04812703040826927
Information Gain Calculation of Temperature
split: <class 'pandas.core.groupby.groupby.DataFrameGroupBy'>
Name:
cool
Group:
PlayTennis Outlook Temperature Humidity Windy
4 YES rainy cool normal weak
5 NO rainy cool normal strong


6 YES overcast cool normal strong


8 YES sunny cool normal weak
Name:
hot
Group:
PlayTennis Outlook Temperature Humidity Windy
0 NO sunny hot high weak
1 NO sunny hot high strong
2 YES overcast hot high weak
12 YES overcast hot normal weak
Name:
mild
Group:
PlayTennis Outlook Temperature Humidity Windy
3 YES rainy mild high weak
7 NO sunny mild high weak
9 YES rainy mild normal weak
10 YES sunny mild normal strong
11 YES overcast mild high strong
13 NO rainy mild high strong
Classes: Counter({'YES': 3, 'NO': 1})
Number of Instances of the Current Sub Class is 4:
[0.75, 0.25]
Classes: ['YES', 'NO']
Probabilities of Class NO is 0.25:
Probabilities of Class YES is 0.75:
Classes: Counter({'NO': 2, 'YES': 2})
Number of Instances of the Current Sub Class is 4:
[0.5, 0.5]
Classes: ['NO', 'YES']
Probabilities of Class NO is 0.5:
Probabilities of Class YES is 0.5:
Classes: Counter({'YES': 4, 'NO': 2})


Number of Instances of the Current Sub Class is 6:
[0.6666666666666666, 0.3333333333333333]
Classes: ['YES', 'NO']
Probabilities of Class NO is 0.3333333333333333:
Probabilities of Class YES is 0.6666666666666666:
Index(['entropy_of_list', '<lambda>'], dtype='object')
DFAGGENT     entropy_of_list  <lambda>
Temperature
cool                0.811278  0.285714
hot                 1.000000  0.285714
mild                0.918296  0.428571
Classes: Counter({'YES': 9, 'NO': 5})
Number of Instances of the Current Sub Class is 14:
[0.35714285714285715, 0.6428571428571429]
Classes: ['NO', 'YES']
Probabilities of Class NO is 0.35714285714285715:
Probabilities of Class YES is 0.6428571428571429:
Info-gain for Temperature is:0.029222565658954647

ID3 Algorithm
def id3(df, target_attribute_name, attribute_names, default_class=None):
    ## Tally target attribute:
    from collections import Counter
    cnt = Counter(x for x in df[target_attribute_name])   # counts of the YES/NO classes

    ## First check: Is this split of the dataset homogeneous?
    if len(cnt) == 1:
        return next(iter(cnt))   # the single remaining class becomes the leaf label

    ## Second check: Is this split of the dataset empty?
    # if yes, return a default value
    elif df.empty or (not attribute_names):
        return default_class   # return None for an empty data set

    ## Otherwise: This dataset is ready to be divided up!
    else:
        # Get the default value for the next recursive call of this function:
        default_class = max(cnt.keys())
        # Compute the information gain of the attributes:
        gainz = [information_gain(df, attr, target_attribute_name) for attr in attribute_names]
        index_of_max = gainz.index(max(gainz))   # index of the best attribute
        # Choose the best attribute to split on:
        best_attr = attribute_names[index_of_max]

        # Create an empty tree, to be populated in a moment
        tree = {best_attr: {}}   # initiate the tree with the best attribute as a node
        remaining_attribute_names = [i for i in attribute_names if i != best_attr]

        # Split the dataset; on each split, recursively call this algorithm
        # and populate the empty tree with subtrees, which are
        # the result of the recursive call
        for attr_val, data_subset in df.groupby(best_attr):
            subtree = id3(data_subset,
                          target_attribute_name,
                          remaining_attribute_names,
                          default_class)
            tree[best_attr][attr_val] = subtree
        return tree

Predicting Attributes
# Get Predictor Names (all but 'class')
attribute_names = list(df_tennis.columns)
print("List of Attributes:", attribute_names)
attribute_names.remove('PlayTennis') #Remove the class attribute
print("Predicting Attributes:", attribute_names)
List of Attributes: ['PlayTennis', 'Outlook', 'Temperature', 'Humidity', 'Windy']


Predicting Attributes: ['Outlook', 'Temperature', 'Humidity', 'Windy']

# Run Algorithm:
from pprint import pprint
tree = id3(df_tennis,'PlayTennis',attribute_names)
print("\n\nThe Resultant Decision Tree is :\n")
#print(tree)
pprint(tree)
attribute = next(iter(tree))
print("Best Attribute :\n",attribute)
print("Tree Keys:\n",tree[attribute].keys())
Information Gain Calculation of Outlook
split: <class 'pandas.core.groupby.groupby.DataFrameGroupBy'>
Name:
overcast
Group:
    PlayTennis   Outlook Temperature Humidity   Windy
2          YES  overcast         hot     high    weak
6          YES  overcast        cool   normal  strong
11         YES  overcast        mild     high  strong
12         YES  overcast         hot   normal    weak
Name:
rainy
Group:
    PlayTennis Outlook Temperature Humidity   Windy
3          YES   rainy        mild     high    weak
4          YES   rainy        cool   normal    weak
5           NO   rainy        cool   normal  strong
9          YES   rainy        mild   normal    weak
13          NO   rainy        mild     high  strong
Name:
sunny
Group:

PlayTennis Outlook Temperature Humidity Windy


0 NO sunny hot high weak
1 NO sunny hot high strong
7 NO sunny mild high weak
8 YES sunny cool normal weak
10 YES sunny mild normal strong

Classes: Counter({'YES': 4})

Number of Instances of the Current Sub Class is 4:


[1.0]

Classes: ['YES']

Probabilities of Class YES is 1.0:

Probabilities of Class YES is 1.0:

Classes: Counter({'YES': 3, 'NO': 2})

Number of Instances of the Current Sub Class is 5:


[0.6, 0.4]

Classes: ['YES', 'NO']

Probabilities of Class NO is 0.4:


Probabilities of Class YES is 0.6:
Classes: Counter({'NO': 3, 'YES': 2})
Number of Instances of the Current Sub Class is 5:
[0.6, 0.4]
Classes: ['NO', 'YES']
Probabilities of Class NO is 0.4:
Probabilities of Class YES is 0.6:


Index(['entropy_of_list', '<lambda>'], dtype='object')


DFAGGENT entropy_of_list <lambda>
Outlook
overcast 0.000000 0.285714
rainy 0.970951 0.357143
sunny 0.970951 0.357143
Classes: Counter({'YES': 9, 'NO': 5})
Number of Instances of the Current Sub Class is 14:
[0.35714285714285715, 0.6428571428571429]
Classes: ['NO', 'YES']
Probabilities of Class NO is 0.35714285714285715:
Probabilities of Class YES is 0.6428571428571429:
Information Gain Calculation of Temperature
split: <class 'pandas.core.groupby.groupby.DataFrameGroupBy'>
Name:
cool
Group:
PlayTennis Outlook Temperature Humidity Windy
4 YES rainy cool normal weak
5 NO rainy cool normal strong
6 YES overcast cool normal strong
8 YES sunny cool normal weak
Name:
hot
Group:
PlayTennis Outlook Temperature Humidity Windy
0 NO sunny hot high weak
1 NO sunny hot high strong
2 YES overcast hot high weak
12 YES overcast hot normal weak
Name:
mild
Group:


PlayTennis Outlook Temperature Humidity Windy


3 YES rainy mild high weak
7 NO sunny mild high weak
9 YES rainy mild normal weak
10 YES sunny mild normal strong
11 YES overcast mild high strong
13 NO rainy mild high strong
Classes: Counter({'YES': 3, 'NO': 1})
Number of Instances of the Current Sub Class is 4:
[0.75, 0.25]
Classes: ['YES', 'NO']
Probabilities of Class NO is 0.25:
Probabilities of Class YES is 0.75:
Classes: Counter({'NO': 2, 'YES': 2})
Number of Instances of the Current Sub Class is 4:
[0.5, 0.5]
Classes: ['NO', 'YES']
Probabilities of Class NO is 0.5:
Probabilities of Class YES is 0.5:
Classes: Counter({'YES': 4, 'NO': 2})
Number of Instances of the Current Sub Class is 6:
[0.6666666666666666, 0.3333333333333333]
Classes: ['YES', 'NO']
Probabilities of Class NO is 0.3333333333333333:
Probabilities of Class YES is 0.6666666666666666:
Index(['entropy_of_list', '<lambda>'], dtype='object')
DFAGGENT entropy_of_list <lambda>
Temperature
cool 0.811278 0.285714
hot 1.000000 0.285714
mild 0.918296 0.428571
Classes: Counter({'YES': 9, 'NO': 5})
Number of Instances of the Current Sub Class is 14:


[0.35714285714285715, 0.6428571428571429]
Classes: ['NO', 'YES']
Probabilities of Class NO is 0.35714285714285715:
Probabilities of Class YES is 0.6428571428571429:
Information Gain Calculation of Humidity
split: <class 'pandas.core.groupby.groupby.DataFrameGroupBy'>
Name:
high
Group:
PlayTennis Outlook Temperature Humidity Windy
0 NO sunny hot high weak
1 NO sunny hot high strong
2 YES overcast hot high weak
3 YES rainy mild high weak
7 NO sunny mild high weak
11 YES overcast mild high strong
13 NO rainy mild high strong
Name:
normal
Group:
PlayTennis Outlook Temperature Humidity Windy
4 YES rainy cool normal weak
5 NO rainy cool normal strong
6 YES overcast cool normal strong
8 YES sunny cool normal weak
9 YES rainy mild normal weak
10 YES sunny mild normal strong
12 YES overcast hot normal weak
Classes: Counter({'NO': 4, 'YES': 3})
Number of Instances of the Current Sub Class is 7:
[0.5714285714285714, 0.42857142857142855]
Classes: ['NO', 'YES']
Probabilities of Class NO is 0.42857142857142855:


Probabilities of Class YES is 0.5714285714285714:


Classes: Counter({'YES': 6, 'NO': 1})
Number of Instances of the Current Sub Class is 7:
[0.8571428571428571, 0.14285714285714285]
Classes: ['YES', 'NO']
Probabilities of Class NO is 0.14285714285714285:
Probabilities of Class YES is 0.8571428571428571:
Index(['entropy_of_list', '<lambda>'], dtype='object')
DFAGGENT entropy_of_list <lambda>
Humidity
high 0.985228 0.5
normal 0.591673 0.5
Classes: Counter({'YES': 9, 'NO': 5})
Number of Instances of the Current Sub Class is 14:
[0.35714285714285715, 0.6428571428571429]
Classes: ['NO', 'YES']
Probabilities of Class NO is 0.35714285714285715:
Probabilities of Class YES is 0.6428571428571429:
Information Gain Calculation of Windy
split: <class 'pandas.core.groupby.groupby.DataFrameGroupBy'>
Name:
strong
Group:
PlayTennis Outlook Temperature Humidity Windy
1 NO sunny hot high strong
5 NO rainy cool normal strong
6 YES overcast cool normal strong
10 YES sunny mild normal strong
11 YES overcast mild high strong
13 NO rainy mild high strong
Name:
weak
Group:


PlayTennis Outlook Temperature Humidity Windy


0 NO sunny hot high weak
2 YES overcast hot high weak
3 YES rainy mild high weak
4 YES rainy cool normal weak
7 NO sunny mild high weak
8 YES sunny cool normal weak
9 YES rainy mild normal weak
12 YES overcast hot normal weak
Classes: Counter({'NO': 3, 'YES': 3})
Number of Instances of the Current Sub Class is 6:
[0.5, 0.5]
Classes: ['NO', 'YES']
Probabilities of Class NO is 0.5:
Probabilities of Class YES is 0.5:
Classes: Counter({'YES': 6, 'NO': 2})
Number of Instances of the Current Sub Class is 8:
[0.25, 0.75]
Classes: ['NO', 'YES']
Probabilities of Class NO is 0.25:
Probabilities of Class YES is 0.75:
Index(['entropy_of_list', '<lambda>'], dtype='object')
DFAGGENT entropy_of_list <lambda>
Windy
strong 1.000000 0.428571
weak 0.811278 0.571429
Classes: Counter({'YES': 9, 'NO': 5})
Number of Instances of the Current Sub Class is 14:
[0.35714285714285715, 0.6428571428571429]
Classes: ['NO', 'YES']
Probabilities of Class NO is 0.35714285714285715:
Probabilities of Class YES is 0.6428571428571429:
Information Gain Calculation of Temperature


split: <class 'pandas.core.groupby.groupby.DataFrameGroupBy'>


Name:
cool
Group:
PlayTennis Outlook Temperature Humidity Windy
4 YES rainy cool normal weak
5 NO rainy cool normal strong
Name:
mild
Group:
PlayTennis Outlook Temperature Humidity Windy
3 YES rainy mild high weak
9 YES rainy mild normal weak
13 NO rainy mild high strong
Classes: Counter({'YES': 1, 'NO': 1})
Number of Instances of the Current Sub Class is 2:
[0.5, 0.5]
Classes: ['YES', 'NO']
Probabilities of Class NO is 0.5:
Probabilities of Class YES is 0.5:
Classes: Counter({'YES': 2, 'NO': 1})
Number of Instances of the Current Sub Class is 3:
[0.6666666666666666, 0.3333333333333333]
Classes: ['YES', 'NO']
Probabilities of Class NO is 0.3333333333333333:
Probabilities of Class YES is 0.6666666666666666:
Index(['entropy_of_list', '<lambda>'], dtype='object')
DFAGGENT entropy_of_list <lambda>
Temperature
cool 1.000000 0.4
mild 0.918296 0.6
Classes: Counter({'YES': 3, 'NO': 2})
Number of Instances of the Current Sub Class is 5:


[0.6, 0.4]
Classes: ['YES', 'NO']
Probabilities of Class NO is 0.4:
Probabilities of Class YES is 0.6:
Information Gain Calculation of Humidity
split: <class 'pandas.core.groupby.groupby.DataFrameGroupBy'>
Name:
high
Group:
PlayTennis Outlook Temperature Humidity Windy
3 YES rainy mild high weak
13 NO rainy mild high strong
Name:
normal
Group:
PlayTennis Outlook Temperature Humidity Windy
4 YES rainy cool normal weak
5 NO rainy cool normal strong
9 YES rainy mild normal weak
Classes: Counter({'YES': 1, 'NO': 1})
Number of Instances of the Current Sub Class is 2:
[0.5, 0.5]
Classes: ['YES', 'NO']
Probabilities of Class NO is 0.5:
Probabilities of Class YES is 0.5:
Classes: Counter({'YES': 2, 'NO': 1})
Number of Instances of the Current Sub Class is 3:
[0.6666666666666666, 0.3333333333333333]
Classes: ['YES', 'NO']
Probabilities of Class NO is 0.3333333333333333:
Probabilities of Class YES is 0.6666666666666666:
Index(['entropy_of_list', '<lambda>'], dtype='object')
DFAGGENT entropy_of_list <lambda>


Humidity
high 1.000000 0.4
normal 0.918296 0.6
Classes: Counter({'YES': 3, 'NO': 2})
Number of Instances of the Current Sub Class is 5:
[0.6, 0.4]
Classes: ['YES', 'NO']
Probabilities of Class NO is 0.4:
Probabilities of Class YES is 0.6:
Information Gain Calculation of Windy
split: <class 'pandas.core.groupby.groupby.DataFrameGroupBy'>
Name:
strong
Group:
PlayTennis Outlook Temperature Humidity Windy
5 NO rainy cool normal strong
13 NO rainy mild high strong
Name:
weak
Group:
PlayTennis Outlook Temperature Humidity Windy
3 YES rainy mild high weak
4 YES rainy cool normal weak
9 YES rainy mild normal weak
Classes: Counter({'NO': 2})

Number of Instances of the Current Sub Class is 2:


[1.0]
Classes: ['NO']
Probabilities of Class NO is 1.0:
Probabilities of Class NO is 1.0:
Classes: Counter({'YES': 3})
Number of Instances of the Current Sub Class is 3:


[1.0]
Classes: ['YES']
Probabilities of Class YES is 1.0:
Probabilities of Class YES is 1.0:
Index(['entropy_of_list', '<lambda>'], dtype='object')
DFAGGENT entropy_of_list <lambda>
Windy
strong 0.0 0.4
weak 0.0 0.6
Classes: Counter({'YES': 3, 'NO': 2})
Number of Instances of the Current Sub Class is 5:
[0.6, 0.4]
Classes: ['YES', 'NO']
Probabilities of Class NO is 0.4:
Probabilities of Class YES is 0.6:
Information Gain Calculation of Temperature
split: <class 'pandas.core.groupby.groupby.DataFrameGroupBy'>
Name:
cool
Group:
PlayTennis Outlook Temperature Humidity Windy
8 YES sunny cool normal weak
Name:
hot
Group:
PlayTennis Outlook Temperature Humidity Windy
0 NO sunny hot high weak
1 NO sunny hot high strong
Name:
mild
Group:
PlayTennis Outlook Temperature Humidity Windy
7 NO sunny mild high weak


10 YES sunny mild normal strong


Classes: Counter({'YES': 1})
Number of Instances of the Current Sub Class is 1:
[1.0]
Classes: ['YES']
Probabilities of Class YES is 1.0:
Probabilities of Class YES is 1.0:
Classes: Counter({'NO': 2})
Number of Instances of the Current Sub Class is 2:
[1.0]
Classes: ['NO']
Probabilities of Class NO is 1.0:
Probabilities of Class NO is 1.0:
Classes: Counter({'NO': 1, 'YES': 1})
Number of Instances of the Current Sub Class is 2:
[0.5, 0.5]
Classes: ['NO', 'YES']
Probabilities of Class NO is 0.5:
Probabilities of Class YES is 0.5:
Index(['entropy_of_list', '<lambda>'], dtype='object')
DFAGGENT entropy_of_list <lambda>
Temperature
cool 0.0 0.2
hot 0.0 0.4
mild 1.0 0.4
Classes: Counter({'NO': 3, 'YES': 2})
Number of Instances of the Current Sub Class is 5:
[0.6, 0.4]
Classes: ['NO', 'YES']
Probabilities of Class NO is 0.4:
Probabilities of Class YES is 0.6:
Information Gain Calculation of Humidity
split: <class 'pandas.core.groupby.groupby.DataFrameGroupBy'>


Name:
high
Group:
PlayTennis Outlook Temperature Humidity Windy
0 NO sunny hot high weak
1 NO sunny hot high strong
7 NO sunny mild high weak
Name:
normal
Group:
PlayTennis Outlook Temperature Humidity Windy
8 YES sunny cool normal weak
10 YES sunny mild normal strong
Classes: Counter({'NO': 3})
Number of Instances of the Current Sub Class is 3:
[1.0]
Classes: ['NO']
Probabilities of Class NO is 1.0:
Probabilities of Class NO is 1.0:
Classes: Counter({'YES': 2})
Number of Instances of the Current Sub Class is 2:
[1.0]
Classes: ['YES']
Probabilities of Class YES is 1.0:
Probabilities of Class YES is 1.0:
Index(['entropy_of_list', '<lambda>'], dtype='object')
DFAGGENT entropy_of_list <lambda>
Humidity
high 0.0 0.6
normal 0.0 0.4
Classes: Counter({'NO': 3, 'YES': 2})
Number of Instances of the Current Sub Class is 5:
[0.6, 0.4]


Classes: ['NO', 'YES']


Probabilities of Class NO is 0.4:
Probabilities of Class YES is 0.6:
Information Gain Calculation of Windy
split: <class 'pandas.core.groupby.groupby.DataFrameGroupBy'>
Name:
strong
Group:
PlayTennis Outlook Temperature Humidity Windy
1 NO sunny hot high strong
10 YES sunny mild normal strong
Name:
weak
Group:
PlayTennis Outlook Temperature Humidity Windy
0 NO sunny hot high weak
7 NO sunny mild high weak
8 YES sunny cool normal weak
Classes: Counter({'NO': 1, 'YES': 1})
Number of Instances of the Current Sub Class is 2:
[0.5, 0.5]
Classes: ['NO', 'YES']
Probabilities of Class NO is 0.5:
Probabilities of Class YES is 0.5:
Classes: Counter({'NO': 2, 'YES': 1})
Number of Instances of the Current Sub Class is 3:
[0.6666666666666666, 0.3333333333333333]
Classes: ['NO', 'YES']
Probabilities of Class NO is 0.3333333333333333:
Probabilities of Class YES is 0.6666666666666666:
Index(['entropy_of_list', '<lambda>'], dtype='object')
DFAGGENT entropy_of_list <lambda>
Windy


strong 1.000000 0.4
weak 0.918296 0.6
Classes: Counter({'NO': 3, 'YES': 2})
Number of Instances of the Current Sub Class is 5:
[0.6, 0.4]
Classes: ['NO', 'YES']
Probabilities of Class NO is 0.4:
Probabilities of Class YES is 0.6:

The Resultant Decision Tree is :

{'Outlook': {'overcast': 'YES',
             'rainy': {'Windy': {'strong': 'NO', 'weak': 'YES'}},
             'sunny': {'Humidity': {'high': 'NO', 'normal': 'YES'}}}}
Best Attribute :
 Outlook
Tree Keys:
 dict_keys(['overcast', 'rainy', 'sunny'])

DT = {'Outlook': {'Overcast': 'Yes',
                  'Rain': {'Wind': {'Strong': 'No', 'Weak': 'Yes'}},
                  'Sunny': {'Humidity': {'High': 'No', 'Normal': 'Yes'}}}}
testsample = {'Outlook': 'Sunny', 'Temperature': 'Hot', 'Humidity': 'High', 'Wind': 'Strong'}
dic = testsample
print("Test sample : ", dic)

# Walk down the tree: at each internal node, read the attribute being tested
# and follow the branch matching the test sample's value for that attribute,
# until a leaf label ('Yes' or 'No') is reached.
node = DT
while node != 'Yes' and node != 'No':
    attribute = next(iter(node))            # attribute tested at this node
    node = node[attribute][dic[attribute]]  # branch for the sample's value
print(" The test sample is classified as ", node)


Test sample :  {'Outlook': 'Sunny', 'Temperature': 'Hot', 'Humidity': 'High', 'Wind': 'Strong'}
 The test sample is classified as  No

5. Build an Artificial Neural Network by implementing the Backpropagation algorithm and test the same using appropriate data sets.

import numpy as np
X = np.array(([2, 9], [1, 5], [3, 6]), dtype=float)   # features (hrs slept, hrs studied)
y = np.array(([92], [86], [89]), dtype=float)         # labels (marks obtained)
c = np.amax(X, axis=0)   # column-wise maximum, used for normalization
print(c)
X = X / c                # normalize
y = y / 100
print(X)
print(y)

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_grad(x):
    return x * (1 - x)

# Variable initialization
epoch = 4            # setting training iterations
eta = 0.3            # setting learning rate (eta)
input_neurons = 2    # number of features in the data set
hidden_neurons = 3   # number of hidden layer neurons
output_neurons = 1   # number of neurons at the output layer

# Weight and bias - random initialization
wh = np.random.uniform(size=(input_neurons, hidden_neurons))    # 2x3
print(wh)
bh = np.random.uniform(size=(1, hidden_neurons))                # 1x3
print(bh)
wout = np.random.uniform(size=(hidden_neurons, output_neurons)) # 3x1
print(wout)
bout = np.random.uniform(size=(1, output_neurons))
print(bout)

for i in range(epoch):
    # Forward propagation
    h_ip = np.dot(X, wh) + bh     # dot product + bias
    print(h_ip)
    h_act = sigmoid(h_ip)         # activation function
    o_ip = np.dot(h_act, wout) + bout
    output = sigmoid(o_ip)

    # Backpropagation
    # Error at the output layer
    Eo = y - output               # error at o/p
    outgrad = sigmoid_grad(output)
    d_output = Eo * outgrad       # Errj = Oj(1-Oj)(Tj-Oj)
    print("the d_output is", d_output)
    # Error at the hidden layer
    Eh = d_output.dot(wout.T)     # .T means transpose
    hiddengrad = sigmoid_grad(h_act)   # how much the hidden layer weights contributed to the error
    d_hidden = Eh * hiddengrad
    wout += h_act.T.dot(d_output) * eta   # dot product of next-layer error and current-layer output
    wh += X.T.dot(d_hidden) * eta

print("Normalized Input: \n" + str(X))
print("Actual Output: \n" + str(y))
print("Predicted Output: \n", output)

Output:

[3. 9.]
[[0.66666667 1. ]
[0.33333333 0.55555556]
[1. 0.66666667]]
[[0.92]
[0.86]
[0.89]]
[[0.34391748 0.26893556 0.69278977]
[0.27114375 0.50486066 0.36606859]]
[[0.01416985 0.30257255 0.49180958]]
[[0.87639985]
[0.62305314]
[0.48385416]]
[[0.84524982]]
[[0.51459193 0.98672358 1.31973802]
[0.27944443 0.67269588 0.92611094]
[0.53884983 0.90808188 1.42864508]]
[[0.51459193 0.98672358 1.31973802]


[0.27944443 0.67269588 0.92611094]


[0.53884983 0.90808188 1.42864508]]
[[0.51459193 0.98672358 1.31973802]
[0.27944443 0.67269588 0.92611094]
[0.53884983 0.90808188 1.42864508]]
[[0.51459193 0.98672358 1.31973802]
[0.27944443 0.67269588 0.92611094]
[0.53884983 0.90808188 1.42864508]]
the d_output is [[ 0.00150286]
[-0.00302754]
[-0.0011524 ]]
Normalized Input:
[[0.66666667 1. ]
[0.33333333 0.55555556]
[1. 0.66666667]]
Actual Output:
[[0.92]
[0.86]
[0.89]]
Predicted Output:
[[0.90286381]
[0.89123185]
[0.90317825]]
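
Four training iterations are deliberately few so that every intermediate matrix can be printed. As an illustrative extension (a sketch assuming the variables from the listing above, not part of the recorded run), the same updates can be repeated longer while tracking the mean squared error to watch it fall:

for i in range(5000):
    h_act = sigmoid(np.dot(X, wh) + bh)                     # forward pass
    output = sigmoid(np.dot(h_act, wout) + bout)
    d_output = (y - output) * sigmoid_grad(output)          # output-layer delta
    d_hidden = d_output.dot(wout.T) * sigmoid_grad(h_act)   # hidden-layer delta
    wout += h_act.T.dot(d_output) * eta
    wh += X.T.dot(d_hidden) * eta
    bout += np.sum(d_output, axis=0, keepdims=True) * eta
    bh += np.sum(d_hidden, axis=0, keepdims=True) * eta
    if i % 1000 == 0:
        print("MSE:", np.mean(np.square(y - output)))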

6. Write a program to implement the naïve Bayesian classifier for a sample training data set stored
as a .CSV file. Compute the accuracy of the classifier, considering few test data sets.

import csv
import random
import math

def loadCsv(filename):
    lines = csv.reader(open(filename, "r"))
    dataset = list(lines)
    for i in range(len(dataset)):
        dataset[i] = [float(x) for x in dataset[i]]
    return dataset

def splitDataset(dataset, splitRatio):
    trainSize = int(len(dataset) * splitRatio)
    trainSet = []
    copy = list(dataset)
    while len(trainSet) < trainSize:
        index = random.randrange(len(copy))
        trainSet.append(copy.pop(index))
    return [trainSet, copy]

def separateByClass(dataset):
    separated = {}
    for i in range(len(dataset)):
        vector = dataset[i]
        if (vector[-1] not in separated):
            separated[vector[-1]] = []
        separated[vector[-1]].append(vector)
    return separated

def mean(numbers):
    return sum(numbers)/float(len(numbers))

def stdev(numbers):
    avg = mean(numbers)
    variance = sum([pow(x-avg,2) for x in numbers])/float(len(numbers)-1)
    return math.sqrt(variance)

def summarize(dataset):
    summaries = [(mean(attribute), stdev(attribute)) for attribute in zip(*dataset)]
    del summaries[-1]        # drop the (mean, stdev) of the class column
    return summaries

def summarizeByClass(dataset):
    separated = separateByClass(dataset)
    summaries = {}
    for classValue, instances in separated.items():
        summaries[classValue] = summarize(instances)
    return summaries

def calculateProbability(x, mean, stdev):
    # Gaussian probability density of attribute value x
    exponent = math.exp(-(math.pow(x-mean,2)/(2*math.pow(stdev,2))))
    return (1 / (math.sqrt(2*math.pi) * stdev)) * exponent

def calculateClassProbabilities(summaries, inputVector):
    probabilities = {}
    for classValue, classSummaries in summaries.items():
        probabilities[classValue] = 1
        for i in range(len(classSummaries)):
            mean, stdev = classSummaries[i]
            x = inputVector[i]
            probabilities[classValue] *= calculateProbability(x, mean, stdev)
    return probabilities

def predict(summaries, inputVector):
    probabilities = calculateClassProbabilities(summaries, inputVector)
    bestLabel, bestProb = None, -1
    for classValue, probability in probabilities.items():
        if bestLabel is None or probability > bestProb:
            bestProb = probability
            bestLabel = classValue
    return bestLabel

def getPredictions(summaries, testSet):
    predictions = []
    for i in range(len(testSet)):
        result = predict(summaries, testSet[i])
        predictions.append(result)
    return predictions

def getAccuracy(testSet, predictions):
    correct = 0
    for i in range(len(testSet)):
        if testSet[i][-1] == predictions[i]:
            correct += 1
    return (correct/float(len(testSet))) * 100.0

def main():
    filename = 'pima-indians-diabetes.csv'
    splitRatio = 0.80
    dataset = loadCsv(filename)
    trainingSet, testSet = splitDataset(dataset, splitRatio)
    print('Split {0} rows into train={1} and test={2} rows'.format(len(dataset), len(trainingSet), len(testSet)))
    # prepare model
    summaries = summarizeByClass(trainingSet)
    # test model
    predictions = getPredictions(summaries, testSet)
    accuracy = getAccuracy(testSet, predictions)
    print('Accuracy: {0}%'.format(accuracy))

main()
OUTPUT:
Split 768 rows into train=614 and test=154 rows
Accuracy: 32.467532467532465%
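
For reference, calculateProbability evaluates the Gaussian density f(x) = (1 / (sqrt(2*pi) * stdev)) * exp(-(x - mean)^2 / (2 * stdev^2)), and calculateClassProbabilities multiplies these densities over all attributes of the input vector (the naive independence assumption). A quick sanity check with illustrative values (not taken from the dataset):

print(calculateProbability(71.5, 73.0, 6.2))   # hypothetical x, mean, stdev -> ~0.0625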

7. Apply EM algorithm to cluster a set of data stored in a .CSV file. Use the same data set for
clustering using k-Means algorithm. Compare the results of these two algorithms and comment
on the quality of clustering. You can add Java/Python ML library classes/API in the program.

import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.cluster import KMeans
import pandas as pd
import numpy as np
import sklearn.metrics as sm

iris = datasets.load_iris()
X = pd.DataFrame(iris.data)
X.columns = ['Sepal_Length','Sepal_Width','Petal_Length','Petal_Width']
y = pd.DataFrame(iris.target)
y.columns = ['Targets']

model = KMeans(n_clusters=3)
model.fit(X)
model.labels_            # cluster assignment (0/1/2) of every sample

plt.figure(figsize=(14,7))
colormap = np.array(['red', 'lime', 'black'])
plt.subplot(1, 2, 1)
plt.scatter(X.Petal_Length, X.Petal_Width, c=colormap[y.Targets], s=40)
plt.title('Real (target) classes')
plt.subplot(1, 2, 2)
plt.scatter(X.Petal_Length, X.Petal_Width, c=colormap[model.labels_], s=40)
plt.title('K-Means clustering')

acc = sm.accuracy_score(y, model.labels_)   # sensitive to the arbitrary label permutation
print(acc * 100)

OUTPUT:
24.0
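
Note that the listing above colours the left subplot by the true iris targets and actually clusters only with k-Means; the EM step itself can be supplied by scikit-learn's GaussianMixture. A minimal sketch, assuming X, y, colormap, model and the imports from the listing:

from sklearn.mixture import GaussianMixture

gmm = GaussianMixture(n_components=3)        # mixture of 3 Gaussians fitted by EM
gmm_labels = gmm.fit(X).predict(X)
plt.figure(figsize=(7,7))
plt.scatter(X.Petal_Length, X.Petal_Width, c=colormap[gmm_labels], s=40)
plt.title('EM (GaussianMixture) clustering')

# Cluster labels are arbitrary permutations of 0/1/2, so a raw accuracy
# score (like the 24.0 above) is not a reliable quality measure; a
# permutation-invariant index is more informative:
print(sm.adjusted_rand_score(y.Targets, model.labels_))  # k-Means vs truth
print(sm.adjusted_rand_score(y.Targets, gmm_labels))     # EM vs truth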

8. Write a program to implement k-Nearest Neighbour algorithm to classify the iris data set. Print
both correct and wrong predictions. Java/Python ML library classes can be used for this problem.

from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
import numpy as np

iris_dataset = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris_dataset["data"], iris_dataset["target"])
kn = KNeighborsClassifier()
kn.fit(X_train, y_train)
prediction = kn.predict(X_test)
print(f"ACCURACY: {kn.score(X_test, y_test)}")
target_names = iris_dataset.target_names
for pred, actual in zip(prediction, y_test):
    print(f"Prediction is {target_names[pred]} Actual is {target_names[actual]}")

OUTPUT:
ACCURACY: 0.9473684210526315
Prediction is virginica Actual is virginica
Prediction is virginica Actual is virginica
Prediction is virginica Actual is virginica
Prediction is setosa Actual is setosa
Prediction is virginica Actual is virginica
Prediction is virginica Actual is virginica
Prediction is virginica Actual is virginica
Prediction is setosa Actual is setosa
Prediction is setosa Actual is setosa
Prediction is versicolor Actual is virginica
Prediction is virginica Actual is virginica
Prediction is virginica Actual is virginica
Prediction is setosa Actual is setosa
Prediction is versicolor Actual is versicolor
Prediction is setosa Actual is setosa
Prediction is versicolor Actual is versicolor
Prediction is versicolor Actual is versicolor
Prediction is setosa Actual is setosa
Prediction is setosa Actual is setosa
Prediction is virginica Actual is virginica
Prediction is versicolor Actual is versicolor
Prediction is setosa Actual is setosa
Prediction is versicolor Actual is versicolor
Prediction is virginica Actual is virginica
Prediction is virginica Actual is virginica
Prediction is virginica Actual is versicolor
Prediction is setosa Actual is setosa
Prediction is versicolor Actual is versicolor
Prediction is versicolor Actual is versicolor
Prediction is setosa Actual is setosa
Prediction is setosa Actual is setosa
Prediction is versicolor Actual is versicolor
Prediction is virginica Actual is virginica
Prediction is versicolor Actual is versicolor
Prediction is setosa Actual is setosa
Prediction is virginica Actual is virginica
Prediction is versicolor Actual is versicolor
Prediction is setosa Actual is setosa
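
KNeighborsClassifier defaults to n_neighbors=5. As an illustrative extension (assuming the train/test split above), the effect of k can be checked with a short loop:

for k in [1, 3, 5, 7, 9]:
    knk = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    print(f"k={k} accuracy={knk.score(X_test, y_test):.4f}")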

9. Implement the non-parametric Locally Weighted Regression algorithm in order to fit data
points. Select appropriate data set for your experiment and draw graphs.

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

def kernel(point, xmat, k):
    m, n = np.shape(xmat)
    weights = np.mat(np.eye(m))
    for j in range(m):
        diff = point - xmat[j]       # use the parameter, not the global X
        weights[j, j] = np.exp(diff * diff.T / (-2.0 * k**2))
    return weights

def localWeight(point, xmat, ymat, k):
    wei = kernel(point, xmat, k)
    W = (xmat.T * (wei * xmat)).I * (xmat.T * (wei * ymat.T))
    return W

def localWeightRegression(xmat, ymat, k):
    m, n = np.shape(xmat)
    ypred = np.zeros(m)
    for i in range(m):
        ypred[i] = xmat[i] * localWeight(xmat[i], xmat, ymat, k)
    return ypred

data = pd.read_csv('10_data10_tips.csv')
bill = np.array(data.total_bill)
tip = np.array(data.tip)

mbill = np.mat(bill)
mtip = np.mat(tip)
m = np.shape(mbill)[1]
one = np.mat(np.ones(m))
X = np.hstack((one.T, mbill.T))      # design matrix: [1, total_bill] per row

ypred = localWeightRegression(X, mtip, 5)
SortIndex = X[:, 1].argsort(0)
xsort = X[SortIndex][:, 0]

fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
ax.scatter(bill, tip, color='blue')
ax.plot(xsort[:, 1], ypred[SortIndex], color='red', linewidth=8)
plt.xlabel('Total bill')
plt.ylabel('Tip')
plt.show()

OUTPUT:

[Figure] Regression with parameter k=3
[Figure] Regression with parameter k=8
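
For reference, each entry of ypred solves its own weighted least-squares problem. Matching the code above, for a query point x0 with bandwidth k:

    w_j = exp( -(x0 - x_j)(x0 - x_j)^T / (2k^2) )    (Gaussian kernel weight; diagonal of W, built in kernel)
    beta(x0) = (X^T W X)^(-1) X^T W y                (weighted normal equations, solved in localWeight)
    yhat(x0) = x0 . beta(x0)                         (the prediction computed in localWeightRegression)

A smaller bandwidth k weights only nearby points heavily and gives a wigglier, more local fit; a larger k approaches ordinary least squares, which is why the k=3 curve tracks the data more closely than the k=8 curve.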
