My ML Lab Manual
Problem 1: Implement and demonstrate the FIND-S algorithm for finding the
most specific hypothesis based on a given set of training data samples. Read the
training data from a .CSV file.
Algorithm:
1. Initialize h to the most specific hypothesis in H.
2. For each positive training instance x:
   • For each attribute constraint ai in h:
     if the constraint ai in h is satisfied by x, then do nothing;
     else replace ai in h by the next more general constraint that is satisfied by x.
3. Output hypothesis h.
Illustration: (figures omitted: Step 1, Step 2 over successive iterations, and Iteration 4 / Step 3 of the Find-S trace)
print (" \n The most general hypothesis : ['?', '?', '?', '?', '?', '?']\n")
print ("\n The most specific hypothesis : ['0', '0', '0', '0', '0', '0']\n")
a=[]
print("\n The Given Training Data Set \n")
Problem 2: For a given set of training data examples stored in a .CSV file,
implement and demonstrate the Candidate-Elimination algorithm to output a
description of the set of all hypotheses consistent with the training examples.
Trace 1, Trace 2, Trace 3, and the Final Version Space: (hand-worked trace figures omitted)
Source Code:
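This section is blank in the source; below is a minimal Candidate-Elimination sketch, assuming an enjoysport.csv file whose last column is the target label and whose first training example is positive (file name and layout are assumptions):

import csv

with open('enjoysport.csv') as f:                 # assumed file name
    data = list(csv.reader(f))

concepts = [row[:-1] for row in data]             # attribute values
target = [row[-1] for row in data]                # class labels

def learn(concepts, target):
    # S boundary: seeded from the first (assumed positive) example
    specific_h = concepts[0][:]
    # G boundary: one all-'?' hypothesis per attribute
    general_h = [['?'] * len(specific_h) for _ in specific_h]
    for i, instance in enumerate(concepts):
        if target[i].lower() == 'yes':            # positive example: generalize S
            for j in range(len(specific_h)):
                if instance[j] != specific_h[j]:
                    specific_h[j] = '?'
                    general_h[j][j] = '?'
        else:                                     # negative example: specialize G
            for j in range(len(specific_h)):
                if instance[j] != specific_h[j]:
                    general_h[j][j] = specific_h[j]
                else:
                    general_h[j][j] = '?'
        print("Trace", i + 1, ": S =", specific_h, ", G =", general_h)
    # drop members of G that were never specialized
    general_h = [h for h in general_h if h != ['?'] * len(specific_h)]
    return specific_h, general_h

s_final, g_final = learn(concepts, target)
print("Final S:", s_final)
print("Final G:", g_final)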
OUTPUT:
Problem 3: Write a program to demonstrate the working of the decision tree
based ID3 algorithm. Use an appropriate data set for building the decision tree
and apply this knowledge to classify a new sample.
Algorithm: ID3 builds the tree top-down: at each node, compute the information gain of every remaining attribute, split the data set on the attribute with the highest gain, and recurse on each subset until a subset is homogeneous (a single class) or no attributes remain, in which case the most common class is returned.
Illustration: To illustrate the operation of ID3, consider the learning task represented by the examples below. Compute the gain of each attribute and identify which attribute is the best, as illustrated below.
Day | Outlook | Temperature | Humidity | Wind | PlayTennis
(table of the 14 PlayTennis training examples omitted)
Source Code:
import pandas as pd

# DataFrame.from_csv has been removed from pandas; use pd.read_csv instead
df_tennis = pd.read_csv('C:\\Users\\HD\\Desktop\\Data\\PlayTennis.csv')
df_tennis
Output :
No and Yes Classes : PlayTennis Counter({'Yes': 9, 'No': 5})
Entropy of given PlayTennis Data Set : 0.9402859586706309
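The entropy shown above (for the 9 Yes / 5 No split: −(9/14)·log₂(9/14) − (5/14)·log₂(5/14) ≈ 0.940) comes from helper functions that are missing from this listing, and the id3() function below calls information_gain(). A minimal sketch of both, using Gain(S, A) = Entropy(S) − Σv (|Sv|/|S|)·Entropy(Sv):

import math
from collections import Counter

def entropy_of_list(values):
    # Entropy(S) = -sum over classes of p * log2(p)
    cnt = Counter(values)
    n = len(values)
    return sum(-(c / n) * math.log(c / n, 2) for c in cnt.values())

def information_gain(df, split_attribute_name, target_attribute_name):
    # Gain(S, A) = Entropy(S) - weighted average entropy of the splits on A
    total_entropy = entropy_of_list(df[target_attribute_name])
    n = len(df.index)
    weighted_entropy = sum(
        (len(subset) / n) * entropy_of_list(subset[target_attribute_name])
        for _, subset in df.groupby(split_attribute_name))
    return total_entropy - weighted_entropy

# Reproduces the two output lines shown above:
print("No and Yes Classes : PlayTennis", Counter(df_tennis['PlayTennis']))
print("Entropy of given PlayTennis Data Set :", entropy_of_list(df_tennis['PlayTennis']))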
ID3 Algorithm
def id3(df, target_attribute_name, attribute_names, default_class=None):
    ## Tally the target attribute (counts of the Yes/No classes):
    from collections import Counter
    cnt = Counter(x for x in df[target_attribute_name])

    ## First check: is this split of the dataset homogeneous?
    if len(cnt) == 1:
        return next(iter(cnt))

    ## Second check: is this split of the dataset empty?
    ## If yes, return a default value.
    elif df.empty or (not attribute_names):
        return default_class

    ## Otherwise: this dataset is ready to be divvied up!
    else:
        # Default value for the next recursive call of this function:
        # the most common value of the target attribute in this dataset.
        default_class = max(cnt, key=cnt.get)

        # Choose the best attribute to split on:
        gainz = [information_gain(df, attr, target_attribute_name)
                 for attr in attribute_names]
        index_of_max = gainz.index(max(gainz))
        best_attr = attribute_names[index_of_max]

        # Create an empty tree, to be populated in a moment:
        tree = {best_attr: {}}
        remaining_attribute_names = [i for i in attribute_names if i != best_attr]

        # Split the dataset on the best attribute; on each split, recursively
        # call this algorithm and populate the tree with the resulting subtrees.
        for attr_val, data_subset in df.groupby(best_attr):
            subtree = id3(data_subset,
                          target_attribute_name,
                          remaining_attribute_names,
                          default_class)
            tree[best_attr][attr_val] = subtree
        return tree
Predicting Attributes:
# Get Predictor Names (all but 'class')
attribute_names = list(df_tennis.columns)
print("List of Attributes:", attribute_names)
attribute_names.remove('PlayTennis') #Remove the class attribute
print("Predicting Attributes:", attribute_names)
Tree Construction:
# Run Algorithm:
from pprint import pprint
tree = id3(df_tennis,'PlayTennis',attribute_names)
print("\n\nThe Resultant Decision Tree is :\n")
pprint(tree)
Classification Accuracy:
def classify(instance, tree, default=None):
    attribute = next(iter(tree))          # root attribute of this (sub)tree
    if instance[attribute] in tree[attribute].keys():
        result = tree[attribute][instance[attribute]]
        if isinstance(result, dict):      # this is a subtree: delve deeper
            return classify(instance, result, default)
        else:
            return result                 # this is a leaf label
    else:
        return default
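The listing stops before the promised accuracy computation; a minimal sketch, applying classify() to every training row (the default label 'No' is an assumption):

# Classify each training instance and compare with the true labels
df_tennis['predicted'] = df_tennis.apply(classify, axis=1, args=(tree, 'No'))  # default 'No' is an assumption
accuracy = (df_tennis['predicted'] == df_tennis['PlayTennis']).mean()
print('Accuracy is :', accuracy)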