
EXCEL ENGINEERING COLLEGE

(Autonomous)
Approved by AICTE, New Delhi & Affiliated to Anna University, Chennai
Accredited by NBA, NAAC with “A+” and Recognised by UGC (2f &12B)
KOMARAPALAYAM - 637303

DEPARTMENT OF ARTIFICIAL INTELLIGENCE AND DATA SCIENCE

20AI505-MACHINE LEARNING LABORATORY

V SEMESTER - R 2020

REFERENCE MANUAL

PREPARED BY

C. Eben Exceline, AP/AI&DS


EXCEL ENGINEERING COLLEGE

VISION

To create competitive human resources in the fields of engineering for the benefit of society to meet global challenges.

MISSION

 To provide a conducive ambience for better learning and to bring creativity in the students.
 To develop a sustainable environment for innovative learning to serve the needy.
 To meet global demands for excellence in technical education.
 To train young minds with values, culture, integrity, innovation and leadership.

DEPARTMENT OF ARTIFICIAL INTELLIGENCE AND DATA SCIENCE

VISION

To promote quality education with industry collaboration and to enable students with
intellectual skills to succeed in a globally competitive environment.

MISSION

 To educate the students with strong fundamentals in the areas of Artificial Intelligence and Data Science.
 To provide a multi-disciplinary, research- and innovation-driven academic environment to meet global demands.
 To foster the spirit of lifelong learning in students through practical and social exposure beyond the classroom.
PROGRAMME EDUCATIONAL OBJECTIVES (PEOs)

1. Graduates will have solid basics in Mathematics, Programming, Machine Learning,
Artificial Intelligence and Data Science fundamentals and advancements to solve
technical problems.
2. Graduates will have the capability to apply their acquired knowledge and skills to solve the
issues in real world Artificial Intelligence and Data Science sectors and to develop feasible
and viable systems.
3. Graduates will have the potential to participate in life-long learning through professional
developments for societal needs with ethical values.

PROGRAM SPECIFIC OUTCOMES (PSOs)

1. Ability to implement innovative, cost-effective, energy-efficient and eco-friendly integrated
solutions for existing and new applications using the Internet of Things.
2. Graduates will possess additional skills in network security and IT infrastructure in cyberspace.
3. Develop, test and maintain software systems for business and other applications that meet the
automation needs of society and industry.
PROGRAMME OUTCOMES (POs)

1. Engineering Knowledge: Apply the knowledge of mathematics, science, engineering
fundamentals and an engineering specialization to the solution of complex engineering
problems.
2. Problem Analysis: Identify, formulate, review research literature, and analyze complex
engineering problems reaching substantiated conclusions using first principles of
mathematics, natural sciences, and engineering sciences.
3. Design / Development of solutions: Design solutions for complex engineering problems
and design system components or processes that meet the specified needs with appropriate
consideration for the public health and safety, and the cultural, societal, and environmental
considerations.
4. Conduct investigations of complex problems: Use research-based knowledge and research
methods, including design of experiments, analysis and interpretation of data, and synthesis
of the information to provide valid conclusions.
5. Modern tool usage: Create, select, and apply appropriate techniques, resources, and modern
engineering and IT tools including prediction and modeling of complex engineering
activities with an understanding of the limitations.
6. The engineer and society: Apply reasoning informed by the contextual knowledge to assess
societal, health, safety, legal and cultural issues and the consequent responsibilities relevant to
the professional engineering practice.
7. Environment and Sustainability: Understand the impact of the professional engineering
solutions to societal and environmental contexts, and demonstrate the knowledge of, and need
for sustainable development.
8. Ethics: Apply ethical principles and commit to professional ethics and responsibilities and
norms of the engineering practice.
9. Individual and team work: Function effectively as an individual and as a member or leader
in diverse teams, and in multidisciplinary settings.
10. Communication: Communicate effectively on complex engineering activities with the
engineering community and with society at large, such as, being able to comprehend and write
effective reports and design documentation, make effective presentations, and give and
receive clear instructions.
11. Project management and finance: Demonstrate knowledge and understanding of the
engineering management principles and apply these to one’s own work, as a member and
leader in a team, to manage projects and in multidisciplinary environments.
12. Lifelong learning: Recognize the need for and have the preparation and ability to engage
in independent and lifelong learning in the broadest context of technological change.
20AI505 MACHINE LEARNING LABORATORY
OBJECTIVES:
1. Make use of Data sets in implementing the machine learning algorithms
2. Implement the machine learning concepts and algorithms in any suitable language of choice.
3. Propose appropriate data sets to the Machine Learning algorithms
4. Identify the appropriate algorithms for real-world problems.
5. Demonstrate Machine learning with readily available data.

The exercises, with their CO mapping and RBT level, are listed below.

1. Implement and demonstrate the FIND-S algorithm for finding the most specific hypothesis
based on a given set of training data samples. Read the training data from a .CSV file. (CO1, Apply)
2. For a given set of training data examples stored in a .CSV file, implement and demonstrate
the Candidate-Elimination algorithm to output a description of the set of all hypotheses
consistent with the training examples. (CO1, Apply)
3. Write a program to demonstrate the working of the decision tree-based ID3 algorithm. Use
an appropriate data set for building the decision tree and apply this knowledge to classify a
new sample. (CO2, Apply)
4. Build an Artificial Neural Network by implementing the Backpropagation algorithm and
test the same using appropriate data sets. (CO2, Apply)
5. Write a program to implement the naïve Bayesian classifier for a sample training data set
stored as a .CSV file. Compute the accuracy of the classifier, considering a few test data
sets. (CO3, Apply)
6. Assuming a set of documents that need to be classified, use the naïve Bayesian Classifier
model to perform this task. Built-in Java classes/API can be used to write the program.
Calculate the accuracy, precision, and recall for your data set. (CO3, Apply)
7. Write a program to construct a Bayesian network considering medical data. Use this model
to demonstrate the diagnosis of heart patients using the standard Heart Disease Data Set.
You can use Java/Python ML library classes/API. (CO4, Apply)
8. Apply the EM algorithm to cluster a set of data stored in a .CSV file. Use the same data set
for clustering using the k-Means algorithm. Compare the results of these two algorithms
and comment on the quality of clustering. You can add Java/Python ML library
classes/API in the program. (CO4, Apply)
9. Write a program to implement the k-Nearest Neighbour algorithm to classify the iris data
set. Print both correct and wrong predictions. Java/Python ML library classes can be used
for this problem. (CO5, Apply)
10. Implement the non-parametric Locally Weighted Regression algorithm in order to fit data
points. Select an appropriate data set for your experiment and draw graphs. (CO5, Apply)
OUTCOMES:
Upon Completion of the course, the students will be able to:
 Implement the procedures for the machine learning algorithms.
 Design Java/Python programs for various Learning algorithms.
 Classify appropriate data sets to the Machine Learning algorithms.
 Apply Machine Learning algorithms to solve real world problems.
 Perform experiments in Machine Learning using real-world data.

1. EXPERIMENT NO: 1
2. TITLE: FIND-S ALGORITHM
3. LEARNING OBJECTIVES:
• Make use of Data sets in implementing the machine learning algorithms.
• Implement ML concepts and algorithms in Python
4. AIM:
• Implement and demonstrate the FIND-S algorithm for finding the most specific hypothesis
based on a given set of training data samples. Read the training data from a .CSV file.
5. THEORY:
• The concept learning approach in machine learning can be formulated as the "problem of
searching through a predefined space of potential hypotheses for the hypothesis that best
fits the training examples".
• Find-S algorithm for concept learning is one of the most basic algorithms of machine
learning.

Find-S Algorithm
1. Initialize h to the most specific hypothesis in H
2. For each positive training instance x:
       For each attribute constraint a_i in h:
           If the constraint a_i in h is satisfied by x, then do nothing
           Else replace a_i in h by the next more general constraint that is satisfied by x
3. Output hypothesis h
• Find-S is guaranteed to output the most specific hypothesis within H that is consistent with the
positive training examples.
• Notice also that negative examples are ignored.
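For example, on the PlayTennis data used below, h is initialized to the first positive example
('Overcast', 'Hot', 'High', 'Weak'); the next positive example ('Rain', 'Mild', 'High', 'Weak')
generalizes it to ('?', '?', 'High', 'Weak'), and the remaining positive examples generalize it
further to the maximally general hypothesis ('?', '?', '?', '?').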
Limitations of the Find-S algorithm:
• There is no way to determine whether the final hypothesis (found by Find-S) is the only one
consistent with the data, or whether other consistent hypotheses exist.
• Inconsistent sets of training data can mislead the Find-S algorithm, since it ignores negative
examples.
• A good concept learning algorithm should be able to backtrack on its choice of hypothesis
so that the resulting hypothesis can be improved over time. Unfortunately, Find-S
provides no such method.
6. PROCEDURE / PROGRAMME :

import pandas as pd
import numpy as np

d = pd.read_csv("PlayTennis.csv")
print(d)

# The attributes are the four weather features; the target is the PlayTennis column
# (slicing off only the last column would wrongly treat Wind as the target here)
a = np.array(d[['Outlook', 'Temperature', 'Humidity', 'Wind']])
print(" The attributes are: ", a)
t = np.array(d['PlayTennis'])
print("The target is: ", t)

def train(c, t):
    # Initialize the hypothesis with the first positive example
    for i, val in enumerate(t):
        if val == "Yes":
            specific_hypothesis = c[i].copy()
            break

    # Generalize over the remaining positive examples; negatives are ignored
    for i, val in enumerate(c):
        if t[i] == "Yes":
            for x in range(len(specific_hypothesis)):
                if val[x] != specific_hypothesis[x]:
                    specific_hypothesis[x] = '?'
    return specific_hypothesis

print(" The final hypothesis is:", train(a, t))
Output
Unnamed: 0 PlayTennis Outlook Temperature Humidity Wind
0 0 No Sunny Hot High Weak
1 1 No Sunny Hot High Strong
2 2 Yes Overcast Hot High Weak
3 3 Yes Rain Mild High Weak
4 4 Yes Rain Cool Normal Weak
5 5 No Rain Cool Normal Strong
6 6 Yes Overcast Cool Normal Strong
7 7 No Sunny Mild High Weak
8 8 Yes Sunny Cool Normal Weak
9 9 Yes Rain Mild Normal Weak
10 10 Yes Sunny Mild Normal Strong
11 11 Yes Overcast Mild High Strong
12 12 Yes Overcast Hot Normal Weak
13 13 No Rain Mild High Strong
The attributes are:  [['Sunny' 'Hot' 'High' 'Weak']
 ['Sunny' 'Hot' 'High' 'Strong']
 ['Overcast' 'Hot' 'High' 'Weak']
 ['Rain' 'Mild' 'High' 'Weak']
 ['Rain' 'Cool' 'Normal' 'Weak']
 ['Rain' 'Cool' 'Normal' 'Strong']
 ['Overcast' 'Cool' 'Normal' 'Strong']
 ['Sunny' 'Mild' 'High' 'Weak']
 ['Sunny' 'Cool' 'Normal' 'Weak']
 ['Rain' 'Mild' 'Normal' 'Weak']
 ['Sunny' 'Mild' 'Normal' 'Strong']
 ['Overcast' 'Mild' 'High' 'Strong']
 ['Overcast' 'Hot' 'Normal' 'Weak']
 ['Rain' 'Mild' 'High' 'Strong']]
The target is:  ['No' 'No' 'Yes' 'Yes' 'Yes' 'No' 'Yes' 'No' 'Yes' 'Yes' 'Yes' 'Yes' 'Yes' 'No']
 The final hypothesis is: ['?' '?' '?' '?']

1. EXPERIMENT NO: 2
2. TITLE: CANDIDATE-ELIMINATION ALGORITHM
3. LEARNING OBJECTIVES:
• Make use of Data sets in implementing the machine learning algorithms.
• Implement ML concepts and algorithms in Python

4. AIM:
• For a given set of training data examples stored in a .CSV file, implement and demonstrate
the Candidate-Elimination algorithm to output a description of the set of all hypotheses
consistent with the training examples.

5. THEORY:
• The key idea in the Candidate-Elimination algorithm is to output a description of the set of
all hypotheses consistent with the training examples.
• It computes the description of this set without explicitly enumerating all of its members.
• This is accomplished by using the more-general-than partial ordering and maintaining a
compact representation of the set of consistent hypotheses.
• The algorithm represents the set of all hypotheses consistent with the observed training
examples. This subset of all hypotheses is called the version space with respect to the
hypothesis space H and the training examples D, because it contains all plausible versions of
the target concept.
• A version space can be represented with its general and specific boundary sets.
• The Candidate-Elimination algorithm represents the version space by storing only its most
general members G and its most specific members S.
• Given only these two sets S and G, it is possible to enumerate all members of a version
space by generating hypotheses that lie between these two sets in general-to-specific partial
ordering over hypotheses. Every member of the version space lies between these boundaries

Algorithm
1. Initialize G to the set of maximally general hypotheses in H
2. Initialize S to the set of maximally specific hypotheses in H
3. For each training example d, do
   If d is a positive example:
       Remove from G any hypothesis inconsistent with d
       For each hypothesis s in S that is not consistent with d:
           Remove s from S
           Add to S all minimal generalizations h of s such that h is consistent with d
           and some member of G is more general than h
           Remove from S any hypothesis that is more general than another hypothesis in S
   If d is a negative example:
       Remove from S any hypothesis inconsistent with d
       For each hypothesis g in G that is not consistent with d:
           Remove g from G
           Add to G all minimal specializations h of g such that h is consistent with d
           and some member of S is more specific than h
           Remove from G any hypothesis that is less general than another hypothesis in G


6. PROCEDURE / PROGRAMME :

import numpy as np
import pandas as pd

data = pd.read_csv('PlayTennis.csv')

# The attributes are the four weather features; the target is the PlayTennis column
concepts = np.array(data[['Outlook', 'Temperature', 'Humidity', 'Wind']])
print("\nInstances are:\n", concepts)

target = np.array(data['PlayTennis'])
print("\nTarget Values are: ", target)

def learn(concepts, target):
    specific_h = concepts[0].copy()
    print("\nInitialization of specific_h and general_h")
    print("\nSpecific Boundary: ", specific_h)

    general_h = [["?" for i in range(len(specific_h))] for i in range(len(specific_h))]
    print("\nGeneric Boundary: ", general_h)

    for i, h in enumerate(concepts):
        print("\nInstance", i + 1, "is ", h)

        if target[i] == "Yes":
            print("Instance is Positive ")
            for x in range(len(specific_h)):
                if h[x] != specific_h[x]:
                    specific_h[x] = '?'
                    general_h[x][x] = '?'

        if target[i] == "No":
            print("Instance is Negative ")
            for x in range(len(specific_h)):
                if h[x] != specific_h[x]:
                    general_h[x][x] = specific_h[x]
                else:
                    general_h[x][x] = '?'

        print("Specific Boundary after ", i + 1, "Instance is ", specific_h)
        print("Generic Boundary after ", i + 1, "Instance is ", general_h)
        print("\n")

    # Drop the rows of general_h that remained maximally general
    most_general = ['?'] * len(specific_h)
    general_h = [row for row in general_h if row != most_general]
    return specific_h, general_h

s_final, g_final = learn(concepts, target)

print("Final Specific_h: ", s_final, sep="\n")
print("Final General_h: ", g_final, sep="\n")
7. RESULTS & CONCLUSIONS:

Instances are:
 [['Sunny' 'Hot' 'High' 'Weak']
 ['Sunny' 'Hot' 'High' 'Strong']
 ['Overcast' 'Hot' 'High' 'Weak']
 ['Rain' 'Mild' 'High' 'Weak']
 ['Rain' 'Cool' 'Normal' 'Weak']
 ['Rain' 'Cool' 'Normal' 'Strong']
 ['Overcast' 'Cool' 'Normal' 'Strong']
 ['Sunny' 'Mild' 'High' 'Weak']
 ['Sunny' 'Cool' 'Normal' 'Weak']
 ['Rain' 'Mild' 'Normal' 'Weak']
 ['Sunny' 'Mild' 'Normal' 'Strong']
 ['Overcast' 'Mild' 'High' 'Strong']
 ['Overcast' 'Hot' 'Normal' 'Weak']
 ['Rain' 'Mild' 'High' 'Strong']]

Target Values are:  ['No' 'No' 'Yes' 'Yes' 'Yes' 'No' 'Yes' 'No' 'Yes' 'Yes' 'Yes' 'Yes' 'Yes' 'No']

Initialization of specific_h and general_h

Specific Boundary:  ['Sunny' 'Hot' 'High' 'Weak']

Generic Boundary:  [['?', '?', '?', '?'], ['?', '?', '?', '?'], ['?', '?', '?', '?'], ['?', '?', '?', '?']]

Instance 1 is  ['Sunny' 'Hot' 'High' 'Weak']
Instance is Negative
Specific Boundary after 1 Instance is  ['Sunny' 'Hot' 'High' 'Weak']
Generic Boundary after 1 Instance is  [['?', '?', '?', '?'], ['?', '?', '?', '?'], ['?', '?', '?', '?'], ['?', '?', '?', '?']]

Instance 2 is  ['Sunny' 'Hot' 'High' 'Strong']
Instance is Negative
Specific Boundary after 2 Instance is  ['Sunny' 'Hot' 'High' 'Weak']
Generic Boundary after 2 Instance is  [['?', '?', '?', '?'], ['?', '?', '?', '?'], ['?', '?', '?', '?'], ['?', '?', '?', 'Weak']]

Instance 3 is  ['Overcast' 'Hot' 'High' 'Weak']
Instance is Positive
Specific Boundary after 3 Instance is  ['?' 'Hot' 'High' 'Weak']
Generic Boundary after 3 Instance is  [['?', '?', '?', '?'], ['?', '?', '?', '?'], ['?', '?', '?', '?'], ['?', '?', '?', 'Weak']]

Instance 4 is  ['Rain' 'Mild' 'High' 'Weak']
Instance is Positive
Specific Boundary after 4 Instance is  ['?' '?' 'High' 'Weak']
Generic Boundary after 4 Instance is  [['?', '?', '?', '?'], ['?', '?', '?', '?'], ['?', '?', '?', '?'], ['?', '?', '?', 'Weak']]

Instance 5 is  ['Rain' 'Cool' 'Normal' 'Weak']
Instance is Positive
Specific Boundary after 5 Instance is  ['?' '?' '?' 'Weak']
Generic Boundary after 5 Instance is  [['?', '?', '?', '?'], ['?', '?', '?', '?'], ['?', '?', '?', '?'], ['?', '?', '?', 'Weak']]

Instance 6 is  ['Rain' 'Cool' 'Normal' 'Strong']
Instance is Negative
Specific Boundary after 6 Instance is  ['?' '?' '?' 'Weak']
Generic Boundary after 6 Instance is  [['?', '?', '?', '?'], ['?', '?', '?', '?'], ['?', '?', '?', '?'], ['?', '?', '?', 'Weak']]

Instance 7 is  ['Overcast' 'Cool' 'Normal' 'Strong']
Instance is Positive
Specific Boundary after 7 Instance is  ['?' '?' '?' '?']
Generic Boundary after 7 Instance is  [['?', '?', '?', '?'], ['?', '?', '?', '?'], ['?', '?', '?', '?'], ['?', '?', '?', '?']]

Instances 8 to 14 leave both boundaries unchanged: once the specific boundary has generalized to
['?' '?' '?' '?'], no further positive example can generalize it, and every negative example merely
resets entries of the generic boundary to '?'.

Final Specific_h:
['?' '?' '?' '?']
Final General_h:
[]

Observation: the PlayTennis concept is not representable as a single conjunction of attribute
constraints, so the version space collapses; the specific boundary generalizes to the maximally
general hypothesis and every row of the generic boundary is pruned.


1. EXPERIMENT NO: 3
2. TITLE: ID3 ALGORITHM
3. LEARNING OBJECTIVES:
• Make use of Data sets in implementing the machine learning algorithms.
• Implement ML concepts and algorithms in Python
4. AIM:
• Write a program to demonstrate the working of the decision tree-based ID3 algorithm. Use
an appropriate data set for building the decision tree and apply this knowledge to classify a
new sample.
5. THEORY:
• The ID3 algorithm is a basic algorithm that learns decision trees by constructing them top-down,
beginning with the question "which attribute should be tested at the root of the tree?".
• To answer this question, each instance attribute is evaluated using a statistical test to
determine how well it alone classifies the training examples. The best attribute is selected
and used as the test at the root node of the tree.
• A descendant of the root node is then created for each possible value of this attribute, and
the training examples are sorted to the appropriate descendant node (i.e., down the branch
corresponding to the example's value for this attribute).
• The entire process is then repeated using the training examples associated with each
descendant node to select the best attribute to test at that point in the tree.
• A simplified version of the algorithm, specialized to learning boolean-valued functions (i.e.,
concept learning), is described below.

Algorithm: ID3(Examples, Target_Attribute, Attributes)

Input: Examples are the training examples. Target_Attribute is the attribute whose value is to be
predicted by the tree. Attributes is a list of other attributes that may be tested by the learned
decision tree.
Output: Returns a decision tree that correctly classifies the given Examples
Method:
1. Create a Root node for the tree
2. If all Examples are positive, return the single-node tree Root, with label = +
3. If all Examples are negative, return the single-node tree Root, with label = -
4. If Attributes is empty,
       Return the single-node tree Root, with label = most common value of
       Target_Attribute in Examples
   Else
       A <- the attribute from Attributes that best classifies Examples
       The decision attribute for Root <- A
       For each possible value, vi, of A:
           Add a new tree branch below Root, corresponding to the test A = vi
           Let Examples_vi be the subset of Examples that have value vi for A
           If Examples_vi is empty, then below this new branch add a leaf node with label = most
           common value of Target_Attribute in Examples
           Else below this new branch add the subtree ID3(Examples_vi, Target_Attribute, Attributes - {A})
   End
Return Root
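To make the attribute-selection step concrete, the following sketch computes the entropy of the
PlayTennis target (9 positive, 5 negative examples) and the information gain of Outlook, the
attribute that ends up at the root of the tree. Note that the program below actually ranks
attributes by the closely related gain-ratio measure; the counts here are taken from the data set
used in Experiment 1.

import math

def entropy(pos, neg):
    # Entropy of a two-class sample with pos positive and neg negative examples
    total = pos + neg
    e = 0.0
    for c in (pos, neg):
        if c:
            p = c / total
            e -= p * math.log2(p)
    return e

# PlayTennis overall: 9 positive, 5 negative examples
root = entropy(9, 5)  # ~0.940

# Outlook partitions the data: Sunny (2+,3-), Overcast (4+,0-), Rain (3+,2-)
gain = root - (5/14) * entropy(2, 3) - (4/14) * entropy(4, 0) - (5/14) * entropy(3, 2)
print(round(gain, 3))  # ~0.247, the largest gain of any attribute, so Outlook is the root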
6. PROCEDURE / PROGRAMME :


import numpy as np
import math
import csv

def read_data(filename):
    with open(filename, 'r') as csvfile:
        datareader = csv.reader(csvfile, delimiter=',')
        headers = next(datareader)
        metadata = []
        traindata = []
        for name in headers:
            metadata.append(name)
        for row in datareader:
            traindata.append(row)
    return (metadata, traindata)

class Node:
    def __init__(self, attribute):
        self.attribute = attribute
        self.children = []
        self.answer = ""

    def __str__(self):
        return self.attribute

def subtables(data, col, delete):
    dict = {}
    items = np.unique(data[:, col])
    count = np.zeros((items.shape[0], 1), dtype=np.int32)

    for x in range(items.shape[0]):
        for y in range(data.shape[0]):
            if data[y, col] == items[x]:
                count[x] += 1

    for x in range(items.shape[0]):
        dict[items[x]] = np.empty((int(count[x]), data.shape[1]), dtype="|S32")
        pos = 0
        for y in range(data.shape[0]):
            if data[y, col] == items[x]:
                dict[items[x]][pos] = data[y]
                pos += 1
        if delete:
            dict[items[x]] = np.delete(dict[items[x]], col, 1)

    return items, dict

def entropy(S):
    items = np.unique(S)
    if items.size == 1:
        return 0

    counts = np.zeros((items.shape[0], 1))
    sums = 0

    for x in range(items.shape[0]):
        counts[x] = sum(S == items[x]) / (S.size * 1.0)

    for count in counts:
        sums += -1 * count * math.log(count, 2)
    return sums

def gain_ratio(data, col):
    items, dict = subtables(data, col, delete=False)

    total_size = data.shape[0]
    entropies = np.zeros((items.shape[0], 1))
    intrinsic = np.zeros((items.shape[0], 1))

    for x in range(items.shape[0]):
        ratio = dict[items[x]].shape[0] / (total_size * 1.0)
        entropies[x] = ratio * entropy(dict[items[x]][:, -1])
        intrinsic[x] = ratio * math.log(ratio, 2)

    total_entropy = entropy(data[:, -1])
    iv = -1 * sum(intrinsic)

    for x in range(entropies.shape[0]):
        total_entropy -= entropies[x]

    return total_entropy / iv

def create_node(data, metadata):
    if (np.unique(data[:, -1])).shape[0] == 1:
        node = Node("")
        node.answer = np.unique(data[:, -1])[0]
        return node

    gains = np.zeros((data.shape[1] - 1, 1))
    for col in range(data.shape[1] - 1):
        gains[col] = gain_ratio(data, col)

    split = np.argmax(gains)

    node = Node(metadata[split])
    metadata = np.delete(metadata, split, 0)

    items, dict = subtables(data, split, delete=True)

    for x in range(items.shape[0]):
        child = create_node(dict[items[x]], metadata)
        node.children.append((items[x], child))

    return node

def empty(size):
    s = ""
    for x in range(size):
        s += " "
    return s

def print_tree(node, level):
    if node.answer != "":
        print(empty(level), node.answer)
        return
    print(empty(level), node.attribute)
    for value, n in node.children:
        print(empty(level + 1), value)
        print_tree(n, level + 2)

metadata, traindata = read_data("tennisdata.csv")
data = np.array(traindata)
node = create_node(data, metadata)
print_tree(node, 0)

7. RESULTS & CONCLUSIONS:

 Outlook
  Overcast
   b'Yes'
  Rainy
   Windy
    b'False'
     b'Yes'
    b'True'
     b'No'
  Sunny
   Humidity
    b'High'
     b'No'
    b'Normal'
     b'Yes'


1. EXPERIMENT NO: 4
2. TITLE: BACKPROPAGATION ALGORITHM
3. LEARNING OBJECTIVES:
• Make use of Data sets in implementing the machine learning algorithms.
• Implement ML concepts and algorithms in Python
4. AIM:
• Build an Artificial Neural Network by implementing the Backpropagation algorithm and
test the same using appropriate data sets.
5. THEORY:
• Artificial neural networks (ANNs) provide a general, practical method for learning
real-valued, discrete-valued, and vector-valued functions from examples.
• Algorithms such as BACKPROPAGATION use gradient descent to tune network parameters
to best fit a training set of input-output pairs.
• ANN learning is robust to errors in the training data and has been successfully applied
to problems such as interpreting visual scenes, speech recognition, and learning robot
control strategies.

Backpropagation algorithm
1. Create a feed-forward network with n_in inputs, n_hidden hidden units, and n_out output units.
2. Initialize each weight w_i,j to some small random value (e.g., between -0.05 and 0.05).
3. Until the termination condition is met, do
   For each training example <(x1, ..., xn), t>, do
   // Propagate the input forward through the network:
   a. Input the instance (x1, ..., xn) to the network and compute the network output o_k for every unit.
   // Propagate the errors backward through the network:
   b. For each output unit k, calculate its error term delta_k:  delta_k = o_k (1 - o_k)(t_k - o_k)
   c. For each hidden unit h, calculate its error term delta_h:  delta_h = o_h (1 - o_h) Σ_k w_h,k delta_k
   d. For each network weight w_i,j do:  w_i,j = w_i,j + Δw_i,j, where Δw_i,j = η delta_j x_i,j

6. PROCEDURE / PROGRAMME :
import numpy as np

class NeuralNetwork:
    def __init__(self, input_size, hidden_size, output_size):
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.output_size = output_size

        # Initialize weights
        self.weights_input_hidden = np.random.randn(self.input_size, self.hidden_size)
        self.weights_hidden_output = np.random.randn(self.hidden_size, self.output_size)

        # Initialize the biases
        self.bias_hidden = np.zeros((1, self.hidden_size))
        self.bias_output = np.zeros((1, self.output_size))

    def sigmoid(self, x):
        return 1 / (1 + np.exp(-x))

    def sigmoid_derivative(self, x):
        return x * (1 - x)

    def feedforward(self, X):
        # Input to hidden
        self.hidden_activation = np.dot(X, self.weights_input_hidden) + self.bias_hidden
        self.hidden_output = self.sigmoid(self.hidden_activation)

        # Hidden to output
        self.output_activation = np.dot(self.hidden_output, self.weights_hidden_output) + self.bias_output
        self.predicted_output = self.sigmoid(self.output_activation)

        return self.predicted_output

    def backward(self, X, y, learning_rate):
        # Compute the output layer error
        output_error = y - self.predicted_output
        output_delta = output_error * self.sigmoid_derivative(self.predicted_output)

        # Compute the hidden layer error
        hidden_error = np.dot(output_delta, self.weights_hidden_output.T)
        hidden_delta = hidden_error * self.sigmoid_derivative(self.hidden_output)

        # Update weights and biases
        self.weights_hidden_output += np.dot(self.hidden_output.T, output_delta) * learning_rate
        self.bias_output += np.sum(output_delta, axis=0, keepdims=True) * learning_rate
        self.weights_input_hidden += np.dot(X.T, hidden_delta) * learning_rate
        self.bias_hidden += np.sum(hidden_delta, axis=0, keepdims=True) * learning_rate

    def train(self, X, y, epochs, learning_rate):
        for epoch in range(epochs):
            output = self.feedforward(X)
            self.backward(X, y, learning_rate)
            if epoch % 4000 == 0:
                loss = np.mean(np.square(y - output))
                print(f"Epoch {epoch}, Loss:{loss}")

# XOR training data
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([[0], [1], [1], [0]])

nn = NeuralNetwork(input_size=2, hidden_size=4, output_size=1)
nn.train(X, y, epochs=10000, learning_rate=0.1)

# Test the trained model
output = nn.feedforward(X)
print("Predictions after training:")
print(output)


7.RESULTS&CONCLUSIONS:

Epoch 0, Loss:0.2673109184298244
Epoch 4000, Loss:0.00889817576083016
Epoch 8000, Loss:0.002513476471712426
Predictions after training:
[[0.02631168]
[0.95830528]
[0.95885006]
[0.0553113 ]]

1. EXPERIMENT NO: 5
2. TITLE: NAÏVE BAYESIAN CLASSIFIER
3. LEARNING OBJECTIVES:
• Make use of Data sets in implementing the machine learning algorithms.
• Implement ML concepts and algorithms in Python

4. AIM:
• Write a program to implement the naïve Bayesian classifier for a sample training data set
stored as a .CSV file. Compute the accuracy of the classifier, considering few test data sets.


5. THEORY:
Naive Bayes algorithm : Naive Bayes algorithm is a classification technique based on Bayes’
Theorem with an assumption of independence among predictors. In simple terms, a Naive Bayes
classifier assumes that the presence of a particular feature in a class is unrelated to the presence of
any other feature. For example, a fruit may be considered to be an apple if it is red, round, and
about 3 inches in diameter. Even if these features depend on each other or upon the existence of the
other features, all of these properties independently contribute to the probability that this fruit is an
apple and that is why it is known as ‘Naive’.
Naive Bayes model is easy to build and particularly useful for very large data sets. Along with
simplicity, Naive Bayes is known to outperform even highly sophisticated classification methods.
Bayes' theorem provides a way of calculating the posterior probability P(c|x) from P(c), P(x) and P(x|c):

P(c|x) = P(x|c) P(c) / P(x)

where
P(c|x) is the posterior probability of class (c, target) given predictor (x, attributes).
P(c) is the prior probability of the class.
P(x|c) is the likelihood, which is the probability of the predictor given the class.
P(x) is the prior probability of the predictor.

The naive Bayes classifier applies to learning tasks where each instance x is described by a
conjunction of attribute values and where the target function f(x) can take on any value from some
finite set V. A set of training examples of the target function is provided, and a new instance is
presented, described by the tuple of attribute values (a1, a2, ..., an). The learner is asked to predict
the target value, or classification, for this new instance.

The Bayesian approach to classifying the new instance is to assign the most probable target value,
v_MAP, given the attribute values (a1, a2, ..., an) that describe the instance:

v_MAP = argmax_{vj in V} P(vj | a1, a2, ..., an)

We can use Bayes' theorem to rewrite this expression as

v_MAP = argmax_{vj in V} P(a1, a2, ..., an | vj) P(vj) / P(a1, a2, ..., an)
      = argmax_{vj in V} P(a1, a2, ..., an | vj) P(vj)

Now we could attempt to estimate the two terms in this expression based on the training data. It is
easy to estimate each of the P(vj) simply by counting the frequency with which each target value vj
occurs in the training data.

The naive Bayes classifier is based on the simplifying assumption that the attribute values are
conditionally independent given the target value. In other words, the assumption is that given the
target value of the instance, the probability of observing the conjunction a1, a2, ..., an is just the
product of the probabilities for the individual attributes: P(a1, a2, ..., an | vj) = Πi P(ai | vj).
Substituting this, we have the approach used by the naive Bayes classifier:

v_NB = argmax_{vj in V} P(vj) Πi P(ai | vj)

where v_NB denotes the target value output by the naive Bayes classifier.

When dealing with continuous data, a typical assumption is that the continuous values associated
with each class are distributed according to a Gaussian distribution. For example, suppose the
training data contains a continuous attribute x. We first segment the data by the class, and then
compute the mean and variance of x in each class.

Let μk be the mean of the values in x associated with class Ck, and let σk² be the variance of the
values in x associated with class Ck. Suppose we have collected some observation value v. Then
the probability distribution of v given a class Ck, p(x = v | Ck), can be computed by plugging v into
the equation for a normal distribution parameterized by μk and σk². That is

p(x = v | Ck) = (1 / sqrt(2π σk²)) exp(-(v - μk)² / (2 σk²))
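As a minimal sketch of this estimate (the sample values below are illustrative, not taken from the
lab data set):

import numpy as np

def gaussian_likelihood(v, values):
    # P(x = v | Ck) under a normal distribution fitted to the class's observed values
    mu = np.mean(values)
    var = np.var(values)
    return np.exp(-(v - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

# e.g., values of a continuous attribute observed in one class, and a new observation v = 45
print(gaussian_likelihood(45, np.array([38, 42, 47, 51, 55])))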

The above method is adopted in our implementation of the program.

Pima Indian diabetes dataset
This dataset is originally from the National Institute of Diabetes and Digestive and Kidney
Diseases. The objective of the dataset is to diagnostically predict whether or not a patient has
diabetes, based on certain diagnostic measurements included in the dataset.

6. PROCEDURE / PROGRAMME :

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

salary_train = pd.read_csv('SalaryData_Train.csv')
salary_test = pd.read_csv('SalaryData_Test.csv')
salary_train.columns
salary_test.columns
string_columns = ['workclass', 'education', 'maritalstatus', 'occupation',
                  'relationship', 'race', 'sex', 'native']

from sklearn import preprocessing

# Encode the categorical columns as integer labels
label_encoder = preprocessing.LabelEncoder()
for i in string_columns:
    salary_train[i] = label_encoder.fit_transform(salary_train[i])
    salary_test[i] = label_encoder.fit_transform(salary_test[i])

col_names = list(salary_train.columns)
train_X = salary_train[col_names[0:13]]
train_Y = salary_train[col_names[13]]
test_x = salary_test[col_names[0:13]]
test_y = salary_test[col_names[13]]

######### Naive Bayes ##############

# Gaussian Naive Bayes
from sklearn.naive_bayes import GaussianNB

Gmodel = GaussianNB()
train_pred_gau = Gmodel.fit(train_X, train_Y).predict(train_X)
test_pred_gau = Gmodel.fit(train_X, train_Y).predict(test_x)

train_acc_gau = np.mean(train_pred_gau == train_Y)
test_acc_gau = np.mean(test_pred_gau == test_y)
train_acc_gau  # 0.795
test_acc_gau   # 0.794

# Multinomial Naive Bayes
from sklearn.naive_bayes import MultinomialNB

Mmodel = MultinomialNB()
train_pred_multi = Mmodel.fit(train_X, train_Y).predict(train_X)
test_pred_multi = Mmodel.fit(train_X, train_Y).predict(test_x)

train_acc_multi = np.mean(train_pred_multi == train_Y)
test_acc_multi = np.mean(test_pred_multi == test_y)
train_acc_multi  # 0.772
test_acc_multi   # 0.774
7. RESULTS & CONCLUSIONS:

[' <=50K' ' <=50K' ' <=50K' ... ' <=50K' ' >50K' ' <=50K']
[' <=50K' ' <=50K' ' <=50K' ... ' <=50K' ' >50K' ' <=50K']
0.7749667994687915


1. EXPERIMENT NO: 6
2. TITLE: DOCUMENT CLASSIFICATION USING NAÏVE BAYESIAN CLASSIFIER
3. LEARNING OBJECTIVES:
• Make use of Data sets in implementing the machine learning algorithms.
• Implement ML concepts and algorithms in Python
4. AIM:
• Assuming a set of documents that need to be classified, use the naïve Bayesian
Classifier model to perform this task. Built-in Java classes/API can be used to write the
program. Calculate the accuracy, precision, and recall for your data set.
5. THEORY:
For the theory of the naïve Bayesian classifier, refer to Experiment No. 5. The theory of
performance analysis is elaborated here.
Analysis of Document Classification

• For classification tasks, the terms true positives, true negatives, false positives, and
false negatives compare the results of the classifier under test with trusted external
judgments. The terms positive and negative refer to the classifier's prediction (sometimes
known as the expectation), and the terms true and false refer to whether that prediction
corresponds to the external judgment (sometimes known as the observation).
• Precision - Precision is the ratio of correctly predicted positive documents to the total
predicted positive documents. High precision relates to a low false positive rate.
Precision = (Σ True positive) / (Σ True positive + Σ False positive)
• Recall (Sensitivity) - Recall is the ratio of correctly predicted positive documents to all
observations in the actual class.
Recall = (Σ True positive) / (Σ True positive + Σ False negative)
• Accuracy - Accuracy is the most intuitive performance measure: it is simply the ratio of
correctly predicted observations to the total observations. One may think that a model with
high accuracy is the best model. Accuracy is a great measure, but only for symmetric data
sets where the counts of false positives and false negatives are almost the same; otherwise,
other parameters must also be examined to evaluate the performance of the model. For our
model, we obtained 0.803, which means the model is approximately 80% accurate.
Accuracy = (Σ True positive + Σ True negative) / (Σ Total population)
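The manual provides no listing for this experiment, so the following is a minimal Python sketch
of one way to carry it out with scikit-learn. The 20 Newsgroups corpus and the two chosen
categories are assumptions made for illustration, not prescribed by the syllabus.

from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn import metrics

# Load a two-class subset of the 20 Newsgroups documents
categories = ['rec.sport.baseball', 'sci.med']
train = fetch_20newsgroups(subset='train', categories=categories)
test = fetch_20newsgroups(subset='test', categories=categories)

# Turn each document into a bag-of-words count vector
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train.data)
X_test = vectorizer.transform(test.data)

# Train the naive Bayesian classifier and predict on the test documents
clf = MultinomialNB().fit(X_train, train.target)
predicted = clf.predict(X_test)

# Report the measures defined above
print("Accuracy :", metrics.accuracy_score(test.target, predicted))
print("Precision:", metrics.precision_score(test.target, predicted))
print("Recall   :", metrics.recall_score(test.target, predicted))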


1. EXPERIMENT NO: 7
2. TITLE: BAYESIAN NETWORK
3. LEARNING OBJECTIVES:
• Make use of Data sets in implementing the machine learning algorithms.
• Implement ML concepts and algorithms in Python

4. AIM:
• Write a program to construct a Bayesian network considering medical data. Use this model to
demonstrate the diagnosis of heart patients using standard Heart Disease Data Set. You can
use Java/Python ML library classes/API.

5. THEORY:
• Bayesian networks are very convenient for representing probabilistic relationships
between multiple events.
• Bayesian networks as graphs - People usually represent Bayesian networks as directed
graphs in which each node is a hypothesis or a random process; in other words, something
that takes at least two possible values to which probabilities can be assigned. For example,
there can be a node that represents the state of the dog (barking or not barking at the
window), the weather (raining or not raining), etc.
• The arrows between nodes represent the conditional probabilities between them: how
information about the state of one node changes the probability distribution of another
node it is connected to.
6. PROCEDURE / PROGRAMME :
import pandas as pd
data = pd.read_csv("/content/drive/MyDrive/Colab Notebooks/heartdisease.csv")
heart_disease = pd.DataFrame(data)
print(heart_disease)

from pgmpy.models import BayesianModel

model = BayesianModel([
    ('age', 'Lifestyle'),
    ('Gender', 'Lifestyle'),
    ('Family', 'heartdisease'),
    ('Lifestyle', 'diet'),
    ('diet', 'cholestrol'),
    ('cholestrol', 'heartdisease')
])

from pgmpy.estimators import MaximumLikelihoodEstimator

model.fit(heart_disease, estimator=MaximumLikelihoodEstimator)

from pgmpy.inference import VariableElimination

HeartDisease_infer = VariableElimination(model)

print('For age Enter { SuperSeniorCitizen:0, SeniorCitizen:1, MiddleAged:2, Youth:3, Teen:4 }')
print('For Gender Enter { Male:0, Female:1 }')
print('For Family History Enter { yes:1, No:0 }')
print('For diet Enter { High:0, Medium:1 }')
print('For lifeStyle Enter { Athlete:0, Active:1, Moderate:2, Sedentary:3 }')
print('For cholesterol Enter { High:0, BorderLine:1, Normal:2 }')

q = HeartDisease_infer.query(variables=['heartdisease'], evidence={
    'age': int(input('Enter age :')),
    'Gender': int(input('Enter Gender :')),
    'Family': int(input('Enter Family history :')),
    'diet': int(input('Enter diet :')),
    'Lifestyle': int(input('Enter Lifestyle :')),
    'cholestrol': int(input('Enter cholestrol :'))
})

print(q)


age Gender Family diet Lifestyle cholestrol heartdisease


0 0 0 1 1 3 0 1
1 0 1 1 1 3 0 1
2 1 0 0 0 2 1 1
3 4 0 1 1 3 2 0
4 3 1 1 0 0 2 0
5 2 0 1 1 1 0 1
6 4 0 1 0 2 0 1
7 0 0 1 1 3 0 1
8 3 1 1 0 0 2 0
9 1 1 0 0 0 2 1
10 4 1 0 1 2 0 1
11 4 0 1 1 3 2 0
12 2 1 0 0 0 0 0
13 2 0 1 1 1 0 1
14 3 1 1 0 0 1 0
15 0 0 1 0 0 2 1
16 1 1 0 1 2 1 1
17 3 1 1 1 0 1 0
18 4 0 1 1 3 2 0
WARNING:pgmpy:BayesianModel has been renamed to BayesianNetwork. Please use BayesianNetwork
class, BayesianModel will be removed in future.
For age Enter { SuperSeniorCitizen:0, SeniorCitizen:1, MiddleAged:2, Youth:3, Teen:4 }
For Gender Enter { Male:0, Female:1 }
For Family History Enter { yes:1, No:0 }
For diet Enter { High:0, Medium:1 }
For lifeStyle Enter { Athlete:0, Active:1, Moderate:2, Sedentary:3 }
For cholesterol Enter { High:0, BorderLine:1, Normal:2 }
Enter age :1
Enter Gender :1
Enter Family history :1
Enter diet :1
Enter Lifestyle :2
Enter cholestrol :1
WARNING:pgmpy:BayesianModel has been renamed to BayesianNetwork. Please use BayesianNetwork
class, BayesianModel will be removed in future.
WARNING:pgmpy:BayesianModel has been renamed to BayesianNetwork. Please use BayesianNetwork
class, BayesianModel will be removed in future.
+-----------------+---------------------+
| heartdisease | phi(heartdisease) |
+=================+=====================+
| heartdisease(0) | 1.0000 |
+-----------------+---------------------+
| heartdisease(1) | 0.0000 |
+-----------------+---------------------+


8. LEARNING OUTCOMES :
• The student will be able to apply a Bayesian network to medical data and demonstrate the
diagnosis of heart patients using the standard Heart Disease Data Set.


1. EXPERIMENT NO: 8
2. TITLE: CLUSTERING BASED ON EM ALGORITHM AND K-MEANS
3. LEARNING OBJECTIVES:
• Make use of Data sets in implementing the machine learning algorithms.
• Implement ML concepts and algorithms in Python
4. AIM: Apply EM algorithm to cluster a set of data stored in a .CSV file. Use the same data set for
clustering using k-Means algorithm. Compare the results of these two algorithms and comment on
the quality of clustering. You can add Java/Python ML library classes/API in the program.


5. THEORY:
Expectation Maximization algorithm
• The basic approach and logic of this clustering method is as follows.
• Suppose we measure a single continuous variable in a large sample of observations.
Further, suppose that the sample consists of two clusters of observations with different
means (and perhaps different standard deviations); within each sample, the distribution of
values for the continuous variable follows the normal distribution.
• The goal of EM clustering is to estimate the means and standard deviations for each
cluster so as to maximize the likelihood of the observed data (distribution).
• Put another way, the EM algorithm attempts to approximate the observed distributions of
values based on mixtures of different distributions in different clusters. The results of EM
clustering are different from those computed by k-means clustering.
• The latter will assign observations to clusters to maximize the distances between clusters.
The EM algorithm does not compute actual assignments of observations to clusters, but
classification probabilities.
• In other words, each observation belongs to each cluster with a certain probability. Of
course, as a final result we can usually review an actual assignment of observations to
clusters, based on the (largest) classification probability.
K means Clustering
• The algorithm will categorize the items into k groups of similarity. To calculate
that similarity, we will use the Euclidean distance as the measurement.
• The algorithm works as follows:
1. First we initialize k points, called means, randomly.
2. We categorize each item to its closest mean and we update the mean’s coordinates,
which are the averages of the items categorized in that mean so far.
3. We repeat the process for a given number of iterations and at the end, we have our
clusters.
• The “points” mentioned above are called means, because they hold the mean values of the
items categorized in it. To initialize these means, we have a lot of options. An intuitive
method is to initialize the means at random items in the data set. Another method is to
initialize the means at random values between the boundaries of the data set (if for a feature
x the items have values in [0,3], we will initialize the means with values for x at [0,3]).
• Pseudocode:
1. Initialize k means with random values
2. For a given number of iterations:
Iterate through items:
Find the mean closest to the item
Assign item to mean
Update mean
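As a concrete companion to this pseudocode, here is a minimal from-scratch sketch in plain
NumPy; the sample points and the fixed iteration count are illustrative assumptions, not part of
the experiment's data set.

import numpy as np

def kmeans(items, k, iterations=100, seed=0):
    rng = np.random.default_rng(seed)
    # 1. Initialize the k means at randomly chosen items
    means = items[rng.choice(len(items), k, replace=False)]
    for _ in range(iterations):
        # 2. Categorize each item to its closest mean (Euclidean distance)
        labels = np.argmin(np.linalg.norm(items[:, None] - means[None, :], axis=2), axis=1)
        # 3. Update each mean to the average of the items assigned to it
        for j in range(k):
            if np.any(labels == j):
                means[j] = items[labels == j].mean(axis=0)
    return means, labels

data = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.1], [4.8, 5.3]])
means, labels = kmeans(data, k=2)
print(means, labels)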


6. PROCEDURE / PROGRAMME :

from sklearn.cluster import KMeans


from sklearn import preprocessing
from sklearn.mixture import GaussianMixture
from sklearn.datasets import load_iris
import sklearn.metrics as sm
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
dataset=load_iris()
# print(dataset)
X=pd.DataFrame(dataset.data)
X.columns=['Sepal_Length','Sepal_Width','Petal_Length','Petal_Width']
y=pd.DataFrame(dataset.target)
y.columns=['Targets']
# print(X)
plt.figure(figsize=(14,7))
colormap=np.array(['red','lime','black'])

# REAL PLOT
plt.subplot(1,3,1)
plt.scatter(X.Petal_Length,X.Petal_Width,c=colormap[y.Targets],s=40)
plt.title('Real')

# K-PLOT
plt.subplot(1,3,2)
model=KMeans(n_clusters=3)
model.fit(X)
predY=np.choose(model.labels_,[0,1,2]).astype(np.int64)
plt.scatter(X.Petal_Length,X.Petal_Width,c=colormap[predY],s=40)
plt.title('KMeans')

# GMM PLOT
scaler=preprocessing.StandardScaler()
scaler.fit(X)
xsa=scaler.transform(X)
xs=pd.DataFrame(xsa,columns=X.columns)
gmm=GaussianMixture(n_components=3)
gmm.fit(xs)
y_cluster_gmm=gmm.predict(xs)
plt.subplot(1,3,3)
plt.scatter(X.Petal_Length,X.Petal_Width,c=colormap[y_cluster_gmm],s=40)
plt.title('GMM Classification')

Course Teacher: Mrs.C.Eben Exceline,Mrs.S.Santhiya Assistant Professor, Department of AI&DS,


Excel Engineering College, Komarapalayam-637303
COURSE: 20AI505 MACHINE LEARNING LABORATORY

7. RESULTS & CONCLUSIONS:

Sample Output
Text(0.5, 1.0, 'GMM Classification')
[Figure: three side-by-side Petal_Length vs Petal_Width scatter plots titled 'Real', 'KMeans' and 'GMM Classification']

Observation: The GMM clustering (fitted with the EM algorithm) matched the true labels more
closely than k-Means.
8. LEARNING OUTCOMES :
• The students will be able to apply the EM algorithm and the k-Means algorithm for clustering
and analyse the results.


1. EXPERIMENT NO: 9
2. TITLE: K-NEAREST NEIGHBOUR
3. LEARNING OBJECTIVES:
• Make use of Data sets in implementing the machine learning algorithms.
• Implement ML concepts and algorithms in Python

4. AIM:
• Write a program to implement k-Nearest Neighbour algorithm to classify the iris data set.
Print both correct and wrong predictions. Java/Python ML library classes can be used for
this problem.

5. THEORY:
• K-Nearest Neighbors is one of the most basic yet essential classification algorithms
in Machine Learning. It belongs to the supervised learning domain and finds intense
application in pattern recognition, data mining and intrusion detection.
• It is widely applicable in real-life scenarios since it is non-parametric, meaning it does not
make any underlying assumptions about the distribution of the data.
• Algorithm
Input: Let m be the number of training data samples. Let p be an unknown point.
Method:
1. Store the training samples in an array of data points arr[]. This means each
element of this array represents a tuple (x, y).
2. For i = 0 to m:
   Calculate the Euclidean distance d(arr[i], p).
3. Make a set S of the K smallest distances obtained. Each of these distances corresponds to
an already classified data point.
4. Return the majority label among S.
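A minimal from-scratch sketch of this method in NumPy; the training points and the query point
are illustrative assumptions, not the iris data used by the program below.

import numpy as np
from collections import Counter

def knn_predict(train_X, train_y, p, k=3):
    # Distances from the unknown point p to every stored sample
    distances = np.linalg.norm(train_X - p, axis=1)
    # Indices of the K smallest distances
    nearest = np.argsort(distances)[:k]
    # Majority label among the K neighbours
    return Counter(train_y[nearest]).most_common(1)[0][0]

train_X = np.array([[1, 1], [1, 2], [5, 5], [6, 5]])
train_y = np.array([0, 0, 1, 1])
print(knn_predict(train_X, train_y, np.array([5.5, 5.0]), k=3))  # -> 1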

6. PROCEDURE / PROGRAMME :

from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
import numpy as np

dataset = load_iris()
# print(dataset)
X_train, X_test, y_train, y_test = train_test_split(dataset["data"], dataset["target"], random_state=0)

kn = KNeighborsClassifier(n_neighbors=1)
kn.fit(X_train, y_train)

for i in range(len(X_test)):
    x = X_test[i]
    x_new = np.array([x])
    prediction = kn.predict(x_new)
    print("TARGET=", y_test[i], dataset["target_names"][y_test[i]],
          "PREDICTED=", prediction, dataset["target_names"][prediction])

print(kn.score(X_test, y_test))
7. RESULTS & CONCLUSIONS:

TARGET= 2 virginica PREDICTED= [2] ['virginica']


TARGET= 1 versicolor PREDICTED= [1] ['versicolor']
TARGET= 0 setosa PREDICTED= [0] ['setosa']
TARGET= 2 virginica PREDICTED= [2] ['virginica']
TARGET= 0 setosa PREDICTED= [0] ['setosa']
TARGET= 2 virginica PREDICTED= [2] ['virginica']
TARGET= 0 setosa PREDICTED= [0] ['setosa']
TARGET= 1 versicolor PREDICTED= [1] ['versicolor']
TARGET= 1 versicolor PREDICTED= [1] ['versicolor']
TARGET= 1 versicolor PREDICTED= [1] ['versicolor']
TARGET= 2 virginica PREDICTED= [2] ['virginica']
TARGET= 1 versicolor PREDICTED= [1] ['versicolor']
TARGET= 1 versicolor PREDICTED= [1] ['versicolor']
TARGET= 1 versicolor PREDICTED= [1] ['versicolor']
TARGET= 1 versicolor PREDICTED= [1] ['versicolor']
TARGET= 0 setosa PREDICTED= [0] ['setosa']
TARGET= 1 versicolor PREDICTED= [1] ['versicolor']
TARGET= 1 versicolor PREDICTED= [1] ['versicolor']
TARGET= 0 setosa PREDICTED= [0] ['setosa']
TARGET= 0 setosa PREDICTED= [0] ['setosa']
TARGET= 2 virginica PREDICTED= [2] ['virginica']
TARGET= 1 versicolor PREDICTED= [1] ['versicolor']
TARGET= 0 setosa PREDICTED= [0] ['setosa']
TARGET= 0 setosa PREDICTED= [0] ['setosa']
TARGET= 2 virginica PREDICTED= [2] ['virginica']
TARGET= 0 setosa PREDICTED= [0] ['setosa']
TARGET= 0 setosa PREDICTED= [0] ['setosa']
TARGET= 1 versicolor PREDICTED= [1] ['versicolor']
TARGET= 1 versicolor PREDICTED= [1] ['versicolor']
TARGET= 0 setosa PREDICTED= [0] ['setosa']
TARGET= 2 virginica PREDICTED= [2] ['virginica']
TARGET= 1 versicolor PREDICTED= [1] ['versicolor']
TARGET= 0 setosa PREDICTED= [0] ['setosa']
TARGET= 2 virginica PREDICTED= [2] ['virginica']
TARGET= 2 virginica PREDICTED= [2] ['virginica']
TARGET= 1 versicolor PREDICTED= [1] ['versicolor']
TARGET= 0 setosa PREDICTED= [0] ['setosa']
TARGET= 1 versicolor PREDICTED= [2] ['virginica']
0.9736842105263158

8. LEARNING OUTCOMES :
• The student will be able to implement the k-Nearest Neighbour algorithm to classify the iris
data set and print both correct and wrong predictions.


1. EXPERIMENT NO: 10
2. TITLE: LOCALLY WEIGHTED REGRESSION ALGORITHM
3. LEARNING OBJECTIVES:
• Make use of Data sets in implementing the machine learning algorithms.
• Implement ML concepts and algorithms in Python
4. AIM:
• Implement the non-parametric Locally Weighted Regression algorithm in order to fit data
points. Select appropriate data set for your experiment and draw graphs.
5. THEORY:
• Given a dataset X, y, we attempt to find a linear model h(x) that minimizes the residual sum
of squared errors. The solution is given by the normal equations: beta = (XᵀX)⁻¹ Xᵀy.
• A linear model can only fit a straight line; however, it can be empowered by polynomial
features to get more powerful models. Still, we have to decide and fix the number and
types of features ahead of time.
• An alternate approach is given by locally weighted regression.
• Given a dataset X, y, we attempt to find a model h(x) that minimizes the residual sum of
weighted squared errors.
• The weights are given by a kernel function, which can be chosen arbitrarily; here a
Gaussian kernel is chosen.
• The solution is very similar to the normal equations; we only need to insert the diagonal
weight matrix W: beta = (XᵀWX)⁻¹ XᵀWy, with W_ii = exp(-||x_i - x0||² / (2 τ²)).

Algorithm
def local_regression(x0, X, Y, tau):
    # add bias term
    x0 = np.r_[1, x0]
    X = np.c_[np.ones(len(X)), X]

    # fit model: normal equations with kernel
    xw = X.T * radial_kernel(x0, X, tau)
    beta = np.linalg.pinv(xw @ X) @ xw @ Y

    # predict value
    return x0 @ beta

def radial_kernel(x0, X, tau):
    return np.exp(np.sum((X - x0) ** 2, axis=1) / (-2 * tau * tau))

6. PROCEDURE / PROGRAMME :
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

def kernel(point, xmat, k):
    m, n = np.shape(xmat)
    weights = np.mat(np.eye(m))
    for j in range(m):
        diff = point - xmat[j]
        weights[j, j] = np.exp(diff * diff.T / (-2.0 * k ** 2))
    return weights

def localWeight(point, xmat, ymat, k):
    wei = kernel(point, xmat, k)
    W = (xmat.T * (wei * xmat)).I * (xmat.T * (wei * ymat.T))
    return W

def localWeightRegression(xmat, ymat, k):
    m, n = np.shape(xmat)
    ypred = np.zeros(m)
    for i in range(m):
        ypred[i] = xmat[i] * localWeight(xmat[i], xmat, ymat, k)
    return ypred

# load data points
data = pd.read_csv('10-dataset.csv')
bill = np.array(data.total_bill)
tip = np.array(data.tip)

# prepare the design matrix: add a column of ones to bill
mbill = np.mat(bill)
mtip = np.mat(tip)
m = np.shape(mbill)[1]
one = np.mat(np.ones(m))
X = np.hstack((one.T, mbill.T))

# set the bandwidth k here
ypred = localWeightRegression(X, mtip, 0.5)
SortIndex = X[:, 1].argsort(0)
xsort = X[SortIndex][:, 0]

fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
ax.scatter(bill, tip, color='green')
ax.plot(xsort[:, 1], ypred[SortIndex], color='red', linewidth=5)
plt.xlabel('Total bill')
plt.ylabel('Tip')
plt.show()
7. RESULTS & CONCLUSIONS:

[Figure: scatter plot of the tips data (Total bill on the x-axis, Tip on the y-axis, points in green) with the locally weighted regression fit drawn as a red curve]

8. LEARNING OUTCOMES :
• To understand and implement locally weighted regression and analyse the results with
changes in the parameters.
