Department of Computer Engineering: Experiment No.6
Theory / Algorithm:
Bayes' Theorem states: P(c|x) = P(x|c) * P(c) / P(x)
• It assumes that the effect of an attribute value on class-membership probability is independent of the values of the other attributes. This is called conditional independence.
• Let x1, x2, x3, …, xn be the data set, with m attributes a1, a2, a3, …, am.
• Suppose there are classes c1, c2, …, cn. An unknown sample x is placed in the class whose conditional probability is the highest, i.e.,
P(c|x) = P(x|c) * P(c) / P(x)
• Since P(x) is constant for all classes, it is enough to compare the following (a short code sketch of this rule is given after this list):
o P(c|x) ∝ P(x|c) * P(c)
o where x = {x1, x2, x3, …, xn}
o P(c) = (no. of samples belonging to class c, si) / (total no. of samples, s) = si / s
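As a quick illustration of this decision rule, here is a minimal Python sketch. The priors and conditional probabilities below are made-up placeholder numbers for two hypothetical classes; they are not taken from the example data set that follows.

# Sketch of the naive Bayes decision rule: pick the class c that maximises
# P(c) * P(x1|c) * P(x2|c) * ... (P(x) is constant and can be ignored).
priors = {'c1': 0.6, 'c2': 0.4}              # P(c) = si / s  (placeholder values)
likelihoods = {                              # P(xj | c) for each attribute value of x
    'c1': [0.5, 0.2, 0.7],
    'c2': [0.3, 0.6, 0.4],
}

def classify(priors, likelihoods):
    scores = {}
    for c, prior in priors.items():
        score = prior
        for p_xj in likelihoods[c]:
            score *= p_xj                    # conditional independence assumption
        scores[c] = score                    # proportional to P(c | x)
    return max(scores, key=scores.get)

print(classify(priors, likelihoods))         # -> 'c1' (0.042 vs 0.0288)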
Example of Bayesian Classifier
Example No. Color Type Origin Stolen?
1 Red Sports Domestic Yes
2 Red Sports Domestic No
3 Red Sports Domestic Yes
4 Yellow Sports Domestic No
5 Yellow Sports Imported Yes
6 Yellow SUV Imported No
7 Yellow SUV Imported Yes
8 Yellow SUV Domestic No
9 Red SUV Imported No
10 Red Sports Imported Yes
Training Example
We want to classify a Red Domestic SUV. Note that there is no example of a Red Domestic SUV in our data set. Looking back at the decision rule above, we can see how to compute this: we need to calculate the probabilities
P(Red|Yes), P(SUV|Yes), P(Domestic|Yes),
P(Red|No), P(SUV|No), and P(Domestic|No),
and multiply them by P(Yes) and P(No) respectively. We estimate each of these conditional probabilities with the m-estimate
P(a|v) = (nc + m * p) / (n + m),
where n is the number of training examples with class v, nc is the number of those examples that also have attribute value a, p is a prior estimate of the probability, and m is the equivalent sample size.
Yes:                                No:
  Red:      n = 5, nc = 3            Red:      n = 5, nc = 2
  SUV:      n = 5, nc = 1            SUV:      n = 5, nc = 3
  Domestic: n = 5, nc = 2            Domestic: n = 5, nc = 3
(p = .5 and m = 3 for every attribute value)
Looking at P(Red|Yes), we have 5 cases where v = Yes, and in 3 of those cases the attribute value is Red. So for P(Red|Yes), n = 5 and nc = 3. Note that all attributes are binary (two possible values). We are assuming no other information, so p = 1 / (number of attribute values) = 0.5 for all of our attributes. Our m value is arbitrary (we will use m = 3) but consistent for all attributes. Now we simply apply the m-estimate formula using the precomputed values of n, nc, p, and m.
P(Red|Yes)      = (3 + 3*.5) / (5 + 3) = .56      P(Red|No)      = (2 + 3*.5) / (5 + 3) = .43
P(SUV|Yes)      = (1 + 3*.5) / (5 + 3) = .31      P(SUV|No)      = (3 + 3*.5) / (5 + 3) = .56
P(Domestic|Yes) = (2 + 3*.5) / (5 + 3) = .43      P(Domestic|No) = (3 + 3*.5) / (5 + 3) = .56
We have P(Yes) = .5 and P(No) = .5, so we can apply the decision rule. For v = Yes, we have
P(Yes) * P(Red|Yes) * P(SUV|Yes) * P(Domestic|Yes) = .5 * .56 * .31 * .43 = .037
and for v = No, we have
P(No) * P(Red|No) * P(SUV|No) * P(Domestic|No) = .5 * .43 * .56 * .56 = .069
Since .069 > .037, the example is classified as No: the Red Domestic SUV is predicted not to be stolen.
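The same arithmetic can be reproduced with a short script. This is only a sketch written for this write-up (it is not part of the lab code below); the counts are copied from the table and the m-estimate values above.

# Sketch: reproduce the hand calculation with the m-estimate
# P(a|v) = (nc + m*p) / (n + m), using the counts from the car data set.
m, p, n = 3, 0.5, 5                       # m, prior estimate p, and n = 5 examples per class
counts = {
    'Yes': {'Red': 3, 'SUV': 1, 'Domestic': 2},   # nc for each attribute value given Yes
    'No':  {'Red': 2, 'SUV': 3, 'Domestic': 3},   # nc for each attribute value given No
}
priors = {'Yes': 0.5, 'No': 0.5}

def m_estimate(nc):
    return (nc + m * p) / (n + m)

scores = {}
for label, ncs in counts.items():
    score = priors[label]
    for nc in ncs.values():
        score *= m_estimate(nc)           # multiply the conditional probabilities
    scores[label] = score

print(scores)                             # about 0.038 for Yes (the hand calculation rounds to .037) and 0.069 for No
print(max(scores, key=scores.get))        # -> 'No', matching the hand calculation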
Code
#Make Predictions with Naive Bayes on the Iris Dataset
from csv import reader
from math import sqrt
from math import exp
from math import pi
#Load CSV file
def load_csv(filename):
    dataset = list()
    with open(filename, 'r') as file:
        csv_reader = reader(file)
        for row in csv_reader:
            if not row:
                continue
            dataset.append(row)
    return dataset

#Convert String Columns to Float
def str_column_to_float(dataset, column):
    for row in dataset:
        row[column] = float(row[column].strip())

#Convert String Columns to Integer
def str_column_to_int(dataset, column):
    class_values = [row[column] for row in dataset]
    unique = set(class_values)
    lookup = dict()
    for i, value in enumerate(unique):
        lookup[value] = i
        print(value + ' => ' + str(i))
    for row in dataset:
        row[column] = lookup[row[column]]
    return lookup

#Split the dataset by class values, returns a dictionary
def separate_by_class(dataset):
    separated = dict()
    for i in range(len(dataset)):
        vector = dataset[i]
        class_value = vector[-1]
        if class_value not in separated:
            separated[class_value] = list()
        separated[class_value].append(vector)
    return separated

#Calculate the mean of a list of numbers
def mean(numbers):
    return sum(numbers) / float(len(numbers))

#Calculate the standard deviation of a list of numbers
def stdev(numbers):
    avg = mean(numbers)
    variance = sum([(x - avg) ** 2 for x in numbers]) / float(len(numbers) - 1)
    return sqrt(variance)

#Calculate the mean, stdev and count for each column in a dataset
def summarize_dataset(dataset):
    summaries = [(mean(column), stdev(column), len(column)) for column in zip(*dataset)]
    del(summaries[-1])  # drop the statistics for the class column
    return summaries

#Split dataset by class then calculate statistics for each class
def summarize_by_class(dataset):
    separated = separate_by_class(dataset)
    summaries = dict()
    for class_value, rows in separated.items():
        summaries[class_value] = summarize_dataset(rows)
    return summaries

#Calculate the Gaussian probability distribution function for x
def calculate_probability(x, mean, stdev):
    exponent = exp(-((x - mean) ** 2 / (2 * stdev ** 2)))
    return (1 / (sqrt(2 * pi) * stdev)) * exponent

#Calculate the probabilities of predicting each class for a given row
def calculate_class_probabilities(summaries, row):
    total_rows = sum([summaries[label][0][2] for label in summaries])
    probabilities = dict()
    for class_value, class_summaries in summaries.items():
        probabilities[class_value] = summaries[class_value][0][2] / float(total_rows)  # prior P(class)
        for i in range(len(class_summaries)):
            mean, stdev, _ = class_summaries[i]
            probabilities[class_value] *= calculate_probability(row[i], mean, stdev)
    return probabilities

#Predict the class for a given row
def predict(summaries, row):
    probabilities = calculate_class_probabilities(summaries, row)
    best_label, best_prob = None, -1
    for class_value, probability in probabilities.items():
        if best_label is None or probability > best_prob:
            best_prob = probability
            best_label = class_value
    return best_label

#Make a prediction with Naive Bayes on Iris Dataset
filename = r'D:\Python\iris.csv'
dataset = load_csv(filename)
for i in range(len(dataset[0]) - 1):
    str_column_to_float(dataset, i)
#Convert class column to integers
str_column_to_int(dataset, len(dataset[0]) - 1)
#Fit model
model = summarize_by_class(dataset)
#Define a new record
row = [6.9, 3.2, 5.7, 2.3]
#Predict the label
label = predict(model, row)
print('Data=' + str(row) + ' Predicted=' + str(label))
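As an optional cross-check (not part of the original program), the same record can be classified with scikit-learn's Gaussian Naive Bayes. This sketch assumes scikit-learn is installed and uses its bundled copy of the Iris data instead of the local iris.csv; its prediction should agree with the from-scratch model.

# Optional cross-check with scikit-learn (assumed installed); uses the bundled
# Iris data rather than the local iris.csv file.
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB

iris = load_iris()
clf = GaussianNB().fit(iris.data, iris.target)
pred = clf.predict([[6.9, 3.2, 5.7, 2.3]])[0]
print('Predicted:', pred, '=>', iris.target_names[pred])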
Output
Conclusion
Running the program first prints the mapping of class labels to integers and then fits the model on the entire dataset. There are three class labels: 0, 1 and 2. When a new observation is given, its class label is predicted. Here, our observation is predicted as belonging to class 2, which is "Iris-virginica".