KNN and Bayesian Methods
Classification
• Predicts categorical class labels (discrete or nominal)
• Constructs a model (classifier) from a training set whose tuples carry known values of a class label attribute, then uses the model to classify new data
Defn: Given a database D = {t1, t2, …, tn} of tuples and a set of classes C = {C1, C2, …, Cm}, the classification problem is to define a mapping f: D → C where each ti is assigned to one class Cj.
Prediction
•models continuous-valued functions, i.e., predicts
unknown or missing values
10/4/2024 By Dr. Kavita Bhosle 1
Typical applications
• Credit approval: classify an applicant as a good or poor credit risk
• Target marketing: build a profile of a good customer
• Medical diagnosis: develop a profile of stroke victims
• Fraud detection: determine whether a credit card purchase is fraudulent
Classification is a two-step process
• Learning step: a classifier is built from a training data set
• The training data set contains tuples whose attributes include one class label attribute
[Decision tree figure: internal nodes test attributes with No/Yes branches; leaves conclude Diagnosis = Allergy or Diagnosis = Cold]
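A learned decision tree of this shape reduces to nested tests. A minimal sketch; the tested symptoms (fever, sneezing) are assumptions, since only the leaf labels (Allergy, Cold) survive from the slide's figure:

```python
# Hand-coded decision tree sketch for the diagnosis example.
# The tested attributes (fever, sneezing) are hypothetical; the figure
# only shows the leaves Diagnosis = Allergy / Diagnosis = Cold.
def diagnose(fever, sneezing):
    if fever:          # first internal node, "Yes" branch
        return 'Cold'
    if sneezing:       # second internal node, "Yes" branch
        return 'Allergy'
    return 'Cold'
```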
• Classification as categorization
• Boundary-based: divides the input space into regions, one region per class
• Probabilistic: estimates a probability for each class and assigns the tuple to the class with the highest probability
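The probabilistic approach can be sketched in a few lines: estimate a probability for each class, then pick the class with the largest estimate. The class names and probabilities below are made-up placeholders:

```python
# Probabilistic classification: assign the tuple to the class whose
# estimated probability is highest. The estimates here are hypothetical.
def classify(class_probs):
    return max(class_probs, key=class_probs.get)

probs = {'Cold': 0.2, 'Allergy': 0.7, 'Flu': 0.1}  # hypothetical estimates
print(classify(probs))  # Allergy
```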
[Scatter plot of the training data: salary (y) versus years of experience (x); visible points include (1, 20), (3, 36), (6, 43), (11, 59), (13, 72), (16, 83), (21, 90)]
For the sample data the means are x̄ = 9.1 and ȳ = 55.4. The least-squares coefficients are

W1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)²
   = [(3 − 9.1)(30 − 55.4) + …] / [(3 − 9.1)² + (8 − 9.1)² + …] = 3.5

W0 = ȳ − W1·x̄ = 55.4 − (3.5)(9.1) = 23.6

y = 23.6 + 3.5x. Using this equation we can predict salary given experience.
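The calculation above can be reproduced directly from these formulas. A sketch, assuming the full training set behind the plot; the pairs not visible in the figure, such as (8, 57) and (9, 64), are inferred so that the means match x̄ = 9.1 and ȳ = 55.4:

```python
# Least-squares line y = W0 + W1*x, following the slide's formulas.
def fit_line(xs, ys):
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    w1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
          / sum((x - x_bar) ** 2 for x in xs))
    w0 = y_bar - w1 * x_bar            # W0 = ȳ − W1·x̄
    return w0, w1

# Years of experience vs. salary; some pairs are inferred (see above).
xs = [3, 8, 9, 13, 3, 6, 11, 21, 1, 16]
ys = [30, 57, 64, 72, 36, 43, 59, 90, 20, 83]
w0, w1 = fit_line(xs, ys)              # w1 ≈ 3.5
salary_at_10 = w0 + w1 * 10            # predicted salary for 10 years
```

Unrounded, w0 comes out near 23.2; the slide's 23.6 results from rounding W1 to 3.5 before computing W0.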
Logistic regression is used when the target is the probability of class membership.
The formula for a univariate logistic curve is

p = e^(c0 + c1x1) / (1 + e^(c0 + c1x1))

log(p / (1 − p)) = c0 + c1x1

Here p is the probability of being in the class.
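The two formulas are inverses of each other: applying the log-odds to the curve's output recovers the linear term. A small sketch with arbitrarily chosen coefficients c0 and c1:

```python
import math

# Univariate logistic curve: p = e^(c0 + c1*x) / (1 + e^(c0 + c1*x))
def logistic(x, c0, c1):
    z = c0 + c1 * x
    return math.exp(z) / (1 + math.exp(z))

p = logistic(2.0, c0=-1.0, c1=0.8)   # coefficients chosen arbitrarily
log_odds = math.log(p / (1 - p))     # recovers c0 + c1*x = 0.6
```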
Bayesian Classification:
• Based on Bayes' theorem of conditional probability
• A statistical classifier: performs probabilistic prediction, i.e., predicts class membership probabilities
• The simple naïve Bayesian classifier assumes that attribute values are conditionally independent given the class, which simplifies computation
• Its performance is comparable to decision tree and selected neural network classifiers
import pandas as pd

# Sample dataset
data = {
    'Age': [22, 25, 47, 35, 26, 41, 39, 22, 30, 26],
    'Income': ['Low', 'Medium', 'High', 'Medium', 'Low', 'High', 'High', 'Low',
               'Medium', 'Low'],
    'Student': ['No', 'No', 'Yes', 'No', 'Yes', 'Yes', 'No', 'No', 'Yes', 'Yes'],
    'Credit_Rating': ['Fair', 'Excellent', 'Fair', 'Fair', 'Fair', 'Excellent',
                      'Excellent', 'Fair', 'Fair', 'Excellent'],
    'Buys_Computer': ['No', 'No', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'No', 'Yes', 'Yes']
}

# Create DataFrame
df = pd.DataFrame(data)
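On this DataFrame, a naïve Bayesian classifier can be sketched by hand: multiply the class prior by the per-attribute conditional probabilities (Laplace smoothing keeps unseen values from zeroing out a class) and pick the class with the highest score. The query tuple below is a made-up example, and only the categorical attributes are used:

```python
import pandas as pd

# Same sample dataset as above
data = {
    'Age': [22, 25, 47, 35, 26, 41, 39, 22, 30, 26],
    'Income': ['Low', 'Medium', 'High', 'Medium', 'Low', 'High', 'High', 'Low',
               'Medium', 'Low'],
    'Student': ['No', 'No', 'Yes', 'No', 'Yes', 'Yes', 'No', 'No', 'Yes', 'Yes'],
    'Credit_Rating': ['Fair', 'Excellent', 'Fair', 'Fair', 'Fair', 'Excellent',
                      'Excellent', 'Fair', 'Fair', 'Excellent'],
    'Buys_Computer': ['No', 'No', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'No', 'Yes', 'Yes']
}
df = pd.DataFrame(data)

def nb_predict(df, query, target='Buys_Computer'):
    # Naïve Bayes: score(C) = P(C) * product of P(attr = value | C),
    # with Laplace smoothing on each conditional probability.
    scores = {}
    for cls, group in df.groupby(target):
        score = len(group) / len(df)              # class prior P(C)
        for attr, value in query.items():
            count = (group[attr] == value).sum()  # matches within this class
            n_values = df[attr].nunique()
            score *= (count + 1) / (len(group) + n_values)
        scores[cls] = score
    return max(scores, key=scores.get)

query = {'Income': 'Low', 'Student': 'Yes', 'Credit_Rating': 'Fair'}
print(nb_predict(df, query))  # Yes
```

Here the 'Yes' class wins because its prior (7/10) and its conditionals for Student = 'Yes' dominate those of the 'No' class, whose three tuples contain no students at all.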