KNN and Bayesian Methods

machine learning notes


Classification
• predicts categorical class labels (discrete or nominal)
• constructs a model from the training set and the values (class labels) of a classifying attribute, and uses it to classify new data

Definition: Given a database D = {t1, t2, …, tn} of tuples and a set of classes C = {C1, C2, …, Cm}, the classification problem is to define a mapping f: D → C where each ti is assigned to one class Cj.
Prediction
•models continuous-valued functions, i.e., predicts
unknown or missing values
Typical applications
• Credit approval: classify an applicant as a good or poor credit risk
• Target marketing: build a profile of a good customer
• Medical diagnosis: develop a profile of stroke victims
• Fraud detection: determine whether a credit card purchase is fraudulent
Classification is a two-step process.
In the learning step, a classifier is built from a training data set.
The training data set contains tuples described by attributes, one of which is the class label attribute.



Example: Training Data Set (attributes and class label)

Patient Id | Sore throat | Fever | Swollen Glands | Congestion | Headache | Diagnosis
     1     |     Yes     |  Yes  |      Yes       |    Yes     |   Yes    | Strep throat
     2     |     No      |  No   |      No        |    Yes     |   Yes    | Allergy
     3     |     Yes     |  Yes  |      No        |    Yes     |   No     | Cold
     4     |     Yes     |  No   |      No        |    No      |   No     | Strep throat
     5     |     No      |  Yes  |      No        |    Yes     |   No     | Cold
     6     |     No      |  No   |      No        |    Yes     |   No     | Allergy
     7     |     No      |  No   |      Yes       |    No      |   No     | Strep throat
     8     |     Yes     |  No   |      No        |    Yes     |   Yes    | Allergy
     9     |     No      |  Yes  |      No        |    Yes     |   Yes    | Cold
    10     |     Yes     |  Yes  |      No        |    Yes     |   Yes    | Cold

Supervised learning (classification)


Since the class label is provided, this is known as supervised learning.
Typically the model is represented in the form of classification rules, decision trees, or mathematical formulae. For example:

Swollen Glands?
  Yes → Diagnosis = Strep Throat
  No  → Fever?
          Yes → Diagnosis = Cold
          No  → Diagnosis = Allergy

In the second step the model is used for classification.
First it is applied to a test data set to check its accuracy.
Then it can be used to classify future data tuples whose class label values are not known.
Model construction:
•Describing a set of predetermined classes
•Each tuple/sample is assumed to belong to a predefined
class, as determined by the class label attribute
•The set of tuples used for model construction is training set
•The model is represented as classification rules, decision
trees, or mathematical formulae
Model usage:
•for classifying future or unknown objects
•Estimate accuracy of the model: the known label of each test sample is compared with the classified result from the model
•Accuracy rate is the percentage of test set samples that are
correctly classified by the model
•Test set is independent of training set, otherwise over-fitting
will occur
•If the accuracy is acceptable, use the model to classify data
tuples whose class labels are not known
Classification: from training data to classifier
The training data (the patient table above) is input to a classification algorithm, which outputs the classifier (model), in this case the decision tree:

Swollen Glands?
  Yes → Diagnosis = Strep Throat
  No  → Fever?
          Yes → Diagnosis = Cold
          No  → Diagnosis = Allergy


Preparing data for classification
• Data cleaning
– Preprocess data in order to reduce noise and
handle missing value
Options for handling missing values:
– Ignore the tuple
– Fill in the missing value manually: tedious and often infeasible
– Fill it in automatically with:
   • a global constant, e.g., "unknown" (effectively a new class)
   • the attribute mean
   • the attribute mean of the tuple's class: smarter
   • the most probable value: inference-based
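As a brief illustration of the automatic fill-in options above, the following is a minimal sketch using pandas; the column names and values are hypothetical, not taken from the slides' data sets.

import pandas as pd
import numpy as np

# Hypothetical data with a missing Income value
df = pd.DataFrame({
    'Income': [30.0, 45.0, np.nan, 52.0],
    'Class':  ['yes', 'yes', 'yes', 'no']
})

# Fill with a global constant (acts like a new "unknown" value)
filled_const = df['Income'].fillna(-1)

# Fill with the attribute mean
filled_mean = df['Income'].fillna(df['Income'].mean())

# Fill with the attribute mean of the tuple's class (smarter)
filled_class_mean = df['Income'].fillna(
    df.groupby('Class')['Income'].transform('mean'))

print(filled_const, filled_mean, filled_class_mean, sep='\n')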
• Relevance analysis (feature selection)
– Remove the irrelevant or redundant attributes
– Redundant attributes can often be detected by correlation analysis
– Improves classification efficiency and scalability
• Data transformation
– Generalize and/or normalize data
• Min-Max normalization
• z-score normalization
• normalization by decimal scaling
-- Data Reduction
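A small sketch of the three normalization methods listed above, using plain NumPy on hypothetical values:

import numpy as np

x = np.array([200.0, 300.0, 400.0, 600.0, 1000.0])

# Min-Max normalization to the range [0, 1]
min_max = (x - x.min()) / (x.max() - x.min())

# z-score normalization
z_score = (x - x.mean()) / x.std()

# Normalization by decimal scaling: divide by 10^j, where j is the
# smallest integer such that the largest absolute scaled value is < 1
j = int(np.floor(np.log10(np.abs(x).max()))) + 1
decimal_scaled = x / (10 ** j)

print(min_max, z_score, decimal_scaled, sep='\n')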



Choosing Classification Algorithms
• Algorithm categorization
• Distance based
• Statistical
• Decision Tree Based
• Neural network
• Rule based

• Classification categorization
• Specifying boundaries-divides input space into regions
• Probabilistic- determine probability for each class and
assign tuple to the class with highest probability



Measuring Performance
• Performance of a classification algorithm is measured by evaluating the accuracy of the classification
• Computational cost: space and time requirements
• Scalability: remains efficient even for large databases
• Robustness: ability to make correct classifications in the presence of noisy data
• Overfitting problem: the classifier fits the training data exactly but may not be applicable to a broader population of data
• Interpretability: the insight provided by the classifier
Statistical-based algorithms
Straight-line regression analysis involves a response variable y and a single predictor variable x, and models y as a linear function of x:
    y = w0 + w1·x
where w0 (y-intercept) and w1 (slope) are regression coefficients.
These coefficients can be solved for by the method of least squares, which estimates the best-fitting straight line.
Let D be the training data set containing n data points (x1, y1), (x2, y2), …, (xn, yn). The regression coefficients can be estimated as

    w1 = Σ (xi − x̄)(yi − ȳ) / Σ (xi − x̄)²        w0 = ȳ − w1·x̄

where x̄ and ȳ are the means of the xi and yi.
Example

Yrs experience | Salary (in k)
      3        |      30
      8        |      57
      9        |      64
     13        |      72
      3        |      36
      6        |      43
     11        |      59
     21        |      90
      1        |      20
     16        |      83

x̄ = 9.1    ȳ = 55.4

w1 = [ (3 − 9.1)(30 − 55.4) + … ] / [ (3 − 9.1)² + (8 − 9.1)² + … ] = 3.5
w0 = 55.4 − (3.5)(9.1) = 23.6

y = 23.6 + 3.5x. Using this equation we can predict salary given experience.
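The coefficients above can be verified numerically; the following is a minimal sketch using NumPy on the same ten data points.

import numpy as np

x = np.array([3, 8, 9, 13, 3, 6, 11, 21, 1, 16], dtype=float)        # years of experience
y = np.array([30, 57, 64, 72, 36, 43, 59, 90, 20, 83], dtype=float)  # salary in k

x_bar, y_bar = x.mean(), y.mean()   # 9.1 and 55.4
w1 = ((x - x_bar) * (y - y_bar)).sum() / ((x - x_bar) ** 2).sum()
w0 = y_bar - w1 * x_bar
print(w1, w0)   # about 3.54 and 23.2 (the slide's 23.6 results from using the rounded slope 3.5)

# Predict the salary for 10 years of experience
print(w0 + w1 * 10)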



Multiple Linear Regression

It is an extension of straight-line regression analysis that involves more than one predictor variable.
It allows the response variable y to be modeled as a linear function of n predictor variables (attributes) describing a tuple X = (x1, x2, …, xn):

    y = w0 + w1·x1 + w2·x2 + … + wn·xn

The method of least squares can be extended to solve for w0, w1, etc. The equations are much more complex and are typically solved using statistical software packages.
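As noted above, the coefficients are normally obtained with a software package; a minimal scikit-learn sketch on hypothetical data with two predictor variables:

import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical tuples with two predictors (e.g., years of experience, years of education)
X = np.array([[3, 12], [8, 16], [9, 16], [13, 18], [6, 14], [11, 16]])
y = np.array([30, 57, 64, 72, 43, 59])

model = LinearRegression()
model.fit(X, y)
print(model.intercept_, model.coef_)   # w0 and (w1, w2)
print(model.predict([[10, 16]]))       # predicted response for a new tuple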



The linear model gets affected by the presence of noise
or outliers (extreme, exceptional values)
Nonlinear regression
• Some nonlinear models can be modeled by a polynomial function
• A polynomial regression model can be transformed into a linear regression model. For example,
      y = w0 + w1·x + w2·x² + w3·x³
  is convertible to a linear model with the new variables x2 = x², x3 = x³:
      y = w0 + w1·x + w2·x2 + w3·x3
• Some models are intractably nonlinear (e.g., sums of exponential terms)
  – it may still be possible to obtain least-squares estimates through extensive calculation on more complex formulae
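The variable substitution described above can be done mechanically; a small sketch using scikit-learn's PolynomialFeatures on hypothetical data:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0]).reshape(-1, 1)
y = np.array([2.1, 9.8, 29.5, 68.1, 131.0])   # roughly cubic, hypothetical values

# Create the new variables x2 = x^2 and x3 = x^3, then fit an ordinary linear model
poly = PolynomialFeatures(degree=3, include_bias=False)
X_poly = poly.fit_transform(x)        # columns: x, x^2, x^3
model = LinearRegression().fit(X_poly, y)
print(model.intercept_, model.coef_)  # w0 and (w1, w2, w3)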



Logistic regression
It uses a logistic curve.
The logistic curve gives a value between 0 and 1, so it can be interpreted as the probability of class membership.
The formula for a univariate logistic curve is
    p = e^(c0 + c1·x1) / (1 + e^(c0 + c1·x1))
or equivalently
    log(p / (1 − p)) = c0 + c1·x1
Here p is the probability of being in the class.
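A small sketch of the univariate logistic curve; the coefficients c0 and c1 below are assumed values chosen only for illustration:

import numpy as np

c0, c1 = -4.0, 0.8   # assumed coefficients, for illustration only

def logistic_prob(x1):
    # p = e^(c0 + c1*x1) / (1 + e^(c0 + c1*x1)), always between 0 and 1
    z = c0 + c1 * x1
    return np.exp(z) / (1.0 + np.exp(z))

for x1 in [0, 2, 5, 8, 10]:
    p = logistic_prob(x1)
    print(x1, round(p, 3), 'class 1' if p >= 0.5 else 'class 0')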
Bayesian Classification:
It is based on Bayes’ Theorem of conditional
probability.
It is a statistical classifier: performs probabilistic
prediction, i.e., predicts class membership
probabilities
A simple Bayesian classifier, naïve Bayesian
classifier, assumes that different attribute values
are independent which simplifies computational
process
It has comparable performance with decision tree
and selected neural network classifiers



Let X be a data tuple (“evidence”): described by
values of its n attributes
Let H be a hypothesis that X belongs to class C
Classification is to determine P(H|X), the probability
that the hypothesis holds given the observed data
sample X
i.e., the probability that X belongs to class C given the attribute description of X
E.g., given that X is aged 31..40 with medium income, the probability that X will buy a computer
P(H) (prior probability of H): the initial probability
E.g., the probability that X will buy a computer, regardless of age, income, …



P(H|X) (posterior probability of H): the probability of H when the attributes of X are known
P(X) (prior probability of X): the probability that the sample data falls in the observed range
  E.g., the probability that a person is in the range 31..40 with medium income (the evidence)
P(X|H) (posterior probability of X, the likelihood)
  E.g., given that X will buy a computer, the probability that X is 31..40 with medium income
Bayes' theorem relates all these probabilities:
    P(H|X) = P(X|H) · P(H) / P(X)
    posterior = likelihood × prior / evidence



Let D be a training set of tuples and their associated class labels, where each tuple is represented by an n-dimensional attribute vector X = (x1, x2, …, xn).
Suppose there are m classes C1, C2, …, Cm.
Classification derives the maximum posterior, i.e., the maximal P(Ci|X).
This can be derived from Bayes' theorem:
    P(Ci|X) = P(X|Ci) · P(Ci) / P(X)
Since P(X) is constant for all classes, only P(X|Ci) · P(Ci) needs to be maximized.



If the class prior probabilities are not known, it can be assumed that all classes are equally likely:
    P(C1) = P(C2) = … = P(Cm)
and the problem reduces to maximizing P(X|Ci).
If the data set has many attributes, it is computationally expensive to compute P(X|Ci).
To reduce computation, the assumption of class-conditional independence is made: the attributes are conditionally independent given the class, so
    P(X|Ci) = P(x1|Ci) × P(x2|Ci) × … × P(xn|Ci)



age   | income | student | credit_rating | buys_computer
<=30  | high   | no      | fair          | no
<=30  | high   | no      | excellent     | no
31…40 | high   | no      | fair          | yes
>40   | medium | no      | fair          | yes
>40   | low    | yes     | fair          | yes
>40   | low    | yes     | excellent     | no
31…40 | low    | yes     | excellent     | yes
<=30  | medium | no      | fair          | no
<=30  | low    | yes     | fair          | yes
>40   | medium | yes     | fair          | yes
<=30  | medium | yes     | excellent     | yes
31…40 | medium | no      | excellent     | yes
31…40 | high   | yes     | fair          | yes
>40   | medium | no      | excellent     | no



P(xk|Ci) is the number of tuples of class Ci in training set D having the value xk, divided by the number of tuples of class Ci in D.

Classes (for the buys_computer table above):
C1: buys_computer = 'yes'
C2: buys_computer = 'no'

Data sample to classify:
X = (age <= 30, Income = medium, Student = yes, Credit_rating = fair)



• P(Ci): P(buys_computer = “yes”) = 9/14 = 0.643
P(buys_computer = “no”) = 5/14= 0.357
• Compute P(X|Ci) for each class
P(age = “<=30” | buys_computer = “yes”) = 2/9 = 0.222
P(age = “<= 30” | buys_computer = “no”) = 3/5 = 0.6
P(income = “medium” | buys_computer = “yes”) = 4/9 = 0.444
P(income = “medium” | buys_computer = “no”) = 2/5 = 0.4
P(student = “yes” | buys_computer = “yes”) = 6/9 = 0.667
P(student = “yes” | buys_computer = “no”) = 1/5 = 0.2
P(credit_rating = “fair” | buys_computer = “yes”) = 6/9 = 0.667
P(credit_rating = “fair” | buys_computer = “no”) = 2/5 = 0.4
• X = (age <= 30 , income = medium, student = yes, credit_rating = fair)
P(X|Ci) : P(X|buys_computer = “yes”) = 0.222 x 0.444 x 0.667 x 0.667
= 0.044
P(X|buys_computer = “no”) = 0.6 x 0.4 x 0.2 x 0.4 = 0.019
P(X|Ci)*P(Ci) : P(X|buys_computer = “yes”) * P(buys_computer = “yes”)
= 0.028
P(X|buys_computer = “no”) * P(buys_computer = “no”) = 0.007
Therefore, X belongs to class (“buys_computer = yes”)
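The same arithmetic can be reproduced directly from the conditional probabilities above; a minimal sketch in plain Python:

# Prior probabilities from the training data
p_yes = 9 / 14
p_no = 5 / 14

# P(X|Ci) under the class-conditional independence assumption
p_x_given_yes = (2/9) * (4/9) * (6/9) * (6/9)   # about 0.044
p_x_given_no  = (3/5) * (2/5) * (1/5) * (2/5)   # about 0.019

# Compare P(X|Ci) * P(Ci) for both classes
score_yes = p_x_given_yes * p_yes               # about 0.028
score_no  = p_x_given_no * p_no                 # about 0.007
print('buys_computer =', 'yes' if score_yes > score_no else 'no')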



import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import confusion_matrix, classification_report

# Sample dataset
data = {
    'Age': [22, 25, 47, 35, 26, 41, 39, 22, 30, 26],
    'Income': ['Low', 'Medium', 'High', 'Medium', 'Low',
               'High', 'High', 'Low', 'Medium', 'Low'],
    'Student': ['No', 'No', 'Yes', 'No', 'Yes',
                'Yes', 'No', 'No', 'Yes', 'Yes'],
    'Credit_Rating': ['Fair', 'Excellent', 'Fair', 'Fair', 'Fair',
                      'Excellent', 'Excellent', 'Fair', 'Fair', 'Excellent'],
    'Buys_Computer': ['No', 'No', 'Yes', 'Yes', 'Yes',
                      'Yes', 'Yes', 'No', 'Yes', 'Yes']
}

# Create DataFrame
df = pd.DataFrame(data)



# Preprocessing: convert categorical variables to numerical
df['Income'] = df['Income'].map({'Low': 0, 'Medium': 1, 'High': 2})
df['Student'] = df['Student'].map({'No': 0, 'Yes': 1})
df['Credit_Rating'] = df['Credit_Rating'].map({'Fair': 0, 'Excellent': 1})
df['Buys_Computer'] = df['Buys_Computer'].map({'No': 0, 'Yes': 1})

# Features and target variable
X = df.drop('Buys_Computer', axis=1)
y = df['Buys_Computer']

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train the Naive Bayes model
model = GaussianNB()
model.fit(X_train, y_train)


# Sample data for prediction
sample = pd.DataFrame({
    'Age': [30],           # Age <= 30
    'Income': [1],         # Medium
    'Student': [1],        # Yes
    'Credit_Rating': [0]   # Fair
})

# Make prediction for the sample
prediction = model.predict(sample)

# Output the prediction
result = 'Yes' if prediction[0] == 1 else 'No'
print(f"The prediction for the sample is: {result}")

Output: The prediction for the sample is: Yes



Another tuple to classify:
X = (age = 31…40, income = low, student = no, credit_rating = excellent)



Zero-probability problem
Naïve Bayesian prediction requires each conditional
prob. to be non-zero. Otherwise, the predicted prob.
will be zero irrespective of all other probabilities
Example: suppose a data set with 1000 tuples in which income = low appears 0 times, income = medium 990 times, and income = high 10 times.
Use the Laplacian correction (Laplacian estimator): add 1 to each count.
Prob(income = low) = 1/1003
Prob(income = medium) = 991/1003
Prob(income = high) = 11/1003
The “corrected” prob. estimates are close to their
“uncorrected” counterparts and the problem of zero
probability is solved
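A short sketch of the Laplacian correction for the income example above, in plain Python:

counts = {'low': 0, 'medium': 990, 'high': 10}
n = sum(counts.values())   # 1000 tuples
k = len(counts)            # 3 distinct attribute values

# Uncorrected estimates: P(income = low) would be 0
uncorrected = {v: c / n for v, c in counts.items()}

# Laplacian correction: add 1 to each count and k to the denominator
corrected = {v: (c + 1) / (n + k) for v, c in counts.items()}
print(corrected)   # low: 1/1003, medium: 991/1003, high: 11/1003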



Advantages
• Easy to implement
• Only one scan of training data is required
• Good results obtained in most of the cases
• Can easily handle missing values
Disadvantages
• Assumption: class conditional independence,
therefore loss of accuracy
• In practice, dependencies exist among variables
  E.g., hospital patients: profile (age, family history, etc.), symptoms (fever, cough, etc.), disease (lung cancer, diabetes, etc.)
  Dependencies among these cannot be modeled by a naïve Bayesian classifier; Bayesian belief networks can model such dependencies.



Distance-based algorithms
Each tuple is assigned to the class to which it is most similar.
Each class is represented by a representative tuple, usually its center or centroid.
Each tuple ti is assigned to class Cj such that sim(ti, Cj) > sim(ti, Cl) for all Cl ≠ Cj.
Each tuple must be compared to the center of each class, and there is a fixed number of classes, so the complexity depends on the number of classes.
K Nearest Neighbors is a distance-based algorithm and a lazy learning algorithm: it simply stores the training data (or does only minor processing) and waits until it is given a test tuple.
Distance-based algorithms
Similarity or distance measures may be used to
identify the alikeness of different items in the
database
The similarity between two tuples ti and tj, sim(ti, tj), in a database D is a mapping from D×D to the range [0, 1].
Characteristics of a good similarity measure
1. sim(ti, ti)=1 for all ti
2. sim(ti, tj)=0 if ti and tj are not alike at all
3. sim(ti,tj) < sim(ti, tk) if ti is more like tk than it is like tj
Dice:    sim(ti, tj) = 2·Σ tik·tjk / (Σ tik² + Σ tjk²)
Jaccard: sim(ti, tj) = Σ tik·tjk / (Σ tik² + Σ tjk² − Σ tik·tjk)
Cosine:  sim(ti, tj) = Σ tik·tjk / √(Σ tik² · Σ tjk²)
Overlap: sim(ti, tj) = Σ tik·tjk / min(Σ tik², Σ tjk²)

Distance or dissimilarity measures are often used instead of similarity measures:
Euclidean: dis(ti, tj) = √( Σ (tih − tjh)² )
Manhattan: dis(ti, tj) = Σ | tih − tjh |
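A compact sketch of these measures for two numeric tuples, using NumPy on hypothetical vectors:

import numpy as np

ti = np.array([1.0, 3.0, 5.0])
tj = np.array([2.0, 3.0, 4.0])

dot = np.dot(ti, tj)
dice    = 2 * dot / (np.sum(ti**2) + np.sum(tj**2))
jaccard = dot / (np.sum(ti**2) + np.sum(tj**2) - dot)
cosine  = dot / np.sqrt(np.sum(ti**2) * np.sum(tj**2))
overlap = dot / min(np.sum(ti**2), np.sum(tj**2))

euclidean = np.sqrt(np.sum((ti - tj) ** 2))
manhattan = np.sum(np.abs(ti - tj))
print(dice, jaccard, cosine, overlap, euclidean, manhattan)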
The k-Nearest Neighbor algorithm
• The k closest neighbors in the training set to the given tuple are determined
• The nearest neighbors are defined in terms of Euclidean distance, dist(X1, X2)
• The new item is then placed in the class that contains the most items from this set of k closest items
• The value of k can be determined experimentally: starting with k = 1, a test set is used to estimate the error rate of the classifier, and the k value that gives the minimum error rate is selected
• For real-valued prediction, k-NN returns the mean value of the k nearest neighbors of the given unknown tuple
The k-Nearest Neighbor algorithm (continued)
• The distance-weighted nearest neighbor algorithm gives greater weight to closer neighbors
• Robust to noisy data because it averages over the k nearest neighbors
• The complexity is O(d), where d is the size of the training set; it can be reduced to O(log d) by storing the training set in search trees, or to O(1) by using parallelism
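A minimal k-NN sketch with scikit-learn; it assumes the numerically encoded features X and labels y from the naïve Bayes example earlier are still available:

from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

# Reuse the encoded buys_computer features X and labels y from the earlier example
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# k = 3 neighbors under Euclidean distance (the default metric)
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)
print(knn.score(X_test, y_test))   # accuracy on the held-out test set

# Distance-weighted variant: closer neighbors get greater weight
knn_weighted = KNeighborsClassifier(n_neighbors=3, weights='distance')
knn_weighted.fit(X_train, y_train)
print(knn_weighted.predict(X_test))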



Example data set: maternal health risk

SN Age SystolicBP DiastolicBP BS BodyTemp HeartRate RiskLevel
1 25 130 80 15 98 86 high risk
2 35 140 90 13 98 70 high risk
3 29 90 70 8 100 80 high risk
4 30 140 85 7 98 70 high risk
5 35 120 60 6.1 98 76 low risk
6 23 140 80 7.01 98 70 high risk
7 23 130 70 7.01 98 78 mid risk
8 35 85 60 11 102 86 high risk
9 32 120 90 6.9 98 70 mid risk
10 42 130 80 18 98 70 high risk
11 23 90 60 7.01 98 76 low risk
12 19 120 80 7 98 70 mid risk
13 25 110 89 7.01 98 77 low risk
14 20 120 75 7.01 100 70 mid risk
15 48 120 80 11 98 88 mid risk
16 15 120 80 7.01 98 70 low risk
17 50 140 90 15 98 90 high risk
18 25 140 100 7.01 98 80 high risk
19 30 120 80 6.9 101 76 mid risk
20 10 70 50 6.9 98 70 low risk
21 40 140 100 18 98 90 high risk
22 50 140 80 6.7 98 70 mid risk
23 21 90 65 7.5 98 76 low risk
24 18 90 60 7.5 98 70 low risk
25 21 120 80 7.5 98 76 low risk
26 16 100 70 7.2 98 80 low risk
Variable Name | Role    | Type        | Description                                                                    | Units  | Missing Values
Age           | Feature | Integer     | Age in years of the woman during pregnancy                                     | years  | no
SystolicBP    | Feature | Integer     | Upper value of blood pressure, another significant attribute during pregnancy  | mmHg   | no
DiastolicBP   | Feature | Integer     | Lower value of blood pressure, another significant attribute during pregnancy  | mmHg   | no
BS            | Feature | Integer     | Blood glucose level, expressed as a molar concentration                        | mmol/L | no
BodyTemp      | Feature | Integer     | Body temperature                                                               | F      | no
HeartRate     | Feature | Integer     | Normal resting heart rate                                                      | bpm    | no
RiskLevel     | Target  | Categorical | Predicted risk intensity level during pregnancy, considering the previous attributes | -  | no
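As a hedged sketch, k-NN could be applied to the maternal health risk data above roughly as follows; the file name maternal_health.csv is an assumption, and the column names follow the table:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report

# Load the maternal health risk data (file name assumed)
df = pd.read_csv('maternal_health.csv')

X = df[['Age', 'SystolicBP', 'DiastolicBP', 'BS', 'BodyTemp', 'HeartRate']]
y = df['RiskLevel']   # high risk / mid risk / low risk

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)

y_pred = knn.predict(X_test)
print(classification_report(y_test, y_pred))

# Classify a new patient (values hypothetical)
new_patient = pd.DataFrame([{'Age': 28, 'SystolicBP': 125, 'DiastolicBP': 80,
                             'BS': 7.5, 'BodyTemp': 98, 'HeartRate': 76}])
print(knn.predict(new_patient))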


