
Data Warehousing and Data Mining

UNIT-IV: Syllabus

Classification: Alternative Techniques, Bayes' Theorem, Naïve Bayesian Classification, Bayesian Belief Networks

UNIT-IV
DATA CLASSIFICATION (Alternative Techniques)
Classification is a form of data analysis that extracts models describing important data
classes. Such models, called classifiers, predict categorical (discrete, unordered) class labels.
For example, we can build a classification model to categorize bank loan applications as either
safe or risky. Such analysis can help provide us with a better understanding of the data at large.
Many classification methods have been proposed by researchers in machine learning, pattern
recognition, and statistics.

Classification: Alternative Techniques:


Bayesian Classification:
 Bayesian classifiers are statistical classifiers.
 They can predict class membership probabilities, such as the probability that a given
tuple belongs to a particular class.
 Bayesian classification is based on Bayes’ theorem.
Bayes’ Theorem:
 Let X be a data tuple. In Bayesian terms, X is considered "evidence" and is described by measurements made on a set of n attributes.
 Let H be some hypothesis, such as that the data tuple X belongs to a specified class C.
 For classification problems, we want to determine P(H|X), the probability that the hypothesis H holds given the "evidence," that is, the observed data tuple X.
 P(H|X) is the posterior probability, or a posteriori probability, of H conditioned on X.
 Bayes’ theorem is useful in that it provides a way of calculating the posterior
probability, P(H|X), from P(H), P(X|H), and P(X).
P(H|X) = P(X|H) P(H) / P(X)
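For example, with purely illustrative numbers: if P(H) = 0.1, P(X|H) = 0.6, and P(X) = 0.2, then P(H|X) = (0.6 × 0.1) / 0.2 = 0.3, so observing the evidence X raises the probability of the hypothesis H from 0.1 to 0.3.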

Naïve Bayesian Classification:


The naïve Bayesian classifier, or simple Bayesian classifier, works as follows:
1. Let D be a training set of tuples and their associated class labels. As usual, each tuple is
represented by an n-dimensional attribute vector, X = (x1, x2, …,xn), depicting n
measurements made on the tuple from n attributes, respectively, A1, A2, …, An.
2. Suppose that there are m classes, C1, C2, …, Cm. Given a tuple, X, the classifier will
predict that X belongs to the class having the highest posterior probability, conditioned
on X. That is, the naïve Bayesian classifier predicts that tuple X belongs to the class Ci
if and only if
P(Ci|X) > P(Cj|X)   for 1 ≤ j ≤ m, j ≠ i.
Thus, we maximize P(Ci|X). The class Ci for which P(Ci|X) is maximized is called the maximum posteriori hypothesis. By Bayes' theorem,
P(Ci|X) = P(X|Ci) P(Ci) / P(X)

3. As P(X) is constant for all classes, only P(X|Ci)P(Ci) need be maximized. If the class
prior probabilities are not known, then it is commonly assumed that the classes are
equally likely, that is, P(C1) = P(C2) = …= P(Cm), and we would therefore maximize
P(X|Ci). Otherwise, we maximize P(X|Ci)P(Ci).
4. Given data sets with many attributes, it would be extremely computationally expensive
to compute P(X|Ci). In order to reduce computation in evaluating P(X|Ci), the naive
assumption of class conditional independence is made. This presumes that the values
of the attributes are conditionally independent of one another, given the class label of
the tuple. Thus,
P(X|Ci) = ∏ (k = 1 to n) P(xk|Ci) = P(x1|Ci) × P(x2|Ci) × … × P(xn|Ci)
5. We can easily estimate the probabilities P(x1|Ci), P(x2|Ci), …, P(xn|Ci) from the training tuples.
6. For each attribute, we look at whether the attribute is categorical or continuous-valued. For instance, to compute P(X|Ci), we consider the following (a minimal code sketch of these steps is given after this list):
 If Ak is categorical, then P(xk|Ci) is the number of tuples of class Ci in D having the value xk for Ak, divided by |Ci,D|, the number of tuples of class Ci in D.
 If Ak is continuous-valued, a little more work is needed: the attribute is typically assumed to follow a Gaussian distribution with mean μCi and standard deviation σCi estimated from the class-Ci training tuples, so that P(xk|Ci) = g(xk, μCi, σCi).
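
Below is a minimal Python sketch of these steps for categorical attributes only (the continuous-valued case would add the Gaussian estimate described above). The function names and data layout are illustrative rather than standard, and the Laplacian correction usually applied in practice to avoid zero counts is omitted:

from collections import Counter, defaultdict

def train_naive_bayes(tuples, labels):
    # Steps 1-2: count the classes to estimate the priors P(Ci).
    class_counts = Counter(labels)
    priors = {c: n / len(labels) for c, n in class_counts.items()}
    # Step 6 (categorical case): count attribute values per class so that
    # P(xk | Ci) = count(class-Ci tuples with Ak = xk) / |Ci,D|.
    cond_counts = defaultdict(lambda: defaultdict(Counter))
    for x, c in zip(tuples, labels):
        for k, v in enumerate(x):
            cond_counts[c][k][v] += 1
    return priors, cond_counts, class_counts

def predict(x, priors, cond_counts, class_counts):
    # Steps 3-5: choose the class Ci maximizing P(X|Ci)P(Ci), where P(X|Ci)
    # is the product of the per-attribute probabilities P(xk|Ci).
    best_class, best_score = None, -1.0
    for c, prior in priors.items():
        score = prior
        for k, v in enumerate(x):
            score *= cond_counts[c][k][v] / class_counts[c]
        if score > best_score:
            best_class, best_score = c, score
    return best_class

# Usage with the training table in the example below, encoded as tuples of
# attribute values and a parallel list of class labels:
#   priors, cond, counts = train_naive_bayes(rows, labels)
#   predict(("youth", "medium", "yes", "fair"), priors, cond, counts)  # -> "yes"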

Example:

age income student credit_rating buys_computer


youth high no fair no
youth high no excellent no
middle_aged high no fair yes
senior medium no fair yes
senior low yes fair yes
senior low yes excellent no
middle_aged low yes excellent yes
youth medium no fair no
youth low yes fair yes
senior medium yes fair yes
youth medium yes excellent yes
middle_aged medium no excellent yes
middle_aged high yes fair yes
senior medium no excellent no

We wish to predict the class label of a tuple using naïve Bayesian classification, given the training data shown in the table above. The data tuples are described by the attributes age, income, student, and credit_rating. The class label

attribute, buys computer, has two distinct values (namely, {yes, no}). Let C1 correspond to the
class buys computer=yes and C2 correspond to buys computer=no. The tuple we wish to
classify is
X = {age = "youth", income = "medium", student = "yes", credit_rating = "fair"}

We need to maximize P(X|Ci)P(Ci), for i=1,2. P(Ci), the prior probability of each
class, can be computed based on the training tuples:

P(buys computer = yes) = 9/14 = 0.643


P(buys computer = no) = 5/14 = 0.357

To compute P(X|Ci), for i = 1, 2, we compute the following conditional probabilities:

P(age = youth | buys computer = yes) = 2/9 = 0.222


P(income=medium | buys computer=yes) = 4/9 = 0.444
P(student=yes | buys computer=yes) = 6/9 = 0.667
P(credit rating=fair | buys computer=yes) = 6/9 = 0.667

P(age=youth | buys computer=no) = 3/5 = 0.600


P(income=medium | buys computer=no) = 2/5 = 0.400
P(student=yes | buys computer=no) = 1/5 = 0.200
P(credit rating=fair | buys computer=no) = 2/5 = 0.400

Using these probabilities, we obtain


P(X | buys computer=yes) = P(age=youth | buys computer=yes)
× P(income=medium | buys computer=yes)
× P(student=yes | buys computer=yes)
× P(credit rating=fair | buys computer=yes)
= 0.222 × 0.444 × 0.667 × 0.667 = 0.044.
Similarly,
P(X | buys computer=no) = 0.600 × 0.400 × 0.200 × 0.400 = 0.019.

To find the class, Ci, that maximizes P(X|Ci)P(Ci), we compute


P(X | buys computer=yes) P(buys computer=yes) = 0.044 × 0.643 = 0.028
P(X | buys computer=no) P(buys computer=no) = 0.019 × 0.357 = 0.007
Therefore, the naïve Bayesian classifier predicts buys computer = yes for tuple X.
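The same arithmetic can be verified with a few lines of Python, plugging in the values computed above:

# Priors and class-conditional probabilities taken from the calculation above.
p_yes, p_no = 9/14, 5/14
likelihood_yes = (2/9) * (4/9) * (6/9) * (6/9)   # P(X | buys_computer = yes)
likelihood_no  = (3/5) * (2/5) * (1/5) * (2/5)   # P(X | buys_computer = no)
print(likelihood_yes * p_yes)   # about 0.028
print(likelihood_no * p_no)     # about 0.007, so the prediction is "yes"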

Bayesian Belief Networks


 Bayesian belief networks are probabilistic graphical models that, unlike naïve Bayesian classifiers, allow the representation of dependencies among subsets of attributes.
 The naïve Bayesian classifier makes the assumption of class conditional independence,
that is, given the class label of a tuple, the values of the attributes are assumed to be
conditionally independent of one another.
 When this assumption holds true, the naïve Bayesian classifier is the most accurate in comparison with all other classifiers.
 They provide a graphical model of causal relationships, on which learning can be
performed.
 A belief network is defined by two components—a directed acyclic graph and a set of
conditional probability tables (See Figure).
 Each node in the directed acyclic graph represents a random variable. The variables may
be discrete- or continuous-valued.
 They may correspond to actual attributes given in the data or to “hidden variables”
believed to form a relationship.
 Each arc represents a probabilistic dependence. If an arc is drawn from a node Y to a
node Z, then Y is a parent or immediate predecessor of Z, and Z is a descendant of Y.
 Each variable is conditionally independent of its nondescendants in the graph, given its
parents.

For example, having lung cancer is influenced by a person’s family history of lung
cancer, as well as whether or not the person is a smoker. Note that the variable PositiveXRay is
independent of whether the patient has a family history of lung cancer or is a smoker, given
that we know the patient has lung cancer.


In other words, once we know the outcome of the variable LungCancer, then the
variables FamilyHistory and Smoker do not provide any additional information regarding
PositiveXRay. The arcs also show that the variable LungCancer is conditionally independent
of Emphysema, given its parents, FamilyHistory and Smoker.
A belief network has one conditional probability table (CPT) for each variable.
The CPT for a variable Y specifies the conditional distribution P(Y|Parents(Y)), where
Parents(Y) are the parents of Y. Figure (b) shows a CPT for the variable LungCancer. The
conditional probability for each known value of LungCancer is given for each possible
combination of the values of its parents. For instance, the upper leftmost entry gives the probability of LungCancer = yes when both FamilyHistory and Smoker are "yes," and the bottom rightmost entry gives the probability of LungCancer = no when both are "no."
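
To make the idea of a CPT concrete, the following Python sketch stores a small portion of the lung-cancer network as dictionaries and evaluates the probability of a full assignment with the chain rule P(x1, …, xn) = ∏ P(xi | Parents(xi)). Only the arcs explicitly described in the text are included (Emphysema and any other nodes of the original figure are omitted), and all probability values are illustrative placeholders rather than the figure's actual CPT entries:

# Structure of (part of) the lung-cancer belief network described above.
parents = {
    "FamilyHistory": [],
    "Smoker": [],
    "LungCancer": ["FamilyHistory", "Smoker"],
    "PositiveXRay": ["LungCancer"],
}

# cpt[variable][(value, parent_value_1, parent_value_2, ...)] = P(value | parents)
# All numbers below are illustrative placeholders, not the figure's entries.
cpt = {
    "FamilyHistory": {("yes",): 0.3, ("no",): 0.7},
    "Smoker":        {("yes",): 0.4, ("no",): 0.6},
    "LungCancer": {
        ("yes", "yes", "yes"): 0.8, ("no", "yes", "yes"): 0.2,
        ("yes", "yes", "no"):  0.5, ("no", "yes", "no"):  0.5,
        ("yes", "no", "yes"):  0.7, ("no", "no", "yes"):  0.3,
        ("yes", "no", "no"):   0.1, ("no", "no", "no"):   0.9,
    },
    "PositiveXRay": {
        ("yes", "yes"): 0.9, ("no", "yes"): 0.1,
        ("yes", "no"):  0.2, ("no", "no"):  0.8,
    },
}

def joint_probability(assignment):
    # P(assignment) = product over variables of P(var = value | parent values).
    prob = 1.0
    for var, pars in parents.items():
        key = (assignment[var],) + tuple(assignment[p] for p in pars)
        prob *= cpt[var][key]
    return prob

# P(FamilyHistory = yes, Smoker = no, LungCancer = yes, PositiveXRay = yes)
print(joint_probability({"FamilyHistory": "yes", "Smoker": "no",
                         "LungCancer": "yes", "PositiveXRay": "yes"}))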
