CIVI 6731
BIG DATA ANALYTICS FOR SMART CITIES

Week 8
Classification (2)

Learning Objectives

 Eager Learning vs. Lazy Learning
 k-Nearest Neighbors Algorithm
 Proximity Measures
 Naïve Bayes Algorithm
 Applying classification via kNN and NB using RapidMiner and Python for occupancy detection

Classification
with k-Nearest Neighbors

Main Reference: Kotu & Deshpande

Eager Learning vs. Lazy Learning

 "Eager Learners"
 Learn the [mathematical] relationship (or its best approximation) between predictors and classes, and then use it for predicting.

 "Lazy Learners"
 Look up the examples from the training set which best match the new example in the test set, and predict accordingly.
 No "learning" really happens!

k-NN – Underlying idea


Birds of a feather flock together!
 In an n-dimensional space (each dimension representing one attribute), similar records (with the same target class label) gather in one neighborhood.
 Training data (labelled): memorize the entire training set.
 New example (unlabeled): search all predictors (attributes) of the labeled examples in the training set; find the closest match in the training set and use its class as the prediction.

k-NN – How it works


 Visualizing a labelled record in n-D space of its attributes (here n = 2):

WWR%   VT%   Lighting Level
85     85    High
90     80    High
78     83    Low

Plotted in the (WWR%, VT%) plane: (85, 85) & High; (90, 80) & High; (78, 83) & Low.

k-NN – How it works


 Visualizing a labelled record in n-D space of its attributes (here n = 3):

WWR%   VT%   Shading   Lighting Level
85     85    FALSE     High
90     80    TRUE      High
78     83    FALSE     Low

As 3-D points: (85, 85, FALSE) & High; (90, 80, TRUE) & High; (78, 83, FALSE) & Low.

k-NN – How it works


 Visualizing a labelled record in n-D space of its attributes (here n = 4):

WWR%   VT%   Shading   Orientation   Lighting Level
85     85    FALSE     West          High
90     80    TRUE      West          High
78     83    FALSE     North         Low

As 4-D vectors (4 attributes): (85, 85, FALSE, West) & High; (90, 80, TRUE, West) & High; (78, 83, FALSE, North) & Low.

k-NN – How it works


 Predicting for a new (unlabeled) record

Unlabeled set:
WWR%   VT%   Lighting Level
77     81    ?
86     84    ?
82     84    ?

Plotting an unlabeled record among the labeled training points, the prediction depends on how many neighbors vote: with k = 1 the single nearest neighbor decides ("High!"), while with a larger neighborhood such as k = 5 the majority may vote differently ("Low!").

What should the k be?!

k-NN – How it works


 Find the k closest neighbors (from the labeled training set) to the unlabeled data.
 Vote among the neighbors to predict the class of the unlabeled record.

 How to determine the closest neighbors (most similar records)?
 Select a Measure of Proximity.
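A minimal from-scratch sketch of this procedure, assuming numeric attributes and the Euclidean distance introduced on the following slides (data and names are illustrative, not the course dataset):

```python
from collections import Counter
import math

def euclidean(x, y):
    """Euclidean distance between two equal-length numeric records."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

def knn_predict(train_X, train_y, query, k=3):
    """Find the k closest training records and return the majority class."""
    distances = sorted(
        (euclidean(x, query), label) for x, label in zip(train_X, train_y)
    )
    k_labels = [label for _, label in distances[:k]]
    return Counter(k_labels).most_common(1)[0][0]

# Toy (WWR%, VT%) records with a lighting-level label
train_X = [(85, 85), (90, 80), (78, 83)]
train_y = ["High", "High", "Low"]
print(knn_predict(train_X, train_y, query=(84, 81), k=1))  # -> "High"
```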

Measures of Proximity
Measures for evaluating similarity of two records (two vectors in
the n-D attribute space)
 Distance
 Cosine Similarity
 Correlation Similarity
 Simple Matching Coefficient
 Jaccard Similarity


Measures of Proximity – Distance

For two datapoints $\vec{X} = (x_1, x_2, \ldots, x_n)$ and $\vec{Y} = (y_1, y_2, \ldots, y_n)$ separated by distance d:

Euclidean Distance

$d = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2 + \cdots + (x_n - y_n)^2}$

Measures of Proximity – Distance (cont.)

For $\vec{X} = (x_1, x_2, \ldots, x_n)$ and $\vec{Y} = (y_1, y_2, \ldots, y_n)$:

Manhattan (taxicab) Distance – for binary attributes it coincides with the Hamming distance

$d = |x_1 - y_1| + |x_2 - y_2| + \cdots + |x_n - y_n| = \sum_{i=1}^{n} |x_i - y_i|$

Measures of Proximity – Distance (cont.)

For $\vec{X} = (x_1, x_2, \ldots, x_n)$ and $\vec{Y} = (y_1, y_2, \ldots, y_n)$:

Minkowski (p-norm) Distance

$d = \left( \sum_{i=1}^{n} |x_i - y_i|^p \right)^{1/p}$

(p = 1 gives the Manhattan distance; p = 2 gives the Euclidean distance.)
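A small sketch computing the three distance variants for two illustrative records (plain NumPy; the vectors are made up):

```python
import numpy as np

x = np.array([85.0, 85.0, 0.0])
y = np.array([78.0, 83.0, 1.0])

euclidean = np.sqrt(np.sum((x - y) ** 2))          # p = 2
manhattan = np.sum(np.abs(x - y))                  # p = 1
p = 3
minkowski = np.sum(np.abs(x - y) ** p) ** (1 / p)  # general p-norm

print(euclidean, manhattan, minkowski)
```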

Measures of Proximity – Distance (cont.)

Which distance measure to select?
 It depends on the data (Grabuts, 2011):
 Numeric attributes → Euclidean is the most common.
 Binary attributes → Manhattan.
 Unknown? → No rule of thumb, but Euclidean is a good starting point.

Measures of Proximity – Distance (cont.)

Issue with distance: it depends on the scale and units of the attributes!
 Attributes are in different measures.
 Attributes are in different units.
Solution: normalize all attributes.
 Range transformation: rescale between 0 and 1: x → (x − min)/(max − min)
 Z-transformation: x → (x − mean)/SD
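A sketch of the two normalizations on a NumPy column (the column name and values are illustrative):

```python
import numpy as np

wwr = np.array([78.0, 85.0, 90.0, 82.0])

# Range (min-max) transformation: rescale to [0, 1]
wwr_range = (wwr - wwr.min()) / (wwr.max() - wwr.min())

# Z-transformation: zero mean, unit standard deviation
wwr_z = (wwr - wwr.mean()) / wwr.std()

print(wwr_range, wwr_z)
```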

Measures of Proximity – Distance (cont.)

For categorical data:
 If ordinal: turn to integers, e.g. (cold, mild, warm, hot) → (0, 1, 2, 3)
 Else: dummy coding, e.g. (metro, bus, streetcar) → metro (0 or 1), bus (0 or 1), streetcar (0 or 1)
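A sketch of both encodings with pandas (the column names and values are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "temperature": ["cold", "hot", "mild", "warm"],
    "mode": ["metro", "bus", "streetcar", "metro"],
})

# Ordinal attribute -> integers that preserve the order
order = {"cold": 0, "mild": 1, "warm": 2, "hot": 3}
df["temperature_code"] = df["temperature"].map(order)

# Nominal attribute -> dummy (one-hot) columns, one 0/1 column per category
df = pd.get_dummies(df, columns=["mode"])
print(df)
```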

Measures of Proximity – Cosine Similarity

A measure of the angle θ between the two vectors:

$\cos(\vec{X}, \vec{Y}) = \dfrac{\vec{X} \cdot \vec{Y}}{\|\vec{X}\|\,\|\vec{Y}\|}$

 E.g. X = (1, 2, 0, 0, 3), Y = (5, 0, 0, 6, 7):

$\cos(X, Y) = \dfrac{1 \cdot 5 + 2 \cdot 0 + 0 \cdot 0 + 0 \cdot 6 + 3 \cdot 7}{\sqrt{1^2 + 2^2 + 3^2}\,\sqrt{5^2 + 6^2 + 7^2}} = \dfrac{26}{\sqrt{14}\sqrt{110}} \approx 0.66$
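Reproducing the example above with NumPy:

```python
import numpy as np

x = np.array([1, 2, 0, 0, 3])
y = np.array([5, 0, 0, 6, 7])

# cos(X, Y) = (X . Y) / (||X|| ||Y||)
cos_xy = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
print(round(cos_xy, 2))  # ~0.66
```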

Measures of Proximity – Correlation Similarity

 The Pearson correlation between $\vec{X}$ and $\vec{Y}$ is a measure of the linear relationship between them:

$r(\vec{X}, \vec{Y}) = \dfrac{\mathrm{cov}(X, Y)}{\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}}$

 NOTE: here this is a similarity between records (rows) rather than between attributes (columns).
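A quick check with NumPy, treating two records (rows) as the vectors being compared (the values are illustrative):

```python
import numpy as np

x = np.array([1.0, 2.0, 0.0, 0.0, 3.0])
y = np.array([5.0, 0.0, 0.0, 6.0, 7.0])

# Pearson correlation between the two records
r = np.corrcoef(x, y)[0, 1]
print(round(r, 3))
```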

Measures of Proximity – Simple Matching Coefficient (SMC)

 Good for binary attributes:

$SMC(\vec{X}, \vec{Y}) = \dfrac{\text{matching occurrences}}{\text{total occurrences}}$

 E.g. X = (1, 1, 0, 0, 1, 1, 0), Y = (1, 0, 0, 1, 1, 0, 0):  SMC(X, Y) = 4/7

Measures of Proximity – Jaccard Similarity

 Similar to SMC, but first sets the non-occurrences (0–0 matches) aside:

$J(\vec{X}, \vec{Y}) = \dfrac{\text{common occurrences (1–1 matches)}}{\text{total occurrences, excluding 0–0 matches}}$

 E.g. X = (1, 1, 0, 0, 1, 1, 0), Y = (1, 0, 0, 1, 1, 0, 0):  J(X, Y) = 2/5
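Both coefficients for the binary example vectors above, in plain Python:

```python
x = [1, 1, 0, 0, 1, 1, 0]
y = [1, 0, 0, 1, 1, 0, 0]

matches    = sum(xi == yi for xi, yi in zip(x, y))             # 1-1 and 0-0 matches
ones_both  = sum(xi == 1 and yi == 1 for xi, yi in zip(x, y))  # common occurrences (1-1)
zeros_both = sum(xi == 0 and yi == 0 for xi, yi in zip(x, y))

smc = matches / len(x)                       # 4/7
jaccard = ones_both / (len(x) - zeros_both)  # 2/5
print(smc, jaccard)
```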

Class Prediction (Voting)

Once the k nearest neighbors are identified, the class of the majority among them is selected as the prediction:

c* = majority class among (c1, c2, …, ck)

 Weights: the nearer a neighbor, the higher its weight should be in predicting the class!

c* = majority class among the weighted votes (w1·c1, w2·c2, …, wk·ck), with

$w_i = \dfrac{e^{-d(x, n_i)}}{\sum_{j=1}^{k} e^{-d(x, n_j)}}$
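A sketch of the distance-weighted vote using the exponential weights above (the neighbor distances and labels are illustrative):

```python
import math
from collections import defaultdict

def weighted_vote(neighbors):
    """neighbors: list of (distance, class_label) for the k nearest neighbors."""
    weights = [math.exp(-d) for d, _ in neighbors]
    total = sum(weights)
    score = defaultdict(float)
    for w, (_, label) in zip(weights, neighbors):
        score[label] += w / total      # w_i = exp(-d_i) / sum_j exp(-d_j)
    return max(score, key=score.get)   # class with the largest weighted vote

print(weighted_vote([(0.4, "High"), (0.9, "Low"), (1.1, "Low")]))  # -> "Low"
```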

k-NN – Summary
 A lazy learner:
 Easiest to implement – can be implemented even in Excel;
 Proper quality needs a significant number of training examples.
 Preparation:
 Numeric attributes must be normalized;
 Categorical attributes must be turned into Boolean or integer values.
 Application:
 Select a k;
 Select a proximity measure;
 Decide whether or not to apply distance-based weights in the prediction;
 Find the k nearest neighbors;
 Evaluate the votes among the k nearest neighbors and predict.
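A sketch of the same application steps with scikit-learn, assuming it is available (the data is a toy stand-in, not the course dataset):

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Toy (WWR%, VT%) records with lighting-level labels
X = [[85, 85], [90, 80], [78, 83], [88, 82], [77, 84]]
y = ["High", "High", "Low", "High", "Low"]

# Normalize, then classify by the k nearest (here distance-weighted) neighbors
model = make_pipeline(
    MinMaxScaler(),
    KNeighborsClassifier(n_neighbors=3, metric="euclidean", weights="distance"),
)
model.fit(X, y)
print(model.predict([[84, 81]]))
```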

k-NN – Summary

Advantages of kNN:
 Very cheap to build – can even be built in Excel; training (i.e. memorizing) is completed super fast.
 Robust against missing data – an attribute with missing data is set aside in evaluating proximity without violating accuracy.

Limitations of kNN:
 Only "memorizes" and doesn't "learn" – relationships between inputs and output can't be explained, generalized or abstracted.
 Predicting for unlabeled data is expensive – hard to use in time-sensitive applications.

Application Example – Occupancy Prediction

 Brennan et al. (2015) showed the high accuracy of kNN in predicting occupancy rate (count), using environmental sensory information.

[System diagram: a sensing unit with a temperature & humidity sensor (DHT22) and a CO2 sensor (K30); a data-interpretation/processing unit (Arduino UNO); and storage & communication (Raspberry Pi 3B).]

Application Example – Occupancy Prediction

 Brennan et al. (2015) showed the high accuracy of kNN in predicting occupancy rate (count), using environmental sensory information.
 Results: [results figure omitted]

Any questions so far?


Classification
with Naïve Bayesian

Main Reference: Kotu & Deshpande

Naïve Bayesian – Underlying idea

Bayes Theorem (Thomas Bayes, 1701–1761)

$P(B \mid A) \cdot P(A) = P(A \mid B) \cdot P(B) \;\Rightarrow\; P(B \mid A) = \dfrac{P(A \mid B) \cdot P(B)}{P(A)}$

In a classification task:
 B = the outcome (the class); A = the evidence (the predictive attribute(s));
 P(B|A) = the posterior probability; P(A|B) = the class-conditional probability (CCP); P(B) = the prior probability.

Naïve Bayesian – Underlying idea

Bayes Theorem
 Example – The probability of having a morning rush hour in Montréal is 40%. The probability of your teacher, Maz, being late for a morning class is generally 8%, but when he is late in the morning, you know with a good chance (of about 75%) that there has been a rush hour!
 You're late for your morning class with Maz, and you're stuck in a rush hour. What is the chance of Maz, himself, being late too?!
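One way to apply the theorem here (a quick worked answer, with A = rush hour and B = Maz being late):

$$P(\text{Late} \mid \text{Rush}) = \frac{P(\text{Rush} \mid \text{Late}) \cdot P(\text{Late})}{P(\text{Rush})} = \frac{0.75 \times 0.08}{0.40} = 0.15$$

i.e., about a 15% chance.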

Naïve Bayesian – Underlying idea

 How does NB work?!
Given the labeled data (training set) with attributes A1, A2, …, An and class C, and an unlabeled datapoint {a1, a2, …, an}, compute for every class label Ci (i = 1, …, k):

$P(C^* = C_i \mid A = \{a_1, a_2, \ldots, a_n\}) = \dfrac{P(A \mid C_i)\,P(C_i)}{P(A)}$

and compare the results across C1, C2, …, Ck.

Naïve Bayesian – Underlying idea

But: what is the "Naïve" aspect?!
In practice, we have a SET OF EVIDENCES: {A} = {a1, a2, …, an}.
Hence, we must calculate: P(C|{A}) = P({A}|C)·P(C)/P({A})

 P({A}) may be calculable from the dataset.
 But what about P({A}|C)?
 We can calculate P(a1|C), P(a2|C), …, P(ai|C), … from the dataset.
 But how about P({a1, a2, …, an}|C)??
 The "naïve" step: assume the attributes are conditionally independent given the class, so that P({a1, a2, …, an}|C) ≈ P(a1|C)·P(a2|C)·…·P(an|C).

Naïve Bayesian – Underlying idea

Back to the Bayes Theorem
 Example: I (Maz) am generally late for 20% of my winter-term classes 😥. Of the classes I'm late for, 40% are morning classes, 35% happen on snowy days, and 25% are classes I have in the Hall building. The probability of snow in a Montréal winter is 50%; also, 20% of my winter-term classes are in the morning, and 20% of my winter classes are in the Hall building.

I open my eyes this morning; I have a morning class in Hall and it's snowing again! What is the chance of me being late today?!
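Working this out under the naïve independence assumption (multiplying the individual evidence probabilities, in the numerator and in the denominator):

$$P(\text{Late} \mid \text{morning}, \text{snow}, \text{Hall}) \approx \frac{P(\text{Late}) \cdot P(\text{morning} \mid \text{Late}) \cdot P(\text{snow} \mid \text{Late}) \cdot P(\text{Hall} \mid \text{Late})}{P(\text{morning}) \cdot P(\text{snow}) \cdot P(\text{Hall})} = \frac{0.20 \times 0.40 \times 0.35 \times 0.25}{0.20 \times 0.50 \times 0.20} = \frac{0.007}{0.02} = 0.35$$

i.e., about a 35% chance.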

Naïve Bayesian – How it works

 For a new (unlabeled) record:
 Observe the set of attribute values A = {a1, …, an}.
 What class would the record belong to?
 Calculate P(c|A) from the dataset (for each class label c):

$P(c \mid A) = \dfrac{P(c) \prod_{i=1}^{n} P(a_i \mid c)}{P(A)}$

(no need to consider the denominator, since it's the same for all classes!)
 Select the class with the highest probability and use it as the prediction for the input record.

Naïve Bayesian – Example

Training set (A1: VT%, A2: WWR%, A3: Orientation, A4: Shading; c: Lighting Level):

VT%    WWR%   Orientation   Shading   Lighting Level
High   Med    West          FALSE     High
High   High   West          TRUE      High
Low    Low    East          TRUE      High
Med    High   West          FALSE     High
Low    Med    East          TRUE      High
High   Med    North         FALSE     Low
Low    High   East          FALSE     Low
Low    Med    East          FALSE     Low
Low    Low    North         TRUE      Low
Low    Low    West          FALSE     Low
Med    Med    East          FALSE     Low
Med    Low    West          TRUE      Low
Med    High   North         TRUE      Low
High   Low    North         FALSE     Low

[dataset extracted from the example J. Ross Quinlan used to introduce ID3]

STEP 1) Prior Probabilities
 P(c=High) = 5/14
 P(c=Low) = 9/14

Naïve Bayesian – Example

STEP 2) Class-Conditional Probabilities (from the same training set as in Step 1)

 P(VT=H | c=High) = 2/5      P(VT=H | c=Low) = 2/9
 P(VT=M | c=High) = 1/5      P(VT=M | c=Low) = 3/9
 P(VT=L | c=High) = 2/5      P(VT=L | c=Low) = 4/9

Naïve Bayesian – Example

STEP 2) Class-Conditional Probabilities (cont.)

 P(WWR=H | c=High) = 2/5     P(WWR=H | c=Low) = 2/9
 P(WWR=M | c=High) = 2/5     P(WWR=M | c=Low) = 3/9
 P(WWR=L | c=High) = 1/5     P(WWR=L | c=Low) = 4/9

Naïve Bayesian – Example

STEP 2) Class-Conditional Probabilities (cont.)

 P(Orient=W | c=High) = 3/5   P(Orient=W | c=Low) = 2/9
 P(Orient=E | c=High) = 2/5   P(Orient=E | c=Low) = 3/9
 P(Orient=N | c=High) = 0/5   P(Orient=N | c=Low) = 4/9

Naïve Bayesian – Example

STEP 2) Class-Conditional Probabilities (cont.)

 P(Shad=TRUE | c=High) = 3/5    P(Shad=TRUE | c=Low) = 3/9
 P(Shad=FALSE | c=High) = 2/5   P(Shad=FALSE | c=Low) = 6/9

Naïve Bayesian – Example

STEP 3) Predicting the Outcome
 A new "test" record: A1: VT% = High, A2: WWR% = Low, A3: Orientation = West, A4: Shading = FALSE

 $P(c \mid A) \propto P(c) \prod_{i=1}^{n} P(a_i \mid c)$

 P(Low|A) ∝ P(Low) · ∏ P(ai|Low)
   = P(L) · [P(VT=H|L) · P(WWR=L|L) · P(Ornt=W|L) · P(Shad=FALSE|L)]
   = 9/14 · [2/9 · 4/9 · 2/9 · 6/9] = 0.0094

 P(High|A) ∝ P(High) · ∏ P(ai|High)
   = 5/14 · [2/5 · 1/5 · 3/5 · 2/5] = 0.0069

P(Low|A) > P(High|A) → predict Low!
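A short sketch reproducing Steps 1–3 on the training table above (plain Python, no smoothing, so it mirrors the hand calculation):

```python
from collections import Counter, defaultdict
from math import prod

# (VT%, WWR%, Orientation, Shading) -> Lighting Level, as in the Step 1 table
data = [
    ("High", "Med",  "West",  "FALSE", "High"), ("High", "High", "West",  "TRUE",  "High"),
    ("Low",  "Low",  "East",  "TRUE",  "High"), ("Med",  "High", "West",  "FALSE", "High"),
    ("Low",  "Med",  "East",  "TRUE",  "High"), ("High", "Med",  "North", "FALSE", "Low"),
    ("Low",  "High", "East",  "FALSE", "Low"),  ("Low",  "Med",  "East",  "FALSE", "Low"),
    ("Low",  "Low",  "North", "TRUE",  "Low"),  ("Low",  "Low",  "West",  "FALSE", "Low"),
    ("Med",  "Med",  "East",  "FALSE", "Low"),  ("Med",  "Low",  "West",  "TRUE",  "Low"),
    ("Med",  "High", "North", "TRUE",  "Low"),  ("High", "Low",  "North", "FALSE", "Low"),
]

class_counts = Counter(row[-1] for row in data)   # Step 1: priors = count / 14
ccp_counts = defaultdict(Counter)                 # Step 2: per-class attribute-value counts
for *attrs, c in data:
    for i, a in enumerate(attrs):
        ccp_counts[c][(i, a)] += 1

def score(attrs):
    """Step 3: unnormalized posterior P(c) * prod_i P(a_i | c) for each class c."""
    return {c: (n / len(data)) * prod(ccp_counts[c][(i, a)] / n for i, a in enumerate(attrs))
            for c, n in class_counts.items()}

print(score(("High", "Low", "West", "FALSE")))
# {'High': ~0.0069, 'Low': ~0.0094}  ->  predict "Low"
```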

Naïve Bayesian – Summary

Advantages of NB:
 Easy to understand – particularly when you understand the Bayes theorem!
 Easy to implement
 Robust against missing values – attributes with missing data are set aside

Limitations of NB:
 Issue with incomplete training sets (the zero-frequency problem)
 Issue with continuous attributes
 Issue with attribute independence

Naïve Bayesian – Issues

1) Zero-frequency problem
 In calculating the class-conditional probability P({A}|c), if even one of the P(ai|c) terms is 0, the entire CCP becomes zero!
 E.g. in the 'Natural Lighting' example, for any test record with "Orientation = North", the probability of High will be calculated as ZERO!
 Hence, if the training set doesn't include records for all possible categories of all attributes/classes, the answers won't be reliable.

 One [partial] possible solution (Laplace correction):
 Assign a small default probability, instead of zero, to any P(ai|c) = 0 caused by missing records.
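A sketch of the correction on one class-conditional probability, using a common additive form (the choice of alpha is a modeling decision, not from the slides):

```python
def laplace_ccp(count_ai_c, count_c, n_values, alpha=1.0):
    """P(a_i | c) with additive (Laplace) smoothing.

    count_ai_c : occurrences of attribute value a_i within class c
    count_c    : total records of class c
    n_values   : number of possible values of the attribute
    """
    return (count_ai_c + alpha) / (count_c + alpha * n_values)

# 'Orientation = North' was never seen with c = High (0/5), but no longer gives 0:
print(laplace_ccp(count_ai_c=0, count_c=5, n_values=3))  # 1/8 = 0.125
```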

Naïve Bayesian – Issues

2) Continuous attributes
 NB (as introduced so far) only works for attributes with nominal values!

 Solution 1) Discretization
 Problem: the subjectivity of bucketing the ranges.

 Solution 2) Use probability density functions (continuous probability distributions)
 Assume a probability distribution for each numerical attribute (normal distribution, Poisson distribution, etc.).
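A sketch of Solution 2 with scikit-learn's GaussianNB, which assumes a normal distribution per attribute and class (the numeric data is a toy stand-in):

```python
from sklearn.naive_bayes import GaussianNB

# Toy numeric attributes (e.g., temperature, CO2) with a class label
X = [[21.0, 420], [22.5, 800], [23.0, 950], [20.5, 410], [24.0, 1100], [21.5, 450]]
y = ["vacant", "occupied", "occupied", "vacant", "occupied", "vacant"]

model = GaussianNB()          # fits a mean and variance per attribute and class
model.fit(X, y)
print(model.predict([[22.0, 700]]))
```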

Naïve Bayesian – Issues

3) Attribute independence
 What if the attributes are NOT independent?!

 Solution 1) Pre-processing
 Complete a correlation analysis and remove strongly correlated attributes before training the NB model:
  o Numeric attributes: Pearson test
  o Categorical attributes: chi-squared (χ²) test

 Solution 2) Use Bayesian Belief Networks
 An advanced application of the Bayes theorem designed to handle attribute dependence.
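A sketch of Solution 1 (the pre-processing check) with pandas and SciPy; the column names and values are illustrative:

```python
import pandas as pd
from scipy.stats import chi2_contingency

df = pd.DataFrame({
    "temperature": [21.0, 22.5, 23.0, 20.5, 24.0, 21.5],
    "humidity":    [40.0, 45.0, 47.0, 39.0, 50.0, 41.0],
    "orientation": ["West", "East", "East", "North", "West", "East"],
    "shading":     ["FALSE", "TRUE", "TRUE", "FALSE", "TRUE", "FALSE"],
})

# Numeric attributes: Pearson correlation matrix
print(df[["temperature", "humidity"]].corr(method="pearson"))

# Categorical attributes: chi-squared test of independence
table = pd.crosstab(df["orientation"], df["shading"])
chi2, p_value, dof, expected = chi2_contingency(table)
print(p_value)   # a small p-value suggests the attributes are dependent
```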

Week8 Tutorial
Occupancy Detection to Enhance the Digital Twin!
As a part of the Digital Twin project, Concordia Facility
Management aims to detect the occupancy from environmental
sensory data (similar to Brennan et al. in 2015)
 You're developing a classifier which reads the sensory information in a room and predicts whether the room currently (see the sketch below):
 is vacant;
 has a LOW count of occupants (1–4);
 has a MEDIUM count of occupants (5–14); or
 has a HIGH count of occupants (≥15).
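A possible starting point for the tutorial, assuming a CSV of environmental readings with an occupant-count column (the file name and column names are placeholders, not the actual tutorial dataset):

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("occupancy_sensor_data.csv")   # placeholder file name

def occupancy_class(count):
    """Bin the occupant count into the four tutorial classes."""
    if count == 0:
        return "VACANT"
    if count <= 4:
        return "LOW"
    if count <= 14:
        return "MEDIUM"
    return "HIGH"

X = df[["temperature", "humidity", "co2"]]      # placeholder sensor columns
y = df["occupant_count"].apply(occupancy_class)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
model.fit(X_train, y_train)
print(model.score(X_test, y_test))              # classification accuracy on the hold-out set
```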

What do you think?

