CIVI6731 Week5
CIVI 6731
BIG DATA ANALYTICS FOR SMART CITIES
Week 5
Association Rule Learning
Learning Objectives
But there are still ways to find out whether categorical variables are
related in some way. We simply need to move from correlation
to:
Association!
Ref: G. Liu, L. Shi, C. Lan, T. Z. Qiu and J. Fang, "Use of Data Mining Technology to Investigate Vehicle Speed
in Winter Weather: A Case Study," in TRB 94th Annual Meeting Compendium of Papers, 2015.
Support
Confidence
Lift
Conviction
Support
Relative frequency of occurrence of an item (or item-set) in
the transactions:
$\mathrm{supp}(X) = \Pr(X)$
Support of a rule:
$\mathrm{supp}(X \rightarrow Y) = \Pr(X, Y)$
Indicates whether a rule is worth considering
Low support → the occurrence may be just by chance!
Only those rules exceeding the support threshold will be considered
for further analysis
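The support definition can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the names `TRANSACTIONS`, `support` and `rule_support` are assumptions, and the data are the six Toronto travel records used in the example later in this deck.

```python
from fractions import Fraction

# Six travel records from the Toronto example in this deck.
TRANSACTIONS = [
    {"Bus", "Metro"},
    {"Bus", "Metro"},
    {"BikeShare", "Metro", "Bus"},
    {"Drive"},
    {"BikeShare", "Bus", "Metro"},
    {"Drive", "Bus", "Streetcar"},
]


def support(itemset, transactions=TRANSACTIONS):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return Fraction(hits, len(transactions))


def rule_support(antecedent, consequent, transactions=TRANSACTIONS):
    """Support of the rule X -> Y, i.e. Pr(X, Y)."""
    return support(set(antecedent) | set(consequent), transactions)


print(float(support({"Bus"})))                 # share of records using the bus
print(float(rule_support({"Bus"}, {"Metro"})))  # share using bus and metro together
```

Exact fractions are used so that values such as 4/6 can be compared without rounding.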
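Confidence
Confidence of a rule (X → Y):
$\mathrm{conf}(X \rightarrow Y) = \dfrac{\mathrm{supp}(X, Y)}{\mathrm{supp}(X)}$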
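Lift
Lift of a rule (X → Y):
The probability of observing X and Y together compared to the
probability that we would see them together randomly (i.e. if X and Y
were completely independent)
$\mathrm{lift}(X \rightarrow Y) = \dfrac{\mathrm{supp}(X, Y)}{\mathrm{supp}(X) \ast \mathrm{supp}(Y)}$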
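As a rough sketch of how lift behaves, the snippet below computes it for a few rules on the six Toronto travel records from the example later in this deck. The helper names are assumptions made for illustration; confidence is taken as supp(X, Y) / supp(X), which matches the confidence values computed in the rule-generation example.

```python
from fractions import Fraction

# Six travel records from the Toronto example in this deck.
TRANSACTIONS = [
    {"Bus", "Metro"},
    {"Bus", "Metro"},
    {"BikeShare", "Metro", "Bus"},
    {"Drive"},
    {"BikeShare", "Bus", "Metro"},
    {"Drive", "Bus", "Streetcar"},
]


def support(itemset):
    """Fraction of transactions containing every item of `itemset`."""
    itemset = set(itemset)
    return Fraction(sum(1 for t in TRANSACTIONS if itemset <= t), len(TRANSACTIONS))


def confidence(x, y):
    """conf(X -> Y) = supp(X, Y) / supp(X)."""
    return support(set(x) | set(y)) / support(x)


def lift(x, y):
    """lift(X -> Y) = supp(X, Y) / (supp(X) * supp(Y))."""
    return support(set(x) | set(y)) / (support(x) * support(y))


for x, y in [({"BikeShare"}, {"Metro"}), ({"Bus"}, {"Metro"}), ({"Bus"}, {"Drive"})]:
    print(sorted(x), "->", sorted(y), "lift =", float(lift(x, y)))
```

A lift above 1 suggests the two sides occur together more often than independence would predict; a lift below 1 suggests the opposite.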
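Conviction
Conviction of a rule (X → Y):
How often does X occur in a transaction where Y does not?
$\mathrm{conv}(X \rightarrow Y) = \dfrac{1 - \mathrm{supp}(Y)}{1 - \mathrm{conf}(X \rightarrow Y)}$
The ratio of the expected frequency of having X but not Y (if X and Y
were independent) to the observed frequency of having X but not Y,
i.e. how often the rule is violated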
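A small sketch of conviction on the six Toronto travel records from the example later in this deck. The function names are illustrative assumptions. One design choice is labelled explicitly: when a rule is never violated (confidence = 1) the denominator is zero, and this sketch returns infinity.

```python
import math
from fractions import Fraction

# Six travel records from the Toronto example in this deck.
TRANSACTIONS = [
    {"Bus", "Metro"},
    {"Bus", "Metro"},
    {"BikeShare", "Metro", "Bus"},
    {"Drive"},
    {"BikeShare", "Bus", "Metro"},
    {"Drive", "Bus", "Streetcar"},
]


def support(itemset):
    """Fraction of transactions containing every item of `itemset`."""
    itemset = set(itemset)
    return Fraction(sum(1 for t in TRANSACTIONS if itemset <= t), len(TRANSACTIONS))


def confidence(x, y):
    """conf(X -> Y) = supp(X, Y) / supp(X)."""
    return support(set(x) | set(y)) / support(x)


def conviction(x, y):
    """conv(X -> Y) = (1 - supp(Y)) / (1 - conf(X -> Y))."""
    conf = confidence(x, y)
    if conf == 1:
        # Assumption of this sketch: a rule that is never violated gets infinity.
        return math.inf
    return (1 - support(y)) / (1 - conf)


print(conviction({"Bus"}, {"Metro"}))
print(conviction({"BikeShare"}, {"Metro"}))
```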
Rule Mining
Generating meaningful and interesting association rules from a
transaction dataset
Apriori Algorithm
Example
Records of six complete travels in Toronto, ON (in terms of
travel modes used) are shown in the table below.
Can we find any affinity among the adoption of different modes of public
transit?
Travel Record   Travel Modes
1               {Bus, Metro}
2               {Bus, Metro}
3               {BikeShare, Metro, Bus}
4               {Drive}
5               {BikeShare, Bus, Metro}
6               {Drive, Bus, Streetcar}
Data preparation (one binary column per mode):
Travel Record   Bus   Metro   Bike Share   Streetcar   Drive
1               1     1       0            0           0
2               1     1       0            0           0
3               1     1       1            0           0
4               0     0       0            0           1
5               1     1       1            0           0
6               1     0       0            1           1
Apriori Algorithm
Agrawal & Srikant, 1994
Simple Logic:
"If an item-set is frequent, then all its subsets will be
frequent too!"
AND Conversely:
"If an item-set is infrequent, then all its supersets will be infrequent
too!"
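The logic above can be checked by brute force on a small data set. The sketch below is only an illustration (the names `support`, `all_itemsets`, `is_antimonotone` and `supersets_of` are assumptions): it uses the six Toronto travel records from the example, verifies that adding an item never increases support, and lists the supersets of {Streetcar}, all of which fall below a 25% minimum support.

```python
from itertools import combinations

# Six travel records from the Toronto example in this deck.
TRANSACTIONS = [
    {"Bus", "Metro"},
    {"Bus", "Metro"},
    {"BikeShare", "Metro", "Bus"},
    {"Drive"},
    {"BikeShare", "Bus", "Metro"},
    {"Drive", "Bus", "Streetcar"},
]
ITEMS = sorted(set().union(*TRANSACTIONS))
MINSUP = 0.25


def support(itemset):
    """Fraction of transactions containing every item of `itemset`."""
    return sum(1 for t in TRANSACTIONS if itemset <= t) / len(TRANSACTIONS)


def all_itemsets(items):
    """Yield every non-empty item-set over `items`."""
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            yield frozenset(combo)


def is_antimonotone():
    """True if adding any item to any item-set never increases its support."""
    return all(
        support(s | {item}) <= support(s)
        for s in all_itemsets(ITEMS)
        for item in ITEMS
    )


def supersets_of(base):
    """All item-sets over ITEMS that contain `base`."""
    base = frozenset(base)
    return [s for s in all_itemsets(ITEMS) if base <= s]


print(is_antimonotone())
print(all(support(s) < MINSUP for s in supersets_of({"Streetcar"})))
```

Enumerating every item-set is only feasible for toy data; avoiding exactly this enumeration is the point of the Apriori pruning.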
Items: Bus, Metro, Bike Share, Streetcar
If {Streetcar} is infrequent, all its supersets will be
infrequent too
Apriori Algorithm
1. Frequent Item-set Generation
Assuming minsup = 25%, and using the four public-transit columns
(Bus, Metro, Bike Share, Streetcar) of the travel-record table:
Item-set                      Support
{Bus}                         0.83
{Metro}                       0.67
{Bike Share}                  0.33
{Bus, Metro}                  0.67
{Bus, Bike Share}             0.33
{Metro, Bike Share}           0.33
{Bus, Metro, Bike Share}      0.33
7 item-sets (instead of 2^4 − 1 = 15)
{Streetcar} (support 1/6 ≈ 0.17) falls below minsup, so it and all its
supersets are dropped
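The level-wise search can be sketched as follows. This is a simplified illustration under stated assumptions, not a reference implementation: the function name `apriori` and its join/prune details are choices made here, and the records are restricted to the four public-transit modes, as in the slide.

```python
from itertools import combinations

# Toronto travel records restricted to the four public-transit modes
# (record 4, {Drive}, becomes an empty set).
TRANSIT = {"Bus", "Metro", "BikeShare", "Streetcar"}
RAW = [
    {"Bus", "Metro"},
    {"Bus", "Metro"},
    {"BikeShare", "Metro", "Bus"},
    {"Drive"},
    {"BikeShare", "Bus", "Metro"},
    {"Drive", "Bus", "Streetcar"},
]
RECORDS = [t & TRANSIT for t in RAW]


def apriori(transactions, minsup):
    """Return {frequent item-set: support}, found level by level."""
    n = len(transactions)

    def supp(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    items = sorted({i for t in transactions for i in t})
    level = [frozenset([i]) for i in items]  # candidate 1-item-sets
    frequent = {}
    k = 1
    while level:
        # Keep only the candidates that meet the support threshold.
        current = {c: s for c in level if (s := supp(c)) >= minsup}
        frequent.update(current)
        # Join step: merge frequent k-item-sets into (k+1)-item candidates.
        k += 1
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-item subset of a candidate must be frequent.
        level = [
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        ]
    return frequent


frequent = apriori(RECORDS, minsup=0.25)
for itemset, s in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), round(s, 2))
```

Because {Streetcar} is dropped at the first level, no candidate containing it is ever counted.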
Apriori Algorithm
2. Rule Generation
We generate ALL possible combinations of the frequent itemsets
(as antecedent and consequent)!
Select a measure of “usefulness” for rules (and a threshold)
Support, Confidence, Lift, Conviction, etc.
For each frequent itemset, generate possible rules
An item-set of n items can generate 2^n − 2 rules
The number can be reduced by applying some “pruning”
Eliminate those rules which do not satisfy the usefulness
measure
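A short counting argument for the number of candidate rules, written out as a sketch: every item of the item-set is placed either in the antecedent or in the consequent, and the two assignments that leave one side empty are not rules.

```latex
\[
\underbrace{2^{n}}_{\text{each item: antecedent or consequent}}
\;-\;
\underbrace{2}_{\text{empty antecedent or empty consequent}}
\;=\; 2^{n} - 2,
\qquad
n = 3:\quad 2^{3} - 2 = 6 .
\]
```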
Apriori Algorithm
2. Rule Generation
(E.g.) for the frequent item-set {Bus, Metro, BikeShare}:
Select a measure of "usefulness" for rules: Confidence
Generate the 2^n − 2 possible rules (n = 3, so 2^3 − 2 = 6)
Eliminate rules with confidence < 1 (minconf) [CONSERVATIVE!]
Rule                                   Confidence
o {Bus, Metro} → {BikeShare}           0.33/0.67 = 0.5
o {Bus, BikeShare} → {Metro}           0.33/0.33 = 1.0
o {Metro, BikeShare} → {Bus}           0.33/0.33 = 1.0
o {Bus} → {Metro, BikeShare}           0.33/0.83 = 0.4
o {Metro} → {Bus, BikeShare}           0.33/0.67 = 0.5
o {BikeShare} → {Bus, Metro}           0.33/0.33 = 1.0
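The six rules and their confidences can be reproduced with a short sketch. The function name `rules_from` is an assumption made for illustration; the data are the six Toronto travel records, and exact fractions are used so that "confidence < 1" is tested without rounding.

```python
from fractions import Fraction
from itertools import combinations

# Six travel records from the Toronto example in this deck.
TRANSACTIONS = [
    {"Bus", "Metro"},
    {"Bus", "Metro"},
    {"BikeShare", "Metro", "Bus"},
    {"Drive"},
    {"BikeShare", "Bus", "Metro"},
    {"Drive", "Bus", "Streetcar"},
]


def support(itemset):
    """Fraction of transactions containing every item of `itemset`."""
    return Fraction(sum(1 for t in TRANSACTIONS if itemset <= t), len(TRANSACTIONS))


def rules_from(itemset):
    """All 2^n - 2 rules X -> Y with X and Y a non-empty split of `itemset`."""
    itemset = frozenset(itemset)
    rules = []
    for r in range(1, len(itemset)):
        for antecedent in combinations(sorted(itemset), r):
            x = frozenset(antecedent)
            y = itemset - x
            rules.append((x, y, support(itemset) / support(x)))
    return rules


all_rules = rules_from({"Bus", "Metro", "BikeShare"})
kept = [(x, y, c) for x, y, c in all_rules if c >= 1]  # minconf = 1
for x, y, c in all_rules:
    print(sorted(x), "->", sorted(y), float(c))
```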
Apriori Algorithm
2. Rule Generation
Pruning (potentially low confidence rules)
For any item-set I = (X ∪ Y): if X → Y is a low-confidence rule,
then any subset of X (let's call it x_i) used as the antecedent,
i.e. x_i → (I − x_i), will also be a low-confidence rule in this item-set!
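The reasoning behind this pruning rule can be written out in one line. It is a sketch that relies only on the fact that a subset is at least as frequent as the set containing it:

```latex
\[
x_i \subseteq X
\;\Rightarrow\;
\mathrm{supp}(x_i) \ge \mathrm{supp}(X)
\;\Rightarrow\;
\mathrm{conf}\bigl(x_i \rightarrow I \setminus x_i\bigr)
= \frac{\mathrm{supp}(I)}{\mathrm{supp}(x_i)}
\le \frac{\mathrm{supp}(I)}{\mathrm{supp}(X)}
= \mathrm{conf}(X \rightarrow Y).
\]
```

In the travel-mode example, {Bus, Metro} → {BikeShare} has confidence 0.5, so {Bus} → {Metro, BikeShare} (0.4) and {Metro} → {Bus, BikeShare} (0.5) cannot exceed 0.5 either.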
Apriori Algorithm
SUMMARY
The Apriori Algorithm uses the simple logical rule
Support (Sub-set) ≥ Support (set) ≥ Support (Super-set) to:
reduce the number of item-sets while generating frequent item-sets;
and
further reduce the number of rules being tested when generating rules
In short:
Calculate support for single-item item-sets
If supp < minsup, remove the item-set and all its supersets
Expand to two-item item-sets and more
FP-Growth Algorithm
FP Growth Algorithm
Frequent Pattern (FP)-Growth (Han et al., 2000)
FP Growth Algorithm
1. FP-Tree Generation
Travel Record   Travel Modes
1               {Bus, Metro}
2               {Bus, Metro}
3               {Bus, Metro, BikeShare}
4               {BikeShare}
5               {Bus, Metro, BikeShare}
6               {Bus, Streetcar}
7               {Bus, BikeShare}
Step 1 – Map the first transaction to the FP-Tree
Transaction 1:
Null
  Bus (1)
    Metro (1)
FP Growth Algorithm
1. FP-Tree Generation
Step 2 – When the same transaction appears again, increment the counts
along its existing path
Transactions 1~2:
Null
  Bus (2)
    Metro (2)
FP Growth Algorithm
1. FP-Tree Generation
Step 3 – When a transaction has a new item, add a new node for it at the
end of the path
Transactions 1~3:
Null
  Bus (3)
    Metro (3)
      BikeShare (1)
FP Growth Algorithm
1. FP-Tree Generation
Step 4 – When an item doesn't succeed the existing path, start a new
branch for it
Transactions 1~4:
Null
  Bus (3)
    Metro (3)
      BikeShare (1)
  BikeShare (1)
FP Growth Algorithm
1. FP-Tree Generation
Similar to step 2
Transactions 1~5:
Null
  Bus (4)
    Metro (4)
      BikeShare (2)
  BikeShare (1)
FP Growth Algorithm
1. FP-Tree Generation
Similar to step 3
Transactions 1~6:
Null
  Bus (5)
    Metro (4)
      BikeShare (2)
    Streetcar (1)
  BikeShare (1)
FP Growth Algorithm
1. FP-Tree Generation
Similar to step 3
Transactions 1~7:
Null
  Bus (6)
    Metro (4)
      BikeShare (2)
    BikeShare (1)
    Streetcar (1)
  BikeShare (1)
FP Growth Algorithm
1. FP-Tree Generation
Step 5 – Stop when all transactions are scanned.
Null
  Bus (6)
    Metro (4)
      BikeShare (2)
    BikeShare (1)
    Streetcar (1)
  BikeShare (1)
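The tree-building steps can be sketched in Python as below. This is a minimal illustration under stated assumptions: `Node`, `build_fp_tree` and `render` are names chosen here, and the items of each transaction are inserted in the order in which the slides list them (Bus, Metro, BikeShare, Streetcar); a full FP-Growth implementation would first sort items by descending frequency.

```python
class Node:
    """One FP-tree node: an item, its count, and its child nodes."""

    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions):
    """Insert each (already ordered) transaction as a path from the root."""
    root = Node(None)
    for transaction in transactions:
        node = root
        for item in transaction:
            child = node.children.get(item)
            if child is None:
                # New item on this path: open a new node (steps 3 and 4).
                child = Node(item, parent=node)
                node.children[item] = child
            child.count += 1  # existing prefix: only the count grows (step 2)
            node = child
    return root


def render(node, depth=0):
    """Return the tree as indented text lines."""
    label = "Null" if node.item is None else f"{node.item} ({node.count})"
    lines = ["  " * depth + label]
    for child in node.children.values():
        lines.extend(render(child, depth + 1))
    return lines


# The seven travel records of the FP-Growth example.
TRANSACTIONS = [
    ["Bus", "Metro"],
    ["Bus", "Metro"],
    ["Bus", "Metro", "BikeShare"],
    ["BikeShare"],
    ["Bus", "Metro", "BikeShare"],
    ["Bus", "Streetcar"],
    ["Bus", "BikeShare"],
]

root = build_fp_tree(TRANSACTIONS)
print("\n".join(render(root)))
```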
FP Growth Algorithm
2. Frequent Item-set Generation
A Bottom-Up Procedure – Start from "leaves"
Step 0 – Prune those leaves which do not have the minimum support required
E.g., if minsup = 1:
Null
  Bus (6)
    Metro (4)
      BikeShare (2)
    BikeShare (1)
    Streetcar (1)
  BikeShare (1)
FP Growth Algorithm
2. Frequent Item-set Generation
Step 1 – Finding the "Conditional Pattern Base" for each item
A Bottom-Up Procedure
Item        Conditional Pattern-base
Bike (4)    {Bus, Metro} (2), {Bus} (1)
Metro (4)   {Bus} (4)
Bus (6)     –
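The conditional pattern base of an item is the set of prefix paths that lead to it in the FP-tree, each with its count. The sketch below takes a shortcut that is an assumption of this example rather than the slide's procedure: instead of walking the tree, it reads the prefixes directly from the ordered transactions, which gives the same counts for this data. Empty prefixes (the item sitting directly under the root) are skipped.

```python
from collections import Counter

# The seven travel records of the FP-Growth example, items in tree order.
TRANSACTIONS = [
    ["Bus", "Metro"],
    ["Bus", "Metro"],
    ["Bus", "Metro", "BikeShare"],
    ["BikeShare"],
    ["Bus", "Metro", "BikeShare"],
    ["Bus", "Streetcar"],
    ["Bus", "BikeShare"],
]


def conditional_pattern_base(item, ordered_transactions):
    """Map each non-empty prefix path that precedes `item` to its count."""
    base = Counter()
    for transaction in ordered_transactions:
        if item in transaction:
            prefix = tuple(transaction[:transaction.index(item)])
            if prefix:
                base[prefix] += 1
    return dict(base)


for item in ["BikeShare", "Metro", "Bus"]:
    print(item, conditional_pattern_base(item, TRANSACTIONS))
```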
FP Growth Algorithm
2. Frequent Item-set Generation
Step 2 – Building “Conditional FP-Tree” for each conditional
pattern-base
FP Growth Algorithm
2. Frequent Item-set Generation
Step 3 – Frequent Pattern generation
FP Growth Algorithm
3. Association Rule Generation
For each frequent pattern, all possible rules are generated and
the rest is the same as in the Apriori algorithm
Selecting a usefulness measure and threshold
Eliminating rules not meeting the usefulness requirement
Frequent Patterns        Rules
<Bike,Bus>(3)            o Bike → Bus
                         o Bus → Bike
<Bus,Metro>(4)           o Bus → Metro
                         o Metro → Bus
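To illustrate the last step, the sketch below scores the four rules by confidence using only counts that appear on the slides: Bus (6), Metro (4), Bike (4), <Bike,Bus>(3) and <Bus,Metro>(4). The dictionary and function names are assumptions made for illustration, and no threshold is applied because the slides do not fix one here.

```python
from fractions import Fraction

# Counts taken from the FP-Growth example in this deck.
ITEM_COUNTS = {"Bus": 6, "Metro": 4, "Bike": 4}
PATTERN_COUNTS = {("Bike", "Bus"): 3, ("Bus", "Metro"): 4}


def pair_rules(pattern_counts, item_counts):
    """Confidence of both rules from each two-item pattern.

    conf(a -> b) = count(a, b) / count(a); the number of transactions
    cancels out, so raw counts can be used instead of supports.
    """
    rules = {}
    for (a, b), n in pattern_counts.items():
        rules[(a, b)] = Fraction(n, item_counts[a])
        rules[(b, a)] = Fraction(n, item_counts[b])
    return rules


rules = pair_rules(PATTERN_COUNTS, ITEM_COUNTS)
for (x, y), conf in rules.items():
    print(f"{x} -> {y}: confidence {float(conf):.2f}")
```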
FP Growth Algorithm
3. Association Rule Generation
IF we had ended up with these frequent patterns (e.g.):
Frequent Patterns
<Bike,Bus>(2); <Bike,Metro>(2); <Bike,Bus,Metro>(2)
<Bus,Metro>(4)
Candidate rules, either from the two-item patterns:
Bus → Bike            Bike → Bus
Metro → Bike          Bike → Metro
Bus → Metro           Metro → Bus
Or from the three-item pattern:
Bus → Metro,Bike      Metro,Bike → Bus
Metro → Bus,Bike      Bus,Bike → Metro
Bike → Bus,Metro      Bus,Metro → Bike
From <Bus,Metro>(4):
Bus → Metro           Metro → Bus
FP Growth Algorithm
Summary
For each item
FP Growth Algorithm
Advantages
"Compresses" the data-set
Much faster than the Apriori algorithm
Once the FP-Tree is in place, it'll be super straightforward to
work with
Disadvantages
The FP-Tree (particularly in the case of big data) may not fit
in memory!!
The FP-Tree is expensive to build!
REMEMBER!
NOT causality!
Week 5 Tutorial
MaaS Solution for City of Oz
Motivated by what you’ve learned in this course, you decided to
create a start-up to offer MaaS solutions in the city of Oz.
Seven mobility services are available in the city, all having
totally separate fare set-ups:
metro, streetcar (LRT), bus, bike-share, e-scooters, car-share (Zapcar®),
and e-hailing (Uber).
Your main question now is:
What “service bundle(s)” should you offer?