
CIVI6731 Week5

The document discusses association rule mining techniques for discovering patterns in data. It describes key concepts like items, itemsets, and rules in association analysis. It also covers metrics like support, confidence, lift and conviction for evaluating rule strength. Popular rule mining algorithms like Apriori and FP-growth are also introduced.

Uploaded by

Fasih Ur Rehman
Copyright
© © All Rights Reserved

2023-02-10

CIVI 6731
BIG DATA ANALYTICS FOR SMART CITIES

Week 5
Association Rule Learning

Learning Objectives

• Describing the concepts of “affinity” and “association analysis”
• Differentiating the concepts of “item”, “itemset”, and “rules” as <antecedent → consequent> pairs
• Metrics for calculating the strength of itemsets and rules:
  o Support, Confidence, Lift, Conviction
• Understanding rule mining algorithms:
  o Apriori algorithm
  o FP-Growth algorithm
• Applying rule mining using RapidMiner (& Python)
• Developing applications for rule mining in urban transportation

CIVI6731 | Mazdak Nik-Bakht | 2023 1



Review – Smart Transportation

Examples (IMPROVE vs. DISRUPT):
• Smart Cards / MaaS: integrated data collection on mobility services
• New carsharing models (e-hailing, free-float carshare, etc.)
• Digitalization of infrastructure asset management
• “Crowdsourcing” asset inspection

Goals:
• Lowering restoring time / downtime
• Lower emission
• Better user experience
• Economic self-sufficiency
• Highest return on investment
• Max (mobility)
• Asset mgmt.

Ideas originated from Finger & Audouin (2017)

Review – Correlation Analysis

• Correlation Analysis

Association Rule Mining (Affinity Analysis)

Main Reference: Kotu & Deshpande

What is Affinity Analysis?

• Correlation analysis cannot easily be applied to categorical variables, particularly when they are not ordinal.
• There are still ways to find out whether categorical variables are related in some way: we simply need to move from correlation to association!

How Association Analysis Works

A branch of unsupervised learning
• Measures the strength of co-occurrence between one item and another
• Discovers hidden patterns in data, in the form of easily recognizable rules

[Figure: a basket as a TRANSACTION; each product in it is an ITEM, and a group of items is an ITEMSET]

How Association Analysis Works

Model outcome: a set of “Rules”
• A rule looks like: {Itemset A} → {Itemset B}
  o {Itemset A} is the antecedent (or premise)
  o {Itemset B} is the consequent (or conclusion)
  o The two itemsets must be DISJOINT

Association Analysis Applications

• Recommender systems
• Real-time cross-selling
• Post-purchase marketing strategies
• Bundle pricing
• Shelf space optimization

Case Study – Winter Road Maintenance in Alberta
Class Activity

Liu et al. (2015) analyzed data collected on weather, travel speed, road condition, and maintenance condition in Alberta to study the performance of road maintenance practices (Case Study #2 for this week).
• They took the travel speed of cars as the performance measure
• They came up with 13 rules
• The rules show affinity between environmental/road conditions AND travel speed

Ref: G. Liu, L. Shi, C. Lan, T. Z. Qiu and J. Fang, "Use of Data Mining Technology to Investigate Vehicle Speed in Winter Weather: A Case Study," in TRB 94th Annual Meeting Compendium of Papers, 2015.

Association Mining in 2+1 Steps

• Step 0: Preparation
  o Input data must be in a specific format (known as transaction format)
• Step 1: Generating “frequent itemsets”
  o Association algorithms limit the analysis to the most frequently occurring items
• Step 2: Generating rules
  o Algorithms generate and filter the rules based on a selected measure (of usefulness for rules)

Strength of Association Rules

Measures are evaluated based on the relative frequency of item-sets in the training dataset:
• Support
• Confidence
• Lift
• Conviction

Support
• Relative frequency of occurrence of an item (or item-set) X in the transactions:
  $\mathrm{Support}(X) = \Pr(X)$
• Support of a rule:
  $\mathrm{Support}(X \rightarrow Y) = \Pr(X, Y)$
• Indicates whether a rule is worth considering
  o Low support → the co-occurrence may be just by chance!
  o Only those rules exceeding the support threshold will be considered for further analysis

Confidence (AKA strength)

• Confidence of a rule (X → Y): the likelihood of occurrence of the consequent (Y) when the antecedent (X) happens:
  $\mathrm{Confidence}(X \rightarrow Y) = \Pr(Y \mid X) = \dfrac{\mathrm{Support}(X \rightarrow Y)}{\mathrm{Support}(X)}$
• A measure of the reliability of the rule

Lift
• Lift of a rule (X → Y): the probability of observing X and Y together, compared to the probability of seeing them together by chance (i.e., if X and Y were completely independent):
  $\mathrm{Lift}(X \rightarrow Y) = \dfrac{\mathrm{Support}(X \rightarrow Y)}{\mathrm{Support}(X)\,\mathrm{Support}(Y)}$
• A measure of how interesting (or surprising) the rule is
  o Lift values close to 1 indicate that X and Y are (nearly) independent, hence NOT an interesting rule

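Conviction
• Conviction of a rule (X → Y): how often does X occur in a transaction where Y does not?
  $\mathrm{Conviction}(X \rightarrow Y) = \dfrac{1 - \mathrm{Support}(Y)}{1 - \mathrm{Confidence}(X \rightarrow Y)}$
  This is the ratio of the expected frequency of having X without Y (if X and Y were independent) to the observed frequency of having X without Y, i.e., how often the rule is violated.
• A measure of the expected error of the rule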
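The four measures above can be sketched in a few lines of Python. This is a minimal illustration, not the course's reference implementation: the function names are hypothetical, and the six travel records are borrowed from the Toronto example used later in these slides.

```python
# Six travel records (sets of travel modes), taken from the example later in the deck.
transactions = [
    {"Bus", "Metro"},
    {"Bus", "Metro"},
    {"BikeShare", "Metro", "Bus"},
    {"Drive"},
    {"BikeShare", "Bus", "Metro"},
    {"Drive", "Bus", "Streetcar"},
]


def support(itemset, transactions):
    """Relative frequency of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(x, y, transactions):
    """Support(X -> Y) / Support(X), i.e. an estimate of Pr(Y | X)."""
    return support(set(x) | set(y), transactions) / support(x, transactions)


def lift(x, y, transactions):
    """Support(X -> Y) / (Support(X) * Support(Y)); close to 1 means independence."""
    return confidence(x, y, transactions) / support(y, transactions)


def conviction(x, y, transactions):
    """(1 - Support(Y)) / (1 - Confidence(X -> Y)); infinite if the rule never fails."""
    conf = confidence(x, y, transactions)
    if conf == 1.0:
        return float("inf")
    return (1 - support(y, transactions)) / (1 - conf)


x, y = {"Bus"}, {"Metro"}
print("support   :", round(support(x | y, transactions), 2))
print("confidence:", round(confidence(x, y, transactions), 2))
print("lift      :", round(lift(x, y, transactions), 2))
print("conviction:", round(conviction(x, y, transactions), 2))
```

For the rule {Bus} → {Metro} this gives a support of 4/6, a confidence of 0.8, a lift of 1.2, and a conviction of about 1.67.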


Any questions so far?


Rule Mining
Generating meaningful, yet interesting, association rules from a transaction dataset:

• Finding all [frequent] item-sets
  o n items can generate $2^n - 1$ item-sets (excluding the “null” item-set)
• Extracting rules from frequent item-sets
  o n items can generate a total of $3^n - 2^{n+1} + 1$ rules! (Tan et al., 2005)
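As a quick sanity check of the two counting formulas, the sketch below compares them against a brute-force enumeration. The function names are hypothetical and the four items are just the public-transit modes from the example in this deck.

```python
from itertools import combinations


def count_itemsets(n):
    """Number of non-empty item-sets over n items: 2^n - 1."""
    return 2 ** n - 1


def count_rules(n):
    """Number of rules X -> Y with X, Y non-empty and disjoint: 3^n - 2^(n+1) + 1."""
    return 3 ** n - 2 ** (n + 1) + 1


def enumerate_rules(items):
    """Brute force: split every item-set of size >= 2 into antecedent and consequent."""
    rules = []
    for k in range(2, len(items) + 1):
        for itemset in combinations(items, k):
            for r in range(1, k):
                for antecedent in combinations(itemset, r):
                    consequent = tuple(i for i in itemset if i not in antecedent)
                    rules.append((antecedent, consequent))
    return rules


items = ["Bus", "Metro", "BikeShare", "Streetcar"]
print(count_itemsets(len(items)))   # 15
print(count_rules(len(items)))      # 50
print(len(enumerate_rules(items)))  # 50
```

Even for four items there are 50 candidate rules, which is why the algorithms that follow prune the search space instead of enumerating it.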


Rule Mining Algorithms

• Apriori algorithm
• FP-Growth algorithm
• …

Algorithm: a specific procedure used to implement a particular data mining technique

Apriori Algorithm

Example
• Records of six complete trips in Toronto, ON (in terms of the travel modes used) are shown in the table below.
  o Can we find any affinity among the adoption of different modes of public transit?

Travel Record   Travel Modes
1               {Bus, Metro}
2               {Bus, Metro}
3               {BikeShare, Metro, Bus}
4               {Drive}
5               {BikeShare, Bus, Metro}
6               {Drive, Bus, Streetcar}

Preparation (transaction format):

Travel Record   Bus   Metro   Bike Share   Streetcar   Drive
1               1     1       0            0           0
2               1     1       0            0           0
3               1     1       1            0           0
4               0     0       0            0           1
5               1     1       1            0           0
6               1     0       0            1           1
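The preparation step above (turning lists of travel modes into a 0/1 table) could be sketched as follows. The helper name `to_transaction_format` is hypothetical, and the column order is passed in explicitly so that it matches the table on the slide.

```python
# The six travel records, as lists of travel modes.
records = [
    ["Bus", "Metro"],
    ["Bus", "Metro"],
    ["BikeShare", "Metro", "Bus"],
    ["Drive"],
    ["BikeShare", "Bus", "Metro"],
    ["Drive", "Bus", "Streetcar"],
]

# Column order chosen to match the slide.
columns = ["Bus", "Metro", "BikeShare", "Streetcar", "Drive"]


def to_transaction_format(records, columns):
    """One row per record; 1 if the record contains the column's item, else 0."""
    return [[1 if item in record else 0 for item in columns] for record in records]


rows = to_transaction_format(records, columns)
print("Record", *columns)
for number, row in enumerate(rows, start=1):
    print(number, *row)
```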

Item-set Tree (Lattice)

• For “public” transit: Bus, Metro, Bike Share, Streetcar

Level 4 (top):  {Bus, Metro, Bike Share, Streetcar}
Level 3:        {Bus, Metro, Bike Share}, {Bus, Metro, Streetcar}, {Bus, Bike Share, Streetcar}, {Metro, Bike Share, Streetcar}
Level 2:        {Bus, Metro}, {Bus, Bike Share}, {Bus, Streetcar}, {Metro, Bike Share}, {Metro, Streetcar}, {Bike Share, Streetcar}
Level 1:        {Bus}, {Metro}, {Bike Share}, {Streetcar}
Level 0:        null

Moving up the lattice leads to supersets; moving down leads to subsets.

Apriori Algorithm
Agrawal & Srikant, 1994
• Simple logic:
  “If an itemset is frequent, then all its subsets will be frequent too!”

  AND, conversely:

  “If an itemset is infrequent, then all its supersets will be infrequent too!”

Apriori Algorithm
“If an itemset is frequent, then all its subsets will be frequent too!”

If {Bus, Metro, Bike Share} is frequent, all of its subsets will be frequent too:
{Bus, Metro}, {Bus, Bike Share}, {Metro, Bike Share}, {Bus}, {Metro}, {Bike Share}

[Figure: item-set lattice with {Bus, Metro, Bike Share} and its subsets highlighted]

Apriori Algorithm
“If an itemset is frequent, then all its subsets will be frequent too!”

If the support threshold (minsup) = 25%:
• Support{Bus, Metro, Bike-share} = 2/6 = 0.33 > minsup → Frequent
• Support{Bus, Metro} = 4/6 = 0.67
• Support{Bus, Bike-share} = 2/6 = 0.33
• Support{Metro, Bike-share} = 2/6 = 0.33
• Support{Bus} = 5/6 = 0.83
• Support{Metro} = 4/6 = 0.67
• Support{Bike-share} = 2/6 = 0.33
All above the threshold!

Travel Record   Bus   Metro   Bike Share   Streetcar
1               1     1       0            0
2               1     1       0            0
3               1     1       1            0
4               0     0       0            0
5               1     1       1            0
6               1     0       0            1

Apriori Algorithm
“If an itemset is infrequent, then all its supersets will be infrequent too!”

If {Streetcar} is infrequent, all of its supersets will be infrequent too:
{Bus, Streetcar}, {Metro, Streetcar}, {Bike Share, Streetcar}, {Bus, Metro, Streetcar}, {Bus, Bike Share, Streetcar}, {Metro, Bike Share, Streetcar}, {Bus, Metro, Bike Share, Streetcar}

[Figure: item-set lattice with {Streetcar} and its supersets highlighted]

Apriori Algorithm
“If an itemset is infrequent, then all its supersets will be infrequent too!”

If the support threshold (minsup) = 25%:
• Support{Streetcar} = 1/6 = 0.17 < minsup → Infrequent
• Support{Bus, Streetcar} = 1/6 = 0.17
• Support{Metro, Streetcar} = 0
• Support{Bike-share, Streetcar} = 0
• Support{Bus, Metro, Streetcar} = 0
• …
• Support{Bus, Metro, Bike-share, Streetcar} = 0
All below the threshold!

Travel Record   Bus   Metro   Bike Share   Streetcar
1               1     1       0            0
2               1     1       0            0
3               1     1       1            0
4               0     0       0            0
5               1     1       1            0
6               1     0       0            1

Apriori Algorithm
1. Frequent Item-set Generation
• Assuming minsup = 25%:
• Start from the bottom of the item-set lattice
  o Calculate the support count and support for all item-sets
  o Eliminate any item-set with support < minsup (AND subsequently its supersets will be eliminated!)
• Start generating item-sets of 2 (and more) items from the remaining items
• Continue until all item-sets meeting the threshold requirement are generated

Travel Record   Bus   Metro   Bike Share   Streetcar
1               1     1       0            0
2               1     1       0            0
3               1     1       1            0
4               0     0       0            0
5               1     1       1            0
6               1     0       0            1

Apriori Algorithm
1. Frequent Item-set Generation

7 item-sets remain (instead of $2^4 - 1 = 15$):

Item-set                     Support
{Bus, Metro, Bike Share}     0.33
{Bus, Metro}                 0.67
{Bus, Bike Share}            0.33
{Metro, Bike Share}          0.33
{Bus}                        0.83
{Metro}                      0.67
{Bike Share}                 0.33
{Streetcar}                  0.17 (below minsup; eliminated together with all its supersets)
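The level-wise search described above can be sketched in plain Python. This is a simplified illustration under stated assumptions (the function name is hypothetical, and record 4 is modelled as an empty set because it used no public-transit mode), not an optimized Apriori implementation.

```python
from itertools import combinations

# Public-transit items only; record 4 ({Drive}) contains none of them.
transactions = [
    {"Bus", "Metro"},
    {"Bus", "Metro"},
    {"BikeShare", "Metro", "Bus"},
    set(),
    {"BikeShare", "Bus", "Metro"},
    {"Bus", "Streetcar"},
]


def apriori_frequent_itemsets(transactions, minsup):
    """Return {itemset: support} for all item-sets with support >= minsup."""
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    items = sorted({item for t in transactions for item in t})
    candidates = [frozenset([item]) for item in items]  # bottom of the lattice
    frequent = {}
    k = 1
    while candidates:
        # Keep only the candidates that meet the support threshold.
        level = {}
        for candidate in candidates:
            s = support(candidate)
            if s >= minsup:
                level[candidate] = s
        frequent.update(level)
        k += 1
        # Join frequent (k-1)-item-sets into k-item-set candidates ...
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # ... and drop any candidate that has an infrequent (k-1)-subset.
        candidates = [
            c for c in joined
            if all(frozenset(sub) in level for sub in combinations(c, k - 1))
        ]
    return frequent


frequent = apriori_frequent_itemsets(transactions, minsup=0.25)
for itemset, s in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), round(s, 2))
```

With minsup = 25% the sketch finds the same 7 frequent item-sets as the slide; {Streetcar} is dropped at the first level, so none of its supersets is ever generated.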

Apriori Algorithm
2. Rule Generation
We generate ALL possible combinations of the frequent itemsets (as antecedent and consequent)!
• Select a measure of “usefulness” for rules (and a threshold)
  o Support, Confidence, Lift, Conviction, etc.
• For each frequent itemset, generate the possible rules
  o An item-set of n items can generate $2^n - 2$ rules
  o The number can be reduced by applying some “pruning”
• Eliminate those rules which do not satisfy the usefulness measure

Apriori Algorithm
2. Rule Generation
• (E.g.) for the item-set {Bus, Metro, Bike Share}:
  o Select a measure of “usefulness” for rules
  o Generate the $2^n - 2$ possible rules (n = 3 → $2^3 - 2 = 6$)
  o Eliminate rules with confidence < 1 (minconf) [CONSERVATIVE!]

Rule                              Confidence
{Bus, Metro} → {BikeShare}        0.33/0.67 = 0.5
{Bus, BikeShare} → {Metro}        0.33/0.33 = 1.0
{Metro, BikeShare} → {Bus}        0.33/0.33 = 1.0
{Bus} → {Metro, BikeShare}        0.33/0.83 = 0.4
{Metro} → {Bus, BikeShare}        0.33/0.67 = 0.5
{BikeShare} → {Bus, Metro}        0.33/0.33 = 1.0
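The six rules and their confidences can be reproduced with a short sketch. The function name is hypothetical; confidence is computed from support counts (rather than rounded supports), and the code assumes the item-set is frequent so that no antecedent has a zero count.

```python
from itertools import combinations

# Public-transit items only; record 4 ({Drive}) contains none of them.
transactions = [
    {"Bus", "Metro"},
    {"Bus", "Metro"},
    {"BikeShare", "Metro", "Bus"},
    set(),
    {"BikeShare", "Bus", "Metro"},
    {"Bus", "Streetcar"},
]


def rules_from_itemset(itemset, transactions, minconf):
    """Generate all 2^n - 2 rules of one item-set and keep those with confidence >= minconf."""
    itemset = frozenset(itemset)

    def count(s):
        return sum(1 for t in transactions if s <= t)

    all_rules = []
    for r in range(1, len(itemset)):
        for antecedent in combinations(sorted(itemset), r):
            antecedent = frozenset(antecedent)
            consequent = itemset - antecedent
            conf = count(itemset) / count(antecedent)  # assumes count(antecedent) > 0
            all_rules.append((antecedent, consequent, conf))
    kept = [rule for rule in all_rules if rule[2] >= minconf]
    return all_rules, kept


all_rules, kept = rules_from_itemset({"Bus", "Metro", "BikeShare"}, transactions, minconf=1.0)
for antecedent, consequent, conf in all_rules:
    flag = "keep" if conf >= 1.0 else "drop"
    print(sorted(antecedent), "->", sorted(consequent), round(conf, 2), flag)
```

Only the three rules with a confidence of 1.0 survive the (conservative) threshold.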

Apriori Algorithm
2. Rule Generation
Pruning (potentially low-confidence rules)
• For any item-set I = (X ∪ Y): if X → Y is a low-confidence rule, then any subset of X (let's call it x_i) used as an antecedent will also create a low-confidence rule in this item-set!
• E.g., for item-set I = {Bus, Metro, Bike Share}, with x_1 = Bus and x_2 = Metro, from:
  o Confidence({Bus, Metro} → {BikeShare}) = 0.5 < 1
  one can conclude that:
  o {Bus} → {Metro, BikeShare}; and
  o {Metro} → {Bus, BikeShare}
  will also be of low confidence and can be discarded!
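One way to exploit this pruning rule is to start from the largest antecedents and only shrink an antecedent when every larger antecedent containing it passed the threshold. The sketch below follows that idea; the function name and the `evaluated` counter are illustrative assumptions, not part of the slides.

```python
# Public-transit items only; record 4 ({Drive}) contains none of them.
transactions = [
    {"Bus", "Metro"},
    {"Bus", "Metro"},
    {"BikeShare", "Metro", "Bus"},
    set(),
    {"BikeShare", "Bus", "Metro"},
    {"Bus", "Streetcar"},
]


def generate_rules_with_pruning(itemset, transactions, minconf):
    """Rule generation for one frequent item-set with confidence-based pruning."""
    itemset = frozenset(itemset)

    def count(s):
        return sum(1 for t in transactions if s <= t)

    kept = []
    evaluated = 0
    # Start with the largest antecedents (all items but one).
    level = {itemset - {item} for item in itemset}
    while level:
        survivors = set()
        for antecedent in level:
            evaluated += 1
            conf = count(itemset) / count(antecedent)
            if conf >= minconf:
                kept.append((antecedent, itemset - antecedent, conf))
                survivors.add(antecedent)
        # A smaller antecedent is only worth testing if every antecedent
        # one item larger (within this item-set) passed the threshold.
        next_level = set()
        for antecedent in survivors:
            for item in antecedent:
                smaller = antecedent - {item}
                if smaller and all(
                    (smaller | {other}) in survivors for other in itemset - smaller
                ):
                    next_level.add(smaller)
        level = next_level
    return kept, evaluated


kept, evaluated = generate_rules_with_pruning(
    {"Bus", "Metro", "BikeShare"}, transactions, minconf=1.0
)
print(len(kept), "rules kept after evaluating", evaluated, "of 6 candidates")
```

Because {Bus, Metro} → {BikeShare} fails the threshold, the rules with antecedents {Bus} and {Metro} are never evaluated: 4 confidence computations instead of 6.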


Apriori Algorithm
SUMMARY
• The Apriori algorithm uses the simple logical rule:
  Support(subset) ≥ Support(set) ≥ Support(superset) to:
  o reduce the number of item-sets while generating frequent item-sets; and
  o further reduce the number of rules being tested when generating rules
• In simple terms:
  o Calculate the support for single-item itemsets
  o If supp < minsup, remove the itemset and all its supersets
  o Expand to two-item itemsets and more

Any questions so far?


FP-Growth Algorithm

• Frequent Pattern (FP)-Growth (Han et al., 2000)
• Compresses the transaction records using a special graph data structure called the FP-TREE
  o FP-Tree: a transformation of the dataset into a graph format
• The algorithm works by:
  o Generating the FP-Tree
  o Using the FP-Tree to generate frequent item-sets

FP Growth Algorithm
7 transactions for 4 items

Travel Record   Travel Modes (itemsets)
1               {Bus, Metro}
2               {Metro, Bus}
3               {BikeShare, Metro, Bus}
4               {BikeShare}
5               {BikeShare, Bus, Metro}
6               {Streetcar, Bus}
7               {BikeShare, Bus}

0. Dataset Re-formatting
• Sort the items in each transaction in descending “support count” order

Travel Record   Bus   Metro   Bike Share   Streetcar
1               1     1       0            0
2               1     1       0            0
3               1     1       1            0
4               0     0       1            0
5               1     1       1            0
6               1     0       0            1
7               1     0       1            0
Support Count   6     4       4            1      (DESCENDING)

Travel Record   Travel Modes (sorted)
1               {Bus, Metro}
2               {Bus, Metro}
3               {Bus, Metro, BikeShare}
4               {BikeShare}
5               {Bus, Metro, BikeShare}
6               {Bus, Streetcar}
7               {Bus, BikeShare}

FP Growth Algorithm
1. FP-Tree Generation
• Step 1 – Map the first transaction to the FP-Tree

Travel Record   Travel Modes
1               {Bus, Metro}
2               {Bus, Metro}
3               {Bus, Metro, BikeShare}
4               {BikeShare}
5               {Bus, Metro, BikeShare}
6               {Bus, Streetcar}
7               {Bus, BikeShare}

FP-Tree after transaction 1:
Null
  Bus (1)
    Metro (1)

FP Growth Algorithm
1. FP-Tree Generation
• Step 2 – When the same transaction appears again, simply add to the counts

FP-Tree after transactions 1 & 2:
Null
  Bus (2)
    Metro (2)

FP Growth Algorithm
1. FP-Tree Generation
• Step 3 – When a transaction has a new item, extend the tree

FP-Tree after transactions 1, 2, 3:
Null
  Bus (3)
    Metro (3)
      BikeShare (1)

FP Growth Algorithm
1. FP-Tree Generation
• Step 4 – When an item does not follow the items already in the tree (no shared prefix), create a direct path from Null

FP-Tree after transactions 1~4:
Null
  Bus (3)
    Metro (3)
      BikeShare (1)
  BikeShare (1)

FP Growth Algorithm
1. FP-Tree Generation
• Similar to Step 2

FP-Tree after transactions 1~5:
Null
  Bus (4)
    Metro (4)
      BikeShare (2)
  BikeShare (1)

FP Growth Algorithm
1. FP-Tree Generation
• Similar to Step 3

FP-Tree after transactions 1~6:
Null
  Bus (5)
    Metro (4)
      BikeShare (2)
    Streetcar (1)
  BikeShare (1)

FP Growth Algorithm
1. FP-Tree Generation
• Similar to Step 3

FP-Tree after transactions 1~7:
Null
  Bus (6)
    Metro (4)
      BikeShare (2)
    BikeShare (1)
    Streetcar (1)
  BikeShare (1)

FP Growth Algorithm
1. FP-Tree Generation
• Step 5 – Stop when all transactions are scanned.

Compact FP-Tree (full transactions):
Null
  Bus (6)
    Metro (4)
      BikeShare (2)
    BikeShare (1)
    Streetcar (1)
  BikeShare (1)
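The tree-building steps above can be sketched as follows. This is a minimal illustration: the class and function names are hypothetical, and ties in support count (Metro and BikeShare both have 4) are broken by order of first appearance in the data, which is an assumption made here so that the item order matches the slides.

```python
from collections import Counter

# The seven travel records in their original (unsorted) form.
transactions = [
    ["Bus", "Metro"],
    ["Metro", "Bus"],
    ["BikeShare", "Metro", "Bus"],
    ["BikeShare"],
    ["BikeShare", "Bus", "Metro"],
    ["Streetcar", "Bus"],
    ["BikeShare", "Bus"],
]


class FPNode:
    """One node of the FP-Tree: an item, its count, and its children."""

    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions):
    counts = Counter(item for t in transactions for item in t)
    first_seen = {}
    for t in transactions:
        for item in t:
            first_seen.setdefault(item, len(first_seen))

    def sort_key(item):
        # Descending support count; ties broken by first appearance (assumption).
        return (-counts[item], first_seen[item])

    root = FPNode(None)
    for t in transactions:
        node = root
        for item in sorted(t, key=sort_key):
            child = node.children.get(item)
            if child is None:  # new item on this path: extend the tree
                child = FPNode(item, parent=node)
                node.children[item] = child
            child.count += 1   # existing path: simply add to the count
            node = child
    return root


def render(node, depth=0):
    """Return the tree as indented text lines."""
    label = "Null" if node.item is None else f"{node.item} ({node.count})"
    lines = ["  " * depth + label]
    for child in node.children.values():
        lines.extend(render(child, depth + 1))
    return lines


root = build_fp_tree(transactions)
print("\n".join(render(root)))
```

The printed tree has the same nodes and counts as the compact FP-Tree on the slide; only the order in which sibling branches are listed may differ.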

FP Growth Algorithm
2. Frequent Item-set Generation
A bottom-up procedure: start from the “leaves”
• Step 0 – Prune those leaves which do not have the minimum support required
  o E.g., if minsup = 1:
  o Why was BikeShare not pruned?!

Null
  Bus (6)
    Metro (4)
      BikeShare (2)
    BikeShare (1)
    Streetcar (1)
  BikeShare (1)

FP Growth Algorithm
2. Frequent Item-set Generation
• Step 1 – Finding the “Conditional Pattern Base” for each item
  o A bottom-up procedure

Item        Conditional Pattern-base
Bike (4)    {Bus,Metro}(2), {Bus}(1)
Metro (4)   {Bus}(4)
Bus (6)     –

[Figure: prefix paths in the FP-Tree leading to each item]

FP Growth Algorithm
2. Frequent Item-set Generation
• Step 2 – Building a “Conditional FP-Tree” for each conditional pattern-base

Item        Conditional Pattern-base      Conditional FP-Tree
Bike (4)    {Bus,Metro}(2), {Bus}(1)      {Bus}(3)
Metro (4)   {Bus}(4)                      {Bus}(4)
Bus (6)     –                             –

FP Growth Algorithm
2. Frequent Item-set Generation
• Step 3 – Frequent pattern generation

Item        Conditional Pattern-base      Conditional FP-Tree   Frequent Patterns
Bike (4)    {Bus,Metro}(2), {Bus}(1)      {Bus}(3)              <Bike,Bus>(3)
Metro (4)   {Bus}(4)                      {Bus}(4)              <Bus,Metro>(4)
Bus (6)     –                             –                     –
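The table above can be reproduced with a simplified sketch. Two assumptions are made here and are not from the slides: the prefix paths are read directly from the sorted transactions (which yields the same prefixes as walking up the FP-Tree), and a minimum support count of 3 is used inside the conditional step, chosen only because it reproduces the {Bus}(3) entry. The function names are hypothetical.

```python
from collections import Counter

# Transactions with items already sorted in descending support-count order.
ordered_transactions = [
    ["Bus", "Metro"],
    ["Bus", "Metro"],
    ["Bus", "Metro", "BikeShare"],
    ["BikeShare"],
    ["Bus", "Metro", "BikeShare"],
    ["Bus", "Streetcar"],
    ["Bus", "BikeShare"],
]


def conditional_pattern_base(item, ordered_transactions):
    """Count the (non-empty) prefixes that precede `item` in each transaction."""
    base = Counter()
    for t in ordered_transactions:
        if item in t:
            prefix = tuple(t[: t.index(item)])
            if prefix:
                base[prefix] += 1
    return base


def conditional_item_counts(base, min_count):
    """Item counts within a pattern base, keeping items with count >= min_count."""
    counts = Counter()
    for prefix, n in base.items():
        for item in prefix:
            counts[item] += n
    return {item: n for item, n in counts.items() if n >= min_count}


MIN_COUNT = 3  # assumption, chosen to match the table on the slide

for item in ["BikeShare", "Metro", "Bus"]:
    base = conditional_pattern_base(item, ordered_transactions)
    kept = conditional_item_counts(base, MIN_COUNT)
    patterns = {(item, other): n for other, n in kept.items()}
    print(item, dict(base), kept, patterns)
```

For BikeShare the pattern base is {Bus, Metro}(2) and {Bus}(1); Metro only reaches a count of 2 and is dropped, leaving {Bus}(3) and the frequent pattern <Bike,Bus>(3).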

FP Growth Algorithm
3. Association Rule Generation
• For each frequent pattern, all possible rules are generated; the rest is the same as in the Apriori algorithm:
  o Selecting a usefulness measure and threshold
  o Eliminating rules not meeting the usefulness requirement

Frequent Patterns    Candidate rules
<Bike,Bus>(3)        Bike → Bus;  Bus → Bike
<Bus,Metro>(4)       Bus → Metro;  Metro → Bus

FP Growth Algorithm
3. Association Rule Generation
IF we had ended up with these frequent patterns (e.g.):
<Bike,Bus>(2); <Bike,Metro>(2); <Bike,Bus,Metro>(2); <Bus,Metro>(4)

the candidate rules would be (for each pattern, either direction is a candidate):

Frequent Pattern     Candidate rules
<Bike,Bus>           Bus → Bike;  Bike → Bus
<Bike,Metro>         Metro → Bike;  Bike → Metro
<Bus,Metro>          Bus → Metro;  Metro → Bus
<Bike,Bus,Metro>     Bus → Metro,Bike;  Metro,Bike → Bus;  Metro → Bus,Bike;  Bus,Bike → Metro;  Bike → Bus,Metro;  Bus,Metro → Bike

FP Growth Algorithm
Summary

1. Transaction DB → scan; calculate the items' support counts; order the items descendingly by support count
2. → FP-Tree
3. → (bottom-up, for each item) Conditional Pattern Base
4. → (for each item) Conditional FP-Tree
5. → combine with the item, in all possible combinations → Frequent Patterns
6. → all possible combinations for antecedent → consequent → Rules

FP Growth Algorithm

Advantages
• “Compresses” the dataset
• Typically much faster than the Apriori algorithm (no candidate item-sets are generated)
• Once the FP-Tree is in place, it is very straightforward to work with

Disadvantages
• The FP-Tree (particularly in the case of big data) may not fit in memory!
• The FP-Tree is expensive to build!

REMEMBER!

The rules only mean “co-occurrence” & NOT causality!

Dealing with Polynomial Data

Data must be in transaction format (Boolean) for association analysis
• “Dummy coding” (one common solution)
  o Converting categorical (polynomial) attributes to Boolean (binomial) ones
  o For numerical attributes: we first convert them to categorical (discretizing) by “binning”, and then to Boolean
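Binning followed by dummy coding can be sketched as below. The records, attribute names, bin edges, and labels are all made up for illustration; they are not data from the course or from the Alberta case study.

```python
# Hypothetical observations: one categorical and one numerical attribute.
observations = [
    {"road": "dry", "speed_kmh": 98},
    {"road": "snow", "speed_kmh": 62},
    {"road": "ice", "speed_kmh": 45},
]


def bin_value(value, edges, labels):
    """Discretize a number: the label of the first edge it falls below, else the last label."""
    for edge, label in zip(edges, labels):
        if value < edge:
            return label
    return labels[-1]


def dummy_code(records):
    """One Boolean column per attribute=value pair (transaction format)."""
    columns = sorted({f"{key}={value}" for r in records for key, value in r.items()})
    rows = []
    for r in records:
        present = {f"{key}={value}" for key, value in r.items()}
        rows.append([1 if column in present else 0 for column in columns])
    return columns, rows


# Step 1: numerical -> categorical ("binning"); edges and labels are assumptions.
categorical = [
    {
        "road": o["road"],
        "speed": bin_value(o["speed_kmh"], edges=[60, 90], labels=["slow", "medium", "fast"]),
    }
    for o in observations
]

# Step 2: categorical -> Boolean ("dummy coding").
columns, rows = dummy_code(categorical)
print(columns)
for row in rows:
    print(row)
```

Each row is now a transaction in which every attribute=value pair acts as an item, so the support and rule measures from earlier in the deck apply directly.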

Dealing with Polynomial Data Example


Week 5 Tutorial
MaaS Solution for City of Oz
Motivated by what you've learned in this course, you decide to create a start-up to offer MaaS solutions in the city of Oz.
• Seven mobility services are available in the city:
  o metro, streetcar (LRT), bus, bike-share, e-scooters, car-share (Zapcar®), and e-hailing (Uber)
  o each having a totally separate fare set-up
• Your main question now is:
  o What “service bundle(s)” should you offer?

What do you think?
