RDataMining Slides: Association Rules
Yanchang Zhao
http://www.RDataMining.com
July 2019
∗ Chapter 9: Association Rules, in R and Data Mining: Examples and Case Studies. http://www.rdatamining.com/docs/RDataMining-book.pdf
1 / 68
Contents
Association Rules: Concept and Algorithms
Basics of Association Rules
Algorithms: Apriori, ECLAT and FP-growth
Interestingness Measures
Applications
Exercise
2 / 68
Association Rules
I To discover association rules showing itemsets that occur
together frequently [Agrawal et al., 1993].
I Widely used to analyze retail basket or transaction data.
I An association rule is of the form A ⇒ B, where A and B are
itemsets or attribute-value pair sets and A ∩ B = ∅.
I A: antecedent, left-hand-side or LHS
I B: consequent, right-hand-side or RHS
I The rule means that database tuples containing the items on
the left-hand side of the rule are also likely to contain the
items on the right-hand side.
I Examples of association rules:
I bread ⇒ butter
I computer ⇒ software
I age in [25,35] & income in [80K,120K] ⇒ buying up-to-date
mobile handsets
3 / 68
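Rules are ranked with three standard measures: support(A ⇒ B) = P(A ∪ B), confidence = support(A ∪ B) / support(A), and lift = confidence / support(B). A base-R toy computation of these measures (an illustration added here, not the slides' code):

```r
## toy transaction database: each element is one shopping basket
transactions <- list(
  c("bread", "butter", "milk"),
  c("bread", "butter"),
  c("bread", "milk"),
  c("butter"),
  c("bread", "butter", "jam")
)
n <- length(transactions)

## support of an itemset = fraction of transactions containing all items
supp <- function(items) {
  mean(sapply(transactions, function(t) all(items %in% t)))
}

## measures for the rule {bread} => {butter}
supp.rule <- supp(c("bread", "butter"))  # P(A and B) = 3/5 = 0.6
conf.rule <- supp.rule / supp("bread")   # 0.6 / 0.8 = 0.75
lift.rule <- conf.rule / supp("butter")  # 0.75 / 0.8 = 0.9375
```

A lift below 1, as here, indicates that bread buyers are actually slightly less likely than average to buy butter.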
Association Rules
4 / 68
An Example
5 / 68
Association Rule Mining
6 / 68
Downward-Closure Property
7 / 68
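The downward-closure (Apriori) property says that support is anti-monotone: every subset of a frequent itemset is itself frequent, which is what lets Apriori prune the search space. A base-R toy check (an added illustration, not the slides' code):

```r
## toy transactions
transactions <- list(
  c("a", "b", "c"),
  c("a", "b"),
  c("a", "c"),
  c("b", "c"),
  c("a", "b", "c")
)

## support = fraction of transactions containing all given items
supp <- function(items) {
  mean(sapply(transactions, function(t) all(items %in% t)))
}

s.abc <- supp(c("a", "b", "c"))  # 2/5 = 0.4
## every proper subset must have support >= support of the superset
subsets <- list(c("a", "b"), c("a", "c"), c("b", "c"), "a", "b", "c")
all(sapply(subsets, supp) >= s.abc)  # TRUE
```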
Itemset Lattice
Frequent
Infrequent
8 / 68
Apriori
9 / 68
Apriori Process
10 / 68
From [?] 11 / 68
FP-growth
† https://en.wikibooks.org/wiki/Data_Mining_Algorithms_In_R/Frequent_Pattern_Mining/The_FP-Growth_Algorithm
12 / 68
FP-tree
I The frequent-pattern tree (FP-tree) is a compact structure
that stores quantitative information about frequent patterns in
a dataset. It has two components:
I A root labeled as “null” with a set of item-prefix subtrees as
children
I A frequent-item header table
I Each node has three attributes:
I Item name
I Count: number of transactions represented by the path from
root to the node
I Node link: links to the next node having the same item name
I Each entry in the frequent-item header table also has three
attributes:
I Item name
I Head of node link: points to the first node in the FP-tree
having the same item name
I Count: frequency of the item
13 / 68
FP-tree
14 / 68
The FP-growth Algorithm
15 / 68
The FP-growth Algorithm
I Recursive processing of this compressed version of the main
dataset grows frequent itemsets directly, instead of generating
candidate itemsets and testing them against the entire database.
I Growth starts from the bottom of the header table (the items
with the longest branches), by finding all instances matching
a given condition.
I A new tree is created, with counts projected from the original
tree corresponding to the set of instances that are conditional
on the attribute, and with each node getting the sum of its
children's counts.
I Recursive growth ends when no individual items conditional
on the attribute meet the minimum support threshold;
processing then continues on the remaining header items of
the original FP-tree.
I Once the recursive process has completed, all frequent itemsets
meeting the minimum support have been found, and association
rule creation begins.
16 / 68
ECLAT
17 / 68
ECLAT
I ECLAT works recursively.
I The initial call uses all single items together with their tidsets.
I In each recursive call, it combines each itemset-tidset pair
(X , t(X )) with all the other pairs to generate new candidates.
If a new candidate is frequent, it is added to the set PX .
I It then recursively finds all frequent itemsets in the X branch.
18 / 68
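The tidset idea at ECLAT's core can be sketched in a few lines of base R (a toy illustration added here, not the slides' code):

```r
## ECLAT represents each item by its tidset: the IDs of the transactions
## containing it. The tidset of a candidate itemset is the intersection
## of its items' tidsets, so support is computed without rescanning the
## database. Toy illustration with 5 transactions:
tidsets <- list(
  bread  = c(1, 2, 3, 5),
  butter = c(1, 2, 4, 5),
  milk   = c(1, 3)
)
n.trans <- 5

## candidate {bread, butter}: intersect the two tidsets
t.bread.butter <- intersect(tidsets$bread, tidsets$butter)  # 1 2 5
supp.bread.butter <- length(t.bread.butter) / n.trans       # 0.6
```

In R, the arules package also provides an `eclat()` function that mines frequent itemsets this way, from which rules can be induced with `ruleInduction()`.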
ECLAT
From [?]
19 / 68
Interestingness Measures
20 / 68
Objective Interestingness Measures
I Support, confidence and lift are the most widely used
objective measures for selecting interesting rules.
I Many other objective measures were introduced by Tan et al.
[Tan et al., 2002], such as φ-coefficient, odds ratio, kappa,
mutual information, J-measure, Gini index, Laplace,
conviction, interest and cosine.
I Different measures have different intrinsic properties, and no
single measure is better than the others in all application
domains.
I In addition, any-confidence, all-confidence and bond are
proposed by Omiecinski [Omiecinski, 2003].
I Utility is used by Chan et al. [Chan et al., 2003] to find top-k
objective-directed rules.
I Unexpected Confidence Interestingness and Isolated
Interestingness are designed by Dong and Li
[Dong and Li, 1998], considering a rule's unexpectedness in
terms of the other association rules in its neighbourhood.
21 / 68
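Many of these measures are available in R via `arules::interestMeasure()`. A sketch, assuming a mined rule set `rules` and the transaction data `trans` it was mined from (the measure names below are those used in recent arules versions):

```r
library(arules)

## compute additional objective measures for an existing rule set;
## `rules` and `trans` are assumed to exist already
extra <- interestMeasure(rules,
                         measure = c("oddsRatio", "phi", "gini",
                                     "conviction", "cosine"),
                         transactions = trans)

## attach them to the rules' quality slot so inspect() and sort() see them
quality(rules) <- cbind(quality(rules), extra)
```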
Subjective Interestingness Measures
22 / 68
Interestingness Measures - I
23 / 68
Interestingness Measures - II
24 / 68
Applications
I Market basket analysis
I Identifying associations between items in shopping baskets,
i.e., which items are frequently purchased together
I Can be used by retailers to understand customer shopping
habits, do selective marketing and plan shelf space
I Churn analysis and selective marketing
I Discovering demographic characteristics and behaviours of
customers who are likely/unlikely to switch to other telcos
I Identifying customer groups who are likely to purchase a new
service or product
I Credit card risk analysis
I Finding characteristics of customers who are likely to default
on credit card or mortgage
I Can be used by banks to reduce risks when assessing new
credit card or mortgage applications
25 / 68
Applications (cont.)
26 / 68
Contents
Association Rules: Concept and Algorithms
Basics of Association Rules
Algorithms: Apriori, ECLAT and FP-growth
Interestingness Measures
Applications
Exercise
27 / 68
Association Rule Mining Algorithms in R
28 / 68
The Titanic Dataset
29 / 68
Pipe Operations in R
30 / 68
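The code on the following slides uses the `%>%` pipe from the magrittr package, which passes the value on its left into the first argument of the call on its right. A minimal sketch of the equivalence:

```r
library(magrittr)  # provides %>%

x <- c(1, 4, 9, 16)

## nested call, read inside out
sum(sqrt(x))           # 10

## piped call, read left to right; same result
x %>% sqrt() %>% sum() # 10
```

Pipes mainly improve readability when chaining several data-processing steps, as on the slides below.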
## download data
download.file(url="http://www.rdatamining.com/data/titanic.raw.rdata",
destfile="./data/titanic.raw.rdata")
## structure of data
titanic.raw %>% str()
## 'data.frame': 2201 obs. of 4 variables:
## $ Class : Factor w/ 4 levels "1st","2nd","3rd",..: 3 3 3...
## $ Sex : Factor w/ 2 levels "Female","Male": 2 2 2 2 2 ...
## $ Age : Factor w/ 2 levels "Adult","Child": 2 2 2 2 2 ...
## $ Survived: Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1...
31 / 68
## draw a random sample of 5 records
idx <- 1:nrow(titanic.raw) %>% sample(5)
titanic.raw[idx, ]
## Class Sex Age Survived
## 2080 2nd Female Adult Yes
## 1162 Crew Male Adult No
## 954 Crew Male Adult No
## 2172 3rd Female Adult Yes
## 456 3rd Male Adult No
32 / 68
Function apriori()
33 / 68
## mine association rules
library(arules) ## load required library
rules.all <- titanic.raw %>% apriori() ## run the APRIORI algorithm
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime
## 0.8 0.1 1 none FALSE TRUE 5
## support minlen maxlen target ext
## 0.1 1 10 rules FALSE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 220
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[10 item(s), 2201 transaction(s)] done ...
## sorting and recoding items ... [9 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 done [0.00s].
## writing ... [27 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
34 / 68
rules.all %>% length() ## number of rules discovered
## [1] 27
## run APRIORI again to find rules with rhs containing "Survived" only
rules.surv <- titanic.raw %>% apriori(
control = list(verbose=F),
parameter = list(minlen=2, supp=0.005, conf=0.8),
appearance = list(rhs=c("Survived=No",
"Survived=Yes"),
default="lhs"))
## keep three decimal places
quality(rules.surv) <- rules.surv %>% quality() %>% round(digits=3)
## sort rules by lift
rules.surv.sorted <- rules.surv %>% sort(by="lift")
36 / 68
rules.surv.sorted %>% inspect() ## print rules
## lhs rhs support confidence lif...
## [1] {Class=2nd, ...
## Age=Child} => {Survived=Yes} 0.011 1.000 3.09...
## [2] {Class=2nd, ...
## Sex=Female, ...
## Age=Child} => {Survived=Yes} 0.006 1.000 3.09...
## [3] {Class=1st, ...
## Sex=Female} => {Survived=Yes} 0.064 0.972 3.01...
## [4] {Class=1st, ...
## Sex=Female, ...
## Age=Adult} => {Survived=Yes} 0.064 0.972 3.01...
## [5] {Class=2nd, ...
## Sex=Female} => {Survived=Yes} 0.042 0.877 2.71...
## [6] {Class=Crew, ...
## Sex=Female} => {Survived=Yes} 0.009 0.870 2.69...
## [7] {Class=Crew, ...
## Sex=Female, ...
## Age=Adult} => {Survived=Yes} 0.009 0.870 2.69...
## [8] {Class=2nd, ...
## Sex=Female, ...
## Age=Adult} => {Survived=Yes} 0.036 0.860 2.66...
## [9] {Class=2nd, ...
37 / 68
Redundant Rules
38 / 68
Redundant Rules
## redundant rules
rules.surv.sorted[1:2] %>% inspect()
## lhs rhs support confidence lift...
## [1] {Class=2nd, ...
## Age=Child} => {Survived=Yes} 0.011 1 3.096...
## [2] {Class=2nd, ...
## Sex=Female, ...
## Age=Child} => {Survived=Yes} 0.006 1 3.096...
40 / 68
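The next slide inspects `rules.surv.pruned`, but the pruning step itself is missing from this extracted deck. A sketch of one standard way to build it with `arules::is.redundant()` (the original slides may have used a subset-matrix approach via `is.subset()` instead):

```r
## a rule is redundant if a more general rule (same RHS, with an LHS
## that is a subset) has equal or higher confidence
redundant <- rules.surv.sorted %>% is.redundant()

## keep only the non-redundant rules
rules.surv.pruned <- rules.surv.sorted[!redundant]
```

For example, rule [2] above ({Class=2nd, Sex=Female, Age=Child}) adds nothing over rule [1] ({Class=2nd, Age=Child}), which has the same confidence with a more general LHS, so it is removed.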
Remaining Rules
rules.surv.pruned %>% inspect() ## print rules
## lhs rhs support confidence lift...
## [1] {Class=2nd, ...
## Age=Child} => {Survived=Yes} 0.011 1.000 3.096...
## [2] {Class=1st, ...
## Sex=Female} => {Survived=Yes} 0.064 0.972 3.010...
## [3] {Class=2nd, ...
## Sex=Female} => {Survived=Yes} 0.042 0.877 2.716...
## [4] {Class=Crew, ...
## Sex=Female} => {Survived=Yes} 0.009 0.870 2.692...
## [5] {Class=2nd, ...
## Sex=Male, ...
## Age=Adult} => {Survived=No} 0.070 0.917 1.354...
## [6] {Class=2nd, ...
## Sex=Male} => {Survived=No} 0.070 0.860 1.271...
## [7] {Class=3rd, ...
## Sex=Male, ...
## Age=Adult} => {Survived=No} 0.176 0.838 1.237...
## [8] {Class=3rd, ...
## Sex=Male} => {Survived=No} 0.192 0.827 1.222...
41 / 68
rules.surv.pruned[1] %>% inspect() ## print rules
## lhs rhs support confidence
## [1] {Class=2nd,Age=Child} => {Survived=Yes} 0.011 1
## lift count
## [1] 3.096 24
42 / 68
Find Rules about Age Groups
I Use lower thresholds to find all rules for children of different
classes
I verbose=F: suppress progress report
I minlen=3: find rules that contain at least three items
I Use lower thresholds for support and confidence
I lhs=c(...), rhs=c(...): find rules whose left/right-hand
sides are in the given lists
I quality(...): interestingness measures
43 / 68
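A sketch of the `apriori()` call the bullet points describe, producing the `rules.age` inspected on the next slide (the threshold values and item lists are assumptions, chosen to be consistent with the output shown there):

```r
## find rules with Class and Age on the LHS and survival on the RHS;
## low support/confidence thresholds keep rules for small groups
## (e.g. first-class children, with only 6 matching records)
rules.age <- titanic.raw %>% apriori(
  control = list(verbose = F),
  parameter = list(minlen = 3, supp = 0.002, conf = 0.2),
  appearance = list(rhs = c("Survived=Yes"),
                    lhs = c("Class=1st", "Class=2nd", "Class=3rd",
                            "Age=Child", "Age=Adult"),
                    default = "none"))

## order rules by confidence for inspection
rules.age <- rules.age %>% sort(by = "confidence")
```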
Rules about Age Groups
rules.age %>% inspect() ## print rules
## lhs rhs support
## [1] {Class=2nd,Age=Child} => {Survived=Yes} 0.010904134
## [2] {Class=1st,Age=Child} => {Survived=Yes} 0.002726034
## [3] {Class=1st,Age=Adult} => {Survived=Yes} 0.089504771
## [4] {Class=2nd,Age=Adult} => {Survived=Yes} 0.042707860
## [5] {Class=3rd,Age=Child} => {Survived=Yes} 0.012267151
## [6] {Class=3rd,Age=Adult} => {Survived=Yes} 0.068605179
## confidence lift count
## [1] 1.0000000 3.0956399 24
## [2] 1.0000000 3.0956399 6
## [3] 0.6175549 1.9117275 197
## [4] 0.3601533 1.1149048 94
## [5] 0.3417722 1.0580035 27
## [6] 0.2408293 0.7455209 151
[Scatter plot of the rules]
I X-axis: support
I Y-axis: confidence
I Colour: lift
45 / 68
Items in LHS Group
[Grouped matrix plot of the rules; e.g., one group is labelled "2 rules: {Age=Child, Class=2nd, +1 items}"]
46 / 68
rules.surv %>% plot(method="graph",
control=list(layout=igraph::with_fr()))
[Graph plot of the rules: item nodes (Class, Sex, Age and Survived values) linked via the rules, using the Fruchterman-Reingold layout]
47 / 68
rules.surv %>% plot(method="graph",
control=list(layout=igraph::in_circle()))
[Graph plot of the rules with the item nodes laid out in a circle]
48 / 68
rules.surv %>% plot(method="paracoord",
control=list(reorder=T))
[Parallel coordinates plot of the rules; x-axis: position (3, 2, 1, rhs), y-axis: items]
49 / 68
Interactive Plots and Reordering Rules
interactive = TRUE
I Select and inspect one or more rules
I Zoom
I Filter rules with an interestingness measure
reorder = TRUE
I Improves the visualisation by reordering rules to minimise
crossovers
I The visualisation is likely to change from run to run.
50 / 68
Wrap Up
51 / 68
Contents
Association Rules: Concept and Algorithms
Basics of Association Rules
Algorithms: Apriori, ECLAT and FP-growth
Interestingness Measures
Applications
Exercise
52 / 68
Further Readings
I Association Rule Learning
https://en.wikipedia.org/wiki/Association_rule_learning
I Data Mining Algorithms In R: Apriori
https://en.wikibooks.org/wiki/Data_Mining_Algorithms_In_R/Frequent_Pattern_Mining/The_Apriori_Algorithm
I Data Mining Algorithms In R: ECLAT
https://en.wikibooks.org/wiki/Data_Mining_Algorithms_In_R/Frequent_Pattern_Mining/The_Eclat_Algorithm
I Data Mining Algorithms In R: FP-Growth
https://en.wikibooks.org/wiki/Data_Mining_Algorithms_In_R/Frequent_Pattern_Mining/The_FP-Growth_Algorithm
I FP-Growth Implementation by Christian Borgelt
http://www.borgelt.net/fpgrowth.html
I Frequent Itemset Mining Implementations Repository
http://fimi.ua.ac.be/data/
53 / 68
Further Readings
I More than 20 interestingness measures, such as chi-square,
conviction, gini and leverage
Tan, P.-N., Kumar, V., and Srivastava, J. (2002). Selecting the right
interestingness measure for association patterns. In Proc. of KDD ’02,
pages 32-41, New York, NY, USA. ACM Press.
I More reviews on interestingness measures:
[Silberschatz and Tuzhilin, 1996], [Tan et al., 2002] and
[Omiecinski, 2003]
I Post mining of association rules, such as selecting interesting
association rules, visualization of association rules and using
association rules for classification [Zhao et al., 2009]
Yanchang Zhao, et al. (Eds.). “Post-Mining of Association Rules:
Techniques for Effective Knowledge Extraction”, ISBN
978-1-60566-404-0, May 2009. Information Science Reference.
I Package arulesSequences: mining sequential patterns
http://cran.r-project.org/web/packages/arulesSequences/
54 / 68
Contents
Association Rules: Concept and Algorithms
Basics of Association Rules
Algorithms: Apriori, ECLAT and FP-growth
Interestingness Measures
Applications
Exercise
55 / 68
The Mushroom Dataset I
I The mushroom dataset includes descriptions of hypothetical
samples corresponding to 23 species of gilled mushrooms ‡ .
I A csv file with 8,124 observations on 23 categorical variables:
1. class: edible=e, poisonous=p
2. cap-shape: bell=b,conical=c,convex=x,flat=f,
knobbed=k,sunken=s
3. cap-surface: fibrous=f,grooves=g,scaly=y,smooth=s
4. cap-color: brown=n,buff=b,cinnamon=c,gray=g,green=r,
pink=p,purple=u,red=e,white=w,yellow=y
5. bruises?: bruises=t,no=f
6. odor: almond=a,anise=l,creosote=c,fishy=y,foul=f,
musty=m,none=n,pungent=p,spicy=s
7. gill-attachment: attached=a,descending=d,free=f,notched=n
8. gill-spacing: close=c,crowded=w,distant=d
9. gill-size: broad=b,narrow=n
10. gill-color: black=k,brown=n,buff=b,chocolate=h,gray=g,
green=r,orange=o,pink=p,purple=u,red=e, white=w,yellow=y
56 / 68
The Mushroom Dataset II
57 / 68
The Mushroom Dataset III
21. spore-print-color:
black=k,brown=n,buff=b,chocolate=h,green=r,
orange=o,purple=u,white=w,yellow=y
22. population: abundant=a,clustered=c,numerous=n,
scattered=s,several=v,solitary=y
23. habitat: grasses=g,leaves=l,meadows=m,paths=p,
urban=u,waste=w,woods=d
‡
https://archive.ics.uci.edu/ml/datasets/Mushroom
58 / 68
Load Mushroom Dataset
59 / 68
The Mushroom Dataset
str(mushrooms)
## 'data.frame': 8124 obs. of 23 variables:
## $ class : Factor w/ 2 levels "e","p": 2 ...
## $ cap-shape : Factor w/ 6 levels "b","c","f"...
## $ cap-surface : Factor w/ 4 levels "f","g","s"...
## $ cap-color : Factor w/ 10 levels "b","c","e...
## $ bruises : Factor w/ 2 levels "f","t": 2 ...
## $ odor : Factor w/ 9 levels "a","c","f"...
## $ gill-attachment : Factor w/ 2 levels "a","f": 2 ...
## $ gill-spacing : Factor w/ 2 levels "c","w": 1 ...
## $ gill-size : Factor w/ 2 levels "b","n": 2 ...
## $ gill-color : Factor w/ 12 levels "b","e","g...
## $ stalk-shape : Factor w/ 2 levels "e","t": 1 ...
## $ stalk-root : Factor w/ 5 levels "?","b","c"...
## $ stalk-surface-above-ring: Factor w/ 4 levels "f","k","s"...
## $ stalk-surface-below-ring: Factor w/ 4 levels "f","k","s"...
## $ stalk-color-above-ring : Factor w/ 9 levels "b","c","e"...
## $ stalk-color-below-ring : Factor w/ 9 levels "b","c","e"...
## $ veil-type : Factor w/ 1 level "p": 1 1 1 1...
## $ veil-color : Factor w/ 4 levels "n","o","w"...
## $ ring-number : Factor w/ 3 levels "n","o","t"...
60 / 68
Exercise
61 / 68
Mining Association Rules from Mushroom Dataset
## find association rules from the mushroom dataset
rules <- apriori(mushrooms, control = list(verbose=F),
parameter = list(minlen=2, maxlen=5),
appearance = list(rhs=c("class=p", "class=e"),
default="lhs"))
quality(rules) <- round(quality(rules), digits=3)
rules.sorted <- sort(rules, by="confidence")
inspect(head(rules.sorted))
## lhs rhs support confidence
## [1] {ring-type=l} => {class=p} 0.160 1
## [2] {gill-color=b} => {class=p} 0.213 1
## [3] {odor=f} => {class=p} 0.266 1
## [4] {gill-size=b,gill-color=n} => {class=e} 0.108 1
## [5] {odor=n,stalk-root=e} => {class=e} 0.106 1
## [6] {bruises=f,stalk-root=e} => {class=e} 0.106 1
## lift count
## [1] 2.075 1296
## [2] 2.075 1728
## [3] 2.075 2160
## [4] 1.931 880
## [5] 1.931 864
62 / 68
Online Resources
63 / 68
The End
Thanks!
Email: yanchang(at)RDataMining.com
Twitter: @RDataMining
64 / 68
How to Cite This Work
I Citation
Yanchang Zhao. R and Data Mining: Examples and Case Studies. ISBN
978-0-12-396963-7, December 2012. Academic Press, Elsevier. 256
pages. URL: http://www.rdatamining.com/docs/RDataMining-book.pdf.
I BibTex
@BOOK{Zhao2012R,
title = {R and Data Mining: Examples and Case Studies},
publisher = {Academic Press, Elsevier},
year = {2012},
author = {Yanchang Zhao},
pages = {256},
month = {December},
isbn = {978-0-123-96963-7},
keywords = {R, data mining},
url = {http://www.rdatamining.com/docs/RDataMining-book.pdf}
}
65 / 68
References I
Agrawal, R., Imielinski, T., and Swami, A. (1993).
Mining association rules between sets of items in large databases.
In Proc. of the ACM SIGMOD International Conference on Management of Data, pages 207–216,
Washington D.C. USA.
Freitas, A. A. (1998).
On objective measures of rule surprisingness.
In PKDD ’98: Proceedings of the Second European Symposium on Principles of Data Mining and
Knowledge Discovery, pages 1–9, London, UK. Springer-Verlag.
Han, J. (2005).
Data Mining: Concepts and Techniques.
Morgan Kaufmann Publishers Inc., San Francisco, CA, USA.
66 / 68
References II
Liu, B. and Hsu, W. (1996).
Post-analysis of learned rules.
In Proceedings of the 13th National Conference on Artificial Intelligence (AAAI-96), pages 828–834,
Portland, Oregon, USA.
Omiecinski, E. R. (2003).
Alternative interest measures for mining associations in databases.
IEEE Transactions on Knowledge and Data Engineering, 15(1):57–69.
67 / 68
References III
Zhao, Y. (2012).
R and Data Mining: Examples and Case Studies, ISBN 978-0-12-396963-7.
Academic Press, Elsevier.
68 / 68