Association rule mining finds frequent patterns and correlations among items in transactional databases. It aims to discover rules that describe large portions of your data like "customers that buy X also tend to buy Y". The key step is finding all frequent itemsets that occur above a minimum support threshold. The Apriori algorithm is commonly used, joining potentially frequent itemsets in each pass over the database. Support and confidence are typical measures but have limitations, and other measures like lift address rules between negatively correlated items better. Association rule mining has had significant impact and continued research explores new data types.
Data Mining:
Association Rules Techniques
May 10, 2023
What Is Association Mining?
Association rule mining:
Finding frequent patterns, associations, and correlations among sets of items or objects in transaction databases, relational databases, and other information repositories.
Applications: basket data analysis, cross-marketing, catalog design, loss-leader analysis, clustering, classification, etc.
Examples:
Rule form: "Body → Head [support, confidence]"
buys(x, "diapers") → buys(x, "beers") [0.5%, 60%]
major(x, "CS") ^ takes(x, "DB") → grade(x, "A") [1%, 75%]
Association Rule: Basic Concepts
Given: (1) a database of transactions, where (2) each transaction is a list of items (purchased by a customer in a visit)
Find: all rules that correlate the presence of one set of items with that of another set of items
E.g., 98% of people who purchase tires and auto accessories also get automotive services done
Rule Measures: Support and Confidence

[Figure: Venn diagram of customers who buy beer, customers who buy diapers, and customers who buy both]

Find all the rules X & Y → Z with minimum confidence and support
support, s: probability that a transaction contains {X, Y, Z}
confidence, c: conditional probability that a transaction having {X, Y} also contains Z

Transaction ID   Items Bought
2000             A, B, C
1000             A, C
4000             A, D
5000             B, E, F

Let minimum support be 50% and minimum confidence be 50%; we have
A → C (50%, 66.6%)
C → A (50%, 100%)

Association Rule Mining: A Road Map
Boolean vs. quantitative associations (based on the types of values handled)
buys(x, "SQLServer") ^ buys(x, "DMBook") → buys(x, "DBMiner") [0.2%, 60%]
age(x, "30..39") ^ income(x, "42..48K") → buys(x, "PC") [1%, 75%]
Single-dimension vs. multi-dimensional associations (see examples above)
Single-level vs. multiple-level analysis
What brands of beers are associated with what brands of diapers?

Mining Association Rules—An Example
Min. support 50%, min. confidence 50%

Transaction ID   Items Bought
2000             A, B, C
1000             A, C
4000             A, D
5000             B, E, F

Frequent Itemset   Support
{A}                75%
{B}                50%
{C}                50%
{A, C}             50%

For rule A → C:
support = support({A, C}) = 50%
confidence = support({A, C}) / support({A}) = 66.6%
The Apriori principle: any subset of a frequent itemset must be frequent

Mining Frequent Itemsets: the Key Step

Find the frequent itemsets: the sets of items that have minimum support
Any subset of a frequent itemset must also be a frequent itemset; i.e., if {A, B} is a frequent itemset, both {A} and {B} must be frequent itemsets
Iteratively find frequent itemsets with cardinality from 1 to k (k-itemsets)
Use the frequent itemsets to generate association rules.
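The support and confidence figures above can be reproduced directly. This is a minimal sketch in Python; the transactions copy the slide's table, while the helper names (`support`, `confidence`) are my own, not from the slides.

```python
# The four transactions from the slide's example (IDs 2000, 1000, 4000, 5000).
transactions = [
    {"A", "B", "C"},
    {"A", "C"},
    {"A", "D"},
    {"B", "E", "F"},
]

def support(itemset, db):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in db) / len(db)

def confidence(lhs, rhs, db):
    """support(lhs union rhs) / support(lhs)."""
    return support(set(lhs) | set(rhs), db) / support(lhs, db)

print(support({"A", "C"}, transactions))       # 0.5   -> 50%
print(confidence({"A"}, {"C"}, transactions))  # 0.666... -> 66.6%
print(confidence({"C"}, {"A"}, transactions))  # 1.0   -> 100%
```

The printed values match the slide's rules A → C (50%, 66.6%) and C → A (50%, 100%).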
The Apriori Algorithm

Join Step: Ck is generated by joining Lk-1 with itself
Prune Step: Any (k-1)-itemset that is not frequent cannot be a subset of a frequent k-itemset
Pseudo-code:
  Ck: candidate itemsets of size k
  Lk: frequent itemsets of size k
  L1 = {frequent items};
  for (k = 1; Lk != ∅; k++) do begin
      Ck+1 = candidates generated from Lk;
      for each transaction t in database do
          increment the count of all candidates in Ck+1 that are contained in t;
      Lk+1 = candidates in Ck+1 with min_support;
  end
  return ∪k Lk;
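The join and prune steps of the pseudo-code above can be sketched as a compact Python implementation. This is an illustrative version, not an optimized one; the function name `apriori` and its internals are my own choices.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return all frequent itemsets (as frozensets) via level-wise search."""
    n = len(transactions)

    def frequent(candidates):
        # Count each candidate's occurrences and keep those meeting min_support.
        counts = {c: sum(c <= t for t in transactions) for c in candidates}
        return {c for c, cnt in counts.items() if cnt / n >= min_support}

    # L1: frequent 1-itemsets.
    items = {frozenset([i]) for t in transactions for i in t}
    L = frequent(items)
    result = set(L)
    k = 1
    while L:
        # Join step: combine itemsets from Lk into (k+1)-item candidates.
        candidates = {a | b for a in L for b in L if len(a | b) == k + 1}
        # Prune step: every k-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in L for s in combinations(c, k))}
        L = frequent(candidates)
        result |= L
        k += 1
    return result

frequent_sets = apriori(
    [{"A", "B", "C"}, {"A", "C"}, {"A", "D"}, {"B", "E", "F"}],
    min_support=0.5,
)
# With min support 50% this yields {A}, {B}, {C}, and {A, C},
# matching the earlier example slide.
```

The prune step is what makes the Apriori principle operational: a candidate is discarded as soon as any of its k-subsets is missing from Lk, before the database is scanned again.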
(KDD95) A rule (pattern) is interesting if it is
unexpected (surprising to the user); and/or
actionable (the user can do something with it)
Criticism to Support and Confidence

Example 1 (Aggarwal & Yu, PODS98): Among 5000 students,
3000 play basketball
3750 eat cereal
2000 both play basketball and eat cereal
play basketball → eat cereal [40%, 66.7%] is misleading, because the overall percentage of students eating cereal is 75%, which is higher than 66.7%.
play basketball → not eat cereal [20%, 33.3%] is far more accurate, although with lower support and confidence.

             basketball   not basketball   sum(row)
cereal       2000         1750             3750
not cereal   1000         250              1250
sum(col.)    3000         2000             5000

Criticism to Support and Confidence (Cont.)

Example 2:
X and Y: positively correlated
X and Z: negatively related
yet the support and confidence of X => Z dominate

X   1 1 1 1 0 0 0 0
Y   1 1 0 0 0 0 0 0
Z   0 1 1 1 1 1 1 1

Rule     Support   Confidence
X => Y   25%       50%
X => Z   37.5%     75%

We need a measure of dependent or correlated events:
corr(A, B) = P(A ∧ B) / (P(A) P(B))
P(B|A) / P(B) is also called the lift of rule A => B

Other Interestingness Measures: Interest

Interest (correlation, lift): P(A ∧ B) / (P(A) P(B))
takes both P(A) and P(B) into consideration
P(A ∧ B) = P(A) P(B) if A and B are independent events
A and B are negatively correlated if the value is less than 1; otherwise A and B are positively correlated

Itemset   Support   Interest
X, Y      25%       2
X, Z      37.5%     0.9
Y, Z      12.5%     0.57
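The interest (lift) values for the X, Y, Z example above can be checked numerically. A minimal sketch, assuming the 8-transaction binary vectors from the slide; the helper name `lift` is my own.

```python
# The slide's three item columns across 8 transactions (1 = item present).
X = [1, 1, 1, 1, 0, 0, 0, 0]
Y = [1, 1, 0, 0, 0, 0, 0, 0]
Z = [0, 1, 1, 1, 1, 1, 1, 1]

def lift(a, b):
    """Interest measure: P(A and B) / (P(A) * P(B))."""
    n = len(a)
    p_ab = sum(x and y for x, y in zip(a, b)) / n
    p_a = sum(a) / n
    p_b = sum(b) / n
    return p_ab / (p_a * p_b)

print(round(lift(X, Y), 2))  # 2.0  -> X and Y positively correlated
print(round(lift(X, Z), 2))  # 0.86 -> X and Z negatively correlated
print(round(lift(Y, Z), 2))  # 0.57 -> Y and Z negatively correlated
```

The values agree with the slide's table (its 0.9 for {X, Z} is 0.857 rounded to one decimal), and they show why lift succeeds where confidence fails: X => Z has the higher confidence (75%), yet its lift below 1 exposes the negative correlation.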
Summary

Association rule mining is probably the most significant contribution from the database community in KDD.
A large number of papers have been published, and many interesting issues have been explored.
An interesting research direction: association analysis in other types of data, such as spatial data, multimedia data, and time series data.