Unit II
DATA MINING - FREQUENT PATTERN ANALYSIS
Mining Frequent Patterns, Associations and Correlations – Mining Methods – Pattern Evaluation
Method – Pattern Mining in Multilevel, Multidimensional Space – Constraint-Based Frequent Pattern
Mining – Classification using Frequent Patterns.
Association Mining
• Given: (1) database of transactions, (2) each transaction is a list of items (purchased by a
customer in a visit)
• Find: all rules that correlate the presence of one set of items with that of another set of items
– E.g., 98% of people who purchase tires and auto accessories also get automotive services
done
• Applications
191CSC503T DATA MINING
• Find all the rules X ∧ Y ⇒ Z with minimum confidence and support
– support, s: probability that a transaction contains {X ∪ Y ∪ Z}
– confidence, c: conditional probability that a transaction having {X ∪ Y} also contains Z
– A ⇒ C (support 50%, confidence 66.6%)
– C ⇒ A (support 50%, confidence 100%)
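The two measures can be checked by direct counting. The sketch below is a minimal illustration in Python; the four-transaction database is hypothetical (the transaction table behind the figures above is not present in these notes) and was chosen only because it reproduces 50% support, 66.6% confidence in one direction and 100% in the other. The function names `support` and `confidence` are likewise illustrative.

```python
# Hypothetical database, not taken from the notes: chosen so that it
# reproduces the (support, confidence) figures quoted for A and C.
transactions = [
    {"A", "B", "C"},
    {"A", "C"},
    {"A", "D"},
    {"B", "E", "F"},
]

def support(itemset, db):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in db) / len(db)

def confidence(antecedent, consequent, db):
    """Conditional probability that a transaction with `antecedent` also has `consequent`."""
    both = set(antecedent) | set(consequent)
    return support(both, db) / support(antecedent, db)

sup_ac = support({"A", "C"}, transactions)        # same for both rule directions
conf_a_c = confidence({"A"}, {"C"}, transactions)
conf_c_a = confidence({"C"}, {"A"}, transactions)
print(f"support = {sup_ac:.1%}")
print(f"confidence(A => C) = {conf_a_c:.1%}")
print(f"confidence(C => A) = {conf_c_a:.1%}")
```

Support is symmetric in the two sides of a rule, while confidence is not, which is why the two rules share one support value but differ in confidence.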
Rule support and confidence are two measures of rule interestingness. They reflect, respectively,
the usefulness and the certainty of discovered rules. For an association rule such as
computer ⇒ financial management software, a support of 2% means that 2% of all the transactions
under analysis show a computer and financial management software being purchased together.
A confidence of 60% means that 60% of the customers who purchased a computer also bought the
software. Typically, association rules are considered interesting if they satisfy both a
minimum support threshold and a minimum confidence threshold.
The Apriori algorithm mines the complete set of frequent itemsets with candidate generation.
Apriori property & The Apriori Algorithm.
Apriori property: all non-empty subsets of a frequent itemset must also be frequent.
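Because every non-empty subset of a frequent itemset must itself be frequent, a candidate k-itemset can be discarded as soon as one of its (k-1)-subsets is found to be infrequent. The following is a minimal sketch of level-wise mining built on that idea, not a tuned implementation. The function name `apriori`, the absolute threshold `min_count`, and the toy database are assumptions made for illustration.

```python
from itertools import combinations

def apriori(db, min_count):
    """Return {frozenset: support_count} for all frequent itemsets in `db`."""
    # Level 1: count single items and keep the frequent ones.
    counts = {}
    for t in db:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    level = {s: c for s, c in counts.items() if c >= min_count}
    frequent = dict(level)

    k = 2
    while level:
        prev = set(level)
        # Join step: union pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step (Apriori property): drop any candidate that has an
        # infrequent (k-1)-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        # One database scan to count the surviving candidates.
        counts = {c: sum(c <= t for t in db) for c in candidates}
        level = {s: n for s, n in counts.items() if n >= min_count}
        frequent.update(level)
        k += 1
    return frequent

# Hypothetical toy database, for illustration only.
db = [{"A", "B", "C"}, {"A", "C"}, {"A", "D"}, {"B", "E", "F"}]
result = apriori(db, min_count=2)
for itemset, count in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

The join step here is a simple pairwise union rather than the prefix-based join usually described for Apriori; it generates the same candidates after pruning, at the cost of some redundant work.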
Example
FP-growth mines the complete set of frequent itemsets without candidate generation.
Header Table
• Completeness:
– never breaks a long pattern of any transaction
– preserves complete information for frequent pattern mining
• Compactness:
– reduces irrelevant information: infrequent items are removed
– frequency-descending ordering: more frequent items are more likely to be shared
– never larger than the original database (not counting node-links and counts)
– Example: for the Connect-4 DB, the compression ratio can be over 100
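The compactness claims can be seen on a small tree. The sketch below builds an FP-tree with a header table and compares the number of tree nodes with the number of frequent-item occurrences in the database. It is a minimal illustration only: the `Node` class, the function names, the alphabetical tie-break, and the five-transaction database are all assumptions, not taken from these notes, and the mining step of FP-growth is not shown.

```python
from collections import Counter

class Node:
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}

def build_fp_tree(db, min_count):
    """Build an FP-tree and return (root, header_table)."""
    freq = Counter(item for t in db for item in t)
    freq = {i: c for i, c in freq.items() if c >= min_count}
    root = Node(None, None)
    header = {i: [] for i in freq}  # item -> its nodes (stands in for node-links)
    for t in db:
        # Drop infrequent items, then order by descending frequency
        # (ties broken alphabetically so the result is deterministic).
        items = sorted((i for i in t if i in freq), key=lambda i: (-freq[i], i))
        node = root
        for item in items:
            if item not in node.children:
                child = Node(item, node)
                node.children[item] = child
                header[item].append(child)
            node = node.children[item]
            node.count += 1  # shared prefixes only increment a counter
    return root, header

def count_nodes(node):
    """Number of nodes below `node` (the root itself is not counted)."""
    return sum(1 + count_nodes(c) for c in node.children.values())

# Hypothetical database: each string is one transaction of single-letter items.
db = [set("facdgimp"), set("abcflmo"), set("bfhjo"), set("bcksp"), set("afcelpmn")]
root, header = build_fp_tree(db, min_count=3)
n_nodes = count_nodes(root)
n_occurrences = sum(sum(i in header for i in t) for t in db)
print("tree nodes:", n_nodes)
print("frequent-item occurrences in the database:", n_occurrences)
```

Transactions that share a frequency-ordered prefix reuse the same path, so the tree needs fewer nodes than there are item occurrences; infrequent items never enter the tree at all.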
[Figure: concept hierarchy. Food at the top; Milk and Bread below it; Fraser and Sunset at a lower level.]
TID Items
T1 {111, 121, 211, 221}
T2 {111, 211, 222, 323}
T3 {112, 122, 221, 411}
T4 {111, 121}
T5 {111, 122, 211, 221, 413}
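The encoded table above lends itself to counting support at each abstraction level. The sketch below assumes that each digit of an item code identifies one level of the concept hierarchy, so that the first k digits name the item's level-k ancestor; the notes do not state the encoding, so treat this as an assumption. The helper names `generalize` and `support_counts` are illustrative.

```python
from collections import Counter

# Encoded transactions copied from the table above.
transactions = {
    "T1": [111, 121, 211, 221],
    "T2": [111, 211, 222, 323],
    "T3": [112, 122, 221, 411],
    "T4": [111, 121],
    "T5": [111, 122, 211, 221, 413],
}

def generalize(item, level):
    """Ancestor of an encoded item at `level` (assumed: one digit per level)."""
    return str(item)[:level]

def support_counts(db, level):
    """Number of transactions containing each item once generalized to `level`."""
    counts = Counter()
    for items in db.values():
        # A set ensures an ancestor is counted once per transaction.
        counts.update({generalize(i, level) for i in items})
    return counts

for level in (1, 2, 3):
    print(level, dict(sorted(support_counts(transactions, level).items())))
```

Support can only stay the same or drop when moving down the hierarchy, which is the motivation for considering a lower min_support at lower levels.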
– If adopting the same min_support across all levels, then discard an item t if any of t’s ancestors is
infrequent.
– If adopting reduced min_support at lower levels, then examine only those descendants
whose ancestor’s support is frequent/non-negligible.
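Both strategies amount to a top-down pass in which a node is examined only if its parent passed the threshold one level up; they differ only in the thresholds used. The sketch below is a minimal illustration restricted to frequent single items at each level. It reuses the encoded transactions from the table earlier in this unit and assumes one digit per hierarchy level; the function name `mine_levels` and the threshold values are hypothetical.

```python
from collections import Counter

# Encoded transactions from the table earlier in this unit.
transactions = [
    [111, 121, 211, 221],
    [111, 211, 222, 323],
    [112, 122, 221, 411],
    [111, 121],
    [111, 122, 211, 221, 413],
]

def mine_levels(db, min_counts):
    """Frequent single items per level; min_counts[k-1] is the threshold at level k."""
    frequent_by_level = []
    parents = None  # frequent items found one level up
    for level, min_count in enumerate(min_counts, start=1):
        counts = Counter()
        for items in db:
            codes = {str(i)[:level] for i in items}
            # Examine a node only if its parent was frequent one level up.
            if parents is not None:
                codes = {c for c in codes if c[:-1] in parents}
            counts.update(codes)
        frequent = {c: n for c, n in counts.items() if n >= min_count}
        frequent_by_level.append(frequent)
        parents = set(frequent)
    return frequent_by_level

uniform = mine_levels(transactions, [4, 4, 4])   # same min_support at every level
reduced = mine_levels(transactions, [4, 3, 2])   # lower min_support at lower levels
print("uniform:", uniform)
print("reduced:", reduced)
```

With a uniform threshold only one level-3 item survives in this data, while the reduced thresholds keep several, which shows the trade-off between the two strategies.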
Correlation in detail.
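One widely used correlation measure for association rules is lift, the ratio of the observed joint probability to the value expected if the two itemsets were independent. Whether lift is the measure these notes go on to cover is an assumption, since the detail is missing here. The counts in the sketch below are illustrative and not taken from the notes.

```python
def lift(n_total, n_a, n_b, n_ab):
    """lift(A, B) = P(A and B) / (P(A) * P(B)), computed from transaction counts."""
    p_a = n_a / n_total
    p_b = n_b / n_total
    p_ab = n_ab / n_total
    return p_ab / (p_a * p_b)

# Illustrative counts: 10,000 transactions, 6,000 contain A,
# 7,500 contain B, and 4,000 contain both.
value = lift(n_total=10_000, n_a=6_000, n_b=7_500, n_ab=4_000)
print(round(value, 2))
```

A lift below 1 indicates negative correlation, a lift above 1 positive correlation, and a lift of exactly 1 independence. In this example the rule A ⇒ B has a confidence of about 66.7%, yet the lift is below 1 because B on its own occurs in 75% of transactions.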
Numeric correlation
– Interestingness constraints:
• strong rules (min_support ≥ 3%, min_confidence ≥ 60%).
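An interestingness constraint of this kind acts as a filter over candidate rules. The sketch below is a minimal illustration that assumes both thresholds are inclusive lower bounds; the `Rule` class and the three example rules, with their support and confidence values, are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    antecedent: frozenset
    consequent: frozenset
    support: float      # fraction of all transactions
    confidence: float   # conditional probability of consequent given antecedent

def strong_rules(rules, min_support=0.03, min_confidence=0.60):
    """Keep only rules that meet both thresholds."""
    return [
        r for r in rules
        if r.support >= min_support and r.confidence >= min_confidence
    ]

# Hypothetical candidate rules, for illustration only.
candidates = [
    Rule(frozenset({"computer"}), frozenset({"software"}), 0.05, 0.70),
    Rule(frozenset({"computer"}), frozenset({"printer"}), 0.02, 0.80),  # support too low
    Rule(frozenset({"printer"}), frozenset({"paper"}), 0.10, 0.40),     # confidence too low
]
kept = strong_rules(candidates)
for r in kept:
    print(sorted(r.antecedent), "=>", sorted(r.consequent), r.support, r.confidence)
```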
This section first focuses on multilevel associations involving different abstraction levels.
Consider the example shown in Fig. 3.1.6. It shows the concept hierarchy of the transactions from Table 3.1.5.
Mining Multidimensional Associations
In this section, three methods of mining quantitative association rules are discussed.