
UNIT II

Unit-2 - Associations and Correlations


Market Basket Analysis – Apriori Algorithm – Mining Frequent Itemsets without Candidate Generation – Mining Frequent Itemsets Using Vertical Data Format – Mining Closed Frequent Itemsets – Mining Multilevel Association Rules – Mining Multidimensional Association Rules – Correlation Analysis – Constraint-Based Association Mining

Association Mining

• Association rule mining:


– Finding frequent patterns, associations, correlations, or causal structures among sets of
items or objects in transaction databases, relational databases, and other information
repositories.
• Applications:
– Basket data analysis, cross-marketing, catalog design, loss-leader analysis,
clustering, classification, etc.
• Examples:
– Rule form: “Body → Head [support, confidence]”
– buys(x, “diapers”) → buys(x, “beers”) [0.5%, 60%]
– major(x, “CS”) ∧ takes(x, “DB”) → grade(x, “A”) [1%, 75%]

Association Rule: Basic Concepts

• Given: (1) database of transactions, (2) each transaction is a list of items (purchased by
a customer in a visit)
• Find: all rules that correlate the presence of one set of items with that of another set of items
– E.g., 98% of people who purchase tires and auto accessories also get automotive
services done
• Applications
– * ⇒ Maintenance Agreement (what should the store do to boost Maintenance Agreement sales?)
– Home Electronics ⇒ * (what other products should the store stock up on?)
– Attached mailing in direct marketing
– Detecting “ping-pong”ing of patients, faulty “collisions”

Rule Measures: Support and Confidence

• Find all the rules X ∧ Y ⇒ Z with minimum confidence and support
– support, s: the probability that a transaction contains {X ∪ Y ∪ Z}
– confidence, c: the conditional probability that a transaction containing {X ∪ Y} also contains Z

With minimum support 50% and minimum confidence 50%, we have:

– A ⇒ C (50%, 66.6%)
– C ⇒ A (50%, 100%)

Transaction ID | Items Bought
2000 | A, B, C
1000 | A, C
4000 | A, D
5000 | B, E, F
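
The numbers above can be verified directly on this toy database. A minimal Python sketch (the helper names are mine, not from the notes):

    # Toy transaction database from the table above
    transactions = {
        2000: {"A", "B", "C"},
        1000: {"A", "C"},
        4000: {"A", "D"},
        5000: {"B", "E", "F"},
    }

    def support(itemset, db):
        """Fraction of transactions containing every item in `itemset`."""
        itemset = set(itemset)
        return sum(itemset <= t for t in db.values()) / len(db)

    def confidence(lhs, rhs, db):
        """Conditional probability: P(rhs in transaction | lhs in transaction)."""
        return support(set(lhs) | set(rhs), db) / support(lhs, db)

    print(support({"A", "C"}, transactions))       # 0.5   -> 50%
    print(confidence({"A"}, {"C"}, transactions))  # 0.666 -> 66.6%
    print(confidence({"C"}, {"A"}, transactions))  # 1.0   -> 100%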

Association Rule Mining: A Road Map

• Boolean vs. quantitative associations (Based on the types of values handled)


– buys(x, “SQLServer”) ∧ buys(x, “DMBook”) → buys(x, “DBMiner”) [0.2%, 60%]
– age(x, “30..39”) ∧ income(x, “42..48K”) → buys(x, “PC”) [1%, 75%]
• Single-dimensional vs. multidimensional associations (see the examples above)
• Single level vs. multiple-level analysis
– What brands of beers are associated with what brands of diapers?
• Various extensions
– Correlation, causality analysis
• Association does not necessarily imply correlation or causality
– Maxpatterns and closed itemsets
– Constraints enforced
• E.g., small sales (sum < 100) trigger big buys (sum > 1,000)?

Market – Basket analysis

A market basket is a collection of items purchased by a customer in a single transaction, which


is a well-defined business activity. For example, a customer's visits to a grocery store or an online
purchase from a virtual store on the Web are typical customer transactions. Retailers accumulate
huge collections of transactions by recording business activities over time. One common analysis run
against a transactions database is to find sets of items, or itemsets, that appear together in many
transactions. A business can use knowledge of these patterns to improve the placement of these items
in the store or the layout of mail-order catalog pages and Web pages. An itemset containing i items is
called an i-itemset. The percentage of transactions that contain an itemset is called the itemset's
support. For an itemset to be interesting, its support must be higher than a user-specified minimum.
Such itemsets are said to be frequent.

Figure: Market basket analysis.


Rule support and confidence are two measures of rule interestingness. They respectively
reflect the usefulness and certainty of discovered rules. A support of 2% for an association rule means
that 2% of all the transactions under analysis show that computer and financial management
software are purchased together. A confidence of 60% means that 60% of the customers who
purchased a computer also bought the software. Typically, association rules are considered
interesting if they satisfy both a minimum support threshold and a minimum confidence threshold.

Mining Frequent Patterns

Apriori mines the complete set of frequent itemsets with candidate generation. It relies on the Apriori property, described below.

Apriori property

• All nonempty subsets of a frequent itemset must also be frequent.
– If an itemset I does not satisfy the minimum support threshold, min_sup, then I is not frequent, i.e., support(I) < min_sup.
– If an item A is added to the itemset I, then the resulting itemset (I ∪ A) cannot occur more frequently than I.
• Monotonic functions are functions that move in only one direction.
• This property is called anti-monotone (anti-monotonicity).
• If a set cannot pass a test, all of its supersets will fail the same test as well.
• In this sense, the property is monotonic in failing the test.

The Apriori Algorithm

• Join Step: Ck is generated by joining Lk−1 with itself.
• Prune Step: Any (k−1)-itemset that is not frequent cannot be a subset of a frequent k-itemset. An illustrative sketch follows.
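
A minimal Python sketch of the level-wise loop, reusing the A/B/C toy database from above (function and variable names are illustrative, not from the notes):

    from itertools import combinations

    def apriori(transactions, min_sup):
        """Level-wise mining: L1 -> C2 -> L2 -> C3 -> ... until Lk is empty."""
        transactions = [frozenset(t) for t in transactions]
        n = len(transactions)

        def support_count(itemset):
            return sum(itemset <= t for t in transactions)

        # L1: frequent 1-itemsets
        items = {i for t in transactions for i in t}
        L = {frozenset([i]) for i in items
             if support_count(frozenset([i])) >= min_sup * n}
        frequent = set(L)
        k = 2
        while L:
            # Join step: Lk-1 joined with itself
            candidates = {a | b for a in L for b in L if len(a | b) == k}
            # Prune step: every (k-1)-subset of a candidate must be frequent
            candidates = {c for c in candidates
                          if all(frozenset(s) in L for s in combinations(c, k - 1))}
            L = {c for c in candidates if support_count(c) >= min_sup * n}
            frequent |= L
            k += 1
        return frequent

    db = [{"A", "B", "C"}, {"A", "C"}, {"A", "D"}, {"B", "E", "F"}]
    print(apriori(db, 0.5))  # frozensets for {A}, {B}, {C}, {A, C}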
Mining Frequent Itemsets Without Candidate Generation

FP-growth mines the complete set of frequent itemsets without candidate generation.

• Compress a large database into a compact, Frequent-Pattern tree (FP-tree) structure


– highly condensed, but complete for frequent pattern mining
– avoid costly database scans
• Develop an efficient, FP-tree-based frequent pattern mining method
– A divide-and-conquer methodology: decompose mining tasks into smaller ones
– Avoid candidate generation: sub-database test only!

Construct FP-tree from a Transaction DB

TID | Items bought | (ordered) frequent items
100 | {f, a, c, d, g, i, m, p} | {f, c, a, m, p}
200 | {a, b, c, f, l, m, o} | {f, c, a, b, m}
300 | {b, f, h, j, o} | {f, b}
400 | {b, c, k, s, p} | {c, b, p}
500 | {a, f, c, e, l, p, m, n} | {f, c, a, m, p}

(min_support = 0.5, i.e., an item must appear in at least 3 of the 5 transactions)
Steps:

1. Scan DB once, find frequent 1-itemset (single item pattern)


2. Order frequent items in frequency descending order
3. Scan DB again, construct FP-tree
Header Table

Item | frequency
f | 4
c | 4
a | 3
b | 3
m | 3
p | 3

(each header entry also stores the head of a node-link chaining all tree nodes for that item)
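
A compact Python sketch of the two-scan construction above (class and function names are mine; ties between equally frequent items are broken alphabetically here, while the lecture lists f before c — either tie-break yields a valid FP-tree):

    from collections import defaultdict

    class FPNode:
        def __init__(self, item, parent):
            self.item, self.parent = item, parent
            self.count = 0
            self.children = {}

    def build_fp_tree(transactions, min_count):
        """Two scans: count items, then insert each transaction's frequent
        items (in frequency-descending order) into the tree."""
        counts = defaultdict(int)
        for t in transactions:                 # scan 1: frequent 1-itemsets
            for item in t:
                counts[item] += 1
        freq = {i: c for i, c in counts.items() if c >= min_count}

        root = FPNode(None, None)
        header = defaultdict(list)             # item -> node-links ("head")
        for t in transactions:                 # scan 2: build the tree
            ordered = sorted((i for i in t if i in freq),
                             key=lambda i: (-freq[i], i))
            node = root
            for item in ordered:
                if item not in node.children:
                    node.children[item] = FPNode(item, node)
                    header[item].append(node.children[item])
                node = node.children[item]
                node.count += 1
        return root, header

    db = [set("facdgimp"), set("abcflmo"), set("bfhjo"), set("bcksp"), set("afcelpmn")]
    root, header = build_fp_tree(db, min_count=3)
    print({i: sum(n.count for n in ns) for i, ns in header.items()})
    # {'f': 4, 'c': 4, 'a': 3, 'b': 3, 'm': 3, 'p': 3}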

Benefits of the FP-tree Structure

• Completeness:
– never breaks a long pattern of any transaction
– preserves complete information for frequent pattern mining
• Compactness
– reduces irrelevant information: infrequent items are gone
– frequency-descending ordering: more frequent items are more likely to be shared
– never larger than the original database (not counting node-links and counts)
– Example: for the Connect-4 DB, the compression ratio can be over 100

Mining Frequent Patterns Using FP-tree

• General idea (divide-and-conquer)


– Recursively grow frequent pattern path using the FP-tree
• Method
– For each item, construct its conditional pattern-base, and then its conditional FP-tree
– Repeat the process on each newly created conditional FP-tree
– Until the resulting FP-tree is empty, or it contains only one path (single path will
generate all the combinations of its sub-paths, each of which is a frequent
pattern)
Major Steps to Mine FP-tree

1) Construct conditional pattern base for each node in the FP-tree


2) Construct conditional FP-tree from each conditional pattern-base
3) Recursively mine conditional FP-trees and grow frequent patterns obtained so far
– If the conditional FP-tree contains a single path, simply enumerate all the patterns
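
Continuing the sketch above, a conditional pattern base is collected by walking each node-link up to the root (illustrative code, not from the notes; it reuses the `header` built earlier):

    def conditional_pattern_base(item, header):
        """For each node tagged `item`, walk up to the root and emit the
        prefix path together with that node's count."""
        base = []
        for node in header[item]:
            path, parent = [], node.parent
            while parent is not None and parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                base.append((path[::-1], node.count))
        return base

    print(conditional_pattern_base("m", header))
    # [(['c', 'f', 'a'], 2), (['c', 'f', 'a', 'b'], 1)] with the
    # alphabetical tie-break used above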

Principles of Frequent Pattern Growth

• Pattern growth property


– Let α be a frequent itemset in DB, B be α's conditional pattern base, and β be an itemset in B. Then α ∪ β is a frequent itemset in DB iff β is frequent in B.
• “abcdef” is a frequent pattern, if and only if
– “abcde” is a frequent pattern, and
– “f” is frequent in the set of transactions containing “abcde”

Why Is Frequent Pattern Growth Fast?

• Our performance study shows


– FP-growth is an order of magnitude faster than Apriori, and is also faster than tree-projection
• Reasoning
– No candidate generation, no candidate test
– Use compact data structure
– Eliminate repeated database scan
– Basic operations are counting and FP-tree building

Mining multilevel association rules from transactional databases.

• Items often form hierarchies.


• Items at the lower level are expected to have lower support.
• Rules regarding itemsets at appropriate levels could be quite useful.

• Transaction database can be encoded based on dimensions and levels


• We can explore shared multi-level mining

Figure: a concept hierarchy for items. Food splits into Milk and Bread; Milk into Skim and 2%; Bread into Wheat and White; brands such as Fraser and Sunset sit at the lowest level.

TID | Items
T1 | {111, 121, 211, 221}
T2 | {111, 211, 222, 323}
T3 | {112, 122, 221, 411}
T4 | {111, 121}
T5 | {111, 122, 211, 221, 413}

Mining Multi-Level Associations

• A top-down, progressive deepening approach:
– First find high-level strong rules: milk → bread [20%, 60%].
– Then find their lower-level “weaker” rules: 2% milk → wheat bread [6%, 50%].

• Variations at mining multiple-level association rules:
– Level-crossed association rules: 2% milk → Wonder wheat bread
– Association rules with multiple, alternative hierarchies: 2% milk → Wonder bread

Multi-level Association: Uniform Support vs. Reduced Support

• Uniform Support: the same minimum support for all levels


– + One minimum support threshold. No need to examine itemsets containing any item whose ancestors do not have minimum support.
– − Lower-level items do not occur as frequently. If the support threshold is
• too high → miss low-level associations
• too low → generate too many high-level associations
• Reduced Support: reduced minimum support at lower levels (see the sketch after this list)
– There are 4 search strategies:
• Level-by-level independent
• Level-cross filtering by k-itemset
• Level-cross filtering by single item
• Controlled level-cross filtering by single item
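
A minimal sketch of reduced support with level-cross filtering by single item, using the encoded transactions above: the first digit of an item code is its level-1 category, the first two digits its level-2 group (the thresholds and helper names are mine, not from the notes):

    db = [
        {"111", "121", "211", "221"},
        {"111", "211", "222", "323"},
        {"112", "122", "221", "411"},
        {"111", "121"},
        {"111", "122", "211", "221", "413"},
    ]

    def support(prefix, db):
        """Support of a generalized item: any encoded item starting with prefix."""
        return sum(any(i.startswith(prefix) for i in t) for t in db) / len(db)

    min_sup = {1: 0.6, 2: 0.4}          # reduced support at the lower level
    frequent = {}
    for level in (1, 2):
        prefixes = {i[:level] for t in db for i in t}
        if level > 1:                   # level-cross filtering by single item:
            # only examine items whose parent at the higher level is frequent
            prefixes = {p for p in prefixes if p[:-1] in frequent[level - 1]}
        frequent[level] = {p for p in prefixes if support(p, db) >= min_sup[level]}
        print(level, sorted(frequent[level]))
    # 1 ['1', '2']
    # 2 ['11', '12', '21', '22']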

Multi-level Association: Redundancy Filtering

• Some rules may be redundant due to “ancestor” relationships between items.


• Example
– milk → wheat bread [support = 8%, confidence = 70%]
– 2% milk → wheat bread [support = 2%, confidence = 72%]
• We say the first rule is an ancestor of the second rule.
• A rule is redundant if its support is close to the “expected” value, based on the rule's ancestor. A sketch of this test follows.
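
A sketch on the example above, assuming 2% milk accounts for about a quarter of milk purchases (that share and the 10% tolerance are illustrative assumptions, not from the notes):

    ancestor   = {"support": 0.08, "confidence": 0.70}  # milk -> wheat bread
    descendant = {"support": 0.02, "confidence": 0.72}  # 2% milk -> wheat bread
    child_share = 0.25        # P(2% milk | milk), estimated from the data

    expected_support = ancestor["support"] * child_share  # 0.02
    tolerance = 0.10          # "close to expected" within 10% (illustrative)

    redundant = (abs(descendant["support"] - expected_support)
                 <= tolerance * expected_support
                 and abs(descendant["confidence"] - ancestor["confidence"])
                 <= tolerance * ancestor["confidence"])
    print(redundant)  # True: the 2% milk rule adds no new information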

Multi-Level Mining: Progressive Deepening

• A top-down, progressive deepening approach:


– First mine high-level frequent items: milk (15%), bread (10%)
– Then mine their lower-level “weaker” frequent itemsets: 2% milk (5%), wheat bread (4%)

• Different min_support thresholds across multi-levels lead to different algorithms:
– If adopting the same min_support across multi-levels, then toss itemset t if any of t's ancestors is infrequent.
– If adopting reduced min_support at lower levels, then examine only those descendants whose ancestors' support is frequent/non-negligible.

Correlation in detail.

• Interest (correlation, lift)
– takes both P(A) and P(B) into consideration
– interest(A, B) = P(A ∧ B) / (P(A) × P(B)); note P(A ∧ B) = P(A) × P(B) if A and B are independent events
– A and B are negatively correlated if the value is less than 1; otherwise A and B are positively correlated

X: 1 1 1 1 0 0 0 0
Y: 1 1 0 0 0 0 0 0
Z: 0 1 1 1 1 1 1 1

Itemset | Support | Interest
X,Y | 25% | 2
X,Z | 37.50% | 0.9
Y,Z | 12.50% | 0.57
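
These interest values can be checked directly with a short Python sketch (variable names are mine):

    X = [1, 1, 1, 1, 0, 0, 0, 0]
    Y = [1, 1, 0, 0, 0, 0, 0, 0]
    Z = [0, 1, 1, 1, 1, 1, 1, 1]

    def lift(a, b):
        n = len(a)
        p_a = sum(a) / n
        p_b = sum(b) / n
        p_ab = sum(x and y for x, y in zip(a, b)) / n
        return p_ab / (p_a * p_b)

    print(lift(X, Y))  # 2.0   -> positively correlated
    print(lift(X, Z))  # ~0.86 -> negatively correlated (table rounds to 0.9)
    print(lift(Y, Z))  # ~0.57 -> negatively correlated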
χ2 Correlation

• χ2 (chi-square) measures correlation between categorical attributes

χ² = Σ (observed − expected)² / expected

 | game | not game | sum(row)
video | 4000 (4500) | 3500 (3000) | 7500
not video | 2000 (1500) | 500 (1000) | 2500
sum(col.) | 6000 | 4000 | 10000
• expected(i, j) = count(row i) × count(column j) / N
• χ² = (4000 − 4500)²/4500 + (3500 − 3000)²/3000 + (2000 − 1500)²/1500 + (500 − 1000)²/1000 = 555.6
• Since χ² is large and the observed count for (game, video), 4000, is below its expected count, 4500, buying game and buying video are negatively correlated.

Numeric correlation

• Correlation concept in statistics


– Used to study the relationship existing between 2 or more numeric variables
– A correlation is a measure of the linear relationship between variables
Ex: number of hours spent studying in a class vs. the grade received
– Outcomes:
• positively related
• not related
• negatively related
– Statistical relationships
• Covariance
• Correlation coefficient
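
A minimal sketch of both statistics on made-up study-hours data (the numbers are illustrative only):

    import math

    hours  = [2, 4, 6, 8, 10]
    grades = [55, 62, 70, 79, 88]

    def covariance(x, y):
        mx, my = sum(x) / len(x), sum(y) / len(y)
        return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)

    def correlation(x, y):
        # Pearson coefficient: covariance normalized by both std deviations
        return covariance(x, y) / (math.sqrt(covariance(x, x)) *
                                   math.sqrt(covariance(y, y)))

    print(correlation(hours, grades))  # ~0.999 -> strongly positively related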

Constraint-Based Mining in detail.

• Interactive, exploratory mining of gigabytes of data?
– Could it be real? Yes, by making good use of constraints!
• What kinds of constraints can be used in mining?
– Knowledge type constraint: classification, association, etc.
– Data constraint: SQL-like queries
• Find product pairs sold together in Vancouver in Dec.’98.
– Dimension/level constraints:
• in relevance to region, price, brand, customer category.
– Rule constraints
• small sales (price < $10) triggers big sales (sum > $200).
– Interestingness constraints:
• strong rules (min_support ≥ 3%, min_confidence ≥ 60%).

Rule Constraints in Association Mining

• Two kinds of rule constraints:
– Rule form constraints: meta-rule guided mining.
• P(x, y) ∧ Q(x, w) → takes(x, “database systems”).
– Rule (content) constraints: constraint-based query optimization (Ng, et al., SIGMOD’98).
• sum(LHS) < 100 ∧ min(LHS) > 20 ∧ count(LHS) > 3 ∧ sum(RHS) > 1000
• 1-variable vs. 2-variable constraints (Lakshmanan, et al., SIGMOD’99):
– 1-var: a constraint confining only one side (L/R) of the rule, e.g., as shown above.
– 2-var: a constraint confining both sides (L and R).
• sum(LHS) < min(RHS) ∧ max(RHS) < 5 × sum(LHS)

Constraint-Based Association Query

• Database: (1) trans (TID, Itemset ), (2) itemInfo (Item, Type, Price)
• A constrained association query (CAQ) is in the form {(S1, S2) | C},
– where C is a set of constraints on S1, S2, including a frequency constraint
• A classification of (single-variable) constraints:
– Class constraint: S ⊆ A. e.g. S ⊆ Item
– Domain constraint:
• S θ v, θ ∈ {=, ≠, <, ≤, >, ≥}. e.g. S.Price < 100
• v θ S, θ is ∈ or ∉. e.g. snacks ∉ S.Type
• V θ S, or S θ V, θ ∈ {⊆, ⊂, ⊄, =, ≠}
– e.g. {snacks, sodas} ⊆ S.Type
– Aggregation constraint: agg(S) θ v, where agg ∈ {min, max, sum, count, avg} and θ ∈ {=, ≠, <, ≤, >, ≥}.
• e.g. count(S1.Type) = 1, avg(S2.Price) < 100
Constrained Association Query Optimization Problem

• Given a CAQ = {(S1, S2) | C}, the algorithm should be:


– Sound: it only finds frequent sets that satisfy the given constraints C
– Complete: all frequent sets that satisfy the given constraints C are found
• A naïve solution:
– Apply Apriori to find all frequent sets, and then test them for constraint satisfaction one by one.
• Our approach:
– Comprehensively analyze the properties of constraints and try to push them as deeply as possible inside the frequent set computation.
Categories of Constraints.

1. Anti-monotone and Monotone Constraints


• A constraint Ca is anti-monotone iff, for any pattern S not satisfying Ca, none of the super-patterns of S can satisfy Ca
• A constraint Cm is monotone iff, for any pattern S satisfying Cm, every super-pattern of S also satisfies it

2. Succinct Constraint
• A subset of items Is ⊆ I is a succinct set if it can be expressed as σp(I) for some selection predicate p, where σ is the selection operator
• SP ⊆ 2^I is a succinct power set if there is a fixed number of succinct sets I1, …, Ik ⊆ I such that SP can be expressed in terms of the strict power sets of I1, …, Ik using union and minus
• A constraint Cs is succinct provided SATCs(I) is a succinct power set

3. Convertible Constraint
• Suppose all items in patterns are listed in a total order R
• A constraint C is convertible anti-monotone iff a pattern S satisfying the constraint implies
that each suffix of S w.r.t. R also satisfies C
• A constraint C is convertible monotone iff a pattern S satisfying the constraint implies
that each pattern of which S is a suffix w.r.t. R also satisfies C
Property of Constraints: Anti-Monotone

• Anti-monotonicity: if a set S violates the constraint, any superset of S violates the constraint.


• Examples:
– sum(S.Price) ≤ v is anti-monotone
– sum(S.Price) ≥ v is not anti-monotone
– sum(S.Price) = v is partly anti-monotone
• Application:
– Push “sum(S.Price) ≤ 1000” deeply into the iterative frequent set computation, as sketched below.
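
A minimal sketch of this pruning, with an illustrative price table (all names and values are mine, not from the notes):

    # Anti-monotone constraint sum(S.Price) <= 1000: once an itemset
    # violates it, every superset is pruned immediately, before any
    # support counting.
    price = {"TV": 800, "PC": 700, "mouse": 30, "pen": 5, "book": 60}

    def satisfies(itemset, limit=1000):
        return sum(price[i] for i in itemset) <= limit

    # During candidate generation, drop violating candidates on the spot:
    candidates = [{"mouse", "pen"}, {"TV", "PC"}, {"pen", "book"}]
    survivors = [c for c in candidates if satisfies(c)]
    print(survivors)
    # {'TV', 'PC'} (sum 1500) is pruned; no superset of it is ever
    # generated or counted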

Example of Convertible Constraints: avg(S) ≥ v

• Let R be the value descending order over the set of items


– E.g. I={9, 8, 6, 4, 3, 1}
• avg(S) ≥ v is convertible monotone w.r.t. R
– If S is a suffix of S1, then avg(S1) ≥ avg(S)
• {8, 4, 3} is a suffix of {9, 8, 4, 3}
• avg({9, 8, 4, 3}) = 6 ≥ avg({8, 4, 3}) = 5
– If S satisfies avg(S) ≥ v, so does S1
• {8, 4, 3} satisfies the constraint avg(S) ≥ 4, so does {9, 8, 4, 3}; the sketch below checks this
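
A small check of this argument in Python, with the values from the example above:

    # With items listed in value-descending order R, any pattern of which
    # a satisfying suffix S is a suffix also satisfies avg(S) >= v.
    R = [9, 8, 6, 4, 3, 1]          # value-descending order

    def avg(s):
        return sum(s) / len(s)

    S  = [8, 4, 3]                  # a suffix w.r.t. R
    S1 = [9, 8, 4, 3]               # S is a suffix of S1
    v = 4
    assert avg(S) >= v              # 5 >= 4
    assert avg(S1) >= avg(S) >= v   # 6 >= 5 >= 4: extending toward larger
                                    # items can only raise the average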

Property of Constraints: Succinctness

• Succinctness:
– For any sets S1 and S2 satisfying C, S1 ∪ S2 satisfies C
– Given A1, the set of size-1 itemsets satisfying C, any set S satisfying C is based on A1, i.e., S contains a subset belonging to A1
• Examples:
– sum(S.Price) ≥ v is not succinct
– min(S.Price) ≤ v is succinct
• Optimization:
– If C is succinct, then C is pre-counting prunable: the satisfaction of the constraint alone is not affected by the iterative support counting. A sketch follows.
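
A sketch of why succinctness allows enumeration before any support counting, with an illustrative price table and v = 50 (both are my assumptions, not from the notes):

    # A succinct constraint such as min(S.Price) <= v lets us characterize
    # satisfying itemsets directly: every satisfying set must contain at
    # least one "cheap" item, i.e., one item from sigma_p(I).
    price = {"TV": 800, "PC": 700, "mouse": 30, "pen": 5, "book": 60}
    v = 50

    cheap = {i for i in price if price[i] <= v}   # the selection sigma_p(I)
    print(cheap)  # {'mouse', 'pen'}

    def satisfies(S):
        # equivalent to min(S.Price) <= v, decided without support counting
        return bool(set(S) & cheap)

    print(satisfies({"TV", "pen"}))   # True
    print(satisfies({"TV", "PC"}))    # False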
