The document discusses frequent itemsets and their significance in market basket analysis, where sets of items frequently purchased together are identified. It outlines the Apriori algorithm for generating candidate itemsets and discovering association rules, as well as techniques for handling large datasets efficiently. Additionally, it introduces clustering techniques and the concept of counting frequent items in a data stream.

Unit - IV

FREQUENT ITEMSETS
• Frequent Itemsets:
– A set of items that appears in many baskets is said to be “frequent.”
– We assume there is a number s, called the support threshold: an itemset is frequent if its support (the number of baskets in which it appears) is at least s.
• Example: I = {1, 2, 3}
• I1 = {2, 3} is one of its subsets (itemsets).
Example
• Sets of words: each set is a basket, and the words are items.
• 1. {Cat, and, dog, bites}
• 2. {Yahoo, news, claims, a, cat, mated, with, a, dog, and, produced, viable, offspring}
• 3. {Cat, killer, likely, is, a, big, dog}
• 4. {Professional, free, advice, on, dog, training, puppy, training}
• 5. {Cat, and, kitten, training, and, behavior}
• 6. {Dog, &, Cat, provides, dog, training, in, Eugene, Oregon}
• 7. {“Dog, and, cat”, is, a, slang, term, used, by, police, officers, for, a, male–female, relationship}
• 8. {Shop, for, your, show, dog, grooming, and, pet, supplies}
Example, Cont.:

Fig. 1: Occurrences of doubletons (figure omitted)
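A minimal Python sketch, not from the slides, of how doubleton counts like those in Fig. 1 can be computed; the lower-casing and the dropping of punctuation such as "&" are assumptions about how the words are normalized:

from itertools import combinations
from collections import Counter

# The eight word baskets from the example above, lower-cased.
baskets = [
    {"cat", "and", "dog", "bites"},
    {"yahoo", "news", "claims", "a", "cat", "mated", "with", "dog", "and",
     "produced", "viable", "offspring"},
    {"cat", "killer", "likely", "is", "a", "big", "dog"},
    {"professional", "free", "advice", "on", "dog", "training", "puppy"},
    {"cat", "and", "kitten", "training", "behavior"},
    {"dog", "cat", "provides", "training", "in", "eugene", "oregon"},
    {"dog", "and", "cat", "is", "a", "slang", "term", "used", "by",
     "police", "officers", "for", "male-female", "relationship"},
    {"shop", "for", "your", "show", "dog", "grooming", "and", "pet", "supplies"},
]

# Count how many baskets contain each unordered pair of words (doubleton).
pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# With a support threshold of s = 3, keep only the frequent doubletons.
print({p: c for p, c in pair_counts.items() if c >= 3})

Running this prints the five frequent doubletons, e.g. ('cat', 'dog') with count 5 and ('and', 'cat') with count 4, which is the information Fig. 1 tabulates.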


Market Basket Model
• A large set of items
– E.g., the items sold in a supermarket
• A large set of baskets, each a small subset of the items
– E.g., the things one customer buys in one day
• The model can be used for many other many-to-many relationships
Association Rule
• Given a set of baskets
• We want to discover
– Association rules:
• People who bought {X, Y, Z} tend to buy {V, W}

• 2-step approach:
– Find the frequent itemsets
– Generate the association rules
Applications
• Items – products
• Baskets – the set of products someone bought in one trip to the store
• Real-world baskets:
– Chain stores keep terabytes of data about what customers buy together.
– E.g., run a sale on diapers, raise the price of other baby products.
Apriori Algorithm
• The idea is to generate candidate itemsets of a given size and then scan the dataset to check whether their counts really reach the threshold; the process is iterative.
– All singletons are candidates in the first pass; any item with support below the specified support value is eliminated.
– Two-member candidate itemsets are built from the surviving singletons.
– Three-member candidate itemsets are built from the surviving pairs, and so on.
• The itemsets that survive every pass constitute the set of frequent itemsets.
• From these, generate the association rules whose confidence is greater than or equal to the specified minimum confidence (see the worked example and the sketch below).
Transactions:

Tid   Items
1     X, Y, Z
2     X, Z
3     X, W
4     Y, A, B

Singleton supports (min support = 2):

Item   Support
X      3
Y      2
Z      2
W      1  (eliminated)
A      1  (eliminated)
B      1  (eliminated)

Doubleton supports:

Items   Support
X,Y     1  (eliminated)
Y,Z     1  (eliminated)
X,Z     2

Frequent itemsets:

Itemset   Support
X,Z       2

Association rules:

Rule     Support   Confidence    Confidence %
X -> Z   2         2/3 = 0.66    66%
Z -> X   2         2/2 = 1       100%
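A minimal sketch of the two Apriori passes on this worked example; the helper names are my own, not from the slides:

from itertools import combinations

transactions = [
    {"X", "Y", "Z"},   # Tid 1
    {"X", "Z"},        # Tid 2
    {"X", "W"},        # Tid 3
    {"Y", "A", "B"},   # Tid 4
]
min_support = 2

def support(itemset):
    # Number of transactions that contain every item of `itemset`.
    return sum(itemset <= t for t in transactions)

# Pass 1: frequent singletons (W, A, B fall below min_support).
items = set().union(*transactions)
L1 = {i for i in items if support({i}) >= min_support}       # {'X', 'Y', 'Z'}

# Pass 2: candidate doubletons built only from frequent singletons.
L2 = [p for p in combinations(sorted(L1), 2) if support(set(p)) >= min_support]
print(L2)                                                     # [('X', 'Z')]

# Rule confidence: conf(A -> B) = support(A and B together) / support(A).
for a, b in [("X", "Z"), ("Z", "X")]:
    conf = support({a, b}) / support({a})
    print(f"{a} -> {b}: support={support({a, b})}, confidence={conf:.2f}")

This reproduces the table above: X -> Z has confidence 2/3 and Z -> X has confidence 2/2 = 1.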
HANDLING LARGE DATASETS IN MAIN MEMORY
• Park, Chen, and Yu (PCY) Algorithm
• On the first pass, besides counting individual items, hash every pair in each basket to a bucket and increment that bucket's count.
• Example: transaction T1 = {1, 3, 4, 5} generates the pairs {1,3} {1,4} {1,5} {3,4} {3,5} {4,5}; transaction T2 = {2, 3, 4} generates {2,3} {2,4} {3,4}; and so on for all transactions. Each pair is hashed to one of the buckets below and the bucket count is incremented:

Bucket   1    2    3    4    5
Count    4   12    0    0    0

• Memory organization in the Apriori algorithm vs. the memory map for the PCY algorithm: on the second pass, PCY replaces the pass-1 bucket counts with a bitmap of frequent buckets, freeing memory for the pair counts (diagrams omitted).
• On the second pass, a pair is counted only if both of its items are frequent and the pair hashes to a bucket whose count reached the support threshold.
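A minimal two-pass PCY sketch; the tiny five-bucket hash table and the basket data are illustrative assumptions, not values from the slides:

from itertools import combinations
from collections import Counter

def pcy_pass1(baskets, num_buckets, support):
    # First pass: count singletons, and hash every pair to a bucket.
    item_counts = Counter()
    bucket_counts = [0] * num_buckets
    for basket in baskets:
        for item in basket:
            item_counts[item] += 1
        for pair in combinations(sorted(basket), 2):
            bucket_counts[hash(pair) % num_buckets] += 1
    frequent_items = {i for i, c in item_counts.items() if c >= support}
    # Only this bitmap of frequent buckets is kept for the second pass.
    bitmap = [c >= support for c in bucket_counts]
    return frequent_items, bitmap

def pcy_pass2(baskets, num_buckets, support, frequent_items, bitmap):
    # Second pass: count a pair only if both items are frequent AND the
    # pair hashes to a frequent bucket.
    pair_counts = Counter()
    for basket in baskets:
        candidates = sorted(i for i in basket if i in frequent_items)
        for pair in combinations(candidates, 2):
            if bitmap[hash(pair) % num_buckets]:
                pair_counts[pair] += 1
    return {p: c for p, c in pair_counts.items() if c >= support}

baskets = [{1, 3, 4, 5}, {2, 3, 4}, {1, 3, 5}, {1, 3, 4}]
items, bitmap = pcy_pass1(baskets, num_buckets=5, support=3)
print(pcy_pass2(baskets, 5, 3, items, bitmap))   # {(1, 3): 3, (3, 4): 3}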
Multistage PCY Algorithm
• Multistage PCY adds a further pass between the two PCY passes: pairs that hashed to a frequent bucket on the first pass are rehashed with a second, independent hash function, and a pair must fall into a frequent bucket under both hash functions to be counted on the final pass.
Example:
Items: {A, B, C, D}
Transactions: 8
Support: 3
• A bucket's count can exceed s even though no single pair in it has count > s, because several infrequent pairs may share a bucket; the extra stage weeds out many such false candidates.
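A minimal sketch of the intermediate multistage pass that builds the second bitmap; the hash functions, bucket count, and parameter names are assumptions:

from itertools import combinations

NUM_BUCKETS = 7
def h1(pair): return hash(pair) % NUM_BUCKETS
def h2(pair): return hash((pair[1], pair[0])) % NUM_BUCKETS   # a different hash

def second_bitmap(baskets, frequent_items, bitmap1, support):
    # Rehash only pairs of frequent items that h1 sent to a frequent
    # bucket; every other pair is already ruled out.
    counts = [0] * NUM_BUCKETS
    for basket in baskets:
        candidates = sorted(i for i in basket if i in frequent_items)
        for pair in combinations(candidates, 2):
            if bitmap1[h1(pair)]:
                counts[h2(pair)] += 1
    return [c >= support for c in counts]

On the final counting pass a pair is counted only if both items are frequent, bitmap1[h1(pair)] is set, and the second bitmap entry for h2(pair) is set.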
Limited-Pass Algorithms
 Finding all frequent itemsets:
 Repeatedly read small subsets of the baskets into main memory and run the simple in-memory algorithm on each subset (see the sketch below).
 An itemset becomes a candidate if it is found to be frequent in any one or more subsets of the baskets.
 On the second pass, count all the candidate itemsets and determine which are frequent in the entire set.
 The key is monotonicity: an itemset cannot be frequent in the entire set of baskets unless it is frequent in at least one subset.
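A minimal sketch of this two-pass idea, which is also the heart of the SON algorithm described next; the chunking scheme, the pair-only in-memory algorithm, and the helper names are assumptions:

from itertools import combinations
from collections import Counter

def frequent_in_memory(baskets, support):
    # The "simple algorithm": exact singleton and pair counting on a chunk.
    counts = Counter()
    for b in baskets:
        for item in b:
            counts[frozenset([item])] += 1
        for pair in combinations(sorted(b), 2):
            counts[frozenset(pair)] += 1
    return {s for s, c in counts.items() if c >= support}

def limited_pass(baskets, support, num_chunks):
    chunk_size = (len(baskets) + num_chunks - 1) // num_chunks
    # Pass 1: candidates = itemsets frequent in at least one chunk,
    # using a proportionally scaled-down threshold.
    candidates = set()
    for k in range(0, len(baskets), chunk_size):
        chunk = baskets[k:k + chunk_size]
        candidates |= frequent_in_memory(chunk, max(1, support // num_chunks))
    # Pass 2: count every candidate over the full dataset.
    totals = {cand: sum(cand <= b for b in baskets) for cand in candidates}
    return {cand: c for cand, c in totals.items() if c >= support}

baskets = [{"X", "Y"}, {"X", "Z"}, {"X", "Y"}, {"Y", "Z"}]
print(limited_pass(baskets, support=2, num_chunks=2))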
Savasere, Omiecinski, and Navathe Algorithm (SON)
• SON divides the input file into chunks, finds the itemsets frequent in each chunk (with a proportionally scaled threshold), and then makes a complete second pass counting exactly the union of those candidates.
• SON – MapReduce: the first Map/Reduce round emits the candidate itemsets found in each chunk; the second round counts each candidate over all chunks and sums the counts.
• False negatives: by monotonicity, an itemset frequent in the whole file must be frequent in at least one chunk, so SON produces neither false negatives nor false positives.
COUNTING FREQUENT ITEMS IN A STREAM
• If we use a decaying window with constant c, then we can start counting an item whenever we see it in a basket. We start counting an itemset if we see it contained within the current basket and all its immediate proper subsets are already being counted. As the window decays, we multiply all counts by 1 − c and eliminate those that fall below 1/2.
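A minimal sketch of this decaying-window count for single items; the constant c and the example stream are assumptions, and extending it to itemsets would add the subset check described above:

c = 1e-2           # decay constant of the window
scores = {}        # item -> decayed count

def process_basket(basket):
    # Multiply every existing count by (1 - c) and drop small ones.
    for item in list(scores):
        scores[item] *= (1 - c)
        if scores[item] < 0.5:     # eliminate counts below 1/2
            del scores[item]
    # Add 1 for each item in the current basket; an unseen item starts
    # being counted the first time it appears.
    for item in basket:
        scores[item] = scores.get(item, 0.0) + 1.0

stream = [{"cat", "dog"}, {"dog"}, {"cat", "kitten"}, {"dog"}]
for basket in stream:
    process_basket(basket)
print(scores)      # the largest decayed counts mark the currently frequent items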
CLUSTERING TECHNIQUES
• Clustering Definition: clustering is the process of examining a collection of points and grouping them into clusters, so that points in the same cluster are close to (similar to) one another while points in different clusters are far apart.
CLUSTERING PROCESS
Hierarchical Agglomerative Clustering
• Start with every point in its own cluster; repeatedly merge the two closest clusters until some stopping criterion, e.g. a desired number of clusters, is met (see the sketch below).
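A minimal hierarchical agglomerative clustering sketch; the 2-D points, the centroid-distance merge criterion, and the stopping value k are illustrative assumptions:

import math

points = [(1, 1), (1.5, 1), (5, 5), (5, 6), (9, 1)]
clusters = [[p] for p in points]           # start: every point is a cluster

def centroid(cluster):
    n = len(cluster)
    return (sum(x for x, _ in cluster) / n, sum(y for _, y in cluster) / n)

def dist(a, b):
    # Euclidean distance between the centroids of two clusters.
    (x1, y1), (x2, y2) = centroid(a), centroid(b)
    return math.hypot(x1 - x2, y1 - y2)

k = 3                                      # stop when k clusters remain
while len(clusters) > k:
    # Find and merge the two closest clusters.
    i, j = min(((i, j) for i in range(len(clusters))
                        for j in range(i + 1, len(clusters))),
               key=lambda ij: dist(clusters[ij[0]], clusters[ij[1]]))
    clusters[i] += clusters.pop(j)

print(clusters)   # [[(1, 1), (1.5, 1)], [(5, 5), (5, 6)], [(9, 1)]]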
