Answer To Assignment 3

This document provides the details of an assignment involving market basket analysis of transaction data. The key steps are: 1) The document identifies all frequent itemsets and sequential patterns in the transaction data using the Apriori algorithm, finding itemsets and sequences that meet minimum support thresholds. 2) It derives association rules from the frequent itemsets that meet minimum confidence thresholds. 3) Based on the analysis, a recommendation is made to store management to place certain items near each other to encourage customers to purchase them together.

Uploaded by

lastindor

ACCTG 6910, Spring 2003

DESB, University of Utah


Assignment 3 (3/27 – 4/8)

Question 1 (50 points): Given the following transactions, a minimum support of 50%,
and a minimum confidence of 80%, identify the large itemsets, sequential patterns,
rules, and lifts, and recommend some management decisions.

TID Brand_Item_bought
100 King’s-Crab, Sunset-Milk, Dairyland-Cheese, Best-Bread
200 Best-Cheese, Dairyland-Milk, Goldenfarm-Apple, Tasty-Pie, Wonder-Bread
300 Westcoast-Apple, Dairyland-Milk, Wonder-Bread, Tasty-Pie
400 Wonder-Bread, Sunset-Milk, Dairyland-Cheese

a) At the granularity of item without brand (e.g., “milk” and “bread”), please identify all
large itemsets using the Apriori algorithm. Be sure to include all steps in Apriori, i.e.,
Large (k-1)-itemset → Candidate k-itemset (Join, Prune) → Large k-itemset.

Step 1: Identify all large 1-itemsets


{Apple} 2/4 = 50%
{Bread} 4/4 = 100%
{Cheese} 3/4 = 75%
{Milk} 4/4 = 100%
{Pie} 2/4 = 50%

Step 2: Generate Candidate 2-itemsets by join


{Apple, Bread} {Apple, Cheese} {Apple, Milk} {Apple, Pie}
{Bread, Cheese} {Bread, Milk} {Bread, Pie}
{Cheese, Milk} {Cheese, Pie}
{Milk, Pie}

Step 3: Identify large 2-itemsets


{Apple, Bread} 2/4 = 50%
{Apple, Milk} 2/4 = 50%
{Apple, Pie} 2/4 = 50%
{Bread, Cheese} 3/4 = 75%
{Bread, Milk} 4/4 = 100%
{Bread, Pie} 2/4 = 50%
{Cheese, Milk} 3/4 = 75%
{Milk, Pie} 2/4 = 50%
Step 4: Generate candidate 3-itemsets by join
{Apple, Bread, Milk} {Apple, Bread, Pie} {Apple, Milk, Pie}
{Bread, Cheese, Milk} {Bread, Cheese, Pie} {Bread, Milk, Pie}

Step 5: Prune candidate 3-itemsets


{Apple, Bread, Milk} {Apple, Bread, Pie} {Apple, Milk, Pie}
{Bread, Cheese, Milk} {Bread, Milk, Pie}

{Bread, Cheese, Pie} is pruned because its subset {Cheese, Pie} is not a large 2-itemset.
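
The join and prune of Steps 4 and 5 can be sketched in Python as follows. This is a minimal illustration, assuming each itemset is stored as an alphabetically sorted tuple; the helper names `join` and `prune` are introduced here for illustration and are not part of the assignment.

```python
from itertools import combinations

# Large 2-itemsets from Step 3, each stored as a sorted tuple (an assumption
# made here so that the prefix comparison in join() is well defined).
large_2 = [
    ("Apple", "Bread"), ("Apple", "Milk"), ("Apple", "Pie"),
    ("Bread", "Cheese"), ("Bread", "Milk"), ("Bread", "Pie"),
    ("Cheese", "Milk"), ("Milk", "Pie"),
]


def join(large_prev):
    """Join two sorted (k-1)-itemsets that agree on their first k-2 items."""
    out = []
    for a, b in combinations(sorted(large_prev), 2):
        if a[:-1] == b[:-1]:
            out.append(a + (b[-1],))
    return out


def prune(candidates, large_prev):
    """Keep only candidates whose every (k-1)-subset is large."""
    prev = set(large_prev)
    return [
        c for c in candidates
        if all(sub in prev for sub in combinations(c, len(c) - 1))
    ]


candidates_3 = join(large_2)
pruned_3 = prune(candidates_3, large_2)
print(candidates_3)
print(pruned_3)
```

With the Step 3 itemsets as input, the join yields the six candidates of Step 4, and the prune drops only {Bread, Cheese, Pie}.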

Step 6: Identify Large 3-itemsets


{Apple, Bread, Milk} 2/4 = 50%
{Apple, Bread, Pie} 2/4 = 50%
{Apple, Milk, Pie} 2/4 = 50%
{Bread, Cheese, Milk} 3/4 = 75%
{Bread, Milk, Pie} 2/4 = 50%

Step 7: Generate candidate 4-itemsets by join


{Apple, Bread, Milk, Pie}

Step 8: Prune candidate 4-itemsets


{Apple, Bread, Milk, Pie}

Step 9: Identify Large 4-itemsets


{Apple, Bread, Milk, Pie} 2/4 = 50%
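
As a cross-check of Steps 1 to 9, the whole level-wise loop can be sketched in a few lines of Python. This is only a sketch under stated assumptions: the function names are illustrative, and the join used here is a simplified union-based join, which may produce a few more candidates before pruning than the prefix join used above. The prune step removes them, so the large itemsets should be the same.

```python
from itertools import combinations

# Item-level transactions from Question 1 (brands dropped).
transactions = [
    {"Crab", "Milk", "Cheese", "Bread"},          # TID 100
    {"Cheese", "Milk", "Apple", "Pie", "Bread"},  # TID 200
    {"Apple", "Milk", "Bread", "Pie"},            # TID 300
    {"Bread", "Milk", "Cheese"},                  # TID 400
]
MIN_SUPPORT = 0.5


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def apriori(transactions, min_support):
    """Return {frozenset: support} for all large itemsets, level by level."""
    items = sorted({item for t in transactions for item in t})
    level = [frozenset([i]) for i in items]
    level = [s for s in level if support(s, transactions) >= min_support]
    large = {}
    k = 1
    while level:
        for s in level:
            large[s] = support(s, transactions)
        k += 1
        prev = set(level)
        # Join (simplified): union pairs of large (k-1)-itemsets of size k.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune: drop candidates with any (k-1)-subset that is not large.
        candidates = [
            c for c in candidates
            if all(frozenset(sub) in prev for sub in combinations(c, k - 1))
        ]
        level = [c for c in candidates if support(c, transactions) >= min_support]
    return large


large_itemsets = apriori(transactions, MIN_SUPPORT)
ordered = sorted(large_itemsets.items(), key=lambda kv: (len(kv[0]), sorted(kv[0])))
for itemset, sup in ordered:
    print(sorted(itemset), f"{sup:.0%}")
```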

b) At the granularity of brand-item (e.g., “Sunset-Milk” and “Wonder-Bread”), please
identify all large itemsets using the Apriori algorithm. Be sure to include all steps in
Apriori, i.e., Large (k-1)-itemset → Candidate k-itemset (Join, Prune) → Large k-itemset.

Step 1: Identify all large 1-itemsets


{Dairyland-Cheese} 2/4 = 50%
{Dairyland-Milk} 2/4 = 50%
{Sunset-Milk} 2/4 = 50%
{Tasty-Pie} 2/4 = 50%
{Wonder-Bread} 3/4 = 75%

Step 2: Generate candidate 2-itemsets by join


{Dairyland-Cheese, Dairyland-Milk} {Dairyland-Cheese, Sunset-Milk}
{Dairyland-Cheese, Tasty-Pie} {Dairyland-Cheese, Wonder-Bread}
{Dairyland-Milk, Sunset-Milk} {Dairyland-Milk, Tasty-Pie}
{Dairyland-Milk, Wonder-Bread} {Sunset-Milk, Tasty-Pie}
{Sunset-Milk, Wonder-Bread} {Tasty-Pie, Wonder-Bread}

Step 3: Identify large 2-itemsets


{Dairyland-Cheese, Sunset-Milk} 2/4 = 50%
{Dairyland-Milk, Tasty-Pie} 2/4 = 50%
{Dairyland-Milk, Wonder-Bread} 2/4 = 50%
{Tasty-Pie, Wonder-Bread} 2/4 = 50%

Step 4: Generate candidate 3-itemsets by join


{Dairyland-Milk, Tasty-Pie, Wonder-Bread}

Step 5: Prune candidate 3-itemsets


{Dairyland-Milk, Tasty-Pie, Wonder-Bread}

Step 6: Identify Large 3-itemsets


{Dairyland-Milk, Tasty-Pie, Wonder-Bread} 2/4 = 50%
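
Parts a) and b) differ only in the granularity of the items. The following Python sketch shows one way to derive the item-level baskets from the brand-level ones and compare the large 1-itemsets at both granularities. It assumes that the item name is whatever follows the last hyphen; `strip_brand` and `large_1_itemsets` are illustrative helper names.

```python
from collections import Counter

# Brand-level transactions from Question 1.
transactions = {
    100: ["King's-Crab", "Sunset-Milk", "Dairyland-Cheese", "Best-Bread"],
    200: ["Best-Cheese", "Dairyland-Milk", "Goldenfarm-Apple", "Tasty-Pie", "Wonder-Bread"],
    300: ["Westcoast-Apple", "Dairyland-Milk", "Wonder-Bread", "Tasty-Pie"],
    400: ["Wonder-Bread", "Sunset-Milk", "Dairyland-Cheese"],
}


def strip_brand(brand_item):
    """'Sunset-Milk' -> 'Milk' (assumes the item follows the last hyphen)."""
    return brand_item.rsplit("-", 1)[-1]


def large_1_itemsets(baskets, min_support):
    """Support of each single item that meets min_support."""
    counts = Counter(item for basket in baskets for item in set(basket))
    n = len(baskets)
    return {item: c / n for item, c in counts.items() if c / n >= min_support}


brand_level = large_1_itemsets(list(transactions.values()), 0.5)
item_level = large_1_itemsets(
    [[strip_brand(b) for b in basket] for basket in transactions.values()], 0.5
)
print(brand_level)
print(item_level)
```

Splitting the counts across brands is what lowers the supports in part b): Milk has 100% support, while Dairyland-Milk and Sunset-Milk have 50% each.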

c) Please list all association rules (i.e., association rules that meet minimum support and
minimum confidence requirements) derived from the itemsets you derived in b) and
their supports, confidences and lifts.

Dairyland-Cheese => Sunset-Milk


support = 50% confidence = 50%/50% = 100% lift = 100%/50% = 2

Sunset-Milk => Dairyland-Cheese


support = 50% confidence = 50%/50% = 100% lift = 100%/50% = 2

Dairyland-Milk => Tasty-Pie


support = 50% confidence = 50%/50% = 100% lift = 100%/50% = 2

Tasty-Pie => Dairyland-Milk


support = 50% confidence = 50%/50% = 100% lift = 100%/50% = 2

Dairyland-Milk => Wonder-Bread


support = 50% confidence = 50%/50% = 100% lift = 100%/75% = 1.33

Tasty-Pie => Wonder-Bread


support = 50% confidence = 50%/50% = 100% lift = 100%/75% = 1.33

Dairyland-Milk ∧ Tasty-Pie => Wonder-Bread


support = 50% confidence = 50%/50% = 100% lift = 100%/75% = 1.33

Dairyland-Milk ∧ Wonder-Bread => Tasty-Pie


support = 50% confidence = 50%/50% = 100% lift = 100%/50% = 2

Tasty-Pie ∧ Wonder-Bread => Dairyland-Milk


support = 50% confidence = 50%/50% = 100% lift = 100%/50% = 2

Dairyland-Milk => Tasty-Pie ∧ Wonder-Bread


support = 50% confidence = 50%/50% = 100% lift = 100%/50% = 2

Tasty-Pie => Dairyland-Milk ∧ Wonder-Bread


support = 50% confidence = 50%/50% = 100% lift = 100%/50% = 2
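
The rules above can be re-derived with a short Python sketch. It uses confidence(X => Y) = support(X ∪ Y) / support(X) and lift(X => Y) = confidence(X => Y) / support(Y), matching the arithmetic shown for each rule. The function names are illustrative, and the list of large itemsets is copied from part b).

```python
from itertools import combinations

# Brand-level transactions from Question 1.
transactions = [
    {"King's-Crab", "Sunset-Milk", "Dairyland-Cheese", "Best-Bread"},
    {"Best-Cheese", "Dairyland-Milk", "Goldenfarm-Apple", "Tasty-Pie", "Wonder-Bread"},
    {"Westcoast-Apple", "Dairyland-Milk", "Wonder-Bread", "Tasty-Pie"},
    {"Wonder-Bread", "Sunset-Milk", "Dairyland-Cheese"},
]

# Large itemsets with two or more items, taken from part b).
large_itemsets = [
    {"Dairyland-Cheese", "Sunset-Milk"},
    {"Dairyland-Milk", "Tasty-Pie"},
    {"Dairyland-Milk", "Wonder-Bread"},
    {"Tasty-Pie", "Wonder-Bread"},
    {"Dairyland-Milk", "Tasty-Pie", "Wonder-Bread"},
]
MIN_CONFIDENCE = 0.8


def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def rules_from(itemset, min_confidence):
    """Yield (lhs, rhs, support, confidence, lift) for each qualifying rule."""
    items = sorted(itemset)
    sup = support(itemset)
    for r in range(1, len(items)):
        for lhs in combinations(items, r):
            lhs = frozenset(lhs)
            rhs = frozenset(itemset) - lhs
            conf = sup / support(lhs)
            if conf >= min_confidence:
                yield lhs, rhs, sup, conf, conf / support(rhs)


rules = [rule for s in large_itemsets for rule in rules_from(s, MIN_CONFIDENCE)]
for lhs, rhs, sup, conf, lift in rules:
    print(f"{' ^ '.join(sorted(lhs))} => {' ^ '.join(sorted(rhs))}: "
          f"support={sup:.0%} confidence={conf:.0%} lift={lift:.2f}")
```

Rules with Wonder-Bread alone on the left-hand side are filtered out, because their confidence is 50%/75% = 66.67%, below the 80% threshold.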

d) Please give one recommendation (e.g., store layout or promotion) to store
management based on the association rules and large item sets you discovered.

The store can place Tasty-Pie and Wonder-Bread near Dairyland-Milk to further
encourage customers to buy them together: every transaction that contains
Dairyland-Milk also contains both items (confidence 100%, lift 2).

Question 2 (25 points): Let the minimum support be 60% when you derive large
sequences from the following transaction database.

Customer ID Transaction ID Items


A 100 1,2
A 200 3,4
A 300 5,6
A 400 1,2
B 500 1
B 600 3
B 700 5
B 800 1
C 900 2
C 1000 4
C 1100 6
C 1200 2
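
Before mining, the table has to be turned into one ordered sequence of itemsets per customer. A minimal Python sketch, assuming that a larger transaction ID means a later transaction (the table does not state this explicitly); `customer_sequences` is an illustrative name.

```python
from collections import defaultdict

# (customer ID, transaction ID, items) rows from the table above.
rows = [
    ("A", 100, {1, 2}), ("A", 200, {3, 4}), ("A", 300, {5, 6}), ("A", 400, {1, 2}),
    ("B", 500, {1}), ("B", 600, {3}), ("B", 700, {5}), ("B", 800, {1}),
    ("C", 900, {2}), ("C", 1000, {4}), ("C", 1100, {6}), ("C", 1200, {2}),
]


def customer_sequences(rows):
    """Group rows by customer and order each customer's itemsets by transaction ID."""
    grouped = defaultdict(list)
    for customer, _tid, items in sorted(rows, key=lambda r: (r[0], r[1])):
        grouped[customer].append(frozenset(items))
    return dict(grouped)


sequences = customer_sequences(rows)
for customer, seq in sequences.items():
    print(customer, [sorted(itemset) for itemset in seq])
```

Support is then counted per customer, not per transaction, which is why every support below is a multiple of 1/3.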

a) Please identify all large sequences using the Apriori algorithm. Be sure to include all
steps in Apriori, i.e., Large (k-1)-sequences → Candidate k-sequences (Join, Prune) →
Large k-sequences.

Version 1 (no repetitive itemsets in sequences)


Step 1: Identify large 1-sequences
<{1}> 2/3 = 66.67%
<{2}> 2/3 = 66.67%
<{3}> 2/3 = 66.67%
<{4}> 2/3 = 66.67%
<{5}> 2/3 = 66.67%
<{6}> 2/3 = 66.67%

Step 2: Generate candidate 2-sequences by join


<{1}, {2}> <{2}, {1}> <{1}, {3}> <{3}, {1}>
<{1}, {4}> <{4}, {1}> <{1}, {5}> <{5}, {1}> <{1}, {6}> <{6}, {1}>
<{2}, {3}> <{3}, {2}> <{2}, {4}> <{4}, {2}>
<{2}, {5}> <{5}, {2}> <{2}, {6}> <{6}, {2}>
<{3}, {4}> <{4}, {3}> <{3}, {5}> <{5}, {3}>
<{3}, {6}> <{6}, {3}>
<{4}, {5}> <{5}, {4}> <{4}, {6}> <{6}, {4}>
<{5}, {6}> <{6}, {5}>

Step 3: Identify large 2-sequences


<{1}, {3}> 2/3 = 66.67%
<{1}, {5}> 2/3 = 66.67%
<{2}, {4}> 2/3 = 66.67%
<{2}, {6}> 2/3 = 66.67%
<{3}, {1}> 2/3 = 66.67%
<{3}, {5}> 2/3 = 66.67%
<{4}, {2}> 2/3 = 66.67%
<{4}, {6}> 2/3 = 66.67%
<{5}, {1}> 2/3 = 66.67%
<{6}, {2}> 2/3 = 66.67%
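
The supports in Step 3 come from checking, for each customer, whether the candidate occurs as an ordered subsequence of that customer's transactions. A Python sketch of that check follows. It treats each element of a pattern as a single item, as in the candidates above, and the names `contains` and `support` are illustrative.

```python
from itertools import permutations

# Customer sequences from the Question 2 table, ordered by transaction ID.
sequences = {
    "A": [{1, 2}, {3, 4}, {5, 6}, {1, 2}],
    "B": [{1}, {3}, {5}, {1}],
    "C": [{2}, {4}, {6}, {2}],
}
MIN_SUPPORT = 0.6


def contains(customer_seq, pattern):
    """True if pattern's items occur in order, each in a strictly later transaction."""
    pos = 0
    for item in pattern:
        # Scan forward to the next transaction that contains this item.
        while pos < len(customer_seq) and item not in customer_seq[pos]:
            pos += 1
        if pos == len(customer_seq):
            return False
        pos += 1
    return True


def support(pattern):
    """Fraction of customers whose sequence contains the pattern."""
    return sum(contains(s, pattern) for s in sequences.values()) / len(sequences)


# Version 1 candidates: ordered pairs of distinct items.
large_2 = [p for p in permutations(range(1, 7), 2) if support(p) >= MIN_SUPPORT]
print(large_2)
```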

Step 4: Generate candidate 3-sequences by join


<{1}, {3}, {5}> <{1}, {5}, {3}>
<{2}, {4}, {6}> <{2}, {6}, {4}>
<{3}, {1}, {5}> <{3}, {5}, {1}>
<{4}, {2}, {6}> <{4}, {6}, {2}>

Step 4 (continued): Prune candidate 3-sequences


<{1}, {3}, {5}>
<{2}, {4}, {6}>
<{3}, {1}, {5}> <{3}, {5}, {1}>
<{4}, {2}, {6}> <{4}, {6}, {2}>

Step 5: Identify large 3-sequences


<{1}, {3}, {5}> 2/3 = 66.67%
<{2}, {4}, {6}> 2/3 = 66.67%
<{3}, {5}, {1}> 2/3 = 66.67%
<{4}, {6}, {2}> 2/3 = 66.67%

Step 6: Generate candidate 4-sequences by join


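No 4-sequence can be generated: no two large 3-sequences share the same first two
items, and a sequence cannot be joined with itself when repeated itemsets are excluded.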
Version 2 (repetitive itemsets included in sequences)
Step 1: Identify large 1-sequences
<{1}> 2/3 = 66.67%
<{2}> 2/3 = 66.67%
<{3}> 2/3 = 66.67%
<{4}> 2/3 = 66.67%
<{5}> 2/3 = 66.67%
<{6}> 2/3 = 66.67%

Step 2: Generate candidate 2-sequences by join


<{1}, {1}> <{1}, {2}> <{2}, {1}> <{1}, {3}> <{3}, {1}>
<{1}, {4}> <{4}, {1}> <{1}, {5}> <{5}, {1}> <{1}, {6}> <{6}, {1}>
<{2}, {2}> <{2}, {3}> <{3}, {2}> <{2}, {4}> <{4}, {2}>
<{2}, {5}> <{5}, {2}> <{2}, {6}> <{6}, {2}>
<{3}, {3}> <{3}, {4}> <{4}, {3}> <{3}, {5}> <{5}, {3}>
<{3}, {6}> <{6}, {3}>
<{4}, {4}> <{4}, {5}> <{5}, {4}> <{4}, {6}> <{6}, {4}>
<{5}, {5}> <{5}, {6}> <{6}, {5}>
<{6}, {6}>

Step 3: Identify large 2-sequences


<{1}, {1}> 2/3 = 66.67%
<{1}, {3}> 2/3 = 66.67%
<{1}, {5}> 2/3 = 66.67%
<{2}, {2}> 2/3 = 66.67%
<{2}, {4}> 2/3 = 66.67%
<{2}, {6}> 2/3 = 66.67%
<{3}, {1}> 2/3 = 66.67%
<{3}, {5}> 2/3 = 66.67%
<{4}, {2}> 2/3 = 66.67%
<{4}, {6}> 2/3 = 66.67%
<{5}, {1}> 2/3 = 66.67%
<{6}, {2}> 2/3 = 66.67%

Step 4: Generate candidate 3-sequences by join


<{1}, {1}, {1}> <{1}, {1}, {3}> <{1}, {3}, {1}> <{1}, {3}, {3}>
<{1}, {1}, {5}> <{1}, {5}, {1}> <{1}, {5}, {5}>
<{1}, {3}, {5}> <{1}, {5}, {3}>
<{2}, {2}, {2}> <{2}, {2}, {4}> <{2}, {4}, {2}> <{2}, {4}, {4}>
<{2}, {2}, {6}> <{2}, {6}, {2}> <{2}, {6}, {6}>
<{2}, {4}, {6}> <{2}, {6}, {4}>
<{3}, {1}, {1}> <{3}, {1}, {5}> <{3}, {5}, {1}> <{3}, {5}, {5}>
<{4}, {2}, {2}> <{4}, {2}, {6}> <{4}, {6}, {2}> <{4}, {6}, {6}>
<{5}, {1}, {1}> <{6}, {2}, {2}>
Step 4 (continued): Prune candidate 3-sequences
<{1}, {1}, {1}> <{1}, {1}, {3}> <{1}, {3}, {1}>
<{1}, {1}, {5}> <{1}, {5}, {1}>
<{1}, {3}, {5}>
<{2}, {2}, {2}> <{2}, {2}, {4}> <{2}, {4}, {2}> <{2}, {2}, {6}> <{2}, {6}, {2}>
<{2}, {4}, {6}>
<{3}, {1}, {1}> <{3}, {1}, {5}> <{3}, {5}, {1}>
<{4}, {2}, {2}> <{4}, {2}, {6}> <{4}, {6}, {2}>
<{5}, {1}, {1}> <{6}, {2}, {2}>

Step 5: Identify large 3-sequences


<{1}, {3}, {1}> 2/3 = 66.67%
<{1}, {3}, {5}> 2/3 = 66.67%
<{1}, {5}, {1}> 2/3 = 66.67%
<{2}, {4}, {2}> 2/3 = 66.67%
<{2}, {4}, {6}> 2/3 = 66.67%
<{2}, {6}, {2}> 2/3 = 66.67%
<{3}, {5}, {1}> 2/3 = 66.67%
<{4}, {6}, {2}> 2/3 = 66.67%

Step 6: Generate candidate 4-sequences by join


<{1}, {3}, {1}, {1}> <{1}, {3}, {1}, {5}> <{1}, {3}, {5}, {1}> <{1}, {3}, {5}, {5}>
<{1}, {5}, {1}, {1}>
<{2}, {4}, {2}, {2}> <{2}, {4}, {2}, {6}> <{2}, {4}, {6}, {2}> <{2}, {4}, {6}, {6}>
<{2}, {6}, {2}, {2}>
<{3}, {5}, {1}, {1}>
<{4}, {6}, {2}, {2}>

Step 7: Prune candidate 4-sequences


<{1}, {3}, {5}, {1}>
<{2}, {4}, {6}, {2}>

Step 8: Identify large 4-sequences


<{1}, {3}, {5}, {1}> 2/3 = 66.67%
<{2}, {4}, {6}, {2}> 2/3 = 66.67%

Step 9: Generate candidate 5-sequences


No 5-sequences are possible, since no customer in the given dataset has more than 4
transactions.
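
Both versions can be reproduced with one level-wise loop that differs only in whether a candidate may repeat an item. The Python sketch below is an illustration under stated assumptions, not the course's reference implementation: the join combines two large k-sequences that share their first k-1 items (including a sequence with itself), which matches the candidates listed above, and `mine` and `allow_repeats` are names introduced here.

```python
# Customer sequences from the Question 2 table, ordered by transaction ID.
sequences = [
    [{1, 2}, {3, 4}, {5, 6}, {1, 2}],  # customer A
    [{1}, {3}, {5}, {1}],              # customer B
    [{2}, {4}, {6}, {2}],              # customer C
]
MIN_SUPPORT = 0.6


def contains(customer_seq, pattern):
    """True if pattern's items occur in order, each in a strictly later transaction."""
    pos = 0
    for item in pattern:
        while pos < len(customer_seq) and item not in customer_seq[pos]:
            pos += 1
        if pos == len(customer_seq):
            return False
        pos += 1
    return True


def is_large(pattern):
    """True if enough customers contain the pattern."""
    hits = sum(contains(s, pattern) for s in sequences)
    return hits / len(sequences) >= MIN_SUPPORT


def drop_one(pattern):
    """All subsequences obtained by deleting exactly one element."""
    return {pattern[:i] + pattern[i + 1:] for i in range(len(pattern))}


def mine(allow_repeats):
    """Level-wise mining; returns {k: list of large k-sequences}."""
    items = sorted({i for s in sequences for itemset in s for i in itemset})
    level = [(i,) for i in items if is_large((i,))]
    result, k = {}, 1
    while level:
        result[k] = level
        prev = set(level)
        # Join: two large k-sequences that share their first k-1 items.
        candidates = [p + (q[-1],) for p in level for q in level if p[:-1] == q[:-1]]
        if not allow_repeats:
            candidates = [c for c in candidates if len(set(c)) == len(c)]
        # Prune: every k-subsequence of a candidate must be large.
        candidates = [c for c in candidates if drop_one(c) <= prev]
        level = sorted(c for c in candidates if is_large(c))
        k += 1
    return result


version_1 = mine(allow_repeats=False)
version_2 = mine(allow_repeats=True)
print(version_1[3])
print(version_2[4])
```

In this sketch the self-join of a large 4-sequence does produce a candidate 5-sequence, but the prune step removes it, so the result agrees with Step 9.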

Question 3 (25 points): Go to an ecommerce web site such as amazon.com or buy.com.
Discover and describe one application of the use of association rules or sequential patterns.
Please comment on whether it is effective or needs improvement.

On amazon.com, the description page of a book also shows the books that customers
who bought this book also bought, other titles those customers may be interested in,
and books by other authors that those customers may also buy. This correlated
information about the book you are going to buy is provided by association rules, which
are mined from past sales transactions. It is effective if amazon.com wants to
recommend relevant books to a customer who is about to buy a book on a certain topic.
However, we do not know whether amazon.com sorts the associated books by support,
confidence, or lift; such an ordering could help customers locate the books they
really need more efficiently.
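
To illustrate why the sort order matters, the small Python sketch below ranks recommendations by different measures. It does not describe how amazon.com works; as stand-in data it reuses the two single-item rules with Dairyland-Milk on the left-hand side from Question 1 c), and `rank` is a hypothetical helper.

```python
# Rules with Dairyland-Milk as the antecedent, taken from Question 1 c):
# (recommended item, confidence, lift)
rules = [
    ("Wonder-Bread", 1.00, 1.33),
    ("Tasty-Pie", 1.00, 2.00),
]


def rank(rules, key):
    """Sort recommended items by the chosen measure, highest first."""
    index = {"confidence": 1, "lift": 2}[key]
    ordered = sorted(rules, key=lambda r: r[index], reverse=True)
    return [item for item, *_ in ordered]


by_confidence = rank(rules, "confidence")
by_lift = rank(rules, "lift")
print(by_confidence)
print(by_lift)
```

Confidence cannot separate the two items here because both rules have 100% confidence, whereas lift puts Tasty-Pie first, since Wonder-Bread is bought in 75% of all transactions anyway.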
