MBA in Python - 1

The document discusses market basket analysis, which involves analyzing transaction data to identify products that are frequently purchased together. It provides examples of transaction data showing items purchased by customers. Market basket analysis can be used to generate recommendations, improve product placement, and increase sales. It explains how to load transaction data into Python, prepare the data for analysis, and compute basic association rules between items.
What is market basket analysis?
MARKET BASKET ANALYSIS IN PYTHON

Isaiah Hull
Economist
Selecting a bookstore layout

Exploring transaction data
TID = unique ID associated with each transaction.
Transaction = set of unique items purchased together.

TID     Transaction
1       biography, history
2       fiction
3       biography, poetry
4       fiction, history
5       biography
...     ...
75000   fiction, poetry

What is market basket analysis?
1. Identify products frequently purchased together.
   Biography and history
   Fiction and poetry
2. Construct recommendations based on these findings.
   Place biography and history sections together.
   Keep fiction and history apart.
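Step 1 amounts to counting how often pairs of items appear in the same transaction. A minimal standard-library sketch, using hypothetical toy transactions rather than the bookstore data; the helper name `count_pairs` is illustrative:

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy transactions (not the course's bookstore dataset).
transactions = [
    ["biography", "history"],
    ["fiction", "poetry"],
    ["biography", "history", "poetry"],
    ["fiction", "poetry"],
    ["fiction", "history"],
]


def count_pairs(transactions):
    """Count how many transactions contain each unordered pair of items."""
    counts = Counter()
    for transaction in transactions:
        # set() drops duplicate items; sorted() gives each pair one canonical order.
        items = sorted(set(transaction))
        counts.update(combinations(items, 2))
    return counts


pair_counts = count_pairs(transactions)
for pair, n in pair_counts.most_common(2):
    print(pair, n)
```

Sorting the items first means ('fiction', 'poetry') and ('poetry', 'fiction') are counted as the same pair.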

The use cases of market basket analysis
1. Build a Netflix-style recommendation engine.

2. Improve product recommendations on an e-commerce store.

3. Cross-sell products in a retail setting.

4. Improve inventory management.

5. Upsell products.

Using market basket analysis
Market basket analysis:
   Identify items frequently purchased together.
   Construct association rules.
Association rules take the form {antecedent} → {consequent}, for example {fiction} → {biography}.

TID   Transaction
11    fiction, biography
12    fiction, biography
13    history, biography
...   ...
19    fiction, biography
20    fiction, biography
...   ...

Loading the data
import pandas as pd

# Load transactions with pandas.
books = pd.read_csv("datasets/bookstore.csv")

# Print the first two rows.
print(books.head(2))

TID Transaction
0 biography, history
1 fiction

For a refresher, see the Pandas Cheat Sheet.

Building transactions
# Split transaction strings into lists, stripping stray whitespace around each item.
transactions = books['Transaction'].apply(lambda t: [item.strip() for item in t.split(',')])

# Convert the pandas Series of lists into a plain list of lists.
transactions = list(transactions)

Counting the itemsets
# Print the first transaction.
print(transactions[0])

['biography', 'history']

# Count the transactions that are exactly ['biography', 'fiction'].
# list.count() tests equality, so item order matters and larger baskets are not counted.
transactions.count(['biography', 'fiction'])

218
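If the goal is to count transactions that *contain* both items, comparing sets is more robust than `list.count()`, which only matches lists equal to its argument. A minimal sketch with hypothetical toy transactions and a hypothetical helper `count_containing`:

```python
# Hypothetical toy transactions, for illustration only.
transactions = [
    ["biography", "fiction"],
    ["fiction", "biography"],
    ["biography", "fiction", "poetry"],
    ["biography", "history"],
]


def count_containing(transactions, itemset):
    """Count transactions that contain every item in itemset, in any order."""
    target = set(itemset)
    return sum(1 for transaction in transactions if target <= set(transaction))


exact = transactions.count(["biography", "fiction"])  # exact list matches only
containing = count_containing(transactions, ["biography", "fiction"])
print(exact, containing)
```

Here the exact-match count sees only the first transaction, while the containment count also picks up the reordered pair and the three-item basket.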

Making a recommendation
# Count the transactions that are exactly ['fiction', 'poetry'].
transactions.count(['fiction', 'poetry'])

5357

Let's practice!

Identifying association rules

Loading and preparing data
import pandas as pd

# Load transactions with pandas.
books = pd.read_csv("datasets/bookstore.csv")

# Split transaction strings into lists, stripping stray whitespace around each item.
transactions = books['Transaction'].apply(lambda t: [item.strip() for item in t.split(',')])

# Convert the pandas Series of lists into a plain list of lists.
transactions = list(transactions)

Exploring the data
print(transactions[:5])

[['language', 'travel', 'humor', 'fiction'],
 ['humor', 'language'],
 ['humor', 'biography', 'cooking'],
 ['cooking', 'language'],
 ['travel']]

Association rules
Association rule: contains an antecedent and a consequent.
   {health} → {cooking}
Multi-antecedent rule:
   {humor, travel} → {language}
Multi-consequent rule:
   {biography} → {history, language}
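One way to hold these rule shapes in code is as a pair of frozensets. The sketch below assumes a hypothetical `Rule` type (not part of pandas or mlxtend), labels each rule by how many items sit on each side, and uses `covers` to check whether a transaction contains every item the rule mentions.

```python
from typing import FrozenSet, Iterable, NamedTuple


class Rule(NamedTuple):
    """Hypothetical container for a rule {antecedent} -> {consequent}."""
    antecedent: FrozenSet[str]
    consequent: FrozenSet[str]

    def kind(self) -> str:
        """Label the rule by how many items sit on each side."""
        labels = []
        if len(self.antecedent) > 1:
            labels.append("multi-antecedent")
        if len(self.consequent) > 1:
            labels.append("multi-consequent")
        return " and ".join(labels) or "simple"

    def covers(self, transaction: Iterable[str]) -> bool:
        """True if the transaction contains every item in the rule."""
        return (self.antecedent | self.consequent) <= set(transaction)


# The three example rules from the slide.
rules = [
    Rule(frozenset({"health"}), frozenset({"cooking"})),
    Rule(frozenset({"humor", "travel"}), frozenset({"language"})),
    Rule(frozenset({"biography"}), frozenset({"history", "language"})),
]

for rule in rules:
    print(sorted(rule.antecedent), "->", sorted(rule.consequent), rule.kind())
```

Frozensets are used because item order inside a rule side is irrelevant and they are hashable, so rules can later serve as dict keys.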

Difficulty of selecting rules
Finding useful rules is difficult.
   The set of all possible rules is large.
   Most rules are not useful.
   We must discard most rules.

What if we restrict ourselves to simple rules?
   One antecedent and one consequent.
   Still challenging, even for a small dataset.
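Counting the rule space shows why. For d items there are d(d - 1) simple rules. If the antecedent and consequent may be any non-empty, disjoint itemsets, each item goes to the antecedent, the consequent, or neither, and subtracting the assignments that leave either side empty gives 3^d - 2^(d+1) + 1 rules. The sketch checks that closed form against brute-force enumeration for small d; the helper names are illustrative.

```python
from itertools import product


def count_simple_rules(d: int) -> int:
    """Rules with one antecedent item and one different consequent item."""
    return d * (d - 1)


def count_all_rules(d: int) -> int:
    """Rules X -> Y with X and Y non-empty and disjoint, over d items (closed form)."""
    return 3 ** d - 2 ** (d + 1) + 1


def brute_force_all_rules(d: int) -> int:
    """Assign each item to antecedent (0), consequent (1) or neither (2); keep valid rules."""
    count = 0
    for assignment in product(range(3), repeat=d):
        if 0 in assignment and 1 in assignment:
            count += 1
    return count


# Check the closed form against enumeration for small item counts.
for d in range(1, 7):
    assert count_all_rules(d) == brute_force_all_rules(d)

# Nine bookstore categories.
print(count_simple_rules(9), count_all_rules(9))
```

With the nine bookstore categories this gives 72 simple rules but 18,660 rules overall.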

Generating the rules
The nine items: fiction, health, poetry, travel, history, language, biography, humor, cooking

Generating the rules
Fiction rules        Poetry rules        ...   Humor rules
fiction->poetry      poetry->fiction     ...   humor->fiction
fiction->history     poetry->history     ...   humor->history
fiction->biography   poetry->biography   ...   humor->biography
fiction->cooking     poetry->cooking     ...   humor->cooking
...                  ...                 ...   ...
fiction->humor       poetry->humor       ...

Generating rules with itertools
from itertools import permutations

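# Extract unique items (set order is arbitrary, so the printed order may vary).
flattened = [item for transaction in transactions for item in transaction]
items = list(set(flattened))

# Compute and print all ordered pairs of distinct items as candidate rules.
rules = list(permutations(items, 2))
print(rules)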

[('fiction', 'poetry'),
('fiction', 'history'),
...
('humor', 'travel'),
('humor', 'language')]

Counting the rules
# Print the number of rules
print(len(rules))

72
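`permutations` fits here because rules are directional: {fiction} → {poetry} and {poetry} → {fiction} are different rules. A quick check with the nine categories from the rule table (listed alphabetically here) contrasts it with unordered `combinations`:

```python
from itertools import combinations, permutations

# The nine bookstore categories, in alphabetical order.
items = ["biography", "cooking", "fiction", "health", "history",
         "humor", "language", "poetry", "travel"]

# Ordered pairs: both (fiction, poetry) and (poetry, fiction) appear.
rules = list(permutations(items, 2))

# Unordered pairs: each pair of items appears once.
pairs = list(combinations(items, 2))

print(len(rules), len(pairs))
```

The ordered count is 9 × 8 = 72, exactly twice the 36 unordered pairs.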

Looking ahead
# Import the association rules and Apriori functions
from mlxtend.frequent_patterns import association_rules
from mlxtend.frequent_patterns import apriori

# Compute frequent itemsets using the Apriori algorithm
# (onehot is the one-hot encoded DataFrame of transactions, built in the next section)
frequent_itemsets = apriori(onehot, min_support = 0.001,
                            max_len = 2, use_colnames = True)

# Compute all association rules for frequent_itemsets
rules = association_rules(frequent_itemsets,
                          metric = "lift",
                          min_threshold = 1.0)

Let's practice!

The simplest metric

Metrics and pruning
A metric is a measure of performance for rules.
   {humor} → {poetry}: 0.81
   {fiction} → {travel}: 0.23

Pruning is the use of metrics to discard rules.
   Retain: {humor} → {poetry}
   Discard: {fiction} → {travel}
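Pruning can be sketched as a threshold filter over metric values. The two values come from the slide; the threshold of 0.5, the dict-of-frozensets layout and the `prune` helper are hypothetical choices for illustration.

```python
# Metric values from the slide, keyed by (antecedent, consequent).
rule_metrics = {
    (frozenset({"humor"}), frozenset({"poetry"})): 0.81,
    (frozenset({"fiction"}), frozenset({"travel"})): 0.23,
}


def prune(rule_metrics, threshold):
    """Split rules into those retained (metric >= threshold) and those discarded."""
    retained = {rule: m for rule, m in rule_metrics.items() if m >= threshold}
    discarded = {rule: m for rule, m in rule_metrics.items() if m < threshold}
    return retained, discarded


# Hypothetical threshold; in practice it depends on the metric and the data.
retained, discarded = prune(rule_metrics, threshold=0.5)
for (antecedent, consequent), metric in retained.items():
    print("Retain:", sorted(antecedent), "->", sorted(consequent), metric)
```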

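The simplest metric
The support metric measures the share of transactions that contain an itemset.

$$\text{support}(X) = \frac{\text{number of transactions containing } X}{\text{total number of transactions}}$$

For example:

$$\text{support}(\{\text{milk}\}) = \frac{\text{number of transactions with milk}}{\text{total number of transactions}}$$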
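The formula translates directly into a function. A minimal sketch, assuming transactions are lists of item names; the grocery baskets below are hypothetical.

```python
def support(transactions, itemset):
    """Share of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for transaction in transactions if itemset <= set(transaction))
    return hits / len(transactions)


# Hypothetical grocery baskets, for illustration only.
baskets = [
    ["milk", "bread"],
    ["milk"],
    ["bread", "butter"],
    ["eggs"],
]

print(support(baskets, ["milk"]))           # 2 of 4 baskets
print(support(baskets, ["milk", "bread"]))  # 1 of 4 baskets
```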

Support for language
TID   Transaction
0     travel, humor, fiction
1     humor, language
2     humor, biography, cooking
3     cooking, language
4     travel
5     poetry, health, travel, history
6     humor
7     travel
8     poetry, fiction, humor
9     fiction, biography

Support for {language} = 2 / 10 = 0.2

Support for {humor} → {language}
TID   Transaction
0     travel, humor, fiction
1     humor, language
2     humor, biography, cooking
3     cooking, language
4     travel
5     poetry, health, travel, history
6     humor
7     travel
8     poetry, fiction, humor
9     fiction, biography

Support for {humor} → {language} = 1 / 10 = 0.1 (only TID 1 contains both items). Because support counts the combined itemset, {language} → {humor} has the same support.
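Both numbers on these slides can be reproduced in plain Python. The sketch assumes transactions are lists of strings and treats the support of a rule as the support of the union of its antecedent and consequent, which is why the direction of the rule does not change the value. The helper names are illustrative.

```python
# The ten bookstore transactions, TIDs 0-9.
transactions = [
    ["travel", "humor", "fiction"],
    ["humor", "language"],
    ["humor", "biography", "cooking"],
    ["cooking", "language"],
    ["travel"],
    ["poetry", "health", "travel", "history"],
    ["humor"],
    ["travel"],
    ["poetry", "fiction", "humor"],
    ["fiction", "biography"],
]


def support(transactions, itemset):
    """Share of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t)) / len(transactions)


def rule_support(transactions, antecedent, consequent):
    """Support of antecedent -> consequent: support of the combined itemset."""
    return support(transactions, set(antecedent) | set(consequent))


print(support(transactions, ["language"]))
print(rule_support(transactions, ["humor"], ["language"]))
print(rule_support(transactions, ["language"], ["humor"]))
```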

Preparing the data
print(transactions)

[['travel', 'humor', 'fiction'],
 ...
 ['fiction', 'biography']]

from mlxtend.preprocessing import TransactionEncoder

# Instantiate the transaction encoder and fit it to learn the unique items
encoder = TransactionEncoder().fit(transactions)

Preparing the data
# One-hot encode the transactions using the fitted encoder
onehot = encoder.transform(transactions)

# Convert the one-hot encoded array to a DataFrame
onehot = pd.DataFrame(onehot, columns = encoder.columns_)
print(onehot)

   biography  cooking  ...  poetry  travel
0      False    False  ...   False    True
...
9       True    False  ...   False   False
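To make the encoding step concrete, here is a plain-Python sketch of the same idea using only the standard library. It is not how mlxtend's `TransactionEncoder` is implemented, and the helper name `one_hot` is hypothetical. Columns are the sorted unique items, matching the alphabetical column order in the output above, and each row records which items a transaction contains. Averaging a column then gives that item's support.

```python
def one_hot(transactions):
    """Hypothetical helper: return sorted unique items and one boolean row per transaction."""
    columns = sorted({item for transaction in transactions for item in transaction})
    rows = []
    for transaction in transactions:
        present = set(transaction)
        rows.append([item in present for item in columns])
    return columns, rows


# The ten bookstore transactions, TIDs 0-9.
transactions = [
    ["travel", "humor", "fiction"],
    ["humor", "language"],
    ["humor", "biography", "cooking"],
    ["cooking", "language"],
    ["travel"],
    ["poetry", "health", "travel", "history"],
    ["humor"],
    ["travel"],
    ["poetry", "fiction", "humor"],
    ["fiction", "biography"],
]

columns, rows = one_hot(transactions)

# Column means: the share of True values in each column, i.e. each item's support.
supports = {col: sum(row[j] for row in rows) / len(rows) for j, col in enumerate(columns)}
print(columns)
print(supports)
```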

Computing support for single items
# The mean of a boolean column is the share of True values, i.e. the item's support.
print(onehot.mean())

biography 0.2
cooking 0.2
fiction 0.3
health 0.1
history 0.1
humor 0.5
language 0.2
poetry 0.2
travel 0.4
dtype: float64

Computing support for multiple items
import numpy as np

# Define itemset that contains fiction and poetry
onehot['fiction+poetry'] = np.logical_and(onehot['fiction'], onehot['poetry'])

print(onehot.mean())

biography 0.2
cooking 0.2
... ...
travel 0.4
fiction+poetry 0.1
dtype: float64
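The same AND-then-average idea extends to itemsets of any size. A minimal standard-library sketch, assuming the one-hot data is held as a dict of boolean columns (a hypothetical stand-in for the DataFrame); the fiction and poetry columns below are read off the ten-transaction table, so the result should match the 0.1 printed above.

```python
def itemset_column(onehot, items):
    """Hypothetical helper: True for rows where every listed item column is True."""
    return [all(values) for values in zip(*(onehot[item] for item in items))]


def column_support(column):
    """Share of rows that are True."""
    return sum(column) / len(column)


# Columns for TIDs 0-9 from the bookstore table.
onehot = {
    "fiction": [True, False, False, False, False, False, False, False, True, True],
    "poetry": [False, False, False, False, False, True, False, False, True, False],
}

fiction_poetry = itemset_column(onehot, ["fiction", "poetry"])
print(column_support(fiction_poetry))
```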

Let's practice!