Data Mining Activity 2-1
Name: Rishab Ashok Bhadoriya
Class: TYBCA (Sci)
Roll No: 81
1. Load and Preprocess the Data: The dataset will be prepared in a format suitable for the Apriori
algorithm, typically a list of transactions.
2. Apply the Apriori Algorithm: We will use the apriori function from the mlxtend library to extract
frequent itemsets.
3. Generate Association Rules: Using the frequent itemsets, we can generate association rules and
compute their support, confidence, and lift.
4. Visualize the Results: We'll use libraries like matplotlib and seaborn to create visualizations such as
bar plots for frequent itemsets and scatter plots for the association rules.
import pandas as pd
import matplotlib.pyplot as plt
from mlxtend.frequent_patterns import apriori, association_rules

# Step 1: Load and preprocess the data
# Binary (one-hot) transaction matrix: each row is a transaction, each column an item.
# Reconstructed from the original transaction lists (Item2 in transactions 2-5,
# Item3 in 2-3, Item4 in 1, Item5 in 1-3); replace with your own dataset.
data = {'Item2': [0, 1, 1, 1, 1],
        'Item3': [0, 1, 1, 0, 0],
        'Item4': [1, 0, 0, 0, 0],
        'Item5': [1, 1, 1, 0, 0]}
df = pd.DataFrame(data, dtype=bool)
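
# Step 2: Apply the Apriori algorithm to extract the frequent itemsets
# (min_support=0.4 is only an illustrative threshold; tune it for your data)
frequent_itemsets = apriori(df, min_support=0.4, use_colnames=True)

# Step 3: Generate association rules with their support, confidence, and lift
# (metric and min_threshold are likewise example values, not fixed choices)
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.7)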
# Step 4: Visualization
# Bar plot of the frequent itemsets and their support
plt.figure(figsize=(10, 6))
plt.barh(frequent_itemsets['itemsets'].apply(lambda s: ', '.join(s)),
         frequent_itemsets['support'])
plt.title('Frequent Itemsets')
plt.xlabel('Support')
plt.ylabel('Itemsets')
plt.show()
# Scatter plot of the rules: support vs. confidence, sized and coloured by lift
plt.figure(figsize=(10, 6))
plt.scatter(rules['support'], rules['confidence'],
            s=rules['lift'] * 100, c=rules['lift'], cmap='viridis')
plt.colorbar(label='Lift')
plt.title('Association Rules')
plt.xlabel('Support')
plt.ylabel('Confidence')
plt.show()
print(rules)
Steps Breakdown:
1. Dataset: The dataset (df) is created as a binary matrix where each row is a transaction and each
column represents an item. Replace it with your own dataset (see the encoding sketch after this list).
2. Apriori Algorithm: We run the Apriori algorithm using mlxtend to get the frequent itemsets with a
minimum support threshold.
3. Association Rules: The association rules are extracted from the frequent itemsets, and metrics
like lift, support, and confidence are calculated (a lift above 1 means the antecedent and consequent
occur together more often than expected by chance).
4. Visualization: A bar plot shows the support of each frequent itemset, and a scatter plot visualizes
the association rules, with support on the x-axis, confidence on the y-axis, and the size and color of
each point representing the lift.
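
If your own data is a raw list of transactions (one basket of items per row) rather than a ready-made
binary matrix, it first has to be one-hot encoded. A minimal sketch using mlxtend's TransactionEncoder
is shown below; the basket contents are made-up placeholders, not part of the original dataset:

import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori

# Hypothetical raw transactions; replace with your own baskets
transactions = [['bread', 'milk'],
                ['bread', 'butter', 'milk'],
                ['butter', 'milk'],
                ['bread', 'butter']]

# One-hot encode into the boolean transaction matrix that apriori expects
te = TransactionEncoder()
onehot = te.fit(transactions).transform(transactions)
basket_df = pd.DataFrame(onehot, columns=te.columns_)

print(apriori(basket_df, min_support=0.5, use_colnames=True))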
Make sure to adjust the dataset and the parameters (min_support, min_threshold, etc.) to suit your
specific needs; one possible tuning pass is sketched below. Let me know if you'd like help adapting
this to your own dataset.
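
As one illustration of tuning (the values below are assumptions, not recommendations), the thresholds
can be loosened and the resulting rules filtered afterwards, continuing from the df, apriori, and
association_rules used above:

# Looser thresholds surface more candidate itemsets and rules
frequent_itemsets = apriori(df, min_support=0.2, use_colnames=True)
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.5)

# Keep only rules with a positive association (lift > 1), strongest first
strong_rules = rules[rules['lift'] > 1.0].sort_values('lift', ascending=False)
print(strong_rules[['antecedents', 'consequents', 'support', 'confidence', 'lift']])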