
Walmart Solution PDF

The document analyzes transaction data from Walmart. It performs exploratory data analysis on the dataset, which has 550,068 rows and 10 columns. Key insights include that most transactions were made by male customers, by customers in the 26-35 age group, and by unmarried customers.

Uploaded by ASWINKUMAR R
Copyright © All Rights Reserved

walmart

April 28, 2024

WALMART - CASE ANALYSIS


[ ]: #importing libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import t
import warnings
warnings.filterwarnings('ignore')
import copy

[ ]: !gdown https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/001/293/original/walmart_data.csv?1641285094

Downloading…
From: https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/001/293/original/walmart_data.csv?1641285094
To: /content/walmart_data.csv?1641285094
100% 23.0M/23.0M [00:00<00:00, 87.3MB/s]
1. Exploratory Data Analysis
[ ]: # loading the dataset (gdown saved it as 'walmart_data.csv?1641285094', see the download log above)
df = pd.read_csv('walmart_data.csv?1641285094')

[ ]: df.head()

[ ]:    User_ID Product_ID Gender   Age Occupation City_Category  \
0  1000001  P00069042      F  0-17         10             A
1  1000001  P00248942      F  0-17         10             A
2  1000001  P00087842      F  0-17         10             A
3  1000001  P00085442      F  0-17         10             A
4  1000002  P00285442      M   55+         16             C

  Stay_In_Current_City_Years Marital_Status Product_Category  Purchase
0                          2              0                3      8370
1                          2              0                1     15200
2                          2              0               12      1422
3                          2              0               12      1057
4                         4+              0                8      7969

[ ]: df.tail()

[ ]:         User_ID Product_ID Gender    Age Occupation City_Category  \
550063  1006033  P00372445      M  51-55         13             B
550064  1006035  P00375436      F  26-35          1             C
550065  1006036  P00375436      F  26-35         15             B
550066  1006038  P00375436      F    55+          1             C
550067  1006039  P00371644      F  46-50          0             B

       Stay_In_Current_City_Years Marital_Status Product_Category  Purchase
550063                          1              1               20       368
550064                          3              0               20       371
550065                         4+              1               20       137
550066                          2              0               20       365
550067                         4+              1               20       490

[ ]: df.shape

[ ]: (550068, 10)

[ ]: df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 550068 entries, 0 to 550067
Data columns (total 10 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 User_ID 550068 non-null int64
1 Product_ID 550068 non-null object
2 Gender 550068 non-null object
3 Age 550068 non-null object
4 Occupation 550068 non-null int64
5 City_Category 550068 non-null object
6 Stay_In_Current_City_Years 550068 non-null object
7 Marital_Status 550068 non-null int64
8 Product_Category 550068 non-null int64
9 Purchase 550068 non-null int64
dtypes: int64(5), object(5)
memory usage: 42.0+ MB
Insights:
From the above output, the dataset has 10 features containing a mix of numeric and alphanumeric values.
Apart from the Purchase column, all the columns are categorical in nature, including the ones stored as int64 (User_ID, Occupation, Marital_Status and Product_Category). We will therefore change the data type of all such columns to category.
Changing the Datatype of Columns:
[ ]: # converting every column except Purchase (the last one) to the category dtype
for i in df.columns[:-1]:
    df[i] = df[i].astype('category')
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 550068 entries, 0 to 550067
Data columns (total 10 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 User_ID 550068 non-null category
1 Product_ID 550068 non-null category
2 Gender 550068 non-null category
3 Age 550068 non-null category
4 Occupation 550068 non-null category
5 City_Category 550068 non-null category
6 Stay_In_Current_City_Years 550068 non-null category
7 Marital_Status 550068 non-null category
8 Product_Category 550068 non-null category
9 Purchase 550068 non-null int64
dtypes: category(9), int64(1)
memory usage: 10.3 MB
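The drop in memory usage from 42.0+ MB to 10.3 MB comes from the category dtype storing each distinct value once and keeping only small integer codes per row. A minimal sketch of the same effect on a small synthetic frame (the data is made up, not taken from the Walmart file); `memory_usage(deep=True)` is used so that the string contents are counted as well:

```python
import pandas as pd

# Hypothetical data: a low-cardinality string column repeated many times
df_demo = pd.DataFrame({
    "City_Category": ["A", "B", "C"] * 10_000,
    "Purchase": list(range(30_000)),
})

before = df_demo.memory_usage(deep=True).sum()

# Convert every column except the numeric target to the category dtype
for col in df_demo.columns.drop("Purchase"):
    df_demo[col] = df_demo[col].astype("category")

after = df_demo.memory_usage(deep=True).sum()
print(f"before: {before:,} bytes, after: {after:,} bytes")
```

On low-cardinality columns such as Gender or City_Category the saving is large; on a column where almost every value is unique, the category dtype saves little or can even use more memory.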
2. Statistical Summary:
a. Statistical summary of categorical columns:
[ ]: df.describe(include = 'category')

[ ]:        User_ID Product_ID  Gender     Age Occupation City_Category  \
count   550068     550068  550068  550068     550068        550068
unique    5891       3631       2       7         21             3
top    1001680  P00265242       M   26-35          4             B
freq      1026       1880  414259  219587      72308        231173

       Stay_In_Current_City_Years Marital_Status Product_Category
count                      550068         550068           550068
unique                          5              2               20
top                             1              0                5
freq                       193821         324731           150933

Insights:
1. User_ID - The 550,068 transactions come from only 5,891 unique users, indicating that the same customers bought multiple products.
2. Product_ID - There are 3,631 unique products. Product P00265242 is the best seller, appearing in 1,880 transactions.
3. Gender - Out of 550,068 transactions, 414,259 (about 75%) were made by male customers, indicating a significant disparity in purchase behavior between males and females during the Black Friday event.
4. Age - There are 7 unique age groups in the dataset. The 26-35 age group has the most transactions (219,587). We will analyse this feature in detail later.
5. Stay_In_Current_City_Years - Customers who have lived in their current city for 1 year account for the most transactions (193,821), ahead of those with 0, 2, 3 or 4+ years of stay.
6. Marital_Status - 59% of the transactions were made by unmarried customers and 41% by married customers.
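The percentages quoted above (about 75% male, 59% unmarried) are each column's top frequency divided by the total row count. A small sketch of computing such shares directly with `value_counts(normalize=True)`; the helper name `category_shares` and the toy data are illustrative assumptions, not part of the original analysis:

```python
import pandas as pd

def category_shares(series: pd.Series) -> pd.Series:
    """Percentage of rows falling in each category, rounded to 1 decimal."""
    return series.value_counts(normalize=True).mul(100).round(1)

# Hypothetical toy data standing in for the Walmart transactions
toy = pd.DataFrame({
    "Gender": ["M", "M", "M", "F"],
    "Marital_Status": [0, 0, 1, 1],
})

for col in toy.columns:
    print(col, category_shares(toy[col]).to_dict())
```

Applied to the notebook's frame, `category_shares(df['Gender'])` would give the gender split without dividing by hand.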
b. Statistical summary of numerical columns:
[ ]: df.describe()

[ ]: Purchase
count 550068.000000
mean 9263.968713
std 5023.065394
min 12.000000
25% 5823.000000
50% 8047.000000
75% 12054.000000
max 23961.000000

c. Duplicate Detection:
[ ]: df.duplicated().value_counts()

[ ]: False 550068
Name: count, dtype: int64

Insight: There are no duplicate entries in the dataset.


Sanity Check for Columns:
[ ]: # checking the unique values for columns
for i in df.columns:
    print('Unique Values in',i,'column are :-')
    print(df[i].unique())
    print('-'*70)

Unique Values in User_ID column are :-


[1000001, 1000002, 1000003, 1000004, 1000005, …, 1004588, 1004871, 1004113,
1005391, 1001529]
Length: 5891
Categories (5891, int64): [1000001, 1000002, 1000003, 1000004, …, 1006037,
1006038, 1006039, 1006040]
----------------------------------------------------------------------
Unique Values in Product_ID column are :-
['P00069042', 'P00248942', 'P00087842', 'P00085442', 'P00285442', …,
'P00375436', 'P00372445', 'P00370293', 'P00371644', 'P00370853']
Length: 3631
Categories (3631, object): ['P00000142', 'P00000242', 'P00000342', 'P00000442',
…, 'P0099642', 'P0099742', 'P0099842', 'P0099942']
----------------------------------------------------------------------
Unique Values in Gender column are :-
['F', 'M']
Categories (2, object): ['F', 'M']
----------------------------------------------------------------------
Unique Values in Age column are :-
['0-17', '55+', '26-35', '46-50', '51-55', '36-45', '18-25']
Categories (7, object): ['0-17', '18-25', '26-35', '36-45', '46-50', '51-55',
'55+']
----------------------------------------------------------------------
Unique Values in Occupation column are :-
[10, 16, 15, 7, 20, …, 18, 5, 14, 13, 6]
Length: 21
Categories (21, int64): [0, 1, 2, 3, …, 17, 18, 19, 20]
----------------------------------------------------------------------
Unique Values in City_Category column are :-
['A', 'C', 'B']
Categories (3, object): ['A', 'B', 'C']
----------------------------------------------------------------------
Unique Values in Stay_In_Current_City_Years column are :-
['2', '4+', '3', '1', '0']
Categories (5, object): ['0', '1', '2', '3', '4+']
----------------------------------------------------------------------
Unique Values in Marital_Status column are :-
[0, 1]
Categories (2, int64): [0, 1]
----------------------------------------------------------------------
Unique Values in Product_Category column are :-
[3, 1, 12, 8, 5, …, 10, 17, 9, 20, 19]
Length: 20
Categories (20, int64): [1, 2, 3, 4, …, 17, 18, 19, 20]
----------------------------------------------------------------------
Unique Values in Purchase column are :-
[ 8370 15200 1422 … 135 123 613]
----------------------------------------------------------------------
Insights:
The unique values of every column look valid; the dataset does not contain any abnormal values.
We will map the 0/1 values in the Marital_Status column to 'Unmarried' and 'Married'.
[ ]: # relabelling the 0/1 categories; rename_categories is the supported way to change the labels of a categorical column
df['Marital_Status'] = df['Marital_Status'].cat.rename_categories({0:'Unmarried',1:'Married'})
df['Marital_Status'].unique()

[ ]: ['Unmarried', 'Married']
Categories (2, object): ['Unmarried', 'Married']

d. Missing value Analysis
[ ]: df.isnull().sum()

[ ]: User_ID 0
Product_ID 0
Gender 0
Age 0
Occupation 0
City_Category 0
Stay_In_Current_City_Years 0
Marital_Status 0
Product_Category 0
Purchase 0
dtype: int64

Insights: The dataset does not contain any missing values.
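The duplicate and missing-value checks can also be bundled into one reusable summary. This is only a sketch; the helper `quality_report` is hypothetical and the sample frame is made up:

```python
import pandas as pd

def quality_report(frame: pd.DataFrame) -> dict:
    """Summarise fully duplicated rows and per-column missing values."""
    missing = frame.isnull().sum()
    return {
        "rows": len(frame),
        # duplicated() marks every repeat of an earlier identical row
        "duplicate_rows": int(frame.duplicated().sum()),
        # keep only the columns that actually contain missing values
        "missing_by_column": missing[missing > 0].to_dict(),
    }

sample = pd.DataFrame({"User_ID": [1, 1, 2], "Age": ["0-17", "0-17", None]})
print(quality_report(sample))
```

For the Walmart data, `quality_report(df)` would report 550,068 rows, 0 duplicate rows and an empty missing-value dictionary, matching the two checks above.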


3. Univariate Analysis:
3.1 Numerical Variables
3.1.1 Purchase Amount Distribution
[ ]: # setting the plot style
fig = plt.figure(figsize = (15,10))
gs = fig.add_gridspec(2,1,height_ratios=[0.65, 0.35])

# creating purchase amount histogram
ax0 = fig.add_subplot(gs[0,0])
ax0.hist(df['Purchase'],color= '#5C8374',linewidth=0.5,edgecolor='black',bins = 20)
ax0.set_xlabel('Purchase Amount',fontsize = 12,fontweight = 'bold')
ax0.set_ylabel('Frequency',fontsize = 12,fontweight = 'bold')

# removing the axis lines
for s in ['top','left','right']:
    ax0.spines[s].set_visible(False)

# setting title for visual
ax0.set_title('Purchase Amount Distribution',{'font':'serif', 'size':15,'weight':'bold'})

# creating box plot for purchase amount
ax1 = fig.add_subplot(gs[1,0])
boxplot = ax1.boxplot(x = df['Purchase'],vert = False,patch_artist = True,widths = 0.5)

# customize box color
boxplot['boxes'][0].set(facecolor='#5C8374')
# customize median line
boxplot['medians'][0].set(color='red')
# customize outlier markers
for flier in boxplot['fliers']:
    flier.set(marker='o', markersize=8, markerfacecolor= "#4b4b4c")

# removing the axis lines
for s in ['top','left','right']:
    ax1.spines[s].set_visible(False)

# adding 5 point summary annotations
# each whisker's x-data is [Q1, lower limit] or [Q3, upper limit]
info = [i.get_xdata() for i in boxplot['whiskers']]
median = df['Purchase'].quantile(0.5) # getting Q2

for i,j in info: # unpacking the two x-values of each whisker
    ax1.annotate(text = f"{i:.1f}", xy = (i,1), xytext = (i,1.4),fontsize = 12,
                 arrowprops= dict(arrowstyle="<-", lw=1, connectionstyle="arc,rad=0"))
    ax1.annotate(text = f"{j:.1f}", xy = (j,1), xytext = (j,1.4),fontsize = 12,
                 arrowprops= dict(arrowstyle="<-", lw=1, connectionstyle="arc,rad=0"))

# adding the median separately because it is not included in the info list
ax1.annotate(text = f"{median:.1f}",xy = (median,1),xytext = (median + 1,1.4),fontsize = 12,
             arrowprops= dict(arrowstyle="<-", lw=1, connectionstyle="arc,rad=0"))

# removing y-axis ticks
ax1.set_yticks([])
# adding axis label
ax1.set_xlabel('Purchase Amount',fontweight = 'bold',fontsize = 12)
plt.show()
Calculating the Number of Outliers:
As seen above, purchase amounts above 21,399 (the upper whisker) are treated as outliers. We count them below:
[ ]: len(df.loc[df['Purchase'] > 21399,'Purchase'])

[ ]: 2677

Insights:
Outliers:
There are 2,677 outliers, roughly 0.49% of all purchase amounts. We will not remove them, as they reflect a broad range of spending behaviors during the sale; this highlights the importance of tailoring marketing strategies to both regular and high-value customers to maximize revenue.
Distribution:
The middle 50% of transactions (the interquartile range) lies between 5,823 USD and 12,054 USD, with a median purchase amount of 8,047 USD. The whiskers run from a minimum of 12 USD up to 21,399 USD, revealing significant variability in customer spending.
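The whisker limits in the box plot follow the usual 1.5 × IQR rule (matplotlib's default `whis=1.5`): points above Q3 + 1.5·IQR or below Q1 − 1.5·IQR are drawn as outliers. With Q1 = 5,823 and Q3 = 12,054, the upper fence is 12,054 + 1.5 × 6,231 = 21,400.5, and the whisker stops at the largest observed value inside it, 21,399. A sketch that computes the fences and the outlier count directly; the helper name `iqr_outliers` and the input series are made up for illustration:

```python
import pandas as pd

def iqr_outliers(values: pd.Series, k: float = 1.5) -> tuple[float, float, int]:
    """Return (lower fence, upper fence, number of points outside the fences)."""
    q1, q3 = values.quantile([0.25, 0.75])
    iqr = q3 - q1
    # k = 1.5 matches matplotlib's default whisker length
    lower, upper = q1 - k * iqr, q3 + k * iqr
    n_out = int(((values < lower) | (values > upper)).sum())
    return lower, upper, n_out

purchases = pd.Series([10, 12, 14, 15, 16, 18, 20, 95])
print(iqr_outliers(purchases))
```

Calling `iqr_outliers(df['Purchase'])` avoids hard-coding the 21,399 threshold used above.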
3.2 Categorical Variables:
3.2.1 Gender, Marital Status and City Category Distribution:

[ ]: # setting the plot style
fig = plt.figure(figsize = (15,12))
gs = fig.add_gridspec(1,3)

# creating pie chart for gender distribution
ax0 = fig.add_subplot(gs[0,0])
color_map = ["#3A7089", "#4b4b4c"]
ax0.pie(df['Gender'].value_counts().values,labels = df['Gender'].value_counts().index,autopct = '%.1f%%',
        shadow = True,colors = color_map,textprops={'fontsize': 13, 'color': 'black'})
# setting title for visual
ax0.set_title('Gender Distribution',{'font':'serif', 'size':15,'weight':'bold'})

# creating pie chart for marital status
ax1 = fig.add_subplot(gs[0,1])
ax1.pie(df['Marital_Status'].value_counts().values,labels = df['Marital_Status'].value_counts().index,autopct = '%.1f%%',
        shadow = True,colors = color_map,textprops={'fontsize': 13, 'color': 'black'})
# setting title for visual
ax1.set_title('Marital Status Distribution',{'font':'serif', 'size':15,'weight':'bold'})

# creating pie chart for city category
ax2 = fig.add_subplot(gs[0,2])
color_map = ["#3A7089", "#4b4b4c",'#99AEBB']
ax2.pie(df['City_Category'].value_counts().values,labels = df['City_Category'].value_counts().index,autopct = '%.1f%%',
        shadow = True,colors = color_map,textprops={'fontsize': 13, 'color': 'black'})
# setting title for visual
ax2.set_title('City Category Distribution',{'font':'serif', 'size':15,'weight':'bold'})

plt.show()

Insights:
1. Gender Distribution - About 75% of transactions were made by male customers, indicating a significant disparity in purchase behavior between males and females during the Black Friday event.
2. Marital Status - Given that unmarried customers account for a higher percentage of transactions (59%), it may be worthwhile to consider specific marketing campaigns or promotions that appeal to this group.
3. City Category - City B saw the most transactions, followed by City C and City A respectively.
3.2.2 Customer Age Distribution
[ ]: # setting the plot style
fig = plt.figure(figsize = (15,7))
gs = fig.add_gridspec(1,2,width_ratios=[0.6, 0.4])

# creating bar chart for age distribution
ax0 = fig.add_subplot(gs[0,0])
temp = df['Age'].value_counts()
color_map = ["#3A7089", "#4b4b4c",'#99AEBB','#5C8374','#6F7597','#7A9D54','#9EB384']
ax0.bar(x=temp.index,height = temp.values,color = color_map,zorder = 2)

# adding the value_counts
for i in temp.index:
    ax0.text(i,temp[i]+5000,temp[i],{'font':'serif','size' : 10},ha = 'center',va = 'center')

# adding grid lines
ax0.grid(color = 'black',linestyle = '--', axis = 'y', zorder = 0, dashes = (5,10))

# removing the axis lines
for s in ['top','left','right']:
    ax0.spines[s].set_visible(False)

# adding axis label
ax0.set_ylabel('Count',fontweight = 'bold',fontsize = 12)
ax0.set_xlabel('Age Group',fontweight = 'bold',fontsize = 12)
ax0.set_xticklabels(temp.index,fontweight = 'bold')

# creating an info table for age
ax1 = fig.add_subplot(gs[0,1])
age_info = [['26-35','40%'],['36-45','20%'],['18-25','18%'],['46-50','8%'],['51-55','7%'],['55+','4%'],['0-17','3%']]
# one [row colour, text cell colour] pair per age group, matching color_map
color_2d = [["#3A7089",'#FFFFFF'],["#4b4b4c",'#FFFFFF'],['#99AEBB','#FFFFFF'],['#5C8374','#FFFFFF'],['#6F7597','#FFFFFF'],
            ['#7A9D54','#FFFFFF'],['#9EB384','#FFFFFF']]
table = ax1.table(cellText = age_info, cellColours=color_2d, cellLoc='center',colLabels =['Age Group','Percent Dist.'],
                  colLoc = 'center',bbox =[0, 0, 1, 1])
table.set_fontsize(15)
# removing axis
ax1.axis('off')
# setting title for visual
fig.suptitle('Customer Age Distribution',font = 'serif', size = 18, weight = 'bold')

plt.show()

Insights:
The 26-35 age group represents the largest share of Walmart’s Black Friday transactions, accounting for 40% of them. This suggests that young and middle-aged adults are the most active and interested in shopping for deals and discounts.
The 36-45 and 18-25 age groups are the second and third largest segments, with 20% and 18% of the transactions respectively. This indicates that Walmart has a diverse customer base that covers different life stages and preferences.
The 46-50, 51-55, 55+ and 0-17 age groups are the smallest customer segments, each with less than 10% of the total transactions. This implies that Walmart may need to improve its marketing strategies and product offerings to attract more customers from these age groups, especially seniors and children.
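The percentages in the age table (and in the stay table that follows) are typed in by hand, so they can drift from the data if the dataset changes. A sketch that builds the same `[label, 'NN%']` rows from `value_counts(normalize=True)`; the helper name `percent_table` and the toy series are illustrative assumptions:

```python
import pandas as pd

def percent_table(series: pd.Series) -> list[list[str]]:
    """Rows of [category, 'NN%'] sorted by frequency, ready for ax.table()."""
    shares = series.value_counts(normalize=True)
    return [[str(cat), f"{share:.0%}"] for cat, share in shares.items()]

ages = pd.Series(["26-35"] * 5 + ["36-45"] * 3 + ["18-25"] * 2)
print(percent_table(ages))
```

In the notebook this would become `age_info = percent_table(df['Age'])`, keeping the table consistent with the bar chart.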
3.2.3 Customer Stay in Current City Distribution
[ ]: # setting the plot style
fig = plt.figure(figsize = (15,7))
gs = fig.add_gridspec(1,2,width_ratios=[0.6, 0.4])

# creating bar chart for customer stay in current city
ax1 = fig.add_subplot(gs[0,0])
temp = df['Stay_In_Current_City_Years'].value_counts()
color_map = ["#3A7089", "#4b4b4c",'#99AEBB','#5C8374','#6F7597']
ax1.bar(x=temp.index,height = temp.values,color = color_map,zorder = 2,width = 0.6)

# adding the value_counts
for i in temp.index:
    ax1.text(i,temp[i]+4000,temp[i],{'font':'serif','size' : 10},ha = 'center',va = 'center')

# adding grid lines
ax1.grid(color = 'black',linestyle = '--', axis = 'y', zorder = 0, dashes = (5,10))

# removing the axis lines
for s in ['top','left','right']:
    ax1.spines[s].set_visible(False)

# adding axis label
ax1.set_ylabel('Count',fontweight = 'bold',fontsize = 12)
ax1.set_xlabel('Stay in Years',fontweight = 'bold',fontsize = 12)
ax1.set_xticklabels(temp.index,fontweight = 'bold')

# creating an info table for customer stay in current city
ax2 = fig.add_subplot(gs[0,1])
stay_info = [['1','35%'],['2','19%'],['3','17%'],['4+','15%'],['0','14%']]
# one [row colour, text cell colour] pair per stay category, matching color_map
color_2d = [["#3A7089",'#FFFFFF'],["#4b4b4c",'#FFFFFF'],['#99AEBB','#FFFFFF'],['#5C8374','#FFFFFF'],['#6F7597','#FFFFFF']]
table = ax2.table(cellText = stay_info, cellColours=color_2d, cellLoc='center',colLabels =['Stay in Years','Percent Dist.'],
                  colLoc = 'center',bbox =[0, 0, 1, 1])
table.set_fontsize(15)
# removing axis
ax2.axis('off')
# setting title for visual
fig.suptitle('Customer Current City Stay Distribution',font = 'serif', size = 18, weight = 'bold')

plt.show()
Insights:
The data suggests that many customers are either new to the city or move frequently, and they may have different preferences and needs than long-term residents.
Nearly half of the transactions (49%) come from customers who have stayed in their current city for one year or less. This suggests that Walmart has a strong appeal to newcomers who may be looking for affordable and convenient shopping options.
Customers in the 4+ years category (15%) indicate that Walmart also has a loyal customer base of people who have lived in the same city for a long time.
Beyond one year, the share of transactions decreases as the length of stay increases, which suggests that Walmart may benefit from targeting long-term residents with loyalty programs and promotions.
3.2.4 Top 10 Products and Categories:
Sales snapshot: the top 10 products and product categories sold during the Black Friday sales
[ ]: # setting the plot style
fig = plt.figure(figsize = (15,6))
gs = fig.add_gridspec(1,2)

# top 10 Product_ID sales
ax = fig.add_subplot(gs[0,0])
temp = df['Product_ID'].value_counts().head(10)
# reversing the order so the best seller is drawn at the top
temp = temp.iloc[-1:-11:-1]
color_map = ['#99AEBB' for i in range(7)] + ["#3A7089" for i in range(3)]
# string labels make matplotlib place the bars in rank order
labels = temp.index.astype(str)
# creating the plot
ax.barh(y = labels,width = temp.values,height = 0.2,color = color_map)
ax.scatter(y = labels, x = temp.values, s = 150 , color = color_map )
# removing x-axis
ax.set_xticks([])
# adding label to each bar
for y,x in zip(labels,temp.values):
    ax.text( x + 50 , y , x,{'font':'serif', 'size':10,'weight':'bold'},va='center')

# removing the axis lines
for s in ['top','bottom','right']:
    ax.spines[s].set_visible(False)

# adding axis labels
ax.set_xlabel('Number of Transactions',{'font':'serif', 'size':10,'weight':'bold'})
ax.set_ylabel('Product ID',{'font':'serif', 'size':12,'weight':'bold'})
# creating the title
ax.set_title('Top 10 Product_ID with Maximum Sales',{'font':'serif', 'size':15,'weight':'bold'})

# top 10 product category sales
ax = fig.add_subplot(gs[0,1])
temp = df['Product_Category'].value_counts().head(10)
# reversing the order so the best seller is drawn at the top
temp = temp.iloc[-1:-11:-1]
# category codes are integers: convert them to strings so they act as labels, not y-positions
labels = temp.index.astype(str)
# creating the plot
ax.barh(y = labels,width = temp.values,height = 0.2,color = color_map)
ax.scatter(y = labels, x = temp.values, s = 150 , color = color_map )
# removing x-axis
ax.set_xticks([])
# adding label to each bar
for y,x in zip(labels,temp.values):
    ax.text( x + 5000 , y , x,{'font':'serif', 'size':10,'weight':'bold'},va='center')

# removing the axis lines
for s in ['top','bottom','right']:
    ax.spines[s].set_visible(False)

# adding axis labels
ax.set_xlabel('Number of Transactions',{'font':'serif', 'size':12,'weight':'bold'})
ax.set_ylabel('Product Category',{'font':'serif', 'size':12,'weight':'bold'})
# creating the title
ax.set_title('Top 10 Product Category with Maximum Sales',{'font':'serif', 'size':15,'weight':'bold'})
plt.show()
Insights:
1. Top 10 Products Sold - The top-selling products during Walmart’s Black Friday sales show relatively little variation in transaction counts, suggesting that Walmart offers a variety of products that appeal to many different customers.
2. Top 10 Product Categories - Categories 5, 1 and 8 significantly outperformed the other categories, together accounting for nearly 75% of all transactions, which suggests a strong customer preference for these products.
3.2.5 Top 10 Customer Occupations
Top 10 occupations of customers in the Black Friday sales
[ ]: temp = df['Occupation'].value_counts().head(10)
# occupation codes are integers: use them as string labels so the bars stay in rank order
labels = temp.index.astype(str)
# setting the plot style
fig,ax = plt.subplots(figsize = (13,6))
color_map = ["#3A7089" for i in range(3)] + ['#99AEBB' for i in range(7)]
# creating the plot
ax.bar(labels,temp.values, color = color_map, zorder = 2)
# adding value counts
for x,y in zip(labels,temp.values):
    ax.text(x, y + 2000, y,{'font':'serif', 'size':10,'weight':'bold'},va='center',ha = 'center')

# setting grid style
ax.grid(color = 'black',linestyle = '--',axis = 'y',zorder = 0,dashes = (5,10))
# customizing the axis labels
ax.set_xticklabels(labels,fontweight = 'bold',fontfamily='serif')
ax.set_xlabel('Occupation Category',{'font':'serif', 'size':12,'weight':'bold'})
ax.set_ylabel('Count',{'font':'serif', 'size':12,'weight':'bold'})
# removing the axis lines
for s in ['top','left','right']:
    ax.spines[s].set_visible(False)

# adding title to the visual
ax.set_title('Top 10 Occupation of Customers',{'font':'serif', 'size':15,'weight':'bold'})
plt.show()

Insights:
Customers with occupation categories 4, 0 and 7 contributed significantly, accounting for almost 37% of all transactions. This suggests that these occupations have a high demand for Walmart products or services, or that they have more disposable income to spend on Black Friday.
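The combined shares quoted in this section (nearly 75% for product categories 5, 1 and 8; almost 37% for occupations 4, 0 and 7) are the sum of the top three normalised counts. A minimal sketch with made-up data; the helper name `top_k_share` is hypothetical:

```python
import pandas as pd

def top_k_share(series: pd.Series, k: int = 3) -> tuple[list, float]:
    """Return the k most frequent categories and their combined share of rows."""
    shares = series.value_counts(normalize=True)
    top = shares.head(k)
    return top.index.tolist(), float(top.sum())

categories = pd.Series([5, 5, 5, 5, 1, 1, 1, 8, 8, 3])
labels, share = top_k_share(categories)
print(labels, f"{share:.0%}")
```

The same call works for `df['Product_Category']` and `df['Occupation']`.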
4. Bivariate Analysis:
4.1 Exploring Purchase Patterns
[ ]: # setting the plot style
fig = plt.figure(figsize = (15,20))
gs = fig.add_gridspec(3,2)
for i,j,k in [(0,0,'Gender'),(0,1,'City_Category'),(1,0,'Marital_Status'),(1,1,'Stay_In_Current_City_Year
    # plot position
    if i <= 1:
        ax0 = fig.add_subplot(gs[i,j])
    else:
        ax0 = fig.add_subplot(gs[i,:])

    # plot
    color_map = ["#3A7089", "#4b4b4c",'#99AEBB','#5C8374','#6F7597','#7A9D54','#9EB384']
    sns.boxplot(data = df, x = k, y = 'Purchase' ,ax = ax0,width = 0.5, palette = color_map)

    # plot title
    ax0.set_title(f'Purchase Amount Vs {k}',{'font':'serif', 'size':12,'weight':'bold'})

    # customizing axis: reuse seaborn's own tick labels, since df[k].unique() is in order of
    # appearance while seaborn orders the boxes by category, which would mislabel them
    ax0.set_xticklabels(ax0.get_xticklabels(),fontweight = 'bold',fontsize = 12)
    ax0.set_ylabel('Purchase Amount',fontweight = 'bold',fontsize = 12)
    ax0.set_xlabel('')

plt.show()

Insights:
Across all the variables analysed above, the purchase amount remains relatively stable: the median purchase amount hovers around 8,000 USD regardless of the variable being examined.
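The "median around 8,000" observation can be checked numerically instead of by eye. A sketch that tabulates the median purchase per category for several columns at once; the helper `group_medians` is hypothetical, and the toy frame only mirrors the notebook's column names:

```python
import pandas as pd

def group_medians(frame: pd.DataFrame, cols: list[str], target: str = "Purchase") -> dict:
    """Median of `target` for each category of each column in `cols`."""
    return {
        # observed=True keeps only categories that actually occur
        col: frame.groupby(col, observed=True)[target].median().to_dict()
        for col in cols
    }

toy = pd.DataFrame({
    "Gender": ["M", "F", "M", "F"],
    "City_Category": ["A", "A", "B", "B"],
    "Purchase": [8000, 7900, 8100, 8050],
})
print(group_medians(toy, ["Gender", "City_Category"]))
```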

5. Gender vs Purchase Amount:
5.1 Data Visualization:
[ ]: # creating a df for purchase amount vs gender
temp = df.groupby('Gender', observed=True)['Purchase'].agg(['sum','count']).reset_index()
# calculating the amount in billions
temp['sum_in_billions'] = round(temp['sum'] / 10**9,2)
# calculating percentage distribution of purchase amount
temp['%sum'] = round(temp['sum']/temp['sum'].sum(),3)
# calculating average purchase amount per transaction
temp['per_purchase'] = round(temp['sum']/temp['count'])
# renaming the gender labels (cast to str first, since Gender is categorical)
temp['Gender'] = temp['Gender'].astype(str).replace({'F':'Female','M':'Male'})
temp

[ ]: Gender sum count sum_in_billions %sum per_purchase


0 Female 1186232642 135809 1.19 0.233 8735.0
1 Male 3909580100 414259 3.91 0.767 9438.0

[ ]: # setting the plot style
fig = plt.figure(figsize = (15,14))
gs = fig.add_gridspec(3,2,height_ratios =[0.10,0.4,0.5])

# distribution of purchase amount (a single stacked horizontal bar)
ax = fig.add_subplot(gs[0,:])
ax.barh(temp.loc[0,'Gender'],width = temp.loc[0,'%sum'],color = "#3A7089",label = 'Female')
ax.barh(temp.loc[0,'Gender'],width = temp.loc[1,'%sum'],left =temp.loc[0,'%sum'], color = "#4b4b4c",label = 'Male' )

# inserting the text
offset = 0.0 # running left edge of each segment, used to centre the labels
for i in temp.index:
    # for amount
    ax.text(temp.loc[i,'%sum']/2 + offset,0.15,f"${temp.loc[i,'sum_in_billions']} Billion",
            va = 'center', ha='center',fontsize=18, color='white')
    # for gender
    ax.text(temp.loc[i,'%sum']/2 + offset,- 0.20 ,f"{temp.loc[i,'Gender']}",
            va = 'center', ha='center',fontsize=14, color='white')
    offset += temp.loc[i,'%sum']

# removing the axis lines
for s in ['top','left','right','bottom']:
    ax.spines[s].set_visible(False)

# customizing ticks
ax.set_xticks([])
ax.set_yticks([])
ax.set_xlim(0,1)
# plot title
ax.set_title('Gender-Based Purchase Amount Distribution',{'font':'serif', 'size':15,'weight':'bold'})

# distribution of purchase amount per transaction
ax1 = fig.add_subplot(gs[1,0])
color_map = ["#3A7089", "#4b4b4c"]
ax1.bar(temp['Gender'],temp['per_purchase'],color = color_map,zorder = 2,width = 0.3)

# adding average transaction line
avg = round(df['Purchase'].mean())
ax1.axhline(y = avg, color ='red', zorder = 0,linestyle = '--')
# adding text for the line
ax1.text(0.4,avg + 300, f"Avg. Transaction Amount ${avg:.0f}",
         {'font':'serif','size' : 12},ha = 'center',va = 'center')
# adjusting the y limits
ax1.set_ylim(0,11000)
# adding the per-transaction values inside the bars
for i in temp.index:
    ax1.text(temp.loc[i,'Gender'],temp.loc[i,'per_purchase']/2,f"${temp.loc[i,'per_purchase']:.0f}",
             {'font':'serif','size' : 12,'color':'white','weight':'bold' },ha = 'center',va = 'center')

# adding grid lines
ax1.grid(color = 'black',linestyle = '--', axis = 'y', zorder = 0, dashes = (5,10))

# removing the axis lines
for s in ['top','left','right']:
    ax1.spines[s].set_visible(False)

# adding axis label
ax1.set_ylabel('Purchase Amount',fontweight = 'bold',fontsize = 12)
ax1.set_xticklabels(temp['Gender'],fontweight = 'bold',fontsize = 12)
# setting title for visual
ax1.set_title('Average Purchase Amount per Transaction',{'font':'serif', 'size':15,'weight':'bold'})

# creating pie chart for gender distribution
ax2 = fig.add_subplot(gs[1,1])
ax2.pie(temp['count'],labels = temp['Gender'],autopct = '%.1f%%',
        shadow = True,colors = color_map,wedgeprops = {'linewidth': 5},textprops={'fontsize': 13, 'color': 'black'})
# setting title for visual
ax2.set_title('Gender-Based Transaction Distribution',{'font':'serif', 'size':15,'weight':'bold'})

# creating kdeplot for purchase amount distribution
ax3 = fig.add_subplot(gs[2,:])
sns.kdeplot(data = df, x = 'Purchase', hue = 'Gender', palette = color_map,fill = True, alpha = 1,ax = ax3)

# removing the axis lines
for s in ['top','left','right']:
    ax3.spines[s].set_visible(False)

# adjusting axis labels
ax3.set_yticks([])
ax3.set_ylabel('')
ax3.set_xlabel('Purchase Amount',fontweight = 'bold',fontsize = 12)
# setting title for visual
ax3.set_title('Purchase Amount Distribution by Gender',{'font':'serif', 'size':15,'weight':'bold'})

plt.show()
Insights:
1. Total Sales and Transactions Comparison - The total purchase amount and the number of transactions by male customers were each more than three times those of female customers ($3.91 billion vs $1.19 billion; 414,259 vs 135,809 transactions), indicating that male customers had a much larger impact on the Black Friday sales.
2. Average Transaction Value - The average purchase amount per transaction was slightly higher for male customers than for female customers ($9,438 vs $8,735).
3. Distribution of Purchase Amount - As seen above, the purchase amount is not normally distributed for either gender.
5.2 Confidence Interval Construction: Estimating Average Purchase Amount per Transaction
1. Step 1 - Building the CLT Curve: As seen above, the purchase amount distribution is not normal, so we rely on the Central Limit Theorem (CLT). It states that, for a sufficiently large sample size, the distribution of sample means approximates a normal distribution regardless of the underlying population distribution (provided the population has a finite variance).
2. Step 2 - Building the Confidence Interval: After building the CLT curve, we will construct confidence intervals for the population mean at the 90%, 95% and 99% confidence levels.
Note - We will use different sample sizes of [100,1000,5000,50000]
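The two steps can be sketched compactly as a vectorised variant of the percentile bootstrap used in the next cells: draw many samples of size n with replacement, take their means, and read the interval off the percentiles of those means. By the CLT the result should be close to the normal-approximation interval x̄ ± z·s/√n. The function names, the synthetic exponential "purchases" and the fixed seeds are assumptions for illustration only:

```python
import numpy as np
from statistics import NormalDist

def bootstrap_mean_ci(data, n, ci=95, n_boot=2_000, seed=0):
    """Percentile CI for the mean of a sample of size n (bootstrap)."""
    rng = np.random.default_rng(seed)
    # each row is one bootstrap sample of size n, drawn with replacement
    samples = rng.choice(np.asarray(data), size=(n_boot, n), replace=True)
    means = samples.mean(axis=1)
    lo, hi = np.percentile(means, [(100 - ci) / 2, (100 + ci) / 2])
    return lo, hi

def normal_mean_ci(data, n, ci=95):
    """CLT-based interval: mean +/- z * std / sqrt(n)."""
    data = np.asarray(data, dtype=float)
    z = NormalDist().inv_cdf(0.5 + ci / 200)  # e.g. 1.96 for ci=95
    half = z * data.std(ddof=1) / np.sqrt(n)
    return data.mean() - half, data.mean() + half

rng = np.random.default_rng(42)
purchases = rng.exponential(scale=9000, size=20_000)  # skewed, non-normal
print(bootstrap_mean_ci(purchases, n=1000))
print(normal_mean_ci(purchases, n=1000))
```

Drawing all samples in one `rng.choice` call replaces the inner Python loop; the trade-off is memory, since an `(n_boot, n)` array is held at once.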

[55]: # creating a function to calculate confidence interval
def confidence_interval(data,ci):
    # percentiles that leave (100 - ci)/2 % in each tail
    l_ci = (100-ci)/2
    u_ci = (100+ci)/2

    # calculating lower limit and upper limit of confidence interval
    interval = np.percentile(data,[l_ci,u_ci]).round(0)

    return interval

[77]: # defining a function for plotting the visual for given confidence interval
def plot(ci):
    # setting the plot style
    fig = plt.figure(figsize = (15,8))
    gs = fig.add_gridspec(2,2)
    # creating separate series for each gender
    df_male = df.loc[df['Gender'] == 'M','Purchase']
    df_female = df.loc[df['Gender'] == 'F','Purchase']
    # sample sizes and corresponding plot positions
    sample_sizes = [(100,0,0),(1000,0,1),(5000,1,0),(50000,1,1)]
    # number of bootstrap samples to draw for each sample size
    bootstrap_samples = 20000
    male_samples = {}
    female_samples = {}

    for i,x,y in sample_sizes:
        male_means = [] # list for collecting the means of male samples
        female_means = [] # list for collecting the means of female samples
        for j in range(bootstrap_samples):
            # drawing one random sample (with replacement) of size i
            male_bootstrapped_samples = np.random.choice(df_male,size = i)
            female_bootstrapped_samples = np.random.choice(df_female,size = i)
            # calculating mean of those samples
            male_sample_mean = np.mean(male_bootstrapped_samples)
            female_sample_mean = np.mean(female_bootstrapped_samples)
            # appending the mean to the list
            male_means.append(male_sample_mean)
            female_means.append(female_sample_mean)

        # storing the sample means generated above
        male_samples[f'{ci}%_{i}'] = male_means
        female_samples[f'{ci}%_{i}'] = female_means

        # creating a temporary dataframe for creating kdeplot
        temp_df = pd.DataFrame(data = {'male_means':male_means,'female_means':female_means})

        # plot position
        ax = fig.add_subplot(gs[x,y])

        # plots for male and female
        sns.kdeplot(data = temp_df,x = 'male_means',color ="#3A7089" ,fill = True, alpha = 0.5,ax = ax,label = 'Male')
        sns.kdeplot(data = temp_df,x = 'female_means',color ="#4b4b4c" ,fill = True, alpha = 0.5,ax = ax,label = 'Female')

        # calculating confidence intervals for given confidence level (ci)
        m_range = confidence_interval(male_means,ci)
        f_range = confidence_interval(female_means,ci)
        # plotting confidence interval on the distribution
        for k in m_range:
            ax.axvline(x = k,ymax = 0.9, color ="#3A7089",linestyle = '--')
        for k in f_range:
            ax.axvline(x = k,ymax = 0.9, color ="#4b4b4c",linestyle = '--')
        # removing the axis lines
        for s in ['top','left','right']:
            ax.spines[s].set_visible(False)
        # adjusting axis labels
        ax.set_yticks([])
        ax.set_ylabel('')
        ax.set_xlabel('')
        # setting title for visual
        ax.set_title(f'CLT Curve for Sample Size = {i}',{'font':'serif', 'size':11,'weight':'bold'})
        ax.legend()

    # setting title for visual
    fig.suptitle(f'{ci}% Confidence Interval',font = 'serif', size = 18, weight = 'bold')

    plt.show()

    return male_samples,female_samples

[78]: m_samp_90,f_samp_90 = plot(90)

[79]: m_samp_95,f_samp_95 = plot(95)

[80]: m_samp_99,f_samp_99 = plot(99)
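The `confidence_interval` helper called inside `plot` is defined earlier in the notebook and is not shown in this section. As a hedged, self-contained sketch, the percentile method below is one common way to turn a list of bootstrapped means into an interval, and it is compared against the CLT approximation mean ± z·s/√n. The function name `percentile_ci`, the synthetic exponential "purchases" and the parameter values are assumptions for illustration, not the notebook's own implementation or data.

```python
import numpy as np
from scipy.stats import norm

def percentile_ci(means, ci):
    """Percentile-method interval: trim (100 - ci) / 2 percent from each tail."""
    tail = (100 - ci) / 2
    lower, upper = np.percentile(means, [tail, 100 - tail])
    return lower, upper

rng = np.random.default_rng(42)
# Synthetic, right-skewed "purchase amounts" (assumption: not the Walmart data)
population = rng.exponential(scale=9000, size=50_000)

sample_size, n_boot, ci = 1000, 2000, 95
# Each row is one resample drawn with replacement; average along each row
boot_means = rng.choice(population, size=(n_boot, sample_size)).mean(axis=1)
boot_lo, boot_hi = percentile_ci(boot_means, ci)

# CLT approximation: mean +/- z * sd / sqrt(n)
z = norm.ppf(1 - (100 - ci) / 200)
se = population.std() / np.sqrt(sample_size)
clt_lo, clt_hi = population.mean() - z * se, population.mean() + z * se

print(f"bootstrap: {boot_lo:.0f} - {boot_hi:.0f}")
print(f"CLT:       {clt_lo:.0f} - {clt_hi:.0f}")
```

The two intervals should come out close to each other, which is what the CLT predicts for sample means even when the underlying purchase amounts are skewed.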

Are confidence intervals of average male and female spending overlapping?


[83]: fig = plt.figure(figsize = (20,10))
gs = fig.add_gridspec(3,1)
for i,j,k,l in [(m_samp_90,f_samp_90,90,0),(m_samp_95,f_samp_95,95,1),(m_samp_99,f_samp_99,99,2)]:

    #lists for collecting the CIs at the given confidence level
    m_ci = ['Male']
    f_ci = ['Female']

    #finding CI for each sample size (males)
    for m in i:
        m_range = confidence_interval(i[m],k)
        m_ci.append(f"CI = ${m_range[0]:.0f} - ${m_range[1]:.0f}, Range = {(m_range[1] - m_range[0]):.0f}")

    #finding CI for each sample size (females)
    for f in j:
        f_range = confidence_interval(j[f],k)
        f_ci.append(f"CI = ${f_range[0]:.0f} - ${f_range[1]:.0f}, Range = {(f_range[1] - f_range[0]):.0f}")

    #plotting the summary
    ax = fig.add_subplot(gs[l])

    #contents of the table
    ci_info = [m_ci,f_ci]

    #plotting the table
    table = ax.table(cellText = ci_info, cellLoc='center',
                     colLabels =['Gender','Sample Size = 100','Sample Size = 1000','Sample Size = 5000','Sample Size = 50000'],
                     colLoc = 'center',colWidths = [0.05,0.2375,0.2375,0.2375,0.2375],bbox =[0, 0, 1, 1])

    table.set_fontsize(13)
    #removing axis
    ax.axis('off')

    #setting title
    ax.set_title(f"{k}% Confidence Interval Summary",{'font':'serif', 'size':14,'weight':'bold'})
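The summary table answers the overlap question by eye. A minimal sketch of the same check in code, assuming each interval is a `(lower, upper)` tuple (the helper names `intervals_overlap` and `gap_between` are hypothetical, not part of the notebook), using the 95% bounds quoted in the insights:

```python
def intervals_overlap(a, b):
    """True if the closed intervals a = (lo, hi) and b = (lo, hi) share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]

def gap_between(a, b):
    """Distance between two disjoint intervals (0.0 if they overlap)."""
    return max(0.0, max(a[0], b[0]) - min(a[1], b[1]))

male_95 = (9393, 9483)    # 95% CI bounds for males, from the insights
female_95 = (8692, 8777)  # 95% CI bounds for females, from the insights
print(intervals_overlap(male_95, female_95))  # → False
print(gap_between(male_95, female_95))        # → 616
```

One caveat worth keeping in mind: two non-overlapping 95% intervals do indicate a significant difference, but overlapping intervals do not by themselves prove the absence of one.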

Insights:
1. Sample Size: The analysis highlights the importance of sample size when estimating population parameters. As the sample size increases, the confidence intervals become narrower and more precise. In business terms, larger samples provide more reliable insights and estimates.
2. Confidence Intervals: Except at a sample size of 100, the male and female confidence intervals do not overlap. This indicates a statistically significant difference between the average spending per transaction of men and women within the given samples.
3. Population Average: We are 95% confident that the true population average lies between $9,393 and $9,483 for males and between $8,692 and $8,777 for females.
4. Women spend less: Men tend to spend more per transaction on average than women, as the upper bounds of the male confidence intervals are consistently higher than those of the female intervals across all sample sizes.
5. How can Walmart leverage this conclusion to make changes or improvements?
5.1. Segmentation Opportunities: Walmart can create targeted marketing campaigns, loyalty programs, or product bundles that cater to the distinct spending behaviors of male and female customers. This approach may help maximize revenue from each customer segment.
5.2. Pricing Strategies: Based on the average spending per transaction by gender, Walmart might adjust pricing or discount strategies to incentivize higher spending among male customers while keeping prices competitive for female-oriented products.
Note: From here on, the analysis uses the 95% confidence level only.
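Insight 1 can be made quantitative. Under the CLT, a z-interval for the mean has width 2·z·σ/√n, so it shrinks with the square root of the sample size rather than linearly. The sketch below assumes a hypothetical σ = 5000 (not estimated from the Walmart data); the ratio between widths does not depend on that choice.

```python
import numpy as np
from scipy.stats import norm

sigma = 5000  # hypothetical population standard deviation (assumption)
sizes = [100, 1000, 5000, 50000]  # sample sizes used in the analysis above

for ci in (90, 95, 99):
    z = norm.ppf(1 - (100 - ci) / 200)  # two-sided critical value
    widths = {n: 2 * z * sigma / np.sqrt(n) for n in sizes}
    print(ci, {n: round(w) for n, w in widths.items()})

shrink = np.sqrt(50000 / 100)  # width ratio between n = 100 and n = 50,000
print(f"width at n=100 is {shrink:.1f}x the width at n=50,000")  # → 22.4x
```

Going from 100 to 50,000 observations (500 times more data) therefore narrows the interval only about 22-fold.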
6. Marital Status vs Purchase Amount:
6.1. Data Visualization
[84]: #creating a df for purchase amount vs marital status
temp = df.groupby('Marital_Status')['Purchase'].agg(['sum','count']).reset_index()

#calculating the amount in billions
temp['sum_in_billions'] = round(temp['sum'] / 10**9,2)
#calculating each group's share (fraction) of the total purchase amount
temp['%sum'] = round(temp['sum']/temp['sum'].sum(),3)
#calculating the average amount per purchase
temp['per_purchase'] = round(temp['sum']/temp['count'])
temp

[84]:   Marital_Status         sum   count  sum_in_billions  %sum  per_purchase
      0      Unmarried  3008927447  324731             3.01  0.59        9266.0
      1        Married  2086885295  225337             2.09  0.41        9261.0
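The same summary can be built in a single chain with pandas named aggregation and `assign`. This is a minimal sketch on a made-up five-row frame (`df_demo` is hypothetical, not the Walmart data); the share column is called `share` here because `%sum` is not a valid keyword argument name.

```python
import pandas as pd

# Toy transactions (assumption: made-up values, not the Walmart data)
df_demo = pd.DataFrame({
    "Marital_Status": ["Unmarried", "Married", "Unmarried", "Married", "Unmarried"],
    "Purchase": [8000, 9000, 10000, 11000, 12000],
})

summary = (
    df_demo.groupby("Marital_Status")["Purchase"]
    .agg(sum="sum", count="count")  # named aggregation -> columns 'sum' and 'count'
    .reset_index()
    .assign(
        sum_in_billions=lambda d: (d["sum"] / 1e9).round(2),
        share=lambda d: (d["sum"] / d["sum"].sum()).round(3),  # fraction of total
        per_purchase=lambda d: (d["sum"] / d["count"]).round(),
    )
)
print(summary)
```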

[85]: #setting the plot style
fig = plt.figure(figsize = (15,14))
gs = fig.add_gridspec(3,2,height_ratios =[0.10,0.4,0.5])

#Distribution of Purchase Amount
ax = fig.add_subplot(gs[0,:])
#plotting the visual: both segments are stacked on the same bar
ax.barh(temp.loc[0,'Marital_Status'],width = temp.loc[0,'%sum'],color = "#3A7089",label = 'Unmarried')
ax.barh(temp.loc[0,'Marital_Status'],width = temp.loc[1,'%sum'],left =temp.loc[0,'%sum'], color = "#4b4b4c",label = 'Married')

#inserting the text
txt = 0.0 #running left offset for ax.text()
for i in temp.index:
    #for amount
    ax.text(temp.loc[i,'%sum']/2 + txt,0.15,f"${temp.loc[i,'sum_in_billions']} Billion",
            va = 'center', ha='center',fontsize=18, color='white')

    #for marital status
    ax.text(temp.loc[i,'%sum']/2 + txt,- 0.20 ,f"{temp.loc[i,'Marital_Status']}",
            va = 'center', ha='center',fontsize=14, color='white')

    txt += temp.loc[i,'%sum']

#removing the axis lines
for s in ['top','left','right','bottom']:
    ax.spines[s].set_visible(False)

#customizing ticks
ax.set_xticks([])
ax.set_yticks([])
ax.set_xlim(0,1)
#plot title
ax.set_title('Marital_Status-Based Purchase Amount Distribution',{'font':'serif', 'size':15,'weight':'bold'})

#Distribution of Purchase Amount per Transaction
ax1 = fig.add_subplot(gs[1,0])
color_map = ["#3A7089", "#4b4b4c"]
#plotting the visual
ax1.bar(temp['Marital_Status'],temp['per_purchase'],color = color_map,zorder = 2,width = 0.3)

#adding average transaction line
avg = round(df['Purchase'].mean())
ax1.axhline(y = avg, color ='red', zorder = 0,linestyle = '--')
#adding text for the line
ax1.text(0.4,avg + 300, f"Avg. Transaction Amount ${avg:.0f}",
         {'font':'serif','size' : 12},ha = 'center',va = 'center')
#adjusting the ylimits
ax1.set_ylim(0,11000)
#adding the value labels
for i in temp.index:
    ax1.text(temp.loc[i,'Marital_Status'],temp.loc[i,'per_purchase']/2,f"${temp.loc[i,'per_purchase']:.0f}",
             {'font':'serif','size' : 12,'color':'white','weight':'bold' },ha = 'center',va = 'center')

#adding grid lines
ax1.grid(color = 'black',linestyle = '--', axis = 'y', zorder = 0, dashes = (5,10))

#removing the axis lines
for s in ['top','left','right']:
    ax1.spines[s].set_visible(False)

#adding axis label
ax1.set_ylabel('Purchase Amount',fontweight = 'bold',fontsize = 12)
ax1.set_xticklabels(temp['Marital_Status'],fontweight = 'bold',fontsize = 12)
#setting title for visual
ax1.set_title('Average Purchase Amount per Transaction',{'font':'serif', 'size':15,'weight':'bold'})

# creating pie chart for Marital_Status distribution
ax2 = fig.add_subplot(gs[1,1])
color_map = ["#3A7089", "#4b4b4c"]
ax2.pie(temp['count'],labels = temp['Marital_Status'],autopct = '%.1f%%',
        shadow = True,colors = color_map,wedgeprops = {'linewidth': 5},textprops={'fontsize': 13, 'color': 'black'})

#setting title for visual
ax2.set_title('Marital_Status-Based Transaction Distribution',{'font':'serif', 'size':15,'weight':'bold'})

# creating kdeplot for purchase amount distribution
ax3 = fig.add_subplot(gs[2,:])
color_map = [ "#4b4b4c","#3A7089"]
#plotting the kdeplot
sns.kdeplot(data = df, x = 'Purchase', hue = 'Marital_Status', palette = color_map,fill = True, alpha = 1,
            ax = ax3,hue_order = ['Married','Unmarried'])
#removing the axis lines
for s in ['top','left','right']:
    ax3.spines[s].set_visible(False)

# adjusting axis labels
ax3.set_yticks([])
ax3.set_ylabel('')
ax3.set_xlabel('Purchase Amount',fontweight = 'bold',fontsize = 12)
#setting title for visual
ax3.set_title('Purchase Amount Distribution by Marital_Status',{'font':'serif', 'size':15,'weight':'bold'})

plt.show()
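The stacked bar above places each label at half a segment's width plus a running offset that is updated inside the loop. A minimal sketch of the same geometry computed up front with a cumulative sum (the helper `segment_centers` is hypothetical), using the `%sum` shares from the table:

```python
import numpy as np

def segment_centers(shares):
    """Return (lefts, centers) for the segments of one stacked horizontal bar."""
    shares = np.asarray(shares, dtype=float)
    # left edge of segment k = sum of all shares before it
    lefts = np.concatenate(([0.0], np.cumsum(shares)[:-1]))
    # label position = left edge + half the segment width
    centers = lefts + shares / 2
    return lefts, centers

# Shares from the %sum column above (Unmarried, Married)
lefts, centers = segment_centers([0.59, 0.41])
print(lefts, centers)
```

`lefts` can be passed as the `left=` argument of a single `ax.barh` call and `centers` used as the x-positions for `ax.text`, which removes the need for a manually updated offset variable.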

Insights:
1. Total Sales and Transactions Comparison: Unmarried customers accounted for about 59% of both the total purchase amount ($3.01B of $5.10B) and the number of transactions (324,731 of 550,068), roughly 44% more than married customers on each measure. They therefore had a larger impact on the Black Friday sales.
2. Average Transaction Value: The average purchase amount per transaction was almost identical for married and unmarried customers ($9,261 vs $9,266).
3. Distribution of Purchase Amount: As seen above, the purchase amount for both married and unmarried customers is not normally distributed.
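Whether a $5 gap in average transaction value is meaningful can be checked with the same bootstrap idea used for gender, applied to the difference of means. This is a hedged sketch: the helper `bootstrap_diff_ci` is hypothetical, and the two groups are synthetic exponential stand-ins whose means are set to the per-purchase values in the table, with much smaller sample sizes than the real data.

```python
import numpy as np

def bootstrap_diff_ci(a, b, n_boot=2000, ci=95, seed=0):
    """Percentile bootstrap CI for mean(a) - mean(b), resampling each group independently."""
    rng = np.random.default_rng(seed)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    diffs = np.empty(n_boot)
    for k in range(n_boot):
        # resample each group with replacement at its own size
        diffs[k] = rng.choice(a, size=a.size).mean() - rng.choice(b, size=b.size).mean()
    tail = (100 - ci) / 2
    lower, upper = np.percentile(diffs, [tail, 100 - tail])
    return a.mean() - b.mean(), lower, upper

rng = np.random.default_rng(1)
unmarried = rng.exponential(9266, size=3000)  # synthetic stand-in (assumption)
married = rng.exponential(9261, size=2000)    # synthetic stand-in (assumption)
point, lo, hi = bootstrap_diff_ci(unmarried, married)
print(f"difference = {point:.0f}, 95% CI = {lo:.0f} to {hi:.0f}")
```

If the interval for the difference contains 0, the data give no evidence that the two groups differ in average transaction value.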
7. Customer Age vs Purchase Amount:
7.1. Data Visualization
[86]: #creating a df for purchase amount vs age group
temp = df.groupby('Age')['Purchase'].agg(['sum','count']).reset_index()

#calculating the amount in billions
temp['sum_in_billions'] = round(temp['sum'] / 10**9,2)
#calculating each group's share (fraction) of the total purchase amount
temp['%sum'] = round(temp['sum']/temp['sum'].sum(),3)
#calculating the average amount per purchase
temp['per_purchase'] = round(temp['sum']/temp['count'])
temp

[86]:      Age         sum   count  sum_in_billions   %sum  per_purchase
      0   0-17   134913183   15102             0.13  0.026        8933.0
      1  18-25   913848675   99660             0.91  0.179        9170.0
      2  26-35  2031770578  219587             2.03  0.399        9253.0
      3  36-45  1026569884  110013             1.03  0.201        9331.0
      4  46-50   420843403   45701             0.42  0.083        9209.0
      5  51-55   367099644   38501             0.37  0.072        9535.0
      6    55+   200767375   21504             0.20  0.039        9336.0
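The shares in this table can be combined to see how much of total sales the 26-45 bracket accounts for. The sketch below rebuilds the table from the printed `sum` and `count` values (copied from the output above, not recomputed from the raw data) and sums the two middle groups.

```python
import pandas as pd

# Totals copied from the [86] output above
age_summary = pd.DataFrame({
    "Age": ["0-17", "18-25", "26-35", "36-45", "46-50", "51-55", "55+"],
    "sum": [134913183, 913848675, 2031770578, 1026569884, 420843403, 367099644, 200767375],
    "count": [15102, 99660, 219587, 110013, 45701, 38501, 21504],
})
age_summary["share"] = age_summary["sum"] / age_summary["sum"].sum()
age_summary["per_purchase"] = (age_summary["sum"] / age_summary["count"]).round()

core = age_summary["Age"].isin(["26-35", "36-45"])
core_share = age_summary.loc[core, "share"].sum()
print(f"26-45 share of total sales: {core_share:.1%}")  # → 60.0%

top = age_summary.loc[age_summary["per_purchase"].idxmax(), "Age"]
print(f"highest spend per purchase: {top}")  # → 51-55
```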

[87]: #setting the plot style
fig = plt.figure(figsize = (20,14))
gs = fig.add_gridspec(3,1,height_ratios =[0.10,0.4,0.5])

#Distribution of Purchase Amount
ax = fig.add_subplot(gs[0])
color_map = ["#3A7089", "#4b4b4c",'#99AEBB','#5C8374','#6F7597','#7A9D54','#9EB384']

#plotting the visual: one stacked segment per age group
left = 0
for i in temp.index:
    ax.barh(temp.loc[0,'Age'],width = temp.loc[i,'%sum'],left = left,color = color_map[i],label = temp.loc[i,'Age'])
    left += temp.loc[i,'%sum']

#inserting the text
txt = 0.0 #running left offset for ax.text()
for i in temp.index:
    #for amount
    ax.text(temp.loc[i,'%sum']/2 + txt,0.15,f"{temp.loc[i,'sum_in_billions']}B",
            va = 'center', ha='center',fontsize=14, color='white')

    #for age grp
    ax.text(temp.loc[i,'%sum']/2 + txt,- 0.20 ,f"{temp.loc[i,'Age']}",
            va = 'center', ha='center',fontsize=12, color='white')

    txt += temp.loc[i,'%sum']

#removing the axis lines
for s in ['top','left','right','bottom']:
    ax.spines[s].set_visible(False)

#customizing ticks
ax.set_xticks([])
ax.set_yticks([])
ax.set_xlim(0,1)
#plot title
ax.set_title('Age Group Purchase Amount Distribution',{'font':'serif', 'size':15,'weight':'bold'})

#Distribution of Purchase Amount per Transaction
ax1 = fig.add_subplot(gs[1])
#plotting the visual
ax1.bar(temp['Age'],temp['per_purchase'],color = color_map,zorder = 2,width = 0.3)

#adding average transaction line
avg = round(df['Purchase'].mean())
ax1.axhline(y = avg, color ='red', zorder = 0,linestyle = '--')
#adding text for the line
ax1.text(0.4,avg + 300, f"Avg. Transaction Amount ${avg:.0f}",
         {'font':'serif','size' : 12},ha = 'center',va = 'center')
#adjusting the ylimits
ax1.set_ylim(0,11000)
#adding the value labels
for i in temp.index:
    ax1.text(temp.loc[i,'Age'],temp.loc[i,'per_purchase']/2,f"${temp.loc[i,'per_purchase']:.0f}",
             {'font':'serif','size' : 12,'color':'white','weight':'bold' },ha = 'center',va = 'center')

#adding grid lines
ax1.grid(color = 'black',linestyle = '--', axis = 'y', zorder = 0, dashes = (5,10))

#removing the axis lines
for s in ['top','left','right']:
    ax1.spines[s].set_visible(False)

#adding axis label
ax1.set_ylabel('Purchase Amount',fontweight = 'bold',fontsize = 12)
ax1.set_xticklabels(temp['Age'],fontweight = 'bold',fontsize = 12)
#setting title for visual
ax1.set_title('Average Purchase Amount per Transaction',{'font':'serif', 'size':15,'weight':'bold'})

# creating kdeplot for purchase amount distribution
ax3 = fig.add_subplot(gs[2,:])
#plotting the kdeplot
sns.kdeplot(data = df, x = 'Purchase', hue = 'Age', palette = color_map,fill = True, alpha = 0.5,
            ax = ax3)
#removing the axis lines
for s in ['top','left','right']:
    ax3.spines[s].set_visible(False)

# adjusting axis labels
ax3.set_yticks([])
ax3.set_ylabel('')
ax3.set_xlabel('Purchase Amount',fontweight = 'bold',fontsize = 12)
#setting title for visual
ax3.set_title('Purchase Amount Distribution by Age Group',{'font':'serif', 'size':15,'weight':'bold'})

plt.show()

Insights:
1. Total Sales Comparison: The 26-35 and 36-45 age groups together account for about 60% of total sales (39.9% + 20.1%), suggesting that Walmart's Black Friday sales are most popular among customers aged 26-45. The 0-17 age group has the lowest share of sales (2.6%), which is expected since they likely have less purchasing power. Understanding their preferences
2. Average Transaction Value: Per-transaction spending varies only modestly across age groups, from $8,933 (0-17) to $9,535 (51-55). The 51-55 age group has a relatively low share of sales (7.2%) but the highest spending per purchase, at $9,535. Walmart could consider strategies to attract and retain this high-spending demographic.
3. Distribution of Purchase Amount: As seen above, the purchase amount for all age groups is not normally distributed.
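The "not normally distributed" observations rest on the KDE plots. A hedged numeric complement is to compute the sample skewness and run SciPy's D'Agostino-Pearson `normaltest`. The sketch below uses synthetic right-skewed data as an assumption; on the real data it would be applied to `df['Purchase']` or to one group at a time.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Synthetic right-skewed amounts (assumption: not the Walmart data)
amounts = rng.exponential(scale=9000, size=10_000)

skewness = stats.skew(amounts)               # about 0 for a symmetric distribution
stat, p_value = stats.normaltest(amounts)    # H0: the sample comes from a normal distribution
print(f"skew = {skewness:.2f}, normaltest p-value = {p_value:.3g}")
```

With hundreds of thousands of rows, `normaltest` will reject normality for even small departures, so the skewness value and the plots are usually more informative than the p-value alone.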
******

