Exploratory Data Analysis
[3]: df = pd.read_csv("/kaggle/input/customer-shopping-dataset/customer_shopping_data.csv")
[4]: df.head()
[5]: df.tail()
99456 Credit Card 15/10/2022 Mall of Istanbul
[6]: df.shape
[7]: df.size
[7]: 994570
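The value returned by `df.size` is the total number of cells, i.e. rows times columns; for this dataset that is 99457 × 10 = 994570. A minimal sketch of the relationship, using a small made-up frame (`toy` is a hypothetical stand-in, not the shopping data):

```python
import pandas as pd

# Toy frame standing in for the real data (values are made up).
toy = pd.DataFrame({"quantity": [1, 2, 3], "price": [10.0, 20.0, 30.0]})

n_rows, n_cols = toy.shape
# size counts every cell, so it should equal rows * columns.
total_cells = n_rows * n_cols
print(toy.shape, toy.size, total_cells)
```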
[8]: df.isnull().sum()
[8]: invoice_no 0
customer_id 0
gender 0
age 0
category 0
quantity 0
price 0
payment_method 0
invoice_date 0
shopping_mall 0
dtype: int64
[9]: df.duplicated().value_counts()
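As a rough illustration of what the duplicate check reports, the sketch below runs `duplicated()` on a small made-up frame (`toy` and its values are hypothetical). A row is flagged `True` only when an identical row has already appeared earlier, so `value_counts()` on the mask gives the number of unique versus repeated rows.

```python
import pandas as pd

# Hypothetical sample: the third row repeats the second exactly.
toy = pd.DataFrame({"invoice_no": ["I1", "I2", "I2"], "quantity": [1, 5, 5]})

dup_mask = toy.duplicated()          # True for second and later copies of a row
n_duplicates = int(dup_mask.sum())   # number of repeated rows
deduped = toy.drop_duplicates()      # keeps the first copy of each row
print(dup_mask.value_counts())
```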
[10]: df.columns
[11]: df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 99457 entries, 0 to 99456
Data columns (total 10 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 invoice_no 99457 non-null object
1 customer_id 99457 non-null object
2 gender 99457 non-null object
3 age 99457 non-null int64
4 category 99457 non-null object
5 quantity 99457 non-null int64
6 price 99457 non-null float64
7 payment_method 99457 non-null object
8 invoice_date 99457 non-null object
9 shopping_mall 99457 non-null object
dtypes: float64(1), int64(2), object(7)
memory usage: 7.6+ MB
[12]: # dates are stored day-first (e.g. 15/10/2022), so state the format explicitly
df['invoice_date'] = pd.to_datetime(df['invoice_date'], format='%d/%m/%Y')
[13]: df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 99457 entries, 0 to 99456
Data columns (total 10 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 invoice_no 99457 non-null object
1 customer_id 99457 non-null object
2 gender 99457 non-null object
3 age 99457 non-null int64
4 category 99457 non-null object
5 quantity 99457 non-null int64
6 price 99457 non-null float64
7 payment_method 99457 non-null object
8 invoice_date 99457 non-null datetime64[ns]
9 shopping_mall 99457 non-null object
dtypes: datetime64[ns](1), float64(1), int64(2), object(6)
memory usage: 7.6+ MB
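The `invoice_date` values in this dataset are written day-first (the `df.tail()` output shows `15/10/2022`), so the parsing step is sensitive to how ambiguous dates such as `05/08/2022` are read. A small sketch, assuming every date follows the `dd/mm/yyyy` pattern (the sample strings in `raw` are made up):

```python
import pandas as pd

# Hypothetical sample: one unambiguous date and one ambiguous date.
raw = pd.Series(["15/10/2022", "05/08/2022"])

# An explicit format removes the day/month ambiguity for every row.
parsed = pd.to_datetime(raw, format="%d/%m/%Y")

days = parsed.dt.day.tolist()
months = parsed.dt.month.tolist()
print(days, months)
```

With the format given explicitly, the second value is read as 5 August rather than 8 May.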
[14]: df.nunique()
ax.set_title('Customer Gender Distribution')
plt.show()
plt.show()
plt.show()
[64]: # create a horizontal bar chart to visualize the average price for each category
category_mean = df.groupby('category')['price'].mean().sort_values(ascending=False)
plt.figure(figsize=(8,4))
category_mean.plot(kind='barh', color='orange')
plt.xlabel('Price')
plt.ylabel('Category')
plt.title('Average Price by Category')
plt.show()
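The grouping step behind this chart can be checked on a tiny example. The sketch below uses a made-up frame (`toy` and its category names are hypothetical) and computes the mean price and, for comparison, the mean quantity per category in one pass.

```python
import pandas as pd

# Hypothetical sample with two categories.
toy = pd.DataFrame({
    "category": ["A", "A", "B"],
    "price": [10.0, 30.0, 5.0],
    "quantity": [1, 3, 4],
})

# Mean of each numeric column per category, highest average price first.
category_stats = (
    toy.groupby("category")[["price", "quantity"]]
    .mean()
    .sort_values("price", ascending=False)
)
print(category_stats)
```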
plt.show()
[37]: # get the count of each shopping mall
mall_count = df['shopping_mall'].value_counts()
plt.show()
[48]: # create a scatter plot to visualize the relationship between price and quantity
plt.scatter(x='price', y='quantity', data=df)
plt.xlabel('Price')
plt.ylabel('Quantity')
plt.title('Price vs Quantity')
plt.show()