Exploratory Data Analysis66

The document analyzes customer data from a shop, covering gender, age, income, spending score, profession, work experience, and family size. It builds visualizations to segment customers and relate their attributes to spending score: a pair plot, count plots, a histogram, KDE plots, line plots, a hexbin plot, a bar plot, a box plot, and violin plots. It explores customer demographics, income distribution, and spending behavior by subgroup.

Uploaded by Rishi Sahu

shop-customer-data-analysis

March 16, 2023

[1]: #importing necessary libraries

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

[2]: #loading the dataset

df = pd.read_csv("/kaggle/input/customers-dataset/Customers.csv")

[3]: #extracting first-five rows

df.head()

[3]:    CustomerID  Gender  Age  Annual Income ($)  Spending Score (1-100)  \
     0           1    Male   19              15000                      39
     1           2    Male   21              35000                      81
     2           3  Female   20              86000                       6
     3           4  Female   23              59000                      77
     4           5  Female   31              38000                      40

           Profession  Work Experience  Family Size
     0     Healthcare                1            4
     1       Engineer                3            3
     2       Engineer                1            1
     3         Lawyer                0            2
     4  Entertainment                2            6

[4]: #extracting last-five rows

df.tail()

[4]:       CustomerID  Gender  Age  Annual Income ($)  Spending Score (1-100)  \
     1995        1996  Female   71             184387                      40
     1996        1997  Female   91              73158                      32
     1997        1998    Male   87              90961                      14
     1998        1999    Male   77             182109                       4
     1999        2000    Male   90             110610                      52

              Profession  Work Experience  Family Size
     1995         Artist                8            7
     1996         Doctor                7            7
     1997     Healthcare                9            2
     1998      Executive                7            2
     1999  Entertainment                5            2

[5]: #determining the shape

df.shape

[5]: (2000, 8)

[6]: #determining the size

df.size

[6]: 16000

[7]: #checking the null values

df.isnull().sum()

[7]: CustomerID                 0
     Gender                     0
     Age                        0
     Annual Income ($)          0
     Spending Score (1-100)     0
     Profession                35
     Work Experience            0
     Family Size                0
     dtype: int64
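
The counts above put 35 of the 2000 Profession values missing, which is 1.75% of the rows. A minimal sketch of computing that share per column follows. The frame `toy` and its values are hypothetical and stand in for the real Customers.csv.

```python
import pandas as pd

# Hypothetical toy frame; not the real dataset
toy = pd.DataFrame({
    "Gender": ["Male", "Female", "Female", "Male"],
    "Profession": ["Engineer", None, "Lawyer", None],
})

# isnull() yields booleans; their mean is the fraction of missing values
missing_pct = toy.isnull().mean() * 100
print(missing_pct)
```

Applied to `df`, the same expression would report the missing share for each of the eight columns.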

[8]: #determining mode of 'Profession' column

df["Profession"].mode()

[8]: 0    Artist
     dtype: object

[9]: #replacing null values with mode

# assign the result back; inplace=True on a selected column is deprecated in recent pandas
df["Profession"] = df["Profession"].fillna("Artist")
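
Cell [9] fills the gaps with the mode found in cell [8]. As a sketch of the same idea, the mode can be computed rather than typed in, and the filled column assigned back to the frame. The frame `toy` and the name `most_common` are hypothetical.

```python
import pandas as pd

# Hypothetical toy data standing in for the real Customers.csv
toy = pd.DataFrame({"Profession": ["Artist", None, "Doctor", "Artist", None]})

# mode() returns a Series because ties are possible; take the first entry
most_common = toy["Profession"].mode()[0]

# Assign the filled column back to the frame
toy["Profession"] = toy["Profession"].fillna(most_common)

# Confirm that nothing is left to fill
remaining_nulls = int(toy["Profession"].isnull().sum())
print(most_common, remaining_nulls)
```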

[10]: # checking the duplicates

df.duplicated().value_counts()

[10]: False    2000
      dtype: int64

[11]: #checking the information

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2000 entries, 0 to 1999
Data columns (total 8 columns):
 #   Column                  Non-Null Count  Dtype
---  ------                  --------------  -----
 0   CustomerID              2000 non-null   int64
 1   Gender                  2000 non-null   object
 2   Age                     2000 non-null   int64
 3   Annual Income ($)       2000 non-null   int64
 4   Spending Score (1-100)  2000 non-null   int64
 5   Profession              2000 non-null   object
 6   Work Experience         2000 non-null   int64
 7   Family Size             2000 non-null   int64
dtypes: int64(6), object(2)
memory usage: 125.1+ KB

[12]: #extracting statistical summary

df.describe()

[12]:         CustomerID          Age  Annual Income ($)  Spending Score (1-100)  \
      count  2000.000000  2000.000000        2000.000000             2000.000000
      mean   1000.500000    48.960000      110731.821500               50.962500
      std     577.494589    28.429747       45739.536688               27.934661
      min       1.000000     0.000000           0.000000                0.000000
      25%     500.750000    25.000000       74572.000000               28.000000
      50%    1000.500000    48.000000      110045.000000               50.000000
      75%    1500.250000    73.000000      149092.750000               75.000000
      max    2000.000000    99.000000      189974.000000              100.000000

             Work Experience  Family Size
      count      2000.000000  2000.000000
      mean          4.102500     3.768500
      std           3.922204     1.970749
      min           0.000000     1.000000
      25%           1.000000     2.000000
      50%           3.000000     4.000000
      75%           7.000000     5.000000
      max          17.000000     9.000000
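
The summary reports a minimum of 0 for Age, Annual Income ($) and Spending Score (1-100). The source does not say whether those zeros are real values or placeholders, so a reasonable first step is to count them. A minimal sketch on a hypothetical frame (`toy` and its values are made up for illustration):

```python
import pandas as pd

# Hypothetical toy frame with a few zero entries
toy = pd.DataFrame({
    "Age": [0, 19, 45, 0, 71],
    "Annual Income ($)": [15000, 0, 86000, 59000, 38000],
})

# Compare every cell with 0, then count the matches per column
zero_counts = (toy[["Age", "Annual Income ($)"]] == 0).sum()
print(zero_counts)
```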

[13]: #creating the pairplot

sns.pairplot(df.drop("CustomerID", axis=1))

[13]: <seaborn.axisgrid.PairGrid at 0x7f21431e3c90>

[14]: # segment customers by gender
sns.countplot(x='Gender', data=df)
plt.title('Customer Gender Distribution')
plt.show()

[28]: # segment customers by age
sns.histplot(x='Age', data=df, color='purple', bins=20)
plt.title('Customer Age Distribution')
plt.show()

[29]: # segment by income
sns.kdeplot(x='Annual Income ($)', data=df, color="green", fill=True)
plt.title('Income Distribution')
plt.show()

[17]: # segment customers by profession
sns.countplot(x='Profession', data=df)
plt.xticks(rotation=45)
plt.title('Customer Profession Distribution')
plt.show()

[30]: # segment customers by work experience
sns.kdeplot(x='Work Experience', data=df, color='red', fill=True)
plt.title('Work Experience Distribution')
plt.show()

[19]: # segment customers by family size
sns.countplot(x='Family Size', data=df)
plt.title('Customer Family Size Distribution')
plt.show()

[20]: # spending score by gender
sns.violinplot(x='Gender', y='Spending Score (1-100)', data=df)
plt.title('Spending Score by Gender')
plt.show()

[31]: # spending behavior by age
sns.lineplot(x='Age', y='Spending Score (1-100)', color="orange", data=df)
plt.title('Spending Score by Age')
plt.show()

[22]: # analyze spending behavior by age and gender
sns.lineplot(x='Age', y='Spending Score (1-100)', hue='Gender', data=df)
plt.title('Spending Score by Age and Gender')
plt.show()
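
Ages in this dataset run from 0 to 99, so a line per individual age may be hard to read. One possible complement, shown here only as a sketch, is to group ages into bands with `pd.cut` and average the spending score per band. The band edges and labels are assumptions chosen for illustration, and the ten rows are the head and tail rows printed earlier, not the full dataset.

```python
import pandas as pd

# The ten rows shown by df.head() and df.tail(); far too few to draw conclusions
toy = pd.DataFrame({
    "Age": [19, 21, 20, 23, 31, 71, 91, 87, 77, 90],
    "Spending Score (1-100)": [39, 81, 6, 77, 40, 40, 32, 14, 4, 52],
})

# Hypothetical band edges; right=False makes each band include its lower edge
bins = [0, 30, 60, 100]
labels = ["0-29", "30-59", "60-99"]
toy["Age Band"] = pd.cut(toy["Age"], bins=bins, labels=labels, right=False)

# Mean spending score per age band
band_mean = toy.groupby("Age Band", observed=True)["Spending Score (1-100)"].mean()
print(band_mean)
```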

[23]: # spending behavior by income
plt.hexbin(x='Annual Income ($)', y='Spending Score (1-100)', data=df,
           gridsize=20, cmap='Blues')
plt.xlabel('Annual Income ($)')
plt.xticks(rotation=45)
plt.ylabel('Spending Score (1-100)')
plt.title('Spending Score by Income')
plt.colorbar()
plt.show()
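
A hexbin plot shows where points concentrate, but it does not give a single number for the strength of the relationship. As a hedged complement, the Pearson correlation between the two columns can be computed with `Series.corr`. The sketch below uses only the five rows printed by `df.head()`, so the value it prints says nothing about the full dataset.

```python
import pandas as pd

# The five rows shown by df.head(); for illustration only
toy = pd.DataFrame({
    "Annual Income ($)": [15000, 35000, 86000, 59000, 38000],
    "Spending Score (1-100)": [39, 81, 6, 77, 40],
})

# Series.corr computes the Pearson correlation by default
r = toy["Annual Income ($)"].corr(toy["Spending Score (1-100)"])
print(round(r, 3))
```

On the real data, `df["Annual Income ($)"].corr(df["Spending Score (1-100)"])` would be the equivalent call.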

[24]: # spending behavior by profession
sns.barplot(x='Profession', y='Spending Score (1-100)', data=df)
plt.xticks(rotation=45)
plt.title('Spending Score by Profession')
plt.show()
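
By default `sns.barplot` draws the mean of the y variable for each category, with error bars. The same numbers can be read off as a table with a `groupby`. A minimal sketch, using only the ten head and tail rows printed earlier rather than the full dataset:

```python
import pandas as pd

# The ten rows shown by df.head() and df.tail(); for illustration only
toy = pd.DataFrame({
    "Profession": ["Healthcare", "Engineer", "Engineer", "Lawyer", "Entertainment",
                   "Artist", "Doctor", "Healthcare", "Executive", "Entertainment"],
    "Spending Score (1-100)": [39, 81, 6, 77, 40, 40, 32, 14, 4, 52],
})

# Mean spending score per profession, highest first
mean_score = (
    toy.groupby("Profession")["Spending Score (1-100)"]
    .mean()
    .sort_values(ascending=False)
)
print(mean_score)
```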

[32]: # spending behavior by work experience
sns.boxplot(x='Work Experience', y='Spending Score (1-100)', data=df)
plt.title('Spending Score by Work Experience')
plt.show()

[26]: # spending behavior by family size
sns.violinplot(x='Family Size', y='Spending Score (1-100)', data=df)
plt.title('Spending Score by Family Size')
plt.show()
