Divyanshi 05401172023 Ds Practical

Uploaded by

diviyanshimehra

import pandas as pd

import matplotlib.pyplot as plt

import seaborn as sns

# Load the dataset

data = pd.read_csv('Walmart Sales data.csv.csv')  # Replace with your actual file path

# Display the first few rows of the dataset to understand its structure
print(data.head())

Invoice ID Branch City Customer type Gender \

0 750-67-8428 A Yangon Member Female
1 226-31-3081 C Naypyitaw Normal Female
2 631-41-3108 A Yangon Normal Male
3 123-19-1176 A Yangon Member Male
4 373-73-7910 A Yangon Normal Male

Product line Unit price Quantity Tax 5% Total \

0 Health and beauty 74.69 7 26.1415 548.9715
1 Electronic accessories 15.28 5 3.8200 80.2200
2 Home and lifestyle 46.33 7 16.2155 340.5255
3 Health and beauty 58.22 8 23.2880 489.0480
4 Sports and travel 86.31 7 30.2085 634.3785

Date Time Payment cogs gross margin percentage \

0 2019-01-05 13:08:00 Ewallet 522.83 4.761905
1 2019-03-08 10:29:00 Cash 76.40 4.761905
2 2019-03-03 13:23:00 Credit card 324.31 4.761905
3 2019-01-27 20:33:00 Ewallet 465.76 4.761905
4 2019-02-08 10:37:00 Ewallet 604.17 4.761905

gross income Rating

0 26.1415 9.1
1 3.8200 9.6
2 16.2155 7.4
3 23.2880 8.4
4 30.2085 5.3
Q1. How many distinct cities are present in the dataset?
distinct_cities = data['City'].nunique()
print("Number of distinct cities:", distinct_cities)

Number of distinct cities: 3

Q2. In which city is each branch situated?

branch_city_mapping = data.groupby('Branch')['City'].unique()
print("Branches and their respective cities:")
for branch, city in branch_city_mapping.items():
    print("Branch:", branch, "-> City:", city)

Branches and their respective cities:

Branch: A -> City: ['Yangon']
Branch: B -> City: ['Mandalay']
Branch: C -> City: ['Naypyitaw']
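The mapping above prints an array per branch, which would hide the case where one branch appears under several cities. A minimal sketch of an explicit check follows. The `sample` frame is a hypothetical stand-in with the same column names, not the real file, and `one_city_per_branch` and `branch_to_city` are illustrative names.

```python
import pandas as pd

# Hypothetical sample rows with the same column names as the dataset.
sample = pd.DataFrame({
    "Branch": ["A", "C", "A", "B", "B"],
    "City": ["Yangon", "Naypyitaw", "Yangon", "Mandalay", "Mandalay"],
})

# Keep one row per distinct (Branch, City) pair.
pairs = sample[["Branch", "City"]].drop_duplicates()

# True only when no branch is listed under more than one city.
one_city_per_branch = pairs["Branch"].is_unique

# Plain dict of branch -> city, sorted by branch for stable output.
branch_to_city = dict(sorted(zip(pairs["Branch"], pairs["City"])))

print("One city per branch:", one_city_per_branch)
print(branch_to_city)
```

If `one_city_per_branch` were False, the dict would silently keep only one city per branch, so the flag should be checked first.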

Visualizations

Distribution of sales across branches

plt.figure(figsize=(10, 6))
sns.countplot(x='Branch', data=data, palette='Set2')
plt.title('Distribution of Sales Across Branches')
plt.xlabel('Branch')
plt.ylabel('Number of Sales')
plt.show()
Distribution of customer types
plt.figure(figsize=(10, 6))
sns.countplot(x='Customer type', data=data, palette='Pastel1')
plt.title('Distribution of Customer Types')
plt.xlabel('Customer Type')
plt.ylabel('Number of Customers')
plt.show()
Gender distribution
plt.figure(figsize=(10, 6))
sns.countplot(x='Gender', data=data, palette='Dark2')
plt.title('Gender Distribution of Customers')
plt.xlabel('Gender')
plt.ylabel('Number of Customers')
plt.show()
Product line distribution
plt.figure(figsize=(12, 6))
sns.countplot(y='Product line', data=data, palette='Set3')
plt.title('Distribution of Product Lines')
plt.xlabel('Number of Sales')
plt.ylabel('Product Line')
plt.show()
Distribution of ratings
plt.figure(figsize=(10, 6))
sns.histplot(data['Rating'], bins=10, kde=True, color='skyblue')
plt.title('Distribution of Ratings')
plt.xlabel('Rating')
plt.ylabel('Frequency')
plt.show()
3. How many distinct product lines are there in the dataset?
distinct_product_lines = data['Product line'].nunique()
print("Number of distinct product lines:", distinct_product_lines)

Number of distinct product lines: 6

4. What is the most common payment method?

most_common_payment_method = data['Payment'].mode()[0]
print("Most common payment method:", most_common_payment_method)

Most common payment method: Ewallet

5. What is the most selling product line?

most_selling_product_line = data['Product line'].value_counts().idxmax()
print("Most selling product line:", most_selling_product_line)

Most selling product line: Fashion accessories
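The answer above counts transactions per product line. "Most selling" could also be read as most units sold, and the two readings can disagree. A small sketch on hypothetical rows (the `sample` values are made up for illustration) shows the difference:

```python
import pandas as pd

# Hypothetical rows: one product line has more transactions,
# another has more units in a single transaction.
sample = pd.DataFrame({
    "Product line": ["Health and beauty", "Electronic accessories",
                     "Electronic accessories", "Sports and travel"],
    "Quantity": [8, 1, 2, 7],
})

# Reading 1: the product line that appears in the most transactions.
top_by_transactions = sample["Product line"].value_counts().idxmax()

# Reading 2: the product line with the largest total quantity sold.
units_by_line = sample.groupby("Product line")["Quantity"].sum()
top_by_units = units_by_line.idxmax()

print("By transactions:", top_by_transactions)
print("By units:", top_by_units)
```

Which reading is intended depends on the assignment; on the real data both may well give the same product line.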

6. What is the total revenue by month?

# Convert 'Date' column to datetime format
data['Date'] = pd.to_datetime(data['Date'])

# Extract month from the 'Date' column

data['month'] = data['Date'].dt.month

# Calculate total revenue by month

total_revenue_by_month = data.groupby('month')['Total'].sum()
print("Total revenue by month:")
print(total_revenue_by_month)

Total revenue by month:

month
1 116291.868
2 97219.374
3 109455.507
Name: Total, dtype: float64
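Month numbers are compact but harder to read in a report. The following sketch relabels the monthly totals with month names while keeping calendar order. It uses the five rows printed by `data.head()` above as sample input, so the totals are not the full-dataset figures.

```python
import calendar

import pandas as pd

# The five rows shown by data.head() above (Date and Total only).
sample = pd.DataFrame({
    "Date": ["2019-01-05", "2019-03-08", "2019-03-03", "2019-01-27", "2019-02-08"],
    "Total": [548.9715, 80.2200, 340.5255, 489.0480, 634.3785],
})
sample["Date"] = pd.to_datetime(sample["Date"])

# Group on the month number so the result stays in calendar order.
revenue_by_month = sample.groupby(sample["Date"].dt.month)["Total"].sum()

# Replace the numeric index (1, 2, 3) with month names.
revenue_by_month.index = [calendar.month_name[m] for m in revenue_by_month.index]

print(revenue_by_month)
```

Grouping directly on month names would sort them alphabetically, which is why the names are applied after the aggregation.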

7. Which month recorded the highest Cost of Goods Sold (COGS)?
# Convert 'Date' column to datetime format
data['Date'] = pd.to_datetime(data['Date'])

# Ensure the column has been converted successfully

print(data['Date'].dtype)

# Calculate total revenue by month

total_revenue_by_month = data.groupby(data['Date'].dt.month)['Total'].sum()
print("Total revenue by month:")
print(total_revenue_by_month)

# Find the month with the highest Cost of Goods Sold (COGS)
highest_cogs_month = data.groupby(data['Date'].dt.month)['cogs'].sum().idxmax()
print("Month with the highest Cost of Goods Sold (COGS):",
      highest_cogs_month)

datetime64[ns]
Total revenue by month:
Date
1 116291.868
2 97219.374
3 109455.507
Name: Total, dtype: float64
Month with the highest Cost of Goods Sold (COGS): 1

8. Which product line generated the highest revenue?
highest_revenue_product_line = data.groupby('Product line')['Total'].sum().idxmax()
print("Product line with the highest revenue:",
      highest_revenue_product_line)

Product line with the highest revenue: Food and beverages

9. Which city has the highest revenue?

highest_revenue_city = data.groupby('City')['Total'].sum().idxmax()
print("City with the highest revenue:", highest_revenue_city)

City with the highest revenue: Naypyitaw

10. Which product line incurred the highest VAT?
# 10. Which product line incurred the highest VAT?
highest_vat_product_line = data.groupby('Product line')['Tax 5%'].sum().idxmax()
print("Product line with the highest VAT:", highest_vat_product_line)

Product line with the highest VAT: Food and beverages

11. Retrieve each product line and add a column product_category, indicating 'Good' or 'Bad', based on whether its sales are above the average.
# 11. Retrieve each product line and add a column product_category,
# indicating 'Good' or 'Bad', based on whether its sales are above the average.
# Note: this labels each transaction by comparing its quantity with the
# mean quantity per transaction; it does not aggregate by product line.

average_quantity_sold = data['Quantity'].mean()

# Function to categorize sales

def categorize_sales(quantity):
    if quantity > average_quantity_sold:
        return 'Good'
    else:
        return 'Bad'

# Apply the function to create the product category column

data['product_category'] = data['Quantity'].apply(categorize_sales)

# Display the updated DataFrame with the new column

print(data[['Product line', 'Quantity', 'product_category']].head())

Product line Quantity product_category

0 Health and beauty 7 Good
1 Electronic accessories 5 Bad
2 Home and lifestyle 7 Good
3 Health and beauty 8 Good
4 Sports and travel 7 Good
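The output above assigns a label to every transaction. If the question is read as one label per product line, the quantities can be summed per line first and compared with the average of those sums. This is a sketch of that alternative reading, using the `Product line` and `Quantity` values from the five rows shown above; `units_sold` is an illustrative column name.

```python
import pandas as pd

# Product line and Quantity from the five rows printed above.
sample = pd.DataFrame({
    "Product line": ["Health and beauty", "Electronic accessories",
                     "Home and lifestyle", "Health and beauty",
                     "Sports and travel"],
    "Quantity": [7, 5, 7, 8, 7],
})

# Total units sold per product line.
units = sample.groupby("Product line")["Quantity"].sum()

# Average of the per-line totals (not the per-transaction mean).
average_units = units.mean()

# One row per product line, labelled against that average.
summary = units.to_frame("units_sold")
summary["product_category"] = (units > average_units).map(
    {True: "Good", False: "Bad"}
)

print(summary)
```

On these five rows only "Health and beauty" (15 units against an average of 8.5) is labelled 'Good'; the full dataset will give a different split.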

12. Which branch sold more products than average product sold?
# 12. Which branch sold more products than average product sold?
# average_sales is not defined in the cells shown; the mean quantity per
# transaction is assumed here, which every branch total exceeds.
average_sales = data['Quantity'].mean()
branch_product_counts = data.groupby('Branch')['Quantity'].sum()
branch_more_than_average = branch_product_counts[branch_product_counts > average_sales].index.tolist()
print("Branch(es) with more products sold than the average:",
      branch_more_than_average)

Branch(es) with more products sold than the average: ['A', 'B', 'C']
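Every branch is listed above, which suggests the branch totals were compared with a much smaller per-transaction figure. A like-for-like comparison would test each branch total against the mean of the branch totals. The sketch below shows that on hypothetical rows; the numbers are invented for illustration only.

```python
import pandas as pd

# Hypothetical transactions; only Branch and Quantity matter here.
sample = pd.DataFrame({
    "Branch": ["A", "C", "A", "A", "A", "B"],
    "Quantity": [7, 5, 7, 8, 7, 3],
})

# Units sold per branch.
units_per_branch = sample.groupby("Branch")["Quantity"].sum()

# Average of the branch totals, so both sides use the same unit.
average_per_branch = units_per_branch.mean()

# Branches whose total exceeds that average.
above_average = units_per_branch[units_per_branch > average_per_branch].index.tolist()

print("Units per branch:", units_per_branch.to_dict())
print("Above average:", above_average)
```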

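13. What is the most common product line by gender?
# 13. What is the most common product line by gender?
# Note: idxmax() over all (Gender, Product line) counts returns only the single
# most frequent pair, so one product line is printed rather than one per gender.
common_product_line_by_gender = data.groupby(['Gender', 'Product line']).size().idxmax()
print("Most common product line by gender:",
      common_product_line_by_gender[1])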

Most common product line by gender: Fashion accessories
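A single product line is printed above, not one per gender. To get one answer for each gender, the most frequent value can be picked inside each gender group. A minimal sketch on hypothetical rows (the `sample` data is invented):

```python
import pandas as pd

# Hypothetical rows with the same column names as the dataset.
sample = pd.DataFrame({
    "Gender": ["Female", "Female", "Female", "Male", "Male", "Male"],
    "Product line": ["Fashion accessories", "Fashion accessories",
                     "Food and beverages", "Health and beauty",
                     "Health and beauty", "Sports and travel"],
})

# Within each gender, take the product line with the highest count.
top_line_by_gender = sample.groupby("Gender")["Product line"].agg(
    lambda lines: lines.value_counts().idxmax()
)

print(top_line_by_gender)
```

When two product lines tie within a gender, `idxmax()` returns only one of them, so ties may need separate handling.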

14. What is the average rating of each product line?
# 14. What is the average rating of each product line?
average_rating_by_product_line = data.groupby('Product line')['Rating'].mean()
print("Average rating of each product line:")
print(average_rating_by_product_line)

Average rating of each product line:

Product line
Electronic accessories 6.924706
Fashion accessories 7.029213
Food and beverages 7.113218
Health and beauty 7.003289
Home and lifestyle 6.837500
Sports and travel 6.916265
Name: Rating, dtype: float64

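15. Number of sales made in each time of the day per weekday
# 15. Number of sales made in each time of the day per weekday
# weekday is numbered 0 (Monday) to 6 (Sunday); grouping on 'Time' counts
# sales per exact timestamp rather than per part of the day.
data['weekday'] = data['Date'].dt.weekday
sales_per_time_per_weekday = data.groupby(['weekday', 'Time']).size()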
print("Number of sales made in each time of the day per weekday:")
print(sales_per_time_per_weekday)

Number of sales made in each time of the day per weekday:

weekday Time
0 10:00:00 1
10:02:00 1
10:05:00 1
10:11:00 1
10:23:00 2
..
6 20:33:00 1
20:37:00 1
20:38:00 1
20:46:00 1
20:51:00 1
Length: 914, dtype: int64
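With 914 distinct (weekday, timestamp) pairs, the result above is hard to interpret. One option is to bucket the timestamps into parts of the day before counting. The sketch below uses the `Date` and `Time` values from the five rows shown at the top; the bucket boundaries (before 12:00, 12:00 to 15:59, 16:00 onward) are an assumption, not something the source defines.

```python
import pandas as pd

# Date and Time from the five rows printed by data.head().
sample = pd.DataFrame({
    "Date": pd.to_datetime(["2019-01-05", "2019-03-08", "2019-03-03",
                            "2019-01-27", "2019-02-08"]),
    "Time": ["13:08:00", "10:29:00", "13:23:00", "20:33:00", "10:37:00"],
})

# Hour of the sale as an integer.
hour = pd.to_datetime(sample["Time"], format="%H:%M:%S").dt.hour

# Assumed buckets: [0, 12) Morning, [12, 16) Afternoon, [16, 24) Evening.
sample["time_of_day"] = pd.cut(
    hour, bins=[0, 12, 16, 24], right=False,
    labels=["Morning", "Afternoon", "Evening"],
).astype(str)

# Weekday as a name instead of a number.
sample["weekday"] = sample["Date"].dt.day_name()

# Number of sales per weekday and part of the day.
sales_per_bucket = sample.groupby(["weekday", "time_of_day"]).size()

print(sales_per_bucket)
```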

16. Identify the customer type that generates the highest revenue.
# 16. Identify the customer type that generates the highest revenue.
highest_revenue_customer_type = data.groupby('Customer type')['Total'].sum().idxmax()
print("Customer type that generates the highest revenue:",
      highest_revenue_customer_type)

Customer type that generates the highest revenue: Member

17. Which city has the largest tax percent/ VAT (Value Added Tax)?
# 17. Which city has the largest tax percent/ VAT (Value Added Tax)?
# Note: this compares the mean tax amount per transaction in each city;
# the column name 'Tax 5%' indicates a flat 5% rate.
city_with_largest_vat_percent = data.groupby('City')['Tax 5%'].mean().idxmax()
print("City with the largest tax percent/ VAT:",
      city_with_largest_vat_percent)

City with the largest tax percent/ VAT: Naypyitaw

18. Which customer type pays the most VAT?
# 18. Which customer type pays the most VAT?
customer_type_with_most_vat = data.groupby('Customer type')['Tax 5%'].sum().idxmax()
print("Customer type that pays the most VAT:",
      customer_type_with_most_vat)

Customer type that pays the most VAT: Member

19. How many unique customer types does the data have?
# 19. How many unique customer types does the data have?
unique_customer_types = data['Customer type'].nunique()
print("Number of unique customer types:", unique_customer_types)

Number of unique customer types: 2

20. How many unique payment methods does the data have?
# 20. How many unique payment methods does the data have?
unique_payment_methods = data['Payment'].nunique()
print("Number of unique payment methods:", unique_payment_methods)

Number of unique payment methods: 3

21. Which is the most common customer type?

most_common_customer_type = data['Customer type'].mode()[0]
print("Most common customer type:", most_common_customer_type)

Most common customer type: Member

22. Which customer type buys the most?

most_buying_customer_type = data.groupby('Customer type')['Quantity'].sum().idxmax()
print("Customer type that buys the most:", most_buying_customer_type)

Customer type that buys the most: Member

23. What is the gender of most of the customers?
most_common_gender = data['Gender'].mode()[0]
print("Gender of most of the customers:", most_common_gender)

Gender of most of the customers: Female

24. What is the gender distribution per branch?

gender_distribution_per_branch = data.groupby(['Branch',
'Gender']).size()
print("Gender distribution per branch:")
print(gender_distribution_per_branch)

Gender distribution per branch:

Branch Gender
A Female 161
Male 179
B Female 162
Male 170
C Female 178
Male 150
dtype: int64

25. Which time of the day do customers give most ratings?
most_rated_time_of_day = data.groupby('Time')['Rating'].sum().idxmax()
print("Time of the day when customers give the most ratings:",
most_rated_time_of_day)

Time of the day when customers give the most ratings: 19:48:00
26. Which time of the day do customers give most ratings per branch?
most_rated_time_of_day_per_branch = data.groupby(['Branch', 'Time'])['Rating'].sum().idxmax()
print("Time of the day when customers give the most ratings per branch:",
      most_rated_time_of_day_per_branch)

Time of the day when customers give the most ratings per branch: ('C', '10:23:00')
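The result above is one (branch, timestamp) pair for the whole dataset, based on the sum of rating values. If the goal is the busiest rating hour for each branch, counting ratings per hour and taking the maximum within each branch is one possible approach. This sketch uses hypothetical rows, and grouping by hour instead of exact timestamp is an assumption.

```python
import pandas as pd

# Hypothetical rows with the same column names as the dataset.
sample = pd.DataFrame({
    "Branch": ["A", "A", "A", "C", "C", "C"],
    "Time": ["13:08:00", "13:23:00", "20:33:00",
             "10:29:00", "10:37:00", "19:48:00"],
    "Rating": [9.1, 7.4, 8.4, 9.6, 5.3, 6.0],
})

# Hour of the day taken from the 'HH:MM:SS' string.
sample["hour"] = sample["Time"].str.split(":").str[0].astype(int)

# Number of ratings per branch (rows) and hour (columns).
ratings_per_hour = (
    sample.groupby(["Branch", "hour"])["Rating"].count().unstack(fill_value=0)
)

# For each branch, the hour with the most ratings.
busiest_hour_per_branch = ratings_per_hour.idxmax(axis=1)

print(busiest_hour_per_branch)
```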

27. Which day of the week has the best avg ratings?
best_avg_ratings_day_of_week = data.groupby(data['Date'].dt.dayofweek)['Rating'].mean().idxmax()
print("Day of the week with the best average ratings:",
      best_avg_ratings_day_of_week)

Day of the week with the best average ratings: 0
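In pandas, `dt.dayofweek` numbers the days from 0 (Monday) to 6 (Sunday), so the 0 above means Monday. Grouping on `dt.day_name()` returns the name directly. The sketch below runs on the five rows shown by `data.head()`, so its answer differs from the full-dataset result.

```python
import pandas as pd

# Date and Rating from the five rows printed by data.head().
sample = pd.DataFrame({
    "Date": pd.to_datetime(["2019-01-05", "2019-03-08", "2019-03-03",
                            "2019-01-27", "2019-02-08"]),
    "Rating": [9.1, 9.6, 7.4, 8.4, 5.3],
})

# Average rating per weekday name.
avg_rating_by_day = sample.groupby(sample["Date"].dt.day_name())["Rating"].mean()

# Name of the weekday with the highest average rating.
best_day = avg_rating_by_day.idxmax()

print(avg_rating_by_day)
print("Best day:", best_day)
```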

28. Which day of the week has the best average ratings per branch?
best_avg_ratings_day_of_week_per_branch = data.groupby(['Branch',
data['Date'].dt.dayofweek])['Rating'].mean().idxmax()
print("Day of the week with the best average ratings per branch:",
best_avg_ratings_day_of_week_per_branch)

Day of the week with the best average ratings per branch: ('B', 0)

29. Are there any patterns or trends in sales over time (by month, day of the week, or time of day)?
# For example, we can visualize total sales over time
import matplotlib.pyplot as plt

# Extract month, day of the week, and hour from the date
data['Month'] = data['Date'].dt.month
data['DayOfWeek'] = data['Date'].dt.dayofweek
data['Hour'] = data['Time'].apply(lambda x: int(x.split(':')[0]))

# Total sales by month

total_sales_by_month = data.groupby('Month')['Total'].sum()

# Total sales by day of the week

total_sales_by_day_of_week = data.groupby('DayOfWeek')['Total'].sum()

# Total sales by hour of the day

total_sales_by_hour = data.groupby('Hour')['Total'].sum()

# Plotting
plt.figure(figsize=(18, 5))

plt.subplot(1, 3, 1)
plt.plot(total_sales_by_month, marker='o')
plt.title('Total Sales by Month')
plt.xlabel('Month')
plt.ylabel('Total Sales')

plt.subplot(1, 3, 2)
plt.plot(total_sales_by_day_of_week, marker='o')
plt.title('Total Sales by Day of the Week')
plt.xlabel('Day of the Week')
plt.ylabel('Total Sales')

plt.subplot(1, 3, 3)
plt.plot(total_sales_by_hour, marker='o')
plt.title('Total Sales by Hour of the Day')
plt.xlabel('Hour of the Day')
plt.ylabel('Total Sales')

plt.tight_layout()
plt.show()
30. Are there any differences in customer ratings between branches?
ratings_by_branch = data.groupby('Branch')['Rating'].mean()
print("Average ratings by branch:")
print(ratings_by_branch)

Average ratings by branch:

Branch
A 7.027059
B 6.818072
C 7.072866
Name: Rating, dtype: float64

31. Is there any correlation between the tax amount and the total transaction amount?
correlation_tax_total = data['Tax 5%'].corr(data['Total'])
print("Correlation between tax amount and total transaction amount:",
correlation_tax_total)

Correlation between tax amount and total transaction amount: 0.9999999999999998
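A correlation of essentially 1 is expected here, because in the rows shown above the tax is 5% of `cogs` and the total is `cogs` plus tax, which makes the total exactly 21 times the tax. The following sketch reproduces this with the `cogs` values from `data.head()`; the `tax` and `total` columns are recomputed here, not read from the file.

```python
import pandas as pd

# cogs values from the five rows printed by data.head().
sample = pd.DataFrame({"cogs": [522.83, 76.40, 324.31, 465.76, 604.17]})

# Tax is 5% of cogs; total is cogs plus tax.
sample["tax"] = sample["cogs"] * 0.05
sample["total"] = sample["cogs"] + sample["tax"]

# total / tax = 1.05 / 0.05 = 21 for every row.
ratio = sample["total"] / sample["tax"]

# A fixed positive multiple gives a Pearson correlation of 1
# (up to floating-point rounding).
correlation = sample["tax"].corr(sample["total"])

print(ratio.round(6).tolist())
print(correlation)
```

The first row gives a tax of 26.1415 and a total of 548.9715, matching the `Tax 5%` and `Total` columns in the printed output.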

32. Do certain product lines tend to have higher ratings than others?
ratings_by_product_line = data.groupby('Product line')['Rating'].mean()
print("Average ratings by product line:")
print(ratings_by_product_line)

Average ratings by product line:

33. Is there any correlation between the quantity of items purchased and the total transaction amount?
correlation_quantity_total = data['Quantity'].corr(data['Total'])
print("Correlation between quantity and total transaction amount:",
correlation_quantity_total)
