Sales Analysis Project

The document outlines a sales analysis process involving the importation of sales data from multiple CSV files, merging them into a single dataset, and performing data cleaning. It includes steps for adding new columns such as 'Month' and 'Sales', and visualizing sales data by month and city. Additionally, it discusses optimizing advertisement timing based on order timestamps.

Uploaded by

Hend Selmy

sales-analysis

May 3, 2025

0.0.1 Import Necessary Libraries

[1]: import pandas as pd

import os
import matplotlib.pyplot as plt

0.0.2 Merge 12 Months of Sales Data into a Single CSV File

[2]: files = [file for file in os.listdir("./Sales_Data")]

for f in files:
    print(f)

Sales_April_2019.csv
Sales_August_2019.csv
Sales_December_2019.csv
Sales_February_2019.csv
Sales_January_2019.csv
Sales_July_2019.csv
Sales_June_2019.csv
Sales_March_2019.csv
Sales_May_2019.csv
Sales_November_2019.csv
Sales_October_2019.csv
Sales_September_2019.csv

[3]: df_list = []

for filename in files:
    # os.path.join combines the parts of a file path
    file_path = os.path.join("./Sales_Data", filename)
    df = pd.read_csv(file_path)
    df_list.append(df)

# concatenate all DataFrames
combined_df = pd.concat(df_list, ignore_index=True)

# save the combined data to a single file
combined_df.to_csv("all_data.csv", index=False)
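The merge step above reads every entry in the folder. The following is a minimal sketch of the same idea as a reusable function, assuming that only `.csv` files should be read. The function name `merge_csv_folder` and the demo files are hypothetical and are not part of the original notebook.

```python
import tempfile
from pathlib import Path

import pandas as pd


def merge_csv_folder(folder):
    """Read every .csv file in `folder` and stack them into one DataFrame."""
    # sorted() makes the row order reproducible across operating systems
    paths = sorted(Path(folder).glob("*.csv"))
    frames = [pd.read_csv(path) for path in paths]
    return pd.concat(frames, ignore_index=True)


# Demo with two tiny made-up monthly files (not the real sales data)
with tempfile.TemporaryDirectory() as tmp:
    pd.DataFrame({"Order ID": [1, 2], "Product": ["A", "B"]}).to_csv(
        Path(tmp) / "Sales_January_2019.csv", index=False)
    pd.DataFrame({"Order ID": [3], "Product": ["C"]}).to_csv(
        Path(tmp) / "Sales_February_2019.csv", index=False)
    Path(tmp, "notes.txt").write_text("not a CSV file")  # skipped by the glob
    merged = merge_csv_folder(tmp)

print(merged)
```

Filtering on the extension means a stray non-CSV file in the folder does not reach `pd.read_csv`.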

[4]: all_data = pd.read_csv("all_data.csv")
all_data.head()

[4]: Order ID Product Quantity Ordered Price Each \

0 176558 USB-C Charging Cable 2 11.95
1 NaN NaN NaN NaN
2 176559 Bose SoundSport Headphones 1 99.99
3 176560 Google Phone 1 600
4 176560 Wired Headphones 1 11.99

Order Date Purchase Address

0 04/19/19 08:46 917 1st St, Dallas, TX 75001
1 NaN NaN
2 04/07/19 22:30 682 Chestnut St, Boston, MA 02215
3 04/12/19 14:38 669 Spruce St, Los Angeles, CA 90001
4 04/12/19 14:38 669 Spruce St, Los Angeles, CA 90001

0.0.3 Clean up the Data

[5]: all_data.isnull().sum()

[5]: Order ID 545

Product 545
Quantity Ordered 545
Price Each 545
Order Date 545
Purchase Address 545
dtype: int64

[6]: all_data = all_data.dropna(how="all")

[7]: all_data.isnull().sum()

[7]: Order ID 0
Product 0
Quantity Ordered 0
Price Each 0
Order Date 0
Purchase Address 0
dtype: int64

Find rows whose "Order Date" value starts with "Or" (likely header rows repeated inside the data) and delete them

[8]: all_data.columns

[8]: Index(['Order ID', 'Product', 'Quantity Ordered', 'Price Each', 'Order Date',
       'Purchase Address'],
      dtype='object')

[9]: all_data = all_data[all_data["Order Date"].str[0:2] != "Or"]

Convert Columns to the Correct Type

[10]: all_data["Quantity Ordered"] = pd.to_numeric(all_data["Quantity Ordered"])  # make int
all_data["Price Each"] = pd.to_numeric(all_data["Price Each"])  # make float
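The cleaning and conversion steps above can be tried end to end on a tiny table. This is a minimal sketch, assuming the stray rows are copies of the header line. The sample rows are made up to imitate the raw data.

```python
import pandas as pd

# Made-up rows imitating the raw data: one empty row and one repeated header row
raw = pd.DataFrame({
    "Quantity Ordered": ["2", None, "Quantity Ordered", "1"],
    "Price Each": ["11.95", None, "Price Each", "600"],
    "Order Date": ["04/19/19 08:46", None, "Order Date", "04/12/19 14:38"],
})

# Drop rows in which every column is missing
clean = raw.dropna(how="all")
# Drop rows whose "Order Date" starts with "Or" (the repeated header text)
clean = clean[clean["Order Date"].str[0:2] != "Or"].copy()
# Convert the text columns to numbers
clean["Quantity Ordered"] = pd.to_numeric(clean["Quantity Ordered"])
clean["Price Each"] = pd.to_numeric(clean["Price Each"])

print(clean)
```

The `.copy()` after filtering is a design choice in this sketch: it makes `clean` an independent table, so the later column assignments do not trigger pandas' chained-assignment warning.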

Add Month Column

[11]: all_data["Month"] = all_data["Order Date"].str[0:2]
all_data["Month"] = all_data["Month"].astype("int32")
all_data.head()

[11]: Order ID Product Quantity Ordered Price Each \

0 176558 USB-C Charging Cable 2 11.95
2 176559 Bose SoundSport Headphones 1 99.99
3 176560 Google Phone 1 600.00
4 176560 Wired Headphones 1 11.99
5 176561 Wired Headphones 1 11.99

Order Date Purchase Address Month

0 04/19/19 08:46 917 1st St, Dallas, TX 75001 4
2 04/07/19 22:30 682 Chestnut St, Boston, MA 02215 4
3 04/12/19 14:38 669 Spruce St, Los Angeles, CA 90001 4
4 04/12/19 14:38 669 Spruce St, Los Angeles, CA 90001 4
5 04/30/19 09:27 333 8th St, Los Angeles, CA 90001 4
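Slicing the first two characters works because every date begins with a two-digit month. An alternative is to parse the date and read the month from it. The sketch below assumes that every value follows the `MM/DD/YY HH:MM` layout seen in the output above, and uses two dates taken from that output.

```python
import pandas as pd

# Two order dates in the same layout as the data above
orders = pd.DataFrame({"Order Date": ["04/19/19 08:46", "09/30/19 00:18"]})

# Parse with an explicit format, then read the month number from the result
parsed = pd.to_datetime(orders["Order Date"], format="%m/%d/%y %H:%M")
orders["Month"] = parsed.dt.month

print(orders)
```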

Add Sales Column

[12]: all_data["Sales"] = all_data["Quantity Ordered"] * all_data["Price Each"]
all_data

[12]: Order ID Product Quantity Ordered Price Each \

0 176558 USB-C Charging Cable 2 11.95
2 176559 Bose SoundSport Headphones 1 99.99
3 176560 Google Phone 1 600.00
4 176560 Wired Headphones 1 11.99
5 176561 Wired Headphones 1 11.99
… … … … …
186845 259353 AAA Batteries (4-pack) 3 2.99
186846 259354 iPhone 1 700.00
186847 259355 iPhone 1 700.00
186848 259356 34in Ultrawide Monitor 1 379.99
186849 259357 USB-C Charging Cable 1 11.95

Order Date Purchase Address Month Sales
0 04/19/19 08:46 917 1st St, Dallas, TX 75001 4 23.90
2 04/07/19 22:30 682 Chestnut St, Boston, MA 02215 4 99.99
3 04/12/19 14:38 669 Spruce St, Los Angeles, CA 90001 4 600.00
4 04/12/19 14:38 669 Spruce St, Los Angeles, CA 90001 4 11.99
5 04/30/19 09:27 333 8th St, Los Angeles, CA 90001 4 11.99
… … … … …
186845 09/17/19 20:56 840 Highland St, Los Angeles, CA 90001 9 8.97
186846 09/01/19 16:00 216 Dogwood St, San Francisco, CA 94016 9 700.00
186847 09/23/19 07:39 220 12th St, San Francisco, CA 94016 9 700.00
186848 09/19/19 17:30 511 Forest St, San Francisco, CA 94016 9 379.99
186849 09/30/19 00:18 250 Meadow St, San Francisco, CA 94016 9 11.95

[185950 rows x 8 columns]

0.0.4 Which is the Best Month for Sales?

[13]: results = all_data.groupby("Month").sum(numeric_only=True)  # sum only the numeric columns

[14]: plt.figure(figsize=(6,4))
months = range(1,13)
plt.bar(months, results["Sales"])
plt.xticks(months)
plt.xlabel("Month Number")
plt.ylabel("Sales in USD ($)")
plt.show()
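The chart shows the answer visually. The best month can also be read directly from the grouped totals. This is a minimal sketch on made-up order lines; only the column names match the notebook.

```python
import pandas as pd

# Made-up order lines; only the column names match the notebook
orders = pd.DataFrame({
    "Month": [1, 1, 4, 12, 12],
    "Sales": [10.0, 5.0, 23.90, 600.0, 11.99],
})

monthly_sales = orders.groupby("Month")["Sales"].sum()
# idxmax returns the index label (here the month number) of the largest total
best_month = monthly_sales.idxmax()

print(monthly_sales)
print("Best month:", best_month)
```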

Add City Column
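[15]: def get_city(address):
    # "917 1st St, Dallas, TX 75001" -> "Dallas" (strip removes the space after the comma)
    return address.split(",")[1].strip()

def get_state(address):
    # "917 1st St, Dallas, TX 75001" -> "TX"
    return address.split(",")[2].split(" ")[1]

all_data["City"] = all_data["Purchase Address"].apply(lambda x: f"{get_city(x)} ({get_state(x)})")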

all_data

[15]: Order ID Product Quantity Ordered Price Each \

Order Date Purchase Address Month \
0 04/19/19 08:46 917 1st St, Dallas, TX 75001 4
2 04/07/19 22:30 682 Chestnut St, Boston, MA 02215 4
3 04/12/19 14:38 669 Spruce St, Los Angeles, CA 90001 4
4 04/12/19 14:38 669 Spruce St, Los Angeles, CA 90001 4
5 04/30/19 09:27 333 8th St, Los Angeles, CA 90001 4
… … … …
186845 09/17/19 20:56 840 Highland St, Los Angeles, CA 90001 9
186846 09/01/19 16:00 216 Dogwood St, San Francisco, CA 94016 9
186847 09/23/19 07:39 220 12th St, San Francisco, CA 94016 9
186848 09/19/19 17:30 511 Forest St, San Francisco, CA 94016 9
186849 09/30/19 00:18 250 Meadow St, San Francisco, CA 94016 9

Sales City
0 23.90 Dallas (TX)
2 99.99 Boston (MA)
3 600.00 Los Angeles (CA)
4 11.99 Los Angeles (CA)
5 11.99 Los Angeles (CA)
… … …
186845 8.97 Los Angeles (CA)
186846 700.00 San Francisco (CA)
186847 700.00 San Francisco (CA)
186848 379.99 San Francisco (CA)
186849 11.95 San Francisco (CA)

[185950 rows x 9 columns]
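Keeping the state next to the city name distinguishes same-named cities in different states. The same label can also be built without a Python-level function call per row. The sketch below uses `Series.str.extract` with a regular expression, and assumes every address has the form "street, city, ST zip" as in the rows shown above.

```python
import pandas as pd

addresses = pd.Series([
    "917 1st St, Dallas, TX 75001",
    "682 Chestnut St, Boston, MA 02215",
])

# Capture the city and the two-letter state from "street, city, ST zip"
pattern = r"^[^,]+,\s*(?P<city>[^,]+),\s*(?P<state>[A-Z]{2})\s+\d+$"
parts = addresses.str.extract(pattern)
city_label = parts["city"] + " (" + parts["state"] + ")"

print(city_label.tolist())
```

Addresses that do not match the pattern come back as missing values, which makes malformed rows easy to spot.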

0.0.5 Which City had the Highest Sales?

[16]: results = all_data.groupby("City").sum(numeric_only=True)  # sum only the numeric columns

[17]: cities = results.index  # take labels from the grouped result so they line up with the bar heights

plt.bar(cities, results["Sales"])
plt.xticks(cities, rotation=45, size=8)
plt.xlabel("City Name")
plt.ylabel("Sales in USD ($)")
plt.show()
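When grouped totals are plotted, the labels must be in the same order as the values. This is a minimal sketch of why the labels should come from the grouped result, using made-up sales figures.

```python
import pandas as pd

# Made-up order lines; only the column names match the notebook
orders = pd.DataFrame({
    "City": ["Dallas (TX)", "Boston (MA)", "Dallas (TX)", "Austin (TX)"],
    "Sales": [23.90, 99.99, 600.00, 11.99],
})

city_sales = orders.groupby("City")["Sales"].sum()

# unique() keeps first-appearance order; groupby sorts its keys alphabetically
appearance_order = list(orders["City"].unique())
grouped_order = list(city_sales.index)
top_city = city_sales.idxmax()

print(appearance_order)
print(grouped_order)
print("Highest sales:", top_city)
```

The two orders differ here, so pairing `unique()` labels with grouped totals would attach totals to the wrong cities.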

0.0.6 What time should we display advertisements to maximize the likelihood of customers buying the product?

[18]: # Convert to datetime; giving the format explicitly avoids pandas' "Could not infer format" UserWarning
all_data["Order Date"] = pd.to_datetime(all_data["Order Date"], format="%m/%d/%y %H:%M")

[19]: all_data.head()

[19]: Order ID Product Quantity Ordered Price Each \
0 176558 USB-C Charging Cable 2 11.95
2 176559 Bose SoundSport Headphones 1 99.99
3 176560 Google Phone 1 600.00
4 176560 Wired Headphones 1 11.99
5 176561 Wired Headphones 1 11.99

Order Date Purchase Address Month Sales \

0 2019-04-19 08:46:00 917 1st St, Dallas, TX 75001 4 23.90
2 2019-04-07 22:30:00 682 Chestnut St, Boston, MA 02215 4 99.99
3 2019-04-12 14:38:00 669 Spruce St, Los Angeles, CA 90001 4 600.00
4 2019-04-12 14:38:00 669 Spruce St, Los Angeles, CA 90001 4 11.99
5 2019-04-30 09:27:00 333 8th St, Los Angeles, CA 90001 4 11.99

City
0 Dallas (TX)
2 Boston (MA)
3 Los Angeles (CA)
4 Los Angeles (CA)
5 Los Angeles (CA)

[20]: all_data["Hour"] = all_data["Order Date"].dt.hour

all_data["Minute"] = all_data["Order Date"].dt.minute

[21]: all_data.head()

[21]: Order ID Product Quantity Ordered Price Each \

0 176558 USB-C Charging Cable 2 11.95
2 176559 Bose SoundSport Headphones 1 99.99
3 176560 Google Phone 1 600.00
4 176560 Wired Headphones 1 11.99
5 176561 Wired Headphones 1 11.99

Order Date Purchase Address Month Sales \

City Hour Minute

0 Dallas (TX) 8 46
2 Boston (MA) 22 30
3 Los Angeles (CA) 14 38
4 Los Angeles (CA) 14 38
5 Los Angeles (CA) 9 27

[22]: orders_per_hour = all_data.groupby("Hour").size()  # number of order rows in each hour
hours = orders_per_hour.index
plt.plot(hours, orders_per_hour)
plt.xticks(hours)
plt.grid()
plt.xlabel("Hour")
plt.ylabel("Count of Orders")
plt.show()

# The highest numbers of orders came at around 11 AM (hour 11) and 7 PM (hour 19).
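The busiest hours can also be listed as numbers instead of being read off the chart. The sketch below uses the five order timestamps shown in the earlier output. A sample this small says nothing about the real peaks; it only demonstrates the calls.

```python
import pandas as pd

# The five order timestamps shown in the earlier output
orders = pd.DataFrame({
    "Order Date": pd.to_datetime(
        ["04/19/19 08:46", "04/07/19 22:30", "04/12/19 14:38",
         "04/12/19 14:38", "04/30/19 09:27"],
        format="%m/%d/%y %H:%M",
    )
})

# Count the orders in each hour of the day, ordered by hour
orders_per_hour = orders["Order Date"].dt.hour.value_counts().sort_index()
# Keep the two hours with the most orders
busiest = orders_per_hour.nlargest(2)

print(orders_per_hour)
print(busiest)
```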

0.0.7 Which product sold the most? Why do you think it sold the most?

[23]: product_group = all_data.groupby("Product")

quantity_order = product_group["Quantity Ordered"].sum()
products = [product for product, f in product_group]
plt.bar(products,quantity_order)
plt.ylabel("Quantity Ordered")
plt.xlabel("Product")
plt.xticks(products, rotation="vertical", size=8)
plt.show()

[44]: price = all_data.groupby("Product")["Price Each"].mean()

fig, ax1 = plt.subplots()
ax2 = ax1.twinx()  # second y-axis sharing the same x-axis, for a series on a different scale

ax1.bar(products, quantity_order, color="green")
ax2.plot(products, price)

ax1.set_xlabel("Product Name")
ax1.set_ylabel("Quantity Ordered", color="green")
ax2.set_ylabel("Average Price in USD ($)")
# Rotate the tick labels with tick_params; this avoids the set_ticklabels() UserWarning
ax1.tick_params(axis="x", labelrotation=90, labelsize=8)
plt.title("Quantity Ordered vs Average Price for Each Product")

plt.show()

# Products with lower prices tended to sell in larger quantities, while higher-priced products sold fewer units.
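The relationship between price and quantity can be summarized with a correlation coefficient. This is a minimal sketch: the three prices appear in the output above, but the quantities are made up, so the resulting number is illustrative only.

```python
import pandas as pd

# Prices appear in the notebook output; the quantities are made up
orders = pd.DataFrame({
    "Product": ["AAA Batteries (4-pack)", "AAA Batteries (4-pack)",
                "USB-C Charging Cable", "iPhone"],
    "Quantity Ordered": [3, 4, 2, 1],
    "Price Each": [2.99, 2.99, 11.95, 700.00],
})

# One row per product: total quantity and average price
summary = orders.groupby("Product").agg(
    quantity=("Quantity Ordered", "sum"),
    avg_price=("Price Each", "mean"),
)
# Pearson correlation between the two columns
correlation = summary["quantity"].corr(summary["avg_price"])

print(summary)
print("Correlation:", round(correlation, 2))
```

A negative value is consistent with cheaper products selling more units, but it does not show that price alone drives the quantities.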
