Sales Analysis Project
May 3, 2025
Sales_April_2019.csv
Sales_August_2019.csv
Sales_December_2019.csv
Sales_February_2019.csv
Sales_January_2019.csv
Sales_July_2019.csv
Sales_June_2019.csv
Sales_March_2019.csv
Sales_May_2019.csv
Sales_November_2019.csv
Sales_October_2019.csv
Sales_September_2019.csv
[3]: df_list = []
df = pd.read_csv(file_path)
df_list.append(df)
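The cell above looks like the body of a loop over the monthly files listed earlier; the loop header is not shown in this export. A minimal sketch of how such a merge might look, assuming the files sit in one folder and follow the `Sales_*.csv` naming seen above. The function name `merge_monthly_sales` and its `folder` argument are hypothetical:

```python
from pathlib import Path

import pandas as pd


def merge_monthly_sales(folder):
    """Read every Sales_*.csv in `folder` and stack them into one DataFrame."""
    df_list = []
    for file_path in sorted(Path(folder).glob("Sales_*.csv")):
        df = pd.read_csv(file_path)
        df_list.append(df)
    # ignore_index=True gives the combined frame a fresh 0..n-1 index.
    # pd.concat raises ValueError if no matching files were found.
    return pd.concat(df_list, ignore_index=True)
```

The combined frame could then be written out with `to_csv("all_data.csv", index=False)`, which would match the file read in the next cell.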
[4]: all_data = pd.read_csv("all_data.csv")
all_data.head()
[5]: all_data.isnull().sum()
[7]: all_data.isnull().sum()
[7]: Order ID 0
Product 0
Quantity Ordered 0
Price Each 0
Order Date 0
Purchase Address 0
dtype: int64
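The cleaning step between the two `isnull().sum()` checks is not shown in this export. One plausible way to reach all-zero counts, sketched here as an assumption, is to drop rows in which every column is missing. The `sample` frame is a tiny stand-in for `all_data`, with one empty row added for illustration:

```python
import numpy as np
import pandas as pd

# Tiny stand-in for all_data; the middle row is entirely missing
sample = pd.DataFrame(
    {
        "Order ID": ["176558", np.nan, "176559"],
        "Product": ["USB-C Charging Cable", np.nan, "Bose SoundSport Headphones"],
    }
)

print(sample.isnull().sum())        # missing values per column before cleaning

cleaned = sample.dropna(how="all")  # drop rows where every column is missing
print(cleaned.isnull().sum())       # missing values per column after cleaning
```

Because `dropna` keeps the original row labels, a dropped row leaves a gap in the index. That would be consistent with the later output, where the labels jump from 0 to 2.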
[8]: Index(['Order ID', 'Product', 'Quantity Ordered', 'Price Each', 'Order Date',
'Purchase Address'],
dtype='object')
Order Date Purchase Address Month Sales
0 04/19/19 08:46 917 1st St, Dallas, TX 75001 4 23.90
2 04/07/19 22:30 682 Chestnut St, Boston, MA 02215 4 99.99
3 04/12/19 14:38 669 Spruce St, Los Angeles, CA 90001 4 600.00
4 04/12/19 14:38 669 Spruce St, Los Angeles, CA 90001 4 11.99
5 04/30/19 09:27 333 8th St, Los Angeles, CA 90001 4 11.99
… … … … …
186845 09/17/19 20:56 840 Highland St, Los Angeles, CA 90001 9 8.97
186846 09/01/19 16:00 216 Dogwood St, San Francisco, CA 94016 9 700.00
186847 09/23/19 07:39 220 12th St, San Francisco, CA 94016 9 700.00
186848 09/19/19 17:30 511 Forest St, San Francisco, CA 94016 9 379.99
186849 09/30/19 00:18 250 Meadow St, San Francisco, CA 94016 9 11.95
[14]: plt.figure(figsize=(6,4))
months = range(1,13)
plt.bar(months, results["Sales"])
plt.xticks(months)
plt.xlabel("Month Number")
plt.ylabel("Sales in USD ($)")
plt.show()
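The cells that create the `Month` and `Sales` columns and the `results` frame are not shown in this export. A minimal sketch of how they might be derived, assuming `Sales` is quantity times unit price (which matches row 0: 2 × 11.95 = 23.90) and `Month` comes from the order date. The three sample rows are taken from the output in this notebook:

```python
import pandas as pd

sample = pd.DataFrame(
    {
        "Quantity Ordered": ["2", "1", "1"],
        "Price Each": ["11.95", "99.99", "600.00"],
        "Order Date": ["04/19/19 08:46", "04/07/19 22:30", "04/12/19 14:38"],
    }
)

# Assumption: the CSV columns may arrive as text, so convert them first
sample["Quantity Ordered"] = pd.to_numeric(sample["Quantity Ordered"])
sample["Price Each"] = pd.to_numeric(sample["Price Each"])

order_dates = pd.to_datetime(sample["Order Date"], format="%m/%d/%y %H:%M")
sample["Month"] = order_dates.dt.month
sample["Sales"] = sample["Quantity Ordered"] * sample["Price Each"]

# One row per month, holding the total sales for that month
results = sample.groupby("Month")[["Sales"]].sum()
print(results)
```

Note that `plt.bar(months, results["Sales"])` with `months = range(1,13)` only works when all twelve months are present in `results`.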
Add City Column
[15]: def get_city(address):
          return address.split(",")[1]
      def get_state(address):
          return address.split(",")[2].split(" ")[1]
      all_data
Order Date Purchase Address Month \
0 04/19/19 08:46 917 1st St, Dallas, TX 75001 4
2 04/07/19 22:30 682 Chestnut St, Boston, MA 02215 4
3 04/12/19 14:38 669 Spruce St, Los Angeles, CA 90001 4
4 04/12/19 14:38 669 Spruce St, Los Angeles, CA 90001 4
5 04/30/19 09:27 333 8th St, Los Angeles, CA 90001 4
… … … …
186845 09/17/19 20:56 840 Highland St, Los Angeles, CA 90001 9
186846 09/01/19 16:00 216 Dogwood St, San Francisco, CA 94016 9
186847 09/23/19 07:39 220 12th St, San Francisco, CA 94016 9
186848 09/19/19 17:30 511 Forest St, San Francisco, CA 94016 9
186849 09/30/19 00:18 250 Meadow St, San Francisco, CA 94016 9
Sales City
0 23.90 Dallas (TX)
2 99.99 Boston (MA)
3 600.00 Los Angeles (CA)
4 11.99 Los Angeles (CA)
5 11.99 Los Angeles (CA)
… … …
186845 8.97 Los Angeles (CA)
186846 700.00 San Francisco (CA)
186847 700.00 San Francisco (CA)
186848 379.99 San Francisco (CA)
186849 11.95 San Francisco (CA)
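The line that builds the `City` column from the two helpers is not shown in this export. A minimal sketch of one way to produce values such as `Dallas (TX)`, assuming the helpers are combined with `apply`. The helpers here use `strip()` and a whitespace `split()` instead of the exact expressions in the cell above, so that stray spaces do not end up in the result:

```python
import pandas as pd


def get_city(address):
    # "917 1st St, Dallas, TX 75001" -> "Dallas"
    return address.split(",")[1].strip()


def get_state(address):
    # "917 1st St, Dallas, TX 75001" -> "TX"
    return address.split(",")[2].split()[0]


sample = pd.DataFrame(
    {
        "Purchase Address": [
            "917 1st St, Dallas, TX 75001",
            "682 Chestnut St, Boston, MA 02215",
        ]
    }
)

# Keep the state next to the city so same-named cities in different states stay apart
sample["City"] = sample["Purchase Address"].apply(
    lambda address: f"{get_city(address)} ({get_state(address)})"
)
print(sample["City"].tolist())  # ['Dallas (TX)', 'Boston (MA)']
```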
0.0.6 What time should we display advertisements to maximize the likelihood of customers buying the product?
C:\Users\DELL\AppData\Local\Temp\ipykernel_9116\2228339044.py:1: UserWarning:
Could not infer format, so each element will be parsed individually, falling
back to `dateutil`. To ensure parsing is consistent and as-expected, please
specify a format.
all_data["Order Date"] = pd.to_datetime(all_data["Order Date"]) # convert datetime format
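The warning above asks for an explicit date format. A minimal sketch of how it might be avoided, assuming every value follows the `MM/DD/YY HH:MM` pattern seen in the output (for example `04/19/19 08:46`). It also derives an `Hour` column, which the later plot uses but whose creation is not shown here:

```python
import pandas as pd

sample = pd.DataFrame(
    {"Order Date": ["04/19/19 08:46", "04/07/19 22:30", "09/30/19 00:18"]}
)

# An explicit format avoids the per-element dateutil fallback
sample["Order Date"] = pd.to_datetime(sample["Order Date"], format="%m/%d/%y %H:%M")
sample["Hour"] = sample["Order Date"].dt.hour
print(sample["Hour"].tolist())  # [8, 22, 0]
```

With an explicit format, a value that does not match raises an error instead of being parsed by guesswork, which makes malformed rows easier to spot.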
[19]: all_data.head()
[19]: Order ID Product Quantity Ordered Price Each \
0 176558 USB-C Charging Cable 2 11.95
2 176559 Bose SoundSport Headphones 1 99.99
3 176560 Google Phone 1 600.00
4 176560 Wired Headphones 1 11.99
5 176561 Wired Headphones 1 11.99
City
0 Dallas (TX)
2 Boston (MA)
3 Los Angeles (CA)
4 Los Angeles (CA)
5 Los Angeles (CA)
[21]: all_data.head()
[22]: hours = sorted(all_data["Hour"].unique())
plt.plot(hours, all_data.groupby("Hour")["Order ID"].count())  # one line: orders per hour
plt.xticks(hours)
plt.grid()
plt.xlabel("Hour")
plt.ylabel("Count of Orders")
plt.show()
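Besides reading the peak off the chart, the busiest hour can be computed directly. A minimal sketch, using only the hours of the ten order rows visible in this notebook's output; such a small sample is for illustration and says nothing about the full dataset:

```python
import pandas as pd

# Hours taken from the ten order timestamps shown in the output above
sample = pd.DataFrame({"Hour": [8, 22, 14, 14, 9, 20, 16, 7, 17, 0]})

orders_per_hour = sample.groupby("Hour").size()  # number of rows per hour
busiest_hour = orders_per_hour.idxmax()          # hour with the most rows
print(busiest_hour, orders_per_hour[busiest_hour])
```

On the full data, `orders_per_hour.nlargest(3)` would list the top three hours, which may be more useful for scheduling advertisements than a single peak.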
0.0.7 Which product sold the most? Why do you think it sold the most?
[44]: price = all_data.groupby("Product")["Price Each"].mean()
fig, ax1 = plt.subplots()
ax2 = ax1.twinx()  # second y-axis for price, sharing the x-axis with ax1
ax1.bar(products,quantity_order, color="green")
ax2.plot(products,price)
ax1.set_xlabel("Product Name")
ax1.set_ylabel("Quantity Ordered", color="green")
ax1.set_xticklabels(products, rotation="vertical", size=8)
plt.title("Quantity Ordered vs Average Price for Each Product")
plt.show()
C:\Users\DELL\AppData\Local\Temp\ipykernel_9116\836746615.py:13: UserWarning:
set_ticklabels() should only be used with a fixed number of ticks, i.e. after
set_ticks() or using a FixedLocator.
ax1.set_xticklabels(products, rotation="vertical", size=8)
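The warning above appears because `set_xticklabels` is called without fixing the tick positions first. A minimal sketch of the dual-axis chart that sets the ticks explicitly, assuming `quantity_order` is the summed quantity per product (its definition is not shown in this export). The five sample rows come from the `head()` output earlier in the notebook:

```python
import matplotlib.pyplot as plt
import pandas as pd

sample = pd.DataFrame(
    {
        "Product": [
            "USB-C Charging Cable",
            "Bose SoundSport Headphones",
            "Google Phone",
            "Wired Headphones",
            "Wired Headphones",
        ],
        "Quantity Ordered": [2, 1, 1, 1, 1],
        "Price Each": [11.95, 99.99, 600.00, 11.99, 11.99],
    }
)

grouped = sample.groupby("Product")
quantity_order = grouped["Quantity Ordered"].sum()  # assumed definition
price = grouped["Price Each"].mean()
products = list(quantity_order.index)
positions = list(range(len(products)))  # numeric x positions, one per product

fig, ax1 = plt.subplots()
ax2 = ax1.twinx()  # second y-axis sharing the same x-axis
ax1.bar(positions, quantity_order, color="green")
ax2.plot(positions, price, color="blue")

# Fix the tick positions first, then label them; this avoids the UserWarning
ax1.set_xticks(positions)
ax1.set_xticklabels(products, rotation="vertical", size=8)

ax1.set_xlabel("Product Name")
ax1.set_ylabel("Quantity Ordered", color="green")
ax2.set_ylabel("Average Price ($)", color="blue")
ax1.set_title("Quantity Ordered vs Average Price for Each Product")
fig.tight_layout()
```

Plotting against numeric positions rather than product names keeps both axes aligned and makes the tick locations explicit.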
# Lower-priced products generally sold in larger quantities, while higher-priced products sold fewer units.