Sales Analysis Using Python and SQL
DATA CLEANING (PYTHON)
SALES DATA ANALYSIS USING PYTHON AND POSTGRESQL
Connect to Kaggle API and download file
import kaggle
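The download call itself is not shown above. A minimal sketch of how it might look, assuming the `kaggle` package's `KaggleApi` client and credentials stored in `~/.kaggle/kaggle.json`. The helper name `download_dataset`, its `api` parameter, and the slug `"owner/dataset-name"` are placeholders for illustration, not taken from this project.

```python
from pathlib import Path


def download_dataset(dataset_slug, dest_dir=".", api=None):
    """Download and unzip a Kaggle dataset into dest_dir; return that directory."""
    if api is None:
        # Imported lazily so this module still loads on machines where
        # no Kaggle credentials are configured.
        from kaggle.api.kaggle_api_extended import KaggleApi

        api = KaggleApi()
        api.authenticate()
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # unzip=True extracts the archive so the CSV can be read directly.
    api.dataset_download_files(dataset_slug, path=str(dest), unzip=True)
    return dest


# Usage (placeholder slug; replace with the real "owner/dataset-name"):
# download_dataset("owner/dataset-name", dest_dir=".")
```

Passing a client through `api` keeps the helper usable with an already authenticated client.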
Import Libraries
import pandas as pd
Read Dataset
data=pd.read_csv('orders.csv',na_values=['Not Available','unknown'])
data.head(10)
Discount Percent
0 2
1 3
2 5
3 2
4 5
5 3
6 3
7 5
8 2
9 3
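The `na_values` argument passed to `read_csv` above turns the placeholder strings into missing values at load time. A small sketch of that behaviour on made-up rows (the CSV text below is illustrative, not part of orders.csv):

```python
import io

import pandas as pd

# Made-up rows for illustration; only the na_values behaviour matters here.
csv_text = """Order Id,Ship Mode
1,Second Class
2,Not Available
3,unknown
4,Standard Class
"""

sample = pd.read_csv(io.StringIO(csv_text), na_values=["Not Available", "unknown"])

# Both placeholder strings are parsed as NaN instead of ordinary text.
missing_count = int(sample["Ship Mode"].isna().sum())
print(sample)
print("missing ship modes:", missing_count)
```

Without `na_values`, the two strings would be kept as regular categories and counted alongside the real ship modes.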
data.dtypes
Order Id int64
Order Date object
Ship Mode object
Segment object
Country object
City object
State object
Postal Code int64
Region object
Category object
Sub Category object
Product Id object
cost price int64
Formatting columns
data.columns=data.columns.str.lower().str.replace(' ','_')
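A quick check of what this renaming does, using a few of the column names from the `data.dtypes` listing. This is only a sketch on a standalone `pd.Index`, not the full dataset.

```python
import pandas as pd

# Column names as they appear in the data.dtypes listing.
columns = pd.Index(["Order Id", "Ship Mode", "Sub Category", "cost price"])

# Lowercase every name, then replace spaces with underscores.
normalized = columns.str.lower().str.replace(" ", "_")
print(list(normalized))
```

Names in this form can be used unquoted as PostgreSQL column identifiers.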
Column addition
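# profit is measured against the list price (before discount)
data['profit']=data['list_price']-data['cost_price']
# discount amount = list price * discount percent / 100
data['discount']=data['list_price']*data['discount_percent']*.01
data['sale_price']=data['list_price']-data['discount']
# drop the source columns once the derived columns exist
data.drop(['list_price','cost_price','discount_percent'],axis='columns',inplace=True)
data.head()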
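To trace the arithmetic, here is the same set of formulas applied to a single made-up order (the numbers are illustrative, not taken from the dataset). Note that profit, as defined here, does not reflect the discount.

```python
import pandas as pd

# One made-up order, used only to trace the arithmetic.
row = pd.DataFrame({"list_price": [260], "cost_price": [240], "discount_percent": [2]})

row["profit"] = row["list_price"] - row["cost_price"]                 # 260 - 240
row["discount"] = row["list_price"] * row["discount_percent"] * 0.01  # 2% of 260
row["sale_price"] = row["list_price"] - row["discount"]               # list price minus discount

result = row[["profit", "discount", "sale_price"]].iloc[0].to_dict()
print(result)
```

For this order the profit is 20, the discount is about 5.2, and the sale price is about 254.8.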
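import psycopg2

try:
    # conn is assumed to be an open psycopg2 connection (its creation is not shown here)
    # auto commit: CREATE DATABASE cannot run inside a transaction block
    conn.autocommit=True
    # cursor
    cur=conn.cursor()
    cur.execute('''CREATE DATABASE Orders;''')
    conn.close()
except psycopg2.Error as e:
    print(e)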
try:
    cur.execute(query)
    print("Table Created")
except psycopg2.Error as e:
    print(e)
pd.DataFrame(cur.fetchall())
0 1 2 3 4 \
0 1 2023-03-01 Second Class Consumer United States
1 2 2023-08-15 Second Class Consumer United States
2 3 2023-01-10 Second Class Corporate United States
3 4 2022-06-18 Standard Class Consumer United States
4 5 2022-07-13 Standard Class Consumer United States
... ... ... ... ... ...
9989 9990 2023-02-18 Second Class Consumer United States
9990 9991 2023-03-17 Standard Class Consumer United States
9991 9992 2022-08-07 Standard Class Consumer United States
9992 9993 2022-11-19 Standard Class Consumer United States
9993 9994 2022-07-17 Second Class Consumer United States
5 6 7 8 9 10 \
0 Henderson Kentucky 42420 South Furniture Bookcases
1 Henderson Kentucky 42420 South Furniture Chairs
2 Los Angeles California 90036 West Office Supplies Labels
3 Fort Lauderdale Florida 33311 South Furniture Tables
4 Fort Lauderdale Florida 33311 South Office Supplies Storage
... ... ... ... ... ... ...
9989 Miami Florida 33180 South Furniture Furnishings
9990 Costa Mesa California 92627 West Furniture Furnishings
9991 Costa Mesa California 92627 West Technology Phones
9992 Costa Mesa California 92627 West Office Supplies Paper
9993 Westminster California 92683 West Office Supplies Appliances
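The frame above has numeric column labels (0 to 10) because `fetchall()` returns plain tuples. A minimal sketch of one way to recover the column names, using `cursor.description`. Here `sqlite3` stands in for psycopg2 so the example runs without a server, and the three-column table is a cut-down illustration of the real one.

```python
import sqlite3

# sqlite3 stands in for psycopg2 so the sketch runs without a server;
# both drivers follow the Python DB-API, where cursor.description lists the
# result columns and the first item of each entry is the column name.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE orders_table (order_id INTEGER, ship_mode TEXT, segment TEXT)")
cur.execute("INSERT INTO orders_table VALUES (1, 'Second Class', 'Consumer')")
cur.execute("SELECT * FROM orders_table")

column_names = [col[0] for col in cur.description]
rows = cur.fetchall()
conn.close()

print(column_names)
print(rows)
```

With pandas, the names can then be passed along with the rows: `pd.DataFrame(rows, columns=column_names)`.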
CLOSE CONNECTION
conn.close()
CONCLUSION
1. Import dataset from Kaggle API
2. Connection to PostgreSQL
WITH CTE AS (
    SELECT CATEGORY,
           EXTRACT (YEAR FROM ORDER_DATE) AS YEAR,
           EXTRACT (MONTH FROM ORDER_DATE) AS MONTH,
           ROUND(SUM(SALE_PRICE)::NUMERIC,1) AS TOTAL_SALES
    FROM ORDERS_TABLE
    GROUP BY 1,2,3
    ORDER BY 1,2,3),
CTE2 AS (
    SELECT
        CATEGORY, YEAR, MONTH, TOTAL_SALES,
        MAX(TOTAL_SALES) OVER (PARTITION BY CATEGORY) AS MAX_SALES
    FROM CTE
)
SELECT
    CATEGORY, YEAR, MONTH, TOTAL_SALES
FROM CTE2
WHERE TOTAL_SALES=MAX_SALES;

category         year  month  total_sales
Furniture        2022  10     42888.9
Office Supplies  2023  2      44118.5
Technology       2023  10     53000.1
WHICH SUB CATEGORY HAD THE HIGHEST GROWTH BY PROFIT IN 2023 COMPARED TO 2022
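The query for this question is not included here. The sketch below shows one possible approach, run through Python's `sqlite3` on made-up rows so it works without a PostgreSQL server. It assumes "growth" means percentage change in total profit, and it uses SQLite's `strftime('%Y', ...)` where PostgreSQL would use `EXTRACT (YEAR FROM ORDER_DATE)`. The table contents and the name `growth_pct` are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE orders_table (order_date TEXT, sub_category TEXT, profit REAL)")
# Made-up rows for illustration only, not taken from the dataset.
cur.executemany(
    "INSERT INTO orders_table VALUES (?, ?, ?)",
    [
        ("2022-03-01", "Chairs", 100.0),
        ("2023-03-01", "Chairs", 150.0),
        ("2022-05-10", "Phones", 200.0),
        ("2023-05-10", "Phones", 220.0),
        ("2022-07-04", "Paper", 50.0),
        ("2023-07-04", "Paper", 40.0),
    ],
)

query = """
WITH yearly AS (
    -- One row per sub category, with each year's profit in its own column.
    SELECT sub_category,
           SUM(CASE WHEN strftime('%Y', order_date) = '2022' THEN profit ELSE 0 END) AS profit_2022,
           SUM(CASE WHEN strftime('%Y', order_date) = '2023' THEN profit ELSE 0 END) AS profit_2023
    FROM orders_table
    GROUP BY sub_category
)
SELECT sub_category,
       profit_2022,
       profit_2023,
       ROUND((profit_2023 - profit_2022) * 100.0 / profit_2022, 1) AS growth_pct
FROM yearly
WHERE profit_2022 <> 0          -- avoid dividing by zero
ORDER BY growth_pct DESC
LIMIT 1;
"""
top = cur.execute(query).fetchone()
conn.close()
print(top)
```

If growth is meant as the absolute change in profit instead, ordering by `profit_2023 - profit_2022` would be the corresponding variant.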
Link to GitHub