DMV 6 Output

sppu dmv practical 6 output

Uploaded by

sachin ahankari

DMV PRACTICAL 6

Data Aggregation
Problem Statement: Analyzing Sales Performance by Region in a Retail Company
Dataset: "customer_shopping_data.csv"
Description: The dataset contains information about sales transactions in a retail company. It includes attributes such as transaction date, product category, quantity sold, and sales amount. The goal is to perform data aggregation to analyze the sales performance by region and identify the top-performing regions.
Tasks to Perform:
1. Import the "customer_shopping_data.csv" dataset.
2. Explore the dataset to understand its structure and content.
3. Identify the relevant variables for aggregating sales data, such as region, sales amount, and product category.
4. Group the sales data by region and calculate the total sales amount for each region.
5. Create bar plots or pie charts to visualize the sales distribution by region.
6. Identify the top-performing regions based on the highest sales amount.
7. Group the sales data by region and product category to calculate the total sales amount for each combination.
8. Create stacked bar plots or grouped bar plots to compare the sales amounts across different regions and product categories.
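Task 2 (exploring the dataset) can be approached with a few standard pandas inspection calls. What follows is a minimal sketch and not part of the original listing. It assumes the column names used in this practical's code (shopping_mall, category, quantity, price) and works on a handful of made-up rows, so it runs without the CSV file. The names sample, missing and distinct are illustrative only.

```python
import pandas as pd

# Made-up rows for illustration only; in the practical the data comes from
# pd.read_csv("customer_shopping_data.csv")
sample = pd.DataFrame({
    "shopping_mall": ["Kanyon", "Kanyon", "Metrocity", "Zorlu Center"],
    "category": ["Books", "Toys", "Books", None],
    "quantity": [1, 2, 3, 1],
    "price": [10.0, 20.0, 5.0, 15.0],
})

print(sample.shape)    # (number of rows, number of columns)
print(sample.dtypes)   # data type of each column
print(sample.head())   # first few records

# Number of missing values in each column
missing = sample.isna().sum()
print(missing)

# Number of distinct (non-missing) values in the grouping columns
distinct = sample[["shopping_mall", "category"]].nunique()
print(distinct)
```

On the real file, the same calls applied to df would show which columns are numeric and therefore suitable for summing, and which are candidates for grouping.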
PYTHON CODE :
import pandas as pd
import matplotlib.pyplot as plt

# Ensure this path points to the actual location of your CSV file
df = pd.read_csv("customer_shopping_data.csv")

# Count of records for each mall branch (the "region" in this dataset)
print(df.groupby("shopping_mall").count())

# Count of records for each product category
print(df.groupby("category").count())

# Total sales for each mall branch.
# numeric_only=True sums only the numeric columns; the bare .sum() used in the
# recorded run triggers the FutureWarning shown in the output.
branch_sales = df.groupby("shopping_mall").sum(numeric_only=True)

# Total sales for each product category
category_sales = df.groupby("category").sum(numeric_only=True)

# Branches sorted by total sales, highest first
top_branches = branch_sales.sort_values(by="price", ascending=False)

# Categories sorted by total sales, highest first
top_categories = category_sales.sort_values(by="price", ascending=False)

# Total sales for each combination of branch and product category
combined_branch_category_sales = df.groupby(["shopping_mall", "category"]).sum(numeric_only=True)

# Pie chart for sales by branch
plt.pie(branch_sales["price"], labels=branch_sales.index, autopct='%1.1f%%',
        shadow=True, startangle=140)
plt.title('Sales by Branch')
plt.axis('equal')  # Equal aspect ratio ensures that the pie is drawn as a circle
plt.show()

# Pie chart for sales by product category
plt.pie(category_sales["price"], labels=category_sales.index, autopct='%1.1f%%',
        shadow=True, startangle=140)
plt.title('Sales by Product Category')
plt.axis('equal')  # Equal aspect ratio ensures that the pie is drawn as a circle
plt.show()

# Pivot table: one row per branch, one column per category, summed price
combined_pivot = df.pivot_table(index="shopping_mall", columns="category",
                                values="price", aggfunc="sum")

# Grouped bar chart for sales of different categories at different branches
combined_pivot.plot(kind='bar', figsize=(10, 6))
plt.title('Sales of Different Categories at Different Branches')
plt.ylabel('Sales')
plt.show()
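
The listing sorts the branch totals without printing them and draws only a grouped bar chart. Tasks 6 and 8 also mention picking out the top regions and using a stacked bar plot. The sketch below shows one possible way to do both. It is not the original author's code: the rows are made up for illustration, and the names sample, mall_totals, top_two and pivot are hypothetical.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up rows for illustration only; not taken from customer_shopping_data.csv
sample = pd.DataFrame({
    "shopping_mall": ["Kanyon", "Kanyon", "Metrocity", "Metrocity", "Zorlu Center"],
    "category": ["Books", "Toys", "Books", "Toys", "Toys"],
    "price": [10.0, 20.0, 5.0, 40.0, 15.0],
})

# Total per mall, largest first; head(2) keeps the two best performers
mall_totals = (
    sample.groupby("shopping_mall")["price"].sum().sort_values(ascending=False)
)
top_two = mall_totals.head(2)
print(top_two)

# One row per mall, one column per category; missing combinations become 0
pivot = sample.pivot_table(index="shopping_mall", columns="category",
                           values="price", aggfunc="sum", fill_value=0)

# stacked=True piles the category segments on top of each other in one bar per mall
ax = pivot.plot(kind="bar", stacked=True, figsize=(8, 5))
ax.set_title("Sales by Branch and Category (stacked)")
ax.set_ylabel("Sales")
plt.tight_layout()
# plt.show()  # uncomment in an interactive session to display the chart
```

With the real data, the same pattern applied to df would put each mall's total height on the y-axis, which makes the top-performing branches visible at a glance while still showing the category split.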
OUTPUT :
                   invoice_no  customer_id  gender    age  category  quantity  \
shopping_mall
Cevahir AVM              4991         4991    4991   4991      4991      4991
Emaar Square Mall        4811         4811    4811   4811      4811      4811
Forum Istanbul           4947         4947    4947   4947      4947      4947
Istinye Park             9781         9781    9781   9781      9781      9781
Kanyon                  19823        19823   19823  19823     19823     19823
Mall of Istanbul        19943        19943   19943  19943     19943     19943
Metrocity               15011        15011   15011  15011     15011     15011
Metropol AVM            10161        10161   10161  10161     10161     10161
Viaport Outlet           4914         4914    4914   4914      4914      4914
Zorlu Center             5075         5075    5075   5075      5075      5075

                   price  payment_method  invoice_date
shopping_mall
Cevahir AVM         4991            4991          4991
Emaar Square Mall   4811            4811          4811
Forum Istanbul      4947            4947          4947
Istinye Park        9781            9781          9781
Kanyon             19823           19823         19823
Mall of Istanbul   19943           19943         19943
Metrocity          15011           15011         15011
Metropol AVM       10161           10161         10161
Viaport Outlet      4914            4914          4914
Zorlu Center        5075            5075          5075
                 invoice_no  customer_id  gender    age  quantity  price  \
category
Books                  4981         4981    4981   4981      4981   4981
Clothing              34487        34487   34487  34487     34487  34487
Cosmetics             15097        15097   15097  15097     15097  15097
Food & Beverage       14776        14776   14776  14776     14776  14776
Shoes                 10034        10034   10034  10034     10034  10034
Souvenir               4999         4999    4999   4999      4999   4999
Technology             4996         4996    4996   4996      4996   4996
Toys                  10087        10087   10087  10087     10087  10087

                 payment_method  invoice_date  shopping_mall
category
Books                      4981          4981           4981
Clothing                  34487         34487          34487
Cosmetics                 15097         15097          15097
Food & Beverage           14776         14776          14776
Shoes                     10034         10034          10034
Souvenir                   4999          4999           4999
Technology                 4996          4996           4996
Toys                      10087         10087          10087
C:\Users\AI&DS\AppData\Local\Temp\ipykernel_12148\2859295099.py:14: FutureWarning: The default value of numeric_only in DataFrameGroupBy.sum is deprecated. In a future version, numeric_only will default to False. Either specify numeric_only or select only columns which should be valid for the function.
  branch_sales = df.groupby("shopping_mall").sum()
C:\Users\AI&DS\AppData\Local\Temp\ipykernel_12148\2859295099.py:17: FutureWarning: The default value of numeric_only in DataFrameGroupBy.sum is deprecated. In a future version, numeric_only will default to False. Either specify numeric_only or select only columns which should be valid for the function.
  category_sales = df.groupby("category").sum()
C:\Users\AI&DS\AppData\Local\Temp\ipykernel_12148\2859295099.py:26: FutureWarning: The default value of numeric_only in DataFrameGroupBy.sum is deprecated. In a future version, numeric_only will default to False. Either specify numeric_only or select only columns which should be valid for the function.
  combined_branch_category_sales = df.groupby(["shopping_mall", "category"]).sum()
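
The warnings in this output are raised when sum() is called on groups that still contain non-numeric columns such as gender or payment_method. The warning text itself names two remedies: pass numeric_only explicitly, or select the columns to aggregate first. The sketch below compares the two on a few made-up rows. The names sample, explicit and flagged are illustrative, and the column names are assumed to match those in the output above.

```python
import pandas as pd

# Made-up rows for illustration only; not taken from customer_shopping_data.csv
sample = pd.DataFrame({
    "shopping_mall": ["Kanyon", "Kanyon", "Metrocity"],
    "payment_method": ["Cash", "Credit Card", "Cash"],
    "quantity": [1, 2, 3],
    "price": [10.0, 20.0, 5.0],
})

# Remedy 1: select the numeric columns to aggregate before summing
explicit = sample.groupby("shopping_mall")[["quantity", "price"]].sum()

# Remedy 2: keep all columns but ask pandas to sum only the numeric ones
flagged = sample.groupby("shopping_mall").sum(numeric_only=True)

print(explicit)
print(flagged)
```

Both forms yield the same quantity and price totals here. Selecting the columns explicitly has the side benefit of documenting which variables the aggregation is meant to cover.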
