Supermarket Sales Data Analysis
Here’s a beginner-level Python Pandas project idea to help you practice and strengthen your
skills:
Perform basic data analysis on supermarket sales data to derive meaningful insights using
Pandas.
1. Dataset
You can use a public dataset such as the "Supermarket Sales" dataset from Kaggle (link below),
or create your own dummy dataset in a CSV file.
https://www.kaggle.com/datasets/aungpyaeap/supermarket-sales?resource=download
The dataset should contain the following columns:
1. Invoice ID
2. Branch
3. Customer Type (e.g., Member, Normal)
4. Gender
5. Product Line
6. Unit Price
7. Quantity
8. Total
9. Date
10. Payment Method
11. Rating
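If you prefer a dummy dataset over the Kaggle download, a minimal sketch like the one below can generate one. Every row value and the file name dummy_supermarket_sales.csv are made up for illustration; only the column names come from the list above.

```python
import pandas as pd

# Column names follow the list above; every value below is made up for practice.
rows = [
    ("INV-001", "A", "Member", "Female", "Food and beverages", 10.0, 3, "2024-01-05", "Cash", 8.5),
    ("INV-002", "B", "Normal", "Male", "Health and beauty", 20.0, 1, "2024-01-06", "Credit card", 6.0),
    ("INV-003", "A", "Normal", "Female", "Electronic accessories", 5.0, 4, "2024-01-06", "Ewallet", 9.0),
    ("INV-004", "C", "Member", "Male", "Food and beverages", 8.0, 2, "2024-01-07", "Cash", 7.5),
]
columns = ["Invoice ID", "Branch", "Customer Type", "Gender", "Product Line",
           "Unit Price", "Quantity", "Date", "Payment Method", "Rating"]
dummy = pd.DataFrame(rows, columns=columns)

# Place "Total" right after "Quantity" so the column order matches the list.
dummy.insert(7, "Total", dummy["Unit Price"] * dummy["Quantity"])

# Write the CSV (hypothetical file name) and read it back to confirm it loads.
dummy.to_csv("dummy_supermarket_sales.csv", index=False)
reloaded = pd.read_csv("dummy_supermarket_sales.csv")
print(reloaded.head())
```

Here "Total" is simply Unit Price × Quantity, which keeps the dummy data easy to check by hand.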
2. Tasks to Perform
1. Load the dataset:
2. Data Exploration:
3. Data Manipulation:
○ Add a new column: Compute "Total Sales" for each product (Unit Price × Quantity).
○ Filter data: Extract sales records for a specific branch or product line.
4. Data Aggregation:
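The column, filter, and aggregation tasks above can be sketched on a tiny in-memory table. The rows below are invented for illustration, and the branch and product line used in the filters are arbitrary choices, not values prescribed by the project.

```python
import pandas as pd

# Made-up sample rows; column names follow the dataset description.
data = pd.DataFrame({
    "Branch": ["A", "B", "A", "C", "B"],
    "Product Line": ["Health and beauty", "Food and beverages", "Food and beverages",
                     "Health and beauty", "Health and beauty"],
    "Unit Price": [10.0, 20.0, 5.0, 8.0, 12.5],
    "Quantity": [2, 1, 4, 3, 2],
})

# Add a new column: Unit Price x Quantity for each row.
data["Total Sales"] = data["Unit Price"] * data["Quantity"]

# Filter data: a boolean mask keeps only the rows where the condition holds.
branch_a = data[data["Branch"] == "A"]
health = data[data["Product Line"] == "Health and beauty"]

# Data aggregation: sum of Total Sales within each branch.
sales_by_branch = data.groupby("Branch")["Total Sales"].sum()

print(branch_a)
print(health)
print(sales_by_branch)
```

The same mask pattern works for any column, and masks can be combined with `&` or `|` when each condition is wrapped in parentheses.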
3. Code Structure
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
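# Load the dataset (the file name is an assumption; use the path to your own CSV)
data = pd.read_csv('supermarket_sales.csv')
# Data Exploration
data.info()  # info() prints its summary itself and returns None, so no print() is needed
print(data.describe())
print(data.isnull().sum())
# Data Manipulation
# Column names follow the list in section 1; the Kaggle CSV may spell some of them
# differently (for example 'Unit price' and 'Product line'), so adjust them if needed
data['Total Sales'] = data['Unit Price'] * data['Quantity']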
# Aggregation
total_sales_by_branch = data.groupby('Branch')['Total Sales'].sum()
average_rating_by_product_line = data.groupby('Product Line')['Rating'].mean()
# Visualizations
sns.barplot(x=total_sales_by_branch.index, y=total_sales_by_branch.values)
plt.title("Total Sales by Branch")
plt.show()
Expected Outcome
1. Data Loading
2. Data Exploration
● .info() – Get a summary of the DataFrame structure, including column data types and
non-null counts.
● .shape – Get the dimensions of the DataFrame (rows, columns).
● .columns – Get the list of column names.
● .describe() – Generate summary statistics for numerical columns.
● .isnull() – Check for missing values (returns a boolean DataFrame).
● .isnull().sum() – Count missing values for each column.
● .dtypes – Get data types of each column.
● .unique() – Get unique values in a column.
● .value_counts() – Count occurrences of unique values in a column.
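As a rough illustration, the inspection methods listed above can be tried on a small made-up DataFrame. The three columns and their values are assumptions chosen so that one missing value shows up in the output.

```python
import pandas as pd

# Made-up sample with one missing rating (None becomes NaN in a float column).
data = pd.DataFrame({
    "Branch": ["A", "B", "A", "C"],
    "Payment Method": ["Cash", "Credit card", "Cash", "Ewallet"],
    "Rating": [8.5, None, 7.0, 9.1],
})

data.info()                              # structure, dtypes, non-null counts (prints directly)
print(data.shape)                        # (rows, columns)
print(data.columns)                      # column names
print(data.describe())                   # summary statistics for numerical columns
print(data.isnull().sum())               # missing values per column
print(data.dtypes)                       # data type of each column
print(data["Branch"].unique())           # distinct values in one column
print(data["Branch"].value_counts())     # how often each value occurs
```

Note that .shape, .columns, and .dtypes are attributes, so they are used without parentheses, while the others are method calls.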
3. Data Manipulation
● Renaming columns:
.rename(columns={'Old Name': 'New Name'})
● Filtering rows:
● Resetting index:
.reset_index(drop=True)
● Aggregation functions:
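The manipulation steps above could look like the following sketch. The sample rows, the original column name "Payment", and the rating threshold of 8.0 are all invented for illustration.

```python
import pandas as pd

# Made-up sample rows.
data = pd.DataFrame({
    "Branch": ["A", "B", "A", "C"],
    "Payment": ["Cash", "Credit card", "Ewallet", "Cash"],
    "Total": [30.0, 20.0, 50.0, 10.0],
    "Rating": [8.5, 6.0, 9.0, 7.5],
})

# Renaming columns: rename() returns a new DataFrame by default.
data = data.rename(columns={"Payment": "Payment Method"})

# Filtering rows: keep only rows with a rating of at least 8.0 (arbitrary threshold).
high_rated = data[data["Rating"] >= 8.0]

# Resetting index: the filtered rows keep their old labels (0 and 2) until reset.
high_rated = high_rated.reset_index(drop=True)

# Aggregation functions: several summaries per branch in one groupby call.
summary = data.groupby("Branch").agg(
    total_sales=("Total", "sum"),
    average_rating=("Rating", "mean"),
)

print(high_rated)
print(summary)
```

Passing drop=True discards the old index instead of keeping it as an extra column.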
5. Visualization (Optional)
● Matplotlib
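A minimal Matplotlib sketch for the optional visualization step follows. The ratings are made up, the output file name is hypothetical, and the Agg backend is selected only so that the script also runs without a display; drop that line and call plt.show() to get an interactive window instead.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; an assumption for headless runs
import matplotlib.pyplot as plt
import pandas as pd

# Made-up sample rows.
data = pd.DataFrame({
    "Product Line": ["Food and beverages", "Health and beauty", "Food and beverages",
                     "Electronic accessories", "Health and beauty"],
    "Rating": [8.0, 6.0, 9.0, 7.5, 7.0],
})

# Average rating per product line, the same aggregation used in the code structure.
average_rating = data.groupby("Product Line")["Rating"].mean()

# One bar per product line.
fig, ax = plt.subplots(figsize=(8, 4))
ax.bar(average_rating.index, average_rating.values)
ax.set_title("Average Rating by Product Line")
ax.set_xlabel("Product Line")
ax.set_ylabel("Average Rating")
fig.tight_layout()

# Save to a file (hypothetical name) instead of opening a window.
fig.savefig("average_rating_by_product_line.png")
```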