Supermarket Sales Data Analysis

Uploaded by

gutgit026

Here’s a beginner-level Python Pandas project idea to help you practice and strengthen your
skills:

Project: Analyzing Supermarket Sales Data

Objective

Perform basic data analysis on supermarket sales data to derive meaningful insights using
Pandas.

Steps and Instructions

1. Dataset

You can use a public dataset like the "Supermarket Sales" dataset from Kaggle, or create your
own dummy dataset in a CSV file.

https://www.kaggle.com/datasets/aungpyaeap/supermarket-sales?resource=download

Sample Dataset Columns:

1. Invoice ID
2. Branch
3. Customer Type (e.g., Member, Normal)
4. Gender
5. Product Line
6. Unit Price
7. Quantity
8. Total
9. Date
10. Payment Method
11. Rating

2. Tasks to Perform

1. Load the dataset:

○ Read the CSV file into a Pandas DataFrame.
○ Display the first 10 rows and understand the structure.

2. Data Exploration:

○ Check for missing values.
○ Understand data types and convert them if necessary (e.g., convert Date to datetime).
○ Generate summary statistics (mean, median, min, max, etc.).

3. Data Manipulation:

○ Add a new column: Compute "Total Sales" for each product (Unit Price × Quantity).
○ Filter data: Extract sales records for a specific branch or product line.

4. Data Aggregation:

○ Calculate the total sales per branch.
○ Group by Product Line and find the average rating for each product line.
○ Identify the branch with the highest sales.

5. Visualization (Optional):

Use Matplotlib or Seaborn to create simple plots like:

○ Sales distribution by branch.
○ Average rating by product line.
○ Total sales by payment method.

3. Code Structure

Here’s a basic outline to guide you:

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load the dataset. The column names used below follow the sample column
# list above; adjust them if your CSV uses different names or capitalization.
data = pd.read_csv("supermarket_sales.csv")
print(data.head(10))  # first 10 rows

# Data Exploration
data.info()  # info() prints its summary itself and returns None
print(data.describe())
print(data.isnull().sum())

# Data Manipulation
data['Total Sales'] = data['Unit Price'] * data['Quantity']

# Aggregation
total_sales_by_branch = data.groupby('Branch')['Total Sales'].sum()
average_rating_by_product_line = data.groupby('Product Line')['Rating'].mean()

# Visualizations
sns.barplot(x=total_sales_by_branch.index, y=total_sales_by_branch.values)
plt.title("Total Sales by Branch")
plt.show()
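
The outline does not cover every task in the list: converting Date to datetime, filtering by branch, finding the branch with the highest sales, and totaling sales by payment method are left out. The sketch below shows one possible way to do these. It uses a tiny made-up DataFrame instead of the real CSV so that it runs on its own; the values and the date format are assumptions for illustration.

```python
import pandas as pd

# Tiny dummy dataset (made-up values) using the sample column names.
data = pd.DataFrame({
    "Branch": ["A", "B", "A", "C"],
    "Product Line": ["Food", "Health", "Food", "Sports"],
    "Unit Price": [10.0, 20.0, 5.0, 8.0],
    "Quantity": [2, 1, 4, 6],
    "Date": ["1/5/2019", "3/8/2019", "3/3/2019", "1/27/2019"],
    "Payment Method": ["Cash", "Ewallet", "Cash", "Credit card"],
})

# Convert the Date column from text to datetime (month/day/year is assumed).
data["Date"] = pd.to_datetime(data["Date"], format="%m/%d/%Y")

# Total Sales = Unit Price x Quantity, computed row by row.
data["Total Sales"] = data["Unit Price"] * data["Quantity"]

# Filter: keep only the records for one branch.
branch_a = data[data["Branch"] == "A"]

# Branch with the highest total sales: idxmax() returns the index label
# (here, the branch name) of the largest value.
sales_by_branch = data.groupby("Branch")["Total Sales"].sum()
top_branch = sales_by_branch.idxmax()

# Total sales by payment method.
sales_by_payment = data.groupby("Payment Method")["Total Sales"].sum()

print(branch_a)
print("Top branch:", top_branch)
print(sales_by_payment)
```

With the real dataset, replace the dummy DataFrame with the result of pd.read_csv and check the date format in your file before passing format= to pd.to_datetime.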

Expected Outcome

By the end of this project, you should be comfortable with:

● Loading and exploring data using Pandas.

● Performing basic data manipulations.
● Summarizing and aggregating data using Pandas functions.
● Optionally, visualizing data with Matplotlib or Seaborn.

Comprehensive List of Pandas Methods

Here’s a comprehensive list of Pandas methods and some additional Python functions you
might use for the project:

1. Data Loading

● pd.read_csv(filepath) – Load a CSV file into a DataFrame.

● .head(n) – Display the first n rows of the DataFrame.
● .tail(n) – Display the last n rows.
● .sample(n) – Randomly sample n rows.
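
As a quick illustration of the loading methods above, here is a minimal sketch. It reads CSV text from an in-memory string (via io.StringIO) rather than a file so that it is self-contained; the column names and values are made up.

```python
import io
import pandas as pd

# Made-up CSV content; with a real file you would pass its path to read_csv.
csv_text = """Invoice ID,Branch,Quantity
101,A,2
102,B,1
103,C,5
104,A,3
"""

df = pd.read_csv(io.StringIO(csv_text))

first_two = df.head(2)                     # first 2 rows
last_one = df.tail(1)                      # last row
random_two = df.sample(2, random_state=0)  # 2 random rows, reproducible via random_state

print(first_two)
print(last_one)
print(random_two)
```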
2. Data Exploration

● .info() – Get a summary of the DataFrame structure, including column data types and
non-null counts.
● .shape – Get the dimensions of the DataFrame (rows, columns).
● .columns – Get the list of column names.
● .describe() – Generate summary statistics for numerical columns.
● .isnull() – Check for missing values (returns a boolean DataFrame).
● .isnull().sum() – Count missing values for each column.
● .dtypes – Get data types of each column.
● .unique() – Get unique values in a column.
● .value_counts() – Count occurrences of unique values in a column.
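
The exploration methods above can be tried on a small example. This is only a sketch with made-up data; one Rating value is deliberately missing so that the missing-value count is not zero.

```python
import pandas as pd

# Made-up data with one missing rating.
df = pd.DataFrame({
    "Branch": ["A", "B", "A", "C"],
    "Rating": [9.1, 7.4, None, 8.0],
})

df.info()                      # prints column types and non-null counts
n_rows, n_cols = df.shape      # (rows, columns)
print(df.columns.tolist())     # column names as a plain list
print(df.dtypes)               # data type of each column
print(df.describe())           # summary statistics for numeric columns

missing = df.isnull().sum()                   # missing values per column
branches = df["Branch"].unique()              # distinct values, in order of appearance
branch_counts = df["Branch"].value_counts()   # how often each value occurs

print(missing)
print(branches)
print(branch_counts)
```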

3. Data Manipulation

● Adding a new column:

df['New Column'] = df['Column1'] * df['Column2']

● Renaming columns:
.rename(columns={'Old Name': 'New Name'})

● Filtering rows:

○ Conditional filtering: df[df['Column'] > value]

○ Multiple conditions: df[(df['Column1'] > value1) & (df['Column2'] == 'condition')]
● Sorting:
.sort_values(by='Column', ascending=True)

● Resetting index:
.reset_index(drop=True)

● Dropping columns or rows:

.drop(columns=['Column1', 'Column2'])
.drop(index=[0, 1])

● Changing data types:

.astype({'Column': 'datatype'})
pd.to_datetime(df['Column']) – Convert a column to datetime (a top-level pandas function, not a DataFrame method).
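
The manipulation steps above can be chained on a small example. The following is a sketch under assumed, made-up data; the Quantity column is stored as text on purpose to show the type conversion.

```python
import pandas as pd

df = pd.DataFrame({
    "Unit Price": [10.0, 20.0, 5.0],
    "Quantity": ["2", "1", "6"],   # stored as text on purpose
    "Date": ["2019-01-05", "2019-03-08", "2019-03-03"],
    "Notes": ["x", "y", "z"],
})

# Changing data types
df = df.astype({"Quantity": "int64"})
df["Date"] = pd.to_datetime(df["Date"])

# Adding a new column
df["Total Sales"] = df["Unit Price"] * df["Quantity"]

# Renaming and dropping columns (both return a new DataFrame)
df = df.rename(columns={"Unit Price": "Price"})
df = df.drop(columns=["Notes"])

# Filtering with multiple conditions: each condition needs its own parentheses
big = df[(df["Total Sales"] >= 20) & (df["Quantity"] > 1)]

# Sorting, then resetting the index so it runs 0, 1, 2, ... again
ordered = df.sort_values(by="Total Sales", ascending=False).reset_index(drop=True)

print(big)
print(ordered)
```

Most of these methods return a new DataFrame instead of changing the original, which is why the results are assigned back to df.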

4. Data Aggregation and Grouping

● .groupby('Column') – Group data by one or more columns.

● Aggregation functions:

○ .sum() – Calculate the sum.

○ .mean() – Calculate the mean.
○ .median() – Calculate the median.
○ .min() and .max() – Get minimum and maximum values.
○ .count() – Count the number of non-null values.
○ .agg({'Col1': 'mean', 'Col2': 'sum'}) – Apply multiple aggregations.
● df.pivot_table(values='Column', index='Col1', columns='Col2',
aggfunc='sum') – Create pivot tables.
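
To see grouping, multiple aggregations, and a pivot table side by side, consider this small sketch. The data is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Branch": ["A", "A", "B", "B"],
    "Payment Method": ["Cash", "Card", "Cash", "Cash"],
    "Total Sales": [10.0, 30.0, 20.0, 5.0],
    "Rating": [8.0, 6.0, 9.0, 7.0],
})

by_branch = df.groupby("Branch")

# One aggregation on one column
totals = by_branch["Total Sales"].sum()

# Different aggregations for different columns
summary = by_branch.agg({"Rating": "mean", "Total Sales": "sum"})

# Pivot table: branches as rows, payment methods as columns, summed sales as
# values. Combinations with no records (here: branch B paid by Card) show NaN.
pivot = df.pivot_table(values="Total Sales", index="Branch",
                       columns="Payment Method", aggfunc="sum")

print(totals)
print(summary)
print(pivot)
```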

5. Visualization (Optional)

Use Matplotlib or Seaborn:

● Matplotlib

○ plt.plot() – Plot data.

○ plt.bar() – Create bar plots.
○ plt.pie() – Create pie charts.
○ plt.show() – Display the plot.
● Seaborn

○ sns.barplot(x=..., y=...) – Create bar plots (pass x and y as keyword arguments).

○ sns.histplot(x) – Create histograms.
○ sns.heatmap() – Display heatmaps.
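
A bar chart of aggregated sales might look like the sketch below. It uses Matplotlib only and made-up totals; the "Agg" backend is an assumption chosen so the script also runs on machines without a display. In an interactive session you can drop that line and call plt.show() at the end.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to see a window
import matplotlib.pyplot as plt
import pandas as pd

# Made-up totals per branch, e.g. the result of a groupby(...).sum()
sales = pd.Series({"A": 40.0, "B": 25.0, "C": 48.0}, name="Total Sales")

fig, ax = plt.subplots()
bars = ax.bar(sales.index, sales.values)  # one bar per branch
ax.set_title("Total Sales by Branch")
ax.set_xlabel("Branch")
ax.set_ylabel("Total Sales")
```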

6. Saving the Results

● df.to_csv('filename.csv', index=False) – Save the DataFrame to a CSV file.
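
As a small check that saving works as expected, the sketch below writes a DataFrame to a temporary folder and reads it back. The file name and data are made up.

```python
import os
import tempfile
import pandas as pd

df = pd.DataFrame({"Branch": ["A", "B"], "Total Sales": [40.0, 25.0]})

# Write to a temporary folder so the example leaves no files in your project.
out_path = os.path.join(tempfile.mkdtemp(), "sales_by_branch.csv")

# index=False leaves out the row index, so it does not come back as an extra column.
df.to_csv(out_path, index=False)

reloaded = pd.read_csv(out_path)
print(reloaded)
```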

7. General Python Functions

● len(df) – Get the number of rows in the DataFrame.

● set(df['Column']) – Get the unique values of a column as a Python set (alternative to .unique()).
● round(number, decimals) – Round values to a specified number of decimals.
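
These built-in functions work directly on Pandas objects, as the short sketch below shows with made-up data.

```python
import pandas as pd

df = pd.DataFrame({"Branch": ["A", "B", "A"], "Rating": [9.14, 7.46, 8.0]})

n_rows = len(df)                              # number of rows
unique_branches = set(df["Branch"])           # unique values as an unordered set
mean_rating = round(df["Rating"].mean(), 2)   # mean rounded to 2 decimals

print(n_rows, unique_branches, mean_rating)
```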
