Data Analytics Project Sem4
Introduction
In today’s rapidly evolving digital economy, e-commerce platforms generate vast amounts of data that
can be leveraged to gain valuable business insights. Our project aims to explore and analyze sales
patterns, customer preferences, recent trends and product performance using various data analytics
techniques.
Our study is based on the BlinkIT Grocery Data dataset, which provides extensive details about product
categories, outlet types, pricing, and sales figures. By applying modern data analytics tools such as
Python (Pandas, Matplotlib, Seaborn), Tableau, Power BI, and Excel, we aim to extract meaningful
insights from the dataset.
Through this project, we demonstrate the significance of data analytics in improving business decision-
making and driving growth in the competitive e-commerce industry.
Data Visualization:
Tools: Matplotlib, Seaborn, Plotly.
Purpose: Visualize sales trends, outlet types, and item categories to identify patterns.
Correlation Analysis:
Tools: Pandas, Seaborn.
Purpose: Explore relationships between variables like item visibility, item weight, and
sales.
Regression Analysis:
Tools: Scikit-learn.
Purpose: Build predictive models for sales based on features like item visibility, outlet
type, and outlet size.
Clustering:
Tools: Scikit-learn.
Purpose: Group outlets or products based on similarity in sales, size, or customer ratings.
Classification:
Tools: Scikit-learn.
Purpose: Classify outlets based on type (Supermarket Type1, Type2) or predict customer
ratings.
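As a minimal sketch of the correlation and regression steps listed above, the following uses Pandas and Scikit-learn on a handful of made-up rows. The column names (item_visibility, item_weight, outlet_type, sales) and all values are assumptions for illustration only; they are not taken from the BlinkIT dataset and should be replaced with its real columns.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical toy rows; column names and values are illustrative assumptions,
# not the dataset's confirmed schema or real figures.
df = pd.DataFrame({
    "item_visibility": [0.02, 0.05, 0.08, 0.11, 0.03, 0.07],
    "item_weight": [9.3, 5.9, 17.5, 19.2, 8.9, 10.4],
    "outlet_type": [
        "Supermarket Type1", "Grocery Store", "Supermarket Type1",
        "Supermarket Type2", "Grocery Store", "Supermarket Type2",
    ],
    "sales": [249.8, 48.3, 141.6, 182.1, 53.9, 107.8],
})

# Correlation analysis: pairwise Pearson correlations between numeric columns.
corr = df[["item_visibility", "item_weight", "sales"]].corr()

# Regression analysis: one-hot encode the categorical outlet type,
# then fit an ordinary least squares model on all features.
X = pd.get_dummies(
    df[["item_visibility", "item_weight", "outlet_type"]],
    columns=["outlet_type"],
    dtype=float,
)
y = df["sales"]
model = LinearRegression().fit(X, y)

# Predicting on the training rows only shows the API; a real evaluation
# would hold out a separate test set.
predictions = model.predict(X)

print(corr.round(2))
print(predictions.round(1))
```

The correlation matrix can be passed to seaborn.heatmap for a visual summary. With so few rows the fitted model is not meaningful; the sketch only shows the shape of the workflow.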
Outlet Analysis:
How do sales vary by outlet type (Supermarket Type1, Grocery Store, etc.)?
Pricing Analysis:
What is the relationship between item weight and sales?
Rating Analysis:
Since most ratings are 5, is there any variation worth exploring?
For our project, we focused on two or three of these questions, each supported by clear
visualizations and interpretations. Product category performance and outlet type analysis
made particularly interesting starting points.
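The outlet-type question can be approached with a grouped aggregation in Pandas. The sketch below is illustrative only: the column names outlet_type and sales and the six rows are hypothetical stand-ins, not values from the dataset.

```python
import pandas as pd

# Hypothetical toy rows; names and numbers are assumptions for illustration.
df = pd.DataFrame({
    "outlet_type": [
        "Supermarket Type1", "Supermarket Type1",
        "Grocery Store", "Grocery Store",
        "Supermarket Type2", "Supermarket Type2",
    ],
    "sales": [200.0, 100.0, 50.0, 30.0, 120.0, 80.0],
})

# Count, average, and total sales per outlet type, largest total first.
sales_by_outlet = (
    df.groupby("outlet_type")["sales"]
      .agg(["count", "mean", "sum"])
      .sort_values("sum", ascending=False)
)

print(sales_by_outlet)
```

The same pattern applies to product categories by grouping on the item category column instead, and the resulting table can be plotted directly with sales_by_outlet["sum"].plot(kind="bar").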
Product Category Performance: Certain product categories, such as Fruits & Vegetables, have
consistently high sales, and fat content also influences purchasing behavior.
Outlet Analysis: Sales vary significantly based on outlet type and location tier, with larger
supermarkets generally outperforming smaller grocery stores.
Temporal Trends: Newer outlets tend to have a different sales distribution compared to older
ones, highlighting the importance of store expansion strategies.
Pricing and Sales Correlation: Product visibility and pricing play a crucial role in influencing
customer purchases.
Rating Analysis: Most ratings are concentrated at the higher end (5-star), but further
segmentation could reveal nuanced consumer preferences.
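One possible way to segment ratings, sketched under stated assumptions, is to compare the overall rating distribution with the distribution inside each outlet type. The column names outlet_type and rating and the eight rows below are hypothetical and only illustrate the technique.

```python
import pandas as pd

# Hypothetical toy rows; names and values are assumptions for illustration.
df = pd.DataFrame({
    "outlet_type": ["Supermarket Type1"] * 4 + ["Grocery Store"] * 4,
    "rating": [5.0, 5.0, 4.0, 5.0, 5.0, 3.0, 4.0, 5.0],
})

# Overall share of each rating value, ordered by rating.
rating_share = df["rating"].value_counts(normalize=True).sort_index()

# Share of each rating within each outlet type (each row sums to 1).
rating_by_outlet = pd.crosstab(df["outlet_type"], df["rating"], normalize="index")

print(rating_share)
print(rating_by_outlet)
```

If the per-segment shares differ noticeably from the overall shares, that segment may be worth a closer look even when most ratings sit at 5.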
Overall, our project demonstrates how businesses can leverage data analytics to optimize their
operations, improve product offerings, and enhance customer satisfaction. By utilizing tools like
Python, Tableau, and Power BI, organizations can transform raw data into actionable insights,
ultimately driving success in the e-commerce sector.
This project has provided us with a deeper understanding of real-world data analytics applications,
equipping us with essential skills for future industry challenges.