
Data Warehouse and Data Mining Solution for Retail Business Insights


Submitted by:

Rusty iean uy

Khen Maverick

Kurt Aguillon

Submitted to:

Jerome Dela Pena

TABLE OF CONTENTS

CHAPTER 1

1.1 Business Case Study

CHAPTER 2

2.1 Data Warehouse Design

2.1.1 Schema Design

2.1.2 Data Source Identification

2.1.3 Diagram

CHAPTER 3

3.1 ETL Process

3.1.1 Data Extraction

3.1.2 Data Transformation

3.1.3 Data Loading

CHAPTER 4

4.1 Data Mining Techniques

4.1.1 Model Explanation

4.1.2 Results Interpretation

CHAPTER 5

5.1 Data Analysis & Visualization

5.1.1 Data Analysis

5.1.2 Visualization

5.1.3 Summary Report

CHAPTER 1

1.1 Business Case Study

Business Domain: Retail

Company Overview

RetailCo is a mid-sized retail company operating both online and physical stores across
multiple regions. The company sells a wide range of products, including electronics,
clothing, and home goods.

Company Goals

• Increase Sales: Boost overall sales through targeted marketing and optimized inventory management.

• Improve Customer Satisfaction: Enhance customer experience by understanding preferences and behaviors.

• Optimize Inventory: Reduce stockouts and overstock situations by forecasting demand accurately.

Challenges

• Identifying Purchase Patterns: Understanding what products are frequently bought together to create effective marketing campaigns.

• Managing Inventory: Balancing stock levels to meet demand without incurring excess inventory costs.

• Analyzing Sales Performance: Evaluating sales across different regions and channels to identify high-performing areas and those needing improvement.

Data Needs

• Customer Data: Includes demographics (age, gender, location), purchase history, and preferences.

• Sales Data: Detailed transactional data such as product IDs, quantities sold, prices, timestamps, and sales channels (online vs. in-store).

• Product Data: Information about products, including categories, prices, suppliers, and inventory levels.

Primary Insights Desired

1. Customer Segmentation: Grouping customers based on purchasing behavior and demographics to tailor marketing strategies.

2. Sales Trend Analysis: Identifying peak sales periods and seasonal trends to inform inventory and marketing decisions.

3. Product Affinity Analysis: Discovering products frequently bought together to optimize cross-selling and bundling strategies.

4. Regional Performance: Assessing sales performance across different regions to allocate resources effectively.

CHAPTER 2

2.1 Data Warehouse Design

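2.1.1 Schema Design

A star schema is chosen for its simplicity and query efficiency: a central fact table (FactSales) is joined directly to each dimension table, so analytical queries such as sales by region, product, or period need only one join per dimension. This makes it well suited to the retail scenario.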

Fact Table

• FactSales

o Columns:

▪ SaleID (Primary Key)

▪ DateID (Foreign Key)

▪ CustomerID (Foreign Key)

▪ ProductID (Foreign Key)

▪ StoreID (Foreign Key)

▪ QuantitySold

▪ TotalAmount

Dimension Tables

1. DimDate

o DateID (Primary Key)

o Date

o Day

o Month

o Quarter

o Year

2. DimCustomer

o CustomerID (Primary Key)

o FirstName

o LastName

o Gender

o Age

o Location

3. DimProduct

o ProductID (Primary Key)

o ProductName

o Category

o Price

o Supplier

4. DimStore

o StoreID (Primary Key)

o StoreName

o Location

o Manager
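The fact and dimension tables above can be written out as DDL. The following is a minimal sketch, assuming SQLite through Python's built-in sqlite3 module as a stand-in warehouse engine (the report does not name one); the column types, the sample rows, and the REGIONAL_SALES query are illustrative assumptions. The query shows the typical star-schema pattern: the fact table joined to one dimension and aggregated.

```python
import sqlite3

# Star schema from 2.1.1 as SQLite DDL (engine and column types are assumptions).
DDL = """
CREATE TABLE DimDate     (DateID INTEGER PRIMARY KEY, Date TEXT, Day INTEGER,
                          Month INTEGER, Quarter INTEGER, Year INTEGER);
CREATE TABLE DimCustomer (CustomerID INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT,
                          Gender TEXT, Age INTEGER, Location TEXT);
CREATE TABLE DimProduct  (ProductID INTEGER PRIMARY KEY, ProductName TEXT, Category TEXT,
                          Price REAL, Supplier TEXT);
CREATE TABLE DimStore    (StoreID INTEGER PRIMARY KEY, StoreName TEXT, Location TEXT,
                          Manager TEXT);
-- Fact table: one row per sale, with a foreign key to each dimension.
CREATE TABLE FactSales (
    SaleID       INTEGER PRIMARY KEY,
    DateID       INTEGER REFERENCES DimDate(DateID),
    CustomerID   INTEGER REFERENCES DimCustomer(CustomerID),
    ProductID    INTEGER REFERENCES DimProduct(ProductID),
    StoreID      INTEGER REFERENCES DimStore(StoreID),
    QuantitySold INTEGER,
    TotalAmount  REAL
);
"""

# Regional performance: join the fact table to DimStore and aggregate revenue.
REGIONAL_SALES = """
SELECT s.Location, SUM(f.TotalAmount) AS Revenue
FROM FactSales AS f
JOIN DimStore AS s ON s.StoreID = f.StoreID
GROUP BY s.Location
ORDER BY Revenue DESC;
"""


def build_warehouse():
    """Create an in-memory warehouse with the star schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(DDL)
    return conn


if __name__ == "__main__":
    conn = build_warehouse()
    # Hypothetical sample rows, for illustration only.
    conn.execute("INSERT INTO DimStore VALUES (1, 'Store A', 'North', 'M1')")
    conn.execute("INSERT INTO DimStore VALUES (2, 'Store B', 'South', 'M2')")
    conn.executemany(
        "INSERT INTO FactSales (SaleID, StoreID, QuantitySold, TotalAmount) "
        "VALUES (?, ?, ?, ?)",
        [(1, 1, 2, 40.0), (2, 1, 1, 25.0), (3, 2, 3, 30.0)],
    )
    print(conn.execute(REGIONAL_SALES).fetchall())
```

Because every dimension hangs directly off FactSales, queries by date, customer, or product follow the same shape, swapping DimStore for the relevant dimension.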

2.1.2 Data Source Identification

• Point of Sale (POS) Systems: Provide transactional sales data, including SaleID, the sale date (mapped to DateID), ProductID, StoreID, QuantitySold, and TotalAmount.

• Customer Relationship Management (CRM) Systems: Supply customer data such as CustomerID, FirstName, LastName, Gender, Age, and Location.

• Inventory Management Systems: Provide product information, including ProductID, ProductName, Category, Price, and Supplier.

2.1.3 Diagram

                +-------------+
                |   DimDate   |
                +-------------+
                       |
                +-------------+
                |  FactSales  |
                +-------------+
                 /     |     \
    +----------+ +----------+ +----------+
    | DimCust  | | DimProd  | | DimStore |
    +----------+ +----------+ +----------+

CHAPTER 3

3.1 ETL Process

3.1.1 Data Extraction


• Tools/Technologies:

o SQL: To query and extract data from relational databases.

o Python Scripts: To extract data from APIs or flat files where necessary.

o ETL Tool: Apache NiFi is used to orchestrate the ETL workflow because of its user-friendly interface and scalability.
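To illustrate the SQL and Python-script extraction paths listed above, here is a minimal sketch using only the Python standard library. The source table pos_sales, its columns, and the CRM export layout are hypothetical; in the described setup Apache NiFi would schedule and orchestrate steps like these, which is not shown here.

```python
import csv
import io
import sqlite3


def extract_from_db(conn, query):
    """Run a SQL query against a source database and return rows as dicts."""
    cur = conn.execute(query)
    # cursor.description holds one tuple per column; index 0 is the column name.
    columns = [d[0] for d in cur.description]
    return [dict(zip(columns, row)) for row in cur.fetchall()]


def extract_from_csv(stream):
    """Read a flat-file export (header row, comma-separated) into dicts."""
    return list(csv.DictReader(stream))


if __name__ == "__main__":
    # Hypothetical POS table standing in for the real point-of-sale database.
    pos = sqlite3.connect(":memory:")
    pos.execute("CREATE TABLE pos_sales (sale_id INTEGER, product_id INTEGER, qty INTEGER)")
    pos.execute("INSERT INTO pos_sales VALUES (1, 101, 2)")
    print(extract_from_db(pos, "SELECT sale_id, product_id, qty FROM pos_sales"))

    # Hypothetical CRM export; io.StringIO stands in for an opened CSV file.
    crm_file = io.StringIO("customer_id,first_name,gender\n7,Ana,F\n")
    print(extract_from_csv(crm_file))
```

Values read from CSV arrive as strings, so type conversion is left to the transformation step.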

3.1.2 Data Transformation

• Cleaning:

o Remove duplicate records.

o Handle missing values by imputing or removing incomplete records.

• Normalization:

o Standardize data formats (e.g., dates to YYYY-MM-DD).

o Ensure consistent naming conventions across datasets.

• Standardization:

o Convert categorical variables to a standard format (e.g., gender to 'Male', 'Female', or 'Other').

o Calculate derived metrics such as TotalAmount by multiplying QuantitySold by Price.
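The cleaning, normalization, and standardization rules above can be sketched as a single pass over extracted rows. This is a minimal sketch, assuming rows arrive as dictionaries with hypothetical keys (SaleID, Date, Gender, Quantity, Price) and that source dates use either YYYY-MM-DD or US-style MM/DD/YYYY. It handles missing values by removal rather than imputation, which is one of the two options listed.

```python
from datetime import datetime

# Hypothetical mapping from raw gender codes to the standardized labels.
GENDER_MAP = {"m": "Male", "male": "Male", "f": "Female", "female": "Female"}

# Assumed raw date formats arriving from different source systems.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


def to_iso_date(raw):
    """Normalize a date string to YYYY-MM-DD, trying each known format."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    raise ValueError(f"Unrecognized date: {raw!r}")


def transform(rows):
    """Clean, normalize, and standardize raw sales rows."""
    kept_ids = set()
    out = []
    for row in rows:
        # Cleaning: skip a SaleID that was already kept (duplicate record).
        if row["SaleID"] in kept_ids:
            continue
        # Cleaning: drop incomplete records (removal, not imputation).
        if row.get("Quantity") in (None, "") or row.get("Price") in (None, ""):
            continue
        kept_ids.add(row["SaleID"])
        qty = int(row["Quantity"])
        price = float(row["Price"])
        out.append({
            "SaleID": row["SaleID"],
            "Date": to_iso_date(row["Date"]),
            # Standardization: unknown codes fall back to 'Other'.
            "Gender": GENDER_MAP.get(str(row.get("Gender", "")).strip().lower(), "Other"),
            "QuantitySold": qty,
            # Derived metric: TotalAmount = QuantitySold x Price.
            "TotalAmount": round(qty * price, 2),
        })
    return out


if __name__ == "__main__":
    raw = [
        {"SaleID": 1, "Date": "03/15/2024", "Gender": "F", "Quantity": "2", "Price": "19.99"},
        {"SaleID": 1, "Date": "03/15/2024", "Gender": "F", "Quantity": "2", "Price": "19.99"},
        {"SaleID": 2, "Date": "2024-03-16", "Gender": "M", "Quantity": "", "Price": "5.00"},
    ]
    print(transform(raw))
```

Dates such as 03/04/2024 are ambiguous across regions, so each source system's date format would need to be confirmed rather than guessed.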

3.1.3 Data Loading

• Loading Strategy: Incremental Load

o Only new or updated records are loaded into the data warehouse to optimize performance.

• Schedule:

o Frequency: Daily at midnight, so the data warehouse is updated with the latest information without disrupting business operations.

• Process:

o Extract data from source systems.

o Transform the data according to the transformation rules.

o Load the transformed records into the dimension and fact tables.
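One common way to implement the incremental load described above is a high-water mark: record the newest modification timestamp already loaded and fetch only source rows changed after it. This sketch assumes a LastUpdated column on both a hypothetical source table sales and on FactSales (neither appears in the report's schema), and uses SQLite's INSERT OR REPLACE as the upsert.

```python
import sqlite3


def incremental_load(source, target):
    """Copy rows changed since the last load into FactSales; return the row count."""
    # High-water mark: the newest LastUpdated value already in the warehouse
    # ('' on the first run, which sorts before any ISO timestamp).
    (watermark,) = target.execute(
        "SELECT COALESCE(MAX(LastUpdated), '') FROM FactSales"
    ).fetchone()
    changed = source.execute(
        "SELECT SaleID, QuantitySold, TotalAmount, LastUpdated "
        "FROM sales WHERE LastUpdated > ?",
        (watermark,),
    ).fetchall()
    # INSERT OR REPLACE adds new SaleIDs and overwrites rows whose SaleID exists.
    target.executemany(
        "INSERT OR REPLACE INTO FactSales "
        "(SaleID, QuantitySold, TotalAmount, LastUpdated) VALUES (?, ?, ?, ?)",
        changed,
    )
    target.commit()
    return len(changed)


# Simplified shared layout for the hypothetical source and fact tables.
SCHEMA = ("(SaleID INTEGER PRIMARY KEY, QuantitySold INTEGER, "
          "TotalAmount REAL, LastUpdated TEXT)")

if __name__ == "__main__":
    src = sqlite3.connect(":memory:")
    dw = sqlite3.connect(":memory:")
    src.execute("CREATE TABLE sales " + SCHEMA)
    dw.execute("CREATE TABLE FactSales " + SCHEMA)
    src.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", [
        (1, 2, 40.0, "2024-03-01 10:00:00"),
        (2, 1, 25.0, "2024-03-01 11:00:00"),
    ])
    print(incremental_load(src, dw))  # first run loads both rows: 2
    # Sale 1 is corrected in the source system the next day.
    src.execute("UPDATE sales SET QuantitySold = 3, TotalAmount = 60.0, "
                "LastUpdated = '2024-03-02 09:00:00' WHERE SaleID = 1")
    print(incremental_load(src, dw))  # only the changed row is reloaded: 1
```

Comparing timestamps as strings works here only because they use a fixed, zero-padded ISO format. A production pipeline would usually persist the watermark separately, for example in a control table, rather than derive it from the fact table.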

CHAPTER 4

4.1 Data Mining Techniques

Selected Techniques

1. Clustering (K-Means)

2. Association Rule Mining (Apriori Algorithm)

4.1.1 Model Explanation

1. Clustering (K-Means)
• Purpose: To segment customers into distinct groups based on their purchasing behavior and demographics.
• Justification: Clustering identifies homogeneous groups within the customer base, enabling targeted marketing and personalized offers.
• Insights Provided:
o Identifies high-value customers.
o Reveals customer segments with similar purchasing patterns.
o Helps tailor marketing strategies to different segments.
2. Association Rule Mining (Apriori Algorithm)
• Purpose: To discover products that are frequently bought together.
• Justification: Understanding product associations supports cross-selling, bundling, and inventory management.
• Insights Provided:
o Identifies product pairs or sets that are often purchased together.
o Helps design promotions and in-store placement strategies.
o Supports inventory planning by anticipating demand for products associated with popular items.
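To make the K-Means step concrete, here is a minimal pure-Python sketch of Lloyd's algorithm. The features (age and annual spend), k = 3, the sample customers, and the farthest-first initialization are assumptions for illustration. The report does not state which features or parameters were used, and in practice a library implementation such as scikit-learn's KMeans would normally be used. Features are min-max scaled first because K-Means relies on Euclidean distance, so an unscaled spend column would otherwise dominate age.

```python
import math


def scale(points):
    """Min-max scale each feature to [0, 1] so no feature dominates the distance."""
    cols = list(zip(*points))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [
        tuple((v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(p, lo, hi))
        for p in points
    ]


def kmeans(points, k, iters=100):
    """Lloyd's K-Means with farthest-first initialization; returns (labels, centroids)."""
    # Farthest-first init: start at the first point, then keep adding the point
    # farthest from every centroid chosen so far (deterministic, unlike random init).
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        if new_labels == labels:  # converged: no point changed cluster
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


if __name__ == "__main__":
    # Hypothetical customers as (age, annual spend) pairs.
    customers = [(19, 1200), (22, 1500), (24, 1100), (40, 800), (45, 900),
                 (48, 750), (65, 300), (70, 350), (62, 280)]
    labels, _ = kmeans(scale(customers), k=3)
    for customer, label in zip(customers, labels):
        print(customer, "-> segment", label)
```

Each label corresponds to one candidate segment. Which segments emerge depends on the chosen features and on k, which is usually checked with a method such as the elbow or silhouette criterion.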

4.1.2 Results Interpretation

Clustering Results

• Findings:
o Segment 1: Young adults (ages 18-25) who primarily purchase electronics and fashion items.
o Segment 2: Middle-aged customers (ages 35-50) who buy home goods and appliances.
o Segment 3: Senior customers (ages 60+) with a preference for health and wellness products.
• Business Impact:
o Enables targeted marketing campaigns tailored to each segment's preferences.
o Improves customer retention by addressing the specific needs of different groups.

Association Rule Mining Results

• Findings:
o Customers who buy smartphones often purchase protective cases and screen protectors.
o Shoppers purchasing coffee machines frequently buy coffee beans and filters.
• Business Impact:
o Facilitates effective cross-selling by recommending related products.
o Enhances product placement strategies in both online and physical stores.
o Informs inventory management to ensure availability of associated products.
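Rules such as "smartphone -> protective case" are conventionally scored by support (the fraction of transactions containing an itemset) and confidence, where confidence(X -> Y) = support(X ∪ Y) / support(X). The following is a minimal pure-Python Apriori sketch using assumed thresholds (min_support = 0.3, min_confidence = 0.6) and hypothetical baskets; it is not the script used in the report.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for all frequent itemsets."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        return sum(1 for b in baskets if itemset <= b) / n

    # Level 1: frequent single items.
    items = {i for b in baskets for i in b}
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    frequent = {s: support(s) for s in current}
    k = 2
    while current:
        # Join step: combine frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step (Apriori property): every (k-1)-subset must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
        frequent.update({c: support(c) for c in current})
        k += 1
    return frequent


def rules(frequent, min_confidence):
    """Return (antecedent, consequent, confidence) for rules meeting the threshold."""
    out = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                # Every subset of a frequent itemset is frequent, so lhs is present.
                conf = sup / frequent[lhs]
                if conf >= min_confidence:
                    out.append((set(lhs), set(itemset - lhs), conf))
    return out


if __name__ == "__main__":
    # Hypothetical baskets echoing the findings above.
    baskets = [
        {"smartphone", "case", "screen protector"},
        {"smartphone", "case"},
        {"smartphone", "screen protector"},
        {"coffee machine", "coffee beans", "filters"},
        {"coffee machine", "coffee beans"},
        {"laptop"},
    ]
    frequent = apriori(baskets, min_support=0.3)
    for lhs, rhs, conf in rules(frequent, min_confidence=0.6):
        print(sorted(lhs), "->", sorted(rhs), f"confidence={conf:.2f}")
```

Single-item sets produce no rules because the inner range is empty for them. With these baskets, {case, screen protector} co-occurs in only one of six transactions, so it is pruned even though each item is individually frequent.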

CHAPTER 5

5.1 Data Analysis & Visualization

5.1.1 Data Analysis

• Trends and Patterns:
o Sales Trends: Increased sales during holiday seasons and weekends.
o Customer Behavior: High repeat-purchase rate among Segments 1 and 2.
o Product Performance: The electronics category shows the highest growth, while home goods remain steady.
• Correlations:
o Positive correlation between marketing spend and sales in targeted segments.
o Significant association between product categories (e.g., electronics and accessories).
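The correlation finding above can be quantified with Pearson's correlation coefficient r, which ranges from -1 to 1. This is a minimal sketch, assuming monthly marketing spend and sales for a segment are available as two equal-length series (the sample numbers are hypothetical). r measures linear association only; on its own it does not show that marketing spend causes the sales increase.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient r between two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length (at least 2 points)")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # The 1/n factors in covariance and standard deviations cancel, so raw sums suffice.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("r is undefined when a series is constant")
    return cov / (sd_x * sd_y)


if __name__ == "__main__":
    # Hypothetical monthly figures for one targeted segment.
    marketing_spend = [10, 12, 15, 11, 18, 20]
    sales = [100, 115, 140, 108, 170, 190]
    print(round(pearson(marketing_spend, sales), 3))
```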

5.1.2 Visualization

• Tools Used: Tableau

Visualizations Created:
1. Sales Trend Dashboard:
o Line Charts showing monthly sales across different categories.
o Heatmaps indicating peak sales periods.
2. Customer Segmentation Pie Chart:
o Pie Chart displaying the distribution of customer segments.
3. Product Affinity Graph:
o Network Diagram illustrating associations between frequently bought
together products.
4. Regional Sales Performance Bar Chart:
o Bar Charts comparing sales figures across different regions.
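Tableau can read data directly from a database or from a text file. As a sketch of the data behind the Sales Trend Dashboard's monthly line charts, the following aggregates sales by month and category and writes them to CSV. The (date, category, amount) row layout, the function names, and the CSV export route are assumptions; the rows would come from FactSales joined to DimDate and DimProduct.

```python
import csv
import io
from collections import defaultdict


def monthly_sales_by_category(rows):
    """Sum amounts per (YYYY-MM, category) from (iso_date, category, amount) rows."""
    totals = defaultdict(float)
    for iso_date, category, amount in rows:
        # The first seven characters of an ISO date form the YYYY-MM month key.
        totals[(iso_date[:7], category)] += amount
    return dict(totals)


def to_csv(totals):
    """Render the totals as CSV text, one row per month and category."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["Month", "Category", "Sales"])
    for (month, category), sales in sorted(totals.items()):
        writer.writerow([month, category, f"{sales:.2f}"])
    return buf.getvalue()


if __name__ == "__main__":
    # Hypothetical fact rows, already joined to DimDate and DimProduct.
    rows = [
        ("2024-01-05", "Electronics", 100.0),
        ("2024-01-20", "Electronics", 50.0),
        ("2024-02-01", "Clothing", 30.0),
    ]
    print(to_csv(monthly_sales_by_category(rows)))
```

Pre-aggregating keeps the extract small; connecting Tableau to the warehouse directly and aggregating there is an equally valid design.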

5.1.3 Summary Report

Executive Summary
RetailCo has implemented a data warehouse and data mining solution to derive
insights from its sales, customer, and product data. By leveraging customer
segmentation and association rule mining, RetailCo can enhance its marketing
strategies, optimize inventory, and ultimately drive sales growth.
Key Insights
1. Customer Segmentation:
o Three distinct customer segments identified, enabling tailored marketing
efforts.
2. Sales Trends:
o Recognized seasonal sales peaks, allowing for proactive inventory and
marketing planning.
3. Product Associations:
o Established strong product pairings that can be utilized for cross-selling
and promotional campaigns.
Actionable Recommendations
• Targeted Marketing: Develop personalized campaigns for each customer segment to increase engagement and sales.
• Inventory Optimization: Adjust stock levels based on identified sales trends and product associations to reduce stockouts and excess inventory.
• Cross-Selling Strategies: Implement bundling offers for frequently bought-together products to boost average transaction value.

Code and Data Visualizations

[Figure: ETL code using Python]

[Figure: Data mining: Python script for clustering and association rule mining]

[Figure: Visualization code]
