Advanced Database
Submitted by:
Rusty iean uy
Khen Maverick
Kurt Aguillon
Submitted to:
TABLE OF CONTENTS
CHAPTER 1
CHAPTER 2
2.1.3 Diagram
CHAPTER 3
CHAPTER 4
CHAPTER 5
5.1.2 Visualization
CHAPTER 1
Company Overview
RetailCo is a mid-sized retail company operating both online and physical stores across
multiple regions. The company sells a wide range of products, including electronics,
clothing, and home goods.
Company Goals
Increase Sales: Boost overall sales through targeted marketing and optimized
inventory management.
Challenges
Data Needs
Sales Data: Detailed transactional data such as product IDs, quantities sold,
prices, timestamps, and sales channels (online vs. in-store).
2. Sales Trend Analysis: Identifying peak sales periods and seasonal trends to
inform inventory and marketing decisions.
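As a minimal sketch of this kind of trend analysis, the snippet below totals sales by calendar month and picks the peak month. The sample rows and the function names monthly_totals and peak_month are illustrative assumptions, not RetailCo data or the project's actual script.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transactions: (sale date, total amount). Not real RetailCo data.
SALES = [
    (date(2023, 11, 24), 1200.0),
    (date(2023, 11, 25), 800.0),
    (date(2023, 12, 20), 1500.0),
    (date(2023, 12, 21), 900.0),
    (date(2024, 1, 5), 300.0),
]


def monthly_totals(sales):
    """Sum the sale amounts for each (year, month) pair."""
    totals = defaultdict(float)
    for sale_date, amount in sales:
        totals[(sale_date.year, sale_date.month)] += amount
    return dict(totals)


def peak_month(sales):
    """Return the (year, month) pair with the highest total sales."""
    totals = monthly_totals(sales)
    return max(totals, key=totals.get)


if __name__ == "__main__":
    for (year, month), total in sorted(monthly_totals(SALES).items()):
        print(f"{year}-{month:02d}: {total:.2f}")
    print("Peak month:", peak_month(SALES))
```

The same grouping could be done per quarter or per sales channel by changing the key used in monthly_totals.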
CHAPTER 2
A star schema is chosen because a single fact table joined to a small number of
dimension tables keeps analytical queries simple and efficient, which suits the
retail scenario.
Fact Table
FactSales
o Columns:
QuantitySold
TotalAmount
Dimension Tables
1. DimDate
o Date
o Day
o Month
o Quarter
o Year
2. DimCustomer
o FirstName
o LastName
o Gender
o Age
o Location
3. DimProduct
o ProductName
o Category
o Price
o Supplier
4. DimStore
o StoreName
o Location
o Manager
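The schema above can be sketched as table definitions. The example below uses Python's built-in sqlite3 module as a stand-in for the data warehouse. The key columns SaleID, DateID, ProductID, and StoreID follow the POS data description in this chapter; CustomerID, all column types, and the helper names create_schema and sales_by_category are assumptions made for illustration.

```python
import sqlite3

# Key columns and data types are assumptions; the report lists only the
# descriptive columns of each table.
DDL = """
CREATE TABLE DimDate (
    DateID INTEGER PRIMARY KEY,
    Date TEXT, Day INTEGER, Month INTEGER, Quarter INTEGER, Year INTEGER
);
CREATE TABLE DimCustomer (
    CustomerID INTEGER PRIMARY KEY,
    FirstName TEXT, LastName TEXT, Gender TEXT, Age INTEGER, Location TEXT
);
CREATE TABLE DimProduct (
    ProductID INTEGER PRIMARY KEY,
    ProductName TEXT, Category TEXT, Price REAL, Supplier TEXT
);
CREATE TABLE DimStore (
    StoreID INTEGER PRIMARY KEY,
    StoreName TEXT, Location TEXT, Manager TEXT
);
CREATE TABLE FactSales (
    SaleID INTEGER PRIMARY KEY,
    DateID INTEGER REFERENCES DimDate(DateID),
    CustomerID INTEGER REFERENCES DimCustomer(CustomerID),
    ProductID INTEGER REFERENCES DimProduct(ProductID),
    StoreID INTEGER REFERENCES DimStore(StoreID),
    QuantitySold INTEGER,
    TotalAmount REAL
);
"""


def create_schema(conn):
    """Create the fact table and the four dimension tables."""
    conn.executescript(DDL)


def sales_by_category(conn):
    """Typical star-schema query: join the fact table to one dimension."""
    return conn.execute(
        "SELECT p.Category, SUM(f.TotalAmount) "
        "FROM FactSales f JOIN DimProduct p ON p.ProductID = f.ProductID "
        "GROUP BY p.Category ORDER BY p.Category"
    ).fetchall()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    create_schema(connection)
    print(sales_by_category(connection))
```

Each analytical question then needs only one join per dimension it touches, which is the main practical benefit of the star layout.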
2.1.2 Data Source Identification
Point of Sale (POS) Systems: Provide transactional sales data including SaleID,
DateID, ProductID, StoreID, QuantitySold, and TotalAmount.
2.1.3 Diagram
+-------------+
| DimDate |
+-------------+
+-------------+
| FactSales |
+-------------+
/ | \
/ | \
CHAPTER 3
Data Extraction
Tools/Technologies:
o Python Scripts: For handling data extraction from APIs or flat files if
necessary.
o ETL Tool: Apache NiFi is used for orchestrating the ETL workflow due to
its user-friendly interface and scalability.
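A flat-file extraction script of the kind mentioned above might look like the following sketch. It assumes a CSV export whose header matches the POS fields named in Chapter 2; the file layout, the sample row, and the function name extract_sales are hypothetical.

```python
import csv
import io

# Hypothetical CSV export; the header uses the POS field names from Chapter 2.
SAMPLE_CSV = (
    "SaleID,DateID,ProductID,StoreID,QuantitySold,TotalAmount\n"
    "1,20240105,10,3,2,59.98\n"
)


def extract_sales(file_obj):
    """Read POS rows from a CSV file object and convert the numeric fields."""
    rows = []
    for record in csv.DictReader(file_obj):
        rows.append({
            "SaleID": int(record["SaleID"]),
            "DateID": int(record["DateID"]),
            "ProductID": int(record["ProductID"]),
            "StoreID": int(record["StoreID"]),
            "QuantitySold": int(record["QuantitySold"]),
            "TotalAmount": float(record["TotalAmount"]),
        })
    return rows


if __name__ == "__main__":
    # io.StringIO stands in for an opened file such as open("sales.csv").
    print(extract_sales(io.StringIO(SAMPLE_CSV)))
```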
Cleaning:
Normalization:
Standardization:
3.1.3 Data Loading
o Only new or updated records are loaded into the data warehouse to
optimize performance.
Schedule:
Process:
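The idea of loading only new or updated records can be sketched with a watermark: remember the latest modification time already loaded, and on the next run load only rows changed after it. The example below is an illustration under assumptions, not the project's actual pipeline: SQLite stands in for the warehouse, FactSales is reduced to three columns, and the LastModified field and the function name incremental_load are hypothetical.

```python
import sqlite3


def create_fact_table(conn):
    """Simplified FactSales table used only for this sketch."""
    conn.execute(
        "CREATE TABLE FactSales ("
        "SaleID INTEGER PRIMARY KEY, QuantitySold INTEGER, TotalAmount REAL)"
    )


def incremental_load(conn, source_rows, last_loaded):
    """Load rows modified after the watermark and return the new watermark.

    Timestamps are ISO 8601 strings, so string comparison matches time order.
    """
    changed = [row for row in source_rows if row["LastModified"] > last_loaded]
    # INSERT OR REPLACE adds new SaleIDs and overwrites rows that already exist.
    conn.executemany(
        "INSERT OR REPLACE INTO FactSales (SaleID, QuantitySold, TotalAmount) "
        "VALUES (:SaleID, :QuantitySold, :TotalAmount)",
        changed,
    )
    conn.commit()
    return max((row["LastModified"] for row in changed), default=last_loaded)


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    create_fact_table(connection)
    source = [
        {"SaleID": 1, "QuantitySold": 1, "TotalAmount": 10.0,
         "LastModified": "2024-01-01T00:00:00"},
    ]
    print(incremental_load(connection, source, ""))
```

Rows whose LastModified value is at or before the stored watermark are skipped, so repeated runs do not reload unchanged data.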
CHAPTER 4
Selected Techniques
1. Clustering (K-Means)
Purpose: To segment customers into distinct groups based on their purchasing
behavior and demographics.
Justification: Clustering helps in identifying homogeneous groups within the
customer base, enabling targeted marketing and personalized offers.
Insights Provided:
o Identifies high-value customers.
o Reveals customer segments with similar purchasing patterns.
o Assists in tailoring marketing strategies to different segments.
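To make the technique concrete, here is a small pure-Python sketch of K-Means (Lloyd's algorithm). It is not the project's script. The customer features (age, average basket value) and their values are invented for illustration, and the deterministic farthest-point seeding is a simple stand-in for the random or k-means++ initialization that libraries normally use. With real data, features on different scales should be standardized first.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def initial_centroids(points, k):
    """Farthest-point seeding: start at the first point, then repeatedly
    add the point that is farthest from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k, max_iterations=100):
    """Return (centroids, labels) after running Lloyd's algorithm."""
    centroids = initial_centroids(points, k)
    labels = None
    for _ in range(max_iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return centroids, labels


# Hypothetical customers: (age, average basket value).
CUSTOMERS = [
    (22, 30), (24, 35), (19, 28),
    (40, 60), (45, 65), (38, 58),
    (65, 20), (70, 22), (62, 18),
]

if __name__ == "__main__":
    centers, segment_labels = kmeans(CUSTOMERS, k=3)
    print(centers)
    print(segment_labels)
```

Each resulting label identifies one customer segment, and the centroid of a segment summarizes its typical customer.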
2. Association Rule Mining (Apriori Algorithm)
Purpose: To discover products that are frequently bought together.
Justification: Understanding product associations aids in cross-selling, bundling,
and inventory management.
Insights Provided:
o Identifies product pairs or sets that are often purchased together.
o Helps in designing promotions and placement strategies in stores.
o Optimizes inventory by forecasting demand for associated products.
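The frequent-itemset step of Apriori can be sketched in a few lines of pure Python. This is an illustrative implementation, not the project's script: the baskets are invented, and the function name apriori and the min_support value are assumptions.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    baskets = [frozenset(t) for t in transactions]
    total = len(baskets)

    def support(itemset):
        # Fraction of baskets that contain every item in the itemset.
        return sum(itemset <= basket for basket in baskets) / total

    items = {item for basket in baskets for item in basket}
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    frequent = {}
    size = 1
    while current:
        frequent.update({itemset: support(itemset) for itemset in current})
        size += 1
        # Join step: merge frequent itemsets into candidates one item larger.
        candidates = {a | b for a in current for b in current if len(a | b) == size}
        # Prune step: every smaller subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, size - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
    return frequent


# Hypothetical baskets, loosely based on the product categories in this report.
TRANSACTIONS = [
    {"smartphone", "protective case", "screen protector"},
    {"smartphone", "protective case"},
    {"coffee machine", "coffee beans", "filters"},
    {"coffee machine", "coffee beans"},
    {"smartphone", "screen protector"},
]

if __name__ == "__main__":
    for itemset, value in sorted(apriori(TRANSACTIONS, 0.4).items(), key=lambda kv: -kv[1]):
        print(sorted(itemset), value)
```

The prune step is what distinguishes Apriori from brute-force counting: a candidate is discarded without scanning the baskets as soon as one of its subsets is known to be infrequent.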
Clustering Results
Findings:
Segment 1: Young adults (age 18-25) primarily purchasing electronics and
fashion items.
Segment 2: Middle-aged customers (age 35-50) buying home goods and
appliances.
Segment 3: Senior customers (age 60+) with a preference for health and
wellness products.
Business Impact:
Enables targeted marketing campaigns tailored to each segment's preferences.
Improves customer retention by addressing specific needs of different groups.
Association Rule Mining Results
Findings:
Customers who buy smartphones often purchase protective cases and screen
protectors.
Shoppers purchasing coffee machines frequently buy coffee beans and filters.
Business Impact:
Facilitates effective cross-selling by recommending related products.
Enhances product placement strategies in both online and physical stores.
Informs inventory management to ensure availability of associated products.
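How strong an association such as "smartphone, then protective case" is can be quantified with three standard measures: support (share of baskets containing both), confidence (share of antecedent baskets that also contain the consequent), and lift (confidence divided by the consequent's overall share). The sketch below computes them on invented baskets; the data and the function name rule_metrics are assumptions, and the numbers are not results from this project.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    antecedent, consequent = frozenset(antecedent), frozenset(consequent)
    baskets = [frozenset(t) for t in transactions]
    total = len(baskets)
    count_antecedent = sum(antecedent <= b for b in baskets)
    count_consequent = sum(consequent <= b for b in baskets)
    count_both = sum((antecedent | consequent) <= b for b in baskets)
    if count_antecedent == 0 or count_consequent == 0:
        raise ValueError("antecedent and consequent must each occur at least once")
    support = count_both / total
    confidence = count_both / count_antecedent
    # Lift above 1 means the items co-occur more often than chance would suggest.
    lift = confidence / (count_consequent / total)
    return support, confidence, lift


# Hypothetical baskets for illustration only.
BASKETS = [
    {"smartphone", "protective case"},
    {"smartphone", "protective case", "screen protector"},
    {"smartphone"},
    {"coffee machine", "coffee beans"},
]

if __name__ == "__main__":
    print(rule_metrics(BASKETS, {"smartphone"}, {"protective case"}))
```

A rule is usually worth acting on only when its lift is clearly above 1 and its support is high enough to matter commercially.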
CHAPTER 5
5.1.2 Visualization
5.1.3 Summary Report
Executive Summary
RetailCo has implemented a data warehouse and data mining solution to draw
insights from its sales and customer data. By applying customer segmentation
and association rule mining, RetailCo can refine its marketing strategies,
optimize inventory, and ultimately drive sales growth.
Key Insights
1. Customer Segmentation:
o Three distinct customer segments identified, enabling tailored marketing
efforts.
2. Sales Trends:
o Recognized seasonal sales peaks, allowing for proactive inventory and
marketing planning.
3. Product Associations:
o Established strong product pairings that can be utilized for cross-selling
and promotional campaigns.
Actionable Recommendations
Targeted Marketing: Develop personalized campaigns for each customer
segment to increase engagement and sales.
Inventory Optimization: Adjust stock levels based on identified sales trends and
product associations to reduce stockouts and excess inventory.
Cross-Selling Strategies: Offer bundles of products that are frequently bought
together to boost average transaction value.
Code and data visualizations
Data Mining:
Python Script for Clustering and Association Rule Mining
Visualization Code