Retail Sales Dataset Project Document
1. Project Overview
This project processes and analyzes retail sales data using Azure Data Lake
Storage (ADLS) and Databricks. The goal is to ingest, transform, and organize the data
across four storage layers (Landing, Bronze, Silver, and Gold) so that it is ready for
analysis.
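The layered layout can be sketched as a small path helper. This is only an illustration: it assumes ADLS Gen2 with one folder per layer inside a single container, and the account name `examplestorage`, container name `retail`, and dataset name `retail_sales` are hypothetical placeholders, not values from this project.

```python
# Sketch of a per-layer path convention for ADLS Gen2.
# All account, container, and dataset names here are hypothetical placeholders.
LAYERS = ("landing", "bronze", "silver", "gold")


def layer_path(layer: str, dataset: str, account: str, container: str) -> str:
    """Build an abfss:// URI for one dataset folder in one storage layer."""
    if layer not in LAYERS:
        raise ValueError(f"unknown layer: {layer!r}")
    return f"abfss://{container}@{account}.dfs.core.windows.net/{layer}/{dataset}"


if __name__ == "__main__":
    # Print the folder each layer would use for the same dataset.
    for layer in LAYERS:
        print(layer_path(layer, "retail_sales", "examplestorage", "retail"))
```

Keeping the layer name as the first path segment means each stage of the pipeline reads from one layer's folder and writes to the next, so access rules and cleanup can be applied per layer.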
2. Data Source
Source: Kaggle Retail Sales Dataset
Data Type: CSV files
5. Technologies Used
Cloud Storage: Azure Data Lake Storage (ADLS)
Processing Framework: Databricks (Apache Spark)
Data Source: Kaggle CSV files
Authentication: Access key-based authentication
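Access key authentication from Databricks usually comes down to one Spark setting per storage account. The sketch below assumes ADLS Gen2 (the `dfs.core.windows.net` endpoint) and builds that setting as a plain dictionary. The helper names `access_key_conf` and `apply_conf` are introduced here for illustration, and the secret scope and key names in the comments are hypothetical.

```python
def access_key_conf(account: str, access_key: str) -> dict:
    """Return the Spark setting that authenticates to ADLS Gen2 with an account key."""
    if not account or not access_key:
        raise ValueError("account and access_key must be non-empty")
    return {f"fs.azure.account.key.{account}.dfs.core.windows.net": access_key}


def apply_conf(spark, conf: dict) -> None:
    """Apply each setting to an existing SparkSession."""
    for name, value in conf.items():
        spark.conf.set(name, value)


# In a Databricks notebook, `spark` and `dbutils` are predefined. Reading the key
# from a secret scope keeps it out of the notebook source (names are hypothetical):
#   key = dbutils.secrets.get(scope="retail-scope", key="adls-access-key")
#   apply_conf(spark, access_key_conf("examplestorage", key))
```

An account key grants full access to the storage account, so it should be stored in a secret scope rather than written into notebook cells.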
6. Expected Outcomes
Well-organized data across Landing, Bronze, Silver, and Gold layers.
Cleaned and transformed data ready for analysis.
A data pipeline structured so that it can be scaled and automated later.
7. Future Enhancements
Implement Delta Lake for enhanced data reliability.
Automate the ingestion pipeline using Azure Data Factory (ADF).
Apply Machine Learning models for demand forecasting.