
PROJECT REPORT

Anas Zafar
19i-1676

GOALS OF THE PROJECT


• Build a data warehouse prototype for the transactional data of Metro
• Learn to implement ETL techniques for mapping data from the database into the data
warehouse
• Perform OLAP queries for data analysis on the built data warehouse

TASKS PERFORMED TO ACHIEVE THE GOALS


Identifying the facts and dimensions
• Identify the facts and dimensions in the given data and list them
• Define granularities for the chosen dimensions with reference to the business questions asked
• Identify the measures for calculating the total sales

The following facts and dimensions, with their granularities, were identified in the project:
Facts:
1. Sales
2. Quantity

Dimensions:
1. Customer Dimension
2. Store Dimension
3. Supplier Dimension
4. Product Dimension
5. Time Dimension
a. Year
b. Quarter
c. Month
d. Date of Month
e. Date (unique)
The Sales fact value is calculated as the product of the Quantity sold in a transaction and the
Unit price of the product:
Total_Sales = Quantity * Unit price
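The formula above can be sketched in Java as follows. The class and method names are hypothetical and are not taken from the project code; the sample values are made up for illustration.

```java
// Hypothetical helper: names are illustrative, not from the project code.
class SalesMeasure {
    // Total_Sales = Quantity * Unit price
    static double totalSales(int quantity, double unitPrice) {
        return quantity * unitPrice;
    }

    public static void main(String[] args) {
        // Sample values: 3 units at a unit price of 2.50
        System.out.println(totalSales(3, 2.50)); // prints 7.5
    }
}
```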
Defining Attributes for Facts and Dimension Tables
The fact and dimension tables are composed of the following attributes:
Dimensions:
1. Customer
a. Customer id (Unique)
b. Customer name
2. Store
a. Store id (Unique)
b. Store name
3. Supplier
a. Supplier id (Unique)
b. Supplier name
4. Product
a. Product id (Unique)
b. Product name
c. Unit price
5. Time
a. Year
b. Quarter
c. Month
d. Date of Month
e. Date (full date, unique)
Fact:
1. Fact_Table(Table)
a. Customer id
b. Product id
c. Supplier id
d. Store id
e. Date
f. Quantity
g. Sales
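
The Time dimension attributes listed above (Year, Quarter, Month, Date of Month, Date) can all be derived from the full date of a transaction. The following is a minimal sketch using the standard java.time API; the class name TimeDimensionRow and the sample date are assumptions made for illustration and do not come from the project.

```java
import java.time.LocalDate;
import java.time.temporal.IsoFields;

// Hypothetical row of the Time dimension, derived from one full date.
class TimeDimensionRow {
    final LocalDate date;   // full date, the unique key
    final int year;
    final int quarter;      // 1 to 4
    final int month;        // 1 to 12
    final int dayOfMonth;   // "Date of Month" in the attribute list

    TimeDimensionRow(LocalDate date) {
        this.date = date;
        this.year = date.getYear();
        this.quarter = date.get(IsoFields.QUARTER_OF_YEAR);
        this.month = date.getMonthValue();
        this.dayOfMonth = date.getDayOfMonth();
    }

    public static void main(String[] args) {
        // Made-up sample date
        TimeDimensionRow row = new TimeDimensionRow(LocalDate.of(2016, 8, 15));
        System.out.println(row.year + " Q" + row.quarter
                + " M" + row.month + " D" + row.dayOfMonth);
        // prints: 2016 Q3 M8 D15
    }
}
```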

A materialized view with the following attributes is also created, as required by the problem
statement:
• Store Analysis Materialized View
a. Store id
b. Product id
c. Total sales
This view contains the product-wise total sales of each store.
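
The aggregation behind such a view amounts to grouping the fact rows by store id and product id and summing the sales. The sketch below shows that grouping in memory in plain Java. It is an illustration only: FactRow is a reduced, hypothetical stand-in for the fact table, the ids and values are made up, and the project itself may well build the view in SQL instead.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class StoreAnalysisSketch {
    // Hypothetical reduced fact row: only the fields the view needs.
    static final class FactRow {
        final String storeId;
        final String productId;
        final double sales;

        FactRow(String storeId, String productId, double sales) {
            this.storeId = storeId;
            this.productId = productId;
            this.sales = sales;
        }
    }

    // Sums sales per (store id, product id); the key is "storeId|productId".
    static Map<String, Double> totalSalesByStoreAndProduct(List<FactRow> facts) {
        Map<String, Double> totals = new LinkedHashMap<>();
        for (FactRow f : facts) {
            totals.merge(f.storeId + "|" + f.productId, f.sales, Double::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        // Made-up sample rows
        List<FactRow> facts = List.of(
                new FactRow("S-1", "P-1", 10.0),
                new FactRow("S-1", "P-1", 5.0),
                new FactRow("S-2", "P-1", 7.5));
        System.out.println(totalSalesByStoreAndProduct(facts));
        // prints {S-1|P-1=15.0, S-2|P-1=7.5}
    }
}
```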
Star Schema
Extraction, Transformation and Loading processes

Transactions

Attributes: transaction id, product id, customer id, customer name, store id, store name, date,
quantity
Methods: Transactions(), parametrized constructor

Master
Attributes: product id, product name, supplier id, supplier name, price (double)
Methods: Master(), parametrized constructor

Fact_Table
Attributes: transaction id, customer id, customer name, store id, store name, product id,
product name, supplier id, supplier name, date, price (double), quantity
Methods: Fact_Table(), parametrized constructor

Mesh_Join
Attributes: none
Methods: public static void main(…), which implements Mesh_Join
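
To illustrate how a MESHJOIN-style join between the transaction stream and the master data can work, here is a simplified in-memory sketch. It is not the project's Mesh_Join code. The Transaction and Master classes are cut-down, hypothetical versions of the classes described above, the parameters w (stream tuples read per iteration) and b (master rows loaded per iteration) follow the usual description of the algorithm, and the sample data is made up. It assumes w and b are at least 1.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class MeshJoinSketch {
    // Reduced, hypothetical versions of the Transactions and Master classes.
    static final class Transaction {
        final int transactionId; final String productId; final int quantity;
        Transaction(int transactionId, String productId, int quantity) {
            this.transactionId = transactionId; this.productId = productId; this.quantity = quantity;
        }
    }

    static final class Master {
        final String productId; final String productName; final double price;
        Master(String productId, String productName, double price) {
            this.productId = productId; this.productName = productName; this.price = price;
        }
    }

    // Joins stream tuples with master rows on product id. Assumes w >= 1 and b >= 1.
    static List<String> meshJoin(List<Transaction> stream, List<Master> master, int w, int b) {
        List<String> joined = new ArrayList<>();
        if (master.isEmpty()) return joined;
        int k = (master.size() + b - 1) / b;   // number of master-data partitions
        Map<String, List<Transaction>> hash = new HashMap<>();
        Deque<List<Transaction>> queue = new ArrayDeque<>();
        int streamPos = 0, partition = 0, pending = 0;

        while (true) {
            // A group that has met all k partitions is complete: expire it.
            if (queue.size() == k) {
                List<Transaction> oldest = queue.removeFirst();
                for (Transaction t : oldest) hash.get(t.productId).remove(t);
                pending -= oldest.size();
            }
            if (streamPos >= stream.size() && pending == 0) break;

            // Read up to w stream tuples into the hash table and the queue.
            List<Transaction> group = new ArrayList<>();
            while (group.size() < w && streamPos < stream.size()) {
                Transaction t = stream.get(streamPos++);
                hash.computeIfAbsent(t.productId, key -> new ArrayList<>()).add(t);
                group.add(t);
            }
            queue.addLast(group);
            pending += group.size();

            // Load the next master partition (cyclically) and probe the hash table.
            int from = partition * b;
            int to = Math.min(from + b, master.size());
            for (Master m : master.subList(from, to)) {
                for (Transaction t : hash.getOrDefault(m.productId, List.of())) {
                    joined.add("t" + t.transactionId + ":" + m.productName + ":" + (t.quantity * m.price));
                }
            }
            partition = (partition + 1) % k;
        }
        return joined;
    }

    public static void main(String[] args) {
        // Made-up sample data
        List<Transaction> stream = List.of(
                new Transaction(1, "P-1", 2), new Transaction(2, "P-3", 1), new Transaction(3, "P-1", 4));
        List<Master> master = List.of(
                new Master("P-1", "Apple", 1.5), new Master("P-2", "Bread", 2.0),
                new Master("P-3", "Milk", 3.0), new Master("P-4", "Tea", 4.0));
        System.out.println(meshJoin(stream, master, 2, 2));
        // prints [t1:Apple:3.0, t2:Milk:3.0, t3:Apple:6.0]
    }
}
```

Each group of stream tuples stays in the queue until it has been probed against every partition of the master data, which is why the queue has to hold as many groups as there are partitions. Joined rows are produced in the order in which matches are found, which is not necessarily the order of the stream.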

Queries’ Results
The screenshots below show part of the results of the respective queries

Query 1

Query 2

Query 3
Query 4

Query 5

Query 6
• A transaction id is given in the data, but it is not needed in the data warehouse; it is only
a burden on the data and takes up extra space.
Query 7

Shortcomings of Mesh_Join and learnings from the project


Shortcomings
• In MESHJOIN there is a dependency between the size of the partitions in the internal queue
for the stream data and the number of iterations required to bring the disk-based
relation into memory. This dependency hampers the optimal distribution of memory
among the join components. In particular, the size of the disk buffer varies with the size
of the disk-based relation, which is unnecessary.
• The queue size is limited, so if the stream is much larger than the fixed queue size, waiting
time increases.
• Space inefficiency: it requires a lot of memory to process large data using the queue, hash
tables and result sets.
• A queue can only be emptied from one end, so if data is inserted incorrectly or a mistake
is made, all the data ahead of it must be removed before that particular set is reached.
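
The dependency described in the first point can be made concrete with a little arithmetic. The sketch below assumes the usual MESHJOIN setup, in which the disk-based relation is loaded a fixed number of rows per iteration and w stream tuples enter the queue per iteration; the class name, method names and sample sizes are hypothetical.

```java
// Hypothetical sizing helper; all numbers in main are made-up samples.
class MeshJoinSizing {
    // Iterations needed to cycle once through the disk-based relation.
    static int iterations(int relationRows, int bufferRows) {
        return (relationRows + bufferRows - 1) / bufferRows; // ceiling division
    }

    // Stream tuples the queue must hold: one group of w per iteration.
    static long queuedTuples(int relationRows, int bufferRows, int w) {
        return (long) iterations(relationRows, bufferRows) * w;
    }

    public static void main(String[] args) {
        System.out.println(iterations(100, 10));      // prints 10
        System.out.println(queuedTuples(100, 10, 5)); // prints 50
        System.out.println(iterations(100, 20));      // prints 5
        System.out.println(queuedTuples(100, 20, 5)); // prints 25
    }
}
```

Under these assumptions, shrinking the disk buffer increases the number of iterations and therefore the queue memory, so the two cannot be tuned independently.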
Learnings
• How to identify a warehouse’s components from the database data
• How to design the structure of a data warehouse (the star schema)
• How to connect Java code to a MySQL database
• How to perform ETL operations on a database
• How to use an ETL technique and how ETL techniques usually perform
• How to use a data warehouse for data analysis through OLAP queries
