
Data_Model

sample data model for sales data

Uploaded by

aqsa.domination
Copyright
© All Rights Reserved

Why create a data model?

Data modelling is important for the following reasons:


1. To get the right data for the business requirement. For business systems, data
modelling helps extract entities, attributes and their relationships, and put the data
into a store that can generate the required data in the right format.
2. Data modelling helps to map data from business sources so that analytics and
reporting are performed on the right data. It enhances the quality of data and removes
data redundancy.
3. Data modelling explains the business process. Getting into data modelling and
identifying granular data not only helps in creating reporting solutions but also raises
questions about the business process and helps find appropriate solutions.
4. It helps to build stores that house non-redundant logical data in a low-cost and
efficient manner, which improves business performance because it supports business
decision making.
The data model should align with the enterprise data model and has to be approved by the
Enterprise Architecture board.
Choosing a model for the data warehouse for sales data
Since sales is only one business aspect, a DataMart is more suitable.
In the past I have developed a DataMart for marketing campaigns within the credit card Data
Warehouse space. It was limited to the products included in the promotion events, and
it was used to measure the success of the marketing campaigns.
Factors to consider before selecting a data model
Data types: In this case it is structured sales data from an on-premise SQL Server DB, which
can act as the ODS.
Scale: The size of the data to be considered. This includes the number of rows, the number
of columns in each table, the number of tables in consideration, etc., all of which affect
the scale of data that the DataMart/Data Warehouse needs to support.
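As a rough illustration of how rows, columns and tables combine into scale, the sketch below multiplies row counts by assumed column widths. The table names, row counts and byte widths are hypothetical, and the estimate ignores indexes, storage overhead and compression.

```python
def estimate_table_bytes(row_count, column_widths_bytes):
    """Rough raw size of one table: rows multiplied by the summed column widths."""
    return row_count * sum(column_widths_bytes)


def estimate_total_bytes(tables):
    """Sum the rough size of every table in scope."""
    return sum(estimate_table_bytes(rows, widths) for rows, widths in tables.values())


# Hypothetical figures, for illustration only.
hypothetical_tables = {
    "fact_sales": (1_000_000, [8, 8, 8, 8, 4, 8]),  # 44 bytes per row
    "dim_product": (5_000, [8, 50, 30]),            # 88 bytes per row
}

total_bytes = estimate_total_bytes(hypothetical_tables)
print(f"Estimated raw size: {total_bytes / 1_000_000:.2f} MB")
```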
Performance: Data should be retrieved quickly when queried.
It helps if summarised/aggregated data is housed within the Data Warehouse, typically
aggregated as daily, weekly and monthly data. It also helps if users are not given
permission to query the tables directly, but instead query views that have already been
created to cater for their needs.
Also, if there is a set of users who feed on the data from the DW, it is advisable to create
a separate schema for them. In the past I have worked with a SaaS application team to create
a separate schema for their use.
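A minimal sketch of the aggregate-plus-view idea, using Python's built-in sqlite3 module in place of the SQL Server source. The table, view and column names (fact_sales, agg_daily_sales, v_daily_sales) are assumptions made for the example. SQLite has no user permissions or separate schemas, so those parts are not modelled here.

```python
import sqlite3


def build_reporting_layer(sales_rows):
    """Load detailed sales, pre-aggregate them by day, and expose a view."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE fact_sales (sale_date TEXT, store_id INTEGER, amount REAL)"
    )
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)", sales_rows)
    # Pre-computed daily summary, so reports do not scan the detailed rows.
    conn.execute(
        """
        CREATE TABLE agg_daily_sales AS
        SELECT sale_date, store_id,
               SUM(amount) AS total_amount,
               COUNT(*)    AS sale_count
        FROM fact_sales
        GROUP BY sale_date, store_id
        """
    )
    # End users are pointed at the view rather than at the underlying tables.
    conn.execute(
        """
        CREATE VIEW v_daily_sales AS
        SELECT sale_date, store_id, total_amount, sale_count
        FROM agg_daily_sales
        """
    )
    conn.commit()
    return conn


# Hypothetical sample rows: (sale_date, store_id, amount).
sample_rows = [
    ("2024-01-01", 1, 10.0),
    ("2024-01-01", 1, 5.0),
    ("2024-01-01", 2, 7.5),
    ("2024-01-02", 1, 20.0),
]
conn = build_reporting_layer(sample_rows)
daily = conn.execute(
    "SELECT sale_date, store_id, total_amount, sale_count "
    "FROM v_daily_sales ORDER BY sale_date, store_id"
).fetchall()
print(daily)
```

In a real warehouse the aggregate table would be refreshed by the load process, and access to the view would be granted through the database's own permission model.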
Maintenance: How much effort is needed to maintain the Data Warehouse/DataMart, and how
much effort is needed to implement changes in it, for example adding a new data set to
the existing DW.
I have worked on the integration of data from two different banks (Lloyds and HBOS) when
they merged. It was decided to have a common billing engine for the customers of both banks,
and the entire data had to be shared with First Data (Vision+). Changes then had to be made
to the Data Warehouse to adapt to the feed coming from First Data (Vision+).
In the past I have worked on numerous projects where I had to work with third parties. One
example is PPI redress, where credit card data had to be provided to a third party
for redress calculations.
Other examples are divestments, e.g. the Sainsbury divestment from LBG or the creation of
the new Lloyds TSB, where we had to segregate users within the Data Warehouse and eventually
part with the data owned by the divested entity.
Cost:
A cloud data warehouse on Azure lives entirely online and doesn't require any physical
hardware. Cloud data warehouses are much easier to implement and scale, and are typically
less expensive than on-premise data warehouses. The cloud also provides better disaster
recovery strategies, with minimal downtime compared to on-premise.
From my personal experience of maintaining an on-premise Data Warehouse, a change
request was needed every quarter to add disks to the server, define the data file
that was to house the data, etc. There was a time when the rack was completely full
and all the small-capacity disks had to be replaced with larger ones (e.g. 20 GB
disks replaced by 100 GB disks). Failure to address this before all the space is used
up might lead to the DB becoming full and the entire system coming to a halt.
The DR tests were painful, as they involved manually switching over the DB and restarting
all the applications. Switching over from one datacentre to another used to take more than
a full day.
Design:
A model for the DataMart / Data Warehouse can be chosen depending on the business
requirements. It is not straightforward to arrive at a conclusion, as it requires the
following:
- Workshops with stakeholders
- Analysis of business requirements to figure out key business entities
- Use case study
- Consider if there are any data governance requirements
- Define the data dictionary with the help of Business Analysts/stakeholders
- Allow for changing business requirements, as changes do come at different stages while
a model is being designed
- Define the grain (the most granular level of data stored); in our case it will be a
typical row in the sales fact table
- Identify the Dimensions, such as Product, Customer, Store, Employee, etc.
- Identify the Fact: sales transactions
- Build a star or snowflake schema linking the Fact with the Dimensions
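The last four steps above can be sketched as a small star schema. This is an illustrative sketch only, using Python's built-in sqlite3 module. The table names, column names and the chosen grain (one row per product line within a sales transaction) are assumptions, not the actual design.

```python
import sqlite3

# Hypothetical star schema: four dimensions around one sales fact table.
STAR_SCHEMA_DDL = """
CREATE TABLE dim_product  (product_key  INTEGER PRIMARY KEY, product_name TEXT, category TEXT);
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, customer_name TEXT);
CREATE TABLE dim_store    (store_key    INTEGER PRIMARY KEY, store_name TEXT);
CREATE TABLE dim_date     (date_key     INTEGER PRIMARY KEY, full_date TEXT);
-- Assumed grain: one row per product line within a sales transaction.
CREATE TABLE fact_sales (
    sale_id      INTEGER PRIMARY KEY,
    product_key  INTEGER NOT NULL REFERENCES dim_product(product_key),
    customer_key INTEGER NOT NULL REFERENCES dim_customer(customer_key),
    store_key    INTEGER NOT NULL REFERENCES dim_store(store_key),
    date_key     INTEGER NOT NULL REFERENCES dim_date(date_key),
    quantity     INTEGER NOT NULL,
    amount       REAL    NOT NULL
);
"""


def build_star_schema():
    """Create the empty star schema in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # make SQLite enforce the references
    conn.executescript(STAR_SCHEMA_DDL)
    return conn


conn = build_star_schema()
# Hypothetical sample data.
conn.executemany(
    "INSERT INTO dim_product VALUES (?, ?, ?)",
    [(1, "Milk", "Grocery"), (2, "Bread", "Bakery")],
)
conn.execute("INSERT INTO dim_customer VALUES (1, 'Sample Customer')")
conn.execute("INSERT INTO dim_store VALUES (1, 'Sample Store')")
conn.execute("INSERT INTO dim_date VALUES (20240101, '2024-01-01')")
conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (1, 1, 1, 1, 20240101, 2, 5.0),
        (2, 1, 1, 1, 20240101, 3, 7.5),
        (3, 2, 1, 1, 20240101, 1, 2.0),
    ],
)

# A typical report: one join from the fact to a dimension, then aggregate.
revenue_by_category = conn.execute(
    """
    SELECT p.category, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product p ON p.product_key = f.product_key
    GROUP BY p.category
    ORDER BY p.category
    """
).fetchall()
print(revenue_by_category)
```

Employee and Sales_Territory dimensions would attach to the fact table in the same way, each through its own key column.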
In theory, creating a data model includes the following phases of development.
Reviews and sessions with stakeholders/business analysts are involved at each stage, and
feedback from the business is incorporated. There are multiple cycles of this until
the model is approved.
a) Conceptual data model: A high-level representation of the data which demonstrates the
essential components of the required solution as they relate to each other. It defines
the direction of the relationships: one-to-one, one-to-many, many-to-many, etc. The
conceptual model demonstrates the essentials of the business.
b) Logical data model: Extract all entities, all attributes related to the entities and the
relationships between attributes across the entities. This becomes the basis of the
physical data model and forms the design of the database.
c) Physical data model: The entities from the logical data model are translated into tables,
and the attributes into columns. The physical data model adheres to the DB technology in
use. It includes tables, columns, keys, data types, validation rules, database triggers,
stored procedures, constraints, etc.
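The step from the logical to the physical model can be illustrated with a small sketch. The Entity and Attribute classes, the logical type names and the type mapping below are hypothetical. A real physical model would follow the data types and conventions of the chosen DB technology. SQLite is used here only to check that the generated statement is valid.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Attribute:
    name: str
    logical_type: str  # one of the keys of TYPE_MAP below
    is_key: bool = False


@dataclass
class Entity:
    name: str
    attributes: list


# Assumed mapping from logical types to SQLite column types.
TYPE_MAP = {"identifier": "INTEGER", "text": "TEXT", "number": "REAL", "date": "TEXT"}


def to_ddl(entity):
    """Translate a logical entity into a physical CREATE TABLE statement."""
    columns = []
    for attribute in entity.attributes:
        column = f"{attribute.name} {TYPE_MAP[attribute.logical_type]}"
        if attribute.is_key:
            column += " PRIMARY KEY"
        columns.append(column)
    return f"CREATE TABLE {entity.name} ({', '.join(columns)})"


# Logical model: a Product entity with two attributes.
product = Entity(
    "product",
    [Attribute("product_id", "identifier", is_key=True), Attribute("product_name", "text")],
)

# Physical model: the same entity as a table definition.
product_ddl = to_ddl(product)
print(product_ddl)

conn = sqlite3.connect(":memory:")
conn.execute(product_ddl)  # raises if the generated DDL is not valid SQL
```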
The challenges I have faced most often are:
1. Changing business requirements, which require rework
2. Lack of a complete data dictionary
I have made the following assumptions about the Dimension Tables and the Fact Table (Sales
Table).

Product (Dimension table)


Customer (Dimension table)
Date (Dimension table)
Store (Dimension table)
Sales (Fact Table)
Employee (Dimension Table)
Sales_Territory (Dimension Table)

Usually, for a DataMart, a star schema is preferred over a snowflake schema.


Either a Star or a SnowFlake schema can suit the scenario, depending on the following:

| Criterion | Star Schema | SnowFlake Schema |
|---|---|---|
| Data Complexity | If the data relationship is straightforward then Star Schema is chosen | If the relationship between data is complex, SnowFlake is favoured over Star |
| Query Performance | Less complex; faster query performance due to fewer joins | Slower compared to Star Schema |
| Data Integrity | Dimensional attributes are often repeated across multiple records within a dimension table, which may cause data quality issues in the long run | Better for maintaining data integrity and reducing redundancy |
| Data Storage | More space is consumed | Less space is consumed |
| Usage | Suitable for small data warehouses | Suitable for large, complex data warehouses |
| Design Complexity | Simple to design and maintain | Complex to design and maintain |
| Normalisation | Partially denormalised design | Highly normalised design |
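The join and redundancy trade-off can be seen in a small sketch using Python's built-in sqlite3 module. The table and column names are hypothetical. The same revenue-by-category question needs one join against the star-style product dimension and two joins against the snowflaked one, while the star dimension repeats the category name on every product row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Star: the category name is repeated on every product row.
    CREATE TABLE star_dim_product (
        product_key INTEGER PRIMARY KEY, product_name TEXT, category_name TEXT);
    -- Snowflake: the category is normalised into its own table.
    CREATE TABLE snow_dim_category (
        category_key INTEGER PRIMARY KEY, category_name TEXT);
    CREATE TABLE snow_dim_product (
        product_key INTEGER PRIMARY KEY, product_name TEXT,
        category_key INTEGER REFERENCES snow_dim_category(category_key));
    CREATE TABLE fact_sales (product_key INTEGER, amount REAL);

    INSERT INTO star_dim_product VALUES
        (1, 'Milk', 'Grocery'), (2, 'Eggs', 'Grocery'), (3, 'Bread', 'Bakery');
    INSERT INTO snow_dim_category VALUES (10, 'Grocery'), (20, 'Bakery');
    INSERT INTO snow_dim_product VALUES (1, 'Milk', 10), (2, 'Eggs', 10), (3, 'Bread', 20);
    INSERT INTO fact_sales VALUES (1, 5.0), (2, 3.0), (3, 2.0), (1, 5.0);
    """
)

STAR_QUERY = """
SELECT p.category_name, SUM(f.amount)
FROM fact_sales f
JOIN star_dim_product p ON p.product_key = f.product_key
GROUP BY p.category_name
ORDER BY p.category_name
"""

SNOWFLAKE_QUERY = """
SELECT c.category_name, SUM(f.amount)
FROM fact_sales f
JOIN snow_dim_product p ON p.product_key = f.product_key
JOIN snow_dim_category c ON c.category_key = p.category_key
GROUP BY c.category_name
ORDER BY c.category_name
"""

star_result = conn.execute(STAR_QUERY).fetchall()
snowflake_result = conn.execute(SNOWFLAKE_QUERY).fetchall()
print(star_result == snowflake_result)
```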
Additional aggregate tables can be created to hold daily, weekly and monthly aggregates.
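One possible way to derive the weekly and monthly aggregates from daily totals is sketched below in plain Python. The sample figures are hypothetical, and the use of ISO week numbers is an assumption. A business calendar may define weeks differently.

```python
from collections import defaultdict
from datetime import date


def roll_up(daily_totals):
    """Roll daily totals up into ISO-week and calendar-month totals."""
    weekly = defaultdict(float)
    monthly = defaultdict(float)
    for iso_day, amount in daily_totals:
        day = date.fromisoformat(iso_day)
        iso_year, iso_week, _ = day.isocalendar()
        weekly[(iso_year, iso_week)] += amount
        monthly[(day.year, day.month)] += amount
    return dict(weekly), dict(monthly)


# Hypothetical daily aggregate rows: (date, total sales amount).
daily_totals = [
    ("2024-01-01", 10.0),
    ("2024-01-02", 5.0),
    ("2024-01-08", 7.0),
    ("2024-02-01", 3.0),
]
weekly_totals, monthly_totals = roll_up(daily_totals)
print(weekly_totals)
print(monthly_totals)
```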
