Why create a data model?
Data modelling is important for the following reasons:
1. To get the right data for the business requirement. For business systems, data modelling helps extract entities, attributes and their relationships, and puts the data into a store that can produce the required data in the right format.
2. Data modelling helps map data from business sources so that analytics and reporting run on the right data. It improves data quality and removes data redundancy.
3. Data modelling explains the business process. Getting into data modelling and identifying granular data not only helps in creating reporting solutions but also raises questions about the business process and helps find appropriate answers to them.
4. It helps build stores that house non-redundant logical data in a low-cost and efficient manner, which improves business performance by supporting business decision-making.

The data model should align with the enterprise data model and has to be approved by the Enterprise Architecture board.

Choosing a model for the data warehouse for sales data

Since only one business aspect is involved, a DataMart is more suitable. In the past I developed a DataMart for marketing campaigns within the credit card data warehouse space. It was limited to the products included in the promotion events and was used to measure the success of the marketing campaigns.

Factors to think about before selecting a data model

Data types: In this case it is structured sales data from an on-premise SQL Server DB, which can act as the ODS.

Scale: The size of the data to be considered. This includes the number of rows, the number of columns per table and the number of tables in scope, all of which affect the scale of data that the DataMart/Data Warehouse needs to support.

Performance: Data should be retrieved promptly when queried. It helps if summarised/aggregated data is housed within the Data Warehouse, typically as daily, weekly and monthly aggregates.
It also helps if users are not given permission to query the tables directly, but instead query views that have already been created to cater for their needs. If a set of users feeds on the data from the DW, it is advisable to create a separate schema for them. In the past I worked with a SaaS application team to create a separate schema for their use.

Maintenance: How much effort is needed to maintain the Data Warehouse/DataMart, and how much effort is needed to implement changes in it, for example adding a new data set to the existing DW. I worked on the integration of data from two different banks (Lloyds and HBOS) when they merged. It was decided to have a common billing engine for the customers of both banks, and the entire data set had to be shared with First Data (Vision+). The data warehouse then had to be changed to adapt to the feed coming from First Data (Vision+).

In the past I have worked on numerous projects involving third parties. Some examples:
- PPI redress, where credit card data had to be provided to a third party for redress calculations.
- Divestments, e.g. the Sainsbury divestment from LBG or the creation of New Lloyds TSB, where we had to segregate users within the data warehouse and eventually part with the data owned by the divested entity.

Cost: A cloud data warehouse on Azure lives entirely online and does not require any physical hardware. Cloud data warehouses are much easier to implement and scale, and are typically less expensive than on-premise data warehouses. The cloud also provides better disaster recovery strategies, with minimal downtime compared to on-premise. From my personal experience of maintaining an on-premise data warehouse, a change request was needed every quarter to add disks to the server, define the data file that was to house the data, and so on. There was a time when the rack was completely full and all the small-capacity disks had to be replaced with larger ones.
(for example, 20 GB disks were replaced by 100 GB disks). Failing to look into this before all the space is used up can lead to the DB becoming full and the entire system coming to a halt. The DR tests were painful, as they involved manually switching over the DB and restarting all the applications. Switching over from one datacentre to another used to take more than one full day.

Design: A model for the DataMart/Data Warehouse is chosen depending on the business requirements. Arriving at a conclusion is not straightforward, as it requires the following:
- Workshops with stakeholders
- Analysis of business requirements to figure out the key business entities
- Use case study
- Consideration of any data governance requirements
- Definition of a data dictionary with the help of Business Analysts/stakeholders
- Handling of changing business requirements, as changes do come at different stages while a model is being designed
- Definition of the grain (the most granular level of data stored); in our case it will be a typical row in the sales fact table
- Identification of dimensions such as Product, Customer, Store, Employee etc.
- Identification of the fact: sales transactions
- Building a star or snowflake schema linking the fact with the dimensions

In theory, creating a data model involves the following phases of development. Each stage involves reviews and sessions with stakeholders/business analysts, and feedback from the business is incorporated. There are multiple cycles of this until the model is approved.

a) Conceptual data model: A high-level representation of the data that demonstrates the essential components of the required solution as they relate to each other. It defines the direction of the relationships: one-to-one, one-to-many, many-to-many etc. The conceptual model demonstrates the essentials of the business.

b) Logical data model: Extracts all entities, all attributes related to the entities, and the relationships between attributes across the entities.
This becomes the basis of the physical data model and forms the design of the database.

c) Physical data model: The entities from the logical data model are translated into tables, and the attributes into columns. The physical data model adheres to the DB technology in use. It includes tables, columns, keys, data types, validation rules, database triggers, stored procedures, constraints etc.

The challenges I have faced most often are:
1. Changing business requirements, which require rework
2. Lack of a complete data dictionary

I have made the following assumptions about the dimension tables and the fact table (Sales table).
Product (Dimension table)
Customer (Dimension table)
Date (Dimension table)
Store (Dimension table)
Sales (Fact table)
Employee (Dimension table)
Sales_Territory (Dimension table)
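
As a rough sketch of how the tables listed above could fit together in a star schema, the Python snippet below builds them in an in-memory SQLite database using the standard-library sqlite3 module. SQLite stands in for the real platform only so that the example runs anywhere. The column names, key names, sample rows and the assumed grain are all illustrative assumptions; only the table names come from the list above.

```python
import sqlite3

# Column and key names are illustrative assumptions; only the table
# names (Product, Customer, Date, Store, Sales, Employee,
# Sales_Territory) come from the text.
STAR_SCHEMA_DDL = """
CREATE TABLE dim_product  (product_key  INTEGER PRIMARY KEY, product_name TEXT, category TEXT);
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, customer_name TEXT);
CREATE TABLE dim_date     (date_key     INTEGER PRIMARY KEY, full_date TEXT);
CREATE TABLE dim_store    (store_key    INTEGER PRIMARY KEY, store_name TEXT);
CREATE TABLE dim_employee (employee_key INTEGER PRIMARY KEY, employee_name TEXT);
CREATE TABLE dim_sales_territory (territory_key INTEGER PRIMARY KEY, territory_name TEXT);

-- Assumed grain: one row per product sold in one sales transaction.
CREATE TABLE fact_sales (
    sales_key     INTEGER PRIMARY KEY,
    product_key   INTEGER NOT NULL REFERENCES dim_product(product_key),
    customer_key  INTEGER NOT NULL REFERENCES dim_customer(customer_key),
    date_key      INTEGER NOT NULL REFERENCES dim_date(date_key),
    store_key     INTEGER NOT NULL REFERENCES dim_store(store_key),
    employee_key  INTEGER NOT NULL REFERENCES dim_employee(employee_key),
    territory_key INTEGER NOT NULL REFERENCES dim_sales_territory(territory_key),
    quantity      INTEGER NOT NULL,
    sales_amount  REAL NOT NULL
);
"""


def build_star_schema() -> sqlite3.Connection:
    """Create the dimension tables and the fact table in memory."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(STAR_SCHEMA_DDL)
    return conn


def load_sample_rows(conn: sqlite3.Connection) -> None:
    """Insert a few made-up rows so the query below has data."""
    conn.executemany(
        "INSERT INTO dim_product VALUES (?, ?, ?)",
        [(1, "Kettle", "Kitchen"), (2, "Lamp", "Lighting")],
    )
    conn.execute("INSERT INTO dim_customer VALUES (1, 'Customer A')")
    conn.execute("INSERT INTO dim_date VALUES (20240101, '2024-01-01')")
    conn.execute("INSERT INTO dim_store VALUES (1, 'Store A')")
    conn.execute("INSERT INTO dim_employee VALUES (1, 'Employee A')")
    conn.execute("INSERT INTO dim_sales_territory VALUES (1, 'Territory A')")
    conn.executemany(
        "INSERT INTO fact_sales VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
        [
            (1, 1, 1, 20240101, 1, 1, 1, 2, 40.0),
            (2, 2, 1, 20240101, 1, 1, 1, 1, 15.0),
            (3, 1, 1, 20240101, 1, 1, 1, 1, 20.0),
        ],
    )


def sales_by_category(conn: sqlite3.Connection) -> list:
    """Typical star-schema query: the fact joins directly to one dimension."""
    return conn.execute(
        """
        SELECT p.category, SUM(f.sales_amount)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_key = f.product_key
        GROUP BY p.category
        ORDER BY p.category
        """
    ).fetchall()


star_conn = build_star_schema()
load_sample_rows(star_conn)
print(sales_by_category(star_conn))
```

Every dimension hangs directly off the fact table, so any report needs at most one join per dimension it uses.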
For a DataMart, a star schema is usually preferred over a snowflake schema.
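
One way to see the trade-off is to model the same product dimension both ways and compare the queries. The sketch below is only an illustration using Python's sqlite3 module. The table names (star_dim_product, snow_dim_product, snow_dim_category), the columns and the sample rows are hypothetical and do not come from any real system.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript(
    """
    -- Star: the category name is repeated on every product row.
    CREATE TABLE star_dim_product (
        product_key INTEGER PRIMARY KEY, product_name TEXT, category_name TEXT);

    -- Snowflake: the category is normalised into its own table.
    CREATE TABLE snow_dim_category (
        category_key INTEGER PRIMARY KEY, category_name TEXT);
    CREATE TABLE snow_dim_product (
        product_key INTEGER PRIMARY KEY, product_name TEXT,
        category_key INTEGER REFERENCES snow_dim_category(category_key));

    CREATE TABLE fact_sales (product_key INTEGER, sales_amount REAL);

    INSERT INTO star_dim_product VALUES (1, 'Kettle', 'Kitchen'), (2, 'Toaster', 'Kitchen');
    INSERT INTO snow_dim_category VALUES (10, 'Kitchen');
    INSERT INTO snow_dim_product VALUES (1, 'Kettle', 10), (2, 'Toaster', 10);
    INSERT INTO fact_sales VALUES (1, 40.0), (2, 25.0);
    """
)

# Star schema: one join from the fact to the dimension.
STAR_QUERY = """
    SELECT p.category_name, SUM(f.sales_amount)
    FROM fact_sales AS f
    JOIN star_dim_product AS p ON p.product_key = f.product_key
    GROUP BY p.category_name
"""

# Snowflake schema: an extra join to reach the normalised category table.
SNOWFLAKE_QUERY = """
    SELECT c.category_name, SUM(f.sales_amount)
    FROM fact_sales AS f
    JOIN snow_dim_product AS p ON p.product_key = f.product_key
    JOIN snow_dim_category AS c ON c.category_key = p.category_key
    GROUP BY c.category_name
"""

star_result = db.execute(STAR_QUERY).fetchall()
snowflake_result = db.execute(SNOWFLAKE_QUERY).fetchall()
print(star_result, snowflake_result)
```

Both queries return the same totals. The star version stores 'Kitchen' twice but needs one join, while the snowflake version stores it once and needs two.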
Either a star or a snowflake schema can suit this scenario, depending on the following:
| Criterion | Star Schema | Snowflake Schema |
|---|---|---|
| Data complexity | If the data relationships are straightforward, a star schema is chosen | If the relationships between data are complex, snowflake is favoured over star |
| Query performance | Less complex; faster query performance due to fewer joins | Slower compared to star schema |
| Data integrity | Dimensional attributes are often repeated across multiple records within a dimension table, which may cause data quality issues in the longer run | Better for maintaining data integrity and reducing redundancy |
| Data storage | More space is consumed | Less space is consumed |
| Usage | Suitable for small data warehouses | Suitable for large, complex data warehouses |
| Design complexity | Simple to design and maintain | Complex to design and maintain |
| Normalisation | Partially denormalised design | Highly normalised design |

Additional aggregate tables can be created as daily, weekly and monthly aggregates.
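
The daily, weekly and monthly aggregate tables could be derived from the sales fact table along the following lines. This is a minimal sketch using Python's sqlite3 module rather than the actual warehouse platform. The table names (fact_sales, agg_sales_daily and so on), the two-column fact layout and the sample rows are assumptions made for illustration.

```python
import sqlite3

dw = sqlite3.connect(":memory:")
dw.execute("CREATE TABLE fact_sales (sale_date TEXT, sales_amount REAL)")
dw.executemany(
    "INSERT INTO fact_sales VALUES (?, ?)",
    [
        ("2024-01-01", 10.0),
        ("2024-01-01", 5.0),
        ("2024-01-02", 15.0),
        ("2024-01-08", 5.0),
        ("2024-02-01", 7.0),
    ],
)

# Hypothetical aggregate table names mapped to SQLite strftime formats.
# %W is the week of the year (00-53) with Monday as the first day.
PERIOD_FORMATS = {
    "agg_sales_daily": "%Y-%m-%d",
    "agg_sales_weekly": "%Y-W%W",
    "agg_sales_monthly": "%Y-%m",
}


def build_aggregates(conn: sqlite3.Connection) -> None:
    """Rebuild each aggregate table from the detailed fact rows."""
    for table_name, period_format in PERIOD_FORMATS.items():
        conn.execute(f"DROP TABLE IF EXISTS {table_name}")
        conn.execute(
            f"CREATE TABLE {table_name} AS "
            f"SELECT strftime('{period_format}', sale_date) AS period, "
            f"SUM(sales_amount) AS total_sales "
            f"FROM fact_sales GROUP BY 1"
        )


def read_aggregate(conn: sqlite3.Connection, table_name: str) -> list:
    """Return (period, total_sales) rows in period order."""
    return conn.execute(
        f"SELECT period, total_sales FROM {table_name} ORDER BY period"
    ).fetchall()


build_aggregates(dw)
for name in PERIOD_FORMATS:
    print(name, read_aggregate(dw, name))
```

Reports that only need period totals can then read the small aggregate tables instead of scanning the detailed fact rows. In a real warehouse the rebuild would more likely be an incremental load than a drop and recreate.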