Data Warehouse and Data Modelling

Data modeling involves creating conceptual, logical, and physical representations of how data is related and structured in a database. It typically follows either a top-down approach, starting from business requirements, or a bottom-up approach, starting from existing tables. There are two main types of dimensional data models: star schemas, which have fact tables connected to denormalized dimension tables, and snowflake schemas, which have more normalized dimension tables. Dimension tables can be classified as conformed, role-playing, degenerate, slowly changing, or shrunken based on their structure and purpose. Late-arriving dimensions are dimension records that have not yet been loaded when the related fact records arrive.

Uploaded by

AkashRai


Data Modelling
• It is a diagrammatic representation showing how entities are related to each other. It is the first step towards database design: we first create the conceptual model, then the logical model, and finally the physical model.

• What is data modelling?

• A conceptual representation of the structure of tables that expresses the business requirements. This blueprint or map is termed a data model.
• It visually represents the nature of the data, the business rules governing it, and how it will be organized in the database.
• Data model levels: conceptual, logical, and physical.
• Top-down approach: starts from business requirements.
• Bottom-up approach: starts from existing tables.
Data Modelling
OLAP (online analytical processing)
• Handles large volumes of data and complex queries.
• Based on SELECT statements that aggregate data for reporting.
• Response time can range from milliseconds to seconds or hours, depending on the amount of data.
• Subject specific: sales, marketing, pharma, etc.
• Used to discover hidden patterns and insights.
• Data is refreshed periodically.
• Denormalized databases for analysis.
• Users: data analysts, etc.
• Multidimensional view of data.
• Data can be retrieved from OLTP systems.

OLTP (online transactional processing)
• Handles a large number of small transactions.
• Based on INSERT, UPDATE, and DELETE commands.
• Response time should be in milliseconds.
• Industry specific: retail, banking, etc.
• Real time; data is updated quickly.
• Normalized databases for efficiency.
• Users: customer-facing personnel, such as clerks.
• Day-to-day transactions.
• Regular backups are needed.
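The following is a minimal Python sketch that makes the contrast concrete. It uses the standard-library sqlite3 module; this is an assumption, and any relational database would do. The table and column names (orders, region, amount) are hypothetical.
- The first half issues several small single-row transactions, as an OLTP system would.
- The second half runs one aggregating SELECT, as an OLAP report would.

```python
import sqlite3


def run_demo() -> list:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)")

    # OLTP-style workload: many small transactions, each touching a single row.
    for order_id, region, amount in [(1, "North", 120.0), (2, "South", 80.0), (3, "North", 40.0)]:
        with conn:  # the connection context manager commits one short transaction
            conn.execute("INSERT INTO orders VALUES (?, ?, ?)", (order_id, region, amount))
    with conn:
        conn.execute("UPDATE orders SET amount = 90.0 WHERE order_id = ?", (2,))

    # OLAP-style workload: a read-only SELECT that scans and aggregates rows for reporting.
    result = conn.execute(
        "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
    ).fetchall()
    conn.close()
    return result


print(run_demo())  # [('North', 160.0), ('South', 90.0)]
```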
Data Modelling
There are two common types of dimensional data model: the star schema and the snowflake schema.
• Star Schema
• At its centre are one or more fact tables, each associated with multiple dimension tables.
• The dimension tables are denormalized.

Advantages of Star Schema

• Simpler queries and fast performance
Challenges
• Decreased data integrity, because denormalized dimensions store redundant values
• Less capable of handling diverse and complex queries
• No direct support for many-to-many relationships
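Here is a minimal star schema sketch in SQLite, driven from Python. It assumes hypothetical tables (fact_sales, dim_product, dim_date) holding made-up pharmacy data. Two features are worth noting:
- category_name is repeated on every product row, which is what a denormalized dimension means.
- Every dimension is a single join away from the fact table, which keeps star-schema queries simple.

```python
import sqlite3


def build_star(conn: sqlite3.Connection) -> None:
    conn.executescript("""
        -- Denormalized dimension: category attributes repeat on every product row.
        CREATE TABLE dim_product (
            product_key INTEGER PRIMARY KEY,
            product_name TEXT,
            category_name TEXT
        );
        CREATE TABLE dim_date (
            date_key INTEGER PRIMARY KEY,   -- e.g. 20240115
            calendar_year INTEGER,
            month_name TEXT
        );
        -- Fact table at the centre: a foreign key per dimension plus numeric measures.
        CREATE TABLE fact_sales (
            product_key INTEGER REFERENCES dim_product(product_key),
            date_key INTEGER REFERENCES dim_date(date_key),
            qty INTEGER,
            amount REAL
        );
        INSERT INTO dim_product VALUES
            (1, 'Aspirin', 'Analgesic'), (2, 'Ibuprofen', 'Analgesic'), (3, 'Amoxicillin', 'Antibiotic');
        INSERT INTO dim_date VALUES (20240115, 2024, 'January'), (20240210, 2024, 'February');
        INSERT INTO fact_sales VALUES (1, 20240115, 10, 50.0), (2, 20240115, 5, 40.0), (3, 20240210, 2, 30.0);
    """)


def sales_by_category(conn: sqlite3.Connection) -> list:
    # Each dimension is joined directly to the fact table (one hop per dimension).
    return conn.execute("""
        SELECT p.category_name, d.calendar_year, SUM(f.amount)
        FROM fact_sales f
        JOIN dim_product p ON f.product_key = p.product_key
        JOIN dim_date d ON f.date_key = d.date_key
        GROUP BY p.category_name, d.calendar_year
        ORDER BY p.category_name
    """).fetchall()
```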
Data Modelling
• Snowflake Schema
• Unlike a star schema, a snowflake schema has more normalized dimension tables.
• It addresses the slow writes of a star schema, since each descriptive value is stored, and updated, in only one place.

Advantages of Snowflake Schema

• Since the dimension tables are normalized, it uses less space and saves a lot of storage cost
Challenges
• Complex data schema
• Slower at processing cube data, because queries need more joins
• Lower data integrity levels
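For comparison, here is a sketch of the same product dimension snowflaked in SQLite. The category attributes move into a hypothetical dim_category table, which dim_product references.
- The query needs one more join (fact to product to category).
- A category name is stored only once, so renaming it updates a single row.

```python
import sqlite3


def build_snowflake(conn: sqlite3.Connection) -> None:
    conn.executescript("""
        -- Category attributes are normalized out into their own sub-dimension table.
        CREATE TABLE dim_category (
            category_key INTEGER PRIMARY KEY,
            category_name TEXT
        );
        -- The product dimension keeps only a foreign key to the category.
        CREATE TABLE dim_product (
            product_key INTEGER PRIMARY KEY,
            product_name TEXT,
            category_key INTEGER REFERENCES dim_category(category_key)
        );
        CREATE TABLE fact_sales (
            product_key INTEGER REFERENCES dim_product(product_key),
            amount REAL
        );
        INSERT INTO dim_category VALUES (10, 'Analgesic'), (20, 'Antibiotic');
        INSERT INTO dim_product VALUES (1, 'Aspirin', 10), (2, 'Ibuprofen', 10), (3, 'Amoxicillin', 20);
        INSERT INTO fact_sales VALUES (1, 50.0), (2, 40.0), (3, 30.0);
    """)


def sales_by_category(conn: sqlite3.Connection) -> list:
    # Reaching the category takes an extra join: fact -> product -> category.
    return conn.execute("""
        SELECT c.category_name, SUM(f.amount)
        FROM fact_sales f
        JOIN dim_product p ON f.product_key = p.product_key
        JOIN dim_category c ON p.category_key = c.category_key
        GROUP BY c.category_name
        ORDER BY c.category_name
    """).fetchall()
```

For example, `UPDATE dim_category SET category_name = 'Pain Relief' WHERE category_key = 10` relabels both products at once. In the star version, every matching product row would have to be updated.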
Data Modelling
When to use star and snowflake schemas?

• Snowflake schema:
• In data warehouses. Because the warehouse is the central data store for the company, normalizing the dimensions can save a lot of space. In some cases a denormalized dimension table stores a lot of redundant information, resulting in a huge dimension table.

• Star schema:
• In data marts. Data marts are subsets of data taken out of the central data warehouse. They are usually created for different departments and often do not contain all the historical data. In this setting, saving storage space is not a priority, so the simpler and faster star schema is preferred.
Types of Dimensions
• Conformed Dimensions:
• A dimension that is shared by multiple fact tables, with the same keys and attributes in each, for example dim_prescriber (the physician universe) and dim_time_period.
Example: suppose a pharmaceutical organization has different data marts, one per field force. If all of these data marts share dim_prescriber, then dim_prescriber is called a conformed dimension.
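Here is a sketch of a conformed dimension in SQLite. It assumes two hypothetical data marts whose fact tables (fact_calls and fact_prescriptions) both reference the same dim_prescriber. Because the dimension is shared, results from the two marts can be combined ("drilled across") on a common attribute such as specialty.

```python
import sqlite3


def build_marts(conn: sqlite3.Connection) -> None:
    conn.executescript("""
        -- One conformed dimension, shared by the fact tables of both data marts.
        CREATE TABLE dim_prescriber (
            prescriber_key INTEGER PRIMARY KEY,
            prescriber_name TEXT,
            specialty TEXT
        );
        -- Data mart A (hypothetical): field-force call activity.
        CREATE TABLE fact_calls (
            prescriber_key INTEGER REFERENCES dim_prescriber(prescriber_key),
            calls INTEGER
        );
        -- Data mart B (hypothetical): prescription volumes.
        CREATE TABLE fact_prescriptions (
            prescriber_key INTEGER REFERENCES dim_prescriber(prescriber_key),
            rx_count INTEGER
        );
        INSERT INTO dim_prescriber VALUES (1, 'Dr. A', 'Cardiology'), (2, 'Dr. B', 'Cardiology'), (3, 'Dr. C', 'Oncology');
        INSERT INTO fact_calls VALUES (1, 3), (2, 2), (3, 4);
        INSERT INTO fact_prescriptions VALUES (1, 10), (3, 7);
    """)


def calls_vs_prescriptions(conn: sqlite3.Connection) -> list:
    # Drill across: aggregate each fact table separately by the same conformed
    # attribute, then join the two result sets on that attribute.
    return conn.execute("""
        WITH c AS (
            SELECT p.specialty, SUM(f.calls) AS calls
            FROM fact_calls f JOIN dim_prescriber p ON f.prescriber_key = p.prescriber_key
            GROUP BY p.specialty
        ),
        r AS (
            SELECT p.specialty, SUM(f.rx_count) AS rx
            FROM fact_prescriptions f JOIN dim_prescriber p ON f.prescriber_key = p.prescriber_key
            GROUP BY p.specialty
        )
        SELECT c.specialty, c.calls, r.rx
        FROM c JOIN r ON c.specialty = r.specialty
        ORDER BY c.specialty
    """).fetchall()
```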
Junk Dimensions:
• A table composed of low-cardinality columns (flags and indicators) that do not have a place in the fact table.
• Initial Transaction_tbl (transaction_id, product_id, customer_id, emp_id, order_id, payment_id, coupon_id, amount, qty)
• Final Transaction_tbl (transaction_id, product_id, customer_id, emp_id, order_id, junk_id, amount, qty)

Junk table

| Prod_typ | Cpn_typ | Pymt_typ | Junk_id |
|----------|---------|----------|---------|
| Online   | Yes     | Cash     | 1       |
| Offline  | Yes     | Cash     | 2       |
| Online   | No      | Cash     | 3       |
| Offline  | No      | Cash     | 4       |
| Online   | Yes     | Card     | 5       |
| Offline  | Yes     | Card     | 6       |
| Online   | No      | Card     | 7       |
| Offline  | No      | Card     | 8       |
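The junk table above is simply every combination of the three low-cardinality flags. The following minimal Python sketch does two things:
- It generates the junk table with the same row order and Junk_id values.
- It replaces a transaction's flag columns with a single junk_id.

It assumes, hypothetically, that the flags arrive as text attributes named prod_typ, cpn_typ, and pymt_typ.

```python
from itertools import product

# Hypothetical flag domains, taken from the junk table above.
PROD_TYPES = ["Online", "Offline"]
CPN_TYPES = ["Yes", "No"]
PYMT_TYPES = ["Cash", "Card"]
FLAG_COLUMNS = ("prod_typ", "cpn_typ", "pymt_typ")


def build_junk_dimension() -> dict:
    """Map every (prod_typ, cpn_typ, pymt_typ) combination to a surrogate junk_id."""
    junk = {}
    # product() varies its last argument fastest, so this reproduces the table's row order.
    for junk_id, (pymt, cpn, prod) in enumerate(product(PYMT_TYPES, CPN_TYPES, PROD_TYPES), start=1):
        junk[(prod, cpn, pymt)] = junk_id
    return junk


def to_final_transaction(txn: dict, junk: dict) -> dict:
    """Replace the low-cardinality flag columns of a transaction with a single junk_id."""
    final = {k: v for k, v in txn.items() if k not in FLAG_COLUMNS}
    final["junk_id"] = junk[tuple(txn[c] for c in FLAG_COLUMNS)]
    return final
```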
Types of Dimensions
• Role-Playing Dimensions:
• A single dimension used for multiple purposes (roles) in the same database.
• Example: the time-period dimension, which can serve as both the order date and the ship date of a fact.

• Degenerate Dimensions:
• A dimension attribute that is not a fact but is stored in the fact table itself, often as part of its primary key, and has no dimension table of its own.
• Example: invoice number or order number.
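The following SQLite sketch shows both ideas in one hypothetical fact_orders table:
- order_number is a degenerate dimension. It sits in the fact table's composite primary key with no dimension table behind it.
- dim_date plays two roles, order date and ship date, by being joined twice under different aliases.

```python
import sqlite3


def build_orders(conn: sqlite3.Connection) -> None:
    conn.executescript("""
        CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, full_date TEXT, month_name TEXT);
        -- order_number: degenerate dimension stored on the fact row, no dimension table.
        -- order_date_key and ship_date_key both point at dim_date (two roles, one table).
        CREATE TABLE fact_orders (
            order_number TEXT,
            line_number INTEGER,
            order_date_key INTEGER REFERENCES dim_date(date_key),
            ship_date_key INTEGER REFERENCES dim_date(date_key),
            amount REAL,
            PRIMARY KEY (order_number, line_number)
        );
        INSERT INTO dim_date VALUES (20240130, '2024-01-30', 'January'), (20240202, '2024-02-02', 'February');
        INSERT INTO fact_orders VALUES
            ('SO-1001', 1, 20240130, 20240202, 75.0),
            ('SO-1001', 2, 20240130, 20240130, 25.0);
    """)


def order_vs_ship_month(conn: sqlite3.Connection) -> list:
    # The same physical dimension is joined twice, once per role, under different aliases.
    return conn.execute("""
        SELECT f.order_number, f.line_number, od.month_name AS order_month, sd.month_name AS ship_month
        FROM fact_orders f
        JOIN dim_date od ON f.order_date_key = od.date_key
        JOIN dim_date sd ON f.ship_date_key = sd.date_key
        ORDER BY f.order_number, f.line_number
    """).fetchall()
```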

• Slowly Changing Dimensions (SCD):

• Dimensions whose attributes change over a period of time.
• SCD 0: attributes remain fixed over time; the original value is retained.
• SCD 1: the previous value is overwritten by the current value, so no history is kept.
• SCD 2: a new row is added for each change, so unlimited history is preserved.
• SCD 3: an extra column holds the prior value, so limited history is preserved.
• SCD 4: history is maintained in a separate table.
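The difference between SCD 1, 2, and 3 is easiest to see on a single address change. The following minimal Python sketch makes three simplifying assumptions:
- It uses a hypothetical in-memory CustomerRow record rather than a real table.
- The SCD 2 function only handles customers that already exist.
- The SCD 3 variant keeps just one previous value.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class CustomerRow:
    customer_id: int
    address: str
    effective_date: Optional[date] = None
    end_date: Optional[date] = None
    current: bool = True
    previous_address: Optional[str] = None  # used only by the SCD 3 variant


def scd1(rows: List[CustomerRow], customer_id: int, new_address: str) -> List[CustomerRow]:
    """SCD 1: overwrite the attribute in place; no history is kept."""
    for row in rows:
        if row.customer_id == customer_id:
            row.address = new_address
    return rows


def scd2(rows: List[CustomerRow], customer_id: int, new_address: str, change_date: date) -> List[CustomerRow]:
    """SCD 2: expire the current row and append a new current row (unlimited history)."""
    for row in rows:
        if row.customer_id == customer_id and row.current and row.address != new_address:
            row.current = False
            row.end_date = change_date
            rows.append(CustomerRow(customer_id, new_address, effective_date=change_date))
            break  # stop iterating right after appending to the list
    return rows


def scd3(rows: List[CustomerRow], customer_id: int, new_address: str) -> List[CustomerRow]:
    """SCD 3: keep limited history in an extra column (here, only the previous value)."""
    for row in rows:
        if row.customer_id == customer_id:
            row.previous_address = row.address
            row.address = new_address
    return rows
```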
Types of Dimensions
SCD 2 implementation in SQL (MERGE syntax as used by Delta Lake / Spark SQL)

```sql
MERGE INTO customers
USING (
  -- These rows will either UPDATE the current addresses of existing customers or INSERT the new addresses of new customers.
  SELECT updates.customerId AS mergeKey, updates.*
  FROM updates

  UNION ALL

  -- These rows will INSERT new addresses of existing customers.
  -- Setting the mergeKey to NULL forces these rows to NOT MATCH and be INSERTed.
  SELECT NULL AS mergeKey, updates.*
  FROM updates JOIN customers
    ON updates.customerId = customers.customerId
  WHERE customers.current = true AND updates.address <> customers.address
) staged_updates
ON customers.customerId = mergeKey
WHEN MATCHED AND customers.current = true AND customers.address <> staged_updates.address THEN
  -- Set current to false and endDate to the source's effective date.
  UPDATE SET current = false, endDate = staged_updates.effectiveDate
WHEN NOT MATCHED THEN
  -- Set current to true along with the new address and its effective date.
  INSERT (customerId, address, current, effectiveDate, endDate)
  VALUES (staged_updates.customerId, staged_updates.address, true, staged_updates.effectiveDate, null)
```
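Not every database supports MERGE, and SQLite, for one, has no MERGE statement. As an alternative, here is the same SCD 2 logic for SQLite, run from Python. It is split into an expire step and an insert step inside one transaction. The sketch assumes:
- snake_case column names;
- an integer is_current flag in place of the boolean current column;
- at most one row per customer in updates.

```python
import sqlite3

SCHEMA = """
CREATE TABLE customers (
    customer_id INTEGER,
    address TEXT,
    is_current INTEGER,      -- 1 = current version, 0 = expired
    effective_date TEXT,
    end_date TEXT
);
CREATE TABLE updates (customer_id INTEGER, address TEXT, effective_date TEXT);
"""


def apply_scd2(conn: sqlite3.Connection) -> None:
    with conn:  # both steps commit together or roll back together
        # Step 1: expire the current row of every customer whose address changed.
        conn.execute("""
            UPDATE customers
            SET is_current = 0,
                end_date = (SELECT u.effective_date FROM updates u
                            WHERE u.customer_id = customers.customer_id)
            WHERE is_current = 1
              AND EXISTS (SELECT 1 FROM updates u
                          WHERE u.customer_id = customers.customer_id
                            AND u.address <> customers.address)
        """)
        # Step 2: insert a current row for every customer that now has none,
        # i.e. customers just expired in step 1 plus brand-new customers.
        conn.execute("""
            INSERT INTO customers (customer_id, address, is_current, effective_date, end_date)
            SELECT u.customer_id, u.address, 1, u.effective_date, NULL
            FROM updates u
            WHERE NOT EXISTS (SELECT 1 FROM customers c
                              WHERE c.customer_id = u.customer_id AND c.is_current = 1)
        """)
```

Unchanged customers keep their current row, so step 2 skips them. This matches the MERGE above, where a matched row with the same address takes no action.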
Types of Dimensions
Shrunken Dimensions:
Shrunken dimensions are conformed dimensions that are a subset of rows and/or columns of a base
dimension. Shrunken rollup dimensions are required when constructing aggregate fact tables. They are also
necessary for business processes that naturally capture data at a higher level of granularity, such as a forecast by
month and brand (instead of the more atomic date and product associated with sales data). Another case of
conformed dimension subsetting occurs when two dimensions are at the same level of detail, but one represents
only a subset of rows.
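Here is a sketch of a shrunken rollup dimension in SQLite. dim_month is derived from a hypothetical dim_date by keeping a subset of its columns at month grain. It then keys a monthly aggregate fact table built from the daily facts. All table and column names are assumptions for illustration.

```python
import sqlite3


def build_monthly_aggregate(conn: sqlite3.Connection) -> None:
    conn.executescript("""
        CREATE TABLE dim_date (
            date_key INTEGER PRIMARY KEY,
            full_date TEXT,
            month_key INTEGER,      -- e.g. 202401
            month_name TEXT,
            calendar_year INTEGER
        );
        CREATE TABLE fact_sales (date_key INTEGER REFERENCES dim_date(date_key), amount REAL);
        INSERT INTO dim_date VALUES
            (20240101, '2024-01-01', 202401, 'January', 2024),
            (20240115, '2024-01-15', 202401, 'January', 2024),
            (20240201, '2024-02-01', 202402, 'February', 2024);
        INSERT INTO fact_sales VALUES (20240101, 10.0), (20240115, 5.0), (20240201, 7.0);

        -- Shrunken rollup dimension: a subset of dim_date's columns, one row per month.
        CREATE TABLE dim_month AS
            SELECT DISTINCT month_key, month_name, calendar_year FROM dim_date;

        -- Aggregate fact table at month grain, keyed by the shrunken dimension.
        CREATE TABLE fact_sales_monthly AS
            SELECT d.month_key, SUM(f.amount) AS amount
            FROM fact_sales f JOIN dim_date d ON f.date_key = d.date_key
            GROUP BY d.month_key;
    """)
```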

Static Dimensions:
Static dimensions are not extracted from the original data source, but are created within the context of the data
warehouse. A static dimension can be loaded manually — for example with status codes — or it can be generated
by a procedure, such as a date or time dimension.
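A date dimension is the classic static dimension generated by a procedure. Here is a minimal Python sketch. The column set and the YYYYMMDD integer smart key are common conventions chosen here as assumptions, not requirements.

```python
from datetime import date, timedelta


def generate_date_dimension(start: date, end: date) -> list:
    """Generate one dimension row per calendar day from start to end, inclusive."""
    rows = []
    day = start
    while day <= end:
        rows.append({
            "date_key": int(day.strftime("%Y%m%d")),  # smart key, e.g. 20240229
            "full_date": day.isoformat(),
            "year": day.year,
            "quarter": (day.month - 1) // 3 + 1,
            "month": day.month,
            "month_name": day.strftime("%B"),       # locale-dependent name
            "day_of_week": day.strftime("%A"),
            "is_weekend": day.weekday() >= 5,       # Saturday = 5, Sunday = 6
        })
        day += timedelta(days=1)
    return rows
```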
Types of Dimensions
• Late Arriving Dimensions:
• A fact record arrives whose natural key has not yet been loaded into the related dimension, which prevents a successful foreign key lookup of the dimension's surrogate key.

• How to Deal With Late Arriving Dimensions

• Never process: simply omit the record from the fact load. This is rarely the right choice.
• Queue & retry: hold the fact record and process it in a later load, once the dimension record has arrived.
• Unknown member: if the fact record has value even without a reference to the related late-arriving dimension, and we either do not expect or do not care whether the dimension record shows up later, we can load the fact record with a foreign key reference to the unknown record in the related dimension, typically surrogate key value -1.
Note that we must pay particular attention to the fact grain if we are going to take the unknown member approach to
failed foreign key lookups. Most fact table primary keys are set as a composite across foreign keys and degenerate
dimensions. If allowing a foreign key to be set to -1 causes a possible primary key violation, then we need to look for
other options such as adding a degenerate dimension.
• Inferred Member: we can use the natural key in the record bound for the fact table to seed a record (inferred member)
in the related dimensions. Once the dimension record shows up, it will have the same natural key and the dimension
record’s attributes can then be updated. Particular attention should be paid to the inferred member’s record time in
cases where dimension history is tracked such as SCD 2 situations.
