Module 3_Data Warehousing
Data Warehousing
1. Learning outcomes
• Discuss the key concepts of data warehousing
• Identify resources needed for data warehousing
2. Introduction
Imagine a large organization with several departments, each running its own database
system. A business analyst wants to generate reports for decision support. She approaches
each department but runs into problems with those whose systems are built only to handle
data transactions, not to produce reports. The departments that do provide information
deliver it in different formats: customer names are stored differently, birthdates appear as
both mm/dd/yy and dd/mm/yy, and so on. Wouldn't it save the business analyst a great deal of
time and effort if a central repository held the information she needs for her reports, with
the data already in a standardized format?
3. Data Warehouses and Data Marts
A data warehouse is a physical repository where relational data are specially organized to
provide enterprise-wide, cleansed data in a standardized format. In the previous module, we
learned what database systems are. Building on that, a data warehouse is a collection of
integrated, subject-oriented databases. Each unit of data is non-volatile (it is not changed
once it has been loaded) and relevant to some moment in time (it is time-variant).
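To make "cleansed data in a standardized format" concrete, here is a minimal sketch in Python
of the kind of cleanup that could happen before data are loaded into the warehouse. The
department names, their date formats, and the helper functions are assumptions made up for
illustration; a real load process would need rules agreed on with each source department.

```python
from datetime import datetime

# Hypothetical mapping: which date format each source department uses.
SOURCE_DATE_FORMATS = {
    "sales": "%m/%d/%y",    # mm/dd/yy
    "billing": "%d/%m/%y",  # dd/mm/yy
}


def standardize_birthdate(raw: str, department: str) -> str:
    """Parse a department-specific date string and return it as YYYY-MM-DD."""
    parsed = datetime.strptime(raw, SOURCE_DATE_FORMATS[department])
    return parsed.date().isoformat()


def standardize_name(raw: str) -> str:
    """Collapse extra spaces and turn 'LAST, FIRST' into 'First Last'."""
    cleaned = " ".join(raw.split())
    if "," in cleaned:
        last, first = [part.strip() for part in cleaned.split(",", 1)]
        cleaned = f"{first} {last}"
    return cleaned.title()


# The same birthdate and the same customer, as two departments might store them.
print(standardize_birthdate("03/04/85", "sales"))    # 1985-03-04
print(standardize_birthdate("04/03/85", "billing"))  # 1985-03-04
print(standardize_name("DELA CRUZ,  juan"))          # Juan Dela Cruz
```

Note that the source format must be known for each department: the string 03/04/85 alone
does not say whether it means March 4 or April 3.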
Data in data warehouses are often NOT in third normal form (3NF). Because the data are not
fully normalized, some values are stored redundantly, and the redundancy makes the stored
data larger. This should not be confused with the term "big data", which usually refers to
datasets too large or too varied for conventional database tools. Denormalized data need
fewer joins to query, which makes them more useful for DECISION SUPPORT.
This is good since the purpose of a data warehouse is to provide aggregate data for decision
making. You are less interested in what each individual table contains than in how the
company should move forward given that data.
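The trade-off between redundancy and easy aggregation can be sketched with a few rows of
made-up data. In this illustrative Python example (the customers, regions, and amounts are
all hypothetical), the region and category are repeated on every sale rather than kept in
separate lookup tables, so a total per region needs no joins at all.

```python
from collections import defaultdict

# Denormalized rows: region and category are repeated on every sale
# instead of being stored once in separate customer and product tables.
sales = [
    {"customer": "Ana", "region": "North", "category": "Books", "amount": 120.0},
    {"customer": "Ana", "region": "North", "category": "Games", "amount": 80.0},
    {"customer": "Ben", "region": "South", "category": "Books", "amount": 50.0},
    {"customer": "Cara", "region": "South", "category": "Books", "amount": 70.0},
]


def total_by(rows, key):
    """Sum the 'amount' field of the rows, grouped by the given column."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += row["amount"]
    return dict(totals)


revenue_by_region = total_by(sales, "region")
revenue_by_category = total_by(sales, "category")

print(revenue_by_region)    # {'North': 200.0, 'South': 120.0}
print(revenue_by_category)  # {'Books': 240.0, 'Games': 80.0}
```

Ana's region appears twice in the data, which is the redundancy described above. The
benefit is that the analyst can ask a new aggregate question by changing only the grouping
column.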
Some questions or decisions are specific to particular groups of people. Separate entities
called DATA MARTS are therefore used to provide specialized and strategic answers for those
groups. This keeps things simple for the users, since smaller problems are easier to solve.
A data mart, therefore, is a subset of the data warehouse that supports the requirements of
a particular department or business function.
In effect, a data mart is a departmental data warehouse that stores only relevant data. Data
marts can be dependent or independent. A dependent data mart is a subset that is created
directly from a data warehouse. An independent data mart, on the other hand, is a small data
warehouse designed for a strategic business unit or a department, built from source systems
rather than from a central data warehouse.
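As a rough illustration of a dependent data mart, the Python sketch below copies out of a
small, made-up warehouse only the rows that one department needs. The column names, the
sample values, and the function build_dependent_mart are hypothetical; in practice a data
mart is usually built with database tools rather than in application code.

```python
# A tiny stand-in for the enterprise data warehouse (hypothetical rows).
warehouse = [
    {"subject": "sales", "period": "Q1", "measure": "revenue", "value": 1000},
    {"subject": "sales", "period": "Q2", "measure": "revenue", "value": 1200},
    {"subject": "hr", "period": "Q1", "measure": "headcount", "value": 45},
    {"subject": "finance", "period": "Q1", "measure": "expenses", "value": 700},
]


def build_dependent_mart(warehouse_rows, subject):
    """Return copies of only the warehouse rows that belong to one subject area."""
    return [dict(row) for row in warehouse_rows if row["subject"] == subject]


sales_mart = build_dependent_mart(warehouse, "sales")

for row in sales_mart:
    print(row["period"], row["measure"], row["value"])
```

Because the mart is derived from the warehouse, it inherits the cleansed, standardized data
already stored there. An independent data mart would instead have to do its own cleanup on
data taken from the source systems.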
Professorial Lecturer: Module 2_ Database Warehousing
Dr. Domingo T. Balse, Jr, LPT Lecture Notes
5. Quiz / Activity