Module 3 - Data Warehousing
A data warehouse (DW) is organized at the right level of granularity to provide clean, enterprise-wide data in a
standardized format for reports, queries, and analysis.
Creating a DW for analysis and queries represents significant investment in time and effort.
It can facilitate distributed access to up-to-date business knowledge for departments and functions,
thus improving business efficiency and customer service.
DW can present a competitive advantage by facilitating decision making and helping reform
business processes.
It simplifies data access and allows end users to perform extensive analysis.
It enhances overall IT performance by not burdening the operational databases used by Enterprise
Resource Planning (ERP) and other systems.
1. Subject-oriented:
To be effective, DW should be designed around a subject domain, that is, to help solve a certain
category of problems.
2. Integrated:
DW should include data from many functions that can shed light on a particular subject area.
Thus, the organization can benefit from a comprehensive view of the subject area.
4. Nonvolatile:
DW should be persistent, that is, it should not be created on the fly from the operations databases.
Thus, DW is consistently available for analysis, across the organization and over time.
5. Summarized:
DW contains rolled-up data at the right level for queries and analysis.
It helps reduce the number of variables or dimensions in the data, making them more meaningful
for decision makers.
6. Not normalized:
DW often uses a star schema, in which a central table is surrounded by a set of lookup
tables.
7. Metadata:
Many of the variables in the database are computed from other variables in the operational
database.
The method of calculation for each variable should be clearly documented.
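The "Summarized" and "Metadata" characteristics above can be illustrated with a minimal sketch. The transaction rows, the field names, and the `roll_up` helper are hypothetical, chosen only for illustration; a real DW would typically do this in the database rather than in application code.

```python
from collections import defaultdict

# Hypothetical transaction-level rows: (date, region, product, amount)
transactions = [
    ("2024-01-03", "East", "Widget", 120.0),
    ("2024-01-17", "East", "Gadget", 80.0),
    ("2024-01-21", "West", "Widget", 200.0),
    ("2024-02-02", "East", "Widget", 50.0),
    ("2024-02-09", "West", "Gadget", 75.0),
]


def roll_up(rows):
    """Summarize daily rows into one total per (month, region)."""
    totals = defaultdict(float)
    for date, region, _product, amount in rows:
        month = date[:7]  # keep only "YYYY-MM"
        totals[(month, region)] += amount
    return dict(totals)


monthly_sales = roll_up(transactions)

# Metadata: document how each computed variable was derived.
metadata = {
    "monthly_sales": "SUM(amount) grouped by month (YYYY-MM prefix of date) and region",
}

for key in sorted(monthly_sales):
    print(key, monthly_sales[key])
```

The roll-up drops the product and day-level detail, so five transaction rows become four summary rows, and the `metadata` entry records how the summary was computed.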
DWs should be updated in near real-time in many high-transaction volume industries, such as
airlines.
The cost of implementing and updating DW in real time could discourage others.
Another downside of real-time DW is the possibility of inconsistencies between reports drawn just a few
minutes apart.
DW Development Approaches
The bottom-up approach is to produce small data marts for the reporting needs of different departments or functions, as
needed.
The smaller data marts will eventually align to deliver comprehensive EDW capabilities.
The top-down approach provides consistency but takes time and resources.
The bottom-up approach leads to healthy local ownership and maintainability of data (Table 3.1).
The first element is the data sources that provide the raw data.
The second element is the process of transforming that data to meet the decision needs.
The third element is the method of regularly and accurately loading that data into EDW or data
marts.
The fourth element is the data access and analysis part, where devices and applications use the data
from DW to deliver insights and other benefits to users.
Data Sources
Unstructured data, such as text data, would need to be structured before being inserted into DW.
1. Operations data include data from all business applications, including the ERP systems that
form the backbone of an organization’s IT systems.
The data to be extracted will depend upon the subject matter of DW.
For example, for a sales/marketing DW, only the data about customers, orders, customer service,
and so on would be extracted.
2. Other applications, such as point-of-sale (POS) terminals and e-commerce applications, provide
customer-facing data.
Planning and budget data should also be added as needed for making comparisons against targets.
DW Architecture
3. External syndicated data, such as weather or economic activity data, could also be added to DW,
as needed, to provide good contextual information to decision makers.
At the heart of a useful DW are the processes that populate it with good-quality data.
1. Data should be extracted from many operational (transactional) database sources on a regular
basis.
All of the data should then be converted to the same format as the central table of DW.
Daily transaction data can be extracted from ERPs, transformed, and uploaded to the database the
same night.
If DW is needed for near-real-time information access, then the ETL processes would need to be
executed more frequently.
ETL work is usually automated using programming scripts that are written, tested, and then deployed
to update the DW periodically.
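The extract, transform, and load steps described above could be scripted along the following lines. This is only a sketch under stated assumptions: the `orders` and `fact_orders` tables, their columns, and the cents-to-currency transformation are hypothetical, and in-memory SQLite databases stand in for the operational source and the warehouse.

```python
import sqlite3


def extract(source):
    """Pull raw rows from the (hypothetical) operational orders table."""
    return source.execute(
        "SELECT order_id, order_date, amount_cents FROM orders"
    ).fetchall()


def transform(rows):
    """Drop rows with a missing amount and convert cents to currency units."""
    return [
        (order_id, order_date, cents / 100.0)
        for order_id, order_date, cents in rows
        if cents is not None
    ]


def load(warehouse, rows):
    """Insert rows into the warehouse; re-running replaces rather than duplicates."""
    warehouse.executemany(
        "INSERT OR REPLACE INTO fact_orders VALUES (?, ?, ?)", rows
    )
    warehouse.commit()


# Stand-in for an operational database.
source = sqlite3.connect(":memory:")
source.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER PRIMARY KEY, order_date TEXT, amount_cents INTEGER)"
)
source.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "2024-03-01", 1999), (2, "2024-03-01", None), (3, "2024-03-02", 4500)],
)

# Stand-in for the data warehouse.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE fact_orders ("
    "order_id INTEGER PRIMARY KEY, order_date TEXT, amount REAL)"
)

load(warehouse, transform(extract(source)))
loaded = warehouse.execute(
    "SELECT order_id, amount FROM fact_orders ORDER BY order_id"
).fetchall()
print(loaded)
```

Keying the load on `order_id` means a nightly job can be re-run after a failure without creating duplicate rows; running the same script more often is one way to move toward near-real-time updates.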
DW Design
There is a central fact table that provides most of the information of interest.
There are lookup tables that provide detailed values for codes used in the central table.
For example, the central table may use digits to represent a sales person.
The lookup table will help provide the name for that sales person code.
Here is an example of a star schema for a data mart for monitoring sales performance (Figure 3.2).
The difference between a star and snowflake is that in the latter, the lookup tables can have their
own further lookup tables.
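A small sketch may help make the star and snowflake layouts concrete. The table names, columns, and sample rows below are hypothetical and are not the schema from the text's figure; SQLite is used only because it needs no setup.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript(
    """
    -- Lookup table of a lookup table: this extra level is the snowflake part.
    CREATE TABLE region (
        region_id   INTEGER PRIMARY KEY,
        region_name TEXT
    );
    -- Lookup table: resolves the salesperson code used in the fact table.
    CREATE TABLE salesperson (
        salesperson_id INTEGER PRIMARY KEY,
        name           TEXT,
        region_id      INTEGER REFERENCES region(region_id)
    );
    -- Central fact table: stores codes and measures only.
    CREATE TABLE fact_sales (
        sale_id        INTEGER PRIMARY KEY,
        salesperson_id INTEGER REFERENCES salesperson(salesperson_id),
        amount         REAL
    );
    INSERT INTO region VALUES (1, 'East'), (2, 'West');
    INSERT INTO salesperson VALUES (101, 'Asha', 1), (102, 'Ben', 2);
    INSERT INTO fact_sales VALUES (1, 101, 250.0), (2, 102, 100.0), (3, 101, 50.0);
    """
)

# Star-style query: one join from the fact table to its lookup table.
star_rows = db.execute(
    """
    SELECT s.name, SUM(f.amount)
    FROM fact_sales f
    JOIN salesperson s ON s.salesperson_id = f.salesperson_id
    GROUP BY s.name
    ORDER BY s.name
    """
).fetchall()

# Snowflake-style query: a second join from the lookup table to its own lookup.
snowflake_rows = db.execute(
    """
    SELECT r.region_name, SUM(f.amount)
    FROM fact_sales f
    JOIN salesperson s ON s.salesperson_id = f.salesperson_id
    JOIN region r ON r.region_id = s.region_id
    GROUP BY r.region_name
    ORDER BY r.region_name
    """
).fetchall()

print(star_rows)
print(snowflake_rows)
```

The fact table holds only the salesperson code; the name comes from one join, and the region name needs a second join through the salesperson lookup table.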
There is also a variety of tools available for data migration, data upload, data retrieval, and data
analysis.
Data from DW could be accessed for many purposes, through many devices.
For example, a sales performance report would show sales by many dimensions and compare them with
plan.
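As a rough illustration of such a report, the sketch below compares actual sales with plan along a single dimension. The region names, the figures, and the `variance_report` helper are all hypothetical.

```python
# Hypothetical figures, as if already summarized from the warehouse.
actual = {"East": 300.0, "West": 100.0}
plan = {"East": 280.0, "West": 150.0}


def variance_report(actual, plan):
    """Return one row per region comparing actual sales with plan."""
    report = []
    for region in sorted(plan):
        a = actual.get(region, 0.0)
        p = plan[region]
        report.append(
            {
                "region": region,
                "actual": a,
                "plan": p,
                "variance": a - p,
                # Avoid dividing by zero when no plan was set for a region.
                "pct_of_plan": round(100 * a / p, 1) if p else None,
            }
        )
    return report


report = variance_report(actual, plan)
for row in report:
    print(row)
```

The same comparison could be repeated for other dimensions, such as product or month, by changing the keys used for `actual` and `plan`.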
A dashboarding system will use data from the warehouse and present analysis to users.
The data from DW can be used to populate customized performance dashboards for executives.
The dashboard could include drill-down capabilities to analyze the performance data for root cause
analysis.
DW Access
2. The data from the warehouse could be used for ad hoc queries and any other applications that
make use of the internal data.
Parts of the data would be extracted, and then combined with other relevant data, for data mining.
DW Best Practices
It is often much more expensive to redesign after development work has begun.
Users should be trained to use the system and to absorb its many features.
As business needs change, new data marts can be created for new needs.
Conclusion
DWs are special data management facilities intended for creating reports and analysis to support
managerial decision making.
They are designed to make reporting and querying simple and efficient.
The sources of data are operational systems and external data sources.
Review Questions
3. What are the sources and types of data for a data warehouse?