Data Warehouse Data Design
Week 04
THE INFORMATION CONTAINED IN THIS PRESENTATION IS FOR INFORMATIONAL PURPOSES ONLY. IBM SHALL
NOT BE RESPONSIBLE FOR ANY DAMAGES ARISING OUT OF THE USE OF, OR OTHERWISE RELATED TO,
THIS PRESENTATION OR ANY OTHER DOCUMENTATION.
IBM, the IBM logo, ibm.com, Cognos, SPSS and iLog are trademarks or registered trademarks of International
Business Machines Corporation in the United States, other countries, or both. These and other IBM
trademarked terms may be U.S. registered or common law trademarks owned by IBM at the time this
information was published. Trademarks may also be registered or common law trademarks in other
countries. A current list of IBM trademarks is available on the Web at “Copyright and trademark
information” at http://www.ibm.com/legal/copytrade.html. The IBM logo must not be moved, added to
or altered in any way.
Other company, product, or service names may be trademarks or service marks of others.
[1] Kimball, R. and Ross, M. (2010). The Kimball Group Reader: Relentlessly Practical Tools for Data
Warehousing and Business Intelligence. John Wiley & Sons.
Business Requirements
Aggregated groupings
The Key Boundaries [1]
Boundaries are guidelines, rules or limits.
● Boundaries with the business users: find the business users, interview them, and translate what they tell you into specific DW/BI deliverables
● Boundaries with finance: finance should work out the logical and political implications of the cost allocations, so that you can quietly implement them
● Boundaries across organizations: executives must establish a corporate culture that sends a very clear message to all the separate departments
● Boundaries with legal: provide adequate security, privacy, archiving, and compliance across the DW/BI system
● Boundaries with IT: be able to rely on other groups within IT for storage
IBM Confidential
1 IBM Global Center for Smarter Analytics.
BAFWARE: Fundamentals of Data Warehouse
Visualization
A good visualization should:
● Move easily from "the big picture" to the minute details at will
● Allow us to quickly spot errors in the data
● Enable us to perceive things we were not considering or expecting, and help us to better deal with the unexpected
● Have deep simplicity
● Have some level of interactivity and the qualities of good collaborators
● Have some flexibility, and be able to adapt and adjust to the changing needs and contexts of the user
● List Reports
● Crosstabs
● Charts
● Graphs
● Drill Down: the visualization first presents the results at the summary level, and the user can then drill into the underlying detail
● Advanced Interaction: the user simply double-clicks a part of the visualization and then drags and drops representations of data
● Support the natural ways people have of viewing data over time
● Include seeing instantaneous events, regular periodic reports, and latest status
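To make the drill-down and crosstab ideas above concrete, here is a minimal Python sketch using only the standard library. The sales rows, the region/city hierarchy, and the function names `summarize`, `drill_down` and `crosstab` are hypothetical and exist only for illustration; a BI tool such as Cognos would run these operations against the warehouse rather than over an in-memory list.

```python
from collections import defaultdict

# Hypothetical sales rows: (region, city, quarter, amount). Illustrative data only.
SALES = [
    ("East", "Boston", "Q1", 120),
    ("East", "Boston", "Q2", 80),
    ("East", "New York", "Q1", 200),
    ("West", "Denver", "Q1", 50),
    ("West", "Seattle", "Q2", 90),
]

def summarize(rows, level):
    """Sum amounts grouped by the first `level` hierarchy columns (1 = region, 2 = region + city)."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[:level]] += row[3]
    return dict(totals)

def drill_down(rows, region):
    """Drill from one region's summary total into its city-level detail."""
    return summarize([r for r in rows if r[0] == region], level=2)

def crosstab(rows):
    """Crosstab: regions as rows, quarters as columns, summed amounts in the cells."""
    table = defaultdict(lambda: defaultdict(int))
    for region, _city, quarter, amount in rows:
        table[region][quarter] += amount
    return {region: dict(cols) for region, cols in table.items()}

print(summarize(SALES, level=1))  # {('East',): 400, ('West',): 140}
print(drill_down(SALES, "East"))  # {('East', 'Boston'): 200, ('East', 'New York'): 200}
print(crosstab(SALES))            # {'East': {'Q1': 320, 'Q2': 80}, 'West': {'Q1': 50, 'Q2': 90}}
```

The summary view is shown first; drilling down simply re-aggregates the same rows at a finer level of the hierarchy, restricted to the member the user selected.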
● Data warehouse shall preserve history information
● Underlying data mart designs with dimensions and measures
● Volumetric
● Indexes
● Data partitions
● Loading of data
● Quality of data
● Source of data
● Granularity of data
● Operational support
Data Modelling: Abstracting the individual data elements and how they
interact with one another
Dimensional Model [2]
Basic parts of a dimensional model: the dimensions and the facts
● Dimensions: support the business perspective of the data, and today's technology ensures that they can be effectively implemented
● Facts
The dimensional model is designed for:
1. Ease of use
2. Query performance
No Measure to
calculate
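A minimal sketch of the two basic parts, assuming a simple retail example: dimension tables hold the descriptive business perspective (product, date), and a fact table holds the numeric measures keyed by the dimensions' surrogate keys. It uses Python's built-in `sqlite3` module; every table name, column and row below is hypothetical.

```python
import sqlite3

# Hypothetical star schema: one fact table surrounded by two dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (
    product_key  INTEGER PRIMARY KEY,  -- surrogate key
    product_name TEXT,
    category     TEXT
);
CREATE TABLE dim_date (
    date_key INTEGER PRIMARY KEY,      -- e.g. 20240115
    month    TEXT
);
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    date_key    INTEGER REFERENCES dim_date(date_key),
    quantity    INTEGER,               -- measure
    amount      REAL                   -- measure
);
INSERT INTO dim_product VALUES (1, 'Laptop', 'Computers'), (2, 'Mouse', 'Accessories');
INSERT INTO dim_date VALUES (20240115, '2024-01'), (20240210, '2024-02');
INSERT INTO fact_sales VALUES
    (1, 20240115, 2, 2000.0),
    (2, 20240115, 5, 100.0),
    (1, 20240210, 1, 1000.0);
""")

# A business question phrased purely in dimension terms: sales amount by category and month.
rows = conn.execute("""
    SELECT p.category, d.month, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_key = f.product_key
    JOIN dim_date    AS d ON d.date_key    = f.date_key
    GROUP BY p.category, d.month
    ORDER BY p.category, d.month
""").fetchall()
print(rows)
# [('Accessories', '2024-01', 100.0), ('Computers', '2024-01', 2000.0), ('Computers', '2024-02', 1000.0)]
```

Every query follows the same shape (filter and group by dimension attributes, aggregate fact measures), which is one reason this layout tends to be easy to use and fast to query.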
Implementing Changing Dimensions (continued)[7]
Reasons for Type 1: Overwriting History
● Usually, changes relate to correction of errors in source systems
warehouse
Implementing Changing Dimensions (continued)
Original:
Type 1:
All product and company names are trademarks™ or registered® trademarks of their respective holders. Use of them does not imply any affiliation with or endorsement by them.
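A minimal sketch of a Type 1 change, assuming the dimension is held as a list of Python dataclass rows. The `CustomerRow` fields and the `apply_type1` helper are hypothetical. The attribute is simply overwritten: the surrogate key stays the same, and the old value is gone.

```python
from dataclasses import dataclass

@dataclass
class CustomerRow:
    customer_key: int  # surrogate key referenced by the fact table
    customer_id: str   # natural key from the source system
    name: str
    city: str

def apply_type1(dimension, customer_id, attribute, new_value):
    """Type 1: overwrite the attribute in place on the matching row(s); no history is kept."""
    for row in dimension:
        if row.customer_id == customer_id:
            setattr(row, attribute, new_value)
    return dimension

# Correcting a source-system error: the misspelled name is simply replaced.
dim = [CustomerRow(1, "C100", "Jon Smith", "Boston")]
apply_type1(dim, "C100", "name", "John Smith")
print(dim)  # [CustomerRow(customer_key=1, customer_id='C100', name='John Smith', city='Boston')]
```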
Implementing Changing Dimensions (continued)[7]
Type 2: Preserving History
● Predominant technique for supporting this requirement when it comes
Implementing Changing Dimensions (continued)[7]
Reasons for Type 2: Preserving History
● The changes usually reflect true changes in the source systems
● There is a need to preserve history in the data warehouse
● This type of change partitions the history in the data warehouse
● Every change for the same attribute must be preserved
Implementing Changing Dimensions (continued)
Original:
Type 2:
Create a new row with a new surrogate key that reflects the changes.
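A minimal sketch of a Type 2 change, assuming the dimension is a list of immutable dataclass rows. Besides the new surrogate key that the slide describes, it adds `effective_date`, `expiry_date` and `is_current` housekeeping columns. These columns are a common convention but an assumption here, as are the names `apply_type2` and `key_as_of`. The old row is expired rather than changed, so fact rows that point at it keep their original context.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import List, Optional

@dataclass(frozen=True)
class CustomerRow:
    customer_key: int            # surrogate key: a new one for every version
    customer_id: str             # natural key: shared by all versions
    city: str
    effective_date: date         # assumed housekeeping column
    expiry_date: Optional[date]  # assumed; None means "still current"
    is_current: bool             # assumed current-row flag

def apply_type2(dimension: List[CustomerRow], customer_id: str,
                new_city: str, change_date: date) -> List[CustomerRow]:
    """Type 2: expire the current row and append a new row with a new surrogate key."""
    next_key = max(r.customer_key for r in dimension) + 1
    result, new_rows = [], []
    for row in dimension:
        if row.customer_id == customer_id and row.is_current:
            # Close off the old version instead of overwriting it.
            result.append(replace(row, expiry_date=change_date, is_current=False))
            new_rows.append(replace(row, customer_key=next_key, city=new_city,
                                    effective_date=change_date, expiry_date=None,
                                    is_current=True))
        else:
            result.append(row)
    return result + new_rows

def key_as_of(dimension: List[CustomerRow], customer_id: str, on: date) -> Optional[int]:
    """Return the surrogate key of the version that was valid on date `on`."""
    for row in dimension:
        if (row.customer_id == customer_id and row.effective_date <= on
                and (row.expiry_date is None or on < row.expiry_date)):
            return row.customer_key
    return None

dim = [CustomerRow(1, "C100", "Boston", date(2020, 1, 1), None, True)]
dim = apply_type2(dim, "C100", "Chicago", date(2024, 3, 1))
print([(r.customer_key, r.city, r.is_current) for r in dim])
# [(1, 'Boston', False), (2, 'Chicago', True)]
print(key_as_of(dim, "C100", date(2023, 6, 1)), key_as_of(dim, "C100", date(2024, 6, 1)))
# 1 2
```

`key_as_of` shows how the change partitions history: facts dated before the change resolve to the old version (key 1), and later facts resolve to the new one (key 2).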
Implementing Changing Dimensions (continued)[7]
Type 3: Preserving a Version of History
● Places a value for the change in the original dimensional record
● Appropriate when there's a strong need to support two views of the world simultaneously
● Preserves the change
Implementing Changing Dimensions (continued)[7]
Reasons for Type 3: Preserving a Version of History
● The changes are usually “soft” or tentative changes in the source systems
● There is a need to keep track of history with both the old and new values of the changed attribute
● Used to compare performance across the transition
● Provides the ability to track forward and backward
Implementing Changing Dimensions (continued)
Original:
Type 3:
Add an “old” field in the dimension table for the affected attribute
Keep the new value of the attribute in the “current” field; a current effective date may also be added
No new dimension row is needed
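A minimal sketch of a Type 3 change, assuming a product dimension whose category attribute has been given an "old" field next to its "current" field. The `ProductRow` fields and the `apply_type3` helper are hypothetical. The same row, with the same surrogate key, holds both views, so no new dimension row is created.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class ProductRow:
    product_key: int                                 # surrogate key: unchanged by Type 3
    product_id: str                                  # natural key from the source system
    current_category: str                            # "current" field
    old_category: Optional[str] = None               # "old" field for the affected attribute
    current_effective_date: Optional[date] = None    # optional date of the change

def apply_type3(row: ProductRow, new_category: str, change_date: date) -> ProductRow:
    """Type 3: shift the current value into the "old" field, then store the new value."""
    row.old_category = row.current_category
    row.current_category = new_category
    row.current_effective_date = change_date
    return row

row = ProductRow(10, "P7", "Snacks")
apply_type3(row, "Health Foods", date(2024, 1, 1))
print(row.old_category, "->", row.current_category)  # Snacks -> Health Foods
```

Because there is only one "old" field, a second change would overwrite "Snacks". That is why Type 3 preserves a *version* of history rather than the full history that Type 2 keeps.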
Implementing Changing Dimensions (continued)
Original:
Type 1:
Type 2:
Type 3:
Steps in Developing Business Dimensional Model [2]
● Information Management
● Data Governance
● Data Security
● Data Ownership
● Data Dictionary
[2] Reeves, L. (2009). A Manager's Guide to Data Warehousing. John Wiley & Sons.
[3] Laberge, R. (2011). The Data Warehouse Mentor: Practical Data Warehouse and Business Intelligence Insights. McGraw-Hill/Osborne.
[4] Whitney, H. (2013). Data Insights: New Ways to Visualize and Make Sense of Data. Morgan Kaufmann Publishers.
[5] Ponniah, P. (2010). Data Warehousing Fundamentals for IT Professionals. 2nd edition. John Wiley & Sons.
[6] IBM Cognos 8 Business Intelligence (2008). IBM Corporation.
[7] Mohanty, S. (2006). Data Warehousing: Design, Development and Best Practices. Tata McGraw-Hill Publishing Company, India.