Chapter 13 - Data Warehouse
Chapter 13 - Data Warehouse
Coronel | Morris
Chapter 13
Business Intelligence and Data
Warehouses
©2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
Learning Objectives
▪ In this chapter, students will learn:
▪ How business intelligence provides a comprehensive
business decision support framework
▪ About business intelligence architecture, its evolution,
and reporting styles
▪ About the relationship and differences between
operational data and decision support data
▪ What a data warehouse is and how to prepare data for
one
©2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part. 2
Learning Objectives
▪ In this chapter, students will learn:
▪ What star schemas are and how they are constructed
▪ About data analytics, data mining, and predictive
analytics
▪ About online analytical processing (OLAP)
▪ How SQL extensions are used to support OLAP-type
data manipulations
©2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part. 3
Business Intelligence (BI)
▪ Comprehensive, cohesive, integrated set of tools and
processes
▪ Captures, collects, integrates, stores, and analyzes data
▪ Purpose - Generate and present information to
support business decision making
▪ Allows a business to transform:
▪ Data into information
▪ Information into knowledge
▪ Knowledge into wisdom
©2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part. 4
Figure 13.1 - Business Intelligence
Framework
©2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part. 5
Business Intelligence Tools
▪ Dashboards and business activity monitoring
▪ Dashboards: Shows key business performance
indicators in a single integrated view
▪ Portals: Integrate data using web browser from
multiple sources into a single webpage
▪ Data analysis and reporting tools
▪ Data-mining tools
▪ Data warehouses (DW)
▪ OLAP tools and data visualization
©2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part. 6
Practices to Manage Data
▪ Master data management (MDM): Collection of
concepts, techniques, and processes for identification,
definition, and management of data elements
▪ Governance: Method of government for controlling
business health and for consistent decision making
▪ Key performance indicators (KPI): Numeric or
scale-based measurements that assess company’s
effectiveness in reaching its goals
©2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part. 7
Practices to Manage Data
▪ Data visualization: Abstracting data to provide
information in a visual format
▪ Enhances the user’s ability to efficiently comprehend
the meaning of the data
▪ Techniques
▪ Pie charts and bar charts
▪ Line graphs
▪ Scatter plots
▪ Gantt charts
▪ Heat maps
©2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part. 8
Reporting Styles of a Modern BI
System
Monitoring and
Advanced reporting
alerting
Advanced data
analytics
©2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part. 9
Business Intelligence Benefits
Integrating architecture
©2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part. 10
Table 13.4 - Business Intelligence
Evolution
©2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part. 11
Table 13.4 - Business Intelligence
Evolution
©2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part. 12
Figure 13.3 - Evolution of BI
Information Dissemination Formats
©2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part. 13
Business Intelligence Technology
Trends
Data storage improvements
Personal analytics
©2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part. 14
Decision Support Data
▪ Effectiveness of BI depends on quality of data
gathered at operational level
▪ Operational data
▪ Seldom well-suited for decision support tasks
▪ Stored in relational database with highly normalized
structures
▪ Optimized to support transactions representing daily
operations
©2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part. 15
Decision Support Data
▪ Differ from operational data in:
▪ Time span
▪ Granularity
▪ Drill down: Decomposing a data to a lower level
▪ Roll up: Aggregating a data into a higher level
▪ Dimensionality
©2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part. 16
Table 13.5 - Contrasting Operational and
Decision Support Data Characteristics
©2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part. 17
Decision Support Database
Requirements
▪ Database schema
▪ Must support complex, non-normalized data
representations
▪ Data must be aggregated and summarized
▪ Queries must be able to extract multidimensional time
slices
©2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part. 18
Decision Support Database
Requirements
▪ Data extraction and loading
▪ Allow batch and scheduled data extraction
▪ Support different data sources and check for
inconsistent data or data validation rules
▪ Support advanced integration, aggregation, and
classification
▪ Database size should support:
▪ Very large databases (VLDBs)
▪ Advanced storage technologies
▪ Multiple-processor technologies
©2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part. 19
Table 13.8 - Characteristics of Data Warehouse
Data and Operational Database Data
©2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part. 20
Figure 13.5 - The ETL Process
©2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part. 21
Data Marts
▪ Small, single-subject data warehouse subset
▪ Provide decision support to a small group of people
▪ Benefits over data warehouses
▪ Lower cost and shorter implementation time
▪ Technologically advanced
▪ Inevitable people issues
©2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part. 22
Table 13.9 - Twelve Rules for a Data
Warehouse
©2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part. 23
Table 13.9 - Twelve Rules for a Data
Warehouse
©2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part. 24
Star Schema
▪ Data-modeling technique
▪ Maps multidimensional decision support data into a
relational database
▪ Creates the near equivalent of multidimensional
database schema from existing relational database
▪ Yields an easily implemented model for
multidimensional data analysis
©2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part. 25
Components of Star Schemas
Facts
Dimensions
Attributes
Attribute hierarchy
©2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part. 26
Star Schema Representation
▪ Facts and dimensions represented by physical tables
in data warehouse database
▪ Many-to-one (M:1) relationship between fact table
and each dimension table
▪ Fact and dimension tables
▪ Related by foreign keys
▪ Subject to primary and foreign key constraints
©2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part. 27
Star Schema Representation
• Primary key of a fact table
▪ Is a composite primary key because the fact table is
related to many dimension tables
▪ Always formed by combining the foreign keys
pointing to the related dimension tables
©2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part. 28
Techniques Used to Optimize Data
Warehouse Design
▪ Normalizing dimensional tables
▪ Snowflake schema: Dimension tables can have their
own dimension tables
▪ Maintaining multiple fact tables to represent different
aggregation levels
▪ Denormalizing fact tables
©2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part. 29
Techniques Used to Optimize Data
Warehouse Design
▪ Partitioning and replicating tables
▪ Partitioning: Splits tables into subsets of rows or
columns and places them close to customer location
▪ Replication: Makes copy of table and places it in a
different location
▪ Periodicity: Provides information about the time span
of the data stored in the table
©2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part. 30
Data Analytics
▪ Encompasses a wide range of mathematical,
statistical, and modeling techniques to extract
knowledge from data
▪ Subset of BI functionality
▪ Classification of tools
▪ Explanatory analytics: Focuses on discovering and
explaining data characteristics and relationships based
on existing data
▪ Predictive analytics: Focuses on predicting future
outcomes with a high degree of accuracy
©2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part. 31
Data Mining
▪ Analyzing massive amounts of data to:
▪ Uncover hidden trends, patterns, and relationships
▪ Form computer models to stimulate and explain the
findings
▪ Use the models to support business decision making
▪ Run in two modes
▪ Guided
▪ Automated
©2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part. 32
Figure 13. 15 - Extracting Knowledge
from Data
©2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part. 33
Figure 13. 16 - Data-Mining Phases
©2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part. 36
Multidimensional Data Analysis
Techniques
▪ Data are processed and viewed as part of a
multidimensional structure
▪ Augmenting functions
▪ Advanced data presentation functions
▪ Advanced data aggregation, consolidation, and
classification functions
▪ Advanced computational functions
▪ Advanced data-modeling functions
©2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part. 37
Advanced Database Support
▪ Advanced data access features
▪ Access to many different kinds of DBMSs, flat files,
and internal and external data sources
▪ Access to aggregated data warehouse data and to the
detail data found in operational databases
▪ Advanced data navigation features
▪ Rapid and consistent query response times
▪ Ability to map end-user requests to appropriate data
source and to proper data access language
▪ Support for very large databases
©2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part. 38
Easy-to-Use End-User Interface
▪ Proper implementation leads to simple navigation and
accelerated decision making or data analysis
▪ Advanced OLAP features are more useful when
access is kept simple
▪ Many interface features are borrowed from previous
generations of data analysis tools
©2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part. 39
Figure 13.19 - OLAP Architecture
©2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part. 40
Figure 13.20 - OLAP Server with Local
Miniature Data Marts
©2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part. 41
Relational Online Analytical
Processing (ROLAP)
▪ Provides OLAP functionality using relational
databases and familiar relational tools to store and
analyze multidimensional data
▪ Extensions added to traditional RDBMS technology
▪ Multidimensional data schema support within the
RDBMS
▪ Data access language and query performance optimized
for multidimensional data
▪ Support for very large databases (VLDBs)
©2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part. 42
Multidimensional Online Analytical
Processing (MOLAP)
▪ Extends OLAP functionality to multidimensional
database management systems (MDBMSs)
▪ MDBMS: Uses proprietary techniques store data in
matrix-like n-dimensional arrays
▪ End users visualize stored data as a 3D data cube
▪ Grow to n dimensions, becoming hypercubes
▪ Held in memory in a cube cache to speed access
▪ Sparsity: Measures the density of the data held in the
data cube
©2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part. 43
Table 13.12 - Relational vs.
Multidimensional OLAP
©2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part. 44
SQL Extensions for OLAP
The ROLLUP extension
©2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part. 45
Materialized View
▪ Dynamic table that contains SQL query command to
generate rows and stores the actual rows
▪ Created the first time query is run
▪ Summary rows are stored in the table
▪ Automatically updated when base tables are updated
▪ Requires specified privileges
©2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part. 46