An Overview of Data Warehousing and OLAP Technology: Presented by Manish Desai
• Introduction
• What is a data warehouse?
• Explanation of definition
• Data warehouse vs. operational database
• Data warehouse architecture
• Back end tools
• Conceptual model
• Database design
• Warehouse servers
• Index structures
• Metadata
• Conclusion
• References
Introduction
• Data warehousing and OLAP are essential elements of decision support
• They enable knowledge workers to make better and faster decisions
• Used in many industries, for example:
– Manufacturing (order shipment)
– Retail (inventory management)
– Financial services (claims and risk analysis)
• Every major database vendor offers products in this area
What is a Data Warehouse?
• A data warehouse is a “subject-oriented,
integrated, time-varying, non-volatile collection of
data that is used primarily in organizational
decision making”
• Typically maintained separately from operational
databases
Explanation of definition
• Subject-oriented:
– Designed around subjects such as customer, vendor,
product, and activity
– Does not include data that is not needed for the decision
support system (DSS)
• Integrated:
– The most important feature
– Consistent naming conventions, measurement of
variables, and so forth
– Data should be stored in a single, globally accepted
format
Explanation of definition (continued)
• Time-varying:
– All data in the warehouse is accurate as of
some moment in time
– Data is stored over a long time horizon (5–10 years)
– The key structure contains an element of time (implicitly or
explicitly)
– Data, once correctly recorded, cannot be updated
• Non-volatile:
– No updates of data are allowed
– Only two kinds of operations: loading data and accessing data
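The time-varying and non-volatile properties can be illustrated with a minimal sketch. The `Warehouse` class, its method names, and the customer-balance data are hypothetical and chosen only for illustration; the point is that time is part of the key and that rows are loaded and read but never updated.

```python
from datetime import date


class Warehouse:
    """Append-only store: rows can be loaded and read, never updated."""

    def __init__(self):
        # Key is (customer_id, snapshot_date): time is part of the key.
        self._rows = {}

    def load(self, customer_id, snapshot_date, balance):
        key = (customer_id, snapshot_date)
        if key in self._rows:
            # Non-volatile: an existing snapshot is never overwritten.
            raise ValueError(f"snapshot {key} already recorded")
        self._rows[key] = balance

    def as_of(self, customer_id, when):
        """Return the latest balance recorded on or before `when`, or None."""
        dates = [d for (c, d) in self._rows if c == customer_id and d <= when]
        if not dates:
            return None
        return self._rows[(customer_id, max(dates))]


# Hypothetical sample data: two monthly snapshots for one customer.
wh = Warehouse()
wh.load(42, date(2023, 1, 31), 100.0)
wh.load(42, date(2023, 2, 28), 250.0)
```

A query such as `wh.as_of(42, date(2023, 2, 15))` reads the January snapshot, because each row is accurate as of the moment it was recorded.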
Data Warehouse vs. Operational Database
Architecture
Back end tools and utilities
• Tools for data cleaning, loading, and refreshing
• Cleaning
– Data comes from multiple sources, so errors are likely
– Example: replace the string “sex” with “gender”
• Loading
– Builds indices, sorts data, and creates access paths
– Handles large amounts of data
• Incremental loading
• Only updated tuples are inserted; the process is hard to manage
• Refresh
– Propagates updates from the sources to the warehouse
– When to refresh?
– Set by the administrator, depending on user needs and traffic
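The cleaning and incremental loading steps above can be sketched as follows. This is one possible reading of the cleaning example, treating “sex” and “gender” as field names used by different sources; the function names, the `updated_at` field, and the gender codes are assumptions made for illustration.

```python
def clean(record):
    """Normalize field names and codes that differ between sources."""
    cleaned = dict(record)
    if "sex" in cleaned:
        # One source calls the field "sex"; the warehouse uses "gender".
        cleaned["gender"] = cleaned.pop("sex")
    codes = {"m": "M", "male": "M", "f": "F", "female": "F"}
    raw = str(cleaned.get("gender", "")).strip().lower()
    cleaned["gender"] = codes.get(raw, "U")  # "U" marks an unknown value
    return cleaned


def incremental_load(warehouse, source_rows, last_loaded_at):
    """Insert only rows changed since the last load; return the new mark."""
    newest = last_loaded_at
    for row in source_rows:
        if row["updated_at"] > last_loaded_at:
            warehouse.append(clean(row))
            newest = max(newest, row["updated_at"])
    return newest


# Hypothetical source rows; updated_at is a simple logical timestamp.
source = [
    {"id": 1, "sex": "male", "updated_at": 10},
    {"id": 2, "gender": "F", "updated_at": 20},
    {"id": 3, "sex": "f", "updated_at": 30},
]
warehouse = []
mark = incremental_load(warehouse, source, last_loaded_at=10)
```

Only rows 2 and 3 are inserted, because row 1 was already covered by the previous load. A refresh policy would then decide how often `incremental_load` is run.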
Conceptual Model and front end tools
• Multidimensional view
– The dimensions together uniquely determine the measure
– Example: a sales figure is determined by city, product, and date
– Each dimension is described by a set of attributes
– Example: the product dimension consists of
• Category of the product
• Industry of the product
• Year of introduction
• Front end tools
– Multidimensional spreadsheet
• Pivoting: reorients the multidimensional view
• Roll-up: summarizes data to a higher level
• Drill-down: goes from a high-level summary to a lower-level, more detailed one
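The operations above can be sketched on a tiny in-memory data set. The sales figures, the `DIMS` tuple, and the function names are hypothetical; a real front end tool would run these operations against a warehouse server rather than a Python list.

```python
from collections import defaultdict

# Each fact: the dimensions (city, product, date) determine the measure (sales).
DIMS = ("city", "product", "date")
facts = [
    ("Boston", "laptop", "2024-01-05", 1200),
    ("Boston", "phone", "2024-01-05", 800),
    ("Boston", "laptop", "2024-02-10", 1000),
    ("Austin", "laptop", "2024-01-07", 900),
    ("Austin", "phone", "2024-02-11", 700),
]


def roll_up(rows, keep):
    """Sum sales over every dimension that is not listed in `keep`."""
    idx = [DIMS.index(d) for d in keep]
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[i] for i in idx)] += row[-1]
    return dict(totals)


def pivot(totals):
    """Reorient a two-dimensional result by swapping its two keys."""
    return {(b, a): v for (a, b), v in totals.items()}


by_city = roll_up(facts, ["city"])                      # high-level summary
by_city_product = roll_up(facts, ["city", "product"])   # drill down one level
by_product_city = pivot(by_city_product)                # same cells, reoriented
```

Here drill-down is expressed as a roll-up that keeps one more dimension, so both directions share the same aggregation function.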
Database design
• Two ways to represent the multidimensional model
– Star schema
• The database consists of a single fact table and a single table for each
dimension
• Each tuple in the fact table holds a pointer (foreign key) to each of the
dimension tables
– Snowflake schema
• A refinement of the star schema
• The dimensional hierarchy is explicitly represented by normalizing
the dimension tables
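A star schema can be sketched with Python's built-in `sqlite3` module. The table names, column names, and rows below are hypothetical; the sketch only shows one fact table whose tuples point to one table per dimension.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One table per dimension.
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_city    (city_id    INTEGER PRIMARY KEY, name TEXT, state TEXT);

-- Single fact table: each tuple points to every dimension and holds the measure.
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    city_id    INTEGER REFERENCES dim_city(city_id),
    sale_date  TEXT,
    amount     REAL
);

INSERT INTO dim_product VALUES (1, 'laptop', 'computers'), (2, 'phone', 'mobile');
INSERT INTO dim_city    VALUES (1, 'Boston', 'MA'), (2, 'Austin', 'TX');
INSERT INTO fact_sales  VALUES (1, 1, '2024-01-05', 1200),
                               (2, 1, '2024-01-05', 800),
                               (1, 2, '2024-01-07', 900);
""")

# Total sales per product category: join the fact table to one dimension.
rows = conn.execute("""
    SELECT p.category, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON f.product_id = p.product_id
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()
```

In a snowflake variant of this sketch, `category` would move out of `dim_product` into its own table, and `dim_product` would reference it by key.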
Warehouse Servers
• Specialized SQL servers
– Provide advanced query language and query
processing support for SQL queries over star and
snowflake schemas
– Example: Red Brick
• ROLAP
– Sits between the relational back end and client front end tools
– Extends traditional relational servers to support
multidimensional queries
– Example: MicroStrategy
• MOLAP
– Multidimensional storage engine
– Front-end multidimensional queries map directly onto the storage layer
– Example: Essbase from Arbor Software
Index structures
• Bitmap indices
– Use a single bit to indicate a specific value of an attribute
– Example: instead of storing eight characters to record “engineer” as the
skill of an employee, use a single bit

id#    Name   Skill
1000   John   1

• Join indices
– Maintain the relationship between a foreign key and its
matching primary keys
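The bitmap index idea can be sketched with plain Python integers used as bit vectors. Apart from John (id 1000), the employee rows are hypothetical, and the function names are chosen for illustration; real servers store compressed bitmaps on disk.

```python
# Hypothetical employee table: (id, name, skill).
employees = [
    (1000, "John", "engineer"),
    (1001, "Maria", "analyst"),
    (1002, "Wei", "engineer"),
    (1003, "Asha", "manager"),
]


def build_bitmap_index(rows, column):
    """Map each distinct value to an int whose bit i is set if row i has it."""
    index = {}
    for position, row in enumerate(rows):
        value = row[column]
        index[value] = index.get(value, 0) | (1 << position)
    return index


def matching_rows(rows, bitmap):
    """Return the rows whose bit is set in `bitmap`."""
    return [row for position, row in enumerate(rows) if (bitmap >> position) & 1]


skill_index = build_bitmap_index(employees, column=2)

# A selection becomes a bitmap lookup; OR combines two predicates bitwise.
engineers = matching_rows(employees, skill_index["engineer"])
either = matching_rows(employees, skill_index["engineer"] | skill_index["manager"])
```

Combining predicates with bitwise AND and OR on the bitmaps, instead of comparing strings row by row, is what makes this kind of index attractive for attributes with few distinct values.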
Metadata and warehouse management
• Metadata is data about data
• Used for building, maintaining, managing, and using
the data warehouse
• Administrative metadata
– Information about setting up and using the warehouse
• Business metadata
– Business terms and definitions
• Operational metadata
– Information collected during operation of the warehouse
Conclusion
• Data warehousing is a technology for the future
• A data warehouse enables knowledge workers to
make faster and better decisions
References