Chapter 2. Introduction to Data Warehouse
Data Warehouse and DBMS, Architecture of Data Warehouse, Multidimensional data model
Concepts of OLAP and Data Cube
OLAP operations
Dimensional Data Modelling: Star and Snowflake schemas
What is a Data Warehouse
Subject-oriented:
Organized around major subjects, such as customer, supplier, product, sales, etc.
Provides a simple and concise view around particular subject issues by excluding data that is not useful in the decision process
Modeled in accordance with decision makers' needs, not transaction processing needs
Integrated:
Constructed by integrating multiple, heterogeneous data sources: relational databases, flat files, on-line transaction records
Data cleaning and data integration techniques are applied
Ensures consistency in naming conventions, encoding structures, attribute measures, etc. among different data sources
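As a minimal sketch of the consistency step above, assume two hypothetical sources that describe customers with different field names, encodings, and units. The source names, field names, and mappings below are illustrative assumptions, not taken from any real system.

```python
# Hypothetical records from two heterogeneous sources (all names and values are assumptions).
crm_rows = [{"cust_name": "Asha", "gender": "F", "balance_usd": 120.0}]
billing_rows = [{"customer": "Ravi", "sex": 1, "balance_cents": 4550}]


def from_crm(row):
    """Map a CRM record to the warehouse's agreed names, encodings, and units."""
    return {
        "customer_name": row["cust_name"],
        "gender": {"M": "male", "F": "female"}[row["gender"]],
        "balance_usd": float(row["balance_usd"]),
    }


def from_billing(row):
    """Map a billing record to the same target layout."""
    return {
        "customer_name": row["customer"],
        "gender": {1: "male", 0: "female"}[row["sex"]],  # different encoding structure
        "balance_usd": row["balance_cents"] / 100,       # different attribute measure
    }


# After integration, every row uses one naming convention, one encoding, one unit.
integrated = [from_crm(r) for r in crm_rows] + [from_billing(r) for r in billing_rows]
```

Each source gets its own small mapping function, so adding another source would not require touching the existing ones.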
Time-variant: The time horizon for the data warehouse is significantly longer than that of operational systems
Operational database: current value data
Data warehouse data: provides information from a historical perspective (e.g., past 5-10 years)
Every key structure in the data warehouse contains an element of time, explicitly or implicitly
Non-volatile: Operational update of data does not occur in the data warehouse environment
Does not require transaction processing, recovery, and concurrency control mechanisms
Requires only two operations in data accessing: initial loading of data and access of data
What is Data Warehousing
The major task of online operational database systems is to perform online transaction and query processing.
These systems are called online transaction processing (OLTP) systems.
They cover most of the day-to-day operations of an organization, such as purchasing, inventory, manufacturing, banking, payroll, registration, and accounting.
OLAP
Data warehouse systems, on the other hand, serve users or knowledge workers in the role of data analysis and
decision making.
Such systems can organize and present data in various formats in order to accommodate the diverse needs of
different users.
These systems are known as online analytical processing (OLAP) systems.
Distinct features (OLTP vs. OLAP):
OLAP Operations: Drill-through
Goes through the bottom level of the cube to its back-end relational tables (using SQL)
OLAP Operations: Drilling across
Executes queries involving (across) more than one fact table. The operation is often referred to as an "OLAP join".
Drilling across simply means making separate queries against two or more fact tables where the row headers of each query consist of identical conformed attributes. The answer sets from the queries are aligned by performing a sort-merge operation on the common dimension attribute row headers.
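The drill-across description above can be sketched as follows. This is only an illustration under stated assumptions: the two fact tables, their rows, and the conformed attribute (a product name) are hypothetical, and the helper names `summarize` and `drill_across` are made up for this example.

```python
from collections import defaultdict

# Hypothetical fact rows: (product, amount). "product" is the conformed attribute.
sales_fact = [("pen", 10), ("ink", 4), ("pen", 5)]
shipments_fact = [("ink", 3), ("pad", 7), ("pen", 12)]


def summarize(fact_rows):
    """One separate query per fact table: total per product, sorted by row header."""
    totals = defaultdict(int)
    for product, amount in fact_rows:
        totals[product] += amount
    return sorted(totals.items())


def drill_across(left, right):
    """Sort-merge two sorted answer sets on their common row header."""
    merged, i, j = [], 0, 0
    while i < len(left) or j < len(right):
        if j >= len(right) or (i < len(left) and left[i][0] < right[j][0]):
            merged.append((left[i][0], left[i][1], None))  # header only in left
            i += 1
        elif i >= len(left) or right[j][0] < left[i][0]:
            merged.append((right[j][0], None, right[j][1]))  # header only in right
            j += 1
        else:
            merged.append((left[i][0], left[i][1], right[j][1]))  # header in both
            i += 1
            j += 1
    return merged


# Each row: (product, total sold, total shipped); None marks a missing side.
report = drill_across(summarize(sales_fact), summarize(shipments_fact))
```

The two fact tables are never joined row by row; each is aggregated on its own and only the small answer sets are merged.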
Conceptual Modeling of Relational Data Model
Example:
Suppose we want to create an exam database. Students appear for exams. A student can attempt one or more exams, and one exam can be taken by many students.
To store this information we require a database. To design a database for this scenario, we have to identify the entities and the relationships among them.
Entities: Student and Exam
Relationship: Student appears for Exam
We can represent this structure using an ER model as:
Student (1) --- Appears --- (m) Exam
From the diagram it is clear that the database will contain data for students and exams.
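One possible relational realization of this example is sketched below using Python's built-in sqlite3 module. Because the text allows a student to attempt several exams and an exam to be taken by many students, the sketch stores the "appears" relationship in its own table. All column names and sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE exam    (exam_id    INTEGER PRIMARY KEY, subject TEXT);
-- The relationship "Student appears for Exam" becomes a linking table.
CREATE TABLE appears (
    student_id INTEGER REFERENCES student(student_id),
    exam_id    INTEGER REFERENCES exam(exam_id),
    PRIMARY KEY (student_id, exam_id)
);
""")

# Hypothetical sample data.
conn.executemany("INSERT INTO student VALUES (?, ?)", [(1, "Asha"), (2, "Ravi")])
conn.executemany("INSERT INTO exam VALUES (?, ?)", [(101, "Maths"), (102, "Physics")])
conn.executemany("INSERT INTO appears VALUES (?, ?)", [(1, 101), (1, 102), (2, 101)])

# Which students appeared for exam 101?
students_in_maths = conn.execute("""
    SELECT s.name
    FROM student s
    JOIN appears a ON a.student_id = s.student_id
    WHERE a.exam_id = 101
    ORDER BY s.name
""").fetchall()
```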
Conceptual Modelling of Data Warehouses
Design Decisions
Choosing the process: selecting the subjects for the first set of logical structures to be designed
  Ex. dimensions: time, location, item, branch
Choosing the facts: selecting the metrics or units of measurement to be included
  Ex. facts: quantity of sales, sales amount
Choosing the duration of the database: determining how far back in time one should go for historical data
[Figure: a central Facts table surrounded by dimension tables (Dimension 1, Dimension 2, Dimension 4, Dimension 5, ...)]
A star schema
In a star schema, the data warehouse contains:
1. a large central table (fact table) containing the bulk of the data, with no redundancy
2. a set of smaller attendant tables (dimension tables), one for each dimension.
Example of Star Schema
Sales Fact Table
  Dimension keys: time_key, item_key, branch_key, location_key
  Measures: units_sold, dollars_sold, avg_sales
time dimension table: time_key, day, day_of_the_week, month, quarter, year
item dimension table: item_key, item_name, brand, type, supplier_type
branch dimension table: branch_key, branch_name, branch_type
location dimension table: location_key, street, city, state_or_province, country
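To show how such a schema might be queried, the sketch below builds a cut-down version of it in an in-memory SQLite database and totals the measures by quarter and brand. It keeps only two of the four dimensions and a few of their attributes. The table names `time_dim`, `item_dim`, and `sales_fact`, and all sample rows, are assumptions for illustration; the column names follow the example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time_dim (time_key INTEGER PRIMARY KEY, quarter TEXT, year INTEGER);
CREATE TABLE item_dim (item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT);
-- The fact table holds only dimension keys and measures.
CREATE TABLE sales_fact (
    time_key     INTEGER REFERENCES time_dim(time_key),
    item_key     INTEGER REFERENCES item_dim(item_key),
    units_sold   INTEGER,
    dollars_sold REAL
);
""")

# Hypothetical sample rows.
conn.executemany("INSERT INTO time_dim VALUES (?, ?, ?)",
                 [(1, "Q1", 2016), (2, "Q2", 2016)])
conn.executemany("INSERT INTO item_dim VALUES (?, ?, ?)",
                 [(1, "pen", "Acme"), (2, "ink", "Acme"), (3, "pad", "Zed")])
conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?)",
                 [(1, 1, 10, 20.0), (1, 2, 5, 15.0), (1, 3, 2, 8.0), (2, 1, 4, 8.0)])

# Join the fact table to its dimensions and aggregate the measures.
sales_by_quarter_and_brand = conn.execute("""
    SELECT t.quarter, i.brand, SUM(f.units_sold), SUM(f.dollars_sold)
    FROM sales_fact f
    JOIN time_dim t ON t.time_key = f.time_key
    JOIN item_dim i ON i.item_key = f.item_key
    GROUP BY t.quarter, i.brand
    ORDER BY t.quarter, i.brand
""").fetchall()
```

Every query of this kind has the same shape: the fact table in the middle, one join per dimension used, and a GROUP BY on the chosen dimension attributes.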
STAR SCHEMA KEYS
• Avoid built-in meanings in the primary key of the dimension table
• Do not use operational system keys as primary keys
  • Operational system keys contain built-in meaning
  • The keys may be reassigned, thus giving wrong aggregate values
• Use keys which are system-generated sequence numbers
• The operational system keys can be stored as attributes in the dimension table
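The key guidelines above can be illustrated with a small sketch. It assumes a hypothetical product dimension whose operational keys (such as "WH1-0042") carry built-in meaning. The class name, field names, and key formats are invented for this example.

```python
from itertools import count


class DimensionKeyGenerator:
    """Hands out meaningless sequence numbers as dimension primary keys."""

    def __init__(self):
        self._next_key = count(1)          # system-generated sequence: 1, 2, 3, ...
        self._by_operational_key = {}

    def surrogate_for(self, operational_key):
        """Return the surrogate key for an operational key, creating one if needed."""
        if operational_key not in self._by_operational_key:
            self._by_operational_key[operational_key] = next(self._next_key)
        return self._by_operational_key[operational_key]


keys = DimensionKeyGenerator()
product_dim = []
for operational_key, name in [("WH1-0042", "pen"), ("WH2-0007", "ink")]:
    product_dim.append({
        "product_key": keys.surrogate_for(operational_key),  # primary key, no meaning
        "operational_key": operational_key,                  # kept as a plain attribute
        "product_name": name,
    })
```

Fact rows would reference `product_key` only, so a change to the operational key format would touch a single attribute in the dimension table rather than the fact table.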
Advantages of Star Schema
Advantages of Snowflake Schema
• Small savings in storage space.
• Normalized structures are easier to update and maintain