Data Warehouse 1
Subject Oriented: A data warehouse can be used to analyze a particular subject
area. For example, "Sales" can be a particular subject.
Integrated: A data warehouse integrates data from multiple data sources.
Time Variant: Historical data is kept in a data warehouse. For example, one can
retrieve data from 3 months, 6 months, or 12 months ago, or even older data,
from a data warehouse.
Non-Volatile: Once data is in the data warehouse, it will not change. So, historical
data in a data warehouse should never be altered.
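The time-variant and non-volatile properties can be sketched as an append-only history table. This is only a minimal illustration using Python's built-in sqlite3 module. The table customer_city_history, its columns, the helper city_as_of, and the sample rows are all hypothetical names and data invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical history table: rows are only ever appended, never updated.
conn.execute(
    "CREATE TABLE customer_city_history ("
    " customer_id INTEGER, city TEXT, valid_from TEXT)"
)

# Each change in the source system arrives as a new row with its own date.
conn.executemany(
    "INSERT INTO customer_city_history VALUES (?, ?, ?)",
    [
        (1, "Pune", "2023-01-01"),
        (1, "Delhi", "2023-07-01"),  # the customer moved; the old row stays
    ],
)


def city_as_of(customer_id, as_of_date):
    """Return the city recorded for the customer on the given ISO date."""
    row = conn.execute(
        "SELECT city FROM customer_city_history"
        " WHERE customer_id = ? AND valid_from <= ?"
        " ORDER BY valid_from DESC LIMIT 1",
        (customer_id, as_of_date),
    ).fetchone()
    return row[0] if row else None


print(city_as_of(1, "2023-03-15"))  # state before the move
print(city_as_of(1, "2023-12-31"))  # state after the move
```

Because the older row is never overwritten, a query for an earlier date still returns the value that was valid at that time.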
Data Warehouse:
Within a data warehouse, data from several systems will typically merge to
present a global enterprise view.
Data warehouses also typically keep a very long history, from several years to
the entire life of the company, so that very long-term trends can be viewed.
This matters because historical data is the backbone of mission-critical
business decisions.
The question now is:
Why do Business Intelligence systems use a data warehouse rather than a
normal database to pull historical data?
What is the difference between a database and a data warehouse, when both of
them have tables with data, indexes, constraints, etc.?
Normal Database:
Used for Online Transaction Processing (OLTP). It records the data coming from
users, which later serves as history.
The tables and joins are complex since they are normalized. This is done to
reduce redundant data and to save storage space.
Entity-Relationship (ER) modeling techniques are used for database design.
Optimized for write operations.
Performance is low for analysis queries.
Data Warehouse:
Used for Online Analytical Processing (OLAP). It reads historical data to
support users' business decisions.
The tables and joins are simple since they are de-normalized. This is done to
reduce the response time for analytical queries.
Dimensional modeling techniques are used for data warehouse design.
Optimized for read operations.
High performance for analytical queries.
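The trade-off between the two designs can be sketched with a small in-memory example. This is an illustration under stated assumptions, not a benchmark: it uses Python's built-in sqlite3 module, and the tables (city, customer, orders, sales_flat) and their rows are hypothetical. The same analytical question is answered once against normalized tables with joins and once against a de-normalized table with no joins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

conn.executescript("""
-- Normalized (OLTP-style) tables: each value is stored once.
CREATE TABLE city (city_id INTEGER PRIMARY KEY, city_name TEXT);
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT,
                       city_id INTEGER REFERENCES city(city_id));
CREATE TABLE orders (order_id INTEGER PRIMARY KEY,
                     customer_id INTEGER REFERENCES customer(customer_id),
                     amount REAL);
INSERT INTO city VALUES (1, 'Pune'), (2, 'Delhi');
INSERT INTO customer VALUES (10, 'Asha', 1), (11, 'Ravi', 2);
INSERT INTO orders VALUES (100, 10, 250.0), (101, 10, 50.0), (102, 11, 75.0);

-- De-normalized (warehouse-style) table: the city name is repeated per row.
CREATE TABLE sales_flat AS
SELECT o.order_id, c.name AS customer_name, ci.city_name, o.amount
FROM orders o
JOIN customer c ON c.customer_id = o.customer_id
JOIN city ci ON ci.city_id = c.city_id;
""")

# Revenue per city from the normalized tables needs two joins.
normalized_query = """
SELECT ci.city_name, SUM(o.amount)
FROM orders o
JOIN customer c ON c.customer_id = o.customer_id
JOIN city ci ON ci.city_id = c.city_id
GROUP BY ci.city_name ORDER BY ci.city_name
"""

# The same question against the flat table needs no join at all.
flat_query = """
SELECT city_name, SUM(amount)
FROM sales_flat
GROUP BY city_name ORDER BY city_name
"""

revenue_normalized = conn.execute(normalized_query).fetchall()
revenue_flat = conn.execute(flat_query).fetchall()
print(revenue_normalized)
print(revenue_flat)
```

Both queries return the same totals. The flat table pays for its simpler reads with redundant storage, which is the trade-off the comparison above describes.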
Data Mart vs. Data Warehouse:
Data for a data mart is derived from a data warehouse or from source systems.
A data mart stores departmental data (a single subject).
A DWH stores enterprise data (an integration of multiple subjects).
A data mart is designed for middle management.
A DWH is designed for top management access.
A Data Warehouse (DWH) is a single organizational repository of enterprise-wide
data across many or all subject areas. A DWH incorporates information
about many subject areas (HR, Sales, Marketing), often the entire
enterprise. A data mart represents only a portion of an enterprise's data,
perhaps the data related to one department or function (e.g., HR, Sales, or
Marketing).
The ultimate goal of any integrated information system, whether it is a data
mart or a DWH, is to provide consistent, accurate data about the organization
to its users.
Department-focused data marts (e.g., HR) hold only the information that the
group needs.
Each department has its own specific uses for its data mart, which often ignore
the information needs of other areas.
Typically, a data mart's data is targeted to a small audience of end users.
A data mart is typically easier to build than an enterprise-wide DWH.
A data mart can be implemented quickly and offers fast access for its users.
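Deriving a data mart from a warehouse can be sketched as selecting a single subject area out of an enterprise-wide table. The following is a minimal illustration with Python's built-in sqlite3 module. The table names dwh_metrics and hr_mart, the column names, and the sample values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

conn.executescript("""
-- Hypothetical warehouse table covering several subject areas.
CREATE TABLE dwh_metrics (subject_area TEXT, metric TEXT,
                          period TEXT, value REAL);
INSERT INTO dwh_metrics VALUES
  ('HR', 'headcount', '2024-01', 120),
  ('HR', 'headcount', '2024-02', 125),
  ('Sales', 'revenue', '2024-01', 50000),
  ('Marketing', 'campaign_spend', '2024-01', 8000);

-- The HR data mart keeps only the rows that department needs.
CREATE TABLE hr_mart AS
SELECT metric, period, value
FROM dwh_metrics
WHERE subject_area = 'HR';
""")

warehouse_subjects = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT subject_area FROM dwh_metrics ORDER BY subject_area"
    )
]
hr_rows = conn.execute(
    "SELECT metric, period, value FROM hr_mart ORDER BY period"
).fetchall()

print(warehouse_subjects)  # every subject area held by the warehouse
print(hr_rows)             # only the HR rows reach the data mart
```

A mart fed directly from source systems would replace the SELECT with its own load step. The filtering idea stays the same.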
Star Schema:
A star schema is a database design that contains a centrally located "FACT"
table, which is surrounded by "DIMENSION" tables. Because the design looks like
a star, it is called a star schema.
A star schema can be simple or complex.
A simple star consists of one fact table; a complex star can have more than one
fact table.
For example, assume our data warehouse keeps store sales data, and the
dimensions are time, store, product, and customer. In this case, the figure shown in the
above slide represents the star schema. The lines between two tables indicate
a primary key / foreign key relationship between them.
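The store sales example can be sketched as table definitions. This is a minimal illustration with Python's built-in sqlite3 module, assuming one fact table and the four dimensions named above. All table names, column names, and sample rows (fact_sales, dim_time, and so on) are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

conn.executescript("""
-- Dimension tables: one per point of the star.
CREATE TABLE dim_time (time_id INTEGER PRIMARY KEY, month TEXT);
CREATE TABLE dim_store (store_id INTEGER PRIMARY KEY, store_name TEXT);
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, customer_name TEXT);

-- Central fact table: a foreign key to every dimension plus one measure.
CREATE TABLE fact_sales (
    time_id INTEGER REFERENCES dim_time(time_id),
    store_id INTEGER REFERENCES dim_store(store_id),
    product_id INTEGER REFERENCES dim_product(product_id),
    customer_id INTEGER REFERENCES dim_customer(customer_id),
    sales_amount REAL
);

INSERT INTO dim_time VALUES (1, '2024-01'), (2, '2024-02');
INSERT INTO dim_store VALUES (1, 'Downtown'), (2, 'Airport');
INSERT INTO dim_product VALUES (1, 'Widget'), (2, 'Gadget');
INSERT INTO dim_customer VALUES (1, 'Asha'), (2, 'Ravi');
INSERT INTO fact_sales VALUES
  (1, 1, 1, 1, 100.0),
  (1, 2, 2, 2, 40.0),
  (2, 1, 2, 1, 60.0);
""")

# A typical star query: join the fact table to the dimension it needs.
sales_by_store = conn.execute("""
SELECT s.store_name, SUM(f.sales_amount)
FROM fact_sales f
JOIN dim_store s ON s.store_id = f.store_id
GROUP BY s.store_name
ORDER BY s.store_name
""").fetchall()

print(sales_by_store)
```

Each REFERENCES clause corresponds to one line between the fact table and a dimension table in a star diagram.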
Snowflake Schema:
The snowflake schema is an extension of the star schema, where each point of
the star explodes into more points.
In a star schema, each dimension is represented by a single dimension table,
whereas in a snowflake schema, that dimension table is normalized into
multiple tables, each representing a level in the dimensional hierarchy.
For example, consider a Time dimension with the hierarchy Year > Month > Day.
The snowflake schema diagram above then has 3 tables: a table for year, a
table for month, and a table for day. Year is connected to Month, which is then
connected to Day.
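The three-table time hierarchy can be sketched as follows. This is a minimal illustration with Python's built-in sqlite3 module. The table names (dim_year, dim_month, dim_day), the column names, and the sample dates are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

conn.executescript("""
-- One table per level of the hierarchy; each level points to its parent.
CREATE TABLE dim_year (year_id INTEGER PRIMARY KEY, year_number INTEGER);
CREATE TABLE dim_month (month_id INTEGER PRIMARY KEY, month_number INTEGER,
                        year_id INTEGER REFERENCES dim_year(year_id));
CREATE TABLE dim_day (day_id INTEGER PRIMARY KEY, day_number INTEGER,
                      month_id INTEGER REFERENCES dim_month(month_id));

INSERT INTO dim_year VALUES (1, 2024);
INSERT INTO dim_month VALUES (1, 1, 1), (2, 2, 1);
INSERT INTO dim_day VALUES (1, 15, 1), (2, 3, 2);
""")

# Rebuilding a full date means walking Day -> Month -> Year through joins.
calendar = conn.execute("""
SELECT y.year_number, m.month_number, d.day_number
FROM dim_day d
JOIN dim_month m ON m.month_id = d.month_id
JOIN dim_year y ON y.year_id = m.year_id
ORDER BY y.year_number, m.month_number, d.day_number
""").fetchall()

print(calendar)
```

In a star schema the same information would sit in a single time dimension table, with year and month repeated on every day row.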
Fact Table:
A fact table is the central table in a star schema of a data warehouse.
A fact table consists of facts of a particular business process e.g., sales revenue
by month by product.
A fact table contains numerical measures, together with the foreign keys that
link it to the dimension tables.
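The "sales revenue by month by product" example can be sketched as an aggregation over a fact table. This is a minimal illustration with Python's built-in sqlite3 module. The fact_sales table, its columns, and the sample rows are hypothetical, and the month is stored directly on the fact row only to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

conn.executescript("""
-- Hypothetical fact table with one row per sale and a numeric measure.
CREATE TABLE fact_sales (month TEXT, product_id INTEGER, revenue REAL);
INSERT INTO fact_sales VALUES
  ('2024-01', 1, 100.0),
  ('2024-01', 1, 25.0),
  ('2024-01', 2, 40.0),
  ('2024-02', 1, 60.0);
""")

# Summing the measure per month and product gives the business fact.
revenue_by_month_product = conn.execute("""
SELECT month, product_id, SUM(revenue)
FROM fact_sales
GROUP BY month, product_id
ORDER BY month, product_id
""").fetchall()

for row in revenue_by_month_product:
    print(row)
```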