
Data Warehouse 1

A data warehouse is a database used for reporting and analysis rather than transaction processing. It contains a subject-oriented, integrated, non-volatile collection of data from multiple sources to support management decision making. Data is organized using dimensional modeling with fact and dimension tables. A data warehouse uses a star or snowflake schema and contains historical data to analyze trends over time for reporting purposes.

Uploaded by Pramodh Gowda
Copyright © All Rights Reserved

Data Warehouse

What is a Data Warehouse?

A data warehouse is a database used for reporting and analysis.

A data warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision-making process.

• Subject-Oriented: A data warehouse is organized around subject areas that can be analyzed. For example, "Sales" can be a particular subject.
• Integrated: A data warehouse integrates data from multiple data sources.
• Time-Variant: Historical data is kept in a data warehouse. For example, one can retrieve data from 3 months, 6 months, or 12 months ago, or even older data.
• Non-Volatile: Once data is in the data warehouse, it does not change, so historical data in a data warehouse should never be altered.
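The time-variant and non-volatile properties can be illustrated with a minimal Python sketch using the standard-library `sqlite3` module. It assumes a hypothetical `customer_snapshot` table. Every load is appended with a snapshot date, and triggers reject updates and deletes. The table, trigger, and function names and the sample rows are made up for illustration and are not taken from any particular warehouse product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer_snapshot (
  snapshot_date TEXT NOT NULL,   -- ISO 'YYYY-MM-DD', so text order = date order
  customer_id   INTEGER NOT NULL,
  city          TEXT NOT NULL,
  PRIMARY KEY (snapshot_date, customer_id)
);
-- Non-volatile: reject any attempt to change or remove loaded rows.
CREATE TRIGGER snapshot_no_update BEFORE UPDATE ON customer_snapshot
BEGIN SELECT RAISE(ABORT, 'warehouse rows are read-only'); END;
CREATE TRIGGER snapshot_no_delete BEFORE DELETE ON customer_snapshot
BEGIN SELECT RAISE(ABORT, 'warehouse rows are read-only'); END;
""")

def load_snapshot(snapshot_date, rows):
    """Append one load; a changed source value becomes a new row, not an overwrite."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO customer_snapshot VALUES (?, ?, ?)",
            [(snapshot_date, cid, city) for cid, city in rows],
        )

def city_as_of(customer_id, as_of_date):
    """Time-variant query: the city in the latest snapshot on or before as_of_date."""
    row = conn.execute(
        "SELECT city FROM customer_snapshot"
        " WHERE customer_id = ? AND snapshot_date <= ?"
        " ORDER BY snapshot_date DESC LIMIT 1",
        (customer_id, as_of_date),
    ).fetchone()
    return row[0] if row else None

# Made-up sample loads: customer 1 moves between the two loads.
load_snapshot("2024-01-31", [(1, "Mysuru"), (2, "Pune")])
load_snapshot("2024-04-30", [(1, "Bengaluru"), (2, "Pune")])

print(city_as_of(1, "2024-02-15"))  # old value is still queryable
print(city_as_of(1, "2024-05-01"))  # latest value

try:
    with conn:
        conn.execute("UPDATE customer_snapshot SET city = 'X' WHERE customer_id = 1")
except sqlite3.DatabaseError as exc:
    print("rejected:", exc)  # the trigger blocks in-place changes
```

Old rows are never overwritten, so a question about any past date can still be answered from the stored history.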

Difference between DWH and Web Application Project Life Cycles

• Traditional projects start with requirements and end with data.
• Data warehousing projects start with data and end with requirements.
• Basically, a database is any system that keeps data in table format.
• A data warehouse is a specially set-up database designed to hold large amounts of data for reporting purposes.

Types of Databases:

• Normal Database: A normal database is optimized for transactional activity and keeps a relatively small amount of data.

Data Warehouse:

A data warehouse is optimized for large-scale reporting.

• Within a data warehouse, data from several systems is typically merged to present a global, enterprise-wide view.
• Data warehouses also typically keep a very long history, from several years up to the entire life of the company, so that very long-term trends can be viewed.
• This matters because historical data is the backbone of mission-critical business decisions.
Now the question is:

• Why do Business Intelligence systems use a data warehouse rather than a normal database to pull historical data?
• What is the difference between a database and a data warehouse, when both of them have tables with data, indexes, constraints, etc.?

Here are the differences:

Normal Database:
• Used for Online Transaction Processing (OLTP). It records the data entered by users as transactions happen.
• The tables and joins are complex because they are normalized. This is done to reduce redundant data and to save storage space.
• Entity-Relationship (ER) modeling techniques are used for database design.
• Optimized for write operations.
• Performance is low for analytical queries.

Data warehouse:
• Used for Online Analytical Processing (OLAP). It reads historical data to support users' business decisions.
• The tables and joins are simple because they are de-normalized. This is done to reduce the response time for analytical queries.
• Dimensional modeling techniques are used for data warehouse design.
• Optimized for read operations.
• High performance for analytical queries.
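The following sketch uses Python's built-in `sqlite3` module to show the normalized versus de-normalized contrast. The schema and its rows are hypothetical. It has normalized `category`, `product`, `orders`, and `order_line` tables, plus a flattened `sales_flat` table. The normalized design needs joins to answer an analytical question, while the de-normalized table answers the same question with a single scan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- OLTP-style normalized tables (hypothetical): each fact stored once, linked by keys.
CREATE TABLE category   (category_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE product    (product_id INTEGER PRIMARY KEY, name TEXT,
                         category_id INTEGER REFERENCES category(category_id));
CREATE TABLE orders     (order_id INTEGER PRIMARY KEY, order_date TEXT);
CREATE TABLE order_line (order_id INTEGER REFERENCES orders(order_id),
                         product_id INTEGER REFERENCES product(product_id),
                         amount REAL);

-- Warehouse-style de-normalized table: descriptive attributes copied onto each row.
CREATE TABLE sales_flat (order_date TEXT, product_name TEXT,
                         category_name TEXT, amount REAL);

INSERT INTO category   VALUES (1, 'Books'), (2, 'Toys');
INSERT INTO product    VALUES (10, 'Novel', 1), (20, 'Puzzle', 2);
INSERT INTO orders     VALUES (100, '2024-01-05'), (101, '2024-01-06');
INSERT INTO order_line VALUES (100, 10, 250.0), (100, 20, 400.0), (101, 10, 300.0);

-- Populate the flat table from the normalized one (a tiny "load" step).
INSERT INTO sales_flat
SELECT o.order_date, p.name, c.name, ol.amount
FROM order_line ol
JOIN orders o   ON o.order_id = ol.order_id
JOIN product p  ON p.product_id = ol.product_id
JOIN category c ON c.category_id = p.category_id;
""")

# Normalized: revenue by category needs joins across several tables.
normalized = conn.execute("""
SELECT c.name, SUM(ol.amount) FROM order_line ol
JOIN product p  ON p.product_id = ol.product_id
JOIN category c ON c.category_id = p.category_id
GROUP BY c.name ORDER BY c.name
""").fetchall()

# De-normalized: the same answer from one table, with no joins.
denormalized = conn.execute(
    "SELECT category_name, SUM(amount) FROM sales_flat"
    " GROUP BY category_name ORDER BY category_name"
).fetchall()

print(normalized)    # [('Books', 550.0), ('Toys', 400.0)]
print(denormalized)  # same result
```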

What is a Data Mart?

A data mart (DM) is a specific, subject-oriented repository of data designed to answer specific questions for a specific set of users. An organization can therefore have multiple data marts serving the needs of Sales, Marketing, etc. A data mart is usually organized as one dimensional model, such as a star schema (OLAP cube) made of a fact table and multiple dimension tables.
Data Mart Examples: Finance, Sales, Marketing
• The data comes from the operational information that a particular group of employees needs for analysis and presentations, all in terms that are familiar to them.

• Data for a data mart is derived from a data warehouse or from source systems.
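This rough sketch shows a data mart derived from a data warehouse, using Python `sqlite3` in memory. It assumes a hypothetical enterprise warehouse with a sales table and a payroll table. It builds a Sales data mart that keeps only the sales subject, summarized by month and region. Table names, columns, and sample values are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical enterprise warehouse holding several subject areas.
CREATE TABLE dwh_sales   (sale_date TEXT, region TEXT, product TEXT, amount REAL);
CREATE TABLE dwh_payroll (pay_date TEXT, department TEXT, employee_id INTEGER, salary REAL);

INSERT INTO dwh_sales VALUES
  ('2024-01-10', 'South', 'Novel',  250.0),
  ('2024-01-12', 'North', 'Puzzle', 400.0),
  ('2024-02-03', 'South', 'Puzzle', 150.0);
INSERT INTO dwh_payroll VALUES ('2024-01-31', 'HR', 7, 50000.0);

-- Sales data mart: only the Sales subject, pre-summarized by month and region
-- for the sales team. Payroll data is deliberately left out.
CREATE TABLE mart_sales_monthly AS
SELECT substr(sale_date, 1, 7) AS sale_month, region, SUM(amount) AS revenue
FROM dwh_sales
GROUP BY sale_month, region;
""")

rows = conn.execute(
    "SELECT sale_month, region, revenue FROM mart_sales_monthly"
    " ORDER BY sale_month, region"
).fetchall()
print(rows)
```

The mart is smaller and shaped for one department's questions, which is why it is quicker to build and query than the full warehouse.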
Data Mart vs. Data Warehouse:
• A data mart stores department data (a single subject).
• A DWH stores enterprise data (an integration of multiple subjects).
• A data mart is designed for middle management.
• A DWH is designed for top management access.
• A data warehouse (DWH) is a single organizational repository of enterprise-wide data across many or all subject areas. A DWH incorporates information about many subject areas (HR, Sales, Marketing), often the entire enterprise. A data mart represents only a portion of an enterprise's data, perhaps data related to one department or function (e.g., HR, Sales, or Marketing).
• The ultimate goal of any integrated information system, whether it is a data mart or a DWH, is to provide consistent, accurate data about the organization to its users.
• Department-focused data marts (e.g., HR) hold only the information that the group needs.
• Each department has its own specific uses for its data mart, which often ignore the information needs of other areas.
• Typically, a data mart's data is targeted at a small audience of end users.
• A data mart is typically easier to build than an enterprise-wide DWH.
• A data mart can be implemented quickly and offers fast access for its users.

OLTP vs. OLAP (DWH):

Features of OLTP (Online Transaction Processing) systems:

• OLTP systems handle the day-to-day transactions and operations of the business.
• OLTP systems store, update, and retrieve operational data. Operational data is the data that runs the business.
• OLTP systems are highly normalized.

E.g., accounting systems, banking applications, payroll systems, Order Management Systems (OMS), airline reservation systems, etc.
Property         OLTP                       OLAP (Data Warehouse)
Response Time    Seconds                    Seconds to minutes
Nature of Data   30–60 days or 1–2 years    Months, quarters, decades
Deals with       Current data               Historical data
Size             MB to GB                   GB to TB
Activities       Transaction processing     Analysis
No. of Records   One record at a time       Thousands to millions of records
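The access patterns in the table can be contrasted with a minimal sketch in Python's `sqlite3`. It assumes a hypothetical banking schema (`account`, `txn`). The `transfer` function is OLTP-style work that touches a few rows by key inside one atomic transaction. The `monthly_outflow` function is OLAP-style work that scans and aggregates many rows. Both run on one in-memory database purely for comparison; in practice the analytical query would run against the data warehouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (account_id INTEGER PRIMARY KEY, balance REAL NOT NULL)")
conn.execute("CREATE TABLE txn (txn_id INTEGER PRIMARY KEY, account_id INTEGER,"
             " txn_date TEXT, amount REAL)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 1000.0), (2, 500.0)])
conn.commit()

def transfer(src, dst, amount, txn_date):
    """OLTP-style: a few keyed writes that must succeed or fail together."""
    with conn:  # commits all four statements together, or rolls all of them back
        conn.execute("UPDATE account SET balance = balance - ? WHERE account_id = ?", (amount, src))
        conn.execute("UPDATE account SET balance = balance + ? WHERE account_id = ?", (amount, dst))
        conn.execute("INSERT INTO txn (account_id, txn_date, amount) VALUES (?, ?, ?)",
                     (src, txn_date, -amount))
        conn.execute("INSERT INTO txn (account_id, txn_date, amount) VALUES (?, ?, ?)",
                     (dst, txn_date, amount))

def monthly_outflow():
    """OLAP-style: read many historical rows and aggregate them by month."""
    return conn.execute(
        "SELECT substr(txn_date, 1, 7) AS month, SUM(-amount) FROM txn"
        " WHERE amount < 0 GROUP BY month ORDER BY month"
    ).fetchall()

transfer(1, 2, 100.0, "2024-01-15")
transfer(2, 1, 50.0, "2024-01-20")
transfer(1, 2, 25.0, "2024-02-02")

print(conn.execute("SELECT balance FROM account WHERE account_id = 1").fetchone())  # (925.0,)
print(monthly_outflow())  # [('2024-01', 150.0), ('2024-02', 25.0)]
```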
Data Modeling:
Data modeling is the process of designing a database as a set of tables.

A DWH is designed with the following types of schemas:

• Star Schema
• Snowflake Schema
The database architect (or data modeler) is responsible for designing the DWH.

Dimensional Data Model:

• The dimensional data model is most often used in data warehousing systems.
• In designing data models for data warehouses / data marts, the most commonly used schema types are the star schema and the snowflake schema.
• Whether one uses a star or a snowflake largely depends on business needs.

Star Schema:
• A star schema is a database design with a centrally located "FACT" table surrounded by "DIMENSION" tables. Because the design looks like a star, it is called a star schema.
• A star schema can be simple or complex.
• A simple star consists of one fact table; a complex star can have more than one fact table.

For example, assume our data warehouse keeps store sales data, and the dimensions are time, store, product, and customer. In this case, the figure shown in the above slide represents the star schema. The lines between two tables indicate a primary key / foreign key relationship between the two tables.
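The store-sales example above can be sketched as a star schema in Python's `sqlite3`. The table and column names (`fact_sales`, `dim_time`, `dim_store`, `dim_product`, `dim_customer`) and the sample rows are assumptions for illustration. Each dimension is a single table, and the fact table holds one foreign key per dimension plus numeric measures.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: one de-normalized table per dimension.
CREATE TABLE dim_time     (time_key INTEGER PRIMARY KEY, day TEXT, month TEXT, year INTEGER);
CREATE TABLE dim_store    (store_key INTEGER PRIMARY KEY, store_name TEXT, city TEXT);
CREATE TABLE dim_product  (product_key INTEGER PRIMARY KEY, product_name TEXT, category TEXT);
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, customer_name TEXT, segment TEXT);

-- Central fact table: one foreign key per dimension plus numeric measures.
CREATE TABLE fact_sales (
  time_key     INTEGER REFERENCES dim_time(time_key),
  store_key    INTEGER REFERENCES dim_store(store_key),
  product_key  INTEGER REFERENCES dim_product(product_key),
  customer_key INTEGER REFERENCES dim_customer(customer_key),
  quantity     INTEGER,
  sales_amount REAL
);

INSERT INTO dim_time     VALUES (1, '2024-01-05', '2024-01', 2024), (2, '2024-02-10', '2024-02', 2024);
INSERT INTO dim_store    VALUES (1, 'Store A', 'Mysuru'), (2, 'Store B', 'Pune');
INSERT INTO dim_product  VALUES (1, 'Novel', 'Books'), (2, 'Puzzle', 'Toys');
INSERT INTO dim_customer VALUES (1, 'Customer 1', 'Retail'), (2, 'Customer 2', 'Retail');

INSERT INTO fact_sales VALUES
  (1, 1, 1, 1, 2, 500.0),
  (1, 2, 2, 2, 1, 400.0),
  (2, 1, 1, 2, 1, 250.0);
""")

# Typical star join: the fact table joins directly to each dimension the query needs.
rows = conn.execute("""
SELECT t.month, p.category, SUM(f.sales_amount) AS revenue
FROM fact_sales f
JOIN dim_time t    ON t.time_key = f.time_key
JOIN dim_product p ON p.product_key = f.product_key
GROUP BY t.month, p.category
ORDER BY t.month, p.category
""").fetchall()
print(rows)  # [('2024-01', 'Books', 500.0), ('2024-01', 'Toys', 400.0), ('2024-02', 'Books', 250.0)]
```

Each dimension is only one join away from the fact table, which keeps analytical queries simple.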
Snowflake Schema:
• The snowflake schema is an extension of the star schema, where each point of the star explodes into more points.
• In a star schema, each dimension is represented by a single dimension table, whereas in a snowflake schema, that dimension table is normalized into multiple tables, each representing a level in the dimensional hierarchy.

For example, take a Time dimension that consists of 2 different hierarchies, the first being: 1. Year → Month → Day

• For this hierarchy, the snowflake schema diagram has 3 tables: a table for Year, a table for Month, and a table for Day. Year is connected to Month, which is then connected to Day.
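Here is a minimal sketch of the snowflaked Time dimension described above, again in Python's `sqlite3` with hypothetical table names and sample data. Rolling sales up to the year level has to walk the chain Day → Month → Year. In a star schema, a single time dimension table would carry the day, month, and year columns, and the same roll-up would need only one join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Snowflaked time dimension: one table per hierarchy level, Year -> Month -> Day.
CREATE TABLE dim_year  (year_key INTEGER PRIMARY KEY, year INTEGER);
CREATE TABLE dim_month (month_key INTEGER PRIMARY KEY, month_name TEXT,
                        year_key INTEGER REFERENCES dim_year(year_key));
CREATE TABLE dim_day   (day_key INTEGER PRIMARY KEY, calendar_date TEXT,
                        month_key INTEGER REFERENCES dim_month(month_key));
CREATE TABLE fact_sales (day_key INTEGER REFERENCES dim_day(day_key), sales_amount REAL);

INSERT INTO dim_year   VALUES (1, 2023), (2, 2024);
INSERT INTO dim_month  VALUES (1, 'Dec', 1), (2, 'Jan', 2), (3, 'Feb', 2);
INSERT INTO dim_day    VALUES (1, '2023-12-30', 1), (2, '2024-01-05', 2), (3, '2024-02-10', 3);
INSERT INTO fact_sales VALUES (1, 100.0), (2, 500.0), (2, 400.0), (3, 250.0);
""")

# Rolling up to Year walks the hierarchy: fact -> day -> month -> year.
rows = conn.execute("""
SELECT y.year, SUM(f.sales_amount)
FROM fact_sales f
JOIN dim_day d   ON d.day_key = f.day_key
JOIN dim_month m ON m.month_key = d.month_key
JOIN dim_year y  ON y.year_key = m.year_key
GROUP BY y.year ORDER BY y.year
""").fetchall()
print(rows)  # [(2023, 100.0), (2024, 1150.0)]
```

The extra joins are the price of storing each level once; the benefit is that year and month attributes are not repeated on every day row.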

Fact Table:-
• A fact table is the central table in a star schema of a data warehouse.
• A fact table consists of the facts of a particular business process, e.g., sales revenue by month by product.
• A fact table contains numerical values (measures), together with foreign keys that link each row to the dimension tables.

ID      Contact Number   Pin Code
1001    9999999999       7654568
1002    8888987888       7675890
1003    6543579668       1344422
1004    6647477474       6664467
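A fact table row pairs dimension keys with numeric measures. The following pure-Python sketch assumes a hypothetical `SalesFact` row type with a month key, a product key, and two measures. It computes the example from the text, sales revenue by month by product, by summing the revenue measure for each (month, product) pair.

```python
from collections import defaultdict
from typing import NamedTuple

class SalesFact(NamedTuple):
    """One row of a hypothetical sales fact table: dimension keys + numeric measures."""
    month_key: str    # foreign key to a time dimension, e.g. '2024-01'
    product_key: int  # foreign key to a product dimension
    quantity: int     # measure
    revenue: float    # measure

def revenue_by_month_by_product(facts):
    """Aggregate the revenue measure at month x product grain, sorted by key."""
    totals = defaultdict(float)
    for f in facts:
        totals[(f.month_key, f.product_key)] += f.revenue
    return dict(sorted(totals.items()))

facts = [
    SalesFact("2024-01", 1, 2, 500.0),
    SalesFact("2024-01", 1, 1, 250.0),
    SalesFact("2024-01", 2, 1, 400.0),
    SalesFact("2024-02", 1, 1, 250.0),
]
print(revenue_by_month_by_product(facts))
# {('2024-01', 1): 750.0, ('2024-01', 2): 400.0, ('2024-02', 1): 250.0}
```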
Dimension Table:-
• A dimension table is a table in a star schema of a data warehouse.
• Dimension tables are de-normalized.
• Each dimension is represented as a single table. The primary key in each dimension table is related to a foreign key in the fact table.
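The database can enforce the primary key / foreign key link between a dimension table and the fact table. This sketch uses Python `sqlite3`; the table names and data are hypothetical. A de-normalized `dim_product` table keeps category and brand in one table. Inserting a fact row whose `product_key` has no matching dimension row is rejected. SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON` is issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when this is enabled
conn.executescript("""
-- De-normalized product dimension: category and brand sit in the same table.
CREATE TABLE dim_product (
  product_key  INTEGER PRIMARY KEY,  -- key referenced by the fact table
  product_name TEXT,
  category     TEXT,
  brand        TEXT
);
CREATE TABLE fact_sales (
  product_key  INTEGER NOT NULL REFERENCES dim_product(product_key),
  sales_amount REAL
);
INSERT INTO dim_product VALUES (1, 'Novel', 'Books', 'BrandX');
""")

def insert_fact(product_key, amount):
    """Insert a fact row; return False if product_key has no matching dimension row."""
    try:
        with conn:  # rolls back the failed insert before re-raising
            conn.execute("INSERT INTO fact_sales VALUES (?, ?)", (product_key, amount))
        return True
    except sqlite3.IntegrityError:
        return False

print(insert_fact(1, 250.0))   # True: key 1 exists in dim_product
print(insert_fact(99, 100.0))  # False: no dimension row with key 99
```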
