Unit 2 DATA WAREHOUSE AND DATA MART

A Data Warehouse is a centralized repository that collects and manages data from various sources to support decision-making processes, while a Data Mart is a simplified version focused on a single subject. The document outlines the differences between operational database systems (OLTP) and data warehouses (OLAP), emphasizing their distinct purposes, data management, and design structures. It also describes various schemas used in dimensional modeling, OLAP operations, and types of data marts, highlighting their architectures and data sourcing methods.


DATA WAREHOUSE AND DATA MART

By
DEVIPRIYA P/AP/MCA
DATA WAREHOUSE
What is a Data Warehouse?
– A data warehouse collects and manages data from
varied sources to provide meaningful business
insights.
– A data warehouse is a subject-oriented, integrated,
time-variant, and nonvolatile collection of data in
support of management's decision-making process.
– These four properties distinguish a data warehouse
from other repository systems, such as relational
database systems, transaction processing systems,
and file systems.
Differences between Operational Database
Systems and Data Warehouses
• The major task of on-line operational database systems is to
perform on-line transaction and query processing. These systems
are called on-line transaction processing (OLTP) systems.
• They cover most of the day-to-day operations of an organization,
such as purchasing, inventory, manufacturing, banking, payroll,
registration, and accounting.
• Data warehouse systems, on the other hand, serve users or
knowledge workers in the role of data analysis and decision
making.
• Such systems can organize and present data in various formats in
order to accommodate the diverse needs of different users.
These systems are known as on-line analytical processing (OLAP)
systems.
• The major distinguishing features between OLTP and OLAP are
summarized as follows:

Users and system orientation:
– OLTP: An OLTP system is customer-oriented and is used for transaction and query processing by clerks, clients, and information technology professionals.
– OLAP: An OLAP system is market-oriented and is used for data analysis by knowledge workers, including managers, executives, and analysts.

Data contents:
– OLTP: An OLTP system manages current data that, typically, are too detailed to be easily used for decision making.
– OLAP: An OLAP system manages large amounts of historical data, provides facilities for summarization and aggregation, and stores and manages information at different levels of granularity.

Database design:
– OLTP: An OLTP system usually adopts an entity-relationship (ER) data model and an application-oriented database design.
– OLAP: An OLAP system typically adopts either a star or snowflake model and a subject-oriented database design.

View:
– OLTP: An OLTP system focuses mainly on the current data within an enterprise or department, without referring to historical data or data in different organizations.
– OLAP: An OLAP system often spans multiple versions of a database schema, due to the evolutionary process of an organization.

Access patterns:
– OLTP: The access patterns of an OLTP system consist mainly of short, atomic transactions. Such a system requires concurrency control and recovery mechanisms.
– OLAP: Accesses to OLAP systems are mostly read-only operations, although many could be complex queries.
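The contrast in access patterns can be sketched with Python's built-in sqlite3 module. The orders table, its columns, and its values are hypothetical and serve only as illustration; in practice the OLTP database and the warehouse would be separate systems.

```python
import sqlite3

# Hypothetical table; names and values are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, region TEXT, year INTEGER, amount REAL)"
)

rows = [
    ("East", 2023, 120.0),
    ("East", 2024, 80.0),
    ("East", 2024, 20.0),
    ("West", 2024, 50.0),
]

# OLTP-style access: short, atomic write transactions on current data.
for row in rows:
    with conn:  # commits on success, rolls back if an error is raised
        conn.execute(
            "INSERT INTO orders (region, year, amount) VALUES (?, ?, ?)", row
        )

# OLAP-style access: a read-only query that summarizes historical data.
olap_rows = conn.execute(
    "SELECT region, year, SUM(amount) FROM orders "
    "GROUP BY region, year ORDER BY region, year"
).fetchall()
print(olap_rows)
```

Each insert touches a single row and must be protected by a transaction, whereas the analytical query reads many rows and returns one summarized row per region and year.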
A Multidimensional Data Model
• Data warehouses and OLAP tools are based on a
multidimensional data model.
• This model views data in the form of a data cube.
• There are three basic schemas that are used in
dimensional modeling:
1. Star schema
2. Snowflake schema
3. Fact constellation schema
Star schema
The star schema is a database design that expresses the multidimensional view of
data using relational database semantics.
The basic idea of the star schema is that information can be classified into two groups:
1. Facts 2. Dimensions
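A minimal sketch of a star schema, assuming a hypothetical sales subject: one fact table holds the numeric measures and foreign keys, and each dimension table describes one axis of analysis. All table names, column names, and values below are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: descriptive attributes, one row per member.
CREATE TABLE dim_time (time_key INTEGER PRIMARY KEY, year INTEGER, quarter TEXT);
CREATE TABLE dim_item (item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT);

-- Fact table: measures plus one foreign key per dimension.
CREATE TABLE fact_sales (
    time_key INTEGER REFERENCES dim_time(time_key),
    item_key INTEGER REFERENCES dim_item(item_key),
    units_sold INTEGER,
    dollars_sold REAL
);

INSERT INTO dim_time VALUES (1, 2024, 'Q1'), (2, 2024, 'Q2');
INSERT INTO dim_item VALUES (1, 'Laptop', 'BrandA'), (2, 'Phone', 'BrandB');
INSERT INTO fact_sales VALUES
    (1, 1, 3, 3000.0), (1, 2, 5, 2500.0), (2, 1, 2, 2000.0);
""")

# A typical star query: join the fact table to a dimension and aggregate.
sales_by_quarter = conn.execute("""
    SELECT t.quarter, SUM(f.dollars_sold)
    FROM fact_sales AS f
    JOIN dim_time AS t ON f.time_key = t.time_key
    GROUP BY t.quarter
    ORDER BY t.quarter
""").fetchall()
print(sales_by_quarter)
```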
Snowflake schema
• The snowflake schema is a variant of the star schema model, where some
dimension tables are normalized, thereby further splitting the data into additional
tables. The resulting schema graph forms a shape similar to a snowflake.
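The normalization step can be sketched as follows, assuming a hypothetical item dimension whose supplier attribute is moved into its own table. The names and values are illustrative only. The cost of the extra table is one additional join per query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- In a star schema, supplier_name would be a column of dim_item.
-- In the snowflake form it is split out into a separate table.
CREATE TABLE dim_supplier (supplier_key INTEGER PRIMARY KEY, supplier_name TEXT);
CREATE TABLE dim_item (
    item_key INTEGER PRIMARY KEY,
    item_name TEXT,
    supplier_key INTEGER REFERENCES dim_supplier(supplier_key)
);
CREATE TABLE fact_sales (
    item_key INTEGER REFERENCES dim_item(item_key),
    dollars_sold REAL
);

INSERT INTO dim_supplier VALUES (1, 'SupplierX'), (2, 'SupplierY');
INSERT INTO dim_item VALUES (1, 'Laptop', 1), (2, 'Phone', 1), (3, 'Tablet', 2);
INSERT INTO fact_sales VALUES (1, 1000.0), (2, 500.0), (3, 300.0), (3, 200.0);
""")

# Reaching the supplier now takes two joins: fact -> item -> supplier.
sales_by_supplier = conn.execute("""
    SELECT s.supplier_name, SUM(f.dollars_sold)
    FROM fact_sales AS f
    JOIN dim_item AS i ON f.item_key = i.item_key
    JOIN dim_supplier AS s ON i.supplier_key = s.supplier_key
    GROUP BY s.supplier_name
    ORDER BY s.supplier_name
""").fetchall()
print(sales_by_supplier)
```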
Fact constellation schema
• Sophisticated applications may require multiple fact tables to share
dimension tables.
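As a sketch of this idea, the example below assumes two hypothetical fact tables, one for sales and one for shipping, that share a single item dimension. The table names, columns, and values are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One shared dimension table.
CREATE TABLE dim_item (item_key INTEGER PRIMARY KEY, item_name TEXT);

-- Two fact tables that both reference it.
CREATE TABLE fact_sales (
    item_key INTEGER REFERENCES dim_item(item_key), units_sold INTEGER);
CREATE TABLE fact_shipping (
    item_key INTEGER REFERENCES dim_item(item_key), units_shipped INTEGER);

INSERT INTO dim_item VALUES (1, 'Laptop'), (2, 'Phone');
INSERT INTO fact_sales VALUES (1, 3), (1, 2), (2, 5);
INSERT INTO fact_shipping VALUES (1, 4), (2, 5);
""")

# The shared dimension lets measures from both fact tables be compared
# item by item. Each fact table is aggregated separately to avoid
# double counting.
sold_vs_shipped = conn.execute("""
    SELECT i.item_name,
           (SELECT SUM(units_sold) FROM fact_sales s
             WHERE s.item_key = i.item_key),
           (SELECT SUM(units_shipped) FROM fact_shipping h
             WHERE h.item_key = i.item_key)
    FROM dim_item i
    ORDER BY i.item_name
""").fetchall()
print(sold_vs_shipped)
```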
Measures: Their categorization and computation
• A data cube measure is a numerical function that can be evaluated at each
point in the data cube space.
• A measure value is computed for a given point by aggregating the data
corresponding to the respective dimension-value pairs defining the given
point.
• Measures can be organized into three categories based on the kind of
aggregate functions used.
– Distributive
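• An aggregate function is distributive if it can be computed in a distributed
manner: applying it to each partition of the data and then to the partial results
gives the same answer as applying it to all the data at once. The functions count(),
sum(), min(), and max() are distributive.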
– Algebraic
• An aggregate function is algebraic if it can be computed by an algebraic
function with M arguments, each of which is obtained by applying a
distributive aggregate function. For example, avg() can be computed by
sum()/count(), where both sum() and count() are distributive aggregate
functions.
– Holistic
• An aggregate function is holistic if there is no constant bound on the storage
size needed to describe a subaggregate. Common examples are median(), mode(), and rank().
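The three categories can be illustrated with a small sketch. The partitions below are hypothetical data, and median() is used here as a standard example of a holistic function.

```python
from statistics import median

# Hypothetical data set, split into three partitions.
partitions = [[4, 8, 15], [16, 23], [42]]
all_values = [v for part in partitions for v in part]

# Distributive: per-partition results combine into the overall result.
partial_sums = [sum(p) for p in partitions]
partial_counts = [len(p) for p in partitions]
total_sum = sum(partial_sums)
total_count = sum(partial_counts)

# Algebraic: avg() needs only two distributive subaggregates (M = 2).
average = total_sum / total_count

# Holistic: combining per-partition medians does not, in general,
# give the overall median; the full data is needed.
median_of_medians = median(median(p) for p in partitions)
true_median = median(all_values)

print(total_sum, average, median_of_medians, true_median)
```

Here the median of the partition medians differs from the median of the full data set, which is why a holistic measure cannot be maintained from fixed-size subaggregates.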
Concept Hierarchies
• A concept hierarchy defines a sequence of mappings from a set of low-
level concepts to higher-level, more general concepts.
• Consider a concept hierarchy for the dimension location.
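• City values for location include Vancouver, Toronto, New York, and
Chicago. Each city maps to its province or state, which in turn maps to its
country (for example, Vancouver → British Columbia → Canada).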
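A concept hierarchy for location can be sketched as a pair of lookup tables. The province/state and country levels, the function name, and the sales figures are assumptions added for illustration; only the four city values come from the text above.

```python
# Level 1 -> level 2: city to province or state.
CITY_TO_REGION = {
    "Vancouver": "British Columbia",
    "Toronto": "Ontario",
    "New York": "New York",
    "Chicago": "Illinois",
}

# Level 2 -> level 3: province or state to country.
REGION_TO_COUNTRY = {
    "British Columbia": "Canada",
    "Ontario": "Canada",
    "New York": "USA",
    "Illinois": "USA",
}


def generalize(units_by_city, level):
    """Aggregate city-level values at a higher level of the hierarchy."""
    totals = {}
    for city, units in units_by_city.items():
        region = CITY_TO_REGION[city]
        key = region if level == "province_or_state" else REGION_TO_COUNTRY[region]
        totals[key] = totals.get(key, 0) + units
    return totals


# Hypothetical units sold per city.
units_by_city = {"Vancouver": 10, "Toronto": 20, "New York": 30, "Chicago": 40}
units_by_region = generalize(units_by_city, "province_or_state")
units_by_country = generalize(units_by_city, "country")
print(units_by_country)
```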
OLAP Operations in the Multidimensional Data
Model
• “How are concept hierarchies useful in OLAP?” In the multidimensional
model, data are organized into multiple dimensions, and each dimension
contains multiple levels of abstraction defined by concept hierarchies.
• This organization provides users with the flexibility to view data from
different perspectives.
• A number of OLAP data cube operations exist to materialize these different
views, allowing interactive querying and analysis of the data at hand.
• Hence, OLAP provides a user-friendly environment for interactive data
analysis.
– Roll-up: The roll-up operation (also called the drill-up operation by some vendors) performs
aggregation on a data cube, either by climbing up a concept hierarchy for a dimension or by
dimension reduction.
– Drill-down: Drill-down is the reverse of roll-up. It navigates from less detailed data to more
detailed data.
– Slice and dice: The slice operation performs a selection on one dimension of the given cube,
resulting in a subcube. The dice operation defines a subcube by performing a selection on two
or more dimensions.
– Pivot (rotate): Pivot (also called rotate) is a visualization operation that rotates the data axes
in view in order to provide an alternative presentation of the data.
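These operations can be sketched on a tiny cube stored as a dictionary. The cube contents, the city-to-country mapping, and all function names are hypothetical; real OLAP servers implement these operations very differently.

```python
from collections import defaultdict

# Hypothetical cube cells: (city, quarter, item) -> units sold.
cells = {
    ("Vancouver", "Q1", "phone"): 5,
    ("Vancouver", "Q2", "phone"): 7,
    ("Toronto", "Q1", "phone"): 3,
    ("Toronto", "Q1", "laptop"): 2,
    ("Chicago", "Q2", "laptop"): 4,
}
COUNTRY = {"Vancouver": "Canada", "Toronto": "Canada", "Chicago": "USA"}


def roll_up_location(cube):
    """Roll-up by climbing the location hierarchy from city to country."""
    out = defaultdict(int)
    for (city, quarter, item), units in cube.items():
        out[(COUNTRY[city], quarter, item)] += units
    return dict(out)


def drop_dimension(cube, axis):
    """Roll-up by dimension reduction: aggregate one axis away."""
    out = defaultdict(int)
    for key, units in cube.items():
        out[key[:axis] + key[axis + 1:]] += units
    return dict(out)


def slice_cube(cube, axis, value):
    """Slice: select a single value on one dimension."""
    return {k: v for k, v in cube.items() if k[axis] == value}


def dice_cube(cube, allowed):
    """Dice: select sets of values on two or more dimensions."""
    return {
        k: v for k, v in cube.items()
        if all(k[axis] in values for axis, values in allowed.items())
    }


def pivot(cube, order):
    """Pivot: reorder the axes without changing any value."""
    return {tuple(k[a] for a in order): v for k, v in cube.items()}


by_country = roll_up_location(cells)   # drill-down returns to `cells`
by_city_quarter = drop_dimension(cells, 2)
q1_only = slice_cube(cells, 1, "Q1")
toronto_phones = dice_cube(cells, {0: {"Toronto"}, 2: {"phone"}})
quarter_first = pivot(cells, (1, 0, 2))
print(by_country)
```

Drill-down needs no function of its own here: it amounts to going back from the country-level cube to the more detailed city-level cells.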
[Figure: Three-tier data warehouse architecture]
Data Mart
What is a Data Mart?
A data mart is a simple form of a data
warehouse. It is focused on a single subject.
A data mart draws data from only a few sources.
These sources may be a central data warehouse,
internal operational systems, or external data
sources. There are three types of data mart:
• Independent data mart
• Dependent data mart and operational data store
• Logical data mart and real-time data warehouse
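One practical difference between these types is whether the mart is a separate copy of the data. The sketch below assumes a hypothetical warehouse table and contrasts a dependent data mart (a table loaded from the warehouse) with a logical data mart (a view of the warehouse). Table names, column names, and values are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical enterprise data warehouse table.
CREATE TABLE warehouse_sales (department TEXT, region TEXT, amount REAL);
INSERT INTO warehouse_sales VALUES
    ('marketing', 'East', 100.0),
    ('finance',   'East',  40.0),
    ('marketing', 'West',  60.0);

-- Dependent data mart: a separate table loaded from the warehouse.
CREATE TABLE mart_marketing AS
    SELECT region, amount FROM warehouse_sales
    WHERE department = 'marketing';

-- Logical data mart: not a separate database, only a view.
CREATE VIEW v_mart_marketing AS
    SELECT region, amount FROM warehouse_sales
    WHERE department = 'marketing';

-- New data reaches the warehouse after the dependent mart was loaded.
INSERT INTO warehouse_sales VALUES ('marketing', 'East', 25.0);
""")

dependent_total = conn.execute(
    "SELECT SUM(amount) FROM mart_marketing").fetchone()[0]
logical_total = conn.execute(
    "SELECT SUM(amount) FROM v_mart_marketing").fetchone()[0]
print(dependent_total, logical_total)
```

The dependent mart stays at its loaded state until the next load, while the view always reflects the current warehouse contents. Creating another logical mart only requires defining another view.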
[Figure: Independent data mart data warehousing architecture]
– Data marts: mini-warehouses, limited in scope
– Separate ETL for each independent data mart
– Data access complexity due to multiple data marts

[Figure: Dependent data mart with operational data store: a three-level architecture]
– ODS provides option for obtaining current data
– Single ETL for enterprise data warehouse (EDW)
– Dependent data marts loaded from EDW
– Simpler data access

[Figure: Logical data mart and real-time warehouse architecture]
– ODS and data warehouse are one and the same
– Near real-time ETL for data warehouse
– Data marts are NOT separate databases, but logical views of the data warehouse
– Easier to create new data marts
