Data Warehousing
Subject Oriented
• A data warehouse is subject-oriented, which means
that the data is organized around specific subjects.
• This allows easy access to the data relevant to a
specific subject, as well as the ability to track that
data over time.
• Examples: sales information, customer information.
Characteristics of Data Warehouse
Integrated
Integrated data means a data
warehouse stores data from
multiple sources by standardizing
and formatting all data into a
single, consistent format to
support accurate reporting and
analysis.
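As a minimal sketch of what standardizing data into a single, consistent format can look like, the Python snippet below merges customer records from two source systems. The source names (crm_rows, billing_rows), the field names, and the date formats are hypothetical and chosen only for illustration.

```python
from datetime import datetime

# Two hypothetical source systems that describe customers differently.
crm_rows = [{"cust_id": "17", "name": "Asha Rao", "joined": "2023-01-15"}]
billing_rows = [{"CustomerID": 42, "FullName": "LEE CHEN", "SignupDate": "03/02/2022"}]

def from_crm(row):
    """Map a CRM record (ISO dates) to the warehouse's common format."""
    return {
        "customer_id": int(row["cust_id"]),
        "name": row["name"].title(),
        "signup_date": datetime.strptime(row["joined"], "%Y-%m-%d").date().isoformat(),
        "source": "crm",
    }

def from_billing(row):
    """Map a billing record (day/month/year dates) to the same format."""
    return {
        "customer_id": int(row["CustomerID"]),
        "name": row["FullName"].title(),
        "signup_date": datetime.strptime(row["SignupDate"], "%d/%m/%Y").date().isoformat(),
        "source": "billing",
    }

# After mapping, every record has the same keys, types, and date format.
integrated = [from_crm(r) for r in crm_rows] + [from_billing(r) for r in billing_rows]
for record in integrated:
    print(record)
```

Keeping one small mapping function per source is only one possible design; it makes the assumptions about each source's format easy to see and to change.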
Characteristics of Data Warehouse
Time Variant
Historical information is kept in a
data warehouse. For example, one
can retrieve data from 3 months, 6
months, or 12 months ago, or even
older data. This contrasts with a
transaction system, where often
only the most current data is kept.
Characteristics of Data Warehouse
Non-volatile
Another characteristic of a data
warehouse is that it is non-volatile.
This means that once data is loaded
into the warehouse it is not updated
or deleted; new data is only added.
This is important because it allows for
the preservation of historical data,
making it possible to track trends
and patterns over time.
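The time-variant and non-volatile properties can be sketched together with an append-only table. The example below is an illustration under stated assumptions: the table name balance_history, its columns, and the sample values are hypothetical, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE balance_history (account_id INTEGER, balance REAL, load_date TEXT)"
)

def load_snapshot(account_id, balance, load_date):
    """Append a new row; existing rows are never updated or deleted."""
    conn.execute(
        "INSERT INTO balance_history VALUES (?, ?, ?)",
        (account_id, balance, load_date),
    )

load_snapshot(1, 500.0, "2024-01-31")
load_snapshot(1, 650.0, "2024-02-29")
load_snapshot(1, 400.0, "2024-03-31")

def balance_as_of(account_id, date):
    """Return the most recent balance loaded on or before `date` (ISO format)."""
    row = conn.execute(
        "SELECT balance FROM balance_history "
        "WHERE account_id = ? AND load_date <= ? "
        "ORDER BY load_date DESC LIMIT 1",
        (account_id, date),
    ).fetchone()
    return row[0] if row else None

print(balance_as_of(1, "2024-02-29"))  # balance as loaded at the end of February
print(balance_as_of(1, "2024-03-31"))  # most recently loaded balance
```

Because every load adds a dated row instead of overwriting the previous one, earlier states remain queryable, which is what makes trend analysis possible.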
Example Applications of Data
Warehousing
• Data warehousing can be applied anywhere we have a huge amount of data
and want to see statistical results that help in decision making.
• Social Media Websites: Social networking websites such as Facebook,
Twitter, and LinkedIn are based on analyzing large data sets. These sites
gather data related to members, groups, locations, etc., and store it in a
single central repository. Because of the large amount of data, a data
warehouse is needed to implement this.
• Banking: Most banks these days use data warehouses to see the
spending patterns of account holders and cardholders. They use this to
provide customers with special offers, deals, etc.
• Government: Governments use data warehouses to store and analyze tax
payments, which helps to detect tax evasion.
Need for Data Warehouse
• Improved data quality: Data warehousing can help improve data quality
by consolidating data from various sources into a single, consistent view.
• Faster access to information: Data warehousing enables quick access to
information, allowing businesses to make better, more informed
decisions faster.
• Better decision-making: With a data warehouse, businesses can analyze
data and gain insights into trends and patterns that can inform better
decision-making.
• Reduced data redundancy: By consolidating data from various sources,
data warehousing can reduce data redundancy and inconsistencies.
• Scalability: Data warehousing is highly scalable and can handle large
amounts of data from different sources.
Disadvantages:
Types of Data Warehouse
1. Enterprise Data Warehouse (EDW):
An Enterprise Data Warehouse (EDW) is a centralized warehouse. It provides decision support services across
the enterprise. It offers a unified approach for organizing and representing data. It also provides the ability to
classify data according to subject and to give access according to those divisions.
2. Operational Data Store (ODS):
An Operational Data Store (ODS) is a data store required when neither the data warehouse nor OLTP systems
support an organization's reporting needs. In an ODS, the data is refreshed in real time. Hence, it is widely
preferred for routine activities such as storing employee records.
3. Data Mart:
A data mart is a subset of the data warehouse. It is specially designed for a particular line of business,
such as sales or finance. In an independent data mart, data is collected directly from the sources.
Data Warehouse Architecture: Basic
Operational System
In data warehousing, an operational system is a
system that is used to process the day-to-day
transactions of an organization.
Flat Files
A flat file system is a system of files in which
transactional data is stored, and in which every file
must have a different name.
Meta Data
• A set of data that defines and gives information about
other data.
• Metadata summarizes necessary information about
data.
• For example, author, date created, date modified, and
file size are examples of very basic document metadata.
Data Warehouse Architecture: Basic
Lightly and highly summarized data
This area of the data warehouse stores all the
predefined lightly and highly summarized (aggregated) data
generated by the warehouse manager.
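The difference between lightly and highly summarized data can be illustrated with a small Python sketch. The sample rows and the choice of "per day and store" as the light summary and "per month" as the high summary are assumptions made for this example; real warehouses define their own summary levels.

```python
from collections import defaultdict

# Hypothetical detailed sales rows: (date, store, amount).
detail = [
    ("2024-01-05", "north", 120.0),
    ("2024-01-05", "north", 30.0),
    ("2024-01-05", "south", 80.0),
    ("2024-01-20", "north", 200.0),
    ("2024-02-03", "north", 50.0),
]

def summarize(rows, key):
    """Total the amount column for each value of key(row)."""
    totals = defaultdict(float)
    for row in rows:
        totals[key(row)] += row[2]
    return dict(totals)

# Lightly summarized: one total per day and store.
daily_by_store = summarize(detail, key=lambda r: (r[0], r[1]))
# Highly summarized: one total per month across all stores.
monthly = summarize(detail, key=lambda r: r[0][:7])

print(daily_by_store)
print(monthly)
```

Storing both levels ahead of time means common reports can read a handful of pre-aggregated rows instead of scanning the detailed data.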
End-User access Tools
The principal purpose of a data warehouse is to
provide information to business managers for strategic
decision-making. These users interact with the
warehouse using end-user access tools.
Examples of end-user access tools include:
• Reporting and Query Tools
• Application Development Tools
• Executive Information Systems Tools
• Online Analytical Processing Tools
• Data Mining Tools
Data Warehouse Architecture: With Staging Area
• We must clean and process operational
data before putting it into the warehouse.
• We can do this programmatically, although
most data warehouses use a staging area (a place
where data is processed before entering the
warehouse).
• A staging area simplifies data cleansing and
consolidation for operational data coming
from multiple source systems, especially for
enterprise data warehouses where all relevant
data of an enterprise is consolidated.
• The data warehouse staging area is a temporary
location to which records from source systems
are copied.
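The role of the staging area can be sketched in a few lines of Python. This is a simplified illustration, not a description of any particular ETL tool: the source extracts, the field names, and the cleansing rules (trim whitespace, drop rows without a key, skip duplicates) are all assumptions.

```python
# Hypothetical raw extracts from two operational source systems.
source_a = [{"order_id": "1001", "amount": " 25.50 "}, {"order_id": "1002", "amount": "10"}]
source_b = [{"order_id": "1002", "amount": "10"}, {"order_id": "", "amount": "99"}]

staging = []    # temporary holding area for copied source records
warehouse = []  # cleaned, consolidated rows

def extract(*sources):
    """Copy source records into the staging area unchanged."""
    for source in sources:
        staging.extend(dict(row) for row in source)

def clean_and_load():
    """Cleanse staged rows, load them into the warehouse, then empty staging."""
    seen = {row["order_id"] for row in warehouse}
    for row in staging:
        raw_id = row["order_id"].strip()
        if not raw_id:
            continue  # skip rows that have no key
        order_id = int(raw_id)
        if order_id in seen:
            continue  # skip duplicates across sources
        seen.add(order_id)
        warehouse.append({"order_id": order_id, "amount": float(row["amount"])})
    staging.clear()

extract(source_a, source_b)
clean_and_load()
print(warehouse)
```

Separating the copy step from the cleansing step keeps the load on the operational systems short, since all further processing happens on the staged copy.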
Data Warehouse Architecture: With Staging Area and
Data Marts
• We may want to customize our warehouse's
architecture for multiple groups within our
organization.
• We can do this by adding data marts. A data
mart is a segment of a data warehouse that
can provide information for reporting and
analysis on a section, unit, department, or
operation in the company, e.g., sales, payroll,
production, etc.
• The figure illustrates an example where
purchasing, sales, and stocks are separated. In
this example, a financial analyst wants to
analyze historical data for purchases and sales
or mine historical information to make
predictions about customer behavior.
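One simple way to picture a data mart is as a department-specific subset of the warehouse. The sketch below models each mart as a SQL view over a single warehouse table in an in-memory SQLite database. This is a deliberate simplification (real data marts are often separate physical stores), and the table, view, and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE warehouse_transactions (
    txn_date TEXT, area TEXT, item TEXT, amount REAL
);
INSERT INTO warehouse_transactions VALUES
    ('2024-01-10', 'sales',      'widget', 300.0),
    ('2024-01-12', 'purchasing', 'widget', 120.0),
    ('2024-02-01', 'sales',      'gadget', 450.0);

-- Each data mart exposes only the rows for one business area.
CREATE VIEW sales_mart AS
    SELECT txn_date, item, amount FROM warehouse_transactions WHERE area = 'sales';
CREATE VIEW purchasing_mart AS
    SELECT txn_date, item, amount FROM warehouse_transactions WHERE area = 'purchasing';
""")

# An analyst can query each mart without seeing the other areas' data.
sales_total = conn.execute("SELECT SUM(amount) FROM sales_mart").fetchone()[0]
purchase_total = conn.execute("SELECT SUM(amount) FROM purchasing_mart").fetchone()[0]
print(sales_total, purchase_total)
```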
Properties of Data Warehouse Architectures
• 1. Separation: Analytical and transactional processing should be kept
apart as much as possible.
• 2. Scalability: Hardware and software architectures should be simple to
upgrade as the data volume that has to be managed and processed, and
the number of user requirements that have to be met, progressively
increase.
• 3. Extensibility: The architecture should be able to accommodate new
operations and technologies without redesigning the whole system.
• 4. Security: Monitoring access is necessary because of the strategic
data stored in the data warehouse.
• 5. Administerability: Data warehouse management should not be
complicated.
Dimensional Modeling
Fact
It is a collection of associated data items, consisting of measures and context data. It typically
represents business items or business transactions.
Dimensions
It is a collection of data that describes one business dimension. Dimensions provide the
contextual background for the facts, and they are the framework over which OLAP is performed.
Measure
It is a numeric attribute of a fact, representing the performance or behavior of the business
relative to the dimensions.
Fact Table
Fact tables are used to store facts or measures of the business. Facts are the numeric data
elements that are of interest to the company.
Dimension Table
Dimension tables establish the context of the facts. Dimension tables store fields that describe
the facts.
Example of Fact and Dimension Table
Data Cube
Some requirements cannot be met by a star schema. For example, the
relationship between users and bank accounts cannot be described by a
star schema because the relationship between them is many-to-many.
Example: Suppose a star schema is composed of a fact table, SALES,
and several dimension tables connected to it for time, branch, item,
and geographic location.
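The SALES example can be sketched as a small star schema in an in-memory SQLite database. Only the fact table SALES and the four dimensions (time, branch, item, location) come from the example above; the column names, the dim_ prefix, and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_time     (time_id INTEGER PRIMARY KEY, year INTEGER, quarter TEXT);
CREATE TABLE dim_branch   (branch_id INTEGER PRIMARY KEY, branch_name TEXT);
CREATE TABLE dim_item     (item_id INTEGER PRIMARY KEY, item_name TEXT, brand TEXT);
CREATE TABLE dim_location (location_id INTEGER PRIMARY KEY, city TEXT, country TEXT);

-- Fact table: one foreign key per dimension plus numeric measures.
CREATE TABLE sales (
    time_id INTEGER REFERENCES dim_time(time_id),
    branch_id INTEGER REFERENCES dim_branch(branch_id),
    item_id INTEGER REFERENCES dim_item(item_id),
    location_id INTEGER REFERENCES dim_location(location_id),
    units_sold INTEGER,
    revenue REAL
);

INSERT INTO dim_time VALUES (1, 2024, 'Q1'), (2, 2024, 'Q2');
INSERT INTO dim_branch VALUES (1, 'Downtown');
INSERT INTO dim_item VALUES (1, 'Laptop', 'Acme'), (2, 'Phone', 'Acme');
INSERT INTO dim_location VALUES (1, 'Pune', 'India');
INSERT INTO sales VALUES
    (1, 1, 1, 1, 3, 3000.0),
    (1, 1, 2, 1, 5, 2500.0),
    (2, 1, 1, 1, 2, 2000.0);
""")

# Revenue per quarter and item: each dimension is one join away from the fact table.
rows = conn.execute("""
    SELECT t.quarter, i.item_name, SUM(s.revenue)
    FROM sales s
    JOIN dim_time t ON t.time_id = s.time_id
    JOIN dim_item i ON i.item_id = s.item_id
    GROUP BY t.quarter, i.item_name
    ORDER BY t.quarter, i.item_name
""").fetchall()
print(rows)
```

Here units_sold and revenue play the role of measures, and the dimension tables supply the context (when, where, which item) for each fact row.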
Snowflake Schema
• A star schema contains just one dimension table per dimension, whereas a snowflake
schema may have a dimension table and sub-dimension tables for one dimension.
• Normalization is used in a snowflake schema, which eliminates data redundancy. In
contrast, normalization is not performed in a star schema, which results in data
redundancy.
• A star schema is simple, easy to understand, and involves less intricate queries. On the
contrary, a snowflake schema is harder to understand and involves more complex queries.
• The data model approach used in a star schema is top-down, whereas a snowflake
schema uses bottom-up.
• A star schema uses fewer joins. On the other hand, a snowflake schema uses a
larger number of joins.
• A star schema consumes more space than a snowflake schema.
• A query takes less time to execute in a star schema. Conversely, a snowflake
schema takes more time because of the additional joins.
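The trade-off between redundancy and joins can be seen in a minimal sketch that models the same item dimension both ways in an in-memory SQLite database. The table names (star_item, snow_item, snow_brand) and the sample data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Star style: brand details are repeated on every item row.
CREATE TABLE star_item (item_id INTEGER PRIMARY KEY, item_name TEXT,
                        brand_name TEXT, brand_country TEXT);
INSERT INTO star_item VALUES (1, 'Laptop', 'Acme', 'USA'), (2, 'Phone', 'Acme', 'USA');

-- Snowflake style: brand details move to a sub-dimension table.
CREATE TABLE snow_brand (brand_id INTEGER PRIMARY KEY, brand_name TEXT, brand_country TEXT);
CREATE TABLE snow_item  (item_id INTEGER PRIMARY KEY, item_name TEXT,
                         brand_id INTEGER REFERENCES snow_brand(brand_id));
INSERT INTO snow_brand VALUES (1, 'Acme', 'USA');
INSERT INTO snow_item VALUES (1, 'Laptop', 1), (2, 'Phone', 1);

CREATE TABLE sales (item_id INTEGER, revenue REAL);
INSERT INTO sales VALUES (1, 3000.0), (2, 2500.0);
""")

# Star schema: one join reaches the brand name.
star_query = """
    SELECT i.brand_name, SUM(s.revenue) FROM sales s
    JOIN star_item i ON i.item_id = s.item_id
    GROUP BY i.brand_name
"""
# Snowflake schema: the same question needs one more join.
snowflake_query = """
    SELECT b.brand_name, SUM(s.revenue) FROM sales s
    JOIN snow_item i ON i.item_id = s.item_id
    JOIN snow_brand b ON b.brand_id = i.brand_id
    GROUP BY b.brand_name
"""
star_result = conn.execute(star_query).fetchall()
snowflake_result = conn.execute(snowflake_query).fetchall()
print(star_result, snowflake_result)
```

Both queries return the same totals; the snowflake version stores the brand's details once but pays for that with the extra join.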
OLAP (Online Analytical Processing)
• ROLAP servers are intermediate servers that stand between a relational back-end server and user
front-end tools.
• They use a relational or extended-relational DBMS to store and manage warehouse data, and OLAP
middleware to provide the missing pieces.
• ROLAP servers contain optimizations for each DBMS back end, an implementation of aggregation
navigation logic, and additional tools and services.
• ROLAP technology tends to have higher scalability than MOLAP technology.
• ROLAP systems work primarily from data that resides in a relational database, where the base
data and dimension tables are stored as relational tables. This model permits multidimensional
analysis of the data.
• This technique relies on manipulating the data stored in the relational database to give the appearance
of traditional OLAP's slicing and dicing functionality. In essence, each slicing or dicing action
is equivalent to adding a "WHERE" clause to the SQL statement.
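The idea that slicing and dicing map onto "WHERE" clauses can be sketched with an in-memory SQLite table. The table layout and the sample rows are assumptions for this example; the point is only how each operation translates into SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (year INTEGER, region TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        (2023, "east", "laptop", 100.0),
        (2023, "west", "phone", 150.0),
        (2024, "east", "laptop", 200.0),
        (2024, "west", "laptop", 250.0),
        (2024, "west", "phone", 300.0),
    ],
)

# Slice: fix one dimension (year) to a single value.
slice_2024 = conn.execute(
    "SELECT region, product, amount FROM sales "
    "WHERE year = ? ORDER BY region, product",
    (2024,),
).fetchall()

# Dice: restrict several dimensions at once.
dice = conn.execute(
    "SELECT year, region, product, amount FROM sales "
    "WHERE year = ? AND region = ? AND product IN (?, ?) ORDER BY product",
    (2024, "west", "laptop", "phone"),
).fetchall()

print(slice_2024)
print(dice)
```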
Relational OLAP (ROLAP) Server
Advantages
• Can handle large amounts of information: The data size limitation of ROLAP technology depends on the data size
of the underlying RDBMS. So, ROLAP itself does not restrict the data amount.
• RDBMS already comes with a lot of features. So ROLAP technologies, (works on top of the RDBMS)
Disadvantages
• Performance can be slow: Each ROLAP report is a SQL query (or multiple SQL queries) in the relational database, the
• Limited by SQL functionalities: ROLAP technology relies on developing SQL statements to query the relational
Multidimensional OLAP (MOLAP) Server
Advantages
• Can perform complex calculations: All calculations have been pre-generated when the cube
is created. Hence, complex calculations are not only possible, but they return quickly.
Disadvantages
• Limited in the amount of information it can handle: Because all calculations are
performed when the cube is built, it is not possible to contain a large amount of data in the
cube itself.
• Requires additional investment: Cube technology is generally proprietary and does not
already exist in the organization. Therefore, to adopt MOLAP technology, chances are other
investments in human and capital resources are needed.
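The pre-calculation behind a MOLAP cube, and why it limits cube size, can be sketched in plain Python. This is a toy illustration under stated assumptions, not how any specific MOLAP product stores data: the dimension names, the sample facts, and the dictionary-based cube are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations

DIMENSIONS = ("year", "region", "product")
facts = [
    {"year": 2023, "region": "east", "product": "laptop", "amount": 100.0},
    {"year": 2024, "region": "east", "product": "laptop", "amount": 200.0},
    {"year": 2024, "region": "west", "product": "phone", "amount": 300.0},
]

def build_cube(rows):
    """Pre-aggregate `amount` for every combination of dimensions."""
    cube = defaultdict(float)
    for size in range(len(DIMENSIONS) + 1):
        for dims in combinations(DIMENSIONS, size):
            for row in rows:
                key = (dims, tuple(row[d] for d in dims))
                cube[key] += row["amount"]
    return dict(cube)

cube = build_cube(facts)

def lookup(**filters):
    """Answer a query with a single dictionary lookup; `facts` is not scanned."""
    dims = tuple(d for d in DIMENSIONS if d in filters)
    return cube.get((dims, tuple(filters[d] for d in dims)), 0.0)

print(lookup())                          # grand total
print(lookup(year=2024))                 # total for one year
print(lookup(year=2024, region="east"))  # total for one year and region
print(len(facts), len(cube))             # the cube holds more cells than there are facts
```

Queries are fast because the work was done when the cube was built, but the number of stored cells grows with every added dimension and value, which is the size limitation described above.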
Hybrid OLAP (HOLAP) Server
Advantages of HOLAP
1. HOLAP provides the benefits of both MOLAP and ROLAP.
3. HOLAP balances the disk space requirement, as it stores only the aggregate
information on the OLAP server while the detail records remain in the relational
database. So no duplicate copy of the detail records is maintained.
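The split between aggregates on the OLAP server and detail records in the relational database can be sketched as follows. This is a simplified model with assumed names: a Python dictionary stands in for the OLAP server's aggregate store, and an in-memory SQLite table stands in for the relational database.

```python
import sqlite3

# Detail records stay in the relational database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (year INTEGER, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(2024, "east", 200.0), (2024, "east", 50.0), (2024, "west", 300.0)],
)

# Only aggregates are held on the (here simulated) OLAP server.
aggregates = {
    (year, region): total
    for year, region, total in conn.execute(
        "SELECT year, region, SUM(amount) FROM sales GROUP BY year, region"
    )
}

def total_sales(year, region):
    """Summary query: served from the pre-computed aggregates."""
    return aggregates.get((year, region), 0.0)

def drill_through(year, region):
    """Detail query: falls back to the relational database."""
    return conn.execute(
        "SELECT amount FROM sales WHERE year = ? AND region = ? ORDER BY amount",
        (year, region),
    ).fetchall()

print(total_sales(2024, "east"))
print(drill_through(2024, "east"))
```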
Disadvantages of HOLAP
3. HOLAP architecture is very complicated because it supports both MOLAP
and ROLAP servers.
Other Types of OLAP Servers
There are also less popular OLAP styles that one may come across every so often. Some of these less common
variants are listed below.
• WOLAP (Web OLAP) is an OLAP application that is accessible via a web browser. Unlike traditional client/server
OLAP applications, WOLAP has a three-tiered architecture consisting of a client, a middleware, and a database server.
• DOLAP (Desktop OLAP) permits a user to download a section of the data from the database or source and work with
that dataset locally, on their desktop.
• Mobile OLAP enables users to access and work on OLAP data and applications remotely through their mobile devices.
• SOLAP (Spatial OLAP) combines the capabilities of Geographic Information Systems (GIS) and OLAP in a single user
interface. It facilitates the management of both spatial and non-spatial data.
ROLAP | MOLAP
ROLAP stands for Relational Online Analytical Processing. | MOLAP stands for Multidimensional Online Analytical Processing.
It is usually used when the data warehouse contains relational data. | It is used when the data warehouse contains relational as well as non-relational data.
It has a high response time. | It has a lower response time due to prefabricated cubes.