Unit-1.1 Data Warehouse

A data warehouse is a specialized database model designed for analyzing large, multidimensional data sets by integrating data from various sources for decision-making purposes. It differs from traditional databases in that it is subject-oriented, time-variant, and non-volatile, allowing for historical data analysis and strategic decision support. Key components of a data warehouse include source data, data staging, storage, metadata, and management control, all of which work together to provide a comprehensive and efficient analytical environment.

Uploaded by

kirpabajaj2

DATA WAREHOUSE
INTRODUCTION
 A data warehouse is a powerful database model that significantly
enhances the user’s ability to quickly analyze large,
multidimensional data sets.
 A data warehouse is constructed by integrating data from multiple
heterogeneous sources to support analytical reporting, structured
and/or ad hoc queries, and decision making.
 Data warehousing involves data cleaning, data integration, and data
consolidation.
DATABASE SYSTEM VS. DATA WAREHOUSE
Parameter | Data Warehouse | Database
Workloads | Analytical | Transactional and operational
Characteristics | It is subject-focused: it provides information on a particular subject rather than on a company's current activities. The data must also be stored in the data warehouse in a common, universally accepted format. | Removes redundancy and offers security. It allows for multiple views of the data.
Data Type | Stores both historical and current data. The data may be out of date. | The data in the database is kept up to date.
Orientation | Might not be up to date; depends on the frequency of ETL processes. | Real-time
Purpose | Designed to analyze | Designed to record
Tables and Joins | Tables and joins are straightforward because they are denormalized. | Tables and joins are complicated because they are normalized.
Availability | Data is updated from source systems when needed. | Data is available in real time.
Technique | Analyze data | Capture data
Query Type | Complex queries are used for analytical purposes. | Simple transaction queries are used.
Schema Flexibility | Fixed and pre-defined schema definition for ingest. | Flexible or rigid schema, depending on the type of database.
Users | Data scientists and business analysts | Application developers
Processing Method | Uses OLAP (Online Analytical Processing). | Uses OLTP (Online Transaction Processing).
Storage Limit | Data from any number of applications is stored. | Generally confined to a particular application.
Usage | Data modeling approaches are employed for design. It lets you analyze your enterprise. | ER modeling approaches are employed for design. It aids in the execution of basic business procedures.
Applications | Healthcare, airlines, retail chains, insurance, banking, and telecommunication. | Banking, universities, airlines, finance, telecommunication, manufacturing, sales and production, and HR management.
NEED OF DATA WAREHOUSING
 As an enterprise grows larger, hundreds of computer applications are
needed to support the various business processes. These applications are
effective in what they are designed to do. They gather, store, and process
all the data needed to successfully perform the daily operations. They
provide online information and produce a variety of reports to monitor and
run the business.
 The operational computer systems did provide information to run the day-
to-day operations, but what the executives needed were different kinds of
information that could be readily used to make strategic decisions.
1) Business users: Business users need a data warehouse to view
summarized historical data. Since these users are non-technical,
the data should be presented to them in a simple form.
2) Storing historical data: A data warehouse is needed to store
time-variant data from the past, which can then be used for various
purposes.
3) Making strategic decisions: Some strategies depend on the data in
the data warehouse, so the data warehouse contributes to strategic
decision making.
4) Data consistency and quality: By bringing data from different
sources into a common place, the organization can enforce uniformity
and consistency in the data.
5) Quick response time: A data warehouse has to be ready for somewhat
unexpected loads and types of queries, which demands a significant
degree of flexibility and quick response time.
BUILDING BLOCKS OF DATA WAREHOUSE
SOURCE DATA COMPONENT
 Production Data: This type of data comes from the different operational
systems of the enterprise. Based on the data requirements in the data
warehouse, we choose segments of the data from the various operational
systems.
 Internal Data: In each organization, users keep their "private"
spreadsheets, reports, customer profiles, and sometimes even departmental
databases. This is the internal data, part of which could be useful in a data
warehouse.
 Archived Data: Operational systems are mainly intended to run the
current business. In every operational system, we periodically take the old
data and store it in archived files.
 External Data: Most executives depend on external sources for a large
percentage of the information they use. They use statistics relating to
their industry that are produced by external agencies.
DATA STAGING COMPONENT
 After we have extracted data from
various operational systems and external
sources, we have to prepare the files for
storing in the data warehouse.
 The extracted data coming from several
different sources needs to be changed,
converted, and made ready in a format
that is suitable for querying
and analysis.
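As a rough illustration of the staging step, the sketch below converts records from two sources into one common format. The two source layouts, the field names, and the helper functions are hypothetical and chosen only for this example; they do not come from any particular ETL tool.

```python
from datetime import datetime

# Hypothetical extracts from two operational sources with different layouts.
crm_rows = [{"cust": " Alice ", "amount": "1,200.50", "date": "03/01/2024"}]
erp_rows = [{"customer_name": "BOB", "total": 310.0, "sold_on": "2024-01-05"}]


def clean_name(raw):
    """Trim whitespace and normalize capitalization."""
    return raw.strip().title()


def to_common_format(row, source):
    """Convert one source row into the common staging layout."""
    if source == "crm":
        return {
            "customer": clean_name(row["cust"]),
            # Amounts arrive as text with thousands separators.
            "amount": float(row["amount"].replace(",", "")),
            # CRM dates are assumed to be day/month/year.
            "sale_date": datetime.strptime(row["date"], "%d/%m/%Y").date().isoformat(),
        }
    if source == "erp":
        return {
            "customer": clean_name(row["customer_name"]),
            "amount": float(row["total"]),
            # ERP dates are assumed to be ISO formatted already.
            "sale_date": row["sold_on"],
        }
    raise ValueError(f"unknown source: {source}")


# Stage both extracts into one list with a single, consistent layout.
staged = [to_common_format(r, "crm") for r in crm_rows]
staged += [to_common_format(r, "erp") for r in erp_rows]
```

After this step every staged row has the same three keys, so it can be loaded into warehouse storage without source-specific handling.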
DATA STORAGE COMPONENT
INFORMATION DELIVERY COMPONENT
 The information delivery
component is used to enable the
process of subscribing to data
warehouse files and having them
transferred to one or more
destinations according to some
customer-specified scheduling
algorithm.
METADATA COMPONENT
 Metadata in a data warehouse is analogous to the data dictionary or the data
catalog in a database management system.
 In the data dictionary, we keep the data about the logical data structures,
the data about the records and addresses, the information about the
indexes, and so on.
DATA MARTS
 A data mart includes a subset of corporate-wide data that is of value to a
specific group of users. The scope is confined to particular selected subjects.
 Data in a data warehouse should be fairly current, but not necessarily up to
the minute, although developments in the data warehouse industry have
made standard and incremental data dumps more achievable.
 Data marts are smaller than data warehouses and usually contain only a
subset of the organization's data.
 The current trend in data warehousing is to develop a data warehouse
with several smaller related data marts for particular kinds of queries and
reports.
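A data mart can be pictured as a subject-specific slice of the warehouse. The following minimal sketch assumes each warehouse row carries a hypothetical `subject` field; the rows and the `build_data_mart` helper are invented for illustration.

```python
# Hypothetical warehouse rows covering more than one subject area.
warehouse_rows = [
    {"subject": "sales", "region": "North", "amount": 120.0},
    {"subject": "hr", "employee": "E17", "leave_days": 4},
    {"subject": "sales", "region": "South", "amount": 80.0},
]


def build_data_mart(rows, subject):
    """Copy only the rows that belong to one subject into a smaller store."""
    return [dict(row) for row in rows if row["subject"] == subject]


# The sales data mart holds a subset of the corporate-wide data.
sales_mart = build_data_mart(warehouse_rows, "sales")
```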
MANAGEMENT AND CONTROL COMPONENT
 The management and control elements coordinate the services and
functions within the data warehouse.
 These components control the data transformation and the data transfer
into the data warehouse storage.
 On the other hand, it moderates the data delivery to the clients. It works
with the database management system and ensures that data is
correctly saved in the repositories.
 It monitors the movement of information into the staging area and from
there into the data warehouse storage itself.
FEATURES OF DATA WAREHOUSE
SUBJECT-ORIENTED
 A data warehouse focuses
on the modeling and
analysis of data for
decision-makers.
 Therefore, data
warehouses typically
provide a concise and
straightforward view
around a particular
subject, such as
customer, product, or
sales, rather than the
organization's overall
ongoing operations.
INTEGRATED
 A data warehouse integrates
various heterogeneous data
sources like RDBMS, flat files,
and online transaction records.
 Data cleaning and integration
must be performed during
data warehousing to ensure
consistency in naming
conventions, attribute types,
etc., among the different data
sources.
TIME-VARIANT
 Historical information is kept in a data warehouse.
 For example, one can retrieve data from 3 months, 6 months, 12 months, or
even longer ago from a data warehouse.
 This contrasts with a transaction system, where often only the most
current data is kept.
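The difference between keeping only the current value and keeping dated history can be sketched as follows. The account identifier, the dates, and the two dictionaries are hypothetical stand-ins for a transaction system and a warehouse.

```python
from datetime import date

# Transaction system: only the most current value per account is kept.
operational = {}
# Warehouse: every load is kept, keyed by (account, snapshot date).
warehouse = {}


def record_balance(account, as_of, balance):
    """Store a new balance in both systems."""
    operational[account] = balance          # overwrites the previous value
    warehouse[(account, as_of)] = balance   # adds a new dated snapshot


def balance_as_of(account, as_of):
    """Return the latest warehouse snapshot on or before `as_of`, or None."""
    dates = [d for (acct, d) in warehouse if acct == account and d <= as_of]
    return warehouse[(account, max(dates))] if dates else None


record_balance("A1", date(2024, 1, 31), 100.0)
record_balance("A1", date(2024, 4, 30), 250.0)
record_balance("A1", date(2024, 7, 31), 180.0)

current = operational["A1"]                          # only the latest value
historical = balance_as_of("A1", date(2024, 5, 15))  # value as of mid-May
```

Here the transaction system can answer only "what is the balance now?", while the warehouse can also answer "what was the balance at an earlier date?".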
NON-VOLATILE
 The data warehouse is a physically separate data store, populated with data
transformed from the source operational RDBMS.
 The operational updates of data do not occur in the data warehouse, i.e.,
update, insert, and delete operations are not performed.
 It usually requires only two data operations: the initial loading of data
and access to data.
 Therefore, the DW does not require transaction processing, recovery, and
concurrency capabilities, which allows for substantial speedup of data retrieval.
 Non-volatile means that, once entered into the warehouse, data should not
change.
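A minimal sketch of the load-and-read-only idea is shown below. The `ReadOnlyWarehouseTable` class is a toy invented for this example, not the interface of any real warehouse product; it simply offers a load operation and a query operation and nothing else.

```python
class ReadOnlyWarehouseTable:
    """Toy table that supports only loading and reading rows."""

    def __init__(self):
        self._rows = []

    def load(self, rows):
        """Append a batch of rows; existing rows are never modified."""
        self._rows.extend(dict(row) for row in rows)

    def query(self, predicate=lambda row: True):
        """Return copies of matching rows so callers cannot alter stored data."""
        return [dict(row) for row in self._rows if predicate(row)]


table = ReadOnlyWarehouseTable()
table.load([{"product": "Widget", "amount": 100.0}])

# Changing a query result does not change what the warehouse holds.
result = table.query()
result[0]["amount"] = 0.0
stored = table.query(lambda row: row["product"] == "Widget")
```

Because no update or delete method exists, the sketch needs no locking or rollback logic, which mirrors the point made above about transaction processing and recovery.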
A THREE TIER DATA WAREHOUSE ARCHITECTURE
 Data Warehouses usually have a three-level (tier) architecture that
includes:
 Bottom Tier (Data Warehouse Server)
 Middle Tier (OLAP Server)
 Top Tier (Front end Tools).
BOTTOM-TIER
 A bottom tier that consists of the data warehouse server, which is
almost always an RDBMS. It may include several specialized data
marts and a metadata repository.
 Data from operational databases and external sources (such as user
profile data provided by external consultants) are extracted using
application program interfaces known as gateways. A gateway is
provided by the underlying DBMS and allows client programs to
generate SQL code to be executed at a server.
 Examples of gateways include ODBC (Open Database Connectivity)
and OLE DB (Object Linking and Embedding, Database), by
Microsoft, and JDBC (Java Database Connectivity).
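Python's DB-API plays a role loosely comparable to ODBC or JDBC: a client program builds SQL and passes it through a standard interface for the database engine to execute. The sketch below uses the standard-library `sqlite3` module with an in-memory database standing in for an operational source; the `orders` table, its columns, and the `extract` helper are hypothetical.

```python
import sqlite3

# An in-memory database stands in for an operational source system.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "North", 120.0), (2, "South", 80.5), (3, "North", 40.0)],
)


def extract(connection, region):
    """Send a parameterized SQL query through the interface and fetch the rows."""
    sql = (
        "SELECT order_id, region, amount FROM orders "
        "WHERE region = ? ORDER BY order_id"
    )
    return connection.execute(sql, (region,)).fetchall()


north_orders = extract(conn, "North")
```

The client code never touches the storage format directly; it only generates SQL and receives result rows, which is the essence of the gateway idea.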
MIDDLE-TIER
 A middle tier that consists of an OLAP server for fast querying
of the data warehouse.
 The OLAP server is implemented using either
1) A Relational OLAP (ROLAP) model, i.e., an extended
relational DBMS that maps functions on multidimensional data
to standard relational operations.
2) A Multidimensional OLAP (MOLAP) model, i.e., a special-
purpose server that directly implements multidimensional
data and operations.
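The two OLAP models can be contrasted with a small sketch. The ROLAP half maps a roll-up by region onto a relational `GROUP BY` using the standard-library `sqlite3` module; the MOLAP half stores cells in a dictionary keyed by dimension values and aggregates over them directly. The fact rows, the dimension names, and the dictionary-based cube are simplifying assumptions; real OLAP servers use far more elaborate storage.

```python
import sqlite3
from collections import defaultdict

# Hypothetical fact rows: (quarter, region, product, amount).
facts = [
    ("2024-Q1", "North", "Widget", 100.0),
    ("2024-Q1", "South", "Widget", 60.0),
    ("2024-Q2", "North", "Gadget", 40.0),
    ("2024-Q2", "North", "Widget", 75.0),
]

# ROLAP style: the multidimensional roll-up becomes a relational GROUP BY.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (quarter TEXT, region TEXT, product TEXT, amount REAL)"
)
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", facts)
rolap_by_region = dict(
    conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region").fetchall()
)

# MOLAP style: cells live in a structure indexed directly by dimension values.
cube = defaultdict(float)
for quarter, region, product, amount in facts:
    cube[(quarter, region, product)] += amount


def roll_up_by_region(cells):
    """Aggregate away the quarter and product dimensions."""
    totals = defaultdict(float)
    for (_quarter, region, _product), amount in cells.items():
        totals[region] += amount
    return dict(totals)


molap_by_region = roll_up_by_region(cube)
```

Both approaches give the same totals per region; they differ in where the multidimensional logic lives (SQL over tables versus operations on a cube structure).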
TOP-TIER
 A top tier that contains front-end tools for displaying results
provided by OLAP, as well as additional tools for data mining of the
OLAP-generated data.
 The metadata repository stores information that defines DW
objects. It includes the following parameters and information for the
middle-tier and top-tier applications:
 A description of the DW structure, including the warehouse schema,
dimensions, hierarchies, data mart locations and contents, etc.
 Operational metadata, which usually describes the currency level of
the stored data (i.e., active, archived, or purged) and warehouse
monitoring information (i.e., usage statistics, error reports, audit
trails, etc.).
 System performance data, which includes indices used to improve
data access and retrieval performance.
 Information about the mapping from operational databases, which
covers the source RDBMSs and their contents, cleaning and
transformation rules, etc.
 Summarization algorithms, predefined queries, and reports.
 Business data, which includes business terms and definitions, ownership
information, etc.
METADATA
 Metadata is information about the data in the data warehouse.
 It includes information about the data sources, data transformations,
data models, and other information that is needed to manage and
use the data warehouse.
 The metadata repository is a database that stores this information.
 For example, a line in a sales database may contain:
4030 KJ732 299.90
 This data is meaningless until we consult the metadata, which tells us
that it represents:
 Model number: 4030
 Sales agent ID: KJ732
 Total sales amount: $299.90
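The way metadata gives meaning to a raw record can be sketched in a few lines. The `field_metadata` list and the `interpret` helper are hypothetical; the record and the three field meanings are taken from the example above.

```python
from decimal import Decimal

# Hypothetical metadata: one entry per field, in the order the fields appear.
field_metadata = [
    {"name": "model_number", "type": int},
    {"name": "sales_agent_id", "type": str},
    {"name": "total_sales_amount", "type": Decimal},
]


def interpret(line, metadata):
    """Attach names and types from the metadata to a whitespace-separated record."""
    values = line.split()
    if len(values) != len(metadata):
        raise ValueError("record does not match the metadata description")
    return {
        field["name"]: field["type"](value)
        for field, value in zip(metadata, values)
    }


record = interpret("4030 KJ732 299.90", field_metadata)
```

Without `field_metadata`, the three values are just tokens; with it, each value has a name and a type that downstream tools can rely on.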
 Metadata helps to answer the following questions
 What tables, attributes, and keys does the Data Warehouse
contain?
 Where did the data come from?
 How often does the data get reloaded?
 What transformations were applied during cleansing?
 Metadata can be classified into the following categories:
 1. Technical Metadata: This kind of metadata contains information
about the warehouse that is used by data warehouse designers and
administrators.
 2. Business Metadata: This kind of metadata contains detail that
gives end users an easy way to understand the information stored in the
data warehouse.
