Unit 2 Data Warehouse New

This document provides an overview of data warehouse and OLAP technology. It discusses key concepts such as the components of a data warehouse including operational data sources, operational data stores, load managers, warehouse managers, query managers, and end user access tools. It also covers ETL processes, data warehouse architectures including star schemas and snowflake schemas, differences between OLTP and OLAP systems, and considerations for data warehouse implementation and conceptual modeling.

Uploaded by

SUMAN SHEKHAR
Copyright
© All Rights Reserved

Unit 2

Data Warehouse and OLAP Technology


• A data warehouse is simply a single, complete and consistent store of
data obtained from a variety of sources and made available to end
users in a way they can understand and use in a business context.
• A data warehouse is a subject-oriented, integrated, time-variant and
nonvolatile collection of data in support of management's decision-
making process.
Data warehouse- subject oriented
• Oriented to the major subject areas of the corporation that have been
defined in the data model.
• For example, for an insurance company: customer, product,
transaction or activity, policy, claim, account, etc.
Data warehouse-Integrated
• Encoding and naming conventions are not consistent across
different data sources.
• The data sources are heterogeneous.
• When data is moved to the warehouse, it is converted to a consistent form.
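As a minimal sketch of this conversion, assume two hypothetical sources that encode gender differently ("M"/"F" in one, 1/0 in the other). The source names, the mapping table and the field names are illustrative assumptions, not taken from any specific system.

```python
# Hypothetical per-source encodings mapped to one warehouse convention.
GENDER_MAP = {
    "crm": {"M": "male", "F": "female"},
    "billing": {1: "male", 0: "female"},
}

def integrate(source, record):
    """Return a copy of record with gender converted to the warehouse encoding."""
    converted = dict(record)
    converted["gender"] = GENDER_MAP[source][record["gender"]]
    return converted

rows = [
    integrate("crm", {"id": 1, "gender": "F"}),
    integrate("billing", {"id": 2, "gender": 1}),
]
```

After conversion, both rows use the same encoding and can be compared or joined directly.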
Data warehouse- nonvolatile
• Operational data is regularly accessed and manipulated a record at a
time, and updates are made to data in the operational environment.
• Warehouse data, by contrast, is loaded and accessed but not updated in place.
Data warehouse- time variance
• The time horizon for the data warehouse is significantly longer than
that of operational systems.
• Operational database: current-value data.
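The contrast between current-value data and a longer time horizon can be sketched as follows. This is only an illustration under assumed names (`operational`, `warehouse_history`, `record_balance`): the operational store overwrites its value, while the warehouse appends dated snapshots.

```python
operational = {"C1": 500}   # operational store: current value only
warehouse_history = []      # warehouse: append-only, dated snapshots

def record_balance(customer, balance, as_of):
    """Overwrite the operational value and append a dated snapshot to the warehouse."""
    operational[customer] = balance
    warehouse_history.append({"customer": customer, "balance": balance, "as_of": as_of})

record_balance("C1", 700, "2024-01-31")
record_balance("C1", 650, "2024-02-29")
```

The operational store now holds only the latest balance, whereas the history keeps one row per snapshot date.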
Building blocks or components
• Metadata: good metadata is essential to the effective operation of a
data warehouse, and it is used in data collection, data transformation
and data access.
• Metadata maps the translation of information from the operational
system to the analytical system.
Data marts
• Data marts are smaller than data warehouses and generally contain
information from a single department of a business or organisation.
The current trend in data warehousing is to develop a data
warehouse with several smaller related data marts for specific kinds
of queries and reports.
Security
• As with any information system, the security of data is determined by the
hardware, software and procedures that created it. The
reliability and authenticity of the data and information extracted from
the warehouse will be a function of the reliability and authenticity of
the warehouse and the various source systems.
Construction
• The steps in planning a data warehouse are identical to the steps
for any other type of computer application. Users must be involved to
determine the scope of the warehouse and what business
requirements need to be met.
Why a warehouse
• Two approaches:
• 1. Query-driven (lazy)
• 2. Warehouse (eager)
• The traditional research approach is query-driven (lazy, on demand).
Disadvantages of query driven approach
• Delay in query processing
• Slow or unavailable information sources
• Complex filtering and integration
• Inefficient and potentially expensive for frequent queries
• Competes with local processing at sources
• Has not caught on in industry
The warehousing approach
• Information is integrated in advance
• Stored in the warehouse for direct querying and analysis
• Advantages of warehousing approach
• High query performance
• but not necessarily most current information
• does not interfere with local processing at sources
• complex queries at warehouse.
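The trade-off between the two approaches can be sketched in a few lines. The two in-memory "sources" and the function names are hypothetical: the lazy function visits every source at query time, while the eager one integrates once in advance and then answers from the stored copy.

```python
# Two hypothetical sources, each holding sales per region.
SOURCE_A = {"north": 100, "south": 50}
SOURCE_B = {"north": 20, "east": 70}

def query_driven_total(region):
    """Lazy: visit every source at query time and integrate on demand."""
    return sum(src.get(region, 0) for src in (SOURCE_A, SOURCE_B))

def build_warehouse():
    """Eager: integrate all sources in advance into one store."""
    warehouse = {}
    for src in (SOURCE_A, SOURCE_B):
        for region, amount in src.items():
            warehouse[region] = warehouse.get(region, 0) + amount
    return warehouse

warehouse = build_warehouse()
SOURCE_A["north"] = 130  # a later change at the source

lazy_total = query_driven_total("north")  # sees the current source value
eager_total = warehouse["north"]          # fast lookup, but reflects the last load
```

The warehouse lookup does not touch the sources at query time, which is why it neither competes with local processing nor necessarily returns the most current value.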
Data warehouse architectures
• 1. Single layer
• Every data element is stored once only
• Virtual warehouse
• 2. Two layer
• Real-time + derived data
• Most commonly used approach in industry today
• 3. Three layer
• Transformation of real-time data to derived data really requires two steps: view
level 'particular informational needs'
• Physical implementation of the data warehouse
Data warehouse architecture
Data warehouse components
• 1. Operational data sources: data for the data warehouse is supplied from
mainframe operational data held in first-generation hierarchical and
network databases, departmental data held in file systems, private
data held on workstations and private servers, and external systems
such as the internet, commercially available databases or databases
associated with an organisation's suppliers or customers.
• 2. Operational data store (ODS): a repository of current and
integrated operational data used for analysis. It is often structured
and supplied with data in the same way as the data warehouse, but
may in fact simply act as a staging area for data to be moved into the
warehouse.
• 3. Load manager: also called the front-end component, it performs
all the operations associated with the extraction and loading of data
into the warehouse. These operations include simple transformations
of the data to prepare it for entry into the warehouse.
• 4. Warehouse manager: performs all the operations associated with
the management of the data in the warehouse. The operations
performed by this component include analysis of data to ensure
consistency, transformation and merging of source data, creation of
indexes and views, and generation of denormalisations and aggregations.
• 5. Query manager: also called the back-end component, it performs all
the operations associated with the management of user queries. The
operations performed by this component include directing queries to
the appropriate tables and scheduling the execution of queries.
• 6. End-user access tools: can be categorised into five main groups:
data reporting and query tools, application development tools,
executive information system tools, online analytical processing tools
and data mining tools.
• Diagram in data warehouse slide.
Data warehouse implementation
• Includes loading data, implementing transformation programs, designing
the user interface, developing standard queries and reports, and training
warehouse users.
ETL in data warehouse
• The process of extracting data from source systems and bringing it into
the data warehouse is commonly called ETL, which stands for:
• Extraction: to retrieve all the required data from the source system
using as few resources as possible.
• Transformation: applies a set of rules to transform the data from the
source to the target.
• This includes converting any measured data to the same dimension using the same
units so that they can later be joined.
• It also requires joining data from several sources, generating
aggregates, sorting, and deriving new calculated values.
• Loading: to ensure that the load is performed correctly and using as
few resources as possible. The target of the load process is often a
database. Referential integrity needs to be maintained by the ETL tool
to ensure consistency.
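The three ETL stages can be sketched with Python's standard sqlite3 module. This is a minimal illustration, not a production ETL tool: the source rows, the `product_weight` table and the function names are assumptions made for the example.

```python
import sqlite3

# Hypothetical source rows: weights arrive in mixed units.
SOURCE_ROWS = [
    {"product": "rice", "weight": 2.0, "unit": "kg"},
    {"product": "rice", "weight": 500.0, "unit": "g"},
    {"product": "salt", "weight": 250.0, "unit": "g"},
]

def extract():
    """Extraction: read only the required fields from the source."""
    return [(r["product"], r["weight"], r["unit"]) for r in SOURCE_ROWS]

def transform(rows):
    """Transformation: convert every weight to kilograms, then aggregate per product."""
    totals = {}
    for product, weight, unit in rows:
        kg = weight / 1000.0 if unit == "g" else weight
        totals[product] = totals.get(product, 0.0) + kg
    return sorted(totals.items())

def load(conn, rows):
    """Loading: insert the transformed rows in a single transaction."""
    conn.execute("CREATE TABLE product_weight (product TEXT PRIMARY KEY, total_kg REAL)")
    with conn:
        conn.executemany("INSERT INTO product_weight VALUES (?, ?)", rows)

conn = sqlite3.connect(":memory:")
load(conn, transform(extract()))
loaded = conn.execute(
    "SELECT product, total_kg FROM product_weight ORDER BY product"
).fetchall()
```

Converting to a single unit before aggregating is what allows the two rice rows to be added together.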
Advantages of data warehouse implementation
• 1. Better data management and delivery: one of the most important
advantages of using a data warehousing system in the organisation is
efficient data management and delivery. It helps in the storage of all
types of data from different sources in a single base that can be
used for analysis purposes.
• 2. Better decision making: with the use of effective business
intelligence, the management of the organisation can take effective
decisions based on solid data analysis.
• 3. Cost reduction: it helps in avoiding duplication of work, which
ultimately helps in reducing the cost and increasing the efficiency of
the organisation.
• 4. Competitive advantage: as the organisation is able to make effective
decisions, it is ready to outperform its competitors, as it is
able to fully utilise its resources and can focus on activities in a
better way.
Data processing models
• There are two basic data processing models:
• 1. OLTP: the main aim of OLTP is reliable and efficient processing of a
large number of transactions and ensuring data consistency.
• 2. OLAP: the main aim of OLAP is efficient multidimensional
processing of large data volumes.
Traditional OLTP
• Traditionally, DBMSs have been used for online transaction processing (OLTP)
• Order entry: pull up order and update status field
• Banking: transfer one thousand rupees from account X to account Y
• Critical data processing tasks
• Detailed, up-to-date data
• Structured, repetitive tasks
• Short transactions are the unit of work
• Read and update a few records
• Isolation, recovery and integrity are critical
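The banking example can be sketched as a short transaction using Python's standard sqlite3 module. The `account` table, the starting balances and the `transfer` helper are assumptions for illustration; the point is that both updates commit together or not at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("X", 5000), ("Y", 1000)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move amount from src to dst; both updates commit together or not at all."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False

ok = transfer(conn, "X", "Y", 1000)       # succeeds
failed = transfer(conn, "X", "Y", 10000)  # would overdraw X, so the credit to Y is rolled back
balances = dict(conn.execute("SELECT id, balance FROM account"))
```

The second call shows why integrity and recovery matter: the credit to Y has already been applied when the debit fails, and the rollback undoes it.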
OLTP vs OLAP
• OLTP: online transaction processing
• Describes processing at operational sites
• OLAP: online analytical processing
• Describes processing at the warehouse
Comparison of OLTP system and data warehousing system
Conceptual modelling of data warehouse
• Three basic conceptual DBMS schemas:
• Star schema
• Snowflake schema
• Fact constellation
Star schema
• A single object (the fact table) in the middle connected to a number of dimension tables.
• Terms
• Basic notion: a measure (e.g. sales, quantity, etc.)
• Given a collection of numeric measures
• Each measure depends on a set of dimensions (e.g. sales volume as a function of
product, time and location)
• The relation which relates the dimensions to the measure of interest is called the fact table (e.g.
sale)
• Information about dimensions can be represented as a collection of relations called the
dimension tables (product, customer, store)
• Each dimension can have a set of associated attributes.
• Diagram in data warehouse slide
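A star schema can be sketched with Python's standard sqlite3 module. The table and column names (`sale`, `product`, `store`, `amount`) and the sample rows are assumptions chosen to mirror the terms above: one fact table in the middle, referencing two dimension tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (product_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE store   (store_id   INTEGER PRIMARY KEY, city TEXT);
-- Fact table: one row per product per store, holding the numeric measure.
CREATE TABLE sale (
    product_id INTEGER REFERENCES product(product_id),
    store_id   INTEGER REFERENCES store(store_id),
    amount     INTEGER
);
INSERT INTO product VALUES (1, 'pen'), (2, 'book');
INSERT INTO store   VALUES (1, 'Delhi'), (2, 'Pune');
INSERT INTO sale    VALUES (1, 1, 10), (1, 2, 5), (2, 1, 40), (2, 2, 25);
""")

# Sales volume as a function of the store dimension.
sales_by_city = conn.execute("""
    SELECT s.city, SUM(f.amount)
    FROM sale f JOIN store s ON f.store_id = s.store_id
    GROUP BY s.city ORDER BY s.city
""").fetchall()
```

Each analytical query joins the fact table to the dimension it groups by, so any dimension attribute is one join away from the measure.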
Snowflake schema
• A refinement of the star schema where the dimensional hierarchy is
represented explicitly by normalising the dimension tables.
• Diagram in data warehouse slide
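A snowflake schema can be sketched in the same way, again with the standard sqlite3 module. The tables and rows are hypothetical: the product dimension is normalised so that the category level of its hierarchy lives in a separate table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- The product dimension is normalised: category moves to its own table.
CREATE TABLE category (category_id INTEGER PRIMARY KEY, category_name TEXT);
CREATE TABLE product (
    product_id  INTEGER PRIMARY KEY,
    name        TEXT,
    category_id INTEGER REFERENCES category(category_id)
);
CREATE TABLE sale (product_id INTEGER REFERENCES product(product_id), amount INTEGER);
INSERT INTO category VALUES (1, 'stationery'), (2, 'grocery');
INSERT INTO product  VALUES (1, 'pen', 1), (2, 'notebook', 1), (3, 'rice', 2);
INSERT INTO sale     VALUES (1, 10), (2, 30), (3, 100);
""")

# Rolling up to category level takes one extra join along the hierarchy.
sales_by_category = conn.execute("""
    SELECT c.category_name, SUM(f.amount)
    FROM sale f
    JOIN product p  ON f.product_id = p.product_id
    JOIN category c ON p.category_id = c.category_id
    GROUP BY c.category_name ORDER BY c.category_name
""").fetchall()
```

Normalising removes the repeated category name from every product row, at the cost of an additional join per hierarchy level.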
Fact constellation
• Multiple fact tables share dimension tables.
Database design methodology for data warehouse
• 1. Choosing the process
• 2. Choosing the grain
• 3. Identifying and conforming the dimensions
• 4. Choosing the facts
• 5. Storing pre-calculations in the fact table
• 6. Rounding out the dimension tables
• 7. Choosing the duration of the database
• 8. Tracking slowly changing dimensions
• 9. Deciding the query priorities and the query modes
• Choosing the process-
• The process (function) refers to the subject matter of a particular data
mart. The first data mart to be built should be the one that is most
likely to be delivered on time, within budget, and to answer the most
commercially important business questions.
• The best choice for the first data mart tends to be the one that is
related to sales.
• Choosing the grain-
• Choosing the grain means deciding exactly what a fact table record
represents.
• Only when the grain for the fact table is chosen can we identify the
dimensions of the fact table.
• The grain decision for the fact table also determines the grain of each
of the dimension tables.
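The effect of the grain decision can be sketched with a small example. The rows and the `to_daily_grain` helper are hypothetical: the same sales are held first at the grain of one record per individual sale, then at the coarser grain of one record per product per day.

```python
from collections import defaultdict

# Hypothetical facts at the finest grain: one row per individual sale.
line_items = [
    {"day": "2024-01-01", "product": "pen", "qty": 2},
    {"day": "2024-01-01", "product": "pen", "qty": 3},
    {"day": "2024-01-02", "product": "pen", "qty": 1},
]

def to_daily_grain(rows):
    """Coarser grain: one row per product per day."""
    totals = defaultdict(int)
    for r in rows:
        totals[(r["day"], r["product"])] += r["qty"]
    return dict(totals)

daily = to_daily_grain(line_items)
```

Once facts are stored at the daily grain, the individual sales can no longer be recovered, which is why the grain has to be fixed before the dimensions and facts are chosen.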
• Identifying and conforming the dimensions-
• Dimensions set the context for formulating queries about the facts in
the fact table.
• We identify dimensions in sufficient detail to describe things such as
clients and properties at the correct grain.
• Choosing the facts-
• The grain of the fact table determines which facts can be used in the
data mart: all facts must be expressed at the level implied by the
grain.
• Storing pre-calculations in the fact table-
• Once the facts have been selected, they should be re-examined to
determine whether there are opportunities to use pre-calculations,
e.g. a profit or loss statement.
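A pre-calculation of this kind can be sketched as follows, assuming hypothetical `revenue` and `cost` facts: the derived `profit` value is computed once at load time and stored with the row instead of being recomputed by every query.

```python
def with_profit(fact_rows):
    """Store a pre-calculated profit alongside the revenue and cost facts."""
    return [dict(row, profit=row["revenue"] - row["cost"]) for row in fact_rows]

facts = with_profit([
    {"month": "2024-01", "revenue": 900, "cost": 650},
    {"month": "2024-02", "revenue": 700, "cost": 800},
])
```

A negative stored value marks a loss for that month.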
• Rounding out the dimension tables-
• In this step we return to the dimension tables and add as many text
descriptions to the dimensions as possible.
• The text descriptions should be as understandable to the users as
possible.
• Choosing the duration of the data warehouse-
• The duration measures how far back in time the fact table goes.
• For some companies (e.g. insurance companies) there may be a legal
requirement to retain data extending back five or more years.
• Tracking slowly changing dimensions-
• The changing dimension problem means that the proper description
of the old client and the old branch must be used with the old data
warehouse schema.
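One common way to keep the old description available is to store each version of a dimension row with a validity period, often called a Type 2 slowly changing dimension. The slides do not name a technique, so this is a swapped-in illustration with hypothetical field names (`valid_from`, `valid_to`) and data.

```python
from datetime import date

# Each version of a client row carries the period in which it was valid.
client_dim = [
    {"key": 1, "client_id": "C1", "city": "Delhi",
     "valid_from": date(2020, 1, 1), "valid_to": date(2022, 6, 30)},
    {"key": 2, "client_id": "C1", "city": "Pune",
     "valid_from": date(2022, 7, 1), "valid_to": date.max},
]

def version_for(client_id, on_date):
    """Return the dimension row that was valid for client_id on on_date."""
    for row in client_dim:
        if row["client_id"] == client_id and row["valid_from"] <= on_date <= row["valid_to"]:
            return row
    raise LookupError(f"no version of {client_id} on {on_date}")

old_city = version_for("C1", date(2021, 3, 15))["city"]
new_city = version_for("C1", date(2023, 3, 15))["city"]
```

Facts dated before the change resolve to the old description, and later facts resolve to the new one.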
• Deciding the query priorities and the query modes-
• In this step we consider physical design issues:
• The presence of pre-stored summaries and aggregates
• Security issues
• Backup issues, etc.