
Module 1 (2)

The document provides an overview of data warehousing and OLAP, detailing the concepts, architecture, and operations involved in data warehouses. It explains the characteristics of data warehouses, such as being subject-oriented, integrated, time-variant, and non-volatile, and contrasts OLAP with OLTP systems. Additionally, it covers the importance of ETL processes, data modeling techniques, and the functionalities of OLAP systems for decision support and data analysis.

Uploaded by

Akhila K T


Module 1

Data warehousing and OLAP

Contents
• Data Warehouse basic concepts
• Data Warehouse Modeling
• Data cube and OLAP
  - Characteristics of OLAP systems
  - Multidimensional view and Data cube
  - Data Cube Implementations
  - Data Cube operations
  - Implementation of OLAP and overview on OLAP software
  - Typical OLAP Operations
What is Data Warehouse?
• A data warehouse is a centralized repository that stores large volumes
of data from multiple sources for analysis and reporting. Unlike
traditional databases, which are optimized for transactional processing,
data warehouses are specifically designed to support complex queries,
data analysis, and reporting, enabling organizations to make data-driven
decisions.

• "A data warehouse is a subject-oriented, integrated, time-variant, and
nonvolatile collection of data in support of management's decision-making
process." (W. H. Inmon)

• Data Warehousing – Process of constructing and using data warehouses

Data Warehouse – Subject Oriented
• Data is organized around specific subjects or areas of interest, such
as sales, finance, or customer information, rather than individual
transactions
• Focuses on the modeling and analysis of data for decision makers,
not on daily operations or transaction processing
• Provides a simple and concise view of particular subject issues by
excluding data that are not useful in the decision support process
Data Warehouse - Integrated
• Constructed by integrating multiple, heterogeneous data sources
  - relational databases, flat files, online transaction records
• Data cleaning and data integration techniques are applied
  - Ensure consistency in naming conventions, encoding structures,
  attribute measures, etc. among different data sources
  - E.g., hotel price: currency, tax, whether breakfast is covered, etc.
• When data is moved to the warehouse, it is converted into a single, consistent representation
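The hotel-price example can be made concrete with a small transformation step that maps each source's convention onto one warehouse convention. This is a minimal sketch, assuming two hypothetical sources (source A quotes EUR prices without tax; source B quotes tax-inclusive USD prices that may bundle breakfast). The exchange rate, tax rate, and breakfast price are made-up illustration values, not real data.

```python
from dataclasses import dataclass

# Hypothetical conversion constants, for illustration only.
EUR_TO_USD = 1.10
TAX_RATE = 0.10
BREAKFAST_USD = 15.0  # assumed tax-inclusive breakfast price


@dataclass
class HotelPrice:
    """Warehouse convention: USD, tax included, room only (no breakfast)."""
    hotel: str
    price_usd: float


def from_source_a(rec: dict) -> HotelPrice:
    # Source A (hypothetical): EUR, tax excluded, breakfast never included.
    usd = rec["price_eur"] * EUR_TO_USD * (1 + TAX_RATE)
    return HotelPrice(rec["name"], round(usd, 2))


def from_source_b(rec: dict) -> HotelPrice:
    # Source B (hypothetical): USD, tax included, breakfast sometimes bundled in.
    usd = rec["total_usd"] - (BREAKFAST_USD if rec["breakfast"] else 0.0)
    return HotelPrice(rec["hotel_name"], round(usd, 2))


integrated = [
    from_source_a({"name": "Hotel Alpha", "price_eur": 100.0}),
    from_source_b({"hotel_name": "Hotel Beta", "total_usd": 140.0, "breakfast": True}),
]
for row in integrated:
    print(row)  # both rows now follow one currency, tax and breakfast convention
```

Each source gets its own conversion function, so adding a new source means adding one more mapping rather than changing the warehouse format.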

Data Warehouse – Time Variant
• The time horizon for the data warehouse is significantly longer than that of
operational systems
• Operational database: current value data
• Data warehouse data: provide information from a historical perspective (e.g.,
past 5-10 years)
• Every key structure in the data warehouse contains an element of time,
explicitly or implicitly
• The key of operational data, by contrast, may or may not contain a time element
Data Warehouse – Non Volatile
• A physically separate store of data transformed from the operational
environment
• Operational update of data does not occur in the data warehouse
environment
• Does not require transaction processing, recovery, and concurrency
control mechanisms
• Requires only two operations in data accessing:
• initial loading of data and access of data
OLTP vs. OLAP
| Feature | OLTP | OLAP |
|---|---|---|
| users | clerk, IT professional | knowledge worker |
| function | day-to-day operations | decision support |
| DB design | application-oriented | subject-oriented |
| data | current, up-to-date; detailed, flat relational; isolated | historical; summarized, multidimensional; integrated, consolidated |
| usage | repetitive | ad hoc |
| access | read/write; index/hash on primary key | lots of scans |
| unit of work | short, simple transaction | complex query |
| # records accessed | tens | millions |
| # users | thousands | hundreds |
| DB size | 100 MB-GB | 100 GB-TB |
| metric | transaction throughput | query throughput, response time |
Why a Separate Data Warehouse?
• High performance for both systems
• DBMS— tuned for OLTP: access methods, indexing, concurrency control, recovery
• Warehouse—tuned for OLAP: complex OLAP queries, multidimensional view,
consolidation
• Different functions and different data:
• missing data: Decision support requires historical data which operational DBs do
not typically maintain
• data consolidation: decision support requires consolidation (aggregation, summarization) of
data from heterogeneous sources
• data quality: different sources typically use inconsistent data representations,
codes and formats which have to be reconciled
• Note: There are more and more systems which perform OLAP analysis directly on
relational databases
Operational Data Stores(ODS)
• An ODS is designed to provide a consolidated view of the enterprise’s
current operational information
• An ODS has been defined by Inmon and Imhoff (1996) as follows:
"An Operational Data Store is a subject-oriented, integrated, volatile, current-valued
data store, containing only corporate detailed data"
  - Subject-oriented (e.g., a university: students, lecturers and courses)
  - Integrated
  - Volatile: data changes as new information refreshes the ODS
  - Detailed

An ODS may be viewed as the enterprise's short-term memory

ODS Contd..
ODS – Reporting tool for administrative purposes (e.g., sales totals, orders filled)
ODS – Product and location codes
ODS – CRM (Customer Relationship Management)
ODS Design and Implementation
Why a Separate Database
The ODS should be kept separate from the operational databases because, from time
to time, complex queries are likely to degrade the performance of the OLTP
systems.

The OLTP systems have to provide a quick response to operational users and
business cannot afford to have response time suffer when a manager is
running a complex query.
Data Mart
A Data Mart is a subset of a data warehouse that is designed to focus on a
specific area or department of an organization, such as sales, finance,
marketing, or human resources.

Data marts are typically smaller in scope than a full enterprise data
warehouse (EDW) and are optimized to meet the needs of specific users or
business functions.
Data Warehouse: A Multi-Tiered Architecture
Three Data Warehouse Models
• Enterprise warehouse
• collects all of the information about subjects spanning the entire
organization
• Data Mart
• a subset of corporate-wide data that is of value to a specific group of
users. Its scope is confined to specific, selected groups, such as a marketing
data mart
• Independent vs. dependent (directly from warehouse) data mart
• Virtual warehouse
• A set of views over operational databases
• Only some of the possible summary views may be materialized
Extraction, Transformation, and Loading (ETL)
• Data extraction
• get data from multiple, heterogeneous, and external sources
• Data cleaning
• detect errors in the data and rectify them when possible
• Data transformation
• convert data from legacy or host format to warehouse format
• Load
• sort, summarize, consolidate, compute views, check integrity, and build
indices and partitions
• Refresh
• propagate the updates from the data sources to the warehouse
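The ETL steps listed above can be sketched end to end with Python's built-in `sqlite3` module standing in for the warehouse. This is an illustrative sketch only: the two source extracts, their field names, and the DD/MM/YYYY date format of the web source are hypothetical assumptions. A refresh would re-run the same pipeline on new or changed source records.

```python
import sqlite3

# Hypothetical source extracts: same subject (sales), different conventions.
store_csv_rows = [
    {"date": "2024-01-05", "item": " TV ", "amount": "1200.50"},
    {"date": "2024-01-05", "item": "Phone", "amount": ""},  # missing measure
]
web_orders = [
    {"order_day": "05/01/2024", "product": "tv", "usd": 800.0},  # assumed DD/MM/YYYY
]


def extract():
    """Extraction: pull raw records from each (hypothetical) source."""
    for rec in store_csv_rows:
        yield "store", rec
    for rec in web_orders:
        yield "web", rec


def clean_and_transform(source, rec):
    """Cleaning and transformation into one warehouse format
    (ISO date, lower-case item name, float amount). Returns None to reject a record."""
    if source == "store":
        if not rec["amount"].strip():
            return None  # the missing amount cannot be repaired, so the record is rejected
        day, item, amount = rec["date"], rec["item"], float(rec["amount"])
    else:
        d, m, y = rec["order_day"].split("/")
        day, item, amount = f"{y}-{m}-{d}", rec["product"], rec["usd"]
    return (day, item.strip().lower(), amount, source)


def load(conn, rows):
    """Load: insert rows and build an index (summaries or views could also be built here)."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (day TEXT, item TEXT, amount REAL, source TEXT)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)
    conn.execute("CREATE INDEX IF NOT EXISTS idx_sales_item ON sales(item)")
    conn.commit()


conn = sqlite3.connect(":memory:")
rows = []
for source, rec in extract():
    row = clean_and_transform(source, rec)
    if row is not None:
        rows.append(row)
load(conn, rows)
print(conn.execute("SELECT item, SUM(amount) FROM sales GROUP BY item").fetchall())
```

Because both sources are normalized to the same item spelling and date format before loading, the two TV sales end up in one group.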
Metadata Repository
• Metadata is the data defining warehouse objects. It stores:
• Description of the structure of the data warehouse
• schema, view, dimensions, hierarchies, derived data definitions, data mart locations and
contents
• Operational meta-data
• data lineage (history of migrated data and transformation path), currency of data
(active, archived, or purged), monitoring information (warehouse usage statistics, error
reports, audit trails)
• The algorithms used for summarization
• The mapping from operational environment to the data warehouse
• Data related to system performance
• warehouse schema, view and derived data definitions
• Business data
Conceptual Modeling of Data Warehouses
• Modeling data warehouses: dimensions & measures
• Star schema: A fact table in the middle connected to a set of dimension
tables
• Snowflake schema: A refinement of star schema where some
dimensional hierarchy is normalized into a set of smaller dimension
tables, forming a shape similar to snowflake
• Fact constellations: Multiple fact tables share dimension tables, viewed
as a collection of stars, therefore called galaxy schema or fact
constellation
Example of Star Schema
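As a stand-in for the star-schema figure, the sketch below creates one fact table surrounded by time, item, and location dimension tables in an in-memory SQLite database and runs a typical star join. The table layout follows the dimension attributes used elsewhere in these slides (item_name, brand, type; city, province_or_state, country), but the key names and all rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: one row per member, holding descriptive attributes.
CREATE TABLE time_dim     (time_key INTEGER PRIMARY KEY, day TEXT, month TEXT, quarter TEXT, year INTEGER);
CREATE TABLE item_dim     (item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT, type TEXT);
CREATE TABLE location_dim (location_key INTEGER PRIMARY KEY, city TEXT, province_or_state TEXT, country TEXT);

-- Fact table in the middle: a foreign key to every dimension plus numeric measures.
CREATE TABLE sales_fact (
    time_key     INTEGER REFERENCES time_dim(time_key),
    item_key     INTEGER REFERENCES item_dim(item_key),
    location_key INTEGER REFERENCES location_dim(location_key),
    dollars_sold REAL,
    units_sold   INTEGER
);

-- Hypothetical sample rows.
INSERT INTO time_dim VALUES (1, '2024-01-15', '2024-01', 'Q1', 2024), (2, '2024-04-02', '2024-04', 'Q2', 2024);
INSERT INTO item_dim VALUES (1, 'TV-42', 'Sony', 'TV'), (2, 'Phone-X', 'Apple', 'Phone');
INSERT INTO location_dim VALUES (1, 'Vancouver', 'British Columbia', 'Canada'), (2, 'Chicago', 'Illinois', 'USA');
INSERT INTO sales_fact VALUES (1, 1, 1, 500.0, 1), (1, 2, 2, 900.0, 1), (2, 1, 2, 450.0, 1);
""")

# A typical star join: group by dimension attributes, aggregate a fact measure.
query = """
SELECT t.quarter, l.country, SUM(f.dollars_sold)
FROM sales_fact f
JOIN time_dim t     ON f.time_key = t.time_key
JOIN location_dim l ON f.location_key = l.location_key
GROUP BY t.quarter, l.country
ORDER BY t.quarter, l.country
"""
print(conn.execute(query).fetchall())
```

Each dimension is a single denormalized table here; a snowflake schema would instead split, for example, `location_dim` into separate city and country tables.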
Example of Snowflake Schema
Example of Fact Constellation (Galaxy schema)
Data Warehouse Implementation
• Centralized
• Distributed

Steps:
• Requirement analysis and capacity planning
• Hardware integration
• Physical modeling
• Sources
• ETL
• Populate the data warehouse
• User application
• Roll-out the warehouse and application
DW Implementation Guidelines
• Build incrementally
• Need a champion
• Senior management support
• Ensure Quality
• Corporate strategy
• Business plan
• Training
• Adaptability
• Joint management
OLAP
• In 1993, E. F. Codd presented this somewhat difficult-to-understand
definition of OLAP:
"OLAP is dynamic enterprise analysis required to create, manipulate,
animate and synthesise information from exegetical, contemplative and
formulaic data analysis models"

Exegetical – the information is manipulated from the point of view of a manager
Contemplative – from the point of view of someone who has thought about it
Formulaic – according to some formula
OLAP (Contd..)
• OLAP is software technology that enables analysts, managers and
executives to gain insight into data through fast, consistent, interactive
access to a wide variety of possible views of information that has been
transformed from raw data to reflect the real dimensionality of the
enterprise

• OLAP is fast analysis of shared multidimensional information for advanced
analysis.
• This definition, known as FASMI (Fast Analysis of Shared Multidimensional
Information), implies that most OLAP queries should be answered within seconds.
CHARACTERISTICS OF OLAP SYSTEMS
• Users – select group of managers/dozens of users
• Functions – ad hoc driven and often much more complex operations.
• Nature – Involve complex queries to pull many records at a time and provide
summary/aggregate data to a manager
- OLAP apps often involve data stored in a data warehouse extracted
from many tables i.e., from more than one enterprise data base
• Design – view enterprise information as multidimensional
• Data- require historical data over several years since trends are often
important in decision making
• Kind of use – normally no data updates
FASMI Characteristics
• Fast – OLAP queries are answered very quickly (within seconds)
- Pre-compute the most commonly queried aggregates and compute
the remaining on-the-fly.
• Analytic – provide rich analytic functionality
- queries answered without any programming
• Shared – shared by hundreds of users
- should provide adequate security for confidentiality as well as
integrity
- concurrency control is required
• Multidimensional – whatever OLAP software is used, it must provide a
multidimensional conceptual view of data
FASMI Characteristics (Contd..)
• Information – obtain info from data warehouse
- should be able to handle large amount of input data
Codd’s OLAP Characteristics
• Codd et al.'s 1993 paper listed 12 characteristics (rules) of OLAP systems;
another six were added in 1995.
• All 18 rules are available at https://www.olapreport.com/fasmi.htm

1) Multidimensional conceptual view – helps to carry out slice and dice
operations
2) Accessibility (OLAP as a mediator) – between data sources (e.g., a data
warehouse) and an OLAP front-end
3) Batch extraction vs. interpretive – multidimensional data staging plus
partial precalculation of aggregates in large multidimensional databases.
Codd’s OLAP Characteristics (Contd..)
4) Multi-user support
5) Storing OLAP results – OLAP results data should be kept separate from
source data.
- Read-write OLAP applications should not be implemented directly on
live transaction data if the source systems are supplying data to the OLAP
system directly
6) Extraction of missing values – OLAP should distinguish missing values from
zero values so that aggregates are computed correctly
7) Treatment of missing values – missing values should be ignored by the analysis
8) Uniform reporting performance – increasing the number of dimensions or
database size should not degrade the reporting performance of OLAP system
9) Generic dimensionality – each dimension should be treated as equivalent
in structure as well as operational capabilities
Codd’s OLAP Characteristics (Contd..)
10) Unlimited dimensions and aggregation levels
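Rules 6 and 7 (distinguish missing values from zeros, then ignore the missing ones) matter as soon as an aggregate is computed. A small sketch with hypothetical monthly sales, using `None` to mark "no data recorded" as opposed to a genuine sale value of 0; the encoding is an illustrative choice, not something OLAP products mandate.

```python
from statistics import mean

# Hypothetical monthly sales for one product. None = no data recorded;
# 0.0 = the product was genuinely not sold that month.
monthly_sales = {"Jan": 120.0, "Feb": 0.0, "Mar": None, "Apr": 80.0}


def avg_treating_missing_as_zero(values):
    # Violates rule 6: the missing March value is silently turned into a zero.
    return mean(v if v is not None else 0.0 for v in values)


def avg_ignoring_missing(values):
    # Rules 6 and 7: keep real zeros, skip missing cells.
    present = [v for v in values if v is not None]
    return mean(present) if present else None


vals = list(monthly_sales.values())
print(avg_treating_missing_as_zero(vals))  # 50.0, pulled down by the fake zero
print(avg_ignoring_missing(vals))          # about 66.67, over the three recorded months
```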
Motivations for using OLAP
Examples to illustrate the types of information that OLAP tools can help
in discovering

1) Understanding and improving sales

2) Understanding and reducing costs of doing business
Multi dimensional Data model
• The multidimensional data model is a key concept in data warehousing
and OLAP (Online Analytical Processing) systems, designed to organize and
present data in a way that facilitates efficient querying and reporting. It
represents data in the form of a multi-dimensional structure, often
referred to as a data cube, which allows users to perform complex queries
and analyses on large datasets, particularly for decision-making purposes.
• Key concepts:
1) Dimensions - Dimensions are perspectives or entities with respect to
which an organization wants to keep records.
Dimensions are often organized into hierarchies. For example, in a Time
dimension, data can be analyzed by year, quarter, month, and day.
Multi dimensional Data model (Contd..)
2) Facts - Facts are the numerical measures or metrics that are analyzed in
relation to the dimensions. These could include values like sales revenue,
profit, quantity sold, or any other key performance indicator (KPI).
Facts are stored in a fact table, which typically contains keys referencing
related dimensions, along with the numerical values (metrics) being
measured.

3) Data Cube - The data is stored in a structure called a data cube (even if it
may have more than three dimensions).
The cube allows for multidimensional analysis, enabling users to slice, dice,
drill down, or roll up the data for in-depth analysis.
Multi dimensional Data model (Contd..)
4) Hierarchies - Each dimension can have levels of granularity in the form of
hierarchies. For example, the Time dimension can have a hierarchy of Year →
Quarter → Month → Day. Users can analyze data at different levels of this
hierarchy (e.g., aggregate sales per month vs. sales per year).
From Tables and Spreadsheets to Data Cubes
• A data warehouse is based on a multidimensional data model which views data in the form of a data
cube
• A data cube, such as sales, allows data to be modeled and viewed in multiple dimensions
• Dimension tables, such as item (item_name, brand, type), or time(day, week, month, quarter,
year)
• Fact table contains measures (such as dollars_sold) and keys to each of the related dimension
tables
• In data warehousing literature, an n-D base cube is called a base cuboid. The topmost 0-D cuboid,
which holds the highest level of summarization, is called the apex cuboid. The lattice of cuboids forms
a data cube.
• Lattice - The lattice of cuboids is the structure of all possible cuboids that can be generated from a
multi-dimensional cube, based on different levels of aggregation.
Cube: A Lattice of Cuboids
0-D (apex) cuboid: all
1-D cuboids: time; item; location; supplier
2-D cuboids: (time, item); (time, location); (time, supplier); (item, location); (item, supplier); (location, supplier)
3-D cuboids: (time, item, location); (time, item, supplier); (time, location, supplier); (item, location, supplier)
4-D (base) cuboid: (time, item, location, supplier)
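The lattice above can be generated mechanically: with no concept hierarchies, every subset of the n dimensions is a cuboid, giving 2^n cuboids (16 for the four dimensions shown). A minimal sketch; the `parents` helper is a hypothetical illustration of which finer cuboids a given cuboid can be aggregated from.

```python
from itertools import combinations

DIMENSIONS = ("time", "item", "location", "supplier")


def lattice(dims):
    """All cuboids grouped by dimensionality: level k holds every k-subset of dims."""
    return {k: [frozenset(c) for c in combinations(dims, k)] for k in range(len(dims) + 1)}


def parents(cuboid, dims):
    """Cuboids with exactly one extra dimension; `cuboid` can be computed
    from any of them by aggregating that extra dimension away."""
    return [cuboid | {d} for d in dims if d not in cuboid]


levels = lattice(DIMENSIONS)
for k, cuboids in levels.items():
    names = [", ".join(sorted(c)) or "all" for c in cuboids]
    print(f"{k}-D: {names}")

print(parents(frozenset({"time", "item"}), DIMENSIONS))
```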

A Concept Hierarchy:
Dimension (location)
• A concept hierarchy defines a sequence of mappings from a set of low-level
concepts to higher-level, more general concepts
Concept Hierarchies (Contd..)

(a) A hierarchy for location (total order)
(b) A lattice for time (partial order)

A concept hierarchy that is a total or partial order among attributes in a
database schema is called a schema hierarchy.
Data Cube Measures: Three Categories
• Distributive: if the result derived by applying the function to n aggregate values is
the same as that derived by applying the function on all the data without
partitioning
• E.g., count(), sum(), min(), max()
• Algebraic: if it can be computed by an algebraic function with M arguments (where
M is a bounded integer), each of which is obtained by applying a distributive
aggregate function
• E.g., avg(), min_N(), standard_deviation()
• Holistic: if there is no constant bound on the storage size needed to describe a
subaggregate.
• E.g., median(), mode(), rank()
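The three categories can be checked on a toy dataset split into two partitions (for example, two chunks of a fact table). The data values are arbitrary. The sketch shows that sum() combines exactly, that avg() needs a bounded pair of distributive values (sum, count) per partition, and that median() cannot be recovered from partial medians.

```python
from statistics import median

# Arbitrary base data, split into two partitions of different sizes.
data = [4, 8, 15, 16, 23, 42]
partitions = [[4, 8, 15, 16], [23, 42]]

# Distributive: sum() over the partial sums equals sum() over all data.
total = sum(sum(p) for p in partitions)  # 108, same as sum(data)

# Algebraic: averaging the partial averages is wrong in general ...
naive_avg = sum(sum(p) / len(p) for p in partitions) / len(partitions)  # 21.625
# ... but avg() is a function of M = 2 distributive values per partition: (sum, count).
partial = [(sum(p), len(p)) for p in partitions]
correct_avg = sum(s for s, _ in partial) / sum(c for _, c in partial)  # 18.0

# Holistic: the median of partial medians differs from the true median, and there
# is no constant-size per-partition summary from which the true median follows.
overall_median = median(data)                                # 15.5
median_of_medians = median([median(p) for p in partitions])  # 22.0
print(total, naive_avg, correct_avg, overall_median, median_of_medians)
```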
Multidimensional Data
• Sales volume as a function of product, month, and region
A Sample Data Cube
Cuboids Corresponding to the Cube
Typical OLAP Operations
1) Roll-up (drill-up) – This is like zooming out on the data cube. It is required
when the user needs further abstraction or less detail
• Initially, the location hierarchy was "street < city < province < country".
• On rolling up, the data is aggregated by ascending the location hierarchy from the
level of city to the level of country

2) Drill-down – This is like zooming in on the data. It is the reverse of roll-up.
• Used when the user needs further detail or wants to partition the data more finely
• It adds more detail to the data
• Initially, the time hierarchy was "day < month < quarter < year"
• On drill-down, the time dimension is descended from the level of quarter to the
level of month
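Roll-up and drill-down amount to regrouping by a coarser or finer level of a concept hierarchy. A minimal sketch with hypothetical sales facts stored at (city, month) granularity and illustrative city-to-country and month-to-quarter mappings. Roll-up can start from the current aggregated view, because sum() is distributive. Drill-down must go back to the finer-grained facts.

```python
from collections import defaultdict

# Hypothetical base facts at (city, month) granularity: (city, month, dollars_sold).
facts = [
    ("Vancouver", "2024-01", 100), ("Vancouver", "2024-02", 150),
    ("Toronto", "2024-01", 200), ("Chicago", "2024-04", 300),
]
# Illustrative concept hierarchies: city < country, month < quarter.
CITY_TO_COUNTRY = {"Vancouver": "Canada", "Toronto": "Canada", "Chicago": "USA"}


def month_to_quarter(month):
    year, mm = month.split("-")
    return f"{year}-Q{(int(mm) - 1) // 3 + 1}"


def regroup(view, key_fn):
    """Sum the measure of each cell into the group given by key_fn(cell key)."""
    totals = defaultdict(int)
    for key, value in view.items():
        totals[key_fn(key)] += value
    return dict(totals)


base = {(city, month): v for city, month, v in facts}
current = regroup(base, lambda k: (k[0], month_to_quarter(k[1])))  # (city, quarter)

# Roll-up on location: city -> country, computed from the current view.
rolled_up = regroup(current, lambda k: (CITY_TO_COUNTRY[k[0]], k[1]))
# Drill-down on time: quarter -> month, which needs the finer base data.
drilled_down = regroup(base, lambda k: (k[0], k[1]))

print(current)
print(rolled_up)
print(drilled_down)
```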
Typical OLAP Operations (Contd..)
3) Slice and dice – The slice operation performs a selection on one dimension
of the given cube, resulting in a subcube
• The dice operation defines a subcube by performing a selection on two or
more dimensions

4) Pivot (rotate) – This is used when the user wishes to re-orient the view of
the data cube.
This may involve swapping the rows and columns, or moving one of the
row dimensions into the column dimension.
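Slice, dice, and pivot can be illustrated on a tiny cube stored as a dictionary from (time, item, location) to dollars_sold. All values are hypothetical, and the dictionary representation is only a teaching device, not how OLAP servers store cubes. Slicing on location = "Vancouver" yields the kind of 2-D (time, item) table shown for Vancouver later in these slides.

```python
# A tiny 3-D cube: (time, item, location) -> dollars_sold (hypothetical values).
cube = {
    ("Q1", "TV", "Vancouver"): 100, ("Q1", "Phone", "Vancouver"): 80,
    ("Q2", "TV", "Vancouver"): 120, ("Q1", "TV", "Toronto"): 90,
    ("Q2", "Phone", "Toronto"): 60,
}
TIME, ITEM, LOC = 0, 1, 2


def slice_(cube, dim, value):
    """Slice: select one value on one dimension; that dimension is dropped from the key."""
    return {k[:dim] + k[dim + 1:]: v for k, v in cube.items() if k[dim] == value}


def dice(cube, criteria):
    """Dice: select on two or more dimensions; criteria maps dim index -> allowed values."""
    return {k: v for k, v in cube.items()
            if all(k[d] in allowed for d, allowed in criteria.items())}


def pivot(table):
    """Pivot a 2-D table keyed by (row, column) by swapping rows and columns."""
    return {(c, r): v for (r, c), v in table.items()}


vancouver = slice_(cube, LOC, "Vancouver")  # 2-D: (time, item) -> value
sub = dice(cube, {TIME: {"Q1"}, ITEM: {"TV"}, LOC: {"Toronto", "Vancouver"}})
print(vancouver)
print(sub)
print(pivot(vancouver))  # now keyed by (item, time)
```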
Typical OLAP Operations (Contd..)
Other operations :
5) Drill-across – executes queries involving (i.e., across) more than one fact table.
6) Drill-through – uses relational SQL facilities to drill through the bottom
level of a data cube down to its back-end relational tables
A Star-Net Query Model
• The querying of multidimensional databases can be based on a starnet model.
• A starnet model consists of radial lines emanating from central point, where each
line represents a concept hierarchy for a dimension.
• Each abstraction level in the hierarchy is called a footprint.
• These represent the granularities available for use by OLAP operations such as
drill-down and roll-up.
Design of Data Warehouse: A Business Analysis Framework
• Four views regarding the design of a data warehouse
• Top-down view
• allows selection of the relevant information necessary for the data
warehouse
• Data source view
• exposes the information being captured, stored, and managed by operational
systems
• Data warehouse view
• consists of fact tables and dimension tables
• Business query view
• sees the perspectives of data in the warehouse from the view of end-user
Data Warehouse Design Process
• Top-down, bottom-up approaches or a combination of both
• Top-down: Starts with overall design and planning (mature)
• Bottom-up: Starts with experiments and prototypes (rapid)
• From software engineering point of view
• Waterfall: structured and systematic analysis at each step before proceeding to the next
• Spiral: rapid generation of increasingly functional systems, short turnaround time
• Typical data warehouse design process
• Choose a business process to model, e.g., orders, invoices, etc.
• Choose the grain (atomic level of data) of the business process, e.g., individual transactions
• Choose the dimensions that will apply to each fact table record
• Choose the measures that will populate each fact table record, e.g., dollars_sold and units_sold
Data Warehouse Development: A Recommended Approach
Data Warehouse Usage
• Three kinds of data warehouse applications
• Information processing
• supports querying, basic statistical analysis, and reporting using crosstabs,
tables, charts and graphs
• Analytical processing
• multidimensional analysis of data warehouse data
• supports basic OLAP operations, slice-dice, drilling, pivoting
• Data mining
• knowledge discovery from hidden patterns
• supports associations, constructing analytical models, performing classification
and prediction, and presenting the mining results using visualization tools
From On-Line Analytical Processing (OLAP) to On-Line Analytical Mining (OLAM)
• Why online analytical mining (OLAP mining)?
OLAM integrates OLAP with data mining to mine knowledge in multidimensional
databases. It is particularly important for the following reasons:
• High quality of data in data warehouses
• DW contains integrated, consistent, cleaned data
• Available information processing infrastructure surrounding data warehouses
• ODBC, OLE (object linking and embedding) DB, Web accessing, service
facilities, reporting and OLAP tools
• OLAP-based exploratory data analysis
• Mining with drilling, dicing, pivoting, etc.
• On-line selection of data mining functions
• Integration and swapping of multiple mining functions, algorithms, and tasks
2D Representation
• The 2-D representation shows the AllElectronics sales data for items sold per
quarter in the city of Vancouver. The measure displayed is dollars sold (in
thousands).
3D Representation
• To view the data according to time and item as well as location, for the cities
Chicago, New York, Toronto, and Vancouver
• The measure displayed is dollars sold (in thousands)
• The 3-D data are represented as a series of 2-D tables
3D Data cube
Efficient Data Cube Computation
• Data cube can be viewed as a lattice of cuboids
• The bottom-most cuboid is the base cuboid
• The top-most cuboid (apex) contains only one cell
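• How many cuboids are there in an n-dimensional cube where dimension i has L_i levels?
$T = \prod_{i=1}^{n} (L_i + 1)$
(one is added to each L_i to include the virtual top level, all)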

• Materialization of data cube

• Materialize every (cuboid) (full materialization), none (no materialization), or some
(partial materialization)
• Selection of which cuboids to materialize
• Based on size, sharing, access frequency, etc.
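The cuboid-count formula shows why full materialization quickly becomes impractical. A short calculation; the level counts come from the AllElectronics hierarchies used later in these slides (time: day < month < quarter < year; item: item_name < brand < type; location: street < city < province_or_state < country).

```python
from math import prod


def num_cuboids(levels_per_dim):
    """T = prod(L_i + 1): each dimension sits at one of its L_i hierarchy levels
    or is aggregated away completely (the virtual top level 'all')."""
    return prod(L + 1 for L in levels_per_dim)


print(num_cuboids([1, 1, 1]))  # 8: no hierarchies, so T = 2^n
print(num_cuboids([4, 3, 4]))  # 100: time (4 levels), item (3), location (4)
print(num_cuboids([4] * 10))   # 9765625: ten dimensions with 4 levels each
```

Even this small cube has 100 cuboids, which is why partial materialization selects only some of them based on size, sharing, and access frequency.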
The “Compute Cube” Operator
• Cube definition and computation in DMQL
  define cube sales [item, city, year]: sum (sales_in_dollars)
  compute cube sales
• Transform it into a SQL-like language (with a new operator cube by, introduced by Gray et al. '96)
  SELECT item, city, year, SUM (amount)
  FROM SALES
  CUBE BY item, city, year
  (In standard SQL, from SQL:1999 onward, this is written GROUP BY CUBE (item, city, year).)
• Need to compute the following group-bys:
  (item, city, year),
  (item, city), (item, year), (city, year),
  (item), (city), (year),
  ()
  (The same pattern holds for a cube on date, product, customer: (date, product, customer),
  (date, product), (date, customer), (product, customer), (date), (product), (customer), ().)
• Total number of cuboids computed for this data cube is 2^3 = 8
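The effect of CUBE BY can be reproduced in a few lines: compute SUM(amount) for every one of the 2^3 subsets of (item, city, year). This sketch uses hypothetical SALES rows and, for clarity, scans the base rows once per group-by. Practical cube algorithms instead compute coarser cuboids from already-computed finer ones.

```python
from collections import defaultdict
from itertools import combinations

DIMS = ("item", "city", "year")
# Hypothetical base rows of SALES: (item, city, year, amount).
sales = [
    ("TV", "Vancouver", 2023, 100.0),
    ("TV", "Toronto", 2023, 50.0),
    ("Phone", "Vancouver", 2024, 70.0),
]


def compute_cube(rows, dims):
    """SUM(amount) for every subset of dims, i.e. all 2^n group-bys of CUBE BY.
    Result: {grouped dimension names: {grouping values: sum}}."""
    cube = {}
    for k in range(len(dims), -1, -1):  # base cuboid first, apex last
        for group in combinations(range(len(dims)), k):
            agg = defaultdict(float)
            for row in rows:
                agg[tuple(row[i] for i in group)] += row[-1]
            cube[tuple(dims[i] for i in group)] = dict(agg)
    return cube


cube = compute_cube(sales, DIMS)
print(len(cube))          # 8 cuboids
print(cube[("item",)])    # {('TV',): 150.0, ('Phone',): 70.0}
print(cube[()])           # {(): 220.0}, the apex cuboid
```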
Indexing OLAP Data
• To facilitate efficient data accessing, most data warehouse systems
support index structures and materialized views (using cuboids)
• The bitmap indexing method is popular in OLAP products because it
allows quick searching in data cubes
• The bitmap index is an alternative representation of the
record_ID(RID) list.
Indexing OLAP Data: Bitmap Index
• Index on a particular column
• Each value in the column has a bit vector: bit-op is fast
• The length of the bit vector: # of records in the base table
• The i-th bit is set if the i-th row of the base table has the value for the indexed column
• Suitable for low cardinality domains
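The bitmap-index bullets can be illustrated with Python integers serving as arbitrary-length bit vectors, where bit i stands for row i of the base table. The two low-cardinality columns are hypothetical. A conjunctive selection then becomes a single bitwise AND.

```python
# Hypothetical low-cardinality columns of a base table (row i = position i).
region = ["Asia", "Europe", "Asia", "America", "Europe", "Asia"]
item_type = ["TV", "TV", "Phone", "Phone", "TV", "TV"]


def build_bitmap_index(column):
    """One bit vector per distinct value; bit i is set iff row i has that value."""
    index = {}
    for i, value in enumerate(column):
        index[value] = index.get(value, 0) | (1 << i)
    return index


def rows_of(bits):
    """Decode a bit vector back into a list of row IDs (RIDs)."""
    return [i for i in range(bits.bit_length()) if bits >> i & 1]


region_idx = build_bitmap_index(region)
type_idx = build_bitmap_index(item_type)

# region = 'Asia' AND type = 'TV' is one bitwise AND of two bit vectors.
hits = region_idx["Asia"] & type_idx["TV"]
print(rows_of(hits))  # [0, 5]
```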
Limitations of OLAP cubes
• OLAP requires restructuring of data into a star/snowflake schema
• There is a limit to the number of dimensions (fields) a single OLAP cube can hold
• It is nearly impossible to access transactional data in the OLAP cube
• Changes to an OLAP cube require a full update of the cube, which is a
lengthy process
Indexing OLAP Data: Join Indices
• The join indexing method gained popularity from its use in relational
database query processing
• Traditional indexing maps the value in a given column to a list of rows having
that value
• In contrast, join indexing registers the joinable rows of two relations from a
relational database
• For example, suppose two relations R(RID, A) and S(B, SID) join on the attributes A
and B. Then the join index record contains the pair (RID, SID), where RID and
SID are record identifiers from the R and S relations, respectively. Hence, the
join index records can identify joinable tuples without performing costly join
operations
• Join indexing is especially useful for maintaining the relationship between a
foreign key and its matching primary keys, from the joinable relation
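The R(RID, A) / S(B, SID) example can be sketched directly: the join index is built once as a list of (RID, SID) pairs, and later lookups read those pairs instead of re-running the join. The relation contents are hypothetical. In a star schema, R would play the fact table and S a dimension table.

```python
# Hypothetical relations R(RID, A) and S(B, SID), joinable on R.A = S.B.
R = [(1, "Vancouver"), (2, "Toronto"), (3, "Vancouver")]      # (RID, A)
S = [("Vancouver", 101), ("Chicago", 102), ("Toronto", 103)]  # (B, SID)


def build_join_index(r, s):
    """Precompute the (RID, SID) pairs of joinable tuples once."""
    sids_by_b = {}
    for b, sid in s:
        sids_by_b.setdefault(b, []).append(sid)
    return [(rid, sid) for rid, a in r for sid in sids_by_b.get(a, [])]


join_index = build_join_index(R, S)
print(join_index)  # [(1, 101), (2, 103), (3, 101)]

# Later: all R rows joinable with S tuple 101, found without performing the join.
print([rid for rid, sid in join_index if sid == 101])  # [1, 3]
```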
Indexing OLAP Data: Join Indices (Contd..)
• The star schema model of data warehouses makes join indexing attractive for
cross-table search, because the linkage between a fact table and its
corresponding dimension tables comprises the fact table's foreign key and
the dimension table's primary key.
Indexing OLAP Data: Join Indices (Contd..)
• Linkages between a sales fact table and location, item dimension
tables
Indexing OLAP Data: Join Indices (Contd..)
• Join index tables based on the linkages between the sales fact table
and the location and item dimension tables shown in figure below
Efficient Processing OLAP Queries
• The purpose of materializing cuboids and constructing OLAP index
structures is to speed up the query processing in data cubes.
• Given materialized views, query processing should proceed as follows:
1) Determine which operations should be performed on the available
cuboids:
This involves transforming any selection, projection, roll-up (group-by),
and drill-down operations specified in the query into corresponding
SQL and/or OLAP operations
For example, slicing and dicing of a data cube may correspond to
selection and/or projection operations on a materialized cuboid
Efficient Processing OLAP Queries
2) Determine to which materialized cuboid(s) the relevant operations should
be applied:
• This involves identifying all of the materialized cuboids that may potentially
be used to answer the query,
• pruning the above set using knowledge of “dominance” relationships among
the cuboids,
• estimating the costs of using the remaining materialized cuboids, and
• selecting the cuboid with the least cost.
Example: Suppose that we define a data cube for AllElectronics of the
form "sales [time, item, location]: sum(sales_in_dollars)".
The dimension hierarchies used are "day < month < quarter < year" for time,
"item_name < brand < type" for item, and
"street < city < province_or_state < country" for location.
The query to be processed is on {brand, province_or_state}, with the selection constant
"year = 2000", and there are 4 materialized cuboids available:
1) {year, item_name, city}
2) {year, brand, country}
3) {year, brand, province_or_state}
4) {item_name, province_or_state} where year = 2000
Which of the above four cuboids should be selected to process the query?
• Cuboid 2 cannot be selected, since country is a more general concept than
province_or_state
• Cuboids 1, 3 and 4 can be used to process the query, since
1) They have the same set or a superset of the dimensions in the query
2) The query's selection clause (year = 2000) can be applied to the cuboid, or
(for cuboid 4) is already implied by it
3) The abstraction levels for the item and location dimensions in these cuboids
are at the same or a finer level than brand and province_or_state, respectively

How would the costs of each cuboid compare if used to process the query?
• Cuboid 1 would cost the most, since item_name and city are at a lower (finer) level than brand and province_or_state
• If there are not many year values associated with items in the cube, but there
are several item_names for each brand, then cuboid 3 will be smaller than 4
• If efficient indices are available for cuboid 4, then cuboid 4 may be a better
choice
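The usability test from this example can be written as a small check over the concept hierarchies. This sketch covers only the "can this cuboid answer the query?" step: each dimension the query needs must be present at the same or a finer level, unless the cuboid was already restricted to the same selection. It does not estimate costs. The `can_answer` function and its argument layout are hypothetical illustrations.

```python
# Hierarchy levels, finest first, from the AllElectronics example.
HIERARCHY = {
    "time": ["day", "month", "quarter", "year"],
    "item": ["item_name", "brand", "type"],
    "location": ["street", "city", "province_or_state", "country"],
}
# level name -> (dimension, position in hierarchy; smaller = finer)
LEVEL_OF = {lvl: (dim, i) for dim, lvls in HIERARCHY.items() for i, lvl in enumerate(lvls)}


def can_answer(cuboid_levels, fixed_selection, query_levels, query_selection):
    """True if a materialized cuboid can answer the query."""
    available = {LEVEL_OF[lvl][0]: LEVEL_OF[lvl][1] for lvl in cuboid_levels}
    # Group-by levels, plus selection levels not already applied inside the cuboid.
    needed = list(query_levels) + [k for k in query_selection if k not in fixed_selection]
    for lvl in needed:
        dim, idx = LEVEL_OF[lvl]
        if dim not in available or available[dim] > idx:  # missing, or too coarse
            return False
    # A selection already applied in the cuboid must match the query's selection.
    return all(query_selection.get(k) == v for k, v in fixed_selection.items())


query = (["brand", "province_or_state"], {"year": 2000})
cuboids = {
    1: (["year", "item_name", "city"], {}),
    2: (["year", "brand", "country"], {}),
    3: (["year", "brand", "province_or_state"], {}),
    4: (["item_name", "province_or_state"], {"year": 2000}),
}
usable = [cid for cid, (levels, fixed) in cuboids.items() if can_answer(levels, fixed, *query)]
print(usable)  # [1, 3, 4]; cuboid 2 fails because country is coarser than province_or_state
```

Choosing among cuboids 1, 3 and 4 would then require the size and index estimates discussed above.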

BOOKLET 2023 - Computing & Programming
No ratings yet
BOOKLET 2023 - Computing & Programming
21 pages
VB12
No ratings yet
VB12
110 pages
Zebra RFD90 Rfid Sled
No ratings yet
Zebra RFD90 Rfid Sled
4 pages
Getting Started With Javaserver Faces 1.2, Part 1:: Building Basic Applications
No ratings yet
Getting Started With Javaserver Faces 1.2, Part 1:: Building Basic Applications
49 pages
Juno Di System Update Version 1.12: © 2012 Roland Corp. U.S
No ratings yet
Juno Di System Update Version 1.12: © 2012 Roland Corp. U.S
3 pages
RRU3908 V2 Hardware Description (V100 - 02)
50% (2)
RRU3908 V2 Hardware Description (V100 - 02)
46 pages
2025 GRANDIOSE MOCK - Computing 2
No ratings yet
2025 GRANDIOSE MOCK - Computing 2
7 pages
4.3 - Content-Based and Geographic Addressing
No ratings yet
4.3 - Content-Based and Geographic Addressing
1 page
Chapter 3 Project
No ratings yet
Chapter 3 Project
7 pages
IS Course
No ratings yet
IS Course
2 pages
Blast Load - Intergraph CADWorx & Analysis
No ratings yet
Blast Load - Intergraph CADWorx & Analysis
4 pages
Distributor Life Cycle Management Identification.1
No ratings yet
Distributor Life Cycle Management Identification.1
7 pages
Creating A Fillable Form On Word
No ratings yet
Creating A Fillable Form On Word
3 pages
QUESTION BANK FOR DCA I SEM Fundamentals of Computer (101) Hi-Tech Institute of Computers
100% (2)
QUESTION BANK FOR DCA I SEM Fundamentals of Computer (101) Hi-Tech Institute of Computers
2 pages
The Data Warehouse Advantage
From Everand
The Data Warehouse Advantage
Pasquale De Marco
No ratings yet
Practical Data Strategies and Recipes
From Everand
Practical Data Strategies and Recipes
Tom Henricksen
No ratings yet
Learn Data Warehousing in 24 Hours
From Everand
Learn Data Warehousing in 24 Hours
Alex Nordeen
No ratings yet


