Unit 1 - DWM

Q. What is data warehousing?

(4m)
A data warehouse is a subject-oriented, integrated, time-variant, and non-volatile collection of data that supports strategic decision-making.
Q. Differentiate between OLAP and OLTP.
1. Operational database (Online Transaction Processing [OLTP]): insert, update, and delete operations are always going on.
2. Data warehouse (Online Analytical Processing [OLAP])

Feature            | OLTP                                                      | OLAP
Users              | Clerk, IT professional                                    | Knowledge worker
Function           | Day-to-day operations                                     | Decision support
DB design          | Application-oriented                                      | Subject-oriented
Data               | Current, up-to-date, detailed, flat relational, isolated | Historical, summarized, multidimensional, integrated, consolidated
Usage              | Repetitive                                                | Ad hoc
Access             | Read/write, index/hash                                    | Lots of scans
Unit of work       | Short, simple transaction                                 | Complex query
# records accessed | Tens                                                      | Millions
# users            | Thousands                                                 | Hundreds
DB size            | 100 MB to GB                                              | 100 GB to TB
Metric             | Transaction throughput                                    | Query throughput, response time
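To make the contrast concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `orders` table, its columns, and the sample rows are hypothetical. The OLTP-style step updates one record inside a short transaction, while the OLAP-style step scans and summarizes every record.

```python
import sqlite3

# Hypothetical sales table shared by both workloads.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, year INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (region, year, amount) VALUES (?, ?, ?)",
    [("North", 2022, 100.0), ("South", 2022, 250.0),
     ("North", 2023, 175.0), ("South", 2023, 300.0)],
)
conn.commit()

# OLTP: a short, simple read/write transaction that touches one record
# through the primary-key index.
with conn:  # commits on success, rolls back on error
    conn.execute("UPDATE orders SET amount = amount + 25 WHERE id = ?", (1,))

# OLAP: an ad hoc, read-only query that scans all records and summarizes them.
summary = conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()
print(summary)  # [('North', 300.0), ('South', 550.0)]
```

In a real system the two workloads would usually run on separate databases (operational DB vs warehouse); one in-memory database is used here only to keep the sketch self-contained.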

Data warehouse back-end tools and utilities
• Data extraction
  - Get data from multiple, heterogeneous, and external sources
• Data cleaning
  - Detect errors in the data and rectify them when possible
• Data transformation
  - Convert data from legacy or host format to warehouse format
• Load
  - Sort, summarize, consolidate, compute views, check integrity, and build indices and partitions
• Refresh
  - Propagate the updates from the data sources to the warehouse
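The extract, clean, transform, and load steps can be sketched as a tiny pipeline. This is an illustrative sketch only: the two source lists, the field names, and the cleaning rules (drop rows with a missing measure, fill a missing city with "UNKNOWN") are hypothetical assumptions, and the warehouse is an in-memory SQLite database.

```python
import sqlite3

# Extraction: rows from two hypothetical, heterogeneous sources.
crm_rows = [{"cust": " Alice ", "city": "pune", "sales": "120"},
            {"cust": "Bob", "city": "", "sales": "80"}]
legacy_rows = [("CAROL", "MUMBAI", 200), ("bob", "Delhi", None)]

def extract():
    """Bring both sources into one common record shape (dicts)."""
    rows = list(crm_rows)
    rows += [{"cust": c, "city": city, "sales": s} for c, city, s in legacy_rows]
    return rows

def clean(rows):
    """Detect errors and rectify them where possible (assumed rules)."""
    cleaned = []
    for r in rows:
        if r["sales"] in (None, ""):
            continue  # a missing measure cannot be rectified, so drop the row
        city = r["city"].strip() or "UNKNOWN"  # fill a missing field
        cleaned.append({**r, "city": city})
    return cleaned

def transform(rows):
    """Convert to the warehouse format: trimmed title-case text, numeric sales."""
    return [(r["cust"].strip().title(), r["city"].title(), float(r["sales"]))
            for r in rows]

def load(conn, rows):
    """Load into the warehouse table, then build an index."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (cust TEXT, city TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.execute("CREATE INDEX IF NOT EXISTS idx_city ON sales (city)")
    conn.commit()

warehouse = sqlite3.connect(":memory:")
load(warehouse, transform(clean(extract())))
print(warehouse.execute("SELECT * FROM sales ORDER BY cust").fetchall())
# [('Alice', 'Pune', 120.0), ('Bob', 'Unknown', 80.0), ('Carol', 'Mumbai', 200.0)]
```

Refresh would repeat these steps for the source rows that changed since the last load; that incremental logic is left out of this sketch.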

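Normalization
• Min-max normalization: a linear transformation that maps a value v of attribute A from the range [min_A, max_A] to a new range [new_min_A, new_max_A]:
  $v' = \dfrac{v - \min_A}{\max_A - \min_A}\,(\text{new\_max}_A - \text{new\_min}_A) + \text{new\_min}_A$
  where v = the value to change
• Z-score (zero-mean) normalization: values of A are normalized using the mean $\bar{A}$ and the standard deviation $\sigma_A$ of A:
  $v' = \dfrac{v - \bar{A}}{\sigma_A}$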
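Both formulas translate directly into code. A minimal sketch, assuming a default target range of [0, 1] for min-max and the population standard deviation for the z-score (some texts use the sample standard deviation instead); the income values are illustrative.

```python
import statistics

def min_max(values, new_min=0.0, new_max=1.0):
    """Min-max normalization: linearly map [min_A, max_A] onto [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("undefined when max_A == min_A")
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]

def z_score(values):
    """Z-score (zero-mean) normalization: (v - mean_A) / std_A."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)  # population standard deviation (assumption)
    if std == 0:
        raise ValueError("undefined when all values are equal")
    return [(v - mean) / std for v in values]

incomes = [12000, 54000, 73600, 98000]
print(min_max(incomes))  # 73,600 maps to about 0.716
print(z_score([2, 4, 4, 4, 5, 5, 7, 9]))  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```

With min_A = 12,000 and max_A = 98,000, the value 73,600 becomes (73,600 - 12,000) / 86,000, which is about 0.716.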

OLAP

OLAP servers are of three types: ROLAP, MOLAP, and HOLAP.

ROLAP: Relational Online Analytical Processing
MOLAP: Multidimensional Online Analytical Processing
HOLAP: Hybrid Online Analytical Processing (combines ROLAP and MOLAP)

S.No | ROLAP                                                          | MOLAP
1    | ROLAP stands for Relational Online Analytical Processing.      | MOLAP stands for Multidimensional Online Analytical Processing.
2    | ROLAP is used for large data volumes.                          | MOLAP is used for limited data volumes.
3    | Access in ROLAP is slow.                                       | Access in MOLAP is fast.
4    | In ROLAP, data is stored in relational tables.                 | In MOLAP, data is stored in a multidimensional array.
5    | In ROLAP, data is fetched from the data warehouse.             | In MOLAP, data is fetched from MDDBs (multidimensional databases).
6    | In ROLAP, complicated SQL queries are used.                    | In MOLAP, sparse matrix technology is used.
7    | In ROLAP, a static multidimensional view of data is created.   | In MOLAP, a dynamic multidimensional view of data is created.

Basis                                    | ROLAP                               | MOLAP
Storage location for summary aggregation | Relational database                 | Multidimensional database
Processing time                          | Very slow                           | Fast
Storage space requirement                | Large (compared to MOLAP and HOLAP) | Medium (smaller than ROLAP, larger than HOLAP)
Storage location for detail data         | Relational database                 | Multidimensional database
Latency                                  | Low (compared to MOLAP and HOLAP)   | High (compared to ROLAP and HOLAP)
Query response time                      | Slow (compared to MOLAP and HOLAP)  | Fast (compared to ROLAP and HOLAP)
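The storage difference between the two can be illustrated with a toy example. This is a simplified sketch, not how a real OLAP server is built: the facts are hypothetical, the ROLAP side is an SQLite table queried with SQL, and the MOLAP side is a plain Python nested list standing in for a multidimensional array (real MOLAP engines use compressed or sparse array storage).

```python
import sqlite3

# Hypothetical facts: (region, year, amount).
facts = [("North", 2022, 100.0), ("South", 2022, 250.0),
         ("North", 2023, 175.0), ("South", 2023, 300.0)]

# ROLAP style: facts live in a relational table; aggregates come from SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, year INTEGER, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", facts)
rolap_total = conn.execute(
    "SELECT SUM(amount) FROM sales WHERE region = 'North'"
).fetchone()[0]

# MOLAP style: facts are pre-arranged in a multidimensional array indexed
# by dimension position, so any cell is a direct positional lookup.
regions = ["North", "South"]
years = [2022, 2023]
cube = [[0.0 for _ in years] for _ in regions]
for region, year, amount in facts:
    cube[regions.index(region)][years.index(year)] += amount
molap_total = sum(cube[regions.index("North")])

print(rolap_total, molap_total)  # 275.0 275.0
```

The array answers lookups quickly but reserves a cell for every combination of dimension values, which is why MOLAP relies on sparse-matrix techniques when most combinations are empty.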
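Multidimensional data model (data cube and cuboids)

Schemas for the multidimensional model:
1. Star schema: a fact table in the middle connected to a set of dimension tables.
2. Snowflake schema: a refinement of the star schema in which some dimension tables are normalized into a set of smaller tables.
3. Fact constellation: multiple fact tables share dimension tables (also called a galaxy schema).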
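A star schema can be sketched in SQLite as one fact table holding foreign keys and measures, surrounded by dimension tables. All table names, columns, and rows here are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables (hypothetical names and columns).
CREATE TABLE dim_time (time_key INTEGER PRIMARY KEY, year INTEGER, quarter TEXT);
CREATE TABLE dim_item (item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT);
CREATE TABLE dim_location (location_key INTEGER PRIMARY KEY, city TEXT, country TEXT);

-- Fact table in the middle: a foreign key to each dimension plus the measures.
CREATE TABLE fact_sales (
    time_key INTEGER REFERENCES dim_time(time_key),
    item_key INTEGER REFERENCES dim_item(item_key),
    location_key INTEGER REFERENCES dim_location(location_key),
    units_sold INTEGER,
    dollars_sold REAL
);

INSERT INTO dim_time VALUES (1, 2023, 'Q1'), (2, 2023, 'Q2');
INSERT INTO dim_item VALUES (1, 'Laptop', 'A'), (2, 'Phone', 'B');
INSERT INTO dim_location VALUES (1, 'Pune', 'India'), (2, 'Mumbai', 'India');
INSERT INTO fact_sales VALUES
    (1, 1, 1, 3, 1500.0), (1, 2, 2, 10, 2000.0),
    (2, 1, 2, 2, 1000.0), (2, 2, 1, 5, 1000.0);
""")

# A typical star join: the fact table joined to the dimensions a query needs.
rows = conn.execute("""
    SELECT t.quarter, i.item_name, SUM(f.dollars_sold)
    FROM fact_sales f
    JOIN dim_time t ON f.time_key = t.time_key
    JOIN dim_item i ON f.item_key = i.item_key
    GROUP BY t.quarter, i.item_name
    ORDER BY t.quarter, i.item_name
""").fetchall()
print(rows)
# [('Q1', 'Laptop', 1500.0), ('Q1', 'Phone', 2000.0),
#  ('Q2', 'Laptop', 1000.0), ('Q2', 'Phone', 1000.0)]
```

In a snowflake version, `dim_location` might be split into separate city and country tables; a fact constellation would add a second fact table (for example a hypothetical `fact_shipping`) that reuses these same dimension tables.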
OLAP operations
Roll-up operator
• Performs aggregation on a data cube, either by climbing up a concept hierarchy for a dimension or by dimension reduction.
Drill-down operator
• The reverse of roll-up. It can be realized either by stepping down a concept hierarchy for a dimension or by introducing additional dimensions.
Pivot (rotate), also called transpose
• A visualization operation that rotates the data axes in view in order to provide an alternative presentation of the data.
Slice operation
• Performs a selection on one dimension of the given cube, resulting in a sub-cube.
Dice operation
• Defines a sub-cube by performing a selection on two or more dimensions.
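The operations above can be sketched on a tiny cube stored as a Python dict keyed by (city, quarter, item). The cells, the city-to-state concept hierarchy, and the function names are all hypothetical; a real OLAP server would run these operations on its own cube storage.

```python
from collections import defaultdict

# Hypothetical base cuboid: (city, quarter, item) -> dollars sold.
cube = {
    ("Pune", "Q1", "Laptop"): 1500,
    ("Pune", "Q2", "Phone"): 1000,
    ("Mumbai", "Q1", "Laptop"): 2000,
    ("Mumbai", "Q2", "Phone"): 500,
    ("Delhi", "Q1", "Laptop"): 700,
}
# Assumed concept hierarchy for the location dimension: city -> state.
city_to_state = {"Pune": "Maharashtra", "Mumbai": "Maharashtra", "Delhi": "Delhi"}

def roll_up_location(data):
    """Roll-up: climb the location hierarchy from city to state, summing cells."""
    out = defaultdict(int)
    for (city, quarter, item), value in data.items():
        out[(city_to_state[city], quarter, item)] += value
    return dict(out)

def slice_time(data, quarter):
    """Slice: fix one value of the time dimension; the result is (city, item)."""
    return {(city, item): v for (city, q, item), v in data.items() if q == quarter}

def dice(data, cities, quarters):
    """Dice: select on two dimensions (location and time), keeping a sub-cube."""
    return {k: v for k, v in data.items() if k[0] in cities and k[1] in quarters}

def pivot(view):
    """Pivot: swap the two axes of a 2-D view, e.g. (city, item) -> (item, city)."""
    return {(b, a): v for (a, b), v in view.items()}

q1 = slice_time(cube, "Q1")
print(q1)
print(pivot(q1))
print(roll_up_location(cube))
print(dice(cube, {"Pune", "Mumbai"}, {"Q2"}))
```

Rolling up merges the Pune and Mumbai Q1 laptop cells into one Maharashtra cell worth 3,500. Drilling down is the reverse: going back from the state-level result to the city-level base cuboid.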
Data warehouse information flow

Inflow: the processes that bring data from the source systems into the warehouse
Cleaning includes removing inconsistencies, adding missing fields, and cross-checking for data integrity.
Transforming includes adding date/time stamp fields, summarizing detailed data, and deriving new fields to store calculated data.

Upflow: the process that adds value to the data in the warehouse through
• Summarizing
  - Choose, project, join, and group data
  - Summarize: identify trends, clustering, sampling
• Packaging
  - Converting the data into summarized information: spreadsheets, documents, charts, graphs, databases, animations, etc.
• Distribution of the data to groups to increase its availability and accessibility
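A minimal sketch of these flows, under hypothetical assumptions: already-cleaned rows arrive (inflow), get a load time stamp and a derived `revenue` field, are then grouped by region (upflow summarizing), and are packaged as CSV text that a spreadsheet can open (upflow packaging). A fixed time stamp is used so the output is reproducible.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical cleaned rows arriving in the warehouse.
rows = [
    {"region": "North", "units": 3, "unit_price": 500.0},
    {"region": "South", "units": 10, "unit_price": 200.0},
    {"region": "North", "units": 2, "unit_price": 500.0},
]

def transform(rows, loaded_at):
    """Inflow transform: add a date/time stamp and derive a calculated field."""
    return [{**r, "loaded_at": loaded_at.isoformat(),
             "revenue": r["units"] * r["unit_price"]} for r in rows]

def summarize(rows):
    """Upflow summarizing: group the detailed rows by region."""
    totals = defaultdict(float)
    for r in rows:
        totals[r["region"]] += r["revenue"]
    return dict(sorted(totals.items()))

def package_csv(summary):
    """Upflow packaging: turn the summary into spreadsheet-friendly CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["region", "revenue"])
    writer.writerows(summary.items())
    return buf.getvalue()

stamped = transform(rows, datetime(2024, 1, 1, tzinfo=timezone.utc))
print(package_csv(summarize(stamped)))
# region,revenue
# North,2500.0
# South,2000.0
```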

Bitmap Indices
• A special type of index used where a column has only a small number of distinct values (low cardinality). Ex. Gender: M, F
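A bitmap index keeps one bit vector per distinct value, with one bit per row; bit i is 1 exactly when row i holds that value. Conditions on several columns are then answered with fast bitwise AND/OR. A minimal sketch, using Python integers as arbitrary-length bit vectors; the `gender` and `region` columns are hypothetical.

```python
def build_bitmap_index(column):
    """Map each distinct value to a bit vector: bit i is set when row i holds it."""
    index = {}
    for row, value in enumerate(column):
        index[value] = index.get(value, 0) | (1 << row)
    return index

def rows_where(bitmap):
    """Decode a bit vector back into the list of matching row numbers."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]

gender = ["M", "F", "F", "M", "F"]
region = ["N", "N", "S", "S", "N"]
g_idx = build_bitmap_index(gender)
r_idx = build_bitmap_index(region)

# "gender = F AND region = N" becomes a single bitwise AND of two vectors.
print(rows_where(g_idx["F"] & r_idx["N"]))  # [1, 4]
```

Because there is one vector per distinct value, the index stays small only when the number of distinct values is small, which is why bitmap indices suit columns like gender.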
