Unit 1 - DWM
(4m)
A data warehouse is a collection of data that is subject-oriented, integrated, time-variant,
and non-volatile, and that is used to support strategic decisions.
Q. Differentiate between OLAP and OLTP
1. Operational database (online transaction processing [OLTP]) – insert, update, and delete
operations are always going on
2. Data warehouse (online analytical processing [OLAP])
                    OLTP                               OLAP
Users               Clerk, IT professional             Knowledge worker
Function            Day-to-day operations              Decision support
DB design           Application-oriented               Subject-oriented
Data                Current, up-to-date, detailed,     Historical, summarized,
                    flat relational, isolated          multidimensional, integrated,
                                                       consolidated
Usage               Repetitive                         Ad hoc
Access              Read/write, index/hash             Lots of scans
Unit of work        Short, simple transaction          Complex query
# records accessed  Tens                               Millions
# users             Thousands                          Hundreds
DB size             100 MB - GB                        100 GB - TB
Metric              Transaction throughput             Query throughput, response time
Data extraction
o Get data from multiple, heterogeneous, and external sources
Data cleaning
o Detect errors in the data and rectify them when possible
Data transformation
o Convert data from legacy or host format to warehouse format
Load
o Sort, summarize, consolidate, compute views, check integrity, and build indices
and partitions
Refresh
o Propagate the updates from the data source to the warehouse
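The steps above can be sketched as a tiny pipeline. This is only an illustration, not a real ETL tool: the two sources, the `customer` table, the date formats, and the cleaning rules are all made-up assumptions, and SQLite in memory stands in for the warehouse.

```python
import sqlite3

# Hypothetical rows from two heterogeneous sources (all values are illustrative).
crm_rows = [{"id": "1", "name": " Alice ", "joined": "2021-03-05"},
            {"id": "2", "name": "Bob", "joined": ""}]      # missing date
legacy_rows = [("3", "CAROL", "05/04/2020")]                # legacy DD/MM/YYYY format

def extract():
    """Extraction: gather rows from every source into one common shape."""
    rows = [(r["id"], r["name"], r["joined"]) for r in crm_rows]
    rows.extend(legacy_rows)
    return rows

def clean(rows):
    """Cleaning: trim names and drop rows whose date is missing."""
    return [(i, n.strip(), d) for i, n, d in rows if d]

def transform(rows):
    """Transformation: ids to int, names to title case, dates to YYYY-MM-DD."""
    out = []
    for i, n, d in rows:
        if "/" in d:                                  # legacy DD/MM/YYYY
            day, month, year = d.split("/")
            d = f"{year}-{month}-{day}"
        out.append((int(i), n.title(), d))
    return out

def load(conn, rows):
    """Load: sort, insert, and build an index. Re-running it acts as a refresh."""
    conn.execute("CREATE TABLE IF NOT EXISTS customer "
                 "(id INTEGER PRIMARY KEY, name TEXT, joined TEXT)")
    conn.executemany("INSERT OR REPLACE INTO customer VALUES (?, ?, ?)", sorted(rows))
    conn.execute("CREATE INDEX IF NOT EXISTS idx_joined ON customer (joined)")
    conn.commit()

conn = sqlite3.connect(":memory:")
load(conn, transform(clean(extract())))
warehouse = conn.execute("SELECT id, name, joined FROM customer ORDER BY id").fetchall()
print(warehouse)  # [(1, 'Alice', '2021-03-05'), (3, 'Carol', '2020-04-05')]
```

Dropping the row with the missing date is just one possible cleaning rule; a real pipeline might rectify it instead, as the cleaning step above suggests.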
Normalization
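Min-Max normalization
v' = (v - min_A) / (max_A - min_A) * (new_max_A - new_min_A) + new_min_A
v = value to change; min_A, max_A = minimum and maximum of attribute A;
new_min_A, new_max_A = bounds of the target range
Z-score / zero-mean normalization
v' = (v - mean_A) / std_dev_A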
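A minimal sketch of both normalizations, assuming the standard textbook definitions: min-max maps v to (v - min) / (max - min) scaled into a target range, and z-score maps v to (v - mean) / standard deviation. The function names and the sample data are illustrative, and the population standard deviation is an assumption (some texts use the sample version).

```python
from statistics import mean, pstdev

def min_max(values, new_min=0.0, new_max=1.0):
    """Rescale values linearly into [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("min-max normalization needs at least two distinct values")
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]

def z_score(values):
    """Centre values on zero mean and scale by the population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("z-score normalization needs non-constant values")
    return [(v - mu) / sigma for v in values]

data = [10, 20, 30, 40, 50]
scaled = min_max(data)
standardized = z_score(data)
print(scaled)        # [0.0, 0.25, 0.5, 0.75, 1.0]
print(standardized)  # symmetric around 0.0, roughly -1.41 to 1.41
```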
OLAP
ROLAP is used for large data volumes, while MOLAP is used for limited data volumes.
Multidimensional model –
1. Star schema:
A fact table in the middle, connected to a set of dimension tables
2. Snowflake schema
3. Fact constellation
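A small sketch of a star schema, using SQLite in memory. The table names (`fact_sales`, `dim_time`, `dim_item`, `dim_location`), columns, and rows are hypothetical and chosen only to show one fact table joined to its dimension tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: one per descriptive axis
CREATE TABLE dim_time     (time_id INTEGER PRIMARY KEY, year INTEGER, quarter TEXT);
CREATE TABLE dim_item     (item_id INTEGER PRIMARY KEY, item_name TEXT, category TEXT);
CREATE TABLE dim_location (loc_id  INTEGER PRIMARY KEY, city TEXT, country TEXT);

-- Fact table in the middle: foreign keys to each dimension plus the measures
CREATE TABLE fact_sales (
    time_id    INTEGER REFERENCES dim_time(time_id),
    item_id    INTEGER REFERENCES dim_item(item_id),
    loc_id     INTEGER REFERENCES dim_location(loc_id),
    units_sold INTEGER,
    amount     REAL
);

INSERT INTO dim_time     VALUES (1, 2023, 'Q1'), (2, 2023, 'Q2');
INSERT INTO dim_item     VALUES (1, 'Laptop', 'Computers'), (2, 'Phone', 'Mobiles');
INSERT INTO dim_location VALUES (1, 'Pune', 'India'), (2, 'Delhi', 'India');
INSERT INTO fact_sales   VALUES (1, 1, 1, 2, 2000.0), (1, 2, 2, 5, 1500.0),
                                (2, 1, 2, 1, 1000.0);
""")

# A typical star-join query: total amount per quarter and item category.
rows = conn.execute("""
    SELECT t.quarter, i.category, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_time t ON t.time_id = f.time_id
    JOIN dim_item i ON i.item_id = f.item_id
    GROUP BY t.quarter, i.category
    ORDER BY t.quarter, i.category
""").fetchall()
print(rows)
# [('Q1', 'Computers', 2000.0), ('Q1', 'Mobiles', 1500.0), ('Q2', 'Computers', 1000.0)]
```

In the same sketch, a snowflake schema would normalize a dimension further (for example, moving `category` out of `dim_item` into its own table), and a fact constellation would add a second fact table sharing these dimension tables.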
OLAP operations
Roll-up operator
Performs aggregation on a data cube, either by climbing up a concept hierarchy for a
dimension or by dimension reduction
Drill down operator
It can be realized by either stepping down a concept hierarchy for a dimension or
introducing additional dimensions
Transpose = Pivot (rotate)
Visualization operation that rotates the data axes in view in order to provide an
alternative presentation of the data
Slice operation
Performs a selection on one dimension of the given cube, resulting in a sub-cube
Dice operation
Defines a sub-cube by performing a selection on two or more dimensions
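The operations above can be tried on a toy cube. This is a sketch in plain Python, not how an OLAP server implements them: the cube is a dictionary keyed by (quarter, city, item), and the data, the dimension names, and the city-to-country hierarchy are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical cube cells: (quarter, city, item) -> units sold.
cube = {
    ("Q1", "Pune", "Laptop"): 10,
    ("Q1", "Pune", "Phone"): 20,
    ("Q1", "Delhi", "Laptop"): 5,
    ("Q2", "Pune", "Laptop"): 7,
    ("Q2", "Delhi", "Phone"): 8,
}
DIMS = ("quarter", "city", "item")
CITY_TO_COUNTRY = {"Pune": "India", "Delhi": "India"}  # concept hierarchy: city -> country

def roll_up_hierarchy(cube, dim, mapping):
    """Roll-up by climbing a concept hierarchy on one dimension."""
    pos = DIMS.index(dim)
    out = defaultdict(int)
    for key, value in cube.items():
        out[key[:pos] + (mapping[key[pos]],) + key[pos + 1:]] += value
    return dict(out)

def roll_up_drop(cube, dim):
    """Roll-up by dimension reduction: aggregate one dimension away."""
    pos = DIMS.index(dim)
    out = defaultdict(int)
    for key, value in cube.items():
        out[key[:pos] + key[pos + 1:]] += value
    return dict(out)

def slice_cube(cube, dim, value):
    """Slice: fix a single value on one dimension."""
    pos = DIMS.index(dim)
    return {k: v for k, v in cube.items() if k[pos] == value}

def dice_cube(cube, criteria):
    """Dice: select on several dimensions; criteria maps dimension -> allowed values."""
    return {k: v for k, v in cube.items()
            if all(k[DIMS.index(d)] in allowed for d, allowed in criteria.items())}

def pivot(view2d):
    """Pivot: swap the two axes of a 2-D view."""
    return {(b, a): v for (a, b), v in view2d.items()}

by_country = roll_up_hierarchy(cube, "city", CITY_TO_COUNTRY)
by_quarter_item = roll_up_drop(cube, "city")
q1 = slice_cube(cube, "quarter", "Q1")
diced = dice_cube(cube, {"quarter": {"Q1"}, "city": {"Pune"}})
rotated = pivot(by_quarter_item)
print(by_country[("Q1", "India", "Laptop")])  # 15
```

Drill-down is the reverse of roll-up, so it cannot be computed from `by_country` alone; it means going back to the more detailed `cube`.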
Data warehouse information flow
Inflow – processes that extract, clean, and load data from the source systems into the warehouse
Cleaning includes removing inconsistencies, adding missing fields, cross-checking for data
integrity
Transforming includes adding date/time stamp fields, summarizing detailed data, deriving new
fields to hold calculated data
Upflow – processes that add value to the data in the warehouse through
Summarizing
o Select, project, join, and group data
o Summarize – identify trends, clustering, sampling
Packaging
o Convert data to summarized information – spreadsheets, documents, charts, graphs,
databases, animation, etc.
Distribution
o Distribute data into groups to increase availability and accessibility
Bitmap Indices
Special type of index used where a column has only a small number of distinct values.
Ex. Gender – M, F
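A minimal sketch of the idea, assuming one bitmap per distinct value with bit i set when row i holds that value. Python integers stand in for the bit vectors, and the sample columns and function names are illustrative.

```python
def build_bitmap_index(column):
    """Return {value: bitmap}, where bit i is set if row i holds that value."""
    index = {}
    for row_id, value in enumerate(column):
        index[value] = index.get(value, 0) | (1 << row_id)
    return index

def rows_from_bitmap(bitmap):
    """Decode a bitmap back into the list of matching row ids."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]

gender = ["M", "F", "F", "M", "F"]
city = ["Pune", "Pune", "Delhi", "Delhi", "Pune"]

gender_idx = build_bitmap_index(gender)   # only 2 bitmaps: one for M, one for F
city_idx = build_bitmap_index(city)

# gender = 'F' AND city = 'Pune' becomes a single bitwise AND of two bitmaps.
matches = rows_from_bitmap(gender_idx["F"] & city_idx["Pune"])
print(matches)  # [1, 4]
```

The index stays small because the number of bitmaps equals the number of distinct values, which is why the technique suits columns such as gender rather than columns with many unique values.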