L4. Datawarehouse Architecture PDF
Sourcing, Acquisition, Clean-up and Transformation Tools (ETL)
The data sourcing, transformation, and migration tools perform the conversions, summarizations,
and other changes needed to turn data from its various source formats into the unified format of the
data warehouse. They are also called Extract, Transform and Load (ETL) tools.
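To make the three stages concrete, here is a minimal sketch in Python using only the standard library. The two source record layouts, the `fact_sales` table, and the `extract`/`transform`/`load` function names are hypothetical; an in-memory SQLite database stands in for the data warehouse. Real ETL tools add scheduling, error handling, and incremental loading on top of this basic shape.

```python
import sqlite3

# Hypothetical operational records from two source systems with different
# layouts; field names and formats are illustrative assumptions.
SOURCE_A = [
    {"cust_id": "1", "name": "alice", "amount": "120.50"},
    {"cust_id": "2", "name": "bob", "amount": "80.00"},
]
SOURCE_B = [
    {"customer": 3, "full_name": "CAROL", "total": 42.0},
]


def extract():
    """Extract: pull raw rows from each source, tagged with their origin."""
    return [("A", row) for row in SOURCE_A] + [("B", row) for row in SOURCE_B]


def transform(raw_rows):
    """Transform: map every source layout onto one unified (id, name, amount) format."""
    unified = []
    for source, row in raw_rows:
        if source == "A":
            unified.append((int(row["cust_id"]), row["name"].title(), float(row["amount"])))
        else:
            unified.append((row["customer"], row["full_name"].title(), float(row["total"])))
    return unified


def load(rows, conn):
    """Load: write the unified rows into a warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_sales "
        "(customer_id INTEGER, customer_name TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")  # stand-in for the warehouse database
load(transform(extract()), conn)
print(conn.execute("SELECT * FROM fact_sales ORDER BY customer_id").fetchall())
# prints [(1, 'Alice', 120.5), (2, 'Bob', 80.0), (3, 'Carol', 42.0)]
```

Keeping each stage in its own function mirrors the E, T and L split, so adding a new source only requires a new mapping branch in `transform`.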
The functionality of ETL tools includes:
• Anonymizing data as required by regulations.
• Preventing unwanted data in operational databases from being loaded into the data warehouse.
• Searching for and replacing common names and definitions in data arriving from different sources.
• Calculating summaries and derived data.
• Populating missing data with default values.
• De-duplicating repeated data arriving from multiple data sources.
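Several of the clean-up steps listed above can be sketched as a single pass over incoming rows. This is an illustrative assumption, not the behavior of any particular ETL tool: the field names, the `CITY_SYNONYMS` search-and-replace table, the `DEFAULTS` values, and the rule that rows with `status == "test"` are unwanted are all made up. The salted SHA-256 hash is pseudonymization, which may or may not meet a given regulation's definition of anonymization.

```python
import hashlib
from collections import defaultdict

# Illustrative rows arriving from several sources (all values are made up).
RAW = [
    {"email": "a@x.com", "city": "NYC", "amount": 10.0, "status": "ok"},
    {"email": "a@x.com", "city": "NYC", "amount": 10.0, "status": "ok"},   # duplicate
    {"email": "b@y.com", "city": "New York", "amount": None, "status": "ok"},
    {"email": "c@z.com", "city": "LA", "amount": 5.0, "status": "test"},  # unwanted
]

CITY_SYNONYMS = {"NYC": "New York", "LA": "Los Angeles"}  # search-and-replace table
DEFAULTS = {"amount": 0.0, "city": "Unknown"}             # defaults for missing data


def anonymize(value, salt="demo-salt"):
    """Replace a personal identifier with a truncated, salted SHA-256 digest."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()[:12]


def clean(rows):
    seen = set()
    out = []
    for row in rows:
        if row["status"] == "test":                  # eliminate unwanted data
            continue
        row = dict(row)                              # don't mutate the source row
        for field, default in DEFAULTS.items():      # populate missing values
            if row.get(field) is None:
                row[field] = default
        row["city"] = CITY_SYNONYMS.get(row["city"], row["city"])  # unify names
        row["email"] = anonymize(row["email"])       # protect personal data
        key = (row["email"], row["city"], row["amount"])
        if key in seen:                              # de-duplicate
            continue
        seen.add(key)
        out.append(row)
    return out


def summarize(rows):
    """Derived data: total amount per city."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["city"]] += row["amount"]
    return dict(totals)


cleaned = clean(RAW)
print(len(cleaned), summarize(cleaned))
# prints 2 {'New York': 10.0}
```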
Metadata
The name metadata may suggest an advanced, highly technical data warehousing
concept, but the idea is simple. Metadata is data about data that defines the
data warehouse, for example where each table's data comes from and what each
column means. It is used for building, maintaining and managing the data warehouse.
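As a rough illustration of "data about data", the sketch below models one metadata entry for a warehouse table: which source systems feed it, what transformation was applied, what each column means, and when it was last loaded. The `TableMeta` and `ColumnMeta` classes and all example values are hypothetical; production warehouses usually keep such metadata in a dedicated catalog or repository rather than in application code.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class ColumnMeta:
    name: str
    data_type: str
    description: str


@dataclass
class TableMeta:
    """Hypothetical metadata entry: where a table came from, what it holds, when it was loaded."""
    table: str
    source_systems: List[str]
    transformation: str
    columns: List[ColumnMeta] = field(default_factory=list)
    last_loaded: Optional[datetime] = None

    def describe(self) -> str:
        """Render a short data-dictionary entry, e.g. for warehouse maintainers."""
        lines = [f"{self.table} <- {', '.join(self.source_systems)} ({self.transformation})"]
        lines += [f"  {c.name}: {c.data_type} - {c.description}" for c in self.columns]
        return "\n".join(lines)


sales_meta = TableMeta(
    table="fact_sales",
    source_systems=["orders_db", "billing_db"],
    transformation="currency converted to USD; duplicates removed",
    columns=[
        ColumnMeta("customer_id", "INTEGER", "surrogate key of the customer"),
        ColumnMeta("amount", "REAL", "order value in USD"),
    ],
    last_loaded=datetime(2024, 1, 31, tzinfo=timezone.utc),
)
print(sales_meta.describe())
```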
Data Warehouse Design Process
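• Top-down, bottom-up approaches, or a combination of both
• Top-down: starts with overall design and planning (useful when the technology and business problems are mature and well understood)
• Bottom-up: starts with experiments and prototypes (rapid and less costly; useful in early stages of business modeling)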
• From a software engineering point of view, the design and construction of a data warehouse may consist
of the following steps: planning, requirements study, problem analysis, warehouse design, data
integration and testing, and finally deployment of the data warehouse.
• Waterfall: structured and systematic analysis at each step before proceeding to the next
• Spiral: rapid generation of increasingly functional systems, with short turnaround times between iterations
Data Warehouse Usage
• Three kinds of data warehouse applications
• Information processing
• supports querying, basic statistical analysis, and reporting using crosstabs, tables, charts
and graphs
• Analytical processing
• multidimensional analysis of data warehouse data
• supports basic OLAP operations such as slice-and-dice, drilling, and pivoting
• Data mining
• knowledge discovery by finding hidden patterns
• supports associations, constructing analytical models, performing classification and
prediction, and presenting the mining results using visualization tools
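The information-processing and analytical-processing operations listed above can be sketched on a tiny in-memory cube. The sales facts and the helper names (`aggregate`, `slice_`, `dice`, `crosstab`) are hypothetical; a real warehouse would express these as SQL `GROUP BY` queries or OLAP server operations. Here roll-up aggregates to fewer dimensions, drill-down to more, slice fixes one dimension to a single value, dice restricts several dimensions, and pivoting swaps the rows and columns of a crosstab.

```python
from collections import defaultdict

# A tiny, made-up sales cube: (region, quarter, product, sales).
FACTS = [
    ("East", "Q1", "TV", 100), ("East", "Q1", "PC", 50),
    ("East", "Q2", "TV", 70), ("West", "Q1", "TV", 30),
    ("West", "Q2", "PC", 90), ("West", "Q2", "TV", 20),
]
DIMS = ("region", "quarter", "product")


def aggregate(facts, by):
    """Group by the named dimensions and sum sales (fewer dims = roll-up, more = drill-down)."""
    idx = [DIMS.index(d) for d in by]
    totals = defaultdict(int)
    for row in facts:
        totals[tuple(row[i] for i in idx)] += row[3]
    return dict(totals)


def slice_(facts, dim, value):
    """Slice: fix one dimension to a single value."""
    i = DIMS.index(dim)
    return [r for r in facts if r[i] == value]


def dice(facts, **allowed):
    """Dice: keep rows whose values fall in a chosen subset for each given dimension."""
    return [r for r in facts
            if all(r[DIMS.index(d)] in vals for d, vals in allowed.items())]


def crosstab(facts, rows, cols):
    """Cross-tabulation for reporting; swapping rows and cols is a pivot."""
    cells = aggregate(facts, (rows, cols))
    return {r: {c: v for (r2, c), v in cells.items() if r2 == r}
            for r in sorted({k[0] for k in cells})}


print(aggregate(FACTS, ("region",)))                  # roll-up to region
print(aggregate(FACTS, ("region", "product")))        # drill down to product
print(aggregate(slice_(FACTS, "quarter", "Q1"), ("region",)))
print(dice(FACTS, region={"East"}, product={"TV"}))
print(crosstab(FACTS, "region", "quarter"))
print(crosstab(FACTS, "quarter", "region"))           # pivot
```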
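For the data-mining use, association analysis can be illustrated with support and confidence on market-basket data. The transactions and thresholds below are made up, and the search is a brute-force check of single-item rules (A => B), not a full algorithm such as Apriori, which prunes candidate itemsets of any size.

```python
from itertools import combinations

# Made-up market-basket transactions, as might be extracted from a warehouse.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def pair_rules(transactions, min_support=0.4, min_confidence=0.7):
    """Brute-force search for one-item => one-item association rules."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        s = support({a, b}, transactions)
        if s < min_support:            # itemset too rare to be interesting
            continue
        for lhs, rhs in ((a, b), (b, a)):
            conf = s / support({lhs}, transactions)  # P(rhs | lhs)
            if conf >= min_confidence:
                rules.append((lhs, rhs, round(s, 2), round(conf, 2)))
    return rules


for lhs, rhs, s, c in pair_rules(TRANSACTIONS):
    print(f"{lhs} => {rhs}  support={s}  confidence={c}")
```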