Data Mining UNIT - 2 (Data Warehouse Architecture)

Uploaded by deepakjami27
Copyright © All Rights Reserved

Data Mining

Data Warehouse Architecture

Bottom Tier :
The bottom tier consists of a warehouse database server, typically implemented as a relational
database system.

Back-end tools and utilities are used to load data into the bottom tier of the architecture from
operational databases or external sources.

These tools and utilities perform :

Data extraction.

Data cleaning.

Data transformation.

The data are extracted using application programming interfaces known as gateways.

A gateway, supported by the underlying DBMS, enables client programs to generate SQL code
for execution on a server.

Example :

ODBC (Open Database Connectivity) by Microsoft.

OLE DB (Object Linking and Embedding, Database) by Microsoft.

JDBC (Java Database Connectivity).
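ODBC, OLE DB and JDBC are not part of Python's standard library, so the minimal sketch below uses the built-in `sqlite3` module (which follows Python's DB-API 2.0 interface) as a stand-in gateway. It only illustrates the client side of the pattern: the program generates an SQL statement, and the database engine executes it and returns rows. SQLite runs in-process rather than on a separate server, and the `orders` table, its columns and the helper `extract_rows` are hypothetical.

```python
import sqlite3


def extract_rows(conn, table, columns, date_column=None, since=None):
    """Hypothetical helper: the client generates SQL, the DBMS executes it.

    Table and column names are assumed to be trusted identifiers; only the
    filter value is passed as a bound parameter.
    """
    sql = f"SELECT {', '.join(columns)} FROM {table}"
    params = ()
    if date_column is not None and since is not None:
        sql += f" WHERE {date_column} >= ?"
        params = (since,)
    sql += f" ORDER BY {columns[0]}"  # deterministic row order
    return conn.execute(sql, params).fetchall()


# An in-memory SQLite database stands in for an operational source (hypothetical data).
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (id INTEGER, order_date TEXT, amount REAL)")
source.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "2024-01-05", 120.0), (2, "2024-02-10", 75.5), (3, "2024-03-01", 40.0)],
)

rows = extract_rows(source, "orders", ["id", "amount"],
                    date_column="order_date", since="2024-02-01")
print(rows)  # [(2, 75.5), (3, 40.0)]
```

With a real ODBC or JDBC driver the generated statement would travel to a remote DBMS server, but the division of labour is the same: the client builds the query, the server runs it.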

This tier also contains a metadata repository, which stores information about the data
warehouse and its contents.

Extraction, Transformation, and Loading (ETL) :


Data Extraction : Get data from multiple, heterogeneous, and external sources.

Data Cleaning : Detects errors in the data and rectifies them when possible.

Data Transformation : Convert data from legacy or host format to warehouse format.

Load :

Sort.

Summarize.

Consolidate.

Compute views.

Check integrity.

Build indices and partitions.

Refresh : Transfers updates from the data sources to the data warehouse.
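The ETL steps above can be sketched end to end in Python. Everything here is an assumption for illustration: the two sources, their field names and formats, the cleaning rules and the `sales` table are hypothetical, and an in-memory SQLite database (via the built-in `sqlite3` module) stands in for the warehouse server. SQLite has no table partitioning, so that part of the Load step is left out.

```python
import sqlite3
from datetime import datetime

# Hypothetical operational sources with different (heterogeneous) formats.
SOURCE_A = [  # dates as DD/MM/YYYY, amounts as strings
    {"cust": " alice ", "date": "05/01/2024", "amt": "120.00"},
    {"cust": "bob", "date": "06/01/2024", "amt": "n/a"},  # unusable amount
]
SOURCE_B = [  # ISO dates, amounts in cents
    {"customer_name": "ALICE", "day": "2024-01-05", "cents": 3050},
    {"customer_name": "", "day": "2024-01-07", "cents": 999},  # missing customer
]


def extract():
    """Extraction: yield raw records from every source, tagged with their origin."""
    for rec in SOURCE_A:
        yield "a", rec
    for rec in SOURCE_B:
        yield "b", rec


def clean_and_transform(origin, rec):
    """Cleaning and transformation into the warehouse format
    (customer, ISO date, amount). Returns None for errors that cannot be rectified."""
    try:
        if origin == "a":
            customer = rec["cust"]
            date = datetime.strptime(rec["date"], "%d/%m/%Y").date().isoformat()
            amount = float(rec["amt"])
        else:
            customer = rec["customer_name"]
            date = rec["day"]
            amount = rec["cents"] / 100
    except ValueError:
        return None  # unparseable date or amount: reject the record
    customer = customer.strip().title()  # rectify stray spaces and inconsistent case
    if not customer:
        return None  # missing key value: reject the record
    return (customer, date, amount)


def load(rows):
    """Load: consolidate, sort, summarize, index, and check integrity."""
    wh = sqlite3.connect(":memory:")
    wh.execute("CREATE TABLE sales (customer TEXT, sale_date TEXT, amount REAL)")
    # Consolidate all sources into one table, inserted in sorted order.
    wh.executemany("INSERT INTO sales VALUES (?, ?, ?)", sorted(rows))
    # Summarize / compute a view, stored as a precomputed table.
    wh.execute("CREATE TABLE sales_by_customer AS "
               "SELECT customer, SUM(amount) AS total FROM sales GROUP BY customer")
    wh.execute("CREATE INDEX idx_sales_customer ON sales (customer)")  # build index
    status = wh.execute("PRAGMA integrity_check").fetchone()[0]  # 'ok' if sound
    return wh, status


clean_rows = [row for row in (clean_and_transform(o, r) for o, r in extract())
              if row is not None]
warehouse, integrity = load(clean_rows)
totals = warehouse.execute(
    "SELECT customer, total FROM sales_by_customer ORDER BY customer").fetchall()
print(clean_rows)  # [('Alice', '2024-01-05', 120.0), ('Alice', '2024-01-05', 30.5)]
print(totals, integrity)  # [('Alice', 150.5)] ok
```

The Refresh step is not shown. One simple approach, assumed here rather than taken from these notes, is to rerun extraction only for records changed since the last load, insert them into `sales`, and rebuild the summary table.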

Middle Tier :
The middle tier is an OLAP server that is typically implemented using either :

A Relational OLAP (ROLAP) model extends a traditional relational DBMS by mapping multidimensional data operations to standard relational operations.

A Multidimensional OLAP (MOLAP) model is a specialized server designed to directly implement multidimensional data structures and operations.
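To make the difference concrete, here is a minimal Python sketch that answers the same roll-up query (total sales per region) both ways. The `sales` table, the region and quarter dimensions and the figures are hypothetical; SQLite (via the built-in `sqlite3` module) stands in for a relational DBMS, and a nested Python list stands in for a MOLAP server's multidimensional array. Real OLAP servers are far more elaborate.

```python
import sqlite3

# Hypothetical fact data: (region, quarter, sales amount)
facts = [("East", "Q1", 10), ("East", "Q2", 15), ("West", "Q1", 7), ("West", "Q2", 3)]

# ROLAP style: the roll-up becomes a standard relational GROUP BY.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sales (region TEXT, quarter TEXT, amount INTEGER)")
db.executemany("INSERT INTO sales VALUES (?, ?, ?)", facts)
rolap = dict(db.execute("SELECT region, SUM(amount) FROM sales GROUP BY region"))

# MOLAP style: data stored directly as a 2-D array (cube) indexed by
# dimension position; the roll-up is a sum along the quarter axis.
regions = ["East", "West"]
quarters = ["Q1", "Q2"]
cube = [[0] * len(quarters) for _ in regions]
for region, quarter, amount in facts:
    cube[regions.index(region)][quarters.index(quarter)] += amount
molap = {region: sum(cube[i]) for i, region in enumerate(regions)}

print(rolap)  # {'East': 25, 'West': 10}
print(molap)  # {'East': 25, 'West': 10}
```

The ROLAP side keeps storage relational and translates the cube operation into SQL, while the MOLAP side pays the cost of building the array up front so that aggregation is a direct walk over array cells.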

Top Tier :
The top tier is a front-end client layer, which contains :

Query and reporting tools.

Analysis tools.

Data mining tools.

Data Warehouse Models :


From the architecture point of view, there are three data warehouse models:

The Enterprise Warehouse.

The Data Mart.

The Virtual Warehouse.

Enterprise Warehouse :

It collects all of the information about subjects spanning the entire organization.

It provides corporate-wide data integration.

It typically contains detailed data as well as summarized data and can range in size from a
few gigabytes to hundreds of gigabytes, terabytes or beyond.

An Enterprise data warehouse may be implemented on :

Traditional mainframes.

Computer superservers.

Parallel architecture platforms.

Data Mart :

It contains a subset of corporate-wide data that is of value to a specific group of users.

The scope is confined to specific selected subjects.

Example : A marketing data mart may confine its subjects to customer, item, and sales.

The data contained in a data mart tend to be summarized.

Usually implemented on low-cost departmental servers that are UNIX/Linux- or Windows-based.

The implementation cycle of a data mart is more likely to be measured in weeks rather than
months or years.
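As a minimal sketch (assuming, for illustration only, that the mart is derived from existing warehouse tables), the marketing mart from the example above could keep just the customer, item and sales subjects and store them pre-summarized. Table names, columns and values are hypothetical, and the warehouse and the mart share one in-memory SQLite database purely for brevity; in practice the mart would sit on its own departmental server.

```python
import sqlite3

wh = sqlite3.connect(":memory:")
# Hypothetical enterprise-wide detail tables covering many subjects.
wh.execute("CREATE TABLE sales (customer TEXT, item TEXT, store TEXT, "
           "sale_date TEXT, amount REAL)")
wh.execute("CREATE TABLE payroll (employee TEXT, salary REAL)")  # not a marketing subject
wh.executemany("INSERT INTO sales VALUES (?, ?, ?, ?, ?)", [
    ("Alice", "Laptop", "S1", "2024-01-05", 900.0),
    ("Alice", "Laptop", "S2", "2024-02-11", 850.0),
    ("Bob", "Phone", "S1", "2024-01-09", 400.0),
])
wh.execute("INSERT INTO payroll VALUES ('Carol', 5000.0)")

# Marketing data mart: only customer, item and sales, already summarized.
wh.execute("""
    CREATE TABLE marketing_mart AS
    SELECT customer, item, COUNT(*) AS num_sales, SUM(amount) AS total_sales
    FROM sales
    GROUP BY customer, item
""")
mart = wh.execute("SELECT * FROM marketing_mart ORDER BY customer, item").fetchall()
print(mart)  # [('Alice', 'Laptop', 2, 1750.0), ('Bob', 'Phone', 1, 400.0)]
```

Store-level and date-level detail are dropped and payroll is ignored entirely, which is what keeps the mart small enough for a departmental server.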

Virtual Warehouse :

A virtual warehouse is a set of views over operational databases.

For efficient query processing, only some of the possible summary views may be
materialized.

It is easy to build but requires excess capacity on operational database servers.
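The trade-off between plain and materialized views can be sketched with Python's built-in `sqlite3`. SQLite has no materialized views, so a table built from a query is used to emulate one; the `orders` table and its rows are hypothetical. The plain view is recomputed on the operational database every time it is queried, while the materialized copy answers quickly but goes stale until refreshed.

```python
import sqlite3

ops = sqlite3.connect(":memory:")  # stands in for the operational database
ops.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")
ops.executemany("INSERT INTO orders (region, amount) VALUES (?, ?)",
                [("East", 10.0), ("West", 4.0), ("East", 6.0)])

# Virtual view: nothing stored; every query re-runs against the live table.
ops.execute("""CREATE VIEW v_sales_by_region AS
               SELECT region, SUM(amount) AS total FROM orders GROUP BY region""")

# Emulated materialized view: results copied into a table once.
ops.execute("""CREATE TABLE m_sales_by_region AS
               SELECT region, SUM(amount) AS total FROM orders GROUP BY region""")

# A new operational transaction arrives after the summary was materialized.
ops.execute("INSERT INTO orders (region, amount) VALUES ('West', 1.0)")

q = "SELECT region, total FROM {} ORDER BY region"
virtual = ops.execute(q.format("v_sales_by_region")).fetchall()
materialized = ops.execute(q.format("m_sales_by_region")).fetchall()
print(virtual)       # [('East', 16.0), ('West', 5.0)]
print(materialized)  # [('East', 16.0), ('West', 4.0)]  (stale until refreshed)
```

This also shows why a virtual warehouse needs spare capacity on the operational server: every query against a non-materialized view runs there.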
