
Unit - I Data Extraction, Cleanup, and Transformation Tools

The document discusses various tools used for data extraction, cleanup, and transformation in a data warehouse. It describes code generators that create transformation programs, database replication tools that replicate and transform data between systems, and rule-driven engines that extract, transform, and load data into data marts based on user-defined rules. These tools aid in cleaning, consolidating, and preparing data from different sources for analytical uses in a data warehouse.

Uploaded by Kalaivani D
Copyright © All Rights Reserved

UNIT – I

Data Extraction, Cleanup, and Transformation Tools

Data Extraction, Cleanup & Transformation Tools
• The task of capturing data from a source data system,
cleaning and transforming it, and then loading the results into
a target data system can be carried out either by separate
products or by a single integrated solution. Contemporary
integrated solutions generally fall into one of the following
categories:
– Code Generators
– Database Data Replication Tools
– Rule-driven Dynamic Transformation Engines (Data Mart
Builders)
Sourcing, Acquisition, Cleanup and Transformation Tools
• A significant portion of the data warehouse implementation
effort is spent extracting data from operational systems and
putting it into a format suitable for the informational
applications that run off the data warehouse.
• The data sourcing, cleanup, transformation, and migration
tools perform all of the conversions, summarizations, key
changes, structural changes, and condensations needed to
transform disparate data into information that can be used by
decision support tools.
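Sourcing, Acquisition, Cleanup and Transformation Tools
• The functionality includes:
– Removing unwanted data from operational databases
– Converting to common data names and definitions
– Calculating summaries and derived data
– Establishing defaults for missing data
– Accommodating source data definition changes
• The data sourcing, cleanup, extract, transformation, and
migration tools have to deal with some significant issues,
as follows:
– Database heterogeneity: DBMSs differ in data models, data
access languages, concurrency, integrity, and recovery mechanisms.
– Data heterogeneity: the same data may be defined and used
differently across sources (synonyms, homonyms, incompatible
units, different attributes for the same entity).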
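The cleanup functions listed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any particular product works: the field names, the COMMON_NAMES mapping, and the DEFAULTS table are hypothetical, and "removing unwanted data" is reduced here to dropping fields that have no mapping. The two sample sources name the same fields differently, a simple case of data heterogeneity.

```python
from collections import defaultdict

# Hypothetical mapping from source-specific field names to common warehouse names.
# Fields not listed here are treated as unwanted and dropped.
COMMON_NAMES = {
    "cust_no": "customer_id", "CUSTNUM": "customer_id",
    "amt": "amount", "AMOUNT_USD": "amount",
    "rgn": "region",
}
# Hypothetical defaults applied when a source record lacks a value.
DEFAULTS = {"region": "UNKNOWN"}


def clean(records):
    """Remove unwanted fields, convert to common names, and fill in defaults."""
    cleaned = []
    for rec in records:
        # Keep only mapped fields, renamed to their common warehouse names
        row = {COMMON_NAMES[k]: v for k, v in rec.items() if k in COMMON_NAMES}
        # Establish defaults for missing data
        for field, default in DEFAULTS.items():
            if row.get(field) is None:
                row[field] = default
        cleaned.append(row)
    return cleaned


def summarize(rows):
    """Calculate derived data: total amount per region."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)


# Usage: two sources that name the same fields differently.
source_a = [{"cust_no": 1, "amt": 10.0, "rgn": "EU", "internal_flag": "x"}]
source_b = [{"CUSTNUM": 2, "AMOUNT_USD": 5.5}]
rows = clean(source_a + source_b)
print(rows)
print(summarize(rows))
```

Keeping the name mapping and defaults as data rather than code means a source data definition change can often be absorbed by editing COMMON_NAMES instead of rewriting the cleanup logic.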
Access Tools
• The principal purpose of a data warehouse is to provide
information to business users for strategic decision making.
• These users interact with the data warehouse using front-end
tools.
• Many of these tools require an information specialist, a
domain expert who can analyze the information and interact
with the data warehousing environment in order to reach
meaningful conclusions.
• This is especially true for data mining tools, when defining the
problem, configuring the tool, and analyzing the results.
Data Mining Tools
• Most organizations engage in data mining for the following purposes:
– Discovering knowledge: segmentation, classification, association, and
preferencing
– Visualizing data
– Correcting data
• The strategic value of data mining is time-sensitive, especially in the retail,
marketing, and finance sectors.
• Using data mining to build predictive models for decision making has
several benefits:
– A model can explain why a particular decision was made.
– Adjusting a model based on feedback from future decisions leads to
experience accumulation and true organizational learning.
– Finally, a predictive model can be used to automate a decision step in a larger
process.
Code Generator
– It creates 3GL/4GL transformation programs based on source
and target data definitions, and on data transformation and
enhancement rules defined by the developer.
– This approach reduces the need for an organization to write its
own data capture, transformation, and load programs. These
products employ DML statements to capture a set of the data
from the source system.
– These tools are used for data conversion projects and for building
an enterprise-wide data warehouse when a significant amount of
data transformation must be done across a variety of flat-file,
non-relational, and relational data sources.
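As a rough illustration of the code-generator idea, the Python sketch below turns a developer-defined mapping (target column to source expression) into a load program. Two assumptions to note: it emits a SQL INSERT ... SELECT statement rather than a 3GL/4GL program such as COBOL, and the table names, columns, and the generate_load_program helper are hypothetical. SQLite is used only so the generated program can be executed and checked.

```python
import sqlite3

# Hypothetical rules defined by the developer: target column -> source expression.
MAPPING = {
    "customer_id": "cust_no",
    "full_name": "TRIM(first_nm) || ' ' || TRIM(last_nm)",
    "region": "COALESCE(rgn, 'UNKNOWN')",
}


def generate_load_program(source_table, target_table, mapping, where=None):
    """Generate a SQL INSERT ... SELECT statement that applies the mapping rules."""
    columns = ", ".join(mapping)
    select_list = ",\n       ".join(f"{expr} AS {col}" for col, expr in mapping.items())
    sql = f"INSERT INTO {target_table} ({columns})\nSELECT {select_list}\nFROM {source_table}"
    if where:
        # The WHERE clause selects the subset of source data to capture
        sql += f"\nWHERE {where}"
    return sql + ";"


# Usage: run the generated program against a small in-memory source table.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE src_customer (cust_no INTEGER, first_nm TEXT, last_nm TEXT, rgn TEXT)")
con.execute("CREATE TABLE dw_customer (customer_id INTEGER, full_name TEXT, region TEXT)")
con.executemany("INSERT INTO src_customer VALUES (?, ?, ?, ?)",
                [(1, " Ada ", "Lovelace", "EU"), (2, "Alan", " Turing ", None)])
program = generate_load_program("src_customer", "dw_customer", MAPPING,
                                where="cust_no IS NOT NULL")
print(program)
con.execute(program)
rows = con.execute("SELECT * FROM dw_customer ORDER BY customer_id").fetchall()
print(rows)
```

Because the rules live in MAPPING, changing a transformation means regenerating the program rather than hand-editing it, which is the saving the slide attributes to this approach.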
Database Data Replication Tools
– These tools employ database triggers or a recovery log to
capture changes to a single data source on one system and apply
the changes to a copy of the source data located on a
different system.
– Most replication products do not support the capture of changes
to non-relational files and databases, and often do not provide
facilities for significant data transformation and enhancement.
– These point-to-point tools are used for disaster recovery and to
build an operational data store, a data warehouse, or a data
mart when the number of data sources involved is small and
only a limited amount of data transformation and enhancement
is required.
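The trigger-based capture described above can be sketched with Python's built-in sqlite3 module, using two in-memory databases to stand in for the source and target systems. Triggers on a hypothetical account table record every insert, update, and delete in a change_log table, and a hypothetical apply_changes function replays those changes onto the copy. This is an assumption-laden sketch: it covers only the trigger approach (not recovery-log capture) and assumes primary keys are never updated. The copy receives the data unchanged, in line with the limited transformation support noted above.

```python
import sqlite3

source = sqlite3.connect(":memory:")
replica = sqlite3.connect(":memory:")  # stands in for the copy on a different system

# Triggers capture every change to the source table into a change log.
source.executescript("""
CREATE TABLE account (id INTEGER PRIMARY KEY, balance REAL);
CREATE TABLE change_log (seq INTEGER PRIMARY KEY AUTOINCREMENT,
                         op TEXT, id INTEGER, balance REAL);
CREATE TRIGGER account_ins AFTER INSERT ON account BEGIN
  INSERT INTO change_log (op, id, balance) VALUES ('I', NEW.id, NEW.balance);
END;
CREATE TRIGGER account_upd AFTER UPDATE ON account BEGIN
  INSERT INTO change_log (op, id, balance) VALUES ('U', NEW.id, NEW.balance);
END;
CREATE TRIGGER account_del AFTER DELETE ON account BEGIN
  INSERT INTO change_log (op, id, balance) VALUES ('D', OLD.id, NULL);
END;
""")
replica.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance REAL)")


def apply_changes(source, replica, last_seq):
    """Replay changes logged after last_seq onto the replica; return the new last_seq."""
    changes = source.execute(
        "SELECT seq, op, id, balance FROM change_log WHERE seq > ? ORDER BY seq",
        (last_seq,)).fetchall()
    for seq, op, row_id, balance in changes:
        if op == "D":
            replica.execute("DELETE FROM account WHERE id = ?", (row_id,))
        else:
            # INSERT OR REPLACE handles both inserts and updates keyed by id
            # (assumes the primary key itself is never updated)
            replica.execute("INSERT OR REPLACE INTO account (id, balance) VALUES (?, ?)",
                            (row_id, balance))
        last_seq = seq
    replica.commit()
    return last_seq


# Usage: make some changes on the source, then propagate them.
source.execute("INSERT INTO account VALUES (1, 100.0)")
source.execute("INSERT INTO account VALUES (2, 50.0)")
source.execute("UPDATE account SET balance = 75.0 WHERE id = 1")
source.execute("DELETE FROM account WHERE id = 2")
source.commit()
last = apply_changes(source, replica, 0)
replica_rows = replica.execute("SELECT id, balance FROM account ORDER BY id").fetchall()
print(last, replica_rows)
```

Tracking last_seq lets later calls propagate only new changes, which is what keeps the copy synchronized without reloading the whole table.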
Rule-driven Dynamic Transformation Engines
– Also known as Data Mart Builders, these tools capture data from a
source system at user-defined intervals, transform the data, and then send
and load the results into a target environment, typically a data mart.
– To date, most products in this category support only relational
data sources, though this trend has started to change.
– Data to be captured from the source system is usually defined using query
language statements, and data transformation and enhancement are
based on scripts or function logic defined to the tool.
– With most tools in this category, data flows from source systems to
target systems through one or more servers, which perform the data
transformation and enhancement. These transformation servers can
usually be controlled from a single location, which makes such an
environment much easier to manage.
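A rule-driven engine can be approximated as a list of rules, each pairing a capture query with transformation logic and a target table. The sketch below assumes SQLite for both the source and the data mart, and the Rule class, run_rules function, and table names are hypothetical. It shows a single run; a real engine would invoke it at the user-defined interval (for example from a scheduler) and would route data through transformation servers, neither of which is modeled here.

```python
import sqlite3
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    """Hypothetical user-defined rule: what to capture and how to transform it."""
    name: str
    capture_sql: str                     # query-language statement run on the source
    transform: Callable[[tuple], tuple]  # function logic applied to each captured row
    target_table: str                    # data mart table that receives the results


def run_rules(source, mart, rules):
    """One scheduled run of the engine: capture, transform, and load for every rule."""
    loaded = {}
    for rule in rules:
        captured = source.execute(rule.capture_sql).fetchall()
        transformed = [rule.transform(row) for row in captured]
        if transformed:
            placeholders = ", ".join(["?"] * len(transformed[0]))
            mart.executemany(
                f"INSERT INTO {rule.target_table} VALUES ({placeholders})", transformed)
        loaded[rule.name] = len(transformed)
    mart.commit()
    return loaded


# Usage: a small source system and a data mart, both in-memory SQLite databases.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE sales (region TEXT, amount_cents INTEGER)")
source.executemany("INSERT INTO sales VALUES (?, ?)",
                   [("eu", 1250), ("us", 800), ("eu", 300)])
mart = sqlite3.connect(":memory:")
mart.execute("CREATE TABLE sales_by_region (region TEXT, amount REAL)")

rules = [Rule(
    name="sales_summary",
    capture_sql="SELECT region, SUM(amount_cents) FROM sales GROUP BY region",
    transform=lambda row: (row[0].upper(), row[1] / 100),  # normalize codes, cents to units
    target_table="sales_by_region",
)]
result = run_rules(source, mart, rules)
mart_rows = mart.execute("SELECT * FROM sales_by_region ORDER BY region").fetchall()
print(result, mart_rows)
```

Adding a new feed to the mart then means adding a Rule rather than writing a new program, which is the practical difference from the code-generator approach.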
Thank you
