Informatica: Source and Target
In Informatica, all metadata about source systems, target systems, and transformations is stored in the Informatica repository. Informatica's Power Center Client and Repository Server access this repository to store and retrieve metadata.
• Repository: This is where all metadata is stored in the Informatica suite. The Power Center Client and the Repository Server access this repository to retrieve, store, and manage metadata.
• Power Center Client: The Informatica client is used for managing users, identifying source and target system definitions, creating mappings and mapplets, creating sessions, and running workflows.
• Repository Server: The Repository Server manages all connections between the repository and the Power Center Client.
• Power Center Server: The Power Center Server extracts data from sources and loads it into targets.
• Designer: Source Analyzer, Mapping Designer, and Warehouse Designer are tools that reside within the Designer. Source Analyzer is used for extracting metadata from source systems. Mapping Designer is used to create mappings between sources and targets; a mapping is a pictorial representation of the flow of data from source to target. Warehouse Designer is used for extracting metadata from target systems, or target metadata can be created in the Designer itself.
• Data Cleansing: Power Center's data cleansing technology improves data quality by validating, correctly naming, and standardizing address data. A person's address may not be the same in all source systems because of typos, or the postal code and city name may not match the rest of the address. Such errors can be corrected through the data cleansing process, and the standardized data can then be loaded into target systems (the data warehouse).
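Informatica performs this cleansing through configured components rather than hand-written code, but the row-level idea can be sketched in Python. Everything below (the alias table, field names) is illustrative, not part of any Informatica API:

```python
# Illustrative sketch only (not Informatica's cleansing engine): normalize
# city names and ZIP codes so the same address matches across source systems.
CITY_ALIASES = {"ny": "New York", "nyc": "New York", "new york": "New York"}

def cleanse_address(record):
    """Return a copy of the record with city and zip standardized."""
    cleaned = dict(record)
    city = record["city"].strip().lower()
    cleaned["city"] = CITY_ALIASES.get(city, city.title())
    cleaned["zip"] = record["zip"].strip()[:5]   # keep the 5-digit ZIP only
    return cleaned

src_a = {"city": " NYC ", "zip": "10001-2345"}
src_b = {"city": "new york", "zip": "10001"}
# Two differently-typed records now standardize to the same target row.
assert cleanse_address(src_a) == cleanse_address(src_b) == {"city": "New York", "zip": "10001"}
```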
• Transformation: Transformations reshape the source data according to the requirements of the target system. Sorting, filtering, aggregation, and joining are some examples of transformations. Transformations ensure the quality of the data being loaded into the target, and this is done during the mapping process from source to target.
• Workflow Manager: Workflows help load data from source to target in the correct sequence. For example, if a fact table is loaded before its lookup (dimension) tables, the target system will raise an error because the fact table violates foreign-key validation. Workflows can be created to avoid this and ensure the correct flow of data from source to target.
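The ordering constraint can be sketched in a few lines of Python. The function names and key values here are hypothetical; the point is only that a fact load must run after the dimension load it depends on:

```python
# Hypothetical sketch of why workflow task ordering matters: the fact-table
# loader validates foreign keys against dimension keys that an earlier
# workflow task must already have loaded.
dimension_keys = set()

def load_dimension(keys):
    dimension_keys.update(keys)

def load_fact(rows):
    for product_key, amount in rows:
        if product_key not in dimension_keys:
            raise ValueError(f"FK violation: {product_key} missing from dimension")
    return len(rows)

load_dimension(["P1", "P2"])                          # task 1: dimensions first
assert load_fact([("P1", 100.0), ("P2", 250.0)]) == 2  # task 2: fact load succeeds

rejected = False
try:
    load_fact([("P9", 10.0)])                          # key never loaded -> rejected
except ValueError:
    rejected = True
assert rejected
```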
• Workflow Monitor: This tool is used for monitoring and tracking the workflows run on each Power Center Server.
• Power Center Connect: This component helps to extract data and metadata from ERP systems such as PeopleSoft, SAP, and Siebel, from messaging systems such as IBM's MQSeries, and from other third-party applications.
• Power Center Exchange: This component likewise helps to extract data and metadata from ERP systems and other third-party applications.
Power Channel:
This helps to transfer large amounts of encrypted and compressed data over LAN or WAN, through firewalls, to transfer files over FTP, etc.
Power Analyzer:
Power Analyzer provides organizations with reporting facilities. PowerAnalyzer makes accessing, analyzing, and sharing enterprise data simple and easily available to decision makers, and it enables organizations to gain insight into business processes and develop business intelligence.
With PowerAnalyzer, an organization can extract, filter, format, and analyze corporate information from data stored in a data warehouse, data mart, operational data store, or other data storage model. PowerAnalyzer works best with a dimensional data warehouse in a relational database. It can also run reports on data in any relational database table, even tables that do not conform to the dimensional model.
Super Glue:
SuperGlue is used for loading metadata from several sources into a centralized place. Reports can be run against SuperGlue to analyze that metadata.
Power Mart:
Power Mart is a departmental version of Informatica for building, deploying, and managing data warehouses and data marts. Power Center is used for a corporate enterprise data warehouse, while Power Mart is used for departmental data warehouses such as data marts. Power Center supports global and networked repositories and can be connected to several sources; Power Mart supports a single repository and can be connected to fewer sources than Power Center. Power Mart can grow incrementally into an enterprise implementation, and its codeless environment aids developer productivity.
Note: This is not a complete tutorial on Informatica. To know more about Informatica, visit its official website, www.informatica.com.
Active Transformation
An active transformation can change the number of rows that pass through it from source to target, i.e., it can eliminate rows that do not meet the transformation condition.
Passive Transformation
A passive transformation does not change the number of rows that pass through it, i.e., it passes all rows through the transformation.
Connected Transformation
A connected transformation is connected to other transformations or directly to the target table in the mapping.
UnConnected Transformation
An unconnected transformation is not connected to other transformations in the mapping. It is
called within another transformation, and returns a value to that transformation.
• Aggregator Transformation
• Expression Transformation
• Filter Transformation
• Joiner Transformation
• Lookup Transformation
• Normalizer Transformation
• Rank Transformation
• Router Transformation
• Sequence Generator Transformation
• Stored Procedure Transformation
• Sorter Transformation
• Update Strategy Transformation
• XML Source Qualifier Transformation
• Advanced External Procedure Transformation
• External Transformation
Aggregator Transformation
Aggregator transformation is an Active and Connected transformation. It is used to perform calculations such as averages and sums, mainly on multiple rows or groups, for example, to calculate the total of daily sales or the average of monthly or yearly sales. Aggregate functions such as AVG, FIRST, COUNT, PERCENTILE, MAX, and SUM can be used in an Aggregator transformation.
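The Aggregator is configured in the Designer rather than coded, but its grouped-SUM behavior can be sketched in plain Python (sample data is made up):

```python
# Sketch of an Aggregator transformation: SUM of sales grouped by month.
# Grouping collapses many input rows into fewer output rows, which is
# exactly why the Aggregator is classified as an Active transformation.
from collections import defaultdict

sales = [("Jan", 100), ("Jan", 150), ("Feb", 200)]   # (month, amount) rows

totals = defaultdict(int)
for month, amount in sales:
    totals[month] += amount      # group by month, aggregate with SUM

assert dict(totals) == {"Jan": 250, "Feb": 200}      # 3 input rows -> 2 output rows
```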
Expression Transformation
Expression transformation is a Passive and Connected transformation. It can be used to calculate values in a single row before writing to the target, for example, to calculate the discount on each product, to concatenate first and last names, or to convert a date to a string field.
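The same row-level calculations, sketched in Python with made-up field names (an Expression transformation adds derived ports without changing the row count):

```python
# Sketch of an Expression transformation: row-level calculations that add
# derived columns while leaving the number of rows unchanged (Passive).
rows = [{"first": "Ada", "last": "Lovelace", "price": 100.0}]

for row in rows:
    row["full_name"] = f"{row['first']} {row['last']}"   # name concatenation
    row["discounted"] = round(row["price"] * 0.9, 2)     # 10% discount

assert rows[0]["full_name"] == "Ada Lovelace"
assert rows[0]["discounted"] == 90.0
```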
Filter Transformation
Filter transformation is an Active and Connected transformation. It can be used to filter out rows in a mapping that do not meet a condition, for example, to find all the employees who work in Department 10, or the products priced between $500 and $1,000.
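Both example conditions can be sketched as Python filters over illustrative data:

```python
# Sketch of a Filter transformation: only rows meeting the condition pass
# through; the rest are dropped (hence the transformation is Active).
employees = [("Smith", 10), ("Jones", 20), ("Lee", 10)]     # (name, dept)
dept_10 = [name for name, dept in employees if dept == 10]
assert dept_10 == ["Smith", "Lee"]

products = [("widget", 450), ("gadget", 750), ("gizmo", 1200)]  # (name, price)
mid_range = [p for p, price in products if 500 <= price <= 1000]
assert mid_range == ["gadget"]
```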
Joiner Transformation
Joiner Transformation is an Active and Connected transformation. It can be used to join two sources coming from two different locations or from the same location, for example, a flat file and a relational source, two flat files, or a relational source and an XML source. In order to join two sources, there must be at least one matching port. While joining two sources, one source must be specified as the master and the other as the detail.
The Joiner transformation supports the following types of joins:
• Normal
• Master Outer
• Detail Outer
• Full Outer
Normal join discards all the rows of data from the master and detail source that do not match,
based on the condition.
Master outer join discards all the unmatched rows from the master source and keeps all the
rows from the detail source and the matching rows from the master source.
Detail outer join keeps all rows of data from the master source and the matching rows from
the detail source. It discards the unmatched rows from the detail source.
Full outer join keeps all rows of data from both the master and detail sources.
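The four join types can be sketched over tiny master/detail tables keyed on a single matching port (`None` stands in for the NULLs an outer join produces; the data is made up):

```python
# Sketch of the Joiner's four join types over master/detail rows
# matched on a shared id port.
master = {1: "M1", 2: "M2"}   # id -> master-side payload
detail = {2: "D2", 3: "D3"}   # id -> detail-side payload

normal       = {k: (master[k], detail[k]) for k in master.keys() & detail.keys()}
master_outer = {k: (master.get(k), detail[k]) for k in detail}   # all detail rows kept
detail_outer = {k: (master[k], detail.get(k)) for k in master}   # all master rows kept
full_outer   = {k: (master.get(k), detail.get(k))
                for k in master.keys() | detail.keys()}          # everything kept

assert normal       == {2: ("M2", "D2")}
assert master_outer == {2: ("M2", "D2"), 3: (None, "D3")}
assert detail_outer == {1: ("M1", None), 2: ("M2", "D2")}
assert full_outer   == {1: ("M1", None), 2: ("M2", "D2"), 3: (None, "D3")}
```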
Lookup Transformation
Lookup transformation is a Passive transformation, and it can be either Connected or UnConnected. It is used to look up data in a relational table, view, or synonym. The lookup definition can be imported either from source or from target tables.
For example, suppose we want to retrieve all the sales of a product with ID 10, and the sales data resides in another table. Instead of using the sales table as one more source, we can use a Lookup transformation to look up the data for product ID 10 in the sales table.
A Connected lookup returns multiple columns from the same row, whereas an UnConnected lookup has one return port and returns one column from each row. A Connected lookup supports user-defined default values, whereas an UnConnected lookup does not.
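The two styles can be sketched with a Python dictionary playing the role of the lookup table (all names and values here are illustrative):

```python
# Sketch of a Lookup: enrich product rows from a separate sales table
# instead of wiring that table in as another source.
sales_by_product = {10: 5000.0, 11: 3200.0}   # lookup table: product_id -> sales

def unconnected_lookup(product_id):
    """Unconnected style: invoked like a function, single return value,
    and no user-defined default when nothing matches."""
    return sales_by_product.get(product_id)

products = [{"id": 10, "name": "widget"}]
for row in products:
    # Connected style: the lookup result flows into the pipeline as an
    # extra column, with a user-defined default (0.0) for misses.
    row["sales"] = sales_by_product.get(row["id"], 0.0)

assert products[0]["sales"] == 5000.0
assert unconnected_lookup(99) is None
```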
Normalizer Transformation
Normalizer Transformation is an Active and Connected transformation. It is used mainly with COBOL sources, where data is often stored in denormalized format. The Normalizer transformation can also be used to create multiple rows from a single row of data.
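That one-row-to-many-rows behavior can be sketched as follows (the quarterly layout is a made-up example of a denormalized record):

```python
# Sketch of a Normalizer: one denormalized row carrying four quarterly
# columns becomes four rows, one per quarter. The row count changes,
# which is why the Normalizer is an Active transformation.
denormalized = {"store": "S1", "q1": 100, "q2": 120, "q3": 90, "q4": 150}

normalized = [
    {"store": denormalized["store"], "quarter": q, "sales": denormalized[q]}
    for q in ("q1", "q2", "q3", "q4")
]

assert len(normalized) == 4
assert normalized[0] == {"store": "S1", "quarter": "q1", "sales": 100}
```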
Rank Transformation
Rank transformation is an Active and Connected transformation. It is used to select the top or bottom rank of data, for example, to select the top 10 regions with the highest sales volume, or the 10 lowest-priced products.
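A top-N selection of this kind can be sketched in one line of Python (here, top 3 regions by sales, over made-up data):

```python
# Sketch of a Rank transformation: keep only the top 3 regions by sales.
# Dropping the remaining rows is what makes Rank an Active transformation.
regions = [("East", 900), ("West", 1500), ("North", 400), ("South", 1200)]

top3 = sorted(regions, key=lambda r: r[1], reverse=True)[:3]

assert top3 == [("West", 1500), ("South", 1200), ("East", 900)]
```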
Router Transformation
Router is an Active and Connected transformation. It is similar to the Filter transformation; the only difference is that the Filter transformation drops the data that does not meet the condition, whereas the Router has an option to capture the data that does not meet any condition. It is useful for testing multiple conditions. It has input, output, and default groups. For example, if we want to split data by State=Michigan, State=California, State=New York, and all other states, the Router makes it easy to route each subset to a different table.
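The state-routing example can be sketched with one output list per group plus a default group (data and group names are illustrative):

```python
# Sketch of a Router: one input stream, several output groups, and a
# DEFAULT group that captures rows matching none of the conditions.
rows = [("Alice", "Michigan"), ("Bob", "California"), ("Carol", "Texas")]
groups = {"Michigan": [], "California": [], "New York": [], "DEFAULT": []}

for name, state in rows:
    # Route to the matching group, or fall through to DEFAULT.
    groups.get(state, groups["DEFAULT"]).append(name)

assert groups["Michigan"] == ["Alice"]
assert groups["California"] == ["Bob"]
assert groups["DEFAULT"] == ["Carol"]     # Texas matched no condition
```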
Sequence Generator Transformation
Sequence Generator transformation is a Passive and Connected transformation. It is used to generate sequences of numeric values, for example for primary keys. It has two output ports to connect to other transformations, CURRVAL and NEXTVAL (you cannot add ports to this transformation). The NEXTVAL port generates a sequence of numbers when connected to a transformation or target; CURRVAL is the NEXTVAL value plus the Increment By value.
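The NEXTVAL/CURRVAL behavior described above can be sketched as a small counter class (the class itself is an illustration, not an Informatica API):

```python
# Sketch of a Sequence Generator: NEXTVAL hands out the next number in
# the sequence; CURRVAL reads as NEXTVAL plus the Increment By value,
# i.e. the value the generator would issue next.
class SequenceGenerator:
    def __init__(self, start=1, increment=1):
        self._next = start
        self.increment = increment

    def nextval(self):
        value = self._next
        self._next += self.increment
        return value

    @property
    def currval(self):
        return self._next            # last NEXTVAL + Increment By

seq = SequenceGenerator(start=1)
assert [seq.nextval() for _ in range(3)] == [1, 2, 3]
assert seq.currval == 4              # last NEXTVAL (3) plus increment (1)
```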
Sorter Transformation
Sorter transformation is a Connected and an Active transformation. It allows data to be sorted in either ascending or descending order according to a specified field. It can also be configured for case-sensitive sorting and to specify whether the output rows should be distinct.
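Both options can be sketched with Python's built-in sorting (sample data is made up):

```python
# Sketch of a Sorter: order by a key with case-insensitive comparison,
# and optionally emit only distinct rows (dropping duplicates is what
# can make the Sorter behave as an Active transformation).
names = ["banana", "Apple", "apple", "cherry"]

case_insensitive = sorted(names, key=str.lower)         # fold case for ordering
distinct = sorted(set(n.lower() for n in names))        # distinct output rows

assert case_insensitive == ["Apple", "apple", "banana", "cherry"]
assert distinct == ["apple", "banana", "cherry"]
```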
External Procedure and Advanced External Procedure Transformations
An External Procedure returns a single value, whereas an Advanced External Procedure can return multiple values. The External Procedure transformation supports COM and Informatica procedures, whereas the Advanced External Procedure (AEP) supports only Informatica procedures.
Example: Over the years, many application designers in each branch have made their own decisions about how an application and database should be built, so source systems differ in naming conventions, variable measurements, encoding structures, and physical attributes of data. Consider a bank that has several branches in several countries, has millions of customers, and whose lines of business are savings and loans. The following example explains how data is integrated from source systems to target systems.
In this example, attribute names, column names, datatypes, and values are entirely different from one source system to another. This inconsistency can be avoided by integrating the data into a data warehouse with good standards.
In the target data of this example, attribute names, column names, and datatypes are consistent throughout the target system. This is how data from various source systems is integrated and accurately stored in the data warehouse.
Data warehouses and data marts are built on dimensional data modeling, in which fact tables are connected with dimension tables. This makes data access easier for users, since the database can be visualized as a cube of several dimensions, and a data warehouse provides an opportunity for slicing and dicing that cube along each of its dimensions.
Data Mart: A data mart is a subset of a data warehouse that is designed for a particular line of business, such as sales, marketing, or finance. In a dependent data mart, data is derived from an enterprise-wide data warehouse; in an independent data mart, data is collected directly from sources.
Hierarchy
A logical structure that uses ordered levels as a means of organizing data. A hierarchy can be
used to define data aggregation; for example, in a time dimension, a hierarchy might be used to
aggregate data from the Month level to the Quarter level, from the Quarter level to the Year
level. A hierarchy can also be used to define a navigational drill path, regardless of whether the
levels in the hierarchy represent aggregated totals or not.
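The Month-to-Quarter-to-Year aggregation described above can be sketched numerically (the figures are invented):

```python
# Sketch of hierarchy-based aggregation in a time dimension: roll monthly
# sales up to the Quarter level, then roll quarters up to the Year level.
monthly = {"Jan": 100, "Feb": 110, "Mar": 90, "Apr": 120, "May": 80, "Jun": 100}
month_to_quarter = {"Jan": "Q1", "Feb": "Q1", "Mar": "Q1",
                    "Apr": "Q2", "May": "Q2", "Jun": "Q2"}

quarterly = {}
for month, sales in monthly.items():
    q = month_to_quarter[month]                 # climb one hierarchy level
    quarterly[q] = quarterly.get(q, 0) + sales

yearly = sum(quarterly.values())                # climb to the top level

assert quarterly == {"Q1": 300, "Q2": 300}
assert yearly == 600
```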
Level
A position in a hierarchy. For example, a time dimension might have a hierarchy that represents
data at the Month, Quarter, and Year levels.
Fact Table
A table in a star schema that contains facts and is connected to dimension tables. A fact table typically
has two types of columns: those that contain facts and those that are foreign keys to dimension
tables. The primary key of a fact table is usually a composite key that is made up of all of its
foreign keys.
A fact table might contain either detail level facts or facts that have been aggregated (fact tables
that contain aggregated facts are often instead called summary tables). A fact table usually
contains facts with the same level of aggregation.
Example of Star Schema: Figure 1.6
In the example figure 1.6, sales fact table is connected to dimensions location, product, time
and organization. It shows that data can be sliced across all dimensions and again it is possible
for the data to be aggregated across multiple dimensions. "Sales Dollar" in sales fact table can
be calculated across all dimensions independently or in a combined manner which is explained
below.
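The slicing described above can be sketched with a miniature fact table whose rows carry one key per dimension (the keys and amounts are made up, standing in for the sales fact of figure 1.6):

```python
# Sketch of slicing a star-schema fact table: "Sales Dollar" can be
# aggregated along any single dimension key, or a combination of keys.
fact_sales = [
    # (location_key, product_key, time_key, sales_dollar)
    ("NY", "P1", "Q1", 100.0),
    ("NY", "P2", "Q1", 150.0),
    ("LA", "P1", "Q2", 200.0),
]

def slice_by(fact, key_index):
    """Total sales_dollar grouped by the dimension key at key_index."""
    totals = {}
    for row in fact:
        totals[row[key_index]] = totals.get(row[key_index], 0.0) + row[3]
    return totals

assert slice_by(fact_sales, 0) == {"NY": 250.0, "LA": 200.0}   # by location
assert slice_by(fact_sales, 1) == {"P1": 300.0, "P2": 150.0}   # by product
```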
Snowflake Schema
A snowflake schema describes a star schema structure that has been normalized through the use of outrigger tables, i.e., dimension table hierarchies are broken into simpler tables. In the star schema example we had four dimensions (location, product, time, and organization) and a fact table (sales).
In the snowflake schema, the example diagram shown below has 4 dimension tables, 4 lookup tables, and 1 fact table. The reason is that the hierarchies (category, branch, state, and month) are broken out of the dimension tables (PRODUCT, ORGANIZATION, LOCATION, and TIME respectively) and shown separately. In OLAP, this snowflake approach increases the number of joins and can degrade performance when retrieving data. Some organizations normalize their dimension tables this way to save space, but since dimension tables typically hold comparatively little data, the snowflake schema approach is often avoided.
Example of Snowflake Schema: Figure 1.7
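The normalization step that snowflaking performs can be sketched with a product dimension whose category level is broken out into a separate lookup (outrigger) table; the table contents are invented:

```python
# Sketch of snowflaking: the category hierarchy level is broken out of
# the PRODUCT dimension into a separate lookup (outrigger) table, so
# resolving a product's category now costs one extra join/lookup.
product_dim = [
    {"product_key": 1, "name": "widget", "category_key": 10},
    {"product_key": 2, "name": "gadget", "category_key": 10},
]
category_lookup = {10: "hardware"}   # outrigger table: category_key -> name

# The extra join a snowflake schema requires:
joined = [{**p, "category": category_lookup[p["category_key"]]}
          for p in product_dim]

assert joined[0]["category"] == "hardware"
assert joined[1]["category"] == "hardware"
```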