
Easy Approach to Informatica

Author: subrata_sahana

Date Written: 20th June, 2003.

Declaration: I hereby declare that this document is based on my project experience. To
the best of my knowledge, this document does not contain any material that infringes the
copyrights of any other individual or organization including the customers of Infosys.

Target Readers: All

Table of Contents

1. Introduction
2. Component and Architecture
3. Informatica Design Process
4. Informatica Repository
5. Informatica Client
5.1 Repository Manager
5.1.1 Repository Security
5.2 Designer
5.2.1 Transformations
5.2.1.1 Aggregator Transformation
5.2.1.2 Expression Transformation
5.2.1.3 Filter Transformation
5.2.1.4 Router Transformation
5.2.1.5 Joiner Transformation
5.2.1.6 Lookup Transformation
5.2.1.7 Normalizer Transformation
5.2.1.8 Rank Transformation
5.2.1.9 Sequence Generator Transformation
5.2.1.10 Source Qualifier Transformation
5.2.1.11 Update Strategy Transformation
5.3 Server Manager
5.3.1 Transformation Process
5.3.2 Sessions and Batches
5.3.3 Session Log
6. Connectivity Overview
7. Some Typical Troubleshooting

1. Introduction
Informatica is an ETL tool that allows you to load data into a centralized location, such as
a data mart, data warehouse or operational data store.

ETL Tool:
-Extract data from multiple sources
-Transform the data according to business logic and need
-Load the transformed data into file and relational targets
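The three ETL steps above can be sketched in a few lines of Python. This is not Informatica code. It is only an illustration of the extract, transform and load idea, and the sample data, the function names and the in-memory "target table" are all assumptions made for the example.

```python
import csv
import io

# Hypothetical flat-file source; in practice this would be a file or a database table.
SOURCE_CSV = "id,name,amount\n1,alice,10.5\n2,bob,\n3,carol,7\n"


def extract(text):
    """Extract: read rows from a delimited flat-file source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: apply business logic (drop rows with no amount, upper-case names)."""
    out = []
    for row in rows:
        if row["amount"] == "":
            continue  # assumed business rule: rows without an amount are not loaded
        out.append(
            {"id": int(row["id"]), "name": row["name"].upper(), "amount": float(row["amount"])}
        )
    return out


def load(rows, target):
    """Load: append the transformed rows to the target (a list stands in for a table)."""
    target.extend(rows)
    return len(rows)


target_table = []
loaded = load(transform(extract(SOURCE_CSV)), target_table)
```

In Informatica the same three roles are played by source definitions, transformations in a mapping, and target definitions, with the Informatica Server doing the actual work.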

2. Component and Architecture


Informatica consists of the following integrated components:

• Informatica Repository: The Informatica Repository is the center of Informatica.
You create a set of metadata tables within the repository database that the Informatica
application and tools access. The Informatica Client and Server access the repository
to save and retrieve metadata.
• Informatica Client: The Informatica Client is used to manage users, define sources
and targets, build mappings and mapplets with the transformation logic, and
create sessions to run the mapping logic. The Informatica Client consists of Repository
Manager, Designer and Server Manager.
• Informatica Server: The Informatica Server extracts data from sources, transforms
the data and loads the transformed data into targets.

The figure below illustrates the architecture of Informatica.


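[Figure: Informatica architecture. Source data flows from the Source to the Server, and
transformed data flows from the Server to the Target. The Server takes its instructions
from metadata in the Repository.]

Sources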
Informatica accesses the following sources:

• Relational - Oracle, Sybase, Informix, IBM DB2, Microsoft SQL Server and
Teradata.
• File - Fixed and delimited flat files, COBOL files and XML.
• Extended - PeopleSoft, SAP R/3, Siebel and IBM MQSeries (additional products
must be purchased for these sources).
• Mainframe - Additional products must be purchased.
• Other - Microsoft Excel and Access.


Targets
Informatica can load data into the following targets:

• Relational - Oracle, Sybase, Sybase IQ, Informix, IBM DB2, Microsoft SQL
Server and Teradata.
• File - Fixed and delimited flat files and XML.
• Extended - SAP BW and IBM MQSeries (additional products must be purchased
for these targets).
• Other - Microsoft Access.

3. Informatica Design Process


The Informatica design process consists of five main steps:
1. Create Repository – The repository holds all metadata and thus drives the extraction
and transformation process of Informatica.
2. Import Source Definitions – Source Analyzer in Designer is used to import or
create source definitions.
3. Create Target Schema – Warehouse Designer in Designer is used to import or
create target definitions.
4. Create Mappings – Mapping Designer in Designer is used to link sources to targets
with the required transformations.
5. Load Data – Server Manager is used to create and schedule sessions and batches
to run the mappings. Based on the transformation information and the repository
metadata, the Informatica Server loads data into the targets.

4. Informatica Repository
Informatica Repository is a set of tables that stores metadata created while using
Informatica Client tools. A database is required to create a repository. The following
database platforms can be used to create Informatica Repository –
• IBM DB2
• Informix
• Microsoft SQL Server
• Oracle
• Sybase

There are three different types of repositories – standalone, global and local.
Standalone repository: A repository that functions individually, unrelated and
unconnected to other repositories.
Global repository: A centralized repository in a domain. The global repository is used to
store common objects that can be used by many people through shortcuts. These objects
may be source definitions, reusable transformations, mappings and mapplets.
Local repository: A repository in a domain that is not the global repository. A local
repository is used for development. From a local repository, shortcuts to objects in shared
folders in the global repository can be created.

5. Informatica Client
The Informatica Client comprises three applications:
• Repository Manager – Repository Manager is used to create and administer
metadata in the repository.
• Designer – Designer is used to create mappings that contain transformation
instructions for the Informatica Server.
• Server Manager – Server Manager is used to create, schedule and monitor sessions.

5.1 Repository Manager


Repository Manager allows creating and administering one or more repositories.
Repository Manager consists of four windows:

[Figure: Repository Manager layout showing the Navigator Window, Main Window,
Dependency Window and Output Window.]

Navigator Window displays all objects that are created in Repository Manager,
Designer and Server Manager.
Main Window displays properties of the object selected in the Navigator Window.
Dependency Window displays dependencies on sources, targets and mappings of
the object selected in the Navigator Window or Main Window.
Output Window provides output of the processes executed in Repository
Manager.

5.1.1 Repository Security


The Informatica Client, Server, and Repository offer several layers of security.

The following are some important points related to repository security:

• When a repository is created, two default user groups are created automatically:
Administrators and Public. These two groups cannot be deleted, and their
privileges cannot be changed.
• Repository Manager automatically creates two users in the Administrators group:
Administrator and the database username used to create the repository. These two
users cannot be deleted or removed from the Administrators group.
• Repository Manager does not create any default user for the Public group.
• Each repository user must be assigned to at least one group. A user receives all
group privileges, inherits any changes to group privileges, and loses or gains
privileges if the user's group membership changes.
• A user can create or delete a group (except the default groups) if the user has
the Administer Repository or Super User privilege.
• If a group that has users is deleted, those users are assigned to the Public
group.
• A user with the Administer Repository or Super User privilege can edit any
user's properties, except those of Administrator and the default database user, but
cannot change a user name.
• A user can edit his/her own password if the user has the Browse Repository privilege.
• A user with the Administer Repository or Super User privilege can edit any
user's password.
• A user with the Administer Repository or Super User privilege can change the
privileges of other users (except Administrator and the default database user) or
groups. Privileges granted to a user individually have to be revoked individually.
• A user can have three different types of permissions in a folder – read, write
and execute.
• A user can change folder permissions if the user has the Super User privilege,
the Administer Repository privilege with read permission in the folder, or the Browse
Repository privilege as folder owner with read permission.
• If a user is working on an object, the repository locks that object so that another
user cannot work on the same object simultaneously.
• A user with the Browse Repository or Administer Repository privilege and read
permission can unlock objects locked by his/her username.
• A user with the Super User privilege can unlock any lock in the repository.

5.2 Designer
Designer helps to create source definitions, target definitions and transformations
to build mappings. Designer consists of four windows and a status bar:

[Figure: Designer layout showing the Navigator, Workspace, Workbook Tabs, Overview
Window, Output Window and Status Bar.]

Navigator Window is used to connect to and work in different repositories and
folders.
Workspace is used to view and edit sources, targets, transformations, mapplets
and mappings.
Output Window provides details when some tasks are performed, such as saving
or validating a mapping.
Overview Window is used for viewing a workbook containing large mappings or
a large number of objects.
Status Bar displays the status of the operation performed.

Designer consists of five tools:


o Source Analyzer: Used to import or create source definitions.
o Warehouse Designer: Used to import or create target definitions.
o Transformation Developer: Used to create reusable transformations.
o Mapplet Designer: Used to create mapplets (sets of transformations that
can be used in multiple mappings).
o Mapping Designer: Used to create mappings.

5.2.1 Transformations
A transformation is a repository object that generates, modifies or passes data.
There are many types of transformations that can be incorporated in a mapping to
process data. Brief descriptions of some frequently used transformations are
given below:

5.2.1.1 Aggregator Transformation


The Aggregator transformation allows performing aggregate calculations, such as
average and sum. The Aggregator transformation differs from the Expression
transformation in that the former performs calculations on groups of rows,
whereas the latter performs calculations on a row-by-row basis.
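The group-versus-row distinction can be illustrated outside Informatica with a small Python sketch. The sample rows, the port names (dept, salary, bonus) and both function names are assumptions made for the example. They are not Informatica objects.

```python
from collections import defaultdict

# Hypothetical input rows; "dept" and "salary" stand in for ports.
rows = [
    {"dept": "A", "salary": 100},
    {"dept": "A", "salary": 300},
    {"dept": "B", "salary": 200},
]


def expression(rows):
    """Row-by-row calculation: one output row for each input row."""
    return [{**r, "bonus": r["salary"] // 10} for r in rows]


def aggregator(rows, group_by):
    """Group calculation: one output row for each distinct group value."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[group_by]].append(r["salary"])
    return [
        {group_by: key, "sum": sum(values), "avg": sum(values) / len(values)}
        for key, values in groups.items()
    ]


per_row = expression(rows)            # three rows in, three rows out
per_group = aggregator(rows, "dept")  # three rows in, two rows out (one per dept)
```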

5.2.1.2 Expression Transformation


The Expression transformation is used to calculate values in a single row before
writing to the target. This transformation can be used to perform non-aggregate
calculations.

5.2.1.3 Filter Transformation


The Filter transformation provides the means for filtering records in a mapping.
All rows from a source transformation are passed through the Filter
transformation, in which a filter condition is entered. All ports are input/output
ports, and only records that meet the condition pass through the Filter
transformation. This transformation is used to prevent unwanted records
from being processed.

5.2.1.4 Router Transformation


The Router transformation is similar to the Filter transformation. A Filter
transformation tests data for one condition and drops all rows that do not meet
the condition. A Router transformation can test data for more than one condition,
and rows that do not meet any of the conditions can be routed through the default
output group. If the same input data needs to be tested against many conditions,
use a Router instead of multiple Filter transformations.
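The contrast between the two transformations can be sketched in Python as follows. This is an illustration only, not Informatica code. The group names, the sample rows and the choice to place a row in every group whose condition it meets are assumptions of the sketch.

```python
def filter_rows(rows, condition):
    """Filter: keep rows that meet the single condition, drop the rest."""
    return [r for r in rows if condition(r)]


def route_rows(rows, conditions):
    """Router: test each row against every named condition.

    Rows that meet none of the conditions go to the DEFAULT group
    instead of being dropped.
    """
    outputs = {name: [] for name in conditions}
    outputs["DEFAULT"] = []
    for r in rows:
        matched = False
        for name, condition in conditions.items():
            if condition(r):
                outputs[name].append(r)
                matched = True
        if not matched:
            outputs["DEFAULT"].append(r)
    return outputs


orders = [{"id": 1, "amount": 50}, {"id": 2, "amount": 500}, {"id": 3, "amount": 5000}]

small_only = filter_rows(orders, lambda r: r["amount"] < 100)
routed = route_rows(
    orders,
    {
        "SMALL": lambda r: r["amount"] < 100,
        "LARGE": lambda r: r["amount"] >= 1000,
    },
)
```

With a Filter, order 2 and order 3 are simply lost. With a Router, order 2 is still available in the default group, and the input is read only once for both conditions.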

5.2.1.5 Joiner Transformation


The Source Qualifier can join data originating from a common source database, whereas
the Joiner transformation joins two related heterogeneous sources residing in different
locations or file systems. The Joiner transformation is used to join two sources
with at least one matching port. It uses a condition
that matches one or more pairs of ports between the two sources.
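As an illustration, the following Python sketch joins rows from two different kinds of sources (an in-memory "table" and a delimited flat file) on one matching pair of ports. It is not Informatica code. The names master_rows, detail_rows and joiner, the sample data, and the inner-join behaviour are assumptions of the sketch.

```python
import csv
import io

# Source 1: rows as they might come from a relational table (hypothetical data).
master_rows = [{"cust_id": 1, "name": "Asha"}, {"cust_id": 2, "name": "Ben"}]

# Source 2: a delimited flat file held in a string (hypothetical data).
FLAT_FILE = "order_id,customer\n10,1\n11,3\n"
detail_rows = [
    {"order_id": int(r["order_id"]), "customer": int(r["customer"])}
    for r in csv.DictReader(io.StringIO(FLAT_FILE))
]


def joiner(master_rows, detail_rows, master_port, detail_port):
    """Join two row sets where master_port equals detail_port (inner join)."""
    index = {}
    for m in master_rows:
        index.setdefault(m[master_port], []).append(m)
    joined = []
    for d in detail_rows:
        for m in index.get(d[detail_port], []):
            joined.append({**m, **d})
    return joined


joined = joiner(master_rows, detail_rows, "cust_id", "customer")
```

Order 11 refers to a customer that does not exist in the first source, so it does not appear in the joined output of this inner-join sketch.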

5.2.1.6 Lookup Transformation


The Lookup transformation is used to access data from any relational database to
which both the Informatica Client and Server can connect. A mapping can contain
multiple lookups.

The Lookup transformation can be used to perform tasks such as:


o Perform a calculation
o Update slowly changing dimension tables
o Take into account integrity constraints in tables

5.2.1.7 Normalizer Transformation


The Normalizer transformation is used to organize the data. For sources such as
COBOL, the Normalizer is used instead of the Source Qualifier. With the Normalizer, repeated
data in a record can be broken into separate records. For each new record it
creates, the Normalizer generates a unique identifier. This key value can be used to
join the normalized records. A Normalizer transformation can also be used to
create multiple rows from a single row of data.
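The following Python sketch shows the idea of breaking repeated data in one record into separate records, each with a generated identifier. It is an illustration only. The field names (store, quarterly_sales, row_id, occurrence) and the function name are assumptions, not the names Informatica generates.

```python
from itertools import count


def normalizer(records, repeating_field):
    """Turn each value of a repeating field into its own output record."""
    next_id = count(1)  # generator for the unique identifier of each new record
    out = []
    for rec in records:
        fixed = {k: v for k, v in rec.items() if k != repeating_field}
        for position, value in enumerate(rec[repeating_field], start=1):
            out.append(
                {
                    "row_id": next(next_id),   # unique key for the new record
                    **fixed,                   # non-repeating fields are copied
                    "occurrence": position,    # which repetition this row came from
                    repeating_field: value,
                }
            )
    return out


# One source record holding four repeated values (hypothetical data).
records = [{"store": "S1", "quarterly_sales": [100, 120, 90, 110]}]
normalized = normalizer(records, "quarterly_sales")
```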

5.2.1.8 Rank Transformation


The Rank transformation allows selecting only the top or bottom rank of data. A
Rank transformation can be used to return the largest or smallest values in a port or
group. The Rank transformation differs from the transformation functions MAX and
MIN in that it allows selecting a group of top or bottom values, not just one value.
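The difference from MAX can be seen in a short Python sketch that returns the top n rows per group. This is not Informatica code, and the function name rank_top, its parameters and the sample rows are assumptions made for the example.

```python
import heapq
from collections import defaultdict


def rank_top(rows, port, n, group_by=None):
    """Return the n rows with the largest value of `port`, per group if given."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[group_by] if group_by else None].append(r)
    result = []
    for members in groups.values():
        result.extend(heapq.nlargest(n, members, key=lambda r: r[port]))
    return result


sales = [
    {"region": "E", "sales": 10},
    {"region": "E", "sales": 30},
    {"region": "E", "sales": 20},
    {"region": "W", "sales": 5},
]

top_two_per_region = rank_top(sales, "sales", 2, group_by="region")
single_max = max(r["sales"] for r in sales)  # MAX yields just one value
```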

5.2.1.9 Sequence Generator Transformation


The Sequence Generator transformation generates numeric values that can be used to
create primary key values, to replace missing primary keys, or to cycle through a
sequential range of numbers. The Sequence Generator transformation is a
connected transformation that contains two output ports, which can connect to one
or more transformations. The Informatica Server generates a value each time a
row enters a connected transformation. A Sequence Generator can be made reusable
and can be used in multiple mappings for multiple loads on a single target.
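The behaviour described above, including cycling through a range, can be sketched with a small Python class. This is an illustration only. The class name and the parameters start, increment, end and cycle are assumptions of the sketch and are not taken from the Informatica property sheet.

```python
class SequenceGenerator:
    """Minimal sequence: yields start, start+increment, ... and optionally cycles."""

    def __init__(self, start=1, increment=1, end=None, cycle=False):
        self.start = start
        self.increment = increment
        self.end = end
        self.cycle = cycle
        self._next = start

    def next_value(self):
        # When the end of the range is passed, either wrap around or fail.
        if self.end is not None and self._next > self.end:
            if not self.cycle:
                raise OverflowError("sequence exhausted")
            self._next = self.start
        value = self._next
        self._next += self.increment
        return value


# One value is generated for each row that passes through.
seq = SequenceGenerator(start=1, end=3, cycle=True)
rows = [{"name": n} for n in ["a", "b", "c", "d", "e"]]
keyed = [{"key": seq.next_value(), **r} for r in rows]
```

Because the generator keeps its own state, sharing one instance between several loads gives each load a continuing, non-overlapping run of keys, which is the point of making it reusable.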

5.2.1.10 Source Qualifier Transformation


The Source Qualifier transformation is used to connect to a relational or flat file
source. The Source Qualifier represents the records that the Informatica Server
reads when it runs a session.
The Source Qualifier can be used to perform the following tasks:
o Join data originating from the same source database
o Filter records when the Informatica Server reads source data
o Specify an outer join rather than the default inner join
o Select distinct values from the source
o Specify sorted ports
o Create a custom query to issue a special SELECT statement for the
Informatica Server to read source data.
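Most of the tasks listed above amount to changing the SELECT statement that is issued against the source database. The Python sketch below shows that idea using the built-in sqlite3 module as a stand-in source. It is not Informatica code and does not reproduce the SQL Informatica actually generates. The tables, columns and the build_query function are hypothetical, and the outer-join option is left out to keep the sketch short.

```python
import sqlite3

# An in-memory SQLite database stands in for the source database.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (customer_id INTEGER, country TEXT);
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'IN'), (2, 'US');
    INSERT INTO orders VALUES (10, 1, 50.0), (11, 1, 75.0), (12, 2, 20.0);
    """
)


def build_query(distinct=False, source_filter=None, sorted_ports=()):
    """Assemble a SELECT that joins, filters, de-duplicates and sorts at the source."""
    sql = "SELECT " + ("DISTINCT " if distinct else "")
    sql += "c.country FROM customers c JOIN orders o ON o.customer_id = c.customer_id"
    if source_filter:
        sql += " WHERE " + source_filter
    if sorted_ports:
        sql += " ORDER BY " + ", ".join(sorted_ports)
    return sql


plain_rows = conn.execute(build_query()).fetchall()
query = build_query(distinct=True, source_filter="o.amount > 30", sorted_ports=["c.country"])
qualified_rows = conn.execute(query).fetchall()
```

Doing this work in the source database means fewer rows ever reach the later transformations in the mapping.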

5.2.1.11 Update Strategy Transformation


The Update Strategy transformation is used to implement the logic to insert,
update, delete and reject data in target tables. The update strategy can be set at two
different levels:
o Within a session: The Informatica Server can be instructed either to treat all
rows in the same way or to use the instructions coded in the
session mapping to flag records for different database operations.
o Within a mapping: An Update Strategy transformation can be used to flag
records for insert, delete, update or reject.
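Flagging within a mapping can be pictured with the following Python sketch, in which each row is given a flag and the flag decides what happens to the target. This is an illustration only. The flag values, the rules in flag_row and the dictionary used as a target table are assumptions made for the example.

```python
# Illustrative flag values; they are not Informatica constants.
INSERT, UPDATE, DELETE, REJECT = "insert", "update", "delete", "reject"


def flag_row(row, target):
    """Decide the database operation for one row (assumed business rules)."""
    if row["amount"] is None:
        return REJECT
    if row.get("deleted"):
        return DELETE
    return UPDATE if row["id"] in target else INSERT


def apply_rows(rows, target):
    """Apply flagged rows to the target and return the rejected rows."""
    rejected = []
    for row in rows:
        flag = flag_row(row, target)
        if flag == REJECT:
            rejected.append(row)
        elif flag == DELETE:
            target.pop(row["id"], None)
        else:  # INSERT and UPDATE both write the new value
            target[row["id"]] = row["amount"]
    return rejected


target_table = {1: 10.0}  # id -> amount, standing in for a target table
incoming = [
    {"id": 1, "amount": 15.0},                   # exists in target: update
    {"id": 2, "amount": 7.0},                    # new: insert
    {"id": 3, "amount": None},                   # invalid: reject
    {"id": 1, "amount": 0.0, "deleted": True},   # marked deleted: delete
]
rejected_rows = apply_rows(incoming, target_table)
```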

5.3 Server Manager


Server Manager allows creating, monitoring, tuning and running sessions, and
configuring the server.

Server Manager consists of the Navigator window, Configure window, Monitor
window and Output window.
Navigator window is used to view and select configured sessions.
Configure window is used to create and edit sessions.
Monitor window is used to view information about running and completed
sessions and batches.
Output window is for viewing messages from the Informatica Server.

[Figure: Server Manager layout showing the Navigator, Configure Window, Monitor
Window and Output Window.]

5.3.1 Transformation Process
For a transformation to take place, the Informatica Server carries out the following steps:
o Reads information from the Repository.
o Extracts data from the Sources and stores the data in memory while it
applies the transformation rules you created.
o Loads the transformed data into the mapping targets.

5.3.2 Sessions and Batches

Session: A session is a set of instructions that tells the Informatica Server how and
when to move data from sources to targets.

Batches: A batch is a group of sessions that are to be run together. Batches provide a
way to group sessions for either serial or parallel execution by the Informatica
Server. There are two types of batches:

o Sequential: Runs sessions one after the other.
o Concurrent: Runs sessions at the same time.

Once a session or batch is created, the Server Manager or the command line
program pmcmd can be used to start or stop the session or batch.
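The two batch types can be pictured with a short Python sketch in which each "session" is just a callable. This is not how the Informatica Server is implemented. The function names and the use of a thread pool for the concurrent case are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def run_sequential(sessions):
    """Sequential batch: each session starts only after the previous one ends."""
    return [session() for session in sessions]


def run_concurrent(sessions):
    """Concurrent batch: all sessions are started together and run in parallel."""
    with ThreadPoolExecutor(max_workers=max(1, len(sessions))) as pool:
        futures = [pool.submit(session) for session in sessions]
        return [f.result() for f in futures]  # results in submission order


# Hypothetical sessions; a real session would move data from sources to targets.
def load_customers():
    return "customers loaded"


def load_orders():
    return "orders loaded"


batch = [load_customers, load_orders]
sequential_results = run_sequential(batch)
concurrent_results = run_concurrent(batch)
```

A sequential batch suits sessions that depend on each other (for example, loading a dimension before the fact table that references it), while a concurrent batch suits independent sessions.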

5.3.3 Session Log


The Informatica Server creates a session log file for each session it runs. The
session log file contains information about all tasks the Informatica Server performs.
The amount of detail in the session log file depends upon the tracing level set by
the user. The tracing level can be defined per transformation or for the entire
session. By default, the Informatica Server saves the session log in the directory given by
the Informatica session variable $PMSessionLogDir, which can be defined in the
Server Manager properties. The default name for the session log is session_name.log.

6. Connectivity Overview
7. Some Typical Troubleshooting
• If a problem is encountered while saving a mapping and the status bar shows the
message "Run out of locks", contact the repository database administrator. This is a
problem on the database side.
• If the Informatica Client hangs during login even though the correct user id and
password are entered, and the status bar shows "Connecting to repository", contact the
repository database administrator, as the database may have run out of space.
• If another user id has obtained a lock on your session and you are not an
administrator, create another session with a different name to run your mapping,
and ask the administrator to release the lock.
