Introduction to Informatica PowerCenter
Data Warehousing
Data warehousing is the entire process of extracting, transforming, and loading data into the warehouse, together with the access to that data by end users and applications.
Data Mart
A data mart stores data for a limited number of subject areas, such as marketing and sales data. It is used to support specific applications. An independent data mart is created directly from source systems. A dependent data mart is populated from a data warehouse.
[Figure: data warehouse architecture. Data sources (transaction data from production systems such as marketing, HR, finance, and accounting, plus external data such as demographic data) are extracted by ETL software into a staging area and operational data store, then loaded into the data warehouse and data marts (finance, marketing, sales), which are described by meta data. Users (analysts, executives, managers, operational personnel, customers/suppliers) access the data through queries, reporting, DSS/EIS, data mining, and a web browser. Vendor names shown in the figure include IBM, IMS, VSAM, Oracle, Sybase, Informix, SAP, Harte-Hanks, Ascential, Informatica, Sagent, SAS, Teradata, Essbase, Microsoft, Cognos, MicroStrategy, Siebel, and Business Objects.]
ETL is often performed by COBOL routines (not recommended because of high program maintenance and no automatically generated meta data).
Sometimes source data is copied to the target database using the replication capabilities of a standard RDBMS (not recommended because of dirty data in the source systems).
Increasingly, ETL is performed by specialized ETL software.
Components Of Informatica
Repository Manager
Designer
Workflow Manager
Informatica Client. Use the Informatica Client to manage users, define sources and targets, build mappings and mapplets with the transformation logic, and create sessions to run the mapping logic. The Informatica Client has three client applications: Repository Manager, Designer, and Workflow Manager.
Informatica Server. The Informatica Server extracts the source data, performs the data transformation, and loads the transformed data into the targets.
Architecture
Process Flow
The Informatica Server moves data from source to target based on the workflow and metadata stored in the repository. A workflow is a set of instructions that describes how and when to run tasks related to ETL. The Informatica Server runs a workflow according to the conditional links connecting its tasks. A session is a type of workflow task that describes how to move data between source and target using a mapping. A mapping is a set of source and target definitions linked by transformation objects that define the rules for data transformation.
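The relationship between workflow, session, and mapping can be sketched in a few lines of Python. This is only an illustrative model, not Informatica code: the class names, the `stores` dictionary standing in for databases, and the `add_total` transformation are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Row = Dict[str, object]
Transform = Callable[[List[Row]], List[Row]]


@dataclass
class Mapping:
    """Source and target definitions linked by transformation steps."""
    source: str
    target: str
    transformations: List[Transform] = field(default_factory=list)


@dataclass
class Session:
    """Workflow task that moves data from source to target using one mapping."""
    name: str
    mapping: Mapping

    def run(self, stores: Dict[str, List[Row]]) -> bool:
        rows = [dict(r) for r in stores[self.mapping.source]]
        for transform in self.mapping.transformations:
            rows = transform(rows)
        stores.setdefault(self.mapping.target, []).extend(rows)
        return True  # True means the session succeeded


@dataclass
class Link:
    """Conditional link: to_task runs only if condition(status of from_task) holds."""
    from_task: str
    to_task: str
    condition: Callable[[bool], bool]


@dataclass
class Workflow:
    sessions: List[Session]  # assumed to be listed in dependency order
    links: List[Link]

    def run(self, stores: Dict[str, List[Row]]) -> Dict[str, bool]:
        status: Dict[str, bool] = {}
        for session in self.sessions:
            incoming = [l for l in self.links if l.to_task == session.name]
            ready = all(
                l.from_task in status and l.condition(status[l.from_task])
                for l in incoming
            )
            if ready:
                status[session.name] = session.run(stores)
        return status


def add_total(rows: List[Row]) -> List[Row]:
    # Hypothetical transformation rule: total = quantity * price.
    return [{**r, "TOTAL": r["QTY"] * r["PRICE"]} for r in rows]


stores = {"ORDERS": [{"QTY": 2, "PRICE": 5}]}
workflow = Workflow(
    sessions=[
        Session("s_load_dw", Mapping("ORDERS", "DW_ORDERS", [add_total])),
        Session("s_load_mart", Mapping("DW_ORDERS", "MART_SALES")),
    ],
    links=[Link("s_load_dw", "s_load_mart", lambda succeeded: succeeded)],
)
status = workflow.run(stores)
```

In this sketch the second session runs only because the link condition accepts the status of the first one, which mirrors the role of conditional links between tasks.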
Sources
PowerMart and PowerCenter access the following sources:
Relational. Oracle, Sybase, Informix, IBM DB2, Microsoft SQL Server, and Teradata.
File. Fixed and delimited flat files, COBOL files, and XML.
Extended. If you use PowerCenter, you can purchase additional PowerConnect products to access business sources such as PeopleSoft, SAP R/3, Siebel, and IBM MQSeries.
Mainframe. If you use PowerCenter, you can purchase PowerConnect for IBM DB2 for faster access to IBM DB2 on MVS.
Other. Microsoft Excel and Access.
Targets
PowerMart and PowerCenter can load data into the following targets:
Relational. Oracle, Sybase, Sybase IQ, Informix, IBM DB2, Microsoft SQL Server, and Teradata.
File. Fixed and delimited flat files and XML.
Extended. If you use PowerCenter, you can purchase an integration server to load data into SAP BW. You can also purchase PowerConnect for IBM MQSeries to load data into IBM MQSeries message queues.
Other. Microsoft Access.
You can load data into targets using ODBC or native drivers, FTP, or external loaders.
Step 2: Connect to the repository from the Designer, import the source and target tables, and create mappings.
Step 3: Create a workflow in the Workflow Manager. A workflow consists of different tasks connected to one another. Among them, the session is the task that points to a mapping created in the Designer.
Repository
The Informatica repository is a set of tables that stores the metadata you create using the Informatica Client tools. You create a database for the repository, and then use the Repository Manager to create the metadata tables in the database. You add metadata to the repository tables when you perform tasks in the Informatica Client application such as creating users, analyzing sources, developing mappings or mapplets, or creating sessions. The Informatica Server reads metadata created in the Client application when you run a session. The Informatica Server also creates metadata such as start and finish times of a session or session status.
When you use PowerCenter, you can develop global and local repositories to share metadata:
Global repository. The global repository is the hub of the domain. Use the global repository to store common objects that multiple developers can use through shortcuts. These objects may include operational or application source definitions, reusable transformations, mapplets, and mappings.
Local repositories. A local repository is any repository within the domain that is not the global repository. Use local repositories for development. From a local repository, you can create shortcuts to objects in shared folders in the global repository. These objects typically include source definitions, common dimensions and lookups, and enterprise standard transformations. You can also create copies of objects in non-shared folders.
Repository Architecture
[Figure: repository architecture showing the Repository Client, the Repository Server with its Repository Agent, and the Repository Database.]
Creating a Repository
To create Repository
1. Launch the Repository Manager by choosing Programs-PowerCenter (or PowerMart) Client-Repository Manager from the Start menu.
2. In the Repository Manager, choose Repository-Create Repository.
Note: You must be running the Repository Manager in Administrator mode to see the Create Repository option on the menu. Administrator mode is the default when you install the program.
3. In the Create Repository dialog box, specify the name of the new repository, as well as the parameters needed to connect to the repository database through ODBC.
Privileges. Repository-wide security that controls which task or set of tasks a single user or group of users can access. Examples are Use Designer, Browse Repository, and Session Operator.
Permissions. Security assigned to individual folders within the repository. Each permission lets you perform a different set of tasks in the folder. Examples are Read, Write, and Execute.
Folders
Folders provide a way to organize and store all metadata in the repository, including mappings, schemas, and sessions. Folders are designed to be flexible, to help you organize your data warehouse logically. Each folder has a set of properties you can configure to define how users access the folder. For example, you can create a folder that allows all repository users to see objects within the folder, but not to edit them. Or you can create a folder that allows users to share objects within the folder.
Shared Folders
When you create a folder, you can configure it as a shared folder. Shared folders allow users to create shortcuts to objects in the folder. If you have a reusable transformation that you want to use in several mappings or across multiple folders, you can place the object in a shared folder. For example, you may have a reusable Expression transformation that calculates sales commissions. You can then use the object in other folders by creating a shortcut to the object.
Folder Permissions
Permissions allow repository users to perform tasks within a folder. With folder permissions, you can control user access to the folder, and the tasks you permit them to perform. Folder permissions work closely with repository privileges. Privileges grant access to specific tasks while permissions grant access to specific folders with read, write, and execute qualifiers.
However, any user with the Super User privilege can perform all tasks across all folders in the repository. Folders have the following types of permissions:
Read permission. Allows you to view the folder as well as objects in the folder.
Write permission. Allows you to create or edit objects in the folder.
Execute permission. Allows you to execute or schedule a session or batch in the folder.
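The interplay of privileges and folder permissions can be sketched as a simple check. This is a simplified illustration, not the actual repository security model: the `User` and `Folder` classes and the per-user permission dictionary are assumptions, and only the privilege and permission names come from the text above.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class User:
    name: str
    privileges: Set[str] = field(default_factory=set)  # repository-wide


@dataclass
class Folder:
    name: str
    # Hypothetical layout: user name -> subset of {"read", "write", "execute"}
    permissions: Dict[str, Set[str]] = field(default_factory=dict)


def can_perform(user: User, folder: Folder, privilege: str, permission: str) -> bool:
    """A task needs both the privilege (what) and the folder permission (where)."""
    if "Super User" in user.privileges:
        return True  # Super User can perform all tasks across all folders
    has_privilege = privilege in user.privileges
    has_permission = permission in folder.permissions.get(user.name, set())
    return has_privilege and has_permission


developer = User("dev1", {"Use Designer", "Browse Repository"})
admin = User("admin", {"Super User"})
sales = Folder("SALES", {"dev1": {"read"}})

dev_can_read = can_perform(developer, sales, "Use Designer", "read")
dev_can_write = can_perform(developer, sales, "Use Designer", "write")
admin_can_execute = can_perform(admin, sales, "Session Operator", "execute")
```

Here the developer can view objects in the folder but cannot edit them, while the Super User passes every check.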
Creating Folders
Questions/Comments
Designer
Importing Sources
Creating Targets
You can create target definitions in the Warehouse Designer for file and relational sources. Create definitions in the following ways:
Import the definition for an existing target. Import the target definition from a relational target.
Create a target definition based on a source definition. Drag one of the following existing source definitions into the Warehouse Designer to make a target definition:
o Relational source definition
o Flat file source definition
o COBOL source definition
Manually create a target definition. Create and design a target definition in the Warehouse Designer.
Transformations
A transformation is a repository object that generates, modifies, or passes data. The Designer provides a set of transformations that perform specific functions. Data passes into and out of transformations through ports that you connect in a mapping or mapplet. Transformations can be active or passive.
Active transformations
Aggregator. Performs aggregate calculations.
Filter. Serves as a conditional filter.
Router. Serves as a conditional filter with more than one filter condition.
Joiner. Allows for heterogeneous joins.
Source Qualifier. Represents all data queried from the source.
Passive transformations
Expression. Performs simple calculations.
Lookup. Looks up values and passes them to other objects.
Sequence Generator. Generates unique ID values.
Stored Procedure. Calls a stored procedure and captures return values.
Update Strategy. Allows for logic to insert, update, delete, or reject data.
Create the transformation. Create it in the Mapping Designer as part of a mapping, in the Mapplet Designer as part of a mapplet, or in the Transformation Developer as a reusable transformation.
Configure the transformation. Each type of transformation has a unique set of options that you can configure.
Connect the transformation to other transformations and target definitions. Drag one port to another to connect them in the mapping or mapplet.
Expression Transformation
You can use the Expression transformation to calculate values in a single row.
Calculating Values
To use the Expression transformation to calculate values for a single row, you must include the following ports:
Input or input/output ports for each value used in the calculation. For example, when calculating the total price for an order, determined by multiplying the unit price by the quantity ordered, you need two input or input/output ports. One port provides the unit price and the other provides the quantity ordered.
Output port for the expression. You enter the expression as a configuration option for the output port. The return value for the output port needs to match the return value of the expression.
Variable port. A variable port is used like a local variable inside the Expression transformation, and its value can be used in other calculations.
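The total price example can be sketched in Python to show how input, variable, and output ports relate. This is an analogy only: the function name, the port names, and the `IS_LARGE_ORDER` output are assumptions for illustration, not Informatica expression syntax.

```python
from typing import Dict, List

Row = Dict[str, object]


def exp_order_totals(rows: List[Row]) -> List[Row]:
    """Row-by-row calculation: one output row for every input row."""
    output = []
    for row in rows:
        # Input ports: UNIT_PRICE and QUANTITY.
        # Variable port: local to the transformation, reused by two outputs.
        v_total = row["UNIT_PRICE"] * row["QUANTITY"]
        output.append({
            **row,
            "TOTAL_PRICE": v_total,            # output port
            "IS_LARGE_ORDER": v_total >= 100,  # second output reusing the variable
        })
    return output


orders = [
    {"ORDER_ID": 1, "UNIT_PRICE": 5, "QUANTITY": 3},
    {"ORDER_ID": 2, "UNIT_PRICE": 40, "QUANTITY": 3},
]
totals = exp_order_totals(orders)
```

The number of rows is unchanged, which is what makes this kind of transformation passive.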
The Source Qualifier transformation has the following options:
SQL Query. Defines a custom query that replaces the default query the Informatica Server uses to read data from sources represented in this Source Qualifier.
User Defined Join. Specifies the condition used to join data from multiple sources represented in the same Source Qualifier transformation.
Source Filter. Specifies the filter condition the Informatica Server applies when querying records.
Number of Sorted Ports. Indicates the number of columns used when sorting records queried from relational sources. If you select this option, the Informatica Server adds an ORDER BY to the default query when it reads source records. The ORDER BY includes the number of ports specified, starting from the top of the Source Qualifier. When selected, the database sort order must match the session sort order.
Tracing Level. Sets the amount of detail included in the session log when you run a session containing this transformation.
Select Distinct. Specifies if you want to select only unique records. The Informatica Server includes a SELECT DISTINCT statement if you choose this option.
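The effect of these options on the query (custom query, filter condition, number of sorted columns, select distinct) can be sketched as a small query builder. This is a simplified assumption about how the pieces combine, not the SQL the Informatica Server actually generates; the function and parameter names are hypothetical.

```python
from typing import List, Optional


def build_source_query(
    table: str,
    ports: List[str],
    source_filter: Optional[str] = None,
    sorted_ports: int = 0,
    select_distinct: bool = False,
    sql_override: Optional[str] = None,
) -> str:
    """Assemble a default-style query from Source Qualifier options."""
    if sql_override:
        # A custom query replaces the default query entirely.
        return sql_override
    select = "SELECT DISTINCT" if select_distinct else "SELECT"
    sql = f"{select} {', '.join(ports)} FROM {table}"
    if source_filter:
        sql += f" WHERE {source_filter}"
    if sorted_ports > 0:
        # ORDER BY uses the first N ports, starting from the top.
        sql += " ORDER BY " + ", ".join(ports[:sorted_ports])
    return sql


query = build_source_query(
    "ORDERS",
    ["ORDER_ID", "CUSTOMER_ID", "AMOUNT"],
    source_filter="AMOUNT > 0",
    sorted_ports=2,
    select_distinct=True,
)
```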
Joiner Transformation
While a Source Qualifier transformation can join data originating from a common source database, the Joiner transformation joins two related heterogeneous sources residing in different locations or file systems. The combination of sources can be varied. You can use the following sources:
o Two relational tables existing in separate databases
o Two flat files in potentially different file systems
o Two different ODBC sources
o Two instances of the same XML source
o A relational table and a flat file source
o A relational table and an XML source
If two relational sources contain keys, then a Source Qualifier transformation can easily join the sources on those keys. Joiner transformations typically combine information from two different sources that do not have matching keys, such as flat file sources. The Joiner transformation allows you to join sources that contain binary data.
Cache Directory. Specifies the directory used to cache master records and the index to these records. By default, the caches are created in a directory specified by the server variable $PMCacheDir. If you override the directory, be sure there is enough disk space on the file system. The directory can be a mapped or mounted drive.
Join Type. Specifies the type of join: Normal, Master Outer, Detail Outer, or Full Outer.
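The four join types can be illustrated with a small in-memory join. This sketch assumes the usual Joiner semantics, which the text above does not spell out: Normal keeps only matching rows, Master Outer keeps all detail rows plus matching master rows, Detail Outer keeps all master rows plus matching detail rows, and Full Outer keeps everything. The function and the sample data are hypothetical.

```python
from typing import Dict, List

Row = Dict[str, object]


def joiner(master: List[Row], detail: List[Row], key: str, join_type: str = "normal") -> List[Row]:
    # Cache the master rows by join key, then stream the detail rows past the cache.
    cache: Dict[object, List[Row]] = {}
    for m in master:
        cache.setdefault(m[key], []).append(m)

    result: List[Row] = []
    matched_keys = set()
    for d in detail:
        matches = cache.get(d[key], [])
        if matches:
            matched_keys.add(d[key])
            result.extend({**m, **d} for m in matches)
        elif join_type in ("master_outer", "full_outer"):
            result.append(dict(d))  # keep the unmatched detail row
    if join_type in ("detail_outer", "full_outer"):
        # keep the unmatched master rows
        result.extend(dict(m) for m in master if m[key] not in matched_keys)
    return result


customers = [{"ID": 1, "NAME": "a"}, {"ID": 2, "NAME": "b"}]  # master
payments = [{"ID": 2, "AMT": 10}, {"ID": 3, "AMT": 20}]       # detail

normal = joiner(customers, payments, "ID", "normal")
master_outer = joiner(customers, payments, "ID", "master_outer")
detail_outer = joiner(customers, payments, "ID", "detail_outer")
full_outer = joiner(customers, payments, "ID", "full_outer")
```

Unmatched rows simply lack the other side's columns in this sketch; a real join would fill them with NULL values.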
Lookup Transformation
The Lookup transformation is used to look up data in a relational table, view, synonym, or flat file. It compares Lookup transformation port values to lookup table column values based on the lookup condition.
Connected lookups
o Receives input values directly from another transformation in the pipeline.
o For each input row, the Informatica Server queries the lookup table or cache based on the lookup ports and the condition in the transformation.
o Passes return values from the query to the next transformation.
Unconnected lookups
o Receives input values from an expression that uses the :LKP reference qualifier (:LKP.lookup_transformation_name(argument, argument, ...)) to call the lookup, and returns one value.
o With unconnected lookups, you can pass multiple input values into the transformation, but only one column of data out of the transformation.
You can configure the transformation to be connected or unconnected, cached or uncached:
Connected or unconnected. Connected and unconnected transformations receive input and send output in different ways.
Cached or uncached. Sometimes you can improve session performance by caching the lookup table. If you cache the lookup table, you can choose to use a dynamic or static cache. By default, the lookup cache remains static and does not change during the session. With a dynamic cache, the Informatica Server inserts rows into the cache during the session. Informatica recommends that you cache the target table as the lookup. This enables you to look up values in the target and insert them if they do not exist.
Unconnected lookup
Receives input values from the result of a :LKP expression within another transformation. You can use a static cache. The cache includes all lookup output ports. Does not support user-defined default values.
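An unconnected lookup behaves much like a function called from an expression: several input values go in, one value comes out. The sketch below models that idea in Python under stated assumptions; `make_unconnected_lookup`, the tax rate table, and the port names are hypothetical and are not Informatica syntax.

```python
from typing import Callable, Dict, List, Sequence

Row = Dict[str, object]


def make_unconnected_lookup(
    lookup_table: List[Row], condition_ports: Sequence[str], return_port: str
) -> Callable[..., object]:
    # Static cache built once: condition values -> single return value.
    cache = {
        tuple(row[p] for p in condition_ports): row[return_port]
        for row in lookup_table
    }

    def lkp(*args: object) -> object:
        # No match returns None (no user-defined default value).
        return cache.get(tuple(args))

    return lkp


tax_rates = [
    {"STATE": "CA", "CATEGORY": "FOOD", "RATE": 0},
    {"STATE": "CA", "CATEGORY": "TOYS", "RATE": 8},
]
lkp_tax_rate = make_unconnected_lookup(tax_rates, ["STATE", "CATEGORY"], "RATE")


def expression(row: Row) -> Row:
    # Plays the role of an expression calling the lookup with two arguments.
    return {**row, "TAX_RATE": lkp_tax_rate(row["STATE"], row["CATEGORY"])}


result = expression({"STATE": "CA", "CATEGORY": "TOYS", "AMOUNT": 100})
```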
Static Cache
1) You cannot insert or update the cache.
2) The Informatica Server does not update the cache while it processes the Lookup transformation.
Dynamic Cache
1) You can insert rows into the cache as you pass them to the target.
2) The Informatica Server dynamically inserts data into the lookup cache and passes data to the target table.
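The practical difference between the two cache types shows up when the same new key arrives twice in one run. The following sketch is a simplified model, assuming a cache built from the target table and keyed on a single column; the class and function names are hypothetical.

```python
from typing import Dict, List

Row = Dict[str, object]


class LookupCache:
    def __init__(self, target_rows: List[Row], key: str, dynamic: bool = False):
        self.key = key
        self.dynamic = dynamic
        self.cache = {row[key]: dict(row) for row in target_rows}

    def is_new(self, row: Row) -> bool:
        """Return True if the row's key is not in the cache."""
        key_value = row[self.key]
        if key_value in self.cache:
            return False
        if self.dynamic:
            # Dynamic cache: insert the row so later lookups see it.
            self.cache[key_value] = dict(row)
        return True


def rows_flagged_for_insert(cache: LookupCache, source_rows: List[Row]) -> List[Row]:
    return [row for row in source_rows if cache.is_new(row)]


target = [{"CUST_ID": 1}]
source = [{"CUST_ID": 1}, {"CUST_ID": 7}, {"CUST_ID": 7}]

static_inserts = rows_flagged_for_insert(LookupCache(target, "CUST_ID"), source)
dynamic_inserts = rows_flagged_for_insert(
    LookupCache(target, "CUST_ID", dynamic=True), source
)
```

With the static cache both rows for customer 7 look new, so the target would receive a duplicate; with the dynamic cache only the first one does.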
Within a session. When you configure a session, you can instruct the Informatica Server to either treat all records in the same way (for example, treat all records as inserts), or use instructions coded into the session mapping to flag records for different database operations.
Within a mapping. Within a mapping, you use the Update Strategy transformation to flag records for insert, delete, update, or reject.
Setting. Description
Insert. Treat all records as inserts. If inserting the record violates a primary or foreign key constraint in the database, the Informatica Server rejects the record.
Delete. Treat all records as deletes. For each record, if the Informatica Server finds a corresponding record in the target table (based on the primary key value), the Informatica Server deletes it. Note that the primary key constraint must exist in the target definition in the repository.
Update. Treat all records as updates. For each record, the Informatica Server looks for a matching primary key value in the target table. If it exists, the Informatica Server updates the record. Again, the primary key constraint must exist in the target definition.
Data Driven. The Informatica Server follows instructions coded into Update Strategy transformations within the session mapping to determine how to flag records for insert, delete, update, or reject. If the mapping for the session contains an Update Strategy transformation, this field is marked Data Driven by default. If you do not choose the Data Driven setting, the Informatica Server ignores all Update Strategy transformations in the mapping.
The setting you choose depends on your update strategy and the status of data in target tables:
Setting. Use to
Insert. Populate the target tables for the first time, or maintain a historical data warehouse. In the latter case, you must set this strategy for the entire data warehouse, not just a select group of target tables.
Delete. Clear target tables.
Update. Update target tables. You might choose this setting whether your data warehouse contains historical data or a snapshot. Later, when you configure how to update individual target tables, you can determine whether to insert updated records as new records or use the updated information to modify existing records in the target.
Data Driven. Exert finer control over how you flag records for insert, delete, update, or reject. Choose this setting if records destined for the same table need to be flagged on occasion for one operation (for example, update), or for a different operation (for example, reject). In addition, this setting provides the only way you can flag records for reject.
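The session settings above can be sketched as a small loader that either treats every record the same way or follows a per-record flag. The names DD_INSERT, DD_UPDATE, DD_DELETE, and DD_REJECT mirror the constants Informatica uses in Update Strategy expressions; everything else (the `load` function, the dictionary standing in for a target table, the flagging rule) is a hypothetical illustration.

```python
from typing import Callable, Dict, List, Optional

DD_INSERT, DD_UPDATE, DD_DELETE, DD_REJECT = 0, 1, 2, 3
Row = Dict[str, object]
Target = Dict[object, Row]  # target table keyed by primary key value


def load(
    target: Target,
    rows: List[Row],
    key: str,
    setting: str,
    flag_row: Optional[Callable[[Row, Target], int]] = None,
) -> List[Row]:
    """Apply rows to the target and return the rejected rows."""
    fixed = {"insert": DD_INSERT, "update": DD_UPDATE, "delete": DD_DELETE}
    rejected: List[Row] = []
    for row in rows:
        # Data driven: follow the per-record flag; otherwise treat all rows alike.
        flag = flag_row(row, target) if setting == "data_driven" else fixed[setting]
        pk = row[key]
        if flag == DD_INSERT:
            if pk in target:
                rejected.append(row)  # primary key violation
            else:
                target[pk] = dict(row)
        elif flag == DD_UPDATE:
            if pk in target:
                target[pk] = dict(row)
        elif flag == DD_DELETE:
            target.pop(pk, None)
        else:
            rejected.append(row)
    return rejected


def flag_by_rule(row: Row, target: Target) -> int:
    # Hypothetical rule: reject rows without an amount, update known keys, insert the rest.
    if row["AMOUNT"] is None:
        return DD_REJECT
    return DD_UPDATE if row["ID"] in target else DD_INSERT


dw = {1: {"ID": 1, "AMOUNT": 10}}
incoming = [
    {"ID": 1, "AMOUNT": 15},
    {"ID": 2, "AMOUNT": 20},
    {"ID": 3, "AMOUNT": None},
]
rejected = load(dw, incoming, "ID", "data_driven", flag_by_rule)

insert_only = {1: {"ID": 1, "AMOUNT": 10}}
rejected_on_insert = load(insert_only, [{"ID": 1, "AMOUNT": 15}], "ID", "insert")
```

Only the data driven path can reject a record by rule; under the Insert setting the only rejection in this sketch comes from a key violation.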