ETL FAQ


Informatica FAQs


Frequently Asked Questions


What is an ETL process?

The process of extracting data from the source systems, applying meaningful transformations, and loading it into the data warehouse is called Extraction, Transformation and Loading (ETL).

What is the need for extracting data?

With increasing competition, businesses need to obtain and analyze business data faster. A data warehouse reads data from multiple data sources, which are generally operational databases.

What is the need for transforming data?

Simply copying online transaction processing (OLTP) data into a database designed for a Decision Support System is not enough to achieve information sharing. The data needs to be converted into more descriptive, often summarized and aggregated, information and loaded into a data architecture meant to facilitate the sharing of information across the enterprise.

What is the need for loading data?

We need to load the data warehouse so that it serves its purpose of supporting business analysis. New data in the data warehouse must be made available to end users so that they can generate reports for making business decisions.

What are the techniques of data extraction?

The following techniques are used for data extraction: extracting data from an operational system and placing it into a file, and extracting data from an operational system and transporting it directly into the target database (a data warehouse or a staging database).
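The three stages described above, using the first extraction technique (operational data placed into a file before loading), can be sketched in a few lines. This is a minimal illustration only: it assumes an in-memory SQLite database as a stand-in for both the operational source and the warehouse, an in-memory CSV as the flat file, and hypothetical table names (orders, sales_by_region).

```python
import csv
import io
import sqlite3

# Hypothetical operational (OLTP) source holding a few order rows.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (order_id INTEGER, region TEXT, amount REAL)")
source.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                   [(1, "east", 10.0), (2, "west", 5.5), (3, "east", 4.5)])


def extract_to_file(conn):
    """Extract: copy operational rows into a flat file (an in-memory CSV here)."""
    flat_file = io.StringIO()
    csv.writer(flat_file).writerows(conn.execute("SELECT order_id, region, amount FROM orders"))
    flat_file.seek(0)
    return flat_file


def transform(flat_file):
    """Transform: normalize region codes to upper case and aggregate amounts per region."""
    totals = {}
    for _order_id, region, amount in csv.reader(flat_file):
        key = region.upper()
        totals[key] = totals.get(key, 0.0) + float(amount)
    return sorted(totals.items())


def load(warehouse, rows):
    """Load: write the summarized rows into a warehouse table."""
    warehouse.execute("CREATE TABLE IF NOT EXISTS sales_by_region (region TEXT, total REAL)")
    warehouse.executemany("INSERT INTO sales_by_region VALUES (?, ?)", rows)
    warehouse.commit()


warehouse = sqlite3.connect(":memory:")
load(warehouse, transform(extract_to_file(source)))
print(warehouse.execute("SELECT region, total FROM sales_by_region").fetchall())
```

Extracting directly into the target (the second technique) would simply skip the intermediate flat file and pass the cursor rows straight to the transform step.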

What are the selection criteria for data transformation?

The choice of data transformation is based on the following criteria: separation/concatenation, normalization/denormalization, aggregation, conversion, algorithmic conversion, conversion by lookup, and enrichment.
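Several of the transformation types listed above (separation, concatenation, conversion, and conversion by lookup) can be illustrated on a single record. This is a hypothetical sketch: the field names, the date format, and the STATE_NAMES lookup table are assumptions made for the example.

```python
from datetime import datetime

# Hypothetical lookup table used for "conversion by lookup".
STATE_NAMES = {"NY": "New York", "CA": "California"}


def transform_customer(raw):
    """Apply a few of the listed transformation types to one source record."""
    first, last = raw["full_name"].split(" ", 1)                    # separation
    display = f"{last}, {first}"                                    # concatenation
    joined = datetime.strptime(raw["joined"], "%m/%d/%Y").date()    # conversion (text to date)
    state = STATE_NAMES.get(raw["state"], "Unknown")                # conversion by lookup
    return {"first_name": first, "last_name": last, "display_name": display,
            "joined": joined.isoformat(), "state_name": state}


print(transform_customer({"full_name": "Ada Lovelace", "joined": "12/10/2001", "state": "NY"}))
```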

What are the different types of tables in a data warehouse?

The different types of tables in a data warehouse are dimension tables, fact tables, and aggregate tables.


What is the sequence of loading in a data warehouse?

Dimension tables should be loaded first, because all fact data depends on them. Fact tables are loaded next, because the aggregations depend on the fact data. Aggregate tables are loaded after the fact tables.
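The reason for this ordering can be shown with a foreign-key constraint. The sketch below is a minimal illustration, assuming SQLite (with foreign-key enforcement switched on) and hypothetical dim_product, fact_sales, and agg_sales_by_product tables: a fact row that arrives before its dimension row is rejected, and the aggregate is only correct once the facts are in place.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # make SQLite enforce fact -> dimension references
conn.executescript("""
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE fact_sales (product_key INTEGER REFERENCES dim_product(product_key), qty INTEGER);
CREATE TABLE agg_sales_by_product (product_key INTEGER, total_qty INTEGER);
""")


def load_fact(rows):
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?)", rows)


# Loading a fact before its dimension row exists fails the foreign-key check.
try:
    load_fact([(1, 5)])
except sqlite3.IntegrityError as exc:
    print("fact load rejected:", exc)

# 1. Dimensions first.
conn.executemany("INSERT INTO dim_product VALUES (?, ?)", [(1, "widget"), (2, "gadget")])
# 2. Facts next.
load_fact([(1, 5), (1, 3), (2, 7)])
# 3. Aggregates last, computed from the loaded facts.
conn.execute("""INSERT INTO agg_sales_by_product
                SELECT product_key, SUM(qty) FROM fact_sales GROUP BY product_key""")
print(conn.execute("SELECT * FROM agg_sales_by_product ORDER BY product_key").fetchall())
```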

What is the ETL toolkit provided in Oracle 9i?

Oracle 9i introduces several new features that change the way data is loaded and transformed. In Oracle 9i, the venerable SQL*Loader utility is joined by powerful new features for extraction, transformation and loading. The ETL features of interest in Oracle 9i are external tables, table functions, and the MERGE statement.

Does Oracle 9i support pipelining in ETL operations?

Yes. Oracle 9i data loads can include multi-table inserts and upsert semantics. Table functions can also be declared PIPELINED, so that rows are passed to the next step as they are produced instead of after the whole collection is built. Oracle 9i supports external tables, which quickly load data into the database, and a new change data capture facility allows incremental changes from source systems to be captured and applied to the data warehouse automatically.

What are external tables? How can flat file data be fetched from the database without loading it?

Oracle 9i allows read-only access to data in external tables. External tables are tables that do not reside in the database, and they can be in any format for which an access driver is provided. The CREATE TABLE ... ORGANIZATION EXTERNAL statement specifies the metadata describing an external table.

What are table functions in Oracle 9i?

A table function takes a set of rows as input and returns a collection of records. Before creating the table function, some types need to be created. Because table functions always return a collection of records, you begin by creating a table type that corresponds to the definition of the ultimate destination table.

What are MERGE statements in Oracle 9i?

The MERGE statement solves the long-standing problem of reloading data that was loaded previously. Prior to 9i, procedural code had to be written to detect whether a row existed and, based on that, issue an UPDATE or an INSERT. In Oracle 9i, you simply use the MERGE statement and let the database handle these details.

What are the advantages of implementing ETL by writing a custom program?

The advantages are that the program is not very complex and that programmers are easily available.
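The difference between the pre-9i procedural approach and a single MERGE-style statement can be sketched as follows. This is not Oracle syntax: as an assumed stand-in for MERGE, it uses SQLite's INSERT ... ON CONFLICT DO UPDATE (available in SQLite 3.24 and later), and the customer_dim table and incoming rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer_dim (cust_id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.execute("INSERT INTO customer_dim VALUES (1, 'Ann', 'Boston')")

incoming = [(1, "Ann", "Chicago"), (2, "Bob", "Denver")]  # hypothetical reload batch


def procedural_upsert(rows):
    """Pre-MERGE style: check for each row, then issue an UPDATE or an INSERT."""
    for cust_id, name, city in rows:
        exists = conn.execute("SELECT 1 FROM customer_dim WHERE cust_id = ?", (cust_id,)).fetchone()
        if exists:
            conn.execute("UPDATE customer_dim SET name = ?, city = ? WHERE cust_id = ?",
                         (name, city, cust_id))
        else:
            conn.execute("INSERT INTO customer_dim VALUES (?, ?, ?)", (cust_id, name, city))


def single_statement_upsert(rows):
    """MERGE-like: one statement per row; the database decides between insert and update."""
    conn.executemany(
        """INSERT INTO customer_dim (cust_id, name, city) VALUES (?, ?, ?)
           ON CONFLICT(cust_id) DO UPDATE SET name = excluded.name, city = excluded.city""",
        rows)


single_statement_upsert(incoming)
print(conn.execute("SELECT * FROM customer_dim ORDER BY cust_id").fetchall())
```

Both functions leave the table in the same state; the point of MERGE is that the existence check moves out of application code and into the database.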


What are the disadvantages of implementing ETL by writing a custom program?

The functions to be supported by individual data marts cannot be predicted in advance. In a typical data mart, most of the functionality is defined by end users after the data mart moves into production, so the ETL programs have to be modified continually, which causes a lot of rework. Metadata is not generated automatically in this process, so it is difficult to integrate data marts across the organization. Hand-coded ETL programs are likely to execute more slowly and are typically single-threaded.

What are the commonly used techniques for implementing data transformation?

The commonly used techniques are transformation flow, transformations provided by SQL*Loader, transformations using SQL and PL/SQL, data substitution, key lookups, and pivoting.
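Two of these techniques, key lookups and pivoting, are easy to show on in-memory rows. This is a hypothetical sketch: PRODUCT_KEYS, the SKU codes, and the quarter columns are invented for illustration, and substituting -1 for an unknown key is one common convention, not a rule from the source.

```python
# Hypothetical mapping of source (natural) keys to warehouse surrogate keys.
PRODUCT_KEYS = {"SKU-A": 101, "SKU-B": 102}


def key_lookup(rows):
    """Key lookup: replace each natural key with its surrogate key (-1 if unknown)."""
    return [(PRODUCT_KEYS.get(sku, -1), quarter, qty) for sku, quarter, qty in rows]


def pivot(rows, quarters=("Q1", "Q2", "Q3", "Q4")):
    """Pivoting: one row per (product, quarter) becomes one row per product with a column per quarter."""
    out = {}
    for key, quarter, qty in rows:
        out.setdefault(key, dict.fromkeys(quarters, 0))[quarter] += qty
    return out


source = [("SKU-A", "Q1", 5), ("SKU-A", "Q3", 2), ("SKU-B", "Q1", 7), ("SKU-C", "Q2", 1)]
print(pivot(key_lookup(source)))
```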

What are the criteria for selecting an extraction, transformation and loading tool?

ETL tool selection is the most important decision to be made when choosing the components of a data warehousing application. The ETL tool operates at the heart of the data warehouse, extracting data from multiple data sources, transforming the data to make it accessible for business analysis, and loading multiple target databases. There are two options for implementing an extraction, transformation, and loading process: write a custom program in COBOL, C, or PL/SQL to extract data from multiple source files, transform the data, and load the target databases; or purchase an off-the-shelf extraction/transformation/loading (ETL) tool.


Simple to complex questions in Informatica and data warehousing:

What are the broad components of Informatica (Designer, Workflow Manager, Workflow Monitor, Server Manager)?
What are the various transformations?
How many kinds of ports are there? What are variable ports?
How is a Joiner different from a Source Qualifier?
What is a Normalizer?
How do you define external variables in a mapping ($$ variables)? In version 6, can $$ variables be used in a lookup?
How can you generate sequence numbers through Informatica?
If you are familiar with version 6, can you describe its upgrade features?
Usage of target override: how do I affect multiple rows in a target?
What is a lookup? What is a lookup override, and why do we use it?
Why do we use a lookup cache? How is it different if we don't use it?
What are connected and unconnected lookups, and under which circumstances do we use each? What is the syntax for an unconnected lookup reference?
What is a Source Qualifier override?
What are the commands in an Update Strategy? (DD_INSERT, DD_UPDATE, DD_DELETE, and DD_REJECT)
What are pre-SQL and post-SQL queries in a Source Qualifier?
When do we use a Router transformation?
What is a star schema? What are fact and dimension tables in a data warehouse?
How is a data warehouse different from OLTP?
How can you make a session appear successful in spite of a task within it failing?
What will happen if you increase the commit interval in a session?
In version 6, if I want Informatica to behave as per the instructions in an Update Strategy, what needs to be set in the session properties? (Answer: treat source rows as Data Driven.)
What is the difference between active and passive transformations?
Questions on reusable components, mapplets, etc.
What are IPF files?
What is a target load plan? (Or a similar question stressing this.)

Other conceptual questions


Differences between Active Transformation and Passive Transformation
Active Transformation

An active transformation can change the number of rows that pass through it, such as a Filter transformation that removes rows that do not meet the filter condition. Active transformations include Advanced External Procedure, Aggregator, Application Source Qualifier, Filter, Joiner, Normalizer, Rank, Router, Sorter, Source Qualifier, and Update Strategy.

Passive Transformation

A passive transformation does not change the number of rows that pass through it, such as an Expression transformation that performs a calculation on data and passes all rows through the transformation. Passive transformations include Expression, External Procedure, Input, Lookup, Output, Sequence Generator, Stored Procedure, and XML Source Qualifier.
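The row-count distinction between active and passive transformations can be demonstrated with two plain functions. This is only an analogy in Python, not Informatica code: filter_transformation and expression_transformation are hypothetical names that mimic a Filter and an Expression transformation.

```python
def filter_transformation(rows, condition):
    """Active: may drop rows, so the output row count can differ from the input."""
    return [row for row in rows if condition(row)]


def expression_transformation(rows):
    """Passive: computes a new port for every row; the row count never changes."""
    return [{**row, "total": row["qty"] * row["price"]} for row in rows]


rows = [{"qty": 2, "price": 3.0}, {"qty": 0, "price": 9.0}, {"qty": 5, "price": 1.0}]
passive_out = expression_transformation(rows)
active_out = filter_transformation(passive_out, lambda r: r["qty"] > 0)
print(len(rows), len(passive_out), len(active_out))  # 3 3 2
```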


Differences between Parameter and Variable


Parameter

A mapping parameter represents a constant value that you can define before running a session. A mapping parameter retains the same value throughout the entire session. When you use a mapping parameter, you declare and use the parameter in a mapping or mapplet, and then define the value of the parameter in a parameter file. During the session, the Informatica Server evaluates all references to the parameter to that value.

Variable

A mapping variable represents a value that can change through the session. The Informatica Server saves the value of a mapping variable to the repository at the end of each successful session run and uses that value the next time you run the session. When you use a mapping variable, you declare the variable in the mapping or mapplet, and then use a variable function in the mapping to change the value of the variable automatically. At the beginning of a session, the Informatica Server evaluates references to a variable to its start value. At the end of a successful session, the Informatica Server saves the final value of the variable to the repository. The next time you run the session, the Informatica Server evaluates references to the variable to the saved value. You can override the saved value by defining the start value of the variable in a parameter file.

Use mapping variables to perform automatic incremental reads of a source. For example, suppose customer accounts are numbered from 001 to 065, incremented by one. Instead of creating a mapping parameter, you can create a mapping variable with an initial value of 001. In the mapping, use a variable function to increase the variable value by one. The first time the Informatica Server runs the session, it extracts the records for customer account 001. At the end of the session, it increments the variable by one and saves that value to the repository. The next time the Informatica Server runs the session, it automatically extracts the records for the next customer account, 002.

To reuse the same mapping to extract records for other customer accounts, you can enter a new value for the parameter in the parameter file and run the session. Or you can create a parameter file for each customer account and start the session with a different parameter file each time using pmcmd. By using a parameter file, you reduce the overhead of creating multiple mappings and sessions to extract transaction records for different customer accounts.
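The incremental-read behavior of a mapping variable (start value, increment, save after a successful run, optional override from a parameter file) can be emulated in a short sketch. This is a hypothetical model, not Informatica code: the repository is a plain dict, "$$AccountId" is an invented variable name, and the source rows are made up.

```python
# Hypothetical source rows: (account_id, amount).
ACCOUNT_ROWS = [("001", 100), ("001", 50), ("002", 75), ("003", 20)]


def run_session(repository, rows, start_value=None):
    """Extract one account per run, driven by a persisted mapping variable.

    `repository` is a plain dict standing in for the Informatica repository.
    A start value (as from a parameter file) overrides the saved value;
    otherwise the initial value is 1 (account 001).
    """
    current = start_value if start_value is not None else repository.get("$$AccountId", 1)
    account = f"{current:03d}"
    extracted = [row for row in rows if row[0] == account]
    # Like a variable function increasing the value by one; the new value is saved
    # only after the extraction above finished without raising ("save on success").
    repository["$$AccountId"] = current + 1
    return extracted


repo = {}
print(run_session(repo, ACCOUNT_ROWS))  # first run: account 001
print(run_session(repo, ACCOUNT_ROWS))  # second run: account 002
```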

Differences Between Active Mapplets and Passive Mapplets


A mapplet can be active or passive depending on the transformations in the mapplet. Active mapplets contain one or more active transformations. Passive mapplets contain only passive transformations. As with an active transformation, you cannot concatenate data from an active mapplet with a different pipeline.

Differences Between Standard Validation and Extended Validation


Standard Validation

Use standard validation to validate task instances and expressions in the workflow without validating nested worklets and worklet objects. When you use standard validation, the Workflow Manager does not validate reusable worklet objects used in the workflow. The Workflow Manager validates nonreusable worklet objects and reusable session instances if you have viewed or edited the session or worklet. The Workflow Manager validates the worklet object using the same validation rules as for workflows, and it validates the worklet instance by verifying attributes in the Parameter tab of the worklet instance. When you use standard validation, the Workflow Manager does not validate nested worklets or nonreusable worklets and sessions you have not edited.

Extended Validation

Use extended validation to validate reusable worklet instances, worklet objects, and all other nested worklets in the workflow. Extended validation validates all task instances (including sessions) and worklets, regardless of whether you have edited them. To use extended validation, choose Workflow-Extended Validate. If the workflow contains nested worklets, you can select a worklet to validate the worklet and all other worklets nested under it: right-click the worklet and choose Extended Validate.

How to Increase Performance by Improving Network Speed?


The performance of the Informatica Server is related to network connections. A local disk can move data five to twenty times faster than a network. Consider the following options to minimize network activity and improve Informatica Server performance.

If you use a flat file as a source or target in your session, you can move the files onto the Informatica Server system to improve performance. When you store flat files on a machine other than the Informatica Server, session performance becomes dependent on the performance of your network connections. Moving the files onto the Informatica Server system and adding disk space might improve performance.

If you use relational source or target databases, try to minimize the number of network hops between the source and target databases and the Informatica Server. Moving the target database onto a server system might improve Informatica Server performance.

When you run sessions that contain multiple partitions, have your network administrator analyze the network and make sure it has enough bandwidth to handle the data moving across the network from all partitions.

You can run multiple PowerCenter Servers on separate systems against the same repository. Distributing the session load to separate PowerCenter Server systems increases performance.

When all character data processed by the Informatica Server is 7-bit ASCII or EBCDIC, configure the Informatica Server to run in ASCII data movement mode. In ASCII mode, the Informatica Server uses one byte to store each character. In Unicode mode, it uses two bytes for each character, which can slow session performance.

Configure your system to use additional CPUs to improve performance. Additional CPUs allow the system to run multiple sessions in parallel as well as multiple pipeline partitions in parallel. However, additional CPUs might cause disk bottlenecks. To prevent disk bottlenecks, minimize the number of processes accessing the disk. Processes that access the disk include database functions and operating system functions. Parallel sessions or pipeline partitions also require disk access.

You might want to increase system memory in the following circumstances:
1. You run a session that uses large cached lookups.
2. You run a session with many partitions.

If you cannot free up memory, you might want to add memory to the system.
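The one-byte versus two-byte difference between ASCII and Unicode data movement modes can be checked with a quick calculation. This sketch assumes UTF-16 (little-endian, no byte-order mark) as a stand-in for a two-byte-per-character Unicode representation; it does not reproduce the Informatica Server's internal buffers.

```python
text = "CUSTOMER_NAME"  # 13 seven-bit ASCII characters

ascii_bytes = len(text.encode("ascii"))      # one byte per character
ucs2_bytes = len(text.encode("utf-16-le"))   # two bytes per character (no BOM)
print(ascii_bytes, ucs2_bytes)  # 13 26
```

For purely 7-bit data, the two-byte representation doubles the bytes moved for every character column, which is where the slowdown comes from.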

If the source is a Flat File, how can I improve performance?


If you use flat file as a source or target in your session, you can move the files onto the Informatica Server system to improve performance. When you store flat files on a machine other than the Informatica Server, session performance becomes dependent on the performance of your network connections. Moving the files onto the Informatica Server system and adding disk space might improve performance.

Can we create a Lookup based on more than one table?

You can import a lookup table from the mapping source or target database, or from any database that both the Informatica Server and Client machine can connect to. If your mapping includes multiple sources or targets, you can use any of the mapping sources or mapping targets as the lookup table. The lookup table can be a single table, or you can join multiple tables in the same database using a lookup SQL override. The Informatica Server queries the lookup table, or an in-memory cache of the table, for all rows that enter the Lookup transformation. Connect to the database to import the lookup table definition. The Informatica Server can connect to a lookup table using a native database driver or an ODBC driver; however, native database drivers improve session performance.
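A lookup built on a join of two tables, cached in memory, behaves roughly like the sketch below. It is a hypothetical illustration: SQLite stands in for the lookup database, LOOKUP_OVERRIDE plays the role of a lookup SQL override, and the customers/regions tables are invented.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE customers (cust_id INTEGER PRIMARY KEY, name TEXT, region_id INTEGER);
CREATE TABLE regions (region_id INTEGER PRIMARY KEY, region_name TEXT);
INSERT INTO customers VALUES (1, 'Ann', 10), (2, 'Bob', 20);
INSERT INTO regions VALUES (10, 'East'), (20, 'West');
""")

# Stand-in for a lookup SQL override joining two tables in the same database.
LOOKUP_OVERRIDE = """SELECT c.cust_id, c.name, r.region_name
                     FROM customers c JOIN regions r ON r.region_id = c.region_id"""

# Build the in-memory lookup cache once, keyed on the lookup condition column (cust_id).
cache = {cust_id: (name, region) for cust_id, name, region in db.execute(LOOKUP_OVERRIDE)}

incoming = [1, 2, 3]  # hypothetical cust_id values arriving at the Lookup
print([cache.get(cust_id) for cust_id in incoming])  # unmatched rows return None (like NULL)
```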


Which transformations and objects cannot be used inside a mapplet?


You cannot include the following objects in a mapplet: Normalizer transformations, COBOL sources, XML Source Qualifier transformations, XML sources, target definitions, and other mapplets.

Which transformation cannot be made reusable?


Sequence Generator transformations must be reusable in mapplets. You cannot demote reusable Sequence Generator transformations to standard in a mapplet.

What is Forwarding Rejected Rows?


You can configure the Update Strategy transformation to either pass rejected rows to the next transformation or drop them. By default, the Informatica Server forwards rejected rows to the next transformation. The Informatica Server flags the rows for reject and writes them to the session reject file. If you do not select Forward Rejected Rows, the Informatica Server drops rejected rows and writes them to the session log file.
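The effect of the Forward Rejected Rows option can be modeled with the Update Strategy constants. The numeric values (DD_INSERT 0, DD_UPDATE 1, DD_DELETE 2, DD_REJECT 3) are Informatica's documented equivalents; everything else here, including the flag_row rule and the split into "forwarded" and "dropped" lists, is a hypothetical emulation rather than server behavior.

```python
# Update Strategy constants and their documented numeric equivalents.
DD_INSERT, DD_UPDATE, DD_DELETE, DD_REJECT = 0, 1, 2, 3


def update_strategy(rows, flag_row, forward_rejected_rows=True):
    """Hypothetical emulation of the Forward Rejected Rows option.

    Returns (forwarded, dropped): forwarded rows keep their flag and move on to the
    next transformation; dropped rows are rejects discarded when the option is off.
    """
    forwarded, dropped = [], []
    for row in rows:
        flag = flag_row(row)
        if flag == DD_REJECT and not forward_rejected_rows:
            dropped.append(row)  # the server would note these in the session log
        else:
            forwarded.append((flag, row))
    return forwarded, dropped


def reject_file(forwarded):
    """Rows that reach the target still flagged DD_REJECT end up in the reject file."""
    return [row for flag, row in forwarded if flag == DD_REJECT]


def flag_row(row):
    """Hypothetical rule: reject rows without a key, update known keys, insert the rest."""
    if row["id"] is None:
        return DD_REJECT
    return DD_UPDATE if row["id"] in {1, 2} else DD_INSERT


rows = [{"id": 1}, {"id": None}, {"id": 7}]
forwarded, dropped = update_strategy(rows, flag_row)
print(reject_file(forwarded), dropped)  # [{'id': None}] []
```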

What is a Business Component?


Business components allow you to organize, group, and display sources and mapplets in a single location in your repository folder. For example, you can create groups of source tables that you call Purchase Orders and Payment Vouchers. You can then organize the appropriate source definitions into logical groups and add descriptive names for them. Business components let you access data from all operational systems within your organization through source and mapplet groupings representing business entities. You can think of business components as tools that let you view your sources and mapplets in a meaningful way using hierarchies and directories. You create business components in the Designer. The Designer creates a business component when you drag any source or mapplet into any directory of the business component tree. You can use the same source or mapplet multiple times in the business component tree. Because business components are references to other objects, you can edit an object from its original location or from the business components directory.

What are the various Date functions?


ADD_TO_DATE (date, format, amount)
Adds a specified amount to one part of a date/time value and returns a date in the same format as the date you pass to the function. ADD_TO_DATE accepts positive and negative integer values.
Return Value: Date in the same format as the date you pass to this function. NULL if a null value is passed as an argument to the function.

DATE_COMPARE (date1, date2)
Returns an integer indicating which of two dates is earlier. Note that DATE_COMPARE returns an integer value rather than a date value.
Return Value: -1 if the first date is earlier. 0 if the two dates are equal. 1 if the second date is earlier. NULL if one of the date values is NULL.

DATE_DIFF (date1, date2, format)
Returns the length of time, measured in the increment you specify (years, months, days, hours, minutes, or seconds), between two dates. The Informatica Server subtracts the second date from the first date and returns the difference.
Return Value: Double. If date1 is later than date2, the return value is a positive number. If date1 is earlier than date2, the return value is a negative number. Zero if the dates are the same. NULL if one (or both) of the date values is NULL.

GET_DATE_PART (date, format)
Returns the specified part of a date as an integer value. For example, if you create an expression that returns the month portion of the date and pass a date such as Apr 1 1997 00:00:00, GET_DATE_PART returns 4.
Return Value: Integer representing the specified part of the date. NULL if a value passed to the function is NULL.

LAST_DAY (date)
Returns the date of the last day of the month for each date in a port.
Return Value: Date. The last day of the month for the date value you pass to this function. NULL if a value in the selected port is NULL.
Nulls: If a value is NULL, LAST_DAY ignores the row. However, if all values passed from the port are NULL, LAST_DAY returns NULL.
Group By: LAST_DAY groups values based on group by ports you define in the transformation, returning one result for each group. If there is no group by port, LAST_DAY treats all rows as one group, returning one value.

MAX (date [, filter_condition])
Returns the latest date found within a port or group. You can apply a filter to limit the rows in the search. You can nest only one other aggregate function within MAX. MAX is one of several aggregate functions, which you use in Aggregator transformations only. You can also use MAX to return the largest numeric value in a port or group.
Return Value: Date. NULL if all values passed to the function are NULL, or if no rows are selected (for example, the filter condition evaluates to FALSE or NULL for all rows).

MIN (date [, filter_condition])
Returns the oldest date found in a port or group. You can apply a filter to limit the rows in the search. You can nest only one other aggregate function within MIN, and the nested function must return a date datatype. MIN is one of several aggregate functions, which you use in Aggregator transformations only. You can also use MIN to return the minimum numeric value in a port or group.
Return Value: Date if the value argument is a date. NULL if all values passed to the function are NULL, or if no rows are selected (for example, the filter condition evaluates to FALSE or NULL for all rows).
Nulls: If a single value is NULL, MIN ignores it. However, if all values passed from the port are NULL, MIN returns NULL.
Group By: MIN groups values based on group by ports you define in the transformation, returning one result for each group. If there is no group by port, MIN treats all rows as one group, returning one value.

ROUND (date [, format])
Rounds one part of a date. You can also use ROUND to round numbers.
Return Value: Date with the specified part rounded. ROUND returns a date in the same format as the source date. You can link the results of this function to any port with a Date/Time datatype. NULL if you pass a null value to the function.

SET_DATE_PART (date, format, value)
Sets one part of a date/time value to a value you specify.
Return Value: Date in the same format as the source date with the specified part changed. NULL if a value passed to the function is NULL.

TRUNC (date [, format])
Truncates a date to a specific year, month, day, hour, or minute. You can also use TRUNC to truncate numbers.
Return Value: Date. NULL if a value passed to the function is NULL.
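The return-value rules of a few of these functions can be cross-checked against Python's datetime module. These are hypothetical Python equivalents for single values, written for illustration only; they are not Informatica's implementation, they cover just the day ('DD') and month ('MM') formats, and keeping the time of day in last_day is an assumption of this sketch.

```python
import calendar
from datetime import datetime, timedelta


def date_compare(date1, date2):
    """Like DATE_COMPARE: -1 if date1 is earlier, 0 if equal, 1 if date2 is earlier, None for NULL."""
    if date1 is None or date2 is None:
        return None
    return (date1 > date2) - (date1 < date2)


def date_diff_days(date1, date2):
    """Like DATE_DIFF(date1, date2, 'DD'): date1 minus date2 in (fractional) days."""
    if date1 is None or date2 is None:
        return None
    return (date1 - date2) / timedelta(days=1)


def add_to_date_days(date, amount):
    """Like ADD_TO_DATE(date, 'DD', amount); a negative amount moves the date back."""
    return None if date is None else date + timedelta(days=amount)


def get_month(date):
    """Like GET_DATE_PART(date, 'MM')."""
    return None if date is None else date.month


def last_day(date):
    """Like LAST_DAY for one value: the month's last day, same time of day (assumed)."""
    if date is None:
        return None
    return date.replace(day=calendar.monthrange(date.year, date.month)[1])


print(get_month(datetime(1997, 4, 1)))  # 4, as in the GET_DATE_PART example
```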


What are the types of Workflow variables?


There are two types of workflow variables: pre-defined workflow variables and user-defined workflow variables.

The Workflow Manager creates pre-defined workflow variables each time you create a new task. You create user-defined workflow variables when you create a workflow. You can use workflow variables when you configure the following types of tasks:

Assignment tasks. You can use an Assignment task to assign a value to a user-defined workflow variable. For example, you can increment a user-defined counter variable by setting the variable to its current value plus 1.

Decision tasks. Decision tasks determine how the Informatica Server executes a workflow. For example, you can use the Status variable to run a second session only if the first session completes successfully.

Links. Links connect each workflow task. You can use workflow variables in links to create branches in the workflow. For example, after a Decision task, you can create one link to follow when the decision condition evaluates to true, and another link to follow when it evaluates to false. The Workflow Manager does not allow you to use links to create loops in the workflow, and each link in the workflow can execute only once.

Timer tasks. Timer tasks specify when the Informatica Server begins to execute the next task in the workflow. You can use a user-defined date/time variable to specify the exact time the Informatica Server starts to execute the next task.

Pre-Defined Workflow Variables

The Workflow Manager creates a set of pre-defined variables for every workflow. There are two types:

Task-specific variables. The Workflow Manager creates a set of task-specific variables for each task in the workflow. You can use task-specific variables to represent information such as the time a task ended, the number of rows written to a target in a session, or the result of a Decision task. The Workflow Manager lists task-specific variables under the task name in the Expression Editor.

System variables. You can use the SYSDATE and WORKFLOWSTARTTIME system variables within a workflow. The Workflow Manager lists system variables under the Built-in node in the Expression Editor.

You can use pre-defined variables within a workflow. You cannot modify or delete pre-defined workflow variables. The task-specific workflow variables are listed below, each with its datatype:

Condition (Integer): Evaluation result of the decision condition expression. If the task fails, the Workflow Manager keeps the condition set to null.
EndTime (Date/Time): Date and time the associated task ended.
ErrorCode (Integer): Last error code for the associated task. If there is no error, the Informatica Server sets ErrorCode to 0 when the task completes.
ErrorMsg (Nstring): Last error message for the associated task. If there is no error, the Informatica Server sets ErrorMsg to an empty string when the task completes.
FirstErrorCode (Integer): Error code for the first error message in the session. If there is no error, the Informatica Server sets FirstErrorCode to 0 when the session completes.
FirstErrorMsg (Nstring): The first error message in the session. If there is no error, the Informatica Server sets FirstErrorMsg to an empty string when the task completes.
PrevTaskStatus (Integer): Status of the task that the Workflow Manager executes immediately before the current task. If the previous task succeeded, the Workflow Manager sets PrevTaskStatus to SUCCEEDED. Otherwise, it sets PrevTaskStatus to FAILED.
SrcFailedRows (Integer): Total number of rows read from the sources that failed.
SrcSuccessRows (Integer): Total number of rows successfully read from the sources.
StartTime (Date/Time): Date and time the associated task started.
Status (Integer): Execution status. Task statuses include ABORTED, DISABLED, FAILED, NOTSTARTED, STARTED, STOPPED, and SUCCEEDED.
TgtFailedRows (Integer): Total number of rows that the targets rejected.
TgtSuccessRows (Integer): Total number of rows successfully written to the targets.
TotalTransErrors (Integer): Total number of transformation errors.

Note: An Nstring can have a maximum length of 600 characters.

Decision
The Decision task allows you to enter a condition that determines the execution of the workflow, similar to a link condition. The Decision task has a pre-defined variable called $Decision_task_name.condition that represents the result of the decision condition. The Informatica Server evaluates the condition in the Decision task and sets the pre-defined condition variable to True (1) or False (0). You can specify one decision condition per Decision task. After the Informatica Server evaluates the Decision task, you can use the pre-defined condition variable in other expressions in the workflow to help you develop the workflow. Depending on the workflow, you might use link conditions instead of a Decision task. If you do not specify a condition in the Decision task, the Informatica Server evaluates the Decision task to true.
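The Decision task rules above (a 1/0 result, true when no condition is given, and two outgoing links for the true and false branches) can be modeled in a few lines. This is a hypothetical Python model, not workflow syntax: the session name s_load_detail and the task names s_load_summary and cmd_notify_failure are invented, and the lambda stands in for a condition such as $s_load_detail.Status = SUCCEEDED.

```python
# Hypothetical task-specific variables after a session named s_load_detail has run.
task_vars = {"s_load_detail.Status": "SUCCEEDED"}


def evaluate_decision(condition, variables):
    """Decision task: 1 (true) or 0 (false); with no condition it evaluates to true."""
    if condition is None:
        return 1
    return 1 if condition(variables) else 0


def follow_links(decision_value):
    """Two outgoing links: one taken when the condition is true, one when it is false."""
    return "s_load_summary" if decision_value == 1 else "cmd_notify_failure"


# Stand-in for the condition: $s_load_detail.Status = SUCCEEDED
decision = evaluate_decision(lambda v: v["s_load_detail.Status"] == "SUCCEEDED", task_vars)
print(decision, follow_links(decision))  # 1 s_load_summary
```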

Assignment
The Assignment task allows you to assign a value to a user-defined workflow variable. To use an Assignment task in the workflow, first create and add the Assignment task to the workflow. Then configure the Assignment task to assign values or expressions to user-defined variables. After you assign a value to a variable using the Assignment task, the Informatica Server uses the assigned value for the variable during the remainder of the workflow. You must create a variable before you can assign values to it. You cannot assign values to pre-defined workflow variables.

Timer
The Timer task allows you to specify the period of time to wait before the Informatica Server executes the next task in the workflow. You can choose to start the next task in the workflow at an exact time and date. You can also choose to wait a period of time after the start time of another task, workflow, or worklet before starting the next task. The Timer task has two types of settings:


Absolute time. You specify the exact time that the Informatica Server starts executing the next task in the workflow. You may specify the exact date and time, or you can choose a user-defined workflow variable to specify the exact time. Relative time. You instruct the Informatica Server to wait for a specified period of time after the Timer task, the parent workflow, or the top-level workflow starts. For example, you may have two sessions in the workflow, and you want the Informatica Server to wait ten minutes after the first session completes before it executes the second session. Use a Timer task after the first session. In the Relative Time setting of the Timer task, specify ten minutes from the start time of the Timer task.

Control
You can use the Control task to stop, abort, or fail the top-level workflow or the parent workflow based on an input link condition. A parent workflow or worklet is the workflow or worklet that contains the Control task. The Control options are:

Fail Me. Marks the Control task as Failed. The Informatica Server fails the Control task if you choose this option. If you choose Fail Me in the Properties tab and choose Fail Parent If This Task Fails in the General tab, the Informatica Server fails the parent workflow.

Fail Parent. Marks the status of the workflow or worklet that contains the Control task as Failed after the workflow or worklet completes.

Stop Parent. Stops the workflow or worklet that contains the Control task.

Abort Parent. Aborts the workflow or worklet that contains the Control task.

Fail Top-Level Workflow. Fails the workflow that is running.

Stop Top-Level Workflow. Stops the workflow that is running.

Abort Top-Level Workflow. Aborts the workflow that is running.

You can define events in the workflow to specify the sequence of task execution. The event is triggered based on the completion of a sequence of tasks. Use the following tasks to work with events in the workflow: the Event-Raise task and the Event-Wait task. To coordinate the execution of the workflow, you may specify the following types of events for the Event-Wait and Event-Raise tasks:


Pre-defined event. A pre-defined event is a file-watch event. For pre-defined events, use an Event-Wait task to instruct the Informatica Server to wait for the specified indicator file to appear before continuing with the rest of the workflow. When the Informatica Server locates the indicator file, it starts the next task in the workflow.

User-defined event. A user-defined event is a sequence of tasks in the workflow. Use an Event-Raise task to specify the location of the user-defined event in the workflow. A user-defined event is a sequence of tasks in the branch from the Start task leading to the Event-Raise task. When all the tasks in the branch from the Start task to the Event-Raise task complete, the Event-Raise task triggers the event. The Event-Wait task waits for the Event-Raise task to trigger the event before continuing with the rest of the tasks in its branch.
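The handshake between an Event-Raise task and an Event-Wait task resembles a one-shot signal between two concurrent branches. The sketch below models it with Python's `threading.Event`; the session names (`s_load_customers` and so on) are hypothetical, and the Informatica Server does not expose any such API.

```python
import threading

# Hypothetical model: threading.Event stands in for a declared user-defined event.
orders_loaded = threading.Event()
log = []
log_lock = threading.Lock()


def record(message: str) -> None:
    with log_lock:
        log.append(message)


def raising_branch() -> None:
    # Tasks on the branch from the Start task to the Event-Raise task.
    record("s_load_customers finished")
    record("s_load_orders finished")
    orders_loaded.set()  # Event-Raise: every upstream task is done, trigger the event


def waiting_branch() -> None:
    orders_loaded.wait()  # Event-Wait: block until the event is triggered
    record("s_build_summary started")  # the rest of this branch continues


branches = [threading.Thread(target=waiting_branch),
            threading.Thread(target=raising_branch)]
for branch in branches:
    branch.start()
for branch in branches:
    branch.join()

print(log)
```

However the threads are scheduled, the summary session is logged only after both load sessions, because the waiting branch cannot pass `wait()` until the event is set.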

Event-Raise Task
The Event-Raise task represents a user-defined event. When the Informatica Server executes the Event-Raise task, the Event-Raise task triggers the event. Use the Event-Raise task with the Event-Wait task to define events.
To use an Event-Raise task, you must first declare the user-defined event. Then, create an Event-Raise task in the workflow to represent the location of the user-defined event you just declared. In the Event-Raise task properties, specify the name of a user-defined event.

Event-Wait Task
The Event-Wait task waits for an event to occur. Once the event triggers, the Informatica Server continues executing the rest of the workflow. The Event-Wait task can wait for a pre-defined event or a user-defined event.
A pre-defined event is a file-watch event. When you use the Event-Wait task to wait for a pre-defined event, you specify an indicator file for the Informatica Server to watch. The Informatica Server waits for the indicator file to appear. Once the indicator file appears, the Informatica Server continues executing tasks after the Event-Wait task. Do not use the Event-Raise task to trigger the event when you wait for a pre-defined event.
You can also use the Event-Wait task to wait for a user-defined event. To do so, specify the name of the user-defined event in the Event-Wait task properties. The Informatica Server waits for the Event-Raise task to trigger the user-defined event. Once the user-defined event is triggered, the Informatica Server continues executing tasks after the Event-Wait task.

Waiting for Pre-Defined Events
To use a pre-defined event, you need a shell command, script, or batch file to create an indicator file. The file must be created or sent to a directory local to the Informatica Server. The file can be any format recognized by the Informatica Server operating system. You can choose to have the Informatica Server delete the indicator file after it detects the file, or you can manually delete the indicator file. The Informatica Server marks the status of the Event-Wait task as failed if it cannot delete the indicator file.
When you specify the indicator file in the Event-Wait task, enter the directory in which the file will appear and the name of the indicator file. You must provide the absolute path for the file, and the directory must be local to the Informatica Server. If you only specify the file name and not the directory, the Workflow Manager looks for the indicator file in the system directory. The Informatica Server writes the time the file appears in the workflow log.
Note: Do not use a source or target file name as the indicator file name.
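A rough Python equivalent of a file-watch wait, assuming a simple polling loop. `wait_for_indicator`, the polling interval, and the timeout are hypothetical additions (the text does not say how often the server checks or whether it ever gives up); the delete-or-fail behavior follows the description of pre-defined events.

```python
import os
import tempfile
import time


def wait_for_indicator(path: str, delete_after: bool = True,
                       poll_seconds: float = 0.5, timeout: float = 60.0) -> str:
    """Wait for an indicator file the way an Event-Wait task waits for a
    pre-defined (file-watch) event. Returns "succeeded" or "failed".

    timeout is an assumption added so the sketch always terminates.
    """
    deadline = time.monotonic() + timeout
    while not os.path.exists(path):
        if time.monotonic() >= deadline:
            raise TimeoutError(f"indicator file never appeared: {path}")
        time.sleep(poll_seconds)
    if delete_after:
        try:
            os.remove(path)
        except OSError:
            # The Event-Wait task is marked failed if the file cannot be deleted.
            return "failed"
    return "succeeded"


# An upstream script "sends" the indicator file to a local directory (full path).
flag = os.path.join(tempfile.mkdtemp(), "orders_ready.ind")
open(flag, "w").close()
status = wait_for_indicator(flag)
print(status, os.path.exists(flag))  # succeeded False
```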

Enable Test Load in Session Property


You can configure the Informatica Server to perform a test load. With a test load, the Informatica Server reads and transforms data without writing to targets. The Informatica Server generates all session files, and performs all pre- and post-session functions, as if running the full session.


The Informatica Server writes data to relational targets, but rolls back the data when the session completes. For all other target types, such as flat file and SAP BW, the Informatica Server does not write data to the targets. Enter the number of source rows you want to test in the Number of Rows to Test field. You cannot perform a test load on sessions using XML sources.
Note: You can perform a test load only when you configure a session for normal mode. If you configure the session for bulk mode, the session fails.
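For a relational target, a test load behaves like writing inside a transaction that is never committed. The sketch below imitates that with Python's built-in `sqlite3` module; `run_test_load`, the `target` table, and the uppercase/multiply transformation are hypothetical stand-ins for a real session and mapping.

```python
import sqlite3


def run_test_load(conn: sqlite3.Connection, source_rows, rows_to_test: int) -> int:
    """Read and transform up to rows_to_test rows, write them to a relational
    target, then roll back. Returns the number of rows written before rollback."""
    written = 0
    try:
        for name, amount in source_rows[:rows_to_test]:
            # Toy transformation standing in for the mapping logic.
            conn.execute("INSERT INTO target (name, amount) VALUES (?, ?)",
                         (name.strip().upper(), amount * 100))
            written += 1
    finally:
        conn.rollback()  # the target keeps none of the test rows
    return written


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (name TEXT, amount INTEGER)")
source = [(" alice ", 3), ("bob", 5), ("carol", 7)]
print(run_test_load(conn, source, rows_to_test=2))                 # 2
print(conn.execute("SELECT COUNT(*) FROM target").fetchone()[0])  # 0
```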

Incremental Aggregation
Select the Incremental Aggregation option if you want the Informatica Server to perform incremental aggregation.

Informatica Server Processing for Incremental Aggregation
The first time you run a session with incremental aggregation enabled, the Informatica Server processes the entire source. At the end of the session, the Informatica Server stores aggregate data from that session run in two files, the index file and the data file. The Informatica Server creates the files in a local directory. The second time you run the session, use only changes in the source as source data for the session. The Informatica Server then performs the following actions:

For each input record, the Informatica Server checks historical information in the index file for a corresponding group, then:
  o If it finds a corresponding group, the Informatica Server performs the aggregate operation incrementally, using the aggregate data for that group, and saves the incremental change.
  o If it does not find a corresponding group, the Informatica Server creates a new group and saves the record data.
When writing to the target, the Informatica Server applies the changes to the existing target:
  o Updates modified aggregate groups in the target.
  o Inserts new aggregate data.
  o Deletes removed aggregate data.
  o Ignores unchanged aggregate data.
The Informatica Server saves modified aggregate data in the index and data files to be used as historical data the next time you run the session.
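The group lookup and the resulting target changes can be sketched in Python, assuming a single SUM aggregate per group and a dict in place of the index and data files. `apply_increment` is a hypothetical name, and the sketch leaves out deleting removed groups.

```python
def apply_increment(cache: dict, changed_rows) -> tuple:
    """Apply new source rows (group, amount) to a cached SUM per group.

    cache stands in for the index and data files: group -> running total.
    It is updated in place, so it becomes the historical data for the next run.
    Returns (inserts, updates) for the target; untouched groups are ignored.
    """
    new_groups, changed_groups = set(), set()
    for group, amount in changed_rows:
        if group in cache:
            cache[group] += amount  # group found: aggregate incrementally
            if group not in new_groups:
                changed_groups.add(group)
        else:
            cache[group] = amount  # no group found: create a new one
            new_groups.add(group)
    inserts = {g: cache[g] for g in sorted(new_groups)}
    updates = {g: cache[g] for g in sorted(changed_groups)}
    return inserts, updates


cache = {}
first = apply_increment(cache, [("east", 10), ("west", 5)])   # first run: entire source
second = apply_increment(cache, [("east", 4), ("north", 2)])  # later run: changes only
print(first)   # ({'east': 10, 'west': 5}, {})
print(second)  # ({'north': 2}, {'east': 14})
```

This works because a SUM can be updated from the stored total and the new rows alone. A median or percentile depends on every value in the group, so it cannot be updated from the stored aggregate, which is consistent with the advice not to use incremental aggregation for mappings that contain those functions.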

Each subsequent time you run the session with incremental aggregation, you use only the incremental source changes in the session. If the source changes significantly, and you want the Informatica Server to continue saving aggregate data for future incremental changes, configure the Informatica Server to overwrite existing aggregate data with new aggregate data. When you partition a session that uses incremental aggregation, the Informatica Server creates one set of cache files for each partition. If you change the partitioning information after you run an incremental aggregation session, the Informatica Server realigns the cache files the next time you run the incremental aggregation session. The Informatica Server creates new aggregate data, instead of using historical data, when you perform one of the following tasks:

Save a new version of the mapping.
Select Reinitialize Aggregate Cache in the session property sheet.
Move the aggregate files without correcting the configured path or directory for the files in the session property sheet.
Change the configured path or directory for the aggregate files in the session property sheet without moving the files to the new location.

Re-initializing the Aggregate Files


Reinitializing the aggregate cache overwrites historical aggregate data with new aggregate data. When you reinitialize the aggregate cache, instead of using the captured changes in source tables, you typically need to use the entire source table. You might use this option when source tables change dramatically. After you run a session that reinitializes the aggregate cache, edit the session properties to disable the Reinitialize Aggregate Cache option. If you do not clear Reinitialize Aggregate Cache, the Informatica Server overwrites the aggregate cache each time you run the session.
Note: When you move from Windows to UNIX, you must reinitialize the cache. Likewise, you cannot keep the existing cache when you change from a Latin1 code page to an MSLatin1 code page, even though these code pages are compatible.
Do not enable incremental aggregation in the following circumstances:

You cannot capture new source data.
Processing the incrementally changed source significantly changes the target.
Your mapping contains percentile or median functions.

Capturing Incremental Changes
Before enabling incremental aggregation, you must capture changes in source data. You might do this by:

Using a filter in the mapping. You may be able to remove pre-existing source data during a session with a filter.
Using a stored procedure. You may be able to remove pre-existing source data at the source database with a pre-load stored procedure.

Creating File Directory
When you run multiple sessions with incremental aggregation, decide where you want the files stored. Then enter the appropriate directory for the server variable, $PMCacheDir, in the Workflow Manager. You can enter session-specific directories for the index and data files. However, by using the server variable for all sessions using incremental aggregation, you can easily change the cache directory when necessary by changing $PMCacheDir.
Note: Changing the cache directory without moving the files causes the Informatica Server to reinitialize the aggregate cache and gather new aggregate data.
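One common way to capture changes with a filter is to compare a last-modified timestamp against the time of the previous run. The sketch below assumes such a column (`updated_at`) exists and that the previous run time is available; both are assumptions, and `changed_since` is a hypothetical helper rather than an Informatica function.

```python
from datetime import datetime


def changed_since(rows, last_run: datetime):
    """Keep only rows modified after the previous session run, the kind of
    condition a Filter transformation could apply. 'updated_at' is a
    hypothetical last-modified column on the source."""
    return [row for row in rows if row["updated_at"] > last_run]


source = [
    {"id": 1, "updated_at": datetime(2024, 1, 1, 8, 0)},
    {"id": 2, "updated_at": datetime(2024, 1, 2, 9, 30)},
    {"id": 3, "updated_at": datetime(2024, 1, 2, 10, 15)},
]
last_run = datetime(2024, 1, 2, 0, 0)  # timestamp recorded for the previous run
print([row["id"] for row in changed_since(source, last_run)])  # [2, 3]
```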

Line Sequential Buffer Length


Affects the way the Informatica Server reads flat files. Increase this setting from the default of 1024 bytes per line only if source flat file records are larger than 1024 bytes.

Override Tracing
Overrides tracing levels set at the transformation level. Selecting this option enables a menu from which you choose a tracing level: None, Terse, Normal, Verbose Initialization, or Verbose Data.
None. The Informatica Server uses the tracing level set in the mapping.
Terse. The Informatica Server logs initialization information as well as error messages and notification of rejected data.
Normal. The Informatica Server logs initialization and status information, errors encountered, and skipped rows due to transformation row errors. It summarizes session results, but not at the level of individual rows.
Verbose Initialization. In addition to normal tracing, the Informatica Server logs additional initialization details, names of index and data files used, and detailed transformation statistics.
Verbose Data. In addition to verbose initialization tracing, the Informatica Server logs each row that passes into the mapping. It also notes where the Informatica Server truncates string data to fit the precision of a column and provides detailed transformation statistics.

You can also enter tracing levels for individual transformations in the mapping. When you enter a tracing level in the session properties, you override tracing levels configured for transformations in the mapping.
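The precedence rules for Override Tracing can be expressed as a small function. This is a hypothetical model (`effective_tracing` is not an Informatica API); it encodes two statements from the text: a session-level setting overrides the transformation's level, and the session-level None setting falls back to the level set in the mapping.

```python
from typing import Optional

LEVELS = ("None", "Terse", "Normal", "Verbose Initialization", "Verbose Data")


def effective_tracing(transformation_level: str,
                      session_override: Optional[str] = None) -> str:
    """Return the tracing level a transformation runs with (hypothetical model).

    session_override is None when Override Tracing is not selected.
    """
    for level in (transformation_level, session_override):
        if level is not None and level not in LEVELS:
            raise ValueError(f"unknown tracing level: {level!r}")
    if session_override is None or session_override == "None":
        # No override, or the session level "None": use the mapping's setting.
        return transformation_level
    return session_override


print(effective_tracing("Normal"))                 # Normal
print(effective_tracing("Normal", "None"))         # Normal
print(effective_tracing("Terse", "Verbose Data"))  # Verbose Data
```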

Target Load Type


Note: Constraint-based loading does not affect the target load ordering of the mapping. Target load ordering defines the order the Informatica Server reads each source qualifier in the mapping. Constraint-based loading establishes the order in which the Informatica Server loads individual targets within a set of targets receiving data from a single source qualifier. In the Designer, you can set the order in which the Informatica Server sends rows to different target definitions in a mapping. This feature is crucial if you want to maintain referential integrity when inserting, deleting, or updating records in tables that have the primary key and foreign key constraints. The Informatica Server writes data to all the targets connected to the same Source Qualifier or Normalizer simultaneously to maximize performance. To specify the order in which the Informatica Server sends data to targets, create one Source Qualifier or Normalizer transformation for each target within a mapping. To set the target load order, you then determine the order in which each Source Qualifier sends data to connected targets in the mapping. When a mapping includes a Joiner transformation, the Informatica Server sends all rows to targets connected to that Joiner at the same time, regardless of the target load order.


FAQ for ETL Process

Pre-Session and Post-Session Shell Commands


The Informatica Server can perform shell commands at the beginning of the session or at the end of the session. Shell commands are operating system commands. You can use pre- or post-session shell commands, for example, to delete a reject file or session log, or to archive target files before the session begins. The Workflow Manager provides the following types of shell commands for each Session task:

Pre-session command. The Informatica Server performs pre-session shell commands at the beginning of a session.
Post-session success command. The Informatica Server performs post-session success commands only if the session completed successfully.
Post-session failure command. The Informatica Server performs post-session failure commands only if the session failed to complete.

You can configure a session to stop or continue if a pre-session shell command fails. Use the following guidelines to call a shell command:

Use any valid UNIX command or shell script for UNIX servers, or any valid DOS or batch file for Windows servers.
Configure the session to execute the pre- or post-session shell commands.

The Workflow Manager provides a task called the Command task that allows you to specify shell commands anywhere in the workflow. You can choose a reusable Command task for the pre- or post-session shell command. Alternatively, you can create non-reusable shell commands for the pre- or post-session shell commands. If you create non-reusable pre- or post-session shell commands, you have the option to make them into a reusable Command task. The Workflow Manager allows you to choose from the following options when you configure shell commands:

Create non-reusable shell commands. Create a non-reusable set of shell commands for the session. Other sessions in the folder cannot use this set of shell commands.
Use an existing reusable Command task. Select an existing Command task to run as the pre- or post-session shell command.

Configure pre- and post-session shell commands in the Components tab of the session properties.

Using Server and Session Variables
You can include any server variable, such as $PMTargetFileDir, or session variables in pre-session and post-session commands. When you use a server variable instead of entering a specific directory, you can run the same workflow on different Informatica Servers without changing session properties. You cannot use server variables or session variables in standalone Command tasks in the workflow. The Informatica Server does not expand server variables or session variables used in standalone Command tasks.

Configuring Non-Reusable Shell Commands
When you create non-reusable pre- or post-session shell commands, the commands are only visible in the session properties. The Workflow Manager does not create a Command task from the non-reusable pre- or post-session shell commands. You have the option to make a non-reusable shell command into a reusable Command task.

Creating a Reusable Command Task from Pre- or Post-Session Commands
If you create non-reusable pre- or post-session shell commands, you have the option to make them into a reusable Command task.



Once you make the pre- or post-session shell commands into a reusable Command task, you cannot revert them to non-reusable commands.

Pre-Session Shell Command Errors
You can configure the session to stop or continue if a pre-session shell command fails. If you select Stop, the Informatica Server stops the session, but continues with the rest of the workflow. If you select Continue, the Informatica Server ignores the errors and continues the session. By default, the Informatica Server stops the session upon shell command errors. Configure the session to stop or continue if a pre-session shell command fails in the Error Handling settings on the Config Object tab.
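The control flow for pre- and post-session commands can be sketched with Python's `subprocess` module. `run_session` and its parameters are hypothetical, and the `exit 1` and `exit 0` strings are placeholder commands; a real pre-session command might delete a reject file or archive target files. The default of stopping on a pre-session error mirrors the default described in the text.

```python
import subprocess
from typing import Callable, Optional


def run_session(session: Callable[[], bool],
                pre_cmd: Optional[str] = None,
                post_success_cmd: Optional[str] = None,
                post_failure_cmd: Optional[str] = None,
                stop_on_pre_error: bool = True) -> str:
    """Model of pre- and post-session shell command handling (hypothetical).

    Returns "stopped", "succeeded", or "failed" for the session. A stopped
    session does not stop the workflow; the caller moves on to the next task.
    """
    if pre_cmd:
        pre = subprocess.run(pre_cmd, shell=True)
        if pre.returncode != 0 and stop_on_pre_error:
            return "stopped"  # default: stop the session on a pre-session error
    succeeded = session()
    # Exactly one of the post-session commands runs, depending on the outcome.
    post_cmd = post_success_cmd if succeeded else post_failure_cmd
    if post_cmd:
        subprocess.run(post_cmd, shell=True)
    return "succeeded" if succeeded else "failed"


print(run_session(lambda: True, pre_cmd="exit 1"))                           # stopped
print(run_session(lambda: True, pre_cmd="exit 1", stop_on_pre_error=False))  # succeeded
print(run_session(lambda: False, pre_cmd="exit 0"))                          # failed
```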

Constraint Based Load Ordering in Sessions


Do not use constraint-based loading when the mapping used in the session contains Update Strategy transformations. When you use Update Strategy transformations, you must set the session option Treat Source Rows As to Data Driven. When the mapping contains Update Strategy transformations and you need to load data to a primary key table first, split the mapping to load the primary key table first, and the dependent tables second.
To enable constraint-based loading:
1. In the General Options settings of the Properties tab, choose Insert for the Treat Source Rows As property.
2. Click the Config Object tab. On the Advanced settings, select Constraint-Based Load Ordering.

When you select this option, the Informatica Server orders the target load on a row-by-row basis. For every row generated by an active source, the Informatica Server loads the corresponding transformed row first to the primary key table, then to any foreign key tables. Constraint-based loading depends on the following requirements:

Active source. Related target tables must have the same active source.
Key relationships. Target tables must have key relationships.
Target connection groups. Targets must be in one target connection group.
Treat rows as insert. Use this option when you insert into the target. You cannot use updates with constraint-based loading.

Active Source
The following transformations can be an active source within a mapping:
Source Qualifier
Normalizer (COBOL or flat file)
Advanced External Procedure
Aggregator
Joiner
Rank
Sorter
Mapplet, if it contains one of the above transformations

Key Relationship
When target tables have no key relationships, the Informatica Server does not perform constraint-based loading. Similarly, when target tables have circular key relationships, the Informatica Server reverts to a normal load. For example, you have one target containing a primary key and a foreign key related to the primary key in a second target. The second target also contains a foreign key that references the primary key in the first target. The Informatica Server cannot enforce constraint-based loading for these tables. It reverts to a normal load.

Target Connection Group
The Informatica Server enforces constraint-based loading for targets in the same target connection group. If you want to specify constraint-based loading for multiple targets that receive data from the same active source, you must verify the tables are in the same target connection group. If the tables with the primary-foreign key relationship are in different target connection groups, the Informatica Server cannot enforce constraint-based loading when you run the workflow. To verify that all targets are in the same target connection group, perform the following tasks:

Verify all targets are in the same target load order group and receive data from the same active source.
Use the default partition properties and do not add partitions or partition points.
Define the same target type for all targets in the session properties.
Define the same database connection name for all targets in the session properties.
Choose normal mode for the target load type for all targets in the session properties.

Treat Rows as Insert
Use constraint-based loading only when the session option Treat Source Rows As is set to Insert. You might get inconsistent data if you select a different Treat Source Rows As option and you configure the session for constraint-based loading.
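Choosing a parent-first order for related targets, and falling back when the keys form a cycle, is a topological sort. The sketch below uses Python's standard `graphlib` module (Python 3.9+) to illustrate the idea; `plan_target_load` and the table names are hypothetical, and this is not necessarily how the Informatica Server computes its order internally.

```python
from graphlib import CycleError, TopologicalSorter  # Python 3.9+


def plan_target_load(references: dict) -> tuple:
    """references maps each target table to the set of tables its foreign
    keys point to. Returns ("constraint-based", order) with parent tables
    first, or ("normal", None) when constraint-based loading cannot apply."""
    if not any(references.values()):
        return "normal", None  # no key relationships at all
    try:
        # Referenced (parent) tables are predecessors, so they come out first.
        return "constraint-based", list(TopologicalSorter(references).static_order())
    except CycleError:
        return "normal", None  # circular key relationships: revert to a normal load


mode, order = plan_target_load({
    "ORDERS": {"CUSTOMERS"},
    "ORDER_ITEMS": {"ORDERS", "PRODUCTS"},
})
print(mode, order)  # CUSTOMERS precedes ORDERS, which precedes ORDER_ITEMS
print(plan_target_load({"T1": {"T2"}, "T2": {"T1"}}))  # ('normal', None)
```

The computed order is the order in which each transformed row would be written: for every row from the active source, CUSTOMERS before ORDERS, and ORDERS before ORDER_ITEMS.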

Cache Lookup() Function property in Sessions


If selected, the Informatica Server caches PowerMart 3.5 LOOKUP functions in the mapping, overriding mapping-level LOOKUP configurations. If not selected, the Informatica Server performs lookups on a row-by-row basis, unless otherwise specified in the mapping.

Workspace File Directory


The directory for workspace files created by the Workflow Manager. Workspace files maintain the last task or workflow you saved. This directory should be local to the Informatica Client to prevent file corruption or overwrites by multiple users. By default, the Workflow Manager creates files in the Informatica Client installation directory.

What are the factors to be considered before configuring the repository environment?
<Need to Refer>

Apart from using the Abort function to stop a session, what is the other way to stop a session?
<Need to Refer>

Difference Between Workflow and Worklet

Worklets


A worklet is an object that represents a set of tasks. It can contain any task available in the Workflow Manager. You can run worklets inside a workflow. The workflow that contains the worklet is called the parent workflow. You can also nest a worklet in another worklet. Create a worklet when you want to reuse a set of workflow logic in several workflows. Use the Worklet Designer to create and edit worklets. The worklet does not contain any scheduling or server information. To execute a worklet, include the worklet in a workflow. The worklet executes on the Informatica Server you choose for the workflow. The Workflow Manager does not provide a parameter file or log file for worklets.



The Informatica Server suspends the parent workflow when the status of the worklet is Suspended or Suspending.

You can create reusable worklets in the Worklet Designer. You can view a list of reusable worklets in the Navigator Worklets node. You can also create non-reusable worklets in the Workflow Designer as you develop the workflow. Non-reusable worklets only exist in the workflow; you cannot use a non-reusable worklet in another workflow. After you create the worklet in the Workflow Designer, open the worklet to edit it in the Worklet Designer. You can promote non-reusable worklets to reusable worklets by selecting the Reusable option in the worklet properties.

Configuring Worklet Properties
When you use a worklet in a workflow, you can configure the same set of general task settings on the General tab as any other task. In addition to general task settings, you can configure the following worklet properties:
Worklet variables. Use worklet variables to reference values and record information. You use worklet variables the same way you use workflow variables. You can assign a workflow variable to a worklet variable to override its initial value.
Events. To use the Event-Wait and Event-Raise tasks in the worklet, you must first declare an event in the worklet properties.
Metadata extension. Extend the metadata stored in the repository by associating information with repository objects.

Declaring Events in Worklets
Similar to workflows, you can use Event-Wait and Event-Raise tasks in a worklet. To use the Event-Raise task, you first declare a user-defined event in the worklet. Events in one instance of a worklet do not affect events in other instances of the worklet. You cannot specify worklet events in the Event tasks in the parent workflow.

Using Worklet Variables
Worklet variables are similar to workflow variables.
A worklet has the same set of pre-defined variables as any task. You can also create user-defined worklet variables. Like user-defined workflow variables, user-defined worklet variables can be persistent or non-persistent. You cannot use variables from the parent workflow in the worklet. Similarly, you cannot use user-defined worklet variables in the parent workflow. However, you can use pre-defined worklet variables in the parent workflow, just as you can use pre-defined variables for other tasks in the workflow.

Persistent Worklet Variables
To create a persistent worklet variable, select Persistent when you create the variable. A persistent worklet variable retains its value the next time the Informatica Server executes the worklet instance in the parent workflow. Worklet variables only persist when you run the same workflow; a worklet variable does not retain its value when you use instances of the worklet in different workflows.
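The persistence rule amounts to keying saved values by workflow as well as by worklet instance. `WorkletVariableStore`, `run_worklet`, and the `$$RunCount` variable below are hypothetical names used only to illustrate that rule; they do not reflect how the repository actually stores variable values.

```python
class WorkletVariableStore:
    """Hypothetical model of persistent worklet variables. Values are keyed by
    (workflow, worklet instance, variable), so they survive between runs of the
    same workflow but are never shared with the worklet in another workflow."""

    def __init__(self) -> None:
        self._saved = {}

    def load(self, workflow: str, instance: str, name: str, initial):
        return self._saved.get((workflow, instance, name), initial)

    def save(self, workflow: str, instance: str, name: str, value) -> None:
        self._saved[(workflow, instance, name)] = value


def run_worklet(store: WorkletVariableStore, workflow: str, instance: str) -> int:
    """One execution of a worklet that counts its own runs in $$RunCount."""
    count = store.load(workflow, instance, "$$RunCount", 0) + 1
    store.save(workflow, instance, "$$RunCount", count)
    return count


store = WorkletVariableStore()
runs = [
    run_worklet(store, "wf_daily", "wl_load"),
    run_worklet(store, "wf_daily", "wl_load"),    # same workflow: value persisted
    run_worklet(store, "wf_monthly", "wl_load"),  # different workflow: starts over
]
print(runs)  # [1, 2, 1]
```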

The Informatica Server writes information about worklet execution in the workflow log. When you choose Suspend On Error for the parent workflow, the Informatica Server also suspends the worklet if a task in the worklet fails. When a task in the worklet fails, the Informatica Server stops executing the failed task and other tasks in its path. If no other task is running in the worklet, the worklet status is Suspended. If one or more tasks are still running in the worklet, the worklet status is Suspending.
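The Suspended and Suspending rule depends only on whether any task in the worklet is still running after the failure. A minimal sketch with hypothetical function and task names, assuming Suspend On Error is enabled and tasks on the failed task's path have already been stopped:

```python
def worklet_status_after_failure(task_states: dict) -> str:
    """Status of a worklet once one of its tasks has failed and the parent
    workflow uses Suspend On Error. task_states maps each task name to a
    state such as "running", "succeeded", or "failed"."""
    if not any(state == "failed" for state in task_states.values()):
        raise ValueError("expects at least one failed task")
    still_running = any(state == "running" for state in task_states.values())
    return "Suspending" if still_running else "Suspended"


def parent_is_suspended(worklet_status: str) -> bool:
    # The parent workflow is suspended while the worklet is Suspended or Suspending.
    return worklet_status in ("Suspended", "Suspending")


print(worklet_status_after_failure({"s_a": "failed", "s_b": "running"}))    # Suspending
print(worklet_status_after_failure({"s_a": "failed", "s_b": "succeeded"}))  # Suspended
```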
