Informatica IICS
I have around 3.6 years of experience in DWH development using the Informatica tool. Primarily I have worked in the pharma and manufacturing domains (banking or sales domains, as per your project).
In my current project I work in an onsite-offshore model, so we get our tasks from the onsite team. My roles and responsibilities are basically these:
As a developer, I first need to understand the physical data model, i.e., the dimensions and facts and their relationships, as well as the functional specification designed by the Business Analyst that describes the business requirement.
I am involved in preparing the source-to-target mapping sheet (tech specs), which tells us what the source and target are, which source column maps to which target column, and what the business logic is. This document gives a clear picture for the development.
I create Informatica mappings, sessions, and workflows using different transformations to implement the business logic.
Preparing unit test cases as per the business requirement is also one of my responsibilities.
I also perform unit testing for the mappings I develop myself.
I do source-code reviews for the mappings and workflows developed by my team members.
I am also involved in preparing the deployment plan, which contains the list of mappings and workflows to be migrated; based on this, the deployment team can migrate the code from one environment to another.
Once the code is rolled out to production, we work with the production support team for two weeks and give knowledge transfer (KT) in parallel, so we also prepare a KT document for the production team.
In the present system they do not have a BI design, so they follow a manual process of exporting SQL query data to Excel sheets and preparing pie charts using macros. In the new system we are providing BI-style reports such as drill-downs, drill-ups, pie charts, graphs, detail reports, and dashboards.
ORACLE
How strong are you in SQL & PL/SQL?
1) I am good at SQL; I write the source qualifier queries for Informatica mappings as per the business requirement.
2) I am comfortable working with joins, correlated queries, subqueries, analyzing tables, inline views, and materialized views.
3) As an Informatica developer I did not get much opportunity to work on the PL/SQL side, but I worked on a PL/SQL-to-Informatica migration project, so I do have exposure to procedures, functions, and triggers.
What is the difference between view and materialized view?
A view has a logical existence and does not contain data, whereas a materialized view has a physical existence.
A view is not a database object; a materialized view is a database object.
We cannot perform DML operations on a view, but we can perform DML operations on a materialized view.
When we do select * from a view, it fetches the data from the base table; when we do select * from a materialized view, it fetches the data stored in the materialized view itself.
A view cannot be scheduled to refresh; a materialized view can be scheduled to refresh.
We can keep aggregated data in a materialized view, and a materialized view can be created based on multiple tables.
Materialized View
A materialized view is very useful for reporting. If we do not have a materialized view, the report fetches the data directly from the dimensions and facts; this is slow because it involves multiple joins. If we put the same report logic into a materialized view, we can fetch the data directly from the materialized view for reporting purposes and avoid the multiple joins at report run time. The materialized view needs to be refreshed regularly; the report then simply performs a select statement on the materialized view.
Difference between Trigger and Procedure?
A trigger does not need to be executed manually; triggers are fired automatically, whereas a procedure needs to be executed manually.
Triggers run implicitly when an INSERT, UPDATE, or DELETE statement is issued against the associated table.
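A minimal sketch of each, assuming the standard EMP table and a hypothetical EMP_AUDIT table:
-- trigger: fires automatically on DML against EMP
CREATE OR REPLACE TRIGGER trg_emp_audit
AFTER INSERT OR UPDATE OR DELETE ON emp
FOR EACH ROW
BEGIN
  INSERT INTO emp_audit (empno, action_date)
  VALUES (NVL(:NEW.empno, :OLD.empno), SYSDATE);
END;
/
-- procedure: must be executed manually
CREATE OR REPLACE PROCEDURE give_raise (p_empno NUMBER, p_pct NUMBER) AS
BEGIN
  UPDATE emp SET sal = sal * (1 + p_pct / 100) WHERE empno = p_empno;
END;
/
EXEC give_raise(7369, 10);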
Differences between where clause and having clause?
Both the where clause and the having clause can be used to filter data.
The where clause does not require a group by, whereas the having clause must be used with a group by.
The where clause applies to individual rows, whereas the having clause tests a condition on the group rather than on individual rows.
The where clause is used to restrict rows; the having clause is used to restrict groups.
Where restricts a normal query; having restricts group by (aggregate) functions.
In the where clause every record is filtered individually; in the having clause the filtering is on aggregated records (group by functions).
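A small illustration on the standard EMP table: the where clause filters rows before grouping, and the having clause filters the groups after aggregation.
SELECT deptno, SUM(sal)
FROM   emp
WHERE  job <> 'CLERK'      -- row-level filter
GROUP  BY deptno
HAVING SUM(sal) > 10000;   -- group-level filter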
Differences between stored procedure and functions
A stored procedure may or may not return values, whereas a function must return at least one value and can return additional values using OUT arguments.
A stored procedure can be used to implement business logic, whereas a function is mainly used for calculations.
A stored procedure is a pre-compiled statement, but a function is not a pre-compiled statement.
A stored procedure can accept IN, OUT, and IN OUT arguments, whereas a function normally accepts only IN arguments and returns its result through the RETURN clause.
Stored procedures are mainly used to process tasks; functions are mainly used to compute values.
A procedure cannot be invoked from SQL statements (e.g., SELECT), whereas a function can be invoked from SQL statements (e.g., SELECT).
A procedure can affect the state of the database using commit; a function cannot affect the state of the database.
A procedure is stored as pseudo-code in the database, i.e., in compiled form; a function is parsed and compiled at runtime.
If the joining columns do not have indexes, the query will do a full table scan; if it does a full table scan, the cost will be high. In that case we create indexes on the joining columns and run the query again, which should give better performance. We also need to analyze the tables if the last analysis happened long back. The ANALYZE statement can be used to gather statistics for a specific table, index, or cluster, e.g., ANALYZE TABLE employees COMPUTE STATISTICS;
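For example, assuming EMP is joined on DEPTNO, the index and the statistics can be created like this:
CREATE INDEX emp_deptno_idx ON emp (deptno);
ANALYZE TABLE emp COMPUTE STATISTICS;
-- or use the DBMS_STATS package, the preferred way to gather statistics
EXEC DBMS_STATS.GATHER_TABLE_STATS(USER, 'EMP');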
If we still have a performance issue, then we use HINTS; a hint is nothing but a clue to the optimizer. We can use hints such as:
ALL_ROWS
One of the hints that invokes the cost-based optimizer. ALL_ROWS is usually used for batch processing or data warehousing systems. (/*+ ALL_ROWS */)
FIRST_ROWS
One of the hints that invokes the cost-based optimizer. FIRST_ROWS is usually used for OLTP systems. (/*+ FIRST_ROWS */)
CHOOSE
One of the hints that invokes the cost-based optimizer. This hint lets the server choose between ALL_ROWS and FIRST_ROWS, based on the statistics gathered.
HASH
Hashes one table (full scan) and creates a hash index for that table, then hashes the other table and uses the hash index to find corresponding records. Therefore, it is not suitable for < or > join conditions. (/*+ USE_HASH */)
Hints are very useful for optimizing query performance.
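For example, the hint goes immediately after the SELECT keyword:
SELECT /*+ FIRST_ROWS */ empno, ename
FROM   emp
WHERE  deptno = 10;
SELECT /*+ USE_HASH(e d) */ e.ename, d.dname
FROM   emp e, dept d
WHERE  e.deptno = d.deptno;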
DWH Concepts
Since it is a one-to-one mapping from ODS to staging, we do a truncate and reload. We can create indexes in the staging area to make our source qualifier perform at its best.
If we have a staging area, we do not need to rely on Informatica transformations to know whether a record already exists or not.
ODS:
My understanding of ODS is that it is a replica of the OLTP system, and the need for it is to reduce the burden on the production system (OLTP) while fetching data for loading targets. Hence, it is a mandatory requirement for every warehouse.
So, do we transfer data from OLTP to ODS every day to keep it up to date?
OLTP is a sensitive database; it should not be hit with multiple select statements, because that may impact performance, and if something goes wrong while fetching data from OLTP to the data warehouse it will directly impact the business.
ODS is a replication of OLTP.
ODS usually gets refreshed through some Oracle jobs.
1) To start Informatica workflows from a shell script we use the pmcmd command:
/pmar/informatica/pc/pmserver/pmcmd startworkflow -u $INFA_USER -p $INFA_PASSWD -s $INFA_SERVER:$INFA_PORT -f $INFA_FOLDER -wait $1 >> $LOG_PATH/$LOG_FILE
2) If we are supposed to process flat files using Informatica but those files exist on a remote server, then we have to write a script to FTP them onto the Informatica server before we start processing those files.
3) File watch means that if an indicator file is available in the specified location, then we start our Informatica jobs; otherwise we send an email notification using the mailx command saying that the previous jobs did not complete successfully, something like that.
4) Using a shell script, update the parameter file with the session start time and end time.
This is the kind of scripting knowledge I have. If any new UNIX requirement comes, I can Google it, get the solution, and implement it.
What is use of Shortcuts in informatica?
If we copy source definitions, target definitions, or mapplets from a shared folder to any other folder, they become shortcuts.
Let's assume we have imported some source and target definitions into a shared folder, and we are using those source and target definitions in other folders as shortcuts in some mappings.
If any modification occurs in the backend (database) structure, like adding new columns or dropping existing columns in either the source or the target, and we re-import into the shared folder, those changes automatically reflect in all folders/mappings wherever we used those source or target definitions.
Ans:
Using Dynamic Lookup on Target table:
If the record does not exist, insert it into the target. If it already exists, get the corresponding Ename value from the lookup, concatenate it in an expression with the current Ename value, and then update the target Ename column using an update strategy.
Using Var port Approach:
Sort the data in the source qualifier based on the EmpNo column, then use an expression to store the previous record's information using variable ports. After that, use a router: if the record comes for the first time, insert it; if it was already inserted, update Ename in the target with the concatenated value of the previous name and the current name.
How to send unique (distinct) records into one target and duplicates into another target?
Source:
Ename EmpNo
stev 100
Stev 100
john 101
Mathew 102
Output: Target_1
Ename EmpNo
Stev 100
John 101
Mathew 102
Target_2
Ename EmpNo
Stev 100
Ans:
Using Dynamic Lookup on Target table:
If the record does not exist, insert it into Target_1. If it already exists, send it to Target_2 using a router.
Using Var port Approach:
Sort the data in the source qualifier based on the EmpNo column, then use an expression to store the previous record's information using variable ports. After that, use a router to route the data into the targets: if the record comes for the first time, send it to the first target; if it was already inserted, send it to Target_2.
How to do Dynamic File generation in Informatica?
I want to generate a separate file for every employee (it should generate one file per Ename). It has to generate 5 flat files, and the name of each flat file is the corresponding employee name; that is the requirement.
Below is my mapping.
Source (Table) -> SQ -> Target (FF)
Source:
Dept Ename EmpNo
A S 22
A R 27
B P 29
B X 30
B U 34
This functionality was added in Informatica 8.5 onwards; it was not there in earlier versions.
We can achieve it with the use of a Transaction Control transformation and the special "FileName" port in the target file.
In order to generate the target file names from the mapping, we should make use of the special "FileName" port in the target file. You can't create this special port from the usual new-port button; there is a special button with the label "F" on it at the right-most corner of the target flat file when viewed in the Target Designer.
When you have different sets of input data with different target files to be created, use the same target instance, but with a Transaction Control transformation which defines the boundary for the source sets.
In the target flat file there is an option in the columns tab, i.e., filename as column. When you click it, a non-editable column gets created in the metadata of the target.
In the Transaction Control transformation, give the condition as iif(not isnull(emp_no), tc_commit_before, tc_continue_transaction).
Map the Ename column to the target's FileName column. Your mapping will be like this:
source -> sq -> transaction control -> target
Run it, and separate files will be created, named by Ename.
How do you populate the 1st record to the 1st target, the 2nd record to the 2nd target, the 3rd record to the 3rd target, and the 4th record to the 1st target through Informatica?
We can do it using a Sequence Generator by setting end value = 3 and enabling the cycle option. Then in the router take 3 groups:
In the 1st group specify the condition as seq nextval = 1 and pass those records to the 1st target.
In the 2nd group specify the condition as seq nextval = 2 and pass those records to the 2nd target.
In the 3rd group specify the condition as seq nextval = 3 and pass those records to the 3rd target.
Since we have enabled the cycle option, after reaching the end value the Sequence Generator starts again from 1; for the 4th record the nextval is 1, so it goes to the 1st target.
How do you perform incremental logic or Delta or CDC?
1) Incremental means: suppose today we processed 100 records; for tomorrow's run we need to extract whatever records were newly inserted or updated after the previous run, based on the last-updated timestamp (yesterday's run). This process is called incremental or delta.
Approach 1: Using SetMaxVariable()
2) First we need to create a mapping variable ($$Pre_sess_max_upd) and assign it an initial value of an old date (01/01/1940). Then override the source qualifier query to fetch only LAST_UPD_DATE >= $$Pre_sess_max_upd (mapping variable), as in the sketch below.
3) In the expression, assign the max last_upd_date value to $$Pre_sess_max_upd (mapping variable) using SetMaxVariable.
4) Because it is a variable, it stores the max last_upd_date value in the repository, so in the next run our source qualifier query will fetch only the records updated or inserted after the previous run.
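A minimal sketch of the overridden source qualifier query, assuming a hypothetical ORDERS source table with a LAST_UPD_DATE column and this date format for the mapping variable:
SELECT o.order_id, o.order_status, o.last_upd_date
FROM   orders o
WHERE  o.last_upd_date >= TO_DATE('$$Pre_sess_max_upd', 'MM/DD/YYYY HH24:MI:SS');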
What is the difference between snowflake and star schema?
The star schema is the simplest data warehouse schema, whereas the snowflake schema is a more complex data warehouse model than a star schema.
In a star schema each dimension is represented in a single table and there should not be any hierarchies between dimensions; in a snowflake schema at least one hierarchy should exist between dimension tables.
Both contain a fact table surrounded by dimension tables: if the dimensions are de-normalized, we say it is a star schema design, and if a dimension is normalized, we say it is a snowflaked design.
In a star schema only one join establishes the relationship between the fact table and any one of the dimension tables; in a snowflake schema, since there are relationships between the dimension tables, many joins are needed to fetch the data.
A star schema optimizes performance by keeping queries simple and providing fast response times, with all the information about each level stored in one row; snowflake schemas normalize dimensions to eliminate redundancy, and the result is more complex queries and reduced query performance.
It is called a star schema because the diagram resembles a star; it is called a snowflake schema because the diagram resembles a snowflake.
Difference between data mart and data warehouse?
A data mart is usually sponsored at the department level and developed with a specific issue or subject in mind; a data mart is a data warehouse with a focused objective. A data warehouse is a "subject-oriented, integrated, time-variant, nonvolatile collection of data in support of decision making".
A data mart is used at a business division/department level, whereas a data warehouse is used at an enterprise level.
A data mart is a subset of data from a data warehouse, and data marts are built for specific user groups. A data warehouse is an integrated consolidation of data from a variety of sources that is specially designed to support strategic and tactical decision making.
By providing decision makers with only a subset of data from the data warehouse, privacy, performance, and clarity objectives can be attained. The main objective of a data warehouse is to provide an integrated environment and a coherent picture of the business at a point in time.
For a connected lookup, the cache includes the lookup source columns in the lookup condition and the lookup source columns that are output ports; for an unconnected lookup, the cache includes all lookup/output ports in the lookup condition and the lookup/return port.
The best example of where we need to use a dynamic cache: suppose the first record and the last record in the source are the same, but there is a change in the address. What the Informatica mapping has to do here is insert the first record and update the target table for the last record. If we use a static lookup, the first record goes to the lookup and checks the lookup cache; based on the condition it does not find a match, so it returns a null value, and the router sends that record to the insert flow. But this record is still not available in the cache memory, so when the last record comes to the lookup it checks the cache, does not find a match, and returns a null value again, so it also goes to the insert flow through the router, although it is supposed to go to the update flow, because the cache did not get refreshed when the first record was inserted into the target table.
How to process multiple flat files into a single target table through Informatica if all the files have the same structure?
We can process all the flat files through one mapping and one session using a list file.
First, we need to create the list file using a Unix script for all the flat files; the extension of the list file is .LST. This list file contains only the flat file names.
At the session level we need to set the source file directory as the list file path, the source file name as the list file name, and the file type as Indirect.
How to populate the file name into the target while loading multiple files using the list file concept?
In Informatica 8.6, by selecting the 'Add Currently Processed Flat File Name' option in the Properties tab of the source definition (after importing the source file definition in the Source Analyzer), a new column called 'currently processed file name' is added. We can map this column to the target to populate the filename.
SCD Type-II Effective-Date Approach
We have a dimension in my current project called the resource dimension. Here we are maintaining history to keep track of SCD changes.
To maintain the history in this slowly changing dimension (resource dimension), we followed the SCD Type-II effective-date approach.
My resource dimension structure contains eff-start-date, eff-end-date, a surrogate key (s.k.), and the source columns. Whenever I do an insert into the dimension, I populate eff-start-date with the system date, eff-end-date with a future date, and s.k. with a sequence number.
If the record is already present in my dimension but there is a change in the source data, then what I need to do is update the previous record's eff-end-date with the system date and insert the changed source data as a new record.
Informatica design to implement the SCD Type-II effective-date approach?
Once we fetch the record from the source qualifier, we send it to a lookup to find out whether the record is present in the target or not, based on the source primary key column.
Once we find a match in the lookup, we take the SCD columns and the s.k. column from the lookup into an expression transformation.
In the lookup transformation we need to override the lookup query to fetch only the active records from the dimension while building the cache, as sketched below.
In the expression transformation I compare the source with the lookup return data:
If the source and target data are the same, I set the flag to 'S'.
If the source and target data are different, I set the flag to 'U'.
If the source data does not exist in the target, the lookup returns a null value, and I set the flag to 'I'.
Based on the flag values, in the router I route the data into the insert and update flows:
If flag = 'I' or 'U', I pass the record to the insert flow.
If flag = 'U', I also pass the record to the eff-end-date update flow.
When we do the insert, we pass the sequence value to the s.k.
Whenever we do the update, we update the eff-end-date column based on the s.k. value returned by the lookup.
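A minimal sketch of that lookup override, assuming a hypothetical RESOURCE_DIM table where active rows carry a far-future eff-end-date:
SELECT r.resource_sk, r.resource_id, r.resource_name, r.eff_start_date, r.eff_end_date
FROM   resource_dim r
WHERE  r.eff_end_date = TO_DATE('12/31/9999', 'MM/DD/YYYY')  -- only active rows go into the cache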
Complex Mapping
We have an order file requirement: every day the source system places a file with a timestamp in its name on the Informatica server.
We have to process the current date's file through Informatica.
The source file directory contains files older than 30 days, each with a timestamp.
For this requirement, if I hardcode the timestamp in the source file name, it will process the same file every day.
1) So, what I did here is I created $InputFilename for the source file name.
2) Then I use a parameter file to supply the value to the session variable ($InputFilename).
3) To update this parameter file, I created one more mapping.
4) This mapping updates the parameter file with the timestamp appended to the file name.
5) I make sure to run this parameter-file-update mapping before my actual mapping.
How to handle errors in informatica?
1) We have a source with numerator and denominator values, and we need to calculate num/deno when populating the target.
2) If deno = 0, I should not load that record into the target table. We send those records to a flat file; after completion of the first session run, a shell script checks the file size.
3) If the file size is greater than zero, it sends an email notification to the source system POC (point of contact) along with the deno-zero record file and an appropriate email subject and body.
If the file size <= 0, that means there are no records in the flat file, and in this case the shell script does not send any email notification.
Or:
We are expecting a not-null value for one of the source columns. If it is null, it is an error record.
We can use the above approach for error handling.
Worklet
A worklet is a set of reusable sessions. We cannot run a worklet without a workflow. Suppose we want to run 2 workflows one after another:
If both workflows exist in the same folder, we can create 2 worklets rather than 2 workflows, and finally call these 2 worklets in one workflow. There we can set the dependency.
If the workflows exist in different folders or repositories, then we cannot create a worklet. We can set the dependency between these two workflows using a shell script (one approach); the other approach is event wait and event raise.
In the shell script approach:
As soon as the first workflow completes, we create a zero-byte file (indicator file).
If the indicator file is available in the location, we run the second workflow.
If the indicator file is not available, we wait for 5 minutes and again check for the indicator; we continue this loop 5 times, i.e., for roughly 30 minutes.
After 30 minutes, if the file still does not exist, we send out an email notification.
Event wait and event raise approach
In event wait it will wait for an infinite time, until the indicator file is available.
Why do we need a source qualifier?
Simply, it performs a select statement; the select statement fetches the data in the form of rows. The source qualifier selects the data from the source table and identifies the records from the source.
A parameter file supplies the values to session-level variables and mapping-level variables.
Variables are of two types: session-level variables and mapping-level variables.
Flat File
A flat file is a collection of data in a file in a specific format. Informatica can support two types of flat files: delimited and fixed width.
For a delimited file we need to specify the separator, such as a comma, period, /, |, etc.
For a fixed-width file we need to know the format first, i.e., how many characters to read for a particular column.
For a delimited file it is also necessary to know the structure, because of the headers; if the file contains a header, then in the definition we need to skip the first row.
List file:
If you want to process multiple files with the same structure, we do not need multiple mappings and multiple sessions. We can use one mapping and one session using the list file option.
First, we need to create the list file for all the files; then we can use this list file in the main mapping.
Aggregator Transformation:
Transformation type: Active and Connected
The Aggregator transformation performs aggregate calculations, such as averages and sums. The Aggregator
transformation is unlike the Expression transformation, in that you use the Aggregator transformation to perform
calculations on groups. The Expression transformation permits you to perform calculations on a row-by-row basis only.
Components of the Aggregator Transformation:
The Aggregator is an active transformation, changing the number of rows in the pipeline. The Aggregator transformation
has the following components and options
Aggregate cache: The Integration Service stores data in the aggregate cache until it completes aggregate calculations. It
stores group values in an index cache and row data in the data cache.
Aggregate expression: Enter an expression in an output port. The expression can include non-aggregate expressions and conditional clauses.
Group by port: Indicate how to create groups. The port can be any input, input/output, output, or variable port. When grouping data, the Aggregator transformation outputs the last row of each group unless otherwise specified.
Sorted input: Select this option to improve session performance. To use sorted input, you must pass data to the
Aggregator transformation sorted by group by port, in ascending or descending order.
Aggregate Expressions:
The Designer allows aggregate expressions only in the Aggregator transformation. An aggregate expression can include
conditional clauses and non-aggregate functions. It can also include one aggregate function nested within another
aggregate function, such as:
MAX (COUNT (ITEM))
The result of an aggregate expression varies depending on the group by ports used in the transformation.
Aggregate Functions
Use the following aggregate functions within an Aggregator transformation. You can nest one aggregate function within another aggregate function. The transformation language includes the following aggregate functions:
AVG, COUNT, FIRST, LAST, MAX, MEDIAN, MIN, PERCENTILE, STDDEV, SUM, VARIANCE
When you use any of these functions, you must use them in an expression within an Aggregator transformation.
Tips: Use sorted input to decrease the use of aggregate caches.
Sorted input reduces the amount of data cached during the session and improves session performance. Use this option
with the Sorter transformation to pass sorted data to the Aggregator transformation.
Limit connected input/output or output ports.
Limit the number of connected input/output or output ports to reduce the amount of data the Aggregator transformation
stores in the data cache.
Filter the data before aggregating it.
If you use a Filter transformation in the mapping, place the transformation before the Aggregator transformation to reduce
unnecessary aggregation.
Normalizer Transformation: Transformation type: Active/Connected
The Normalizer transformation receives a row that contains multiple-occurring columns and returns a row for each
instance of the multiple-occurring data.
The Normalizer transformation parses multiple-occurring columns from COBOL sources, relational tables, or other sources.
It can process multiple record types from a COBOL source that contains a REDEFINES clause.
The Normalizer transformation generates a key for each source row. The Integration Service increments the generated key
sequence number each time it processes a source row. When the source row contains a multiple-occurring column or a
multiple-occurring group of columns, the Normalizer transformation returns a row for each occurrence. Each row contains
the same generated key value.
SQL Transformation: Transformation type: Active/Passive/Connected
The SQL transformation processes SQL queries midstream in a pipeline. You can insert, delete, update, and retrieve rows
from a database. You can pass the database connection information to the SQL transformation as input data at run time.
The transformation processes external SQL scripts or SQL queries that you create in an SQL editor. The SQL transformation
processes the query and returns rows and database errors.
For example, you might need to create database tables before adding new transactions. You can create an SQL
transformation to create the tables in a workflow. The SQL transformation returns database errors in an output port. You
can configure another workflow to run if the SQL transformation returns no errors.
When you create an SQL transformation, you configure the following options:
Mode. The SQL transformation runs in one of the following modes:
Script mode. The SQL transformation runs ANSI SQL scripts that are externally located. You pass a script name to the
transformation with each input row. The SQL transformation outputs one row for each input row.
Query mode. The SQL transformation executes a query that you define in a query editor. You can pass strings or parameters
to the query to define dynamic queries or change the selection parameters. You can output multiple rows when the query
has a SELECT statement.
Database type. The type of database the SQL transformation connects to.
Connection type. Pass database connection information to the SQL transformation or use a connection object.
Script Mode
An SQL transformation running in script mode runs SQL scripts from text files. You pass each script file name from the source to the SQL transformation Script Name port. The script file name contains the complete path to the script file.
When you configure the transformation to run in script mode, you create a passive transformation. The transformation
returns one row for each input row. The output row contains results of the query and any database error.
When the SQL transformation runs in script mode, the query statement and query data do not change. When you need to run different queries in script mode, you pass the scripts in the source data. Use script mode to run data definition queries such as creating or dropping tables.
When you configure an SQL transformation to run in script mode, the Designer adds the Script Name input port to the
transformation
An SQL transformation configured for script mode has the following default ports:
Port Type Description
Script Name Input Receives the name of the script to execute for the current row.
Script Result Output Returns PASSED if the script execution succeeds for the row. Otherwise contains FAILED.
Script Error Output Returns errors that occur when a script fails for a row.
Script Mode Rules and Guidelines
Use the following rules and guidelines for an SQL transformation that runs in script mode:
You can use a static or dynamic database connection with script mode.
To include multiple query statements in a script, you can separate them with a semicolon.
You can use mapping variables or parameters in the script file name
The script code page defaults to the locale of the operating system. You can change the locale of the script.
You cannot use scripting languages such as Oracle PL/SQL or Microsoft/Sybase T-SQL in the script.
You cannot use nested scripts where the SQL script calls another SQL script
A script cannot accept run-time arguments.
The script file must be accessible by the Integration Service. The Integration Service must have read permissions on
the directory that contains the script. If the Integration Service uses operating system profiles, the operating
system user of the operating system profile must have read permissions on the directory that contains the script.
The Integration Service ignores the output of any SELECT statement you include in the SQL script. The SQL transformation in script mode does not output more than one row of data for each input row.
Query Mode:
When an SQL transformation runs in query mode, it executes an SQL query that you define in the transformation. You pass
strings or parameters to the query from the transformation input ports to change the query statement or the query data.
When you configure the SQL transformation to run in query mode, you create an active transformation. The
transformation can return multiple rows for each input row.
Create queries in the SQL transformation SQL Editor. To create a query, type the query statement in the SQL Editor main
window. The SQL Editor provides a list of the transformation ports that you can reference in the query.
You can create the following types of SQL queries in the SQL transformation:
Static SQL query. The query statement does not change, but you can use query parameters to change the data. The
Integration Service prepares the query once and runs the query for all input rows.
Dynamic SQL query. You can change the query statements and the data. The Integration Service prepares a
query for each input row.
When you create a static query, the Integration Service prepares the SQL procedure once and executes it for each row. When you create a dynamic query, the Integration Service prepares the SQL for each input row. You can optimize performance by creating static queries.
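As an illustration, a static query typed in the SQL editor binds input ports with question marks; the port name EMPNO here is an assumption:
SELECT ename, sal
FROM   emp
WHERE  empno = ?EMPNO?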
Query Mode Rules and Guidelines
Use the following rules and guidelines when you configure the SQL transformation to run in query mode:
The number and the order of the output ports must match the number and order of the fields in the query SELECT
clause.
The native datatype of an output port in the transformation must match the datatype of the corresponding
column in the database. The Integration Service generates a row error when the datatypes do not match.
When the SQL query contains an INSERT, UPDATE, or DELETE clause, the transformation returns data to the SQL Error port, the pass-through ports, and the Num Rows Affected port when it is enabled. If you add output ports, the ports receive NULL data values.
When the SQL query contains a SELECT statement and the transformation has a pass-through port, the transformation returns data to the pass-through port whether or not the query returns database data. The SQL transformation returns a row with NULL data in the output ports.
You cannot add the "_output" suffix to output port names that you create.
You cannot use the pass-through port to return data from a SELECT query.
When the number of output ports is more than the number of columns in the SELECT clause, the extra ports receive a NULL value.
When the number of output ports is less than the number of columns in the SELECT clause, the Integration Service generates a row error.
You can use string substitution instead of parameter binding in a query. However, the input ports must be string datatypes.
Level-1 (Mappings)
1) Difference between join and lookup and Source transformation?
2) What are the different types of lookup in CDI (IICS) ?
3) Difference between Connected and Unconnected lookup?
4) What are the active and passive transformations?
5) What are the different methods to perform remove duplicates in CDI?
6) What is indirect file load and how can we implement that in IICS?
7) How will you read Source JSON file in IICS?
8) Describe Rank, Aggregator, Normalizer transformation?
9) IIF vs Decode function in the expression?
10) Router vs filter in IICS?
11) How to reset sequence generator when we migrate from DEV to QA?
12) Union vs File list?
13) What is SQL override and Lookup Override?
14) How to execute UNIX/Power shell/python commands in IICS Mapping
15) What is the biggest mapping you handled as a developer? (SCD TYPE 2)
16) Data cache and index cache in Join transformer
17) Hierarchical parser vs structural parser
18) Types of parameters in mapping (input and INOUT parameters) and its usage
19) SUBSTR, INSTR, ERROR, LKP, DATE functions
20) What is gcid in the normalizer
21) SCD Type 1, 2, 3
22) Mapping level Performance tuning
23) Web service consumer transformation
24) fatal and nonfatal errors
25) Exception handling and user-defined errors
26) Types of caches
1. Data Cache
2. Index Cache
3. Static cache
4. Dynamic cache
5. Persistent Cache
6. Re cache (Refill Cache)
7. Shared Cache
27. How to call Stored Procedure in IICS?
28. How to call unconnected lookup object in IICS?
29. How to return multiple values from unconnected lookup?
30. What is incremental load, what are the different approaches to implement that
31. Difference between Upsert and data driven
32. In join transformation, which object will be master and which object will be details, based on
what metrics we decide that?
33. How to convert Rows into columns in IICS?
34. Difference between REST V2 connection and Web service consumer
35. How to create business service and how to use in IICS
36. How to pass multiple rows to input request for web service call
37. How to do decrypt and encrypt the PGP encrypted source flat file?
38. How to copy/move/remove files from one folder to another folder using file processor
connection?
39. Can we move the file to SFTP location using IICS connections?
40. Can we use command in source instead of file list
Level-2 (Mapping Tasks, Synchronization, Replication, Mass Ingestion)
1) How you implement performance tuning in the Informatica mapping Tasks
2) Error Handling mechanism in data integration
3) What is the mapping task?
4) How to schedule the Mapping
5) What is the blackout period in the schedule?
6) What is parameter file and how you use in the mapping
7) How to enable verbose mode in Informatica data integration
8) What is cross-schema pushdown optimization
9) Tell me below advanced session properties:
-> Rollback transactions on error
-> Commit on the end of file
-> Recovery Strategy
-> DTM process
-> Incremental Aggregation
-> Pushdown Optimization
-> Session Retry on deadlock
-> Stop on error
10) Difference between Linear task flow and Task flow
11) Limitations of Data synchronization task
12) Use of Replication task
13) What is incremental load, full load, and initial load
14) How to perform upsert in Informatica mapping and required constraints to implement
15) How to run Pre and Post SQL commands, Pre and Post Processing commands
16) Difference between Synchronization vs Replication Task
17) Different types of mass ingestion tasks
18) What is Data Masking
19) How to configure mapplets in IICS
20) Use of control table in ETL
21) Explain the below components in task flow
22) How to call CAI process in DI job
23) How to read parameter file values into MCT
24) How to execute python /Unix /PowerShell script using Command task (windows Secure agent)
25) How to execute multiple mapping task instances simultaneously
26) What is the use of Stop on Error property in MCT
27) What is the use of email notification option in MCT
28) How to use fixed width delimited file in source
29) How to create Business service and its use case
30) What is the use of hierarchical schema, can we do create without schema
31) How to send an email using notification step in task flow
32) Can we send output response to Task flow
33) How to send variables data to mapping columns in task flow
34) How to get values from mapping columns to task flow variables?
35) How to implement custom error handling in task flow
36) How to do audit logging in IICS with data task output response variables?
37) How to Trigger the task flow based on file event
38) How to create file listener and how to trigger task flow
39) when we use file event and when we use schedule?
40) How can we trigger IICS task flow using third party scheduler?
41) What is include dependency check in assets export
42) Best practices for IICS code migrations (export and import)
43) How to implement versioning in IICS
44) What are asset-level permissions and how to use them (ACL)
45) Different types of semi-structured data and how to read in IICS
6) Is look up active or passive? Why?
7) Different caches in lookup?
8) Difference between connected & unconnected lookup?
9) Difference between dynamic cache & static cache?
10) Difference between SQL override and lookup override?
11) Explain SCD type 2?
12) What is the difference in functionality of SCD type1 & type 2?
13) Difference between Filter & router? Which gives better performance?
14) Why is router called active transformation?
15) Why is Union called active transformation?
16) How to improve performance of lookup?
17) How to improve performance of Joiner?
18) How to improve performance of Aggregator?
19) What would happen if I forgot to select group by column in Aggregator? What will be the output?
20) Can I generate a repeatable sequence in Sequence Generator?
21) What will be the output if I don't connect the Nextval column from the Sequence Generator and connect
only the Currval column?
22) What are different override options available in Informatica?
23) Can I join heterogeneous databases in source qualifier?
24) How can I make my records distinct in mapping? Explain different ways?
25) What are different tracing levels in Informatica?
26) What is transaction boundary?
27) What are different criteria to identify a transformation is active or passive?
28) Explain partitioning? Explain the different types of partitioning.
29) What is pushdown optimization?
30) What is Indirect file loading? Explain.
31) Explain Transactional Control transformation?
32) Can we find Dense rank using Rank transformation? If no then how can we find Dense rank in
Informatica?
33) What is the difference between STOP & ABORT?
34) How to load file name in the target table?
35) Can we create a mapping without Source Qualifier?
36) What are the different ways to set insert type (insert or update or delete) in mapping level &
session level?
37) I have used Update strategy in mapping, but my records are only getting inserted not getting
updated or rejected. What may be the reasons for this?
38) What is Incremental load & how can we achieve this? Explain.
39) How can I join different heterogeneous sources?
40) What is mapplet? What is worklet?
41) Difference between workflow and worklet?
42) Between lookup & joiner which is better?
43) What are the different ways by which we can generate sequence of numbers in Informatica?
Explain.
44) What is the most effective way to do distinct of records in Informatica?
45) Difference between mapping Variable & variable in expression transformation?
46) What is MD5 function? Explain SCD type 2 using MD5 function?
47) I want a session to run when the previous session has completed N runs. How to implement this in
Informatica?
48) In filter transformation instead of giving condition I gave a random number by mistake. What will
be the output? Will the mapping run?
SQL Interview Questions:
Informatica (Q&A)
1.What is the difference between Informatica PowerCenter and Informatica Cloud?
Informatica Intelligent Cloud Services is a cloud-based integration platform (iPaaS). IICS helps you
integrate and synchronize all data and applications residing in your on-premises and cloud environments.
It provides similar functionality as PowerCenter in a better way and can be accessed via the internet.
Hence in IICS, there is no need to install any client applications on the personal computer or server. All
the supported applications can be accessed from the browser and the tasks can be developed through
browser UI. In PowerCenter, the client applications need to be installed on your server.
2.What is a Runtime environment?
A Runtime environment is the execution platform that runs data integration or application
integration tasks. You must have at least one runtime environment set up to run tasks in your
organization. Basically, it is the server upon which your data gets staged while processing. You can
choose either to process via the Informatica servers or your local servers, which stay behind your
firewall. There are two types of runtime environments: the Informatica Cloud Hosted Agent and the
Informatica Cloud Secure Agent.
3.What is a Synchronization task?
Synchronization task helps you synchronize data between a source and target. A Synchronization task
can be built easily from the IICS UI by selecting the source and target without use of any
transformations like in mappings. You can also use expressions to transform the data according to your
business logic or use data filters to filter data before writing it to targets and use lookup data from
other objects and fetch a value. Anyone without PowerCenter mapping and transformation knowledge
can easily build synchronization tasks as UI guides you step by step.
4.What is a Replication task?
A Replication task allows you to replicate data from a database table or an on-premises application to a
desired target. You can choose to replicate all the source rows or only the rows that changed since the
last run of the task, using the built-in incremental processing mechanism of the Replication task.
You can choose from three different type of operations when you replicate data to a target.
→ Incremental load after initial full load
→ Incremental load after initial partial load
→ Full load each run
5.What is the difference between a Synchronization task and Replication task?
In Synchronization task you must have a target to integrate data. However, a Replication task can
create a target for you. A Replication task can replicate an entire schema and all the tables in it at a
time which is not possible in Synchronization task. A Replication task comes with a built-in incremental
processing mechanism. In Synchronization task user needs to handle the incremental data processing.
6.Where does the metadata gets stored in Informatica Cloud (IICS)?
All the metadata gets stored in the Cloud server/repository. Unlike PowerCenter, all the information in
Informatica Cloud is stored on the server maintained by the Informatica and the user does not have
access to the repository database. Hence, it is not possible to use any SQL query on metadata tables to
retrieve the information like in Informatica PowerCenter.
7.What metadata information gets stored in the Informatica Cloud (IICS) repository?
Source and Target Metadata: Metadata information of each source and target including the field
names, datatype, precision, scale, and other properties.
Connection Information: The connection information to connect specific source and target systems in
an encrypted format.
Mappings: All the Data integration tasks built, their dependences and rules are stored.
Schedules: The schedules created to run the tasks built in IICS are stored.
Logging and Monitoring information: The results of all the jobs are stored.
8.What is a Mapping Configuration task?
A Mapping Configuration Task or Mapping Task is analogous to a session in Informatica PowerCenter.
You can define parameters that associate with the mapping. Define pre- and post-processing
commands. Add advanced session properties to boost the performance and configure the task to run on a
schedule.
9.What is a task flow in Informatica Cloud?
A Task flow is analogous to a workflow in Informatica PowerCenter. A task flow controls the execution
sequence of a mapping configuration task, or a synchronization task based on the output of the
previous task.
10.What is the difference between a Task flow and Linear Task flow?
A Linear task flow runs the tasks one by one, serially, in the order defined in the task. If a task defined in
a linear task flow fails, you need to restart the entire task flow. A task flow allows you to run tasks in
parallel and provides advanced decision-making capabilities.
11.Can we run Powercenter jobs in Informatica cloud?
Yes. There is a PowerCenter task available in Informatica Cloud, wherein the user must upload the XML file
exported from PowerCenter into Data Integration and run the job as a PowerCenter task. You can update
an existing PowerCenter task to use a different PowerCenter XML file but cannot make changes to an
imported XML. When you upload a new PowerCenter XML file to an existing PowerCenter task, the
PowerCenter task deletes the old XML file and updates the PowerCenter task definition based on new
XML file content.
12.How does a update strategy transformation work in Informatica Cloud?
There is no Update Strategy transformation available in Informatica Cloud. In the target transformation
in a mapping, Informatica Cloud Data Integration provides the option for the action to be performed on
the target – Insert, Update, Upsert, Delete and Data Driven.
13.What is the difference between a Union transformation in Informatica Cloud vs Informatica
Powercenter?
In earlier versions of Informatica Cloud, the Union transformation allows only two groups to be defined
in it. Hence if three different source groups need to be mapped to target, the user must use two Union
transformations. The output of first two groups to Union1. The output of Union1 and group3 to
Union2.
In the latest version, Informatica Cloud supports multiple input groups, so all the input groups can be
handled in a single Union transformation.
14.What is Dynamic Linking?
Informatica Cloud Data Integration allows you to create new target files/tables at runtime. This
feature can only be used in mappings. In the target, choose the Create New at Runtime option.
The user can choose a static filename which will be replaced by a new file every time the mapping runs
with the same name. The user can also choose to create a Dynamic filename so that every time the
mapping runs, a file is created with new name.
15.In what format can you export a task present in Informatica Cloud?
Informatica Cloud Data Integration supports exporting the tasks as a zip file where the metadata gets
stored in the JSON format inside the zip file. However, you can also download an XML version of the
tasks, which can be imported as workflows in PowerCenter. Bulk export of tasks in XML format is not
supported, whereas you can export multiple tasks as JSON in a single export zip file.
16.How do you read JSON Source file in IICS?
JSON files are read using the Hierarchy Parser transformation present in IICS. The user needs to define
a Hierarchical Schema that defines the expected hierarchy of output data in order to read a JSON file
through Hierarchy Parser. The Hierarchy Parser Transformation can also be used to read XML files in
Informatica Cloud Data Integration
17.What is a Hierarchical Schema in IICS?
A Hierarchical Schema is a component where user can upload an XML or JSON sample file that define
the hierarchy of output data. The Hierarchy Parser transformation converts input based on the
Hierarchical schema that is associated with transformation.
18.What is Indirect File loading and how to perform Indirect loading in IICS?
The processing of multiple source files having same structure and properties in a sequential manner in
a mapping is called Indirect File Loading. Indirect loading in IICS can be performed by selecting the File
List under Source Type property of a source transformation.
19.What are the parameter types available in the Informatica Cloud?
IICS supports two types of parameters.
Input Parameter: Like a parameter in Powercenter. The parameter value remains constant as the value
defined in MCT or a Parameter file.
In-Out Parameter: Like a variable in Powercenter. The In-out parameter can be a constant or change
values within a single task run.
20.How many Status states are available in IICS monitor?
The various status states available in IICS are
Starting: Indicates that the task is starting.
Queued: There is a predefined number set which controls how many tasks can run together in your IICS
org. If the value is set to two and if two jobs are already running, the third task you trigger enters into
Queued state.
Running: The job enters the Running status from Queued status once the task is triggered completely.
Success: The task completed successfully without any issues.
Warning: The task completed with some rejects.
Failed: The task failed due to some issue.
Stopped: The parent job has stopped running, so the subtask cannot start. Applies to subtasks of
replication task instances.
Aborted: The job was aborted. Applies to file ingestion task instances.
Suspended: The job is paused. Applies to taskflow instances.
21.When Source is parameterized in a Cloud mapping, the source transformation fields would be
empty. Then how does the fields get propagated from source to the downstream transformations in
source parameterized mappings?
To propagate the fields to downstream transformations when source is parameterized, initially create
the mapping with actual source table. In the downstream transformation after source, select the Field
Selection Criteria as Named Fields and include all the source fields in the Incoming Fields section of the
transformation. Then change the source object to a parameter. This way the source fields are still
retained in the downstream transformation even when the fields are not available in source
transformation after the source is parameterized.
22.To include all incoming fields from an upstream transformation except those with dates, what
should you do?
Configure two field rules in a transformation. First, use the All-Fields rule to include all fields. Then,
create a Fields by Datatypes rule to exclude fields by data type and select Date/Time as the data type
to exclude
23.What are Preprocessing and postprocessing commands in IICS?
The Preprocessing and postprocessing commands are available in the Schedule tab of tasks to perform
additional jobs using SQL commands or Operating system commands. The task runs preprocessing
commands before it reads the source. It runs postprocessing commands after it writes to the target.
The task fails if any command in the preprocessing or postprocessing scripts fails.
24.What are Field Name conflicts in IICS and how can they be resolved?
When there are fields with same name coming from different transformations into a downstream
transformation, the cloud mapping designer generates a Field Name Conflict error. You can either
resolve the conflict by renaming the fields in the upstream transformation itself or you can create a
field rule in downstream transformation to Bulk Rename fields by adding a prefix or a suffix to all
incoming fields.
25.What system variables are available in IICS to perform Incremental Loading?
IICS provides access to following system variables which can be used as a data filter variable to filter
newly inserted or updated records.
$LastRunTime returns the last time when the task ran successfully.
$LastRunDate returns only the last date on which the task ran successfully. The values of $LastRunDate
and $LastRunTime get stored in the Informatica Cloud repository/server and it is not possible to override
the values of these parameters.
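For example, a source filter (or custom query) for incremental extraction could look like the sketch below; the table and column names are assumptions:
SELECT o.order_id, o.order_status, o.last_modified_ts
FROM   orders o
WHERE  o.last_modified_ts > $LastRunTime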
26.What is the difference between the connected and unconnected sequence generator
transformation in Informatica Cloud Data Integration?
Sequence generator can be used in two different ways in Informatica cloud. One with Incoming fields
disabled and the other with incoming fields not disabled.
The difference between the sequence generator with incoming fields enabled and disabled is, when
NEXTVAL field is mapped to multiple transformations,
→ Sequence generator with incoming fields not disabled will generate same sequence of numbers for
each downstream transformation.
→ Sequence generator with incoming fields disabled will generate Unique sequence of numbers for
each downstream transformation.
27.Explain Partitioning in Informatica Cloud Data Integration.
Partitioning is nothing but enabling the parallel processing of the data through separate pipelines. With
the Partitioning enabled, you can select the number of partitions for the mapping. The DTM process
then creates a reader thread, transformation thread and writer thread for each partition allowing the
data to be processed concurrently, thereby reducing the execution time of the task. Partitions can be
enabled by configuring the Source transformation in mapping designer.
There are two major partitioning methods supported in Informatica Cloud Data Integration.
1. Key Range Partitioning distributes the data into multiple partitions based on the partitioning key
selected and the range of values defined for it. You must select a field as the partitioning key and define
the start and end ranges of the values.
2. Fixed Partitioning can be enabled for sources which are not relational or do not support key range
partitioning. You must select the number of partitions by passing a value.
28.How to pass data from one mapping to other in Informatica Cloud Data Integration?
The data can be passed from one Mapping task to another in Informatica Cloud Data Integration
through a Task flow using parameters. The Mapping Task which passes the data should have an In-Out
Parameter defined using SetVariable functions. The Mapping Task which receives the data should
either have an Input parameter or an In-Out Parameter defined in the mapping to read the data passed
from upstream task.
SQL Interview Questions
1.What is SQL?
SQL stands for Structured Query Language; it is also called SEQUEL.
2.List out the sub-languages of SQL?
There are 5 sub-languages: DDL, DML, DRL/DQL, TCL, and DCL.
3.What is the difference between char and varchar2?
Char is fixed size, and varchar2 is not fixed size (it is variable length).
4.What is projection?
Selecting specific columns is projection.
5.How can we filter the rows in a table using the Where, Group By, Having, and Order By clauses?
Select deptno, sum(sal) from emp where ename <> ‘KING’ Group By deptno Having sum(sal) > 9000
Order By sum(sal) DESC;
6.What is a column alias?
Providing an alternate name to a column. This is not permanent.
7.Can we perform an arithmetic operation by using dual?
Select 10 + 20 Result from dual;
8.What is dual table?
The dual table is a dummy table used to evaluate expressions and calculations. It has one column and one row, with the value 'X'.
9.Write a query to display current date along with HH:MI:SS?
Select To_Char(sysdate, 'DD-MON-YYYY HH:MI:SS') from dual;
10.Write a query to see the current date?
Select sysdate from dual;
11.Which operator is used to accept values from the user?
The & (ampersand) substitution operator, e.g., select * from emp where deptno = &deptno;
12.How can you see all the tables which are in the database?
Select * from tab;
13.Which command is used to remove a table from the database?
The Drop command is used to remove the table.
14.What is the difference between the delete and truncate commands?
Delete removes rows from the table and can be rolled back. Truncate removes all rows, and a rollback is not possible.
15.Which operator is used to retrieve the rows based on null values?
IS NULL
16.In how many ways can we remove all the rows from the table?
There are two ways: Delete and Truncate.
17.How can we create a copy of a table?
Create table emp1 AS select * from emp;
Create table emp2 AS select * from emp where deptno = 30;
18.Write a query to display the no. of rows in the table?
By using count(*): Select count(*) from emp;
19.What is the difference between count(*) and count(expr)?
count(*) counts all rows in the table; count(expr) counts only the rows where that column or expression is not null.
20.What is the difference between group functions and scalar functions?
Group functions act on a group of rows (the whole table or each group), whereas scalar functions act on one row at a time.
21.What is the use of the Group By clause?
The Group By clause divides the rows into several groups.
22.How can we filter the groups created by the Group By clause?
The Having clause is used to filter the grouped data.
23.Which clause is used to arrange the rows in the table?
The Order By clause is used to arrange (sort) the rows.
24.Which clause should be the last clause of the query?
The Order By clause.
25.What is a TOAD?
Tool for Oracle Application Development.
26.What is the need for integrity constraints?
Constraints are rules which are applied on tables.
27.List out types of constraints?
They are 5 types NOT NULL, UNIQUE, PRIMARY KEY, FOREIGN KEY, and CHECK.
28.At how many levels can constraints be created?
There are two levels, i.e., column level and table level.
29.Which constraint can be created only at the column level?
The NOT NULL constraint.
30.Does the NOT NULL constraint accept duplicate values?
Yes.
31.Which constraint is used to make every row in the table unique?
Primary key.
32.What is composite primary key?
When a primary key is applied on multiple columns, it is called a composite primary key. A composite primary key can be applied only at the table level.
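A minimal sketch with a hypothetical ORDER_ITEMS table:
CREATE TABLE order_items (
  order_id NUMBER,
  line_no  NUMBER,
  qty      NUMBER,
  CONSTRAINT pk_order_items PRIMARY KEY (order_id, line_no)  -- table-level composite key
);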
33.Can a table have two primary keys?
It is not possible.
34.What is a foreign key constraint? Explain.
A foreign key establishes a parent table and child table relationship.
35.Can we establish a parent and child relationship without having a constraint in the parent table?
No.
36.Can you explain the foreign key behavior with the ON DELETE CASCADE and ON DELETE SET NULL constraints?
The foreign key column in the child table will only accept values which are in the primary key column or unique column of the parent table.
With ON DELETE CASCADE, we can delete rows from the parent table and the corresponding child table rows are deleted automatically.
With ON DELETE SET NULL, when we delete a row from the parent table, the corresponding child values are changed to null.
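A minimal sketch with hypothetical DEPT_P (parent) and EMP_C (child) tables:
CREATE TABLE dept_p (deptno NUMBER PRIMARY KEY, dname VARCHAR2(20));
CREATE TABLE emp_c (
  empno  NUMBER PRIMARY KEY,
  deptno NUMBER CONSTRAINT fk_emp_dept
         REFERENCES dept_p (deptno) ON DELETE CASCADE  -- or ON DELETE SET NULL
);
-- With ON DELETE CASCADE, deleting a DEPT_P row deletes its EMP_C rows;
-- with ON DELETE SET NULL, the EMP_C.DEPTNO values are set to null instead.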
37.Does every constraint have a constraint name?
Yes; if we do not provide a name, the database generates one (e.g., SYS_Cn in Oracle).
38.How can you know the constraint names and constraint types applied to a table?
By using the USER_CONSTRAINTS data dictionary view.
39.Is there any difference when a constraint is created at column level or table level?
No difference.
40.Can you provide a user-defined constraint name?
Yes.
41.What are data dictionary tables?
Predefined tables maintained by the database that store metadata, such as the user constraint views.
42.What is the need for a join?
To retrieve data from multiple tables.
43.What is an EQUI join?
When tables are joined based on a common column with an equality condition, it is called an EQUI join.
44.How many conditions are required to join 'n' tables?
We need n-1 join conditions.
45.How can we display matching as well as non-matching rows?
By using outer joins.
46.What is outer join operator?
(+)
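For example, on the standard EMP and DEPT tables:
SELECT e.ename, d.dname
FROM   emp e, dept d
WHERE  e.deptno = d.deptno;        -- EQUI join: only matching rows
SELECT e.ename, d.dname
FROM   emp e, dept d
WHERE  e.deptno (+) = d.deptno;    -- outer join: all departments, even those with no employees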
47.What is a Cartesian product?
All possible combinations of rows, where every row of one table is matched with every row of the other table.
48.What is the difference between union and union all?
The UNION set operator displays only distinct values, whereas UNION ALL displays all values, including duplicates.
49.What are pseudo columns?
Pseudo columns behave like table columns but are not stored in the table; for example, ROWNUM is a pseudo column which starts with 1 and increments by 1.
50.Write a query to display the first n rows from the table?
Select rownum, empno, ename, sal, deptno from emp where rownum <= 5;
51.What are the differences between rownum and rowid?
Rownum values start with 1 and increment by one, whereas rowids are hexadecimal values.
Rownum values are temporary; rowid values are permanent.
Rownum values are generated when the query is executed; rowid values are generated when the row is created or inserted.
53.Write a query to display the fifth highest salary?
Select * from (select * from emp order by sal desc) where rownum <= 5
Minus
Select * from (select * from emp order by sal desc) where rownum <= 4;
54.Explain about correlated subquery?
When a subquery is executed once for each row of the parent query, it is called a correlated subquery.
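For example, employees who earn more than the average salary of their own department:
SELECT e1.ename, e1.sal, e1.deptno
FROM   emp e1
WHERE  e1.sal > (SELECT AVG(e2.sal)
                 FROM   emp e2
                 WHERE  e2.deptno = e1.deptno);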
55.What are multiple-row operators?
IN, ANY, ALL
56.Explain scalar subquery?
When we use a subquery in the SELECT clause, it is called a scalar subquery.
57.Explain inline view?
When a subquery is used in the FROM clause, it is called an inline view.
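For example:
-- scalar subquery in the select clause
SELECT e.ename, e.sal,
       (SELECT d.dname FROM dept d WHERE d.deptno = e.deptno) AS dname
FROM   emp e;
-- inline view in the from clause
SELECT *
FROM   (SELECT empno, ename, sal FROM emp ORDER BY sal DESC)
WHERE  rownum <= 3;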