Crack The Interview
This Informatica Interview Questions blog covers all the core concepts from basic to advanced level. Use these questions to improve your chances of being hired in your next interview.
Self introduction
Hi!
I am an M.Tech graduate with a total of 3.8 years of experience in Informatica and IICS (cloud). For the last year, I have had the opportunity to work on an IICS project.
Our client here is Takeda Pharmaceutical Company (visit www.takeda.com for more info).
Here I am involved in various activities related to ETL (extract, transform, load) processes. Here are some of my key roles and responsibilities:
Regularly engage with business analysts and data modelers to comprehend project
requirements.
Ensure a clear understanding of the data sources provided in the form of tables and flat files.
Develop logical processes for loading data incrementally using mapping variables.
Design and implement ETL mappings for both historical and incremental loading based on
project requirements.
Design and develop complex ETL mappings for handling Type-1 and Type-2 dimensions.
Create mappings for complex fact tables, adhering to business logic and client requirements.
Conduct unit testing to ensure the accuracy and efficiency of the mappings.
Participate in peer reviews of team members' mappings to ensure quality and adherence to best
practices.
Performance tuning:
Migrate code from individual user folders to the project folder for better organization.
Analyze and address defects identified by the project team to maintain the integrity of the ETL
processes.
Implement a consolidated workflow framework job that includes sessions for staging,
dimension, and fact tables.
Schedule and automate the framework job to run daily, ensuring seamless data flow from
source to stage, stage to dimension, and finally to the fact table.
1. What is Enterprise Data Warehousing?
Enterprise Data Warehousing is the creation of the organization's data at a single point of access. The data is globally accessible and viewed through a single source, since the server is linked to this single source. It also includes periodic analysis of the source.
2. What is the meaning of Lookup transformation?
The Lookup transformation is used to look up relevant data in a source qualifier, a target, or other sources. Many types of objects can be searched with the Lookup transformation, for example flat files, relational tables, synonyms, or views. The Lookup transformation can be configured as active or passive, and it can be either connected or unconnected. Multiple Lookup transformations can be used in a mapping. The lookup condition compares the lookup port values with the input port values in the mapping.
The following are the different types of ports with which the lookup transformation is created:
1. Input port
2. Output port
3. Lookup ports
4. Return port
3. What are the points of difference between connected lookup and unconnected lookup?
A connected lookup takes its input directly from the other transformations and participates in the data flow. An unconnected lookup is just the opposite: instead of taking input from other transformations, it receives values from the result of a :LKP expression.
A connected Lookup cache can be either static or dynamic, but an unconnected Lookup cache cannot be dynamic. The first one can return multiple output ports, but the latter returns only one output port. User-defined default values are supported in a connected lookup but are not supported in an unconnected lookup.
Any number of input parameters can be included in an unconnected lookup; however, no matter how many parameters are passed in, only one value is returned. For example, parameters like column 1, column 2, column 3, and column 4 can be passed into an unconnected lookup, but there is only one return value.
Informatica lookup caches can be of different types, such as static or dynamic, and persistent or non-persistent. Here are the names of the caches:
1. Static Cache
2. Dynamic Cache
3. Persistent Cache
4. Shared Cache
5. Recache from Database
8. What is the difference between a data warehouse, a data mart, and a database?
A data warehouse consists of many kinds of data consolidated from across the organization. A database also consists of data, but the information in a database is smaller in scope than in a data warehouse. A data mart includes the data needed for a particular domain.
Examples - different data marts for different sections of an organization, such as sales, marketing, finance, etc.
9. What is a domain?
A domain is the main organizational point that undertakes all the interlinked and interconnected nodes and relationships. These links are governed mainly by one single point of the organization.
10. Cite the differences between a powerhouse server and a repository server.
The powerhouse server is the main governing server that helps in the integration process of
various different processes among the different factors of the server's database repository. On
the other hand, the repository server ensures repository integrity, uniformity, and consistency.
The total number of repositories created in Informatica mainly depends on the total number of ports of Informatica.
A session is partitioned in order to increase and improve the efficiency and operation of the server. Partitioning implies separate, parallel implementation sequences within the session.
Parallel processing further improves performance on the available hardware. Parallel processing is achieved by using partitioned sessions. The partitioning option of Informatica PowerCenter increases PowerCenter's performance through parallel data processing. It allows a large data set to be divided into smaller subsets, which are processed in parallel to get better session performance.
14. What are the different types of methods for the implementation of parallel processing
in Informatica?
There are different types of algorithms (partition types) that can be used to implement parallel processing. These are as follows: database partitioning, round-robin partitioning, hash auto-keys partitioning, hash user-keys partitioning, key range partitioning, and pass-through partitioning.
15. What are the different mapping design tips for Informatica?
16. What is the meaning of the word ‘session’? Give an explanation of how to combine
execution with the assistance of batches?
A session is a set of instructions that tells the Integration Service how to convert data from a source to a target. Usually, the session manager executes the session. To combine session executions, batches are used in two ways: serially or in parallel.
Any number of sessions can be grouped into one batch; however, for an easier migration process, it is better to keep the number of sessions in a batch small.
18. What is the difference between mapping parameters and mapping variables?
A mapping variable refers to a value that can change during the session's execution. On the other hand, a value that does not change during the session is called a mapping parameter. The mapping procedure explains how these mapping parameters are used. Values are best assigned to the mapping parameters before the session begins, as illustrated below.
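For illustration, values are typically supplied through a parameter file; the folder, workflow, and session names below are hypothetical:
[MyFolder.WF:wf_daily_load.ST:s_m_load_customers]
$$LOAD_START_DATE=2024-01-01
$$COUNTRY_CODE=US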
19. Explain Partitionings & types in performance tuning ?
Round-Robin Partitioning - With the aid of this, the Integration Service distributes data
evenly across all partitions. It is used when there is no need to group rows of data among
partitions.
Hash Auto-keys partitioning - The hash auto keys partition is used by the power center
server to group data rows across partitions. These grouped ports are used as a
compound partition by the Integration Service.
Hash User-Keys Partitioning - This type of partitioning is the same as hash auto-keys
partitioning, but here rows of data are grouped on the basis of a user-defined partition
key. You can individually choose the ports that define the key.
Key Range Partitioning - More than one type of port can be used to form a compound
partition key for a specific source with its aid, the key range partitioning. Each partition
consists of different ranges and data is passed based on the mentioned and specified
range by the Integration Service.
Pass-through Partitioning - Here, the data are passed from one partition point to
another. There is no distribution of data.
Source Qualifier - This includes extracting only the necessary data and keeping aside the
unnecessary data. It also includes limiting columns and rows. Shortcuts are mainly used
for the source qualifier. The default query options (for example, User Defined Join and
Filter) are preferable to a full source qualifier query override, since the latter does not
always allow the use of partitioning.
Expressions - Use local variables in order to limit the number of huge, repeated
calculations. Avoiding data type conversions and reducing calls to external code are
also part of tuning an expression. Using operators is better than using functions, as
numeric operations are faster than string operations.
Aggregator - Filtering the data is a necessity before the Aggregation process. It is also
important to use sorted input.
Filter - Use a Filter transformation and place it as close to the source as possible.
Sometimes multiple filters are needed, which can later be replaced by a Router.
Joiner - Join the data in the Source Qualifier wherever it is possible to do so. It is also
important to avoid outer joins. The source with fewer rows is more efficient when used
as the Master source.
Lookup - Replace large lookup tables with joins where possible, and review the database.
Also, add database indexes to the lookup columns. Lookups should return only those
ports that meet a particular condition.
1. Difficult requirements
2. Numerous transformations
3. Complex business logic
25. Which option helps in finding whether the mapping is correct or not?
The debugging option helps in judging whether the mapping is correct or not without really
connecting to the session.
1. ROLAP
2. HOLAP
A surrogate key is a substitute for the natural primary key. It is a unique identifier for each row, independent of the data itself.
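A minimal SQL sketch (table and column names are made up for illustration): cust_key is the surrogate key, while cust_id remains the natural key coming from the source.
CREATE TABLE customer_dim (
  cust_key  NUMBER PRIMARY KEY,   -- surrogate key, typically populated from a sequence
  cust_id   VARCHAR2(20),         -- natural (business) key from the source system
  cust_name VARCHAR2(100),
  eff_date  DATE                  -- effective date, useful for Type-2 history
);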
When the Power Centre Server transfers data from the source to the target, it is often guided by
a set of instructions and this is known as the session task.
A Command task allows one or more shell commands (in UNIX) or DOS commands (in Windows) to run while the workflow is running.
The type of command task that allows the shell commands to run anywhere during the workflow
is known as the standalone task.
The workflow includes a set of instructions that allows the server to communicate for the
implementation of tasks.
1. Task Designer
2. Task Developer
3. Workflow Designer
4. Worklet Designer
Source Definition
Session and session logs
Workflow
Target Definition
Mapping
ODBC Connection
1. Global Repositories
2. Local Repositories
Extraction, Transformation, and Loading (ETL) of the above-mentioned metadata are mainly performed through the PowerCenter Repository.
36. Name the scenario in which the Informatica server rejects files?
When the server faces a rejection from the Update Strategy transformation, it rejects the files. The database containing the information and data also gets disrupted. This is a rare scenario.
The Normalizer is an Active transformation which reads the data from COBOL files and VSAM
sources (Virtual Storage Access Method).
The Normalizer transformation acts like a Source Qualifier transformation while reading the data from COBOL files.
The Normalizer transformation can also convert each input record into multiple output records. This
is known as data pivoting.
Procedure:
Select the Mapping tab --> set the reader and writer connections, with target load type Normal.
To enable pushdown optimization: double-click the session --> select the Mapping tab from the left window --> select Pushdown Optimization.
Copy: changes to the original object do not reflect in the copy.
Shortcut: dynamically reflects the changes to the original object.
1. Reusable scheduler
2. Non Reusable scheduler
Reusable scheduler:
By default, we run the workflow manually. When we run the workflow through scheduling, this is called auto running.
The cache updates or changes dynamically when the lookup is performed on the target table.
The dynamic lookup transformation keeps the lookup cache image of the target table in memory synchronized with the physical table in the database.
The dynamic lookup transformation (dynamic lookup cache) operates only in connected mode (connected lookup).
A dynamic lookup cache supports only equality conditions (= conditions).
The transformation language provides two comment specifiers to let you insert comments in an expression:
Two dashes ( -- )
Two slashes ( // )
The PowerCenter Integration Service ignores all text on a line preceded by these two comment specifiers.
NewLookupRow = 0: the Integration Service does not update or insert the row in the cache.
Can’t be used with SQL override Can be used with SQL override
44. What is the difference between the variable port and the Mapping variable?
The following are the differences between variable port and Mapping variable:
45. Which is the transformation that builds only a single cache memory?
Rank builds two types of cache memory, but the Sorter always builds only one cache memory. The cache is also called a buffer.
Design mapping applications that first load the data into the dimension tables and then load the data into the fact table.
Load Rule: If all dimension table loadings are a success then load the data into the fact
table.
Load Frequency: Database gets refreshed on daily loads, weekly loads, and monthly
loads.
In a Snowflake Schema, a large denormalized dimension table is split into multiple normalized dimension tables.
Advantage: normalizing the dimension tables reduces data redundancy and storage space.
Disadvantage: queries need more joins across the normalized dimensions, which can slow down performance.
1. It can be used anywhere in the workflow, defined with link conditions, to notify the
success or failure of prior tasks.
2. Visible in Flow Diagram.
3. Email Variables can be defined with stand-alone email tasks.
Lookup T/R
Note: 'Prevent wait' is not available in every task; it is available only in the Event Wait task.
Relative time: the Timer task can start the timer from the start time of the Timer task, the start
time of the workflow or worklet, or the start time of the parent workflow.
A Timer task is mainly used for scheduling within a workflow.
Workflow at 11 AM --> Timer (11:05 AM) --> Absolute mode.
Workflow starts at any time; the Timer (5 mins) starts the next task 5 minutes later --> Relative mode.
The following are the differences between Filter T/R and Router T/R:
Filter T/R: tests rows against a single condition; rows that do not meet the condition are dropped.
Router T/R: tests rows against multiple conditions (user-defined groups); rows that do not match any group are routed to the default group.
It is a GUI-based administrative client that allows performing the following administrative tasks:
1. It is a GUI based client application that allows users to monitor ETL objects running an
ETL Server.
2. Collect runtime statistics such as:
o No. of records extracted.
o No. of records loaded.
o No. of records rejected.
o Fetch session log
o Throughput
59. If Informatica has its own scheduler why using a third-party scheduler?
The client uses various applications (for example, mainframes and Oracle Apps use the Tivoli scheduling tool); integrating those different applications and scheduling them is much easier using a third-party scheduler.
It is a GUI-based client that allows you to create the following ETL objects.
Session
Workflow
Scheduler
Session:
Workflow:
Workflow is a set of instructions that tells how to run the session tasks and when to run the
session tasks.
A data integration tool that combines the data from multiple OLTP source systems, transforms
the data into a homogeneous format and delivers the data throughout the enterprise at any
speed.
It is a GUI-based ETL product from Informatica corporation which was founded in 1993 in
Redwood City, California.
Informatica Analyzer.
Life cycle management.
Master data
Data Modeling:
Dimensional modeling consists of the following types of schemas designed for Datawarehouse:
o Star Schema.
o Snowflake Schema.
o Galaxy Schema.
Rank transformation can return the strings at the top or the bottom of a session sort order.
When the Integration Service runs in Unicode mode, it sorts character data in the session using
the selected sort order associated with the Code Page of IS which may be French, German, etc.
When the Integration Service runs in ASCII mode, it ignores this setting and uses a binary sort
order to sort character data.
The Sorter is an active transformation because, when it is configured to output distinct rows, it discards duplicates based on the sort key and consequently changes the number of rows.
Based on the change in the number of rows, active transformations are those which change the number of data rows passed through them, while passive transformations keep the number of input and output rows the same.
67. What are the output files created by the Informatica server at runtime?
The output files created by the Informatica server at runtime are listed below:
Informatica Server log: Informatica home directory creates a log for all the error
messages and status.
Session log file: For each session, a session log file stores the data into the log file about
the ongoing initialization process, SQL commands, errors, and more.
Session detail file: It contains load statistics for each target in mapping, including data
about the name of the table, no of rows written or rejected.
Performance detail file: It includes data about session performance.
Reject file: Rows of data not written to targets.
Control file: Information about target flat-file and loading instructions to the external
loader.
Post-session email: Automatically delivers session run data to designated recipients.
Indicator file: It contains a number to indicate whether the row was marked for insert,
update, delete, or reject.
Output file: Informatica server creates a target file based on the details entered in the
session property sheet.
Cache file: It automatically builds, when the Informatica server creates a memory cache.
68. What is the difference between static cache and dynamic cache?
The following are the differences between static cache and dynamic cache:
Static cache: the cache is built once at the start of the session and does not change while the session runs; it is the default.
Dynamic cache: the Integration Service inserts or updates rows in the cache as it passes rows to the target, keeping the cache synchronized with the target.
69. Can you tell what types of groups the Router transformation contains?
1. Input group
2. Output groups:
o User-defined groups
o Default group
70. How do you differentiate stop and abort options in a workflow monitor?
The below table will detail the differences between the stop and abort options in a workflow
monitor:
Stop: The stop option stops executing the session task but allows other tasks to keep running.
Abort: The abort option turns off the running task completely.
In Informatica, Data Driven is the session property that decides how the data needs to be handled when the mapping includes an Update Strategy transformation.
By mentioning DD_INSERT or DD_DELETE or DD_UPDATE in the update strategy
transformation, we can execute data-driven sessions.
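For example, a hypothetical Update Strategy expression (assuming a lookup port lkp_cust_key that returns NULL when the customer does not yet exist in the target) could be:
IIF(ISNULL(lkp_cust_key), DD_INSERT, DD_UPDATE)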
A reusable object created in the Mapplet Designer is called a Mapplet. It includes a collection of transformations that allows you to reuse transformation logic in different mappings.
Mapping vs. Mapplet:
76. State the differences between SQL override and Lookup override.
The differences between SQL override and Lookup override are listed below:
SQL override: limits the number of rows that enter the mapping pipeline.
Lookup override: limits the number of lookup rows, avoiding a full table scan and saving lookup time.
A shared cache is a static lookup cache shared by various lookup transformations in the
mapping. Using a shared cache reduces the amount of time needed to build the cache.
Compatibility between code pages used for getting accurate data movement when the
Informatica Server runs in the Unicode data movement mode. There won't be any data losses if
code pages are identical. One code page can be a superset or subset of another.
Filter transformation in Informatica is an active transformation that changes the number of rows
passed through it. It allows the rows to pass through it based on specified filter conditions and
drops rows that don't meet the requirement. The data can be filtered based on one or more
terms.
Incremental aggregation usually gets created when a session gets created through the
execution of an application. This aggregation allows you to capture changes in the source data
for aggregating calculations in a session. If the source changes incrementally, you can capture
those changes and configure the session to process them. It will allow you to update the target
incrementally, rather than deleting the previous load data and recalculating similar data each
time you run the session.
The Update Strategy is an active and connected transformation that allows you to insert, delete, or update records in the target table. It can also reject rows so that they do not reach the target table.
Both Informatica and Datastage are powerful ETL tools. Still, the significant difference between
both is Informatica forces you to organize in a step-by-step process. In contrast, Datastage
provides flexibility in dragging and dropping objects based on logic flow.
Informatica: dynamic partitioning; supports flat-file lookups.
Datastage: static partitioning; supports hash files, lookup file sets, etc.
TC_CONTINUE_TRANSACTION
TC_COMMIT_BEFORE
TC_COMMIT_AFTER
TC_ROLLBACK_BEFORE
TC_ROLLBACK_AFTER
In a delimited file, each column is separated by a delimiter such as a comma (,), a tab, or a tilde (~) symbol.
In SQL, UNION removes duplicates when combining tables with the same structure, while UNION ALL does not remove duplicates.
In Informatica, the Union transformation acts like UNION ALL; that is, it will not remove duplicates.
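A quick SQL illustration, assuming two tables with the same structure (for example, emp and emp1):
-- UNION removes duplicate rows
SELECT ename, deptno FROM emp
UNION
SELECT ename, deptno FROM emp1;
-- UNION ALL keeps duplicates, which is how the Informatica Union transformation behaves
SELECT ename, deptno FROM emp
UNION ALL
SELECT ename, deptno FROM emp1;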
92. SetMaxVariable in Informatica: did you use it for incremental load?
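A rough sketch of the usual incremental-load pattern (the table, column, and variable names are only examples): define a mapping variable such as $$LAST_RUN_DATE, filter the source on it, and advance it with SETMAXVARIABLE so that the next run picks up only new or changed rows.
Source Qualifier SQL override, assuming an audit column last_updated_date:
SELECT * FROM orders
WHERE last_updated_date > TO_DATE('$$LAST_RUN_DATE', 'YYYY-MM-DD HH24:MI:SS');
In an Expression transformation: SETMAXVARIABLE($$LAST_RUN_DATE, last_updated_date)
The variable's new value is saved to the repository when the session completes successfully, so the next run reads only rows changed after the previous run.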
94. Tell me some of the Dimension & fact tables in your project
Fact table:
-----------
https://dwbi1.wordpress.com/2019/02/20/transactional-fact-
tables/#:~:text=A%20transactional%20fact%20table%20is,same%20as%20the%20source%
20table.
Dimension:
==========
Account_dim
User_dim
Cust_dim
date_dim
product_dim
region_dim
branch_dim
store_dim
employee_dim
1. Lookup reusability:
An unconnected lookup is easy to reuse and call from multiple expressions, since it has no pipeline or physical connection like a connected lookup.
2. Conditional lookup:
Out of some millions of records, if around 10,000 records are missing the product description, we can use an unconnected lookup to fetch it only for those records:
IIF(isnull(name),:LKP.lkptrans(ssn),name)
IIF(ISNULL(PROD_Description),:LKP.LKPTRANS(Prod_id),prod_description)
1. Static cache - the cache vanishes every time after the completion of the session.
2. Dynamic cache - used with SCDs, it will not allow duplicates from source to target. When the dynamic cache is enabled, the row is first inserted into the target and the lookup cache is updated for that particular record; if the same record comes from the source again, it is identified in the cache and the duplicate is not allowed. After enabling the dynamic cache, a new port called NewLookupRow is created, so there is no need for separate logic to detect new records.
3. Persistent cache - the cache file is saved to disk and can be reused across session runs.
Eliminate the bottleneck, and then identify the next performance bottleneck, repeating until you are satisfied with the session performance.
Performance bottlenecks can occur in the source and target, the mapping, the session,
and the system.
Source bottlenecks:-
An inefficient query or small database network packet sizes can cause source bottlenecks.
To identify a source bottleneck when the source is a relational table, put a Filter transformation in the mapping just after the Source Qualifier and set the filter condition to FALSE so that no rows pass downstream.
With the filter, total time is roughly the time taken to read from the source; without the filter, total time = time taken by (source + transformations + target load). If the two times are close, the source is the bottleneck.
Target bottlenecks:-
If the target is a relational table, then substitute it with a flat file and run the session.
If the time taken now is very much less than the time taken for the session to load to the table, then the target table is the bottleneck.
Mapping bottlenecks:-
A complex or poorly written mapping logic can lead to a mapping bottleneck.
With a mapping bottleneck, the transformation thread runs slower, causing the reader thread to wait for free blocks and the writer thread to wait for blocks to be filled up for writing to the target.
Session bottlenecks:-
If you do not have a source, target, or mapping bottleneck, you may have a session
bottleneck.
Small cache size, low buffer memory, and small commit intervals can cause session
bottlenecks.
System bottlenecks
Optimization:
1. Cache lookups: when this option is not enabled, the server queries the lookup table on a row-by-row basis.
2. Shared cache: if your mapping contains multiple lookups that look up the same lookup table, it is suggested you share the cache in order to avoid building the cache multiple times.
Whenever multiple lookup conditions are placed, the condition with the equality sign should take precedence.
3. Lookup override:
You can reduce the processing time if you use the lookup SQL override properly in the Lookup transformation.
You can use the lookup SQL override to reduce the amount of data that you look up, as in the sketch below.
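For example (hypothetical table and column names), a lookup SQL override that returns only the needed ports and only active rows:
SELECT customer_id AS CUSTOMER_ID, customer_name AS CUSTOMER_NAME
FROM customer_dim
WHERE active_flag = 'Y'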
4. For an un-cached lookup, since the server issues a SELECT statement for each row passing into the Lookup transformation, it is better to index the lookup table on the columns in the lookup condition.
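For instance (names are illustrative), if the lookup condition is on customer_id, an index on that column speeds up the un-cached lookup:
CREATE INDEX idx_customer_dim_cust_id ON customer_dim (customer_id);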
5. Replace large lookup tables with joins in the Source Qualifier when possible.
You can improve the efficiency by filtering early in the data flow.
Use the Source Qualifier to filter the data. You can also use a Source Qualifier SQL override to filter the records, instead of using a Filter transformation; a sketch follows below.
Bring only the required columns from the source into the Source Qualifier.
Avoid using an ORDER BY clause inside the Source Qualifier SQL override.
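A small sketch of a Source Qualifier SQL override (the table and columns are examples only), filtering early, selecting only the required columns, and using no ORDER BY:
SELECT order_id, customer_id, order_amount
FROM orders
WHERE order_status = 'ACTIVE'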
2. Use sorted input. Sorted input decreases the use of aggregator caches.
1. Try creating a reusable Sequence Generator transformation and use it in multiple mappings.
1. It is recommended to assign the table with the lesser number of records as the master while using the Joiner transformation.
2. It is also recommended to perform the join in the Source Qualifier using a SQL override where possible; this feature is a powerful way to save work and enforce standards.
A standalone Command task can be used anywhere in the workflow to run shell commands.
Session logs contain information about the tasks that the Integration Service performs during a session, along with load statistics and related error messages.
15. Select all records from the emp table where deptno = 10 or 40.
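One possible answer, assuming the standard EMP table:
select * from emp where deptno in (10, 40);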
16. Select all records from the emp table where deptno = 30 and sal > 1500.
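A possible answer:
select * from emp where deptno = 30 and sal > 1500;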
17. Select all records from emp where job is not SALESMAN or CLERK.
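A possible answer:
select * from emp where job not in ('SALESMAN', 'CLERK');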
19. Select all records where ename starts with 'S' and its length is 6 characters.
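A possible answer (five underscores after 'S' make the total length 6):
select * from emp where ename like 'S_____';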
20. Select all records where ename may have any number of characters but must end with 'R'.
select * from emp where sal> any(select sal from emp where sal<3000);
select * from emp where sal> all(select sal from emp where sal<3000);
25. Select all the employees grouped by deptno and sal in descending order.
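One reading of this question is to list all employees ordered by deptno with salary in descending order:
select * from emp order by deptno, sal desc;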
26. How can I create an empty table emp1 with the same structure as emp?
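A possible answer (the always-false WHERE condition copies the structure but no rows):
create table emp1 as select * from emp where 1 = 2;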
28. Select all records where dept no of both emp and dept table matches.
select * from emp where exists(select * from dept where emp.deptno=dept.deptno)
29. If there are two tables emp and emp1, and both have common records, how can I fetch all the records but the common records only once?
(Select * from emp) Union (Select * from emp1)
30. How to fetch only common records from two tables emp and emp1?
(Select * from emp) Intersect (Select * from emp1)
31. How can I retrieve all records of emp that are not present in emp1?
(Select * from emp) Minus (Select * from emp1)
32. Count the total sal, deptno-wise, where more than 2 employees exist.
SELECT deptno, sum(sal) As totalsal
FROM emp
GROUP BY deptno
HAVING COUNT(empno) > 2