You can go through this Informatica Interview Questions video lecture where
our expert discusses important questions that can help you ace your
interview.
Informatica lookups can be cached or un-cached (no cache). Cached lookups can
be either static or dynamic. A lookup cache can also be classified as persistent or
non-persistent, based on whether Informatica retains the cache even after
completing a session run or deletes it.
Static cache
Dynamic cache
Persistent cache
Shared cache
Recache
Aggregator transformation
Expression transformation
Filter transformation
Joiner transformation
Lookup transformation
Normalizer transformation
Rank transformation
Router transformation
Sequence generator transformation
Stored procedure transformation
Sorter transformation
Update strategy transformation
XML source qualifier transformation
A passive transformation:
Does not change the number of rows that pass through the transformation
Maintains the transaction boundary
Maintains the row type
Informatica server log: The Informatica server (on UNIX) creates a log for all
status and error messages (default name: pm.server.log). It also creates
an error log for error messages. These files are created
in the Informatica home directory.
Session log file: The Informatica server creates a session log file for each
session. It writes information about the session into the log file, such as the
initialization process, creation of SQL commands for reader and writer
threads, errors encountered, and the load summary. The amount of detail
in the session log file depends on the tracing level that you set.
Session detail file: This file contains load statistics for each target in
the mapping. Session detail includes information such as the table name and the
number of rows written or rejected. You can view this file by double-clicking on the
session in the monitor window.
Performance detail file: This file contains session performance details,
which tell you where performance can be improved. To generate this
file, select the performance detail option in the session property sheet.
Reject file: This file contains the rows of data that the writer does not write
to targets.
Control file: Informatica server creates a control file and a target file when
you run a session that uses the external loader. The control file contains
the information about the target flat file such as data format and loading
instructions for the external loader.
Post-session email: Post-session email allows you to automatically
communicate information about a session run to designated recipients.
You can create two different messages: one if the session completes
successfully and another if it fails.
Indicator file: If you use the flat file as a target, you can configure the
Informatica server to create an indicator file. For each target row, the
indicator file contains a number to indicate whether the row was marked
for insert, update, delete or reject.
Output file: If a session writes to a target file, the Informatica server
creates the target file based on file properties entered in the session
property sheet.
Cache files: When the Informatica server creates a memory cache, it also
creates cache files. The Informatica server creates index and data cache
files for transformations such as Aggregator, Rank, Joiner, and Lookup.
On issuing the STOP command on the session task, the Integration Service stops
reading data from the source, although it continues processing the data it has
already read and writing it to the targets. If the Integration Service cannot finish
processing and committing this data, we can issue the ABORT command.
If you run the session in timestamp mode, the session log is not overwritten;
each run creates its own log file.
Save session log for these runs –> set the number of log files you want to save
(the default is 0).
If you want to save all of the log files created by every run, select the
option Save session log for these runs –> Session TimeStamp.
13. What are the similarities and differences between ROUTER and
FILTER?
For example:
Imagine we have 3 departments in the source and want to send these records into 3
tables. To achieve this, we require only one Router transformation. If we
want to get the same result with Filter transformations, then we require at least 3
Filter transformations.
Similarity: Both the Router and the Filter transformation test rows against a
condition and pass on only the rows that satisfy it.
When you override the default lookup SQL query, keep the following guidelines in mind:
1. Override the ORDER BY clause. Create the ORDER BY clause with fewer
columns to increase performance. When you override the ORDER BY
clause, you must suppress the generated ORDER BY clause with a
comment notation.
Note: If you use pushdown optimization, you cannot override the ORDER
BY clause or suppress the generated ORDER BY clause with a comment
notation.
2. A lookup table name or column name contains a reserved word. If the
table name or any column name in the lookup query contains a reserved
word, you must ensure that it is enclosed in quotes.
3. Use parameters and variables. Use parameters and variables when you
enter a lookup SQL override. Use any parameter or variable type that you
can define in the parameter file. You can enter a parameter or variable
within the SQL statement, or use a parameter or variable as the SQL
query. For example, you can use a session parameter,
$ParamMyLkpOverride, as the lookup SQL query, and set
$ParamMyLkpOverride to the SQL statement in a parameter file. The
designer cannot expand parameters and variables in the query override
and does not validate it when you use a parameter or variable. The
integration service expands the parameters and variables when you run
the session.
4. A lookup column name contains a slash (/) character. When generating
the default lookup query, the designer and integration service replace any
slash character (/) in the lookup column name with an underscore
character. To query lookup column names containing the slash character,
override the default lookup query, replace the underscore characters with
the slash character, and enclose the column name in double quotes.
5. Add a WHERE clause. Use a lookup SQL override to add a WHERE clause to
the default SQL statement. You might want to use the WHERE clause to
reduce the number of rows included in the cache (see the sketch after this
list). When you add a WHERE clause to a Lookup transformation that uses a
dynamic cache, add a Filter transformation before the Lookup transformation
so that only rows matching the WHERE clause are passed into the dynamic cache.
Note: The session fails if you include large object ports in a WHERE clause.
6. Other. Use a lookup SQL override if you want to query lookup data from
multiple lookups or if you want to modify the data queried from the
lookup table before the Integration Service caches the lookup rows. For
example, use TO_CHAR to convert dates to strings.
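As a minimal sketch of guidelines 1 and 5 above (the CUSTOMER table and its columns
are hypothetical names), a lookup SQL override might restrict the cached rows with a
WHERE clause, supply its own ORDER BY on fewer columns, and suppress the generated
ORDER BY with the trailing comment notation:

-- Hypothetical lookup override: cache only active customers, order by one column
SELECT CUSTOMER_ID, CUSTOMER_NAME, COUNTRY_CODE
FROM CUSTOMER
WHERE STATUS = 'ACTIVE'
ORDER BY CUSTOMER_ID --

The trailing "--" comments out the ORDER BY clause that the Integration Service
appends, so only the explicit ORDER BY is executed.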
16. What are data driven sessions?
When you configure a session that uses an Update Strategy transformation, the session
property "Treat source rows as: Data Driven" instructs the Informatica server to use
the instructions coded in the mapping to flag rows for insert, update, delete, or
reject. This is done by returning DD_INSERT, DD_UPDATE, DD_DELETE, or DD_REJECT
in the Update Strategy transformation.
The "Treat source rows as" property in the session is set to "Data Driven" by default
when you use an Update Strategy transformation in the mapping.
We can make use of the Sorter transformation and select the "Distinct" option to
delete duplicate rows.
The Source Qualifier transformation can be used to perform the following tasks:
Joins: You can join two or more tables from the same source database. By
default, the sources are joined based on the primary key-foreign key
relationships. This can be changed by explicitly specifying the join
condition in the "User Defined Join" property.
Filter rows: You can filter rows from the source database. The
integration service adds a WHERE clause to the default query.
Sorting input: You can sort the source data by specifying the "Number of
Sorted Ports". The integration service adds an ORDER BY clause to the
default SQL query.
Distinct rows: You can get distinct rows from the source by choosing the
"Select Distinct" property. The integration service adds a SELECT DISTINCT
statement to the default SQL query.
Custom SQL Query: You can write your own SQL query to do calculations (see the
sketch below).
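As a rough sketch that rolls several of these options into one custom query
(EMPLOYEES and DEPARTMENTS are hypothetical tables), a Source Qualifier SQL override
can combine a user-defined join, a filter, sorting, and distinct rows:

-- Hypothetical custom Source Qualifier query: join, filter, distinct and sort
SELECT DISTINCT E.EMP_ID, E.EMP_NAME, D.DEPT_NAME
FROM EMPLOYEES E
JOIN DEPARTMENTS D ON E.DEPT_ID = D.DEPT_ID
WHERE E.STATUS = 'ACTIVE'
ORDER BY E.EMP_ID

The selected columns should line up with the Source Qualifier ports, in the same order.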
21. What are the different ways to filter rows using Informatica
transformations?
Source Qualifier
Joiner
Filter
Router
22. What are the different transformations where you can use a
SQL override?
Source Qualifier
Lookup
Target
The Source Qualifier provides the SQL Query option to override the default
query. You can enter any SQL statement supported by your source database.
You might enter your own SELECT statement, or have the database perform
aggregate calculations, or call a stored procedure or stored function to read the
data and perform some tasks.
The role of a SQL Override is to limit the number of incoming rows entering
the mapping pipeline, whereas a Lookup Override is used to limit the
number of lookup rows, avoiding a whole-table scan and saving lookup
time and cache space.
Lookup Override adds an "ORDER BY" clause by default. SQL Override
does not, and you must enter it manually in the query if you require it.
SQL Override can provide any kind of join by writing the query, whereas
Lookup Override supports only the comparisons that can be expressed as
lookup conditions (equality and relational operators).
Lookup Override returns only one record even if it finds multiple records for
a single condition; SQL Override has no such restriction.
If you want to get hands-on learning on Informatica, you can also check out the
tutorial given below. In this tutorial, you will learn about Informatica
Architecture, Domain & Nodes in Informatica, and other related concepts.
After optimizing the session to its fullest, we can further improve performance
by exploiting underutilized hardware power. This refers to parallel processing,
and we can achieve this in Informatica PowerCenter using session partitioning.
Database partitioning: The Integration Service queries the database system for
table partition information. It reads partitioned data from the corresponding
nodes in the database.
Key Range Partitioning: With this type of partitioning, you can specify one or
more ports to form a compound partition key for a source or target. The
Integration Service then passes data to each partition depending on the ranges
you specify for each port.
Source Qualifier – use shortcuts, extract only the necessary data, and limit
the columns and rows read from the source. Try to use the default query options
(User Defined Join, Filter) instead of a SQL Query override, which may
impact database resources and prevent the use of partitioning and
pushdown optimization.
Expressions – use local variables to limit the amount of redundant
calculations, avoid datatype conversions, reduce invoking external scripts
(coding outside of Informatica), provide comments, use operators (||, +, /)
instead of functions. Keep in mind that numeric operations are generally
faster than string operations.
Filter – use the Filter transformation as close to the source as possible. If
multiple filters need to be applied, it is usually more efficient to replace
them with a single Router transformation.
Aggregator – use sorted input, also use as early (close to the source) as
possible and filter the data before aggregating.
Joiner – try to join the data in the Source Qualifier wherever possible, and
avoid outer joins. It is good practice to designate the source with fewer
rows as the master source.
Lookup – a relational lookup should return only the ports that are actually
needed. Call an unconnected Lookup from an expression (for example, inside
an IIF) so it is invoked only when required. Replace large lookup tables with
joins whenever possible. Review the database objects and add indexes to the
database columns used in the lookup when possible (see the sketch below). Use
the Cache Calculator in the session to eliminate paging in the lookup cache.
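For the indexing point in the Lookup bullet, a minimal sketch (CUSTOMER and
CUSTOMER_ID are hypothetical names) is simply an index on the column used in the
lookup condition, which speeds up both cache building and un-cached lookups:

-- Hypothetical index on the lookup condition column
CREATE INDEX IDX_CUSTOMER_LKP ON CUSTOMER (CUSTOMER_ID);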
Informatica and IBM InfoSphere DataStage are both popular data integration
and ETL (Extract, Transform, Load) tools used in the field of data management.
They serve similar purposes but have differences in their features, capabilities,
and underlying architectures. Here’s a comparison between the two:
Aspect: Informatica vs. IBM InfoSphere DataStage
Vendor: Informatica – Informatica Corporation; DataStage – IBM (International Business Machines Corporation)
Features and Capabilities: Informatica – comprehensive ETL, data integration, data profiling, data quality, data governance, master data management; DataStage – ETL, data transformation, data quality, parallel processing
Scalability: Informatica – scalable for large volumes of data; DataStage – scalable with an emphasis on parallel processing
User Interface: Informatica – user-friendly drag-and-drop interface; DataStage – visual interface with a steeper learning curve for business users
Integration: Informatica – wide range of connectors and integration options; DataStage – integrates well with other IBM products and technologies
Deployment Options: Informatica – on-premises and cloud-based options; DataStage – on-premises and cloud deployment options
Market Presence: Informatica – widely recognized and used in the industry; DataStage – strong market presence, particularly among existing IBM users
These use cases illustrate the versatility of Informatica across a wide range of
industries and sectors, showcasing its ability to address diverse data integration
and management challenges.
Each repository is its own area where workflows, mappings, sessions, and other
related objects can be created, saved, and managed. You can create multiple
folders to keep projects, teams, or configurations separate, but the number of
folders you can create may depend on how your Informatica system is licensed and
how much space it has.
When deciding how many repositories to create, it is important to consider the
organization's needs, security requirements, and performance. If you work in an
environment with more than one team or more than one project, creating separate
repositories can help keep things organized and easy to manage. For repository
management best practices, refer to the official Informatica documentation and
your company's own guidelines.
1. Purpose: Pre-session shell commands run before the session starts. They are
typically used for setup tasks such as creating or cleaning directories,
checking that source files have arrived, or preparing parameter files.
2. Purpose: Post-session shell commands are used for actions that need to
take place after the session runs. This can include tasks like generating
reports, archiving logs, or triggering notifications.
Usage:
– You can use pre-session and post-session shell commands to customize and
extend the functionality of your ETL processes.
– These commands can be written in various scripting languages like Unix Shell
Script, Windows Batch Script, or any language supported by the execution
environment.
Considerations:
– When using shell commands, ensure that the environment and permissions
are set correctly to execute the desired actions.
– Be cautious with these commands, as they run outside the scope of the actual
ETL logic. Incorrect commands can lead to unexpected results or errors.
1. Sort Data: If possible, sort the data before it enters the Aggregator
transformation. Sorting data beforehand improves the efficiency of
grouping and aggregation processes.
2. Minimize Input Rows: Filter out irrelevant rows from the input data using
filters and routers. This reduces the data volume entering the Aggregator.
3. Use Sorted Ports for Grouping: If your data is already sorted, utilize
ports with “sorted” attributes for grouping. This speeds up group
processing.
4. Simplify Expressions: Reduce the use of complex expressions within the
Aggregator transformation, as these can slow down processing.
5. Limit Grouping Ports: Only use necessary grouping ports to avoid
unnecessary memory usage.
6. Aggregator Cache: Utilize the Aggregator cache to store intermediate
results and minimize data processing repetition.
7. Enable Aggregator Sorted Input: Enable the "Sorted Input" option only when
the data reaching the Aggregator is already sorted on the group-by ports (for
example, by an upstream Sorter or an ORDER BY in the source query); the option
tells the Aggregator to expect sorted data, it does not sort the data itself.
8. Parallel Processing: Leverage partitioning and parallel processing to
distribute workloads across multiple resources, if feasible.
9. Memory Allocation: Adjust memory settings in session properties to
allocate adequate memory for the Aggregator transformation.
10. Source Indexes and Partitioning: Consider using indexes and
partitioning in the source database for efficient data retrieval.
11. Pushdown Optimization: If supported, use pushdown optimization to
perform some aggregation operations directly within the source database.
12. Persistent Cache: For large datasets, implement a persistent cache to store
aggregated data across sessions, minimizing recalculations.
13. Regular Monitoring: Monitor session logs and performance metrics to
identify performance bottlenecks and areas for optimization.
14. Test Different Configurations: Test various configurations to determine
the most effective settings for your specific environment.
15. Aggregate at Source: Whenever possible, perform aggregations at the
source database level before data extraction (see the sketch after this list).
16. Use Aggregator Expressions Wisely: Use built-in aggregate functions
rather than custom expressions whenever feasible.
17. Avoid Unnecessary Ports: Remove any unused input/output ports from
the Aggregator transformation.
18. Properly Define Ports: Ensure that the "precision" and "scale" properties
of output ports are appropriately defined to minimize data type conversions.
19. Session-Level Recovery: Enable session-level recovery to resume a
session from the point of failure, minimizing reprocessing.
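As a minimal sketch of points 1 and 15 (EMPLOYEES and its columns are hypothetical
names), sorting and pre-aggregating in the source query means the Aggregator
receives far fewer, already ordered rows:

-- Hypothetical pre-aggregation pushed to the source database
SELECT DEPT_ID, COUNT(*) AS EMP_COUNT, SUM(SALARY) AS TOTAL_SALARY
FROM EMPLOYEES
GROUP BY DEPT_ID
ORDER BY DEPT_ID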
37. How can we update a record in the target table without using Update
Strategy?
To update records in the target table without using the Update Strategy
transformation in Informatica, you can use the “Update” option available in the
Target Designer. This approach is suitable when you want to update existing
records in the target table based on certain conditions. Here’s how you can
achieve this:
1. In the Target Definition:
– In the "Keys" tab, specify the primary key or unique key columns that will be
used to identify records for updates.
– In the "Update as Update" option, select the columns that you want to update
in the target table.
2. In the Mapping:
– In your mapping, ensure that you are using the same primary key or unique
key columns in the source that match the key columns defined in the target.
– The mapping will pass data to the target table, and during the session run,
Informatica will compare the source data with the target data using the defined
key columns.
– Records with matching keys will be updated in the target table based on the
columns you specified in the “Update as Update” option.
3. In the Session:
– In the "Mapping" tab, make sure the target table is linked to the correct target
definition.
– In the "Properties" tab, set "Treat source rows as" to "Update", and keep the
target load type as "Normal" (not "Bulk") so that updates can be issued (see the
sketch below).
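As a sketch of what happens under the hood (table and column names are hypothetical),
each row treated as an update results in an UPDATE statement keyed on the columns
marked in the "Keys" tab; a target update override defined in the target properties
follows the same shape, with target ports referenced through the :TU prefix:

-- Hypothetical target update override / generated update statement
UPDATE T_CUSTOMER
SET CUSTOMER_NAME = :TU.CUSTOMER_NAME,
    COUNTRY_CODE = :TU.COUNTRY_CODE
WHERE CUSTOMER_ID = :TU.CUSTOMER_ID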
Mapping Parameters:
1. Dynamic Configuration: Mapping parameters allow you to pass values
dynamically to a mapping during runtime. This is useful when you need to
provide different parameter values for each session run, such as source
file paths, target table names, or database connection information.
2. Reusability: By using mapping parameters, you can create reusable
mappings that can be configured differently for different tasks or
environments without modifying the mapping itself.
3. Change Impact Management: If a source or target changes, you can
update the parameter value instead of modifying the mapping, reducing
the impact of changes.
4. Session Overrides: Mapping parameters can be overridden at the session
level, giving you control to adjust values specifically for each session run.
5. Globalization and Localization: Mapping parameters can be used to
accommodate language or country-specific variations without altering the
mapping structure.
Mapping Variables:
Unlike mapping parameters, mapping variables can change during a session run (for
example, through the SetVariable or SetMaxVariable functions), and the Integration
Service saves the final value to the repository at the end of a successful run.
This makes them useful for tasks such as incremental extraction, where the variable
tracks the last processed value between runs.
In summary, mapping parameters and mapping variables are essential tools for
making your Informatica mappings adaptable, reusable, and capable of handling
dynamic scenarios. They provide a way to externalize configuration details,
perform calculations, and enable dynamic behavior without altering the
underlying mapping structure.
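To make the externalized-configuration idea concrete, a mapping parameter can be
referenced directly inside a Source Qualifier filter or SQL override; in this sketch
$$LAST_EXTRACT_DATE and the ORDERS table are hypothetical names, and the parameter
value comes from the parameter file at run time:

-- Hypothetical incremental-extract filter driven by a mapping parameter
SELECT ORDER_ID, ORDER_DATE, AMOUNT
FROM ORDERS
WHERE ORDER_DATE > TO_DATE('$$LAST_EXTRACT_DATE', 'YYYY-MM-DD')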
39. Define the Surrogate Key.
A surrogate key is a system-generated, sequential identifier (often produced with a
Sequence Generator transformation) that is used as the primary key of a dimension or
target table in place of the natural or business key. It has no business meaning of
its own and is particularly useful for tracking slowly changing dimensions.
40. Explain sessions and shed light on how batches are used to combine
executions.
Sessions: A session is a task that instructs the Integration Service how and when to
move data from sources to targets; each session runs a single mapping with a specific
set of connections and properties.
Batches in Sessions:
Batches are used in sessions to make sure that large volumes of data are processed
as efficiently as possible. Instead of handling all the data at once, batches break
the data up into smaller pieces that are easier to handle. This approach improves
speed, resource utilization, and memory management. Batching is especially helpful
when working with a lot of data, to make sure everything runs smoothly and to keep
the system from getting overloaded.
5. Error Isolation: If errors occur within a batch, they can be contained and
addressed without affecting the entire dataset.
Here are some specific examples of the new features in Informatica Developer
9.1.0:
1. Data profiling: You can now create and manage data profiles in
Informatica Developer. Data profiles can help you to understand the
quality of your data and identify any potential problems.
2. Machine learning: You can now use machine learning to improve the
quality of your data in Informatica Developer. Machine learning can help
you to identify and correct errors in your data, and it can also help you to
identify patterns in your data.
3. Cloud integration: Informatica Developer 9.1.0 can now integrate with
cloud-based services, such as Amazon S3 and Microsoft Azure. This allows
you to store and process data in the cloud, and it also allows you to
connect to cloud-based applications.
If you are looking for a powerful and feature-rich tool for data integration, then
Informatica Developer 9.1.0 is a good choice. It has a number of new features
that can help you to improve the quality of your data and to integrate your data
with cloud-based services.
Informatica and Teradata are two different types of tools within the data
management ecosystem. Informatica is primarily known as an ETL (Extract,
Transform, Load) tool, while Teradata is a data warehousing and analytics
platform. It’s important to understand their roles and how they complement
each other rather than directly comparing them. Broadly, Informatica's advantages
lie in ETL and data integration, while Teradata's advantages lie in large-scale
data warehousing and analytics.
In essence, Informatica and Teradata serve different purposes within the data
management landscape. Informatica is valuable for ETL processes and data
integration, while Teradata excels in data warehousing and advanced analytics.
They often work together in data pipelines, with Informatica handling data
movement and transformation, and Teradata managing the storage and analysis
of data.
– Connect to the repository where your workflows and sessions are managed.
– You’ll find various predefined reports that provide insights into workflow and
session runs, status, performance, etc.
– You’ll find various reports related to the health, performance, and utilization
of the Informatica environment.
Code Page Compatibility refers to the process of converting characters from one
character encoding (code page) to another to enable accurate data transfer
between systems with different character sets. This conversion ensures that
characters are preserved during data integration and transformation, especially
when moving data between languages, regions, or applications.
1. Session Name: A unique name that identifies the session within the
project.
2. Workflow: The workflow associated with the session, which defines the
execution flow, dependencies, and triggers.
3. Mapping: The mapping used to transform the data from source to target.
The mapping specifies how data is manipulated, filtered, and
transformed.
4. Source and Target Connections: The database or file connections used
to connect to the source and target systems.
5. Session Configuration: Settings related to the session execution, such as
database connections, commit intervals, error handling, and target load
types.
6. Source Qualifier Transformation: If the source is a relational database, a
Source Qualifier Transformation is used to extract data from the source
database.
7. Session Properties: Various session-specific properties, including
parameters, variables, and session environment settings.
8. Pre-Session Command: Optional command or script executed before the
session starts. This can include setup tasks or data validation.
9. Post-Session Command: Optional command or script executed after the
session completes. This can include cleanup tasks or notifications.
10. Mapping Parameters and Variables: Parameters and variables used
within the mapping to make it dynamic and flexible.
11. Session Log: Records information about the session's execution, including
source data statistics, transformation logic, errors, and status.
12. Recovery Strategy: Specifies how the session should handle failures and
how to restart from the point of failure.
13. Performance Optimization: Configurations related to performance, such
as partitioning, sorting, and indexing strategies.
14. Commit and Rollback: Control settings for commit intervals and
transaction boundaries during data loading.
15. Session Schedule: Specifies when and how often the session should run,
either on-demand or according to a defined schedule.
16. Session Parameters: Parameters passed to the session that can affect its
behavior, such as date ranges or filter conditions.
17. Session Status: Indicates whether the session is enabled, disabled, or
inactive.
18. Email Notifications: Options to send email notifications after the session
completes or when specific conditions are met.
19. Session Variables: Variables used in expressions or scripts within the
session, allowing for dynamic behavior.
20. Performance Monitoring: Options for performance monitoring and
logging to identify bottlenecks and optimize the session's execution.
After a while, data in a table becomes old or redundant. In a scenario where new
data enters the table, a recache rebuilds the lookup cache so that the refreshed
and updated data is reflected in the existing or new cache.
I hope this Informatica interview questions blog was of some help to you. We
also have another Informatica interview questions blog in which scenario-based
questions have been compiled; it tests your hands-on knowledge of working with
the Informatica tool. You can go through that Scenario-Based Informatica Interview
Questions blog by clicking on the hyperlink or on the button at the
right-hand corner.