Informatica Certification MCQ Dumps
In this tutorial, you will learn how Informatica performs various activities such as data profiling, data
cleansing, transforming, and scheduling the workflows from source to target.
Informatica is used to extract the required data from operational systems, transform that data
on its server, and load it into the data warehouse.
Informatica is also introduced as a data integration tool. This tool is based on the ETL architecture. It
provides data integration software and services for different industries, businesses, government
organizations, as well as telecommunication, health care, insurance, and financial services.
It has a unique ability to connect to, process, and fetch data from different types of heterogeneous sources.
Data Extraction
The process of reading and extracting the data from multiple source systems into the Informatica server
is called data extraction. Informatica can extract or read data from different sources such as SQL Server,
Oracle, and many more.
Data Transformation
Data transformation is a process of converting the data into the required format. Data transformation
supports the following activities, such as:
Data Aggregation: It aggregates the data using aggregate functions such as Sum(), Min(), Max(),
Count(), etc.
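For instance, a minimal SQL sketch of such an aggregation, assuming a hypothetical SALES table with REGION and AMOUNT columns, might look like this:

SELECT REGION,
       SUM(AMOUNT) AS TOTAL_AMOUNT,   -- total per region
       MIN(AMOUNT) AS MIN_AMOUNT,
       MAX(AMOUNT) AS MAX_AMOUNT,
       COUNT(*)    AS ROW_COUNT
FROM   SALES
GROUP  BY REGION;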
Data Loading
Data loading is the process of inserting the data into a target system. There are two types of data loading, such
as:
1. Initial load or Full load: It is the first step; it adds or inserts the data into an empty target table.
2. Delta load or Incremental load or Daily load: This step takes place after the initial load, and it is
used only to load new records or update the changed old records.
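As a rough illustration, assuming a hypothetical source table SRC_ORDERS with a LAST_MODIFIED timestamp, a target table TGT_ORDERS, and an ETL_RUN_LOG table that records the previous load time, the two load types could be sketched in SQL as:

-- Initial or full load: copy everything into the empty target
INSERT INTO TGT_ORDERS SELECT * FROM SRC_ORDERS;

-- Delta or incremental load: pick up only the rows changed since the last run
INSERT INTO TGT_ORDERS
SELECT * FROM SRC_ORDERS
WHERE  LAST_MODIFIED > (SELECT MAX(LOAD_DATE) FROM ETL_RUN_LOG);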
Informatica is an easy-to-use ETL tool: you drag and drop different objects and design a process flow
diagram, called a mapping, for data extraction, transformation, and loading.
-To modify or clean up the data based on a set of rules, we need Informatica.
-By using Informatica, it is easy to load bulk data from one system to another.
-It provides a broad set of features such as integration of the data from multiple unstructured,
semi-structured or structured systems, operations at row level on data, and scheduling operation of the
data operation.
-It also supports metadata, so it preserves the information about the process and data
operations.
Informatica Architecture
Informatica architecture is service-oriented architecture (SOA). A service-oriented architecture is
defined as a group of services that communicate with each other. It means a simple data transfer during
this communication, or it can be two or more services that coordinate the same activity.
The Informatica development depends upon the component-based development techniques. This
technique uses the predefined components and functional units with their functionalities to get the
result.
Integration Service: This service helps in the movement of data from sources to targets.
Workflow Manager: It is used to create workflows and other tasks and to manage their execution.
A domain consists of the following components:
1. Service Manager: It manages domain operations such as logging, authentication, and authorization.
It runs the application services on the nodes and manages users and groups.
2. Application Services: These represent the server-specific services such as the repository service, reporting
service, and integration service. An application service can run on different nodes based on the
configuration.
Node: A node is a logical representation of a machine in a domain, and a domain can have multiple nodes. We can
configure the nodes to run application services, such as the integration service.
PowerCenter Repository
The PowerCenter repository is a relational database such as SQL Server, Oracle, and Sybase. And these
databases are maintained by the repository services. The database tables store the metadata.
Informatica provides the repository service, which is used to manage the repository. A repository service
exclusively handles requests for one repository, but it can run on multiple nodes for better
performance.
We can maintain different versions of the same object because of its version control mechanism.
It also prevents multiple users from modifying the same object at the same time.
The objects created in the repository can have the following three states:
Valid: Valid objects have the correct syntax, according to Informatica, and can be used in the execution of
workflows.
Invalid: Invalid objects do not follow the standards or rules. The syntax and properties of an object are
checked when the object is saved in Informatica.
Impacted: Impacted objects are valid objects whose child objects have become invalid, so they may need to be
validated again.
The repository service maintains the connections from the PowerCenter clients to the PowerCenter
repository. It inserts the metadata into the repository, keeps it updated, and maintains consistency
within the repository metadata.
Domain configuration
In the Informatica ETL tool, the domain is the necessary fundamental administrative control. It is an
apparent entity that provides other different services such as repository service, integration service, and
various nodes.
The Informatica admin console is used for the domain configuration. And the console is launched with
the help of the web browsers.
PowerCenter client tools are installed on the client-side machines. These tools are the development
tools such as workflow manager, PowerCenter designer, repository manager, and workflow monitor.
The Informatica repository contains all the mappings and objects created in these client tools, and it resides
on the Informatica server. That's why the client tools must have network connectivity with the server.
Also, PowerCenter client connects to the sources and targets to import the metadata and structure
definitions. Thus, it also maintains the connectivity to the source or target systems.
--PowerCenter client uses the TCP/IP protocols for connectivity with the integration service and
repository service.
--And PowerCenter client uses the ODBC drivers for the connectivity with the source or targets.
Repository Service
The repository service is a multithreaded process. It maintains the connection between the
PowerCenter clients and the PowerCenter repository.
The repository service can fetch, insert, and update the metadata inside the repository. And it also
maintains the consistency inside the repository metadata.
Integration Service
The integration service is used as an execution engine in the Informatica. It helps in executing the tasks
which are created in the Informatica. Integration service works in the following manner, such as:
--When a workflow is started, the integration service reads the workflow details from the repository.
--The integration service starts the execution of the tasks inside the workflow.
--After the execution, the task status is updated, for example, Succeeded, Failed, or Aborted.
Informatica PowerCenter
Informatica PowerCenter is an ETL tool that is extracting data from its source, transforming this data
according to requirements, and loading this data into a target data warehouse.
We can build enterprise data warehouses with the help of Informatica PowerCenter, which is a product
of Informatica Corporation.
The main components of Informatica PowerCenter are its client tools, server, repository, and repository
server. Both the PowerCenter server and repository server make up the ETL layer, which is used to
complete the ETL processing.
The PowerCenter server executes tasks based on the workflows created in the Workflow Manager. The
workflows are monitored through the Workflow Monitor. The jobs are designed in the Mapping Designer,
which establishes a mapping between source and target.
Mapping is a pictorial representation of the flow of data from source to target. Aggregation, filtering,
and joining are significant examples of transformation.
Informatica PowerCenter is typically used for purposes such as:
--B2B exchange.
--Data governance.
--Data migration.
--Data warehousing.
Informatica PowerCenter can manage the broadest range of data integration tasks on a single
platform. The development of data warehouses and data marts is possible with the help of this ETL tool.
Informatica PowerCenter meets enterprise expectations and requirements for
scalability, security, and collaboration through capabilities such as:
--Dynamic partitioning
--Metadata management
--Data masking
The Informatica ETL or Informatica PowerCenter product consists of three significant applications,
such as:
1. Informatica PowerCenter Client Tools: These tools enable a developer to design and manage the ETL
process, such as mappings, sessions, and workflows.
2. Informatica PowerCenter Repository: It is the center of the Informatica tools where all the data related
to the mappings, sources, and targets is stored.
3. Informatica PowerCenter Server: It is the server where all the actions are executed. It connects with the
sources and targets to fetch the data, applies all the transformations, and loads the data into the
target systems.
Mapping in Informatica
Mapping is a collection of source and target objects tied together through a set of
transformations. These transformations are formed with a set of rules that define the flow of the data
and how the data is loaded into the targets.
Source definition: The source definition defines the structure and characteristics of the source, such as
basic data types, type of the data source, and more.
Transformation: It defines how the source data is changed, and various functions can be applied
during this process.
Target Definition: The target definition defines where the data will be loaded finally.
Links: Links are used to connect the source definition with the target tables and the different transformations.
They show the flow of data between the source and target.
Mapping can define the data transformation details and source or target object characteristics because
it is a primary object in the Informatica.
Mappings define the data transformation for each row at the individual column levels. And we can hold
multiple sources and targets in a single mapping.
Components of Mapping
Here are some essential elements used in mapping, such as:
Source tables
Target objects
Mapping transformations
A mapping contains sources, targets, mapping parameters and variables, multiple transformations, mapplets, and
user-defined functions.
Mapping Source: Mapping sources are the objects from which the source data is fetched. A source can be a
flat file, database table, COBOL file source, or XML source.
Mapping target: The mapping target is the destination objects where the final data is loaded. A mapping
target can be a relational table of a database, XML file, or a flat-file. Sources and targets must be present
in any mapping with their different data types.
Mapping Parameters and Variables: These are optional user-defined objects. The mapping parameters
and variables are used to create temporary variable objects, which help us define and store temporary
values during the processing of mapping data. These user-defined objects are designed for a mapping.
Mapplets: Mapplets are objects which consist of a set of sources, transformations, or targets. With
the help of mapplets, we can reuse an existing set of transformations.
For example, if we have a "Student" table and we want to create an identical staging table, "Student_stage",
in the ETL schema, we build a stage mapping.
In stage mappings, the source table may contain the student details of rollno 1, 2, 3, and 10, while the stage
table holds the student records of rollno 1 and 3 only.
--In the data warehouse, we create stage tables to make the data transformation process
efficient and to minimize the dependency of the ETL or data warehouse on the real-time operational system.
This is achieved by fetching only the relevant data, as sketched below.
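A minimal SQL sketch of such a stage load, assuming the Student table described above and an Oracle-style database, could be:

-- Create the staging table with the same structure as the source (no rows copied)
CREATE TABLE Student_stage AS
SELECT * FROM Student WHERE 1 = 0;

-- Load only the relevant records (here rollno 1 and 3) into the stage table
INSERT INTO Student_stage
SELECT * FROM Student
WHERE  rollno IN (1, 3);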
Here is the fundamental difference between mapping parameters and mapping variables: a mapping parameter
holds a constant value that is defined before the session runs and does not change during the session, whereas a
mapping variable can change its value during the session, and the Integration Service saves its latest value in the
repository for the next session run.
Informatica Cloud
Informatica Cloud is a data integration solution and platform that works as Software as a Service (SaaS).
Informatica Cloud can connect to on-premises and cloud-based applications, databases, flat files, file feeds,
and even social networking sites.
If we are using a cloud data warehouse such as AWS Redshift, Azure SQL Data Warehouse, or Snowflake,
then Informatica Cloud Data Integration solutions improve the overall performance, productivity, and
extensive connectivity to cloud and on-premises sources.
Secure Agent
In Informatica Cloud architecture, the Secure Agent is a lightweight program. And it is used to connect
on-premise data with cloud applications. It is installed on a local machine and processes data locally and
securely under the enterprise firewall. All upgrades and updates are automatically sent and installed on
the Secure Agent regularly.
Connectors
The connectors are the second central part of the Informatica Cloud architecture. These are pre-built
integration solutions for data, devices, and connecting applications. Connectors provide connectivity
with cloud applications (Salesforce, Workday, Amazon Redshift) and on-premises applications (SQL
Server, Dynamics CRM, Teradata).
1.Informatica Cloud: A browser-based application that runs at the Informatica Cloud hosting facility. It
allows us to configure connections, create users, run, schedule, and monitor tasks.
2. Informatica Cloud hosting facility: It is the platform that runs the Informatica Cloud application. The
hosting facility stores all of an organization's jobs and information in the repository.
3. Informatica Cloud Applications: Applications that are used to perform tasks such as data synchronization,
contact validation, and data replication.
4. Informatica Cloud Secure Agent: A lightweight program installed on a local machine that runs all
tasks and provides firewall security between the hosting facility and the organization. When the Secure
Agent runs a job, it connects with the Informatica Cloud hosting facility to access the task information.
It can connect directly and securely to sources and targets, transfer data between them,
and perform any additional tasks based on the requirements.
Connectivity
The below image shows the connectivity of the Informatica cloud, such as:
Use Cases
The Informatica cloud has the following use cases, such as:
1. Data Synchronization
2. Contact Validation
3. Data Replication
4. Mappings
--Processes to extract, transform, and load data from multiple sources to multiple targets.
5. Monitoring
--The activity log is used for all successful and unsuccessful jobs.
--The dashboard summary is used for tasks, availability, and data processing.
Development
In Informatica Cloud, Development is divided into Data Synchronization Tasks and Mappings.
Data Synchronization tasks are ETL based tasks. We can load the data from a source to a target with the
help of data synchronization and provide some transformation during transfer.
Mappings are flows that allow for chaining multiple complex operations such as joins, filters, and
functions together to build a complex integration process.
Reporting
In Informatica Cloud, Reporting is divided into Activity Log and Activity Monitoring.
The Activity Log is responsible for generating a summary report of all successful and unsuccessful
executed jobs.
And Activity Monitoring is responsible for creating a list of all currently running jobs. Each job is listed
along with starting date or time, as well as rows processed.
Each process is also provided with information such as date, time, status, success rows, error rows, and
error message. The user can drill down into each job and download a session log that contains execution
information details at the logging level.
Informatica MDM
MDM stands for Master Data Management. It is a method of managing the organization's
data as a single coherent system. MDM is used to ensure the reliability of data that is
collected in various formats from different data sources. This data then supports
analytics-based decision making, AI training, data initiatives, and digital
transformation.
Master data management can link all critical data with the master file. Once well
implemented, MDM is responsible for sharing the data across the enterprise. MDM is
used as an effective strategy for data integration, and it encompasses disciplines such as:
o Data aggregation
o Data classification
o Data collection
o Data consolidation
o Data distribution
o Data enrichment
o Data governance
o Data mapping
o Data matching
o Data normalization
Master data management is creating a clear and strategic flow between all data sources
and the various destination systems.
Benefits of MDM
Clear and coherent data management is needed for a competitive business strategy.
o Control: Know where the data is, where it’s headed, and how secure it is.
o Data accuracy: Understand how closely our metrics track the underlying factors.
o Data consistency: Understand how closely our data flow tracks the underlying patterns.
Key Features
Some key features of MDM are listed below, such as:
o It supports a 360-degree view of the relationships between customers, products, suppliers, and other
entities.
o Data as a service.
Need of MDM
The MDM solutions are involved in the broad range of transformation, data cleansing,
and integration practices. When data sources are added to the system, then MDM
initiates processes to identify, collect, transform, and repair the data.
When the data meets the quality thresholds, we can maintain a high-quality master
reference with the help of the created schemas and taxonomies. By using MDM,
organizations can be confident that the data is accurate, up to date, and consistent all
over the enterprise.
Use Cases
Achieving consistency, control, and data accuracy is important because organizations
have become dependent on data for all necessary operations. After effective execution,
master data management helps organizations achieve these goals.
MDM Challenges
Master data management is required to remove poor data quality from the enterprise.
For example, in a company, several customer records are stored in different formats in
different systems.
The organizations may face some delivery challenges such as unknown prospects,
overstock or understock products, and many other problems. Common data quality
challenges that include:
o Duplicate records
o Erroneous information
o Incomplete information
o Inconsistent records
o Mislabeled data
Causes
Here are some reasons for poor data quality, such as:
o Varied field structures in different applications that define a particular format of data to
be entered such as John Smith or J. Smith
Trends in Master Data Management
In 2018, many organizations had to comply with the EU's General Data Protection Regulation
(GDPR), which restricts the use of Personally Identifiable Information (PII) and controls
how end users' information is used.
On January 1, 2020, the California Consumer Privacy Act was slated to take effect even if
the content could evolve based on the November 2018 election. But this Act may be
replaced by a federal equivalent.
Many countries and jurisdictions are creating privacy laws. These laws impact companies
doing business in those locations. The resulting increase in scrutiny drives further reliance on
master data management solutions.
o To perform analytics of the data in multiple data sources inside and outside of the
organization.
Metadata management has always been important. But nowadays, it is becoming even more
important because organizations are extending out to IIoT, IoT, and third-party data
sources as the amount of data continues to increase.
The master data management architectural elements and tools include the following:
o Data federation
o Data integration
o Data marts
o Data networks
o Data mining
o Data virtualization
o Data visualization
o Data warehouse
o Databases
o File systems
o Operational datastore
The MDM architectures become complex and unwieldy when a business adds more and
different types of MDM capabilities. Some vendors provide comprehensive solutions to
simplify the complexity and increase market share. It replaces the individual point
solutions.
As businesses transition away from periodic business intelligence (BI) reports, MDM keeps
growing continuously. Master data management is also important because
organizations adopt and build AI-powered systems, and an organization may use some of this
data as training data for machine learning purposes.
Master data management and data management have become so important that
most organizations are hiring a Chief Data Officer (CDO), a Chief Analytics Officer (CAO),
or both.
When executed adequately, master data management allows companies to:
o Integrate the disparate data from various data sources into a single hub so it can
be replicated to other destinations.
History
The multinational software company Ab Initio is credited with inventing the ETL tool. This company is
located outside Lexington, Massachusetts, in the United States, and it framed the GUI-based
parallel processing software that came to be called ETL.
Implementation of ETL Tool
1. Extract
The data is extracted from different sources of data. Relational databases, flat files,
XML, Information Management System (IMS), and other data structures are included
among the standard data-source formats.
Instant data validation is used to confirm whether the pulled data from the sources have
the correct values in a given domain.
2. Transform
To prepare the data for loading into a target data source, we apply a set of rules and logical
functions to the extracted data. Cleaning the data means passing only the correct data
into the target source.
According to the business requirements, we can apply many transformation types in the
data. Some transformation types are Key-based, column or row-based, coded and
calculated values, joining different data sources, and many more.
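As a small illustrative example of such transformations in SQL, assuming hypothetical RAW_ORDERS and CUSTOMERS tables, a cleansing rule, a calculated value, and a join could be expressed as:

SELECT o.ORDER_ID,
       UPPER(TRIM(c.CUSTOMER_NAME)) AS CUSTOMER_NAME,   -- cleansing rule
       o.QUANTITY * o.UNIT_PRICE    AS ORDER_VALUE,     -- calculated value
       c.COUNTRY
FROM   RAW_ORDERS o
JOIN   CUSTOMERS  c ON c.CUSTOMER_ID = o.CUSTOMER_ID;   -- joining data sources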
3. Load
In this phase, we load the data into the target data source.
The three phases do not wait for each other to start or end; they are executed in parallel.
To set up a data warehouse in an organization, the data needs to move from the
production systems to the warehouse.
1. Parallel Processing
o The pipeline allows running several components simultaneously on the same data.
Each data row is provided with a row_id, and each piece of the process is supplied with a
run_id so that the data can be tracked by these ids. To mark the completion of certain phases of the
process, we create checkpoints. These checkpoints indicate whether the query needs to be re-run
to complete a task.
3. Visual ETL
The PowerCenter and Metadata Messenger are advanced ETL tools. These tools help to
make faster, automated, and impactful structured data according to the business
requirements.
We can create database and metadata modules with a drag-and-drop mechanism as a
solution. It can automatically configure, connect, extract, transfer, and load the data
into the target system.
Characteristics of ETL Tool
Some attributes of the ETL tool are as follows:
3. It should support CSV data files so that end-users can import these files
easily without any coding.
4. It should have a user-friendly GUI so that the end-users easily integrate the data with the
visual mapper.
5. It should allow the end-user to customize the data modules according to the business
requirements.
ETL takes heterogeneous data and makes it homogeneous. Analyzing disparate
data and deriving business intelligence would be impossible without ETL.
4. It is self-owned.
There is a risk of the systems crashing completely, which shows how important well-built data
recovery systems are. Any misuse of simple data may create a massive loss for the
organization.
Informatica Transformations
Informatica transformations are repository objects which can create, read, modify, or
pass data to the defined target structures such as tables, files, or any other targets.
A transformation is used to represent a set of rules, which define the data flow and how
the data is loaded into the targets.
To pass data through a transformation, we need to connect its input ports, and it returns
the output through its output ports.
Classification of Transformation
Transformations are classified into two categories: the first based on connectivity and the
second based on the change in the number of rows. First, we will look at the transformations
based on connectivity.
o Connected Transformations
o Unconnected Transformations
For example, Source qualifier transformation of Source table Stud is connected to filter
transformation to filter students of a class.
Unconnected transformations are not part of the pipeline; their functionality is used by
calling them inside other transformations.
The connected transformations are preferred when the transformation is called for every
input row or expected to return a value.
o Active Transformations
o Passive Transformations
Active transformations are those that modify the data rows and the number of input
rows passed through them. For example, if a transformation receives 10 rows as
input and returns 15 rows as output, then it is an active
transformation. In an active transformation, the data is modified at the row level.
In a passive transformation, no new rows are created, and no existing rows are
dropped.
o Router Transformation
o Joiner transformation
o Rank Transformation
o Normalizer Transformation
o External Transformation
o Expression Transformation
For example, to load only the student records having rollno equal to 20, we can put a
filter transformation in the mapping with the filter condition rollno=20. Only those
records with rollno=20 will be passed on by the filter transformation; the rest of the records
will be dropped.
Step 4: The filter transformation will be created; click on the Done button in the create
transformation window.
Step 5: Drag and drop all the source qualifier columns to the filter transformation.
Step 6: Double click on the filter transformation to open its properties.
Step 7: Then set the filter condition rollno=20 and click OK.
Now save the created mapping and execute it after creating a session and workflow. In
the target table, only the records with rollno = 20 will be loaded.
The source qualifier transformation converts the source data types into the Informatica
native data types. That's why there is no need to alter the data types of the ports.
The source qualifier transformation does the following tasks, such as:
o Joins: We can join two or more tables from the same source database. By default, the
sources are merged based on the primary key and foreign key relationships. This can be
changed by specifying the join condition in the "user-defined join" property.
o Filter rows: We can filter the rows from the source database. In the default query, the
integration service adds a WHERE clause.
o Sorting input: We can sort the source data by specifying the number for sorted ports. In
the default SQL query, the Integration Service adds an ORDER BY clause.
o Distinct rows: We can get distinct rows from the source by choosing the "Select
Distinct" property. In the default SQL query, the Integration Service adds a SELECT
DISTINCT statement.
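For example, the default query generated for the STUD source used in the steps below, with a source filter, sorted ports, and Select Distinct enabled, might resemble the following SQL. The filter condition is only illustrative, and the exact query Informatica generates can differ:

SELECT DISTINCT STUD.STUDNO, STUD.SNAME, STUD.CLASS, STUD.SEC
FROM   STUD
WHERE  STUD.CLASS = 10      -- added through the source filter property (illustrative condition)
ORDER  BY STUD.STUDNO;      -- added when the number of sorted ports is set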
Step 2: Double click on the Source Qualifier transformation "SQ_STUD". It will open the
edit transformation property window. Then click on the SQL Query Modify option, and this
will open an SQL editor window.
Under the Ports tab, you will see all the ports. Keep only the ports STUDNO, SNAME,
CLASS, and SEC, and delete the other ports.
Now, again click on the Properties tab in the Edit Transformations window, and we will
see only the selected data. Clicking the OK button will open the SQL Editor window,
which confirms that the data you have chosen is correct and ready for loading into the target
table.
Save the mapping (using ctrl+s) and execute the workflow. After execution, only the
selected columns will be loaded into the target.
In this way, we can override in the source qualifier which columns need to be fetched from
the source, and this is the only way to restrict which specific columns will be brought inside
the mapping.
Aggregator Transformation
Aggregator transformation is an active transformation. And it is used to perform
calculations on the data such as sums, averages, counts, etc.
The integration service stores the group of data and row data in the aggregate cache.
The Aggregator transformation is more flexible than plain SQL aggregation because we can use
conditional clauses to filter rows.
o Aggregate Expression
o Group by port
o Sorted Input
o Aggregate cache
o Unsorted Input
Aggregate Expression
Aggregate functions are used to derive the aggregate expression, which can be
developed in variable ports or output ports only.
Sorted input
Group by ports are sorted using a sorted transformation and receive the sorted data as
an input to improve the performance of data aggregation.
Aggregate Cache
The aggregate cache contains the group by ports, the non-group by input ports, and the output
ports that hold the aggregate expressions.
Unsorted Input
With unsorted input, the integration service caches all the input rows before it performs the
aggregate calculations.
Aggregate Expressions
This transformation offers more functionality than SQL's GROUP BY statement, because
conditional logic can be applied to groups within the aggregator
transformation. Many different aggregate functions can be used in individual output
ports within the transformation. Below is the list of these aggregate functions, followed by a rough SQL analogy:
o AVG
o COUNT
o FIRST
o LAST
o MAX
o MEDIAN
o MIN
o PERCENTILE
o STDDEV
o SUM
o VARIANCE
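As a rough SQL analogy of what an aggregator transformation with a group by port and a conditional aggregate expression computes, assuming a standard EMP table with DEPTNO and SAL columns:

SELECT DEPTNO,
       COUNT(*) AS EMP_COUNT,
       AVG(SAL) AS AVG_SAL,
       SUM(CASE WHEN SAL > 2000 THEN SAL END) AS HIGH_SAL_TOTAL  -- conditional logic inside the aggregate
FROM   EMP
GROUP  BY DEPTNO;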
Step 1: Go to the Mapping Designer, click on transformation in the toolbar and create.
Step 2: Select the Aggregator transformation, enter the name, and click create.
To create ports, we can either drag the ports to the aggregator transformation or create
in the ports tab of the aggregator.
Configuring the Aggregator Transformation
We can configure the following components in aggregator transformation in the
Informatica.
1. Aggregate Cache: The integration service stores the group values in the index cache
and the row data in the data cache.
2. Aggregate Expression: We can enter expressions in the output port or variable port.
3. Group by Port: This option tells the integration service how to create groups. We can
configure input, output, or variable ports for the group.
4. Sorted Input: This option is used to improve session performance. It applies
only when the input to the aggregator transformation is sorted on the group by ports.
But we cannot use both single-level and nested aggregate functions in an aggregator
transformation in Informatica. The Mapping Designer marks the mapping as invalid if an
aggregator transformation contains both single-level and nested aggregate functions. If
we want to use both single-level and nested aggregate functions, we create separate
aggregator transformations.
Router Transformation
Router transformation is an active and connected transformation, and it is similar to the
filter transformation, which is used to filter the source data.
If we need to check the same input data against multiple conditions, then we use a
Router transformation in a mapping instead of creating multiple Filter transformations.
Compared to the Filter transformation, which tests rows against a single condition and drops
the rows that do not satisfy it, the Router transformation tests rows against multiple
conditions and routes the rows that fail every condition to a default group.
For example, when filtering the data for rollno=20, we can also capture those records
where rollno is not equal to 20. So the router transformation gives multiple output groups,
and each output group can have its own filter condition.
Also, there is a default group, and this default group has record sets that don't satisfy
any of the group conditions.
For example, if we have created two groups for the filter conditions rollno=20 &
rollno=30 respectively, then those records which are not having rollno 20 and 30 will be
passed into this default group.
The data rejected by the filter groups is collected by this default group, and sometimes there
can be a requirement to store this rejected data. In this way, the default output group can be
useful (a SQL sketch of these groups is given after the points below).
To allow multiple filter conditions, the router transformation provides a group option.
o There is also a default output group that contains all those data which are not passed by
any filter condition.
o For every filter condition, an output group is created in router transformation. We can
connect different targets to these different groups.
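Conceptually, the output groups of the rollno example above behave like the following SQL sketch, with the last statement playing the role of the default group (the STUD table and column names are assumed):

-- Group ROLLNO_20
SELECT * FROM STUD WHERE ROLLNO = 20;
-- Group ROLLNO_30
SELECT * FROM STUD WHERE ROLLNO = 30;
-- Default group: rows rejected by every filter condition
SELECT * FROM STUD WHERE ROLLNO NOT IN (20, 30);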
Step 4: The router transformation will be created in the mapping, select done option in
the window.
Step 5: Drag and drop all the columns from Source qualifier to router transformation.
Step 6: Double click on the router transformation, then in its transformation
properties:
Step 9: Connect the ports from the group rollno_30 of router transformation to target
table ports.
Now, when we execute this mapping, the filtered records will get loaded into the target
table.
Joiner Transformation
Joiner transformation is an active and connected transformation. It provides the option
of creating joins in the Informatica. By using the joiner transformation, the created joins
are similar to the joins in databases.
The joiner transformation is used to join two heterogeneous sources. The joiner
transformation joins sources on the basis of a condition that matches one or more pairs
of columns between the two sources.
The two input pipelines include a master and a detail pipeline. We need to join the
output of the joiner transformation with another source to join more than two sources.
And to join n number of sources in mapping, we need n-1 joiner transformations.
In joiner transformation, there are two sources which we are using for joins, such as:
o Master Source
o Detail Source
In the properties of joiner transformation, we can select which data source can be a
Master source and which source can be a detail source.
During execution, the master source is cached into memory for joining purposes. So
it is necessary to select the source with fewer records as the master source.
o Case-Sensitive String Comparison: The integration service uses this option when we
are performing joins on string columns. By default, the case sensitive string comparison
option is checked.
o Cache Directory: Directory used to cache the master or detail rows. The default
directory path is $PMCacheDir. We can override this value as well.
o Join Type: The type of join to be performed as Master Outer Join, Detail Outer Join,
Normal Join, or Full Outer Join.
o Tracing Level: It is used to track the Level of tracing in the session log file.
o Joiner Data Cache Size: It tells the size of the data cache. And Auto is the default value
of the data cache size.
o Joiner Index Cache Size: It tells the size of the index cache. And Auto is the default
value of the index cache size.
o Sorted Input: This option is used when the input data is in sorted order. And it gives
better performance.
o Master Sort Order: It gives the sort order of the master source data. If the master
source data is sorted in ascending order, then we choose Ascending. We have to enable
the Sorted Input option if we choose Ascending. And Auto is the default value for this
option.
o Transformation Scope: We can select the transformation scope as All Input or Row.
Types of Joins
In Informatica, the following joins can be created using the joiner transformation, such as:
1. Master outer join
2. Detail outer join
3. Full outer join
4. Normal join
In a normal join, only the matching rows are returned from both the sources.
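In database terms, the four join types roughly correspond to the following SQL sketches, assuming EMP as the detail source and DEPT as the master source, as in the example below (the DNAME column is assumed):

-- Normal join: only matching rows from both sources
SELECT e.*, d.DNAME FROM EMP e JOIN DEPT d ON e.DEPTNO = d.DEPTNO;
-- Master outer join: all detail (EMP) rows plus the matching master rows
SELECT e.*, d.DNAME FROM EMP e LEFT JOIN DEPT d ON e.DEPTNO = d.DEPTNO;
-- Detail outer join: all master (DEPT) rows plus the matching detail rows
SELECT e.*, d.DNAME FROM EMP e RIGHT JOIN DEPT d ON e.DEPTNO = d.DEPTNO;
-- Full outer join: all rows from both sources
SELECT e.*, d.DNAME FROM EMP e FULL OUTER JOIN DEPT d ON e.DEPTNO = d.DEPTNO;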
Example
In the following example, we will join emp and dept tables using joiner transformation in
the following steps:
Step 1: Create a new target table EMP_DEPTNAME in the database using the below
script and import the table in Informatica targets.
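The script itself is not reproduced here; a minimal Oracle-style sketch, assuming the classic EMP and DEPT columns, could look like:

CREATE TABLE EMP_DEPTNAME (
    EMPNO  NUMBER,
    ENAME  VARCHAR2(50),
    SAL    NUMBER,
    DEPTNO NUMBER,
    DNAME  VARCHAR2(50)   -- department name fetched through the join
);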
Step 2: Create a new mapping and import source tables "EMP" and "DEPT" and target
table, which we created in the previous step.
Step 4: Drag and drop all the columns from both the source qualifiers to the joiner
transformation.
Step 5: Double click on the joiner transformation, then in the edit transformation
window:
For performance optimization, we assign the master source to the source table pipeline
that has fewer records. To perform this task:
Step 7: Double click on the joiner transformation to open the edit properties window,
and then
Step 8: Link the relevant columns from the joiner transformation to the target table.
Now save the mapping and execute it after creating a session and workflow for it. The
join will be created using Informatica joiner, and relevant details will be fetched from
both the tables.
Sorted Input
When both the Master and detail source are sorted on the ports specified in the join
condition, then use the sorted input option in the joiner properties tab.
We can improve the performance by using the sorted input option as the integration
service performs the join by minimizing the number of disk IOs. It gives excellent
performance when we are working with large data sets.
Here are some steps to configure the sorted input option, such as:
o Sort the master and detail source either by using the source qualifier transformation or
sorter transformation.
o Sort both the source on the ports to be used in join conditions either in ascending or
descending order.
o Specify the Sorted Input option in the joiner transformation properties tab.
Blocking Transformation
The Joiner transformation is called a blocking transformation. The integration
service blocks and unblocks the source data depending on whether the joiner
transformation is configured for sorted input or not.
Unsorted Joiner Transformation
In the case of unsorted joiner transformation, the integration service first reads all the
master rows before it reads the detail rows.
The integration service blocks the detail source while it caches all the master rows. Once
it has read all the master rows, it unblocks the detail source and reads the detail
rows.
Sorted Joiner Transformation
The blocking logic may or may not be possible in the case of a sorted joiner transformation. The
integration service uses blocking logic if it can do so without blocking all sources in the
target load order group. Otherwise, it does not use blocking logic.
o We can improve the session performance by configuring the Sorted Input option in the
joiner transformation properties tab.
o Specify the source with fewer rows and with fewer duplicate keys as the Master and the
other source as detail.
o We cannot use joiner transformation when the input pipeline contains an update
strategy transformation.
Rank Transformation
Rank is an active and connected transformation that performs the filtering of data based
on the group and ranks. The rank transformation also provides the feature to do ranking
based on groups.
The rank transformation has an output port, and it is used to assign a rank to the rows.
In Informatica, it is used to select a bottom or top range of data. While string value ports
can be ranked, the Informatica Rank Transformation is used to rank numeric port values.
One might think MAX and MIN functions can accomplish this same task.
However, the rank transformation allows groups of records to be listed instead of a
single value or record. The rank transformation is created with the following types of
ports.
Rank Port
The port which is participated in a rank calculation is known as Rank port.
Variable Port
A port that allows us to develop expression to store the data temporarily for rank
calculation is known as a variable port.
The variable port will enable us to write expressions that are required for rank
calculation.
o Cache Directory: The directory is a space where the integration service creates the index
and data cache files.
o Top/Bottom: It specifies whether we want to select the top or bottom rank of data.
o Case-Sensitive String Comparison: It is used to sort the strings by using the case
sensitive.
o Tracing Level: The amount of logging to be tracked in the session log file.
o Rank Data Cache Size: The data cache size default value is 2,000,000 bytes. We can set a
numeric value or Auto for the data cache size. In the case of Auto, the Integration Service
determines the cache size at runtime.
o Rank Index Cache Size: The index cache size default value is 1,000,000 bytes. We can set
a numeric value or Auto for the index cache size. In the case of Auto, the Integration
Service determines the cache size at runtime.
After the Rank transformation identifies all rows that belong to a top or bottom rank, it
then assigns rank index values. If two rank values match, they receive the same value in
the rank index, and the transformation skips the next value.
The rank index is an output port only. We can pass the rank index to another
transformation in the mapping or directly to a target.
Defining Groups
The Rank transformation gives us group information like the aggregator transformation.
For example: If we want to select the 20 most expensive items by manufacturer, we
would first define a group for each manufacturer.
Example
Suppose we want to load the top five salaried employees for each department; we will
implement this using the rank transformation in the following steps:
Step 4: The rank transformation will be created in the mapping, select the done button
in the window.
Step 5: Connect all the ports from source qualifier to the rank transformation.
Step 6: Double click on the rank transformation, and it will open the "edit
transformation window". In this window,
Now, save the mapping and execute it after creating a session and workflow. The source
qualifier will fetch all the records, but the rank transformation will pass only the records
with the top five salaries for each department.
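The result is comparable to what the following analytic SQL sketch would return, assuming the standard EMP columns DEPTNO and SAL:

SELECT *
FROM  (SELECT e.*,
              RANK() OVER (PARTITION BY DEPTNO ORDER BY SAL DESC) AS SAL_RANK
       FROM   EMP e) ranked
WHERE ranked.SAL_RANK <= 5;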
The Sequence Generator transformation is used to create unique primary key values and
replace missing primary keys.
For example, if we want to assign sequence values to the source records, then we need
to use a sequence generator.
The sequence generator transformation consists of two output ports. We cannot edit or
delete these ports, such as:
1. CURRVAL
2. NEXTVAL
NEXTVAL
The NEXTVAL port is used to generate sequence numbers by connecting it to a
Transformation or target. The generated sequence numbers are based on the Current
Value and Increment By properties.
If the sequence generator is not configured to cycle, then the NEXTVAL port generates the
sequence numbers up to the configured End Value.
The sequence generator transformation creates a block of numbers at the same time. If
the block of numbers is used, then it generates the next block of sequence numbers.
For example, we might connect NEXTVAL to two target tables in mapping to create
unique primary key values.
The integration service generates a block of numbers 1 to 10 for the first target. When
the first block of numbers has been loaded, only then another block of numbers 11 to
20 will be generated for the second target.
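The behavior of NEXTVAL is analogous to a database sequence. A rough Oracle-style sketch of the same idea, assuming the STUD source and STUD_SEQUENCE target used in the example later in this section, would be:

-- Analogous database sequence: start value 1, increment by 1
CREATE SEQUENCE STUD_SEQ START WITH 1 INCREMENT BY 1;

-- Each reference to NEXTVAL returns the next number in the sequence
INSERT INTO STUD_SEQUENCE (SNO, SNAME)
SELECT STUD_SEQ.NEXTVAL, SNAME FROM STUD;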
CURRVAL
The CURRVAL port is NEXTVAL plus the Increment By value.
We only connect the CURRVAL port when the NEXTVAL port is already linked to a
downstream transformation.
If we connect the CURRVAL port without connecting the NEXTVAL port, the Integration
Service passes a constant value for each row.
When we connect only the CURRVAL port in a Sequence Generator transformation, the
Integration Service processes one row in each block.
Example
In the below example, we will generate sequence numbers and store in the target in the
following steps, such as:
Step 3: Create a new mapping and import STUD source and STUD_SEQUENCE target
table.
Step 6: Link the NEXTVAL column of sequence generator to the SNO column in the
target table.
Step 7: Link the other columns from source qualifier transformation to the target table.
Step 8: Double click on the sequence generator to open the property window, then
enter the properties with Start Value = 1, leave the other properties as default, and
click on the OK button.
Now save the mapping and execute it after creating the session and workflow.
The SNO column in the target would contain the sequence numbers generated by the
sequence generator transformation.
A transaction is the set of rows bound by commit or rollback rows. We can define a
transaction based on the varying number of input rows. We can also identify
transactions based on a group of rows ordered on a common key, such as employee ID
or order entry date.
When processing a high volume of data, there can be a situation to commit the data to
the target. If a commit is performed too quickly, then it will be an overhead to the
system.
If a commit is performed too late, then in the case of failure, there are chances of losing
the data. So the Transaction control transformation provides flexibility.
o Within a session: We configure a session for a user-defined commit. If the Integration
Service fails to transform or write any row to the target, we can choose to commit
or roll back the transaction.
When we run the session, then the Integration Service evaluates the expression for each
row that enters the transformation. When it evaluates a committed row, then it commits
all rows in the transaction to the target or targets. When the Integration Service
evaluates a rollback row, then it rolls back all rows in the transaction from the target or
targets.
If the mapping has a flat-file as the target, then the integration service can generate an
output file for a new transaction each time. We can dynamically name the target flat
files. Here is the example of creating flat files dynamically - Dynamic flat-file creation.
1. TC_CONTINUE_TRANSACTION
The Integration Service does not perform any transaction change for the row. This is the
default value of the expression.
2. TC_COMMIT_BEFORE
The Integration Service commits the transaction, begins a new transaction, and writes the
current row to the target. The current row is in the new transaction.
In tc_commit_before, when this flag is found set, then a commit is performed before the
processing of the current row.
3. TC_COMMIT_AFTER
The Integration Service writes the current row to the target, commits the transaction, and
begins a new transaction. The current row is in the committed transaction.
In tc_commit_after, the current row is processed then a commit is performed.
4. TC_ROLLBACK_BEFORE
The Integration Service rolls back the current transaction, begins a new transaction, and
writes the current row to the target. The current row is in the new transaction.
In tc_rollback_before, rollback is performed first, and then data is processed to write.
5. TC_ROLLBACK_AFTER
The Integration Service writes the current row to the target, rolls back the transaction, and
begins a new transaction. The current row is in the rolled-back transaction.
In tc_rollback_after data is processed, then the rollback is performed.
Step 2: Click on transformation in the toolbar, and click on the Create button.
Step 3: Select the transaction control transformation.
Step 4: Then, enter the name and click on the Create button.
Step 6: We can drag the ports into the transaction control transformation, or we can
create the ports manually in the ports tab.
Step 8: And enter the transaction control expression in the Transaction Control
Condition.
3. Properties Tab: It defines the transaction control expression and the tracing level.
The transaction control expression uses the IIF function to check each row against the
condition.
Syntax
Here is the following syntax for the Transaction Control transformation expression, such
as:
IIF (condition, value1, value2)
For example:
IIF (dept_id=11, TC_COMMIT_BEFORE, TC_ROLLBACK_BEFORE)
Example
In the following example, we will commit data to the target when dept no =10, and this
condition is found true.
Step 1: Create a mapping with EMP as a source and EMP_TARGET as the target.
Step 3: The transaction control transformation will be created, then click on the done
button.
Step 4: Drag and drop all the columns from source qualifier to the transaction control
transformation then link all the columns from transaction control transformation to the
target table.
Step 5: Double click on the transaction control transformation and then in the edit
property window:
1. "iif(deptno=10,tc_commit_before,tc_continue_transaction)".
3. It means if deptno 10 is found, then commit transaction in target, else continue the
current processing.
Now save the mapping and execute it after creating sessions and workflows. When the
department number 10 is found in the data, then this mapping will commit the data to
the target.
Lookup Transformation
It is a kind of join operation in which one of the joining tables is the source data, and the
other joining table is the lookup table.
The Lookup transformation is used to retrieve data based on a specified lookup
condition. For example, we can use a Lookup transformation to retrieve values from a
database table for codes used in source data.
When a mapping task includes a Lookup transformation, then the task queries the
lookup source based on the lookup fields and a lookup condition. The Lookup
transformation returns the result of the lookup to the target or another transformation.
We can configure the Lookup transformation to return a single row or multiple rows.
This is the passive transformation which allows performing the lookup on the flat files,
relational table, views, and synonyms.
When we configure the Lookup transformation to return multiple rows, the Lookup
transformation is an active transformation. The lookup transformation supports
horizontal merging, such as equijoin and nonequijoin.
When the mapping contains a lookup transformation, the integration service queries
the lookup data and compares it with the lookup input port values.
The lookup transformation is created with the following types of ports: input, output, lookup, and return ports. We can use the lookup transformation for purposes such as:
o Get a related value: Retrieve a value from the lookup table on the basis of a value in the
source. For example, the source has a student rollno. Retrieve the student name from the
lookup table.
o Get multiple values: Retrieve the multiple rows from a lookup table. For example, return
all students in a class.
o Perform a calculation: Retrieve any value from a lookup table and use it in a calculation.
For example, retrieve the marks, calculate the percentage, and return the percentage to a
target.
o Update slowly changing dimension tables: Determine the rows that exist in the target.
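Functionally, a connected lookup that gets a related value resembles a left outer join in SQL. A sketch for the student example, with assumed table and column names, might be:

-- Source rows keep flowing even when no lookup match is found (NULL is returned instead)
SELECT s.ROLLNO,
       s.MARKS,
       l.SNAME                        -- related value retrieved from the lookup table
FROM   STUD_SOURCE s
LEFT JOIN STUD_LOOKUP l ON l.ROLLNO = s.ROLLNO;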
Configure the Lookup Transformation
Configure the Lookup transformation to perform the different types of lookups, such as:
o Relational or flat-file lookup: Perform a lookup on a flat file or a relational table. When
we create a Lookup transformation by using a relational table as the lookup source, we
can connect to the lookup source using ODBC and import the table definition as the
structure for the Lookup transformation.
When we create a Lookup transformation by using a flat-file as a lookup source, the
Designer invokes the Flat-file Wizard.
o Pipeline lookup: Perform a lookup on application sources such as JMS or MSMQ.
Drag the source into the mapping and associate the Lookup transformation with the
source qualifier. Configure partitions to improve performance when the Integration Service
retrieves the source data for the lookup cache.
o Cached or uncached lookup: Cache the lookup source to improve performance. We can
use static or dynamic cache for caching the lookup source.
By default, the lookup cache remains static and does not change during the session.
With a dynamic cache, the Integration Service inserts or updates rows in the cache.
When we cache the target table as the lookup source, we can look up values in the cache
to determine if the values exist in the target. The Lookup transformation marks rows to
insert or update the target.
Normalizer Transformation
The Normalizer is an active transformation. It is used to convert a single row into
multiple rows. When the Normalizer transformation receives a row that contains
multiple-occurring data, it returns a row for each instance of the multiple-occurring
data.
If in a single row, there is repeating data in multiple columns, then it can be split into
multiple rows. Sometimes we have data in multiple occurring columns.
For example, a relational source includes four fields with quarterly sales data. We can
configure a Normalizer transformation to generate a separate output row for each quarter.
When the Normalizer returns multiple rows from an incoming row, it returns duplicate
data for single-occurring incoming columns.
Here are the following properties of Normalizer transformation in the Properties panel,
such as:
o Normalized Fields Tab: Define the multiple-occurring fields and specify additional fields
that you want to use in the mapping.
o Field Mapping Tab: Connect the incoming fields to the normalized fields.
Example
We create the following table that represents the student marks records of different
classes, such as:
Student_Name Class7 Class8 Class9 Class10
Joy 60 65 75 80
Edward 65 70 80 90
Step 1: Create the source table "stud_source" and target table "stud_target" using the
script and import them into Informatica.
Step 2: Create a mapping having source stud_source and target table stud_target.
Step 4: The transformation will be created; then click on the Done button.
Click on the OK button.
Columns will be generated in the transformation. We will see four marks columns
because we set the number of occurrences to 4.
Link the four class columns of the source qualifier to the normalizer columns,
respectively.
Link the student_name and marks columns from the normalizer to the target table.
Save the mapping and execute it after creating session and workflow. The class score
column is repeating in four columns. For each class score of the student, a separate row
will be created by using the Normalizer transformation.
The output of the above mapping will look like the following:
Student_Name Class Marks
Joy 7 60
Joy 8 65
Joy 9 75
Joy 10 80
Edward 7 65
Edward 8 70
Edward 9 80
Edward 10 90
The source data had repeating columns, namely class7, class8, class9, and class10. We
have rearranged the data to fit into a single class column, and for each source record,
four records are created in the target by using the Normalizer.
In this way, we can normalize data and create multiple records for a single source of
data.
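In SQL terms, the Normalizer output above could be produced with a UNION ALL over the repeating columns, assuming the stud_source table stores the marks in columns named class7 to class10:

SELECT STUDENT_NAME, 7  AS CLASS, CLASS7  AS MARKS FROM STUD_SOURCE
UNION ALL
SELECT STUDENT_NAME, 8  AS CLASS, CLASS8  AS MARKS FROM STUD_SOURCE
UNION ALL
SELECT STUDENT_NAME, 9  AS CLASS, CLASS9  AS MARKS FROM STUD_SOURCE
UNION ALL
SELECT STUDENT_NAME, 10 AS CLASS, CLASS10 AS MARKS FROM STUD_SOURCE;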
Performance Tuning in Informatica
Tuning starts with the identification of bottlenecks in the source, target, and mapping,
and then proceeds to session tuning. It might also require tuning of the system resources on
which the Informatica PowerCenter services are running.
We can use the test load option to run sessions when we tune session performance.
Adding partitions will improve the performance by utilizing more of the system
hardware while processing the session.
Determining the best way to improve performance can be complicated, so it's better to
change one variable at a time. If the session performance does not improve, then we
can return to the original configuration.
The goal of performance tuning is to optimize session performance so that the sessions
run during the available load window for the Informatica Server.
We can increase the session performance with the help of the following tasks, such as:
o Flat files: If the flat files are stored on a machine other than the Informatica server, then
move those files to the device that consists of the Informatica server.
o Less Connection: Minimize the connections to sources, targets, and Informatica server
to improve session performance. Moving the target database into the server system may
improve session performance.
o Staging areas: If we use staging areas, the Informatica server is forced to perform
multiple data passes. Removing staging areas can improve session performance, so
use a staging area only when it is mandatory.
o Informatica Servers: We can run the multiple Informatica servers against the same
repository. Distributing the session load into the multiple Informatica servers improves
the session performance.
o ASCII: Running the Informatica server in ASCII data movement mode improves session
performance, because ASCII data movement mode stores a character value in one byte,
whereas Unicode mode takes two bytes to store a character.
o Source qualifier: If a session joins multiple source tables in one Source Qualifier,
optimizing the query can improve performance. Also, single table select statements with
an ORDER BY or GROUP BY clause can be beneficial from optimization, such as adding
indexes.
o Drop constraints: If the target consists of key constraints and indexes, then it slows the
loading of data. To improve the session performance, drop constraints and indexes
before running the session (while loading facts and dimensions) and rebuild them after
completion of the session.
o Parallel sessions: Running parallel sessions by using concurrent batches also reduces
the time needed to load the data, so concurrent batches increase session performance.
o Packet size: We can improve session performance by configuring the network packet
size, which determines how much data crosses the network at one time. To do this, go to
the Server Manager and configure the database connections.
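The two index-related items above can be illustrated with a hedged SQL sketch. The table and index names (customers, sales_fact, idx_customers_region, idx_sales_cust) are assumptions for illustration only, and the exact DDL depends on the source and target databases.

-- Helping a source query that sorts or groups data: an index on the
-- GROUP BY column can let the database avoid a full sort (hypothetical names).
CREATE INDEX idx_customers_region ON customers (region);

-- Dropping a target index before a bulk load and rebuilding it afterwards,
-- typically from the session's pre-session and post-session SQL.
DROP INDEX idx_sales_cust;
-- ... the session loads sales_fact here ...
CREATE INDEX idx_sales_cust ON sales_fact (customer_id);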
Informatica BDM
Informatica Big Data Management (BDM) is a GUI-based integrated development tool.
Organizations use this tool to build Data Quality, Data Integration, and Data Governance
processes for their big data platforms.
Informatica BDM has a built-in Smart Executor that supports various processing engines
such as Apache Spark, Blaze, Apache Hive on Tez, and Apache Hive on MapReduce.
Informatica BDM is used to perform data ingestion into a Hadoop cluster, data
processing on the cluster, and extraction of data from the Hadoop cluster.
In Spark mode, the Informatica mappings are translated into Scala code.
In Hive and MapReduce mode, Informatica's mappings are translated into MapReduce
code and are executed natively on the Hadoop cluster.
Informatica BDM integrates seamlessly with the Hortonworks Data Platform (HDP)
Hadoop cluster in all related aspects, including its default authorization system. Ranger
can be used to enforce a fine-grained role-based authorization to data as well as
metadata stored inside the HDP cluster.
Informatica's BDM integrates with Ranger in all modes of execution. Informatica's BDM
has a Smart Executor that enables organizations to run their Informatica mappings
seamlessly on one or more methods of implementation under the purview of their
existing security setup.
Authentication
Authentication is the process of reliably ensuring that the user is who they claim to be.
Kerberos is the widely accepted authentication mechanism on Hadoop, including the
Hortonworks Data Platform. Kerberos protocol relies on a Key Distribution Center (KDC),
a network service that issues tickets permitting access.
Authorization
Authorization is the process of determining whether a user has access to perform
certain operations on a given system or not. In HDP Hadoop clusters, authorization
plays a vital role in ensuring the users access only the data that they are allowed to by
the Hadoop administrator.
1. Blaze
When an Informatica mapping is executed in Blaze mode, a call is made to the
Hive Metastore to understand the structure of the tables.
The Blaze runtime then loads the optimized mapping into memory. This mapping then
interacts with the corresponding Hadoop service to read or write the data.
The Hadoop service itself is integrated with Ranger, which ensures that authorization
takes place before the request is served.
2. Spark
Informatica BDM can execute mappings as Spark Scala code on the HDP Hadoop
cluster. The steps involved when using the Spark execution mode are described below.
The Spark executor translates Informatica's mappings into Spark Scala code. As part
of this translation, if Hive sources or targets are involved, the Spark executor makes a
call to the Hive Metastore to understand the structure of the Hive tables and optimize the
Scala code.
Then, this Scala code is submitted to YARN for execution. When the Spark code accesses
the data, the corresponding Hadoop service relies on Ranger for authorization.
3. Hive on MapReduce
Informatica BDM can execute mappings as MapReduce code on the Hadoop cluster.
The steps involved in the Hive on MapReduce mode are described below.
When a mapping is executed in Hive on MapReduce mode, the Hive executor on the
Informatica node translates the Informatica mapping into MapReduce and submits the
job to the Hadoop cluster.
If Hive sources or targets are involved, the Hive executor makes a call to the Hive
Metastore to understand the table structure and optimize the mapping accordingly. As the
MapReduce job interacts with Hadoop services such as HDFS and Hive, the Hadoop service
authorizes the requests with Ranger.
4. Hive on Tez
Tez can be enabled in Informatica BDM by a configuration change and is transparent to
the developed mapping.
Hence, mappings running on Hive on Tez follow a pattern similar to Hive on MapReduce.
When a mapping is executed in the Hive on Tez mode, the Hive executor on
the Informatica node translates the Informatica mapping into a Tez job and submits it to
the Hadoop cluster.
If Hive sources or targets are involved, the Hive executor makes a call to the Hive
Metastore to understand the table structure and optimize the mapping accordingly. As the
Tez job interacts with Hadoop services such as HDFS and Hive, the Hadoop service
authorizes the requests with Ranger.
Partitioning in Informatica
The PowerCenter Integration Service creates a default partition type at each partition
point. If we have the Partitioning option, we can change the partition type. The partition
type controls how the PowerCenter Integration Service distributes data among
partitions at partition points.
When we configure the partitioning information for a pipeline, then we must define a
partition type at each partition point in the pipeline. The partition type determines how
the PowerCenter Integration Service redistributes data across partition points.
The Workflow Manager provides the following partition types:
1. Database partitioning: The PowerCenter Integration Service queries the IBM DB2 or
Oracle system for table partition information and reads partitioned data from the
corresponding nodes in the database. Use database partitioning with Oracle or IBM DB2
source instances on a multi-node table space, and with DB2 targets (a rough
database-side sketch follows this list).
2. Hash partitioning: Use hash partitioning when we want the PowerCenter Integration
Service to distribute rows to the partitions by the group. For example, we need to sort
items by item ID, but we do not know how many items have a particular ID number.
There are two types of hash partitioning:
o Hash auto-keys: The PowerCenter Integration Service uses all grouped or sorted
ports as a compound partition key. Use hash auto-keys partitioning at Rank,
Sorter, and unsorted Aggregator transformations.
o Hash user keys: The PowerCenter Integration Service uses a hash function to
group rows of data among partitions. We define the ports that are used to
generate the partition key.
3. Key range: It specifies one or more ports to form a compound partition key. The
PowerCenter Integration Service passes data to each partition depending on the ranges
we define for each port. Use key range partitioning where the sources or targets in the
pipeline are partitioned by key range.
4. Pass-through: The PowerCenter Integration Service passes all rows at one partition
point to the next partition point without redistributing them. Choose pass-through
partitioning where we want to create a new pipeline stage to improve performance, but
do not want to change the distribution of data across partitions.
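As a rough database-side illustration of the partition metadata that the database partitioning type relies on, the following hypothetical Oracle DDL creates a range-partitioned table; the table, column, and partition names are assumptions for illustration only.

-- A range-partitioned Oracle source table; with the database partitioning
-- type, the Integration Service queries this partition information and reads
-- the partitioned data from the corresponding database nodes.
CREATE TABLE orders (
    order_id   NUMBER,
    order_date DATE,
    amount     NUMBER(12,2)
)
PARTITION BY RANGE (order_date) (
    PARTITION p_2023 VALUES LESS THAN (DATE '2024-01-01'),
    PARTITION p_2024 VALUES LESS THAN (DATE '2025-01-01'),
    PARTITION p_max  VALUES LESS THAN (MAXVALUE)
);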
o We cannot create a partition key for round-robin, hash auto-keys, and pass-through
partition.
o If the target has a bitmap index and we use pass-through partitioning to update the
target table, the session might fail because the bitmap index creates a locking problem.
o Partitioning increases the total DTM buffer memory requirement, so ensure there is
enough free memory to avoid memory allocation failures.
o We can use native database options as partition alternatives to increase the degree
of parallelism of query processing.
For example, in an Oracle database, we can specify a PARALLEL hint or alter the degree
of parallelism (DOP) of the table (see the sketch after this list).
o We can also combine Informatica partitioning and native database-level parallelism as
per the requirements.
For example, create two pass-through pipelines, each sending its query to the Oracle
database with the PARALLEL hint.
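Below is a minimal sketch of the Oracle-side options mentioned in the last two items; the table name sales_fact and the degree of parallelism of 4 are assumptions for illustration.

-- Request parallel execution for a single query with a hint.
SELECT /*+ PARALLEL(s, 4) */ customer_id, SUM(amount)
FROM   sales_fact s
GROUP  BY customer_id;

-- Or alter the table's default degree of parallelism (DOP) so that queries
-- against it are considered for parallel execution.
ALTER TABLE sales_fact PARALLEL 4;

In the two-pipeline example above, each pass-through pipeline would submit a query of this form through its own source qualifier.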
Informatica IDQ
Informatica Data Quality is a suite of applications and components that we can integrate
with Informatica PowerCenter to deliver enterprise-strength data quality capability in a
wide range of scenarios.
Data Quality Server: It is used to enable plan and file sharing and to run programs in a
networked environment. The Data Quality Server supports networking through service
domains and communicates with Workbench over TCP/IP.
Both Workbench and Server install with a Data Quality engine and a Data Quality
repository. Users cannot create or edit programs with Server, although they can run a
program on any Data Quality engine independently of Workbench through runtime
commands or from PowerCenter.
Users can apply parameter files, which modify program operations, to runtime
commands when running data quality projects on a Data Quality
engine. Informatica also provides a Data Quality Integration plug-in for PowerCenter.
A project is composed of one or more of the following types of components:
o Operational components perform the data analysis or data enhancement actions on the
data they receive.
IDQ has been a front runner in the Data Quality (DQ) tools market. Below is a
glance at the features these tools offer:
o Informatica Analyst
o Informatica Developer
Informatica Analyst: It is a web-based tool that business analysts and developers can
use to analyze, profile, cleanse, standardize, and scorecard data in an enterprise.
Role of Dictionaries
Projects can make use of reference dictionaries to identify, repair, or remove inaccurate
or duplicate data values. Informatica Data Quality projects can make use of three types
of reference data.
Standard dictionary files: These files are installed with Informatica Data Quality and
can be used by various kinds of components in Workbench. All dictionaries installed
with Data Quality are text dictionaries. These are plain-text files saved in the .DIC file
format, and they can be created and edited manually.
Database dictionaries: Informatica Data Quality users with database expertise can
design and specify dictionaries that are linked to database tables, and these dictionaries
can be updated dynamically when the underlying data is updated.
Third-party reference data: These data files are provided by third-party vendors and are
offered to Informatica customers as premium product options. The reference data
provided by third-party vendors is typically in database format.
Advantages
o Stage tables are immediately available to use in the Developer tool after synchronization,
eliminating the need to manually create physical data objects.
o Changes to the synchronized structures are reflected in the Developer tool automatically.
o Enables loading data into Informatica MDM's staging tables, bypassing the landing
tables.
Disadvantages
o Creating a connection for each Base Object folder in the Developer tool can be
inconvenient to maintain.
o Hub Stage options like Delta detection, hard delete detection, and audit trails are not
available.
o Rejected records are not captured in the _REJ table of the corresponding stage table but
are captured in the .bad file.
o Invalid lookup values are not rejected while data loads to the stage, unlike in the Hub
Stage process. The record with the invalid value gets rejected and captured by the Hub
Load process.
Advantages
o Quickly build transformations in IDQ's Informatica Developer tool rather than creating
complex Java functions.
o Unlike Informatica Platform staging, Hub Stage process options such as delta detection,
hard delete detection, audit trail are available for use.
Disadvantages
o Physical data objects need to be manually created for each staging table and manually
updated for any changes to the table.
o The IDQ function must contain all transformation logic to leverage the batching of
records. If any transformation logic is additionally defined in the MDM map, then calls
to the IDQ web service will be made one record at a time, leading to performance issues.
o Web service invocations are synchronous only, which can be a concern for large data
volumes.
Informatica MDM can also be used as a target in the Developer tool for loading data into
its landing tables.
Advantages
o The single connection created in the Developer tool for Informatica MDM is less
cumbersome when compared to creating multiple connections with Informatica platform
staging.
o Unlike Informatica Platform staging, Hub Stage process options such as delta detection,
hard delete detection, and audit trail are available for use.
Disadvantages
o Physical data objects need to be manually created for each landing table and manually
updated for any changes to the table.
o Mappings need to be developed at two levels: (i) source to landing and (ii) landing to
staging (direct mapping).
Informatica MDM can also be used as a target for loading data directly into the staging
tables of Informatica MDM, bypassing the landing tables.
Advantages
o The single connection created in the Developer tool for Informatica MDM is less
cumbersome when compared to creating multiple connections with Informatica platform
staging.
o It can be used for the lower version of Informatica MDM, where the Informatica Platform
staging option is not available.
Disadvantages
o Physical data objects need to be manually created for each staging table and manually
updated for any changes to the table.
o Hub Stage options such as delta detection, hard delete detection, and audit trails are
not available.
o Rejected records are not captured in the _REJ table of the corresponding stage table but
are captured in the .bad file.
o Invalid lookup values are not rejected while data loads to the stage, unlike in the Hub
Stage process. The record with the invalid value gets rejected and captured by the Hub
Load process.