
Informatica Tutorial

In this tutorial, you will learn how Informatica performs various activities such as data profiling, data
cleansing, transforming, and scheduling the workflows from source to target.

Informatica is used to extract the required data from operational systems, transform it on its own server,
and load it into the data warehouse.

Informatica is also introduced as a data integration tool. This tool is based on the ETL architecture. It
provides data integration software and services for different industries, businesses, government
organizations, as well as telecommunication, health care, insurance, and financial services.

It has a unique ability to connect to, fetch, and process data from different types of heterogeneous sources.

Data Extraction
The process of reading and extracting data from multiple source systems into the Informatica server
is called data extraction. Informatica can extract or read data from different sources such as SQL Server,
Oracle, and many more.

Data Transformation
Data transformation is a process of converting the data into the required format. Data transformation
supports the following activities, such as:

Data Merging: It integrates the data from multiple sources.

Data Cleansing: It cleans the data from unwanted or unnecessary information.

Data Aggregation: It aggregates the data using the aggregate function such as Sum(), min(), max(),
count(), etc.

Data Scrubbing: It is used to derive new data from existing data.
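As a rough illustration only, the merging, cleansing, and aggregation activities above could be sketched in plain SQL as follows (the table and column names are hypothetical and not part of any Informatica example):

-- Merge two hypothetical regional order tables, drop rows with a missing
-- customer_id (cleansing), and aggregate the result per customer.
SELECT customer_id,
       SUM(amount) AS total_amount,
       COUNT(*)    AS order_count
FROM (
      SELECT customer_id, amount FROM orders_region_a
      UNION ALL
      SELECT customer_id, amount FROM orders_region_b
     ) merged
WHERE customer_id IS NOT NULL
GROUP BY customer_id;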

Data Loading
Data loading is used to insert the data into a target system. There are two types of data loading, such
as:

1. Initial load or Full load: It is the first step; it adds or inserts the data into an empty target table.

2. Delta load or Incremental load or Daily load: This step takes place after the initial load, and it loads
only new records or records that have changed since the last load, as sketched below.
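A minimal SQL sketch of the two load types, assuming hypothetical staging and warehouse tables stg_orders and dw_orders that both carry a last_updated timestamp column:

-- Initial load or full load: the target table is empty, so insert everything.
INSERT INTO dw_orders
SELECT * FROM stg_orders;

-- Delta load or incremental load: load only records that are new or have
-- changed since the most recent load already present in the target.
INSERT INTO dw_orders
SELECT *
FROM stg_orders
WHERE last_updated > (SELECT MAX(last_updated) FROM dw_orders);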
Informatica is an easy-to-use ETL tool: you drag and drop different objects and design a process flow
diagram, called a mapping, for data extraction, transformation, and loading.

Why We Need Informatica?


-We need Informatica when we have to perform some operations on the data at the back end of a data
system.

-We need Informatica to modify or clean up the data based on a set of rules.

-Informatica makes it easy to load bulk data from one system to another.

-It provides a broad set of features such as integration of data from multiple unstructured,
semi-structured, or structured systems, row-level operations on data, and scheduling of data operations.

-It also supports metadata, so it preserves the information of the process and data operations.

Informatica Architecture
Informatica architecture is a service-oriented architecture (SOA). A service-oriented architecture is
defined as a group of services that communicate with each other. The communication can be a simple
data transfer, or it can be two or more services coordinating the same activity.

Informatica development is based on component-based development techniques. This technique uses
predefined components and functional units with their functionalities to get the result.

PowerCenter is based on component-based development methodologies. To build a dataflow from
the source to the target, it uses different components, and this process is called transformation.
Informatica ETL tool has the below services and components, such as:
1.Repository Service: It is responsible for maintaining Informatica metadata and provides access to the
same to other services.

2.Integration Service: This service helps in the movement of data from sources to the targets.

3.Reporting Service: This service generates the reports.

4.Nodes: This is a computing platform to execute the above services.

5.Informatica Designer: It creates the mappings between source and target.

6.Workflow Manager: It is used to create workflows or other tasks and their execution.

7.Workflow Monitor: It is used to monitor the execution of workflows.

8.Repository Manager: It is used to manage the objects in the repository.


INFORMATICA DOMAIN--
The Informatica domain is the fundamental administrative unit. It consists of nodes and services. These
nodes and services are categorized into folders or sub-folders based on administration requirements
and design architecture.

The domain provides two types of services, such as:

1.Service Manager: It manages domain operations such as logging, authentication, and authorization.
It runs the application services on the nodes and manages users and groups.

2.Application Services: It represents the server-specific services such as repository services, reporting
services, and integration services. The application service can run on different nodes based on
configuration.

NODE-- A node is a logical representation of a machine in a domain, and a domain can have multiple
nodes. We can configure the nodes to run application services, such as the integration service.

PowerCenter Repository
The PowerCenter repository is a relational database such as SQL Server, Oracle, or Sybase, and it is
maintained by the repository service. The database tables store the metadata.
Informatica client tools are of three types, such as:

1.Informatica designer 2.Informatica workflow manager 3.Informatica workflow monitor

Informatica provides the repository service, which is used to manage the repository. A repository service
exclusively handles requests for one repository, but it can run on multiple nodes for better
performance.

We can maintain different versions of the same objects because of its version control mechanism. It
also prevents multiple users from modifying the same object at the same time.

The objects created in the repository can have three states, such as:

Valid: Valid objects have the correct syntax according to Informatica and can be used in the execution
of a workflow.

Invalid: Invalid objects do not follow the standards or rules. Informatica checks whether the syntax and
properties of an object are valid when the object is saved.

Impacted: Impacted objects are valid objects whose child objects are invalid.

PowerCenter Repository Service


The PowerCenter repository service is a separate multi-threaded process. It allows clients to change the
metadata in the repository, and it accepts requests from the integration service for the metadata needed
to run workflows.

The repository service also maintains the connections from PowerCenter clients to the PowerCenter
repository. It inserts metadata into the repository, keeps it updated, and maintains consistency within
the repository metadata.

Domain configuration
In the Informatica ETL tool, the domain is the fundamental administrative unit. It is a parent entity that
contains other services such as the repository service, the integration service, and various nodes.

The Informatica admin console is used for the domain configuration, and the console is launched
through a web browser.

PowerCenter Client and Server Connectivity

PowerCenter client tools are installed on the client-side machines. These tools are the development
tools such as workflow manager, PowerCenter designer, repository manager, and workflow monitor.
The Informatica repository contains all the mappings and objects created in these client tools, and it
resides on the Informatica server. That's why client tools must have network connectivity to the server.

Also, the PowerCenter client connects to the sources and targets to import the metadata and structure
definitions. Thus, it also maintains connectivity to the source and target systems.

--PowerCenter client uses the TCP/IP protocols for connectivity with the integration service and
repository service.

--And PowerCenter client uses the ODBC drivers for the connectivity with the source or targets.

Repository Service
The repository service is a multi-threaded process. It maintains the connection between the
PowerCenter clients and the PowerCenter repository.

The repository service can fetch, insert, and update the metadata inside the repository. And it also
maintains the consistency inside the repository metadata.

Integration Service
The integration service is used as an execution engine in the Informatica. It helps in executing the tasks
which are created in the Informatica. Integration service works in the following manner, such as:

--A user executes a workflow.

--The Informatica instructs the integration service to execute the workflow.

--Then the integration service reads workflow details from the repository.

--The integration service starts the execution of the tasks inside the workflow.

--After the execution, the task status is updated, for example, Succeeded, Failed, or Aborted.

--Then it generates the session log and workflow log.

--This service loads the data into the target systems.

--Integration service combines data from different sources

Informatica PowerCenter
Informatica PowerCenter is an ETL tool that extracts data from its source, transforms this data
according to requirements, and loads this data into a target data warehouse.

We can build enterprise data warehouses with the help of Informatica PowerCenter, which is a product
of Informatica Corp.
The main components of Informatica PowerCenter are its client tools, server, repository, and repository
server. Both the PowerCenter server and repository server make up the ETL layer, which is used to
complete the ETL processing.

The PowerCenter server executes tasks based on workflows created by the workflow manager. The
workflow is monitored through the workflow monitor. The jobs are designed in the mapping designer inside
the program, which establishes a mapping between source and target.

Mapping is a pictorial representation of the flow of data from source to target. Aggregation, filtering,
and joining are significant examples of transformation.

Informatica PowerCenter provides the following services, such as:

--B2B exchange.

--Data governance.

--Data migration.

--Data warehousing.

--Data synchronization and replication.

--Integration Competency Centers (ICC).

--Master Data Management (MDM).

--Service-oriented architectures (SOA) and many more.

Informatica PowerCenter can manage the broadest range of data integration as a single
platform. The development of data warehouses and data marts is possible with the help of this ETL tool.

Informatica PowerCenter software meets the enterprise expectations and requirements for
scalability, security, and collaboration through the following capabilities, such as:

--Dynamic partitioning

--High availability/seamless recovery

--Metadata management

--Data masking

--Grid computing support, and more

The Informatica ETL or Informatica PowerCenter product consists of three significant applications,
such as:
1.Informatica PowerCenter Client Tools: These tools are designed to enable a developer:

--To report the metadata.

--To manage the repository.

--To monitor sessions' execution.

--To define mapping and run-time properties.

2.Informatica PowerCenter Repository: It is a center of Informatica tools where all data is stored, which
is related to the mapping, sources, or targets.

3.Informatica PowerCenter Server: It is a server where all the actions are executed. To fetch the data, it
connects with the sources and targets. Then it applies all the transformations and loads the data into the
target systems.

Mapping in Informatica
Mapping is a collection of source and target objects which are tied together through a set of
transformations. These transformations are defined by a set of rules that determine how the data flows
and how it is loaded into the targets.

Mapping in Informatica includes the following set of objects, such as:

Source definition: The source definition defines the structure and characteristics of the source, such as
basic data types, type of the data source, and more.

Transformation: It defines how the source data is changed, and various functions can be applied
during this process.

Target Definition: The target definition defines where the data will be loaded finally.

Links: A link connects the source definition with the target tables and the different transformations,
and it shows the flow of data between the source and target.

Why do we need Mapping?


In Informatica, a mapping is an object which defines the process of modifying the source data
before it reaches the target object.

Mapping can define the data transformation details and source or target object characteristics because
it is a primary object in the Informatica.

Mappings define the data transformation for each row at the individual column levels. And we can hold
multiple sources and targets in a single mapping.
Components of Mapping
Here are some essential elements used in mapping, such as:

Source tables

Mapping parameters and variables

Target objects

Mapping transformations

A mapping contains sources, targets, mapping parameters and variables, multiple transformations,
mapplets, and user-defined functions.

Mapping Source: Mapping sources are the objects from which the source data is fetched. A source can
be a flat file, a database table, a COBOL file, or an XML source.

Mapping target: The mapping target is the destination objects where the final data is loaded. A mapping
target can be a relational table of a database, XML file, or a flat-file. Sources and targets must be present
in any mapping with their different data types.

Mapping Parameters and Variables: These are optional user-defined objects. Mapping parameters
and variables are used to create temporary variable objects. They help us to define and store temporary
values during the processing of mapping data. This user-defined data type is designed for mapping.

Mapplets: The mapplets are objects which consist of a set of sources, transformations, or targets. With
the help of mapplets, we can reuse the existing functionality of a set of transformations.

What is Stage Mapping?


In a stage mapping, we create a replica of the source table.

For example, if we have a "Student" table, we may want to create an identical table, "Student_stage", in
the ETL schema.

In Stage Mappings,

--Source and target tables have identical structures.

--In the staging table, data is a replica of the source table data.

--The staging table may contain only a subset of the source data.

For example, if the source table contains student details of rollno 1, 2, 3, and 10, the stage table may
hold the student records of rollno 1 and 3 only.

--In the data warehouse, we create stage tables to make the data transformation process efficient and
to minimize the dependency of the ETL or data warehouse on the real-time operational system. This is
achieved by fetching only the relevant data.
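A minimal SQL sketch of such a stage table, using the Student example above (Oracle-style CREATE TABLE AS syntax; column names other than rollno are assumed for illustration):

-- Create the stage table with the same structure as the source
-- and load only the relevant subset of rows.
CREATE TABLE Student_stage AS
SELECT rollno, sname, class, section
FROM Student
WHERE rollno IN (1, 3);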

Mapping Parameters and Variables


In Informatica, we need to follow a predefined syntax and navigation to create parameters and
variables.

The fundamental difference between them is that a mapping parameter holds a constant value
throughout the session run, whereas a mapping variable can change its value during the session.

Informatica Cloud
Informatica Cloud is a data integration solution and platform that works as Software as a Service (SaaS).
Informatica Cloud can connect to on-premises and cloud-based applications, databases, flat files, file feeds,
and even social networking sites.

If we are using a cloud data warehouse such as AWS Redshift, Azure SQL Data Warehouse, or Snowflake,
then Informatica Cloud Data Integration solutions improve the overall performance, productivity, and
extensive connectivity to cloud and on-premises sources.

Informatica Cloud Connectors deliver connectivity to an enterprise application or database. Connectors
exist for many systems, including ACORD, Google BigQuery, Goldmine, JD Edwards, MS Access, MS
Dynamics, MS Great Plains, MS Navision, Netsuite, Oracle EBS, Salesforce, SQL Database, and SWIFT.
Architecture
The architecture of the Informatica Cloud consists of the following components:

Secure Agent
In Informatica Cloud architecture, the Secure Agent is a lightweight program. And it is used to connect
on-premise data with cloud applications. It is installed on a local machine and processes data locally and
securely under the enterprise firewall. All upgrades and updates are automatically sent and installed on
the Secure Agent regularly.

Connectors
The connectors are the second central part of the Informatica Cloud architecture. These are pre-built
integration solutions for data, devices, and connecting applications. Connectors provide connectivity to
cloud applications (Salesforce, Workday, Amazon Redshift) and on-premise applications (SQL Server,
Dynamics CRM, Teradata).

Informatica Cloud Components


Informatica Cloud has the following components, such as:

1.Informatica Cloud: A browser-based application that runs at the Informatica Cloud hosting facility. It
allows us to configure connections, create users, run, schedule, and monitor tasks.

2.Informatica Cloud hosting facility: It is the platform that runs the Informatica Cloud application. The
hosting facility stores all jobs and information of an organization in the PowerCenter repository.

3.Informatica Cloud Application: Applications can use to perform tasks, such as data synchronization,
contact validation, and data replication.

4.Informatica Cloud Secure Agent: A lightweight program installed on a local machine that runs all
tasks and provides firewall security between the hosting facility and the organization. When the Secure
Agent runs a job, it connects with the Informatica Cloud hosting facility to access task information.
It can connect directly and securely to sources and targets, transfer data between sources
and targets, and perform any additional task based on the requirements.

Connectivity
(The connectivity diagram of the Informatica Cloud is not reproduced here.)

Use Cases

The Informatica cloud has the following use cases, such as:

1. Data Synchronization

--Cloud application to cloud application


--SaaS to SaaS

--On-premise application to cloud application

--On-premise application to on-premise application

2. System Maintenance Tasks

--Performs Create, Read, Update, Delete (CRUD) operations

--Bulk pulling and loading data to 3rd party systems or applications

3. Data Replication

--Copy data sets

--Backup or Warehouse data

4. Mappings

--Processes to extract transform and load data from multiple sources to multiple targets.

5. Monitoring

--The activity log is used for all successful and unsuccessful jobs.

--The audit log is used for all events by users in an organization.

--The dashboard summary is used for tasks, availability, and data processing.

Development
In Informatica Cloud, Development is divided into Data Synchronization Tasks and Mappings.

Data synchronization tasks are ETL-based tasks. We can load the data from a source to a target with the
help of data synchronization and apply some transformations during the transfer.

Mappings are flows that allow for chaining multiple complex operations such as joins, filters, and
functions together to build a complex integration process.

Reporting
In Informatica Cloud, Reporting is divided into Activity Log and Activity Monitoring.

The Activity Log is responsible for generating a summary report of all successful and unsuccessful
executed jobs.

And Activity Monitoring is responsible for creating a list of all currently running jobs. Each job is listed
along with starting date or time, as well as rows processed.
Each process is also provided with information such as date, time, status, success rows, error rows, and
error message. The user can drill down into each job and download a session log that contains execution
information details at the logging level.

Informatica MDM
MDM stands for Master Data Management. It is a method of managing the organization's
data as a single coherent system. MDM is used to ensure the reliability of data that is
collected in various formats from different data sources. Reliable master data supports
data analytics, decision making, AI training, data initiatives, and digital
transformation.

Master data management can link all critical data with the master file. Once well
implemented, MDM is responsible for sharing the data across the enterprise. MDM is
used as an effective strategy for data integration.

Organizations depend on data to streamline their operations. The quality of
business intelligence, analytics, and AI results depends on the quality of data. Master
data management helps:

o In removing the duplicity of the data.

o In integrating the data from various data sources.

o In standardizing unrelated data so that the data can be used effectively.

o In eliminating inaccurate data.

o In enabling a single source of reference that's called the "Golden Record".

Master Data Management Processes


The full range of MDM is a mixture of underlying processes. The key MDM processes
include:

o Business rule administration

o Data aggregation

o Data classification

o Data collection

o Data consolidation
o Data distribution

o Data enrichment

o Data governance

o Data mapping

o Data matching

o Data normalization

Master data management creates a clear and strategic flow between all data sources
and the various destination systems.

Benefits of MDM
Clear and coherent data management is needed for a competitive business strategy.

Some important benefits of MDM are given below, such as:

o Control: Know where the data is, where it’s headed, and how secure it is.
o Data accuracy: Understand how closely our metrics track the actual underlying figures.

o Data consistency: Understand how closely our data flow tracks the underlying patterns.

Key Features
Some key features of MDM are listed below, such as:

o It provides a modular design.

o It supports a 360-degree view of the relationships between customers, products, suppliers, and other
entities.

o It supports third-party data integration.

o It gives 360 solutions and prebuilt data models and accelerators.

o It has High scalability.

o It provides an intelligent search.

o It supports intelligent matches and merges property.

o It has intelligent security.

o Data as a service.

Need of MDM
MDM solutions involve a broad range of transformation, data cleansing,
and integration practices. When data sources are added to the system, MDM
initiates processes to identify, collect, transform, and repair the data.

When the data meets the quality thresholds, we can maintain a high-quality master
reference with the help of the created schemas and taxonomies. By using MDM,
organizations can be confident that the data is accurate, up to date, and consistent all
over the enterprise.

Use Cases
Achieving consistency, control, and data accuracy is important because organizations
become dependent on data for all necessary operations. When executed effectively,
master data management helps organizations:

o To compete more effectively.


o To improve customer experiences by accurately identifying specific customers in different
departments.

o To improve operational efficiencies by reducing data-related friction.

o To streamline supplier relationships with vendor MDM.

o To understand the journey of the customer through customer MDM.

o To understand product life cycles in detail through product MDM.

MDM Challenges
Master data management is required to remove poor data quality from the enterprise.
For example, in a company, several customer records are stored in different formats in
different systems.

The organizations may face some delivery challenges such as unknown prospects,
overstock or understock products, and many other problems. Common data quality
challenges that include:

o Duplicate records

o Erroneous information

o Incomplete information

o Inconsistent records

o Mislabeled data

Causes
Here are some reasons for poor data quality, such as:

o A lack of standards in the organization.

o Having the same entity under different account numbers.

o Redundant or duplicate data.

o Varied field structures in different applications that define a particular format for data to
be entered, such as "John Smith" or "J. Smith".
Trends in Master Data Management
In 2018, many organizations had to comply with the EU's General Data Protection Regulation
(GDPR), which restricts the use of Personally Identifiable Information (PII) and gives
end users control over how that information is used.

On January 1, 2020, the California Consumer Privacy Act was slated to take effect, even though
its content could evolve based on the November 2018 election, and the Act may eventually be
replaced by a federal equivalent.

Many countries and jurisdictions are creating privacy laws. These laws impact companies
located or doing business in those locations. The result is an increased reliance on
master data management solutions.

Metadata management is an important aspect of MDM. Metadata management
is used to manage data about data. Metadata management helps:

o To ensure compliance with the organizations.

o To locate a specific data asset in the organizations.

o To manage the risks in the organizations.

o To make sense of data in organizations.

o To perform analytics of the data in multiple data sources inside and outside of the
organization.

Metadata management is always important. But nowadays, it is becoming even more
important because organizations are extending out to IIoT, IoT, and third-party data
sources as the amount of data continues to increase.

Master Data Management Best Practices


Data management reference architectures are provided by solution providers to
explain the basic concepts and help customers understand the company's
product offerings.

The master data management architectural elements and tools include the following:

o Data federation

o Data integration

o Data marts
o Data networks

o Data mining

o Data virtualization

o Data visualization

o Data warehouse

o Databases

o File systems

o Operational datastore

Master Data Management Future


Large and medium enterprises are increasingly dependent on master data management
tools as the volume and variety of data have continued to grow and their businesses
have evolved.

The MDM architectures become complex and unwieldy when a business adds more and
different types of MDM capabilities. Some vendors provide comprehensive solutions to
simplify the complexity and increase market share; these solutions replace individual point
solutions.

As businesses transition away from periodic business intelligence (BI) reports, MDM is
growing continuously. Master data management is also important because
organizations adopt and build AI-powered systems, and an organization may use some of its
data as training data for machine learning purposes.

Master data management and data management have become so important that
most organizations are hiring a Chief Data Officer (CDO), a Chief Analytics Officer (CAO),
or both.

When executed adequately, master data management allows companies to:

o Integrate the disparate data from various data sources into a single hub so it can
be replicated to other destinations.

o Provide a single view of master data among the destination systems.

o Copy master data from one system to another.


Informatica ETL
Informatica ETL is used for data extraction, and it is based on the data warehouse
concept, where the data is extracted from multiple different databases.

History
The multinational software company Ab Initio invented the ETL tool. This company is
located outside of Lexington, Massachusetts, United States, and it framed the GUI-based
parallel processing software that is called ETL.
Implementation of ETL Tool

1. Extract
The data is extracted from different data sources. Relational databases, flat files,
XML, Information Management System (IMS), and other data structures are included
among the standard data-source formats.

Instant data validation is used to confirm whether the pulled data from the sources have
the correct values in a given domain.
2. Transform
To prepare the data for loading into a target data source, we apply a set of rules and logical
functions to the extracted data. Cleaning the data means passing only the correct data
into the target source.

According to the business requirements, we can apply many transformation types to the
data. Some transformation types are key-based, column- or row-based, coded and
calculated values, joining different data sources, and many more.
3. Load
In this phase, we load the data into the target data source.

The three phases do not wait for each other to start or end; all three phases are
executed in parallel.

Uses in Real-Time Business


Informatica provides data integration products for ETL such as data quality,
data masking, data virtualization, master data management, data replication, etc.
Informatica ETL is the most common data integration tool, which is used for connecting
to and fetching data from different data sources.
Some use cases of this software are given below, such as:

1. An organization is migrating a new database system from an existing software system.

2. To set up a Data Warehouse in an organization, the data need to move from the
Production to Warehouse.

3. It works as a data cleansing tool where inaccurate records are detected, corrected, or
removed from a database.

Features of ETL Tool


Here are some essential features of the ETL tool, such as:

1. Parallel Processing

ETL is implemented by using the concept of parallel processing, which means multiple processes
running simultaneously. ETL works on three types of parallelism, such as:

o Data parallelism: splitting a single file into smaller data files.

o Pipeline parallelism: running several components simultaneously on the same data.

o Component parallelism: executable processes running simultaneously on different data to do the
same job.

2. Data Reuse, Data Re-Run, and Data Recovery

Each data row is provided with a row_id, and each piece of the process is supplied with a
run_id, so that the data can be tracked by these ids. We create checkpoints to mark the
completion of certain phases of the process. These checkpoints indicate whether a query
needs to be re-run to complete the task.

3. Visual ETL

The PowerCenter and Metadata Messenger are advanced ETL tools. These tools help to
produce structured data faster, automatically, and in an impactful way according to the business
requirements.

We can create database and metadata modules with a drag-and-drop mechanism as a
solution. It can automatically configure, connect, extract, transfer, and load the data
into the target system.
Characteristics of ETL Tool
Some attributes of the ETL tool are as follows:

1. It should increase data connectivity and scalability.

2. It should be capable of connecting multiple relational databases.

3. It should support CSV data files so that end users can import these files easily
without any coding.

4. It should have a user-friendly GUI so that end users can easily integrate the data with the
visual mapper.

5. It should allow the end-user to customize the data modules according to the business
requirements.

Why do you need ETL?


It is common for data from disparate sources to be brought together in one place
when creating a data warehouse so that it can be analyzed for patterns and insights.
It would be convenient if data from all these sources had a compatible schema from the outset, but
that happens very rarely.

ETL takes the heterogeneous data and makes it homogeneous. Without ETL, analyzing disparate
data and deriving business intelligence from it is impossible.

ETL Tool Products and Services


Informatica ETL products and services are used to improve business operations, reduce
the burden of big data management, provide high security of data and data recovery under unforeseen
conditions, and automate the process of developing and visually designing data.
The ETL tool products and services are divided into the following:

1. ETL with Big Data

2. ETL with Cloud

3. ETL with SAS

4. ETL with HADOOP

5. ETL with Metadata

6. ETL as Self-service access

7. Mobile optimized solution and many more.


Why is ETL Tool so trending?
The following qualities make the ETL tool so popular:

1. ETL tool has accurate and automates deployments.

2. It minimizes the risks of adopting new technologies.

3. It provides highly secured data.

4. It is self-owned.

5. It includes recovery from a data disaster.

6. It provides data monitoring and data maintenance.

7. It has an attractive and artistic visual data delivery.

8. It supports centralized and cloud-based servers.

9. It provides concrete firmware protection of data.

Side effects of ETL Tool


The organization becomes continuously dependent on the data integration tool. It is a machine, and
it will work only after receiving a programmed input.

There is a risk of a complete crash of the systems, and the impact depends on how well the data
recovery systems are built. Any misuse of even simple data may create a massive loss for the
organization.

Informatica Transformations
Informatica transformations are repository objects which can create, read, modify, or
pass data to the defined target structures such as tables, files, or any other targets.

In Informatica, the purpose of a transformation is to modify the source data according to
the requirements of the target system. It also ensures the quality of the data being
loaded into the target.

A Transformation is used to represent a set of rules, which define the data flow and how
the data is loaded into the targets.

Informatica provides multiple transformations to perform specific functionalities.

To pass data to a transformation, we need to connect ports to it, and the transformation
returns the output through its output ports.
Classification of Transformation
Transformations are classified into two categories: the first based on connectivity and the
second based on the change in the number of rows. First, we will look at the transformations
based on connectivity.

1. Here are two types of transformation based on connectivity, such as:

o Connected Transformations

o Unconnected Transformations

In Informatica, transformations that are connected to other transformations within a mapping
are called connected transformations.

For example, Source qualifier transformation of Source table Stud is connected to filter
transformation to filter students of a class.

Transformations that are not linked to any other transformations are called unconnected
transformations.

Their functionality is used by calling them inside other transformations. And these
transformations are not part of the pipeline.

The connected transformations are preferred when the transformation is called for every
input row or expected to return a value.

The unconnected transformations are useful if their functionality is required only periodically
or based upon certain conditions. For example, calculate the tax details only if the tax value
is not available.
2. Here are two types of transformations based on the change in the number of rows, such as:

o Active Transformations

o Passive Transformations

Active transformations are those that modify the data rows and the number of input
rows passed to them. For example, if a transformation receives 10 rows as
input and returns 15 rows as output, then it is an active
transformation. In an active transformation, the data is modified in the rows.

Passive transformations do not change the number of input rows. In passive
transformations, the number of input and output rows remains the same, and data is
modified at the row level only.

In a passive transformation, we cannot create new rows, and no existing rows are
dropped.

List of Transformations in Informatica


o Source Qualifier Transformation
o Aggregator Transformation

o Router Transformation

o Joiner transformation

o Rank Transformation

o Sequence Generator Transformation

o Transaction Control Transformation

o Lookup and Re-usable transformation

o Normalizer Transformation

o Performance Tuning for Transformation

o External Transformation

o Expression Transformation

What is Filter Transformation?


Filter transformation is an active transformation because it changes the number of
records. We can filter the records according to the requirements by using the filter
condition.

For example, to load only the student records having rollno equal to 20, we can put a
filter transformation in the mapping with the filter condition rollno=20. So only those
records which have rollno=20 will be passed by the filter transformation, and all other records
will be dropped.
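Conceptually, this filter transformation behaves like the WHERE clause in the SQL sketch below; the analogy is only illustrative, since the filtering happens inside the mapping rather than in the database:

SELECT *
FROM Stu
WHERE rollno = 20;   -- rows that do not satisfy the condition are dropped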

Step 1: Create a mapping having source "Stu" and target "Stu_target".

Step 2: Then in the mapping

1. Select Transformation menu

2. Select the create option.

Step 3: In the create transformation window

1. Select Filter Transformation from the list.

2. Enter Transformation name fltr_rollno_20


3. Select create option

Step 4: The filter transformation will be created; click on the Done button in the create
transformation window.

Step 5: In the mapping,

1. Drag and drop all the source qualifier columns to the filter transformation.

2. Link the columns of the filter transformation to the target table.

Step 6: Double click on the filter transformation to open its properties, and

1. Select the properties menu.

2. Click on the filter condition editor.

Step 7: Then,

1. Enter filter condition rollno=20.

2. Click on the OK button.

Step 8: Again in the edit transformation window,

1. We will see the filter condition in the properties tab.

2. Click on the OK button.

Now save the created mapping and execute it after creating a session and workflow. Only
the records with rollno = 20 will be loaded into the target table.

Source Qualifier Transformation


The source qualifier transformation is active and connected. It is used to represent the
rows that the integration service reads when it runs a session. We need to connect the
source qualifier transformation to the relational or flat file source definition in a mapping.

The source qualifier transformation converts the source data types to the Informatica
native data types. That's why there is no need to alter the data types of the ports.

The source qualifier transformation does the following tasks, such as:

o Joins: We can join two or more tables from the same source database. By default, the
sources are merged based on the primary key and foreign key relationships. This can be
changed by specifying the join condition in the "user-defined join" property.
o Filter rows: We can filter the rows from the source database. In the default query, the
integration service adds a WHERE clause.

o Sorting input: We can sort the source data by specifying the number for sorted ports. In
the default SQL query, the Integration Service adds an ORDER BY clause.

o Distinct rows: We can get distinct rows from the source by choosing the "Select
Distinct" property. In the default SQL query, the Integration Service adds a SELECT
DISTINCT statement.

o Custom SQL Query: We can write our own SQL query to do calculations (see the sketch below).
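As a rough sketch of how these properties shape the generated SQL, using the STUD table from the example later in this section (the exact query Informatica generates may differ, and the CLASS filter value is made up):

-- Default query, roughly: SELECT STUD.STUDNO, STUD.SNAME, ... FROM STUD
-- With Select Distinct, a source filter, and two sorted ports, the generated
-- query becomes conceptually similar to:
SELECT DISTINCT STUD.STUDNO, STUD.SNAME, STUD.CLASS, STUD.SEC
FROM STUD
WHERE STUD.CLASS = 10                  -- source filter
ORDER BY STUD.STUDNO, STUD.SNAME;      -- number of sorted ports = 2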

Source Qualifier Transformation Properties


The source qualifier transformation has the following properties, such as:

o SQL Query: It defines a custom query that replaces the default query of the Integration Service, which
is used to read data from sources. A custom query overrides entries for a custom join or a source filter.

o User-Defined Join: It specifies the condition which is used to join data from multiple sources
represented in the same Source Qualifier transformation.

o Source Filter: It specifies the filter condition that the Integration Service applies while querying the
rows.

o Number of Sorted Ports: It indicates the number of columns that are used when sorting rows queried
from relational sources. If we go with this option, the Integration Service adds an ORDER BY to the
default query when it reads source rows. The ORDER BY includes the number of ports specified, starting
from the top of the transformation. When selected, the database sort order must match the session sort
order.

o Tracing Level: It sets the amount of detail included in the session log when we run a session containing
this transformation.

o Select Distinct: Specifies if we want to select unique rows. The Integration Service includes a SELECT
DISTINCT statement if we choose this option.

o Pre-SQL: Pre-session SQL commands that run against the source database before the Integration
Service reads the source.

o Post-SQL: Post-session SQL commands that run against the source database after the Integration
Service writes to the target.

o Output is Deterministic: Relational source or transformation output that does not change between
session runs when the input data is consistent between runs. When we configure this property, the
Integration Service does not stage source data for recovery if transformations in the pipeline always
produce repeatable data.

o Output is Repeatable: Relational source or transformation output that is in the same order between
session runs when the order of the input data is consistent. When the output is deterministic and the
output is repeatable, the Integration Service does not stage source data for recovery.
Examples
In this example, we want to modify the source qualifier of the mapping
"m_stud_stud_target", so instead of returning all the columns, it will return only selected
columns.

Step 1: Open mapping "m_stud_stud_target" in mapping designer.

Step 2: Double click on the Source Qualifier transformation "SQ_STUD". It will open the
edit transformation property window for it. Then

1. Click on the properties tab.

2. Click on the SQLQuery Modify option, and this will open an SQL editor window.

Step 3: In the SQL editor window

1. Enter the following query


SELECT STUDNO, SNAME, CLASS, SEC FROM STUD
Note: we are selecting the columns STUDNO, SNAME, CLASS & SEC from the
source, so we have kept only those in the select query.
2. Click on the OK button.

Step 4: In the "edit transformations" window,

1. Select the Ports tab from the menu.

2. Under the ports tab, you will see all the ports. Keep only the ports STUDNO, SNAME,
CLASS, SEC and delete other ports

Step 5: After the deletion of ports, click OK Button.

Now, again click on the properties tab in the Edit Transformations window, and we will
see only the selected data. Clicking the "OK" button will open the SQL Editor window, and

1. It will confirm that the data you have chosen is correct and ready for loading into the target
table.

2. Click on the OK button.

Save the mapping (using Ctrl+S) and execute the workflow. After execution, only the
selected columns will be loaded into the target.

In this way, we can override in the source qualifier which columns need to be fetched from
the source, and this is the way to control which specific columns are brought inside
the mapping.

Aggregator Transformation
Aggregator transformation is an active transformation, and it is used to perform
calculations on the data such as sums, averages, counts, etc.

The integration service stores the group data and row data in the aggregate cache.
The aggregator transformation is more flexible than plain SQL aggregation because we can use
conditional clauses to filter rows.
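For instance, a rough SQL analogue of an aggregator that uses a conditional clause is sketched below (the STUD_MARKS table, its columns, and the pass mark of 40 are hypothetical):

-- Total and count of only the passing marks, grouped per class.
SELECT class,
       SUM(CASE WHEN marks >= 40 THEN marks ELSE 0 END) AS pass_marks_total,
       COUNT(CASE WHEN marks >= 40 THEN 1 END)          AS pass_count
FROM STUD_MARKS
GROUP BY class;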

Properties of Aggregator Transformation


Here are some features of aggregator transformation, such as:

o Aggregate Expression

o Group by port

o Sorted Input

o Aggregate cache

o Unsorted Input

Aggregate Expression

Aggregate functions are used to drive the aggregate expression, which can be
developed either in variable ports or output ports only.

Sorted input

The group by ports are sorted using a sorter transformation, and the aggregator receives the
sorted data as input to improve the performance of data aggregation.

The sorter transformation is kept before the aggregator transformation to perform
sorting on the group by ports.

Aggregate Cache

An integration service creates an aggregate cache.

Unsorted inputs
The aggregate cache contains the group by ports, the non-group by input ports, and the output
port, which provides the aggregate expression.

Aggregate Expressions
This transformation offers more functionality than SQL's group by statements because
one can apply conditional logic to groups within the aggregator
transformation. Different aggregate functions can be applied to individual output
ports within the transformation. Below is the list of these aggregate functions:

o AVG

o COUNT

o FIRST

o LAST

o MAX

o MEDIAN

o MIN

o PERCENTILE

o STDDEV

o SUM

o VARIANCE

Creating an Aggregator Transformation


Follows the following steps, such as:

Step 1: Go to the Mapping Designer, click on transformation in the toolbar and create.

Step 2: Select the Aggregator transformation, enter the name, and click create.

Step 3: Then click on the Done button.

It will create an aggregator transformation without ports.

To create ports, we can either drag the ports to the aggregator transformation or create
in the ports tab of the aggregator.
Configuring the Aggregator Transformation
We can configure the following components in aggregator transformation in the
Informatica.

Aggregate Cache: The integration service stores the group values in the index cache
and the row data in the data cache.

1. Aggregate Expression: We can enter expressions in the output port or variable port.

2. Group by Port: This option tells the integration service how to create groups. We can
configure input, output, or variable ports for the group.

3. Sorted Input: This option is used to improve session performance. This option applies
only when the input to the aggregator transformation is sorted on the group by ports.

Informatica Nested Aggregate Functions


We can nest one aggregate function within another aggregate function. We can either
use single-level aggregate functions or multiple nested functions in an aggregate
transformation.

But we cannot use both single-level and nested aggregate functions in the same aggregator
transformation in Informatica. The mapping designer marks the mapping as invalid if an
aggregator transformation contains both single-level and nested aggregate functions. If
we want to use both single-level and nested aggregate functions, we create separate
aggregator transformations.
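SQL itself does not allow aggregate functions to be nested directly, so a nested aggregate such as MAX(SUM(marks)) in an aggregator expression is roughly equivalent to aggregating an already aggregated result, as in this sketch (STUD_MARKS is the same hypothetical table as above):

-- Highest per-class total: the inner query aggregates per class,
-- and the outer query aggregates those totals again.
SELECT MAX(class_total) AS highest_class_total
FROM (
      SELECT class, SUM(marks) AS class_total
      FROM STUD_MARKS
      GROUP BY class
     ) t;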

Incremental Aggregation in Informatica


After creating a session that includes an aggregator transformation, we can enable the
Incremental Aggregation session option. When the integration service performs
incremental aggregation, it passes source data through the mapping and uses historical
cache data to perform the aggregation calculations incrementally.

Router Transformation
Router transformation is an active and connected transformation, and it is similar to the
filter transformation, which is used to filter the source data.

In a Router transformation, Data Integration uses a filter condition to evaluate each
row of incoming data. It checks the conditions of each user-defined group before
processing the default group.
If a row meets more than one group filter condition, Data Integration passes the row to
each of those groups. Rows that do not meet any of the conditions can be routed to a default
output group or dropped.

If we need to check the same input data based on multiple conditions, then we use a
Router transformation in a mapping instead of creating multiple Filter transformations.

The following points compare the Router transformation to the Filter transformation:

o Conditions: A Router transformation tests for multiple conditions in a single transformation, whereas a
Filter transformation checks for one condition per transformation.

o Handling rows that do not meet the condition: A Router transformation routes such rows to the default
output group or drops them, whereas a Filter transformation drops them.

o Incoming data: A Router transformation processes the data once, whereas the data is processed in each
Filter transformation.

For example, when filtering the data for rollno=20, we can also get those records
where rollno is not equal to 20. So, the router transformation gives multiple output groups,
and each output group can have its own filter condition.

Also, there is a default group, and this default group has the record sets that don't satisfy
any of the group conditions.

For example, if we have created two groups for the filter conditions rollno=20 and
rollno=30 respectively, then those records which do not have rollno 20 or 30 will be
passed into this default group.
The data which is rejected by the filter groups will be collected by this default group,
and sometimes there can be a requirement to store this rejected data. In this way, the
default output group can be useful.

To allow multiple filter conditions, the router transformation provides a group option.

o There is a default input group that takes input data.

o There is also a default output group that contains all those data which are not passed by
any filter condition.

o For every filter condition, an output group is created in router transformation. We can
connect different targets to these different groups.
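A rough SQL analogue of one router transformation with two groups and the default group is sketched below; note that, unlike these three separate queries, the router reads each input row only once and routes it to the matching group(s):

-- Group 1: filter condition rollno = 20
SELECT * FROM STUD WHERE rollno = 20;

-- Group 2: filter condition rollno = 30
SELECT * FROM STUD WHERE rollno = 30;

-- Default group: rows that satisfy none of the group conditions
SELECT * FROM STUD WHERE rollno NOT IN (20, 30) OR rollno IS NULL;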

Creating Router Transformation


Follows the following steps to create the router transformation, such as:

Step 1: Create a mapping having source "STUD" and target "STUD_TARGET."

Step 2: Then in the mapping

1. Select Transformation menu.

2. Click on the Create option.

Step 3: In the create transformation window

1. Select router transformation

2. Enter a name for the transformation "rtr_rollno_20"

3. Click on the Create option.

Step 4: The router transformation will be created in the mapping, select done option in
the window.

Step 5: Drag and drop all the columns from Source qualifier to router transformation.

Step 6: Double click on the router transformation, then in the transformation property
of it

1. Select the group tab.

2. Enter the group name rollno_30.

3. Click on the group filter condition.


Step 7: In the expression editor, enter the filter condition rollno=30 and select the OK
button.

Step 8: Click on the OK button in the group window.

Step 9: Connect the ports from the group rollno_30 of router transformation to target
table ports.

Now, when we execute this mapping, the filtered records will get loaded into the target
table.

Joiner Transformation
Joiner transformation is an active and connected transformation. It provides the option
of creating joins in the Informatica. By using the joiner transformation, the created joins
are similar to the joins in databases.

The joiner transformation is used to join two heterogeneous sources. The joiner
transformation joins sources on the basis of a condition that matches one or more pairs
of columns between the two sources.

The two input pipelines include a master and a detail pipeline. To join more than two sources, we
need to join the output of the joiner transformation with another source.
And to join n sources in a mapping, we need n-1 joiner transformations.

In joiner transformation, there are two sources which we are using for joins, such as:

o Master Source

o Detail Source

In the properties of joiner transformation, we can select which data source can be a
Master source and which source can be a detail source.

During execution, the master source is cached into memory for the joining purpose, so
it is advisable to select the source with fewer records as the master source.

Configuring Joiner Transformation


In Informatica, we configure the following properties of joiner transformation, such as:

o Case-Sensitive String Comparison: The integration service uses this option when we
are performing joins on string columns. By default, the case sensitive string comparison
option is checked.
o Cache Directory: Directory used to cache the master or detail rows. The default
directory path is $PMCacheDir. We can override this value as well.

o Join Type: The type of join to be performed as Master Outer Join, Detail Outer Join,
Normal Join, or Full Outer Join.

o Tracing Level: It is used to track the Level of tracing in the session log file.

o Joiner Data Cache Size: It tells the size of the data cache. And Auto is the default value
of the data cache size.

o Joiner Index Cache Size: It tells the size of the index cache. And Auto is the default
value of the index cache size.

o Sorted Input: This option is used when the input data is in sorted order. And it gives
better performance.

o Master Sort Order: It gives the sort order of the master source data. If the master
source data is sorted in ascending order, then we choose Ascending. We have to enable
the Sorted Input option if we choose Ascending. And Auto is the default value for this
option.

o Transformation Scope: We can select the transformation scope as All Input or Row.

Types of Joins
In Informatica, the following joins can be created using joiner transformation, such as:

1. Master outer join


In Master outer join, all records from the Detail source are returned by the join, and only
matching rows from the master source are returned.

2. Detail outer join


In detail outer join, only matching rows are returned from the detail source, and all rows
from the master source are returned.

3. Full outer join.


In a full outer join, all records from both the sources are returned. Master outer and detail
outer joins are equivalent to left and right outer joins in SQL, depending on which source is on the left.

4. Normal join
In normal join, only matching rows are returned from both the sources.
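A rough SQL analogue of the four join types, with DETAIL as the larger source and MASTER as the smaller one, joined on a deptno column as in the EMP/DEPT example that follows (the analogy is only conceptual; Informatica performs the join in its own engine):

-- Normal join: only matching rows from both sources.
SELECT * FROM DETAIL d INNER JOIN MASTER m ON d.deptno = m.deptno;

-- Master outer join: all detail rows plus matching master rows.
SELECT * FROM DETAIL d LEFT JOIN MASTER m ON d.deptno = m.deptno;

-- Detail outer join: all master rows plus matching detail rows.
SELECT * FROM DETAIL d RIGHT JOIN MASTER m ON d.deptno = m.deptno;

-- Full outer join: all rows from both sources.
SELECT * FROM DETAIL d FULL OUTER JOIN MASTER m ON d.deptno = m.deptno;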
Example
In the following example, we will join emp and dept tables using joiner transformation in
the following steps:

Step 1: Create a new target table EMP_DEPTNAME in the database using the below
script and import the table in Informatica targets.

Step 2: Create a new mapping and import source tables "EMP" and "DEPT" and target
table, which we created in the previous step.

Step 3: From the transformation menu, select create option and,

1. Select joiner transformation

2. Enter transformation name "jnr_emp_dept"

3. Select create option

Step 4: Drag and drop all the columns from both the source qualifiers to the joiner
transformation.

Step 5: Double click on the joiner transformation, then in the edit transformation
window:

1. Select the condition tab.

2. Click on the add new condition icon.

3. Select deptno in master and detail columns list.

Step 6: Then, in the same window:

1. Select the properties tab.

2. Select normal Join as join type.

3. Click on the OK button.

For performance optimization, we assign the master source to the source table pipeline
which has fewer records. To perform this task:

Step 7: Double click on the joiner transformation to open the edit properties window,
and then

1. Select the ports tab.


2. Select any column of a particular source that you want to make a master.

3. Click on the OK button.

Step 8: Link the relevant columns from the joiner transformation to the target table.

Now save the mapping and execute it after creating a session and workflow for it. The
join will be created using Informatica joiner, and relevant details will be fetched from
both the tables.

Sorted Input
When both the Master and detail source are sorted on the ports specified in the join
condition, then use the sorted input option in the joiner properties tab.

We can improve the performance by using the sorted input option as the integration
service performs the join by minimizing the number of disk IOs. It gives excellent
performance when we are working with large data sets.

Here are the steps to configure the Sorted Input option (a SQL sketch follows this list):

o Sort the master and detail source either by using the source qualifier transformation or
sorter transformation.

o Sort both the source on the ports to be used in join conditions either in ascending or
descending order.

o Specify the Sorted Input option in the joiner transformation properties tab.
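
One common way to deliver pre-sorted data is a SQL override in each source qualifier with an ORDER BY on the join ports. A sketch, assuming EMP and DEPT are joined on DEPTNO and the listed columns exist in the sources:

-- Override for the detail (EMP) source qualifier
SELECT EMPNO, ENAME, SAL, DEPTNO FROM EMP ORDER BY DEPTNO;

-- Override for the master (DEPT) source qualifier
SELECT DEPTNO, DNAME, LOC FROM DEPT ORDER BY DEPTNO;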

Blocking Transformation
The joiner Transformation is called as the blocking transformation. The integration
service blocks and unblocks the source data depending on whether the joiner
transformation is configured for sorted input or not.
Unsorted Joiner Transformation
In the case of unsorted joiner transformation, the integration service first reads all the
master rows before it reads the detail rows.

The Integration Service blocks the detail source while it caches all the master rows. Once it reads all the master rows, it unblocks the detail source and reads the detail rows.
Sorted Joiner Transformation
The blocking logic may or may not be possible in the case of a sorted Joiner transformation. The Integration Service uses blocking logic if it can do so without blocking all sources in the target load order group. Otherwise, it does not use blocking logic.

How to Improve Joiner Transformation Performance?


Below are some important points to improve the performance of a joiner
transformation, such as:

o If possible, perform joins in a database. Performing joins in a database is faster than performing joins in a session.

o We can improve the session performance by configuring the Sorted Input option in the joiner transformation properties tab.

o Specify the source with fewer rows and with fewer duplicate keys as the Master and the other source as detail.

Limitations of Joiner Transformation


Here are the following limitations of joiner transformation, such as:

o We cannot use joiner transformation when the input pipeline contains an update strategy transformation.

o We cannot connect a sequence generator transformation directly to the joiner transformation.

Rank Transformation
Rank is an active and connected transformation that performs the filtering of data based
on the group and ranks. The rank transformation also provides the feature to do ranking
based on groups.

The rank transformation has an output port, and it is used to assign a rank to the rows.

In Informatica, it is used to select a top or bottom range of data. The Rank transformation is typically used to rank numeric port values, although string ports can also be ranked. One might think the MAX and MIN functions can accomplish the same task; however, the rank transformation returns a group of records instead of a single value or record. The rank transformation is created with the following types of ports.

1. Input port (I)

2. Output port (O)

3. Variable port (V)

4. Rank Port (R)

Rank Port
The port that participates in the rank calculation is known as the Rank port.

Variable Port
A port that allows us to develop an expression to store data temporarily for the rank calculation is known as a variable port.

The variable port enables us to write the expressions that are required for the rank calculation.

Ports in a Rank Transformation


o Input port (I): Minimum of 1 required. Port to receive data from another transformation.

o Output port (O): Minimum of 1 required. Port we want to pass to other transformations.

o Variable port (V): Not required. It is used to store values or calculations for use in an expression.

o Rank port (R): Only 1 allowed. The Rank port is an input/output port that participates in the rank calculation. We link the Rank port to another transformation. For example: Total Salary.

Configuring the Rank Transformation


Let’s see how to configure the following properties of Rank transformation:

o Cache Directory: The directory is a space where the integration service creates the index
and data cache files.

o Top/Bottom: It specifies whether we want to select the top or bottom rank of data.

o Number of Ranks: It specifies the number of rows that we want to rank.

o Case-Sensitive String Comparison: It compares strings in a case-sensitive manner when ranking string ports.

o Tracing Level: The amount of logging to be tracked in the session log file.

o Rank Data Cache Size: The data cache size default value is 2,000,000 bytes. We can set a
numeric value or Auto for the data cache size. In the case of Auto, the Integration Service
determines the cache size at runtime.

o Rank Index Cache Size: The index cache size default value is 1,000,000 bytes. We can set
a numeric value or Auto for the index cache size. In the case of Auto, the Integration
Service determines the cache size at runtime.

What is Rank Index?


The Developer tool creates a rank index port for each Rank transformation. The Data
Integration Service uses the Rank Index port to store the ranking position for each row
in a group.

After the Rank transformation identifies all rows that belong to a top or bottom rank, it
then assigns rank index values. If two rank values match, they receive the same value in
the rank index, and the transformation skips the next value.

The rank index is an output port only. We can pass the rank index to another
transformation in the mapping or directly to a target.

Defining Groups
The Rank transformation lets us define groups for ranked rows, like the aggregator transformation.
For example: If we want to select the 20 most expensive items by manufacturer, we
would first define a group for each manufacturer.

Example
Suppose we want to load top 5 salaried employees for each department; we will
implement this using rank transformation in the following steps, such as:

Step 1: Create a mapping having source EMP and target EMP_TARGET

Step 2: Then in the mapping,

1. Select the transformation menu.

2. And click on the Create option.

Step 3: In the create transformation window,

1. Select rank transformation.

2. Enter transformation name "rnk_salary".

3. And click on the Create button.

Step 4: The rank transformation will be created in the mapping, select the done button
in the window.

Step 5: Connect all the ports from source qualifier to the rank transformation.

Step 6: Double click on the rank transformation, and it will open the "edit
transformation window". In this window,

1. Select the properties menu.

2. Select the "Top" option from the Top/Bottom property.

3. Enter 5 in the number of ranks.

Step 7: In the "edit transformation" window again,

1. Select the ports tab.

2. Select group by option for the Department number column.

3. Select Rank in the Salary Column.

4. Click on the OK button.


Step 8: Connect the ports from rank transformation to the target table.

Now, save the mapping and execute it after creating the session and workflow. The source qualifier will fetch all the records, but the rank transformation will pass only the records having the top five salaries for each department.
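
The result of this mapping is conceptually the same as the following analytic query. This is an Oracle-style sketch for comparison only; the rank transformation does not run this SQL.

SELECT EMPNO, ENAME, SAL, DEPTNO
FROM (
  SELECT EMPNO, ENAME, SAL, DEPTNO,
         RANK() OVER (PARTITION BY DEPTNO ORDER BY SAL DESC) AS RNK
  FROM EMP
)
WHERE RNK <= 5;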

Sequence Generator Transformation


Sequence generator is a passive and connected transformation, and it generates
numeric sequence values such as 1, 2, 3, and so on. It does not affect the number of
input rows.

The Sequence Generator transformation is used to create unique primary key values and
replace missing primary keys.

For example, if we want to assign sequence values to the source records, then we need
to use a sequence generator.

The sequence generator transformation consists of two output ports. We cannot edit or
delete these ports, such as:

1. CURRVAL

2. NEXTVAL

NEXTVAL
The NEXTVAL port is used to generate sequence numbers by connecting it to a
Transformation or target. The generated sequence numbers are based on the Current
Value and Increment By properties.

If the sequence generator is not configured to Cycle, then the NEXTVAL port generates sequence numbers up to the configured End Value.

We can connect the NEXTVAL port to multiple transformations to generate unique values for each row.

The sequence generator transformation creates a block of numbers at the same time. If
the block of numbers is used, then it generates the next block of sequence numbers.

For example, we might connect NEXTVAL to two target tables in mapping to create
unique primary key values.
The integration service generates a block of numbers 1 to 10 for the first target. When
the first block of numbers has been loaded, only then another block of numbers 11 to
20 will be generated for the second target.

CURRVAL
The CURRVAL port is NEXTVAL plus the Increment By value.

We only connect the CURRVAL port when the NEXTVAL port is already linked to a
downstream transformation.

If we connect the CURRVAL port without connecting the NEXTVAL port, the Integration Service passes a constant value for each row.

When we connect the CURRVAL port in a Sequence Generator transformation, the Integration Service processes one row in each block.

We can optimize performance by connecting only the NEXTVAL port in a Mapping.

Example: Suppose STUD is a source table. Create a target STUD_SEQ_GEN_EXAMPLE in the shared folder with the same structure as STUD, and add two more ports NEXT_VALUE and CURR_VALUE to the target table.

We can create a Sequence Generator transformation to use in a single mapping, or a reusable Sequence Generator transformation to use in multiple mappings.

A reusable Sequence Generator transformation maintains the integrity of the sequence in each mapping that uses an instance of the Sequence Generator transformation.

Properties of Sequence Generator Transformation


Below are the properties used to configure a sequence data object and a new sequence:

o Start Value: The start value of the generated sequence that the Integration Service uses if we select the Cycle option. If we select Cycle, the Integration Service cycles back to this value when it reaches the end value. The default value is 0. Maximum value is 9,223,372,036,854,775,806.

o End Value: The maximum value that the Integration Service generates. If the Integration Service reaches this value during the session and the sequence is not configured to cycle, the session fails. Maximum value is 9,223,372,036,854,775,807.

o Increment Value: The difference between two consecutive values from the NEXTVAL port. The default value is 1, and it must be a positive integer. Maximum value is 2,147,483,647.

o Cycle: If enabled, the Integration Service cycles through the sequence range and starts over with the start value. If disabled, the Integration Service stops the sequence at the configured end value and fails the session with overflow errors if it reaches the end value and still has rows to process.

o Reset: If enabled, the Integration Service resets the sequence to the start value after the mapping run completes. If disabled, the Integration Service increments the current value after the mapping run ends and uses that value in the next mapping run. This property is disabled for reusable Sequence Generator transformations and for non-reusable Sequence Generator transformations that use a reusable sequence data object.

o Tracing Level: The level of detail about the transformation that the Integration Service writes into the mapping log. We can choose terse, normal, verbose initialization, or verbose data. Normal is the default level.

o Maintain Row Order: Maintains the row order of the input data to the transformation. Select this option if the Integration Service should not perform any optimization that can change the row order.

Example
In the below example, we will generate sequence numbers and store them in the target in the following steps, such as:

Step 1: Create a target table.

Step 2: Import that created table in Informatica as the target table.

Step 3: Create a new mapping and import STUD source and STUD_SEQUENCE target
table.

Step 4: Create a new transformation in the mapping,

1. Select sequence transformation as the type.

2. Enter transformation name such as seq_stud.

3. Click on the Create button.

Step 5: Sequence generator transformation will be created, then click on the Done button.

Step 6: Link the NEXTVAL column of sequence generator to the SNO column in the
target table.

Step 7: Link the other columns from source qualifier transformation to the target table.

Step 8: Double click on the sequence generator to open the property window, and then

1. Select the properties tab.

2. Enter the properties with Start value =1 and leave the other properties as default.

3. Click on the OK button.

Now save the mapping and execute it after creating the session and workflow.
The SNO column in the target would contain the sequence numbers generated by the
sequence generator transformation.
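
Conceptually, NEXTVAL behaves like a database sequence. The following Oracle-style sketch shows the same idea in SQL; it is illustrative only, since the transformation generates the values inside the Integration Service rather than in the database, and the STUD column names are assumptions.

CREATE SEQUENCE stud_seq START WITH 1 INCREMENT BY 1;

INSERT INTO STUD_SEQUENCE (SNO, SNAME, CLASS)
SELECT stud_seq.NEXTVAL, SNAME, CLASS
FROM STUD;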

Transaction Control Transformation


A Transaction Control transformation is an active and connected transformation. It
allows us to commit and rollback transactions based on a set of rows that pass through
a Transaction Control transformation.

Commit and rollback operations are of significant importance as they guarantee the availability of data.

A transaction is the set of rows bound by commit or rollback rows. We can define a
transaction based on the varying number of input rows. We can also identify
transactions based on a group of rows ordered on a common key, such as employee ID
or order entry date.

When processing a high volume of data, there can be a situation to commit the data to
the target. If a commit is performed too quickly, then it will be an overhead to the
system.

If a commit is performed too late, then in the case of failure, there are chances of losing
the data. So the Transaction control transformation provides flexibility.

In PowerCenter, the transaction control transformation is defined at the following levels:

o Within a mapping: Within a mapping, we use the Transaction Control transformation to define a transaction. We define transactions using an expression in a Transaction Control transformation. Based on the return value of the expression, we can choose to commit, roll back, or continue without any transaction change.

o Within a session: We configure a session for a user-defined commit. If the Integration Service fails to transform or write any row to the target, then we can choose to commit or roll back the transaction.

When we run the session, the Integration Service evaluates the expression for each row that enters the transformation. When it evaluates a commit row, it commits all rows in the transaction to the target or targets. When the Integration Service evaluates a rollback row, it rolls back all rows in the transaction from the target or targets.

If the mapping has a flat file as the target, the Integration Service can generate an output file for each new transaction, and we can dynamically name the target flat files.

TCL COMMIT & ROLLBACK Commands


There are five built-in variables available in the transaction control transformation to handle the operation (a usage sketch follows this list).

1. TC_CONTINUE_TRANSACTION
The Integration Service does not perform any transaction change for the row. This is the
default value of the expression.

2. TC_COMMIT_BEFORE
The Integration Service commits the transaction, begins a new transaction, and writes the
current row to the target. The current row is in the new transaction.
With TC_COMMIT_BEFORE, a commit is performed before the current row is processed.

3. TC_COMMIT_AFTER
The Integration Service writes the current row to the target, commits the transaction, and
begins a new transaction. The current row is in the committed transaction.
In tc_commit_after, the current row is processed then a commit is performed.

4. TC_ROLLBACK_BEFORE
The Integration Service rolls back the current transaction, begins a new transaction, and
writes the current row to the target. The current row is in the new transaction.
In tc_rollback_before, rollback is performed first, and then data is processed to write.

5. TC_ROLLBACK_AFTER
The Integration Service writes the current row to the target, rolls back the transaction, and begins a new transaction. The current row is in the rolled-back transaction.
With TC_ROLLBACK_AFTER, the data is processed first, and then the rollback is performed.
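
As a usage sketch of these variables, a common pattern is to commit after every fixed number of rows. The expression below assumes an upstream Expression transformation maintains a running row counter in a port named ROW_COUNT; the port name and batch size are illustrative.

IIF(MOD(ROW_COUNT, 10000) = 0, TC_COMMIT_AFTER, TC_CONTINUE_TRANSACTION)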

How to Create Transaction Control Transformation


Follow these steps to create a transaction control transformation:

Step 1: Go to the mapping designer.

Step 2: Click on transformation in the toolbar, and click on the Create button.
Step 3: Select the transaction control transformation.

Step 4: Then, enter the name and click on the Create button.

Step 5: Now click on the Done button.

Step 6: We can drag the ports into the transaction control transformation, or we can
create the ports manually in the ports tab.

Step 7: Go to the properties tab.

Step 8: And enter the transaction control expression in the Transaction Control
Condition.

Configuring Transaction Control Transformation


Here are the components that can be configured in the transaction control transformation:

1. Transformation Tab: Rename the transformation and add a description.

2. Ports Tab: Create input or output ports.

3. Properties Tab: Define the transaction control expression and tracing level.

4. Metadata Extensions Tab: Add metadata information.

Transaction Control Expression


We can enter the transaction control expression in the Transaction Control Condition
option in the properties tab.

The transaction control expression uses the IIF function to check each row against the
condition.

Syntax

Here is the syntax for the Transaction Control transformation expression:

IIF (condition, value1, value2)

For example:

IIF (dept_id = 11, TC_COMMIT_BEFORE, TC_ROLLBACK_BEFORE)
Example
In the following example, we will commit data to the target when the condition deptno = 10 is found to be true.
Step 1: Create a mapping with EMP as a source and EMP_TARGET as the target.

Step 2: Create a new transformation using the transformation menu, then

1. Select a transaction control as the new transformation.

2. Enter transformation name tc_commit_dept10.

3. And click on the create button.

Step 3: The transaction control transformation will be created, then click on the done
button.

Step 4: Drag and drop all the columns from source qualifier to the transaction control
transformation then link all the columns from transaction control transformation to the
target table.

Step 5: Double click on the transaction control transformation and then in the edit
property window:

1. Select the property tab.

2. Click on the transaction control editor icon.

Step 6: In the expression editor enter the following expression:

1. "iif(deptno=10,tc_commit_before,tc_continue_transaction)".

2. And click on the OK button.

3. It means if deptno 10 is found, then commit transaction in target, else continue the
current processing.

Step 7: Click on the OK button in the previous window.

Now save the mapping and execute it after creating sessions and workflows. When the
department number 10 is found in the data, then this mapping will commit the data to
the target.

Lookup Transformation in Informatica


Lookup transformation is used to look up a source, source qualifier, or target to get the
relevant data.

It is a kind of join operation in which one of the joining tables is the source data, and the
other joining table is the lookup table.
The Lookup transformation is used to retrieve data based on a specified lookup
condition. For example, we can use a Lookup transformation to retrieve values from a
database table for codes used in source data.

When a mapping task includes a Lookup transformation, then the task queries the
lookup source based on the lookup fields and a lookup condition. The Lookup
transformation returns the result of the lookup to the target or another transformation.

We can configure the Lookup transformation to return a single row or multiple rows. When it returns a single row, it is a passive transformation that allows performing the lookup on flat files, relational tables, views, and synonyms.

When we configure the Lookup transformation to return multiple rows, the Lookup transformation is an active transformation. The lookup transformation supports horizontal merging, such as equijoin and non-equijoin conditions.

When the mapping contains a lookup transformation, the integration service queries the lookup data and compares it with the lookup input port values.

The lookup transformation is created with the following type of ports, such as:

o Input port (I)

o Output port (O)

o Look up Ports (L)

o Return Port (R)

Perform the following tasks using a Lookup transformation, such as:

o Get a related value: Retrieve a value from the lookup table on the basis of a value in the
source. For example, the source has a student rollno. Retrieve the student name from the
lookup table.

o Get multiple values: Retrieve the multiple rows from a lookup table. For example, return
all students in a class.

o Perform a calculation: Retrieve any value from a lookup table and use it in a calculation.
For example, retrieve the marks, calculate the percentage, and return the percentage to a
target.

o Update slowly changing dimension tables: Determine the rows that exist in the target.
Configure the Lookup Transformation
Configure the Lookup transformation to perform the different types of lookups, such as:

o Relational or flat-file lookup: Perform a lookup on a flat file or a relational table. When
we create a Lookup transformation by using a relational table as the lookup source, we
can connect to the lookup source using ODBC and import the table definition as the
structure for the Lookup transformation.
When we create a Lookup transformation by using a flat-file as a lookup source, the
Designer invokes the Flat-file Wizard.

o Pipeline lookup: Perform a lookup on application sources such as JMS or MSMQ. Drag the source into the mapping and associate the Lookup transformation with the source qualifier. Configure partitions to improve performance when the Integration Service retrieves source data for the lookup cache.

o Connected or unconnected lookup: A connected Lookup transformation receives source data, performs a lookup, and returns data to the pipeline. An unconnected Lookup transformation is not connected to a source or target; instead, a transformation in the pipeline calls the Lookup transformation with a :LKP expression, and the unconnected Lookup transformation returns one column to the calling transformation (see the expression sketch after this list).

o Cached or uncached lookup: Cache the lookup source to improve performance. We can
use static or dynamic cache for caching the lookup source.
By default, the lookup cache remains static and does not change during the session.
With a dynamic cache, the Integration Service inserts or updates rows in the cache.
When we cache the target table as the lookup source, we can look up values in the cache
to determine if the values exist in the target. The Lookup transformation marks rows to
insert or update the target.
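
As referenced above, an unconnected lookup is called through a :LKP expression. A sketch, assuming an unconnected Lookup transformation named lkp_student_details that returns the student name for a given roll number (both names are illustrative):

-- Expression for an output port such as STUDENT_NAME
:LKP.lkp_student_details(ROLLNO)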

Normalizer Transformation
The Normalizer is an active transformation. It is used to convert a single row into
multiple rows. When the Normalizer transformation receives a row that contains
multiple-occurring data, it returns a row for each instance of the multiple-occurring
data.

If in a single row, there is repeating data in multiple columns, then it can be split into
multiple rows. Sometimes we have data in multiple occurring columns.

For example, a relational source includes four fields with quarterly sales data. We can configure a Normalizer transformation to generate a separate output row for each quarter.

When the Normalizer returns multiple rows from an incoming row, it returns duplicate
data for single-occurring incoming columns.

The Normalizer transformation receives a row that contains multiple-occurring columns and returns a row for each instance of the multiple-occurring data. The transformation processes multiple-occurring columns or multiple-occurring groups of columns in each source row.

Here are the following properties of Normalizer transformation in the Properties panel,
such as:

o Normalized Fields Tab: Define the multiple-occurring fields and specify additional fields
that you want to use in the mapping.

o Field Mapping Tab: Connect the incoming fields to the normalized fields.

We need the appropriate license to use the Normalizer transformation.

The Normalizer transformation parses multiple-occurring columns from COBOL sources, relational tables, or other sources. It can process multiple record types from a COBOL source that contains a REDEFINES clause.

Normalizer Transformation Types


Here are the two types of Normalizer transformation, such as:

o VSAM Normalizer Transformation: A non-reusable transformation that is a Source Qualifier transformation for a COBOL source. The Mapping Designer creates VSAM Normalizer columns from a COBOL source in a mapping. The column attributes are read-only. The VSAM Normalizer receives a multiple-occurring source column through one input port.

o Pipeline Normalizer Transformation: A transformation that processes multiple-occurring data from relational tables or flat files. We create the columns manually and edit them in the Transformation Developer or Mapping Designer. The pipeline Normalizer transformation represents multiple-occurring columns with one input port for each source column occurrence.

Example
We create the following table that represents the student marks records of different
classes, such as:

Step 1: Create the source table "stud_source" and target table "stud_target" using the
script and import them in Informatica.

Student Name   Class 7   Class 8   Class 9   Class 10
Joy            60        65        75        80
Edward         65        70        80        90

Step 2: Create a mapping having source stud_source and target table stud_target.

Step 3: From the transformation menu create a new transformation

1. Select normalizer as transformation.

2. Enter the name nrm_stud.

3. And click on the Create button.

Step 4: The transformation will be created, then click on the Done button.

Step 5: Double click on the normalizer transformation, then

1. Select the normalizer tab.

2. Click on the icon to create two columns.

3. Enter column names.

4. Set number of occurrences to 4 for marks and 0 for student name.

5. Click on the OK button.

Columns will be generated in the transformation. We will see four marks columns because we set the number of occurrences to 4.

Step 6: Then in the mapping

1. Link the four class columns of the source qualifier to the normalizer columns, respectively.

2. Link the student name column to the normalizer column.

3. Link student_name & marks columns from normalizer to the target table.

Save the mapping and execute it after creating session and workflow. The class score
column is repeating in four columns. For each class score of the student, a separate row
will be created by using the Normalizer transformation.

The output of the above mapping will look like the following:

Student Name   Class   Score
Joy            7       60
Joy            8       65
Joy            9       75
Joy            10      80
Edward         7       65
Edward         8       70
Edward         9       80
Edward         10      90

The source data had repeating columns, namely class7, class 8, class 9, and class 10. We
have rearranged the data to fit into a single column of class, and for one source record,
four records are created in the target by using Normalizer.
In this way, we can normalize data and create multiple records for a single source of
data.
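
In plain SQL, the same reshaping could be written with a UNION ALL. This is a sketch against the stud_source table shown above; the column names (STUDENT_NAME, CLASS7 through CLASS10) are assumptions about how the repeating columns are named.

SELECT STUDENT_NAME, 7 AS CLASS, CLASS7 AS SCORE FROM STUD_SOURCE
UNION ALL
SELECT STUDENT_NAME, 8, CLASS8 FROM STUD_SOURCE
UNION ALL
SELECT STUDENT_NAME, 9, CLASS9 FROM STUD_SOURCE
UNION ALL
SELECT STUDENT_NAME, 10, CLASS10 FROM STUD_SOURCE;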

Performance Tuning in Informatica


The goal of performance tuning is to optimize session performance by eliminating
performance bottlenecks to get a better acceptable ETL load time.

Tuning starts with the identification of bottlenecks in the source, target, and mapping
and further to session tuning. It might need further tuning on the system resources on
which the Informatica PowerCenter Services are running.

We can use the test load option to run sessions when we tune session performance.

If we tune all the bottlenecks, we can further optimize session performance by increasing the number of pipeline partitions in the session.

Adding partitions will improve the performance by utilizing more of the system
hardware while processing the session.

Determining the best way to improve performance can be complicated, so it's better to
change one variable at a time. If the session performance does not improve, then we
can return to the original configuration.

The goal of performance tuning is to optimize session performance so that the sessions
run during the available load window for the Informatica Server.

We can increase the session performance with the help of the following tasks, such as:

o Network: The performance of the Informatica Server is related to network connections. Generally, data moves across a network at less than 1 MB per second, whereas a local disk moves data five to twenty times faster. Thus network connections often affect session performance, so minimize network hops where possible.

o Flat files: If the flat files are stored on a machine other than the Informatica server, then move those files to the machine that hosts the Informatica server.

o Less Connection: Minimize the connections to sources, targets, and Informatica server
to improve session performance. Moving the target database into the server system may
improve session performance.
o Staging areas: Staging areas force the Informatica server to perform multiple passes over the data. Removing staging areas can improve the session performance, so use a staging area only when it is mandatory.

o Informatica Servers: We can run the multiple Informatica servers against the same
repository. Distributing the session load into the multiple Informatica servers improves
the session performance.

o ASCII: Running the Informatica server in ASCII data movement mode improves the session performance because ASCII data movement mode stores a character value in one byte, whereas Unicode mode takes two bytes to store a character.

o Source qualifier: If a session joins multiple source tables in one Source Qualifier, optimizing the query can improve performance. Also, single-table select statements with an ORDER BY or GROUP BY clause can benefit from optimization, such as adding indexes.

o Drop constraints: If the target has key constraints and indexes, they slow the loading of data. To improve the session performance, drop the constraints and indexes before running the session (while loading facts and dimensions) and rebuild them after the session completes (see the sketch after this list).

o Parallel sessions: Running parallel sessions by using concurrent batches will also reduce
the time of loading the data. So concurrent batches increase the session performance.

o Partitioning: Partitioning the session improves the session performance by creating multiple connections to sources/targets and loading data in parallel pipelines.

o Incremental Aggregation: If a session contains an aggregator transformation, then we use incremental aggregation to improve session performance.

o Transformation Errors: Avoid transformation errors to improve session performance. Before saving the mapping, validate it, and if any transformation errors occur, rectify them.

o Lookup Transformations: If the session contains a lookup transformation, then we can improve the session performance by enabling the lookup cache. The cache enhances the speed by saving the previously looked-up data, so there is no need to query it again.
o Filter Transformations: If the session contains a filter transformation, place it as close to the sources as possible, or use a filter condition in the source qualifier.

o Group transformations: Aggregator, Rank, and Joiner transformations may often decrease the session performance because they must group data before processing it. Use the sorted ports option to improve session performance, i.e., sort the data before applying the transformation.

o Packet size: We can improve the session performance by configuring the network packet size, which controls how much data crosses the network at one time. To do this, go to the server manager and configure the database connections.
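
For the drop constraints point above, a typical approach is to run pre- and post-session SQL such as the following Oracle-style sketch. The table, constraint, and index names are illustrative.

-- Before the load
ALTER TABLE SALES_FACT DISABLE CONSTRAINT FK_SALES_DEPT;
DROP INDEX IDX_SALES_DEPTNO;

-- After the load
CREATE INDEX IDX_SALES_DEPTNO ON SALES_FACT (DEPTNO);
ALTER TABLE SALES_FACT ENABLE CONSTRAINT FK_SALES_DEPT;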

Informatica BDM
Informatica Big Data Management (BDM) product is a GUI based integrated
development tool. This tool is used by organizations to build Data Quality, Data
Integration, and Data Governance processes for their big data platforms.

Informatica BDM has built-in Smart Executor that supports various processing engines
such as Apache Spark, Blaze, Apache Hive on Tez, and Apache Hive on MapReduce.

Informatica BDM is used to perform data ingestion into a Hadoop cluster, data
processing on the cluster, and extraction of data from the Hadoop cluster.

In Blaze mode, the Informatica mapping is processed by Blaze, Informatica's native engine that runs as a YARN-based application.

In Spark mode, the Informatica mappings are translated into Scala code.

In Hive and MapReduce mode, Informatica's mappings are translated into MapReduce code and are executed natively on the Hadoop cluster.

Informatica BDM integrates seamlessly with the Hortonworks Data Platform (HDP)
Hadoop cluster in all related aspects, including its default authorization system. Ranger
can be used to enforce a fine-grained role-based authorization to data as well as
metadata stored inside the HDP cluster.

Informatica's BDM integrates with Ranger in all modes of execution. Informatica's BDM
has a Smart Executor that enables organizations to run their Informatica mappings
seamlessly on one or more methods of implementation under the purview of their
existing security setup.
Authentication
Authentication is the process of dependably ensuring the user is who they claim to be.
Kerberos is the widely accepted authentication mechanism on Hadoop, including the
Hortonworks Data Platform. Kerberos protocol relies on a Key Distribution Center (KDC),
a network service that issues tickets permitting access.

Informatica BDM supports Kerberos authentication on both Active Directory and MIT-based key distribution centers. Kerberos authentication is supported by all modes of execution in Informatica BDM.

Authorization
Authorization is the process of determining whether a user has access to perform
certain operations on a given system or not. In HDP Hadoop clusters, authorization
plays a vital role in ensuring the users access only the data that they are allowed to by
the Hadoop administrator.

1. Blaze - YARN Application

When executing mappings on Informatica Blaze, the optimizer first makes an invocation to the Hadoop service to fetch metadata information, such as the Hive table's partitioning details. Then the job is submitted to the Blaze runtime, which interacts with Hadoop services such as Hive Server 2.

When an Informatica mapping gets executed in Blaze mode, then call is made to the
Hive Metastore to understand the structure of the tables.

The Blaze runtime then loads the optimized mapping into memory. This mapping then
interacts with the corresponding Hadoop service to read the data or write the data.

The Hadoop service itself is integrated with Ranger and ensures the authorization is
taken place before the request is served.

2. Spark
Informatica BDM can execute mappings as Spark Scala code on the HDP Hadoop cluster. The following steps are involved when using the Spark execution mode.
The Spark executor translates Informatica's mappings into the Spark Scala code. As part
of this translation, if Hive sources or targets are involved, then Spark executor makes a
call to Hive metastore to understand the structure of the Hive tables and optimize the
Scala code.

Then, this Scala code is submitted to YARN for execution. When the Spark code accesses
the data, the corresponding Hadoop service relies on Ranger for authorization.

3. Hive on MapReduce
Informatica BDM can execute mappings as MapReduce code on the Hadoop cluster. The following steps are involved in the Hive on MapReduce mode.
When a mapping is executed in Hive on MapReduce mode, the Hive executor on the
Informatica node translates the Informatica mapping into MapReduce and submits the
job to the Hadoop cluster.

If Hive sources or targets are involved, the Hive executor makes a call to the Hive Meta
store to understand the table structure and accordingly optimize the mapping. As the
MapReduce interacts with Hadoop services such as HDFS and Hive, the Hadoop service
authorizes the requests with Ranger.

4. Hive on Tez
Tez can be enabled in Informatica BDM by a configuration change and is transparent to the developed mapping.
Hence mappings running on Hive on Tez follow a similar pattern as Hive on MapReduce.
When a mapping is executed in the Hive on Tez mode, the Hive executor on
the Informatica node translates the Informatica mapping into Tez job and submits it to
the Hadoop cluster.

If Hive sources or targets are involved, the Hive executor makes a call to the Hive Meta
store to understand the table structure and accordingly optimize the mapping. As the
Tez job interacts with Hadoop services such as HDFS and Hive, the Hadoop service
authorizes the requests with Ranger.

Partitioning in Informatica
The PowerCenter Integration Services creates a default partition type at each partition
point. If we have the Partitioning option, we can change the partition type. The partition
type controls how the PowerCenter Integration Service distributes data among
partitions at partition points.
When we configure the partitioning information for a pipeline, then we must define a
partition type at each partition point in the pipeline. The partition type determines how
the PowerCenter Integration Service redistributes data across partition points.

Here are the following partition types in the Workflow Manager, such as:

1. Database partitioning: The PowerCenter Integration Service queries the IBM DB2 or
Oracle system for table partition information. It reads partitioned data from the
corresponding nodes in the database. Use database partitioning with Oracle or IBM DB2
source instances on a multi-node table space. Use database partitioning with DB2
targets.

2. Hash partitioning: Use hash partitioning when we want the PowerCenter Integration
Service to distribute rows to the partitions by the group. For example, we need to sort
items by item ID, but we do not know how many items have a particular ID number.
Here are the two types of hash partitioning, such as:

o Hash auto-keys: The PowerCenter Integration Service uses all grouped or sorted
ports as a compound partition key. Then we need to use hash auto-keys
partitioning at Rank, Sorter, and unsorted Aggregator transformations.

o Hash user keys: The PowerCenter Integration Service uses a hash function to
group rows of data among partitions. And define the number of ports to
generate the partition key.

3. Key range: It specifies one or more ports to form a compound partition key. The
PowerCenter Integration Service passes data to each partition depending on the ranges
we define for each port. Use key range partitioning where the sources or targets in the
pipeline are partitioned by key range.

4. Pass-through: The PowerCenter Integration Service passes all rows at one partition
point to the next partition point without redistributing them. Choose pass-through
partitioning where we want to create a new pipeline stage to improve performance, but
do not want to change the distribution of data across partitions.

5. Round-robin: The PowerCenter Integration Service distributes blocks of data to one or more partitions. Use round-robin partitioning so that each partition processes rows based on the number and size of the blocks.
Key Points of Informatica Partitions
Below are some essential points while we use the partitions in Informatica, such as:

o We cannot create a partition key for round-robin, hash auto-keys, and pass-through
partition.

o If we have a bitmap index on the target and use a pass-through partition to update the target table, the session might fail because the bitmap index creates a locking problem.

o Partitioning increases the total DTM buffer memory requirement, so ensure there is enough free memory to avoid memory allocation failures.

o When we use a pass-through partition, Informatica makes multiple connection requests to the database server, so ensure that the database is configured to accept more connection requests.

o We can use native database options as partition alternatives to increase the degree of parallelism of query processing. For example, in the Oracle database, we can specify a PARALLEL hint or alter the DOP of the table (see the sketch after this list).

o We can also use both Informatica partitioning and native database-level parallelism as per the requirements. For example, create two pass-through pipelines, each sending its query to the Oracle database with the PARALLEL hint.
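
For the PARALLEL hint point above, here is a sketch of what an Oracle-side alternative could look like; the degree of parallelism and table alias are illustrative.

-- Hint inside a source qualifier SQL override
SELECT /*+ PARALLEL(e, 4) */ EMPNO, ENAME, SAL, DEPTNO
FROM EMP e;

-- Or alter the table's default degree of parallelism
ALTER TABLE EMP PARALLEL 4;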

Informatica IDQ
Informatica Data Quality is a suite of applications and components that we can integrate
with Informatica PowerCenter to deliver enterprise-strength data quality capability in a
wide range of scenarios.

The IDQ has the following core components such as:

o Data Quality Workbench

o Data Quality Server


Data Quality Workbench: It is used to design, test, and deploy data quality processes.
Workbench allows testing and executing plans as needed, enabling rapid data
investigation and testing of data quality methodologies.

Data Quality Server: It is used to enable plan and file sharing and to run programs in a
networked environment. The Data Quality Server supports networking through service
domains and communicates with Workbench over TCP/IP.

Both Workbench and Server install with a Data Quality engine and a Data Quality repository. Users cannot create or edit plans with Server, although they can run a plan on any Data Quality engine independently of Workbench through runtime commands or from PowerCenter.

Users can apply parameter files, which modify plan operations, to runtime commands when running data quality plans on a Data Quality engine. Informatica also provides a Data Quality Integration plug-in for PowerCenter.

In Data Quality, a project is a self-contained set of data analysis or data enhancement processes.

A project is composed of one or more of the following types of component, such as:

o Data sources provide the input data for the program.

o Data sinks collect the data output from the program.

o Operational components perform the data analysis or data enhancement actions on the
data they receive.

IDQ has been a front runner in the Data Quality (DQ) tools market. The following sections provide a glance at the features these tools offer.

IDQ has two variants:

o Informatica Analyst

o Informatica Developer

Informatica analyst: It is a web-based tool that can be used by business analysts and developers to analyze, profile, cleanse, standardize, and scorecard data in an enterprise.

Informatica developer: It is a client-based tool where developers can create mappings to implement data quality transformations or services. This tool offers an editor where objects can be built with a wide range of data quality transformations such as Parser, Standardizer, Address Validator, Match-Merge, etc.
Develop once & deploy anywhere: Both tools can be used to create DQ rules or
mappings and can be implemented as web services. Once the DQ transformations are
deployed as services, they can be used across the enterprise and platforms.

Role of Dictionaries
Projects can make use of reference dictionaries to identify, repair, or remove inaccurate
or duplicate data values. Informatica Data Quality projects can make use of three types
of reference data.

Standard dictionary files: These files are installed with Informatica Data Quality and
can be used by various kinds of the component in Workbench. All dictionaries installed
with Data Quality are text dictionaries. These are plain-text files saved in .DIC file format.
They can be manually created and edited.

Database dictionaries: Informatica Data Quality users with database expertise can
design and specify dictionaries that are linked to database tables, and that this can be
updated dynamically when the underlying data is updated.

Third-party reference data: These data files are provided by third parties and are available to Informatica customers as premium product options. The reference data provided by third-party vendors is typically in database format.

How to Integrate IDQ with MDM


Data cleansing and standardization is an essential aspect of any MDM project.
Informatica MDM Multi-Domain Edition (MDE) provides a reasonable number of
cleansing functions out-of-the-box. However, there are requirements when the OOTB
cleanse functions are not enough, and there is a need for comprehensive functions to
achieve data cleansing and standardization, e.g., address validation, sequence
generation. The Informatica Data Quality (IDQ) provides an extensive array of cleansing
and standardization options. IDQ can easily be used along with Informatica MDM.

There are three methods to integrate IDQ with Informatica MDM.

1. Informatica Platform staging

2. IDQ Cleanse Library

3. Informatica MDM as target

1. Informatica Platform Staging


Starting with Informatica MDM's Multi-Domain Edition (MDE) version 10.x, Informatica has introduced a new feature called "Informatica Platform Staging" within MDM to integrate with IDQ (the Developer tool). This feature enables staging or cleansing data directly into MDM's stage tables using IDQ mappings, bypassing the landing tables.

Advantages

o Stage tables are immediately available to use in the Developer tool after synchronization,
eliminating the need to manually create physical data objects.

o Changes to the synchronized structures are reflected in the Developer tool automatically.

o Enables loading data into Informatica MDM's staging tables, bypassing the landing
tables.

Disadvantages

o Creating a connection for each Base Object folder in the Developer tool can be
inconvenient to maintain.

o Hub Stage options like Delta detection, hard delete detection, and audit trails are not
available.

o System generated columns need to be populated manually.

o Rejected records are not captured in the _REJ table of the corresponding stage table but are captured in the .bad file instead.
o Invalid lookup values are not rejected while data loads to stage, unlike in the Hub Stage
Process. The record with invalid value gets rejected and captured by the Hub Load
process.

2. IDQ Cleanse Library


IDQ allows us to create functions as operation mappings and deploy them as web services, which can then be imported into an Informatica MDM Hub implementation as a new type of cleanse library defined as the IDQ cleanse library. This functionality allows usage of the imported IDQ cleanse functions just like any other out-of-the-box cleanse function. Informatica MDM Hub acts as a web service client application that consumes IDQ's web services.

Advantages

o Quickly build transformations in IDQ's Informatica Developer tool rather than creating
complex java functions.

o Unlike Informatica Platform staging, Hub Stage process options such as delta detection,
hard delete detection, audit trail are available for use.

Disadvantages

o Physical data objects need to be manually created for each staging table and manually
updated for any changes to the table.

o The IDQ function must contain all transformation logic to leverage the batching of records. If any transformation logic is additionally defined in the MDM map, then calls to the IDQ web service will be made one record at a time, leading to performance issues.
o Web service invocations are synchronous only, which can be a concern for large data
volume.

3. Informatica MDM as target


3.1 Loading data into landing tables

Informatica MDM can be used as a target for loading the data to landing tables in
Informatica MDM.

Advantages

o The single connection created in the Developer tool for Informatica MDM is less
cumbersome when compared to creating multiple connections with Informatica platform
staging.

o No need to standardize data in the Hub Stage Process.

o Unlike Informatica Platform staging, Hub Stage process options - delta detection, hard
delete detection, audit trail are available to use.

Disadvantages

o Physical data objects need to be manually created for each landing table and manually
updated for any changes to the table.

o Need to develop mappings at two levels (i) source to landing and (ii) landing to staging
(direct mapping).

3.2 Loading data into staging tables (bypassing landing tables)

Informatica MDM can be used as a target for loading data directly into the staging tables in Informatica MDM, bypassing the landing tables.
Advantages

o The single connection created in the Developer tool for Informatica MDM is less
cumbersome when compared to creating multiple connections with Informatica platform
staging.

o It can be used for the lower version of Informatica MDM, where the Informatica Platform
staging option is not available.

Disadvantages

o Physical data objects need to be manually created for each staging table and manually
updated for any changes to the table.

o Hub Stage Delta detection, hard delete detection, and audit trails options are not
available.

o System generated columns need to be populated manually.

o Rejected records are not captured in the _REJ table of the corresponding stage table but are captured in the .bad file instead.

o Invalid lookup values are not rejected while data loads to stage, unlike in the Hub Stage
Process. The record with invalid value gets rejected and captured by the Hub Load
process.
