Informatica Notes

Informatica is a widely used ETL tool for data integration, enabling users to extract, transform, and load data from various sources into target systems. Key components include PowerCenter, Designer, Workflow Manager, and various transformations that facilitate data processing. Best practices for using Informatica involve modular design, robust error handling, and performance optimization to ensure efficient ETL processes.

Informatica is one of the most widely used ETL (Extract, Transform, Load) tools for data integration. It allows users to connect to and fetch data from various sources, transform the data as required, and load it into target systems such as data warehouses and databases. The notes below cover the key concepts, components, and best practices.

1. Informatica Architecture Overview

 PowerCenter: The main product in Informatica for ETL processing.

 Client Components:

o Designer: Used to create mappings, which define the flow of data.

o Workflow Manager: Defines workflows and schedules tasks.

o Workflow Monitor: Monitors the status of workflows and sessions.

o Repository Manager: Manages repository objects (like mappings, sessions).

o Administrator Console: Manages the server, users, and privileges.

 Server Components:

o Integration Service: Executes ETL tasks (sessions, workflows).

o Repository Service: Manages the metadata and repository database.

o Session Logs: Store logs of the data processing.

 Repository: Stores metadata, mappings, transformations, and other objects.

2. Informatica PowerCenter Workflow

 A workflow is a set of instructions that tell Informatica how to run a task. It defines the
sequence of operations, such as sessions, commands, and other tasks.

 Session: Executes the mapping. It is part of a workflow.

 Command Task: Can run external programs or scripts.

 Email Notification: To send alerts based on success or failure.

 Event Wait/Trigger: Can pause workflows based on external events (like file arrival).

 Worklet: A reusable set of tasks that can be included in workflows.
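The way a workflow chains tasks together can be pictured with a toy task runner. This is an illustrative Python sketch, not Informatica code; the task names and the notify callback are invented. It runs tasks in sequence, stops on the first failure, and fires a notification, much like a sequential workflow link combined with an email task on failure:

```python
# Toy workflow runner: tasks run in sequence; a failure stops the
# chain and triggers a notification, like an email task on failure.
# Task names and the notify callback are invented for illustration.

def run_workflow(tasks, notify):
    for name, task in tasks:
        try:
            task()
        except Exception as exc:
            notify(f"Task {name} failed: {exc}")
            return False  # stop the workflow on failure
    return True

alerts = []
ok = run_workflow(
    [("s_load_customers", lambda: None),  # session task (succeeds)
     ("cmd_archive", lambda: 1 / 0)],     # command task (fails)
    alerts.append,
)
```

A real workflow can also branch on link conditions or run tasks in parallel; this sketch covers only the sequential, stop-on-failure case.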

3. Mapping and Transformations

 Mapping: Defines how data flows from source to target. It consists of sources,
transformations, and targets.

 Transformations:

o Source Qualifier: Represents the rows read from a relational or flat-file source; it converts source data types and can filter or join data at the source.

o Expression: Used for data transformation, calculations, or manipulations.

o Aggregator: Performs aggregate functions (like sum, count) on data.


o Joiner: Joins two data sources based on a common key.

o Filter: Filters records based on specified conditions.

o Sorter: Sorts data before passing it to the next transformation.

o Lookup: Used to fetch related data from another source based on a lookup
condition.

o Router: Routes data to multiple target groups based on conditions.

o Update Strategy: Defines how to update data in the target (insert, update, delete, or
reject).

o Rank: Filters top N rows based on a ranking condition.

o Normalizer: Normalizes data (splits rows based on input fields).
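The row-level behavior of a few of these transformations can be illustrated outside Informatica. Below is a minimal Python analogy (the field names and rows are invented for illustration) showing how a Filter, an Expression, and an Aggregator-style step would operate on a stream of rows:

```python
# Minimal Python analogy for three Informatica transformations.
# The input rows and field names are invented for illustration.

rows = [
    {"region": "EU", "amount": 120.0},
    {"region": "EU", "amount": -5.0},   # bad row, filtered out
    {"region": "US", "amount": 80.0},
]

# Filter transformation: keep only rows meeting a condition.
filtered = [r for r in rows if r["amount"] > 0]

# Expression transformation: derive a new field per row.
for r in filtered:
    r["amount_with_tax"] = round(r["amount"] * 1.2, 2)

# Aggregator transformation: group by a key and sum a measure.
totals = {}
for r in filtered:
    totals[r["region"]] = totals.get(r["region"], 0) + r["amount_with_tax"]

print(totals)  # {'EU': 144.0, 'US': 96.0}
```

In a real mapping these steps would be separate transformation objects wired together on the Designer canvas, with the Aggregator typically preceded by a Sorter for performance.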

4. Source and Target Definitions

 Source: The input data, which can come from different systems (databases, flat files, XML,
etc.).

 Target: The output system where the transformed data is loaded, such as databases, data
warehouses, or flat files.

 Source Definitions:

o Relational: Sources like Oracle, SQL Server, etc.

o Flat Files: Sources like CSV, text files.

o XML: XML files.

o Cobol: Cobol files with record-based data.

 Target Definitions:

o Same as sources (databases, files, XML, etc.).

5. Sessions and Workflows

 Session: A session runs a mapping. It reads data from the source, transforms it using the
transformations defined in the mapping, and writes the data to the target.

o Session Properties: Includes properties like session name, source and target
connections, performance options, and error handling.

o Session Logs: Log files that provide detailed information about the session execution.

 Workflow: A collection of tasks (like sessions, command tasks, etc.) that are executed in
sequence or parallel.

 Running a Workflow: Can be triggered manually, scheduled, or initiated by an external event.

 Session and Workflow Recovery: If a session fails, Informatica can automatically restart from
the point of failure based on the recovery settings.
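The restart-from-point-of-failure idea can be sketched with a checkpoint file. This is illustrative Python, not Informatica's actual recovery mechanism; the checkpoint file name and commit interval are invented for the example:

```python
# Sketch of checkpoint-based restart, analogous in spirit to session
# recovery: persist the last committed position so a rerun resumes
# from the point of failure. Not Informatica's actual mechanism.
import json
import os

CHECKPOINT_FILE = "session_checkpoint.json"  # hypothetical path

def read_checkpoint():
    if os.path.exists(CHECKPOINT_FILE):
        with open(CHECKPOINT_FILE) as f:
            return json.load(f)["last_committed"]
    return 0

def run_session(source_rows, commit_interval=2):
    start = read_checkpoint()
    loaded = []
    for i, row in enumerate(source_rows):
        if i < start:
            continue  # already committed before the failure; skip on restart
        loaded.append(row)
        if (i + 1) % commit_interval == 0:
            # commit point: record progress, like a target commit interval
            with open(CHECKPOINT_FILE, "w") as f:
                json.dump({"last_committed": i + 1}, f)
    return loaded
```

On a clean run every row is loaded; after a simulated failure, a rerun picks up only the rows past the last recorded commit point.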

6. Informatica Repository

 Metadata Repository: The database where all the metadata for mappings, sessions,
workflows, transformations, etc., is stored.

 Repository Manager: Used to manage repository objects like mappings, sessions, workflows,
and folders.

 Versioning: Informatica allows version control of repository objects, so you can track changes
and roll back if necessary.

7. Performance Optimization

 Partitioning: Improves performance by dividing the data into partitions and processing them
in parallel.

 Pushdown Optimization: Moves some or all transformation logic to the database for better
performance.

 Session Tuning: Adjust session properties like memory allocation, commit intervals, and
parallel processing to improve performance.

 Incremental Load: Load only changed data (new or modified) instead of reloading the entire
dataset to improve performance.

 Cache Management: Optimize the cache for transformations like Lookup, Aggregator, and
Rank to minimize memory usage.
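The incremental-load idea can be sketched in plain Python. This is an illustration only; the `last_modified` field and the watermark variable are invented names, whereas Informatica itself typically implements this with mapping variables and session properties:

```python
# Sketch of an incremental load: extract only rows changed since the
# last run, tracked by a watermark. Field and variable names are
# illustrative, not Informatica-specific.

def incremental_extract(source_rows, last_run_watermark):
    """Return rows modified after the previous run's watermark,
    plus the new watermark to persist for the next run."""
    changed = [r for r in source_rows
               if r["last_modified"] > last_run_watermark]
    new_watermark = max((r["last_modified"] for r in changed),
                        default=last_run_watermark)
    return changed, new_watermark
```

Persisting the returned watermark between runs is what keeps each run from reprocessing the full dataset.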

8. Error Handling and Debugging

 Error Logs: Capture errors during ETL processing and can be used for debugging.

 Session Log Files: Provide detailed logs for each session execution, including warnings and
errors.

 Reject Files: Data that doesn’t meet the transformation criteria can be written to a reject file
for later analysis.

 Data Validation: You can validate data at different stages of the process using conditional
transformations (like Filter, Expression).

 Debugging Mappings: Informatica allows for debugging by testing mappings step by step and
checking the output at each stage.
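The reject-file pattern above can be sketched as a validation step that splits rows into loadable rows and rejects with a reason attached. The validation rules and field names here are invented for illustration; in Informatica, rejected rows land in a .bad file alongside row and column indicators:

```python
# Sketch of reject handling: rows failing validation are diverted to
# a reject list with a reason, the way bad rows go to a reject file.
# Validation rules and field names are invented for illustration.

def validate_and_split(rows):
    good, rejects = [], []
    for row in rows:
        if row.get("customer_id") is None:
            rejects.append({"row": row, "reason": "missing customer_id"})
        elif row.get("amount", 0) < 0:
            rejects.append({"row": row, "reason": "negative amount"})
        else:
            good.append(row)
    return good, rejects
```

Keeping the reason alongside each rejected row is what makes the later analysis step practical.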

9. Informatica Connectors and Integration

 Database Connections: Informatica supports a variety of databases like Oracle, SQL Server,
DB2, etc.

 Flat Files: Supports different flat file types, including delimited, fixed-width, and XML.

 Cloud Integration: Supports cloud sources and targets like Amazon S3, Azure, Google Cloud,
and various SaaS applications.

 Web Services: Integrates with web services to consume or publish data.


 Third-Party Integrations: Informatica can connect to various third-party tools and
applications for data synchronization.

10. Data Quality and Data Governance

 Data Quality Transformation: Informatica has built-in data quality transformations to cleanse, standardize, and validate data before it’s loaded into the target.

 Data Profiling: Helps in analyzing the source data to understand its quality and structure
before processing it.

 Data Lineage: Tracks the origin, movement, and transformation of data throughout the ETL
pipeline.

 Audit and Metadata Management: Tracks ETL operations for auditing and compliance.

11. Security and Permissions

 User Roles: Define permissions at the user level to control access to different components of
Informatica.

 Folder Security: Set permissions for different folders to control access to mappings, sessions,
and other objects.

 SSL/TLS Encryption: Secure communication between client and server.

 Database Security: Ensures that connections to databases are secure using authentication
and encryption.

12. Best Practices

 Modular Design: Break down mappings and workflows into reusable components (e.g.,
worklets, reusable transformations).

 Error Handling: Set up robust error-handling mechanisms to capture and log data and transformation errors.

 Logging and Monitoring: Use session logs, workflow logs, and system logs to monitor ETL
processes.

 Data Validation: Always include checks and validations at every stage of the ETL process to
ensure data accuracy.

 Automation: Automate repetitive tasks, such as running workflows on a schedule or triggering tasks based on external events.

 Version Control: Regularly version your repository objects and manage changes to mappings,
sessions, and workflows.

Informatica is a powerful and flexible tool for data integration, allowing teams to efficiently manage
complex ETL processes. By following best practices, optimizing performance, and utilizing its robust
error handling and logging features, you can ensure that your ETL jobs run efficiently and accurately.

