Informatica Notes
Informatica is a data integration (ETL) tool. It allows users to connect to and fetch data from various sources, transform the data as required, and load it into target systems such as data warehouses and databases. Below are notes that cover the key concepts, components, and best practices:
1. Informatica Components
Client Components: PowerCenter Designer (builds mappings), Workflow Manager (builds and schedules workflows), Workflow Monitor (monitors runs), and Repository Manager (administers repository objects).
Server Components: the Integration Service, which executes sessions and workflows, and the Repository Service, which manages access to repository metadata.
2. Workflows and Tasks
A workflow is a set of instructions that tells Informatica how to run tasks. It defines the sequence of operations, such as sessions, commands, and other tasks.
Event Wait/Trigger: Pauses a workflow until an external event occurs (such as the arrival of a file).
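Informatica expresses this control flow with built-in tasks rather than code, but the idea is easy to picture. Below is a minimal Python sketch, not Informatica's engine: the file path and task names are hypothetical, and a simple polling loop stands in for an Event Wait task.

```python
import os
import time

def wait_for_file(path, poll_seconds=30, timeout_seconds=3600):
    """Pause until the file appears, the same idea as an Event Wait task."""
    waited = 0
    while not os.path.exists(path):
        if waited >= timeout_seconds:
            raise TimeoutError(f"{path} did not arrive within {timeout_seconds} seconds")
        time.sleep(poll_seconds)
        waited += poll_seconds

def load_orders():
    print("session: loading orders ...")             # stands in for a session task

def notify_success():
    print("command: sending success notification")   # stands in for a command task

def run_workflow():
    # Tasks execute in a defined sequence, like tasks linked in a workflow.
    wait_for_file("/data/incoming/orders.csv")        # hypothetical trigger file
    load_orders()
    notify_success()
```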
3. Mappings and Transformations
Mapping: Defines how data flows from source to target. It consists of sources, transformations, and targets.
Transformations:
o Lookup: Used to fetch related data from another source based on a lookup
condition.
o Update Strategy: Defines how to update data in the target (insert, update, delete, or
reject).
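The Lookup and Update Strategy behaviors can be pictured with a short, hypothetical Python sketch. The column names, lookup data, and routing rule below are invented for illustration; in Informatica this logic is configured inside the mapping rather than written by hand.

```python
# Hypothetical rows keyed by customer_id; column names are made up for illustration.
source_rows = [
    {"customer_id": 1, "name": "Alice", "country_code": "US"},
    {"customer_id": 2, "name": "Bob",   "country_code": "ZZ"},   # no match in lookup
]
country_lookup = {"US": "United States", "IN": "India"}          # lookup source
existing_target_ids = {1}                                        # rows already in the target

inserts, updates, rejects = [], [], []
for row in source_rows:
    # Lookup: fetch related data based on a lookup condition (country_code match).
    country = country_lookup.get(row["country_code"])
    if country is None:
        rejects.append(row)            # Update Strategy: flag the row for reject
        continue
    row["country_name"] = country
    # Update Strategy: route the row to insert or update depending on the target.
    if row["customer_id"] in existing_target_ids:
        updates.append(row)
    else:
        inserts.append(row)
```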
4. Sources and Targets
Source: The input data, which can come from different systems (databases, flat files, XML, etc.).
Target: The output system where the transformed data is loaded, such as databases, data warehouses, or flat files.
Source Definitions: Describe the structure (columns, data types) of each source object as imported into the Designer.
Target Definitions: Describe the structure of each target object that the mapping loads.
5. Sessions and Workflows
Session: A session runs a mapping. It reads data from the source, transforms it using the transformations defined in the mapping, and writes the data to the target.
o Session Properties: Includes properties like session name, source and target
connections, performance options, and error handling.
o Session Logs: Log files that provide detailed information about the session execution.
Workflow: A collection of tasks (like sessions, command tasks, etc.) that are executed in
sequence or parallel.
Session and Workflow Recovery: If a session fails, Informatica can automatically restart from
the point of failure based on the recovery settings.
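The run of a session, and the idea of restarting from a recovery point, can be sketched in plain code. The following Python sketch is illustrative only: file paths, column names, and the checkpoint format are assumptions, and Informatica manages commit intervals and recovery through session properties rather than code like this.

```python
import csv
import json
import os

def run_session(source_path, target_path, state_path):
    """Read the source, apply the mapping logic, and append to the target.
    A small checkpoint file stands in for a recovery point that a restarted
    session could resume from; paths and column names are hypothetical."""
    rows_done = 0
    if os.path.exists(state_path):                        # resume after a failure
        with open(state_path) as f:
            rows_done = json.load(f)["rows_done"]

    with open(source_path, newline="") as src, open(target_path, "a", newline="") as tgt:
        reader = csv.DictReader(src)
        writer = csv.writer(tgt)
        for i, row in enumerate(reader):
            if i < rows_done:
                continue                                  # skip rows already loaded
            writer.writerow([row["id"], row["name"].strip().upper()])  # "mapping" step
            if (i + 1) % 1000 == 0:                       # commit-interval style checkpoint
                with open(state_path, "w") as f:
                    json.dump({"rows_done": i + 1}, f)
```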
6. Informatica Repository
Metadata Repository: The database where all the metadata for mappings, sessions,
workflows, transformations, etc., is stored.
Repository Manager: Used to manage repository objects like mappings, sessions, workflows,
and folders.
Versioning: Informatica allows version control of repository objects, so you can track changes
and roll back if necessary.
7. Performance Optimization
Partitioning: Improves performance by dividing the data into partitions and processing them
in parallel (a sketch after this list illustrates the idea).
Pushdown Optimization: Moves some or all transformation logic to the database for better
performance.
Session Tuning: Adjust session properties like memory allocation, commit intervals, and
parallel processing to improve performance.
Incremental Load: Load only changed data (new or modified) instead of reloading the entire
dataset, to improve performance (see the sketch after this list).
Cache Management: Optimize the cache for transformations like Lookup, Aggregator, and
Rank to minimize memory usage.
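Partitioning is essentially parallel processing of independent slices of the data. A minimal Python sketch of that idea, using the standard multiprocessing module and a made-up transformation, is shown below; Informatica configures partitioning in the session, not in code.

```python
from multiprocessing import Pool

def transform_partition(rows):
    # Placeholder transformation applied to one partition of the data.
    return [{"id": r["id"], "amount": r["amount"] * 1.1} for r in rows]

def process_in_partitions(rows, n_partitions=4):
    # Split the data into roughly equal partitions and process them in parallel,
    # which is the same idea as session partitioning.
    partitions = [rows[i::n_partitions] for i in range(n_partitions)]
    with Pool(n_partitions) as pool:
        results = pool.map(transform_partition, partitions)
    return [row for part in results for row in part]

if __name__ == "__main__":
    data = [{"id": i, "amount": float(i)} for i in range(10_000)]
    print(len(process_in_partitions(data)))
```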
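Incremental loading can likewise be pictured as filtering on a change marker, such as a last-modified timestamp saved after the previous run. The sketch below is a hypothetical illustration; the column name and timestamps are made up.

```python
from datetime import datetime

def incremental_rows(rows, last_load_time):
    # Keep only rows created or modified since the previous successful load,
    # instead of reprocessing the full dataset.
    return [r for r in rows if r["modified_at"] > last_load_time]

rows = [
    {"id": 1, "modified_at": datetime(2024, 1, 1)},
    {"id": 2, "modified_at": datetime(2024, 3, 1)},   # only this row is picked up
]
last_load_time = datetime(2024, 2, 1)                  # stored after the previous run
print(incremental_rows(rows, last_load_time))
```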
8. Error Handling and Debugging
Error Logs: Capture errors during ETL processing and can be used for debugging.
Session Log Files: Provide detailed logs for each session execution, including warnings and
errors.
Reject Files: Data that doesn’t meet the transformation criteria can be written to a reject file
for later analysis (see the sketch after this list).
Data Validation: You can validate data at different stages of the process using conditional
transformations (like Filter, Expression).
Debugging Mappings: Informatica allows for debugging by testing mappings step by step and
checking the output at each stage.
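The reject-file idea, and row-level validation in general, can be sketched as routing rows that fail a rule to a separate file. The paths, the email column, and the validation rule below are hypothetical; Informatica writes its own reject (bad) files based on transformation and update-strategy settings.

```python
import csv

def load_with_rejects(source_path, target_path, reject_path):
    """Write rows that fail a simple validation rule to a reject file for later
    analysis; paths and the validation rule are hypothetical."""
    with open(source_path, newline="") as src, \
         open(target_path, "w", newline="") as tgt, \
         open(reject_path, "w", newline="") as rej:
        reader = csv.DictReader(src)
        good = csv.DictWriter(tgt, fieldnames=reader.fieldnames)
        bad = csv.DictWriter(rej, fieldnames=reader.fieldnames + ["reject_reason"])
        good.writeheader()
        bad.writeheader()
        for row in reader:
            if not row["email"] or "@" not in row["email"]:   # validation rule
                row["reject_reason"] = "invalid email"
                bad.writerow(row)                             # route to reject file
            else:
                good.writerow(row)                            # route to target
```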
9. Connectivity
Database Connections: Informatica supports a variety of databases, such as Oracle, SQL Server, and DB2.
Flat Files: Supports different flat file types, including delimited, fixed-width, and XML (see the sketch after this list).
Cloud Integration: Supports cloud sources and targets like Amazon S3, Azure, Google Cloud,
and various SaaS applications.
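The difference between delimited and fixed-width flat files can be shown with a small, hypothetical parsing sketch (the file name and column widths are made up); in Informatica these layouts are defined in the flat-file source definition instead.

```python
import csv

# Delimited file: fields are separated by a character such as a comma or pipe.
with open("customers.csv", newline="") as f:          # hypothetical file
    delimited_rows = list(csv.DictReader(f, delimiter=","))

# Fixed-width file: each field occupies a fixed number of characters.
def parse_fixed_width(line, widths=(10, 20, 2)):      # hypothetical column widths
    fields, pos = [], 0
    for w in widths:
        fields.append(line[pos:pos + w].strip())
        pos += w
    return fields
```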
10. Data Governance
Data Profiling: Helps in analyzing the source data to understand its quality and structure
before processing it (see the sketch after this list).
Data Lineage: Tracks the origin, movement, and transformation of data throughout the ETL
pipeline.
Audit and Metadata Management: Tracks ETL operations for auditing and compliance.
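A basic column profile, counting rows, nulls, distinct values, and the most frequent values, conveys what data profiling looks at. The sketch below is a generic illustration, not Informatica's Data Profiling feature; the sample rows are invented.

```python
from collections import Counter

def profile_column(rows, column):
    """Basic profile of one column: row count, nulls, distinct values, top values."""
    values = [r.get(column) for r in rows]
    non_null = [v for v in values if v not in (None, "")]
    return {
        "rows": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(set(non_null)),
        "top_values": Counter(non_null).most_common(3),
    }

rows = [{"country": "US"}, {"country": "US"}, {"country": ""}, {"country": "IN"}]
print(profile_column(rows, "country"))
```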
11. Security
User Roles: Define permissions at the user level to control access to different components of
Informatica.
Folder Security: Set permissions for different folders to control access to mappings, sessions,
and other objects.
Database Security: Ensures that connections to databases are secure using authentication
and encryption.
12. Best Practices
Modular Design: Break down mappings and workflows into reusable components (e.g.,
worklets, reusable transformations).
Error Handling: Set up robust error-handling mechanisms to capture and log data and
transformation errors.
Logging and Monitoring: Use session logs, workflow logs, and system logs to monitor ETL
processes.
Data Validation: Always include checks and validations at every stage of the ETL process to
ensure data accuracy.
Version Control: Regularly version your repository objects and manage changes to mappings,
sessions, and workflows.
Informatica is a powerful and flexible tool for data integration, allowing teams to manage complex
ETL processes efficiently. By following best practices, optimizing performance, and using its robust
error-handling and logging features, you can ensure that your ETL jobs run reliably and accurately.
If you need more detail on any specific concept or component of Informatica, feel free to ask!