Informatica Cloud Data Integration Elastic (CDI-E)

Uploaded by

Bipin Kushwaha

1. Introduction
Informatica Cloud Data Integration is a cloud platform that allows users to
seamlessly connect, integrate, and transform data across various sources and
targets in cloud and hybrid environments.

Informatica Cloud Data Integration Elastic, also known as Advanced Cloud Data
Integration, extends the capabilities of Informatica Cloud Data Integration
with enhanced scalability and flexibility for complex data integration
scenarios.

2. What is Informatica Cloud Data Integration Elastic?
Informatica Cloud Data Integration Elastic (CDI-E) enables you to process
your data integration jobs using a serverless Spark engine running on a
Kubernetes cluster. With Cloud Data Integration Elastic, you do not have to
manage Spark yourself. The Elastic Secure Agent converts mappings into Spark
code and runs them on advanced clusters on the cloud platform of your choice.
This lets you run your data integration jobs in multiple clouds.

In CDI-Elastic, the Secure Agent resides in customer-managed infrastructure,
and data is processed in automated infrastructure whose life cycle is managed
by Informatica.

3. Difference between CDI and CDI-Elastic
The differences between Cloud Data Integration and Cloud Data Integration
Elastic are described below.
3.1. Infrastructure Management
In Cloud Data Integration, you are responsible for managing the underlying
compute resources required for data integration tasks. You need to ensure
that the infrastructure can handle the expected data volumes and processing
demands efficiently.

Cloud Data Integration Elastic takes a serverless approach to infrastructure
management. You don’t need to worry about configuring or scaling compute
resources manually. The platform dynamically allocates and manages
resources based on workload demands.

3.2. Compute Engine
In Cloud Data Integration, the Secure Agent orchestrates and executes data
integration jobs on the Data Integration Server engine. This engine is
optimized for a wide range of data integration tasks, providing versatility and
reliability.

In Cloud Data Integration Elastic, the Secure Agent takes advantage of an
advanced Spark serverless engine. This engine is specifically designed to
handle large-scale data processing tasks with efficiency and scalability.

3.3. Workload Data Volume
Cloud Data Integration is well-suited for low to medium workloads. It efficiently
manages data integration tasks involving moderate data volumes and
complexity.

Cloud Data Integration Elastic is well-suited to handle medium to high
workloads. It excels in scenarios where large data volumes need to be
processed, transformed, and integrated.

4. Execution Life cycle of Cloud Data Integration Elastic
Cloud Data Integration Elastic executes jobs using Apache Spark on
Kubernetes. This is referred to as the IICS-based Spark serverless solution.

Apache Spark is an open-source, powerful data processing framework used for
big data workloads. Spark features a distributed processing model, where tasks
are divided and executed across multiple computers or nodes in a cluster,
allowing for parallel computation and enabling scalability.
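The divide-and-combine model described above can be illustrated without Spark itself. The following is a minimal sketch that uses only the Python standard library. It is not Spark code: the thread pool merely stands in for the nodes of a cluster, and the function names `split_into_partitions`, `process_partition`, and `run_distributed` are hypothetical names chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(records, num_partitions):
    """Divide the records into roughly equal partitions, one per worker."""
    return [records[i::num_partitions] for i in range(num_partitions)]


def process_partition(partition):
    """Work done independently on one partition (here: a sum of squares)."""
    return sum(value * value for value in partition)


def run_distributed(records, num_workers=4):
    """Process the partitions in parallel, then combine the partial results."""
    partitions = split_into_partitions(records, num_workers)
    # The thread pool is only a stand-in for cluster nodes (an assumption).
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partial_results = list(pool.map(process_partition, partitions))
    return sum(partial_results)


total = run_distributed(list(range(1, 101)))
```

Because each partition is processed independently, the result does not depend on the number of workers; only the elapsed time would change on a real cluster.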

Kubernetes is an open-source system for automating deployment, scaling, and
management of containerized applications. It also creates and manages
clusters of computers to ensure your apps run reliably and efficiently.

Containerized applications are applications that run in isolated packages of
code called containers. Containers are a way to package and distribute code,
making it easy to move and run applications consistently across
different servers, environments, and platforms.

[Figure: Elastic jobs running on a Spark cluster]

The execution life cycle of Cloud Data Integration Elastic involves the
following stages:

4.1. Setting up IICS Cloud Environment
Set up your cloud environment so that the Secure Agent can connect to and
access cloud resources, and can create and deploy an elastic cluster.

4.2. Configuring the Kubernetes Cluster (Advanced Cluster)
The Kubernetes cluster is referred to as the “Advanced Cluster” within IICS,
and its properties should be configured by the Administrator. Navigate
to Administrator > Advanced Clusters and set up an Advanced Configuration,
which is a set of properties that define the resources that you provision to
create an advanced cluster.

Advanced Cluster creation is supported on AWS, Azure, and GCP.
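To make the idea of an advanced configuration as "a set of properties" more concrete, here is a minimal sketch in Python. It does not reproduce the real IICS property names or schema: the class `AdvancedConfiguration` and all of its fields are hypothetical, as is the example bucket path. Only the list of platforms and the need for a log storage location (see section 4.6) come from the text.

```python
from dataclasses import dataclass

# Platforms named in the text as supporting Advanced Cluster creation.
SUPPORTED_PLATFORMS = {"AWS", "Azure", "GCP"}


@dataclass
class AdvancedConfiguration:
    """Hypothetical stand-in for an advanced configuration (not the IICS schema)."""
    platform: str
    min_worker_nodes: int
    max_worker_nodes: int
    log_location: str

    def validate(self):
        """Return a list of problems; an empty list means the sketch is usable."""
        problems = []
        if self.platform not in SUPPORTED_PLATFORMS:
            problems.append(f"unsupported platform: {self.platform}")
        if self.min_worker_nodes < 1:
            problems.append("min_worker_nodes must be at least 1")
        if self.max_worker_nodes < self.min_worker_nodes:
            problems.append("max_worker_nodes must be >= min_worker_nodes")
        if not self.log_location:
            problems.append("log_location is required")
        return problems


# Example values are illustrative only.
config = AdvancedConfiguration("AWS", 1, 5, "s3://example-bucket/logs")
problems = config.validate()
```

Validating the properties up front mirrors the Administrator's role: the cluster is only created later, so mistakes are cheaper to catch at configuration time.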

4.3. Designing and Executing Mappings
The user creates an Elastic Mapping within Informatica Cloud Data Integration
that encapsulates the business logic, and submits the job for execution.
Alternatively, an existing mapping can be triggered in Advanced Mode. This
creates a copy as an Elastic mapping (if supported), which can then be
triggered to run on the Spark engine.

4.4. Transformation to Spark Code and Kubernetes Deployment
The Cloud Data Integration Elastic Secure Agent converts the mapping logic
into deployable Spark code. Through the Cluster Creation Service, it then
creates and starts the Kubernetes cluster based on the Advanced
Configuration and automatically pushes the Spark code to the cluster for
processing.

Throughout data processing on the Advanced Cluster, the data always stays
within the customer’s VPC.

4.5. Scaling up Nodes in the Cluster
The Elastic Secure Agent ensures optimal resource utilization by dynamically
scaling the cluster in response to demand and consumption patterns. When
additional jobs are submitted for execution, the agent intelligently identifies the
requirement for more resources and takes action by introducing new nodes to
the cluster.
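The scale-up decision can be pictured as a small calculation. The sketch below is an assumption for illustration only: the text does not describe how the agent actually measures demand, so the per-node job capacity, the node limits, and the function names `desired_node_count` and `nodes_to_add` are all hypothetical.

```python
import math


def desired_node_count(running_jobs, jobs_per_node, min_nodes, max_nodes):
    """Return how many worker nodes the sketch cluster should have."""
    needed = math.ceil(running_jobs / jobs_per_node)
    # Keep the result within the configured bounds (hypothetical limits).
    return max(min_nodes, min(needed, max_nodes))


def nodes_to_add(current_nodes, running_jobs,
                 jobs_per_node=2, min_nodes=1, max_nodes=6):
    """Number of new nodes to introduce; zero when capacity is sufficient."""
    target = desired_node_count(running_jobs, jobs_per_node,
                                min_nodes, max_nodes)
    return max(0, target - current_nodes)


extra_for_seven_jobs = nodes_to_add(current_nodes=2, running_jobs=7)
extra_for_three_jobs = nodes_to_add(current_nodes=2, running_jobs=3)
```

With two nodes already running, seven jobs would call for two more nodes under these assumed numbers, while three jobs fit within the existing capacity.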

4.6. Monitoring Spark Job Logs and Cluster Health
The logs generated during the execution of elastic jobs on the Spark engine
are securely stored in the cloud location configured within the Advanced
Configuration. Because the nodes running Spark jobs are transient
(short-lived), you must configure a suitable log storage location for the
chosen cloud platform (e.g., Amazon S3 for AWS, ADLS Gen2 Storage for
Microsoft Azure).

Additionally, the Secure Agent provides real-time reporting on the status of
Spark jobs and cluster statistics, which users can monitor from the IICS
Monitor service.

4.7. Scaling down Nodes and Cluster Deletion
Upon the completion of submitted jobs, the Elastic Secure Agent initiates the
scaling down of nodes within the cluster. This process ensures that resources
are optimally allocated and reduces unnecessary overhead. Ultimately, when
all submitted jobs are successfully executed and resources are no longer
needed, the cluster resources are deleted.

The agent restarts the cluster when another elastic job is submitted.
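The start, delete, and restart behaviour described above can be modelled as a very small state machine. This is a toy sketch under stated assumptions: the class `ElasticClusterSketch` is hypothetical, it deletes the cluster as soon as the last job completes, and any idle period the real agent may wait before deletion is not modelled.

```python
class ElasticClusterSketch:
    """Toy model of the life cycle: start on first job, delete when idle."""

    def __init__(self):
        self.running = False
        self.active_jobs = 0
        self.events = []

    def submit_job(self):
        """Start the cluster if it is not running, then count the new job."""
        if not self.running:
            self.running = True
            self.events.append("cluster started")
        self.active_jobs += 1

    def complete_job(self):
        """Finish one job; delete the cluster when no jobs remain."""
        if self.active_jobs == 0:
            raise RuntimeError("no active job to complete")
        self.active_jobs -= 1
        if self.active_jobs == 0:
            self.running = False
            self.events.append("cluster deleted")


cluster = ElasticClusterSketch()
cluster.submit_job()
cluster.submit_job()
cluster.complete_job()
cluster.complete_job()
cluster.submit_job()  # a later job restarts the cluster
```

After this sequence, `cluster.events` records one start, one deletion, and a second start triggered by the later job.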

[Figure: Execution life cycle of Cloud Data Integration Elastic]

This well-defined sequence of stages ensures the efficient, scalable, and
seamless execution of jobs in Cloud Data Integration Elastic.

5. Why is Cloud Data Integration Elastic needed?
Below are some of the scenarios which can help you understand the need for
Cloud Data Integration Elastic.

5.1. Handling Resource Intensive Jobs
Consider a scenario where you have resource-intensive jobs that process
large volumes of data in Cloud Data Integration. If the overall processing time
is too long with the existing configuration of the server on which the Secure
Agent is installed, the server configuration can be upgraded for better job
performance.

However, this increases server maintenance costs, and the resources are
under-utilized once the resource-intensive jobs have completed.

5.2. Parallel Processing of Jobs
Consider a scenario where multiple jobs are submitted for execution during a
time window in Cloud Data Integration. In order to enable parallel processing
of these jobs, a secure agent group with multiple agents could be
configured. This enables the execution of tasks distributed across various
agents in the secure agent group.

However, the challenges with this approach are:

• The Secure Agent must be installed on multiple servers.
• All servers must be up and running irrespective of the number of tasks
  running at the moment, as they cannot be dynamically scaled up and down.
• The user must make sure the various components utilized by the IICS tasks,
  such as parameter files, source files, scripts, and the libraries used by
  scripts, are available and maintained consistently across all the servers
  in the group.
• A Data Integration job can run on only one server at a time and cannot
  utilize the compute resources of the other servers in the Secure Agent
  group, even when they are idle.

The challenges mentioned above can be resolved using Cloud Data
Integration Elastic.
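The last challenge in the list, that a job runs on only one server even when others are idle, can be shown with a small calculation. The sketch below uses hypothetical job durations and function names, and its "fully distributed" figure is an idealised lower bound that ignores scheduling and data-shuffling overhead.

```python
def makespan_one_job_per_server(job_hours, num_servers):
    """Each job runs whole on one server; longest jobs are placed first
    on whichever server currently has the least work."""
    loads = [0.0] * num_servers
    for hours in sorted(job_hours, reverse=True):
        least_loaded = loads.index(min(loads))
        loads[least_loaded] += hours
    return max(loads)


def makespan_fully_distributed(job_hours, num_servers):
    """Idealised bound: all work is split evenly across every server."""
    return sum(job_hours) / num_servers


jobs = [6.0, 1.0, 1.0]  # hypothetical durations in hours
group_time = makespan_one_job_per_server(jobs, 3)
ideal_time = makespan_fully_distributed(jobs, 3)
```

In this example the agent group needs as long as its longest job, while two of the three servers sit idle for most of the window; spreading the work would, in the ideal case, finish in well under half that time.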

6. Benefits of Cloud Data Integration Elastic
CDI-Elastic brings all the capabilities of CDI along with the advanced data
processing capabilities of the Spark engine. The benefits of Cloud Data
Integration Elastic are described below.

6.1. Auto Scaling
Cloud Data Integration Elastic offers automatic scaling of resources
based on demand and consumption. Auto scaling ensures optimal
performance without manual intervention, providing cost savings and
improved processing times.

6.2. Spark Serverless Compute Engine
Cloud Data Integration Elastic leverages the advanced Spark serverless
compute engine to process large volumes of data with high concurrency. The
Elastic Secure Agent allows you to take full advantage of Spark’s advanced
processing capabilities without the need to manage underlying infrastructure.

6.3. Multi-Cloud Support
Cloud Data Integration Elastic offers support for multiple cloud platforms,
including AWS, Azure, and GCP. This multi-cloud compatibility provides you
with the flexibility to choose the cloud environment that best suits your
organization’s needs and strategies.

6.4. Controlled Cost
With Cloud Data Integration Elastic, you gain better control over costs. The
auto-scaling feature ensures that you only use the necessary resources when
needed, preventing overprovisioning and unnecessary expenses.
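A rough arithmetic comparison shows why paying only for the resources in use can reduce cost. All numbers below are hypothetical, and the sketch deliberately ignores cluster start-up time, storage, and any licensing charges; the function names are illustrative only.

```python
def always_on_cost(num_servers, hours_in_window, hourly_rate):
    """Cost of keeping a fixed set of servers up for the whole window."""
    return num_servers * hours_in_window * hourly_rate


def on_demand_cost(node_hours_used, hourly_rate):
    """Cost when nodes exist only while jobs need them."""
    return node_hours_used * hourly_rate


# Hypothetical figures: 4 servers for 24 hours vs. 30 node-hours of real use.
fixed = always_on_cost(num_servers=4, hours_in_window=24, hourly_rate=0.5)
elastic = on_demand_cost(node_hours_used=30, hourly_rate=0.5)
savings = fixed - elastic
```

The gap between the two figures grows with the amount of idle time in the fixed setup; if the servers were busy around the clock, the two costs would converge.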

6.5. Simplified Monitoring
Cloud Data Integration Elastic provides a simplified and centralized
monitoring mechanism. You can easily track the status of your data integration
jobs, monitor cluster performance, and review Spark job logs, all from a
unified interface.
