Partitioning in Informatica Cloud (IICS) - ThinkETL
Contents
1. Introduction
2. How Partitioning works in Informatica Cloud?
3. Types of Partitions supported in Informatica Cloud
3.1 Key Range Partitioning
3.2 Fixed Partitioning
4. Guidelines for Informatica Cloud Partitioning
5. Informatica Cloud Partitioning vs Informatica PowerCenter Partitioning
6. Conclusion
1. Introduction
Informatica offers several performance tuning and optimization techniques. Partitioning is one such option in Informatica Cloud, and it
lets you optimize the performance of mapping tasks.
In this article, let us look at the partitioning options available in Informatica Cloud Data Integration, their advantages and limitations,
and how they differ from the partitioning available in PowerCenter.
2. How Partitioning works in Informatica Cloud?
For each Flow Run Order in a mapping, several threads are created depending on the design of the mapping and the transformations used.
The following shows a simple mapping in Informatica Cloud with no partitioning enabled.
With no partitioning enabled, the mapping is considered to be running on a single partition. In this case, the DTM process creates one
reader thread, one transformation thread, and one writer thread to process the data.
The reader thread controls how the Data Integration Service process extracts source data.
The transformation thread controls how the Data Integration Service process processes the data.
The writer thread controls how the Data Integration Service process loads data into the target.
With partitioning enabled, you can select the number of partitions for the mapping. The DTM process then creates a reader thread, a
transformation thread, and a writer thread for each partition, so the partitions are processed concurrently and the execution time of
the task goes down.
In short, partitioning enables parallel processing of the data through separate pipelines.
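The idea of separate pipelines running side by side can be sketched in a few lines of Python. This is only a conceptual illustration, not how the DTM process is implemented: the function names (read, transform, write, run_pipeline, run_mapping) and the sample rows are hypothetical, and for brevity each partition gets a single worker thread that runs all three stages instead of one thread per stage.

```python
from concurrent.futures import ThreadPoolExecutor


def read(partition):
    """Reader stage: yield the source rows assigned to this partition."""
    yield from partition


def transform(rows):
    """Transformation stage: stand-in mapping logic (upper-case the name)."""
    for row in rows:
        yield {**row, "name": row["name"].upper()}


def write(rows, target):
    """Writer stage: append rows to the target and return the row count."""
    count = 0
    for row in rows:
        target.append(row)
        count += 1
    return count


def run_pipeline(partition):
    """Run reader -> transformation -> writer for one partition."""
    target = []
    write(transform(read(partition)), target)
    return target


def run_mapping(partitions):
    """Process every partition concurrently, one worker thread per partition."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        # pool.map returns results in the same order as the input partitions.
        results = list(pool.map(run_pipeline, partitions))
    return [row for part in results for row in part]


# Hypothetical sample data split into two partitions.
sample_partitions = [
    [{"id": 1, "name": "a"}],
    [{"id": 2, "name": "b"}, {"id": 3, "name": "c"}],
]
loaded = run_mapping(sample_partitions)
```

With a single partition the same code degenerates to one pipeline, which mirrors the unpartitioned case described earlier.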
Partitions are enabled by configuring the Source transformation in the Mapping Designer. When partitions are configured in the Source
transformation, partitioning occurs throughout the mapping.
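3. Types of Partitions supported in Informatica Cloud
3.1 Key Range Partitioning
A partitioning key can be defined on fields of the following data types:
String
Number
Date/time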
In the example below, with test data of one million employee records, the partitioning key is the field EMPLOYEE_ID.
Three partitions were created, and the start and end range values of each partition are as shown below.
Use a blank value for the start range to indicate the minimum value. Use a blank value for the end
range to indicate the maximum value.
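To make the range rules concrete, here is a small Python sketch of how a key value could be matched to a partition. It is an illustration only: the range boundaries are hypothetical (the actual values from the example are not reproduced here), and treating the start of a range as inclusive and the end as exclusive is an assumption. A blank range value is represented by None.

```python
from typing import Optional, Sequence, Tuple

# (start, end) per partition; None stands for a blank value in the UI.
Range = Tuple[Optional[int], Optional[int]]


def find_partition(key: int, ranges: Sequence[Range]) -> Optional[int]:
    """Return the index of the first range that contains key, or None.

    Assumption: start is inclusive and end is exclusive.
    A blank start means "from the minimum"; a blank end means "to the maximum".
    """
    for index, (start, end) in enumerate(ranges):
        above_start = start is None or key >= start
        below_end = end is None or key < end
        if above_start and below_end:
            return index
    return None


# Hypothetical ranges for three partitions on EMPLOYEE_ID.
EMPLOYEE_ID_RANGES = [
    (None, 300001),    # partition 0: minimum up to 300000
    (300001, 600001),  # partition 1: 300001 up to 600000
    (600001, None),    # partition 2: 600001 up to the maximum
]

first = find_partition(1, EMPLOYEE_ID_RANGES)        # lands in partition 0
middle = find_partition(300001, EMPLOYEE_ID_RANGES)  # lands in partition 1
last = find_partition(999999, EMPLOYEE_ID_RANGES)    # lands in partition 2
```

Leaving the first start and the last end blank means no key value can fall outside the defined ranges.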
Without partitioning enabled, all one million records are read and processed in a single pipeline.
In my testing, loading a million records from an Oracle database into a flat file took 28 seconds without partitioning and 20 seconds
with partitioning enabled. If transformation logic were also involved, the difference would likely be larger, because that logic would
also be processed concurrently in three different pipelines.
In general, the more records there are to process, the greater the performance gain that partitioning can provide.
3.2 Fixed Partitioning
In the example below, three partitions are created on a flat file source with a million records.
In my testing, loading a million records from a flat file into an Oracle table took 3 minutes, 25 seconds without partitioning. With
partitioning enabled, it took 1 minute, 39 seconds.
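The two timing comparisons reported in this article can be put on a common footing with a short calculation. The Python sketch below uses only the run times quoted in the text; the helper names speedup and time_saved_percent are illustrative.

```python
def speedup(baseline_seconds: float, partitioned_seconds: float) -> float:
    """How many times faster the partitioned run is than the baseline."""
    return baseline_seconds / partitioned_seconds


def time_saved_percent(baseline_seconds: float, partitioned_seconds: float) -> float:
    """Reduction in run time as a percentage of the baseline."""
    return 100 * (baseline_seconds - partitioned_seconds) / baseline_seconds


# Run times quoted in the article, converted to seconds.
runs = {
    "Oracle to flat file": (28, 20),
    "Flat file to Oracle": (3 * 60 + 25, 1 * 60 + 39),  # 205 s and 99 s
}

for name, (baseline, partitioned) in runs.items():
    print(
        f"{name}: {speedup(baseline, partitioned):.2f}x faster, "
        f"{time_saved_percent(baseline, partitioned):.1f}% less time"
    )
# Oracle to flat file: 1.40x faster, 28.6% less time
# Flat file to Oracle: 2.07x faster, 51.7% less time
```

In these two tests, three partitions gave a speedup of roughly 1.4x to 2.1x, which is below the theoretical 3x; these are single measurements, so treat the figures as indicative only.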
6. Conclusion
Partitioning is a great way to improve task performance when working with large amounts of data. At present, Informatica Cloud
offers only two partitioning types, and only on the Source transformation. This makes it easy to work with, since you need not worry
about selecting an optimal partitioning type from a broad list of types at various partition points. Just make sure you select an optimal
number of partitions for the amount of data you are processing, and you are good to go.
Informatica Cloud still has a lot of catching up to do on partitioning options compared to PowerCenter, and I hope to see more of them
added in future releases.
Omkar
February 26, 2021 at 5:46 pm
ThinkETL
February 28, 2021 at 7:10 pm
Currently, partitioning is not supported when a custom query is used in the Source transformation in IICS.
AK
April 17, 2021 at 1:16 am
I am reading data from 3 different regions (MySQL is the source database), merging them through a Union, and finally loading into an
Oracle target. In this case, do I need to give the same partitions across all 3 source regions?
ThinkETL
April 18, 2021 at 1:29 am
Yes, if the mapping includes multiple sources, specify the same number of partitions for each source so that data is
processed in consistent pipelines.
MN
February 25, 2022 at 6:42 pm
Hi,
ThinkETL
March 14, 2022 at 12:01 am
No
MN
February 25, 2022 at 6:44 pm
Also, can you please share an example on how to use Key Range partitioning using string data type?
ThinkETL
March 14, 2022 at 12:01 am
It does not seem to be supported currently. As an alternative, you can assign a unique number to each key value of the string field on
which you want to partition, and use that number when defining the Key Range partitioning.