Performance Tuning and Best Practices for Google BigQuery V2 Connector
© Copyright Informatica LLC 2021, 2023. Informatica, Informatica Cloud, and the Informatica logo are trademarks
or registered trademarks of Informatica LLC in the United States and many jurisdictions throughout the world. A
current list of Informatica trademarks is available on the web at https://www.informatica.com/trademarks.html.
Abstract
When you use Google BigQuery V2 Connector to read data from or write data to Google BigQuery, multiple factors such
as hardware, Google Compute Engine, Secure Agent, and Informatica mapping parameters impact the connector
performance. You can optimize the performance by tuning these parameters appropriately. This article describes
general reference guidelines to help you tune the performance of Google BigQuery V2 Connector in Cloud Data
Integration.
Supported Versions
• Informatica® Cloud Data Integration
Table of Contents
Overview
Performance tuning areas
Google Compute Engine best practices
Tune the Secure Agent
JVM heap size
Bulk processing for write operations
Configure Secure Agent concurrency
Tune the Google BigQuery mapping
Partitions
Compress files
Number of threads for uploading staging file
Read modes for Google BigQuery sources
Write modes for Google BigQuery targets
Data format of the staging file
Local flat file staging
Poll interval
Overview
Performance tuning is an iterative process in which you analyze the performance, use guidelines to estimate and
define parameters that impact the performance, and monitor and adjust the results as required.
This document describes the key hardware, database, Google Compute Engine, Secure Agent, and Informatica
mapping parameters that you can tune to optimize the performance of Google BigQuery V2 Connector. This document
also includes case studies that involve the tuning specifications required when you configure a mapping to write data
to Google BigQuery from a flat file using Google Compute Engine and the Secure Agent. The graphs constructed from
the case studies illustrate the impact that the tuning has on Google BigQuery V2 Connector performance.
Note: The performance testing results listed in this article are based on observations in an internal Informatica
environment using data from real-world scenarios. The performance of Google BigQuery V2 Connector might vary
based on individual environments and other parameters even when you use the same data.
Performance tuning areas
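You can optimize the performance of a read or write operation when you use Google BigQuery V2 Connector by tuning
the Google Compute Engine environment, the Secure Agent, and the Google BigQuery mapping.
Google Compute Engine best practices
Consider the following guidelines when you host the Secure Agent on Google Compute Engine: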
• Ensure that the Secure Agent hosted on the Google Compute Engine virtual machine is in the same region as
the Google BigQuery dataset and the Google Cloud Storage bucket.
• Choose the Google BigQuery dataset in the region where the Google Compute Engine virtual machine is
located.
• Choose the correct Google Compute Engine virtual machine instance type based on your requirements.
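The co-location guideline above can be checked with a few lines of code before you run a mapping. The following is a minimal sketch, not part of the connector: the function names are hypothetical, the location strings are assumed to be supplied by you (for example, copied from the Google Cloud console), and multi-region locations such as US or EU are deliberately reported as not co-located because they do not name a single region.

```python
def zone_to_region(zone: str) -> str:
    """Drop the trailing zone suffix, e.g. 'us-central1-a' -> 'us-central1'."""
    return zone.rsplit("-", 1)[0]


def is_colocated(vm_zone: str, dataset_location: str, bucket_location: str) -> bool:
    """Return True when the VM, the dataset, and the bucket share one region.

    Hypothetical helper: locations are compared case-insensitively as plain
    strings. A multi-region location such as 'US' never matches a region name.
    """
    region = zone_to_region(vm_zone).lower()
    return region == dataset_location.lower() == bucket_location.lower()


if __name__ == "__main__":
    # Example values only; substitute the locations of your own resources.
    print(is_colocated("us-central1-a", "us-central1", "US-CENTRAL1"))
    print(is_colocated("us-central1-a", "europe-west1", "us-central1"))
```

A check like this is only a convenience; the authoritative locations are the ones shown for each resource in Google Cloud.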
Tune the Secure Agent
JVM heap size
If you use the default JVM heap size, the mapping might fail with an out-of-memory error. Increase the JVM heap
size to avoid the error and to improve the performance of the read and write operations.
The following graph shows the results and the recommended values for setting the JVM maximum heap size:
For Google BigQuery V2 mappings and mapping tasks that read data from Google BigQuery, set the JVM maximum heap
size to 256 MB by specifying -Xmx256m.
If you already configured a heap size value that is higher than 256 MB in the JVM options, do not change it to
-Xmx256m.
Perform the following steps to configure the JVM heap size:
1. In Administrator, select the Secure Agent listed on the Runtime Environments tab.
2. Click Edit.
3. In the System Configuration Details section, select Data Integration Server as the service and DTM as the
type.
4. Edit the JVMOption1 property, and enter -Xmx256m.
5. Click Save.
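The rule for choosing the heap value (use -Xmx256m, but keep any larger value that is already configured) can be expressed as a small helper. This is an illustrative sketch only: the function names are hypothetical, and the parser assumes the common -Xmx forms with an optional k, m, or g suffix.

```python
import re

MINIMUM_MB = 256  # value recommended in this article for read operations
_UNIT_TO_MB = {"": 1 / (1024 * 1024), "k": 1 / 1024, "m": 1, "g": 1024}


def parse_xmx_mb(option):
    """Return the heap size in MB for an '-Xmx<size>[k|m|g]' option, else None."""
    match = re.fullmatch(r"-Xmx(\d+)([kKmMgG]?)", option.strip())
    if match is None:
        return None
    value, unit = match.groups()
    return int(value) * _UNIT_TO_MB[unit.lower()]


def recommended_jvm_option(current):
    """Keep a configured heap larger than 256 MB; otherwise return -Xmx256m."""
    current_mb = parse_xmx_mb(current) if current else None
    if current_mb is not None and current_mb > MINIMUM_MB:
        return current.strip()
    return f"-Xmx{MINIMUM_MB}m"


if __name__ == "__main__":
    for configured in ["", "-Xmx64m", "-Xmx1g"]:
        print(repr(configured), "->", recommended_jvm_option(configured))
```

The returned string is the value you would enter in the JVMOption1 property in step 4.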
Bulk processing for write operations
To enable bulk processing, specify the property -DENABLE_WRITER_BULK_PROCESSING=true in the Secure Agent
properties.
Perform the following steps to configure bulk processing before you run a mapping:
1. In Administrator, select the Secure Agent listed on the Runtime Environments tab.
2. Click Edit.
3. In the System Configuration Details section, select Data Integration Server as the service and DTM as the
type.
4. Edit the JVM option, and enter -DENABLE_WRITER_BULK_PROCESSING=true.
5. Click Save.
The following graph illustrates the impact of bulk processing on the writer performance:
Note: This feature does not apply to mapping tasks configured with pushdown optimization.
Configure Secure Agent concurrency
If there are more than two concurrent tasks, the additional tasks are queued and then scheduled for execution when a
slot becomes available. This can leave the capacity of the Secure Agent machine underutilized.
To achieve better utilization of the CPU capacity of the Secure Agent machine and achieve a higher degree of
concurrency, you can set the maxDTMProcesses custom property for the Data Integration Server to the number of
parallel tasks.
For example, if you set the maxDTMProcesses property to 16, the Secure Agent allows 16 mapping tasks to run
simultaneously.
Note: Each mapping task spawns its own JVM and reserves the specified JVM heap from the physical memory.
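Because each concurrent mapping task reserves its own heap, it helps to estimate the total reservation before you raise maxDTMProcesses. The following sketch shows the arithmetic only. The function names and the 2048 MB operating system reserve are assumptions for illustration, and the heap is a lower bound on the memory that a JVM actually uses.

```python
def reserved_heap_mb(max_dtm_processes: int, heap_mb_per_task: int) -> int:
    """Total heap reserved when every concurrency slot is in use."""
    return max_dtm_processes * heap_mb_per_task


def max_processes_for(physical_memory_mb: int, heap_mb_per_task: int,
                      os_reserve_mb: int = 2048) -> int:
    """Largest maxDTMProcesses value whose heaps fit in the usable memory.

    os_reserve_mb is a hypothetical allowance for the operating system and
    the other Secure Agent processes; adjust it for your environment.
    """
    usable_mb = physical_memory_mb - os_reserve_mb
    return max(usable_mb // heap_mb_per_task, 0)


if __name__ == "__main__":
    # 16 concurrent tasks with -Xmx256m reserve 16 * 256 MB of heap.
    print(reserved_heap_mb(16, 256), "MB reserved")
    print(max_processes_for(16384, 256), "tasks fit on a 16 GB machine")
```

CPU capacity also limits useful concurrency, so treat the memory-based figure as an upper bound rather than a target.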
Tune the Google BigQuery mapping
You can configure the following properties to optimize the Google BigQuery mapping performance:
• Partitions
• File Compression
• Data format of the staging file
• Read and write modes
• Poll interval
Partitions
You can configure partitions to optimize the mapping performance. Specify the number of partitions for the read or
write operation in each mapping.
The following graph illustrates the impact of partitioning on the writer performance:
Compress files
If the Secure Agent and Google BigQuery are configured in different regions and there is network latency, you can
enable compression to compress the staged file and reduce the transfer time when you read data from or write data to
Google BigQuery.
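To see why compression reduces the transfer time, compare the size of a staged file before and after compression. The sketch below uses gzip from the Python standard library on generated sample data. It illustrates the general effect only and does not reproduce how the connector compresses its staging files; the sample columns are invented.

```python
import gzip


def build_sample_csv(rows: int) -> bytes:
    """Build an in-memory CSV with repetitive values, typical of staged data."""
    lines = ["id,city,amount"]
    for i in range(rows):
        lines.append(f"{i},Bengaluru,{i % 100}.00")
    return "\n".join(lines).encode("utf-8")


raw = build_sample_csv(10_000)
compressed = gzip.compress(raw)
ratio = len(compressed) / len(raw)

# Fewer bytes cross the network, at the cost of CPU time on both ends.
print(f"raw={len(raw)} bytes, gzip={len(compressed)} bytes, ratio={ratio:.2f}")
```

The trade-off is CPU time for network time, which is why compression is most useful when the Secure Agent and Google BigQuery are in different regions.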
Number of threads for uploading staging file
Informatica recommends that you set the value of the Number of Threads for Uploading Staging file target advanced
property to 4 to optimize the upload of the staging file.
This property is honored only when the number of partitions configured for the source is set to 1.
Read modes for Google BigQuery sources
You can configure different modes to optimize the performance when you read data from a Google BigQuery source. If
you want to read large volumes of data from a Google BigQuery source, you can specify the Read mode as Staging in
the advanced source properties.
The following graph illustrates the impact of Read modes on the reader performance:
Write modes for Google BigQuery targets
The following graph illustrates the impact of Write modes on the writer performance:
Data format of the staging file
The following graph illustrates how different data formats for the staging files impact the writer performance:
Local flat file staging
Informatica has redesigned the way Data Integration stages the data when you read from a Google BigQuery source
or write to a Google BigQuery target. This redesign significantly reduces the local staging execution time.
You can optimize the mapping performance by setting the staging property in the Secure Agent to create a flat file in a
temporary folder to stage the data locally before reading from or writing to Google BigQuery.
When you set the staging property INFA_DTM_RDR_STAGING_ENABLED_CONNECTORS for the Tomcat to the plugin
ID of the Google BigQuery V2 Connector in the Secure Agent properties, Data Integration, by default, creates a
flat file locally to stage the data from the Google BigQuery source and then reads the data from the staging
file. For more information on how to set the property, see the help for Google BigQuery V2 Connector.
The following graph shows the performance comparison of the execution time before and after setting the
staging property for a mapping that reads from a Google BigQuery source and writes to a flat file target:
When you set the staging property INFA_DTM_STAGING_ENABLED_CONNECTORS for the Tomcat to the plugin ID
of the Google BigQuery V2 Connector in the Secure Agent properties, Data Integration, by default, creates a
flat file locally to stage the data and then loads the data from the staging file to the Google BigQuery target.
You can find the plugin ID in the manifest file located in the following directory: <Secure Agent
installation directory>/downloads/<Google BigQuery V2 package>/CCIManifest.
For more information on how to set the property, see the help for Google BigQuery V2 Connector.
The following graph shows the performance comparison of the execution time before and after setting the
staging property for a mapping that reads from a flat file source and writes to a Google BigQuery target:
Informatica recommends that you add three processor cores to the agent machine to increase the processing power of
the mapping.
The following table shows the resource sizing in the old and new environment for a mapping that reads from a flat file
source and writes to a Google BigQuery target:
When you use three cores, the performance is optimized and the local staging execution time is much shorter. Although
Informatica recommends three cores for a Google BigQuery mapping execution, this is not mandatory. With fewer
cores, the mapping performance is reduced, but the mapping does not fail.
Poll interval
When you use a custom query to read from or write to Google BigQuery, you can increase or decrease the poll
interval to improve the mapping performance.
You can specify the poll interval value in the source and target advanced attributes in a mapping based on the time that
the query job takes. If a query job takes less than 7 to 8 seconds, you can reduce the poll interval. For queries that take
longer than 7 to 8 seconds, increase the poll interval. If you reduce the poll interval for a long-running query, the
additional status checks cause overhead and impact the mapping performance.
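The trade-off behind the poll interval can be worked through with simple arithmetic. The following sketch is a simplified model, not the connector's implementation: it assumes the first status check happens one interval after the job is submitted, that checks are instantaneous, and the function names are hypothetical.

```python
import math


def polls_needed(job_seconds: float, poll_interval_seconds: float) -> int:
    """Number of status checks until the first check at or after completion."""
    return max(math.ceil(job_seconds / poll_interval_seconds), 1)


def detection_delay(job_seconds: float, poll_interval_seconds: float) -> float:
    """Seconds between job completion and the check that notices it."""
    polls = polls_needed(job_seconds, poll_interval_seconds)
    return polls * poll_interval_seconds - job_seconds


if __name__ == "__main__":
    for job, interval in [(5, 1), (5, 10), (60, 1), (60, 10)]:
        print(f"job={job}s interval={interval}s "
              f"polls={polls_needed(job, interval)} "
              f"delay={detection_delay(job, interval)}s")
```

In this model, a short job with a long interval waits idle after the job finishes, while a long job with a short interval issues many status checks that add no value. This matches the guidance to shorten the interval for short queries and lengthen it for long ones.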
Authors
Akanksha Gauniyal
Manav Khurana