Migrate schema and data from Teradata
The BigQuery Data Transfer Service, combined with a migration agent that runs on premises, lets you copy your data from a Teradata on-premises data warehouse instance to BigQuery. This document describes the step-by-step process of migrating data from Teradata using the BigQuery Data Transfer Service.
Before you begin
- Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
- In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
- Make sure that billing is enabled for your Google Cloud project.
- Enable the BigQuery, BigQuery Data Transfer Service, Cloud Storage, and Pub/Sub APIs.
- Create a service account:
  - In the Google Cloud console, go to the Create service account page, and select your project.
  - In the Service account name field, enter a name. The Google Cloud console fills in the Service account ID field based on this name. In the Service account description field, enter a description. For example, Service account for quickstart.
  - Click Create and continue.
  - Grant the following roles to the service account: roles/bigquery.user, roles/storage.objectAdmin, and roles/iam.serviceAccountTokenCreator. To grant a role, find the Select a role list, then select the role. To grant additional roles, click Add another role and add each additional role.
  - Click Continue.
  - Click Done to finish creating the service account. Do not close your browser window. You will use it in the next step.
- Create a service account key:
  - In the Google Cloud console, click the email address for the service account that you created.
  - Click Keys.
  - Click Add key, and then click Create new key.
  - Click Create. A JSON key file is downloaded to your computer.
  - Click Close.
Set required permissions
Ensure that the principal creating the transfer has the following roles in the project containing the transfer job:
- Logs Viewer (roles/logging.viewer)
- Storage Admin (roles/storage.admin), or a custom role that grants the following permissions:
  - storage.objects.create
  - storage.objects.get
  - storage.objects.list
- BigQuery Admin (roles/bigquery.admin), or a custom role that grants the following permissions:
  - bigquery.datasets.create
  - bigquery.jobs.create
  - bigquery.jobs.get
  - bigquery.jobs.listAll
  - bigquery.transfers.get
  - bigquery.transfers.update
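If you grant custom roles instead of Storage Admin or BigQuery Admin, it can help to compare each role's permissions against the lists above before you create the transfer. The following Java sketch is a hypothetical helper (CustomRoleCheck is not part of any Google tool) that returns the required permissions a role lacks. It assumes you have already collected the role's granted permissions as a set of strings, for example from the includedPermissions field printed by gcloud iam roles describe.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper: reports which required permissions a custom role lacks. */
final class CustomRoleCheck {
    // Permissions listed in this section for a custom role used instead of Storage Admin.
    static final List<String> STORAGE_PERMISSIONS = Collections.unmodifiableList(Arrays.asList(
            "storage.objects.create",
            "storage.objects.get",
            "storage.objects.list"));

    // Permissions listed in this section for a custom role used instead of BigQuery Admin.
    static final List<String> BIGQUERY_PERMISSIONS = Collections.unmodifiableList(Arrays.asList(
            "bigquery.datasets.create",
            "bigquery.jobs.create",
            "bigquery.jobs.get",
            "bigquery.jobs.listAll",
            "bigquery.transfers.get",
            "bigquery.transfers.update"));

    private CustomRoleCheck() {}

    /** Returns the required permissions absent from the granted set, in the order listed. */
    static Set<String> missing(List<String> required, Set<String> granted) {
        Set<String> result = new LinkedHashSet<>(required);
        result.removeAll(granted);
        return result;
    }
}
```

For example, a role that grants only storage.objects.create and storage.objects.get yields [storage.objects.list] for STORAGE_PERMISSIONS. An empty result means the role covers every permission this section lists.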
Create a dataset
Create a BigQuery dataset to store your data. You do not need to create any tables.
Create a Cloud Storage bucket
Create a Cloud Storage bucket for staging the data during the transfer job.
Prepare the local environment
Complete the tasks in this section to prepare your local environment for the transfer job.
Local machine requirements
- The migration agent connects to the Teradata instance over JDBC and also calls Google Cloud APIs. Ensure that a firewall does not block this network access.
- Ensure that Java Runtime Environment 8 or later is installed.
- Ensure that you have enough storage space for the extraction method you have chosen, as described in Extraction method.
- If you have decided to use Teradata Parallel Transporter (TPT) extraction, ensure that the tbuild utility is installed. For more information on choosing an extraction method, see Extraction method.
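A quick preflight can confirm the Java and tbuild requirements before you install the agent. The following Java sketch is a hypothetical helper, not part of the migration agent: it reads the running JVM's java.specification.version property and searches a PATH-style string for an executable. It assumes that tbuild is reachable through PATH, which depends on how TPT was installed on your machine.

```java
import java.io.File;

/** Hypothetical preflight helper for the local machine requirements; not part of the agent. */
final class LocalPreflight {
    private LocalPreflight() {}

    /** Converts a java.specification.version value to a major version: "1.8" -> 8, "11" -> 11. */
    static int javaMajorVersion(String specVersion) {
        String[] parts = specVersion.trim().split("\\.");
        // Java 8 and earlier report "1.x"; Java 9 and later report the major number directly.
        if (parts[0].equals("1") && parts.length > 1) {
            return Integer.parseInt(parts[1]);
        }
        return Integer.parseInt(parts[0]);
    }

    /** True if the running JVM meets the "Java 8 or later" requirement. */
    static boolean runningJavaIsSupported() {
        return javaMajorVersion(System.getProperty("java.specification.version")) >= 8;
    }

    /**
     * Looks for an executable in each directory of a PATH-style string.
     * Also tries the ".exe" suffix so the same check works on Windows.
     */
    static File findOnPath(String executable, String pathValue) {
        if (pathValue == null) {
            return null;
        }
        for (String dir : pathValue.split(File.pathSeparator)) {
            if (dir.isEmpty()) {
                continue;
            }
            for (String name : new String[] {executable, executable + ".exe"}) {
                File candidate = new File(dir, name);
                if (candidate.isFile() && candidate.canExecute()) {
                    return candidate;
                }
            }
        }
        return null;
    }
}
```

Calling LocalPreflight.findOnPath("tbuild", System.getenv("PATH")) returns null when tbuild was not found. That result only matters if you plan to use TPT extraction.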
Teradata connection details
Make sure you have the username and password of a Teradata user with read access to the system tables and the tables that are being migrated.
Make sure you know the hostname and port number to connect to the Teradata instance.
Download the JDBC driver
Download the terajdbc4.jar JDBC driver file from Teradata to a machine that can connect to the data warehouse.
Set the GOOGLE_APPLICATION_CREDENTIALS variable
Set the environment variable GOOGLE_APPLICATION_CREDENTIALS to the path of the service account key file that you downloaded in the Before you begin section.
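A wrong path in this variable typically surfaces only later, as an authentication failure. The following Java sketch (CredentialsEnvCheck is a hypothetical helper, not part of the agent) checks that the variable is set and names a readable file. It does not validate the contents of the key.

```java
import java.io.File;
import java.util.Map;

/** Hypothetical check that GOOGLE_APPLICATION_CREDENTIALS points at a readable key file. */
final class CredentialsEnvCheck {
    static final String VARIABLE = "GOOGLE_APPLICATION_CREDENTIALS";

    private CredentialsEnvCheck() {}

    /** Returns null if the variable names a readable file, otherwise a short description of the problem. */
    static String problem(Map<String, String> environment) {
        String value = environment.get(VARIABLE);
        if (value == null || value.trim().isEmpty()) {
            return VARIABLE + " is not set";
        }
        File keyFile = new File(value);
        if (!keyFile.isFile()) {
            return VARIABLE + " does not point to a file: " + value;
        }
        if (!keyFile.canRead()) {
            return "key file is not readable: " + value;
        }
        return null;
    }
}
```

For example, CredentialsEnvCheck.problem(System.getenv()) returns null when the variable looks usable in the current environment.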
Update the VPC Service Controls egress rule
Add the Google Cloud project managed by the BigQuery Data Transfer Service (project number: 990232121269) to the egress rule in the VPC Service Controls perimeter.
The agent running on premises and the BigQuery Data Transfer Service communicate by publishing Pub/Sub messages to a per-transfer topic. The BigQuery Data Transfer Service sends commands to the agent to extract data, and the agent publishes messages back to the BigQuery Data Transfer Service to update the status and return data extraction responses.
Create a custom schema file
To use a custom schema file instead of automatic schema detection, create one manually, or have the migration agent create one for you when you initialize the agent.
If you create a schema file manually and you intend to use the Google Cloud console to create a transfer, upload the schema file to a Cloud Storage bucket in the same project you plan to use for the transfer.
Download the migration agent
Download the migration agent to a machine that can connect to the data warehouse. Move the migration agent JAR file to the same directory as the Teradata JDBC driver JAR file.
Set up a transfer
Create a transfer with the BigQuery Data Transfer Service.
If you want a custom schema file created automatically, use the migration agent to set up the transfer.
You can't create an on-demand transfer by using the bq command-line tool; you must use the Google Cloud console or the BigQuery Data Transfer Service API instead.
If you are creating a recurring transfer, we strongly recommend that you specify a schema file so that data from subsequent transfers can be properly partitioned when it is loaded into BigQuery. Without a schema file, the BigQuery Data Transfer Service infers the table schema from the source data being transferred, and all information about partitioning, clustering, primary keys, and change tracking is lost. In addition, subsequent transfers skip previously migrated tables after the initial transfer. For more information on how to create a schema file, see Custom schema file.
Console
In the Google Cloud console, go to the BigQuery page.
Click Data transfers.
Click Create Transfer.
In the Source type section, do the following:
- Choose Migration: Teradata.
- For Transfer config name, enter a display name for the transfer such as My Migration. The display name can be any value that lets you easily identify the transfer if you need to modify it later.
- Optional: For Schedule options, you can leave the default value of Daily (based on creation time) or choose another time if you want a recurring, incremental transfer. Otherwise, choose On-demand for a one-time transfer.
For Destination settings, choose the appropriate dataset.
In the Data source details section, continue with specific details for your Teradata transfer.
- For Database type, choose Teradata.
- For Cloud Storage bucket, browse for the name of the Cloud Storage bucket for staging the migration data. Do not type in the prefix gs://; enter only the bucket name.
- For Database name, enter the name of the source database in Teradata.
For Table name patterns, enter a pattern for matching the table names in the source database. You can use regular expressions to specify the pattern. For example:
- sales|expenses matches tables that are named sales and expenses.
- .* matches all tables.
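To preview which tables a pattern selects before you start a transfer, you can test it locally. The following Java sketch assumes that a pattern must match the whole table name (the semantics of Java's Matcher.matches). The bq instructions on this page state that patterns follow Java regular expression syntax, but the agent's exact matching behavior is an assumption here. TableNamePatterns is a hypothetical helper.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical preview of which tables a table name pattern selects.
 * Assumes whole-name matching; the migration agent's exact semantics may differ.
 */
final class TableNamePatterns {
    private TableNamePatterns() {}

    static List<String> selected(String pattern, List<String> tableNames) {
        Pattern compiled = Pattern.compile(pattern); // Java regular expression syntax
        List<String> result = new ArrayList<>();
        for (String table : tableNames) {
            // matches() must consume the entire name, so "sales" does not select "sales_2023".
            if (compiled.matcher(table).matches()) {
                result.add(table);
            }
        }
        return result;
    }
}
```

With the table names sales, expenses, sales_2023, and staff, the pattern sales|expenses selects only sales and expenses, and .* selects all four. Pattern.compile throws PatternSyntaxException for an invalid pattern, so this also catches typos early.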
For Service account email, enter the email address associated with the service account's credentials used by the migration agent.
Optional: If you're using a custom schema file, enter the path and filename of that file in the Schema file path field. If you don't provide a custom schema file, BigQuery automatically detects the table schema by using the source data being transferred. You can create your own schema file, or you can use the migration agent to help you create one. For information on creating a schema file, see Initialize the migration agent.
In the Service Account menu, select a service account from the service accounts associated with your Google Cloud project. You can associate a service account with your transfer instead of using your user credentials. For more information about using service accounts with data transfers, see Use service accounts.
- If you signed in with a federated identity, then a service account is required to create a transfer. If you signed in with a Google Account, then a service account for the transfer is optional.
- The service account must have the required permissions.
Optional: In the Notification options section, do the following:
- Click the Email notifications toggle if you want the transfer administrator to receive an email notification when a transfer run fails.
- Click the Pub/Sub notifications toggle to configure Pub/Sub run notifications for your transfer. For Select a Pub/Sub topic, choose your topic name or click Create a topic.
Click Save.
On the Transfer details page, click the Configuration tab.
Note the resource name for this transfer because you need it to run the migration agent.
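Transfer configuration resource names follow the form projects/PROJECT/locations/LOCATION/transferConfigs/CONFIG_ID. The following Java sketch (TransferResourceName is a hypothetical helper) splits a resource name into those parts. They appear to correspond to the project-id, location, and id fields of the transfer-configuration block in the agent configuration file described later on this page; treat that correspondence as an assumption.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical parser for a transfer configuration resource name. */
final class TransferResourceName {
    private static final Pattern FORMAT =
            Pattern.compile("projects/([^/]+)/locations/([^/]+)/transferConfigs/([^/]+)");

    final String project;
    final String location;
    final String configId;

    private TransferResourceName(String project, String location, String configId) {
        this.project = project;
        this.location = location;
        this.configId = configId;
    }

    /** Splits projects/P/locations/L/transferConfigs/ID into its three parts. */
    static TransferResourceName parse(String resourceName) {
        Matcher m = FORMAT.matcher(resourceName.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a transfer config resource name: " + resourceName);
        }
        return new TransferResourceName(m.group(1), m.group(2), m.group(3));
    }
}
```

For example, parsing projects/123456789876/locations/us/transferConfigs/61d7ab69-0000-2f6c-9b6c-14c14ef21038 yields project 123456789876, location us, and config ID 61d7ab69-0000-2f6c-9b6c-14c14ef21038.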
bq
When you create a transfer by using the bq tool, the transfer configuration is set to recur every 24 hours. For on-demand transfers, use the Google Cloud console or the BigQuery Data Transfer Service API.
You cannot configure notifications using the bq tool.
Enter the bq mk command and supply the transfer creation flag --transfer_config. The following flags are also required:
- --data_source
- --display_name
- --target_dataset
- --params
bq mk \
  --transfer_config \
  --project_id=PROJECT_ID \
  --target_dataset=DATASET \
  --display_name=NAME \
  --service_account_name=SERVICE_ACCOUNT \
  --params='PARAMETERS' \
  --data_source=DATA_SOURCE
Where:
- PROJECT_ID is your project ID. If --project_id isn't supplied to specify a particular project, the default project is used.
- DATASET is the dataset you want to target (--target_dataset) for the transfer configuration.
- NAME is the display name (--display_name) for the transfer configuration. The transfer's display name can be any value that lets you identify the transfer if you need to modify it later.
- SERVICE_ACCOUNT is the service account name used to authenticate your transfer. The service account should be owned by the same project_id used to create the transfer, and it should have all the required permissions listed earlier.
- PARAMETERS contains the parameters (--params) for the created transfer configuration in JSON format. For example: --params='{"param":"param_value"}'. For Teradata migrations, use the following parameters:
  - bucket is the Cloud Storage bucket that acts as a staging area during the migration.
  - database_type is Teradata.
  - agent_service_account is the email address associated with the service account that you created.
  - database_name is the name of the source database in Teradata.
  - table_name_patterns is one or more patterns for matching the table names in the source database. You can use regular expressions to specify the pattern. The pattern should follow Java regular expression syntax. For example, sales|expenses matches tables that are named sales and expenses, and .* matches all tables.
  - schema_file_path (optional) is the Cloud Storage path of a custom schema file, such as gs://mybucket/myschemafile.json.
- DATA_SOURCE is the data source (--data_source): on_premises.
For example, the following command creates a Teradata transfer named My Migration using Cloud Storage bucket mybucket and target dataset MyDataset. The transfer migrates all tables from the Teradata database mydatabase, and the optional schema file is myschemafile.json.
bq mk \
  --transfer_config \
  --project_id=123456789876 \
  --target_dataset=MyDataset \
  --display_name='My Migration' \
  --params='{"bucket": "mybucket", "database_type": "Teradata", "database_name":"mydatabase", "table_name_patterns": ".*", "agent_service_account":"[email protected]", "schema_file_path": "gs://mybucket/myschemafile.json"}' \
  --data_source=on_premises
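Quoting the --params JSON correctly is the most error-prone part of this command. As an alternative to hand-quoting, the following Java sketch builds the bq mk arguments as a list, so that when you start bq with ProcessBuilder each argument reaches bq unchanged and no shell quoting is involved. BqMkCommand is a hypothetical helper; it uses only the flags shown above and always sets --data_source=on_premises.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical helper that builds the bq mk arguments for a Teradata transfer. */
final class BqMkCommand {
    private BqMkCommand() {}

    /** Quotes a value as a JSON string, escaping quotes, backslashes, and control characters. */
    static String jsonString(String value) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : value.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    /** Serializes flat string parameters as the JSON object expected by --params. */
    static String paramsJson(Map<String, String> params) {
        StringBuilder sb = new StringBuilder("{");
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (sb.length() > 1) {
                sb.append(", ");
            }
            sb.append(jsonString(e.getKey())).append(": ").append(jsonString(e.getValue()));
        }
        return sb.append('}').toString();
    }

    /** One list element per argument; serviceAccount may be null to omit --service_account_name. */
    static List<String> args(String projectId, String dataset, String displayName,
                             String serviceAccount, Map<String, String> params) {
        List<String> cmd = new ArrayList<>();
        cmd.add("bq");
        cmd.add("mk");
        cmd.add("--transfer_config");
        cmd.add("--project_id=" + projectId);
        cmd.add("--target_dataset=" + dataset);
        cmd.add("--display_name=" + displayName);
        if (serviceAccount != null) {
            cmd.add("--service_account_name=" + serviceAccount);
        }
        cmd.add("--params=" + paramsJson(params));
        cmd.add("--data_source=on_premises");
        return cmd;
    }
}
```

Starting the result with new ProcessBuilder(BqMkCommand.args(...)).inheritIO().start() keeps the console attached, so you can still complete the authentication prompt that bq displays.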
After running the command, you receive a message like the following:
[URL omitted] Please copy and paste the above URL into your web browser and
follow the instructions to retrieve an authentication code.
Follow the instructions and paste the authentication code on the command line.
API
Use the projects.locations.transferConfigs.create method and supply an instance of the TransferConfig resource.
Java
Before trying this sample, follow the Java setup instructions in the BigQuery quickstart using client libraries. For more information, see the BigQuery Java API reference documentation.
To authenticate to BigQuery, set up Application Default Credentials. For more information, see Set up authentication for client libraries.
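As a dependency-free illustration of what the create call sends, the following Java sketch builds the request URL and a TransferConfig JSON body for a Teradata migration. It is a sketch under assumptions, not the official client-library sample: CreateTransferRequest is hypothetical, it leaves out authentication (an OAuth 2.0 access token in the Authorization header), and it reuses the on_premises data source and params from the bq example.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Map;

/**
 * Hypothetical sketch of the REST request behind projects.locations.transferConfigs.create.
 * Authentication is intentionally not shown.
 */
final class CreateTransferRequest {
    private static final String ENDPOINT = "https://bigquerydatatransfer.googleapis.com/v1/";

    private CreateTransferRequest() {}

    /** POST target; serviceAccountName is an optional query parameter of the create method. */
    static String url(String project, String location, String serviceAccount) {
        String url = ENDPOINT + "projects/" + project + "/locations/" + location + "/transferConfigs";
        if (serviceAccount == null) {
            return url;
        }
        try {
            return url + "?serviceAccountName=" + URLEncoder.encode(serviceAccount, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    /** TransferConfig JSON body: display name, data source, destination dataset, and params. */
    static String body(String displayName, String datasetId, Map<String, String> params) {
        StringBuilder json = new StringBuilder("{");
        json.append("\"displayName\": ").append(quote(displayName)).append(", ");
        json.append("\"dataSourceId\": \"on_premises\", ");
        json.append("\"destinationDatasetId\": ").append(quote(datasetId)).append(", ");
        json.append("\"params\": {");
        String separator = "";
        for (Map.Entry<String, String> e : params.entrySet()) {
            json.append(separator).append(quote(e.getKey())).append(": ").append(quote(e.getValue()));
            separator = ", ";
        }
        return json.append("}}").toString();
    }

    /** Minimal JSON quoting: escapes backslashes and double quotes only. */
    private static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }
}
```

You would POST the body to the returned URL with Content-Type: application/json. For production code, the BigQuery Data Transfer Service Java client library handles authentication and serialization for you.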
Migration agent
You can optionally set up the transfer directly from the migration agent. For more information, see Initialize the migration agent.
Initialize the migration agent
You must initialize the migration agent for a new transfer. Initialization is required only once for a transfer, whether or not it is recurring. Initialization only configures the migration agent; it doesn't start the transfer.
If you are going to use the migration agent to create a custom schema file, ensure that you have a writable directory under your working directory with the same name as the project you want to use for the transfer. This is where the migration agent creates the schema file. For example, if you are working in /home and you are setting up the transfer in project myProject, create directory /home/myProject and make sure it is writable by users.
Open a new session. On the command line, issue the initialization command, which follows this form:
java -cp \
  OS-specific-separated-paths-to-jars (JDBC and agent) \
  com.google.cloud.bigquery.dms.Agent \
  --initialize
The following example shows the initialization command when the JDBC driver and migration agent JAR files are in a local migration directory:
Unix, Linux, Mac OS
java -cp \
  /usr/local/migration/terajdbc4.jar:/usr/local/migration/mirroring-agent.jar \
  com.google.cloud.bigquery.dms.Agent \
  --initialize
Windows
Copy all the files into the C:\migration folder (or adjust the paths in the command), then run:
java -cp C:\migration\terajdbc4.jar;C:\migration\mirroring-agent.jar com.google.cloud.bigquery.dms.Agent --initialize
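The initialization and run commands differ only in the arguments passed to com.google.cloud.bigquery.dms.Agent. The following Java sketch builds either command as an argument list, joining the JAR paths with the platform's classpath separator (File.pathSeparator is : on Unix-like systems and ; on Windows). AgentCommand is a hypothetical helper, and the JAR paths are whatever you pass in.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical builder for the migration agent command line. */
final class AgentCommand {
    static final String MAIN_CLASS = "com.google.cloud.bigquery.dms.Agent";

    private AgentCommand() {}

    /**
     * Returns "java -cp JARS com.google.cloud.bigquery.dms.Agent ARGS" as separate arguments.
     * Pass File.pathSeparator as the separator to match the current operating system.
     */
    static List<String> command(String separator, List<String> jars, String... agentArgs) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-cp");
        cmd.add(String.join(separator, jars));
        cmd.add(MAIN_CLASS);
        cmd.addAll(Arrays.asList(agentArgs));
        return cmd;
    }
}
```

For example, pass "--initialize" for initialization, or "--configuration-file=config.json" to run the agent later. Start the result with new ProcessBuilder(cmd).inheritIO() so that the agent's interactive prompts remain usable.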
When prompted, configure the following options:
- Choose whether to save the Teradata Parallel Transporter (TPT) template to disk. If you are planning to use the TPT extraction method, you can modify the saved template with parameters that suit your Teradata instance.
- Type the path to a local directory that the transfer job can use for file extraction. Ensure you have the minimum recommended storage space as described in Extraction method.
- Type the database hostname.
- Type the database port.
- Choose whether to use Teradata Parallel Transporter (TPT) as the extraction method.
- Optional: Type the path to a database credential file.
Choose whether to specify a BigQuery Data Transfer Service config name.
If you are initializing the migration agent for a transfer you have already set up, then do the following:
- Type the Resource name of the transfer. You can find this in the Configuration tab of the Transfer details page for the transfer.
- When prompted, type a path and file name for the migration agent configuration file that will be created. You refer to this file when you run the migration agent to start the transfer.
- Skip the remaining steps.
If you are using the migration agent to set up a transfer, press Enter to skip to the next prompt.
Type the Google Cloud Project ID.
Type the name of the source database in Teradata.
Type a pattern for matching the table names in the source database. You can use regular expressions to specify the pattern. For example:
- sales|expenses matches tables that are named sales and expenses.
- .* matches all tables.
Optional: Type the path to a local JSON schema file. This is strongly recommended for recurring transfers.
If you aren't using a schema file, or if you want the migration agent to create one for you, press Enter to skip to the next prompt.
Choose whether to create a new schema file.
If you do want to create a schema file:
- Type yes.
- Type the username of a Teradata user who has read access to the system tables and the tables you want to migrate.
- Type the password for that user. The migration agent creates the schema file and outputs its location.
- Modify the schema file to mark partitioning, clustering, primary keys, and change tracking columns, and verify that you want to use this schema for the transfer configuration. See Custom schema file for tips.
- Press Enter to skip to the next prompt.
If you don't want to create a schema file, type no.
Type the name of the target Cloud Storage bucket for staging migration data before loading to BigQuery. If you had the migration agent create a custom schema file, it is also uploaded to this bucket.
Type the name of the destination dataset in BigQuery.
Type a display name for the transfer configuration.
Type a path and file name for the migration agent configuration file that will be created.
After entering all the requested parameters, the migration agent creates a configuration file and outputs it to the local path that you specified. See the next section for a closer look at the configuration file.
Configuration file for the migration agent
The configuration file created in the initialization step looks similar to this example:
{
"agent-id": "81f452cd-c931-426c-a0de-c62f726f6a6f",
"transfer-configuration": {
"project-id": "123456789876",
"location": "us",
"id": "61d7ab69-0000-2f6c-9b6c-14c14ef21038"
},
"source-type": "teradata",
"console-log": false,
"silent": false,
"teradata-config": {
"connection": {
"host": "localhost"
},
"local-processing-space": "extracted",
"database-credentials-file-path": "",
"max-local-storage": "50GB",
"gcs-upload-chunk-size": "32MB",
"use-tpt": true,
"transfer-views": false,
"max-sessions": 0,
"spool-mode": "NoSpool",
"max-parallel-upload": 4,
"max-parallel-extract-threads": 1,
"session-charset": "UTF8",
"max-unload-file-size": "2GB"
}
}
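Several values in this file are sizes or concurrency limits that interact. The following Java sketch parses the numberKB|MB|GB|TB size format described for max-local-storage and computes the theoretical amount of data in flight to Cloud Storage, which is one gcs-upload-chunk-size chunk per upload thread. AgentSizing is hypothetical, and it assumes binary units (1 KB = 1024 bytes), which this page does not specify.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for reasoning about size settings in the agent configuration file. */
final class AgentSizing {
    // Documented format for sizes such as max-local-storage: numberKB|MB|GB|TB, for example "50GB".
    private static final Pattern SIZE = Pattern.compile("(\\d+)(KB|MB|GB|TB)");

    private AgentSizing() {}

    /** Converts a size string to bytes, assuming binary units (1 KB = 1024 bytes). */
    static long toBytes(String value) {
        Matcher m = SIZE.matcher(value.trim().toUpperCase(Locale.ROOT));
        if (!m.matches()) {
            throw new IllegalArgumentException("Expected numberKB|MB|GB|TB, got: " + value);
        }
        long number = Long.parseLong(m.group(1));
        switch (m.group(2)) {
            case "KB": return number << 10;
            case "MB": return number << 20;
            case "GB": return number << 30;
            default:   return number << 40; // TB
        }
    }

    /** Theoretical upper bound on bytes being uploaded at once: one chunk per upload thread. */
    static long maxBytesInFlight(String gcsUploadChunkSize, int maxParallelUpload) {
        return toBytes(gcsUploadChunkSize) * maxParallelUpload;
    }
}
```

With the example values above (32MB chunks and a max-parallel-upload of 4), maxBytesInFlight returns 128 MiB; with 64MB chunks and 10 threads it returns 640 MiB. When max-parallel-upload is omitted, the default thread count is what Runtime.getRuntime().availableProcessors() reports.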
Transfer job options in the migration agent configuration file
- transfer-configuration: Information about this transfer configuration in BigQuery.
- teradata-config: Information specific to this Teradata extraction:
  - connection: Information about the hostname and port.
  - local-processing-space: The extraction folder where the agent extracts table data before uploading it to Cloud Storage.
  - database-credentials-file-path: (Optional) The path to a file that contains credentials for connecting to the Teradata database automatically. The file should contain two lines for the credentials. You can use a username and password, as shown in the following example:
    username=abc
    password=123
    You can also use a secret from Secret Manager instead:
    username=abc
    secret_resource_id=projects/my-project/secrets/my-secret-name/versions/1
    When using a credentials file, take care to control access to the folder where you store it on the local file system, because it will not be encrypted. If no path is provided, you will be prompted for a username and password when you start an agent.
  - max-local-storage: The maximum amount of local storage to use for the extraction in the specified staging directory. The default value is 50GB. The supported format is numberKB|MB|GB|TB. In all extraction modes, files are deleted from your local staging directory after they are uploaded to Cloud Storage.
  - use-tpt: Directs the migration agent to use Teradata Parallel Transporter (TPT) as an extraction method. For each table, the migration agent generates a TPT script, starts a tbuild process, and waits for completion. Once the tbuild process completes, the agent lists and uploads the extracted files to Cloud Storage, and then deletes the TPT script. For more information, see Extraction method.
  - transfer-views: Directs the migration agent to also transfer data from views. Use this only when you require data customization during migration. In other cases, migrate views to BigQuery Views. This option has the following prerequisites:
    - You can only use this option with Teradata versions 16.10 and higher.
    - A view should have an integer column "partition" defined, pointing to an ID of partition for the given row in the underlying table.
  - max-sessions: Specifies the maximum number of sessions used by the export job (either FastExport or TPT). If set to 0, the Teradata database determines the maximum number of sessions for each export job.
  - gcs-upload-chunk-size: A large file is uploaded to Cloud Storage in chunks. This parameter, together with max-parallel-upload, controls how much data is uploaded to Cloud Storage at the same time. For example, if gcs-upload-chunk-size is 64 MB and max-parallel-upload is 10, then a migration agent can theoretically upload 640 MB (64 MB * 10) of data at the same time. If a chunk fails to upload, the entire chunk has to be retried, so keep the chunk size small.
  - max-parallel-upload: Determines the maximum number of threads the migration agent uses to upload files to Cloud Storage. If not specified, it defaults to the number of processors available to the Java virtual machine. As a rule of thumb, choose the value based on the number of cores in the machine that runs the agent: if you have n cores, the optimal number of threads is n, and if the cores are hyper-threaded, the optimal number is (2 * n). Also consider other factors, such as network bandwidth, when adjusting max-parallel-upload. Adjusting this parameter can improve the performance of uploading to Cloud Storage.
  - spool-mode: In most cases, the NoSpool mode is the best option. NoSpool is the default value in the agent configuration. You can change this parameter if any of the disadvantages of NoSpool apply to your case.
  - max-unload-file-size: Determines the maximum extracted file size. This parameter is not enforced for TPT extractions.
  - max-parallel-extract-threads: Used only in FastExport mode. Determines the number of parallel threads used for extracting the data from Teradata. Adjusting this parameter could improve the performance of extraction.
  - tpt-template-path: Use this configuration to provide a custom TPT extraction script as input. You can use this parameter to apply transformations to your migration data.
  - schema-mapping-rule-path: (Optional) The path to a configuration file that contains a schema mapping to override the default mapping rules. Some mapping types work only with Teradata Parallel Transporter (TPT) mode. Example: Mapping from Teradata type TIMESTAMP to BigQuery type DATETIME:
    {
      "rules": [
        {
          "database": {
            "name": "database.*",
            "tables": [
              { "name": "table.*" }
            ]
          },
          "match": { "type": "COLUMN_TYPE", "value": "TIMESTAMP" },
          "action": { "type": "MAPPING", "value": "DATETIME" }
        }
      ]
    }
    Attributes:
    - database: (Optional) name is a regular expression for databases to include. All the databases are included by default.
      - tables: (Optional) contains an array of tables. name is a regular expression for tables to include. All the tables are included by default.
    - match: (Required) type supported values: COLUMN_TYPE. value supported values: TIMESTAMP, DATETIME.
    - action: (Required) type supported values: MAPPING. value supported values: TIMESTAMP, DATETIME.
  - compress-output: (Optional) Dictates whether data should be compressed before storing on Cloud Storage. This is only applied in TPT mode. By default this value is false.
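To reason about which columns a mapping rule affects, it can help to model the rule in code. The following Java sketch evaluates a single rule shaped like the TIMESTAMP-to-DATETIME example: an optional database regex, an optional list of table regexes, a COLUMN_TYPE match, and a MAPPING action. SchemaMappingRule is hypothetical and is not how the migration agent applies rules; in particular, whole-name regex matching and falling back to a caller-supplied default type are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical in-memory model of one schema mapping rule, used to preview which
 * columns the rule would affect. Not the migration agent's implementation.
 */
final class SchemaMappingRule {
    private final Pattern database;        // database.name; null means all databases
    private final List<Pattern> tables;    // database.tables[].name; empty means all tables
    private final String matchColumnType;  // match.value, for example TIMESTAMP
    private final String mappedType;       // action.value, for example DATETIME

    SchemaMappingRule(String databaseRegex, List<String> tableRegexes,
                      String matchColumnType, String mappedType) {
        this.database = databaseRegex == null ? null : Pattern.compile(databaseRegex);
        this.tables = new ArrayList<>();
        for (String regex : tableRegexes) {
            this.tables.add(Pattern.compile(regex));
        }
        this.matchColumnType = matchColumnType;
        this.mappedType = mappedType;
    }

    /** Returns the overriding BigQuery type if the rule applies to this column, else the default type. */
    String apply(String databaseName, String tableName, String teradataType, String defaultType) {
        boolean databaseMatches = database == null || database.matcher(databaseName).matches();
        boolean tableMatches = tables.isEmpty();
        for (Pattern table : tables) {
            tableMatches |= table.matcher(tableName).matches();
        }
        if (databaseMatches && tableMatches && matchColumnType.equals(teradataType)) {
            return mappedType;
        }
        return defaultType;
    }
}
```

For the example rule, a TIMESTAMP column in table table_a of database database1 maps to DATETIME, while the same column type in a database named warehouse keeps whatever default type you pass in.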
Run the migration agent
After initializing the migration agent and creating the configuration file, use the following steps to run the agent and start the migration:
Run the agent by specifying the paths to the JDBC driver, the migration agent, and the configuration file that was created in the previous initialization step.
java -cp \
  OS-specific-separated-paths-to-jars (JDBC and agent) \
  com.google.cloud.bigquery.dms.Agent \
  --configuration-file=path to configuration file
Unix, Linux, Mac OS
java -cp \
  /usr/local/migration/Teradata/JDBC/terajdbc4.jar:mirroring-agent.jar \
  com.google.cloud.bigquery.dms.Agent \
  --configuration-file=config.json
Windows
Copy all the files into the C:\migration folder (or adjust the paths in the command), then run:
java -cp C:\migration\terajdbc4.jar;C:\migration\mirroring-agent.jar com.google.cloud.bigquery.dms.Agent --configuration-file=config.json
If you are ready to proceed with the migration, press Enter. The agent proceeds if the classpath provided during initialization is valid.
When prompted, type the username and password for the database connection. If the username and password are valid, the data migration starts.
Optional: In the command to start the migration, you can also use a flag that passes a credentials file to the agent, instead of entering the username and password each time. See the optional parameter database-credentials-file-path in the agent configuration file for more information. When using a credentials file, take appropriate steps to control access to the folder where you store it on the local file system, because it will not be encrypted.
Leave this session open until the migration is completed. If you created a recurring migration transfer, keep this session open indefinitely. If this session is interrupted, current and future transfer runs fail.
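If you use a credentials file, the following Java sketch can validate its shape before you start the agent. It checks for the two-line formats shown for database-credentials-file-path: username with password, or username with secret_resource_id in the projects/P/secrets/S/versions/V form. CredentialsFileCheck is a hypothetical helper; rejecting a file that contains both password and secret_resource_id is this sketch's own choice, not a documented agent rule.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical validator for the two-line database credentials file; not part of the agent. */
final class CredentialsFileCheck {
    private static final Pattern SECRET_VERSION =
            Pattern.compile("projects/[^/]+/secrets/[^/]+/versions/[^/]+");

    private CredentialsFileCheck() {}

    /** Parses key=value lines, splitting on the first '=' so values may themselves contain '='. */
    static Map<String, String> parse(List<String> lines) {
        Map<String, String> values = new LinkedHashMap<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue;
            }
            int eq = line.indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("expected key=value, got: " + line);
            }
            values.put(line.substring(0, eq).trim(), line.substring(eq + 1));
        }
        return values;
    }

    /** Returns null for username plus exactly one of password or secret_resource_id, else a problem. */
    static String problem(List<String> lines) {
        Map<String, String> values;
        try {
            values = parse(lines);
        } catch (IllegalArgumentException e) {
            return e.getMessage();
        }
        if (!values.containsKey("username") || values.size() != 2) {
            return "expected two lines: username and either password or secret_resource_id";
        }
        if (values.containsKey("password")) {
            return null;
        }
        String secret = values.get("secret_resource_id");
        if (secret == null) {
            return "second line must be password or secret_resource_id";
        }
        if (!SECRET_VERSION.matcher(secret.trim()).matches()) {
            return "secret_resource_id should look like projects/P/secrets/S/versions/V";
        }
        return null;
    }
}
```

You could pass it Files.readAllLines(path) for the file you plan to use. On Linux or macOS, Files.getPosixFilePermissions can also confirm that group and others cannot read the file, since its contents are not encrypted.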
Periodically check that the agent is running. If a transfer run is in progress and no agent responds within 24 hours, the transfer run fails.
If the migration agent stops working while the transfer is in progress or scheduled, the Google Cloud console shows the error status and prompts you to restart the agent. To start the migration agent again, rerun the command for running the migration agent, as described at the beginning of this section. You do not need to repeat the initialization command. The transfer resumes with the tables that were not completed.
Track the progress of the migration
You can view the status of the migration in the Google Cloud console. You can also set up Pub/Sub or email notifications. See BigQuery Data Transfer Service notifications.
The BigQuery Data Transfer Service schedules and initiates a transfer run on a schedule specified upon the creation of transfer configuration. It is important that the migration agent is running when a transfer run is active. If there are no updates from the agent side within 24 hours, a transfer run fails.
Example of migration status in the Google Cloud console:
Upgrade the migration agent
If a new version of the migration agent is available, you must manually update the migration agent. To receive notices about the BigQuery Data Transfer Service, subscribe to the release notes.
What's next
- Try a test migration of Teradata to BigQuery.
- Learn more about the BigQuery Data Transfer Service.
- Migrate SQL code with the Batch SQL translation.