The CREATE MODEL statement for remote models over Cloud AI services
This document describes the CREATE MODEL statement for creating remote models in BigQuery over Cloud AI services, such as the Cloud Natural Language API, by using SQL.
Alternatively, you can create a model by using the Google Cloud console user interface (Preview) instead of constructing the SQL statement yourself.
CREATE MODEL syntax
{CREATE MODEL | CREATE MODEL IF NOT EXISTS | CREATE OR REPLACE MODEL}
`project_id.dataset.model_name`
REMOTE WITH CONNECTION `project_id.region.connection_id`
OPTIONS(
  REMOTE_SERVICE_TYPE = remote_service_type
  [, DOCUMENT_PROCESSOR = document_processor]
  [, SPEECH_RECOGNIZER = speech_recognizer]
);
CREATE MODEL
Creates and trains a new model in the specified dataset. If the model name exists, CREATE MODEL returns an error.
CREATE MODEL IF NOT EXISTS
Creates and trains a new model only if the model doesn't exist in the specified dataset.
CREATE OR REPLACE MODEL
Creates and trains a model and replaces an existing model with the same name in the specified dataset.
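The difference between the three forms can be sketched as follows. The project, dataset, model, and connection names (myproject, mydataset, mymodel, myconnection) are placeholders chosen for illustration, not values from a real project.

```sql
-- Returns an error if mydataset.mymodel already exists.
CREATE MODEL `myproject.mydataset.mymodel`
  REMOTE WITH CONNECTION `myproject.us.myconnection`
  OPTIONS (REMOTE_SERVICE_TYPE = 'CLOUD_AI_NATURAL_LANGUAGE_V1');

-- Creates the model only if no model with that name exists in the dataset.
CREATE MODEL IF NOT EXISTS `myproject.mydataset.mymodel`
  REMOTE WITH CONNECTION `myproject.us.myconnection`
  OPTIONS (REMOTE_SERVICE_TYPE = 'CLOUD_AI_NATURAL_LANGUAGE_V1');

-- Creates the model, replacing any existing model with the same name.
CREATE OR REPLACE MODEL `myproject.mydataset.mymodel`
  REMOTE WITH CONNECTION `myproject.us.myconnection`
  OPTIONS (REMOTE_SERVICE_TYPE = 'CLOUD_AI_NATURAL_LANGUAGE_V1');
```

CREATE OR REPLACE MODEL is usually the convenient choice for scripts that are rerun, because the statement doesn't fail when the model already exists.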
model_name
The name of the model you're creating or replacing. The model name must be unique in the dataset: no other model or table can have the same name. The model name must follow the same naming rules as a BigQuery table. A model name can:
- Contain up to 1,024 characters
- Contain letters (upper or lower case), numbers, and underscores
model_name is not case-sensitive.
If you don't have a default project configured, then you must prepend the project ID to the model name in the following format, including backticks:
`[PROJECT_ID].[DATASET].[MODEL]`
For example, `myproject.mydataset.mymodel`.
REMOTE WITH CONNECTION
Syntax
`[PROJECT_ID].[LOCATION].[CONNECTION_ID]`
BigQuery uses a Cloud resource connection to interact with the Cloud AI service.
The connection elements are as follows:
- PROJECT_ID: the project ID of the project that contains the connection.
- LOCATION: the location used by the connection. The connection must be in the same location as the dataset that contains the model.
- CONNECTION_ID: the connection ID, for example, myconnection.
To find your connection ID, view the connection details in the Google Cloud console. The connection ID is the value in the last section of the fully qualified connection ID that is shown in Connection ID, for example, projects/myproject/locations/connection_location/connections/myconnection.
To use a default connection, specify DEFAULT instead of the connection string containing PROJECT_ID.LOCATION.CONNECTION_ID.
If you are using the remote model to analyze unstructured data from an object table, you must also grant the Vertex AI Service Agent role to the service account of the connection associated with the object table. You can find the object table's connection in the Google Cloud console, on the Details pane for the object table.
Example
`myproject.us.my_connection`
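As a sketch of the DEFAULT form, the following statement omits the connection string entirely. It assumes that a default connection is already configured for the project; the model name is a placeholder.

```sql
-- Uses the project's default connection instead of an explicit
-- `PROJECT_ID.LOCATION.CONNECTION_ID` string.
CREATE OR REPLACE MODEL `myproject.mydataset.mymodel`
  REMOTE WITH CONNECTION DEFAULT
  OPTIONS (REMOTE_SERVICE_TYPE = 'CLOUD_AI_TRANSLATE_V3');
```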
REMOTE_SERVICE_TYPE
Syntax
REMOTE_SERVICE_TYPE = { 'CLOUD_AI_NATURAL_LANGUAGE_V1' | 'CLOUD_AI_TRANSLATE_V3' | 'CLOUD_AI_VISION_V1' | 'CLOUD_AI_DOCUMENT_V1' | 'CLOUD_AI_SPEECH_TO_TEXT_V2' }
Description
Specifies the service to use to create the model:
- Cloud Natural Language API
- Cloud Translation API
- Cloud Vision API
- Document AI API
- Speech-to-Text API
After you create a remote model based on a Cloud AI service, you can use the model with one of the following BigQuery ML functions to analyze your BigQuery data:
- For Cloud Natural Language API models, use ML.UNDERSTAND_TEXT
- For Cloud Translation API models, use ML.TRANSLATE
- For Cloud Vision API models, use ML.ANNOTATE_IMAGE
- For Document AI API models, use ML.PROCESS_DOCUMENT (preview)
- For Speech-to-Text API models, use ML.TRANSCRIBE (preview)
Example
REMOTE_SERVICE_TYPE = 'CLOUD_AI_VISION_V1'
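To illustrate how a service type pairs with its function, here is a minimal sketch that creates a Cloud Natural Language API model and then calls ML.UNDERSTAND_TEXT on it. The names are placeholders, and the table mydataset.reviews is hypothetical and assumed to have a STRING column named text_content. The argument shape shown for ML.UNDERSTAND_TEXT is an assumption that should be checked against that function's reference page.

```sql
-- Create a remote model over the Cloud Natural Language API.
CREATE OR REPLACE MODEL `myproject.mydataset.nlp_model`
  REMOTE WITH CONNECTION `myproject.us.myconnection`
  OPTIONS (REMOTE_SERVICE_TYPE = 'CLOUD_AI_NATURAL_LANGUAGE_V1');

-- Analyze sentiment for each row of the (hypothetical) reviews table.
SELECT *
FROM ML.UNDERSTAND_TEXT(
  MODEL `myproject.mydataset.nlp_model`,
  TABLE `myproject.mydataset.reviews`,
  STRUCT('analyze_sentiment' AS nlu_option));
```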
DOCUMENT_PROCESSOR
This option identifies the document processor to use when the REMOTE_SERVICE_TYPE value is CLOUD_AI_DOCUMENT_V1. You must use this option when creating a remote model over the Document AI API. You can't use this option with any other type of remote model.
A document processor must already exist in Document AI when you specify this option to create the model in BigQuery. You can create a document processor supported by BigQuery by using one of the following options:
- Select a prebuilt processor from the Specialized section of the Processor Gallery. Supported processors have a description that starts with the word Extract. For example, Invoice Parser, Utility Parser, and W2 Parser are all supported processors. These types of processors extract predefined, domain-specific fields from the documents as output columns.
- Use the Form Parser processor or the Layout Parser processor. These processors are available in the Processor Gallery.
- Use Workbench to build a Custom Extractor processor based on a Vertex AI foundation model. You can specify the fields for extraction, and then tune the model with custom documents that contain those fields.
You can provide the DOCUMENT_PROCESSOR value in one of the following formats:
- Use the processor ID if you want to use the default processor version. You can find the processor ID by viewing the processor details page. The processor ID is in the ID field in the Basic information section. The processor ID is a unique string such as 639b990bd98a132a.
- Use the full processor name if you want to use a specific processor version. You can specify the full processor name in the following format:
projects/PROJECT_NUMBER/locations/LOCATION/processors/PROCESSOR_ID/processorVersions/PROCESSOR_VERSION
Replace the following:
- PROJECT_NUMBER: the project number of the project that contains the document processor. To find this value, look at the processor details, look at Prediction endpoint, and take the value following the projects element, for example, https://us-documentai.googleapis.com/v1/projects/project_number/locations/processor_location/processors/processor_id:process.
- LOCATION: the location used by the document processor. To find this value, look at the processor details, look at Prediction endpoint, and take the value following the locations element, for example, https://us-documentai.googleapis.com/v1/projects/project_number/locations/processor_location/processors/processor_id:process.
- PROCESSOR_ID: the document processor ID. To find this value, look at the processor details, look at Prediction endpoint, and take the value following the processors element, for example, https://us-documentai.googleapis.com/v1/projects/project_number/locations/processor_location/processors/processor_id:process.
- PROCESSOR_VERSION: the document processor version. You can find this value by looking at the processor details, selecting the Manage Versions tab, and copying the Version ID value of the version you want to use.
The CREATE MODEL statement fails if any of the following are true:
- The processor is not supported. To fix this, select a processor supported by BigQuery.
- The processor doesn't exist. To fix this, use an existing processor instead.
- The processor exists but is not enabled. To fix this, enable the processor.
- The processor exists in a different region than the model. To fix this, create the model and processor in the same region. The region of the model is determined by the dataset that you create it in.
- The service account of the connection that you use when creating the model doesn't have permission to access the document processor. To fix this, grant the Document AI Viewer IAM role to the service account of the connection.
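The two DOCUMENT_PROCESSOR formats can be sketched as follows. The processor ID is the sample value used earlier in this section. The project number (123456789012), the version ID (my_version_id), and the project, dataset, model, and connection names are hypothetical placeholders that you would replace with your own values.

```sql
-- Format 1: processor ID only, which uses the default processor version.
CREATE OR REPLACE MODEL `myproject.mydataset.invoice_parser`
  REMOTE WITH CONNECTION `myproject.us.myconnection`
  OPTIONS (
    REMOTE_SERVICE_TYPE = 'CLOUD_AI_DOCUMENT_V1',
    DOCUMENT_PROCESSOR = '639b990bd98a132a');

-- Format 2: full processor name, which pins a specific processor version.
-- The processor location (us) matches the location of the dataset and connection.
CREATE OR REPLACE MODEL `myproject.mydataset.invoice_parser_pinned`
  REMOTE WITH CONNECTION `myproject.us.myconnection`
  OPTIONS (
    REMOTE_SERVICE_TYPE = 'CLOUD_AI_DOCUMENT_V1',
    DOCUMENT_PROCESSOR = 'projects/123456789012/locations/us/processors/639b990bd98a132a/processorVersions/my_version_id');
```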
SPEECH_RECOGNIZER
This option identifies the speech recognizer to optionally use when the REMOTE_SERVICE_TYPE value is CLOUD_AI_SPEECH_TO_TEXT_V2. If you don't specify this option, a default recognizer is used. In that case, you must specify a value for the recognition_config argument of the ML.TRANSCRIBE function when you reference the remote model. The recognition_config argument value provides a configuration for the default recognizer.
You can only use the chirp transcription model in the speech recognizer or recognition_config value that you provide.
You can't use this option with any other type of remote model.
The SPEECH_RECOGNIZER value must be a string in the following format:
projects/PROJECT_NUMBER/locations/LOCATION/recognizers/RECOGNIZER_ID
Replace the following:
- PROJECT_NUMBER: the project number of the project that contains the speech recognizer. You can find this value on the Project info card in the Dashboard page of the Google Cloud console.
- LOCATION: the location used by the speech recognizer. You can find this value in the Location field on the List recognizers page of the Google Cloud console.
- RECOGNIZER_ID: the speech recognizer ID. You can find this value in the ID field on the List recognizers page of the Google Cloud console.
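The two ways of configuring a Speech-to-Text model can be sketched as follows. The project number (123456789012), the recognizer ID (myrecognizer), the object table mydataset.audio_files, and the model and connection names are hypothetical. The recognition_config JSON fields shown are an assumption about a typical Speech-to-Text configuration and should be checked against the ML.TRANSCRIBE reference.

```sql
-- Option A: reference an existing speech recognizer when creating the model.
CREATE OR REPLACE MODEL `myproject.mydataset.transcribe_with_recognizer`
  REMOTE WITH CONNECTION `myproject.us.myconnection`
  OPTIONS (
    REMOTE_SERVICE_TYPE = 'CLOUD_AI_SPEECH_TO_TEXT_V2',
    SPEECH_RECOGNIZER = 'projects/123456789012/locations/us/recognizers/myrecognizer');

-- Option B: omit SPEECH_RECOGNIZER so that a default recognizer is used.
CREATE OR REPLACE MODEL `myproject.mydataset.transcribe_default`
  REMOTE WITH CONNECTION `myproject.us.myconnection`
  OPTIONS (REMOTE_SERVICE_TYPE = 'CLOUD_AI_SPEECH_TO_TEXT_V2');

-- With option B, recognition_config must be supplied at query time.
-- The configuration uses the chirp transcription model.
SELECT *
FROM ML.TRANSCRIBE(
  MODEL `myproject.mydataset.transcribe_default`,
  TABLE `myproject.mydataset.audio_files`,
  recognition_config => (
    JSON '{"language_codes": ["en-US"], "model": "chirp", "auto_decoding_config": {}}'));
```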
Locations
For information about supported locations, see Locations for remote models.
Example
The following example creates a BigQuery ML remote model that uses the Cloud Vision API:
CREATE MODEL `myproject.mydataset.mymodel`
REMOTE WITH CONNECTION `myproject.us.test_connection`
OPTIONS(REMOTE_SERVICE_TYPE = 'CLOUD_AI_VISION_V1');
What's next
For more information about Generative AI in BigQuery ML, see Generative AI overview.