Start a trained model deployment Generally available; Added in 8.0.0

POST /_ml/trained_models/{model_id}/deployment/_start

Starting a deployment allocates the model to every machine learning node.

Required authorization

  • Cluster privileges: manage_ml

Path parameters

  • model_id string Required

    The unique identifier of the trained model. Currently, only PyTorch models are supported.

Query parameters

  • cache_size number | string

    The inference cache size (in memory outside the JVM heap) per node for the model. By default, the cache is the same size as model_size_bytes. To disable the cache, specify 0b (see the example after this list).

  • deployment_id string

    A unique identifier for the deployment of the model.

  • number_of_allocations number

    The number of model allocations on each node where the model is deployed. All allocations on a node share the same copy of the model in memory but use a separate set of threads to evaluate the model. Increasing this value generally increases throughput. If this setting is greater than the number of hardware threads, it is automatically reduced to a value below the number of hardware threads. If adaptive_allocations is enabled, do not set this value, because it is set automatically.

  • priority string

    The deployment priority.

    Values are normal or low.

  • queue_capacity number

    Specifies the number of inference requests that are allowed in the queue. When the queue is full, new requests are rejected with a 429 error.

  • threads_per_allocation number

    Sets the number of threads used by each model allocation during inference. Increasing this value generally increases inference speed. However, inference is a compute-bound process; any value greater than the number of available hardware threads on the machine does not increase inference speed. If this setting is greater than the number of hardware threads, it is automatically reduced to a value below the number of hardware threads.

  • timeout string

    Specifies the amount of time to wait for the model to deploy (for example, 1m in the request examples below).

    Special values are -1 and 0.

  • wait_for string

    Specifies the allocation status to wait for before returning.

    Values are started, starting, or fully_allocated.
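
As an illustrative sketch of how these parameters combine (the model ID my_model is a placeholder), the following request would start a low-priority deployment with two allocations, one thread per allocation, the queue capped at 1024 requests, and the inference cache disabled:

POST _ml/trained_models/my_model/deployment/_start?cache_size=0b&number_of_allocations=2&threads_per_allocation=1&queue_capacity=1024&priority=low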

Body (application/json)

  • adaptive_allocations object

    Adaptive allocations configuration. If enabled, the number of allocations is scaled automatically based on the current load; a sketch of a request body that uses it appears after this list.
    • enabled boolean Required

      If true, adaptive_allocations is enabled.

    • min_number_of_allocations number

      Specifies the minimum number of allocations to scale to. If set, it must be greater than or equal to 0. If not defined, the deployment scales to 0.

    • max_number_of_allocations number

      Specifies the maximum number of allocations to scale to. If set, it must be greater than or equal to min_number_of_allocations.
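
For illustration, a request that enables adaptive allocations and lets the deployment scale between one and four allocations might look like the following sketch; the model ID my_model and the allocation bounds are placeholder values:

POST _ml/trained_models/my_model/deployment/_start
{
  "adaptive_allocations": {
    "enabled": true,
    "min_number_of_allocations": 1,
    "max_number_of_allocations": 4
  }
}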

Responses

  • 200 application/json
    • assignment object Required
      • adaptive_allocations object | string | null
      • assignment_state string Required

        Values are started, starting, stopping, or failed.

      • max_assigned_allocations number
      • reason string
      • routing_table object Required

        The allocation state for each node.

        • * object Additional properties
          • reason string

            The reason for the current state. It is usually populated only when the routing_state is failed.

          • routing_state string Required

            Values are failed, started, starting, stopped, or stopping.

          • current_allocations number Required

            Current number of allocations.

          • target_allocations number Required

            Target number of allocations.

      • start_time string | number Required

        A date and time, either as a string whose format can depend on the context (defaulting to ISO 8601), or a number of milliseconds since the Epoch. Elasticsearch accepts both as input, but will generally output a string representation.


      • task_parameters object Required
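
The sketch below shows the shape of a successful response, assembled from the fields documented above; the node ID and timestamp are placeholders, and task_parameters is abbreviated:

{
  "assignment": {
    "task_parameters": { ... },
    "routing_table": {
      "<node_id>": {
        "routing_state": "started",
        "current_allocations": 1,
        "target_allocations": 1
      }
    },
    "assignment_state": "started",
    "start_time": "2024-05-01T10:00:00.000Z",
    "max_assigned_allocations": 1
  }
}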
Examples

The following request starts a deployment of the elastic__distilbert-base-uncased-finetuned-conll03-english model and waits up to one minute for it to reach the started state:

POST _ml/trained_models/elastic__distilbert-base-uncased-finetuned-conll03-english/deployment/_start?wait_for=started&timeout=1m

Python:

resp = client.ml.start_trained_model_deployment(
    model_id="elastic__distilbert-base-uncased-finetuned-conll03-english",
    wait_for="started",
    timeout="1m",
)

JavaScript:

const response = await client.ml.startTrainedModelDeployment({
  model_id: "elastic__distilbert-base-uncased-finetuned-conll03-english",
  wait_for: "started",
  timeout: "1m",
});

Ruby:

response = client.ml.start_trained_model_deployment(
  model_id: "elastic__distilbert-base-uncased-finetuned-conll03-english",
  wait_for: "started",
  timeout: "1m"
)

PHP:

$resp = $client->ml()->startTrainedModelDeployment([
    "model_id" => "elastic__distilbert-base-uncased-finetuned-conll03-english",
    "wait_for" => "started",
    "timeout" => "1m",
]);

curl:

curl -X POST -H "Authorization: ApiKey $ELASTIC_API_KEY" "$ELASTICSEARCH_URL/_ml/trained_models/elastic__distilbert-base-uncased-finetuned-conll03-english/deployment/_start?wait_for=started&timeout=1m"