Authentication

API key auth (http_api_key)

Elasticsearch APIs use key-based authentication. You must create an API key and use the encoded value in the request header. For example:

curl -X GET "${ES_URL}/_cat/indices?v=true" \
  -H "Authorization: ApiKey ${API_KEY}"

For more information about where to find API keys for the Elasticsearch endpoint (${ES_URL}) for a project, go to Get started with Elasticsearch Serverless.
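
You can create an API key with the create API key API; the response includes an `encoded` value that you can use directly in the `Authorization` header. A minimal sketch (the key name is illustrative):

POST /_security/api_key
{
  "name": "my-api-key"
}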


Get behavioral analytics collections Deprecated Technical preview

GET /_application/analytics/{name}

Path parameters

  • name array[string] Required

    A list of analytics collections to limit the returned information.

Responses

  • 200 application/json
    • * object Additional properties
      • event_data_stream object Required
        • name string Required
Console:
GET _application/analytics/my*

Python:
resp = client.search_application.get_behavioral_analytics(
    name="my*",
)

JavaScript:
const response = await client.searchApplication.getBehavioralAnalytics({
  name: "my*",
});

Ruby:
response = client.search_application.get_behavioral_analytics(
  name: "my*"
)

PHP:
$resp = $client->searchApplication()->getBehavioralAnalytics([
    "name" => "my*",
]);

curl:
curl -X GET -H "Authorization: ApiKey $ELASTIC_API_KEY" "$ELASTICSEARCH_URL/_application/analytics/my*"
Response examples (200)
A successful response from `GET _application/analytics/my*`
{
  "my_analytics_collection": {
      "event_data_stream": {
          "name": "behavioral_analytics-events-my_analytics_collection"
      }
  },
  "my_analytics_collection2": {
      "event_data_stream": {
          "name": "behavioral_analytics-events-my_analytics_collection2"
      }
  }
}

Create a behavioral analytics collection Deprecated Technical preview

PUT /_application/analytics/{name}

Path parameters

  • name string Required

    The name of the analytics collection to be created or updated.

Responses

  • 200 application/json
    • acknowledged boolean Required

      For a successful response, this value is always true. On failure, an exception is returned instead.

    • name string Required
Console:
PUT _application/analytics/my_analytics_collection

Python:
resp = client.search_application.put_behavioral_analytics(
    name="my_analytics_collection",
)

JavaScript:
const response = await client.searchApplication.putBehavioralAnalytics({
  name: "my_analytics_collection",
});

Ruby:
response = client.search_application.put_behavioral_analytics(
  name: "my_analytics_collection"
)

PHP:
$resp = $client->searchApplication()->putBehavioralAnalytics([
    "name" => "my_analytics_collection",
]);

curl:
curl -X PUT -H "Authorization: ApiKey $ELASTIC_API_KEY" "$ELASTICSEARCH_URL/_application/analytics/my_analytics_collection"
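
A successful response acknowledges the request and echoes the collection name, per the response schema above; a sketch:

{
  "acknowledged": true,
  "name": "my_analytics_collection"
}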

Get data stream options Generally available

GET /_data_stream/{name}/_options

Get the data stream options configuration of one or more data streams.

Path parameters

  • name string | array[string] Required

    Comma-separated list of data streams to limit the request. Supports wildcards (*). To target all data streams, omit this parameter or use * or _all.

Query parameters

  • expand_wildcards string | array[string]

    Type of data stream that wildcard patterns can match. Supports comma-separated values, such as open,hidden. Valid values are: all, open, closed, hidden, none.

    Values are all, open, closed, hidden, or none.

  • master_timeout string

    Period to wait for a connection to the master node. If no response is received before the timeout expires, the request fails and returns an error.

    Values are -1 or 0.

Responses

  • 200 application/json
    • data_streams array[object] Required
      • name string Required
      • options object

        Data stream options contain the configuration of data stream level features for a given data stream, for example, the failure store configuration.

        • failure_store object

          Data stream failure store contains the configuration of the failure store for a given data stream.

          • enabled boolean

            If defined, it turns the failure store on/off (true/false) for this data stream. A data stream failure store that's disabled (enabled: false) will redirect no new failed indices to the failure store; however, it will not remove any existing data from the failure store.

          • lifecycle object

            The failure store lifecycle configures the data stream lifecycle configuration for failure indices.

            • data_retention string

              A duration. Units can be nanos, micros, ms (milliseconds), s (seconds), m (minutes), h (hours) and d (days). Also accepts "0" without a unit and "-1" to indicate an unspecified value.

            • enabled boolean

              If defined, it turns data stream lifecycle on/off (true/false) for this data stream. A data stream lifecycle that's disabled (enabled: false) will have no effect on the data stream.

curl \
 --request GET 'http://api.example.com/_data_stream/{name}/_options' \
 --header "Authorization: $API_KEY"

Delete component templates Generally available

DELETE /_component_template/{name}

Component templates are building blocks for constructing index templates that specify index mappings, settings, and aliases.

Required authorization

  • Cluster privileges: manage_index_templates

Path parameters

  • name string | array[string] Required

    Comma-separated list or wildcard expression of component template names used to limit the request.

Query parameters

  • master_timeout string

    Period to wait for a connection to the master node. If no response is received before the timeout expires, the request fails and returns an error.

    Values are -1 or 0.

  • timeout string

    Period to wait for a response. If no response is received before the timeout expires, the request fails and returns an error.

    Values are -1 or 0.

Responses

  • 200 application/json
    • acknowledged boolean Required

      For a successful response, this value is always true. On failure, an exception is returned instead.

Console:
DELETE _component_template/template_1

Python:
resp = client.cluster.delete_component_template(
    name="template_1",
)

JavaScript:
const response = await client.cluster.deleteComponentTemplate({
  name: "template_1",
});

Ruby:
response = client.cluster.delete_component_template(
  name: "template_1"
)

PHP:
$resp = $client->cluster()->deleteComponentTemplate([
    "name" => "template_1",
]);

curl:
curl -X DELETE -H "Authorization: ApiKey $ELASTIC_API_KEY" "$ELASTICSEARCH_URL/_component_template/template_1"
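
A successful response acknowledges the deletion, per the response schema above:

{
  "acknowledged": true
}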

Get an inference endpoint Generally available

GET /_inference/{inference_id}

Path parameters

  • inference_id string Required

    The inference Id

Responses

  • 200 application/json
    • endpoints array[object] Required

      Represents an inference endpoint as returned by the GET API

      • chunking_settings object

        Chunking configuration object

        • max_chunk_size number

          The maximum size of a chunk in words. This value cannot be higher than 300 or lower than 20 (for sentence strategy) or 10 (for word strategy).

        • overlap number

          The number of overlapping words for chunks. It is applicable only to a word chunking strategy. This value cannot be higher than half the max_chunk_size value.

        • sentence_overlap number

          The number of overlapping sentences for chunks. It is applicable only for a sentence chunking strategy. It can be either 1 or 0.

        • strategy string

          The chunking strategy: sentence or word.

      • service string Required

        The service type

      • service_settings object Required
      • task_settings object
      • inference_id string Required

        The inference Id

      • task_type string Required

        Values are sparse_embedding, text_embedding, rerank, completion, or chat_completion.

Console:
GET _inference/sparse_embedding/my-elser-model

Python:
resp = client.inference.get(
    task_type="sparse_embedding",
    inference_id="my-elser-model",
)

JavaScript:
const response = await client.inference.get({
  task_type: "sparse_embedding",
  inference_id: "my-elser-model",
});

Ruby:
response = client.inference.get(
  task_type: "sparse_embedding",
  inference_id: "my-elser-model"
)

PHP:
$resp = $client->inference()->get([
    "task_type" => "sparse_embedding",
    "inference_id" => "my-elser-model",
]);

curl:
curl -X GET -H "Authorization: ApiKey $ELASTIC_API_KEY" "$ELASTICSEARCH_URL/_inference/sparse_embedding/my-elser-model"
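
A sketch of what a response might look like for an ELSER endpoint, based on the response schema above (the service and service settings shown are illustrative):

{
  "endpoints": [
    {
      "inference_id": "my-elser-model",
      "task_type": "sparse_embedding",
      "service": "elser",
      "service_settings": {
        "num_allocations": 1,
        "num_threads": 1
      }
    }
  ]
}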

Create an inference endpoint Generally available

PUT /_inference/{inference_id}

IMPORTANT: The inference APIs enable you to use certain services, such as built-in machine learning models (ELSER, E5), models uploaded through Eland, Cohere, OpenAI, Mistral, Azure OpenAI, Google AI Studio, Google Vertex AI, Anthropic, Watsonx.ai, or Hugging Face. For built-in models and models uploaded through Eland, the inference APIs offer an alternative way to use and manage trained models. However, if you do not plan to use the inference APIs to use these models or if you want to use non-NLP models, use the machine learning trained model APIs.

The following integrations are available through the inference API. You can find the available task types next to the integration name:

  • AlibabaCloud AI Search (completion, rerank, sparse_embedding, text_embedding)
  • Amazon Bedrock (completion, text_embedding)
  • Anthropic (completion)
  • Azure AI Studio (completion, text_embedding)
  • Azure OpenAI (completion, text_embedding)
  • Cohere (completion, rerank, text_embedding)
  • Elasticsearch (rerank, sparse_embedding, text_embedding - this service is for built-in models and models uploaded through Eland)
  • ELSER (sparse_embedding)
  • Google AI Studio (completion, text_embedding)
  • Google Vertex AI (rerank, text_embedding)
  • Hugging Face (chat_completion, completion, rerank, text_embedding)
  • Mistral (chat_completion, completion, text_embedding)
  • OpenAI (chat_completion, completion, text_embedding)
  • VoyageAI (text_embedding, rerank)
  • Watsonx inference integration (text_embedding)
  • JinaAI (text_embedding, rerank)

Required authorization

  • Cluster privileges: manage_inference

Path parameters

  • inference_id string Required

    The inference Id

application/json

Body Required

  • chunking_settings object

    Chunking configuration object

    • max_chunk_size number

      The maximum size of a chunk in words. This value cannot be higher than 300 or lower than 20 (for sentence strategy) or 10 (for word strategy).

    • overlap number

      The number of overlapping words for chunks. It is applicable only to a word chunking strategy. This value cannot be higher than half the max_chunk_size value.

    • sentence_overlap number

      The number of overlapping sentences for chunks. It is applicable only for a sentence chunking strategy. It can be either 1 or 0.

    • strategy string

      The chunking strategy: sentence or word.

  • service string Required

    The service type

  • service_settings object Required
  • task_settings object

Responses

  • 200 application/json

    Represents an inference endpoint as returned by the GET API

    • chunking_settings object

      Chunking configuration object

      • max_chunk_size number

        The maximum size of a chunk in words. This value cannot be higher than 300 or lower than 20 (for sentence strategy) or 10 (for word strategy).

      • overlap number

        The number of overlapping words for chunks. It is applicable only to a word chunking strategy. This value cannot be higher than half the max_chunk_size value.

      • sentence_overlap number

        The number of overlapping sentences for chunks. It is applicable only for a sentence chunking strategy. It can be either 1 or 0.

      • strategy string

        The chunking strategy: sentence or word.

    • service string Required

      The service type

    • service_settings object Required
    • task_settings object
    • inference_id string Required

      The inference Id

    • task_type string Required

      Values are sparse_embedding, text_embedding, rerank, completion, or chat_completion.

Console:
PUT _inference/rerank/my-rerank-model
{
 "service": "cohere",
 "service_settings": {
   "model_id": "rerank-english-v3.0",
   "api_key": "{{COHERE_API_KEY}}"
 }
}

Python:
resp = client.inference.put(
    task_type="rerank",
    inference_id="my-rerank-model",
    inference_config={
        "service": "cohere",
        "service_settings": {
            "model_id": "rerank-english-v3.0",
            "api_key": "{{COHERE_API_KEY}}"
        }
    },
)

JavaScript:
const response = await client.inference.put({
  task_type: "rerank",
  inference_id: "my-rerank-model",
  inference_config: {
    service: "cohere",
    service_settings: {
      model_id: "rerank-english-v3.0",
      api_key: "{{COHERE_API_KEY}}",
    },
  },
});

Ruby:
response = client.inference.put(
  task_type: "rerank",
  inference_id: "my-rerank-model",
  body: {
    "service": "cohere",
    "service_settings": {
      "model_id": "rerank-english-v3.0",
      "api_key": "{{COHERE_API_KEY}}"
    }
  }
)

PHP:
$resp = $client->inference()->put([
    "task_type" => "rerank",
    "inference_id" => "my-rerank-model",
    "body" => [
        "service" => "cohere",
        "service_settings" => [
            "model_id" => "rerank-english-v3.0",
            "api_key" => "{{COHERE_API_KEY}}",
        ],
    ],
]);

curl:
curl -X PUT -H "Authorization: ApiKey $ELASTIC_API_KEY" -H "Content-Type: application/json" -d '{"service":"cohere","service_settings":{"model_id":"rerank-english-v3.0","api_key":"{{COHERE_API_KEY}}"}}' "$ELASTICSEARCH_URL/_inference/rerank/my-rerank-model"
Request example
An example body for a `PUT _inference/rerank/my-rerank-model` request.
{
 "service": "cohere",
 "service_settings": {
   "model_id": "rerank-english-v3.0",
   "api_key": "{{COHERE_API_KEY}}"
 }
}

Perform inference on the service Generally available

POST /_inference/{inference_id}

This API enables you to use machine learning models to perform specific tasks on data that you provide as input. It returns a response with the results of the tasks. The inference endpoint you use performs one specific task, defined when the endpoint was created with the create inference API.

For details about using this API with a service, such as Amazon Bedrock, Anthropic, or Hugging Face, refer to the service-specific documentation.

IMPORTANT: The inference APIs enable you to use certain services, such as built-in machine learning models (ELSER, E5), models uploaded through Eland, Cohere, OpenAI, Azure, Google AI Studio, Google Vertex AI, Anthropic, Watsonx.ai, or Hugging Face. For built-in models and models uploaded through Eland, the inference APIs offer an alternative way to use and manage trained models. However, if you do not plan to use the inference APIs to use these models or if you want to use non-NLP models, use the machine learning trained model APIs.

Required authorization

  • Cluster privileges: monitor_inference

Path parameters

  • inference_id string Required

    The unique identifier for the inference endpoint.

Query parameters

  • timeout string

    The amount of time to wait for the inference request to complete.

    Values are -1 or 0.

application/json

Body

  • query string

    The query input, which is required only for the rerank task. It is not required for other tasks.

  • input string | array[string] Required

    The text on which you want to perform the inference task. It can be a single string or an array.


    Inference endpoints for the completion task type currently only support a single string as input.

  • task_settings object

Responses

  • 200 application/json
    • text_embedding_bytes array[object]

      The text embedding result object for byte representation

      • embedding array[number] Required

        Text Embedding results containing bytes are represented as Dense Vectors of bytes.

    • text_embedding_bits array[object]

      The text embedding result object for bit representation

      • embedding array[number] Required

        Text Embedding results containing bits are represented as Dense Vectors of bits.

    • text_embedding array[object]

      The text embedding result object

      • embedding array[number] Required

        Text Embedding results are represented as Dense Vectors of floats.

    • sparse_embedding array[object]
      • embedding object Required

        Sparse Embedding tokens are represented as a dictionary of string to double.

        • * number Additional properties
    • completion array[object]

      The completion result object

      • result string Required
    • rerank array[object]

      The rerank result object, representing a single ranked document: index is the original index of the document in the request, relevance_score is the document's relevance score relative to the query, and text (optional) is the text of the document, if requested.

      • index number Required
      • relevance_score number Required
      • text string
curl \
 --request POST 'http://api.example.com/_inference/{inference_id}' \
 --header "Authorization: $API_KEY" \
 --header "Content-Type: application/json" \
 --data '{"query":"string","input":"string","task_settings":{}}'

Create an OpenAI inference endpoint Generally available

PUT /_inference/{task_type}/{openai_inference_id}

Create an inference endpoint to perform an inference task with the openai service or openai compatible APIs.

Required authorization

  • Cluster privileges: manage_inference

Path parameters

  • task_type string Required

    The type of the inference task that the model will perform. NOTE: The chat_completion task type only supports streaming and only through the _stream API.

    Values are chat_completion, completion, or text_embedding.

  • openai_inference_id string Required

    The unique identifier of the inference endpoint.

application/json

Body

  • chunking_settings object

    Chunking configuration object

    • max_chunk_size number

      The maximum size of a chunk in words. This value cannot be higher than 300 or lower than 20 (for sentence strategy) or 10 (for word strategy).

    • overlap number

      The number of overlapping words for chunks. It is applicable only to a word chunking strategy. This value cannot be higher than half the max_chunk_size value.

    • sentence_overlap number

      The number of overlapping sentences for chunks. It is applicable only for a sentence chunking strategy. It can be either 1 or 0.

    • strategy string

      The chunking strategy: sentence or word.

  • service string Required

    Value is openai.

  • service_settings object Required
    • api_key string Required

      A valid API key of your OpenAI account. You can find your OpenAI API keys in your OpenAI account under the API keys section.

      IMPORTANT: You need to provide the API key only once, during the inference model creation. The get inference endpoint API does not retrieve your API key. After creating the inference model, you cannot change the associated API key. If you want to use a different API key, delete the inference model and recreate it with the same name and the updated API key.

    • dimensions number

      The number of dimensions the resulting output embeddings should have. It is supported only in text-embedding-3 and later models. If it is not set, the OpenAI defined default for the model is used.

    • model_id string Required

      The name of the model to use for the inference task. Refer to the OpenAI documentation for the list of available text embedding models.

    • organization_id string

      The unique identifier for your organization. You can find the Organization ID in your OpenAI account under Settings > Organizations.

    • rate_limit object

      This setting helps to minimize the number of rate limit errors returned from the service.

      • requests_per_minute number

        The number of requests allowed per minute. By default, the number of requests allowed per minute is set by each service as follows:

        • alibabacloud-ai-search service: 1000
        • anthropic service: 50
        • azureaistudio service: 240
        • azureopenai service and task type text_embedding: 1440
        • azureopenai service and task type completion: 120
        • cohere service: 10000
        • elastic service and task type chat_completion: 240
        • googleaistudio service: 360
        • googlevertexai service: 30000
        • hugging_face service: 3000
        • jinaai service: 2000
        • mistral service: 240
        • openai service and task type text_embedding: 3000
        • openai service and task type completion: 500
        • voyageai service: 2000
        • watsonxai service: 120
    • url string

      The URL endpoint to use for the requests. It can be changed for testing purposes.

  • task_settings object
    • user string

      For a completion or text_embedding task, specify the user issuing the request. This information can be used for abuse detection.
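
For illustration, a hypothetical task_settings fragment that attributes requests to an end user for abuse detection (the user value is an example):

{
  "task_settings": {
    "user": "user-1234"
  }
}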

Responses

  • 200 application/json
    • chunking_settings object

      Chunking configuration object

      • max_chunk_size number

        The maximum size of a chunk in words. This value cannot be higher than 300 or lower than 20 (for sentence strategy) or 10 (for word strategy).

      • overlap number

        The number of overlapping words for chunks. It is applicable only to a word chunking strategy. This value cannot be higher than half the max_chunk_size value.

      • sentence_overlap number

        The number of overlapping sentences for chunks. It is applicable only for a sentence chunking strategy. It can be either 1 or 0.

      • strategy string

        The chunking strategy: sentence or word.

    • service string Required

      The service type

    • service_settings object Required
    • task_settings object
    • inference_id string Required

      The inference Id

    • task_type string Required

      Values are text_embedding, chat_completion, or completion.

Console:
PUT _inference/text_embedding/openai-embeddings
{
    "service": "openai",
    "service_settings": {
        "api_key": "OpenAI-API-Key",
        "model_id": "text-embedding-3-small",
        "dimensions": 128
    }
}

Python:
resp = client.inference.put(
    task_type="text_embedding",
    inference_id="openai-embeddings",
    inference_config={
        "service": "openai",
        "service_settings": {
            "api_key": "OpenAI-API-Key",
            "model_id": "text-embedding-3-small",
            "dimensions": 128
        }
    },
)

JavaScript:
const response = await client.inference.put({
  task_type: "text_embedding",
  inference_id: "openai-embeddings",
  inference_config: {
    service: "openai",
    service_settings: {
      api_key: "OpenAI-API-Key",
      model_id: "text-embedding-3-small",
      dimensions: 128,
    },
  },
});

Ruby:
response = client.inference.put(
  task_type: "text_embedding",
  inference_id: "openai-embeddings",
  body: {
    "service": "openai",
    "service_settings": {
      "api_key": "OpenAI-API-Key",
      "model_id": "text-embedding-3-small",
      "dimensions": 128
    }
  }
)

PHP:
$resp = $client->inference()->put([
    "task_type" => "text_embedding",
    "inference_id" => "openai-embeddings",
    "body" => [
        "service" => "openai",
        "service_settings" => [
            "api_key" => "OpenAI-API-Key",
            "model_id" => "text-embedding-3-small",
            "dimensions" => 128,
        ],
    ],
]);

curl:
curl -X PUT -H "Authorization: ApiKey $ELASTIC_API_KEY" -H "Content-Type: application/json" -d '{"service":"openai","service_settings":{"api_key":"OpenAI-API-Key","model_id":"text-embedding-3-small","dimensions":128}}' "$ELASTICSEARCH_URL/_inference/text_embedding/openai-embeddings"
Request examples
Run `PUT _inference/text_embedding/openai-embeddings` to create an inference endpoint that performs a `text_embedding` task. The embeddings created by requests to this endpoint will have 128 dimensions.
{
    "service": "openai",
    "service_settings": {
        "api_key": "OpenAI-API-Key",
        "model_id": "text-embedding-3-small",
        "dimensions": 128
    }
}
Run `PUT _inference/completion/openai-completion` to create an inference endpoint that performs a completion task with the openai service (the model name is illustrative).
{
    "service": "openai",
    "service_settings": {
        "api_key": "OpenAI-API-Key",
        "model_id": "gpt-3.5-turbo"
    }
}
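
The request body may also include chunking_settings; a hypothetical example that chunks input by sentence (the values are illustrative and must respect the limits described above):

{
    "service": "openai",
    "service_settings": {
        "api_key": "OpenAI-API-Key",
        "model_id": "text-embedding-3-small"
    },
    "chunking_settings": {
        "strategy": "sentence",
        "max_chunk_size": 250,
        "sentence_overlap": 1
    }
}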

Create a Watsonx inference endpoint Generally available

PUT /_inference/{task_type}/{watsonx_inference_id}

Create an inference endpoint to perform an inference task with the watsonxai service. You need an IBM Cloud Databases for Elasticsearch deployment to use the watsonxai inference service. You can provision one through the IBM catalog, the Cloud Databases CLI plug-in, the Cloud Databases API, or Terraform.

Required authorization

  • Cluster privileges: manage_inference

Path parameters

  • task_type string Required

    The type of the inference task that the model will perform.

    Values are text_embedding, chat_completion, or completion.

  • watsonx_inference_id string Required

    The unique identifier of the inference endpoint.

application/json

Body

  • service string Required

    Value is watsonxai.

  • service_settings object Required
    • api_key string Required

      A valid API key of your Watsonx account. You can find your Watsonx API keys, or create a new one, on the API keys page.

      IMPORTANT: You need to provide the API key only once, during the inference model creation. The get inference endpoint API does not retrieve your API key. After creating the inference model, you cannot change the associated API key. If you want to use a different API key, delete the inference model and recreate it with the same name and the updated API key.

    • api_version string Required

      A version parameter that takes a version date in the format YYYY-MM-DD. For the active version date parameters, refer to the Watsonx documentation.

    • model_id string Required

      The name of the model to use for the inference task. Refer to the IBM Embedding Models section in the Watsonx documentation for the list of available text embedding models. Refer to the IBM library - Foundation models in Watsonx.ai.

    • project_id string Required

      The identifier of the IBM Cloud project to use for the inference task.

    • rate_limit object

      This setting helps to minimize the number of rate limit errors returned from the service.

      • requests_per_minute number

        The number of requests allowed per minute. By default, the number of requests allowed per minute is set by each service as follows:

        • alibabacloud-ai-search service: 1000
        • anthropic service: 50
        • azureaistudio service: 240
        • azureopenai service and task type text_embedding: 1440
        • azureopenai service and task type completion: 120
        • cohere service: 10000
        • elastic service and task type chat_completion: 240
        • googleaistudio service: 360
        • googlevertexai service: 30000
        • hugging_face service: 3000
        • jinaai service: 2000
        • mistral service: 240
        • openai service and task type text_embedding: 3000
        • openai service and task type completion: 500
        • voyageai service: 2000
        • watsonxai service: 120
    • url string Required

      The URL of the inference endpoint that you created on Watsonx.

Responses

  • 200 application/json
    • chunking_settings object

      Chunking configuration object

      • max_chunk_size number

        The maximum size of a chunk in words. This value cannot be higher than 300 or lower than 20 (for sentence strategy) or 10 (for word strategy).

      • overlap number

        The number of overlapping words for chunks. It is applicable only to a word chunking strategy. This value cannot be higher than half the max_chunk_size value.

      • sentence_overlap number

        The number of overlapping sentences for chunks. It is applicable only for a sentence chunking strategy. It can be either 1 or 0.

      • strategy string

        The chunking strategy: sentence or word.

    • service string Required

      The service type

    • service_settings object Required
    • task_settings object
    • inference_id string Required

      The inference Id

    • task_type string Required

      Values are text_embedding, chat_completion, or completion.

Console:
PUT _inference/text_embedding/watsonx-embeddings
{
  "service": "watsonxai",
  "service_settings": {
      "api_key": "Watsonx-API-Key",
      "url": "Watsonx-URL",
      "model_id": "ibm/slate-30m-english-rtrvr",
      "project_id": "IBM-Cloud-ID",
      "api_version": "2024-03-14"
  }
}

Python:
resp = client.inference.put(
    task_type="text_embedding",
    inference_id="watsonx-embeddings",
    inference_config={
        "service": "watsonxai",
        "service_settings": {
            "api_key": "Watsonx-API-Key",
            "url": "Watsonx-URL",
            "model_id": "ibm/slate-30m-english-rtrvr",
            "project_id": "IBM-Cloud-ID",
            "api_version": "2024-03-14"
        }
    },
)

JavaScript:
const response = await client.inference.put({
  task_type: "text_embedding",
  inference_id: "watsonx-embeddings",
  inference_config: {
    service: "watsonxai",
    service_settings: {
      api_key: "Watsonx-API-Key",
      url: "Watsonx-URL",
      model_id: "ibm/slate-30m-english-rtrvr",
      project_id: "IBM-Cloud-ID",
      api_version: "2024-03-14",
    },
  },
});

Ruby:
response = client.inference.put(
  task_type: "text_embedding",
  inference_id: "watsonx-embeddings",
  body: {
    "service": "watsonxai",
    "service_settings": {
      "api_key": "Watsonx-API-Key",
      "url": "Watsonx-URL",
      "model_id": "ibm/slate-30m-english-rtrvr",
      "project_id": "IBM-Cloud-ID",
      "api_version": "2024-03-14"
    }
  }
)

PHP:
$resp = $client->inference()->put([
    "task_type" => "text_embedding",
    "inference_id" => "watsonx-embeddings",
    "body" => [
        "service" => "watsonxai",
        "service_settings" => [
            "api_key" => "Watsonx-API-Key",
            "url" => "Watsonx-URL",
            "model_id" => "ibm/slate-30m-english-rtrvr",
            "project_id" => "IBM-Cloud-ID",
            "api_version" => "2024-03-14",
        ],
    ],
]);

curl:
curl -X PUT -H "Authorization: ApiKey $ELASTIC_API_KEY" -H "Content-Type: application/json" -d '{"service":"watsonxai","service_settings":{"api_key":"Watsonx-API-Key","url":"Watsonx-URL","model_id":"ibm/slate-30m-english-rtrvr","project_id":"IBM-Cloud-ID","api_version":"2024-03-14"}}' "$ELASTICSEARCH_URL/_inference/text_embedding/watsonx-embeddings"
Request example
Run `PUT _inference/text_embedding/watsonx-embeddings` to create a Watsonx inference endpoint that performs a text embedding task.
{
  "service": "watsonxai",
  "service_settings": {
      "api_key": "Watsonx-API-Key",
      "url": "Watsonx-URL",
      "model_id": "ibm/slate-30m-english-rtrvr",
      "project_id": "IBM-Cloud-ID",
      "api_version": "2024-03-14"
  }
}

Perform sparse embedding inference on the service Generally available

POST /_inference/sparse_embedding/{inference_id}

Path parameters

  • inference_id string Required

    The inference Id

Query parameters

  • timeout string

    Specifies the amount of time to wait for the inference request to complete.

    Values are -1 or 0.

application/json

Body

  • input string | array[string] Required

    The text on which you want to perform the inference task. It can be a single string or an array.

  • task_settings object

Responses

  • 200 application/json
    • sparse_embedding array[object] Required
      • embedding object Required

        Sparse Embedding tokens are represented as a dictionary of string to double.

        • * number Additional properties
Console:
POST _inference/sparse_embedding/my-elser-model
{
  "input": "The sky above the port was the color of television tuned to a dead channel."
}

Python:
resp = client.inference.sparse_embedding(
    inference_id="my-elser-model",
    input="The sky above the port was the color of television tuned to a dead channel.",
)

JavaScript:
const response = await client.inference.sparseEmbedding({
  inference_id: "my-elser-model",
  input:
    "The sky above the port was the color of television tuned to a dead channel.",
});

Ruby:
response = client.inference.sparse_embedding(
  inference_id: "my-elser-model",
  body: {
    "input": "The sky above the port was the color of television tuned to a dead channel."
  }
)

PHP:
$resp = $client->inference()->sparseEmbedding([
    "inference_id" => "my-elser-model",
    "body" => [
        "input" => "The sky above the port was the color of television tuned to a dead channel.",
    ],
]);

curl:
curl -X POST -H "Authorization: ApiKey $ELASTIC_API_KEY" -H "Content-Type: application/json" -d '{"input":"The sky above the port was the color of television tuned to a dead channel."}' "$ELASTICSEARCH_URL/_inference/sparse_embedding/my-elser-model"
Request example
Run `POST _inference/sparse_embedding/my-elser-model` to perform sparse embedding on the example sentence.
{
  "input": "The sky above the port was the color of television tuned to a dead channel."
}
Response examples (200)
An abbreviated response from `POST _inference/sparse_embedding/my-elser-model`.
{
  "sparse_embedding": [
    {
      "port": 2.1259406,
      "sky": 1.7073475,
      "color": 1.6922266,
      "dead": 1.6247464,
      "television": 1.3525393,
      "above": 1.2425821,
      "tuned": 1.1440028,
      "colors": 1.1218185,
      "tv": 1.0111054,
      "ports": 1.0067928,
      "poem": 1.0042328,
      "channel": 0.99471164,
      "tune": 0.96235967,
      "scene": 0.9020516
    }
  ]
}

Perform text embedding inference on the service Generally available

POST /_inference/text_embedding/{inference_id}

Path parameters

  • inference_id string Required

    The inference Id

Query parameters

  • timeout string

    Specifies the amount of time to wait for the inference request to complete.

    Values are -1 or 0.

application/json

Body

  • input string | array[string] Required

    The text on which you want to perform the inference task. It can be a single string or an array.

  • task_settings object

Responses

  • 200 application/json
    • text_embedding_bytes array[object]

      The text embedding result object for byte representation

      • embedding array[number] Required

        Text Embedding results containing bytes are represented as Dense Vectors of bytes.

    • text_embedding_bits array[object]

      The text embedding result object for bit representation

      • embedding array[number] Required

        Text Embedding results containing bits are represented as Dense Vectors of bits.

    • text_embedding array[object]

      The text embedding result object

      • embedding array[number] Required

        Text Embedding results are represented as Dense Vectors of floats.

Console:
POST _inference/text_embedding/my-cohere-endpoint
{
  "input": "The sky above the port was the color of television tuned to a dead channel.",
  "task_settings": {
    "input_type": "ingest"
  }
}

Python:
resp = client.inference.text_embedding(
    inference_id="my-cohere-endpoint",
    input="The sky above the port was the color of television tuned to a dead channel.",
    task_settings={
        "input_type": "ingest"
    },
)

JavaScript:
const response = await client.inference.textEmbedding({
  inference_id: "my-cohere-endpoint",
  input:
    "The sky above the port was the color of television tuned to a dead channel.",
  task_settings: {
    input_type: "ingest",
  },
});

Ruby:
response = client.inference.text_embedding(
  inference_id: "my-cohere-endpoint",
  body: {
    "input": "The sky above the port was the color of television tuned to a dead channel.",
    "task_settings": {
      "input_type": "ingest"
    }
  }
)

PHP:
$resp = $client->inference()->textEmbedding([
    "inference_id" => "my-cohere-endpoint",
    "body" => [
        "input" => "The sky above the port was the color of television tuned to a dead channel.",
        "task_settings" => [
            "input_type" => "ingest",
        ],
    ],
]);

curl:
curl -X POST -H "Authorization: ApiKey $ELASTIC_API_KEY" -H "Content-Type: application/json" -d '{"input":"The sky above the port was the color of television tuned to a dead channel.","task_settings":{"input_type":"ingest"}}' "$ELASTICSEARCH_URL/_inference/text_embedding/my-cohere-endpoint"
Request example
Run `POST _inference/text_embedding/my-cohere-endpoint` to perform text embedding on the example sentence using the Cohere integration.
{
  "input": "The sky above the port was the color of television tuned to a dead channel.",
  "task_settings": {
    "input_type": "ingest"
  }
}
Response examples (200)
An abbreviated response from `POST _inference/text_embedding/my-cohere-endpoint`.
{
  "text_embedding": [
    {
      "embedding": [
        {
          0.018569946,
          -0.036895752,
          0.01486969,
          -0.0045204163,
          -0.04385376,
          0.0075950623,
          0.04260254,
          -0.004005432,
          0.007865906,
          0.030792236,
          -0.050476074,
          0.011795044,
          -0.011642456,
          -0.010070801
        }
      ]
    }
  ]
}

Render a search template Generally available

GET /_render/template

Render a search template as a search request body.

Required authorization

  • Index privileges: read
application/json

Body

  • id string
  • file string
  • params object

    Key-value pairs used to replace Mustache variables in the template. The key is the variable name. The value is the variable value.

    • * object Additional properties
  • source string | object

    An inline search template, provided either as a string or as a search definition object. It supports the same parameters as the search API's request body, as well as Mustache variables.

Responses

  • 200 application/json
    • template_output object Required
      • * object Additional properties
Console:
POST _render/template
{
  "id": "my-search-template",
  "params": {
    "query_string": "hello world",
    "from": 20,
    "size": 10
  }
}

Python:
resp = client.render_search_template(
    id="my-search-template",
    params={
        "query_string": "hello world",
        "from": 20,
        "size": 10
    },
)

JavaScript:
const response = await client.renderSearchTemplate({
  id: "my-search-template",
  params: {
    query_string: "hello world",
    from: 20,
    size: 10,
  },
});

Ruby:
response = client.render_search_template(
  body: {
    "id": "my-search-template",
    "params": {
      "query_string": "hello world",
      "from": 20,
      "size": 10
    }
  }
)

PHP:
$resp = $client->renderSearchTemplate([
    "body" => [
        "id" => "my-search-template",
        "params" => [
            "query_string" => "hello world",
            "from" => 20,
            "size" => 10,
        ],
    ],
]);

curl:
curl -X POST -H "Authorization: ApiKey $ELASTIC_API_KEY" -H "Content-Type: application/json" -d '{"id":"my-search-template","params":{"query_string":"hello world","from":20,"size":10}}' "$ELASTICSEARCH_URL/_render/template"
Request example
Run `POST _render/template` to render a stored search template with the given parameters.
{
  "id": "my-search-template",
  "params": {
    "query_string": "hello world",
    "from": 20,
    "size": 10
  }
}
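
A successful response returns the rendered search request body under `template_output`; a sketch of what it might look like, assuming a hypothetical stored template that expands `query_string` into a match query (Mustache renders substituted values as strings):

{
  "template_output": {
    "from": "20",
    "size": "10",
    "query": {
      "match": {
        "message": "hello world"
      }
    }
  }
}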