Get CAT help Generally available

GET /_cat

Get help for the CAT APIs.

Responses

  • 200 application/json
GET /_cat
curl \
 --request GET 'http://api.example.com/_cat' \
 --header "Authorization: $API_KEY"






















































































Cancel a connector sync job Beta

PUT /_connector/_sync_job/{connector_sync_job_id}/_cancel

Cancel a connector sync job, which sets the status to cancelling and updates cancellation_requested_at to the current time. The connector service is then responsible for setting the status of connector sync jobs to cancelled.

Path parameters

  • connector_sync_job_id string Required

    The unique identifier of the connector sync job

Responses

  • 200 application/json
    Hide response attribute Show response attribute object
    • result string Required

      Values are created, updated, deleted, not_found, or noop.

PUT /_connector/_sync_job/{connector_sync_job_id}/_cancel
PUT _connector/_sync_job/my-connector-sync-job-id/_cancel
resp = client.connector.sync_job_cancel(
    connector_sync_job_id="my-connector-sync-job-id",
)
const response = await client.connector.syncJobCancel({
  connector_sync_job_id: "my-connector-sync-job-id",
});
response = client.connector.sync_job_cancel(
  connector_sync_job_id: "my-connector-sync-job-id"
)
$resp = $client->connector()->syncJobCancel([
    "connector_sync_job_id" => "my-connector-sync-job-id",
]);
curl -X PUT -H "Authorization: ApiKey $ELASTIC_API_KEY" "$ELASTICSEARCH_URL/_connector/_sync_job/my-connector-sync-job-id/_cancel"

Get a connector sync job Beta

GET /_connector/_sync_job/{connector_sync_job_id}

Path parameters

  • connector_sync_job_id string Required

    The unique identifier of the connector sync job

Responses

  • 200 application/json
    Hide response attributes Show response attributes object
    • cancelation_requested_at string | number

      A date and time, either as a string whose format can depend on the context (defaulting to ISO 8601), or a number of milliseconds since the Epoch. Elasticsearch accepts both as input, but will generally output a string representation.

      One of:
    • canceled_at string | number

      A date and time, either as a string whose format can depend on the context (defaulting to ISO 8601), or a number of milliseconds since the Epoch. Elasticsearch accepts both as input, but will generally output a string representation.

      One of:
    • completed_at string | number

      A date and time, either as a string whose format can depend on the context (defaulting to ISO 8601), or a number of milliseconds since the Epoch. Elasticsearch accepts both as input, but will generally output a string representation.

      One of:
    • connector object Required
      Hide connector attributes Show connector attributes object
      • configuration object Required
        Hide configuration attribute Show configuration attribute object
        • * object Additional properties
          Hide * attributes Show * attributes object
          • category string
          • default_value number | string | boolean | null Required

            A scalar value.

          • depends_on array[object] Required
            Hide depends_on attributes Show depends_on attributes object
            • field string Required
            • value
          • display string Required

            Values are textbox, textarea, numeric, toggle, or dropdown.

          • label string Required
          • options array[object] Required
            Hide options attributes Show options attributes object
            • label string Required
            • value
          • order number
          • placeholder string
          • required boolean Required
          • sensitive boolean Required
          • type string

            Values are str, int, list, or bool.

          • ui_restrictions array[string]
          • validations array[object]
          • value object Required
      • filtering object Required
        Hide filtering attributes Show filtering attributes object
        • advanced_snippet object Required
          Hide advanced_snippet attributes Show advanced_snippet attributes object
          • created_at string | number

            A date and time, either as a string whose format can depend on the context (defaulting to ISO 8601), or a number of milliseconds since the Epoch. Elasticsearch accepts both as input, but will generally output a string representation.

            One of:
          • updated_at string | number

            A date and time, either as a string whose format can depend on the context (defaulting to ISO 8601), or a number of milliseconds since the Epoch. Elasticsearch accepts both as input, but will generally output a string representation.

            One of:
          • value object Required
        • rules array[object] Required
          Hide rules attributes Show rules attributes object
          • created_at string
          • field string Required

            Path to field or array of paths. Some API's support wildcards in the path to select multiple fields.

          • id string Required
          • order number Required
          • policy string Required

            Values are exclude or include.

          • rule string Required

            Values are contains, ends_with, equals, regex, starts_with, >, or <.

          • updated_at string
          • value string Required
        • validation object Required
          Hide validation attributes Show validation attributes object
          • errors array[object] Required
            Hide errors attributes Show errors attributes object
            • ids array[string] Required
            • messages array[string] Required
          • state string Required

            Values are edited, invalid, or valid.

      • id string Required
      • index_name string Required
      • language string
      • pipeline object
        Hide pipeline attributes Show pipeline attributes object
        • extract_binary_content boolean Required
        • name string Required
        • reduce_whitespace boolean Required
        • run_ml_inference boolean Required
      • service_type string Required
      • sync_cursor object
    • created_at string | number Required

      A date and time, either as a string whose format can depend on the context (defaulting to ISO 8601), or a number of milliseconds since the Epoch. Elasticsearch accepts both as input, but will generally output a string representation.

      One of:
    • deleted_document_count number Required
    • error string
    • id string Required
    • indexed_document_count number Required
    • indexed_document_volume number Required
    • job_type string Required

      Values are full, incremental, or access_control.

    • last_seen string | number

      A date and time, either as a string whose format can depend on the context (defaulting to ISO 8601), or a number of milliseconds since the Epoch. Elasticsearch accepts both as input, but will generally output a string representation.

      One of:
    • metadata object Required
      Hide metadata attribute Show metadata attribute object
      • * object Additional properties
    • started_at string | number

      A date and time, either as a string whose format can depend on the context (defaulting to ISO 8601), or a number of milliseconds since the Epoch. Elasticsearch accepts both as input, but will generally output a string representation.

      One of:
    • status string Required

      Values are canceling, canceled, completed, error, in_progress, pending, or suspended.

    • total_document_count number Required
    • trigger_method string Required

      Values are on_demand or scheduled.

    • worker_hostname string
GET /_connector/_sync_job/{connector_sync_job_id}
GET _connector/_sync_job/my-connector-sync-job
resp = client.connector.sync_job_get(
    connector_sync_job_id="my-connector-sync-job",
)
const response = await client.connector.syncJobGet({
  connector_sync_job_id: "my-connector-sync-job",
});
response = client.connector.sync_job_get(
  connector_sync_job_id: "my-connector-sync-job"
)
$resp = $client->connector()->syncJobGet([
    "connector_sync_job_id" => "my-connector-sync-job",
]);
curl -X GET -H "Authorization: ApiKey $ELASTIC_API_KEY" "$ELASTICSEARCH_URL/_connector/_sync_job/my-connector-sync-job"
















Update the connector API key ID Beta

PUT /_connector/{connector_id}/_api_key_id

Update the api_key_id and api_key_secret_id fields of a connector. You can specify the ID of the API key used for authorization and the ID of the connector secret where the API key is stored. The connector secret ID is required only for Elastic managed (native) connectors. Self-managed connectors (connector clients) do not use this field.

Path parameters

  • connector_id string Required

    The unique identifier of the connector to be updated

application/json

Body Required

  • api_key_id string
  • api_key_secret_id string

Responses

  • 200 application/json
    Hide response attribute Show response attribute object
    • result string Required

      Values are created, updated, deleted, not_found, or noop.

PUT /_connector/{connector_id}/_api_key_id
PUT _connector/my-connector/_api_key_id
{
    "api_key_id": "my-api-key-id",
    "api_key_secret_id": "my-connector-secret-id"
}
resp = client.connector.update_api_key_id(
    connector_id="my-connector",
    api_key_id="my-api-key-id",
    api_key_secret_id="my-connector-secret-id",
)
const response = await client.connector.updateApiKeyId({
  connector_id: "my-connector",
  api_key_id: "my-api-key-id",
  api_key_secret_id: "my-connector-secret-id",
});
response = client.connector.update_api_key_id(
  connector_id: "my-connector",
  body: {
    "api_key_id": "my-api-key-id",
    "api_key_secret_id": "my-connector-secret-id"
  }
)
$resp = $client->connector()->updateApiKeyId([
    "connector_id" => "my-connector",
    "body" => [
        "api_key_id" => "my-api-key-id",
        "api_key_secret_id" => "my-connector-secret-id",
    ],
]);
curl -X PUT -H "Authorization: ApiKey $ELASTIC_API_KEY" -H "Content-Type: application/json" -d '{"api_key_id":"my-api-key-id","api_key_secret_id":"my-connector-secret-id"}' "$ELASTICSEARCH_URL/_connector/my-connector/_api_key_id"
Request example
{
    "api_key_id": "my-api-key-id",
    "api_key_secret_id": "my-connector-secret-id"
}
Response examples (200)
{
  "result": "updated"
}

Update the connector configuration Beta

PUT /_connector/{connector_id}/_configuration

Update the configuration field in the connector document.

Path parameters

  • connector_id string Required

    The unique identifier of the connector to be updated

application/json

Body Required

Responses

  • 200 application/json
    Hide response attribute Show response attribute object
    • result string Required

      Values are created, updated, deleted, not_found, or noop.

PUT /_connector/{connector_id}/_configuration
PUT _connector/my-spo-connector/_configuration
{
    "values": {
        "tenant_id": "my-tenant-id",
        "tenant_name": "my-sharepoint-site",
        "client_id": "foo",
        "secret_value": "bar",
        "site_collections": "*"
    }
}
resp = client.connector.update_configuration(
    connector_id="my-spo-connector",
    values={
        "tenant_id": "my-tenant-id",
        "tenant_name": "my-sharepoint-site",
        "client_id": "foo",
        "secret_value": "bar",
        "site_collections": "*"
    },
)
const response = await client.connector.updateConfiguration({
  connector_id: "my-spo-connector",
  values: {
    tenant_id: "my-tenant-id",
    tenant_name: "my-sharepoint-site",
    client_id: "foo",
    secret_value: "bar",
    site_collections: "*",
  },
});
response = client.connector.update_configuration(
  connector_id: "my-spo-connector",
  body: {
    "values": {
      "tenant_id": "my-tenant-id",
      "tenant_name": "my-sharepoint-site",
      "client_id": "foo",
      "secret_value": "bar",
      "site_collections": "*"
    }
  }
)
$resp = $client->connector()->updateConfiguration([
    "connector_id" => "my-spo-connector",
    "body" => [
        "values" => [
            "tenant_id" => "my-tenant-id",
            "tenant_name" => "my-sharepoint-site",
            "client_id" => "foo",
            "secret_value" => "bar",
            "site_collections" => "*",
        ],
    ],
]);
curl -X PUT -H "Authorization: ApiKey $ELASTIC_API_KEY" -H "Content-Type: application/json" -d '{"values":{"tenant_id":"my-tenant-id","tenant_name":"my-sharepoint-site","client_id":"foo","secret_value":"bar","site_collections":"*"}}' "$ELASTICSEARCH_URL/_connector/my-spo-connector/_configuration"
{
    "values": {
        "tenant_id": "my-tenant-id",
        "tenant_name": "my-sharepoint-site",
        "client_id": "foo",
        "secret_value": "bar",
        "site_collections": "*"
    }
}
{
    "values": {
        "secret_value": "foo-bar"
    }
}
Response examples (200)
{
  "result": "updated"
}




Update the connector filtering Beta

PUT /_connector/{connector_id}/_filtering

Update the draft filtering configuration of a connector and marks the draft validation state as edited. The filtering draft is activated once validated by the running Elastic connector service. The filtering property is used to configure sync rules (both basic and advanced) for a connector.

Path parameters

  • connector_id string Required

    The unique identifier of the connector to be updated

application/json

Body Required

  • filtering array[object]
    Hide filtering attributes Show filtering attributes object
    • active object Required
      Hide active attributes Show active attributes object
      • advanced_snippet object Required
        Hide advanced_snippet attributes Show advanced_snippet attributes object
        • created_at string | number

          A date and time, either as a string whose format can depend on the context (defaulting to ISO 8601), or a number of milliseconds since the Epoch. Elasticsearch accepts both as input, but will generally output a string representation.

          One of:
        • updated_at string | number

          A date and time, either as a string whose format can depend on the context (defaulting to ISO 8601), or a number of milliseconds since the Epoch. Elasticsearch accepts both as input, but will generally output a string representation.

          One of:
        • value object Required
      • rules array[object] Required
        Hide rules attributes Show rules attributes object
        • created_at string
        • field string Required

          Path to field or array of paths. Some API's support wildcards in the path to select multiple fields.

        • id string Required
        • order number Required
        • policy string Required

          Values are exclude or include.

        • rule string Required

          Values are contains, ends_with, equals, regex, starts_with, >, or <.

        • updated_at string
        • value string Required
      • validation object Required
        Hide validation attributes Show validation attributes object
        • errors array[object] Required
          Hide errors attributes Show errors attributes object
          • ids array[string] Required
          • messages array[string] Required
        • state string Required

          Values are edited, invalid, or valid.

    • domain string
    • draft object Required
      Hide draft attributes Show draft attributes object
      • advanced_snippet object Required
        Hide advanced_snippet attributes Show advanced_snippet attributes object
        • created_at string | number

          A date and time, either as a string whose format can depend on the context (defaulting to ISO 8601), or a number of milliseconds since the Epoch. Elasticsearch accepts both as input, but will generally output a string representation.

          One of:
        • updated_at string | number

          A date and time, either as a string whose format can depend on the context (defaulting to ISO 8601), or a number of milliseconds since the Epoch. Elasticsearch accepts both as input, but will generally output a string representation.

          One of:
        • value object Required
      • rules array[object] Required
        Hide rules attributes Show rules attributes object
        • created_at string
        • field string Required

          Path to field or array of paths. Some API's support wildcards in the path to select multiple fields.

        • id string Required
        • order number Required
        • policy string Required

          Values are exclude or include.

        • rule string Required

          Values are contains, ends_with, equals, regex, starts_with, >, or <.

        • updated_at string
        • value string Required
      • validation object Required
        Hide validation attributes Show validation attributes object
        • errors array[object] Required
          Hide errors attributes Show errors attributes object
          • ids array[string] Required
          • messages array[string] Required
        • state string Required

          Values are edited, invalid, or valid.

  • rules array[object]
    Hide rules attributes Show rules attributes object
    • created_at string | number

      A date and time, either as a string whose format can depend on the context (defaulting to ISO 8601), or a number of milliseconds since the Epoch. Elasticsearch accepts both as input, but will generally output a string representation.

      One of:
    • field string Required

      Path to field or array of paths. Some API's support wildcards in the path to select multiple fields.

    • id string Required
    • order number Required
    • policy string Required

      Values are exclude or include.

    • rule string Required

      Values are contains, ends_with, equals, regex, starts_with, >, or <.

    • updated_at string | number

      A date and time, either as a string whose format can depend on the context (defaulting to ISO 8601), or a number of milliseconds since the Epoch. Elasticsearch accepts both as input, but will generally output a string representation.

      One of:
    • value string Required
  • advanced_snippet object
    Hide advanced_snippet attributes Show advanced_snippet attributes object
    • created_at string | number

      A date and time, either as a string whose format can depend on the context (defaulting to ISO 8601), or a number of milliseconds since the Epoch. Elasticsearch accepts both as input, but will generally output a string representation.

      One of:
    • updated_at string | number

      A date and time, either as a string whose format can depend on the context (defaulting to ISO 8601), or a number of milliseconds since the Epoch. Elasticsearch accepts both as input, but will generally output a string representation.

      One of:
    • value object Required

Responses

  • 200 application/json
    Hide response attribute Show response attribute object
    • result string Required

      Values are created, updated, deleted, not_found, or noop.

PUT /_connector/{connector_id}/_filtering
PUT _connector/my-g-drive-connector/_filtering
{
    "rules": [
         {
            "field": "file_extension",
            "id": "exclude-txt-files",
            "order": 0,
            "policy": "exclude",
            "rule": "equals",
            "value": "txt"
        },
        {
            "field": "_",
            "id": "DEFAULT",
            "order": 1,
            "policy": "include",
            "rule": "regex",
            "value": ".*"
        }
    ]
}
resp = client.connector.update_filtering(
    connector_id="my-g-drive-connector",
    rules=[
        {
            "field": "file_extension",
            "id": "exclude-txt-files",
            "order": 0,
            "policy": "exclude",
            "rule": "equals",
            "value": "txt"
        },
        {
            "field": "_",
            "id": "DEFAULT",
            "order": 1,
            "policy": "include",
            "rule": "regex",
            "value": ".*"
        }
    ],
)
const response = await client.connector.updateFiltering({
  connector_id: "my-g-drive-connector",
  rules: [
    {
      field: "file_extension",
      id: "exclude-txt-files",
      order: 0,
      policy: "exclude",
      rule: "equals",
      value: "txt",
    },
    {
      field: "_",
      id: "DEFAULT",
      order: 1,
      policy: "include",
      rule: "regex",
      value: ".*",
    },
  ],
});
response = client.connector.update_filtering(
  connector_id: "my-g-drive-connector",
  body: {
    "rules": [
      {
        "field": "file_extension",
        "id": "exclude-txt-files",
        "order": 0,
        "policy": "exclude",
        "rule": "equals",
        "value": "txt"
      },
      {
        "field": "_",
        "id": "DEFAULT",
        "order": 1,
        "policy": "include",
        "rule": "regex",
        "value": ".*"
      }
    ]
  }
)
$resp = $client->connector()->updateFiltering([
    "connector_id" => "my-g-drive-connector",
    "body" => [
        "rules" => array(
            [
                "field" => "file_extension",
                "id" => "exclude-txt-files",
                "order" => 0,
                "policy" => "exclude",
                "rule" => "equals",
                "value" => "txt",
            ],
            [
                "field" => "_",
                "id" => "DEFAULT",
                "order" => 1,
                "policy" => "include",
                "rule" => "regex",
                "value" => ".*",
            ],
        ),
    ],
]);
curl -X PUT -H "Authorization: ApiKey $ELASTIC_API_KEY" -H "Content-Type: application/json" -d '{"rules":[{"field":"file_extension","id":"exclude-txt-files","order":0,"policy":"exclude","rule":"equals","value":"txt"},{"field":"_","id":"DEFAULT","order":1,"policy":"include","rule":"regex","value":".*"}]}' "$ELASTICSEARCH_URL/_connector/my-g-drive-connector/_filtering"
Request examples
{
    "rules": [
         {
            "field": "file_extension",
            "id": "exclude-txt-files",
            "order": 0,
            "policy": "exclude",
            "rule": "equals",
            "value": "txt"
        },
        {
            "field": "_",
            "id": "DEFAULT",
            "order": 1,
            "policy": "include",
            "rule": "regex",
            "value": ".*"
        }
    ]
}
{
    "advanced_snippet": {
        "value": [{
            "tables": [
                "users",
                "orders"
            ],
            "query": "SELECT users.id AS id, orders.order_id AS order_id FROM users JOIN orders ON users.id = orders.user_id"
        }]
    }
}
Response examples (200)
{
  "result": "updated"
}












Update the connector is_native flag Beta

PUT /_connector/{connector_id}/_native

Path parameters

  • connector_id string Required

    The unique identifier of the connector to be updated

application/json

Body Required

  • is_native boolean Required

Responses

  • 200 application/json
    Hide response attribute Show response attribute object
    • result string Required

      Values are created, updated, deleted, not_found, or noop.

PUT /_connector/{connector_id}/_native
curl \
 --request PUT 'http://api.example.com/_connector/{connector_id}/_native' \
 --header "Authorization: $API_KEY" \
 --header "Content-Type: application/json" \
 --data '{"is_native":true}'










































































































Delete a document Generally available

DELETE /{index}/_doc/{id}

Remove a JSON document from the specified index.

NOTE: You cannot send deletion requests directly to a data stream. To delete a document in a data stream, you must target the backing index containing the document.

Optimistic concurrency control

Delete operations can be made conditional and only be performed if the last modification to the document was assigned the sequence number and primary term specified by the if_seq_no and if_primary_term parameters. If a mismatch is detected, the operation will result in a VersionConflictException and a status code of 409.

Versioning

Each document indexed is versioned. When deleting a document, the version can be specified to make sure the relevant document you are trying to delete is actually being deleted and it has not changed in the meantime. Every write operation run on a document, deletes included, causes its version to be incremented. The version number of a deleted document remains available for a short time after deletion to allow for control of concurrent operations. The length of time for which a deleted document's version remains available is determined by the index.gc_deletes index setting.

Routing

If routing is used during indexing, the routing value also needs to be specified to delete a document.

If the _routing mapping is set to required and no routing value is specified, the delete API throws a RoutingMissingException and rejects the request.

For example:

DELETE /my-index-000001/_doc/1?routing=shard-1

This request deletes the document with ID 1, but it is routed based on the user. The document is not deleted if the correct routing is not specified.

Distributed

The delete operation gets hashed into a specific shard ID. It then gets redirected into the primary shard within that ID group and replicated (if needed) to shard replicas within that ID group.

Required authorization

  • Index privileges: delete

Path parameters

  • index string Required

    The name of the target index.

  • id string Required

    A unique identifier for the document.

Query parameters

  • if_primary_term number

    Only perform the operation if the document has this primary term.

  • if_seq_no number

    Only perform the operation if the document has this sequence number.

  • refresh string

    If true, Elasticsearch refreshes the affected shards to make this operation visible to search. If wait_for, it waits for a refresh to make this operation visible to search. If false, it does nothing with refreshes.

    Values are true, false, or wait_for.

  • routing string

    A custom value used to route operations to a specific shard.

  • timeout string

    The period to wait for active shards.

    This parameter is useful for situations where the primary shard assigned to perform the delete operation might not be available when the delete operation runs. Some reasons for this might be that the primary shard is currently recovering from a store or undergoing relocation. By default, the delete operation will wait on the primary shard to become available for up to 1 minute before failing and responding with an error.

    Values are -1 or 0.

  • version number

    An explicit version number for concurrency control. It must match the current version of the document for the request to succeed.

  • version_type string

    The version type.

    Values are internal, external, external_gte, or force.

  • wait_for_active_shards number | string

    The minimum number of shard copies that must be active before proceeding with the operation. You can set it to all or any positive integer up to the total number of shards in the index (number_of_replicas+1). The default value of 1 means it waits for each primary shard to be active.

    Values are all or index-setting.

Responses

  • 200 application/json
    Hide response attributes Show response attributes object
    • _id string Required
    • _index string Required
    • _primary_term number

      The primary term assigned to the document for the indexing operation.

    • result string Required

      Values are created, updated, deleted, not_found, or noop.

    • _seq_no number
    • _shards object Required
      Hide _shards attributes Show _shards attributes object
      • failed number Required
      • successful number Required
      • total number Required
      • failures array[object]
        Hide failures attributes Show failures attributes object
        • index string
        • node string
        • reason object Required

          Cause and details about a request failure. This class defines the properties common to all error types. Additional details are also provided, that depend on the error type.

          Hide reason attributes Show reason attributes object
          • type string Required

            The type of error

          • reason string | null

            A human-readable explanation of the error, in English.

          • stack_trace string

            The server stack trace. Present only if the error_trace=true parameter was sent with the request.

          • caused_by object

            Cause and details about a request failure. This class defines the properties common to all error types. Additional details are also provided, that depend on the error type.

          • root_cause array[object]

            Cause and details about a request failure. This class defines the properties common to all error types. Additional details are also provided, that depend on the error type.

            Cause and details about a request failure. This class defines the properties common to all error types. Additional details are also provided, that depend on the error type.

          • suppressed array[object]

            Cause and details about a request failure. This class defines the properties common to all error types. Additional details are also provided, that depend on the error type.

            Cause and details about a request failure. This class defines the properties common to all error types. Additional details are also provided, that depend on the error type.

        • shard number Required
        • status string
      • skipped number
    • _version number Required
    • forced_refresh boolean
DELETE /my-index-000001/_doc/1
resp = client.delete(
    index="my-index-000001",
    id="1",
)
const response = await client.delete({
  index: "my-index-000001",
  id: 1,
});
response = client.delete(
  index: "my-index-000001",
  id: "1"
)
$resp = $client->delete([
    "index" => "my-index-000001",
    "id" => "1",
]);
curl -X DELETE -H "Authorization: ApiKey $ELASTIC_API_KEY" "$ELASTICSEARCH_URL/my-index-000001/_doc/1"
Response examples (200)
A successful response from `DELETE /my-index-000001/_doc/1`, which deletes the JSON document 1 from the `my-index-000001` index.
{
  "_shards": {
    "total": 2,
    "failed": 0,
    "successful": 2
  },
  "_index": "my-index-000001",
  "_id": "1",
  "_version": 2,
  "_primary_term": 1,
  "_seq_no": 5,
  "result": "deleted"
}
























Get multiple documents Generally available

POST /_mget

Get multiple JSON documents by ID from one or more indices. If you specify an index in the request URI, you only need to specify the document IDs in the request body. To ensure fast responses, this multi get (mget) API responds with partial results if one or more shards fail.

Filter source fields

By default, the _source field is returned for every document (if stored). Use the _source and _source_include or source_exclude attributes to filter what fields are returned for a particular document. You can include the _source, _source_includes, and _source_excludes query parameters in the request URI to specify the defaults to use when there are no per-document instructions.

Get stored fields

Use the stored_fields attribute to specify the set of stored fields you want to retrieve. Any requested fields that are not stored are ignored. You can include the stored_fields query parameter in the request URI to specify the defaults to use when there are no per-document instructions.

Required authorization

  • Index privileges: read

Query parameters

  • preference string

    Specifies the node or shard the operation should be performed on. Random by default.

  • realtime boolean

    If true, the request is real-time as opposed to near-real-time.

  • refresh boolean

    If true, the request refreshes relevant shards before retrieving documents.

  • routing string

    Custom value used to route operations to a specific shard.

  • _source boolean | string | array[string]

    True or false to return the _source field or not, or a list of fields to return.

  • _source_excludes string | array[string]

    A comma-separated list of source fields to exclude from the response. You can also use this parameter to exclude fields from the subset specified in _source_includes query parameter.

  • _source_includes string | array[string]

    A comma-separated list of source fields to include in the response. If this parameter is specified, only these source fields are returned. You can exclude fields from this subset using the _source_excludes query parameter. If the _source parameter is false, this parameter is ignored.

  • stored_fields string | array[string]

    If true, retrieves the document fields stored in the index rather than the document _source.

application/json

Body Required

  • docs array[object]

    The documents you want to retrieve. Required if no index is specified in the request URI.

    Hide docs attributes Show docs attributes object
    • _id string Required
    • _index string
    • routing string
    • _source boolean | object

      Defines how to fetch a source. Fetching can be disabled entirely, or the source can be filtered.

      One of:
    • stored_fields string | array[string]
    • version number
    • version_type string

      Values are internal, external, external_gte, or force.

  • ids string | array[string]

Responses

  • 200 application/json
    Hide response attribute Show response attribute object
    • docs array[object] Required

      The response includes a docs array that contains the documents in the order specified in the request. The structure of the returned documents is similar to that returned by the get API. If there is a failure getting a particular document, the error is included in place of the document.

      One of:
      Hide attributes Show attributes
      • _index string Required
      • fields object

        If the stored_fields parameter is set to true and found is true, it contains the document fields stored in the index.

        Hide fields attribute Show fields attribute object
        • * object Additional properties
      • _ignored array[string]
      • found boolean Required

        Indicates whether the document exists.

      • _id string Required
      • _primary_term number

        The primary term assigned to the document for the indexing operation.

      • _routing string

        The explicit routing, if set.

      • _seq_no number
      • _source object

        If found is true, it contains the document data formatted in JSON. If the _source parameter is set to false or the stored_fields parameter is set to true, it is excluded.

      • _version number
GET /my-index-000001/_mget
{
  "docs": [
    {
      "_id": "1"
    },
    {
      "_id": "2"
    }
  ]
}
resp = client.mget(
    index="my-index-000001",
    docs=[
        {
            "_id": "1"
        },
        {
            "_id": "2"
        }
    ],
)
const response = await client.mget({
  index: "my-index-000001",
  docs: [
    {
      _id: "1",
    },
    {
      _id: "2",
    },
  ],
});
response = client.mget(
  index: "my-index-000001",
  body: {
    "docs": [
      {
        "_id": "1"
      },
      {
        "_id": "2"
      }
    ]
  }
)
$resp = $client->mget([
    "index" => "my-index-000001",
    "body" => [
        "docs" => array(
            [
                "_id" => "1",
            ],
            [
                "_id" => "2",
            ],
        ),
    ],
]);
curl -X GET -H "Authorization: ApiKey $ELASTIC_API_KEY" -H "Content-Type: application/json" -d '{"docs":[{"_id":"1"},{"_id":"2"}]}' "$ELASTICSEARCH_URL/my-index-000001/_mget"
Run `GET /my-index-000001/_mget`. When you specify an index in the request URI, only the document IDs are required in the request body.
{
  "docs": [
    {
      "_id": "1"
    },
    {
      "_id": "2"
    }
  ]
}
Run `GET /_mget`. This request sets `_source` to `false` for document 1 to exclude the source entirely. It retrieves `field3` and `field4` from document 2. It retrieves the `user` field from document 3 but filters out the `user.location` field.
{
  "docs": [
    {
      "_index": "test",
      "_id": "1",
      "_source": false
    },
    {
      "_index": "test",
      "_id": "2",
      "_source": [ "field3", "field4" ]
    },
    {
      "_index": "test",
      "_id": "3",
      "_source": {
        "include": [ "user" ],
        "exclude": [ "user.location" ]
      }
    }
  ]
}
Run `GET /_mget`. This request retrieves `field1` and `field2` from document 1 and `field3` and `field4` from document 2.
{
  "docs": [
    {
      "_index": "test",
      "_id": "1",
      "stored_fields": [ "field1", "field2" ]
    },
    {
      "_index": "test",
      "_id": "2",
      "stored_fields": [ "field3", "field4" ]
    }
  ]
}
Run `GET /_mget?routing=key1`. If routing is used during indexing, you need to specify the routing value to retrieve documents. This request fetches `test/_doc/2` from the shard corresponding to routing key `key1`. It fetches `test/_doc/1` from the shard corresponding to routing key `key2`.
{
  "docs": [
    {
      "_index": "test",
      "_id": "1",
      "routing": "key2"
    },
    {
      "_index": "test",
      "_id": "2"
    }
  ]
}




















Get multiple term vectors Generally available

POST /{index}/_mtermvectors

Get multiple term vectors with a single request. You can specify existing documents by index and ID or provide artificial documents in the body of the request. You can specify the index in the request body or request URI. The response contains a docs array with all the fetched termvectors. Each element has the structure provided by the termvectors API.

Artificial documents

You can also use mtermvectors to generate term vectors for artificial documents provided in the body of the request. The mapping used is determined by the specified _index.

Required authorization

  • Index privileges: read

Path parameters

  • index string Required

    The name of the index that contains the documents.

Query parameters

  • ids array[string]

    A comma-separated list of documents ids. You must define ids as parameter or set "ids" or "docs" in the request body

  • fields string | array[string]

    A comma-separated list or wildcard expressions of fields to include in the statistics. It is used as the default list unless a specific field list is provided in the completion_fields or fielddata_fields parameters.

  • field_statistics boolean

    If true, the response includes the document count, sum of document frequencies, and sum of total term frequencies.

  • offsets boolean

    If true, the response includes term offsets.

  • payloads boolean

    If true, the response includes term payloads.

  • positions boolean

    If true, the response includes term positions.

  • preference string

    The node or shard the operation should be performed on. It is random by default.

  • realtime boolean

    If true, the request is real-time as opposed to near-real-time.

  • routing string

    A custom value used to route operations to a specific shard.

  • term_statistics boolean

    If true, the response includes term frequency and document frequency.

  • version number

    If true, returns the document version as part of a hit.

  • version_type string

    The version type.

    Values are internal, external, external_gte, or force.

application/json

Body

  • docs array[object]

    An array of existing or artificial documents.

    Hide docs attributes Show docs attributes object
    • _id string
    • _index string
    • doc object

      An artificial document (a document not present in the index) for which you want to retrieve term vectors.

    • fields string | array[string]
    • field_statistics boolean

      If true, the response includes the document count, sum of document frequencies, and sum of total term frequencies.

    • filter object
      Hide filter attributes Show filter attributes object
      • max_doc_freq number

        Ignore words which occur in more than this many docs. Defaults to unbounded.

      • max_num_terms number

        The maximum number of terms that must be returned per field.

      • max_term_freq number

        Ignore words with more than this frequency in the source doc. It defaults to unbounded.

      • max_word_length number

        The maximum word length above which words will be ignored. Defaults to unbounded.

      • min_doc_freq number

        Ignore terms which do not occur in at least this many docs.

      • min_term_freq number

        Ignore words with less than this frequency in the source doc.

      • min_word_length number

        The minimum word length below which words will be ignored.

    • offsets boolean

      If true, the response includes term offsets.

    • payloads boolean

      If true, the response includes term payloads.

    • positions boolean

      If true, the response includes term positions.

    • routing string
    • term_statistics boolean

      If true, the response includes term frequency and document frequency.

    • version number
    • version_type string

      Values are internal, external, external_gte, or force.

  • ids array[string]

    A simplified syntax to specify documents by their ID if they're in the same index.

Responses

  • 200 application/json
    Hide response attribute Show response attribute object
    • docs array[object] Required
      Hide docs attributes Show docs attributes object
      • _id string
      • _index string Required
      • _version number
      • took number
      • found boolean
      • term_vectors object
        Hide term_vectors attribute Show term_vectors attribute object
        • * object Additional properties
          Hide * attributes Show * attributes object
          • field_statistics object
            Hide field_statistics attributes Show field_statistics attributes object
            • doc_count number Required
            • sum_doc_freq number Required
            • sum_ttf number Required
          • terms object Required
            Hide terms attribute Show terms attribute object
            • * object Additional properties
      • error object

        Cause and details about a request failure. This class defines the properties common to all error types. Additional details are also provided, that depend on the error type.

        Hide error attributes Show error attributes object
        • type string Required

          The type of error

        • reason string | null

          A human-readable explanation of the error, in English.

        • stack_trace string

          The server stack trace. Present only if the error_trace=true parameter was sent with the request.

        • caused_by object

          Cause and details about a request failure. This class defines the properties common to all error types. Additional details are also provided, that depend on the error type.

        • root_cause array[object]

          Cause and details about a request failure. This class defines the properties common to all error types. Additional details are also provided, that depend on the error type.

          Cause and details about a request failure. This class defines the properties common to all error types. Additional details are also provided, that depend on the error type.

        • suppressed array[object]

          Cause and details about a request failure. This class defines the properties common to all error types. Additional details are also provided, that depend on the error type.

          Cause and details about a request failure. This class defines the properties common to all error types. Additional details are also provided, that depend on the error type.

POST /{index}/_mtermvectors
POST /my-index-000001/_mtermvectors
{
  "docs": [
      {
        "_id": "2",
        "fields": [
            "message"
        ],
        "term_statistics": true
      },
      {
        "_id": "1"
      }
  ]
}
resp = client.mtermvectors(
    index="my-index-000001",
    docs=[
        {
            "_id": "2",
            "fields": [
                "message"
            ],
            "term_statistics": True
        },
        {
            "_id": "1"
        }
    ],
)
const response = await client.mtermvectors({
  index: "my-index-000001",
  docs: [
    {
      _id: "2",
      fields: ["message"],
      term_statistics: true,
    },
    {
      _id: "1",
    },
  ],
});
response = client.mtermvectors(
  index: "my-index-000001",
  body: {
    "docs": [
      {
        "_id": "2",
        "fields": [
          "message"
        ],
        "term_statistics": true
      },
      {
        "_id": "1"
      }
    ]
  }
)
$resp = $client->mtermvectors([
    "index" => "my-index-000001",
    "body" => [
        "docs" => array(
            [
                "_id" => "2",
                "fields" => array(
                    "message",
                ),
                "term_statistics" => true,
            ],
            [
                "_id" => "1",
            ],
        ),
    ],
]);
curl -X POST -H "Authorization: ApiKey $ELASTIC_API_KEY" -H "Content-Type: application/json" -d '{"docs":[{"_id":"2","fields":["message"],"term_statistics":true},{"_id":"1"}]}' "$ELASTICSEARCH_URL/my-index-000001/_mtermvectors"
Run `POST /my-index-000001/_mtermvectors`. When you specify an index in the request URI, the index does not need to be specified for each documents in the request body.
{
  "docs": [
      {
        "_id": "2",
        "fields": [
            "message"
        ],
        "term_statistics": true
      },
      {
        "_id": "1"
      }
  ]
}
Run `POST /my-index-000001/_mtermvectors`. If all requested documents are in same index and the parameters are the same, you can use a simplified syntax.
{
  "ids": [ "1", "2" ],
  "fields": [
    "message"
  ],
  "term_statistics": true
}
Run `POST /_mtermvectors` to generate term vectors for artificial documents provided in the body of the request. The mapping used is determined by the specified `_index`.
{
  "docs": [
      {
        "_index": "my-index-000001",
        "doc" : {
            "message" : "test test test"
        }
      },
      {
        "_index": "my-index-000001",
        "doc" : {
          "message" : "Another test ..."
        }
      }
  ]
}




Get term vector information Generally available

GET /{index}/_termvectors/{id}

Get information and statistics about terms in the fields of a particular document.

You can retrieve term vectors for documents stored in the index or for artificial documents passed in the body of the request. You can specify the fields you are interested in through the fields parameter or by adding the fields to the request body. For example:

GET /my-index-000001/_termvectors/1?fields=message

Fields can be specified using wildcards, similar to the multi match query.

Term vectors are real-time by default, not near real-time. This can be changed by setting realtime parameter to false.

You can request three types of values: term information, term statistics, and field statistics. By default, all term information and field statistics are returned for all fields but term statistics are excluded.

Term information

  • term frequency in the field (always returned)
  • term positions (positions: true)
  • start and end offsets (offsets: true)
  • term payloads (payloads: true), as base64 encoded bytes

If the requested information wasn't stored in the index, it will be computed on the fly if possible. Additionally, term vectors could be computed for documents not even existing in the index, but instead provided by the user.


Start and end offsets assume UTF-16 encoding is being used. If you want to use these offsets in order to get the original text that produced this token, you should make sure that the string you are taking a sub-string of is also encoded using UTF-16.

Behaviour

The term and field statistics are not accurate. Deleted documents are not taken into account. The information is only retrieved for the shard the requested document resides in. The term and field statistics are therefore only useful as relative measures whereas the absolute numbers have no meaning in this context. By default, when requesting term vectors of artificial documents, a shard to get the statistics from is randomly selected. Use routing only to hit a particular shard. Refer to the linked documentation for detailed examples of how to use this API.

Required authorization

  • Index privileges: read
External documentation

Path parameters

  • index string Required

    The name of the index that contains the document.

  • id string Required

    A unique identifier for the document.

Query parameters

  • fields string | array[string]

    A comma-separated list or wildcard expressions of fields to include in the statistics. It is used as the default list unless a specific field list is provided in the completion_fields or fielddata_fields parameters.

  • field_statistics boolean

    If true, the response includes:

    • The document count (how many documents contain this field).
    • The sum of document frequencies (the sum of document frequencies for all terms in this field).
    • The sum of total term frequencies (the sum of total term frequencies of each term in this field).
  • offsets boolean

    If true, the response includes term offsets.

  • payloads boolean

    If true, the response includes term payloads.

  • positions boolean

    If true, the response includes term positions.

  • preference string

    The node or shard the operation should be performed on. It is random by default.

  • realtime boolean

    If true, the request is real-time as opposed to near-real-time.

  • routing string

    A custom value that is used to route operations to a specific shard.

  • term_statistics boolean

    If true, the response includes:

    • The total term frequency (how often a term occurs in all documents).
    • The document frequency (the number of documents containing the current term).

    By default these values are not returned since term statistics can have a serious performance impact.

  • version number

    If true, returns the document version as part of a hit.

  • version_type string

    The version type.

    Values are internal, external, external_gte, or force.

application/json

Body

  • doc object

    An artificial document (a document not present in the index) for which you want to retrieve term vectors.

  • filter object
    Hide filter attributes Show filter attributes object
    • max_doc_freq number

      Ignore words which occur in more than this many docs. Defaults to unbounded.

    • max_num_terms number

      The maximum number of terms that must be returned per field.

    • max_term_freq number

      Ignore words with more than this frequency in the source doc. It defaults to unbounded.

    • max_word_length number

      The maximum word length above which words will be ignored. Defaults to unbounded.

    • min_doc_freq number

      Ignore terms which do not occur in at least this many docs.

    • min_term_freq number

      Ignore words with less than this frequency in the source doc.

    • min_word_length number

      The minimum word length below which words will be ignored.

  • per_field_analyzer object

    Override the default per-field analyzer. This is useful in order to generate term vectors in any fashion, especially when using artificial documents. When providing an analyzer for a field that already stores term vectors, the term vectors will be regenerated.

    Hide per_field_analyzer attribute Show per_field_analyzer attribute object
    • * string Additional properties
  • fields string | array[string]
  • field_statistics boolean

    If true, the response includes:

    • The document count (how many documents contain this field).
    • The sum of document frequencies (the sum of document frequencies for all terms in this field).
    • The sum of total term frequencies (the sum of total term frequencies of each term in this field).
  • offsets boolean

    If true, the response includes term offsets.

  • payloads boolean

    If true, the response includes term payloads.

  • positions boolean

    If true, the response includes term positions.

  • term_statistics boolean

    If true, the response includes:

    • The total term frequency (how often a term occurs in all documents).
    • The document frequency (the number of documents containing the current term).

    By default these values are not returned since term statistics can have a serious performance impact.

  • routing string
  • version number
  • version_type string

    Values are internal, external, external_gte, or force.

Responses

  • 200 application/json
    Hide response attributes Show response attributes object
    • found boolean Required
    • _id string
    • _index string Required
    • term_vectors object
      Hide term_vectors attribute Show term_vectors attribute object
      • * object Additional properties
        Hide * attributes Show * attributes object
        • field_statistics object
          Hide field_statistics attributes Show field_statistics attributes object
          • doc_count number Required
          • sum_doc_freq number Required
          • sum_ttf number Required
        • terms object Required
          Hide terms attribute Show terms attribute object
          • * object Additional properties
            Hide * attributes Show * attributes object
            • doc_freq number
            • score number
            • term_freq number Required
            • tokens array[object]
            • ttf number
    • took number Required
    • _version number Required
GET /{index}/_termvectors/{id}
GET /my-index-000001/_termvectors/1
{
  "fields" : ["text"],
  "offsets" : true,
  "payloads" : true,
  "positions" : true,
  "term_statistics" : true,
  "field_statistics" : true
}
resp = client.termvectors(
    index="my-index-000001",
    id="1",
    fields=[
        "text"
    ],
    offsets=True,
    payloads=True,
    positions=True,
    term_statistics=True,
    field_statistics=True,
)
const response = await client.termvectors({
  index: "my-index-000001",
  id: 1,
  fields: ["text"],
  offsets: true,
  payloads: true,
  positions: true,
  term_statistics: true,
  field_statistics: true,
});
response = client.termvectors(
  index: "my-index-000001",
  id: "1",
  body: {
    "fields": [
      "text"
    ],
    "offsets": true,
    "payloads": true,
    "positions": true,
    "term_statistics": true,
    "field_statistics": true
  }
)
$resp = $client->termvectors([
    "index" => "my-index-000001",
    "id" => "1",
    "body" => [
        "fields" => array(
            "text",
        ),
        "offsets" => true,
        "payloads" => true,
        "positions" => true,
        "term_statistics" => true,
        "field_statistics" => true,
    ],
]);
curl -X GET -H "Authorization: ApiKey $ELASTIC_API_KEY" -H "Content-Type: application/json" -d '{"fields":["text"],"offsets":true,"payloads":true,"positions":true,"term_statistics":true,"field_statistics":true}' "$ELASTICSEARCH_URL/my-index-000001/_termvectors/1"
Run `GET /my-index-000001/_termvectors/1` to return all information and statistics for field `text` in document 1.
{
  "fields" : ["text"],
  "offsets" : true,
  "payloads" : true,
  "positions" : true,
  "term_statistics" : true,
  "field_statistics" : true
}
Run `GET /my-index-000001/_termvectors/1` to set per-field analyzers. A different analyzer than the one at the field may be provided by using the `per_field_analyzer` parameter.
{
  "doc" : {
    "fullname" : "John Doe",
    "text" : "test test test"
  },
  "fields": ["fullname"],
  "per_field_analyzer" : {
    "fullname": "keyword"
  }
}
Run `GET /imdb/_termvectors` to filter the terms returned based on their tf-idf scores. It returns the three most "interesting" keywords from the artificial document having the given "plot" field value. Notice that the keyword "Tony" or any stop words are not part of the response, as their tf-idf must be too low.
{
  "doc": {
    "plot": "When wealthy industrialist Tony Stark is forced to build an armored suit after a life-threatening incident, he ultimately decides to use its technology to fight against evil."
  },
  "term_statistics": true,
  "field_statistics": true,
  "positions": false,
  "offsets": false,
  "filter": {
    "max_num_terms": 3,
    "min_term_freq": 1,
    "min_doc_freq": 1
  }
}
Run `GET /my-index-000001/_termvectors/1`. Term vectors which are not explicitly stored in the index are automatically computed on the fly. This request returns all information and statistics for the fields in document 1, even though the terms haven't been explicitly stored in the index. Note that for the field text, the terms are not regenerated.
{
  "fields" : ["text", "some_field_without_term_vectors"],
  "offsets" : true,
  "positions" : true,
  "term_statistics" : true,
  "field_statistics" : true
}
Run `GET /my-index-000001/_termvectors`. Term vectors can be generated for artificial documents, that is for documents not present in the index. If dynamic mapping is turned on (default), the document fields not in the original mapping will be dynamically created.
{
  "doc" : {
    "fullname" : "John Doe",
    "text" : "test test test"
  }
}
Response examples (200)
A successful response from `GET /my-index-000001/_termvectors/1`.
{
  "_index": "my-index-000001",
  "_id": "1",
  "_version": 1,
  "found": true,
  "took": 6,
  "term_vectors": {
    "text": {
      "field_statistics": {
        "sum_doc_freq": 4,
        "doc_count": 2,
        "sum_ttf": 6
      },
      "terms": {
        "test": {
          "doc_freq": 2,
          "ttf": 4,
          "term_freq": 3,
          "tokens": [
            {
              "position": 0,
              "start_offset": 0,
              "end_offset": 4,
              "payload": "d29yZA=="
            },
            {
              "position": 1,
              "start_offset": 5,
              "end_offset": 9,
              "payload": "d29yZA=="
            },
            {
              "position": 2,
              "start_offset": 10,
              "end_offset": 14,
              "payload": "d29yZA=="
            }
          ]
        }
      }
    }
  }
}
A successful response from `GET /my-index-000001/_termvectors` with `per_field_analyzer` in the request body.
{
  "_index": "my-index-000001",
  "_version": 0,
  "found": true,
  "took": 6,
  "term_vectors": {
    "fullname": {
      "field_statistics": {
          "sum_doc_freq": 2,
          "doc_count": 4,
          "sum_ttf": 4
      },
      "terms": {
          "John Doe": {
            "term_freq": 1,
            "tokens": [
                {
                  "position": 0,
                  "start_offset": 0,
                  "end_offset": 8
                }
            ]
          }
      }
    }
  }
}
A successful response from `GET /my-index-000001/_termvectors` with a `filter` in the request body.
{
  "_index": "imdb",
  "_version": 0,
  "found": true,
  "term_vectors": {
      "plot": {
        "field_statistics": {
            "sum_doc_freq": 3384269,
            "doc_count": 176214,
            "sum_ttf": 3753460
        },
        "terms": {
            "armored": {
              "doc_freq": 27,
              "ttf": 27,
              "term_freq": 1,
              "score": 9.74725
            },
            "industrialist": {
              "doc_freq": 88,
              "ttf": 88,
              "term_freq": 1,
              "score": 8.590818
            },
            "stark": {
              "doc_freq": 44,
              "ttf": 47,
              "term_freq": 1,
              "score": 9.272792
            }
        }
      }
  }
}








Get term vector information Generally available

POST /{index}/_termvectors

Get information and statistics about terms in the fields of a particular document.

You can retrieve term vectors for documents stored in the index or for artificial documents passed in the body of the request. You can specify the fields you are interested in through the fields parameter or by adding the fields to the request body. For example:

GET /my-index-000001/_termvectors/1?fields=message

Fields can be specified using wildcards, similar to the multi match query.

Term vectors are real-time by default, not near real-time. This can be changed by setting realtime parameter to false.

You can request three types of values: term information, term statistics, and field statistics. By default, all term information and field statistics are returned for all fields but term statistics are excluded.

Term information

  • term frequency in the field (always returned)
  • term positions (positions: true)
  • start and end offsets (offsets: true)
  • term payloads (payloads: true), as base64 encoded bytes

If the requested information wasn't stored in the index, it will be computed on the fly if possible. Additionally, term vectors could be computed for documents not even existing in the index, but instead provided by the user.


Start and end offsets assume UTF-16 encoding is being used. If you want to use these offsets in order to get the original text that produced this token, you should make sure that the string you are taking a sub-string of is also encoded using UTF-16.

Behaviour

The term and field statistics are not accurate. Deleted documents are not taken into account. The information is only retrieved for the shard the requested document resides in. The term and field statistics are therefore only useful as relative measures whereas the absolute numbers have no meaning in this context. By default, when requesting term vectors of artificial documents, a shard to get the statistics from is randomly selected. Use routing only to hit a particular shard. Refer to the linked documentation for detailed examples of how to use this API.

Required authorization

  • Index privileges: read
External documentation

Path parameters

  • index string Required

    The name of the index that contains the document.

Query parameters

  • fields string | array[string]

    A comma-separated list or wildcard expressions of fields to include in the statistics. It is used as the default list unless a specific field list is provided in the completion_fields or fielddata_fields parameters.

  • field_statistics boolean

    If true, the response includes:

    • The document count (how many documents contain this field).
    • The sum of document frequencies (the sum of document frequencies for all terms in this field).
    • The sum of total term frequencies (the sum of total term frequencies of each term in this field).
  • offsets boolean

    If true, the response includes term offsets.

  • payloads boolean

    If true, the response includes term payloads.

  • positions boolean

    If true, the response includes term positions.

  • preference string

    The node or shard the operation should be performed on. It is random by default.

  • realtime boolean

    If true, the request is real-time as opposed to near-real-time.

  • routing string

    A custom value that is used to route operations to a specific shard.

  • term_statistics boolean

    If true, the response includes:

    • The total term frequency (how often a term occurs in all documents).
    • The document frequency (the number of documents containing the current term).

    By default these values are not returned since term statistics can have a serious performance impact.

  • version number

    If true, returns the document version as part of a hit.

  • version_type string

    The version type.

    Values are internal, external, external_gte, or force.

application/json

Body

  • doc object

    An artificial document (a document not present in the index) for which you want to retrieve term vectors.

  • filter object
    Hide filter attributes Show filter attributes object
    • max_doc_freq number

      Ignore words which occur in more than this many docs. Defaults to unbounded.

    • max_num_terms number

      The maximum number of terms that must be returned per field.

    • max_term_freq number

      Ignore words with more than this frequency in the source doc. It defaults to unbounded.

    • max_word_length number

      The maximum word length above which words will be ignored. Defaults to unbounded.

    • min_doc_freq number

      Ignore terms which do not occur in at least this many docs.

    • min_term_freq number

      Ignore words with less than this frequency in the source doc.

    • min_word_length number

      The minimum word length below which words will be ignored.

  • per_field_analyzer object

    Override the default per-field analyzer. This is useful in order to generate term vectors in any fashion, especially when using artificial documents. When providing an analyzer for a field that already stores term vectors, the term vectors will be regenerated.

    Hide per_field_analyzer attribute Show per_field_analyzer attribute object
    • * string Additional properties
  • fields string | array[string]
  • field_statistics boolean

    If true, the response includes:

    • The document count (how many documents contain this field).
    • The sum of document frequencies (the sum of document frequencies for all terms in this field).
    • The sum of total term frequencies (the sum of total term frequencies of each term in this field).
  • offsets boolean

    If true, the response includes term offsets.

  • payloads boolean

    If true, the response includes term payloads.

  • positions boolean

    If true, the response includes term positions.

  • term_statistics boolean

    If true, the response includes:

    • The total term frequency (how often a term occurs in all documents).
    • The document frequency (the number of documents containing the current term).

    By default these values are not returned since term statistics can have a serious performance impact.

  • routing string
  • version number
  • version_type string

    Values are internal, external, external_gte, or force.

Responses

  • 200 application/json
    Hide response attributes Show response attributes object
    • found boolean Required
    • _id string
    • _index string Required
    • term_vectors object
      Hide term_vectors attribute Show term_vectors attribute object
      • * object Additional properties
        Hide * attributes Show * attributes object
        • field_statistics object
          Hide field_statistics attributes Show field_statistics attributes object
          • doc_count number Required
          • sum_doc_freq number Required
          • sum_ttf number Required
        • terms object Required
          Hide terms attribute Show terms attribute object
          • * object Additional properties
            Hide * attributes Show * attributes object
            • doc_freq number
            • score number
            • term_freq number Required
            • tokens array[object]
            • ttf number
    • took number Required
    • _version number Required
GET /my-index-000001/_termvectors/1
{
  "fields" : ["text"],
  "offsets" : true,
  "payloads" : true,
  "positions" : true,
  "term_statistics" : true,
  "field_statistics" : true
}
resp = client.termvectors(
    index="my-index-000001",
    id="1",
    fields=[
        "text"
    ],
    offsets=True,
    payloads=True,
    positions=True,
    term_statistics=True,
    field_statistics=True,
)
const response = await client.termvectors({
  index: "my-index-000001",
  id: 1,
  fields: ["text"],
  offsets: true,
  payloads: true,
  positions: true,
  term_statistics: true,
  field_statistics: true,
});
response = client.termvectors(
  index: "my-index-000001",
  id: "1",
  body: {
    "fields": [
      "text"
    ],
    "offsets": true,
    "payloads": true,
    "positions": true,
    "term_statistics": true,
    "field_statistics": true
  }
)
$resp = $client->termvectors([
    "index" => "my-index-000001",
    "id" => "1",
    "body" => [
        "fields" => array(
            "text",
        ),
        "offsets" => true,
        "payloads" => true,
        "positions" => true,
        "term_statistics" => true,
        "field_statistics" => true,
    ],
]);
curl -X GET -H "Authorization: ApiKey $ELASTIC_API_KEY" -H "Content-Type: application/json" -d '{"fields":["text"],"offsets":true,"payloads":true,"positions":true,"term_statistics":true,"field_statistics":true}' "$ELASTICSEARCH_URL/my-index-000001/_termvectors/1"
Run `GET /my-index-000001/_termvectors/1` to return all information and statistics for field `text` in document 1.
{
  "fields" : ["text"],
  "offsets" : true,
  "payloads" : true,
  "positions" : true,
  "term_statistics" : true,
  "field_statistics" : true
}
Run `GET /my-index-000001/_termvectors/1` to set per-field analyzers. A different analyzer than the one at the field may be provided by using the `per_field_analyzer` parameter.
{
  "doc" : {
    "fullname" : "John Doe",
    "text" : "test test test"
  },
  "fields": ["fullname"],
  "per_field_analyzer" : {
    "fullname": "keyword"
  }
}
Run `GET /imdb/_termvectors` to filter the terms returned based on their tf-idf scores. It returns the three most "interesting" keywords from the artificial document having the given "plot" field value. Notice that the keyword "Tony" or any stop words are not part of the response, as their tf-idf must be too low.
{
  "doc": {
    "plot": "When wealthy industrialist Tony Stark is forced to build an armored suit after a life-threatening incident, he ultimately decides to use its technology to fight against evil."
  },
  "term_statistics": true,
  "field_statistics": true,
  "positions": false,
  "offsets": false,
  "filter": {
    "max_num_terms": 3,
    "min_term_freq": 1,
    "min_doc_freq": 1
  }
}
Run `GET /my-index-000001/_termvectors/1`. Term vectors which are not explicitly stored in the index are automatically computed on the fly. This request returns all information and statistics for the fields in document 1, even though the terms haven't been explicitly stored in the index. Note that for the field text, the terms are not regenerated.
{
  "fields" : ["text", "some_field_without_term_vectors"],
  "offsets" : true,
  "positions" : true,
  "term_statistics" : true,
  "field_statistics" : true
}
Run `GET /my-index-000001/_termvectors`. Term vectors can be generated for artificial documents, that is for documents not present in the index. If dynamic mapping is turned on (default), the document fields not in the original mapping will be dynamically created.
{
  "doc" : {
    "fullname" : "John Doe",
    "text" : "test test test"
  }
}
Response examples (200)
A successful response from `GET /my-index-000001/_termvectors/1`.
{
  "_index": "my-index-000001",
  "_id": "1",
  "_version": 1,
  "found": true,
  "took": 6,
  "term_vectors": {
    "text": {
      "field_statistics": {
        "sum_doc_freq": 4,
        "doc_count": 2,
        "sum_ttf": 6
      },
      "terms": {
        "test": {
          "doc_freq": 2,
          "ttf": 4,
          "term_freq": 3,
          "tokens": [
            {
              "position": 0,
              "start_offset": 0,
              "end_offset": 4,
              "payload": "d29yZA=="
            },
            {
              "position": 1,
              "start_offset": 5,
              "end_offset": 9,
              "payload": "d29yZA=="
            },
            {
              "position": 2,
              "start_offset": 10,
              "end_offset": 14,
              "payload": "d29yZA=="
            }
          ]
        }
      }
    }
  }
}
A successful response from `GET /my-index-000001/_termvectors` with `per_field_analyzer` in the request body.
{
  "_index": "my-index-000001",
  "_version": 0,
  "found": true,
  "took": 6,
  "term_vectors": {
    "fullname": {
      "field_statistics": {
          "sum_doc_freq": 2,
          "doc_count": 4,
          "sum_ttf": 4
      },
      "terms": {
          "John Doe": {
            "term_freq": 1,
            "tokens": [
                {
                  "position": 0,
                  "start_offset": 0,
                  "end_offset": 8
                }
            ]
          }
      }
    }
  }
}
A successful response from `GET /my-index-000001/_termvectors` with a `filter` in the request body.
{
  "_index": "imdb",
  "_version": 0,
  "found": true,
  "term_vectors": {
      "plot": {
        "field_statistics": {
            "sum_doc_freq": 3384269,
            "doc_count": 176214,
            "sum_ttf": 3753460
        },
        "terms": {
            "armored": {
              "doc_freq": 27,
              "ttf": 27,
              "term_freq": 1,
              "score": 9.74725
            },
            "industrialist": {
              "doc_freq": 88,
              "ttf": 88,
              "term_freq": 1,
              "score": 8.590818
            },
            "stark": {
              "doc_freq": 44,
              "ttf": 47,
              "term_freq": 1,
              "score": 9.272792
            }
        }
      }
  }
}














































Get EQL search results Generally available

POST /{index}/_eql/search

Returns search results for an Event Query Language (EQL) query. EQL assumes each document in a data stream or index corresponds to an event.

External documentation

Path parameters

  • index string | array[string] Required

    The name of the index to scope the operation

Query parameters

  • allow_no_indices boolean
  • allow_partial_search_results boolean

    If true, returns partial results if there are shard failures. If false, returns an error with no partial results.

  • allow_partial_sequence_results boolean

    If true, sequence queries will return partial results in case of shard failures. If false, they will return no results at all. This flag has effect only if allow_partial_search_results is true.

  • expand_wildcards string | array[string]

    Values are all, open, closed, hidden, or none.

  • ignore_unavailable boolean

    If true, missing or closed indices are not included in the response.

  • keep_alive string

    Period for which the search and its results are stored on the cluster.

    Values are -1 or 0.

  • keep_on_completion boolean

    If true, the search and its results are stored on the cluster.

  • wait_for_completion_timeout string

    Timeout duration to wait for the request to finish. Defaults to no timeout, meaning the request waits for complete search results.

    Values are -1 or 0.

application/json

Body Required

  • query string Required

    EQL query you wish to run.

  • case_sensitive boolean
  • event_category_field string

    Path to field or array of paths. Some API's support wildcards in the path to select multiple fields.

  • tiebreaker_field string

    Path to field or array of paths. Some API's support wildcards in the path to select multiple fields.

  • timestamp_field string

    Path to field or array of paths. Some API's support wildcards in the path to select multiple fields.

  • fetch_size number
  • filter object | array[object]

    Query, written in Query DSL, used to filter the events on which the EQL query runs.

    One of:

    An Elasticsearch Query DSL (Domain Specific Language) object that defines a query.

    External documentation
  • keep_alive string

    A duration. Units can be nanos, micros, ms (milliseconds), s (seconds), m (minutes), h (hours) and d (days). Also accepts "0" without a unit and "-1" to indicate an unspecified value.

  • keep_on_completion boolean
  • wait_for_completion_timeout string

    A duration. Units can be nanos, micros, ms (milliseconds), s (seconds), m (minutes), h (hours) and d (days). Also accepts "0" without a unit and "-1" to indicate an unspecified value.

  • allow_partial_search_results boolean

    Allow query execution also in case of shard failures. If true, the query will keep running and will return results based on the available shards. For sequences, the behavior can be further refined using allow_partial_sequence_results

  • allow_partial_sequence_results boolean

    This flag applies only to sequences and has effect only if allow_partial_search_results=true. If true, the sequence query will return results based on the available shards, ignoring the others. If false, the sequence query will return successfully, but will always have empty results.

  • size number
  • fields object | array[object]

    Array of wildcard (*) patterns. The response returns values for field names matching these patterns in the fields property of each hit.

    One of:

    A reference to a field with formatting instructions on how to return the value

    Hide attributes Show attributes
    • field string Required

      Path to field or array of paths. Some API's support wildcards in the path to select multiple fields.

    • format string

      The format in which the values are returned.

    • include_unmapped boolean
  • result_position string

    Values are tail or head.

  • runtime_mappings object
    Hide runtime_mappings attribute Show runtime_mappings attribute object
    • * object Additional properties
      Hide * attributes Show * attributes object
      • fields object

        For type composite

        Hide fields attribute Show fields attribute object
        • * object Additional properties
          Hide * attribute Show * attribute object
          • type string Required

            Values are boolean, composite, date, double, geo_point, geo_shape, ip, keyword, long, or lookup.

      • fetch_fields array[object]

        For type lookup

        Hide fetch_fields attributes Show fetch_fields attributes object
        • field string Required

          Path to field or array of paths. Some API's support wildcards in the path to select multiple fields.

        • format string
      • format string

        A custom format for date type runtime fields.

      • input_field string

        Path to field or array of paths. Some API's support wildcards in the path to select multiple fields.

      • target_field string

        Path to field or array of paths. Some API's support wildcards in the path to select multiple fields.

      • target_index string
      • script object
        Hide script attributes Show script attributes object
        • source string | object

          One of:
        • id string
        • params object

          Specifies any named parameters that are passed into the script as variables. Use parameters instead of hard-coded values to decrease compile time.

          Hide params attribute Show params attribute object
          • * object Additional properties
        • lang string

          Any of:

          Values are painless, expression, mustache, or java.

        • options object
          Hide options attribute Show options attribute object
          • * string Additional properties
      • type string Required

        Values are boolean, composite, date, double, geo_point, geo_shape, ip, keyword, long, or lookup.

  • max_samples_per_key number

    By default, the response of a sample query contains up to 10 samples, with one sample per unique set of join keys. Use the size parameter to get a smaller or larger set of samples. To retrieve more than one sample per set of join keys, use the max_samples_per_key parameter. Pipes are not supported for sample queries.

Responses

  • 200 application/json
    Hide response attributes Show response attributes object
    • id string
    • is_partial boolean

      If true, the response does not contain complete search results.

    • is_running boolean

      If true, the search request is still executing.

    • took number

      Time unit for milliseconds

    • timed_out boolean

      If true, the request timed out before completion.

    • hits object Required
      Hide hits attributes Show hits attributes object
      • total object
        Hide total attributes Show total attributes object
        • relation string Required

          Values are eq or gte.

        • value number Required
      • events array[object]

        Contains events matching the query. Each object represents a matching event.

        Hide events attributes Show events attributes object
        • _index string Required
        • _id string Required
        • _source object Required

          Original JSON body passed for the event at index time.

        • missing boolean

          Set to true for events in a timespan-constrained sequence that do not meet a given condition.

        • fields object
          Hide fields attribute Show fields attribute object
          • * array[object] Additional properties
      • sequences array[object]

        Contains event sequences matching the query. Each object represents a matching sequence. This parameter is only returned for EQL queries containing a sequence.

        Hide sequences attributes Show sequences attributes object
        • events array[object] Required

          Contains events matching the query. Each object represents a matching event.

          Hide events attributes Show events attributes object
          • _index string Required
          • _id string Required
          • _source object Required

            Original JSON body passed for the event at index time.

          • missing boolean

            Set to true for events in a timespan-constrained sequence that do not meet a given condition.

          • fields object
        • join_keys array[object]

          Shared field values used to constrain matches in the sequence. These are defined using the by keyword in the EQL query syntax.

    • shard_failures array[object]

      Contains information about shard failures (if any), in case allow_partial_search_results=true

      Hide shard_failures attributes Show shard_failures attributes object
      • index string
      • node string
      • reason object Required

        Cause and details about a request failure. This class defines the properties common to all error types. Additional details are also provided, that depend on the error type.

        Hide reason attributes Show reason attributes object
        • type string Required

          The type of error

        • reason string | null

          A human-readable explanation of the error, in English.

        • stack_trace string

          The server stack trace. Present only if the error_trace=true parameter was sent with the request.

        • caused_by object

          Cause and details about a request failure. This class defines the properties common to all error types. Additional details are also provided, that depend on the error type.

        • root_cause array[object]

          Cause and details about a request failure. This class defines the properties common to all error types. Additional details are also provided, that depend on the error type.

          Cause and details about a request failure. This class defines the properties common to all error types. Additional details are also provided, that depend on the error type.

        • suppressed array[object]

          Cause and details about a request failure. This class defines the properties common to all error types. Additional details are also provided, that depend on the error type.

          Cause and details about a request failure. This class defines the properties common to all error types. Additional details are also provided, that depend on the error type.

      • shard number Required
      • status string
GET /my-data-stream/_eql/search
{
  "query": """
    process where (process.name == "cmd.exe" and process.pid != 2013)
  """
}
resp = client.eql.search(
    index="my-data-stream",
    query="\n    process where (process.name == \"cmd.exe\" and process.pid != 2013)\n  ",
)
const response = await client.eql.search({
  index: "my-data-stream",
  query:
    '\n    process where (process.name == "cmd.exe" and process.pid != 2013)\n  ',
});
response = client.eql.search(
  index: "my-data-stream",
  body: {
    "query": "\n    process where (process.name == \"cmd.exe\" and process.pid != 2013)\n  "
  }
)
$resp = $client->eql()->search([
    "index" => "my-data-stream",
    "body" => [
        "query" => "\n    process where (process.name == \"cmd.exe\" and process.pid != 2013)\n  ",
    ],
]);
curl -X GET -H "Authorization: ApiKey $ELASTIC_API_KEY" -H "Content-Type: application/json" -d '{"query":"\n    process where (process.name == \"cmd.exe\" and process.pid != 2013)\n  "}' "$ELASTICSEARCH_URL/my-data-stream/_eql/search"
Request examples
Run `GET /my-data-stream/_eql/search` to search for events that have a `process.name` of `cmd.exe` and a `process.pid` other than `2013`.
{
  "query": """
    process where (process.name == "cmd.exe" and process.pid != 2013)
  """
}
Run `GET /my-data-stream/_eql/search` to search for a sequence of events. The sequence starts with an event with an `event.category` of `file`, a `file.name` of `cmd.exe`, and a `process.pid` other than `2013`. It is followed by an event with an `event.category` of `process` and a `process.executable` that contains the substring `regsvr32`. These events must also share the same `process.pid` value.
{
  "query": """
    sequence by process.pid
      [ file where file.name == "cmd.exe" and process.pid != 2013 ]
      [ process where stringContains(process.executable, "regsvr32") ]
  """
}
Response examples (200)
{
  "is_partial": false,
  "is_running": false,
  "took": 6,
  "timed_out": false,
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "sequences": [
      {
        "join_keys": [
          2012
        ],
        "events": [
          {
            "_index": ".ds-my-data-stream-2099.12.07-000001",
            "_id": "AtOJ4UjUBAAx3XR5kcCM",
            "_source": {
              "@timestamp": "2099-12-06T11:04:07.000Z",
              "event": {
                "category": "file",
                "id": "dGCHwoeS",
                "sequence": 2
              },
              "file": {
                "accessed": "2099-12-07T11:07:08.000Z",
                "name": "cmd.exe",
                "path": "C:\\Windows\\System32\\cmd.exe",
                "type": "file",
                "size": 16384
              },
              "process": {
                "pid": 2012,
                "name": "cmd.exe",
                "executable": "C:\\Windows\\System32\\cmd.exe"
              }
            }
          },
          {
            "_index": ".ds-my-data-stream-2099.12.07-000001",
            "_id": "OQmfCaduce8zoHT93o4H",
            "_source": {
              "@timestamp": "2099-12-07T11:07:09.000Z",
              "event": {
                "category": "process",
                "id": "aR3NWVOs",
                "sequence": 4
              },
              "process": {
                "pid": 2012,
                "name": "regsvr32.exe",
                "command_line": "regsvr32.exe  /s /u /i:https://...RegSvr32.sct scrobj.dll",
                "executable": "C:\\Windows\\System32\\regsvr32.exe"
              }
            }
          }
        ]
      }
    ]
  }
}

Get a specific running ES|QL query information Technical preview

GET /_query/queries/{id}

Returns an object extended information about a running ES|QL query.

Required authorization

  • Cluster privileges: monitor_esql

Path parameters

  • id string Required

    The query ID

Responses

  • 200 application/json
    Hide response attributes Show response attributes object
    • id number Required
    • node string Required
    • start_time_millis number Required
    • running_time_nanos number Required
    • query string Required
    • coordinating_node string Required
    • data_nodes array[string] Required
GET /_query/queries/{id}
curl \
 --request GET 'http://api.example.com/_query/queries/{id}' \
 --header "Authorization: $API_KEY"


































Check component templates Generally available

HEAD /_component_template/{name}

Returns information about whether a particular component template exists.

Path parameters

  • name string | array[string] Required

    Comma-separated list of component template names used to limit the request. Wildcard (*) expressions are supported.

Query parameters

  • master_timeout string

    Period to wait for a connection to the master node. If no response is received before the timeout expires, the request fails and returns an error.

    Values are -1 or 0.

  • local boolean

    If true, the request retrieves information from the local node only. Defaults to false, which means information is retrieved from the master node.

Responses

  • 200 application/json
HEAD /_component_template/{name}
curl \
 --request HEAD 'http://api.example.com/_component_template/{name}' \
 --header "Authorization: $API_KEY"












Get tokens from text analysis Generally available

POST /_analyze

The analyze API performs analysis on a text string and returns the resulting tokens.

Generating excessive amount of tokens may cause a node to run out of memory. The index.analyze.max_token_count setting enables you to limit the number of tokens that can be produced. If more than this limit of tokens gets generated, an error occurs. The _analyze endpoint without a specified index will always use 10000 as its limit.

Required authorization

  • Index privileges: index
External documentation

Query parameters

  • index string

    Index used to derive the analyzer. If specified, the analyzer or field parameter overrides this value. If no index is specified or the index does not have a default analyzer, the analyze API uses the standard analyzer.

application/json

Body

  • analyzer string

    The name of the analyzer that should be applied to the provided text. This could be a built-in analyzer, or an analyzer that’s been configured in the index.

  • attributes array[string]

    Array of token attributes used to filter the output of the explain parameter.

  • char_filter array

    Array of character filters used to preprocess characters before the tokenizer.

    External documentation
  • explain boolean

    If true, the response includes token attributes and additional details.

  • field string

    Path to field or array of paths. Some API's support wildcards in the path to select multiple fields.

  • filter array

    Array of token filters used to apply after the tokenizer.

    External documentation
  • normalizer string

    Normalizer to use to convert text into a single token.

  • text string | array[string]

Responses

  • 200 application/json
    Hide response attributes Show response attributes object
    • detail object
      Hide detail attributes Show detail attributes object
      • analyzer object
        Hide analyzer attributes Show analyzer attributes object
        • name string Required
        • tokens array[object] Required
          Hide tokens attributes Show tokens attributes object
          • bytes string Required
          • end_offset number Required
          • keyword boolean
          • position number Required
          • positionLength number Required
          • start_offset number Required
          • termFrequency number Required
          • token string Required
          • type string Required
      • charfilters array[object]
        Hide charfilters attributes Show charfilters attributes object
        • filtered_text array[string] Required
        • name string Required
      • custom_analyzer boolean Required
      • tokenfilters array[object]
        Hide tokenfilters attributes Show tokenfilters attributes object
        • name string Required
        • tokens array[object] Required
          Hide tokens attributes Show tokens attributes object
          • bytes string Required
          • end_offset number Required
          • keyword boolean
          • position number Required
          • positionLength number Required
          • start_offset number Required
          • termFrequency number Required
          • token string Required
          • type string Required
      • tokenizer object
        Hide tokenizer attributes Show tokenizer attributes object
        • name string Required
        • tokens array[object] Required
          Hide tokens attributes Show tokens attributes object
          • bytes string Required
          • end_offset number Required
          • keyword boolean
          • position number Required
          • positionLength number Required
          • start_offset number Required
          • termFrequency number Required
          • token string Required
          • type string Required
    • tokens array[object]
      Hide tokens attributes Show tokens attributes object
      • end_offset number Required
      • position number Required
      • positionLength number
      • start_offset number Required
      • token string Required
      • type string Required
GET /_analyze
{
  "analyzer": "standard",
  "text": "this is a test"
}
resp = client.indices.analyze(
    analyzer="standard",
    text="this is a test",
)
const response = await client.indices.analyze({
  analyzer: "standard",
  text: "this is a test",
});
response = client.indices.analyze(
  body: {
    "analyzer": "standard",
    "text": "this is a test"
  }
)
$resp = $client->indices()->analyze([
    "body" => [
        "analyzer" => "standard",
        "text" => "this is a test",
    ],
]);
curl -X GET -H "Authorization: ApiKey $ELASTIC_API_KEY" -H "Content-Type: application/json" -d '{"analyzer":"standard","text":"this is a test"}' "$ELASTICSEARCH_URL/_analyze"
You can apply any of the built-in analyzers to the text string without specifying an index.
{
  "analyzer": "standard",
  "text": "this is a test"
}
If the text parameter is provided as array of strings, it is analyzed as a multi-value field.
{
  "analyzer": "standard",
  "text": [
    "this is a test",
    "the second text"
  ]
}
You can test a custom transient analyzer built from tokenizers, token filters, and char filters. Token filters use the filter parameter.
{
  "tokenizer": "keyword",
  "filter": [
    "lowercase"
  ],
  "char_filter": [
    "html_strip"
  ],
  "text": "this is a <b>test</b>"
}
Custom tokenizers, token filters, and character filters can be specified in the request body.
{
  "tokenizer": "whitespace",
  "filter": [
    "lowercase",
    {
      "type": "stop",
      "stopwords": [
        "a",
        "is",
        "this"
      ]
    }
  ],
  "text": "this is a test"
}
Run `GET /analyze_sample/_analyze` to run an analysis on the text using the default index analyzer associated with the `analyze_sample` index. Alternatively, the analyzer can be derived based on a field mapping.
{
  "field": "obj1.field1",
  "text": "this is a test"
}
Run `GET /analyze_sample/_analyze` and supply a normalizer for a keyword field if there is a normalizer associated with the specified index.
{
  "normalizer": "my_normalizer",
  "text": "BaR"
}
If you want to get more advanced details, set `explain` to `true`. It will output all token attributes for each token. You can filter token attributes you want to output by setting the `attributes` option. NOTE: The format of the additional detail information is labelled as experimental in Lucene and it may change in the future.
{
  "tokenizer": "standard",
  "filter": [
    "snowball"
  ],
  "text": "detailed output",
  "explain": true,
  "attributes": [
    "keyword"
  ]
}
Response examples (200)
A successful response for an analysis with `explain` set to `true`.
{
  "detail": {
    "custom_analyzer": true,
    "charfilters": [],
    "tokenizer": {
      "name": "standard",
      "tokens": [
        {
          "token": "detailed",
          "start_offset": 0,
          "end_offset": 8,
          "type": "<ALPHANUM>",
          "position": 0
        },
        {
          "token": "output",
          "start_offset": 9,
          "end_offset": 15,
          "type": "<ALPHANUM>",
          "position": 1
        }
      ]
    },
    "tokenfilters": [
      {
        "name": "snowball",
        "tokens": [
          {
            "token": "detail",
            "start_offset": 0,
            "end_offset": 8,
            "type": "<ALPHANUM>",
            "position": 0,
            "keyword": false
          },
          {
            "token": "output",
            "start_offset": 9,
            "end_offset": 15,
            "type": "<ALPHANUM>",
            "position": 1,
            "keyword": false
          }
        ]
      }
    ]
  }
}
























Get aliases Generally available

GET /{index}/_alias/{name}

Retrieves information for one or more data stream or index aliases.

Required authorization

  • Index privileges: view_index_metadata

Path parameters

  • index string | array[string] Required

    Comma-separated list of data streams or indices used to limit the request. Supports wildcards (*). To target all data streams and indices, omit this parameter or use * or _all.

  • name string | array[string] Required

    Comma-separated list of aliases to retrieve. Supports wildcards (*). To retrieve all aliases, omit this parameter or use * or _all.

Query parameters

  • allow_no_indices boolean

    If false, the request returns an error if any wildcard expression, index alias, or _all value targets only missing or closed indices. This behavior applies even if the request targets other open indices.

  • expand_wildcards string | array[string]

    Type of index that wildcard patterns can match. If the request can target data streams, this argument determines whether wildcard expressions match hidden data streams. Supports comma-separated values, such as open,hidden. Valid values are: all, open, closed, hidden, none.

    Values are all, open, closed, hidden, or none.

  • ignore_unavailable boolean

    If false, the request returns an error if it targets a missing or closed index.

  • master_timeout string

    Period to wait for a connection to the master node. If no response is received before the timeout expires, the request fails and returns an error.

    Values are -1 or 0.

Responses

  • 200 application/json
    Hide response attribute Show response attribute object
    • * object Additional properties
      Hide * attribute Show * attribute object
      • aliases object Required
        Hide aliases attribute Show aliases attribute object
        • * object Additional properties
          Hide * attributes Show * attributes object
          • filter object

            An Elasticsearch Query DSL (Domain Specific Language) object that defines a query.

            External documentation
          • index_routing string

            Value used to route indexing operations to a specific shard. If specified, this overwrites the routing value for indexing operations.

          • is_write_index boolean

            If true, the index is the write index for the alias.

          • routing string

            Value used to route indexing and search operations to a specific shard.

          • search_routing string

            Value used to route search operations to a specific shard. If specified, this overwrites the routing value for search operations.

          • is_hidden boolean Generally available

            If true, the alias is hidden. All indices for the alias must have the same is_hidden value.

GET _alias
resp = client.indices.get_alias()
const response = await client.indices.getAlias();
response = client.indices.get_alias
$resp = $client->indices()->getAlias();
curl -X GET -H "Authorization: ApiKey $ELASTIC_API_KEY" "$ELASTICSEARCH_URL/_alias"




















Create or update an alias Generally available

POST /{index}/_aliases/{name}

Adds a data stream or index to an alias.

Path parameters

  • index string | array[string] Required

    Comma-separated list of data streams or indices to add. Supports wildcards (*). Wildcard patterns that match both data streams and indices return an error.

  • name string Required

    Alias to update. If the alias doesn’t exist, the request creates it. Index alias names support date math.

Query parameters

  • master_timeout string

    Period to wait for a connection to the master node. If no response is received before the timeout expires, the request fails and returns an error.

    Values are -1 or 0.

  • timeout string

    Period to wait for a response. If no response is received before the timeout expires, the request fails and returns an error.

    Values are -1 or 0.

application/json

Body

  • filter object

    An Elasticsearch Query DSL (Domain Specific Language) object that defines a query.

    External documentation
  • index_routing string
  • is_write_index boolean

    If true, sets the write index or data stream for the alias. If an alias points to multiple indices or data streams and is_write_index isn’t set, the alias rejects write requests. If an index alias points to one index and is_write_index isn’t set, the index automatically acts as the write index. Data stream aliases don’t automatically set a write data stream, even if the alias points to one data stream.

  • routing string
  • search_routing string

Responses

  • 200 application/json
    Hide response attribute Show response attribute object
    • acknowledged boolean Required

      For a successful response, this value is always true. On failure, an exception is returned instead.

POST /{index}/_aliases/{name}
POST _aliases
{
  "actions": [
    {
      "add": {
        "index": "my-data-stream",
        "alias": "my-alias"
      }
    }
  ]
}
resp = client.indices.update_aliases(
    actions=[
        {
            "add": {
                "index": "my-data-stream",
                "alias": "my-alias"
            }
        }
    ],
)
const response = await client.indices.updateAliases({
  actions: [
    {
      add: {
        index: "my-data-stream",
        alias: "my-alias",
      },
    },
  ],
});
response = client.indices.update_aliases(
  body: {
    "actions": [
      {
        "add": {
          "index": "my-data-stream",
          "alias": "my-alias"
        }
      }
    ]
  }
)
$resp = $client->indices()->updateAliases([
    "body" => [
        "actions" => array(
            [
                "add" => [
                    "index" => "my-data-stream",
                    "alias" => "my-alias",
                ],
            ],
        ),
    ],
]);
curl -X POST -H "Authorization: ApiKey $ELASTIC_API_KEY" -H "Content-Type: application/json" -d '{"actions":[{"add":{"index":"my-data-stream","alias":"my-alias"}}]}' "$ELASTICSEARCH_URL/_aliases"
Request example
{
  "actions": [
    {
      "add": {
        "index": "my-data-stream",
        "alias": "my-alias"
      }
    }
  ]
}
















Delete an index template Generally available

DELETE /_index_template/{name}

The provided may contain multiple template names separated by a comma. If multiple template names are specified then there is no wildcard support and the provided names should match completely with existing templates.

Required authorization

  • Cluster privileges: manage_index_templates

Path parameters

  • name string | array[string] Required

    Comma-separated list of index template names used to limit the request. Wildcard (*) expressions are supported.

Query parameters

  • master_timeout string

    Period to wait for a connection to the master node. If no response is received before the timeout expires, the request fails and returns an error.

    Values are -1 or 0.

  • timeout string

    Period to wait for a response. If no response is received before the timeout expires, the request fails and returns an error.

    Values are -1 or 0.

Responses

  • 200 application/json
    Hide response attribute Show response attribute object
    • acknowledged boolean Required

      For a successful response, this value is always true. On failure, an exception is returned instead.

DELETE /_index_template/{name}
DELETE /_index_template/my-index-template
resp = client.indices.delete_index_template(
    name="my-index-template",
)
const response = await client.indices.deleteIndexTemplate({
  name: "my-index-template",
});
response = client.indices.delete_index_template(
  name: "my-index-template"
)
$resp = $client->indices()->deleteIndexTemplate([
    "name" => "my-index-template",
]);
curl -X DELETE -H "Authorization: ApiKey $ELASTIC_API_KEY" "$ELASTICSEARCH_URL/_index_template/my-index-template"
















































Get index settings Generally available

GET /{index}/_settings

Get setting information for one or more indices. For data streams, it returns setting information for the stream's backing indices.

Required authorization

  • Index privileges: view_index_metadata

Path parameters

  • index string | array[string] Required

    Comma-separated list of data streams, indices, and aliases used to limit the request. Supports wildcards (*). To target all data streams and indices, omit this parameter or use * or _all.

Query parameters

  • allow_no_indices boolean

    If false, the request returns an error if any wildcard expression, index alias, or _all value targets only missing or closed indices. This behavior applies even if the request targets other open indices. For example, a request targeting foo*,bar* returns an error if an index starts with foo but no index starts with bar.

  • expand_wildcards string | array[string]

    Type of index that wildcard patterns can match. If the request can target data streams, this argument determines whether wildcard expressions match hidden data streams. Supports comma-separated values, such as open,hidden.

    Values are all, open, closed, hidden, or none.

  • flat_settings boolean

    If true, returns settings in flat format.

  • ignore_unavailable boolean

    If false, the request returns an error if it targets a missing or closed index.

  • include_defaults boolean

    If true, return all default settings in the response.

  • local boolean

    If true, the request retrieves information from the local node only. If false, information is retrieved from the master node.

  • master_timeout string

    Period to wait for a connection to the master node. If no response is received before the timeout expires, the request fails and returns an error.

    Values are -1 or 0.

Responses

  • 200 application/json
    Hide response attribute Show response attribute object
    • * object
      Hide * attributes Show * attributes object
      • aliases object
        Hide aliases attribute Show aliases attribute object
        • * object Additional properties
          Hide * attributes Show * attributes object
          • filter object

            An Elasticsearch Query DSL (Domain Specific Language) object that defines a query.

            External documentation
          • index_routing string
          • is_hidden boolean

            If true, the alias is hidden. All indices for the alias must have the same is_hidden value.

          • is_write_index boolean

            If true, the index is the write index for the alias.

          • routing string
          • search_routing string
      • mappings object
        Hide mappings attributes Show mappings attributes object
        • all_field object
          Hide all_field attributes Show all_field attributes object
          • analyzer string Required
          • enabled boolean Required
          • omit_norms boolean Required
          • search_analyzer string Required
          • similarity string Required
          • store boolean Required
          • store_term_vector_offsets boolean Required
          • store_term_vector_payloads boolean Required
          • store_term_vector_positions boolean Required
          • store_term_vectors boolean Required
        • date_detection boolean
        • dynamic string

          Values are strict, runtime, true, or false.

        • dynamic_date_formats array[string]
        • dynamic_templates array[object]
        • _field_names object
          Hide _field_names attribute Show _field_names attribute object
          • enabled boolean Required
        • index_field object
          Hide index_field attribute Show index_field attribute object
          • enabled boolean Required
        • _meta object
          Hide _meta attribute Show _meta attribute object
          • * object Additional properties
        • numeric_detection boolean
        • properties object
        • _routing object
          Hide _routing attribute Show _routing attribute object
          • required boolean Required
        • _size object
          Hide _size attribute Show _size attribute object
          • enabled boolean Required
        • _source object
          Hide _source attributes Show _source attributes object
          • compress boolean
          • compress_threshold string
          • enabled boolean
          • excludes array[string]
          • includes array[string]
          • mode string

            Values are disabled, stored, or synthetic.

        • runtime object
          Hide runtime attribute Show runtime attribute object
          • * object Additional properties
            Hide * attributes Show * attributes object
            • fields object

              For type composite

              Hide fields attribute Show fields attribute object
              • * object Additional properties
            • fetch_fields array[object]

              For type lookup

            • format string

              A custom format for date type runtime fields.

            • input_field string

              Path to field or array of paths. Some API's support wildcards in the path to select multiple fields.

            • target_field string

              Path to field or array of paths. Some API's support wildcards in the path to select multiple fields.

            • target_index string
            • script object
              Hide script attributes Show script attributes object
              • source
              • id string
              • params object

                Specifies any named parameters that are passed into the script as variables. Use parameters instead of hard-coded values to decrease compile time.

              • lang
              • options object
            • type string Required

              Values are boolean, composite, date, double, geo_point, geo_shape, ip, keyword, long, or lookup.

        • enabled boolean
        • subobjects string

          Values are true or false.

        • _data_stream_timestamp object
          Hide _data_stream_timestamp attribute Show _data_stream_timestamp attribute object
          • enabled boolean Required
      • settings object Additional properties
        Index settings
      • defaults object Additional properties
        Index settings
      • data_stream string
      • lifecycle object

        Data stream lifecycle denotes that a data stream is managed by the data stream lifecycle and contains the configuration.

        Hide lifecycle attributes Show lifecycle attributes object
        • data_retention string

          A duration. Units can be nanos, micros, ms (milliseconds), s (seconds), m (minutes), h (hours) and d (days). Also accepts "0" without a unit and "-1" to indicate an unspecified value.

        • downsampling object
          Hide downsampling attribute Show downsampling attribute object
          • rounds array[object] Required

            The list of downsampling rounds to execute as part of this downsampling configuration

            Hide rounds attributes Show rounds attributes object
            • after string Required

              A duration. Units can be nanos, micros, ms (milliseconds), s (seconds), m (minutes), h (hours) and d (days). Also accepts "0" without a unit and "-1" to indicate an unspecified value.

            • config object Required
        • enabled boolean

          If defined, it turns data stream lifecycle on/off (true/false) for this data stream. A data stream lifecycle that's disabled (enabled: false) will have no effect on the data stream.

GET _all/_settings?expand_wildcards=all&filter_path=*.settings.index.*.slowlog
resp = client.indices.get_settings(
    index="_all",
    expand_wildcards="all",
    filter_path="*.settings.index.*.slowlog",
)
const response = await client.indices.getSettings({
  index: "_all",
  expand_wildcards: "all",
  filter_path: "*.settings.index.*.slowlog",
});
response = client.indices.get_settings(
  index: "_all",
  expand_wildcards: "all",
  filter_path: "*.settings.index.*.slowlog"
)
$resp = $client->indices()->getSettings([
    "index" => "_all",
    "expand_wildcards" => "all",
    "filter_path" => "*.settings.index.*.slowlog",
]);
curl -X GET -H "Authorization: ApiKey $ELASTIC_API_KEY" "$ELASTICSEARCH_URL/_all/_settings?expand_wildcards=all&filter_path=*.settings.index.*.slowlog"








































































Inference

Inference APIs enable you to use certain services, such as built-in machine learning models (ELSER, E5), models uploaded through Eland, Cohere, OpenAI, Azure, Google AI Studio or Hugging Face. For built-in models and models uploaded through Eland, the inference APIs offer an alternative way to use and manage trained models. However, if you do not plan to use the inference APIs to use these models or if you want to use non-NLP models, use the machine learning trained model APIs.





Perform completion inference on the service Generally available

POST /_inference/completion/{inference_id}

Path parameters

  • inference_id string Required

    The inference Id

Query parameters

  • timeout string

    Specifies the amount of time to wait for the inference request to complete.

    Values are -1 or 0.

application/json

Body

Responses

  • 200 application/json
    Hide response attribute Show response attribute object
    • completion array[object] Required

      The completion result object

      Hide completion attribute Show completion attribute object
      • result string Required
POST /_inference/completion/{inference_id}
POST _inference/completion/openai_chat_completions
{
  "input": "What is Elastic?"
}
resp = client.inference.completion(
    inference_id="openai_chat_completions",
    input="What is Elastic?",
)
const response = await client.inference.completion({
  inference_id: "openai_chat_completions",
  input: "What is Elastic?",
});
response = client.inference.completion(
  inference_id: "openai_chat_completions",
  body: {
    "input": "What is Elastic?"
  }
)
$resp = $client->inference()->completion([
    "inference_id" => "openai_chat_completions",
    "body" => [
        "input" => "What is Elastic?",
    ],
]);
curl -X POST -H "Authorization: ApiKey $ELASTIC_API_KEY" -H "Content-Type: application/json" -d '{"input":"What is Elastic?"}' "$ELASTICSEARCH_URL/_inference/completion/openai_chat_completions"
Request example
Run `POST _inference/completion/openai_chat_completions` to perform a completion on the example question.
{
  "input": "What is Elastic?"
}
Response examples (200)
A successful response from `POST _inference/completion/openai_chat_completions`.
{
  "completion": [
    {
      "result": "Elastic is a company that provides a range of software solutions for search, logging, security, and analytics. Their flagship product is Elasticsearch, an open-source, distributed search engine that allows users to search, analyze, and visualize large volumes of data in real-time. Elastic also offers products such as Kibana, a data visualization tool, and Logstash, a log management and pipeline tool, as well as various other tools and solutions for data analysis and management."
    }
  ]
}
















































Create an Azure AI studio inference endpoint Generally available

PUT /_inference/{task_type}/{azureaistudio_inference_id}

Create an inference endpoint to perform an inference task with the azureaistudio service.

Required authorization

  • Cluster privileges: manage_inference

Path parameters

  • task_type string

    The type of the inference task that the model will perform.

    Values are completion or text_embedding.

  • azureaistudio_inference_id string Required

    The unique identifier of the inference endpoint.

application/json

Body

  • chunking_settings object

    Chunking configuration object

    Hide chunking_settings attributes Show chunking_settings attributes object
    • max_chunk_size number

      The maximum size of a chunk in words. This value cannot be higher than 300 or lower than 20 (for sentence strategy) or 10 (for word strategy).

    • overlap number

      The number of overlapping words for chunks. It is applicable only to a word chunking strategy. This value cannot be higher than half the max_chunk_size value.

    • sentence_overlap number

      The number of overlapping sentences for chunks. It is applicable only for a sentence chunking strategy. It can be either 1 or 0.

    • strategy string

      The chunking strategy: sentence or word.

  • service string Required

    Value is azureaistudio.

  • service_settings object Required
    Hide service_settings attributes Show service_settings attributes object
    • api_key string Required

      A valid API key of your Azure AI Studio model deployment. This key can be found on the overview page for your deployment in the management section of your Azure AI Studio account.

      IMPORTANT: You need to provide the API key only once, during the inference model creation. The get inference endpoint API does not retrieve your API key. After creating the inference model, you cannot change the associated API key. If you want to use a different API key, delete the inference model and recreate it with the same name and the updated API key.

      External documentation
    • endpoint_type string Required

      The type of endpoint that is available for deployment through Azure AI Studio: token or realtime. The token endpoint type is for "pay as you go" endpoints that are billed per token. The realtime endpoint type is for "real-time" endpoints that are billed per hour of usage.

      External documentation
    • target string Required

      The target URL of your Azure AI Studio model deployment. This can be found on the overview page for your deployment in the management section of your Azure AI Studio account.

    • provider string Required

      The model provider for your deployment. Note that some providers may support only certain task types. Supported providers include:

      • cohere - available for text_embedding and completion task types
      • databricks - available for completion task type only
      • meta - available for completion task type only
      • microsoft_phi - available for completion task type only
      • mistral - available for completion task type only
      • openai - available for text_embedding and completion task types
    • rate_limit object

      This setting helps to minimize the number of rate limit errors returned from the service.

      Hide rate_limit attribute Show rate_limit attribute object
      • requests_per_minute number

        The number of requests allowed per minute. By default, the number of requests allowed per minute is set by each service as follows:

        • alibabacloud-ai-search service: 1000
        • anthropic service: 50
        • azureaistudio service: 240
        • azureopenai service and task type text_embedding: 1440
        • azureopenai service and task type completion: 120
        • cohere service: 10000
        • elastic service and task type chat_completion: 240
        • googleaistudio service: 360
        • googlevertexai service: 30000
        • hugging_face service: 3000
        • jinaai service: 2000
        • mistral service: 240
        • openai service and task type text_embedding: 3000
        • openai service and task type completion: 500
        • voyageai service: 2000
        • watsonxai service: 120
  • task_settings object
    Hide task_settings attributes Show task_settings attributes object
    • do_sample number

      For a completion task, instruct the inference process to perform sampling. It has no effect unless temperature or top_p is specified.

    • max_new_tokens number

      For a completion task, provide a hint for the maximum number of output tokens to be generated.

    • temperature number

      For a completion task, control the apparent creativity of generated completions with a sampling temperature. It must be a number in the range of 0.0 to 2.0. It should not be used if top_p is specified.

    • top_p number

      For a completion task, make the model consider the results of the tokens with nucleus sampling probability. It is an alternative value to temperature and must be a number in the range of 0.0 to 2.0. It should not be used if temperature is specified.

    • user string

      For a text_embedding task, specify the user issuing the request. This information can be used for abuse detection.

Responses

  • 200 application/json
    Hide response attributes Show response attributes object
    • chunking_settings object

      Chunking configuration object

      Hide chunking_settings attributes Show chunking_settings attributes object
      • max_chunk_size number

        The maximum size of a chunk in words. This value cannot be higher than 300 or lower than 20 (for sentence strategy) or 10 (for word strategy).

      • overlap number

        The number of overlapping words for chunks. It is applicable only to a word chunking strategy. This value cannot be higher than half the max_chunk_size value.

      • sentence_overlap number

        The number of overlapping sentences for chunks. It is applicable only for a sentence chunking strategy. It can be either 1 or 0.

      • strategy string

        The chunking strategy: sentence or word.

    • service string Required

      The service type

    • service_settings object Required
    • task_settings object
    • inference_id string Required

      The inference Id

    • task_type string Required

      Values are text_embedding or completion.

PUT /_inference/{task_type}/{azureaistudio_inference_id}
PUT _inference/text_embedding/azure_ai_studio_embeddings
{
    "service": "azureaistudio",
    "service_settings": {
        "api_key": "Azure-AI-Studio-API-key",
        "target": "Target-Uri",
        "provider": "openai",
        "endpoint_type": "token"
    }
}
resp = client.inference.put(
    task_type="text_embedding",
    inference_id="azure_ai_studio_embeddings",
    inference_config={
        "service": "azureaistudio",
        "service_settings": {
            "api_key": "Azure-AI-Studio-API-key",
            "target": "Target-Uri",
            "provider": "openai",
            "endpoint_type": "token"
        }
    },
)
const response = await client.inference.put({
  task_type: "text_embedding",
  inference_id: "azure_ai_studio_embeddings",
  inference_config: {
    service: "azureaistudio",
    service_settings: {
      api_key: "Azure-AI-Studio-API-key",
      target: "Target-Uri",
      provider: "openai",
      endpoint_type: "token",
    },
  },
});
response = client.inference.put(
  task_type: "text_embedding",
  inference_id: "azure_ai_studio_embeddings",
  body: {
    "service": "azureaistudio",
    "service_settings": {
      "api_key": "Azure-AI-Studio-API-key",
      "target": "Target-Uri",
      "provider": "openai",
      "endpoint_type": "token"
    }
  }
)
$resp = $client->inference()->put([
    "task_type" => "text_embedding",
    "inference_id" => "azure_ai_studio_embeddings",
    "body" => [
        "service" => "azureaistudio",
        "service_settings" => [
            "api_key" => "Azure-AI-Studio-API-key",
            "target" => "Target-Uri",
            "provider" => "openai",
            "endpoint_type" => "token",
        ],
    ],
]);
curl -X PUT -H "Authorization: ApiKey $ELASTIC_API_KEY" -H "Content-Type: application/json" -d '{"service":"azureaistudio","service_settings":{"api_key":"Azure-AI-Studio-API-key","target":"Target-Uri","provider":"openai","endpoint_type":"token"}}' "$ELASTICSEARCH_URL/_inference/text_embedding/azure_ai_studio_embeddings"
Request examples
Run `PUT _inference/text_embedding/azure_ai_studio_embeddings` to create an inference endpoint that performs a text_embedding task. Note that you do not specify a model here, as it is defined already in the Azure AI Studio deployment.
{
    "service": "azureaistudio",
    "service_settings": {
        "api_key": "Azure-AI-Studio-API-key",
        "target": "Target-Uri",
        "provider": "openai",
        "endpoint_type": "token"
    }
}
Run `PUT _inference/completion/azure_ai_studio_completion` to create an inference endpoint that performs a completion task.
{
    "service": "azureaistudio",
    "service_settings": {
        "api_key": "Azure-AI-Studio-API-key",
        "target": "Target-URI",
        "provider": "databricks",
        "endpoint_type": "realtime"
    }
}

Create an Azure OpenAI inference endpoint Generally available

PUT /_inference/{task_type}/{azureopenai_inference_id}

Create an inference endpoint to perform an inference task with the azureopenai service.

The list of chat completion models that you can choose from in your Azure OpenAI deployment include:

The list of embeddings models that you can choose from in your deployment can be found in the Azure models documentation.

Required authorization

  • Cluster privileges: manage_inference

Path parameters

  • task_type string

    The type of the inference task that the model will perform. NOTE: The chat_completion task type only supports streaming and only through the _stream API.

    Values are completion or text_embedding.

  • azureopenai_inference_id string Required

    The unique identifier of the inference endpoint.

application/json

Body

  • chunking_settings object

    Chunking configuration object

    Hide chunking_settings attributes Show chunking_settings attributes object
    • max_chunk_size number

      The maximum size of a chunk in words. This value cannot be higher than 300 or lower than 20 (for sentence strategy) or 10 (for word strategy).

    • overlap number

      The number of overlapping words for chunks. It is applicable only to a word chunking strategy. This value cannot be higher than half the max_chunk_size value.

    • sentence_overlap number

      The number of overlapping sentences for chunks. It is applicable only for a sentence chunking strategy. It can be either 1 or 0.

    • strategy string

      The chunking strategy: sentence or word.

  • service string Required

    Value is azureopenai.

  • service_settings object Required
    Hide service_settings attributes Show service_settings attributes object
    • api_key string

      A valid API key for your Azure OpenAI account. You must specify either api_key or entra_id. If you do not provide either or you provide both, you will receive an error when you try to create your model.

      IMPORTANT: You need to provide the API key only once, during the inference model creation. The get inference endpoint API does not retrieve your API key. After creating the inference model, you cannot change the associated API key. If you want to use a different API key, delete the inference model and recreate it with the same name and the updated API key.

      External documentation
    • api_version string Required

      The Azure API version ID to use. It is recommended to use the latest supported non-preview version.

    • deployment_id string Required

      The deployment name of your deployed models. Your Azure OpenAI deployments can be found though the Azure OpenAI Studio portal that is linked to your subscription.

      External documentation
    • entra_id string

      A valid Microsoft Entra token. You must specify either api_key or entra_id. If you do not provide either or you provide both, you will receive an error when you try to create your model.

      External documentation
    • rate_limit object

      This setting helps to minimize the number of rate limit errors returned from the service.

      Hide rate_limit attribute Show rate_limit attribute object
      • requests_per_minute number

        The number of requests allowed per minute. By default, the number of requests allowed per minute is set by each service as follows:

        • alibabacloud-ai-search service: 1000
        • anthropic service: 50
        • azureaistudio service: 240
        • azureopenai service and task type text_embedding: 1440
        • azureopenai service and task type completion: 120
        • cohere service: 10000
        • elastic service and task type chat_completion: 240
        • googleaistudio service: 360
        • googlevertexai service: 30000
        • hugging_face service: 3000
        • jinaai service: 2000
        • mistral service: 240
        • openai service and task type text_embedding: 3000
        • openai service and task type completion: 500
        • voyageai service: 2000
        • watsonxai service: 120
    • resource_name string Required

      The name of your Azure OpenAI resource. You can find this from the list of resources in the Azure Portal for your subscription.

      External documentation
  • task_settings object
    Hide task_settings attribute Show task_settings attribute object
    • user string

      For a completion or text_embedding task, specify the user issuing the request. This information can be used for abuse detection.

Responses

  • 200 application/json
    Hide response attributes Show response attributes object
    • chunking_settings object

      Chunking configuration object

      Hide chunking_settings attributes Show chunking_settings attributes object
      • max_chunk_size number

        The maximum size of a chunk in words. This value cannot be higher than 300 or lower than 20 (for sentence strategy) or 10 (for word strategy).

      • overlap number

        The number of overlapping words for chunks. It is applicable only to a word chunking strategy. This value cannot be higher than half the max_chunk_size value.

      • sentence_overlap number

        The number of overlapping sentences for chunks. It is applicable only for a sentence chunking strategy. It can be either 1 or 0.

      • strategy string

        The chunking strategy: sentence or word.

    • service string Required

      The service type

    • service_settings object Required
    • task_settings object
    • inference_id string Required

      The inference Id

    • task_type string Required

      Values are text_embedding or completion.

PUT /_inference/{task_type}/{azureopenai_inference_id}
PUT _inference/text_embedding/azure_openai_embeddings
{
    "service": "azureopenai",
    "service_settings": {
        "api_key": "Api-Key",
        "resource_name": "Resource-name",
        "deployment_id": "Deployment-id",
        "api_version": "2024-02-01"
    }
}
resp = client.inference.put(
    task_type="text_embedding",
    inference_id="azure_openai_embeddings",
    inference_config={
        "service": "azureopenai",
        "service_settings": {
            "api_key": "Api-Key",
            "resource_name": "Resource-name",
            "deployment_id": "Deployment-id",
            "api_version": "2024-02-01"
        }
    },
)
const response = await client.inference.put({
  task_type: "text_embedding",
  inference_id: "azure_openai_embeddings",
  inference_config: {
    service: "azureopenai",
    service_settings: {
      api_key: "Api-Key",
      resource_name: "Resource-name",
      deployment_id: "Deployment-id",
      api_version: "2024-02-01",
    },
  },
});
response = client.inference.put(
  task_type: "text_embedding",
  inference_id: "azure_openai_embeddings",
  body: {
    "service": "azureopenai",
    "service_settings": {
      "api_key": "Api-Key",
      "resource_name": "Resource-name",
      "deployment_id": "Deployment-id",
      "api_version": "2024-02-01"
    }
  }
)
$resp = $client->inference()->put([
    "task_type" => "text_embedding",
    "inference_id" => "azure_openai_embeddings",
    "body" => [
        "service" => "azureopenai",
        "service_settings" => [
            "api_key" => "Api-Key",
            "resource_name" => "Resource-name",
            "deployment_id" => "Deployment-id",
            "api_version" => "2024-02-01",
        ],
    ],
]);
curl -X PUT -H "Authorization: ApiKey $ELASTIC_API_KEY" -H "Content-Type: application/json" -d '{"service":"azureopenai","service_settings":{"api_key":"Api-Key","resource_name":"Resource-name","deployment_id":"Deployment-id","api_version":"2024-02-01"}}' "$ELASTICSEARCH_URL/_inference/text_embedding/azure_openai_embeddings"
Request examples
Run `PUT _inference/text_embedding/azure_openai_embeddings` to create an inference endpoint that performs a `text_embedding` task. You do not specify a model, as it is defined already in the Azure OpenAI deployment.
{
    "service": "azureopenai",
    "service_settings": {
        "api_key": "Api-Key",
        "resource_name": "Resource-name",
        "deployment_id": "Deployment-id",
        "api_version": "2024-02-01"
    }
}
Run `PUT _inference/completion/azure_openai_completion` to create an inference endpoint that performs a `completion` task.
{
    "service": "azureopenai",
    "service_settings": {
        "api_key": "Api-Key",
        "resource_name": "Resource-name",
        "deployment_id": "Deployment-id",
        "api_version": "2024-02-01"
    }
}

Create a Cohere inference endpoint Generally available

PUT /_inference/{task_type}/{cohere_inference_id}

Create an inference endpoint to perform an inference task with the cohere service.

Required authorization

  • Cluster privileges: manage_inference

Path parameters

  • task_type string

    The type of the inference task that the model will perform.

    Values are completion, rerank, or text_embedding.

  • cohere_inference_id string Required

    The unique identifier of the inference endpoint.

application/json

Body

  • chunking_settings object

    Chunking configuration object

    Hide chunking_settings attributes Show chunking_settings attributes object
    • max_chunk_size number

      The maximum size of a chunk in words. This value cannot be higher than 300 or lower than 20 (for sentence strategy) or 10 (for word strategy).

    • overlap number

      The number of overlapping words for chunks. It is applicable only to a word chunking strategy. This value cannot be higher than half the max_chunk_size value.

    • sentence_overlap number

      The number of overlapping sentences for chunks. It is applicable only for a sentence chunking strategy. It can be either 1 or 0.

    • strategy string

      The chunking strategy: sentence or word.

  • service string Required

    Value is cohere.

  • service_settings object Required
    Hide service_settings attributes Show service_settings attributes object
    • api_key string Required

      A valid API key for your Cohere account. You can find or create your Cohere API keys on the Cohere API key settings page.

      IMPORTANT: You need to provide the API key only once, during the inference model creation. The get inference endpoint API does not retrieve your API key. After creating the inference model, you cannot change the associated API key. If you want to use a different API key, delete the inference model and recreate it with the same name and the updated API key.

      External documentation
    • embedding_type string

      Values are binary, bit, byte, float, or int8.

    • model_id string

      For a completion, rerank, or text_embedding task, the name of the model to use for the inference task.

      The default value for a text embedding task is embed-english-v2.0.

    • rate_limit object

      This setting helps to minimize the number of rate limit errors returned from the service.

      Hide rate_limit attribute Show rate_limit attribute object
      • requests_per_minute number

        The number of requests allowed per minute. By default, the number of requests allowed per minute is set by each service as follows:

        • alibabacloud-ai-search service: 1000
        • anthropic service: 50
        • azureaistudio service: 240
        • azureopenai service and task type text_embedding: 1440
        • azureopenai service and task type completion: 120
        • cohere service: 10000
        • elastic service and task type chat_completion: 240
        • googleaistudio service: 360
        • googlevertexai service: 30000
        • hugging_face service: 3000
        • jinaai service: 2000
        • mistral service: 240
        • openai service and task type text_embedding: 3000
        • openai service and task type completion: 500
        • voyageai service: 2000
        • watsonxai service: 120
    • similarity string

      Values are cosine, dot_product, or l2_norm.

  • task_settings object
    Hide task_settings attributes Show task_settings attributes object
    • input_type string

      Values are classification, clustering, ingest, or search.

    • return_documents boolean

      For a rerank task, return doc text within the results.

    • top_n number

      For a rerank task, the number of most relevant documents to return. It defaults to the number of the documents. If this inference endpoint is used in a text_similarity_reranker retriever query and top_n is set, it must be greater than or equal to rank_window_size in the query.

    • truncate string

      Values are END, NONE, or START.

Responses

  • 200 application/json
    Hide response attributes Show response attributes object
    • chunking_settings object

      Chunking configuration object

      Hide chunking_settings attributes Show chunking_settings attributes object
      • max_chunk_size number

        The maximum size of a chunk in words. This value cannot be higher than 300 or lower than 20 (for sentence strategy) or 10 (for word strategy).

      • overlap number

        The number of overlapping words for chunks. It is applicable only to a word chunking strategy. This value cannot be higher than half the max_chunk_size value.

      • sentence_overlap number

        The number of overlapping sentences for chunks. It is applicable only for a sentence chunking strategy. It can be either 1 or 0.

      • strategy string

        The chunking strategy: sentence or word.

    • service string Required

      The service type

    • service_settings object Required
    • task_settings object
    • inference_id string Required

      The inference Id

    • task_type string Required

      Values are text_embedding, rerank, or completion.

PUT /_inference/{task_type}/{cohere_inference_id}
PUT _inference/text_embedding/cohere-embeddings
{
    "service": "cohere",
    "service_settings": {
        "api_key": "Cohere-Api-key",
        "model_id": "embed-english-light-v3.0",
        "embedding_type": "byte"
    }
}
resp = client.inference.put(
    task_type="text_embedding",
    inference_id="cohere-embeddings",
    inference_config={
        "service": "cohere",
        "service_settings": {
            "api_key": "Cohere-Api-key",
            "model_id": "embed-english-light-v3.0",
            "embedding_type": "byte"
        }
    },
)
const response = await client.inference.put({
  task_type: "text_embedding",
  inference_id: "cohere-embeddings",
  inference_config: {
    service: "cohere",
    service_settings: {
      api_key: "Cohere-Api-key",
      model_id: "embed-english-light-v3.0",
      embedding_type: "byte",
    },
  },
});
response = client.inference.put(
  task_type: "text_embedding",
  inference_id: "cohere-embeddings",
  body: {
    "service": "cohere",
    "service_settings": {
      "api_key": "Cohere-Api-key",
      "model_id": "embed-english-light-v3.0",
      "embedding_type": "byte"
    }
  }
)
$resp = $client->inference()->put([
    "task_type" => "text_embedding",
    "inference_id" => "cohere-embeddings",
    "body" => [
        "service" => "cohere",
        "service_settings" => [
            "api_key" => "Cohere-Api-key",
            "model_id" => "embed-english-light-v3.0",
            "embedding_type" => "byte",
        ],
    ],
]);
curl -X PUT -H "Authorization: ApiKey $ELASTIC_API_KEY" -H "Content-Type: application/json" -d '{"service":"cohere","service_settings":{"api_key":"Cohere-Api-key","model_id":"embed-english-light-v3.0","embedding_type":"byte"}}' "$ELASTICSEARCH_URL/_inference/text_embedding/cohere-embeddings"
Request examples
Run `PUT _inference/text_embedding/cohere-embeddings` to create an inference endpoint that performs a text embedding task.
{
    "service": "cohere",
    "service_settings": {
        "api_key": "Cohere-Api-key",
        "model_id": "embed-english-light-v3.0",
        "embedding_type": "byte"
    }
}
Run `PUT _inference/rerank/cohere-rerank` to create an inference endpoint that performs a rerank task.
{
    "service": "cohere",
    "service_settings": {
        "api_key": "Cohere-API-key",
        "model_id": "rerank-english-v3.0"
    },
    "task_settings": {
        "top_n": 10,
        "return_documents": true
    }
}








Create an Google AI Studio inference endpoint Generally available

PUT /_inference/{task_type}/{googleaistudio_inference_id}

Create an inference endpoint to perform an inference task with the googleaistudio service.

Required authorization

  • Cluster privileges: manage_inference

Path parameters

  • task_type string

    The type of the inference task that the model will perform.

    Values are completion or text_embedding.

  • googleaistudio_inference_id string Required

    The unique identifier of the inference endpoint.

application/json

Body

  • chunking_settings object

    Chunking configuration object

    Hide chunking_settings attributes Show chunking_settings attributes object
    • max_chunk_size number

      The maximum size of a chunk in words. This value cannot be higher than 300 or lower than 20 (for sentence strategy) or 10 (for word strategy).

    • overlap number

      The number of overlapping words for chunks. It is applicable only to a word chunking strategy. This value cannot be higher than half the max_chunk_size value.

    • sentence_overlap number

      The number of overlapping sentences for chunks. It is applicable only for a sentence chunking strategy. It can be either 1 or 0.

    • strategy string

      The chunking strategy: sentence or word.

  • service string Required

    Value is googleaistudio.

  • service_settings object Required
    Hide service_settings attributes Show service_settings attributes object
    • api_key string Required

      A valid API key of your Google Gemini account.

    • model_id string Required

      The name of the model to use for the inference task. Refer to the Google documentation for the list of supported models.

      External documentation
    • rate_limit object

      This setting helps to minimize the number of rate limit errors returned from the service.

      Hide rate_limit attribute Show rate_limit attribute object
      • requests_per_minute number

        The number of requests allowed per minute. By default, the number of requests allowed per minute is set by each service as follows:

        • alibabacloud-ai-search service: 1000
        • anthropic service: 50
        • azureaistudio service: 240
        • azureopenai service and task type text_embedding: 1440
        • azureopenai service and task type completion: 120
        • cohere service: 10000
        • elastic service and task type chat_completion: 240
        • googleaistudio service: 360
        • googlevertexai service: 30000
        • hugging_face service: 3000
        • jinaai service: 2000
        • mistral service: 240
        • openai service and task type text_embedding: 3000
        • openai service and task type completion: 500
        • voyageai service: 2000
        • watsonxai service: 120

Responses

  • 200 application/json
    Hide response attributes Show response attributes object
    • chunking_settings object

      Chunking configuration object

      Hide chunking_settings attributes Show chunking_settings attributes object
      • max_chunk_size number

        The maximum size of a chunk in words. This value cannot be higher than 300 or lower than 20 (for sentence strategy) or 10 (for word strategy).

      • overlap number

        The number of overlapping words for chunks. It is applicable only to a word chunking strategy. This value cannot be higher than half the max_chunk_size value.

      • sentence_overlap number

        The number of overlapping sentences for chunks. It is applicable only for a sentence chunking strategy. It can be either 1 or 0.

      • strategy string

        The chunking strategy: sentence or word.

    • service string Required

      The service type

    • service_settings object Required
    • task_settings object
    • inference_id string Required

      The inference Id

    • task_type string Required

      Values are text_embedding or completion.

PUT /_inference/{task_type}/{googleaistudio_inference_id}
PUT _inference/completion/google_ai_studio_completion
{
    "service": "googleaistudio",
    "service_settings": {
        "api_key": "api-key",
        "model_id": "model-id"
    }
}
resp = client.inference.put(
    task_type="completion",
    inference_id="google_ai_studio_completion",
    inference_config={
        "service": "googleaistudio",
        "service_settings": {
            "api_key": "api-key",
            "model_id": "model-id"
        }
    },
)
const response = await client.inference.put({
  task_type: "completion",
  inference_id: "google_ai_studio_completion",
  inference_config: {
    service: "googleaistudio",
    service_settings: {
      api_key: "api-key",
      model_id: "model-id",
    },
  },
});
response = client.inference.put(
  task_type: "completion",
  inference_id: "google_ai_studio_completion",
  body: {
    "service": "googleaistudio",
    "service_settings": {
      "api_key": "api-key",
      "model_id": "model-id"
    }
  }
)
$resp = $client->inference()->put([
    "task_type" => "completion",
    "inference_id" => "google_ai_studio_completion",
    "body" => [
        "service" => "googleaistudio",
        "service_settings" => [
            "api_key" => "api-key",
            "model_id" => "model-id",
        ],
    ],
]);
curl -X PUT -H "Authorization: ApiKey $ELASTIC_API_KEY" -H "Content-Type: application/json" -d '{"service":"googleaistudio","service_settings":{"api_key":"api-key","model_id":"model-id"}}' "$ELASTICSEARCH_URL/_inference/completion/google_ai_studio_completion"
Request example
Run `PUT _inference/completion/google_ai_studio_completion` to create an inference endpoint to perform a `completion` task type.
{
    "service": "googleaistudio",
    "service_settings": {
        "api_key": "api-key",
        "model_id": "model-id"
    }
}




Create a Hugging Face inference endpoint Generally available

PUT /_inference/{task_type}/{huggingface_inference_id}

Create an inference endpoint to perform an inference task with the hugging_face service. Supported tasks include: text_embedding, completion, and chat_completion.

To configure the endpoint, first visit the Hugging Face Inference Endpoints page and create a new endpoint. Select a model that supports the task you intend to use.

For Elastic's text_embedding task: The selected model must support the Sentence Embeddings task. On the new endpoint creation page, select the Sentence Embeddings task under the Advanced Configuration section. After the endpoint has initialized, copy the generated endpoint URL. Recommended models for text_embedding task:

  • all-MiniLM-L6-v2
  • all-MiniLM-L12-v2
  • all-mpnet-base-v2
  • e5-base-v2
  • e5-small-v2
  • multilingual-e5-base
  • multilingual-e5-small

For Elastic's chat_completion and completion tasks: The selected model must support the Text Generation task and expose OpenAI API. HuggingFace supports both serverless and dedicated endpoints for Text Generation. When creating dedicated endpoint select the Text Generation task. After the endpoint is initialized (for dedicated) or ready (for serverless), ensure it supports the OpenAI API and includes /v1/chat/completions part in URL. Then, copy the full endpoint URL for use. Recommended models for chat_completion and completion tasks:

  • Mistral-7B-Instruct-v0.2
  • QwQ-32B
  • Phi-3-mini-128k-instruct

For Elastic's rerank task: The selected model must support the sentence-ranking task and expose OpenAI API. HuggingFace supports only dedicated (not serverless) endpoints for Rerank so far. After the endpoint is initialized, copy the full endpoint URL for use. Tested models for rerank task:

  • bge-reranker-base
  • jina-reranker-v1-turbo-en-GGUF

Required authorization

  • Cluster privileges: manage_inference

Path parameters

  • task_type string

    The type of the inference task that the model will perform.

    Values are chat_completion, completion, rerank, or text_embedding.

  • huggingface_inference_id string Required

    The unique identifier of the inference endpoint.

application/json

Body

  • chunking_settings object

    Chunking configuration object

    Hide chunking_settings attributes Show chunking_settings attributes object
    • max_chunk_size number

      The maximum size of a chunk in words. This value cannot be higher than 300 or lower than 20 (for sentence strategy) or 10 (for word strategy).

    • overlap number

      The number of overlapping words for chunks. It is applicable only to a word chunking strategy. This value cannot be higher than half the max_chunk_size value.

    • sentence_overlap number

      The number of overlapping sentences for chunks. It is applicable only for a sentence chunking strategy. It can be either 1 or 0.

    • strategy string

      The chunking strategy: sentence or word.

  • service string Required

    Value is hugging_face.

  • service_settings object Required
    Hide service_settings attributes Show service_settings attributes object
    • api_key string Required

      A valid access token for your HuggingFace account. You can create or find your access tokens on the HuggingFace settings page.

      IMPORTANT: You need to provide the API key only once, during the inference model creation. The get inference endpoint API does not retrieve your API key. After creating the inference model, you cannot change the associated API key. If you want to use a different API key, delete the inference model and recreate it with the same name and the updated API key.

      External documentation
    • rate_limit object

      This setting helps to minimize the number of rate limit errors returned from the service.

      Hide rate_limit attribute Show rate_limit attribute object
      • requests_per_minute number

        The number of requests allowed per minute. By default, the number of requests allowed per minute is set by each service as follows:

        • alibabacloud-ai-search service: 1000
        • anthropic service: 50
        • azureaistudio service: 240
        • azureopenai service and task type text_embedding: 1440
        • azureopenai service and task type completion: 120
        • cohere service: 10000
        • elastic service and task type chat_completion: 240
        • googleaistudio service: 360
        • googlevertexai service: 30000
        • hugging_face service: 3000
        • jinaai service: 2000
        • mistral service: 240
        • openai service and task type text_embedding: 3000
        • openai service and task type completion: 500
        • voyageai service: 2000
        • watsonxai service: 120
    • url string Required

      The URL endpoint to use for the requests. For completion and chat_completion tasks, the deployed model must be compatible with the Hugging Face Chat Completion interface (see the linked external documentation for details). The endpoint URL for the request must include /v1/chat/completions. If the model supports the OpenAI Chat Completion schema, a toggle should appear in the interface. Enabling this toggle doesn't change any model behavior, it reveals the full endpoint URL needed (which should include /v1/chat/completions) when configuring the inference endpoint in Elasticsearch. If the model doesn't support this schema, the toggle may not be shown.

      External documentation
    • model_id string

      The name of the HuggingFace model to use for the inference task. For completion and chat_completion tasks, this field is optional but may be required for certain models — particularly when using serverless inference endpoints. For the text_embedding task, this field should not be included. Otherwise, the request will fail.

  • task_settings object
    Hide task_settings attributes Show task_settings attributes object
    • return_documents boolean

      For a rerank task, return doc text within the results.

    • top_n number

      For a rerank task, the number of most relevant documents to return. It defaults to the number of the documents.

Responses

  • 200 application/json
    Hide response attributes Show response attributes object
    • chunking_settings object

      Chunking configuration object

      Hide chunking_settings attributes Show chunking_settings attributes object
      • max_chunk_size number

        The maximum size of a chunk in words. This value cannot be higher than 300 or lower than 20 (for sentence strategy) or 10 (for word strategy).

      • overlap number

        The number of overlapping words for chunks. It is applicable only to a word chunking strategy. This value cannot be higher than half the max_chunk_size value.

      • sentence_overlap number

        The number of overlapping sentences for chunks. It is applicable only for a sentence chunking strategy. It can be either 1 or 0.

      • strategy string

        The chunking strategy: sentence or word.

    • service string Required

      The service type

    • service_settings object Required
    • task_settings object
    • inference_id string Required

      The inference Id

    • task_type string Required

      Values are chat_completion, completion, rerank, or text_embedding.

PUT /_inference/{task_type}/{huggingface_inference_id}
PUT _inference/text_embedding/hugging-face-embeddings
{
    "service": "hugging_face",
    "service_settings": {
        "api_key": "hugging-face-access-token", 
        "url": "url-endpoint" 
    }
}
resp = client.inference.put(
    task_type="text_embedding",
    inference_id="hugging-face-embeddings",
    inference_config={
        "service": "hugging_face",
        "service_settings": {
            "api_key": "hugging-face-access-token",
            "url": "url-endpoint"
        }
    },
)
const response = await client.inference.put({
  task_type: "text_embedding",
  inference_id: "hugging-face-embeddings",
  inference_config: {
    service: "hugging_face",
    service_settings: {
      api_key: "hugging-face-access-token",
      url: "url-endpoint",
    },
  },
});
response = client.inference.put(
  task_type: "text_embedding",
  inference_id: "hugging-face-embeddings",
  body: {
    "service": "hugging_face",
    "service_settings": {
      "api_key": "hugging-face-access-token",
      "url": "url-endpoint"
    }
  }
)
$resp = $client->inference()->put([
    "task_type" => "text_embedding",
    "inference_id" => "hugging-face-embeddings",
    "body" => [
        "service" => "hugging_face",
        "service_settings" => [
            "api_key" => "hugging-face-access-token",
            "url" => "url-endpoint",
        ],
    ],
]);
curl -X PUT -H "Authorization: ApiKey $ELASTIC_API_KEY" -H "Content-Type: application/json" -d '{"service":"hugging_face","service_settings":{"api_key":"hugging-face-access-token","url":"url-endpoint"}}' "$ELASTICSEARCH_URL/_inference/text_embedding/hugging-face-embeddings"
Request examples
Run `PUT _inference/text_embedding/hugging-face-embeddings` to create an inference endpoint that performs a `text_embedding` task type.
{
    "service": "hugging_face",
    "service_settings": {
        "api_key": "hugging-face-access-token", 
        "url": "url-endpoint" 
    }
}
Run `PUT _inference/rerank/hugging-face-rerank` to create an inference endpoint that performs a `rerank` task type.
{
    "service": "hugging_face",
    "service_settings": {
        "api_key": "hugging-face-access-token", 
        "url": "url-endpoint" 
    },
    "task_settings": {
        "return_documents": true,
        "top_n": 3
    }
}

































Get cluster info Generally available

GET /

Get basic build, version, and cluster information.

Required authorization

  • Cluster privileges: monitor

Responses

  • 200 application/json
    Hide response attributes Show response attributes object
    • cluster_name string Required
    • cluster_uuid string Required
    • name string Required
    • tagline string Required
    • version object Required
      Hide version attributes Show version attributes object
      • build_date string | number Required

        A date and time, either as a string whose format can depend on the context (defaulting to ISO 8601), or a number of milliseconds since the Epoch. Elasticsearch accepts both as input, but will generally output a string representation.

        One of:
      • build_flavor string Required

        The build flavor. For example, default.

      • build_hash string Required

        The Elasticsearch Git commit's SHA hash.

      • build_snapshot boolean Required

        Indicates whether the Elasticsearch build was a snapshot.

      • build_type string Required

        The build type that corresponds to how Elasticsearch was installed. For example, docker, rpm, or tar.

      • lucene_version string Required
      • minimum_index_compatibility_version string Required
      • minimum_wire_compatibility_version string Required
      • number string Required

        The Elasticsearch version number.

GET /
resp = client.info()
const response = await client.info();
response = client.info
$resp = $client->info();
curl -X GET -H "Authorization: ApiKey $ELASTIC_API_KEY" "$ELASTICSEARCH_URL/"
Response examples (200)
A successful response from `GET /`s.
{
  "name": "instance-0000000000",
  "cluster_name": "my_test_cluster",
  "cluster_uuid": "5QaxoN0pRZuOmWSxstBBwQ",
  "version": {
    "build_date": "2024-02-01T13:07:13.727175297Z",
    "minimum_wire_compatibility_version": "7.17.0",
    "build_hash": "6185ba65d27469afabc9bc951cded6c17c21e3f3",
    "number": "8.12.1",
    "lucene_version": "9.9.2",
    "minimum_index_compatibility_version": "7.0.0",
    "build_flavor": "default",
    "build_snapshot": false,
    "build_type": "docker"
  },
  "tagline": "You Know, for Search"
}











































Get Logstash pipelines Generally available

GET /_logstash/pipeline/{id}

Get pipelines that are used for Logstash Central Management.

Required authorization

  • Cluster privileges: manage_logstash_pipelines
External documentation

Path parameters

  • id string | array[string] Required

    A comma-separated list of pipeline identifiers.

Responses

  • 200 application/json
    Hide response attribute Show response attribute object
    • * object Additional properties
      Hide * attributes Show * attributes object
      • description string Required

        A description of the pipeline. This description is not used by Elasticsearch or Logstash.

      • last_modified string | number Required

        A date and time, either as a string whose format can depend on the context (defaulting to ISO 8601), or a number of milliseconds since the Epoch. Elasticsearch accepts both as input, but will generally output a string representation.

        One of:
      • pipeline string Required

        The configuration for the pipeline.

        External documentation
      • pipeline_metadata object Required
        Hide pipeline_metadata attributes Show pipeline_metadata attributes object
        • type string Required
        • version string Required
      • pipeline_settings object Required
        Hide pipeline_settings attributes Show pipeline_settings attributes object
        • pipeline.workers number Required

          The number of workers that will, in parallel, execute the filter and output stages of the pipeline.

        • pipeline.batch.size number Required

          The maximum number of events an individual worker thread will collect from inputs before attempting to execute its filters and outputs.

        • pipeline.batch.delay number Required

          When creating pipeline event batches, how long in milliseconds to wait for each event before dispatching an undersized batch to pipeline workers.

        • queue.type string Required

          The internal queuing model to use for event buffering.

        • queue.max_bytes string Required

          The total capacity of the queue (queue.type: persisted) in number of bytes.

        • queue.checkpoint.writes number Required

          The maximum number of written events before forcing a checkpoint when persistent queues are enabled (queue.type: persisted).

      • username string Required

        The user who last updated the pipeline.

GET /_logstash/pipeline/{id}
GET _logstash/pipeline/my_pipeline
resp = client.logstash.get_pipeline(
    id="my_pipeline",
)
const response = await client.logstash.getPipeline({
  id: "my_pipeline",
});
response = client.logstash.get_pipeline(
  id: "my_pipeline"
)
$resp = $client->logstash()->getPipeline([
    "id" => "my_pipeline",
]);
curl -X GET -H "Authorization: ApiKey $ELASTIC_API_KEY" "$ELASTICSEARCH_URL/_logstash/pipeline/my_pipeline"
Response examples (200)
A successful response from `GET _logstash/pipeline/my_pipeline`.
{
  "my_pipeline": {
    "description": "Sample pipeline for illustration purposes",
    "last_modified": "2021-01-02T02:50:51.250Z",
    "pipeline_metadata": {
      "type": "logstash_pipeline",
      "version": "1"
    },
    "username": "elastic",
    "pipeline": "input {}\\n filter { grok {} }\\n output {}",
    "pipeline_settings": {
      "pipeline.workers": 1,
      "pipeline.batch.size": 125,
      "pipeline.batch.delay": 50,
      "queue.type": "memory",
      "queue.max_bytes": "1gb",
      "queue.checkpoint.writes": 1024
    }
  }
}








Get Logstash pipelines Generally available

GET /_logstash/pipeline

Get pipelines that are used for Logstash Central Management.

Required authorization

  • Cluster privileges: manage_logstash_pipelines
External documentation

Responses

  • 200 application/json
    Hide response attribute Show response attribute object
    • * object Additional properties
      Hide * attributes Show * attributes object
      • description string Required

        A description of the pipeline. This description is not used by Elasticsearch or Logstash.

      • last_modified string | number Required

        A date and time, either as a string whose format can depend on the context (defaulting to ISO 8601), or a number of milliseconds since the Epoch. Elasticsearch accepts both as input, but will generally output a string representation.

        One of:
      • pipeline string Required

        The configuration for the pipeline.

        External documentation
      • pipeline_metadata object Required
        Hide pipeline_metadata attributes Show pipeline_metadata attributes object
        • type string Required
        • version string Required
      • pipeline_settings object Required
        Hide pipeline_settings attributes Show pipeline_settings attributes object
        • pipeline.workers number Required

          The number of workers that will, in parallel, execute the filter and output stages of the pipeline.

        • pipeline.batch.size number Required

          The maximum number of events an individual worker thread will collect from inputs before attempting to execute its filters and outputs.

        • pipeline.batch.delay number Required

          When creating pipeline event batches, how long in milliseconds to wait for each event before dispatching an undersized batch to pipeline workers.

        • queue.type string Required

          The internal queuing model to use for event buffering.

        • queue.max_bytes string Required

          The total capacity of the queue (queue.type: persisted) in number of bytes.

        • queue.checkpoint.writes number Required

          The maximum number of written events before forcing a checkpoint when persistent queues are enabled (queue.type: persisted).

      • username string Required

        The user who last updated the pipeline.

GET _logstash/pipeline/my_pipeline
resp = client.logstash.get_pipeline(
    id="my_pipeline",
)
const response = await client.logstash.getPipeline({
  id: "my_pipeline",
});
response = client.logstash.get_pipeline(
  id: "my_pipeline"
)
$resp = $client->logstash()->getPipeline([
    "id" => "my_pipeline",
]);
curl -X GET -H "Authorization: ApiKey $ELASTIC_API_KEY" "$ELASTICSEARCH_URL/_logstash/pipeline/my_pipeline"
Response examples (200)
A successful response from `GET _logstash/pipeline/my_pipeline`.
{
  "my_pipeline": {
    "description": "Sample pipeline for illustration purposes",
    "last_modified": "2021-01-02T02:50:51.250Z",
    "pipeline_metadata": {
      "type": "logstash_pipeline",
      "version": "1"
    },
    "username": "elastic",
    "pipeline": "input {}\\n filter { grok {} }\\n output {}",
    "pipeline_settings": {
      "pipeline.workers": 1,
      "pipeline.batch.size": 125,
      "pipeline.batch.delay": 50,
      "queue.type": "memory",
      "queue.max_bytes": "1gb",
      "queue.checkpoint.writes": 1024
    }
  }
}

Machine learning anomaly detection





Get calendar configuration info Generally available

GET /_ml/calendars/{calendar_id}

Required authorization

  • Cluster privileges: monitor_ml

Path parameters

  • calendar_id string Required

    A string that uniquely identifies a calendar. You can get information for multiple calendars by using a comma-separated list of ids or a wildcard expression. You can get information for all calendars by using _all or * or by omitting the calendar identifier.

Query parameters

  • from number

    Skips the specified number of calendars. This parameter is supported only when you omit the calendar identifier.

  • size number

    Specifies the maximum number of calendars to obtain. This parameter is supported only when you omit the calendar identifier.

application/json

Body

  • page object
    Hide page attributes Show page attributes object
    • from number

      Skips the specified number of items.

    • size number

      Specifies the maximum number of items to obtain.

Responses

  • 200 application/json
    Hide response attributes Show response attributes object
    • calendars array[object] Required
      Hide calendars attributes Show calendars attributes object
      • calendar_id string Required
      • description string

        A description of the calendar.

      • job_ids array[string] Required

        An array of anomaly detection job identifiers.

    • count number Required
GET /_ml/calendars/{calendar_id}
GET _ml/calendars/planned-outages
resp = client.ml.get_calendars(
    calendar_id="planned-outages",
)
const response = await client.ml.getCalendars({
  calendar_id: "planned-outages",
});
response = client.ml.get_calendars(
  calendar_id: "planned-outages"
)
$resp = $client->ml()->getCalendars([
    "calendar_id" => "planned-outages",
]);
curl -X GET -H "Authorization: ApiKey $ELASTIC_API_KEY" "$ELASTICSEARCH_URL/_ml/calendars/planned-outages"

Create a calendar Generally available

PUT /_ml/calendars/{calendar_id}

Required authorization

  • Cluster privileges: manage_ml

Path parameters

  • calendar_id string Required

    A string that uniquely identifies a calendar.

application/json

Body

  • job_ids array[string]

    An array of anomaly detection job identifiers.

  • description string

    A description of the calendar.

Responses

  • 200 application/json
    Hide response attributes Show response attributes object
PUT /_ml/calendars/{calendar_id}
PUT _ml/calendars/planned-outages
resp = client.ml.put_calendar(
    calendar_id="planned-outages",
)
const response = await client.ml.putCalendar({
  calendar_id: "planned-outages",
});
response = client.ml.put_calendar(
  calendar_id: "planned-outages"
)
$resp = $client->ml()->putCalendar([
    "calendar_id" => "planned-outages",
]);
curl -X PUT -H "Authorization: ApiKey $ELASTIC_API_KEY" "$ELASTICSEARCH_URL/_ml/calendars/planned-outages"

Get calendar configuration info Generally available

POST /_ml/calendars/{calendar_id}

Required authorization

  • Cluster privileges: monitor_ml

Path parameters

  • calendar_id string Required

    A string that uniquely identifies a calendar. You can get information for multiple calendars by using a comma-separated list of ids or a wildcard expression. You can get information for all calendars by using _all or * or by omitting the calendar identifier.

Query parameters

  • from number

    Skips the specified number of calendars. This parameter is supported only when you omit the calendar identifier.

  • size number

    Specifies the maximum number of calendars to obtain. This parameter is supported only when you omit the calendar identifier.

application/json

Body

  • page object
    Hide page attributes Show page attributes object
    • from number

      Skips the specified number of items.

    • size number

      Specifies the maximum number of items to obtain.

Responses

  • 200 application/json
    Hide response attributes Show response attributes object
    • calendars array[object] Required
      Hide calendars attributes Show calendars attributes object
      • calendar_id string Required
      • description string

        A description of the calendar.

      • job_ids array[string] Required

        An array of anomaly detection job identifiers.

    • count number Required
POST /_ml/calendars/{calendar_id}
GET _ml/calendars/planned-outages
resp = client.ml.get_calendars(
    calendar_id="planned-outages",
)
const response = await client.ml.getCalendars({
  calendar_id: "planned-outages",
});
response = client.ml.get_calendars(
  calendar_id: "planned-outages"
)
$resp = $client->ml()->getCalendars([
    "calendar_id" => "planned-outages",
]);
curl -X GET -H "Authorization: ApiKey $ELASTIC_API_KEY" "$ELASTICSEARCH_URL/_ml/calendars/planned-outages"
















Get datafeeds configuration info Generally available

GET /_ml/datafeeds/{datafeed_id}

You can get information for multiple datafeeds in a single API request by using a comma-separated list of datafeeds or a wildcard expression. You can get information for all datafeeds by using _all, by specifying * as the <feed_id>, or by omitting the <feed_id>. This API returns a maximum of 10,000 datafeeds.

Required authorization

  • Cluster privileges: monitor_ml

Path parameters

  • datafeed_id string | array[string] Required

    Identifier for the datafeed. It can be a datafeed identifier or a wildcard expression. If you do not specify one of these options, the API returns information about all datafeeds.

Query parameters

  • allow_no_match boolean

    Specifies what to do when the request:

    1. Contains wildcard expressions and there are no datafeeds that match.
    2. Contains the _all string or no identifiers and there are no matches.
    3. Contains wildcard expressions and there are only partial matches.

    The default value is true, which returns an empty datafeeds array when there are no matches and the subset of results when there are partial matches. If this parameter is false, the request returns a 404 status code when there are no matches or only partial matches.

  • exclude_generated boolean

    Indicates if certain fields should be removed from the configuration on retrieval. This allows the configuration to be in an acceptable format to be retrieved and then added to another cluster.

Responses

  • 200 application/json
    Hide response attributes Show response attributes object
    • count number Required
    • datafeeds array[object] Required
      Hide datafeeds attributes Show datafeeds attributes object
      • aggregations object
      • authorization object
        Hide authorization attributes Show authorization attributes object
        • api_key object
          Hide api_key attributes Show api_key attributes object
          • id string Required

            The identifier for the API key.

          • name string Required

            The name of the API key.

        • roles array[string]

          If a user ID was used for the most recent update to the datafeed, its roles at the time of the update are listed in the response.

        • service_account string

          If a service account was used for the most recent update to the datafeed, the account name is listed in the response.

      • chunking_config object
        Hide chunking_config attributes Show chunking_config attributes object
        • mode string Required

          Values are auto, manual, or off.

        • time_span string

          A duration. Units can be nanos, micros, ms (milliseconds), s (seconds), m (minutes), h (hours) and d (days). Also accepts "0" without a unit and "-1" to indicate an unspecified value.

      • datafeed_id string Required
      • frequency string

        A duration. Units can be nanos, micros, ms (milliseconds), s (seconds), m (minutes), h (hours) and d (days). Also accepts "0" without a unit and "-1" to indicate an unspecified value.

      • indices array[string] Required
      • indexes array[string]
      • job_id string Required
      • max_empty_searches number
      • query_delay string

        A duration. Units can be nanos, micros, ms (milliseconds), s (seconds), m (minutes), h (hours) and d (days). Also accepts "0" without a unit and "-1" to indicate an unspecified value.

      • script_fields object
        Hide script_fields attribute Show script_fields attribute object
        • * object Additional properties
          Hide * attributes Show * attributes object
          • script object Required
            Hide script attributes Show script attributes object
            • source
            • id string
            • params object

              Specifies any named parameters that are passed into the script as variables. Use parameters instead of hard-coded values to decrease compile time.

            • lang
            • options object
          • ignore_failure boolean
      • scroll_size number
      • delayed_data_check_config object Required
        Hide delayed_data_check_config attributes Show delayed_data_check_config attributes object
        • check_window string

          A duration. Units can be nanos, micros, ms (milliseconds), s (seconds), m (minutes), h (hours) and d (days). Also accepts "0" without a unit and "-1" to indicate an unspecified value.

        • enabled boolean Required

          Specifies whether the datafeed periodically checks for delayed data.

      • runtime_mappings object
        Hide runtime_mappings attribute Show runtime_mappings attribute object
        • * object Additional properties
          Hide * attributes Show * attributes object
          • fields object

            For type composite

            Hide fields attribute Show fields attribute object
            • * object Additional properties
          • fetch_fields array[object]

            For type lookup

          • format string

            A custom format for date type runtime fields.

          • input_field string

            Path to field or array of paths. Some API's support wildcards in the path to select multiple fields.

          • target_field string

            Path to field or array of paths. Some API's support wildcards in the path to select multiple fields.

          • target_index string
          • script object
            Hide script attributes Show script attributes object
            • source
            • id string
            • params object

              Specifies any named parameters that are passed into the script as variables. Use parameters instead of hard-coded values to decrease compile time.

            • lang
            • options object
          • type string Required

            Values are boolean, composite, date, double, geo_point, geo_shape, ip, keyword, long, or lookup.

      • indices_options object

        Controls how to deal with unavailable concrete indices (closed or missing), how wildcard expressions are expanded to actual indices (all, closed or open indices) and how to deal with wildcard expressions that resolve to no indices.

        Hide indices_options attributes Show indices_options attributes object
        • allow_no_indices boolean

          If false, the request returns an error if any wildcard expression, index alias, or _all value targets only missing or closed indices. This behavior applies even if the request targets other open indices. For example, a request targeting foo*,bar* returns an error if an index starts with foo but no index starts with bar.

        • expand_wildcards string | array[string]
        • ignore_unavailable boolean

          If true, missing or closed indices are not included in the response.

        • ignore_throttled boolean

          If true, concrete, expanded or aliased indices are ignored when frozen.

      • query object Required

        The Elasticsearch query domain-specific language (DSL). This value corresponds to the query object in an Elasticsearch search POST body. All the options that are supported by Elasticsearch can be used, as this object is passed verbatim to Elasticsearch. By default, this property has the following value: {"match_all": {"boost": 1}}.

        Query DSL
GET /_ml/datafeeds/{datafeed_id}
GET _ml/datafeeds/datafeed-high_sum_total_sales
resp = client.ml.get_datafeeds(
    datafeed_id="datafeed-high_sum_total_sales",
)
const response = await client.ml.getDatafeeds({
  datafeed_id: "datafeed-high_sum_total_sales",
});
response = client.ml.get_datafeeds(
  datafeed_id: "datafeed-high_sum_total_sales"
)
$resp = $client->ml()->getDatafeeds([
    "datafeed_id" => "datafeed-high_sum_total_sales",
]);
curl -X GET -H "Authorization: ApiKey $ELASTIC_API_KEY" "$ELASTICSEARCH_URL/_ml/datafeeds/datafeed-high_sum_total_sales"
















































Get calendar configuration info Generally available

GET /_ml/calendars

Required authorization

  • Cluster privileges: monitor_ml

Query parameters

  • from number

    Skips the specified number of calendars. This parameter is supported only when you omit the calendar identifier.

  • size number

    Specifies the maximum number of calendars to obtain. This parameter is supported only when you omit the calendar identifier.

application/json

Body

  • page object
    Hide page attributes Show page attributes object
    • from number

      Skips the specified number of items.

    • size number

      Specifies the maximum number of items to obtain.

Responses

  • 200 application/json
    Hide response attributes Show response attributes object
    • calendars array[object] Required
      Hide calendars attributes Show calendars attributes object
      • calendar_id string Required
      • description string

        A description of the calendar.

      • job_ids array[string] Required

        An array of anomaly detection job identifiers.

    • count number Required
GET _ml/calendars/planned-outages
resp = client.ml.get_calendars(
    calendar_id="planned-outages",
)
const response = await client.ml.getCalendars({
  calendar_id: "planned-outages",
});
response = client.ml.get_calendars(
  calendar_id: "planned-outages"
)
$resp = $client->ml()->getCalendars([
    "calendar_id" => "planned-outages",
]);
curl -X GET -H "Authorization: ApiKey $ELASTIC_API_KEY" "$ELASTICSEARCH_URL/_ml/calendars/planned-outages"

Get calendar configuration info Generally available

POST /_ml/calendars

Required authorization

  • Cluster privileges: monitor_ml

Query parameters

  • from number

    Skips the specified number of calendars. This parameter is supported only when you omit the calendar identifier.

  • size number

    Specifies the maximum number of calendars to obtain. This parameter is supported only when you omit the calendar identifier.

application/json

Body

  • page object
    Hide page attributes Show page attributes object
    • from number

      Skips the specified number of items.

    • size number

      Specifies the maximum number of items to obtain.

Responses

  • 200 application/json
    Hide response attributes Show response attributes object
    • calendars array[object] Required
      Hide calendars attributes Show calendars attributes object
      • calendar_id string Required
      • description string

        A description of the calendar.

      • job_ids array[string] Required

        An array of anomaly detection job identifiers.

    • count number Required
GET _ml/calendars/planned-outages
resp = client.ml.get_calendars(
    calendar_id="planned-outages",
)
const response = await client.ml.getCalendars({
  calendar_id: "planned-outages",
});
response = client.ml.get_calendars(
  calendar_id: "planned-outages"
)
$resp = $client->ml()->getCalendars([
    "calendar_id" => "planned-outages",
]);
curl -X GET -H "Authorization: ApiKey $ELASTIC_API_KEY" "$ELASTICSEARCH_URL/_ml/calendars/planned-outages"

Get datafeed stats Generally available

GET /_ml/datafeeds/{datafeed_id}/_stats

You can get statistics for multiple datafeeds in a single API request by using a comma-separated list of datafeeds or a wildcard expression. You can get statistics for all datafeeds by using _all, by specifying * as the <feed_id>, or by omitting the <feed_id>. If the datafeed is stopped, the only information you receive is the datafeed_id and the state. This API returns a maximum of 10,000 datafeeds.

Required authorization

  • Cluster privileges: monitor_ml

Path parameters

  • datafeed_id string | array[string] Required

    Identifier for the datafeed. It can be a datafeed identifier or a wildcard expression. If you do not specify one of these options, the API returns information about all datafeeds.

Query parameters

  • allow_no_match boolean

    Specifies what to do when the request:

    1. Contains wildcard expressions and there are no datafeeds that match.
    2. Contains the _all string or no identifiers and there are no matches.
    3. Contains wildcard expressions and there are only partial matches.

    The default value is true, which returns an empty datafeeds array when there are no matches and the subset of results when there are partial matches. If this parameter is false, the request returns a 404 status code when there are no matches or only partial matches.

Responses

  • 200 application/json
    Hide response attributes Show response attributes object
    • count number Required
    • datafeeds array[object] Required
      Hide datafeeds attributes Show datafeeds attributes object
      • assignment_explanation string

        For started datafeeds only, contains messages relating to the selection of a node.

      • datafeed_id string Required
      • state string Required

        Values are started, stopped, starting, or stopping.

      • timing_stats object
        Hide timing_stats attributes Show timing_stats attributes object
        • bucket_count number Required

          The number of buckets processed.

        • exponential_average_search_time_per_hour_ms number

          Time unit for fractional milliseconds

        • exponential_average_calculation_context object
          Hide exponential_average_calculation_context attributes Show exponential_average_calculation_context attributes object
          • incremental_metric_value_ms number

            Time unit for fractional milliseconds

          • latest_timestamp number

            Time unit for milliseconds

          • previous_exponential_average_ms number

            Time unit for fractional milliseconds

        • job_id string Required
        • search_count number Required

          The number of searches run by the datafeed.

        • total_search_time_ms number

          Time unit for fractional milliseconds

        • average_search_time_per_bucket_ms number

          Time unit for fractional milliseconds

      • running_state object
        Hide running_state attributes Show running_state attributes object
        • real_time_configured boolean Required

          Indicates if the datafeed is "real-time"; meaning that the datafeed has no configured end time.

        • real_time_running boolean Required

          Indicates whether the datafeed has finished running on the available past data. For datafeeds without a configured end time, this means that the datafeed is now running on "real-time" data.

        • search_interval object
          Hide search_interval attributes Show search_interval attributes object
          • end string

            A duration. Units can be nanos, micros, ms (milliseconds), s (seconds), m (minutes), h (hours) and d (days). Also accepts "0" without a unit and "-1" to indicate an unspecified value.

          • end_ms number

            Time unit for milliseconds

          • start string

            A duration. Units can be nanos, micros, ms (milliseconds), s (seconds), m (minutes), h (hours) and d (days). Also accepts "0" without a unit and "-1" to indicate an unspecified value.

          • start_ms number

            Time unit for milliseconds

GET /_ml/datafeeds/{datafeed_id}/_stats
GET _ml/datafeeds/datafeed-high_sum_total_sales/_stats
resp = client.ml.get_datafeed_stats(
    datafeed_id="datafeed-high_sum_total_sales",
)
const response = await client.ml.getDatafeedStats({
  datafeed_id: "datafeed-high_sum_total_sales",
});
response = client.ml.get_datafeed_stats(
  datafeed_id: "datafeed-high_sum_total_sales"
)
$resp = $client->ml()->getDatafeedStats([
    "datafeed_id" => "datafeed-high_sum_total_sales",
]);
curl -X GET -H "Authorization: ApiKey $ELASTIC_API_KEY" "$ELASTICSEARCH_URL/_ml/datafeeds/datafeed-high_sum_total_sales/_stats"








Get filters Generally available

GET /_ml/filters

You can get a single filter or all filters.

Required authorization

  • Cluster privileges: manage_ml

Query parameters

  • from number

    Skips the specified number of filters.

  • size number

    Specifies the maximum number of filters to obtain.

Responses

  • 200 application/json
    Hide response attributes Show response attributes object
    • count number Required
    • filters array[object] Required
      Hide filters attributes Show filters attributes object
      • description string

        A description of the filter.

      • filter_id string Required
      • items array[string] Required

        An array of strings which is the filter item list.

GET _ml/filters/safe_domains
resp = client.ml.get_filters(
    filter_id="safe_domains",
)
const response = await client.ml.getFilters({
  filter_id: "safe_domains",
});
response = client.ml.get_filters(
  filter_id: "safe_domains"
)
$resp = $client->ml()->getFilters([
    "filter_id" => "safe_domains",
]);
curl -X GET -H "Authorization: ApiKey $ELASTIC_API_KEY" "$ELASTICSEARCH_URL/_ml/filters/safe_domains"
















Get overall bucket results Generally available

POST /_ml/anomaly_detectors/{job_id}/results/overall_buckets

Retrievs overall bucket results that summarize the bucket results of multiple anomaly detection jobs.

The overall_score is calculated by combining the scores of all the buckets within the overall bucket span. First, the maximum anomaly_score per anomaly detection job in the overall bucket is calculated. Then the top_n of those scores are averaged to result in the overall_score. This means that you can fine-tune the overall_score so that it is more or less sensitive to the number of jobs that detect an anomaly at the same time. For example, if you set top_n to 1, the overall_score is the maximum bucket score in the overall bucket. Alternatively, if you set top_n to the number of jobs, the overall_score is high only when all jobs detect anomalies in that overall bucket. If you set the bucket_span parameter (to a value greater than its default), the overall_score is the maximum overall_score of the overall buckets that have a span equal to the jobs' largest bucket span.

Required authorization

  • Cluster privileges: monitor_ml

Path parameters

  • job_id string Required

    Identifier for the anomaly detection job. It can be a job identifier, a group name, a comma-separated list of jobs or groups, or a wildcard expression.

    You can summarize the bucket results for all anomaly detection jobs by using _all or by specifying * as the <job_id>.

Query parameters

  • allow_no_match boolean

    Specifies what to do when the request:

    1. Contains wildcard expressions and there are no jobs that match.
    2. Contains the _all string or no identifiers and there are no matches.
    3. Contains wildcard expressions and there are only partial matches.

    If true, the request returns an empty jobs array when there are no matches and the subset of results when there are partial matches. If this parameter is false, the request returns a 404 status code when there are no matches or only partial matches.

  • bucket_span string

    The span of the overall buckets. Must be greater or equal to the largest bucket span of the specified anomaly detection jobs, which is the default value.

    By default, an overall bucket has a span equal to the largest bucket span of the specified anomaly detection jobs. To override that behavior, use the optional bucket_span parameter.

    Values are -1 or 0.

  • end string | number

    Returns overall buckets with timestamps earlier than this time.

  • exclude_interim boolean

    If true, the output excludes interim results.

  • overall_score number | string

    Returns overall buckets with overall scores greater than or equal to this value.

  • start string | number

    Returns overall buckets with timestamps after this time.

  • top_n number

    The number of top anomaly detection job bucket scores to be used in the overall_score calculation.

application/json

Body

  • allow_no_match boolean

    Refer to the description for the allow_no_match query parameter.

  • bucket_span string

    A duration. Units can be nanos, micros, ms (milliseconds), s (seconds), m (minutes), h (hours) and d (days). Also accepts "0" without a unit and "-1" to indicate an unspecified value.

  • end string | number

    A date and time, either as a string whose format can depend on the context (defaulting to ISO 8601), or a number of milliseconds since the Epoch. Elasticsearch accepts both as input, but will generally output a string representation.

    One of:
  • exclude_interim boolean

    Refer to the description for the exclude_interim query parameter.

  • overall_score number | string

    Refer to the description for the overall_score query parameter.

  • start string | number

    A date and time, either as a string whose format can depend on the context (defaulting to ISO 8601), or a number of milliseconds since the Epoch. Elasticsearch accepts both as input, but will generally output a string representation.

    One of:
  • top_n number

    Refer to the description for the top_n query parameter.

Responses

  • 200 application/json
    Hide response attributes Show response attributes object
    • count number Required
    • overall_buckets array[object] Required

      Array of overall bucket objects

      Hide overall_buckets attributes Show overall_buckets attributes object
      • bucket_span number

        Time unit for seconds

      • is_interim boolean Required

        If true, this is an interim result. In other words, the results are calculated based on partial input data.

      • jobs array[object] Required

        An array of objects that contain the max_anomaly_score per job_id.

        Hide jobs attributes Show jobs attributes object
        • job_id string Required
        • max_anomaly_score number Required
      • overall_score number Required

        The top_n average of the maximum bucket anomaly_score per job.

      • result_type string Required

        Internal. This is always set to overall_bucket.

      • timestamp number

        Time unit for milliseconds

      • timestamp_string string | number

        A date and time, either as a string whose format can depend on the context (defaulting to ISO 8601), or a number of milliseconds since the Epoch. Elasticsearch accepts both as input, but will generally output a string representation.

        One of:
POST /_ml/anomaly_detectors/{job_id}/results/overall_buckets
GET _ml/anomaly_detectors/job-*/results/overall_buckets
{
  "overall_score": 80,
  "start": "1403532000000"
}
resp = client.ml.get_overall_buckets(
    job_id="job-*",
    overall_score=80,
    start="1403532000000",
)
const response = await client.ml.getOverallBuckets({
  job_id: "job-*",
  overall_score: 80,
  start: 1403532000000,
});
response = client.ml.get_overall_buckets(
  job_id: "job-*",
  body: {
    "overall_score": 80,
    "start": "1403532000000"
  }
)
$resp = $client->ml()->getOverallBuckets([
    "job_id" => "job-*",
    "body" => [
        "overall_score" => 80,
        "start" => "1403532000000",
    ],
]);
curl -X GET -H "Authorization: ApiKey $ELASTIC_API_KEY" -H "Content-Type: application/json" -d '{"overall_score":80,"start":"1403532000000"}' "$ELASTICSEARCH_URL/_ml/anomaly_detectors/job-*/results/overall_buckets"
Request example
An example body for a `GET _ml/anomaly_detectors/job-*/results/overall_buckets` request.
{
  "overall_score": 80,
  "start": "1403532000000"
}
























Start datafeeds Generally available

POST /_ml/datafeeds/{datafeed_id}/_start

A datafeed must be started in order to retrieve data from Elasticsearch. A datafeed can be started and stopped multiple times throughout its lifecycle.

Before you can start a datafeed, the anomaly detection job must be open. Otherwise, an error occurs.

If you restart a stopped datafeed, it continues processing input data from the next millisecond after it was stopped. If new data was indexed for that exact millisecond between stopping and starting, it will be ignored.

When Elasticsearch security features are enabled, your datafeed remembers which roles the last user to create or update it had at the time of creation or update and runs the query using those same roles. If you provided secondary authorization headers when you created or updated the datafeed, those credentials are used instead.

Required authorization

  • Cluster privileges: manage_ml

Path parameters

  • datafeed_id string Required

    A numerical character string that uniquely identifies the datafeed. This identifier can contain lowercase alphanumeric characters (a-z and 0-9), hyphens, and underscores. It must start and end with alphanumeric characters.

Query parameters

  • end string | number

    The time that the datafeed should end, which can be specified by using one of the following formats:

    • ISO 8601 format with milliseconds, for example 2017-01-22T06:00:00.000Z
    • ISO 8601 format without milliseconds, for example 2017-01-22T06:00:00+00:00
    • Milliseconds since the epoch, for example 1485061200000

    Date-time arguments using either of the ISO 8601 formats must have a time zone designator, where Z is accepted as an abbreviation for UTC time. When a URL is expected (for example, in browsers), the + used in time zone designators must be encoded as %2B. The end time value is exclusive. If you do not specify an end time, the datafeed runs continuously.

  • start string | number

    The time that the datafeed should begin, which can be specified by using the same formats as the end parameter. This value is inclusive. If you do not specify a start time and the datafeed is associated with a new anomaly detection job, the analysis starts from the earliest time for which data is available. If you restart a stopped datafeed and specify a start value that is earlier than the timestamp of the latest processed record, the datafeed continues from 1 millisecond after the timestamp of the latest processed record.

  • timeout string

    Specifies the amount of time to wait until a datafeed starts.

    Values are -1 or 0.

application/json

Body

  • end string | number

    A date and time, either as a string whose format can depend on the context (defaulting to ISO 8601), or a number of milliseconds since the Epoch. Elasticsearch accepts both as input, but will generally output a string representation.

    One of:
  • start string | number

    A date and time, either as a string whose format can depend on the context (defaulting to ISO 8601), or a number of milliseconds since the Epoch. Elasticsearch accepts both as input, but will generally output a string representation.

    One of:
  • timeout string

    A duration. Units can be nanos, micros, ms (milliseconds), s (seconds), m (minutes), h (hours) and d (days). Also accepts "0" without a unit and "-1" to indicate an unspecified value.

Responses

  • 200 application/json
    Hide response attributes Show response attributes object
    • node string | array[string] Required

    • started boolean Required

      For a successful response, this value is always true. On failure, an exception is returned instead.

POST /_ml/datafeeds/{datafeed_id}/_start
POST _ml/datafeeds/datafeed-low_request_rate/_start
{
  "start": "2019-04-07T18:22:16Z"
}
resp = client.ml.start_datafeed(
    datafeed_id="datafeed-low_request_rate",
    start="2019-04-07T18:22:16Z",
)
const response = await client.ml.startDatafeed({
  datafeed_id: "datafeed-low_request_rate",
  start: "2019-04-07T18:22:16Z",
});
response = client.ml.start_datafeed(
  datafeed_id: "datafeed-low_request_rate",
  body: {
    "start": "2019-04-07T18:22:16Z"
  }
)
$resp = $client->ml()->startDatafeed([
    "datafeed_id" => "datafeed-low_request_rate",
    "body" => [
        "start" => "2019-04-07T18:22:16Z",
    ],
]);
curl -X POST -H "Authorization: ApiKey $ELASTIC_API_KEY" -H "Content-Type: application/json" -d '{"start":"2019-04-07T18:22:16Z"}' "$ELASTICSEARCH_URL/_ml/datafeeds/datafeed-low_request_rate/_start"
Request example
An example body for a `POST _ml/datafeeds/datafeed-low_request_rate/_start` request.
{
  "start": "2019-04-07T18:22:16Z"
}





















Create a data frame analytics job Generally available

PUT /_ml/data_frame/analytics/{id}

This API creates a data frame analytics job that performs an analysis on the source indices and stores the outcome in a destination index. By default, the query used in the source configuration is {"match_all": {}}.

If the destination index does not exist, it is created automatically when you start the job.

If you supply only a subset of the regression or classification parameters, hyperparameter optimization occurs. It determines a value for each of the undefined parameters.

Required authorization

  • Index privileges: create_index,index,manage,read,view_index_metadata
  • Cluster privileges: manage_ml

Path parameters

  • id string Required

    Identifier for the data frame analytics job. This identifier can contain lowercase alphanumeric characters (a-z and 0-9), hyphens, and underscores. It must start and end with alphanumeric characters.

application/json

Body Required

  • allow_lazy_start boolean

    Specifies whether this job can start when there is insufficient machine learning node capacity for it to be immediately assigned to a node. If set to false and a machine learning node with capacity to run the job cannot be immediately found, the API returns an error. If set to true, the API does not return an error; the job waits in the starting state until sufficient machine learning node capacity is available. This behavior is also affected by the cluster-wide xpack.ml.max_lazy_ml_nodes setting.

  • analysis object Required
    Hide analysis attributes Show analysis attributes object
    • classification object
      Hide classification attributes Show classification attributes object
      • alpha number

        Advanced configuration option. Machine learning uses loss guided tree growing, which means that the decision trees grow where the regularized loss decreases most quickly. This parameter affects loss calculations by acting as a multiplier of the tree depth. Higher alpha values result in shallower trees and faster training times. By default, this value is calculated during hyperparameter optimization. It must be greater than or equal to zero.

      • dependent_variable string Required

        Defines which field of the document is to be predicted. It must match one of the fields in the index being used to train. If this field is missing from a document, then that document will not be used for training, but a prediction with the trained model will be generated for it. It is also known as continuous target variable. For classification analysis, the data type of the field must be numeric (integer, short, long, byte), categorical (ip or keyword), or boolean. There must be no more than 30 different values in this field. For regression analysis, the data type of the field must be numeric.

      • downsample_factor number

        Advanced configuration option. Controls the fraction of data that is used to compute the derivatives of the loss function for tree training. A small value results in the use of a small fraction of the data. If this value is set to be less than 1, accuracy typically improves. However, too small a value may result in poor convergence for the ensemble and so require more trees. By default, this value is calculated during hyperparameter optimization. It must be greater than zero and less than or equal to 1.

      • early_stopping_enabled boolean

        Advanced configuration option. Specifies whether the training process should finish if it is not finding any better performing models. If disabled, the training process can take significantly longer and the chance of finding a better performing model is unremarkable.

      • eta number

        Advanced configuration option. The shrinkage applied to the weights. Smaller values result in larger forests which have a better generalization error. However, larger forests cause slower training. By default, this value is calculated during hyperparameter optimization. It must be a value between 0.001 and 1.

      • eta_growth_rate_per_tree number

        Advanced configuration option. Specifies the rate at which eta increases for each new tree that is added to the forest. For example, a rate of 1.05 increases eta by 5% for each extra tree. By default, this value is calculated during hyperparameter optimization. It must be between 0.5 and 2.

      • feature_bag_fraction number

        Advanced configuration option. Defines the fraction of features that will be used when selecting a random bag for each candidate split. By default, this value is calculated during hyperparameter optimization.

      • feature_processors array[object]

        Advanced configuration option. A collection of feature preprocessors that modify one or more included fields. The analysis uses the resulting one or more features instead of the original document field. However, these features are ephemeral; they are not stored in the destination index. Multiple feature_processors entries can refer to the same document fields. Automatic categorical feature encoding still occurs for the fields that are unprocessed by a custom processor or that have categorical values. Use this property only if you want to override the automatic feature encoding of the specified fields.

        Hide feature_processors attributes Show feature_processors attributes object
        • frequency_encoding object
          Hide frequency_encoding attributes Show frequency_encoding attributes object
          • feature_name string Required
          • field string Required

            Path to field or array of paths. Some API's support wildcards in the path to select multiple fields.

          • frequency_map object Required

            The resulting frequency map for the field value. If the field value is missing from the frequency_map, the resulting value is 0.

        • multi_encoding object
          Hide multi_encoding attribute Show multi_encoding attribute object
          • processors array[number] Required

            The ordered array of custom processors to execute. Must be more than 1.

        • n_gram_encoding object
          Hide n_gram_encoding attributes Show n_gram_encoding attributes object
          • feature_prefix string

            The feature name prefix. Defaults to ngram__.

          • field string Required

            Path to field or array of paths. Some API's support wildcards in the path to select multiple fields.

          • length number

            Specifies the length of the n-gram substring. Defaults to 50. Must be greater than 0.

          • n_grams array[number] Required

            Specifies which n-grams to gather. It’s an array of integer values where the minimum value is 1, and a maximum value is 5.

          • start number

            Specifies the zero-indexed start of the n-gram substring. Negative values are allowed for encoding n-grams of string suffixes. Defaults to 0.

          • custom boolean
        • one_hot_encoding object
          Hide one_hot_encoding attributes Show one_hot_encoding attributes object
          • field string Required

            Path to field or array of paths. Some API's support wildcards in the path to select multiple fields.

          • hot_map string Required

            The one hot map mapping the field value with the column name.

        • target_mean_encoding object
          Hide target_mean_encoding attributes Show target_mean_encoding attributes object
          • default_value number Required

            The default value if field value is not found in the target_map.

          • feature_name string Required
          • field string Required

            Path to field or array of paths. Some API's support wildcards in the path to select multiple fields.

          • target_map object Required

            The field value to target mean transition map.

      • gamma number

        Advanced configuration option. Regularization parameter to prevent overfitting on the training data set. Multiplies a linear penalty associated with the size of individual trees in the forest. A high gamma value causes training to prefer small trees. A small gamma value results in larger individual trees and slower training. By default, this value is calculated during hyperparameter optimization. It must be a nonnegative value.

      • lambda number

        Advanced configuration option. Regularization parameter to prevent overfitting on the training data set. Multiplies an L2 regularization term which applies to leaf weights of the individual trees in the forest. A high lambda value causes training to favor small leaf weights. This behavior makes the prediction function smoother at the expense of potentially not being able to capture relevant relationships between the features and the dependent variable. A small lambda value results in large individual trees and slower training. By default, this value is calculated during hyperparameter optimization. It must be a nonnegative value.

      • max_optimization_rounds_per_hyperparameter number

        Advanced configuration option. A multiplier responsible for determining the maximum number of hyperparameter optimization steps in the Bayesian optimization procedure. The maximum number of steps is determined based on the number of undefined hyperparameters times the maximum optimization rounds per hyperparameter. By default, this value is calculated during hyperparameter optimization.

      • max_trees number

        Advanced configuration option. Defines the maximum number of decision trees in the forest. The maximum value is 2000. By default, this value is calculated during hyperparameter optimization.

      • num_top_feature_importance_values number

        Advanced configuration option. Specifies the maximum number of feature importance values per document to return. By default, no feature importance calculation occurs.

      • prediction_field_name string

        Path to field or array of paths. Some API's support wildcards in the path to select multiple fields.

      • randomize_seed number

        Defines the seed for the random generator that is used to pick training data. By default, it is randomly generated. Set it to a specific value to use the same training data each time you start a job (assuming other related parameters such as source and analyzed_fields are the same).

      • soft_tree_depth_limit number

        Advanced configuration option. Machine learning uses loss guided tree growing, which means that the decision trees grow where the regularized loss decreases most quickly. This soft limit combines with the soft_tree_depth_tolerance to penalize trees that exceed the specified depth; the regularized loss increases quickly beyond this depth. By default, this value is calculated during hyperparameter optimization. It must be greater than or equal to 0.

      • soft_tree_depth_tolerance number

        Advanced configuration option. This option controls how quickly the regularized loss increases when the tree depth exceeds soft_tree_depth_limit. By default, this value is calculated during hyperparameter optimization. It must be greater than or equal to 0.01.

      • training_percent string | number

      • class_assignment_objective string
      • num_top_classes number

        Defines the number of categories for which the predicted probabilities are reported. It must be non-negative or -1. If it is -1 or greater than the total number of categories, probabilities are reported for all categories; if you have a large number of categories, there could be a significant effect on the size of your destination index. NOTE: To use the AUC ROC evaluation method, num_top_classes must be set to -1 or a value greater than or equal to the total number of categories.

    • outlier_detection object
      Hide outlier_detection attributes Show outlier_detection attributes object
      • compute_feature_influence boolean

        Specifies whether the feature influence calculation is enabled.

      • feature_influence_threshold number

        The minimum outlier score that a document needs to have in order to calculate its feature influence score. Value range: 0-1.

      • method string

        The method that outlier detection uses. Available methods are lof, ldof, distance_kth_nn, distance_knn, and ensemble. The default value is ensemble, which means that outlier detection uses an ensemble of different methods and normalises and combines their individual outlier scores to obtain the overall outlier score.

      • n_neighbors number

        Defines the value for how many nearest neighbors each method of outlier detection uses to calculate its outlier score. When the value is not set, different values are used for different ensemble members. This default behavior helps improve the diversity in the ensemble; only override it if you are confident that the value you choose is appropriate for the data set.

      • outlier_fraction number

        The proportion of the data set that is assumed to be outlying prior to outlier detection. For example, 0.05 means it is assumed that 5% of values are real outliers and 95% are inliers.

      • standardization_enabled boolean

        If true, the following operation is performed on the columns before computing outlier scores: (x_i - mean(x_i)) / sd(x_i).

    • regression object
      Hide regression attributes Show regression attributes object
      • alpha number

        Advanced configuration option. Machine learning uses loss guided tree growing, which means that the decision trees grow where the regularized loss decreases most quickly. This parameter affects loss calculations by acting as a multiplier of the tree depth. Higher alpha values result in shallower trees and faster training times. By default, this value is calculated during hyperparameter optimization. It must be greater than or equal to zero.

      • dependent_variable string Required

        Defines which field of the document is to be predicted. It must match one of the fields in the index being used to train. If this field is missing from a document, then that document will not be used for training, but a prediction with the trained model will be generated for it. It is also known as continuous target variable. For classification analysis, the data type of the field must be numeric (integer, short, long, byte), categorical (ip or keyword), or boolean. There must be no more than 30 different values in this field. For regression analysis, the data type of the field must be numeric.

      • downsample_factor number

        Advanced configuration option. Controls the fraction of data that is used to compute the derivatives of the loss function for tree training. A small value results in the use of a small fraction of the data. If this value is set to be less than 1, accuracy typically improves. However, too small a value may result in poor convergence for the ensemble and so require more trees. By default, this value is calculated during hyperparameter optimization. It must be greater than zero and less than or equal to 1.

      • early_stopping_enabled boolean

        Advanced configuration option. Specifies whether the training process should finish if it is not finding any better performing models. If disabled, the training process can take significantly longer and the chance of finding a better performing model is unremarkable.

      • eta number

        Advanced configuration option. The shrinkage applied to the weights. Smaller values result in larger forests which have a better generalization error. However, larger forests cause slower training. By default, this value is calculated during hyperparameter optimization. It must be a value between 0.001 and 1.

      • eta_growth_rate_per_tree number

        Advanced configuration option. Specifies the rate at which eta increases for each new tree that is added to the forest. For example, a rate of 1.05 increases eta by 5% for each extra tree. By default, this value is calculated during hyperparameter optimization. It must be between 0.5 and 2.

      • feature_bag_fraction number

        Advanced configuration option. Defines the fraction of features that will be used when selecting a random bag for each candidate split. By default, this value is calculated during hyperparameter optimization.

      • feature_processors array[object]

        Advanced configuration option. A collection of feature preprocessors that modify one or more included fields. The analysis uses the resulting one or more features instead of the original document field. However, these features are ephemeral; they are not stored in the destination index. Multiple feature_processors entries can refer to the same document fields. Automatic categorical feature encoding still occurs for the fields that are unprocessed by a custom processor or that have categorical values. Use this property only if you want to override the automatic feature encoding of the specified fields.

        Hide feature_processors attributes Show feature_processors attributes object
        • frequency_encoding object
          Hide frequency_encoding attributes Show frequency_encoding attributes object
          • feature_name string Required
          • field string Required

            Path to field or array of paths. Some API's support wildcards in the path to select multiple fields.

          • frequency_map object Required

            The resulting frequency map for the field value. If the field value is missing from the frequency_map, the resulting value is 0.

        • multi_encoding object
          Hide multi_encoding attribute Show multi_encoding attribute object
          • processors array[number] Required

            The ordered array of custom processors to execute. Must be more than 1.

        • n_gram_encoding object
          Hide n_gram_encoding attributes Show n_gram_encoding attributes object
          • feature_prefix string

            The feature name prefix. Defaults to ngram__.

          • field string Required

            Path to field or array of paths. Some API's support wildcards in the path to select multiple fields.

          • length number

            Specifies the length of the n-gram substring. Defaults to 50. Must be greater than 0.

          • n_grams array[number] Required

            Specifies which n-grams to gather. It’s an array of integer values where the minimum value is 1, and a maximum value is 5.

          • start number

            Specifies the zero-indexed start of the n-gram substring. Negative values are allowed for encoding n-grams of string suffixes. Defaults to 0.

          • custom boolean
        • one_hot_encoding object
          Hide one_hot_encoding attributes Show one_hot_encoding attributes object
          • field string Required

            Path to field or array of paths. Some API's support wildcards in the path to select multiple fields.

          • hot_map string Required

            The one hot map mapping the field value with the column name.

        • target_mean_encoding object
          Hide target_mean_encoding attributes Show target_mean_encoding attributes object
          • default_value number Required

            The default value if field value is not found in the target_map.

          • feature_name string Required
          • field string Required

            Path to field or array of paths. Some API's support wildcards in the path to select multiple fields.

          • target_map object Required

            The field value to target mean transition map.

      • gamma number

        Advanced configuration option. Regularization parameter to prevent overfitting on the training data set. Multiplies a linear penalty associated with the size of individual trees in the forest. A high gamma value causes training to prefer small trees. A small gamma value results in larger individual trees and slower training. By default, this value is calculated during hyperparameter optimization. It must be a nonnegative value.

      • lambda number

        Advanced configuration option. Regularization parameter to prevent overfitting on the training data set. Multiplies an L2 regularization term which applies to leaf weights of the individual trees in the forest. A high lambda value causes training to favor small leaf weights. This behavior makes the prediction function smoother at the expense of potentially not being able to capture relevant relationships between the features and the dependent variable. A small lambda value results in large individual trees and slower training. By default, this value is calculated during hyperparameter optimization. It must be a nonnegative value.

      • max_optimization_rounds_per_hyperparameter number

        Advanced configuration option. A multiplier responsible for determining the maximum number of hyperparameter optimization steps in the Bayesian optimization procedure. The maximum number of steps is determined based on the number of undefined hyperparameters times the maximum optimization rounds per hyperparameter. By default, this value is calculated during hyperparameter optimization.

      • max_trees number

        Advanced configuration option. Defines the maximum number of decision trees in the forest. The maximum value is 2000. By default, this value is calculated during hyperparameter optimization.

      • num_top_feature_importance_values number

        Advanced configuration option. Specifies the maximum number of feature importance values per document to return. By default, no feature importance calculation occurs.

      • prediction_field_name string

        Path to field or array of paths. Some API's support wildcards in the path to select multiple fields.

      • randomize_seed number

        Defines the seed for the random generator that is used to pick training data. By default, it is randomly generated. Set it to a specific value to use the same training data each time you start a job (assuming other related parameters such as source and analyzed_fields are the same).

      • soft_tree_depth_limit number

        Advanced configuration option. Machine learning uses loss guided tree growing, which means that the decision trees grow where the regularized loss decreases most quickly. This soft limit combines with the soft_tree_depth_tolerance to penalize trees that exceed the specified depth; the regularized loss increases quickly beyond this depth. By default, this value is calculated during hyperparameter optimization. It must be greater than or equal to 0.

      • soft_tree_depth_tolerance number

        Advanced configuration option. This option controls how quickly the regularized loss increases when the tree depth exceeds soft_tree_depth_limit. By default, this value is calculated during hyperparameter optimization. It must be greater than or equal to 0.01.

      • training_percent string | number

      • loss_function string

        The loss function used during regression. Available options are mse (mean squared error), msle (mean squared logarithmic error), huber (Pseudo-Huber loss).

      • loss_function_parameter number

        A positive number that is used as a parameter to the loss_function.

  • analyzed_fields object
    Hide analyzed_fields attributes Show analyzed_fields attributes object
    • includes array[string]

      An array of strings that defines the fields that will be excluded from the analysis. You do not need to add fields with unsupported data types to excludes, these fields are excluded from the analysis automatically.

    • excludes array[string]

      An array of strings that defines the fields that will be included in the analysis.

  • description string

    A description of the job.

  • dest object Required
    Hide dest attributes Show dest attributes object
    • index string Required
    • results_field string

      Path to field or array of paths. Some API's support wildcards in the path to select multiple fields.

  • max_num_threads number

    The maximum number of threads to be used by the analysis. Using more threads may decrease the time necessary to complete the analysis at the cost of using more CPU. Note that the process may use additional threads for operational functionality other than the analysis itself.

  • _meta object
    Hide _meta attribute Show _meta attribute object
    • * object Additional properties
  • model_memory_limit string

    The approximate maximum amount of memory resources that are permitted for analytical processing. If your elasticsearch.yml file contains an xpack.ml.max_model_memory_limit setting, an error occurs when you try to create data frame analytics jobs that have model_memory_limit values greater than that setting.

  • source object Required
    Hide source attributes Show source attributes object
    • index string | array[string] Required
    • runtime_mappings object
      Hide runtime_mappings attribute Show runtime_mappings attribute object
      • * object Additional properties
        Hide * attributes Show * attributes object
        • fields object

          For type composite

          Hide fields attribute Show fields attribute object
          • * object Additional properties
            Hide * attribute Show * attribute object
            • type string Required

              Values are boolean, composite, date, double, geo_point, geo_shape, ip, keyword, long, or lookup.

        • fetch_fields array[object]

          For type lookup

          Hide fetch_fields attributes Show fetch_fields attributes object
          • field string Required

            Path to field or array of paths. Some API's support wildcards in the path to select multiple fields.

          • format string
        • format string

          A custom format for date type runtime fields.

        • input_field string

          Path to field or array of paths. Some API's support wildcards in the path to select multiple fields.

        • target_field string

          Path to field or array of paths. Some API's support wildcards in the path to select multiple fields.

        • target_index string
        • script object
          Hide script attributes Show script attributes object
          • source string | object

            One of:
          • id string
          • params object

            Specifies any named parameters that are passed into the script as variables. Use parameters instead of hard-coded values to decrease compile time.

            Hide params attribute Show params attribute object
            • * object Additional properties
          • lang string

            Any of:

            Values are painless, expression, mustache, or java.

          • options object
            Hide options attribute Show options attribute object
            • * string Additional properties
        • type string Required

          Values are boolean, composite, date, double, geo_point, geo_shape, ip, keyword, long, or lookup.

    • _source object
      Hide _source attributes Show _source attributes object
      • includes array[string]

        An array of strings that defines the fields that will be excluded from the analysis. You do not need to add fields with unsupported data types to excludes, these fields are excluded from the analysis automatically.

      • excludes array[string]

        An array of strings that defines the fields that will be included in the analysis.

    • query object

      The Elasticsearch query domain-specific language (DSL). This value corresponds to the query object in an Elasticsearch search POST body. All the options that are supported by Elasticsearch can be used, as this object is passed verbatim to Elasticsearch. By default, this property has the following value: {"match_all": {}}.

      Query DSL
  • headers object
  • version string

Responses

  • 200 application/json
    Hide response attributes Show response attributes object
    • authorization object
      Hide authorization attributes Show authorization attributes object
      • api_key object
        Hide api_key attributes Show api_key attributes object
        • id string Required

          The identifier for the API key.

        • name string Required

          The name of the API key.

      • roles array[string]

        If a user ID was used for the most recent update to the job, its roles at the time of the update are listed in the response.

      • service_account string

        If a service account was used for the most recent update to the job, the account name is listed in the response.

    • allow_lazy_start boolean Required
    • analysis object Required
      Hide analysis attributes Show analysis attributes object
      • classification object
        Hide classification attributes Show classification attributes object
        • alpha number

          Advanced configuration option. Machine learning uses loss guided tree growing, which means that the decision trees grow where the regularized loss decreases most quickly. This parameter affects loss calculations by acting as a multiplier of the tree depth. Higher alpha values result in shallower trees and faster training times. By default, this value is calculated during hyperparameter optimization. It must be greater than or equal to zero.

        • dependent_variable string Required

          Defines which field of the document is to be predicted. It must match one of the fields in the index being used to train. If this field is missing from a document, then that document will not be used for training, but a prediction with the trained model will be generated for it. It is also known as continuous target variable. For classification analysis, the data type of the field must be numeric (integer, short, long, byte), categorical (ip or keyword), or boolean. There must be no more than 30 different values in this field. For regression analysis, the data type of the field must be numeric.

        • downsample_factor number

          Advanced configuration option. Controls the fraction of data that is used to compute the derivatives of the loss function for tree training. A small value results in the use of a small fraction of the data. If this value is set to be less than 1, accuracy typically improves. However, too small a value may result in poor convergence for the ensemble and so require more trees. By default, this value is calculated during hyperparameter optimization. It must be greater than zero and less than or equal to 1.

        • early_stopping_enabled boolean

          Advanced configuration option. Specifies whether the training process should finish if it is not finding any better performing models. If disabled, the training process can take significantly longer and the chance of finding a better performing model is unremarkable.

        • eta number

          Advanced configuration option. The shrinkage applied to the weights. Smaller values result in larger forests which have a better generalization error. However, larger forests cause slower training. By default, this value is calculated during hyperparameter optimization. It must be a value between 0.001 and 1.

        • eta_growth_rate_per_tree number

          Advanced configuration option. Specifies the rate at which eta increases for each new tree that is added to the forest. For example, a rate of 1.05 increases eta by 5% for each extra tree. By default, this value is calculated during hyperparameter optimization. It must be between 0.5 and 2.

        • feature_bag_fraction number

          Advanced configuration option. Defines the fraction of features that will be used when selecting a random bag for each candidate split. By default, this value is calculated during hyperparameter optimization.

        • feature_processors array[object]

          Advanced configuration option. A collection of feature preprocessors that modify one or more included fields. The analysis uses the resulting one or more features instead of the original document field. However, these features are ephemeral; they are not stored in the destination index. Multiple feature_processors entries can refer to the same document fields. Automatic categorical feature encoding still occurs for the fields that are unprocessed by a custom processor or that have categorical values. Use this property only if you want to override the automatic feature encoding of the specified fields.

          Hide feature_processors attributes Show feature_processors attributes object
          • frequency_encoding object
          • multi_encoding object
          • n_gram_encoding object
          • one_hot_encoding object
          • target_mean_encoding object
        • gamma number

          Advanced configuration option. Regularization parameter to prevent overfitting on the training data set. Multiplies a linear penalty associated with the size of individual trees in the forest. A high gamma value causes training to prefer small trees. A small gamma value results in larger individual trees and slower training. By default, this value is calculated during hyperparameter optimization. It must be a nonnegative value.

        • lambda number

          Advanced configuration option. Regularization parameter to prevent overfitting on the training data set. Multiplies an L2 regularization term which applies to leaf weights of the individual trees in the forest. A high lambda value causes training to favor small leaf weights. This behavior makes the prediction function smoother at the expense of potentially not being able to capture relevant relationships between the features and the dependent variable. A small lambda value results in large individual trees and slower training. By default, this value is calculated during hyperparameter optimization. It must be a nonnegative value.

        • max_optimization_rounds_per_hyperparameter number

          Advanced configuration option. A multiplier responsible for determining the maximum number of hyperparameter optimization steps in the Bayesian optimization procedure. The maximum number of steps is determined based on the number of undefined hyperparameters times the maximum optimization rounds per hyperparameter. By default, this value is calculated during hyperparameter optimization.

        • max_trees number

          Advanced configuration option. Defines the maximum number of decision trees in the forest. The maximum value is 2000. By default, this value is calculated during hyperparameter optimization.

        • num_top_feature_importance_values number

          Advanced configuration option. Specifies the maximum number of feature importance values per document to return. By default, no feature importance calculation occurs.

        • prediction_field_name string

          Path to field or array of paths. Some API's support wildcards in the path to select multiple fields.

        • randomize_seed number

          Defines the seed for the random generator that is used to pick training data. By default, it is randomly generated. Set it to a specific value to use the same training data each time you start a job (assuming other related parameters such as source and analyzed_fields are the same).

        • soft_tree_depth_limit number

          Advanced configuration option. Machine learning uses loss guided tree growing, which means that the decision trees grow where the regularized loss decreases most quickly. This soft limit combines with the soft_tree_depth_tolerance to penalize trees that exceed the specified depth; the regularized loss increases quickly beyond this depth. By default, this value is calculated during hyperparameter optimization. It must be greater than or equal to 0.

        • soft_tree_depth_tolerance number

          Advanced configuration option. This option controls how quickly the regularized loss increases when the tree depth exceeds soft_tree_depth_limit. By default, this value is calculated during hyperparameter optimization. It must be greater than or equal to 0.01.

        • training_percent string | number

        • class_assignment_objective string
        • num_top_classes number

          Defines the number of categories for which the predicted probabilities are reported. It must be non-negative or -1. If it is -1 or greater than the total number of categories, probabilities are reported for all categories; if you have a large number of categories, there could be a significant effect on the size of your destination index. NOTE: To use the AUC ROC evaluation method, num_top_classes must be set to -1 or a value greater than or equal to the total number of categories.

      • outlier_detection object
        Hide outlier_detection attributes Show outlier_detection attributes object
        • compute_feature_influence boolean

          Specifies whether the feature influence calculation is enabled.

        • feature_influence_threshold number

          The minimum outlier score that a document needs to have in order to calculate its feature influence score. Value range: 0-1.

        • method string

          The method that outlier detection uses. Available methods are lof, ldof, distance_kth_nn, distance_knn, and ensemble. The default value is ensemble, which means that outlier detection uses an ensemble of different methods and normalises and combines their individual outlier scores to obtain the overall outlier score.

        • n_neighbors number

          Defines the value for how many nearest neighbors each method of outlier detection uses to calculate its outlier score. When the value is not set, different values are used for different ensemble members. This default behavior helps improve the diversity in the ensemble; only override it if you are confident that the value you choose is appropriate for the data set.

        • outlier_fraction number

          The proportion of the data set that is assumed to be outlying prior to outlier detection. For example, 0.05 means it is assumed that 5% of values are real outliers and 95% are inliers.

        • standardization_enabled boolean

          If true, the following operation is performed on the columns before computing outlier scores: (x_i - mean(x_i)) / sd(x_i).

      • regression object
        Hide regression attributes Show regression attributes object
        • alpha number

          Advanced configuration option. Machine learning uses loss guided tree growing, which means that the decision trees grow where the regularized loss decreases most quickly. This parameter affects loss calculations by acting as a multiplier of the tree depth. Higher alpha values result in shallower trees and faster training times. By default, this value is calculated during hyperparameter optimization. It must be greater than or equal to zero.

        • dependent_variable string Required

          Defines which field of the document is to be predicted. It must match one of the fields in the index being used to train. If this field is missing from a document, then that document will not be used for training, but a prediction with the trained model will be generated for it. It is also known as continuous target variable. For classification analysis, the data type of the field must be numeric (integer, short, long, byte), categorical (ip or keyword), or boolean. There must be no more than 30 different values in this field. For regression analysis, the data type of the field must be numeric.

        • downsample_factor number

          Advanced configuration option. Controls the fraction of data that is used to compute the derivatives of the loss function for tree training. A small value results in the use of a small fraction of the data. If this value is set to be less than 1, accuracy typically improves. However, too small a value may result in poor convergence for the ensemble and so require more trees. By default, this value is calculated during hyperparameter optimization. It must be greater than zero and less than or equal to 1.

        • early_stopping_enabled boolean

          Advanced configuration option. Specifies whether the training process should finish if it is not finding any better performing models. If disabled, the training process can take significantly longer and the chance of finding a better performing model is unremarkable.

        • eta number

          Advanced configuration option. The shrinkage applied to the weights. Smaller values result in larger forests which have a better generalization error. However, larger forests cause slower training. By default, this value is calculated during hyperparameter optimization. It must be a value between 0.001 and 1.

        • eta_growth_rate_per_tree number

          Advanced configuration option. Specifies the rate at which eta increases for each new tree that is added to the forest. For example, a rate of 1.05 increases eta by 5% for each extra tree. By default, this value is calculated during hyperparameter optimization. It must be between 0.5 and 2.

        • feature_bag_fraction number

          Advanced configuration option. Defines the fraction of features that will be used when selecting a random bag for each candidate split. By default, this value is calculated during hyperparameter optimization.

        • feature_processors array[object]

          Advanced configuration option. A collection of feature preprocessors that modify one or more included fields. The analysis uses the resulting one or more features instead of the original document field. However, these features are ephemeral; they are not stored in the destination index. Multiple feature_processors entries can refer to the same document fields. Automatic categorical feature encoding still occurs for the fields that are unprocessed by a custom processor or that have categorical values. Use this property only if you want to override the automatic feature encoding of the specified fields.

          Hide feature_processors attributes Show feature_processors attributes object
          • frequency_encoding object
          • multi_encoding object
          • n_gram_encoding object
          • one_hot_encoding object
          • target_mean_encoding object
        • gamma number

          Advanced configuration option. Regularization parameter to prevent overfitting on the training data set. Multiplies a linear penalty associated with the size of individual trees in the forest. A high gamma value causes training to prefer small trees. A small gamma value results in larger individual trees and slower training. By default, this value is calculated during hyperparameter optimization. It must be a nonnegative value.

        • lambda number

          Advanced configuration option. Regularization parameter to prevent overfitting on the training data set. Multiplies an L2 regularization term which applies to leaf weights of the individual trees in the forest. A high lambda value causes training to favor small leaf weights. This behavior makes the prediction function smoother at the expense of potentially not being able to capture relevant relationships between the features and the dependent variable. A small lambda value results in large individual trees and slower training. By default, this value is calculated during hyperparameter optimization. It must be a nonnegative value.

        • max_optimization_rounds_per_hyperparameter number

          Advanced configuration option. A multiplier responsible for determining the maximum number of hyperparameter optimization steps in the Bayesian optimization procedure. The maximum number of steps is determined based on the number of undefined hyperparameters times the maximum optimization rounds per hyperparameter. By default, this value is calculated during hyperparameter optimization.

        • max_trees number

          Advanced configuration option. Defines the maximum number of decision trees in the forest. The maximum value is 2000. By default, this value is calculated during hyperparameter optimization.

        • num_top_feature_importance_values number

          Advanced configuration option. Specifies the maximum number of feature importance values per document to return. By default, no feature importance calculation occurs.

        • prediction_field_name string

          Path to field or array of paths. Some API's support wildcards in the path to select multiple fields.

        • randomize_seed number

          Defines the seed for the random generator that is used to pick training data. By default, it is randomly generated. Set it to a specific value to use the same training data each time you start a job (assuming other related parameters such as source and analyzed_fields are the same).

        • soft_tree_depth_limit number

          Advanced configuration option. Machine learning uses loss guided tree growing, which means that the decision trees grow where the regularized loss decreases most quickly. This soft limit combines with the soft_tree_depth_tolerance to penalize trees that exceed the specified depth; the regularized loss increases quickly beyond this depth. By default, this value is calculated during hyperparameter optimization. It must be greater than or equal to 0.

        • soft_tree_depth_tolerance number

          Advanced configuration option. This option controls how quickly the regularized loss increases when the tree depth exceeds soft_tree_depth_limit. By default, this value is calculated during hyperparameter optimization. It must be greater than or equal to 0.01.

        • training_percent string | number

        • loss_function string

          The loss function used during regression. Available options are mse (mean squared error), msle (mean squared logarithmic error), huber (Pseudo-Huber loss).

        • loss_function_parameter number

          A positive number that is used as a parameter to the loss_function.

    • analyzed_fields object
      Hide analyzed_fields attributes Show analyzed_fields attributes object
      • includes array[string]

        An array of strings that defines the fields that will be excluded from the analysis. You do not need to add fields with unsupported data types to excludes, these fields are excluded from the analysis automatically.

      • excludes array[string]

        An array of strings that defines the fields that will be included in the analysis.

    • create_time number

      Time unit for milliseconds

    • description string
    • dest object Required
      Hide dest attributes Show dest attributes object
      • index string Required
      • results_field string

        Path to field or array of paths. Some API's support wildcards in the path to select multiple fields.

    • id string Required
    • max_num_threads number Required
    • _meta object
      Hide _meta attribute Show _meta attribute object
      • * object Additional properties
    • model_memory_limit string Required
    • source object Required
      Hide source attributes Show source attributes object
      • index string | array[string] Required
      • runtime_mappings object
        Hide runtime_mappings attribute Show runtime_mappings attribute object
        • * object Additional properties
          Hide * attributes Show * attributes object
          • fields object

            For type composite

            Hide fields attribute Show fields attribute object
            • * object Additional properties
              Hide * attribute Show * attribute object
              • type string Required

                Values are boolean, composite, date, double, geo_point, geo_shape, ip, keyword, long, or lookup.

          • fetch_fields array[object]

            For type lookup

            Hide fetch_fields attributes Show fetch_fields attributes object
            • field string Required

              Path to field or array of paths. Some API's support wildcards in the path to select multiple fields.

            • format string
          • format string

            A custom format for date type runtime fields.

          • input_field string

            Path to field or array of paths. Some API's support wildcards in the path to select multiple fields.

          • target_field string

            Path to field or array of paths. Some API's support wildcards in the path to select multiple fields.

          • target_index string
          • script object
            Hide script attributes Show script attributes object
            • id string
            • params object

              Specifies any named parameters that are passed into the script as variables. Use parameters instead of hard-coded values to decrease compile time.

              Hide params attribute Show params attribute object
              • * object Additional properties
            • lang string

              Any of:

              Values are painless, expression, mustache, or java.

            • options object
              Hide options attribute Show options attribute object
              • * string Additional properties
          • type string Required

            Values are boolean, composite, date, double, geo_point, geo_shape, ip, keyword, long, or lookup.

      • _source object
        Hide _source attributes Show _source attributes object
        • includes array[string]

          An array of strings that defines the fields that will be excluded from the analysis. You do not need to add fields with unsupported data types to excludes, these fields are excluded from the analysis automatically.

        • excludes array[string]

          An array of strings that defines the fields that will be included in the analysis.

      • query object

        The Elasticsearch query domain-specific language (DSL). This value corresponds to the query object in an Elasticsearch search POST body. All the options that are supported by Elasticsearch can be used, as this object is passed verbatim to Elasticsearch. By default, this property has the following value: {"match_all": {}}.

        Query DSL
    • version string Required
PUT /_ml/data_frame/analytics/{id}
PUT _ml/data_frame/analytics/model-flight-delays-pre
{
  "source": {
    "index": [
      "kibana_sample_data_flights"
    ],
    "query": {
      "range": {
        "DistanceKilometers": {
          "gt": 0
        }
      }
    },
    "_source": {
      "includes": [],
      "excludes": [
        "FlightDelay",
        "FlightDelayType"
      ]
    }
  },
  "dest": {
    "index": "df-flight-delays",
    "results_field": "ml-results"
  },
  "analysis": {
  "regression": {
    "dependent_variable": "FlightDelayMin",
    "training_percent": 90
    }
  },
  "analyzed_fields": {
    "includes": [],
    "excludes": [
      "FlightNum"
    ]
  },
  "model_memory_limit": "100mb"
}
resp = client.ml.put_data_frame_analytics(
    id="model-flight-delays-pre",
    source={
        "index": [
            "kibana_sample_data_flights"
        ],
        "query": {
            "range": {
                "DistanceKilometers": {
                    "gt": 0
                }
            }
        },
        "_source": {
            "includes": [],
            "excludes": [
                "FlightDelay",
                "FlightDelayType"
            ]
        }
    },
    dest={
        "index": "df-flight-delays",
        "results_field": "ml-results"
    },
    analysis={
        "regression": {
            "dependent_variable": "FlightDelayMin",
            "training_percent": 90
        }
    },
    analyzed_fields={
        "includes": [],
        "excludes": [
            "FlightNum"
        ]
    },
    model_memory_limit="100mb",
)
const response = await client.ml.putDataFrameAnalytics({
  id: "model-flight-delays-pre",
  source: {
    index: ["kibana_sample_data_flights"],
    query: {
      range: {
        DistanceKilometers: {
          gt: 0,
        },
      },
    },
    _source: {
      includes: [],
      excludes: ["FlightDelay", "FlightDelayType"],
    },
  },
  dest: {
    index: "df-flight-delays",
    results_field: "ml-results",
  },
  analysis: {
    regression: {
      dependent_variable: "FlightDelayMin",
      training_percent: 90,
    },
  },
  analyzed_fields: {
    includes: [],
    excludes: ["FlightNum"],
  },
  model_memory_limit: "100mb",
});
response = client.ml.put_data_frame_analytics(
  id: "model-flight-delays-pre",
  body: {
    "source": {
      "index": [
        "kibana_sample_data_flights"
      ],
      "query": {
        "range": {
          "DistanceKilometers": {
            "gt": 0
          }
        }
      },
      "_source": {
        "includes": [],
        "excludes": [
          "FlightDelay",
          "FlightDelayType"
        ]
      }
    },
    "dest": {
      "index": "df-flight-delays",
      "results_field": "ml-results"
    },
    "analysis": {
      "regression": {
        "dependent_variable": "FlightDelayMin",
        "training_percent": 90
      }
    },
    "analyzed_fields": {
      "includes": [],
      "excludes": [
        "FlightNum"
      ]
    },
    "model_memory_limit": "100mb"
  }
)
$resp = $client->ml()->putDataFrameAnalytics([
    "id" => "model-flight-delays-pre",
    "body" => [
        "source" => [
            "index" => array(
                "kibana_sample_data_flights",
            ),
            "query" => [
                "range" => [
                    "DistanceKilometers" => [
                        "gt" => 0,
                    ],
                ],
            ],
            "_source" => [
                "includes" => array(
                ),
                "excludes" => array(
                    "FlightDelay",
                    "FlightDelayType",
                ),
            ],
        ],
        "dest" => [
            "index" => "df-flight-delays",
            "results_field" => "ml-results",
        ],
        "analysis" => [
            "regression" => [
                "dependent_variable" => "FlightDelayMin",
                "training_percent" => 90,
            ],
        ],
        "analyzed_fields" => [
            "includes" => array(
            ),
            "excludes" => array(
                "FlightNum",
            ),
        ],
        "model_memory_limit" => "100mb",
    ],
]);
curl -X PUT -H "Authorization: ApiKey $ELASTIC_API_KEY" -H "Content-Type: application/json" -d '{"source":{"index":["kibana_sample_data_flights"],"query":{"range":{"DistanceKilometers":{"gt":0}}},"_source":{"includes":[],"excludes":["FlightDelay","FlightDelayType"]}},"dest":{"index":"df-flight-delays","results_field":"ml-results"},"analysis":{"regression":{"dependent_variable":"FlightDelayMin","training_percent":90}},"analyzed_fields":{"includes":[],"excludes":["FlightNum"]},"model_memory_limit":"100mb"}' "$ELASTICSEARCH_URL/_ml/data_frame/analytics/model-flight-delays-pre"
Request example
An example body for a `PUT _ml/data_frame/analytics/model-flight-delays-pre` request.
{
  "source": {
    "index": [
      "kibana_sample_data_flights"
    ],
    "query": {
      "range": {
        "DistanceKilometers": {
          "gt": 0
        }
      }
    },
    "_source": {
      "includes": [],
      "excludes": [
        "FlightDelay",
        "FlightDelayType"
      ]
    }
  },
  "dest": {
    "index": "df-flight-delays",
    "results_field": "ml-results"
  },
  "analysis": {
  "regression": {
    "dependent_variable": "FlightDelayMin",
    "training_percent": 90
    }
  },
  "analyzed_fields": {
    "includes": [],
    "excludes": [
      "FlightNum"
    ]
  },
  "model_memory_limit": "100mb"
}




Evaluate data frame analytics Generally available

POST /_ml/data_frame/_evaluate

The API packages together commonly used evaluation metrics for various types of machine learning features. This has been designed for use on indexes created by data frame analytics. Evaluation requires both a ground truth field and an analytics result field to be present.

Required authorization

  • Cluster privileges: monitor_ml
application/json

Body Required

  • evaluation object Required
    Hide evaluation attributes Show evaluation attributes object
    • classification object
      Hide classification attributes Show classification attributes object
      • actual_field string Required

        Path to field or array of paths. Some API's support wildcards in the path to select multiple fields.

      • predicted_field string

        Path to field or array of paths. Some API's support wildcards in the path to select multiple fields.

      • top_classes_field string

        Path to field or array of paths. Some API's support wildcards in the path to select multiple fields.

      • metrics object
        Hide metrics attributes Show metrics attributes object
        • auc_roc object
          Hide auc_roc attributes Show auc_roc attributes object
          • class_name string
          • include_curve boolean

            Whether or not the curve should be returned in addition to the score. Default value is false.

        • precision object

          Precision of predictions (per-class and average).

          Hide precision attribute Show precision attribute object
          • * object Additional properties
        • recall object

          Recall of predictions (per-class and average).

          Hide recall attribute Show recall attribute object
          • * object Additional properties
        • accuracy object

          Accuracy of predictions (per-class and overall).

          Hide accuracy attribute Show accuracy attribute object
          • * object Additional properties
        • multiclass_confusion_matrix object

          Multiclass confusion matrix.

          Hide multiclass_confusion_matrix attribute Show multiclass_confusion_matrix attribute object
          • * object Additional properties
    • outlier_detection object
      Hide outlier_detection attributes Show outlier_detection attributes object
      • actual_field string Required

        Path to field or array of paths. Some API's support wildcards in the path to select multiple fields.

      • predicted_probability_field string Required

        Path to field or array of paths. Some API's support wildcards in the path to select multiple fields.

      • metrics object
        Hide metrics attributes Show metrics attributes object
        • auc_roc object
          Hide auc_roc attributes Show auc_roc attributes object
          • class_name string
          • include_curve boolean

            Whether or not the curve should be returned in addition to the score. Default value is false.

        • precision object

          Precision of predictions (per-class and average).

          Hide precision attribute Show precision attribute object
          • * object Additional properties
        • recall object

          Recall of predictions (per-class and average).

          Hide recall attribute Show recall attribute object
          • * object Additional properties
        • confusion_matrix object

          Accuracy of predictions (per-class and overall).

          Hide confusion_matrix attribute Show confusion_matrix attribute object
          • * object Additional properties
    • regression object
      Hide regression attributes Show regression attributes object
      • actual_field string Required

        Path to field or array of paths. Some API's support wildcards in the path to select multiple fields.

      • predicted_field string Required

        Path to field or array of paths. Some API's support wildcards in the path to select multiple fields.

      • metrics object
        Hide metrics attributes Show metrics attributes object
        • mse object

          Average squared difference between the predicted values and the actual (ground truth) value. For more information, read this wiki article.

          Hide mse attribute Show mse attribute object
          • * object Additional properties
        • msle object
          Hide msle attribute Show msle attribute object
          • offset number

            Defines the transition point at which you switch from minimizing quadratic error to minimizing quadratic log error. Defaults to 1.

        • huber object
          Hide huber attribute Show huber attribute object
          • delta number

            Approximates 1/2 (prediction - actual)2 for values much less than delta and approximates a straight line with slope delta for values much larger than delta. Defaults to 1. Delta needs to be greater than 0.

        • r_squared object

          Proportion of the variance in the dependent variable that is predictable from the independent variables.

          Hide r_squared attribute Show r_squared attribute object
          • * object Additional properties
  • index string Required
  • query object

    An Elasticsearch Query DSL (Domain Specific Language) object that defines a query.

    External documentation

Responses

  • 200 application/json
    Hide response attributes Show response attributes object
    • classification object
      Hide classification attributes Show classification attributes object
      • auc_roc object
        Hide auc_roc attributes Show auc_roc attributes object
        • value number Required
        • curve array[object]
          Hide curve attributes Show curve attributes object
          • tpr number Required
          • fpr number Required
          • threshold number Required
      • accuracy object
        Hide accuracy attributes Show accuracy attributes object
        • classes array[object] Required
          Hide classes attributes Show classes attributes object
          • value number Required
          • class_name string Required
        • overall_accuracy number Required
      • multiclass_confusion_matrix object
        Hide multiclass_confusion_matrix attributes Show multiclass_confusion_matrix attributes object
        • confusion_matrix array[object] Required
          Hide confusion_matrix attributes Show confusion_matrix attributes object
          • actual_class string Required
          • actual_class_doc_count number Required
          • predicted_classes array[object] Required
          • other_predicted_class_doc_count number Required
        • other_actual_class_count number Required
      • precision object
        Hide precision attributes Show precision attributes object
        • classes array[object] Required
          Hide classes attributes Show classes attributes object
          • value number Required
          • class_name string Required
        • avg_precision number Required
      • recall object
        Hide recall attributes Show recall attributes object
        • classes array[object] Required
          Hide classes attributes Show classes attributes object
          • value number Required
          • class_name string Required
        • avg_recall number Required
    • outlier_detection object
      Hide outlier_detection attributes Show outlier_detection attributes object
      • auc_roc object
        Hide auc_roc attributes Show auc_roc attributes object
        • value number Required
        • curve array[object]
          Hide curve attributes Show curve attributes object
          • tpr number Required
          • fpr number Required
          • threshold number Required
      • precision object

        Set the different thresholds of the outlier score at where the metric is calculated.

        Hide precision attribute Show precision attribute object
        • * number Additional properties
      • recall object

        Set the different thresholds of the outlier score at where the metric is calculated.

        Hide recall attribute Show recall attribute object
        • * number Additional properties
      • confusion_matrix object

        Set the different thresholds of the outlier score at where the metrics (tp - true positive, fp - false positive, tn - true negative, fn - false negative) are calculated.

        Hide confusion_matrix attribute Show confusion_matrix attribute object
        • * object Additional properties
          Hide * attributes Show * attributes object
          • tp number Required

            True Positive

          • fp number Required

            False Positive

          • tn number Required

            True Negative

          • fn number Required

            False Negative

    • regression object
      Hide regression attributes Show regression attributes object
      • huber object
        Hide huber attribute Show huber attribute object
        • value number Required
      • mse object
        Hide mse attribute Show mse attribute object
        • value number Required
      • msle object
        Hide msle attribute Show msle attribute object
        • value number Required
      • r_squared object
        Hide r_squared attribute Show r_squared attribute object
        • value number Required
POST /_ml/data_frame/_evaluate
POST _ml/data_frame/_evaluate
{
  "index": "animal_classification",
  "evaluation": {
    "classification": {
      "actual_field": "animal_class",
      "predicted_field": "ml.animal_class_prediction",
      "metrics": {
        "multiclass_confusion_matrix": {}
      }
    }
  }
}
resp = client.ml.evaluate_data_frame(
    index="animal_classification",
    evaluation={
        "classification": {
            "actual_field": "animal_class",
            "predicted_field": "ml.animal_class_prediction",
            "metrics": {
                "multiclass_confusion_matrix": {}
            }
        }
    },
)
const response = await client.ml.evaluateDataFrame({
  index: "animal_classification",
  evaluation: {
    classification: {
      actual_field: "animal_class",
      predicted_field: "ml.animal_class_prediction",
      metrics: {
        multiclass_confusion_matrix: {},
      },
    },
  },
});
response = client.ml.evaluate_data_frame(
  body: {
    "index": "animal_classification",
    "evaluation": {
      "classification": {
        "actual_field": "animal_class",
        "predicted_field": "ml.animal_class_prediction",
        "metrics": {
          "multiclass_confusion_matrix": {}
        }
      }
    }
  }
)
$resp = $client->ml()->evaluateDataFrame([
    "body" => [
        "index" => "animal_classification",
        "evaluation" => [
            "classification" => [
                "actual_field" => "animal_class",
                "predicted_field" => "ml.animal_class_prediction",
                "metrics" => [
                    "multiclass_confusion_matrix" => new ArrayObject([]),
                ],
            ],
        ],
    ],
]);
curl -X POST -H "Authorization: ApiKey $ELASTIC_API_KEY" -H "Content-Type: application/json" -d '{"index":"animal_classification","evaluation":{"classification":{"actual_field":"animal_class","predicted_field":"ml.animal_class_prediction","metrics":{"multiclass_confusion_matrix":{}}}}}' "$ELASTICSEARCH_URL/_ml/data_frame/_evaluate"
Run `POST _ml/data_frame/_evaluate` to evaluate a a classification job for an annotated index. The `actual_field` contains the ground truth for classification. The `predicted_field` contains the predicted value calculated by the classification analysis.
{
  "index": "animal_classification",
  "evaluation": {
    "classification": {
      "actual_field": "animal_class",
      "predicted_field": "ml.animal_class_prediction",
      "metrics": {
        "multiclass_confusion_matrix": {}
      }
    }
  }
}
Run `POST _ml/data_frame/_evaluate` to evaluate a classification job with AUC ROC metrics for an annotated index. The `actual_field` contains the ground truth value for the actual animal classification. This is required in order to evaluate results. The `class_name` specifies the class name that is treated as positive during the evaluation, all the other classes are treated as negative.
{
  "index": "animal_classification",
  "evaluation": {
    "classification": {
      "actual_field": "animal_class",
      "metrics": {
        "auc_roc": {
          "class_name": "dog"
        }
      }
    }
  }
}
Run `POST _ml/data_frame/_evaluate` to evaluate an outlier detection job for an annotated index.
{
  "index": "my_analytics_dest_index",
  "evaluation": {
    "outlier_detection": {
      "actual_field": "is_outlier",
      "predicted_probability_field": "ml.outlier_score"
    }
  }
}
Run `POST _ml/data_frame/_evaluate` to evaluate the testing error of a regression job for an annotated index. The term query in the body limits evaluation to be performed on the test split only. The `actual_field` contains the ground truth for house prices. The `predicted_field` contains the house price calculated by the regression analysis.
{
  "index": "house_price_predictions",
  "query": {
    "bool": {
      "filter": [
        {
          "term": {
            "ml.is_training": false
          }
        }
      ]
    }
  },
  "evaluation": {
    "regression": {
      "actual_field": "price",
      "predicted_field": "ml.price_prediction",
      "metrics": {
        "r_squared": {},
        "mse": {},
        "msle": {
          "offset": 10
        },
        "huber": {
          "delta": 1.5
        }
      }
    }
  }
}
Run `POST _ml/data_frame/_evaluate` to evaluate the training error of a regression job for an annotated index. The term query in the body limits evaluation to be performed on the training split only. The `actual_field` contains the ground truth for house prices. The `predicted_field` contains the house price calculated by the regression analysis.
{
  "index": "house_price_predictions",
  "query": {
    "term": {
      "ml.is_training": {
        "value": true
      }
    }
  },
  "evaluation": {
    "regression": {
      "actual_field": "price",
      "predicted_field": "ml.price_prediction",
      "metrics": {
        "r_squared": {},
        "mse": {},
        "msle": {},
        "huber": {}
      }
    }
  }
}
A succesful response from `POST _ml/data_frame/_evaluate` to evaluate a classification analysis job for an annotated index. The `actual_class` contains the name of the class the analysis tried to predict. The `actual_class_doc_count` is the number of documents in the index belonging to the `actual_class`. The `predicted_classes` object contains the list of the predicted classes and the number of predictions associated with the class.
{
  "classification": {
    "multiclass_confusion_matrix": {
      "confusion_matrix": [
        {
          "actual_class": "cat",
          "actual_class_doc_count": 12,
          "predicted_classes": [
            {
              "predicted_class": "cat",
              "count": 12
            },
            {
              "predicted_class": "dog",
              "count": 0
            }
          ],
          "other_predicted_class_doc_count": 0
        },
        {
          "actual_class": "dog",
          "actual_class_doc_count": 11,
          "predicted_classes": [
            {
              "predicted_class": "dog",
              "count": 7
            },
            {
              "predicted_class": "cat",
              "count": 4
            }
          ],
          "other_predicted_class_doc_count": 0
        }
      ],
      "other_actual_class_count": 0
    }
  }
}
A succesful response from `POST _ml/data_frame/_evaluate` to evaluate a classification analysis job with the AUC ROC metrics for an annotated index.
{
  "classification": {
    "auc_roc": {
      "value": 0.8941788639536681
    }
  }
}
A successful response from `POST _ml/data_frame/_evaluate` to evaluate an outlier detection job.
{
  "outlier_detection": {
    "auc_roc": {
      "value": 0.9258475774641445
    },
    "confusion_matrix": {
      "0.25": {
        "tp": 5,
        "fp": 9,
        "tn": 204,
        "fn": 5
      },
      "0.5": {
        "tp": 1,
        "fp": 5,
        "tn": 208,
        "fn": 9
      },
      "0.75": {
        "tp": 0,
        "fp": 4,
        "tn": 209,
        "fn": 10
      }
    },
    "precision": {
      "0.25": 0.35714285714285715,
      "0.5": 0.16666666666666666,
      "0.75": 0
    },
    "recall": {
      "0.25": 0.5,
      "0.5": 0.1,
      "0.75": 0
    }
  }
}









































Get trained model configuration info Generally available

GET /_ml/trained_models/{model_id}

Required authorization

  • Cluster privileges: monitor_ml

Path parameters

  • model_id string | array[string] Required

    The unique identifier of the trained model or a model alias.

    You can get information for multiple trained models in a single API request by using a comma-separated list of model IDs or a wildcard expression.

Query parameters

  • allow_no_match boolean

    Specifies what to do when the request:

    • Contains wildcard expressions and there are no models that match.
    • Contains the _all string or no identifiers and there are no matches.
    • Contains wildcard expressions and there are only partial matches.

    If true, it returns an empty array when there are no matches and the subset of results when there are partial matches.

  • decompress_definition boolean

    Specifies whether the included model definition should be returned as a JSON map (true) or in a custom compressed format (false).

  • exclude_generated boolean

    Indicates if certain fields should be removed from the configuration on retrieval. This allows the configuration to be in an acceptable format to be retrieved and then added to another cluster.

  • from number

    Skips the specified number of models.

  • include string

    A comma delimited string of optional fields to include in the response body.

    Values are definition, feature_importance_baseline, hyperparameters, total_feature_importance, or definition_status.

  • size number

    Specifies the maximum number of models to obtain.

  • tags string | array[string]

    A comma delimited string of tags. A trained model can have many tags, or none. When supplied, only trained models that contain all the supplied tags are returned.

Responses

  • 200 application/json
    Hide response attributes Show response attributes object
    • count number Required
    • trained_model_configs array[object] Required

      An array of trained model resources, which are sorted by the model_id value in ascending order.

      Hide trained_model_configs attributes Show trained_model_configs attributes object
      • model_id string Required
      • model_type string

        Values are tree_ensemble, lang_ident, or pytorch.

      • tags array[string] Required

        A comma delimited string of tags. A trained model can have many tags, or none.

      • version string
      • compressed_definition string
      • created_by string

        Information on the creator of the trained model.

      • create_time string | number

        A date and time, either as a string whose format can depend on the context (defaulting to ISO 8601), or a number of milliseconds since the Epoch. Elasticsearch accepts both as input, but will generally output a string representation.

        One of:
      • default_field_map object

        Any field map described in the inference configuration takes precedence.

        Hide default_field_map attribute Show default_field_map attribute object
        • * string Additional properties
      • description string

        The free-text description of the trained model.

      • estimated_heap_memory_usage_bytes number

        The estimated heap usage in bytes to keep the trained model in memory.

      • estimated_operations number

        The estimated number of operations to use the trained model.

      • fully_defined boolean

        True if the full model definition is present.

      • inference_config object

        Inference configuration provided when storing the model config

        Hide inference_config attributes Show inference_config attributes object
        • regression object
          Hide regression attributes Show regression attributes object
          • results_field string

            Path to field or array of paths. Some API's support wildcards in the path to select multiple fields.

          • num_top_feature_importance_values number

            Specifies the maximum number of feature importance values per document.

        • classification object
          Hide classification attributes Show classification attributes object
          • num_top_classes number

            Specifies the number of top class predictions to return. Defaults to 0.

          • num_top_feature_importance_values number

            Specifies the maximum number of feature importance values per document.

          • prediction_field_type string

            Specifies the type of the predicted field to write. Acceptable values are: string, number, boolean. When boolean is provided 1.0 is transformed to true and 0.0 to false.

          • results_field string

            The field that is added to incoming documents to contain the inference prediction. Defaults to predicted_value.

          • top_classes_results_field string

            Specifies the field to which the top classes are written. Defaults to top_classes.

        • text_classification object

          Text classification configuration options

          Hide text_classification attributes Show text_classification attributes object
          • num_top_classes number

            Specifies the number of top class predictions to return. Defaults to 0.

          • tokenization object

            Tokenization options stored in inference configuration

            Hide tokenization attributes Show tokenization attributes object
            • bert
            • bert_ja
            • mpnet
            • roberta
            • xlm_roberta
          • results_field string

            The field that is added to incoming documents to contain the inference prediction. Defaults to predicted_value.

          • classification_labels array[string]

            Classification labels to apply other than the stored labels. Must have the same deminsions as the default configured labels

          • vocabulary object
            Hide vocabulary attribute Show vocabulary attribute object
            • index string Required
        • zero_shot_classification object

          Zero shot classification configuration options

          Hide zero_shot_classification attributes Show zero_shot_classification attributes object
          • tokenization object

            Tokenization options stored in inference configuration

            Hide tokenization attributes Show tokenization attributes object
            • bert
            • bert_ja
            • mpnet
            • roberta
            • xlm_roberta
          • hypothesis_template string

            Hypothesis template used when tokenizing labels for prediction

          • classification_labels array[string] Required

            The zero shot classification labels indicating entailment, neutral, and contradiction Must contain exactly and only entailment, neutral, and contradiction

          • results_field string

            The field that is added to incoming documents to contain the inference prediction. Defaults to predicted_value.

          • multi_label boolean

            Indicates if more than one true label exists.

          • labels array[string]

            The labels to predict.

        • fill_mask object

          Fill mask inference options

          Hide fill_mask attributes Show fill_mask attributes object
          • mask_token string

            The string/token which will be removed from incoming documents and replaced with the inference prediction(s). In a response, this field contains the mask token for the specified model/tokenizer. Each model and tokenizer has a predefined mask token which cannot be changed. Thus, it is recommended not to set this value in requests. However, if this field is present in a request, its value must match the predefined value for that model/tokenizer, otherwise the request will fail.

          • num_top_classes number

            Specifies the number of top class predictions to return. Defaults to 0.

          • tokenization object

            Tokenization options stored in inference configuration

            Hide tokenization attributes Show tokenization attributes object
            • bert
            • bert_ja
            • mpnet
            • roberta
            • xlm_roberta
          • results_field string

            The field that is added to incoming documents to contain the inference prediction. Defaults to predicted_value.

          • vocabulary object Required
            Hide vocabulary attribute Show vocabulary attribute object
            • index string Required
        • learning_to_rank object
          Hide learning_to_rank attributes Show learning_to_rank attributes object
          • default_params object
            Hide default_params attribute Show default_params attribute object
            • * object Additional properties
          • feature_extractors array[object]
          • num_top_feature_importance_values number Required
        • ner object

          Named entity recognition options

          Hide ner attributes Show ner attributes object
          • tokenization object

            Tokenization options stored in inference configuration

            Hide tokenization attributes Show tokenization attributes object
            • bert
            • bert_ja
            • mpnet
            • roberta
            • xlm_roberta
          • results_field string

            The field that is added to incoming documents to contain the inference prediction. Defaults to predicted_value.

          • classification_labels array[string]

            The token classification labels. Must be IOB formatted tags

          • vocabulary object
            Hide vocabulary attribute Show vocabulary attribute object
            • index string Required
        • pass_through object

          Pass through configuration options

          Hide pass_through attributes Show pass_through attributes object
          • tokenization object

            Tokenization options stored in inference configuration

            Hide tokenization attributes Show tokenization attributes object
            • bert
            • bert_ja
            • mpnet
            • roberta
            • xlm_roberta
          • results_field string

            The field that is added to incoming documents to contain the inference prediction. Defaults to predicted_value.

          • vocabulary object
            Hide vocabulary attribute Show vocabulary attribute object
            • index string Required
        • text_embedding object

          Text embedding inference options

          Hide text_embedding attributes Show text_embedding attributes object
          • embedding_size number

            The number of dimensions in the embedding output

          • tokenization object

            Tokenization options stored in inference configuration

            Hide tokenization attributes Show tokenization attributes object
            • bert
            • bert_ja
            • mpnet
            • roberta
            • xlm_roberta
          • results_field string

            The field that is added to incoming documents to contain the inference prediction. Defaults to predicted_value.

          • vocabulary object Required
            Hide vocabulary attribute Show vocabulary attribute object
            • index string Required
        • text_expansion object

          Text expansion inference options

          Hide text_expansion attributes Show text_expansion attributes object
          • tokenization object

            Tokenization options stored in inference configuration

            Hide tokenization attributes Show tokenization attributes object
            • bert
            • bert_ja
            • mpnet
            • roberta
            • xlm_roberta
          • results_field string

            The field that is added to incoming documents to contain the inference prediction. Defaults to predicted_value.

          • vocabulary object Required
            Hide vocabulary attribute Show vocabulary attribute object
            • index string Required
        • question_answering object

          Question answering inference options

          Hide question_answering attributes Show question_answering attributes object
          • num_top_classes number

            Specifies the number of top class predictions to return. Defaults to 0.

          • tokenization object

            Tokenization options stored in inference configuration

            Hide tokenization attributes Show tokenization attributes object
            • bert
            • bert_ja
            • mpnet
            • roberta
            • xlm_roberta
          • results_field string

            The field that is added to incoming documents to contain the inference prediction. Defaults to predicted_value.

          • max_answer_length number

            The maximum answer length to consider

      • input object Required
        Hide input attribute Show input attribute object
        • field_names array[string] Required

          An array of input field names for the model.

      • license_level string

        The license level of the trained model.

      • metadata object
        Hide metadata attributes Show metadata attributes object
        • model_aliases array[string]
        • feature_importance_baseline object

          An object that contains the baseline for feature importance values. For regression analysis, it is a single value. For classification analysis, there is a value for each class.

          Hide feature_importance_baseline attribute Show feature_importance_baseline attribute object
          • * string Additional properties
        • hyperparameters array[object]

          List of the available hyperparameters optimized during the fine_parameter_tuning phase as well as specified by the user.

          Hide hyperparameters attributes Show hyperparameters attributes object
          • absolute_importance number

            A positive number showing how much the parameter influences the variation of the loss function. For hyperparameters with values that are not specified by the user but tuned during hyperparameter optimization.

          • name string Required
          • relative_importance number

            A number between 0 and 1 showing the proportion of influence on the variation of the loss function among all tuned hyperparameters. For hyperparameters with values that are not specified by the user but tuned during hyperparameter optimization.

          • supplied boolean Required

            Indicates if the hyperparameter is specified by the user (true) or optimized (false).

          • value number Required

            The value of the hyperparameter, either optimized or specified by the user.

        • total_feature_importance array[object]

          An array of the total feature importance for each feature used from the training data set. This array of objects is returned if data frame analytics trained the model and the request includes total_feature_importance in the include request parameter.

          Hide total_feature_importance attributes Show total_feature_importance attributes object
          • feature_name string Required
          • importance array[object] Required

            A collection of feature importance statistics related to the training data set for this particular feature.

          • classes array[object] Required

            If the trained model is a classification model, feature importance statistics are gathered per target class value.

      • model_size_bytes number | string

      • model_package object
        Hide model_package attributes Show model_package attributes object
        • create_time number

          Time unit for milliseconds

        • description string
        • inference_config object
          Hide inference_config attribute Show inference_config attribute object
          • * object Additional properties
        • metadata object
          Hide metadata attribute Show metadata attribute object
          • * object Additional properties
        • minimum_version string
        • model_repository string
        • model_type string
        • packaged_model_id string Required
        • platform_architecture string
        • prefix_strings object
          Hide prefix_strings attributes Show prefix_strings attributes object
          • ingest string

            String prepended to input at ingest

        • size number | string

        • sha256 string
        • tags array[string]
        • vocabulary_file string
      • location object
        Hide location attribute Show location attribute object
        • index object Required
          Hide index attribute Show index attribute object
          • name string Required
      • platform_architecture string
      • prefix_strings object
        Hide prefix_strings attributes Show prefix_strings attributes object
        • ingest string

          String prepended to input at ingest

GET /_ml/trained_models/{model_id}
GET _ml/trained_models/
resp = client.ml.get_trained_models()
const response = await client.ml.getTrainedModels();
response = client.ml.get_trained_models
$resp = $client->ml()->getTrainedModels();
curl -X GET -H "Authorization: ApiKey $ELASTIC_API_KEY" "$ELASTICSEARCH_URL/_ml/trained_models/"

































































Get a query ruleset Generally available

GET /_query_rules/{ruleset_id}

Get details about a query ruleset.

Required authorization

  • Cluster privileges: manage_search_query_rules

Path parameters

  • ruleset_id string Required

    The unique identifier of the query ruleset

Responses

  • 200 application/json
    Hide response attributes Show response attributes object
    • ruleset_id string Required
    • rules array[object] Required

      Rules associated with the query ruleset.

      Hide rules attributes Show rules attributes object
      • rule_id string Required
      • type string Required

        Values are pinned or exclude.

      • criteria object | array[object] Required

        The criteria that must be met for the rule to be applied. If multiple criteria are specified for a rule, all criteria must be met for the rule to be applied.

        One of:
        Hide attributes Show attributes
        • type string Required

          Values are global, exact, exact_fuzzy, fuzzy, prefix, suffix, contains, lt, lte, gt, gte, or always.

        • metadata string

          The metadata field to match against. This metadata will be used to match against match_criteria sent in the rule. It is required for all criteria types except always.

        • values array[object]

          The values to match against the metadata field. Only one value must match for the criteria to be met. It is required for all criteria types except always.

      • actions object Required
        Hide actions attributes Show actions attributes object
        • ids array[string]

          The unique document IDs of the documents to apply the rule to. Only one of ids or docs may be specified and at least one must be specified.

        • docs array[object]

          The documents to apply the rule to. Only one of ids or docs may be specified and at least one must be specified. There is a maximum value of 100 documents in a rule. You can specify the following attributes for each document:

          • _index: The index of the document to pin.
          • _id: The unique document ID.
          Hide docs attributes Show docs attributes object
          • _id string Required
          • _index string
      • priority number
GET /_query_rules/{ruleset_id}
GET _query_rules/my-ruleset/
resp = client.query_rules.get_ruleset(
    ruleset_id="my-ruleset",
)
const response = await client.queryRules.getRuleset({
  ruleset_id: "my-ruleset",
});
response = client.query_rules.get_ruleset(
  ruleset_id: "my-ruleset"
)
$resp = $client->queryRules()->getRuleset([
    "ruleset_id" => "my-ruleset",
]);
curl -X GET -H "Authorization: ApiKey $ELASTIC_API_KEY" "$ELASTICSEARCH_URL/_query_rules/my-ruleset/"
Response examples (200)
A successful response from `GET _query_rules/my-ruleset/`.
{
    "ruleset_id": "my-ruleset",
    "rules": [
        {
            "rule_id": "my-rule1",
            "type": "pinned",
            "criteria": [
                {
                    "type": "contains",
                    "metadata": "query_string",
                    "values": [ "pugs", "puggles" ]
                }
            ],
            "actions": {
                "ids": [
                    "id1",
                    "id2"
                ]
            }
        },
        {
            "rule_id": "my-rule2",
            "type": "pinned",
            "criteria": [
                {
                    "type": "fuzzy",
                    "metadata": "query_string",
                    "values": [ "rescue dogs" ]
                }
            ],
            "actions": {
                "docs": [
                    {
                        "_index": "index1",
                        "_id": "id3"
                    },
                    {
                        "_index": "index2",
                        "_id": "id4"
                    }
                ]
            }
        }
    ]
}

















Get a script or search template Generally available

GET /_scripts/{id}

Retrieves a stored script or search template.

Required authorization

  • Cluster privileges: manage

Path parameters

  • id string Required

    The identifier for the stored script or search template.

Query parameters

  • master_timeout string

    The period to wait for the master node. If the master node is not available before the timeout expires, the request fails and returns an error. It can also be set to -1 to indicate that the request should never timeout.

    Values are -1 or 0.

Responses

  • 200 application/json
    Hide response attributes Show response attributes object
    • _id string Required
    • found boolean Required
    • script object
      Hide script attributes Show script attributes object
      • lang string Required

        Any of:

        Values are painless, expression, mustache, or java.

      • options object
        Hide options attribute Show options attribute object
        • * string Additional properties
      • source string | object Required

        One of:
GET _scripts/my-search-template
resp = client.get_script(
    id="my-search-template",
)
const response = await client.getScript({
  id: "my-search-template",
});
response = client.get_script(
  id: "my-search-template"
)
$resp = $client->getScript([
    "id" => "my-search-template",
]);
curl -X GET -H "Authorization: ApiKey $ELASTIC_API_KEY" "$ELASTICSEARCH_URL/_scripts/my-search-template"




















Run a script Technical preview

GET /_scripts/painless/_execute

Runs a script and returns a result. Use this API to build and test scripts, such as when defining a script for a runtime field. This API requires very few dependencies and is especially useful if you don't have permissions to write documents on a cluster.

The API uses several contexts, which control how scripts are run, what variables are available at runtime, and what the return type is.

Each context requires a script, but additional parameters depend on the context you're using for that script.

application/json

Body

  • context string

    Values are painless_test, filter, score, boolean_field, date_field, double_field, geo_point_field, ip_field, keyword_field, long_field, or composite_field.

  • context_setup object
    Hide context_setup attributes Show context_setup attributes object
    • document object Required

      Document that's temporarily indexed in-memory and accessible from the script.

    • index string Required
    • query object

      An Elasticsearch Query DSL (Domain Specific Language) object that defines a query.

      External documentation
  • script object
    Hide script attributes Show script attributes object
    • source string | object

      One of:
    • id string
    • params object

      Specifies any named parameters that are passed into the script as variables. Use parameters instead of hard-coded values to decrease compile time.

      Hide params attribute Show params attribute object
      • * object Additional properties
    • lang string

      Any of:

      Values are painless, expression, mustache, or java.

    • options object
      Hide options attribute Show options attribute object
      • * string Additional properties

Responses

  • 200 application/json
    Hide response attribute Show response attribute object
    • result object Required
GET /_scripts/painless/_execute
POST /_scripts/painless/_execute
{
  "script": {
    "source": "params.count / params.total",
    "params": {
      "count": 100.0,
      "total": 1000.0
    }
  }
}
resp = client.scripts_painless_execute(
    script={
        "source": "params.count / params.total",
        "params": {
            "count": 100,
            "total": 1000
        }
    },
)
const response = await client.scriptsPainlessExecute({
  script: {
    source: "params.count / params.total",
    params: {
      count: 100,
      total: 1000,
    },
  },
});
response = client.scripts_painless_execute(
  body: {
    "script": {
      "source": "params.count / params.total",
      "params": {
        "count": 100,
        "total": 1000
      }
    }
  }
)
$resp = $client->scriptsPainlessExecute([
    "body" => [
        "script" => [
            "source" => "params.count / params.total",
            "params" => [
                "count" => 100,
                "total" => 1000,
            ],
        ],
    ],
]);
curl -X POST -H "Authorization: ApiKey $ELASTIC_API_KEY" -H "Content-Type: application/json" -d '{"script":{"source":"params.count / params.total","params":{"count":100,"total":1000}}}' "$ELASTICSEARCH_URL/_scripts/painless/_execute"
Run `POST /_scripts/painless/_execute`. The `painless_test` context is the default context. It runs scripts without additional parameters. The only variable that is available is `params`, which can be used to access user defined values. The result of the script is always converted to a string.
{
  "script": {
    "source": "params.count / params.total",
    "params": {
      "count": 100.0,
      "total": 1000.0
    }
  }
}
Run `POST /_scripts/painless/_execute` with a `filter` context. It treats scripts as if they were run inside a script query. For testing purposes, a document must be provided so that it will be temporarily indexed in-memory and is accessible from the script. More precisely, the `_source`, stored fields, and doc values of such a document are available to the script being tested.
{
  "script": {
    "source": "doc['field'].value.length() <= params.max_length",
    "params": {
      "max_length": 4
    }
  },
  "context": "filter",
  "context_setup": {
    "index": "my-index-000001",
    "document": {
      "field": "four"
    }
  }
}
Run `POST /_scripts/painless/_execute` with a `score` context. It treats scripts as if they were run inside a `script_score` function in a `function_score` query.
{
  "script": {
    "source": "doc['rank'].value / params.max_rank",
    "params": {
      "max_rank": 5.0
    }
  },
  "context": "score",
  "context_setup": {
    "index": "my-index-000001",
    "document": {
      "rank": 4
    }
  }
}
Response examples (200)
A successful response from `POST /_scripts/painless/_execute` with a `painless_test` context.
{
  "result": "0.1"
}
A successful response from `POST /_scripts/painless/_execute` with a `filter` context.
{
  "result": true
}
A successful response from `POST /_scripts/painless/_execute` with a `score` context.
{
  "result": 0.8
}













































Clear a scrolling search Generally available

DELETE /_search/scroll/{scroll_id}

Clear the search context and results for a scrolling search.

External documentation

Path parameters

  • scroll_id string | array[string] Required Deprecated

    A comma-separated list of scroll IDs to clear. To clear all scroll IDs, use _all. IMPORTANT: Scroll IDs can be long. It is recommended to specify scroll IDs in the request body parameter.

application/json

Body

Responses

  • 200 application/json
    Hide response attributes Show response attributes object
    • succeeded boolean Required

      If true, the request succeeded. This does not indicate whether any scrolling search requests were cleared.

    • num_freed number Required

      The number of scrolling search requests cleared.

DELETE /_search/scroll/{scroll_id}
DELETE /_search/scroll
{
  "scroll_id": "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAAD4WYm9laVYtZndUQlNsdDcwakFMNjU1QQ=="
}
resp = client.clear_scroll(
    scroll_id="DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAAD4WYm9laVYtZndUQlNsdDcwakFMNjU1QQ==",
)
const response = await client.clearScroll({
  scroll_id: "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAAD4WYm9laVYtZndUQlNsdDcwakFMNjU1QQ==",
});
response = client.clear_scroll(
  body: {
    "scroll_id": "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAAD4WYm9laVYtZndUQlNsdDcwakFMNjU1QQ=="
  }
)
$resp = $client->clearScroll([
    "body" => [
        "scroll_id" => "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAAD4WYm9laVYtZndUQlNsdDcwakFMNjU1QQ==",
    ],
]);
curl -X DELETE -H "Authorization: ApiKey $ELASTIC_API_KEY" -H "Content-Type: application/json" -d '{"scroll_id":"DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAAD4WYm9laVYtZndUQlNsdDcwakFMNjU1QQ=="}' "$ELASTICSEARCH_URL/_search/scroll"
Request example
Run `DELETE /_search/scroll` to clear the search context and results for a scrolling search.
{
  "scroll_id": "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAAD4WYm9laVYtZndUQlNsdDcwakFMNjU1QQ=="
}