Create a transform Generally available

PUT /_transform/{transform_id}

Creates a transform.

A transform copies data from source indices, transforms it, and persists it into an entity-centric destination index. You can also think of the destination index as a two-dimensional tabular data structure (known as a data frame). The ID for each document in the data frame is generated from a hash of the entity, so there is a unique row per entity.

You must choose either the latest or pivot method for your transform; you cannot use both in a single transform. If you choose to use the pivot method for your transform, the entities are defined by the set of group_by fields in the pivot object. If you choose to use the latest method, the entities are defined by the unique_key field values in the latest object.
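The rule above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API: the helper names `transform_method` and `entity_fields` are hypothetical, and the body is assumed to be a plain dictionary shaped like the request body described on this page.

```python
def transform_method(body: dict) -> str:
    """Return 'pivot' or 'latest'; raise if neither or both are present."""
    methods = [name for name in ("pivot", "latest") if name in body]
    if len(methods) != 1:
        raise ValueError(
            f"expected exactly one of 'pivot' or 'latest', found: {methods or 'none'}"
        )
    return methods[0]


def entity_fields(body: dict) -> list:
    """List the names that define an entity for this transform."""
    if transform_method(body) == "pivot":
        # Pivot: entities are defined by the set of group_by fields.
        return list(body["pivot"]["group_by"])
    # Latest: entities are defined by the unique_key field values.
    return list(body["latest"]["unique_key"])


latest_body = {"latest": {"unique_key": ["customer_id"], "sort": "order_date"}}
print(transform_method(latest_body), entity_fields(latest_body))
```

A body that contains both `pivot` and `latest`, or neither, raises `ValueError` in this sketch; the server performs its own validation regardless.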

You must have create_index, index, and read privileges on the destination index and read and view_index_metadata privileges on the source indices. When Elasticsearch security features are enabled, the transform remembers which roles the user that created it had at the time of creation and uses those same roles. If those roles do not have the required privileges on the source and destination indices, the transform fails when it attempts unauthorized operations.

NOTE: You must use Kibana or this API to create a transform. Do not add a transform directly into any .transform-internal* indices using the Elasticsearch index API. If Elasticsearch security features are enabled, do not give users any privileges on .transform-internal* indices. If you used transforms prior to 7.5, also do not give users any privileges on .data-frame-internal* indices.

Required authorization

  • Index privileges: create_index, read, index, view_index_metadata
  • Cluster privileges: manage_transform
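One way to satisfy these requirements is a dedicated role. The sketch below builds a role body with a `cluster` list and an `indices` list of `names`/`privileges` entries; that shape is an assumption about the Elasticsearch security role API and is not defined on this page. The helper name and the index names are illustrative only.

```python
import json


def transform_role_body(source_indices, dest_index):
    """Build a role body granting the privileges listed above (hypothetical helper)."""
    return {
        "cluster": ["manage_transform"],
        "indices": [
            {
                # Source indices: read access plus index metadata.
                "names": list(source_indices),
                "privileges": ["read", "view_index_metadata"],
            },
            {
                # Destination index: create it, write to it, and read it back.
                "names": [dest_index],
                "privileges": ["create_index", "index", "read"],
            },
        ],
    }


role = transform_role_body(
    ["kibana_sample_data_ecommerce"],
    "kibana_sample_data_ecommerce_transform1",
)
print(json.dumps(role, indent=2))
```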

Path parameters

  • transform_id string Required

    Identifier for the transform. This identifier can contain lowercase alphanumeric characters (a-z and 0-9), hyphens, and underscores. It has a 64 character limit and must start and end with alphanumeric characters.
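The identifier rules can be expressed as a regular expression plus a length check. This is a client-side sketch under the rules stated above; the function name `is_valid_transform_id` is hypothetical, and the server remains the authority on what it accepts.

```python
import re

# Starts and ends with a lowercase letter or digit; hyphens and underscores
# are allowed only in between.
_TRANSFORM_ID = re.compile(r"[a-z0-9](?:[a-z0-9_-]*[a-z0-9])?")


def is_valid_transform_id(transform_id: str) -> bool:
    """Check the documented character, boundary, and 64-character rules."""
    return (
        len(transform_id) <= 64
        and _TRANSFORM_ID.fullmatch(transform_id) is not None
    )


print(is_valid_transform_id("ecommerce_transform1"))  # True
print(is_valid_transform_id("_ecommerce"))            # False: starts with an underscore
```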

Query parameters

  • defer_validation boolean

    When the transform is created, a series of validations occur to ensure its success. For example, there is a check for the existence of the source indices and a check that the destination index is not part of the source index pattern. You can use this parameter to skip the checks, for example when the source index does not exist until after the transform is created. With the exception of privilege checks, however, the validations always run when you start the transform.

  • timeout string

    Period to wait for a response. If no response is received before the timeout expires, the request fails and returns an error.

    The value is a time duration string (for example, 30s); the values -1 and 0 are also accepted.
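As an illustration of how the path and query parameters combine, the sketch below assembles the request URL with the Python standard library only. The base URL `http://localhost:9200` and the helper name `put_transform_url` are assumptions made for the example.

```python
from urllib.parse import quote, urlencode


def put_transform_url(base_url, transform_id, defer_validation=None, timeout=None):
    """Build the PUT /_transform/{transform_id} URL with optional query parameters."""
    params = {}
    if defer_validation is not None:
        # Booleans are sent as lowercase strings in the query string.
        params["defer_validation"] = "true" if defer_validation else "false"
    if timeout is not None:
        params["timeout"] = timeout
    url = f"{base_url.rstrip('/')}/_transform/{quote(transform_id, safe='')}"
    return f"{url}?{urlencode(params)}" if params else url


print(put_transform_url(
    "http://localhost:9200",
    "ecommerce_transform1",
    defer_validation=True,
    timeout="30s",
))
# http://localhost:9200/_transform/ecommerce_transform1?defer_validation=true&timeout=30s
```

Setting `defer_validation` is useful mainly when the source index is created after the transform; the checks still run when the transform starts.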

application/json

Body Required

  • dest object Required

    The destination for the transform.

    • index string

      The destination index for the transform. The mappings of the destination index are deduced based on the source fields when possible. If alternate mappings are required, use the create index API prior to starting the transform.

    • pipeline string

      The unique identifier for an ingest pipeline.

  • description string

    Free text description of the transform.

  • frequency string

    The interval between checks for changes in the source indices when the transform is running continuously. Also determines the retry interval in the event of transient failures while the transform is searching or indexing. The minimum value is 1s and the maximum is 1h.

  • latest object

    The latest method transforms the data by finding the latest document for each unique key.

    • sort string Required

      Specifies the date field that is used to identify the latest documents.

    • unique_key array[string] Required

      Specifies an array of one or more fields that are used to group the data.

  • _meta object

    Defines optional transform metadata.

    • * object Additional properties
  • pivot object

    The pivot method transforms the data by aggregating and grouping it. These objects define the group by fields and the aggregation to reduce the data.

    • aggregations object

      Defines how to aggregate the grouped data. The following aggregations are currently supported: average, bucket script, bucket selector, cardinality, filter, geo bounds, geo centroid, geo line, max, median absolute deviation, min, missing, percentiles, rare terms, scripted metric, stats, sum, terms, top metrics, value count, weighted average.

    • group_by object

      Defines how to group the data. More than one grouping can be defined per pivot. The following groupings are currently supported: date histogram, geotile grid, histogram, terms.

      • * object Additional properties
  • retention_policy object

    Defines a retention policy for the transform. Data that meets the defined criteria is deleted from the destination index.

    • time object

      Specifies that the transform uses a time field to set the retention policy.

      • field string Required

        The date field that is used to calculate the age of the document.

      • max_age string Required

        Specifies the maximum age of a document in the destination index. Documents that are older than the configured value are removed from the destination index.

  • settings object

    Defines optional transform settings.

    • align_checkpoints boolean

      Specifies whether the transform checkpoint ranges should be optimized for performance. Such optimization can align checkpoint ranges with the date histogram interval when date histogram is specified as a group source in the transform config. As a result, fewer document updates are performed in the destination index, which improves overall performance.

      Default value is true.

    • dates_as_epoch_millis boolean

      Defines whether dates in the output are written as ISO-formatted strings or as milliseconds since the epoch. epoch_millis was the default for transforms created before version 7.11. For compatible output, set this value to true.

      Default value is false.

    • deduce_mappings boolean

      Specifies whether the transform should deduce the destination index mappings from the transform configuration.

      Default value is true.

    • docs_per_second number

      Specifies a limit on the number of input documents per second. This setting throttles the transform by adding a wait time between search requests. The default value is null, which disables throttling.

    • max_page_search_size number

      Defines the initial page size to use for the composite aggregation for each checkpoint. If circuit breaker exceptions occur, the page size is dynamically adjusted to a lower value. The minimum value is 10 and the maximum is 65,536.

      Default value is 500.

    • unattended boolean Generally available

      If true, the transform runs in unattended mode. In unattended mode, the transform retries indefinitely in case of an error, which means the transform never fails. Setting the number of retries to anything other than infinite fails validation.

      Default value is false.

  • source object Required

    The source of the data for the transform.

    • index string | array[string] Required

      The source indices for the transform. It can be a single index, an index pattern (for example, "my-index-*"), an array of indices (for example, ["my-index-000001", "my-index-000002"]), or an array of index patterns (for example, ["my-index-*", "my-other-index-*"]). For remote indices, use the syntax "remote_name:index_name". If any indices are in remote clusters, the master node and at least one transform node must have the remote_cluster_client node role.

    • runtime_mappings object

      Definitions of search-time runtime fields that can be used by the transform. For search runtime fields, all data nodes, including remote nodes, must be 7.12 or later.

      • * object Additional properties
        • fields object

          For type composite

          • * object Additional properties
        • fetch_fields array[object]

          For type lookup

        • format string

          A custom format for date type runtime fields.

        • input_field string

          For type lookup

        • target_field string

          For type lookup

        • target_index string

          For type lookup

        • script object

          Painless script executed at query time.

        • type string Required

          Field type. Values are boolean, composite, date, double, geo_point, geo_shape, ip, keyword, long, or lookup.

    • query object

      A query clause that retrieves a subset of data from the source index.

      Query DSL
  • sync object

    Defines the properties transforms require to run continuously.

    • time object

      Specifies that the transform uses a time field to synchronize the source and destination indices.

      • delay string

        The time delay between the current time and the latest input data time.

      • field string Required

        The date field that is used to identify new documents in the source. In general, it’s a good idea to use a field that contains the ingest timestamp. If you use a different field, you might need to set the delay such that it accounts for data transmission delays.
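Several body properties above carry explicit limits: `source` and `dest` are required, `frequency` must be between 1s and 1h, and `settings.max_page_search_size` must be between 10 and 65,536. The sketch below checks those limits locally. The helper names are hypothetical, and the duration parser assumes only the `ms`, `s`, `m`, `h`, and `d` suffixes, which is a simplification of the time units Elasticsearch accepts.

```python
import re

# Assumed subset of duration suffixes, expressed in seconds.
_UNIT_SECONDS = {"ms": 0.001, "s": 1, "m": 60, "h": 3600, "d": 86400}


def duration_seconds(value: str) -> float:
    """Convert a duration such as '5m' or '60s' to seconds."""
    match = re.fullmatch(r"(\d+)(ms|s|m|h|d)", value)
    if match is None:
        raise ValueError(f"unsupported duration: {value!r}")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]


def check_limits(body: dict) -> list:
    """Return a list of problems found against the documented limits."""
    problems = []
    for required in ("source", "dest"):
        if required not in body:
            problems.append(f"missing required object: {required}")
    if "frequency" in body and not 1 <= duration_seconds(body["frequency"]) <= 3600:
        problems.append("frequency must be between 1s and 1h")
    page_size = body.get("settings", {}).get("max_page_search_size")
    if page_size is not None and not 10 <= page_size <= 65536:
        problems.append("max_page_search_size must be between 10 and 65536")
    return problems


print(check_limits({"source": {}, "dest": {}, "frequency": "2h"}))
# ['frequency must be between 1s and 1h']
```

An empty list means that none of the checked limits is violated; it does not mean that the server will accept the request.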

Responses

  • 200 application/json
    • acknowledged boolean Required

      For a successful response, this value is always true. On failure, an exception is returned instead.

Python
resp = client.transform.put_transform(
    transform_id="ecommerce_transform1",
    source={
        "index": "kibana_sample_data_ecommerce",
        "query": {
            "term": {
                "geoip.continent_name": {
                    "value": "Asia"
                }
            }
        }
    },
    pivot={
        "group_by": {
            "customer_id": {
                "terms": {
                    "field": "customer_id",
                    "missing_bucket": True
                }
            }
        },
        "aggregations": {
            "max_price": {
                "max": {
                    "field": "taxful_total_price"
                }
            }
        }
    },
    description="Maximum priced ecommerce data by customer_id in Asia",
    dest={
        "index": "kibana_sample_data_ecommerce_transform1",
        "pipeline": "add_timestamp_pipeline"
    },
    frequency="5m",
    sync={
        "time": {
            "field": "order_date",
            "delay": "60s"
        }
    },
    retention_policy={
        "time": {
            "field": "order_date",
            "max_age": "30d"
        }
    },
)
JavaScript
const response = await client.transform.putTransform({
  transform_id: "ecommerce_transform1",
  source: {
    index: "kibana_sample_data_ecommerce",
    query: {
      term: {
        "geoip.continent_name": {
          value: "Asia",
        },
      },
    },
  },
  pivot: {
    group_by: {
      customer_id: {
        terms: {
          field: "customer_id",
          missing_bucket: true,
        },
      },
    },
    aggregations: {
      max_price: {
        max: {
          field: "taxful_total_price",
        },
      },
    },
  },
  description: "Maximum priced ecommerce data by customer_id in Asia",
  dest: {
    index: "kibana_sample_data_ecommerce_transform1",
    pipeline: "add_timestamp_pipeline",
  },
  frequency: "5m",
  sync: {
    time: {
      field: "order_date",
      delay: "60s",
    },
  },
  retention_policy: {
    time: {
      field: "order_date",
      max_age: "30d",
    },
  },
});
Ruby
response = client.transform.put_transform(
  transform_id: "ecommerce_transform1",
  body: {
    "source": {
      "index": "kibana_sample_data_ecommerce",
      "query": {
        "term": {
          "geoip.continent_name": {
            "value": "Asia"
          }
        }
      }
    },
    "pivot": {
      "group_by": {
        "customer_id": {
          "terms": {
            "field": "customer_id",
            "missing_bucket": true
          }
        }
      },
      "aggregations": {
        "max_price": {
          "max": {
            "field": "taxful_total_price"
          }
        }
      }
    },
    "description": "Maximum priced ecommerce data by customer_id in Asia",
    "dest": {
      "index": "kibana_sample_data_ecommerce_transform1",
      "pipeline": "add_timestamp_pipeline"
    },
    "frequency": "5m",
    "sync": {
      "time": {
        "field": "order_date",
        "delay": "60s"
      }
    },
    "retention_policy": {
      "time": {
        "field": "order_date",
        "max_age": "30d"
      }
    }
  }
)
PHP
$resp = $client->transform()->putTransform([
    "transform_id" => "ecommerce_transform1",
    "body" => [
        "source" => [
            "index" => "kibana_sample_data_ecommerce",
            "query" => [
                "term" => [
                    "geoip.continent_name" => [
                        "value" => "Asia",
                    ],
                ],
            ],
        ],
        "pivot" => [
            "group_by" => [
                "customer_id" => [
                    "terms" => [
                        "field" => "customer_id",
                        "missing_bucket" => true,
                    ],
                ],
            ],
            "aggregations" => [
                "max_price" => [
                    "max" => [
                        "field" => "taxful_total_price",
                    ],
                ],
            ],
        ],
        "description" => "Maximum priced ecommerce data by customer_id in Asia",
        "dest" => [
            "index" => "kibana_sample_data_ecommerce_transform1",
            "pipeline" => "add_timestamp_pipeline",
        ],
        "frequency" => "5m",
        "sync" => [
            "time" => [
                "field" => "order_date",
                "delay" => "60s",
            ],
        ],
        "retention_policy" => [
            "time" => [
                "field" => "order_date",
                "max_age" => "30d",
            ],
        ],
    ],
]);
curl
curl -X PUT \
  -H "Authorization: ApiKey $ELASTIC_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"source":{"index":"kibana_sample_data_ecommerce","query":{"term":{"geoip.continent_name":{"value":"Asia"}}}},"pivot":{"group_by":{"customer_id":{"terms":{"field":"customer_id","missing_bucket":true}}},"aggregations":{"max_price":{"max":{"field":"taxful_total_price"}}}},"description":"Maximum priced ecommerce data by customer_id in Asia","dest":{"index":"kibana_sample_data_ecommerce_transform1","pipeline":"add_timestamp_pipeline"},"frequency":"5m","sync":{"time":{"field":"order_date","delay":"60s"}},"retention_policy":{"time":{"field":"order_date","max_age":"30d"}}}' \
  "$ELASTICSEARCH_URL/_transform/ecommerce_transform1"
Java
client.transform().putTransform(p -> p
    .description("Maximum priced ecommerce data by customer_id in Asia")
    .dest(d -> d
        .index("kibana_sample_data_ecommerce_transform1")
        .pipeline("add_timestamp_pipeline")
    )
    .frequency(f -> f
        .time("5m")
    )
    .pivot(pi -> pi
        .aggregations("max_price", a -> a
            .max(m -> m
                .field("taxful_total_price")
            )
        )
        .groupBy("customer_id", g -> g
            .terms(t -> t
                .field("customer_id")
                .missingBucket(true)
            )
        )
    )
    .retentionPolicy(r -> r
        .time(t -> t
            .field("order_date")
            .maxAge(m -> m
                .time("30d")
            )
        )
    )
    .source(s -> s
        .index("kibana_sample_data_ecommerce")
        .query(q -> q
            .term(te -> te
                .field("geoip.continent_name")
                .value(FieldValue.of("Asia"))
            )
        )
    )
    .sync(sy -> sy
        .time(ti -> ti
            .delay(d -> d
                .time("60s")
            )
            .field("order_date")
        )
    )
    .transformId("ecommerce_transform1")
);
Request examples
Run `PUT _transform/ecommerce_transform1` to create a transform that uses the pivot method.
{
  "source": {
    "index": "kibana_sample_data_ecommerce",
    "query": {
      "term": {
        "geoip.continent_name": {
          "value": "Asia"
        }
      }
    }
  },
  "pivot": {
    "group_by": {
      "customer_id": {
        "terms": {
          "field": "customer_id",
          "missing_bucket": true
        }
      }
    },
    "aggregations": {
      "max_price": {
        "max": {
          "field": "taxful_total_price"
        }
      }
    }
  },
  "description": "Maximum priced ecommerce data by customer_id in Asia",
  "dest": {
    "index": "kibana_sample_data_ecommerce_transform1",
    "pipeline": "add_timestamp_pipeline"
  },
  "frequency": "5m",
  "sync": {
    "time": {
      "field": "order_date",
      "delay": "60s"
    }
  },
  "retention_policy": {
    "time": {
      "field": "order_date",
      "max_age": "30d"
    }
  }
}
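To make the effect of the pivot request concrete, the following sketch reproduces its grouping and aggregation in memory. It is an illustration only, not how Elasticsearch executes a transform, and it ignores `sync` and `retention_policy`. The sample documents and the function name `pivot_max_price` are invented for the example.

```python
def pivot_max_price(docs):
    """Group by customer_id and keep the maximum taxful_total_price per group."""
    rows = {}
    for doc in docs:
        # source.query: keep only documents whose continent is Asia.
        if doc.get("geoip", {}).get("continent_name") != "Asia":
            continue
        # group_by.customer_id with missing_bucket=true: documents without
        # the field are grouped under a None key instead of being dropped.
        key = doc.get("customer_id")
        price = doc["taxful_total_price"]
        # aggregations.max_price: max over taxful_total_price.
        rows[key] = max(rows.get(key, price), price)
    return [{"customer_id": key, "max_price": value} for key, value in rows.items()]


# Hypothetical sample documents.
sample_docs = [
    {"customer_id": "10", "taxful_total_price": 20.0, "geoip": {"continent_name": "Asia"}},
    {"customer_id": "10", "taxful_total_price": 35.5, "geoip": {"continent_name": "Asia"}},
    {"customer_id": "11", "taxful_total_price": 99.0, "geoip": {"continent_name": "Europe"}},
    {"taxful_total_price": 12.0, "geoip": {"continent_name": "Asia"}},
]
print(pivot_max_price(sample_docs))
# [{'customer_id': '10', 'max_price': 35.5}, {'customer_id': None, 'max_price': 12.0}]
```

Each output row corresponds to one entity, which matches the one-row-per-entity shape of the destination index.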
Run `PUT _transform/ecommerce_transform2` to create a transform that uses the latest method.
{
  "source": {
    "index": "kibana_sample_data_ecommerce"
  },
  "latest": {
    "unique_key": [
      "customer_id"
    ],
    "sort": "order_date"
  },
  "description": "Latest order for each customer",
  "dest": {
    "index": "kibana_sample_data_ecommerce_transform2"
  },
  "frequency": "5m",
  "sync": {
    "time": {
      "field": "order_date",
      "delay": "60s"
    }
  }
}
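The latest method can be pictured in the same way: for each unique key, only the document with the greatest sort value is kept. The sketch below is an in-memory illustration under that reading, not the server implementation. The sample documents, the `order_id` field, and the function name `latest_by_key` are invented, and the timestamps are assumed to be ISO 8601 strings in one format so that string comparison orders them correctly.

```python
def latest_by_key(docs, unique_key, sort):
    """Keep, for each unique_key combination, the document with the greatest sort value."""
    latest = {}
    for doc in docs:
        key = tuple(doc[field] for field in unique_key)
        if key not in latest or doc[sort] > latest[key][sort]:
            latest[key] = doc
    return list(latest.values())


# Hypothetical sample documents.
orders = [
    {"customer_id": "10", "order_date": "2024-01-01T10:00:00Z", "order_id": 1},
    {"customer_id": "10", "order_date": "2024-01-03T09:30:00Z", "order_id": 2},
    {"customer_id": "11", "order_date": "2024-01-02T12:00:00Z", "order_id": 3},
]
newest = latest_by_key(orders, unique_key=["customer_id"], sort="order_date")
print([doc["order_id"] for doc in newest])
# [2, 3]
```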
Response examples (200)
A successful response when creating a transform.
{
  "acknowledged": true
}