Time-Based Indexing Using Elasticsearch
Rollover API
Elasticsearch provides support for time-based indexing through its Rollover
API. It is offered in two forms that I found particularly interesting:
1. REST-based APIs
2. Java APIs
For testing and experimenting with how rollover actually works, the REST
endpoint is the natural starting point since it is so easy to set up and run.
We will cover both approaches in this blog.
The Rollover API follows the rollover pattern, which essentially works as
follows:
- One alias is used for indexing and points to the active index.
- Another alias points to both active and inactive indices and is used for
searching.
- The active index can have as many shards as you have hot nodes, to take
advantage of the indexing resources of all your expensive hardware.
- When the active index is too full or too old, it is rolled over: a new
index is created, and the indexing alias switches atomically from the old
index to the new one.
- The old index is moved to a cold node and shrunk down to one shard, which
can also be force-merged and compressed. However, this will not be covered
in this blog.
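The alias mechanics above can be illustrated with a small in-memory model. This is only a sketch of the pattern, not real Elasticsearch code; the `RolloverState` class and its names are my own invention:

```python
from dataclasses import dataclass, field

# Hypothetical in-memory model of the rollover pattern: one write alias
# pointing at the single active index, one search alias pointing at every
# index created so far.
@dataclass
class RolloverState:
    indices: list = field(default_factory=lambda: ["logs-000001"])

    @property
    def write_alias(self) -> str:
        # The write alias always resolves to the newest (active) index.
        return self.indices[-1]

    @property
    def search_alias(self) -> list:
        # The search alias resolves to active and inactive indices alike.
        return list(self.indices)

    def rollover(self) -> str:
        # Create the next index and atomically switch the write alias to it.
        seq = int(self.indices[-1].split("-")[-1]) + 1
        new_index = f"logs-{seq:06d}"
        self.indices.append(new_index)
        return new_index

state = RolloverState()
state.rollover()
print(state.write_alias)   # only the new active index
print(state.search_alias)  # old and new indices together
```

Note how a rollover changes what the write alias resolves to, while the search alias keeps accumulating indices; this is exactly the behavior we will observe against a real cluster below.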
REST-Based Method
We're going to create two aliases: logs-search for searches and logs-write
for indexing.
1. First, we create an index template with the search alias. Every index
matching logs-* will automatically get this alias, and we will use it only
for searches.
PUT localhost:9200/_template/logs
{
  "template": "logs-*",
  "settings": {
    "number_of_shards": 5,
    "number_of_replicas": 1
  },
  "aliases": {
    "logs-search": {}
  }
}
2. Next, we create the first index with a payload that sets the write
alias. We will index through this alias only. The rollover conditions
themselves (a maximum age of 60 seconds and a maximum of 10 documents) are
supplied later, when we call the rollover endpoint.
PUT localhost:9200/logs-000001
{
  "aliases": {
    "logs-write": {}
  }
}
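The rollover conditions (60 seconds or 10 documents) can be modeled as a simple check: Elasticsearch rolls the index over as soon as any one condition is met. A minimal sketch of that logic, with illustrative names of my own choosing:

```python
# Sketch of rollover-condition evaluation: the index rolls over as soon
# as ANY condition is satisfied. Function and parameter names are
# illustrative, not Elasticsearch API.
def should_rollover(age_seconds: float, doc_count: int,
                    max_age_seconds: float = 60, max_docs: int = 10) -> bool:
    return age_seconds >= max_age_seconds or doc_count >= max_docs

print(should_rollover(age_seconds=5, doc_count=3))    # neither condition met
print(should_rollover(age_seconds=5, doc_count=10))   # max_docs reached
print(should_rollover(age_seconds=61, doc_count=0))   # max_age exceeded
```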
3. We index some data using the alias, not the actual index name.
POST localhost:9200/logs-write/_doc/861233345
{
  "user": "kimchy",
  "post_date": "2009-11-15T14:12:12",
  "message": "trying out Elasticsearch"
}
You may see a response similar to this:
{
  "_index": "logs-000001",
  "_type": "_doc",
  "_id": "861233345",
  "_version": 3,
  "result": "updated",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "_seq_no": 2,
  "_primary_term": 1
}
4. Once at least one condition is met (for example, the index is older than
60 seconds), we trigger the rollover by calling the rollover endpoint on
the write alias, passing the conditions in the request body:
POST localhost:9200/logs-write/_rollover
{
  "conditions": {
    "max_age": "60s",
    "max_docs": 10
  }
}
Elasticsearch checks the conditions, creates the next index, and switches
the logs-write alias to it.
5. To verify that the rollover did indeed happen, write some new data to
the index (again using the alias):
POST localhost:9200/logs-write/_doc/1233
The response shows that the document was written to logs-000002, which is
the rolled-over index:
{
  "_index": "logs-000002",
  "_type": "_doc",
  "_id": "861233345",
  "_version": 3,
  "result": "updated",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "_seq_no": 2,
  "_primary_term": 1
}
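Notice how the new index name is derived: the numeric suffix of the old name is incremented (logs-000001 becomes logs-000002), with the zero padding preserved. A rough sketch of that derivation, with a function name of my own choosing:

```python
import re

# Sketch of how the rolled-over index name is derived: the trailing
# number in the old name is incremented, keeping the zero padding.
def next_index_name(name: str) -> str:
    match = re.search(r"^(.*?)(\d+)$", name)
    if match is None:
        raise ValueError(f"{name!r} does not end in a number")
    prefix, digits = match.groups()
    return f"{prefix}{int(digits) + 1:0{len(digits)}d}"

print(next_index_name("logs-000001"))  # logs-000002
print(next_index_name("logs-000009"))  # logs-000010
```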
6. For searches, however, we use the search alias, which continues to
point to all the logs-* indices because of the index template we defined
in step one. If we were to search through the logs-write alias instead, it
would only point to the latest rolled-over index, and we would miss the
documents from the previous indices.
GET localhost:9200/logs-search/_search
{
  "took": 17,
  "timed_out": false,
  "_shards": {
    "total": 20,
    "successful": 20,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 6,
    "max_score": 1,
    "hits": [
      {
        "_index": "logs-000001",
        "_type": "_doc",
        "_id": "8611234677862",
        "_score": 1,
        "_source": {
          "user": "kimchy",
          "post_date": "2009-11-15T14:12:12",
          "message": "trying out Elasticsearch"
        }
      },
      {
        "_index": "logs-000002",
        "_type": "_doc",
        "_id": "861123467jahd",
        "_score": 1,
        "_source": {
          "user": "kimchy",
          "post_date": "2009-11-15T14:12:12",
          "message": "trying out Elasticsearch"
        }
      },
      {
        "_index": "logs-000003",
        "_type": "_doc",
        "_id": "861123467jahd",
        "_score": 1,
        "_source": {
          "user": "kimchy",
          "post_date": "2009-11-15T14:12:12",
          "message": "trying out Elasticsearch"
        }
      },
      {
        "_index": "logs-000004",
        "_type": "_doc",
        "_id": "861123467jahd",
        "_score": 1,
        "_source": {
          "user": "kimchy",
          "post_date": "2009-11-15T14:12:12",
          "message": "trying out Elasticsearch"
        }
      },
      {
        "_index": "logs-000001",
        "_type": "_doc",
        "_id": "8611234677",
        "_score": 1,
        "_source": {
          "user": "kimchy",
          "post_date": "2009-11-15T14:12:12",
          "message": "trying out Elasticsearch"
        }
      },
      {
        "_index": "logs-000002",
        "_type": "_doc",
        "_id": "8611234677",
        "_score": 1,
        "_source": {
          "user": "kimchy",
          "post_date": "2009-11-15T14:12:12",
          "message": "trying out Elasticsearch"
        }
      }
    ]
  }
}
As you can see, the search result contains data from several indices
(logs-000001 through logs-000004).
7. Fetching from multiple indices was possible because the logs-search
alias points to multiple indices. To verify this, list the aliases:
GET localhost:9200/_cat/aliases?v
alias        index        filter  routing.index  routing.search
logs-search  logs-000002  -       -              -
logs-write   logs-000002  -       -              -
logs-search  logs-000001  -       -              -
Also, notice that logs-write points to just one index at a time, which is
exactly what we want.
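This check can also be scripted. A small sketch that parses `_cat/aliases`-style output and verifies the write alias resolves to exactly one index; the sample rows mirror the table above:

```python
from collections import defaultdict

# Parse _cat/aliases-style rows (alias, index, filter, routing.index,
# routing.search) and group indices per alias. Sample data only.
cat_output = """\
logs-search logs-000002 - - -
logs-write  logs-000002 - - -
logs-search logs-000001 - - -
"""

aliases = defaultdict(list)
for line in cat_output.splitlines():
    alias, index = line.split()[:2]
    aliases[alias].append(index)

# The write alias must resolve to exactly one index; the search alias
# may span many.
assert len(aliases["logs-write"]) == 1
print(sorted(aliases["logs-search"]))  # ['logs-000001', 'logs-000002']
```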