At times you may find that some of the indexes in your cluster are rarely queried, but you still want to keep them around. In that case you can reduce their resource footprint by reducing the number of shards, and perhaps by increasing the refresh interval.
As for the refresh interval: if new data comes in and we don't need it to be searchable in near real time, we can set the refresh interval to, for example, 60 seconds, so that new data only becomes searchable every 60 seconds (the default is 1s).
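As a minimal sketch of that setting, the refresh interval can be changed on an existing index through the index settings API. The host and the index name `my-index-2019.01.01` are borrowed from the example in this post and are assumptions here, so adjust them for your cluster:

```shell
# Assumed host and index name; adjust for your cluster.
ES_URL="http://127.0.0.1:9200"
INDEX="my-index-2019.01.01"

# refresh_interval is a dynamic setting, so it can be updated on a live index.
SETTINGS='{"index": {"refresh_interval": "60s"}}'

curl -s -XPUT "$ES_URL/$INDEX/_settings" \
  -H 'Content-Type: application/json' \
  -d "$SETTINGS" || echo "request failed" >&2
```

Setting the value back to `"1s"` the same way restores the default behaviour.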
Reindexing Elasticsearch Indexes
In this example we have daily indexes with 5 primary shards and 1 replica each. We would like to create a weekly index with 1 primary shard, 1 replica and a refresh interval of 60 seconds, and reindex the previous week's data into that weekly index.
Create the target weekly index with the mentioned configuration:
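A minimal sketch of that request, assuming the cluster listens on 127.0.0.1:9200 as in the listings in this post. The variable names are only for illustration:

```shell
# Assumed host; the target name matches the weekly index used in this post.
ES_URL="http://127.0.0.1:9200"
TARGET="my-index-2019.01.01-07"

# 1 primary shard, 1 replica, refresh every 60 seconds.
BODY='{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1,
    "refresh_interval": "60s"
  }
}'

curl -s -XPUT "$ES_URL/$TARGET" \
  -H 'Content-Type: application/json' \
  -d "$BODY" || echo "request failed" >&2
```

The number of primary shards cannot be changed with a simple settings update after the index is created, which is why the target index is created up front with the desired layout.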
Listing the indexes should now show the empty weekly index alongside the daily ones:
$ curl -s -XGET 'http://127.0.0.1:9200/_cat/indices/my-index-2019.01.0*?v'
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
green open my-index-2019.01.01 wbFEJCApSpSlbOXzb1Tjxw 5 1 22007 0 6.6mb 3.2mb
green open my-index-2019.01.02 cbDmJR7pbpRT3O2x46fj20 5 1 28031 0 7.2mb 3.4mb
..
green open my-index-2019.01.01-07 mJR7pJ9O4T3O9jzyI943ca 1 1 0 0 466b 233b
Create the reindex job, specifying the source indexes and the destination index that the data must be reindexed into:
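One way to sketch this, assuming the daily indexes are named `my-index-2019.01.01` through `my-index-2019.01.07` (only the first two appear in the listing, so the rest are an assumption). The sources are listed explicitly because a wildcard such as `my-index-2019.01.0*` would also match the weekly index itself. The request is submitted with `wait_for_completion=false` so it returns immediately, and the second command counts the reindex tasks that are still running:

```shell
# Assumed host; the target name matches the weekly index used in this post.
ES_URL="http://127.0.0.1:9200"
TARGET="my-index-2019.01.01-07"

# Build the quoted, comma-separated list of daily source indexes (assumed names).
SOURCES=""
for day in 01 02 03 04 05 06 07; do
  SOURCES="${SOURCES:+$SOURCES, }\"my-index-2019.01.$day\""
done

BODY="{\"source\": {\"index\": [$SOURCES]}, \"dest\": {\"index\": \"$TARGET\"}}"

# wait_for_completion=false returns a task id right away instead of blocking.
curl -s -XPOST "$ES_URL/_reindex?wait_for_completion=false" \
  -H 'Content-Type: application/json' \
  -d "$BODY" || echo "request failed" >&2

# Count the reindex tasks that are still running.
# grep -c exits non-zero when the count is 0, hence the `|| true`.
curl -s "$ES_URL/_cat/tasks" | grep -c 'indices:data/write/reindex' || true
```

Re-run the last command until it prints 0.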
The reindex may take some time. Check how many reindex tasks are still running; if the response is 0, all the tasks have completed and we can have a look at our index again:
$ curl -s -XGET 'http://127.0.0.1:9200/_cat/indices/my-index-2019.01.0*?v'
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
green open my-index-2019.01.01 wbFEJCApSpSlbOXzb1Tjxw 5 1 22007 0 6.6mb 3.2mb
green open my-index-2019.01.02 cbDmJR7pbpRT3O2x46fj20 5 1 28031 0 7.2mb 3.4mb
..
green open my-index-2019.01.01-07 mJR7pJ9O4T3O9jzyI943ca 1 1 322007 0 45.9mb 22.9mb
Now that we have verified that the reindex tasks finished and can see the aggregated result in our target index, we can delete our source indexes:
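A minimal sketch of the cleanup, again assuming the daily indexes are named `my-index-2019.01.01` through `my-index-2019.01.07`. The indexes are named explicitly rather than with a wildcard, since `my-index-2019.01.0*` would also match the new weekly index. Deletion is irreversible, so compare the document counts first:

```shell
# Assumed host; adjust for your cluster.
ES_URL="http://127.0.0.1:9200"

# Build a comma-separated list of the daily indexes (assumed names).
INDEXES=""
for day in 01 02 03 04 05 06 07; do
  INDEXES="${INDEXES:+$INDEXES,}my-index-2019.01.$day"
done

# A single DELETE request accepts a comma-separated list of index names.
curl -s -XDELETE "$ES_URL/$INDEXES" || echo "request failed" >&2
```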