How to Migrate Elasticsearch Indexes with Zero Downtime

November 27, 2020 By Austin Story

Elasticsearch is absolutely incredible as a search data store because it abstracts away a lot of the cruft related to analyzing, distributing, and returning search results. At some point, though, you will reach a point where you need to serve searches and migrate an index at the same time. This post outlines one strategy for handling this kind of live migration.

Things get complicated when you have a search index that needs to be available at all times and you also need to add or change its mapping while continuing to serve search requests against it. For instance, say you are changing a field to a new type that is different from the old one, or adding a copy_to for a new field.
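
As a rough illustration (the index name and field names here are made up, not taken from the gem), adding a copy_to to existing fields cannot be done in place on the old index, so a new index has to be created with a mapping along these lines and the data reindexed into it:

curl -X PUT "localhost:9200/city_docs_new" -H 'Content-Type: application/json' -d '{
  "mappings": {
    "city": {
      "properties": {
        "name":     { "type": "text", "copy_to": "all_text" },
        "state":    { "type": "text", "copy_to": "all_text" },
        "all_text": { "type": "text" }
      }
    }
  }
}'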

To handle this, the gem that I use is es-elasticity. Assuming you have a document model called City::Document, you would accomplish this by calling City::Document.rebuild_index(recreate: true). This will take care of all the internals needed in order to:

  • Create the new index with the new mapping and migrate the current data into it
  • Allow all searching to continue as normal during the migration
  • Delete the old index as soon as the data is migrated

But, how the hell does this work?

The Elasticsearch documentation hints at how to handle this with index aliases here: https://www.elastic.co/guide/en/elasticsearch/guide/current/index-aliases.html

The source for how this works is here: https://github.com/doximity/es-elasticity/blob/master/lib/elasticity/strategies/alias_index.rb

A few conventions make this work:

  • All indexes have a prefix that we use to identify them; below that will be city_docs
  • All indexes are suffixed with a timestamp, for instance city_docs-2018-09-07_03:03:15.413063
  • Elasticsearch has the concept of aliases, which are “pointers” to a real index. In this strategy we keep two aliases, main and update, both of which normally reference the “current” index (an example follows this list)
  • All reads go through the main alias, which is just the index prefix: city_docs here
  • All writes go through the update alias, which is the index prefix with _update appended: city_docs_update here
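
To make the convention concrete, here is a rough sketch (not taken from the gem itself) of what creating the initial timestamped index with both aliases could look like; your settings and mappings would go in the same request body:

curl -X PUT "localhost:9200/city_docs-2018-09-07_03:03:15.413063" -H 'Content-Type: application/json' -d '{
  "aliases": {
    "city_docs": {},
    "city_docs_update": {}
  }
}'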

To start, let's look at what our setup actually looks like by asking what main and update are pointing to.

Step 1: Discovery

Find our current indexes and aliases.

Find what the main and update aliases point to right now. The %2A is a URL-encoded '*', because we normally will not know the timestamp of the index's creation.

curl localhost:9200/city_docs-%2A/_alias/city_docs
-> {"city_docs-2018-09-07_03:03:15.413063":{"aliases":{"city_docs":{}}}}

curl localhost:9200/city_docs-%2A/_alias/city_docs_update
-> {"city_docs-2018-09-07_03:03:15.413063":{"aliases":{"city_docs_update":{}}}}

main_alias = "city_docs-2018-09-07_03:03:15.413063"
update_alias = "city_docs-2018-09-07_03:03:15.413063"

Step 2: Preflight checks

Now that we have both main_alias and update_alias, we can run a couple of preflight checks and bail on the reindex if either of them fails (a rough sketch follows the list).

  • Check that the conventions are being followed, otherwise this won't work. This check fails if either the main or update alias does not resolve to exactly one index
  • Check that there isn't an update already in progress, which is the case if the main and update aliases point at different indexes
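
A minimal shell sketch of both checks, assuming jq is available (the gem does this in Ruby; this is only for illustration):

main=$(curl -s "localhost:9200/city_docs-%2A/_alias/city_docs" | jq -r 'keys[]')
update=$(curl -s "localhost:9200/city_docs-%2A/_alias/city_docs_update" | jq -r 'keys[]')

# Convention check: each alias must resolve to exactly one index
[ "$(echo "$main" | wc -l)" -eq 1 ] || { echo "main alias is broken"; exit 1; }
[ "$(echo "$update" | wc -l)" -eq 1 ] || { echo "update alias is broken"; exit 1; }

# Migration-in-progress check: both aliases must point at the same index
[ "$main" = "$update" ] || { echo "a migration is already in progress"; exit 1; }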

Step 3: Create a New Index

Now that the state of the world is right, we can begin by creating a new index with the new settings and mapping.

timestamp_now = "2020-11-27_03:03:15.413063"
new_index = "city_docs-#{timestamp_now}"

curl -X PUT "localhost:9200/#{new_index}" -d '{ ...your index settings }'

Step 4: Point aliases for live migration

Now we need to set up our system so that we can migrate. We now have:

  • city_docs-2018-09-07_03:03:15.413063 -> I will call this Old 2018
  • city_docs-2020-11-27_03:03:15.413063 -> I will call this New 2020
  • An update_alias and main_alias that point to Old 2018

The next step is to point the update alias at only New 2020 and point the main alias at both New 2020 and Old 2018.

Pointing the main alias at both indexes is the secret sauce that allows us to do a live migration. All writes will now go only to the new index, while main continues to read from both old and new as the data is migrated.
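
Elasticsearch lets you make all of these alias changes atomically through the _aliases endpoint. Roughly, using the index names from the running example:

curl -X POST "localhost:9200/_aliases" -H 'Content-Type: application/json' -d '{
  "actions": [
    { "remove": { "index": "city_docs-2018-09-07_03:03:15.413063", "alias": "city_docs_update" } },
    { "add":    { "index": "city_docs-2020-11-27_03:03:15.413063", "alias": "city_docs_update" } },
    { "add":    { "index": "city_docs-2020-11-27_03:03:15.413063", "alias": "city_docs" } }
  ]
}'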

Be sure to flush the original index to clear its transaction log.

curl -X POST localhost:9200/#{original_index}/_flush

Step 5: Data migration

Now that we have all of the plumbing set up, the next step is to move all of our data over in batches. Normally you would do this in something like Sidekiq or another background processing system.

 1. Create a cursor to go over the records in batches of 100
cursor = curl -X GET "localhost:9200/#{original_index}/_search?scroll=10m&search_type=query_then_fetch&size=100"

This returns both search results and a scroll_id that references the next set of results. We loop over these batches and perform the basic algorithm below.
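
Each subsequent batch is fetched by passing that scroll_id back to the scroll endpoint (the scroll_id below is a placeholder):

curl -X POST "localhost:9200/_search/scroll" -H 'Content-Type: application/json' -d '{
  "scroll": "10m",
  "scroll_id": "<scroll_id from the previous response>"
}'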

2. Weed out any documents that have been deleted from the original index since we began the migration. To accomplish this, we map all of the search results into a single mget request using each result's _id, _type, and the original_index, and fetch all of those docs. Store the result in a current_docs variable:

 
curl -X GET "http://localhost:9200/_mget?refresh=true" -d '{
  "docs": [{
    "_index": "city_docs-2018-09-07_03:50:27.108387",
    "_type": "city",
    "_id": "100203"
  }, {
    "_index": "city_docs-2018-09-07_03:50:27.108387",
    "_type": "city",
    "_id": "100211"
  }]
}'

 

3. This approach supports removing fields on a reindex (something that Lucene does not). Take the new mapping and work out which fields we actually want to keep in the new index:
defined_mapping_fields = index_def[:mappings][docs.first["_type"]]["properties"].keys

4. Reduce current_docs so that we only keep docs that still exist on the old index, take only the keys from each doc that exist in our current mapping, and bulk index the result into the new index (see the sketch below).
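
The bulk write is a standard _bulk request against the new index. A rough sketch, with made-up document fields:

curl -X POST "localhost:9200/_bulk" -H 'Content-Type: application/x-ndjson' --data-binary '{ "index": { "_index": "city_docs-2020-11-27_03:03:15.413063", "_type": "city", "_id": "100203" } }
{ "name": "Springfield", "state": "IL" }
{ "index": { "_index": "city_docs-2020-11-27_03:03:15.413063", "_type": "city", "_id": "100211" } }
{ "name": "Columbus", "state": "OH" }
'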

5. As a final check for documents deleted mid-migration, repeat the step where we grab all of those docs from the old index again, and delete from the new index any that no longer exist on the old one.

Step 6: Cleanup

Now that everything is migrated, we remove the main alias from the Old 2018 index and then delete the old index.
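
Roughly, using the same endpoints as before with the index names from the running example:

curl -X POST "localhost:9200/_aliases" -H 'Content-Type: application/json' -d '{
  "actions": [
    { "remove": { "index": "city_docs-2018-09-07_03:03:15.413063", "alias": "city_docs" } }
  ]
}'

curl -X DELETE "localhost:9200/city_docs-2018-09-07_03:03:15.413063"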

Step 7: That’s it!

So now we have a relatively straightforward process for rebuilding indexes without any downtime. This approach accepts that it is OK to get duplicate reads during a migration in order to have zero downtime. This system could be updated with a different strategy that allows single reads, at the cost of additional complexity and reduced reliability (or increased latency and disk space).

