Elasticsearch is absolutely incredible as a search data store: it abstracts away a lot of the cruft of analyzing documents, distributing queries, and returning search results. At some point in the evolution of your Elasticsearch setup, you will need to serve searches and migrate an index at the same time. This post outlines one strategy for handling that kind of live migration.
Things get complicated when you have a search index that needs to be available at all times and you also need to add to or change its mapping while still serving search requests. For instance, say you are moving a field to a new type that is different from the old one, or adding a copy_to for a new field.
To handle this, I use the es-elasticity gem. Assuming you have a document model called City::Document, you would accomplish this by calling City::Document.rebuild_index(recreate: true). This takes care of all the internals needed to:
- Create the new index with the new mapping and migrate the current data to it
- Allow all searching to continue as normal during the migration
- Delete the old index as soon as the data is migrated
But, how the hell does this work?
The Elasticsearch documentation hints at how to handle this with index aliases: https://www.elastic.co/guide/en/elasticsearch/guide/current/index-aliases.html
The source for how es-elasticity implements it is here: https://github.com/doximity/es-elasticity/blob/master/lib/elasticity/strategies/alias_index.rb
A few conventions make this work:
- All indexes have a prefix that we use to identify them; below, that will be city_docs
- All indexes are suffixed with a timestamp, for instance city_docs-2018-09-07_03:03:15.413063
- Elasticsearch has the concept of aliases, which are “pointers” to a real index. In this scheme we keep two aliases, main and update, both of which normally reference the “current” index
- All reads go through the main alias, which is named after the index prefix, city_docs here
- All writes go through the update alias, which is the index prefix with _update suffixed, city_docs_update here
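To make those conventions concrete, here is roughly what creating the very first index would look like against the create-index API (a sketch, not the gem's actual code; mappings and settings omitted):

# Create the physical index and point both aliases at it from the start
curl -X PUT "localhost:9200/city_docs-2018-09-07_03:03:15.413063" -H 'Content-Type: application/json' -d '{
  "aliases": {
    "city_docs": {},
    "city_docs_update": {}
  }
}'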
To start, let's look at what our setup actually looks like by asking what main and update are pointing to.
Step 1: Discovery
Find our current indexes and aliases.
Find what the main alias and update alias are aliased to right now. The %2A is the URL-encoded ‘*’, because we normally will not know the timestamp of the index's creation.
curl localhost:9200/city_docs-%2A/_alias/city_docs
-> {"city_docs-2018-09-07_03:03:15.413063":{"aliases":{"city_docs":{}}}}

curl localhost:9200/city_docs-%2A/_alias/city_docs_update
-> {"city_docs-2018-09-07_03:03:15.413063":{"aliases":{"city_docs_update":{}}}}

main_alias = "city_docs-2018-09-07_03:03:15.413063"
update_alias = "city_docs-2018-09-07_03:03:15.413063"
Step 2: Preflight checks
Now that we have both our main_alias and update_alias, we can do a couple of preflight checks and bail on the reindex if either one fails.
- Check that the conventions are being followed, otherwise this won't work. They are broken if either the main or the update alias does not resolve to exactly one index
- Check that there isn't a migration already in progress, which is the case if the main and update aliases don't point to the same index
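In raw curl terms, the preflight boils down to something like this rough sketch (using jq to pull the index names out of the alias responses; this is not the gem's actual code):

# Resolve each alias to the concrete index (or indexes) behind it
main=$(curl -s "localhost:9200/city_docs-%2A/_alias/city_docs" | jq -r 'keys[]')
update=$(curl -s "localhost:9200/city_docs-%2A/_alias/city_docs_update" | jq -r 'keys[]')

# Conventions are broken if either alias resolves to zero or multiple indexes
[ -n "$main" ]   && [ "$(printf '%s\n' "$main" | wc -l)" -eq 1 ]   || exit 1
[ -n "$update" ] && [ "$(printf '%s\n' "$update" | wc -l)" -eq 1 ] || exit 1

# A migration is already in progress if the two aliases point at different indexes
[ "$main" = "$update" ] || exit 1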
Step 3: Create a New Index
Now that the state of the world is right, we can begin by creating a new index with the new mapping to hold all of our data.
timestamp_now = "2020-11-27_03:03:15.413063"
new_index = "city_docs-#{timestamp_now}"
curl -X PUT "localhost:9200/#{new_index}" -d '{ ...your index settings }'
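For example, if the motivation for the reindex is the copy_to field mentioned earlier, the body might look something like this (the field names are made up for illustration):

curl -X PUT "localhost:9200/#{new_index}" -H 'Content-Type: application/json' -d '{
  "mappings": {
    "city": {
      "properties": {
        "name":     { "type": "text", "copy_to": "all_text" },
        "state":    { "type": "text", "copy_to": "all_text" },
        "all_text": { "type": "text" }
      }
    }
  }
}'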
Step 4: Point aliases for live migration
Now we need to set up our system so that we can migrate. We now have:
- city_docs-2018-09-07_03:03:15.413063 -> I will call this Old 2018
- city_docs-2020-11-27_03:03:15.413063 -> I will call this New 2020
- An update_alias and main_alias that both point to Old 2018
The next step is to point the update alias at only New 2020, and point the main alias at both New 2020 and Old 2018. Pointing the main alias at both indexes is the secret sauce that allows us to do a live migration: all writes will now go only to the new index, and main will continue to read from both old and new while data is being migrated.
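In raw API terms, that repointing is a single atomic _aliases call, roughly like this sketch (index names from above; the gem does the equivalent through its client):

curl -X POST localhost:9200/_aliases -H 'Content-Type: application/json' -d '{
  "actions": [
    { "remove": { "index": "city_docs-2018-09-07_03:03:15.413063", "alias": "city_docs_update" } },
    { "add":    { "index": "city_docs-2020-11-27_03:03:15.413063", "alias": "city_docs_update" } },
    { "add":    { "index": "city_docs-2020-11-27_03:03:15.413063", "alias": "city_docs" } }
  ]
}'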
Be sure to flush the indexes to clear the transaction logs.
curl -X POST localhost:9200/#{original_index}/_flush
Step 5: Data migration
Now that we have all the plumbing set up, the next step is to move our data over in batches. Normally you would do this in Sidekiq or another background processing system.
1. Create a cursor to go over the records in batches of 100
cursor = curl "localhost:9200/#{original_index}/_search?scroll=10m&search_type=query_then_fetch&size=100"
This returns both the first page of search results and a scroll_id reference to the next set of results; we loop over these batches and perform the basic algorithm below.
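Fetching each subsequent batch just hands the scroll_id back to the scroll endpoint, something like this (where #{scroll_id} is whatever the previous response returned):

curl -X POST "localhost:9200/_search/scroll" -H 'Content-Type: application/json' -d '{
  "scroll": "10m",
  "scroll_id": "#{scroll_id}"
}'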
2. Weed out any documents that have been deleted from the original index since we began the migration. To accomplish this, we map all documents into an mget request using each search result's _id and _type plus the original_index, and fetch all of those docs. Store the result in a current_docs variable.
curl -X GET "http://localhost:9200/_mget?refresh=true" -d '{
  "docs": [
    { "_index": "city_docs-2018-09-07_03:50:27.108387", "_type": "city", "_id": "100203" },
    { "_index": "city_docs-2018-09-07_03:50:27.108387", "_type": "city", "_id": "100211" }
  ]
}'
3. This approach supports removing fields on a reindex (something that Lucene does not): take the new mapping and drop anything that is not defined in the new index
defined_mapping_fields = index_def[:mappings][docs.first["_type"]]["properties"].keys
4. Reduce current_docs so that we only keep docs that still exist on the old index, take only the keys from them that exist in our new mapping, and bulk index the result, as sketched below.
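The bulk write goes through the update alias (which now points only at the new index) using the standard newline-delimited _bulk format; the document bodies here are made up for illustration:

curl -X POST "localhost:9200/city_docs_update/_bulk" -H 'Content-Type: application/x-ndjson' --data-binary '
{ "index": { "_type": "city", "_id": "100203" } }
{ "name": "Portland", "state": "OR" }
{ "index": { "_type": "city", "_id": "100211" } }
{ "name": "Eugene", "state": "OR" }
'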
5. As a final check, repeat the step where we grab all of the docs from the old index, and delete from the new index any documents that no longer exist in the old one (i.e., they were deleted mid-migration).
Step 6: Cleanup
Now that everything is migrated, we remove the main alias from the Old 2018 index and then delete the old index.
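In raw API terms the cleanup is just as small (a sketch using the index names from above):

# Stop reading from the old index...
curl -X POST localhost:9200/_aliases -H 'Content-Type: application/json' -d '{
  "actions": [
    { "remove": { "index": "city_docs-2018-09-07_03:03:15.413063", "alias": "city_docs" } }
  ]
}'

# ...and then drop it entirely
curl -X DELETE localhost:9200/city_docs-2018-09-07_03:03:15.413063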
Step 7: That’s it!
So now we have a relatively straightforward process for rebuilding indexes without any downtime. This approach accepts that it is OK to get double reads during a migration (main may return a document from both the old and new index) in order to have zero downtime. The system could be updated to a different strategy that allows single-index reads, at the cost of additional complexity and reduced reliability (or increased latency/disk space).