This is a guest post for the Computer Weekly Developer Network blog written by Andy Lowry in his capacity as development team lead at LateRooms.com. Lowry has been writing software professionally for more than 15 years, in industries including defence, scientific instrument control and travel.
At LateRooms.com we need to get search right. The search bar is the centrepiece of our site and it is the main interface for our customers to find the hotel they want. We use the Elasticsearch distributed full-text search server to solve a number of problems. While we regularly talk about how we use it for logging, we don’t often discuss how we use it for search, so here goes.
Autocomplete on Elasticsearch
Last year the autocomplete feature at LateRooms was completely rewritten. The old system was slow and the results were not great. Part of the new implementation is Elasticsearch’s Completion Suggester feature, which gave LateRooms the performance it needed.
This allowed us to load our destination data and all our hotels into a single index. Each document in the index has a suggest field, which we use to match the input text; display text, which is what you see in the drop-down; and some metadata about the entry.
Matching is done on the suggest field. Indexing is done on every permutation of the first five words of the search text. We needed to do this to allow matches where the words are out of order, so “Manchester City Centre” and “City Centre Manchester” would both match the same results. We also apply stop words for words that are common to hotel names and destinations.
This solution resulted in an index of approximately 1GB. On our standard cluster of three machines, each with 24 cores and 80GB of RAM, it gives response times averaging around 15ms.
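As a rough illustration of the permutation step, here is a minimal Python sketch. The stop-word list, the field names display_text and metadata, and the choice to strip stop words before taking the first five words are all assumptions made for the example, not details of the LateRooms implementation.

```python
from itertools import permutations

# Hypothetical stop words; the real list is not given in this post.
STOP_WORDS = {"hotel", "the", "and"}
MAX_WORDS = 5  # permutations are built from the first five words


def suggest_inputs(text):
    """Return every ordering of the first five non-stop words of `text`."""
    words = [w for w in text.lower().split() if w not in STOP_WORDS]
    words = words[:MAX_WORDS]
    if not words:
        return []
    # dict.fromkeys drops duplicate phrases (repeated words) but keeps order.
    return list(dict.fromkeys(" ".join(p) for p in permutations(words)))


def build_document(display_text, metadata):
    """Assemble one index entry: suggest inputs, display text and metadata."""
    return {
        "suggest": {"input": suggest_inputs(display_text)},
        "display_text": display_text,  # assumed field name
        "metadata": metadata,          # assumed field name
    }


doc = build_document("Manchester City Centre", {"type": "destination"})
for phrase in doc["suggest"]["input"]:
    print(phrase)
```

Three words produce six phrases and five words produce 120, which is why capping the word count matters for index size.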
Our search API powers the data on the Search Results pages of the website, our apps and a few other internal tools. The existing implementation was based around SQL Server, which had worked well in the past. However, we needed something more flexible and better targeted at our problems. Elasticsearch provided that flexibility.
It is still a work in progress, with the current state being a hybrid of the two systems. The aim is to move all functionality to Elasticsearch, starting with whatever provides the most value.
The implementation has two main indexes, one for destinations and the other for hotels. The hotels index includes relatively simple information about each hotel, including its name, address, facilities and geolocation.
Destinations are places such as cities, towns, counties, points of interest, train stations, airports – basically anywhere someone might be looking for a hotel. Each destination includes a name, a geoshape and some metadata.
The index contains 1.7 million destinations, most of which are UK postcodes. Some are indexed as a circle, and others as a polygon.
Sourcing the polygons is one of the biggest challenges. Freely available data sources such as OpenStreetMap and Ordnance Survey have an incredible level of detail that is not needed for our searches. Although accuracy is one of our main focal points, strange as it sounds, most of this data is too detailed for us. To minimise index size and indexing time, each polygon is simplified before it is indexed.
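The post does not say how the polygons are reduced. One common option is the Douglas-Peucker algorithm, sketched below in plain Python as an assumption rather than a description of what LateRooms actually runs. It treats coordinates as planar, and the tolerance value is purely illustrative.

```python
import math


def _distance_to_segment(p, a, b):
    """Distance from point p to the segment a-b (planar approximation)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        # Degenerate segment, e.g. the shared start/end point of a closed ring.
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its end points.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def simplify(points, tolerance):
    """Douglas-Peucker: drop points that lie within `tolerance` of the outline."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    # Find the point farthest from the line between the two end points.
    index, farthest = 0, 0.0
    for i in range(1, len(points) - 1):
        d = _distance_to_segment(points[i], first, last)
        if d > farthest:
            index, farthest = i, d
    if farthest <= tolerance:
        return [first, last]
    # Keep that point and simplify the two halves on either side of it.
    left = simplify(points[: index + 1], tolerance)
    right = simplify(points[index:], tolerance)
    return left[:-1] + right


# A closed square ring with one nearly redundant point on its bottom edge.
ring = [(0, 0), (1, 0.001), (2, 0), (2, 2), (0, 2), (0, 0)]
print(simplify(ring, 0.01))
```

A larger tolerance removes more points, trading boundary detail for a smaller index and faster indexing.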
While official boundaries and borders are great for administrations, they aren’t great for finding hotels. Polygons often need to extend well beyond the official borders. LateRooms’ home city of Manchester is a great example.
Many of the hotels near the city centre of Manchester are actually in Salford. If a user is searching for Manchester hotels, they expect to see those hotels listed even though they are not technically in Manchester. This turns out to be a very common situation. We could address it with a team of cartographers and a lot of effort, but that would be very expensive. Instead, we keep multiple shapes for each destination and A/B test them until we find the one that works best for our customers.
Our running A/B experiments are also stored in Elasticsearch. When we want to test a new shape we add it to our experiments index, including the information about what proportion of users are included in the experiment.
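To illustrate how a stored proportion could decide which shape a user sees, here is a small Python sketch. The hashing scheme and the field names name, proportion, shape_id and default_shape_id are hypothetical; the post only states that the experiment record holds the proportion of users included.

```python
import hashlib


def bucket(user_id, experiment_name):
    """Map a user to a stable value in [0, 1) for one experiment."""
    key = f"{experiment_name}:{user_id}".encode()
    digest = hashlib.sha256(key).hexdigest()
    # Use the first 32 bits of the hash as a fraction of 2**32.
    return int(digest[:8], 16) / 0x100000000


def shape_for(user_id, destination, experiment=None):
    """Use the experimental shape for included users, else the default one."""
    if experiment and bucket(user_id, experiment["name"]) < experiment["proportion"]:
        return experiment["shape_id"]
    return destination["default_shape_id"]


# Hypothetical records, as they might be read from the two indexes.
manchester = {"name": "Manchester", "default_shape_id": "official-boundary"}
experiment = {"name": "manchester-wide", "shape_id": "incl-salford", "proportion": 0.5}

print(shape_for("user-123", manchester, experiment))
```

Hashing the user and experiment name together keeps each user in the same group across requests without storing any per-user state.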
Text search is a little more complicated. LateRooms tries to find an appropriate destination by matching its name against the supplied text. If one is found, its geoshape is used to query the hotels index.
If there is no matching destination, the text is matched directly against hotel names and addresses instead. This allows customers to find hotels by place and by name using the same search box.
Now that we have the data we need in Elasticsearch, we see a lot of other features we can develop, such as guaranteeing results even when hotels are full, allowing users to draw their own search areas and suggesting popular areas for major cities.
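The two-step lookup can be sketched as follows. This is an in-memory stand-in written in plain Python, not Elasticsearch queries: the exact-name match, the ray-casting point-in-polygon test and the field names are all assumptions chosen to keep the example self-contained.

```python
def point_in_polygon(point, polygon):
    """Ray-casting test: is (x, y) inside the polygon's outline?"""
    x, y = point
    inside = False
    for (x1, y1), (x2, y2) in zip(polygon, polygon[1:] + polygon[:1]):
        # Toggle for every edge that a horizontal ray from the point crosses.
        if (y1 > y) != (y2 > y) and x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
            inside = not inside
    return inside


def search(text, destinations, hotels):
    """Try a destination match first, then fall back to hotel name/address."""
    query = text.strip().lower()
    if not query:
        return []
    for destination in destinations:
        if destination["name"].lower() == query:
            # Destination found: return the hotels that fall inside its shape.
            return [h for h in hotels
                    if point_in_polygon(h["location"], destination["shape"])]
    # No destination matched: search hotel names and addresses directly.
    return [h for h in hotels
            if query in h["name"].lower() or query in h["address"].lower()]


# Made-up data on an arbitrary grid, purely for illustration.
destinations = [{"name": "Manchester", "shape": [(0, 0), (4, 0), (4, 4), (0, 4)]}]
hotels = [
    {"name": "Hotel A", "address": "1 Example Street", "location": (1, 1)},
    {"name": "Hotel B", "address": "2 Sample Road, Salford", "location": (5, 5)},
]

print([h["name"] for h in search("Manchester", destinations, hotels)])
print([h["name"] for h in search("salford", destinations, hotels)])
```

Because both paths return hotels, the caller does not need to know whether the text was interpreted as a place or as a hotel name.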