Case study: National Rail Enquiries tackles website uptime issues with AWS and SOASTA

The National Rail Enquiries website took a battering in the wake of the 2013 St. Jude’s storm, prompting the organisation to rethink its hosting and load-testing procedures

This article can also be found in the Premium Editorial Download: Computer Weekly: How cloud keeps train passengers informed in a storm

The National Rail network plays host to around 1.6 billion passenger journeys a year, prompting around 500,000 people on an average day to visit its website for route planning purposes.

On days such as Monday 28 October 2013, when large swathes of the UK’s rail network were brought to a halt by the after effects of the St. Jude’s Day storm, it is not uncommon to see visits to the National Rail Enquiries (NRE) site increase dramatically.

This is fine, as long as the infrastructure underpinning the site is equipped to cope with such a large surge in visitors. This turned out not to be the case on 28 October, when winds of up to 99mph lashed some parts of the British Isles.

Weathering the storm

Several of the NRE website’s core systems were being hosted in a private datacentre infrastructure that should have – theoretically – had sufficient capacity to cope with this unusual rise in site visitors.

However, due to shortcomings in the website’s load testing procedures, it seems the NRE team had been lulled into a false sense of security about the site’s ability to take the strain, Jason Webb, director of the Association of Train Operating Companies (Atoc), tells Computer Weekly.

“There are a myriad of disruption events that can occur, some planned and some not. The net result is a sharp rise in capacity and how quickly people come to the site,” he says.

Atoc is responsible for running NRE, which – along with its website operations – fields around 600 million inbound requests from customers a year for information about their journeys.

“We have a small team of around 30 to 40 people who manage these requests, which are spread predominantly across the mobile app and the desktop. We also have up and coming channels, such as social media, which form part of that 600 million,” says Webb.

To indicate how the nature of these interactions with NRE has changed over time, voice calls used to be the source of around 70 million of these requests a year, says Webb. However, this figure has since dropped to around the three million mark.

“We had our busiest day ever during the St. Jude’s Day storm of October 2013. I think it’s fair to say we had a few customers who could not get through to the site,” says Webb.

“Although we had carried out some form of load testing [in the lead up to the storm], it soon became clear that it hadn’t been good enough.”

The crux of the problem was that the company had relied on scripted load tests to challenge the robustness of its infrastructure, Webb explains, and these did not provide a real-world view of how the site would cope with a sharp rise in visitors.

“Was it cheaper to do things this way? Yes. But was it effective? Absolutely not.”

Private datacentre versus AWS

The fallout from the St. Jude’s Day storm prompted the company to shift the website-related workloads running in its private datacentre to the Amazon Web Services (AWS) cloud for scalability reasons. National Rail also enlisted the help of website performance monitoring company SOASTA.

“We have the ability to add more infrastructure with Amazon, but one of the other drivers for us was to get closer to the pay-as-you-go IT consumption model,” says Webb.

SOASTA’s involvement saw NRE deploy the firm’s CloudTest on Demand technology, which allows NRE to stress test its website by simulating its real-time response to surges in internet or mobile traffic from a variety of locations worldwide.

“When we do a load test now, we will run it up to many times more of the throughput we saw at the time of St. Jude’s Day storm. If we’re running tests that simulate a greater load and velocity of load than that storm, it gives me confidence our systems will bear up to that,” says Webb.
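As a rough illustration of the approach Webb describes, a test plan can be expressed as a stepped ramp from a previously recorded peak up to some multiple of it. The sketch below is not SOASTA's tooling or NRE's actual configuration: the function name `ramp_schedule`, its parameters and the figures passed to it are all hypothetical, since the article does not give the storm-day throughput or the multiple used.

```python
def ramp_schedule(baseline_rps: float, peak_multiplier: float, steps: int) -> list[float]:
    """Return the target requests per second for each step of a linear ramp.

    baseline_rps is the highest rate seen in production (hypothetical here),
    peak_multiplier is how far beyond that rate the test should push.
    """
    if steps < 2:
        raise ValueError("need at least two steps: baseline and peak")
    peak_rps = baseline_rps * peak_multiplier
    increment = (peak_rps - baseline_rps) / (steps - 1)
    return [baseline_rps + i * increment for i in range(steps)]


# Hypothetical figures: a recorded peak of 100 requests per second,
# ramped in five steps to five times that rate.
schedule = ramp_schedule(baseline_rps=100.0, peak_multiplier=5.0, steps=5)
print(schedule)
```

Each value in the returned list would then be fed to whichever load generator is in use as the target rate for that stage of the test.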

The move to AWS, which the SOASTA team assisted with, began in earnest shortly after the St. Jude’s Day storm struck and was completed by April 2014.

As part of this, the company’s website management system, journey planner tool and real-time information platform were all moved off-premises.

“We worked with SOASTA for each of those component parts to load test singularly and collectively, giving us a true representation of how our customers use the site,” says Webb.

“Since we’ve started using the SOASTA tool, we know we can meet customer expectations and demands. The tool looks at how customers are using our site currently, plugging that information in and seeing how the site responds.

“As long as we keep a check on how customers use our site and follow that up with load testing, I’m very comfortable and confident we can withstand whatever comes our way,” he adds.

Improving user experience

Webb says his team has not ruled out investigating more of what SOASTA’s product portfolio has to offer, as part of its ongoing commitment to ensuring NRE visitors enjoy a good user experience.

It is also toying with the idea of making use of AWS’ auto-scaling capabilities, but – for the time being – Webb says he is happy with the setup it has in place.

“The biggest health warning I’d give to anyone undergoing a similar project is to make sure load balancing is realistic to how your customers use your services, as well as representative of customers coming from many points and places,” he says.

“Don’t just rely on a scripted load test. Live load testing is not massively expensive and can save you a lot of hassle in the long run.”
