Case study: The Alan Turing Institute takes cloud-first stance on data science research

Computer Weekly meets the Alan Turing Institute team and finds out how using cloud-first thinking can help achieve its goal of becoming a global force to be reckoned with in data science research

Hidden away on the first floor of the British Library is a large and growing team of academic researchers, spanning computer science, mathematics, machine learning and social sciences, collectively working towards establishing the UK as a global leader in data science.

The Alan Turing Institute, named after the famous World War II Bletchley Park cryptanalyst, is the organisation that has brought these individuals (of varying levels of academic seniority) together to make good on this aim.

The Institute is the result of a joint venture between the universities of Cambridge, Edinburgh, Oxford, University College London and Warwick, as well as the UK Engineering and Physical Sciences Research Council.

Each member organisation has invested £5m in the initiative, and the government has also contributed around £42m. In 2014, the Lloyd’s Register Foundation also donated £10m in funding for the project.

After several years of preparation and recruitment, the Institute officially opened for business in October 2016, and now employs around 150 people, with more coming on board all the time.

Competition among those wanting to work there is high, with just “3-4%” of applicants managing to secure a position, Andrew Blake, the Institute’s director, tells Computer Weekly.

“We were fully open for business in October, and that’s when the researchers actually started to lodge here. We had interns and doctoral students here over the summer, though, which was quite a good warm-up, but now we are really going full throttle,” he says.

The types of academics the institute has attracted to date include doctoral students, who are already pursuing a PhD with one of the five founding universities, as well as research fellows looking to get a bit more experience under their belts before taking on a full-time teaching position somewhere.

“We run a programme where students that are already in the middle of their studies come here for a year and get the Alan Turing Institute experience, and we hope they will go home enriched with new insights that will make their studies more impactful from coming into contact with a different group of people,” he says.

The organisation also runs a programme inviting researchers from overseas to spend up to a year studying at the institute, in support of its push to become a globally recognised centre of data science research and innovation excellence.  

“We haven’t stopped hiring, and we’ve had another call out for research fellows at the moment,” says Blake. “There is an incredible amount of competition for these places because people really want to come and work here.”

Competitive advantage

There are several reasons for this, says Blake, aside from the fact that data science is fast becoming an area of intense interest for organisations across the globe operating in a wide range of industries.

“The idea of the institute is to do things that you can’t easily do in other universities. We have five member universities, which are fantastic, but we don’t want to duplicate what those universities already do so well,” he says.

One thing that sets the institute apart from its founding universities is that it allows academics specialising in more niche areas of interest to collaborate with their peers more easily.

“If, for instance, you specialise in datacentre computing, very few universities can pack the computer science department with people who also do that because they’ve got to teach other things, such as programming languages, machine learning and compiler designs,” says Blake.

“And it’s only when you come to a national centre like this that you get the critical mass of people in these very important disciplines.”

The layout of the institute also lends itself well to encouraging cross-discipline collaboration, with open-plan office principles influencing much of its design.

“We’re all for championing those unexpected connections and nurturing them,” says Blake.

“As you walk around the corridors, there aren’t actually that many doors. So people come in here, and they don’t go and sit in their departments.”

Designing the next

The institute is also keenly investing in building out its software engineering capabilities to pave the way for its students to draw on their insights to create data science applications for the wider academic community to use, and – in due course – enterprises too.

“We believe software engineering will be important in giving us the route to express people’s theoretical ideas in a very practical way,” he says.

“Our aim is to go beyond what individuals would normally produce as part of their research. They will build prototypes in order to illustrate their ideas and write papers, but we aim to take the best of those ideas and put software out there.”

The institute will be responsible for creating a pipeline of applications and proposed software the business community could theoretically make use of, but it will not be directly involved in commercialising such offerings.

“We’re not a business, we’re a charity, and our aim is to be influential in data science rather than commercial,” he says.

To this end, the idea is that whatever offerings its students come up with could, through the institute’s commercial partnerships, influence the way those partners use technology to tackle big data conundrums.

“We believe by building tools and generating ideas, we will be able to influence many more people than if we actually launched a company ourselves,” says Blake.

“We hope our alumni will also go out and launch companies, or join big firms with research labs and start generating new technologies, and we look forward to basking in the reflected glory of those high achievers and bringing to bear the latest advances in data science.”

University cloud challenges

Cloud technology is already playing an important role in helping the Institute realise its global ambitions, with Intel offering up private cloud-like resources (in the form of a cluster) and know-how to researchers, through its strategic partnership with The Alan Turing Institute team.

“They are going to provide private cloud computing specifically for the Institute and that’s partly to enable people’s research work, but it’s also because we’re doing research into the future of computing architectures and how to actually design the computer that will best do data science,” says Blake.

It also found itself on the receiving end of $5m worth of public cloud computing capacity, courtesy of Microsoft and its Azure platform, in October 2016.

According to Blake, the organisation has been following a cloud-first strategy since the beginning, for a mix of operational efficiency and speed to launch.

“Our admin systems were the first to be installed and were immediately implemented using the cloud, because that is the way to build a modern business these days,” he says.

“The lead times are greatly shortened using cloud and the level of technical expertise and the quantity you have to have in house is greatly reduced because you’re not doing basic maintenance and the backup is all handled elsewhere.”

There are also logistical reasons for not wanting to build an on-premise datacentre within the confines of the British Library, says Blake.

“As we’ve grown, we’ve moved around this building. We started in the attic on the fourth floor while the space we’re in now was prepared, and during all those changes, we’ve been able to be agile and not have to worry about where the computing facility is sitting.”

For the researchers whose studies require access to large-scale compute and data processing resources, having access to public cloud resources allows them to make much more efficient use of their time, says director of Azure for research, Dr. Kenji Takeda.

He is also a visiting fellow at the Alan Turing Institute, which means he is responsible for helping the researchers make full use of Azure, while – at the same time – gleaning insights and feeding back to Microsoft on how the platform is used within the academic community.

“What’s really interesting about the Alan Turing Institute is that it is very much like a startup. Most universities have a significant amount of infrastructure they might have invested millions of pounds in over the years, but the Institute has not,” he tells Computer Weekly.

“It’s really interesting to work with them, from that respect, because it really opens up what the researchers and the institute can do.”

This is particularly the case when it comes to meeting the technology requirements of a group of people working on such a wide, varied and complex array of research projects.

“In universities, a lot of the provision for research computing is basic, and involves just giving researchers desktops or laptops and, at the other extreme, having access to supercomputer resources,” he says.

“For a lot of researchers, particularly data scientists, it is the bit in the middle between a laptop and a supercomputer that is important, but it’s very hard for a university to give that provision to hundreds or thousands of researchers because each one needs something slightly different.”

Quantifying human behaviour

Merve Alanyali is a PhD student from Warwick Business School, who joined the institute as a researcher to further her social science-related studies into how people’s online activities can allow human behaviour to be quantified.

Part of her work has involved analysing the information linked to 25 million images uploaded to the photo-sharing site Flickr by users in more than 240 countries, as an example of the size of data sets her studies require her to work with.

“If I ran that analysis on my own machine, it would have taken then months to complete, and I have a computing cluster that Warwick provides us with, but using cloud is like having your own cluster in your laptop,” she tells Computer Weekly.

“It is also your cluster, and you don’t need to wait in a job queue or worry about what priority your job is going to get, and it makes it easier for me to plan what I’m doing.”

As the institute sets off on its journey to become a global force within the data science space, Blake and his team have six areas they plan to concentrate their research efforts on over the course of their first year of operation.

These areas of focus include engineering, technology, defence and security, smart cities, financial services, and health and wellbeing.

“We’re focusing on the challenges in those areas, and then there are the cross-cutting things we’re also paying a lot of attention to, such as machine learning and secure cloud computing that are underlying capabilities,” says Blake.

“A year from now, I expect some of these bets will have paid off and some of them won’t. And if they all have, what that will tell us is that we weren’t ambitious enough, because you really expect some things not to work if you are concentrating on tackling the hard challenges.”
