Steve MacPherson is the CTO at London-based special effects studio Framestore, which provides the video effects for movies and advertising. Its work has included video effects for the film Gravity, the O2 Simplicity advert and the London Stock Exchange Stellar Atrium Project.
Massive server farms running video rendering are the powerhouse behind the video effects industry.
MacPherson has a background in high-performance computing, having worked with the granddaddy of supercomputers, Cray, during the mid-1980s. He says: "If I say I worked there during the glory days of Cray, it really dates me."
Although the computational resources MacPherson had access to during the Cray era were phenomenal for the time, he says: "Nobody cares about that level of performance any more."
For him, whether someone can get the performance of a Cray 1 on a smartphone is not particularly relevant. Instead, he says people are now more interested in the manageability of the HPC environment.
"Can you pull down 10,000 cores to solve a simulation problem?" he asks. "Can you take something that would require 48 hours to complete using all the resources you have and do it in 12 hours or eight hours?"
These are the types of challenge MacPherson faced when he joined Framestore in 2010. At the time, his goal was simple enough: provide the infrastructure to support the special effects rendering for the movie Gravity (see Force of Gravity panel below), which won the Oscar for best visual effects in 2014.
The key application for the special effects company is its on-premise rendering farm. The creative teams submit jobs to the rendering queue, a piece of software developed internally by Framestore called fQ. MacPherson says: "It is not unusual for us to run fQ to achieve 96% hardware efficiency. It takes whatever resources you give it, chops them up and puts the appropriate work on those resources. So when fQ is working with our local rendering engine, those machines are busy all of the time."
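fQ itself is proprietary and its internals are not described here, but the idea of carving a fixed pool of cores into per-job allocations so hardware is rarely idle can be sketched with a simple greedy scheduler. All names and numbers below are hypothetical, for illustration only.

```python
def schedule(jobs, total_cores):
    """Greedy sketch of a render-queue scheduler: repeatedly place the
    largest job that still fits into the remaining core pool, so the
    farm stays as close to fully busy as possible."""
    queue = sorted(jobs, key=lambda j: j["cores"], reverse=True)
    running, free = [], total_cores
    progress = True
    while queue and progress:
        progress = False
        for i, job in enumerate(queue):
            if job["cores"] <= free:
                running.append(queue.pop(i))  # start the job
                free -= job["cores"]
                progress = True
                break
    return running, free

# Hypothetical render jobs competing for a 1,000-core pool
jobs = [{"name": "shot_a", "cores": 512},
        {"name": "shot_b", "cores": 256},
        {"name": "shot_c", "cores": 384}]
running, idle = schedule(jobs, 1000)
```

In this toy run the scheduler places shot_a and shot_c, leaving 104 cores idle until shot_b can fit; a production scheduler like fQ would also handle pre-emption, priorities and job dependencies.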
MacPherson’s role at Framestore is managing the mountain of work in the rendering queue. Along with the on-premise rendering farm, he says the company has also investigated the practicality of offloading some workloads to the Google Compute Engine public cloud.
"Parallelism of work is a massive resource optimisation effort," he says. "When I started at Cray, just getting dual or quad core symmetric multi-processing [working] was a major engineering challenge."
Once the hardware problems had been resolved, engineering then needed to tackle software – getting application code to run in a parallel computing architecture, he says. "Scale is one trend in HPC, but the other is how you take advantage of parallel architectures. This involves compiler technology and the ability to write efficient code and understanding the nature of the work you do."
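The software side of that challenge, restructuring work so independent pieces run side by side, can be illustrated with a toy example. This is not Framestore's code; it just shows why rendering parallelises well: each tile of a frame can be computed independently, so a pool of workers scales the job across cores.

```python
from multiprocessing import Pool

def render_tile(tile_id):
    # Stand-in for per-tile render work. Each tile depends only on its
    # own inputs, which is what makes the workload easy to parallelise.
    return tile_id, sum(i * i for i in range(10_000))

if __name__ == "__main__":
    # Four workers chew through 16 independent tiles in parallel
    with Pool(processes=4) as pool:
        results = pool.map(render_tile, range(16))
    print(f"rendered {len(results)} tiles")
```

Real renderers face the harder cases MacPherson alludes to, where work is not independent and compiler and algorithm design determine how much parallelism can actually be extracted.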
Handling the pressure
As a CTO, MacPherson says: "I'm presented with intractable problems on a weekly, if not daily, basis. My role is to make sure the madness doesn't eat us. A lot has to do with managing the budget, finding really good people and keeping an entrepreneurial spirit on how to solve problems and keeping heads of production and the CEO happy."
Asked how he copes, MacPherson says he has the support of senior management. "This is what keeps me sane because I don't need a six-month procurement cycle," he says. "People are very focused on finding a solution.
"When we were preparing for Gravity, it was like this one big thing coming towards us and everyone was focused."
Now he says the company is more agile. From a technology perspective, MacPherson does not think there is necessarily one big technology game-changer.
Instead, he feels technology developments are coming together in aggregate.
"Take the cloud," he says. "Two years ago, it would kill most conversations among technical people in my industry. There was a general 'cloud is bullshit' attitude, due to the hype. As cloud resources have matured, we find that having a resource that can scale up very quickly into the thousands of cores brings a very interesting dynamic to our delivery process."
As the conversation matures, MacPherson says people start looking at what workloads should be offloaded to the cloud, the input and output constraints, where the data should be stored, and latency. But he adds: “We can’t send everything to the cloud because we need studio approval.”
Instead, says MacPherson, Framestore submits a certain class of work that is not security-sensitive or requires a lot of input and output operations to the Google cloud.
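That gating decision, only jobs that are neither security-sensitive nor I/O-heavy go to the public cloud, can be sketched as a simple eligibility check. The field names and the I/O threshold below are invented for illustration; they are not Framestore's actual rules.

```python
def cloud_eligible(job):
    """Return True if a render job may be offloaded to the cloud:
    it must not touch studio-restricted assets and must not be
    I/O-bound (threshold here is an arbitrary placeholder)."""
    return not job["security_sensitive"] and job["io_ops_per_frame"] < 1000

# Hypothetical jobs in the queue
jobs = [
    {"name": "hero_shot",  "security_sensitive": True,  "io_ops_per_frame": 200},
    {"name": "crowd_sim",  "security_sensitive": False, "io_ops_per_frame": 150},
    {"name": "fluid_pass", "security_sensitive": False, "io_ops_per_frame": 5000},
]
offload = [j["name"] for j in jobs if cloud_eligible(j)]
```

Only crowd_sim clears both tests: hero_shot needs studio approval, and fluid_pass would spend its time shuttling data rather than computing.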
Rendering in the cloud
Framestore has now begun using Google in Belgium for rendering. In a recent blog post, MacPherson wrote: "By adding Compute Engine to our workflow and allowing our in-house capacity to focus on the studio work, everyone’s project gets computing time – and the creative team can get as imaginative as they want to, with fast views of new iterations. The results: fewer bottlenecks, more creativity and more predictability, not to mention saving about £200,000 on the cores we didn’t need to buy."
There is nine milliseconds of latency between Framestore's servers and Google Compute Engine in Belgium. MacPherson thinks this is short enough to make using Google's Belgian cloud facility viable. But moving large data files back and forth between the cloud and Framestore's on-premise servers is impractical.
As a result, MacPherson says Framestore has begun piloting using Avere to support this deployment of the rendering engine in the cloud. "Just like how we used Avere for caching, we are now looking to see if we can use Avere to cache data so it does not have to take the long trip back to our servers."
If the pilot is successful, MacPherson says Framestore will be able to expand the amount of work it can put in the cloud. "So, rather than using the cloud for only 20% of rendering, we can expand this to 40% or 50%, which gives us a lot more flexibility to provision resources,” he adds.
In effect, Avere enables Framestore to improve the efficiency of a rendering job. "If we can improve efficiency from 70% to 80%, that's a big gain," says MacPherson, because 10% less work then needs to be sent down the network pipes to the on-premise rendering farm.
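The arithmetic behind that claim is straightforward: whatever fraction of a job is not served from the cache must make the round trip to the on-premise farm, so an efficiency gain translates directly into that much less network traffic.

```python
# Fraction of render data that must travel back to the on-premise
# farm is whatever the cache does not absorb: 1 - efficiency.
before = 1 - 0.70            # 30% of traffic hits the farm
after = 1 - 0.80             # 20% after the efficiency gain
saving = round(before - after, 2)  # ten percentage points less traffic
```

A ten-point efficiency gain is a one-third reduction (0.30 to 0.20) in the data actually crossing the pipe, which is why MacPherson calls it "a big gain".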
Force of Gravity
The three-dimensional model of the International Space Station in the movie Gravity required 100 million polygons. It was among the biggest challenges the IT infrastructure at video effects firm Framestore had to handle.
Framestore’s head of research and development, Martin Preston, said: "We just needed to get everything we had into the renderer as fast as possible and let it handle it."
Given this remit, the rendering IT needed a radical rethink. Framestore CTO Steve MacPherson explains: "While we were preparing for Gravity, we had to rebuild our entire computing infrastructure because it was so much bigger than anything else we had done."
One of the key decisions MacPherson made was around storage. He says: "Our render farm is a denial-of-service attack on our servers. Once we turn it on, anyone else trying to access the servers will run into latency and performance issues."
Framestore needed to partition the render farm and limit its impact on the servers, says MacPherson. One option would be to scale by buying more servers. "By the time you aggregate enough disc drives to get the performance from the discs, which have to be replicated over and over again, the cost adds up to millions of pounds," he adds.
Instead, by analysing the profile of the rendering jobs, the team realised there was a lot of data that could be cached because it did not change. "Avere became a front-end NFS [Network File System] cache, collecting data for the render farm so that it did not need to go back to the servers," says MacPherson.
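Avere is a commercial caching appliance, but the read-through pattern it applies to unchanging render assets can be sketched in a few lines. The asset path and origin function below are hypothetical; the point is that when a thousand render nodes read the same immutable texture, origin storage is hit only once.

```python
class ReadThroughCache:
    """Minimal sketch of a front-end cache for immutable render assets:
    the first read fetches from origin storage, later reads are served
    locally without touching the origin servers."""

    def __init__(self, origin_read):
        self.origin_read = origin_read  # callable: path -> bytes
        self.store = {}
        self.hits = self.misses = 0

    def read(self, path):
        if path in self.store:
            self.hits += 1              # served from cache
        else:
            self.misses += 1            # one trip to origin storage
            self.store[path] = self.origin_read(path)
        return self.store[path]

# Hypothetical usage: 1,000 render nodes request the same texture
origin = lambda path: b"texture bytes for " + path.encode()
cache = ReadThroughCache(origin)
for _ in range(1000):
    cache.read("/assets/iss_hull_diffuse.exr")
```

This only works because the cached data is immutable, which is exactly the property the team identified when profiling the rendering jobs; mutable data would need invalidation logic a real appliance provides.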