Q&A: Why do we need exascale computers?

Europe, the US, Japan and China are racing to develop the next generation of supercomputers, exascale machines capable of a million trillion calculations a second, by 2020. But why do we need computers as fast and powerful as this? And what technical challenges need to be overcome?

Thomas Sterling, chief scientist for the centre for research on extreme scale technology at Indiana University, talked to Computer Weekly during the International Supercomputing Conference in Hamburg.

Q. Why do we need exascale computers?

The only bad news is that we need more than exascale computing. Solving some of the key computational challenges that face not just individual companies but civilisation as a whole will be enabled by exascale computing.

Everyone is concerned about climate change and climate modelling. The computational challenges of modelling oceanic clouds, ice and topography are all tremendously important. And today we need at least two orders of magnitude improvement on that problem alone.

Controlled fusion - a big activity shared with Europe and Japan - can only be done with exascale computing and beyond. There is also medical modelling, whether it is life sciences itself or the design of future drugs for ever more rapidly changing and evolving viruses - again, it’s a true exascale problem.

Exascale computing is really the medium and the only viable means of managing our future. It is probably crucial to the progress and the advancement of the modern age.

Q. What are the barriers to building exascale machines?

The barriers are daunting. The challenge we have right now is changing the paradigm after 20 successful years of high performance computing.

The need to move almost exclusively to multiple processor cores now requires us to reconsider how we architect these machines, how we program these machines, and how we manage the systems during the execution of the problems themselves.

We will be facing absolutely hard barriers as we get into exascale. Atomic granularity is just one of several barriers which will provide limitations [in chip design]. And it will require true and dramatic paradigm shifts.

I am sure, though I don’t know what the solution will be, that we will find completely dramatic innovations.

Q. Have we reached the end of Moore's law, which says the number of transistors on a chip will double every two years, when it comes to supercomputers?

Moore's law itself will continue through to the end of the decade and the next decade. However, using those transistors, given their power requirements and so forth, is really problematic.

We can anticipate that, if we are willing to innovate, we can address this problem into the exascale era. But fundamental physics will mean the heat capacity of the chips themselves and the need for cooling will become more of a barrier.

Q. How much of a problem is energy consumption for exascale computing?

The cost of power over the lifetime of the machine can exceed the cost of the machine itself. It is a dramatic change from previous practices. But more importantly for using such devices effectively, reliability goes down as the heat within a system goes up. This is a major problem.

Fewer and fewer centres can host such critical systems because there are fewer and fewer facilities in Europe, the US, and Asia where enough power can be brought to support them. This is truly a barrier.

Q. Are people going to have to change their approach to developing software for exascale machines?

This is a controversial statement. I believe the answer is yes, but many very good people disagree. Many people feel there will be incremental methods that extend prior techniques. But I do think there is going to be a need for new programming interfaces.

Q. Do we need radically new technology to reach exascale?

What I think is going to happen, and this is unfortunate, is that industry and a large part of the community that has invested in legacy techniques are going to push those techniques as hard as they can, because incrementally that can seem politically and financially easier to do. We will keep pushing it until it really breaks.

At some point the community will simply say enough is enough and then they will begin to address radical techniques. This is already happening in the system community and user community.

You will see a ramp down on conventional practices and a slow ramp up of innovative practices. As one colleague put it at another meeting: “It's difficult but suck it up.”

Q. How confident are you that we will get to exascale?

I am certainly confident that we will get there. The point at which we do it correctly, rather than as a stunt machine, will be some time in the early to mid-2020s.

Unfortunately there is still too much focus on the credit or pride factor in meeting the speed performance benchmark, and that means that stunt machines are the early focus.

Every time we have gone through a paradigm shift, there has been an overlap between past practices pushed to extremes and future practices that need time to grow and mature.
