In June 2018, the Computer Weekly Open Source Insider blog ran a story on MapD, a large-scale interactive geospatial analytics company. The story detailed MapD's work to deliver more powerful data crunching through its software, which is tightly integrated with a GPU-based rendering engine.
So are GPU-based rendering engines the future (well, at least until quantum computing services start to come online) for extreme data analytics?
Not exactly, says Cambridge, UK-based Dr Steve Marsh, founder and CTO of GeoSpock, an extreme data processing platform built to manage real-time, machine-generated sensor information.
Marsh’s beef with MapD rests on the assertion that the company is merely following the current market trend of relying on ever more powerful (and often prohibitively expensive) computer components in an attempt to solve the extreme data analytics problem.
Dr Marsh does not mince his words; he says this is a ‘misguided attempt’.
He thinks that this approach fails to solve the fundamental issue of how to manage increasingly vast machine-generated datasets that exceed the memory capacity of GPUs by several orders of magnitude.
“A GPU-based approach is not future-proof and is barely able to cope with today’s data volumes, so the exabyte volumes of the future will create a significant challenge for those trying to brute-force the problem with GPUs or in-memory databases,” said Marsh.
He explains that the expensive hardware may produce appealing visualisations, but such systems have lengthy initial load times, are limited to tens of gigabytes of in-memory data, and top out at row counts in the billions.
“This means the solution not only fails to qualify as ‘extreme’ analytics, but also lacks the scale that is required to handle exabytes of data and the trillion-row datasets needed for emerging use cases such as IoT, smart cities, and self-driving vehicles. It’s a costly and unsustainable half-answer to an exponentially growing data problem,” said Marsh.
His argument includes an insistence that we recognise that the GPU is ‘not a viable basis’ for massive data evaluation.
Why is Mr (sorry, Dr) Marsh so confident in making these claims?
General-purpose graphics processing unit (GPGPU) architecture was a significant focus of Marsh’s PhD in Computer Science, which centred on designing custom vector processors for neural simulation.
No silver bullet
“After deep exploration of the way GPUs operate at a micro-architecture level, it became evident that, though good at raw number crunching, they would not be technologically or economically suitable to operate at ‘extreme’ data scale. The current overestimation of GPU capability as a ‘silver bullet’ is a prime example of the confusion caused by marketing hype overruling computer science practice,” said Marsh.
So what to do?
Marsh says that, going forward, we as an industry need to refocus on what technology actually delivers, so that it can accommodate future ‘extreme’ data loads.
He concludes by saying that we need tools that operate on petabyte-scale (and larger) datasets, with sub-second response times, built on a foundation of affordable commodity hardware that can evolve in line with data requirements.