Computational storage: University of Edinburgh - Antonio Barbalace at the CSD coalface

As one of a number of follow-up pieces to the Computer Weekly Developer Network (CWDN) series focusing on computational storage, CWDN continues the thread with some extended analysis in this space.

The commentary below forms part of a Q&A with Antonio Barbalace in his capacity as senior lecturer with the Institute for Computing Systems Architecture (ICSA) in the School of Informatics at the University of Edinburgh.

CWDN: As we have noted before, computational storage enables a software and hardware system to offload and alleviate constraints on existing compute, memory and storage. Given that, how should software engineers identify, target and architect specific elements of the total processing workload to reside closer to computational storage functions?

Barbalace: I love this question! The answer is application dependent. Classically, for example, you can offload part of an SQL query to the storage, and this is very well understood: see Amazon S3 Select, which offloads an SQL SELECT predicate to S3 (the object storage ‘layer’ of Amazon Web Services).
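To make the idea of pushing an SQL predicate down to storage concrete, here is a minimal Python sketch. It does not use S3 Select or any real CSD interface: `SimulatedStorage`, `host_side_select` and `offloaded_select` are hypothetical names, and the "device" is just an in-memory list. It only illustrates why offloading pays off: the query result is identical, but far fewer rows have to travel from storage to the host.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Row:
    order_id: int
    region: str
    amount: float


class SimulatedStorage:
    """Hypothetical stand-in for a storage device that can evaluate a predicate."""

    def __init__(self, rows: list[Row]) -> None:
        self._rows = rows
        self.rows_transferred = 0  # rows shipped from the "device" to the host

    def read_all(self) -> list[Row]:
        # Conventional path: every row crosses the bus to the host.
        self.rows_transferred += len(self._rows)
        return list(self._rows)

    def read_where(self, predicate: Callable[[Row], bool]) -> list[Row]:
        # Offloaded path: the predicate is evaluated "on the device",
        # so only the matching rows are shipped to the host.
        matches = [r for r in self._rows if predicate(r)]
        self.rows_transferred += len(matches)
        return matches


def host_side_select(dev: SimulatedStorage, predicate: Callable[[Row], bool]) -> list[Row]:
    # SELECT ... WHERE evaluated on the host after a full read.
    return [r for r in dev.read_all() if predicate(r)]


def offloaded_select(dev: SimulatedStorage, predicate: Callable[[Row], bool]) -> list[Row]:
    # SELECT ... WHERE pushed down to the storage side.
    return dev.read_where(predicate)


def is_eu(r: Row) -> bool:
    return r.region == "EU"


rows = [Row(i, "EU" if i % 4 == 0 else "US", float(i)) for i in range(1000)]
conventional, pushed_down = SimulatedStorage(rows), SimulatedStorage(rows)
host_result = host_side_select(conventional, is_eu)
offload_result = offloaded_select(pushed_down, is_eu)
print(conventional.rows_transferred, pushed_down.rows_transferred)  # 1000 250
```

With one row in four matching, the offloaded query moves a quarter of the data; the more selective the predicate, the larger the saving.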

CSD products also provide data compression/decompression, encryption/decryption and so on, in which case an entire application function is offloaded to, and provided by, the device itself.

However, a generic and automatic way to identify which part of an application to offload to a CSD is not yet understood, although academic research is working on this. So, today, it is the programmer who has to decide manually, based on knowledge of the application, which parts can be offloaded and which shouldn’t be.

(If I may also mention, I am working on this!)

Refactoring factor

CWDN: So how much refactoring does it take to bring about a successful computational storage deployment and, crucially, how much tougher is that operation when the DevOps team in question is faced with a particularly old legacy system?

Barbalace: It is very difficult to provide a generic answer to this question; it depends on the market sector and on the age of the hardware and software in any given company. Also, note that many small and medium-sized companies (and many large companies) are now minimising on-premises hardware and relying on large IT companies for their infrastructure.

Anyway, switching to CSD is as easy as replacing ‘normal’ disk drives with CSDs. This cannot happen instantaneously, but replacing/updating disks is an understood practice for DevOps. What is more complicated is enabling users to use CSDs, and this may take several years, as products have been on the market for a relatively short time. DevOps should make sure that CSDs are used securely and without harm to the devices themselves; let’s remember, CSDs are based on flash storage, so write-life is limited.

Computational storage is not a Swiss Army Knife. It solves some problems, but it cannot solve others. Think about GPUs: they are fantastic for certain tasks (e.g. HPC), but not worth using for others.

CWDN: According to the Journal of Big Data, “Catalina is the first Computational Storage Device (CSD) equipped with a dedicated application processor running a full-fledged operating system that provides filesystem-level data access for the applications.” So how fundamental could Catalina be and what do we need to know about it?

Barbalace: Catalina is the first commercial product equipped with a dedicated application processor running a full-fledged OS. Research and industry R&D presented similar hardware concepts before, but Catalina is truly the first to run ‘normal’ Linux on the CSD.

Barbalace: CSD is no Swiss Army Knife, but we’re working hard to sharpen it as we speak.

Personally, I think that there are two or three concepts that are very interesting in this product from NGD (despite trying several times, I am not working or collaborating with NGD):

a) the way it connects to the host CPU: by tunnelling Ethernet packets over the NVMe protocol, it makes the CSD look like a node in your Ethernet network (I would refer readers to NGD’s videos/documents for the details);

b) the way it exposes the NAND on the CSD;

c) the fact that it uses a distributed file system among the host CPUs and the CPUs on the CSD.

Starting standardisation

We shouldn’t think of the CSD architecture provided by NGD as the gold standard; there are other architectures out there, and standardisation is at an early stage now.

Remember, CSDs plug into PCIe, not Ethernet! Some people at ARM think that plugging CSDs into low-speed Ethernet is a good idea, but it is instead a REALLY BAD idea. For example, if you connect CSDs over Ethernet, you need to increase your expenditure on datacentre switches, which are expensive; you cannot achieve the same volumetric density; you need to plan space for a redundant power supply for each CSD …

Overall, CSD is a fantastic technology because it doesn’t require any hardware change other than swapping the old disks for the new ones. Everything keeps working as before, but you gain additional functionality: CSD doesn’t require any other infrastructure refresh, so companies can keep the same network/switches and servers as before.

Lessons from the good old days

Of course, software should change to exploit the new hardware. I am not that young: I still remember the good old days in which you could run the exact same software on a new/faster computer and it simply ran faster.

In the case of NGD, you don’t need to modify the software much: to the best of my understanding of NGD technology, software developed for distributed systems just runs (more or less) as-is. However, from my perspective, software should be changed to fully exploit CSD … but we are not that far from being able to reuse the software we have already developed just as-is.

A legacy example first: SQL. With SQL the user doesn’t have to change the software at all; it is the SQL engine that has to be updated to support offloading.

A modern example: FaaS (Function-as-a-Service). With FaaS, developers build their applications from small pieces of code (functions) that execute on a (language) runtime and are orchestrated by an event infrastructure.

This sort of application is already able to run one part on your host CPU and the other parts on the storage CPU! Legacy software, however, must be rewritten, at least for the moment…
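The FaaS idea of splitting an application into small, event-triggered functions can be sketched in a few lines of Python. This is a toy, single-process illustration under stated assumptions: `faas_function`, `run` and the `placement` tag are hypothetical names, not part of any real FaaS platform or NGD interface, and nothing actually executes on a storage CPU; the tag only records where a real deployment might place each function. The point is that because each function is small and triggered by an event, moving one of them next to the data does not require changing the others.

```python
from collections import deque
from typing import Any, Callable, Optional

# Hypothetical registry: event name -> (placement, handler, event it emits next).
# "placement" is only a label recording where a real platform would run the function.
REGISTRY: dict[str, tuple[str, Callable[[Any], Any], Optional[str]]] = {}


def faas_function(event: str, placement: str, emits: Optional[str] = None):
    """Register a handler for `event`, tagged to run on 'host' or 'storage'."""
    def register(handler: Callable[[Any], Any]) -> Callable[[Any], Any]:
        REGISTRY[event] = (placement, handler, emits)
        return handler
    return register


@faas_function("records_written", placement="storage", emits="filtered")
def filter_large(records: list[int]) -> list[int]:
    # Data-heavy step: a candidate for the storage CPU, shrinking what is sent on.
    return [r for r in records if r >= 100]


@faas_function("filtered", placement="host")
def summarise(records: list[int]) -> dict[str, int]:
    # Light step on the already-reduced data: stays on the host CPU.
    return {"count": len(records), "total": sum(records)}


def run(event: str, payload: Any) -> tuple[Any, list[tuple[str, str]]]:
    """Tiny event loop: dispatch each event to its handler and follow emitted events."""
    trace: list[tuple[str, str]] = []
    queue = deque([(event, payload)])
    result = payload
    while queue:
        name, data = queue.popleft()
        placement, handler, emits = REGISTRY[name]
        result = handler(data)
        trace.append((handler.__name__, placement))
        if emits is not None:
            queue.append((emits, result))
    return result, trace


result, trace = run("records_written", [5, 150, 42, 300, 99, 100])
print(result)  # {'count': 3, 'total': 550}
print(trace)   # [('filter_large', 'storage'), ('summarise', 'host')]
```

Here the data-reducing filter is tagged for the storage side so that only its output would travel to the host; deciding which function gets which tag is still the manual, application-specific choice described earlier in the interview.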