Linux as a computational storage panacea?

Linux is widely agreed to be key to the onward development of computational storage.

As we have noted before, computational storage enables a software and hardware system to offload and alleviate constraints on existing compute, memory and storage.

So given this proposition, how should software engineers identify, target and architect specific elements of the total processing workload to reside closer to computational storage disks (CSDs) and what role can Linux play in this process?

The open ecosystem of open source Linux has been lauded as a key facilitating technology for creating the secure containerisation principles that express key elements of computational storage.

But what really matters most when we consider the role of Linux here?

To attempt to answer this question, Computer Weekly Open Source Insider spoke to Antonio Barbalace in his capacity as senior lecturer with the Institute for Computing Systems Architecture (ICSA) in the School of Informatics at The University of Edinburgh.

Barbalace writes in full as follows…

I don’t agree with this view [that Linux is the CSD saviour], which is clearly narrow and dictated by pioneers of the CSD technology (NGD) and its partner (ARM).

Linux is ‘probably’ the current answer to bringing CSD to market in a short time, with some security features due to use of containers, but I don’t see Linux as we know it on desktop/server computers playing a key role on CSD CPUs.

There are several reasons for this.

Linux CSD wake-up calls

Barbalace: he likes data storage

First of all, security. Putting an entire Linux distribution on a CSD will make the CSD prone to attacks in the same way as your desktop/server computer is prone to attacks, which is certainly something that you don’t want – what is running on the CSD can access all your data uncontrolled!

Developers and data architects may opt to avoid giving security keys to code running on a CSD unless they can ensure that it was booted from ‘untouched Linux’ and so has not been infected by rootkits, malware, etc.

NOTE: This is not a problem if the CSD is used exclusively by a single user and there is some access control in place.

Second, CSDs are like embedded systems. They are not as powerful as your desktop/server and are limited in power consumption (if on PCI, around 12W), computational power and memory. It may therefore not make sense to run large, bloated software stacks developed for servers (see also my next point).

Note that it may make sense to run a single large, bloated software stack when your CSD is used by a single user for a very specific application, but absolutely not when your CSD is used in a multi-tenant environment. Let’s not forget that memory, compute and power capacity may still limit what you can do.

Application code offload

Despite all this, I do still think that Linux on the host will play a fundamental role as an enabler of the CSD technology.

My personal view is that CSDs will be programmed similarly to smartNICs, i.e., by offloading part of the application code compiled to a restricted ISA, such as eBPF or P4. (And why not WebAssembly, for example?)

A restricted ISA has several key advantages: it can be formally verified/checked for properties such as security, privacy and timing, and it can run fast and at low power. Consider that eBPF and P4 programs can run on network cards at line speed, such as 100GbE or 200GbE, today!
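
To make the verifiability point concrete, here is a minimal sketch in Python of the kind of static check that a restricted instruction set makes possible. Everything in it is an assumption invented for illustration: the four-opcode toy instruction set, the `Insn` and `verify` names, and the 64-byte buffer limit are hypothetical and are not eBPF, P4 or WebAssembly. Real verifiers, such as the one the Linux kernel runs on eBPF programs, are far more elaborate. The sketch only shows why a small ISA is easy to check before any code is allowed near the data.

```python
from dataclasses import dataclass

# Hypothetical toy instruction set, invented for this illustration only.
ALLOWED_OPS = {"load", "add", "jlt", "ret"}
NUM_REGS = 4    # registers r0..r3
BUF_SIZE = 64   # bytes of block data the offloaded program may read


@dataclass(frozen=True)
class Insn:
    op: str
    dst: int = 0  # destination register (first operand for jlt)
    src: int = 0  # source register (second operand for jlt)
    imm: int = 0  # buffer offset (load), constant (add), jump distance (jlt)


def verify(program):
    """Return a list of rule violations; an empty list means 'accepted'."""
    errors = []
    if not program or program[-1].op != "ret":
        errors.append("program must end with ret")
    for pc, insn in enumerate(program):
        if insn.op not in ALLOWED_OPS:
            errors.append(f"{pc}: unknown opcode {insn.op!r}")
            continue
        if not (0 <= insn.dst < NUM_REGS and 0 <= insn.src < NUM_REGS):
            errors.append(f"{pc}: register out of range")
        # Memory safety: every load offset is a constant, so it can be
        # checked against the buffer size before the program ever runs.
        if insn.op == "load" and not 0 <= insn.imm < BUF_SIZE:
            errors.append(f"{pc}: load offset {insn.imm} outside buffer")
        # Timing: jumps may only go forward, so there are no loops and the
        # program executes at most len(program) instructions.
        if insn.op == "jlt":
            target = pc + 1 + insn.imm
            if insn.imm < 0 or target >= len(program):
                errors.append(f"{pc}: jump must go forward and stay in program")
    return errors


good = [Insn("load", dst=0, imm=8), Insn("add", dst=0, imm=1), Insn("ret")]
bad = [Insn("load", dst=0, imm=128), Insn("jlt", dst=0, src=1, imm=-2), Insn("ret")]

print(verify(good))  # no violations
print(verify(bad))   # out-of-bounds load and a backward jump
```

Because this toy ISA has no indirect memory access and no backward jumps, memory safety and a worst-case instruction count follow from a single linear pass over the program. A general-purpose Linux userland running on the CSD offers no comparable guarantee.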