
SAN vs NAS: For AI, virtual machines and containers

We look at block vs file storage for contemporary workloads, and find it’s largely a case of trade-offs between cost, complexity and the level of performance you can accept

SAN and NAS are distinct ways of deploying storage. They differ at a fundamental level of storage architecture – in how they relate to the file system, and to block and physical addressing – and in the network or fabric over which input/output (I/O) travels.

Here, we look at the key differences in performance and applicability between block and file storage, focusing on key contemporary workloads in artificial intelligence (AI), virtual machines and containerised environments.

One way to view the difference between file storage (network-attached storage, or NAS) and block storage (storage area network, or SAN) is that they provide two ways to access stored data.

In both cases, a file system is required to intercede between the application that requests the data and the data in its physical storage. Where the two methods differ is how that plays out.

In NAS systems, file system and storage are bundled together in the same box. Users and applications request data from, for example, a letter-designated drive – much like a mapped network drive on a PC. The request goes to the file system on the NAS box, is translated into physical addressing, and the file is retrieved.

In SAN systems, the file system’s work is done elsewhere, typically on the host server. The application or database requests data, the request is translated to physical addressing outside the SAN, and the data is retrieved in blocks.

Here lies the key difference, and as the name suggests, file storage serves up entire files while block storage delivers blocks.
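
To make the distinction concrete, here is a minimal sketch in Python – with hypothetical paths – contrasting the two access models: file storage is addressed by path through a file system, while block storage is addressed by offset on a raw device.

    # Minimal sketch of file-level vs block-level access.
    # The NFS mount point and block device path are hypothetical examples.

    # File access (NAS-style): the file system on the NAS resolves the path
    # and returns the file's contents.
    with open("/mnt/nas_share/reports/q3.csv", "rb") as f:
        file_data = f.read()

    # Block access (SAN-style): the host addresses the LUN as a raw device
    # and reads fixed-size blocks at given offsets; a file system or database
    # on the host decides what those blocks mean.
    BLOCK_SIZE = 4096
    with open("/dev/sdb", "rb") as lun:   # typically requires root privileges
        lun.seek(10 * BLOCK_SIZE)         # jump to block 10
        block_data = lun.read(BLOCK_SIZE)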

What are the performance implications of NAS and SAN?

NAS servers carry out all their file system processing on board. They also communicate with clients via protocols such as SMB and NFS over the standard IP network. For these reasons, they generally perform less well than SAN block-access storage.

SANs don’t have to deal with this processing overhead and are often furnished with the best-performing network connections, including dedicated high-speed networks – or fabrics in the correct SAN terminology – such as Fibre Channel.

For these reasons, NAS has historically been the choice for department-level file access while SAN has been preferred for performance-hungry work such as transaction processing.

Both can, however, be used for modern workloads such as AI, containerised applications and virtual machines (VMs), and in fact, highly performant and scalable file-access storage systems can be built from scale-out NAS hardware.

SAN vs NAS for AI

SANs give the best performance overall, and that translates to AI workloads too. Excellent IOPS, latency and throughput – especially using NVMe-over-Fabrics connectivity – can feed hungry GPU clusters that need sustained bandwidth. That is especially useful for performance-demanding datasets in formats such as TFRecord, Parquet and LMDB.

SANs excel at high-speed parallel writes with less protocol overhead than NAS, and usually bring advanced multipathing and quality of service (QoS) mechanisms. That makes them well-suited to model checkpointing.
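
As an illustration of the checkpointing pattern, the sketch below – a hypothetical example, assuming PyTorch and a file system mounted over SAN block storage at /mnt/san_scratch – writes a training checkpoint as one large, mostly sequential write.

    import torch
    import torch.nn as nn

    # Hypothetical mount point for a file system laid over SAN block storage,
    # e.g. XFS on an NVMe-over-Fabrics LUN.
    CHECKPOINT_DIR = "/mnt/san_scratch/checkpoints"

    model = nn.Linear(1024, 1024)   # stand-in for a real model
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

    # Periodic checkpoint: large, mostly sequential writes that benefit from
    # the SAN's bandwidth and low protocol overhead.
    torch.save(
        {
            "epoch": 10,
            "model_state": model.state_dict(),
            "optimizer_state": optimizer.state_dict(),
        },
        f"{CHECKPOINT_DIR}/epoch_010.pt",
    )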

The drawback of SANs for AI workloads is that they are relatively expensive and complex, with skills in SAN management, Fibre Channel and NVMe-over-Fabrics less easy to come by.

Also, because a SAN provides only block access, your AI stack will need a file system layer on top, such as Lustre, GPFS or BeeGFS.

NAS brings simpler file sharing across multiple compute nodes and better metadata handling than SAN. Its strong suit in AI workloads is inference, where performance demands are likely to be less extreme than in training. In general, NAS is easier to set up than a SAN.

Protocol overheads such as handling SMB/NFS can lead to slower throughput, while NAS may struggle with large sequential reads, checkpoint writes and datasets comprising millions of small files.
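
A rough way to see the small-file problem is to time a scan of many individual files against a single packed file on the same NAS share; the paths below are hypothetical.

    import os
    import time

    # Hypothetical paths on an NFS-mounted NAS share.
    SMALL_FILES_DIR = "/mnt/nas_datasets/images"            # e.g. many small JPEGs
    LARGE_FILE = "/mnt/nas_datasets/packed/shard_000.tar"   # one packed shard

    # Reading many small files incurs a metadata and protocol round-trip per
    # file, which is where NAS tends to suffer.
    start = time.time()
    for entry in os.scandir(SMALL_FILES_DIR):
        if entry.is_file():
            with open(entry.path, "rb") as f:
                f.read()
    print(f"small files: {time.time() - start:.1f}s")

    # Reading one large, packed file is mostly sequential I/O and amortises
    # the per-request overhead.
    start = time.time()
    with open(LARGE_FILE, "rb") as f:
        while f.read(8 * 1024 * 1024):
            pass
    print(f"packed file: {time.time() - start:.1f}s")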

SAN vs NAS for virtual servers

SAN block storage can be deployed successfully for a range of hypervisors, with VMware ESXi, Hyper-V and KVM all able to use block storage – although via differing protocols – to store virtual disks.

With high performance across the network/fabric, SANs are well-suited to VMs that need transactional I/O for SQL servers and applications that depend on them, such as ERP.

Block storage is ideal where multiple VMs run heavy workloads and need to access storage simultaneously. SANs generally bring advanced storage services such as thin provisioning, snapshots, replication and high availability (HA) controller failover.

As with any workload, the downsides for SAN are its cost, complexity and higher skills requirements.

NAS systems can also support VM workloads. Hypervisors can mount storage via NFS (VMware, KVM) or SMB3 (Hyper-V), with VM images stored on NFS shares and application data shared via SMB/NFS.
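
As a sketch of the NAS route on KVM, the snippet below uses the libvirt Python bindings to define an NFS-backed storage pool for VM images; the NAS hostname, export path and pool name are hypothetical.

    import libvirt

    # Hypothetical NFS export on a NAS box, used as a KVM storage pool
    # for virtual disk images.
    pool_xml = """
    <pool type='netfs'>
      <name>nas-vm-images</name>
      <source>
        <host name='nas01.example.com'/>
        <dir path='/export/vm-images'/>
        <format type='nfs'/>
      </source>
      <target>
        <path>/var/lib/libvirt/images/nas-vm-images</path>
      </target>
    </pool>
    """

    conn = libvirt.open("qemu:///system")
    pool = conn.storagePoolDefineXML(pool_xml, 0)   # register the pool
    pool.create()                                   # mount the NFS export
    pool.setAutostart(1)                            # remount on host reboot
    conn.close()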

Once again, the key strengths and weaknesses of SAN and NAS are evident in VM workloads – simplicity and relatively lower performance for NAS, performance and relative complexity for SAN.

SAN vs NAS for containerised workloads

Block storage, once again, comes out on top for performance, and that can make a difference in containerised workloads, which can include databases running in containers – such as PostgreSQL, MySQL and MongoDB – as well as AI training datasets or checkpoints held inside pods.

High IOPS and low latency come with SANs, and they work well with high-throughput sequential or random I/O. Most SAN hardware products support snapshots, cloning and replication.

A chief negative is that standard block devices cannot be mounted simultaneously across nodes in ReadWriteMany mode – a Kubernetes persistent volume access mode – without a clustered file system, and CSI drivers have to handle attach/detach logic properly.

SANs are costly, too, and can be overkill for lightweight or ephemeral container workloads.

NAS storage can back Kubernetes persistent volumes via NFS or SMB, often using a CSI driver. NAS also supports ReadWriteMany natively, unlike most block storage, so multiple pods across nodes can access the same volume simultaneously.
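
A minimal sketch using the Kubernetes Python client – assuming a cluster with an NFS-backed StorageClass, whose name here is hypothetical – requests a ReadWriteMany claim that pods on different nodes could mount at once; a SAN block-backed class would typically be requested as ReadWriteOnce instead.

    from kubernetes import client, config

    config.load_kube_config()   # or load_incluster_config() inside a pod

    # ReadWriteMany claim against a hypothetical NFS-backed StorageClass;
    # a SAN block-backed class would normally be requested as ReadWriteOnce.
    pvc_manifest = {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": "shared-training-data"},
        "spec": {
            "accessModes": ["ReadWriteMany"],
            "storageClassName": "nfs-csi",   # hypothetical class name
            "resources": {"requests": {"storage": "500Gi"}},
        },
    }

    client.CoreV1Api().create_namespaced_persistent_volume_claim(
        namespace="default", body=pvc_manifest,
    )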

As usual, NAS is easier to provision and scale than SAN, while protocol overhead (NFS, SMB) can reduce throughput and hotspots can occur if multiple pods read/write large files simultaneously.
