Despite the many and rapid changes in storage in recent years, such as the advent of flash and the move to cloud storage, there are still some fundamentals in place. Among these are the basics of how data is accessed, whether by file, block or object.
Object storage has been a rising star among these. It forms the basis of much of the basic storage provision offered by public cloud services. In the case of Amazon Web Services S3, it has even become something of a de facto standard that is in use more widely than just the AWS cloud.
But file and block access storage are still needed for particular use cases and make up the vast bulk of stored data, in the datacentre at least.
Yet organisations also want to use cloud compute and storage capacity and to burst workloads to the cloud when necessary. In many cases, that will involve applications that haven’t been developed as cloud-native, and so file and block storage will be needed.
In this first in a series of articles, we will look at file access storage provided in the big three public clouds: AWS, Microsoft Azure and Google Cloud Platform (GCP).
Other articles will look at virtual storage appliances and cloud instances from storage players in the cloud, as well as NAS gateways and distributed file systems that offer file access cloud storage by other methods.
Overview: Similarities and differences
All of the big three public cloud providers – AWS, Azure and GCP – offer native network-attached storage (NAS) services.
All three also offer higher-performing file storage based on NetApp storage.
Where Azure differs is that it also provides file storage caching, aimed at low-latency access to a set of files in a single namespace, and offers this in a number of service levels.
Amazon’s two main file storage offerings are EFS (Elastic File System) and FSx (for Windows File Server and for Lustre). EFS and FSx for Lustre are Posix-compliant, which means they work with applications that demand, for example, file permissions, file-locking capabilities and a hierarchical directory structure, with EFS accessed via NFSv4.
Use cases targeted include big data analytics, web serving and content management, application development and testing, media workflows, database backups, and container storage.
EFS is NFS-accessed file storage for Linux applications that can run on AWS compute instances or on-premises servers. It can scale to petabytes and comes in two service levels – standard and infrequent access (IA) – with automated tiering between the two to place files in the tier most appropriate to their usage profile.
AWS says access to files is parallelised to achieve “high levels” of throughput (10GBps quoted) and input/output (I/O) performance (500,000 IOPS). It says costs can be as low as 8c per GB per month, assuming an 80/20 split between IA and standard storage.
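As a sanity check on that 8c figure, the arithmetic works out if you assume per-GB list prices of roughly $0.30 for standard and $0.025 for IA – illustrative figures only, as actual EFS prices vary by region and change over time:

```python
# Rough arithmetic behind the blended EFS price claim.
# Per-GB monthly prices are assumptions for illustration only;
# they vary by AWS region and change over time.
STANDARD_PRICE = 0.30   # $/GB-month, EFS standard (assumed)
IA_PRICE = 0.025        # $/GB-month, EFS infrequent access (assumed)

def blended_price(ia_fraction: float) -> float:
    """Blended $/GB-month for a given fraction of data in the IA tier."""
    return ia_fraction * IA_PRICE + (1 - ia_fraction) * STANDARD_PRICE

# The 80/20 IA/standard split quoted above:
print(f"${blended_price(0.80):.2f}/GB-month")  # → $0.08/GB-month
```

The point of the split is that automated tiering moves cold files to IA, so the more of your data that goes untouched, the closer the blended price gets to the IA rate.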
Amazon FSx for Windows File Server provides file storage accessible via the Windows-native SMB protocol and delivers features such as access control lists (ACLs), user quotas, user file restore and Active Directory (AD) integration. Flash and spinning-disk hard disk drive (HDD) media options are available, and FSx storage is accessible from Windows, Linux and macOS compute instances and on-premises hardware.
Claimed performance comprises sub-millisecond latency, tens of GB per second throughput and millions of input/output operations per second (IOPS).
Amazon FSx for Lustre is targeted at file-based use cases such as machine learning and high-performance computing (HPC). It integrates with Amazon S3 as a more cost-effective bulk data store, with S3 objects presented as files in FSx for Lustre.
Data is accessible from EC2 instances and from on-premise locations.
Azure’s cloud file storage options include native and NetApp-based performance options as well as varying levels of caching services.
Azure Files provides fully managed file shares in the cloud, accessible via Server Message Block (SMB) or Representational State Transfer (REST), that can support cloud or on-premises deployments of Windows, macOS and Linux.
Two service levels are offered in Azure Files – standard and premium.
Being a Microsoft service, it comes with the integrations you’d expect, such as Active Directory, and Azure actively encourages “lift and shift” of applications and data that can use Azure Files.
Meanwhile, Azure NetApp Files is billed as “enterprise grade” and provides file storage for Linux and Windows compute based on NetApp storage in the Azure cloud. It is aimed at performance-intensive applications such as SAP HANA, databases, HPC apps and enterprise web applications.
Access is via SMB and NFS and there are three performance/cost tiers available – standard, premium and ultra.
Microsoft Azure also offers some file storage caching services that are intended to provide speedier access to data for high performance workflows.
Azure HPC Cache is an NFS-connected service that provides single-namespace storage for on-premises NAS or Azure-located application data, which can be file or Blob (object).
Meanwhile, as a result of Microsoft’s 2018 acquisition of Avere, Azure offers a couple of file-caching services based on Avere’s technology.
Avere vFXT for Azure is billed as “a high-performance caching service” and is a software-based service iteration of the FXT Edge Filer. The idea is that vFXT is used as a cloud-based file access cache that can allow HPC applications to run without being re-factored for the cloud. It is optimised for read-heavy workloads and presents a single namespace to applications.
Azure FXT Edge Filer is a hardware product and so falls slightly outside this survey. It is something like co-located hardware offered as a service, and is presumably the underpinning for vFXT.
FXT Edge Filer works with customer NAS, Azure Blob and Amazon S3 storage to act as a high-performance cache for HPC workloads. It scales up to 24 nodes to provide claimed millions of IOPS and hundreds of GBps of throughput. FXT comes in two models that differ chiefly in the amount of RAM and storage capacity.
Google Cloud Platform
GCP’s Cloud Filestore offers two performance tiers of NFS-connected file storage with up to 64TB of capacity per share. Premium offers much higher performance than standard – 1.2GBps vs 100MBps read throughput, and 60,000 vs 5,000 IOPS. Stated availability is 99.9% for both tiers.
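Put side by side, the quoted figures show premium is the same multiple of standard on both axes:

```python
# Filestore tier figures as quoted above (read throughput and IOPS).
tiers = {
    "standard": {"read_MBps": 100, "iops": 5_000},
    "premium": {"read_MBps": 1_200, "iops": 60_000},  # 1.2GBps = 1,200MBps
}

throughput_ratio = tiers["premium"]["read_MBps"] / tiers["standard"]["read_MBps"]
iops_ratio = tiers["premium"]["iops"] / tiers["standard"]["iops"]
print(throughput_ratio, iops_ratio)  # → 12.0 12.0
```

In other words, the premium tier is a uniform 12x step up, rather than trading throughput against IOPS.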
Google is a bit more modest in its proposed use cases than some of the AWS and Azure cloud file storage offers. GCP targets video rendering, application workloads, web content management and home directories.
If you want more than the basic file storage offered by GCP, NetApp Cloud Volumes are also available. These are NFS- and SMB-connected, for Linux and Windows application workloads.
NetApp Cloud Volumes on GCP comes in three performance/cost tiers – standard, premium and extreme, at $0.10, $0.20 and $0.30 per GB per month – with performance ranging from 4,000 to 32,000 IOPS and throughput from 16MBps to 128MBps per TB.
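To see what those per-TB figures mean in practice, here is a rough cost/throughput sketch for a hypothetical 10TB volume. The premium tier's 64MBps-per-TB figure is an assumption, since only the 16MBps and 128MBps endpoints of the range are quoted above:

```python
# Monthly cost and ceiling throughput per tier for a hypothetical volume,
# using the per-GB prices quoted above. The premium per-TB throughput
# figure is an assumption; only the range endpoints are quoted.
TIERS = {
    #            $/GB-month  MBps per TB
    "standard": (0.10, 16),
    "premium":  (0.20, 64),   # assumed mid-tier figure
    "extreme":  (0.30, 128),
}

volume_tb = 10  # hypothetical 10TB volume
for name, (price_gb, mbps_per_tb) in TIERS.items():
    monthly = price_gb * volume_tb * 1_000  # treating 1TB as 1,000GB
    print(f"{name}: ${monthly:,.0f}/month, up to {mbps_per_tb * volume_tb}MBps")
```

Note that throughput scales with provisioned capacity, so a bigger volume buys you speed as well as space – a common pattern in cloud file services.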