Tiered storage means matching data to the cost – and therefore the performance – of media. So, your most critical data should go on super-fast solid state. Shouldn’t it?
Not according to Panasas, which recently added tiering based on file size to its PanFS scale-out NAS.
Panasas’s Dynamic Data Acceleration is a move aimed at providing customers with storage to suit a wide variety of HPC (high performance computing) and AI/ML workloads by exploiting the speed of SSD for small files and the massed throughput of HDD for large files.
Dynamic Data Acceleration tiers data to different media within the storage system, but not by usage characteristics. Panasas claims its tiering by file size gives a 2x performance advantage, in GBps terms, over file system rivals BeeGFS, Lustre and IBM’s GPFS/Spectrum Scale.
It all sounds a bit counter-intuitive, because you want your most performance-hungry data on your most performant storage, don’t you?
Well, yes, but Panasas is convinced that tiering by file size is a better way of achieving that.
What happens in Dynamic Data Acceleration is that on ingest all metadata is placed on super-fast NVDIMM. Meanwhile, small files are routed to file system storage on low latency, high-bandwidth solid state drives (SSD) and larger files head to low-cost high-capacity spinning disk HDDs. “We think our approach is better than tiering by temperature [ie, data usage],” said Curtis Anderson, senior software architect at Panasas.
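The placement logic described above can be sketched as a simple routing function. Note this is purely illustrative: Panasas has not published its placement algorithm, and the 1MiB small-file cutoff and tier names here are assumptions, not Panasas's actual values.

```python
# Illustrative sketch of size-based tier placement.
# The threshold and tier names are assumptions for illustration only.
SMALL_FILE_THRESHOLD = 1 * 1024 * 1024  # hypothetical 1MiB cutoff

def place(item_kind: str, size_bytes: int) -> str:
    """Choose a media tier for an incoming object by kind and size."""
    if item_kind == "metadata":
        return "nvdimm"  # all metadata lands on super-fast NVDIMM
    if size_bytes < SMALL_FILE_THRESHOLD:
        return "ssd"     # small files: low-latency, high-bandwidth flash
    return "hdd"         # large files: low-cost, high-capacity spinning disk

print(place("metadata", 4096))        # nvdimm
print(place("file", 64 * 1024))       # ssd
print(place("file", 512 * 1024**2))   # hdd
```

The point of such a policy is that placement needs no access-frequency history, so behaviour is deterministic per file rather than dependent on how recently an application last ran.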
The core idea is that file size is the key variable in what’s required of storage for HPC and AI workloads. In other words, at small file sizes I/O is key and providing IOPS via SSD is required. When file sizes get larger it’s all about sequential access with bandwidth provided by multiple hard drives in the Panasas parallel file system, PanFS.
“With traditional data temperature-based tiering it can be complex, with the customer needing to manage the tiers and the management of data between them. You also end up with hot data that is on very performant media and cold data on slower media,” said Anderson. “So, what you can end up with is inconsistent performance, if you haven’t run a particular application for a week, for example.
“If we base tiering on size then HDDs are always higher-performing because they are being used in the most efficient way possible, delivering 180MBps each and doing what they were designed to do,” he added. “The HDDs contribute to performance and aren’t isolated in the cold tier.”
Meanwhile, Panasas’s SSDs deliver 500MBps each but are targeted at delivering IOPS rather than bandwidth.
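A back-of-envelope calculation using the per-drive figures quoted here, and the six HDDs per ActiveStor Ultra node mentioned below, shows why aggregated spinning disk holds its own for sequential work:

```python
# Rough arithmetic from the figures quoted in the article.
HDD_MBPS = 180       # sequential throughput per HDD (quoted above)
SSD_MBPS = 500       # throughput per SSD (quoted above)
HDDS_PER_NODE = 6    # HDDs in an ActiveStor Ultra node

hdd_node_bandwidth = HDDS_PER_NODE * HDD_MBPS
print(hdd_node_bandwidth)                    # 1080 MBps from HDDs alone
print(hdd_node_bandwidth > 2 * SSD_MBPS)     # True: > 2 SSDs' worth
```

So for large-file streaming, a node's HDDs in aggregate out-deliver a couple of SSDs, while the SSDs are reserved for the IOPS-heavy small-file work they handle best.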
Another temperature-based tiering drawback pointed out by Panasas is that the hot data tier must always be at least as big as the working set. If it isn’t, reads potentially have to wait on the slower cold tier.
Dynamic Data Acceleration comes in the PanFS parallel file system run by Panasas’s ActiveStor Ultra scale-out NAS nodes. These come with six HDDs, which can be specified in sizes between 4TB and 16TB, plus customer-sizeable SSDs, an NVMe tier and NVDIMM storage. Besides the new tiering functionality, that configuration also introduces more storage media choices than was the case previously.
Key use cases targeted are HPC and AI/ML where workloads are expected to be many and varied. The idea is that tiering by size will result in predictable performance across these workloads.