With OneFS 8.2.2, the maximum file size Isilon can handle rises to 16TB (from 4TB), data deduplication is now inline, and network file system (NFS) access is optimised for real-time performance and for the addition of new nodes. Isilon also now claims compatibility with Kubernetes via its own Container Storage Interface (CSI) volume plugin.
This latest iteration of OneFS allows Isilon to keep pace with its “traditional” media market while also targeting medical and satellite imaging, where file sizes are much larger. It also provides high availability and caters for newer elastic and parallel use cases such as containers.
Cyrille d’Achon, responsible for pre-sales in unstructured data storage at Dell EMC, said: “Getting past the 4TB file size limit has been a demand from customers in video production, who want to be able to make changes to film from original shoots where file sizes have climbed towards 16TB. This allows, for example, recurring changes to background texture in a scene to be made in one pass.
“A file of that size allows very high-definition images to be combined, as in space or medical imaging, where you may want to add information in layers to make analysis more efficient.”
There will be benefits for more mainstream use cases too, said d’Achon. “In more traditional systems, it means there’s no longer any need to cut virtual machine or database snapshots into three or four 4TB segments. It’s true that OneFS could already handle this segmentation, but no longer having to reconcile several segments of a file will save applications time,” he added.
OneFS is built as scale-out storage, in which nodes are grouped into clusters, so hard disk drives (HDDs) do not need a capacity of 16TB to accommodate such files. Files are cut into chunks and distributed across several nodes, of which an Isilon cluster can have up to 252.
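To illustrate why no single drive has to hold a whole 16TB file, here is a minimal Python sketch that spreads a file’s chunks round-robin across cluster nodes and totals the bytes each node ends up holding. It rests on stated assumptions: the 1GB chunk size and the round-robin placement are hypothetical, and the sketch ignores any data protection overhead, so it is not OneFS’s actual layout algorithm.

```python
TB = 10**12  # decimal terabyte
GB = 10**9


def stripe_file(size_bytes: int, num_nodes: int, chunk_size: int) -> list[int]:
    """Return the number of bytes each node holds after round-robin striping.

    Simplified sketch: fixed-size chunks, no parity or mirroring overhead.
    """
    if num_nodes < 1 or chunk_size < 1:
        raise ValueError("num_nodes and chunk_size must be positive")
    per_node = [0] * num_nodes
    num_chunks = -(-size_bytes // chunk_size)  # ceiling division
    for index in range(num_chunks):
        # The final chunk may be shorter than chunk_size.
        length = min(chunk_size, size_bytes - index * chunk_size)
        per_node[index % num_nodes] += length
    return per_node


if __name__ == "__main__":
    layout = stripe_file(16 * TB, num_nodes=8, chunk_size=1 * GB)
    print([n / TB for n in layout])  # each of the 8 nodes holds 2.0 TB
```

With 16TB striped across eight nodes in 1GB chunks, each node holds 2TB, well within the capacity of ordinary drives.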
Inline deduplication for HDDs
Inline data deduplication – in which data is processed as it is written rather than post-process – already exists in Isilon nodes with solid state drives, such as the F810, where it aims to preserve the life of the flash media.
OneFS 8.2.2 adds inline deduplication to H5600 nodes, which use spinning HDDs. But why do that, when the lifetime of mechanical drives isn’t degraded by the volume of writes?
“You could carry out deduplication post-process, during periods of inactivity on the cluster,” said d’Achon. “But Isilon clusters are increasingly used around the clock because customers want to run analytics operations between periods of data ingestion. So deduplication has been treated as a low-priority process, and we’ve often found it hasn’t taken place.”
Deduplication eliminates duplicate data and so saves disk space. But H5600 users had been seeing their available capacity shrink faster than expected because deduplication hadn’t run.
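The capacity problem comes down to when each approach reclaims space. The minimal Python sketch below contrasts the two by hashing fixed-size blocks with SHA-256: the inline store drops a duplicate block before it is ever stored, while the post-process store keeps every copy until a background `dedupe()` pass runs. The 8KB block size, the class names and the omission of file-to-block maps are simplifying assumptions for illustration, not OneFS internals.

```python
import hashlib

BLOCK_SIZE = 8192  # assumed fixed block size for this sketch


def split_blocks(data: bytes) -> list[bytes]:
    """Cut a byte string into fixed-size blocks (the last may be shorter)."""
    return [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]


class InlineDedupStore:
    """Deduplicates at write time: a block already seen is never stored again."""

    def __init__(self) -> None:
        self.blocks: dict[bytes, bytes] = {}  # SHA-256 digest -> block

    def write(self, data: bytes) -> None:
        for block in split_blocks(data):
            # setdefault stores the block only if its digest is new
            self.blocks.setdefault(hashlib.sha256(block).digest(), block)

    def physical_bytes(self) -> int:
        return sum(len(b) for b in self.blocks.values())


class PostProcessDedupStore:
    """Stores every block as written; duplicates are reclaimed only by dedupe()."""

    def __init__(self) -> None:
        self.blocks: list[bytes] = []

    def write(self, data: bytes) -> None:
        self.blocks.extend(split_blocks(data))

    def dedupe(self) -> None:
        # Background job: keep the first copy of each unique block.
        unique: dict[bytes, bytes] = {}
        for block in self.blocks:
            unique.setdefault(hashlib.sha256(block).digest(), block)
        self.blocks = list(unique.values())

    def physical_bytes(self) -> int:
        return sum(len(b) for b in self.blocks)


if __name__ == "__main__":
    payload = b"".join(bytes([i]) * BLOCK_SIZE for i in range(8))  # 8 distinct blocks
    inline, post = InlineDedupStore(), PostProcessDedupStore()
    for store in (inline, post):
        store.write(payload)
        store.write(payload)  # the same data written twice
    print(inline.physical_bytes(), post.physical_bytes())  # 65536 131072
    post.dedupe()
    print(post.physical_bytes())  # 65536
```

Until `dedupe()` runs, the post-process store uses twice the space; if the job keeps being skipped on a busy cluster, that gap simply persists.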
“There was a good reason to do deduplication post-process: it avoided saturating available bandwidth and using processing power needed for data that might have to be accessed,” said d’Achon.
“But we got around this problem. In OneFS 8.2.2, we can now monitor server access via NFS, so deduplication only starts when access is less intensive.”
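Dell EMC does not say how OneFS decides when NFS access is “less intensive”, so the following is only a minimal Python sketch of the general idea, under assumptions: it averages the most recent NFS operations-per-second samples and permits deduplication work only while that average stays below a threshold. The class name, the sampling window and the threshold are all hypothetical.

```python
from collections import deque


class DedupThrottle:
    """Hypothetical load gate: allow deduplication only while NFS load is low."""

    def __init__(self, max_ops_per_sec: float, window: int = 5) -> None:
        self.max_ops_per_sec = max_ops_per_sec
        # Only the most recent `window` samples are kept.
        self.samples: deque = deque(maxlen=window)

    def record(self, nfs_ops_per_sec: float) -> None:
        """Feed in one measurement of NFS operations per second."""
        self.samples.append(nfs_ops_per_sec)

    def dedup_allowed(self) -> bool:
        if not self.samples:
            return False  # no measurements yet, so stay conservative
        average = sum(self.samples) / len(self.samples)
        return average < self.max_ops_per_sec


if __name__ == "__main__":
    throttle = DedupThrottle(max_ops_per_sec=1000, window=3)
    for load in (5000, 4000, 3000):  # busy ingest period
        throttle.record(load)
    print(throttle.dedup_allowed())  # False
    for load in (200, 100, 150):  # quiet period
        throttle.record(load)
    print(throttle.dedup_allowed())  # True
```

Averaging over a window rather than reacting to a single sample keeps the gate from flapping on and off with momentary bursts of traffic.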
The H5600, launched less than a year ago, packs 80 10TB drives into 6U of rack space. Access and deduplication are handled by four Xeon processors, with eight SSDs providing cache. The H5600 supports bandwidth of 8GBps and raw capacity of 800TB, which rises into petabytes of effective capacity after deduplication.
Better load balancing
Monitoring already exists for access via server message block (SMB), the Microsoft Windows file-sharing protocol, but most Isilon customers use Linux servers, which connect over NFS.
“In SMB, it was simple because it was based on Active Directory,” said d’Achon. “What’s new is that we can now do it without Active Directory. For now, it lets us spot which IP address on the network is accessing what on the NAS. But the foundations are laid for a future version of OneFS to analyse application load in real time and balance access optimally.”
One of the first applications of this better load balancing is in node updates. Until now, nodes were updated sequentially, one after the other, to avoid eating too far into production bandwidth, but the downside is that updating an entire cluster takes a long time.
In version 8.2.2, OneFS partitions access effectively enough that several nodes can be updated at the same time without penalising bandwidth. Dell EMC suggests updating nodes in batches of a few at a time – for example, four nodes at a time in a cluster of 40.
“In this example, updating takes a quarter of the time, which means customers can schedule shorter maintenance windows,” said d’Achon.
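The saving d’Achon describes is simple batching arithmetic. The Python sketch below splits a 40-node cluster into update batches; it assumes, as the quote implicitly does, that a batch of four nodes takes about as long to update as a single node, and the node names are hypothetical.

```python
def update_batches(node_ids: list[str], batch_size: int) -> list[list[str]]:
    """Split a cluster's nodes into consecutive batches updated in parallel."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [node_ids[i:i + batch_size] for i in range(0, len(node_ids), batch_size)]


if __name__ == "__main__":
    nodes = [f"node-{n}" for n in range(1, 41)]  # hypothetical 40-node cluster
    sequential = update_batches(nodes, 1)
    batched = update_batches(nodes, 4)
    # If each batch takes roughly one node-update slot, the rounds needed are:
    print(len(sequential), len(batched))  # 40 10
    print(len(sequential) / len(batched))  # 4.0
```

If a batch took longer than a single-node update, for example because of bandwidth contention, the real saving would be smaller than this idealised ratio.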
Dedicated Kubernetes CSI
Isilon opens up to Kubernetes with the arrival of a dedicated CSI controller, a driver that presents a tier of storage to containerised applications as if it were local disk.
Without CSI, an application in a container believes it is writing to the host file system, but it is actually writing to the container’s own writable layer. Because that layer is ephemeral, all the data disappears when the container is removed. CSI changes this behaviour by redirecting writes to a physical tier of storage, so the data persists.
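In Kubernetes, a containerised application usually obtains this kind of persistent storage by submitting a PersistentVolumeClaim that names a storage class served by a CSI driver. The minimal Python sketch below builds such a claim as a plain dictionary and prints it as JSON; the claim name, the 500Gi size and the `isilon` storage class name are hypothetical and would have to match a StorageClass actually defined on the cluster.

```python
import json


def pvc_manifest(name: str, size_gib: int, storage_class: str) -> dict:
    """Build a PersistentVolumeClaim asking a CSI-backed storage class for a volume."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            # ReadWriteMany lets pods on several nodes mount the same NFS export.
            "accessModes": ["ReadWriteMany"],
            "resources": {"requests": {"storage": f"{size_gib}Gi"}},
            # Hypothetical class name: it must match a StorageClass whose
            # provisioner is the storage vendor's CSI driver.
            "storageClassName": storage_class,
        },
    }


if __name__ == "__main__":
    # Hypothetical values; the output can be piped to `kubectl apply -f -`.
    print(json.dumps(pvc_manifest("media-scratch", 500, "isilon"), indent=2))
```

Pods then mount the claim by name, and the data outlives any individual container.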
One principle of the DevOps movement has been to reserve containers for web applications, which can store data durably by sending HTTP requests to external object storage.
But with Kubernetes becoming a layer of datacentre infrastructure, CSIs have been devised as a way to mimic traditional application servers, which read and write files using POSIX calls and user permissions, for example.
“Our clients still develop traditional applications that handle data in file access format,” said d’Achon. “But they’re doing that now using Kubernetes to make them more elastic. So we must respond to their need for persistent storage.”
Ordinarily, a storage supplier develops its own CSI in block mode so that Kubernetes can interface with its disk arrays. For NAS such as Isilon, by contrast, a generic NFS CSI is usually sufficient.
“That’s true,” said d’Achon. “But we wanted to develop our own CSI because we wouldn’t have been able to gain support and buy-in from our customers if we’d just pointed them towards a third-party open source CSI.”