Got a few million files that need constant management? If so, IBM has a filesystem for you.
Dubbed Scale out File Services (SoFS), the filesystem is the descendant of IBM’s General Parallel File System and is designed to support massive network-attached storage implementations that, according to Sven Oehme, Lead Architect for the IBM Filesystem Competence Center Development Group in the Linux Technology Center, “have a limit of 512 billion files.”
Oehme says SoFS has already been deployed as a clustered NAS “with 1,500 nodes and sustained transfer rates of 150 gigabits per second.”
The technology will also power a project for the USA’s Defense Advanced Research Projects Agency that Oehme says will store “more than a trillion objects and several petabytes of data, integrating tapes as sequential access media and achieving sustained transfer rates of four terabits per second from 30,000 nodes operating in parallel.”
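To put those aggregate figures in perspective, the quoted numbers can be divided out per node. The short Python sketch below does that arithmetic; it assumes decimal units (1 terabit = 1,000 gigabits) and an even spread of traffic across nodes, neither of which Oehme specified.

```python
# Back-of-the-envelope per-node throughput from the figures quoted above.
# Assumptions (not from IBM): decimal units and an even load across all nodes.

def per_node_mbit_per_s(total_gbit_per_s: float, nodes: int) -> float:
    """Average sustained rate per node, in megabits per second."""
    return total_gbit_per_s * 1000 / nodes

current = per_node_mbit_per_s(150, 1_500)      # existing clustered NAS
planned = per_node_mbit_per_s(4_000, 30_000)   # DARPA project target (4 Tbit/s)

print(f"Current deployment: {current:.0f} Mbit/s per node")
print(f"DARPA target:       {planned:.0f} Mbit/s per node")
```

On these assumptions, both systems work out to a similar order of magnitude per node, which suggests the headline numbers come mainly from adding nodes in parallel rather than from faster individual nodes.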
SoFS achieves these impressive speeds by incorporating communications protocols that are far less “chatty” than others, thereby making it possible for storage devices to spend more time on I/O and less time on networking niceties. The filesystem automatically moves data to different tiers of storage according to policies set by users and can even migrate data to tape.
“We have a pre-migrate function so that as soon as you create the file in the filesystem, it exists for a couple of minutes and we automatically create a backup of the file onto tape.” The result is pre-populated archives.
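The article does not show what a SoFS policy looks like, so the following Python sketch is purely illustrative of the idea: rules that decide, per file, whether to copy it to tape shortly after creation and whether to move it to a slower tier. The thresholds, field names and action names are all hypothetical and are not SoFS syntax.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class FileInfo:
    path: str
    age_seconds: float      # time since the file was created
    idle_days: float        # time since the file was last accessed
    on_tape: bool = False   # True once a tape copy already exists

# Hypothetical thresholds, chosen only for illustration.
PREMIGRATE_AFTER_SECONDS = 120   # "a couple of minutes" after creation
COLD_AFTER_DAYS = 30             # idle period before demotion to a slower tier

def plan_actions(f: FileInfo) -> List[str]:
    """Return the policy actions that apply to one file."""
    actions = []
    # Pre-migration: copy to tape early, while the file stays on disk.
    if not f.on_tape and f.age_seconds >= PREMIGRATE_AFTER_SECONDS:
        actions.append("copy_to_tape")
    # Tiering: demote files nobody has touched for a while.
    if f.idle_days >= COLD_AFTER_DAYS:
        actions.append("move_to_slow_tier")
    return actions

new_file = FileInfo("/data/report.csv", age_seconds=300, idle_days=0)
print(plan_actions(new_file))
```

Because the tape copy is made while the file is still on disk, a later migration of that file to tape would only need to release the disk copy, which is one way to read the "pre-populated archives" remark.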
Another benefit Oehme touted is speed. “We can scan 1 billion objects in 15 minutes on a small 10TB, 8 node cluster,” he says, with the resulting clustered NAS operational not long afterwards.
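As a rough check on what that claim implies, the quoted figures can be converted to a scan rate. The Python sketch below assumes the work is spread evenly across the eight nodes, which the article does not confirm.

```python
# Scan rate implied by "1 billion objects in 15 minutes" on an 8-node cluster.
# Assumption (not from IBM): the scan is divided evenly between nodes.

objects = 1_000_000_000
seconds = 15 * 60
nodes = 8

cluster_rate = objects / seconds     # objects per second, whole cluster
node_rate = cluster_rate / nodes     # objects per second, per node

print(f"Cluster: {cluster_rate:,.0f} objects/s")
print(f"Per node: {node_rate:,.0f} objects/s")
```

That is a little over a million objects per second for the cluster as a whole.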
The filesystem also has an API, although Oehme says “no vendor has taken it up other than IBM with Tivoli Storage Manager.” Nor is the filesystem available for hardware other than IBM’s own products. “SoFS is a fixed bundle of IBM hardware and software, all sold under a single license.”
IBM does not plan to change the latter arrangement, but Oehme hopes other vendors take advantage of the API to let their backup software address SoFS systems.
“[Symantec] NetBackup has shown interest but no real plan and we expect others to talk [to us soon],” he told searchstorage.com.au.
IBM is now working on a new release of SoFS, version 1.51, which Oehme said will include “caching of data on remote sites to speed access, instead of going over the wire.” The revision is due before Christmas.
SoFS already has at least one local user – GeoScience Australia – and Oehme met other potential customers during his visit to Australia, although IBM would not disclose which organisations have expressed interest in the software.