Object-based File Systems come of age for HPC

With HPC raring to enter the exaflop realm, object-based file systems promise better I/O performance. We take a look at the leading options.

This article can also be found in the Premium Editorial Download: High-Performance Computing: How the latest file systems are impacting high-performance computing

With the 10 petaflop barrier breached in late 2011, the global high-performance computing (HPC) community’s dream of crossing into exascale (1,000 petaflops) territory by 2018 suddenly looks all the more realistic. Whether India will beat Japan, China, the United States and others to this achievement is a matter of academic debate, but all of them face challenges rooted in the constraints of existing HPC architectures, including the limitations of today’s file systems.

The file system is a critical component because it determines the read-write capabilities that supercomputing setups depend on. This is where object-based file systems can offer advantages over the traditional approaches used in HPC, such as distributed file systems and parallel file systems.

Today, HPC architects have a choice of two different approaches to object-based file systems: one is open source and software-based (Lustre), and the other is a bundled software/hardware system (PanFS, from Panasas). Both offer more I/O headroom than traditional parallel file system technology.

Evolutionary road

Early HPC architectures relied on distributed file systems such as NFS and CIFS. In such setups, HPC clients suffer performance setbacks when applications on multiple cluster nodes try to access storage simultaneously. That’s largely due to I/O bottlenecks: through these file systems, clients can access only a single storage device at a time, so concurrent requests queue up behind it.
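A back-of-the-envelope model makes this bottleneck concrete. The sketch below assumes, purely for illustration, that aggregate throughput is capped by whichever side is slower: what the clients can absorb or what the storage side can deliver. The function name and all bandwidth figures are hypothetical, not measurements of NFS or CIFS.

```python
def aggregate_throughput_mbps(clients: int, client_bw: float,
                              servers: int, server_bw: float) -> float:
    """Toy model: total throughput is limited by the slower side.

    clients * client_bw -> what the compute nodes could absorb
    servers * server_bw -> what the storage side can deliver
    """
    if min(clients, servers) < 1:
        raise ValueError("need at least one client and one server")
    return min(clients * client_bw, servers * server_bw)


# Hypothetical numbers: 64 clients, each able to absorb 1,000 MB/s.
# A single storage device delivering 1,000 MB/s caps the whole cluster...
single = aggregate_throughput_mbps(64, 1000.0, 1, 1000.0)
# ...whereas spreading the same load over 32 devices raises the cap.
spread = aggregate_throughput_mbps(64, 1000.0, 32, 1000.0)
print(single, spread)  # 1000.0 32000.0
```

In this model, adding clients to a single-device setup only divides the same 1,000 MB/s among more nodes.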

The deficiencies of distributed file systems led to the development of parallel file systems such as IBM’s General Parallel File System (GPFS), HP’s IBRIX Fusion, EMC Isilon’s OneFS and EMC’s MPFSi. Parallel file systems provide higher performance by striping blocks of file data across multiple cluster nodes, so that a single file can be read or written through many devices at once. These file systems typically distribute file metadata at the storage device level to eliminate single points of failure. On this front, parallel file systems such as IBRIX Fusion differ from GPFS and its ilk in the way they present the file system to HPC clients: as a single global namespace, which clients mount before access. This approach allows IBRIX Fusion-based systems to scale up to 16 PB.
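Striping boils down to a simple address calculation. The sketch below assumes round-robin (RAID-0 style) placement with a fixed stripe size and stripe count; the function names and parameters are hypothetical, and each real parallel file system defines its own layout rules on top of this basic idea.

```python
from typing import List, Tuple


def locate(offset: int, stripe_size: int, stripe_count: int) -> Tuple[int, int]:
    """Map a byte offset in a file to (target index, offset within that target).

    Round-robin striping: stripe unit k lands on target k % stripe_count,
    at position k // stripe_count within that target.
    """
    unit = offset // stripe_size
    target = unit % stripe_count
    target_offset = (unit // stripe_count) * stripe_size + offset % stripe_size
    return target, target_offset


def split_request(offset: int, length: int, stripe_size: int,
                  stripe_count: int) -> List[Tuple[int, int, int]]:
    """Break a read/write of `length` bytes into per-target pieces.

    Returns (target, target_offset, nbytes) tuples; pieces that land on
    different targets can be serviced in parallel.
    """
    pieces = []
    end = offset + length
    while offset < end:
        # Bytes left in the current stripe unit, or in the request.
        chunk = min(stripe_size - offset % stripe_size, end - offset)
        target, target_offset = locate(offset, stripe_size, stripe_count)
        pieces.append((target, target_offset, chunk))
        offset += chunk
    return pieces


# Hypothetical layout: 1 MiB stripe units spread over 4 targets.
MiB = 1 << 20
for piece in split_request(offset=MiB // 2, length=3 * MiB,
                           stripe_size=MiB, stripe_count=4):
    print(piece)
```

With these parameters, the 3 MiB request touches all four targets: half a stripe unit on target 0, full units on targets 1 and 2, and half a unit on target 3.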

Grey Clouds in Lustre-scape

It’s important to note that Lustre’s long-term support came into question in the recent past. This was the aftermath of Oracle’s 2010 acquisition of Sun Microsystems, which owned Lustre. After the acquisition, Oracle announced that it would cease Lustre 2.x development and put Lustre 1.8 into maintenance-only support. (Oracle did not respond to queries on this front at the time of writing.)

Following the Oracle announcement, Lustre’s development and maintenance were taken up by a group of open source community players. These include OpenSFS (Lustre feature development), Whamcloud (Lustre support and improvements), Xyratex (cluster file system solutions based on Lustre) and EOFS (promotion of Lustre).

“Post Oracle’s acquisition, the community demonstrated really great solidarity and supported us to keep Lustre truly open source,” said Whamcloud CTO Eric Barton. “The formation of community projects like OpenSFS and EOFS in Europe has given great public support to what we are doing. We don’t see any conflict with Oracle as far as Lustre is concerned. Today, the general community feeling is that Lustre is in a strong position and will continue to flourish,” Barton said.

Parallel file systems usually depend heavily on metadata managers; for example, stored data cannot be located without the metadata manager. These block-based file system approaches, however, may create performance bottlenecks for HPC applications, since all data access has to pass through the metadata manager. “The issue with storage systems right now is that they are trying to supply POSIX semantics to applications. That’s a dead loss, since POSIX just gets in the way,” said Eric Barton, chief technology officer at Whamcloud.
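The dependency on the metadata manager can be illustrated with a toy model in which a single manager must resolve the location of every block before a client can touch it. Everything here (the class name, the block-map layout, the file and client counts) is a hypothetical simplification rather than the design of any particular product; it only shows how the manager’s request load grows with clients multiplied by blocks.

```python
from typing import Dict, List, Tuple


class BlockMetadataManager:
    """Hypothetical central manager that owns the block map of every file."""

    def __init__(self, block_map: Dict[Tuple[str, int], Tuple[int, int]]):
        # (file name, block number) -> (device id, block address on device)
        self.block_map = block_map
        self.requests = 0  # lookups this single server has had to answer

    def lookup(self, name: str, block: int) -> Tuple[int, int]:
        self.requests += 1
        return self.block_map[(name, block)]


def read_file(manager: BlockMetadataManager, name: str,
              nblocks: int) -> List[Tuple[int, int]]:
    """Client-side read: every block must be resolved by the manager first."""
    return [manager.lookup(name, b) for b in range(nblocks)]


# Hypothetical file of 1,000 blocks scattered over 8 devices.
nblocks = 1000
mgr = BlockMetadataManager(
    {("data.bin", b): (b % 8, b // 8) for b in range(nblocks)})
# Simulate 16 clients each reading the whole file (sequentially, for simplicity).
for _ in range(16):
    read_file(mgr, "data.bin", nblocks)
print(mgr.requests)  # 16000
```

Even though the data itself is spread across eight devices, the one manager sees its load rise with every client added.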

These limitations can create issues for the HPC community, which is forever in pursuit of higher performance and of processing ever-larger data sets, the latest goal being exascale. When it comes to the future of HPC, it’s critical to keep in mind that application code and other basic HPC building blocks will remain the same. “It’s not possible to tell HPC users that they will have to reinvent the wheel every time a new technology comes in,” said Goldi Misra, group coordinator and head of the HPC solutions group at C-DAC, India’s leading HPC player. “We have to improvise on existing tools and software in our quest for performance; same goes for the file system. So maturing a parallel file system is the only option.”

As storage systems scale, a new parallel approach to managing large amounts of storage capacity is needed. “To do a proper job with terabyte- and petabyte-scale systems, you need a parallel architecture that replaces the old model with a new architecture based on larger, more intelligent building blocks than a simple hard drive,” said Brent Welch, principal architect at HPC storage player Panasas. This is where object-based file systems step into the HPC arena.

Parallel FS gets a rewrite

Although object-based file systems inherit much from their parallel file system predecessors, they function differently. Specifically, object-based file systems rely on dedicated metadata servers to initiate and then monitor file operations, while the storage devices themselves handle the nitty-gritty, such as allocating the actual locations where data is stored.

Once the operation starts, the metadata manager switches to monitoring mode. This eliminates the performance bottleneck because the metadata manager is no longer directly in the data path: for the duration of the operation, all communication takes place between the client and the storage devices. This advantage of object-based file systems also means that data can be reallocated on the storage devices seamlessly. The metadata manager is not involved in such back-end operations, which increases the flexibility and scalability these file systems can offer. Several object-based file systems are available, the most popular being the open source Lustre file system and PanFS, part of the ActiveStor storage solution from Panasas.
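The division of labour described above can be sketched in a few lines. In this toy model, assumed for illustration only, a metadata server hands the client a layout (which devices and object IDs hold the file) once at open time; the client then writes stripe units directly to the object storage devices, each of which allocates its own space. The class names, the object-ID scheme and the stripe size are hypothetical and do not reflect the actual protocols of Lustre or PanFS.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ObjectStorageDevice:
    """Hypothetical OSD: stores objects and manages its own space allocation."""
    objects: Dict[int, bytearray] = field(default_factory=dict)

    def write(self, object_id: int, offset: int, data: bytes) -> None:
        # Space is allocated here, on the device, not by the metadata server.
        obj = self.objects.setdefault(object_id, bytearray())
        if len(obj) < offset + len(data):
            obj.extend(bytes(offset + len(data) - len(obj)))
        obj[offset:offset + len(data)] = data


@dataclass
class MetadataServer:
    """Hypothetical MDS: hands out layouts at open time, never sees file data."""
    num_osds: int
    calls: int = 0
    next_id: int = 0
    layouts: Dict[str, List[Tuple[int, int]]] = field(default_factory=dict)

    def open(self, name: str) -> List[Tuple[int, int]]:
        self.calls += 1
        if name not in self.layouts:
            # Layout: one (OSD index, object id) pair per device the file spans.
            layout = []
            for osd_index in range(self.num_osds):
                layout.append((osd_index, self.next_id))
                self.next_id += 1
            self.layouts[name] = layout
        return self.layouts[name]


def client_write(mds: MetadataServer, osds: List[ObjectStorageDevice],
                 name: str, data: bytes, stripe: int) -> None:
    layout = mds.open(name)  # the only metadata round trip
    for i in range(0, len(data), stripe):
        unit = i // stripe
        osd_index, object_id = layout[unit % len(layout)]
        # Data goes straight from the client to the device holding this unit.
        osds[osd_index].write(object_id, (unit // len(layout)) * stripe,
                              data[i:i + stripe])


osds = [ObjectStorageDevice() for _ in range(4)]
mds = MetadataServer(num_osds=4)
client_write(mds, osds, "results.dat", bytes(range(256)) * 64, stripe=1024)
layout = mds.layouts["results.dat"]
sizes = [len(osds[i].objects[oid]) for i, oid in layout]
print(mds.calls, sizes)  # 1 [4096, 4096, 4096, 4096]
```

However much data the client writes, the metadata server in this model handles a single open call, which is the property that takes it out of the data path.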

Lustre story

Of the two competing object-based file systems, the open source Lustre has the larger mindshare in Indian HPC circles. A case in point is C-DAC, which uses Lustre for its own HPC setups and also counts customer implementations of the object-based file system on its roster. Lustre is predominantly present in Indian research and university application spaces, where it’s the file system of choice.


Anil Patrick R, Chief Editor, TechTarget India

Lustre’s strengths in the Indian HPC space include its scalability, open licensing and high I/O throughput. Lustre 1.x versions were limited to a single metadata server. Lustre 2.x addresses this limitation by allowing multiple metadata servers to be deployed.
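The sketch below is a generic illustration of why more than one metadata server helps, not Lustre’s actual mechanism. It assumes, hypothetically, that each directory is assigned to one of several metadata servers by a stable hash of its path, so that metadata load is divided without any central lookup. The function name, the hashing scheme and the example namespace are all assumptions.

```python
import hashlib
from collections import Counter
from typing import List


def pick_mds(path: str, num_mds: int) -> int:
    """Choose which metadata server owns `path` by hashing its parent directory.

    A stable hash (not Python's per-process randomized hash()) means every
    client computes the same answer without asking a central server.
    """
    parent = path.rsplit("/", 1)[0] or "/"
    digest = hashlib.sha1(parent.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_mds


# Hypothetical namespace: 64 project directories with 100 files each.
paths: List[str] = [f"/scratch/proj{p}/file{f}"
                    for p in range(64) for f in range(100)]

one = Counter(pick_mds(p, 1) for p in paths)   # one server owns all 6,400 entries
four = Counter(pick_mds(p, 4) for p in paths)  # split depends on the hash
print(one)
print(four)
```

Hashing the parent directory rather than the full path keeps all entries of one directory on the same server, so operations such as listing a directory touch a single server; the trade-off is that one very large directory still lands on one server.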

Support-related concerns raised about Lustre in recent years appear to have been resolved, as explained in the sidebar “Grey Clouds in Lustre-scape.” In the recent past, Whamcloud has also forged alliances with leading vendors, including Fujitsu, Dell and NetApp, for Lustre-powered cluster solutions, which seems to augur well for the object-based file system.

The PanFS angle

Unlike the open source Lustre, PanFS has been available from Panasas since 2004 only as part of a packaged blade storage solution (ActiveStor). Although PanFS is architecturally similar to Lustre, Panasas prefers to compare ActiveStor’s performance against solutions from vendors such as EMC Isilon.

PanFS does compete with Lustre in research and university HPC, but Panasas appears to have set its sights on public sector and commercial applications. “Our approach is to take object-storage architecture into areas that use the product to solve common problems in design and discovery, as well as place a value on manageability, high availability and reliability features, not just on performance,” said Welch. “This allows us to easily support demanding big data applications in the bioscience, energy, government, finance and manufacturing as well as other core research and development sectors.” For PanFS, Panasas has partnerships with major cluster compute vendors such as Dell, HP and SGI.

Road Ahead

The future of object-based file systems in HPC looks bright, with HPC percolating into industry verticals beyond academia and research. The increasing availability of commodity HPC clusters and emerging buzzwords such as big data add further impetus to this trend.

Alongside these trends, object-based file systems have evolved with the emergence of newer HPC and storage technologies. Notable developments on this front include multi-core CPUs, GPUs and faster interconnects. These building blocks could form the backbone of HPC in the coming years.

“A tightly coupled solution inherently designed to extract performance at all levels of the architecture is of prime importance,” said Misra. “This includes the file system as well, in which a sea change is imminent. More work has to go in to optimize adaptability and high data throughput of file systems for heterogeneous platforms and hardware [which includes compute and storage].”

