Open source storage hits the mainstream

Open source storage has gained mainstream acceptance in high-performance computing, analytics, object storage, cloud (OpenStack) and NAS use, but can it crack the enterprise?

Free and open source software is well accepted in today’s datacentres on the server side, to the extent that even Microsoft’s Azure runs a proportion of Linux servers, and tools such as KVM are popular alternatives to VMware for hyper-converged platforms.

Now open source storage has begun to catch up. Open source file systems, for example, have become the default for high-performance computing (HPC) and are used in other mainstream applications too.

Some are clustered or parallel file systems, which are designed to be spread or pooled across multiple storage nodes for redundancy and performance, and then mounted on multiple servers. The best-known example is Lustre.

Others are network file systems, such as Red Hat’s GlusterFS, and then there is the Sun/Oracle-derived file system and volume manager ZFS, which is widely implemented and underpins many other open source projects.

In addition, open source storage has made inroads into object storage with Ceph (also Red Hat-owned) and OpenStack Swift, as well as in big data and analytics, where Apache’s HDFS (Hadoop Distributed File System) is a leader. 

Too much DIY

The factor that previously slowed the acceptance of open source storage was that it involved so much more DIY.

An open source operating system will be designed to run on a range of hardware. Download the standard distribution package and install it on pretty much any mainstream commodity desktop or server, and it will discover the hardware and run.

Sure, you might – just might – find a Linux distribution that doesn’t support your age-old graphics card or disk controller, but in general it will work.

Open source storage isn’t so easy. For a start, most users will already have storage that they know and trust and, more importantly, which already has their data on it. Switching storage platforms isn’t quite as simple as moving applications or virtual machines to a new server.

Tuning the storage system

There is also a lot of variation in the underlying hardware and the application’s performance requirements, so even where a standard storage software distribution could work – for example, if you buy commodity storage, which is essentially an x64 server with a shedload of disk or flash attached – there can be a fair amount of work required to tune the resulting storage system and customise it to specific needs and workloads.

That is changing though, and fast. More freely downloadable packages have appeared, albeit mainly aimed at trial users. And more companies offer services to help with hardware/software integration and tuning, and then to provide the sort of enterprise-grade support that we have come to expect from the one-stop-shop storage vendors.

One example is Red Hat, which sells services for Ceph and GlusterFS, branding the latter as Red Hat Storage Server.

An even bigger boost for enterprise respectability comes from Intel, which combines the Lustre parallel file system with technical services and support to create commercial packages such as Intel Foundation Edition for Lustre (IFEL).

“Enterprises seem to be picking up IFEL because it’s packaged,” says Laura Shepard, director of HPC markets at DDN, which builds high-performance storage appliances around Lustre and IBM’s proprietary GPFS file system (now Spectrum Scale), adding tools and a management layer.

Paradoxically, free and open source storage first saw its greatest acceptance at the two opposite ends of the scale. At the high end are the HPC systems, but at the opposite extreme, some small and mid-sized businesses have used software such as FreeNAS, NAS4Free (both ZFS-based) or Openfiler to turn a suitable server into a NAS or block storage appliance.

Free software

Other options potentially suitable for small and medium-sized businesses come from companies that distribute free versions of their software, although these sometimes come with limitations.

For example, the community edition of Nexenta Systems’ ZFS-based NexentaStor NAS server is limited to 18TB and lacks enterprise features such as replication and auto-sync.

Shepard says one reason for the acceptance of open source storage at the high end is that the sort of organisations that demand the highest possible storage performance for their HPC work, such as government-funded national laboratories, are also the kind of organisations that have highly skilled academics on hand who can build and maintain such systems using open source.

Meanwhile, the big private sector users of HPC – she suggests there could be as few as 10 petascale commercial computing systems in the world, all of them in the oil and gas industry – are similarly structured.

“At that scale they start to look like the national labs, so they have teams who can support large Lustre implementations,” she adds.

Indeed, the very openness of open source storage makes it easy for them to adapt and tune to their specific needs.

Just as important, using open source storage helps attract the sort of staff that these projects need.

“With open source there’s a sense of community, which means you can ask questions and you have a voice. You aren’t just fed things by a vendor,” Shepard says. “You have people moving around and you need those people who would otherwise be working on data science in a national lab, say.”

On the debit side, these are massive development projects, warns Giorgio Regni, founder and CTO at Scality, which has open sourced its S3 Server but not its core object storage technology.

“It’s a big, complex distributed system, not something relatively easy like a web server or backup server,” he says. 

“We need hundreds of servers just to test our code. Ceph is the same. It takes six months just to understand the code before you can start contributing, and then your contributions need to be reviewed and tested.”

Growth in open source use

However, as the DIY challenge diminishes and the availability of commercial-quality support grows, the deliberate use of open source software for storage is moving beyond the tech-savvy hinterlands at the far ends of the storage scale.

Even at the lower end, open source storage is no longer confined to the gap between simple home NAS servers and sophisticated enterprise-grade storage from the major hardware suppliers, which most of the earlier free and open source storage projects aimed to fill.

Several things have catalysed this, say experts in the field. One is that greater acceptance and use of open source storage at the infrastructure level is feeding through into other parts of the datacentre. Another is that, alongside the commercial organisations offering the technical services needed to implement open source storage at the enterprise level, there are companies using it to build off-the-shelf storage appliances.

“It’s all in tuning the stack and making it turnkey,” says Eren Niazi, who founded his company Open Source Storage to build complete systems based on commodity hardware and an open source framework that includes ZFS.

“People want simplicity, something that can be up and running in 15 minutes,” he adds. “We include a three-year support contract too – open source users do need support. There’s questions at first, then six months later there’s packages to update and security to consider, and then as the storage expands you need to add more hardware.”

Add to this the almost accidental use of open source storage software, packaged within appliances or other open source platforms. Add, too, newer open source storage projects such as Ceph and OpenStack Swift, both of which are object storage platforms aimed squarely at the enterprise. All the indications are that open source storage is not just ready for the mainstream – it is already there.

