For the department of meteorology at the University of Reading, the move to scale-out network-attached storage (NAS) has brought predictable performance and saved large amounts of maintenance time, compared with the chaotic results of the NAS sprawl that had arisen as the department's storage estate grew.
The department's data comes largely from oceanography, atmosphere and climate models run on high-performance computing (HPC) systems outside the university, and is brought to Reading for further analysis.
The department started out as a Natural Environment Research Council (NERC) research centre at the university, and all its data was stored on separate NAS servers running a variety of operating systems (OSs), including CentOS (a Red Hat Enterprise Linux clone) and SUSE Linux Enterprise Server, with either ext3/4 or XFS file systems beneath them.
By 2010, about 50TB of data was held on the department's NAS boxes and the situation had become very difficult to manage.
“It was all in silos and a complete mix of vendors depending on the best deal we found to hold the data at the time,” said Dan Bretherton, research HPC manager in the department of meteorology at the University of Reading.
At that time, the department started to have big problems with its NAS servers. Some very large data sets, such as output from the NEMO ocean circulation model, would fill entire storage servers. To accommodate such data sets, links were often created between NAS boxes to make them look like a single location.
The result, said Bretherton, was that all six NAS servers became interdependent in a really uncontrolled way. “Downtime scheduled for one server would make data unavailable from another. Performance was very unpredictable too. A processing run that took one hour on one day would take four hours or all day on another,” he said.
“We had to solve the problem,” said Bretherton. “We couldn’t afford a SAN, so had to look at a software solution that could balance across all the NAS boxes. GlusterFS [acquired by Red Hat in 2011 and rebranded Red Hat Storage Server] had all the features that we needed.”
The department deployed GlusterFS/Red Hat Storage Server, a scale-out NAS operating system built on Red Hat Enterprise Linux, on commodity x86-based NAS hardware totalling around 300TB in capacity.
Scale-out, or clustered, NAS is a great improvement on traditional NAS, which is limited by the number of files the OS and file system can support, and physically by the capacity of the NAS box. Scale-out NAS operates in a distributed fashion across many devices and can scale to very large numbers of files, often billions.
Scale-out NAS allows users to build grids of NAS hardware instances with a global namespace, so that the whole collection of devices appears to clients as a single file system.
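To make the idea of a global namespace more concrete, here is a minimal Python sketch, assuming a simple hash-of-path placement rule: each file's path is hashed to choose which NAS server holds it, so any client can locate a file without consulting a central directory, while a listing merges every server's contents into one tree. The server names and the `place` and `GlobalNamespace` helpers are hypothetical illustrations; GlusterFS's own distribution logic is more elaborate and is not reproduced here.

```python
import hashlib

# Hypothetical server names, for illustration only.
SERVERS = ["nas1", "nas2", "nas3", "nas4", "nas5", "nas6"]


def place(path: str, servers: list[str]) -> str:
    """Choose the server that stores `path` by hashing the path name.

    MD5 is used only because it is deterministic across processes
    (Python's built-in hash() of str is randomised per run), so every
    client computes the same answer without a central lookup table.
    """
    digest = hashlib.md5(path.encode("utf-8")).digest()
    index = int.from_bytes(digest[:4], "big") % len(servers)
    return servers[index]


class GlobalNamespace:
    """Toy in-memory model: many servers presented as one file tree."""

    def __init__(self, servers: list[str]):
        self.servers = servers
        # One dict per server, mapping path -> file contents.
        self.stores = {s: {} for s in servers}

    def write(self, path: str, data: bytes) -> None:
        self.stores[place(path, self.servers)][path] = data

    def read(self, path: str) -> bytes:
        # The reader recomputes the same placement, so no search is needed.
        return self.stores[place(path, self.servers)][path]

    def listing(self) -> list[str]:
        # Merge every server's files into a single sorted view.
        return sorted(p for store in self.stores.values() for p in store)


ns = GlobalNamespace(SERVERS)
for month in range(1, 13):
    ns.write(f"/nemo/run1/month{month:02d}.nc", b"placeholder")
print(len(ns.listing()))  # 12
```

Hashing the path instead of keeping a lookup table avoids a single metadata server, but a plain modulo rule like this one would relocate most files whenever a server is added, which is one reason production systems use more careful placement schemes.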
Red Hat Storage Server runs on top of a local file system on each NAS instance. XFS is the preferred file system, and Bretherton is currently moving his department's data to it from ext3/4.
A benefit for Bretherton is that if anything goes wrong with the Red Hat cluster, data is still accessible from the native file system on the NAS device.
The department initially deployed the community-supported GlusterFS, but according to Bretherton it was “taking a lot of time, with community support that was a bit hit and miss”.
The benefits of Red Hat Storage Server, said Bretherton, are: “Compared to the community version we spent a lot less time maintaining it. The benefits compared to the pre-Gluster days are that it is much more predictable, with uniform performance. We can take servers out for maintenance without users knowing about it.
“I used to spend 40% of my time firefighting. Now that’s more like 15%, and I get time to spend on other things while Red Hat ticks over,” he added.
Bretherton also considered using IBM’s General Parallel File System (GPFS) as it is in use at two of the large NERC centres.
“But, even with support from IBM the NERC centres have had to put in a lot of effort to make GPFS work effectively, and the potential for getting it horribly wrong is very real,” said Bretherton. “[The open source storage platform] Ceph would probably fall into that category as well,” he added.
Other cluster file system options, GFS (Global File System) and OCFS (Oracle Cluster File System), were rejected by Bretherton because they lacked the load balancing and high-availability (HA) features that GlusterFS had.
“Although it is possible to combine them with replication solutions like [Linux replication method] DRBD (Distributed Replicated Block Device) to provide HA, that doesn’t sound easy. HA works straight out of the box with GlusterFS and Red Hat Storage,” he said.