Web hosting provider Oxeva has delivered big data analytics without a big price tag, and it has done so with Pure Storage FlashBlade all-flash NAS hardware that wasn’t designed with such workloads in mind.
The move has allowed the Paris-based service provider to offer big data products for €150,000, where the consultants' entry-level prices would have run into the millions.
“FlashBlade is NAS and shares files via NFS to servers,” says Gabriel Barazer, technical director and co-founder at Oxeva.
“That’s not the classic model for big data infrastructure, where data is stored directly on servers via something like HDFS [Hadoop Distributed File System]. But it has to be said that the reference infrastructure for big data has been a failure.”
Oxeva has been in the hosting provider business for 15 years. It targets customers that use a standard solution set and offers to take charge of all system administration. So, for example, at the outset that might mean maintenance of the customer’s physical web servers, but over time that would likely evolve towards running them on Oxeva’s own virtual servers.
“When an organisation comes to us we estimate the resources needed in terms of compute, storage and bandwidth by running some performance tests,” says Barazer. “Then we put it into production and make sure everything works.”
Big data architectures not suited to needs
Demand for big data hosting systems started coming through in 2017, but the specifications customers came with were usually very unrealistic.
“Enterprises came to us with preconceived ideas that were totally at odds with what we knew. Our approach was all about increasing reliability by storing data outside the servers,” says Barazer.
“The model we were being presented with involved putting all the disks inside the servers to maximise performance, and even increasing the number of servers to hold at least three copies of the data. This architecture model had been popularised since 2015 by the consultants, who were inspired by the experiments carried out by the web giants since 2011.
“But these architectures – with tens of thousands of drives in thousands of servers – were not suited to enterprises, which had need of, at best, 50TB to carry out their projects.”
Oxeva prides itself on operating at a human scale, with a policy of investment in development of bespoke systems that often use open source software. Among the technological principles it holds to are the separation of servers and storage.
“To improve the reliability of our services, we developed a system that allows us to boot physical servers without use of internal disk, instead using a preconfigured boot image accessed via the network and residing on shared storage,” says Barazer.
“That makes deployment of new machines possible in a matter of seconds in case of failure or scaling up. It’s this added value that underpins our revenues.”
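Oxeva does not detail its boot mechanism, but a diskless network boot of this kind can be sketched with iPXE chain-loading a kernel that mounts its root filesystem over NFS. The hostnames, addresses and paths below are illustrative assumptions, not Oxeva's actual configuration.

```
#!ipxe
# Illustrative diskless boot: the server gets an address via DHCP, pulls a
# kernel and initrd from a boot server, and mounts its root filesystem from
# a preconfigured image on shared NFS storage (all names are hypothetical).
dhcp
kernel http://boot.example.internal/vmlinuz root=/dev/nfs nfsroot=10.0.0.10:/images/webserver-gold ip=dhcp ro
initrd http://boot.example.internal/initrd.img
boot
```

Because the image lives on central storage rather than a local disk, a replacement server can be pointed at the same image and booted in seconds, which matches the rapid redeployment Barazer describes.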
The key idea, according to Barazer, is to allow the independent scaling of compute and storage.
“From experience, we reckon that the amount of compute doesn’t vary during a project, because the applications are designed from the outset to run on a finite number of nodes,” he says. “But, it is impossible to know at the start of a project how much space data will take up.”
FlashBlade arrays replace server disk
Faced with big data’s technical requirements, the Oxeva team didn’t feel brave enough to build an external storage system of the required performance. The result was that in the first half of 2017 something of an impasse set in when it came to development of the company’s big data hosting offer.
Then, in summer everything changed. “I went to meet Pure Storage. Without a great deal of conviction, but just to keep up with things on the technical side,” says Barazer.
“They talked to me about FlashBlade, which is an all-flash array that distributes its storage via multiple networked connections without a single point of contention. That makes it very quick, and file-access mode is just what’s required by big data but without the need to put storage in the same node as compute. In other words, they had already created what [we] wanted.”
Doing Hadoop with NAS
The challenge was that FlashBlade wasn’t sold with big data in mind. It was aimed at NAS use cases over NFS, but with SAN-like performance for critical applications that work in file mode, such as high-performance computing (HPC) and video.
For big data systems at the time – which meant some variation of Hadoop like Cloudera or Hortonworks – access was usually via the HDFS file system.
“HDFS is important when each node in a cluster has to look for its data in another node. But Hadoop deployments are also designed to work with files on local disks,” says Barazer.
“We didn’t want to put disks into the servers, but sharing the NAS to the servers over NFS makes the data look like locally accessible files.”
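In Hadoop terms, this approach amounts to pointing the default filesystem at the local (POSIX) scheme rather than HDFS, so that jobs read and write files under the NFS mount. A minimal sketch of the relevant `core-site.xml` setting follows; the mount point `/mnt/flashblade` is a hypothetical example, not a documented Oxeva path.

```xml
<!-- core-site.xml: use the local filesystem instead of HDFS.
     Jobs then address data under the NFS mount,
     e.g. file:///mnt/flashblade/dataset (mount point is illustrative). -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>file:///</value>
  </property>
</configuration>
```

With this setting, every worker node that mounts the same NFS export sees the same files at the same paths, which is what lets Hadoop treat shared NAS storage as if it were local disk.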
A key detail, however, is that Hadoop was designed to access only a portion of the data at a time; the more data a job accesses, the more performance degrades.
Better than the consultants thought
Between October 2017 and February 2018, Oxeva deployed two Pure FlashBlade chassis with eight nodes of 17TB each, one to run project models and the other for production.
“We were very impressed by the speed,” says Barazer. “Each node delivered between 4GBps and 5GBps to hosts, where previously we’d stalled at about 100MBps per server. We executed some test workloads under TeraSort [the Hadoop sorting benchmark] and got results in the thousands of IOPS [input/output operations per second], and that was better than platforms designed for big data.”
Oxeva’s first big data customer signed at the beginning of 2018. It was an immediate success. “Not only did the platform work exactly as expected, but it wasn’t expensive either. And above all, working with us allowed the customer to dispense with the services of the consultant they had previously engaged,” says Barazer.
That avoided a lot of rancour. According to Barazer, big data projects between 2015 and 2017 often headed towards dramatic conclusions: after clients had invested heavily in infrastructure, consultants sold them services without transferring skills at the end of the project.
Machines that worked by themselves
As well as hosting and system administration, Oxeva offers complementary services such as backup.
“For now, our activity around big data is still paying off the purchase of the FlashBlade arrays. We don’t consider that a slow start, however. Big data projects are always lengthy to put in place, but I think it’ll bring good returns from this year on,” says Barazer, adding that there have been no issues with the FlashBlade arrays in two years.
“These are machines that work all on their own. Pure Storage carries out updates remotely. And when we have had an incident related to an update, Pure’s team has immediately known what to do. We haven’t even had an interruption of 30 seconds. It’s a vendor that provides confidence, and on a human scale.”
Oxeva plans to add blades to its FlashBlade arrays as its customers’ storage needs gradually increase.
“We are relaxed about it. We don’t anticipate any urgent need to increase capacity. FlashBlade compresses its contents in real time, so customer demand grows slowly,” says Barazer.
“In the end, the practical benefit of FlashBlade is that we have avoided having to invest in new R&D [research and development] storage to develop our big data expertise.”