Couchbase CEO: How to understand benchmarks, from BS to belief

This is a guest post for the Computer Weekly Developer Network blog by Bob Wiederhold, CEO of Couchbase.

Couchbase is a company known for its open-source, distributed, document-oriented NoSQL database, which is optimised for interactive applications.

Wiederhold writes in light of recent NoSQL industry benchmarks comparing flagship products, which, it has to be said, have been met with contrasting opinions.


So how do we know what to believe?

What benchmarks should be like

Benchmark tests may raise questions, but it’s essential that each report is open and reproducible, and is not over-engineered to favour one solution over another.

Under these circumstances, competitive benchmarks are designed to provide valuable information to developers and ops engineers who are evaluating various tools and solutions.

More NoSQL usage necessitates more testing

The release of an increasing number of benchmarks isn’t surprising. During the early phases of NoSQL adoption, benchmarks were somewhat less important because most users were experimenting with NoSQL or using it on lightweight applications that operated at small scale.

Since 2013, we’ve entered a different phase of NoSQL adoption, where appetite has grown, and organisations are deploying NoSQL for mission-critical applications operating at significant scale.

The use of benchmarks is increasing because performance at scale is critical for most of these applications.

Developers and ops engineers need to know which products perform best for their specific use cases and workloads.

Different benchmarks: different use cases

It’s entirely legitimate for benchmarks to focus on use cases and workloads that align with the target market and ‘sweet spots’ of the vendor’s products.

(CWDN Ed: this is Wiederhold’s ‘money shot’ killer line playing for validation, isn’t it? The point is (relatively) impartially made and at least he is being candid enough to say it out loud. Doesn’t (quite) make it alright, but nearly. Let’s allow the gentleman to finish…)

That doesn’t make them invalid; it just shows how important it is to state what those use cases and workloads are, so developers and ops engineers can assess whether the benchmark is applicable to their specific situations.

Keeping it fair

To be useful, however, benchmarks need to be fair, transparent and open. Otherwise, they’re of little value to anyone, let alone the developers and engineers who depend on them to make an informed decision.

Vendors may complain that a benchmark isn’t fair because it’s focused on a use case and workload that’s not a sweet spot for them.

Those aren’t valid complaints. On the other hand, benchmarks need to make every effort to achieve an apples-to-apples comparison and, for example, use the most recent software versions.

These comparisons can be difficult, because the architectures and operational setups of the products are so different, but significant effort should be made to achieve them. Using the right version of software should be very easy to achieve, and any benchmark that uses the wrong version should promptly be fixed.

Keeping it transparent

Transparency implies at least two things:

(1) clearly communicating the use cases and workloads that are being measured, and

(2) making the benchmarks open so others can reproduce them, verify the results, and modify them to align more closely with the specific use cases they care about.
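To make those two points concrete, here is a minimal sketch in Python of what a transparent benchmark might look like. It is an illustration only, not any vendor’s actual harness: the `Workload` fields, the `generate_ops` and `run` helpers and the plain in-memory dictionary standing in for a database are all assumptions made for this example. The idea is simply that the workload description travels with the results, and that a fixed seed lets anyone replay the same sequence of operations.

```python
import json
import random
import time
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class Workload:
    """Hypothetical workload description published alongside the results."""
    name: str
    read_ratio: float      # fraction of operations that are reads
    operations: int        # total number of operations to issue
    key_space: int         # number of distinct keys
    seed: int              # fixed seed so the operation sequence is reproducible
    software_version: str  # version under test, stated explicitly


def generate_ops(workload: Workload) -> list:
    """Build the same (kind, key) sequence every time for a given workload."""
    rng = random.Random(workload.seed)
    ops = []
    for _ in range(workload.operations):
        kind = "read" if rng.random() < workload.read_ratio else "write"
        ops.append((kind, rng.randrange(workload.key_space)))
    return ops


def run(workload: Workload, store: dict) -> dict:
    """Replay the workload against `store` (a dict standing in for a database)."""
    ops = generate_ops(workload)
    reads = writes = 0
    start = time.perf_counter()
    for kind, key in ops:
        if kind == "read":
            store.get(key)
            reads += 1
        else:
            store[key] = key
            writes += 1
    elapsed = time.perf_counter() - start
    # The report carries the full workload description, not just the timing.
    return {
        "workload": asdict(workload),
        "reads": reads,
        "writes": writes,
        "elapsed_seconds": elapsed,
    }


if __name__ == "__main__":
    mixed = Workload(
        name="mixed",
        read_ratio=0.5,
        operations=1000,
        key_space=100,
        seed=42,
        software_version="in-memory-dict (placeholder)",
    )
    print(json.dumps(run(mixed, {}), indent=2))
```

Someone who cares about a read-heavy workload could change `read_ratio` and rerun it, which is the kind of modification point (2) asks benchmarks to allow.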

A sign of NoSQL growth and adoption?

Vendors will continue to sponsor and publish benchmarks, and they’ll continue to gear them toward the use cases the vendor supports best.

All of this is just another indicator of the rising importance of NoSQL, which is growing fast. According to a recent report from Allied Market Research, the global NoSQL market is expected to reach $4.2 billion by 2020 – an annual growth rate of 35.1% from 2014 through 2020. When done fairly and transparently, competitive benchmarks can help enterprises choose the right product for their particular set of requirements.

Couchbase is very focused on supporting enterprise-class mission-critical applications that operate at significant scale with mixed read/write workloads. As a result, our benchmarks run on clusters with many servers and reflect those workloads.

We have recently seen some benchmarks focused on supporting applications that operate at much smaller scale, and which were therefore tested with a small amount of data running on a single server.

Both are valid, but for completely different situations and users.