Justin Sheehy (pictured) is the chief technology officer at NoSQL database company Basho, whose European customers include the Danish health service. He directs Basho’s technical strategy, roadmap, and new research into storage and distributed systems.
He has a background in US intelligence community consulting. As principal scientist at Mitre Corporation he managed research projects for that community, including high assurance platforms, automated defensive cyber response and cryptographic protocol analysis.
He has also worked at Akamai Technologies, where he was a senior architect for systems infrastructure.
What follows is an edited version of an interview conducted with Sheehy at Computer Weekly’s office. In it he draws an analogy between the burgeoning NoSQL movement and the explosion of the Cambrian geological period that utterly changed life on Earth.
How do you find the data management space compared with the consulting work you've done for the intelligence community in the US?
Both kinds of work are very interesting from a technology point of view, but the kind of things I need to know to succeed right now don't keep me up at night! It is more about interesting problems regardless of domain.
Tell us about the problem spaces that Basho addresses.
We started in January 2008, with a few of us sitting around at Akamai. A few months before that, the Amazon "Dynamo" paper had been published [Dynamo: Amazon’s Highly Available Key-value Store]. Here was a retailer that had to build its own database. They would rather have not done that, but they had to. So this illuminated a problem space in the market: it suggested that the market was not being served by the existing database vendors.
That was the spark that lit the NoSQL movement. I don't think there is such a thing as a NoSQL category, but there is a NoSQL movement. And it is a movement against the database monoculture that we have inhabited for the past 20 years. It hasn’t always been that way, and it has much more to do with the architecture than with the languages you use.
Riak, our initial product, was written with those same [Amazon] principles in mind. It's all about availability, scalability and predictability – not as features but as foundations. That's led us to build something that is architecturally very different. It led our database to be used for our customers' business-critical purposes.
Why has there been a proliferation of NoSQL database companies? What's your explanation for the conditions of that?
The best analogy is the Cambrian explosion. You get this period of time in the environment where, all of a sudden, you get this burst of diversity. Amazon’s Dynamo paper was one of the triggers for this explosion. There was the monoculture that could be broken up and had been hegemonic for so long. People building software had forgotten that there was more than one way to do a database. People would make choices about the operating system, hardware and programming languages but they would not make an architectural choice about the database. There was a vendor choice, but not an architectural choice.
Now, does that mean that every variant we have now is viable long-term? No, it doesn't. After the Cambrian explosion there was the collapse. But I don't think we’ll go back to one winner. We are still in the middle of the explosion just now. The survivors will be those that fill a space of value in the market.
What makes you think you will survive?
Because we are about critical data and we are about availability. By that I mean the ability to write data, to make changes, not just to read. Also, we are really a distributed systems company. More and more software has to be built as a distributed system now. The vast majority of software in the past ran on a single computer and it is now the other way around. If your software matters, it has to be distributed, it has to run on multiple machines. So delivering robust distributed systems is right in the path of where all software is going right now. Those that don't have that expertise will find themselves in trouble.
Who lies outside that space, in your view?
The database model you get with Oracle, Postgres, and so on, is one that people have been building distributed systems around. However, what that is doing is not really building a distributed database. It is, essentially, putting a distributed system on top. Now, people have built working systems for their businesses that way. That's roughly the same approach you see with MongoDB, with sharding and replication.
With us, all the machines in the cluster are equal. That has a huge impact on both availability and scalability. It’s easier to manage and add machines. Otherwise, you are managing the complexity of master and slave machines, and so you have to think about that. This makes your planning for failure much more fragile. You can do it, but the humans in the system have to work so much harder.
What does your customer base look like in the UK and Europe?
Bet 365 is about to become a customer, as is the local variant of eBay. And we have Rovio, makers of Angry Birds, as a customer also. DVAG was one of our first large customers in Europe; they have tens of thousands of financial advisers using Riak CS, our other product, which provides cloud storage.
But the main thing our customers use Riak for is their critical data. That is not always the biggest data set, but it is the data that, if you cannot manipulate it, your business cannot provide value. The canonical example is the shopping cart on an ecommerce site. If a customer can’t add something to a shopping cart, you are not letting them give you money. Or it could be the session storage on a gaming site: the rest of the site is essentially pointless if you don't have that. Or take Yammer: they are expanding their use of Riak, using it for the notifications component. Another good example is Danish Health, the Danish NHS. They use Riak for their electronic patient record system. It is simply unacceptable for any healthcare professional not to be able to interact with that system.
Some of our customers do put everything they have in Riak. But the biggest customer trend is to have heterogeneity, to have a more polyglot data landscape, especially as their businesses get larger and more complex.