Cloud native series: DataStax on how cloud apps deal with data

This is a guest post for the Computer Weekly Developer Network written by Patrick McFadin in his role as VP of developer relations at DataStax.

DataStax Enterprise is an always-on data platform powered by a distribution of Apache Cassandra, an open source distributed database management system designed to handle large amounts of data across many commodity servers.

McFadin writes as follows:

If you are building an application today, you’re probably thinking about scale. Over time, web-based applications have gone from replicating traditional software designs to building their own best practices.

Salesforce pioneered Software-as-a-Service as a delivery model in 1999. Since then, services like Amazon Web Services have launched and mobile-first applications on iPhone and Android phones have become common. Over the same period, the role of cloud in application design has steadily grown.

The data deal is a big deal

But what really sets a cloud-native app apart from those traditional applications is how it deals with data.

Traditional applications kept data centralised in one place and served it to everyone who asked for it. Cloud applications now demand that data be spread across multiple locations.

When you have users around the world, hosting all your application data in one place is problematic.

A request can take an inordinate amount of time when everything is held on the other side of the world. Even at the speed of light, a round trip across the globe adds noticeable delay. Shifting data closer to users reduces this delay, and keeping copies in several places also helps to prevent data loss and outages.
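Here is a rough back-of-the-envelope illustration of why distance matters. Assume that light in optical fibre travels at about two-thirds of its vacuum speed, roughly 200,000 km/s. Take a user in London reaching a server in Sydney, about 17,000 km away as the crow flies. Both figures are illustrative assumptions, and real cable routes are longer. The minimum network round trip is then:

```latex
t_{\text{round trip}} \geq \frac{2d}{v}
  = \frac{2 \times 17{,}000\ \text{km}}{200{,}000\ \text{km/s}}
  = 0.17\ \text{s} = 170\ \text{ms}
```

That is only a floor. It comes before any processing or queuing, and before the extra round trips needed to set up a connection. A page that needs several sequential round trips multiplies it.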

What is cloud scale?

Working at cloud scale means dealing with hundreds of thousands or millions of users, all creating data all the time. Keeping all of that data in a single relational database is difficult. Once the data outgrows one server, it has to be sharded across multiple locations, all of which keep filling up rapidly.

So applications running in the cloud need a new approach to data. What elements should we be looking at? From a cloud application perspective, how data can be distributed consistently should be the first point of interest. Can data be stored in multiple locations, with copies of each record across those locations? Without this, it will be difficult to scale out successfully.
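To make "copies of each record across those locations" concrete, here is a minimal sketch of a token ring. It is loosely modelled on how Apache Cassandra's SimpleStrategy places replicas. Each key is hashed to a position on the ring, and the first node at or after that position holds the first copy. The next nodes clockwise hold the remaining copies.

This is a simplified illustration, not Cassandra's implementation:

- `TokenRing`, `token` and the node names are hypothetical.
- MD5 stands in for the database's real partitioner.
- Each node owns a single token.
- Data-centre and rack awareness are ignored.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a position on the ring (MD5 stands in for a real partitioner)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class TokenRing:
    """Hypothetical, simplified ring in which each node owns exactly one token."""

    def __init__(self, nodes: list[str]) -> None:
        # Sort nodes by their token so the ring can be walked clockwise.
        self._ring = sorted((token(n), n) for n in nodes)
        self._tokens = [t for t, _ in self._ring]

    def replicas(self, key: str, replication_factor: int) -> list[str]:
        """Return the distinct nodes that should hold copies of `key`."""
        if replication_factor > len(self._ring):
            raise ValueError("replication factor exceeds number of nodes")
        # The first node whose token is >= the key's token owns the key;
        # the modulo wraps past the highest token back to the start of the ring.
        start = bisect.bisect_left(self._tokens, token(key)) % len(self._ring)
        # Additional copies go to the next nodes clockwise around the ring.
        return [
            self._ring[(start + i) % len(self._ring)][1]
            for i in range(replication_factor)
        ]


ring = TokenRing(["node-a", "node-b", "node-c", "node-d", "node-e"])
print(ring.replicas("user:42", replication_factor=3))
```

Every client that hashes the same key computes the same replica list, so no central lookup is needed to find where a record lives. A production system would also spread copies across racks and data centres.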

Closely linked to this distribution of data is the need for applications to be 'always on'. That is, they should run all the time and avoid downtime.

Spreading data across multiple sites should provide protection against failure, and it should also mean that services can keep running during updates or patching. For teams looking at how cloud can make applications run more efficiently, this ability to keep running through updates should be a huge bonus.
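One common way to reason about staying up during patching is majority quorums, which Cassandra's QUORUM consistency level uses. With a replication factor RF, a quorum is floor(RF / 2) + 1 replicas. The sketch below checks whether a read or write at quorum can still succeed while some replicas of a record are offline for patching. The function names are hypothetical, and the check assumes the worst case, where every node being patched holds a copy of that record.

```python
def quorum(replication_factor: int) -> int:
    """Majority of replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def rolling_patch_keeps_quorum(replication_factor: int, nodes_down_at_once: int) -> bool:
    """True if a quorum of replicas is still reachable when `nodes_down_at_once`
    replicas of the same record are offline (worst case for a rolling patch)."""
    return replication_factor - nodes_down_at_once >= quorum(replication_factor)


for rf in (1, 3, 5):
    print(f"RF={rf}: quorum={quorum(rf)}, "
          f"one node down ok={rolling_patch_keeps_quorum(rf, 1)}")
```

With a replication factor of 3, patching one node at a time leaves two of the three copies, which is still a quorum. With a single copy, any downtime makes that data unavailable.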

Service matters

Alongside running the data storage side more efficiently, cloud applications should provide better service to users. Helping apps run in real time is a critical area to look at: decisions get made while customers are using the app, rather than after they have finished interacting with it. When customers can make decisions using better data, they get a better experience.

However, this is not just about making a recommendation for a product after the fact; you have to think about how that interaction can take place while someone is making their choice. Using data in real time like this affects a lot of other business decisions, not just software development ones.

Providing a better user experience through contextual use of data should also be considered. For example, while someone is in the store using a retailer's app, the retailer can compare their shopping list with their past purchase behaviour and provide special offers based on it. A user experience that changes based on customer preferences improves utilisation of the app.

Scalability matters

Lastly, scalability should be a given for cloud applications. A single mention of a new application on social media can drive downloads far beyond what was expected.

In cases like these, growing the back-end infrastructure is difficult unless the right choices were made in advance. Scaling up should not incur huge capital costs, so choosing the right software infrastructure is essential.

For cloud applications, managing data is still one of the biggest problems to solve. Without careful planning, cloud applications can deliver a poor user experience when services have to scale. Distributed data architectures can help, but they have their own nuances to discover and bear in mind too.

Follow Patrick McFadin @PatrickMcFadin on Twitter.