Clearing blurred lines around real-time analytics

This is a guest blog post by Martin Willcox, director of big data, Teradata

Today’s increasingly crowded and competitive business environment has led to an explosion in the popularity of real-time and near real-time systems, with enterprises across the globe devoting more and more attention to them.

Despite this well-deserved attention, many businesses would be hard-pressed to define what truly constitutes “real-time.”

When a merchandiser at a big-box retailer talks about “real-time analytics”, for example, he may actually want a sales dashboard that is updated several times a day.

But when a marketing manager at a mobile telco talks about real-time analytics, she may want the capability to automatically send offers to customers within seconds of them tripping a geo-fence. 

And her friend in capital markets trading may have expectations of “real-time” systems that are measured in microseconds. 

Since appropriate solutions to these different problems typically require very different architectures, technologies and implementation patterns, knowing which “real-time” we are dealing with really matters.

Before you get started, pause to consider

Real-time systems are often about detecting an event – and then making a smart decision about how to react to it.  The Observe-Orient-Decide-Act or “OODA loop” gives us a useful way to model the decision-making process.  So what can a business leader do to minimise confusion when engaging with IT at the start of a real-time project?

1.    Understand how we will detect the event that we wish to respond to. Sometimes this is trivial.  Other times it is rather tougher – especially if the “event” we care about is the absence of something that should have happened, or the conjunction of multiple events from across the business.

2.    Clarify who will be making the decision – man, or machine?  The Mark 1 eyeball has powers of discretion that machines sometimes lack. But its carbon-based owner is not only much slower than a silicon-based system, but is also only able to make decisions one-at-a-time, one-after-another.  If we choose to put a human in the loop, we are normally in “please-update-my-dashboard-faster-and-more-often” territory.

3.    Being clear about decision latency is also important – how soon after a business event do we need to take a decision?  And implement it?  We will also need to understand whether decision latency and data latency are the same. Sometimes I can make a good decision now on the basis of older data. But sometimes I need the latest, greatest and most up-to-date information to make the right choices.

4.    Balance the often competing requirements of decision sophistication and data availability. Do we need to leverage more – and potentially older – data to take a good decision? Or can we make a “good enough” decision with less data?
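The detection question in point 1 is worth making concrete. The two "tougher" cases – an expected event that never arrives, and a composite event spread across the business – can both be sketched as simple window checks. This is a minimal illustration, not a production pattern; the event shapes and type names are assumptions:

```python
from datetime import datetime, timedelta

def detect_missing_event(events, expected_type, window):
    """Flag the *absence* of an event: something that should have
    happened within the window (e.g. a nightly feed) but did not."""
    cutoff = datetime.now() - window
    recent = [e for e in events
              if e["type"] == expected_type and e["ts"] >= cutoff]
    return len(recent) == 0  # True means the expected event never arrived

def detect_conjunction(events, required_types, window):
    """Flag a composite event: several events from across the
    business all occurring within the same window."""
    cutoff = datetime.now() - window
    seen = {e["type"] for e in events if e["ts"] >= cutoff}
    return required_types.issubset(seen)
```

In a real streaming system these checks would run continuously over a moving window; the point here is only that "nothing happened" and "several things happened together" both need explicit detection logic, where a single event does not.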

Can you have your cake and eat it?

Consider this – I want to send you an offer in near real-time when you are within half a mile of a particular store or outlet.  I can do so solely on the basis of the fact that you have tripped a geo-fence – which means that the only information I need is your location: where you are right now.

But what if I want first to understand whether I have made the same offer to you before, how you did or didn’t respond, which offers other customers who have previously exhibited similar behaviours to yours have or haven’t responded to in the last six months, etc…? Then I need also to access other data in addition to your current location that may be stored elsewhere, outwith the streaming system. 

Here, the cost of choosing to give you a more sophisticated and personalised offer is the time it takes to fetch and process that data, so “good” may be the enemy of “fast”. We might need to choose between “OK right now” and “great a little later”.  That trade-off normally depends heavily on use-case, channel and application.
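The two decision styles above can be sketched side by side: a geo-fence check that needs nothing but current location, and a personalised decision that also needs history fetched from outside the streaming system. This is a toy sketch under stated assumptions – the half-mile radius comes from the example, but the offer logic and history shape are invented for illustration:

```python
import math

def within_geofence(lat, lon, fence_lat, fence_lon, radius_miles=0.5):
    """Great-circle (haversine) distance check against a geo-fence."""
    r = 3958.8  # Earth's mean radius in miles
    p1, p2 = math.radians(lat), math.radians(fence_lat)
    dphi = math.radians(fence_lat - lat)
    dlmb = math.radians(fence_lon - lon)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a)) <= radius_miles

def choose_offer(location, fence, history=None):
    """Location alone is enough for the fast, generic offer; the
    personalised offer needs prior-response history that lives
    outside the streaming system and costs extra time to fetch."""
    if not within_geofence(*location, *fence):
        return None
    if history is None:  # "OK right now": decide on location only
        return "generic offer"
    declined = sum(1 for h in history if not h["responded"])
    if declined >= 2:
        return None  # don't repeat an offer they keep ignoring
    return "personalised offer"
```

The fast path makes a decision from a single field; the "great a little later" path pays for a lookup before it can decide – which is exactly the latency cost the paragraph above describes.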

Playing the game

Of course, I can try to game the system – by pre-computing next-best actions for a variety of different scenarios.  This way, I can try to be both fast and good, merely fetching the result of a complex calculation made with lots of data in response to an event that I have just detected, instead of actually getting the underlying data and running the numbers.

But then the price I pay is reduced flexibility and increased complexity.  And by definition, decision latency and data latency are different when we “cheat” like this, because I am making the decision based on data from our previous interactions, not the latest data.
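The "cheat" amounts to splitting the work in two: an expensive batch job that runs the numbers ahead of time, and a cheap lookup at event time. A minimal sketch, with an invented scoring function standing in for the complex calculation:

```python
def batch_precompute(customers, score_offer):
    """Offline, periodic job: run the expensive, data-heavy scoring
    once per customer and stash the winning next-best action."""
    return {c["id"]: max(c["candidate_offers"], key=score_offer)
            for c in customers}

def on_event(customer_id, precomputed, default="generic offer"):
    """Event time: no model run, no data fetch -- just a lookup.
    Fast and 'good', but only as fresh as the last batch run."""
    return precomputed.get(customer_id, default)
```

The lookup is fast precisely because the decision was already made; the staleness between batch runs is the gap between data latency and decision latency described above.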

There are different costs and benefits associated with all these options. There is no wrong answer – they are all more or less appropriate in different scenarios.  But make sure that you understand your requirements before IT starts evaluating streaming and in-memory technologies.