Big data holds out big promises to drive deeper analysis and valuable insights from significantly increased repositories of information.
While that analysis and insight have the potential to deliver real business value and change the way an organisation operates, achieving those goals requires a lens that makes data easy to interrogate and shows the insight clearly.
That is where data visualisation tools come in. These typically offer: the ability to rapidly incorporate new data sets; the removal of restrictive metadata layers; a tool built for business rather than technical users (ie intuitive, with minimal coding required); and high performance.
These characteristics have challenged the more traditional visualisation tools provided by the larger suppliers. As a result, over the past four to five years, niche players have started to gain market share. Their success has arisen from the ability they offer the business to show powerful insight in a matter of weeks.
Because they are flexible, these tools are also now taking the lead in integrating directly with big data platforms such as Hadoop and Cassandra. However, established suppliers are starting to address this gap in the stack with new products, and we expect to see some consolidation of the supplier landscape over the next year or two.
High performance is a particular challenge in big data landscapes. Given the volumes involved, and the query speed where tools integrate through Hive, querying directly against the data inhibits the dynamic capability of the tool.
The key use case for these tools is rapid discovery rather than the creation of standard reports. This means that the data required is transient in nature – required to support a hypothesis and then discarded. The leading practice approach to achieve high performance is to create specific data sets – through map-reduce jobs, for example – and capture these in the memory of the visualisation tools.
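The approach described above can be sketched in a few lines of Python. This is a minimal illustration only, not any particular tool's method: the records in RAW_EVENTS, the field names and the helper functions are all hypothetical, and a plain in-process dictionary stands in for both the map-reduce job and the visualisation tool's memory.

```python
from collections import defaultdict

# Hypothetical raw records standing in for data held on a big data platform.
RAW_EVENTS = [
    {"region": "north", "product": "a", "amount": 10.0},
    {"region": "north", "product": "a", "amount": 5.0},
    {"region": "north", "product": "b", "amount": 2.5},
    {"region": "south", "product": "a", "amount": 7.0},
]


def map_event(event):
    """Map step: emit one (key, value) pair per raw record."""
    return (event["region"], event["product"]), event["amount"]


def reduce_pairs(pairs):
    """Reduce step: sum the values that share a key."""
    totals = defaultdict(float)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def build_extract(events):
    """Build the small, hypothesis-specific data set once, up front."""
    return reduce_pairs(map(map_event, events))


# The extract is held in memory; interactive questions are answered from it
# rather than by querying the raw data again.
EXTRACT = build_extract(RAW_EVENTS)


def total_for_region(extract, region):
    """Answer an interactive question from the in-memory extract only."""
    return sum(value for (r, _product), value in extract.items() if r == region)
```

Once the hypothesis has been tested, the extract can simply be dropped, which matches the transient nature of discovery data.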
A key benefit of visualisation tools is that they change the approach to project delivery. Since they allow value to be visualised rapidly through prototypes, they can prove the value at a low cost point before being incorporated into an industrialised platform. As part of this process, the visualisation tools provide a common language by which IT and the business can communicate. This creates a clearer understanding of requirements and helps set expectations of what can be delivered.
Excel on steroids
While there are many benefits from being able to visualise data rapidly, we also see an inherent danger of creating a version on steroids of the age-old problem of Excel spreadsheet and Access database proliferation.
Consequently, big data visualisation must be underpinned by a robust approach to data governance.
This creates the need for a hybrid environment. In practice that means data is first explored in the big data landscape; then, if and when those explorations reveal value that needs to be reported, the data is promoted into traditional relational databases, whether MPP or in-memory.
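The promotion step can be illustrated with a short Python sketch. It rests on several assumptions: SQLite (from the Python standard library) stands in for the governed relational database, which in practice would be an MPP or in-memory platform, and the table name regional_sales, its columns and the sample figures are hypothetical.

```python
import sqlite3

# Hypothetical result of an exploration that has proved worth reporting.
EXPLORATION_RESULT = {"north": 17.5, "south": 7.0}


def promote(conn, result):
    """Load an exploration result into a governed relational table."""
    # A fixed schema with constraints is where governance starts to apply.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS regional_sales ("
        "region TEXT PRIMARY KEY, total REAL NOT NULL)"
    )
    # Re-running the promotion replaces existing rows instead of duplicating them.
    conn.executemany(
        "INSERT OR REPLACE INTO regional_sales (region, total) VALUES (?, ?)",
        sorted(result.items()),
    )
    conn.commit()


def report(conn):
    """Read the promoted data back, as a standard report would."""
    rows = conn.execute(
        "SELECT region, total FROM regional_sales ORDER BY region"
    )
    return rows.fetchall()


# SQLite in-memory connection used purely as a stand-in for the target database.
conn = sqlite3.connect(":memory:")
promote(conn, EXPLORATION_RESULT)
```

Keeping exploration and promotion as separate steps means only data that has shown value takes on the cost of a fixed schema and controlled loading.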
The final aspect that must not be overlooked is the impact these tools will have on the relationship between business and IT. Visualisation tools will empower the business, enabling rapid insights and the ability to drive higher value out of the data asset. Consequently, IT will need to provide data in a much more agile way.
This creates a dichotomy. On the one hand, information must be provided quickly to drive value outside of the more traditional gating process. On the other, there needs to be rigid governance through more traditional gated waterfall projects once the solution needs to be industrialised.
Failure to achieve the right balance will lead to frustrations and significantly reduced value.
Both groups have their responsibilities: the business to establish the insights and ensure these drive change in the way the organisation operates, and IT to provide a data service with the appropriate level of governance.
Given that this whole area is still evolving, we can expect to see the emergence of greater intelligence in how visualisation tools are able to index results. Those tools will start to predict user data requirements before the user requests them and start to create personalised memory caches, thus helping to address the performance challenge.
Current trends point to the emergence of a self-service analytics environment in which business users can set the parameters of their own investigations of an almost endless source of information, bounded only by the limits of their creativity. But traditional, more structured approaches and robust data governance will always have a vital role to play. They are not an impediment to the use of visualisation and big data – they are very much part of the solution.
About the authors
Nick Millman is EALA lead, digital, data and analytics technology, Accenture
William Gatehouse is senior manager, process and information management, Accenture