Traditional data visualisations are familiar to us all: bar charts, pies, histograms, scatter plots, dials, traffic lights, line graphs, and maps with counties colour-coded for the latest cheery unemployment figures.
They have been the staples of government, science and business since Florence Nightingale invented the precursor of the pie chart. (Yes, that Florence Nightingale: as well as her night job with the lamp, she was a highly competent statistician.) But, just as the world of data analysis is changing dramatically, so is the world of visualisation.
Tip: Think about the class of data you wish to visualise before choosing a tool. Is your data multi-dimensional?
As a general rule, we use visualisations to represent numerical values (which can also be called measures) against one or more dimensions:
In the chart above, the measure is the monthly sales values and time is the dimension by which the sales values are being displayed. In the mapping example mentioned above, the measure is the unemployment figure and the dimension is geography (and could be displayed as counties, postcode areas or arbitrary business-specific regions).
Multidimensional databases allow us to analyse multiple measures by multiple different dimensions; the problem is how we visualise these potentially complex results. The good news is that a range of tools is now available for this kind of visualisation. Searching for ‘visualisation tools’ should get you started.
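To make the measure/dimension distinction concrete, here is a minimal sketch in Python. The sales records, the region names and the `aggregate` helper are all invented for illustration. It sums one measure (sales) first by a single dimension (month) and then by two dimensions (month and region), which is roughly what a multidimensional query does before anything is drawn.

```python
from collections import defaultdict

# Hypothetical records: two dimensions (month, region) and one measure (sales).
records = [
    ("2012-01", "North", 120.0),
    ("2012-01", "South", 80.0),
    ("2012-02", "North", 150.0),
    ("2012-02", "South", 95.0),
    ("2012-02", "North", 30.0),
]


def aggregate(rows, dimension_indexes, measure_index):
    """Sum one measure, grouped by the chosen dimension columns."""
    totals = defaultdict(float)
    for row in rows:
        # The grouping key is the tuple of dimension values for this row.
        key = tuple(row[i] for i in dimension_indexes)
        totals[key] += row[measure_index]
    return dict(totals)


# One dimension: the data behind a simple monthly sales chart.
sales_by_month = aggregate(records, [0], 2)

# Two dimensions: the same measure, now needing colour, panels or
# some other visual channel to show the second dimension.
sales_by_month_and_region = aggregate(records, [0, 1], 2)
```

Each extra dimension adds another element to the grouping key, and that is exactly where the visualisation problem starts: the result no longer fits on two axes.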
Hans Rosling: Data visualisation pioneer
One of the outstanding successes in this field is the work of Hans Rosling. He doesn’t want to sell you anything; he is a professor of global health at Sweden’s Karolinska Institute and is genuinely fascinated by how to help people understand the outcomes of complex analyses, especially in the field of world health.
His site, www.gapminder.org, is well worth looking at for stunning visualisations that use size, colour and movement to communicate results. You can also download his software for free.
Tip: Or maybe your data isn’t multi-dimensional but really, really big?
Visualisation techniques are also moving into a whole new area, that of interpreting big data. Big data is a term for data that cannot easily be turned into tables that can be stored and analysed in a relational database engine. Traces from seismographs and mass spectrometers, X-ray photographs and nuclear magnetic resonance (NMR) images are all examples of big data.
You can, of course, store the date of an NMR image, the patient’s name and so on in a relational database, but the meaning inherent in the image itself cannot be easily tabulated. The NMR image exists as a huge block of data and part of the process of producing a visualisation may simply be to show a picture; but there is so much more we can do.
Imagine a system that scans the image for anomalies and presents just those to the consultant. In other words, the image is de-cluttered: the visualisation process removes all the normal stuff and leaves the anomalies highlighted. The trick, of course, is to write software that can spot anomalies; but modern analytical processes are getting very good at spotting exceptions.
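As a toy illustration of de-cluttering, the sketch below works on a one-dimensional trace rather than an image, and it is not a description of any real medical system. The readings are invented, and the rule used (a modified z-score based on the median, with the commonly quoted 0.6745 constant and 3.5 cut-off) is just one simple way of flagging exceptions, chosen here for brevity.

```python
from statistics import median


def find_anomalies(trace, threshold=3.5):
    """Return (index, value) pairs whose modified z-score exceeds the threshold."""
    centre = median(trace)
    # Median absolute deviation: a spread estimate that outliers barely affect.
    mad = median(abs(x - centre) for x in trace)
    if mad == 0:
        return []  # no measurable spread, so nothing can be ranked as unusual
    return [
        (i, x)
        for i, x in enumerate(trace)
        if 0.6745 * abs(x - centre) / mad > threshold
    ]


# Hypothetical readings: mostly "normal stuff" around 10, with two exceptions.
trace = [10.1, 9.9, 10.0, 10.2, 9.8, 25.0, 10.1, 9.9, 10.0, 3.0]

# Only these points would be highlighted in the de-cluttered view.
anomalies = find_anomalies(trace)
```

A visualisation built on this would draw the whole trace faintly and highlight only the points in `anomalies`. Doing the same for an image is a far harder problem, but the principle of hiding the normal and showing the exceptions is the same.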
Tip: Consider mapping
Many of us love maps for plotting data – and the technique is very commonly used for anything with a geographical dimension. Mapping requires the amalgamation of two sets of data, yours (the postcodes of your customers or whatever) and a monstrously large and complex set of data which comprises the map. Achieving this amalgamation used to be horribly costly: your only option was to buy and run GIS (Geographic Information System) software at great expense.
Happily, several companies now provide a free, highly detailed, zoomable map of the entire world and most allow you some degree of interaction with it. It’s often possible for you to squirt in your data set and see a mapped visualisation.
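The amalgamation step itself is essentially a join between your data and the map's data. The sketch below shows the idea in Python, assuming you have customer postcodes and a lookup from postcode district to a map coordinate. The lookup table, its approximate coordinates, the postcodes and the `to_map_points` helper are all made up for illustration; a real lookup would come from a mapping provider or a published postcode dataset.

```python
from collections import Counter

# Hypothetical lookup: postcode district -> approximate (latitude, longitude).
POSTCODE_CENTROIDS = {
    "DD1": (56.46, -2.97),
    "EH1": (55.95, -3.19),
}

# Hypothetical customer data: full postcodes, one per customer.
customer_postcodes = ["DD1 9XX", "EH1 9XX", "DD1 8XX", "ZZ9 9ZZ"]


def to_map_points(postcodes, centroids):
    """Join customer postcodes to map coordinates, counting customers per district."""
    # The district is the part of the postcode before the space.
    counts = Counter(p.split()[0] for p in postcodes)
    points, unmatched = [], []
    for district, n in sorted(counts.items()):
        if district in centroids:
            lat, lon = centroids[district]
            points.append({"district": district, "lat": lat, "lon": lon, "customers": n})
        else:
            # Keep track of data that could not be placed on the map.
            unmatched.append(district)
    return points, unmatched


points, unmatched = to_map_points(customer_postcodes, POSTCODE_CENTROIDS)
```

The `points` list is the kind of structure a mapping service would typically accept as markers, and `unmatched` is worth checking, since postcodes that fail to join silently vanish from the picture.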
It’s clear that creating good visualisations is as much an art as a science. There are many inspirational examples on the web, visualising everything from the scale of the universe (from quantum foam outwards) to the usage of Boris Bikes in London.
Tip: Put aside some time to explore examples of visualisation
Read the books of Edward R Tufte (Visual Explanations among others) and William Cleveland (Visualising Data). These discuss many ways of representing data and are highly recommended reading.
About the author
Mark Whitehorn works as a consultant for national and international companies. He specialises in the areas of databases, data analysis, data modelling, data warehousing and business intelligence (BI). He also holds the chair of Analytics at the University of Dundee, where he works as an academic researcher and lecturer and runs a Masters programme in Business Intelligence.
This was first published in November 2012