Bad data visualisations threaten to spoil the promise of the new data journalism, Aron Pilhofer, head of the interactive news team at The New York Times, told delegates at the London Strata conference this week.
The role of data scientists and of data science was a strong theme at the event, with spokespeople from The Guardian, The Economist and Bloomberg invoking data’s ongoing transformation of the working lives of journalists.
But could data-rich story-writing be automated? Business IT journalists used to writing about the automation of other processes might reflect on the possible computerisation of their own craft.
Kris Hammond, chief technology officer at Narrative Science and professor of computer science and journalism at Northwestern University, described how his company automatically turns data into stories for Forbes.com.
“Forbes already had a product – earnings stories – and wanted to scale up,” he said.
The company’s technology starts with numerical data, then adds an “angles” stage, configured by journalists, and a “structure” stage, before rendering the story in a natural language, usually English. The automatically generated earnings stories perform well on Google: four of the top 10 results is not unusual, said Hammond.
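As a rough illustration of the three stages Hammond described, the toy Python sketch below picks an angle from the numbers, looks up a structure for that angle, and then fills in English templates. It is not Narrative Science's method: the `Earnings` record, the `pick_angle` threshold, the `OUTLINES` and `TEMPLATES` tables and the sample company are all hypothetical, invented for this example.

```python
from dataclasses import dataclass, asdict


@dataclass
class Earnings:
    """Hypothetical input record for one earnings announcement."""
    company: str
    quarter: str
    eps: float           # reported earnings per share
    eps_expected: float  # analysts' forecast earnings per share


def pick_angle(e: Earnings, tolerance: float = 0.01) -> str:
    """'Angles' stage: decide what the story is about (assumed rule)."""
    diff = e.eps - e.eps_expected
    if diff > tolerance:
        return "beat"
    if diff < -tolerance:
        return "miss"
    return "in_line"


# 'Structure' stage: the ordered parts of the story for each angle.
OUTLINES = {
    "beat": ["headline_beat", "figures"],
    "miss": ["headline_miss", "figures"],
    "in_line": ["headline_in_line", "figures"],
}

# 'Language' stage: English templates for each part of the story.
TEMPLATES = {
    "headline_beat": "{company} beat expectations in {quarter}.",
    "headline_miss": "{company} fell short of expectations in {quarter}.",
    "headline_in_line": "{company} met expectations in {quarter}.",
    "figures": (
        "Earnings per share were {eps:.2f}, "
        "against a forecast of {eps_expected:.2f}."
    ),
}


def write_story(e: Earnings) -> str:
    """Run the three stages in order and return the finished text."""
    angle = pick_angle(e)
    outline = OUTLINES[angle]
    fields = asdict(e)
    return " ".join(TEMPLATES[part].format(**fields) for part in outline)


if __name__ == "__main__":
    print(write_story(Earnings("Acme Corp", "Q2", 1.20, 1.05)))
```

In a sketch like this, the part a journalist would configure is the angle rule and the templates; the numbers simply flow through.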
Opportunities for programmer-journalists
Pilhofer used the panel discussion to warn of a rash of bad data visualisations plaguing newsrooms, made possible by easy-to-use tools. The same discussion made clear the opportunities for new programmer-journalists, partly because of a lack of basic data literacy among existing journalists.
Nicolas Kayser-Bril, who trains journalists in France to work with data for Journalism++, lamented an over-concentration on “fancy bubbles” and a lack of rigour in understanding statistics.
For both panellists, little data, well understood, would do the trick.
Economic value of big data
Kenneth Cukier, data editor at The Economist, urged delegates to register the huge and genuine implications of big data. It really can be a new form of economic value, he said, for example where prices vary widely and a mass of data can be scraped from the internet.
Farecast, an airfare prediction company, founded in 2003 and acquired by Microsoft in 2008, was his example. Data cast aside can also be put to work, as it has been in the field of neo-natal care at the Hospital for Sick Children (SickKids) in Toronto, Canada.
Simon Rogers, who edits The Guardian's Datablog and Datastore, told attendees about his team’s recent work, from representing medals data at the Olympics, through mapping out sponsorship of Academy schools, to tabulating Dr Who villains. The Guardian uses Tableau and Google Fusion Tables, and is a heavy user of Google spreadsheets.
Marianne Bouchart, web producer at Bloomberg News, also touched on map-based visualisations. The subscription-based financial information company has made public such visualisations as the US “melting pot” map, based on the Census Bureau’s 2010 American Community Survey, and “Secret Liquidity Lifelines”, which shows which banks were bailed out by the Federal Reserve during the financial meltdown between 2007 and 2010. The latter is based on 29,000 previously secret documents and 21,000 spreadsheets detailing more than 21,000 loans.
Data, little or large, could be a rich seam for data scientists ready to work in the media.