Graph database and data visualisation software have been pressed into service for investigative journalists working on the huge Panama Papers leak.
The leak of 11.5 million files, amounting to 2.6TB of data, from Panamanian law firm Mossack Fonseca to the Süddeutsche Zeitung has been generating controversy this week. The BBC and The Guardian also have the document trove, as does the Washington-based International Consortium of Investigative Journalists (ICIJ).
The ICIJ has been using graph database technology supplied by Neo4j alongside data discovery and visualisation software, Linkurious, which specialises in graph databases. The organisation has used these linked technologies for its Panama Papers project and, along with other media organisations, has drawn links between members of the tax-avoiding global elite and bank accounts held in offshore tax havens.
Emil Eifrem, chief executive officer of Neo4j, told Computer Weekly: “This leak could have happened 10 years ago and no one would have written about it.”
Eifrem said that only companies that have developed big data technologies since 2006, notably Google, whose “Big Table” paper was published that year, and Facebook, have had the capacity to do the kind of analysis that lies behind the data journalism that has been running in The Guardian, the BBC and elsewhere.
The year 2006 was also when big data technology Hadoop was invented at Yahoo, he said. Government agencies, such as the NSA in the US and GCHQ in the UK, have also had this capacity, Eifrem added. “We are democratising that capability. And it is not just about counting words, but connecting the dots.”
Neo4j’s technology was also in play when there was a major leak of documents from 100,000 HSBC clients in 2015, he said. “But this leak is a greater order of magnitude than any other in human history.”
The ICIJ journalists using Linkurious on top of Neo4j are able to surface connections such as people who share an address but are not formally married, and who have material links to suspicious bank accounts used for money laundering or other financial crimes and misdemeanours.
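To illustrate the kind of connection described here, the following is a minimal Python sketch that finds people registered at the same address. It is not the ICIJ's method or Neo4j's API: the edge list, the relationship names (`REGISTERED_AT`, `OFFICER_OF`) and every person, address and company in it are invented for the example.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical edges in the form (source, relationship, target).
# All names below are invented for illustration.
EDGES = [
    ("Person A", "REGISTERED_AT", "1 Example Street"),
    ("Person B", "REGISTERED_AT", "1 Example Street"),
    ("Person C", "REGISTERED_AT", "9 Sample Road"),
    ("Person B", "OFFICER_OF", "Shell Co 1"),
    ("Person C", "OFFICER_OF", "Shell Co 2"),
]


def shared_addresses(edges):
    """Map each address to its registered people, keeping addresses with 2+ people."""
    residents = defaultdict(set)
    for source, relationship, target in edges:
        if relationship == "REGISTERED_AT":
            residents[target].add(source)
    return {
        address: sorted(people)
        for address, people in residents.items()
        if len(people) > 1
    }


def linked_pairs(edges):
    """List (person, person, address) triples for people who share an address."""
    pairs = []
    for address, people in sorted(shared_addresses(edges).items()):
        for first, second in combinations(people, 2):
            pairs.append((first, second, address))
    return pairs


if __name__ == "__main__":
    for first, second, address in linked_pairs(EDGES):
        print(f"{first} and {second} share {address}")
```

A graph database would answer the same question with a query over stored relationships rather than a scan in application code, but the shape of the question is the same: two nodes pointing at a third.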
Mar Cabra, ICIJ’s data and research unit editor, said: “Neo4j is a revolutionary discovery tool that has transformed our investigative journalism process. This simply would not have been possible before on this scale. It’s magic.”
The papers expose the internal workings of Panama-headquartered Mossack Fonseca, one of the world’s leading firms in the incorporation of offshore entities.
Graph databases, such as Neo4j’s, use structures incorporating nodes, properties and edges to define and store data instead of using “tables” the way relational databases do. They are used to map links between entities.
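As a rough illustration of the node, property and edge model, here is a small in-memory sketch in Python. It is an assumption-laden toy, not Neo4j's storage engine or API: the `PropertyGraph` class, its method names, and the sample people, companies and relationship types are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    label: str                      # e.g. "Person" or "Company"
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    source: str                     # node_id of the start node
    target: str                     # node_id of the end node
    rel_type: str                   # e.g. "SHAREHOLDER_OF"
    properties: dict = field(default_factory=dict)


class PropertyGraph:
    """Toy property graph: nodes keyed by id, edges stored per source node."""

    def __init__(self):
        self.nodes = {}
        self.outgoing = {}

    def add_node(self, node_id, label, **properties):
        self.nodes[node_id] = Node(node_id, label, properties)
        self.outgoing.setdefault(node_id, [])

    def add_edge(self, source, target, rel_type, **properties):
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must exist before adding an edge")
        self.outgoing[source].append(Edge(source, target, rel_type, properties))

    def neighbours(self, node_id, rel_type=None):
        """Follow outgoing edges from a node, optionally filtered by type."""
        return [
            edge.target
            for edge in self.outgoing[node_id]
            if rel_type is None or edge.rel_type == rel_type
        ]


# Invented sample data.
graph = PropertyGraph()
graph.add_node("p1", "Person", name="Person A")
graph.add_node("c1", "Company", name="Shell Co 1", jurisdiction="Example Island")
graph.add_node("i1", "Intermediary", name="Example Law Firm")
graph.add_edge("p1", "c1", "SHAREHOLDER_OF", since=2010)
graph.add_edge("i1", "c1", "INCORPORATED")
```

Because each node keeps its own list of edges, following a link is a direct lookup from that node rather than a join across tables, which is the property that makes this model suited to highly connected data.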
Cabra said: “It [Neo4j and Linkurious] is a revolutionary discovery tool because relationships are all-important in telling you where the criminality lies, who works with whom, and so on.
“At least 11.5 million documents, far more than in any data leaks we have investigated before, meant we needed a technology that could handle these unprecedented volumes of highly connected data quickly, easily and efficiently.
“We also needed an easy-to-use and intuitive solution that did not require the intervention of any data scientist or developers, so journalists around the globe could work with the data, regardless of their technical abilities. Linkurious Enterprise was the best platform to explore this data and to share insights in a secure way.”
Eifrem added: “Graph databases are the only option when trying to make sense of the vast terabytes of connected data that we are producing more and more of, and are an essential tool for international agencies, governments, financial services and security firms trying to uncover the truth.”