Mystery surrounds leak of four billion user records

Threat researchers uncover four billion user records on a wide-open Elasticsearch server, but who left them there is a mystery

Alex Scroxton, Security Editor

Published: 22 Nov 2019 16:30

Personal data relating to more than a billion people, including email addresses, phone numbers, and LinkedIn and Facebook profile information, has been leaked online via an open and unsecured Elasticsearch server, but its actual source is shrouded in mystery.

The data was uncovered on 16 October 2019 by researchers Bob Diachenko and Vinny Troia of threat intelligence platform Data Viper. Diachenko and Troia were able to access and download the data through a web browser, with no password or other authentication required.

In a blog post detailing the disclosure, Troia said that he had uncovered four terabytes of data spanning four separate indexes, labelled either “PDL” or “OXY”.

The first dataset contained, among other things, data on 1.5 billion unique individuals; a billion personal email addresses, including work emails for millions of decision-makers in Canada, the UK and the US; 420 million LinkedIn URLs; a billion Facebook URLs and IDs; more than 400 million phone numbers; and 200 million valid US mobile phone numbers. The second dataset contained data scraped from LinkedIn profiles, including information on recruiters.

His analysis, he said, led him to believe that the data originated at two data aggregation companies, People Data Labs and OxyData.io. However, according to Wired, which first reported the story, when Troia contacted both companies he was told that the server in question did not belong to either of them.

Following further investigation, Troia revealed that he was unable to find any evidence to contradict these denials, even though he was able to determine that the datasets matched data held by the two firms. In particular, a crucial piece of evidence that would seem to exonerate People Data Labs was the fact that its API appears to use AWS, while the unsecured Elasticsearch server was found in Google Cloud.

“This is an incredibly tricky and unusual situation,” wrote Troia. “The lion’s share of the data is marked as ‘PDL’, indicating that it originated from People Data Labs. However, as far as we can tell, the server that leaked the data is not associated with PDL.”

Leak a big deal

While the leak lacks the sort of personal information, such as passwords or credit card details, that would make it immediately valuable to cyber criminals, the fact that it exposes email addresses, phone numbers and social media profiles is still a big deal, according to CyberArk’s senior vice-president of EMEA, Rich Turner.

“[This] makes a phishing expedition or an attempt to otherwise find, profile and compromise high-value targets – individuals or organisations – that much easier,” he said.

“The vast amount of data in the repository contained enough intelligence and detail to launch a well-targeted campaign which would allow a motivated group or individuals to obtain access, credentials and other highly valued information.”

“Over the years, hundreds of billions of online accounts have been exposed, meaning that personal information on every human on the face of the earth has been stolen 20 times or more,” said Cybereason chief security officer Sam Curry.

“This latest exposure is like astronomy: billions and billions ceases to be personal or mean anything. In reality, this data breach is a stark reminder that consumers need to rethink their own security hygiene. Today, everyone should assume their private information has been stolen numerous times and will continue to be accessible to a growing number of threat actors.”