We’ve grown accustomed to seeing data leaks on a daily basis, but every now and then one of them is a spectacular doozy. This one involved 4 billion user accounts, all on one server; the sheer number of records could populate a small galaxy. The only fortunate thing, if anything fortunate can be said about this or any data breach or leak, is that the data itself was not as critically personal as it often is: no social security numbers, credit card data or passwords were in this leak.
The data this time is more social in nature. It affects Facebook, Twitter and LinkedIn profiles and includes cell phone numbers, home numbers, email addresses, work histories and other profile information. Four terabytes of personal info were exposed. Some of the records are duplicates, so the number of unique users affected is over 1.2 billion, which still ranks this as one of the largest data leaks ever.
Dark Web researcher Vinny Troia, while searching for other leaks with colleague Bob Diachenko, discovered the exposed Elasticsearch server on October 16.
The data appears to have mixed origins, so its source isn’t clearly identifiable yet. Troia found that three of the four datasets came from San Francisco data broker People Data Labs (PDL). PDL offers for sale on its own website the data of 1.5 billion people, 260 million of whom are in the US. Among the data it promotes, it boasts over a billion personal emails, Facebook URLs and IDs, 420 million LinkedIn URLs and 400 million personal phone numbers (200 million in the US). However, a PDL cofounder states that PDL does not own the server that held the exposed data. Researchers have confirmed this is likely true, though they can’t yet identify how the leaked data got there.
A fourth dataset is tagged OXY, likely for Oxydata, based in Wyoming. This data represented 380 million profiles of consumers and employees across 85 industries and 195 countries.
This leak ranks with other mega leaks and breaches. In March of this year, researchers Troia and Diachenko discovered another 809 million exposed records from Verifications.io. In 2018, marketing firm Exactis leaked 340 million personal records, and Apollo also exposed billions of data points.
The Elasticsearch server holding the 1.2 billion records of personal information in this leak was unguarded and could be accessed from a browser at http://35.199.58.125:9200. Anyone visiting that address could reach the data without being asked for a password or any other form of authentication.
[Image: the different Elasticsearch indexes (databases) on the exposed server]
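For administrators wondering whether their own Elasticsearch server is open in the same way, the check can be sketched in a few lines of Python. This is a minimal sketch and not the method the researchers used. The function names `classify_status` and `check_endpoint` and the `localhost` address are hypothetical. It assumes that a server with authentication enabled answers an anonymous request with HTTP 401 or 403, while an open one answers with 200. Run it only against servers you administer.

```python
import urllib.error
import urllib.request


def classify_status(status):
    """Map an HTTP status code to a rough exposure verdict (assumed mapping)."""
    if status == 200:
        return "open"        # content served without any credentials
    if status in (401, 403):
        return "protected"   # server demanded or refused credentials
    return "unknown"         # anything else needs a closer look


def check_endpoint(url, timeout=5.0):
    """Send one unauthenticated GET request and classify the response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return classify_status(response.status)
    except urllib.error.HTTPError as err:
        # urllib raises HTTPError for 4xx/5xx; the status code is still useful.
        return classify_status(err.code)
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout and similar problems.
        return "unreachable"


if __name__ == "__main__":
    # Hypothetical address: replace with a server you are responsible for.
    print(check_endpoint("http://localhost:9200/"))
```

A result of "open" means that anyone who can reach the address can read whatever the server exposes, which is the situation described above.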
Data enrichment companies played a role in exposing users’ social profiles in this leak. These companies take a single piece of information about a person and return additional (“enriched”) information about them. They don’t charge a lot of money, and their services expand a user profile considerably, by up to hundreds of new data points. This can include household, financial, income, political and religious information.
No one oversees the resulting information, and the door is open for a person’s private and social information to be accessed easily.
The exposed IP address, 35.199.58.125, was hosted with Google Cloud, but cloud providers generally do not disclose who their customers are, citing privacy. The FBI can make requests but doesn’t have the authority to compel an organization to announce a breach. And the question remains of who is responsible: PDL as the data owner, or the owner of the server at http://35.199.58.125. A court order may be required to get enough information to make that determination.
The exposed data appears to have been handled by at least two “data enrichment companies.” These organizations aren’t so different from the credit reporting agencies that collect our data. Oftentimes, we don’t know what’s in their files, and there’s little recourse to correct it. Well-founded privacy concerns are the major impetus behind the California Consumer Privacy Act, GDPR and other state and national privacy laws now in the works. The goal of these laws is to enable users to explicitly control their data that’s “out there.” Consumers who don’t want their data shared were never given an “opt-in” choice, and now the challenge is how to put the genie back in the bottle.
The time to act is NOW. The reality is that the compiled and consolidated data that massive companies are now monetizing is a small fraction of what will be exposed in the years to come. As more companies use increasingly advanced AI to predict consumer behavior, there is enormous potential for both intrusions into and limitations on the average consumer’s life.
Religious preferences, social activities, spending patterns, educational potential and more may become mere data points by which consumers are targeted or limited. Just as so many companies are now using consumer behavioral data to predict shopping, travel patterns, and more, they could use customer data, including illegally sourced data, in ways that have the potential to be detrimental on entirely new levels.
The data genie is growing daily. It’s urgent that authorities pass and uniformly enforce laws that give consumers legal control over their data. It’s equally urgent that, in the absence of such laws, individuals take greater care with their data today, and that companies be far more diligent with the data they collect than we’ve seen in these last few years.
Read full 1.2 Billion Records Found Exposed Online in a Single Server article
Read full Personal and Social Information of 1.2 Billion People Discovered in Massive Data Leak article