After years spent finding and investigating data breaches, Greg Pollock admits that when he comes across yet another exposed database full of passwords and Social Security numbers, “I come to it with some fatigue.” But Pollock, director of research at the cybersecurity company UpGuard, says he and his colleagues found an exposed, publicly accessible database online in January that appeared to contain a trove of Americans’ sensitive personal data so massive that his weariness lifted and they sprang to action to validate the finding.
The UpGuard researchers point out that not all of the records represent unique, valid information, but the raw totals they found in the January exposure included roughly 3 billion email addresses and passwords as well as about 2.7 billion records that included Social Security numbers. It was unclear who had set up the database, but it seemed to contain personal details that may have been cobbled together from multiple historic data breaches, including, perhaps, the trove from the 2024 breach of the background-checking service National Public Data. It is common for data brokers and cybercriminals to combine and recombine old datasets, but the scale and the potential quantity of Social Security numbers, even if only a fraction of them were real, were striking.
“Every week, there’s another finding where it looks big on paper, but it’s probably not very novel,” Pollock says. “So I was surprised when I started digging into the specific cases here to validate the data. In some cases, the identities in this data breach are at risk because they have been exposed, but they have not yet been exploited.”
The data was hosted by the German cloud provider Hetzner. Since Pollock could not identify an owner of the database to contact, he notified Hetzner on January 16. The company, in turn, said it notified its customer, which removed the data on January 21.
Hetzner did not provide WIRED with comment ahead of publication.
The researchers did not download the entire dataset for analysis due to its size and sensitivity. Instead they worked with a sample of 2.8 million records, a tiny fraction of the total trove. By analyzing trends in the data, including the popularity of certain cultural references in passwords, they concluded that much of the data likely originated in the United States and dates to roughly 2015. For example, passwords referencing One Direction, Fall Out Boy, and Taylor Swift were very common. Meanwhile, references to Blackpink, Katseye, and Btsarmy were just barely beginning to show up.
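The idea of dating a dataset by the cultural references in its passwords can be illustrated with a minimal Python sketch. This is not UpGuard's actual method, which the researchers did not detail; the term lists are taken from the examples above, and the function name and sample passwords are made up for illustration.

```python
from collections import Counter

# Terms taken from the examples in the article. Grouping them as "older" and
# "newer" references is an illustrative assumption, not UpGuard's tooling.
OLDER_TERMS = ["onedirection", "falloutboy", "taylorswift"]
NEWER_TERMS = ["blackpink", "katseye", "btsarmy"]


def count_references(passwords, terms):
    """Count how many passwords contain each term, ignoring case."""
    counts = Counter()
    for password in passwords:
        lowered = password.lower()
        for term in terms:
            if term in lowered:
                counts[term] += 1
    return counts


# Made-up passwords for demonstration only; no real data is used here.
sample_passwords = [
    "OneDirection1",
    "taylorswift13",
    "falloutboy!",
    "onedirection4ever",
    "blackpink2",
]

older_counts = count_references(sample_passwords, OLDER_TERMS)
newer_counts = count_references(sample_passwords, NEWER_TERMS)

# A sample dominated by older references, with newer ones only starting to
# appear, hints at when most of the passwords were created.
older_total = sum(older_counts.values())
newer_total = sum(newer_counts.values())
print(f"older references: {older_total}, newer references: {newer_total}")
```

A simple substring count like this is only a rough signal. A real analysis would need a much larger term list and some care with terms that also occur in unrelated words.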
Old data is still valuable for two reasons. First, people often reuse the same email address and password, or a variation of the password, across many different websites and services. This means that cybercriminals can keep trying the same login credentials for the same people over time. The second reason is that people’s Social Security numbers are often linked to their most sensitive and high-stakes data but almost never change during their lifetimes. As a result, valid SSNs are one of the crown jewels of identity theft for attackers.
In the sample of data the researchers reviewed, Pollock says that one in four Social Security numbers appeared to be valid and legitimate. The sample was too small to extrapolate to the entire dataset, but a quarter of all the records containing SSNs would be 675 million. A fraction of that would still represent a very significant set of Social Security numbers.
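The arithmetic behind that figure is straightforward, and a short Python sketch shows how the estimate shrinks if the true share of valid numbers is lower than in the sample. The 2.7 billion total and the one-in-four rate come from the reporting above; the function name and the smaller shares are hypothetical scenarios, not findings.

```python
from fractions import Fraction

# Raw total of records containing Social Security numbers, per the researchers.
RECORDS_WITH_SSN = 2_700_000_000


def estimate_valid_ssns(total_records, valid_share):
    """Return the number of valid SSNs if valid_share of the records hold up."""
    return int(total_records * valid_share)


# One in four is the rate observed in the sample. The smaller shares are
# hypothetical what-if values, included only to show the scale.
for share in (Fraction(1, 4), Fraction(1, 10), Fraction(1, 100)):
    estimate = estimate_valid_ssns(RECORDS_WITH_SSN, share)
    print(f"valid share {share}: {estimate:,} records")
```

Even at a hypothetical 1 percent validity rate, the estimate would still be in the tens of millions. Note that these are record counts, and duplicate records would mean fewer unique numbers.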
To verify the data, UpGuard researchers contacted a handful of people whose data appeared in the leaked trove. Pollock emphasizes that one of the most concerning findings from speaking to those individuals was that not all of them have had their identities stolen or suffered hacks. In other words, there was information in the database that has not been exploited by cybercriminals—and potential victims don’t necessarily know that their information has been exposed.








