AI can unmask anonymous accounts with up to 90% accuracy

Artificial intelligence applications are changing many things, and one of them is Internet privacy. According to recent research, commercially available AI agents can be used to identify the real users behind anonymous accounts on social networks and other Internet platforms.

The conclusion is based on tests that matched specific individuals to accounts or posts across more than one social media platform. The success rate was much higher than that of classic deanonymization work, which relies on humans assembling structured data sets suitable for algorithmic matching, or on the manual work of specialized researchers.

Recall (the share of users successfully deanonymized) reached up to 68%, and precision (the proportion of identifications that were correct) reached 90%. The percentages increased as more posts and more user data were available to cross-reference.
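To make those two metrics concrete, here is a minimal sketch of how they are computed. The data is entirely hypothetical (invented account and user names, not the study's data): precision is measured over the guesses the attacker actually makes, recall over all targeted accounts.

```python
def precision_recall(predictions, truth):
    """Compute precision and recall for a deanonymization attempt.

    predictions: dict mapping account -> guessed identity
                 (only the accounts the attacker chose to answer for)
    truth:       dict mapping account -> real identity
    """
    correct = sum(1 for acct, guess in predictions.items()
                  if truth.get(acct) == guess)
    precision = correct / len(predictions) if predictions else 0.0
    recall = correct / len(truth) if truth else 0.0
    return precision, recall

# Hypothetical example: 10 anonymous accounts, attacker answers for 5,
# and 4 of those 5 answers are right.
truth = {f"acct{i}": f"user{i}" for i in range(10)}
predictions = {"acct0": "user0", "acct1": "user1", "acct2": "user2",
               "acct3": "user3", "acct4": "wrong"}
p, r = precision_recall(predictions, truth)
print(p, r)  # 0.8 precision (4/5 answers correct), 0.4 recall (4/10 accounts)
```

The same attack can therefore score high on one metric and low on the other, which is why the study reports both.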

‘Our findings have important implications for online privacy. The average Internet user has long operated under an implicit threat model that assumed a pseudonym provided adequate protection because targeted deanonymization would require considerable effort. LLMs invalidate that assumption,’ the researchers say.

The ability to cheaply and quickly identify the people behind these anonymous accounts makes it easier to expose them to doxxing, harassment, and detailed marketing profiling that tracks where they live, what they do for a living, and other personal information.

The researchers collected several data sets from public social media sites to test the techniques, preserving the privacy of those posting. For example, they gathered posts from Hacker News accounts that displayed an associated LinkedIn profile, then anonymized them before running the LLMs, or large language models, the more technical name for the technology behind chatbots like ChatGPT or Gemini.

‘What we discovered is that these AI agents can do something that was very difficult before: starting from free text, such as the anonymized transcript of an interview, they can arrive at a person’s complete identity. It’s a fairly new capability; previous re-identification approaches generally required structured data and two data sets with a similar schema that could be linked together,’ said Simon Lermen, co-author of the article.

Unlike those older deanonymization methods, Lermen said, AI agents can navigate and interact with the web in many of the same ways humans do, and can use simulated reasoning to match candidate individuals.

‘The precision of classic attacks drops very quickly, which explains their low recall. In contrast, the precision of LLM-based attacks degrades more gradually as the attacker makes more attempts. The classic attack fails almost completely even at moderately low precision levels. In contrast, even the simplest LLM attack (Search) achieves non-trivial recall at low precision, and extending it with Reason and Calibrate steps doubles the Recall@99% Precision,’ says the study.
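The effect of a Calibrate-style step can be illustrated with a toy example (the confidence scores below are invented for illustration, not taken from the study's pipeline): by abstaining on low-confidence guesses, an attacker answers fewer cases, lowering recall, but a larger share of the remaining answers are correct, raising precision.

```python
# Hypothetical candidate matches: (was_the_guess_correct, confidence_score)
guesses = [(True, 0.95), (True, 0.90), (False, 0.40),
           (True, 0.85), (False, 0.55), (True, 0.30)]
total_users = 10  # accounts the attacker tried to deanonymize

def metrics(guesses, threshold):
    """Keep only guesses at or above the confidence threshold,
    then compute precision and recall on what remains."""
    kept = [ok for ok, conf in guesses if conf >= threshold]
    correct = sum(kept)
    precision = correct / len(kept) if kept else 0.0
    recall = correct / total_users
    return precision, recall

# No calibration: answer everything.
print(metrics(guesses, 0.0))   # 4/6 precision (~0.67), 4/10 recall
# Calibrated: abstain below 0.8 confidence.
print(metrics(guesses, 0.8))   # 3/3 precision (1.0), 3/10 recall
```

Sweeping that threshold traces out the precision-recall trade-off the study describes; "Recall@99% Precision" is the recall achieved at the threshold where precision reaches 99%.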

The results show that LLMs, although still prone to false positives and other limitations, are rapidly outperforming more traditional, resource-intensive methods of identifying online users.

The researchers propose mitigation measures such as platforms imposing rate limits on API access to user data, detecting automated scraping, and restricting bulk data exports. LLM providers could also monitor for misuse of their models in deanonymization attacks and incorporate safeguards that cause models to reject such requests.
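The rate-limiting mitigation is a standard technique; a minimal sketch of one common form, a token-bucket limiter (the class name and parameters here are illustrative, not from the study), shows how a platform can allow a small burst of profile lookups while capping sustained scraping throughput:

```python
import time

class TokenBucket:
    """Token-bucket rate limiter: tokens refill at a steady rate up to
    a fixed capacity, and each allowed request spends one token."""

    def __init__(self, rate_per_sec, capacity):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# Burst of 10 back-to-back requests against a bucket holding 5 tokens:
bucket = TokenBucket(rate_per_sec=2, capacity=5)
results = [bucket.allow() for _ in range(10)]
print(results)  # first 5 allowed (the burst), the rest denied
```

A scraper hammering the API is throttled to the refill rate (here two requests per second), while ordinary interactive use inside the burst budget is unaffected.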

They also warn that governments could use these techniques to unmask critics, that companies could build customer profiles for ‘hyper-targeted’ advertising, and that cybercriminals could build large-scale target profiles to launch highly personalized social engineering scams.

‘Recent advances in LLM capabilities have made it clear that there is an urgent need to rethink various aspects of cybersecurity in the face of LLM-driven offensive cyber capabilities. Our work shows that the same is probably true for privacy,’ the researchers warn.