New tool gives researchers a better look at online anonymous marketplaces

Daniel Tkacik

Oct 18, 2019

As you read this, cocaine, bounties, and other illicit products and services are being bought and sold on dozens of online anonymous marketplaces. These marketplaces are hard to shut down because they exist on networks that are buried under layers of encryption, making it exceedingly difficult to determine the identity of those involved.

To make matters worse for law enforcement, some prolific sellers will evade targeting by operating several accounts that appear to be individual sellers with smaller amounts of product. Law enforcement agencies are left with headaches, but they’re not the only ones. 

“When sellers use multiple accounts, it’s very difficult for researchers to get an accurate picture of what these marketplaces actually look like,” says Xiao Hui Tai, a former CyLab Ph.D. student in the Department of Statistics & Data Science. “Researchers and law enforcement both would like to know the true sizes of these underground markets.”

In a study presented at the Conference on Knowledge Discovery and Data Mining (KDD), Tai teamed up with two other researchers to develop an algorithm that is able to detect when seemingly disparate accounts belong to the same seller. The team tested their algorithm on eight years’ worth of data collected from a dozen online anonymous marketplaces.

“Our algorithm detected over 20,000 accounts belonging to roughly 15,000 individual sellers,” Tai said. “Some of these people were operating between two and 11 accounts.”

The algorithm worked by extracting account information – things like account names, products sold by those accounts, prices, where the accounts were shipping to and from, and the kinds of words used in the accounts’ profiles – and comparing the accounts against one another. If two or more accounts shared similar traits above a certain threshold, the algorithm matched them to the same seller.
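
The matching step described above can be illustrated with a minimal sketch. This is not the study's model: the account records, field names, equal weighting of features, and the 0.6 threshold are all assumptions made for illustration, and plain Jaccard overlap stands in for whatever similarity measures the researchers actually used.

```python
from itertools import combinations

# Hypothetical account records; every name and value here is invented.
accounts = {
    "acct_a": {"products": {"p1", "p2", "p3"},
               "profile_words": {"fast", "stealth", "quality", "shipping"},
               "ships_from": "NL"},
    "acct_b": {"products": {"p1", "p2", "p3", "p4"},
               "profile_words": {"fast", "stealth", "quality"},
               "ships_from": "NL"},
    "acct_c": {"products": {"p9"},
               "profile_words": {"cheap", "bulk"},
               "ships_from": "US"},
}

def jaccard(a, b):
    """Share of items two sets have in common (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def similarity(x, y):
    """Average of per-feature similarities (equal weights are an assumption)."""
    scores = [
        jaccard(x["products"], y["products"]),
        jaccard(x["profile_words"], y["profile_words"]),
        1.0 if x["ships_from"] == y["ships_from"] else 0.0,
    ]
    return sum(scores) / len(scores)

def group_accounts(accounts, threshold=0.6):
    """Link every pair scoring at or above the threshold, then return the linked groups."""
    parent = {name: name for name in accounts}

    def find(name):
        # Follow parent links to the group's representative.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(accounts, 2):
        if similarity(accounts[a], accounts[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in accounts:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())

groups = group_accounts(accounts)
```

With these made-up records, `acct_a` and `acct_b` land in one group and `acct_c` stays on its own.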

The algorithm also extracted each account’s PGP key – a unique bit of code that allows buyers to encrypt and authenticate communications with sellers. While past studies have used PGP keys to match accounts to sellers, Tai’s study combined them with the characteristics described above for more accurate matching.

For example, the algorithm detected a collection of accounts that had the same PGP key, but labeled them all as being run by different sellers because the other information in these accounts was so different. It turned out that the Dutch National Police had seized these accounts and posted the same PGP key for all of them; if anyone tried to communicate with one of the seized accounts, the police would be able to decrypt the message.

“If we were to use only PGP keys to match accounts, we would have thought these all belonged to the same person,” Tai said. “But in fact, the model reassured us that they weren’t the same.”
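
A small sketch shows why treating the PGP key as one signal among several can change the outcome. The records, the 0.4 key weight, and the 0.6 threshold below are hypothetical and are not taken from the study; the point is only that a shared key alone need not push a pair over the matching threshold.

```python
def jaccard(a, b):
    """Share of items two sets have in common (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def match_score(x, y, key_weight=0.4):
    """Blend a PGP-key match with other features (the weight is an assumption)."""
    key_match = 1.0 if x["pgp_key"] == y["pgp_key"] else 0.0
    other = (jaccard(x["products"], y["products"])
             + jaccard(x["profile_words"], y["profile_words"])) / 2
    return key_weight * key_match + (1 - key_weight) * other

# Two invented accounts that share a key but nothing else,
# loosely modeled on the seized-account case described above.
seized_1 = {"pgp_key": "KEY_X", "products": {"p1", "p2"},
            "profile_words": {"fast", "stealth"}}
seized_2 = {"pgp_key": "KEY_X", "products": {"p7"},
            "profile_words": {"bulk", "cheap"}}

THRESHOLD = 0.6  # hypothetical cut-off

key_only_match = seized_1["pgp_key"] == seized_2["pgp_key"]
combined_match = match_score(seized_1, seized_2) >= THRESHOLD
```

Here the key-only rule would merge the two accounts, while the combined score falls short of the threshold and keeps them separate.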

Oftentimes, Tai said, accounts would impersonate other accounts operated by different sellers by using similar pieces of text in their profile. Impersonation would allow one account to piggy-back on another’s good reputation as a seller. 

“In one case, an account profile read, ‘There’s an account out there that is claiming to be us, but they’re just impersonating us,’” Tai said. “Using pieces of information other than just the profile text, the model was able to determine the accounts belonged to different sellers.”

In the end, Tai said, one of the major goals of law enforcement is to learn who the people are behind these accounts, and the matching algorithm is a step towards achieving that goal. 

“When you’re able to capture various pieces of information from different accounts and say they belong to the same person,” Tai said, “…then you can combine all of this information to help generate investigative leads.”

Other authors on the study included Electrical and Computer Engineering Ph.D. student Kyle Soska and Institute for Software Research and Engineering and Public Policy professor Nicolas Christin.