Carnegie Mellon faculty, students present at the 31st USENIX Security Symposium

Aug 11, 2022

A number of Carnegie Mellon researchers are presenting on various topics at the 31st USENIX Security Symposium. Held in Boston, MA, the event brings together experts from around the world, highlighting the latest advances in security and privacy of computer systems and networks.

Here, we’ve compiled a list of papers, authored by CMU’s CyLab Security and Privacy Institute members, that are being presented at this year’s event.

Provably-Safe Multilingual Software Sandboxing using WebAssembly

Jay Bosamiya, Wen Shih Lim, Bryan Parno, Carnegie Mellon University

Winner of USENIX Distinguished Paper Award
Earned 2nd Place Prize in the 2022 Internet Defense Prize Competition

Many applications, from the Web to smart contracts, need to safely execute untrusted code. Researchers at Carnegie Mellon have found that WebAssembly (Wasm) is ideally positioned to support such applications thanks to its promises around safety and performance, while serving as a compiler target for many high-level languages. However, Wasm’s safety guarantees are only as strong as the implementation that enforces them.

The group explores two distinct approaches to producing provably sandboxed Wasm code. One draws on traditional formal methods to produce mathematical, machine-checked proofs of safety. The second carefully embeds Wasm semantics in safe Rust code such that the Rust compiler can emit safe executable code with good performance.

Their implementation and evaluation of these two techniques indicate that leveraging Wasm provides provably-safe multilingual sandboxing with performance comparable to standard, unsafe approaches.

Lumos: Identifying and Localizing Diverse Hidden IoT Devices in an Unfamiliar Environment

Rahul Anand Sharma, Elahe Soltanaghaei, Anthony Rowe, and Vyas Sekar, Carnegie Mellon University

Hidden IoT devices are increasingly being used to snoop on users in hotel rooms or Airbnbs, but researchers at Carnegie Mellon University envision empowering users entering these unfamiliar environments to identify and locate diverse hidden devices, such as cameras, microphones, and speakers, using their own personal devices.

What makes this challenging is the limited network visibility and physical access that a user has in such unfamiliar environments, coupled with the lack of specialized equipment.

Now, the team is introducing Lumos, a system that runs on commodity user devices, such as smartphones or laptops, enabling users to identify and locate Wi-Fi-connected hidden IoT devices and visualize their presence using an augmented reality interface. Lumos addresses key challenges, including identifying diverse devices using only coarse-grained wireless layer features, without IP/DNS layer information and without knowledge of the hidden devices’ Wi-Fi channel assignments, and locating the identified devices relative to the user using only phone sensors and wireless signal strength measurements.
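To give a rough feel for how a signal strength reading can be turned into a distance, the sketch below uses the textbook log-distance path-loss model. It is not Lumos’s actual localization method, which this article does not detail; the function name, the reference power at 1 m, and the path-loss exponent are all illustrative assumptions.

```python
def rssi_to_distance_m(rssi_dbm, ref_power_dbm=-40.0, path_loss_exponent=2.0):
    """Estimate distance in meters from one received signal strength reading.

    Textbook log-distance path-loss model, NOT the method used by Lumos.
    ref_power_dbm: assumed RSSI measured 1 m from the transmitter.
    path_loss_exponent: assumed environment constant (about 2 in free space,
    typically higher indoors).
    """
    return 10 ** ((ref_power_dbm - rssi_dbm) / (10.0 * path_loss_exponent))


if __name__ == "__main__":
    # Weaker readings map to larger estimated distances.
    for rssi in (-40, -50, -60):
        print(f"{rssi} dBm -> {rssi_to_distance_m(rssi):.2f} m")
```

A single reading only gives a range, not a direction, which is one reason a system would need to combine many measurements with phone motion sensors as the user walks around.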

After evaluating Lumos across 44 different IoT devices spanning various types, models, and brands, the results show that Lumos can identify hidden devices with 95% accuracy and locate them with a median error of 1.5m within 30 minutes in a two-bedroom, 1000 sq. ft. apartment.

Measurement by Proxy: On the Accuracy of Online Marketplace Measurements

Alejandro Cuevas, Carnegie Mellon University; Fieke Miedema, Delft University of Technology; Kyle Soska, University of Illinois Urbana-Champaign and Hikari Labs, Inc.; Nicolas Christin, Carnegie Mellon University and Hikari Labs, Inc.; Rolf van Wegberg, Delft University of Technology

A number of recent studies have investigated online anonymous ("dark web") marketplaces. Almost all leverage a "measurement-by-proxy" design, in which researchers scrape a market’s public pages and take buyer reviews as a proxy for actual transactions to gain insights into market size and revenue. Yet, researchers say it remains unknown if and how this method biases results.

Now, researchers have built a framework to reason about marketplace measurement accuracy and use it to contrast estimates projected from scrapes of Hansa Market with data from a back-end database seized by the police. Through simulation, the group further investigates the impact of scraping frequency, consistency, and rate limits, uncovering that even with a decent scraping regimen, one might miss approximately 46% of objects, with scraped listings differing significantly from unscraped listings on price, views, and product categories.

Experts say this bias also impacts revenue calculations. Findings show Hansa’s total market revenue to be $50M, which projections based on the team’s scrapes underestimate by a factor of four. Simulations further show that studies based on one or two scrapes are likely to suffer from very poor coverage (on average, 14% and 30%, respectively).
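The link between scraping frequency and coverage can be illustrated with a toy simulation. The sketch below is not the team’s framework: the function name, the listing lifetimes, and the scrape schedules are made-up assumptions chosen only to show that listings which appear and vanish between scrapes are never observed.

```python
def scrape_coverage(listings, scrape_days):
    """Return the fraction of listings online during at least one scrape.

    listings: list of (first_day, last_day) tuples, both inclusive.
    scrape_days: iterable of days on which a scrape runs.
    """
    scrape_days = list(scrape_days)
    seen = sum(
        1
        for first, last in listings
        if any(first <= day <= last for day in scrape_days)
    )
    return seen / len(listings)


# Hypothetical marketplace: listing i appears on day i and stays online
# for 1 to 10 days (invented lifetimes, not Hansa Market data).
LISTINGS = [(i, i + i % 10) for i in range(100)]

if __name__ == "__main__":
    for interval in (1, 5, 10):
        days = range(0, 110, interval)
        coverage = scrape_coverage(LISTINGS, days)
        print(f"scrape every {interval:>2} days: {coverage:.0%} coverage")
```

In this toy setup, short-lived listings are the ones that sparse scraping misses, which is also why scraped and unscraped listings can differ systematically.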

A high scraping frequency is crucial to achieve reliable coverage, even without a consistent scraping routine. When high-frequency scraping is difficult, for example, due to deployed anti-scraping countermeasures, innovative scraper design, such as scraping most popular listings first, helps improve coverage. Finally, abundance estimators can provide insights on population coverage when population sizes are unknown.
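As one illustration of an abundance estimator, the sketch below applies the Chapman form of the Lincoln-Petersen capture-recapture estimator to two scrapes. The article does not say which estimators the team used, so this choice, the function name, and the listing IDs are assumptions for illustration only.

```python
def chapman_estimate(first_scrape, second_scrape):
    """Estimate total population size from two overlapping samples.

    Chapman's bias-corrected Lincoln-Petersen estimator:
        N = (n1 + 1) * (n2 + 1) / (m + 1) - 1
    where n1 and n2 are the sample sizes and m is the number of items
    seen in both samples. Assumes a closed population and that every
    item is equally likely to be captured, which real markets may violate.
    """
    first, second = set(first_scrape), set(second_scrape)
    recaptured = len(first & second)
    return (len(first) + 1) * (len(second) + 1) / (recaptured + 1) - 1


if __name__ == "__main__":
    # Hypothetical listing IDs observed in two separate scrapes.
    first = {f"listing-{i}" for i in range(0, 60)}
    second = {f"listing-{i}" for i in range(40, 100)}

    estimate = chapman_estimate(first, second)
    observed = len(first | second)
    print(f"observed {observed} listings, estimated population {estimate:.0f}")
    print(f"estimated coverage: {observed / estimate:.0%}")
```

The smaller the overlap between scrapes, the larger the estimated unseen population, which gives a rough coverage figure even when the true number of listings is unknown.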

Augmenting Decompiler Output with Learned Variable Names and Types

Qibin Chen and Jeremy Lacomis, Carnegie Mellon University; Edward J. Schwartz, Carnegie Mellon University Software Engineering Institute; Claire Le Goues, Graham Neubig, and Bogdan Vasilescu, Carnegie Mellon University

Winner of USENIX Distinguished Paper Award

A common tool used by security professionals for reverse-engineering binaries found in the wild is the decompiler. A decompiler attempts to reverse compilation, transforming a binary to a higher-level language such as C. High-level languages ease reasoning about programs by providing useful abstractions such as loops, typed variables, and comments, but these abstractions are lost during compilation. Decompilers can deterministically reconstruct structural properties of code, but comments, variable names, and custom variable types are technically impossible to recover.

Researchers at Carnegie Mellon have developed DIRTY, a novel technique for improving the quality of decompiler output that automatically generates meaningful variable names and types. DIRTY is built on a Transformer-based neural network model and is trained on code automatically scraped from repositories on GitHub. DIRTY uses this model to postprocess decompiled files, recommending variable types and names given their context. Empirical evaluation on a novel dataset of C code mined from GitHub shows that DIRTY outperforms prior approaches by a sizable margin, recovering the original names written by developers 66.4% of the time and the original types 75.8% of the time.