CyLab researchers to present at PLDI 2025

Michael Cunningham

Jun 17, 2025

PLDI logo graphic

Carnegie Mellon University researchers are set to present five papers and chair one program session at the 46th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2025).

The conference will take place in Seoul, South Korea from on June 16th through the 20th. PLDI is the premier forum in the field of programming languages and programming systems research, covering the areas of design, implementation, theory, applications, and performance.

Below, we’ve compiled a list of presentations led by Carnegie Mellon researchers at this year’s event. In addition to the papers listed below, Feras Saad, assistant professor in the CMU Computer Science Department, will be chairing Wednesday’s program session on Machine Learning.

Random Variate Generation with Formal Guarantees

Researchers: Feras Saad, Carnegie Mellon University; Wonyeol Lee, POSTECH

Abstract: Generating random variates is a fundamental operation in diverse areas of computer science and is supported in almost all modern programming languages. Traditional software libraries for random variate generation are grounded in the idealized “Real-RAM” model of computation, where algorithms are assumed to be able to access uniformly distributed real numbers from the unit interval and compute with infinite-precision real arithmetic. These assumptions are unrealistic, as any software implementation of a Real-RAM algorithm on a physical computer can instead access a stream of individual random bits and computes with finite-precision arithmetic. As a result, existing libraries have few theoretical guarantees in practice. For example, the actual distribution of a random variate generator is generally unknown, intractable to quantify, and arbitrarily different from the desired distribution; causing runtime errors, unexpected behavior, and inconsistent APIs.

This article introduces a new approach to principled and practical random variate generation with formal guarantees. The key idea is to first specify the desired probability distribution in terms of a finite-precision numerical program that defines its cumulative distribution function (CDF), and then generate exact random variates according to this CDF. We present a universal and fully automated method to synthesize exact random variate generators given any numerical CDF implemented in any binary number format, such as floating-point, fixed-point, and posits. The method is guaranteed to operate with the same precision used to specify the CDF, does not overflow, avoids expensive arbitrary-precision arithmetic, and exposes a consistent API. The method rests on a novel space-time optimal implementation for the class of generators that attain the information-theoretically optimal Knuth and Yao entropy rate, consuming the least possible number of input random bits per output variate. We develop a random variate generation library using our method in C and evaluate it on a diverse set of “continuous” and “discrete” distributions, showing competitive runtime with the state-of-the-art GNU Scientific Library while delivering higher accuracy, entropy efficiency, and automation.

Program Synthesis From Partial Traces

Researchers: Margarida Ferreira, Carnegie Mellon University, INESC-ID, Instituto Superior Técnico - University of Lisbon; Victor Nicolet, Joey Dodds, and Daniel Kroening, Amazon

Abstract: We present the first technique to synthesize programs that compose side-effecting functions, pure functions, and control flow, from partial traces containing records of only the side-effecting functions. This technique can be applied to synthesize API composing scripts from logs of calls made to those APIs, or a script from traces of system calls made by a workload, for example. All of the provided traces are positive examples, meaning that they describe desired behavior. Our approach does not require negative examples. Instead, it generalizes over the examples and uses cost metrics to prevent over-generalization. Because the problem is too complex for traditional monolithic program synthesis techniques, we propose a new combination of optimizing rewrites and syntax-guided program synthesis. The resulting program is correct by construction, so its output will always be able to reproduce the input traces.

We evaluate the quality of the programs synthesized when considering various optimization metrics and the synthesizer's efficiency on real-world benchmarks. The results show that our approach can generate useful real-world programs.

Automated Exploit Generation for Node.js Packages

Researchers: Filipe Marques, Mafalda Ferreira, André Nascimento, Miguel E. Coimbra, and Nuno Santos, INESC-ID,  Instituto Superior Técnico - University of Lisbon; Limin Jia, Carnegie Mellon University; José Fragoso Santos, INESC-ID; Instituto Superior Técnico - University of Lisbon

Abstract: The Node.js ecosystem, with its growing popularity and increasing exposure to security vulnerabilities, has a pressing need for more effective security analysis tools. To reduce false positives, recent works on detecting vulnerabilities in Node.js packages have developed synthesis algorithms to generate proof-of-concept exploits. However, these tools focus mainly on vulnerabilities that can be triggered by a single direct call to an exported function of the analyzed package, failing to generate exploits that require more complex interactions. In this paper, we present Explode.js, the first tool capable of synthesizing exploits that include complex call sequences to trigger vulnerabilities in Node.js packages. By combining static analysis and symbolic execution, Explode.js generates functional exploits that confirm the existence of command, code injection, prototype pollution, and path traversal vulnerabilities, effectively eliminating false positives. The results of evaluating Explode.js on two state-of-the-art datasets of Node.js packages with confirmed vulnerabilities show that it generates significantly more exploits than its main competitor tools. Furthermore, when applied to real-world Node.js packages, Explode.js uncovered 44 zero-day vulnerabilities, with 4 new CVEs.

Ripple: Asynchronous Programming for Spatial Dataflow Architectures

Researchers: Souradip Ghosh, Yufei Shi, Brandon Lucia, and Nathan Beckmann, Carnegie Mellon University

Abstract: Spatial dataflow architectures (SDAs) are a promising and versatile accelerator platform. They are software-programmable and achieve near-ASIC performance and energy efficiency, beating CPUs by orders of magnitude. Unfortunately, many SDAs struggle to efficiently implement irregular computations because they suffer from an abstraction inversion: they fail to capture coarse-grain dataflow semantics in the application — namely asynchronous communication, pipelining, and queueing — that are naturally supported by the dataflow execution model and existing SDA hardware.

Ripple is a language and architecture that corrects the abstraction inversion by preserving dataflow semantics down the stack. Ripple provides asynchronous iterators, shared-memory atomics, and a familiar task-parallel interface to concisely express the asynchronous pipeline parallelism enabled by an SDA. Ripple efficiently implements deadlock-free, asynchronous task communication by exposing hardware token queues in its ISA. Across nine important workloads, compared to a recent ordered-dataflow SDA, Ripple shrinks programs by 1.9×, improves performance by 3×, increases IPC by 58%, and reduces dynamic instructions by 44%.

Usability Barriers for Liquid Types

Researchers: Catarina Gamboa, Carnegie Mellon University, University of Lisbon; Abigail Elena Reese, Carnegie Mellon University; Alcides Fonseca, LASIGE, University of Lisbon, Jonathan Aldrich, Carnegie Mellon University

Abstract: Liquid types can express richer verification properties than simple type systems. However, despite their advantages, liquid types have yet to achieve widespread adoption. To understand why, we conducted a study analyzing developers' challenges with liquid types, focusing on LiquidHaskell.

Our findings reveal nine key barriers that span three categories, including developer experience, scalability challenges with complex and large codebases, and understanding the verification process. Together, these obstacles provide a comprehensive view of the usability challenges to the broader adoption of liquid types and offer insights that can inform the current and future design and implementation of liquid type systems.