CyLab researchers to present at USENIX PEPR 2025

Michael Cunningham

May 27, 2025


CyLab Security and Privacy Institute researchers are set to lead three presentations at the 2025 USENIX Conference on Privacy Engineering Practice and Respect (PEPR '25).

The conference will take place in Santa Clara, CA, on June 9th and 10th, bringing together privacy practitioners and researchers focused on designing and building products and systems with privacy and respect for their users and the societies in which they operate.

PEPR was co-founded in 2019 by CyLab Director Lorrie Cranor and CyLab alum Lea Kissner because there were no conferences where privacy engineering practitioners could share their experiences and learn from each other and from privacy researchers.

Below, we’ve compiled a list of presentations led by CyLab Security and Privacy Institute researchers at this year’s event.

UsersFirst: A User-Centric Threat Modeling Framework for Privacy Notice and Choice

Researchers: Norman Sadeh and Lorrie Cranor, Carnegie Mellon University

Abstract: Recent privacy regulations impose increasingly stringent requirements on the collection and use of data. This includes more specific obligations to disclose various data practices and the need to provide data subjects with more comprehensive sets of choices or controls. There is also an increasing emphasis on user-centric criteria. Failure to offer usable notices and choices that people can truly benefit from has become a significant privacy threat, whether one thinks in terms of potential regulatory penalties, consumer trust and brand reputation, or privacy-by-design best practices. This presentation will provide an overview of UsersFirst, a Privacy Threat Modeling framework intended to supplement existing privacy threat modeling frameworks and to support organizations in their analysis and mitigation of risks associated with the absence or ineffectiveness of privacy notices and choices. Rather than treating privacy notices and choices as mere checkboxes, UsersFirst revolves around user-centric interpretations of these requirements. It is intended to reflect an emerging trend in privacy regulations where perfunctory approaches to notices and choices are no longer sufficient, and where instead notices and choices are expected to be noticeable, usable, unambiguous, devoid of deceptive patterns, and more. The presentation will include results of a detailed evaluation of the UsersFirst user-centric threat taxonomy with people working and/or trained in privacy.

When Privacy Guarantees Meet Pre-Trained LLMs: A Case Study in Synthetic Data

Researchers: Yash Maurya and Aman Priyanshu, Carnegie Mellon University

Abstract: Modern synthetic data generation with privacy guarantees has become increasingly prevalent. Take real data, create synthetic versions following similar patterns, and ensure privacy through differential privacy mechanisms. But what happens when theoretical privacy guarantees meet real-world data? Even with conservative epsilon values (ε<10), document formatting and contextual patterns can create unexpected privacy challenges, especially when using models that, like most LLMs, aren't transparent about their own training data.

We explore a case study where financial synthetic data was generated with differential privacy guarantees (ε<10) using public SEC filings, yet revealed concerning privacy leakages. These findings raise important questions: Does the privacy leakage stem from the training data, or did fine-tuning untangle existing privacy controls in the base model? How do we evaluate privacy when the model's training history isn't fully known? This talk examines these challenges and brings awareness to emerging privacy considerations when generating synthetic data using modern language models.
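For readers less familiar with the ε parameter mentioned above, the sketch below shows the classic Laplace mechanism, one standard way an ε-differential-privacy budget is spent. It is a generic illustration only, not the researchers' pipeline; the query, sensitivity, and epsilon values are assumptions chosen for the example.

```python
import numpy as np

def laplace_mechanism(true_value: float, sensitivity: float, epsilon: float) -> float:
    """Return a differentially private estimate of true_value.

    Adds Laplace noise with scale sensitivity / epsilon, the standard
    construction for epsilon-differential privacy. Smaller epsilon means
    more noise and a stronger privacy guarantee.
    """
    scale = sensitivity / epsilon
    return true_value + np.random.laplace(loc=0.0, scale=scale)

# Hypothetical example: releasing a count of filings that mention a keyword.
# A count query has sensitivity 1 (adding or removing one record changes it by at most 1).
true_count = 1_234
for eps in (0.1, 1.0, 10.0):
    noisy = laplace_mechanism(true_count, sensitivity=1.0, epsilon=eps)
    print(f"epsilon={eps}: noisy count = {noisy:.1f}")
```

As the abstract suggests, a guarantee of this form only bounds what the mechanism itself reveals about individual input records; it says nothing about what a pre-trained base model may already have memorized, which is where the case study's questions arise.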

From Existential to Existing Risks of Generative AI: A Taxonomy of Who Is at Risk, What Risks Are Prevalent, and How They Arise

Researchers: Megan Li and Wendy Bickersteth, Carnegie Mellon University

Abstract: Due to its general-purpose nature, Generative AI is applied in an ever-growing set of domains and tasks, leading to an expanding set of risks impacting people, communities, society, and the environment. These risks may arise due to failures during the design and development of the technology, its release, deployment, or downstream usages and appropriations of its outputs. In this paper, building on prior taxonomies of AI risks and failures, we construct both a taxonomy of Generative AI risks and a taxonomy of the sociotechnical failure modes that precipitate them through a systematic analysis of 499 publicly reported incidents. We'll walk through some example incidents and highlight those related to privacy. We describe what risks are reported, how they arose, and who they impact. We report the prevalence of each type of risk, failure mode, and affected human entity in our dataset, as well as their co-occurrences. We find that the majority of reported incidents are caused by use-related issues but pose risks to parties beyond the end user(s) of the Generative AI at fault. We argue that tracing and characterizing Generative AI failure modes to their downstream risks in the real world offers actionable insights for many stakeholders, including policymakers, developers, and Generative AI users. In particular, our results call for the prioritization of non-technical risk mitigation approaches.