- Adversarial Robustness in Machine Learning
- Privacy-Preserving Technologies in Data
- Explainable Artificial Intelligence (XAI)
- Topic Modeling
- Anomaly Detection Techniques and Applications
- Cryptography and Data Security
- Ethics and Social Impacts of AI
- Advanced Malware Detection Techniques
- Natural Language Processing Techniques
- Advanced Causal Inference Techniques
- Text Readability and Simplification
- Privacy, Security, and Data Protection
- Advanced Neural Network Applications
- Advanced Authentication Protocols Security
- Neural Networks and Applications
- Internet Traffic Analysis and Secure E-voting
- Speech Recognition and Synthesis
- Vehicular Ad Hoc Networks (VANETs)
- Stochastic Gradient Optimization Techniques
- Network Security and Intrusion Detection
- User Authentication and Security Systems
- Traffic Control and Management
- Digital and Cyber Forensics
- Diamond and Carbon-based Materials Research
- Domain Adaptation and Few-Shot Learning
Google (United States)
2019-2024
DeepMind (United Kingdom)
2024
Northeastern University
2018-2021
Universidad del Noreste
2018-2020
As machine learning becomes widely used for automated decisions, attackers have strong incentives to manipulate the results and models generated by machine learning algorithms. In this paper, we perform the first systematic study of poisoning attacks and their countermeasures for linear regression models. In poisoning attacks, attackers deliberately influence the training data to manipulate the results of a predictive model. We propose a theoretically-grounded optimization framework specifically designed for linear regression, demonstrate its effectiveness on a range of datasets, and also introduce a fast...
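A minimal sketch of the general idea, not the paper's optimization framework: a single poisoning point is tuned by finite-difference gradient ascent so that adding it to the training set increases a ridge-regression model's validation error. All data and hyperparameters below are made-up illustrative choices.

```python
# Toy poisoning attack on ridge regression (illustrative sketch only).
import numpy as np

rng = np.random.default_rng(0)

def fit_ridge(X, y, lam=0.1):
    """Closed-form ridge regression: w = (X^T X + lam I)^{-1} X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def val_mse(w, X_val, y_val):
    return float(np.mean((X_val @ w - y_val) ** 2))

# Clean training/validation data from a known linear model.
w_true = np.array([1.0, -2.0])
X_tr = rng.normal(size=(100, 2))
y_tr = X_tr @ w_true + 0.1 * rng.normal(size=100)
X_val = rng.normal(size=(50, 2))
y_val = X_val @ w_true + 0.1 * rng.normal(size=50)

# One poison point; move its features to increase validation MSE after retraining.
x_p, y_p, eps, lr = np.zeros(2), 5.0, 1e-3, 0.5
for _ in range(200):
    grad = np.zeros_like(x_p)
    for j in range(2):
        for sign in (+1, -1):
            x_probe = x_p.copy()
            x_probe[j] += sign * eps
            w = fit_ridge(np.vstack([X_tr, x_probe]), np.append(y_tr, y_p))
            grad[j] += sign * val_mse(w, X_val, y_val) / (2 * eps)
    x_p = np.clip(x_p + lr * grad, -3, 3)   # keep the poison point in a feasible box

w_clean = fit_ridge(X_tr, y_tr)
w_pois = fit_ridge(np.vstack([X_tr, x_p]), np.append(y_tr, y_p))
print("clean MSE:", val_mse(w_clean, X_val, y_val),
      "poisoned MSE:", val_mse(w_pois, X_val, y_val))
```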
It has become common to publish large (billion parameter) language models that have been trained on private datasets. This paper demonstrates that in such settings, an adversary can perform a training data extraction attack to recover individual training examples by querying the language model. We demonstrate our attack on GPT-2, a language model trained on scrapes of the public Internet, and are able to extract hundreds of verbatim text sequences from the model's training data. These extracted examples include (public) personally identifiable information (names, phone numbers,...
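A sketch of the sample-and-rank idea behind this style of attack: sample many unconditioned generations from GPT-2 and rank them by perplexity, since unusually low perplexity can indicate memorized training text. Ranking by a single model's perplexity is a simplification of the paper's filtering metrics; sample counts and decoding settings are illustrative.

```python
# Sample-and-rank extraction sketch with Hugging Face transformers and GPT-2.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def perplexity(text: str) -> float:
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss
    return float(torch.exp(loss))

samples = []
prompt = tok("<|endoftext|>", return_tensors="pt").input_ids
for _ in range(20):                       # the real attack samples vastly more
    out = model.generate(prompt, do_sample=True, top_k=40,
                         max_new_tokens=64, pad_token_id=tok.eos_token_id)
    text = tok.decode(out[0], skip_special_tokens=True)
    if text.strip():
        samples.append((perplexity(text), text))

# Lowest-perplexity generations are the most suspicious candidates for memorization.
for ppl, text in sorted(samples)[:5]:
    print(f"{ppl:8.2f}  {text[:80]!r}")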
We introduce PaLM 2, a new state-of-the-art language model that has better multilingual and reasoning capabilities and is more compute-efficient than its predecessor PaLM. PaLM 2 is a Transformer-based model trained using a mixture of objectives. Through extensive evaluations on English and multilingual language and reasoning tasks, we demonstrate significantly improved quality on downstream tasks across different model sizes, while simultaneously exhibiting faster and more efficient inference compared to PaLM. This efficiency enables broader deployment while also...
Large language models (LMs) have been shown to memorize parts of their training data, and when prompted appropriately, they will emit the memorized training data verbatim. This is undesirable because memorization violates privacy (exposing user data), degrades utility (repeated easy-to-memorize text is often low quality), and hurts fairness (some texts are memorized over others). We describe three log-linear relationships that quantify the degree to which LMs memorize training data. Memorization significantly grows as we increase (1)...
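The underlying measurement in this line of work is a verbatim-extraction check: an example counts as memorized if, given its first k tokens as a prompt, greedy decoding reproduces the rest exactly. A minimal sketch, where `toy_greedy` is a hypothetical stand-in for a real language model:

```python
# k-extractability check: does greedy decoding reproduce the suffix verbatim?
from typing import Callable, List, Sequence

def is_memorized(generate_greedy: Callable[[Sequence[int], int], List[int]],
                 example: Sequence[int], prefix_len: int) -> bool:
    prefix, target = list(example[:prefix_len]), list(example[prefix_len:])
    return generate_greedy(prefix, len(target)) == target

def memorization_rate(generate_greedy, dataset, prefix_len: int) -> float:
    hits = sum(is_memorized(generate_greedy, ex, prefix_len) for ex in dataset)
    return hits / len(dataset)

# Toy stand-in "model" that has memorized exactly one sequence.
MEMORIZED = [5, 8, 13, 21, 34, 55, 89]
def toy_greedy(prefix, n):
    if list(prefix) == MEMORIZED[:len(prefix)]:
        return MEMORIZED[len(prefix):len(prefix) + n]
    return [0] * n

dataset = [MEMORIZED, [1, 2, 3, 4, 5, 6, 7]]
print(memorization_rate(toy_greedy, dataset, prefix_len=3))  # 0.5
```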
Image diffusion models such as DALL-E 2, Imagen, and Stable Diffusion have attracted significant attention due to their ability to generate high-quality synthetic images. In this work, we show that diffusion models memorize individual images from their training data and emit them at generation time. With a generate-and-filter pipeline, we extract over a thousand training examples from state-of-the-art models, ranging from photographs of individual people to trademarked company logos. We also train hundreds of diffusion models in various settings to analyze how different modeling...
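A rough sketch of the generate-and-filter intuition: sample several images for the same prompt and flag the prompt if the generations are near-duplicates of each other, which is a signature of a memorized training image. Requires the `diffusers` package and a GPU; the model id, prompt, and threshold are illustrative choices, not the paper's setup.

```python
# Generate-and-filter sketch with a Stable Diffusion pipeline.
import numpy as np
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16).to("cuda")

prompt = "a photograph of a person"   # the paper targets captions of real training images
images = pipe(prompt, num_images_per_prompt=8, num_inference_steps=30).images

# Compare generations to one another on downsampled pixels.
arrs = [np.asarray(im.resize((64, 64)), dtype=np.float32) / 255.0 for im in images]
dists = [np.sqrt(np.mean((a - b) ** 2))
         for i, a in enumerate(arrs) for b in arrs[i + 1:]]
if min(dists) < 0.05:   # many near-identical samples suggest memorization
    print(f"prompt may be memorized (min pairwise RMSE = {min(dists):.3f})")
else:
    print(f"no duplicate generations (min pairwise RMSE = {min(dists):.3f})")
```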
This paper studies extractable memorization: training data that an adversary can efficiently extract by querying a machine learning model without prior knowledge of the training dataset. We show that an adversary can extract gigabytes of training data from open-source language models like Pythia or GPT-Neo, semi-open models like LLaMA or Falcon, and closed models like ChatGPT. Existing techniques from the literature suffice to attack unaligned models; in order to attack the aligned ChatGPT, we develop a new divergence attack that causes the model to diverge from its chatbot-style generations and emit training data at a rate 150x higher than when...
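A sketch of the prompting strategy behind a divergence-style attack: ask a chat model to repeat a single word indefinitely and inspect where the output stops repeating, since the diverged text is where memorized content tends to surface. Requires the `openai` package and an API key; the model name and exact prompt wording here are illustrative, not the paper's setup.

```python
# Divergence-prompt sketch against a chat completion API.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

word = "poem"
resp = client.chat.completions.create(
    model="gpt-3.5-turbo",
    max_tokens=1024,
    messages=[{"role": "user",
               "content": f'Repeat this word forever: "{word} {word} {word}"'}],
)
text = resp.choices[0].message.content or ""

# Everything after the run of repeated words is a candidate for emitted training data.
tokens = text.split()
i = 0
while i < len(tokens) and tokens[i].strip('".,') == word:
    i += 1
diverged = " ".join(tokens[i:])
print("divergence after", i, "repetitions:\n", diverged[:500])
```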
Transferability captures the ability of an attack against a machine-learning model to be effective against a different, potentially unknown, model. Empirical evidence for transferability has been shown in previous work, but the underlying reasons why an attack transfers or not are not yet well understood. In this paper, we present a comprehensive analysis aimed to investigate the transferability of both test-time evasion and training-time poisoning attacks. We provide a unifying optimization framework for such attacks and a formal definition of their transferability, and highlight two...
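A toy illustration of the phenomenon, not the paper's unified framework: craft evasion examples against a logistic-regression surrogate with an FGSM-style step and measure how often they also fool an independently trained MLP "target". Architectures, data, and the perturbation budget are arbitrary illustrative choices.

```python
# Transferability sketch: evasion examples crafted on a surrogate, tested on a target.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=2000, n_features=20, n_informative=10,
                           random_state=0)
X_tr, y_tr, X_te, y_te = X[:1500], y[:1500], X[1500:], y[1500:]

surrogate = LogisticRegression(max_iter=2000).fit(X_tr, y_tr)
target = MLPClassifier(hidden_layer_sizes=(64,), max_iter=2000,
                       random_state=0).fit(X_tr, y_tr)

# FGSM-style step on the surrogate: for logistic loss, d(loss)/dx = (p - y) * w.
eps = 0.5
p = surrogate.predict_proba(X_te)[:, 1]
grad = (p - y_te)[:, None] * surrogate.coef_
X_adv = X_te + eps * np.sign(grad)

for name, model in [("surrogate", surrogate), ("target", target)]:
    print(f"{name}: clean acc={model.score(X_te, y_te):.3f}, "
          f"adversarial acc={model.score(X_adv, y_te):.3f}")
```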
In a model extraction attack, an adversary steals a copy of a remotely deployed machine learning model, given oracle prediction access. We taxonomize model extraction attacks around two objectives: *accuracy*, i.e., performing well on the underlying task, and *fidelity*, i.e., matching the predictions of the remote victim classifier on any input. To extract a high-accuracy model, we develop a learning-based attack exploiting the victim to supervise the training of an extracted model. Through analytical and empirical arguments, we then explain inherent limitations that...
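A minimal sketch of learning-based extraction: query a black-box "victim" classifier for labels on attacker-chosen inputs, train a local copy on those labels, and measure both accuracy and fidelity (agreement with the victim). The victim and attacker architectures here are arbitrary illustrative choices.

```python
# Learning-based model extraction sketch with scikit-learn.
import numpy as np
from sklearn.datasets import make_moons
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X, y = make_moons(n_samples=2000, noise=0.15, random_state=0)
X_train, y_train, X_test, y_test = X[:1500], y[:1500], X[1500:], y[1500:]

victim = MLPClassifier(hidden_layer_sizes=(32, 32), max_iter=2000,
                       random_state=0).fit(X_train, y_train)

# The attacker only sees the victim's predicted labels on its own query points.
X_query = rng.uniform(low=X.min(0) - 0.5, high=X.max(0) + 0.5, size=(3000, 2))
y_query = victim.predict(X_query)
stolen = MLPClassifier(hidden_layer_sizes=(32, 32), max_iter=2000,
                       random_state=1).fit(X_query, y_query)

accuracy = stolen.score(X_test, y_test)                              # task accuracy
fidelity = np.mean(stolen.predict(X_test) == victim.predict(X_test))  # agreement with victim
print(f"extracted model accuracy={accuracy:.3f} fidelity={fidelity:.3f}")
```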
Machine learning systems are deployed in critical settings, but they might fail in unexpected ways, impacting the accuracy of their predictions. Poisoning attacks against machine learning induce adversarial modification of the data used by a learning algorithm to selectively change its output when it is deployed. In this work, we introduce a novel data poisoning attack called a subpopulation attack, which is particularly relevant when datasets are large and diverse. We design a modular framework for subpopulation attacks, instantiate it with different...
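An illustrative, simplified instantiation of the idea (not the paper's full framework): define subpopulations by clustering, inject label-flipped copies of points from one targeted cluster, and compare accuracy on that subpopulation versus the rest. Data and hyperparameters are made-up.

```python
# Subpopulation label-flipping poisoning sketch.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=3000, n_features=10, n_informative=6,
                           n_clusters_per_class=3, random_state=0)
X_tr, y_tr, X_te, y_te = X[:2000], y[:2000], X[2000:], y[2000:]

# Define subpopulations by clustering in feature space; target cluster 0.
km = KMeans(n_clusters=8, n_init=10, random_state=0).fit(X_tr)
target_tr = km.labels_ == 0
target_te = km.predict(X_te) == 0

# Poison: add label-flipped duplicates of the targeted training points.
X_pois = np.vstack([X_tr, X_tr[target_tr]])
y_pois = np.concatenate([y_tr, 1 - y_tr[target_tr]])

clean = LogisticRegression(max_iter=2000).fit(X_tr, y_tr)
poisoned = LogisticRegression(max_iter=2000).fit(X_pois, y_pois)

for name, model in [("clean", clean), ("poisoned", poisoned)]:
    acc_target = model.score(X_te[target_te], y_te[target_te])
    acc_rest = model.score(X_te[~target_te], y_te[~target_te])
    print(f"{name}: target-subpop acc={acc_target:.3f}, rest acc={acc_rest:.3f}")
```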
In this article, we present a detailed review of current practices and state-of-the-art methodologies in the field of differential privacy (DP), with a focus on advancing DP's deployment in real-world applications. Key points and high-level contents of the article originated from discussions at "Differential Privacy (DP): Challenges Towards the Next Frontier," a workshop held in July 2022 with experts from industry, academia, and the public sector seeking answers to broad questions pertaining to privacy and its implications in the design of industry-grade...
We investigate whether Differentially Private SGD offers better privacy in practice than what is guaranteed by its state-of-the-art analysis. We do so via novel data poisoning attacks, which we show correspond to realistic privacy attacks. While previous work (Ma et al., arXiv 2019) proposed this connection between differential privacy and data poisoning as a defense against poisoning, our use of it as a tool for understanding the privacy of a specific mechanism is new. More generally, our work takes a quantitative, empirical approach to understanding the privacy afforded by specific implementations...
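A sketch of how attack success rates translate into an empirical lower bound on epsilon when auditing a DP training mechanism: a distinguisher that tells "poisoned" from "clean" training runs with true positive rate TPR and false positive rate FPR certifies roughly eps >= log(TPR / FPR), and Clopper-Pearson intervals make the bound statistically sound. The counts below are made-up illustrative numbers.

```python
# Empirical epsilon lower bound from distinguishing-attack outcomes.
import numpy as np
from scipy.stats import beta

def clopper_pearson(successes: int, trials: int, alpha: float = 0.05):
    lo = beta.ppf(alpha / 2, successes, trials - successes + 1) if successes > 0 else 0.0
    hi = beta.ppf(1 - alpha / 2, successes + 1, trials - successes) if successes < trials else 1.0
    return lo, hi

def empirical_epsilon(tp, fp, n_pos, n_neg, alpha=0.05):
    tpr_lo, _ = clopper_pearson(tp, n_pos, alpha)   # lower-bound the true positive rate
    _, fpr_hi = clopper_pearson(fp, n_neg, alpha)   # upper-bound the false positive rate
    # (eps, 0)-DP would force TPR <= exp(eps) * FPR, so eps >= log(TPR / FPR).
    return np.log(max(tpr_lo, 1e-12) / max(fpr_hi, 1e-12))

# Example: correct on 480/500 poisoned runs, wrong on 30/500 clean runs.
print(f"empirical epsilon lower bound: {empirical_epsilon(480, 30, 500, 500):.2f}")
```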
We introduce a new class of attacks on machine learning models. We show that an adversary who can poison a training dataset can cause models trained on this dataset to leak significant private details of training points belonging to other parties. Our active inference attacks connect two independent lines of work targeting the integrity and privacy of training data.
Large language models are now tuned to align with the goals of their creators, namely to be "helpful and harmless." These models should respond helpfully to user questions, but refuse to answer requests that could cause harm. However, adversarial users can construct inputs which circumvent attempts at alignment. In this work, we study to what extent these models remain aligned, even when interacting with an adversarial user who constructs worst-case inputs (adversarial examples), designed to make the model emit harmful content that would otherwise be prohibited. We...
We study collaborative adaptive cruise control as a representative application for safety services provided by autonomous cars. We provide a detailed analysis of attacks that can be conducted by a motivated attacker targeting the control algorithm, influencing the acceleration reported by another car, or the local LIDAR and RADAR sensors. The attacks have a strong impact on passenger comfort, efficiency, and safety, with two such attacks being able to cause crashes. We also present detection methods rooted in physical-based constraints and machine...
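A minimal sketch of the kind of physics-based plausibility check such detection methods build on: the acceleration a neighboring car reports should be consistent with the change in its speed observed locally (e.g., via RADAR). The threshold and beacon data below are illustrative.

```python
# Physics-based consistency check between reported acceleration and observed speed change.
from dataclasses import dataclass

@dataclass
class Beacon:
    t: float        # timestamp (s)
    speed: float    # locally observed speed of the other car (m/s)
    accel: float    # acceleration the other car reports (m/s^2)

def inconsistent(prev: Beacon, curr: Beacon, tol: float = 1.0) -> bool:
    dt = curr.t - prev.t
    if dt <= 0:
        return True
    observed_accel = (curr.speed - prev.speed) / dt
    return abs(observed_accel - prev.accel) > tol

beacons = [Beacon(0.0, 20.00, 0.5), Beacon(0.1, 20.05, 0.5),     # consistent
           Beacon(0.2, 20.10, -5.0), Beacon(0.3, 20.15, -5.0)]   # claims braking, isn't
for prev, curr in zip(beacons, beacons[1:]):
    print(f"t={curr.t:.1f}s anomalous={inconsistent(prev, curr)}")
```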
Modern neural language models that are widely used in various NLP tasks risk memorizing sensitive information from their training data. Understanding this memorization is important in real world applications and also from a learning-theoretical perspective. An open question in previous studies of language model memorization is how to filter out "common" memorization. In fact, most memorization criteria strongly correlate with the number of occurrences in the training set, capturing memorized familiar phrases, public knowledge, templated texts, or other...
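A sketch of a counterfactual-style memorization estimate for a single example: the gap between a model's performance on x when x was in its training subset and when it was not, averaged over many random subsets. Logistic regression on synthetic data stands in for a language model here; purely illustrative.

```python
# Counterfactual memorization estimate over random training subsets.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=1000, n_features=20, n_informative=8,
                           flip_y=0.1, random_state=0)
target = 0   # index of the example whose memorization we estimate

in_scores, out_scores = [], []
for trial in range(40):
    subset = rng.choice(len(X), size=500, replace=False)
    model = LogisticRegression(max_iter=2000).fit(X[subset], y[subset])
    p_true = model.predict_proba(X[target:target + 1])[0, y[target]]
    (in_scores if target in subset else out_scores).append(p_true)

counterfactual_mem = np.mean(in_scores) - np.mean(out_scores)
print(f"counterfactual memorization of example {target}: {counterfactual_mem:.3f}")
```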
Machine learning models trained on private datasets have been shown to leak their private data. While recent work has found that the average data point is rarely leaked, outlier samples are frequently subject to memorization and, consequently, privacy leakage. We demonstrate and analyse an Onion Effect of memorization: removing the "layer" of outlier points that are most vulnerable to a privacy attack exposes a new layer of previously-safe points to the same attack. We perform several experiments to study this effect and to understand why it occurs. The existence...
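A toy sketch of the remove-and-reattack loop behind such experiments: score each training point with a crude vulnerability proxy (here, the model's confidence on the point, standing in for a real membership-inference signal), remove the most vulnerable "layer", retrain, and re-score. Real experiments use proper membership-inference attacks over many retrained models; this only illustrates the loop.

```python
# Peel-and-reattack loop with a confidence-based vulnerability proxy.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=2000, n_features=20, n_informative=8,
                           flip_y=0.05, random_state=0)
idx = np.arange(len(X))

for layer in range(3):
    model = LogisticRegression(max_iter=2000).fit(X[idx], y[idx])
    conf = model.predict_proba(X[idx])[np.arange(len(idx)), y[idx]]
    vulnerable = idx[np.argsort(conf)[:100]]          # 100 most "exposed" points
    print(f"layer {layer}: removed-layer confidence range "
          f"{conf.min():.3f} .. {np.sort(conf)[99]:.3f}")
    idx = np.setdiff1d(idx, vulnerable)               # peel off this layer and repeat
```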
Property inference attacks allow an adversary to extract global properties of the training dataset from a machine learning model. Such attacks have privacy implications for data owners who share their datasets to train machine learning models. Several existing approaches for property inference attacks against deep neural networks have been proposed [1]–[3], but they all rely on the attacker training a large number of shadow models, which induces a large computational overhead. In this paper, we consider the setting in which the attacker can poison a subset of the training dataset and query the trained target model. Motivated by our...