- Adversarial Robustness in Machine Learning
- Privacy-Preserving Technologies in Data
- Explainable Artificial Intelligence (XAI)
- Topic Modeling
- Anomaly Detection Techniques and Applications
- Cryptography and Data Security
- Ethics and Social Impacts of AI
- Advanced Malware Detection Techniques
- Natural Language Processing Techniques
- Advanced Causal Inference Techniques
- Text Readability and Simplification
- Privacy, Security, and Data Protection
- Advanced Neural Network Applications
- Advanced Authentication Protocols Security
- Neural Networks and Applications
- Internet Traffic Analysis and Secure E-voting
- Speech Recognition and Synthesis
- Vehicular Ad Hoc Networks (VANETs)
- Stochastic Gradient Optimization Techniques
- Network Security and Intrusion Detection
- User Authentication and Security Systems
- Traffic Control and Management
- Digital and Cyber Forensics
- Diamond and Carbon-based Materials Research
- Domain Adaptation and Few-Shot Learning
Google (United States)
2019-2024
DeepMind (United Kingdom)
2024
Northeastern University
2018-2021
Universidad del Noreste
2018-2020
As machine learning becomes widely used for automated decisions, attackers have strong incentives to manipulate the results and models generated by machine learning algorithms. In this paper, we perform the first systematic study of poisoning attacks and their countermeasures for linear regression models. In poisoning attacks, attackers deliberately influence the training data to manipulate the results of a predictive model. We propose a theoretically-grounded optimization framework specifically designed for linear regression, demonstrate its effectiveness on a range of datasets, and also introduce a fast...
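A minimal sketch of the general idea, not the paper's optimization framework: a single poisoning point is tuned by finite-difference gradient ascent so that adding it to the training set increases a ridge-regression model's validation error. All data and hyperparameters below are made-up illustrative choices.

```python
# Toy poisoning attack on ridge regression (illustrative sketch only).
import numpy as np

rng = np.random.default_rng(0)

def fit_ridge(X, y, lam=0.1):
    """Closed-form ridge regression: w = (X^T X + lam I)^{-1} X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def val_mse(w, X_val, y_val):
    return float(np.mean((X_val @ w - y_val) ** 2))

# Clean training/validation data from a known linear model.
w_true = np.array([1.0, -2.0])
X_tr = rng.normal(size=(100, 2))
y_tr = X_tr @ w_true + 0.1 * rng.normal(size=100)
X_val = rng.normal(size=(50, 2))
y_val = X_val @ w_true + 0.1 * rng.normal(size=50)

# One poison point; move its features to increase validation MSE after retraining.
x_p, y_p, eps, lr = np.zeros(2), 5.0, 1e-3, 0.5
for _ in range(200):
    grad = np.zeros_like(x_p)
    for j in range(2):
        for sign in (+1, -1):
            x_probe = x_p.copy()
            x_probe[j] += sign * eps
            w = fit_ridge(np.vstack([X_tr, x_probe]), np.append(y_tr, y_p))
            grad[j] += sign * val_mse(w, X_val, y_val) / (2 * eps)
    x_p = np.clip(x_p + lr * grad, -3, 3)   # keep the poison point in a feasible box

w_clean = fit_ridge(X_tr, y_tr)
w_pois = fit_ridge(np.vstack([X_tr, x_p]), np.append(y_tr, y_p))
print("clean MSE:", val_mse(w_clean, X_val, y_val),
      "poisoned MSE:", val_mse(w_pois, X_val, y_val))
```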
It has become common to publish large (billion parameter) language models that have been trained on private datasets. This paper demonstrates that in such settings, an adversary can perform a training data extraction attack to recover individual training examples by querying the language model. We demonstrate our attack on GPT-2, a language model trained on scrapes of the public Internet, and are able to extract hundreds of verbatim text sequences from the model's training data. These extracted examples include (public) personally identifiable information (names, phone numbers,...
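A sketch of the sample-and-rank idea behind this style of attack: sample many unconditioned generations from GPT-2 and rank them by perplexity, since unusually low perplexity can indicate memorized training text. Ranking by a single model's perplexity is a simplification of the paper's filtering metrics; sample counts and decoding settings are illustrative.

```python
# Sample-and-rank extraction sketch with Hugging Face transformers and GPT-2.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def perplexity(text: str) -> float:
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss
    return float(torch.exp(loss))

samples = []
prompt = tok("<|endoftext|>", return_tensors="pt").input_ids
for _ in range(20):                       # the real attack samples vastly more
    out = model.generate(prompt, do_sample=True, top_k=40,
                         max_new_tokens=64, pad_token_id=tok.eos_token_id)
    text = tok.decode(out[0], skip_special_tokens=True)
    if text.strip():
        samples.append((perplexity(text), text))

# Lowest-perplexity generations are the most suspicious candidates for memorization.
for ppl, text in sorted(samples)[:5]:
    print(f"{ppl:8.2f}  {text[:80]!r}")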
We introduce PaLM 2, a new state-of-the-art language model that has better multilingual and reasoning capabilities and is more compute-efficient than its predecessor PaLM. PaLM 2 is a Transformer-based model trained using a mixture of objectives. Through extensive evaluations on English and multilingual language and reasoning tasks, we demonstrate significantly improved quality on downstream tasks across different model sizes, while simultaneously exhibiting faster and more efficient inference compared to PaLM. This efficiency enables broader deployment while also...
Large language models (LMs) have been shown to memorize parts of their training data, and when prompted appropriately, they will emit the memorized training data verbatim. This is undesirable because memorization violates privacy (exposing user data), degrades utility (repeated easy-to-memorize text is often low quality), and hurts fairness (some texts are memorized over others). We describe three log-linear relationships that quantify the degree to which LMs memorize training data. Memorization significantly grows as we increase (1)...
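The underlying measurement in this line of work is a verbatim-extraction check: an example counts as memorized if, given its first k tokens as a prompt, greedy decoding reproduces the rest exactly. A minimal sketch, where `toy_greedy` is a hypothetical stand-in for a real language model:

```python
# k-extractability check: does greedy decoding reproduce the suffix verbatim?
from typing import Callable, List, Sequence

def is_memorized(generate_greedy: Callable[[Sequence[int], int], List[int]],
                 example: Sequence[int], prefix_len: int) -> bool:
    prefix, target = list(example[:prefix_len]), list(example[prefix_len:])
    return generate_greedy(prefix, len(target)) == target

def memorization_rate(generate_greedy, dataset, prefix_len: int) -> float:
    hits = sum(is_memorized(generate_greedy, ex, prefix_len) for ex in dataset)
    return hits / len(dataset)

# Toy stand-in "model" that has memorized exactly one sequence.
MEMORIZED = [5, 8, 13, 21, 34, 55, 89]
def toy_greedy(prefix, n):
    if list(prefix) == MEMORIZED[:len(prefix)]:
        return MEMORIZED[len(prefix):len(prefix) + n]
    return [0] * n

dataset = [MEMORIZED, [1, 2, 3, 4, 5, 6, 7]]
print(memorization_rate(toy_greedy, dataset, prefix_len=3))  # 0.5
```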
Image diffusion models such as DALL-E 2, Imagen, and Stable Diffusion have attracted significant attention due to their ability to generate high-quality synthetic images. In this work, we show that diffusion models memorize individual images from their training data and emit them at generation time. With a generate-and-filter pipeline, we extract over a thousand training examples from state-of-the-art models, ranging from photographs of individual people to trademarked company logos. We also train hundreds of diffusion models in various settings to analyze how different modeling...
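A rough sketch of the generate-and-filter intuition: sample several images for the same prompt and flag the prompt if the generations are near-duplicates of each other, which is a signature of a memorized training image. Requires the `diffusers` package and a GPU; the model id, prompt, and threshold are illustrative choices, not the paper's setup.

```python
# Generate-and-filter sketch with a Stable Diffusion pipeline.
import numpy as np
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16).to("cuda")

prompt = "a photograph of a person"   # the paper targets captions of real training images
images = pipe(prompt, num_images_per_prompt=8, num_inference_steps=30).images

# Compare generations to one another on downsampled pixels.
arrs = [np.asarray(im.resize((64, 64)), dtype=np.float32) / 255.0 for im in images]
dists = [np.sqrt(np.mean((a - b) ** 2))
         for i, a in enumerate(arrs) for b in arrs[i + 1:]]
if min(dists) < 0.05:   # many near-identical samples suggest memorization
    print(f"prompt may be memorized (min pairwise RMSE = {min(dists):.3f})")
else:
    print(f"no duplicate generations (min pairwise RMSE = {min(dists):.3f})")
```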
This paper studies extractable memorization: training data that an adversary can efficiently extract by querying a machine learning model without prior knowledge of the training dataset. We show that an adversary can extract gigabytes of training data from open-source language models like Pythia or GPT-Neo, semi-open models like LLaMA or Falcon, and closed models like ChatGPT. Existing techniques from the literature suffice to attack unaligned models; in order to attack the aligned ChatGPT, we develop a new divergence attack that causes the model to diverge from its chatbot-style generations and emit training data at a rate 150x higher than when...
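A sketch of the prompting strategy behind a divergence-style attack: ask a chat model to repeat a single word indefinitely and inspect where the output stops repeating, since the diverged text is where memorized content tends to surface. Requires the `openai` package and an API key; the model name and exact prompt wording here are illustrative, not the paper's setup.

```python
# Divergence-prompt sketch against a chat completion API.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

word = "poem"
resp = client.chat.completions.create(
    model="gpt-3.5-turbo",
    max_tokens=1024,
    messages=[{"role": "user",
               "content": f'Repeat this word forever: "{word} {word} {word}"'}],
)
text = resp.choices[0].message.content or ""

# Everything after the run of repeated words is a candidate for emitted training data.
tokens = text.split()
i = 0
while i < len(tokens) and tokens[i].strip('".,') == word:
    i += 1
diverged = " ".join(tokens[i:])
print("divergence after", i, "repetitions:\n", diverged[:500])
```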
Transferability captures the ability of an attack against a machine-learning model to be effective against a different, potentially unknown, model. Empirical evidence for transferability has been shown in previous work, but the underlying reasons why an attack transfers or not are not yet well understood. In this paper, we present a comprehensive analysis aimed to investigate the transferability of both test-time evasion and training-time poisoning attacks. We provide a unifying optimization framework for such attacks and a formal definition of their transferability, and highlight two...
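A toy illustration of the phenomenon, not the paper's unified framework: craft evasion examples against a logistic-regression surrogate with an FGSM-style step and measure how often they also fool an independently trained MLP "target". Architectures, data, and the perturbation budget are arbitrary illustrative choices.

```python
# Transferability sketch: evasion examples crafted on a surrogate, tested on a target.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=2000, n_features=20, n_informative=10,
                           random_state=0)
X_tr, y_tr, X_te, y_te = X[:1500], y[:1500], X[1500:], y[1500:]

surrogate = LogisticRegression(max_iter=2000).fit(X_tr, y_tr)
target = MLPClassifier(hidden_layer_sizes=(64,), max_iter=2000,
                       random_state=0).fit(X_tr, y_tr)

# FGSM-style step on the surrogate: for logistic loss, d(loss)/dx = (p - y) * w.
eps = 0.5
p = surrogate.predict_proba(X_te)[:, 1]
grad = (p - y_te)[:, None] * surrogate.coef_
X_adv = X_te + eps * np.sign(grad)

for name, model in [("surrogate", surrogate), ("target", target)]:
    print(f"{name}: clean acc={model.score(X_te, y_te):.3f}, "
          f"adversarial acc={model.score(X_adv, y_te):.3f}")
```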
In a model extraction attack, an adversary steals a copy of a remotely deployed machine learning model, given oracle prediction access. We taxonomize model extraction attacks around two objectives: *accuracy*, i.e., performing well on the underlying task, and *fidelity*, i.e., matching the predictions of the remote victim classifier on any input. To extract a high-accuracy model, we develop a learning-based attack exploiting the victim to supervise the training of an extracted model. Through analytical and empirical arguments, we then explain inherent limitations that...
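A minimal sketch of learning-based extraction: query a black-box "victim" classifier for labels on attacker-chosen inputs, train a local copy on those labels, and measure both accuracy and fidelity (agreement with the victim). The victim and attacker architectures here are arbitrary illustrative choices.

```python
# Learning-based model extraction sketch with scikit-learn.
import numpy as np
from sklearn.datasets import make_moons
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X, y = make_moons(n_samples=2000, noise=0.15, random_state=0)
X_train, y_train, X_test, y_test = X[:1500], y[:1500], X[1500:], y[1500:]

victim = MLPClassifier(hidden_layer_sizes=(32, 32), max_iter=2000,
                       random_state=0).fit(X_train, y_train)

# The attacker only sees the victim's predicted labels on its own query points.
X_query = rng.uniform(low=X.min(0) - 0.5, high=X.max(0) + 0.5, size=(3000, 2))
y_query = victim.predict(X_query)
stolen = MLPClassifier(hidden_layer_sizes=(32, 32), max_iter=2000,
                       random_state=1).fit(X_query, y_query)

accuracy = stolen.score(X_test, y_test)                              # task accuracy
fidelity = np.mean(stolen.predict(X_test) == victim.predict(X_test))  # agreement with victim
print(f"extracted model accuracy={accuracy:.3f} fidelity={fidelity:.3f}")
```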
Machine learning systems are deployed in critical settings, but they might fail in unexpected ways, impacting the accuracy of their predictions. Poisoning attacks against machine learning induce adversarial modification of the data used by a learning algorithm to selectively change its output when it is deployed. In this work, we introduce a novel data poisoning attack called a subpopulation attack, which is particularly relevant when datasets are large and diverse. We design a modular framework for subpopulation attacks, instantiate it with different...
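An illustrative, simplified instantiation of the idea (not the paper's full framework): define subpopulations by clustering, inject label-flipped copies of points from one targeted cluster, and compare accuracy on that subpopulation versus the rest. Data and hyperparameters are made-up.

```python
# Subpopulation label-flipping poisoning sketch.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=3000, n_features=10, n_informative=6,
                           n_clusters_per_class=3, random_state=0)
X_tr, y_tr, X_te, y_te = X[:2000], y[:2000], X[2000:], y[2000:]

# Define subpopulations by clustering in feature space; target cluster 0.
km = KMeans(n_clusters=8, n_init=10, random_state=0).fit(X_tr)
target_tr = km.labels_ == 0
target_te = km.predict(X_te) == 0

# Poison: add label-flipped duplicates of the targeted training points.
X_pois = np.vstack([X_tr, X_tr[target_tr]])
y_pois = np.concatenate([y_tr, 1 - y_tr[target_tr]])

clean = LogisticRegression(max_iter=2000).fit(X_tr, y_tr)
poisoned = LogisticRegression(max_iter=2000).fit(X_pois, y_pois)

for name, model in [("clean", clean), ("poisoned", poisoned)]:
    acc_target = model.score(X_te[target_te], y_te[target_te])
    acc_rest = model.score(X_te[~target_te], y_te[~target_te])
    print(f"{name}: target-subpop acc={acc_target:.3f}, rest acc={acc_rest:.3f}")
```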
In this article, we present a detailed review of current practices and state-of-the-art methodologies in the field of differential privacy (DP), with a focus on advancing DP's deployment in real-world applications. Key points and high-level contents of the article originated from discussions at "Differential Privacy (DP): Challenges Towards the Next Frontier," a workshop held in July 2022 with experts from industry, academia, and the public sector seeking answers to broad questions pertaining to privacy and its implications in the design of industry-grade...
We investigate whether Differentially Private SGD offers better privacy in practice than what is guaranteed by its state-of-the-art analysis. We do so via novel data poisoning attacks, which we show correspond to realistic privacy attacks. While previous work (Ma et al., arXiv 2019) proposed this connection between differential privacy and data poisoning as a defense against poisoning, our use of it as a tool for understanding the privacy of a specific mechanism is new. More generally, our work takes a quantitative, empirical approach to understanding the privacy afforded by specific implementations...
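A sketch of how attack success rates translate into an empirical lower bound on epsilon when auditing a DP training mechanism: a distinguisher that tells "poisoned" from "clean" training runs with true positive rate TPR and false positive rate FPR certifies roughly eps >= log(TPR / FPR), and Clopper-Pearson intervals make the bound statistically sound. The counts below are made-up illustrative numbers.

```python
# Empirical epsilon lower bound from distinguishing-attack outcomes.
import numpy as np
from scipy.stats import beta

def clopper_pearson(successes: int, trials: int, alpha: float = 0.05):
    lo = beta.ppf(alpha / 2, successes, trials - successes + 1) if successes > 0 else 0.0
    hi = beta.ppf(1 - alpha / 2, successes + 1, trials - successes) if successes < trials else 1.0
    return lo, hi

def empirical_epsilon(tp, fp, n_pos, n_neg, alpha=0.05):
    tpr_lo, _ = clopper_pearson(tp, n_pos, alpha)   # lower-bound the true positive rate
    _, fpr_hi = clopper_pearson(fp, n_neg, alpha)   # upper-bound the false positive rate
    # (eps, 0)-DP would force TPR <= exp(eps) * FPR, so eps >= log(TPR / FPR).
    return np.log(max(tpr_lo, 1e-12) / max(fpr_hi, 1e-12))

# Example: correct on 480/500 poisoned runs, wrong on 30/500 clean runs.
print(f"empirical epsilon lower bound: {empirical_epsilon(480, 30, 500, 500):.2f}")
```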
We introduce a new class of attacks on machine learning models. We show that an adversary who can poison a training dataset can cause models trained on this dataset to leak significant private details of training points belonging to other parties. Our active inference attacks connect two independent lines of work targeting the integrity and privacy of training data.
Large language models are now tuned to align with the goals of their creators, namely to be "helpful and harmless." These models should respond helpfully to user questions, but refuse to answer requests that could cause harm. However, adversarial users can construct inputs which circumvent attempts at alignment. In this work, we study to what extent these models remain aligned, even when interacting with an adversarial user who constructs worst-case inputs (adversarial examples), designed to make the model emit harmful content that would otherwise be prohibited. We...
We study collaborative adaptive cruise control as a representative application for safety services provided by autonomous cars. We provide a detailed analysis of attacks that can be conducted by a motivated attacker targeting the control algorithm, influencing the acceleration reported by another car, or the local LIDAR and RADAR sensors. The attacks have a strong impact on passenger comfort, efficiency, and safety, with two such attacks being able to cause crashes. We also present detection methods rooted in physical-based constraints and machine...
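A minimal sketch of the kind of physics-based plausibility check such detection methods build on: the acceleration a neighboring car reports should be consistent with the change in its speed observed locally (e.g., via RADAR). The threshold and beacon data below are illustrative.

```python
# Physics-based consistency check between reported acceleration and observed speed change.
from dataclasses import dataclass

@dataclass
class Beacon:
    t: float        # timestamp (s)
    speed: float    # locally observed speed of the other car (m/s)
    accel: float    # acceleration the other car reports (m/s^2)

def inconsistent(prev: Beacon, curr: Beacon, tol: float = 1.0) -> bool:
    dt = curr.t - prev.t
    if dt <= 0:
        return True
    observed_accel = (curr.speed - prev.speed) / dt
    return abs(observed_accel - prev.accel) > tol

beacons = [Beacon(0.0, 20.00, 0.5), Beacon(0.1, 20.05, 0.5),     # consistent
           Beacon(0.2, 20.10, -5.0), Beacon(0.3, 20.15, -5.0)]   # claims braking, isn't
for prev, curr in zip(beacons, beacons[1:]):
    print(f"t={curr.t:.1f}s anomalous={inconsistent(prev, curr)}")
```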
Modern neural language models that are widely used in various NLP tasks risk memorizing sensitive information from their training data. Understanding this memorization is important in real world applications and also from a learning-theoretical perspective. An open question in previous studies of language model memorization is how to filter out "common" memorization. In fact, most memorization criteria strongly correlate with the number of occurrences in the training set, capturing memorized familiar phrases, public knowledge, templated texts, or other...
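A sketch of a counterfactual-style memorization estimate for a single example: the gap between a model's performance on x when x was in its training subset and when it was not, averaged over many random subsets. Logistic regression on synthetic data stands in for a language model here; purely illustrative.

```python
# Counterfactual memorization estimate over random training subsets.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=1000, n_features=20, n_informative=8,
                           flip_y=0.1, random_state=0)
target = 0   # index of the example whose memorization we estimate

in_scores, out_scores = [], []
for trial in range(40):
    subset = rng.choice(len(X), size=500, replace=False)
    model = LogisticRegression(max_iter=2000).fit(X[subset], y[subset])
    p_true = model.predict_proba(X[target:target + 1])[0, y[target]]
    (in_scores if target in subset else out_scores).append(p_true)

counterfactual_mem = np.mean(in_scores) - np.mean(out_scores)
print(f"counterfactual memorization of example {target}: {counterfactual_mem:.3f}")
```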
Machine learning models trained on private datasets have been shown to leak their private data. While recent work has found that the average data point is rarely leaked, outlier samples are frequently subject to memorization and, consequently, privacy leakage. We demonstrate and analyse an Onion Effect of memorization: removing the "layer" of outlier points that are most vulnerable to a privacy attack exposes a new layer of previously-safe points to the same attack. We perform several experiments to study this effect and to understand why it occurs. The existence...
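A toy sketch of the remove-and-reattack loop behind such experiments: score each training point with a crude vulnerability proxy (here, the model's confidence on the point, standing in for a real membership-inference signal), remove the most vulnerable "layer", retrain, and re-score. Real experiments use proper membership-inference attacks over many retrained models; this only illustrates the loop.

```python
# Peel-and-reattack loop with a confidence-based vulnerability proxy.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=2000, n_features=20, n_informative=8,
                           flip_y=0.05, random_state=0)
idx = np.arange(len(X))

for layer in range(3):
    model = LogisticRegression(max_iter=2000).fit(X[idx], y[idx])
    conf = model.predict_proba(X[idx])[np.arange(len(idx)), y[idx]]
    vulnerable = idx[np.argsort(conf)[:100]]          # 100 most "exposed" points
    print(f"layer {layer}: removed-layer confidence range "
          f"{conf.min():.3f} .. {np.sort(conf)[99]:.3f}")
    idx = np.setdiff1d(idx, vulnerable)               # peel off this layer and repeat
```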
Property inference attacks allow an adversary to extract global properties of the training dataset from a machine learning model. Such attacks have privacy implications for data owners who share their datasets to train machine learning models. Several existing approaches for property inference attacks against deep neural networks have been proposed [1]–[3], but they all rely on the attacker training a large number of shadow models, which induces a large computational overhead. In this paper, we consider the setting in which the attacker can poison a subset of the training dataset and query the trained target model. Motivated by our...