- Ethics and Social Impacts of AI
- Privacy-Preserving Technologies in Data
- Law, AI, and Intellectual Property
- Explainable Artificial Intelligence (XAI)
- Blockchain Technology Applications and Security
- IPv6, Mobility, Handover, Networks, Security
- Privacy, Security, and Data Protection
- Adversarial Robustness in Machine Learning
- Machine Learning and Data Classification
- Topic Modeling
- Internet Traffic Analysis and Secure E-voting
- Artificial Intelligence in Law
- ICT Impact and Policies
- Generative Adversarial Networks and Image Synthesis
- Mobile Crowdsensing and Crowdsourcing
- Computational and Text Analysis Methods
- Natural Language Processing Techniques
- Data Quality and Management
- Markov Chains and Monte Carlo Methods
- Mobile Agent-Based Network Management
- Access Control and Trust
- Cryptography and Data Security
- Digital Transformation in Law
- Digital Humanities and Scholarship
- Machine Learning and Algorithms
Georgetown University
2025
Cornell University
2020-2024
Center for the Study of Democracy
2008
Background: Racial inequities for patients with heart failure (HF) have been widely documented. Patients with HF who receive cardiology care during a hospital admission have better outcomes. It is unknown whether there are differences in admission to a cardiology or general medicine service by race. This study examined the relationship between race and admission service, and its effect on 30-day readmission and mortality. Methods: We performed a retrospective cohort study from September 2008 to November 2017 at a single large urban academic referral center of all...
This paper studies extractable memorization: training data that an adversary can efficiently extract by querying a machine learning model without prior knowledge of the training dataset. We show that an adversary can extract gigabytes of training data from open-source language models like Pythia or GPT-Neo, semi-open models like LLaMA or Falcon, and closed models like ChatGPT. Existing extraction techniques from the literature suffice to attack unaligned models; in order to attack the aligned ChatGPT, we develop a new divergence attack that causes the model to diverge from its chatbot-style generations and emit training data at a rate 150x higher than when...
In 1996, Accountability in a Computerized Society [95] issued a clarion call concerning the erosion of accountability in society due to the ubiquitous delegation of consequential functions to computerized systems. Nissenbaum described four barriers to accountability that computerization presented, which we revisit in relation to the ascendance of data-driven algorithmic systems—i.e., machine learning or artificial intelligence—to uncover new challenges for accountability that these systems present. Nissenbaum's original paper grounded its discussion in moral...
Variance in predictions across different trained models is a significant, under-explored source of error in fair binary classification. In practice, the variance on some data examples is so large that decisions can be effectively arbitrary. To investigate this problem, we take an experimental approach and make four overarching contributions. We: 1) Define a metric called self-consistency, derived from variance, which we use as a proxy for measuring and reducing arbitrariness; 2) Develop an ensembling algorithm...
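The self-consistency idea in this abstract can be sketched under a simplified pairwise-agreement definition (not necessarily the paper's exact formula): given the 0/1 labels that models trained on different resamples assign to one example, compute the fraction of model pairs that agree. Values near 0.5 mean the decision is effectively a coin flip.

```python
import itertools

def self_consistency(predictions):
    """Fraction of model pairs agreeing on one example's binary label.

    `predictions` are the 0/1 outputs of models trained on different
    bootstrap resamples. Near 1.0: stable decision; near 0.5: the
    decision is effectively arbitrary. Illustrative definition only.
    """
    pairs = list(itertools.combinations(predictions, 2))
    agree = sum(1 for a, b in pairs if a == b)
    return agree / len(pairs)

print(self_consistency([0, 1] * 5))  # ten models split 5/5: ~0.44
print(self_consistency([1] * 10))    # ten models unanimous: 1.0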
Fairness, Accountability, and Transparency (FAccT) for socio-technical systems has been a thriving area of research in recent years. An ACM conference bearing the same name has been the central venue for scholars in this area to come together, provide peer feedback to one another, and publish their work. This reflexive study aims to shed light on FAccT's activities to date and to identify major gaps and opportunities for translating its contributions into broader positive impact. To this end, we utilize a mixed-methods research design. On the qualitative front,...
This essay is an attempt to work systematically through the copyright infringement analysis of the generative AI supply chain. Our goal is not to provide a definitive answer as to whether and when training or using generative AI is infringing conduct. Rather, we aim to map the surprisingly large number of live issues that generative AI raises, and to identify the key decision points at which the analysis forks in interesting ways.
Creating and managing individual identities is a central challenge of the digital age. As identity management systems, defined here as programs or frameworks that administer the collection, authentication, and use of information linked to individuals, are implemented in both the public and private sectors, individuals are required to identify themselves with increasing frequency. Traditional systems run by organizations control all mechanisms for authentication (establishing confidence in an identity claim's truth) and authorization (deciding what an individual should...
As popular search engines face the sometimes conflicting interests of protecting privacy while retaining query logs for a variety of uses, numerous technical measures have been suggested to both enhance privacy and preserve at least a portion of the utility of query logs. This article seeks to assess seven of these techniques against three sets of criteria: (1) how well the technique protects privacy, (2) how well it preserves the utility of the logs, and (3) how well it might be implemented as a user control. A user control is defined as a mechanism that allows individual Internet users...
Across machine learning (ML) sub-disciplines, researchers make explicit mathematical assumptions in order to facilitate proof-writing. We note that, specifically in the area of fairness-accuracy trade-off optimization scholarship, similar attention is not paid to the normative assumptions that ground this approach. Such assumptions presume that 1) accuracy and fairness are in inherent opposition to one another, 2) strict notions of equality can adequately model fairness, and 3) it is possible to measure the accuracy and fairness of decisions independent from historical...
"Does generative AI infringe copyright?" is an urgent question. It is also a difficult question, for two reasons. First, "generative AI" is not just one product from one company. It is a catch-all name for a massive ecosystem of loosely related technologies. These systems behave differently and raise different legal issues. Second, copyright law is notoriously complicated, and generative-AI systems manage to touch on a great many corners of it. They raise issues of authorship, similarity, direct and indirect liability, and fair use, among much else....
The measurement tasks involved in evaluating generative AI (GenAI) systems are especially difficult, leading to what has been described as "a tangle of sloppy tests [and] apples-to-oranges comparisons" (Roose, 2024). In this position paper, we argue that the ML community would benefit from learning from and drawing on the social sciences when developing and using measurement instruments for evaluating GenAI systems. Specifically, our position is that evaluating GenAI systems is a social science measurement challenge. We present a four-level framework, grounded in measurement theory from the social sciences, for measuring...
Social media platforms have been accused of causing a range of harms, resulting in dozens of lawsuits across jurisdictions. These lawsuits are situated within the context of a long history of American product safety litigation, suggesting opportunities for remediation outside of financial compensation. Anticipating that at least some of these cases may be successful and/or lead to settlements, this article outlines an implementable mechanism for an abatement and/or settlement plan capable of mitigating abuse. The paper...
We introduce the first model-stealing attack that extracts precise, nontrivial information from black-box production language models like OpenAI's ChatGPT or Google's PaLM-2. Specifically, our attack recovers the embedding projection layer (up to symmetries) of a transformer model, given typical API access. For under \$20 USD, our attack recovers the entire projection matrix of the Ada and Babbage language models. We thereby confirm, for the first time, that these black-box models have a hidden dimension of 1024 and 2048, respectively. We also recover the exact hidden dimension size of gpt-3.5-turbo, and estimate it would cost...
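The hidden-dimension recovery this abstract alludes to rests on a linear-algebra fact that is easy to demonstrate: final logits are a linear projection of a lower-dimensional hidden state, so a matrix of logit vectors collected from many queries has rank at most the hidden dimension. The sketch below demonstrates only that fact on synthetic matrices (all sizes invented); it is not the paper's attack, which must work through a real, restricted API.

```python
import numpy as np

# Toy output head: logits = W @ h, with W mapping a hidden vector
# (dim h_dim) to vocab-size logits. Every logit vector lies in the
# column space of W, so stacked logits have rank <= h_dim.
rng = np.random.default_rng(0)
vocab, h_dim, n_queries = 500, 64, 200   # invented sizes

W = rng.normal(size=(vocab, h_dim))      # hidden-to-logit projection
H = rng.normal(size=(h_dim, n_queries))  # hidden states from queries
logits = W @ H                           # (vocab, n_queries), "observed"

s = np.linalg.svd(logits, compute_uv=False)
recovered = int(np.sum(s > 1e-6 * s[0]))  # numerical rank
print(recovered)                          # 64, i.e. equal to h_dim
```

Counting the numerically significant singular values of the observed logit matrix thus reveals the hidden dimension, which is the core observation behind confirming the 1024 and 2048 figures.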
A central issue in copyright lawsuits against generative-AI companies is the degree to which a model does or does not "memorize" the data it was trained on. Unfortunately, this debate has been clouded by ambiguity over what "memorization" is, leading participants in legal debates to often talk past one another. In this essay, we attempt to bring clarity to the conversation about memorization.
Trade-offs between accuracy and efficiency pervade law, public health, and other non-computing domains, which have developed policies to guide how to balance the two in conditions of uncertainty. While computer science also commonly studies accuracy-efficiency trade-offs, their policy implications remain poorly examined. Drawing on risk assessment practices in the US, we argue that, since examining these trade-offs has been useful for guiding governance in other domains, we need to similarly reckon with them in governing computer systems. We...
Legal literature on machine learning (ML) tends to focus on harms, and thus tends to reason about individual model outcomes and summary error rates. This focus has masked important aspects of ML that are rooted in its reliance on randomness -- namely, stochasticity and non-determinism. While some recent work has begun to examine the relationship between stochasticity and arbitrariness in legal contexts, the role of non-determinism more broadly remains unexamined. In this paper, we clarify the overlap and differences between these two concepts, and show the effects of non-determinism,...
Stochastic gradient Hamiltonian Monte Carlo (SGHMC) is an efficient method for sampling from continuous distributions. It is a faster alternative to HMC: instead of using the whole dataset at each iteration, SGHMC uses only a subsample. This improves performance, but introduces bias that can cause SGHMC to converge to the wrong distribution. One can prevent this by using a step size that decays to zero, but such a schedule can drastically slow down convergence. To address this tension, we propose a novel second-order SG-MCMC algorithm---AMAGOLD---that...
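As a rough illustration of the SGHMC-style dynamics this abstract describes (the plain first-order update, not AMAGOLD itself), the sketch below samples a 1-D standard normal: friction plus injected noise compensate for the noisy gradient. All constants here are arbitrary choices for the demo.

```python
import numpy as np

# Toy SGHMC-style sampler for a 1-D standard normal target:
# U(x) = x^2 / 2, so grad U(x) = x. Minibatch noise is mimicked by
# perturbing the gradient. Constants are illustrative only.
rng = np.random.default_rng(1)
eta, alpha, n_steps = 1e-2, 0.1, 200_000  # step size, friction, iters

x, v = 0.0, 0.0
samples = np.empty(n_steps)
for t in range(n_steps):
    grad = x + 0.1 * rng.standard_normal()      # "stochastic" gradient
    # Friction (alpha) and injected noise sqrt(2*alpha*eta) keep the
    # chain near the target despite the gradient noise.
    v = ((1 - alpha) * v - eta * grad
         + np.sqrt(2 * alpha * eta) * rng.standard_normal())
    x = x + v
    samples[t] = x

# Empirical moments should be near the target's (mean 0, std 1).
print(round(float(samples.mean()), 2), round(float(samples.std()), 2))
```

With a fixed step size this chain carries a small discretization bias, which is exactly the bias that decaying step sizes, or a second-order corrected method like AMAGOLD, are designed to remove.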
Algorithmic fairness has emphasized the role of biased data in automated decision outcomes. Recently, there has been a shift in attention to sources of bias that implicate other stages of the ML pipeline. We contend that one such source of bias, human preferences in model selection, remains under-explored in terms of its disparate impact across demographic groups. Using a deep learning model trained on real-world medical imaging data, we verify our claim empirically and argue that the choice of metric for model comparison, especially metrics that do not...