- Ethics and Social Impacts of AI
- Privacy-Preserving Technologies in Data
- Law, AI, and Intellectual Property
- Explainable Artificial Intelligence (XAI)
- Blockchain Technology Applications and Security
- IPv6, Mobility, Handover, Networks, Security
- Privacy, Security, and Data Protection
- Adversarial Robustness in Machine Learning
- Machine Learning and Data Classification
- Topic Modeling
- Internet Traffic Analysis and Secure E-voting
- Artificial Intelligence in Law
- ICT Impact and Policies
- Generative Adversarial Networks and Image Synthesis
- Mobile Crowdsensing and Crowdsourcing
- Computational and Text Analysis Methods
- Natural Language Processing Techniques
- Data Quality and Management
- Markov Chains and Monte Carlo Methods
- Mobile Agent-Based Network Management
- Access Control and Trust
- Cryptography and Data Security
- Digital Transformation in Law
- Digital Humanities and Scholarship
- Machine Learning and Algorithms
Georgetown University
2025
Cornell University
2020-2024
Center for the Study of Democracy
2008
Background: Racial inequities for patients with heart failure (HF) have been widely documented. Patients with HF who receive cardiology care during a hospital admission have better outcomes. It is unknown whether there are differences in admission to a cardiology or general medicine service by race. This study examined the relationship between race and admission service, and its effect on 30-day readmission and mortality. Methods: We performed a retrospective cohort study from September 2008 to November 2017 at a single large urban academic referral center of all...
This paper studies extractable memorization: training data that an adversary can efficiently extract by querying a machine learning model without prior knowledge of the training dataset. We show that an adversary can extract gigabytes of training data from open-source language models like Pythia or GPT-Neo, semi-open models like LLaMA or Falcon, and closed models like ChatGPT. Existing extraction techniques from the literature suffice to attack unaligned models; in order to attack the aligned ChatGPT, we develop a new divergence attack that causes the model to diverge from its chatbot-style generations and emit training data at a rate 150x higher than when...
In 1996, Accountability in a Computerized Society [95] issued a clarion call concerning the erosion of accountability in society due to the ubiquitous delegation of consequential functions to computerized systems. Nissenbaum described four barriers to accountability that computerization presented, which we revisit in relation to the ascendance of data-driven algorithmic systems—i.e., machine learning or artificial intelligence—to uncover new challenges for accountability that these systems present. Nissenbaum's original paper grounded its discussion in moral...
Variance in predictions across different trained models is a significant, under-explored source of error in fair binary classification. In practice, the variance on some data examples is so large that decisions can be effectively arbitrary. To investigate this problem, we take an experimental approach and make four overarching contributions. We: 1) Define a metric called self-consistency, derived from variance, which we use as a proxy for measuring and reducing arbitrariness; 2) Develop an ensembling algorithm...
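The self-consistency idea in this abstract can be sketched under a simplified pairwise-agreement definition (not necessarily the paper's exact formula): given the 0/1 labels that models trained on different resamples assign to one example, compute the fraction of model pairs that agree. Values near 0.5 mean the decision is effectively a coin flip.

```python
import itertools

def self_consistency(predictions):
    """Fraction of model pairs agreeing on one example's binary label.

    `predictions` are the 0/1 outputs of models trained on different
    bootstrap resamples. Near 1.0: stable decision; near 0.5: the
    decision is effectively arbitrary. Illustrative definition only.
    """
    pairs = list(itertools.combinations(predictions, 2))
    agree = sum(1 for a, b in pairs if a == b)
    return agree / len(pairs)

print(self_consistency([0, 1] * 5))  # ten models split 5/5: ~0.44
print(self_consistency([1] * 10))    # ten models unanimous: 1.0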
Fairness, Accountability, and Transparency (FAccT) for socio-technical systems has been a thriving area of research in recent years. An ACM conference bearing the same name has been the central venue for scholars in this area to come together, provide peer feedback to one another, and publish their work. This reflexive study aims to shed light on FAccT's activities to date and to identify major gaps and opportunities for translating its contributions into broader positive impact. To this end, we utilize a mixed-methods research design. On the qualitative front,...
This essay is an attempt to work systematically through the copyright infringement analysis of the generative AI supply chain. Our goal is not to provide a definitive answer as to whether and when training or using generative AI is infringing conduct. Rather, we aim to map the surprisingly large number of live issues that generative AI raises, and to identify the key decision points at which the analysis forks in interesting ways.
Creating and managing individual identities is a central challenge of the digital age. As identity management systems, defined here as programs or frameworks that administer the collection, authentication, and use of information linked to individuals, are implemented in both the public and private sectors, individuals are required to identify themselves with increasing frequency. Traditional systems run by organizations control all mechanisms for authentication (establishing confidence in an identity claim's truth) and authorization (deciding what an individual should...
As popular search engines face the sometimes conflicting interests of protecting privacy while retaining query logs for a variety of uses, numerous technical measures have been suggested to both enhance privacy and preserve at least a portion of the utility of query logs. This article seeks to assess seven of these techniques against three sets of criteria: (1) how well the technique protects privacy, (2) how well it preserves the utility of the logs, and (3) how well it might be implemented as a user control. A user control is defined as a mechanism that allows individual Internet users...
Across machine learning (ML) sub-disciplines, researchers make explicit mathematical assumptions in order to facilitate proof-writing. We note that, specifically in the area of fairness-accuracy trade-off optimization scholarship, similar attention is not paid to the normative assumptions that ground this approach. Such assumptions presume that 1) accuracy and fairness are in inherent opposition to one another, 2) strict notions of equality can adequately model fairness, and 3) it is possible to measure the accuracy and fairness of decisions independent from historical...
"Does generative AI infringe copyright?" is an urgent question. It is also a difficult question, for two reasons. First, "generative AI" is not just one product from one company. It is a catch-all name for a massive ecosystem of loosely related technologies. These systems behave differently and raise different legal issues. Second, copyright law is notoriously complicated, and generative-AI systems manage to touch on a great many corners of it. They raise issues of authorship, similarity, direct and indirect liability, and fair use, among much else....
The measurement tasks involved in evaluating generative AI (GenAI) systems are especially difficult, leading to what has been described as "a tangle of sloppy tests [and] apples-to-oranges comparisons" (Roose, 2024). In this position paper, we argue that the ML community would benefit from learning from and drawing on the social sciences when developing and using measurement instruments for evaluating GenAI systems. Specifically, our position is that evaluating GenAI systems is a social science measurement challenge. We present a four-level framework, grounded in measurement theory from the social sciences, for measuring...
Social media platforms have been accused of causing a range of harms, resulting in dozens of lawsuits across jurisdictions. These lawsuits are situated within the context of a long history of American product safety litigation, suggesting opportunities for remediation outside of financial compensation. Anticipating that at least some of these cases may be successful and/or lead to settlements, this article outlines an implementable mechanism for an abatement and/or settlement plan capable of mitigating abuse. The paper...
We introduce the first model-stealing attack that extracts precise, nontrivial information from black-box production language models like OpenAI's ChatGPT or Google's PaLM-2. Specifically, our attack recovers the embedding projection layer (up to symmetries) of a transformer model, given typical API access. For under \$20 USD, our attack recovers the entire projection matrix of the Ada and Babbage language models. We thereby confirm, for the first time, that these black-box models have a hidden dimension of 1024 and 2048, respectively. We also recover the exact hidden dimension size of gpt-3.5-turbo, and estimate it would cost...
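The hidden-dimension recovery this abstract alludes to rests on a linear-algebra fact that is easy to demonstrate: final logits are a linear projection of a lower-dimensional hidden state, so a matrix of logit vectors collected from many queries has rank at most the hidden dimension. The sketch below demonstrates only that fact on synthetic matrices (all sizes invented); it is not the paper's attack, which must work through a real, restricted API.

```python
import numpy as np

# Toy output head: logits = W @ h, with W mapping a hidden vector
# (dim h_dim) to vocab-size logits. Every logit vector lies in the
# column space of W, so stacked logits have rank <= h_dim.
rng = np.random.default_rng(0)
vocab, h_dim, n_queries = 500, 64, 200   # invented sizes

W = rng.normal(size=(vocab, h_dim))      # hidden-to-logit projection
H = rng.normal(size=(h_dim, n_queries))  # hidden states from queries
logits = W @ H                           # (vocab, n_queries), "observed"

s = np.linalg.svd(logits, compute_uv=False)
recovered = int(np.sum(s > 1e-6 * s[0]))  # numerical rank
print(recovered)                          # 64, i.e. equal to h_dim
```

Counting the numerically significant singular values of the observed logit matrix thus reveals the hidden dimension, which is the core observation behind confirming the 1024 and 2048 figures.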
A central issue in copyright lawsuits against generative-AI companies is the degree to which a model does or does not "memorize" the data it was trained on. Unfortunately, this debate has been clouded by ambiguity over what "memorization" is, leading participants in legal debates to often talk past one another. In this essay, we attempt to bring clarity to the conversation about memorization.
Trade-offs between accuracy and efficiency pervade law, public health, and other non-computing domains, which have developed policies to guide how to balance the two in conditions of uncertainty. While computer science also commonly studies accuracy-efficiency trade-offs, their policy implications remain poorly examined. Drawing on risk assessment practices in the US, we argue that, since examining these trade-offs has been useful for guiding governance in other domains, we need to similarly reckon with them in governing computer systems. We...
Legal literature on machine learning (ML) tends to focus on harms, and thus tends to reason about individual model outcomes and summary error rates. This focus has masked important aspects of ML that are rooted in its reliance on randomness -- namely, stochasticity and non-determinism. While some recent work has begun to examine the relationship between stochasticity and arbitrariness in legal contexts, the role of non-determinism more broadly remains unexamined. In this paper, we clarify the overlap and differences between these two concepts, and show the effects of non-determinism,...
Stochastic gradient Hamiltonian Monte Carlo (SGHMC) is an efficient method for sampling from continuous distributions. It is a faster alternative to HMC: instead of using the whole dataset at each iteration, SGHMC uses only a subsample. This improves performance, but introduces bias that can cause SGHMC to converge to the wrong distribution. One can prevent this by using a step size that decays to zero, but such a schedule can drastically slow down convergence. To address this tension, we propose a novel second-order SG-MCMC algorithm---AMAGOLD---that...
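As a rough illustration of the SGHMC-style dynamics this abstract describes (the plain first-order update, not AMAGOLD itself), the sketch below samples a 1-D standard normal: friction plus injected noise compensate for the noisy gradient. All constants here are arbitrary choices for the demo.

```python
import numpy as np

# Toy SGHMC-style sampler for a 1-D standard normal target:
# U(x) = x^2 / 2, so grad U(x) = x. Minibatch noise is mimicked by
# perturbing the gradient. Constants are illustrative only.
rng = np.random.default_rng(1)
eta, alpha, n_steps = 1e-2, 0.1, 200_000  # step size, friction, iters

x, v = 0.0, 0.0
samples = np.empty(n_steps)
for t in range(n_steps):
    grad = x + 0.1 * rng.standard_normal()      # "stochastic" gradient
    # Friction (alpha) and injected noise sqrt(2*alpha*eta) keep the
    # chain near the target despite the gradient noise.
    v = ((1 - alpha) * v - eta * grad
         + np.sqrt(2 * alpha * eta) * rng.standard_normal())
    x = x + v
    samples[t] = x

# Empirical moments should be near the target's (mean 0, std 1).
print(round(float(samples.mean()), 2), round(float(samples.std()), 2))
```

With a fixed step size this chain carries a small discretization bias, which is exactly the bias that decaying step sizes, or a second-order corrected method like AMAGOLD, are designed to remove.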
Algorithmic fairness has emphasized the role of biased data in automated decision outcomes. Recently, there has been a shift in attention to sources of bias that implicate other stages of the ML pipeline. We contend that one such source of bias, human preferences in model selection, remains under-explored in terms of its disparate impact across demographic groups. Using a deep learning model trained on real-world medical imaging data, we verify our claim empirically and argue that the choice of metric for model comparison, especially metrics that do not...