Alex Tamkin

ORCID: 0009-0006-0007-3746
Research Areas
  • Topic Modeling
  • Natural Language Processing Techniques
  • Domain Adaptation and Few-Shot Learning
  • Speech Recognition and Synthesis
  • Multimodal Machine Learning Applications
  • Remote-Sensing Image Classification
  • Reinforcement Learning in Robotics
  • Ethics and Social Impacts of AI
  • Adversarial Robustness in Machine Learning
  • Advanced Bandit Algorithms Research
  • Artificial Intelligence in Healthcare and Education
  • Generative Adversarial Networks and Image Synthesis
  • Music and Audio Processing
  • Machine Learning in Healthcare
  • Explainable Artificial Intelligence (XAI)
  • Simulation Techniques and Applications
  • Text Readability and Simplification
  • Stellar, planetary, and galactic studies
  • Data Stream Mining Techniques
  • Socioeconomic Development in MENA
  • Speech and Audio Processing
  • Advanced Vision and Imaging
  • Hearing Loss and Rehabilitation
  • Experimental Behavioral Economics Studies
  • Fullerene Chemistry and Applications

Stanford University
2019-2025

Chinese University of Hong Kong
2022

University of Michigan
2022

University of Minnesota
2022

Columbia University
2022

University of California, Santa Cruz
2022

Bauhaus-Universität Weimar
2022

Leipzig University
2022

University of Minnesota System
2022

North Carolina State University
2022

AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are trained on broad data at scale and are adaptable to a wide range of downstream tasks. We call these models foundation models to underscore their critically central yet incomplete character. This report provides a thorough account of the opportunities and risks of foundation models, ranging from their capabilities (e.g., language, vision, robotics, reasoning, human interaction) and technical principles (e.g., model architectures, training procedures, data, systems, ...)

10.48550/arxiv.2108.07258 preprint EN cc-by arXiv (Cornell University) 2021-01-01

On October 14th, 2020, researchers from OpenAI, the Stanford Institute for Human-Centered Artificial Intelligence, and other universities convened to discuss open research questions surrounding GPT-3, the largest publicly disclosed dense language model at the time. The meeting took place under Chatham House Rules. Discussants came from a variety of backgrounds including computer science, linguistics, philosophy, political science, communications, cyber policy, and more. Broadly, the discussion centered around two main...

10.48550/arxiv.2102.02503 preprint EN other-oa arXiv (Cornell University) 2021-01-01

Large language models (LLMs) may not equitably represent diverse global perspectives on societal issues. In this paper, we develop a quantitative framework to evaluate whose opinions model-generated responses are more similar to. We first build a dataset, GlobalOpinionQA, comprised of questions and answers from cross-national surveys designed to capture opinions on issues across different countries. Next, we define a metric that quantifies the similarity between LLM-generated survey responses and human responses, conditioned...

10.48550/arxiv.2306.16388 preprint EN cc-by arXiv (Cornell University) 2023-01-01
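As an illustration of the kind of similarity metric the abstract describes, the sketch below compares a model's answer distribution over a survey question's options against one country's human answer distribution. The use of 1 minus the Jensen-Shannon distance and the `opinion_similarity` helper are assumptions for this sketch, not necessarily the paper's exact metric.

```python
import numpy as np
from scipy.spatial.distance import jensenshannon

def opinion_similarity(model_probs, country_probs):
    """Similarity between a model's answer distribution over one survey
    question's options and one country's human answer distribution.
    1 - Jensen-Shannon distance is assumed here as the similarity measure."""
    return 1.0 - jensenshannon(model_probs, country_probs)

# Hypothetical 4-option survey question
model_dist = np.array([0.10, 0.20, 0.60, 0.10])   # model's answer probabilities
human_dist = np.array([0.05, 0.15, 0.65, 0.15])   # one country's survey responses
print(opinion_similarity(model_dist, human_dist))  # closer to 1 = more similar
```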

There is growing consensus that language model (LM) developers should not be the sole deciders of LM behavior, creating a need for methods that enable the broader public to collectively shape the behavior of LM systems that affect them. To address this need, we present Collective Constitutional AI (CCAI): a multi-stage process for sourcing and integrating public input into LMs—from identifying a target population and sourcing principles to training and evaluating a model. We demonstrate the real-world practicality of this approach by creating what is, to our knowledge, the first...

10.1145/3630106.3658979 article EN cc-by ACM Conference on Fairness, Accountability, and Transparency (FAccT) 2024-06-03

How does language model pretraining help transfer learning? We consider a simple ablation technique for determining the impact of each pretrained layer on transfer task performance. This method, partial reinitialization, involves replacing different layers of a pretrained model with random weights, then finetuning the entire model on the transfer task and observing the change in performance. It reveals that in BERT, layers with high probing performance on downstream GLUE tasks are neither necessary nor sufficient for high accuracy on those tasks. Furthermore, the benefit of using pretrained parameters for a layer varies...

10.18653/v1/2020.findings-emnlp.125 article EN cc-by 2020-01-01
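A minimal sketch of the partial-reinitialization ablation the abstract describes, using PyTorch and Hugging Face Transformers. The `partially_reinitialize` helper and the choice of resetting blocks 8–11 are illustrative assumptions, and `reset_parameters` applies PyTorch's default init rather than BERT's exact pretraining init.

```python
import copy
from transformers import AutoModelForSequenceClassification

def partially_reinitialize(model, layers_to_reset):
    """Replace selected transformer blocks with randomly initialized weights,
    keeping every other pretrained parameter intact."""
    model = copy.deepcopy(model)
    for i in layers_to_reset:
        for module in model.bert.encoder.layer[i].modules():
            if hasattr(module, "reset_parameters"):
                module.reset_parameters()  # fresh random weights for this block
    return model

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)
ablated = partially_reinitialize(model, layers_to_reset=range(8, 12))
# Finetune `ablated` on a GLUE task and compare accuracy against finetuning
# the unablated model to measure the contribution of the reset layers.
```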

Drones are becoming ubiquitous and offer support to people in various tasks, such as photography, and in increasingly interactive social contexts. We introduce drone.io, a projected body-centric graphical user interface for human-drone interaction. Using two simple gestures, users can interact with the drone in a natural manner. drone.io is the first interface embedded on a drone to provide both input and output capabilities. This paper describes the design process of drone.io. We present a proof-of-concept, drone-based implementation, as well...

10.1109/hri.2019.8673011 article EN 2019 14th ACM/IEEE International Conference on Human-Robot Interaction (HRI) 2019-03-01

When trying to gain better visibility into a machine learning model in order to understand and mitigate the associated risks, a potentially valuable source of evidence is: which training examples most contribute to a given behavior? Influence functions aim to answer a counterfactual: how would the model's parameters (and hence its outputs) change if a given sequence were added to the training set? While influence functions have produced insights for small models, they are difficult to scale to large language models (LLMs) due to the difficulty of computing an...

10.48550/arxiv.2308.03296 preprint EN other-oa arXiv (Cornell University) 2023-01-01
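For context, the classical influence-function approximation (in the style of Koh & Liang, 2017) that this line of work builds on answers the counterfactual to first order; the inverse-Hessian term $H^{-1}$ is exactly the quantity that is hard to compute at LLM scale. The paper's exact formulation may differ.

```latex
% First-order effect of upweighting training example z_m by \epsilon
% on the loss of a query z_q, evaluated at the optimum \theta^*:
\mathcal{I}(z_m, z_q)
  = -\nabla_\theta L(z_q, \theta^*)^\top H^{-1} \nabla_\theta L(z_m, \theta^*),
\qquad
H = \frac{1}{n}\sum_{i=1}^{n} \nabla^2_\theta L(z_i, \theta^*).
```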

While maximizing expected return is the goal in most reinforcement learning approaches, risk-sensitive objectives such as conditional value at risk (CVaR) are more suitable for many high-stakes applications. However, relatively little is known about how to explore to quickly learn policies with good CVaR. In this paper, we present the first algorithm for sample-efficient learning of CVaR-optimal policies in Markov decision processes based on the optimism in the face of uncertainty principle. This method relies on a novel optimistic version...

10.1609/aaai.v34i04.5870 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2020-04-03
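As a reminder of the objective involved, CVaR at level $\alpha$ averages the worst $\alpha$-fraction of returns; the Rockafellar–Uryasev variational form below is the standard way to write it (the paper's exact notation may differ).

```latex
% CVaR of a return R at level \alpha \in (0, 1]: the expected return over
% the worst \alpha-fraction of outcomes, via the variational form
\mathrm{CVaR}_\alpha(R)
  = \max_{c \in \mathbb{R}} \left( c - \frac{1}{\alpha}\,
      \mathbb{E}\big[(c - R)^+\big] \right),
\qquad (x)^+ = \max(x, 0).
```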

Medical data poses a daunting challenge for AI algorithms: it exists in many different modalities, experiences frequent distribution shifts, and suffers from a scarcity of examples and labels. Recent advances, including transformers and self-supervised learning, promise a more universal approach that can be applied flexibly across these diverse conditions. To measure and drive progress in this direction, we present BenchMD: a benchmark that tests how well unified, modality-agnostic methods, including architectures and training...

10.48550/arxiv.2304.08486 preprint EN other-oa arXiv (Cornell University) 2023-01-01

Many recent methods for unsupervised representation learning train models to be invariant to different "views," or distorted versions of an input. However, designing these views requires considerable trial and error by human experts, hindering widespread adoption across domains and modalities. To address this, we propose viewmaker networks: generative models that learn to produce useful views from a given input. Viewmakers are stochastic bounded adversaries: they produce views by generating and then adding an $\ell_p$-bounded perturbation to the...

10.48550/arxiv.2010.07432 preprint EN other-oa arXiv (Cornell University) 2020-01-01
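A compressed sketch of the "stochastic bounded adversary" idea from the abstract: a small network produces a perturbation from the input plus a random noise channel, and the perturbation is projected onto an $\ell_p$ ball before being added. The tiny two-layer network, the choice $p = 1$, and the 5% distortion budget are assumptions for illustration, not the paper's architecture.

```python
import torch
import torch.nn as nn

class Viewmaker(nn.Module):
    """Stochastic bounded adversary: generate a perturbation from the input
    plus a random noise channel, project it onto an l_p ball (p=1 assumed),
    and add it to the input to form a learned "view"."""
    def __init__(self, channels=3, budget=0.05):
        super().__init__()
        self.budget = budget  # distortion budget as a fraction of input size
        self.net = nn.Sequential(
            nn.Conv2d(channels + 1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, channels, 3, padding=1),
        )

    def forward(self, x):
        noise = torch.rand(x.size(0), 1, x.size(2), x.size(3), device=x.device)
        delta = self.net(torch.cat([x, noise], dim=1))  # stochastic perturbation
        # Project onto an l_1 ball whose radius scales with the input size
        norms = delta.abs().flatten(1).sum(-1).clamp(min=1e-8)
        radius = self.budget * x[0].numel()
        scale = (radius / norms).clamp(max=1.0).view(-1, 1, 1, 1)
        return (x + delta * scale).clamp(0.0, 1.0)

views = Viewmaker()(torch.rand(8, 3, 32, 32))  # two calls give two random views
```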

Language exhibits structure at different scales, ranging from subwords to words, sentences, paragraphs, and documents. To what extent do deep models capture information at these scales, and can we force them to better capture structure across this hierarchy? We approach this question by focusing on individual neurons, analyzing the behavior of their activations at different timescales. We show that signal processing provides a natural framework for separating structure across scales, enabling us to 1) disentangle scale-specific information in existing embeddings and 2) train models to learn more...

10.48550/arxiv.2011.04823 preprint EN other-oa arXiv (Cornell University) 2020-01-01
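One simple way to make the signal-processing view concrete (an illustrative assumption; the paper's filters may differ): smooth a neuron's activation sequence at several timescales with exponential moving averages and take differences, yielding fast, medium, and slow components.

```python
import numpy as np

def ema(signal, alpha):
    """Exponential moving average with decay alpha (larger = slower)."""
    signal = np.asarray(signal, dtype=float)
    out = np.empty_like(signal)
    acc = 0.0
    for t, x in enumerate(signal):
        acc = alpha * acc + (1 - alpha) * x
        out[t] = acc
    return out

def scale_decompose(activations, alphas=(0.9, 0.99)):
    """Split one neuron's activation sequence (one value per token) into
    components varying at progressively slower timescales: differences of
    smoothed signals act as band-pass filters, the last EMA as low-pass."""
    smoothed = [np.asarray(activations, dtype=float)]
    smoothed += [ema(activations, a) for a in alphas]
    bands = [fast - slow for fast, slow in zip(smoothed[:-1], smoothed[1:])]
    return bands + [smoothed[-1]]  # e.g. word-, sentence-, document-scale
```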

Methods for designing organic materials with desired properties have high potential impact across fields such as medicine, renewable energy, petrochemical engineering, and agriculture. However, using generative modeling to design substances with desired properties is difficult because candidate compounds must satisfy multiple constraints, including synthetic accessibility and other metrics that are intuitive to domain experts but challenging to quantify. We propose C5T5, a novel self-supervised pretraining method that enables...

10.48550/arxiv.2108.10307 preprint EN other-oa arXiv (Cornell University) 2021-01-01

Models can fail in unpredictable ways during deployment due to task ambiguity, where multiple behaviors are consistent with the provided training data. An example is an object classifier trained on red squares and blue circles: when encountering blue squares, the intended behavior is undefined. We investigate whether pretrained models are better active learners, capable of disambiguating between the possible tasks a user may be trying to specify. Intriguingly, we find that better active learning is an emergent property of the pretraining process:...

10.48550/arxiv.2204.08491 preprint EN other-oa arXiv (Cornell University) 2022-01-01

Ignacio Cases, Clemens Rosenbaum, Matthew Riemer, Atticus Geiger, Tim Klinger, Alex Tamkin, Olivia Li, Sandhini Agarwal, Joshua D. Greene, Dan Jurafsky, Christopher Potts, Lauri Karttunen. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). 2019.

10.18653/v1/n19-1365 article EN 2019-01-01

Language models have recently achieved strong performance across a wide range of NLP benchmarks. However, unlike benchmarks, real-world tasks are often poorly specified, and agents must deduce the user's intended behavior from a combination of context, instructions, and examples. We investigate how both humans and models behave in the face of such task ambiguity by proposing AmbiBench, a new benchmark of six ambiguously-specified classification tasks. We evaluate humans and models on AmbiBench by seeing how well they identify the intended task using 1) instructions...

10.48550/arxiv.2212.10711 preprint EN other-oa arXiv (Cornell University) 2022-01-01

When we transfer a pretrained language model to a new language, there are many axes of variation that change at once. To disentangle the impact of different factors like syntactic similarity and vocabulary similarity, we propose a set of controlled transfer studies: we systematically transform the language of the GLUE benchmark, altering one axis of crosslingual variation at a time, and then measure the resulting drops in a pretrained model's downstream performance. We find that models can largely recover from syntactic-style shifts, but cannot recover from misalignment of the embedding matrix...

10.48550/arxiv.2202.12312 preprint EN cc-by arXiv (Cornell University) 2022-01-01
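One of the axes the abstract mentions, misalignment of the embedding matrix, can be simulated in isolation roughly as below (an illustrative sketch; the paper's exact transformations of GLUE may differ): permute the rows of the input-embedding matrix so each token id receives another token's pretrained vector while the text itself is unchanged.

```python
import torch
from transformers import AutoModelForSequenceClassification

def shuffle_embeddings(model, seed=0):
    """Keep the text identical but misalign the embedding matrix: permute its
    rows so every token id points at another token's pretrained vector."""
    emb = model.get_input_embeddings()
    perm = torch.randperm(emb.weight.size(0),
                          generator=torch.Generator().manual_seed(seed))
    with torch.no_grad():
        emb.weight.copy_(emb.weight[perm].clone())
    return model

model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")
shuffled = shuffle_embeddings(model)  # finetune on GLUE and measure the drop
```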

Self-supervised learning algorithms, including BERT and SimCLR, have enabled significant strides in fields like natural language processing, computer vision, and speech processing. However, these algorithms are domain-specific, meaning that new self-supervised learning algorithms must be developed for each new setting, including myriad healthcare, scientific, and multimodal domains. To catalyze progress toward domain-agnostic methods, we introduce DABS: a Domain-Agnostic Benchmark for Self-supervised learning. To perform well on DABS, an algorithm is...

10.48550/arxiv.2111.12062 preprint EN other-oa arXiv (Cornell University) 2021-01-01

Analysis of compressible turbulent flows is essential for applications related to propulsion, energy generation, and the environment. Here, we present BLASTNet 2.0, a 2.2 TB network-of-datasets containing 744 full-domain samples from 34 high-fidelity direct numerical simulations, which addresses the current limited availability of 3D reacting and non-reacting flow simulation data. With this data, we benchmark a total of 49 variations of five deep learning approaches for super-resolution - which can be applied for improving...

10.48550/arxiv.2309.13457 preprint EN other-oa arXiv (Cornell University) 2023-01-01

Language models (LMs) can be directed to perform target tasks by using labeled examples or natural language prompts. But selecting examples or writing prompts can be challenging--especially in tasks that involve unusual edge cases, demand precise articulation of nebulous preferences, or require an accurate mental model of LM behavior. We propose to use *LMs themselves* to guide the task specification process. In this paper, we introduce **Generative Active Task Elicitation (GATE)**: a learning framework in which models elicit and...

10.48550/arxiv.2310.11589 preprint EN other-oa arXiv (Cornell University) 2023-01-01
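A minimal sketch of what a GATE-style elicitation loop could look like; `chat` is a hypothetical wrapper around any chat-model API, and the single-question-per-turn protocol is an assumption based on the abstract, not the paper's exact framework.

```python
def elicit_task_spec(chat, n_questions=5):
    """GATE-style elicitation sketch: the model asks free-form questions,
    the user answers, and the transcript becomes the task specification.
    `chat(messages) -> str` is a hypothetical wrapper over any chat LM API."""
    messages = [{"role": "system", "content":
                 "Ask one question at a time to pin down the user's intended "
                 "task, probing edge cases and fuzzy preferences."}]
    for _ in range(n_questions):
        question = chat(messages)
        answer = input(f"{question}\n> ")
        messages += [{"role": "assistant", "content": question},
                     {"role": "user", "content": answer}]
    return messages  # pass as context when the model later performs the task
```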

As language models (LMs) advance, interest is growing in applying them to high-stakes societal decisions, such as determining financing or housing eligibility. However, their potential for discrimination in such contexts raises ethical concerns, motivating the need for better methods to evaluate these risks. We present a method for proactively evaluating the potential discriminatory impact of LMs in a wide range of use cases, including hypothetical use cases where they have not yet been deployed. Specifically, we use an LM to generate a wide array...

10.48550/arxiv.2312.03689 preprint EN other-oa arXiv (Cornell University) 2023-01-01
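The evaluation the abstract describes can be pictured roughly as below: fill a decision template with systematically varied demographics and compare the model's probability of a favorable outcome across groups. The template, the attribute lists, and the `p_yes` scoring function are hypothetical stand-ins, not the paper's released prompts.

```python
import itertools

TEMPLATE = ("The applicant is a {age}-year-old {gender} {race} applying for "
            "a small business loan. Should the application be approved? "
            "Answer yes or no.")

def discrimination_probe(p_yes, ages=(30, 60), genders=("man", "woman"),
                         races=("white", "Black", "Asian")):
    """Fill one decision template with varying demographics and compare the
    model's probability of a favorable decision across groups.
    `p_yes(prompt)` is a hypothetical scoring function returning P("yes")."""
    results = {}
    for age, gender, race in itertools.product(ages, genders, races):
        prompt = TEMPLATE.format(age=age, gender=gender, race=race)
        results[(age, gender, race)] = p_yes(prompt)
    return results  # gaps between groups indicate potential discrimination
```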

How are AI assistants being used in the real world? While model providers in theory have a window into this impact via their users' data, both privacy concerns and practical challenges have made analyzing this data difficult. To address these issues, we present Clio (Claude insights and observations), a privacy-preserving platform that uses AI assistants themselves to analyze and surface aggregated usage patterns across millions of conversations, without the need for human reviewers to read raw conversations. We validate that this can be done...

10.48550/arxiv.2412.13678 preprint EN arXiv (Cornell University) 2024-12-18
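A rough sketch of a Clio-style aggregation pipeline as the abstract describes it (step names and helpers are assumptions, not Anthropic's implementation): a model summarizes each conversation into a short de-identified facet, facets are embedded and clustered, and only model-written cluster descriptions are surfaced to analysts.

```python
import numpy as np
from sklearn.cluster import KMeans

def aggregate_usage(conversations, summarize, embed, describe_cluster, k=50):
    """Clio-style aggregation sketch; `summarize`, `embed`, and
    `describe_cluster` stand in for model calls."""
    # 1) A model condenses each conversation into a short, de-identified facet
    facets = [summarize(c) for c in conversations]
    # 2) Embed the facets and group similar usage together
    X = np.stack([embed(f) for f in facets])
    labels = KMeans(n_clusters=k, n_init="auto").fit_predict(X)
    # 3) A model names each cluster; only these aggregate descriptions
    #    (never raw conversations) are surfaced to human analysts
    return {i: describe_cluster([f for f, l in zip(facets, labels) if l == i])
            for i in range(k)}
```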