- Topic Modeling
- Natural Language Processing Techniques
- Domain Adaptation and Few-Shot Learning
- Speech Recognition and Synthesis
- Multimodal Machine Learning Applications
- Remote-Sensing Image Classification
- Reinforcement Learning in Robotics
- Ethics and Social Impacts of AI
- Adversarial Robustness in Machine Learning
- Advanced Bandit Algorithms Research
- Artificial Intelligence in Healthcare and Education
- Generative Adversarial Networks and Image Synthesis
- Music and Audio Processing
- Machine Learning in Healthcare
- Explainable Artificial Intelligence (XAI)
- Simulation Techniques and Applications
- Text Readability and Simplification
- Stellar, Planetary, and Galactic Studies
- Data Stream Mining Techniques
- Socioeconomic Development in MENA
- Speech and Audio Processing
- Advanced Vision and Imaging
- Hearing Loss and Rehabilitation
- Experimental Behavioral Economics Studies
- Fullerene Chemistry and Applications
- Stanford University (2019-2025)
- Chinese University of Hong Kong (2022)
- University of Michigan (2022)
- University of Minnesota (2022)
- Columbia University (2022)
- University of California, Santa Cruz (2022)
- Bauhaus-Universität Weimar (2022)
- Leipzig University (2022)
- University of Minnesota System (2022)
- North Carolina State University (2022)
AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are trained on broad data at scale and are adaptable to a wide range of downstream tasks. We call these models foundation models to underscore their critically central yet incomplete character. This report provides a thorough account of the opportunities and risks of foundation models, ranging from their capabilities (e.g., language, vision, robotics, reasoning, human interaction) to their technical principles (e.g., model architectures, training procedures, data, systems,...
On October 14th, 2020, researchers from OpenAI, the Stanford Institute for Human-Centered Artificial Intelligence, and other universities convened to discuss open research questions surrounding GPT-3, the largest publicly disclosed dense language model at the time. The meeting took place under Chatham House Rules. Discussants came from a variety of backgrounds including computer science, linguistics, philosophy, political science, communications, cyber policy, and more. Broadly, the discussion centered around two main...
Large language models (LLMs) may not equitably represent diverse global perspectives on societal issues. In this paper, we develop a quantitative framework to evaluate whose opinions model-generated responses are more similar to. We first build a dataset, GlobalOpinionQA, comprised of questions and answers from cross-national surveys designed to capture diverse opinions on global issues across different countries. Next, we define a metric that quantifies the similarity between LLM-generated survey responses and human responses, conditioned...
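The similarity metric described above can be sketched with a standard distributional distance. A minimal illustration, assuming the metric is one minus the Jensen-Shannon divergence between the model's answer distribution and a country's human answer distribution (the function names and example distributions here are hypothetical):

```python
import math

def jensen_shannon_divergence(p, q):
    """JS divergence between two discrete distributions (base 2, so it lies in [0, 1])."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    def kl(a, b):
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def opinion_similarity(model_dist, human_dist):
    """Similarity = 1 - JSD: 1.0 for identical answer distributions, lower for divergent ones."""
    return 1.0 - jensen_shannon_divergence(model_dist, human_dist)

# Model and (hypothetical) per-country human answer distributions over 3 answer options
model = [0.6, 0.3, 0.1]
country_a = [0.55, 0.35, 0.10]   # close to the model's distribution
country_b = [0.05, 0.15, 0.80]   # far from it
print(opinion_similarity(model, country_a) > opinion_similarity(model, country_b))  # True
```

Conditioning on country then reduces to computing this score once per country and comparing.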
There is growing consensus that language model (LM) developers should not be the sole deciders of LM behavior, creating a need for methods that enable the broader public to collectively shape the behavior of systems that affect them. To address this need, we present Collective Constitutional AI (CCAI): a multi-stage process for sourcing and integrating public input into LMs—from identifying a target population and eliciting principles to training and evaluating a model. We demonstrate the real-world practicality of this approach by creating what is, to our knowledge, the first...
How does language model pretraining help transfer learning? We consider a simple ablation technique for determining the impact of each pretrained layer on transfer task performance. This method, partial reinitialization, involves replacing different layers of a pretrained model with random weights, then finetuning the entire model and observing the change in performance. This technique reveals that in BERT, layers with high probing performance on downstream GLUE tasks are neither necessary nor sufficient for high accuracy on those tasks. Furthermore, the benefit of using pretrained parameters for a layer varies...
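The partial-reinitialization ablation can be sketched in a few lines. This is a toy, dependency-free version assuming a "model" is just a list of per-layer weight matrices; in the real study each ablated model would then be finetuned and its downstream performance recorded:

```python
import random

def partial_reinitialization(layers, reinit_from, seed=0):
    """Keep pretrained weights for layers [0, reinit_from); replace the remaining
    layers with random weights of the same shape (the partial-reinitialization ablation)."""
    rng = random.Random(seed)
    ablated = []
    for i, w in enumerate(layers):
        if i < reinit_from:
            ablated.append([row[:] for row in w])  # keep the pretrained layer
        else:
            ablated.append([[rng.gauss(0.0, 0.02) for _ in row] for row in w])
    return ablated

# Hypothetical 4-layer "pretrained model": one small 3x3 weight matrix per layer
pretrained = [[[float(i + 1)] * 3 for _ in range(3)] for i in range(4)]

# Ablation sweep: reinitialize everything from layer k upward, for each k
for k in range(len(pretrained) + 1):
    ablated = partial_reinitialization(pretrained, k)
    kept = sum(a == p for a, p in zip(ablated, pretrained))
    print(f"reinit from layer {k}: {kept} pretrained layers kept")
```

Comparing finetuned performance across values of `k` isolates how much each pretrained layer contributes.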
Drones are becoming ubiquitous and offer support to people in various tasks, such as photography, and increasingly in interactive social contexts. We introduce drone.io, a projected body-centric graphical user interface for human-drone interaction. Using two simple gestures, users can interact with a drone in a natural manner. drone.io is the first interface embedded on a drone to provide both input and output capabilities. This paper describes the design process of drone.io. We present a proof of concept, a drone-based implementation, as well...
When trying to gain better visibility into a machine learning model in order to understand and mitigate the associated risks, a potentially valuable source of evidence is: which training examples most contribute to a given behavior? Influence functions aim to answer a counterfactual: how would the model's parameters (and hence its outputs) change if a given sequence were added to the training set? While influence functions have produced insights for small models, they are difficult to scale to large language models (LLMs) due to the difficulty of computing an...
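The counterfactual described above has a closed form for small convex models, which makes the idea concrete before any LLM-scale approximation enters. A minimal sketch on ridge regression, where the Hessian is tiny and exact (the setup and variable names are illustrative, not the paper's method):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=20)
lam, n = 0.1, 20

# Closed-form parameters and exact Hessian of the regularized objective
H = X.T @ X / n + lam * np.eye(3)
theta = np.linalg.solve(H, X.T @ y / n)

def grad_loss(x, t):
    """Gradient of the per-example squared loss at theta."""
    return (x @ theta - t) * x

# Influence of upweighting training example i on the loss at a test point:
#   I(i, test) = -grad_test . H^{-1} . grad_i
g_test = grad_loss(X[0], y[0])
influences = [-g_test @ np.linalg.solve(H, grad_loss(X[i], y[i])) for i in range(n)]

# Upweighting a copy of the test point can only reduce its own loss (H is positive definite)
print(influences[0] <= 0.0)  # True
```

For LLMs the hard part is exactly the `H^{-1}` term, which is why scalable approximations are needed.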
While maximizing expected return is the goal in most reinforcement learning approaches, risk-sensitive objectives such as conditional value at risk (CVaR) are more suitable for many high-stakes applications. However, relatively little is known about how to explore to quickly learn policies with good CVaR. In this paper, we present the first algorithm for sample-efficient learning of CVaR-optimal policies in Markov decision processes, based on the optimism in the face of uncertainty principle. This method relies on a novel optimistic version...
Medical data poses a daunting challenge for AI algorithms: it exists in many different modalities, experiences frequent distribution shifts, and suffers from a scarcity of examples and labels. Recent advances, including transformers and self-supervised learning, promise a more universal approach that can be applied flexibly across these diverse conditions. To measure and drive progress in this direction, we present BenchMD: a benchmark that tests how well unified, modality-agnostic methods, including architectures and training...
Many recent methods for unsupervised representation learning train models to be invariant to different "views," or distorted versions of an input. However, designing these views requires considerable trial and error by human experts, hindering widespread adoption across domains and modalities. To address this, we propose viewmaker networks: generative models that learn to produce useful views from a given input. Viewmakers are stochastic bounded adversaries: they generate views by generating and then adding an $\ell_p$-bounded perturbation to the...
Language exhibits structure at different scales, ranging from subwords to words, sentences, paragraphs, and documents. To what extent do deep models capture information at these scales, and can we force them to better capture structure across this hierarchy? We approach this question by focusing on individual neurons, analyzing the behavior of their activations at different timescales. We show that signal processing provides a natural framework for separating structure across scales, enabling us to 1) disentangle scale-specific information in existing embeddings and 2) train models to learn more...
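The signal-processing framing can be sketched directly: treat a neuron's activation sequence as a signal and split it into slow and fast components with a DFT band filter. A toy example with synthetic "topic-scale" and "token-scale" sinusoids (the cutoff and signals are illustrative, not the paper's exact filters):

```python
import numpy as np

t = np.arange(256)
slow = np.sin(2 * np.pi * t / 128.0)      # low-frequency "topic-scale" signal (2 cycles)
fast = 0.3 * np.sin(2 * np.pi * t / 4.0)  # high-frequency "token-scale" signal (64 cycles)
activation = slow + fast                  # the neuron's observed activation sequence

spectrum = np.fft.rfft(activation)
cutoff = 8  # frequency bins below this are treated as the slow component

low = spectrum.copy();  low[cutoff:] = 0
high = spectrum.copy(); high[:cutoff] = 0

slow_hat = np.fft.irfft(low, n=len(t))
fast_hat = np.fft.irfft(high, n=len(t))

# The band filter cleanly separates the two scales in this synthetic case
print(np.allclose(slow_hat, slow, atol=1e-8), np.allclose(fast_hat, fast, atol=1e-8))
```

Real activations mix scales less cleanly, but the same band-filtering operation defines the scale-specific components.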
Methods for designing organic materials with desired properties have high potential impact across fields such as medicine, renewable energy, petrochemical engineering, and agriculture. However, using generative modeling to design substances is difficult because candidate compounds must satisfy multiple constraints, including synthetic accessibility and other metrics that are intuitive to domain experts but challenging to quantify. We propose C5T5, a novel self-supervised pretraining method that enables...
Models can fail in unpredictable ways during deployment due to task ambiguity, when multiple behaviors are consistent with the provided training data. An example is an object classifier trained on red squares and blue circles: when encountering blue squares, the intended behavior is undefined. We investigate whether pretrained models are better active learners, capable of disambiguating between the possible tasks a user may be trying to specify. Intriguingly, we find that better active learning is an emergent property of the pretraining process:...
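The red-squares/blue-circles example can be written out in a few lines: two hypotheses fit the training data perfectly yet disagree on the held-out input, which is exactly the ambiguity an active learner would need to query about (the classifier functions here are illustrative):

```python
# Two hypotheses consistent with the training data (red squares -> 1, blue circles -> 0)
def classify_by_color(color, shape):
    return 1 if color == "red" else 0

def classify_by_shape(color, shape):
    return 1 if shape == "square" else 0

train = [("red", "square", 1), ("blue", "circle", 0)]
# Both hypotheses achieve perfect training accuracy...
assert all(classify_by_color(c, s) == y == classify_by_shape(c, s) for c, s, y in train)

# ...but on the unseen "blue square" they disagree: the task is ambiguous
print(classify_by_color("blue", "square"), classify_by_shape("blue", "square"))  # 0 1
```

An effective active learner would request a label for exactly this kind of hypothesis-separating example.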
Ignacio Cases, Clemens Rosenbaum, Matthew Riemer, Atticus Geiger, Tim Klinger, Alex Tamkin, Olivia Li, Sandhini Agarwal, Joshua D. Greene, Dan Jurafsky, Christopher Potts, Lauri Karttunen. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). 2019.
Language models have recently achieved strong performance across a wide range of NLP benchmarks. However, unlike benchmarks, real-world tasks are often poorly specified, and agents must deduce the user's intended behavior from a combination of context, instructions, and examples. We investigate how both humans and models behave in the face of such task ambiguity by proposing AmbiBench, a new benchmark of six ambiguously-specified classification tasks. We evaluate humans and models on AmbiBench by seeing how well they identify the intended task using 1) instructions...
When we transfer a pretrained language model to a new language, there are many axes of variation that change at once. To disentangle the impact of different factors like syntactic similarity and vocabulary similarity, we propose a set of controlled transfer studies: we systematically transform the GLUE benchmark, altering one axis of crosslingual variation at a time, and then measure the resulting drops in a pretrained model's downstream performance. We find that models can largely recover from syntactic-style shifts, but cannot recover from misalignment of the embedding matrix...
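Two of the axes described above can be sketched as simple text transformations: one that alters syntax while keeping the vocabulary fixed, and one that swaps vocabulary one-to-one while keeping syntax fixed. These toy transforms are illustrative stand-ins for the paper's controlled transformations:

```python
def syntactic_shift(sentence):
    """One controlled axis: reverse word order, changing syntax
    while keeping the vocabulary (and embeddings) fixed."""
    return " ".join(reversed(sentence.split()))

def vocabulary_shift(sentence, mapping):
    """Another axis: substitute words one-to-one, changing the vocabulary
    (and embedding alignment) while keeping syntax fixed."""
    return " ".join(mapping.get(w, w) for w in sentence.split())

s = "the movie was great"
print(syntactic_shift(s))                                         # great was movie the
print(vocabulary_shift(s, {"movie": "film", "great": "superb"}))  # the film was superb
```

Applying one transform at a time to every example in a benchmark, then refinetuning, isolates each axis's contribution to the transfer gap.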
Self-supervised learning algorithms, including BERT and SimCLR, have enabled significant strides in fields like natural language processing, computer vision, and speech processing. However, these algorithms are domain-specific, meaning that new self-supervised learning algorithms must be developed for each new setting, including myriad healthcare, scientific, and multimodal domains. To catalyze progress toward domain-agnostic methods, we introduce DABS: a Domain-Agnostic Benchmark for Self-supervised learning. To perform well on DABS, an algorithm is...
Analysis of compressible turbulent flows is essential for applications related to propulsion, energy generation, and the environment. Here, we present BLASTNet 2.0, a 2.2 TB network-of-datasets containing 744 full-domain samples from 34 high-fidelity direct numerical simulations, which addresses the current limited availability of 3D reacting and non-reacting flow simulation data. With this data, we benchmark a total of 49 variations of five deep learning approaches for super-resolution - which can be applied to improving...
Language models (LMs) can be directed to perform target tasks by using labeled examples or natural language prompts. But selecting examples or writing prompts can be challenging--especially in tasks that involve unusual edge cases, demand precise articulation of nebulous preferences, or require an accurate mental model of LM behavior. We propose to use *LMs themselves* to guide the task specification process. In this paper, we introduce **Generative Active Task Elicitation (GATE)**: a learning framework in which models elicit and...
As language models (LMs) advance, interest is growing in applying them to high-stakes societal decisions, such as determining financing or housing eligibility. However, their potential for discrimination in such contexts raises ethical concerns, motivating the need for better methods to evaluate these risks. We present a method for proactively evaluating the potential discriminatory impact of LMs in a wide range of use cases, including hypothetical use cases where they have not yet been deployed. Specifically, we use an LM to generate an array...
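A natural way to quantify the discriminatory impact described above is to compare the model's decision probabilities when only a demographic attribute in the prompt changes. A simplified sketch using a log-odds difference (the function names and probabilities are hypothetical, not the paper's exact metric):

```python
import math

def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1 - p))

def discrimination_score(p_group_a, p_group_b):
    """Sketch of a demographic-sensitivity score: difference in log-odds of a
    favorable decision when only the demographic attribute in the prompt changes.
    Zero means the attribute did not move the decision; larger magnitude means more impact."""
    return logit(p_group_a) - logit(p_group_b)

# Hypothetical model probabilities of a "yes" (approve) decision for two prompt variants
print(round(discrimination_score(0.80, 0.80), 3))  # 0.0 -> no measured discrimination
print(round(discrimination_score(0.80, 0.60), 3))  # positive -> favors group A
```

Averaging such scores over many generated decision scenarios and demographic variations yields an aggregate picture of where a model's decisions shift.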
How are AI assistants being used in the real world? While model providers in theory have a window into this impact via their users' data, both privacy concerns and practical challenges have made analyzing this data difficult. To address these issues, we present Clio (Claude insights and observations), a privacy-preserving platform that uses AI assistants themselves to analyze and surface aggregated usage patterns across millions of conversations, without the need for human reviewers to read raw conversations. We validate that this can be done...
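One building block of privacy-preserving aggregation like this is a minimum-cluster-size threshold: only topics with enough conversations are reported, so rare (and potentially identifying) topics never appear in the aggregate view. A minimal sketch of that one layer, with illustrative names and data (the real system combines several protections, including model-based summarization):

```python
def aggregate_topics(conversation_topics, min_cluster_size=5):
    """Report only topic clusters with at least `min_cluster_size` conversations;
    smaller clusters are suppressed from the aggregate view."""
    counts = {}
    for topic in conversation_topics:
        counts[topic] = counts.get(topic, 0) + 1
    return {t: c for t, c in counts.items() if c >= min_cluster_size}

# Hypothetical per-conversation topic labels (already extracted upstream)
topics = ["coding help"] * 9 + ["travel planning"] * 6 + ["rare personal issue"] * 2
print(aggregate_topics(topics))  # {'coding help': 9, 'travel planning': 6}
```

The threshold trades a small amount of coverage for the guarantee that no reported pattern traces back to a handful of users.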