- Topic Modeling
- Natural Language Processing Techniques
- Advanced Text Analysis Techniques
- Text Readability and Simplification
- Hate Speech and Cyberbullying Detection
- Software Engineering Research
- Social Media and Politics
- Sentiment Analysis and Opinion Mining
- Explainable Artificial Intelligence (XAI)
- Ethics and Social Impacts of AI
- Misinformation and Its Impacts
- Computational and Text Analysis Methods
- Persona Design and Applications
- Opinion Dynamics and Social Influence
- Advanced Graph Neural Networks
- Media Influence and Politics
- Aerospace Engineering and Energy Systems
- Machine Learning and Data Classification
- Adversarial Robustness in Machine Learning
- Wikis in Education and Collaboration
- Domain Adaptation and Few-Shot Learning
- Privacy-Preserving Technologies in Data
- Online Learning and Analytics
- Speech and Dialogue Systems
- Reinforcement Learning in Robotics
Stanford University
2022-2024
Cornell University
2018-2021
Columbia University
2020
New York University
2020
George Washington University
2020
AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are trained on broad data at scale and are adaptable to a wide range of downstream tasks. We call these models foundation models to underscore their critically central yet incomplete character. This report provides a thorough account of the opportunities and risks of foundation models, ranging from their capabilities (e.g., language, vision, robotics, reasoning, human interaction) and technical principles (e.g., model architectures, training procedures, data, systems,...
Neural abstractive summarization models are prone to generate content inconsistent with the source document, i.e. unfaithful. Existing automatic metrics do not capture such mistakes effectively. We tackle the problem of evaluating the faithfulness of a generated summary given its source document. We first collected human annotations of faithfulness for outputs from numerous models on two datasets. We find that current models exhibit a trade-off between abstractiveness and faithfulness: summaries with less word overlap with the source document are more likely to be unfaithful. Next, we propose an...
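The word-overlap notion of abstractiveness referred to above can be made concrete with a small sketch. This is an illustrative measure (the fraction of summary bigrams absent from the source), not the exact metric used in the paper:

```python
# Illustrative sketch: abstractiveness as the fraction of summary n-grams
# that do not appear in the source document (higher = more abstractive).

def novel_ngram_fraction(source: str, summary: str, n: int = 2) -> float:
    """Return the fraction of summary n-grams absent from the source."""
    def ngrams(text: str):
        tokens = text.lower().split()
        return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

    source_ngrams = ngrams(source)
    summary_ngrams = ngrams(summary)
    if not summary_ngrams:
        return 0.0
    novel = sum(1 for g in summary_ngrams if g not in source_ngrams)
    return novel / len(summary_ngrams)

# Per the study's finding, summaries scoring higher on this kind of measure
# (less word overlap with the source) are more likely to be unfaithful.
```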
Sebastian Gehrmann, Tosin Adewumi, Karmanya Aggarwal, Pawan Sasanka Ammanamanchi, Anuoluwapo Aremu, Antoine Bosselut, Khyathi Raghavi Chandu, Miruna-Adriana Clinciu, Dipanjan Das, Kaustubh Dhole, Wanyu Du, Esin Durmus, Ondřej Dušek, Chris Chinenye Emezue, Varun Gangal, Cristina Garbacea, Tatsunori Hashimoto, Yufang Hou, Yacine Jernite, Harsh Jhamtani, Yangfeng Ji, Shailza Jolly, Mihir Kale, Dhruv Kumar, Faisal Ladhak, Aman Madaan, Mounica Maddela, Khyati Mahajan, Saad Mahamood, Bodhisattwa...
Machine learning models that convert user-written text descriptions into images are now widely available online and used by millions of users to generate millions of images a day. We investigate the potential for these models to amplify dangerous and complex stereotypes. We find that a broad range of ordinary prompts produce stereotypes, including prompts simply mentioning traits, descriptors, occupations, or objects. For example, we find cases of prompting for basic traits or social roles resulting in images reinforcing whiteness as ideal, and prompting for occupations resulting in amplification...
Large language models (LLMs) have shown promise for automatic summarization but the reasons behind their successes are poorly understood. By conducting a human evaluation on ten LLMs across different pretraining methods, prompts, and model scales, we make two important observations. First, we find that instruction tuning, not model size, is the key to the LLM's zero-shot summarization capability. Second, existing studies have been limited by low-quality references, leading to underestimates of human performance and lower few-shot...
Language models (LMs) are increasingly being used in open-ended contexts, where the opinions they reflect in response to subjective queries can have a profound impact, both on user satisfaction and on shaping the views of society at large. In this work, we put forth a quantitative framework to investigate the opinions reflected by LMs -- leveraging high-quality public opinion polls and their associated human responses. Using this framework, we create OpinionsQA, a new dataset for evaluating the alignment of LM opinions with those of 60 US demographic...
We introduce WikiLingua, a large-scale, multilingual dataset for the evaluation of cross-lingual abstractive summarization systems. We extract article and summary pairs in 18 languages from WikiHow, a high-quality, collaborative resource of how-to guides on a diverse set of topics written by human authors. We create gold-standard article-summary alignments across languages by aligning the images that are used to describe each step of an article. As baselines for further studies, we evaluate the performance of existing methods on our...
Despite recent progress in abstractive summarization, systems still suffer from faithfulness errors. While prior work has proposed models that improve faithfulness, it is unclear whether the improvement comes from an increased level of extractiveness of the model outputs, as one naive way to improve faithfulness is to make summarization systems more extractive. In this work, we present a framework for evaluating the effective faithfulness of summarization systems, by generating a faithfulness-abstractiveness trade-off curve that serves as a control at different operating points on...
Language models (LMs) are becoming the foundation for almost all major language technologies, but their capabilities, limitations, and risks are not well understood. We present Holistic Evaluation of Language Models (HELM) to improve the transparency of language models. First, we taxonomize the vast space of potential scenarios (i.e. use cases) and metrics (i.e. desiderata) that are of interest for LMs. Then we select a broad subset based on coverage and feasibility, noting what's missing or underrepresented (e.g. question answering for neglected English...
Large language models (LLMs) may not equitably represent diverse global perspectives on societal issues. In this paper, we develop a quantitative framework to evaluate whose opinions model-generated responses are more similar to. We first build a dataset, GlobalOpinionQA, comprised of questions and answers from cross-national surveys designed to capture diverse opinions on global issues across different countries. Next, we define a metric that quantifies the similarity between LLM-generated survey responses and human responses, conditioned...
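As a rough illustration of such a similarity metric, the sketch below uses one plausible instantiation: one minus the Jensen-Shannon distance between the model's answer distribution and a country's aggregated human answer distribution over the same survey options. The exact formulation in the paper may differ.

```python
# Minimal sketch (assumed formulation, not necessarily the paper's exact metric):
# similarity = 1 - Jensen-Shannon distance between two answer distributions.
import numpy as np
from scipy.spatial.distance import jensenshannon

def opinion_similarity(model_probs, human_probs) -> float:
    """Similarity in [0, 1] between two distributions over answer choices."""
    p = np.asarray(model_probs, dtype=float)
    q = np.asarray(human_probs, dtype=float)
    p, q = p / p.sum(), q / q.sum()
    return 1.0 - jensenshannon(p, q, base=2)

# Example: a 4-option survey question compared against one country's responses.
print(opinion_similarity([0.7, 0.1, 0.1, 0.1], [0.4, 0.3, 0.2, 0.1]))
```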
To recognize and mitigate harms from large language models (LLMs), we need to understand the prevalence and nuances of stereotypes in LLM outputs. Toward this end, we present Marked Personas, a prompt-based method to measure stereotypes in LLMs for intersectional demographic groups without any lexicon or data labeling. Grounded in the sociolinguistic concept of markedness (which characterizes explicitly linguistically marked categories versus unmarked defaults), our proposed method is twofold: 1) prompting an LLM to generate personas, i.e.,...
There is growing consensus that language model (LM) developers should not be the sole deciders of LM behavior, creating a need for methods that enable the broader public to collectively shape the behavior of LM systems that affect them. To address this need, we present Collective Constitutional AI (CCAI): a multi-stage process for sourcing and integrating public input into LMs—from identifying a target population and sourcing principles to training and evaluating a model. We demonstrate the real-world practicality of this approach by creating what is, to our knowledge, the first...
We introduce GEM, a living benchmark for natural language Generation (NLG), its Evaluation, and Metrics. Measuring progress in NLG relies on a constantly evolving ecosystem of automated metrics, datasets, and human evaluation standards. Due to this moving target, new models often still evaluate on divergent anglo-centric corpora with well-established, but flawed, metrics. This disconnect makes it challenging to identify the limitations of current models and opportunities for progress. Addressing this limitation, GEM provides...
Many real-world applications of language models (LMs), such as writing assistance and code autocomplete, involve human-LM interaction. However, most benchmarks are non-interactive in the sense that a model produces output without human involvement. To evaluate human-LM interaction, we develop a new framework, Human-AI Language-based Interaction Evaluation (HALIE), that defines the components of interactive systems and the dimensions to consider when designing evaluation metrics. Compared to standard, non-interactive evaluation, HALIE captures (i)...
Esin Durmus, Claire Cardie. Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers). 2018.
When trying to gain better visibility into a machine learning model in order to understand and mitigate the associated risks, a potentially valuable source of evidence is: which training examples most contribute to a given behavior? Influence functions aim to answer a counterfactual: how would the model's parameters (and hence its outputs) change if a given sequence were added to the training set? While influence functions have produced insights for small models, they are difficult to scale to large language models (LLMs) due to the difficulty of computing an...
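For context, the classical influence-function estimate (in the standard formulation of Koh and Liang, 2017, not necessarily the exact estimator used in this work) approximates how up-weighting a training example $z_m$ changes the loss on a query $z_c$; scaling this to LLMs is hard precisely because it requires an inverse-Hessian-vector product:

$$\mathcal{I}(z_m, z_c) \;=\; -\,\nabla_\theta \mathcal{L}(z_c, \hat{\theta})^{\top}\, H_{\hat{\theta}}^{-1}\, \nabla_\theta \mathcal{L}(z_m, \hat{\theta}), \qquad H_{\hat{\theta}} \;=\; \frac{1}{n}\sum_{i=1}^{n} \nabla_\theta^{2}\, \mathcal{L}(z_i, \hat{\theta}).$$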
Human feedback is commonly utilized to finetune AI assistants. But human feedback may also encourage model responses that match user beliefs over truthful ones, a behaviour known as sycophancy. We investigate the prevalence of sycophancy in models whose finetuning procedure made use of human feedback, and the potential role of human preference judgments in such behavior. We first demonstrate that five state-of-the-art AI assistants consistently exhibit sycophancy across four varied free-form text-generation tasks. To understand if human preferences...
Polis is a platform that leverages machine intelligence to scale up deliberative processes. In this paper, we explore the opportunities and risks associated with applying Large Language Models (LLMs) towards challenges with facilitating, moderating and summarizing the results of Polis engagements. In particular, we demonstrate with pilot experiments using Anthropic's Claude that LLMs can indeed augment human intelligence to help more efficiently run Polis conversations. We find that summarization capabilities enable categorically new methods with immense...
Faisal Ladhak, Esin Durmus, Mirac Suzgun, Tianyi Zhang, Dan Jurafsky, Kathleen McKeown, Tatsunori Hashimoto. Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics. 2023.
Online debate forums provide users a platform to express their opinions on controversial topics while being exposed to opinions from a diverse set of viewpoints. Existing work in Natural Language Processing (NLP) has shown that linguistic features extracted from the debate text and features encoding the characteristics of the audience are both critical in persuasion studies. In this paper, we aim to further investigate the role of the discourse structure of arguments from online debates in their persuasiveness. In particular, we use a factor graph model to obtain features for the argument structure of an...
Existing argumentation datasets have succeeded in allowing researchers to develop computational methods for analyzing the content, structure and linguistic features of argumentative text. They have been much less successful in fostering studies of the effect of "user" traits -- characteristics and beliefs of the participants -- on the debate/argument outcome, as this type of user information is generally not available. This paper presents a dataset of 78,376 debates generated over a 10-year period along with surprisingly comprehensive...
Large language models (LLMs) perform better when they produce step-by-step, "Chain-of-Thought" (CoT) reasoning before answering a question, but it is unclear if the stated reasoning is a faithful explanation of the model's actual reasoning (i.e., its process for answering the question). We investigate hypotheses for how CoT reasoning may be unfaithful, by examining how the model's predictions change when we intervene on the CoT (e.g., adding mistakes or paraphrasing it). Models show large variation across tasks in how strongly they condition on the CoT when predicting their answer, sometimes...
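One intervention of the kind described above (truncating the CoT and checking whether the final answer moves) can be sketched as follows. Here `answer(question, cot)` is a hypothetical callable standing in for a model query, not an API from the paper:

```python
# Minimal sketch of a CoT intervention: truncate the chain of thought and
# measure how often the model's final answer changes as a result.

def cot_reliance(examples, answer, keep_fraction: float = 0.5) -> float:
    """Fraction of examples whose answer changes when the CoT is truncated.

    `examples` is an iterable of (question, cot) pairs; `answer` is a
    hypothetical function that returns the model's final answer given a
    question and a (possibly perturbed) chain of thought.
    """
    examples = list(examples)
    changed = 0
    for question, cot in examples:
        full_answer = answer(question, cot)
        truncated_cot = cot[: int(len(cot) * keep_fraction)]
        changed += answer(question, truncated_cot) != full_answer
    return changed / len(examples)

# A high value suggests the model conditions strongly on its stated reasoning;
# a value near zero suggests the CoT may be post-hoc rather than faithful.
```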
Sebastian Gehrmann, Abhik Bhattacharjee, Abinaya Mahendiran, Alex Wang, Alexandros Papangelis, Aman Madaan, Angelina McMillan-Major, Anna Shvets, Ashish Upadhyay, Bernd Bohnet, Bingsheng Yao, Bryan Wilie, Chandra Bhagavatula, Chaobin You, Craig Thomson, Cristina Garbacea, Dakuo Wang, Daniel Deutsch, Deyi Xiong, Di Jin, Dimitra Gkatzia, Dragomir Radev, Elizabeth Clark, Esin Durmus, Faisal Ladhak, Filip Ginter, Genta Indra Winata, Hendrik Strobelt, Hiroaki Hayashi, Jekaterina Novikova, Jenna...