NFDI4DS | UHH-SEMS - Publication Details

Diyi Yang

ORCID: 0000-0003-1220-3983

About

Contact & Profiles

OpenAlex ID: A5089413311

Research Areas

Topic Modeling
Natural Language Processing Techniques
Multimodal Machine Learning Applications
Social Media and Politics
Hate Speech and Cyberbullying Detection
Sentiment Analysis and Opinion Mining
Speech and dialogue systems
Misinformation and Its Impacts
Online Learning and Analytics
Text Readability and Simplification
Wikis in Education and Collaboration
Explainable Artificial Intelligence (XAI)
Speech Recognition and Synthesis
Computational and Text Analysis Methods
Complex Network Analysis Techniques
Recommender Systems and Techniques
Advanced Text Analysis Techniques
Software Engineering Research
Mental Health via Writing
Domain Adaptation and Few-Shot Learning
Adversarial Robustness in Machine Learning
Innovative Teaching and Learning Methods
Text and Document Classification Technologies
Online and Blended Learning
Opinion Dynamics and Social Influence

Stanford University
2022-2025

Georgia Institute of Technology
2019-2023

University of Illinois Urbana-Champaign
2023

Amazon (United States)
2023

Laboratoire d'Informatique de Paris-Nord
2023

Google (United States)
2023

Harvard University Press
2023

University of Washington
2023

Dartmouth Hospital
2023

Harvard University
2023

Hierarchical Attention Networks for Document Classification

OPENALEX - Publications

Zichao Yang Diyi Yang Chris Dyer Xiaodong He Alex Smola and 1 more

Zichao Yang, Diyi Yang, Chris Dyer, Xiaodong He, Alex Smola, Eduard Hovy. Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2016.

10.18653/v1/n16-1174 article EN cc-by Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies 2016-01-01

That's So Annoying!!!: A Lexical and Frame-Semantic Embedding Based Data Augmentation Approach to Automatic Categorization of Annoying Behaviors using #petpeeve Tweets

William Yang Wang Diyi Yang

We propose a novel data augmentation approach to enhance computational behavioral analysis using social media text. In particular, we collect a Twitter corpus of the descriptions of annoying behaviors using the #petpeeve hashtags. In the qualitative analysis, we study the language use in these tweets, with a special focus on the fine-grained categories and the geographic variation of the language. In the quantitative analysis, we show that lexical and syntactic features are useful for automatic categorization of annoying behaviors, and frame-semantic features further boost...

10.18653/v1/d15-1306 article EN cc-by Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing 2015-01-01

Is ChatGPT a General-Purpose Natural Language Processing Task Solver?

Chengwei Qin Aston Zhang Zhuosheng Zhang Jiaao Chen Michihiro Yasunaga and 1 more

Spurred by advancements in scale, large language models (LLMs) have demonstrated the ability to perform a variety of natural language processing (NLP) tasks zero-shot, i.e., without adaptation on downstream data. Recently, the debut of ChatGPT has drawn a great deal of attention from the NLP community due to the fact that it can generate high-quality responses to human input and self-correct previous mistakes based on subsequent conversations. However, it is not yet known whether ChatGPT can serve as a generalist model that can perform many NLP tasks zero-shot. In this...

10.18653/v1/2023.emnlp-main.85 article EN cc-by Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing 2023-01-01

MixText: Linguistically-Informed Interpolation of Hidden Space for Semi-Supervised Text Classification

Jiaao Chen Zichao Yang Diyi Yang

This paper presents MixText, a semi-supervised learning method for text classification, which uses our newly designed data augmentation method called TMix. TMix creates a large amount of augmented training samples by interpolating text in hidden space. Moreover, we leverage recent advances in data augmentation to guess low-entropy labels for unlabeled data, hence making them as easy to use as labeled data. By mixing labeled, unlabeled and augmented data, MixText significantly outperformed current pre-trained and fine-tuned models and other state-of-the-art semi-supervised learning methods on...

10.18653/v1/2020.acl-main.194 preprint EN cc-by 2020-01-01

Humor Recognition and Humor Anchor Extraction

Diyi Yang Alon Lavie Chris Dyer Eduard Hovy

Humor is an essential component in personal communication. How to create computational models to discover the structures behind humor, recognize humor and even extract humor anchors remains a challenge. In this work, we first identify several semantic structures behind humor and design sets of features for each structure, and next employ a computational approach to recognize humor. Furthermore, we develop a simple and effective method to extract anchors that enable humor in a sentence. Experiments conducted on two datasets demonstrate that our humor recognizer is effective in automatically distinguishing between humorous...

10.18653/v1/d15-1284 article EN cc-by Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing 2015-01-01

Can Large Language Models Transform Computational Social Science?

Caleb Ziems William A. Held Omar Ahmed Shaikh Jiaao Chen Zhehao Zhang and 1 more

Large language models (LLMs) are capable of successfully performing many language processing tasks zero-shot (without training data). If LLMs can also reliably classify and explain social phenomena like persuasiveness and political ideology, then LLMs could augment the computational social science (CSS) pipeline in important ways. This work provides a road map for using LLMs as CSS tools. Towards this end, we contribute a set of prompting best practices and an extensive evaluation to measure the performance of 13 language models on 25...

10.1162/coli_a_00502 article EN cc-by-nc-nd Computational Linguistics 2023-12-12

ToTTo: A Controlled Table-To-Text Generation Dataset

Ankur P. Parikh Xuezhi Wang Sebastian Gehrmann Manaal Faruqui Bhuwan Dhingra and 2 more

Ankur Parikh, Xuezhi Wang, Sebastian Gehrmann, Manaal Faruqui, Bhuwan Dhingra, Diyi Yang, Dipanjan Das. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). 2020.

10.18653/v1/2020.emnlp-main.89 article EN cc-by 2020-01-01

The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics

Sebastian Gehrmann Tosin Adewumi Karmanya Aggarwal Pawan Sasanka Ammanamanchi Anuoluwapo Aremu and 51 more

Sebastian Gehrmann, Tosin Adewumi, Karmanya Aggarwal, Pawan Sasanka Ammanamanchi, Anuoluwapo Aremu, Antoine Bosselut, Khyathi Raghavi Chandu, Miruna-Adriana Clinciu, Dipanjan Das, Kaustubh Dhole, Wanyu Du, Esin Durmus, Ondřej Dušek, Chris Chinenye Emezue, Varun Gangal, Cristina Garbacea, Tatsunori Hashimoto, Yufang Hou, Yacine Jernite, Harsh Jhamtani, Yangfeng Ji, Shailza Jolly, Mihir Kale, Dhruv Kumar, Faisal Ladhak, Aman Madaan, Mounica Maddela, Khyati Mahajan, Saad Mahamood, Bodhisattwa...

10.18653/v1/2021.gem-1.10 preprint EN cc-by 2021-01-01

Using large language models in psychology

Dorottya Demszky Diyi Yang David S. Yeager Christopher J. Bryan Margarett Clapper and 13 more

10.1038/s44159-023-00241-5 article EN Nature Reviews Psychology 2023-10-13

Causal Inference in Natural Language Processing: Estimation, Prediction, Interpretation and Beyond

Amir Feder Katherine A. Keith Emaad Manzoor Reid Pryzant Dhanya Sridhar and 8 more

A fundamental goal of scientific research is to learn about causal relationships. However, despite its critical role in the life and social sciences, causality has not had the same importance in Natural Language Processing (NLP), which has traditionally placed more emphasis on predictive tasks. This distinction is beginning to fade, with an emerging area of interdisciplinary research at the convergence of causal inference and language processing. Still, research on causality in NLP remains scattered across domains without unified definitions, benchmark...

10.1162/tacl_a_00511 article EN cc-by Transactions of the Association for Computational Linguistics 2022-01-01

Evaluating the Effectiveness of Deplatforming as a Moderation Strategy on Twitter

Shagun Jhaver Christian Boylston Diyi Yang Amy Bruckman

Deplatforming refers to the permanent ban of controversial public figures with large followings on social media sites. In recent years, platforms like Facebook, Twitter and YouTube have deplatformed many influencers to curb the spread of offensive speech. We present a case study of three high-profile influencers who were deplatformed on Twitter: Alex Jones, Milo Yiannopoulos, and Owen Benjamin. Working with over 49M tweets, we found that deplatforming significantly reduced the number of conversations about all three individuals on Twitter. Further,...

10.1145/3479525 article EN Proceedings of the ACM on Human-Computer Interaction 2021-10-13

Latent Hatred: A Benchmark for Understanding Implicit Hate Speech

Mai ElSherief Caleb Ziems David Muchlinski Vaishnavi Anupindi Jordyn Seybolt and 2 more

Mai ElSherief, Caleb Ziems, David Muchlinski, Vaishnavi Anupindi, Jordyn Seybolt, Munmun De Choudhury, Diyi Yang. Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. 2021.

10.18653/v1/2021.emnlp-main.29 article EN cc-by Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing 2021-01-01

The Importance of Modeling Social Factors of Language: Theory and Practice

Dirk Hovy Diyi Yang

Natural language processing (NLP) applications are now more powerful and ubiquitous than ever before. With rapidly developing (neural) models and ever-more available data, current NLP models have access to more information than any human speaker during their life. Still, it would be hard to argue that NLP models have reached human-level capacity. In this position paper, we argue that the reason for the current limitations is a focus on information content while ignoring language's social factors. We show that current NLP systems systematically break down when faced with...

10.18653/v1/2021.naacl-main.49 article EN cc-by Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies 2021-01-01

An Empirical Survey of Data Augmentation for Limited Data Learning in NLP

Jiaao Chen Derek Tam Colin Raffel Mohit Bansal Diyi Yang

NLP has achieved great progress in the past decade through the use of neural models and large labeled datasets. The dependence on abundant data prevents NLP models from being applied to low-resource settings or novel tasks where significant time, money, or expertise is required to label massive amounts of textual data. Recently, data augmentation methods have been explored as a means of improving data efficiency in NLP. To date, there has been no systematic empirical overview of data augmentation for NLP in the limited labeled data setting, making it difficult to understand...

10.1162/tacl_a_00542 article EN cc-by Transactions of the Association for Computational Linguistics 2023-01-01

How Climate Movement Actors and News Media Frame Climate Change and Strike: Evidence from Analyzing Twitter and News Media Discourse from 2018 to 2021

Kaiping Chen Amanda L. Molder Zening Duan Shelley Boulianne Christopher Eckart and 2 more

Twitter enables an online public sphere for social movement actors, news organizations, and others to frame climate change and the climate movement. In this paper, we analyze five million English tweets posted from 2018 to 2021, demonstrating how peaks in Twitter activity relate to key events and how the framing of the climate strike discourse has evolved over the past three years. We also collected 30,000 news articles from major news sources in English-speaking countries (Australia, Canada, United States, United Kingdom) to demonstrate how climate movement actors and news media differ in their framing of this issue,...

10.1177/19401612221106405 article EN cc-by-nc The International Journal of Press/Politics 2022-06-19

Graph Vulnerability and Robustness: A Survey

Scott Freitas Diyi Yang Srijan Kumar Hanghang Tong Duen Horng Chau

The study of network robustness is a critical tool in the characterization and sense making of complex interconnected systems such as infrastructure, communication and social networks. While significant research has been conducted in all of these areas, gaps in the surveying literature still exist. Answers to key questions are currently scattered across multiple scientific fields and numerous papers. In this survey, we distill key findings across numerous domains and provide researchers crucial access to important information by (1) summarizing...

10.1109/tkde.2022.3163672 article EN IEEE Transactions on Knowledge and Data Engineering 2022-01-01

Social factors that contribute to attrition in MOOCs

Carolyn Penstein Rosé Ryan G. Carlson Diyi Yang Miaomiao Wen Lauren B. Resnick and 2 more

In this paper, we explore student dropout behavior in a Massively Open Online Course (MOOC). We use a survival model to measure the impact of three social factors that make predictions about attrition along the way for students who have participated in the course discussion forum.

10.1145/2556325.2567879 article EN 2014-02-25

Automatically Neutralizing Subjective Bias in Text

Reid Pryzant Richard Diehl Martinez Nathan Dass Sadao Kurohashi Dan Jurafsky and 1 more

Texts like news, encyclopedias, and some social media strive for objectivity. Yet bias in the form of inappropriate subjectivity (introducing attitudes via framing, presupposing truth, and casting doubt) remains ubiquitous. This kind of bias erodes our collective trust and fuels social conflict. To address this issue, we introduce a novel testbed for natural language generation: automatically bringing inappropriately subjective text into a neutral point of view (“neutralizing” biased text). We also offer the first parallel...

10.1609/aaai.v34i01.5385 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2020-04-03

Multi-View Sequence-to-Sequence Models with Conversational Structure for Abstractive Dialogue Summarization

Jiaao Chen Diyi Yang

Text summarization is one of the most challenging and interesting problems in NLP. Although much attention has been paid to summarizing structured text like news reports or encyclopedia articles, summarizing conversations (an essential part of human-human/machine interaction where the most important pieces of information are scattered across various utterances of different speakers) remains relatively under-investigated. This work proposes a multi-view sequence-to-sequence model by first extracting conversational...

10.18653/v1/2020.emnlp-main.336 article EN cc-by 2020-01-01

Exploring the Effect of Confusion in Discussion Forums of Massive Open Online Courses

Diyi Yang Miaomiao Wen Iris Howley Robert E. Kraut Carolyn Penstein Rosé

Thousands of students enroll in Massive Open Online Courses (MOOCs) to seek opportunities for learning and self-improvement. However, the learning process often involves struggles with confusion, which may have an adverse effect on the course participation experience, leading to dropout along the way. In this paper, we quantify that effect. We describe a classification model using discussion forum behavior and clickstream data to automatically identify posts that express confusion. We then apply survival analysis to quantify the impact...

10.1145/2724660.2724677 article EN 2015-03-09

Linguistic Reflections of Student Engagement in Massive Open Online Courses

Miaomiao Wen Diyi Yang Carolyn Penstein Rosé

While data from Massive Open Online Courses (MOOCs) offers the potential to gain new insights into the ways in which online communities can contribute to student learning, much of the richness of the data trace is still yet to be mined. In particular, very little work has attempted fine-grained content analyses of the student interactions in MOOCs. Survey research indicates the importance of student goals and intentions in keeping them involved in a MOOC over time. Automated fine-grained content analyses offer the potential to detect and monitor evidence of student engagement and how it relates to other aspects of their...

10.1609/icwsm.v8i1.14512 article EN Proceedings of the International AAAI Conference on Web and Social Media 2014-05-16

The Channel Matters

Diyi Yang Zheng Yao Joseph Seering Robert E. Kraut

People with health concerns go to online health support groups to obtain help and advice. To do so, they frequently disclose personal details, many times in public. Although research in non-health settings suggests that people self-disclose less in public than in private, this pattern may not apply to health support groups where people want to get relevant help. Our work examines how the use of private and public channels influences members' self-disclosure in an online cancer support group, and how channels moderate the influence of self-disclosure on reciprocity and receiving support. By automatically measuring...

10.1145/3290605.3300261 article EN 2019-04-29

Seekers, Providers, Welcomers, and Storytellers

Diyi Yang Robert E. Kraut Tenbroeck Smith Elijah Mayfield Dan Jurafsky

Participants in online communities often enact different roles when participating in their communities. For example, some in cancer support communities specialize in providing disease-related information or socializing new members. This work clusters the behavioral patterns of users of a cancer support community into specific functional roles. Based on a series of quantitative and qualitative evaluations, this research identified eleven roles that members occupy, such as welcomer and story sharer. We investigated role dynamics, including how...

10.1145/3290605.3300574 article EN 2019-04-29

Racism is a virus

Bing He Caleb Ziems Sandeep Soni Naren Ramakrishnan Diyi Yang and 1 more

The spread of COVID-19 has sparked racism and hate on social media targeted towards Asian communities. However, little is known about how racial hate spreads during a pandemic and the role of counterspeech in mitigating this spread. In this work, we study the evolution and spread of anti-Asian hate speech through the lens of Twitter. We create COVID-HATE, the largest dataset of anti-Asian hate and counterspeech spanning 14 months, containing over 206 million tweets, and a social network with over 127 million nodes. By creating a novel hand-labeled dataset of 3,355 tweets, we train a text classifier to identify hateful tweets...

10.1145/3487351.3488324 article EN 2021-11-08