- Topic Modeling
- Software Engineering Research
- Natural Language Processing Techniques
- Explainable Artificial Intelligence (XAI)
- Ethics and Social Impacts of AI
- Multimodal Machine Learning Applications
- Scientific Computing and Data Management
- Software Engineering Techniques and Practices
- Data Visualization and Analytics
- Mobile Crowdsensing and Crowdsourcing
- Big Data and Business Intelligence
- Adversarial Robustness in Machine Learning
- Artificial Intelligence in Healthcare and Education
- AI in Service Interactions
- Software System Performance and Reliability
- Semantic Web and Ontologies
- Software Testing and Debugging Techniques
- Text and Document Classification Technologies
- Speech and Dialogue Systems
- Data Quality and Management
- Digital Games and Media
- Spreadsheets and End-User Computing
- Misinformation and Its Impacts
- Consumer Market Behavior and Pricing
- Information Retrieval and Search Behavior
- Carnegie Mellon University (2022-2025)
- University of Washington (2019-2023)
- Administration for Community Living (2023)
- Tokyo Institute of Technology (2023)
- IT University of Copenhagen (2023)
- American Jewish Committee (2023)
- Mongolia International University (2023)
- RIKEN Center for Advanced Intelligence Project (2023)
- Microsoft (United States) (2022)
- University of Notre Dame (2022)
Although measuring held-out accuracy has been the primary approach to evaluate generalization, it often overestimates the performance of NLP models, while alternative approaches for evaluating models either focus on individual tasks or on specific behaviors. Inspired by principles of behavioral testing in software engineering, we introduce CheckList, a task-agnostic methodology for testing NLP models. CheckList includes a matrix of general linguistic capabilities and test types that facilitate comprehensive test ideation, as...
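As a loose illustration of this behavioral-testing idea (a sketch, not CheckList's actual API), an invariance test perturbs inputs in label-preserving ways and flags any prediction change; `predict_sentiment` below is a hypothetical stand-in for whatever model is under test:

```python
# Minimal sketch of a CheckList-style invariance (INV) test.
# `predict_sentiment` is a hypothetical placeholder model; the real
# CheckList library offers richer templating and several test types.

def predict_sentiment(text: str) -> str:
    """Placeholder model: replace with a real classifier."""
    return "negative" if "bad" in text.lower() else "positive"

def invariance_test(pairs):
    """Predictions should not change under label-preserving perturbations."""
    failures = []
    for original, perturbed in pairs:
        if predict_sentiment(original) != predict_sentiment(perturbed):
            failures.append((original, perturbed))
    return failures

# Capability probed here: robustness to irrelevant changes (name swaps,
# innocuous suffixes) that should leave the label untouched.
pairs = [
    ("Amy had a bad flight to Boston.", "Zoe had a bad flight to Boston."),
    ("The food was great.", "The food was great, by the way."),
]
print(invariance_test(pairs))  # non-empty list => failing test cases
```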
Many researchers motivate explainable AI with studies showing that human-AI team performance on decision-making tasks improves when the AI explains its recommendations. However, prior studies observed improvements from explanations only when the AI, alone, outperformed both the human and the best team. Can explanations help lead to complementary performance, where team accuracy is higher than either the human or the AI working solo? We conduct mixed-method user studies on three datasets, where an AI with accuracy comparable to humans helps participants solve a task (explaining itself in...
Although large language models (LLMs) have demonstrated impressive potential on simple tasks, their breadth of scope, lack of transparency, and insufficient controllability can make them less effective when assisting humans on more complex tasks. In response, we introduce the concept of Chaining LLM steps together, where the output of one step becomes the input for the next, thus aggregating the gains per step. We first define a set of LLM primitive operations useful for Chain construction, then present an interactive system...
Tongshuang Wu, Marco Tulio Ribeiro, Jeffrey Heer, Daniel Weld. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). 2021.
While LLMs have made it possible to rapidly prototype new ML functionalities, many real-world applications involve complex tasks that cannot be easily handled via a single run of an LLM. Recent work has found that chaining multiple LLM runs together (with the output of one step being the input to the next) can help users accomplish these more complex tasks, and in a way that is perceived to be more transparent and controllable. However, it remains unknown what users need when authoring their own LLM chains – a key step to lowering the barriers for non-AI-experts...
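A minimal sketch of the chaining idea described in the two abstracts above, assuming a hypothetical `call_llm` wrapper around any text-completion API (the papers' systems add richer primitive operations and interactive authoring on top of this basic pattern):

```python
# Chaining LLM steps: the output of one step becomes the input of the
# next, so each call handles a smaller, more tractable sub-task.

def call_llm(prompt: str) -> str:
    """Hypothetical placeholder for an actual LLM API call."""
    raise NotImplementedError("wire up your LLM client here")

def run_chain(review: str) -> str:
    # Step 1: decompose the complex task (extract discrete complaints).
    points = call_llm(f"List the main complaints in this review:\n{review}")
    # Step 2: feed step 1's output forward (propose one fix per complaint).
    fixes = call_llm(f"Suggest one fix per complaint:\n{points}")
    # Step 3: aggregate the intermediate results into the final response.
    return call_llm(f"Write a polite reply incorporating these fixes:\n{fixes}")
```

Because each intermediate output is visible, a user can inspect and edit the chain mid-way, which is the transparency and controllability benefit both abstracts describe.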
Despite its benefits for children's skill development and parent-child bonding, many parents do not often engage in interactive storytelling by having story-related dialogues with their child due to limited availability or challenges in coming up with appropriate questions. While recent advances have made the automatic generation of questions from stories possible, the fully-automated approach excludes parent involvement, disregards educational goals, and underoptimizes for engagement. Informed by need-finding interviews...
Sensemaking in unfamiliar domains can be challenging, demanding considerable user effort to compare different options with respect to various criteria. Prior research and our formative study found that people would benefit from reading an overview of the information space upfront, including the criteria that others previously found useful. However, existing sensemaking tools struggle with the "cold-start" problem: it not only requires significant input from previous users to generate and share these overviews, but such overviews...
Unlike traditional unsupervised clustering, semi-supervised clustering allows users to provide meaningful structure to the data, which helps the clustering algorithm match the user's intent. Existing approaches require a significant amount of feedback from an expert to improve the clusters. In this paper, we ask whether a large language model (LLM) can amplify an expert's guidance to enable query-efficient, few-shot semi-supervised text clustering. We show that LLMs are surprisingly effective at improving clustering. We explore three stages where LLMs can be...
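One plausible way an LLM could amplify expert guidance before clustering even starts, sketched under assumptions (the `llm_keyphrases` call and the TF-IDF/KMeans pipeline are illustrative stand-ins, not the paper's exact recipe):

```python
# Sketch: expand each document with LLM-generated keyphrases before
# embedding, so the resulting clusters align better with the expert's
# intent without per-document expert labels.

from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

def llm_keyphrases(doc: str) -> str:
    """Hypothetical LLM call returning intent-relevant keyphrases."""
    raise NotImplementedError("replace with a real LLM query")

def cluster_with_llm_hints(docs, k):
    # Append LLM hints to each document, then embed and cluster cheaply.
    expanded = [f"{d} || {llm_keyphrases(d)}" for d in docs]
    X = TfidfVectorizer().fit_transform(expanded)
    return KMeans(n_clusters=k, n_init=10).fit_predict(X)
```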
Though error analysis is crucial to understanding and improving NLP models, the common practice of manual, subjective categorization of a small sample of errors can yield biased and incomplete conclusions. This paper codifies model- and task-agnostic principles for informative error analysis, and presents Errudite, an interactive tool for better supporting this process. First, error groups should be precisely defined for reproducibility; Errudite supports this with an expressive domain-specific language. Second, to avoid spurious...
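To make the first principle concrete, here is a toy version of precisely defined, composable error groups in plain Python; this is not Errudite's actual domain-specific language, only the reproducibility idea it supports:

```python
# Error groups as named, composable predicates over examples, so the
# same analysis can be re-run exactly instead of re-sampled by hand.

def length_over(n):
    return lambda ex: len(ex["question"].split()) > n

def is_wrong(ex):
    return ex["prediction"] != ex["gold"]

def group(examples, *predicates):
    """Return the subset of examples matching every predicate."""
    return [ex for ex in examples if all(p(ex) for p in predicates)]

examples = [
    {"question": "who wrote hamlet", "prediction": "Marlowe", "gold": "Shakespeare"},
    {"question": "capital of france", "prediction": "Paris", "gold": "Paris"},
]
long_errors = group(examples, is_wrong, length_over(2))
print(len(long_errors), "errors in the 'long question' group")
```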
Automatically generated explanations of how machine learning (ML) models reason can help users understand and accept them. However, explanations can have unintended consequences: promoting over-reliance or undermining trust. This paper investigates how explanations shape users' perceptions of ML models with and without the ability to provide feedback to them: (1) does revealing model flaws increase users' desire to "fix" them; (2) does providing explanations cause users to believe - wrongly - that models are introspective, and will thus improve over time. Through two controlled experiments...
Controlled text perturbation is useful for evaluating and improving model generalizability. However, current techniques rely on training a model for every target perturbation, which is expensive and hard to generalize. We present Tailor, a semantically-controlled text generation system. Tailor builds on a pretrained seq2seq model and produces textual outputs conditioned on control codes derived from semantic representations. We craft a set of operations to modify the control codes, which in turn steer generation towards targeted attributes. These operations can be further...
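A toy sketch of the control-code input format such a system might use (the codes and bracket syntax here are assumptions for illustration; Tailor derives its codes from semantic representations and decodes with a trained seq2seq model):

```python
# Sketch: prepend control codes to the source text so a seq2seq model
# can condition generation on them; editing the codes steers the output.

def build_input(control_codes: dict, text: str) -> str:
    """Serialize control codes as a header the model is trained to read."""
    header = " ".join(f"[{k}:{v}]" for k, v in control_codes.items())
    return f"{header} {text}"

# E.g., request a passive-voice perturbation that keeps the agent.
src = build_input({"VOICE": "passive", "AGENT": "keep"},
                  "The committee approved the proposal.")
print(src)  # "[VOICE:passive] [AGENT:keep] The committee approved ..."
# A model fine-tuned on such pairs might decode:
# "The proposal was approved by the committee."
```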
A key challenge to visualization authoring is the process of getting familiar with the complex user interfaces of authoring tools. A Natural Language Interface (NLI) presents promising benefits due to its learnability and usability. However, supporting NLIs for authoring tools requires expertise in natural language processing, while existing NLIs are mostly designed for the visual analytic workflow. In this paper, we propose an authoring-oriented NLI pipeline by introducing a structured representation of users' visualization editing intents, called...
Natural language generation has witnessed significant advancements due to the training of large models on vast internet-scale datasets. Despite these advancements, there exists a critical challenge: these models can inadvertently generate content that is toxic, inaccurate, and unhelpful, and existing automatic evaluation metrics often fall short of identifying these shortcomings. As models become more capable, human feedback is an invaluable signal for evaluating and improving models. This survey aims to provide...
Humans possess an extraordinary ability to create and utilize tools, allowing them to overcome physical limitations and explore new frontiers. With the advent of foundation models, AI systems have the potential to be equally adept in tool use as humans. This paradigm, i.e., tool learning with foundation models, combines the strengths of specialized tools and foundation models to achieve enhanced accuracy, efficiency, and automation in problem-solving. Despite its immense potential, there is still a lack of a comprehensive understanding of key challenges,...
One widely cited barrier to the adoption of LLMs as proxies for humans in subjective tasks is their sensitivity to prompt wording, but interestingly, humans also display sensitivities to instruction changes in the form of response biases. We investigate the extent to which LLMs reflect human response biases, if at all. We look at survey design, where human response biases caused by changes in the wordings of "prompts" have been extensively explored in the social psychology literature. Drawing from these works, we design a dataset and framework to evaluate whether LLMs exhibit...
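A bare-bones sketch of such an evaluation loop, with `ask_llm` as a hypothetical model call (the paper's dataset and framework are far more systematic than this illustration):

```python
# Sketch: present the same survey item under an original and a
# bias-inducing wording, then compare the model's answer distributions.

from collections import Counter

def ask_llm(prompt: str) -> str:
    """Hypothetical LLM call returning one multiple-choice answer."""
    raise NotImplementedError("replace with a real LLM query")

def response_shift(original: str, modified: str, n: int = 50):
    """Sample n answers per wording and tally them for comparison."""
    orig = Counter(ask_llm(original) for _ in range(n))
    mod = Counter(ask_llm(modified) for _ in range(n))
    return orig, mod

# E.g., acquiescence bias: does rephrasing an item into an agree/disagree
# form push answers toward "agree", as it does for human respondents?
```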
The race to train language models on vast, diverse, and inconsistently documented datasets raises pressing legal and ethical concerns. To improve data transparency and understanding, we convene a multi-disciplinary effort between legal and machine learning experts to systematically audit and trace more than 1,800 text datasets. We develop tools and standards to trace the lineage of these datasets, including their source, creators, licenses, and subsequent use. Our landscape analysis highlights sharp divides in the composition...
Ying Xu, Dakuo Wang, Mo Yu, Daniel Ritchie, Bingsheng Yao, Tongshuang Wu, Zheng Zhang, Toby Li, Nora Bradford, Branda Sun, Tran Hoang, Yisi Sang, Yufang Hou, Xiaojuan Ma, Diyi Yang, Nanyun Peng, Zhou Yu, Mark Warschauer. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2022.
Efficiently reviewing scholarly literature and synthesizing prior art are crucial for scientific progress. Yet, the growing scale of publications and the burden of knowledge make synthesis of research threads more challenging than ever. While significant research has been devoted to helping scholars interact with individual papers, building research threads scattered across multiple papers remains a challenge. Most top-down synthesis (and LLMs) make it difficult to personalize and iterate on the output, while bottom-up synthesis is costly in time and effort. Here, we...
Despite a surge in the collection of XAI methods, users still struggle to obtain the AI explanations they require. Previous research suggests chatbots as dynamic solutions, but the effective design of conversational XAI agents for practical human needs remains under-explored. This paper focuses on Conversational XAI for AI-assisted scientific writing tasks. Drawing from linguistic theories and formative studies, we identify four design rationales: "multifaceted", "controllability", "mix-initiative", and "context-aware drill-down"...
AI tools are increasingly deployed in community contexts. However, the datasets used to evaluate AI are typically created by developers and annotators outside a given community, which can yield misleading conclusions about AI performance. How might we empower communities to drive the intentional design and curation of evaluation datasets for the AI that impacts them? We investigate this question on Wikipedia, an online community with multiple AI-based content moderation systems deployed. We introduce Wikibench, a system that enables communities to collaboratively curate...
Prompting LLMs for complex tasks (e.g., building a trip advisor chatbot) requires humans to clearly articulate customized requirements (e.g., "start the response with tl;dr"). However, existing prompt engineering instructions often lack focused training on requirement articulation and instead tend to emphasize increasingly automatable strategies and tricks like adding role-plays and "think step-by-step". To address this gap, we introduce Requirement-Oriented Prompt Engineering (ROPE), a paradigm that focuses human...
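For illustration, a requirement-oriented prompt can simply make every customized requirement explicit and checkable; the template below is an assumption for illustration, not ROPE's actual training material:

```python
# Sketch: requirements stated explicitly, one per line, rather than
# folded into generic prompting tricks. Each line is easy to verify
# against the model's output afterwards.

REQUIREMENTS = [
    "Start the response with 'tl;dr'.",
    "Recommend at most three destinations.",
    "Ask a clarifying question if the budget is missing.",
]

def build_prompt(task: str, user_input: str) -> str:
    reqs = "\n".join(f"- {r}" for r in REQUIREMENTS)
    return f"{task}\n\nRequirements:\n{reqs}\n\nUser: {user_input}"

print(build_prompt("You are a trip advisor chatbot.",
                   "Where should I go in June?"))
```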
Existing question answering (QA) techniques are created mainly to answer questions asked by humans. But in educational applications, teachers often need to decide what questions they should ask, in order to help students improve their narrative understanding capabilities. We design an automated question-answer generation (QAG) system for this education scenario: given a story book at the kindergarten to eighth-grade level as input, our system can automatically generate QA pairs that are capable of testing a variety of dimensions...
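A rough sketch of one possible QAG pipeline shape, generating a question for each extracted answer candidate; both helper functions are hypothetical placeholders rather than the paper's trained modules:

```python
# Sketch: answer-first QAG. Step 1 picks answer-worthy spans from the
# story; step 2 generates a question conditioned on each answer.

def extract_answer_candidates(story: str) -> list[str]:
    """Hypothetical step 1: pick spans (characters, events, causes)."""
    raise NotImplementedError("replace with a trained extractor")

def generate_question(story: str, answer: str) -> str:
    """Hypothetical step 2: generate a question whose answer is `answer`."""
    raise NotImplementedError("replace with a trained generator")

def qag(story: str) -> list[tuple[str, str]]:
    return [(generate_question(story, a), a)
            for a in extract_answer_candidates(story)]
```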
Tools for Interactive Machine Learning (IML) enable end users to update models in a "rapid, focused, and incremental", yet local, manner. In this work, we study the question of local decision making in an IML context around feature selection for a sentiment classification task. Specifically, we characterize the utility of interactive feature selection through a combination of human-subjects experiments and computational simulations. We find that, in expectation, local feature modification fails to improve model performance and may hamper generalization due...
In this paper, we present a novel visual analytics system called NameClarifier to interactively disambiguate author names in publications by keeping humans in the loop. Specifically, NameClarifier quantifies and visualizes the similarities between ambiguous names and those that have been confirmed in digital libraries. The similarities are calculated using three key factors, namely, co-authorships, publication venues, and temporal information. Our system estimates all possible allocations, and then provides visual cues to help users validate every ambiguous case. By...
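For intuition, the three factors could be combined into a single weighted similarity score like the one below; the weights and Jaccard overlaps are illustrative assumptions, not NameClarifier's exact formula:

```python
# Sketch: score two publication records attributed to an ambiguous name
# using co-authorship overlap, venue overlap, and temporal proximity.

def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0

def name_similarity(p1: dict, p2: dict,
                    w_coauthor=0.5, w_venue=0.3, w_time=0.2) -> float:
    time_gap = abs(p1["year"] - p2["year"])
    return (w_coauthor * jaccard(p1["coauthors"], p2["coauthors"])
            + w_venue * jaccard(p1["venues"], p2["venues"])
            + w_time * 1.0 / (1.0 + time_gap))

amb = {"coauthors": {"J. Heer"}, "venues": {"CHI"}, "year": 2021}
conf = {"coauthors": {"J. Heer", "D. Weld"}, "venues": {"CHI"}, "year": 2022}
print(round(name_similarity(amb, conf), 3))  # higher => likelier same person
```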