Tongshuang Wu

ORCID: 0000-0003-1630-0588
Research Areas
  • Topic Modeling
  • Software Engineering Research
  • Natural Language Processing Techniques
  • Explainable Artificial Intelligence (XAI)
  • Ethics and Social Impacts of AI
  • Multimodal Machine Learning Applications
  • Scientific Computing and Data Management
  • Software Engineering Techniques and Practices
  • Data Visualization and Analytics
  • Mobile Crowdsensing and Crowdsourcing
  • Big Data and Business Intelligence
  • Adversarial Robustness in Machine Learning
  • Artificial Intelligence in Healthcare and Education
  • AI in Service Interactions
  • Software System Performance and Reliability
  • Semantic Web and Ontologies
  • Software Testing and Debugging Techniques
  • Text and Document Classification Technologies
  • Speech and Dialogue Systems
  • Data Quality and Management
  • Digital Games and Media
  • Spreadsheets and End-User Computing
  • Misinformation and Its Impacts
  • Consumer Market Behavior and Pricing
  • Information Retrieval and Search Behavior

Carnegie Mellon University
2022-2025

University of Washington
2019-2023

Administration for Community Living
2023

Tokyo Institute of Technology
2023

IT University of Copenhagen
2023

American Jewish Committee
2023

Mongolia International University
2023

RIKEN Center for Advanced Intelligence Project
2023

Microsoft (United States)
2022

University of Notre Dame
2022

Although measuring held-out accuracy has been the primary approach to evaluate generalization, it often overestimates the performance of NLP models, while alternative approaches for evaluating models either focus on individual tasks or on specific behaviors. Inspired by principles of behavioral testing in software engineering, we introduce CheckList, a task-agnostic methodology for testing NLP models. CheckList includes a matrix of general linguistic capabilities and test types that facilitate comprehensive test ideation, as...

10.18653/v1/2020.acl-main.442 article EN cc-by 2020-01-01
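
To make the behavioral-testing idea concrete, here is a minimal sketch of a CheckList-style invariance test. The toy `predict_sentiment` model (with a deliberately planted bias) and the template are illustrative assumptions, not the paper's released library.

```python
# Sketch of a CheckList-style invariance test (INV): perturbing a
# prediction-irrelevant attribute (a person's name) should not flip the label.
# `predict_sentiment` is a toy stand-in with a planted bias for demonstration.

def predict_sentiment(text: str) -> str:
    """Toy model under test; real usage would wrap an actual NLP model."""
    if "bad" in text or "Omar" in text:  # planted name bias
        return "negative"
    return "positive"

def invariance_test(template: str, names: list[str]) -> list[str]:
    """Return the names whose substitution changes the model's prediction."""
    baseline = predict_sentiment(template.format(name=names[0]))
    return [n for n in names[1:]
            if predict_sentiment(template.format(name=n)) != baseline]

# One capability (fairness) crossed with one test type (invariance):
print(invariance_test("{name} ordered a coffee.", ["Mary", "Omar", "Wei"]))
# -> ['Omar']: a failure that held-out accuracy alone could easily miss.
```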

Many researchers motivate explainable AI with studies showing that human-AI team performance on decision-making tasks improves when the AI explains its recommendations. However, prior studies observed improvements from explanations only when the AI, alone, outperformed both the human and the best human-AI team. Can explanations help lead to complementary performance, where team accuracy is higher than either the human or the AI working solo? We conduct mixed-method user studies on three datasets, where an AI with accuracy comparable to that of humans helps participants solve a task (explaining itself in...

10.1145/3411764.3445717 article EN 2021-05-06

Although large language models (LLMs) have demonstrated impressive potential on simple tasks, their breadth of scope, lack of transparency, and insufficient controllability can make them less effective when assisting humans on more complex tasks. In response, we introduce the concept of Chaining LLM steps together, where the output of one step becomes the input for the next, thus aggregating the gains per step. We first define a set of LLM primitive operations useful for Chain construction, then present an interactive system...

10.1145/3491102.3517582 article EN CHI Conference on Human Factors in Computing Systems 2022-04-28
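
The core mechanic, passing one step's output into the next step's input, can be sketched in a few lines. `call_llm` below is a hypothetical placeholder for any LLM API, and the three-step chain is an invented example, not the paper's interactive system.

```python
# Minimal sketch of LLM Chaining: each step's output feeds the next step,
# aggregating gains per step. All names here are illustrative stand-ins.

def call_llm(prompt: str) -> str:
    """Placeholder: swap in any real LLM API call."""
    return f"<response to: {prompt[:40]}...>"

def run_chain(steps: list[str], user_input: str) -> str:
    """Run primitive operations in sequence, threading output to input."""
    text = user_input
    for instruction in steps:
        text = call_llm(f"{instruction}\n\nInput: {text}")
    return text

review_chain = [
    "Split this peer review into separate problems.",  # decompose
    "Suggest a concrete fix for each problem.",        # ideate
    "Rewrite the fixes as one friendly paragraph.",    # compose
]
print(run_chain(review_chain, "The writing is unclear and figures are small."))
```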

Tongshuang Wu, Marco Tulio Ribeiro, Jeffrey Heer, Daniel Weld. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). 2021.

10.18653/v1/2021.acl-long.523 article EN cc-by 2021-01-01

While LLMs have made it possible to rapidly prototype new ML functionalities, many real-world applications involve complex tasks that cannot be easily handled via a single run of an LLM. Recent work has found that chaining multiple LLM runs together (with the output of one step being the input to the next) can help users accomplish these more complex tasks, and in a way that is perceived to be more transparent and controllable. However, it remains unknown what users need when authoring their own LLM chains – a key step for lowering the barriers for non-AI-experts...

10.1145/3491101.3519729 article EN CHI Conference on Human Factors in Computing Systems Extended Abstracts 2022-04-27

Despite its benefits for children's skill development and parent-child bonding, many parents do not often engage in interactive storytelling by having story-related dialogues with their child due to limited availability or challenges in coming up with appropriate questions. While recent advances have made AI generation of questions from stories possible, the fully-automated approach excludes parent involvement, disregards educational goals, and under-optimizes for engagement. Informed by need-finding interviews...

10.1145/3491102.3517479 article EN CHI Conference on Human Factors in Computing Systems 2022-04-28

Sensemaking in unfamiliar domains can be challenging, demanding considerable user effort to compare different options with respect to various criteria. Prior research and our formative study found that people would benefit from reading an overview of the information space upfront, including the criteria that others previously found useful. However, existing sensemaking tools struggle with the "cold-start" problem — it not only requires significant input from previous users to generate and share these overviews, but such overviews...

10.1145/3613904.3642149 article EN cc-by-nc-sa 2024-05-11

Unlike traditional unsupervised clustering, semi-supervised clustering allows users to provide meaningful structure to the data, which helps the clustering algorithm match the user's intent. Existing approaches require a significant amount of feedback from an expert to improve the clusters. In this paper, we ask whether a large language model (LLM) can amplify an expert's guidance to enable query-efficient, few-shot text clustering. We show that LLMs are surprisingly effective at improving clustering. We explore three stages where LLMs can be...

10.1162/tacl_a_00648 article EN cc-by Transactions of the Association for Computational Linguistics 2024-01-01
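
As one hedged illustration of how an LLM might amplify sparse expert guidance (the paper studies three distinct stages; this loosely resembles post-hoc correction), consider spending a small query budget on pairwise LLM judgments. `llm_same_topic` and the exemplar texts are invented stand-ins, not the paper's prompts.

```python
# Sketch: use an LLM as a pairwise oracle to correct cluster assignments
# under a small query budget (query-efficient, few-shot guidance).

def llm_same_topic(a: str, b: str) -> bool:
    """Stand-in for an LLM query: 'Are these two texts about the same topic?'"""
    return bool(set(a.lower().split()) & set(b.lower().split()))  # toy heuristic

def correct_clusters(texts, labels, exemplars, budget=5):
    """Re-check a few points against each cluster's expert-chosen exemplar;
    reassign on a positive oracle answer."""
    labels = list(labels)
    for i, text in enumerate(texts[:budget]):
        for cluster, exemplar in exemplars.items():
            if llm_same_topic(text, exemplar):
                labels[i] = cluster
                break
    return labels

texts = ["refund my order", "app crashes on login", "order arrived late"]
print(correct_clusters(texts, [0, 0, 1], {0: "order problem", 1: "software bug"}))
```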

Though error analysis is crucial to understanding and improving NLP models, the common practice of manual, subjective categorization of a small sample of errors can yield biased and incomplete conclusions. This paper codifies model- and task-agnostic principles for informative error analysis, and presents Errudite, an interactive tool for better supporting this process. First, error groups should be precisely defined for reproducibility; Errudite supports this with an expressive domain-specific language. Second, to avoid spurious...

10.18653/v1/p19-1073 article EN cc-by 2019-01-01
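
A minimal sketch of the first principle above, precisely defined and reproducible error groups, expressed as plain predicates applied to every example rather than a hand-picked sample. Errudite's actual domain-specific language is far more expressive, and the toy data here is invented.

```python
# Sketch: define an error group as an explicit, rerunnable predicate,
# then measure the error rate over the whole group.

examples = [
    {"question": "Who wrote Hamlet?", "length": 3, "correct": True},
    {"question": "What year did the war that preceded it end?", "length": 9,
     "correct": False},
    {"question": "Where is the Nile?", "length": 4, "correct": False},
]

# A precisely defined group: long questions (>= 8 tokens). Anyone can rerun it.
long_questions = [ex for ex in examples if ex["length"] >= 8]

error_rate = sum(not ex["correct"] for ex in long_questions) / len(long_questions)
print(f"group size: {len(long_questions)}, error rate: {error_rate:.0%}")
```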

Automatically generated explanations of how machine learning (ML) models reason can help users understand and accept them. However, explanations can have unintended consequences: promoting over-reliance or undermining trust. This paper investigates how explanations shape users' perceptions of ML models with and without the ability to provide feedback to them: (1) does revealing model flaws increase users' desire to "fix" them; (2) does providing explanations cause users to believe - wrongly - that models are introspective, and will thus improve over time. Through two controlled experiments...

10.1145/3313831.3376624 article EN 2020-04-21

Controlled text perturbation is useful for evaluating and improving model generalizability. However, current techniques rely on training a model for every target perturbation, which is expensive and hard to generalize. We present Tailor, a semantically-controlled text generation system. Tailor builds on a pretrained seq2seq model and produces textual outputs conditioned on control codes derived from semantic representations. We craft a set of operations to modify the control codes, which in turn steer generation towards targeted attributes. These operations can be further...

10.18653/v1/2022.acl-long.228 article EN cc-by Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 2022-01-01
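
A rough sketch of the operation layer described above: control codes derived from semantic roles are edited by primitive operations, and the edited codes steer generation. The `generate` stub and the field names are assumptions; the real Tailor conditions a pretrained seq2seq model on such codes.

```python
# Sketch: one primitive operation over semantic control codes, which then
# condition generation toward a targeted attribute (here, passive voice).

def generate(codes: dict, text: str) -> str:
    """Stand-in for the seq2seq generator conditioned on control codes."""
    return f"[{codes}] -> perturbation of: {text}"

def swap_voice(codes: dict) -> dict:
    """Primitive operation: steer generation between active and passive."""
    return {**codes,
            "voice": "passive" if codes["voice"] == "active" else "active"}

codes = {"verb": "eat", "agent": "the cat", "patient": "the fish",
         "voice": "active"}
print(generate(swap_voice(codes), "The cat ate the fish."))
```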

A key challenge to visualization authoring is the process of getting familiar with the complex user interfaces of authoring tools. A Natural Language Interface (NLI) presents promising benefits due to its learnability and usability. However, supporting NLIs for authoring tools requires expertise in natural language processing, while existing NLIs are mostly designed for the visual analytic workflow. In this paper, we propose an authoring-oriented NLI pipeline by introducing a structured representation of users' editing intents, called...

10.1109/tvcg.2022.3209357 article EN IEEE Transactions on Visualization and Computer Graphics 2022-01-01

Natural language generation has witnessed significant advancements due to the training of large language models on vast internet-scale datasets. Despite these advancements, there exists a critical challenge: these models can inadvertently generate content that is toxic, inaccurate, and unhelpful, and existing automatic evaluation metrics often fall short of identifying these shortcomings. As models become more capable, human feedback is an invaluable signal for evaluating and improving models. This survey aims to provide...

10.1162/tacl_a_00626 article EN cc-by Transactions of the Association for Computational Linguistics 2023-01-01

Humans possess an extraordinary ability to create and utilize tools, allowing them to overcome physical limitations and explore new frontiers. With the advent of foundation models, AI systems have the potential to be equally adept in tool use as humans. This paradigm, i.e., tool learning with foundation models, combines the strengths of specialized tools and foundation models to achieve enhanced accuracy, efficiency, and automation in problem-solving. Despite its immense potential, there is still a lack of a comprehensive understanding of key challenges,...

10.48550/arxiv.2304.08354 preprint EN other-oa arXiv (Cornell University) 2023-01-01

One widely cited barrier to the adoption of LLMs as proxies for humans in subjective tasks is their sensitivity to prompt wording—but interestingly, humans also display sensitivities to instruction changes in the form of response biases. We investigate the extent to which LLMs reflect human response biases, if at all. We look at survey design, where human response biases caused by changes in the wordings of “prompts” have been extensively explored in the social psychology literature. Drawing from these works, we design a dataset and framework to evaluate whether LLMs exhibit...

10.1162/tacl_a_00685 article EN cc-by Transactions of the Association for Computational Linguistics 2024-01-01
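
The evaluation idea can be approximated as paired prompting: ask the same survey question in an original and a bias-modified form, then compare the answer distributions. `query_model` and the counts below are fabricated stand-ins for an actual LLM; the paper's bias types and dataset are broader.

```python
# Sketch: measure a response-bias shift by comparing answer distributions
# under an original wording vs. an acquiescence-style rewording.

from collections import Counter

def query_model(prompt: str, n: int = 50) -> Counter:
    """Stand-in: sample n answers from an LLM for a survey question."""
    biased = "agree" in prompt.lower()  # fabricated behavior for illustration
    return Counter({"agree": 35 if biased else 25,
                    "disagree": 15 if biased else 25})

original = query_model("Do you favor or oppose the policy?")
modified = query_model("Do you agree that the policy is good?")  # biased form
shift = (modified["agree"] - original["agree"]) / 50
print(f"Shift toward 'agree' under the biased wording: {shift:+.0%}")
```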

The race to train language models on vast, diverse and inconsistently documented datasets raises pressing legal and ethical concerns. To improve data transparency and understanding, we convene a multi-disciplinary effort between legal and machine learning experts to systematically audit and trace more than 1,800 text datasets. We develop tools and standards to trace the lineage of these datasets, including their source, creators, licences and subsequent use. Our landscape analysis highlights sharp divides in the composition...

10.1038/s42256-024-00878-8 article EN cc-by Nature Machine Intelligence 2024-08-30

Ying Xu, Dakuo Wang, Mo Yu, Daniel Ritchie, Bingsheng Yao, Tongshuang Wu, Zheng Zhang, Toby Li, Nora Bradford, Branda Sun, Tran Hoang, Yisi Sang, Yufang Hou, Xiaojuan Ma, Diyi Yang, Nanyun Peng, Zhou Yu, Mark Warschauer. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2022.

10.18653/v1/2022.acl-long.34 article EN cc-by Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 2022-01-01

Efficiently reviewing scholarly literature and synthesizing prior art are crucial for scientific progress. Yet, the growing scale of publications and the burden of knowledge make synthesis of research threads more challenging than ever. While significant research has been devoted to helping scholars interact with individual papers, building research threads scattered across multiple papers remains a challenge. Most top-down synthesis (and LLMs) make it difficult to personalize and iterate on the output, while bottom-up synthesis is costly in time and effort. Here, we...

10.1145/3586183.3606759 preprint EN cc-by 2023-10-21

Despite a surge of XAI methods, users still struggle to obtain the AI explanations they require. Previous research suggests chatbots as dynamic solutions, but the effective design of conversational XAI agents for practical human needs remains under-explored. This paper focuses on Conversational XAI for AI-assisted scientific writing tasks. Drawing from linguistic theories and formative studies, we identify four design rationales: "multifaceted", "controllability", "mix-initiative", and "context-aware drill-down"...

10.1145/3584931.3607492 article EN 2023-10-13

AI tools are increasingly deployed in community contexts. However, the datasets used to evaluate AI are typically created by developers and annotators outside a given community, which can yield misleading conclusions about AI performance. How might we empower communities to drive the intentional design and curation of evaluation datasets for the AI that impacts them? We investigate this question on Wikipedia, an online community with multiple AI-based content moderation tools deployed. We introduce Wikibench, a system that enables communities to collaboratively curate...

10.1145/3613904.3642278 preprint EN cc-by-sa 2024-05-11

Prompting LLMs for complex tasks (e.g., building a trip advisor chatbot) requires humans to clearly articulate customized requirements (e.g., “start the response with tl;dr”). However, existing prompt engineering instructions often lack focused training on requirement articulation and instead tend to emphasize increasingly automatable strategies and tricks, like adding role-plays or “think step-by-step”. To address this gap, we introduce Requirement-Oriented Prompt Engineering (ROPE), a paradigm that focuses human...

10.1145/3731756 article EN ACM Transactions on Computer-Human Interaction 2025-04-24
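
To illustrate requirement articulation (as opposed to tricks like role-play), here is a hypothetical requirement-oriented prompt built around the paper's trip-advisor example; the specific requirement texts are invented, and ROPE itself is a training paradigm, not this snippet.

```python
# Sketch: state requirements explicitly and individually, so each one can
# later be checked against model outputs. All requirement texts are invented.

requirements = [
    "Start every response with 'tl;dr:' followed by a one-sentence summary.",
    "Recommend at most three destinations per reply.",
    "Politely decline questions unrelated to travel.",
]

prompt = "You are a trip advisor chatbot.\n" + "\n".join(
    f"Requirement {i + 1}: {r}" for i, r in enumerate(requirements)
)
print(prompt)  # each numbered requirement doubles as a testable criterion
```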

Existing question answering (QA) techniques are created mainly to answer questions asked by humans. But in educational applications, teachers often need to decide what questions they should ask, in order to help students improve their narrative understanding capabilities. We design an automated question-answer generation (QAG) system for this education scenario: given a story book at the kindergarten to eighth-grade level as input, our system can automatically generate QA pairs that are capable of testing a variety of dimensions...

10.18653/v1/2022.acl-long.54 article EN cc-by Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 2022-01-01

Tools for Interactive Machine Learning (IML) enable end users to update models in a “rapid, focused, and incremental”—yet local—manner. In this work, we study the question of local decision making in an IML context around feature selection for a sentiment classification task. Specifically, we characterize the utility of interactive feature selection through a combination of human-subjects experiments and computational simulations. We find that, in expectation, local feature modification fails to improve model performance and may hamper generalization due...

10.1145/3319616 article EN ACM Transactions on Computer-Human Interaction 2019-06-17

In this paper, we present a novel visual analytics system called NameClarifier to interactively disambiguate author names in publications by keeping humans in the loop. Specifically, NameClarifier quantifies and visualizes the similarities between ambiguous names and those that have been confirmed in digital libraries. The similarities are calculated using three key factors, namely, co-authorships, publication venues, and temporal information. Our system estimates all possible allocations, and then provides visual cues to users to help them validate every case. By...

10.1109/tvcg.2016.2598465 article EN IEEE Transactions on Visualization and Computer Graphics 2016-08-05
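
A speculative sketch of the three-factor similarity: co-authorships and venues via set overlap, plus a temporal-proximity term. The Jaccard choice, the weights, and the five-year window are assumptions for illustration; NameClarifier computes and visualizes such similarities inside an interactive system rather than exposing a single function.

```python
# Sketch: score how likely two ambiguous author records refer to one person,
# combining co-authorship, venue, and temporal evidence. Weights are invented.

def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0

def name_similarity(p: dict, q: dict, weights=(0.5, 0.3, 0.2)) -> float:
    w_co, w_venue, w_time = weights
    temporal = 1.0 if abs(p["year"] - q["year"]) <= 5 else 0.0  # assumed window
    return (w_co * jaccard(p["coauthors"], q["coauthors"])
            + w_venue * jaccard(p["venues"], q["venues"])
            + w_time * temporal)

a = {"coauthors": {"J. Heer", "D. Weld"}, "venues": {"ACL"}, "year": 2019}
b = {"coauthors": {"D. Weld"}, "venues": {"ACL", "CHI"}, "year": 2021}
print(f"similarity: {name_similarity(a, b):.2f}")  # -> similarity: 0.60
```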