Ziyu Yao

ORCID: 0009-0007-4571-3505
Research Areas
  • Topic Modeling
  • Natural Language Processing Techniques
  • Multimodal Machine Learning Applications
  • Software Engineering Research
  • Speech and dialogue systems
  • Big Data and Business Intelligence
  • Explainable Artificial Intelligence (XAI)
  • Semantic Web and Ontologies
  • Machine Learning and Algorithms
  • Advanced Text Analysis Techniques
  • Text and Document Classification Technologies
  • Biomedical Text Mining and Ontologies
  • Sleep and Work-Related Fatigue
  • Artificial Intelligence in Law
  • Text Readability and Simplification
  • Artificial Intelligence in Games
  • Domain Adaptation and Few-Shot Learning
  • Robotics and Automated Systems
  • Ferroelectric and Negative Capacitance Devices
  • Computability, Logic, AI Algorithms
  • Web Data Mining and Analysis
  • Ergonomics and Musculoskeletal Disorders
  • Cognitive Science and Education Research
  • Personal Information Management and User Behavior
  • AI-based Problem Solving and Planning

George Mason University
2021-2024

Bridge University
2023

The Ohio State University
2018-2021

Moscow Institute of Physics and Technology
2021

Rensselaer Polytechnic Institute
2021

RheinMain University of Applied Sciences
2021

Moscow State University
2021

Lomonosov Moscow State University
2021

ITMO University
2021

Qingdao University
2021

Tianbao Xie, Chen Henry Wu, Peng Shi, Ruiqi Zhong, Torsten Scholak, Michihiro Yasunaga, Chien-Sheng Wu, Ming Zhong, Pengcheng Yin, Sida I. Wang, Victor Zhong, Bailin Wang, Chengzu Li, Connor Boyle, Ansong Ni, Ziyu Yao, Dragomir Radev, Caiming Xiong, Lingpeng Kong, Rui Zhang, Noah A. Smith, Luke Zettlemoyer, Tao Yu. Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing. 2022.

10.18653/v1/2022.emnlp-main.39 article EN cc-by Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing 2022-01-01

Ziyu Yao, Yu Su, Huan Sun, Wen-tau Yih. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). 2019.

10.18653/v1/d19-1547 article EN cc-by 2019-01-01

Large language models (LLMs) have revolutionized zero-shot task performance, mitigating the need for task-specific annotations while enhancing generalizability. Despite these advancements, current methods using trigger phrases such as ``Let's think step by step'' remain limited. This study introduces PRomPTed, an approach that optimizes prompts for individual task instances following an innovative manner of ``LLMs in the loop''. Our comprehensive evaluation across 13 datasets and 10 task types based on GPT-4...

10.48550/arxiv.2310.02107 preprint EN other-oa arXiv (Cornell University) 2023-01-01
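
The per-instance rewriting loop described above can be sketched in a few lines. The sketch below is a minimal, assumption-laden illustration of the "LLMs in the loop" pattern: a second LLM call critiques the current prompt given the output it produced and proposes an instance-specific rewrite. `call_llm`, the prompt wording, and the round count are hypothetical stand-ins, not the paper's implementation.

```python
def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in any chat-completion client here")

def prompted_zero_shot(task_input: str, rounds: int = 2) -> str:
    prompt = f"Solve the following task.\n\nTask: {task_input}"
    answer = call_llm(prompt)
    for _ in range(rounds):
        # A "meta" LLM call inspects the current prompt and the answer it
        # produced, then rewrites the prompt for this specific instance.
        prompt = call_llm(
            "You are optimizing a prompt for one specific task instance.\n"
            f"Instance: {task_input}\n"
            f"Current prompt: {prompt}\n"
            f"Output it produced: {answer}\n"
            "Rewrite the prompt so the output improves. Return only the new prompt."
        )
        answer = call_llm(prompt)
    return answer
```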

To accelerate software development, much research has been performed to help people understand and reuse the huge amount of available code resources. Two important tasks have been widely studied: code retrieval, which aims to retrieve code snippets relevant to a given natural language query from a code base, and code annotation, where the goal is to annotate a code snippet with a natural language description. Despite their advancement in recent years, the two tasks are mostly explored separately. In this work, we investigate a novel perspective of code annotation for...

10.1145/3308558.3313632 preprint EN 2019-05-13
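
One way annotation can serve retrieval, consistent with the perspective above, is to index each snippet by a (possibly generated) description and match queries against descriptions as well as code. The toy sketch below uses bag-of-words overlap purely for illustration; the corpus entries and scoring are invented, not the paper's method.

```python
def tokenize(text: str) -> set[str]:
    return set(text.lower().split())

def retrieve(query: str, corpus: list[dict]) -> list[dict]:
    q = tokenize(query)
    def score(entry: dict) -> tuple[int, int]:
        # Rank by overlap with the annotation first, the raw code second.
        return (len(q & tokenize(entry["description"])),
                len(q & tokenize(entry["code"])))
    return sorted(corpus, key=score, reverse=True)

corpus = [
    {"code": "sorted(xs, key=len)", "description": "sort a list of strings by length"},
    {"code": "collections.Counter(xs)", "description": "count element frequencies in a list"},
]
print(retrieve("how to sort strings by their length", corpus)[0]["code"])
```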

Stack Overflow (SO) has been a great source of natural language questions and their code solutions (i.e., question-code pairs), which are critical for many tasks including code retrieval and annotation. In most existing research, question-code pairs were collected heuristically and tend to have low quality. In this paper, we investigate a new problem of systematically mining question-code pairs from Stack Overflow (in contrast to heuristically collecting them). It is formulated as predicting whether or not a code snippet is a standalone solution to a question. We propose a novel Bi-View...

10.1145/3178876.3186081 preprint EN 2018-01-01
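
The two-view structure of the task lends itself to a model with one encoder per view. The PyTorch sketch below is a deliberately minimal stand-in: bags of embeddings replace the paper's sequence encoders, and the architecture is illustrative rather than the published Bi-View model.

```python
import torch
import torch.nn as nn

class BiViewClassifier(nn.Module):
    """Toy two-view classifier: question text view + code view -> P(standalone)."""
    def __init__(self, vocab_size: int, dim: int = 64):
        super().__init__()
        self.text_emb = nn.EmbeddingBag(vocab_size, dim)   # natural language view
        self.code_emb = nn.EmbeddingBag(vocab_size, dim)   # code view
        self.clf = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, 1))

    def forward(self, text_ids: torch.Tensor, code_ids: torch.Tensor) -> torch.Tensor:
        joint = torch.cat([self.text_emb(text_ids), self.code_emb(code_ids)], dim=-1)
        return torch.sigmoid(self.clf(joint)).squeeze(-1)  # solution probability

model = BiViewClassifier(vocab_size=10_000)
prob = model(torch.randint(0, 10_000, (1, 12)), torch.randint(0, 10_000, (1, 30)))
```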

This paper investigates a new task named Conversational Question Generation (CQG), which is to generate a question based on a passage and a conversation history (i.e., previous turns of question-answer pairs). CQG is crucial for developing intelligent agents that can drive question-answering style conversations or test user understanding of a given passage. Towards this end, we propose a new approach named Reinforced Dynamic Reasoning network, which is based on the general encoder-decoder framework but incorporates a reasoning procedure in...

10.18653/v1/p19-1203 preprint EN cc-by 2019-01-01
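
As a small sketch of the task setup (not the paper's model), the passage and the previous question-answer turns can be linearized into one sequence for an encoder-decoder to condition on. The separator tokens below are invented for illustration.

```python
def build_cqg_input(passage: str, history: list[tuple[str, str]]) -> str:
    turns = " ".join(f"<q> {q} <a> {a}" for q, a in history)
    return f"<passage> {passage} <history> {turns}"

print(build_cqg_input(
    "Marie Curie won two Nobel Prizes.",
    [("Who was Marie Curie?", "A physicist and chemist.")],
))
```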

Given a text description, most existing semantic parsers synthesize a program in one shot. However, it is quite challenging to produce a correct program solely based on the description, which in reality is often ambiguous or incomplete. In this paper, we investigate interactive semantic parsing, where the agent can ask the user clarification questions to resolve ambiguities via multi-turn dialogue, on an important type of programs called "If-Then recipes." We develop a hierarchical reinforcement learning (HRL) based agent that significantly improves the parsing...

10.1609/aaai.v33i01.33012547 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2019-07-17
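
A schematic sketch of the interaction loop: walk over the four components of an If-Then recipe (trigger/action channel and function) and, per component, either commit to the parser's prediction or ask the user when confidence is low. `predict` and `ask_user` are hypothetical placeholders, and the fixed threshold stands in for the ask-or-commit policy that the paper instead learns with hierarchical RL.

```python
COMPONENTS = ["trigger_channel", "trigger_function", "action_channel", "action_function"]

def predict(description: str, component: str) -> tuple[str, float]:
    raise NotImplementedError  # base parser: returns (predicted value, confidence)

def ask_user(component: str) -> str:
    raise NotImplementedError  # pose a clarification question to the user

def parse_interactively(description: str, threshold: float = 0.85) -> dict:
    recipe = {}
    for component in COMPONENTS:
        value, confidence = predict(description, component)
        # Commit when confident; otherwise resolve the ambiguity via dialogue.
        recipe[component] = value if confidence >= threshold else ask_user(component)
    return recipe
```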

The planning ability of Large Language Models (LLMs) has garnered increasing attention in recent years due to their remarkable capacity for multi-step reasoning and their ability to generalize across a wide range of domains. While some researchers emphasize the potential of LLMs to perform complex planning tasks, others highlight significant limitations in their performance, particularly when these models are tasked with handling the intricacies of long-horizon reasoning. In this survey, we critically investigate existing research on the use...

10.48550/arxiv.2502.12435 preprint EN arXiv (Cornell University) 2025-02-17

Despite the widely successful applications, bootstrapping and fine-tuning semantic parsers are still a tedious process with challenges such as costly data annotation and privacy risks. In this paper, we suggest an alternative, human-in-the-loop methodology for learning semantic parsers directly from users. A semantic parser should be introspective of its uncertainties and prompt for user demonstrations when uncertain. In doing so, it also gets to imitate the user behavior and continue improving itself autonomously, with the hope that eventually it may become...

10.18653/v1/2020.emnlp-main.559 article EN cc-by 2020-01-01
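
The learn-from-users loop can be sketched as uncertainty-triggered demonstration collection feeding periodic retraining. All three callables below are hypothetical stand-ins for the paper's components; the threshold and buffer size are arbitrary.

```python
def parse_with_confidence(utterance: str) -> tuple[str, float]:
    raise NotImplementedError  # base parser returning (parse, confidence)

def solicit_demonstration(utterance: str) -> str:
    raise NotImplementedError  # ask the user to demonstrate the intended parse

def retrain(examples: list[tuple[str, str]]) -> None:
    raise NotImplementedError  # imitation learning on collected demonstrations

buffer: list[tuple[str, str]] = []

def handle_utterance(utterance: str, threshold: float = 0.9) -> str:
    parse, confidence = parse_with_confidence(utterance)
    if confidence < threshold:
        parse = solicit_demonstration(utterance)   # defer to the user when unsure
        buffer.append((utterance, parse))
    if len(buffer) >= 100:                         # periodically imitate user behavior
        retrain(buffer)
        buffer.clear()
    return parse
```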

Clinical question answering (QA) aims to automatically answer questions from medical professionals based on clinical texts. Studies show that neural QA models trained on one corpus may not generalize well to new clinical texts from a different institute or patient group, where large-scale QA pairs are not readily available for model retraining. To address this challenge, we propose a simple yet effective framework, CliniQG4QA, which leverages question generation (QG) to synthesize QA pairs on new clinical contexts and boosts QA models without requiring manual...

10.1109/bibm52615.2021.9669300 article EN 2021 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) 2021-12-09
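
The pipeline shape of this idea is simple: run a question generator over unlabeled target-domain contexts, then train the QA model on the synthetic pairs. In the sketch below, `generate_questions` is a placeholder for the QG component; the answer-span extraction it implies is likewise out of scope here.

```python
def generate_questions(context: str) -> list[tuple[str, str]]:
    raise NotImplementedError  # QG model producing (question, answer) pairs

def synthesize_training_data(contexts: list[str]) -> list[dict]:
    data = []
    for context in contexts:
        for question, answer in generate_questions(context):
            data.append({"context": context, "question": question, "answer": answer})
    return data

# qa_model.train(synthesize_training_data(new_clinical_notes))  # then adapt the QA model
```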

This paper presents a significant contribution to the field of repetitive action counting through the introduction of a new approach called Pose Saliency Representation. The proposed method efficiently represents each action using only two salient poses instead of redundant frames, which significantly reduces the computational cost while improving performance. Moreover, we introduce a pose-level method, PoseRAC, which is based on this representation and achieves state-of-the-art performance on two new version datasets by using Pose Saliency Annotation...

10.48550/arxiv.2303.08450 preprint EN other-oa arXiv (Cornell University) 2023-01-01
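
The counting step of a two-salient-pose representation reduces to a small state machine: classify each frame as the action's "start" pose, "end" pose, or neither, and count one repetition per full start-to-end-and-back cycle. The sketch below shows only that counting logic; the per-frame pose classifier, which is the substance of PoseRAC, is assumed.

```python
def count_repetitions(frame_labels: list[str]) -> int:
    count, state = 0, "start"
    for label in frame_labels:
        if state == "start" and label == "end":
            state = "end"                          # reached the second salient pose
        elif state == "end" and label == "start":
            state, count = "start", count + 1      # completed one full repetition
    return count

print(count_repetitions(["start", "none", "end", "none", "start", "end", "start"]))  # 2
```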

Multinomial Naive Bayes with Expectation Maximization (MNB-EM) is a standard semi-supervised learning method to augment Multinomial Naive Bayes (MNB) for text classification. Despite its success, MNB-EM is not stable, and may succeed or fail to improve MNB. We believe that this is because it lacks the ability to preserve the class distribution on words. In this paper, we propose a novel method to improve MNB-EM by leveraging word-level statistical constraints to preserve the class distribution on words. The word-level constraints are further converted into constraints on document posteriors generated by MNB-EM. Experiments demonstrate that our method can...

10.1609/aaai.v30i1.10345 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2016-03-05
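
For context, here is a sketch of the plain MNB-EM baseline that the paper stabilizes: fit MNB on labeled documents, then alternate between labeling the unlabeled documents (shown here as the simpler hard-EM variant; true EM weights documents by soft class posteriors) and refitting on everything. The paper's word-level constraints are deliberately omitted. Inputs are assumed to be dense count matrices.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

def mnb_em(X_lab, y_lab, X_unlab, iterations: int = 10) -> MultinomialNB:
    model = MultinomialNB().fit(X_lab, y_lab)
    X_all = np.vstack([X_lab, X_unlab])
    for _ in range(iterations):
        pseudo = model.predict(X_unlab)              # E-step (hard-label variant)
        y_all = np.concatenate([y_lab, pseudo])
        model = MultinomialNB().fit(X_all, y_all)    # M-step
    return model
```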

Compositional and domain generalization present significant challenges in semantic parsing, even for state-of-the-art semantic parsers based on pre-trained language models (LMs). In this study, we empirically investigate improving an LM's generalizability in semantic parsing with two simple techniques: at the token level, we introduce a token preprocessing method to preserve the semantic boundaries of tokens produced by LM tokenizers; at the sequence level, we propose to use special tokens to mark the components aligned between the input and output. Our experimental results on text-to-SQL...

10.18653/v1/2023.acl-short.15 article EN cc-by 2023-01-01
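
Illustrative versions of the two techniques follow; the exact rules and marker tokens are assumptions, not the paper's specification. Token-level: break schema names at underscores and camelCase boundaries so the tokenizer cannot merge across semantic boundaries. Sequence-level: wrap components that align between input and output (e.g., a mentioned table name) with special markers.

```python
import re

def preprocess_token(name: str) -> str:
    name = name.replace("_", " _ ")                       # expose underscore boundaries
    return re.sub(r"(?<=[a-z])(?=[A-Z])", " ", name)      # split camelCase boundaries

def mark_component(text: str, component: str) -> str:
    return text.replace(component, f"<c> {component} </c>")

print(preprocess_token("departmentId"))                   # "department Id"
print(mark_component("how many heads of departments", "departments"))
```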

Augmented Language Models (ALMs) empower large language models with the ability to use tools, transforming them into intelligent agents for real-world interactions. However, most existing frameworks for ALMs, to varying degrees, are deficient in the following critical features: flexible customization, collaborative democratization, and holistic evaluation. We present gentopia, an ALM framework enabling flexible customization of agents through simple configurations, seamlessly integrating various language models, task formats,...

10.48550/arxiv.2308.04030 preprint EN cc-by arXiv (Cornell University) 2023-01-01
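
The general pattern gentopia builds on is configuration-driven agent assembly: an agent is declared as data (model, prompt, tools) and instantiated from that declaration. The schema, `Tool`, `Agent`, and `build_agent` below are all invented for illustration and are not gentopia's actual API.

```python
class Tool:                     # placeholder plugin type
    def __init__(self, name: str):
        self.name = name

class Agent:                    # placeholder agent type
    def __init__(self, model: str, prompt: str, tools: list):
        self.model, self.prompt, self.tools = model, prompt, tools

TOOL_REGISTRY = {"calculator": Tool("calculator"), "python_interpreter": Tool("python_interpreter")}

agent_config = {
    "name": "math_helper",
    "model": "gpt-4",
    "prompt": "You are a careful math assistant.",
    "tools": ["calculator", "python_interpreter"],
}

def build_agent(config: dict) -> Agent:
    # Assemble an agent purely from its declarative configuration.
    return Agent(config["model"], config["prompt"],
                 [TOOL_REGISTRY[name] for name in config["tools"]])
```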

Synthesizing QA pairs with a question generator (QG) on the target domain has become a popular approach for domain adaptation of question answering (QA) models. Since synthetic questions are often noisy in practice, existing work adapts scores from a pretrained QA (or QG) model as criteria to select high-quality questions. However, these scores do not directly serve the ultimate goal of improving QA performance on the target domain. In this paper, we introduce a novel idea of training a question value estimator (QVE) that estimates the usefulness of target-domain...

10.18653/v1/2022.acl-long.95 article EN cc-by Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 2022-01-01
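
Once a value estimator exists, selection is just ranking: score every synthetic pair and keep the top k for training. `qve_score` below stands in for the trained estimator, which the paper optimizes for downstream QA gains rather than surface quality; the field names are illustrative.

```python
def qve_score(question: str, answer: str, context: str) -> float:
    raise NotImplementedError  # trained question value estimator (placeholder)

def select_top_k(synthetic_pairs: list[dict], k: int) -> list[dict]:
    ranked = sorted(
        synthetic_pairs,
        key=lambda p: qve_score(p["question"], p["answer"], p["context"]),
        reverse=True,
    )
    return ranked[:k]          # keep only the most useful synthetic QA pairs
```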

Structured knowledge grounding (SKG) leverages structured knowledge to complete user requests, such as semantic parsing over databases and question answering over knowledge bases. Since the inputs and outputs of SKG tasks are heterogeneous, they have been studied separately by different communities, which limits systematic and compatible research on SKG. In this paper, we overcome this limitation by proposing the UnifiedSKG framework, which unifies 21 SKG tasks into a text-to-text format, aiming to promote systematic SKG research, instead of being exclusive to a single task,...

10.48550/arxiv.2201.05966 preprint EN cc-by arXiv (Cornell University) 2022-01-01
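
The core move in a text-to-text unification is linearization: flatten a structured input (here, a table plus a question) into one string a single seq2seq model can consume alongside other SKG tasks. The separator format below is illustrative, not the framework's exact serialization.

```python
def linearize_table_qa(question: str, table: dict) -> str:
    header = " | ".join(table["columns"])
    rows = " ".join("[row] " + " | ".join(map(str, r)) for r in table["rows"])
    return f"question: {question} table: [header] {header} {rows}"

table = {"columns": ["city", "population"], "rows": [["Fairfax", 24146]]}
print(linearize_table_qa("Which city is listed?", table))
# question: Which city is listed? table: [header] city | population [row] Fairfax | 24146
```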

While large language models (LLMs) have demonstrated strong capability in structured prediction tasks such as semantic parsing, little research has explored the underlying mechanisms of their success. Our work studies different methods for explaining an LLM-based semantic parser and qualitatively discusses the explained model behaviors, hoping to inspire future research toward better understanding of them.

10.1609/aaai.v37i13.27014 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2023-06-26
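
As one generic example of the kind of explanation method such studies compare (chosen here for brevity, not necessarily one the paper uses), occlusion attributes importance to each input token by deleting it and measuring how much the model's score for the produced parse drops. `score_parse` is a hypothetical stand-in for the LLM's log-probability of the output.

```python
def score_parse(tokens: list[str], parse: str) -> float:
    raise NotImplementedError  # e.g., the LLM's log-probability of `parse`

def occlusion_attribution(tokens: list[str], parse: str) -> list[float]:
    base = score_parse(tokens, parse)
    # A larger drop when token i is removed means token i mattered more.
    return [base - score_parse(tokens[:i] + tokens[i + 1:], parse)
            for i in range(len(tokens))]
```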

Interactive semantic parsing based on natural language (NL) feedback, where users provide feedback to correct the parser's mistakes, has emerged as a more practical scenario than traditional one-shot semantic parsing. However, prior work has heavily relied on human-annotated feedback data to train the interactive semantic parser, which is prohibitively expensive and not scalable. In this work, we propose a new task of simulating NL feedback for interactive semantic parsing. We accompany this task with a novel feedback evaluator. The evaluator is specifically designed to assess the quality of the simulated...

10.18653/v1/2023.acl-long.177 article EN cc-by 2023-01-01
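
A toy baseline for the simulation task: diff the predicted parse against the gold one and verbalize the discrepancies as feedback a user might give. Real simulators, and the paper's evaluator that judges simulated feedback quality, are far richer; the field names here are invented.

```python
def simulate_feedback(predicted: dict, gold: dict) -> str:
    corrections = [
        f"the {field} should be \"{value}\", not \"{predicted.get(field)}\""
        for field, value in gold.items()
        if predicted.get(field) != value
    ]
    return "; ".join(corrections) if corrections else "looks correct"

print(simulate_feedback(
    {"column": "name", "aggregation": "count"},
    {"column": "name", "aggregation": "max"},
))  # the aggregation should be "max", not "count"
```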