- Topic Modeling
- Natural Language Processing Techniques
- Multimodal Machine Learning Applications
- Software Engineering Research
- Speech and Dialogue Systems
- Big Data and Business Intelligence
- Explainable Artificial Intelligence (XAI)
- Semantic Web and Ontologies
- Machine Learning and Algorithms
- Advanced Text Analysis Techniques
- Text and Document Classification Technologies
- Biomedical Text Mining and Ontologies
- Sleep and Work-Related Fatigue
- Artificial Intelligence in Law
- Text Readability and Simplification
- Artificial Intelligence in Games
- Domain Adaptation and Few-Shot Learning
- Robotics and Automated Systems
- Ferroelectric and Negative Capacitance Devices
- Computability, Logic, AI Algorithms
- Web Data Mining and Analysis
- Ergonomics and Musculoskeletal Disorders
- Cognitive Science and Education Research
- Personal Information Management and User Behavior
- AI-based Problem Solving and Planning
George Mason University
2021-2024
Bridge University
2023
The Ohio State University
2018-2021
Moscow Institute of Physics and Technology
2021
Rensselaer Polytechnic Institute
2021
RheinMain University of Applied Sciences
2021
Moscow State University
2021
Lomonosov Moscow State University
2021
ITMO University
2021
Qingdao University
2021
Tianbao Xie, Chen Henry Wu, Peng Shi, Ruiqi Zhong, Torsten Scholak, Michihiro Yasunaga, Chien-Sheng Wu, Ming Zhong, Pengcheng Yin, Sida I. Wang, Victor Zhong, Bailin Wang, Chengzu Li, Connor Boyle, Ansong Ni, Ziyu Yao, Dragomir Radev, Caiming Xiong, Lingpeng Kong, Rui Zhang, Noah A. Smith, Luke Zettlemoyer, Tao Yu. Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing. 2022.
Ziyu Yao, Yu Su, Huan Sun, Wen-tau Yih. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). 2019.
Large language models (LLMs) have revolutionized zero-shot task performance, mitigating the need for task-specific annotations while enhancing generalizability. Despite these advancements, current methods using trigger phrases such as ``Let's think step by step'' remain limited. This study introduces PRomPTed, an approach that optimizes prompts for individual task instances following the innovative manner of ``LLMs in the loop''. Our comprehensive evaluation across 13 datasets and 10 task types based on GPT-4...
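The "LLMs in the loop" idea above can be sketched as a per-instance loop in which one model solves, a checker judges the answer, and a second model rewrites the failing prompt. This is a minimal hedged sketch, not the paper's method: `solver`, `optimizer`, and `checker` are hypothetical stand-ins for LLM calls, replaced here by toy functions so the loop is runnable.

```python
# Hedged sketch of per-instance prompt optimization ("LLMs in the loop").
# solver/optimizer/checker are hypothetical stand-ins for LLM calls.

def optimize_prompt(task_input, solver, optimizer, checker, max_rounds=3):
    """Iteratively rewrite the prompt for ONE instance until the answer checks out."""
    prompt = f"Solve: {task_input}"
    for _ in range(max_rounds):
        answer = solver(prompt)
        if checker(task_input, answer):   # e.g., a verifier LLM in the real setting
            return prompt, answer
        # A second LLM inspects the failure and proposes a revised prompt.
        prompt = optimizer(prompt, answer)
    return prompt, answer

# Toy stand-ins: the solver only succeeds once the prompt asks for steps.
solver = lambda p: "42" if "step by step" in p else "?"
optimizer = lambda p, a: p + " Think step by step."
checker = lambda x, a: a != "?"

prompt, answer = optimize_prompt("6 * 7", solver, optimizer, checker)
print(answer)  # "42" after one prompt rewrite
```

The key contrast with a fixed trigger phrase is that the rewrite is conditioned on the specific failing instance and answer, not applied uniformly.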
To accelerate software development, much research has been performed to help people understand and reuse the huge amount of available code resources. Two important tasks have been widely studied: code retrieval, which aims to retrieve code snippets relevant to a given natural language query from a code base, and code annotation, where the goal is to annotate a code snippet with a natural language description. Despite their advancement in recent years, the two tasks have mostly been explored separately. In this work, we investigate a novel perspective of Code annotation for...
Stack Overflow (SO) has been a great source of natural language questions and their code solutions (i.e., question-code pairs), which are critical for many tasks including code retrieval and annotation. In most existing research, question-code pairs were collected heuristically and tend to have low quality. In this paper, we investigate a new problem of systematically mining question-code pairs from Stack Overflow (in contrast to heuristically collecting them). It is formulated as predicting whether or not a code snippet is a standalone solution to a question. We propose a novel Bi-View...
This paper investigates a new task named Conversational Question Generation (CQG), which is to generate a question based on a passage and a conversation history (i.e., previous turns of question-answer pairs). CQG is crucial for developing intelligent agents that can drive question-answering style conversations or test user understanding of a given passage. Towards this end, we propose a new approach named Reinforced Dynamic Reasoning network, which is based on the general encoder-decoder framework but incorporates a reasoning procedure in...
Given a text description, most existing semantic parsers synthesize a program in one shot. However, it is quite challenging to produce a correct program solely based on the description, which in reality is often ambiguous or incomplete. In this paper, we investigate interactive semantic parsing, where the agent can ask the user clarification questions to resolve ambiguities via multi-turn dialogue, on an important type of programs called "If-Then recipes." We develop a hierarchical reinforcement learning (HRL) based agent that significantly improves the parsing...
The planning ability of Large Language Models (LLMs) has garnered increasing attention in recent years due to their remarkable capacity for multi-step reasoning and their ability to generalize across a wide range of domains. While some researchers emphasize the potential of LLMs to perform complex planning tasks, others highlight significant limitations in their performance, particularly when these models are tasked with handling the intricacies of long-horizon reasoning. In this survey, we critically investigate existing research on the use...
Despite the widely successful applications, bootstrapping and fine-tuning semantic parsers are still a tedious process with challenges such as costly data annotation and privacy risks. In this paper, we suggest an alternative, human-in-the-loop methodology for learning semantic parsers directly from users. A parser should be introspective of its uncertainties and prompt the user for demonstrations when uncertain. In doing so, it also gets to imitate the user behavior and continue improving itself autonomously, with the hope that eventually it may become...
Clinical question answering (QA) aims to automatically answer questions from medical professionals based on clinical texts. Studies show that neural QA models trained on one corpus may not generalize well to new clinical texts from a different institute or patient group, where large-scale QA pairs are not readily available for model retraining. To address this challenge, we propose a simple yet effective framework, CliniQG4QA, which leverages question generation (QG) to synthesize QA pairs on new clinical contexts and boosts QA performance without requiring manual...
This paper presents a significant contribution to the field of repetitive action counting through the introduction of a new approach called Pose Saliency Representation. The proposed method efficiently represents each action using only two salient poses instead of redundant frames, which significantly reduces the computational cost while improving performance. Moreover, we introduce a pose-level method, PoseRAC, which is based on this representation and achieves state-of-the-art performance on new version datasets by Annotation...
Multinomial Naive Bayes with Expectation Maximization (MNB-EM) is a standard semi-supervised learning method to augment Multinomial Naive Bayes (MNB) for text classification. Despite its success, MNB-EM is not stable, and may succeed or fail to improve MNB. We believe that this is because MNB-EM lacks the ability to preserve the class distribution on words. In this paper, we propose a novel method by leveraging a word-level statistical constraint. The constraints are further converted to constraints on document posteriors generated by MNB-EM. Experiments demonstrate our method can...
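The MNB-EM baseline mentioned above has a compact form: fit Multinomial Naive Bayes on labeled word-count vectors, then alternate an E-step (class posteriors for unlabeled documents) with an M-step (refit MNB on the soft labels). This is a minimal runnable sketch on toy data, not the paper's constrained variant; the data and setup are hypothetical.

```python
import numpy as np

# Minimal sketch of semi-supervised MNB-EM on toy word-count data.

def fit_mnb(X, R, alpha=1.0):
    """X: (n_docs, n_words) counts; R: (n_docs, n_classes) soft labels."""
    prior = R.sum(axis=0) / R.sum()                 # class priors
    word_counts = R.T @ X + alpha                   # Laplace-smoothed counts per class
    cond = word_counts / word_counts.sum(axis=1, keepdims=True)
    return np.log(prior), np.log(cond)

def posteriors(X, log_prior, log_cond):
    log_joint = X @ log_cond.T + log_prior          # log P(c) + sum_w n_w log P(w|c)
    log_joint -= log_joint.max(axis=1, keepdims=True)
    p = np.exp(log_joint)
    return p / p.sum(axis=1, keepdims=True)

# Labeled docs (rows = counts over a 4-word vocabulary), two classes.
X_lab = np.array([[3, 0, 1, 0], [2, 1, 0, 0], [0, 0, 2, 3], [0, 1, 1, 2]], float)
R_lab = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], float)
X_unl = np.array([[4, 1, 0, 0], [0, 0, 3, 2]], float)

log_prior, log_cond = fit_mnb(X_lab, R_lab)
for _ in range(5):                                  # EM iterations
    R_unl = posteriors(X_unl, log_prior, log_cond)  # E-step: soft labels
    log_prior, log_cond = fit_mnb(np.vstack([X_lab, X_unl]),
                                  np.vstack([R_lab, R_unl]))  # M-step: refit
pred = posteriors(X_unl, log_prior, log_cond).argmax(axis=1)
print(pred)  # [0 1]
```

The instability the abstract refers to arises in this loop: nothing constrains the refit word distributions, so EM can drift away from the labeled-data class distribution, which motivates adding word-level constraints.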
We designed and realized highly fluorescent nanostructures composed of Eu
Compositional and domain generalization present significant challenges in semantic parsing, even for state-of-the-art parsers based on pre-trained language models (LMs). In this study, we empirically investigate improving an LM's generalization in semantic parsing with two simple techniques: at the token level, we introduce a preprocessing method to preserve the boundaries of tokens produced by LM tokenizers; at the sequence level, we propose to use special tokens to mark the components aligned between the input and output. Our experimental results on text-to-SQL...
Augmented Language Models (ALMs) empower large language models with the ability to use tools, transforming them into intelligent agents for real-world interactions. However, most existing frameworks for ALMs are, to varying degrees, deficient in the following critical features: flexible customization, collaborative democratization, and holistic evaluation. We present gentopia, an ALM framework enabling flexible customization of agents through simple configurations, seamlessly integrating various language models, task formats,...
Synthesizing QA pairs with a question generator (QG) on the target domain has become a popular approach for the domain adaptation of question answering (QA) models. Since the synthetic questions are often noisy in practice, existing work adapts scores from a pretrained QA (or QG) model as criteria to select high-quality questions. However, these scores do not directly serve the ultimate goal of improving QA performance on the target domain. In this paper, we introduce a novel idea of training a question value estimator (QVE) that estimates the usefulness of target-domain...
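The selection step described above amounts to scoring each synthetic QA pair and keeping the highest-value subset. A minimal sketch under stated assumptions: `value_fn` stands in for a trained value estimator, replaced here by a trivial length heuristic so the code runs; all names and data are hypothetical.

```python
# Hedged sketch of value-based filtering of synthetic QA pairs.
# value_fn is a stand-in for a trained question value estimator (QVE).

def select_by_value(pairs, value_fn, keep_ratio=0.5):
    """Keep the top keep_ratio fraction of (question, answer) pairs by estimated value."""
    scored = sorted(pairs, key=value_fn, reverse=True)
    k = max(1, int(len(scored) * keep_ratio))
    return scored[:k]

# Toy stand-in: value = question length (a real QVE would be a learned model).
pairs = [("what is bp?", "120/80"),
         ("bp?", "120/80"),
         ("what was the patient's blood pressure on admission?", "120/80")]
kept = select_by_value(pairs, value_fn=lambda qa: len(qa[0]), keep_ratio=0.67)
print(len(kept))  # 2
```

The abstract's point is that the scoring function should be trained against downstream QA improvement rather than borrowed from a pretrained QA or QG model; only the filtering scaffold is shown here.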
Structured knowledge grounding (SKG) leverages structured knowledge to complete user requests, such as semantic parsing over databases and question answering over knowledge bases. Since the inputs and outputs of SKG tasks are heterogeneous, they have been studied separately by different communities, which limits systematic and compatible research on SKG. In this paper, we overcome this limitation by proposing the UnifiedSKG framework, which unifies 21 SKG tasks into a text-to-text format, aiming to promote systematic SKG research, instead of being exclusive to a single task,...
While large language models (LLMs) have demonstrated strong capability in structured prediction tasks such as semantic parsing, little research has explored the underlying mechanisms of their success. Our work studies different methods for explaining an LLM-based semantic parser and qualitatively discusses the explained model behaviors, hoping to inspire future research toward a better understanding of them.
Interactive semantic parsing based on natural language (NL) feedback, where users provide feedback to correct parser mistakes, has emerged as a more practical scenario than traditional one-shot semantic parsing. However, prior work has heavily relied on human-annotated feedback data to train the interactive parser, which is prohibitively expensive and not scalable. In this work, we propose a new task of simulating NL feedback for interactive semantic parsing. We accompany the task with a novel feedback evaluator. The evaluator is specifically designed to assess the quality of the simulated...