Xiachong Feng

ORCID: 0000-0002-4761-7484
Research Areas
  • Topic Modeling
  • Natural Language Processing Techniques
  • Advanced Text Analysis Techniques
  • Speech and dialogue systems
  • Biomedical Text Mining and Ontologies
  • Recommender Systems and Techniques
  • Multimodal Machine Learning Applications
  • Data Quality and Management
  • EEG and Brain-Computer Interfaces
  • Text Readability and Simplification
  • Artificial Intelligence in Games
  • Energy Harvesting in Wireless Networks
  • Handwritten Text Recognition Techniques
  • Advanced Memory and Neural Computing
  • Transplantation: Methods and Outcomes
  • Chromosomal and Genetic Variations
  • Blind Source Separation Techniques
  • Open Education and E-Learning
  • Multi-Agent Systems and Negotiation
  • Language and cultural evolution
  • Industrial Vision Systems and Defect Detection
  • Surface Roughness and Optical Measurements
  • Privacy-Preserving Technologies in Data
  • Domain Adaptation and Few-Shot Learning
  • Visual and Cognitive Learning Processes

University of Hong Kong
2024

Harbin Institute of Technology
2018-2023

Peng Cheng Laboratory
2023

Recently, various neural encoder-decoder models pioneered by the Seq2Seq framework have been proposed to achieve the goal of generating more abstractive summaries by learning to map input text to output text. At a high level, such models can freely generate summaries without any constraint on the words or phrases used. Moreover, their format is closer to human-edited summaries and the output is more readable and fluent. However, the neural model's abstraction ability is a double-edged sword. A commonly observed problem with the generated summaries is the distortion or fabrication of factual information...

10.48550/arxiv.2104.14839 preprint EN cc-by arXiv (Cornell University) 2021-01-01
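
The entry above turns on the encoder-decoder paradigm it surveys. As a minimal, illustrative sketch (not the survey's own code), the following runs abstractive summarization with an off-the-shelf Seq2Seq checkpoint; the model name and input text are placeholders chosen for the example:

```python
# Minimal sketch of abstractive summarization with a neural
# encoder-decoder (Seq2Seq) model, as surveyed in the entry above.
# The checkpoint is illustrative; any abstractive summarizer works.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "facebook/bart-large-cnn"  # assumption: example checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

article = ("The committee met on Monday to discuss the budget. "
           "After a long debate, members approved a reduced proposal.")

inputs = tokenizer(article, return_tensors="pt", truncation=True)
# The decoder generates freely, with no constraint on words or phrases --
# the abstraction ability the survey calls a double-edged sword.
summary_ids = model.generate(**inputs, max_new_tokens=60, num_beams=4)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```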

Xiachong Feng, Xiaocheng Feng, Libo Qin, Bing Qin, Ting Liu. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). 2021.

10.18653/v1/2021.acl-long.117 article EN cc-by 2021-01-01

Meeting summarization is a challenging task due to its dynamic interaction nature among multiple speakers and the lack of sufficient training data. Existing methods view the meeting as a linear sequence of utterances while ignoring the diverse relations between each utterance. Besides, the limited labeled data further hinders the ability of data-hungry neural models. In this paper, we try to mitigate the above challenges by introducing dialogue-discourse relations. First, we present a Dialogue Discourse-Aware Summarizer...

10.24963/ijcai.2021/524 article EN 2021-08-01
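
To make the "diverse relations between utterances" above concrete, here is a toy sketch of a meeting encoded as a relation-typed discourse graph rather than a flat sequence; the utterances, relation labels, and graph layout are invented for illustration and are not the paper's actual implementation:

```python
# Sketch: a meeting as a discourse graph instead of a linear sequence.
# Relation labels follow common dialogue-discourse schemes; the exact
# inventory used by the summarizer above may differ.
from collections import defaultdict

utterances = [
    "We need a new remote control design.",    # u0
    "What about adding voice control?",        # u1, elaborates u0
    "Voice control raises the unit cost.",     # u2, contrasts u1
]

# Edges: (source utterance, target utterance, discourse relation)
relations = [(0, 1, "Elaboration"), (1, 2, "Contrast")]

graph = defaultdict(list)
for src, tgt, rel in relations:
    graph[src].append((tgt, rel))            # relation-typed edge
    graph[tgt].append((src, rel + "-inv"))   # reverse edge for message passing

# A graph encoder would aggregate neighbor states along typed edges here,
# instead of reading the utterances strictly left to right.
for node, edges in sorted(graph.items()):
    print(utterances[node], "->", edges)
```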

Dialogue summarization aims to condense the original dialogue into a shorter version covering salient information, which is a crucial way to reduce dialogue data overload. Recently, promising achievements in both dialogue systems and natural language generation techniques have drastically led this task to a new landscape, resulting in significant research attention. However, there still remains a lack of a comprehensive survey for this task. To this end, we take the first step and present a thorough review of this field carefully and widely. In detail,...

10.24963/ijcai.2022/764 article EN Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence 2022-07-01

Neural networks have been widely used for high resource language (e.g. English) named entity recognition (NER) and have shown state-of-the-art results. However, for low resource languages, such as Dutch and Spanish, due to the limitation of resources and the lack of annotated data, taggers tend to have lower performances. To narrow this gap, we propose three novel strategies to enrich the semantic representations of low resource languages: we first develop neural networks to improve low resource word representations by knowledge transfer from the high resource language using bilingual lexicons. Further, a lexicon extension...

10.24963/ijcai.2018/566 article EN 2018-07-01
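
A minimal sketch of the bilingual-lexicon transfer idea described above, using toy vectors; the words, embeddings, and fallback logic are assumptions for illustration, not the paper's actual networks:

```python
# Sketch: enriching low-resource word representations by transfer from a
# high-resource language through a bilingual lexicon.
import numpy as np

english_emb = {"city": np.array([0.7, 0.1]), "river": np.array([0.2, 0.9])}
lexicon = {"stad": "city", "rivier": "river"}  # toy Dutch -> English lexicon

def transfer_embedding(dutch_word, dutch_emb=None):
    """Fall back to the translated English vector when the
    low-resource embedding is missing or unreliable."""
    if dutch_emb is not None:
        return dutch_emb
    translation = lexicon.get(dutch_word)
    if translation is not None:
        return english_emb[translation]   # borrow the high-resource vector
    return np.zeros(2)                    # unknown word

print(transfer_embedding("stad"))  # borrows the vector for "city"
```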

While large language models (LLMs) like ChatGPT have shown impressive capabilities in Natural Language Processing (NLP) tasks, a systematic investigation of their potential in this field remains largely unexplored. This study aims to address this gap by exploring the following questions: (1) How are LLMs currently applied to NLP tasks in the literature? (2) Have traditional NLP tasks already been solved with LLMs? (3) What is the future of LLMs for NLP? To answer these questions, we take the first step to provide a comprehensive overview...

10.48550/arxiv.2405.12819 preprint EN arXiv (Cornell University) 2024-05-21

Ensuring contextual faithfulness in retrieval-augmented large language models (LLMs) is crucial for building trustworthy information-seeking systems, particularly in long-form question-answering (LFQA) scenarios. In this work, we identify a salient correlation between LFQA faithfulness and retrieval heads, a set of attention heads responsible for retrieving contextual information. Leveraging this insight, we propose RHIO, a framework designed to teach LLMs to explicitly discriminate between faithful and unfaithful generations. RHIO first augments...

10.48550/arxiv.2501.13573 preprint EN arXiv (Cornell University) 2025-01-23
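
To illustrate what a "retrieval head" measurement could look like, the sketch below scores each attention head by the attention mass that generated tokens place on the retrieved context; the tensors are random toys, and the detection procedure actually used by RHIO may differ:

```python
# Sketch: scoring attention heads by how much attention mass they place
# on the retrieved context during generation -- one simple way to look
# for "retrieval heads". Not the paper's exact procedure.
import torch

def retrieval_scores(attentions, context_len):
    """attentions: list over layers of tensors shaped
    (batch, heads, query_len, key_len) from a decoding pass."""
    scores = []
    for layer_attn in attentions:
        # attention mass from generated positions onto context tokens
        mass = layer_attn[:, :, context_len:, :context_len].sum(-1).mean((0, 2))
        scores.append(mass)          # one score per head in this layer
    return torch.stack(scores)       # (layers, heads)

# Toy tensors: 2 layers, 4 heads, 10 positions of which the first 6 are context.
attn = [torch.softmax(torch.randn(1, 4, 10, 10), dim=-1) for _ in range(2)]
print(retrieval_scores(attn, context_len=6))
```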

Recent Large Vision-Language Models (LVLMs) have shown promising reasoning capabilities on text-rich images from charts, tables, and documents. However, the abundant text within such images may increase the model's sensitivity to language. This raises the need to evaluate LVLM performance on cross-lingual text-rich visual inputs, where the language in the image differs from that of the instructions. To address this, we introduce XT-VQA (Cross-Lingual Text-Rich Visual Question Answering), a benchmark designed to assess how LVLMs handle...

10.1609/aaai.v39i9.33049 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2025-04-11

Dialogue summarization aims to condense the original dialogue into a shorter version covering salient information, which is a crucial way to reduce dialogue data overload. Recently, promising achievements in both dialogue systems and natural language generation techniques have drastically led this task to a new landscape, resulting in significant research attention. However, there still remains a lack of a comprehensive survey for this task. To this end, we take the first step and present a thorough review of this field carefully and widely. In detail,...

10.48550/arxiv.2107.03175 preprint EN other-oa arXiv (Cornell University) 2021-01-01

Abstractive dialogue summarization is the task of capturing the highlights of a dialogue and rewriting them into a concise version. In this paper, we present a novel multi-speaker dialogue summarizer to demonstrate how large-scale commonsense knowledge can facilitate dialogue understanding and summary generation. In detail, we consider utterance and commonsense knowledge as two different types of data and design a Dialogue Heterogeneous Graph Network (D-HGN) for modeling both information. Meanwhile, we also add speakers as heterogeneous nodes to facilitate information flow. Experimental...

10.48550/arxiv.2010.10044 preprint EN other-oa arXiv (Cornell University) 2020-01-01
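
The heterogeneous-graph idea can be pictured with a toy node and edge listing; the node types follow the abstract (utterance, speaker, commonsense knowledge), while the concrete texts, concepts, and edge labels are invented for illustration:

```python
# Sketch: a heterogeneous graph with utterance, speaker, and commonsense
# nodes in the spirit of D-HGN. Node and edge types are illustrative.
nodes = {
    "u1": {"type": "utterance", "text": "I'm driving to the airport."},
    "u2": {"type": "utterance", "text": "Have a safe flight!"},
    "s_amy": {"type": "speaker", "name": "Amy"},
    "k_travel": {"type": "knowledge", "concept": "travel"},  # commonsense node
}

edges = [
    ("s_amy", "u1", "speaks"),     # speaker nodes route cross-utterance info
    ("u1", "k_travel", "evokes"),  # commonsense bridges related utterances
    ("u2", "k_travel", "evokes"),
]

# A heterogeneous GNN would learn separate transforms per node/edge type;
# here we only show which neighbors each node would aggregate from.
for src, tgt, etype in edges:
    print(f"{nodes[src]['type']:>9} --{etype}--> {nodes[tgt]['type']}")
```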

Dialogue summarization helps users capture salient information from various types of dialogues and has received much attention recently. However, current works mainly focus on English dialogue summarization, leaving other languages less well explored. Therefore, we present a multi-lingual dialogue summarization dataset, namely MSAMSum, which covers dialogue-summary pairs in six languages. Specifically, we derive MSAMSum from the standard SAMSum using sophisticated translation techniques and further employ two methods to...

10.18653/v1/2022.dialdoc-1.1 article EN cc-by 2022-01-01

Electroencephalography-to-Text generation (EEG-to-Text), which aims to directly generate natural text from EEG signals, has drawn increasing attention in recent years due to the enormous potential for brain-computer interfaces. However, the remarkable discrepancy between the subject-dependent EEG representation and the semantic-dependent text representation poses a great challenge to this task. To mitigate this, we devise a Curriculum Semantic-aware Contrastive Learning strategy (C-SCL), which effectively recalibrates the subject-dependent EEG representation,...

10.1109/tnsre.2023.3314642 article EN cc-by IEEE Transactions on Neural Systems and Rehabilitation Engineering 2023-01-01
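
The recalibration above rests on contrastive alignment between EEG and text representations. Below is a generic InfoNCE-style loss as a sketch of that mechanism; the dimensions are toy values, and C-SCL's curriculum scheduling of easy-to-hard pairs is noted only in the comments:

```python
# Sketch: InfoNCE-style contrastive loss pulling EEG representations
# toward text representations of the same sample. C-SCL additionally
# orders training pairs from easy to hard (the curriculum part).
import torch
import torch.nn.functional as F

def info_nce(eeg, text, temperature=0.07):
    eeg = F.normalize(eeg, dim=-1)
    text = F.normalize(text, dim=-1)
    logits = eeg @ text.t() / temperature   # (batch, batch) similarities
    targets = torch.arange(eeg.size(0))     # positives sit on the diagonal
    return F.cross_entropy(logits, targets)

eeg_repr = torch.randn(8, 128)    # toy subject-dependent EEG features
text_repr = torch.randn(8, 128)   # toy semantic text features
print(info_nce(eeg_repr, text_repr))
```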

Scientific literature review generation aims to extract and organize important information from an abundant collection of reference papers and produce corresponding reviews, which often lack a clear logical hierarchy. We observe that a high-quality catalogue-guided generation process can effectively alleviate this problem. Therefore, we present an atomic and challenging task named Hierarchical Catalogue Generation for Literature Review as the first step of review generation, which aims to produce the hierarchical catalogue of a review paper given various references...

10.18653/v1/2023.findings-emnlp.453 article EN cc-by 2023-01-01

Current dialogue summarization systems usually encode the text with a number of general semantic features (e.g., keywords and topics) to gain more powerful dialogue modeling capabilities. However, these features are obtained via open-domain toolkits that are dialog-agnostic or heavily rely on human annotations. In this paper, we show how DialoGPT, a pre-trained model for conversational response generation, can be developed as an unsupervised dialogue annotator, which takes advantage of the dialogue background knowledge encoded in...

10.48550/arxiv.2105.12544 preprint EN other-oa arXiv (Cornell University) 2021-01-01
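
A rough sketch of the "language model as annotator" idea: score utterances with DialoGPT's loss, treating easily predicted utterances as carrying less new information. The scoring function below is a simplification under that assumption, not the paper's exact annotation pipeline:

```python
# Sketch: a pre-trained conversational LM as an unsupervised annotator,
# scoring utterances by LM loss. The checkpoint is the public
# DialoGPT-small; the scoring scheme here is simplified.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("microsoft/DialoGPT-small")
lm = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-small")

def utterance_loss(context, utterance):
    """LM loss over context + utterance; a fuller version would mask
    the context tokens so only the utterance is scored."""
    text = context + tok.eos_token + utterance
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        out = lm(ids, labels=ids)   # loss over the whole sequence
    return out.loss.item()

# Higher loss = more surprising, plausibly more informative utterance.
print(utterance_loss("How are you?", "I'm fine, thanks!"))
```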

Large vision-language models (LVLMs), exemplified by GPT-4V, excel across diverse tasks involving concrete images from natural scenes. However, their ability to interpret abstract figures, such as geometry shapes and scientific plots, remains limited due to a scarcity of training datasets in scientific domains. To fill this gap, we introduce Multimodal ArXiv, consisting of ArXivCap and ArXivQA, for enhancing LVLMs' scientific comprehension. ArXivCap is a figure-caption dataset comprising 6.4M images and 3.9M captions sourced from 572K ArXiv papers...

10.48550/arxiv.2403.00231 preprint EN arXiv (Cornell University) 2024-02-29

Despite their impressive performance on information-seeking tasks, large language models (LLMs) still struggle with hallucinations. Attributed LLMs, which augment generated text with in-line citations, have shown potential in mitigating hallucinations and improving verifiability. However, current approaches suffer from suboptimal citation quality due to their reliance on in-context learning. Furthermore, the practice of citing only coarse document identifiers makes it challenging for users to perform...

10.48550/arxiv.2408.04568 preprint EN arXiv (Cornell University) 2024-08-08
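
To show what finer-grained attribution looks like compared with coarse document identifiers, here is a toy rendering of statements paired with citation markers and supporting quotes; the passages, quotes, and output format are all invented for illustration:

```python
# Sketch: in-line citations grounded in specific supporting quotes
# rather than coarse document IDs. Data and formatting are illustrative.
passages = {
    "[1]": "The Nile is about 6,650 km long.",
    "[2]": "It flows through eleven countries in northeastern Africa.",
}

answer_with_quotes = [
    ("The Nile is roughly 6,650 km long", "[1]", "about 6,650 km long"),
    ("and crosses eleven countries", "[2]", "eleven countries"),
]

# Render each statement with its marker plus the grounding quote, so a
# user can verify claims without re-reading whole documents.
for statement, doc_id, quote in answer_with_quotes:
    print(f'{statement} {doc_id} (supporting quote: "{quote}")')
```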

The rapid advancement of large language models (LLMs) has accelerated their application in reasoning, with strategic reasoning drawing increasing attention. To evaluate LLMs' strategic reasoning capabilities, game theory, with its concise structure, has become a preferred approach. However, current research focuses on a limited selection of games, resulting in low coverage. Classic game scenarios risk data leakage, and existing benchmarks often lack extensibility, making them inadequate for evaluating state-of-the-art models...

10.48550/arxiv.2410.10479 preprint EN arXiv (Cornell University) 2024-10-14
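
As a reminder of the atomic unit such a game-theoretic benchmark builds on, here is a classic 2x2 game with a brute-force check for pure-strategy Nash equilibria; the payoffs are the textbook Prisoner's Dilemma, not items from the benchmark itself:

```python
# Sketch: a 2x2 payoff table and a pure-strategy Nash equilibrium check,
# the kind of atomic game a strategic-reasoning benchmark builds on.
payoffs = {  # (row action, col action) -> (row payoff, col payoff)
    ("C", "C"): (3, 3), ("C", "D"): (0, 5),
    ("D", "C"): (5, 0), ("D", "D"): (1, 1),
}
actions = ["C", "D"]

def is_nash(row_a, col_a):
    """Neither player can gain by unilaterally deviating."""
    r, c = payoffs[(row_a, col_a)]
    row_ok = all(payoffs[(a, col_a)][0] <= r for a in actions)
    col_ok = all(payoffs[(row_a, a)][1] <= c for a in actions)
    return row_ok and col_ok

equilibria = [p for p in payoffs if is_nash(*p)]
print(equilibria)  # [('D', 'D')] -- mutual defection
```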

News summarization in today's global scene can be daunting with its flood of multilingual content and varied viewpoints from different sources. However, current studies often neglect such real-world scenarios, as they tend to focus solely on either single-language or single-document tasks. To bridge this gap, we aim to unify Multi-lingual, Cross-lingual and Multi-document Summarization into a novel task, i.e., MCMS, which encapsulates the real-world requirements all-in-one. Nevertheless, the lack of a benchmark...

10.48550/arxiv.2410.04087 preprint EN arXiv (Cornell University) 2024-10-05