Xiangru Tang

ORCID: 0009-0006-2700-4513
Research Areas
  • Topic Modeling
  • Natural Language Processing Techniques
  • Text Readability and Simplification
  • Biomedical Text Mining and Ontologies
  • Software Engineering Research
  • Advanced Text Analysis Techniques
  • Computational Drug Discovery Methods
  • Protein Structure and Dynamics
  • Multi-Agent Systems and Negotiation
  • Speech and dialogue systems
  • Data Quality and Management
  • Scientific Computing and Data Management
  • Semantic Web and Ontologies
  • Model-Driven Software Engineering Techniques
  • Genomics and Phylogenetic Studies
  • Mathematics, Computing, and Information Processing
  • Information Retrieval and Search Behavior
  • Digital Humanities and Scholarship
  • Machine Learning in Materials Science
  • Software Testing and Debugging Techniques
  • Glycosylation and Glycoproteins Research
  • RNA and protein synthesis mechanisms
  • Interpreting and Communication in Healthcare
  • Machine Learning in Healthcare
  • Ferroelectric and Negative Capacitance Devices

Affiliations

Yale University
2022-2025

Tongmyong University
2024

Mohamed bin Zayed University of Artificial Intelligence
2023

University of Cambridge
2023

Meta (Israel)
2022

Columbia University
2022

National University of Defense Technology
2005

Publications

Niklas Muennighoff, Thomas Wang, Lintang Sutawika, Adam Roberts, Stella Biderman, Teven Le Scao, M Saiful Bari, Sheng Shen, Zheng Xin Yong, Hailey Schoelkopf, Xiangru Tang, Dragomir Radev, Alham Fikri Aji, Khalid Almubarak, Samuel Albanie, Zaid Alyafeai, Albert Webson, Edward Raff, Colin Raffel. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2023.

10.18653/v1/2023.acl-long.891 article EN cc-by 2023-01-01

Stephen Bach, Victor Sanh, Zheng Xin Yong, Albert Webson, Colin Raffel, Nihal V. Nayak, Abheesht Sharma, Taewoon Kim, M Saiful Bari, Thibault Fevry, Zaid Alyafeai, Manan Dey, Andrea Santilli, Zhiqing Sun, Srulik Ben-David, Canwen Xu, Gunjan Chhablani, Han Wang, Jason Fries, Maged Al-shaibani, Shanya Sharma, Urmish Thakker, Khalid Almubarak, Xiangru Tang, Dragomir Radev, Mike Tian-jian Jiang, Alexander Rush. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: System Demonstrations. 2022.

10.18653/v1/2022.acl-demo.9 article EN cc-by 2022-01-01

Despite the advancements of open-source large language models (LLMs), e.g., LLaMA, they remain significantly limited in tool-use capabilities, i.e., using external tools (APIs) to fulfill human instructions. The reason is that current instruction tuning largely focuses on basic language tasks but ignores the tool-use domain. This is in contrast to the excellent tool-use capabilities of state-of-the-art (SOTA) closed-source LLMs, e.g., ChatGPT. To bridge this gap, we introduce ToolLLM, a general tool-use framework encompassing data construction, model...

10.48550/arxiv.2307.16789 preprint EN other-oa arXiv (Cornell University) 2023-01-01
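The abstract above centers on teaching models to call external APIs. As a hedged illustration only (ToolLLM's actual pipeline is far richer), here is a minimal Python sketch of the tool-use loop it describes: the model emits a JSON tool call and a harness executes it. The `call_llm` stub and both tools are invented placeholders, not part of ToolLLM.

```python
import json

# Hypothetical tool registry: names of external APIs mapped to callables.
TOOLS = {
    "get_weather": lambda city: f"(pretend forecast) Sunny in {city}",
    "add": lambda a, b: a + b,
}

def call_llm(prompt: str) -> str:
    """Stub for an instruction-tuned LLM; a real system would query a model.
    Here it always proposes one fixed tool call, just to drive the loop."""
    return json.dumps({"tool": "get_weather", "args": {"city": "Paris"}})

def tool_use_step(instruction: str) -> str:
    """One round of the loop: the model picks an API, the harness executes it."""
    prompt = f"Instruction: {instruction}\nAvailable tools: {sorted(TOOLS)}"
    call = json.loads(call_llm(prompt))            # parse the model's tool call
    result = TOOLS[call["tool"]](**call["args"])   # dispatch to the external API
    return f"{call['tool']} -> {result}"

print(tool_use_step("What is the weather in Paris?"))
```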

Artificial intelligence (AI)-driven methods can vastly improve the historically costly drug design process, with various generative models already in widespread use. Generative models for de novo design, in particular, focus on the creation of novel biological compounds entirely from scratch, representing a promising future direction. The rapid development of the field, combined with the inherent complexity of drug design, creates a difficult landscape for new researchers to enter. In this survey, we organize the field into two overarching themes: small...

10.1093/bib/bbae338 article EN cc-by-nc Briefings in Bioinformatics 2024-05-23

The advent of large language models (LLMs) has catalyzed a transformative shift in artificial intelligence, paving the way for advanced intelligent agents capable of sophisticated reasoning, robust perception, and versatile action across diverse domains. As these agents increasingly drive AI research and practical applications, their design, evaluation, and continuous improvement present intricate, multifaceted challenges. This survey provides a comprehensive overview, framing intelligent agents within a modular, brain-inspired...

10.48550/arxiv.2504.01990 preprint EN arXiv (Cornell University) 2025-03-31

Factual inconsistencies in generated summaries severely limit the practical applications of abstractive dialogue summarization. Although significant progress has been achieved by using pre-trained models, substantial amounts of hallucinated content are found during human evaluation. Pre-trained models are most commonly fine-tuned with cross-entropy loss for text summarization, which may not be an optimal strategy. In this work, we provide a typology of factual errors with annotation data to highlight the types...

10.18653/v1/2022.naacl-main.415 article EN cc-by Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies 2022-01-01

The identification of protein homologs in large databases using conventional methods, such as sequence comparison, often misses remote homologs. Here, we offer an ultrafast, highly sensitive method, dense homolog retriever (DHR), for detecting homologs on the basis of a protein language model and dense retrieval techniques. Its dual-encoder architecture generates different embeddings for the same protein sequence and easily locates homologs by comparing these representations. Its alignment-free nature improves speed, and it incorporates rich evolutionary and structural...

10.1038/s41587-024-02353-6 article EN cc-by-nc-nd Nature Biotechnology 2024-08-09
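To make the dual-encoder idea concrete, here is a toy sketch (not DHR's actual model): a placeholder encoder maps each sequence to a unit vector, and homolog lookup becomes a cosine-similarity ranking over precomputed embeddings. The `embed` function and the sequences are illustrative stand-ins for a real protein language model and database.

```python
import numpy as np

def embed(seq: str) -> np.ndarray:
    """Placeholder encoder: deterministic random unit vector per sequence.
    A real system would use a protein language model here."""
    rng = np.random.default_rng(abs(hash(seq)) % (2**32))
    v = rng.normal(size=128)
    return v / np.linalg.norm(v)

# Toy "database" of sequences with precomputed embeddings (the index).
database = ["MKTAYIAKQR", "MVLSPADKTN", "GSHMAEQLTE"]
db_matrix = np.stack([embed(s) for s in database])

def retrieve(query: str, k: int = 2) -> list[str]:
    """Alignment-free lookup: rank entries by cosine similarity to the query."""
    scores = db_matrix @ embed(query)  # unit vectors, so dot product = cosine
    return [database[i] for i in np.argsort(-scores)[:k]]

print(retrieve("MKTAYIAKQQ"))
```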

We present DART, an open domain structured DAta Record to Text generation dataset with over 82k instances (DARTs). Data-to-Text annotations can be a costly process, especially when dealing with tables, which are the major source of structured data and contain nontrivial structures. To this end, we propose a procedure for extracting semantic triples from tables that encodes their structures by exploiting the semantic dependencies among table headers and the table title. Our dataset construction framework effectively merged heterogeneous sources from open domain semantic parsing...

10.48550/arxiv.2007.02871 preprint EN cc-by-sa arXiv (Cornell University) 2020-01-01
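The triple-extraction step the DART abstract mentions can be pictured with a deliberately simplified sketch: pick a subject column, then pair its cell with every other (header, cell) in the row. The table values here are invented, and the real construction additionally exploits header dependencies and table titles.

```python
# Invented toy table: one header row plus a data row.
headers = ["Team", "City", "Stadium"]
row = ["Bulldogs", "New Haven", "Yale Bowl"]

def row_to_triples(headers: list[str], row: list[str], subject_col: int = 0):
    """Pair the subject cell with every other (header, cell) in the row."""
    subject = row[subject_col]
    return [
        (subject, headers[i], row[i])  # (subject, predicate, object)
        for i in range(len(row))
        if i != subject_col
    ]

print(row_to_triples(headers, row))
# [('Bulldogs', 'City', 'New Haven'), ('Bulldogs', 'Stadium', 'Yale Bowl')]
```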

Artificial intelligence (AI)-driven methods can vastly improve the historically costly drug design process, with various generative models already in widespread use. Generative models for de novo design, in particular, focus on the creation of novel biological compounds entirely from scratch, representing a promising future direction. The rapid development of the field, combined with the inherent complexity of drug design, creates a difficult landscape for new researchers to enter. In this survey, we organize the field into two overarching themes: small...

10.48550/arxiv.2402.08703 preprint EN arXiv (Cornell University) 2024-02-13

Inverse protein folding, which aims to design amino acid sequences for desired structures, is fundamental to protein engineering and therapeutic development. While recent deep-learning approaches have made remarkable progress, they typically represent biochemical properties as discrete features associated with individual residues. Here, we present BC-Design, a framework that represents these properties as continuous distributions across protein surfaces and interiors. Through contrastive learning, our model...

10.21203/rs.3.rs-6310665/v1 preprint EN Research Square (Research Square) 2025-05-08

Finetuning large language models (LLMs) on instructions leads to vast performance improvements on natural language tasks. We apply instruction tuning using code, leveraging the natural structure of Git commits, which pair code changes with human instructions. We compile CommitPack: 4 terabytes of Git commits across 350 programming languages. We benchmark CommitPack against other natural and synthetic code instructions (xP3x, Self-Instruct, OASST) on the 16B parameter StarCoder model and achieve state-of-the-art performance among models not trained on OpenAI outputs on the HumanEval...

10.48550/arxiv.2308.07124 preprint EN other-oa arXiv (Cornell University) 2023-01-01
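The commit-as-instruction pairing behind CommitPack can be sketched with plain `git` plumbing: each commit subject plays the role of the human instruction, and the diff is the paired code change. This is an illustrative reconstruction of the idea, not the CommitPack pipeline, and it assumes it runs inside a local Git repository.

```python
import subprocess

def git(repo: str, *args: str) -> str:
    """Run a git command in `repo` and return its stdout."""
    return subprocess.run(["git", "-C", repo, *args],
                          capture_output=True, text=True, check=True).stdout

def commit_instruction_pairs(repo: str, n: int = 5):
    """Yield (commit message, diff) pairs: the message serves as the
    instruction and the diff as the paired code change."""
    for h in git(repo, "log", f"-{n}", "--pretty=%H").split():
        message = git(repo, "show", "-s", "--pretty=%s", h).strip()
        diff = git(repo, "show", "--pretty=format:", h)
        yield {"instruction": message, "diff": diff}

for pair in commit_instruction_pairs("."):  # assumes cwd is a Git repository
    print(pair["instruction"])
```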

We introduce MMVU, a comprehensive expert-level, multi-discipline benchmark for evaluating foundation models in video understanding. MMVU includes 3,000 expert-annotated questions spanning 27 subjects across four core disciplines: Science, Healthcare, Humanities & Social Sciences, and Engineering. Compared to prior benchmarks, MMVU features three key advancements. First, it challenges models to apply domain-specific knowledge and perform expert-level reasoning to analyze specialized-domain videos, moving beyond...

10.48550/arxiv.2501.12380 preprint EN arXiv (Cornell University) 2025-01-21

Xiangru Tang, Alexander Fabbri, Haoran Li, Ziming Mao, Griffin Adams, Borui Wang, Asli Celikyilmaz, Yashar Mehdad, Dragomir Radev. Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2022.

10.18653/v1/2022.naacl-main.417 article EN cc-by Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies 2022-01-01

Multitask prompted finetuning (MTF) has been shown to help large language models generalize to new tasks in a zero-shot setting, but so far explorations of MTF have focused on English data and models. We apply MTF to the pretrained multilingual BLOOM and mT5 model families to produce finetuned variants called BLOOMZ and mT0. We find that finetuning on English tasks with English prompts allows for task generalization to non-English languages that appear only in the pretraining corpus. Finetuning on multilingual tasks further improves performance, leading to various state-of-the-art results...

10.48550/arxiv.2211.01786 preprint EN other-oa arXiv (Cornell University) 2022-01-01

Motivation: The current paradigm of deep learning models for the joint representation of molecules and text primarily relies on 1D or 2D molecular formats, neglecting significant 3D structural information that offers valuable physical insight. This narrow focus inhibits the models' versatility and adaptability across a wide range of modalities. Conversely, the limited research focusing on explicit 3D representation tends to overlook textual data within the biomedical domain. Results: We present a unified pre-trained language...

10.1093/bioinformatics/btae260 article EN cc-by Bioinformatics 2024-05-09

Recent observations have underscored a disparity between the inflated benchmark scores and the actual performance of LLMs, raising concerns about potential contamination of evaluation benchmarks. This issue is especially critical for closed-source models and certain open-source models where training data transparency is lacking. In this paper, we study data contamination by proposing two methods tailored to both open-source and proprietary LLMs. We first introduce a retrieval-based system to explore potential overlaps between evaluation benchmarks and pretraining corpora. We further...

10.48550/arxiv.2311.09783 preprint EN other-oa arXiv (Cornell University) 2023-01-01
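One hedged way to picture the retrieval-based overlap check described above: treat a benchmark example as contaminated if any long n-gram from it also appears in the pretraining corpus. The threshold (8-grams) and the exact-match scheme here are illustrative choices, not necessarily the paper's method.

```python
# Flag a benchmark example if any of its 8-grams occurs in the corpus.
def ngrams(text: str, n: int = 8) -> set[tuple[str, ...]]:
    toks = text.split()
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}

def is_contaminated(example: str, corpus_docs: list[str], n: int = 8) -> bool:
    corpus_ngrams = set().union(*(ngrams(d, n) for d in corpus_docs))
    return bool(ngrams(example, n) & corpus_ngrams)

corpus = ["the quick brown fox jumps over the lazy dog every single day"]
print(is_contaminated("quick brown fox jumps over the lazy dog every", corpus))
# True: an 8-gram from the example also appears in the corpus document.
```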

Tabular data is prevalent across various industries, necessitating significant time and effort for users to understand and manipulate it for their information-seeking purposes. The advancements in large language models (LLMs) have shown enormous potential to improve user efficiency. However, the adoption of LLMs in real-world applications for table information seeking remains underexplored. In this paper, we investigate the table-to-text capabilities of different LLMs using four datasets within two real-world information seeking scenarios. These include...

10.18653/v1/2023.emnlp-industry.17 article EN cc-by 2023-01-01

PromptSource is a system for creating, sharing, and using natural language prompts. Prompts are functions that map an example from a dataset to a natural language input and target output. Using prompts to train and query language models is an emerging area in NLP that requires new tools that let users develop and refine these prompts collaboratively. PromptSource addresses the emergent challenges in this new setting with (1) a templating language for defining data-linked prompts, (2) an interface that lets users quickly iterate on prompt development by observing outputs of their prompts on many examples, and (3)...

10.48550/arxiv.2202.01279 preprint EN other-oa arXiv (Cornell University) 2022-01-01
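The "prompts are functions" framing can be shown with a hand-rolled stand-in for a data-linked template: a function mapping a dataset example to a natural language (input, target) pair. PromptSource itself expresses these as templates edited through a dedicated interface; this NLI example is just the underlying idea.

```python
# A prompt as a function: dataset example in, (input, target) strings out.
def nli_prompt(example: dict) -> tuple[str, str]:
    inp = (f"Premise: {example['premise']}\n"
           f"Hypothesis: {example['hypothesis']}\n"
           "Does the premise entail the hypothesis? Answer yes, no, or maybe.")
    # Standard NLI label order: 0 = entailment, 1 = neutral, 2 = contradiction.
    target = ["yes", "maybe", "no"][example["label"]]
    return inp, target

ex = {"premise": "A dog runs in the park.",
      "hypothesis": "An animal is outside.",
      "label": 0}
print(nli_prompt(ex))
```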

Pre-trained large language models have significantly improved code generation. As these models scale up, there is an increasing need for the output to handle more intricate tasks and to be appropriately specialized to particular domains. Here, we target bioinformatics due to the amount of domain knowledge, algorithms, and data operations this discipline requires. We present BioCoder, a benchmark developed to evaluate large language models (LLMs) in generating bioinformatics-specific code. BioCoder spans a broad spectrum of the field and covers...

10.48550/arxiv.2308.16458 preprint EN cc-by arXiv (Cornell University) 2023-01-01

Despite the remarkable capabilities of Large Language Models (LLMs) like GPT-4, producing complex, structured tabular data remains challenging. Our study assesses LLMs' proficiency in structuring tables and introduces a novel fine-tuning method, cognizant of data structures, to bolster their performance. We unveil Struc-Bench, a comprehensive benchmark featuring prominent LLMs (GPT-NeoX-20B, GPT-3.5, Vicuna), which spans text tables, HTML, and LaTeX formats. Our proposed FormatCoT aids in crafting...

10.48550/arxiv.2309.08963 preprint EN cc-by arXiv (Cornell University) 2023-01-01

The advancement and extensive application of large language models (LLMs) have been remarkable, including their use in scientific research assistance. However, these models often generate scientifically incorrect or unsafe responses, and in some cases they may encourage users to engage in dangerous behavior. To address this issue in the field of chemistry, we introduce ChemSafetyBench, a benchmark designed to evaluate the accuracy and safety of LLM responses. ChemSafetyBench encompasses three key tasks: querying chemical...

10.48550/arxiv.2411.16736 preprint EN arXiv (Cornell University) 2024-11-23

Yilun Zhao, Chen Zhao, Linyong Nan, Zhenting Qi, Wenlin Zhang, Xiangru Tang, Boyu Mi, Dragomir Radev. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2023.

10.18653/v1/2023.acl-long.334 article EN cc-by 2023-01-01