- Topic Modeling
- Natural Language Processing Techniques
- Speech and dialogue systems
- Video Surveillance and Tracking Methods
- Anomaly Detection Techniques and Applications
- Multimodal Machine Learning Applications
- Educational Technology Systems
- Text Readability and Simplification
- Evacuation and Crowd Dynamics
- Speech Recognition and Synthesis
- Traffic Prediction and Management Techniques
- Explainable Artificial Intelligence (XAI)
- Machine Learning and Data Classification
- Machine Learning in Healthcare
- Hallucinations in medical conditions
- Bayesian Modeling and Causal Inference
- Advanced Neural Network Applications
- Digital Storytelling and Education
- Culinary Culture and Tourism
- Text and Document Classification Technologies
- Digital Mental Health Interventions
- Mental Health via Writing
- Fire Detection and Safety Systems
- Artificial Intelligence in Healthcare and Education
- Neural Networks and Applications
University of Hong Kong
2022-2023
Hong Kong University of Science and Technology
2022-2023
University of Tasmania
2023
Badan Penelitian dan Pengembangan Kesehatan
2023
Bandung Institute of Technology
2018-2021
Yejin Bang, Samuel Cahyawijaya, Nayeon Lee, Wenliang Dai, Dan Su, Bryan Wilie, Holy Lovenia, Ziwei Ji, Tiezheng Yu, Willy Chung, Quyet V. Do, Yan Xu, Pascale Fung. Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics (Volume 1: Long Papers). 2023.
This paper proposes a framework for quantitatively evaluating interactive LLMs such as ChatGPT using publicly available data sets. We carry out an extensive technical evaluation of ChatGPT using 23 data sets covering 8 different common NLP application tasks. We evaluate the multitask, multilingual and multi-modal aspects of ChatGPT based on these data sets and a newly designed multimodal dataset. We find that ChatGPT outperforms LLMs with zero-shot learning on most tasks and even outperforms fine-tuned models on some tasks, and that it is better at understanding non-Latin script languages...
Although Indonesian is known to be the fourth most frequently used language over the internet, research progress on this language in natural language processing (NLP) is slow-moving due to a lack of available resources. In response, we introduce the first-ever vast resource for training, evaluating, and benchmarking Indonesian natural language understanding (IndoNLU) tasks. IndoNLU includes twelve tasks, ranging from single sentence classification to pair-sentence sequence labeling with different levels of complexity. The datasets for the tasks lie in different domains and styles...
Samuel Cahyawijaya, Genta Indra Winata, Bryan Wilie, Karissa Vincentio, Xiaohong Li, Adhiguna Kuncoro, Sebastian Ruder, Zhi Yuan Lim, Syafri Bahar, Masayu Khodra, Ayu Purwarianti, Pascale Fung. Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. 2021.
Samuel Cahyawijaya, Holy Lovenia, Alham Fikri Aji, Genta Winata, Bryan Wilie, Fajri Koto, Rahmad Mahendra, Christian Wibisono, Ade Romadhony, Karissa Vincentio, Jennifer Santoso, David Moeljadi, Cahya Wirawan, Frederikus Hudi, Muhammad Satrio Wicaksono, Ivan Parmonangan, Ika Alfina, Ilham Firdausi Putra, Samsul Rahmadani, Yulianti Oenang, Ali Septiandri, James Jaya, Kaustubh Dhole, Arie Suryani, Rifki Afina Putri, Dan Su, Keith Stevens, Made Nindyatama Nityasya, Adilazuarda, Ryan Hadiwijaya,...
Dialogue systems can leverage large pre-trained language models and knowledge to generate fluent and informative responses. However, these models are still prone to produce hallucinated responses not supported by the input source, which greatly hinders their application. The heterogeneity between external knowledge and dialogue context challenges representation learning and source integration, which further contributes to unfaithfulness. To handle this challenge and generate more faithful responses, this paper presents RHO (ρ) utilizing...
Sebastian Gehrmann, Abhik Bhattacharjee, Abinaya Mahendiran, Alex Wang, Alexandros Papangelis, Aman Madaan, Angelina Mcmillan-major, Anna Shvets, Ashish Upadhyay, Bernd Bohnet, Bingsheng Yao, Bryan Wilie, Chandra Bhagavatula, Chaobin You, Craig Thomson, Cristina Garbacea, Dakuo Daniel Deutsch, Deyi Xiong, Di Jin, Dimitra Gkatzia, Dragomir Radev, Elizabeth Clark, Esin Durmus, Faisal Ladhak, Filip Ginter, Genta Indra Winata, Hendrik Strobelt, Hiroaki Hayashi, Jekaterina Novikova, Jenna...
Samuel Cahyawijaya, Holy Lovenia, Fajri Koto, Dea Adhista, Emmanuel Dave, Sarah Oktavianti, Salsabil Akbar, Jhonson Lee, Nuur Shadieq, Tjeng Wawan Cenggoro, Hanung Linuwih, Bryan Wilie, Galih Muridan, Genta Winata, David Moeljadi, Alham Fikri Aji, Ayu Purwarianti, Pascale Fung. Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics (Volume 1: Long Papers). 2023.
Question rewriting (QR) is a subtask of conversational question answering (CQA) aiming to ease the challenges of understanding dependencies among dialogue history by reformulating questions in a self-contained form. Despite seeming plausible, little evidence is available to justify QR as a mitigation method for CQA. To verify the effectiveness of QR in CQA, we investigate a reinforcement learning approach that integrates QR and CQA tasks and does not require corresponding QR datasets for targeted CQA. We find, however, that the RL method is on par with...
Large language models (LLMs) have been used for diverse tasks in natural language processing (NLP), yet remain under-explored for task-oriented dialogue systems (TODS), especially end-to-end TODS. We present InstructTODS, a novel off-the-shelf framework for zero-shot end-to-end TODS that can adapt to diverse domains without fine-tuning. By leveraging LLMs, InstructTODS generates a proxy belief state that seamlessly translates user intentions into dynamic queries for efficient interaction with any KB. Our extensive experiments demonstrate...
Resolving dependencies among dialogue history is one of the main obstacles in research on conversational question answering (QA). The question rewrites (QR) task has been shown to be effective in solving this problem by reformulating questions in a self-contained form. However, QR datasets are limited, and existing methods tend to depend on the assumption that a corresponding QR dataset exists for every CQA dataset. This paper proposes a reinforcement learning approach that integrates QR and CQA tasks without corresponding labeled QR datasets. We train the model based...
Samuel Cahyawijaya, Bryan Wilie, Holy Lovenia, Huan Zhong, MingQian Zhong, Yuk-Yu Nancy Ip, Pascale Fung. Proceedings of the 13th International Workshop on Health Text Mining and Information Analysis (LOUHI). 2022.
Large language models (LLMs) show remarkable human-like capability in various domains and languages. However, a notable quality gap arises in low-resource languages, e.g., Indonesian indigenous languages, rendering them ineffective and inefficient in such linguistic contexts. To bridge this gap, we introduce Cendol, a collection of Indonesian LLMs encompassing both decoder-only and encoder-decoder architectures across a range of model sizes. We highlight Cendol's effectiveness across a diverse array of tasks, attaining a 20% improvement,...
The widespread application of Large Language Models (LLMs) across various tasks and fields has necessitated the alignment of these models with human values and preferences. Given the various approaches to value alignment, ranging from Reinforcement Learning from Human Feedback (RLHF) to constitutional learning, etc., there is an urgent need to understand the scope and nature of the values injected into these models before their release. There is also a need for model alignment without a costly large-scale human annotation effort. We propose UniVaR, a high-dimensional representation...
The capability to reason from text is crucial for real-world NLP applications. Real-world scenarios often involve incomplete or evolving data. In response, individuals update their beliefs and understandings accordingly. However, most existing evaluations assume that language models (LMs) operate with consistent information. We introduce Belief-R, a new dataset designed to test LMs' belief revision ability when presented with new evidence. Inspired by how humans suppress prior inferences, this task...
The hallucination problem of Large Language Models (LLMs) significantly limits their reliability and trustworthiness. Humans have a self-awareness process that allows us to recognize what we don't know when faced with queries. Inspired by this, our paper investigates whether LLMs can estimate their own hallucination risk before response generation. We analyze the internal mechanisms of LLMs broadly, both in terms of training data sources and across 15 diverse Natural Language Generation (NLG) tasks, spanning over 700 datasets. Our...
Vision Language Models (VLMs) often struggle with culture-specific knowledge, particularly in languages other than English and in underrepresented cultural contexts. To evaluate their understanding of such knowledge, we introduce WorldCuisines, a massive-scale benchmark for multilingual and multicultural, visually grounded language understanding. This benchmark includes a visual question answering (VQA) dataset with text-image pairs across 30 languages and dialects, spanning 9 language families and featuring over 1 million data points, making it the...
Evaluation in machine learning is usually informed by past choices, for example which datasets or metrics to use. This standardization enables comparison on an equal footing using leaderboards, but the evaluation choices become sub-optimal as better alternatives arise. This problem is especially pertinent in natural language generation, which requires ever-improving suites of datasets, metrics, and human evaluation to make definitive claims. To make following best model evaluation practices easier, we introduce GEMv2. The new version...
At the center of the underlying issues that halt the advancement of Indonesian natural language processing (NLP) research, we find data scarcity. Resources in Indonesian languages, especially the local ones, are extremely scarce and underrepresented. Many researchers do not publish their datasets. Furthermore, the few public datasets that exist are scattered across different platforms, which makes performing reproducible, data-centric NLP research even more arduous. Rising to this challenge, we initiate the first crowdsourcing effort, NusaCrowd....