NFDI4DS | UHH-SEMS - Publication Details

Enriching Biomedical Knowledge for Low-resource Language Through Large-scale Translation

FOS: Computer and information sciences; Computation and Language (cs.CL); Artificial Intelligence (cs.AI); 0202 electrical engineering, electronic engineering, information engineering; 02 engineering and technology

DOI: 10.18653/v1/2023.eacl-main.228

Publication Date: 2023-09-09T20:54:31Z

AUTHORS (7)

Long Phan

Tai Dang

Hieu Tran

Trieu H. Trinh

Vy Phan

Lam D. Chau

Minh-Thang Luong

ABSTRACT

Biomedical data and benchmarks are highly valuable yet very limited in low-resource languages such as Vietnamese. In this paper, we use a state-of-the-art English-Vietnamese translation model to translate and produce both pretraining and supervised data in the biomedical domain. Thanks to this large-scale translation, we introduce ViPubmedT5, a pretrained encoder-decoder Transformer model trained on 20 million translated abstracts from the high-quality public PubMed corpus. ViPubmedT5 achieves state-of-the-art results on two biomedical benchmarks, in summarization and acronym disambiguation. Further, we release ViMedNLI, a new NLP task in Vietnamese translated from MedNLI using the recently released public En-Vi translation model and carefully refined by human experts, along with evaluations of existing methods against ViPubmedT5.

CITATIONS (1)
