Medical mT5: An Open-Source Multilingual Text-to-Text LLM for the Medical Domain

DOI: 10.48550/arxiv.2404.07613
Publication Date: 2024-04-11
ABSTRACT
Research on language technology for the development of medical applications is currently a hot topic in Natural Language Understanding and Generation. Thus, a number of large language models (LLMs) have recently been adapted to the medical domain, so that they can be used as a tool for mediating in human-AI interaction. While these LLMs display competitive performance on automated medical text benchmarks, they have been pre-trained and evaluated with a focus on a single language (mostly English). This is particularly true of text-to-text models, which typically require large amounts of domain-specific pre-training data, often not easily accessible for many languages. In this paper, we address these shortcomings by compiling, to the best of our knowledge, the largest multilingual corpus for the medical domain in four languages, namely English, French, Italian and Spanish. This new corpus has been used to train Medical mT5, the first open-source text-to-text multilingual model for the medical domain. Additionally, we present two new evaluation benchmarks for all four languages, with the aim of facilitating multilingual research in this domain. A comprehensive evaluation shows that Medical mT5 outperforms both encoders and similarly sized text-to-text models for the Spanish, French, and Italian benchmarks, while being competitive with current state-of-the-art LLMs in English.
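As a usage illustration, the minimal sketch below shows how a text-to-text model of this kind can be loaded and queried with the Hugging Face Transformers library. The checkpoint ID "HiTZ/Medical-mT5-large" is an assumption about the name of the released model, and the prompt format is illustrative rather than taken from the paper.

# Minimal sketch: loading a multilingual text-to-text model with
# Hugging Face Transformers. The checkpoint ID below is an assumption
# and may differ from the actual published name.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "HiTZ/Medical-mT5-large"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Text-to-text framing: both the input and the output are strings.
# Illustrative Spanish input; the paper's downstream tasks define
# their own task-specific input/output formats.
text = "El paciente presenta fiebre y dolor abdominal."
inputs = tokenizer(text, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))

In practice, a base checkpoint like this would first be fine-tuned on a task such as sequence labelling before its generations are useful; the snippet only demonstrates the text-in, text-out interface.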