BioBERT: a pre-trained biomedical language representation model for biomedical text mining
Biomedical text mining
Named Entity Recognition
Relation extraction
Text corpus
Representation
F1 score
DOI: 10.1093/bioinformatics/btz682
Publication Date: 2019-09-05
AUTHORS (7)
ABSTRACT
Biomedical text mining is becoming increasingly important as the number of biomedical documents rapidly grows. With the progress in natural language processing (NLP), extracting valuable information from biomedical literature has gained popularity among researchers, and deep learning has boosted the development of effective biomedical text mining models. However, directly applying the advancements in NLP to biomedical text mining often yields unsatisfactory results due to a word distribution shift from general domain corpora to biomedical corpora. In this article, we investigate how the recently introduced pre-trained language model BERT can be adapted for biomedical corpora. We introduce BioBERT (Bidirectional Encoder Representations from Transformers for Biomedical Text Mining), a domain-specific language representation model pre-trained on large-scale biomedical corpora. With almost the same architecture across tasks, BioBERT largely outperforms BERT and previous state-of-the-art models in a variety of biomedical text mining tasks when pre-trained on biomedical corpora. While BERT obtains performance comparable to that of previous state-of-the-art models, BioBERT significantly outperforms them on the following three representative biomedical text mining tasks: biomedical named entity recognition (0.62% F1 score improvement), biomedical relation extraction (2.80% F1 score improvement) and biomedical question answering (12.24% MRR improvement). Our analysis results show that pre-training BERT on biomedical corpora helps it to understand complex biomedical texts. We make the pre-trained weights of BioBERT freely available at https://github.com/naver/biobert-pretrained, and the source code for fine-tuning BioBERT available at https://github.com/dmis-lab/biobert.
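The fine-tuning setup summarized in the abstract (one pre-trained encoder reused across tasks, with only a thin task-specific output layer changing) can be illustrated with a short sketch. The snippet below is not the authors' implementation (their released repository is TensorFlow-based); it is a minimal sketch assuming the Hugging Face transformers library, and the checkpoint identifier "dmis-lab/biobert-v1.1" is an assumption used for illustration. It shows how a BioBERT-style encoder could be loaded with a token-classification head for biomedical NER.

# Minimal sketch (not the paper's code): a BioBERT-style checkpoint loaded
# for token classification (biomedical NER) via Hugging Face transformers.
# The checkpoint name "dmis-lab/biobert-v1.1" is an assumption.
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

MODEL_ID = "dmis-lab/biobert-v1.1"   # assumed hub identifier
NUM_LABELS = 3                       # e.g. BIO tags: O, B-Disease, I-Disease

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForTokenClassification.from_pretrained(MODEL_ID, num_labels=NUM_LABELS)

# Encode one example sentence; actual fine-tuning would iterate over a
# labelled NER corpus with an optimizer for several epochs.
sentence = "Mutations in the LDLR gene cause familial hypercholesterolemia."
inputs = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, sequence_length, NUM_LABELS)

predicted_tags = logits.argmax(dim=-1)  # one predicted tag index per sub-word token
print(predicted_tags)

In the paper, the same encoder is reused for named entity recognition, relation extraction and question answering; only the task-specific output layer differs between tasks.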