Building Chinese Biomedical Language Models via Multi-Level Text Discrimination
eHealth
Discriminator
DOI:
10.48550/arXiv.2110.07244
Publication Date:
2021-10-14
AUTHORS (7)
ABSTRACT
Pre-trained language models (PLMs), such as BERT and GPT, have revolutionized the field of NLP, not only in the general domain but also in the biomedical domain. Most prior efforts in building biomedical PLMs resorted simply to domain adaptation and focused mainly on English. In this work we introduce eHealth, a Chinese biomedical PLM built from scratch with a new pre-training framework. This framework pre-trains eHealth as a discriminator through both token-level and sequence-level discrimination. The former is to detect input tokens corrupted by a generator and recover their original identities from plausible candidates, while the latter is to further distinguish corruptions of the same original sequence from those of others. As such, eHealth can learn language semantics at both the token and sequence levels. Extensive experiments on 11 Chinese biomedical language understanding tasks of various forms verify the effectiveness and superiority of our approach. We release the pre-trained model at https://github.com/PaddlePaddle/Research/tree/master/KG/eHealth and will release the code later.
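
To make the two objectives concrete, below is a minimal PyTorch sketch of what token-level and sequence-level discrimination losses could look like. This is not the authors' implementation (eHealth is built with PaddlePaddle, and the code was unreleased at the time of writing); all function names, tensor shapes, and the temperature value are illustrative assumptions.

import torch
import torch.nn.functional as F

def token_level_loss(disc_logits, gold_index):
    # Token-level discrimination: at each corrupted position the discriminator
    # scores K plausible candidates proposed by a small generator and must
    # identify the original token among them.
    #   disc_logits: (num_corrupted, K) scores over the K candidates
    #   gold_index:  (num_corrupted,)   index of the original token among the K
    return F.cross_entropy(disc_logits, gold_index)

def sequence_level_loss(seq_embeds, origin_ids, temperature=0.1):
    # Sequence-level discrimination: corruptions derived from the same original
    # sequence are positives; corruptions of other sequences in the batch are
    # negatives (a contrastive, InfoNCE-style objective).
    #   seq_embeds: (B, H) pooled embeddings of the corrupted sequences
    #   origin_ids: (B,)   id of the original sequence each corruption came from
    z = F.normalize(seq_embeds, dim=-1)
    sim = z @ z.t() / temperature                      # (B, B) cosine similarities
    self_mask = torch.eye(z.size(0), dtype=torch.bool, device=z.device)
    pos_mask = (origin_ids.unsqueeze(0) == origin_ids.unsqueeze(1)) & ~self_mask
    log_prob = F.log_softmax(sim.masked_fill(self_mask, float("-inf")), dim=-1)
    pos_log_prob = (log_prob * pos_mask.float()).sum(-1) / pos_mask.sum(-1).clamp(min=1)
    return -pos_log_prob.mean()

# Toy usage with random tensors (shapes only, not real model outputs):
logits = torch.randn(8, 5)                   # 8 corrupted positions, 5 candidates each
gold = torch.randint(0, 5, (8,))             # index of the original token per position
embeds = torch.randn(6, 32)                  # 6 corrupted sequences in the batch
origins = torch.tensor([0, 0, 1, 1, 2, 2])   # two corruptions per original sequence
loss = token_level_loss(logits, gold) + sequence_level_loss(embeds, origins)

Note how the token-level objective differs from ELECTRA's binary replaced-token detection: instead of a yes/no decision per token, the discriminator must recover the original identity from a small candidate set, while the sequence-level objective adds a batch-wise contrastive signal.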