Building Chinese Biomedical Language Models via Multi-Level Text Discrimination
eHealth
Discriminator
DOI:
10.48550/arXiv.2110.07244
Publication Date:
2021-10-14
AUTHORS (7)
ABSTRACT
Pre-trained language models (PLMs), such as BERT and GPT, have revolutionized the field of NLP, not only in the general domain but also in the biomedical domain. Most prior efforts in building biomedical PLMs resorted simply to domain adaptation and focused mainly on English. In this work we introduce eHealth, a Chinese biomedical PLM built from scratch with a new pre-training framework. This framework pre-trains eHealth as a discriminator through both token-level and sequence-level discrimination. The former is to detect input tokens corrupted by a generator and recover their original identities from plausible candidates, while the latter is to further distinguish corruptions of the same original sequence from those of others. As such, eHealth can learn language semantics at both the token and sequence levels. Extensive experiments on 11 Chinese biomedical language understanding tasks of various forms verify the effectiveness and superiority of our approach. We release the pre-trained model at https://github.com/PaddlePaddle/Research/tree/master/KG/eHealth and will release the code later.
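
To make the two objectives concrete, below is a minimal PyTorch sketch of what token-level and sequence-level discrimination losses could look like. This is not the authors' implementation (eHealth is built with PaddlePaddle, and the code was unreleased at the time of writing); all function names, tensor shapes, and the temperature value are illustrative assumptions.

import torch
import torch.nn.functional as F

def token_level_loss(disc_logits, gold_index):
    # Token-level discrimination: at each corrupted position the discriminator
    # scores K plausible candidates proposed by a small generator and must
    # identify the original token among them.
    #   disc_logits: (num_corrupted, K) scores over the K candidates
    #   gold_index:  (num_corrupted,)   index of the original token among the K
    return F.cross_entropy(disc_logits, gold_index)

def sequence_level_loss(seq_embeds, origin_ids, temperature=0.1):
    # Sequence-level discrimination: corruptions derived from the same original
    # sequence are positives; corruptions of other sequences in the batch are
    # negatives (a contrastive, InfoNCE-style objective).
    #   seq_embeds: (B, H) pooled embeddings of the corrupted sequences
    #   origin_ids: (B,)   id of the original sequence each corruption came from
    z = F.normalize(seq_embeds, dim=-1)
    sim = z @ z.t() / temperature                      # (B, B) cosine similarities
    self_mask = torch.eye(z.size(0), dtype=torch.bool, device=z.device)
    pos_mask = (origin_ids.unsqueeze(0) == origin_ids.unsqueeze(1)) & ~self_mask
    log_prob = F.log_softmax(sim.masked_fill(self_mask, float("-inf")), dim=-1)
    pos_log_prob = (log_prob * pos_mask.float()).sum(-1) / pos_mask.sum(-1).clamp(min=1)
    return -pos_log_prob.mean()

# Toy usage with random tensors (shapes only, not real model outputs):
logits = torch.randn(8, 5)                   # 8 corrupted positions, 5 candidates each
gold = torch.randint(0, 5, (8,))             # index of the original token per position
embeds = torch.randn(6, 32)                  # 6 corrupted sequences in the batch
origins = torch.tensor([0, 0, 1, 1, 2, 2])   # two corruptions per original sequence
loss = token_level_loss(logits, gold) + sequence_level_loss(embeds, origins)

Note how the token-level objective differs from ELECTRA's binary replaced-token detection: instead of a yes/no decision per token, the discriminator must recover the original identity from a small candidate set, while the sequence-level objective adds a batch-wise contrastive signal.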