- Biomedical Text Mining and Ontologies
- Topic Modeling
- Natural Language Processing Techniques
- Bioinformatics and Genomic Networks
- Machine Learning in Bioinformatics
- Advanced Text Analysis Techniques
- Text and Document Classification Technologies
- Semantic Web and Ontologies
- Cancer, Hypoxia, and Metabolism
- Breast Cancer Treatment Studies
- HER2/EGFR in Cancer Research
- Cancer Cells and Metastasis
- ATP Synthase and ATPases Research
- S100 Proteins and Annexins
- Web Data Mining and Analysis
Peking University
2024
Dalian University of Technology
2017-2022
Liaoning Cancer Hospital & Institute
2022
China Medical University
2022
The Precision Medicine Initiative is a multicenter effort aiming at formulating personalized treatments by leveraging individual patient data (clinical, genome sequence and functional genomic data) together with the information in large knowledge bases (KBs) that integrate annotations, disease association studies, electronic health records and other types of data. The biomedical literature provides a rich foundation for populating these KBs, reporting genetic and molecular interactions that provide the scaffold for the cellular...
In this work, we unveil and study idiosyncrasies in Large Language Models (LLMs): unique patterns in their outputs that can be used to distinguish the models. To do so, we consider a simple classification task: given a particular text output, the objective is to predict the source LLM that generated the text. We evaluate this synthetic task across various groups of LLMs and find that simply fine-tuning existing embedding models on LLM-generated texts yields excellent accuracy. Notably, we achieve 97.1% accuracy on held-out validation...
Pancreatic ductal adenocarcinoma (PDAC) is one of the most refractory malignancies and has a poor prognosis. In recent years, increasing evidence has shown that an imbalance in metabolism may contribute to unrestricted pancreatic tumour progression, and the pentose phosphate pathway (PPP) plays a pivotal role in cellular metabolism. S100A11 has been shown to regulate multiple biological functions related to metastasis in various cancer types. However, the exact mechanisms and prognostic value of S100A11 in PDAC remain unclear. Here, we found...
Background: Automated biomedical named entity recognition and normalization serves as the basis for many downstream applications in information management. However, this task is challenging due to name variations and entity ambiguity: an entity may have multiple name variants, and a single variant could denote several different entity identifiers. Results: To remedy the above issues, we present a novel knowledge-enhanced system for protein/gene named entity recognition (PNER) and normalization (PNEN). On one hand, a large amount of knowledge extracted from biomedical knowledge bases is used to recognize...
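The ambiguity problem described above (one entity with many name variants, one variant mapping to several identifiers) can be illustrated with a minimal dictionary-lookup sketch. This is not the PNER/PNEN system; the lexicon, the identifiers `ENT:1`/`ENT:2`, and the helper names `normalize_key`, `build_index` and `candidate_ids` are all hypothetical, and the key normalization (lowercasing, dropping non-alphanumerics) is an assumed simplification.

```python
from collections import defaultdict

# Hypothetical toy lexicon (assumption): entity identifier -> known name variants.
LEXICON = {
    "ENT:1": ["ABC1", "ABC-1 protein"],
    "ENT:2": ["ABC1", "ABC one"],
}

def normalize_key(name: str) -> str:
    # Collapse simple surface variation: case, hyphens, spaces, punctuation.
    return "".join(ch for ch in name.lower() if ch.isalnum())

def build_index(lexicon):
    # Map each normalized variant to the set of identifiers it may denote.
    index = defaultdict(set)
    for entity_id, names in lexicon.items():
        for name in names:
            index[normalize_key(name)].add(entity_id)
    return index

def candidate_ids(mention, index):
    # Return all candidate identifiers; more than one means the mention is ambiguous.
    return sorted(index.get(normalize_key(mention), set()))

index = build_index(LEXICON)
print(candidate_ids("abc-1", index))          # ambiguous: two identifiers
print(candidate_ids("ABC-1 PROTEIN", index))  # unambiguous: one identifier
```

Lookup alone only surfaces the candidate set; choosing among multiple candidates is the disambiguation step that a knowledge-enhanced system has to address.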
Automatically extracting the relationships between chemicals and diseases is of great importance to various areas of biomedical research and health care. Biomedical experts have built many large-scale knowledge bases (KBs) to advance biomedical research. KBs contain huge amounts of structured information about entities and their relationships, and therefore play a pivotal role in chemical-disease relation (CDR) extraction. However, previous research has paid less attention to the prior knowledge in existing KBs. This paper proposes...
Automatically extracting protein–protein interactions (PPIs) from biomedical literature provides additional support for precision medicine efforts. This paper proposes a novel memory network-based model (MNM) for PPI extraction, which leverages prior knowledge about protein–protein pairs with memory networks. The proposed MNM captures important context clues related to knowledge representations learned from knowledge bases. Both entity embeddings and relation embeddings of prior knowledge are effective in improving the PPI extraction model, leading to new state-of-the-art...
Automatic extraction of chemical-disease relations (CDR) from unstructured text is of essential importance for disease treatment and drug development. Meanwhile, biomedical experts have built many highly structured knowledge bases (KBs), which contain prior knowledge about chemicals and diseases. Prior knowledge provides strong support for CDR extraction, and how to make full use of it is worth studying. This paper proposes a novel model called "Knowledge-guided Convolutional Networks (KCN)" to leverage prior knowledge for CDR extraction. The proposed model first learns...
We observe an empirical phenomenon in Large Language Models (LLMs): very few activations exhibit significantly larger values than others (e.g., 100,000 times larger). We call them massive activations. First, we demonstrate the widespread existence of massive activations across various LLMs and characterize their locations. Second, we find that their values largely stay constant regardless of the input, and that they function as indispensable bias terms in LLMs. Third, these massive activations lead to the concentration of attention probabilities on their corresponding tokens, and further,...
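To make "very few activations exhibit significantly larger values than others" concrete, here is a minimal sketch that flags outlier entries in a single hidden-state vector. The thresholds are assumptions chosen for illustration (magnitude above `abs_min` and at least `ratio_min` times the median magnitude of the same vector); the function name `find_massive_activations` is hypothetical and not taken from the paper.

```python
from statistics import median

def find_massive_activations(hidden_state, abs_min=100.0, ratio_min=1000.0):
    """Return (index, value) pairs whose magnitude is an extreme outlier.

    Assumed criterion (illustrative only): |x| > abs_min and
    |x| >= ratio_min * median(|hidden_state|).
    """
    magnitudes = [abs(x) for x in hidden_state]
    med = median(magnitudes)
    flagged = []
    for i, x in enumerate(hidden_state):
        mag = abs(x)
        # If the median is zero, fall back to the absolute threshold alone.
        if mag > abs_min and (med == 0 or mag >= ratio_min * med):
            flagged.append((i, x))
    return flagged

# Toy hidden state (made-up values): two huge entries among small ones.
hidden = [0.1, -0.2, -1500.0, 0.3, -0.1, 2000.0, 0.15, -0.05]
print(find_massive_activations(hidden))
```

Comparing against the median rather than the mean keeps the reference scale from being dragged upward by the very outliers being detected.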
In the medical domain, given a question, it is difficult to manually select the most relevant information from a large number of search results. BioNLP 2019 proposed the Question Answering (QA) task, which encourages the use of text mining technology to automatically judge whether a search result is an answer to the question. The main challenge of the QA task is how to mine the semantic relation between the question and the answer. We propose the BioBERT Transformer model to tackle this challenge, which applies Transformers to extract the relations between different words in questions...
In early-stage breast cancer (BC) patients, 40-70% of lymph node metastases are limited to the sentinel lymph nodes (SLNs). Patients at low risk of nonsentinel lymph node (NSLN) metastasis should be exempt from axillary lymph node dissection (ALND) or regional nodal irradiation (RNI). The present study included 237 female BC patients with positive SLNs who received ALND. Based on the clinicopathological factors of 158 patients in the training cohort, multivariate analysis was used to determine the independent risk factors for NSLN metastasis, which were used to establish...