- Biomedical Text Mining and Ontologies
- Topic Modeling
- Influenza Virus Research Studies
- Machine Learning in Healthcare
- RNA and protein synthesis mechanisms
- Natural Language Processing Techniques
- Protein Structure and Dynamics
- Enzyme Structure and Function
- Bioinformatics and Genomic Networks
- Data-Driven Disease Surveillance
- Viral Infections and Vectors
- Respiratory viral infections research
- Chronic Obstructive Pulmonary Disease (COPD) Research
- Semantic Web and Ontologies
- Atomic and Subatomic Physics Research
- Wnt/β-catenin signaling in development and cancer
- Artificial Intelligence in Healthcare
- Thyroid Cancer Diagnosis and Treatment
- Lung Cancer Diagnosis and Treatment
- Electromagnetic Fields and Biological Effects
- Advanced Graph Neural Networks
- Viral Infections and Outbreaks Research
- Advanced Clustering Algorithms Research
- COVID-19 epidemiological studies
- Cancer-related gene regulation
Chinese Academy of Medical Sciences & Peking Union Medical College
2009-2025
Nanjing Medical University
2025
Suzhou Institute of Systems Medicine
2014-2023
Guangzhou Medical University
2023
Soochow University
2021
First Affiliated Hospital of Soochow University
2021
Peking Union Medical College Hospital
2021
Chinese Academy of Sciences
2014-2015
Institute of Biophysics
2014-2015
University of Chinese Academy of Sciences
2014
Respiratory diseases pose a significant global health burden, with challenges in early and accurate diagnosis due to overlapping clinical symptoms, which often lead to misdiagnosis or delayed treatment. To address this issue, we developed LungDiag, an artificial intelligence (AI)‐based diagnostic system that utilizes natural language processing (NLP) to extract key clinical features from electronic health records (EHRs) for the accurate classification of respiratory diseases. This study employed a large cohort...
Precisely aligning phenotypic information within medical texts is paramount in advancing intelligent applications, such as similar patient case retrieval. However, despite its criticality, an algorithm specifically designed for this task is lacking. We previously introduced a fine-grained semantic model, the structured unit of phenotypes (PhenoSSU), and an automatic extraction algorithm. This model accurately characterizes and extracts phenotypic information from medical texts. In this study, we explore different PhenoSSU alignment...
Background and Objectives: Mesenchymal stem cells (MSCs) are multipotent progenitor cells that have found use in regenerative medicine. We previously observed that aspirin, a widely used anti‐inflammatory drug, inhibits MSC proliferation. Here we aimed to elucidate whether aspirin induces MSC apoptosis and whether this is modulated through the Wnt/β‐catenin pathway. Materials and methods: Apoptosis of MSCs was assessed using Hoechst 33342 dye and an Annexin V–FITC/PI Kit. Expression and protein phosphorylation were...
Since coronavirus disease 2019 (COVID-19) might circulate in the following seasons, it is essential to understand how COVID-19 influences other respiratory diseases, especially influenza. In this study, we analyzed influenza activity from mid-November 2019 to March 2020 in the Chinese mainland and found that the influenza season ended much earlier than previous seasons for all subtypes and lineages, which may have resulted from the circulation of COVID-19 and from measures such as travel control and personal protection. These findings provide rudimentary...
Many template-based modeling (TBM) methods have been developed over recent years that allow for protein structure prediction and for the study of structure-function relationships of proteins. One major problem all TBM algorithms face, however, is their unsatisfactory performance when the proteins under consideration are low-homology. To improve the performance for such targets, a novel model evaluation method was developed here, named MEFTop. Our method focuses on evaluating the topology by using two groups of features. These features included...
Phenotypes characterize the clinical manifestations of diseases and provide important information for diagnosis. Therefore, the construction of phenotype knowledge graphs is valuable to the development of artificial intelligence in medicine. However, current knowledge bases such as WikiData and DBpedia are coarse-grained because they only consider the core concepts of phenotypes while neglecting the details (attributes) associated with these phenotypes.
The influenza A(H1N1)pdm09 virus caused a global pandemic in 2009 and has circulated seasonally ever since. As the continual genetic evolution of hemagglutinin in this virus leads to antigenic drift, rapid identification and characterization of antigenic variants are needed. In this study, we developed PREDAC-H1pdm, a model to predict antigenic relationships between H1N1pdm viruses and to identify antigenic clusters for post-2009 pandemic H1N1 strains. Our model performed well in predicting antigenic variants, which was helpful for influenza surveillance. By mapping H1N1pdm, we found...
Motivation: Previously, we developed a computational model to identify genomic co-occurrence networks that was applied to capture the coevolution patterns within genomes of influenza viruses. To facilitate easy public use of this model, an R package ‘cooccurNet’ is presented here. Results: ‘cooccurNet’ includes functionalities for the construction and analysis of residue (e.g. nucleotides, amino acids and SNPs) co-occurrence networks. In addition, a new method for measuring residue coevolution, defined as the residue co-occurrence score (RCOS), is proposed...
Newly emerging influenza viruses keep challenging global public health. To evaluate the potential risk of these viruses, it is critical to rapidly determine their phenotypes, including antigenicity, host, virulence and drug resistance. Here, we built FluPhenotype, a one-stop platform to rapidly determine the phenotypes of influenza A viruses. The input of FluPhenotype is complete or partial genomic/protein sequences of influenza A viruses. The output presents five types of information about the viruses: (i) sequence annotation, including gene and protein names as well as open reading frames, (ii)...
Timely surveillance of the antigenic dynamics of the influenza virus is critical for accurate selection of vaccine strains, which is important for effective prevention of viral spread and infection. Here, we provide a computational platform, called PREDAC-H3, for antigenic surveillance of human influenza A(H3N2) virus based on the sequence of the surface protein hemagglutinin (HA). PREDAC-H3 not only determines the antigenic variants and the antigenic cluster (grouped by similar antigenicity) to which a virus belongs, based on HA sequences, but also allows visualization of the spatial distribution and temporal dynamics of antigenic clusters of viruses isolated...
Objective: The purpose of this study was to predict elevated TSH levels by developing an effective machine learning model based on large-scale physical examination results. Methods: Subjects who underwent general physical examinations from January 2015 to December 2019 were enrolled in this study. A total of 21 clinical parameters were analyzed, including six demographic parameters (sex, age, etc.) and 15 laboratory parameters (thyroid peroxidase antibody (TPO-Ab), thyroglobulin antibody (TG-Ab), etc.). risk factors for the univariate multivariate...
Medical entity normalization is an important task for medical information processing. The Unified Medical Language System (UMLS), a well-developed medical terminology system, is crucial for medical entity normalization. However, the UMLS primarily consists of English medical terms. For languages other than English, such as Chinese, a significant challenge for normalizing medical entities is the lack of robust terminology systems. To address this issue, we propose a translation-enhancing training strategy that incorporates translation and synonym knowledge into language...
Many host-specific mutations have been detected in influenza A viruses (IAVs). However, their effects on hydrogen bond (H-bond) variations have rarely been investigated. In this study, 60 sites were identified in the internal proteins of avian and human IAVs, 27 of which contained with H-bonds. Besides, 30 group HA NA. Twenty-six of 36 existing at these caused H-bond loss or formation in at least one subtype. The number isolations 2009 pandemic H1N1, human-infecting H5N1 and H7N9 varied. combinations changes three...
Phenotype information in electronic health records (EHRs) is mainly recorded in unstructured free text, which cannot be directly used for clinical research. EHR-based deep-phenotyping methods can structure phenotype information in EHRs with high fidelity, making it the focus of medical informatics. However, developing a deep-phenotyping method for non-English EHRs (ie, Chinese EHRs) is challenging. Although numerous EHR resources exist in China, fine-grained annotation data that are suitable for developing such methods are limited. It is challenging to develop such...
Background: In recent years, text embedding models and their associated vector search techniques have seen significant advancements. These technologies have become crucial for deploying large language models in specialized fields like medicine. However, the inherent complexity of domain-specific knowledge still poses challenges for mainstream text embedding models. These models often struggle to accurately interpret and represent medical terminology. The medical field, with its extensive and diverse terminology, highlights a gap...
Chinese medical entities have not been organized comprehensively due to the lack of well-developed terminology systems, which poses a challenge to processing Chinese medical texts for fine-grained medical knowledge representation. To unify Chinese medical terminologies, mapping Chinese medical entities to their English counterparts in the Unified Medical Language System (UMLS) is an efficient solution. However, these mappings have not been investigated sufficiently in former research. In this study, we explore strategies for mapping Chinese medical entities to the UMLS and systematically evaluate the mapping performance.
Introduction: Respiratory diseases pose a global health burden, and early diagnosis is crucial for effective treatment. Developing a natural language processing (NLP) diagnostic system based on electronic health records (EHRs) is essential but challenging due to the complexity of EHR data. Methods: A retrospective study was conducted using EHRs of respiratory disease patients from multiple hospitals in China. An NLP algorithm was developed to extract clinical features and phenotypic attributes from EHRs. The extracted features were...
Electronic health record (EHR) resources are valuable but remain underexplored because most clinical information, especially phenotype information, is buried in the free text of EHRs. An intelligent annotation tool plays an important role in unlocking the full potential of EHRs by transforming free-text phenotype information into a computer-readable form. Deep phenotyping has shown its advantage in representing phenotype information with high fidelity; however, existing annotation tools are not suitable for the deep phenotyping task. Here, we developed an annotation tool named PIAT with a major focus on...
Motivation: Protein domains are fundamental units of protein structure, function and evolution; thus, it is critical to gain a deep understanding of protein domain organization. Previous works have attempted to identify key residues involved in the organization of domain architecture. Because one of the most important characteristics of domain architecture is the arrangement of secondary structure elements (SSEs), here we present a picture of domain organization through an integrated consideration of SSE arrangements and residue contact networks. Results: In this work, by...