- Advanced Graph Neural Networks
- Topic Modeling
- Machine Learning and Data Classification
- Natural Language Processing Techniques
- Recommender Systems and Techniques
- Advanced Neural Network Applications
- Graph Theory and Algorithms
- Domain Adaptation and Few-Shot Learning
- Machine Learning and Algorithms
- Complex Network Analysis Techniques
- Text and Document Classification Technologies
- Advanced Image and Video Retrieval Techniques
- Generative Adversarial Networks and Image Synthesis
- Machine Learning in Materials Science
- Lymphoma Diagnosis and Treatment
- Systemic Lupus Erythematosus Research
- Semantic Web and Ontologies
- Data Stream Mining Techniques
- Salivary Gland Tumors Diagnosis and Treatment
- Multimodal Machine Learning Applications
- Rheumatoid Arthritis Research and Therapies
- Sentiment Analysis and Opinion Mining
- Neural Networks and Applications
- Gene expression and cancer classification
- Salivary Gland Disorders and Functions
Peking University
2011-2025
Peking Union Medical College Hospital
2011-2024
Chinese Academy of Medical Sciences & Peking Union Medical College
2012-2024
HEC Montréal
2023-2024
Beijing Institute of Big Data Research
2020-2024
UNSW Sydney
2024
Mila - Quebec Artificial Intelligence Institute
2023-2024
Shenzhen University Health Science Center
2024
National Clinical Research Center for Digestive Diseases
2024
Lenovo (China)
2023
We propose new classification criteria for Sjögren's syndrome (SS), which are needed considering the emergence of biologic agents as potential treatments and their associated comorbidity. These criteria target individuals with signs/symptoms suggestive of SS. The criteria are based on expert opinion elicited using the nominal group technique and on analyses of data from the Sjögren's International Collaborative Clinical Alliance. Preliminary validation included comparisons with classifications based on the American–European Consensus Group (AECG) criteria,...
The Chinese systemic lupus erythematosus (SLE) treatment and research group (CSTAR) provides major clinical characteristics of SLE in China and establishes a platform to provide resources for future basic studies. CSTAR originated as a multicentre, consecutive, prospective design. The data were collected online from 104 rheumatology centers, which covered 30 provinces in China. The registered patients were required to meet four or more of the American College of Rheumatology (ACR) criteria for the classification of SLE. All centers...
With the explosive growth of online information, recommender systems play a key role in alleviating such information overload. Due to the important application value of recommender systems, there have always been emerging works in this field. In recommender systems, the main challenge is to learn effective user/item representations from their interactions and side information (if any). Recently, graph neural network (GNN) techniques have been widely utilized in recommender systems, since most of the information in recommender systems essentially has a graph structure and GNNs have superiority in graph representation learning. This article aims to provide...
Graph neural networks (GNNs) have achieved great success in many graph-based applications. However, the enormous size and high sparsity level of graphs hinder their applications under industrial scenarios. Although some scalable GNNs are proposed for large-scale graphs, they adopt a fixed $K$-hop neighborhood for each node, thus facing the over-smoothing issue when adopting large propagation depths for nodes within sparse regions. To tackle the above issue, we propose a new GNN architecture -- Attention...
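The over-smoothing effect mentioned above can be illustrated with a minimal sketch. It is not the architecture proposed in the abstract: it only applies plain $K$-hop neighbor averaging and shows that node features become nearly indistinguishable as the depth grows. The toy path graph, the feature values, and the helper names (`row_normalize_with_self_loops`, `propagate`, `spread`) are all assumptions made for illustration.

```python
def row_normalize_with_self_loops(adj):
    """Return D^-1 (A + I) for a dense 0/1 adjacency matrix given as nested lists."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    return [[v / sum(row) for v in row] for row in a_hat]


def propagate(adj_norm, feats, k):
    """Apply k rounds of neighbor averaging: X <- A_norm @ X."""
    n, d = len(feats), len(feats[0])
    x = [row[:] for row in feats]
    for _ in range(k):
        x = [[sum(adj_norm[i][j] * x[j][c] for j in range(n)) for c in range(d)]
             for i in range(n)]
    return x


def spread(feats):
    """Largest per-dimension gap between any two nodes (0 means identical rows)."""
    d = len(feats[0])
    return max(max(r[c] for r in feats) - min(r[c] for r in feats) for c in range(d))


# Hypothetical toy graph: a path 0-1-2-3 with one scalar feature per node.
ADJ = [[0, 1, 0, 0],
       [1, 0, 1, 0],
       [0, 1, 0, 1],
       [0, 0, 1, 0]]
FEATS = [[1.0], [0.0], [0.0], [0.0]]
A_NORM = row_normalize_with_self_loops(ADJ)

if __name__ == "__main__":
    # The spread shrinks as the propagation depth k increases.
    for k in (1, 2, 8, 32):
        print(k, spread(propagate(A_NORM, FEATS, k)))
```

Because every row of the normalized matrix sums to one, each round is an averaging step, so the gap between the largest and smallest feature value can only shrink. This is why a single fixed depth can be too deep for some nodes and too shallow for others.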
Halide solid electrolytes have attracted intense research interest recently for application in all-solid-state lithium-ion batteries. Herein, we present a systematic first-principles study of the Li3MX6 (M: multivalent cation; X: halogen anion) halide family that unveils the link between Li-rich channels and ionic conductivity, highlighting the former as a material gene in these compounds. By screening a total of 180 halides for those with high thermodynamic stability, wide electrochemical window, low chemical...
The rapid development of open-source large language models (LLMs) has been truly remarkable. However, the scaling law described in previous literature presents varying conclusions, which casts a dark cloud over scaling LLMs. We delve into the study of scaling laws and present our distinctive findings that facilitate the scaling of large-scale models in two commonly used open-source configurations, 7B and 67B. Guided by the scaling laws, we introduce DeepSeek LLM, a project dedicated to advancing open-source language models with a long-term perspective. To support the pre-training phase, we have developed...
To evaluate the incidence of malignancies in a cohort of Chinese patients with primary Sjögren's syndrome (pSS) and to identify risk factors for malignancy in pSS patients. A retrospective analysis was carried out in 1320 pSS patients who were recruited at Peking Union Medical College Hospital from 1990 to 2005 and followed up for an average of 4.4 years. Among them, 29 patients developed malignancies. Standardized incidence ratios (SIRs) were calculated along with 95% CIs. Clinical characteristics were compared between patients with and without malignancies, as well as haematological...
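For readers unfamiliar with the statistic, a standardized incidence ratio is the number of observed cases divided by the number expected from reference population rates. The sketch below is illustrative only. The expected count is a hypothetical placeholder, since the abstract gives only the observed 29 cases. The interval uses Byar's approximation to the Poisson limits, which is an assumption and not necessarily the method used in the study.

```python
import math


def sir(observed, expected):
    """Standardized incidence ratio: observed cases / expected cases."""
    if expected <= 0:
        raise ValueError("expected must be positive")
    return observed / expected


def sir_confidence_interval(observed, expected, z=1.96):
    """Approximate CI for the SIR via Byar's approximation to the Poisson limits."""
    if expected <= 0:
        raise ValueError("expected must be positive")
    if observed == 0:
        lower = 0.0
    else:
        lower = observed * (1 - 1 / (9 * observed) - z / (3 * math.sqrt(observed))) ** 3
    o1 = observed + 1
    upper = o1 * (1 - 1 / (9 * o1) + z / (3 * math.sqrt(o1))) ** 3
    return lower / expected, upper / expected


if __name__ == "__main__":
    OBSERVED = 29               # from the abstract
    HYPOTHETICAL_EXPECTED = 10.0  # placeholder, NOT a value from the study
    print(sir(OBSERVED, HYPOTHETICAL_EXPECTED))
    print(sir_confidence_interval(OBSERVED, HYPOTHETICAL_EXPECTED))
```

An SIR whose whole interval lies above 1 indicates more cases than the reference rates would predict.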
Named entity recognition is a fundamental task in natural language processing, and many studies have been done on it in recent decades. Previous word representation methods represent words as a single vector of multiple dimensions, which ignores the ambiguity of characters in Chinese. To solve this problem, we apply the BERT-BiLSTM-CRF model to named entity recognition in Chinese electronic medical records in this paper. This model enhances the semantic representation by using the BERT pre-trained model, then combines a BiLSTM network with a CRF layer, used as input for...
Graph Convolutional Network (GCN) is a widely used method for learning from graph-based data. However, it fails to use the unlabeled data to its full potential, thereby hindering its ability. Given some pseudo labels of the unlabeled data, the GCN can benefit from this extra supervision. Based on Knowledge Distillation and Ensemble Learning, lots of methods use a teacher-student architecture to make better use of the unlabeled data and then make a better prediction. However, these methods introduce unnecessary training costs and a high bias in the student model if the teacher's predictions are unreliable....
Graph neural networks (GNNs) have achieved state-of-the-art performance in various graph-based tasks. However, as mainstream GNNs are designed based on the message passing mechanism, they do not scale well to data size and message passing steps. Although there has been an emerging interest in the design of scalable GNNs, current research focuses on specific GNN designs rather than the general design space, limiting the discovery of potential scalable GNN models. This paper proposes PaSca, a new paradigm and system that offers a principled approach...
Graph Neural Networks (GNNs) have achieved great success in various graph mining tasks. However, drastic performance degradation is always observed when a GNN is stacked with many layers. As a result, most GNNs only have shallow architectures, which limits their expressive power and exploitation of deep neighborhoods. Most recent studies attribute the performance degradation to the over-smoothing issue. In this paper, we disentangle the conventional graph convolution operation into two independent operations: Propagation...
In recent years, Graph Neural Network (GNN) based models have shown promising results in simulating the physics of complex systems. However, training dedicated graph network simulators can be costly, as most models are confined to fully supervised training, which requires extensive data generated from traditional simulators. To date, how transfer learning could improve the model performance and training efficiency has remained unexplored. In this work, we introduce a pre-training paradigm for graph network simulators. We propose scalable...
With the expansion of data availability, machine learning (ML) has achieved remarkable breakthroughs in both academia and industry. However, imbalanced data distributions are prevalent in various types of raw data and severely hinder the performance of ML by biasing the decision-making processes. To deepen the understanding of imbalanced data and facilitate the related research and applications, this survey systematically analyzes real-world data formats and organizes existing research for different data formats into four distinct categories: re-balancing, feature...
The remarkable success of the autoregressive paradigm has made significant advancement in Multimodal Large Language Models (MLLMs), with powerful models like Show-o, Transfusion and Emu3 achieving notable progress in unified image understanding and generation. For the first time, we uncover a common phenomenon: the understanding capabilities of MLLMs are typically stronger than their generative capabilities, with a significant gap between the two. Building on this insight, we propose HermesFlow, a simple yet general framework designed to seamlessly...
Retrieval-Augmented Generation (RAG) systems often struggle with imperfect retrieval, as traditional retrievers focus on lexical or semantic similarity rather than logical relevance. To address this, we propose HopRAG, a novel RAG framework that augments retrieval with logical reasoning through graph-structured knowledge exploration. During indexing, HopRAG constructs a passage graph, with text chunks as vertices and logical connections established via LLM-generated pseudo-queries as edges. During retrieval, it employs a retrieve-reason-prune...
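The indexing idea described above (chunks as vertices, pseudo-queries as edges) can be sketched roughly as follows. This is not the HopRAG implementation. The `pseudo_queries_for` callback stands in for the LLM and is supplied by the caller. Token-overlap matching, the `threshold` value, and all function names are assumptions chosen to keep the example self-contained. The retrieval phase is truncated in the abstract and is not sketched.

```python
from collections import defaultdict


def tokenize(text):
    """Lowercased word set with basic punctuation stripped."""
    return {w.strip(".,?!").lower() for w in text.split() if w.strip(".,?!")}


def overlap(a, b):
    """Jaccard similarity between two token sets (stand-in for a real matcher)."""
    ta, tb = tokenize(a), tokenize(b)
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def build_passage_graph(chunks, pseudo_queries_for, threshold=0.2):
    """Vertices are chunk indices; an edge u -> v stores the pseudo-query
    generated from chunk u that best matches chunk v, plus its score."""
    edges = defaultdict(dict)
    for u, chunk in enumerate(chunks):
        for query in pseudo_queries_for(chunk):
            for v, other in enumerate(chunks):
                if v == u:
                    continue
                score = overlap(query, other)
                best = edges[u].get(v)
                if score >= threshold and (best is None or score > best[1]):
                    edges[u][v] = (query, score)
    return {u: dict(nbrs) for u, nbrs in edges.items()}


# Hypothetical corpus and canned "LLM" output, for illustration only.
CHUNKS = [
    "Marie Curie won the Nobel Prize in Physics in 1903.",
    "The Nobel Prize in Physics is awarded in Stockholm.",
    "Bananas are rich in potassium.",
]
CANNED = {CHUNKS[0]: ["Where is the Nobel Prize in Physics awarded?"]}


def canned_queries(chunk):
    """Stub for an LLM call: returns pre-written pseudo-queries, if any."""
    return CANNED.get(chunk, [])


if __name__ == "__main__":
    print(build_passage_graph(CHUNKS, canned_queries))
```

In this toy corpus the only edge runs from the first chunk to the second. The question generated from the first chunk is answered by the second, even though the two passages are about different things.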
According to Test-Time Scaling, the integration of External Slow-Thinking with the Verify mechanism has been demonstrated to enhance multi-round reasoning in large language models (LLMs). However, in the multimodal (MM) domain, there is still a lack of a strong MM-Verifier. In this paper, we introduce MM-Verifier and MM-Reasoner to enhance multimodal reasoning through longer inference and more robust verification. First, we propose a two-step MM verification data synthesis method, which combines a simulation-based tree search with verification and uses rejection sampling...
Data selection methods, such as active learning and core-set selection, are useful tools for improving the data efficiency of deep learning models on large-scale datasets. However, recent deep learning models have moved forward from independent and identically distributed data to graph-structured data, such as social networks, e-commerce user-item graphs, and knowledge graphs. This evolution has led to the emergence of Graph Neural Networks (GNNs) that go beyond the models existing data selection methods are designed for. Therefore, we present GRAIN, an efficient framework that opens...
Graph Convolutional Networks (GCNs) have become state-of-the-art methods in many supervised and semi-supervised graph representation learning scenarios. In order to achieve satisfactory performance, GCNs require a sufficient amount of labeled data. However, in real-world scenarios, labeled data is often expensive to obtain. Therefore, we propose ALG, a novel Active Learning framework for GCNs, which employs domain-specific intelligence to achieve much higher performance and efficiency compared to the generic AL frameworks....