- Topic Modeling
- Biomedical Text Mining and Ontologies
- Natural Language Processing Techniques
- Adversarial Robustness in Machine Learning
- Machine Learning and Algorithms
- Domain Adaptation and Few-Shot Learning
- Multimodal Machine Learning Applications
- Anomaly Detection Techniques and Applications
- Machine Learning in Healthcare
- Advanced Neural Network Applications
- Machine Learning and Data Classification
- Quantum Information and Cryptography
- Semantic Web and Ontologies
- Reinforcement Learning in Robotics
- COVID-19 diagnosis using AI
- Genomics and Rare Diseases
- Data Quality and Management
- Advanced Graph Neural Networks
- Privacy-Preserving Technologies in Data
- Stochastic Gradient Optimization Techniques
- AI in cancer detection
- Neural Networks and Applications
- Advanced Bandit Algorithms Research
- Cryptography and Data Security
- Computational Drug Discovery Methods
Peking University (2015-2025)
The University of Texas Health Science Center at Houston (2023-2025)
Peking University Third Hospital (2023-2025)
State Key Laboratory of Oncogene and Related Genes (2021-2025)
Renji Hospital (2021-2025)
Shanghai Jiao Tong University (2021-2025)
Shanghai Cancer Institute (2021-2025)
Mayo Clinic in Florida (2016-2024)
XinHua Hospital (2024)
Mayo Clinic (2016-2024)
While neural machine translation (NMT) has made good progress in the past two years, tens of millions of bilingual sentence pairs are needed for its training. However, human labeling is very costly. To tackle this training data bottleneck, we develop a dual-learning mechanism, which enables an NMT system to automatically learn from unlabeled data through a dual-learning game. This mechanism is inspired by the following observation: any machine translation task has a dual task, e.g., English-to-French translation (primal) versus French-to-English translation (dual);...
Automatic clinical text classification is a natural language processing (NLP) technology that unlocks information embedded in clinical narratives. Machine learning approaches have been shown to be effective for such tasks. However, a successful machine learning model usually requires extensive human effort to create labeled training data and conduct feature engineering. In this study, we propose a clinical text classification paradigm using weak supervision and deep representation to reduce these efforts.
Previous cross-lingual knowledge graph (KG) alignment studies rely on entity embeddings derived only from monolingual KG structural information, which may fail when matching entities that have different facts in the two KGs. In this paper, we introduce the topic entity graph, a local sub-graph of an entity, to represent entities together with their contextual information in the KG. From this view, the KG-alignment task can be formulated as a graph-matching problem; we further propose a graph-attention based solution, which first matches all entities in two topic entity graphs, and then jointly...
Deep learning models are vulnerable to adversarial examples crafted by applying human-imperceptible perturbations to benign inputs. However, under the black-box setting, most existing adversarial examples often have poor transferability when attacking other defense models. In this work, from the perspective of regarding adversarial example generation as an optimization process, we propose two new methods to improve the transferability of adversarial examples, namely the Nesterov Iterative Fast Gradient Sign Method (NI-FGSM) and the Scale-Invariant attack Method (SIM). NI-FGSM aims...
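The NI-FGSM update can be sketched in a few lines; below is a minimal numpy version run on a toy quadratic loss rather than a network, and the look-ahead and L1-normalized momentum follow the common formulation of the attack, which may differ in detail from the paper's exact one.

```python
import numpy as np

def ni_fgsm(x, grad_fn, eps=0.3, steps=10, mu=1.0):
    """Sketch of Nesterov Iterative FGSM. grad_fn(x) returns dL/dx;
    we ascend the loss to craft the perturbation."""
    alpha = eps / steps                 # per-step budget
    g = np.zeros_like(x, dtype=float)   # accumulated momentum gradient
    x_adv = x.astype(float).copy()
    for _ in range(steps):
        x_nes = x_adv + alpha * mu * g                     # Nesterov look-ahead point
        grad = grad_fn(x_nes)
        g = mu * g + grad / (np.abs(grad).sum() + 1e-12)   # L1-normalized momentum
        x_adv = x_adv + alpha * np.sign(g)                 # sign-gradient step
        x_adv = np.clip(x_adv, x - eps, x + eps)           # stay inside the eps-ball
    return x_adv

# toy loss L(x) = 0.5 * ||x - t||^2, so grad = x - t; ascending moves away from t
t = np.array([1.0, -1.0])
x0 = np.zeros(2)
x_adv = ni_fgsm(x0, grad_fn=lambda x: x - t, eps=0.3)
```

The look-ahead point `x_nes` is what distinguishes NI-FGSM from plain momentum iterative FGSM: the gradient is evaluated at the anticipated next position rather than the current one.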
Few-shot learning (FSL) has attracted increasing attention in recent years but remains challenging, due to the intrinsic difficulty of generalizing from a few examples. This paper proposes an adaptive margin principle to improve the generalization ability of metric-based meta-learning approaches for few-shot learning problems. Specifically, we first develop a class-relevant additive margin loss, where the semantic similarity between each pair of classes is considered in order to separate samples in the feature embedding space from similar classes....
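A minimal sketch of what a class-relevant additive margin does to a distance-based (prototypical-style) softmax; the `margins` matrix and the exact placement of the margin are illustrative assumptions, not the paper's precise loss.

```python
import numpy as np

def additive_margin_probs(dists, label, margins):
    """dists: (C,) squared distances from a query to each class prototype.
    margins: (C, C) pairwise margins; margins[y, c] is added to the logit of
    competitor class c when the true class is y, so semantically similar
    classes must be pushed further apart to win. Logits are negative
    distances, as in prototypical networks."""
    logits = -dists + margins[label]   # inflate competitor logits by the margin
    logits[label] = -dists[label]      # no margin on the true class itself
    e = np.exp(logits - logits.max())  # numerically stable softmax
    return e / e.sum()
```

With zero margins this reduces to the ordinary distance softmax; larger margins make the training loss harsher for classes near the true class.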
Accurate extraction of breast cancer patients' phenotypes is important for clinical decision support and clinical research. This study developed and evaluated domain-specific pretrained CancerBERT models for extracting phenotypes from clinical texts. We also investigated the effect of a customized cancer-related vocabulary on the performance of the models. A corpus of breast cancer patients was extracted from the electronic health records of a local hospital. We annotated named entities in 200 pathology reports and 50 clinical notes for 8 cancer phenotypes for fine-tuning and evaluation. We kept pretraining the BlueBERT model with...
Powder X-ray diffraction (PXRD) is a prevalent technique in materials characterization. However, the analysis of PXRD often requires extensive manual intervention, and most automated methods only operate at a coarse-grained level. The more difficult and important task of fine-grained crystal structure prediction from PXRD remains unaddressed. This study introduces XtalNet, the first equivariant deep generative model for end-to-end crystal structure prediction from PXRD. Unlike previous methods that rely solely on composition,...
One of the most promising ways of improving the performance of deep convolutional neural networks is by increasing the number of layers. However, adding layers makes training more difficult and computationally expensive. In order to train deeper networks, we propose to add auxiliary supervision branches after certain intermediate layers during training. We formulate a simple rule of thumb to determine where these branches should be added. The resulting deeply supervised structure is much easier to train and also produces better classification...
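The training objective that such auxiliary branches induce can be sketched as a weighted sum of per-head losses; the 0.3 discount and the head list below are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def cross_entropy(logits, y):
    """Stable cross-entropy of a single logit vector against true label y."""
    z = logits - logits.max()
    logp = z - np.log(np.exp(z).sum())
    return -logp[y]

def deeply_supervised_loss(head_logits, y, aux_weight=0.3):
    """head_logits: [aux_1, ..., aux_k, main] classifier outputs, one per
    branch. Auxiliary branches contribute a discounted loss; the final head
    keeps weight 1. At inference time only the main head is used."""
    *aux, main = head_logits
    loss = cross_entropy(main, y)
    loss += aux_weight * sum(cross_entropy(a, y) for a in aux)
    return loss
```

Because the auxiliary losses inject gradient signal directly at intermediate layers, early layers receive stronger supervision than they would from the final head alone.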
People believe that depth plays an important role in the success of deep neural networks (DNNs). However, this belief lacks solid theoretical justification as far as we know. We investigate this question from the perspective of the margin bound. In the margin bound, the expected error is upper bounded by the empirical margin error plus a Rademacher Average (RA) based capacity term. First, we derive an upper bound for the RA of a DNN, and show that it increases with increasing depth. This indicates a negative impact of depth on test performance. Second, deeper networks tend to have larger...
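For context, a standard margin bound of the kind referenced here (a textbook form, not necessarily the paper's exact statement) holds with probability at least $1-\delta$ over $m$ samples:

```latex
\Pr\big[\, y f(x) \le 0 \,\big] \;\le\;
\underbrace{\frac{1}{m}\sum_{i=1}^{m}\mathbf{1}\!\left[\, y_i f(x_i) \le \gamma \,\right]}_{\text{empirical margin error}}
\;+\;
\underbrace{\frac{2}{\gamma}\,\mathcal{R}_m(\mathcal{F})}_{\text{RA capacity term}}
\;+\; \sqrt{\frac{\ln(1/\delta)}{2m}}
```

where $\mathcal{R}_m(\mathcal{F})$ is the Rademacher average of the hypothesis class $\mathcal{F}$ and $\gamma > 0$ is the margin parameter; the abstract's argument concerns how the capacity term grows with depth.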
Recently, large-scale few-shot learning (FSL) has become topical. It has been discovered that, for a large-scale FSL problem with 1,000 classes in the source domain, a strong baseline emerges: simply training a deep feature embedding model using the aggregated source classes and performing nearest neighbor (NN) search with the learned features on the target classes. The state-of-the-art FSL methods struggle to beat this baseline, indicating intrinsic limitations on their scalability. To overcome this challenge, we propose a novel method based on transferable visual class...
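The baseline itself is simple enough to sketch directly; `embed` below stands for a hypothetical feature extractor trained on the aggregated source classes.

```python
import numpy as np

def nn_baseline(embed, support_x, support_y, query_x):
    """The strong large-scale FSL baseline: embed everything with a model
    trained on the aggregated source classes, then classify each query by
    1-nearest-neighbor search in feature space."""
    s = embed(support_x)                                  # (N, d) support features
    q = embed(query_x)                                    # (M, d) query features
    d2 = ((q[:, None, :] - s[None, :, :]) ** 2).sum(-1)   # (M, N) squared distances
    return support_y[d2.argmin(axis=1)]                   # label of nearest support
```

No episodic meta-training is involved at all, which is exactly why it is a telling baseline: any meta-learning method must justify its added machinery against it.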
The Transformer architecture is widely used in natural language processing. Despite its success, the design principle of the Transformer remains elusive. In this paper, we provide a novel perspective towards understanding the architecture: we show that the Transformer can be mathematically interpreted as a numerical Ordinary Differential Equation (ODE) solver for a convection-diffusion equation in a multi-particle dynamic system. In particular, how words in a sentence are abstracted into contexts by passing through the layers of the Transformer can be interpreted as approximating multiple...
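The ODE analogy rests on the standard observation that a residual update is a forward-Euler step; sketched here in generic form (not the paper's full multi-particle derivation):

```latex
x_{l+1} = x_l + F(x_l)
\quad\Longleftrightarrow\quad
x(t + \Delta t) \approx x(t) + \Delta t \cdot F\big(x(t)\big), \qquad \Delta t = 1
```

i.e., stacking residual layers approximates integrating the dynamics $\mathrm{d}x/\mathrm{d}t = F(x)$ forward in time, with one layer per time step.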
We study an interesting problem in training neural network-based models for natural language generation tasks, which we call the \emph{representation degeneration problem}. We observe that when training a model for such tasks through likelihood maximization with the weight tying trick, especially on big datasets, most of the learnt word embeddings tend to degenerate and be distributed into a narrow cone, which largely limits the representation power of the embeddings. We analyze the conditions and causes of this problem, and propose a novel regularization method to address...
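One way to quantify, and then penalize, the narrow-cone effect is the mean pairwise cosine similarity of the embedding matrix; this is a sketch in the spirit of the proposed regularization, not its exact form.

```python
import numpy as np

def cosine_regularizer(W):
    """Average pairwise cosine similarity of word embeddings (rows of W).
    A value near 1 means the embeddings have collapsed into a narrow cone;
    adding this term to the training loss discourages that collapse."""
    Wn = W / np.linalg.norm(W, axis=1, keepdims=True)   # unit-normalize rows
    cos = Wn @ Wn.T                                     # (V, V) cosine matrix
    n = W.shape[0]
    off_diag = cos.sum() - np.trace(cos)                # drop self-similarity
    return off_diag / (n * (n - 1))
```

Embeddings squeezed into a cone score close to 1, while well-spread embeddings score near (or below) zero, so minimizing this quantity pushes vectors apart angularly.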
In this paper, we propose to tackle the challenging few-shot learning (FSL) problem by learning global class representations using both base and novel class training samples. In each episode, an episodic class mean computed from a support set is registered with the global representation via a registration module. This produces a registered representation for computing the classification loss on a query set. Though following a similar episodic training pipeline as existing meta-learning based approaches, our method differs significantly in that novel class samples are involved from the beginning. To compensate...
In this paper, we consider efficient differentially private empirical risk minimization from the viewpoint of optimization algorithms. For strongly convex and smooth objectives, we prove that gradient descent with output perturbation not only achieves nearly optimal utility, but also significantly improves on the running time of previous state-of-the-art private optimization algorithms, for both $\epsilon$-DP and $(\epsilon, \delta)$-DP. For non-convex objectives, we propose an RRPSGD (Random Round Private Stochastic Gradient Descent) algorithm,...
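Output perturbation in this setting can be sketched as: run non-private gradient descent to (near) convergence, then add Gaussian noise calibrated to the L2 sensitivity of the result. The sensitivity bound `2L/(n*lambda)` for a `lambda`-strongly-convex, `L`-Lipschitz ERM objective is the classical one; the constants and the Gaussian-mechanism calibration below are illustrative, not the paper's exact analysis.

```python
import numpy as np

def dp_gd_output_perturbation(grad_fn, w0, n, steps=100, lr=0.1,
                              eps=1.0, delta=1e-5, lip=1.0, strong_conv=1.0):
    """Sketch of (eps, delta)-DP ERM via output perturbation.
    grad_fn(w) is the gradient of the empirical objective; n is the number
    of training examples, which controls the sensitivity of the minimizer."""
    w = np.asarray(w0, dtype=float)
    for _ in range(steps):                 # plain, non-private gradient descent
        w = w - lr * grad_fn(w)
    sensitivity = 2 * lip / (n * strong_conv)             # classical L2 bound
    sigma = sensitivity * np.sqrt(2 * np.log(1.25 / delta)) / eps
    return w + np.random.normal(0.0, sigma, size=w.shape)  # Gaussian mechanism
```

The appeal highlighted in the abstract is that privacy costs a single noise addition at the end, so the running time is just that of ordinary gradient descent.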
Neural network robustness has recently been highlighted by the existence of adversarial examples. Many previous works show that learned networks do not perform well on perturbed test data, and that significantly more labeled data is required to achieve adversarially robust generalization. In this paper, we show theoretically and empirically that with just more unlabeled data one can learn a model with better adversarially robust generalization. The key insight of our results is based on a risk decomposition theorem, in which the expected robust risk is separated into two parts: the stability part...