- Domain Adaptation and Few-Shot Learning
- Advanced Neural Network Applications
- Sparse and Compressive Sensing Techniques
- Stochastic Gradient Optimization Techniques
- Multimodal Machine Learning Applications
- Natural Language Processing Techniques
- Topic Modeling
- Advanced Image and Video Retrieval Techniques
- Machine Learning and Data Classification
- Advanced Graph Neural Networks
- Visual Attention and Saliency Detection
- Music and Audio Processing
- Reconstructive Surgery and Microvascular Techniques
- Adversarial Robustness in Machine Learning
- Generative Adversarial Networks and Image Synthesis
- Neural Networks and Applications
- Speech Recognition and Synthesis
- Human Pose and Action Recognition
- Machine Learning and ELM
- Tensor Decomposition and Applications
- Speech and Audio Processing
- Text and Document Classification Technologies
- Anomaly Detection Techniques and Applications
- Speech and Dialogue Systems
- Wound Healing and Treatments
Huazhong University of Science and Technology
2018-2025
Tianjin Medical University
2025
Singapore Management University
2023-2025
The Fourth People's Hospital of Ningxia Hui Autonomous Region
2020-2024
Chongqing University
2024
University of Chinese Academy of Sciences
2021-2024
Shenzhen Institutes of Advanced Technology
2021-2024
Shanghai Institute of Optics and Fine Mechanics
2023
Shanghai Institute of Technical Physics
2023
Union Hospital
2023
Transformers have shown great potential in computer vision tasks. A common belief is that their attention-based token mixer module contributes most to their competence. However, recent works show that the attention modules in transformers can be replaced by spatial MLPs and the resulting models still perform quite well. Based on this observation, we hypothesize that the general architecture of transformers, instead of the specific token mixer module, is more essential to the model's performance. To verify this, we deliberately replace the attention module with an embarrassingly...
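The token-mixer replacement the abstract describes can be illustrated with a minimal sketch. This assumes a PoolFormer-style mixer (windowed average pooling minus the input, so the residual branch carries the identity); the function name `pooling_token_mixer` and the edge-padding choice are illustrative, not the official implementation:

```python
import numpy as np

def pooling_token_mixer(x, pool_size=3):
    """Average-pool each token's spatial neighborhood and subtract the
    input. x has shape (H, W, C); a naive loop version for clarity."""
    H, W, C = x.shape
    pad = pool_size // 2
    padded = np.pad(x, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    out = np.empty_like(x)
    for i in range(H):
        for j in range(W):
            window = padded[i:i + pool_size, j:j + pool_size]
            out[i, j] = window.mean(axis=(0, 1))
    # Subtracting the input means the module only *mixes* tokens;
    # the surrounding residual connection supplies the identity path.
    return out - x
```

On a constant feature map the mixer outputs zeros, which makes the "mixing only" role of the module easy to check.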
This paper presents Prototypical Contrastive Learning (PCL), an unsupervised representation learning method that addresses the fundamental limitations of instance-wise contrastive learning. PCL not only learns low-level features for the task of instance discrimination, but more importantly, it implicitly encodes the semantic structure of the data into the learned embedding space. Specifically, we introduce prototypes as latent variables to help find the maximum-likelihood estimation of the network parameters in...
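The prototype idea can be sketched with a toy clustering step: cluster normalized embeddings and treat centroids as prototypes. This is a stand-in under stated assumptions (naive spherical k-means in NumPy; PCL itself runs large-scale k-means, and `kmeans_prototypes` is a hypothetical helper):

```python
import numpy as np

def kmeans_prototypes(embeddings, k, iters=10):
    """Cluster L2-normalized embeddings with spherical k-means and
    return the centroids as prototypes plus the cluster assignments."""
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    centers = z[:k].copy()  # naive init; real code would randomize
    assign = np.zeros(len(z), dtype=int)
    for _ in range(iters):
        assign = np.argmax(z @ centers.T, axis=1)  # cosine similarity
        for c in range(k):
            members = z[assign == c]
            if len(members):
                centers[c] = members.mean(axis=0)
                centers[c] /= np.linalg.norm(centers[c])
    return centers, assign
```

Each returned centroid plays the role of a latent prototype that a contrastive loss could pull its cluster members toward.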
Recently, a tensor nuclear norm (TNN) based method was proposed to solve the tensor completion problem, and it has achieved state-of-the-art performance on image and video inpainting tasks. However, it requires computing the tensor singular value decomposition (t-SVD), which costs much computation and thus cannot efficiently handle tensor data, due to its natural large scale. Motivated by TNN, we propose a novel low-rank tensor factorization method for solving the 3-way tensor completion problem. Our method preserves the low-rank structure of a tensor by factorizing it into the product of two tensors of smaller...
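The tensor product underlying both t-SVD and this factorization is the t-product, computable slice-wise in the Fourier domain. A minimal sketch (the function name `t_product` is ours; no optimization or completion logic is shown):

```python
import numpy as np

def t_product(A, B):
    """t-product of 3-way tensors: FFT along the third mode, then a
    matrix product per frequency slice, then an inverse FFT."""
    Af = np.fft.fft(A, axis=2)
    Bf = np.fft.fft(B, axis=2)
    n3 = A.shape[2]
    Cf = np.empty((A.shape[0], B.shape[1], n3), dtype=complex)
    for k in range(n3):
        Cf[:, :, k] = Af[:, :, k] @ Bf[:, :, k]
    return np.real(np.fft.ifft(Cf, axis=2))
```

A quick sanity check: the identity tensor (identity matrix in the first frontal slice, zeros elsewhere) leaves any conformable tensor unchanged under the t-product.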
Most existing subspace clustering methods hinge on self-expression of handcrafted representations and are unaware of potential clustering errors. Thus they perform unsatisfactorily on real data with complex underlying subspaces. To solve this issue, we propose a novel deep adversarial subspace clustering (DASC) model, which learns more favorable sample representations by deep learning for subspace clustering, and more importantly introduces adversarial learning to supervise representation learning and clustering. Specifically, DASC consists of a subspace clustering generator and a quality-verifying discriminator, which learn against...
Background: Severe patients with 2019 novel coronavirus (2019-nCoV) pneumonia progressed rapidly to acute respiratory failure. We aimed to evaluate the definite efficacy and safety of corticosteroid in the treatment of severe COVID-19 pneumonia. Methods: Forty-six patients hospitalized at Wuhan Union Hospital from January 20 to February 25, 2020, were retrospectively reviewed. The patients were divided into two groups based on whether they received corticosteroid treatment. The clinical symptoms and chest computed tomography (CT) results...
Recent studies show that Transformer has strong capability of building long-range dependencies, yet is incompetent in capturing high frequencies that predominantly convey local information. To tackle this issue, we present a novel and general-purpose Inception Transformer, or iFormer for short, that effectively learns comprehensive features with both high- and low-frequency information in visual data. Specifically, we design an Inception mixer to explicitly graft the advantages of convolution and max-pooling for capturing high-frequency...
MetaFormer, the abstracted architecture of Transformer, has been found to play a significant role in achieving competitive performance. In this paper, we further explore the capacity of MetaFormer, again by migrating our focus away from token mixer design: we introduce several baseline models under MetaFormer using the most basic or common mixers, and demonstrate their gratifying performance. We summarize our observations as follows: (1) MetaFormer ensures a solid lower bound of performance. By merely adopting identity mapping as the token mixer, the model, termed...
In deep learning, different kinds of deep networks typically need different optimizers, which have to be chosen after multiple trials, making the training process inefficient. To relieve this issue and consistently improve model training speed across deep networks, we propose the ADAptive Nesterov momentum algorithm, Adan for short. Adan first reformulates the vanilla Nesterov acceleration to develop a new Nesterov momentum estimation (NME) method, which avoids the extra overhead of computing the gradient at the extrapolation point. Then Adan adopts NME to estimate the gradient's first-...
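One Adan-style parameter update can be sketched as follows. This follows our reading of the paper's update rule (moments of the gradient and of the gradient difference, an adaptive step, and decoupled weight decay); the beta values below are illustrative, not the tuned defaults:

```python
import numpy as np

def adan_step(theta, grad, prev_grad, state, lr=1e-3,
              betas=(0.02, 0.08, 0.01), eps=1e-8, wd=0.0):
    """One Adan update on a scalar/array parameter. `state` holds the
    running moments m (gradient), v (gradient difference), n (squared
    update); bias correction is omitted for brevity."""
    b1, b2, b3 = betas
    diff = grad - prev_grad
    state["m"] = (1 - b1) * state["m"] + b1 * grad       # 1st moment
    state["v"] = (1 - b2) * state["v"] + b2 * diff       # difference moment
    update_sq = (grad + (1 - b2) * diff) ** 2
    state["n"] = (1 - b3) * state["n"] + b3 * update_sq  # 2nd moment
    step = lr * (state["m"] + (1 - b2) * state["v"]) / (np.sqrt(state["n"]) + eps)
    return (theta - step) / (1 + lr * wd)                # decoupled decay
```

On a simple quadratic, iterating this step drives the parameter toward the minimum, and with zero gradients the decoupled decay alone shrinks the weights multiplicatively.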
AdamW modifies Adam by adding a decoupled weight decay to decay network weights per training iteration. For adaptive gradient algorithms, this decoupled weight decay does not affect the specific optimization steps, and differs from the widely used ℓ2-regularizer which changes the optimization steps via changing the first- and second-order gradient moments. Despite its great practical...
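The distinction the abstract draws can be made concrete with two single-step variants (a sketch: bias correction and moment-state carryover are omitted, and the function names are ours). With ℓ2 regularization the decay term flows through Adam's adaptive moments; with decoupled decay it bypasses them:

```python
import numpy as np

def adam_moments(grad, m, v, b1=0.9, b2=0.999):
    # Standard exponential moving averages of the gradient and its square.
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    return m, v

def step_l2(theta, grad, m, v, lr, lam, eps=1e-8):
    """l2-regularizer: lam*theta is folded into the gradient, so it is
    rescaled by the adaptive denominator like any other gradient term."""
    m, v = adam_moments(grad + lam * theta, m, v)
    return theta - lr * m / (np.sqrt(v) + eps)

def step_decoupled(theta, grad, m, v, lr, lam, eps=1e-8):
    """AdamW-style decay: applied directly to the weights and never
    entering the moment estimates."""
    m, v = adam_moments(grad, m, v)
    return theta - lr * m / (np.sqrt(v) + eps) - lr * lam * theta
```

With a nonzero decay the two steps land on different iterates; with zero decay they coincide, which is exactly the gap the abstract refers to.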
Multi-way or tensor data analysis has attracted increasing attention recently, with many important applications in practice. This article develops a tensor low-rank representation (TLRR) method, which is the first approach that can exactly recover the clean data of intrinsic low-rank structure and accurately cluster them as well, with provable performance guarantees. In particular, for tensor data with arbitrary sparse corruptions, TLRR can exactly recover the clean data under mild conditions; meanwhile it can verify their true origin tensor subspaces and hence cluster them accurately. The objective...
It is not clear yet why ADAM-alike adaptive gradient algorithms suffer from worse generalization performance than SGD despite their faster training speed. This work aims to provide understandings on this generalization gap by analyzing their local convergence behaviors. Specifically, we observe the heavy tails of gradient noise in these algorithms. This motivates us to analyze these algorithms through their Levy-driven stochastic differential equations (SDEs) because of the similar convergence behaviors of an algorithm and its SDE. Then we establish the escaping time of these SDEs from a...
Graph-level representations are critical in various real-world applications, such as predicting the properties of molecules. But in practice, precise graph annotations are generally very expensive and time-consuming. To address this issue, graph contrastive learning constructs an instance discrimination task which pulls together positive pairs (augmentations of the same graph) and pushes away negative pairs (augmentations of different graphs) for unsupervised representation learning. However, since for a query, its negatives are uniformly sampled...
Low-rank tensor analysis is important for various real applications in computer vision. However, existing methods focus on recovering a low-rank tensor contaminated by Gaussian or gross sparse noise and hence cannot effectively handle outliers that are common in practical data. To solve this issue, we propose an outlier-robust tensor principal component analysis (OR-TPCA) method for simultaneous low-rank tensor recovery and outlier detection. For intrinsically low-rank observations with arbitrary outlier corruption, OR-TPCA is the first method that has provable...
Feature learning plays a central role in pattern recognition. In recent years, many representation-based feature learning methods have been proposed and have achieved great success in applications. However, these methods perform feature learning and subsequent classification in two separate steps, which may not be optimal for recognition tasks. In this paper, we present a supervised low-rank-based approach for learning discriminative features. By integrating latent low-rank representation (LatLRR) with a ridge regression-based classifier, our approach combines...
Wearable sensors-based gait recognition is an effective method to recognize people's identity by recognizing the unique way they walk. Recently, the adoption of deep learning networks for gait recognition has achieved significant performance improvement and has become a new promising trend. However, most existing studies mainly focused on improving accuracy while ignoring model complexity, which makes them unsuitable for wearable devices. In this study, we proposed a lightweight attention-based Convolutional Neural Networks...
We propose to perform video question answering (VideoQA) in a Contrastive manner via a Video Graph Transformer model (CoVGT). CoVGT's uniqueness and superiority are three-fold: 1) It proposes a dynamic graph transformer module which encodes video by explicitly capturing the visual objects, their relations and dynamics, for complex spatio-temporal reasoning. 2) It designs separate video and text transformers for contrastive learning between the video and text to perform QA, instead of a multi-modal transformer for answer classification. Fine-grained video-text...
Recently, there has been increasing interest in the challenge of how to discriminatively vectorize graphs. To address this, we propose a method called Iterative Graph Self-Distillation (IGSD) which learns graph-level representations in an unsupervised manner through instance discrimination, using a self-supervised contrastive learning approach. IGSD involves a teacher-student distillation process that uses graph diffusion augmentations and constructs the teacher model as an exponential moving average of the student...
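The exponential-moving-average teacher the abstract mentions reduces to a one-line update per parameter. A minimal sketch (flat parameter lists stand in for the student and teacher networks; `ema_update` is an illustrative name):

```python
def ema_update(teacher, student, decay=0.99):
    """Blend each teacher parameter toward its student counterpart:
    teacher <- decay * teacher + (1 - decay) * student."""
    return [decay * t + (1 - decay) * s for t, s in zip(teacher, student)]
```

A higher decay makes the teacher a slower, smoother average of past student weights, which is what stabilizes the distillation targets.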