- Advanced Neural Network Applications
- Domain Adaptation and Few-Shot Learning
- Human Pose and Action Recognition
- Multimodal Machine Learning Applications
- Generative Adversarial Networks and Image Synthesis
- Video Surveillance and Tracking Methods
- Advanced Image and Video Retrieval Techniques
- Advanced Vision and Imaging
- Anomaly Detection Techniques and Applications
- Adversarial Robustness in Machine Learning
- Advanced Image Processing Techniques
- Advanced Graph Neural Networks
- Image Processing Techniques and Applications
- Machine Learning and Data Classification
- Video Analysis and Summarization
- Human Motion and Animation
- Crystallization and Solubility Studies
- Visual Attention and Saliency Detection
- X-ray Diffraction in Crystallography
- Image Enhancement Techniques
- Image and Signal Denoising Methods
- Face Recognition and Analysis
- Topic Modeling
- Digital Media Forensic Detection
- Graph Theory and Algorithms
National University of Singapore
2021-2025
Chongqing University
2013-2024
Heze University
2019-2024
Kunming Medical University
2024
Taiyuan University of Technology
2024
Shanxi Medical University
2021-2024
University of Wisconsin–Madison
2015-2024
Yunnan University
2023-2024
Fujian Institute of Research on the Structure of Matter
2022-2024
Tea Research Institute
2024
This paper reviews the first challenge on single image super-resolution (restoration of rich details in a low-resolution image) with a focus on the proposed solutions and results. A new DIVerse 2K resolution dataset (DIV2K) was employed. The challenge had 6 competitions divided into 2 tracks with 3 magnification factors each. Track 1 employed the standard bicubic downscaling setup, while Track 2 had unknown downscaling operators (blur kernel and decimation) that were learnable through low- and high-resolution train images. Each competition had ∼100 registered participants and 20 teams...
Transformers have shown great potential in computer vision tasks. A common belief is that their attention-based token mixer module contributes most to their competence. However, recent works show that the attention-based module in Transformers can be replaced by spatial MLPs and the resulting models still perform quite well. Based on this observation, we hypothesize that the general architecture of the Transformers, instead of the specific token mixer module, is more essential to the model's performance. To verify this, we deliberately replace the attention module with an embarrassingly...
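The token-mixer replacement described in this abstract can be illustrated with a toy NumPy sketch: a simple average-pooling operator mixes neighboring tokens, with the input subtracted so the residual connection contributes the identity separately. This is only an illustration of the idea, not the paper's implementation; the function name and shapes are assumptions.

```python
import numpy as np

def pooling_token_mixer(x, pool_size=3):
    """Average-pooling token mixer over a 1D token sequence.

    x: (num_tokens, dim) token features.
    The input is subtracted because the mixer only needs to model
    token *mixing*; the residual connection restores the identity.
    """
    n, _ = x.shape
    pad = pool_size // 2
    padded = np.pad(x, ((pad, pad), (0, 0)), mode="edge")
    pooled = np.stack([padded[i:i + n] for i in range(pool_size)]).mean(axis=0)
    return pooled - x  # residual-branch output

# Tokens identical to their neighbors are left unchanged by mixing:
x = np.ones((5, 4))
assert np.allclose(pooling_token_mixer(x), 0.0)
```

Note that this mixer has no learnable parameters at all, which is what makes the attention-replacement experiment so striking.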
Despite the remarkable progress in person re-identification (Re-ID), such approaches still suffer from failure cases where discriminative body parts are missing. To mitigate this type of failure, we propose a simple yet effective Horizontal Pyramid Matching (HPM) approach to fully exploit various partial information of a given person, so that correct candidates can be identified even if some key parts are missing. With HPM, we make the following contributions to produce more robust feature representations for the Re-ID task: 1)...
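The horizontal-pyramid idea can be sketched as follows: the person feature map is split into horizontal stripes at several scales, and each stripe is pooled into its own descriptor, so a match can still succeed on the stripes that are visible. This is a minimal sketch under assumed shapes; the paper additionally applies learned projections and both pooling types separately, which are simplified here by summing average and max pooling.

```python
import numpy as np

def horizontal_pyramid_features(feat, num_scales=3):
    """feat: (C, H, W) conv feature map of a person image.

    At scale s the map is split into 2**s horizontal stripes; each
    stripe is reduced to a C-dim vector (avg + max pooling combined).
    Returns (2**num_scales - 1, C) stripe descriptors.
    """
    descriptors = []
    _, H, _ = feat.shape
    for s in range(num_scales):
        parts = 2 ** s
        step = H // parts
        for p in range(parts):
            stripe = feat[:, p * step:(p + 1) * step, :]
            descriptors.append(stripe.mean(axis=(1, 2)) + stripe.max(axis=(1, 2)))
    return np.stack(descriptors)

feat = np.random.rand(8, 16, 8)
assert horizontal_pyramid_features(feat).shape == (7, 8)  # 1 + 2 + 4 stripes
```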
Deep compression refers to removing the redundancy of parameters and feature maps for deep learning models. Low-rank approximation and pruning for sparse structures play a vital role in many compression works. However, weight filters tend to be both low-rank and sparse. Neglecting either part of this structure information in previous methods results in iterative retraining, compromised accuracy, and low compression rates. Here we propose a unified framework integrating the decomposition of weight matrices with feature map reconstructions. Our model includes...
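The "both low-rank and sparse" observation can be demonstrated with a classic decomposition W ≈ L + S: a truncated SVD captures the low-rank part and the largest residual entries form the sparse part. This is a generic illustration of the structure being exploited, not the paper's actual optimization; function name and thresholding scheme are assumptions.

```python
import numpy as np

def lowrank_plus_sparse(W, rank=2, keep=0.05):
    """Approximate W ~= L + S: L of fixed rank via truncated SVD,
    S keeping only the largest-magnitude residual entries."""
    U, sv, Vt = np.linalg.svd(W, full_matrices=False)
    L = (U[:, :rank] * sv[:rank]) @ Vt[:rank]
    residual = W - L
    thresh = np.quantile(np.abs(residual), 1 - keep)
    S = np.where(np.abs(residual) >= thresh, residual, 0.0)
    return L, S

rng = np.random.default_rng(0)
W = rng.standard_normal((16, 16))
L, S = lowrank_plus_sparse(W, rank=4, keep=0.1)
assert np.linalg.matrix_rank(L) <= 4
# keeping both components approximates W better than the low-rank part alone
assert np.linalg.norm(W - L - S) < np.linalg.norm(W - L)
```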
This paper reviews the 2nd NTIRE challenge on single image super-resolution (restoration of rich details in a low-resolution image) with a focus on the proposed solutions and results. The challenge had 4 tracks. Track 1 employed the standard bicubic downscaling setup, while Tracks 2, 3 and 4 had realistic unknown downgrading operators simulating the camera acquisition pipeline. The operators were learnable through provided pairs of low- and high-resolution train images. The tracks had 145, 114, 101, and 113 registered participants, respectively, and 31 teams competed in the final testing...
In this report we demonstrate that, with the same parameters and computational budgets, models with wider features before the ReLU activation have significantly better performance for single image super-resolution (SISR). The resulting SR residual network has a slim identity mapping pathway with wider (\(2\times\) to \(4\times\)) channels before activation in each residual block. To further widen activation (\(6\times\) to \(9\times\)) without computational overhead, we introduce linear low-rank convolution into SR networks and achieve even better accuracy-efficiency tradeoffs. In addition,...
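The parameter accounting behind "wider features before ReLU at the same budget" is easy to verify: a residual block of two 3×3 convs with a slimmer identity width and an expansion factor before the activation can match the parameter count of a plain block exactly. A small sketch (the helper name is an assumption; it counts weights only, ignoring biases):

```python
def block_params(width, expansion, k=3):
    """Weights in a residual block: a k x k expand conv
    (width -> expansion*width) plus a k x k project conv back."""
    return width * expansion * width * k * k * 2

# Baseline: 64-wide block, no expansion -> 64 channels before ReLU.
base = block_params(64, 1)
# Wide-activation block: slim 32-wide identity path, 4x expansion
# -> 128 channels before ReLU at the identical parameter budget.
wide = block_params(32, 4)
assert wide == base
```

In general, halving the identity width while quadrupling the expansion leaves width² × expansion unchanged, which is why the widening comes for free.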
Recent Vision Transformer (ViT) models have demonstrated encouraging results across various computer vision tasks, thanks to their competence in modeling long-range dependencies of image patches or tokens via self-attention. These models, however, usually designate similar receptive fields for each token feature within each layer. Such a constraint inevitably limits the ability of each self-attention layer in capturing multi-scale features, thereby leading to performance degradation in handling images with multiple...
Structural pruning enables model acceleration by removing structurally-grouped parameters from neural networks. However, the parameter-grouping patterns vary widely across different models, making architecture-specific pruners, which rely on manually-designed grouping schemes, non-generalizable to new architectures. In this work, we study a highly-challenging yet barely-explored task, any structural pruning, to tackle general structural pruning of arbitrary architectures like CNNs, RNNs, GNNs and Transformers. The...
Class imbalance has emerged as one of the major challenges for medical image segmentation. The model cascade (MC) strategy significantly alleviates the class imbalance issue via running a set of individual deep models for coarse-to-fine segmentation. Despite its outstanding performance, however, this method leads to undesired system complexity and also ignores the correlation among the models. To handle these flaws, we propose a light-weight model, i.e., the One-pass Multi-task Network (OM-Net), to solve class imbalance better than MC does, while requiring...
Existing knowledge distillation methods focus on convolutional neural networks (CNNs), where the input samples like images lie in a grid domain, and have largely overlooked graph convolutional networks (GCN) that handle non-grid data. In this paper, we propose, to our best knowledge, the first dedicated approach to distilling knowledge from a pre-trained GCN model. To enable the knowledge transfer from the teacher to the student, we propose a local structure preserving module that explicitly accounts for the topological semantics of the teacher. In this module, the local structure information of both the teacher and the student are extracted as...
Prior normalization methods rely on affine transformations to produce arbitrary image style transfers, of which the parameters are computed in a pre-defined way. Such a manually-defined nature eventually results in high-cost and shared encoders for both style and content encoding, making style transfer systems cumbersome to deploy in resource-constrained environments like the mobile-terminal side. In this paper, we propose a new and generalized normalization module, termed Dynamic Instance Normalization (DIN), that allows flexible...
Recent studies show that Transformer has a strong capability of building long-range dependencies, yet is incompetent in capturing high frequencies that predominantly convey local information. To tackle this issue, we present a novel and general-purpose Inception Transformer, or iFormer for short, that effectively learns comprehensive features with both high- and low-frequency information in visual data. Specifically, we design an Inception mixer to explicitly graft the advantages of convolution and max-pooling for capturing high-frequency...
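The channel-splitting principle behind such a mixer can be sketched in NumPy: one group of channels passes through a local max filter (high-frequency detail), while another is replaced by a global context signal (low frequency). Note the actual iFormer low-frequency path uses attention; global averaging here is a deliberately simplified stand-in, and the function name and shapes are assumptions.

```python
import numpy as np

def channel_split_mixer(x):
    """x: (N, C) token features. First half of channels -> local
    max filter (window 3, high-frequency branch); second half ->
    global token average broadcast back (low-frequency stand-in)."""
    n, c = x.shape
    hi, lo = x[:, :c // 2], x[:, c // 2:]
    padded = np.pad(hi, ((1, 1), (0, 0)), mode="edge")
    hi_out = np.stack([padded[i:i + n] for i in range(3)]).max(axis=0)
    lo_out = np.broadcast_to(lo.mean(axis=0), lo.shape)
    return np.concatenate([hi_out, lo_out], axis=1)

x = np.arange(12, dtype=float).reshape(6, 2)
assert channel_split_mixer(x).shape == (6, 2)
```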
Dataset condensation aims at reducing the network training effort through condensing a cumbersome training set into a compact synthetic one. State-of-the-art approaches largely rely on learning the synthetic data by matching the gradients between the real and synthetic batches. Despite the intuitive motivation and promising results, such gradient-based methods, by nature, easily overfit to a biased set of samples that produce dominant gradients, and thus lack global supervision of the data distribution. In this paper, we propose a novel scheme to Condense dataset...
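The contrast the abstract draws, i.e. distribution-level supervision versus gradient matching, can be illustrated with a minimal feature-alignment objective: match per-layer batch statistics of real and synthetic data rather than their gradients. This is a toy simplification (mean matching only), not the paper's full objective; names and shapes are assumptions.

```python
import numpy as np

def feature_alignment_loss(real_feats, syn_feats):
    """Layer-wise MSE between mean features of a real batch and a
    synthetic batch: global statistics instead of per-sample gradients.

    real_feats / syn_feats: lists of (batch, dim) arrays, one per layer.
    """
    loss = 0.0
    for r, s in zip(real_feats, syn_feats):
        loss += np.mean((r.mean(axis=0) - s.mean(axis=0)) ** 2)
    return loss

rng = np.random.default_rng(1)
real = [rng.standard_normal((64, 32)), rng.standard_normal((64, 16))]
assert feature_alignment_loss(real, real) == 0.0          # identical stats
assert feature_alignment_loss(real, [r + 1.0 for r in real]) > 0.0
```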
MetaFormer, the abstracted architecture of Transformer, has been found to play a significant role in achieving competitive performance. In this paper, we further explore the capacity of MetaFormer, again by migrating our focus away from token mixer design: we introduce several baseline models under MetaFormer using the most basic or common mixers, and demonstrate their gratifying performance. We summarize our observations as follows: (1) MetaFormer ensures a solid lower bound of performance. By merely adopting identity mapping as the token mixer, the model, termed...
Despite the recent visually-pleasing results achieved, the massive computational cost has been a long-standing flaw for diffusion probabilistic models (DPMs), which, in turn, greatly limits their applications on resource-limited platforms. Prior methods towards efficient DPMs, however, have largely focused on accelerating testing, yet overlooked their huge complexity and sizes. In this paper, we make a dedicated attempt to lighten DPM while striving to preserve its favourable performance. We start by training a...
Learning to predict agent motions with relationship reasoning is important for many applications. In motion prediction tasks, maintaining motion equivariance under Euclidean geometric transformations and invariance of agent interaction is a critical and fundamental principle. However, such properties are overlooked by most existing methods. To fill this gap, we propose EqMotion, an efficient equivariant motion prediction model with invariant interaction reasoning. To achieve motion equivariance, we propose an equivariant geometric feature learning module to learn a transformable feature through...
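The equivariance property the abstract refers to can be checked on a toy example: if a feature layer mixes time steps linearly while acting on the 2-D coordinates only by that mixing, then rotating the input trajectory rotates the output feature identically, f(XR) = f(X)R. This is a minimal demonstration of the principle, not the EqMotion architecture.

```python
import numpy as np

def linear_motion_feature(X, W):
    """X: (T, 2) trajectory of one agent; W: (T, T) learned weights
    mixing time steps. Since W acts on the time axis and the rotation
    acts on the coordinate axis, the two operations commute."""
    return W @ X

rng = np.random.default_rng(2)
X = rng.standard_normal((5, 2))
W = rng.standard_normal((5, 5))
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
# Equivariance: rotating the input is the same as rotating the output.
assert np.allclose(linear_motion_feature(X @ R, W),
                   linear_motion_feature(X, W) @ R)
```

A per-coordinate MLP, by contrast, would break this identity, which is why equivariant models restrict how coordinates may be transformed.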
Large language models (LLMs) have shown remarkable capabilities in language understanding and generation. However, such impressive capability typically comes with a substantial model size, which presents significant challenges in the deployment, inference, and training stages. With LLM being a general-purpose task solver, we explore its compression in a task-agnostic manner, which aims to preserve the multi-task solving and language generation ability of the original LLM. One challenge to achieving this is the enormous size of the training corpus of LLM, which makes...
In this paper, we show that tracking different kinds of interacting objects can be formulated as a network-flow mixed integer program. This is made possible by tracking all objects simultaneously using intertwined flow variables and expressing the fact that one object can appear or disappear at locations where another is present in terms of linear constraints. Our proposed method is able to track invisible objects whose only evidence of presence is the presence of other objects that contain them. Furthermore, our tracklet-based implementation yields real-time performance. We...
In this paper, balanced two-stage residual networks (BTSRN) are proposed for single image super-resolution. The deep residual design with constrained depth achieves the optimal balance between accuracy and speed for super-resolving images. The experiments show that the balanced two-stage structure, together with our lightweight two-layer PConv block design, achieves very promising results when considering both accuracy and speed. We evaluated our models on the New Trends in Image Restoration and Enhancement workshop and challenge on image super-resolution (NTIRE SR 2017). Our final...