- Advanced Neural Network Applications
- Domain Adaptation and Few-Shot Learning
- Advanced Image and Video Retrieval Techniques
- Visual Attention and Saliency Detection
- Anomaly Detection Techniques and Applications
- Video Surveillance and Tracking Methods
- Machine Learning and Data Classification
- Multimodal Machine Learning Applications
- Human Pose and Action Recognition
- Medical Image Segmentation Techniques
- Music and Audio Processing
- Image Retrieval and Classification Techniques
- Generative Adversarial Networks and Image Synthesis
- Image Enhancement Techniques
- Face Recognition and Analysis
- Speech and Audio Processing
- Face Recognition and Perception
- Speech and Dialogue Systems
- Water Systems and Optimization
- Image and Video Quality Assessment
- Brain Tumor Detection and Classification
- Target Tracking and Data Fusion in Sensor Networks
- Adversarial Robustness in Machine Learning
- Radiology Practices and Education
- Olfactory and Sensory Function Studies
Hong Kong University of Science and Technology
2025
University of Hong Kong
2025
Singapore Management University
2023-2024
South China University of Technology
2020-2023
Microsoft Research Asia (China)
2023
National University of Singapore
2022-2023
ShangHai JiAi Genetics & IVF Institute
2021-2022
Recent Vision Transformer (ViT) models have demonstrated encouraging results across various computer vision tasks, thanks to their competence in modeling long-range dependencies of image patches or tokens via self-attention. These models, however, usually designate similar receptive fields for each token feature within each layer. Such a constraint inevitably limits the ability of each self-attention layer to capture multi-scale features, thereby leading to performance degradation in handling images with multiple...
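The multi-scale limitation described above can be made concrete with a toy sketch in which each attention "head" attends over keys/values average-pooled at a different rate, so different heads get different effective receptive fields within one layer. This is a minimal NumPy illustration of the general idea only, not the model proposed in the paper; `multiscale_attention` and the pooling scheme are our own illustrative choices:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multiscale_attention(tokens, pool_rates):
    """Toy multi-scale self-attention: each rate r pools keys/values over
    groups of r tokens, so the corresponding head sees a coarser scale."""
    n, d = tokens.shape
    heads = []
    for r in pool_rates:
        m = n // r
        kv = tokens[: m * r].reshape(m, r, d).mean(axis=1)  # pooled keys/values
        attn = softmax(tokens @ kv.T / np.sqrt(d))          # (n, m) attention map
        heads.append(attn @ kv)                             # per-scale output
    return np.concatenate(heads, axis=-1)                   # concatenate scales
```

With `pool_rates=[1, 2, 4]`, fine and coarse scales coexist in a single layer instead of every token sharing one fixed receptive field.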
Unsupervised video object segmentation (UVOS) aims at segmenting the primary objects in videos without any human intervention. Due to the lack of prior knowledge about the objects, identifying them from videos is a major challenge in UVOS. Previous methods often regard moving objects as the primary ones and rely on optical flow to capture the motion cues in videos, but motion information alone is insufficient to distinguish the primary objects from background objects that move together with them. This is because, when the noisy motion features are combined with appearance features, the localization of the primary objects is misguided. To...
In self-supervised representation learning, a common idea behind most of the state-of-the-art approaches is to enforce the robustness of the representations to predefined augmentations. A potential issue of this idea is the existence of completely collapsed solutions (i.e., constant features), which are typically avoided implicitly by carefully chosen implementation details. In this work, we study a relatively concise framework containing the components from recent approaches. We verify the existence of complete collapse and discover another reachable...
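Complete collapse (constant features) is easy to detect in practice: after L2-normalizing the features, the per-dimension standard deviation across the batch drops to zero. A small diagnostic sketch (our own illustration, not the framework studied in the paper):

```python
import numpy as np

def collapse_metric(features, eps=1e-8):
    """Mean per-dimension std of L2-normalized features across a batch.
    A value near zero means every sample maps to (almost) the same
    vector, i.e., the representation has completely collapsed."""
    z = features / (np.linalg.norm(features, axis=1, keepdims=True) + eps)
    return float(z.std(axis=0).mean())
```

Constant features score near zero; healthy, spread-out features score well above it.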
Lip reading aims to predict the spoken sentences from silent lip videos. Due to the fact that such a vision task usually performs worse than its counterpart speech recognition, one potential scheme is to distill knowledge from a teacher pretrained on audio signals. However, the latent domain gap between the cross-modal data could lead to learning ambiguity and thus limits the performance of lip reading. In this paper, we propose a novel collaborative framework for lip reading, in which two aspects of the issue are considered: 1) the teacher should understand...
Vision Transformer has demonstrated impressive success across various vision tasks. However, its heavy computation cost, which grows quadratically with respect to the token sequence length, largely limits its power in handling large feature maps. To alleviate the computation cost, previous works rely on either fine-grained self-attention restricted to small local regions, or global self-attention with a shortened sequence length, resulting in coarse granularity. In this paper, we propose a novel model, termed Self-guided Transformer (SG-Former), towards...
The fully convolutional network (FCN) has dominated salient object detection for a long period. However, the locality of CNN requires the model to be deep enough to obtain a global receptive field, and such a deep model always leads to the loss of local details. In this paper, we introduce a new attention-based encoder, the vision transformer, into salient object detection to ensure the globalization of representations from shallow layers. With the global view in very shallow layers, the transformer encoder preserves more local representations to recover the spatial details in the final saliency maps. Besides, as each layer...
The inductive bias of vision transformers is more relaxed, so they cannot work well with insufficient data. Knowledge distillation is thus introduced to assist the training of transformers. Unlike previous works, where merely heavy convolution-based teachers are provided, in this paper, we delve into the influence of models with different inductive biases on the distilled knowledge (e.g., convolution and involution). Our key observation is that teacher accuracy is not the dominant reason for student accuracy, but the teacher's inductive bias is important. We demonstrate that lightweight teachers with different...
Masked image modeling (MIM) performs strongly in pre-training large vision Transformers (ViTs). However, small models that are critical for real-world applications cannot or only marginally benefit from this pre-training approach. In this paper, we explore distillation techniques to transfer the success of MIM-based pre-trained large models to smaller ones. We systematically study different options in the distillation framework, including distilling targets, losses, input, network regularization, sequential distillation, etc.,...
Data mixing (e.g., Mixup, Cutmix, ResizeMix) is an essential component for advancing recognition models. In this paper, we focus on studying its effectiveness in the self-supervised setting. By noticing that mixed images that share the same source images are intrinsically related to each other, we hereby propose SDMP, short for Simple Data Mixing Prior, to capture this straightforward yet essential prior, and position such mixed images as additional positive pairs to facilitate representation learning. Our experiments verify that the proposed SDMP enables data...
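The core mechanism, treating mixed images that share a source as additional positive pairs, can be sketched as follows. This is a hedged illustration of the mixing step only (the function name is ours, and the real SDMP operates inside a full contrastive pipeline):

```python
import numpy as np

rng = np.random.default_rng(0)

def mixup_pairs(batch, alpha=1.0):
    """Mix each image with a randomly chosen partner.  Returns the mixed
    batch plus the partner indices, so any two mixed samples that share
    a source image can be wired up as extra positive pairs downstream."""
    lam = rng.beta(alpha, alpha)            # mixing coefficient
    perm = rng.permutation(len(batch))      # partner assignment
    mixed = lam * batch + (1 - lam) * batch[perm]
    return mixed, perm, lam
```

The returned `perm` is what makes the prior usable: sample `i` and sample `perm[i]` share a source image, so their mixed views are related rather than independent negatives.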
Vision Transformer shows great superiority in medical image segmentation due to its ability to learn long-range dependency. For segmenting 3-D data, such as computed tomography (CT), existing methods can be broadly classified into 2-D-based and 3-D-based methods. One key limitation is that the intra-slice information is ignored by the former, while the latter suffers from high computation cost and memory consumption, resulting in a limited feature representation for inner-slice information. During clinical examination, radiologists primarily use the axial...
Deep supervision, which involves extra supervisions to the intermediate features of a neural network, was widely used in image classification in the early deep learning era, since it significantly reduces the training difficulty and eases the optimization, e.g., by avoiding gradient vanishing compared with vanilla training. Nevertheless, with the emergence of normalization techniques and residual connections, deep supervision was gradually phased out. In this paper, we revisit deep supervision for masked image modeling (MIM) that pre-trains a Vision Transformer (ViT)...
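In its classic form, deep supervision simply adds down-weighted losses on predictions read out from intermediate layers, so shallow layers receive a direct gradient signal. A minimal sketch with mean-squared error (the loss choice and `aux_weight` value are illustrative assumptions, not the paper's recipe):

```python
import numpy as np

def mse(pred, target):
    return float(np.mean((pred - target) ** 2))

def deep_supervised_loss(intermediate_preds, final_pred, target, aux_weight=0.3):
    """Final-layer loss plus down-weighted auxiliary losses computed on
    predictions read out from intermediate layers."""
    loss = mse(final_pred, target)
    for p in intermediate_preds:
        loss += aux_weight * mse(p, target)   # direct signal to shallow layers
    return loss
```

Each auxiliary term shortens the gradient path to its layer, which is what made this trick useful before normalization and residual connections became standard.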
Crowd image is arguably one of the most laborious data to annotate. In this paper, we aim to reduce the massive demand for densely labeled crowd data, and propose a novel weakly-supervised setting, in which we leverage the binary ranking of two images with high-contrast crowd counts as training guidance. To enable training under this new setting, we convert the crowd count regression problem into a ranking potential prediction problem. In particular, we tailor a Siamese Ranking Network that predicts the potential scores of two images, indicating the ordering of their counts. Hence, the ultimate goal is to assign...
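The ranking formulation needs no exact counts: the network only has to score the higher-count image above the lower-count one, which a standard margin (hinge) ranking loss enforces. A one-function sketch (the margin value is an illustrative assumption):

```python
def ranking_loss(score_hi, score_lo, margin=1.0):
    """Hinge loss: zero once the image known to contain more people is
    scored at least `margin` above its lower-count partner."""
    return max(0.0, margin - (score_hi - score_lo))
```

For a correctly ordered pair, `ranking_loss(3.0, 1.0)` returns `0.0`, while a reversed pair `ranking_loss(1.0, 3.0)` is penalized with `3.0`.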
Due to domain shift, a large performance drop is usually observed when a trained crowd counting model is deployed in the wild. While existing domain-adaptive crowd counting methods achieve promising results, they typically regard each crowd image as a whole and reduce domain discrepancies in a holistic manner, thus limiting further improvement of the adaptation performance. To this end, we propose to untangle the domain-invariant crowd and the domain-specific background from crowd images, and design a fine-grained domain adaption method for crowd counting. Specifically,...
The popularity of multimodal sensors and the accessibility of the Internet have brought us a massive amount of unlabeled multimodal data. Since existing datasets and well-trained models are primarily unimodal, the modality gap between a unimodal network and multimodal data poses an interesting problem: how to transfer a pre-trained unimodal network to perform the same task with extra multimodal data? In this work, we propose multimodal knowledge expansion (MKE), a knowledge distillation-based framework to effectively utilize multimodal data without requiring labels. Opposite to traditional knowledge distillation,...
This paper tackles the task of Few-Shot Video Object Segmentation (FSVOS), i.e., segmenting objects in query videos whose class is specified by a few labeled support images. The key is to model the relationship between the query videos and the support images for propagating the object information. This many-to-many problem often relies on full-rank attention, which is computationally intensive. In this paper, we propose a novel Domain Agent Network (DAN), breaking down the full-rank attention into two smaller ones. We consider one single frame of the query video...
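Breaking one large attention into two smaller ones can be sketched by routing through a small set of agent tokens: the agents first gather from the support set, then the queries read from the agents, replacing one n_q-by-n_s map with an n_a-by-n_s map plus an n_q-by-n_a map. The sketch below is our own toy version of this decomposition, not DAN itself:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def agent_attention(query, support, agents):
    """Two-stage attention through a few agent tokens: cost drops from
    O(n_q * n_s) to O(n_a * n_s + n_q * n_a) when n_a is small."""
    d = query.shape[-1]
    gathered = softmax(agents @ support.T / np.sqrt(d)) @ support  # agents <- support
    return softmax(query @ agents.T / np.sqrt(d)) @ gathered       # queries <- agents
```

With, say, 3 agent tokens, 10 query tokens, and 6 support tokens, the two attention maps are 3x6 and 10x3 instead of a single 10x6 full-rank map; the saving grows with sequence length.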
Labeling is onerous for crowd counting, as it requires annotating each individual in crowd images. Recently, several semi-supervised methods have been proposed to reduce the labeling effort. Given a limited labeling budget, they typically select a few crowd images and densely label all individuals in each of them. Despite promising results, we argue this None-or-All labeling strategy is suboptimal, as the individuals in a densely labeled image usually appear similar, while the massive unlabeled images may contain entirely diverse individuals. To this end, we propose to break the chain of the previous...
Crossmodal knowledge distillation (KD) extends traditional knowledge distillation to the area of multimodal learning and demonstrates great success in various applications. To achieve knowledge transfer across modalities, a pretrained network from one modality is adopted as the teacher to provide supervision signals to a student network learning from another modality. In contrast to the empirical success reported in prior works, the working mechanism of crossmodal KD remains a mystery. In this paper, we present a thorough understanding of crossmodal KD. We begin with two case studies and demonstrate that...
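The supervision signal in (crossmodal) KD is typically a KL divergence between temperature-softened teacher and student distributions; only the teacher's modality changes, not the loss itself. A minimal sketch (the temperature value is an illustrative assumption):

```python
import numpy as np

def soften(logits, temperature):
    """Temperature-scaled softmax over a 1-D logit vector."""
    e = np.exp((logits - logits.max()) / temperature)
    return e / e.sum()

def kd_loss(student_logits, teacher_logits, temperature=4.0):
    """KL(teacher || student) on temperature-softened distributions; in
    crossmodal KD the teacher logits come from a network trained on a
    different modality than the student's input."""
    p = soften(teacher_logits, temperature)
    q = soften(student_logits, temperature)
    return float(np.sum(p * (np.log(p) - np.log(q))))
```

A higher temperature exposes more of the teacher's "dark knowledge" in the non-argmax classes, which is the part of the signal distillation exploits.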
Inspired by the success of self-supervised autoregressive representation learning in natural language (GPT and its variants), and advances in recent visual architecture design with Vision Transformers (ViTs), in this paper, we explore the effect various design choices have on applying such training strategies for visual feature learning. Specifically, we introduce a novel strategy that we call Random Segments Autoregressive Coding (RandSAC). In RandSAC, we group patch representations (image tokens) into hierarchically arranged...
Integrating low-level edge features has been proven to be effective in preserving clear boundaries of salient objects. However, the locality of edge features makes it difficult to capture globally salient edges, leading to distraction in the final predictions. To address this problem, we propose to produce distraction-free edge features by incorporating cross-scale holistic interdependencies between high-level features. In particular, we first formulate our edge feature extraction process as a boundary-filling problem. In this way, we enforce the edge features to focus on closed boundaries instead of those...