NFDI4DS | UHH-SEMS - Publication Details

Shuo Wang

ORCID: 0000-0002-4881-9344

About

Contact & Profiles

A5100400159

Research Areas

Advanced Neural Network Applications
Domain Adaptation and Few-Shot Learning
Video Surveillance and Tracking Methods
Multimodal Machine Learning Applications
Human Pose and Action Recognition
Advanced Image and Video Retrieval Techniques
Autonomous Vehicle Technology and Safety
Face and Expression Recognition
Hearing Impairment and Communication
Advanced Vision and Imaging
Hand Gesture Recognition Systems
Face recognition and analysis
Anomaly Detection Techniques and Applications
Topic Modeling
Neural Networks and Applications
Brain Tumor Detection and Classification
Image Processing Techniques and Applications
Digital Media and Philosophy
Fuzzy Logic and Control Systems
Video Analysis and Summarization
Fire Detection and Safety Systems
Robotics and Automated Systems
Reinforcement Learning in Robotics
Adversarial Robustness in Machine Learning
Educational Research and Pedagogy

University of Science and Technology of China
2019-2025

University Medical Center Freiburg
2025

University of Freiburg
2025

Heidelberg University
2025

University Hospital Heidelberg
2025

Nvidia (United States)
2022-2024

Beijing Institute of Graphic Communication
2020-2024

Yantai University
2023

Trier University of Applied Sciences
2023

Meizu (China)
2023

CityFlow: A City-Scale Benchmark for Multi-Target Multi-Camera Vehicle Tracking and Re-Identification

OPENALEX - Publications

Zheng Tang Milind Naphade Ming-Yu Liu Xiaodong Yang Stan Birchfield and 4 more

Urban traffic optimization using traffic cameras as sensors is driving the need to advance state-of-the-art multi-target multi-camera (MTMC) tracking. This work introduces CityFlow, a city-scale traffic camera dataset consisting of more than 3 hours of synchronized HD videos from 40 cameras across 10 intersections, with the longest distance between two simultaneous cameras being 2.5 km. To the best of our knowledge, CityFlow is the largest-scale dataset in terms of spatial coverage and the number of cameras/videos in an urban environment. The dataset contains more than 200K...

10.1109/cvpr.2019.00900 article EN 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2019-06-01

Attention in Attention: Modeling Context Correlation for Efficient Video Classification

Yanbin Hao Shuo Wang P.P. Cao Xinjian Gao Tong Xu and 2 more

Attention mechanisms have significantly boosted the performance of video classification neural networks thanks to the utilization of perspective contexts. However, the current research on video attention generally focuses on adopting a specific aspect of contexts (e.g., channel, spatial/temporal, or global context) to refine the features and neglects their underlying correlation when computing attentions. This leads to incomplete context utilization and hence bears the weakness of limited performance improvement. To tackle the problem, this paper proposes an...

10.1109/tcsvt.2022.3169842 article EN IEEE Transactions on Circuits and Systems for Video Technology 2022-04-22

The 6th AI City Challenge

Milind Naphade Shuo Wang David C. Anastasiu Zheng Tang Ming-Ching Chang and 12 more

The 6th edition of the AI City Challenge specifically focuses on problems in two domains where there is tremendous unlocked potential at the intersection of computer vision and artificial intelligence: Intelligent Traffic Systems (ITS), and brick and mortar retail businesses. The four challenge tracks of the 2022 AI City Challenge received participation requests from 254 teams across 27 countries. Track 1 addressed city-scale multi-target multi-camera (MTMC) vehicle tracking. Track 2 addressed natural-language-based vehicle track retrieval. Track 3 was a brand new...

10.1109/cvprw56347.2022.00378 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) 2022-06-01

Video Corpus Moment Retrieval with Query-specific Context Learning and Progressive Localization

Long Zhang Peipei Song Ziyang Duan Shuo Wang Xiaojun Chang and 1 more

10.1109/tcsvt.2025.3530570 article EN IEEE Transactions on Circuits and Systems for Video Technology 2025-01-01

Bi-Directional Distribution Alignment for Transductive Zero-Shot Learning

Zhicai Wang Yanbin Hao Tingting Mu Ouxiang Li Shuo Wang and 1 more

Zero-shot learning (ZSL) suffers intensely from the domain shift issue, i.e., the mismatch (or misalignment) between the true and learned data distributions for classes without training data (unseen classes). By additionally learning from unlabelled data collected for the unseen classes, transductive ZSL (TZSL) could reduce the shift but only to a certain extent. To improve TZSL, we propose a novel approach Bi-VAEGAN which strengthens the distribution alignment between the visual space and an auxiliary space. As a result, it can largely reduce the domain shift. The proposed key...

10.1109/cvpr52729.2023.01905 article EN 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023-06-01

Equally-Guided Discriminative Hashing for Cross-modal Retrieval

Yufeng Shi Xinge You Feng Zheng Shuo Wang Qinmu Peng

Cross-modal hashing intends to project data from two modalities into a common Hamming space to perform cross-modal retrieval efficiently. Despite satisfactory performance achieved on real applications, existing methods are incapable of effectively preserving semantic structure to maintain inter-class relationship and improving discriminability to make intra-class samples aggregated simultaneously, which thus limits the higher retrieval performance. To handle this problem, we propose Equally-Guided...

10.24963/ijcai.2019/662 article EN 2019-07-28

An Efficient Training Approach for Very Large Scale Face Recognition

Kai Wang Shuo Wang Panpan Zhang Zhipeng Zhou Zheng Zhu and 5 more

Face recognition has achieved significant progress in the deep learning era due to the ultra-large-scale and well-labeled datasets. However, training on the outsize datasets is time-consuming and takes up a lot of hardware resource. Therefore, designing an efficient training approach is indispensable. The heavy computational and memory costs mainly result from the million-level dimensionality of the fully connected (FC) layer. To this end, we propose a novel training approach, termed Faster Face Classification (F...

10.1109/cvpr52688.2022.00405 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022-06-01

DETRDistill: A Universal Knowledge Distillation Framework for DETR-families

Jiahao Chang Shuo Wang Hai-Ming Xu Zehui Chen Chenhongyi Yang and 1 more

Transformer-based detectors (DETRs) are becoming popular for their simple framework, but the large model size and heavy time consumption hinder their deployment in the real world. Knowledge distillation (KD) can be an appealing technique to compress giant detectors into small ones for comparable detection performance and low inference cost. Since DETRs formulate object detection as a set prediction problem, existing KD methods designed for classic convolution-based detectors may not be directly applicable. In this paper, we propose...

10.1109/iccv51070.2023.00635 article EN 2023 IEEE/CVF International Conference on Computer Vision (ICCV) 2023-10-01

Feature Mixture on Pre-trained Model for Few-shot Learning

Shuo Wang Jinda Lu Haiyang Xu Yanbin Hao Xiangnan He

Few-shot learning (FSL) aims at recognizing a novel object under limited training samples. A robust feature extractor (backbone) can significantly improve the recognition performance of the FSL model. However, training an effective backbone is a challenging issue since 1) designing and validating structures of backbones are time-consuming and expensive processes, and 2) a backbone trained on the known (base) categories is more inclined to focus on the textures of the objects it learns, which makes it hard to describe novel samples. To solve these problems, we propose a feature mixture...

10.1109/tip.2024.3411452 article EN IEEE Transactions on Image Processing 2024-01-01

Improved Selective Refinement Network for Face Detection

Shifeng Zhang Rui Zhu Xiaobo Wang Hailin Shi Tianyu Fu and 3 more

As a long-standing problem in computer vision, face detection has attracted much attention in recent decades for its practical applications. With the availability of the face detection benchmark WIDER FACE dataset, much progress has been made by various algorithms in recent years. Among them, the Selective Refinement Network (SRN) face detector introduces the two-step classification and regression operations selectively into an anchor-based face detector to reduce false positives and improve location accuracy simultaneously. Moreover, it designs a receptive...

10.48550/arxiv.1901.06651 preprint EN other-oa arXiv (Cornell University) 2019-01-01

OneBit: Towards Extremely Low-bit Large Language Models

Yuzhuang Xu Xu Han Zonghan Yang Shuo Wang Qingfu Zhu and 3 more

Model quantification uses low bit-width values to represent the weight matrices of models, which is a promising approach to reduce both storage and computational overheads of deploying highly anticipated LLMs. However, existing quantization methods suffer severe performance degradation when the bit-width is extremely reduced, and thus focus on utilizing 4-bit or 8-bit values to quantize models. This paper boldly quantizes the weight matrices of LLMs to 1-bit, paving the way for the extremely low bit-width deployment of LLMs. For this target, we introduce a 1-bit quantization-aware training (QAT)...

10.48550/arxiv.2402.11295 preprint EN arXiv (Cornell University) 2024-02-17

The 8th AI City Challenge

Shuo Wang David C. Anastasiu Zheng Tang Ming-Ching Chang Yue Yao and 19 more

10.1109/cvprw63382.2024.00722 article EN 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) 2024-06-17

Towards Domain Generalization for Multi-view 3D Object Detection in Bird-Eye-View

Shuo Wang Xinhai Zhao Haiming Xu Zehui Chen Dameng Yu and 3 more

Multi-view 3D object detection (MV3D-Det) in Bird-Eye-View (BEV) has drawn extensive attention due to its low cost and high efficiency. Although new algorithms for camera-only 3D object detection have been continuously proposed, most of them may risk drastic performance degradation when the domain of input images differs from that of training. In this paper, we first analyze the causes of the domain gap for the MV3D-Det task. Based on the covariate shift assumption, we find that the gap mainly attributes to the feature distribution of BEV, which is determined by the quality...

10.1109/cvpr52729.2023.01281 article EN 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023-06-01

Gloss-driven Conditional Diffusion Models for Sign Language Production

Shengeng Tang Feng Xue Jingjing Wu Shuo Wang Richang Hong

Sign Language Production (SLP) aims to convert text or audio sentences into sign language videos corresponding to their semantics, which is challenging due to the diversity and complexity of sign languages, and cross-modal semantic mapping issues. In this work, we propose a Gloss-driven Conditional Diffusion Model (GCDM) for SLP. The core of the GCDM is a diffusion model architecture, in which the sign gloss sequence is encoded by a Transformer-based encoder and input into the diffusion model as a semantic prior condition. In the process of sign pose generation, the textual semantic priors carried...

10.1145/3663572 article EN ACM Transactions on Multimedia Computing Communications and Applications 2024-05-03

Detect influential points of feature rankings

Shuo Wang Junyan Lu

10.1016/j.compbiolchem.2024.108339 article EN cc-by Computational Biology and Chemistry 2025-01-05

Educational holographic cultural and creative product design for museums

Zicheng Liang Jian Wu Shuo Wang Xiaoshuang Ma

10.1117/12.3057999 article EN 2025-02-05

Linguistics-Vision Monotonic Consistent Network for Sign Language Production

Xu Wang Shengeng Tang Peipei Song Shuo Wang Dan Guo and 1 more

10.1109/icassp49660.2025.10890594 article EN ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2025-03-12

Interventional Feature Generation for Few-shot Learning

Shuo Wang Jinda Lu Huixia Ben Yanbin Hao Xingyu Gao and 1 more

Few-shot learning (FSL) aims to classify a novel object into a specific category under limited training samples. This is a challenging task since (1) the features expressed by pre-trained knowledge introduce perceived bias and then constrain the classification space, and (2) the use of general feature hallucination techniques based on global knowledge fails to escape this bias, resulting in suboptimal improvements. To solve these issues, this paper proposes an interventional feature generation (IFG) method. Specifically, we first...

10.1145/3729171 article EN ACM Transactions on Multimedia Computing Communications and Applications 2025-04-10

mmWave-SAR dataset: large high-resolution heatmap and point cloud dataset for static object detection and other machine-learning applications

Shuo Wang Zihan Shan Jingjie He Ananth Grama John Li and 1 more

10.1117/12.3053931 article EN 2025-04-11

Confusion Region Mining for Crowd Counting

Jiawen Zhu Wenda Zhao Libo Yao You He Maodi Hu and 4 more

Existing works mainly focus on crowd and ignore the confusion regions which contain extremely similar appearance to crowd in the background, while crowd counting needs to face these two sides at the same time. To address this issue, we propose a novel end-to-end trainable confusion region discriminating and erasing network called CDENet. Specifically, CDENet is composed of two modules: a confusion region mining module (CRM) and a guided erasing module (GEM). CRM consists of a basic density estimation (BDE) network, a confusion region aware bridge, and a confusion region discriminating network. The BDE network first generates a primary density map,...

10.1109/tnnls.2023.3311020 article EN IEEE Transactions on Neural Networks and Learning Systems 2023-09-15

Spatio-Temporal Collaborative Module for Efficient Action Recognition

Yanbin Hao Shuo Wang Yi Tan Xiangnan He Zhenguang Liu and 1 more

Efficient action recognition aims to classify a video clip into a specific action category with a low computational cost. It is challenging since the integrated spatial-temporal calculation (e.g., 3D convolution) introduces intensive operations and increases complexity. This paper explores the feasibility of the integration of channel splitting and filter decoupling for efficient architecture design and feature refinement by proposing a novel spatio-temporal collaborative (STC) module. STC splits the video feature channels into two groups...

10.1109/tip.2022.3221292 article EN IEEE Transactions on Image Processing 2022-01-01

Learning with Noisy Data for Semi-Supervised 3D Object Detection

Zehui Chen Zhenyu Li Shuo Wang Dengpan Fu Feng Zhao

Pseudo-Labeling (PL) is a critical approach in semi-supervised 3D object detection (SSOD). In PL, delicately selected pseudo-labels, generated by the teacher model, are provided for the student model to supervise the semi-supervised detection framework. However, such a paradigm may introduce misclassified labels or loose localized box predictions, resulting in a sub-optimal solution of detection performance. In this paper, we take PL from a noisy learning perspective: instead of directly applying vanilla pseudo-labels, we design a noise-resistant instance supervision...

10.1109/iccv51070.2023.00638 article EN 2023 IEEE/CVF International Conference on Computer Vision (ICCV) 2023-10-01

Pseudo Content Hallucination for Unpaired Image Captioning

Huixia Ben Shuo Wang Meng Wang Richang Hong

Unpaired Image Captioning (UIC) is designed to describe an image without relying on matched vision-language training data. It is a challenging task since (1) the implicit and unpaired data nature of UIC limits the captioning model's ability to represent diverse scene representations, and (2) it is difficult for the model to discern the intrinsic relationships among objects, potentially leading to misinterpretation of the image content. To solve these issues, we propose a pseudo content hallucination (PCH) to help the model enlarge the perception of objects...

10.1145/3652583.3658080 article EN 2024-05-30
