Shuo Wang

ORCID: 0000-0002-4881-9344
Research Areas
  • Advanced Neural Network Applications
  • Domain Adaptation and Few-Shot Learning
  • Video Surveillance and Tracking Methods
  • Multimodal Machine Learning Applications
  • Human Pose and Action Recognition
  • Advanced Image and Video Retrieval Techniques
  • Autonomous Vehicle Technology and Safety
  • Face and Expression Recognition
  • Hearing Impairment and Communication
  • Advanced Vision and Imaging
  • Hand Gesture Recognition Systems
  • Face Recognition and Analysis
  • Anomaly Detection Techniques and Applications
  • Topic Modeling
  • Neural Networks and Applications
  • Brain Tumor Detection and Classification
  • Image Processing Techniques and Applications
  • Digital Media and Philosophy
  • Fuzzy Logic and Control Systems
  • Video Analysis and Summarization
  • Fire Detection and Safety Systems
  • Robotics and Automated Systems
  • Reinforcement Learning in Robotics
  • Adversarial Robustness in Machine Learning
  • Educational Research and Pedagogy

University of Science and Technology of China
2019-2025

University Medical Center Freiburg
2025

University of Freiburg
2025

Heidelberg University
2025

University Hospital Heidelberg
2025

Nvidia (United States)
2022-2024

Beijing Institute of Graphic Communication
2020-2024

Yantai University
2023

Trier University of Applied Sciences
2023

Meizu (China)
2023

Urban traffic optimization using cameras as sensors is driving the need to advance state-of-the-art multi-target multi-camera (MTMC) tracking. This work introduces CityFlow, a city-scale camera dataset consisting of more than 3 hours of synchronized HD videos from 40 cameras across 10 intersections, with the longest distance between two simultaneous cameras being 2.5 km. To the best of our knowledge, CityFlow is the largest-scale dataset in terms of spatial coverage and the number of cameras/videos in an urban environment. The dataset contains 200K...

10.1109/cvpr.2019.00900 article EN 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2019-06-01

Attention mechanisms have significantly boosted the performance of video classification neural networks thanks to the utilization of perspective contexts. However, current research on attention generally focuses on adopting a specific aspect of contexts (e.g., channel, spatial/temporal, or global context) to refine the features and neglects their underlying correlation when computing attentions. This leads to incomplete context utilization and hence bears the weakness of limited improvement. To tackle the problem, this paper proposes an...

10.1109/tcsvt.2022.3169842 article EN IEEE Transactions on Circuits and Systems for Video Technology 2022-04-22

The 6th edition of the AI City Challenge specifically focuses on problems in two domains where there is tremendous unlocked potential at the intersection of computer vision and artificial intelligence: Intelligent Traffic Systems (ITS), and brick and mortar retail businesses. The four challenge tracks of the 2022 AI City Challenge received participation requests from 254 teams across 27 countries. Track 1 addressed city-scale multi-target multi-camera (MTMC) vehicle tracking. Track 2 addressed natural-language-based vehicle track retrieval. Track 3 was a brand new...

10.1109/cvprw56347.2022.00378 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) 2022-06-01

Zero-shot learning (ZSL) suffers intensely from the domain shift issue, i.e., the mismatch (or misalignment) between the true and learned data distributions for classes without training data (unseen classes). By additionally using unlabelled data collected for the unseen classes, transductive ZSL (TZSL) could reduce the shift, but only to a certain extent. To improve TZSL, we propose a novel approach, Bi-VAEGAN, which strengthens the distribution alignment between the visual space and an auxiliary space. As a result, it can largely reduce the domain shift. The proposed key...

10.1109/cvpr52729.2023.01905 article EN 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023-06-01

Cross-modal hashing intends to project data from two modalities into a common Hamming space to perform cross-modal retrieval efficiently. Despite satisfactory performance achieved on real applications, existing methods are incapable of effectively preserving semantic structure to maintain inter-class relationships while simultaneously improving discriminability to make intra-class samples aggregated, which thus limits the higher retrieval performance. To handle this problem, we propose Equally-Guided...

10.24963/ijcai.2019/662 article EN 2019-07-28

Face recognition has achieved significant progress in the deep learning era due to ultra-large-scale and well-labeled datasets. However, training on outsize datasets is time-consuming and takes up a lot of hardware resources. Therefore, designing an efficient training approach is indispensable. The heavy computational and memory costs mainly result from the million-level dimensionality of the fully connected (FC) layer. To this end, we propose a novel training approach, termed Faster Face Classification (F...

10.1109/cvpr52688.2022.00405 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022-06-01

Transformer-based detectors (DETRs) are becoming popular for their simple framework, but the large model size and heavy time consumption hinder their deployment in the real world. Knowledge distillation (KD) can be an appealing technique to compress giant detectors into small ones for comparable detection performance and low inference cost. Since DETRs formulate object detection as a set prediction problem, existing KD methods designed for classic convolution-based detectors may not be directly applicable. In this paper, we propose...

10.1109/iccv51070.2023.00635 article EN 2023 IEEE/CVF International Conference on Computer Vision (ICCV) 2023-10-01

Few-shot learning (FSL) aims at recognizing a novel object under limited training samples. A robust feature extractor (backbone) can significantly improve the recognition performance of the FSL model. However, training an effective backbone is a challenging issue since 1) designing and validating structures of backbones are time-consuming and expensive processes, and 2) a backbone trained on the known (base) categories is more inclined to focus on the textures of the objects it learns, which makes it hard to describe novel samples. To solve these problems, we propose a mixture...

10.1109/tip.2024.3411452 article EN IEEE Transactions on Image Processing 2024-01-01

As a long-standing problem in computer vision, face detection has attracted much attention in recent decades for its practical applications. With the availability of the benchmark WIDER FACE dataset, much progress has been made by various algorithms in recent years. Among them, the Selective Refinement Network (SRN) face detector introduces two-step classification and regression operations selectively into an anchor-based face detector to reduce false positives and improve location accuracy simultaneously. Moreover, it designs a receptive...

10.48550/arxiv.1901.06651 preprint EN other-oa arXiv (Cornell University) 2019-01-01

Model quantification uses low bit-width values to represent the weight matrices of models, which is a promising approach to reduce both the storage and computational overheads of deploying highly anticipated LLMs. However, existing quantization methods suffer severe performance degradation when the bit-width is extremely reduced, and thus focus on utilizing 4-bit or 8-bit values to quantize models. This paper boldly quantizes the weight matrices of LLMs to 1-bit, paving the way for the deployment of LLMs. For this target, we introduce a 1-bit quantization-aware training (QAT)...

10.48550/arxiv.2402.11295 preprint EN arXiv (Cornell University) 2024-02-17

Multi-view 3D object detection (MV3D-Det) in Bird-Eye-View (BEV) has drawn extensive attention due to its low cost and high efficiency. Although new algorithms for camera-only 3D object detection have been continuously proposed, most of them may risk drastic performance degradation when the domain of the input images differs from that of training. In this paper, we first analyze the causes of the domain gap for the MV3D-Det task. Based on the covariate shift assumption, we find that the gap mainly attributes to the feature distribution of BEV, which is determined by the quality...

10.1109/cvpr52729.2023.01281 article EN 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023-06-01

Sign Language Production (SLP) aims to convert text or audio sentences into sign language videos corresponding to their semantics, which is challenging due to the diversity and complexity of sign languages and cross-modal semantic mapping issues. In this work, we propose a Gloss-driven Conditional Diffusion Model (GCDM) for SLP. The core of the GCDM is a diffusion model architecture, in which the gloss sequence is encoded by a Transformer-based encoder and input as a prior condition. In the process of pose generation, the textual priors carried...

10.1145/3663572 article EN ACM Transactions on Multimedia Computing Communications and Applications 2024-05-03

10.1016/j.compbiolchem.2024.108339 article EN cc-by Computational Biology and Chemistry 2025-01-05

10.1109/icassp49660.2025.10890594 article EN ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2025-03-12

Few-shot learning (FSL) aims to classify a novel object into a specific category under limited training samples. This is a challenging task since (1) the features expressed by pre-trained knowledge introduce perceived bias and then constrain the classification space, and (2) the use of general hallucination techniques based on global features fails to escape this bias, resulting in suboptimal improvements. To solve these issues, this paper proposes an interventional feature generation (IFG) method. Specifically, we first...

10.1145/3729171 article EN ACM Transactions on Multimedia Computing Communications and Applications 2025-04-10

Existing works mainly focus on the crowd and ignore the confusion regions in the background whose appearance is extremely similar to the crowd, while crowd counting needs to face these two sides at the same time. To address this issue, we propose a novel end-to-end trainable confusion region discriminating and erasing network called CDENet. Specifically, CDENet is composed of two modules: a confusion region mining module (CRM) and a guided erasing module (GEM). CRM consists of a basic density estimation (BDE) network, a confusion region aware bridge, and a confusion region discriminating network. The BDE network first generates a primary density map,...

10.1109/tnnls.2023.3311020 article EN IEEE Transactions on Neural Networks and Learning Systems 2023-09-15

Efficient action recognition aims to classify a video clip into a specific action category with low computational cost. It is challenging since the integrated spatial-temporal calculation (e.g., 3D convolution) introduces intensive operations and increases complexity. This paper explores the feasibility of integrating channel splitting and filter decoupling for efficient architecture design and feature refinement by proposing a novel spatio-temporal collaborative (STC) module. STC splits the channels into two groups...

10.1109/tip.2022.3221292 article EN IEEE Transactions on Image Processing 2022-01-01

Pseudo-Labeling (PL) is a critical approach in semi-supervised 3D object detection (SSOD). In PL, delicately selected pseudo-labels, generated by the teacher model, are provided for the student model to supervise the semi-supervised detection framework. However, such a paradigm may introduce misclassified labels or loosely localized box predictions, resulting in a sub-optimal solution of detection performance. In this paper, we take PL from a noisy learning perspective: instead of directly applying vanilla pseudo-labels, we design a noise-resistant instance supervision...

10.1109/iccv51070.2023.00638 article EN 2023 IEEE/CVF International Conference on Computer Vision (ICCV) 2023-10-01

Unpaired Image Captioning (UIC) is designed to describe an image without relying on matched vision-language training data. It is a challenging task since (1) the implicit and unpaired nature of the data limits the captioning model's ability to represent diverse scenes, and (2) it is difficult for the model to discern intrinsic relationships among objects, potentially leading to misinterpretation of image content. To solve these issues, we propose a pseudo content hallucination (PCH) to help the model enlarge its perception of objects...

10.1145/3652583.3658080 article EN 2024-05-30