NFDI4DS | UHH-SEMS - Publication Details

Zhuang Shao

ORCID: 0000-0001-7824-0985

About

Contact & Profiles

A5077948412

Research Areas

Video Surveillance and Tracking Methods
Advanced Image and Video Retrieval Techniques
Human Pose and Action Recognition
Multimodal Machine Learning Applications
Advanced Neural Network Applications
Automated Road and Building Extraction
3D Shape Modeling and Analysis
Remote-Sensing Image Classification
Robotics and Sensor-Based Localization
Video Analysis and Summarization
Domain Adaptation and Few-Shot Learning
Human Mobility and Location-Based Analysis
Second Language Learning and Teaching
Electric Vehicles and Infrastructure
Electric and Hybrid Vehicle Technologies
Advanced Battery Technologies Research

Northwestern Polytechnical University
2023-2024

Newcastle University
2022-2024

Tianjin University
2023

University of Warwick
2022-2023

Region-Object Relation-Aware Dense Captioning via Transformer

OPENALEX - Publications

Zhuang Shao, Jungong Han, Demetris Marnerides, Kurt Debattista

Dense captioning provides detailed captions of complex visual scenes. While a number of successes have been achieved in recent years, there are still two broad limitations: 1) most existing methods adopt an encoder-decoder framework, where the contextual information is sequentially encoded using long short-term memory (LSTM). However, the forget gate mechanism of LSTM makes it vulnerable when dealing with a long sequence; and 2) the vast majority of prior arts consider regions of interest (RoIs) equally important,...

10.1109/tnnls.2022.3152990 article EN publisher-specific-oa IEEE Transactions on Neural Networks and Learning Systems 2022-03-11

Textual Context-Aware Dense Captioning With Diverse Words

Zhuang Shao, Jungong Han, Kurt Debattista, Yanwei Pang

Dense captioning generates more detailed spoken descriptions for complex visual scenes. Despite several promising leads, existing methods still have two broad limitations: 1) the vast majority of prior arts only consider visual contextual clues during captioning but ignore potentially important textual context; 2) current imbalanced learning mechanisms limit the diversity of vocabulary learned from the dictionary, thus giving rise to low language-learning efficiency. To alleviate these gaps, in this paper, we...

10.1109/tmm.2023.3241517 article EN IEEE Transactions on Multimedia 2023-01-01

DCMSTRD: End-to-end Dense Captioning via Multi-Scale Transformer Decoding

Zhuang Shao, Jungong Han, Kurt Debattista, Yanwei Pang

Dense captioning creates diverse Region of Interest (RoI) descriptions for complex visual scenes. While promising results have been obtained, several issues persist. In particular: 1) it is hard to find the optimal parameters for artificially designed modules (e.g., non-maximum suppression (NMS)), causing redundancies and fewer interactions to benefit the two sub-tasks of RoI detection and captioning; 2) the absence of a multi-scale decoder in current methods hinders the acquisition of scale-invariant features, thus...

10.1109/tmm.2024.3369863 article EN IEEE Transactions on Multimedia 2024-01-01

Deep intra-image contrastive learning for weakly supervised one-step person search

Jiabei Wang, Yanwei Pang, Jiale Cao, Hanqing Sun, Zhuang Shao, and 1 more

10.1016/j.patcog.2023.110047 article EN Pattern Recognition 2023-10-14

ESGN: Efficient Stereo Geometry Network for Fast 3D Object Detection

Aqi Gao, Yanwei Pang, Jing Nie, Zhuang Shao, Jiale Cao, and 2 more

Fast stereo-based 3D object detectors have made great progress recently. However, they suffer from inferior accuracy. We argue that the main reason is the poor geometry-aware feature representation in 3D space. To solve this problem, we propose an efficient stereo geometry network (ESGN). The key to our ESGN is an efficient geometry-aware feature generation (EGFG) module. Our EGFG module first uses a correlation and reprojection module to construct multi-scale volumes in camera frustum space, and second employs a bird's eye view (BEV) projection and fusion...

10.1109/tcsvt.2022.3202810 article EN IEEE Transactions on Circuits and Systems for Video Technology 2022-08-29

View-target relation-guided unsupervised 2D image-based 3D model retrieval via transformer

Jiacheng Chang, Lanyong Zhang, Zhuang Shao

Unsupervised 2D image-based 3D model retrieval aims at retrieving images from the gallery of 3D models by given 2D images. Despite the encouraging progress made in this task, there are still two significant limitations: (1) feature alignment between 2D images and 3D models is difficult due to the huge gap between the modalities. (2) The important view information was ignored by prior arts, which led to inaccurate results. To alleviate these limitations, inspired by the success of vision transformers (ViT) in a great variety of tasks, in this paper, we...

10.1007/s00530-023-01166-y article EN cc-by Multimedia Systems 2023-08-24

Attentive Alignment Network for Multispectral Pedestrian Detection

Nuo Chen, Jin Xie, Jing Nie, Jiale Cao, Zhuang Shao, and 1 more

Multispectral pedestrian detection is of great importance in various around-the-clock applications, i.e., self-driving and video surveillance. Fusing the features from RGB images and thermal infrared (TIR) images to explore the complementary information between different modalities is one of the most effective manners to improve multispectral pedestrian detection performance. However, the misalignment in the spatial dimension and modality reliability would introduce harmful information during feature fusion, limiting the performance of multispectral pedestrian detection. To address the above issues,...

10.1145/3581783.3613444 article EN 2023-10-26

Adaptive semantic transfer network for unsupervised 2D image-based 3D model retrieval

Dan Song, Yuanxiang Yang, Wenhui Li, Zhuang Shao, Weizhi Nie, and 2 more

10.1016/j.cviu.2023.103858 article EN Computer Vision and Image Understanding 2023-10-11

Multi-stage reasoning on introspecting and revising bias for visual question answering

An-An Liu, Zimu Lu, Ning Xu, Min Liu, Chenggang Yan, and 5 more

Visual Question Answering (VQA) is a task that involves predicting an answer to a question depending on the content of an image. However, recent VQA methods have relied more on the language priors between the question and answer rather than on the image content. To address this issue, many debiasing methods have been proposed to reduce bias in model reasoning. Bias can be divided into two categories: good bias and bad bias. Good bias can benefit prediction, while bad bias may associate models with unrelated information. Therefore, instead of excluding bias indiscriminately as in existing...

10.1145/3616399 article EN ACM Transactions on the Web 2023-08-28

Tutorial: Large Language-Vision Model in Society

Kaicheng Yu, Zhuang Shao, Siyuan Qi, Dongfang Liu

10.1145/3664647.3689175 article EN 2024-10-26

Design and Modeling of a High-Peak-Power Distributed Electric Propulsion System for a Super-STOL UAV

Jia Zong, Zhou Zhou, Jixin Zhu, Zhuang Shao, S. Sun

Electric short takeoff and landing (eSTOL) aircraft utilize the slipstream generated by distributed propellers to significantly increase the effective lift coefficient and reduce takeoff and landing distances. By utilizing blown lift, eSTOL UAVs can achieve similar site requirements as electric vertical takeoff and landing (eVTOL) UAVs, while having lower energy consumption and thrust requirements. This research proposes a high-peak-power distributed electric propulsion (DEP) system model and an overload design method to further improve the power of the system. The model considers...

10.3390/drones8120761 article EN cc-by Drones 2024-12-16

Toward Generalizable Multispectral Pedestrian Detection

Fuchen Chu, Jiale Cao, Zhanjie Song, Zhuang Shao, Yanwei Pang, and 1 more

Multispectral pedestrian detection has achieved great success in past years and can be used in autonomous driving for intelligent transportation systems. Most existing multispectral approaches are developed on the assumption that training and test data belong to an identical distribution, which does not guarantee good generalization on cross-domain (unseen) data. In this paper, we aim to develop a generalizable multispectral pedestrian detector, which achieves favorable performance on both intra-dataset evaluation and cross-dataset...

10.1109/tits.2023.3330155 article EN IEEE Transactions on Intelligent Transportation Systems 2023-11-21

Deep Intra-Image Contrastive Learning for Weakly Supervised One-Step Person Search

Jiabei Wang, Yanwei Pang, Jiale Cao, Hanqing Sun, Zhuang Shao, and 1 more

Weakly supervised person search aims to perform joint pedestrian detection and re-identification (re-id) with only bounding-box annotations. Recently, the idea of contrastive learning was initially applied to weakly supervised person search, where two common contrast strategies are memory-based contrast and intra-image contrast. We argue that current intra-image contrast is shallow, which suffers from spatial-level and occlusion-level variance. In this paper, we present a novel deep intra-image contrastive learning using a Siamese network. Two key modules, spatial-invariant contrast (SIC)...

10.48550/arxiv.2302.04607 preprint EN other-oa arXiv (Cornell University) 2023-01-01

Reinforced Pedestrian Attribute Recognition with Group Optimization

Zhong Ji, Zhenfei Hu, Yaodong Wang, Zhuang Shao, Yanwei Pang

Pedestrian Attribute Recognition (PAR) is a challenging task in intelligent video surveillance. Two key challenges in PAR include complex alignment relations between images and attributes, and imbalanced data distribution. Existing approaches usually formulate PAR as a recognition task. Different from them, this paper addresses it as a decision-making task via a reinforcement learning framework, which is dubbed Rein-PAR. Specifically, PAR is formulated as a Markov decision process (MDP) to efficiently explore semantic alignments...

10.2139/ssrn.4130856 article EN SSRN Electronic Journal 2022-01-01
