Zhuang Shao

ORCID: 0000-0001-7824-0985
Research Areas
  • Video Surveillance and Tracking Methods
  • Advanced Image and Video Retrieval Techniques
  • Human Pose and Action Recognition
  • Multimodal Machine Learning Applications
  • Advanced Neural Network Applications
  • Automated Road and Building Extraction
  • 3D Shape Modeling and Analysis
  • Remote-Sensing Image Classification
  • Robotics and Sensor-Based Localization
  • Video Analysis and Summarization
  • Domain Adaptation and Few-Shot Learning
  • Human Mobility and Location-Based Analysis
  • Second Language Learning and Teaching
  • Electric Vehicles and Infrastructure
  • Electric and Hybrid Vehicle Technologies
  • Advanced Battery Technologies Research

Northwestern Polytechnical University
2023-2024

Newcastle University
2022-2024

Tianjin University
2023

University of Warwick
2022-2023

Dense captioning provides detailed captions of complex visual scenes. While a number of successes have been achieved in recent years, there are still two broad limitations: 1) most existing methods adopt an encoder-decoder framework, where the contextual information is sequentially encoded using long short-term memory (LSTM). However, the forget gate mechanism of LSTM makes it vulnerable when dealing with a long sequence; and 2) the vast majority of prior arts consider regions of interest (RoIs) equally important,...

10.1109/tnnls.2022.3152990 article EN publisher-specific-oa IEEE Transactions on Neural Networks and Learning Systems 2022-03-11

Dense captioning generates more detailed spoken descriptions for complex visual scenes. Despite several promising leads, existing methods still have two broad limitations: 1) the vast majority of prior arts consider only visual contextual clues and ignore potentially important textual context; 2) current imbalanced learning mechanisms limit the diversity of the vocabulary learned from the dictionary, thus giving rise to low language-learning efficiency. To alleviate these gaps, in this paper, we...

10.1109/tmm.2023.3241517 article EN IEEE Transactions on Multimedia 2023-01-01

Dense captioning creates diverse descriptions of Regions of Interest (RoIs) for complex visual scenes. While promising results have been obtained, several issues persist. In particular: 1) it is hard to find the optimal parameters for artificially designed modules (e.g., non-maximum suppression (NMS)), which causes redundancies and leaves fewer interactions to benefit the two sub-tasks of RoI detection and RoI captioning; 2) the absence of a multi-scale decoder in current methods hinders the acquisition of scale-invariant features, thus...

10.1109/tmm.2024.3369863 article EN IEEE Transactions on Multimedia 2024-01-01

Fast stereo-based 3D object detectors have made great progress recently. However, they suffer from inferior accuracy. We argue that the main reason is the poor geometry-aware feature representation in 3D space. To solve this problem, we propose an efficient stereo geometry network (ESGN). The key to our ESGN is an efficient geometry-aware feature generation (EGFG) module. Our EGFG module first uses a stereo correlation and reprojection module to construct multi-scale stereo volumes in camera frustum space, and second employs a bird's eye view (BEV) projection and fusion...

10.1109/tcsvt.2022.3202810 article EN IEEE Transactions on Circuits and Systems for Video Technology 2022-08-29

Unsupervised 2D image-based 3D model retrieval aims to retrieve 3D models from a gallery given 2D images. Despite encouraging progress in this task, there are still two significant limitations: (1) aligning 2D and 3D features is difficult due to the huge gap between the two modalities; (2) important view information was ignored by prior arts, which led to inaccurate results. To alleviate these limitations, inspired by the success of vision transformers (ViT) in a great variety of tasks, in this paper, we...

10.1007/s00530-023-01166-y article EN cc-by Multimedia Systems 2023-08-24

Multispectral pedestrian detection is of great importance in various around-the-clock applications, e.g., self-driving and video surveillance. Fusing the features from RGB images and thermal infrared (TIR) images to explore complementary information between different modalities is one of the most effective ways to improve multispectral pedestrian detection performance. However, misalignment in the spatial dimension and in modality reliability would introduce harmful information during feature fusion, limiting the performance of multispectral pedestrian detection. To address the above issues,...

10.1145/3581783.3613444 article EN 2023-10-26

Visual Question Answering (VQA) is a task that involves predicting an answer to a question depending on the content of an image. However, recent VQA methods have relied more on language priors between the question and answer rather than on the image content. To address this issue, many debiasing methods have been proposed to reduce bias in model reasoning. Such bias can be divided into two categories: good bias and bad bias. Good bias benefits answer prediction, while bad bias may associate models with unrelated information. Therefore, instead of indiscriminately excluding bias as existing...

10.1145/3616399 article EN ACM Transactions on the Web 2023-08-28

Electric short takeoff and landing (eSTOL) aircraft utilize the slipstream generated by distributed propellers to significantly increase the effective lift coefficient and reduce takeoff and landing distances. By utilizing blown lift, eSTOL UAVs can achieve site requirements similar to those of electric vertical takeoff and landing (eVTOL) UAVs, while having lower energy consumption and thrust requirements. This research proposes a high-peak-power distributed electric propulsion (DEP) system model and an overload design method to further improve the peak power of the system. It considers...

10.3390/drones8120761 article EN cc-by Drones 2024-12-16

Multispectral pedestrian detection has achieved great success in past years and can be used in autonomous driving for intelligent transportation systems. Most existing multispectral pedestrian detection approaches are developed on the assumption that training and test data belong to an identical distribution, which does not guarantee good generalization to cross-domain (unseen) data. In this paper, we aim to develop a generalizable multispectral pedestrian detector, which achieves favorable performance in both intra-dataset evaluation and cross-dataset...

10.1109/tits.2023.3330155 article EN IEEE Transactions on Intelligent Transportation Systems 2023-11-21

Weakly supervised person search aims to perform joint pedestrian detection and re-identification (re-id) with only bounding-box annotations. Recently, the idea of contrastive learning was initially applied to weakly supervised person search, where two common contrast strategies are memory-based contrast and intra-image contrast. We argue that current intra-image contrast is shallow and suffers from spatial-level and occlusion-level variance. In this paper, we present a novel deep intra-image contrastive learning approach using a Siamese network. Two key modules are spatial-invariant contrast (SIC)...

10.48550/arxiv.2302.04607 preprint EN other-oa arXiv (Cornell University) 2023-01-01

Pedestrian Attribute Recognition (PAR) is a challenging task in intelligent video surveillance. Two key challenges in PAR are the complex alignment relations between images and attributes, and the imbalanced data distribution. Existing approaches usually formulate PAR as a recognition task. Different from them, this paper addresses it as a decision-making task via a reinforcement learning framework, dubbed Rein-PAR. Specifically, PAR is formulated as a Markov decision process (MDP) to efficiently explore semantic alignments...

10.2139/ssrn.4130856 article EN SSRN Electronic Journal 2022-01-01