- Video Surveillance and Tracking Methods
- Advanced Image and Video Retrieval Techniques
- Human Pose and Action Recognition
- Multimodal Machine Learning Applications
- Advanced Neural Network Applications
- Automated Road and Building Extraction
- 3D Shape Modeling and Analysis
- Remote-Sensing Image Classification
- Robotics and Sensor-Based Localization
- Video Analysis and Summarization
- Domain Adaptation and Few-Shot Learning
- Human Mobility and Location-Based Analysis
- Second Language Learning and Teaching
- Electric Vehicles and Infrastructure
- Electric and Hybrid Vehicle Technologies
- Advanced Battery Technologies Research
Northwestern Polytechnical University
2023-2024
Newcastle University
2022-2024
Tianjin University
2023
University of Warwick
2022-2023
Dense captioning provides detailed captions of complex visual scenes. While a number of successes have been achieved in recent years, there are still two broad limitations: 1) most existing methods adopt an encoder-decoder framework, where the contextual information is sequentially encoded using long short-term memory (LSTM). However, the forget gate mechanism of LSTM makes it vulnerable when dealing with a long sequence; and 2) the vast majority of prior arts consider regions of interest (RoIs) equally important,...
Dense captioning generates more detailed spoken descriptions for complex visual scenes. Despite several promising leads, existing methods still have two broad limitations: 1) the vast majority of prior arts only consider visual contextual clues during captioning but ignore potentially important textual context; 2) current imbalanced learning mechanisms limit the diversity of the vocabulary learned from the dictionary, thus giving rise to low language-learning efficiency. To alleviate these gaps, in this paper, we...
Dense captioning creates diverse Region of Interest (RoI) descriptions for complex visual scenes. While promising results have been obtained, several issues persist. In particular: 1) it is hard to find the optimal parameters for artificially designed modules (e.g., non-maximum suppression (NMS)), causing redundancies and fewer interactions to benefit the two sub-tasks of RoI detection and captioning; 2) the absence of a multi-scale decoder in current methods hinders the acquisition of scale-invariant features, thus...
Fast stereo-based 3D object detectors have made great progress recently. However, they suffer from inferior accuracy. We argue that the main reason is the poor geometry-aware feature representation in 3D space. To solve this problem, we propose an efficient stereo geometry network (ESGN). The key to our ESGN is an efficient geometry-aware feature generation (EGFG) module. Our EGFG module first uses a correlation and reprojection module to construct multi-scale volumes in camera frustum space, and second employs a bird's eye view (BEV) projection and fusion...
Unsupervised 2D image-based 3D model retrieval aims at retrieving images from the gallery of 3D models by the given 2D images. Despite the encouraging progress made in this task, there are still two significant limitations: (1) feature alignment of 2D images and 3D models is difficult due to the huge gap between the two modalities; (2) the important view information was ignored in prior arts, which led to inaccurate results. To alleviate these limitations, inspired by the success of vision transformers (ViT) in a great variety of vision tasks, in this paper, we...
Multispectral pedestrian detection is of great importance in various around-the-clock applications, e.g., self-driving and video surveillance. Fusing the features from RGB images and thermal infrared (TIR) images to explore the complementary information between different modalities is one of the most effective manners to improve multispectral pedestrian detection performance. However, misalignment in the spatial dimension and modality reliability would introduce harmful information during feature fusion, limiting the performance of multispectral pedestrian detection. To address the above issues,...
Visual Question Answering (VQA) is a task that involves predicting an answer to a question depending on the content of an image. However, recent VQA methods have relied more on language priors between the question and the answer than on the image content. To address this issue, many debiasing methods have been proposed to reduce bias in model reasoning. Bias can be divided into two categories: good bias and bad bias. Good bias can benefit answer prediction, while bad bias may associate models with unrelated information. Therefore, instead of excluding bias indiscriminately as in existing...
Electric short takeoff and landing (eSTOL) aircraft utilize the slipstream generated by distributed propellers to significantly increase the effective lift coefficient and reduce takeoff and landing distances. By utilizing blown lift, eSTOL UAVs can achieve site requirements similar to those of electric vertical takeoff and landing (eVTOL) UAVs, while having lower energy consumption and thrust requirements. This research proposes a high-peak-power distributed electric propulsion (DEP) system model and an overload design method to further improve the power of the system. The model considers...
Multispectral pedestrian detection, which can be used in autonomous driving for intelligent transportation systems, has achieved great success in the past years. Most existing multispectral pedestrian detection approaches are developed on the assumption that the training and test data belong to an identical distribution, which does not guarantee good generalization to cross-domain (unseen) data. In this paper, we aim to develop a generalizable multispectral pedestrian detector, which achieves favorable performance on both intra-dataset evaluation and cross-dataset...
Weakly supervised person search aims to perform joint pedestrian detection and re-identification (re-id) with only bounding-box annotations. Recently, the idea of contrastive learning has been applied to weakly supervised person search, where two common contrast strategies are memory-based contrast and intra-image contrast. We argue that the current intra-image contrast is shallow and suffers from spatial-level and occlusion-level variance. In this paper, we present a novel deep intra-image contrastive learning using a Siamese network. Two key modules are spatial-invariant contrast (SIC)...
Pedestrian Attribute Recognition (PAR) is a challenging task in intelligent video surveillance. Two key challenges in PAR include complex alignment relations between images and attributes, and imbalanced data distribution. Existing approaches usually formulate PAR as a recognition task. Different from them, this paper addresses it as a decision-making task via a reinforcement learning framework, which is dubbed Rein-PAR. Specifically, PAR is formulated as a Markov decision process (MDP) to efficiently explore semantic alignments...