- Advanced Image and Video Retrieval Techniques
- Advanced Neural Network Applications
- Domain Adaptation and Few-Shot Learning
- Human Pose and Action Recognition
- Multimodal Machine Learning Applications
- Image Retrieval and Classification Techniques
- Video Surveillance and Tracking Methods
- Higher Education and Teaching Methods
- Advanced Vision and Imaging
- Generative Adversarial Networks and Image Synthesis
- Infrared Target Detection Methodologies
- Remote-Sensing Image Classification
- Image Enhancement Techniques
- Anomaly Detection Techniques and Applications
- Education and Work Dynamics
- Hand Gesture Recognition Systems
- Robotics and Sensor-Based Localization
- Face Recognition and Analysis
- Advanced Image Processing Techniques
- Machine Learning and Data Classification
- Image and Object Detection Techniques
- Advanced Image Fusion Techniques
- Higher Education Learning Practices
- Hospitality and Tourism Education
- Advanced Measurement and Detection Methods
Hubei University of Technology
2024
Beihang University
2017-2024
Wuhan University of Science and Technology
2022
Alibaba Group (United States)
2021
Wilmington University
2020
MSIGHT Technologies (China)
2019-2020
State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing
2018
Wuhan University
2018
Shenzhen Institutes of Advanced Technology
2015-2017
University of Chinese Academy of Sciences
2014-2017
Traditional object detectors employ the dense paradigm of scanning over locations and scales in an image. The recent query-based object detectors break this convention by decoding image features with a set of learnable queries. However, this paradigm still suffers from slow convergence, limited performance, and the design complexity of extra networks between the backbone and the decoder. In this paper, we find that the key to these issues is the adaptability of decoders for casting queries to varying objects. Accordingly, we propose a fast-converging query-based detector, named...
Convolutional neural networks (CNNs) have made remarkable progress on scene recognition, partially due to recent large-scale scene datasets such as Places and Places2. Scene categories are often defined by multi-level information, including local objects, global layout, and background environment, thus leading to large intra-class variations. In addition, with the increasing number of scene categories, label ambiguity has become another crucial issue in large-scale classification. This paper focuses on scene recognition...
VGGNets have turned out to be effective for object recognition in still images. However, directly adapting the VGGNet models trained on the ImageNet dataset does not yield good performance for scene recognition. This report describes our implementation of training the VGGNets on the large-scale Places205 dataset. Specifically, we train three VGGNet models, namely VGGNet-11, VGGNet-13, and VGGNet-16, using a Multi-GPU extension of the Caffe toolbox with high computational efficiency. We verify the Places205-VGGNet models on datasets:...
Convolutional neural networks (CNNs) have recently achieved remarkable successes in various image classification and understanding tasks. The deep features obtained at the top fully-connected layer of the CNN (FC-features) exhibit rich global semantic information and are extremely effective in image classification. On the other hand, the convolutional features in the middle layers of the CNN also contain meaningful local information, but are not fully explored for image representation. In this paper, we propose a novel Locally-Supervised Deep Hybrid...
Many large-scale image databases such as ImageNet have significantly advanced image classification and other visual recognition tasks. However, most of these datasets are constructed only for single-label and coarse object-level classification. For real-world applications, multiple labels and fine-grained categories are often needed, yet very few such datasets exist publicly, especially those of large scale and high quality. In this work, we contribute to the community a new dataset called iMaterialist Fashion Attribute...
Detection of small targets has been an important and challenging task in infrared systems. Most detection algorithms, which use only a single metric, have difficulty separating the target from clutter completely. The false alarm rate may be high when there are complex backgrounds. In this letter, multiple novel features are proposed from four aspects to establish an elaborate description of the target. Each feature reflects a specific characteristic of the target. The best feature vector is selected to apply these features for detection. Then, a learning-based classifier...
Weakly-supervised instance segmentation aims to detect and segment object instances precisely, given image-level labels only. Unlike previous methods, which are composed of multiple offline stages, we propose Sequential Label Propagation and Enhancement Networks (referred to as Label-PEnet) that progressively transform image-level labels to pixel-wise labels in a coarse-to-fine manner. We design four cascaded modules including multi-label classification, object detection, instance refinement and instance segmentation, which are implemented sequentially by sharing...
Accurate segmentation of brain lesions from MR images (MRIs) is important for improving cancer diagnosis, surgical planning, and prediction of outcome. However, manual segmentation of brain lesions from 3D MRIs is highly expensive, time-consuming, and prone to user biases. We present an efficient yet conceptually simple brain segmentation network (referred to as Brain SegNet), which is a 3D residual framework for automatic voxel-wise segmentation of brain lesions. Our model is able to directly predict dense voxel segmentation of brain tumor or ischemic stroke regions in 3D brain MRIs. The proposed network can run at about 0.5s per -...
Most existing 3D CNNs for video representation learning are clip-based methods, and thus do not consider the video-level temporal evolution of spatio-temporal features. In this paper, we propose Video-level 4D Convolutional Neural Networks, referred to as V4D, to model the evolution of long-range spatio-temporal representation with 4D convolutions and, at the same time, to preserve strong 3D spatio-temporal representation with residual connections. Specifically, we design a new 4D residual block able to capture inter-clip interactions, which could enhance the representation power of the original clip-level 3D CNNs. The 4D residual blocks can be easily...
Unsupervised style transfer aims to change the style of an input sentence while preserving its original content without using parallel training data. Current dominant approaches, owing to the lack of fine-grained control on the influence from the target style, are unable to yield desirable output sentences. In this paper, we propose a novel attentional sequence-to-sequence (Seq2seq) model that dynamically exploits the relevance of each word for unsupervised style transfer. Specifically, we first pretrain a style classifier, where [...] can be...
Pedestrian detection in infrared images is always a challenging task. Segmentation is an important step of pedestrian detection. An accurate segmentation could provide more information for further analysis. In this paper, an improved Fuzzy C-Means clustering method, which incorporates geometric symmetry information, is proposed for pedestrian segmentation. The symmetry information is introduced by Markov random field theory. Moreover, a new metric is utilized to handle weak pedestrians. In addition, the whole procedure for extracting pedestrians is given. The...
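For readers unfamiliar with the baseline this abstract builds on, the following is a minimal sketch of plain fuzzy C-means on scalar pixel intensities. It is not the improved method of the paper: the symmetry term, the Markov random field prior, and the new metric are omitted, and the function name, the fuzzifier `m = 2`, and the sample intensities are assumptions made for illustration.

```python
def fuzzy_c_means(values, centers, m=2.0, iters=50, eps=1e-9):
    """Plain fuzzy C-means on scalar values (illustrative baseline only).

    values  : list of floats, e.g. pixel intensities
    centers : initial cluster centers
    m       : fuzzifier, must be > 1
    Returns (centers, memberships); memberships[i][k] is the degree to
    which values[i] belongs to cluster k, and each row sums to 1.
    """
    centers = list(centers)
    power = 2.0 / (m - 1.0)

    def compute_memberships(cs):
        rows = []
        for v in values:
            # eps keeps the distance ratios finite when v sits on a center
            dists = [abs(v - c) + eps for c in cs]
            rows.append(
                [1.0 / sum((d_k / d_j) ** power for d_j in dists) for d_k in dists]
            )
        return rows

    for _ in range(iters):
        memberships = compute_memberships(centers)
        for k in range(len(centers)):
            # each center is the mean of the values weighted by membership^m
            weights = [row[k] ** m for row in memberships]
            centers[k] = sum(w * v for w, v in zip(weights, values)) / sum(weights)

    return centers, compute_memberships(centers)


# Hypothetical intensities: a dark background and a warm target
intensities = [10.0, 12.0, 11.0, 200.0, 205.0, 198.0]
centers, memberships = fuzzy_c_means(intensities, centers=[0.0, 255.0])
```

Thresholding the membership column of the warm cluster would give a rough foreground mask; the soft memberships are what a spatial prior such as an MRF would then regularize.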
In this paper, we present a new cross-architecture contrastive learning (CACL) framework for self-supervised video representation learning. CACL consists of a 3D CNN and a video transformer, which are used in parallel to generate diverse positive pairs for contrastive learning. This allows the model to learn strong representations from such diverse yet meaningful pairs. Furthermore, we introduce a temporal self-supervised learning module able to predict an Edit distance explicitly between two video sequences in the temporal order. This enables the model to learn a rich temporal representation that compensates strongly for the video-level representation learned by...
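The edit distance mentioned in this abstract can be illustrated with a short sketch. The code below computes the standard Levenshtein distance between two frame-index orderings; using it as a regression target for a shuffled clip is an assumption for illustration, and the function name and sample orderings are hypothetical rather than taken from the paper.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences.

    Insertions, deletions and substitutions each cost 1.
    Uses a rolling row, so memory is O(len(b)).
    """
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(
                min(
                    prev[j] + 1,         # delete x from a
                    curr[j - 1] + 1,     # insert y into a
                    prev[j - 1] + cost,  # match or substitute
                )
            )
        prev = curr
    return prev[-1]


# Hypothetical example: frame indices in original order and after shuffling
original_order = [0, 1, 2, 3, 4, 5]
shuffled_order = [0, 2, 1, 3, 5, 4]
target = edit_distance(original_order, shuffled_order)
```

A larger target means the shuffled clip is further from the original temporal order, which gives a graded signal rather than a binary "shuffled or not" label.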
Video summarization is an essential problem in signal processing, which intends to produce a concise summary of the original video. Existing video summarization approaches regard the task as a keyframe selection problem and generally construct the frame-wise representation by combining the long-range temporal dependency with either unimodal or bimodal information. The optimal summary should reflect the semantics of the whole content by exploiting the multimodal and shot-level hierarchical nature of videos; however, this nature is not fully exploited by existing methods....
Accurately determining pedestrian location in indoor environments using consumer smartphones is a significant step in the development of ubiquitous localization services. Many different map-matching methods have been combined with pedestrian dead reckoning (PDR) to achieve low-cost and bias-free pedestrian tracking. However, this works only in areas with dense map constraints, and the error accumulates in open areas. In order to achieve reliable localization without map constraints, an improved image-based localization aided pedestrian trajectory estimation method is proposed in this paper. The...
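The error accumulation noted in this abstract follows from how pedestrian dead reckoning builds a trajectory: each position is the previous one plus a step vector. The sketch below shows only that basic update, not the image-aided method of the paper; the function name, the heading convention (radians, clockwise from north), and the sample steps are assumptions for illustration.

```python
import math


def pdr_track(start, steps):
    """Integrate a 2D path from (step_length_m, heading_rad) pairs.

    start : (x, y) in meters, with x pointing east and y pointing north
    steps : iterable of (length, heading); heading is measured clockwise
            from north (an assumed convention)
    Returns the list of positions, including the start.
    """
    x, y = start
    path = [(x, y)]
    for length, heading in steps:
        x += length * math.sin(heading)  # east component
        y += length * math.cos(heading)  # north component
        path.append((x, y))
    return path


# Hypothetical walk: one step north, then one step east
path = pdr_track((0.0, 0.0), [(1.0, 0.0), (1.0, math.pi / 2)])
```

Because every position depends on all earlier steps, a small constant heading or step-length bias is never corrected by later steps, which is why an external fix such as map matching or image-based localization is needed in open areas.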
Event recognition from still images is one of the most important problems for image understanding. However, compared with object recognition and scene recognition, event recognition has received much less research attention in the computer vision community. This paper addresses the problem of cultural event recognition in still images and focuses on applying deep learning methods to this problem. In particular, we utilize the successful architecture of Object-Scene Convolutional Neural Networks (OS-CNNs) to perform event recognition. OS-CNNs are composed of object nets and scene nets, which...
This work aims at improving instance retrieval with self-supervision. We find that fine-tuning using the recently developed self-supervised learning (SSL) methods, such as SimCLR and MoCo, fails to improve the performance of instance retrieval. In this work, we identify that the learnt representations for instance retrieval should be invariant to large variations in viewpoint, background, etc., whereas the self-augmented positives applied by current SSL methods cannot provide strong enough signals for learning robust instance-level representations. To...
In this work, we propose Knowledge Integration Networks (referred to as KINet) for video action recognition. KINet is capable of aggregating meaningful context features which are of great importance to identifying an action, such as human information and scene context. We design a three-branch architecture consisting of a main branch for action recognition and two auxiliary branches for human parsing and scene recognition, which allow the model to encode the knowledge of human and scene for action recognition. We explore pre-trained models as teacher networks to distill the knowledge for training the auxiliary tasks of KINet....
The existing few-shot video classification methods often employ a meta-learning paradigm by designing a customized temporal alignment module for similarity calculation. While significant progress has been made, these methods fail to focus on learning effective representations and rely heavily on ImageNet pre-training, which might be unreasonable for the few-shot recognition setting due to semantics overlap. In this paper, we aim to present an in-depth study on few-shot video classification by making three contributions. First, we perform a consistent comparative...
Weakly-supervised instance segmentation aims to detect and segment object instances precisely, given image-level labels only. Unlike previous methods, which are composed of multiple offline stages, we propose Sequential Label Propagation and Enhancement Networks (referred to as Label-PEnet) that progressively transform image-level labels to pixel-wise labels in a coarse-to-fine manner. We design four cascaded modules including multi-label classification, object detection, instance refinement and instance segmentation, which are implemented sequentially by...
The growing use of infrared (IR) imaging systems places increasing demands on simulating infrared images of real scenes. Utilizing images captured from unmanned aerial vehicles (UAVs), we propose a semi-automatic pipeline to generate large-scale IR urban scenes in the form of levels of detail (LODs). It significantly reduces the cost of labor and time while providing detailed structures. Starting from surface meshes generated by multi-view stereo (MVS) systems, we produce watertight LODs via semantic segmentation and structure-aware...