- Advanced Image and Video Retrieval Techniques
- Domain Adaptation and Few-Shot Learning
- Advanced Neural Network Applications
- Multimodal Machine Learning Applications
- Video Surveillance and Tracking Methods
- Face and Expression Recognition
- Image Retrieval and Classification Techniques
- Image Enhancement Techniques
- Anomaly Detection Techniques and Applications
- Human Pose and Action Recognition
- Advanced MRI Techniques and Applications
- Advanced Vision and Imaging
- Advanced Image Processing Techniques
- Video Analysis and Summarization
- Visual Attention and Saliency Detection
- Remote-Sensing Image Classification
- Medical Imaging Techniques and Applications
- COVID-19 diagnosis using AI
- Advanced Image Fusion Techniques
- Advanced Memory and Neural Computing
- Image Processing Techniques and Applications
- Machine Learning and ELM
- Gait Recognition and Analysis
- Industrial Vision Systems and Defect Detection
- EEG and Brain-Computer Interfaces
Tianjin University
2016-2025
Beijing Academy of Artificial Intelligence
2023-2024
Shanghai Artificial Intelligence Laboratory
2023-2024
University of Warwick
2023
Shanghai Center for Brain Science and Brain-Inspired Technology
2022-2023
Inception Institute of Artificial Intelligence
2019
Nokia (China)
2012
Birkbeck, University of London
2008
University of Science and Technology of China
2003-2006
Institute of Automation
2006
Images captured under water are usually degraded due to the effects of absorption and scattering. Degraded underwater images show some limitations when they are used for display and analysis. For example, underwater images with low contrast and color cast decrease the accuracy rate of underwater object detection and marine biology recognition. To overcome those limitations, a systematic underwater image enhancement method, which includes an underwater image dehazing algorithm and a contrast enhancement algorithm, is proposed. Built on a minimum information loss principle, an effective underwater image dehazing algorithm is proposed to restore...
This paper addresses the problem of supervised video summarization by formulating it as a sequence-to-sequence learning problem, where the input is a sequence of original video frames and the output is a keyshot sequence. Our key idea is to learn a deep summarization network with an attention mechanism to mimic the way humans select keyshots. To this end, we propose a novel video summarization framework named attentive encoder-decoder networks for video summarization (AVS), in which the encoder uses a bidirectional long short-term memory (BiLSTM) to encode the contextual information among...
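The attention step mentioned in the abstract above can be illustrated with a minimal sketch. It assumes dot-product scoring and plain Python lists for the encoder states; the function names `softmax` and `attend` are hypothetical, and this is not the AVS model itself, only the general idea of weighting encoded frames by their relevance to a decoder query.

```python
import math
from typing import List, Tuple


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: List[float],
           encoder_states: List[List[float]]) -> Tuple[List[float], List[float]]:
    """Return (attention weights, context vector) for one decoder query.

    Assumption: relevance is scored with a plain dot product.
    """
    scores = [sum(q * h for q, h in zip(query, state)) for state in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    # The context vector is the attention-weighted sum of the encoder states.
    context = [
        sum(w * state[d] for w, state in zip(weights, encoder_states))
        for d in range(dim)
    ]
    return weights, context


# Three toy "frame encodings" and one decoder query.
states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, context = attend([1.0, 0.0], states)
```

Frames whose encoding aligns with the query receive larger weights, so they contribute more to the context vector used when predicting the next keyshot.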
Travel route planning is an important step for a tourist to prepare his/her trip. As a common scenario, a tourist usually asks the following questions when he/she is planning a trip in an unfamiliar place: 1) Are there any travel route suggestions for a one-day or three-day trip in Beijing? 2) What is the most popular travel path within the Forbidden City? To facilitate a tourist's trip planning, in this paper, we target solving the problem of automatic travel route planning. We propose to leverage existing travel clues recovered from 20 million geo-tagged photos collected from www.panoramio.com...
Convolutional Neural Network (CNN) based methods generally take crowd counting as a regression task by outputting crowd densities. They learn the mapping between image contents and crowd density distributions. Though having achieved promising results, these data-driven counting networks are prone to overestimate or underestimate people counts of regions with different density patterns, which degrades the whole count accuracy. To overcome this problem, we propose an approach to alleviate the counting performance differences in different regions....
Pedestrian detection relying on deep convolution neural networks has made significant progress. Though promising results have been achieved on standard pedestrians, the performance on heavily occluded pedestrians remains far from satisfactory. The main culprits are intra-class occlusions involving other pedestrians and inter-class occlusions caused by other objects, such as cars and bicycles. These result in a multitude of occlusion patterns. We propose an approach for occluded pedestrian detection with the following contributions. First, we introduce a novel...
Network in network (NiN) is an effective instance and an important extension of the deep convolutional neural network consisting of alternating convolutional layers and pooling layers. Instead of using a linear filter for convolution, NiN utilizes a shallow multilayer perceptron (MLP), a nonlinear function, to replace the linear filter. Because of the power of MLP and convolutions in the spatial domain, NiN has a stronger ability of feature representation and hence results in better recognition performance. However, MLP itself consists of fully connected layers that give rise to a large...
We propose a novel two-stage detection method, D2Det, that collectively addresses both precise localization and accurate classification. For precise localization, we introduce a dense local regression that predicts multiple dense box offsets for an object proposal. Different from traditional regression and keypoint-based localization employed in two-stage detectors, our dense local regression is not limited to a quantized set of keypoints within a fixed region and has the ability to regress position-sensitive real number dense offsets, leading to more precise localization. The dense local regression is further improved by...
The cerebellum plays a vital role in motor learning and control with supervised learning capability, while neuromorphic engineering devises diverse approaches to high-performance computation inspired by biological neural systems. This article presents a large-scale cerebellar network model for supervised learning, as well as a cerebellum-inspired neuromorphic architecture to map the cerebellar anatomical structure into the large-scale model. Our multinucleus model and its underpinning architecture contain approximately 3.5 million neurons, upscaling state-of-the-art designs by over 34...
Pedestrian detection is an important but challenging problem in computer vision, especially in human-centric tasks. Over the past decade, significant improvement has been witnessed with the help of handcrafted features and deep features. Here we present a comprehensive survey on recent advances in pedestrian detection. First, we provide a detailed review of single-spectral pedestrian detection that includes handcrafted features based methods and deep features based approaches. For handcrafted features based methods, we present an extensive review of approaches and find that handcrafted features with large freedom degrees in shape and space have better performance....
Dense captioning generates more detailed spoken descriptions for complex visual scenes. Despite several promising leads, existing methods still have two broad limitations: 1) the vast majority of prior arts only consider visual contextual clues during captioning but ignore potentially important textual context; 2) current imbalanced learning mechanisms limit the diversity of vocabulary learned from the dictionary, thus giving rise to low language-learning efficiency. To alleviate these gaps, in this paper, we...
Dense captioning creates diverse Regions of Interest (RoIs) descriptions for complex visual scenes. While promising results have been obtained, several issues persist. In particular: 1) it is hard to find the optimal parameters for artificially designed modules (e.g., non-maximum suppression (NMS)), causing redundancies and fewer interactions to benefit the two sub-tasks of RoI detection and RoI captioning; 2) the absence of a multi-scale decoder in current methods hinders the acquisition of scale-invariant features, thus...
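To make the point about hand-tuned modules concrete, here is a minimal sketch of greedy non-maximum suppression in plain Python. The function names `iou` and `nms`, the `(x1, y1, x2, y2)` box format, and the example thresholds are assumptions for illustration; this is generic NMS, not code from the paper.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed format


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], scores: List[float], iou_threshold: float) -> List[int]:
    """Greedy NMS: keep the best box, drop boxes overlapping a kept box too much."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept_strict = nms(boxes, scores, iou_threshold=0.5)  # overlapping box is suppressed
kept_loose = nms(boxes, scores, iou_threshold=0.7)   # overlapping box survives
```

The first two boxes overlap with an IoU of about 0.68, so the result flips depending on whether the threshold sits at 0.5 or 0.7. That sensitivity to a single hand-picked value is the kind of parameter-tuning difficulty the abstract refers to.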
As a fundamental and challenging task in bridging language and vision domains, Image-Text Retrieval (ITR) aims at searching for the target instances that are semantically relevant to the given query from the other modality, and its key challenge is to measure the semantic similarity across different modalities. Although significant progress has been achieved, existing approaches typically suffer from two major limitations: (1) It hurts the accuracy of the representation by directly exploiting the bottom-up attention based...
Tensor analysis plays an important role in modern image and vision computing problems. Most of the existing tensor analysis approaches are based on the Frobenius norm, which makes them sensitive to outliers. In this paper, we propose L1-norm-based tensor analysis (TPCA-L1), which is robust to outliers. Experimental results upon face and other datasets demonstrate the advantages of the proposed approach.
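The outlier sensitivity mentioned above can be seen in a one-dimensional analogy, sketched here with Python's standard `statistics` module. This is not TPCA-L1; it only relies on two standard facts: the value minimizing the sum of squared deviations (the scalar counterpart of a Frobenius-norm objective) is the mean, and the value minimizing the sum of absolute deviations (an L1 objective) is the median. The sample values are made up for illustration.

```python
import statistics

# Four well-behaved samples near 1.0 and a single gross outlier.
samples = [1.0, 1.1, 0.9, 1.0, 100.0]

# Minimizer of sum((x - c) ** 2): the mean, dragged far away by the outlier.
l2_estimate = statistics.mean(samples)

# Minimizer of sum(abs(x - c)): the median, which stays with the bulk of the data.
l1_estimate = statistics.median(samples)

print(f"squared-error estimate:  {l2_estimate:.2f}")
print(f"absolute-error estimate: {l1_estimate:.2f}")
```

Squaring the residual lets one corrupted sample dominate the objective, whereas the absolute value grows only linearly, which is the intuition behind replacing the Frobenius norm with the L1 norm.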
With the prosperity of tourism and Web 2.0 technologies, more and more people have willingness to share their travel experiences on the Web (e.g., weblogs, forums, or Web 2.0 communities). These so-called travelogues contain rich information, particularly including location-representative knowledge such as attractions (e.g., Golden Gate Bridge), styles (e.g., beach, history), and activities (e.g., diving, surfing). The location-representative information in travelogues can greatly facilitate other tourists' trip planning, if it can be correctly extracted and summarized. However,...
Restoring an underwater image from a single image is known to be ill-posed, and some assumptions made in previous methods are not suitable for many situations. In this paper, we propose a method based on blue-green channels dehazing and red channel correction for underwater image restoration. Firstly, the blue-green channels are recovered via a dehazing algorithm based on an extension and modification of the Dark Channel Prior algorithm. Then, the red channel is corrected following the Gray-World assumption theory. Finally, in order to resolve the problem in which some recovered image regions may look too dim or too bright, an adaptive...
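The red channel correction step can be sketched under the Gray-World assumption, which holds that the channel averages of a natural scene should be roughly equal. The sketch below assumes pixels are `(r, g, b)` tuples in `[0, 1]` and rescales red so its mean matches the average of the green and blue means. Both the target choice and the function names `gray_world_red_gain` and `correct_red` are illustrative assumptions, not the exact formula from the paper.

```python
from typing import List, Tuple

Pixel = Tuple[float, float, float]  # (r, g, b) in [0, 1], assumed format


def gray_world_red_gain(pixels: List[Pixel]) -> float:
    """Gain that moves the red mean to the average of the green and blue means."""
    n = len(pixels)
    mean_r = sum(p[0] for p in pixels) / n
    mean_g = sum(p[1] for p in pixels) / n
    mean_b = sum(p[2] for p in pixels) / n
    target = (mean_g + mean_b) / 2.0  # assumed target, for illustration only
    return target / mean_r if mean_r > 0 else 1.0


def correct_red(pixels: List[Pixel]) -> List[Pixel]:
    """Scale the red channel by the Gray-World gain, clipping to [0, 1]."""
    gain = gray_world_red_gain(pixels)
    return [(min(1.0, r * gain), g, b) for r, g, b in pixels]


# Red is strongly attenuated, as is typical under water.
image = [(0.1, 0.4, 0.6), (0.2, 0.6, 0.4)]
corrected = correct_red(image)
corrected_red_mean = sum(p[0] for p in corrected) / len(corrected)
```

Green and blue are left untouched here because the abstract describes recovering them separately through dehazing; only the attenuated red channel is rebalanced.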
This work proposes to combine neural networks with the compositional hierarchy of human bodies for efficient and complete human parsing. We formulate the approach as a neural information fusion framework. Our model assembles the information from three inference processes over the hierarchy: direct inference (directly predicting each part of a human body using image information), bottom-up inference (assembling knowledge from constituent parts), and top-down inference (leveraging context from parent nodes). The bottom-up and top-down inferences explicitly model the compositional and decompositional relations in human bodies, respectively....
Human-object interaction detection is an important and relatively new class of visual relationship detection tasks, essential for deeper scene understanding. Most existing approaches decompose the problem into object localization and interaction recognition. Despite showing progress, these approaches only rely on the appearances of humans and objects and overlook the available context information, crucial for capturing subtle interactions between them. We propose a contextual attention framework for human-object interaction detection. Our approach leverages context by...
Aggregating multi-level features is essential for capturing multi-scale context information for precise scene semantic segmentation. However, the improvement by directly fusing shallow features and deep features becomes limited as the semantic gap between them increases. To solve this problem, we explore two strategies for robust feature fusion. One is enhancing shallow features using a semantic enhancement module (SeEM) to alleviate the semantic gap between shallow features and deep features. The other strategy is feature attention, which involves discovering complementary information (i.e., boundary information) from low-level...