- Advanced Image and Video Retrieval Techniques
- Video Surveillance and Tracking Methods
- Multimodal Machine Learning Applications
- Video Analysis and Summarization
- Human Pose and Action Recognition
- Advanced Vision and Imaging
- Topic Modeling
- Image Retrieval and Classification Techniques
- Advanced Image Processing Techniques
- Domain Adaptation and Few-Shot Learning
- Recommender Systems and Techniques
- Anomaly Detection Techniques and Applications
- Advanced Neural Network Applications
- Advanced Graph Neural Networks
- Image Processing Techniques and Applications
- Data Management and Algorithms
- Face Recognition and Analysis
- Image and Signal Denoising Methods
- Gait Recognition and Analysis
- Reinforcement Learning in Robotics
- Automated Road and Building Extraction
- Natural Language Processing Techniques
- Robotics and Sensor-Based Localization
- Advanced Algorithms and Applications
- Privacy-Preserving Technologies in Data
University of Electronic Science and Technology of China
2016-2025
Nanjing Drum Tower Hospital
2025
Xihua University
2012-2024
Shanghai University of Electric Power
2010-2024
Children's Hospital of Zhejiang University
2022-2024
South China Agricultural University
2024
Shenzhen Institute of Information Technology
2024
China Agricultural University
2021-2024
Huawei Technologies (China)
2021-2024
China Southern Power Grid (China)
2024
Fine-grained categorization, which aims to distinguish subordinate-level categories such as bird species or dog breeds, is an extremely challenging task. This is due to two main issues: how to localize discriminative regions for recognition and how to learn sophisticated features for representation. Neither of them is easy to handle if there is insufficient labeled data. We leverage the fact that an object already has other labels in its ontology tree. These "free" labels can be used to train a series of CNN-based classifiers, each...
Anomaly detection has a wide range of applications in the security area, such as network monitoring and smart city/campus construction. It has become an active research issue of great concern in recent years. However, most algorithms in the existing studies are powerless for large-scale and high-dimensional data, and the intermediate data extracted by some methods that can handle such data will consume lots of storage space. In this paper, we propose a novel sparse representation framework that learns dictionaries based on the latent space...
With the massive growth of social events on the Internet, it has become more and more difficult to exactly find and organize interesting events from massive social media data, which is useful for users or governments to browse, search, and monitor social events. To deal with this problem, we propose a novel multi-modal social event tracking and evolution framework to not only effectively capture the multi-modal topics of social events, but also obtain the evolutionary trends of social events and generate effective event summary details over time. To achieve this goal, we propose a novel multi-modal event topic model (mmETM), which can model social media documents, including long text...
Hashing has shown its efficiency and effectiveness in facilitating large-scale multimedia applications. Supervised knowledge (e.g., semantic labels or pair-wise relationships) associated with data is capable of significantly improving the quality of hash codes and hash functions. However, confronted with the rapid growth of newly-emerging concepts on the Web, existing supervised hashing approaches may easily suffer from the scarcity and validity of supervised information due to the expensive cost of manual labelling. In this paper, we...
Recently, remote sensing images have become increasingly popular in a number of tasks, such as environmental monitoring. However, the observed images from satellite sensors often suffer from low resolution (LR), making it difficult to meet the requirements for further analysis. Super-resolution (SR) aims to increase the image resolution while providing finer spatial details, which perfectly remedies the weakness of remote sensing images. Therefore, in this article, we propose an innovative mixed high-order attention network (MHAN) for remote sensing SR. It...
While the recent tree-based neural models have demonstrated promising results in generating solution expressions for the math word problem (MWP), most of these models do not capture the relationships and order information among the quantities well. This results in poor quantity representations and incorrect solution expressions. In this paper, we propose Graph2Tree, a novel deep learning architecture that combines the merits of the graph-based encoder and tree-based decoder to generate better solution expressions. Included in our Graph2Tree framework are two graphs, namely the Quantity...
The goal of cross-view image matching based on geo-localization is to determine the location of a given ground-view image (front view) by matching it with a group of satellite-view images (vertical view) with geographic tags. The rapid development of unmanned aerial vehicle (UAV) technology in recent years has provided a real viewpoint close to 45 degrees (oblique view) to bridge the visual gap between views. However, existing methods ignore the direct geometric space correspondence of UAV-satellite views, and only use brute force for feature...
This paper studies the context aggregation problem in semantic image segmentation. Existing research focuses on improving pixel representations by aggregating contextual information within individual images. Though impressive, these methods neglect the significance of the representations of pixels of the corresponding class beyond the input image. To address this, this paper proposes to mine the contextual information beyond individual images to further augment the pixel representations. We first set up a feature memory module, which is updated dynamically during training, to store...
Math word problem (MWP) solving faces a dilemma in number representation learning. In order to avoid the number representation issue and reduce the search space of feasible solutions, existing works striving for MWP solving usually replace real numbers with symbolic placeholders to focus on logic reasoning. However, different from common symbolic reasoning tasks like program synthesis and knowledge graph reasoning, MWP solving has extra requirements in numerical reasoning. In other words, instead of the number value itself, it is the reusable numerical property that matters more in numerical reasoning. Therefore, we...
3D object detection in autonomous driving systems perceives the surrounding environment and is the foundation for autonomous driving. Due to the sparsity inherent in point clouds in autonomous driving scenarios, LiDAR-based 3D object detection often fails to distinguish distant objects effectively. Addressing the issue of point cloud sparsity will enhance the detection range in autonomous driving scenarios. Pseudo point clouds have been used to enhance the ability of deep learning models to detect points. However, this approach has several shortcomings. In this paper, we propose a curbed fake point collector (CFPC), which addresses three issues caused by pseudo...
We investigated the effects of increasing soil NaCl concentration on the intracellular compartmentalization of salt and on the activities of antioxidant enzymes (superoxide dismutase (SOD), ascorbic peroxidase (APX), catalase (CAT) and glutathione reductase (GR)) and their role in the regulation of reactive oxygen species (ROS; O2− and H2O2) in leaves and xylem sap of salt-tolerant Populus euphratica Oliv. and salt-sensitive P. popularis cv. 35-44. Mesophyll cells of P. euphratica exhibited a high capacity for NaCl exclusion and compartmentalization in vacuoles compared with P. popularis. In...
Human gesture recognition is one of the central research fields of computer vision, and effective gesture recognition is still challenging up to now. In this paper, we present a pyramidal 3D convolutional network framework for large-scale isolated human gesture recognition. 3D convolutional networks are utilized to learn the spatiotemporal features from gesture video files. Pyramid input is proposed to preserve the multi-scale contextual information of gestures, and each pyramid segment is uniformly sampled with temporal jitter. Pyramid fusion layers are inserted into the 3D convolutional networks to fuse the features of pyramid input. This...
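The phrase "each pyramid segment uniformly sampled with temporal jitter" can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the pyramid is laid out, so the halving scheme in `pyramid_segments` and both function names are assumptions made only for illustration.

```python
import random
from typing import List, Optional, Tuple


def pyramid_segments(num_frames: int, levels: int) -> List[Tuple[int, int]]:
    """Return (start, end) frame ranges for a temporal pyramid.

    Assumed layout (not taken from the paper): level k splits the
    video into 2**k equal, non-overlapping segments.
    """
    if levels <= 0 or num_frames < 2 ** (levels - 1):
        raise ValueError("video too short for the requested pyramid depth")
    segments = []
    for level in range(levels):
        parts = 2 ** level
        for p in range(parts):
            start = (p * num_frames) // parts
            end = ((p + 1) * num_frames) // parts
            segments.append((start, end))
    return segments


def sample_with_jitter(start: int, end: int, num_samples: int,
                       rng: Optional[random.Random] = None) -> List[int]:
    """Uniform sampling with temporal jitter over frames [start, end).

    The range is cut into num_samples equal bins and one frame index
    is drawn at random inside each bin.
    """
    if end <= start or num_samples <= 0:
        raise ValueError("empty segment or non-positive sample count")
    rng = rng or random.Random()
    length = end - start
    indices = []
    for i in range(num_samples):
        lo = start + (i * length) // num_samples
        hi = start + ((i + 1) * length) // num_samples
        hi = max(hi, lo + 1)  # keep the bin non-empty for short segments
        indices.append(rng.randrange(lo, hi))
    return indices


if __name__ == "__main__":
    rng = random.Random(0)
    for seg_start, seg_end in pyramid_segments(num_frames=64, levels=3):
        print((seg_start, seg_end), sample_with_jitter(seg_start, seg_end, 8, rng))
```

Drawing one index per bin keeps the samples spread over the whole segment, while the random offset inside each bin means repeated passes over the same video see slightly different frames.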
With the advance of various location-acquisition technologies, a myriad of GPS trajectories can be collected every day. However, the raw coordinate data captured by sensors often cannot reflect real positions due to many physical constraints and some rules of law. How to accurately match GPS trajectories to roads on a digital map is an important issue. The problem of map-matching is fundamental for many applications. Unfortunately, many existing methods still cannot meet stringent performance requirements in engineering. In particular, low/unstable...
The current research focus on Content-Based Video Retrieval requires higher-level video representation describing the long-range semantic dependencies of relevant incidents, events, etc. However, existing methods commonly process the frames of a video as individual images or short clips, making the modeling of long-range semantic dependencies difficult. In this paper, we propose TCA (Temporal Context Aggregation for Video Retrieval), a video representation learning framework that incorporates long-range temporal information between frame-level features using the self-attention...
This paper studies the problem of learning self-supervised representations on videos. In contrast to the image modality that only requires appearance information on objects or scenes, video needs to further explore the relations between multiple frames/clips along the temporal dimension. However, the recently proposed contrastive-based self-supervised frameworks do not grasp such relations explicitly since they simply utilize two augmented clips from the same video and compare their distance without referring to their temporal relation. To address this, we present a...
Fully mining visual cues to aid in content understanding is crucial for video captioning. However, most state-of-the-art video captioning methods are limited to generating captions purely based on straightforward information while ignoring the scenario and context information. To fill the gap, we propose a novel, simple but effective scenario-aware recurrent transformer (SART) model to execute video captioning. Our model contains a “scenario understanding” module to obtain a global perspective across multiple frames, providing a specific...