- Advanced Image and Video Retrieval Techniques
- Video Surveillance and Tracking Methods
- Advanced Neural Network Applications
- Image Retrieval and Classification Techniques
- Multimodal Machine Learning Applications
- Domain Adaptation and Few-Shot Learning
- Face recognition and analysis
- Video Analysis and Summarization
- Human Pose and Action Recognition
- Handwritten Text Recognition Techniques
- Image Processing and 3D Reconstruction
- Adversarial Robustness in Machine Learning
- Generative Adversarial Networks and Image Synthesis
- Gait Recognition and Analysis
- Digital Media Forensic Detection
- Robotics and Sensor-Based Localization
- Optical measurement and interference techniques
- Automated Road and Building Extraction
- Machine Learning and Data Classification
- Physical Unclonable Functions (PUFs) and Hardware Security
- Brain Tumor Detection and Classification
- AI in cancer detection
- Anomaly Detection Techniques and Applications
- Advanced Vision and Imaging
- Advanced Malware Detection Techniques
Chinese Academy of Sciences
2018-2025
Institute of Information Engineering
2018-2025
University of Chinese Academy of Sciences
2017-2019
Hashing has shown great potential in large-scale image retrieval due to its storage and computation efficiency, especially the recent deep supervised hashing methods. To achieve promising performance, these methods require a large amount of training data from different classes. However, when images of new categories emerge, existing methods have to retrain the CNN model and generate hash codes for all database images again, which is impractical for a large-scale retrieval system. In this paper, we propose a novel framework, called Deep Incremental Network...
Deep hashing methods have achieved tremendous success in cross-modal retrieval, due to their low storage consumption and fast retrieval speed. In real applications, it is hard to obtain label information. Recently, increasing attention has been paid to unsupervised hashing. However, existing methods fail to exploit the intrinsic connections between images and their corresponding descriptions or tags (text modality). In this paper, we propose a novel Deep Semantic-Alignment Hashing (DSAH) method for unsupervised cross-modal retrieval, which sufficiently utilizes...
One of the most challenging tasks in large-scale multi-label image retrieval is to map images into binary codes while preserving multilevel semantic similarity. Recently, several deep supervised hashing methods have been proposed to learn hash functions that preserve this similarity with convolutional neural networks. However, these triplet-label-based methods try to preserve the ranking order of images according to their similarity degrees to the queries, while not putting direct constraints on the distance between the codes of very similar images. Besides, current evaluation...
Real-world data often follows a long-tailed distribution, which makes the performance of existing classification algorithms degrade heavily. A key issue is that samples in tail categories fail to depict their intra-class diversity. Humans can imagine a sample in new poses, scenes, and view angles with their prior knowledge, even if it is the first time they see this category. Inspired by this, we propose a novel reasoning-based implicit semantic augmentation method to borrow transformation directions from other classes....
Large-scale data from the real world usually follow a long-tailed distribution (i.e., a few majority classes occupy plentiful training data, while most minority classes have few samples), making the hyperplanes heavily skewed toward certain classes. Traditionally, reweighting is adopted to make the hyperplanes fairly split the feature space, where the weights are designed according to the number of samples. However, we find that the number of samples in a class cannot accurately measure the size of its spanned space, especially for a class whose spanned space is larger than its sample count suggests because of high...
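The traditional count-based reweighting that the abstract above takes as its starting point can be sketched as follows. This is only a minimal illustration of inverse-frequency class weights, not the method proposed in the paper; the function name `inverse_frequency_weights` and the normalization (weights average to 1 over classes) are assumptions made for this example.

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Map each class to a weight proportional to 1 / (its sample count).

    Hypothetical helper: weights are rescaled so that they average to 1
    across classes, which is one common convention among several.
    """
    counts = Counter(labels)
    raw = {cls: 1.0 / n for cls, n in counts.items()}
    scale = len(raw) / sum(raw.values())
    return {cls: w * scale for cls, w in raw.items()}

# Toy long-tailed label set: one head class and one tail class.
labels = ["head"] * 90 + ["tail"] * 10
weights = inverse_frequency_weights(labels)
```

Weights of this kind depend only on sample counts, which is exactly the quantity the abstract argues is an imperfect proxy for the size of the feature space a class spans.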
Video-text retrieval is an emerging stream in both the computer vision and natural language processing communities, which aims to find relevant videos given text queries. In this paper, we study the notoriously challenging task of Unsupervised Domain Adaptation Video-text Retrieval (UDAVR), wherein training and testing data come from different distributions. Previous works merely alleviate the domain shift, but overlook the pairwise misalignment issue in the target domain, where there exist no semantic relationships...
Existing video self-supervised learning methods mainly rely on trimmed videos for model training. They apply their methods and verify the effectiveness on trimmed video datasets including UCF101 and Kinetics-400, among others. However, trimmed videos are manually annotated from untrimmed videos. In this sense, these methods are not truly unsupervised. In this article, we propose a novel self-supervised method, referred to as Exploring Relations in Untrimmed Videos (ERUV), which can be straightforwardly applied to untrimmed (real unlabeled) videos to learn spatio-temporal features. ERUV first...
Privacy information contained in scene text will be leaked with the spread of images in cyberspace. Vanishing the scene text from the image is a simple yet effective method to prevent privacy disclosure to both machines and humans. Previous visual text vanishing methods have achieved promising results, but their performance still falls short of expectations for complicated-shape texts at various scales. In this paper, we propose a novel hierarchical context-aware interaction reconstruction method to make the text vanish from the natural image. To avoid interference...
Hashing has become increasingly important for large-scale image retrieval. Recently, deep supervised hashing has shown promising performance, yet little work has been done under the more realistic unsupervised setting. The most challenging problem in unsupervised hashing methods is the lack of supervised information. Besides, existing methods fail to distinguish image pairs with different similarity degrees, which leads to a suboptimal construction of the similarity matrix. In this paper, we propose a simple yet effective method, dubbed Deep Unsupervised Hybrid-similarity...
Due to its great success in object detection and instance segmentation, Mask R-CNN attracts great attention and is widely adopted as a strong baseline for arbitrary-shaped scene text detection and spotting. However, two issues remain to be settled. The first is the dense text case, which is easily neglected but quite practical. There may exist multiple instances in one proposal, which makes it difficult for the mask head to distinguish different instances and degrades the performance. In this work, we argue that the performance degradation results from learning...
As a basic component in multimedia applications, object detectors are generally trained on a fixed set of classes that are pre-defined. However, new classes often emerge after the models are trained in practice. Modern object detectors based on Convolutional Neural Networks (CNN) suffer from catastrophic forgetting when fine-tuning on new classes without the original training data. Therefore, it is critical to improve the incremental learning capability of object detection. In this article, we propose a novel Residual-Distillation-based Incremental learning method for Object Detection...
Accurate detection of multi-oriented text, which accounts for a large proportion of text in real practice, is of great significance. The performance has improved rapidly on common benchmarks in recent years. However, the dense long text case and the quality of detection are easily overlooked. Direct regression methods may produce low-quality and incomplete detections due to the constraint of the receptive field; proposal-based methods could alleviate this but might introduce redundant context through the RoI operation, degrading the performance. To address this dilemma,...
The research focus of scene text detection and recognition has shifted to arbitrary-shape text in recent years, where the text shape representation is a fundamental problem. In our opinion, an ideal representation should be compact, complete, efficient, and reusable for subsequent recognition. However, previous representations have flaws in one or more aspects. Thin-Plate-Spline (TPS) transformation has achieved great success in scene text recognition. Inspired by this, we reverse its usage and take TPS as an exquisite representation....
Hashing has been drawing increasing attention in the task of large-scale image retrieval owing to its storage and computation efficiency, especially the recent asymmetric deep hashing methods. These approaches treat the query and database in an asymmetric way and can take full advantage of the whole training data. Though they have achieved state-of-the-art performance, asymmetric deep hashing methods still suffer from large quantization error and an efficiency problem on large-scale datasets due to the tight coupling between the query and database. In this article, we propose a novel method,...
Recent advances have shown that few-shot learning may be a promising way to alleviate the data reliance of remote sensing image scene classification. However, most existing works focus on extracting distinguishable features only from the visual modality, while the problem of learning knowledge from multiple modalities has barely been visited. In this work, we propose a text-aware framework for few-shot remote sensing image scene classification (TeAw). Specifically, TeAw converts class names to more detailed text descriptions and extracts features using pre-trained...
Deep incremental hashing methods require a large number of original training samples to preserve old knowledge. However, the original training samples are not always available. This "data-free" setting poses great challenges for learning discriminative codes for new classes (plasticity) and maintaining code invariance for old ones (stability). On the one hand, the presence of ambiguous data in new-emerging classes, which is highly similar to that of old classes, further aggravates catastrophic forgetting. On the other hand, although well-separated hash codes can be learned by...
Incremental learning for person re-identification (ReID) aims to develop models that can be trained with a continuous data stream, which is a more practical setting for real-world applications. However, the existing incremental ReID methods make two strong assumptions: that the cameras are fixed and that the new-emerging data is class-disjoint from previous classes. This is unrealistic, as previously observed pedestrians may re-appear and be captured again by new cameras. In this paper, we investigate person ReID in an unexplored scenario named...
Neural Radiance Fields (NeRFs) have demonstrated impressive performance in vision and graphics tasks, such as novel view synthesis and immersive reality. However, the shape-radiance ambiguity of radiance fields remains a challenge, especially in the sparse viewpoints setting. Recent work resorts to integrating depth priors into outdoor NeRF training to alleviate the issue. However, the criteria for selecting depth priors and the relative merits of different priors have not been thoroughly investigated. Moreover, the choice among approaches for using depth priors is also an unexplored problem....
Modern object detection methods based on convolutional neural networks suffer from severe catastrophic forgetting when learning new classes without the original data. Due to time consumption, storage burden, and privacy of old data, it is inadvisable to train the model from scratch with both old and new data when new classes emerge after the model is trained. In this paper, we propose a novel incremental object detector based on Faster R-CNN to continuously learn new classes without using old data. It is a triple network where an old model and a residual model act as assistants, helping the incremental model learn new classes without forgetting the previously learned knowledge. To better...
Benefiting from recent advances in deep learning, deep hashing methods have achieved promising performance in large-scale image retrieval. To improve storage and computational efficiency, existing hash codes need to be compressed accordingly. However, previous methods have to retrain their models and then regenerate the whole database codes using the new models when the code length changes, which is time consuming, especially for large databases. In this paper, we propose a novel method, called Code Compression oriented Deep Hashing (CCDH),...
Hashing has become increasingly important for large-scale image retrieval, for which low storage cost and fast searching are two key properties. However, existing deep hashing methods adopt large neural networks, which are hard to deploy on resource-limited devices due to unacceptable memory and runtime overhead. We argue that this huge overhead of neural networks somewhat violates the appealing properties of hashing. In this paper, we propose a novel deep hashing method, called Binary Neural Network Hashing (BNNH), for fast image retrieval. Specifically,...
Deep hashing methods have achieved promising results for large-scale image retrieval recently. To accelerate the subsequent Hamming ranking process, the multi-index approach has been proposed to reduce the computations of Hamming distance. However, the binary codes output by previous deep hashing methods may not be optimally compatible with the multi-index approach. In this paper, we present a novel Deep Index-Compatible Hashing (DICH) method for fast image retrieval, which can learn similarity-preserving binary codes that are more compatible with the multi-index approach. With the learned codes, both the size of...
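The multi-index idea mentioned in the abstract above can be illustrated with a small sketch. It is not the DICH method itself: it only shows the generic pigeonhole argument that, if a code is split into m substrings and two codes differ in fewer than m bits, at least one substring must match exactly, so exact-match hash tables can produce a short candidate list before full Hamming distances are computed. All function names here are hypothetical, and the search is deliberately restricted to radii smaller than m.

```python
from collections import defaultdict

def hamming(a, b):
    """Hamming distance between two binary codes stored as Python ints."""
    return bin(a ^ b).count("1")

def split(code, n_bits, m):
    """Split an n_bits-bit code into m equal-width substrings (low bits first)."""
    width = n_bits // m
    mask = (1 << width) - 1
    return [(code >> (i * width)) & mask for i in range(m)]

def build_index(database, n_bits, m):
    """Build one hash table per substring position: substring -> database ids."""
    tables = [defaultdict(list) for _ in range(m)]
    for idx, code in enumerate(database):
        for table, sub in zip(tables, split(code, n_bits, m)):
            table[sub].append(idx)
    return tables

def search(query, database, tables, n_bits, m, radius):
    """Return ids within `radius` bits of `query`, nearest first.

    Only valid for radius < m, where the pigeonhole argument guarantees
    that every true neighbor shares at least one exact substring.
    """
    if radius >= m:
        raise ValueError("this sketch only supports radius < m")
    candidates = set()
    for table, sub in zip(tables, split(query, n_bits, m)):
        candidates.update(table.get(sub, []))
    scored = sorted((hamming(query, database[i]), i) for i in candidates)
    return [i for dist, i in scored if dist <= radius]

# Toy 8-bit database split into m = 4 substrings of 2 bits each.
database = [0b10110010, 0b10110011, 0b01001101, 0b10100010]
tables = build_index(database, n_bits=8, m=4)
result = search(0b10110010, database, tables, n_bits=8, m=4, radius=1)
```

In this toy run the third code shares no substring with the query, so it is never compared in full. The fewer codes collide in each table, the shorter the candidate list becomes.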