- Advanced Image and Video Retrieval Techniques
- Image Retrieval and Classification Techniques
- Multimodal Machine Learning Applications
- Advanced Neural Network Applications
- Image Processing Techniques and Applications
- Visual Attention and Saliency Detection
- Industrial Vision Systems and Defect Detection
- Domain Adaptation and Few-Shot Learning
- Electrocatalysts for Energy Conversion
- Brain Tumor Detection and Classification
- Advanced Optical Sensing Technologies
- Robotics and Sensor-Based Localization
- Video Analysis and Summarization
- CCD and CMOS Imaging Sensors
- Remote-Sensing Image Classification
- Infrared Target Detection Methodologies
- Non-Destructive Testing Techniques
- Olfactory and Sensory Function Studies
- Hydrogen Storage and Materials
- Ammonia Synthesis and Nitrogen Reduction
- Human Motion and Animation
- Anomaly Detection Techniques and Applications
- Medical Image Segmentation Techniques
- Advancements in Photolithography Techniques
- Text and Document Classification Technologies
Horizon Robotics (China)
2023
Chinese Academy of Sciences
2014-2022
Shandong Institute of Automation
2014-2022
Institute of Automation
2014-2020
Beijing University of Technology
2018-2020
Beijing Academy of Artificial Intelligence
2020
University of Chinese Academy of Sciences
2020
Wuhan University
2007
Cross-modal retrieval emphasizes understanding inter-modality semantic correlations, which is often achieved by designing a similarity function. Generally, one of the most important things considered by the similarity function is how to make the cross-modal similarity computable. In this paper, a deep and bidirectional representation learning model is proposed to address the issue of image-text cross-modal retrieval. Owing to the solid progress in computer vision and natural language processing, it is reliable to extract representations from both raw image and text data...
Face detection, as a fundamental technology for various applications, is always deployed on edge devices which have limited memory storage and low computing power. This paper introduces a Light and Fast Face Detector (LFFD) for edge devices. The proposed method is anchor-free and belongs to the one-stage category. Specifically, we rethink the importance of the receptive field (RF) and the effective receptive field (ERF) in the background of face detection. Essentially, the RFs of neurons in a certain layer are distributed regularly in the input image and these RFs are natural...
Cross-modal retrieval extends the ability of search engines to deal with massive cross-modal data. The goal of image-text cross-modal retrieval is to search images (texts) by using text (image) queries, computing the similarities of images and texts directly. Many existing methods rely on low-level visual features and textual features for cross-modal retrieval, ignoring the characteristics in the raw data of different modalities. In this paper, a novel model based on modality-specific feature learning is proposed. Considering the characteristics of different modalities, the model uses two types of convolutional neural networks...
Bird's Eye View (BEV) semantic segmentation is a critical task in autonomous driving. However, existing Transformer-based methods confront difficulties in transforming Perspective View (PV) to BEV due to their unidirectional and posterior interaction mechanisms. To address this issue, we propose a novel Bi-directional and Early Interaction Transformers framework named BAEFormer, consisting of (i) an early-interaction PV-BEV pipeline and (ii) a bi-directional cross-attention mechanism. Moreover, we find that the image...
The cross-media retrieval problem has received much attention in recent years due to the rapid increase of multimedia data on the Internet. A new approach to the problem has been raised which intends to match features of different modalities directly. In this research, there are two critical issues: how to get rid of the heterogeneity between different modalities and how to match cross-modal features of different dimensions. Recently, metric learning methods have shown a good capability in learning a distance metric to explore the relationship between data points. However, the traditional metric learning algorithms only focus on single-modal features,...
In this article, an anomaly detection method based on background reconstruction is proposed to perform defect inspection on the texture surface of industrial products. This method consists of two modules: 1) An autoencoder integrated with a generative adversarial network is utilized to reconstruct the textured background of the original image as a defect-free reference. Specifically, extra anomalous images are introduced and their mapping is given to improve the stability of the reconstruction. 2) A U-net is trained for pixel-wise analysis of the differences between...
In this paper, we focus on the issue of large scale image annotation, whereas most existing methods are devised for small datasets. A novel model based on deep representation learning and tag embedding learning is proposed. Specifically, the proposed model learns a unified latent space for image visual features and tag embeddings simultaneously. Furthermore, a metric matrix is introduced to estimate the relevance scores between images and tags. Finally, an objective function modeling triplet relationships (irrelevant tag, image, relevant tag)...
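The scoring idea in the abstract above (a metric matrix that estimates image-tag relevance, trained on triplets of irrelevant tag, image, relevant tag) can be illustrated with a minimal sketch. It assumes a bilinear score and a margin-based hinge loss; the function names, the margin value, and the toy vectors are hypothetical and are not taken from the paper, whose actual objective is truncated in this listing.

```python
def relevance(image_vec, tag_vec, metric):
    """Bilinear relevance score x^T M t (assumed form, not the paper's)."""
    # Project the tag embedding through the metric matrix M.
    projected = [sum(m * t for m, t in zip(row, tag_vec)) for row in metric]
    # Dot product of the image features with the projected tag.
    return sum(x * p for x, p in zip(image_vec, projected))


def triplet_hinge_loss(image_vec, relevant_tag, irrelevant_tag, metric, margin=1.0):
    """Zero when the relevant tag outscores the irrelevant one by at least `margin`."""
    pos = relevance(image_vec, relevant_tag, metric)
    neg = relevance(image_vec, irrelevant_tag, metric)
    return max(0.0, margin - pos + neg)


# Toy data (hypothetical): 2-d image features and tag embeddings.
image = [1.0, 0.0]
good_tag = [1.0, 0.0]
bad_tag = [0.0, 1.0]
identity = [[1.0, 0.0], [0.0, 1.0]]
zeros = [[0.0, 0.0], [0.0, 0.0]]

loss_identity = triplet_hinge_loss(image, good_tag, bad_tag, identity)
loss_zeros = triplet_hinge_loss(image, good_tag, bad_tag, zeros)
loss_swapped = triplet_hinge_loss(image, bad_tag, good_tag, identity)
```

With the identity metric the relevant tag already wins by the full margin, so the loss is zero; an uninformative all-zero metric or a swapped triplet is penalized.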
Open-set object detection (OSOD) is highly desirable for robotic manipulation in unstructured environments. However, existing OSOD methods often fail to meet the requirements of robotic applications due to their high computational burden and complex deployment. To address this issue, this paper proposes a light-weight framework called Decoupled OSOD (DOSOD), which is a practical and efficient solution to support real-time OSOD tasks in robotic systems. Specifically, DOSOD builds upon the YOLO-World pipeline by integrating a vision-language model...
With the rapid development of social networking, user requirements suffer more and more from the intention gap and the interest gap in semantic multimedia. It becomes urgent to investigate personalized recommendation. In this paper, we propose modular manifold ranking (MMR) for image recommendation. MMR attempts to construct a global ranking over CNN based feature extraction to involve the content relations in the ranking. Specifically, modularity is introduced to perform flexible learning on a large scale database in the manner of decomposition. MMR is employed to propagate users'...
In the MSR-Bing Image Retrieval Challenge, contestants are required to design a system that can score query-image pairs based on the relevance between queries and images. To address this problem, we propose a regression based cross modal deep learning model and a Gaussian Process scoring model. The regression based cross modal deep learning model takes the image features and query features as inputs respectively and outputs the relevance scores directly. The Gaussian Process scoring model regards the challenge as a ranking problem and utilizes the click (or pseudo click) information from both the training set and the development set to predict the relevance scores. The proposed...
Saliency estimation has become a hot research topic due to its wide and successful application in almost all vision related problems. However, saliency estimation techniques are still far from satisfactory given the complex visual content and various requirements. In this paper, we propose a manifold ranking based kernel propagation (MRKP) approach for saliency estimation. MRKP begins to work on the background seeds of the four image boundaries individually to select representative salient seeds. Pairwise constraints of must-link...
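For readers unfamiliar with manifold ranking, which the abstract above builds on, the following is a minimal sketch of the standard iteration f <- alpha * S f + (1 - alpha) * y over a weighted graph, where y marks the seed nodes and S is the symmetrically normalized affinity matrix. It is not the MRKP method itself: the kernel propagation step and the pairwise constraints are not shown, and the graph, the alpha value, and the function name are assumptions made for illustration.

```python
import math


def manifold_ranking(weights, seeds, alpha=0.5, iterations=50):
    """Iterate f <- alpha * S f + (1 - alpha) * y with S = D^-1/2 W D^-1/2."""
    n = len(weights)
    degree = [sum(row) for row in weights]
    # Symmetrically normalized affinities; entries with no edge stay at zero.
    norm = [
        [weights[i][j] / math.sqrt(degree[i] * degree[j]) if weights[i][j] else 0.0
         for j in range(n)]
        for i in range(n)
    ]
    scores = [0.0] * n
    for _ in range(iterations):
        # Each node mixes its neighbors' scores with its own seed value.
        scores = [
            alpha * sum(norm[i][j] * scores[j] for j in range(n))
            + (1 - alpha) * seeds[i]
            for i in range(n)
        ]
    return scores


# Hypothetical toy graph: a chain of four nodes with node 0 as the only seed.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
seed_vector = [1.0, 0.0, 0.0, 0.0]
ranking = manifold_ranking(chain, seed_vector)
```

On this chain the scores fall off with graph distance from the seed, which is the behavior a seed-based saliency or recommendation ranking relies on.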
The goal of image annotation is to automatically assign meaningful and content-related labels to digital images by using machines. It is beneficial to image search and sharing in social networks. Various methods for image annotation have been proposed in the last decade and they have made much progress. However, most of them are not precise and fast enough for real-world applications. In this paper, we propose a novel method for image annotation via learning the image-label interrelation. The main idea is to predict labels by linearly propagating the label information through the image-label interrelation...
The Vision Challenge Track 1 for Data-Efficient Defect Detection requires competitors to instance segment 14 industrial inspection datasets in a data-deficient setting. This report introduces the technical details of the team Aoi-overfitting-Team for this challenge. Our method focuses on the key problem of the segmentation quality of defect masks in scenarios with limited training samples. Based on the Hybrid Task Cascade (HTC) instance segmentation algorithm, we connect the transformer backbone (Swin-B) through composite connections inspired...
Anchor-free detectors basically formulate object detection as dense classification and regression. For popular anchor-free detectors, it is common to introduce an individual prediction branch to estimate the quality of localization. The following inconsistencies are observed when we delve into the practices of classification and quality estimation. Firstly, for some adjacent samples which are assigned completely different labels, the trained model would produce similar classification scores. This violates the training objective and leads to performance...