- Advanced Neural Network Applications
- Domain Adaptation and Few-Shot Learning
- Advanced Image and Video Retrieval Techniques
- Multimodal Machine Learning Applications
- Visual Attention and Saliency Detection
- Image Enhancement Techniques
- Generative Adversarial Networks and Image Synthesis
- Video Surveillance and Tracking Methods
- Image Retrieval and Classification Techniques
- Advanced Image Processing Techniques
- Industrial Vision Systems and Defect Detection
- Remote-Sensing Image Classification
- Human Pose and Action Recognition
- Remote Sensing and LiDAR Applications
- Video Analysis and Summarization
- Tactile and Sensory Interactions
- Advanced Image Fusion Techniques
- Advanced Vision and Imaging
- Diabetic Foot Ulcer Assessment and Management
- Machine Learning and ELM
- 3D Shape Modeling and Analysis
- Robotics and Sensor-Based Localization
Tencent (China)
2021-2023
Shandong Institute of Automation
2020-2022
Nanjing Agricultural University
2021
Institute of Automation
2017-2020
Chinese Academy of Sciences
2019-2020
University of Chinese Academy of Sciences
2018-2020
Beijing Academy of Artificial Intelligence
2020
Object detection has achieved remarkable progress in the past decade. However, the detection of oriented and densely packed objects remains challenging because of the following inherent reasons: (1) the receptive fields of neurons are all axis-aligned and of the same shape, whereas objects are usually of diverse shapes and align along various directions; (2) detection models are typically trained with generic knowledge and may not generalize well to handle specific objects at test time; (3) the limited dataset hinders the development on this task. To resolve the first two issues, we...
The goal of image style transfer is to render an image with artistic features guided by a style reference while maintaining the original content. Owing to the locality in convolutional neural networks (CNNs), extracting and maintaining the global information of input images is difficult. Therefore, traditional methods face biased content representation. To address this critical issue, we take long-range dependencies of input images into account for image style transfer by proposing a transformer-based approach called StyTr2. In contrast with visual transformers for other vision...
Weakly supervised object localization (WSOL) is a challenging problem in which only image category labels are given, yet object localization models must be learned. Optimizing a convolutional neural network (CNN) for classification tends to activate local discriminative regions while ignoring the complete object extent, causing the partial activation issue. In this paper, we argue that partial activation is caused by the intrinsic characteristics of CNNs, where the convolution operations produce local receptive fields and experience difficulty in capturing long-range feature...
Weakly supervised object localization (WSOL) remains an open problem given the deficiency of finding object extent information using a classification network. Although prior works struggled to localize objects through various spatial regularization strategies, we argue that how to extract object structural information from the trained classification network is neglected. In this paper, we propose a two-stage approach, termed structure-preserving activation (SPA), toward fully leveraging the structure information incorporated in convolutional features for...
Modern object detectors have achieved impressive progress under the closed-set setup. However, open-set object detection (OSOD) remains challenging since objects of unknown categories are often misclassified to existing known classes. In this work, we propose to identify unknown objects by separating high/low-density regions in the latent space, based on the consensus that unknown objects are usually distributed in low-density latent regions. As traditional threshold-based methods only maintain limited low-density regions, which cannot cover all unknown objects, we present a...
Weakly supervised object localization (WSOL), which trains object localization models using solely image category annotations, remains a challenging problem. Existing approaches based on convolutional neural networks (CNNs) tend to miss the full object extent while activating discriminative object parts. Based on our analysis, this is caused by the CNN's intrinsic characteristics, which make it difficult to capture object semantics at long distances. In this article, we introduce the vision transformer to WSOL, with the aim of capturing the long-range semantic dependency of...
Accurate building rooftop extraction from high-resolution aerial images is of crucial importance in a wide range of applications. Owing to the varying appearance of large-scale scene objects, especially for rooftops of different scales and heights, a single-scale or individual prior-based technique is insufficient for pursuing efficient, generic, and accurate results. The trend toward integrating multiscale and several cue techniques appears to be the best way; thus, such integration is the focus of this paper. We first propose a novel...
The goal of image style transfer is to render an image with artistic features guided by a style reference while maintaining the original content. Owing to the locality in convolutional neural networks (CNNs), extracting and maintaining the global information of input images is difficult. Therefore, traditional methods face biased content representation. To address this critical issue, we take long-range dependencies of input images into account for image style transfer by proposing a transformer-based approach called StyTr$^2$. In contrast with visual transformers for other vision...
Input scale plays an important role in modern detection frameworks, and an optimal training scale for images exists empirically. However, the optimal one usually cannot be reached when facing extremely large images under the memory constraint. In this study, we explore the effect of scale inside the object detection pipeline and find that feature upsampling with the introduction of high-resolution information benefits detection. Compared with direct input upscaling, feature upsampling trades a small performance loss for a large amount of memory savings. From these observations, we propose a self-supervised...
Object detection under imperfect data has received great attention recently. Weakly supervised object detection (WSOD) suffers from severe localization issues due to the lack of instance-level annotation, while semi-supervised object detection (SSOD) remains challenging owing to the inter-image discrepancy between labeled and unlabeled data. In this study, we propose Single Instance annotated Object Detection (SIOD), requiring only one instance annotation for each existing category in an image. Degraded from inter-task or inter-image discrepancies...
Limited by objectively poor lighting conditions and hardware devices, low-light images with low visual quality and low visibility are inevitable in the real world. Accurate local details and reasonable global information play their essential and distinct roles in low-light image enhancement: local details contribute to fine textures, while global information is critical for a proper understanding of the global brightness level. In this paper, we focus on integrating the local and global aspects to achieve high-quality low-light image enhancement by proposing a synchronous multi-scale network (SMNet). A...
With the surge of images in the information era, people demand an effective and accurate way to access meaningful visual information. Accordingly, communication has become indispensable. In this article, we propose a content-based approach that automatically generates a clear and informative visual summarization based on design principles and cognitive psychology to represent image collections. We first introduce a novel method to make representative and nonredundant summarizations of image collections, thereby ensuring data...
We study the problem of weakly supervised grounded image captioning. That is, given an image, the goal is to automatically generate a sentence describing the context of the image with each noun word grounded to the corresponding region in the image. This task is challenging due to the lack of explicit fine-grained alignments as supervision. Previous weakly supervised methods mainly explore various kinds of regularization schemes to improve attention accuracy. However, their performances are still far from the fully supervised ones. One main issue that has been ignored is that the attention for generating...
Point-based indoor 3D object detection has received increasing attention with the large demand for augmented reality, autonomous driving, and robot technology in industry. However, the detection precision suffers from inputs with semantic ambiguity, such as shape symmetries, occlusion, and missing texture, which can cause different objects to appear similar from different viewpoints and thus confuse the detection model. Typical point-based detectors relieve this problem via learning proposal representations with both geometric and semantic information, while...
Multi-person pose estimation (MPPE) has achieved impressive progress in recent years. However, due to the large variance of appearances among images or occlusions, the model can hardly learn sufficiently consistent patterns, which leads to severe location jitter and missing issues. In this study, we propose a novel framework, termed Inter-image Contrastive consistency (ICON), to strengthen the keypoint consistency among images for MPPE. Concretely, we consider two-fold consistency constraints, which include single keypoint contrastive consistency (SKCC) and pair relation contrastive consistency (PRCC)....
By exploring the localizable representations in deep CNNs, weakly supervised object localization (WSOL) methods can determine the position of the object in each image while being trained only on the classification task. However, the partial activation problem caused by the discriminant function makes the network unable to locate objects accurately. To alleviate this problem, we propose the Structure-Preserved Attention Activated Network (SPA2Net), a simple and effective one-stage WSOL framework to explore the structure-preservation ability of features....
Weakly supervised object localization (WSOL) remains an open problem given the deficiency of finding object extent information using a classification network. Although prior works struggled to localize objects through various spatial regularization strategies, we argue that how to extract object structural information from the trained classification network is neglected. In this paper, we propose a two-stage approach, termed structure-preserving activation (SPA), toward fully leveraging the structure information incorporated in convolutional features for WSOL....
Multi-modality pre-trained models (PTMs) have considerably boosted the performance on a broad range of computer vision topics. Still, they have not been explored purposefully in open set recognition (OSR) scenarios when applying PTMs to downstream tasks. Directly fine/prompt tuning PTMs on closed-set classification tasks will inevitably suffer from data bias and always learn more or less target class-irrelevant co-occurring contextual information, which leads to over-confident predictions on unknown samples. In...
With the ubiquity of digital cameras and the growth of the social media population, people share and upload millions of photos per day. To effectively manage or explore a series of shots of different scenes, people often hope to pick a few representative examples with various contents, in order to quickly gain a global view of the whole set. Thus, it is important to evaluate the diversity of an image