Xingjia Pan

ORCID: 0000-0003-3699-8936
Research Areas
  • Advanced Neural Network Applications
  • Domain Adaptation and Few-Shot Learning
  • Advanced Image and Video Retrieval Techniques
  • Multimodal Machine Learning Applications
  • Visual Attention and Saliency Detection
  • Image Enhancement Techniques
  • Generative Adversarial Networks and Image Synthesis
  • Video Surveillance and Tracking Methods
  • Image Retrieval and Classification Techniques
  • Advanced Image Processing Techniques
  • Industrial Vision Systems and Defect Detection
  • Remote-Sensing Image Classification
  • Human Pose and Action Recognition
  • Remote Sensing and LiDAR Applications
  • Video Analysis and Summarization
  • Tactile and Sensory Interactions
  • Advanced Image Fusion Techniques
  • Advanced Vision and Imaging
  • Diabetic Foot Ulcer Assessment and Management
  • Machine Learning and ELM
  • 3D Shape Modeling and Analysis
  • Robotics and Sensor-Based Localization

Tencent (China)
2021-2023

Shandong Institute of Automation
2020-2022

Nanjing Agricultural University
2021

Institute of Automation
2017-2020

Chinese Academy of Sciences
2019-2020

University of Chinese Academy of Sciences
2018-2020

Beijing Academy of Artificial Intelligence
2020

Object detection has achieved remarkable progress in the past decade. However, the detection of oriented and densely packed objects remains challenging for the following inherent reasons: (1) the receptive fields of neurons are all axis-aligned and of the same shape, whereas objects usually have diverse shapes and align along various directions; (2) detection models are typically trained with generic knowledge and may not generalize well to handle specific objects at test time; (3) the limited dataset hinders the development on this task. To resolve the first two issues, we...

10.1109/cvpr42600.2020.01122 article EN 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2020-06-01

The goal of image style transfer is to render an image with artistic features guided by a style reference while maintaining the original content. Owing to the locality in convolutional neural networks (CNNs), extracting and maintaining the global information of input images is difficult. Therefore, traditional neural style transfer methods face biased content representation. To address this critical issue, we take long-range dependencies of input images into account for image style transfer by proposing a transformer-based approach called StyTr2. In contrast with visual transformers for other vision...

10.1109/cvpr52688.2022.01104 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022-06-01

Weakly supervised object localization (WSOL) is a challenging problem in which only image category labels are given but object localization models must be learned. Optimizing a convolutional neural network (CNN) for classification tends to activate local discriminative regions while ignoring the complete object extent, causing the partial activation issue. In this paper, we argue that partial activation is caused by the intrinsic characteristics of CNN, where the convolution operations produce local receptive fields and experience difficulty in capturing long-range feature...

10.1109/iccv48922.2021.00288 article EN 2021 IEEE/CVF International Conference on Computer Vision (ICCV) 2021-10-01

Weakly supervised object localization (WSOL) remains an open problem given the deficiency of finding object extent information using a classification network. Although prior works struggled to localize objects through various spatial regularization strategies, we argue that how to extract object structural information from the trained classification network is neglected. In this paper, we propose a two-stage approach, termed structure-preserving activation (SPA), toward fully leveraging the structure information incorporated in convolutional features for...

10.1109/cvpr46437.2021.01147 article EN 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2021-06-01

Modern object detectors have achieved impressive progress under the close-set setup. However, open-set object detection (OSOD) remains challenging since objects of unknown categories are often misclassified to existing known classes. In this work, we propose to identify unknown objects by separating high/low-density regions in the latent space, based on the consensus that unknown objects are usually distributed in low-density latent regions. As traditional threshold-based methods only maintain limited low-density regions, which cannot cover all unknown objects, we present a...

10.1109/cvpr52688.2022.00937 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022-06-01

Weakly supervised object localization (WSOL), which trains object localization models using solely image category annotations, remains a challenging problem. Existing approaches based on convolutional neural networks (CNNs) tend to miss the full object extent while activating discriminative object parts. Based on our analysis, this is caused by the CNN's intrinsic characteristics, which make it difficult to capture object semantics at long distances. In this article, we introduce the vision transformer to WSOL, with the aim of capturing the long-range semantic dependency of...

10.1109/tnnls.2022.3218471 article EN IEEE Transactions on Neural Networks and Learning Systems 2022-11-24

Accurate building rooftop extraction from high-resolution aerial images is of crucial importance in a wide range of applications. Owing to the varying appearance and large-scale scene objects, especially for rooftops of different scales and heights, a single-scale or individual prior-based technique is insufficient for pursuing efficient, generic, and accurate results. The trend toward integrating multiscale and several cue techniques appears to be the best way; thus, such integration is the focus of this paper. We first propose a novel...

10.1109/tgrs.2018.2850972 article EN IEEE Transactions on Geoscience and Remote Sensing 2018-07-26

The goal of image style transfer is to render an image with artistic features guided by a style reference while maintaining the original content. Owing to the locality in convolutional neural networks (CNNs), extracting and maintaining the global information of input images is difficult. Therefore, traditional neural style transfer methods face biased content representation. To address this critical issue, we take long-range dependencies of input images into account for image style transfer by proposing a transformer-based approach called StyTr$^2$. In contrast with visual transformers for other vision...

10.48550/arxiv.2105.14576 preprint EN cc-by arXiv (Cornell University) 2021-01-01

Input scale plays an important role in modern detection frameworks, and an optimal training scale for images exists empirically. However, the optimal one usually cannot be reached when facing extremely large images under the memory constraint. In this study, we explore the effect of scale inside the object detection pipeline and find that feature upsampling with the introduction of high-resolution information benefits detection. Compared with direct input upscaling, feature upsampling trades a small performance loss for a large amount of memory savings. From these observations, we propose a self-supervised...

10.1109/tip.2020.2993403 article EN IEEE Transactions on Image Processing 2020-01-01

Object detection under imperfect data has received great attention recently. Weakly supervised object detection (WSOD) suffers from severe localization issues due to the lack of instance-level annotation, while semi-supervised object detection (SSOD) remains challenging because of the inter-image discrepancy between labeled and unlabeled data. In this study, we propose Single Instance annotated Object Detection (SIOD), requiring only one instance annotation for each existing category in an image. Degraded from inter-task (WSOD) or inter-image (SSOD) discrepancies...

10.1109/cvpr52688.2022.01380 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022-06-01

Limited by objectively poor lighting conditions and hardware devices, low-light images with low visual quality and low visibility are inevitable in the real world. Accurate local details and reasonable global information play their essential and distinct roles in low-light image enhancement: local details contribute to fine textures, while global information is critical for a proper understanding of the global brightness level. In this paper, we focus on integrating the local and global aspects to achieve high-quality low-light image enhancement by proposing a synchronous multi-scale network (SMNet). A...

10.1109/tmm.2023.3254141 article EN IEEE Transactions on Multimedia 2023-01-01

With the surge of images in the information era, people demand an effective and accurate way to access meaningful visual information. Accordingly, communication has become indispensable. In this article, we propose a content-based approach that automatically generates a clear and informative visual summarization, based on design principles and cognitive psychology, to represent image collections. We first introduce a novel method to make representative and nonredundant summarizations of image collections, thereby ensuring data...

10.1109/tvcg.2019.2948611 article EN IEEE Transactions on Visualization and Computer Graphics 2019-10-23

We study the problem of weakly supervised grounded image captioning. That is, given an image, the goal is to automatically generate a sentence describing the context of the image with each noun word grounded to the corresponding region in the image. This task is challenging due to the lack of explicit fine-grained region-word alignments as supervision. Previous weakly supervised methods mainly explore various kinds of regularization schemes to improve attention accuracy. However, their performances are still far from the fully supervised ones. One main issue that has been ignored is that the attention for generating...

10.1145/3474085.3475354 article EN Proceedings of the 29th ACM International Conference on Multimedia 2021-10-17

Point-based indoor 3D object detection has received increasing attention with the large demand for augmented reality, autonomous driving, and robot technology in industry. However, the detection precision suffers from inputs with semantic ambiguity, i.e., shape symmetries, occlusion, and missing texture, which leads to different objects appearing similar from different viewpoints and then confusing the detection model. Typical point-based detectors relieve this problem via learning proposal representations with both geometric and semantic information, while...

10.1109/tcsvt.2023.3271318 article EN IEEE Transactions on Circuits and Systems for Video Technology 2023-04-28

Multi-person pose estimation (MPPE) has achieved impressive progress in recent years. However, due to the large variance of appearances among images or occlusions, the model can hardly learn sufficiently consistent patterns, which leads to severe location jitter and missing issues. In this study, we propose a novel framework, termed Inter-image Contrastive consistency (ICON), to strengthen the keypoint consistency among images for MPPE. Concretely, we consider two-fold consistency constraints, which include single keypoint contrastive consistency (SKCC) and pair relation contrastive consistency (PRCC)....

10.1609/aaai.v37i3.25410 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2023-06-26

By exploring the localizable representations in deep CNN, weakly supervised object localization (WSOL) methods could determine the position of the object in each image while trained only by the classification task. However, the partial activation problem caused by the discriminant function makes the network unable to locate objects accurately. To alleviate this problem, we propose Structure-Preserved Attention Activated Network (SPA2Net), a simple and effective one-stage WSOL framework to explore the ability of structure preservation of deep features....

10.1109/tip.2023.3323793 article EN IEEE Transactions on Image Processing 2023-01-01

Weakly supervised object localization (WSOL) remains an open problem given the deficiency of finding object extent information using a classification network. Although prior works struggled to localize objects through various spatial regularization strategies, we argue that how to extract object structural information from the trained classification network is neglected. In this paper, we propose a two-stage approach, termed structure-preserving activation (SPA), toward fully leveraging the structure information incorporated in convolutional features for WSOL....

10.48550/arxiv.2103.04523 preprint EN cc-by arXiv (Cornell University) 2021-01-01

Object detection has achieved remarkable progress in the past decade. However, the detection of oriented and densely packed objects remains challenging for the following inherent reasons: (1) the receptive fields of neurons are all axis-aligned and of the same shape, whereas objects usually have diverse shapes and align along various directions; (2) detection models are typically trained with generic knowledge and may not generalize well to handle specific objects at test time; (3) the limited dataset hinders the development on this task. To resolve the first two issues, we...

10.48550/arxiv.2005.09973 preprint EN other-oa arXiv (Cornell University) 2020-01-01

Weakly supervised object localization (WSOL) is a challenging problem in which only image category labels are given but object localization models must be learned. Optimizing a convolutional neural network (CNN) for classification tends to activate local discriminative regions while ignoring the complete object extent, causing the partial activation issue. In this paper, we argue that partial activation is caused by the intrinsic characteristics of CNN, where the convolution operations produce local receptive fields and experience difficulty in capturing long-range feature...

10.48550/arxiv.2103.14862 preprint EN other-oa arXiv (Cornell University) 2021-01-01

Multi-modality pre-trained models (PTMs) have considerably boosted the performance on a broad range of computer vision topics. Still, they have not been explored purposefully in open set recognition (OSR) scenarios when applying PTMs to downstream tasks. Directly fine/prompt tuning PTMs on closed-set classification tasks will inevitably suffer from data bias and always learn more or less target class-irrelevant cooccurring contextual information, which leads to over-confident predictions on unknown samples. In...

10.1109/tmm.2023.3339387 article EN IEEE Transactions on Multimedia 2023-12-05

With the ubiquity of digital cameras and the growth of the social media population, people share and upload millions of photos per day. To effectively manage or explore a series of shots of different scenes, people often hope to pick a few representative examples with various contents, in order to quickly have a global view of the whole set. Thus, it is important to consider how to evaluate the diversity of an image

10.1145/3145690.3145700 article EN 2017-11-20