NFDI4DS | UHH-SEMS - Publication Details

Dynamic Refinement Network for Oriented and Densely Packed Object Detection

OPENALEX - Publications

Xingjia Pan Yuqiang Ren Kekai Sheng Weiming Dong Haolei Yuan and 3 more

Object detection has achieved remarkable progress in the past decade. However, the detection of oriented and densely packed objects remains challenging for the following inherent reasons: (1) the receptive fields of neurons are all axis-aligned and of the same shape, whereas objects are usually of diverse shapes and align along various directions; (2) detection models are typically trained with generic knowledge and may not generalize well to handle specific objects at test time; (3) the limited dataset hinders the development on this task. To resolve the first two issues, we...

10.1109/cvpr42600.2020.01122 article EN 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2020-06-01

StyTr2: Image Style Transfer with Transformers

Yingying Deng Fan Tang Weiming Dong Chongyang Ma Xingjia Pan and 2 more

The goal of image style transfer is to render an image with artistic features guided by a style reference while maintaining the original content. Owing to the locality in convolutional neural networks (CNNs), extracting and maintaining the global information of input images is difficult. Therefore, traditional neural style transfer methods face biased content representation. To address this critical issue, we take long-range dependencies of input images into account for image style transfer by proposing a transformer-based approach called StyTr2. In contrast with visual transformers for other vision...

10.1109/cvpr52688.2022.01104 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022-06-01

TS-CAM: Token Semantic Coupled Attention Map for Weakly Supervised Object Localization

Wei Gao Fang Wan Xingjia Pan Zhiliang Peng Qi Tian and 3 more

Weakly supervised object localization (WSOL) is a challenging problem in which only image category labels are given but object localization models must be learned. Optimizing a convolutional neural network (CNN) for classification tends to activate local discriminative regions while ignoring the complete object extent, causing the partial activation issue. In this paper, we argue that partial activation is caused by the intrinsic characteristics of CNN, where the convolution operations produce local receptive fields and experience difficulty in capturing long-range feature...

10.1109/iccv48922.2021.00288 article EN 2021 IEEE/CVF International Conference on Computer Vision (ICCV) 2021-10-01

Unveiling the Potential of Structure Preserving for Weakly Supervised Object Localization

Xingjia Pan Yingguo Gao Zhiwen Lin Fan Tang Weiming Dong and 3 more

Weakly supervised object localization (WSOL) remains an open problem given the deficiency of finding object extent information using a classification network. Although prior works struggled to localize objects through various spatial regularization strategies, we argue that how to extract object structural information from the trained classification network is neglected. In this paper, we propose a two-stage approach, termed structure-preserving activation (SPA), toward fully leveraging the structure information incorporated in convolutional features for...

10.1109/cvpr46437.2021.01147 article EN 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2021-06-01

Expanding Low-Density Latent Regions for Open-Set Object Detection

Jiaming Han Yuqiang Ren Jian Ding Xingjia Pan Ke Yan and 1 more

Modern object detectors have achieved impressive progress under the close-set setup. However, open-set object detection (OSOD) remains challenging since objects of unknown categories are often misclassified to existing known classes. In this work, we propose to identify unknown objects by separating high/low-density regions in the latent space, based on the consensus that unknown objects are usually distributed in low-density latent regions. As traditional threshold-based methods only maintain limited low-density regions, which cannot cover all unknown objects, we present a...

10.1109/cvpr52688.2022.00937 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022-06-01

TS-CAM: Token Semantic Coupled Attention Map for Weakly Supervised Object Localization

Yuan Yao Fang Wan Wei Gao Xingjia Pan Zhiliang Peng and 2 more

Weakly supervised object localization (WSOL), which trains object localization models using solely image category annotations, remains a challenging problem. Existing approaches based on convolutional neural networks (CNNs) tend to miss the full object extent while activating discriminative object parts. Based on our analysis, this is caused by the CNN's intrinsic characteristics, which make it difficult to capture object semantics at long distances. In this article, we introduce the vision transformer to WSOL, with the aim of capturing the long-range semantic dependency of...

10.1109/tnnls.2022.3218471 article EN IEEE Transactions on Neural Networks and Learning Systems 2022-11-24

Automatic Building Rooftop Extraction From Aerial Images via Hierarchical RGB-D Priors

Shibiao Xu Xingjia Pan Er Li Baoyuan Wu Shuhui Bu and 3 more

Accurate building rooftop extraction from high-resolution aerial images is of crucial importance in a wide range of applications. Owing to the varying appearance and large-scale scene objects, especially for rooftops of different scales and heights, a single-scale or individual prior-based technique is insufficient in pursuing efficient, generic, and accurate results. The trend toward integrating multiscale and several cue techniques appears to be the best way; thus, such integration is the focus of this paper. We first propose a novel...

10.1109/tgrs.2018.2850972 article EN IEEE Transactions on Geoscience and Remote Sensing 2018-07-26

StyTr$^2$: Image Style Transfer with Transformers

Yingying Deng Fan Tang Weiming Dong Chongyang Ma Xingjia Pan and 2 more

The goal of image style transfer is to render an image with artistic features guided by a style reference while maintaining the original content. Owing to the locality in convolutional neural networks (CNNs), extracting and maintaining the global information of input images is difficult. Therefore, traditional neural style transfer methods face biased content representation. To address this critical issue, we take long-range dependencies of input images into account for image style transfer by proposing a transformer-based approach called StyTr$^2$. In contrast with visual transformers for other vision...

10.48550/arxiv.2105.14576 preprint EN cc-by arXiv (Cornell University) 2021-01-01

A Comparative Study of CNN- and Transformer-Based Visual Style Transfer

Huapeng Wei Yingying Deng Fan Tang Xingjia Pan Weiming Dong

10.1007/s11390-022-2140-7 article EN Journal of Computer Science and Technology 2022-05-31

Self-Supervised Feature Augmentation for Large Image Object Detection

Xingjia Pan Fan Tang Weiming Dong Yang Gu Zhichao Song and 4 more

Input scale plays an important role in modern detection frameworks, and an optimal training scale for images exists empirically. However, the optimal one usually cannot be reached when facing extremely large images under the memory constraint. In this study, we explore the scale effect inside the object detection pipeline and find that feature upsampling with the introduction of high-resolution information benefits detection. Compared with direct input upscaling, feature upsampling trades a small performance loss for a large amount of memory savings. From these observations, we propose a self-supervised...

10.1109/tip.2020.2993403 article EN IEEE Transactions on Image Processing 2020-01-01

SIOD: Single Instance Annotated Per Category Per Image for Object Detection

Hanjun Li Xingjia Pan Ke Yan Fan Tang Wei‐Shi Zheng

Object detection under imperfect data has received great attention recently. Weakly supervised object detection (WSOD) suffers from severe localization issues due to the lack of instance-level annotation, while semi-supervised object detection (SSOD) remains challenging because of the inter-image discrepancy between labeled and unlabeled data. In this study, we propose Single Instance annotated Object Detection (SIOD), requiring only one instance annotation for each existing category in an image. Degraded from inter-task or inter-image discrepancies...

10.1109/cvpr52688.2022.01380 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022-06-01

SMNet: Synchronous Multi-Scale Low Light Enhancement Network With Local and Global Concern

Shideng Lin Fan Tang Weiming Dong Xingjia Pan Changsheng Xu

Limited by objectively poor lighting conditions and hardware devices, low-light images with low visual quality and low visibility are inevitable in the real world. Accurate local details and reasonable global information play their essential and distinct roles in low-light image enhancement: local details contribute to fine textures, while global information is critical for a proper understanding of the global brightness level. In this paper, we focus on integrating the local and global aspects to achieve high-quality low-light image enhancement by proposing the synchronous multi-scale low-light enhancement network (SMNet). A...

10.1109/tmm.2023.3254141 article EN IEEE Transactions on Multimedia 2023-01-01

CrossRectify: Leveraging disagreement for semi-supervised object detection

Chengcheng Ma Xingjia Pan Qixiang Ye Fan Tang Weiming Dong and 1 more

10.1016/j.patcog.2022.109280 article EN Pattern Recognition 2022-12-24

Content-Based Visual Summarization for Image Collections

Xingjia Pan Fan Tang Weiming Dong Chongyang Ma Yiping Meng and 3 more

With the surge of images in the information era, people demand an effective and accurate way to access meaningful visual information. Accordingly, communication has become indispensable. In this article, we propose a content-based approach that automatically generates a clear and informative visual summarization based on design principles and cognitive psychology to represent image collections. We first introduce a novel method to make representative and nonredundant summarizations of image collections, thereby ensuring data...

10.1109/tvcg.2019.2948611 article EN IEEE Transactions on Visualization and Computer Graphics 2019-10-23

Distributed Attention for Grounded Image Captioning

Nenglun Chen Xingjia Pan Runnan Chen Lei Yang Zhiwen Lin and 5 more

We study the problem of weakly supervised grounded image captioning. That is, given an image, the goal is to automatically generate a sentence describing the context of the image with each noun word grounded to the corresponding region in the image. This task is challenging due to the lack of explicit fine-grained alignments as supervision. Previous weakly supervised methods mainly explore various kinds of regularization schemes to improve attention accuracy. However, their performances are still far from the fully supervised ones. One main issue that has been ignored is that the attention for generating...

10.1145/3474085.3475354 article EN Proceedings of the 29th ACM International Conference on Multimedia 2021-10-17

Semantic-Context Graph Network for Point-based 3D Object Detection

Shuwei Dong Xiaoyu Kong Xingjia Pan Fan Tang Wei Li and 2 more

Point-based indoor 3D object detection has received increasing attention with the large demand for augmented reality, autonomous driving, and robot technology in the industry. However, the detection precision suffers from inputs with semantic ambiguity, i.e., shape symmetries, occlusion, and missing texture, which can make different objects appear similar from different viewpoints and thus confuse the detection model. Typical point-based detectors relieve this problem via learning proposal representations with both geometric and semantic information, while...

10.1109/tcsvt.2023.3271318 article EN IEEE Transactions on Circuits and Systems for Video Technology 2023-04-28

Inter-image Contrastive Consistency for Multi-Person Pose Estimation

Xixia Xu Yingguo Gao Xingjia Pan Ke Yan Xiaoyu Chen and 1 more

Multi-person pose estimation (MPPE) has achieved impressive progress in recent years. However, due to the large variance of appearances among images or occlusions, the model can hardly learn sufficiently consistent patterns, which leads to severe location jitter and missing issues. In this study, we propose a novel framework, termed Inter-image Contrastive consistency (ICON), to strengthen the keypoint consistency among images for MPPE. Concretely, we consider two-fold consistency constraints, which include single keypoint contrastive consistency (SKCC) and pair relation contrastive consistency (PRCC)....

10.1609/aaai.v37i3.25410 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2023-06-26

SPA2Net: Structure-Preserved Attention Activated Network for Weakly Supervised Object Localization

Dong Chen Xingjia Pan Fan Tang Weiming Dong Changsheng Xu

By exploring the localizable representations in deep CNN, weakly supervised object localization (WSOL) methods could determine the position of the object in each image while being trained only on the classification task. However, the partial activation problem caused by the discriminant function makes the network unable to locate objects accurately. To alleviate this problem, we propose the Structure-Preserved Attention Activated Network (SPA2Net), a simple and effective one-stage WSOL framework to explore the ability of structure preservation of deep features....

10.1109/tip.2023.3323793 article EN IEEE Transactions on Image Processing 2023-01-01

Unveiling the Potential of Structure Preserving for Weakly Supervised Object Localization

Xingjia Pan Yingguo Gao Zhiwen Lin Fan Tang Weiming Dong and 3 more

Weakly supervised object localization (WSOL) remains an open problem given the deficiency of finding object extent information using a classification network. Although prior works struggled to localize objects through various spatial regularization strategies, we argue that how to extract object structural information from the trained classification network is neglected. In this paper, we propose a two-stage approach, termed structure-preserving activation (SPA), toward fully leveraging the structure information incorporated in convolutional features for WSOL....

10.48550/arxiv.2103.04523 preprint EN cc-by arXiv (Cornell University) 2021-01-01

Dynamic Refinement Network for Oriented and Densely Packed Object Detection

Xingjia Pan Yuqiang Ren Kekai Sheng Weiming Dong Haolei Yuan and 3 more

Object detection has achieved remarkable progress in the past decade. However, the detection of oriented and densely packed objects remains challenging for the following inherent reasons: (1) the receptive fields of neurons are all axis-aligned and of the same shape, whereas objects are usually of diverse shapes and align along various directions; (2) detection models are typically trained with generic knowledge and may not generalize well to handle specific objects at test time; (3) the limited dataset hinders the development on this task. To resolve the first two issues, we...

10.48550/arxiv.2005.09973 preprint EN other-oa arXiv (Cornell University) 2020-01-01

TS-CAM: Token Semantic Coupled Attention Map for Weakly Supervised Object Localization

Wei Gao Fang Wan Xingjia Pan Zhiliang Peng Qi Tian and 3 more

Weakly supervised object localization (WSOL) is a challenging problem in which only image category labels are given but object localization models must be learned. Optimizing a convolutional neural network (CNN) for classification tends to activate local discriminative regions while ignoring the complete object extent, causing the partial activation issue. In this paper, we argue that partial activation is caused by the intrinsic characteristics of CNN, where the convolution operations produce local receptive fields and experience difficulty in capturing long-range feature...

10.48550/arxiv.2103.14862 preprint EN other-oa arXiv (Cornell University) 2021-01-01

A2Pt: Anti-Associative Prompt Tuning for Open Set Visual Recognition

Hairui Ren Fan Tang Xingjia Pan Juan Cao Weiming Dong and 3 more

Multi-modality pre-trained models (PTMs) have considerably boosted the performance on a broad range of computer vision topics. Still, they have not been explored purposefully in open set recognition (OSR) scenarios when applying PTMs to downstream tasks. Directly fine/prompt tuning PTMs on closed-set classification tasks will inevitably suffer from data bias and always learn more or less target class-irrelevant cooccurring contextual information, which leads to over-confident predictions for unknown samples. In...

10.1109/tmm.2023.3339387 article EN IEEE Transactions on Multimedia 2023-12-05

Content-based measure of image set diversity

Xingjia Pan Juntao Ye Fan Tang Weiming Dong Feiyue Huang and 1 more

With the ubiquity of digital cameras and the growth of the social media population, people share and upload millions of photos per day. To effectively manage or explore a series of shots of different scenes, people often hope to pick a few representative examples with various contents, in order to quickly have a global view of the whole set. Thus, it is important to consider and evaluate the diversity of an image set.

10.1145/3145690.3145700 article EN 2017-11-20