Xiaokang Chen

ORCID: 0000-0002-6188-5821
Research Areas
  • Advanced Neural Network Applications
  • Computer Graphics and Visualization Techniques
  • 3D Shape Modeling and Analysis
  • Domain Adaptation and Few-Shot Learning
  • Advanced Vision and Imaging
  • Multimodal Machine Learning Applications
  • Topic Modeling
  • Spectroscopy and Chemometric Analyses
  • Prenatal Screening and Diagnostics
  • Remote Sensing and Land Use
  • Advanced Image and Video Retrieval Techniques
  • Video Surveillance and Tracking Methods
  • Natural Language Processing Techniques
  • Image Processing and 3D Reconstruction
  • Industrial Vision Systems and Defect Detection
  • 3D Surveying and Cultural Heritage
  • Fetal and Pediatric Neurological Disorders
  • Explainable Artificial Intelligence (XAI)
  • Traditional Chinese Medicine Analysis
  • Advanced Chemical Sensor Technologies
  • Generative Adversarial Networks and Image Synthesis
  • Advanced Image Processing Techniques
  • Congenital Diaphragmatic Hernia Studies
  • Text and Document Classification Technologies
  • Analytical Chemistry and Sensors

Peking University
2019-2024

Sun Yat-sen University
2024

Children’s Hospital of Fudan University Xiamen Branch
2017-2024

King University
2023-2024

North China University of Water Resources and Electric Power
2023

Microsoft Research (India)
2023

Carnegie Mellon University
2023

ETH Zurich
2023

Wuhan University of Technology
2021

Hunan Normal University
2021

In this paper, we study the semi-supervised semantic segmentation problem via exploring both labeled data and extra unlabeled data. We propose a novel consistency regularization approach, called cross pseudo supervision (CPS). Our approach imposes the consistency on two segmentation networks perturbed with different initialization for the same input image. The pseudo one-hot label map, output from one network, is used to supervise the other network with the standard cross-entropy loss, and vice versa. The CPS consistency has two roles: encourage high similarity...

10.1109/cvpr46437.2021.00264 article EN 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2021-06-01
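The CPS scheme described above can be illustrated with a minimal NumPy sketch (not the authors' implementation): each network's hard pseudo-label map supervises the other network through a standard cross-entropy loss. The helper names and the toy logits are hypothetical.

```python
import numpy as np

def softmax(logits, axis=-1):
    e = np.exp(logits - logits.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_entropy(logits, labels):
    # mean cross-entropy of per-pixel logits (N, C) against integer labels (N,)
    p = softmax(logits)
    return -np.mean(np.log(p[np.arange(len(labels)), labels] + 1e-12))

def cps_loss(logits_a, logits_b):
    # cross pseudo supervision: each network is trained with the hard
    # pseudo-labels (argmax) produced by the other, perturbed network
    pseudo_a = logits_a.argmax(axis=-1)  # one-hot label map from network A
    pseudo_b = logits_b.argmax(axis=-1)  # one-hot label map from network B
    # in real training, gradients flow only into the supervised network
    return cross_entropy(logits_a, pseudo_b) + cross_entropy(logits_b, pseudo_a)

rng = np.random.default_rng(0)
la = rng.normal(size=(64, 21))  # 64 pixels, 21 classes, network A
lb = rng.normal(size=(64, 21))  # same pixels, network B
loss = cps_loss(la, lb)
```

Minimizing this loss pulls the two perturbed networks toward agreeing predictions, which is exactly the consistency role the abstract names.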

The recently-developed DETR approach applies the transformer encoder and decoder architecture to object detection and achieves promising performance. In this paper, we handle the critical issue, slow training convergence, and present a conditional cross-attention mechanism for fast DETR training. Our approach is motivated by the fact that the cross-attention in DETR relies highly on the content embeddings for localizing the four extremities and predicting the box, which increases the need for high-quality content embeddings and thus the training difficulty. Our approach, named conditional DETR, learns a conditional spatial query from...

10.1109/iccv48922.2021.00363 article EN 2021 IEEE/CVF International Conference on Computer Vision (ICCV) 2021-10-01
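The decomposition at the heart of conditional cross-attention can be sketched in a few lines of NumPy (a simplification, not the paper's code): query and key each carry a content part and a spatial part, so the attention logits split into a content term plus a spatial term, and the spatial query derived from the reference point relieves the content query of the localization burden. All array names here are illustrative.

```python
import numpy as np

def conditional_cross_attention(content_q, spatial_q, content_k, spatial_k, v):
    # logits decompose into content-content plus spatial-spatial terms,
    # equivalent to dotting the concatenated [content; spatial] vectors
    logits = content_q @ content_k.T + spatial_q @ spatial_k.T  # (n_q, n_k)
    w = np.exp(logits - logits.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)                           # softmax over keys
    return w @ v

rng = np.random.default_rng(0)
n_q, n_k, d = 4, 9, 8  # toy sizes: 4 queries, 9 keys, 8-dim embeddings
out = conditional_cross_attention(rng.normal(size=(n_q, d)), rng.normal(size=(n_q, d)),
                                  rng.normal(size=(n_k, d)), rng.normal(size=(n_k, d)),
                                  rng.normal(size=(n_k, d)))
```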

In this paper, we address the semantic segmentation problem with a focus on the context aggregation strategy. Our motivation is that the label of a pixel is the category of the object that the pixel belongs to. We present a simple yet effective approach, object-contextual representations, characterizing a pixel by exploiting the representation of the corresponding object class. First, we learn object regions under the supervision of the ground-truth segmentation. Second, we compute the object region representation by aggregating the representations of the pixels lying in the object region. Last, we compute the relation between each...

10.48550/arxiv.1909.11065 preprint EN other-oa arXiv (Cornell University) 2019-01-01
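The three steps above can be condensed into a small NumPy sketch (a hedged illustration under simplified assumptions, not the paper's code): soft object regions come from a coarse segmentation, region representations are pixel-feature aggregates, and each pixel is augmented with its relations to all regions. Function and variable names are hypothetical.

```python
import numpy as np

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def object_contextual_repr(pixel_feats, region_logits):
    # pixel_feats: (N, D) pixel features; region_logits: (N, C) coarse class scores
    regions = softmax(region_logits, axis=0)        # soft object regions (per class)
    region_repr = regions.T @ pixel_feats           # (C, D): one vector per class
    relation = softmax(pixel_feats @ region_repr.T, axis=1)  # (N, C) pixel-region affinity
    context = relation @ region_repr                # (N, D) object-contextual feature
    return np.concatenate([pixel_feats, context], axis=1)   # augmented representation

rng = np.random.default_rng(0)
augmented = object_contextual_repr(rng.normal(size=(16, 8)), rng.normal(size=(16, 4)))
```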

Detection transformer (DETR) relies on one-to-one assignment, assigning one ground-truth object to one prediction, for end-to-end detection without NMS post-processing. It is known that one-to-many assignment, assigning one ground-truth object to multiple predictions, succeeds in detection methods such as Faster R-CNN and FCOS. While the naive one-to-many assignment does not work for DETR, it remains challenging to apply one-to-many assignment in DETR training. In this paper, we introduce Group DETR, a simple yet efficient training approach that introduces a group-wise way for one-to-many assignment. This involves using...

10.1109/iccv51070.2023.00610 article EN 2023 IEEE/CVF International Conference on Computer Vision (ICCV) 2023-10-01
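The group-wise assignment idea can be sketched as follows (a simplification: a greedy matcher stands in for the Hungarian algorithm, and all sizes are toy values): the queries are split into groups, each group is matched one-to-one with the ground truths independently, so every ground truth ends up supervising one query per group.

```python
import numpy as np

def one_to_one_assign(cost):
    # greedy stand-in for Hungarian matching: each ground truth takes its
    # cheapest unused query (illustration only, not globally optimal)
    used, match = set(), {}
    for g in np.argsort(cost.min(axis=1)):          # match easiest gt first
        q = next(int(q) for q in np.argsort(cost[g]) if q not in used)
        used.add(q)
        match[int(g)] = q
    return match

def group_assign(cost, n_groups):
    # Group DETR-style sketch: split the query set into n_groups groups and
    # run an independent one-to-one matching inside each group
    n_gt, n_q = cost.shape
    per = n_q // n_groups
    matches = []
    for k in range(n_groups):
        m = one_to_one_assign(cost[:, k * per:(k + 1) * per])
        matches.append({g: q + k * per for g, q in m.items()})  # global query ids
    return matches

rng = np.random.default_rng(0)
cost = rng.random((3, 12))                 # 3 ground truths, 12 queries
matches = group_assign(cost, n_groups=3)   # 3 groups of 4 queries each
```

Each ground truth is matched once per group, so it receives three supervision signals per image while each group internally stays one-to-one.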

Large language models (LLMs) have notably accelerated progress towards artificial general intelligence (AGI), with their impressive zero-shot capacity for user-tailored tasks, endowing them with immense potential across a range of applications. However, in the field of computer vision, despite the availability of numerous powerful vision foundation models (VFMs), they are still restricted to tasks in a pre-defined form, struggling to match the open-ended task capabilities of LLMs. In this work, we present an LLM-based framework...

10.48550/arxiv.2305.11175 preprint EN cc-by arXiv (Cornell University) 2023-01-01

Neural Radiance Fields (NeRF) have constituted a remarkable breakthrough in image-based 3D reconstruction. However, their implicit volumetric representations differ significantly from the widely-adopted polygonal meshes and lack support from common 3D software and hardware, making their rendering and manipulation inefficient. To overcome this limitation, we present a novel framework that generates textured surface meshes from images. Our approach begins by efficiently initializing the geometry and view-dependency decomposed appearance...

10.1109/iccv51070.2023.01626 article EN 2023 IEEE/CVF International Conference on Computer Vision (ICCV) 2023-10-01

The goal of the Semantic Scene Completion (SSC) task is to simultaneously predict a completed 3D voxel representation of volumetric occupancy and semantic labels of objects in the scene from a single-view observation. Since the computational cost generally increases explosively along with the growth of voxel resolution, most current state-of-the-arts have to tailor their framework into a low-resolution representation and sacrifice the detail prediction. Thus, voxel resolution becomes one of the crucial difficulties that lead to the performance bottleneck. In this...

10.1109/cvpr42600.2020.00425 article EN 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2020-06-01

In this work, we introduce Janus-Pro, an advanced version of the previous work Janus. Specifically, Janus-Pro incorporates (1) an optimized training strategy, (2) expanded training data, and (3) scaling to a larger model size. With these improvements, Janus-Pro achieves significant advancements in both multimodal understanding and text-to-image instruction-following capabilities, while also enhancing the stability of text-to-image generation. We hope this work will inspire further exploration in the field. Code and models are publicly available.

10.48550/arxiv.2501.17811 preprint EN arXiv (Cornell University) 2025-01-29

Guided depth super-resolution is a practical task where a low-resolution and noisy input depth map is restored to a high-resolution version, with the help of an RGB guide image. Existing methods usually view this as a generalized guided filtering problem that relies on designing explicit filters and objective functions, or as a dense regression problem that directly predicts the target image via deep neural networks. These methods suffer from either limited model capability or poor interpretability. Inspired by the recent progress in implicit neural representation, we...

10.1145/3474085.3475584 article EN Proceedings of the 30th ACM International Conference on Multimedia 2021-10-17

Neural Radiance Field (NeRF) has emerged as a compelling method to represent 3D objects and scenes for photo-realistic rendering. However, its implicit representation causes difficulty in manipulating the models, unlike an explicit mesh representation. Several recent advances in NeRF manipulation are usually restricted by a shared renderer network, or suffer from large model size. To circumvent the hurdle, in this paper, we present an explicit neural field representation that enables efficient and convenient manipulation of models. To achieve this goal,...

10.48550/arxiv.2205.14870 preprint EN other-oa arXiv (Cornell University) 2022-01-01

Depth information has proven to be a useful cue in the semantic segmentation of RGB-D images for providing a geometric counterpart to the RGB representation. Most existing works simply assume that depth measurements are accurate and well-aligned with the RGB pixels, and model the problem as cross-modal feature fusion to obtain better feature representations and achieve more accurate segmentation. This, however, may not lead to satisfactory results, as actual depth data are generally noisy, which might worsen the accuracy as the networks go deeper. In this paper, we...

10.48550/arxiv.2007.09183 preprint EN other-oa arXiv (Cornell University) 2020-01-01

This paper studies the 3D instance segmentation problem, which has a variety of real-world applications such as robotics and augmented reality. Since the surroundings of objects are of high complexity, separating different objects is very difficult. To address this challenging problem, we propose a novel framework to group and refine the instances. In practice, we first learn an offset vector for each point to shift it to its predicted instance center. To better group these points, we propose a Hierarchical Point Grouping algorithm to merge the centrally aggregated points...

10.1109/icme52920.2022.9859996 article EN 2022 IEEE International Conference on Multimedia and Expo (ICME) 2022-07-18

We revisit Semantic Scene Completion (SSC), a useful task to predict the semantic and occupancy representation of 3D scenes, in this paper. A number of methods for this task are always based on voxelized scene representations. Although voxel representations keep the local structures of the scene, these methods suffer from heavy computation redundancy due to the existence of visible empty voxels when the network goes deeper. To address this dilemma, we propose a novel point-voxel aggregation network for this task. We first transfer the voxelized scenes to point clouds by...

10.1609/aaai.v36i2.20134 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2022-06-28
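The voxel-to-point transfer step mentioned above can be sketched in NumPy (a minimal illustration under the assumption that only occupied voxels are kept, with hypothetical names): dropping the empty voxels and keeping occupied voxel centers as points avoids spending computation on empty space.

```python
import numpy as np

def voxels_to_points(grid, voxel_size=1.0):
    # keep only occupied voxels and use their centers as 3D points,
    # discarding the empty voxels that cause computation redundancy
    idx = np.argwhere(grid > 0)        # (M, 3) indices of occupied voxels
    return (idx + 0.5) * voxel_size    # voxel centers in metric coordinates

grid = np.zeros((4, 4, 4))
grid[0, 1, 2] = grid[3, 3, 3] = 1      # two occupied voxels out of 64
pts = voxels_to_points(grid, voxel_size=0.5)
```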

Convolutional neural networks (CNN) have achieved great success in RGB semantic segmentation. RGB-D images provide additional depth information, which can improve segmentation performance. To take full advantage of the 3D geometry relations provided by RGB-D images, in this paper, we propose 2.5D convolution, which mimics one 3D convolution kernel with several masked 2D convolution kernels. Our approach can effectively process the spatial relations between pixels in a manner similar to 3D convolution while still sampling on the 2D plane, and thus saves computational cost. And...

10.1109/icip.2019.8803757 article EN 2019 IEEE International Conference on Image Processing (ICIP) 2019-08-26
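A minimal NumPy sketch of the masked-kernel idea (a simplification of the paper's method; the binning rule and all names are assumptions): each 3x3 neighbor contributes through the 2D kernel whose depth plane it falls into, so K masked 2D kernels together emulate one 3D kernel while still sampling on the image plane.

```python
import numpy as np

def conv2_5d(feat, depth, kernels, grid=0.1):
    # kernels: list of K 3x3 kernels, one per depth plane; each neighbor is
    # routed to the kernel matching its relative depth (binned by `grid`)
    K = len(kernels)
    H, W = depth.shape
    out = np.zeros((H - 2, W - 2))
    for i in range(1, H - 1):
        for j in range(1, W - 1):
            acc = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    rel = (depth[i + di, j + dj] - depth[i, j]) / grid
                    b = int(np.clip(np.rint(rel), -(K // 2), K // 2)) + K // 2
                    acc += kernels[b][di + 1, dj + 1] * feat[i + di, j + dj]
            out[i - 1, j - 1] = acc
    return out

rng = np.random.default_rng(0)
feat = rng.normal(size=(5, 5))
depth = np.zeros((5, 5))                   # flat depth: only the middle kernel fires
kernels = [rng.normal(size=(3, 3)) for _ in range(3)]
out = conv2_5d(feat, depth, kernels)
```

With a flat depth map every neighbor lands in the central plane, so the operation reduces to an ordinary 2D convolution with the middle kernel, which matches the "sampling on the plane" claim in the abstract.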

In RGB-D semantic segmentation tasks, it has been shown that HHA embeddings effectively encode rich depth features, and using them together with RGB images can improve segmentation performance. In this paper, we propose a novel method to integrate RGB and depth features. By replacing the identity mappings in a ResNet-based two-stream network with idempotent mappings, we couple the originally separated two branches and mix the features from the two modalities, while still keeping the good information flow nature of ResNet. Moreover, our method does not bring any additional blocks...

10.1109/icip.2019.8803146 article EN 2019 IEEE International Conference on Image Processing (ICIP) 2019-08-26

In this paper, we are interested in Detection Transformer (DETR), an end-to-end object detection approach based on a transformer encoder-decoder architecture without hand-crafted postprocessing, such as NMS. Inspired by Conditional DETR, an improved DETR with fast training convergence, which presented box queries (originally called spatial queries) for the internal decoder layers, we reformulate the object query into the format of a box query, which is a composition of the embeddings of the reference point and the transformation of the box with respect to the reference point. This...

10.48550/arxiv.2207.08914 preprint EN cc-by arXiv (Cornell University) 2022-01-01

We present a strong object detector with encoder-decoder pretraining and finetuning. Our method, called Group DETR v2, is built upon a vision transformer encoder ViT-Huge, a DETR variant DINO, and an efficient DETR training method Group DETR. The training process consists of self-supervised pretraining and finetuning a ViT-Huge encoder on ImageNet-1K, pretraining and finetuning the detector on Object365, and finally finetuning it on COCO. Group DETR v2 achieves 64.5 mAP on COCO test-dev, and establishes a new SoTA on the leaderboard...

10.48550/arxiv.2211.03594 preprint EN other-oa arXiv (Cornell University) 2022-01-01

This paper investigates the potential of enhancing Neural Radiance Fields (NeRF) with semantics to expand their applications. Although NeRF has been proven useful in real-world applications like VR and digital creation, the lack of semantics hinders interaction with objects in complex scenes. We propose to imitate the backbone feature of off-the-shelf perception models to achieve zero-shot semantic segmentation with NeRF. Our framework reformulates the segmentation process by directly rendering semantic features and only applying the decoder from perception models. This eliminates...

10.48550/arxiv.2305.16233 preprint EN cc-by arXiv (Cornell University) 2023-01-01

While dynamic Neural Radiance Fields (NeRF) have shown success in high-fidelity 3D modeling of talking portraits, the slow training and inference speed severely obstruct their potential usage. In this paper, we propose an efficient NeRF-based framework that enables real-time synthesis of talking portraits and faster convergence by leveraging the recent grid-based NeRF. Our key insight is to decompose the inherently high-dimensional talking portrait representation into three low-dimensional feature grids. Specifically,...

10.48550/arxiv.2211.12368 preprint EN other-oa arXiv (Cornell University) 2022-01-01