NFDI4DS | UHH-SEMS - Publication Details

Haocheng Feng

ORCID: 0000-0002-7567-3053

Contact & Profiles

OpenAlex author ID: A5113096046

Research Areas

Face recognition and analysis
Computer Graphics and Visualization Techniques
Advanced Vision and Imaging
Advanced Neural Network Applications
Human Pose and Action Recognition
Speech and Audio Processing
Domain Adaptation and Few-Shot Learning
Generative Adversarial Networks and Image Synthesis
Advanced Image and Video Retrieval Techniques
3D Shape Modeling and Analysis
Human Motion and Animation
Music and Audio Processing
Image Processing and 3D Reconstruction
Autonomous Vehicle Technology and Safety
Hand Gesture Recognition Systems
Robotics and Sensor-Based Localization
Video Surveillance and Tracking Methods
Natural Language Processing Techniques
Visual Attention and Saliency Detection
Biometric Identification and Security
Stroke Rehabilitation and Recovery
Image Retrieval and Classification Techniques
Induction Heating and Inverter Technology
COVID-19 diagnosis using AI
Anomaly Detection Techniques and Applications

Baidu (China)
2023-2024

Vision Technology (United States)
2023-2024

Xi'an Jiaotong University
2023

Japan Advanced Institute of Science and Technology
2022

Beihang University
2010-2011

Group DETR: Fast DETR Training with Group-Wise One-to-Many Assignment

OPENALEX - Publications

Qiang Chen Xiaokang Chen Jian Wang Shan Zhang Kun Yao and 5 more

Detection transformer (DETR) relies on one-to-one assignment, assigning one ground-truth object to one prediction, for end-to-end detection without NMS post-processing. It is known that one-to-many assignment, assigning one ground-truth object to multiple predictions, succeeds in detection methods such as Faster R-CNN and FCOS. The naive one-to-many assignment does not work for DETR, and it remains challenging to apply one-to-many assignment to DETR training. In this paper, we introduce Group DETR, a simple yet efficient DETR training approach that introduces a group-wise way for one-to-many assignment. This involves using...

10.1109/iccv51070.2023.00610 article EN 2023 IEEE/CVF International Conference on Computer Vision (ICCV) 2023-10-01

RTFormer: Efficient Design for Real-Time Semantic Segmentation with Transformer

Jian Wang Chenhui Gou Qiman Wu Haocheng Feng Junyu Han and 2 more

Recently, transformer-based networks have shown impressive results in semantic segmentation. Yet for real-time semantic segmentation, pure CNN-based approaches still dominate this field, due to the time-consuming computation mechanism of transformers. We propose RTFormer, an efficient dual-resolution transformer for real-time semantic segmentation, which achieves a better trade-off between performance and efficiency than CNN-based models. To achieve high inference efficiency on GPU-like devices, our RTFormer leverages GPU-Friendly Attention with...

10.48550/arxiv.2210.07124 preprint EN cc-by arXiv (Cornell University) 2022-01-01

StyleSync: High-Fidelity Generalized and Personalized Lip Sync in Style-Based Generator

Jiazhi Guan Zhanwang Zhang Hang Zhou Tianshu Hu Kaisiyuan Wang and 6 more

Despite recent advances in syncing lip movements with any audio waves, current methods still struggle to balance generation quality and the model's generalization ability. Previous studies either require long-term data for training or produce a similar movement pattern on all subjects with low quality. In this paper, we propose StyleSync, an effective framework that enables high-fidelity lip synchronization. We identify that a style-based generator would sufficiently enable such a charming property on both...

10.1109/cvpr52729.2023.00151 article EN 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023-06-01

HD-Fusion: Detailed Text-to-3D Generation Leveraging Multiple Noise Estimation

Jinbo Wu Xiaobo Gao Xing Liu Zhengyang Shen Chen Zhao and 3 more

In this paper, we study Text-to-3D content generation leveraging 2D diffusion priors to enhance the quality and detail of the generated 3D models. Recent progress [11] in text-to-3D has shown that employing high-resolution (e.g., 512 × 512) renderings can lead to the production of high-quality 3D models using latent diffusion priors. To enable rendering at even higher resolutions, which has the potential to further augment the quality and detail of the models, we propose a novel approach that combines multiple noise estimation processes with a pretrained 2D diffusion prior....

10.1109/wacv57701.2024.00317 article EN 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2024-01-03

Multi-Domain Incremental Learning for Face Presentation Attack Detection

Keyao Wang Guosheng Zhang Haixiao Yue Ajian Liu Gang Zhang and 4 more

Previous face Presentation Attack Detection (PAD) methods aim to improve the effectiveness of cross-domain tasks. However, in real-world scenarios, the original training data of the pre-trained model is not available due to data privacy or other reasons. Under these constraints, general methods for fine-tuning on single-target domain data may lose previously learned knowledge, leading to a catastrophic forgetting problem. To address these issues, we propose a multi-domain incremental learning (MDIL) method for PAD, which only learns...

10.1609/aaai.v38i6.28359 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2024-03-24

PSVT: End-to-End Multi-Person 3D Pose and Shape Estimation with Progressive Video Transformers

Zhongwei Qiu Qiansheng Yang Jian Wang Haocheng Feng Junyu Han and 4 more

Existing methods of multi-person video 3D human Pose and Shape Estimation (PSE) typically adopt a two-stage strategy, which first detects human instances in each frame and then performs single-person PSE with a temporal model. However, the global spatio-temporal context among spatial instances cannot be captured. In this paper, we propose a new end-to-end multi-person 3D Pose and Shape Estimation framework with progressive Video Transformer, termed PSVT. In PSVT, a spatio-temporal encoder (STE) captures the global feature dependencies among spatial objects. Then, a spatio-temporal pose decoder (STPD) and shape...

10.1109/cvpr52729.2023.02036 article EN 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023-06-01

Cyclically Disentangled Feature Translation for Face Anti-spoofing

Haixiao Yue Keyao Wang Guosheng Zhang Haocheng Feng Junyu Han and 2 more

Current domain adaptation methods for face anti-spoofing leverage labeled source domain data and unlabeled target domain data to obtain a promising generalizable decision boundary. However, it is usually difficult for these methods to achieve a perfect domain-invariant liveness feature disentanglement, which may degrade the final classification performance through domain differences in illumination, face category, spoof type, etc. In this work, we tackle cross-scenario face anti-spoofing by proposing a novel domain adaptation method called cyclically disentangled feature translation network...

10.1609/aaai.v37i3.25443 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2023-06-26

Singular Value Fine-tuning: Few-shot Segmentation requires Few-parameters Fine-tuning

Yanpeng Sun Qiang Chen Xiangyu He Jian Wang Haocheng Feng and 5 more

Freezing the pre-trained backbone has become a standard paradigm to avoid overfitting in few-shot segmentation. In this paper, we rethink the paradigm and explore a new regime: fine-tuning a small part of the parameters in the backbone. We present a solution to overcome the overfitting problem, leading to better model generalization on learning novel classes. Our method decomposes backbone parameters into three successive matrices via Singular Value Decomposition (SVD), then only fine-tunes the singular values and keeps the others frozen. The above design allows...

10.48550/arxiv.2206.06122 preprint EN other-oa arXiv (Cornell University) 2022-01-01
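
The decomposition described in the abstract above (split a weight matrix into three factors via SVD, then update only the singular values) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the matrix `W`, the placeholder gradient `grad_s`, the learning rate `lr`, and the helper `rebuild` are all hypothetical names chosen for illustration, and a real backbone would apply the idea to its own layers with gradients from a training loss.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for one pre-trained weight matrix of a backbone layer.
W = rng.standard_normal((4, 3))

# Decompose W into three successive matrices: U, diag(s), Vt.
U, s, Vt = np.linalg.svd(W, full_matrices=False)


def rebuild(U, s, Vt):
    """Recompose a weight matrix from its SVD factors."""
    return U @ np.diag(s) @ Vt


# Assumed fine-tuning step: only the singular values move; U and Vt stay frozen.
grad_s = rng.standard_normal(s.shape)  # placeholder gradient, not a real loss gradient
lr = 0.1
s_tuned = s - lr * grad_s

W_tuned = rebuild(U, s_tuned, Vt)

trainable = s.size   # parameters that are updated
total = W.size       # parameters in the original matrix
print(f"trainable parameters: {trainable} of {total}")
```

In this toy setting only the entries of `s` are updated, which is what keeps the number of trainable parameters small relative to the full matrix.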

Wind energy harvesting inspired by Palm leaf flutter: Observation, mechanism and experiment

Kun Wang Wei Xia Jiayuan Ren Weiwei Yu Haocheng Feng and 1 more

10.1016/j.enconman.2023.116971 article EN Energy Conversion and Management 2023-04-01

VDG: Vision-Only Dynamic Gaussian for Driving Simulation

Hao Li Jing‐Feng Li Dingwen Zhang Chenming Wu Jieqi Shi and 5 more

10.1109/lra.2025.3555938 article EN IEEE Robotics and Automation Letters 2025-01-01

Group DETR v2: Strong Object Detector with Encoder-Decoder Pretraining

Qiang Chen Jian Wang Chuchu Han Shan Zhang Zexian Li and 10 more

We present a strong object detector with encoder-decoder pretraining and finetuning. Our method, called Group DETR v2, is built upon a vision transformer encoder ViT-Huge [dosovitskiy2020image], a DETR variant DINO [zhang2022dino], and an efficient DETR training method Group DETR [chen2022group]. The training process consists of self-supervised pretraining and finetuning a ViT-Huge encoder on ImageNet-1K, pretraining the detector on Objects365, and finally finetuning it on COCO. Group DETR v2 achieves 64.5 mAP on COCO test-dev, and establishes a new SoTA on the COCO leaderboard...

10.48550/arxiv.2211.03594 preprint EN other-oa arXiv (Cornell University) 2022-01-01

VideoGen: A Reference-Guided Latent Diffusion Approach for High Definition Text-to-Video Generation

Xin Li Wenqing Chu Ye Wu Weihang Yuan Fanglong Liu and 5 more

In this paper, we present VideoGen, a text-to-video generation approach, which can generate a high-definition video with high frame fidelity and strong temporal consistency using reference-guided latent diffusion. We leverage an off-the-shelf text-to-image generation model, e.g., Stable Diffusion, to generate an image with high content quality from the text prompt, as a reference image to guide video generation. Then, we introduce an efficient cascaded latent diffusion module conditioned on both the reference image and the text prompt, for generating latent video representations, followed by a flow-based...

10.48550/arxiv.2309.00398 preprint EN other-oa arXiv (Cornell University) 2023-01-01

Graph Contrastive Learning for Skeleton-based Action Recognition

Xiaohu Huang Hao Zhou Bin Feng Xinggang Wang Wenyu Liu and 5 more

In the field of skeleton-based action recognition, current top-performing graph convolutional networks (GCNs) exploit intra-sequence context to construct adaptive graphs for feature aggregation. However, we argue that such context is still local since the rich cross-sequence relations have not been explicitly investigated. In this paper, we propose a graph contrastive learning framework for skeleton-based action recognition (SkeletonGCL) to explore the global context across all sequences. Specifically, SkeletonGCL associates...

10.48550/arxiv.2301.10900 preprint EN other-oa arXiv (Cornell University) 2023-01-01

GGRt: Towards Generalizable 3D Gaussians without Pose Priors in Real-Time

Hao Li Yuanyuan Gao Dingwen Zhang Chenming Wu Yalun Dai and 5 more

This paper presents GGRt, a novel approach to generalizable novel view synthesis that alleviates the need for real camera poses, complexity in processing high-resolution images, and lengthy optimization processes, thus facilitating stronger applicability of 3D Gaussian Splatting (3D-GS) in real-world scenarios. Specifically, we design a novel joint learning framework that consists of an Iterative Pose Optimization Network (IPO-Net) and a Generalizable 3D-Gaussians (G-3DG) model. With the joint learning mechanism, the proposed framework can inherently...

10.48550/arxiv.2403.10147 preprint EN arXiv (Cornell University) 2024-03-15

A Knowledge-based and Extensible Aircraft Conceptual Design Environment

Haocheng Feng M. X. Luo Hu Liu Zhe Wu

Design knowledge and experience are the bases for carrying out aircraft conceptual design tasks, due to the high complexity and integration of the tasks during this phase. When carrying out the same task, different designers may need individual strategies to fulfill their own demands. A knowledge-based and extensible method for building aircraft conceptual design systems is studied considering the above requirements. Based on the theory, a design environment with an open architecture, called the knowledge-based and extensible aircraft conceptual design environment (KEACDE), is built to enable designers to wrap add-on extensions and make their own aircraft conceptual design systems. The...

10.1016/s1000-9361(11)60083-6 article EN cc-by-nc-nd Chinese Journal of Aeronautics 2011-12-01

A Web-based Software Framework for Aircraft Design Modeling, Analysis and Multidisciplinary Optimization

Haocheng Feng Luo Mingqiang Liu Hu Zhe Wu

Design knowledge and experience are the basis for carrying out aircraft conceptual design tasks, due to the high complexity and integration of the work involved in this phase. Aircraft designers need a computer-aided design package to help them carry out design tasks easily with their individual strategies. This paper presents a web-based software framework called Aircraft Design Pad (ADP). The architecture of ADP is open so that users can wrap add-on extensions and make their own aircraft design system. The development aspects of ADP are discussed, and a case is presented to demonstrate its usability and effectiveness.

10.1016/j.proenv.2011.12.046 article EN Procedia Environmental Sciences 2011-01-01

Knowledge Distillation for Detection Transformer with Consistent Distillation Points Sampling

Yu Wang Xin Li Shengzhao Wen Fukui Yang Wanping Zhang and 4 more

DETR is a novel end-to-end transformer architecture object detector, which significantly outperforms classic detectors when scaling up the model size. In this paper, we focus on the compression of DETR with knowledge distillation. While knowledge distillation has been well-studied in classic detectors, there is a lack of research on how to make it work effectively on DETR. We first provide experimental and theoretical analysis to point out that the main challenge in DETR distillation is the lack of consistent distillation points. Distillation points refer to the corresponding inputs...

10.48550/arxiv.2211.08071 preprint EN cc-by arXiv (Cornell University) 2022-01-01

Splatter-360: Generalizable 360° Gaussian Splatting for Wide-baseline Panoramic Images

Zheng Chen Chenming Wu Zhelun Shen Zhao Chen Weicai Ye and 3 more

Wide-baseline panoramic images are frequently used in applications like VR and simulations to minimize capturing labor costs and storage needs. However, synthesizing novel views from these panoramic images in real time remains a significant challenge, especially due to panoramic imagery's high resolution and inherent distortions. Although existing 3D Gaussian splatting (3DGS) methods can produce photo-realistic views under narrow baselines, they often overfit the training views when dealing with wide-baseline panoramic images due to the difficulty in learning precise...

10.48550/arxiv.2412.06250 preprint EN arXiv (Cornell University) 2024-12-09

GEA: Reconstructing Expressive 3D Gaussian Avatar from Monocular Video

Xinqi Liu Chenming Wu Xing Liu Jialun Liu Jinbo Wu and 4 more

This paper presents GEA, a novel method for creating expressive 3D avatars with high-fidelity reconstructions of body and hands based on 3D Gaussians. The key contributions are twofold. First, we design a two-stage pose estimation method to obtain an accurate SMPL-X pose from input images, providing a correct mapping between the pixels of a training image and the SMPL-X model. It uses an attention-aware network and an optimization scheme to align the normal and silhouette between the estimated SMPL-X body and the real body in the image. Second, we propose an iterative re-initialization strategy...

10.48550/arxiv.2402.16607 preprint EN arXiv (Cornell University) 2024-02-26

TexRO: Generating Delicate Textures of 3D Models by Recursive Optimization

Jinbo Wu Xing Liu Chenming Wu Xiaobo Gao Jialun Liu and 5 more

This paper presents TexRO, a novel method for generating delicate textures of a known 3D mesh by optimizing its UV texture. The key contributions are twofold. We propose an optimal viewpoint selection strategy that finds the smallest set of viewpoints covering all the faces of a mesh. Our viewpoint selection strategy guarantees the completeness of a generated result. We propose a recursive optimization pipeline that optimizes a UV texture at increasing resolutions, with an adaptive denoising method that re-uses existing textures for new texture generation. Through extensive...

10.48550/arxiv.2403.15009 preprint EN arXiv (Cornell University) 2024-03-22

Dense Connector for MLLMs

Huanjin Yao Wenhao Wu Taojiannan Yang Yuxin Song Mengxi Zhang and 5 more

Do we fully leverage the potential of the visual encoder in Multimodal Large Language Models (MLLMs)? The recent outstanding performance of MLLMs in multimodal understanding has garnered broad attention from both academia and industry. In the current MLLM rat race, the focus seems to be predominantly on the linguistic side. We witness the rise of larger and higher-quality instruction datasets, as well as the involvement of larger-sized LLMs. Yet, scant attention has been directed towards the visual signals utilized by MLLMs, often assumed to be the final high-level...

10.48550/arxiv.2405.13800 preprint EN arXiv (Cornell University) 2024-05-22

OpenGaussian: Towards Point-Level 3D Gaussian-based Open Vocabulary Understanding

Yanmin Wu Jiarui Meng Haijie Li Chenming Wu Yahao Shi and 6 more

This paper introduces OpenGaussian, a method based on 3D Gaussian Splatting (3DGS) capable of 3D point-level open vocabulary understanding. Our primary motivation stems from observing that existing 3DGS-based open vocabulary methods mainly focus on 2D pixel-level parsing. These methods struggle with 3D point-level tasks due to weak feature expressiveness and inaccurate 2D-3D feature associations. To ensure robust feature presentation and 3D point-level understanding, we first employ SAM masks without cross-frame associations to train instance features with 3D consistency. These features exhibit...

10.48550/arxiv.2406.02058 preprint EN arXiv (Cornell University) 2024-06-04

VDG: Vision-Only Dynamic Gaussian for Driving Simulation

Hao Li Jing‐Feng Li Dingwen Zhang Chenming Wu Jieqi Shi and 5 more

Dynamic Gaussian splatting has led to impressive scene reconstruction and image synthesis advances in novel views. Existing methods, however, heavily rely on pre-computed poses and Gaussian initialization by Structure from Motion (SfM) algorithms or expensive sensors. For the first time, this paper addresses this issue by integrating self-supervised VO into our pose-free dynamic Gaussian method (VDG) to boost pose and depth initialization and static-dynamic decomposition. Moreover, VDG can work with only RGB image input and construct dynamic scenes at a faster...

10.48550/arxiv.2406.18198 preprint EN arXiv (Cornell University) 2024-06-26

XLD: A Cross-Lane Dataset for Benchmarking Novel Driving View Synthesis

Hao Li Ming Yuan Yan Zhang Chenming Wu Chen Zhao and 5 more

Thoroughly testing autonomy systems is crucial in the pursuit of safe autonomous driving vehicles. It necessitates creating safety-critical scenarios that go beyond what can be safely collected from real-world data, as many of these scenarios occur infrequently on public roads. However, the evaluation of most existing NVS methods relies on sporadic sampling of image frames from the training data, comparing the rendered images with ground truth images using metrics. Unfortunately, this evaluation protocol falls short of meeting the actual requirements...

10.48550/arxiv.2406.18360 preprint EN arXiv (Cornell University) 2024-06-26
