NFDI4DS | UHH-SEMS - Publication Details

STEPS: Joint Self-supervised Nighttime Image Enhancement and Depth Estimation

OPENALEX - Publications

Yupeng Zheng Chengliang Zhong Pengfei Li Huan-ang Gao Yuhang Zheng and 6 more

Self-supervised depth estimation has drawn a lot of attention recently, as it can promote the 3D sensing capabilities of self-driving vehicles. However, it intrinsically relies upon the photometric consistency assumption, which hardly holds during nighttime. Although various supervised nighttime image enhancement methods have been proposed, their generalization performance in challenging driving scenarios is not satisfactory. To this end, we propose the first method that jointly learns a nighttime image enhancer and...

10.1109/icra48891.2023.10160708 article EN 2023-05-29

P-MapNet: Far-seeing Map Generator Enhanced by both SDMap and HDMap Priors

Zhou Jiang Zhenxin Zhu Pengfei Li Huan-ang Gao Tianyuan Yuan and 3 more

10.1109/lra.2024.3447450 article EN cc-by-nc-nd IEEE Robotics and Automation Letters 2024-10-01

AVD2: Accident Video Diffusion for Accident Video Description

Cheng Li Kaishang Zhou Tong Liu Yu Wang Mengmeng Zhuang and 3 more

Traffic accidents present complex challenges for autonomous driving, often featuring unpredictable scenarios that hinder accurate system interpretation and responses. Nonetheless, prevailing methodologies fall short in elucidating the causes of accidents and proposing preventive measures due to the paucity of training data specific to accident scenarios. In this work, we introduce AVD2 (Accident Video Diffusion for Accident Video Description), a novel framework that enhances accident scene understanding by generating accident videos aligned with...

10.48550/arxiv.2502.14801 preprint EN arXiv (Cornell University) 2025-02-20

FB-4D: Spatial-Temporal Coherent Dynamic 3D Content Generation with Feature Banks

Jinwei Li Huan-ang Gao Wenyi Li Heng Chi Chenyu Liu and 11 more

With the rapid advancements in diffusion models and 3D generation techniques, dynamic 3D content generation has become a crucial research area. However, achieving high-fidelity 4D (dynamic 3D) generation with strong spatial-temporal consistency remains a challenging task. Inspired by recent findings that pretrained diffusion features capture rich correspondences, we propose FB-4D, a novel framework that integrates a Feature Bank mechanism to enhance both spatial and temporal consistency in generated frames. In FB-4D, we store features extracted from previous frames and fuse...

10.48550/arxiv.2503.20784 preprint EN arXiv (Cornell University) 2025-03-26

Diffusion-based Visual Anagram as Multi-task Learning

Zhiyuan Xu Pei‐Jer Chen Huan-ang Gao Weiyan Zhao Guiyu Zhang and 1 more

10.1109/wacv61041.2025.00099 article EN 2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2025-02-26

Unsupervised Road Anomaly Detection with Language Anchors

Beiwen Tian M. Liu Huan-ang Gao Pengfei Li Hao Zhao and 1 more

Road anomaly detection is critical to safe autonomous driving, because current road scene understanding models are usually trained in a closed-set manner and fail to identify unknown objects. What's worse, it is difficult, if not impossible, to collect a large-scale dataset with anomaly annotations. So this paper studies unsupervised road anomaly detection, which finds anomaly regions using road parsing logits solely. While former methods depend on the weights learned from the closed training set as anchors for logit generation, we resort...

10.1109/icra48891.2023.10160470 article EN 2023-05-29

From Semi-supervised to Omni-supervised Room Layout Estimation Using Point Clouds

Huan-ang Gao Beiwen Tian Pengfei Li Xiaoxue Chen Hao Zhao and 3 more

Room layout estimation is a long-existing robotic vision task that benefits both environment sensing and motion planning. However, layout estimation using point clouds (PCs) still suffers from data scarcity due to annotation difficulty. As such, we address the semi-supervised setting of this task based upon the idea of model exponential moving averaging. But adapting this scheme to the state-of-the-art (SOTA) solution for PC-based layout estimation is not straightforward. To this end, we define a quad set matching strategy and several consistency losses based upon metrics...

10.1109/icra48891.2023.10161273 article EN 2023-05-29

DQS3D: Densely-matched Quantization-aware Semi-supervised 3D Detection

Huan-ang Gao Beiwen Tian Pengfei Li Hao Zhao Guyue Zhou

In this paper, we study the problem of semi-supervised 3D object detection, which is of great importance considering the high annotation cost for cluttered 3D indoor scenes. We resort to the robust and principled framework of self-teaching, which has triggered notable progress for semi-supervised learning recently. While this paradigm is natural for image-level or pixel-level prediction, adapting it to the detection problem is challenged by the issue of proposal matching. Prior methods are based upon two-stage pipelines, matching heuristically selected proposals...

10.1109/iccv51070.2023.02002 article EN 2023 IEEE/CVF International Conference on Computer Vision (ICCV) 2023-10-01

Latency-aware Road Anomaly Segmentation in Videos: A Photorealistic Dataset and New Metrics

Beiwen Tian Huan-ang Gao Leiyao Cui Yupeng Zheng Lan Luo and 4 more

In the past several years, road anomaly segmentation has been actively explored in academia and is drawing growing attention in industry. The rationale behind is straightforward: if the autonomous car can brake before hitting an anomalous object, safety is promoted. However, this rationale naturally calls for a temporally informed setting, while existing methods and benchmarks are designed in an unrealistic frame-wise manner. To bridge this gap, we contribute the first video anomaly segmentation dataset for autonomous driving. Since placing various anomalous objects on busy roads...

10.48550/arxiv.2401.04942 preprint EN cc-by-nc-nd arXiv (Cornell University) 2024-01-01

SA-GS: Scale-Adaptive Gaussian Splatting for Training-Free Anti-Aliasing

Xiaowei Song Jv Zheng Shiran Yuan Huan-ang Gao J. Y. Zhao and 3 more

In this paper, we present a Scale-adaptive method for Anti-aliasing Gaussian Splatting (SA-GS). While the state-of-the-art method Mip-Splatting needs modifying the training procedure of Gaussian splatting, our method functions at test-time and is training-free. Specifically, SA-GS can be applied to any pretrained Gaussian splatting field as a plugin to significantly improve the field's anti-aliasing performance. The core technique is to apply 2D scale-adaptive filters to each Gaussian during test time. As pointed out by Mip-Splatting, observing Gaussians...

10.48550/arxiv.2403.19615 preprint EN arXiv (Cornell University) 2024-03-28

SCP-Diff: Photo-Realistic Semantic Image Synthesis with Spatial-Categorical Joint Prior

Huan-ang Gao Mingju Gao Jiaju Li Wenyi Li Rong Zhi and 2 more

Semantic image synthesis (SIS) shows good promise for sensor simulation. However, current best practices in this field, based on GANs, have not yet reached the desired level of quality. As latent diffusion models make significant strides in image generation, we are prompted to evaluate ControlNet, a notable method for its dense control capabilities. Our investigation uncovered two primary issues with its results: the presence of weird sub-structures within large semantic areas and the misalignment of content with the semantic mask....

10.48550/arxiv.2403.09638 preprint EN arXiv (Cornell University) 2024-03-14

P-MapNet: Far-seeing Map Generator Enhanced by both SDMap and HDMap Priors

Zhou Jiang Zhenxin Zhu Pengfei Li Huan-ang Gao Tianyuan Yuan and 3 more

Autonomous vehicles are gradually entering city roads today, with the help of high-definition maps (HDMaps). However, the reliance on HDMaps prevents autonomous vehicles from stepping into regions without this expensive digital infrastructure. This fact drives many researchers to study online HDMap generation algorithms, but the performance of these algorithms at far regions is still unsatisfying. We present P-MapNet, in which the letter P highlights the fact that we focus on incorporating map priors to improve model performance....

10.48550/arxiv.2403.10521 preprint EN arXiv (Cornell University) 2024-03-15

Ultraman: Single Image 3D Human Reconstruction with Ultra Speed and Detail

Mingjin Chen Junhao Chen Xiaojun Ye Huan-ang Gao Xiaoxue Chen and 2 more

3D human body reconstruction has been a challenge in the field of computer vision. Previous methods are often time-consuming and struggle to capture the detailed appearance of the human body. In this paper, we propose a new method called Ultraman for fast reconstruction of textured 3D human models from a single image. Compared to existing techniques, Ultraman greatly improves the reconstruction speed and accuracy while preserving high-quality texture details. We present a set of new frameworks for human reconstruction consisting of three parts: geometric reconstruction, texture generation, and texture mapping. Firstly,...

10.48550/arxiv.2403.12028 preprint EN arXiv (Cornell University) 2024-03-18

Rip-NeRF: Anti-aliasing Radiance Fields with Ripmap-Encoded Platonic Solids

J Liu Wenbo Hu Zhuo Yang Jianteng Chen G.-H. Wang and 4 more

Despite significant advancements in Neural Radiance Fields (NeRFs), the renderings may still suffer from aliasing and blurring artifacts, since it remains a fundamental challenge to effectively and efficiently characterize anisotropic areas induced by the cone-casting procedure. This paper introduces a Ripmap-Encoded Platonic Solid representation to precisely and efficiently featurize 3D anisotropic areas, achieving high-fidelity anti-aliasing renderings. Central to our approach are two key components: Platonic Solid Projection and Ripmap encoding. The...

10.1145/3641519.3657402 preprint EN 2024-07-12

FairDiff: Fair Segmentation with Point-Image Diffusion

Wenyi Li Haoran Xu Guiyu Zhang Huan-ang Gao Mingju Gao and 2 more

Fairness is an important topic for medical image analysis, driven by the challenge of unbalanced training data among diverse target groups and the societal demand for equitable medical quality. In response to this issue, our research adopts a data-driven strategy: enhancing data balance by integrating synthetic images. However, in terms of generating synthetic images, previous works either lack paired labels or fail to precisely control the boundaries of synthetic images to be aligned with those labels. To address this, we formulate the problem in a joint...

10.48550/arxiv.2407.06250 preprint EN arXiv (Cornell University) 2024-07-08

RGM: Reconstructing High-fidelity 3D Car Assets with Relightable 3D-GS Generative Model from a Single Image

Xiaoxue Chen Jv Zheng Hao Huang Haoran Xu Weihao Gu and 6 more

The generation of high-quality 3D car assets is essential for various applications, including video games, autonomous driving, and virtual reality. Current 3D generation methods that use NeRF or 3D-GS as representations for 3D objects generate a Lambertian object under fixed lighting and lack separate modeling of material and global illumination. As a result, the generated assets are unsuitable for relighting under varying lighting conditions, limiting their applicability in downstream tasks. To address this challenge, we propose a novel...

10.48550/arxiv.2410.08181 preprint EN arXiv (Cornell University) 2024-10-10

Ctrl-U: Robust Conditional Image Generation via Uncertainty-aware Reward Modeling

Guiyu Zhang Huan-ang Gao Zijian Jiang Hao Zhao Zhedong Zheng

In this paper, we focus on the task of conditional image generation, where an image is synthesized according to user instructions. The critical challenge underpinning this task is ensuring both the fidelity of the generated images and their semantic alignment with the provided conditions. To tackle this issue, previous studies have employed supervised perceptual losses derived from pre-trained models, i.e., reward models, to enforce alignment between the condition and the generated result. However, we observe one inherent shortcoming: considering the diversity of synthesized images, the reward model...

10.48550/arxiv.2410.11236 preprint EN arXiv (Cornell University) 2024-10-14

Dual-frame Fluid Motion Estimation with Test-time Optimization and Zero-divergence Loss

Yifei Zhang Huan-ang Gao Zhou Jiang Hao Zhao

3D particle tracking velocimetry (PTV) is a key technique for analyzing turbulent flow, one of the most challenging computational problems of our century. At the core of 3D PTV is the dual-frame fluid motion estimation algorithm, which tracks particles across two consecutive frames. Recently, deep learning-based methods have achieved impressive accuracy in dual-frame fluid motion estimation; however, they heavily depend on large volumes of labeled data. In this paper, we introduce a new method that is completely self-supervised and notably...

10.48550/arxiv.2410.11934 preprint EN arXiv (Cornell University) 2024-10-15

Hint-AD: Holistically Aligned Interpretability in End-to-End Autonomous Driving

Kefeng Ding Boyuan Chen Yuchen Su Huan-ang Gao Bu Jin and 6 more

End-to-end architectures in autonomous driving (AD) face a significant challenge in interpretability, impeding human-AI trust. Human-friendly natural language has been explored for tasks such as driving explanation and 3D captioning. However, previous works primarily focused on the paradigm of declarative interpretability, where the natural language interpretations are not grounded in the intermediate outputs of AD systems, making the interpretations only declarative. In contrast, aligned interpretability establishes a connection between language and the intermediate outputs of AD systems. Here we introduce...

10.48550/arxiv.2409.06702 preprint EN arXiv (Cornell University) 2024-09-10

Diffusion-based Visual Anagram as Multi-task Learning

Zhiyuan Xu Pei‐Jer Chen Huan-ang Gao Weiyan Zhao Guiyu Zhang and 1 more

Visual anagrams are images that change appearance upon transformation, like flipping or rotation. With the advent of diffusion models, generating such optical illusions can be achieved by averaging noise across multiple views during the reverse denoising process. However, we observe two critical failure modes in this approach: (i) concept segregation, where concepts in different views are independently generated, which cannot be considered a true anagram, and (ii) concept domination, where certain concepts overpower others. In this work,...

10.48550/arxiv.2412.02693 preprint EN arXiv (Cornell University) 2024-12-03

From Semi-supervised to Omni-supervised Room Layout Estimation Using Point Clouds

Huan-ang Gao Beiwen Tian Pengfei Li Xiaoxue Chen Hao Zhao and 3 more

Room layout estimation is a long-existing robotic vision task that benefits both environment sensing and motion planning. However, layout estimation using point clouds (PCs) still suffers from data scarcity due to annotation difficulty. As such, we address the semi-supervised setting of this task based upon the idea of model exponential moving averaging. But adapting this scheme to the state-of-the-art (SOTA) solution for PC-based layout estimation is not straightforward. To this end, we define a quad set matching strategy and several consistency losses based upon metrics...

10.48550/arxiv.2301.13865 preprint EN cc-by-sa arXiv (Cornell University) 2023-01-01

STEPS: Joint Self-supervised Nighttime Image Enhancement and Depth Estimation

Yupeng Zheng Chengliang Zhong Pengfei Li Huan-ang Gao Yuhang Zheng and 6 more

Self-supervised depth estimation has drawn a lot of attention recently, as it can promote the 3D sensing capabilities of self-driving vehicles. However, it intrinsically relies upon the photometric consistency assumption, which hardly holds during nighttime. Although various supervised nighttime image enhancement methods have been proposed, their generalization performance in challenging driving scenarios is not satisfactory. To this end, we propose the first method that jointly learns a nighttime image enhancer and a depth estimator,...

10.48550/arxiv.2302.01334 preprint EN other-oa arXiv (Cornell University) 2023-01-01

DQS3D: Densely-matched Quantization-aware Semi-supervised 3D Detection

Huan-ang Gao Beiwen Tian Pengfei Li Hao Zhao Guyue Zhou

In this paper, we study the problem of semi-supervised 3D object detection, which is of great importance considering the high annotation cost for cluttered 3D indoor scenes. We resort to the robust and principled framework of self-teaching, which has triggered notable progress for semi-supervised learning recently. While this paradigm is natural for image-level or pixel-level prediction, adapting it to the detection problem is challenged by the issue of proposal matching. Prior methods are based upon two-stage pipelines, matching heuristically selected...

10.48550/arxiv.2304.13031 preprint EN cc-by-nc-sa arXiv (Cornell University) 2023-01-01