- Advanced Vision and Imaging
- Generative Adversarial Networks and Image Synthesis
- Computer Graphics and Visualization Techniques
- Image Retrieval and Classification Techniques
- Adversarial Robustness in Machine Learning
- Anomaly Detection Techniques and Applications
- 3D Shape Modeling and Analysis
- Image Processing Techniques and Applications
- Advanced Image and Video Retrieval Techniques
- Robotics and Sensor-Based Localization
- 3D Surveying and Cultural Heritage
- Advanced Neural Network Applications
- Image Enhancement Techniques
- Geographic Information Systems Studies
- Video Surveillance and Tracking Methods
- Model Reduction and Neural Networks
- Flow Measurement and Analysis
- Industrial Vision Systems and Defect Detection
- Face and Expression Recognition
- Human Pose and Action Recognition
- Fluid Dynamics and Turbulent Flows
- Advanced Image Processing Techniques
- Computer Science and Engineering
- Advanced Optical Sensing Technologies
- Machine Learning in Healthcare
Tsinghua University
2023-2024
University of Chinese Academy of Sciences
2023
Self-supervised depth estimation has drawn a lot of attention recently, as it can promote the 3D sensing capabilities of self-driving vehicles. However, it intrinsically relies upon the photometric consistency assumption, which hardly holds during nighttime. Although various supervised nighttime image enhancement methods have been proposed, their generalization performance in challenging driving scenarios is not satisfactory. To this end, we propose the first method that jointly learns a nighttime enhancer and...
Traffic accidents present complex challenges for autonomous driving, often featuring unpredictable scenarios that hinder accurate system interpretation and responses. Nonetheless, prevailing methodologies fall short in elucidating the causes of accidents and proposing preventive measures, due to the paucity of training data specific to accident scenarios. In this work, we introduce AVD2 (Accident Video Diffusion for Accident Description), a novel framework that enhances scene understanding by generating videos aligned with...
With the rapid advancements in diffusion models and 3D generation techniques, dynamic content generation has become a crucial research area. However, achieving high-fidelity 4D (dynamic 3D) generation with strong spatial-temporal consistency remains a challenging task. Inspired by recent findings that pretrained features capture rich correspondences, we propose FB-4D, a novel framework that integrates a Feature Bank mechanism to enhance both the spatial and temporal consistency of generated frames. In FB-4D, we store features extracted from previous frames and fuse...
Road anomaly detection is critical to safe autonomous driving, because current road scene understanding models are usually trained in a closed-set manner and fail to identify unknown objects. What's worse, it is difficult, if not impossible, to collect a large-scale dataset with annotations. So this paper studies unsupervised anomaly detection, which finds anomalous regions using parsing logits only. While former methods depend on the weights learned from the closed training set as anchors for logit generation, we resort...
Room layout estimation is a long-existing robotic vision task that benefits both environment sensing and motion planning. However, layout estimation using point clouds (PCs) still suffers from data scarcity due to annotation difficulty. As such, we address the semi-supervised setting of this task based upon the idea of model exponential moving averaging. But adapting this scheme to the state-of-the-art (SOTA) solution for PC-based layout estimation is not straightforward. To this end, we define a quad set matching strategy and several consistency losses based upon metrics...
In this paper, we study the problem of semi-supervised 3D object detection, which is of great importance considering the high annotation cost for cluttered indoor scenes. We resort to the robust and principled framework of self-teaching, which has triggered notable progress in semi-supervised learning recently. While this paradigm is natural for image-level or pixel-level prediction, adapting it to detection is challenged by the issue of proposal matching. Prior methods are based upon two-stage pipelines, matching heuristically selected proposals...
In the past several years, road anomaly segmentation has been actively explored in academia and has drawn growing attention in industry. The rationale behind it is straightforward: if an autonomous car can brake before hitting an anomalous object, safety is promoted. However, this rationale naturally calls for a temporally informed setting, while existing methods and benchmarks are designed in an unrealistic frame-wise manner. To bridge this gap, we contribute the first video anomaly segmentation dataset for autonomous driving. Since placing various anomalous objects on busy roads...
In this paper, we present a Scale-adaptive method for Anti-aliasing Gaussian Splatting (SA-GS). While the state-of-the-art method Mip-Splatting requires modifying the training procedure of Gaussian splatting, our method functions at test time and is training-free. Specifically, SA-GS can be applied to any pretrained Gaussian splatting field as a plugin to significantly improve the field's anti-aliasing performance. The core technique is to apply 2D scale-adaptive filters to each Gaussian during test time. As pointed out by Mip-Splatting, observing Gaussians...
Semantic image synthesis (SIS) shows good promise for sensor simulation. However, current best practices in this field, based on GANs, have not yet reached the desired level of quality. As latent diffusion models make significant strides in image generation, we are prompted to evaluate ControlNet, a notable method for its dense control capabilities. Our investigation uncovered two primary issues with its results: the presence of weird sub-structures within large semantic areas and the misalignment of content with the semantic mask....
Autonomous vehicles are gradually entering city roads today, with the help of high-definition maps (HDMaps). However, the reliance on HDMaps prevents autonomous vehicles from stepping into regions without this expensive digital infrastructure. This fact drives many researchers to study online HDMap generation algorithms, but the performance of these algorithms at far regions is still unsatisfying. We present P-MapNet, in which the letter P highlights the fact that we focus on incorporating map priors to improve model performance....
3D human body reconstruction has been a challenge in the field of computer vision. Previous methods are often time-consuming and struggle to capture the detailed appearance of the human body. In this paper, we propose a new method called Ultraman for fast reconstruction of textured 3D human models from a single image. Compared to existing techniques, Ultraman greatly improves reconstruction speed and accuracy while preserving high-quality texture details. We present a set of frameworks consisting of three parts: geometric reconstruction, texture generation, and texture mapping. Firstly,...
Despite significant advancements in Neural Radiance Fields (NeRFs), the renderings may still suffer from aliasing and blurring artifacts, since it remains a fundamental challenge to effectively and efficiently characterize the anisotropic areas induced by the cone-casting procedure. This paper introduces a Ripmap-Encoded Platonic Solid representation to precisely and efficiently featurize 3D anisotropic areas, achieving high-fidelity anti-aliasing renderings. Central to our approach are two key components: Platonic Solid Projection and Ripmap encoding. The...
Fairness is an important topic for medical image analysis, driven by the challenge of unbalanced training data among diverse target groups and the societal demand for equitable quality. In response to this issue, our research adopts a data-driven strategy: enhancing data balance by integrating synthetic images. However, in terms of generating synthetic images, previous works either lack paired labels or fail to precisely control the boundaries of synthetic images to be aligned with those labels. To address this, we formulate the problem in a joint...
The generation of high-quality 3D car assets is essential for various applications, including video games, autonomous driving, and virtual reality. Current 3D generation methods utilizing NeRF or 3D-GS as representations for 3D objects generate a Lambertian object under fixed lighting and lack separate modeling of material and global illumination. As a result, the generated assets are unsuitable for relighting under varying lighting conditions, limiting their applicability in downstream tasks. To address this challenge, we propose a novel...
In this paper, we focus on the task of conditional image generation, where an image is synthesized according to user instructions. The critical challenge underpinning this task is ensuring both the fidelity of the generated images and their semantic alignment with the provided conditions. To tackle this issue, previous studies have employed supervised perceptual losses derived from pre-trained models, i.e., reward models, to enforce alignment between the condition and the generated result. However, we observe one inherent shortcoming: considering the diversity of synthesized images, the reward model...
3D particle tracking velocimetry (PTV) is a key technique for analyzing turbulent flow, one of the most challenging computational problems of our century. At the core of PTV is the dual-frame fluid motion estimation algorithm, which tracks particles across two consecutive frames. Recently, deep learning-based methods have achieved impressive accuracy in dual-frame fluid motion estimation; however, they heavily depend on large volumes of labeled data. In this paper, we introduce a new method that is completely self-supervised and notably...
End-to-end architectures in autonomous driving (AD) face a significant challenge in interpretability, impeding human-AI trust. Human-friendly natural language has been explored for tasks such as driving explanation and 3D captioning. However, previous works primarily focused on the paradigm of declarative interpretability, where the natural language interpretations are not grounded in the intermediate outputs of AD systems, making the interpretations only declarative. In contrast, aligned interpretability establishes a connection between language and the intermediate outputs of AD systems. Here we introduce...
Visual anagrams are images that change appearance upon transformation, like flipping or rotation. With the advent of diffusion models, generating such optical illusions can be achieved by averaging noise across multiple views during the reverse denoising process. However, we observe two critical failure modes in this approach: (i) concept segregation, where concepts in different views are independently generated, which cannot be considered a true anagram, and (ii) concept domination, where certain concepts overpower others. In this work,...
Self-supervised depth estimation has drawn a lot of attention recently, as it can promote the 3D sensing capabilities of self-driving vehicles. However, it intrinsically relies upon the photometric consistency assumption, which hardly holds during nighttime. Although various supervised nighttime image enhancement methods have been proposed, their generalization performance in challenging driving scenarios is not satisfactory. To this end, we propose the first method that jointly learns a nighttime enhancer and a depth estimator,...
In this paper, we study the problem of semi-supervised 3D object detection, which is of great importance considering the high annotation cost for cluttered indoor scenes. We resort to the robust and principled framework of self-teaching, which has triggered notable progress in semi-supervised learning recently. While this paradigm is natural for image-level or pixel-level prediction, adapting it to detection is challenged by the issue of proposal matching. Prior methods are based upon two-stage pipelines, matching heuristically selected...