- Reinforcement Learning in Robotics
- Advanced Vision and Imaging
- Generative Adversarial Networks and Image Synthesis
- Mobile Crowdsensing and Crowdsourcing
- Face Recognition and Analysis
- Industrial Vision Systems and Defect Detection
- 3D Surveying and Cultural Heritage
- Speech and Audio Processing
- Robotics and Sensor-Based Localization
- Image Processing and 3D Reconstruction
- Advanced Bandit Algorithms Research
- CCD and CMOS Imaging Sensors
- Computer Graphics and Visualization Techniques
- Mechanics and Biomechanics Studies
- Fault Detection and Control Systems
- Optical Measurement and Interference Techniques
- Astronomical Observations and Instrumentation
- Machine Learning and Data Classification
- Advanced Image and Video Retrieval Techniques
Tsinghua University
2023-2024
The generation of stylistic 3D facial animations driven by speech presents a significant challenge, as it requires learning a many-to-many mapping between speech, style, and the corresponding natural motion. However, existing methods either employ a deterministic model for speech-to-motion mapping or encode style using a one-hot encoding scheme. Notably, the one-hot encoding approach fails to capture the complexity of style and thus limits generalization ability. In this paper, we propose DiffPoseTalk, a generative framework based on diffusion...
Few-shot object detection (FSOD) aims to detect novel targets with only a few instances of the associated samples. Although combinations of distillation techniques and meta-learning paradigms have been acknowledged as the primary strategies for FSOD tasks, existing methods exhibit inherent biases and sensitivity to class variability. A critical hurdle is the difficulty of ensuring that appropriate knowledge is learned from the teacher model during the fine-tuning stage. Furthermore, coarse procedures risk misalignment...
The reconstruction of indoor scenes from multi-view RGB images is challenging due to the coexistence of flat, texture-less regions alongside delicate, fine-grained regions. Recent methods leverage neural radiance fields aided by predicted surface normal priors to recover scene geometry. These methods excel at producing complete and smooth results for floor and wall areas. However, they struggle to capture complex surfaces with high-frequency structures due to inadequate representations and inaccurately predicted priors. This work aims...
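To make "aided by predicted surface normal priors" concrete, here is a minimal sketch of one common form of normal-prior supervision: an L1 term plus an angular (1 minus cosine) term between rendered normals and normals predicted by a monocular network. This is an illustrative assumption, not the method of the paper above; the function names and the equal per-pixel weighting are hypothetical.

```python
import math

Vec3 = tuple[float, float, float]


def normalize(v: Vec3) -> Vec3:
    # Scale a 3-vector to unit length (assumes v is non-zero).
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)


def normal_prior_loss(rendered: list[Vec3], predicted: list[Vec3]) -> float:
    """Hypothetical normal-prior loss: mean over pixels of L1 + (1 - cos)."""
    total = 0.0
    for r, p in zip(rendered, predicted):
        r, p = normalize(r), normalize(p)
        l1 = sum(abs(a - b) for a, b in zip(r, p))  # per-component agreement
        cos = sum(a * b for a, b in zip(r, p))      # angular agreement
        total += l1 + (1.0 - cos)
    return total / len(rendered)
```

Because every pixel is weighted equally in this sketch, an inaccurate prior on a fine-grained structure is enforced as strongly as an accurate prior on a wall, which is one way to read the failure mode described above.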
Some robust point cloud registration approaches with controllable pose refinement magnitude, such as ICP and its variants, are commonly used to improve 6D pose estimation accuracy. However, the effectiveness of these methods gradually diminishes with the advancement of deep learning techniques and the resulting enhancement of initial pose accuracy, primarily due to their lack of a specific design for pose refinement. In this paper, we propose Point Cloud Completion and Keypoint Refinement with Fusion Data (PCKRF), a new pose refinement pipeline for 6D pose estimation. The pipeline consists...
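For readers unfamiliar with ICP, the following is a minimal 2D point-to-point ICP sketch. It is a deliberate simplification of the 6D (3D rotation plus translation) setting, assuming small initial misalignment and brute-force nearest-neighbour matching; it is not PCKRF. Each iteration matches points, then solves the closed-form least-squares rigid transform for those matches.

```python
import math

Point = tuple[float, float]


def nearest(p: Point, cloud: list[Point]) -> Point:
    # Brute-force nearest-neighbour search; O(N) per query, adequate for a sketch.
    return min(cloud, key=lambda q: (q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2)


def best_rigid_2d(src: list[Point], dst: list[Point]) -> tuple[float, float, float]:
    """Closed-form least-squares rotation angle and translation mapping src onto dst."""
    n = len(src)
    sx, sy = sum(p[0] for p in src) / n, sum(p[1] for p in src) / n
    dx, dy = sum(q[0] for q in dst) / n, sum(q[1] for q in dst) / n
    num = den = 0.0
    for (px, py), (qx, qy) in zip(src, dst):
        px, py, qx, qy = px - sx, py - sy, qx - dx, qy - dy  # centre both sets
        num += px * qy - py * qx  # cross terms
        den += px * qx + py * qy  # dot terms
    theta = math.atan2(num, den)
    c, s = math.cos(theta), math.sin(theta)
    # Translation maps the rotated source centroid onto the target centroid.
    return theta, dx - (c * sx - s * sy), dy - (s * sx + c * sy)


def transform(points: list[Point], theta: float, tx: float, ty: float) -> list[Point]:
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in points]


def icp_2d(source: list[Point], target: list[Point],
           iterations: int = 20, tol: float = 1e-12) -> list[Point]:
    current = list(source)
    for _ in range(iterations):
        matches = [nearest(p, target) for p in current]  # correspondence step
        theta, tx, ty = best_rigid_2d(current, matches)  # alignment step
        current = transform(current, theta, tx, ty)
        if abs(theta) < tol and abs(tx) < tol and abs(ty) < tol:
            break  # the incremental update has become negligible
    return current
```

Each iteration only applies the increment justified by the current correspondences, which is the "controllable refinement magnitude" property mentioned above; it also means ICP has little left to correct once the initial estimate is already close.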
Neural implicit representations have revolutionized dense multi-view surface reconstruction, yet their performance diminishes significantly with sparse input views. A few pioneering works have sought to tackle this challenge by leveraging additional geometric priors or multi-scene generalizability. However, they are still hindered by the imperfect choice of input views, relying on images captured from empirically determined viewpoints. We propose PVP-Recon, a novel and effective sparse-view reconstruction method that...
Unsupervised reinforcement learning aims to learn a generalist policy in a reward-free manner for fast adaptation to downstream tasks. Most existing methods provide an intrinsic reward based on surprise. Maximizing or minimizing surprise drives the agent to either explore or gain control over its environment. However, both strategies rely on a strong assumption: that the entropy of the environment's dynamics is either high or low. This assumption may not always hold in real-world scenarios, where the entropy of the dynamics may be unknown. Hence, choosing...
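As an illustration of a surprise-based intrinsic reward, the sketch below assumes a tabular, count-based dynamics model with add-one smoothing and defines surprise as the negative log-likelihood of an observed transition. The class and function names are hypothetical; real methods typically use learned neural density models instead.

```python
import math
from collections import defaultdict


class TabularDynamicsModel:
    """Hypothetical count-based estimate of p(s' | s, a) with add-one smoothing."""

    def __init__(self, num_states: int):
        self.num_states = num_states
        self.counts = defaultdict(lambda: defaultdict(int))  # (s, a) -> {s': count}
        self.totals = defaultdict(int)                        # (s, a) -> visit count

    def update(self, s, a, s_next) -> None:
        self.counts[(s, a)][s_next] += 1
        self.totals[(s, a)] += 1

    def prob(self, s, a, s_next) -> float:
        # .get avoids inserting empty entries as a side effect of querying.
        count = self.counts.get((s, a), {}).get(s_next, 0)
        total = self.totals.get((s, a), 0)
        return (count + 1) / (total + self.num_states)


def surprise(model: TabularDynamicsModel, s, a, s_next) -> float:
    # Surprise = negative log-likelihood of the observed transition.
    return -math.log(model.prob(s, a, s_next))


def intrinsic_reward(model: TabularDynamicsModel, s, a, s_next, mode: str) -> float:
    sp = surprise(model, s, a, s_next)
    if mode == "maximize":  # reward novelty, which encourages exploration
        return sp
    if mode == "minimize":  # reward predictability, which encourages control
        return -sp
    raise ValueError(f"unknown mode: {mode}")
```

The sign of the reward is fixed in advance by `mode`, which is exactly the choice the abstract argues cannot be made safely when the entropy of the dynamics is unknown.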
Training practical agents usually involves offline and online reinforcement learning (RL) to balance the policy's performance and interaction costs. In particular, online fine-tuning has become a commonly used method to correct erroneous estimates of out-of-distribution data learned in the offline training phase. However, even limited interactions can be inaccessible or catastrophic in high-stakes scenarios like healthcare and autonomous driving. In this work, we introduce an interaction-free scheme dubbed...
Offline-to-online reinforcement learning (RL) is a training paradigm that combines pre-training on a pre-collected dataset with fine-tuning in an online environment. However, the incorporation of online fine-tuning can intensify the well-known distributional shift problem. Existing solutions tackle this problem by imposing a policy constraint on the policy improvement objective in both offline and online learning. They typically advocate a single balance between policy improvement and the constraint across diverse data collections. This one-size-fits-all manner may not...
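One widely used way to impose such a constraint is a TD3+BC-style actor objective, sketched below for scalar actions. It assumes `q_values` are critic estimates for the policy's actions and adds a small floor to avoid division by zero; the function name is hypothetical and this is not the method of the paper above.

```python
def constrained_actor_loss(q_values: list[float], policy_actions: list[float],
                           data_actions: list[float], alpha: float = 2.5) -> float:
    """Policy-improvement term plus a behaviour-cloning constraint, balanced by
    one fixed coefficient shared by every sample (TD3+BC-style normalisation)."""
    n = len(q_values)
    mean_abs_q = max(sum(abs(q) for q in q_values) / n, 1e-8)  # guard against /0
    lam = alpha / mean_abs_q
    improvement = -lam * sum(q_values) / n  # push toward high estimated value
    constraint = sum((p - a) ** 2 for p, a in zip(policy_actions, data_actions)) / n
    return improvement + constraint
```

Because `alpha` is a single scalar, the same trade-off applies whether a transition comes from the offline dataset or from online interaction, which is the one-size-fits-all balance the abstract questions.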