Yinhuai Wang

ORCID: 0000-0003-4601-4881
Research Areas
  • Advanced Image Processing Techniques
  • Advanced Vision and Imaging
  • Generative Adversarial Networks and Image Synthesis
  • Image Processing Techniques and Applications
  • Face Recognition and Analysis
  • Image Enhancement Techniques
  • Advanced Neuroimaging Techniques and Applications
  • Facial Nerve Paralysis Treatment and Research
  • Robotics and Sensor-Based Localization
  • Facial Rejuvenation and Surgery Techniques
  • Robot Manipulation and Learning
  • Gaze Tracking and Assistive Technology
  • Human Pose and Action Recognition
  • Remote Sensing and LiDAR Applications
  • Advanced Neural Network Applications
  • 3D Shape Modeling and Analysis
  • Radiomics and Machine Learning in Medical Imaging
  • Photoacoustic and Ultrasonic Imaging
  • Educational Games and Gamification
  • Image and Signal Denoising Methods
  • Medical Imaging Techniques and Applications
  • Music and Audio Processing
  • 3D Surveying and Cultural Heritage
  • Online Learning and Analytics
  • Physical Education and Pedagogy

Peking University
2023-2024

Peking University Shenzhen Hospital
2022-2023

Central South University
2020

Most existing Image Restoration (IR) models are task-specific and cannot be generalized to different degradation operators. In this work, we propose the Denoising Diffusion Null-Space Model (DDNM), a novel zero-shot framework for arbitrary linear IR problems, including but not limited to image super-resolution, colorization, inpainting, compressed sensing, and deblurring. DDNM only needs a pre-trained off-the-shelf diffusion model as the generative prior, without any extra training or network...

10.48550/arxiv.2212.00490 preprint EN other-oa arXiv (Cornell University) 2022-01-01

Recently, conditional diffusion models have gained popularity in numerous applications due to their exceptional generation ability. However, many existing methods are training-required. They need to train a time-dependent classifier or a condition-dependent score estimator, which increases the cost of constructing conditional diffusion models and is inconvenient to transfer across different conditions. Some current works aim to overcome this limitation by proposing training-free solutions, but most can only be applied to a specific...

10.1109/iccv51070.2023.02118 article EN 2023 IEEE/CVF International Conference on Computer Vision (ICCV) 2023-10-01

Consistency and realness have always been the two critical issues of image super-resolution. While realness has been dramatically improved with the use of GAN prior, state-of-the-art methods still suffer from inconsistencies in local structures and colors (e.g., teeth and eyes). In this paper, we show that these inconsistencies can be analytically eliminated by learning only the null-space component while fixing the range-space part. Further, we design a pooling-based decomposition (PD), a universal range-null space decomposition for super-resolution tasks, which is...

10.1609/aaai.v37i3.25372 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2023-06-26
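
The range-null space idea in the abstract above can be illustrated with a minimal NumPy sketch. It assumes the degradation operator is plain average pooling, whose pseudo-inverse is block replication; the function names (`avg_pool`, `pinv_avg_pool`, `consistent_estimate`) and the random stand-in for a learned estimate `x_bar` are hypothetical and are not taken from the paper's code.

```python
import numpy as np

def avg_pool(x, s):
    """Assumed degradation A: s-by-s average pooling of a 2D image."""
    h, w = x.shape
    return x.reshape(h // s, s, w // s, s).mean(axis=(1, 3))

def pinv_avg_pool(y, s):
    """Pseudo-inverse of average pooling: replicate each pixel s-by-s."""
    return np.kron(y, np.ones((s, s)))

def consistent_estimate(y, x_bar, s):
    """Combine a fixed range-space part with a free null-space part."""
    range_part = pinv_avg_pool(y, s)                             # pinv(A) y
    null_part = x_bar - pinv_avg_pool(avg_pool(x_bar, s), s)     # (I - pinv(A) A) x_bar
    return range_part + null_part

rng = np.random.default_rng(0)
s = 4
x_true = rng.random((16, 16))      # hypothetical high-resolution image
y = avg_pool(x_true, s)            # low-resolution observation
x_bar = rng.random((16, 16))       # stand-in for a network's estimate
x_hat = consistent_estimate(y, x_bar, s)

# Downsampling the result reproduces y regardless of x_bar (up to rounding).
consistency_error = np.abs(avg_pool(x_hat, s) - y).max()
```

Because pooling the null-space part gives zero, whatever produces `x_bar` can only change details that the low-resolution observation does not constrain, which is the sense in which consistency holds by construction in this sketch.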

With the development of generative adversarial networks (GANs), recent face restoration (FR) methods often utilize pre-trained GAN models (i.e., StyleGAN2) as prior to generate rich details. However, these methods usually struggle to balance realness and fidelity when facing various degradation levels. In this paper, we propose a novel DEgradation-Aware Restoration network with GAN prior, dubbed DEAR-GAN, for FR tasks by explicitly learning the degradation representations (DR) to adapt to the degradation. Specifically, an...

10.1109/tcsvt.2023.3244786 article EN IEEE Transactions on Circuits and Systems for Video Technology 2023-02-16

Recently, using diffusion models for zero-shot image restoration (IR) has become a new hot paradigm. This type of method only needs to use pre-trained off-the-shelf diffusion models, without any finetuning, and can directly handle various IR tasks. The upper limit of the restoration performance depends on the pre-trained diffusion models, which are in rapid evolution. However, current methods only discuss how to deal with fixed-size images, but dealing with images of arbitrary sizes is very important for practical applications. This paper focuses on how to use those diffusion-based zero-shot IR methods to deal with any size...

10.1109/cvprw59228.2023.00123 article EN 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) 2023-06-01

Emerging high-quality face restoration (FR) methods often utilize pre-trained GAN models (i.e., StyleGAN2) as GAN Prior. However, these methods usually struggle to balance realness and fidelity when facing various degradation levels. Besides, there is still a noticeable visual quality gap compared with pre-trained GAN models. In this paper, we propose a novel GAN Prior based degradation-aware feature interpolation network, dubbed Panini-Net, for FR tasks by explicitly learning the abstract representations to distinguish...

10.1609/aaai.v36i3.20159 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2022-06-28

Point cloud completion aims at estimating the complete data of objects from degraded observations. Despite existing methods achieving impressive performances, they rely heavily on degraded-complete pairs for supervision. In this work, we propose a novel framework named Null-Space Diffusion Sampling (NSDS) to solve the point cloud completion task in a zero-shot manner. By leveraging a pre-trained diffusion model as the off-the-shelf generator, our sampling approach can generate desired outputs with the guidance of the observed...

10.24963/ijcai.2023/69 article EN 2023-08-01

Position information is critical for Vision Transformers (VTs) due to the permutation-invariance of self-attention operations. A typical way to introduce position information is adding the absolute Position Embedding (PE) to the patch embedding before entering VTs. However, this approach applies the same Layer Normalization (LN) to the token embedding and the PE, and delivers the same PE to each layer. This results in restricted and monotonic PE across layers, as the shared LN affine parameters are not dedicated to PE, and the PE cannot be adjusted on a per-layer basis. To overcome these...

10.1109/iccv51070.2023.00541 article EN 2023 IEEE/CVF International Conference on Computer Vision (ICCV) 2023-10-01

Neural radiance fields (NeRF) bring a new wave for 3D interactive experiences. However, as an important part of the immersive experiences, defocus effects have not been fully explored within NeRF. Some recent NeRF-based methods generate 3D defocus effects in a post-process fashion by utilizing multiplane technology. Still, they are either time-consuming or memory-consuming. This paper proposes a novel thin-lens-imaging-based NeRF framework that can directly render various 3D defocus effects, dubbed NeRFocus. Unlike the pinhole,...

10.48550/arxiv.2203.05189 preprint EN other-oa arXiv (Cornell University) 2022-01-01

Recently, conditional diffusion models have gained popularity in numerous applications due to their exceptional generation ability. However, many existing methods are training-required. They need to train a time-dependent classifier or a condition-dependent score estimator, which increases the cost of constructing conditional diffusion models and is inconvenient to transfer across different conditions. Some current works aim to overcome this limitation by proposing training-free solutions, but most can only be applied to a specific...

10.48550/arxiv.2303.09833 preprint EN other-oa arXiv (Cornell University) 2023-01-01

Existing unsupervised low-light image enhancement methods lack enough effectiveness and generalization in practical applications. We suppose this is because of the absence of explicit supervision and the inherent gap between real-world scenarios and the training data domain. In this paper, we develop Diffusion-based domain calibration to realize more robust and effective unsupervised Low-Light Enhancement, called DiffLLE. Since the diffusion model shows impressive denoising capability and has been trained on massive clean images, we adopt...

10.48550/arxiv.2308.09279 preprint EN other-oa arXiv (Cornell University) 2023-01-01

Humans interact with objects all the time. Enabling a humanoid to learn human-object interaction (HOI) is a key step for future smart animation and intelligent robotics systems. However, recent progress in physics-based HOI requires carefully designed task-specific rewards, making the system unscalable and labor-intensive. This work focuses on dynamic HOI imitation: teaching humanoid dynamic interaction skills through imitating kinematic HOI demonstrations. It is quite challenging because of the complexity of the interaction between body parts and objects and the lack of dynamic HOI data. To...

10.48550/arxiv.2312.04393 preprint EN other-oa arXiv (Cornell University) 2023-01-01

Mastering basketball skills such as diverse layups and dribbling involves complex interactions with the ball and requires real-time adjustments. Traditional reinforcement learning methods for interaction skills rely on labor-intensive, manually designed rewards that do not generalize well across different skills. Inspired by how humans learn from demonstrations, we propose SkillMimic, a data-driven approach that mimics both human and ball motions to learn a wide variety of basketball skills. SkillMimic employs a unified configuration to learn diverse skills from human-ball...

10.48550/arxiv.2408.15270 preprint EN arXiv (Cornell University) 2024-08-12

The Position Embedding (PE) is critical for Vision Transformers (VTs) due to the permutation-invariance of the self-attention operation. By analyzing the input and output of each encoder layer in VTs using reparameterization and visualization, we find that the default PE joining method (simply adding the PE and patch embedding together) applies the same affine transformation to the token embedding and the PE, which limits the expressiveness of the PE and hence constrains the performance of VTs. To overcome this limitation, we propose a simple, effective, and robust method....

10.48550/arxiv.2212.05262 preprint EN other-oa arXiv (Cornell University) 2022-01-01

Recently, using diffusion models for zero-shot image restoration (IR) has become a new hot paradigm. This type of method only needs to use pre-trained off-the-shelf diffusion models, without any finetuning, and can directly handle various IR tasks. The upper limit of the restoration performance depends on the pre-trained diffusion models, which are in rapid evolution. However, current methods only discuss how to deal with fixed-size images, but dealing with images of arbitrary sizes is very important for practical applications. This paper focuses on how to use those diffusion-based zero-shot IR methods to deal with any size...

10.48550/arxiv.2303.00354 preprint EN other-oa arXiv (Cornell University) 2023-01-01

Emerging high-quality face restoration (FR) methods often utilize pre-trained GAN models (i.e., StyleGAN2) as GAN Prior. However, these methods usually struggle to balance realness and fidelity when facing various degradation levels. Besides, there is still a noticeable visual quality gap compared with pre-trained GAN models. In this paper, we propose a novel GAN Prior based degradation-aware feature interpolation network, dubbed Panini-Net, for FR tasks by explicitly learning the abstract representations...

10.48550/arxiv.2203.08444 preprint EN other-oa arXiv (Cornell University) 2022-01-01

Consistency and realness have always been the two critical issues of image super-resolution. While realness has been dramatically improved with the use of GAN prior, state-of-the-art methods still suffer from inconsistencies in local structures and colors (e.g., teeth and eyes). In this paper, we show that these inconsistencies can be analytically eliminated by learning only the null-space component while fixing the range-space part. Further, we design a pooling-based decomposition (PD), a universal range-null space decomposition for super-resolution tasks, which is...

10.48550/arxiv.2211.13524 preprint EN other-oa arXiv (Cornell University) 2022-01-01