Chuanxia Zheng

ORCID: 0000-0002-3584-9640
Research Areas
  • Generative Adversarial Networks and Image Synthesis
  • Advanced Vision and Imaging
  • Advanced Image and Video Retrieval Techniques
  • Advanced Image Processing Techniques
  • Computer Graphics and Visualization Techniques
  • Image Enhancement Techniques
  • 3D Shape Modeling and Analysis
  • Human Pose and Action Recognition
  • Multimodal Machine Learning Applications
  • Human Motion and Animation
  • Advanced Neural Network Applications
  • Domain Adaptation and Few-Shot Learning
  • Image Processing Techniques and Applications
  • Face Recognition and Analysis
  • Image Retrieval and Classification Techniques
  • Aesthetic Perception and Analysis
  • Cell Image Analysis Techniques
  • Advanced Numerical Analysis Techniques
  • Robotics and Sensor-Based Localization
  • Video Surveillance and Tracking Methods
  • Advanced Fluorescence Microscopy Techniques
  • Hand Gesture Recognition Systems
  • ECG Monitoring and Analysis
  • Adversarial Robustness in Machine Learning
  • 3D Surveying and Cultural Heritage

University of Oxford
2023-2025

South China University of Technology
2024

Monash University
2022

Nanyang Technological University
2018-2022

Beihang University
2016-2017

Most image completion methods produce only one result for each masked input, although there may be many reasonable possibilities. In this paper, we present an approach for pluralistic image completion, the task of generating multiple and diverse plausible solutions for image completion. A major challenge faced by learning-based approaches is that there is usually only one ground truth training instance per label. As such, sampling from conditional VAEs still leads to minimal diversity. To overcome this, we propose a novel probabilistically...

10.1109/cvpr.2019.00153 article EN 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2019-06-01

Bridging global context interactions correctly is important for high-fidelity image completion with large masks. Previous methods attempting this via deep or large receptive field (RF) convolutions cannot escape from the dominance of nearby interactions, which may be inferior. In this paper, we propose to treat image completion as a directionless sequence-to-sequence prediction task, and deploy a transformer to directly capture long-range dependence. Crucially, we employ a restrictive CNN with small and non-overlapping RF for weighted token...

10.1109/cvpr52688.2022.01122 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022-06-01

We propose a novel spatially-correlative loss that is simple, efficient and yet effective for preserving scene structure consistency while supporting large appearance changes during unpaired image-to-image (I2I) translation. Previous methods attempt this by using pixel-level cycle-consistency or feature-level matching losses, but the domain-specific nature of these losses hinders translation across large domain gaps. To address this, we exploit the spatial patterns of self-similarity as a means of defining...

10.1109/cvpr46437.2021.01614 article EN 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2021-06-01

Portraiture as an art form has evolved from realistic depiction into a plethora of creative styles. While substantial progress has been made in automated stylization, generating high quality stylistic portraits is still a challenge, and even the recent popular Toonify suffers from several artifacts when used on real input images. Such StyleGAN-based methods have focused on finding the best latent inversion mapping for reconstructing input images; however, our key insight is that this does not lead to good...

10.1145/3450626.3459771 article EN ACM Transactions on Graphics 2021-07-19

We present a unified and flexible framework to address the generalized problem of 3D motion synthesis that covers the tasks of motion prediction, completion, interpolation, and spatial-temporal recovery. Since these tasks have different input constraints and various fidelity and diversity requirements, most existing approaches only cater to a specific task or use different architectures to address various tasks. Here we propose a unified framework based on Conditional Variational Auto-Encoder (CVAE), where we treat any arbitrary input as a masked motion series. Notably, by considering this...

10.1109/iccv48922.2021.01144 article EN 2021 IEEE/CVF International Conference on Computer Vision (ICCV) 2021-10-01

10.1109/cvpr52733.2024.02645 article EN 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024-06-16

10.1109/cvpr52733.2024.00928 article EN 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024-06-16

Vector Quantisation (VQ) is experiencing a comeback in machine learning, where it is increasingly used in representation learning. However, optimizing the codevectors in existing VQ-VAE is not entirely trivial. A problem is codebook collapse, where only a small subset of codevectors receive gradients useful for their optimisation, whereas a majority of them simply "dies off" and is never updated or used. This limits the effectiveness of VQ for learning larger codebooks in complex computer vision tasks that require high-capacity representations. In...

10.1109/iccv51070.2023.02084 article EN 2023 IEEE/CVF International Conference on Computer Vision (ICCV) 2023-10-01
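
The codebook collapse described in the abstract above can be made concrete with a small diagnostic. The following is a minimal sketch, not the paper's method: it assumes plain nearest-neighbour quantisation with squared Euclidean distance, and the helper names (`nearest_code`, `codebook_usage`) and toy data are hypothetical.

```python
import math
from collections import Counter


def nearest_code(vec, codebook):
    """Return the index of the codevector closest to vec (squared Euclidean distance)."""
    best_idx, best_dist = 0, math.inf
    for idx, code in enumerate(codebook):
        dist = sum((v - c) ** 2 for v, c in zip(vec, code))
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx


def codebook_usage(features, codebook):
    """Quantise every feature; return the fraction of codevectors selected and the counts."""
    counts = Counter(nearest_code(f, codebook) for f in features)
    return len(counts) / len(codebook), counts


# Toy data (hypothetical): all features lie near the first two codevectors,
# so the last two codevectors are never selected and would receive no updates.
codebook = [(0.0, 0.0), (1.0, 1.0), (10.0, 10.0), (-10.0, 5.0)]
features = [(0.1, -0.1), (0.9, 1.2), (0.2, 0.0), (1.1, 0.8)]
usage, counts = codebook_usage(features, codebook)
print(f"codebook usage: {usage:.2f}")
```

A usage fraction well below 1.0 over a large batch is one simple symptom of the "dead" codevectors the abstract refers to.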

We present a new generalizable NeRF method that is able to directly generalize to new unseen scenarios and perform novel view synthesis with as few as two source views. The key to our approach lies in the explicitly modeled correspondence matching information, so as to provide the geometry prior to the prediction of NeRF color and density for volume rendering. The explicit correspondence matching is quantified with the cosine similarity between image features sampled at the 2D projections of a 3D point on different views, which is able to provide reliable cues about the surface geometry. Unlike...

10.48550/arxiv.2304.12294 preprint EN cc-by-nc-nd arXiv (Cornell University) 2023-01-01
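
The matching cue in the abstract above, cosine similarity between features sampled at the projections of a 3D point, can be sketched as follows. This is an illustrative sketch under stated assumptions: the features are taken as already sampled (projection and feature extraction are omitted), the pairwise averaging is an assumption, and `cosine_similarity` and `matching_score` are hypothetical names, not the paper's API.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def matching_score(view_features):
    """Average cosine similarity over all pairs of per-view features for one 3D point."""
    pairs = list(combinations(view_features, 2))
    return sum(cosine_similarity(a, b) for a, b in pairs) / len(pairs)


# Hypothetical features sampled from two source views at a point's 2D projections.
on_surface = [(1.0, 0.0), (2.0, 0.0)]   # views agree, so the score is high
off_surface = [(1.0, 0.0), (0.0, 1.0)]  # views disagree, so the score is low
print(matching_score(on_surface), matching_score(off_surface))
```

The intuition is that a point lying on a surface projects to visually consistent locations in every view, so its features agree, while a point in free space does not.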

10.1007/s11263-021-01502-7 article EN International Journal of Computer Vision 2021-07-30

Recent advances in generative models like Stable Diffusion enable the generation of highly photo-realistic images. Our objective in this paper is to probe the diffusion network to determine to what extent it 'understands' different properties of the 3D scene depicted in an image. To this end, we make the following contributions: (i) We introduce a protocol to evaluate whether features of an off-the-shelf diffusion model encode a number of physical 'properties' of the 3D scene, by training discriminative classifiers on the features for these properties. The probes are...

10.48550/arxiv.2310.06836 preprint EN other-oa arXiv (Cornell University) 2023-01-01

Although two-stage Vector Quantized (VQ) generative models allow for synthesizing high-fidelity and high-resolution images, their quantization operator encodes similar patches within an image into the same index, resulting in a repeated artifact for similar adjacent regions using existing decoder architectures. To address this issue, we propose to incorporate spatially conditional normalization to modulate the quantized vectors so as to insert spatially variant information into the embedded index maps, encouraging the decoder to generate more...

10.48550/arxiv.2209.09002 preprint EN cc-by-nc-sa arXiv (Cornell University) 2022-01-01

Learning deep discrete latent representations offers a promise of better symbolic and summarized abstractions that are more useful to subsequent downstream tasks. Inspired by the seminal Vector Quantized Variational Auto-Encoder (VQ-VAE), most of the work in learning deep discrete representations has mainly focused on improving the original VQ-VAE form, and none of it has studied learning deep discrete representations from the generative viewpoint. In this work, we study learning deep discrete representations from the generative viewpoint. Specifically, we endow discrete distributions over sequences of codewords and learn a deterministic decoder that transports...

10.48550/arxiv.2302.05917 preprint EN public-domain arXiv (Cornell University) 2023-01-01

We propose MVSplat, an efficient feed-forward 3D Gaussian Splatting model learned from sparse multi-view images. To accurately localize the Gaussian centers, we propose to build a cost volume representation via plane sweeping in the 3D space, where the cross-view feature similarities stored in the cost volume can provide valuable geometry cues to the estimation of depth. We learn the Gaussian primitives' opacities, covariances, and spherical harmonics coefficients jointly with the Gaussian centers while only relying on photometric supervision. We demonstrate the importance...

10.48550/arxiv.2403.14627 preprint EN arXiv (Cornell University) 2024-03-21

We introduce PICFormer, a novel framework for Pluralistic Image Completion using a transFormer based architecture, that achieves both high quality and diversity at a much faster inference speed. Our key contribution is to...

10.1109/tpami.2024.3403695 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2024-05-21