- Generative Adversarial Networks and Image Synthesis
- Advanced Vision and Imaging
- Advanced Image and Video Retrieval Techniques
- Advanced Image Processing Techniques
- Computer Graphics and Visualization Techniques
- Image Enhancement Techniques
- 3D Shape Modeling and Analysis
- Human Pose and Action Recognition
- Multimodal Machine Learning Applications
- Human Motion and Animation
- Advanced Neural Network Applications
- Domain Adaptation and Few-Shot Learning
- Image Processing Techniques and Applications
- Face Recognition and Analysis
- Image Retrieval and Classification Techniques
- Aesthetic Perception and Analysis
- Cell Image Analysis Techniques
- Advanced Numerical Analysis Techniques
- Robotics and Sensor-Based Localization
- Video Surveillance and Tracking Methods
- Advanced Fluorescence Microscopy Techniques
- Hand Gesture Recognition Systems
- ECG Monitoring and Analysis
- Adversarial Robustness in Machine Learning
- 3D Surveying and Cultural Heritage
University of Oxford
2023-2025
South China University of Technology
2024
Monash University
2022
Nanyang Technological University
2018-2022
Beihang University
2016-2017
Most image completion methods produce only one result for each masked input, although there may be many reasonable possibilities. In this paper, we present an approach for pluralistic image completion - the task of generating multiple and diverse plausible solutions for image completion. A major challenge faced by learning-based approaches is that there is usually only one ground truth training instance per label. As such, sampling from conditional VAEs still leads to minimal diversity. To overcome this, we propose a novel probabilistically...
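A minimal sketch, not the paper's architecture, of the basic idea behind pluralistic sampling: a decoder conditioned on the masked input maps different latent samples to different candidate completions. All module names, layer sizes, and shapes below are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Illustrative conditional decoder: different latent samples z yield
# different plausible completions for the same masked input.
class TinyCompletionDecoder(nn.Module):
    def __init__(self, latent_dim=64, img_channels=3):
        super().__init__()
        self.fc = nn.Linear(latent_dim, 128 * 8 * 8)
        self.up = nn.Sequential(
            nn.ConvTranspose2d(128 + img_channels, 64, 4, stride=2, padding=1),
            nn.ReLU(),
            nn.ConvTranspose2d(64, img_channels, 4, stride=2, padding=1),
            nn.Tanh(),
        )

    def forward(self, z, masked_img):
        b = z.shape[0]
        h = self.fc(z).view(b, 128, 8, 8)
        # condition on a downsampled view of the masked input
        cond = nn.functional.interpolate(masked_img, size=(8, 8))
        return self.up(torch.cat([h, cond], dim=1))

decoder = TinyCompletionDecoder()
masked = torch.zeros(1, 3, 32, 32)                       # a 32x32 masked input
samples = [decoder(torch.randn(1, 64), masked) for _ in range(5)]
print(len(samples), samples[0].shape)                    # 5 candidate completions
```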
Bridging global context interactions correctly is important for high-fidelity image completion with large masks. Previous methods attempting this via deep or large receptive field (RF) convolutions cannot escape from the dominance of nearby interactions, which may be inferior. In this paper, we propose to treat image completion as a directionless sequence-to-sequence prediction task, and deploy a transformer to directly capture long-range dependence. Crucially, we employ a restrictive CNN with a small and non-overlapping RF for weighted token...
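A minimal sketch of the general pattern described here, assuming standard PyTorch modules rather than the paper's exact model: a convolution with a small, non-overlapping receptive field turns the image into tokens, and a transformer encoder then models interactions between distant tokens as easily as nearby ones.

```python
import torch
import torch.nn as nn

# Patchifying conv = small, non-overlapping receptive field per token.
patch, dim = 8, 256
to_tokens = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True),
    num_layers=4,
)

img = torch.randn(1, 3, 64, 64)                        # masked input image
tokens = to_tokens(img).flatten(2).transpose(1, 2)     # (1, 64, 256) token sequence
out = encoder(tokens)                                  # every token attends to every other
print(out.shape)                                       # torch.Size([1, 64, 256])
```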
We propose a novel spatially-correlative loss that is simple, efficient and yet effective for preserving scene structure consistency while supporting large appearance changes during unpaired image-to-image (I2I) translation. Previous methods attempt this by using pixel-level cycle-consistency or feature-level matching losses, but the domain-specific nature of these losses hinders translation across large domain gaps. To address this, we exploit the spatial patterns of self-similarity as a means of defining...
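A minimal sketch, under simplified assumptions, of what a self-similarity map can look like: for each location, the cosine similarity between its feature and the features in a local neighbourhood. Comparing such maps between the source and the translated image penalizes structural changes while staying agnostic to appearance. Patch size and feature shapes are illustrative, not the paper's settings.

```python
import torch
import torch.nn.functional as F

def self_similarity(feat, patch=7):
    # cosine similarity of each location to its local patch neighbourhood
    b, c, h, w = feat.shape
    feat = F.normalize(feat, dim=1)
    neigh = F.unfold(feat, kernel_size=patch, padding=patch // 2)  # (b, c*p*p, h*w)
    neigh = neigh.view(b, c, patch * patch, h * w)
    query = feat.view(b, c, 1, h * w)
    return (query * neigh).sum(dim=1)                  # (b, p*p, h*w) similarity maps

src_feat = torch.randn(1, 64, 32, 32)                  # features of the source image
fake_feat = torch.randn(1, 64, 32, 32)                 # features of the translated image
loss = F.l1_loss(self_similarity(src_feat), self_similarity(fake_feat))
print(loss.item())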
Portraiture as an art form has evolved from realistic depiction into a plethora of creative styles. While substantial progress has been made in automated stylization, generating high-quality stylistic portraits is still a challenge, and even the recently popular Toonify suffers from several artifacts when used on real input images. Such StyleGAN-based methods have focused on finding the best latent inversion mapping for reconstructing input images; however, our key insight is that this does not lead to good...
We present a unified and flexible framework to address the generalized problem of 3D motion synthesis that covers the tasks of prediction, completion, interpolation, and spatial-temporal recovery. Since these tasks have different input constraints and various fidelity and diversity requirements, most existing approaches only cater to a specific task or use different architectures for different tasks. Here we propose a framework based on the Conditional Variational Auto-Encoder (CVAE), where we treat any arbitrary input as a masked motion series. Notably, by considering this...
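A minimal sketch of the "masked motion series" view, assuming a simple per-frame pose vector: prediction, completion and interpolation all reduce to filling masked frames under different mask patterns, so one conditional model can serve every task. Dimensions and mask patterns are illustrative.

```python
import torch

frames, joints = 60, 24 * 3
motion = torch.randn(frames, joints)                   # one pose vector per frame

mask_prediction = torch.zeros(frames, 1); mask_prediction[:30] = 1   # future frames unknown
mask_inbetween = torch.ones(frames, 1);  mask_inbetween[20:40] = 0   # gap to fill

# a single conditional model would receive (motion * mask, mask) and
# reconstruct the missing frames, whatever the mask pattern is
masked_input = torch.cat(
    [motion * mask_prediction, mask_prediction.expand(-1, joints)], dim=-1
)
print(masked_input.shape)                              # torch.Size([60, 144])
```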
Vector Quantisation (VQ) is experiencing a comeback in machine learning, where it is increasingly used for representation learning. However, optimizing the codevectors in existing VQ-VAE models is not entirely trivial. A problem is codebook collapse, where only a small subset of codevectors receive gradients useful for their optimisation, whereas a majority of them simply "dies off" and is never updated or used. This limits the effectiveness of VQ for learning larger codebooks in complex computer vision tasks that require high-capacity representations. In...
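A minimal sketch of plain VQ assignment, assuming random stand-in features and codebook sizes, that makes the collapse problem concrete: only codevectors that are actually selected by the nearest-neighbour assignment receive gradients, so unused entries never improve.

```python
import torch

num_codes, dim = 512, 64
codebook = torch.randn(num_codes, dim)
features = torch.randn(4096, dim)                      # flattened encoder outputs

dists = torch.cdist(features, codebook)                # (4096, 512) pairwise distances
indices = dists.argmin(dim=1)                          # nearest-codeword assignment
quantized = codebook[indices]                          # quantized representation

used = torch.unique(indices).numel()
print(f"{used}/{num_codes} codevectors used; the rest receive no gradient")
```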
We present a new generalizable NeRF method that is able to directly generalize to unseen scenarios and perform novel view synthesis with as few as two source views. The key to our approach lies in the explicitly modeled correspondence matching information, so as to provide a geometry prior for the prediction of color and density in volume rendering. The explicit correspondence matching is quantified by the cosine similarity between image features sampled at the 2D projections of a 3D point on different views, which provides reliable cues about the surface geometry. Unlike...
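A minimal sketch of the matching cue under simplified assumptions (random features, pre-computed normalized projections, no camera model): bilinearly sample each view's feature map at the 2D projection of a 3D point and take the cosine similarity of the sampled features as a per-point geometry prior.

```python
import torch
import torch.nn.functional as F

def sample_feature(feat_map, uv):
    # uv: (n, 2) projections in [-1, 1] normalized image coordinates
    grid = uv.view(1, -1, 1, 2)
    out = F.grid_sample(feat_map, grid, align_corners=True)   # (1, c, n, 1)
    return out.squeeze(-1).squeeze(0).t()                     # (n, c)

feat_view1 = torch.randn(1, 32, 64, 64)
feat_view2 = torch.randn(1, 32, 64, 64)
uv1 = torch.rand(128, 2) * 2 - 1          # projections of 128 sampled 3D points
uv2 = torch.rand(128, 2) * 2 - 1          # projections of the same points in view 2

f1 = sample_feature(feat_view1, uv1)
f2 = sample_feature(feat_view2, uv2)
match_score = F.cosine_similarity(f1, f2, dim=1)   # (128,) per-point geometry cue
print(match_score.shape)
```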
Recent advances in generative models like Stable Diffusion enable the generation of highly photo-realistic images. Our objective in this paper is to probe the diffusion network to determine to what extent it 'understands' different properties of the 3D scene depicted in an image. To this end, we make the following contributions: (i) We introduce a protocol to evaluate whether features of an off-the-shelf diffusion model encode a number of physical 'properties' of the 3D scene, by training discriminative classifiers on the features for these properties. The probes are...
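A minimal sketch of the probing protocol in general terms: freeze the generative model, extract intermediate features for a set of images, and fit a simple discriminative classifier to predict a scene property. The features and labels below are random stand-ins, so the reported accuracy is chance level by construction.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
features = rng.normal(size=(1000, 1280))   # stand-in for frozen diffusion features
labels = rng.integers(0, 2, size=1000)     # stand-in binary property labels

x_tr, x_te, y_tr, y_te = train_test_split(
    features, labels, test_size=0.2, random_state=0
)
probe = LogisticRegression(max_iter=1000).fit(x_tr, y_tr)   # the discriminative probe
print("probe accuracy:", probe.score(x_te, y_te))
```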
Although two-stage Vector Quantized (VQ) generative models allow for synthesizing high-fidelity and high-resolution images, their quantization operator encodes similar patches within an image into the same index, resulting in a repeated artifact for similar adjacent regions when using existing decoder architectures. To address this issue, we propose to incorporate spatially conditional normalization to modulate the quantized vectors so as to insert spatially variant information into the embedded index maps, encouraging the decoder to generate more...
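A minimal sketch of spatially conditional normalization in the SPADE spirit, not the paper's exact decoder: normalize the quantized feature map, then apply per-location scale and shift predicted from a conditioning map, so locations sharing the same codebook index can still decode differently. Layer sizes are illustrative.

```python
import torch
import torch.nn as nn

class SpatialModulation(nn.Module):
    def __init__(self, channels, cond_channels):
        super().__init__()
        self.norm = nn.InstanceNorm2d(channels, affine=False)
        self.to_gamma = nn.Conv2d(cond_channels, channels, 3, padding=1)
        self.to_beta = nn.Conv2d(cond_channels, channels, 3, padding=1)

    def forward(self, quantized, cond):
        # per-location scale and shift injected into the normalized features
        return self.norm(quantized) * (1 + self.to_gamma(cond)) + self.to_beta(cond)

mod = SpatialModulation(channels=256, cond_channels=64)
quantized = torch.randn(1, 256, 16, 16)     # decoder input from the index map
cond = torch.randn(1, 64, 16, 16)           # spatially variant conditioning
print(mod(quantized, cond).shape)           # torch.Size([1, 256, 16, 16])
```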
Learning deep discrete latent representations offers a promise of better symbolic and summarized abstractions that are more useful to subsequent downstream tasks. Inspired by the seminal Vector Quantized Variational Auto-Encoder (VQ-VAE), most work in learning discrete representations has mainly focused on improving the original VQ-VAE form, and none of them has studied it from a generative viewpoint. In this work, we study it from that viewpoint. Specifically, we endow distributions over sequences of codewords and learn a deterministic decoder that transports...
We propose MVSplat, an efficient feed-forward 3D Gaussian Splatting model learned from sparse multi-view images. To accurately localize the Gaussian centers, we propose to build a cost volume representation via plane sweeping in 3D space, where the cross-view feature similarities stored in the cost volume can provide valuable geometry cues for the estimation of depth. We learn the Gaussian primitives' opacities, covariances, and spherical harmonics coefficients jointly with the centers while relying only on photometric supervision. We demonstrate the importance...
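A simplified plane-sweep sketch, not MVSplat's implementation: for each candidate depth plane, warp the source-view features toward the reference view and correlate them with the reference features; the stacked per-depth correlations form a cost volume from which a best-matching depth per pixel can be read off. The purely horizontal shift below is a crude stand-in for the real epipolar warp.

```python
import torch

ref = torch.randn(1, 32, 64, 64)            # reference-view features
src = torch.randn(1, 32, 64, 64)            # source-view features
num_depths = 16

costs = []
for d in range(num_depths):
    shift = d - num_depths // 2
    warped = torch.roll(src, shifts=shift, dims=-1)          # stand-in for warping
    costs.append((ref * warped).mean(dim=1, keepdim=True))   # feature correlation
cost_volume = torch.cat(costs, dim=1)                        # (1, num_depths, 64, 64)
depth_index = cost_volume.argmax(dim=1)                      # best plane per pixel
print(cost_volume.shape, depth_index.shape)
```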
We introduce PICFormer, a novel framework for Pluralistic Image Completion using a transFormer based architecture, which achieves both high quality and diversity at much faster inference speed. Our key contribution is to...