Xingang Pan

ORCID: 0000-0002-5825-9467
Research Areas
  • Generative Adversarial Networks and Image Synthesis
  • 3D Shape Modeling and Analysis
  • Advanced Vision and Imaging
  • Computer Graphics and Visualization Techniques
  • Face recognition and analysis
  • Advanced Neural Network Applications
  • Multimodal Machine Learning Applications
  • Domain Adaptation and Few-Shot Learning
  • Advanced Image Processing Techniques
  • Advanced Image and Video Retrieval Techniques
  • Image Processing and 3D Reconstruction
  • Image Enhancement Techniques
  • Human Pose and Action Recognition
  • COVID-19 diagnosis using AI
  • Image Processing Techniques and Applications
  • Robotics and Sensor-Based Localization
  • Adversarial Robustness in Machine Learning
  • Digital Media Forensic Detection
  • Autonomous Vehicle Technology and Safety
  • Facial Nerve Paralysis Treatment and Research
  • Infrared Target Detection Methodologies
  • Advanced Materials Characterization Techniques
  • Infrastructure Maintenance and Monitoring
  • Medical Imaging Techniques and Applications
  • Oil Spill Detection and Mitigation

Nanyang Technological University
2023-2024

Max Planck Institute for Informatics
2022-2024

Inner Mongolia Electric Power Survey & Design Institute (China)
2024

Google (United States)
2023

Max Planck Center for Visual Computing and Communication
2023

Max Planck Institute for Mathematics
2023

Chinese University of Hong Kong
2018-2021

University of Michigan–Ann Arbor
2013

Convolutional neural networks (CNNs) are usually built by stacking convolutional operations layer-by-layer. Although CNN has shown strong capability to extract semantics from raw pixels, its capacity to capture spatial relationships of pixels across rows and columns of an image is not fully explored. These relationships are important to learn semantic objects with strong shape priors but weak appearance coherences, such as traffic lanes, which are often occluded or not even painted on the road surface as shown in Fig. 1 (a). In this paper,...

10.1609/aaai.v32i1.12301 article EN Proceedings of the AAAI Conference on Artificial Intelligence 2018-04-27
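The row-by-row message passing that this paper proposes can be sketched in NumPy. This is a toy single-channel, single-direction version under assumed shapes and a hand-picked kernel; the actual Spatial CNN applies learned convolution kernels in four directions across feature-map slices.

```python
import numpy as np

def scnn_downward_pass(feat, kernel):
    """Slice-by-slice message passing across rows (the 'downward' direction).

    feat:   (H, W) feature map (single channel for illustration)
    kernel: (k,) 1D kernel applied along the width before a row's
            message is added to the next row.
    """
    H, W = feat.shape
    out = feat.copy()
    for i in range(1, H):
        # Convolve the previous (already updated) row along the width.
        msg = np.convolve(out[i - 1], kernel, mode="same")
        msg = np.maximum(msg, 0.0)   # nonlinearity between slices
        out[i] = out[i] + msg        # residual update to the next slice
    return out

feat = np.zeros((4, 5))
feat[0, 2] = 1.0   # activation only in the top row, e.g. a lane fragment
out = scnn_downward_pass(feat, np.array([0.25, 0.5, 0.25]))
```

Because each row receives a message from the row above, evidence from the top of the image propagates all the way down, which is the mechanism that lets thin, occluded structures like lanes be completed.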

Convolutional neural networks (CNNs) are usually built by stacking convolutional operations layer-by-layer. Although CNN has shown strong capability to extract semantics from raw pixels, its capacity to capture spatial relationships of pixels across rows and columns of an image is not fully explored. These relationships are important to learn semantic objects with strong shape priors but weak appearance coherences, such as traffic lanes, which are often occluded or not even painted on the road surface as shown in Fig. 1 (a). In this paper,...

10.48550/arxiv.1712.06080 preprint EN other-oa arXiv (Cornell University) 2017-01-01

Learning a good image prior is a long-term goal for image restoration and manipulation. While existing methods like deep image prior (DIP) capture low-level image statistics, there are still gaps toward an image prior that captures rich image semantics including color, spatial coherence, textures, and high-level concepts. This work presents an effective way to exploit the image prior captured by a generative adversarial network (GAN) trained on large-scale natural images. As shown in Fig. 1, the deep generative prior (DGP) provides compelling results to restore missing semantics,...

10.1109/tpami.2021.3115428 article EN IEEE Transactions on Pattern Analysis and Machine Intelligence 2021-09-24
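The core idea of using a generative prior, i.e. optimizing a generator's latent code so that its output matches a degraded observation, can be illustrated with a toy linear "generator" in NumPy. Everything here (the linear map, the mask, the step size) is a hypothetical stand-in; the actual DGP inverts a large GAN using discriminator feature losses and progressive generator fine-tuning.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a pretrained generator: a fixed linear map G(z) = W @ z.
W = rng.standard_normal((16, 4))
z_true = rng.standard_normal(4)
x_full = W @ z_true                               # the "clean image"
mask = (rng.random(16) > 0.5).astype(float)       # observed pixels only
x_obs = mask * x_full                             # degraded observation

# DGP-style objective: minimize || mask * (G(z) - x_obs) ||^2 over z.
z = np.zeros(4)
lr = 0.5 / np.linalg.norm(W, 2) ** 2              # safe step for this quadratic
for _ in range(10000):
    resid = mask * (W @ z - x_obs)
    z -= lr * 2.0 * (W.T @ resid)                 # gradient descent on z

x_restored = W @ z
```

Because the reconstruction is constrained to the generator's output space, fitting the observed pixels also fills in the masked ones whenever the observations pin down the latent, which is the sense in which the generator acts as a prior.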

Synthesizing visual content that meets users' needs often requires flexible and precise controllability of the pose, shape, expression, and layout of the generated objects. Existing approaches gain controllability of generative adversarial networks (GANs) via manually annotated training data or a prior 3D model, which often lack flexibility, precision, and generality. In this work, we study a powerful yet much less explored way of controlling GANs, that is, to "drag" any points of the image to precisely reach target points in a user-interactive manner, as...

10.1145/3588432.3591500 article EN cc-by 2023-07-19
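The drag-based editing loop alternates two steps: motion supervision (update the latent so features near the handle move a small step toward the target) and point tracking (re-locate the handle after the update). A minimal 1D toy sketch, with a hypothetical blob "generator" standing in for a real GAN and argmax standing in for nearest-neighbor feature search:

```python
import numpy as np

GRID = np.arange(32, dtype=float)

def G(z):
    """Toy 'generator': renders a 1D Gaussian blob centered at scalar z."""
    return np.exp(-0.5 * (GRID - z) ** 2)

def track_point(feat):
    """Point tracking: locate the handle at the feature peak (a toy
    version of nearest-neighbor feature search)."""
    return float(np.argmax(feat))

z = 8.0          # latent code; the blob starts at x = 8
target = 20.0    # where the user dragged the handle point
handle = track_point(G(z))

for _ in range(100):
    if abs(handle - target) < 0.5:
        break
    # Motion supervision: nudge the latent so the handle's feature moves
    # a small step toward the target (the paper takes a gradient step
    # on the StyleGAN w-code instead of this sign update).
    z += 0.5 * np.sign(target - handle)
    handle = track_point(G(z))   # re-locate the handle after the update
```

The interleaving matters: tracking after every small step keeps the supervision anchored to where the handle content actually moved, rather than where it started.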

Normalization methods are essential components in convolutional neural networks (CNNs). They either standardize or whiten data using statistics estimated from predefined sets of pixels. Unlike existing works that design normalization techniques for specific tasks, we propose Switchable Whitening (SW), which provides a general form unifying different whitening methods as well as standardization methods. SW learns to switch among these operations in an end-to-end manner. It has several advantages. First,...

10.1109/iccv.2019.00195 article EN 2019 IEEE/CVF International Conference on Computer Vision (ICCV) 2019-10-01
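The "general form unifying whitening and standardization" amounts to a softmax-weighted combination of normalization operators, with the weights learned end-to-end. A small NumPy sketch with two operators (batch standardization and ZCA whitening) and hand-set logits standing in for learned ones:

```python
import numpy as np

def standardize(x):
    """Standardization: zero mean, unit variance per channel."""
    mu = x.mean(axis=0, keepdims=True)
    sigma = x.std(axis=0, keepdims=True) + 1e-5
    return (x - mu) / sigma

def whiten(x):
    """ZCA whitening: decorrelate channels using the covariance matrix."""
    xc = x - x.mean(axis=0, keepdims=True)
    cov = xc.T @ xc / len(xc) + 1e-5 * np.eye(x.shape[1])
    vals, vecs = np.linalg.eigh(cov)
    zca = vecs @ np.diag(vals ** -0.5) @ vecs.T
    return xc @ zca

def switchable_whitening(x, logits):
    """Softmax-weighted switch among normalization operators."""
    w = np.exp(logits) / np.exp(logits).sum()
    ops = [standardize(x), whiten(x)]
    return sum(wk * op for wk, op in zip(w, ops))

rng = np.random.default_rng(1)
# Correlated 3-channel features (256 "pixels").
x = rng.standard_normal((256, 3)) @ np.array([[2., 1., 0.],
                                              [0., 1., 0.],
                                              [0., 0., 3.]])
y = switchable_whitening(x, logits=np.array([0.0, 2.0]))  # learned in practice
```

Standardization only fixes per-channel scale, while whitening also removes cross-channel correlation; letting the network weight the two per layer is the switching idea.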

A typical domain adaptation approach is to adapt models trained on the annotated data in a source domain (e.g., sunny weather) for achieving high performance on the test data in a target domain (e.g., rainy weather). Whether the target contains a single homogeneous domain or multiple heterogeneous domains, existing works always assume that there exist clear distinctions between the domains, which is often not true in practice (e.g., as weather changes). We study an open compound domain adaptation (OCDA) problem, in which the target is a compound of multiple homogeneous domains without domain labels, reflecting realistic data collection from mixed and novel situations....

10.1109/cvpr42600.2020.01242 article EN 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2020-06-01
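One ingredient for handling a compound target without domain labels is a curriculum: rank target instances by how far their features sit from the source and adapt on the most source-like ones first. A minimal sketch with random features standing in for a learned encoder's output:

```python
import numpy as np

rng = np.random.default_rng(2)
source = rng.standard_normal((100, 8))   # labeled source-domain features
# Unlabeled target features drawn from a mix of shifted domains.
target = rng.standard_normal((60, 8)) + rng.uniform(0, 4, (60, 1))

# Rank target instances by domain gap (distance to the source centroid)
# and schedule them easy-to-hard.
centroid = source.mean(axis=0)
gap = np.linalg.norm(target - centroid, axis=1)
curriculum = np.argsort(gap)             # most to least source-like

easy_batch = target[curriculum[:16]]     # adapt on these first
```

Centroid distance is a deliberately crude gap measure for illustration; the point is only the easy-to-hard ordering over an unlabeled, heterogeneous target.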

Natural scene understanding is a challenging task, particularly when encountering images of multiple objects that are partially occluded. This obstacle is given rise by varying object ordering and positioning. Existing scene understanding paradigms are able to parse only the visible parts, resulting in incomplete and unstructured scene interpretation. In this paper, we investigate the problem of scene de-occlusion, which aims to recover the underlying occlusion ordering and complete the invisible parts of occluded objects. We make the first attempt to address the problem through a novel...

10.1109/cvpr42600.2020.00384 article EN 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2020-06-01

Facial editing is an important task in vision and graphics with numerous applications. However, existing works are incapable to deliver a continuous and fine-grained editing mode (e.g., editing from a slightly smiling face to a big laughing one) with natural interactions with users. In this work, we propose Talk-to-Edit, an interactive facial editing framework that performs fine-grained attribute manipulation through dialog between the user and the system. Our key insight is to model a continual "semantic field" in the GAN latent space. 1) Unlike previous works that regard the editing as traversing...

10.1109/iccv48922.2021.01354 article EN 2021 IEEE/CVF International Conference on Computer Vision (ICCV) 2021-10-01
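Fine-grained editing along a semantic field can be pictured as taking small latent steps along an attribute direction until a predictor reports the requested degree. A toy sketch in which the "attribute predictor" is just a projection and the direction, step size, and target degree are hypothetical stand-ins:

```python
import numpy as np

def attribute_score(z, direction):
    """Toy attribute predictor: projection of the latent onto the direction."""
    return float(z @ direction)

def edit_to_degree(z, direction, target_degree, step=0.25, max_iter=200):
    """Fine-grained edit: take small steps along a semantic direction
    until the attribute predictor reports the requested degree."""
    z = z.copy()
    for _ in range(max_iter):
        score = attribute_score(z, direction)
        if abs(score - target_degree) < step:
            break
        z += step * np.sign(target_degree - score) * direction
    return z

direction = np.array([1.0, 0.0, 0.0])   # e.g. a toy 'smile' direction
z0 = np.zeros(3)
z_edit = edit_to_degree(z0, direction, target_degree=2.0)
```

Stopping at a user-specified degree rather than applying one large offset is what makes the edit continuous ("slightly smiling" vs. "big laughing"); a dialog system would simply choose `target_degree` from the conversation.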

Purely MLP-based neural radiance fields (NeRF-based methods) often suffer from underfitting with blurred renderings on large-scale scenes due to limited model capacity. Recent approaches propose to geographically divide the scene and adopt multiple sub-NeRFs to model each region individually, leading to a linear scale-up in training costs and the number of sub-NeRFs as the scene expands. An alternative solution is to use a feature grid representation, which is computationally efficient and can naturally scale to a large scene with increased grid resolutions....

10.1109/cvpr52729.2023.00802 article EN 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023-06-01
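The feature-grid alternative works by storing learnable features at grid vertices and interpolating them at query points, so most capacity lives in the grid rather than in a large MLP. A minimal 2D bilinear-sampling sketch (shapes and the grid itself are illustrative, not from the paper):

```python
import numpy as np

def bilinear_sample(grid, x, y):
    """Sample a feature grid at continuous (x, y) by bilinear interpolation.

    grid: (H, W, C) learnable features; x, y are in grid coordinates.
    """
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1 = min(x0 + 1, grid.shape[1] - 1)
    y1 = min(y0 + 1, grid.shape[0] - 1)
    fx, fy = x - x0, y - y0
    top = (1 - fx) * grid[y0, x0] + fx * grid[y0, x1]
    bot = (1 - fx) * grid[y1, x0] + fx * grid[y1, x1]
    return (1 - fy) * top + fy * bot

rng = np.random.default_rng(3)
grid = rng.standard_normal((8, 8, 4))    # coarse learnable feature plane
feat = bilinear_sample(grid, 2.5, 3.25)  # feature for one query point
# In a grid-guided NeRF, `feat` would condition a small MLP that predicts
# density and color for volume rendering.
```

Lookup plus interpolation is O(1) per query regardless of scene size, which is why grids scale to large scenes where a single MLP underfits.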

Natural images are projections of 3D objects on a 2D image plane. While state-of-the-art 2D generative models like GANs show unprecedented quality in modeling the natural image manifold, it is unclear whether they implicitly capture the underlying 3D object structures. And if so, how could we exploit such knowledge to recover the 3D shapes of objects from 2D images? To answer these questions, in this work, we present the first attempt to directly mine 3D geometric cues from an off-the-shelf GAN that is trained on RGB images only. Through our investigation,...

10.48550/arxiv.2011.00844 preprint EN other-oa arXiv (Cornell University) 2020-01-01

Learning 3D generative models from a dataset of monocular images enables self-supervised 3D reasoning and controllable synthesis. State-of-the-art 3D generative models are GANs that use neural volumetric representations for synthesis. Images are synthesized by rendering the volumes from a given camera. These models can disentangle the 3D scene from the camera viewpoint in any generated image. However, most models do not disentangle other factors of image formation, such as geometry and appearance. In this paper, we design a 3D GAN which can learn a disentangled model of objects, just from monocular observations....

10.1109/cvpr52688.2022.00157 article EN 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2022-06-01

An intelligent agent naturally learns from motion. Various self-supervised algorithms have leveraged motion cues to learn effective visual representations. The hurdle here is that motion is both ambiguous and complex, rendering previous works to either suffer from degraded learning efficacy or resort to strong assumptions on object motions. In this work, we design a new learning-from-motion paradigm to bridge these gaps. Instead of explicitly modeling the motion probabilities, we design the pretext task as a conditional motion propagation...

10.1109/cvpr.2019.00198 article EN 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2019-06-01

Neural surface reconstruction aims to reconstruct accurate 3D surfaces based on multi-view images. Previous methods based on neural volume rendering mostly train a fully implicit model with MLPs, which typically require hours of training for a single scene. Recent efforts explore the explicit volumetric representation to accelerate the optimization via memorizing significant information in learnable voxel grids. However, existing voxel-based methods often struggle in reconstructing fine-grained geometry, even when...

10.48550/arxiv.2208.12697 preprint EN other-oa arXiv (Cornell University) 2022-01-01

Capturing and editing full-head performances enables the creation of virtual characters with various applications such as extended reality and media production. The past few years witnessed a steep rise in the photorealism of human head avatars. Such avatars can be controlled through different input data modalities, including RGB, audio, depth, IMUs, and others. While these modalities provide effective means of control, they mostly focus on editing the head movements such as facial expressions, pose, and/or camera viewpoint. In this...

10.1145/3618368 article EN cc-by ACM Transactions on Graphics 2023-12-05

Multi-view volumetric rendering techniques have recently shown great potential in modeling and synthesizing high-quality head avatars. A common approach to capture full dynamic head performances is to track the underlying geometry using a mesh-based template or 3D cube-based graphics primitives. While these model-based approaches achieve promising results, they often fail to learn complex geometric details such as the mouth interior, hair, and topological changes over time. This article presents a novel approach to building...

10.1145/3649889 article EN cc-by ACM Transactions on Graphics 2024-02-29

Textured 3D morphing creates smooth and plausible interpolation sequences between two 3D objects, focusing on transitions in both shape and texture. This is important for creative applications like visual effects in filmmaking. Previous methods rely on establishing point-to-point correspondences and determining deformation trajectories, which inherently restricts them to shape-only morphing on untextured, topologically aligned datasets. This restriction leads to labor-intensive preprocessing and poor generalization. To overcome...

10.48550/arxiv.2502.14316 preprint EN arXiv (Cornell University) 2025-02-20

The advancement of generative radiance fields has pushed the boundary of 3D-aware image synthesis. Motivated by the observation that a 3D object should look realistic from multiple viewpoints, these methods introduce a multi-view constraint as regularization to learn valid 3D radiance fields from 2D images. Despite the progress, they often fall short of capturing accurate 3D shapes due to the shape-color ambiguity, limiting their applicability in downstream tasks. In this work, we address this ambiguity by proposing a novel shading-guided implicit...

10.48550/arxiv.2110.15678 preprint EN other-oa arXiv (Cornell University) 2021-01-01
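The shading-guided idea rests on factoring a pixel's color into albedo times a shading term computed from the surface normal and a light direction: an inaccurate shape then produces visibly wrong images under varying lighting, resolving the shape-color ambiguity. A minimal Lambertian-shading sketch with made-up per-pixel normals and albedo:

```python
import numpy as np

def lambertian_shading(albedo, normals, light_dir, ambient=0.2, diffuse=0.8):
    """Multiplicative shading model: color = albedo * shading(normal, light).

    albedo:  (N, 3) per-pixel reflectance
    normals: (N, 3) per-pixel unit surface normals
    """
    l = light_dir / np.linalg.norm(light_dir)
    n_dot_l = np.clip(normals @ l, 0.0, None)   # (N,) cosine term
    shading = ambient + diffuse * n_dot_l
    return albedo * shading[:, None]

# Two toy pixels: one facing the light, one perpendicular to it.
normals = np.array([[0.0, 0.0, 1.0],
                    [0.0, 1.0, 0.0]])
albedo = np.ones((2, 3)) * 0.5
img = lambertian_shading(albedo, normals, light_dir=np.array([0.0, 0.0, 1.0]))
```

Only the normals (i.e., the shape) change the shading here, so a generator trained to look right under sampled lightings is pushed toward geometry with correct normals.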

3D content creation from a single image is a long-standing yet highly desirable task. Recent advances introduce 2D diffusion priors, yielding reasonable results. However, existing methods are not hyper-realistic enough for post-generation usage, as users cannot view, render, and edit the resulting 3D content from a full range. To address these challenges, we introduce HyperDreamer with several key designs and appealing properties: 1) Full-range viewable: 360° mesh modeling with high-resolution textures enables the creation of visually...

10.1145/3610548.3618168 article EN cc-by 2023-12-10

Learning a good image prior is a long-term goal for image restoration and manipulation. While existing methods like deep image prior (DIP) capture low-level image statistics, there are still gaps toward an image prior that captures rich image semantics including color, spatial coherence, textures, and high-level concepts. This work presents an effective way to exploit the image prior captured by a generative adversarial network (GAN) trained on large-scale natural images. As shown in Fig. 1, the deep generative prior (DGP) provides compelling results to restore missing semantics,...

10.48550/arxiv.2003.13659 preprint EN other-oa arXiv (Cornell University) 2020-01-01

The advent of generative radiance fields has significantly promoted the development of 3D-aware image synthesis. The cumulative rendering process in radiance fields makes training these generative models much easier, since gradients are distributed over the entire volume, but it leads to diffused object surfaces. In the meantime, compared to radiance fields, occupancy representations could inherently ensure deterministic surfaces. However, if we directly apply them to generative models, during training they will only receive sparse gradients located on object surfaces and eventually suffer from the convergence...

10.48550/arxiv.2111.00969 preprint EN other-oa arXiv (Cornell University) 2021-01-01
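The trade-off described above is visible in the per-sample compositing weights along a ray, w_i = alpha_i * prod_{j<i}(1 - alpha_j): diffused densities spread weight (and gradients) over the whole volume, while near-binary occupancy concentrates all weight at the first occupied sample. A small NumPy sketch of the two regimes (the alpha values are illustrative):

```python
import numpy as np

def render_weights(alpha):
    """Compositing weights along a ray: w_i = alpha_i * prod_{j<i}(1 - alpha_j)."""
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))
    return alpha * trans

# Diffused density (radiance-field style): weight spread over the volume.
soft = render_weights(np.full(8, 0.3))
# Near-binary occupancy: weight concentrates at the surface crossing.
hard = render_weights(np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0]))
```

With hard occupancy, every sample behind the surface gets zero weight and hence zero gradient, which is exactly the sparse-gradient convergence issue the abstract points to.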