- Generative Adversarial Networks and Image Synthesis
- Advanced Vision and Imaging
- Computer Graphics and Visualization Techniques
- Advanced Image and Video Retrieval Techniques
- Advanced Image Processing Techniques
- Image Retrieval and Classification Techniques
- 3D Shape Modeling and Analysis
- Digital Media Forensic Detection
- Video Analysis and Summarization
- Domain Adaptation and Few-Shot Learning
- Advanced Neural Network Applications
- Cell Image Analysis Techniques
- Anomaly Detection Techniques and Applications
- Image Processing and 3D Reconstruction
- Multimodal Machine Learning Applications
- Tactile and Sensory Interactions
- Image Enhancement Techniques
- Topic Modeling
- Visual Attention and Saliency Detection
- Medical Image Segmentation Techniques
- Machine Learning and Data Classification
- Model Reduction and Neural Networks
- Interactive and Immersive Displays
- Artificial Intelligence in Games
- AI in Cancer Detection
- Carnegie Mellon University, 2021-2025
- Baylor University, 2025
- Central China Normal University, 2024
- Institute of Microelectronics, 2020-2024
- University of Chinese Academy of Sciences, 2020-2024
- Chinese Academy of Sciences, 2020-2024
- Queen Mary University of London, 2023
- Wuhan Institute of Technology, 2022
- Adobe Systems (United States), 2018-2021
- Massachusetts Institute of Technology, 2018-2020
We investigate conditional adversarial networks as a general-purpose solution to image-to-image translation problems. These networks not only learn the mapping from input image to output image, but also learn a loss function to train this mapping. This makes it possible to apply the same generic approach to problems that traditionally would require very different loss formulations. We demonstrate that this approach is effective at synthesizing photos from label maps, reconstructing objects from edge maps, and colorizing images, among other tasks. Moreover, since...
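The learned loss the abstract refers to is typically written as a conditional GAN objective combined with an L1 reconstruction term; a sketch of that standard formulation (the notation below is conventional, not quoted from this listing):

```latex
% Conditional GAN objective with an L1 term; \lambda weights reconstruction.
\mathcal{L}_{\mathrm{cGAN}}(G,D) =
  \mathbb{E}_{x,y}\!\left[\log D(x,y)\right]
  + \mathbb{E}_{x,z}\!\left[\log\left(1 - D\big(x, G(x,z)\big)\right)\right],
\qquad
G^{*} = \arg\min_{G}\max_{D}\;
  \mathcal{L}_{\mathrm{cGAN}}(G,D)
  + \lambda\,\mathbb{E}_{x,y,z}\!\left[\lVert y - G(x,z)\rVert_{1}\right].
```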
Image-to-image translation is a class of vision and graphics problems where the goal is to learn the mapping between an input image and an output image using a training set of aligned image pairs. However, for many tasks, paired training data will not be available. We present an approach for learning to translate an image from a source domain X to a target domain Y in the absence of paired examples. Our goal is to learn a mapping G : X → Y such that the distribution of images from G(X) is indistinguishable from the distribution Y using an adversarial loss. Because this mapping is highly under-constrained, we couple it with an inverse mapping F and introduce a cycle consistency loss...
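A minimal PyTorch sketch of the cycle consistency term, assuming `G` (X to Y) and `F` (Y to X) are ordinary `nn.Module` generators; the function and weight names are illustrative, not from the paper's reference code:

```python
import torch.nn.functional as nnf

def cycle_consistency_loss(G, F, real_x, real_y, lam=10.0):
    """L1 cycle loss: x -> G(x) -> F(G(x)) should return to x, and vice versa."""
    rec_x = F(G(real_x))   # forward cycle  X -> Y -> X
    rec_y = G(F(real_y))   # backward cycle Y -> X -> Y
    return lam * (nnf.l1_loss(rec_x, real_x) + nnf.l1_loss(rec_y, real_y))
```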
We present a new method for synthesizing high-resolution photo-realistic images from semantic label maps using conditional generative adversarial networks (conditional GANs). Conditional GANs have enabled a variety of applications, but the results are often limited to low resolution and still far from realistic. In this work, we generate 2048 × 1024 visually appealing results with a novel adversarial loss, as well as new multi-scale generator and discriminator architectures. Furthermore, we extend our framework to interactive visual...
We propose spatially-adaptive normalization, a simple but effective layer for synthesizing photorealistic images given an input semantic layout. Previous methods directly feed the semantic layout as input to the network, forcing the network to memorize the information throughout all layers. Instead, we propose using the input layout for modulating the activations in normalization layers through a spatially-adaptive, learned affine transformation. Experiments on several challenging datasets demonstrate the superiority of our method compared to existing...
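A compact sketch of a spatially-adaptive normalization layer in PyTorch; the layer sizes and the choice of parameter-free batch norm are assumptions for illustration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatiallyAdaptiveNorm(nn.Module):
    """Normalize activations, then modulate them with per-pixel scale/shift
    predicted from the semantic layout (a SPADE-style layer; sizes illustrative)."""
    def __init__(self, num_features, label_channels, hidden=128):
        super().__init__()
        self.norm = nn.BatchNorm2d(num_features, affine=False)  # parameter-free
        self.shared = nn.Sequential(
            nn.Conv2d(label_channels, hidden, 3, padding=1), nn.ReLU())
        self.gamma = nn.Conv2d(hidden, num_features, 3, padding=1)
        self.beta = nn.Conv2d(hidden, num_features, 3, padding=1)

    def forward(self, x, segmap):
        seg = F.interpolate(segmap, size=x.shape[2:], mode='nearest')
        h = self.shared(seg)
        # Learned, spatially varying affine transform of the normalized input.
        return self.norm(x) * (1 + self.gamma(h)) + self.beta(h)
```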
Domain adaptation is critical for success in new, unseen environments. Adversarial adaptation models applied in feature spaces discover domain invariant representations, but are difficult to visualize and sometimes fail to capture pixel-level and low-level domain shifts. Recent work has shown that generative adversarial networks combined with cycle-consistency constraints are surprisingly effective at mapping images between domains, even without the use of aligned image pairs. We propose a novel discriminatively-trained...
Many image-to-image translation problems are ambiguous, as a single input image may correspond to multiple possible outputs. In this work, we aim to model a distribution of possible outputs in a conditional generative modeling setting. The ambiguity of the mapping is distilled in a low-dimensional latent vector, which can be randomly sampled at test time. A generator learns to map the given input, combined with this latent code, to the output. We explicitly encourage the connection between output and the latent code to be invertible. This helps prevent...
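One standard way to make the output-to-code connection invertible is a latent regression term that recovers the sampled code from the generated output; a sketch assuming a generator `G(x, z)` and an encoder `E` (both hypothetical interfaces):

```python
import torch
import torch.nn.functional as F

def latent_regression_loss(G, E, x, z_dim=8):
    """Encourage invertibility: z -> G(x, z) -> E(G(x, z)) should recover z."""
    z = torch.randn(x.size(0), z_dim, device=x.device)  # randomly sampled code
    fake = G(x, z)
    z_rec = E(fake)
    return F.l1_loss(z_rec, z)
```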
Deep neural networks (DNNs) have been found to be vulnerable to adversarial examples resulting from adding small-magnitude perturbations to inputs. Such adversarial examples can mislead DNNs to produce adversary-selected results. Different attack strategies have been proposed to generate adversarial examples, but how to produce them with high perceptual quality and more efficiently requires more research efforts. In this paper, we propose AdvGAN to generate adversarial examples with generative adversarial networks (GANs), which can learn and approximate the distribution of original instances. For AdvGAN, once...
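A sketch of the inference-time attack pattern the abstract describes, where a trained generator emits a perturbation in a single forward pass; the interface and the perturbation bound are illustrative assumptions:

```python
import torch

def advgan_attack(generator, target_model, x, eps=0.3):
    """AdvGAN-style inference: the generator proposes a perturbation, which is
    clamped to a small magnitude and added to the input (eps is an assumption)."""
    perturbation = torch.clamp(generator(x), -eps, eps)
    x_adv = torch.clamp(x + perturbation, 0.0, 1.0)  # keep a valid image range
    return x_adv, target_model(x_adv)  # probe the victim model's prediction
```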
A commonly observed failure mode of Neural Radiance Fields (NeRF) is fitting incorrect geometries when given an insufficient number of input views. One potential reason is that standard volumetric rendering does not enforce the constraint that most of a scene's geometry consist of empty space and opaque surfaces. We formalize the above assumption through DS-NeRF (Depth-supervised Neural Radiance Fields), a loss for learning radiance fields that takes advantage of readily-available depth supervision. We leverage the fact that current NeRF pipelines...
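A minimal sketch of depth supervision over volume-rendering weights; it penalizes a ray's expected termination depth against sparse ground truth, which is a simplified stand-in for the paper's exact loss:

```python
import torch

def depth_supervision_loss(w, t, depth_gt, mask):
    """w: [R, S] rendering weights per ray, t: [R, S] sample depths,
    depth_gt: [R] sparse supervision, mask: [R] bool for rays with known depth."""
    expected_depth = (w * t).sum(dim=-1)       # E[t] under each ray's weights
    err = (expected_depth - depth_gt) ** 2     # squared depth error per ray
    return err[mask].mean()                    # only supervised rays contribute
```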
We propose a deep learning approach for user-guided image colorization. The system directly maps a grayscale image, along with sparse, local user "hints", to an output colorization with a Convolutional Neural Network (CNN). Rather than using hand-defined rules, the network propagates user edits by fusing low-level cues with high-level semantic information, learned from large-scale data. We train on a million images, with simulated user inputs. To guide the user towards efficient input selection, the system recommends likely colors based on...
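A toy version of the hint-based input encoding: the grayscale channel is concatenated with sparse color hints and a hint mask before the network predicts the two color channels. The architecture below is a placeholder, far smaller than the paper's:

```python
import torch
import torch.nn as nn

class HintColorizer(nn.Module):
    """Toy user-guided colorizer: input is [grayscale | ab hints | hint mask]."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1 + 2 + 1, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 2, 3, padding=1), nn.Tanh())  # ab channels in [-1, 1]

    def forward(self, gray, hints_ab, hint_mask):
        return self.net(torch.cat([gray, hints_ab, hint_mask], dim=1))
```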
Deep neural networks excel at finding hierarchical representations that solve complex tasks over large data sets. How can we humans understand these learned representations? In this work, we present network dissection, an analytic framework to systematically identify the semantics of individual hidden units within image classification and image generation networks. First, we analyze a convolutional neural network (CNN) trained on scene classification and discover units that match a diverse set of object concepts. We find evidence that the network has learned many object classes that play...
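Network dissection scores a unit by how well its thresholded activation map overlaps a concept's segmentation mask; a sketch of that IoU computation (tensor shapes and the thresholding scheme are assumptions):

```python
import torch

def unit_concept_iou(activation, concept_mask, threshold):
    """activation: [H, W] upsampled unit response; concept_mask: [H, W] bool.
    Binarize the unit's response and measure IoU against the concept mask."""
    unit_mask = activation > threshold
    inter = (unit_mask & concept_mask).sum().float()
    union = (unit_mask | concept_mask).sum().float()
    return (inter / union.clamp(min=1)).item()  # guard against empty union
```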
Despite the recent success of GANs in synthesizing images conditioned on inputs such as a user sketch, text, or semantic labels, manipulating the high-level attributes of an existing natural photograph with GANs is challenging for two reasons. First, it is hard for GANs to precisely reproduce an input image. Second, after manipulation, the newly synthesized pixels often do not fit the original image. In this paper, we address these issues by adapting the image prior learned by GANs to the statistics of an individual image. Our method can accurately reconstruct and...
Despite the success of Generative Adversarial Networks (GANs), mode collapse remains a serious issue during GAN training. To date, little work has focused on understanding and quantifying which modes have been dropped by a model. In this work, we visualize mode collapse at both the distribution level and the instance level. First, we deploy a semantic segmentation network to compare the distribution of segmented objects in generated images with that of the target training set. Differences in statistics reveal object classes that are omitted by a GAN. Second,...
The performance of generative adversarial networks (GANs) heavily deteriorates given a limited amount of training data. This is mainly because the discriminator is memorizing the exact training set. To combat it, we propose Differentiable Augmentation (DiffAugment), a simple method that improves the data efficiency of GANs by imposing various types of differentiable augmentations on both real and fake samples. Previous attempts to directly augment the training data manipulate the distribution of real images, yielding little benefit; DiffAugment...
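A sketch of the core DiffAugment recipe: apply a differentiable augmentation to both real and generated batches before the discriminator sees them, so gradients still reach the generator. The single brightness op and the logistic loss below are illustrative choices:

```python
import torch
import torch.nn.functional as F

def diff_augment(x):
    """One differentiable augmentation (random brightness shift); DiffAugment
    composes several such tensor-only ops so gradients flow through them."""
    shift = torch.rand(x.size(0), 1, 1, 1, device=x.device) - 0.5
    return x + shift

def d_loss(D, G, real, z):
    fake = G(z)
    # Key idea: augment BOTH real and fake samples before the discriminator.
    return (F.softplus(-D(diff_augment(real))).mean()
            + F.softplus(D(diff_augment(fake))).mean())
```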
While generative models produce high-quality images of concepts learned from a large-scale database, a user often wishes to synthesize instantiations of their own concepts (for example, their family, pets, or items). Can we teach a model to quickly acquire a new concept, given a few examples? Furthermore, can we compose multiple new concepts together? We propose Custom Diffusion, an efficient method for augmenting existing text-to-image models. We find that only optimizing a few parameters in the text-to-image conditioning mechanism is sufficiently powerful...
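In the paper, the "few parameters in the conditioning mechanism" are the key and value projections of the cross-attention layers; a sketch of selecting such parameters by name, where the `attn2.to_k`/`attn2.to_v` naming pattern follows common diffusion codebases and is an assumption here:

```python
def conditioning_params(unet):
    """Collect only cross-attention key/value weights for fine-tuning,
    freezing everything else (parameter-name patterns are assumptions)."""
    trainable = []
    for name, p in unet.named_parameters():
        if "attn2.to_k" in name or "attn2.to_v" in name:  # cross-attention K/V
            p.requires_grad_(True)
            trainable.append(p)
        else:
            p.requires_grad_(False)
    return trainable
```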
Recent studies show that widely used deep neural networks (DNNs) are vulnerable to carefully crafted adversarial examples. Many advanced algorithms have been proposed to generate adversarial examples by leveraging the $\mathcal{L}_p$ distance for penalizing perturbations. Researchers have explored different defense methods to defend against such attacks. While the effectiveness of the $\mathcal{L}_p$ distance as a metric of perceptual quality remains an active research area, in this paper we will instead focus on a different type of perturbation, namely spatially...
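Spatial perturbations displace pixels with a small flow field rather than adding noise; a sketch using bilinear resampling (`grid_sample`), with shapes and normalized-coordinate conventions as stated in the comments:

```python
import torch
import torch.nn.functional as F

def spatial_perturb(x, flow):
    """x: [B, C, H, W] images; flow: [B, H, W, 2] small displacements in
    normalized [-1, 1] coordinates. Resample the image along the flow."""
    B, C, H, W = x.shape
    ys = torch.linspace(-1, 1, H, device=x.device)
    xs = torch.linspace(-1, 1, W, device=x.device)
    gy, gx = torch.meshgrid(ys, xs, indexing="ij")
    base = torch.stack([gx, gy], dim=-1).expand(B, H, W, 2)  # identity grid
    return F.grid_sample(x, base + flow, align_corners=True)
```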
Generative Adversarial Networks (GANs) have recently achieved impressive results for many real-world applications, and many GAN variants have emerged with improvements in sample quality and training stability. However, they have not been well visualized or understood. How does a GAN represent our visual world internally? What causes the artifacts in GAN results? How do architectural choices affect GAN learning? Answering such questions could enable us to develop new insights and better models. In this work, we present an analytic...
The recent success of text-to-image synthesis has taken the world by storm and captured the general public's imagination. From a technical standpoint, it also marked a drastic change in the favored architecture for designing generative image models. GANs used to be the de facto choice, with techniques like StyleGAN. With DALL·E 2, autoregressive and diffusion models became the new standard for large-scale generative models overnight. This rapid shift raises a fundamental question: can we scale up GANs to benefit from large datasets like LAION? We find...
Large-scale text-to-image generative models have shown their remarkable ability to synthesize diverse, high-quality images. However, directly applying these models for real image editing remains challenging for two reasons. First, it is hard for users to craft a perfect text prompt depicting every visual detail in the input image. Second, while existing models can introduce desirable changes in certain regions, they often dramatically alter the input content and introduce unexpected changes in unwanted regions. In this work, we propose pix2pix-zero, an...
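pix2pix-zero derives its edits from a direction in text-embedding space; a minimal sketch of that idea, computing the mean embedding difference between sentences containing the source and target words (the `encode_text` callable and the sentence lists are assumptions):

```python
import torch

def edit_direction(encode_text, source_sents, target_sents):
    """Mean difference of text embeddings over many sentences containing the
    source concept vs. the target concept (e.g., 'cat' -> 'dog')."""
    src = torch.stack([encode_text(s) for s in source_sents]).mean(dim=0)
    tgt = torch.stack([encode_text(s) for s in target_sents]).mean(dim=0)
    d = tgt - src
    return d / d.norm()  # unit-norm direction applied during denoising
```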