- Generative Adversarial Networks and Image Synthesis
- Domain Adaptation and Few-Shot Learning
- Model Reduction and Neural Networks
- Music and Audio Processing
- Machine Learning in Healthcare
- Multimodal Machine Learning Applications
- Topic Modeling
- Machine Learning and Data Classification
- Computer Graphics and Visualization Techniques
- Image Retrieval and Classification Techniques
- Adversarial Robustness in Machine Learning
- Advanced Image and Video Retrieval Techniques
- Face Recognition and Analysis
- Gaussian Processes and Bayesian Inference
- Neural Networks and Applications
- Video Analysis and Summarization
- Bayesian Methods and Mixture Models
- Text and Document Classification Technologies
- Anomaly Detection Techniques and Applications
- Chaos-based Image/Signal Encryption
- Human Pose and Action Recognition
- Ferroelectric and Negative Capacitance Devices
- Advanced Image Processing Techniques
- COVID-19 Diagnosis Using AI
- Cell Image Analysis Techniques
Renmin University of China
2021-2025
Beijing Institute of Big Data Research
2022-2025
Tsinghua University
2015-2023
Robert Bosch (Taiwan)
2020
Generative Adversarial Nets (GANs) have shown promise in image generation and semi-supervised learning (SSL). However, existing GANs in SSL have two problems: (1) the generator and the discriminator (i.e., the classifier) may not be optimal at the same time; and (2) the generator cannot control the semantics of the generated samples. The problems essentially arise from the two-player formulation, where a single discriminator shares incompatible roles of identifying fake samples and predicting labels, and it only estimates the data without considering the labels. To address...
Diffusion probabilistic models (DPMs) are emerging powerful generative models. Despite their high-quality generation performance, DPMs still suffer from slow sampling, as they generally need hundreds or thousands of sequential function evaluations (steps) of large neural networks to draw a sample. Sampling from DPMs can alternatively be viewed as solving the corresponding diffusion ordinary differential equations (ODEs). In this work, we propose an exact formulation of the solution of diffusion ODEs. The formulation analytically computes...
Score distillation sampling (SDS) has shown great promise in text-to-3D generation by distilling pretrained large-scale text-to-image diffusion models, but suffers from over-saturation, over-smoothing, and low-diversity problems. In this work, we propose to model the 3D parameter as a random variable instead of a constant as in SDS, and present variational score distillation (VSD), a principled particle-based variational framework to explain and address the aforementioned issues in text-to-3D generation. We show that SDS is a special case of VSD and leads to poor samples...
Diffusion probabilistic models (DPMs) have achieved impressive success in high-resolution image synthesis, especially in recent large-scale text-to-image generation applications. An essential technique for improving the sample quality of DPMs is guided sampling, which usually needs a large guidance scale to obtain the best quality. The commonly used fast sampler for guided sampling is DDIM, a first-order diffusion ODE solver that generally needs 100 to 250 steps for high-quality samples. Although recent works propose dedicated...
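The guidance-scale mechanism referenced above is simple to state in code. This is a hedged sketch of classifier-free guidance with hypothetical toy predictors standing in for a trained DPM; the combination rule is standard, but these functions and shapes are assumptions.

```python
import numpy as np

# Classifier-free guidance blends an unconditional and a conditional noise
# prediction with a guidance scale s; s > 1 strengthens the conditioning.
# The two "models" below are hypothetical toy functions, not a real DPM.

def eps_uncond(x_t):
    return 0.1 * x_t

def eps_cond(x_t, y):
    return 0.1 * x_t + 0.05 * y

def guided_eps(x_t, y, scale):
    # eps_guided = eps_uncond + scale * (eps_cond - eps_uncond)
    eu = eps_uncond(x_t)
    ec = eps_cond(x_t, y)
    return eu + scale * (ec - eu)

x = np.ones(4)
y = np.full(4, 2.0)
out = guided_eps(x, y, scale=7.5)   # a large scale, typical for text-to-image
```

At scale 1 the rule reduces to the conditional prediction and at scale 0 to the unconditional one; large scales extrapolate beyond the conditional model, which is what makes high-guidance sampling numerically harder for fast solvers.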
Vision transformers (ViT) have shown promise in various vision tasks, while the U-Net based on a convolutional neural network (CNN) remains dominant in diffusion models. We design a simple and general ViT-based architecture (named U-ViT) for image generation with diffusion models. U-ViT is characterized by treating all inputs, including the time, condition, and noisy image patches, as tokens, and by employing long skip connections between shallow and deep layers. We evaluate U-ViT on unconditional and class-conditional image generation, as well as text-to-image generation tasks,...
Diffusion probabilistic models (DPMs) represent a class of powerful generative models. Despite their success, the inference of DPMs is expensive, since it generally needs to iterate over thousands of timesteps. A key problem in the inference is to estimate the variance in each timestep of the reverse process. In this work, we present a surprising result that both the optimal reverse variance and the corresponding optimal KL divergence of a DPM have analytic forms w.r.t. its score function. Building upon it, we propose Analytic-DPM, a training-free inference framework that estimates the analytic forms using...
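The analytic forms above reduce variance estimation to expectations of the squared score norm. As a toy illustration of estimating such a quantity by Monte Carlo, the sketch below uses a standard Gaussian, whose score is known in closed form and whose per-dimension expected squared score norm equals 1; in the paper this expectation would instead be estimated with a pretrained score network.

```python
import numpy as np

# Monte Carlo estimate of E_q[ ||score(x)||^2 / d ] for the toy choice
# q = N(0, I_d), where score(x) = -x and the expectation equals 1 exactly.
# This is a hypothetical stand-in for estimating the same expectation with
# a trained score model, as a training-free inference framework would.

def mean_sq_score_norm(d=16, n=20000, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n, d))
    score = -x                          # closed-form score of a standard Gaussian
    return float(np.mean(np.sum(score**2, axis=1)) / d)
```

Because the estimate only queries the score function on samples, no retraining of the model is needed, which is the sense in which such variance estimates are "training-free".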
Automatically writing stylized characters is an attractive yet challenging task, especially for Chinese characters with complex shapes and structures. Most current methods are restricted to generating characters already present in the training set, and are required to retrain the model when generating characters of new styles. In this paper, we develop a novel framework, the Style-Aware Variational Auto-Encoder (SA-VAE), which disentangles the content-relevant and style-relevant components of a Chinese character feature with an intercross pair-wise optimization method....
Continual learning usually assumes the incoming data are fully labeled, which might not be applicable in real applications. In this work, we consider semi-supervised continual learning (SSCL), which incrementally learns from partially labeled data. Observing that existing continual learning methods lack the ability to continually exploit unlabeled data, we propose deep Online Replay with Discriminator Consistency (ORDisCo) to interdependently learn a classifier with a conditional generative adversarial network (GAN), which continually passes the learned...
Large vision-language models (VLMs) such as GPT-4 have achieved unprecedented performance in response generation, especially with visual inputs, enabling more creative and adaptable interaction than large language models such as ChatGPT. Nonetheless, multimodal generation exacerbates safety concerns, since adversaries may successfully evade the entire system by subtly manipulating the most vulnerable modality (e.g., vision). To this end, we propose evaluating the robustness of open-source VLMs in the most realistic and high-risk...
We propose a unified game-theoretical framework to perform classification and conditional image generation given limited supervision. It is formulated as a three-player minimax game consisting of a generator, a classifier, and a discriminator, and is therefore referred to as the Triple Generative Adversarial Network (Triple-GAN). The generator and the classifier characterize the conditional distributions between images and labels to perform conditional generation and classification, respectively. The discriminator solely focuses on identifying fake image-label pairs. Theoretically, the three-player formulation...
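The three-player game can be sketched as a single minimax utility in which the discriminator D scores image-label pairs, while the classifier C and the generator G each produce fake pairs. The form below is a sketch up to the paper's exact regularizers and notation; in particular, treating alpha as a mixing weight between the two sources of fake pairs is an assumption of this sketch.

```latex
\min_{C,G}\;\max_{D}\quad
\mathbb{E}_{(x,y)\sim p(x,y)}\big[\log D(x,y)\big]
\;+\;\alpha\,\mathbb{E}_{(x,y)\sim p_C(x,y)}\big[\log\big(1-D(x,y)\big)\big]
\;+\;(1-\alpha)\,\mathbb{E}_{(x,y)\sim p_G(x,y)}\big[\log\big(1-D(x,y)\big)\big]
```

The key structural point is that D only ever judges pairs, so C and G are not forced into the incompatible dual role that a single two-player discriminator carries.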
Continual learning needs to overcome catastrophic forgetting of the past. Memory replay of representative old training samples has been shown to be an effective solution and achieves state-of-the-art (SOTA) performance. However, existing work is mainly built on a small memory buffer containing a few original data points, which cannot fully characterize the data distribution. In this work, we propose memory replay with data compression (MRDC) to reduce the storage cost of old training samples and thus increase the amount that can be stored in the memory buffer. Observing...
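The storage argument is easy to demonstrate: under a fixed byte budget, a buffer of compressed samples holds more items than a buffer of raw ones. This is a minimal hypothetical sketch using lossless zlib on uint8 "images"; a realistic setting (e.g., lossy codecs and a quality/quantity trade-off) differs.

```python
import zlib
import numpy as np

# Replay buffer that stores compressed samples under a fixed byte budget.
# Hypothetical sketch: lossless zlib compression of uint8 arrays.

class CompressedBuffer:
    def __init__(self, budget_bytes):
        self.budget = budget_bytes
        self.items = []          # (compressed bytes, shape, dtype) per sample
        self.used = 0

    def add(self, arr):
        blob = zlib.compress(arr.tobytes())
        if self.used + len(blob) > self.budget:
            return False         # budget exhausted
        self.items.append((blob, arr.shape, arr.dtype))
        self.used += len(blob)
        return True

    def get(self, i):
        blob, shape, dtype = self.items[i]
        return np.frombuffer(zlib.decompress(blob), dtype=dtype).reshape(shape)

buf = CompressedBuffer(budget_bytes=4096)
img = np.zeros((32, 32), dtype=np.uint8)   # a highly compressible toy sample
buf.add(img)
assert np.array_equal(buf.get(0), img)     # lossless round trip
```

For this toy all-zero image, each compressed entry is far smaller than the 1024 raw bytes, so the 4 KiB budget stores many more samples than the 4 it could hold uncompressed.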
Autoregressive models (ARMs) are widely regarded as the cornerstone of large language models (LLMs). We challenge this notion by introducing LLaDA, a diffusion model trained from scratch under the pre-training and supervised fine-tuning (SFT) paradigm. LLaDA models distributions through a forward data masking process and a reverse process, parameterized by a vanilla Transformer to predict masked tokens. By optimizing a likelihood bound, it provides a principled generative approach for probabilistic inference. Across extensive...
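The forward data masking process described above can be sketched in a few lines: each token is independently replaced by a mask token with some probability t, and a Transformer is trained to predict the masked positions. The mask id and token values below are hypothetical.

```python
import numpy as np

# Masked-diffusion forward process: each token is independently replaced by a
# [MASK] token with probability t (with t drawn uniformly during training).
# The mask id and the toy vocabulary are assumptions of this sketch.

MASK_ID = 0

def forward_mask(tokens, t, rng):
    tokens = np.asarray(tokens)
    mask = rng.random(tokens.shape) < t          # which positions to corrupt
    noised = np.where(mask, MASK_ID, tokens)     # replace them with [MASK]
    return noised, mask

rng = np.random.default_rng(0)
noised, mask = forward_mask([5, 7, 9, 11], t=0.5, rng=rng)
# Unmasked positions keep their original tokens; masked ones become MASK_ID.
```

The reverse process then amounts to repeatedly predicting and filling in masked tokens, and the training loss on masked positions yields the likelihood bound mentioned in the abstract.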
Deep generative models (DGMs) can effectively capture the underlying distributions of complex data by learning multilayered representations and performing inference. However, relatively little has been done to boost the discriminative ability of DGMs. This paper presents max-margin deep generative models (mmDGMs) and a class-conditional variant (mmDCGMs), which explore the strongly discriminative max-margin principle to improve the predictive performance of DGMs in both supervised and semi-supervised learning, while retaining the generative capability. In semi-supervised learning, we use the predictions...
Deep neural networks have shown promise in collaborative filtering (CF). However, existing approaches are either user-based or item-based, and cannot leverage all the underlying information explicitly. We propose CF-UIcA, a neural co-autoregressive model for CF tasks, which exploits the structural correlation in the domains of both users and items. The co-autoregression allows extra desired properties to be incorporated for different tasks. Furthermore, we develop an efficient stochastic learning algorithm to handle...
Memory units have been widely used to enrich the capabilities of deep networks in capturing long-term dependencies in reasoning and prediction tasks, but little investigation exists on deep generative models (DGMs), which are good at inferring high-level invariant representations from unlabeled data. This paper presents a deep generative model with a possibly large external memory and an attention mechanism to capture the local detail information that is often lost in the bottom-up abstraction process of representation learning. By...
For the first time, this paper develops a novel stochastic computing method by utilizing the inherent random noise of analog RRAM. With the designed switching characteristics, an RRAM device can realize the function of sampling from a tunable probabilistic distribution. A Bayesian neural network (BayNN), whose weights are represented by probability distributions, is experimentally demonstrated on a fabricated 160K RRAM array. The measured result achieves 97% accuracy for image classification on the MNIST dataset. Moreover,...
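To illustrate what "weights represented by probability distributions" means computationally, here is a toy software sketch of a single-layer Bayesian network: each forward pass draws a fresh weight sample and the prediction averages over many such passes. Shapes, Gaussian weight distributions, and the tanh layer are assumptions of this sketch; in the paper the per-pass sampling is done physically by the random noise of analog RRAM cells.

```python
import numpy as np

# Toy Bayesian neural network prediction: each weight is a distribution
# (here Gaussian with mean w_mu and std w_sigma, an assumption), and the
# predictive output is the mean over sampled weight realizations.

def bayesian_predict(x, w_mu, w_sigma, n_samples=256, seed=0):
    rng = np.random.default_rng(seed)
    outs = []
    for _ in range(n_samples):
        w = rng.normal(w_mu, w_sigma)   # one weight sample per forward pass
        outs.append(np.tanh(x @ w))     # a toy single nonlinear layer
    return np.mean(outs, axis=0)        # predictive mean over weight samples

x = np.ones((1, 4))
w_mu = np.zeros((4, 2))
w_sigma = np.full((4, 2), 0.1)
pred = bayesian_predict(x, w_mu, w_sigma)
```

The spread of the per-sample outputs also gives an uncertainty estimate for free, which is the usual motivation for Bayesian weights over point estimates.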
Deep generative models (DGMs) are effective at learning multilayered representations of complex data and performing inference on input data by exploring the generative ability. However, little work has been done on examining or empowering the discriminative ability of DGMs for making accurate predictions. This paper presents max-margin deep generative models (mmDGMs), which explore the strongly discriminative max-margin principle to improve the predictive power of DGMs, while retaining the generative capability. We develop an efficient doubly stochastic subgradient algorithm for the piecewise linear...
We present Mixture of Contrastive Experts (MiCE), a unified probabilistic clustering framework that simultaneously exploits the discriminative representations learned by contrastive learning and the semantic structures captured by a latent mixture model. Motivated by the mixture of experts, MiCE employs a gating function to partition an unlabeled dataset into subsets according to semantics, and multiple experts to discriminate distinct subsets of instances assigned to them in a contrastive manner. To solve the nontrivial inference and learning problems caused by the latent variables, we...