Yuchao Gu

ORCID: 0009-0007-7167-8766
Research Areas
  • Generative Adversarial Networks and Image Synthesis
  • Advanced Vision and Imaging
  • Multimodal Machine Learning Applications
  • Advanced Image and Video Retrieval Techniques
  • Computer Graphics and Visualization Techniques
  • Image Retrieval and Classification Techniques
  • Face recognition and analysis
  • 3D Shape Modeling and Analysis
  • Video Analysis and Summarization
  • Domain Adaptation and Few-Shot Learning
  • Scientific Research and Discoveries
  • Music and Audio Processing
  • Advanced Image Processing Techniques
  • Digital Humanities and Scholarship
  • 3D Surveying and Cultural Heritage
  • E-commerce and Technology Innovations
  • Facial Nerve Paralysis Treatment and Research
  • Remote Sensing and LiDAR Applications
  • Computational Physics and Python Applications
  • Dark Matter and Cosmic Phenomena
  • Advanced Neural Network Applications

National University of Singapore
2024

Nankai University
2022

To replicate the success of text-to-image (T2I) generation, recent works employ large-scale video datasets to train a text-to-video (T2V) generator. Despite their promising results, such a paradigm is computationally expensive. In this work, we propose a new T2V generation setting—One-Shot Video Tuning, where only one text-video pair is presented. Our model is built on state-of-the-art T2I diffusion models pre-trained on massive image data. We make two key observations: 1) T2I models can generate still images that...
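
As a rough, hypothetical sketch of the one-shot tuning idea described above (a placeholder denoiser fine-tuned on a single text-video pair with the standard noise-prediction loss, not the paper's actual inflated T2I architecture):

import torch
import torch.nn as nn

class VideoDenoiser(nn.Module):
    """Hypothetical stand-in for a T2I UNet inflated to take video latents."""
    def __init__(self, channels=4, text_dim=768):
        super().__init__()
        self.net = nn.Conv3d(channels, channels, kernel_size=3, padding=1)
        self.text_proj = nn.Linear(text_dim, channels)

    def forward(self, noisy_latents, text_emb):
        # Broadcast a projected text embedding over (frames, height, width).
        cond = self.text_proj(text_emb)[:, :, None, None, None]
        return self.net(noisy_latents + cond)

# The single text-video pair: latents of shape (batch, channels, frames, H, W).
video_latents = torch.randn(1, 4, 8, 32, 32)
text_emb = torch.randn(1, 768)

model = VideoDenoiser()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

for step in range(100):
    noise = torch.randn_like(video_latents)
    t = torch.randint(1, 1000, (1,)).float()
    alpha = (1.0 - t / 1000.0).view(1, 1, 1, 1, 1)                 # toy noise schedule
    noisy = alpha.sqrt() * video_latents + (1.0 - alpha).sqrt() * noise
    loss = nn.functional.mse_loss(model(noisy, text_emb), noise)   # epsilon-prediction loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()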

10.1109/iccv51070.2023.00701 article EN 2023 IEEE/CVF International Conference on Computer Vision (ICCV) 2023-10-01

Public large-scale text-to-image diffusion models, such as Stable Diffusion, have gained significant attention from the community. These models can be easily customized for new concepts using low-rank adaptations (LoRAs). However, the utilization of multiple concept LoRAs to jointly support multiple customized concepts presents a challenge. We refer to this scenario as decentralized multi-concept customization, which involves single-client concept tuning and center-node concept fusion. In this paper, we propose a new framework called Mix-of-Show that...
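
A minimal sketch of combining several concept LoRAs by naive weighted merging (a simplification for illustration, not the center-node fusion method Mix-of-Show actually proposes); the layer size and rank below are hypothetical:

import torch

def merge_loras(base_weight, loras, weights=None):
    """Naively fold several low-rank (A, B) updates into one base weight.

    base_weight: (out_dim, in_dim) tensor of a frozen pretrained layer.
    loras: list of (A, B) pairs with A: (out_dim, r), B: (r, in_dim).
    weights: optional per-concept blending coefficients.
    """
    if weights is None:
        weights = [1.0] * len(loras)
    merged = base_weight.clone()
    for (A, B), w in zip(loras, weights):
        merged += w * (A @ B)   # each concept LoRA contributes a rank-r delta
    return merged

# Hypothetical example: one attention projection, two concept LoRAs of rank 4.
base = torch.randn(320, 320)
lora_concept1 = (torch.randn(320, 4) * 0.01, torch.randn(4, 320) * 0.01)
lora_concept2 = (torch.randn(320, 4) * 0.01, torch.randn(4, 320) * 0.01)
fused = merge_loras(base, [lora_concept1, lora_concept2], weights=[0.5, 0.5])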

10.48550/arxiv.2305.18292 preprint EN other-oa arXiv (Cornell University) 2023-01-01

Current deep networks are very data-hungry and benefit from training on large-scale datasets, which are often time-consuming to collect and annotate. By contrast, synthetic data can be generated infinitely using generative models such as DALL-E and diffusion models, with minimal effort and cost. In this paper, we present DatasetDM, a generic dataset generation model that can produce diverse synthetic images and the corresponding high-quality perception annotations (e.g., segmentation masks and depth). Our method builds upon...
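
A hedged sketch of the general recipe implied above: sample a synthetic image from a text-to-image generator and decode a perception annotation from the generator's intermediate features. Both generate_image_and_features and PerceptionDecoder are hypothetical placeholders, not the DatasetDM code:

import torch
import torch.nn as nn

class PerceptionDecoder(nn.Module):
    """Hypothetical decoder mapping diffusion features to a segmentation mask."""
    def __init__(self, feat_dim=64, num_classes=21):
        super().__init__()
        self.head = nn.Conv2d(feat_dim, num_classes, kernel_size=1)

    def forward(self, features):
        return self.head(features)              # per-pixel class logits

def generate_image_and_features(prompt):
    """Placeholder for sampling an image plus intermediate UNet features
    from a text-to-image diffusion model."""
    image = torch.rand(3, 256, 256)             # synthetic RGB image
    features = torch.rand(64, 64, 64)           # intermediate features (C, H, W)
    return image, features

decoder = PerceptionDecoder()
dataset = []
for prompt in ["a dog on grass", "a car on a street"]:
    image, feats = generate_image_and_features(prompt)
    mask = decoder(feats.unsqueeze(0)).argmax(dim=1)[0]   # pseudo ground-truth mask
    dataset.append((image, mask))               # paired image + annotation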

10.48550/arxiv.2308.06160 preprint EN cc-by arXiv (Cornell University) 2023-01-01

10.1109/cvpr52733.2024.00729 article EN 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024-06-16

To replicate the success of text-to-image (T2I) generation, recent works employ large-scale video datasets to train a text-to-video (T2V) generator. Despite their promising results, such a paradigm is computationally expensive. In this work, we propose a new T2V generation setting—One-Shot Video Tuning, where only one text-video pair is presented. Our model is built on state-of-the-art T2I diffusion models pre-trained on massive image data. We make two key observations: 1) T2I models can generate...

10.48550/arxiv.2212.11565 preprint EN other-oa arXiv (Cornell University) 2022-01-01

Recent advancements in generation models have showcased remarkable capabilities in generating fantastic content. However, most of them are trained on proprietary high-quality data, and some models withhold their parameters and only provide accessible application programming interfaces (APIs), limiting their benefits for downstream tasks. To explore the feasibility of training a text-to-image generation model comparable to advanced models using publicly available resources, we introduce EvolveDirector. This framework interacts with...
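
A rough, purely illustrative sketch of the kind of interaction loop this describes (query an advanced model's API, keep only informative samples, train an open model on the growing set); every function below is a hypothetical placeholder rather than the EvolveDirector implementation:

def query_t2i_api(prompt):
    """Placeholder: request an image for `prompt` from a closed-source T2I API."""
    return {"prompt": prompt, "image": f"<image generated for: {prompt}>"}

def is_informative(sample, training_set):
    """Placeholder selection rule; a vision-language model could score novelty
    or quality here. This toy version just avoids duplicate prompts."""
    return sample["prompt"] not in {s["prompt"] for s in training_set}

def train_student(training_set):
    """Placeholder for one round of fine-tuning the open student model."""
    print(f"training on {len(training_set)} API-generated samples")

training_set = []
prompts = ["a red bicycle", "a castle at dusk", "a red bicycle"]
for prompt in prompts:
    sample = query_t2i_api(prompt)
    if is_informative(sample, training_set):
        training_set.append(sample)
train_student(training_set)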

10.48550/arxiv.2410.07133 preprint EN arXiv (Cornell University) 2024-10-09

Natural language often struggles to accurately associate positional and attribute information with multiple instances, which limits current text-based visual generation models to simpler compositions featuring only a few dominant instances. To address this limitation, this work enhances diffusion models by introducing regional instance control, where each instance is governed by a bounding box paired with a free-form caption. Previous methods in this area typically rely on implicit position encoding or explicit attention masks...
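
The input format this setting implies can be written down as a small data structure, sketched below with hypothetical field names: a global caption plus a list of instances, each pairing a bounding box with a free-form caption:

from dataclasses import dataclass

@dataclass
class RegionalInstance:
    bbox: tuple            # (x_min, y_min, x_max, y_max) in normalized [0, 1] coords
    caption: str           # free-form description of this instance

@dataclass
class RegionalPrompt:
    global_caption: str    # describes the whole scene
    instances: list        # list of RegionalInstance

layout = RegionalPrompt(
    global_caption="two dogs playing in a park",
    instances=[
        RegionalInstance(bbox=(0.05, 0.40, 0.45, 0.95), caption="a golden retriever"),
        RegionalInstance(bbox=(0.55, 0.35, 0.95, 0.90), caption="a black poodle"),
    ],
)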

10.48550/arxiv.2411.17949 preprint EN arXiv (Cornell University) 2024-11-26

Although generative facial prior and geometric prior have recently demonstrated high-quality results for blind face restoration, producing fine-grained facial details faithful to inputs remains a challenging problem. Motivated by the classical dictionary-based methods and the recent vector quantization (VQ) technique, we propose a VQ-based face restoration method - VQFR. VQFR takes advantage of high-quality low-level feature banks extracted from high-quality faces and can thus help recover realistic facial details. However, the simple application of the VQ codebook...
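
For background, the core VQ operation referenced above is a nearest-neighbor codebook lookup; the sketch below shows that generic operation (not VQFR's specific codebook or decoder), with hypothetical sizes:

import torch

def vector_quantize(features, codebook):
    """Replace each feature vector with its nearest codebook entry.

    features: (N, D) continuous feature vectors.
    codebook: (K, D) learned code vectors.
    Returns quantized features (N, D) and the chosen indices (N,).
    """
    dists = torch.cdist(features, codebook)    # (N, K) pairwise distances
    indices = dists.argmin(dim=1)              # nearest code per feature
    return codebook[indices], indices

features = torch.randn(16, 256)    # e.g. flattened spatial features of a face
codebook = torch.randn(1024, 256)  # e.g. a 1024-entry learned codebook
quantized, codes = vector_quantize(features, codebook)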

10.48550/arxiv.2205.06803 preprint EN other-oa arXiv (Cornell University) 2022-01-01

Vector-Quantized (VQ-based) generative models usually consist of two basic components, i.e., VQ tokenizers and generative transformers. Prior research focuses on improving the reconstruction fidelity of VQ tokenizers but rarely examines how the improvement in reconstruction affects the generation ability of generative transformers. In this paper, we surprisingly find that improving the reconstruction fidelity of VQ tokenizers does not necessarily improve the generation. Instead, learning to compress semantic features within VQ tokenizers significantly improves generative transformers' ability to capture textures and structures. We thus highlight two competing objectives...
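
To make the two-stage setup concrete, here is a toy sketch with hypothetical sizes: a VQ tokenizer (assumed to run elsewhere) yields discrete token indices, and a transformer then models that index sequence; the causal mask needed for proper autoregressive training is omitted for brevity:

import torch
import torch.nn as nn

vocab_size, seq_len = 1024, 16 * 16     # codebook size and number of image tokens

# Stage 1 (assumed done elsewhere): a VQ tokenizer turns an image into token indices.
image_tokens = torch.randint(0, vocab_size, (1, seq_len))

# Stage 2: a transformer models the token sequence and predicts the next token.
embedding = nn.Embedding(vocab_size, 512)
encoder_layer = nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True)
transformer = nn.TransformerEncoder(encoder_layer, num_layers=2)
to_logits = nn.Linear(512, vocab_size)

hidden = transformer(embedding(image_tokens))          # (1, seq_len, 512)
logits = to_logits(hidden)                             # per-position token logits
loss = nn.functional.cross_entropy(
    logits[:, :-1].reshape(-1, vocab_size),            # predict token t+1 from position t
    image_tokens[:, 1:].reshape(-1),
)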

10.48550/arxiv.2212.03185 preprint EN other-oa arXiv (Cornell University) 2022-01-01

Recent diffusion-based image editing approaches have exhibited impressive editing capabilities in images with simple compositions. However, localized editing in complex scenarios has not been well-studied in the literature, despite its growing real-world demands. Existing mask-based inpainting methods fall short of retaining the underlying structure within the edit region. Meanwhile, mask-free attention-based methods often exhibit editing leakage and misalignment in more complex compositions. In this work, we develop MAG-Edit, a training-free, inference-stage...

10.48550/arxiv.2312.11396 preprint EN cc-by-nc-nd arXiv (Cornell University) 2023-01-01

Recent advances in generative AI have significantly enhanced image and video editing, particularly in the context of text prompt control. State-of-the-art approaches predominantly rely on diffusion models to accomplish these tasks. However, the computational demands of diffusion-based methods are substantial, often necessitating large-scale paired datasets for training, therefore challenging the deployment in real applications. To address these issues, this paper breaks down the text-based editing task into two...

10.48550/arxiv.2312.12468 preprint EN other-oa arXiv (Cornell University) 2023-01-01

The low-threshold experiment SENSEI, which uses the ultralow-noise silicon Skipper-CCD to explore light dark matter from the halo, has achieved the most rigorous limitations on the DM-electron scattering cross section. In this work, we investigate the inelastic scattering process with SENSEI data and derive constraints on the model with a $U(1)$ gauge boson as the mediator. Comparing with the elastic process, we find that down-scattering with mass splitting $\delta \equiv m_{\chi_2} - m_{\chi_1} < 0$ is more strongly constrained, while up-scattering with $\delta > 0$ gets...
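
For context on why the sign of the mass splitting matters, the standard kinematic relation for inelastic DM-electron scattering (written here from energy conservation, not taken from this paper) is $\Delta E_e = \vec{q}\cdot\vec{v} - \frac{q^2}{2m_{\chi_1}} - \delta$, with momentum transfer $\vec{q}$ and incoming halo velocity $\vec{v}$. Down-scattering ($\delta < 0$) releases the splitting energy and stays kinematically accessible even for slow halo particles, whereas up-scattering ($\delta > 0$) must pay for the splitting out of kinetic energy.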

10.48550/arxiv.2203.06664 preprint EN other-oa arXiv (Cornell University) 2022-01-01