- Generative Adversarial Networks and Image Synthesis
- Multimodal Machine Learning Applications
- Face Recognition and Analysis
- Advanced Vision and Imaging
- Advanced Image Processing Techniques
- Domain Adaptation and Few-Shot Learning
- Advanced Neuroimaging Techniques and Applications
- Video Analysis and Summarization
- Advanced Image and Video Retrieval Techniques
- Facial Nerve Paralysis Treatment and Research
- Video Coding and Compression Technologies
- Human Pose and Action Recognition
- Image Processing Techniques and Applications
- Data Management and Algorithms
- Model Reduction and Neural Networks
- Aesthetic Perception and Analysis
- Handwritten Text Recognition Techniques
- Visual Attention and Saliency Detection
- Vehicle License Plate Recognition
- Cell Image Analysis Techniques
- Image and Signal Denoising Methods
- Data Visualization and Analytics
- Computer Graphics and Visualization Techniques
- Advanced Data Processing Techniques
- Image Retrieval and Classification Techniques
University of California, Santa Barbara
2025
Nanyang Technological University
2021-2024
Southern University of Science and Technology
2022
Wuhan University of Technology
2022
The Ohio State University
2015
Diffusion models have recently arisen as a powerful generative tool. Despite the great progress, existing diffusion models mainly focus on uni-modal control, i.e., the generation process is driven by only one modality of condition. To further unleash users' creativity, it is desirable for the model to be controllable by multiple modalities simultaneously, e.g., generating and editing faces by describing the age (text-driven) while drawing the face shape (mask-driven). In this work, we present Collaborative Diffusion, where pre-trained...
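The multi-modal collaboration described above can be illustrated with a toy sketch: two uni-modal diffusion models each predict a noise estimate, and the estimates are fused with spatially varying influence weights (the paper learns these with a "dynamic diffuser"; here the weights are simply given and softmax-normalized per pixel, which is an illustrative assumption, not the paper's implementation).

```python
import numpy as np

def collaborate(eps_text, eps_mask, w_text, w_mask):
    """Toy fusion of two uni-modal diffusion predictions.

    eps_text / eps_mask: noise estimates from a text-driven and a
    mask-driven model. w_text / w_mask: per-pixel influence logits
    (in the paper, predicted by a learned dynamic diffuser).
    """
    w = np.stack([w_text, w_mask])
    w = np.exp(w) / np.exp(w).sum(axis=0, keepdims=True)  # per-pixel softmax
    # weighted combination of the two predictions at every pixel
    return w[0] * eps_text + w[1] * eps_mask
```

With equal logits the fusion reduces to a simple average, so each modality contributes half of the final denoising direction.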
Facial editing is an important task in vision and graphics with numerous applications. However, existing works are incapable of delivering a continuous and fine-grained editing mode (e.g., editing from a slightly smiling face to a big laughing one) with natural interactions with users. In this work, we propose Talk-to-Edit, an interactive facial editing framework that performs attribute manipulation through dialog between the user and the system. Our key insight is to model a continual "semantic field" in the GAN latent space. 1) Unlike previous works that regard editing as traversing...
This work aims to learn a high-quality text-to-video (T2V) generative model by leveraging a pre-trained text-to-image (T2I) model as the basis. It is a highly desirable yet challenging task to simultaneously a) accomplish the synthesis of visually realistic and temporally coherent videos while b) preserving the strong creative generation nature of the pre-trained T2I model. To this end, we propose LaVie, an integrated video generation framework that operates on cascaded latent diffusion models, comprising a base T2V model, a temporal...
Diffusion models are gaining increasing popularity for their generative capabilities. Recently, there have been surging needs to generate customized images by inverting diffusion models from exemplar images, and existing inversion methods mainly focus on capturing object appearances (i.e., the "look"). However, how to invert relations, another important pillar in the visual world, remains unexplored. In this work, we propose the Relation Inversion task, which aims to learn a specific relation (represented as a "relation...
We present Vchitect-2.0, a parallel transformer architecture designed to scale up video diffusion models for large-scale text-to-video generation. The overall Vchitect-2.0 system has several key designs. (1) By introducing a novel Multimodal Diffusion Block, our approach achieves consistent alignment between text descriptions and generated video frames, while maintaining temporal coherence across sequences. (2) To overcome memory and computational bottlenecks, we propose a Memory-efficient Training...
The rising prevalence of mental health issues highlights the urgent need for accurate, scalable, and timely prediction systems. Deep learning, a subset of machine learning inspired by the structure of human neurons, has offered an opportunity for innovative solutions in diagnosis. The main idea of this paper is to analyze the application of deep learning in diagnosing mental disorders, including but not limited to Alzheimer's, Parkinson's, and Schizophrenia. An enormous number of techniques will be put into real life while dealing with diagnosis...
Multi-modality magnetic resonance (MR) images provide complementary information for disease diagnoses. However, modality missing is quite common in real-life clinical practice. Current methods usually employ a convolution-based generative adversarial network (GAN) or its variants to synthesize the missing modality. With the development of the vision transformer, we explore its application to the MRI synthesis task in this work. We propose a novel supervised deep learning method for synthesizing a missing modality, making use...
In this paper, we uncover the untapped potential of the diffusion U-Net, which serves as a "free lunch" that substantially improves generation quality on the fly. We initially investigate the key contributions of the U-Net architecture to the denoising process and identify that its main backbone primarily contributes to denoising, whereas its skip connections mainly introduce high-frequency features into the decoder module, causing the network to overlook the backbone semantics. Capitalizing on this discovery, we propose a simple yet effective method, termed...
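The backbone/skip observation above can be sketched in a few lines: amplify the (semantic, low-frequency) backbone features and damp the high-frequency content of the skip features before they are fused in the decoder. The scaling factors `b`/`s` and the hard radial low-pass mask below are illustrative assumptions for a single 2-D feature map, not the paper's exact implementation.

```python
import numpy as np

def reweight_unet_features(backbone_feat, skip_feat, b=1.2, s=0.8):
    """Sketch of re-weighting U-Net decoder inputs (assumed factors b, s).

    backbone_feat: features from the main backbone path (scaled up by b).
    skip_feat: features from a skip connection; their high-frequency
    Fourier components are attenuated by s via a crude radial mask.
    """
    boosted = backbone_feat * b  # amplify semantic backbone features
    # attenuate high frequencies of the skip features in Fourier space
    f = np.fft.fftshift(np.fft.fft2(skip_feat))
    h, w = skip_feat.shape
    yy, xx = np.mgrid[0:h, 0:w]
    radius = np.hypot(yy - h / 2, xx - w / 2)
    mask = np.where(radius < min(h, w) // 4, 1.0, s)  # keep lows, damp highs
    filtered = np.real(np.fft.ifft2(np.fft.ifftshift(f * mask)))
    # the decoder would normally concatenate the two feature maps
    return np.stack([boosted, filtered])
```

Because the intervention only rescales existing features at inference time, it requires no retraining, which is what makes the "free lunch" framing apt.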
Though diffusion-based video generation has witnessed rapid progress, the inference results of existing models still exhibit unsatisfactory temporal consistency and unnatural dynamics. In this paper, we delve deep into the noise initialization of video diffusion models, and discover an implicit training-inference gap that contributes to the unsatisfactory quality. Our key findings are: 1) the spatial-temporal frequency distribution of the initial latent at inference is intrinsically different from that at training, and 2) the denoising process is significantly...
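The frequency-domain gap described above suggests a simple remedy that can be sketched as follows: keep the low spatio-temporal frequencies of a latent derived from a first denoising pass, replace its high frequencies with fresh Gaussian noise, and sample again. The function name, the hard spherical cutoff, and the `cutoff` value are illustrative assumptions, not the authors' exact procedure.

```python
import numpy as np

def reinit_latent(z, rng, cutoff=0.25):
    """Sketch of frequency-mixed noise re-initialization (assumptions noted).

    z: a spatio-temporal latent (e.g., frames x height x width) obtained
    from a first sampling pass. Low frequencies of z are kept; high
    frequencies are swapped for fresh Gaussian noise.
    """
    noise = rng.standard_normal(z.shape)
    Fz = np.fft.fftn(z)
    Fn = np.fft.fftn(noise)
    # low-pass mask over all axes: normalized frequency radius < cutoff
    freqs = np.meshgrid(*[np.fft.fftfreq(n) for n in z.shape], indexing="ij")
    radius = np.sqrt(sum(f**2 for f in freqs))
    lowpass = (radius < cutoff).astype(float)
    mixed = Fz * lowpass + Fn * (1.0 - lowpass)  # lows from z, highs from noise
    return np.real(np.fft.ifftn(mixed))
```

The mixed latent preserves coarse layout and motion (low frequencies) while restoring the high-frequency statistics the model saw during training.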
Video generation has witnessed significant advancements, yet evaluating these models remains a challenge. A comprehensive evaluation benchmark for video generation is indispensable for two reasons: 1) Existing metrics do not fully align with human perceptions; 2) An ideal evaluation system should provide insights to inform future developments of video generation. To this end, we present VBench, a benchmark suite that dissects "video generation quality" into specific, hierarchical, and disentangled dimensions, each with tailored prompts and evaluation methods...
Recent advancements in visual generative models have enabled high-quality image and video generation, opening up diverse applications. However, evaluating these models often demands sampling hundreds or thousands of images or videos, making the process computationally expensive, especially for diffusion-based models with inherently slow sampling. Moreover, existing evaluation methods rely on rigid pipelines that overlook specific user needs and provide numerical results without clear explanations. In contrast,...
The deep learning community has made rapid progress in low-level visual perception tasks such as object localization, detection, and segmentation. However, for tasks such as Visual Question Answering (VQA) and language grounding that require high-level reasoning abilities, huge gaps still exist between artificial systems and human intelligence. In this work, we perform a diagnostic study on recent popular VQA models in terms of analogical reasoning. We term it Analogical VQA, where a system needs to reason over a group of images to find...