Yujie Wei

ORCID: 0009-0003-9304-0609
Research Areas
  • Video Analysis and Summarization
  • Generative Adversarial Networks and Image Synthesis
  • Image and Video Quality Assessment
  • Multimodal Machine Learning Applications
  • Domain Adaptation and Few-Shot Learning
  • Multimedia Communication and Technology
  • Human Pose and Action Recognition
  • Model Reduction and Neural Networks
  • Human Motion and Animation
  • Advanced Graph Neural Networks
  • Topic Modeling
  • Industrial Vision Systems and Defect Detection
  • Machine Learning and ELM
  • Image Enhancement Techniques
  • Speech and Audio Processing
  • Text and Document Classification Technologies
  • Sentiment Analysis and Opinion Mining
  • Advanced Neural Network Applications
  • Emotion and Mood Recognition

Alibaba Group (United States)
2024

Fudan University
2023-2024

Shanghai Center for Brain Science and Brain-Inspired Technology
2023-2024

University of Science and Technology Beijing
2023

University of Jinan
2023

Online continual learning (CL) studies the problem of learning continuously from a single-pass data stream while adapting to new data and mitigating catastrophic forgetting. Recently, by storing a small subset of old data, replay-based methods have shown promising performance. Unlike previous studies that focus on sample storage or knowledge distillation against forgetting, this paper aims to understand why online learning models fail to generalize well from the perspective of shortcut learning. We identify shortcut learning as the key limiting factor for online CL, where...

10.1109/iccv51070.2023.01720 article EN 2023 IEEE/CVF International Conference on Computer Vision (ICCV) 2023-10-01
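The replay mechanism referenced in the abstract above can be made concrete with a small sketch. The buffer below uses reservoir sampling, a common choice for keeping a uniform subset of a single-pass stream in online CL; it is an illustrative stand-in, not the paper's implementation, and the class name is invented for this sketch.

```python
import random

class ReservoirReplayBuffer:
    """Minimal replay buffer for online continual learning.

    Keeps a uniform random subset of the stream via reservoir sampling,
    so every example seen so far has equal probability of being stored.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.buffer = []      # stored (x, y) pairs
        self.num_seen = 0     # total examples observed so far

    def add(self, x, y):
        self.num_seen += 1
        if len(self.buffer) < self.capacity:
            self.buffer.append((x, y))
        else:
            # Replace a random slot with probability capacity / num_seen.
            idx = random.randrange(self.num_seen)
            if idx < self.capacity:
                self.buffer[idx] = (x, y)

    def sample(self, batch_size):
        # Draw stored old data to mix into the current training batch.
        k = min(batch_size, len(self.buffer))
        return random.sample(self.buffer, k)
```

In a typical replay setup, each incoming batch from the stream is interleaved with a batch drawn from this buffer, so gradients mix new and old data.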

10.1109/cvpr52733.2024.00634 article EN 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024-06-16

Visual diffusion models achieve remarkable progress, yet they are typically trained at limited resolutions due to the lack of high-resolution data and constrained computation resources, hampering their ability to generate high-fidelity images or videos at higher resolutions. Recent efforts have explored tuning-free strategies to exploit the untapped potential of pre-trained models for higher-resolution visual generation. However, these methods are still prone to producing low-quality visual content with repetitive patterns....

10.48550/arxiv.2412.09626 preprint EN arXiv (Cornell University) 2024-12-12

Customized generation using diffusion models has made impressive progress in image generation, but remains unsatisfactory in the challenging video generation task, as it requires the controllability of both subjects and motions. To that end, we present DreamVideo, a novel approach to generating personalized videos from a few static images of the desired subject and a few videos of the target motion. DreamVideo decouples this task into two stages, subject learning and motion learning, by leveraging a pre-trained video diffusion model. The subject learning stage aims to accurately capture the fine...

10.48550/arxiv.2312.04433 preprint EN other-oa arXiv (Cornell University) 2023-01-01
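The two-stage decoupling described above, where a pre-trained model stays frozen and only small stage-specific modules are trained, can be sketched roughly as follows. The `Adapter` class, the `train_stage` helper, and the toy linear backbone are all hypothetical stand-ins; DreamVideo's actual subject and motion components operate inside a video diffusion model.

```python
import torch
from torch import nn

class Adapter(nn.Module):
    """Tiny residual MLP; a stand-in for the stage-specific modules that
    are the only trainable parts in each stage (names are illustrative)."""
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, dim // 4), nn.GELU(), nn.Linear(dim // 4, dim)
        )

    def forward(self, h):
        return h + self.net(h)

def train_stage(backbone, adapter, inputs, targets, steps=200, lr=1e-3):
    """Train one decoupled stage: the pre-trained backbone stays frozen,
    and only the stage-specific adapter receives gradient updates."""
    for p in backbone.parameters():
        p.requires_grad_(False)
    opt = torch.optim.AdamW(adapter.parameters(), lr=lr)
    for _ in range(steps):
        loss = nn.functional.mse_loss(adapter(backbone(inputs)), targets)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return adapter

# Toy demonstration with a stand-in backbone and random tensors.
dim = 64
backbone = nn.Linear(dim, dim)
x, y = torch.randn(32, dim), torch.randn(32, dim)
subject_adapter = train_stage(backbone, Adapter(dim), x, y)  # stage 1: subject
motion_adapter = train_stage(backbone, Adapter(dim), x, y)   # stage 2: motion
```

The design point illustrated here is that freezing the shared backbone lets the two stages be trained independently without interfering with each other.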

Recent advances in customized video generation have enabled users to create videos tailored to both specific subjects and motion trajectories. However, existing methods often require complicated test-time fine-tuning and struggle with balancing subject learning and motion control, limiting their real-world applications. In this paper, we present DreamVideo-2, a zero-shot video customization framework capable of generating videos with a specific subject and motion trajectory, guided by a single image and a bounding box sequence, respectively, without the need for...

10.48550/arxiv.2410.13830 preprint EN arXiv (Cornell University) 2024-10-17

The current text-to-video (T2V) generation has made significant progress in synthesizing realistic general videos, but identity-specific human video generation with customized ID images is still under-explored. The key challenge lies in maintaining high ID fidelity consistently while preserving the original motion dynamics and semantic following after the identity injection. Current video identity customization methods mainly rely on reconstructing given identity images with text-to-image models, which have a divergent distribution from the T2V model....

10.48550/arxiv.2411.17048 preprint EN arXiv (Cornell University) 2024-11-25

As a fundamental backbone for video generation, diffusion models are challenged by low inference speed due to the sequential nature of denoising. Previous methods speed up the models by caching and reusing model outputs at uniformly selected timesteps. However, such a strategy neglects the fact that differences among model outputs are not uniform across timesteps, which hinders selecting the appropriate outputs to cache, leading to a poor balance between inference efficiency and visual quality. In this study, we introduce Timestep Embedding Aware Cache (TeaCache),...

10.48550/arxiv.2411.19108 preprint EN arXiv (Cornell University) 2024-11-28
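The caching idea above can be sketched in a few lines. The function below accumulates the relative change between consecutive timestep embeddings and re-runs the denoiser only when that change crosses a threshold, reusing the cached output otherwise. Everything here (the function names, the placeholder scheduler step, the fixed threshold) is illustrative, and the rescaling TeaCache applies to its difference estimate is omitted.

```python
import numpy as np

def denoise_with_cache(model, latents, timestep_embs, threshold=0.15):
    """Timestep-aware output caching, sketched in the spirit of TeaCache.

    Re-evaluates the expensive denoiser only when the accumulated relative
    change of the timestep embedding exceeds `threshold`; otherwise the
    previously cached output is reused.
    """
    cached_out = None
    accum = 0.0
    prev_emb = None
    for emb in timestep_embs:
        if prev_emb is not None:
            accum += np.linalg.norm(emb - prev_emb) / (np.linalg.norm(prev_emb) + 1e-8)
        if cached_out is None or accum >= threshold:
            cached_out = model(latents, emb)  # expensive denoiser call
            accum = 0.0
        # Placeholder scheduler step: a real sampler would update latents
        # from cached_out according to its noise schedule.
        latents = latents - 0.1 * cached_out
        prev_emb = emb
    return latents

# Toy usage with a stand-in "model" and random embeddings.
rng = np.random.default_rng(0)
embs = [rng.normal(size=8) for _ in range(20)]
x = rng.normal(size=(4, 4))
out = denoise_with_cache(lambda z, e: 0.5 * z, x, embs)
```

The point of conditioning the skip decision on embedding change rather than a uniform schedule is that cheap steps are skipped where consecutive outputs barely differ, and compute is spent where they change quickly.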

Cross-corpus speech emotion recognition (SER) seeks to generalize the ability of inferring emotion from a well-labeled corpus to an unlabeled one, which is a rather challenging task due to the significant discrepancy between the two corpora. Existing methods, typically based on unsupervised domain adaptation (UDA), struggle to learn corpus-invariant features by global distribution alignment, but unfortunately, the resulting features are mixed with corpus-specific features or are not class-discriminative. To tackle these challenges, we propose...

10.1145/3581783.3611704 article EN 2023-10-26
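For context, the global distribution alignment that the abstract attributes to UDA baselines is often instantiated as a Maximum Mean Discrepancy (MMD) penalty between source and target features. A minimal RBF-kernel version is sketched below; it illustrates the baseline being critiqued, not the method the paper itself proposes.

```python
import torch

def mmd_rbf(source, target, sigma=1.0):
    """Maximum Mean Discrepancy with an RBF kernel: a minimal instance of
    the global distribution alignment objective used by many UDA methods."""
    def rbf(a, b):
        return torch.exp(-torch.cdist(a, b).pow(2) / (2 * sigma ** 2))
    return (rbf(source, source).mean()
            + rbf(target, target).mean()
            - 2 * rbf(source, target).mean())

# Toy usage: features from a labeled source corpus and an unlabeled target.
src = torch.randn(64, 128)
tgt = torch.randn(64, 128) + 0.5
print(mmd_rbf(src, tgt).item())
```

Minimizing this term pulls the two feature distributions together globally, which is exactly why, as the abstract notes, the aligned features can still mix corpus-specific information and lose class discriminability.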

Online continual learning (CL) studies the problem of learning continuously from a single-pass data stream while adapting to new data and mitigating catastrophic forgetting. Recently, by storing a small subset of old data, replay-based methods have shown promising performance. Unlike previous studies that focus on sample storage or knowledge distillation against forgetting, this paper aims to understand why online learning models fail to generalize well from the perspective of shortcut learning. We identify shortcut learning as the key limiting factor for online CL, where...

10.48550/arxiv.2308.00301 preprint EN cc-by-nc-nd arXiv (Cornell University) 2023-01-01

Aiming at the problems of slow segmentation speed and excessive computational complexity in practical engineering applications when using existing models to segment transparent glass containers, an efficient image semantic segmentation algorithm for such containers, based on an improved DeepLabV3+, is proposed. The proposed method uses the MobileNetV3 network to replace the Xception backbone for feature extraction in the original model, which effectively reduces the number of parameters; it improves the ASPP module and introduces a strip pooling module (SPM) and depthwise....

10.1117/12.2684556 article EN 2023-08-01
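The backbone-swap idea is easy to demonstrate with torchvision, which ships a DeepLabV3 head (not V3+) paired with a MobileNetV3-Large backbone. The snippet below is only a rough analogue of the paper's model: the improved ASPP, strip pooling module, and depthwise modifications are not part of this stock network, and `num_classes=2` is an assumed container-vs-background setup.

```python
import torch
from torchvision.models.segmentation import deeplabv3_mobilenet_v3_large

# Lightweight MobileNetV3 backbone in place of a heavier one, the same
# parameter-reduction idea the abstract describes (stock torchvision model,
# no pre-trained weights, 2 classes assumed for container vs. background).
model = deeplabv3_mobilenet_v3_large(weights=None, num_classes=2)
model.eval()

with torch.no_grad():
    out = model(torch.randn(1, 3, 512, 512))["out"]
print(out.shape)  # torch.Size([1, 2, 512, 512]) per-pixel class logits
```

Swapping Xception for MobileNetV3 trades some representational capacity for far fewer parameters and faster inference, which matches the engineering constraints the abstract motivates.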

Despite diffusion models having shown powerful abilities to generate photorealistic images, generating videos that are realistic and diverse still remains in its infancy. One of the key reasons is that current methods intertwine spatial content and temporal dynamics together, leading to a notably increased complexity of text-to-video generation (T2V). In this work, we propose HiGen, a diffusion model-based method that improves performance by decoupling the spatial and temporal factors of videos from two perspectives, i.e., structure level and content level. At the structure level,...

10.48550/arxiv.2312.04483 preprint EN other-oa arXiv (Cornell University) 2023-01-01