Yujie Wei

ORCID: 0009-0003-9304-0609
Research Areas
  • Video Analysis and Summarization
  • Generative Adversarial Networks and Image Synthesis
  • Image and Video Quality Assessment
  • Multimodal Machine Learning Applications
  • Domain Adaptation and Few-Shot Learning
  • Multimedia Communication and Technology
  • Human Pose and Action Recognition
  • Model Reduction and Neural Networks
  • Human Motion and Animation
  • Advanced Graph Neural Networks
  • Topic Modeling
  • Industrial Vision Systems and Defect Detection
  • Machine Learning and ELM
  • Image Enhancement Techniques
  • Speech and Audio Processing
  • Text and Document Classification Technologies
  • Sentiment Analysis and Opinion Mining
  • Advanced Neural Network Applications
  • Emotion and Mood Recognition

Alibaba Group (United States)
2024

Fudan University
2023-2024

Shanghai Center for Brain Science and Brain-Inspired Technology
2023-2024

University of Science and Technology Beijing
2023

University of Jinan
2023

Online continual learning (CL) studies the problem of learning continuously from a single-pass data stream while adapting to new data and mitigating catastrophic forgetting. Recently, by storing a small subset of old data, replay-based methods have shown promising performance. Unlike previous studies that focus on sample storage or knowledge distillation against forgetting, this paper aims to understand why online learning models fail to generalize well from the perspective of shortcut learning. We identify shortcut learning as the key limiting factor for online CL, where...

10.1109/iccv51070.2023.01720 article EN 2023 IEEE/CVF International Conference on Computer Vision (ICCV) 2023-10-01
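The replay mechanism referenced in the abstract above can be made concrete with a small sketch. The buffer below uses reservoir sampling, a common choice for keeping a uniform subset of a single-pass stream in online CL; it is an illustrative stand-in, not the paper's implementation, and the class name is invented for this sketch.

```python
import random

class ReservoirReplayBuffer:
    """Minimal replay buffer for online continual learning.

    Keeps a uniform random subset of the stream via reservoir sampling,
    so every example seen so far has equal probability of being stored.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.buffer = []      # stored (x, y) pairs
        self.num_seen = 0     # total examples observed so far

    def add(self, x, y):
        self.num_seen += 1
        if len(self.buffer) < self.capacity:
            self.buffer.append((x, y))
        else:
            # Replace a random slot with probability capacity / num_seen.
            idx = random.randrange(self.num_seen)
            if idx < self.capacity:
                self.buffer[idx] = (x, y)

    def sample(self, batch_size):
        # Draw stored old data to mix into the current training batch.
        k = min(batch_size, len(self.buffer))
        return random.sample(self.buffer, k)
```

In a typical replay setup, each incoming batch from the stream is interleaved with a batch drawn from this buffer, so gradients mix new and old data.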

10.1109/cvpr52733.2024.00634 article EN 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024-06-16

Visual diffusion models achieve remarkable progress, yet they are typically trained at limited resolutions due to the lack of high-resolution data and constrained computation resources, hampering their ability to generate high-fidelity images or videos at higher resolutions. Recent efforts have explored tuning-free strategies to exploit the untapped potential of pre-trained models for higher-resolution visual generation. However, these methods are still prone to producing low-quality visual content with repetitive patterns....

10.48550/arxiv.2412.09626 preprint EN arXiv (Cornell University) 2024-12-12

Customized generation using diffusion models has made impressive progress in image generation, but remains unsatisfactory in the challenging video generation task, as it requires the controllability of both subjects and motions. To that end, we present DreamVideo, a novel approach to generating personalized videos from a few static images of the desired subject and a few videos of the target motion. DreamVideo decouples this task into two stages, subject learning and motion learning, by leveraging a pre-trained video diffusion model. The subject learning stage aims to accurately capture the fine...

10.48550/arxiv.2312.04433 preprint EN other-oa arXiv (Cornell University) 2023-01-01
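The two-stage decoupling described above, where a pre-trained model stays frozen and only small stage-specific modules are trained, can be sketched roughly as follows. The `Adapter` class, the `train_stage` helper, and the toy linear backbone are all hypothetical stand-ins; DreamVideo's actual subject and motion components operate inside a video diffusion model.

```python
import torch
from torch import nn

class Adapter(nn.Module):
    """Tiny residual MLP; a stand-in for the stage-specific modules that
    are the only trainable parts in each stage (names are illustrative)."""
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, dim // 4), nn.GELU(), nn.Linear(dim // 4, dim)
        )

    def forward(self, h):
        return h + self.net(h)

def train_stage(backbone, adapter, inputs, targets, steps=200, lr=1e-3):
    """Train one decoupled stage: the pre-trained backbone stays frozen,
    and only the stage-specific adapter receives gradient updates."""
    for p in backbone.parameters():
        p.requires_grad_(False)
    opt = torch.optim.AdamW(adapter.parameters(), lr=lr)
    for _ in range(steps):
        loss = nn.functional.mse_loss(adapter(backbone(inputs)), targets)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return adapter

# Toy demonstration with a stand-in backbone and random tensors.
dim = 64
backbone = nn.Linear(dim, dim)
x, y = torch.randn(32, dim), torch.randn(32, dim)
subject_adapter = train_stage(backbone, Adapter(dim), x, y)  # stage 1: subject
motion_adapter = train_stage(backbone, Adapter(dim), x, y)   # stage 2: motion
```

The design point illustrated here is that freezing the shared backbone lets the two stages be trained independently without interfering with each other.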

Recent advances in customized video generation have enabled users to create videos tailored to both specific subjects and motion trajectories. However, existing methods often require complicated test-time fine-tuning and struggle with balancing subject learning and motion control, limiting their real-world applications. In this paper, we present DreamVideo-2, a zero-shot video customization framework capable of generating videos with a specific subject and motion trajectory, guided by a single image and a bounding box sequence, respectively, without the need for...

10.48550/arxiv.2410.13830 preprint EN arXiv (Cornell University) 2024-10-17

The current text-to-video (T2V) generation has made significant progress in synthesizing realistic general videos, but identity-specific human video generation with customized ID images is still under-explored. The key challenge lies in maintaining high ID fidelity consistently while preserving the original motion dynamics and semantic following after the identity injection. Current video identity customization methods mainly rely on reconstructing given identity images with text-to-image models, which have a divergent distribution from the T2V model....

10.48550/arxiv.2411.17048 preprint EN arXiv (Cornell University) 2024-11-25

As a fundamental backbone for video generation, diffusion models are challenged by low inference speed due to the sequential nature of denoising. Previous methods speed up the models by caching and reusing model outputs at uniformly selected timesteps. However, such a strategy neglects the fact that differences among model outputs are not uniform across timesteps, which hinders selecting the appropriate outputs to cache, leading to a poor balance between inference efficiency and visual quality. In this study, we introduce Timestep Embedding Aware Cache (TeaCache),...

10.48550/arxiv.2411.19108 preprint EN arXiv (Cornell University) 2024-11-28
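The caching idea above can be sketched in a few lines. The function below accumulates the relative change between consecutive timestep embeddings and re-runs the denoiser only when that change crosses a threshold, reusing the cached output otherwise. Everything here (the function names, the placeholder scheduler step, the fixed threshold) is illustrative, and the rescaling TeaCache applies to its difference estimate is omitted.

```python
import numpy as np

def denoise_with_cache(model, latents, timestep_embs, threshold=0.15):
    """Timestep-aware output caching, sketched in the spirit of TeaCache.

    Re-evaluates the expensive denoiser only when the accumulated relative
    change of the timestep embedding exceeds `threshold`; otherwise the
    previously cached output is reused.
    """
    cached_out = None
    accum = 0.0
    prev_emb = None
    for emb in timestep_embs:
        if prev_emb is not None:
            accum += np.linalg.norm(emb - prev_emb) / (np.linalg.norm(prev_emb) + 1e-8)
        if cached_out is None or accum >= threshold:
            cached_out = model(latents, emb)  # expensive denoiser call
            accum = 0.0
        # Placeholder scheduler step: a real sampler would update latents
        # from cached_out according to its noise schedule.
        latents = latents - 0.1 * cached_out
        prev_emb = emb
    return latents

# Toy usage with a stand-in "model" and random embeddings.
rng = np.random.default_rng(0)
embs = [rng.normal(size=8) for _ in range(20)]
x = rng.normal(size=(4, 4))
out = denoise_with_cache(lambda z, e: 0.5 * z, x, embs)
```

The point of conditioning the skip decision on embedding change rather than a uniform schedule is that cheap steps are skipped where consecutive outputs barely differ, and compute is spent where they change quickly.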

Cross-corpus speech emotion recognition (SER) seeks to generalize the ability of inferring emotion from a well-labeled corpus to an unlabeled one, which is a rather challenging task due to the significant discrepancy between the two corpora. Existing methods, typically based on unsupervised domain adaptation (UDA), struggle to learn corpus-invariant features by global distribution alignment, but unfortunately, the resulting features are mixed with corpus-specific features or are not class-discriminative. To tackle these challenges, we propose...

10.1145/3581783.3611704 article EN 2023-10-26
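For context, the global distribution alignment that the abstract attributes to UDA baselines is often instantiated as a Maximum Mean Discrepancy (MMD) penalty between source and target features. A minimal RBF-kernel version is sketched below; it illustrates the baseline being critiqued, not the method the paper itself proposes.

```python
import torch

def mmd_rbf(source, target, sigma=1.0):
    """Maximum Mean Discrepancy with an RBF kernel: a minimal instance of
    the global distribution alignment objective used by many UDA methods."""
    def rbf(a, b):
        return torch.exp(-torch.cdist(a, b).pow(2) / (2 * sigma ** 2))
    return (rbf(source, source).mean()
            + rbf(target, target).mean()
            - 2 * rbf(source, target).mean())

# Toy usage: features from a labeled source corpus and an unlabeled target.
src = torch.randn(64, 128)
tgt = torch.randn(64, 128) + 0.5
print(mmd_rbf(src, tgt).item())
```

Minimizing this term pulls the two feature distributions together globally, which is exactly why, as the abstract notes, the aligned features can still mix corpus-specific information and lose class discriminability.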

Online continual learning (CL) studies the problem of learning continuously from a single-pass data stream while adapting to new data and mitigating catastrophic forgetting. Recently, by storing a small subset of old data, replay-based methods have shown promising performance. Unlike previous studies that focus on sample storage or knowledge distillation against forgetting, this paper aims to understand why online learning models fail to generalize well from the perspective of shortcut learning. We identify shortcut learning as the key limiting factor for online CL, where...

10.48550/arxiv.2308.00301 preprint EN cc-by-nc-nd arXiv (Cornell University) 2023-01-01

Aiming at the problems of slow segmentation speed and excessive computational complexity in practical engineering applications when using existing models to segment transparent glass containers, an efficient image semantic segmentation algorithm for such containers, based on an improved DeepLabV3+, is proposed. The proposed method uses the MobileNetV3 network to replace the Xception backbone for feature extraction in the original model, which effectively reduces the number of parameters; it improves the ASPP module and introduces a strip pooling module (SPM) and depthwise....

10.1117/12.2684556 article EN 2023-08-01
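The backbone-swap idea is easy to demonstrate with torchvision, which ships a DeepLabV3 head (not V3+) paired with a MobileNetV3-Large backbone. The snippet below is only a rough analogue of the paper's model: the improved ASPP, strip pooling module, and depthwise modifications are not part of this stock network, and `num_classes=2` is an assumed container-vs-background setup.

```python
import torch
from torchvision.models.segmentation import deeplabv3_mobilenet_v3_large

# Lightweight MobileNetV3 backbone in place of a heavier one, the same
# parameter-reduction idea the abstract describes (stock torchvision model,
# no pre-trained weights, 2 classes assumed for container vs. background).
model = deeplabv3_mobilenet_v3_large(weights=None, num_classes=2)
model.eval()

with torch.no_grad():
    out = model(torch.randn(1, 3, 512, 512))["out"]
print(out.shape)  # torch.Size([1, 2, 512, 512]) per-pixel class logits
```

Swapping Xception for MobileNetV3 trades some representational capacity for far fewer parameters and faster inference, which matches the engineering constraints the abstract motivates.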

Despite diffusion models having shown powerful abilities to generate photorealistic images, generating videos that are realistic and diverse still remains in its infancy. One of the key reasons is that current methods intertwine spatial content and temporal dynamics together, leading to a notably increased complexity of text-to-video generation (T2V). In this work, we propose HiGen, a diffusion model-based method that improves performance by decoupling the spatial and temporal factors of videos from two perspectives, i.e., structure level and content level. At the structure level,...

10.48550/arxiv.2312.04483 preprint EN other-oa arXiv (Cornell University) 2023-01-01