- Video Analysis and Summarization
- Generative Adversarial Networks and Image Synthesis
- Image and Video Quality Assessment
- Multimodal Machine Learning Applications
- Domain Adaptation and Few-Shot Learning
- Multimedia Communication and Technology
- Human Pose and Action Recognition
- Model Reduction and Neural Networks
- Human Motion and Animation
- Advanced Graph Neural Networks
- Topic Modeling
- Industrial Vision Systems and Defect Detection
- Machine Learning and ELM
- Image Enhancement Techniques
- Speech and Audio Processing
- Text and Document Classification Technologies
- Sentiment Analysis and Opinion Mining
- Advanced Neural Network Applications
- Emotion and Mood Recognition
Alibaba Group (United States)
2024
Fudan University
2023-2024
Shanghai Center for Brain Science and Brain-Inspired Technology
2023-2024
University of Science and Technology Beijing
2023
University of Jinan
2023
Online continual learning (CL) studies the problem of learning continuously from a single-pass data stream while adapting to new data and mitigating catastrophic forgetting. Recently, by storing a small subset of old data, replay-based methods have shown promising performance. Unlike previous methods that focus on sample storage or knowledge distillation against catastrophic forgetting, this paper aims to understand why online learning models fail to generalize well from a new perspective of shortcut learning. We identify shortcut learning as the key limiting factor for online CL, where...
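For context, the replay mechanism the abstract refers to can be sketched as a small memory buffer filled by reservoir sampling from the stream; this is a minimal illustrative sketch, not the paper's method, and the `ReplayBuffer` class, its capacity, and the sampling policy are assumptions:

```python
import random

class ReplayBuffer:
    """Minimal sketch of a replay buffer for online continual learning.

    Stores a small subset of past (x, y) pairs from a single-pass stream;
    replayed samples are mixed into each incoming batch during training.
    """

    def __init__(self, capacity=1000):
        self.capacity = capacity
        self.data = []
        self.num_seen = 0

    def update(self, x, y):
        # Reservoir sampling: every example seen so far is kept with
        # equal probability, regardless of when it arrived.
        if len(self.data) < self.capacity:
            self.data.append((x, y))
        else:
            idx = random.randint(0, self.num_seen)
            if idx < self.capacity:
                self.data[idx] = (x, y)
        self.num_seen += 1

    def sample(self, batch_size):
        # Draw a replay batch of old data to train alongside new data.
        return random.sample(self.data, min(batch_size, len(self.data)))
```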
Visual diffusion models achieve remarkable progress, yet they are typically trained at limited resolutions due to the lack of high-resolution data and constrained computation resources, hampering their ability to generate high-fidelity images or videos at higher resolutions. Recent efforts have explored tuning-free strategies to exhibit the untapped potential of higher-resolution visual generation of pre-trained models. However, these methods are still prone to producing low-quality visual content with repetitive patterns....
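One family of tuning-free strategies adjusts the receptive field of a pre-trained denoiser at inference time, e.g., by re-dilating its convolutions so features computed on a larger canvas resemble those seen at training resolution. A hedged PyTorch sketch of that idea follows; the helper name and the uniform dilation factor are assumptions, and this is not necessarily the mechanism used by the method the abstract describes:

```python
import torch.nn as nn

def redilate_convs(unet: nn.Module, factor: int = 2):
    """Sketch: widen receptive fields of 3x3 convs for higher-res sampling.

    Assumes stride-1, padding-1 convolutions; dilation and padding are
    increased together so spatial dimensions are preserved.
    """
    for m in unet.modules():
        if isinstance(m, nn.Conv2d) and m.kernel_size == (3, 3):
            m.dilation = (factor, factor)
            # Match padding to the new dilation to keep output size fixed.
            m.padding = (factor, factor)
```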
Customized generation using diffusion models has made impressive progress in image generation, but remains unsatisfactory in the challenging video generation task, as it requires the controllability of both subjects and motions. To that end, we present DreamVideo, a novel approach to generating personalized videos from a few static images of the desired subject and a few videos of the target motion. DreamVideo decouples this task into two stages, subject learning and motion learning, by leveraging a pre-trained video diffusion model. The subject learning stage aims to accurately capture the fine...
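The decoupled two-stage recipe described here might look like the skeleton below; `SubjectAdapter`, `MotionAdapter`, and `diffusion_loss` are hypothetical placeholders standing in for DreamVideo's actual modules and objective, so treat this as a sketch of the training structure only:

```python
import torch

def train_two_stage(base_model, subject_images, motion_videos,
                    SubjectAdapter, MotionAdapter, diffusion_loss, steps=500):
    # The pre-trained video diffusion model stays frozen throughout.
    for p in base_model.parameters():
        p.requires_grad_(False)

    # Stage 1: subject learning -- fit a small adapter on a few static
    # images so it captures the subject's appearance.
    subject_adapter = SubjectAdapter()
    opt = torch.optim.AdamW(subject_adapter.parameters(), lr=1e-4)
    for _ in range(steps):
        loss = diffusion_loss(base_model, subject_adapter, subject_images)
        opt.zero_grad()
        loss.backward()
        opt.step()

    # Stage 2: motion learning -- fit a separate adapter on videos of the
    # target motion, independently of subject appearance.
    motion_adapter = MotionAdapter()
    opt = torch.optim.AdamW(motion_adapter.parameters(), lr=1e-4)
    for _ in range(steps):
        loss = diffusion_loss(base_model, motion_adapter, motion_videos)
        opt.zero_grad()
        loss.backward()
        opt.step()

    return subject_adapter, motion_adapter
```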
Recent advances in customized video generation have enabled users to create videos tailored to both specific subjects and motion trajectories. However, existing methods often require complicated test-time fine-tuning and struggle with balancing subject learning and motion control, limiting their real-world applications. In this paper, we present DreamVideo-2, a zero-shot video customization framework capable of generating videos with a specific subject and motion trajectory, guided by a single image and a bounding box sequence, respectively, without the need for...
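To make the bounding-box trajectory signal concrete, one plausible preprocessing step rasterizes the per-frame boxes into a binary mask video that a generator can condition on. The helper below is a hypothetical illustration, not DreamVideo-2's actual conditioning pipeline:

```python
import numpy as np

def boxes_to_mask_video(boxes, height, width):
    """Rasterize a bounding-box sequence into a binary mask video.

    `boxes` is assumed to be a list of integer (x1, y1, x2, y2) pixel
    coordinates, one per frame; the resulting (T, H, W) mask video could
    serve as a trajectory control signal for a video generator.
    """
    masks = np.zeros((len(boxes), height, width), dtype=np.float32)
    for t, (x1, y1, x2, y2) in enumerate(boxes):
        masks[t, y1:y2, x1:x2] = 1.0
    return masks
```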
The current text-to-video (T2V) generation has made significant progress in synthesizing realistic general videos, but identity-specific human video generation with customized ID images is still under-explored. The key challenge lies in maintaining high ID fidelity consistently while preserving the original motion dynamics and semantic following after the identity injection. Current video identity customization methods mainly rely on reconstructing given identity images with text-to-image models, which have a divergent distribution from the T2V model....
As a fundamental backbone for video generation, diffusion models are challenged by low inference speed due to the sequential nature of denoising. Previous methods speed up the models by caching and reusing model outputs at uniformly selected timesteps. However, such a strategy neglects the fact that differences among model outputs are not uniform across timesteps, which hinders selecting the appropriate outputs to cache, leading to a poor balance between inference efficiency and visual quality. In this study, we introduce Timestep Embedding Aware Cache (TeaCache),...
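The core idea as described is to decide, per timestep, whether to reuse a cached output instead of recomputing the full forward pass. A rough sketch follows; the embedding-distance proxy and the threshold value are assumptions for illustration, not TeaCache's exact estimator:

```python
import torch

class TimestepAwareCache:
    """Sketch of timestep-embedding-aware output caching for diffusion.

    Instead of refreshing the cache at uniform intervals, reuse the cached
    output while an accumulated proxy for output change (here, the relative
    L1 distance between consecutive timestep embeddings) stays small.
    """

    def __init__(self, threshold=0.05):
        self.threshold = threshold
        self.prev_emb = None
        self.accum = 0.0
        self.cached_out = None

    def step(self, model, x, t_emb):
        # Accumulate a cheap estimate of how much the output should change.
        if self.prev_emb is not None:
            rel = (t_emb - self.prev_emb).abs().mean() / (
                self.prev_emb.abs().mean() + 1e-8)
            self.accum += rel.item()
        self.prev_emb = t_emb

        # Reuse the cache while the estimated change is still small.
        if self.cached_out is not None and self.accum < self.threshold:
            return self.cached_out

        # Otherwise pay for a full forward pass and refresh the cache.
        self.cached_out = model(x, t_emb)
        self.accum = 0.0
        return self.cached_out
```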
Cross-corpus speech emotion recognition (SER) seeks to generalize the ability of inferring speech emotion from a well-labeled corpus to an unlabeled one, which is a rather challenging task due to the significant discrepancy between the two corpora. Existing methods, typically based on unsupervised domain adaptation (UDA), struggle to learn corpus-invariant features by global distribution alignment, but unfortunately, the resulting features are mixed with corpus-specific features or are not class-discriminative. To tackle these challenges, we propose...
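Global distribution alignment in UDA is commonly implemented with a discrepancy loss such as Maximum Mean Discrepancy (MMD) between source- and target-corpus features. The minimal sketch below illustrates that baseline mechanism the abstract critiques, not the proposed method; the RBF bandwidth is an assumption:

```python
import torch

def mmd_rbf(source_feats, target_feats, bandwidth=1.0):
    """Biased MMD estimate with an RBF kernel between two feature batches.

    Minimizing this loss pulls the global source and target feature
    distributions together, without regard to emotion-class structure.
    """
    def kernel(a, b):
        sq_dists = torch.cdist(a, b) ** 2
        return torch.exp(-sq_dists / (2 * bandwidth ** 2))

    k_ss = kernel(source_feats, source_feats).mean()
    k_tt = kernel(target_feats, target_feats).mean()
    k_st = kernel(source_feats, target_feats).mean()
    return k_ss + k_tt - 2 * k_st
```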
Aiming at the problems of slow segmentation speed and excessive computational complexity in practical engineering applications when using existing models to segment transparent glass containers, an efficient image semantic segmentation algorithm based on an improved DeepLabV3+ for glass containers is proposed. The proposed method uses the MobileNetV3 network to replace the Xception backbone feature extraction network of the original model, which effectively reduces the number of parameters; it also improves the ASPP module and introduces a strip pooling module (SPM) and depthwise...
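A strip pooling module of the kind mentioned pools along long, narrow windows so that elongated structures (such as container edges) are captured with little overhead. The PyTorch sketch below follows the general strip-pooling design; channel counts, kernel sizes, and the gating scheme are assumptions rather than the paper's exact configuration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class StripPooling(nn.Module):
    """Minimal strip pooling module (SPM) sketch.

    Pools the feature map along each spatial axis, refines the two strip
    descriptors with narrow convolutions, then gates the input features.
    """

    def __init__(self, channels):
        super().__init__()
        self.conv_h = nn.Conv2d(channels, channels, (1, 3), padding=(0, 1))
        self.conv_v = nn.Conv2d(channels, channels, (3, 1), padding=(1, 0))
        self.fuse = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        n, c, h, w = x.shape
        # Horizontal strip: average the width away, keep the height axis.
        sp_h = F.adaptive_avg_pool2d(x, (h, 1))
        # Vertical strip: average the height away, keep the width axis.
        sp_v = F.adaptive_avg_pool2d(x, (1, w))
        # Refine each strip, then broadcast back to the full map.
        sp_h = F.interpolate(self.conv_v(sp_h), size=(h, w),
                             mode="bilinear", align_corners=False)
        sp_v = F.interpolate(self.conv_h(sp_v), size=(h, w),
                             mode="bilinear", align_corners=False)
        # Fuse the strips into a sigmoid gate over the input features.
        gate = torch.sigmoid(self.fuse(F.relu(sp_h + sp_v)))
        return x * gate
```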
Despite diffusion models having shown powerful abilities to generate photorealistic images, generating videos that are realistic and diverse still remains in its infancy. One of the key reasons is that current methods intertwine spatial content and temporal dynamics together, leading to a notably increased complexity of text-to-video generation (T2V). In this work, we propose HiGen, a diffusion model-based method that improves performance by decoupling the spatial and temporal factors of videos from two perspectives, i.e., structure level and content level. At the structure level,...