Fuwei Zhang

ORCID: 0000-0003-0179-9988
Research Areas
  • Video Analysis and Summarization
  • Multimodal Machine Learning Applications
  • Advanced Image and Video Retrieval Techniques
  • Domain Adaptation and Few-Shot Learning
  • Human Pose and Action Recognition
  • Data Quality and Management
  • Music and Audio Processing
  • Advanced Vision and Imaging
  • Multimedia Communication and Technology
  • Machine Learning and Data Classification
  • Natural Language Processing Techniques

North University of China
2024

Sun Yat-sen University
2022-2024

Spatiotemporal attention learning remains a challenging task in video question answering (VideoQA), as it requires a sufficient understanding of cross-modal spatiotemporal information. Existing methods usually leverage different mechanisms to reveal potential associations between the video and the question. While these mechanisms effectively remove irrelevant information from the attention, they ignore pseudo-related information within the interaction attention. To address this problem, we propose a novel energy-based refined-attention...

10.1109/tcsvt.2022.3212463 article EN IEEE Transactions on Circuits and Systems for Video Technology 2022-10-05
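For readers unfamiliar with the kind of baseline this abstract contrasts against, the following is a minimal, generic sketch of question-guided attention over frame features in plain Python. It is an illustrative assumption, not the paper's energy-based refined attention: the function names, the scaled dot-product scoring, and the input format (one pooled question vector, one feature vector per frame) are all hypothetical. It shows how such attention reweights frames by their similarity to the question; under this simple scoring, nothing distinguishes a genuinely relevant frame from one that merely resembles the question.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    # Subtract the maximum score before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def question_guided_attention(frames: List[Vector], question: Vector) -> Tuple[List[float], Vector]:
    """Hypothetical generic baseline: attend over frame features with a question vector.

    Scores are scaled dot products between each frame and the question; softmax turns
    them into weights, and the attended video vector is the weighted sum of the frames.
    """
    d = len(question)
    scores = [sum(f_k * q_k for f_k, q_k in zip(f, question)) / math.sqrt(d) for f in frames]
    weights = softmax(scores)
    attended = [sum(w * f[k] for w, f in zip(weights, frames)) for k in range(d)]
    return weights, attended


if __name__ == "__main__":
    # Toy 2-D features for three frames and one question (made-up numbers).
    frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    question = [1.0, 0.0]
    weights, video_vec = question_guided_attention(frames, question)
    print(weights, video_vec)
```

In the toy example, frames 1 and 3 have the same dot product with the question, so they receive equal weights, both larger than the weight of frame 2, even though frame 3 also carries content unrelated to the question.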

The joint task of video moment retrieval and highlight detection is challenging: it requires building a model that not only captures contextual information between sequences over time but also has the ability to understand and judge significance. This paper addresses these problems from three aspects. First, we design a parameter-free cross-modal statistical correlation interaction method. A novel saliency enhancement function is defined to quantify the differences between important features associated with...

10.1109/tcsvt.2024.3389024 article EN IEEE Transactions on Circuits and Systems for Video Technology 2024-04-16

Outfit collocation requires considering the interrelationships and adaptability among the attributes of component items. However, with numerous and diverse fashion items, accurately capturing attribute features and modeling the complex relationships between them become key challenges. To address these challenges, we propose a novel scheme, Decoupling-driven Multi-level Attribute Parsing, for interpretable outfit collocation. First, we decouple a series of attribute features from each item's visual feature in a fully supervised manner, which can improve...

10.1109/tmm.2024.3402541 article EN IEEE Transactions on Multimedia 2024-01-01

10.1109/tcsvt.2024.3409897 article EN IEEE Transactions on Circuits and Systems for Video Technology 2024-01-01

Spatiotemporal attention learning has always been a challenging research task in video question answering (VideoQA). It requires modelling not only the local neighbourhood dependencies between adjacent frames but also the long-term dependencies between nonadjacent frames. Although existing methods are usually good at one of these temporal aspects, they cannot model both simultaneously and effectively. To address this issue, we first derive a novel statistic-driven difference-aware generation function, which can...

10.1109/tmm.2023.3333192 article EN IEEE Transactions on Multimedia 2023-11-15
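The distinction this abstract draws between local dependencies among adjacent frames and long-term dependencies among nonadjacent frames can be illustrated with a generic masked self-attention sketch in plain Python. This is an assumption-laden illustration, not the paper's statistic-driven difference-aware generation function: the function name `frame_attention_weights`, the scaled dot-product scoring, and the `window` parameter are hypothetical. With a window, each frame attends only to its neighbours; without one, it can also attend to distant frames.

```python
import math
from typing import List, Optional

Vector = List[float]


def frame_attention_weights(frames: List[Vector], window: Optional[int] = None) -> List[List[float]]:
    """Row-normalised frame-to-frame attention weights (hypothetical generic sketch).

    window=None: each frame may attend to every frame, including nonadjacent ones.
    window=w: frame i may attend only to frames j with |i - j| <= w (local neighbourhood).
    """
    num_frames = len(frames)
    d = len(frames[0])
    weights = []
    for i in range(num_frames):
        # Frames that frame i is allowed to attend to under the chosen mask.
        allowed = [j for j in range(num_frames) if window is None or abs(i - j) <= window]
        scores = [sum(a * b for a, b in zip(frames[i], frames[j])) / math.sqrt(d) for j in allowed]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        row = [0.0] * num_frames  # masked-out frames keep weight 0
        for j, e in zip(allowed, exps):
            row[j] = e / total
        weights.append(row)
    return weights


if __name__ == "__main__":
    # Made-up 2-D features: frames 0 and 3 are identical but not adjacent.
    frames = [[1.0, 0.0], [0.9, 0.1], [0.1, 0.9], [1.0, 0.0]]
    local = frame_attention_weights(frames, window=1)
    full = frame_attention_weights(frames)
    print(local[0], full[0])
```

In the toy example, frames 0 and 3 have identical features, yet with `window=1` frame 0 assigns frame 3 zero weight; only the unrestricted version can link them, which is the kind of nonadjacent dependency a purely local model misses.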