- Multimodal Machine Learning Applications
- Advanced Image and Video Retrieval Techniques
- Human Pose and Action Recognition
- Advanced Neural Network Applications
- Synthesis and Biological Activity
- Domain Adaptation and Few-Shot Learning
- Synthesis and Reactions of Organic Compounds
- Synthesis and Biological Evaluation
Beijing University of Technology
2023-2025
Centro Universitário FEI
2023
Anhui University of Technology
2020
Multi-head attention (MA), which allows the model to jointly attend to crucial information from diverse representation subspaces through its heads, has yielded remarkable achievements in image captioning. However, there is no explicit mechanism to ensure that MA attends to appropriate positions in its subspaces, which results in over-focusing within each head and redundancy between heads. In this paper, we propose a novel Intra- and Inter-Head Orthogonal Attention (I2OA) to efficiently improve captioning by introducing concise...
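The abstract is truncated before I2OA itself is described, so the following sketch does not reproduce it. As a hedged illustration of the two problems it names, this standard-library Python computes (a) the entropy of one head's attention weights, where a value near zero signals a head that over-focuses on very few positions, and (b) a generic inter-head redundancy penalty, the sum of squared cosine similarities between head outputs, which is zero only when all heads are mutually orthogonal. The function names `attention_entropy` and `inter_head_penalty` are hypothetical, and the penalty is a common orthogonality regularizer, not the paper's formulation.

```python
import math
from typing import List

Vector = List[float]


def attention_entropy(weights: Vector) -> float:
    """Shannon entropy of one head's attention distribution (weights sum to 1).
    Low entropy = attention concentrated on few positions (over-focusing).
    Hypothetical diagnostic, not the I2OA method."""
    return -sum(p * math.log(p) for p in weights if p > 0.0)


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def inter_head_penalty(heads: List[Vector]) -> float:
    """Hypothetical redundancy penalty: sum of squared cosine similarities over
    all distinct pairs of head outputs. 0.0 when every pair is orthogonal;
    grows as heads encode the same direction."""
    total = 0.0
    for i in range(len(heads)):
        for j in range(i + 1, len(heads)):
            total += cosine(heads[i], heads[j]) ** 2
    return total


if __name__ == "__main__":
    print(attention_entropy([0.25, 0.25, 0.25, 0.25]))  # spread-out head
    print(attention_entropy([1.0, 0.0, 0.0, 0.0]))      # over-focused head
    print(inter_head_penalty([[1.0, 0.0], [0.0, 1.0]]))  # orthogonal heads
    print(inter_head_penalty([[1.0, 0.0], [2.0, 0.0]]))  # redundant heads
```

In a training setup, a term like `inter_head_penalty` would typically be added to the captioning loss with a small coefficient, so that heads are pushed apart without overriding the main objective.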
Attention-based Transformer models have achieved remarkable progress in multi-modal tasks such as visual question answering. The explainability of attention-based methods has recently attracted wide interest because it can explain the inner changes of attention tokens by accumulating relevancy across layers. Current methods simply update the relevancy of every token equally before and after the attention process. However, the importance values of tokens usually differ during relevance accumulation. In this paper, we propose a weighted strategy, which...
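To make "accumulating relevancy across layers" concrete, here is a minimal standard-library Python sketch of the uniform self-attention update used in gradient-weighted relevancy propagation (e.g., Chefer et al., 2021): the relevancy matrix starts as the identity and each layer applies R ← R + Ā·R, where Ā is that layer's head-averaged, non-negative attention map. The optional per-layer `weights` argument is a hypothetical illustration of a non-uniform update (R ← R + w·Ā·R); the abstract is truncated, so this is not the paper's weighted strategy, and the function names are assumptions.

```python
from typing import List, Optional, Sequence

Matrix = List[List[float]]


def identity(n: int) -> Matrix:
    """n x n identity matrix: each token is initially relevant only to itself."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def accumulate_relevance(
    layer_maps: Sequence[Matrix], weights: Optional[Sequence[float]] = None
) -> Matrix:
    """Accumulate token relevancy over layers.

    layer_maps[l] is an n x n head-averaged, non-negative attention map for layer l.
    weights=None  -> uniform rule R <- R + A_l R (every layer counts equally).
    weights given -> hypothetical weighted rule R <- R + w_l * A_l R.
    """
    n = len(layer_maps[0])
    if weights is None:
        weights = [1.0] * len(layer_maps)
    if len(weights) != len(layer_maps):
        raise ValueError("need exactly one weight per layer")
    r = identity(n)
    for a, w in zip(layer_maps, weights):
        update = matmul(a, r)
        r = [[r[i][j] + w * update[i][j] for j in range(n)] for i in range(n)]
    return r


if __name__ == "__main__":
    uniform_attention = [[0.5, 0.5], [0.5, 0.5]]
    print(accumulate_relevance([uniform_attention]))
    print(accumulate_relevance([identity(2), identity(2)], weights=[1.0, 0.5]))
```

The contrast between the two calls shows the design point the abstract raises: with equal updates every layer's contribution compounds at full strength, whereas per-layer weights let some layers contribute less to the final relevancy map.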