A Fusion Encoder with Multi-Task Guidance for Cross-Modal Text-Image Retrieval in Remote Sensing

DOI: 10.20944/preprints202306.2010.v1
Publication Date: 2023-06-30T05:48:56Z
ABSTRACT
In recent years, there has been growing interest in remote sensing image-text cross-modal retrieval, driven by the rapid development of space information technology and the significant increase in image data volume. One approach that has shown promising results on natural images is the multimodal fusion encoding method. However, remote sensing images have unique characteristics that make this task challenging. Firstly, their semantic features are fine-grained, meaning they can be divided into multiple basic units of semantic expression, and different combinations of these units generate diverse text descriptions. Additionally, these images exhibit variations in resolution, color, and perspective. These characteristics pose considerable challenges for cross-modal retrieval. To address them, this paper proposes a multi-task guided fusion encoder (MTGFE) based on the multimodal fusion encoding approach. The model incorporates three training tasks: image-text matching (ITM), masked language modeling (MLM), and a newly introduced multi-view joint representations contrast (MVJRC) task. By jointly training the model with these tasks, we aim to enhance its capability to capture fine-grained correlations between remote sensing images and texts. Specifically, the MVJRC task is designed to improve the consistency of the model's feature expression and fine-grained correlation, particularly for images with large differences in resolution, color, and viewing angle. Furthermore, to reduce the computational complexity associated with large-scale fusion models and to improve retrieval efficiency, a retrieval filtering method is proposed, which achieves higher retrieval efficiency while minimizing accuracy loss. Extensive experiments were conducted on four public datasets to evaluate the proposed method and validate its effectiveness. Overall, this study introduces the MTGFE model, which combines multi-task guidance with retrieval filtering to improve both fine-grained correlation capture and retrieval efficiency, and the experimental results demonstrate its effectiveness.
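The abstract only names the three training objectives, so the following minimal PyTorch sketch illustrates, in broad strokes, how a cross-modal fusion encoder could be jointly supervised by an ITM head, an MLM head, and an MVJRC-style multi-view contrast term. Everything concrete here is an assumption for illustration, not the authors' implementation: the ToyFusionEncoder architecture, feature dimensions, equal loss weighting, and the use of a symmetric InfoNCE loss for the contrast term are all hypothetical.

# Minimal sketch of joint multi-task training for a fusion encoder.
# All architectural and loss details are illustrative assumptions, not MTGFE itself.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyFusionEncoder(nn.Module):
    """Stand-in cross-modal fusion encoder (hypothetical, for illustration only)."""
    def __init__(self, dim=256, vocab_size=1000):
        super().__init__()
        self.text_embed = nn.Embedding(vocab_size, dim)
        self.img_proj = nn.Linear(2048, dim)            # assumes 2048-d region/image features
        layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.fusion = nn.TransformerEncoder(layer, num_layers=2)
        self.itm_head = nn.Linear(dim, 2)               # matched / mismatched pair
        self.mlm_head = nn.Linear(dim, vocab_size)      # token reconstruction
        self.proj = nn.Linear(dim, 128)                 # projection for multi-view contrast

    def forward(self, img_feats, text_ids):
        # Concatenate projected image tokens and text embeddings, then fuse jointly.
        tokens = torch.cat([self.img_proj(img_feats), self.text_embed(text_ids)], dim=1)
        fused = self.fusion(tokens)
        n_img = img_feats.size(1)
        return fused[:, :n_img], fused[:, n_img:]       # image part, text part

def info_nce(z1, z2, temperature=0.07):
    """Symmetric InfoNCE between global representations of two views."""
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / temperature
    targets = torch.arange(z1.size(0))
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

model = ToyFusionEncoder()
img_v1 = torch.randn(8, 4, 2048)         # view 1 of each image (e.g., original)
img_v2 = torch.randn(8, 4, 2048)         # view 2 (e.g., resolution/color-augmented)
text_ids = torch.randint(0, 1000, (8, 12))
mlm_labels = text_ids.clone()            # real MLM masks a subset of tokens; all predicted here for brevity

img_tok1, txt_tok1 = model(img_v1, text_ids)
img_tok2, _ = model(img_v2, text_ids)

# ITM: in practice trained with matched and hard-negative pairs; all pairs treated as matched here.
itm_logits = model.itm_head(txt_tok1[:, 0])
loss_itm = F.cross_entropy(itm_logits, torch.ones(8, dtype=torch.long))
# MLM: predict token ids from fused text representations.
loss_mlm = F.cross_entropy(model.mlm_head(txt_tok1).flatten(0, 1), mlm_labels.flatten())
# MVJRC-style contrast: pull together representations of two views of the same image.
loss_mvjrc = info_nce(model.proj(img_tok1.mean(1)), model.proj(img_tok2.mean(1)))

loss = loss_itm + loss_mlm + loss_mvjrc  # joint multi-task objective (equal weights assumed)
loss.backward()

In this sketch the multi-view contrast term is what encourages consistent feature expression across resolution, color, and angle variations of the same scene, which is the role the abstract attributes to MVJRC; the actual loss formulation and weighting in the paper may differ.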
SUPPLEMENTAL MATERIAL
Coming soon ....