Cross-modal retrieval based on multi-dimensional feature fusion hashing
DOI: 10.3389/fphy.2024.1379873
Publication Date: 2024-06-19
ABSTRACT
Along with the continuous breakthrough and popularization of information network technology, multi-modal data, including texts, images, videos, and audio, is growing rapidly. We often need to retrieve data across these modalities, so cross-modal retrieval has important theoretical significance and application value. In addition, because different modalities can be mutually retrieved by mapping them into a unified Hamming space, hash codes have been extensively used in this field. However, existing hashing models generate hash codes based on single-dimension features, ignoring the semantic correlation between features of different dimensions. Therefore, an innovative method, Multi-Dimensional Feature Fusion Hashing (MDFFH), is proposed. To better capture an image's multi-dimensional features, a convolutional neural network and a Vision Transformer are combined to construct an image multi-dimensional fusion module. Similarly, we apply a text multi-dimensional fusion module to the text modality to obtain the text's multi-dimensional features. These two modules can effectively integrate features of different dimensions through feature fusion, making the generated hash codes more representative and semantically meaningful. Extensive experiments and corresponding analyses on benchmark datasets indicate that MDFFH outperforms other baseline models.
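The fusion idea in the abstract can be illustrated with a short sketch. The following is a minimal, hypothetical PyTorch example and not the authors' code: the backbone choices (ResNet-18, ViT-B/16), the class name ImageFusionHash, the layer sizes, and the code_len parameter are all assumptions, used only to show how CNN and Transformer features might be concatenated and projected into Hamming-space hash codes.

```python
# Minimal sketch (NOT the authors' implementation) of fusing CNN and
# Vision Transformer features into a single hash code, assuming PyTorch
# and torchvision are available. All names and sizes are hypothetical.
import torch
import torch.nn as nn
import torchvision.models as models


class ImageFusionHash(nn.Module):
    """Fuse CNN and Vision Transformer features, then hash to code_len bits."""

    def __init__(self, code_len: int = 64):
        super().__init__()
        # Convolutional (local) features from a CNN backbone.
        cnn = models.resnet18(weights=None)
        self.cnn = nn.Sequential(*list(cnn.children())[:-1])  # -> (B, 512, 1, 1)
        # Transformer (global) features from a ViT backbone.
        self.vit = models.vit_b_16(weights=None)
        self.vit.heads = nn.Identity()                        # -> (B, 768)
        # Fusion: concatenate both feature dimensions and project to code_len.
        self.fusion = nn.Sequential(
            nn.Linear(512 + 768, 1024), nn.ReLU(),
            nn.Linear(1024, code_len),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        f_cnn = self.cnn(x).flatten(1)           # (B, 512) convolutional features
        f_vit = self.vit(x)                      # (B, 768) transformer features
        fused = torch.cat([f_cnn, f_vit], dim=1)
        # tanh keeps outputs in (-1, 1) during training; sign() binarizes
        # them into Hamming-space codes at retrieval time.
        return torch.tanh(self.fusion(fused))


if __name__ == "__main__":
    model = ImageFusionHash(code_len=64)
    images = torch.randn(2, 3, 224, 224)
    codes = torch.sign(model(images))            # binary codes in {-1, +1}
    # Hamming distance between two codes, recovered from their dot product:
    # dot = k - 2 * hamming  =>  hamming = (k - dot) / 2.
    ham = (codes.size(1) - codes[0] @ codes[1]) / 2
    print(codes.shape, ham.item())
```

A text branch would follow the same pattern, concatenating features from two text encoders of different granularity before the shared projection, so that both modalities land in the same Hamming space.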