Understanding Multimodal Contrastive Learning Through Pointwise Mutual Information

Pointwise mutual information
DOI: 10.48550/arXiv.2404.19228 | Publication Date: 2024-04-29
ABSTRACT
Multimodal representation learning, which integrates different modalities such as text, vision, and audio, is important for real-world applications. The symmetric InfoNCE loss proposed in CLIP is a key concept in multimodal learning. In this work, we provide a theoretical understanding of the symmetric InfoNCE loss through the lens of pointwise mutual information, and we show that, under mild assumptions, encoders achieving the optimal similarity in pretraining perform well on downstream classification tasks. Based on our theoretical results, we also propose a new similarity metric for multimodal contrastive learning that utilizes a nonlinear kernel to enrich its capability. To verify the effectiveness of the proposed method, we pretrain models on the Conceptual Captions datasets and evaluate zero-shot classification and linear probing on common benchmark datasets.
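For reference, the sketch below shows the symmetric InfoNCE loss from CLIP that the abstract builds on, which the paper analyzes through pointwise mutual information, pmi(x, y) = log p(x, y) / (p(x) p(y)). This is a minimal NumPy illustration, not the paper's code: the function name symmetric_info_nce and the temperature default of 0.07 are illustrative assumptions.

```python
import numpy as np

def symmetric_info_nce(image_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss as used in CLIP.

    image_emb, text_emb: (n, d) arrays of paired embeddings,
    where row i of each array comes from the same image-text pair.
    """
    # L2-normalize so the dot product is cosine similarity.
    img = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    txt = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)

    logits = img @ txt.T / temperature  # (n, n) similarity matrix
    n = logits.shape[0]

    def cross_entropy(l):
        # Row-wise log-softmax; the matched pairs sit on the diagonal.
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(n), np.arange(n)].mean()

    # Average the image-to-text and text-to-image directions.
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))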
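```

Minimizing this loss over batches of paired embeddings pushes matched (diagonal) pairs to score higher than all mismatched pairs in both retrieval directions, which is the pretraining objective whose optimal similarity the paper characterizes.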