Attention-based multimodal fusion with contrast for robust clinical prediction in the face of missing modalities

Keywords: multimodal learning, modalities, feature learning, benchmark
DOI: 10.1016/j.jbi.2023.104466
Publication date: 2023-08-05
ABSTRACT
With the increasing amount and growing variety of healthcare data, multimodal machine learning supporting integrated modeling of structured and unstructured data is an increasingly important tool for clinical tasks. However, it is non-trivial to manage the differences in dimensionality, volume, and temporal characteristics of modalities in the context of a shared target task. Furthermore, patients can have substantial variations in modality availability, while existing methods typically assume data completeness and lack a mechanism to handle missing modalities. We propose a Transformer-based fusion model with modality-specific tokens that summarize the corresponding modalities to achieve effective cross-modal interaction while accommodating missing modalities in the clinical context. The model is further refined by inter-modal, inter-sample contrastive learning to improve the representations for better predictive performance. We denote the model as Attention-based cRoss-MOdal fUsion with contRast (ARMOUR). We evaluate ARMOUR using two input modalities (structured measurements and text), six prediction tasks, and two evaluation regimes, either including or excluding samples with missing modalities. Our model shows improved performance over unimodal baselines in both evaluation regimes; the contrastive learning improves representation power and is shown to be essential for the best results. The simple setup also allows direct comparison with benchmark results. We demonstrate robust performance in the face of missing modalities. This work could inspire future research to study the incorporation of multiple, more complex modalities into a single model.
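
To make the abstract's mechanism concrete, below is a minimal PyTorch-style sketch of (a) a Transformer fusion module with one learnable summary token per modality, where a missing modality is handled by masking all of its tokens out of attention rather than imputing values, and (b) a symmetric InfoNCE-style inter-modal contrastive loss over samples where both modalities are observed. All names (ModalityTokenFusion, inter_modal_contrastive_loss), dimensions, and the two-modality setup are illustrative assumptions, not the authors' released implementation; the paper's exact architecture and loss may differ.

import torch
import torch.nn as nn
import torch.nn.functional as F


class ModalityTokenFusion(nn.Module):
    """Transformer encoder over concatenated modality sequences, with one
    learnable summary token per modality (analogous to [CLS]). Missing
    modalities are masked out of attention rather than imputed."""

    def __init__(self, d_model=128, n_heads=4, n_layers=2, n_modalities=2):
        super().__init__()
        # One learnable summary token per modality.
        self.summary_tokens = nn.Parameter(torch.randn(n_modalities, d_model))
        layer = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, n_layers)

    def forward(self, seqs, present):
        """seqs: list of (B, L_m, d_model) tensors, one per modality.
        present: (B, n_modalities) bool, True where the modality is observed."""
        B = seqs[0].size(0)
        tokens, pad_mask = [], []
        for m, x in enumerate(seqs):
            tok = self.summary_tokens[m].expand(B, 1, -1)
            tokens.append(torch.cat([tok, x], dim=1))
            # Mask the whole modality (summary token included) when missing,
            # so no other token can attend to it.
            miss = ~present[:, m : m + 1]                      # (B, 1)
            pad_mask.append(miss.expand(-1, 1 + x.size(1)))
        h = self.encoder(
            torch.cat(tokens, dim=1),
            src_key_padding_mask=torch.cat(pad_mask, dim=1),
        )
        # Return the refined summary token of each modality; downstream code
        # should ignore summaries of modalities flagged absent in `present`.
        offsets = torch.tensor([0] + [1 + s.size(1) for s in seqs[:-1]]).cumsum(0)
        return torch.stack([h[:, o] for o in offsets], dim=1)  # (B, M, d_model)


def inter_modal_contrastive_loss(z, present, temperature=0.1):
    """Symmetric InfoNCE between the two modality summaries of each sample,
    computed only over samples where both modalities are observed. This is a
    common formulation assumed here for illustration."""
    both = present.all(dim=1)
    a = F.normalize(z[both, 0], dim=-1)
    b = F.normalize(z[both, 1], dim=-1)
    logits = a @ b.t() / temperature
    targets = torch.arange(a.size(0), device=a.device)
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))


# Toy usage: a batch of 4 patients, the second missing the text modality.
fusion = ModalityTokenFusion()
struct = torch.randn(4, 10, 128)   # structured measurement sequence embeddings
text = torch.randn(4, 32, 128)     # text token embeddings
present = torch.tensor([[1, 1], [1, 0], [1, 1], [1, 1]], dtype=torch.bool)
summaries = fusion([struct, text], present)           # (4, 2, 128)
loss = inter_modal_contrastive_loss(summaries, present)

Masking keys instead of imputing lets each remaining modality's summary token be computed from whatever data the patient actually has, which is the property the abstract describes as accommodating missing modalities within a single model.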