
Learning Multimodal Data Augmentation in Feature Space

FOS: Computer and information sciences; Computer Science - Machine Learning (cs.LG); Computer Science - Computation and Language (cs.CL); Computer Science - Computer Vision and Pattern Recognition (cs.CV); 0202 electrical engineering, electronic engineering, information engineering; 02 engineering and technology

DOI: 10.48550/arxiv.2212.14453
Publication Date: 2022-01-01


AUTHORS (7)

Liu, Zichang

Tang, Zhiqiang

Shi, Xingjian

Zhang, Aston

Li, Mu

Shrivastava, Ansh...

Wilson, Andrew Go...

ABSTRACT

ICLR 2023. Code available at https://github.com/lzcemma/LeMDA/

The ability to jointly learn from multiple modalities, such as text, audio, and visual data, is a defining feature of intelligent systems. While there have been promising advances in designing neural networks to harness multimodal data, the enormous success of data augmentation currently remains limited to single-modality tasks like image classification. Indeed, it is particularly difficult to augment each modality while preserving the overall semantic structure of the data; for example, a caption may no longer be a good description of an image after standard augmentations have been applied, such as translation. Moreover, it is challenging to specify reasonable transformations that are not tailored to a particular modality. In this paper, we introduce LeMDA, Learning Multimodal Data Augmentation, an easy-to-use method that automatically learns to jointly augment multimodal data in feature space, with no constraints on the identities of the modalities or the relationship between modalities. We show that LeMDA can (1) profoundly improve the performance of multimodal deep learning architectures, (2) apply to combinations of modalities that have not been previously considered, and (3) achieve state-of-the-art results on a wide range of applications comprised of image, text, and tabular data.


