MolLM: a unified language model for integrating biomedical text with 2D and 3D molecular representations

Keywords: Representation, Feature Learning, Modalities, ENCODE
DOI: 10.1093/bioinformatics/btae260 Publication Date: 2024-06-28T09:30:32Z
ABSTRACT
Motivation: The current paradigm of deep learning models for the joint representation of molecules and text primarily relies on 1D or 2D molecular formats, neglecting significant 3D structural information that offers valuable physical insight. This narrow focus inhibits the models' versatility and adaptability across a wide range of modalities. Conversely, the limited research focusing on explicit 3D molecular representation tends to overlook textual data within the biomedical domain.

Results: We present a unified pre-trained language model, MolLM, that concurrently captures 2D and 3D molecular information alongside biomedical text. MolLM consists of a text Transformer encoder and a molecular Transformer encoder, designed to encode both 2D and 3D molecular structures. To support MolLM's self-supervised pre-training, we constructed 160K molecule-text pairings. Employing contrastive learning as the supervisory signal for cross-modal learning, MolLM demonstrates robust molecular representation capabilities across four downstream tasks, including cross-modal molecule and text matching, property prediction, molecule captioning, and text-prompted molecular editing. Through ablation, we demonstrate that the inclusion of 3D representations improves performance in these tasks.

Availability and implementation: Our code, data, model weights, and examples of using our model are all available at https://github.com/gersteinlab/MolLM. In particular, we provide Jupyter Notebooks offering step-by-step guidance on how to use MolLM to extract embeddings.
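The abstract describes contrastive learning as the supervisory signal aligning text and molecule embeddings across modalities. As a rough illustration of that idea, the following is a minimal sketch of a symmetric InfoNCE-style objective over a batch of paired embeddings; the function name, shapes, and temperature value are illustrative assumptions, not taken from the paper or its repository:

```python
import numpy as np

def info_nce(text_emb, mol_emb, temperature=0.1):
    """Symmetric InfoNCE loss for paired text/molecule embeddings.

    text_emb, mol_emb: (batch, dim) arrays; row i of each is a positive pair.
    Names and the temperature value are illustrative, not from MolLM itself.
    """
    # L2-normalize so dot products are cosine similarities
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    m = mol_emb / np.linalg.norm(mol_emb, axis=1, keepdims=True)
    logits = t @ m.T / temperature               # (batch, batch) similarities
    idx = np.arange(len(logits))                 # matching pairs on the diagonal

    def xent(lg):
        lg = lg - lg.max(axis=1, keepdims=True)  # numerical stability
        logp = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -logp[idx, idx].mean()            # cross-entropy on diagonal

    # average both directions: text -> molecule and molecule -> text
    return 0.5 * (xent(logits) + xent(logits.T))

rng = np.random.default_rng(0)
text = rng.normal(size=(8, 32))
mol = text + 0.05 * rng.normal(size=(8, 32))     # nearly aligned pairs
print(info_nce(text, mol))                       # small loss for aligned pairs
```

Minimizing this loss pulls each text embedding toward its paired molecule embedding while pushing it away from the other molecules in the batch, which is the standard mechanism behind cross-modal matching tasks like those evaluated in the paper.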