Causality-driven feature selection and domain adaptation for enhancing chemical foundation models in downstream tasks

Keywords: QM9; foundation models; causality; molecular properties prediction
Subject classification: Computer engineering. Computer hardware (TK7885-7895); Electronic computers. Computer science (QA75.5-76.95)

DOI: 10.1088/2632-2153/adabb1
Publication Date: 2025-01-17T22:55:03Z

AUTHORS (8)

Eduardo Soares

Victor Yukio Shir...

Emilio Vital Brazil

Karen Fiorella Aq...

Renato Cerqueira

Dmitry Zubarev

Kristin Schmidt

Daniel P Sanders

ABSTRACT

Recent advancements in large foundation models have revealed impressive capabilities in mastering complex chemical language representations. These models first undergo task-agnostic pre-training on extensive unlabeled corpora and are then fine-tuned on specific downstream tasks. This methodology reduces reliance on labeled data, facilitating data acquisition and broadening the scope of chemical language representation. However, real-world scenarios often pose challenges due to domain shift, a phenomenon where the data distribution in downstream tasks differs from that of the pre-training phase, potentially degrading model performance. To address this, we present a novel causality-based framework for feature selection and domain adaptation to enhance the performance of chemical foundation models on downstream tasks. Our approach employs a multi-stage feature selection method that identifies physico-chemical features based on their direct causal effect on specific downstream properties. By employing Mordred descriptors and Markov blanket causal graphs, our approach provides insight into the causal relationships between features and target properties for prediction tasks. We evaluate our approach on various foundation model architectures and datasets and observe performance improvements, which indicates that the approach is robust and model-agnostic.
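To make the idea of Markov blanket feature selection concrete, the following is a minimal sketch and not the authors' implementation. It assumes linear-Gaussian data and uses a grow-shrink style search with a fixed partial-correlation threshold in place of a proper conditional independence test. The columns `a`, `b`, `c`, `d` are hypothetical stand-ins for molecular descriptors, `y` is a hypothetical target property, and the function names and the `threshold` value are illustrative choices.

```python
import math
import random


def corr(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    sxx = sum((u - mx) ** 2 for u in x)
    syy = sum((v - my) ** 2 for v in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(data, x, y, given):
    """Partial correlation of columns x and y given the columns in `given`,
    computed with the standard recursive formula."""
    if not given:
        return corr(data[x], data[y])
    z, rest = given[0], given[1:]
    r_xy = partial_corr(data, x, y, rest)
    r_xz = partial_corr(data, x, z, rest)
    r_yz = partial_corr(data, y, z, rest)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def markov_blanket(data, target, threshold=0.1):
    """Grow-shrink style estimate of the Markov blanket of `target`.
    The threshold is an illustrative assumption, not a significance test."""
    features = [c for c in data if c != target]
    blanket = []
    # Grow: repeatedly add the feature most dependent on the target
    # given the features selected so far.
    while True:
        scored = [(abs(partial_corr(data, f, target, blanket)), f)
                  for f in features if f not in blanket]
        if not scored:
            break
        best_score, best = max(scored)
        if best_score < threshold:
            break
        blanket.append(best)
    # Shrink: drop features that become independent of the target
    # once the rest of the blanket is conditioned on.
    for f in list(blanket):
        rest = [g for g in blanket if g != f]
        if abs(partial_corr(data, f, target, rest)) < threshold:
            blanket.remove(f)
    return blanket


# Synthetic data: y depends directly on a and b; c is a noisy copy of a
# (correlated with y, but not a direct cause); d is unrelated noise.
random.seed(0)
n = 2000
a = [random.gauss(0, 1) for _ in range(n)]
b = [random.gauss(0, 1) for _ in range(n)]
c = [ai + random.gauss(0, 0.5) for ai in a]
d = [random.gauss(0, 1) for _ in range(n)]
y = [2 * ai - bi + random.gauss(0, 0.5) for ai, bi in zip(a, b)]
data = {"a": a, "b": b, "c": c, "d": d, "y": y}

selected = markov_blanket(data, "y")
print(sorted(selected))
```

In this toy setup the column `c` is strongly correlated with `y`, yet it carries no information about `y` once `a` is known, so a selection based on conditional dependence should keep `a` and `b` and discard `c` and `d`. A plain correlation filter would typically keep `c` as well, which is the kind of spurious feature that a causal criterion aims to exclude.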
