Multitask methods for predicting molecular properties from heterogeneous data

Subjects: Chemical Physics (physics.chem-ph); Machine Learning (stat.ML); FOS: Physical sciences; FOS: Computer and information sciences
DOI: 10.1063/5.0201681 Publication Date: 2024-07-03T12:16:57Z
ABSTRACT
Data generation remains a bottleneck in training surrogate models to predict molecular properties. We demonstrate that multitask Gaussian process regression overcomes this limitation by leveraging both expensive and cheap data sources. In particular, we consider training sets constructed from coupled-cluster (CC) and density functional theory (DFT) data. We report that multitask surrogates can predict at CC-level accuracy with a reduction in data generation cost of over an order of magnitude. Of note, our approach allows the training set to include DFT data generated by a heterogeneous mix of exchange–correlation functionals without imposing any artificial hierarchy on functional accuracy. More generally, the multitask framework can accommodate a wider range of training set structures (including full disparity between the different levels of fidelity) than existing kernel approaches based on Δ-learning, although we show that the accuracy of the two approaches can be similar. Consequently, multitask regression can be a tool for reducing data generation costs even further by opportunistically exploiting existing data sources.
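
To make the idea of multitask Gaussian process regression concrete, the following is a minimal sketch on synthetic one-dimensional data, not the authors' implementation. It assumes an intrinsic coregionalization model (ICM) kernel, in which a fixed task-covariance matrix B multiplies a squared-exponential kernel; the abstract does not state which multitask kernel is used. The functions `expensive` and `cheap`, the matrix `B`, the length scale, and the noise level are all hypothetical stand-ins chosen for illustration. Task 0 plays the role of the costly (CC-like) target and task 1 the role of a cheap (DFT-like) source, and the two tasks are sampled at different inputs.

```python
import numpy as np

def rbf(xa, xb, length_scale=0.2):
    """Squared-exponential kernel between two 1-D arrays of inputs."""
    d = xa[:, None] - xb[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def icm_kernel(xa, ta, xb, tb, B, length_scale=0.2):
    """ICM covariance (assumed form): B[t, t'] * k(x, x')."""
    return B[np.ix_(ta, tb)] * rbf(xa, xb, length_scale)

def gp_predict(x_train, t_train, y_train, x_test, t_test, B, noise=1e-4):
    """Posterior mean of a zero-mean GP with the ICM kernel."""
    K = icm_kernel(x_train, t_train, x_train, t_train, B)
    K_star = icm_kernel(x_test, t_test, x_train, t_train, B)
    alpha = np.linalg.solve(K + noise * np.eye(len(x_train)), y_train)
    return K_star @ alpha

# Hypothetical stand-ins for an expensive target and a cheap, biased source.
def expensive(x):
    return np.sin(2 * np.pi * x)

def cheap(x):
    return 0.8 * np.sin(2 * np.pi * x) + 0.3 * x

# Few expensive points, many cheap points, sampled at different inputs.
x_hi = np.array([0.1, 0.5, 0.9])
x_lo = np.linspace(0.0, 1.0, 20)
x_train = np.concatenate([x_hi, x_lo])
t_train = np.concatenate([np.zeros(len(x_hi), dtype=int),
                          np.ones(len(x_lo), dtype=int)])
y_train = np.concatenate([expensive(x_hi), cheap(x_lo)])

# Predict the expensive task (task 0) on a test grid.
x_test = np.linspace(0.0, 1.0, 50)
t_test = np.zeros(len(x_test), dtype=int)

# Assumed task covariance: strongly correlated tasks (illustrative values).
B = np.array([[1.0, 0.9],
              [0.9, 1.0]])
mean_multi = gp_predict(x_train, t_train, y_train, x_test, t_test, B)

# With B = identity the tasks decouple, so the task-0 prediction reduces
# to a single-task GP trained on the expensive points only.
mean_single = gp_predict(x_train, t_train, y_train, x_test, t_test, np.eye(2))

def rmse(pred):
    return float(np.sqrt(np.mean((pred - expensive(x_test)) ** 2)))

print("multitask RMSE:  ", rmse(mean_multi))
print("single-task RMSE:", rmse(mean_single))
```

In this sketch no ordering of task accuracy is imposed: additional cheap sources would enter as further rows and columns of `B`. In practice `B` and the kernel hyperparameters would be learned from data rather than fixed by hand as they are here.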