Prediction Errors of Molecular Machine Learning Models Lower than Hybrid DFT Error

Molecular graph HOMO/LUMO
DOI: 10.1021/acs.jctc.7b00577 Publication Date: 2017-09-19T18:07:07Z
ABSTRACT
We investigate the impact of choosing regressors and molecular representations for the construction of fast machine learning (ML) models of thirteen electronic ground-state properties of organic molecules. The performance of each regressor/representation/property combination is assessed using learning curves, which report out-of-sample errors as a function of training set size with up to ~117k distinct molecules. Molecular structures and properties at the hybrid density functional theory (DFT) level of theory used for training and testing come from the QM9 database [Ramakrishnan et al., Scientific Data 1, 140022 (2014)] and include dipole moment, polarizability, HOMO/LUMO energies and gap, electronic spatial extent, zero point vibrational energy, enthalpies and free energies of atomization, heat capacity, and the highest fundamental vibrational frequency. Various representations from the literature have been studied (Coulomb matrix, bag of bonds, BAML, ECFP4, molecular graphs (MG)), as well as newly developed distribution-based variants including histograms of distances (HD), angles (HDA/MARAD), and dihedrals (HDAD). Regressors include linear models (Bayesian ridge regression (BR) and linear regression with elastic net regularization (EN)), random forest (RF), kernel ridge regression (KRR), and two types of neural networks: graph convolutions (GC) and gated graph networks (GG). We present numerical evidence that ML model predictions deviate from DFT less than DFT deviates from experiment for all properties. Furthermore, our out-of-sample prediction errors with respect to the hybrid DFT reference are on par with, or close to, chemical accuracy. Our findings suggest that ML models could be more accurate than hybrid DFT if explicitly electron-correlated quantum (or experimental) data were available.
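To make one regressor/representation combination and the learning-curve protocol concrete, here is a minimal pure-Python sketch. It is not the authors' code. It assumes the commonly used Coulomb matrix definition (diagonal 0.5 Z_i^2.4, off-diagonal Z_i Z_j / |R_i - R_j|), rows sorted by norm for permutation invariance, and a Laplacian kernel for KRR. All function names and the hyperparameters `sigma` and `lam` are hypothetical, and no QM9 data is included.

```python
import math
import random

def coulomb_matrix(charges, coords):
    """Coulomb matrix: M_ii = 0.5 * Z_i**2.4, M_ij = Z_i * Z_j / |R_i - R_j|."""
    n = len(charges)
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                m[i][j] = 0.5 * charges[i] ** 2.4
            else:
                m[i][j] = charges[i] * charges[j] / math.dist(coords[i], coords[j])
    return m

def cm_vector(charges, coords, max_atoms):
    """Row-norm-sorted Coulomb matrix, upper triangle flattened and zero-padded."""
    m = coulomb_matrix(charges, coords)
    n = len(m)
    order = sorted(range(n), key=lambda i: -math.sqrt(sum(v * v for v in m[i])))
    vec = []
    for a in range(max_atoms):
        for b in range(a, max_atoms):
            vec.append(m[order[a]][order[b]] if a < n and b < n else 0.0)
    return vec

def laplacian(x, y, sigma):
    """Laplacian kernel exp(-||x - y||_1 / sigma) (an assumed kernel choice)."""
    return math.exp(-sum(abs(a - b) for a, b in zip(x, y)) / sigma)

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def krr_fit(xs, ys, sigma, lam):
    """Regression coefficients alpha = (K + lam * I)^-1 y."""
    k = [[laplacian(a, b, sigma) + (lam if i == j else 0.0)
          for j, b in enumerate(xs)] for i, a in enumerate(xs)]
    return solve(k, ys)

def krr_predict(xs_train, alpha, x, sigma):
    """Prediction f(x) = sum_i alpha_i * k(x_i, x)."""
    return sum(a * laplacian(xt, x, sigma) for a, xt in zip(alpha, xs_train))

def learning_curve(xs, ys, sizes, n_test, sigma, lam, seed=0):
    """Out-of-sample MAE on a fixed held-out set for each training set size."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    test, pool = idx[:n_test], idx[n_test:]
    maes = []
    for n in sizes:
        train = pool[:n]
        xt = [xs[i] for i in train]
        alpha = krr_fit(xt, [ys[i] for i in train], sigma, lam)
        errs = [abs(krr_predict(xt, alpha, xs[i], sigma) - ys[i]) for i in test]
        maes.append(sum(errs) / len(errs))
    return maes
```

A learning curve plots the returned MAEs against `sizes`, usually on log-log axes. In a real study one would also center the targets and cross-validate `sigma` and `lam` for each training set size. Those steps are omitted here for brevity.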