Probing out-of-distribution generalization in machine learning for materials

Condensed Matter - Materials Science; TA401-492; Materials Science (cond-mat.mtrl-sci); FOS: Physical sciences; Materials of engineering and construction. Mechanics of materials
DOI: 10.48550/arxiv.2406.06489 Publication Date: 2024-06-10
ABSTRACT
Scientific machine learning (ML) endeavors to develop generalizable models with broad applicability. However, the assessment of generalizability is often based on heuristics. Here, we demonstrate in the materials science setting that heuristic evaluations lead to substantially biased conclusions about ML generalizability and the benefits of neural scaling. We evaluate generalization performance in over 700 out-of-distribution tasks that feature new chemistry or structural symmetry not present in the training data. Surprisingly, good performance is found in most tasks and across various ML models, including simple boosted trees. Analysis of the materials representation space reveals that most tasks contain test data that lie in regions well covered by training data, while poorly-performing tasks contain test data mainly outside the training domain. For the latter case, increasing training set size or training time has marginal or even adverse effects on generalization performance, contrary to what the neural scaling paradigm assumes. Our findings show that most heuristically-defined out-of-distribution tests are not genuinely difficult and evaluate only the ability to interpolate. Evaluating on such tasks rather than the truly challenging ones can lead to an overestimation of generalizability and the benefits of scaling.
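As a rough illustration of what a chemistry-based out-of-distribution task of the kind described in the abstract could look like, the sketch below holds out every material that contains a chosen element and trains on the rest. It is a minimal sketch under stated assumptions, not the paper's procedure: the function names (`elements_in`, `leave_one_element_out`), the toy formulas, and the simple regex parser are all hypothetical and chosen only for illustration.

```python
import re


def elements_in(formula: str) -> set[str]:
    """Return the element symbols in a simple formula such as 'Fe2O3'.

    Assumption: formulas use plain element symbols and integer counts,
    with no parentheses, charges, or fractional occupancies.
    """
    return set(re.findall(r"[A-Z][a-z]?", formula))


def leave_one_element_out(
    formulas: list[str], held_out: str
) -> tuple[list[str], list[str]]:
    """Split formulas so the test set holds every material containing
    `held_out`, and the training set holds none of them."""
    train = [f for f in formulas if held_out not in elements_in(f)]
    test = [f for f in formulas if held_out in elements_in(f)]
    return train, test


# Toy data (hypothetical, for illustration only).
formulas = ["Fe2O3", "NaCl", "LiFePO4", "SiO2", "MgO", "LiCl"]
train, test = leave_one_element_out(formulas, "Li")
print("train:", train)
print("test:", test)
```

A split like this guarantees that the held-out element never appears in training, but it says nothing about whether the test materials fall inside or outside the region of representation space covered by the training data, which is the distinction the abstract draws between heuristically defined tasks and genuinely challenging ones.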