A Practical Probabilistic Benchmark for AI Weather Models
DOI:
10.48550/arxiv.2401.15305
Publication Date:
2024-01-27
AUTHORS (9)
ABSTRACT
Since the weather is chaotic, forecasts aim to predict the distribution of future weather states rather than make a single prediction. Recently, multiple data-driven weather models have emerged claiming breakthroughs in skill. However, these have mostly been benchmarked using deterministic skill scores, and little is known about their probabilistic skill. Unfortunately, it is hard to fairly compare AI weather models in a probabilistic sense, since variations in the choice of ensemble initialization, definition of state, and noise injection methodology become confounding. Moreover, even obtaining ensemble forecast baselines is a substantial engineering challenge given the data volumes involved. We sidestep both problems by applying a decades-old idea -- lagged ensembles -- whereby an ensemble can be constructed from a moderately-sized library of deterministic forecasts. This allows the first parameter-free intercomparison of leading AI weather models' probabilistic skill against an operational baseline. The results reveal that two leading models, i.e. GraphCast and Pangu, are tied on the probabilistic CRPS metric, even though the former outperforms the latter in deterministic scoring. We also show how multiple-time-step loss functions, which many data-driven models have employed, are counter-productive: they improve deterministic metrics at the cost of increased dissipation, deteriorating probabilistic skill. This is confirmed through ablations applied to a spherical Fourier Neural Operator (SFNO) approach to forecasting. Separate SFNO ablations show that modulating the effective resolution has a useful effect on dispersion relevant to achieving good calibration. We hope these and forthcoming insights help guide development, and have thus shared the diagnostic code.
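To make the two central ideas of the abstract concrete, here is a minimal sketch of (a) the standard ensemble form of the CRPS, E|X − y| − ½·E|X − X′|, and (b) assembling a lagged ensemble from a library of deterministic forecasts. The data layout (`forecasts[i][l]` = forecast initialized at time `i` verifying at lead `l`) and the function names are illustrative assumptions, not the paper's shared diagnostic code.

```python
import numpy as np

def crps_ensemble(members, obs):
    """CRPS of a scalar observation under an ensemble, via the
    energy form: E|X - obs| - 0.5 * E|X - X'|."""
    members = np.asarray(members, dtype=float)
    spread_to_obs = np.abs(members - obs).mean()
    spread_internal = np.abs(members[:, None] - members[None, :]).mean()
    return spread_to_obs - 0.5 * spread_internal

def lagged_ensemble(forecasts, valid_index, max_lag):
    """Build an ensemble for one valid time from deterministic runs
    initialized at successively earlier times: the run started at
    time (valid_index - lag) verifies at valid_index with lead `lag`."""
    return [forecasts[valid_index - lag][lag] for lag in range(max_lag + 1)]

# Toy library: forecast initialized at time i, lead l, has value 10*i + l.
library = [[10 * i + l for l in range(4)] for i in range(8)]
members = lagged_ensemble(library, valid_index=5, max_lag=3)
print(members)                        # [50, 41, 32, 23]
print(crps_ensemble(members, 40.0))
```

Because the lagged ensemble needs no perturbed initial conditions or noise injection, it applies identically to any deterministic model, which is what makes the intercomparison parameter-free.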