An Ensemble Machine-Learning Model To Predict Historical PM2.5 Concentrations in China from Satellite Data
Hindcast
Gradient boosting
Ensemble forecasting
Ensemble Learning
Data set
DOI:
10.1021/acs.est.8b02917
Publication Date:
2018-10-24T23:04:26Z
AUTHORS (4)
ABSTRACT
The long satellite aerosol data record enables assessments of historical PM2.5 levels in regions where routine PM2.5 monitoring began only recently. However, most previous models reported decreased prediction accuracy when predicting PM2.5 levels outside the model-training period. In this study, we proposed an ensemble machine learning approach that provided reliable PM2.5 hindcast capabilities. Missing satellite data were first filled by multiple imputation. Then the modeling domain, China, was divided into seven regions using a spatial clustering method to control for unobserved spatial heterogeneity. A set of machine learning models, including random forest, generalized additive model, and extreme gradient boosting, was trained in each region separately. Finally, an ensemble model was developed to combine predictions from the different algorithms. The ensemble predictions characterized the spatiotemporal distribution of daily PM2.5 well, with a cross-validation (CV) R2 (RMSE) of 0.79 (21 μg/m3). The cluster-based subregion models outperformed national models and improved the CV R2 by ∼0.05. Compared with previous studies, our model provided more accurate out-of-range predictions at the daily level (R2 = 0.58, RMSE = 29 μg/m3) and the monthly level (R2 = 0.76, RMSE = 16 μg/m3). Our hindcast modeling system allows the construction of unbiased historical PM2.5 levels.
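As a rough illustration of the last step in the abstract (combining predictions from different algorithms, region by region) and of the R2 and RMSE metrics it reports, the following Python sketch may help. Everything in it is an assumption made for illustration: the function names, the region labels, the toy numbers, and especially the two-model linear blend, which is a simplified stand-in for the ensemble model described in the paper and not the authors' method.

```python
import math
from statistics import mean


def rmse(obs, pred):
    """Root-mean-square error between observations and predictions."""
    return math.sqrt(mean((o - p) ** 2 for o, p in zip(obs, pred)))


def r2(obs, pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    obs_mean = mean(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - obs_mean) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def fit_blend_weight(obs, pred_a, pred_b):
    """Least-squares weight w for the blend w*a + (1 - w)*b, clipped to [0, 1].

    Hypothetical simplification: a single linear weight, not the paper's
    ensemble model.
    """
    num = sum((o - b) * (a - b) for o, a, b in zip(obs, pred_a, pred_b))
    den = sum((a - b) ** 2 for a, b in zip(pred_a, pred_b))
    w = num / den if den else 0.5  # identical base predictions: any w works
    return min(1.0, max(0.0, w))


def blend(pred_a, pred_b, w):
    """Combine two sets of base-model predictions with weight w."""
    return [w * a + (1.0 - w) * b for a, b in zip(pred_a, pred_b)]


# Toy, made-up daily PM2.5 values (ug/m3) for two hypothetical regions.
# "model_a" and "model_b" stand for predictions from two base learners.
toy = {
    "region_1": {
        "obs": [35.0, 80.0, 120.0, 60.0],
        "model_a": [40.0, 70.0, 110.0, 65.0],
        "model_b": [30.0, 90.0, 125.0, 50.0],
    },
    "region_2": {
        "obs": [20.0, 45.0, 15.0, 30.0],
        "model_a": [25.0, 40.0, 20.0, 28.0],
        "model_b": [18.0, 50.0, 12.0, 35.0],
    },
}

# Fit one blend weight per region, mirroring the idea of region-specific models.
results = {}
for region, d in toy.items():
    w = fit_blend_weight(d["obs"], d["model_a"], d["model_b"])
    ens = blend(d["model_a"], d["model_b"], w)
    results[region] = (w, rmse(d["obs"], ens), r2(d["obs"], ens))

for region, (w, err, score) in results.items():
    print(f"{region}: weight={w:.2f} RMSE={err:.2f} R2={score:.3f}")
```

For brevity the weight is fitted and scored on the same toy values. A hindcast evaluation of the kind the abstract describes would instead fit on the training period and score on observations from outside it.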
REFERENCES (52)
CITATIONS (256)