Evaluation of different machine learning approaches for predicting high concentration episodes of ground-level ozone: A case study in Catalonia, Spain

Machine Learning; UPC subject areas::Human and sustainable development::Environmental engineering; Ozone; Air pollution; High ozone episodes; Random forest
DOI: 10.1016/j.apr.2023.101999 Publication Date: 2023-11-29T13:46:47Z
ABSTRACT
Ground-level ozone (O3) is a pollutant with a major impact on human health and the environment. Because it is a secondary air contaminant of photochemical origin, areas with greater exposure to solar radiation, such as Spain and other Mediterranean countries, are considerably affected. As O3 pollution worsens, it is important to provide reliable forecasting tools that help stakeholders implement more effective policies to mitigate its negative impact. In this regard, Machine Learning (ML)-based models have emerged in recent years, since they are able to identify complex relationships between ozone levels and relevant variables. However, applying them to capture the most extreme events remains difficult. In this work, different ML approaches for predicting the daily maximum 8-h average ozone concentration (O3,MDA8) were compared, investigating their ability to forecast the highest concentration levels recorded. Two variants of the Random Forest algorithm (regression and classification) were applied to a specific area of Catalonia, Spain, of special interest because of its high number of episodes in which O3 concentration levels were exceeded. The predictive models were built with a 1-day time horizon, using datasets from 2002 to 2020. The input variables were the concentrations of other air pollutants and meteorological processes, both monitored the day before the target day, together with time information. Although the results showed reasonable overall performance, accuracy was low when forecasting the highest O3,MDA8 episodes. To improve the ability of the models to predict high O3,MDA8 concentration levels, a methodology was proposed to fine-tune the original predictions of the ML models according to a classification metric, G-Mean, which allows adjusting the balance between the correct predictions of different classes.
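The G-Mean criterion can be illustrated with a small sketch. G-Mean is the geometric mean of Sensitivity (the fraction of high-O3 days correctly flagged) and Specificity (the fraction of non-episode days correctly left unflagged). The abstract does not spell out the exact fine-tuning procedure, so the code below is only one plausible reading: it assumes that regression forecasts are turned into episode alerts by a decision cut, and that the cut is chosen to maximise G-Mean. The function names, the 120 µg/m³ episode level, the candidate cuts and the toy data are all illustrative assumptions, not values from the study.

```python
import math


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for two lists of booleans."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    sens = tp / (tp + fn) if (tp + fn) else 0.0
    spec = tn / (tn + fp) if (tn + fp) else 0.0
    return sens, spec


def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity and specificity."""
    sens, spec = sensitivity_specificity(y_true, y_pred)
    return math.sqrt(sens * spec)


def tune_cut(observed, predicted, episode_level, candidates):
    """Hypothetical helper: pick the cut on the predictions that maximises G-Mean.

    A day is a true episode when the observed value reaches episode_level;
    it is predicted as an episode when the forecast reaches the candidate cut.
    """
    y_true = [o >= episode_level for o in observed]
    best_cut, best_score = None, -1.0
    for cut in candidates:
        y_pred = [p >= cut for p in predicted]
        score = g_mean(y_true, y_pred)
        if score > best_score:  # ties keep the lowest candidate seen first
            best_cut, best_score = cut, score
    return best_cut, best_score


# Toy data (made up): forecasts underestimate the observed peaks.
observed = [80, 95, 130, 125, 70, 140, 90, 110]
predicted = [85, 90, 112, 118, 75, 121, 95, 105]
episode_level = 120  # assumed episode level, same units as the data

y_true = [o >= episode_level for o in observed]
baseline = g_mean(y_true, [p >= episode_level for p in predicted])
best_cut, best_score = tune_cut(observed, predicted, episode_level, range(100, 125, 5))
print(f"baseline G-Mean: {baseline:.3f}")
print(f"tuned cut: {best_cut}, G-Mean: {best_score:.3f}")
```

In this toy case the forecasts miss two of the three episodes when the episode level itself is used as the cut, and lowering the cut recovers them. On real data a lower cut would normally also produce some False Positives, which is the trade-off G-Mean is meant to balance.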
Using the Sensitivity and Specificity metrics, the classical approaches were compared with the new ones proposed in the present study. For all the cases analysed, the results showed a mean increase in Sensitivity of 0.28, associated with a greater number of True Positives (correct predictions of high-O3 episodes). On the other hand, the average Specificity decreased because of a greater number of False Positives, although this reduction was only 0.05. The proposed criteria showed promising results, better balancing the classification metrics and increasing the ratio of correct predictions in the higher ranges of O3.

This research was developed within the PIKSEL project, "Portal for the integration of knowledge for a sustainable ecosystems and land management", funded by Generalitat de Catalonia through the Department of Territory and Sustainability and the Department of Climate Action. The authors also acknowledge the financial support of the Severo Ochoa Centers of Excellence Program (CEX 2018-000797-S), funded by MCIN/AEI/10.13039/501100011033. Finally, we would also like to thank Xavier Baulies i Bochaca, Eva M. Pérez Gabucio, Cristina Alonso Rodríguez and Miguel García Dalmau from Generalitat de Catalonia for their invaluable help in providing expert knowledge of the study area.