NFDI4DS | UHH-SEMS - Publication Details

Analysis of input set characteristics and variances on k-fold cross validation for a Recurrent Neural Network model on waste disposal rate estimation

Keywords: Overfitting; Cross-validation; Predictive modelling

DOI: 10.1016/j.jenvman.2022.114869
Publication Date: 2022-03-11T11:26:21Z

AUTHORS (4)

Hoang Lan Vu

Kelvin Tsun Wai Ng

Amy Richter

Chunjiang An

ABSTRACT

Machine learning techniques are increasingly popular in waste management studies. Recent literature suggests that k-fold cross validation may reduce the uncertainty introduced by input dataset partitioning and minimize overfitting. The objectives of this study are to quantify the benefits of k-fold cross validation for municipal waste disposal prediction and to identify how testing dataset variance relates to the performance of a predictive neural network model. It is hypothesized that dataset characteristics and variances may dictate whether k-fold cross validation is necessary when constructing neural network waste models. Seven RNN-LSTM predictive models were developed using historical landfill waste records together with climatic and socio-economic data. All trials performed acceptably in the training and validation stages, with MAPE below 10% in every case. In this study, 7-fold cross validation reduced the bias in the selection of testing sets: it lowered MAPE by up to 44.57% and MSE by up to 54.15%, and increased the R value by up to 8.33%. Correlation analysis suggests that fewer outliers and lower variance in the testing dataset correlate well with lower modeling error. The length of the continuous high waste season and the length of the total high waste period do not appear to be important to model performance. The results suggest that k-fold cross validation should be applied to testing datasets with higher variances. MSE is recommended as an evaluation index.
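As a rough illustration of the procedure the abstract describes, the sketch below runs 7-fold cross validation and reports MAPE (mean absolute percentage error), MSE (mean squared error), and an R value for each held-out testing set, alongside that set's variance. It is not the authors' implementation. The data are synthetic, a least-squares linear model stands in for the RNN-LSTM, the folds are assumed to be contiguous blocks, R is assumed to be the Pearson correlation coefficient, and all function names are hypothetical.

```python
import numpy as np


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (y_true must be non-zero)."""
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0)


def mse(y_true, y_pred):
    """Mean squared error."""
    return float(np.mean((y_true - y_pred) ** 2))


def r_value(y_true, y_pred):
    """Pearson correlation between observations and predictions (assumed meaning of R)."""
    return float(np.corrcoef(y_true, y_pred)[0, 1])


def k_fold_scores(X, y, k=7):
    """Hold out each of k contiguous blocks once as the testing set.

    A linear least-squares model is fitted on the remaining blocks.
    This is a stand-in model, not the RNN-LSTM used in the study.
    """
    folds = np.array_split(np.arange(len(y)), k)
    scores = []
    for test_idx in folds:
        train_mask = np.ones(len(y), dtype=bool)
        train_mask[test_idx] = False

        # Append a column of ones so the fit includes an intercept.
        A_train = np.column_stack([X[train_mask], np.ones(train_mask.sum())])
        coef, *_ = np.linalg.lstsq(A_train, y[train_mask], rcond=None)

        A_test = np.column_stack([X[test_idx], np.ones(len(test_idx))])
        pred = A_test @ coef

        scores.append({
            "test_variance": float(np.var(y[test_idx])),
            "mape": mape(y[test_idx], pred),
            "mse": mse(y[test_idx], pred),
            "r": r_value(y[test_idx], pred),
        })
    return scores


# Synthetic data (illustrative only): three predictors and a noisy linear target.
rng = np.random.default_rng(0)
n = 210
X = rng.normal(size=(n, 3))
y = 100.0 + X @ np.array([5.0, -3.0, 2.0]) + rng.normal(scale=1.0, size=n)

scores = k_fold_scores(X, y, k=7)
for i, s in enumerate(scores, start=1):
    print(f"fold {i}: var={s['test_variance']:.2f} "
          f"MAPE={s['mape']:.2f}% MSE={s['mse']:.3f} R={s['r']:.3f}")
```

Comparing the per-fold `test_variance` with the per-fold errors is one simple way to examine the kind of relationship between testing set variance and modeling error that the abstract reports.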

REFERENCES (59)

CITATIONS (118)
