Interpretable machine learning methods to explain on-farm yield variability of high productivity wheat in Northwest India

Keywords: forests; crop residues; machine learning; Data & Analytics; wheat; wheat yields; India; 0401 agriculture, forestry, and fisheries; 04 agricultural and veterinary sciences; 630

DOI: 10.1016/j.fcr.2022.108640
Publication Date: 2022-08-20T08:58:28Z

AUTHORS (11)

Hari Sankar Nayak

João Vasco Silva

Chiter Mal Parihar

Timothy J. Krupnik

Dipaka Ranjan Sena

Suresh K. Kakraliya

Hanuman Sahay Jat

Harminder Singh S...

Parbodh C. Sharma

Mangi Lal Jat

Tek B. Sapkota

ABSTRACT

The increasing availability of complex, geo-referenced on-farm data demands analytical frameworks that can guide crop management recommendations. Recent developments in interpretable machine learning techniques offer opportunities to use these methods in agronomic studies. Our objectives were two-fold: (1) to assess the performance of different machine learning methods to explain on-farm wheat yield variability in the Northwestern Indo-Gangetic Plains of India, and (2) to identify the most important drivers and interactions explaining wheat yield variability. A suite of fine-tuned machine learning models (ridge and lasso regression, classification and regression trees, k-nearest neighbor, support vector machines, gradient boosting, extreme gradient boosting, and random forest) was statistically compared using the coefficient of determination (R²), root mean square error (RMSE), and mean absolute error (MAE). The best performing model was again fine-tuned using a grid search approach for the bias-variance trade-off. Three post-hoc model-agnostic techniques were used to interpret the best performing model: variable importance (a variable was considered “important” if shuffling its values increased or decreased the model error considerably), interaction strength (based on Friedman’s H-statistic), and two-way interaction (i.e., how much of the total variability in wheat yield was explained by a particular two-way interaction). Model outputs were compared against empirical data to contextualize results and provide a blueprint for future analysis in other production systems. Tree-based and decision boundary-based methods outperformed regression-based methods in explaining wheat yield variability. Random forest was the best performing method in terms of goodness-of-fit and model precision and accuracy, with RMSE, MAE, and R² ranging between 367–470 kg ha⁻¹, 276–345 kg ha⁻¹, and 0.44–0.63, respectively. Random forest was then used for selection of important variables and interactions.
The most important management variables explaining wheat yield ...
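
The abstract describes variable importance as the change in model error after shuffling a variable's values, with models compared on RMSE, MAE, and R². The sketch below is one minimal way to illustrate that idea in plain Python; it is not the authors' code. The data are synthetic, the three feature names (nitrogen rate, number of irrigations, an unrelated column) are hypothetical, and a fixed linear function `toy_model` stands in for the fitted random forest purely to keep the example self-contained.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def permutation_importance(predict, X, y, n_repeats=20, seed=0):
    """Mean change in RMSE after shuffling each column of X in turn.

    `predict` is any function mapping one row to a prediction, so the
    procedure is model-agnostic.
    """
    rng = random.Random(seed)
    baseline = rmse(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        changes = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between column j and y
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            changes.append(rmse(y, [predict(r) for r in X_perm]) - baseline)
        importances.append(sum(changes) / n_repeats)
    return baseline, importances


# Synthetic data (hypothetical, not from the study): yield in kg/ha driven by
# nitrogen rate and irrigation count; the third column is unrelated to yield.
data_rng = random.Random(42)
X = [[data_rng.uniform(0, 200), float(data_rng.randint(1, 5)), data_rng.uniform(0, 1)]
     for _ in range(200)]
y = [3000 + 10 * n + 200 * irr + data_rng.gauss(0, 100) for n, irr, _ in X]


def toy_model(row):
    """Stand-in for a fitted model; it ignores the third feature by design."""
    return 3000 + 10 * row[0] + 200 * row[1]


predictions = [toy_model(row) for row in X]
baseline, importances = permutation_importance(toy_model, X, y)
print("RMSE:", round(rmse(y, predictions), 1))
print("MAE:", round(mae(y, predictions), 1))
print("R2:", round(r2(y, predictions), 3))
for name, value in zip(["nitrogen", "irrigations", "unrelated"], importances):
    print(name, round(value, 1))
```

Because `toy_model` never reads the third column, shuffling it leaves the error unchanged, while shuffling either of the other two columns raises it. In practice the same function could be pointed at the prediction method of any fitted model, ideally on held-out data.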

REFERENCES (77)

CITATIONS (34)
