Is poverty predictable with machine learning? A study of DHS data from Kyrgyzstan

Variables
DOI: 10.1016/j.seps.2021.101195 Publication Date: 2021-11-14T04:51:13Z
ABSTRACT
A prerequisite for eliminating poverty is accurately identifying and targeting the households in poverty. While some factors, such as asset holdings, are well recognized as relevant for assessing and predicting poverty, a priori selected indicators are not sufficient conditions for poverty, and the key factors may vary from one case to another. Researchers have begun to apply machine learning algorithms to predict which households are poor. This paper uses prediction accuracy as the standard for studying the application of machine learning algorithms. Using DHS data on 8040 households in Kyrgyzstan, we apply a state-of-the-art algorithm (XGBoost) to explore the full dataset, profiting from the algorithm's ability to handle many variables, and compare the results with those obtained from the a priori selected variables. We also compare XGBoost with a generalized linear model (GLM), the latter being viewed as an approach in between traditional models and modern machine learning algorithms. The results imply that including more variables is not necessarily preferable for prediction; a few important variables selected by the algorithms may also perform well. Different algorithms may select different variables as the important ones for prediction. XGBoost performs better than GLM in most cases, and machine learning is useful for variable selection. Additionally, XGBoost is particularly preferable when using a priori variables.
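The comparison between a model fitted on all variables and one fitted on a few algorithm-selected variables can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data are synthetic (not the DHS data), a plain logistic regression fitted by gradient descent stands in for the GLM, XGBoost is not included, and ranking variables by absolute coefficient is a simple stand-in for the paper's variable selection rather than the authors' procedure. The helper names (`make_data`, `fit_logistic`, `accuracy`, `select_columns`) are hypothetical.

```python
import random


def make_data(n, n_vars, n_informative, seed=0):
    """Synthetic 'poor / not poor' labels; only the first n_informative variables matter."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        row = [rng.gauss(0, 1) for _ in range(n_vars)]
        score = sum(2.0 * row[j] for j in range(n_informative)) + rng.gauss(0, 1)
        X.append(row)
        y.append(1 if score > 0 else 0)
    return X, y


def sigmoid(z):
    """Logistic function, capped to avoid overflow in math-free form."""
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + 2.718281828459045 ** (-z))


def fit_logistic(X, y, epochs=200, lr=0.5):
    """Binomial GLM with a logit link, fitted by batch gradient descent."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * p, 0.0
        for row, target in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) - target
            grad_b += err
            for j in range(p):
                grad_w[j] += err * row[j]
        b -= lr * grad_b / n
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
    return w, b


def accuracy(w, b, X, y):
    """Share of households classified correctly at a 0.5 probability threshold."""
    hits = 0
    for row, target in zip(X, y):
        pred = 1 if b + sum(wj * xj for wj, xj in zip(w, row)) >= 0 else 0
        hits += int(pred == target)
    return hits / len(y)


def select_columns(X, cols):
    """Keep only the listed variable indices."""
    return [[row[j] for j in cols] for row in X]


# Hold out the last 200 synthetic households for testing.
X, y = make_data(n=600, n_vars=20, n_informative=3, seed=0)
X_train, y_train, X_test, y_test = X[:400], y[:400], X[400:], y[400:]

# Model 1: all 20 variables.
w_full, b_full = fit_logistic(X_train, y_train)
acc_full = accuracy(w_full, b_full, X_test, y_test)

# Model 2: only the 3 variables with the largest absolute coefficients in model 1.
top_k = sorted(range(len(w_full)), key=lambda j: abs(w_full[j]), reverse=True)[:3]
w_small, b_small = fit_logistic(select_columns(X_train, top_k), y_train)
acc_small = accuracy(w_small, b_small, select_columns(X_test, top_k), y_test)

print("selected variables:", sorted(top_k))
print("accuracy, all variables:     ", round(acc_full, 3))
print("accuracy, selected variables:", round(acc_small, 3))
```

On real survey data the comparison would normally use a proper train/test split or cross-validation and an actual XGBoost model alongside the GLM; the sketch only shows the shape of the experiment, in which test accuracy is the single criterion for comparing variable sets.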