Advancing Semi-Continuous Treatment Effect Estimation: Machine Learning and Parametric Approaches in Propensity Score Analysis

DOI: 10.31234/osf.io/q4cgz_v1
Publication Date: 2025-03-07T21:50:37Z
ABSTRACT
Propensity score analysis (PSA) is a widely used method for addressing selection bias in observational studies, but its application to semi-continuous treatments remains limited. This study examines two definitions of the generalized propensity score (GPS) and compares parametric models with Gradient Boosting Machines (GBM) for estimating the average treatment effect (ATE) of a semi-continuous treatment. The simulation findings show that the Zero-Inflated Negative Binomial (ZINB) model paired with the conditional mean GPS achieves the best covariate balance and yields reliable ATE estimates. GBM performed well under specific conditions involving the Hurdle model, but overall it was less effective than the ZINB model. The Hurdle Negative Binomial (HNB) model consistently failed to yield unbiased ATE estimates. A practical example using Math Nation data, in which the treatment is the number of recommended videos watched and the outcome is quiz scores, demonstrates how PSA is applied.

TRANSLATIONAL ABSTRACT
Propensity score analysis (PSA) is a powerful tool for reducing selection bias in observational studies, but its application to semi-continuous treatments remains underexplored. This challenge is particularly relevant in online educational settings, where semi-continuous data, such as student engagement metrics, are increasingly common. This study addresses the gap by evaluating methods for estimating treatment effects with semi-continuous treatments, comparing statistical models with machine learning approaches. Through comprehensive simulations, we found that the Zero-Inflated model combined with a conditional mean definition of the propensity score provides the best covariate balance and reliable treatment effect estimates. These findings offer practical guidance for researchers and practitioners working with semi-continuous data in education and beyond, helping to improve the accuracy of causal inferences in complex observational studies. A practical example using Math Nation data further illustrates the application of these methods.
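To make the contrast between the ZINB and hurdle treatment models and the two GPS definitions more concrete, the Python sketch below assumes the two definitions are (a) the conditional probability of the observed treatment level, P(T = t | X), and (b) the conditional mean E[T | X]. The abstract does not spell out the definitions, so this pairing is an assumption, not necessarily the authors' formulation. The sketch uses only the standard library and the textbook NB2 parameterization (mean mu, size theta). The parameter values and the observed count are hypothetical stand-ins for fitted regression output, not estimates from the study.

```python
import math


def nb_pmf(t: int, mu: float, theta: float) -> float:
    """NB2 probability mass at count t, with mean mu and size (dispersion) theta."""
    log_p = (
        math.lgamma(t + theta) - math.lgamma(theta) - math.lgamma(t + 1)
        + theta * math.log(theta / (theta + mu))
        + t * math.log(mu / (theta + mu))
    )
    return math.exp(log_p)


def zinb_pmf(t: int, pi: float, mu: float, theta: float) -> float:
    """ZINB: structural zero with probability pi, otherwise an NB draw (which may also be 0)."""
    count_part = (1.0 - pi) * nb_pmf(t, mu, theta)
    return pi + count_part if t == 0 else count_part


def hurdle_nb_pmf(t: int, p0: float, mu: float, theta: float) -> float:
    """Hurdle NB: every zero comes from the binary part; positives from a zero-truncated NB."""
    if t == 0:
        return p0
    return (1.0 - p0) * nb_pmf(t, mu, theta) / (1.0 - nb_pmf(0, mu, theta))


def zinb_mean(pi: float, mu: float) -> float:
    """Conditional mean E[T | X] under ZINB."""
    return (1.0 - pi) * mu


def hurdle_nb_mean(p0: float, mu: float, theta: float) -> float:
    """Conditional mean E[T | X] under hurdle NB: P(T > 0) times the zero-truncated NB mean."""
    return (1.0 - p0) * mu / (1.0 - nb_pmf(0, mu, theta))


if __name__ == "__main__":
    # Hypothetical fitted values for one student. In practice the zero probability would
    # come from a logistic model and mu from a log-link count model, both regressed on X.
    p_zero, mu, theta = 0.3, 2.0, 1.5
    t_obs = 3  # hypothetical observed number of recommended videos watched
    print("ZINB density GPS P(T=t|X):", zinb_pmf(t_obs, p_zero, mu, theta))
    print("ZINB mean GPS    E[T|X]:  ", zinb_mean(p_zero, mu))
    print("HNB  density GPS P(T=t|X):", hurdle_nb_pmf(t_obs, p_zero, mu, theta))
    print("HNB  mean GPS    E[T|X]:  ", hurdle_nb_mean(p_zero, mu, theta))
```

The two models differ in where zeros come from. ZINB mixes structural zeros with sampling zeros from the count component, while the hurdle model routes every zero through a single binary part. As a result, the same (zero probability, mu, theta) values imply different conditional means and different probabilities for the observed count.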