Thresholding Nonprobability Units in Combined Data for Efficient Domain Estimation

Methodology (stat.ME) FOS: Computer and information sciences Statistics - Methodology
DOI: 10.48550/arxiv.2502.09524 Publication Date: 2025-01-01
ABSTRACT
20 pages, 3 figures<br/>Quasi-randomization approaches estimate latent participation probabilities for units from a nonprobability / convenience sample. Estimation of participation probabilities for convenience units allows their combination with units from the randomized survey sample to form a survey weighted domain estimate. One leverages convenience units for domain estimation under the expectation that estimation precision and bias will improve relative to solely using the survey sample; however, convenience sample units that are very different in their covariate support from the survey sample units may inflate estimation bias or variance. This paper develops a method to threshold or exclude convenience units to minimize the variance of the resulting survey weighted domain estimator. We compare our thresholding method with other thresholding constructions in a simulation study for two classes of datasets based on degree of overlap between survey and convenience samples on covariate support. We reveal that excluding convenience units that each express a low probability of appearing in \emph{both} reference and convenience samples reduces estimation error.<br/>
SUPPLEMENTAL MATERIAL
Coming soon ....
REFERENCES ()
CITATIONS ()
EXTERNAL LINKS
PlumX Metrics
RECOMMENDATIONS
FAIR ASSESSMENT
Coming soon ....
JUPYTER LAB
Coming soon ....