Statistical model building: Background “knowledge” based on inappropriate preselection causes misspecification
Keywords: Regression model; Variable selection; Univariable selection; Backward elimination; Background knowledge; Simulation study; Need for more data sharing; Research; Humans [MeSH]; Models, Statistical [MeSH]; Causality [MeSH]; Computer Simulation [MeSH]
Subject classification: ddc:004 (Data processing & computer science); ddc:610 (Medicine and health); 510; R5-920 (Medicine (General)); 01 natural sciences; 0101 mathematics
DOI:
10.1186/s12874-021-01373-z
Publication Date:
2021-09-29T09:05:51Z
AUTHORS (6)
ABSTRACT
Background
Statistical model building requires selecting variables according to the model’s aim. For descriptive and explanatory models, a common recommendation in the literature is to include all variables that are assumed or known to be associated with the outcome, regardless of whether data-driven selection procedures identify them. An open question is how reliable this assumed “background knowledge” truly is. In fact, “known” predictors may be findings from preceding studies that may themselves have employed inappropriate model building strategies.
Methods
We conducted a simulation study to assess the influence of treating variables as “known predictors” in model building when the knowledge derived from preceding studies may in fact be insufficient. Model building with variable selection was carried out within randomly generated data sets representing preceding studies. A variable was subsequently considered a “known” predictor if a predefined number of preceding studies had identified it as relevant.
Results
Even if several preceding studies identified a variable as a “true” predictor, this classification is often a false positive. Moreover, variables that were not identified might still be truly predictive. This holds especially if the preceding studies employed inappropriate selection methods such as univariable selection.
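The mechanism described above can be illustrated with a minimal sketch. It is not the authors' simulation design: the function name `simulate_known_predictors`, the sample sizes, effect sizes, correlation structure, and the normal-approximation cutoff of 1.96 are all assumptions chosen for illustration. Each simulated preceding study applies univariable selection, and a variable counts as a “known” predictor once at least `threshold` of the `n_studies` studies selected it.

```python
import numpy as np


def simulate_known_predictors(n_studies=5, n_obs=200, threshold=3, rho=0.5,
                              beta=(0.5, 0.0, 0.0), n_rep=500, seed=0):
    """Return, per variable, how often it ends up as a "known" predictor.

    All defaults are illustrative assumptions, not values from the paper.
    """
    rng = np.random.default_rng(seed)
    beta = np.asarray(beta, dtype=float)
    p = beta.size

    # Equicorrelated predictors: unit variances, pairwise correlation rho.
    cov = np.full((p, p), rho) + (1.0 - rho) * np.eye(p)
    chol = np.linalg.cholesky(cov)

    declared = np.zeros(p)
    for _ in range(n_rep):
        votes = np.zeros(p, dtype=int)
        for _ in range(n_studies):
            # One preceding study: linear outcome with standard normal noise.
            x = rng.standard_normal((n_obs, p)) @ chol.T
            y = x @ beta + rng.standard_normal(n_obs)

            # Univariable selection: test each predictor on its own via the
            # correlation t-statistic, with a normal approximation (1.96)
            # standing in for the exact t quantile.
            r = np.array([np.corrcoef(x[:, j], y)[0, 1] for j in range(p)])
            t = r * np.sqrt((n_obs - 2) / (1.0 - r ** 2))
            votes += np.abs(t) > 1.96

        # "Known" predictor: selected by at least `threshold` studies.
        declared += votes >= threshold

    return declared / n_rep


if __name__ == "__main__":
    # Only the first variable has a non-zero coefficient in this setup.
    print(simulate_known_predictors())
```

In this setup only the first variable has a true effect. Because the other two are correlated with it, univariable tests pick them up in most simulated studies, so they are frequently declared “known” predictors; setting `rho=0.0` makes such false declarations rare.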
Conclusions
The source of “background knowledge” should be evaluated with care. Knowledge derived from preceding studies can cause misspecification.