High-Dimensional Methods and Inference on Structural and Treatment Effects
Overfitting
Sample (material)
Predictive power
Data set
DOI:
10.1257/jep.28.2.29
Publication Date:
2014-05-02T11:02:59Z
AUTHORS (3)
ABSTRACT
Data with a large number of variables relative to the sample size (“high-dimensional data”) are readily available and increasingly common in empirical economics. High-dimensional data arise through a combination of two phenomena. First, the data may be inherently high dimensional in that many different characteristics per observation are available. For example, the US Census collects information on hundreds of individual characteristics, and scanner datasets record transaction-level data for households across a wide range of products. Second, even when the number of available variables is relatively small, researchers rarely know the exact functional form with which the small number of variables enter the model of interest. Researchers are thus faced with a large set of potential variables formed by different ways of interacting and transforming the underlying variables. This paper provides an overview of how innovations in “data mining” can be adapted and modified to provide high-quality inference about model parameters. Note that we use the term “data mining” in a modern sense, which denotes a principled search for “true” predictive power that guards against false discovery and overfitting, does not erroneously equate in-sample fit to out-of-sample predictive ability, and accurately accounts for using the same data to examine many different hypotheses or models.
REFERENCES (20)
CITATIONS (440)