A framework for dependency estimation in heterogeneous data streams
DOI: 10.1007/s10619-020-07295-x
Publication Date: 2020-06-06T09:03:16Z
AUTHORS (4)
ABSTRACT
Estimating dependencies from data is a fundamental task of Knowledge Discovery. Identifying the relevant variables leads to a better understanding of the data and improves both the runtime and the outcomes of downstream Data Mining tasks. Dependency estimation from static numerical data has received much attention. However, real-world data often occurs as heterogeneous data streams: on the one hand, data is collected online and is virtually infinite; on the other hand, the various components of a stream may be of different types, e.g., numerical, ordinal or categorical. For this setting, we propose Monte Carlo Dependency Estimation (MCDE), a framework that quantifies multivariate dependency as the average statistical discrepancy between marginal and conditional distributions, via Monte Carlo simulations. MCDE handles heterogeneity by leveraging three statistical tests: the Mann–Whitney U test, the Kolmogorov–Smirnov test and the Chi-Squared test. We demonstrate that MCDE goes beyond the state of the art in dependency estimation by meeting a broad set of requirements. Finally, we show with a real-world use case that MCDE can discover useful patterns in heterogeneous data streams.
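The abstract describes MCDE only at a high level. Below is a minimal sketch of the Monte Carlo idea, assuming numerical attributes only and using just one of the three tests: the Mann–Whitney U test with a normal approximation and no tie correction of the variance. The slicing scheme (one random contiguous block of rows per conditioning attribute, sized so that the slice keeps roughly a fraction `alpha` of the rows under independence), the comparison of the slice with its complement, and the names `mwu_p_value`, `mcde`, `m`, `alpha` and `seed` are illustrative assumptions, not the authors' implementation.

```python
import math
import random


def mwu_p_value(x, y):
    """Two-sided Mann-Whitney U p-value via the normal approximation.

    Ties receive average ranks, but the variance is not tie-corrected,
    so the result is only approximate for heavily tied samples.
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        return 1.0  # nothing to compare: treat as "no discrepancy"
    # Pool both samples, remembering which group each value came from.
    combined = sorted([(v, 0) for v in x] + [(v, 1) for v in y], key=lambda t: t[0])
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        j = i
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank of the tie group
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    r1 = sum(r for r, (_, g) in zip(ranks, combined) if g == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu) / sigma
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


def mcde(data, m=100, alpha=0.5, seed=0):
    """Hypothetical MCDE-style score: mean contrast (1 - p) over m iterations.

    data: list of rows, each a list of d >= 2 numerical values.
    """
    rng = random.Random(seed)
    n, d = len(data), len(data[0])
    # For each attribute, the row indices sorted by that attribute's value.
    order = [sorted(range(n), key=lambda r: data[r][j]) for j in range(d)]
    # Each of the d - 1 conditioning attributes keeps a contiguous block of
    # this many rows, so an independent slice holds about alpha * n rows.
    block = max(1, int(round(n * alpha ** (1 / (d - 1)))))
    total = 0.0
    for _ in range(m):
        ref = rng.randrange(d)  # attribute whose distribution is compared
        selected = set(range(n))
        for j in range(d):
            if j == ref:
                continue
            start = rng.randrange(n - block + 1)
            selected &= set(order[j][start:start + block])
        # Conditional sample (inside the slice) vs. its complement.
        inside = [data[r][ref] for r in selected]
        outside = [data[r][ref] for r in range(n) if r not in selected]
        total += 1.0 - mwu_p_value(inside, outside)
    return total / m
```

Each iteration turns one p-value into a contrast of 1 − p, and the score is the mean over `m` iterations, which mirrors the abstract's "average statistical discrepancy" in spirit. For a strongly dependent pair such as `[[x, x] for x in xs]`, the score should come out well above that of an independently drawn pair. A stream setting would additionally require maintaining the sorted index incrementally, and heterogeneous attributes would call for swapping in the other tests; this sketch attempts neither.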