Data Reduction Algorithm: Identification and Elimination of Erroneous rows
Identification
DOI: 10.1109/rmkmate59243.2023.10369905
Publication Date: 2024-01-03T19:25:26Z
AUTHORS (4)
ABSTRACT
Data Reduction without the removal of exact, correct rows is a crucial pre-processing step. Large Datasets make it difficult to model data effectively or forecast results accurately. Additionally, they demand lengthy processing times, sophisticated and complex software, and thorough cleaning. Incorrect or irrelevant data may produce inaccurate results that impair the performance of the model. For better and more accurate outcomes, such data must be properly detected and removed. The proposed algorithm calculates the Initial Recall value of the Dataset. It eliminates the least correlated features using the Correlation Matrix. Using the Gaussian Curve, for all columns it identifies the rows having values which lie beyond (μ ± 3σ). Furthermore, it takes into account the column with the highest Standard Deviation, and selects the nearest 50% of rows to the Left and Right from the column's mean. Using only those rows, it calculates the Final Recall value. A negligible difference between the Initial and Final Recall values implies that the removed rows had no or minimal impact on the Dataset's final result. This algorithm is implemented on 3 Medical Datasets - Pima Diabetes, Heart Attack and Breast Cancer. For the Breast Cancer Dataset, this algorithm eliminated [...] number [...] 231.
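The two row-selection steps named in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation. The function names are hypothetical, the population standard deviation is an assumption, and "nearest 50%" is interpreted here as the half of the remaining rows closest to the mean by absolute distance. The correlation-based feature elimination and the Recall comparison are omitted because the abstract does not specify the model or thresholds.

```python
from statistics import mean, pstdev


def rows_beyond_three_sigma(rows):
    """Return indices of rows with any value outside mu +/- 3*sigma of its column."""
    flagged = set()
    for col in range(len(rows[0])):
        values = [row[col] for row in rows]
        mu, sigma = mean(values), pstdev(values)  # assumption: population std
        for i, value in enumerate(values):
            if abs(value - mu) > 3 * sigma:
                flagged.add(i)
    return flagged


def nearest_half_to_mean(rows):
    """Keep the 50% of rows closest to the mean of the highest-std column."""
    n_cols = len(rows[0])
    spread = [pstdev([row[c] for row in rows]) for c in range(n_cols)]
    widest = spread.index(max(spread))  # column with the highest std
    mu = mean([row[widest] for row in rows])
    # Assumption: "nearest" means smallest absolute distance on either side.
    ranked = sorted(rows, key=lambda row: abs(row[widest] - mu))
    return ranked[: len(rows) // 2]


def reduce_rows(rows):
    """Drop 3-sigma outlier rows, then keep the half nearest the mean."""
    flagged = rows_beyond_three_sigma(rows)
    kept = [row for i, row in enumerate(rows) if i not in flagged]
    return nearest_half_to_mean(kept)
```

Under these assumptions, a model's Recall would be measured once on the full Dataset and once on the output of `reduce_rows`, and the two values compared.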