NFDI4DS | UHH-SEMS - Publication Details

Performance discrepancy mitigation in heart disease prediction for multisensory inter-datasets

Artificial neural network; Artificial intelligence; Support vector machine; Data pre-processing; Bioinformatics; Health Professions; Handling Imbalanced Data in Classification Problems; Normalization (sociology); Boosting (machine learning); Heartbeat Classification; Pattern recognition (psychology); Heart disease prediction; 0302 clinical medicine; Health Information Management; Sociology; Artificial Intelligence; Health Sciences; Machine learning; Arrhythmia Detection; Decision tree; 0202 electrical engineering, electronic engineering, information engineering; Multilayer perceptron; Data mining; Preprocessor; Inter-dataset; Machine Learning in Healthcare and Medicine; Naive Bayes classifier; QA75.5-76.95; Analysis of Electrocardiogram Signals; Dimensionality reduction; Computer science; FOS: Sociology; Programming language; Performance discrepancy; Electronic computers. Computer science; Anthropology; Computer Science; Physical Sciences; Signal Processing; Feature selection; Medicine; Heart Disease Prediction; Cardiac Health Diagnosis; Pipeline (software); Cardiology and Cardiovascular Medicine; Random forest

DOI: 10.7717/peerj-cs.1917
Publication Date: 2024-03-18T08:18:51Z

AUTHORS (6)

Mahmudul Hasan

Md Abdus Sahid

Md Palash Uddin

Md Abu Marjan

Seifedine Kadry

Jungeun Kim

ABSTRACT

Heart disease is one of the leading causes of morbidity and death worldwide. Millions of people suffer heart attacks every year, and only early-stage prediction can help reduce that number. Researchers are designing and developing early-stage prediction systems using various advanced technologies, machine learning (ML) among them. Almost all existing ML-based works use the same dataset (intra-dataset) for both training and validation. In particular, they do not perform inter-dataset performance checks, in which different datasets are used in the training and testing phases. In an inter-dataset setup, existing ML models perform poorly, a phenomenon termed the inter-dataset discrepancy problem. This work focuses on mitigating that problem by considering five available heart disease datasets and their combined form. All possible combinations of training and testing modes are systematically executed to assess discrepancies before and after applying the proposed methods. Imbalanced-data handling using SMOTE-Tomek, feature selection using random forest (RF), and feature extraction using principal component analysis (PCA), together with a long preprocessing pipeline, are used to mitigate the inter-dataset discrepancy problem. The preprocessing pipeline comprises missing-value handling using RF regression, log transformation, outlier removal, normalization, and data balancing, which make the datasets better suited to ML. Support vector machine, K-nearest neighbors, decision tree, RF, eXtreme Gradient Boosting, Gaussian naive Bayes, logistic regression, and multilayer perceptron are used as classifiers. Experimental results show that feature selection and classification using RF produce better results than other combination strategies in both single- and inter-dataset setups. In certain single-dataset configurations, RF reaches 100% accuracy, and in an inter-dataset setup it reaches 96% accuracy with feature selection, along with strong precision, recall, F1 score, specificity, and AUC score. The results indicate that effective preprocessing can improve the performance of an ML model without requiring the development of intricate prediction models. Addressing inter-dataset discrepancies opens a new research avenue: identical features from various datasets can be merged to construct a comprehensive global dataset within a specific domain.
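
The train/test protocol described in the abstract (every dataset used for training, paired with every dataset used for testing) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the paper: the nearest-centroid classifier stands in for the classifiers listed above, `make_toy` generates synthetic data in place of the heart disease datasets, and the 30% hold-out for the intra-dataset case is an arbitrary choice.

```python
import random
from itertools import product

def fit_centroids(rows):
    """Fit a nearest-centroid model: one mean feature vector per class label."""
    sums, counts = {}, {}
    for features, label in rows:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def predict(model, features):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    return min(
        model,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(features, model[label])),
    )

def accuracy(model, rows):
    """Fraction of rows whose predicted label matches the true label."""
    return sum(predict(model, x) == y for x, y in rows) / len(rows)

def evaluation_grid(datasets, holdout=0.3):
    """Score every (train, test) pair of named datasets.

    Same name on both sides -> intra-dataset: split one dataset into
    a training part and a held-out part.
    Different names -> inter-dataset: train on all of one dataset,
    test on all of the other.
    """
    results = {}
    for train_name, test_name in product(datasets, repeat=2):
        train_rows, test_rows = datasets[train_name], datasets[test_name]
        if train_name == test_name:
            cut = int(len(train_rows) * (1 - holdout))
            train_rows, test_rows = train_rows[:cut], train_rows[cut:]
        model = fit_centroids(train_rows)
        results[(train_name, test_name)] = accuracy(model, test_rows)
    return results

def make_toy(seed, shift, n=100):
    """Hypothetical synthetic dataset; `shift` mimics a distribution change."""
    rng = random.Random(seed)
    return [
        ([rng.gauss(2.0 * (i % 2) + shift, 0.5), rng.gauss(0.0, 1.0)], i % 2)
        for i in range(n)
    ]

# Two toy "sources" whose first feature is offset relative to each other.
grid = evaluation_grid({"A": make_toy(1, 0.0), "B": make_toy(2, 1.0)})
for (train_name, test_name), score in sorted(grid.items()):
    mode = "intra" if train_name == test_name else "inter"
    print(f"train={train_name} test={test_name} ({mode}): accuracy={score:.2f}")
```

Comparing the diagonal entries (intra-dataset) with the off-diagonal entries (inter-dataset) of such a grid is one way to quantify the discrepancy before and after a preprocessing step. A combined dataset could be added as one more entry, provided the caller ensures that test rows do not also appear in the training data.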

REFERENCES (87)

CITATIONS (13)