HEC-ASD: a hybrid ensemble-based classification model for predicting autism spectrum disorder disease genes

0301 basic medicine Artificial intelligence Autism Spectrum Disorder Developmental psychology Autism Gene prediction Boosting (machine learning) Gene Psychology Gene Regulatory Networks Similarity (geometry) Protein Interaction Maps Biology (General) Autism spectrum disorder Child Life Sciences Ensemble forecasting Analysis of Gene Interaction Networks 3. Good health FOS: Psychology Semantic similarity Gradient boosting Functional gene network Gene classification Candidate gene QH301-705.5 Cognitive Neuroscience Computer applications to medicine. Medical informatics R858-859.7 Gene Set Enrichment Analysis 03 medical and health sciences Biochemistry, Genetics and Molecular Biology Ensemble learning Machine learning Genetics Image (mathematics) Humans Autistic Disorder Molecular Biology Biology Prediction of Protein Subcellular Localization Research Boosting techniques Computer science Autism Spectrum Disorders FOS: Biological sciences Gene ontology Classifier (UML) Neuroscience Random forest
DOI: 10.1186/s12859-022-05099-7 Publication Date: 2022-12-21T12:02:57Z
ABSTRACT
Purpose: Autism spectrum disorder (ASD) is one of the most prevalent disorders today. Its causes are attributed roughly 80% to genetic factors and 20% to environmental factors. Despite this, most current research concerns the environmental causes, and only a small proportion addresses the genetic causes. ASD is a complex disorder, which makes it difficult to identify the genes responsible for it.

Methods: A hybrid ensemble-based classification (HEC-ASD) model for predicting ASD genes using gradient boosting machines is proposed. The model uses the Gene Ontology (GO) to construct a gene functional similarity matrix with the hybrid gene similarity (HGS) method, which measures the semantic similarity between genes. HGS combines a graph-based method, the Wang method, with the number of direct child nodes of a gene's term in GO. An ensemble gradient boosting classifier is then adapted to improve gene prediction, forming a robust classification model.

Results: The proposed model is evaluated using the Simons Foundation Autism Research Initiative (SFARI) gene database. The experimental results are promising: they improve classification performance for predicting ASD genes. The results are compared with other approaches that used a gene regulatory network (GRN), a protein-protein interaction (PPI) network, or GO. The HEC-ASD model reaches the highest prediction accuracy, 0.88, using ensemble learning classifiers.

Conclusion: The proposed model demonstrates that ensemble learning with gradient boosting is effective in predicting ASD genes. Moreover, the HEC-ASD model relies on GO rather than on a PPI network or GRN.
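The graph-based component named in the Methods, the Wang method, can be illustrated with a minimal sketch. It is not the authors' implementation and covers only the Wang term-to-term similarity. The child-count adjustment that makes HGS "hybrid" is not reproduced, because the abstract does not give its formula. The toy ontology, the term names, and the edge weights (0.8 for is_a, 0.6 for part_of, the values commonly used with this method) are assumptions for illustration.

```python
from typing import Dict, List, Tuple

# Edge weights commonly used with the Wang method (assumed here).
IS_A, PART_OF = 0.8, 0.6

# Hypothetical toy ontology: term -> list of (parent term, edge weight).
Ontology = Dict[str, List[Tuple[str, float]]]
TOY_PARENTS: Ontology = {
    "root": [],
    "A": [("root", IS_A)],
    "B": [("root", IS_A)],
    "C": [("A", IS_A)],
    "D": [("A", IS_A), ("B", PART_OF)],
}


def s_values(term: str, parents: Ontology) -> Dict[str, float]:
    """Contribution of `term` and each of its ancestors to the meaning of `term`.

    The term itself scores 1.0. An ancestor scores the best (maximum)
    product of edge weights along any path leading up from the term.
    """
    s = {term: 1.0}
    stack = [term]
    while stack:
        child = stack.pop()
        for parent, weight in parents.get(child, []):
            candidate = weight * s[child]
            if candidate > s.get(parent, 0.0):
                s[parent] = candidate
                stack.append(parent)  # revisit so the better score propagates up
    return s


def wang_similarity(a: str, b: str, parents: Ontology) -> float:
    """Wang semantic similarity between two ontology terms."""
    sa, sb = s_values(a, parents), s_values(b, parents)
    shared = sa.keys() & sb.keys()  # terms present in both ancestor graphs
    numerator = sum(sa[t] + sb[t] for t in shared)
    return numerator / (sum(sa.values()) + sum(sb.values()))


if __name__ == "__main__":
    print(round(wang_similarity("C", "D", TOY_PARENTS), 4))
```

In this toy graph, terms C and D share the ancestors A and root, so their similarity is (1.6 + 1.28) / (2.44 + 3.04), about 0.526. A gene-level similarity matrix would additionally need a rule for combining term-level scores across each gene's annotations, which is outside this sketch.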