Deep learning methods may not outperform other machine learning methods on analyzing genomic studies
Benchmark (surveying)
Elastic net regularization
Personalized Medicine
Sample (material)
F1 score
DOI: 10.3389/fgene.2022.992070
Publication Date: 2022-09-23T08:10:28Z
AUTHORS (7)
ABSTRACT
Deep Learning (DL) has been broadly applied to solve big data problems in biomedical fields, and it is most successful in image processing. Recently, many DL methods have been applied to analyze genomic studies. However, genomic data usually has too small a sample size to fit a complex network. They do not have common structural patterns like images that would let them utilize pre-trained networks or take advantage of convolution layers. The concern of overusing DL methods motivates us to evaluate DL methods' performance versus popular non-deep Machine Learning (ML) methods for analyzing genomic data with a wide range of sample sizes. In this paper, we conduct a benchmark study using the UK Biobank data and its random subsets with different sample sizes. The original UK Biobank data has about 500k participants. Each patient has comprehensive patient characteristics, disease histories, and genomic information, i.e., the genotypes of millions of Single-Nucleotide Polymorphisms (SNPs). We are interested in predicting the risk of three lung diseases: asthma, COPD, and lung cancer. There are 205,238 participants with recorded outcomes for these three diseases. Five prediction models are investigated in this benchmark study, including three non-deep machine learning methods (Elastic Net, XGBoost, and SVM) and two deep learning methods (DNN and LSTM). Besides the most popular performance metrics, such as the F1-score, we promote the hit curve, a visual tool to describe the performance of predicting rare events. We discovered that DL methods frequently fail to outperform non-deep ML methods in analyzing genomic data, even in large datasets with over 200k samples. The experiment results suggest not overusing DL methods in genomic studies, even with biobank-level sample sizes. The performance differences between non-deep ML and DL decrease as the sample size of the data increases. This suggests that when the sample size is significant, further increasing sample sizes leads to more performance gain in DL methods. Hence, DL methods could be better if we analyze genomic data bigger than this study.
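The abstract names two evaluation tools, the F1-score and the hit curve, without defining them. The following is a minimal sketch, not the authors' implementation. It assumes a common definition of the hit curve: subjects are ranked by predicted risk, and the curve tracks how many true cases are captured among the top k. The function names and the toy data are hypothetical.

```python
import numpy as np

def hit_curve(y_true, scores):
    """Return (k, hits): hits[i] is the number of true cases among the
    k[i] subjects with the highest predicted risk (assumed definition)."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores)
    order = np.argsort(-scores, kind="stable")  # highest risk first
    hits = np.cumsum(y_true[order])             # cumulative true cases found
    k = np.arange(1, len(y_true) + 1)           # number of subjects selected
    return k, hits

def f1_score(y_true, y_pred):
    """F1 = 2*TP / (2*TP + FP + FN) for binary labels."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_pred).astype(bool)
    tp = np.sum(y_true & y_pred)
    fp = np.sum(~y_true & y_pred)
    fn = np.sum(y_true & ~y_pred)
    denom = 2 * tp + fp + fn
    return float(2 * tp / denom) if denom else 0.0

# Hypothetical toy data: 3 cases among 10 subjects (a rare-ish event).
y_true = np.array([0, 1, 0, 0, 1, 0, 0, 0, 1, 0])
scores = np.array([0.1, 0.9, 0.2, 0.3, 0.8, 0.05, 0.4, 0.15, 0.35, 0.25])

k, hits = hit_curve(y_true, scores)
f1 = f1_score(y_true, scores >= 0.5)  # 0.5 is an arbitrary threshold
```

Unlike the F1-score, the hit curve needs no classification threshold, which is one reason a ranking-based view can suit rare outcomes: a model can be compared by how quickly its curve rises over the first few selected subjects.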
REFERENCES (16)
CITATIONS (10)