DNAGPT: A Generalized Pre-trained Tool for Versatile DNA Sequence Analysis Tasks

Keywords: Sequence (biology); Binary classification
DOI: 10.48550/arxiv.2307.05628 Publication Date: 2023-01-01
ABSTRACT
Pre-trained large language models demonstrate potential in extracting information from DNA sequences, yet adapting them to a variety of tasks and data modalities remains a challenge. To address this, we propose DNAGPT, a generalized pre-training model trained on over 200 billion base pairs from all mammals. By enhancing the classic GPT model with a binary classification task (DNA sequence order), a numerical regression task (guanine-cytosine content prediction), and a comprehensive token language, DNAGPT can handle versatile DNA analysis tasks while processing both sequence and numerical data. Our evaluation on genomic signal and region recognition, mRNA abundance regression, and artificial genome generation demonstrates DNAGPT's superior performance compared to existing models designed for specific downstream tasks, benefiting from pre-training using the newly designed model structure.
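The two auxiliary pre-training targets named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names `gc_content` and `sequence_order_example` are hypothetical, and the sketch assumes, purely for illustration, that the "DNA sequence order" label distinguishes a sequence from its reversed copy; the abstract does not specify how that label is constructed.

```python
import random
from typing import Tuple


def gc_content(seq: str) -> float:
    """Return the fraction of bases in `seq` that are G or C (case-insensitive)."""
    if not seq:
        raise ValueError("sequence must not be empty")
    seq = seq.upper()
    gc_count = sum(1 for base in seq if base in ("G", "C"))
    return gc_count / len(seq)


def sequence_order_example(seq: str, rng: random.Random) -> Tuple[str, int]:
    """Build one (sequence, label) pair for a binary order task.

    Assumption (not from the paper's abstract): label 1 means the sequence
    is kept in its original order, label 0 means it has been reversed.
    """
    if rng.random() < 0.5:
        return seq, 1
    return seq[::-1], 0


if __name__ == "__main__":
    rng = random.Random(0)
    fragment = "ATGCGGCCTA"
    print("GC content:", gc_content(fragment))
    print("Order example:", sequence_order_example(fragment, rng))
```

In this sketch the GC fraction would serve as a numerical regression target and the order label as a binary classification target, both derived from the raw sequence without manual annotation.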