CD-GPT: A Biological Foundation Model Bridging the Gap between Molecular Sequences Through Central Dogma
DOI: 10.1101/2024.06.24.600337
Publication Date: 2024-06-29T00:40:32Z
AUTHORS (7)
ABSTRACT
The central dogma serves as a fundamental framework for understanding the flow and expression of genetic information within living organisms, facilitating the connection of diverse biological sequences across molecule types. In this study, we present CD-GPT (Central Dogma Generative Pretrained Transformer), a generative biological foundation model comprising 1 billion parameters, aiming to capture the intricate system-wide molecular interactions in biological systems. We introduce the concept of a unified representational space and employ a shared, multi-molecule vocabulary to effectively represent biological sequences and narrow their distance in the embedding space. Through extensive pretraining on comprehensive full molecular-level data, CD-GPT exhibits exceptional performance in a wide range of predictive and generative downstream tasks, encompassing both mono-molecular and multi-molecular analyses. Notably, CD-GPT excels in predictive tasks such as genomic element detection, protein property prediction and RNA-protein interaction identification, and also in generative tasks such as de novo protein generation and reverse translation. The versatility of CD-GPT opens up promising avenues for advanced multi-omics analysis.
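The idea of a shared, multi-molecule vocabulary can be illustrated with a toy sketch in which DNA, RNA and protein sequences are all mapped into one integer ID space, so that a single model (and a single embedding table) can consume any of them. The alphabets, the per-molecule tag tokens such as `<dna>`, and the functions `build_shared_vocab` and `encode` are hypothetical choices made for illustration only; this is not CD-GPT's actual tokenizer, whose details are not given in the abstract.

```python
from typing import Dict, List

# Hypothetical toy setup (not CD-GPT's real tokenizer): special tokens,
# one tag token per molecule type, and per-type residue alphabets.
SPECIAL_TOKENS = ["<pad>", "<eos>"]
TYPE_TAGS = {"dna": "<dna>", "rna": "<rna>", "protein": "<protein>"}
ALPHABETS = {
    "dna": "ACGT",
    "rna": "ACGU",
    "protein": "ACDEFGHIKLMNPQRSTVWY",  # the 20 standard amino acids
}


def build_shared_vocab() -> Dict[str, int]:
    """Assign one integer ID to every token, regardless of molecule type."""
    tokens: List[str] = list(SPECIAL_TOKENS) + list(TYPE_TAGS.values())
    for alphabet in ALPHABETS.values():
        for symbol in alphabet:
            # Symbols present in several alphabets (e.g. "A") get a single ID.
            if symbol not in tokens:
                tokens.append(symbol)
    return {tok: idx for idx, tok in enumerate(tokens)}


def encode(seq: str, mol_type: str, vocab: Dict[str, int]) -> List[int]:
    """Encode a sequence as [type tag, residues..., <eos>] in the shared ID space."""
    seq = seq.upper()
    allowed = set(ALPHABETS[mol_type])
    bad = [ch for ch in seq if ch not in allowed]
    if bad:
        raise ValueError(f"invalid {mol_type} symbols: {sorted(set(bad))}")
    return [vocab[TYPE_TAGS[mol_type]]] + [vocab[ch] for ch in seq] + [vocab["<eos>"]]


if __name__ == "__main__":
    vocab = build_shared_vocab()
    print(len(vocab))                     # 26
    print(encode("ATG", "dna", vocab))    # [2, 5, 8, 7, 1]
    print(encode("AUG", "rna", vocab))    # [3, 5, 9, 7, 1]
    print(encode("M", "protein", vocab))  # [4, 17, 1]
```

In this sketch a symbol such as `A` receives the same ID whether it denotes adenine or alanine, and the leading tag token tells the model which molecule type it is reading; a real model would learn how to use that shared space during pretraining.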