- Topic Modeling
- Biomedical Text Mining and Ontologies
- Cancer Genomics and Diagnostics
- Cancer, Hypoxia, and Metabolism
- Radiomics and Machine Learning in Medical Imaging
- Genomics and Phylogenetic Studies
- Biochemical and Structural Characterization
- Natural Language Processing Techniques
- Hepatocellular Carcinoma Treatment and Prognosis
- Genetic factors in colorectal cancer
- Metabolism and Genetic Disorders
- Cancer-related gene regulation
- Gastric Cancer Management and Outcomes
- Toxoplasma gondii Research Studies
- MicroRNA in disease regulation
- Microtubule and mitosis dynamics
- Ferroptosis and cancer prognosis
- Epigenetics and DNA Methylation
- Machine Learning in Materials Science
- Cancer-related molecular mechanisms research
- RNA modifications and cancer
- Machine Learning in Bioinformatics
- ATP Synthase and ATPases Research
- Glycosylation and Glycoproteins Research
- RNA Research and Splicing
Stanford University, 2024
Harvard College Observatory, 2019-2021
Rockefeller University, 2014-2017
Dalton School, 2014
University Hospital Schleswig-Holstein, 2012
University of Lübeck, 2012
Oncogenic Suspect Exposed. It can be difficult logistically to study the genomics of rare variants of common cancers. Nevertheless, Honeyman et al. (p. 1010) studied fibrolamellar hepatocellular carcinoma (FL-HCC), a rare and poorly understood liver tumor that affects adolescents and young adults and for which there is no effective treatment. FL-HCCs from 15 patients all expressed a chimeric RNA transcript and protein containing sequences from a molecular chaperone fused in frame with the catalytic domain of protein kinase A. The...
Abstract: The ability to design functional sequences and predict effects of variation is central to protein engineering and biotherapeutics. State-of-art computational methods rely on models that leverage evolutionary information but are inadequate for important applications where multiple sequence alignments are not robust. Such applications include the prediction of variant effects of indels, disordered proteins, and the design of proteins such as antibodies due to the highly variable complementarity determining regions. We introduce a deep generative...
Significance: Fibrolamellar hepatocellular carcinoma (FLHCC) is a rare pediatric liver cancer. A deletion of ∼400 kb in one copy of chromosome 19 results in a chimeric protein, an activated protein kinase A. No other deletions, amplifications, mutations, or structural variants were found. This strongly implicates the chimera as the driving mutation. This paper examines gene expression in FLHCC. The results establish FLHCC as a single disease distinct from other cancers, including hepatocellular carcinoma. The results help explain some of the known pathophysiology:...
Large pretrained models such as GPT-3 have had tremendous impact on modern natural language processing by leveraging self-supervised learning to learn salient representations that can be used to readily finetune on a wide variety of downstream tasks. We investigate the possibility of transferring such advances to molecular machine learning by building a chemical foundation model, ChemBERTa-2, using the language of SMILES. While labeled data for molecular prediction tasks is typically scarce, libraries of SMILES strings are readily available. In this work, we...
Abstract: Protein language models (PLMs) have demonstrated remarkable success in protein modeling and design, yet their internal mechanisms for predicting structure and function remain poorly understood. Here we present a systematic approach to extract and analyze interpretable features from PLMs using sparse autoencoders (SAEs). By training SAEs on embeddings from the PLM ESM-2, we identify up to 2,548 human-interpretable latent features per layer that strongly correlate with 143 known biological concepts such as...
Fibrolamellar hepatocellular carcinoma (FLC) is a rare primary liver cancer found in adolescents and young adults without underlying liver disease. A deletion of ~400 kb has been found in one copy of chromosome 19 in the tumor tissue of all patients tested. This produces a fusion of the genes DNAJB1 and PRKACA which, in turn, produces a chimeric transcript and protein. Transcriptomic analysis has shown upregulation of various oncologically relevant pathways, including EGF/ErbB, Aurora Kinase A, pak21 and wnt. To explore other factors that may contribute to...
Protein language models (PLMs) have demonstrated remarkable success in protein modeling and design, yet their internal mechanisms for predicting structure and function remain poorly understood. Here we present a systematic approach to extract and analyze interpretable features from PLMs using sparse autoencoders (SAEs). By training SAEs on embeddings from the PLM ESM-2, we identify up to 2,548 human-interpretable latent features per layer that strongly correlate with 143 known biological concepts such as binding sites,...
Abstract: Genomic analysis of the pediatric cancer fibrolamellar hepatocellular carcinoma. Fibrolamellar hepatocellular carcinoma (FLHCC) is a rare liver tumor that usually occurs in adolescents and young adults. Originally considered a variant of hepatocellular carcinoma (HCC), FLHCC is characterized by hepatocytes with deeply eosinophilic, granular cytoplasm interspersed with fibrous bands, without signs of cirrhosis as in HCC. Since little is known of its molecular pathogenesis, we performed RNA-seq and whole genome sequencing of tumor and paired normal from the same patient. Our...
Introduction: Carcinomas of the hepato-gastrointestinal tract remain one of the most common causes of tumor-related death worldwide. The search for new predictive and diagnostic targets is still of great clinical relevance. The orphan G protein-coupled receptor LGR5 was recently discovered as a stem cell marker of intestinally differentiated cells. In our study, we investigated the prevalence, histoanatomical distribution, and biological significance of the Wnt target protein...
Abstract: Advances in genomics and proteomics have enabled more precise characterizations of tumors, with the consequence that many cancers are being segregated into smaller categories. With this larger number of categories, many cancers are being categorized as rare. The downside to such categorizations is that the reduced numbers of patients in each category make it difficult to gather enough information about each cancer. We are a group of patients and caregivers who joined together to form a repository for patient-shared data and reports, an IRB-approved, non-profit...
Advances in genomics and proteomics have enabled more precise characterizations of tumors, with the consequence that many cancers are being segregated into smaller categories. With this larger number of categories, many cancers are being categorized as rare. The downside to such categorizations is that the reduced numbers of patients in each category make it difficult to gather enough information about each cancer. We are a group of patients and caregivers who joined together to form a repository for patient-shared data and reports, an IRB-approved, non-profit medical...