Nilah M. Ioannidis

ORCID: 0000-0001-9628-8229
Research Areas
  • Genomics and Chromatin Dynamics
  • Genomics and Rare Diseases
  • Bioinformatics and Genomic Networks
  • Gene expression and cancer classification
  • Genetic Associations and Epidemiology
  • Machine Learning in Bioinformatics
  • Genomics and Phylogenetic Studies
  • Epigenetics and DNA Methylation
  • Nonmelanoma Skin Cancer Studies
  • Nutrition, Genetics, and Disease
  • Genomic variations and chromosomal abnormalities
  • RNA Research and Splicing
  • Birth, Development, and Health
  • RNA and protein synthesis mechanisms
  • Renal and related cancers
  • Cancer Genomics and Diagnostics
  • Advanced Proteomics Techniques and Applications
  • Muscle Physiology and Disorders
  • Cutaneous lymphoproliferative disorders research
  • RNA modifications and cancer
  • Immunotherapy and Immune Responses
  • Genetic and Kidney Cyst Diseases
  • Genetics, Aging, and Longevity in Model Organisms
  • Adipose Tissue and Metabolism
  • Vector-Borne Animal Diseases

Chan Zuckerberg Initiative (United States)
2022-2025

University of California, Berkeley
2020-2024

Berkeley College
2024

Stanford University
2016-2020

Jain Foundation
2018-2019

Regular, systematic, and independent assessments of computational tools that are used to predict the pathogenicity of missense variants are necessary to evaluate their clinical and research utility and to guide future improvements. The Critical Assessment of Genome Interpretation (CAGI) conducts the ongoing Annotate-All-Missense (Missense Marathon) challenge, in which missense variant effect predictors (also called variant impact predictors) are evaluated on missense variants added to disease-relevant databases following the prediction submission...

10.1007/s00439-025-02732-2 article EN cc-by Human Genetics 2025-03-21

Age is the primary risk factor for many common human diseases. Here, we quantify the relative contributions of genetics and aging to gene expression patterns across 27 tissues from 948 humans. We show that the predictive power of expression quantitative trait loci is impacted by age in many tissues. Jointly modelling the contributions of age and genetics to transcript level variation, we find that expression heritability (h2) is consistent among tissues, while the contribution of aging varies by >20-fold, with [Formula: see text] in 5 tissues. The force of purifying selection is stronger on genes expressed early versus late in life...

10.1038/s41467-022-33509-0 article EN cc-by Nature Communications 2022-10-03

Genomic deep learning models can predict genome-wide epigenetic features and gene expression levels directly from DNA sequence. While current models perform well at predicting gene expression levels across genes in different cell types from the reference genome, their ability to explain expression variation between individuals due to cis-regulatory genetic variants remains largely unexplored. Here, we evaluate four state-of-the-art models on paired personal genome and transcriptome data and find limited performance when explaining variation in expression across individuals. In...

10.1038/s41588-023-01574-w article EN cc-by Nature Genetics 2023-11-30

Genetic variation in the human genome is a major determinant of individual disease risk, but the vast majority of missense variants have unknown etiological effects. Here, we present a robust learning framework for leveraging saturation mutagenesis experiments to construct accurate computational predictors of proteome-wide missense variant pathogenicity. We train cross-protein transfer (CPT) models using deep mutational scanning (DMS) data from only five proteins and achieve state-of-the-art performance on...

10.1186/s13059-023-03024-6 article EN cc-by Genome Biology 2023-08-07

Genomic deep learning models can predict genome-wide epigenetic features and gene expression levels directly from DNA sequence. While current models perform well at predicting gene expression levels across genes in different cell types from the reference genome, their ability to explain expression variation between individuals due to cis-regulatory genetic variants remains largely unexplored. Here we evaluate four state-of-the-art models on paired personal genome and transcriptome data and find limited performance when explaining variation in expression across individuals.

10.1101/2023.06.30.547100 preprint EN cc-by-nc-nd bioRxiv (Cold Spring Harbor Laboratory) 2023-06-30

Regular, systematic, and independent assessment of computational tools used to predict the pathogenicity of missense variants is necessary to evaluate their clinical and research utility and suggest directions for future improvement. Here, as part of the sixth edition of the Critical Assessment of Genome Interpretation (CAGI) challenge, we assess missense variant effect predictors (or variant impact predictors) on an evaluation dataset of rare missense variants from disease-relevant databases. Our assessment evaluates predictors submitted to the CAGI6 Annotate-All-Missense...

10.1101/2024.06.06.597828 preprint EN bioRxiv (Cold Spring Harbor Laboratory) 2024-06-08

Background: A number of deep learning models have been developed to predict epigenetic features such as chromatin accessibility from DNA sequence. Model evaluations commonly report performance genome-wide; however, cis regulatory elements (CREs), which play critical roles in gene regulation, make up only a small fraction of the genome. Furthermore, cell type-specific CREs contain a large proportion of complex disease heritability. Results: We evaluate genomic deep learning models in regions with varying degrees of cell type...

10.1186/s13059-024-03335-2 article EN cc-by Genome Biology 2024-08-01

Interpreting genetic variation in noncoding regions of the genome is an important challenge for personal genome analysis. One mechanism by which noncoding single nucleotide variants (SNVs) influence downstream phenotypes is through the regulation of gene expression. Methods to predict whether or not individual SNVs are likely to regulate gene expression would aid interpretation of variants of unknown significance identified in whole-genome sequencing studies. We developed FIRE (Functional Inference of Regulators of Expression), a tool to score both noncoding and...

10.1093/bioinformatics/btx534 article EN Bioinformatics 2017-08-23

Deep learning models in genomics that predict molecular phenotypes from DNA sequence traditionally focus on one-hot encoded representations. Here, we develop a novel model that extends this approach by incorporating structural attributes indicative of local DNA shape alongside canonical sequence inputs. This augmentation provides an additional axis for interpretability and aids in identifying regulatory patterns not apparent from sequence alone. Applying this model to the prediction of transcription factor binding (ChIP-seq) demonstrates...

10.1101/2025.04.01.646034 preprint EN cc-by-nc bioRxiv (Cold Spring Harbor Laboratory) 2025-04-03

Genomic sequence-to-expression deep learning models, which are trained to predict gene expression and other molecular phenotypes across the reference genome, have recently been shown to have poor out-of-the-box performance in predicting gene expression variation across individuals based on their personal genome sequences. Here we explore whether additional training (fine-tuning) on paired personal genome and transcriptome data improves the performance of such models. Using Enformer as a representative pre-trained model, we explore various fine-tuning strategies. Our...

10.1101/2024.09.23.614632 preprint EN cc-by-nc-nd bioRxiv (Cold Spring Harbor Laboratory) 2024-09-25

Cutaneous squamous cell carcinoma (cSCC) is a common skin cancer with genetic susceptibility loci identified in recent genome-wide association studies (GWAS). Transcriptome-wide association studies (TWAS) using imputed gene expression levels can identify additional gene-level associations. Here we impute gene expression levels in 6891 cSCC cases and 54,566 controls in the Kaiser Permanente Genetic Epidemiology Research on Adult Health and Aging (GERA) cohort and 25,558 self-reported cSCC cases and 673,788 controls from 23andMe. In a discovery-validation study, we identify 19 loci containing...

10.1038/s41467-018-06149-6 article EN cc-by Nature Communications 2018-10-09

Kidney disease is highly heritable; however, the causal genetic variants, the cell types in which these variants function, and the molecular mechanisms underlying kidney disease remain largely unknown. To identify genetic loci affecting kidney function, we performed a GWAS using multiple kidney function biomarkers and identified 462 loci. To begin to investigate how these loci affect kidney function, we generated single-cell chromatin accessibility (scATAC-seq) maps of the human kidney and identified candidate...

10.1101/2024.06.18.599625 preprint EN cc-by-nc-nd bioRxiv (Cold Spring Harbor Laboratory) 2024-06-22

Gene therapies have the potential to treat disease by delivering therapeutic genetic cargo to disease-associated cells. One limitation to their widespread use is the lack of short regulatory sequences, or promoters, that differentially induce the expression of delivered genetic cargo in target cells, minimizing side effects in other cell types. Such cell-type-specific promoters are difficult to discover using existing methods, requiring either manual curation or access to large datasets of promoter-driven expression from both targeted and...

10.1101/2024.06.23.600232 preprint EN cc-by-nc-nd bioRxiv (Cold Spring Harbor Laboratory) 2024-06-23

The ability to deliver genetic cargo to human cells is enabling rapid progress in molecular medicine, but designing this cargo for precise expression in specific cell types is a major challenge. Expression is driven by regulatory DNA sequences within short synthetic promoters, but relatively few of these promoters are cell-type-specific. The ability to design cell-type-specific promoters using model-based optimization would be impactful for research and therapeutic applications. However, models of expression from short synthetic promoters (promoter-driven expression) are lacking for most...

10.1101/2023.02.24.529941 preprint EN cc-by-nc-nd bioRxiv (Cold Spring Harbor Laboratory) 2023-02-27

Computational genomics increasingly relies on machine learning methods for genome interpretation, and the recent adoption of neural sequence-to-function models highlights the need for rigorous model specification and controlled evaluation, problems familiar to other fields of AI. Research strategies that have greatly benefited other fields, including benchmarking, auditing, and algorithmic fairness, are also needed to advance the field of genomic AI and to facilitate model development. Here we propose a benchmark, GUANinE, for evaluating...

10.1101/2023.10.12.562113 preprint EN cc-by-nc-nd bioRxiv (Cold Spring Harbor Laboratory) 2023-10-17

Genetic variation in the human genome is a major determinant of individual disease risk, but the vast majority of missense variants have unknown etiological effects. Here, we present a robust learning framework for leveraging saturation mutagenesis experiments to construct accurate computational predictors of proteome-wide missense variant pathogenicity. We train cross-protein transfer (CPT) models using deep mutational scanning data from only five proteins and achieve state-of-the-art performance on...

10.1101/2022.11.15.516532 preprint EN cc-by-nc-nd bioRxiv (Cold Spring Harbor Laboratory) 2022-11-17

Interpreting genetic variants of unknown significance (VUS) is essential in clinical applications of genome sequencing for diagnosis and personalized care. Non-coding variants remain particularly difficult to interpret, despite making up a large majority of trait associations identified in genome-wide association studies (GWAS) analyses. Predicting the regulatory effects of non-coding variants on candidate genes is a key step in evaluating their clinical significance. Here, we develop a machine-learning algorithm, Inference of Connected...

10.1093/bioinformatics/btaa254 article EN Bioinformatics 2020-04-17