Irina M. Armean
- Genomics and Rare Diseases
- Genomics and Phylogenetic Studies
- Bioinformatics and Genomic Networks
- RNA and protein synthesis mechanisms
- Genetic Associations and Epidemiology
- CRISPR and Genetic Engineering
- Genomic variations and chromosomal abnormalities
- Cancer Genomics and Diagnostics
- Computational Drug Discovery Methods
- Parkinson's Disease Mechanisms and Treatments
- Biomedical Text Mining and Ontologies
- Vector-Borne Animal Diseases
- RNA modifications and cancer
- Molecular Biology Techniques and Applications
- Advanced Proteomics Techniques and Applications
- Plant Virus Research Studies
- Insect Resistance and Genetics
- Viral Infections and Vectors
- Genomics and Chromatin Dynamics
- Animal Disease Management and Epidemiology
- Rural development and sustainability
- Protein Structure and Dynamics
- Social Capital and Networks
- Peptidase Inhibition and Analysis
- Insect symbiosis and bacterial influences
Broad Institute
2017-2024
European Bioinformatics Institute
2015-2023
Wellcome Trust
2018-2023
Massachusetts General Hospital
2019-2021
University of Cambridge
2011-2018
Abstract Genetic variants that inactivate protein-coding genes are a powerful source of information about the phenotypic consequences of gene disruption: genes that are crucial for the function of an organism will be depleted of such variants in natural populations, whereas non-essential genes will tolerate their accumulation. However, predicted loss-of-function variants are enriched for annotation errors, and tend to be found at extremely low frequencies, so their analysis requires careful variant annotation and very large sample sizes 1. Here we describe the aggregation of 125,748...
Ensembl (https://www.ensembl.org) is unique in its flexible infrastructure for access to genomic data and annotation. It has been designed to efficiently deliver annotation at scale for all eukaryotic life, and it also provides deep comprehensive annotation for key species. Genomes representing a greater diversity of species are increasingly being sequenced. In response, we have focussed our recent efforts on expediting the annotation of new assemblies. Here, we report the release of the greatest annual number of newly annotated genomes in the history...
Abstract The Ensembl project (https://www.ensembl.org) annotates genomes and disseminates genomic data for vertebrate species. We create detailed and comprehensive annotation of gene structures, regulatory elements and variants, and enable comparative genomics by inferring the evolutionary history of genes and genomes. Our integrated genomic data are made available in a variety of ways, including genome browsers, search interfaces, specialist tools such as the Variant Effect Predictor, download files and programmatic interfaces....
Ensembl (https://www.ensembl.org) is a system for generating and distributing genome annotation such as genes, variation, regulation and comparative genomics across the vertebrate subphylum and key model organisms. The Ensembl annotation pipeline is capable of integrating experimental and reference data from multiple providers into a single integrated resource. Here, we present 94 newly annotated and re-annotated genomes, bringing the total number of genomes offered by Ensembl to 227. This represents the largest expansion of the resource since its...
Summary Genetic variants that inactivate protein-coding genes are a powerful source of information about the phenotypic consequences of gene disruption: genes critical for an organism’s function will be depleted of such variants in natural populations, while non-essential genes will tolerate their accumulation. However, predicted loss-of-function (pLoF) variants are enriched for annotation errors, and tend to be found at extremely low frequencies, so their analysis requires careful variant annotation and very large sample sizes 1. Here, we describe the aggregation...
The Ensembl project (https://www.ensembl.org) makes key genomic data sets available to the entire scientific community without restrictions. Ensembl seeks to be a fundamental resource driving scientific progress by creating, maintaining and updating reference genome annotation and comparative genomics resources. This year we describe our new and expanded gene, variant and comparative annotation capabilities, which led to a 50% increase in the number of vertebrate genomes we support. We have also doubled the number of available human variants and added regulatory regions for many mouse...
Structural variants (SVs) rearrange large segments of DNA
IntAct is an open-source, open data molecular interaction database and toolkit. Data is abstracted from the literature or from direct data depositions by expert curators following a deep annotation model providing a high level of detail. As of September 2009, IntAct contains over 200,000 curated binary interaction evidences. In response to the growing data volume and user requests, IntAct now provides a two-tiered view of the interaction data. The search interface allows the user to iteratively develop complex queries, exploiting the detailed annotation with hierarchical controlled...
Ensembl Genomes (http://www.ensemblgenomes.org) is an integrating resource for genome-scale data from non-vertebrate species, complementing the resources for vertebrate genomics developed in the context of the Ensembl project (http://www.ensembl.org). Together, the two resources provide a consistent set of programmatic and interactive interfaces to a rich range of data including reference sequence, gene models, transcriptional data, genetic variation and comparative analysis. This paper provides an update to the previous publications about the resource,...
The major goal of sequencing humans and many other species is to understand the link between genomic variation, phenotype and disease. There are numerous valuable and well-established variation resources, but collating and making sense of non-homogeneous, often large-scale data sets from disparate sources remains a challenge. Without a systematic catalogue of these data and appropriate query and annotation tools, understanding the genome sequence of an individual and assessing their disease risk is impossible. In Ensembl, we...
Abstract The acceleration of DNA sequencing in samples from patients and population studies has resulted in extensive catalogues of human genetic variation, but the interpretation of rare genetic variants remains problematic. A notable example of this challenge is the existence of disruptive variants in dosage-sensitive disease genes, even in apparently healthy individuals. Here, by manual curation of putative loss-of-function (pLoF) variants in haploinsufficient disease genes in the Genome Aggregation Database (gnomAD) 1, we show that one explanation for...
Abstract Naturally occurring human genetic variants that are predicted to inactivate protein-coding genes provide an in vivo model of human gene inactivation that complements knockout studies in cells and model organisms. Here we report three key findings regarding the assessment of candidate drug targets using loss-of-function variants. First, even essential genes, in which loss-of-function variants are not tolerated, can be highly successful as targets of inhibitory drugs. Second, in most genes, loss-of-function variants are sufficiently rare that genotype-based ascertainment of homozygous or compound...
Upstream open reading frames (uORFs) are tissue-specific cis-regulators of protein translation. Isolated reports have shown that variants that create or disrupt uORFs can cause disease. Here, in a systematic genome-wide study using 15,708 whole genome sequences, we show that variants that create new upstream start codons, and variants disrupting stop sites of existing uORFs, are under strong negative selection. This selection signal is significantly stronger for variants arising upstream of genes intolerant to loss-of-function variants. Furthermore, variants creating...
Although we now have a wealth of information on the transcription patterns of all the genes in the Drosophila genome, much less is known about the properties of the encoded proteins. To provide information on the expression patterns and subcellular localisations of many proteins in parallel, we have performed a large-scale protein trap screen using a hybrid piggyBac vector carrying an artificial exon encoding yellow fluorescent protein (YFP) and protein affinity tags. From screening 41 million embryos, we recovered 616 verified independent YFP-positive lines representing protein traps in 374...
Multi-nucleotide variants (MNVs), defined as two or more nearby variants existing on the same haplotype in an individual, are a clinically and biologically important class of genetic variation. However, existing tools typically do not accurately classify MNVs, and understanding of their mutational origins remains limited. Here, we systematically survey MNVs in 125,748 whole exomes and 15,708 whole genomes from the Genome Aggregation Database (gnomAD). We identify 1,792,248 MNVs across the genome with constituent variants falling within 2 bp distance...
Abstract Human genetic variants predicted to cause loss-of-function of protein-coding genes (pLoF variants) provide natural in vivo models of human gene inactivation and can be valuable indicators of gene function and the potential toxicity of therapeutic inhibitors targeting these genes 1,2. Gain-of-kinase-function variants in LRRK2 are known to significantly increase the risk of Parkinson’s disease 3,4, suggesting that inhibition of LRRK2 kinase activity is a promising therapeutic strategy. While preclinical studies in model organisms have raised some...
Abstract Summary Assessing the pathogenicity of genetic variants can be a complex and challenging task. Spliceogenic variants, which alter mRNA splicing, may yield mature transcripts that encode non-functional protein products, an important predictor of Mendelian disease risk. However, most variant annotation tools do not adequately assess spliceogenicity outside the native splice site and thus the disease-causing potential of variants in other intronic and exonic regions is often overlooked. Here, we present a plugin for...
The Ensembl Variant Effect Predictor (VEP) is a freely available, open-source tool for the annotation and filtering of genomic variants. It predicts variant molecular consequences using the Ensembl/GENCODE or RefSeq gene sets. It also reports phenotype associations from databases such as ClinVar, allele frequencies from studies including gnomAD, and predictions of deleteriousness from tools such as Sorting Intolerant From Tolerant and Combined Annotation Dependent Depletion. VEP includes options to customize prioritization....
Affinity purification coupled to mass spectrometry provides a reliable method for identifying proteins and their binding partners. In this study we have used Drosophila melanogaster proteins triple tagged with Flag, Strep II, and Yellow fluorescent protein in vivo within affinity pull-down experiments and isolated these proteins in their native complexes from embryos. We describe a pipeline for determining interactomes by Parallel Affinity Capture (iPAC) and show its use by identifying partners of several protein baits with a range of sizes and subcellular locations. This protocol...