Jaina Mistry
- Genomics and Phylogenetic Studies
- Advanced Proteomics Techniques and Applications
- Machine Learning in Bioinformatics
- Bioinformatics and Genomic Networks
- Genetics, Bioinformatics, and Biomedical Research
- RNA and protein synthesis mechanisms
- Computational Drug Discovery Methods
- Enzyme Structure and Function
- Protein Structure and Dynamics
- SARS-CoV-2 and COVID-19 Research
- vaccines and immunoinformatics approaches
- Microbial Community Ecology and Physiology
- Algorithms and Data Compression
- Biotechnology and Related Fields
- COVID-19 diagnosis using AI
- Saffron Plant Research Studies
- Analytical Chemistry and Chromatography
- Cell Image Analysis Techniques
- Enzyme Catalysis and Immobilization
- Drug Transport and Resistance Mechanisms
- Forensic and Genetic Research
- Mosquito-borne diseases and control
- Metabolism, Diabetes, and Cancer
- Scientific Computing and Data Management
- Cerebral Venous Sinus Thrombosis
Cambridge University Hospitals NHS Foundation Trust
2023
European Bioinformatics Institute
2008-2020
Wellcome Trust
2013-2018
University of California, San Diego
2015
Wellcome Sanger Institute
2007-2013
Science for Life Laboratory
2013
Howard Hughes Medical Institute
2007-2009
Stockholm University
2007-2009
University College London
2008
Université Claude Bernard Lyon 1
2008
Pfam, available via servers in the UK (http://pfam.sanger.ac.uk/) and the USA (http://pfam.janelia.org/), is a widely used database of protein families, containing 14 831 manually curated entries in the current release, version 27.0. Since the last update article 2 years ago, we have generated 1182 new families and maintained sequence coverage of the UniProt Knowledgebase (UniProtKB) at nearly 80%, despite a 50% increase in the size of the underlying sequence database. Since our 2012 article describing Pfam, we have also undertaken a comprehensive review of the features that...
In the last two years the Pfam database (http://pfam.xfam.org) has undergone a substantial reorganisation to reduce the effort involved in making a release, thereby permitting more frequent releases. Arguably the most significant of these changes is that Pfam is now primarily based on the UniProtKB reference proteomes, with the counts of matched sequences and species reported on the website restricted to this smaller set. Building families on reference proteomes brings greater stability, which decreases the amount of manual curation required to maintain...
The Pfam database is a widely used resource for classifying protein sequences into families and domains. Since Pfam was last described in this journal, over 350 new families have been added in Pfam 33.1 and numerous improvements have been made to existing entries. To facilitate research on COVID-19, we have revised the Pfam entries that cover the SARS-CoV-2 proteome, and built new entries for regions that were not covered by Pfam. We have reintroduced Pfam-B, which provides an automatically generated supplement to Pfam and contains 136 730 novel clusters of sequences that are not yet matched...
The last few years have witnessed significant changes in Pfam (https://pfam.xfam.org). The number of families has grown substantially to a total of 17,929 in release 32.0. New additions have been coupled with efforts to improve existing families, including refinement of domain boundaries, their classification into clans, as well as their functional annotation. We recently began to collaborate with the RepeatsDB resource on the definition of tandem repeat families within Pfam. We carried out a comparison with a structural database, namely the Evolutionary...
Pfam is a widely used database of protein families, currently containing more than 13,000 manually curated protein families as of release 26.0. Pfam is available via servers in the UK (http://pfam.sanger.ac.uk/), the USA (http://pfam.janelia.org/) and Sweden (http://pfam.sbc.su.se/). Here, we report on changes that have occurred since our 2010 NAR paper (release 24.0). Over the last 2 years, we have generated 1840 new families and increased coverage of the UniProt Knowledgebase (UniProtKB) to nearly 80%. Notably, we have taken the step of opening up the annotation...
Pfam is a widely used database of protein families and domains. This article describes a set of major updates that we have implemented in the latest release (version 24.0). The most important change is that we now use HMMER3, the latest version of the popular profile hidden Markov model package. This software is ∼100 times faster than HMMER2 and is more sensitive due to the routine use of the forward algorithm. The move to HMMER3 has necessitated numerous changes to Pfam that are described in detail. Pfam release 24.0 contains 11 912 families, of which a large number have been significantly updated...
The InterPro database (http://www.ebi.ac.uk/interpro/) integrates together predictive models or 'signatures' representing protein domains, families and functional sites from multiple, diverse source databases: Gene3D, PANTHER, Pfam, PIRSF, PRINTS, ProDom, PROSITE, SMART, SUPERFAMILY and TIGRFAMs. Integration is performed manually, and approximately half of the total of 58,000 signatures available in the source databases belong to an InterPro entry. Recently, we have started to also display the remaining un-integrated signatures via our web...
Pfam is a comprehensive collection of protein domains and families, represented as multiple sequence alignments and as profile hidden Markov models. The current release of Pfam (22.0) contains 9318 protein families. Pfam is now based not only on the UniProtKB sequence database, but also on NCBI GenPept and on sequences from selected metagenomics projects. Pfam is available on the web from the consortium members using a new, consistent and improved website design in the UK (http://pfam.sanger.ac.uk/), the USA (http://pfam.janelia.org/) and Sweden (http://pfam.sbc.su.se/), as well as from mirror...
Detection of protein homology via sequence similarity has important applications in biology, from protein structure and function prediction to the reconstruction of phylogenies. Although current methods for aligning protein sequences are powerful, challenges remain, including problems with homologous overextension of alignments and with regions under convergent evolution. Here, we test the ability of the profile hidden Markov model method HMMER3 to correctly assign homologous sequences to >13 000 manually curated families from the Pfam database. We identify problem...
InterPro (http://www.ebi.ac.uk/interpro/) is a freely available database used to classify protein sequences into families and to predict the presence of important domains and sites. InterProScan is the underlying software that allows both protein and nucleic acid sequences to be searched against InterPro's predictive models, which are provided by its member databases. Here, we report recent developments with InterPro and its associated software, including the addition of two new databases (SFLD and CDD), and the functionality to include residue-level annotation...
InterPro is an integrated resource for protein families, domains and functional sites, which integrates the following protein signature databases: PROSITE, PRINTS, ProDom, Pfam, SMART, TIGRFAMs, PIRSF, SUPERFAMILY, Gene3D and PANTHER. The latter two new member databases have been integrated since the last publication in this journal. There have been several new developments in InterPro, including an additional reading field, new database links, extensions to the web interface and additional match XML files. InterPro has always provided matches to UniProtKB proteins on...
Four distinct Plasmodium species are known to regularly infect humans: P. falciparum, P. vivax, P. malariae and P. ovale. The genome sequence of P. falciparum, the cause of the most severe type of human malaria, was completed in 2002 at the same time as that of the mosquito vector, Anopheles gambiae. In this week's Nature, which focuses on the malaria parasite, two further Plasmodium genome sequences are described. First is that of P. vivax, which contributes significant numbers to the incidence of malaria in humans, though in contrast to P. falciparum the resulting disease is usually not fatal. The rather neglected P. vivax genome is presented together...
Approximately 5% of Pfam families are enzymatic, but only a small fraction of the sequences within these families (<0.5%) have had the residues responsible for catalysis determined. To increase the active site annotations in the Pfam database, we have developed a strict set of rules, chosen to reduce the rate of false positives, which enable the transfer of experimentally determined active site residue data to other sequences within the same Pfam family. We have created a large database of predicted active site residues. On comparing our active site predictions to those found in UniProtKB, Catalytic Site Atlas, PROSITE and...
Genome3D (http://www.genome3d.eu) is a collaborative resource that provides predicted domain annotations and structural models for key sequences. Since introducing Genome3D in a previous NAR paper, we have substantially extended and improved the resource. We have annotated representatives from Pfam families to improve coverage of diverse sequences and added a fast sequence search to the website to allow users to find Genome3D-annotated sequences similar to their own. We have improved and extended the Genome3D data, enlarging the source data set from three model organisms to 10, and adding VIVACE, a new...
QSAR models for a diverse set of compounds for cytochrome P450 1A2 inhibition have been produced using 4 statistical approaches: partial least squares (PLS), multiple linear regression (MLR), classification and regression trees (CART), and Bayesian neural networks (BNN). The models complement one another and have identified the following descriptors as important features for CYP1A2 inhibition: lipophilicity, aromaticity, charge, and the HOMO/LUMO energies. Furthermore, all the models are global and have been used to predict an independent set of compounds. For the first time...
Understanding tumor heterogeneity is a major challenge that was recognized as one of the first Cancer Grand Challenges, with a call to provide solutions to visualize tumor heterogeneity. The Rosetta team took on this challenge, exploiting advances in spatial-omics approaches centered around mass spectrometry imaging to map tumor heterogeneity at cellular and molecular scales and at different levels of resolution. See related articles by Bressan et al., p. 16; Stratton et al., p. 22; and Bhattacharjee et al., p. 28.
Cancer Research UK Rosetta Grand Challenge consortium list of members
It is a worthy goal to completely characterize all human proteins in terms of their domains. Here, using the Pfam database, we asked how far we have progressed in this endeavour. Ninety per cent of proteins in the human proteome matched at least one of 5494 manually curated Pfam-A families. In contrast, human residue coverage by Pfam-A families was <45%, with 9418 automatically generated Pfam-B families adding a further 10%. Even after excluding predicted signal peptide regions and short regions (<50 consecutive residues) unlikely to harbour new families, for...
SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2) is a novel virus of the family Coronaviridae. The virus causes the infectious disease COVID-19. The biology of coronaviruses has been studied for many years. However, bioinformatics tools designed explicitly for SARS-CoV-2 have only recently been developed as a rapid reaction to the need for fast detection, understanding, and treatment of COVID-19. To control the ongoing COVID-19 pandemic, it is of utmost importance to get insight into the evolution and pathogenesis of the virus. In this review, we cover workflows...
High-resolution structural knowledge is key to understanding how proteins function at the molecular level. The number of entries in the Protein Data Bank (PDB), the repository of all publicly available protein structures, continues to increase, with more than 8000 structures released in 2012 alone. The authors of this article have studied how structural coverage of the protein-sequence space has changed over time by monitoring the number of Pfam families that acquired their first representative structure each year from 1976 to 2012. Twenty years ago,...