Jaina Mistry

ORCID: 0000-0003-2479-5322
Research Areas
  • Genomics and Phylogenetic Studies
  • Advanced Proteomics Techniques and Applications
  • Machine Learning in Bioinformatics
  • Bioinformatics and Genomic Networks
  • Genetics, Bioinformatics, and Biomedical Research
  • RNA and protein synthesis mechanisms
  • Computational Drug Discovery Methods
  • Enzyme Structure and Function
  • Protein Structure and Dynamics
  • SARS-CoV-2 and COVID-19 Research
  • vaccines and immunoinformatics approaches
  • Microbial Community Ecology and Physiology
  • Algorithms and Data Compression
  • Biotechnology and Related Fields
  • COVID-19 diagnosis using AI
  • Saffron Plant Research Studies
  • Analytical Chemistry and Chromatography
  • Cell Image Analysis Techniques
  • Enzyme Catalysis and Immobilization
  • Drug Transport and Resistance Mechanisms
  • Forensic and Genetic Research
  • Mosquito-borne diseases and control
  • Metabolism, Diabetes, and Cancer
  • Scientific Computing and Data Management
  • Cerebral Venous Sinus Thrombosis

Cambridge University Hospitals NHS Foundation Trust
2023

European Bioinformatics Institute
2008-2020

Wellcome Trust
2013-2018

University of California, San Diego
2015

Wellcome Sanger Institute
2007-2013

Science for Life Laboratory
2013

Howard Hughes Medical Institute
2007-2009

Stockholm University
2007-2009

University College London
2008

Université Claude Bernard Lyon 1
2008

Pfam, available via servers in the UK (http://pfam.sanger.ac.uk/) and USA (http://pfam.janelia.org/), is a widely used database of protein families, containing 14 831 manually curated entries in the current release, version 27.0. Since the last update article 2 years ago, we have generated 1182 new families and maintained sequence coverage of the UniProt Knowledgebase (UniProtKB) at nearly 80%, despite a 50% increase in the size of the underlying database. Since our 2012 article describing Pfam, we have also undertaken a comprehensive review of the features that...

10.1093/nar/gkt1223 article EN cc-by Nucleic Acids Research 2013-11-27

In the last two years the Pfam database (http://pfam.xfam.org) has undergone a substantial reorganisation to reduce the effort involved in making a release, thereby permitting more frequent releases. Arguably the most significant of these changes is that Pfam is now primarily based on the UniProtKB reference proteomes, with the counts of matched sequences and species reported on the website restricted to this smaller set. Building families on reference proteomes brings greater stability, which decreases the amount of manual curation required to maintain...

10.1093/nar/gkv1344 article EN cc-by-nc Nucleic Acids Research 2015-12-15

The Pfam database is a widely used resource for classifying protein sequences into families and domains. Since Pfam was last described in this journal, over 350 new families have been added in Pfam 33.1 and numerous improvements have been made to existing entries. To facilitate research on COVID-19, we have revised the Pfam entries that cover the SARS-CoV-2 proteome, and built new entries for regions that were not covered by Pfam. We have reintroduced Pfam-B, which provides an automatically generated supplement to Pfam and contains 136 730 novel clusters of sequences that are not yet matched...

10.1093/nar/gkaa913 article EN cc-by Nucleic Acids Research 2020-10-06

The last few years have witnessed significant changes in Pfam (https://pfam.xfam.org). The number of families has grown substantially to a total of 17,929 in release 32.0. New additions have been coupled with efforts to improve existing families, including refinement of domain boundaries, their classification into clans, as well as their functional annotation. We recently began to collaborate with the RepeatsDB resource on the definition of tandem repeat families within Pfam. We carried out a comparison with a structural database, namely Evolutionary...

10.1093/nar/gky995 article EN cc-by Nucleic Acids Research 2018-10-09

Pfam is a widely used database of protein families, currently containing more than 13,000 manually curated families as of release 26.0. Pfam is available via servers in the UK (http://pfam.sanger.ac.uk/), the USA (http://pfam.janelia.org/) and Sweden (http://pfam.sbc.su.se/). Here, we report on changes that have occurred since our 2010 NAR paper (release 24.0). Over the last 2 years, we have generated 1840 new families and increased coverage of the UniProt Knowledgebase (UniProtKB) to nearly 80%. Notably, we have taken the step of opening up the annotation...

10.1093/nar/gkr1065 article EN Nucleic Acids Research 2011-11-29

Pfam is a widely used database of protein families and domains. This article describes a set of major updates that we have implemented in the latest release (version 24.0). The most important change is that we now use HMMER3, the latest version of the popular profile hidden Markov model package. This software is ∼100 times faster than HMMER2 and is more sensitive due to the routine use of the forward algorithm. The move to HMMER3 has necessitated numerous changes to Pfam that are described in detail. Pfam release 24.0 contains 11 912 families, of which a large number have been significantly updated...

10.1093/nar/gkp985 article EN cc-by-nc Nucleic Acids Research 2009-11-17

The InterPro database (http://www.ebi.ac.uk/interpro/) integrates together predictive models or 'signatures' representing protein domains, families and functional sites from multiple, diverse source databases: Gene3D, PANTHER, Pfam, PIRSF, PRINTS, ProDom, PROSITE, SMART, SUPERFAMILY and TIGRFAMs. Integration is performed manually, and approximately half of the total of 58,000 signatures available in the source databases belong to an InterPro entry. Recently, we have started to also display the remaining un-integrated signatures via our web...

10.1093/nar/gkn785 article EN cc-by-nc Nucleic Acids Research 2008-10-21

Pfam is a comprehensive collection of protein domains and families, represented as multiple sequence alignments and as profile hidden Markov models. The current release of Pfam (22.0) contains 9318 protein families. Pfam is now based not only on the UniProtKB sequence database, but also on NCBI GenPept and on sequences from selected metagenomics projects. Pfam is available on the web from the consortium members using a new, consistent and improved website design in the UK (http://pfam.sanger.ac.uk/), the USA (http://pfam.janelia.org/) and Sweden (http://pfam.sbc.su.se/), as well as from mirror...

10.1093/nar/gkm960 article EN cc-by-nc Nucleic Acids Research 2007-11-26

Detection of protein homology via sequence similarity has important applications in biology, from protein structure and function prediction to reconstruction of phylogenies. Although current methods for aligning protein sequences are powerful, challenges remain, including problems with homologous overextension of alignments and with regions under convergent evolution. Here, we test the ability of the profile hidden Markov model method HMMER3 to correctly assign sequences to >13 000 manually curated families from the Pfam database. We identify problem...

10.1093/nar/gkt263 article EN cc-by Nucleic Acids Research 2013-04-17

InterPro (http://www.ebi.ac.uk/interpro/) is a freely available database used to classify protein sequences into families and to predict the presence of important domains and sites. InterProScan is the underlying software that allows both protein and nucleic acid sequences to be searched against InterPro's predictive models, which are provided by its member databases. Here, we report recent developments with InterPro and its associated software, including the addition of two new databases (SFLD and CDD), functionality to include residue-level annotation...

10.1093/nar/gkw1107 article EN cc-by Nucleic Acids Research 2016-10-27

InterPro is an integrated resource for protein families, domains and functional sites, which integrates the following protein signature databases: PROSITE, PRINTS, ProDom, Pfam, SMART, TIGRFAMs, PIRSF, SUPERFAMILY, Gene3D and PANTHER. The latter two new member databases have been integrated since the last publication in this journal. There have been several new developments in InterPro, including an additional reading field, new database links, extensions to the web interface and additional match XML files. InterPro has always provided matches to UniProtKB proteins on...

10.1093/nar/gkl841 article EN cc-by-nc Nucleic Acids Research 2007-01-03

Four distinct Plasmodium species are known to regularly infect humans: P. falciparum, P. vivax, P. malariae and P. ovale. The genome sequence of the cause of the most severe type of human malaria was completed in 2002, at the same time as that of the mosquito vector, Anopheles gambiae. In this week's Nature, which focuses on the malaria parasite, two further genome sequences are described. The first is that of a species that contributes significant numbers to the incidence of malaria in humans, though in contrast the resulting disease is usually not fatal. The second, rather neglected, is presented together...

10.1038/nature07306 article EN cc-by-nc-sa Nature 2008-10-01

Approximately 5% of Pfam families are enzymatic, but only a small fraction of the sequences within these families (<0.5%) have had the residues responsible for catalysis determined. To increase the active site annotations in the Pfam database, we have developed a strict set of rules, chosen to reduce the rate of false positives, which enable the transfer of experimentally determined active site residue data to other sequences within the same Pfam family. We have created a large database of predicted active site residues. On comparing our active site predictions to those found in UniProtKB, Catalytic Site Atlas, PROSITE and...

10.1186/1471-2105-8-298 article EN cc-by BMC Bioinformatics 2007-08-09

Genome3D (http://www.genome3d.eu) is a collaborative resource that provides predicted domain annotations and structural models for key sequences. Since introducing Genome3D in a previous NAR paper, we have substantially extended and improved the resource. We have annotated representatives from Pfam families to improve coverage of diverse sequences and added a fast sequence search to the website to allow users to find Genome3D-annotated sequences similar to their own. We have also extended the data, enlarging the source data set from three model organisms to 10, and adding VIVACE, a new...

10.1093/nar/gku973 article EN cc-by Nucleic Acids Research 2014-10-27

QSAR models for a diverse set of compounds for cytochrome P450 1A2 inhibition have been produced using 4 statistical approaches: partial least squares (PLS), multiple linear regression (MLR), classification and regression trees (CART), and Bayesian neural networks (BNN). The models complement one another and have identified the following descriptors as important features for CYP1A2 inhibition: lipophilicity, aromaticity, charge, and the HOMO/LUMO energies. Furthermore, all the models are global and have been used to predict an independent set of compounds. For the first time...

10.1021/jm048959a article EN Journal of Medicinal Chemistry 2005-07-12
Richard J. A. Goodwin, John F. Marshall, George Poulogiannis, Mariia Yuneva, Kevin M. Brindle, Zoltán Takáts, Owen J. Sansom, Josephine Bunch, Simon T. Barry, Ala Al-Afeef, Ala Amgheib, Sassan M. Azarian, Frederic Brochu-Williams, Amy Burton, Andrew D. Campbell, Vanina Cristaudo, Ali Anıl Demirçalı, Alex Dexter, Efstathios A. Elia, Maria Fala, Magali Garrett, Avinash Ghanate, Ian S. Gilmore, Ariadna Gonzalez, Paul Grant, Amit Gupta, Dipa Gurung, Harry Hall, Grégory Hamm, Paolo Inglese, Nahid Islam, Stewart Jones, Evdoxia Karali, Emine Kazanç, Hanifa Koguna, Manas Kohli, Hanifa Koquna, Nikos Koundouros, Peter Kreuzaler, Melina Kyriazi, Sharanpreet Lall, David Y. Lewis, Stephanie Ling, Xavier Loizeau, Dominika Luptáková, Stefania Maneta-Stravrakaki, Daniel McGill, Daniel McGill, James S. McKenzie, G. McMahon, Martin Metodiev, Jaina Mistry, Alvaro Perdones Monteiro, Erica Montezuma, Jennifer P. Morton, Catherine Munteanu, Teresa Murta, Arafath K. Najumudeen, Chandan Seth Nanda, Ammar Nasif, Chelsea J. Nikula, Madelon Paauwe, Petra Paiz, Yulia Panina, Robin Philip, Liam Poynter, Pamela Pruski, Alan Race, Jyotsna U. Rao, Jack Richings, Adele Savage, Chandan Seth Nanda, Shreya Sharma, Renata Soares, Dmitry A. Soloviev, Amy R. Spicer-Hadlington, Caroline Sproat, Rory T. Steven, David Sumpton, Adam Taylor, Spencer A. Thomas, Daria Thompson, Aurélien Tripp, Thanasis Tsalikis, Anastasia Tsyben, Seyma Turseven, Johan Vande Voorde, Jean‐Luc Vorng, Emma White, Alan F. Wright, Vincen Wu, Yong‐Bing Xiang, Bin Yan, Jiang Zao, Lucas Zeiger, Junting Zhang, Weiwei Zhou

Understanding tumor heterogeneity is a major challenge that was recognized as one of the first Cancer Grand Challenges, with a call to provide solutions to visualize tumor heterogeneity. The Rosetta team took on this challenge, exploiting advances in spatial-omics approaches centered around mass spectrometry imaging to map tumors at cellular and molecular scales and at different levels of resolution. See related article by Bressan et al., p. 16. See related article by Stratton et al., p. 22. See related article by Bhattacharjee et al., p. 28.

10.1158/2159-8290.cd-24-0016 article EN Cancer Discovery 2025-01-13

Summary: Understanding tumor heterogeneity is a major challenge that was recognized as one of the first Cancer Grand Challenges, with a call to provide solutions to visualize tumor heterogeneity. The Rosetta team took on this challenge, exploiting advances in spatial-omics approaches centered around mass spectrometry imaging to map tumors at cellular and molecular scales and at different levels of resolution...

10.1158/2159-8290.c.7623370 preprint EN 2025-01-13

Cancer Research UK Rosetta Grand Challenge consortium list of members

10.1158/2159-8290.28193758 preprint EN cc-by 2025-01-13

Cancer Research UK Rosetta Grand Challenge consortium list of members

10.1158/2159-8290.28229000 preprint EN 2025-01-17

10.1007/978-1-59745-515-2_4 article EN Methods in molecular biology 2007-01-01

It is a worthy goal to completely characterize all human proteins in terms of their domains. Here, using the Pfam database, we asked how far we have progressed in this endeavour. Ninety per cent of the human proteome matched at least one of 5494 manually curated Pfam-A families. In contrast, residue coverage by Pfam-A families was <45%, with 9418 automatically generated Pfam-B families adding a further 10%. Even after excluding predicted signal peptide regions and short regions (<50 consecutive residues) unlikely to harbour new families, for...

10.1093/database/bat023 article EN cc-by-nc Database 2013-01-01

SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2) is a novel virus of the family Coronaviridae. The virus causes the infectious disease COVID-19. The biology of coronaviruses has been studied for many years. However, bioinformatics tools designed explicitly for SARS-CoV-2 have only recently been developed as a rapid reaction to the need for fast detection, understanding, and treatment of COVID-19. To control the ongoing COVID-19 pandemic, it is of utmost importance to get insight into the evolution and pathogenesis of the virus. In this review, we cover workflows...

10.20944/preprints202005.0376.v1 preprint EN 2020-05-23

High-resolution structural knowledge is key to understanding how proteins function at the molecular level. The number of entries in the Protein Data Bank (PDB), the repository of all publicly available protein structures, continues to increase, with more than 8000 structures released in 2012 alone. The authors of this article have studied how the coverage of protein-sequence space has changed over time by monitoring the number of Pfam families that acquired their first representative structure each year from 1976 to 2012. Twenty years ago,...

10.1107/s0907444913027157 article EN cc-by Acta Crystallographica Section D Biological Crystallography 2013-10-11