NFDI4DS | UHH-SEMS - Publication Details

Jaina Mistry

ORCID: 0000-0003-2479-5322

About

Contact & Profiles

OpenAlex author ID: A5024885643

Research Areas

Genomics and Phylogenetic Studies
Advanced Proteomics Techniques and Applications
Machine Learning in Bioinformatics
Bioinformatics and Genomic Networks
Genetics, Bioinformatics, and Biomedical Research
RNA and protein synthesis mechanisms
Computational Drug Discovery Methods
Enzyme Structure and Function
Protein Structure and Dynamics
SARS-CoV-2 and COVID-19 Research
vaccines and immunoinformatics approaches
Microbial Community Ecology and Physiology
Algorithms and Data Compression
Biotechnology and Related Fields
COVID-19 diagnosis using AI
Saffron Plant Research Studies
Analytical Chemistry and Chromatography
Cell Image Analysis Techniques
Enzyme Catalysis and Immobilization
Drug Transport and Resistance Mechanisms
Forensic and Genetic Research
Mosquito-borne diseases and control
Metabolism, Diabetes, and Cancer
Scientific Computing and Data Management
Cerebral Venous Sinus Thrombosis

Affiliations

Cambridge University Hospitals NHS Foundation Trust
2023

European Bioinformatics Institute
2008-2020

Wellcome Trust
2013-2018

University of California, San Diego
2015

Wellcome Sanger Institute
2007-2013

Science for Life Laboratory
2013

Howard Hughes Medical Institute
2007-2009

Stockholm University
2007-2009

University College London
2008

Université Claude Bernard Lyon 1
2008

Pfam: the protein families database

OPENALEX - Publications

Robert Finn, Alex Bateman, Jody Clements, Penelope Coggill, Ruth Y. Eberhardt and 8 more

Pfam, available via servers in the UK (http://pfam.sanger.ac.uk/) and the USA (http://pfam.janelia.org/), is a widely used database of protein families, containing 14 831 manually curated entries in the current release, version 27.0. Since the last update article 2 years ago, we have generated 1182 new families and maintained sequence coverage of the UniProt Knowledgebase (UniProtKB) at nearly 80%, despite a 50% increase in the size of the underlying database. Since our 2012 article describing Pfam, we have also undertaken a comprehensive review of the features that...

10.1093/nar/gkt1223 article EN cc-by Nucleic Acids Research 2013-11-27

The Pfam protein families database: towards a more sustainable future

Robert Finn, Penelope Coggill, Ruth Y. Eberhardt, Sean R. Eddy, Jaina Mistry and 8 more

In the last two years, the Pfam database (http://pfam.xfam.org) has undergone a substantial reorganisation to reduce the effort involved in making a release, thereby permitting more frequent releases. Arguably the most significant of these changes is that Pfam is now primarily based on UniProtKB reference proteomes, with the counts of matched sequences and species reported on the website restricted to this smaller set. Building families on reference proteomes brings greater stability, which decreases the amount of manual curation required to maintain...

10.1093/nar/gkv1344 article EN cc-by-nc Nucleic Acids Research 2015-12-15

Pfam: The protein families database in 2021

Jaina Mistry, Sara Chuguransky, Lowri Williams, Matloob Qureshi, Gustavo A. Salazar and 7 more

The Pfam database is a widely used resource for classifying protein sequences into families and domains. Since Pfam was last described in this journal, over 350 new families have been added in Pfam 33.1 and numerous improvements have been made to existing entries. To facilitate research on COVID-19, we have revised the Pfam entries that cover the SARS-CoV-2 proteome and built new entries for regions that were not covered by Pfam. We have also reintroduced Pfam-B, which provides an automatically generated supplement to Pfam and contains 136 730 novel clusters of sequences that are not yet matched...

10.1093/nar/gkaa913 article EN cc-by Nucleic Acids Research 2020-10-06

The Pfam protein families database in 2019

Sara El-Gebali, Jaina Mistry, Alex Bateman, Sean R. Eddy, Aurélien Luciani and 11 more

The last few years have witnessed significant changes in Pfam (https://pfam.xfam.org). The number of families has grown substantially, to a total of 17,929 in release 32.0. New additions have been coupled with efforts to improve existing families, including refinement of domain boundaries, their classification into clans, and their functional annotation. We recently began to collaborate with the RepeatsDB resource to improve the definition of tandem repeat families within Pfam. We also carried out a comparison with a structural database, namely the Evolutionary...

10.1093/nar/gky995 article EN cc-by Nucleic Acids Research 2018-10-09

The Pfam protein families database

Marco Punta, Penny Coggill, Ruth Y. Eberhardt, Jaina Mistry, John Tate and 11 more

Pfam is a widely used database of protein families, currently containing more than 13,000 manually curated families as of release 26.0. Pfam is available via servers in the UK (http://pfam.sanger.ac.uk/), the USA (http://pfam.janelia.org/) and Sweden (http://pfam.sbc.su.se/). Here, we report on changes that have occurred since our 2010 NAR paper (release 24.0). Over the last 2 years, we have generated 1840 new families and increased coverage of the UniProt Knowledgebase (UniProtKB) to nearly 80%. Notably, we have taken the step of opening up the annotation...

10.1093/nar/gkr1065 article EN Nucleic Acids Research 2011-11-29

The Pfam protein families database

Robert Finn, Jaina Mistry, John Tate, Penny Coggill, Andreas Heger and 9 more

Pfam is a widely used database of protein families and domains. This article describes a set of major updates that we have implemented in the latest release (version 24.0). The most important change is that we now use HMMER3, the latest version of the popular profile hidden Markov model package. This software is ∼100 times faster than HMMER2 and more sensitive due to the routine use of the forward algorithm. The move to HMMER3 has necessitated numerous changes, which are described in detail. Pfam 24.0 contains 11 912 families, of which a large number have been significantly updated...

10.1093/nar/gkp985 article EN cc-by-nc Nucleic Acids Research 2009-11-17

InterPro: the integrative protein signature database

Sarah Hunter, Rolf Apweiler, Teresa K. Attwood, Amos Bairoch, Alex Bateman and 33 more

The InterPro database (http://www.ebi.ac.uk/interpro/) integrates predictive models, or 'signatures', representing protein domains, families and functional sites from multiple, diverse source databases: Gene3D, PANTHER, Pfam, PIRSF, PRINTS, ProDom, PROSITE, SMART, SUPERFAMILY and TIGRFAMs. Integration is performed manually, and approximately half of the total of 58,000 signatures available in the source databases belong to an InterPro entry. Recently, we have also started to display the remaining un-integrated signatures via our web...

10.1093/nar/gkn785 article EN cc-by-nc Nucleic Acids Research 2008-10-21

The Pfam protein families database

Robert Finn, John Tate, Jaina Mistry, Penny Coggill, Stephen‐John Sammut and 6 more

Pfam is a comprehensive collection of protein domains and families, represented as multiple sequence alignments and as profile hidden Markov models. The current release of Pfam (22.0) contains 9318 families. Pfam is now based not only on the UniProtKB database, but also on NCBI GenPept and on sequences from selected metagenomics projects. Pfam is available on the web from the consortium members, using a new, consistent and improved website design, in the UK (http://pfam.sanger.ac.uk/), the USA (http://pfam.janelia.org/) and Sweden (http://pfam.sbc.su.se/), as well as from mirror...

10.1093/nar/gkm960 article EN cc-by-nc Nucleic Acids Research 2007-11-26

Challenges in homology search: HMMER3 and convergent evolution of coiled-coil regions

Jaina Mistry, Robert Finn, Sean R. Eddy, Alex Bateman, Marco Punta

Detection of protein homology via sequence similarity has important applications in biology, from protein structure and function prediction to the reconstruction of phylogenies. Although current methods for aligning protein sequences are powerful, challenges remain, including problems with homologous overextension of alignments and with regions under convergent evolution. Here, we test the ability of the profile hidden Markov model method HMMER3 to correctly assign sequences to >13 000 manually curated families from the Pfam database. We identify problem...

10.1093/nar/gkt263 article EN cc-by Nucleic Acids Research 2013-04-17

InterPro in 2017—beyond protein family and domain annotations

Robert Finn, Teresa K. Attwood, Patricia C. Babbitt, Alex Bateman, Peer Bork and 42 more

InterPro (http://www.ebi.ac.uk/interpro/) is a freely available database used to classify protein sequences into families and to predict the presence of important domains and sites. InterProScan is the underlying software that allows both protein and nucleic acid sequences to be searched against InterPro's predictive models, which are provided by its member databases. Here, we report recent developments with InterPro and its associated software, including the addition of two new member databases (SFLD and CDD) and functionality to include residue-level annotation...

10.1093/nar/gkw1107 article EN cc-by Nucleic Acids Research 2016-10-27

New developments in the InterPro database

Nicola Mulder, Rolf Apweiler, Teresa K. Attwood, Amos Bairoch, Alex Bateman and 40 more

InterPro is an integrated resource for protein families, domains and functional sites, which integrates the following protein signature databases: PROSITE, PRINTS, ProDom, Pfam, SMART, TIGRFAMs, PIRSF, SUPERFAMILY, Gene3D and PANTHER. The latter two are new member databases that have been integrated since the last publication in this journal. There have been several new developments in InterPro, including an additional reading field, new database links, extensions to the web interface and additional match XML files. InterPro has always provided matches to UniProtKB proteins on...

10.1093/nar/gkl841 article EN cc-by-nc Nucleic Acids Research 2007-01-03

The genome of the simian and human malaria parasite Plasmodium knowlesi

Arnab Pain, Ulrike Böhme, Andrew Berry, Karen Mungall, Robert Finn and 49 more

Four distinct Plasmodium species are known to regularly infect humans: P. falciparum, P. vivax, P. malariae and P. ovale. The genome sequence of P. falciparum, the cause of the most severe type of human malaria, was completed in 2002 at the same time as that of its mosquito vector, Anopheles gambiae. In this week's Nature, which focuses on the malaria parasite, two further Plasmodium genome sequences are described. The first is that of P. vivax, which contributes significantly to the incidence of malaria in humans, though in contrast to P. falciparum the resulting disease is usually not fatal. The rather neglected P. knowlesi is presented together...

10.1038/nature07306 article EN cc-by-nc-sa Nature 2008-10-01

Predicting active site residue annotations in the Pfam database

Jaina Mistry, Alex Bateman, Robert Finn

Approximately 5% of Pfam families are enzymatic, but only a small fraction of the sequences within these families (<0.5%) have had the residues responsible for catalysis determined. To increase the number of active site annotations in the database, we developed a strict set of rules, chosen to reduce the rate of false positives, which enable the transfer of experimentally determined active site residue data to other sequences within the same family. We created a large database of predicted active site residues. On comparing our predictions with those found in UniProtKB, the Catalytic Site Atlas, PROSITE and...

10.1186/1471-2105-8-298 article EN cc-by BMC Bioinformatics 2007-08-09

Genome3D: exploiting structure to help users understand their sequences

Tony E. Lewis, Ian Sillitoe, Antonina Andreeva, Tom L. Blundell, Daniel Buchan and 19 more

Genome3D (http://www.genome3d.eu) is a collaborative resource that provides predicted domain annotations and structural models for key sequences. Since introducing Genome3D in a previous NAR paper, we have substantially extended and improved the resource. We have annotated representatives from Pfam families to improve coverage of diverse sequences, and added a fast sequence search to the website to allow users to find Genome3D-annotated sequences similar to their own. We have also extended the Genome3D data, enlarging the source data set from three model organisms to 10, and adding VIVACE, a new...

10.1093/nar/gku973 article EN cc-by Nucleic Acids Research 2014-10-27

A Rapid Computational Filter for Cytochrome P450 1A2 Inhibition Potential of Compound Libraries

Kamaldeep K. Chohan, Stuart W. Paine, Jaina Mistry, Patrick Barton, A. M. Davis

QSAR models for a diverse set of compounds for cytochrome P450 1A2 inhibition have been produced using four statistical approaches: partial least squares (PLS), multiple linear regression (MLR), classification and regression trees (CART), and Bayesian neural networks (BNN). The models complement one another and identified the following descriptors as important features for CYP1A2 inhibition: lipophilicity, aromaticity, charge, and HOMO/LUMO energies. Furthermore, all models are global and were used to predict an independent set of compounds. For the first time...

10.1021/jm048959a article EN Journal of Medicinal Chemistry 2005-07-12

Visualizing Cancer Heterogeneity at the Molecular and Cellular Levels: Lessons from Rosetta

Richard J. A. Goodwin, John F. Marshall, George Poulogiannis, Mariia Yuneva, Kevin M. Brindle and 92 more

Understanding tumor heterogeneity is a major challenge that was recognized as one of the first Cancer Grand Challenges, with a call to provide solutions to visualize heterogeneity. The Rosetta team took on this challenge, exploiting advances in spatial-omics approaches centered around mass spectrometry imaging to map heterogeneity at cellular and molecular scales and at different levels of resolution. See related articles by Bressan et al., p. 16; Stratton et al., p. 22; and Bhattacharjee et al., p. 28.

10.1158/2159-8290.cd-24-0016 article EN Cancer Discovery 2025-01-13

Data from Visualizing Cancer Heterogeneity at the Molecular and Cellular Levels: Lessons from Rosetta

Richard J. A. Goodwin, John F. Marshall, George Poulogiannis, Mariia Yuneva, Kevin M. Brindle and 90 more

Summary: Understanding tumor heterogeneity is a major challenge that was recognized as one of the first Cancer Grand Challenges, with a call to provide solutions to visualize heterogeneity. The Rosetta team took on this challenge, exploiting advances in spatial-omics approaches centered around mass spectrometry imaging to map heterogeneity at cellular and molecular scales and at different levels of resolution...

10.1158/2159-8290.c.7623370 preprint EN 2025-01-13

List of Consortium Members from Visualizing Cancer Heterogeneity at the Molecular and Cellular Levels: Lessons from Rosetta

Richard J. A. Goodwin, John F. Marshall, George Poulogiannis, Mariia Yuneva, Kevin M. Brindle and 90 more

Cancer Research UK Rosetta Grand Challenge consortium list of members

10.1158/2159-8290.28193758 preprint EN cc-by 2025-01-13

List of Consortium Members from Visualizing Cancer Heterogeneity at the Molecular and Cellular Levels: Lessons from Rosetta

Richard J. A. Goodwin, John F. Marshall, George Poulogiannis, Mariia Yuneva, Kevin M. Brindle and 90 more

Cancer Research UK Rosetta Grand Challenge consortium list of members

10.1158/2159-8290.28229000 preprint EN 2025-01-17

Pfam

Jaina Mistry, Robert Finn

10.1007/978-1-59745-515-2_4 article EN Methods in molecular biology 2007-01-01

The challenge of increasing Pfam coverage of the human proteome

Jaina Mistry, Penny Coggill, Ruth Y. Eberhardt, Antonio Deiana, Andrea Giansanti and 3 more

It is a worthy goal to completely characterize all human proteins in terms of their domains. Here, using the Pfam database, we asked how far we have progressed in this endeavour. Ninety per cent of the human proteome matched at least one of 5494 manually curated Pfam-A families. In contrast, residue coverage by Pfam-A families was <45%, with 9418 automatically generated Pfam-B families adding a further 10%. Even after excluding predicted signal peptide regions and short regions (<50 consecutive residues) unlikely to harbour new families, for...

10.1093/database/bat023 article EN cc-by-nc Database 2013-01-01

Computational Strategies to Combat COVID-19: Useful Tools to Accelerate SARS-CoV-2 and Coronavirus Research

Franziska Hufsky, Kevin Lamkiewicz, Alexandre Almeida, Abdel Aouacheria, Cecilia Arighi and 50 more

SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2) is a novel virus of the family Coronaviridae. The virus causes the infectious disease COVID-19. The biology of coronaviruses has been studied for many years. However, bioinformatics tools designed explicitly for SARS-CoV-2 have only recently been developed as a rapid reaction to the need for fast detection, understanding, and treatment of COVID-19. To control the ongoing COVID-19 pandemic, it is of utmost importance to gain insight into the evolution and pathogenesis of the virus. In this review, we cover workflows...

10.20944/preprints202005.0376.v1 preprint EN 2020-05-23

An estimated 5% of new protein structures solved today represent a new Pfam family

Jaina Mistry, Edda Kloppmann, Burkhard Rost, Marco Punta

High-resolution structural knowledge is key to understanding how proteins function at the molecular level. The number of entries in the Protein Data Bank (PDB), the repository of all publicly available protein structures, continues to increase, with more than 8000 structures released in 2012 alone. The authors of this article have studied how structural coverage of protein-sequence space has changed over time by monitoring the Pfam families that acquired their first representative structure each year from 1976 to 2012. Twenty years ago,...

10.1107/s0907444913027157 article EN cc-by Acta Crystallographica Section D Biological Crystallography 2013-10-11
