Stephen Anyango

ORCID: 0000-0003-4838-443X
Research Areas
  • Enzyme Structure and Function
  • Protein Structure and Dynamics
  • Advanced Proteomics Techniques and Applications
  • Genomics and Phylogenetic Studies
  • Microbial Metabolic Engineering and Bioproduction
  • RNA and protein synthesis mechanisms
  • Bioinformatics and Genomic Networks
  • Genetics, Bioinformatics, and Biomedical Research
  • RNA modifications and cancer
  • Cell Image Analysis Techniques
  • Biofuel production and bioconversion
  • Computational Drug Discovery Methods
  • RNA Research and Splicing
  • Machine Learning in Bioinformatics

European Bioinformatics Institute
2017-2025

The AlphaFold Protein Structure Database (AlphaFold DB, https://alphafold.ebi.ac.uk) is an openly accessible, extensive database of high-accuracy protein-structure predictions. Powered by AlphaFold v2.0 of DeepMind, it has enabled an unprecedented expansion of the structural coverage of the known protein-sequence space. AlphaFold DB provides programmatic access to and interactive visualization of predicted atomic coordinates, per-residue and pairwise model-confidence estimates and predicted aligned errors. The initial release of AlphaFold DB contains over 360,000...

10.1093/nar/gkab1061 article EN cc-by Nucleic Acids Research 2021-10-19

The Protein Data Bank (PDB) is the single global archive of experimentally determined three-dimensional (3D) structure data of biological macromolecules. Since 2003, the PDB has been managed by the Worldwide Protein Data Bank (wwPDB; wwpdb.org), an international consortium that collaboratively oversees deposition, validation, biocuration, and open access dissemination of 3D macromolecular structure data. The PDB Core Archive houses 3D atomic coordinates of more than 144 000 structural models of proteins, DNA/RNA, and their complexes with metals and small...

10.1093/nar/gky949 article EN cc-by Nucleic Acids Research 2018-10-05

The Protein Data Bank in Europe (PDBe), a founding member of the Worldwide Protein Data Bank (wwPDB), actively participates in the deposition, curation, validation, archiving and dissemination of macromolecular structure data. PDBe supports diverse research communities in their use of macromolecular structures by enriching the PDB data and by providing advanced tools and services for effective data access, visualization and analysis. This paper details the enrichment of data at PDBe, including mapping of RNA structures to Rfam, and identification of molecules that act as cofactors. PDBe has...

10.1093/nar/gkz990 article EN cc-by Nucleic Acids Research 2019-10-25

The Protein Data Bank in Europe-Knowledge Base (PDBe-KB, https://pdbe-kb.org) is a community-driven, collaborative resource for literature-derived, manually curated and computationally predicted structural and functional annotations of macromolecular structure data, contained in the Protein Data Bank (PDB). The goal of PDBe-KB is two-fold: (i) to increase the visibility and reduce the fragmentation of annotations contributed by specialist data resources, and to make these data more findable, accessible, interoperable and reusable (FAIR) and (ii) to place macromolecular structure data in their...

10.1093/nar/gkz853 article EN cc-by Nucleic Acids Research 2019-10-01

The Protein Data Bank in Europe - Knowledge Base (PDBe-KB, https://pdbe-kb.org) is an open collaboration between world-leading specialist data resources contributing functional and biophysical annotations derived from or relevant to the Protein Data Bank (PDB). The goal of PDBe-KB is to place macromolecular structure data in their biological context by developing standardised data exchange formats and integrating functional annotations from the contributing partner resources into a knowledge graph that can provide valuable biological insights. Since we described PDBe-KB in 2019, there have been significant...

10.1093/nar/gkab988 article EN cc-by Nucleic Acids Research 2021-10-15

Studying protein dynamics and conformational heterogeneity is crucial for understanding biomolecular systems and treating disease. Despite the deposition of over 215 000 macromolecular structures in the Protein Data Bank and the advent of AI-based structure prediction tools such as AlphaFold2, RoseTTAFold, and ESMFold, static representations are typically produced, which fail to fully capture motion. Here, we discuss the importance of integrating experimental structures with computational clustering to explore the conformational landscapes that manifest...

10.1063/4.0000251 article EN cc-by Structural Dynamics 2024-05-01

The Protein Data Bank in Europe (PDBe, pdbe.org) is actively engaged in the deposition, annotation, remediation, enrichment and dissemination of macromolecular structure data. This paper describes new developments and improvements at PDBe addressing three challenging areas: data enrichment, data dissemination and functional reusability. New features of the PDBe Web site are discussed, including a context dependent menu providing links to raw experimental data and improved presentation of structures solved by hybrid methods. The paper also summarizes...

10.1093/nar/gkx1070 article EN cc-by Nucleic Acids Research 2017-10-26

The archiving and dissemination of protein and nucleic acid structures as well as their structural, functional and biophysical annotations is an essential task that enables the broader scientific community to conduct impactful research in multiple fields of the life sciences. The Protein Data Bank in Europe (PDBe; pdbe.org) team develops and maintains several databases and web services to address this fundamental need. From data archiving as a member of the Worldwide PDB consortium (wwPDB; wwpdb.org), to the PDBe Knowledge Base (PDBe‐KB;...

10.1002/pro.4439 article EN cc-by Protein Science 2022-09-28

RNA secondary (2D) structure visualization is an essential tool for understanding RNA function. R2DT is a software package designed to visualize RNA 2D structures in consistent, recognizable, and reproducible layouts. The latest release, R2DT 2.0, introduces multiple significant features, including the ability to display position-specific information, such as single nucleotide polymorphisms or SHAPE reactivities. It also offers a new template-free mode allowing visualization of RNAs without pre-existing templates, alongside...

10.1093/nar/gkaf032 article EN cc-by Nucleic Acids Research 2025-01-14

While scientists can often infer the biological function of proteins from their 3-dimensional quaternary structures, the gap between the number of known protein sequences and their experimentally determined structures keeps increasing. A potential solution to this problem is presented by ever more sophisticated computational modeling approaches. While powerful on their own, most methods have strengths and weaknesses. Therefore, it benefits researchers to examine models from various model providers and perform comparative analysis...

10.1093/gigascience/giac118 article EN GigaScience 2022-01-01

The PDBe aggregated API is an open-access and open-source RESTful API that provides programmatic access to a wealth of macromolecular structural data and their functional and biophysical annotations through 80+ API endpoints. The API is powered by the PDBe graph database (https://pdbe.org/graph-schema), an integrative knowledge graph that can be used as a discovery tool to answer complex biological questions. The API is up-to-date with the graph database, which has weekly releases with the latest data from the Protein Data Bank, integrated with updated annotations from UniProt, Pfam, CATH, SCOP...

10.1093/bioinformatics/btab424 article EN cc-by Bioinformatics 2021-06-02

More than 61,000 proteins have up-to-date correspondence between their amino acid sequence (UniProtKB) and their 3D structures (PDB), enabled by the Structure Integration with Function, Taxonomy and Sequences (SIFTS) resource. SIFTS incorporates residue-level annotations from many other biological resources. SIFTS data is available in various formats like XML, CSV and TSV format or also accessible via the PDBe REST API but always maintained separately from the structure data (PDBx/mmCIF file) in the PDB archive. Here, we extended the wwPDB...

10.1038/s41597-023-02101-6 article EN cc-by Scientific Data 2023-04-12

While the Protein Data Bank (PDB) contains a wealth of structural information on ligands bound to macromolecules, their analysis can be challenging due to the large amount and diversity of data. Here, we present PDBe CCDUtils, a versatile toolkit for processing and analysing small molecules from the PDB in PDBx/mmCIF format. PDBe CCDUtils provides streamlined access to all the metadata and offers a set of convenient methods to compute various properties using RDKit, such as 2D depictions, 3D conformers, physicochemical properties,...

10.1186/s13321-023-00786-w article EN cc-by Journal of Cheminformatics 2023-12-02

Proteins, as molecular machines, are necessarily dynamic macromolecules that carry out essential cellular functions. Recognising their stable conformations is important for understanding the molecular mechanisms of disease. While AI-based computational methods have enabled protein structure prediction, the prediction of protein dynamics remains a challenge. Here, we present a deterministic pipeline that clusters experimentally determined protein structures to comprehensively recognise conformational states across the Protein...

10.1101/2023.07.13.545008 preprint EN cc-by bioRxiv (Cold Spring Harbor Laboratory) 2023-07-13

Background: Biomacromolecular structural data outgrew the legacy Protein Data Bank (PDB) format which the scientific community relied on for decades, yet the use of its successor PDBx/Macromolecular Crystallographic Information File format (PDBx/mmCIF) is still not widespread. Perhaps one of the reasons is the availability of easy to use tools that only support the legacy format, but also the inherent difficulties of processing mmCIF files correctly, given the number of edge cases that make efficient parsing problematic. Nevertheless, to fully exploit...

10.1186/s12859-021-04271-9 article EN cc-by BMC Bioinformatics 2021-07-23

RNA secondary (2D) structure visualisation is an essential tool for understanding RNA function. R2DT is a software package designed to visualise RNA 2D structures in consistent, recognisable, and reproducible layouts. The latest release, R2DT 2.0, introduces multiple significant features, including the ability to display position-specific information, such as single nucleotide polymorphisms (SNPs) or SHAPE reactivities. It also offers a new template-free mode allowing visualisation of RNAs without pre-existing...

10.1101/2024.09.29.611006 preprint EN cc-by-nd bioRxiv (Cold Spring Harbor Laboratory) 2024-09-30

Macromolecular complexes are essential functional units in nearly all cellular processes, and their atomic-level understanding is critical for elucidating and modulating molecular mechanisms. The Protein Data Bank (PDB) serves as the global repository for experimentally determined structures of macromolecules. Structural data in the PDB offer valuable insights into the dynamics, conformation, and functional states of biological assemblies. However, the current annotation practices lack standardised naming conventions for assemblies...

10.1038/s41597-023-02778-9 article EN cc-by Scientific Data 2023-12-01

More than 58,000 proteins have up-to-date correspondence between their amino acid sequence (UniProtKB) and their 3D structures (PDB), enabled by the Structure Integration with Function, Taxonomy and Sequences (SIFTS) resource. In addition to this fundamental mapping, SIFTS incorporates residue-level annotations from other biological resources such as Pfam, InterPro, SCOP, SCOP2, CATH, IntEnz, GO, PubMed, Ensembl, the NCBI taxonomy database and Homologene. The SIFTS data is exported in XML format per...

10.1101/2022.08.10.503473 preprint EN bioRxiv (Cold Spring Harbor Laboratory) 2022-08-13

Macromolecular complexes are essential functional units in nearly all cellular processes, and their atomic-level understanding is critical for elucidating and modulating molecular mechanisms. The Protein Data Bank (PDB) serves as the global repository for experimentally determined structures of macromolecules. Structural data in the PDB offer valuable insights into the dynamics, conformation, and functional states of biological assemblies. However, the current annotation practices lack standardised naming conventions...

10.1101/2023.05.15.540692 preprint EN cc-by bioRxiv (Cold Spring Harbor Laboratory) 2023-05-15

While the Protein Data Bank (PDB) contains a wealth of structural information on ligands bound to macromolecules, their analysis can be challenging due to the large amount and diversity of data. Here, we present PDBe CCDUtils, a versatile toolkit for processing and analysing small molecules from the PDB in PDBx/mmCIF format. PDBe CCDUtils provides streamlined access to all the metadata and offers a set of convenient methods to compute various properties using RDKit, such as 2D depictions, 3D conformers, physicochemical...

10.1101/2023.08.04.552003 preprint EN cc-by bioRxiv (Cold Spring Harbor Laboratory) 2023-08-07

Summary: PDBImages is an innovative, open-source Node.js package that harnesses the power of the popular macromolecule structure visualization software Mol*. Designed for use by the scientific community, PDBImages provides a means to generate high-quality images for PDB and AlphaFold DB models. Its unique ability to render and save images directly to files in a browserless mode sets it apart, offering users a streamlined, automated process for macromolecular structure visualization. Here, we detail the implementation of PDBImages, enumerating its diverse...

10.48550/arxiv.2308.00563 preprint EN cc-by arXiv (Cornell University) 2023-01-01

PDBImages is an innovative, open-source Node.js package that harnesses the power of the popular macromolecule structure visualization software Mol*. Designed for use by the scientific community, PDBImages provides a means to generate high-quality images for PDB and AlphaFold DB models. Its unique ability to render and save images directly to files in a browserless mode sets it apart, offering users a streamlined, automated process for macromolecular structure visualization. Here, we detail the implementation of PDBImages, enumerating its diverse image...

10.1093/bioinformatics/btad744 article EN cc-by Bioinformatics 2023-12-01

While scientists can often infer the biological function of proteins from their 3-dimensional quaternary structures, the gap between the number of known protein sequences and their experimentally determined structures keeps increasing. A potential solution to this problem is presented by ever more sophisticated computational modelling approaches. While powerful on their own, most methods have strengths and weaknesses. Therefore, it benefits researchers to examine models from various model providers and perform comparative...

10.1101/2022.08.01.501973 preprint EN cc-by bioRxiv (Cold Spring Harbor Laboratory) 2022-08-03