Johannes Söding

ORCID: 0000-0001-9642-8244
Research Areas
  • Genomics and Phylogenetic Studies
  • RNA and protein synthesis mechanisms
  • Protein Structure and Dynamics
  • Machine Learning in Bioinformatics
  • Genomics and Chromatin Dynamics
  • RNA Research and Splicing
  • Bacteriophages and microbial interactions
  • RNA modifications and cancer
  • Microbial Community Ecology and Physiology
  • Enzyme Structure and Function
  • Cold Atom Physics and Bose-Einstein Condensates
  • Bioinformatics and Genomic Networks
  • Advanced Proteomics Techniques and Applications
  • Glycosylation and Glycoproteins Research
  • Gene expression and cancer classification
  • Atomic and Subatomic Physics Research
  • Bacterial Genetics and Biotechnology
  • Orbital Angular Momentum in Optics
  • Mechanical and Optical Resonators
  • Quantum, superfluid, helium dynamics
  • CRISPR and Genetic Engineering
  • Quantum optics and atomic interactions
  • Genetic Associations and Epidemiology
  • Protist diversity and phylogeny
  • Genetics, Bioinformatics, and Biomedical Research

University of Göttingen
2020-2025

Max Planck Institute for Multidisciplinary Sciences
2022-2025

Max Planck Institute for Biophysical Chemistry
2014-2024

Weizmann Institute of Science
2024

Seoul National University
2024

Tissue Dynamics (Israel)
2024

Max Planck Society
2006-2021

Ludwig-Maximilians-Universität München
2008-2016

Center for Integrated Protein Science Munich
2008-2016

Max Planck Institute for Developmental Biology
2004-2015

HHpred is a fast server for remote protein homology detection and structure prediction and is the first to implement pairwise comparison of profile hidden Markov models (HMMs). It allows searching a wide choice of databases, such as the PDB, SCOP, Pfam, SMART, COGs and CDD. It accepts a single query sequence or a multiple alignment as input. Within only a few minutes it returns the search results in a user-friendly format similar to that of PSI-BLAST. Search options include local or global alignment and scoring secondary structure similarity. HHpred can produce pairwise query-template...

10.1093/nar/gki408 article EN Nucleic Acids Research 2005-06-26

Motivation: Protein homology detection and sequence alignment are at the basis of protein structure prediction, function prediction and evolution. Results: We have generalized the alignment of protein sequences with a profile hidden Markov model (HMM) to the case of pairwise alignment of profile HMMs. We present a method for detecting distant homologous relationships between proteins based on this approach. The method (HHsearch) is benchmarked together with BLAST, PSI-BLAST, HMMER and the profile–profile comparison tools PROF_SIM and COMPASS, in an all-against-all...

10.1093/bioinformatics/bti125 article EN Bioinformatics 2004-11-05

As structure prediction methods are generating millions of publicly available protein structures, searching these databases is becoming a bottleneck. Foldseek aligns the structure of a query protein against a database by describing tertiary amino acid interactions within proteins as sequences over a structural alphabet. Foldseek decreases computation times by four to five orders of magnitude with 86%, 88% and 133% of the sensitivities of Dali, TM-align and CE, respectively.

10.1038/s41587-023-01773-0 article EN cc-by Nature Biotechnology 2023-05-08

HH-suite is a widely used open source software suite for sensitive sequence similarity searches and protein fold recognition. It is based on pairwise alignment of profile Hidden Markov models (HMMs), which represent multiple sequence alignments of homologous proteins. We developed a single-instruction multiple-data (SIMD) vectorized implementation of the Viterbi algorithm for profile HMM alignment and introduced various other speed-ups. These accelerated the search methods HHsearch by a factor of 4 and HHblits by a factor of 2 over the previous version 2.0.16. HHblits3...

10.1186/s12859-019-3019-7 article EN cc-by BMC Bioinformatics 2019-09-14

Metagenomic datasets contain billions of protein sequences that could greatly enhance large-scale functional annotation and structure prediction. Utilizing this enormous resource would require reducing its redundancy by similarity clustering. However, clustering hundreds of millions of sequences is impractical using current algorithms because their runtimes scale as the input set size N times the number of clusters K, which is typically of similar order as N, resulting in runtimes that increase almost quadratically with N. We developed...

10.1038/s41467-018-04964-5 article EN cc-by Nature Communications 2018-06-25
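
The N times K scaling mentioned in the abstract above can be made concrete with a small sketch. This is a generic greedy incremental clustering loop, not the method the paper develops and not the code of any particular tool; the function name `greedy_cluster` and the `is_similar` callback are hypothetical names chosen for illustration. Each sequence is compared against the current cluster representatives, so the number of comparisons grows roughly as N times K.

```python
from typing import Callable, List, Sequence, Tuple


def greedy_cluster(
    seqs: Sequence[str],
    is_similar: Callable[[str, str], bool],
) -> Tuple[List[str], List[int], int]:
    """Generic greedy incremental clustering (illustrative assumption).

    Returns the representatives, the cluster index assigned to each input
    sequence, and the number of pairwise comparisons performed.
    """
    representatives: List[str] = []
    assignments: List[int] = []
    comparisons = 0
    for seq in seqs:
        for idx, rep in enumerate(representatives):
            comparisons += 1
            if is_similar(seq, rep):
                # Join the first sufficiently similar cluster.
                assignments.append(idx)
                break
        else:
            # No representative matched: the sequence founds a new cluster.
            representatives.append(seq)
            assignments.append(len(representatives) - 1)
    return representatives, assignments, comparisons


if __name__ == "__main__":
    # Toy similarity: exact string equality stands in for a real alignment.
    reps, assigned, n_cmp = greedy_cluster(
        ["A", "B", "C", "D"], lambda a, b: a == b
    )
    print(len(reps), n_cmp)
```

When nearly every sequence founds its own cluster (K close to N), the comparison count approaches N(N-1)/2, which is the almost quadratic growth the abstract describes.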

We present three clustered protein sequence databases, Uniclust90, Uniclust50, Uniclust30 and three databases of multiple sequence alignments (MSAs), Uniboost10, Uniboost20 and Uniboost30, as a resource for protein sequence analysis, function prediction and sequence searches. The Uniclust databases cluster UniProtKB sequences at the level of 90%, 50% and 30% pairwise sequence identity. Uniclust90 and Uniclust50 clusters showed better consistency of functional annotation than those of UniRef90 and UniRef50, owing to an optimised clustering pipeline that runs with our MMseqs2...

10.1093/nar/gkw1081 article EN cc-by Nucleic Acids Research 2016-11-01
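
To illustrate what a threshold such as 90%, 50% or 30% pairwise sequence identity refers to, here is a minimal sketch that computes identity from two already-aligned sequences. Several definitions of sequence identity are in use; this one (identical residues divided by aligned, non-gap columns) is an assumption made for the example, and `pairwise_identity` is a hypothetical helper, not a function of any tool named above.

```python
def pairwise_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Fraction of identical residues over columns where neither row has a gap.

    Assumes both strings are rows of the same pairwise alignment and
    therefore have equal length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    aligned_columns = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # gapped columns are ignored under this definition
        aligned_columns += 1
        if a == b:
            matches += 1
    return matches / aligned_columns if aligned_columns else 0.0


if __name__ == "__main__":
    identity = pairwise_identity("ACDE-F", "ACDQGF")
    print(f"{identity:.0%}")  # 4 identical residues over 5 aligned columns
```

Under this definition the toy pair would fall below a 90% threshold but above the 50% and 30% ones.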

The MPI Bioinformatics Toolkit (https://toolkit.tuebingen.mpg.de) provides interactive access to a wide range of the best-performing bioinformatics tools and databases, including the state-of-the-art protein sequence comparison methods HHblits and HHpred. The Toolkit currently includes 35 external and in-house tools, covering functionalities such as sequence similarity searching, prediction of sequence features, and sequence classification. Due to this breadth of functionality, the tight interconnection of its constituent tools, and its ease of use, the Toolkit has become an important...

10.1002/cpbi.108 article EN cc-by Current Protocols in Bioinformatics 2020-12-01

The MMseqs2 desktop and web server app facilitates interactive sequence searches through custom protein sequence and profile databases on personal workstations. By eliminating MMseqs2's runtime overhead, we reduced response times to a few seconds at sensitivities close to BLAST. The app is easy to install for non-experts. GPLv3-licensed code, pre-built packages for Windows, MacOS and Linux, Docker images for the web server application and a demo are available at https://search.mmseqs.com. Supplementary data are available at Bioinformatics online.

10.1093/bioinformatics/bty1057 article EN cc-by Bioinformatics 2019-01-04

Automated protein structure prediction is becoming a mainstream tool for biological research. This has been fueled by steady improvements of publicly available automated servers over the last decade, in particular their ability to build good homology models for an increasing number of targets by reliably detecting and aligning more and more remotely homologous templates. Here, we describe the three fully automated versions of the HHpred server that participated in the community-wide blind protein structure prediction competition CASP8. What makes HHpred unique...

10.1002/prot.22499 article EN Proteins Structure Function and Bioinformatics 2009-01-01

Motivation: Recent breakthroughs in protein residue–residue contact prediction have made reliable de novo prediction of protein structures possible. The key was to apply statistical methods that can distinguish direct couplings between pairs of columns in a multiple sequence alignment from merely correlated pairs, i.e. to separate direct from indirect effects. Two classes of such methods exist, either relying on regularized inversion of the covariance matrix or on pseudo-likelihood maximization (PLM). Although PLM-based methods offer clearly...

10.1093/bioinformatics/btu500 article EN cc-by Bioinformatics 2014-07-26

The MPI Bioinformatics Toolkit (http://toolkit.tuebingen.mpg.de) is an open, interactive web service for comprehensive and collaborative protein bioinformatic analysis. It offers a wide array of interconnected, state-of-the-art bioinformatics tools to experts and non-experts alike, developed both externally (e.g. BLAST+, HMMER3, MUSCLE) and internally (e.g. HHpred, HHblits, PCOILS). While a beta version of the Toolkit was released 10 years ago, the current production-level release has been available since 2008 and has serviced...

10.1093/nar/gkw348 article EN cc-by-nc Nucleic Acids Research 2016-04-29

As structure prediction methods are generating millions of publicly available protein structures, searching these databases is becoming a bottleneck. Foldseek aligns the structure of a query protein against a database by describing the amino acid backbone of proteins as sequences over a structural alphabet. Foldseek decreases computation times by four to five orders of magnitude with 86%, 88% and 133% of the sensitivities of DALI, TM-align and CE, respectively.

10.1101/2022.02.07.479398 preprint EN cc-by bioRxiv (Cold Spring Harbor Laboratory) 2022-02-09

WIsH predicts prokaryotic hosts of phages from their genomic sequences. It achieves 63% mean accuracy when predicting the host genus among 20 genera for 3 kbp-long phage contigs. Over the best current tool, WIsH shows much improved accuracy on phage sequences of a few kbp length and runs hundreds of times faster, making it suited for metagenomics studies. OpenMP-parallelized GPL-licensed C++ code is available at https://github.com/soedinglab/wish. Contact: clovis.galiez@mpibpc.mpg.de or soeding@mpibpc.mpg.de. Supplementary data are...

10.1093/bioinformatics/btx383 article EN cc-by Bioinformatics 2017-07-11

The seemingly limitless diversity of proteins in nature arose from only a few thousand domain prototypes, but the origin of these themselves has remained unclear. We are pursuing the hypothesis that they arose by fusion and accretion from an ancestral set of peptides active as co-factors in RNA-dependent replication and catalysis. Should this be true, contemporary domains may still contain vestiges of such peptides, which could be reconstructed by a comparative approach in the same way in which ancient vocabularies have been reconstructed by the comparative study of modern...

10.7554/elife.09410 article EN cc-by eLife 2015-12-14

Cells form and use biomolecular condensates to execute biochemical reactions. The molecular properties of non-membrane-bound condensates are directly connected to the amino acid content of disordered protein regions. Lysine plays an important role in cellular function, but little is known about its role in biomolecular condensation. Here we show that protein disorder is abundant in protein/RNA granules and lysine is enriched in disordered regions of proteins in P-bodies compared to the entire human proteome. Lysine-rich polypeptides phase separate into...

10.1038/s41467-019-10792-y article EN cc-by Nature Communications 2019-07-02

Motivation: Sequence databases are growing fast, challenging existing analysis pipelines. Reducing the redundancy of sequence databases by similarity clustering improves speed and sensitivity of iterative searches. But existing tools cannot efficiently cluster databases of the size of UniProt to 50% maximum pairwise sequence identity or below. Furthermore, in metagenomics experiments typically large fractions of reads cannot be matched to any known sequence anymore because searching with sensitive but relatively slow tools (e.g. BLAST or HMMER3) through...

10.1093/bioinformatics/btw006 article EN Bioinformatics 2016-01-06

Metagenomics is revolutionizing the study of microorganisms and their involvement in biological, biomedical, and geochemical processes, allowing us to investigate by direct sequencing a tremendous diversity of organisms without the need for prior cultivation. Unicellular eukaryotes play essential roles in most microbial communities as chief predators, decomposers, phototrophs, bacterial hosts, symbionts, and parasites to plants and animals. Investigating their roles is therefore of great interest to ecology, biotechnology, human health,...

10.1186/s40168-020-00808-x article EN cc-by Microbiome 2020-04-03

MMseqs2 taxonomy is a new tool to assign taxonomic labels to metagenomic contigs. It extracts all possible protein fragments from each contig, quickly retains those that can contribute to taxonomic annotation, assigns them with robust labels and determines the contig's taxonomic identity by weighted voting. Its fragment extraction step is suitable for the analysis of all domains of life. MMseqs2 taxonomy is 2-18× faster than state-of-the-art tools and also contains new modules for creating and manipulating taxonomic reference databases as well as reporting and visualizing...

10.1093/bioinformatics/btab184 article EN cc-by Bioinformatics 2021-03-16
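
The weighted-voting step described in the abstract above can be sketched in a much simplified form. The example below is a plain weighted plurality vote over flat labels; the function name `weighted_vote`, the labels and the weights are all hypothetical, and the abstract does not specify how the tool itself derives weights or handles taxonomic lineages, so this should be read as an illustration of the general idea only.

```python
from collections import defaultdict
from typing import Iterable, Optional, Tuple


def weighted_vote(fragment_labels: Iterable[Tuple[str, float]]) -> Optional[str]:
    """Return the label with the largest summed weight, or None if no votes.

    Each item is (label, weight): the label assigned to one protein fragment
    of a contig and a hypothetical confidence weight for that assignment.
    """
    totals = defaultdict(float)
    for label, weight in fragment_labels:
        totals[label] += weight
    if not totals:
        return None
    return max(totals, key=totals.get)


if __name__ == "__main__":
    votes = [("Escherichia", 2.0), ("Salmonella", 3.5), ("Escherichia", 2.5)]
    print(weighted_vote(votes))  # Escherichia: 4.5 total versus 3.5
```

Summing weights per label lets several moderately confident fragments outvote a single stronger one, which is the point of weighting votes rather than taking the best individual hit.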

RECQL5 is the sole member of the RECQ family of helicases associated with RNA polymerase II (RNAPII). We now show that RECQL5 is a general elongation factor that is important for preserving genome stability during transcription. Depletion or overexpression of RECQL5 results in corresponding shifts in the genome-wide RNAPII density profile. Elongation is particularly affected, with RECQL5 depletion causing a striking increase in the average rate, concurrent with increased stalling, pausing, arrest, and/or backtracking (transcription stress). RECQL5 therefore controls...

10.1016/j.cell.2014.03.048 article EN cc-by Cell 2014-05-01