- Genomics and Phylogenetic Studies
- RNA and protein synthesis mechanisms
- Protein Structure and Dynamics
- Machine Learning in Bioinformatics
- Genomics and Chromatin Dynamics
- RNA Research and Splicing
- Bacteriophages and microbial interactions
- RNA modifications and cancer
- Microbial Community Ecology and Physiology
- Enzyme Structure and Function
- Cold Atom Physics and Bose-Einstein Condensates
- Bioinformatics and Genomic Networks
- Advanced Proteomics Techniques and Applications
- Glycosylation and Glycoproteins Research
- Gene expression and cancer classification
- Atomic and Subatomic Physics Research
- Bacterial Genetics and Biotechnology
- Orbital Angular Momentum in Optics
- Mechanical and Optical Resonators
- Quantum, superfluid, helium dynamics
- CRISPR and Genetic Engineering
- Quantum optics and atomic interactions
- Genetic Associations and Epidemiology
- Protist diversity and phylogeny
- Genetics, Bioinformatics, and Biomedical Research
University of Göttingen
2020-2025
Max Planck Institute for Multidisciplinary Sciences
2022-2025
Max Planck Institute for Biophysical Chemistry
2014-2024
Weizmann Institute of Science
2024
Seoul National University
2024
Tissue Dynamics (Israel)
2024
Max Planck Society
2006-2021
Ludwig-Maximilians-Universität München
2008-2016
Center for Integrated Protein Science Munich
2008-2016
Max Planck Institute for Developmental Biology
2004-2015
HHpred is a fast server for remote protein homology detection and structure prediction, and is the first to implement pairwise comparison of profile hidden Markov models (HMMs). It allows searching a wide choice of databases, such as the PDB, SCOP, Pfam, SMART, COGs and CDD. It accepts a single query sequence or a multiple alignment as input. Within only a few minutes it returns the search results in a user-friendly format similar to that of PSI-BLAST. Search options include local or global alignment and scoring of secondary structure similarity. HHpred can produce query-template...
Motivation: Protein homology detection and sequence alignment are at the basis of protein structure prediction, function prediction and evolution. Results: We have generalized the alignment of protein sequences with a profile hidden Markov model (HMM) to the case of pairwise alignment of profile HMMs. We present a method for detecting distant homologous relationships between proteins based on this approach. The method (HHsearch) is benchmarked together with BLAST, PSI-BLAST, HMMER and the profile–profile comparison tools PROF_SIM and COMPASS, in an all-against-all...
As structure prediction methods are generating millions of publicly available protein structures, searching these databases is becoming a bottleneck. Foldseek aligns the structure of a query protein against a database by describing tertiary amino acid interactions within proteins as sequences over a structural alphabet. Foldseek decreases computation times by four to five orders of magnitude with 86%, 88% and 133% of the sensitivities of Dali, TM-align and CE, respectively.
HH-suite is a widely used open source software suite for sensitive sequence similarity searches and protein fold recognition. It is based on pairwise alignment of profile Hidden Markov models (HMMs), which represent multiple sequence alignments of homologous proteins. We developed a single-instruction multiple-data (SIMD) vectorized implementation of the Viterbi algorithm for profile HMM alignment and introduced various other speed-ups. These accelerated the search methods HHsearch by a factor of 4 and HHblits by a factor of 2 over the previous version 2.0.16. HHblits3...
Metagenomic datasets contain billions of protein sequences that could greatly enhance large-scale functional annotation and structure prediction. Utilizing this enormous resource would require reducing its redundancy by similarity clustering. However, clustering hundreds of millions of sequences is impractical using current algorithms because their runtimes scale as the input set size N times the number of clusters K, which is typically of similar order as N, resulting in runtimes that increase almost quadratically with N. We developed...
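The scaling argument in this abstract can be illustrated with a small arithmetic sketch. It compares a comparison count proportional to N × K with one proportional to N. The ratio K/N = 0.5, the unit cost per sequence, and the function names are illustrative assumptions, not values from the abstract.

```python
def nk_cost(n: int, clusters_per_seq: float = 0.5) -> float:
    """Comparisons if each of n sequences is checked against K representatives."""
    k = clusters_per_seq * n  # assumption: K grows in proportion to N
    return n * k


def linear_cost(n: int, per_seq: float = 1.0) -> float:
    """Comparisons for a hypothetical method whose work grows linearly with n."""
    return per_seq * n


if __name__ == "__main__":
    for n in (10**6, 10**7, 10**8):
        ratio = nk_cost(n) / linear_cost(n)
        print(f"N={n:.0e}  N*K={nk_cost(n):.1e}  linear={linear_cost(n):.1e}  ratio={ratio:.1e}")
```

Under these assumptions, a tenfold increase in N multiplies the N × K cost by one hundred but the linear cost only by ten, which is why the gap widens as databases grow.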
We present three clustered protein sequence databases, Uniclust90, Uniclust50, Uniclust30 and three databases of multiple sequence alignments (MSAs), Uniboost10, Uniboost20 and Uniboost30, as a resource for protein sequence analysis, function prediction and sequence searches. The Uniclust databases cluster UniProtKB sequences at the level of 90%, 50% and 30% pairwise sequence identity. Uniclust90 and Uniclust50 clusters showed better consistency of functional annotation than those of UniRef90 and UniRef50, owing to an optimised clustering pipeline that runs with our MMseqs2...
The MPI Bioinformatics Toolkit (https://toolkit.tuebingen.mpg.de) provides interactive access to a wide range of the best-performing bioinformatics tools and databases, including the state-of-the-art protein sequence comparison methods HHblits and HHpred. The Toolkit currently includes 35 external and in-house tools, covering functionalities such as sequence similarity searching, prediction of sequence features, and sequence classification. Due to this breadth of functionality, the tight interconnection of its constituent tools, and its ease of use, the Toolkit has become an important...
The MMseqs2 desktop and web server app facilitates interactive sequence searches through custom protein sequence and profile databases on personal workstations. By eliminating MMseqs2's runtime overhead, we reduced response times to a few seconds at sensitivities close to BLAST. The app is easy to install for non-experts. GPLv3-licensed code, pre-built packages for Windows, MacOS and Linux, Docker images for the web server application, and a demo are available at https://search.mmseqs.com. Supplementary data are available at Bioinformatics online.
Automated protein structure prediction is becoming a mainstream tool for biological research. This has been fueled by steady improvements of publicly available automated servers over the last decade, in particular their ability to build good homology models for an increasing number of targets by reliably detecting and aligning more and more remotely homologous templates. Here, we describe the three fully automated versions of the HHpred server that participated in the community‐wide blind protein structure prediction competition CASP8. What makes HHpred unique...
Motivation: Recent breakthroughs in protein residue–residue contact prediction have made reliable de novo prediction of protein structures possible. The key was to apply statistical methods that can distinguish direct couplings between pairs of columns in a multiple sequence alignment from merely correlated pairs, i.e. to separate direct from indirect effects. Two classes of such methods exist, either relying on regularized inversion of the covariance matrix or on pseudo-likelihood maximization (PLM). Although PLM-based methods offer clearly...
The MPI Bioinformatics Toolkit (http://toolkit.tuebingen.mpg.de) is an open, interactive web service for comprehensive and collaborative protein bioinformatic analysis. It offers a wide array of interconnected, state-of-the-art bioinformatics tools to experts and non-experts alike, developed both externally (e.g. BLAST+, HMMER3, MUSCLE) and internally (e.g. HHpred, HHblits, PCOILS). While a beta version of the Toolkit was released 10 years ago, the current production-level release has been available since 2008 and has serviced...
As structure prediction methods are generating millions of publicly available protein structures, searching these databases is becoming a bottleneck. Foldseek aligns the structure of a query protein against a database by describing the amino acid backbone of proteins as sequences over a structural alphabet. Foldseek decreases computation times by four to five orders of magnitude with 86%, 88% and 133% of the sensitivities of DALI, TM-align and CE, respectively.
WIsH predicts prokaryotic hosts of phages from their genomic sequences. It achieves 63% mean accuracy when predicting the host genus among 20 genera for 3 kbp-long phage contigs. Over the best current tool, WIsH shows much improved accuracy on phage sequences of a few kbp length and runs hundreds of times faster, making it suited for metagenomics studies. OpenMP-parallelized GPL-licensed C++ code is available at https://github.com/soedinglab/wish. Contact: clovis.galiez@mpibpc.mpg.de or soeding@mpibpc.mpg.de. Supplementary data are...
The seemingly limitless diversity of proteins in nature arose from only a few thousand domain prototypes, but the origin of these themselves has remained unclear. We are pursuing the hypothesis that they arose by fusion and accretion from an ancestral set of peptides active as co-factors in RNA-dependent replication and catalysis. Should this be true, contemporary domains may still contain vestiges of such peptides, which could be reconstructed by a comparative approach in the same way in which ancient vocabularies have been reconstructed by the comparative study of modern...
Cells form and use biomolecular condensates to execute biochemical reactions. The molecular properties of non-membrane-bound condensates are directly connected to the amino acid content of disordered protein regions. Lysine plays an important role in cellular function, but little is known about its role in biomolecular condensation. Here we show that protein disorder is abundant in protein/RNA granules and that lysine is enriched in disordered regions of proteins in P-bodies compared to the entire human proteome. Lysine-rich polypeptides phase separate into...
Motivation: Sequence databases are growing fast, challenging existing analysis pipelines. Reducing the redundancy of sequence databases by similarity clustering improves speed and sensitivity of iterative searches. But existing tools cannot efficiently cluster databases of the size of UniProt to 50% maximum pairwise sequence identity or below. Furthermore, in metagenomics experiments typically large fractions of reads cannot be matched to any known sequence anymore because searching with sensitive but relatively slow tools (e.g. BLAST or HMMER3) through...
Metagenomics is revolutionizing the study of microorganisms and their involvement in biological, biomedical, and geochemical processes, allowing us to investigate by direct sequencing a tremendous diversity of organisms without the need for prior cultivation. Unicellular eukaryotes play essential roles in most microbial communities as chief predators, decomposers, phototrophs, bacterial hosts, symbionts, and parasites to plants and animals. Investigating their roles is therefore of great interest to ecology, biotechnology, human health,...
MMseqs2 taxonomy is a new tool to assign taxonomic labels to metagenomic contigs. It extracts all possible protein fragments from each contig, quickly retains those that can contribute to taxonomic annotation, assigns them with robust labels and determines the contig's taxonomic identity by weighted voting. Its fragment extraction step is suitable for the analysis of all domains of life. MMseqs2 taxonomy is 2-18× faster than state-of-the-art tools and also contains new modules for creating and manipulating taxonomic reference databases as well as reporting and visualizing...
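The weighted-voting step mentioned in this abstract can be sketched generically as follows. This is not MMseqs2's implementation: the abstract does not state how fragment weights or acceptance thresholds are chosen, so the weights, the strict-majority threshold, and the function name weighted_vote are all hypothetical. Each fragment contributes its weight to every rank of its lineage, and the contig receives the deepest lineage prefix supported by more than half of the total weight.

```python
from collections import defaultdict


def weighted_vote(fragment_hits, min_fraction=0.5):
    """Return the deepest lineage prefix backed by > min_fraction of total weight.

    fragment_hits: list of (lineage, weight) pairs, where lineage is a tuple of
    taxon names ordered from root to leaf. Weights are hypothetical scores.
    """
    total = sum(weight for _, weight in fragment_hits)
    if total <= 0:
        return ()
    support = defaultdict(float)
    for lineage, weight in fragment_hits:
        # A fragment supports every ancestor on its lineage, not just the leaf.
        for depth in range(1, len(lineage) + 1):
            support[lineage[:depth]] += weight
    best = ()
    for prefix, weight in support.items():
        # With a threshold >= 0.5 and a strict comparison, at most one prefix
        # per depth can pass, so the passing prefixes form a single chain.
        if weight / total > min_fraction and len(prefix) > len(best):
            best = prefix
    return best


hits = [
    (("Bacteria", "Proteobacteria", "Escherichia"), 3.0),
    (("Bacteria", "Proteobacteria", "Salmonella"), 2.0),
    (("Bacteria", "Firmicutes", "Bacillus"), 1.0),
]
label = weighted_vote(hits)
```

In this toy input no single genus holds a strict majority of the weight, so the vote backs off to the phylum level, which illustrates how voting can give a robust label when individual fragment assignments disagree.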
RECQL5 is the sole member of the RECQ family of helicases associated with RNA polymerase II (RNAPII). We now show that RECQL5 is a general elongation factor that is important for preserving genome stability during transcription. Depletion or overexpression of RECQL5 results in corresponding shifts in the genome-wide RNAPII density profile. Elongation is particularly affected, with RECQL5 depletion causing a striking increase in the average rate, concurrent with increased stalling, pausing, arrest, and/or backtracking (transcription stress). RECQL5 therefore controls...