- Machine Learning in Bioinformatics
- RNA and Protein Synthesis Mechanisms
- Genomics and Phylogenetic Studies
- Antimicrobial Peptides and Activities
- Chemical Synthesis and Analysis
- Evolutionary Algorithms and Applications
- Vaccines and Immunoinformatics Approaches
- Natural Language Processing Techniques
- Microbial Inactivation Methods
- Gaussian Processes and Bayesian Inference
- Algorithms and Data Compression
- Advanced Proteomics Techniques and Applications
- Receptor Mechanisms and Signaling
- Food Drying and Modeling
- Protein Structure and Dynamics
- Magnetic and Electromagnetic Effects
University of Copenhagen, 2022-2024
Novo Nordisk (Denmark), 2022-2024
Digital Science (United States), 2024
Center for Systems Biology, 2024
Harvard University, 2024
Novo Nordisk (United Kingdom), 2024
Technical University of Denmark, 2021-2022
ETH Zurich, 2021
BOKU University, 2018
Abstract Signal peptides (SPs) are short amino acid sequences that control protein secretion and translocation in all living organisms. SPs can be predicted from sequence data, but existing algorithms are unable to detect all known types of SPs. We introduce SignalP 6.0, a machine learning model that detects all five SP types and is applicable to metagenomic data.
DeepLoc 2.0 is a popular web server for the prediction of protein subcellular localization and sorting signals. Here, we introduce DeepLoc 2.1, which additionally classifies the input proteins into the membrane protein types Transmembrane, Peripheral, Lipid-anchored and Soluble. Leveraging pre-trained transformer-based protein language models, the server utilizes a three-stage architecture for sequence-based, multi-label predictions. Comparative evaluations with other established tools on a test set of 4933 eukaryotic protein sequences, constructed...
Abstract Protein subcellular location prediction is a widely explored task in bioinformatics because of its importance in proteomics research. We propose DeepLocPro, an extension to the popular method DeepLoc, tailored specifically to archaeal and bacterial organisms. DeepLocPro is a multiclass subcellular location prediction tool for prokaryotic proteins, trained on experimentally verified data curated from UniProt and PSORTdb. DeepLocPro compares favorably to the PSORTb 3.0 ensemble method, surpassing its performance across multiple metrics in our benchmark...
Abstract Peptides play important roles in regulating biological processes and form the basis of a multiplicity of therapeutic drugs. To date, only about 300 peptides in human have confirmed bioactivity, although tens of thousands have been reported in the literature. The majority of these are inactive degradation products of endogenous proteins and peptides, presenting a needle-in-a-haystack problem of identifying the most promising candidate peptides from large-scale peptidomics experiments to test for bioactivity. To address this challenge,...
When splitting biological sequence data for the development and testing of predictive models, it is necessary to avoid too-closely related pairs of sequences ending up in different partitions. If this is ignored, the performance of prediction methods will tend to be overestimated. Several algorithms have been proposed for homology reduction, where sequences are removed until no too-closely related pairs remain. We present GraphPart, an algorithm for homology partitioning that divides the data such that closely related sequences always end up in the same partition, while keeping as many sequences as possible...
Abstract Motivation: Protein subcellular location prediction is a widely explored task in bioinformatics because of its importance in proteomics research. We propose DeepLocPro, an extension to the popular method DeepLoc, tailored specifically to archaeal and bacterial organisms. Results: DeepLocPro is a multiclass subcellular location prediction tool for prokaryotic proteins, trained on experimentally verified data curated from UniProt and PSORTdb. DeepLocPro compares favorably to the PSORTb 3.0 ensemble method, surpassing its performance across multiple...
Abstract Signal peptides (SPs) are short amino acid sequences that control protein secretion and translocation in all living organisms. As experimental characterization of SPs is costly, prediction algorithms are applied to predict them from sequence data. However, existing methods are unable to detect all known types of SPs. We introduce SignalP 6.0, the first model capable of detecting all five SP types. Additionally, the model accurately identifies the positions of regions within SPs, revealing the defining biochemical properties...
Abstract Background: AlphaFold’s accuracy, which is often comparable to that of experimentally determined structures, has revolutionized protein structure research. Being a statistical method, AlphaFold implicitly infers the cellular environment, e.g. the cell membrane, from the sequence. Membrane topology prediction methods predict the environment for each residue but not the structure. Current structure and topology prediction tools thus provide complementary information. Results: We introduce the web server MembraneFold. MembraneFold...
The genome sequence contains the blueprint for governing cellular processes. While the availability of genomes has vastly increased over the last decades, experimental annotation of the various functional, non-coding and regulatory elements encoded in the DNA sequence remains both expensive and challenging. This has sparked interest in unsupervised language modeling of genomic DNA, a paradigm that has seen great success for protein sequence data. Although DNA language models have been proposed, evaluation tasks often differ between individual works, and might not...
Abstract Motivation: Peptides are ubiquitous throughout life and involved in a wide range of biological processes, ranging from neural signaling in higher organisms to antimicrobial peptides in bacteria. Many peptides are generated post-translationally by cleavage of precursor proteins and can thus not be detected directly from genomics data, as the specificities of the responsible proteases are often not completely understood. Results: We present DeepPeptide, a deep learning model that predicts cleaved peptides directly from the amino acid sequence. DeepPeptide...
Abstract Many secreted endogenous peptides rely on signalling pathways to exert their function in the body. While peptides can be discovered through high throughput technologies, their cognate receptors typically cannot, hindering the understanding of their mode of action. We investigate the use of AlphaFold-Multimer for identifying the cognate receptors of peptides in human receptor libraries without any prior knowledge about likely candidates. We find that AlphaFold’s predicted confidence metrics have strong performance for prioritizing true peptide-receptor...
Abstract When splitting biological sequence data for the development and testing of predictive models, it is necessary to avoid too closely related pairs of sequences ending up in different partitions. If this is ignored, performance estimates of prediction methods will tend to be exaggerated. Several algorithms have been proposed for homology reduction, where sequences are removed until no too closely related pairs remain. We present GraphPart, an algorithm for homology partitioning, where as many sequences as possible are kept in the dataset, but partitions are defined such that closely related sequences always...
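The constraint described above, that closely related sequences must never land in different partitions, can be illustrated with a minimal sketch. This is not the GraphPart algorithm itself: it is a simpler single-linkage grouping (union-find over similar pairs, then greedy balancing), and the function name `partition_by_homology`, the sequence IDs, and the list of similar pairs are all hypothetical. In practice the similar pairs would come from an alignment tool and an identity threshold.

```python
from collections import defaultdict


def partition_by_homology(ids, similar_pairs, n_partitions):
    """Assign sequence IDs to partitions so that no similar pair is split.

    Illustrative single-linkage approach, NOT GraphPart's actual procedure.
    """
    parent = {i: i for i in ids}

    def find(x):
        # Follow parent links to the cluster root, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge the clusters of every pair flagged as too closely related.
    for a, b in similar_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    clusters = defaultdict(list)
    for i in ids:
        clusters[find(i)].append(i)

    # Place whole clusters, largest first, into the currently smallest partition.
    partitions = [[] for _ in range(n_partitions)]
    for members in sorted(clusters.values(), key=len, reverse=True):
        min(partitions, key=len).extend(members)
    return partitions


# Hypothetical toy input: s1-s2-s3 are linked, s4-s5 are linked, s6 is alone.
ids = ["s1", "s2", "s3", "s4", "s5", "s6"]
similar_pairs = [("s1", "s2"), ("s2", "s3"), ("s4", "s5")]
parts = partition_by_homology(ids, similar_pairs, 2)
```

Because whole clusters are moved as units, every sequence is retained, but one large cluster can make balanced partitions impossible. That limitation is one reason a dedicated partitioning algorithm is of interest.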
Bayesian optimization (BO) is an attractive machine learning framework for performing sample-efficient global optimization of black-box functions. The optimization process is guided by an acquisition function that selects points to acquire in each round of BO. In batched BO, when multiple points are acquired in parallel, commonly used acquisition functions are often high-dimensional and intractable, leading to the use of sampling-based alternatives. We propose a statistical physics inspired acquisition function for BO with Gaussian processes that can natively handle batches. Batched...
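To make the role of an acquisition function concrete, the sketch below scores candidate points with the standard expected improvement (EI) criterion for maximization and picks the best one. This is the textbook single-point EI, not the statistical physics inspired acquisition function proposed in the abstract. The posterior means and standard deviations are hypothetical numbers standing in for the output of a Gaussian process.

```python
import math


def expected_improvement(mu, sigma, best):
    """Expected improvement over `best` for a Gaussian posterior N(mu, sigma^2)."""
    if sigma <= 0.0:
        # No posterior uncertainty: the improvement is deterministic.
        return max(mu - best, 0.0)
    z = (mu - best) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (mu - best) * cdf + sigma * pdf


def select_next(candidates, best):
    """Return the name of the candidate with the highest expected improvement."""
    return max(candidates, key=lambda c: expected_improvement(c[1], c[2], best))[0]


# Hypothetical candidates: (name, posterior mean, posterior std).
candidates = [("a", 0.9, 0.05), ("b", 0.7, 0.60), ("c", 1.0, 0.0)]
best_so_far = 1.0
chosen = select_next(candidates, best_so_far)
```

Here the uncertain candidate `b` is chosen over candidates whose means are closer to the incumbent, which shows how an acquisition function trades off exploitation against exploration. Selecting a batch jointly, rather than one point at a time, is where such criteria become high-dimensional.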
Abstract Genetic studies reveal extensive disease-associated variation across the human genome, predominantly in noncoding regions, such as promoters. Quantifying the impact of these variants on disease risk is crucial to our understanding of the underlying disease mechanisms and advancing personalized medicine. However, current computational methods struggle to capture variant effects, particularly those of insertions and deletions (indels), which can significantly disrupt gene expression. To address this challenge, we...