NFDI4DS | UHH-SEMS - Publication Details

Daniel Berenberg

ORCID: 0000-0003-4631-0947

About

Contact & Profiles

A5078906955

Research Areas

Protein Structure and Dynamics
Machine Learning in Bioinformatics
Genomics and Phylogenetic Studies
RNA and protein synthesis mechanisms
Enzyme Structure and Function
Complex Network Analysis Techniques
Microbial Metabolic Engineering and Bioproduction
Mobile Crowdsensing and Crowdsourcing
Human Mobility and Location-Based Analysis
Cell Image Analysis Techniques
Bioinformatics and Genomic Networks
Computational and Text Analysis Methods
Cognitive Science and Education Research
Explainable Artificial Intelligence (XAI)
Advanced MRI Techniques and Applications
Monoclonal and Polyclonal Antibodies Research
Single-cell and spatial transcriptomics
Privacy-Preserving Technologies in Data
Complex Systems and Time Series Analysis
Reproductive tract infections research
Topic Modeling
Computational Drug Discovery Methods
Advanced Proteomics Techniques and Applications
Opinion Dynamics and Social Influence
Historical Art and Architecture Studies

Courant Institute of Mathematical Sciences
2021-2024

New York University
2021-2024

Simons Foundation
2019-2023

Flatiron Institute
2021

Flatiron Health (United States)
2021

University of Vermont
2018

Structure-based protein function prediction using graph convolutional networks

OPENALEX - Publications

Vladimir Gligorijević, P. Douglas Renfrew, Tomasz Kościółek, Julia Koehler Leman, Daniel Berenberg, and 9 more

The rapid increase in the number of proteins in sequence databases and the diversity of their functions challenge computational approaches for automated function prediction. Here, we introduce DeepFRI, a Graph Convolutional Network for predicting protein functions by leveraging sequence features extracted from a protein language model and protein structures. It outperforms current leading methods and sequence-based Convolutional Neural Networks and scales to the size of current sequence repositories. Augmenting the training set of experimental structures with homology models allows us...

10.1038/s41467-021-23303-9 article EN cc-by Nature Communications 2021-05-26

OpenFold: retraining AlphaFold2 yields new insights into its learning mechanisms and capacity for generalization

OPENALEX - Publications

Gustaf Ahdritz, Nazim Bouatta, Christina Floristean, Sachin Kadyan, Qinghui Xia, and 29 more

AlphaFold2 revolutionized structural biology with the ability to predict protein structures with exceptionally high accuracy. Its implementation, however, lacks the code and data required to train new models. These are necessary to (1) tackle new tasks, like protein–ligand complex structure prediction, (2) investigate the process by which the model learns and (3) assess the model's capacity to generalize to unseen regions of fold space. Here we report OpenFold, a fast, memory efficient and trainable implementation of AlphaFold2. We...

10.1038/s41592-024-02272-z article EN cc-by Nature Methods 2024-05-14

OpenFold: Retraining AlphaFold2 yields new insights into its learning mechanisms and capacity for generalization

OPENALEX - Publications

Gustaf Ahdritz, Nazim Bouatta, Christina Floristean, Sachin Kadyan, Qinghui Xia, and 24 more

AlphaFold2 revolutionized structural biology with the ability to predict protein structures with exceptionally high accuracy. Its implementation, however, lacks the code and data required to train new models. These are necessary to (i) tackle new tasks, like protein-ligand complex structure prediction, (ii) investigate the process by which the model learns, which remains poorly understood, and (iii) assess the model’s generalization capacity to unseen regions of fold space. Here we report OpenFold, a fast, memory-efficient,...

10.1101/2022.11.20.517210 preprint EN cc-by bioRxiv (Cold Spring Harbor Laboratory) 2022-11-22

Protein remote homology detection and structural alignment using deep learning

OPENALEX - Publications

Tymor Hamamsy, James T. Morton, Robert Blackwell, Daniel Berenberg, Nicholas Carriero, and 5 more

Exploiting sequence–structure–function relationships in biotechnology requires improved methods for aligning proteins that have low sequence similarity to previously annotated proteins. We develop two deep learning methods to address this gap, TM-Vec and DeepBLAST. TM-Vec allows searching for structure–structure similarities in large sequence databases. It is trained to accurately predict TM-scores as a metric of structural similarity directly from sequence pairs without the need for intermediate computation or solution of structures. Once...

10.1038/s41587-023-01917-2 article EN cc-by Nature Biotechnology 2023-09-07

Sequence-structure-function relationships in the microbial protein universe

OPENALEX - Publications

Julia Koehler Leman, Paweł Szczerbiak, P. Douglas Renfrew, Vladimir Gligorijević, Daniel Berenberg, and 11 more

For the past half-century, structural biologists relied on the notion that similar protein sequences give rise to similar structures and functions. While this assumption has driven research to explore certain parts of the protein universe, it disregards spaces that don’t rely on this assumption. Here we explore areas of the protein universe where similar protein functions can be achieved by different sequences and different structures. We predict ~200,000 structures for diverse protein sequences from 1,003 representative genomes across the microbial tree of life and annotate them functionally on a per-residue basis. Structure...

10.1038/s41467-023-37896-w article EN cc-by Nature Communications 2023-04-26

Lab-in-the-loop therapeutic antibody design with deep learning

OPENALEX - Publications

Nathan C. Frey, Isidro Hötzel, Samuel D Stanton, Ryan L. Kelly, Robert G. Alberstein, and 59 more

Therapeutic antibody design is a complex multi-property optimization problem that traditionally relies on expensive search through sequence space. Here, we introduce "Lab-in-the-loop," a paradigm shift for antibody design that orchestrates generative machine learning models, multi-task property predictors, active learning ranking and selection, and in vitro experimentation in a semi-autonomous, iterative optimization loop. By automating the design of antibody variants, property prediction, ranking and selection of designs to assay in the lab, and ingestion of in vitro data, we enable a holistic, end-to-end...

10.1101/2025.02.19.639050 preprint EN cc-by bioRxiv (Cold Spring Harbor Laboratory) 2025-02-24

Structure-Based Protein Function Prediction using Graph Convolutional Networks

OPENALEX - Publications

Vladimir Gligorijević, P. Douglas Renfrew, Tomasz Kościółek, Julia Koehler Leman, Daniel Berenberg, and 9 more

The large number of available sequences and the diversity of protein functions challenge current experimental and computational approaches to determining and predicting protein function. We present a deep learning Graph Convolutional Network (GCN) for predicting protein functions and concurrently identifying functionally important residues. This model is initially trained using experimentally determined structures from the Protein Data Bank (PDB) but has significant de-noising capability, with only a minor drop in performance observed when...

10.1101/786236 preprint EN bioRxiv (Cold Spring Harbor Laboratory) 2019-10-04

Function-guided protein design by deep manifold sampling

OPENALEX - Publications

Vladimir Gligorijević, Daniel Berenberg, Stephen Ra, Andrew M. Watkins, Simon Kelow, and 2 more

Protein design is challenging because it requires searching through a vast combinatorial space that is only sparsely functional. Self-supervised learning approaches offer the potential to navigate through this space more effectively and thereby accelerate protein engineering. We introduce a sequence denoising autoencoder (DAE) that learns the manifold of protein sequences from a large amount of potentially unlabelled proteins. This DAE is combined with a function predictor that guides sampling towards sequences with higher levels of desired...

10.1101/2021.12.22.473759 preprint EN cc-by-nc-nd bioRxiv (Cold Spring Harbor Laboratory) 2021-12-23

TM-Vec: template modeling vectors for fast homology detection and alignment

OPENALEX - Publications

Tymor Hamamsy, James T. Morton, Daniel Berenberg, Nicholas Carriero, Vladimir Gligorijević, and 5 more

Exploiting sequence-structure-function relationships in molecular biology and computational modeling relies on detecting proteins with high sequence similarities. However, the most commonly used alignment-based methods, such as BLAST, frequently fail on proteins with low sequence similarity to previously annotated proteins. We developed a deep learning method, TM-Vec, that uses sequence alignments to learn structural features that can then be used to search for structure-structure similarities in large sequence databases. We train TM-Vec...

10.1101/2022.07.25.501437 preprint EN cc-by bioRxiv (Cold Spring Harbor Laboratory) 2022-07-27

Protein Structural Alignments From Sequence

OPENALEX - Publications

James T. Morton, Charlie E. M. Strauss, Robert Blackwell, Daniel Berenberg, Vladimir Gligorijević, and 1 more

Computing sequence similarity is a fundamental task in biology, with alignment forming the basis for the annotation of genes and genomes and providing the core data structures for evolutionary analysis. Standard approaches are a mainstay of modern molecular biology and rely on variations of edit distance to obtain explicit alignments between pairs of biological sequences. However, sequence alignment algorithms struggle with remote homology tasks and cannot identify similarities between many pairs of proteins with similar structures and likely homology. Recent work suggests...

10.1101/2020.11.03.365932 preprint EN cc-by bioRxiv (Cold Spring Harbor Laboratory) 2020-11-04

Comparative genomics of the sexually transmitted parasite Trichomonas vaginalis reveals relaxed and convergent evolution and genes involved in spillover from birds to humans

OPENALEX - Publications

Steven A. Sullivan, Jordan C. Orosco, Francisco Callejas‐Hernández, Frances Blow, Hayan Lee, and 21 more

Trichomonas vaginalis is the causative agent of the venereal disease trichomoniasis, which infects men and women globally and is associated with serious outcomes during pregnancy and cancers of the human reproductive tract. Trichomonads parasitize a range of hosts in addition to humans including birds, livestock, and domesticated animals. Recent genetic analysis of trichomonads recovered from columbid birds has provided evidence that these parasite species undergo frequent host-switching, and that a current epoch spillover...

10.1101/2024.12.22.629724 preprint EN cc-by-nc-nd bioRxiv (Cold Spring Harbor Laboratory) 2024-12-25

Efficient Crowd Exploration of Large Networks

OPENALEX - Publications

Daniel Berenberg, James P. Bagrow

Accurately and efficiently crowdsourcing complex, open-ended tasks can be difficult, as crowd participants tend to favor short, repetitive "microtasks". We study the crowdsourcing of large networks where the crowd provides the network topology via microtasks. Crowds can explore many types of social and information networks, but we focus on the network of causal attributions, an important network that signifies cause-and-effect relationships. We conduct experiments on Amazon Mechanical Turk (AMT) testing how workers propose and validate individual causal relationships...

10.1145/3274293 article EN Proceedings of the ACM on Human-Computer Interaction 2018-11-01

Protein Discovery with Discrete Walk-Jump Sampling

OPENALEX - Publications

Nathan C. Frey, Daniel Berenberg, Karina Zadorozhny, Joseph Kleinhenz, Julien Lafrance‐Vanasse, and 8 more

We resolve difficulties in training and sampling from a discrete generative model by learning a smoothed energy function, sampling from the smoothed data manifold with Langevin Markov chain Monte Carlo (MCMC), and projecting back to the true data manifold with one-step denoising. Our Discrete Walk-Jump Sampling formalism combines the contrastive divergence training of an energy-based model and improved sample quality of a score-based model, while simplifying training and sampling by requiring only a single noise level. We evaluate the robustness of our approach on generative modeling of antibody proteins and introduce...

10.48550/arxiv.2306.12360 preprint EN cc-by arXiv (Cornell University) 2023-01-01

Sequence-structure-function relationships in the microbial protein universe

OPENALEX - Publications

Julia Koehler Leman, Paweł Szczerbiak, P. Douglas Renfrew, Vladimir Gligorijević, Daniel Berenberg, and 11 more

For the past half-century, structural biologists relied on the notion that similar protein sequences give rise to similar structures and functions. While this assumption has driven research to explore certain parts of the protein universe, it disregards spaces that don’t rely on this assumption. Here we explore areas of the protein universe where similar protein functions can be achieved by different sequences and different structures. We predict ∼200,000 structures for diverse protein sequences from 1,003 representative genomes across the microbial tree of life, and annotate them functionally on a per-residue...

10.1101/2022.03.18.484903 preprint EN cc-by-nc-nd bioRxiv (Cold Spring Harbor Laboratory) 2022-03-20

Neural language representations predict outcomes of scientific research

OPENALEX - Publications

James P. Bagrow, Daniel Berenberg, Joshua Bongard

Many research fields codify their findings in standard formats, often by reporting correlations between quantities of interest. But the space of all testable correlates is far larger than scientific resources can currently address, so the ability to accurately predict correlations would be useful to plan research and allocate resources. Using a dataset of approximately 170,000 correlational findings extracted from leading social science journals, we show that a trained neural network can accurately predict the reported correlations using only the text descriptions of the correlates....

10.48550/arxiv.1805.06879 preprint EN other-oa arXiv (Cornell University) 2018-01-01

Multi-segment preserving sampling for deep manifold sampler

OPENALEX - Publications

Daniel Berenberg, Jae Hyeon Lee, Simon Kelow, Ji Won Park, Andrew M. Watkins, and 4 more

Deep generative modeling for biological sequences presents a unique challenge in reconciling the bias-variance trade-off between explicit biological insight and model flexibility. The deep manifold sampler was recently proposed as a means to iteratively sample variable-length protein sequences by exploiting the gradients from a function predictor. We introduce an alternative approach to this guided sampling procedure, multi-segment preserving sampling, that enables the direct inclusion of domain-specific knowledge by designating...

10.48550/arxiv.2205.04259 preprint EN cc-by arXiv (Cornell University) 2022-01-01

OpenProteinSet: Training data for structural biology at scale

OPENALEX - Publications

Gustaf Ahdritz, Nazim Bouatta, Sachin Kadyan, Lukas Jarosch, Daniel Berenberg, and 5 more

Multiple sequence alignments (MSAs) of proteins encode rich biological information and have been workhorses in bioinformatic methods for tasks like protein design and protein structure prediction for decades. Recent breakthroughs like AlphaFold2 that use transformers to attend directly over large quantities of raw MSAs have reaffirmed their importance. Generation of MSAs is highly computationally intensive, however, and no datasets comparable to those used to train AlphaFold2 have been made available to the research community, hindering progress in machine...

10.48550/arxiv.2308.05326 preprint EN cc-by arXiv (Cornell University) 2023-01-01

Inferring the size of the causal universe: features and fusion of causal attribution networks

OPENALEX - Publications

Daniel Berenberg, James P. Bagrow

Cause-and-effect reasoning, the attribution of effects to causes, is one of the most powerful and unique skills humans possess. Multiple surveys are mapping out causal attributions as networks, but it is unclear how well these efforts can be combined. Further, the total size of the collective causal attribution network held by humans is currently unknown, making it challenging to assess the progress of these surveys. Here we study three causal attribution networks to determine how well they can be combined into a single network. Combining these networks requires dealing with ambiguous nodes, as nodes represent...

10.48550/arxiv.1812.06038 preprint EN other-oa arXiv (Cornell University) 2018-01-01
