Samuel Sledzieski

ORCID: 0000-0002-0170-3029
Research Areas
  • Machine Learning in Bioinformatics
  • Protein Structure and Dynamics
  • RNA and protein synthesis mechanisms
  • Genomics and Phylogenetic Studies
  • Bioinformatics and Genomic Networks
  • Evolution and Genetic Dynamics
  • Computational Drug Discovery Methods
  • Microbial Metabolic Engineering and Bioproduction
  • Coral and Marine Ecosystems Studies
  • vaccines and immunoinformatics approaches
  • Gene Regulatory Network Analysis
  • Aquaculture disease management and microbiota
  • Gut microbiota and health
  • Monoclonal and Polyclonal Antibodies Research
  • Marine Sponges and Natural Products
  • Mosquito-borne diseases and control
  • Cancer Genomics and Diagnostics
  • Aquaculture Nutrition and Growth
  • Biomedical Text Mining and Ontologies
  • Neurobiology and Insect Physiology Research
  • Lipid Membrane Structure and Behavior
  • CRISPR and Genetic Engineering
  • Single-cell and spatial transcriptomics
  • Animal Virus Infections Studies
  • Bat Biology and Ecology Studies

Massachusetts Institute of Technology
2021-2025

Microsoft (United States)
2023-2025

Flatiron Health (United States)
2024-2025

Flatiron Institute
2024

Moscow Institute of Thermal Technology
2024

Tufts University
2022

Broad Institute
2022

University of Connecticut
2019-2020

We combine advances in neural language modeling and structurally motivated design to develop D-SCRIPT, an interpretable and generalizable deep-learning model, which predicts interaction between two proteins using only their sequence and maintains high accuracy with limited training data and across species. We show that a D-SCRIPT model trained on 38,345 human PPIs enables significantly improved functional characterization of fly proteins compared with the state-of-the-art approach. Evaluating the same model on protein complexes...

10.1016/j.cels.2021.08.010 article EN cc-by Cell Systems 2021-10-01

Sequence-based prediction of drug-target interactions has the potential to accelerate drug discovery by complementing experimental screens. Such computational prediction needs to be generalizable and scalable while remaining sensitive to subtle variations in the inputs. However, current computational techniques fail to simultaneously meet these goals, often sacrificing performance on one to achieve the others. We develop a deep learning model, ConPLex, successfully leveraging the advances in pretrained protein language models ("PLex") and employing...

10.1073/pnas.2220778120 article EN cc-by-nc-nd Proceedings of the National Academy of Sciences 2023-06-08

Proteomics has been revolutionized by large protein language models (PLMs), which learn unsupervised representations from large corpora of sequences. These models are typically fine-tuned in a supervised setting to adapt the model to specific downstream tasks. However, the computational and memory footprint of fine-tuning (FT) large PLMs presents a barrier for many research groups with limited computational resources. Natural language processing has seen a similar explosion in the size of models, where these challenges have been addressed by methods...

10.1073/pnas.2405840121 article EN cc-by Proceedings of the National Academy of Sciences 2024-06-20

Summary: Computational methods to predict protein–protein interaction (PPI) typically segregate into sequence-based ‘bottom-up’ methods that infer properties from the characteristics of the individual protein sequences, or global ‘top-down’ methods that infer properties from the pattern of already known PPIs in the species of interest. However, a way to incorporate top-down insights into sequence-based bottom-up PPI prediction methods has been elusive. We thus introduce Topsy-Turvy, a method that newly synthesizes both views in a sequence-based, multi-scale, deep-learning model for...

10.1093/bioinformatics/btac258 article EN cc-by Bioinformatics 2022-04-14

The majority of proteins must form higher-order assemblies to perform their biological functions, yet few machine learning models can accurately and rapidly predict the symmetry of assemblies involving multiple copies of the same protein chain. Here, we address this gap by finetuning several classes of foundation models to predict homo-oligomer symmetry. Our best model, named Seq2Symm, which utilizes ESM2, outperforms existing template-based and deep learning methods, achieving an average AUC-PR of 0.47, 0.44 and 0.49 across symmetries...

10.1038/s41467-025-57148-3 article EN cc-by Nature Communications 2025-02-27

High-quality computational structural models are now precomputed and available for nearly every protein in UniProt. However, the best way to leverage these models to predict which pairs of proteins interact in a high-throughput manner is not immediately clear. The recent Foldseek method of van Kempen et al. encodes information on distances and angles along the protein backbone into a linear string of the same length as the protein string, using tokens from a 21-letter discretized alphabet (3Di).

10.1093/bioinformatics/btad663 article EN cc-by Bioinformatics 2023-10-27

Protein language models (PLMs) based on machine learning have demonstrated impressive success in predicting protein structure and function. However, general-purpose (“foundational”) PLMs have limited performance in predicting antibodies due to the latter’s hypervariable regions, which do not conform to the evolutionary conservation principles that such models rely on. In this study, we propose a new transfer learning framework called AbMAP, which fine-tunes foundational models for antibody-sequence inputs by supervising on antibody structure and binding...

10.1101/2023.04.26.538476 preprint EN cc-by-nc-nd bioRxiv (Cold Spring Harbor Laboratory) 2023-04-28

Protein-protein interaction (PPI) networks have proven to be a valuable tool in systems biology to facilitate the discovery and understanding of protein function. Unfortunately, experimental PPI data remains sparse in most model organisms and even more so in other species. Existing methods for computational prediction of PPIs seek to address this limitation, and while they perform well when sufficient within-species training data is available, they generalize poorly to new species or often require specific types and sizes...

10.1101/2021.01.22.427866 preprint EN cc-by bioRxiv (Cold Spring Harbor Laboratory) 2021-01-25

The majority of proteins must form higher-order assemblies to perform their biological functions. Despite the importance of protein quaternary structure, there are few machine learning models that can accurately and rapidly predict the symmetry of assemblies involving multiple copies of the same protein chain. Here, we address this gap by training several classes of foundation models, including ESM-MSA, ESM2, and RoseTTAFold2, to predict homo-oligomer symmetry. Our best model, named Seq2Symm, which utilizes ESM2, outperforms...

10.21203/rs.3.rs-4215086/v1 preprint EN Research Square (Research Square) 2024-04-26

With the ease of gene sequencing and the technology available to study and manipulate non-model organisms, the extension of the methodological toolbox required to translate our understanding of model organisms to non-model organisms has become an urgent problem. For example, mining of large coral and their symbiont sequence data is a challenge, but also provides an opportunity for understanding the functionality and evolution of these and other non-model organisms. Much more information than for any other eukaryotic species is available for humans, especially related to signal transduction and diseases. However,...

10.1371/journal.pone.0270965 article EN cc-by PLoS ONE 2023-02-03

Proteomics has been revolutionized by large pre-trained protein language models, which learn unsupervised representations from large corpora of sequences. The parameters of these models are then fine-tuned in a supervised setting to tailor the model to a specific downstream task. However, as model size increases, the computational and memory footprint of fine-tuning becomes a barrier for many research groups. In the field of natural language processing, which has seen a similar explosion in the size of models, these challenges have been addressed by methods for parameter-efficient...

10.1101/2023.11.09.566187 preprint EN cc-by-nc-nd bioRxiv (Cold Spring Harbor Laboratory) 2023-11-10

Protein Language Models (PLMs) trained on large databases of protein sequences have proven effective in modeling protein biology across a wide range of applications. However, while PLMs excel at capturing individual protein properties, they face challenges in natively representing protein–protein interactions (PPIs), which are crucial to understanding cellular processes and disease mechanisms. Here, we introduce MINT, a PLM specifically designed to model sets of interacting proteins in a contextual and scalable manner. Using...

10.1101/2025.03.09.642188 preprint EN cc-by bioRxiv (Cold Spring Harbor Laboratory) 2025-03-10

Protein language models (PLMs) have demonstrated impressive success in modeling proteins. However, general-purpose “foundational” PLMs have limited performance in modeling antibodies due to the latter’s hypervariable regions, which do not conform to the evolutionary conservation principles that such models rely on. In this study, we propose a transfer learning framework called Antibody Mutagenesis-Augmented Processing (AbMAP), which fine-tunes foundational models for antibody-sequence inputs by supervising on antibody structure and...

10.1073/pnas.2418918121 article EN cc-by-nc-nd Proceedings of the National Academy of Sciences 2024-12-30

We consider the problem of sequence-based drug-target interaction (DTI) prediction, showing that a straightforward deep learning architecture that leverages pre-trained protein language models (PLMs) for protein embedding outperforms state-of-the-art approaches, achieving higher accuracy, expanded generalizability, and an order of magnitude faster training. PLM embeddings are found to contain general information that is especially useful in few-shot (small training data set) and zero-shot instances (unseen...

10.1101/2022.11.03.515084 preprint EN cc-by-nc-nd bioRxiv (Cold Spring Harbor Laboratory) 2022-11-04

Despite significant advances in identifying genetic drivers of neurodegenerative disorders, the majority of affected individuals lack a molecular diagnosis, with somatic mutations proposed as one potential contributor to increased risk. Here, we report the first cell-type-specific map of mosaicism in Alzheimer’s Dementia (AlzD), using 4,014 cells from prefrontal cortex samples of 19 AlzD and 17 non-AlzD individuals. We integrate full-transcript single-nucleus RNA-seq (SMART-Seq) with matched...

10.1101/2022.04.21.489103 preprint EN cc-by bioRxiv (Cold Spring Harbor Laboratory) 2022-04-22

Many existing methods for estimation of infectious disease transmission networks use a phylogeny of the infecting strains as the basis for network inference, and accurate inference relies on the accuracy of this underlying evolutionary history. However, phylogenetic reconstruction can be highly error prone, and more sophisticated methods fail to scale to larger outbreaks, negatively impacting downstream inference. We introduce a new method, TreeFix-TP, for scalable reconstruction of phylogenies based on an error-correction framework. Our method uses...

10.7490/f1000research.1118422.1 article EN 2020-12-08

An accurate understanding of the evolutionary history of rapidly-evolving viruses like SARS-CoV-2, responsible for the COVID-19 pandemic, is crucial to tracking and preventing the spread of emerging pathogens. However, viruses undergo frequent recombination, which makes it difficult to trace their evolutionary history using traditional phylogenetic methods. In this study, we present a workflow, virDTL, for analyzing viral evolution in the presence of recombination. Our approach leverages reconciliation methods developed for inferring horizontal gene...

10.1089/cmb.2021.0507 article EN cc-by-nc Journal of Computational Biology 2022-09-20

Background: Many existing methods for estimation of infectious disease transmission networks use a phylogeny of the infecting strains as the basis for network inference, and accurate inference relies on the accuracy of this underlying evolutionary history. However, phylogenetic reconstruction can be highly error prone, and more sophisticated methods fail to scale to larger outbreaks, negatively impacting downstream inference. Additionally, there are no currently available methods which are able to use within-host diversity to improve...

10.1101/813931 preprint EN cc-by-nc bioRxiv (Cold Spring Harbor Laboratory) 2019-10-22

An accurate understanding of the evolutionary history of rapidly-evolving viruses like SARS-CoV-2, responsible for the COVID-19 pandemic, is crucial to tracking and preventing the spread of emerging pathogens. However, viruses undergo frequent recombination, which makes it difficult to trace their evolutionary history using traditional phylogenetic methods. Here, we present a workflow, virDTL, for analyzing viral evolution in the presence of recombination. Our approach leverages reconciliation methods developed for inferring horizontal gene...

10.1101/2021.08.12.456131 preprint EN cc-by-nc-nd bioRxiv (Cold Spring Harbor Laboratory) 2021-08-13

Once thought to be a unique capability of the Langerhans Islands in the pancreas of mammals, insulin production is now recognized as an evolutionarily ancient function going back to prokaryotes, ubiquitously present in unicellular eukaryotes, fungi, worm, Drosophila and of course human. While the functionality of the insulin signaling pathway has been experimentally demonstrated in some of these organisms, it has not yet been exploited for pharmacological applications. To enable such applications, we need to understand the extent to which the structure...

10.22541/au.170666200.07483513/v1 preprint EN Authorea (Authorea) 2024-01-31

Protein-protein interaction (PPI) networks are a fundamental resource for modeling cellular and molecular function, and a large and sophisticated toolbox has been developed to leverage their structure and topological organization to predict the functional roles of under-studied genes, proteins, and pathways. However, the overwhelming majority of experimentally-determined interactions from which such networks are constructed come from a small number of well-studied model organisms. Indeed, most species lack even a single interaction in these databases,...

10.1101/2024.10.25.620267 preprint EN cc-by bioRxiv (Cold Spring Harbor Laboratory) 2024-10-29