- Evolution and Genetic Dynamics
- Virus-based gene therapy research
- Viral gastroenteritis research and epidemiology
- RNA and protein synthesis mechanisms
- Evolutionary Game Theory and Cooperation
- Evolutionary Algorithms and Applications
- Metaheuristic Optimization Algorithms Research
- Protein Structure and Dynamics
- Origins and Evolution of Life
- Advanced Multi-Objective Optimization Algorithms
- Machine Learning in Materials Science
- Genomics and Phylogenetic Studies
- Machine Learning in Bioinformatics
- Computational Drug Discovery Methods
- CAR-T cell therapy research
- Multimodal Machine Learning Applications
- Bioinformatics and Genomic Networks
- Machine Learning and Data Classification
- Photoreceptor and optogenetics research
- Visual Attention and Saliency Detection
- Viral Infectious Diseases and Gene Expression in Insects
- Nonlinear Dynamics and Pattern Formation
- Genetics, Bioinformatics, and Biomedical Research
- Human Pose and Action Recognition
- Viral Infections and Immunology Research
C4 Therapeutics (United States)
2024
Harvard University
2017-2021
Inspire Institute
2019
Evolutionary Genomics (United States)
2017-2019
Massachusetts Institute of Technology
2014
Adeno-associated virus (AAV) capsids can deliver transformative gene therapies, but our understanding of AAV biology remains incomplete. We generated the complete first-order AAV2 capsid fitness landscape, characterizing all single-codon substitutions, insertions, and deletions across multiple functions relevant for in vivo delivery. We discovered a frameshifted gene in the VP1 region that expresses a membrane-associated accessory protein that limits production through competitive exclusion. Mutant...
Recent developments in protein design rely on large neural networks with up to hundreds of millions of parameters, yet it is unclear which residue dependencies are critical for determining function. Here, we show that amino acid preferences at individual residues, without accounting for mutation interactions, explain much and sometimes virtually all of the combinatorial effects across 8 datasets (R
Proteins are responsible for the most diverse set of functions in biology. The ability to extract information from protein sequences and to predict the effects of mutations is extremely valuable in many domains of biology and medicine. However, the mapping between sequence and function is complex and poorly understood. Here we present an embedding of natural protein sequences using a Variational Auto-Encoder and use it to predict how mutations affect function. We use this unsupervised approach to cluster variants and to learn interactions between sets of positions within a protein. This generally...
Adeno-associated virus (AAV) capsids have shown clinical promise as delivery vectors for gene therapy. However, the high prevalence of pre-existing immunity against natural capsids poses a challenge for widespread treatment. The generation of diverse capsids that are potentially more capable of immune evasion is challenging because introducing multiple mutations often breaks capsid assembly. Here we target a representative, immunologically relevant 28-amino-acid segment of the AAV2 capsid and show that a low-complexity Variational...
A key hurdle to making adeno-associated virus (AAV) capsid-mediated gene therapy broadly beneficial to all patients is overcoming pre-existing and therapy-induced immune responses to these vectors. Recent advances in high-throughput DNA synthesis, multiplexing, and sequencing technologies have accelerated the engineering of improved capsid properties such as production yield, packaging efficiency, biodistribution, and transduction efficiency. Here we outline how machine learning, viral immunology, and measurements can...
Efficient design of biological sequences will have a great impact across many industrial and healthcare domains. However, discovering improved sequences requires solving a difficult optimization problem. Traditionally, this challenge was approached by biologists through a model-free method known as "directed evolution", the iterative process of random mutation and selection. As the ability to build models that capture the sequence-to-function map improves, such models can be used as oracles to screen sequences before running experiments. In...
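The "random mutation and selection" loop that the abstract above calls directed evolution can be sketched in a few lines. This is an illustrative toy only, not the authors' method: the `toy_fitness` function, the target string, and the `rounds`, `library_size`, and `keep` parameters are all hypothetical stand-ins for a real assay and experimental protocol.

```python
import random

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids

def toy_fitness(seq, target):
    """Hypothetical stand-in for an assay: fraction of positions matching a target."""
    return sum(a == b for a, b in zip(seq, target)) / len(target)

def mutate(seq, rng):
    """Substitute one randomly chosen position with a random amino acid."""
    pos = rng.randrange(len(seq))
    return seq[:pos] + rng.choice(ALPHABET) + seq[pos + 1:]

def directed_evolution(start, fitness_fn, rounds=20, library_size=50, keep=5, seed=0):
    """Alternate random mutation and selection; return the best sequence and its history."""
    rng = random.Random(seed)
    parents = [start]
    history = []
    for _ in range(rounds):
        # Diversify: build a library of single mutants of the current parents.
        library = [mutate(rng.choice(parents), rng) for _ in range(library_size)]
        # Parents stay in the pool, so the best fitness never decreases.
        pool = sorted(set(parents + library))
        # Select: keep the top-scoring sequences as the next round's parents.
        pool.sort(key=fitness_fn, reverse=True)
        parents = pool[:keep]
        history.append(fitness_fn(parents[0]))
    return parents[0], history

target = "ACDEFGHIKL"
best, history = directed_evolution("L" * 10, lambda s: toy_fitness(s, target))
print(best, history[-1])
```

In a model-guided variant, `fitness_fn` would be a trained sequence-to-function model used as an oracle to screen the library before any experiment is run.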
Machine learning methods are increasingly employed to address challenges faced by biologists. One area that will greatly benefit from this cross-pollination is the problem of biological sequence design, which has massive potential for therapeutic applications. However, significant inefficiencies remain in communication between these fields, which result in biologists finding progress in machine learning inaccessible and hinder machine learning scientists from contributing to impactful problems in bioengineering. Sequence design can be seen...
Recent developments in protein design have adapted large neural networks with up to hundreds of millions of parameters to learn complex sequence-function mappings. However, it is unclear which dependencies between residues are critical for determining function, and a better empirical understanding could enable high-quality models that are also more data- and resource-efficient. Here, we observe that the per-residue amino acid preferences, without considering interactions between mutations, are sufficient to explain...
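One simple way to express "per-residue amino acid preferences without interactions" is an additive model: a least-squares fit on one-hot encoded sequences, with one weight per (position, amino acid) pair. The sketch below assumes that formulation; the abstract is truncated, so the authors' actual model may differ, and the helper names are hypothetical.

```python
import numpy as np

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids
AA_INDEX = {aa: i for i, aa in enumerate(ALPHABET)}

def one_hot(seqs):
    """Encode equal-length sequences as flat one-hot rows (length * 20 features)."""
    length = len(seqs[0])
    x = np.zeros((len(seqs), length * len(ALPHABET)))
    for n, seq in enumerate(seqs):
        for pos, aa in enumerate(seq):
            x[n, pos * len(ALPHABET) + AA_INDEX[aa]] = 1.0
    return x

def design_matrix(seqs):
    """One-hot features plus a constant column for the intercept."""
    return np.hstack([one_hot(seqs), np.ones((len(seqs), 1))])

def fit_additive(seqs, fitness):
    """Least-squares fit of per-position preferences; no interaction terms."""
    y = np.asarray(fitness, dtype=float)
    weights, *_ = np.linalg.lstsq(design_matrix(seqs), y, rcond=None)
    return weights

def predict_additive(weights, seqs):
    """Predicted fitness is the sum of the fitted preferences at each position."""
    return design_matrix(seqs) @ weights

def r_squared(y_true, y_pred):
    """Fraction of variance in y_true explained by y_pred."""
    y_true = np.asarray(y_true, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot
```

Fitting on single and low-order mutants and computing `r_squared` on held-out combinatorial variants gives a rough measure of how much of the landscape is additive.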
Major evolutionary transitions, including the emergence of life, likely occurred in aqueous environments. While the role of water’s chemistry in early life is well studied, the effects of water’s ability to manipulate population structure are less clear. Population structure is known to be critical, as effective replicators must be insulated from parasites. Here, we propose that turbulent coherent structures, long-lasting flow patterns which trap particles, may serve many of the properties associated with compartments...
Artificial Intelligence models encoding biology and chemistry are opening new routes to high-throughput and high-quality in-silico drug development. However, their training increasingly relies on computational scale, with recent protein language models (pLMs) trained on hundreds of graphical processing units (GPUs). We introduce the BioNeMo Framework to facilitate the training of AI models across GPUs. Its modular design allows the integration of individual components, such as data loaders, into existing workflows and is open to community contributions....
Compartments are ubiquitous throughout biology, yet their importance stretches back to the origin of cells. In the context of the origin of life, we assume that a protocell, a compartment enclosing functional components, requires $N$ components to be evolvable. We take interest in the timescale in which a minimal evolvable protocell is produced. We show that when protocells fuse and share information, the time to produce an evolvable protocell scales algebraically in $N$, in contrast to an exponential scaling in the absence of fusion. We discuss the implications of this result for origins as...
The transition from prelife, where self-replication does not occur, to life, which exhibits self-replication and evolution, has been a subject of interest for many decades. Membranes, forming compartments, seem to be a critical component of this transition as they provide several concurrent benefits. They maintain localized interactions, generate electro-chemical gradients, and help in selecting cooperative functions as they arise. These pave the way for the emergence and maintenance of simple metabolic cycles and polymers. In the context of the origin of life, evolution...
Fitness functions map large combinatorial spaces of biological sequences to properties of interest. Inferring these multimodal functions from experimental data is a central task in modern protein engineering. Global epistasis models are an effective and physically-grounded class of models for estimating fitness functions from observed data. These models assume that a sparse latent function is transformed by a monotonic nonlinearity to emit measurable fitness. Here we demonstrate that minimizing contrastive loss functions, such as the Bradley-Terry...
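For readers unfamiliar with the Bradley-Terry loss named above, here is a minimal sketch of its standard pairwise form. The abstract is truncated, so this is not necessarily the exact formulation the authors use; the function name and the choice to average over all ordered pairs are assumptions.

```python
import numpy as np

def bradley_terry_loss(scores, fitness):
    """Mean negative log-likelihood that model scores rank each pair in the observed order."""
    scores = np.asarray(scores, dtype=float)
    fitness = np.asarray(fitness, dtype=float)
    # Mask of ordered pairs (i, j) where sequence i was measured fitter than sequence j.
    i_beats_j = fitness[:, None] > fitness[None, :]
    diff = scores[:, None] - scores[None, :]
    # -log(sigmoid(diff)) equals log(1 + exp(-diff)); logaddexp keeps this numerically stable.
    pair_losses = np.logaddexp(0.0, -diff)
    # Assumes at least one pair with distinct fitness values; otherwise the mean is NaN.
    return pair_losses[i_beats_j].mean()

latent = np.array([0.1, 0.5, 1.2, 2.0])
observed = np.exp(latent)  # a monotonic transform of the latent values
print(bradley_terry_loss(latent, observed))
```

Because the loss uses only the ordering of the measured fitness values, applying any strictly increasing transform to `fitness` leaves it unchanged, which is why a ranking loss pairs naturally with the monotonic nonlinearity in a global epistasis model.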
Model-based optimization (MBO) is increasingly applied to design problems in science and engineering. A common scenario involves using a fixed training set to train models, with the goal of designing new samples that outperform those present in the training data. A major challenge in this setting is distribution shift, where the distributions of training and design samples are different. While some shift is expected, as the goal is to create better designs, this change can negatively affect model accuracy and, subsequently, design quality. Despite the widespread nature of this problem, addressing...
The ability to design and optimize biological sequences with specific functionalities would unlock enormous value in technology and healthcare. In recent years, machine learning-guided sequence design has progressed this goal significantly, though validating designed sequences in the lab or clinic takes many months and substantial labor. It is therefore valuable to assess the likelihood that a designed set contains sequences of the desired quality (which often lies outside the label distribution in our training data) before committing resources to an...