- Protein purification and stability
- Monoclonal and Polyclonal Antibodies Research
- Protein Structure and Dynamics
- Natural Language Processing Techniques
- Gene expression and cancer classification
- RNA and protein synthesis mechanisms
- Machine Learning in Materials Science
- Gene Regulatory Network Analysis
- Viral Infectious Diseases and Gene Expression in Insects
- Topic Modeling
- Analytical Chemistry and Chromatography
- Machine Learning in Bioinformatics
- Evolutionary Algorithms and Applications
Stanford University
2024-2025
Merck & Co., Inc., Rahway, NJ, USA
2023-2024
Identification of favorable biophysical properties for protein therapeutics as part of developability assessment is a crucial step in the preclinical development process. Successful prediction of such properties and bioassay results from calculated in silico features has the potential to reduce the time and cost of delivering clinical-grade material to patients, but has nevertheless remained an ongoing challenge in the field. Here, we demonstrate an automated, flexible machine learning workflow designed to compare and identify the most powerful...
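The abstract does not name the candidate models or the metric used, so the following is only a minimal sketch of what such a "compare and identify" workflow can look like: several regressors scored under identical cross-validation on a matrix of in silico features, with the strongest reported. All data, model choices, and the R² metric here are illustrative assumptions.

```python
# Hedged sketch of an automated model-comparison workflow for
# developability prediction. X stands in for calculated in silico
# features; y stands in for a measured bioassay readout.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Ridge
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 30))   # placeholder in silico descriptors
y = rng.normal(size=200)         # placeholder assay results

candidates = {
    "ridge": make_pipeline(StandardScaler(), Ridge(alpha=1.0)),
    "random_forest": RandomForestRegressor(n_estimators=200, random_state=0),
    "gbm": GradientBoostingRegressor(random_state=0),
}

# Score every candidate on the same CV splits, then pick the best.
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    for name, model in candidates.items()
}
best = max(scores, key=scores.get)
print(scores, "->", best)
```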
Leading deep learning-based methods for fixed-backbone protein sequence design do not model sidechain conformation during generation, despite the large role the three-dimensional arrangement of sidechain atoms plays in conformation, stability, and overall function. Instead, these models implicitly reason about crucial interactions based on backbone geometry and known amino acid labels. To address this, we present FAMPNN (Full-Atom MPNN), a method that explicitly models both the identity and sidechain conformation of each residue, where the per-token...
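To make the "explicitly models both identity and sidechain conformation per residue" idea concrete, here is a hedged PyTorch sketch of a per-token head with two outputs: amino acid logits and sidechain chi angles. The module name, layer sizes, the four-chi-angle parameterization, and the sin/cos encoding are assumptions for illustration, not the published FAMPNN architecture.

```python
# Sketch: joint per-token prediction of residue identity and sidechain
# geometry from backbone-derived token embeddings.
import torch
import torch.nn as nn

class FullAtomTokenHead(nn.Module):
    def __init__(self, d_model: int = 128, n_aa: int = 20, n_chi: int = 4):
        super().__init__()
        self.n_chi = n_chi
        self.aa_logits = nn.Linear(d_model, n_aa)      # residue identity
        # Predict a sin/cos pair per chi angle so angles stay continuous.
        self.chi_sincos = nn.Linear(d_model, 2 * n_chi)

    def forward(self, h: torch.Tensor):
        # h: (batch, seq_len, d_model) token embeddings
        aa = self.aa_logits(h)                                    # (B, L, 20)
        sincos = self.chi_sincos(h).unflatten(-1, (self.n_chi, 2))
        chi = torch.atan2(sincos[..., 0], sincos[..., 1])         # (B, L, n_chi)
        return aa, chi

head = FullAtomTokenHead()
aa_logits, chi_angles = head(torch.randn(2, 50, 128))
print(aa_logits.shape, chi_angles.shape)  # (2, 50, 20) (2, 50, 4)
```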
Abstract. Motivation: Pre-trained protein language and/or structural models are often fine-tuned on drug development properties (i.e., developability properties) to accelerate discovery initiatives. However, these models generally rely on a single conformation or sequence as the molecular representation. We present a physics-based model, whereby 3D conformational ensemble representations are fused by a transformer-based architecture and the concatenated representation is used to predict antibody properties. Antibody fusion enables...
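The excerpt states the shape of the architecture (ensemble representations fused by a transformer, then concatenated for prediction) but not its details, so the following is only a minimal sketch under assumed dimensions: per-conformer embeddings pass through a transformer encoder, are mean-pooled, concatenated with a sequence-level embedding, and fed to a linear head. The pooling choice and all sizes are assumptions.

```python
# Sketch: fusing a 3D conformational ensemble with a transformer and
# predicting an antibody property from the concatenated representation.
import torch
import torch.nn as nn

class EnsembleFusion(nn.Module):
    def __init__(self, d_conf: int = 64, d_seq: int = 64, n_heads: int = 4):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=d_conf, nhead=n_heads,
                                           batch_first=True)
        self.fuse = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_conf + d_seq, 1)   # property regressor

    def forward(self, conf_emb, seq_emb):
        # conf_emb: (B, n_conformers, d_conf) per-conformer descriptors
        # seq_emb:  (B, d_seq) sequence-level embedding
        fused = self.fuse(conf_emb).mean(dim=1)       # pool the ensemble
        joint = torch.cat([fused, seq_emb], dim=-1)   # concatenated rep.
        return self.head(joint)

model = EnsembleFusion()
pred = model(torch.randn(8, 10, 64), torch.randn(8, 64))
print(pred.shape)  # (8, 1)
```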
Abstract. Generative models trained on unlabeled protein datasets have demonstrated a remarkable ability to predict some biological functions without any task-specific training data. However, this capability does not extend to all relevant functions and, in many cases, the unsupervised model still underperforms task-specific, supervised baselines. We hypothesize that this is due to a fundamental “alignment gap,” in which the rules learned during unsupervised training are not guaranteed to be related to the function of interest. Here, we demonstrate how...
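A small synthetic illustration of the "alignment gap" claim: an unsupervised score can track what the generative model learned (here, a "naturalness" variable) while correlating only weakly with the function of interest, whereas a supervised baseline with access to task-specific signal and a few labels scores much higher. Everything below is toy data; it does not reproduce the paper's experiments.

```python
# Toy demonstration: unsupervised proxy vs. supervised baseline on a
# function that is only weakly aligned with what the model learned.
import numpy as np
from scipy.stats import spearmanr
from sklearn.linear_model import Ridge

rng = np.random.default_rng(1)
n = 500
naturalness = rng.normal(size=n)                    # what unsupervised training captures
function = 0.2 * naturalness + rng.normal(size=n)   # weakly aligned target

unsup_score = naturalness + 0.1 * rng.normal(size=n)  # zero-shot proxy
rho_unsup, _ = spearmanr(unsup_score, function)
print("unsupervised rank corr:", round(rho_unsup, 3))

# Supervised baseline: a task-specific feature plus 50 labels.
task_feat = function + 0.5 * rng.normal(size=n)
X = np.column_stack([unsup_score, task_feat])
idx = rng.choice(n, size=50, replace=False)
model = Ridge().fit(X[idx], function[idx])
rho_sup, _ = spearmanr(model.predict(X), function)
print("supervised rank corr:", round(rho_sup, 3))
```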
Autoregressive protein language models (pLMs) have emerged as powerful tools to efficiently design functional proteins with extraordinary diversity, as evidenced by the successful generation of diverse enzyme families, including lysozymes and carbonic anhydrases. However, a fundamental limitation of pLMs is their propensity to sample from dense regions within the training distribution, which constrains their ability to explore rare, high-value regions of sequence space. This becomes particularly critical in applications targeting...
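The abstract is cut off before describing how the authors steer generation away from dense regions, so the sketch below shows only the simplest relevant knob: temperature-scaled autoregressive sampling, where a temperature above 1 flattens the per-token distribution and up-weights tokens the model considers rare. The stand-in "model" and all names here are hypothetical.

```python
# Hedged sketch of temperature-scaled autoregressive sampling from a pLM.
import torch

AA = "ACDEFGHIKLMNPQRSTVWY"

def sample_sequence(logits_fn, length: int = 30, temperature: float = 1.5):
    """Sample one sequence token by token; temperature > 1 flattens the
    next-token distribution, trading sample density for diversity."""
    tokens = []
    for _ in range(length):
        logits = logits_fn(tokens)                      # (20,) next-token logits
        probs = torch.softmax(logits / temperature, dim=-1)
        tokens.append(torch.multinomial(probs, 1).item())
    return "".join(AA[t] for t in tokens)

# Placeholder "pLM": random logits standing in for a trained model.
fake_plm = lambda prefix: torch.randn(20)
print(sample_sequence(fake_plm))
```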