Brian Hie

ORCID: 0000-0003-3224-8142
Research Areas
  • Single-cell and spatial transcriptomics
  • RNA and protein synthesis mechanisms
  • Machine Learning in Bioinformatics
  • Protein Structure and Dynamics
  • Vaccines and immunoinformatics approaches
  • Genomics and Phylogenetic Studies
  • Cell Image Analysis Techniques
  • Immune cells in cancer
  • Advanced Neural Network Applications
  • Computer Graphics and Visualization Techniques
  • CRISPR and Genetic Engineering
  • Gene Regulatory Network Analysis
  • Extracellular vesicles in disease
  • SARS-CoV-2 and COVID-19 Research
  • Neuroinflammation and Neurodegeneration Mechanisms
  • Genomics and Chromatin Dynamics
  • Computational Drug Discovery Methods
  • Machine Learning in Materials Science
  • Evolution and Genetic Dynamics
  • Topic Modeling
  • Enzyme Structure and Function
  • Fractal and DNA sequence analysis
  • Advanced Vision and Imaging
  • Genetic Associations and Epidemiology
  • Cancer Genomics and Diagnostics

Arc Research Institute
2024-2025

Stanford University
2016-2025

Palo Alto Institute
2024

Ragon Institute of MGH, MIT and Harvard
2020-2024

Massachusetts Institute of Technology
2018-2024

Meta (United States)
2024

Palo Alto University
2023

Protein Express (United States)
2022

Art Institute of Portland
2022

Meta (Israel)
2022

Recent advances in machine learning have leveraged evolutionary information in multiple sequence alignments to predict protein structure. We demonstrate direct inference of full atomic-level protein structure from primary sequence using a large language model. As language models of protein sequences are scaled up to 15 billion parameters, an atomic-resolution picture of protein structure emerges in the learned representations. This results in an order-of-magnitude acceleration of high-resolution structure prediction, which enables large-scale structural characterization...

10.1126/science.ade2574 article EN cc-by Science 2023-03-16

Natural language predicts viral escape. Viral mutations that evade neutralizing antibodies, an occurrence known as viral escape, can occur and may impede the development of vaccines. To predict which mutations may lead to viral escape, Hie et al. used a machine learning technique for natural language processing with two components: grammar (or syntax) and meaning (or semantics) (see the Perspective by Kim and Przytycka). Three different unsupervised language models were constructed for influenza A hemagglutinin, HIV-1 envelope glycoprotein, and severe acute...

10.1126/science.abd7331 article EN Science 2021-01-14

Artificial intelligence has the potential to open insight into the structure of proteins at the scale of evolution. It has only recently been possible to extend protein structure prediction to two hundred million cataloged proteins. Characterizing the structures of the exponentially growing billions of protein sequences revealed by large-scale gene sequencing experiments would necessitate a breakthrough in the speed of folding. Here we show that direct inference of structure from primary sequence using a large language model enables an order of magnitude speed-up in high...

10.1101/2022.07.20.500902 preprint EN cc-by-nc-nd bioRxiv (Cold Spring Harbor Laboratory) 2022-07-21

We consider the problem of predicting a protein sequence from its backbone atom coordinates. Machine learning approaches to this problem to date have been limited by the number of available experimentally determined protein structures. We augment training data by nearly three orders of magnitude by predicting structures for 12M protein sequences using AlphaFold2. Trained with this additional data, a sequence-to-sequence transformer with invariant geometric input processing layers achieves 51% native sequence recovery on structurally held-out backbones with 72%...

10.1101/2022.04.10.487779 preprint EN cc-by-nc-nd bioRxiv (Cold Spring Harbor Laboratory) 2022-04-10

Natural evolution must explore a vast landscape of possible sequences for desirable yet rare mutations, suggesting that learning from natural evolutionary strategies could guide artificial evolution. Here we report that general protein language models can efficiently evolve human antibodies by suggesting mutations that are evolutionarily plausible, despite providing the model with no information about the target antigen, binding specificity or protein structure. We performed language-model-guided affinity...

10.1038/s41587-023-01763-2 article EN cc-by Nature Biotechnology 2023-04-24

SARS-CoV-2 evolution threatens vaccine- and natural infection-derived immunity as well as the efficacy of therapeutic antibodies. To improve public health preparedness, we sought to predict which existing amino acid mutations in SARS-CoV-2 might contribute to future variants of concern. We tested the predictive value of features comprising epidemiology, evolution, immunology, and neural network-based protein sequence modeling, and identified primary biological drivers of SARS-CoV-2 intra-pandemic evolution. We found evidence that...

10.1126/scitranslmed.abk3445 article EN cc-by Science Translational Medicine 2022-01-11

The degree to which evolution is predictable is a fundamental question in biology. Previous attempts to predict the evolution of protein sequences have been limited to specific proteins and to small changes, such as single-residue mutations. Here, we demonstrate that by using a protein language model to predict the local evolution within protein families, we recover a dynamic "vector field" of protein evolution that we call evolutionary velocity (evo-velocity). Evo-velocity generalizes to evolution over vastly different timescales, from viral proteins evolving over years to eukaryotic proteins evolving over geologic eons, and can predict the evolutionary dynamics of proteins that were...

10.1016/j.cels.2022.01.003 article EN cc-by Cell Systems 2022-02-03

The genome is a sequence that encodes the DNA, RNA, and proteins that orchestrate an organism’s function. We present Evo, a long-context genomic foundation model with a frontier architecture trained on millions of prokaryotic and phage genomes, and report scaling laws on DNA to complement observations in language and vision. Evo generalizes across DNA, RNA, and proteins, enabling zero-shot function prediction competitive with domain-specific language models and the generation of functional CRISPR-Cas and transposon systems, representing the first examples...

10.1126/science.ado9336 article EN Science 2024-11-14

The genome is a sequence that completely encodes the DNA, RNA, and proteins that orchestrate the function of a whole organism. Advances in machine learning combined with massive datasets of whole genomes could enable a biological foundation model that accelerates the mechanistic understanding and generative design of complex molecular interactions. We report Evo, a genomic foundation model that enables prediction and generation tasks from the molecular to the genome scale. Using an architecture based on advances in deep signal processing, we scale Evo to 7 billion parameters with a context...

10.1101/2024.02.27.582234 preprint EN cc-by bioRxiv (Cold Spring Harbor Laboratory) 2024-02-27

Large language models trained on sequence information alone can learn high-level principles of protein design. However, beyond sequence, the three-dimensional structures of proteins determine their specific function, activity, and evolvability. Here, we show that a general protein language model augmented with protein structure backbone coordinates can guide evolution for diverse proteins without the need to model individual functional tasks. We also demonstrate that ESM-IF1, which was only trained on single-chain structures, can be extended to engineer protein complexes....

10.1126/science.adk8946 article EN Science 2024-07-04

All of life encodes information with DNA. While tools for sequencing, synthesis, and editing of genomic code have transformed biological research, intelligently composing new biological systems would also require a deep understanding of the immense complexity encoded by genomes. We introduce Evo 2, a biological foundation model trained on 9.3 trillion DNA base pairs from a highly curated genomic atlas spanning all domains of life. We train Evo 2 with 7B and 40B parameters to have an unprecedented 1 million token context window with single-nucleotide...

10.1101/2025.02.18.638918 preprint EN cc-by-nd bioRxiv (Cold Spring Harbor Laboratory) 2025-02-21

Machine learning that generates biological hypotheses has transformative potential, but most learning algorithms are susceptible to pathological failure when exploring regimes beyond the training data distribution. A solution to address this issue is to quantify prediction uncertainty so that algorithms can gracefully handle novel phenomena that confound standard methods. Here, we demonstrate the broad utility of robust uncertainty prediction in biological discovery. By leveraging Gaussian process-based uncertainty prediction on modern pre-trained features, we train a model on just 72...

10.1016/j.cels.2020.09.007 article EN cc-by-nc-nd Cell Systems 2020-10-15

Single-cell RNA sequencing (scRNA-seq) has provided a high-dimensional catalog of millions of cells across species and diseases. These data have spurred the development of hundreds of computational tools to derive novel biological insights. Here, we outline the components of scRNA-seq analytical pipelines and the computational methods that underlie these steps. We describe available methods, highlight well-executed benchmarking studies, and identify opportunities for additional benchmarking studies and computational methods. As the biochemical approaches for single-cell...

10.1146/annurev-biodatasci-012220-100601 article EN Annual Review of Biomedical Data Science 2020-05-27

Large language models (LLMs) are a type of machine learning model that learn statistical patterns over text, such as predicting the next words in a sequence of text. Both general purpose and task-specific LLMs have demonstrated potential across diverse applications. Science and medicine have many data types that are highly suitable for LLMs, such as scientific texts (publications, patents and textbooks), electronic medical records, large databases of DNA and protein sequences and chemical compounds. Carefully validated systems can...

10.1111/eci.14183 article EN European Journal of Clinical Investigation 2024-02-21

Engineering new molecules with desirable functions and properties has the potential to extend our ability to engineer proteins beyond what nature has so far evolved. Advances in the so-called 'de novo' design problem have recently been brought forward by developments in artificial intelligence. Generative architectures, such as language models and diffusion processes, seem adept at generating novel, yet realistic proteins that display desirable properties and perform specified functions. State-of-the-art design protocols now achieve experimental...

10.1016/j.sbi.2024.102794 article EN cc-by Current Opinion in Structural Biology 2024-04-24

Genome-wide association studies (GWAS) are a powerful approach for connecting genotype to phenotype. Most GWAS hits are located in cis-regulatory regions, but the underlying causal variants and their molecular mechanisms remain unknown. To better understand human cis-regulatory variation, we mapped quantitative trait loci for chromatin accessibility (caQTLs), a key step in cis-regulation, in 1000 individuals from 10 diverse populations. Most caQTLs were shared across populations, allowing us to leverage the genetic diversity...

10.7554/elife.39595 article EN cc-by eLife 2019-01-16

Although combining data from multiple entities could power life-saving breakthroughs, open sharing of pharmacological data is generally not viable because of data privacy and intellectual property concerns. To this end, we leverage modern cryptographic tools to introduce a computational protocol for securely training a predictive model of drug-target interactions (DTIs) on a pooled dataset that overcomes barriers to data sharing by provably ensuring the confidentiality of all underlying drugs, targets, and observed interactions. Our...

10.1126/science.aat4807 article EN Science 2018-10-18

Combining a basic set of building blocks into more complex forms is a universal design principle. Most protein designs have proceeded from a manual bottom-up approach using parts created by nature, but top-down design of proteins is fundamentally hard due to biological complexity. We demonstrate how the modularity and programmability long sought for protein design can be realized through generative artificial intelligence. Advanced protein language models demonstrate emergent learning of atomic resolution structure and design principles. We leverage...

10.1101/2022.12.21.521526 preprint EN cc-by-nc-nd bioRxiv (Cold Spring Harbor Laboratory) 2022-12-22

A complete understanding of biological processes requires synthesizing information across heterogeneous modalities, such as age, disease status, or gene expression. Technological advances in single-cell profiling have enabled researchers to assay multiple modalities simultaneously. We present Schema, which uses a principled metric learning strategy that identifies informative features in a modality to synthesize disparate modalities into a single coherent interpretation. We use Schema to infer cell types by...

10.1186/s13059-021-02313-2 article EN cc-by Genome biology 2021-05-03

Natural evolution must explore a vast landscape of possible sequences for desirable yet rare mutations, suggesting that learning from natural evolutionary strategies could accelerate artificial evolution. Here, we report that deep learning algorithms known as protein language models can evolve human antibodies with high efficiency, despite providing the models with no information about the target antigen, binding specificity, or protein structure, and also requiring no additional task-specific finetuning or supervision. We...

10.1101/2022.04.10.487811 preprint EN cc-by-nc bioRxiv (Cold Spring Harbor Laboratory) 2022-04-11