- Protein Structure and Dynamics
- Bioinformatics and Genomic Networks
- Machine Learning in Bioinformatics
- Computational Drug Discovery Methods
- Advanced Graph Neural Networks
- Machine Learning in Materials Science
- Generative Adversarial Networks and Image Synthesis
- Gene expression and cancer classification
- Domain Adaptation and Few-Shot Learning
- RNA and protein synthesis mechanisms
- Cell Image Analysis Techniques
- Advanced Neuroimaging Techniques and Applications
- Bayesian Methods and Mixture Models
- Bacterial Genetics and Biotechnology
- Neural Networks and Applications
- Natural Language Processing Techniques
- Topological and Geometric Data Analysis
- Genomics and Chromatin Dynamics
- Gene Regulatory Network Analysis
- Advanced Graph Theory Research
- Graph Theory and Algorithms
- Machine Learning and Algorithms
- Data Management and Algorithms
- Genetics, Bioinformatics, and Biomedical Research
- Mathematical Biology Tumor Growth
Massachusetts Institute of Technology
2021-2024
Moscow Institute of Thermal Technology
2022
University of Cambridge
2020-2021
Graph Neural Networks (GNNs) have been shown to be effective models for different predictive tasks on graph-structured data. Recent work on their expressive power has focused on isomorphism tasks and countable feature spaces. We extend this theoretical framework to include continuous features, which occur regularly in real-world input domains and within the hidden layers of GNNs, and we demonstrate the requirement for multiple aggregation functions in this context. Accordingly, we propose Principal Neighbourhood Aggregation (PNA), a...
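The need for multiple aggregation functions on continuous features can be illustrated with a small sketch. This is not the PNA architecture itself: the choice of mean, max, min and standard deviation as aggregators, the helper name `aggregate`, and the two toy neighbourhoods are assumptions made for illustration only.

```python
import math

def aggregate(values):
    """Return several summaries of a multiset of neighbour features (hypothetical helper)."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return {
        "mean": mean,
        "max": max(values),
        "min": min(values),
        "std": math.sqrt(var),
    }

# Two made-up neighbourhoods carrying continuous scalar features.
neighbours_a = [0.0, 4.0]
neighbours_b = [2.0, 2.0]

agg_a = aggregate(neighbours_a)
agg_b = aggregate(neighbours_b)

# The mean alone cannot tell the two neighbourhoods apart...
same_mean = agg_a["mean"] == agg_b["mean"]
# ...but the other aggregators can.
differing = [name for name in agg_a if agg_a[name] != agg_b[name]]

print(same_mean, differing)
```

In this toy case a mean-only aggregator maps both neighbourhoods to the same value, while any of the other three summaries separates them, which is the intuition behind combining several aggregators.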
Predicting the binding structure of a small molecule ligand to a protein, a task known as molecular docking, is critical to drug design. Recent deep learning methods that treat docking as a regression problem have decreased runtime compared to traditional search-based methods but have yet to offer substantial improvements in accuracy. We instead frame molecular docking as a generative modeling problem and develop DiffDock, a diffusion generative model over the non-Euclidean manifold of ligand poses. To do so, we map this manifold to the product space of the degrees of freedom (translational, rotational,...
Understanding biomolecular interactions is fundamental to advancing fields like drug discovery and protein design. In this paper, we introduce Boltz-1, an open-source deep learning model incorporating innovations in model architecture, speed optimization, and data processing, achieving AlphaFold3-level accuracy in predicting the 3D structures of biomolecular complexes. Boltz-1 demonstrates performance on par with state-of-the-art commercial models on a range of diverse benchmarks, setting a new benchmark for...
Molecular conformer generation is a fundamental task in computational chemistry. Several machine learning approaches have been developed, but none have outperformed state-of-the-art cheminformatics methods. We propose torsional diffusion, a novel diffusion framework that operates on the space of torsion angles via a diffusion process on the hypertorus and an extrinsic-to-intrinsic score model. On a standard benchmark of drug-like molecules, torsional diffusion generates superior conformer ensembles compared to machine learning and cheminformatics methods in terms of both RMSD and chemical...
Generative AI is rapidly transforming the frontier of research in computational structural biology. Indeed, recent successes have substantially advanced protein design and drug discovery. One of the key methodologies underlying these advances is diffusion models (DMs). Diffusion models originated in computer vision, taking over image generation and offering superior quality and performance. These models were subsequently extended and modified for uses in other areas, including computational structural biology. DMs are well equipped to model high dimensional,...
Protein-ligand interactions (PLI) are foundational to small molecule drug design. With computational methods striving towards experimental accuracy, there is a critical demand for a well-curated and diverse PLI dataset. Existing datasets are often limited in size and diversity, and commonly used evaluation sets suffer from training information leakage, hindering the realistic assessment of method generalization capabilities. To address these shortcomings, we present PLINDER, the largest and most...
Protein structure prediction has reached revolutionary levels of accuracy on single structures, yet distributional modeling paradigms are needed to capture the conformational ensembles and flexibility that underlie biological function. Towards this goal, we develop EigenFold, a diffusion generative modeling framework for sampling a distribution of structures from a given protein sequence. We define a diffusion process that models the structure as a system of harmonic oscillators and which naturally induces a cascading-resolution generative process along the eigenmodes...
Protein-protein interactions (PPIs) are fundamental to understanding biological processes and play a key role in therapeutic advancements. As deep-learning docking methods for PPIs gain traction, benchmarking protocols and datasets tailored for effective training and evaluation of their generalization capabilities and performance across real-world scenarios become imperative. Aiming to overcome limitations of existing approaches, we introduce PINDER, a comprehensive annotated dataset that uses structural...
Understanding how proteins structurally interact is crucial to modern biology, with applications in drug discovery and protein design. Recent machine learning methods have formulated protein-small molecule docking as a generative problem, with significant performance boosts over both traditional and deep learning baselines. In this work, we propose a similar approach for rigid protein-protein docking: DiffDock-PP is a diffusion generative model that learns to translate and rotate unbound protein structures into their bound conformations. We...
The development of data-dependent heuristics and representations for biological sequences that reflect their evolutionary distance is critical for large-scale biological research. However, popular machine learning approaches, based on continuous Euclidean spaces, have struggled with the discrete combinatorial formulation of the edit distance that models evolution and the hierarchical relationship that characterises real-world datasets. We present Neural Distance Embeddings (NeuroSEED), a general framework to embed sequences in geometric vector...
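For reference, the discrete edit distance that the abstract contrasts with continuous embeddings can be computed with the classic dynamic-programming recurrence. The sketch below is the standard Levenshtein distance with unit costs, not NeuroSEED; the function name `edit_distance` and the example sequences are illustrative assumptions.

```python
def edit_distance(s, t):
    """Levenshtein distance between sequences s and t with unit costs."""
    # prev[j] holds the distance between the processed prefix of s and t[:j].
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]  # distance from s[:i] to the empty prefix of t
        for j, ct in enumerate(t, start=1):
            substitution = prev[j - 1] + (0 if cs == ct else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]

# Made-up nucleotide strings: one deletion separates them.
d_one = edit_distance("ACGT", "AGT")
d_zero = edit_distance("ACGT", "ACGT")
d_empty = edit_distance("", "ACG")
print(d_one, d_zero, d_empty)
```

Each call costs time proportional to the product of the two sequence lengths, which becomes expensive when every pair in a large dataset must be compared.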
Diffusion models (DMs) have revolutionized generative learning. They utilize a diffusion process to encode data into a simple Gaussian distribution. However, encoding a complex, potentially multimodal data distribution into a single continuous Gaussian distribution arguably represents an unnecessarily challenging learning problem. We propose Discrete-Continuous Latent Variable Diffusion Models (DisCo-Diff) to simplify this task by introducing complementary discrete latent variables. We augment DMs with learnable discrete latents, inferred with an encoder, and...
In light of the widespread success of generative models, a significant amount of research has gone into speeding up their sampling time. However, generative models are often sampled multiple times to obtain a diverse set, incurring a cost that is orthogonal to sampling time. We tackle the question of how to improve diversity and sample efficiency by moving beyond the common assumption of independent samples. We propose particle guidance, an extension of diffusion-based generative sampling where a joint-particle time-evolving potential enforces diversity. We analyze...
Score-based models generate samples by mapping noise to data (and vice versa) via a high-dimensional diffusion process. We question whether it is necessary to run this entire process at high dimensionality and incur all the inconveniences thereof. Instead, we restrict the diffusion via projections onto subspaces as the data distribution evolves toward noise. When applied to state-of-the-art models, our framework simultaneously improves sample quality (reaching an FID of 2.17 on unconditional CIFAR-10) and reduces the computational...
Searching for a path between two nodes in a graph is one of the most well-studied and fundamental problems in computer science. In numerous domains such as robotics, AI, or biology, practitioners develop search heuristics to accelerate their pathfinding algorithms. However, it is a laborious and complex process to hand-design heuristics based on the problem and the structure of a given use case. Here we present PHIL (Path Heuristic with Imitation Learning), a novel neural architecture and a training algorithm for discovering graph search and navigation heuristics from data...
Traditional Graph Neural Networks (GNNs) rely on message passing, which amounts to permutation-invariant local aggregation of neighbour features. Such a process is isotropic and there is no notion of 'direction' on the graph. We present a new GNN architecture called Graph Anisotropic Diffusion. Our model alternates between linear diffusion, for which a closed-form solution is available, and local anisotropic filters to obtain efficient multi-hop anisotropic kernels. We test our model on two common molecular property prediction benchmarks (ZINC and QM9) and show...
This paper presents the computational challenge on differential geometry and topology that happened within the ICLR 2021 workshop "Geometric and Topological Representation Learning". The competition asked participants to provide creative contributions to the fields of computational geometry and topology through the open-source repositories Geomstats and Giotto-TDA. The challenge attracted 16 teams in its two-month duration. This paper describes the design of the challenge and summarizes its main findings.