- Machine Learning in Materials Science
- Computational Drug Discovery Methods
- Protein Structure and Dynamics
- Topic Modeling
- Generative Adversarial Networks and Image Synthesis
- Advanced Graph Neural Networks
- Advanced Neuroimaging Techniques and Applications
- Reinforcement Learning in Robotics
- Natural Language Processing Techniques
- Model Reduction and Neural Networks
- Multimodal Machine Learning Applications
- Wireless Signal Modulation Classification
- Digital Media Forensic Detection
- Adversarial Robustness in Machine Learning
- Speech and Audio Processing
- RNA and protein synthesis mechanisms
- Asymmetric Hydrogenation and Catalysis
- Process Optimization and Integration
- Graph Theory and Algorithms
- Multi-Criteria Decision Making
- Mathematical Biology Tumor Growth
- Explainable Artificial Intelligence (XAI)
- Opinion Dynamics and Social Influence
- Image Retrieval and Classification Techniques
- Human Motion and Animation
Stanford University
2023-2025
Palo Alto University
2023
Tongji University
2022
State Key Laboratory of Pollution Control and Resource Reuse
2022
Université de Montréal
2021
Microsoft (United States)
2020
Shanghai Jiao Tong University
2020
Microsoft Research (United Kingdom)
2020
Molecular graph generation is a fundamental problem for drug discovery and has been attracting growing attention. The problem is challenging since it requires not only generating chemically valid molecular structures but also optimizing their chemical properties in the meantime. Inspired by recent progress in deep generative models, in this paper we propose a flow-based autoregressive model called GraphAF. GraphAF combines the advantages of both autoregressive and flow-based approaches and enjoys: (1) high flexibility for data density estimation; (2)...
Predicting molecular conformations from molecular graphs is a fundamental problem in cheminformatics and drug discovery. Recently, significant progress has been achieved with machine learning approaches, especially with deep generative models. Inspired by the diffusion process in classical non-equilibrium thermodynamics, where heated particles diffuse from their original states to a noise distribution, in this paper we propose a novel model named GeoDiff for conformation prediction. GeoDiff treats each atom as a particle and learns...
Proteins mediate their functions through chemical interactions; modeling these interactions, which typically occur through sidechains, is an important need in protein design. However, constructing an all-atom generative model requires an appropriate scheme for managing the jointly continuous and discrete nature of proteins encoded in the structure and sequence. We describe an all-atom diffusion model of protein structure, Protpardelle, which represents all sidechain states at once as a “superposition” state; superpositions defining a protein are collapsed into...
A fundamental problem in computational chemistry is to find a set of reactants to synthesize a target molecule, a.k.a. retrosynthesis prediction. Existing state-of-the-art methods rely on matching the target molecule with a large set of reaction templates, which is very computationally expensive and also suffers from limited coverage. In this paper, we propose a novel template-free approach called G2Gs by transforming a target molecular graph into a set of reactant graphs. G2Gs first splits the target molecular graph into synthons by identifying reaction centers, then translates the synthons into the final...
Advances in artificial intelligence (AI) are fueling a new paradigm of discoveries in the natural sciences. Today, AI has started to advance the natural sciences by improving, accelerating, and enabling our understanding of natural phenomena at a wide range of spatial and temporal scales, giving rise to a new area of research known as AI for science (AI4Science). Being an emerging research paradigm, AI4Science is unique in that it is an enormous and highly interdisciplinary area. Thus, a unified and technical treatment of this field is needed yet challenging. This work aims...
Generative models, especially diffusion models (DMs), have achieved promising results for generating feature-rich geometries and advancing foundational science problems such as molecule design. Inspired by the recent huge success of Stable (latent) Diffusion models, we propose a novel and principled method for 3D molecule generation named Geometric Latent Diffusion Models (GeoLDM). GeoLDM is the first latent DM model for the molecular geometry domain, composed of autoencoders encoding structures into continuous latent codes and DMs operating in the latent space....
Proteins mediate their functions through chemical interactions; modeling these interactions, which typically occur through sidechains, is an important need in protein design. However, constructing an all-atom generative model requires an appropriate scheme for managing the jointly continuous and discrete nature of proteins encoded in the structure and sequence. We describe an all-atom diffusion model of protein structure, Protpardelle, which instantiates a “superposition” over the possible sidechain states and collapses it to conduct reverse diffusion for sample...
We study a fundamental problem in computational chemistry known as molecular conformation generation, trying to predict stable 3D structures from 2D molecular graphs. Existing machine learning approaches usually first predict distances between atoms and then generate a 3D structure satisfying the distances, where noise in the predicted distances may induce extra errors during 3D coordinate generation. Inspired by traditional force field methods for molecular dynamics simulation, in this paper we propose a novel approach called ConfGF by directly...
The homophily principle, i.e., that nodes with the same labels are more likely to be connected, has been believed to be the main reason for the performance superiority of Graph Neural Networks (GNNs) over Neural Networks (NNs) on node classification tasks. Recent research suggests that, even in the absence of homophily, the advantage of GNNs still exists as long as nodes from the same class share similar neighborhood patterns. However, this argument only considers intra-class Node Distinguishability (ND) but neglects inter-class ND, which provides an incomplete...
Molecule generation is a very important practical problem, with uses in drug discovery and material design, and AI methods promise to provide useful solutions. However, existing methods for molecule generation focus either on 2D graph structure or on 3D geometric structure, which is not sufficient to represent a complete molecule, as the 2D graph captures mainly topology while the 3D geometry captures mainly spatial atom arrangements. Combining these representations is essential to better represent a molecule. In this paper, we present a new model for generating a comprehensive representation of...
We introduce Buffer of Thoughts (BoT), a novel and versatile thought-augmented reasoning approach for enhancing the accuracy, efficiency and robustness of large language models (LLMs). Specifically, we propose a meta-buffer to store a series of informative high-level thoughts, namely thought-templates, distilled from the problem-solving processes across various tasks. Then, for each problem, we retrieve a relevant thought-template and adaptively instantiate it with specific reasoning structures to conduct efficient reasoning. To...
Powder X-ray diffraction (PXRD) is a cornerstone technique in materials characterization. However, complete structure determination from PXRD patterns alone remains time-consuming and often intractable, especially for novel materials. Current machine learning (ML) approaches to PXRD analysis predict only a subset of the total information that comprises a crystal structure. We developed a pioneering generative ML model designed to solve crystal structures from real-world experimental PXRD data. In addition to strong...
We study how to generate molecule conformations (i.e., 3D structures) from a molecular graph. Traditional methods, such as molecular dynamics, sample conformations via computationally expensive simulations. Recently, machine learning methods have shown great potential by training on a large collection of conformation data. Challenges arise from the limited model capacity for capturing complex distributions of conformations and the difficulty in modeling long-range dependencies between atoms. Inspired by recent progress in deep generative models,...
The homophily principle, i.e., that nodes with the same labels or similar attributes are more likely to be connected, has been commonly believed to be the main reason for the superiority of Graph Neural Networks (GNNs) over traditional Neural Networks (NNs) on graph-structured data, especially on node-level tasks. However, recent work has identified a non-trivial set of datasets where GNNs' performance compared to NNs' is not satisfactory. Heterophily, i.e., low homophily, has been considered the main cause of this empirical observation. People have begun to revisit...
Coarse-graining (CG) of molecular simulations simplifies the particle representation by grouping selected atoms into pseudo-beads and drastically accelerates simulation. However, such a CG procedure induces information losses, which makes accurate backmapping, i.e., restoring fine-grained (FG) coordinates from CG coordinates, a long-standing challenge. Inspired by recent progress in generative models and equivariant networks, we propose a novel model that rigorously embeds the vital probabilistic nature...
We tackle a common scenario in imitation learning (IL), where agents try to recover the optimal policy from expert demonstrations without further access to the expert or environment reward signals. Apart from simple Behavior Cloning (BC), which adopts supervised learning and suffers from the problem of compounding error, previous solutions like inverse reinforcement learning (IRL) and recent generative adversarial methods involve bi-level or alternating optimization for updating the reward function and the policy, suffering from high computational cost and training...
Modeling the complex three-dimensional (3D) dynamics of relational systems is an important problem in the natural sciences, with applications ranging from molecular simulations to particle mechanics. Machine learning methods have achieved good success by learning graph neural networks to model spatial interactions. However, these approaches do not faithfully capture temporal correlations since they only model next-step predictions. In this work, we propose Equivariant Graph Neural Operator (EGNO), a novel and...
Generative models have shown great promise in generating 3D geometric systems, which is a fundamental problem in many natural science domains such as molecule and protein design. However, existing approaches only operate on static structures, neglecting the fact that physical systems are always dynamic in nature. In this work, we propose geometric trajectory diffusion models (GeoTDM), the first diffusion model for modeling the temporal distribution of geometric trajectories. Modeling this distribution is challenging as it requires capturing both complex spatial...