- Computational Drug Discovery Methods
- Advanced Graph Neural Networks
- Machine Learning in Materials Science
- Protein Structure and Dynamics
- Privacy-Preserving Technologies in Data
- Chemical Synthesis and Analysis
- Adversarial Robustness in Machine Learning
- Innovative Microfluidic and Catalytic Techniques Innovation
- Network Security and Intrusion Detection
- Anomaly Detection Techniques and Applications
- HIV, Drug Use, Sexual Risk
- Explainable Artificial Intelligence (XAI)
- Law, AI, and Intellectual Property
- Microbial Natural Products and Biosynthesis
- Machine Learning in Healthcare
- RNA and protein synthesis mechanisms
- Online Learning and Analytics
- Terrorism, Counterterrorism, and Political Violence
- Software Engineering Research
- Scientific Computing and Data Management
- Recommender Systems and Techniques
- Mental Health via Writing
- Topic Modeling
- Stochastic Gradient Optimization Techniques
- Domain Adaptation and Few-Shot Learning
Princeton University
2024-2025
Harvard University
2024
University of Science and Technology of China
2021-2024
Anhui University
2023-2024
Duke University
2021
In this work, we propose the first backdoor attack to graph neural networks (GNN). Specifically, we propose a subgraph based backdoor attack to GNN for graph classification. In our attack, a GNN classifier predicts an attacker-chosen target label for a testing graph once a predefined subgraph is injected into the testing graph. Our empirical results on three real-world graph datasets show that our backdoor attacks are effective with a small impact on a GNN's prediction accuracy for clean graphs. Moreover, we generalize a randomized smoothing based certified defense to defend against our backdoor attacks. The defense is effective in some cases but...
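Federated learning (FL) is vulnerable to model poisoning attacks, in which malicious clients corrupt the global model via sending manipulated model updates to the server. Existing defenses mainly rely on Byzantine-robust or provably robust FL methods, which aim to learn an accurate global model even if some clients are malicious. However, they can only resist a small number of malicious clients. It is still an open challenge how to defend against model poisoning attacks with a large number of malicious clients. Our FLDetector addresses this challenge via detecting malicious clients. FLDetector aims to detect and remove the majority of the malicious clients such that a Byzantine-robust or provably robust FL method can learn an accurate global model using...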
Predicting molecular properties with data-driven methods has drawn much attention in recent years. Particularly, Graph Neural Networks (GNNs) have demonstrated remarkable success in various molecular generation and prediction tasks. In cases where labeled data is scarce, GNNs can be pre-trained on unlabeled molecular data to first learn the general semantic and structural information before being fine-tuned for specific tasks. However, most existing self-supervised pre-training frameworks for GNNs only focus on node-level or graph-level...
Despite the recent progress in Graph Neural Networks (GNNs), it remains challenging to explain the predictions made by GNNs. Existing explanation methods mainly focus on post-hoc explanations, where another explanatory model is employed to provide explanations for a trained GNN. The fact that post-hoc methods fail to reveal the original reasoning process of GNNs raises the need for building GNNs with built-in interpretability. In this work, we propose Prototype Graph Neural Network (ProtGNN), which combines prototype learning with GNNs and provides a new perspective...
Due to its distributed nature, federated learning is vulnerable to poisoning attacks, in which malicious clients poison the training process via manipulating their local training data and/or local model updates sent to the cloud server, such that the poisoned global model misclassifies many indiscriminate test inputs or attacker-chosen ones. Existing defenses mainly leverage Byzantine-robust federated learning methods or detect malicious clients. However, these defenses do not have provable security guarantees against poisoning attacks and may be vulnerable to more advanced attacks. In...
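Federated learning is vulnerable to poisoning attacks in which malicious clients poison the global model via sending malicious model updates to the server. Existing defenses focus on preventing a small number of malicious clients from poisoning the global model via robust federated learning methods and on detecting malicious clients when there are a large number of them. However, it is still an open challenge how to recover the global model from poisoning attacks after the malicious clients are detected. A naive solution is to remove the detected malicious clients and train a new global model from scratch using the remaining clients. However, such a train-from-scratch recovery method incurs a large computation and communication cost, which may be...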
Federated recommendation (FedRec) can train personalized recommenders without collecting user data, but the decentralized nature makes it susceptible to poisoning attacks. Most previous studies focus on the targeted attack to promote certain items, while the untargeted attack that aims to degrade the overall performance of the FedRec system remains less explored. In fact, untargeted attacks can disrupt the user experience and bring severe financial loss to the service provider. However, existing untargeted attack methods are either inapplicable or ineffective against...
Many data mining tasks rely on graphs to model relational structures among individuals (nodes). Since relational data are often sensitive, there is an urgent need to evaluate the privacy risks in graph data. One famous privacy attack against data analysis models is the model inversion attack, which aims to infer sensitive data in the training dataset and leads to great privacy concerns. Despite its success in grid-like domains, directly applying model inversion attacks on non-grid domains such as graphs leads to poor attack performance. This is mainly due to the failure to consider the unique properties of graphs. To...
As machine learning becomes more widely used for critical applications, the need to study its implications in privacy becomes urgent. Given access to the target model and auxiliary information, the model inversion attack aims to infer sensitive features of the training dataset, which leads to great privacy concerns. Despite its success in the grid domain, directly applying model inversion techniques on non-grid domains such as graphs achieves poor attack performance due to the difficulty of fully exploiting the intrinsic properties of graphs and the attributes of nodes used in GNN models. To bridge this gap, we...
Deep neural networks (DNNs) are recently shown to be vulnerable to backdoor attacks, where attackers embed hidden backdoors in the DNN model by injecting a few poisoned examples into the training dataset. While extensive efforts have been made to detect and remove backdoors from backdoored DNNs, it is still not clear whether a backdoor-free clean model can be directly obtained from poisoned datasets. In this paper, we first construct a causal graph to model the generation process of the poisoned data and find that the backdoor attack acts as the confounder, which brings spurious...
The Transformer architecture has achieved remarkable success in a number of domains including natural language processing and computer vision. However, when it comes to graph-structured data, transformers have not achieved competitive performance, especially on large graphs. In this paper, we identify the main deficiencies of current graph transformers: (1) Existing node sampling strategies in Graph Transformers are agnostic to the graph characteristics and the training process. (2) Most sampling strategies only focus on local neighbors and neglect...
Designing molecules with desirable physicochemical properties and functionalities is a long-standing challenge in chemistry, material science, and drug discovery. Recently, machine learning-based generative models have emerged as promising approaches for...
Protein-ligand bioactivity data published in the literature are essential for drug discovery, yet manual curation struggles to keep pace with the rapidly growing literature. Automated extraction is challenging due to the multi-modal distribution of information (text, tables, figures, structures) and the complexity of chemical representations (e.g., Markush structures). Furthermore, the lack of standardized benchmarks impedes the evaluation and development of extraction methods. In this work, we introduce BioMiner, a system designed...
Structure-based drug design (SBDD) aims to generate 3D ligand molecules that bind to specific protein targets. Existing 3D deep generative models, including diffusion models, have shown great promise for SBDD. However, it is complex to capture the essential protein-ligand interactions exactly in 3D space for molecular generation. To address this problem, we propose a novel framework, namely Binding-Adaptive Diffusion Models (BindDM). In BindDM, we adaptively extract the subcomplex, the essential part of binding sites responsible...
Designing protein-binding proteins is critical for drug discovery. However, artificial-intelligence-based design of such proteins is challenging due to the complexity of protein–ligand interactions, the flexibility of ligand molecules and amino acid side chains, and sequence–structure dependencies. We introduce PocketGen, a deep generative model that produces the residue sequence and atomic structure of the protein regions in which ligand interactions occur. PocketGen promotes consistency between protein sequence and structure by using a graph transformer...
The rapid adoption of generative AI (GenAI) in biotechnology offers immense potential but also raises serious safety concerns. AI models for protein engineering, genome editing, and molecular synthesis can be misused to enhance viral virulence, design toxins, or modify human embryos, while ethical and policy discussions lag behind technological advances. This Correspondence calls for proactive, built-in, AI-native safeguards within GenAI tools. With more research and development, emerging...
Molecular interactions underlie nearly all biological processes, but most machine learning models treat molecules in isolation or specialize in a single type of molecular interaction, such as protein-ligand or protein-protein binding. This siloed approach prevents generalization across biomolecular classes and limits the ability to model interaction interfaces systematically. We introduce ATOMICA, a geometric deep learning model that learns atomic-scale representations of intermolecular interfaces across diverse modalities, including small...
Generating molecules with high binding affinities to target proteins (a.k.a. structure-based drug design) is a fundamental and challenging task in drug discovery. Recently, deep generative models have achieved remarkable success in generating 3D molecules conditioned on the protein pocket. However, most existing methods consider molecular generation for protein pockets independently while neglecting the underlying connections such as subpocket-level similarities. Subpockets are the local protein environments of ligand fragments...
Designing protein-binding proteins is critical for drug discovery. However, the AI-based design of such proteins is challenging due to the complexity of ligand-protein interactions, the flexibility of ligand molecules and amino acid side chains, and sequence-structure dependencies. We introduce PocketGen, a deep generative model that simultaneously produces both the residue sequence and atomic structure of the protein regions where ligand interactions occur. PocketGen ensures consistency between sequence and structure by using a graph transformer for structural...
Structure-based drug design (SBDD) utilizes the three-dimensional geometry of proteins to identify potential drug candidates. Traditional methods, grounded in physicochemical modeling and informed by domain expertise, are resource-intensive. Recent developments in geometric deep learning, focusing on the integration and processing of 3D geometric data, coupled with the availability of accurate protein 3D structure predictions from tools like AlphaFold, have greatly advanced the field of structure-based drug design. This paper systematically...
RNA molecules play an essential role in a wide range of biological processes. Gaining a deeper understanding of their functions can significantly advance our knowledge of life's mechanisms and drive the development of drugs for various diseases. Recently, advances in RNA foundation models have enabled new approaches to RNA engineering, yet existing methods fall short in generating novel sequences with specific functions. Here, we introduce RNAGenesis, a foundation model that combines RNA sequence understanding and de novo design through...
In this work, we propose the first backdoor attack to graph neural networks (GNN). Specifically, we propose a subgraph based backdoor attack to GNN for graph classification. In our attack, a GNN classifier predicts an attacker-chosen target label for a testing graph once a predefined subgraph is injected into the testing graph. Our empirical results on three real-world graph datasets show that our backdoor attacks are effective with a small impact on a GNN's prediction accuracy for clean graphs. Moreover, we generalize a randomized smoothing based certified defense to defend against our backdoor attacks. The defense is effective in...