- Computational Drug Discovery Methods
- Machine Learning in Materials Science
- Advanced Graph Neural Networks
- Chemical Synthesis and Analysis
- Topic Modeling
- Microbial Natural Products and Biosynthesis
- Protein Structure and Dynamics
- Plant Gene Expression Analysis
- Genetics, Bioinformatics, and Biomedical Research
- Plant nutrient uptake and metabolism
- Artificial Intelligence in Healthcare
- Machine Learning in Bioinformatics
- Plant Stress Responses and Tolerance
- Biomedical Text Mining and Ontologies
- Recommender Systems and Techniques
- AI in cancer detection
- Plant Molecular Biology Research
- Analytical Chemistry and Chromatography
- Machine Learning and ELM
- Stochastic Gradient Optimization Techniques
- Plant biochemistry and biosynthesis
- Domain Adaptation and Few-Shot Learning
- Natural Language Processing Techniques
- Machine Learning and Data Classification
- Aquaculture Nutrition and Growth
University of California, Berkeley
2023-2024
Agricultural Genomics Institute at Shenzhen
2023-2024
Chinese Academy of Agricultural Sciences
2023-2024
Ministry of Agriculture and Rural Affairs
2023-2024
Shanghai Ocean University
2024
HEC Montréal
2023
Zhejiang University
2019-2023
Mila - Quebec Artificial Intelligence Institute
2022-2023
Morgridge Institute for Research
2018-2023
University of Wisconsin–Madison
2018-2023
Paclitaxel is a well-known anticancer compound. Its biosynthesis involves the formation of a highly functionalized diterpenoid core skeleton (baccatin III) and the subsequent assembly of a phenylisoserinoyl side chain. Despite intensive investigation for half a century, the complete biosynthetic pathway of baccatin III remains unknown. In this work, we identified a bifunctional cytochrome P450 enzyme [taxane oxetanase 1 (TOT1)] in
We construct a data set of metal-organic framework (MOF) linkers and employ a fine-tuned GPT assistant to propose MOF linker designs by mutating and modifying the existing structures. This strategy allows the model to learn the intricate language of chemistry in molecular representations, thereby achieving an enhanced accuracy in generating structures compared with its base models. Aiming to highlight the significance of design strategies in advancing the discovery of water-harvesting MOFs, we conducted a systematic variant expansion...
Distributed model training suffers from communication overheads due to frequent gradient updates transmitted between compute nodes. To mitigate these overheads, several studies propose the use of sparsified stochastic gradients. We argue that these are facets of a general sparsification method that can operate on any possible atomic decomposition. Notable examples include element-wise, singular value, and Fourier decompositions. We present ATOMO, a framework for atomic sparsification of stochastic gradients. Given a gradient, an atomic decomposition, and a sparsity...
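As a rough illustration of what sparsifying a stochastic gradient element-wise can look like, the sketch below keeps each entry with some probability and rescales the survivors so the result is unbiased in expectation. This is a simplified example, not the ATOMO procedure: the magnitude-proportional rule in `keep_probabilities`, the `budget` parameter, and both function names are assumptions made for illustration.

```python
import random


def keep_probabilities(grad, budget):
    """Assumed heuristic: keep probabilities proportional to |g_i|, capped at 1.

    Because of the cap, the expected number of kept entries is at most `budget`.
    """
    total = sum(abs(g) for g in grad)
    if total == 0:
        return [0.0 for _ in grad]
    return [min(1.0, budget * abs(g) / total) for g in grad]


def sparsify_unbiased(grad, probs, rng):
    """Keep entry i with probability probs[i] and rescale it by 1 / probs[i].

    Entries that are dropped become 0.0. If every nonzero entry has a positive
    keep probability, the expectation of the output equals `grad`.
    """
    out = []
    for g, p in zip(grad, probs):
        if p > 0.0 and rng.random() < p:
            out.append(g / p)
        else:
            out.append(0.0)
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    grad = [0.5, -2.0, 0.0, 1.5]
    probs = keep_probabilities(grad, budget=2)
    print(probs)
    print(sparsify_unbiased(grad, probs, rng))
```

The rescaling by `1 / probs[i]` is what makes the estimate unbiased; the price is higher variance for entries that are kept only rarely.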
In settings with related prediction tasks, integrated multi-task learning models can often improve performance relative to independent single-task models. However, even when the average task performance improves, individual tasks may experience negative transfer in which the multi-task model’s predictions are worse than the single-task model’s. We show the prevalence of negative transfer in a computational chemistry case study with 128 tasks and introduce a framework that provides a foundation for reducing negative transfer in multitask models. Our Loss-Balanced Task Weighting approach dynamically...
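To make the idea of dynamic, loss-based task weighting concrete, here is a minimal sketch. The abstract is truncated before it states the weighting rule, so the form used here is an assumption: each task's weight is the ratio of its current loss to a reference loss, raised to a power `alpha`. The function names and the default `alpha` are hypothetical.

```python
def loss_ratio_weights(current_losses, reference_losses, alpha=0.5):
    """Assumed rule: weight_i = (current_i / reference_i) ** alpha.

    Tasks whose loss has already dropped far below the reference get a smaller
    weight. A non-positive reference loss falls back to a weight of 1.0.
    """
    weights = []
    for cur, ref in zip(current_losses, reference_losses):
        if ref <= 0:
            weights.append(1.0)
        else:
            weights.append((cur / ref) ** alpha)
    return weights


def weighted_total_loss(current_losses, weights):
    """Combine per-task losses into one scalar using the given weights."""
    return sum(w * loss for w, loss in zip(weights, current_losses))


if __name__ == "__main__":
    current = [0.5, 2.0]
    reference = [1.0, 2.0]
    w = loss_ratio_weights(current, reference, alpha=1.0)
    print(w, weighted_total_loss(current, w))
```

In a real training loop the weights would be recomputed as losses change and treated as constants when backpropagating, so that they rescale the per-task gradients without being optimized themselves.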
Empirical testing of chemicals for drug efficacy costs many billions of dollars every year. The ability to predict the action of molecules in silico would greatly increase the speed and decrease the cost of prioritizing drug leads. Here, we asked whether drug function, defined as MeSH "therapeutic use" classes, can be predicted from only a chemical structure. We evaluated two chemical-structure-derived drug classification methods, chemical images with convolutional neural networks and molecular fingerprints with random forests, both of which...
Molecular graph representation learning is a fundamental problem in modern drug and material discovery. Molecular graphs are typically modeled by their 2D topological structures, but it has been recently discovered that 3D geometric information plays a more vital role in predicting molecular functionalities. However, the lack of 3D information in real-world scenarios has significantly impeded the learning of geometric graph representation. To cope with this challenge, we propose the Graph Multi-View Pre-training (GraphMVP) framework where self-supervised learning (SSL)...
Sugars are fundamental to plant developmental processes. For fruits, the accumulation and proportion of sugars play crucial roles in the development of quality and attractiveness. In citrus (Citrus reticulata Blanco.), we found that the difference in sweetness between mature fruits of “Gongchuan” and its bud sport “Youliang” is related to hexose contents. Expression of a SuS (sucrose synthase) gene CitSUS5 and a SWEET (sugars will eventually be exported transporter) gene CitSWEET6, characterized by transcriptome analysis at...
Machine learning techniques have recently been adopted in various applications in medicine, biology, chemistry, and material engineering. An important task is to predict the properties of molecules, which serves as the main subroutine in many downstream applications such as virtual screening and drug design. Despite the increasing interest, the key challenge is to construct proper representations of molecules for learning algorithms. This paper introduces the N-gram graph, a simple unsupervised representation for molecules. The method first embeds...
Virtual (computational) high-throughput screening provides a strategy for prioritizing compounds for experimental screens, but the choice of virtual screening algorithm depends on the data set and evaluation strategy. We consider a wide range of ligand-based machine learning and docking-based approaches for virtual screening on two protein–protein interactions, PriA-SSB and RMI-FANCM, and present a strategy for choosing which algorithm is best for prospective compound prioritization. Our workflow identifies a random forest as the best algorithm for these targets over more sophisticated neural...
Current AI-assisted protein design mainly utilizes protein sequential and structural information. Meanwhile, there exists tremendous knowledge curated by humans in the text format describing proteins' high-level functionalities. Yet, whether the incorporation of such text data can help protein design tasks has not been explored. To bridge this gap, we propose ProteinDT, a multi-modal framework that leverages textual descriptions for protein design. ProteinDT consists of three subsequent steps: ProteinCLAP, which aligns the representation...
Over the last decade, there has been significant progress in the field of machine learning for de novo drug design, particularly in deep generative models. However, current generative approaches exhibit a significant challenge as they do not ensure that the proposed molecular structures can be feasibly synthesized, nor do they provide the synthesis routes of the proposed small molecules, thereby seriously limiting their practical applicability. In this work, we propose a novel forward synthesis framework powered by reinforcement learning (RL) Policy Gradient Forward...
Citric acid plays significant roles in numerous physiological processes in plants, including carbon metabolism, signal transduction, and tolerance to environmental stress. For fruits, it has a major effect on fruit organoleptic quality by directly influencing consumer taste. Citric acid in citrus is mainly regulated by the balance between synthesis, degradation, and vacuolar storage. The genetic and molecular regulations of citric acid synthesis and degradation have been comprehensively elucidated. However, the transporters for...
Recent advancements in conversational large language models (LLMs), such as ChatGPT, have demonstrated remarkable promise in various domains, including drug discovery. However, existing works mainly focus on investigating the capabilities of conversational LLMs on chemical reaction and retrosynthesis. Drug editing, a critical task in the drug discovery pipeline, remains largely unexplored. To bridge this gap, we propose ChatDrug, a framework to facilitate the systematic investigation of drug editing using LLMs. ChatDrug jointly...
Heat stress is a major abiotic stress for plants, which can generate a range of biochemical and genetic responses. In ‘Ponkan’ mandarin fruit, hot air treatment (HAT) accelerates the degradation of citric acid. However, the transcriptional regulatory mechanisms of citrate degradation in response to HAT remain to be elucidated. Here, 17 heat shock transcription factor sequences were isolated, and dual‐luciferase assays were employed to investigate whether the encoded proteins could trans‐activate the promoters of key genes in the GABA shunt,...
Molecule design is a fundamental problem in molecular science and has critical applications in a variety of areas, such as drug discovery, material science, etc. However, due to the large searching space, it is impossible for human experts to enumerate and test all molecules in wet-lab experiments. Recently, with the rapid development of machine learning methods, especially generative methods, molecule design has achieved great progress by leveraging machine learning models to generate candidate molecules. In this paper, we systematically review the most...
Several works have aimed to explain why overparameterized neural networks generalize well when trained by Stochastic Gradient Descent (SGD). The consensus explanation that has emerged credits the randomized nature of SGD for the bias of the training process towards low-complexity models and, thus, for implicit regularization. We take a careful look at this explanation in the context of image classification with common deep neural network architectures. We find that if we do not regularize explicitly, then SGD can be easily made to converge...
Citric acid is the most abundant organic acid in citrus fruit, and the acetyl-CoA pathway potentially plays an important role in citric acid degradation, which occurs during fruit ripening. Analysis of transcripts during fruit development of key genes in the acetyl-CoA pathway and transient overexpression assay in citrus leaves indicated that CitAclα1 could be a potential target gene involved in citrate degradation. In order to understand more about CitAclα1, 23 transcription factors coexpressed with CitAclα1 were identified by RNA-seq. Using dual-luciferase assays, CitERF6...