Yaosen Min

ORCID: 0000-0003-4741-6188
Research Areas
  • Computational Drug Discovery Methods
  • Machine Learning in Materials Science
  • Protein Structure and Dynamics
  • Machine Learning in Bioinformatics
  • Advanced Graph Neural Networks
  • Microbial Metabolic Engineering and Bioproduction
  • Bioinformatics and Genomic Networks
  • Biomedical Text Mining and Ontologies
  • Islanding Detection in Power Systems
  • RNA and Protein Synthesis Mechanisms
  • Artificial Intelligence in Healthcare and Education
  • Click Chemistry and Applications
  • Catalytic Cross-Coupling Reactions
  • Cell Image Analysis Techniques
  • Gene Regulatory Network Analysis
  • Semantic Web and Ontologies
  • Chemical Synthesis and Analysis
  • Microgrid Control and Optimization
  • Power Systems and Renewable Energy
  • Sulfur-Based Synthesis Techniques
  • Catalytic C–H Functionalization Methods

Tsinghua University
2007-2024

Microsoft Research Asia (China)
2024

Advances in deep learning have greatly improved structure prediction of molecules. However, many macroscopic observations that are important for real-world applications are not functions of a single molecular structure but rather determined from the equilibrium distribution of structures. Conventional methods for obtaining these distributions, such as molecular dynamics simulation, are computationally expensive and often intractable. Here we introduce a deep learning framework, called Distributional Graphormer (DiG), in an attempt to...

10.1038/s42256-024-00837-3 article EN cc-by Nature Machine Intelligence 2024-05-08

Learning effective molecular feature representation to facilitate molecular property prediction is of great significance for drug discovery. Recently, there has been a surge of interest in pre-training graph neural networks (GNNs) via self-supervised learning techniques to overcome the challenge of data scarcity in molecular property prediction. However, current self-supervised learning-based methods suffer from two main obstacles: the lack of a well-defined self-supervised learning strategy and the limited capacity of GNNs. Here, we propose Knowledge-guided Pre-training of Graph...

10.1038/s41467-023-43214-1 article EN cc-by Nature Communications 2023-11-21

Advances in deep learning have greatly improved structure prediction of molecules. However, many macroscopic observations that are important for real-world applications are not functions of a single molecular structure, but rather determined from the equilibrium distribution of structures. Traditional methods for obtaining these distributions, such as molecular dynamics simulation, are computationally expensive and often intractable. In this paper, we introduce a novel deep learning framework, called Distributional Graphormer (DiG),...

10.48550/arxiv.2306.05445 preprint EN cc-by arXiv (Cornell University) 2023-01-01

Foundation models have revolutionized natural language processing and artificial intelligence, significantly enhancing how machines comprehend and generate human languages. Inspired by the success of these foundation models, researchers have developed foundation models for individual scientific domains, including small molecules, materials, proteins, DNA, and RNA. However, these models are typically trained in isolation, lacking the ability to integrate across different scientific domains. Recognizing that entities within these domains can all be...

10.48550/arxiv.2502.07527 preprint EN arXiv (Cornell University) 2025-02-11

Accurate prediction of protein‐ligand binding affinities is an essential challenge in structure‐based drug design. Despite recent advances in data‐driven methods for affinity prediction, their accuracy is still limited, partially because they only take advantage of static crystal structures while the actual binding affinities are generally determined by the thermodynamic ensembles between proteins and ligands. One effective way to approximate such a thermodynamic ensemble is to use molecular dynamics (MD) simulation. Here, an MD dataset...

10.1002/advs.202405404 article EN cc-by Advanced Science 2024-08-29

Potential Drug-Drug Interactions (DDI) occur while treating complex or co-existing diseases with drug combinations, which may cause changes in drugs' pharmacological activity. Therefore, DDI prediction has been an important task in the medical health machine learning community. Graph-based methods have recently aroused widespread interest and are proved to be a priority for this task. However, these methods are often limited to exploiting the inter-view molecular structure and ignoring the drug's intra-view interaction...

10.1145/3442381.3449786 preprint EN 2021-04-19

Learning generalizable, transferable, and robust representations for molecule data has always been a challenge. The recent success of contrastive learning (CL) for self-supervised graph representation learning provides a novel perspective to learn molecule representations. However, existing CL frameworks usually adopt stochastic augmentations or schemes according to pre-defined rules on the input to obtain different views in various scales, which may destroy topological semantemes and domain prior in the data, leading to suboptimal...

10.1109/bibm52615.2021.9669302 article EN 2021 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) 2021-12-09

Molecular representation learning is essential to deep learning for chemistry, where the molecules are embedded into continuous real-valued vectors as better representations in the large chemical space. Traditional molecular representation learning requires high-quality labels for molecules. However, the precise physicochemical or pharmacological properties of molecules are expensive to measure and collect. Therefore, self-supervised training of models on large-scale cheap available data is becoming an increasingly popular choice in research and practice....

10.21203/rs.3.rs-1746019/v1 preprint EN cc-by Research Square (Research Square) 2022-06-29

External oxidant-free cross-coupling of arylcopper and alkynylcopper reagents for the formation of arylalkynes was performed.

10.1039/c7ra03348f article EN cc-by-nc RSC Advances 2017-01-01

Accurate prediction of protein-ligand binding affinities is an essential challenge in structure-based drug design. Despite recent advances in data-driven methods for affinity prediction, their accuracy is still limited, partially because they only take advantage of static crystal structures while the actual binding affinities are generally determined by the thermodynamic ensembles between proteins and ligands. One effective way to approximate such a thermodynamic ensemble is to use molecular dynamics (MD) simulation. Here, an MD dataset...

10.48550/arxiv.2208.10230 preprint EN cc-by-sa arXiv (Cornell University) 2022-01-01

The expanding application of Artificial Intelligence (AI) in scientific fields presents unprecedented opportunities for discovery and innovation. However, this growth is not without risks. AI models in science, if misused, can amplify risks like creation of harmful substances, or circumvention of established regulations. In this study, we aim to raise awareness of the dangers of AI misuse in science, and call for responsible AI development and use in this domain. We first itemize the risks posed by AI in scientific contexts, then demonstrate the risks by highlighting real-world examples...

10.48550/arxiv.2312.06632 preprint EN other-oa arXiv (Cornell University) 2023-01-01

Finding homologous proteins is the indispensable first step in many protein biology studies. Thus, building highly efficient "search engines" for protein databases is a desired function in bioinformatics. As of August 2018, there are more than 140,000 structures in the PDB, and this number is still increasing rapidly. Such a big number introduces the challenge of scanning the whole structure database with high speeds and sensitivities at the same time. Unfortunately, classic sequence alignment tools and pairwise structure alignment tools are either not sensitive enough to...

10.1109/bibm.2018.8621532 article EN 2018 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) 2018-12-01

A select few genes act as pivotal drivers in the process of cell state transitions. However, finding key genes involved in different transitions is challenging. To address this problem, we present CellNavi, a deep learning-based framework designed to predict genes that drive cell state transitions. CellNavi builds a driver gene predictor upon a manifold, which captures the intrinsic features of cells by learning from large-scale, high-dimensional transcriptomics data and integrating graphs with causal connections. Our analysis shows that CellNavi can...

10.1101/2024.10.27.620174 preprint EN cc-by-nc-nd 2024-10-29

Machine Learning approaches are required to predict accurately on test samples that are distributionally different from training ones in the fields of drug discovery, computational biology, and cheminformatics. However, (i) labeled task-specific molecule data are often scarce, and (ii) poor generalization arises due to molecules that are structurally different from those seen during training. To alleviate these problems, we propose a cloze-style self-supervised learning model (MolCloze) to obtain universal and informative representations for...

10.1109/bibm52615.2021.9669794 article EN 2021 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) 2021-12-09

Proteins, essential to biological systems, perform functions intricately linked to their three-dimensional structures. Understanding the relationship between protein structures and their amino acid sequences remains a core challenge in protein modeling. While traditional protein foundation models benefit from pre-training on vast unlabeled datasets, they often struggle to capture critical co-evolutionary information, which evolutionary-based methods excel at. In this study, we introduce a novel pre-training strategy for protein foundation models that...

10.48550/arxiv.2410.24022 preprint EN arXiv (Cornell University) 2024-10-31

Learning generalizable, transferable, and robust representations for molecule data has always been a challenge. The recent success of contrastive learning (CL) for self-supervised graph representation learning provides a novel perspective to learn molecule representations. The most prevailing CL framework is to maximize the agreement in different augmented views. However, existing CL frameworks usually adopt stochastic augmentations or schemes according to pre-defined rules on the input to obtain different views in various scales (e.g....

10.1101/2021.12.03.471150 preprint EN cc-by-nc-nd bioRxiv (Cold Spring Harbor Laboratory) 2021-12-06

Finding homologous proteins is the indispensable first step in many protein biology studies. Thus, building highly efficient “search engines” for protein databases is a desired function in bioinformatics. As of August 2018, there are more than 140,000 structures in the PDB, and this number is still increasing rapidly. Such a big number introduces the challenge of scanning the whole structure database with high speeds and sensitivities at the same time. Unfortunately, classic sequence alignment tools and pairwise structure alignment tools are either not sensitive...

10.1101/407106 preprint EN bioRxiv (Cold Spring Harbor Laboratory) 2018-09-03