Songhao Jiang

ORCID: 0000-0001-6329-1624
Research Areas
  • Computational Drug Discovery Methods
  • Mycobacterium research and diagnosis
  • Genomics and Phylogenetic Studies
  • RNA and protein synthesis mechanisms
  • Tuberculosis Research and Epidemiology
  • Gene expression and cancer classification
  • Metabolomics and Mass Spectrometry Studies
  • Topic Modeling
  • Domain Adaptation and Few-Shot Learning
  • Machine Learning in Bioinformatics
  • Bacterial Genetics and Biotechnology
  • Bacteriophages and microbial interactions
  • Microbial Community Ecology and Physiology
  • Amino Acid Enzymes and Metabolism
  • Microbial Metabolic Engineering and Bioproduction
  • Advanced Proteomics Techniques and Applications
  • Machine Learning in Materials Science
  • Adversarial Robustness in Machine Learning
  • Spam and Phishing Detection
  • Machine Learning in Healthcare
  • Advanced Neural Network Applications
  • Advanced Graph Neural Networks
  • Vaccines and immunoinformatics approaches
  • Machine Learning and Algorithms
  • Protist diversity and phylogeny

Beijing Proteome Research Center
2021-2025

Chinese Academy of Medical Sciences & Peking Union Medical College
2021-2025

Academy of Military Medical Sciences
2024

Institute of Information Engineering
2022-2023

Chinese Academy of Sciences
2022-2023

University of Chinese Academy of Sciences
2022-2023

National Computer Network Emergency Response Technical Team/Coordination Center of China
2022-2023

Hebei University
2021-2022

University of Chicago
2020-2021

Several groups of bacteria have complex life cycles involving cellular differentiation and multicellular structures. For example, actinobacteria of the genus Streptomyces form multicellular vegetative hyphae, aerial hyphae, and spores. However, similar life cycles have not yet been described for archaea. Here, we show that several haloarchaea of the family Halobacteriaceae display a life cycle resembling that of Streptomyces bacteria. Strain YIM 93972 (isolated from a salt marsh) undergoes cellular differentiation into mycelia and spores. Other closely related strains are also able to form mycelia,...

Nature Communications · article · 2023-04-01 · DOI 10.1038/s41467-023-37389-w · CC BY · English

Drug optimization has become increasingly crucial in light of fast-mutating virus strains and drug-resistant cancer cells. Nevertheless, it remains challenging, as it necessitates retaining the beneficial properties of the original drug while simultaneously enhancing desired attributes beyond its scope. In this work, we aim to tackle this challenge by introducing ScaffoldGPT, a novel Large Language Model (LLM) designed for drug optimization based on molecular scaffolds. Our work comprises three key components: (1) A...

arXiv (Cornell University) · preprint · 2025-02-09 · DOI 10.48550/arxiv.2502.06891 · English

Finetuning a Large Language Model (LLM) is crucial for generating results towards specific objectives. This research delves into the realm of drug optimization and introduces a novel reinforcement learning algorithm to finetune an LLM-based generative model, enhancing the original drug across target objectives while retaining its beneficial chemical properties. This work comprises two primary components: (1) DrugImprover: A framework tailored for improving robustness and efficiency in drug optimization. It includes an LLM...

arXiv (Cornell University) · preprint · 2025-02-10 · DOI 10.48550/arxiv.2502.07237 · English

Large Language Models (LLMs) employ three popular training approaches: Masked Language Modeling (MLM), Causal Language Modeling (CLM), and Sequence-to-Sequence (seq2seq). However, each approach has its strengths and limitations, and each faces challenges in addressing specific tasks that require controllable and bidirectional generation, such as drug optimization. To address this challenge, we introduce ControllableGPT, inspired by the biological processes of growth and evolution, which involve the expansion, shrinking, and mutation of sequences. This...

arXiv (Cornell University) · preprint · 2025-02-14 · DOI 10.48550/arxiv.2502.10631 · English

Background: Motivated by the size and availability of cell line drug sensitivity data, researchers have been developing machine learning (ML) models for predicting drug response to advance cancer treatment. As drug sensitivity studies continue generating data, a common question is whether the generalization performance of existing prediction models can be further improved with more training data. Methods: We utilize empirical learning curves for evaluating and comparing the data scaling properties of two neural networks (NNs) and gradient boosting...

BMC Bioinformatics · article · 2021-05-17 · DOI 10.1186/s12859-021-04163-y · CC BY · English

Mixing data augmentation methods have been widely used in text classification recently. However, existing methods do not control the quality of augmented data and have low model explainability. To tackle these issues, this paper proposes an explainable text classification solution based on attentive and targeted mixing data augmentation, ATMIX. Instead of selecting data for augmentation without control, ATMIX focuses on misclassified training samples as the target for augmentation to better improve the model's capability. Meanwhile, to generate meaningful augmented samples, it adopts a...

article · 2023-08-01 · DOI 10.24963/ijcai.2023/565 · English

The objective of drug discovery is to identify chemical compounds that possess specific pharmaceutical properties toward a binding target. Existing large language models (LLMs) can achieve high token matching scores in terms of likelihood for molecule generation. However, relying solely on LLM decoding often results in the generation of molecules that are either invalid due to a single misused token, or suboptimal due to unbalanced exploration and exploitation as a consequence of the LLM's prior experience. Here we propose...

arXiv (Cornell University) · preprint · 2024-06-11 · DOI 10.48550/arxiv.2406.07025 · English

Avian pathogenic Escherichia coli (APEC) leads to economic losses in the poultry industry and is also a threat to human health. Various strategies have been used to search for virulence factors, while little is known about the mechanism by which APEC survives in the host or is eliminated by the host. Thus, in this study a chicken colibacillosis model was constructed by intraperitoneally injecting E. coli O78. The dynamic protein expression of the spleen was then characterized at different post-infection times by quantitative proteomics. Comparative...

Virulence · article · 2022-11-22 · DOI 10.1080/21505594.2022.2150453 · CC BY · English

Although members of the Mycobacterium tuberculosis complex (MTBC) exhibit high similarity, they are characterized by differences with respect to virulence, immune response, and transmissibility. To understand the virulence of these bacteria and identify potential novel therapeutic targets, we systematically investigated the total cellular protein contents of the virulent M. tuberculosis H37Rv, the attenuated M. tuberculosis H37Ra, and the avirulent M. bovis BCG vaccine strains at log and stationary phases, based on tandem mass tag (TMT) quantitative proteomics. Data...

Virulence · article · 2021-10-11 · DOI 10.1080/21505594.2021.1965703 · CC BY · English

Accurate identification of novel peptides remains challenging because of the lack of evaluation criteria in large-scale proteogenomic studies. The mirror proteases trypsin and lysargiNase can generate complementary b/y ion series, providing an opportunity to efficiently assess authentic novel peptides in experiments rather than filtering potential targets by false discovery rate (FDR) ranking. In this study, a pair of in-house developed acetylated mirror proteases, Ac-Trypsin and Ac-LysargiNase, were used...

Frontiers in Microbiology · article · 2022-10-12 · DOI 10.3389/fmicb.2022.1015140 · CC BY · English

Mycobacterium tuberculosis (MTB) is the causative agent of tuberculosis (TB). Although H37Rv, the type strain of M. tuberculosis, was sequenced in 1998, annotation errors in its protein-coding genes have been frequently reported in hundreds of papers. These errors are particularly common at the 5′ ends of genes. Here, we applied TMPP [(N-Succinimidyloxycarbonylmethyl) tris (2,4,6-trimethoxyphenyl) phosphonium bromide] labeling combined with a StageTip separation strategy on H37Rv to characterize the N-terminal start sites of its annotated genes. In total, 1047 proteins...

Genomics · article · 2021-12-13 · DOI 10.1016/j.ygeno.2021.12.001 · CC BY-NC-ND · English

preprint · 2024-01-01 · DOI 10.2139/ssrn.4755343 · English

Protein-ligand binding is the process by which a small molecule (drug or inhibitor) attaches to a target protein. The binding affinity, which refers to the strength of this interaction, is central to many important problems in bioinformatics, such as drug design. An extensive amount of work has been devoted to predicting binding affinity over the past decades due to its significance. In this paper, we review all significant recent works, focusing on the methods, features, and benchmark datasets. We have observed a rising trend in the use of traditional machine...

arXiv (Cornell University) · preprint · 2024-09-29 · DOI 10.48550/arxiv.2410.00709 · English

Protein phosphorylation plays a key role in Mycobacterium tuberculosis, the pathogen of tuberculosis, and holds promise as a new target for anti-tuberculosis drugs. We used M. smegmatis, a close relative and model organism, to study protein phosphorylation at different growth phases. We identified 573 phosphorylated peptides and 816 phosphorylation sites on 385 proteins in M. smegmatis samples from both logarithmic and stationary phases, and then established a comprehensive phosphorylation dataset for M. smegmatis. By comparing expression levels between the two phases with selected ion monitoring (SIM)...

PubMed · article · 2024-11-25 · DOI 10.13345/j.cjb.240358 · English

Omic-based technologies are of particular interest and importance for non-animal chemical hazard and risk characterization, based on the premise that any apical endpoint change must be underpinned by some alterations measured at the omic levels. In this work we studied cellular responses to caffeine and coumarin by generating and integrating multi-omic data from transcriptomic, proteomic and phosphoproteomic experiments. We have shown that the methodology presented here is able to capture the complete chain of events from the first...

bioRxiv (Cold Spring Harbor Laboratory) · preprint · 2022-05-19 · DOI 10.1101/2022.05.18.492410 · English