Bozitao Zhong

ORCID: 0000-0001-9363-6099
Research Areas
  • Protein Structure and Dynamics
  • Machine Learning in Bioinformatics
  • RNA and protein synthesis mechanisms
  • Enzyme Structure and Function
  • Genomics and Phylogenetic Studies
  • Advanced Proteomics Techniques and Applications
  • Microbial Metabolic Engineering and Bioproduction
  • Monoclonal and Polyclonal Antibodies Research
  • Machine Learning in Materials Science
  • vaccines and immunoinformatics approaches
  • Ubiquitin and proteasome pathways
  • Genetics, Bioinformatics, and Biomedical Research
  • Protein purification and stability
  • Viral Infectious Diseases and Gene Expression in Insects
  • Transgenic Plants and Applications
  • Bacteriophages and microbial interactions
  • Glycosylation and Glycoproteins Research
  • Parallel Computing and Optimization Techniques
  • Computational Drug Discovery Methods
  • Bacterial Genetics and Biotechnology
  • Peptidase Inhibition and Analysis
  • Microfluidic and Capillary Electrophoresis Applications
  • Cell Image Analysis Techniques
  • Enzyme Production and Characterization
  • HIV/AIDS drug development and treatment

Shanghai Jiao Tong University
2019-2025

Center for Life Sciences
2021-2025

Mila - Quebec Artificial Intelligence Institute
2023-2024

Université de Montréal
2023-2024

Intrinsically disordered proteins (IDPs) play pivotal roles in various biological functions and are closely linked to many human diseases, including cancer, diabetes, and Alzheimer's disease. Structural investigations of IDPs typically involve a combination of molecular dynamics (MD) simulations and experimental data to correct for the intrinsic biases of simulation methods. However, these investigations are hindered by their high computational cost and the scarcity of experimental data, severely limiting their applicability. Despite the recent advancements...

10.1101/2024.05.05.592611 preprint EN bioRxiv (Cold Spring Harbor Laboratory) 2024-05-07

Protein language models (PLMs) have shown remarkable capabilities in various protein function prediction tasks. However, while protein function is intricately tied to structure, most existing PLMs do not incorporate protein structure information. To address this issue, we introduce ProSST, a Transformer-based protein language model that seamlessly integrates both protein sequences and structures. ProSST incorporates a structure quantization module and a Transformer architecture with disentangled attention. The structure quantization module translates a 3D protein structure into a sequence of discrete...

10.1101/2024.04.15.589672 preprint EN cc-by-nc-nd bioRxiv (Cold Spring Harbor Laboratory) 2024-04-17

Fine-tuning pretrained protein language models (PLMs) has emerged as a prominent strategy for enhancing downstream prediction tasks, often outperforming traditional supervised learning approaches. As a widely applied and powerful technique in natural language processing, parameter-efficient fine-tuning could potentially enhance the performance of PLMs. However, the direct transfer to life science tasks is nontrivial due to the different training strategies and data forms. To address this gap, we...

10.1021/acs.jcim.4c00689 article EN Journal of Chemical Information and Modeling 2024-08-07

Hepatitis C virus (HCV) is a notorious member of the Flaviviridae family of enveloped, positive-strand RNA viruses. Non-structural protein 5A (NS5A) plays a key role in HCV replication and assembly. NS5A is a multi-domain protein that includes an N-terminal amphipathic membrane-anchoring alpha helix, a highly structured domain-1, and two intrinsically disordered domains 2-3. Domain-1 contains a zinc finger (Zf)-site, and zinc binding stabilizes the overall structure, while ejection of this zinc from the Zf-site destabilizes the overall structure....

10.1039/d0cp06360f article EN Physical Chemistry Chemical Physics 2021-01-01

AlphaFold, developed by DeepMind, predicts protein structures from the amino acid sequence at or near experimental resolution, solving the 50-year-old protein folding challenge and leading to progress in transforming large-scale genomics data into protein structures. AlphaFold will also greatly change the scientific research model from a low-throughput to a high-throughput manner. The overall prediction process consists of two stages: 1) MSA construction based on CPUs and 2) model inferences on GPUs. In the first stage, AlphaFold uses CPUs only, taking up to hours for a...

10.1145/3503470.3503471 preprint EN 2022-01-11

Ancestral metabolism has remained controversial due to a lack of evidence beyond sequence-based reconstructions. Although prebiotic chemists have provided hints that metabolism might originate from non-enzymatic protometabolic pathways, gaps between ancestral reconstruction and prebiotic processes mean there is much still unknown. Here, we apply proteome-wide 3D structure predictions and comparisons to investigate ancestorial metabolism of ancient bacteria and archaea, to provide information beyond sequence as a bridge to the prebiotic processes. We compare...

10.1038/s41467-022-35523-8 article EN cc-by Nature Communications 2022-12-21

Phosphorylation of proteins plays an important regulatory role at almost all levels of cellular organization. Molecular dynamics (MD) simulation is a promising tool to reveal the mechanism of how phosphorylation regulates many key biological processes at the atomistic level. MD simulation accuracy depends on force field precision, while current force fields for phospho-amino acids have resulted in notable inconsistency with experimental data. Here, a new force field parameter (named FB18CMAP) is generated by fitting against quantum...

10.1021/acs.jcim.3c00112 article EN Journal of Chemical Information and Modeling 2023-02-17

Proteins play a critical role in carrying out biological functions, and their 3D structures are essential in determining their functions. Accurately predicting the conformation of protein side-chains given their backbones is important for applications in protein structure prediction, design, and protein-protein interactions. Traditional methods are computationally intensive and have limited accuracy, while existing machine learning methods treat the problem as a regression task and overlook the restrictions imposed by the constant covalent bond lengths...

10.48550/arxiv.2306.01794 preprint EN cc-by arXiv (Cornell University) 2023-01-01

Prokaryotic Argonaute (pAgo) proteins, a class of DNA/RNA-guided programmable endonucleases, have been extensively utilized in nucleic acid-based biosensors.

10.1039/d3sc06221j article EN cc-by-nc Chemical Science 2024-01-01

Understanding how amino acids influence protein expression is crucial for advancements in biotechnology and synthetic biology. In this study, we introduce Venus-TIGER, a deep learning model designed to accurately identify amino acids critical for protein expression. By constructing a two-dimensional matrix that links representations to experimental fitness, Venus-TIGER achieves improved predictive accuracy and enhanced extrapolation capability. We validated our approach on both public mutational scanning datasets...

10.1101/2025.01.06.631498 preprint EN cc-by-nc-nd bioRxiv (Cold Spring Harbor Laboratory) 2025-01-07

Predicting the fitness of viral proteins is fundamental to understanding viral evolution and developing antiviral strategies. This study introduces Venus-EEM, an entropy-driven ensemble model, aimed at improving the performance of zero-shot predictions for viral protein fitness across diverse datasets. We demonstrate that entropy serves as an effective criterion for selecting optimal models, enabling adaptive model selection for different prediction tasks. By incorporating entropy-weighted ensemble learning from multiple protein language...

10.1103/physrevresearch.7.013229 article EN cc-by Physical Review Research 2025-02-28

Intrinsically disordered proteins (IDPs) have garnered significant attention due to their critical roles in complex human diseases. Molecular dynamics (MD) simulations have emerged as a valuable approach for studying IDPs, whose accuracy heavily depends on the accuracy of force fields. Despite this, the high conformational flexibility of IDPs presents limitations for current force fields in precisely capturing their features. Here, we developed a tool for generating force field parameters, consisting of two main components: construction and...

10.1021/acs.jcim.5c00140 article EN Journal of Chemical Information and Modeling 2025-04-02

The variational quantum eigensolver (VQE) has recently been demonstrated for solving the challenging Heisenberg Antiferromagnet (HAFM) models. Apart from the ground state energy, many important issues such as excited states and the general frustration of HAFM are worth investigating, which have only been partially solved by classical methods and rarely by quantum approaches. Here, VQE is applied to a GPU simulator to calculate the excited states of a ‐ model on both square and kagome lattices. The invariant subspace property is analyzed during...

10.1002/qute.202300240 article EN Advanced Quantum Technologies 2025-05-14

In-silico prediction of protein mutant stability, measured by the difference in Gibbs free energy change (ΔΔG), is fundamental for protein engineering. Current sequence-to-label methods typically employ a two-stage pipeline: (i) encoding sequences using neural networks (e.g., transformers), followed by (ii) ΔΔG regression from the latent representations. Although these methods have demonstrated promising performance, their dependence on specialized neural network encoders significantly increases the complexity. Additionally,...

10.1101/2025.05.30.656964 preprint EN cc-by-nc-nd bioRxiv (Cold Spring Harbor Laboratory) 2025-06-02

Accurate prediction of enzyme function is crucial for elucidating biological mechanisms and driving innovation across various sectors. Existing deep learning methods tend to rely solely on either sequence data or structural data and predict the EC number as a whole, neglecting the intrinsic hierarchical structure of EC numbers. To address these limitations, we introduce MAPred, a novel multi-modality and multi-scale model designed to autoregressively predict the EC number of proteins. MAPred integrates both the primary amino acid sequence and the 3D tokens...

10.48550/arxiv.2408.06391 preprint EN arXiv (Cornell University) 2024-08-11

In silico prediction of the ligand binding pose to a given protein target is a crucial but challenging task in drug discovery. This work focuses on blind flexible self-docking, where we aim to predict the positions, orientations, and conformations of docked molecules. Traditional physics-based methods usually suffer from inaccurate scoring functions and high inference cost. Recently, data-driven methods based on deep learning techniques are attracting growing interest thanks to their efficiency during inference and promising...

10.48550/arxiv.2210.06069 preprint EN cc-by arXiv (Cornell University) 2022-01-01

Deep learning-based methods for generating functional proteins address the growing need for novel biocatalysts, allowing for precise tailoring of functionalities to meet specific requirements. This emergence leads to the creation of highly efficient and specialized proteins with wide-ranging applications in scientific, technological, and biomedical domains. This study establishes a pipeline for protein sequence generation with a conditional diffusion model, namely CPDiffusion, to deliver diverse protein sequences with enhanced functions....

10.1101/2023.08.10.552783 preprint EN bioRxiv (Cold Spring Harbor Laboratory) 2023-08-14

The dynamic nature of proteins is crucial for determining their biological functions and properties, and Monte Carlo (MC) and molecular dynamics (MD) simulations stand as the predominant tools to study such phenomena. By utilizing empirically derived force fields, MC or MD simulations explore the conformational space by numerically evolving the system via Markov chain or Newtonian mechanics. However, the high-energy barriers of the force fields can hamper the exploration of both methods through rare events, resulting in inadequately sampled...

10.48550/arxiv.2306.03117 preprint EN other-oa arXiv (Cornell University) 2023-01-01

Deep learning has become a crucial tool in studying proteins. While the significance of modeling protein structure has been discussed extensively in the literature, amino acid types are typically included in the input as a default operation for many inference tasks. This study demonstrates with an alignment task that embedding amino acid types in some cases may not help a deep learning model learn a better representation. To this end, we propose ProtLOCA, a local geometry alignment method based solely on structure. The effectiveness of ProtLOCA is examined by a global...

10.48550/arxiv.2406.19755 preprint EN arXiv (Cornell University) 2024-06-28

Plastic waste, particularly polyethylene terephthalate (PET), presents significant environmental challenges, prompting extensive research into enzymatic biodegradation. Existing PET hydrolases are limited to a narrow sequence space and demonstrate insufficient performance for biodegradation. This study introduces a novel discovery pipeline that combines protein language models (PLMs) with a structural representation tree to identify enzymes based on structural similarity. Using the crystal structure of IsPETase as...

10.1101/2024.11.13.623508 preprint EN bioRxiv (Cold Spring Harbor Laboratory) 2024-11-15

Plastic waste, particularly polyethylene terephthalate (PET), poses significant environmental challenges, prompting extensive research into enzymatic biodegradation. However, existing PET hydrolases (PETases) are constrained to a narrow sequence space and exhibit limited performance for effective biodegradation. This study introduces a protein discovery pipeline, ProMine, which integrates protein language models (PLMs) with a representation tree to identify PETases based on structural similarity...

10.21203/rs.3.rs-5492523/v1 preprint EN cc-by Research Square (Research Square) 2024-12-16