NFDI4DS | UHH-SEMS - Publication Details

Protein pKa Prediction by Tree-Based Machine Learning

Machine Learning; 0301 basic medicine; Kinetics; 03 medical and health sciences; Humans; Proteins; Algorithms

DOI: 10.1021/acs.jctc.1c01257
Publication Date: 2022-03-15T14:12:56Z

AUTHORS (4)

Ada Y. Chen

Juyong Lee

Ana Damjanovic

Bernard R. Brooks

ABSTRACT

We present four tree-based machine learning models for protein pKa prediction. The four models, Random Forest, Extra Trees, eXtreme Gradient Boosting (XGBoost), and Light Gradient Boosting Machine (LightGBM), were trained on three experimental PDB and pKa datasets, two of which included a notable portion of internal residues. The four machine learning algorithms performed similarly. The best model trained on the largest dataset performs 37% better than the widely used empirical pKa prediction tool PROPKA. The overall RMSE for this model is 0.69 when considering six residue types (Asp, Glu, His, Lys, Cys, and Tyr), with RMSE values of 0.56 for surface residues and 0.78 for buried residues, and 0.63 when considering Asp, Glu, His, and Lys only. We provide pKa predictions for proteins in the human proteome from the AlphaFold Protein Structure Database and observe that 1% of Asp/Glu/Lys residues have highly shifted pKa values close to the physiological pH.
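
The abstract reports an overall RMSE alongside separate surface and buried RMSE values, and a percentage improvement over PROPKA. The following is a minimal sketch of how such figures can be computed, not the authors' evaluation code. The function names (`rmse`, `rmse_by_group`, `relative_improvement`) and the sample records are hypothetical and invented for illustration. The sketch also assumes that "X% better" means a relative reduction in RMSE, which the abstract does not spell out.

```python
import math


def rmse(predicted, experimental):
    """Root-mean-square error between two equal-length sequences."""
    if len(predicted) != len(experimental) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - e) ** 2 for p, e in zip(predicted, experimental)]
    return math.sqrt(sum(squared) / len(squared))


def rmse_by_group(records):
    """Return the RMSE per group plus the RMSE over all records.

    Each record is a tuple: (group, predicted_pKa, experimental_pKa).
    The group label (e.g. "surface" or "buried") is an illustrative
    assumption about how residues might be tagged.
    """
    groups = {}
    for group, pred, expt in records:
        preds, expts = groups.setdefault(group, ([], []))
        preds.append(pred)
        expts.append(expt)
    result = {g: rmse(p, e) for g, (p, e) in groups.items()}
    result["overall"] = rmse(
        [r[1] for r in records], [r[2] for r in records]
    )
    return result


def relative_improvement(model_rmse, baseline_rmse):
    """Fractional RMSE reduction of a model relative to a baseline.

    Assumption: this is one plausible reading of "performs X% better".
    """
    return (baseline_rmse - model_rmse) / baseline_rmse


# Made-up pKa values, for illustration only (not data from the paper).
records = [
    ("surface", 4.0, 3.5),
    ("surface", 6.5, 6.0),
    ("buried", 7.0, 8.0),
    ("buried", 10.0, 9.0),
]
scores = rmse_by_group(records)
for name, value in scores.items():
    print(f"{name}: {value:.2f}")
```

Because RMSE averages squared errors, the overall value is not the plain mean of the group values: it is weighted by group size and pulled toward the group with larger errors, which is why reporting surface and buried residues separately is informative.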

REFERENCES (133)

CITATIONS (23)
