GlycoMine: a machine learning-based approach for predicting N-, C- and O-linked glycosylation in the human proteome

DOI: 10.1093/bioinformatics/btu852
Publication date: 2015-01-08
ABSTRACT
Glycosylation is a ubiquitous type of protein post-translational modification (PTM) in eukaryotic cells, which plays vital roles in various biological processes (BPs) such as cellular communication, ligand recognition and subcellular recognition. It is estimated that >50% of the entire human proteome is glycosylated. However, identifying glycosylation sites remains a significant challenge that requires expensive and laborious experimental research. Thus, bioinformatics approaches that can predict glycan occupancy at specific sequons in protein sequences would be useful for understanding and utilizing this important PTM.

In this study, we present a novel tool called GlycoMine for the comprehensive and systematic in silico identification of C-linked, N-linked and O-linked glycosylation sites in the human proteome. GlycoMine was developed using the random forest algorithm and evaluated on a well-prepared, up-to-date benchmark dataset that encompasses all three types of glycosylation sites and was curated from multiple public resources. Heterogeneous sequence and functional features were derived from various sources and subjected to a further two-step feature selection to characterize a condensed subset of optimal features that contributed most to the type-specific prediction of glycosylation sites. Five-fold cross-validation and independent tests show that this approach significantly improved prediction performance compared with four existing tools: NetNGlyc, NetOGlyc, EnsembleGly and GPP. We demonstrated that GlycoMine could identify candidate glycosylation sites in case study proteins, and applied it to identify many high-confidence glycosylation target proteins by screening the entire human proteome.

The webserver, Java Applet, user instructions, datasets and predicted glycosylation sites are freely available at http://www.structbioinfor.org/Lab/GlycoMine/.

Contact: Jiangning.Song@monash.edu, James.Whisstock@monash.edu or zhangyang@nwsuaf.edu.cn

Supplementary data are available at Bioinformatics online.
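The abstract does not describe how GlycoMine enumerates candidate sites or splits data for cross-validation, so the following is only a hypothetical sketch of the general idea, not the authors' code. It assumes the standard consensus motifs (N-X-S/T with X not Pro for N-linked sequons, the W-x-x-W motif for C-mannosylation, and any Ser/Thr for O-linked sites, which lack a consensus sequon), a hypothetical 7-residue flank for sequence windows, and a simple interleaved five-fold split. The names candidate_sites, window and k_fold_indices are illustrative and are not part of the GlycoMine webserver or Java Applet.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical illustration, not GlycoMine's implementation.
# N-linked sequon: Asn, any residue except Pro, then Ser or Thr.
# The zero-width lookahead lets overlapping sequons (e.g. "NNST") all be reported.
N_SEQUON = re.compile(r"(?=N[^P][ST])")
# C-mannosylation motif: the first Trp of W-x-x-W is the acceptor residue.
C_MOTIF = re.compile(r"(?=W..W)")


def candidate_sites(seq: str) -> Dict[str, List[int]]:
    """Return 0-based positions of candidate acceptor residues per glycosylation type."""
    seq = seq.upper()
    return {
        "N": [m.start() for m in N_SEQUON.finditer(seq)],
        "C": [m.start() for m in C_MOTIF.finditer(seq)],
        # O-linked glycosylation has no consensus sequon: every Ser/Thr is a candidate.
        "O": [i for i, aa in enumerate(seq) if aa in "ST"],
    }


def window(seq: str, pos: int, flank: int = 7, pad: str = "-") -> str:
    """Window of 2*flank+1 residues centred on pos, padded past the sequence ends.

    flank=7 is an arbitrary, hypothetical default.
    """
    padded = pad * flank + seq + pad * flank
    # seq[pos] sits at padded[pos + flank], so the window starts at padded[pos].
    return padded[pos : pos + 2 * flank + 1]


def k_fold_indices(n: int, k: int = 5) -> List[Tuple[List[int], List[int]]]:
    """Assign sample i to fold i % k; return one (train, test) index pair per fold."""
    splits = []
    for fold in range(k):
        test = list(range(fold, n, k))
        train = [i for i in range(n) if i % k != fold]
        splits.append((train, test))
    return splits


if __name__ == "__main__":
    protein = "MNGTANPSAWKLWST"  # toy sequence, not a real protein
    sites = candidate_sites(protein)
    # N-P-S at position 5 is skipped because the middle residue is Pro.
    print(sites)  # prints {'N': [1], 'C': [9], 'O': [3, 7, 13, 14]}
    for pos in sites["N"]:
        print(pos, window(protein, pos, flank=3))  # prints "1 --MNGTA"
```

A matching sequon is necessary but not sufficient: many N-X-S/T sequons are not glycosylated in vivo, which is why a trained classifier is needed to predict occupancy. In a fuller pipeline, each window would be encoded as features and scored by a classifier such as a random forest, with k_fold_indices(n, 5) supplying a five-fold split; for imbalanced site data, a stratified split that preserves the positive/negative ratio in each fold would usually be preferable.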