Improving GO semantic similarity measures by exploring the ontology beneath the terms and modelling uncertainty

0301 basic medicine; 03 medical and health sciences; Vocabulary, Controlled; Multiprotein Complexes; Terminology as Topic; Uncertainty; Proteins; Molecular Sequence Annotation; Saccharomyces cerevisiae; Algorithms; Semantics

DOI: 10.1093/bioinformatics/bts129

Publication Date: 2012-04-21T01:24:27Z

AUTHORS (3)

Haixuan Yang

Tamás Nepusz

Alberto Paccanaro

ABSTRACT

Motivation: Several measures have been recently proposed for quantifying the functional similarity between gene products according to well-structured controlled vocabularies where biological terms are organized in a tree or in a directed acyclic graph (DAG) structure. However, existing semantic similarity measures ignore two important facts. First, when calculating the similarity between two terms, they disregard the descendants of these terms. While this makes no difference when the ontology is a tree, we shall show that it has important consequences when the ontology is a DAG, as is the case, for example, with the Gene Ontology (GO). Second, existing similarity measures do not model the inherent uncertainty which comes from the fact that our current knowledge of the gene annotation and of the ontology structure is incomplete. Here, we propose a novel approach based on downward random walks that can be used to improve any of the existing similarity measures so that it exhibits these two properties. The approach is computationally efficient: random walks do not need to be simulated, as we provide formulas to calculate their stationary distributions.

Results: To show that our approach can potentially improve any semantic similarity measure, we test it on six different semantic similarity measures: three commonly used measures by Resnik (1999), Lin (1998), and Jiang and Conrath (1997); and three recently proposed measures: simUI, simGIC by Pesquita et al. (2008); GraSM by Couto et al. (2007); and Couto and Silva (2011). We applied these improved measures to the GO annotations of the yeast Saccharomyces cerevisiae, and tested how they correlate with sequence similarity, mRNA co-expression and protein–protein interaction data. Our results consistently show that the use of downward random walks leads to more reliable similarity measures.

Availability: We have developed a suite of tools that implement existing semantic similarity measures and our improved measures based on random walks. The tools are implemented in Matlab and are freely available from: http://www.paccanarolab.org/papers/GOsim/

Contact: alberto@cs.rhul.ac.uk

Supplementary information: Supplementary data are available at Bioinformatics online.
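For readers unfamiliar with the baseline measures named in the abstract, the following is a minimal Python sketch of the standard textbook forms of the Resnik, Lin, and Jiang and Conrath measures on a toy DAG. It is not the authors' Matlab code and does not implement the downward random-walk extension, whose formulas are not given in the abstract. The term names, the `PARENTS` and `DIRECT_COUNTS` tables, and all function names are hypothetical, and annotation counts stand in for sets of annotated gene products. The sketch consults only the ancestors of each term, which is the limitation the abstract points out.

```python
import math

# Hypothetical toy ontology: each term maps to its parents. "C" has two
# parents, so the structure is a DAG rather than a tree.
PARENTS = {
    "root": [],
    "A": ["root"],
    "B": ["root"],
    "C": ["A", "B"],
    "D": ["A"],
}

# Hypothetical number of annotations made directly to each term.
DIRECT_COUNTS = {"root": 0, "A": 1, "B": 2, "C": 3, "D": 2}


def ancestors(term):
    """Return the term together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content():
    """IC(t) = -log p(t), where p(t) is the share of annotations at t or below."""
    total = sum(DIRECT_COUNTS.values())
    cumulative = {t: 0 for t in PARENTS}
    for term, count in DIRECT_COUNTS.items():
        for anc in ancestors(term):
            cumulative[anc] += count
    # log(total / n) equals -log(n / total); assumes every term has n > 0.
    return {t: math.log(total / cumulative[t]) for t in PARENTS}


IC = information_content()


def resnik(a, b):
    """IC of the most informative common ancestor of a and b."""
    return max(IC[t] for t in ancestors(a) & ancestors(b))


def lin(a, b):
    """Resnik score normalised by the IC of the two terms."""
    denom = IC[a] + IC[b]
    # Assumption: two zero-IC terms (e.g. root with root) count as identical.
    return 2 * resnik(a, b) / denom if denom > 0 else 1.0


def jiang_conrath_distance(a, b):
    """Jiang and Conrath distance; 0 means the terms are indistinguishable."""
    return IC[a] + IC[b] - 2 * resnik(a, b)


if __name__ == "__main__":
    for a, b in [("C", "D"), ("B", "D"), ("C", "C")]:
        print(a, b, round(resnik(a, b), 3), round(lin(a, b), 3),
              round(jiang_conrath_distance(a, b), 3))
```

In this toy example, terms "B" and "D" share only the root as a common ancestor, so their Resnik similarity is zero regardless of what lies beneath them in the graph.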
