From Best Hits to Best Matches
Best practice
DOI: 10.48550/arxiv.2001.00958
Publication Date: 2020-01-01
AUTHORS (8)
ABSTRACT
Many of the commonly used methods for orthology detection start from mutually most similar pairs of genes (reciprocal best hits) as an approximation for evolutionarily most closely related pairs of genes (reciprocal best matches). This approximation of best matches by best hits becomes exact for ultrametric dissimilarities, i.e., under the Molecular Clock Hypothesis. It fails, however, whenever there are large lineage-specific rate variations among paralogous genes. In practice, this introduces a high level of noise into the input data for best-hit-based orthology detection methods. If additive distances between genes are known, then evolutionarily most closely related pairs can be identified by considering certain quartets of genes, provided that in each quartet the outgroup relative to the remaining three genes is known. A priori knowledge of the underlying species phylogeny greatly facilitates the identification of the required outgroup. Although the workflow remains a heuristic, since the correct outgroup cannot be determined reliably in all cases, simulations with lineage-specific biases and rate asymmetries show that nearly perfect results can be achieved. In a realistic setting, where distances have to be estimated from sequence data and hence are noisy, it is still possible to obtain highly accurate sets of best matches. Improvements of tree-free orthology assessment methods can be expected from a combination of the accurate inference of best matches reported here and recent mathematical advances in the understanding of (reciprocal) best match graphs and orthology relations.
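The contrast the abstract draws between best hits and best matches can be illustrated with a small sketch. Everything below is an assumption made for illustration and is not taken from the paper: the function names (`best_hits`, `reciprocal_best_hits`, `closer_by_quartet`), the toy genes `a1`, `b1`, `b2` with outgroup `z`, and the branch lengths. The toy gene tree is ((a1, b1), b2) with z attached at the root, so the true best match of a1 in species B is b1. The quartet test is a plain four-point-condition comparison, offered as one plausible reading of "considering certain quartets", not as the authors' workflow.

```python
from itertools import product


def make_d(entries):
    """Store symmetric dissimilarities under unordered gene pairs."""
    return {frozenset(pair): value for pair, value in entries.items()}


def best_hits(d, species, x, target):
    """Genes of species `target` with minimal dissimilarity to gene x."""
    candidates = [y for y in species if species[y] == target and y != x]
    if not candidates:
        return set()
    smallest = min(d[frozenset((x, y))] for y in candidates)
    return {y for y in candidates if d[frozenset((x, y))] == smallest}


def reciprocal_best_hits(d, species):
    """Pairs (x, y) from different species that are mutual best hits."""
    pairs = set()
    for x, y in product(species, repeat=2):
        if species[x] < species[y]:  # visit each species pair once
            if (y in best_hits(d, species, x, species[y])
                    and x in best_hits(d, species, y, species[x])):
                pairs.add((x, y))
    return pairs


def closer_by_quartet(d, x, y1, y2, z):
    """Closest relatives of x among y1, y2, assuming d is additive and
    z is an outgroup to the other three genes (four-point condition).
    Exact comparisons are used; noisy data would need a tolerance."""
    def dist(u, v):
        return d[frozenset((u, v))]

    s1 = dist(x, y1) + dist(y2, z)  # supports split x,y1 | y2,z
    s2 = dist(x, y2) + dist(y1, z)  # supports split x,y2 | y1,z
    s3 = dist(x, z) + dist(y1, y2)  # supports split x,z  | y1,y2
    if s1 < min(s2, s3):
        return {y1}
    if s2 < min(s1, s3):
        return {y2}
    return {y1, y2}  # y1 and y2 are equally related to x (or unresolved)


# Hypothetical species assignment; the outgroup z is deliberately left out.
species = {"a1": "A", "b1": "B", "b2": "B"}

# Clock-like (ultrametric) branch lengths: a1=1, b1=1, inner edge=1, b2=2.
clock = make_d({("a1", "b1"): 2, ("a1", "b2"): 4, ("b1", "b2"): 4})

# Same tree, but b1 evolves fast (branch length 5); z hangs 3 above the root.
skewed = make_d({
    ("a1", "b1"): 6, ("a1", "b2"): 4, ("b1", "b2"): 8,
    ("a1", "z"): 5, ("b1", "z"): 9, ("b2", "z"): 5,
})

print(reciprocal_best_hits(clock, species))    # {('a1', 'b1')}
print(reciprocal_best_hits(skewed, species))   # {('a1', 'b2')}
print(closer_by_quartet(skewed, "a1", "b1", "b2", "z"))  # {'b1'}
```

In the clock-like case the reciprocal best hit coincides with the best match (a1, b1). With the accelerated b1 lineage the best hit switches to b2, while the quartet comparison against the outgroup still recovers b1 from the same additive distances.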