Statistical compression of protein sequences and inference of marginal probability landscapes over competing alignments using finite state models and Dirichlet priors

Sequence (biology)
DOI: 10.1093/bioinformatics/btz368
Publication Date: 2019-05-10T11:32:12Z
ABSTRACT
The information criterion of minimum message length (MML) provides a powerful statistical framework for inductive reasoning from observed data. We apply MML to the problem of protein sequence comparison using finite state models with Dirichlet distributions. The resulting framework allows us to supersede the ad hoc cost functions commonly used in the field, by systematically addressing the arbitrariness of alignment parameters and the disconnect between substitution scores and gap costs. Furthermore, our framework enables the generation of marginal probability landscapes over all possible alignment hypotheses, with the potential to help users simultaneously rationalize and assess competing alignment relationships between protein sequences, beyond simply reporting a single (best) alignment. We demonstrate the performance of our program on benchmarks containing distantly related protein sequences. Availability and implementation: The open-source program supporting this work is available from: http://lcb.infotech.monash.edu.au/seqmmligner. Supplementary data are available at Bioinformatics online.
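To make the idea of a marginal probability landscape concrete, the following is a minimal sketch and not the authors' program. It assumes a toy one-state alignment machine with fixed, hand-picked probabilities (`p_match`, `p_gap`, `p_same` are illustrative values, whereas the paper infers its parameters under MML with Dirichlet priors). It sums over all alignments with forward and backward recursions and reports, for each residue pair, the probability that the pair is aligned. The function names `pair_prob` and `match_marginals` are hypothetical.

```python
def pair_prob(a, b, k=20, p_same=0.5):
    """Toy joint probability of emitting the aligned pair (a, b).

    Assumption: identical pairs share p_same, all other pairs share the rest,
    so the probabilities over all k*k pairs sum to 1.
    """
    return p_same / k if a == b else (1.0 - p_same) / (k * (k - 1))


def match_marginals(x, y, p_match=0.8, p_gap=0.1, k=20):
    """Return post[i][j] = P(x[i] aligned to y[j] | x, y), summed over all alignments.

    Assumed model: one state with three operations, match (p_match),
    insert (p_gap) and delete (p_gap), with p_match + 2 * p_gap = 1.
    """
    n, m = len(x), len(y)
    gap = p_gap / k  # an insert or delete emits one residue uniformly

    # fwd[i][j]: total probability of all alignments of x[:i] with y[:j]
    fwd = [[0.0] * (m + 1) for _ in range(n + 1)]
    fwd[0][0] = 1.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i and j:
                fwd[i][j] += fwd[i - 1][j - 1] * p_match * pair_prob(x[i - 1], y[j - 1], k)
            if i:
                fwd[i][j] += fwd[i - 1][j] * gap
            if j:
                fwd[i][j] += fwd[i][j - 1] * gap

    # bwd[i][j]: total probability of all alignments of x[i:] with y[j:]
    bwd = [[0.0] * (m + 1) for _ in range(n + 1)]
    bwd[n][m] = 1.0
    for i in range(n, -1, -1):
        for j in range(m, -1, -1):
            if i < n and j < m:
                bwd[i][j] += p_match * pair_prob(x[i], y[j], k) * bwd[i + 1][j + 1]
            if i < n:
                bwd[i][j] += gap * bwd[i + 1][j]
            if j < m:
                bwd[i][j] += gap * bwd[i][j + 1]

    total = fwd[n][m]  # probability of the two sequences under all alignments
    return [
        [fwd[i][j] * p_match * pair_prob(x[i], y[j], k) * bwd[i + 1][j + 1] / total
         for j in range(m)]
        for i in range(n)
    ]


if __name__ == "__main__":
    # Print the landscape as a small matrix: rows are residues of the first sequence.
    for row in match_marginals("HEAGAWGHEE", "PAWHEAE"):
        print(" ".join(f"{p:.2f}" for p in row))
```

Read as a matrix, the output gives one row per residue of the first sequence. Several moderately probable cells in a row indicate competing alignments for that residue, which a single best alignment would hide.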