A generic motif discovery algorithm for sequential data

Models, Molecular; 0301 basic medicine; Sequence Homology, Amino Acid; Protein Conformation; Gene Expression Profiling; Amino Acid Motifs; Molecular Sequence Data; Computational Biology; Sequence Analysis, DNA; Protein Structure, Secondary; Pattern Recognition, Automated; 03 medical and health sciences; Cluster Analysis; Humans; Amino Acid Sequence; Sequence Alignment; Algorithms; Conserved Sequence; Software

DOI: 10.1093/bioinformatics/bti745
Publication Date: 2005-10-29T00:13:06Z

AUTHORS (4)

Kyle L. Jensen

Mark P. Styczynski

Isidore Rigoutsos

Gregory N. Stepha...

ABSTRACT

Motivation: Motif discovery in sequential data is a problem of great interest with many applications. However, previous methods have been unable to combine exhaustive search with complex motif representations, and each is typically applicable only to a certain class of problems.

Results: Here we present a generic motif discovery algorithm (Gemoda) for sequential data. Gemoda can be applied to any dataset with a sequential character, including both categorical and real-valued data. As we show, Gemoda deterministically discovers motifs that are maximal in composition and length. The algorithm also allows any choice of similarity metric for finding motifs. Finally, Gemoda's output motifs are representation-agnostic: they can be represented using regular expressions, position weight matrices or any number of other models for any type of sequential data. We demonstrate a number of applications of the algorithm, including the discovery of motifs in amino acid sequences, a new solution to the (l,d)-motif problem in DNA sequences and the discovery of conserved protein substructures.

Availability: Gemoda is freely available at

Contact: gregstep@mit.edu

Supplementary Information: Available at
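
For readers unfamiliar with the (l,d)-motif problem mentioned in the abstract, the sketch below illustrates the problem statement only: find every length-l string that occurs in each input sequence with at most d mismatches. It is a naive brute-force enumeration, not Gemoda's algorithm, and the function names (`hamming`, `occurs_within`, `ld_motifs`), the DNA alphabet default and the toy sequences are all assumptions made for illustration. The `distance` parameter is a loose nod to the abstract's point that the similarity metric can be swapped.

```python
from itertools import product


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def occurs_within(candidate, sequence, d, distance=hamming):
    """True if some window of `sequence` is within distance d of `candidate`."""
    l = len(candidate)
    return any(
        distance(candidate, sequence[i:i + l]) <= d
        for i in range(len(sequence) - l + 1)
    )


def ld_motifs(sequences, l, d, alphabet="ACGT", distance=hamming):
    """Brute-force (l,d)-motif search (illustrative, exponential in l).

    Enumerates every length-l string over `alphabet` and keeps those that
    occur in every sequence with at most d mismatches.
    """
    hits = []
    for letters in product(alphabet, repeat=l):
        candidate = "".join(letters)
        if all(occurs_within(candidate, s, d, distance) for s in sequences):
            hits.append(candidate)
    return hits


if __name__ == "__main__":
    # Hypothetical toy input: each sequence holds a variant of "ACGT".
    toy = ["TTACGTAA", "GGACCTCC", "CAACGAGG"]
    print(ld_motifs(toy, l=4, d=1))
```

Enumerating all candidates costs on the order of |alphabet|^l distance evaluations per window, so this is only practical for very short motifs; it is meant to make the problem definition concrete, not to be used on real data.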

REFERENCES (49)

CITATIONS (47)
