Alignments anchored on genomic landmarks can aid in the identification of regulatory elements

0301 basic medicine; Models, Statistical; Base Sequence; Nucleotides; Amino Acid Motifs; Molecular Sequence Data; Computational Biology; DNA; Genomics; Regulatory Sequences, Nucleic Acid; 03 medical and health sciences; Cluster Analysis; Humans; Transcription Initiation Site; Databases, Protein; Promoter Regions, Genetic; Sequence Alignment; Software
DOI: 10.1093/bioinformatics/bti1028 Publication Date: 2005-06-16T13:08:03Z
ABSTRACT
The transcription start site (TSS) has been located for an increasing number of genes across several organisms. Statistical tests have shown that some cis-acting regulatory elements have positional preferences with respect to the TSS, but few strategies have emerged for locating elements by their positional preferences. This paper elaborates such a strategy. First, we align promoter regions without gaps, anchoring the alignment on each promoter's TSS. Second, we apply a novel word-specific mask. Third, we apply a clustering test related to gapless BLAST statistics. The test examines whether any specific word is placed unusually consistently with respect to the TSS. Finally, our program A-GLAM, an extension of the GLAM program, uses significant word positions as new 'anchors' to realign the sequences. A Gibbs sampling algorithm then locates putative cis-acting regulatory elements. Usually, Gibbs sampling requires a preliminary masking step, to avoid convergence onto a dominant but uninteresting signal from a DNA repeat. However, since the positional anchors focus A-GLAM on the motif of interest, masking DNA repeats during Gibbs sampling becomes unnecessary.

In a set of human DNA sequences with experimentally characterized TSSs, the placement of 791 octonucleotide words was unusually consistent (multiple-test corrected P < 0.05). Alignments anchored on these words sometimes located statistically significant motifs inaccessible to GLAM or AlignACE.

The A-GLAM program and a list of statistically significant words are available at ftp://ftp.ncbi.nih.gov/pub/spouge/papers/archive/AGLAM/.
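The first step of the strategy, a gapless alignment anchored on each promoter's TSS followed by a tally of where each word falls, can be illustrated with a minimal Python sketch. This is not the A-GLAM code and does not implement the word-specific mask or the BLAST-related clustering test; it only counts how often a word starts at the same offset from the TSS. The function names `word_offsets` and `most_consistent`, the `min_count` threshold, and the toy sequences are all hypothetical, and the example uses 4-letter words rather than octonucleotides to keep it short.

```python
from collections import Counter, defaultdict


def word_offsets(promoters, k=8):
    """Map each k-letter word to a Counter of its start offsets from the TSS.

    `promoters` is a list of (sequence, tss_index) pairs. Because every
    sequence is indexed relative to its own TSS (offset 0), the sequences
    are implicitly aligned without gaps.
    """
    offsets = defaultdict(Counter)
    for seq, tss in promoters:
        for i in range(len(seq) - k + 1):
            offsets[seq[i:i + k]][i - tss] += 1
    return offsets


def most_consistent(offsets, min_count=2):
    """List (word, offset, count) for words that recur at one offset.

    A raw count is only a crude stand-in for a significance test
    (assumption: the paper's statistic is more elaborate).
    """
    hits = []
    for word, counter in offsets.items():
        offset, count = counter.most_common(1)[0]
        if count >= min_count:
            hits.append((word, offset, count))
    # Highest count first; ties broken alphabetically by word.
    return sorted(hits, key=lambda h: (-h[2], h[0]))


# Toy promoters (made up): each pair is (sequence, index of the TSS).
toy_promoters = [
    ("GGTATAAACC", 8),
    ("CTATAAGGCA", 7),
    ("AATATATTGC", 8),
]
print(most_consistent(word_offsets(toy_promoters, k=4)))
# → [('TATA', -6, 3), ('ATAA', -5, 2)]
```

In this toy set, the word TATA starts 6 bases upstream of the TSS in all three sequences, so its offset would be a natural candidate 'anchor' for realignment in the sense described above.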