A Universal Probe Set for Targeted Sequencing of 353 Nuclear Genes from Any Flowering Plant Designed Using k-medoids Clustering

Phylogenomics Sequence (biology) Lineage (genetic)
DOI: 10.1101/361618 Publication Date: 2018-07-04T14:45:26Z
ABSTRACT
Sequencing of target-enriched libraries is an efficient and cost-effective method for obtaining DNA sequence data from hundreds of nuclear loci for phylogeny reconstruction. Much of the cost associated with developing targeted sequencing approaches is the preliminary data needed for identifying orthologous loci for probe design. In plants, identifying orthologous loci has proven difficult due to a large number of whole-genome duplication events, especially in the angiosperms (flowering plants). We used multiple sequence alignments from over 600 angiosperms for 353 putatively single-copy protein-coding genes to design a set of targeted sequencing probes for phylogenetic studies of any angiosperm lineage. To maximize the phylogenetic potential of the probes while minimizing the cost of production, we introduce a k-medoids clustering approach to identify the minimum number of sequences necessary to represent each coding sequence in the final probe set. Using this method, five to 15 representative sequences were selected per orthologous locus, representing the sequence diversity of angiosperms more efficiently than if probes were designed using available sequenced genomes alone. To test our approximately 80,000 probes, we hybridized libraries from 42 species spanning all higher-order lineages of angiosperms, with a focus on taxa not present in the sequence alignments used to design the probes. Out of a possible 353 coding sequences, we recovered an average of 283 per species and at least 100 in all species. Differences among taxa in sequence recovery could not be explained by relatedness to the representative taxa selected for probe design, suggesting that there is no phylogenetic bias in the probe set. Our probe set, which targeted 260 kbp of coding sequence, achieved a median recovery of 137 kbp per taxon in coding regions, a maximum recovery of 250 kbp, and an additional median of 212 kbp per taxon in flanking non-coding regions across all species. These results suggest that the Angiosperms353 probe set described here is effective for any group of flowering plants and would be useful for phylogenetic studies from the species level to higher-order lineages, including all angiosperms.
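The idea of choosing a minimum set of representative sequences per locus with k-medoids can be illustrated with a small sketch. The abstract does not specify the distance measure, the divergence threshold, or the clustering implementation, so everything below is an assumption made for illustration: the function names (`p_distance`, `k_medoids`, `minimal_representatives`), the use of uncorrected p-distance on aligned sequences, the simple alternating assign/update heuristic, and the 30% threshold in the usage example are all hypothetical and are not the authors' pipeline.

```python
import random


def p_distance(a, b):
    """Fraction of differing sites between two aligned sequences, skipping gap columns."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 1.0  # no comparable sites: treat as maximally distant (assumption)
    return sum(x != y for x, y in pairs) / len(pairs)


def k_medoids(dist, k, seed=0, max_iter=100):
    """Alternating assign/update k-medoids on a precomputed distance matrix.

    Returns the sorted indices of the k medoids. This is a simple heuristic,
    not a full PAM swap search.
    """
    n = len(dist)
    medoids = random.Random(seed).sample(range(n), k)
    for _ in range(max_iter):
        # Assignment step: each item joins its nearest medoid;
        # a medoid always stays in its own cluster.
        clusters = {m: [] for m in medoids}
        for i in range(n):
            nearest = i if i in clusters else min(medoids, key=lambda m: dist[i][m])
            clusters[nearest].append(i)
        # Update step: the new medoid is the member with the smallest
        # total distance to the other members of its cluster.
        new_medoids = [
            min(members, key=lambda c: sum(dist[c][j] for j in members))
            for members in clusters.values()
        ]
        if set(new_medoids) == set(medoids):
            break
        medoids = new_medoids
    return sorted(medoids)


def minimal_representatives(seqs, max_dist, k_max=15):
    """Smallest k (up to k_max) whose medoids lie within max_dist of every sequence."""
    dist = [[p_distance(a, b) for b in seqs] for a in seqs]
    medoids = []
    for k in range(1, min(k_max, len(seqs)) + 1):
        medoids = k_medoids(dist, k)
        covered = all(min(dist[i][m] for m in medoids) <= max_dist for i in range(len(seqs)))
        if covered:
            break
    return medoids  # if never fully covered, this is the k_max solution


# Toy alignment for one locus: two divergent groups of near-identical sequences.
toy_alignment = ["AAAAAAAAAA", "AAAAAAAAAT", "CCCCCCCCCC", "CCCCCCCCCG"]
representatives = minimal_representatives(toy_alignment, max_dist=0.3)
print([toy_alignment[i] for i in representatives])
```

In this toy case a single medoid cannot cover both groups within the assumed threshold, so the search stops at two representatives, one per group. Raising `max_dist` trades fewer probes for greater divergence between a target and its nearest representative.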