Identification of known and novel recurrent viral sequences in data from multiple patients and multiple cancers

DOI: 10.7490/f1000research.1113095.1 Publication Date: 2016-09-08
ABSTRACT
Virus discovery from high-throughput sequencing data often follows a bottom-up approach in which taxonomic annotation takes place prior to association with disease. Albeit effective in some cases, the approach fails to detect novel pathogens and remote variants not present in reference databases. We have developed a species-independent pipeline that utilises sequence clustering for the identification of nucleotide sequences that co-occur across multiple instances. We applied the workflow to 686 libraries from 252 cancer samples of different tissue types, 32 non-template controls, and 24 test samples. Recurrent sequences were statistically associated with biological, methodological or technical features, with the aim of identifying plausible contaminants that may associate with a particular kit or method. We provide examples of identified inhabitants of the healthy flora as well as experimental contaminants. Unmapped sequences that co-occur with statistical significance potentially represent unknown sequence space in which novel pathogens can be identified.
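The association step described above can be illustrated with a minimal sketch. The abstract does not state which statistical test the pipeline uses; the one-sided Fisher's exact test below is an assumption chosen for illustration, and the names `fisher_one_sided`, `cluster_feature_association`, `presence`, and `feature` are hypothetical. It takes, for each library, the set of sequence clusters detected in it and a yes/no feature (for example, whether a given kit was used), and asks whether a cluster occurs more often in libraries that have the feature.

```python
from math import comb


def fisher_one_sided(a, b, c, d):
    """Return P(X >= a) under the hypergeometric null for the 2x2 table
    [[a, b], [c, d]] (rows: cluster present/absent, columns: feature yes/no)."""
    present = a + b          # libraries containing the cluster
    with_feature = a + c     # libraries having the feature
    n = a + b + c + d        # all libraries
    upper = min(present, with_feature)
    total = comb(n, with_feature)
    tail = sum(
        comb(present, k) * comb(n - present, with_feature - k)
        for k in range(a, upper + 1)
    )
    return tail / total


def cluster_feature_association(presence, feature):
    """presence: dict mapping library id -> set of cluster ids found in it.
    feature: dict mapping library id -> bool (hypothetical, e.g. 'kit A used').
    Returns a dict mapping cluster id -> one-sided p-value."""
    clusters = set().union(*presence.values())
    p_values = {}
    for cluster in sorted(clusters):
        a = sum(1 for lib in presence if cluster in presence[lib] and feature[lib])
        b = sum(1 for lib in presence if cluster in presence[lib] and not feature[lib])
        c = sum(1 for lib in presence if cluster not in presence[lib] and feature[lib])
        d = sum(1 for lib in presence if cluster not in presence[lib] and not feature[lib])
        p_values[cluster] = fisher_one_sided(a, b, c, d)
    return p_values
```

A cluster that appears only in libraries prepared with one kit would receive a small p-value and could be flagged as a candidate kit contaminant. Because every cluster is tested, the resulting p-values would need a multiple-testing correction before interpretation.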