Data from Misannotated Multi-Nucleotide Variants in Public Cancer Genomics Datasets Lead to Inaccurate Mutation Calls with Significant Implications
DOI:
10.1158/0008-5472.c.6512701.v1
Publication Date:
2023-03-31T21:31:27Z
AUTHORS (11)
ABSTRACT
<div>Abstract<p>Although next-generation sequencing (NGS) is widely used in cancer to profile tumors and detect variants, most somatic variant callers used in these pipelines identify variants at the lowest possible granularity, single-nucleotide variants (SNV). As a result, multiple adjacent SNVs are called individually instead of as multi-nucleotide variants (MNV). With this approach, the amino acid change from an individual SNV within a codon could be different from the amino acid change based on the MNV that results from combining the SNVs, leading to incorrect conclusions about the downstream effects of the variants. Here, we analyzed 10,383 variant call files (VCF) from The Cancer Genome Atlas (TCGA) and found 12,141 incorrectly annotated MNVs. Analysis of seven commonly mutated genes from 178 studies in cBioPortal revealed that MNVs were consistently missed in 20 studies, whereas they were correctly annotated in 15 more recent studies. At the <i>BRAF</i> V600 locus, a common example of an MNV, several public datasets reported separate V600E and V600M variants instead of a single merged V600K variant. VCFs from the TCGA Mutect2 caller were used to develop a solution to merge SNVs into MNVs. Our custom script used the phasing information from the SNV VCF and determined whether SNVs were at the same codon and needed to be merged into an MNV before variant annotation. This study shows that institutions performing NGS for cancer genomics should incorporate the step of merging MNVs as a best practice in their analysis pipelines.</p>Significance:<p>Identification of incorrect mutation calls in TCGA, including at the clinically relevant <i>BRAF</i> V600 and <i>KRAS</i> G12 loci, will influence research and potentially clinical decisions.</p></div>
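The merging step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' script. It assumes each SNV has already been mapped to a 1-based coding-sequence position and carries a phase-set label, and the names `Snv`, `cds_pos`, `phase_set`, `merge_phased_snvs`, and `protein_change` are hypothetical, introduced only for this illustration. SNVs that share a codon and a phase set are grouped and annotated together, which reproduces the BRAF V600 example: two separate calls read as V600M and V600E, while the merged call reads as V600K.

```python
from dataclasses import dataclass
from itertools import groupby, product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


@dataclass(frozen=True)
class Snv:
    cds_pos: int    # 1-based coding-sequence position (assumed pre-computed)
    ref: str        # reference base on the coding strand
    alt: str        # alternate base on the coding strand
    phase_set: str  # stand-in for the caller's phasing tag (assumption)


def codon_number(cds_pos: int) -> int:
    """Return the 1-based codon number containing a coding position."""
    return (cds_pos - 1) // 3 + 1


def merge_phased_snvs(snvs):
    """Group SNVs that fall in the same codon AND share a phase set.

    Returns a list of (codon_number, [Snv, ...]) tuples; a group with
    more than one SNV is a candidate MNV.
    """
    def key(s):
        return (codon_number(s.cds_pos), s.phase_set)

    ordered = sorted(snvs, key=lambda s: (key(s), s.cds_pos))
    return [(k[0], list(group)) for k, group in groupby(ordered, key=key)]


def protein_change(ref_codon: str, codon_no: int, snvs) -> str:
    """Apply one or more SNVs to a reference codon and name the change."""
    bases = list(ref_codon)
    codon_start = (codon_no - 1) * 3 + 1
    for s in snvs:
        offset = s.cds_pos - codon_start
        if bases[offset] != s.ref:
            raise ValueError(f"reference mismatch at coding position {s.cds_pos}")
        bases[offset] = s.alt
    alt_codon = "".join(bases)
    return f"{CODON_TABLE[ref_codon]}{codon_no}{CODON_TABLE[alt_codon]}"


# BRAF codon 600 is GTG (valine); both SNVs are on the same phase set.
calls = [Snv(1798, "G", "A", "ps1"), Snv(1799, "T", "A", "ps1")]

# Annotated one at a time, the calls look like two unrelated variants.
print([protein_change("GTG", 600, [s]) for s in calls])  # ['V600M', 'V600E']

# Merged first, the same calls yield a single, different amino acid change.
for codon_no, group in merge_phased_snvs(calls):
    print(protein_change("GTG", codon_no, group))  # V600K
```

If the two SNVs carried different phase-set labels, `merge_phased_snvs` would return them as separate groups, and annotating them individually would be the appropriate outcome. A real pipeline would read positions and phasing from the VCF itself and handle strand and transcript mapping, which this sketch leaves out.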