Mullins Molecular Retrovirology Lab

  • Department of Microbiology
  • School of Medicine
  • University of Washington

Citation Information

Iyer S, Bouzek H, Deng W, Larsen B, Casey E, Mullins JI (2013). Quality score based identification and correction of pyrosequencing errors. PloS one, 8(9), e73015. (pubmed) (doi)

Abstract

Massively-parallel DNA sequencing using the 454/pyrosequencing platform allows in-depth probing of diverse sequence populations, such as within an HIV-1 infected individual. Analysis of this sequence data, however, remains challenging due to the shorter read lengths relative to that obtained by Sanger sequencing as well as errors introduced during DNA template amplification and during pyrosequencing. The ability to distinguish real variation from pyrosequencing errors with high sensitivity and specificity is crucial to interpreting sequence data. We introduce a new algorithm, CorQ (Correction through Quality), which utilizes the inherent base quality in a sequence-specific context to correct for homopolymer and non-homopolymer insertion and deletion (indel) errors. CorQ also takes uneven read mapping into account for correcting pyrosequencing miscall errors and it identifies and corrects carry forward errors. We tested the ability of CorQ to correctly call SNPs on a set of pyrosequences derived from ten viral genomes from an HIV-1 infected individual, as well as on six simulated pyrosequencing datasets generated using non-zero error rates to emulate errors introduced by PCR. When combined with the AmpliconNoise error correction method developed to remove ambiguities in signal intensities, we attained a 97% reduction in indel errors, a 98% reduction in carry forward errors, and >97% specificity of SNP detection. When compared to four other error correction methods, AmpliconNoise+CorQ performed at equal or higher SNP identification specificity, but the sensitivity of SNP detection was consistently higher (>98%) than other methods tested. This combined procedure will therefore permit examination of complex genetic populations with improved accuracy.

Supplemental Data

supplementary_SI_454_methods_v3.doc