6. GS Run Processor Appendices : 6.6 Phred-equivalent Base Quality Scores
Quality scores for individual called bases are determined by a method developed in collaboration with the Broad Institute (Genome Research, 18(5): 763-70, 2008). This method takes the methodology that Ewing and Green (Genome Research, 8: 186-194, 1998) described for creating quality scores as part of the Phred basecalling algorithm and applies it to generating quality scores for 454 Sequencing reads. The quality scores computed for each called base are written to the CWF and SFF files (and optionally to a file paralleling the basecall FASTA file). Briefly, the method compares the properties of each base’s flowgram signals against properties that have been found, using training sets of read data, to correlate with accurate and/or error-prone signal information. A multivariate analysis of those properties determines the sets of property values that best describe “bins” of basecalls, then assigns the training-set accuracy rate of the basecalls in each bin as a quality score, using the following scale:
Q = -10 × log10(error rate)
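As a minimal sketch of this scale, the following Python converts between an error rate and a Phred-equivalent quality score. The function names `phred_quality` and `error_rate_from_quality` are illustrative only and are not part of the GS Run Processor software.

```python
import math


def phred_quality(error_rate: float) -> float:
    """Return Q = -10 * log10(error rate) for an error rate in (0, 1]."""
    if not 0.0 < error_rate <= 1.0:
        raise ValueError("error_rate must be greater than 0 and at most 1")
    return -10.0 * math.log10(error_rate)


def error_rate_from_quality(q: float) -> float:
    """Invert the scale: error rate = 10 ** (-Q / 10)."""
    return 10.0 ** (-q / 10.0)


if __name__ == "__main__":
    # A bin whose training-set basecalls are wrong 1 time in 100
    # has an error rate of 0.01, which corresponds to Q20.
    for rate in (0.1, 0.01, 0.001):
        print(f"error rate {rate}: Q = {phred_quality(rate):.1f}")
```

On this scale, each increase of 10 in Q corresponds to a tenfold decrease in the error rate.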
The flowgram signal properties used in this analysis include:
Local variation of the flowgram signals in a window surrounding the flow where the base is called (i.e., how far those signals are from the ideal 0.0, 1.0, 2.0, 3.0, … signals). This provides an estimate of the homopolymer accuracy.
Overall “separation” of the flowgram signals (i.e., how widely the 0-mer signals are separated from the 1-mer signals).
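To make these two properties more concrete, here is a rough Python sketch of how they might be measured. The exact formulas used by the software are not given here, so everything in this sketch is an assumption: the function names, the window size, the use of mean absolute distance to the nearest integer, and the 0.5 and 1.5 thresholds for assigning signals to the 0-mer and 1-mer groups.

```python
def local_variation(signals, flow_index, half_window=2):
    """Mean absolute distance of the windowed signals from the nearest
    ideal signal (0.0, 1.0, 2.0, ...). Assumed metric, for illustration."""
    if not signals:
        raise ValueError("signals must not be empty")
    lo = max(0, flow_index - half_window)
    hi = min(len(signals), flow_index + half_window + 1)
    window = signals[lo:hi]
    # The nearest ideal signal is the nearest non-negative integer.
    return sum(abs(s - max(0, round(s))) for s in window) / len(window)


def zero_one_separation(signals):
    """Gap between the lowest 1-mer signal and the highest 0-mer signal.
    The 0.5 and 1.5 thresholds are assumptions. Returns None if either
    group is empty."""
    zero_mers = [s for s in signals if s < 0.5]
    one_mers = [s for s in signals if 0.5 <= s < 1.5]
    if not zero_mers or not one_mers:
        return None
    return min(one_mers) - max(zero_mers)


if __name__ == "__main__":
    flowgram = [0.05, 1.10, 0.02, 2.05, 0.95]  # made-up example signals
    print(local_variation(flowgram, flow_index=2, half_window=1))
    print(zero_one_separation(flowgram))
```

Under these assumptions, a smaller local variation and a larger separation would both point toward a more reliable basecall.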