A green square appears on the Variants tab after completion of a computation that included at least one known (or auto-detected) Variant, as is the case here. We click on the Variants tab to observe the results of the analysis (Figure 2‑25). Under “Show values”, we select the ‘All three’ and ‘Show denominators’ settings to display the frequency and the number of reads of the Variant in the forward, reverse and combined orientations. This lets us confirm that the occurrence of the Variant is not orientation-dependent, which makes the observation more credible: support in both orientations helps eliminate false positives that may occur due to artifacts in alignment or sequencing.
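The orientation check described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic behind the ‘All three’ and ‘Show denominators’ display, not the software's own calculation: the `StrandCounts` type, the `max_ratio` threshold and the read counts are all hypothetical and do not come from Figure 2‑25.

```python
from dataclasses import dataclass


@dataclass
class StrandCounts:
    """Hypothetical container for one orientation of one Variant."""
    variant_reads: int  # reads showing the Variant (numerator)
    total_reads: int    # reads covering the Variant (denominator)

    @property
    def percent(self) -> float:
        if self.total_reads == 0:
            return 0.0
        return 100.0 * self.variant_reads / self.total_reads


def combine(forward: StrandCounts, reverse: StrandCounts) -> StrandCounts:
    """Pool numerators and denominators from both orientations."""
    return StrandCounts(
        forward.variant_reads + reverse.variant_reads,
        forward.total_reads + reverse.total_reads,
    )


def supported_in_both_orientations(
    forward: StrandCounts, reverse: StrandCounts, max_ratio: float = 3.0
) -> bool:
    """True if both orientations show the Variant at comparable frequencies.

    max_ratio is an arbitrary illustrative threshold, not a documented setting.
    """
    low, high = sorted((forward.percent, reverse.percent))
    return low > 0 and high / low <= max_ratio


# Made-up counts, for illustration only.
forward = StrandCounts(variant_reads=40, total_reads=500)
reverse = StrandCounts(variant_reads=45, total_reads=500)
both = combine(forward, reverse)
print(forward.percent, reverse.percent, both.percent)
print(supported_in_both_orientations(forward, reverse))
```

A Variant seen in only one orientation (for example 60 of 500 forward reads and 0 of 500 reverse reads) would fail this check, which is the situation the denominators are meant to expose.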
As can be seen, several of the consensi visible in the multi-alignment have many gaps in this region. To explore these in particular, we can restrict the view to the consensi that contain these gaps. This is done by right-clicking on a base (of any consensus) in a column within the stretch of gaps, and selecting the gap character ‘-’ in the contextual menu. The result, shown in Figure 2‑28, is that only the consensi that have a gap at the position on which the selection was made (position 335 of the Reference Sequence in this case) are now displayed in the multi-alignment, and the Variation Frequency Plot is adjusted accordingly. Note in particular that the frequency axis (Variation %) is automatically re-scaled to best fit the data displayed. This allows us to see clearly that all the nucleotide positions in the stretch have the gap at a fairly consistent frequency, an observation consistent with a valid Variant. Note also that the frequency of 9.48% is close to, but slightly higher than, the value seen in the Variants Frequency Table for Var_1 (8.32%). The difference arises because we made only one selection to focus the plot on the deletion area, so not all the reads being displayed perfectly match our defined Variant. In part, this is because some consensus reads have basecalling or alignment problems that keep them from being counted as part of the Variant in the Variants Table frequency calculation. More significantly, in this case, another deletion variant is present that is shifted by a single base relative to Var_1 and occurs at 0.82% (see the full list of automatically detected variants in Figure 2‑44, below).
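The gap between the plotted frequency and the Variants Table frequency can be illustrated with a toy alignment. The sketch below assumes a simplified model in which each consensus is an aligned string with ‘-’ for gaps plus a read count; the sequences, counts and function names are invented for illustration and are not the software's internal representation.

```python
# Each entry: (aligned consensus sequence, number of reads it represents).
# All sequences and counts are hypothetical.
alignment = [
    ("ACGT-----TGA", 80),   # exact match to the defined deletion (columns 4-8)
    ("ACGTA-----GA", 8),    # deletion shifted by one base (columns 5-9)
    ("ACGT----TTGA", 12),   # imperfect deletion (basecalling/alignment problem)
    ("ACGTACGTATGA", 900),  # no deletion
]

DEFINED_VARIANT_COLUMNS = frozenset(range(4, 9))  # columns 4..8 inclusive


def gap_columns(seq: str) -> frozenset:
    """Columns at which an aligned sequence carries the gap character."""
    return frozenset(i for i, base in enumerate(seq) if base == "-")


def percent_with_gap_at(entries, column: int) -> float:
    """Frequency of reads with a gap at one column (a single selection)."""
    total = sum(n for _, n in entries)
    hits = sum(n for seq, n in entries if seq[column] == "-")
    return 100.0 * hits / total


def percent_exact_variant(entries, variant_columns: frozenset) -> float:
    """Frequency of reads whose gaps match the defined Variant exactly."""
    total = sum(n for _, n in entries)
    hits = sum(n for seq, n in entries if gap_columns(seq) == variant_columns)
    return 100.0 * hits / total


single_selection = percent_with_gap_at(alignment, column=6)
exact_match = percent_exact_variant(alignment, DEFINED_VARIANT_COLUMNS)
print(single_selection, exact_match)
```

In this toy data, selecting on a single gap column also picks up the shifted and imperfect deletions, so the single-selection frequency is higher than the exact-match frequency, mirroring the relationship between the 9.48% and 8.32% values discussed above.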
We will now dig further by examining the individual reads that comprise the third of these consensi (CON_46). To do this, we right-click on the nucleotide at position 335 of this consensus (to keep the focus at the same location) and select the ‘Open Consensus Alignment’ option from the contextual menu. This loads the Consensus Align tab with a multi-alignment of all the reads that contributed to the consensus on which we clicked (Figure 2‑29). This view shows that certain reads lack an extra “A” nucleotide compared to the rest. Looking at the sequence carefully, we notice that the deletion has created a homopolymer of “A”, suggesting that the minority “gap extension” may actually be due to an undercall of this homopolymer in the reads that show it. This is supported by the fact that the effect is observed especially in reads in the reverse orientation (as shown in Figure 2‑29), where an environment very rich in “A” nucleotides lies just before the gap.
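To make the homopolymer argument concrete, the following sketch checks whether removing a stretch of bases joins two runs of the same nucleotide into a longer homopolymer at the deletion junction. It is a minimal illustration under assumed inputs: the function name, the 0-based coordinates and the example sequence are hypothetical and do not reproduce the Reference Sequence used in this example.

```python
def junction_homopolymer(reference: str, del_start: int, del_len: int):
    """Return (base, run_length) of the homopolymer spanning a deletion junction.

    del_start is a 0-based index into the reference; returns ("", 0) when the
    bases on either side of the junction differ.
    """
    left = reference[:del_start]
    right = reference[del_start + del_len:]
    joined = left + right
    j = len(left)  # index of the first base after the junction

    if j == 0 or j >= len(joined) or joined[j - 1] != joined[j]:
        return ("", 0)

    base = joined[j]
    start = j
    while start > 0 and joined[start - 1] == base:
        start -= 1
    end = j
    while end < len(joined) and joined[end] == base:
        end += 1
    return (base, end - start)


# Made-up reference: "GCTAA" + "CGTCGT" (deleted) + "AAG".
example_reference = "GCTAACGTCGTAAG"
print(junction_homopolymer(example_reference, del_start=5, del_len=6))
```

Here the deletion merges two “AA” runs into a four-base “A” homopolymer. Longer homopolymers are harder to call accurately in flow-based sequencing, which is why an apparent extra missing “A” in a minority of reads is plausibly an undercall rather than a distinct variant.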
Figure 2‑30: The Flowgram tab for the first read of the third consensus of Var_1 in Sample 1 Consensus Align view of CON_46 (in Figure 2‑28), showing that a gap of several nucleotide flow cycles in the read allows it to maintain alignment with the Reference Sequence on both sides of the gap
The initial deletion peaks in the plot on the Global Align tab were seen at a moderate percentage range (8.65% to 9.48%). The underlying alignments show that the deletions were linked together as a 15-bp deletion haplotype. The underlying flowgrams of reads exhibiting the haplotype further show that the deletions were not due to marginal calls, and demonstrate that the flows needed to be shifted to align properly. Taken together, the evidence is compelling that this 15-bp deletion is a true Variant in the sample. The 8.32% combined frequency for the Variant on the Variants tab is a conservative estimate: it counts only perfect instances of the defined Variant among consensus reads, which, by the way they combine individual reads, can distort the frequency statistics. The actual frequency of the variation in the Sample is therefore likely higher than 8.32%. As seen in Figure 2‑44, below, the combined Var_1 percentage based on individual reads is 8.79%, closer to the lower range of observed deletion peak values. Further inspection of the alignment suggests an overlapping deletion (see the 5th, 6th and other consensus lines of Figure 2‑28 that end with a G just inside the deletion, rather than an A, and which end one base later inside the deletion, with a single A, rather than the double AA as occurs with Var_1). This additional deletion is reported as an automatically detected Variant (Figure 2‑44) with a combined frequency of 0.82%, and helps explain the range of deletion peaks observed.