2.6.3 Should Amplicons Share a Reference Sequence or Have Individual Ones?
When you set up the Amplicons for a Project, you need to weigh two opposing considerations. First, smaller Reference Sequences are more efficient for computation: excessively large Reference Sequences can lead to long computation times and slow scrolling and navigation, so shorter ones are preferable on that count. On the other hand, each alignment view is restricted to a single Sample and Reference Sequence combination. This means that if you want to look at alignments or difference plots for two or more Amplicons at the same time, those Amplicons must be defined within the same Reference Sequence.
It makes sense to use a common Reference Sequence when your Amplicons actually overlap with one another and to use separate ones for Amplicons that don’t overlap. However, you can also construct artificial Reference Sequences that let you view multiple unrelated Amplicons in a single view. These artificial Reference Sequences are built by concatenating the Amplicon sequences, with a string of N’s between them as a separator. Such a Reference Sequence is convenient if you have a small to moderate set of Amplicons that you are measuring in Samples with unknown variation content. You can then look at the difference plot, get an overview of all of the Amplicons at once, and identify obvious variations.
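The concatenation described above can be sketched in a few lines of Python. This is only an illustration prepared outside the software, not a feature of it: the function name `build_artificial_reference`, the amplicon names, the toy sequences, and the spacer length are all hypothetical, and the text does not prescribe how many N’s to use. The sketch also records where each Amplicon lands in the concatenated sequence, which is useful when defining the Amplicons against the new Reference Sequence.

```python
def build_artificial_reference(amplicons, spacer_length=50):
    """Concatenate amplicon sequences, separated by runs of N.

    amplicons: list of (name, sequence) pairs; names are assumed unique.
    spacer_length: number of N's between amplicons (an assumed value,
        not one taken from the manual).
    Returns the concatenated sequence and a dict mapping each name to its
    1-based, inclusive (start, end) coordinates in that sequence.
    """
    spacer = "N" * spacer_length
    parts = []
    coords = {}
    position = 0  # number of bases emitted so far
    for index, (name, seq) in enumerate(amplicons):
        if index > 0:
            # Separate this amplicon from the previous one.
            parts.append(spacer)
            position += spacer_length
        start = position + 1
        position += len(seq)
        coords[name] = (start, position)
        parts.append(seq.upper())
    return "".join(parts), coords


# Toy input; real amplicon sequences would be far longer.
amplicons = [
    ("ampA", "ACGTACGTAC"),
    ("ampB", "GGCCTTAAGG"),
]
reference, coords = build_artificial_reference(amplicons, spacer_length=5)
print(reference)
print(coords)
```

Keeping the coordinate table alongside the sequence makes it easy to tell which Amplicon a position in the concatenated Reference Sequence belongs to.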
However, if you use an artificial Reference Sequence with too many Amplicons in it, you will get diminishing returns: the longer Reference Sequence will slow down computation, and the alignments will become less convenient to navigate. In general, it is best to keep your Reference Sequences as compact as possible. For example, if you wanted to measure a large number of exons from a particular gene, it would be better to use a Reference Sequence constructed by concatenating the exons with N-separators than to use the full genomic sequence of the gene. As long as the exons don’t overlap with each other, it would be even better to use a separate Reference Sequence for each exon (provided that viewing the exons within the same alignment or difference plot is not a priority).