1.3.2.1
|
|
•
|
|
•
|
|
1.
|
|
3.
|
|
Characters restriction: Only “nucleotide” characters (A, T, G, C, or N) are accepted when you enter a Reference Sequence into the AVA software, whether by typing or pasting. For convenience, when you paste a sequence, characters that are neither nucleotide characters nor IUPAC ambiguity characters (such as R for purine or Y for pyrimidine) are removed from the pasted entry. This is useful when pasting sequences from sources that may include non-sequence information, such as white space or numerical position information in the margin of each line. During such pastes, any IUPAC ambiguity characters are converted to “N” characters, because the software does not support the other ambiguity characters. Typing individual “ambiguous” characters, however, does not convert them to “N”: they are simply ignored, and the text “Only ATGC and N” at the top of the Edit Sequence window turns bold and red to alert you that an invalid character was used. The restriction that no ambiguity characters other than N be present in a sequence is a requirement of many alignment algorithms and is not unique to the 454 Sequencing System software.
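The paste behavior described above can be sketched in a few lines of Python. This is only an illustration of the stated rules, not the AVA software's own code: the function name `sanitize_pasted_sequence` is hypothetical, and treating lowercase letters as their uppercase equivalents is an assumption the text does not confirm.

```python
# Hypothetical helper; it only mirrors the paste rules described in the text.
NUCLEOTIDES = set("ATGCN")
# IUPAC ambiguity codes other than N.
IUPAC_AMBIGUITY = set("RYSWKMBDHV")


def sanitize_pasted_sequence(pasted: str) -> str:
    """Clean a pasted Reference or Primer sequence according to the rules above."""
    cleaned = []
    for ch in pasted.upper():  # assumption: input is handled case-insensitively
        if ch in NUCLEOTIDES:
            cleaned.append(ch)       # nucleotide characters are kept as-is
        elif ch in IUPAC_AMBIGUITY:
            cleaned.append("N")      # ambiguity characters are converted to N
        # everything else (digits, white space, punctuation) is removed
    return "".join(cleaned)


# Margin numbers, spaces, a line break and a dash are stripped; R and Y become N.
print(sanitize_pasted_sequence("  1 ACGTR YACG\n 11 TTNN-GA"))  # ACGTNNACGTTNNGA
```

Typed input behaves differently (invalid characters are ignored rather than converted), so this sketch covers pasted entries only.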
|
|
1.3.2.2
|
|
•
|
|
If the Reference Sequence does not yet contain a DNA sequence (see section 1.3.2.1.1), you will still be able to associate Amplicons to it, but you will not be able to fully define them. In particular, you will not be able to specify the Target Start and End for the Amplicons (see section 1.3.2.2.3, below) because these are set using the position numbering from the Reference Sequence.
|
|
|
1.
|
|
3.
|
|
Characters restriction: Only “nucleotide” characters (A, T, G, C, or N) are accepted when you enter a Primer Sequence into the AVA software, whether by typing or pasting. For convenience, when you paste a sequence, characters that are neither nucleotide characters nor IUPAC ambiguity characters (such as R for purine or Y for pyrimidine) are removed from the pasted entry. This is useful when pasting sequences from sources that may include non-sequence information, such as white space or numerical position information in the margin of each line. During such pastes, any IUPAC ambiguity characters are converted to “N” characters, because the software does not support the other ambiguity characters. Typing individual “ambiguous” characters, however, does not convert them to “N”: they are simply ignored, and the text “Only ATGC and N” at the top of the Edit Sequence window turns bold and red to alert you that an invalid character was used. The restriction that no ambiguity characters other than N be present in a sequence is a requirement of many alignment algorithms and is not unique to the 454 Sequencing System software.
|
|
1.
|
|
a.
|
If the Target’s Start and End have not been specified for this Amplicon before (i.e. the Start and End cells were empty when you double-clicked them), the software automatically searches for the Primers (Primer 1 and Primer 2’) in the Reference Sequence. If it finds them (exact matches only), the software marks the Primers in yellow and the Target sequence (between the two Primers) in blue, and supplies default values for the Target’s Start and End positions in the boxes at the top of the window. Verify that the default positions are correct: in rare circumstances, the same Reference Sequence may contain multiple matching Primer 1-Primer 2’ pairs, and the software simply reports the first such pair it finds. You can also trigger this Primer search by typing a “0” (or a negative number) in either the Start or the End entry box. Exact matches for the Primers may not be found in the Reference Sequence, either because one or both Primers are not represented in the Reference Sequence at all or because, due to design considerations (or primer synthesis or sequencing errors), the Primers differ slightly from the Reference Sequence and have only a close, inexact match. Whatever the reason, if no exact match can be found for Primer 1, the AVA software defaults the Target Start to the first base of the Reference Sequence; if no exact match can be found for Primer 2’, the Target End defaults to the last base of the Reference Sequence. If this happens, verify that you have correctly defined the Primers and the Reference Sequence with which this Amplicon is associated; if the sequences are correct but the default values supplied are incorrect, use one of the following methods to specify the Target Start and End positions.
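The default-position logic above can be illustrated with a minimal Python sketch. It is written under stated assumptions rather than taken from the AVA software: the function name `default_target_bounds` is hypothetical, Primer 2’ is assumed to be supplied as it reads on the Reference Sequence strand, and "the first pair it finds" is interpreted as the leftmost Primer 1 match followed by the first Primer 2’ match downstream of it.

```python
from typing import Tuple


def default_target_bounds(reference: str, primer1: str, primer2_prime: str) -> Tuple[int, int]:
    """Return 1-based (Start, End) defaults for the Target.

    Exact matches only. Falls back to the first base when Primer 1 is not
    found and to the last base when Primer 2' is not found.
    """
    start, end = 1, len(reference)          # fallback values
    search_from = 0
    hit1 = reference.find(primer1) if primer1 else -1
    if hit1 != -1:
        search_from = hit1 + len(primer1)
        start = search_from + 1             # first base after Primer 1 (1-based)
    hit2 = reference.find(primer2_prime, search_from) if primer2_prime else -1
    if hit2 != -1:
        end = hit2                          # last base before Primer 2' (1-based)
    return start, end


# Made-up sequences for illustration only.
reference = "GGACGTTTTTCCGATGG"
print(default_target_bounds(reference, "ACGT", "CGAT"))  # (7, 11)
print(default_target_bounds(reference, "AAAA", "CCCC"))  # (1, 17), no exact matches
```

Because only the first pair is reported, a Reference Sequence with repeated primer sites would still need the manual verification described above.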
|
|
3.
|
|
1.3.2.3
|
|
•
|
|
1.3.2.4
|
|
•
|
|
1.3.2.5
|
|
•
|
|
If the Reference Sequence does not yet contain a DNA sequence (see section 1.3.2.1.1), you will still be able to associate Variants to it, but you will not be able to fully define them. In particular, you will not be able to specify the Pattern for the Variant (see section 1.3.2.5.2, below) because this is set using the position numbering from the Reference Sequence.
|
|
A read satisfies this constraint when its nucleotide(s) aligned to position “p” or to the range “p1-p2” (inclusive) of the Reference Sequence are identical to those of the Reference Sequence.
|
||
|
A read satisfies this constraint when the nucleotide(s) at position “p” or in the range “p1-p2” (inclusive) of the Reference Sequence are absent from the read. Note that the read must not also contain an insertion directly adjacent to the deletion, because that combination would instead define a substitution.
|
|
1.
|
|
(d)
|
Click OK; the insertion appears in the sequence. The positions of the inserted nucleotides use decimals so that the original Reference Sequence positions are maintained (e.g. position 66.5 means that the insertion lies between the nucleotides at positions 66 and 67 of the Reference Sequence).
|
|
3.
|
|
1.3.2.5.3
|
|
1.3.2.6
|
|
•
|
|
•
|
|
Contrary to the situation with the GS De Novo Assembler and the GS Reference Mapper applications, the number of acceptable “reading errors” in the MIDs is not set by the user in the AVA software. Rather, the software dynamically calculates how many errors can be accepted by analyzing the set of MIDs used and determining how close they are to each other in terms of the minimum number of insertions, deletions, or substitutions that would be required to transform one MID into another.
|
|
Include all MIDs used in the experiment: The analysis of “MID closeness” for MID error correction described in the Note above is based on the MIDs specified in the Multiplexer definitions. For the purpose of this analysis, it is important to include all MIDs that were actually used in the sequencing phase of the experiment, even if some of these MIDs correspond to Samples that are not of interest in the particular AVA project. If any MIDs that were used are not specified in the project, the AVA software could overestimate the amount of allowable error correction as it tries to match reads to the MIDs it knows about, which could result in MID “overcorrection” and the mis-assignment of reads to the known MIDs.
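To make the idea of “MID closeness” concrete, the sketch below computes the minimum number of insertions, deletions, or substitutions (the Levenshtein distance) between every pair of MIDs in a set. The MID names and sequences are made up, the function names are hypothetical, and the final rule, floor((d_min - 1) / 2), is the textbook error-correction bound used here as an assumption: the text does not state the exact formula the AVA software applies.

```python
from itertools import combinations
from typing import Dict


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute or match
        prev = curr
    return prev[-1]


def min_mid_distance(mids: Dict[str, str]) -> int:
    """Smallest pairwise edit distance in the set (needs at least two MIDs)."""
    return min(edit_distance(a, b) for a, b in combinations(mids.values(), 2))


def correctable_errors(mids: Dict[str, str]) -> int:
    # Assumption: textbook bound, not the documented AVA rule.
    return (min_mid_distance(mids) - 1) // 2


# Hypothetical MIDs; MID_A and MID_C differ by only two substitutions.
used = {"MID_A": "ACGTACGTAC", "MID_B": "TGCATGCATG", "MID_C": "ACGTACGTTT"}
declared = {name: seq for name, seq in used.items() if name != "MID_C"}

print(min_mid_distance(used))      # 2
print(min_mid_distance(declared))  # never smaller than the value for `used`
```

Leaving a used MID out of the project can only keep the minimum distance the same or raise it, which is how an incomplete MID list can lead to an overestimate of the allowable error correction.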
|
|
1.
|
|
3.
|
|
Characters restriction: Be aware that only “nucleotide” characters (A, T, G, C) are accepted when you enter an MID Sequence into the AVA software (by typing or pasting). For convenience, when pasting sequences, characters that are not nucleotide characters and are also not IUPAC ambiguity characters (such as R for purine, Y for pyrimidine, etc.) are removed from the pasted entry. This is useful when pasting sequences from sources that may include non-sequence information (such as white space or numerical position information in the margin of each line). If any IUPAC ambiguity characters are included, the paste will be cancelled entirely, and an error message will be displayed explaining the problem. If you directly type individual “ambiguous” characters, however, or any character other than A, T, G, or C, these characters are simply ignored.
The restriction that no ambiguity characters be present in an MID sequence is crucial because MIDs are intended to designate specific Samples. If you have a degenerate MID design in which multiple MID sequences specify the same Sample, enter all the specific MID sequences into the system and use the Multiplexer Sample editor to specify all the MIDs that encode each Sample (see section 1.3.2.7.3).
|
|
1.3.2.6.2
|
|
•
|
|
•
|
|
•
|
|
In the standard non-MID demultiplexing scheme, the AVA software looks for the template-specific primer sequences (Primer 1 and Primer 2) of the defined Amplicons at the beginning of each read. Once the Amplicon to which a read belongs is identified, the Sample-Amplicon associations defined for the Read Data Set that the read comes from are used to assign the read to its appropriate Sample. In other words, when MIDs are not used, the assignment of a read to an Amplicon, using the template-specific primers, is sufficient to further assign the read to the proper Sample. As explained before (see section 1.1.1.6, and Note and Caution sidebars in section 1.3.1.2), this scheme imposes the restriction that an Amplicon may only belong to a single Sample within a Read Data Set, to allow for unambiguous Sample assignment of the reads.
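The two-step assignment described above (read to Amplicon by primer, then Amplicon to Sample by the Read Data Set's associations) can be sketched as follows. The Amplicon names, primer sequences, and the function `assign_read` are hypothetical, and the exact-prefix match is a simplifying assumption: the text does not say how much mismatch the software tolerates when locating primers.

```python
from typing import Dict, Optional, Tuple

# Hypothetical project data: Amplicon name -> (Primer 1, Primer 2), both 5'->3'.
AMPLICONS: Dict[str, Tuple[str, str]] = {
    "AmpA": ("ACGTAC", "TTGGCA"),
    "AmpB": ("GGATCC", "CATGCA"),
}
# Within one Read Data Set, each Amplicon belongs to exactly one Sample.
SAMPLE_OF_AMPLICON: Dict[str, str] = {"AmpA": "Sample1", "AmpB": "Sample2"}


def assign_read(read: str) -> Optional[Tuple[str, str]]:
    """Return (Amplicon, Sample) for a read, or None if no primer starts the read."""
    for amplicon, primers in AMPLICONS.items():
        # A read may begin with either primer, depending on its direction.
        if any(read.startswith(primer) for primer in primers):
            return amplicon, SAMPLE_OF_AMPLICON[amplicon]
    return None


print(assign_read("ACGTACTTTTTTTT"))  # ('AmpA', 'Sample1')
print(assign_read("CATGCAGGGGGGGG"))  # ('AmpB', 'Sample2')
print(assign_read("TTTTTTTTTTTTTT"))  # None
```

The single-valued `SAMPLE_OF_AMPLICON` mapping is what makes the assignment unambiguous; if an Amplicon could belong to two Samples in the same Read Data Set, the primer alone could not decide between them.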
|
|
•
|
|
•
|
|
Selecting the proper encoding: It is crucial to select the encoding method that truly corresponds to the way the libraries were prepared. For example, if libraries were prepared with ‘Either’ chemistry in mind, it may be tempting to use a ‘Primer 1 MID’ or ‘Primer 2 MID’ encoded Multiplexer, since the distal MID is discounted in favor of the proximal MID in ‘Either’ encoding. However, the AVA software needs to know that MIDs are expected at both ends: without that knowledge, the trimmer might produce a suboptimal alignment of the distal primer, which in certain cases could drop valid reads from the analysis.
|
|
1.3.2.7.1.1
|
|
1.3.2.7.1.2
|
|
1.3.2.7.1.3
|
|
•
|
In addition, the software creates virtual MID Groups based on the length of the MIDs defined in the Project. This is useful because, as mentioned above (see Note in section 1.3.2.6), all the MIDs used on a given end of an Amplicon must be of the same length.
|
|
◦
|
Note that MIDs without a defined sequence will appear in all length-restricted lists (e.g. see Figure 1‑36B). This allows undefined MIDs to be selected in a Multiplexer scheme and defined later. Once an MID has a sequence defined, it will lose its wild card status and will only appear in the list appropriate to its length.
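The length-based virtual groups, including the wild card behavior of undefined MIDs, can be illustrated with a short sketch. The MID names and sequences are made up and the function `mids_for_length` is hypothetical; it only mirrors the listing rule described above.

```python
from typing import Dict, List, Optional


def mids_for_length(mids: Dict[str, Optional[str]], length: int) -> List[str]:
    """Names of the MIDs listed in the virtual group for the given length."""
    return sorted(
        name for name, seq in mids.items()
        if not seq or len(seq) == length  # undefined MIDs appear in every list
    )


# Hypothetical project: MID_C has no sequence defined yet.
project_mids = {
    "MID_A": "ACGTACGTAC",
    "MID_B": "TGCATGCATG",
    "MID_C": None,
    "Short": "ACGTA",
}

print(mids_for_length(project_mids, 10))  # ['MID_A', 'MID_B', 'MID_C']
print(mids_for_length(project_mids, 5))   # ['MID_C', 'Short']

# Once MID_C receives a sequence, it is listed only under its own length.
project_mids["MID_C"] = "GGCCA"
print(mids_for_length(project_mids, 10))  # ['MID_A', 'MID_B']
```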
|
|
1.3.2.7.3
|
|
1.3.2.7.3.2
|
|
1.3.2.7.3.3
|