1. GS Amplicon Variant Analyzer Application : 1.3 The Project Tab : 1.3.2 The “Definition Table” Sub-Tabs
The “Duplicate item” button is used to create copies of items in the Definition Tables. This is another contextual button that operates on an item that is selected in one of the Definition Table sub-tabs. Clicking on this button while a single item is selected in the Definition Table will add an extra row to the table that is identical to the selected item except that its name will have a suffix of the form “_copy_NUM” (where “NUM” increments from 1 when more than one copy is made of an original item). A copy of a copy adds another copy suffix (e.g., “ItemName_copy_2_copy_1” would result from a duplication of “ItemName_copy_2”). The duplication operation only duplicates data that is explicitly associated with the item in the Table row: it does not duplicate any associations the item might have as implied by the tree structures (such as Sample-Amplicon associations) unless they are specified in the Table (such as the Reference association in the Amplicon and Variant Definition Tables, and the content of Multiplexers). The duplication of Read Data is not currently supported.
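The naming rule for duplicated items can be illustrated with a short Python sketch. The function `duplicate_name` and the idea of picking the first unused number are assumptions made for illustration only; the manual states only that the suffix has the form “_copy_NUM” and that NUM increments from 1.

```python
def duplicate_name(original: str, existing: set[str]) -> str:
    """Return the first unused name of the form '<original>_copy_<NUM>'.

    Hypothetical helper: choosing the first free NUM is an assumption.
    """
    num = 1
    while f"{original}_copy_{num}" in existing:
        num += 1
    return f"{original}_copy_{num}"


names = {"ItemName"}
first = duplicate_name("ItemName", names)   # "ItemName_copy_1"
names.add(first)
second = duplicate_name("ItemName", names)  # "ItemName_copy_2"
names.add(second)
nested = duplicate_name(second, names)      # "ItemName_copy_2_copy_1"
print(first, second, nested)
```

Duplicating a copy simply appends a further suffix to the copy's own name, which is why the last call yields “ItemName_copy_2_copy_1”.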
1.3.2.1
The References Definition Table
The References Definition Table lists all the Reference Sequences defined in the Project, with the following three characteristics (Table columns; see Figure 1‑20):
For the procedures to add or remove Reference Sequences in a Project, see section 1.3.2 (or 1.3.1, to accomplish this in a “Project Tree” view). For the procedures to enter/edit the Name or Annotation information for a Reference Sequence, see section 1.3.2. The sub-section below provides the procedure to enter/edit the other characteristic of Reference Sequences, the DNA sequence itself.
1.3.2.1.1
To Enter or Edit the DNA Sequence of a Reference Sequence
Characters restriction: Be aware that only “nucleotide” characters (A, T, G, C, or N) are accepted when you enter a Reference Sequence into the AVA software (by typing or pasting). For convenience, when pasting sequences, characters that are not nucleotide characters and are also not IUPAC ambiguity characters (such as R for purine, Y for pyrimidine, etc.) are removed from the pasted entry. This is useful when pasting sequences from sources that may include non-sequence information (such as white space or numerical position information in the margin of each line). During such pastes, any IUPAC ambiguity characters are converted to “N” characters, as the other ambiguity characters are not supported by the software (typing individual “ambiguous” characters, however, does not result in their conversion to “N”; these are simply ignored and the text “Only ATGC and N” at the top of the Edit Sequence window turns bold and red to alert you that an invalid character was used). The restriction that no ambiguity characters other than N be present in a sequence is a requirement of many alignment algorithms and is not unique to the 454 Sequencing System software.
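The paste behavior described above can be approximated with the following Python sketch. It is not the AVA implementation: the function name `sanitize_pasted_sequence` is hypothetical, and converting the pasted text to upper case is an assumption, since the manual does not say how lower-case input is treated.

```python
NUCLEOTIDES = set("ATGCN")
# IUPAC ambiguity codes other than N
IUPAC_AMBIGUITY = set("RYSWKMBDHV")


def sanitize_pasted_sequence(text: str) -> str:
    """Mimic the documented paste rules for Reference and Primer sequences."""
    cleaned = []
    for ch in text.upper():  # assumption: case is normalized before filtering
        if ch in NUCLEOTIDES:
            cleaned.append(ch)       # accepted as-is
        elif ch in IUPAC_AMBIGUITY:
            cleaned.append("N")      # ambiguity codes are converted to N
        # anything else (digits, white space, punctuation) is dropped
    return "".join(cleaned)


pasted = "  1 ACGTR YTTA\n 11 GG-NN"
print(sanitize_pasted_sequence(pasted))  # ACGTNNTTAGGNN
```

Note that any letter that happens to be an IUPAC code would be converted to “N” under these rules, so it is best to paste only the sequence lines rather than surrounding descriptive text.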
1.3.2.2
The Amplicons Definition Table
The Amplicons Definition Table lists all the Amplicons defined in the Project, with the following seven characteristics (Table columns; see Figure 1‑22):
For the procedures to add or remove Amplicons in a Project, see section 1.3.2 (or 1.3.1, to accomplish this in a “Project Tree” view, and concurrently create associations). For the procedures to enter/edit the Name or Annotation information for an Amplicon, see section 1.3.2. The sub-sections below provide the procedures to enter/edit the other characteristics of Amplicons.
1.3.2.2.1
To Enter or Edit the Reference Sequence to which an Amplicon is associated
If the Reference Sequence does not yet contain a DNA sequence (see section 1.3.2.1.1), you will still be able to associate Amplicons to it, but you will not be able to fully define them. In particular, you will not be able to specify the Target Start and End for the Amplicons (see section 1.3.2.2.3, below) because these are set using the position numbering from the Reference Sequence.
1.3.2.2.2
To Enter or Edit the Primer Sequences for the Amplicon
As mentioned earlier (section 1.1.1.3), Primer 1 and Primer 2 correspond to the “sequence-specific” part of the two Fusion Primers used to construct the Amplicon library, excluding the 19 bp “Primer A” and “Primer B” parts of the Fusion Primers.
Both Primer 1 and Primer 2 should be entered as their true 5’-->3’ sequence. To find the End of the Target (section 1.3.2.2.3, below), the software automatically determines the reverse-complement of Primer 2 (Primer 2’) and aligns this to the Reference Sequence.
Characters restriction: Be aware that only “nucleotide” characters (A, T, G, C, or N) are accepted when you enter a Primer Sequence into the AVA software (by typing or pasting). For convenience, when pasting sequences, characters that are not nucleotide characters and are also not IUPAC ambiguity characters (such as R for purine, Y for pyrimidine, etc.) are removed from the pasted entry. This is useful when pasting sequences from sources that may include non-sequence information (such as white space or numerical position information in the margin of each line). During such pastes, any IUPAC ambiguity characters are converted to “N” characters, as the other ambiguity characters are not supported by the software (typing individual “ambiguous” characters, however, does not result in their conversion to “N”; these are simply ignored and the text “Only ATGC and N” at the top of the Edit Sequence window turns bold and red to alert you that an invalid character was used). The restriction that no ambiguity characters other than N be present in a sequence is a requirement of many alignment algorithms and is not unique to the 454 Sequencing System software.
1.3.2.2.3
To Enter or Edit the Target Start and End Positions
a.
If the Target’s Start and End have not been specified for this Amplicon before (i.e. the Start and End cells were empty when you double-clicked them), the software automatically searches for the Primers (Primer 1 and Primer 2’) in the Reference Sequence; if it finds them (exact matches only), the software marks the Primers in yellow and the Target sequence (between the two Primers) in blue, and specifies default values for the Target’s Start and End positions in the boxes at the top of the window. The user should verify that the default positions are correct since, in some rare circumstances, there may be multiple Primer 1-Primer 2’ pairs of matches within the same Reference Sequence and the software simply gives the first such pair it finds. This Primer search function can also be triggered by typing a “0” (or a negative number) in either the Start or the End entry box. It is possible that exact matches for the Primers are not found in the Reference Sequence, as either or both Primers may actually not be represented by the Reference Sequence or, due to design considerations (or primer synthesis or sequencing errors), the Primers may slightly differ from the Reference Sequence so that they have a close but inexact match. Whatever the reason, if no exact match can be found for Primer 1, the AVA software will default the Target Start to the first base of the Reference Sequence; if no exact match can be found for Primer 2’, the default for Target End will be the last base of the Reference Sequence. If this happens, verify that you have correctly defined the Primers and the Reference Sequence to which this Amplicon is associated; if the sequences are correct, but the default values supplied are incorrect, use one of the following methods to specify the Target Start and End positions.
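As a rough illustration of the default Primer search, the Python sketch below reverse-complements Primer 2, looks for exact matches, and falls back to the first and last base when a Primer is not found. The function names, the 1-based inclusive numbering, and the rule of taking the first Primer 2’ match downstream of the first Primer 1 match are assumptions; the manual says only that the software reports the first pair of matches it finds.

```python
COMPLEMENT = str.maketrans("ATGCN", "TACGN")


def reverse_complement(seq: str) -> str:
    """Return the reverse-complement of a DNA sequence (A, T, G, C, N only)."""
    return seq.translate(COMPLEMENT)[::-1]


def default_target(reference: str, primer1: str, primer2: str) -> tuple[int, int]:
    """Return assumed 1-based, inclusive default (start, end) of the Target."""
    primer2_rc = reverse_complement(primer2)  # "Primer 2'" in the manual
    start, end = 1, len(reference)            # documented fallbacks
    search_from = 0
    p1 = reference.find(primer1)              # exact match only
    if p1 != -1:
        start = p1 + len(primer1) + 1         # first base after Primer 1
        search_from = p1 + len(primer1)
    p2 = reference.find(primer2_rc, search_from)
    if p2 != -1:
        end = p2                              # last base before Primer 2'
    return start, end


# Made-up sequences for illustration only.
reference = "GGG" + "ACGTAC" + "TTTTTTTT" + "GATTCA" + "CCC"
print(default_target(reference, "ACGTAC", "TGAATC"))  # (10, 17)
print(default_target(reference, "CCCCCC", "CCCCCC"))  # (1, 26): no exact matches
```

The second call shows why the defaults should always be checked: when neither Primer matches, the whole Reference Sequence is proposed as the Target.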
1.3.2.3
The Read Data Definition Table
The Read Data Definition Table lists all the Read Data Sets defined in the Project, with the following four characteristics (Table columns; see Figure 1‑25):
For the procedures to add or remove Read Data Sets in a Project, see section 1.3.2 (or 1.3.1, to accomplish this in a “Project Tree” view, and concurrently create associations). For the procedures to enter/edit the Name or Annotation information for a Read Data Set, see section 1.3.2. The sub-sections below provide the procedures to enter/edit the other characteristics of Read Data Sets.
1.3.2.3.1
To Edit the Read Group of a Read Data Set
1.3.2.3.2
To Edit the “Active” status of a Read Data Set
1.3.2.4
The Samples Definition Table
The Samples Definition Table lists all the Samples defined in the Project, with only the following two characteristics (Table columns; see Figure 1‑26):
For the procedures to add or remove Samples in a Project, see section 1.3.2 (or 1.3.1, to accomplish this in a “Project Tree” view, and concurrently create associations). For the procedures to enter/edit the Name or Annotation information for a Sample, see section 1.3.2.
1.3.2.5
The Variants Definition Table
The Variants Definition Table lists all the Variants defined in the Project, with the following five characteristics (Table columns; see Figure 1‑27):
For the procedures to add or remove Variants in a Project, see section 1.3.2 (or 1.3.1, to accomplish this in a “Project Tree” view, and concurrently create associations). For the procedures to enter/edit the Name or Annotation information for a Variant, see section 1.3.2. The sub-sections below provide the procedures to enter/edit the other characteristics of Variants.
1.3.2.5.1
To Enter or Edit the Reference Sequence to which a Variant is associated
If the Reference Sequence does not yet contain a DNA sequence (see section 1.3.2.1.1), you will still be able to associate Variants to it, but you will not be able to fully define them. In particular, you will not be able to specify the Pattern for the Variant (see section 1.3.2.5.2, below) because this is set using the position numbering from the Reference Sequence.
1.3.2.5.2
To Enter or Edit the Pattern of a (Known) Variant
If you already know one or more Variants (e.g. from the scientific literature or from previous experiments), you can define them in the Project and have the AVA software report on the frequency at which they occur in the Read Data Sets included in the Project. Note that novel Variants observed in the reads of the Project itself can also be defined as described below, but the best way to specify novel Variants is to examine the multiple alignments of the putative Variants found by the AVA software during computation and to “Accept” them if you determine that they are legitimate (see section 1.3.2.5.3, below); also, you can “declare” novel Variants not identified by the software after you identify and evaluate them in the Global Align or Consensus Align tabs (see sections 1.6 and 1.7).
The AVA software uses 4 types of constraints to define Variants, and writes them following a strict Variant Definition Syntax, summarized in Table 1‑1. A Variant can be specified by one or more constraints which collectively comprise the “Pattern” that defines the Variant.
A read satisfies this constraint when its nucleotide(s) aligned to position “p”, or to the range “p1-p2” (inclusive), of the Reference Sequence are identical to those of the Reference Sequence.
A read satisfies this constraint when the nucleotide(s) at position “p”, or in the range “p1-p2” (inclusive), of the Reference Sequence are absent from the read. Note that directly neighboring insertions may not also exist, as this combination would rather define a substitution.
(d)
Click OK; the insertion appears in the sequence. The positions of the inserted nucleotides use decimals so that the original Reference Sequence positions are maintained (e.g. position 66.5 means that the insertion is between the nucleotides at positions 66 and 67 of the Reference Sequence).
1.3.2.5.3
To Edit the Status of a Variant
Variants exist within a Project, with one of three possible Status values: “Accepted”, “Putative”, or “Rejected”. Variants defined manually by the user (see section 1.3.2.5.2) receive the “Accepted” status by default. By contrast, variations between the Read Data Sets and the References that are identified by the AVA software (during computation; see section 1.4) are initially proposed as “Putative” Variants. After you have examined the data underlying a Variant and determined whether you believe it to be legitimate or not, you can change its assigned Status as described below.
1.3.2.6
The MIDs Definition Table
The MIDs Definition Table lists all the MIDs defined in the Project, with the following four characteristics (Table columns; see Figure 1‑30):
For the procedures to add or remove MIDs in a Project, see section 1.3.2 (or 1.3.1, to accomplish this in a “Project Tree” view). For the procedures to enter/edit the Name or Annotation information for an MID, see section 1.3.2. The sub-sections below provide the procedure to enter/edit the other characteristics of MIDs.
Contrary to the situation with the GS De Novo Assembler and the GS Reference Mapper applications, the number of acceptable “reading errors” in the MIDs is not set by the user in the AVA software. Rather, the software dynamically calculates how many errors can be accepted by analyzing the set of MIDs used and determining how close they are to each other in terms of the minimum number of insertions, deletions, or substitutions that would be required to transform one MID into another.
Include all MIDs used in the experiment: The analysis of “MID closeness” for MID error correction described in the Note above is based on the MIDs specified in the Multiplexer definitions. For the purpose of this analysis it is important to include all MIDs that were actually used in the sequencing phase of the experiment, even if certain of these MIDs correspond to Samples that are not of interest in the particular AVA project. If any used MIDs were not specified in the project, the AVA software could overestimate the amount of allowable error correction as it tries to match reads to the MIDs it knows, which could result in MID “overcorrection” and the mis-assignment of reads to the known MIDs.
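The notion of “MID closeness” can be illustrated with a standard edit-distance calculation, sketched below in Python. This shows only the pairwise minimum number of insertions, deletions, or substitutions; how the AVA software turns that number into an allowed error count is not described here and is not modeled. The MID names and sequences are made up for the example.

```python
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution or match
        prev = curr
    return prev[-1]


def min_pairwise_distance(mids: dict[str, str]) -> int:
    """Smallest edit distance between any two MIDs in the set."""
    return min(edit_distance(a, b) for a, b in combinations(mids.values(), 2))


# Hypothetical MIDs actually used in the sequencing Run.
used = {"MidA": "ACGTACGT", "MidB": "ACGTTCGT", "MidC": "TGCATGCA"}
print(min_pairwise_distance(used))  # 1: MidA and MidB differ by one base

# If MidB is left out of the Project, the remaining MIDs look farther apart.
declared = {name: seq for name, seq in used.items() if name != "MidB"}
print(min_pairwise_distance(declared))  # greater than 1
```

With MidB omitted, the declared set appears more widely separated than the MIDs really were, which is the situation in which overcorrection and mis-assignment of reads can occur.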
1.3.2.6.1
To Enter or Edit the DNA Sequence of an MID
Characters restriction: Be aware that only “nucleotide” characters (A, T, G, C) are accepted when you enter an MID Sequence into the AVA software (by typing or pasting). For convenience, when pasting sequences, characters that are not nucleotide characters and are also not IUPAC ambiguity characters (such as R for purine, Y for pyrimidine, etc.) are removed from the pasted entry. This is useful when pasting sequences from sources that may include non-sequence information (such as white space or numerical position information in the margin of each line). If any IUPAC ambiguity characters are included, the paste will be cancelled entirely, and an error message will be displayed explaining the problem. If you directly type individual “ambiguous” characters, however, or any character other than A, T, G, or C, these characters are simply ignored.
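The stricter paste rule for MIDs might be sketched as follows. The names `sanitize_pasted_mid` and `PasteCancelled` are hypothetical, upper-casing the input is an assumption, and “N” is treated here as an ambiguity character because only A, T, G, and C are accepted for MIDs.

```python
MID_NUCLEOTIDES = set("ATGC")
# Assumption: N counts as an ambiguity character for MIDs.
IUPAC_AMBIGUITY = set("RYSWKMBDHVN")


class PasteCancelled(ValueError):
    """Raised when a pasted MID contains an ambiguity character."""


def sanitize_pasted_mid(text: str) -> str:
    """Mimic the documented paste rules for MID sequences."""
    upper = text.upper()  # assumption: case is normalized before filtering
    ambiguous = sorted(set(upper) & IUPAC_AMBIGUITY)
    if ambiguous:
        # The whole paste is cancelled rather than partially accepted.
        raise PasteCancelled(f"ambiguity characters not allowed in an MID: {ambiguous}")
    # Digits, white space, and other non-sequence characters are dropped.
    return "".join(ch for ch in upper if ch in MID_NUCLEOTIDES)


print(sanitize_pasted_mid("1 ACGAG TGCGT"))  # ACGAGTGCGT
try:
    sanitize_pasted_mid("ACGRT")
except PasteCancelled as err:
    print("paste cancelled:", err)
```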
1.3.2.6.2
To Edit the MID Group of an MID
1.3.2.7
The Multiplexers Definition Table
The Multiplexers Definition Table lists all the Multiplexers defined in the Project, with the following six characteristics (Table columns; see Figure 1‑32):
For the procedures to add or remove Multiplexers in a Project, see section 1.3.2 (or 1.3.1, to accomplish this in a “Project Tree” view). For the procedures to enter/edit the Name or Annotation information for a Multiplexer, see section 1.3.2. The sub-sections below provide the procedure to enter/edit the other characteristics of Multiplexers.
1.3.2.7.1
To Enter or Edit the Sample Encoding using Multiplexers
The AVA software provides 4 ways to encode the Sample to which a read belongs in the Multiplexer, based on the construction of the libraries (see section 4.6 for details on Amplicon library design with MIDs). The proper option must be selected from a drop down menu in the Multiplexers Definition Table (Figure 1‑33). The options, further described in the sub-sections below, are:
Selecting the proper encoding: It is crucially important to select the encoding method that truly corresponds to the way the libraries were prepared. For example, if libraries were prepared with ‘Either’ chemistry in mind, it may be tempting to use a ‘Primer 1 MID’ or ‘Primer 2 MID’ encoded Multiplexer since the distal MID gets discounted in favor of the proximal MID in ‘Either’ encoding. However, the AVA software needs to know that MIDs are expected to be found at both ends: without that knowledge, the trimmer might get a suboptimal alignment of the distal primer, which in certain cases could drop valid reads out of the analysis.
1.3.2.7.1.1
“Primer 1 MID” and “Primer 2 MID” Encoding
When either of these encoding options is selected for a Multiplexer, only the corresponding Primer MID field, Primer 1 MIDs or Primer 2 MIDs, needs to be filled in the Multiplexer’s Definition Table, to identify the MIDs used in the scheme (see section 1.3.2.7.2). For example, a Multiplexer encoded as “Primer 1 MID” will have an empty column in the Definition Table for the “Primer 2 MIDs” field. The maximum number of Samples that can be encoded with this scheme is equal to the number of MIDs defined in the Primer 1 MIDs or Primer 2 MIDs field.
1.3.2.7.1.2
“Both” Encoding
1.3.2.7.1.3
“Either” Encoding
1.3.2.7.2
To Enter or Edit the Primer 1 MIDs and Primer 2 MIDs
To specify the MIDs for one end of a Multiplexer, double-click on the appropriate ‘Primer MIDs’ cell for that Multiplexer. The Edit Primer 1 MIDs (or Edit Primer 2 MIDs) window opens (Figure 1‑34). The window will not open unless at least one MID entry has already been specified in the MID Definition Table (though the MIDs do not have to have sequences defined at this stage). Select the MIDs of interest in the list on the left, and click the button that adds them to the list on the right. To remove MIDs that have been previously selected, highlight them in the list on the right, and click the corresponding removal button.
The MID Group drop down menu also contains some “virtual” groups that are automatically generated by the AVA software based on the MIDs currently defined in the Project. Figure 1‑35 shows an example where all 14 of the 454Standard MIDs (10-mers) have already been loaded into the Project, and four new MIDs have been added without groups: two 6-base MIDs (Mid15 and Mid16) and two MIDs for which no sequence has yet been defined (Mid17 and Mid18).
Figure 1‑36A then shows the MID Group drop down menu for the MIDs in Figure 1‑35:
Note that MIDs without a defined sequence will appear in all length-restricted lists (e.g. see Figure 1‑36B). This allows undefined MIDs to be selected in a Multiplexer scheme and defined later. Once an MID has a sequence defined, it will lose its wild card status and will only appear in the list appropriate to its length.
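The “wild card” behavior of undefined MIDs in the length-restricted lists can be sketched in Python as follows. The function `mids_of_length` is hypothetical and the sequences are placeholders, not the real 454Standard MID sequences.

```python
from typing import Optional


def mids_of_length(mids: dict[str, Optional[str]], length: int) -> list[str]:
    """List MIDs whose sequence has the given length, plus all undefined MIDs."""
    return [name for name, seq in mids.items()
            if not seq or len(seq) == length]


# Placeholder sequences for illustration only.
project_mids = {
    "Mid1": "ACGAGTGCGT",  # a 10-mer
    "Mid15": "ACGTAC",     # a 6-mer
    "Mid16": "TGCATG",     # a 6-mer
    "Mid17": None,         # no sequence defined yet
    "Mid18": None,         # no sequence defined yet
}
print(mids_of_length(project_mids, 6))   # ['Mid15', 'Mid16', 'Mid17', 'Mid18']
print(mids_of_length(project_mids, 10))  # ['Mid1', 'Mid17', 'Mid18']
```

Once Mid17 or Mid18 receives a sequence, it would appear only in the list matching that sequence's length.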
The types of errors and warnings provided may include MIDs not all the same length, or undefined MIDs (Figure 1‑37). Note that the software gives the benefit of the doubt to undefined MIDs, and calls the attention of the user with a warning but does not assume an error. This provides the advantage that the structure of a Multiplexer can be defined independently, and possibly in advance, of the knowledge of the MID sequences themselves. However, prior to computation, all the MIDs used in defining Multiplexers that are associated with active Read Data Sets must, naturally, be defined. The software also calculates the minimum edit distance even for defined MIDs of different lengths, assuming that corrections will be made prior to Project computation (i.e. that MIDs of unequal length will be corrected or eliminated).
1.3.2.7.3
To Enter or Edit the Samples Assignment
1.3.2.7.3.1
Sample Assignment with “Primer 1 MID” or “Primer 2 MID” Encoding
With these single-end MID encoding schemes, the Edit Samples window simply lists all the MIDs selected for the Multiplexer (see section 1.3.2.7.2), and the Sample is selected from the drop down menu (or can be typed) in the cell to the right of each MID name (Figure 1‑38A). The user can also type into the cells the names of Samples that have not yet been defined in the project: new Samples with those names will automatically be created and appear in the Project’s Samples Definition Table when the user confirms the changes made in the window. These Samples will not be created, however, if the user cancels out of the window instead. A Sample assignment may be removed from a given MID by choosing the “--remove--” option from the drop down menu.
Certain shortcuts are available on this window as well: clicking the button assigns default-named Samples to any MID that does not yet have an assigned Sample (Figure 1‑38B); and a button empties all the Sample cells. The default Sample names contain three parts, in the following format:
In a manner analogous to the Edit Primer 1 MIDs or Edit Primer 2 MIDs windows (section 1.3.2.7.2), a summary of the assignment scheme is provided at the bottom of the Edit Samples window, including information on the number of MID-Sample associations defined and the total number that can be defined with the MIDs selected (Figure 1‑38). Any errors and warnings associated with the MIDs are also shown here, to alert the user that action must be taken to complete or correct the MID definitions or the Sample assignments (Figure 1‑39).
1.3.2.7.3.2
Sample Assignment with “Both” Encoding
The other features of this window (can have empty cells, shortcut buttons, summary and error/warning reporting, etc.) are the same as for the Primer 1 MID or Primer 2 MID encoding, described above. For “Both” encoding, the names of the Autofill Samples are of the form:
It is important to be aware of the directionality of the Amplicons when assigning the Samples to MID pairs: MIDs are selected separately for the Primer 1 and Primer 2 sides to support this directionality. In the Edit Samples window, the sides corresponding to the two selected MID sets are identified by a “1” and a “2” with arrowheads, in the top-left corner of the Table. This is illustrated in Figure 1‑40: in this example, the table of the Edit Samples window has been populated using the AutoFill button; the Primer 1 – MID1: Primer 2 – MID2 pair encodes Sample Sample_Multi7_Mid1_Mid2, while the Primer 1 – MID2: Primer 2 – MID1 pair encodes Sample Sample_Multi7_Mid2_Mid1. For convenience, a button can transpose the table.
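The directionality of the MID pairs can be made concrete with a small Python sketch that fills every empty cell of a “Both” grid with a default name. The name pattern is inferred solely from the two example names above (Sample_Multi7_Mid1_Mid2 and Sample_Multi7_Mid2_Mid1), and the function `autofill_both` is hypothetical.

```python
from itertools import product
from typing import Optional


def autofill_both(multiplexer: str,
                  primer1_mids: list[str],
                  primer2_mids: list[str],
                  assigned: Optional[dict[tuple[str, str], str]] = None
                  ) -> dict[tuple[str, str], str]:
    """Give a default Sample name to every (Primer 1 MID, Primer 2 MID) pair
    that has no Sample yet; existing assignments are left untouched."""
    result = dict(assigned or {})
    for m1, m2 in product(primer1_mids, primer2_mids):
        # Assumed pattern: Sample_<Multiplexer>_<Primer 1 MID>_<Primer 2 MID>
        result.setdefault((m1, m2), f"Sample_{multiplexer}_{m1}_{m2}")
    return result


grid = autofill_both("Multi7", ["Mid1", "Mid2"], ["Mid1", "Mid2"])
print(grid[("Mid1", "Mid2")])  # Sample_Multi7_Mid1_Mid2
print(grid[("Mid2", "Mid1")])  # Sample_Multi7_Mid2_Mid1
```

Because the key is an ordered pair, swapping the Primer 1 and Primer 2 MIDs identifies a different Sample, which is the point of keeping the two sides separate.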
1.3.2.7.3.3
Sample Assignment with “Either” Encoding
To help in this, the software grays out cells that become ineligible as Samples are assigned to Primer 1 MID – Primer 2 MID pairs. In the simplest case, the libraries are designed such that the same MIDs are placed at both ends of each Amplicon (Figure 1‑41A). For an Either encoded Multiplexer, the AutoFill function is only enabled for this type of symmetric design (AutoFill expects to make sample assignments along the diagonal where the same MID is used on each end of the read: Mid1-Mid1, Mid2-Mid2, etc.).
However, asymmetric designs are also legitimate. The software flags this with a warning in case the asymmetry was unintended (Figure 1‑41B). Even if the same set of MIDs is selected for both the Primer 1 MIDs and the Primer 2 MIDs series (a symmetrical design), the Sample assignment does not have to be along the diagonal in the grid (Mid1-Mid1, Mid2-Mid2, etc.) as it would be with an AutoFill. As long as no MID at either end is assigned to more than one Sample, and every MID on one side that has a Sample assignment has some corresponding MID on the other side with the same Sample assignment, the design is still valid. Again, mis-assignment is prevented by graying out the ineligible cells (Figure 1‑41C).
In fact, it is even possible to have a different number of MIDs selected on the Primer 1 and Primer 2 sides. When this kind of design is used, the software displays a warning that there are unequal numbers of Primer 1 and Primer 2 MIDs, and specifies the number of unbalanced associations (Figure 1‑42). In this special case, one or more MIDs will have to be used more than once, yet the constraint that a given MID at a given end of the Amplicons must specify a single Sample (to allow for unambiguous assignment of the reads) must be respected. To accomplish this, the AVA software restricts the Sample choices in cells that may receive such secondary assignments (highlighted with a thicker gray border), to Samples already specified for a Primer 1 MID or a Primer 2 MID. Some of the specific circumstances one might encounter are illustrated in Figure 1‑42.
Again, the other features of the Edit Samples window for “Either” encoding (can have empty cells, shortcut buttons, summary and error/warning reporting, etc.) are the same as for the Primer 1 MID, Primer 2 MID, or Both encoding, described above. Autofill Samples are handled the same way as for Both encoding.
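The validity rule for “Either” encoding described in this section can be checked with a short Python sketch. Assignments are modeled as grid cells keyed by (Primer 1 MID, Primer 2 MID); in that representation every assigned MID automatically has a partner on the other side with the same Sample, so only the “one Sample per MID per end” condition needs testing. The function `either_conflicts` and the Sample names are hypothetical.

```python
from collections import defaultdict


def either_conflicts(assignments: dict[tuple[str, str], str]) -> list[str]:
    """Report MIDs that would encode more than one Sample at a given end."""
    by_primer1: dict[str, set[str]] = defaultdict(set)
    by_primer2: dict[str, set[str]] = defaultdict(set)
    for (mid1, mid2), sample in assignments.items():
        by_primer1[mid1].add(sample)
        by_primer2[mid2].add(sample)
    problems = []
    for side, table in (("Primer 1", by_primer1), ("Primer 2", by_primer2)):
        for mid, samples in table.items():
            if len(samples) > 1:
                problems.append(
                    f"{side} MID {mid} encodes more than one Sample: {sorted(samples)}")
    return problems


diagonal = {("Mid1", "Mid1"): "S1", ("Mid2", "Mid2"): "S2"}
off_diagonal = {("Mid1", "Mid2"): "S1", ("Mid2", "Mid1"): "S2"}
unbalanced = {("Mid1", "Mid1"): "S1", ("Mid2", "Mid1"): "S1"}  # Mid1 reused, same Sample
invalid = {("Mid1", "Mid1"): "S1", ("Mid1", "Mid2"): "S2"}     # Mid1 maps to two Samples

for design in (diagonal, off_diagonal, unbalanced, invalid):
    print(either_conflicts(design) or "valid")
```

The first three designs pass, including the unbalanced one where a Primer 2 MID is reused for the same Sample; only the last design is reported, because a read carrying Mid1 at the Primer 1 end could not be assigned unambiguously.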
1.3.2.7.4
Using Multiplexers for more than one Read Data
The “Select Amplicons associated with item” button can also provide a very useful shortcut when a given set of Amplicons is to be measured by multiple Read Data Set – Multiplexer pairs. This button is also located on the left margin of the Project Tab, and its functionality is described in section 1.3.1. Selecting a large number of disparate Amplicons from the Amplicons Definition Table, to associate them to a Multiplexer, can be laborious and painstaking; if many Multiplexers require the same (or similar) Amplicon associations, you need to create only the first of these Multiplexers manually, then select it in the Read Data Tree and click the “Select Amplicons associated with item” button; the software will switch to the Amplicons Definition Table sub-tab, and the subset of the Amplicons that are associated with the original Multiplexer will be selected, ready to be dragged to another Multiplexer in the Tree.