3.3.2
    Find help on general use of the command interpreter in the sections below.
Run 'help general <subsection>'.
    
        commandLine         Information about the command line arguments to start
                            the command interpreter itself.
        parsing             Information about how commands are parsed.
        filePaths           Information about how file paths are interpreted.
        tabularCommands     Information about using tables to succinctly construct
                            commands.
        recordNames         Information about record naming for the command line
                            interpreter.
        abbreviations       Information about abbreviations for options and
                            commands that can be used throughout the command line
                            interpreter.
        multiplexing        Information about how the GS Amplicon Variant Analyzer
                            software supports multiplexing of amplicons and/or
                            samples within a single read data set.
3.3.2.1
CommandLine Help
    doAmplicon [<advanced options>] 
               [<files>] [-onErrors <"stop" or "continue">]
                         [-i[nteractive]]
                         [-v[erbose]]
                         [-c[ommand] <command>]
                         [-p[roject] <project path>]
                         [-h[elp]]
                         [-a[bout]]
 
    Runs the command interpreter.  If no '<files>' are given, the interpreter
    reads from standard input for its commands.  If one or more files are
    given, each file is executed in order.  If "-" is encountered as one of 
    the file arguments, standard input is read for commands at that position.  
    For example:
 
         doAmplicon preamble.ava -
 
    will execute the preamble and start reading commands from standard input.
    
    The '-onErrors' option sets the value of the 'onErrors' parameter.  If
    'onErrors' is set to "stop", the command interpreter will exit if an error
    is encountered.  If 'onErrors' is set to "continue", the command
    interpreter will abort the command that caused the error but will continue
    running and executing subsequent commands.
 
    The '-interactive' option indicates that the interpreter is being used
    interactively.  A prompt is written to standard output and some commands
    attempt to interact with the user for further input when necessary.  
    
    If neither '<files>' nor the '-command' option is given, the interpreter
    implicitly enters interactive mode, even if '-interactive' is not 
    specified.  If "-" is supplied as one of the '<files>' arguments, then the
    interpreter will read from standard input, but will not implicitly enter 
    interactive mode.  Thus, one syntax allows the interpreter to be used
    as an interactive command line interface while the other facilitates the
    creation of automated pipelined scripts, as in:
    
        generateScript | doAmplicon - > resultFile
    
    Unless explicitly given an '-onErrors' option value, the interpreter, in 
    interactive mode, behaves as if 'onErrors' were set to "continue" and, in
    non-interactive mode, behaves as if 'onErrors' were set to "stop".
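
    The mode and 'onErrors' defaults described above can be summarized in a
    short Python sketch.  This is only an illustration of the documented
    rules, not doAmplicon's actual implementation; the function and parameter
    names are hypothetical.

```python
from typing import List, Optional


def is_interactive(files: List[str], command: Optional[str],
                   interactive_flag: bool) -> bool:
    """Return True if the interpreter would run in interactive mode.

    Interactive mode is either requested with -interactive or implied when
    neither <files> nor -command is given.  A "-" file argument counts as a
    file, so it does not imply interactive mode.
    """
    if interactive_flag:
        return True
    return not files and command is None


def effective_on_errors(on_errors: Optional[str], interactive: bool) -> str:
    """Return the 'onErrors' value in effect.

    An explicit -onErrors value always wins; otherwise the default depends
    on whether the interpreter is interactive.
    """
    if on_errors is not None:
        return on_errors
    return "continue" if interactive else "stop"
```

    Under these rules, 'doAmplicon' with no arguments is interactive and
    continues after errors, while 'generateScript | doAmplicon -' is
    non-interactive and stops at the first error.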
 
    The '-verbose' option will cause the interpreter to output information
    about the commands that it is executing as it executes them.
 
    The '-command' option can be used to execute a single command in the
    interpreter.  For example, if you want to create an empty project, you
    would execute:
 
        doAmplicon -command "create project /data/new/project/path"
 
    The '-project' option can be used to open a project before executing the
    rest of the specified commands.  For example, you may have a script that
    removes all of the variants in a project.  You can choose which project
    to run this script on by using the '-project' option.  For example:
 
        doAmplicon -project "/some/project" myRemoveVariants.ava
 
    Combined with the "-command" option, this can be used to execute single
    commands on a project.  For example:
 
        doAmplicon -project "/some/project" -command "list amplicon"
 
    This command will list all of the amplicons of the project at
    "/some/project".
    
    The '-project' option attempts to open the project with exclusive control
    and will fail if another instance of the program has control of the
    project.  Attempting to preempt control of the project, or opening it in
    a read-only fashion, requires the use of the 'open' command from within
    the interpreter itself.
    
    The '-help' option displays this help.  Online help for the interpreter
    commands is available by entering the "help" command in the interpreter
    itself.
    
    The '-about' option displays version information about the interpreter.
    
    The <advanced options>, if provided, must all precede any of the files
    or other basic options on the command line and may be one or more of:
    
    
        --maxPerm           A number indicating, in megabytes, the maximum amount
                            of "PermGen" memory that doAmplicon's Java environment
                            may use. (Default = 128)
        --maxHeap           A number indicating, in megabytes, the maximum amount
                            of "Heap" memory that doAmplicon's Java environment
                            may use. (Default = 500)
        --cpu               A number indicating the number of processes that
                            doAmplicon may use to parallelize computations.
        --configDir         The location of configuration files used by
                            doAmplicon. (Default =
                            /opt/454/apps/amplicons/config/)
    
    
    Note that all the advanced options are preceded by two dashes, unlike
    the basic options, which are preceded by only one.

    Normally the default values of --maxPerm and --maxHeap, which are 128 and
    500 megabytes, respectively, are sufficient.  If doAmplicon's underlying
    Java environment runs out of memory, a message will be displayed
    indicating which parameter needs adjusting.  Doubling the default values
    will typically resolve any memory issues.

    The --cpu option, which defaults to 1, defines the number of parallel
    processes that may be used during the Trimming and Alignment steps
    performed when computing a project via the command 'computation start'.
    Because of the memory and CPU requirements of these steps, the --cpu
    value generally should not exceed the number of actual processors on the
    local machine, as all the processes are run on the local machine (i.e.,
    not spread across a cluster).  If the amount of memory on the local
    machine is limited, it is advisable to limit the --cpu value, because the
    parallelized steps compete for memory and may cause excessive swapping
    and degrade the responsiveness of the local machine.  The memory used by
    the Trimming and Alignment steps is in addition to that used by the Java
    environment as set by the --maxPerm and --maxHeap parameters.  If a --cpu
    value of 0 (zero) is supplied, all the processors on the local machine
    will be used.

    The --configDir option forces doAmplicon to use a configuration directory
    other than the default.
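
    As an illustration of the --cpu rules above, the following Python sketch
    computes the number of worker processes implied by a given --cpu value.
    The function name is hypothetical, and the rejection of negative values
    is an assumption; the help text does not say how doAmplicon treats them.

```python
import os


def resolve_cpu(cpu: int = 1) -> int:
    """Return the number of parallel processes implied by a --cpu value.

    The default is 1, and 0 means "use all processors on the local machine".
    Rejecting negative values is an assumption made for this sketch.
    """
    if cpu < 0:
        raise ValueError("--cpu must be zero or a positive number")
    if cpu == 0:
        # os.cpu_count() may return None when the count cannot be determined.
        return os.cpu_count() or 1
    return cpu
```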
 
    Example usage:
    Usage: gsAmplicon [--maxHeap <number>] [--maxPerm <number>] [--configDir
    <directory>] [--cpu <number>]
 
3.3.2.2
Parsing Help
    The interpreter is case insensitive with respect to its commands and
    options.  For example, consider the two commands below:
 
    create amplicon Amp1
    CREATE AMPLICON Amp1
 
    These commands are equivalent.  Note, however, that all strings that are
    part of the project itself are case sensitive.  For example, consider the
    two commands below:
 
    create amplicon Amp1
    create amplicon AMP1
 
    These commands are not equivalent, since record names are case sensitive.
    
    The character '#' may be used to document your scripts.  The '#' may 
    appear anywhere on the line, and everything from the '#' until the end
    of the line is ignored.  For example:
    
    # The next command line lists the Variants of the project
    list variant
    list amplicon # and this command lists the Amplicons
      
    To use an argument that contains spaces or the comment character, surround
    the argument with double quotes. For example, you can set an annotation
    of an amplicon to an unusual string by running the following:
 
    update amplicon Amp1 -annotation "My unusual string with a
      comment # character and even line breaks
    "
 
    If you need a double quote in your argument, "escape" it with a preceding
    backslash.  For example:
 
    update amplicon Amp1 -annotation "The \"Best\" Amplicon"
 
 
    Inside of double quotes, the backslash (except when preceding a double
    quote) and new line characters are treated literally.  Thus, in the
    example:
 
    update amplicon Amp1 -annotation "Testing 1 \
    2 3"
 
    both the backslash and new line will become part of the annotation.
    
    Outside of double quotes, the backslash character can be used to make
    any single character ordinary, avoiding the need to use double quotes.
    Thus, the following two commands are equivalent:
    
    create amplicon Amp\#1   
    create amplicon "Amp#1"
    
    Note that without the '\' or surrounding quotes, the #1 in both of the
    commands above would have been treated as a comment, and an
    amplicon simply named "Amp" would have been created.
    
    Outside of double quotes, the backslash character can also be used
    for line continuation, allowing you to split a command over multiple
    lines.  A backslash immediately followed by a new line will join
    the following line to the current line.  This allows you to format
    commands nicely.  For example:
 
    update amplicon Amp1 \
        -annotation "The best amplicon" \
        -reference "ref1"
 
    is equivalent to the single line command:
   
    update amplicon Amp1 -annotation "The best amplicon" -reference "ref1"
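
    The quoting, escaping, comment, and line continuation rules above can be
    sketched as a small Python tokenizer.  This is only an approximation
    written from the rules in this help text, not the interpreter's real
    parser.  It assumes it is given the text of a single command (possibly
    spanning several lines), and the function name is hypothetical.

```python
from typing import List


def tokenize(text: str) -> List[str]:
    """Split the text of one command into its arguments."""
    tokens: List[str] = []
    current: List[str] = []
    in_token = False   # True once an argument has started (even an empty "")
    in_quotes = False
    i = 0
    while i < len(text):
        ch = text[i]
        if in_quotes:
            if ch == "\\" and i + 1 < len(text) and text[i + 1] == '"':
                current.append('"')      # \" is an embedded double quote
                i += 2
                continue
            if ch == '"':
                in_quotes = False        # closing quote
            else:
                current.append(ch)       # backslash and new line are literal
            i += 1
            continue
        if ch == "\\" and i + 1 < len(text):
            if text[i + 1] != "\n":
                current.append(text[i + 1])   # escaped character is ordinary
                in_token = True
            # A backslash followed by a new line is a line continuation:
            # both characters are dropped.
            i += 2
            continue
        if ch == '"':
            in_quotes = True
            in_token = True
        elif ch == "#":
            # Comment: skip everything up to the end of the line.
            while i < len(text) and text[i] != "\n":
                i += 1
            continue
        elif ch.isspace():
            if in_token:
                tokens.append("".join(current))
                current, in_token = [], False
        else:
            current.append(ch)
            in_token = True
        i += 1
    if in_token:
        tokens.append("".join(current))
    return tokens
```

    With this sketch, 'create amplicon Amp\#1' and 'create amplicon "Amp#1"'
    both yield the argument Amp#1, while 'create amplicon Amp#1' yields Amp.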
    
3.3.2.3
Tabular Commands Help
    To facilitate high-throughput project setup and modification, it is 
    possible to run commands with tables of data as input.  The table column
    headers are simply the options of the command that is to be run, but with
    the leading "-" removed.  As with the command options themselves, the
    column headers are case insensitive.  The tabular data may be supplied
    from an external file, or from a table embedded in the command script
    itself, using tab- or comma-separated value formats.
    
    For example, suppose you need to add 100 amplicons to a project.  Instead
    of adding them one by one with 'create amplicon' commands, you can issue
    a single 'create amplicon' with a table as input.  For example:
 
    create amplicon -file - << end_marker
    Name     Reference
    Amp1     Ref1
    Amp2     Ref2
    Amp3     Ref3
    Amp4     Ref4
    Amp5     Ref5
    Amp6     Ref6
    Amp7     Ref7
    Amp8     Ref8
    end_marker
 
    This command will create 8 amplicons when run.  Let us examine each 
    element of this invocation.  First, the 'create amplicon' indicates that
    we are creating amplicons.  The '-file - <<' option indicates that we are
    going to be supplying a table in the form of a "Here" document.  A "Here"
    document is essentially a document supplied to the command that can be
    specified in place.  The 'end_marker' indicates that we are creating a
    here document that terminates when 'end_marker' is seen by itself on a
    line.
    
    The document itself must be a tab-separated table whose first row indicates
    what option each column represents.  Thus, when the second line of our
    table is executed, it is precisely the same as if we were to have written:
 
    create amplicon -name "Amp1" -reference "Ref1"
 
    In fact, our table command is the same as executing the following:
 
    create amplicon -name "Amp1" -reference "Ref1"
    create amplicon -name "Amp2" -reference "Ref2"
    create amplicon -name "Amp3" -reference "Ref3"
    create amplicon -name "Amp4" -reference "Ref4"
    create amplicon -name "Amp5" -reference "Ref5"
    create amplicon -name "Amp6" -reference "Ref6"
    create amplicon -name "Amp7" -reference "Ref7"
    create amplicon -name "Amp8" -reference "Ref8"
 
    However, it is much more succinct in table form.
 
    This works for any command that takes a '-file' option.  For example:
 
    update reference -annotation "Updated 2/12/07" -file - << end
    Name     Sequence
    Ref1     ATAGCAGATAGATAATATATAAAAAAGACGAT
    Ref2     ATAGCAGATATAGATAGTGATGCAGTATAGACAGTAAGATAGACAG
    Ref3     ATGAATAAAAAATCCCCCCCTAGTAGTACTTTTTTAAAAATA
    Ref4     TGACGAAACATAGTGTAAACGTGTGCAGACAGCCCAC
    Ref5     GCAGACGATAAAAAAATGATGACGACGTAATACAATAT
    Ref6     GACGCATTTTTTTTAGATATACTATATATT
    Ref7     TATAATAAAAATTATATCGGGATAGTAGTGCAGAGAGAGAGTAGTAGCAC
    Ref8     TACGACATATAGATGATAGACAAATAACAGATAGTAGTAGTAGAAGT
    end
 
    This time we are updating references rather than creating amplicons.  You
    will also note that we specified an annotation in the main command and not
    in the here document.  Options specified in this manner are applied to
    each row of the command.  Our table command is the same as executing the
    following:
 
    update reference -annotation "Updated 2/12/07" -reference Ref1 -sequence ..
    update reference -annotation "Updated 2/12/07" -reference Ref2 -sequence ..
    update reference -annotation "Updated 2/12/07" -reference Ref3 -sequence ..
    update reference -annotation "Updated 2/12/07" -reference Ref4 -sequence ..
    update reference -annotation "Updated 2/12/07" -reference Ref5 -sequence ..
    update reference -annotation "Updated 2/12/07" -reference Ref6 -sequence ..
    update reference -annotation "Updated 2/12/07" -reference Ref7 -sequence ..
    update reference -annotation "Updated 2/12/07" -reference Ref8 -sequence ..
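
    The expansion of a table into individual commands can be sketched in
    Python as follows.  This is a simplified model, not the interpreter's
    implementation.  It assumes that each column header is used verbatim
    (lower-cased) as an option name, that every value is double-quoted, and
    that an empty cell is passed as an empty string.  The function names are
    hypothetical.

```python
import csv
import io
from typing import List


def quote(value: str) -> str:
    """Double-quote a value, escaping embedded double quotes with a backslash."""
    return '"' + value.replace('"', '\\"') + '"'


def expand_table(command: str, table_text: str, fmt: str = "tsv") -> List[str]:
    """Expand 'command' plus a table into one command per data row.

    'command' holds the command and any options given outside the table;
    those options are repeated for every row.
    """
    delimiter = "," if fmt == "csv" else "\t"
    rows = [row for row in csv.reader(io.StringIO(table_text), delimiter=delimiter)
            if row]                       # ignore completely blank lines
    header, data = rows[0], rows[1:]
    commands = []
    for row in data:
        # Pad short rows so that empty cells are passed rather than omitted.
        cells = row + [""] * (len(header) - len(row))
        options = " ".join("-" + name.lower() + " " + quote(value)
                           for name, value in zip(header, cells))
        commands.append(command + " " + options)
    return commands
```

    For example, expanding 'create amplicon' with a two-column Name/Reference
    table whose first data row is Amp1 and Ref1 gives
    create amplicon -name "Amp1" -reference "Ref1" as its first command.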
 
    Instead of using here documents, external files can be supplied using the
    '-file' option.  For example:
 
    create variant -file /data/variants.txt
 
    In the previous examples, we specified the table in place using a here
    document.  Here, we refer to the external file, "/data/variants.txt".  The
    format of the external file is expected to be exactly the same as that of
    the here document (without the need for an end marker, however).  
    
    So "/data/variants.txt" could look like this:
 
    <begin /data/variants.txt>
    Name     Reference     Status
    Var1     Ref1     Accepted
    Var2     Ref1     Accepted
    Var3     Ref1     Accepted
    Var4     Ref1     Rejected
    Var5     Ref1     Rejected
    Var6     Ref1     Accepted
    <end /data/variants.txt>
 
    You will also note in this example that the rows do not line up exactly.
    This is because we always expect one tab character to separate each column,
    regardless of the size of the data in the column.
 
    If you prefer comma-separated columns, use the '-format' option.  For
    example:
 
    update readData -file - -format csv << end
    Name,Active
    Data1,true
    Data2,true
    Data3,false
    Data4,true
    Data5,false
    end
 
    Note the '-format csv' option.  Valid values are "csv" and "tsv" to
    indicate comma-separated and tab-separated table formats, respectively.
    The default is "tsv", except when a file is provided with a ".csv"
    extension (such as those exported from Excel).
 
    It is also important to note that empty cells are not omitted from the
    arguments.  For example:
 
    update variant -file - << end
    Name     Reference
    Var1     Ref1
    Var2
    Var3
    Var4     Ref4
    Var5
    Var6     Ref6
    Var7     Ref7
    Var8     Ref8
    end
 
    Executing this command will make variants "Var2", "Var3", and "Var5" refer
    to no reference sequence.
    
    Finally, note that the parsed table values are what are used to supply
    values to the command arguments, as opposed to the literal table text
    itself.  This means that the table contents must follow the syntactic
    conventions of tab- and comma-separated value tables, not those of the
    command interpreter.  In particular, neither the interpreter's comment
    character '#' nor the special '\' constructs have any special meaning
    inside of tables.  Similarly, the table conventions for quoting double
    quotes should be followed.
    
    In a command line argument, a '"' is embedded by escaping it with a
    backslash:
    
    "This is how \"double quotes\" are embedded for the interface"
    
    In a table, by contrast, one must use the double-double quote convention:
    
    "Tables use ""double-double quotes"" to embed a quote character"
    
    For more information on the interpreter's parsing of commands and special
    characters, run 'help general parsing'.
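
    The table conventions can be tried out with any standard CSV parser.  The
    Python sketch below uses the standard 'csv' module as a stand-in for the
    interpreter's table reader, which is an assumption; the interpreter's own
    reader may differ in details.  The helper name is hypothetical.

```python
import csv
import io
from typing import List


def parse_table(table_text: str, fmt: str = "csv") -> List[List[str]]:
    """Parse table text into rows of cell values.

    Doubled double quotes inside a quoted cell become one quote character.
    '#' and '\\' are ordinary characters with no special meaning.
    """
    delimiter = "," if fmt == "csv" else "\t"
    return list(csv.reader(io.StringIO(table_text), delimiter=delimiter))


rows = parse_table(
    'Name,Annotation\n'
    'Amp#1,"Tables use ""double-double quotes"" and a literal \\ backslash"\n'
)
for row in rows:
    print(row)
```

    The second row's cells come out as Amp#1 and a string containing plain
    double quotes around "double-double quotes", with the backslash kept
    as-is.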
3.3.2.4
Record Names Help
    The command line interpreter primarily uses record names to identify and
    distinguish records.  Duplicate record names lead to ambiguity that the
    interpreter cannot resolve in most cases.  For example, it is technically
    allowed for two reference sequences to have the same name, "Ref1".  If we
    want to update one of these references, we issue the command:
 
    update reference "Ref1" -annotation "New annotation"
 
    This will report an error, since the interpreter cannot discern which
    record to update.  It is therefore recommended that unique names be used
    for records.
 
    There are exceptions to this rule.  Amplicons and variants can be
    disambiguated by their reference sequences if duplicate names are found.
    For example, if you have two amplicons named "Amp1", but one of them refers
    to reference "Ref1" and the other to "Ref2", the '-ofRef' option of
    commands dealing with amplicons can be used to disambiguate them.  For
    example:
 
    update amplicon "Amp1" -annotation "New annotation"
 
    This will result in an error, since there are two amplicons named "Amp1".
    However, consider this command:
 
    update amplicon "Amp1" -ofRef "Ref1" -annotation "New annotation"
 
    This is allowed because the '-ofRef' option has been used to determine
    which amplicon to update.  This can be used in other commands as well:
 
    associate -sample "Sam1" -amplicon "Amp1" -ofRef "Ref1"
 
    Again, we are distinguishing between the duplicately named amplicons by
    using the '-ofRef' option.
 
    The 'utility validateNames' command is provided to help determine whether
    your project has any such ambiguity and, if so, to help correct it.  Type
    'help utility validateNames' for more information.
 
3.3.2.5
Abbreviations Help
    Many commands and options can be abbreviated.  For example:
 
    create amp Amp1
 
    This command is the same as:
 
    create amplicon Amp1
 
    Such abbreviations are noted in the help documentation.  For example,
    the documentation for 'create amplicon' specifies:
 
    create amp[licon]
 
    to indicate that it can be abbreviated as such.  This also goes for
    some options.  For example:
 
    assoc -sam "Sam1" -amp "Amp1"
 
    This is the same as:
 
    associate -sample "Sam1" -amplicon "Amp1"
 
    The option abbreviations are also similarly noted in the help
    documentation.
3.3.2.6
File Paths Help
    File paths are used in commands to specify projects, script files, tabular
    data, and, more generally, the location of input and output files.  For
    example, records can be listed to a file:
    
        list amplicon -outputFile someFile.txt
 
    or other scripts can be executed:
 
        utility execute someOtherScript.ava
 
    In these examples, relative paths (i.e., paths that don't start with a '/')
    specify the files to use.  These paths are considered relative to the
    interpreter's current directory (currDir), which may be set with the
    'set currDir' command.
 
    When the interpreter starts, the currDir is initially set to the directory
    in which the interpreter was invoked.  For example, if the current working
    directory is /home/me/projects when doAmplicon is invoked, the initial
    currDir will be /home/me/projects.  In this situation, for the example
    above, the relative path someFile.txt would be resolved to the absolute
    path /home/me/projects/someFile.txt.
    
    If 'set currDir' is used, the file resolution will change.  For example:
 
        set currDir /some/other/directory
        list amplicon -outputFile someFile.txt
 
    Now the relative path someFile.txt will be resolved to the absolute path
    /some/other/directory/someFile.txt.
 
    A few special path prefix shortcuts, denoted with a leading '%', are also
    available to make specifying files easier.  The first of these, currDir,
    has already been described.  This may be used to explicitly specify the
    currDir in a path, but is entirely equivalent to the default interpretation
    of relative paths.  For example, "%currDir/someFile.txt" and "someFile.txt"
    will refer to the same file.
 
    There is also a special path prefix shortcut to access the user's home
    directory.  For example, if the user's home directory is /home/me, the path
    "%homeDir/someFile.txt" will be resolved to the absolute path
    /home/me/someFile.txt.
 
    Finally, there is a special path prefix shortcut, libDir, to access a
    system library path that is set up as part of installation of the software.
    This provides access to a standard library that may be modified by the site
    administrator.
 
    Path prefixes are only recognized when they prefix the path and match a
    known shortcut.  For example, suppose the values of the shortcuts are as
    follows:
        
        currDir=/some/dir
        homeDir=/home/me
        libDir=/opt/454/apps/amplicons/config/lib
 
    Only paths starting with "%currDir", "%homeDir", or "%libDir",
    respectively, will be affected by shortcuts.  Here are some example
    path specifications with their shortcut-expanded versions:
 
        %currDir/someFile.txt           => /some/dir/someFile.txt
        %homeDir/someFile.txt           => /home/me/someFile.txt
        %libDir/someFile.txt            =>
                              /opt/454/apps/amplicons/config/lib/someFile.txt
        someFile.txt                    => /some/dir/someFile.txt
        %otherDir/someFile.txt          => /some/dir/%otherDir/someFile.txt
        data/%currDir/someFile.txt      => /some/dir/data/%currDir/someFile.txt
        
    The last example does not expand the %currDir shortcut because it
    does not appear at the beginning of the path specification.  The second to
    last example interprets '%otherDir' literally, and resolves the given path
    relative to the currDir value of /some/dir, because %otherDir is not one of
    the defined shortcuts.
        
    Absolute paths (i.e., paths that begin with '/') may also be used.  Such
    paths are entirely unaffected by the currDir and by shortcuts.
 
    To see the values of the shortcut prefixes, use the 'show environment'
    command.
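
    The resolution rules described in this section can be sketched in Python
    as follows.  The function is hypothetical and only models the documented
    behavior.  It assumes that shortcut names must match exactly and that the
    shortcut values are supplied by the caller.

```python
import posixpath
from typing import Dict


def resolve_path(path: str, shortcuts: Dict[str, str]) -> str:
    """Resolve a path the way this section describes.

    'shortcuts' maps shortcut names (without the leading '%') to absolute
    directories and must contain at least "currDir".
    """
    if path.startswith("/"):
        return path                       # absolute paths are left untouched
    if path.startswith("%"):
        prefix, _, rest = path[1:].partition("/")
        if prefix in shortcuts:           # only known shortcuts are expanded
            base = shortcuts[prefix]
            return posixpath.join(base, rest) if rest else base
    # Everything else is relative to currDir, including unknown '%' prefixes
    # and shortcuts that do not appear at the start of the path.
    return posixpath.join(shortcuts["currDir"], path)
```

    Called with the shortcut values listed above, this sketch reproduces the
    expansions in the example table, including the two cases where no
    shortcut is expanded.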
3.3.2.7
Multiplexing Help
 
    The GS Amplicon Variant Analyzer (AVA) software provides a number of 
    mechanisms for multiplexing reads, allowing multiple amplicons from
    the same or different samples to be simultaneously sequenced within
    a PTP region.
 
    The simplest demultiplexing method relies on the sequence specific primer 
    regions of the amplicons.  If an experiment calls for measuring multiple 
    distinct amplicons from the same sample, those amplicons may be mixed 
    together in a PTP region.  The project setup allows different amplicons to 
    be associated with different samples, so it is also possible to multiplex 
    reads from different samples, provided the samples are constructed such 
    that each sample comprises reads from different amplicons.
 
    However, if a user wants to sequence reads from different samples but the 
    same amplicons, the sequence specific primer information for the amplicons 
    is no longer sufficient for demultiplexing the reads to their appropriate 
    samples.  To allow multiplexing of samples with the same amplicon in a PTP
    region, the Multiplex Identifier (MID) approach is supported, in which
    bases are added adjacent to the sequence specific primer in order to label
    an amplicon's sample.
 
    MIDs are technically part of the amplicon primer, but if they were encoded
    as such in a project, the user would have to enter as many versions of an 
    amplicon as there are samples to be demultiplexed in a given region.  For 
    simplicity, the AVA software allows the specification of amplicons in a
    manner that is independent of whether MIDs are employed, and provides a
    separate 'multiplexer' formalism that describes the MID to sample
    relationships.  The AVA software automatically combines, for the user, the
    MIDs of a multiplexer with the primers of an amplicon and applies the
    multiplexer's MID-sample relationships to determine the sample to which a
    given read belongs.  This facilitates project setup since multiple 
    amplicons can share the same MID to sample relationship information,
    with that information being defined just once in a single multiplexer.  
 
    This also allows the MID specification (encapsulated in the multiplexer) to
    be shared across multiple read data, in the event that the MID-sample 
    relationships are replicated in more than one read data of the experiment.
 
    The use of multiplexers provides the following benefits: 
        1) Separation of amplicon specification from the complexities of MIDs
        2) The sharing of MID to sample relationships across multiple amplicons 
        3) The sharing of such information across multiple read data
 
    The 'associate' command provides the ability to define both MID-based and
    non-MID-based multiplexing relationships.  Run 'help associate' for more
    details on how to create these multiplexing relationships.  For more
    information on creating multiplexers, and their associated constituents,
    run 'help create multiplexer', 'help create mid', and 
    'help create midGroup'.
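
    To make the multiplexer idea concrete, the Python sketch below assigns a
    read to a sample by combining each MID of a multiplexer with an
    amplicon's primer.  It is a deliberately naive illustration and not the
    AVA software's algorithm.  It assumes that the MID immediately precedes
    the primer at the start of the read and that matching is exact.  The
    function name and all sequences are hypothetical.

```python
from typing import Dict, Optional


def assign_sample(read: str, primer: str,
                  mid_to_sample: Dict[str, str]) -> Optional[str]:
    """Return the sample whose MID + primer starts the read, or None.

    'mid_to_sample' plays the role of a multiplexer: it is defined once and
    can be reused for any amplicon primer and any read data.
    """
    read, primer = read.upper(), primer.upper()
    for mid, sample in mid_to_sample.items():
        if read.startswith(mid.upper() + primer):
            return sample
    return None


# One multiplexer (illustrative MID sequences) shared by all amplicons.
multiplexer = {"ACGAGTGCGT": "Sam1", "ACGCTCGACA": "Sam2"}
primer = "TGCAGACAGC"                     # illustrative amplicon primer
read = "ACGCTCGACA" + primer + "TTTTGGGG"
print(assign_sample(read, primer, multiplexer))
```

    Because the MID to sample mapping lives in one place, the same
    'multiplexer' dictionary can be passed with a different primer for each
    amplicon, mirroring the separation of concerns described above.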