|
3.3.2
|
Find help on general use of the command interpreter in the sections below.
Run 'help general <subsection>'.
commandLine       Information about the command line arguments to start
                  the command interpreter itself.
parsing           Information about how commands are parsed.
filePaths         Information about how file paths are interpreted.
tabularCommands   Information about using tables to succinctly construct
                  commands.
recordNames       Information about record naming for the command line
                  interpreter.
abbreviations     Information about abbreviations for options and
                  commands that can be used throughout the command line
                  interpreter.
multiplexing      Information about how the GS Amplicon Variant Analyzer
                  software supports multiplexing of amplicons and/or
                  samples within a single read data set.
|
3.3.2.1
|
doAmplicon [<advanced options>]
[<files>] [-onErrors <"stop" or "continue">]
[-i[nteractive]]
[-v[erbose]]
[-c[ommand] <command>]
[-p[roject] <project path>]
[-h[elp]]
[-a[bout]]
Runs the command interpreter. If no '<files>' are given, the interpreter
reads from standard input for its commands. If one or more files are
given, each file is executed in order. If "-" is encountered as one of
the file arguments, standard input is read for commands at that position.
For example:
doAmplicon preamble.ava -
will execute the preamble and start reading commands from standard input.
The '-onErrors' option sets the value of the 'onErrors' parameter. If
'onErrors' is set to "stop", the command interpreter will exit if an error
is encountered. If 'onErrors' is set to "continue", the command
interpreter will abort the command that caused the error but will continue
running and executing subsequent commands.
The '-interactive' option indicates that the interpreter is being used
interactively. A prompt is written to standard output and some commands
attempt to interact with the user for further input when necessary.
If neither '<files>' nor the '-command' option is given, the interpreter
implicitly enters interactive mode, even if '-interactive' is not
specified. If "-" is supplied as one of the '<files>' arguments, then the
interpreter will read from standard input, but will not implicitly enter
interactive mode. Thus, one syntax allows the interpreter to be used
as an interactive command line interface while the other facilitates the
creation of automated pipelined scripts, as in:
generateScript | doAmplicon - > resultFile
Unless explicitly given an '-onErrors' option value, the interpreter, in
interactive mode, behaves as if 'onErrors' were set to "continue" and, in
non-interactive mode, behaves as if 'onErrors' were set to "stop".
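The rules above for interactive mode and the 'onErrors' default can be
summarized in a short Python sketch. This is only a model of the behavior
described in this help text, not the interpreter's own code; the function
name 'resolve_mode' and its parameters are hypothetical.

```python
def resolve_mode(files, command=None, interactive_flag=False, on_errors=None):
    """Return (interactive, on_errors) following the rules described above.

    files            -- list of file arguments; "-" stands for standard input
    command          -- value of '-command', or None if the option is absent
    interactive_flag -- True if '-interactive' was given
    on_errors        -- "stop", "continue", or None if '-onErrors' is absent
    """
    # Interactive mode is implied only when there are no files and no -command.
    # A "-" file argument counts as a file, so it does not imply interactivity.
    interactive = interactive_flag or (not files and command is None)

    # Without an explicit -onErrors value, the default depends on the mode.
    if on_errors is None:
        on_errors = "continue" if interactive else "stop"
    return interactive, on_errors
```

Under this model, 'doAmplicon' with no arguments is interactive and
continues after errors, while 'generateScript | doAmplicon -' is
non-interactive and stops at the first error.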
The '-verbose' option will cause the interpreter to output information
about the commands that it is executing as it executes them.
The '-command' option can be used to execute a single command in the
interpreter. For example, if you want to create an empty project, you
would execute:
doAmplicon -command "create project /data/new/project/path"
The '-project' option can be used to open a project before executing the
rest of the specified commands. For example, you may have a script that
removes all of the variants in a project. The '-project' option lets you
choose which project the script runs on:
doAmplicon -project "/some/project" myRemoveVariants.ava
Combined with the "-command" option, this can be used to execute single
commands on a project. For example:
doAmplicon -project "/some/project" -command "list amplicon"
This command will list all of the amplicons of the project at
"/some/project".
The '-project' option attempts to open the project with exclusive control
and will fail if another instance of the program has control of the
project. To preempt control of the project, or to open it in a read-only
fashion, use the 'open' command from within the interpreter itself.
The '-help' option displays this help. Online help for the interpreter
commands is available by entering the "help" command to the interpreter
itself.
The '-about' option displays version information about the interpreter.
The <advanced options>, if provided, must all precede any of the files
or other basic options on the command line and may be one or more of:
--maxPerm     A number indicating, in megabytes, the maximum amount
              of "PermGen" memory that doAmplicon's Java environment
              may use. (Default = 128)
--maxHeap     A number indicating, in megabytes, the maximum amount
              of "Heap" memory that doAmplicon's Java environment
              may use. (Default = 500)
--cpu         A number indicating the number of processes that
              doAmplicon may use to parallelize computations.
              (Default = 1)
--configDir   The location of configuration files used by
              doAmplicon.
              (Default = /opt/454/apps/amplicons/config/)
Note that all the advanced options are preceded by two dashes, unlike
the basic options that are preceded by only one. Normally the default
values of --maxPerm and --maxHeap, which are 128 and 500 megabytes,
respectively, are sufficient. If doAmplicon's underlying Java environment
runs out of memory, a message will be displayed indicating which
parameter needs adjusting. Doubling the default values will typically
resolve any memory issues. The --cpu option, which defaults to 1,
defines the number of parallel processes that may be used during the
Trimming and Alignment steps performed when computing a project via
the command 'computation start'. Due to the memory and CPU resource
requirements of the Trimming and Alignment steps, the --cpu value
generally should not exceed the number of actual processors on the local
machine, as all the processes will be run on the local machine (i.e.,
not spread across a cluster). If the amount of memory on the local machine
is limited, then it is advisable to limit the --cpu value, because the
parallelized steps will compete for memory, which may lead to excessive
swapping and degrade the responsiveness of the local machine. The memory
used by the Trimming and Alignment steps is in addition to that used by
the Java environment as set by the --maxPerm and --maxHeap parameters.
If a --cpu value of 0 (zero) is supplied, then all the processors on the
local machine will be used.
The --configDir option forces doAmplicon to use a configuration directory
other than the default.
Example usage:
Usage: gsAmplicon [--maxHeap <number>] [--maxPerm <number>] [--configDir
<directory>] [--cpu <number>]
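The ordering rule for advanced options, and the meaning of a --cpu value
of 0, can be sketched in Python as follows. This is a hedged illustration
for scripts that launch doAmplicon; the helper names 'build_argv' and
'effective_cpu' are hypothetical and are not part of the software.

```python
import os


def build_argv(advanced=None, basic=None, files=None):
    """Assemble a doAmplicon argument list with advanced options first.

    advanced -- dict such as {"maxHeap": 1000}; emitted as '--maxHeap 1000'
    basic    -- list of basic (single-dash) options and their values
    files    -- list of script files ("-" for standard input)
    """
    argv = ["doAmplicon"]
    # Advanced (double-dash) options must precede everything else.
    for name, value in (advanced or {}).items():
        argv += ["--" + name, str(value)]
    argv += list(basic or [])
    argv += list(files or [])
    return argv


def effective_cpu(cpu=1):
    """Process count implied by a --cpu value; 0 means all local processors."""
    return (os.cpu_count() or 1) if cpu == 0 else cpu
```

For example, doubling both memory defaults and using two processes gives:
build_argv({"maxHeap": 1000, "maxPerm": 256, "cpu": 2},
["-project", "/some/project"], ["myRemoveVariants.ava"]).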
|
3.3.2.2
|
The interpreter is case insensitive with respect to its commands and
options. For example, consider the two commands below:
create amplicon Amp1
CREATE AMPLICON Amp1
These commands are equivalent. Note, however, that all strings that are
part of the project itself are case sensitive. For example, consider the
two commands below:
create amplicon Amp1
create amplicon AMP1
These commands are not equivalent, since record names are case sensitive.
The character '#' may be used to document your scripts. The '#' may
appear anywhere on the line, and everything from the '#' until the end
of the line is ignored. For example:
# The next command line lists the Variants of the project
list variant
list amplicon # and this command lists the Amplicons
To use an argument that contains spaces or the comment character, surround
the argument with double quotes. For example, you can set an annotation
of an amplicon to an unusual string by running the following:
update amplicon Amp1 -annotation "My unusual string with a
comment # character and even line breaks
"
If you need a double quote in your argument, "escape" it with a preceding
backslash. For example:
update amplicon Amp1 -annotation "The \"Best\" Amplicon"
Inside of double quotes, the backslash (except when preceding a double
quote) and new line characters are treated literally. Thus, in the
example:
update amplicon Amp1 -annotation "Testing 1 \
2 3"
both the backslash and new line will become part of the annotation.
Outside of double quotes, the backslash character can be used to make
any single character ordinary, avoiding the need to use double quotes.
Thus, the following two commands are equivalent:
create amplicon Amp\#1
create amplicon "Amp#1"
Note that without the '\' or surrounding quotes, the #1 in both of the
commands above would have been treated as a comment, and an amplicon
simply named "Amp" would have been created.
Outside of double quotes, the backslash character can also be used
for line continuation, allowing you to split a command over multiple
lines. A backslash immediately followed by a new line will join
the following line to the current line. This allows you to format
commands nicely. For example:
update amplicon Amp1 \
-annotation "The best amplicon" \
-reference "ref1"
is equivalent to the single line command:
update amplicon Amp1 -annotation "The best amplicon" -reference "ref1"
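The quoting, comment, and continuation rules described in this section can
be modeled with a small Python tokenizer. This is a sketch of the rules as
stated here, not the interpreter's actual parser; the function name
'parse_commands' is hypothetical, case insensitivity of command words is
not modeled, and it is assumed that a line continuation simply drops the
backslash and the new line.

```python
def parse_commands(script):
    """Split script text into commands, each a list of argument strings."""
    commands, args, cur = [], [], []
    in_token = False
    i, n = 0, len(script)

    def end_token():
        nonlocal in_token
        if in_token:
            args.append("".join(cur))
            cur.clear()
            in_token = False

    def end_command():
        end_token()
        if args:
            commands.append(list(args))
            args.clear()

    while i < n:
        ch = script[i]
        if ch == '"':
            # Inside quotes everything is literal, except \" which yields ".
            in_token = True
            i += 1
            while i < n and script[i] != '"':
                if script[i] == "\\" and i + 1 < n and script[i + 1] == '"':
                    cur.append('"')
                    i += 2
                else:
                    cur.append(script[i])
                    i += 1
            i += 1  # skip the closing quote
        elif ch == "\\" and i + 1 < n:
            if script[i + 1] != "\n":
                # Backslash makes the next character ordinary.
                cur.append(script[i + 1])
                in_token = True
            # Backslash followed by a new line joins the two lines.
            i += 2
        elif ch == "#":
            # Comment: ignore everything up to the end of the line.
            while i < n and script[i] != "\n":
                i += 1
        elif ch == "\n":
            end_command()
            i += 1
        elif ch.isspace():
            end_token()
            i += 1
        else:
            cur.append(ch)
            in_token = True
            i += 1
    end_command()
    return commands
```

With this model, 'create amplicon Amp\#1' and 'create amplicon "Amp#1"'
both yield the arguments create, amplicon, Amp#1, while the unquoted
'create amplicon Amp#1' yields create, amplicon, Amp.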
|
3.3.2.3
|
To facilitate high-throughput project setup and modification, it is
possible to run commands with tables of data as input. The table column
headers are simply the options of the command that is to be run, but with
the leading "-" removed. As with the command options themselves, the
column headers are case insensitive. The tabular data may be supplied
from an external file, or from a table embedded in the command script
itself, using tab or comma separated value formats.
For example, suppose you need to add 100 amplicons to a project. Instead
of adding them one by one with 'create amplicon' commands, you can issue
a single 'create amplicon' with a table as input. For example:
create amplicon -file - << end_marker
Name Reference
Amp1 Ref1
Amp2 Ref2
Amp3 Ref3
Amp4 Ref4
Amp5 Ref5
Amp6 Ref6
Amp7 Ref7
Amp8 Ref8
end_marker
This command will create 8 amplicons when run. Let us examine each
element of this invocation. First, the 'create amplicon' indicates that
we are creating amplicons. The '-file - <<' option indicates that we are
going to be supplying a table in the form of a "Here" document. A "Here"
document is essentially a document supplied to the command that can be
specified in place. The 'end_marker' indicates that we are creating a
here document that terminates when 'end_marker' is seen by itself on a
line.
The document itself must be a tab-separated table whose first row indicates
what option each column represents. Thus, when the second line of our
table is executed, it is precisely the same as if we were to have written:
create amplicon -name "Amp1" -reference "Ref1"
In fact, our table command is the same as executing the following:
create amplicon -name "Amp1" -reference "Ref1"
create amplicon -name "Amp2" -reference "Ref2"
create amplicon -name "Amp3" -reference "Ref3"
create amplicon -name "Amp4" -reference "Ref4"
create amplicon -name "Amp5" -reference "Ref5"
create amplicon -name "Amp6" -reference "Ref6"
create amplicon -name "Amp7" -reference "Ref7"
create amplicon -name "Amp8" -reference "Ref8"
However, it is much more succinct in table form.
This works for any command that takes a '-file' option. For example:
update reference -annotation "Updated 2/12/07" -file - << end
Name Sequence
Ref1 ATAGCAGATAGATAATATATAAAAAAGACGAT
Ref2 ATAGCAGATATAGATAGTGATGCAGTATAGACAGTAAGATAGACAG
Ref3 ATGAATAAAAAATCCCCCCCTAGTAGTACTTTTTTAAAAATA
Ref4 TGACGAAACATAGTGTAAACGTGTGCAGACAGCCCAC
Ref5 GCAGACGATAAAAAAATGATGACGACGTAATACAATAT
Ref6 GACGCATTTTTTTTAGATATACTATATATT
Ref7 TATAATAAAAATTATATCGGGATAGTAGTGCAGAGAGAGAGTAGTAGCAC
Ref8 TACGACATATAGATGATAGACAAATAACAGATAGTAGTAGTAGAAGT
end
This time we are updating references rather than creating amplicons. You
will also note that we specified an annotation in the main command and not
in the here document. Options specified in this manner are applied to
each row of the command. Our table command is the same as executing the
following:
update reference -annotation "Updated 2/12/07" -name Ref1 -sequence ..
update reference -annotation "Updated 2/12/07" -name Ref2 -sequence ..
update reference -annotation "Updated 2/12/07" -name Ref3 -sequence ..
update reference -annotation "Updated 2/12/07" -name Ref4 -sequence ..
update reference -annotation "Updated 2/12/07" -name Ref5 -sequence ..
update reference -annotation "Updated 2/12/07" -name Ref6 -sequence ..
update reference -annotation "Updated 2/12/07" -name Ref7 -sequence ..
update reference -annotation "Updated 2/12/07" -name Ref8 -sequence ..
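The way a table expands into one command per row can be illustrated with
a short Python sketch. It only mimics the expansion described above and is
not how the interpreter is implemented; the function 'expand_table' and
its parameters are hypothetical. Python's csv module stands in for the
table reader, and rows shorter than the header are assumed to be padded
with empty values.

```python
import csv
import io


def expand_table(command, table_text, fixed_options="", fmt="tsv"):
    """Return the single-row commands equivalent to a tabular command.

    command       -- the command words, e.g. "create amplicon"
    table_text    -- the table, with the option names in the first row
    fixed_options -- options given on the main command, applied to each row
    fmt           -- "tsv" (tab separated) or "csv" (comma separated)
    """
    delimiter = "," if fmt == "csv" else "\t"
    reader = csv.reader(io.StringIO(table_text), delimiter=delimiter)
    rows = [row for row in reader if row]  # skip blank lines
    header, data = rows[0], rows[1:]

    expanded = []
    for row in data:
        # Assumption: missing trailing cells are treated as empty values.
        cells = row + [""] * (len(header) - len(row))
        parts = [command]
        if fixed_options:
            parts.append(fixed_options)
        for name, value in zip(header, cells):
            # A column header is the option name without its leading "-".
            option = "-" + name[0].lower() + name[1:]
            quoted = value.replace('"', '\\"')
            parts.append(f'{option} "{quoted}"')
        expanded.append(" ".join(parts))
    return expanded
```

Calling expand_table("create amplicon", "Name\tReference\nAmp1\tRef1\n")
produces the single command: create amplicon -name "Amp1" -reference "Ref1".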
Instead of using here documents, external files can be supplied using the
'-file' option. For example:
create variant -file /data/variants.txt
In the previous examples, we specified the table in place using a here
document. Here, we refer to the external file, "/data/variants.txt". The
format of the external file is expected to be exactly the same as that of
the here document (without the need for an end marker, however).
So "/data/variants.txt" could look like this:
<begin /data/variants.txt>
Name Reference Status
Var1 Ref1 Accepted
Var2 Ref1 Accepted
Var3 Ref1 Accepted
Var4 Ref1 Rejected
Var5 Ref1 Rejected
Var6 Ref1 Accepted
<end /data/variants.txt>
You will also note in this example that the rows do not line up exactly.
This is because we always expect one tab character to separate each column,
regardless of the size of the data in the column.
If you prefer comma-separated columns, use the '-format' option. For
example:
update readData -file - -format csv << end
Name,Active
Data1,true
Data2,true
Data3,false
Data4,true
Data5,false
end
Note the '-format csv' option. Valid values are "csv" and "tsv" to
indicate comma-separated and tab-separated table formats, respectively.
The default is "tsv", except when a file is provided with a ".csv"
extension (such as those exported from Excel), in which case "csv" is
assumed.
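The choice of table format can be summarized in a few lines of Python.
This is a sketch of the rule as described, with a hypothetical function
name; it assumes that an explicit '-format' value takes precedence over
the file extension, which this help text does not state.

```python
def table_format(file_arg, fmt=None):
    """Return "csv" or "tsv" for a '-file' argument and optional '-format'."""
    if fmt is not None:
        if fmt not in ("csv", "tsv"):
            raise ValueError('format must be "csv" or "tsv"')
        # Assumption: an explicit -format overrides the file extension.
        return fmt
    # Default is tab separated unless the file name ends in ".csv".
    return "csv" if file_arg.endswith(".csv") else "tsv"
```

Under this model, a here document ('-file -') defaults to "tsv", which is
why the comma-separated example needs '-format csv'.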
It is also important to note that empty cells are not omitted from the
arguments. For example:
update variant -file - << end
Name Reference
Var1 Ref1
Var2
Var3
Var4 Ref4
Var5
Var6 Ref6
Var7 Ref7
Var8 Ref8
end
Executing this command will make variants "Var2", "Var3", and "Var5" refer
to no reference sequence.
Finally, note that the parsed table values are what are used to supply
values to the command arguments, as opposed to the literal table text
itself. This means that the table contents must follow the syntactic
conventions of tab and comma separated values tables, not that of the
command interpreter. In particular, this means that neither the
interpreter's comment character '#' nor the special '\' constructs have
any special meaning inside of tables. Similarly, the table format's own
convention for embedding double quotes must be followed. In a command
line argument, a '"' is embedded by escaping it with a backslash:
"This is how \"double quotes\" are embedded for the interface"
In a table, the double-double quote convention must be used instead:
"Tables use ""double-double quotes"" to embed a quote character"
For more information on the interpreter's parsing of commands and special
characters, run 'help general parsing'.
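The difference between table syntax and command syntax can be seen by
reading a small table with Python's csv module, which follows the same
double-double quote convention. This is only an illustration using a
general-purpose reader; the interpreter's own table reader may differ in
details, and the function name 'parse_table' is hypothetical.

```python
import csv
import io


def parse_table(text, fmt="tsv"):
    """Parse table text into a list of rows (each a list of cell values)."""
    delimiter = "," if fmt == "csv" else "\t"
    return list(csv.reader(io.StringIO(text, newline=""), delimiter=delimiter))


# A doubled quote inside a quoted cell yields one quote character, while
# '#' and '\' are ordinary characters with no special meaning.
example = (
    "Name,Annotation\n"
    'Amp1,"Tables use ""double-double quotes"" to embed a quote character"\n'
    "Amp2,not a # comment \\ here\n"
)
rows = parse_table(example, fmt="csv")
```

Here the Annotation cell of the Amp1 row contains single double quote
characters around "double-double quotes", and the Amp2 row keeps its '#'
and '\' characters unchanged.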
|
3.3.2.4
|
The command line interpreter primarily uses record names to identify and
distinguish records. Duplicate record names lead to ambiguity that the
interpreter cannot resolve in most cases. For example, it is technically
allowed for two reference sequences to have the same name, "Ref1". If we
want to update one of these references, we issue the command:
update reference "Ref1" -annotation "New annotation"
This will report an error, since the interpreter cannot discern which
record to update. It is therefore recommended that unique names be used
for records.
There are exceptions to this rule. Amplicons and variants can be
disambiguated by their reference sequences if duplicate names are found.
For example, if you have two amplicons named "Amp1", but one of them refers
to reference "Ref1" and the other to "Ref2", the '-ofRef' option of
commands dealing with amplicons can be used to disambiguate them. For
example:
update amplicon "Amp1" -annotation "New annotation"
This will result in an error, since there are two amplicons named "Amp1".
However, consider this command:
update amplicon "Amp1" -ofRef "Ref1" -annotation "New annotation"
This is allowed because the '-ofRef' option has been used to determine
which amplicon to update. This can be used in other commands as well:
associate -sample "Sam1" -amplicon "Amp1" -ofRef "Ref1"
Again, we are distinguishing between the duplicately named amplicons by
using the '-ofRef' option.
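The name lookup and the effect of '-ofRef' can be modeled in Python as
follows. This is a simplified sketch with hypothetical names
('find_amplicon', 'AmbiguousNameError') and a made-up record layout; it is
not the interpreter's implementation.

```python
class AmbiguousNameError(Exception):
    """Raised when a name matches more than one record."""


def find_amplicon(amplicons, name, of_ref=None):
    """Return the single amplicon matching name (and of_ref, if given).

    amplicons -- list of dicts with "name" and "reference" keys
    """
    # Record names are case sensitive, so compare them exactly.
    matches = [
        amp for amp in amplicons
        if amp["name"] == name
        and (of_ref is None or amp["reference"] == of_ref)
    ]
    if not matches:
        raise KeyError(f"no amplicon named {name!r}")
    if len(matches) > 1:
        raise AmbiguousNameError(f"{len(matches)} amplicons named {name!r}")
    return matches[0]
```

With two amplicons named "Amp1" that refer to "Ref1" and "Ref2", a lookup
by name alone raises AmbiguousNameError, while adding of_ref="Ref1"
selects exactly one record.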
The 'utility validateNames' command is provided to help determine whether
your project has any such ambiguity and, if so, to help correct it. Type
'help utility validateNames' for more information.
|
3.3.2.5
|
Many commands and options can be abbreviated. For example:
create amp Amp1
This command is the same as:
create amplicon Amp1
Such abbreviations are noted in the help documentation. For example,
the documentation for 'create amplicon' specifies:
create amp[licon]
to indicate that it can be abbreviated as such. This also goes for
some options. For example:
assoc -sam "Sam1" -amp "Amp1"
This is the same as:
associate -sample "Sam1" -amplicon "Amp1"
The option abbreviations are also similarly noted in the help
documentation.
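The bracket notation used in the help documentation can be read
mechanically, as in the Python sketch below. The function names are
hypothetical, and the sketch assumes that any prefix between the shortest
form and the full word is accepted (for example "ampl"); this help text
only shows the shortest and the full forms.

```python
def parse_spec(spec):
    """Split a spec such as 'amp[licon]' into ('amp', 'amplicon')."""
    if "[" in spec:
        required, optional = spec.rstrip("]").split("[", 1)
        return required, required + optional
    return spec, spec


def matches(spec, word):
    """Return True if word is an accepted spelling of spec."""
    # Commands and options are case insensitive.
    required, full = parse_spec(spec.lower())
    word = word.lower()
    # Assumption: every prefix from the required part up to the full
    # word is accepted.
    return word.startswith(required) and full.startswith(word)
```

For example, matches("amp[licon]", "amp") and
matches("-i[nteractive]", "-interactive") are both true, while
matches("amp[licon]", "am") is false.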
|
3.3.2.6
|
File paths are used in commands to specify projects, script files, tabular
data, and, more generally, the location of input and output files. For
example, records can be listed to a file:
list amplicon -outputFile someFile.txt
or other scripts can be executed:
utility execute someOtherScript.ava
In these examples, relative paths (i.e., paths that don't start with a '/')
specify the files to use. These paths are considered relative to the
interpreter's current directory (currDir), which may be set with the
'set currDir' command.
When the interpreter starts, the currDir is initially set to the directory
in which the interpreter was invoked. For example, if the current working
directory is /home/me/projects when doAmplicon is invoked, the initial
currDir will be /home/me/projects. In this situation, for the example
above, the relative path someFile.txt would be resolved to the absolute
path /home/me/projects/someFile.txt.
If 'set currDir' is used, the file resolution will change. For example:
set currDir /some/other/directory
list amplicon -outputFile someFile.txt
Now the relative path someFile.txt will be resolved to the absolute path
/some/other/directory/someFile.txt.
A few special path prefix shortcuts, denoted with a leading '%', are also
available to make specifying files easier. The first of these, currDir,
has already been described. This may be used to explicitly specify the
currDir in a path, but is entirely equivalent to the default interpretation
of relative paths. For example, "%currDir/someFile.txt" and "someFile.txt"
will refer to the same file.
There is also a special path prefix shortcut to access the user's home
directory. For example, if the user's home directory is /home/me, the path
"%homeDir/someFile.txt" will be resolved to the absolute path
/home/me/someFile.txt.
Finally, there is a special path prefix shortcut, libDir, to access a
system library path that is set up as part of installation of the software.
This provides access to a standard library that may be modified by the site
administrator.
Path prefixes are only recognized when they prefix the path and match a
known shortcut. For example, suppose the values of the shortcuts are as
follows:
currDir=/some/dir
homeDir=/home/me
libDir=/opt/454/apps/amplicons/config/lib
Only paths starting with "%currDir", "%homeDir", or "%libDir",
respectively, will be affected by shortcuts. Here are some example
path specifications with their shortcut-expanded versions:
%currDir/someFile.txt => /some/dir/someFile.txt
%homeDir/someFile.txt => /home/me/someFile.txt
%libDir/someFile.txt =>
/opt/454/apps/amplicons/config/lib/someFile.txt
someFile.txt => /some/dir/someFile.txt
%otherDir/someFile.txt => /some/dir/%otherDir/someFile.txt
data/%currDir/someFile.txt => /some/dir/data/%currDir/someFile.txt
The last example does not expand the %currDir shortcut because it
does not appear at the beginning of the path specification. The second to
last example interprets '%otherDir' literally, and resolves the given path
relative to the currDir value of /some/dir, because %otherDir is not one of
the defined shortcuts.
Absolute paths (i.e., paths that begin with '/') may also be used. Such
paths are entirely unaffected by the currDir and by shortcuts.
To see the values of the shortcut prefixes, use the 'show environment'
command.
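The path resolution rules of this section can be expressed as a short
Python function. It is a sketch of the rules as described, not the
interpreter's code; the function name 'resolve_path' is hypothetical, and
shortcut names are assumed to be matched exactly as written.

```python
import posixpath


def resolve_path(path, shortcuts):
    """Resolve a path using shortcuts such as currDir, homeDir and libDir.

    shortcuts -- dict mapping shortcut names to absolute directories;
                 it must contain at least "currDir"
    """
    # Absolute paths are unaffected by currDir and by shortcuts.
    if path.startswith("/"):
        return path
    # A shortcut is only recognized as the first component of the path.
    if path.startswith("%"):
        head, sep, rest = path[1:].partition("/")
        if head in shortcuts:
            return posixpath.join(shortcuts[head], rest) if sep else shortcuts[head]
    # Everything else, including unknown '%' names, is relative to currDir.
    return posixpath.join(shortcuts["currDir"], path)
```

Using the shortcut values listed above, resolve_path reproduces each of
the example expansions, including the two cases in which '%otherDir' and a
non-leading '%currDir' are kept literally.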
|
3.3.2.7
|
The GS Amplicon Variant Analyzer (AVA) software provides a number of
mechanisms for multiplexing reads, allowing multiple amplicons from
the same or different samples to be simultaneously sequenced within
a PTP region.
The simplest demultiplexing method relies on the sequence specific primer
regions of the amplicons. If an experiment calls for measuring multiple
distinct amplicons from the same sample, those amplicons may be mixed
together in a PTP region. The project setup allows different amplicons to
be associated with different samples, so it is also possible to multiplex
reads from different samples, provided that the samples are constructed
such that each sample consists of reads from different amplicons.
However, if a user wants to sequence reads from different samples but the
same amplicons, the sequence specific primer information for the amplicons
is no longer sufficient for demultiplexing the reads to their appropriate
samples. To allow multiplexing of samples with the same amplicon in a PTP
region, the Multiplex Identifier (MID) approach is supported, in which
bases are added adjacent to the sequence specific primer in order to label
an amplicon's sample.
MIDs are technically part of the amplicon primer, but if they were encoded
as such in a project, the user would have to enter as many versions of an
amplicon as there are samples to be demultiplexed in a given region. For
simplicity, the AVA software allows the specification of amplicons in a
manner that is independent of whether MIDs are employed, and provides a
separate 'multiplexer' formalism that describes the MID to sample
relationships. The AVA software automatically combines, for the user, the
MIDs of a multiplexer with the primers of an amplicon and applies the
multiplexer's MID-sample relationships to determine the sample to which a
given read belongs. This facilitates project setup since multiple
amplicons can share the same MID to sample relationship information,
with that information being defined just once in a single multiplexer.
This also allows the MID specification (encapsulated in the multiplexer) to
be shared across multiple read data, in the event that the MID-sample
relationships are replicated in more than one read data of the experiment.
The use of multiplexers provides the following benefits:
1) Separation of amplicon specification from the complexities of MIDs
2) The sharing of MID to sample relationships across multiple amplicons
3) The sharing of such information across multiple read data
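The idea of combining a multiplexer's MIDs with amplicon primers can be
illustrated with a deliberately simplified Python sketch. It assumes that
a read begins with the MID immediately followed by the amplicon's primer
and uses exact matching only; the AVA software's actual demultiplexing is
not described in this help text and is certainly more involved. The
function name and all sequences are hypothetical.

```python
def assign_read(read, multiplexer, amplicon_primers):
    """Return (sample, amplicon) for a read, or None if nothing matches.

    multiplexer      -- dict mapping MID sequence to sample name
    amplicon_primers -- dict mapping amplicon name to primer sequence
    """
    # The amplicons are defined once, without MIDs; the multiplexer
    # supplies the MID to sample relationships shared by all of them.
    for mid, sample in multiplexer.items():
        for amplicon, primer in amplicon_primers.items():
            if read.startswith(mid + primer):
                return sample, amplicon
    return None


# Hypothetical MIDs and primers, for illustration only.
multiplexer = {"ACGAG": "Sam1", "TGCTC": "Sam2"}
amplicon_primers = {"Amp1": "GATTACA", "Amp2": "CCTTGGA"}
```

In this model two MIDs and two amplicons cover four sample and amplicon
combinations without any amplicon being entered more than once.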
The 'associate' command provides the ability to define both MID-based and
non-MID-based multiplexing relationships. Run 'help associate' for more
details on how to create these multiplexing relationships. For more
information on creating multiplexers, and their associated constituents,
run 'help create multiplexer', 'help create mid', and
'help create midGroup'.