3.4.12
 
    report <report type> <other arguments>
 
    The 'report' command is used to generate reports about the currently open
    project.  The type of report is determined by the '<report type>' argument.
    The '<other arguments>' are determined by the report type.
 
    The following report types are available.  Run
    'help report <report type>' for more detailed information.
 
    alignment       The alignments in the currently open project.
    variantHits     The variant hits in the currently open project.
 
3.4.12.1
report alignment
    
    report align[ment] 
           -sam[ple] <sample name> 
           -ref[erence] <reference sequence name>
           [-readT[ype] <"con[sensus]" or "ind[ividual]">]  
           [-start <reference start position>] [-end <reference end position>] 
           [-mar[gin] <size>]
           [-wrap[pingWidth] <width>]
           [-makeDir[ectory] <"all", "last" or "none">]
           [-outputFor[mat] <"fasta", "clustal", "ace", "sam", "bam",
                             "table" [-tableOutputFormat <tsv|csv>]> ]
           [-outputDir[ectory] <directory path>] 
           [ [-outputFile <file>] | 
             [ [-outputPre[fix] <prefix>] 
               [-outputSuf[fix] <suffix>] 
               [-mappingFile <file>]   ] ]
           [-annot[ationFileSuffix] <suffix>]
           [-fileFilter <"all", "linux", "mac", or "windows">]
           [-file <file> [-format <format>]]
           [<amplicon name 1> <amplicon name 2> ...]
 
 
    The 'report alignment' command outputs sequence alignments in one of
    several formats.  FASTA format is the default, but Clustal, Ace, SAM, BAM 
    and Table may also be specified using the -outputFormat parameter.
    
    Values for the '-sample' and '-reference' parameters are required.  If
    they name a sample and a reference sequence for which an alignment has
    been computed in the project, then the corresponding alignment will be
    output.  If no '-outputFile' option is given, the
    alignment is printed to the standard output of the interpreter.  An output
    file of "-" has the same effect.  If an output file is given, the alignment
    is written to that file.  Run 'help general filePaths' for more information 
    about specifying files.
    
    Alternatively, either or both of the '-sample' or '-reference' parameters
    may be specified as the (wildcard) character '*', in which case all
    alignments that have been computed in the project for the indicated
    combination of samples and reference sequences will be output.  When using
    this form of the command, multiple alignments will typically be produced,
    and so the output cannot be sent to standard output and the '-outputFile'
    parameter cannot be used.  As explained below, the alignments are written
    to files in a directory structure according to a file naming convention
    that can be customized using the '-outputPrefix' and '-outputSuffix'
    parameters.
        
    Using the '-file' parameter, one or more of the parameter values may be
    supplied from tabular input.  Run 'help general tabularCommands' for
    information about the '-file' option.
    
    The remaining parameters are described below, grouped by their use in
    specifying the alignment region to output, formatting the alignment, and
    determining where the output is to be written.
 
    ALIGNMENT TYPE AND REGION PARAMETERS:
 
    The '-readType' parameter specifies the type of read to include in the
    alignment, and may be either "consensus" (the default if '-readType' is not
    used) or "individual".
   
    By default, the alignment output includes the target sequence regions of
    all the amplicons for which there are computed alignment data for the given
    '-sample' and '-reference' values.  An optional, space separated, list of
    amplicon names may be provided to restrict the alignment output to the
    target sequence neighborhoods of those specific amplicons.  The amplicon
    names are interpreted relative to the given '-reference' value, and thus
    this amplicon filtering ability is typically only useful if a non-wildcard
    '-reference' value is supplied.
 
    The '-start' and '-end' parameters may be used to precisely define
    (in 1-based reference sequence positions) the bounds for the reads in the
    alignment.
    
    If '-start' and/or '-end' positions are specified along with a list of
    specific amplicons (or all amplicons for the reference sequence, if a
    specific list is not supplied), the alignment output will be restricted to
    the region of reference base positions that constitutes the (smallest)
    intersection of all the specifications.
    
    Bases of reads that extend outside the specified alignment region will be
    trimmed from the output, and reads that align within these positions will
    be padded on either side, as applicable, with gap characters ('-').  Reads 
    whose alignments have no overlap with the specified alignment region will
    not be included in the output at all.
    
    FORMATTING PARAMETERS:
 
    The '-margin' parameter specifies a number of additional reference bases 
    to include on either side of the alignment region (as determined by the
    amplicons, '-start' and '-end' parameters described above).  The bases
    of the reads in the alignment will still be trimmed to the specified 
    alignment region, but the reference sequence, which is output as the first
    sequence of the alignment output, will include the additional contextual
    bases. Under these reference positions, the read alignments will be padded
    with the gap character ('-').  If not specified, the default margin is
    0 (zero).
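
    The trimming and padding rules above can be sketched in Python.  This is
    an illustrative sketch only, not the AVA implementation: the function name
    and arguments are hypothetical, and it assumes one alignment column per
    reference position (no insertion columns) and that the full margin is
    available on both sides of the region.

```python
def clip_read_to_region(aligned, read_ref_start, region_start, region_end, margin=0):
    """Trim an aligned read to [region_start, region_end] and pad it with '-'.

    Assumption: aligned[0] sits at 1-based reference position read_ref_start
    and every character occupies exactly one reference position.
    Returns None when the read has no overlap with the region.
    """
    read_ref_end = read_ref_start + len(aligned) - 1

    # Reads with no overlap are left out of the output entirely.
    if read_ref_end < region_start or read_ref_start > region_end:
        return None

    # Keep only the bases that fall inside the alignment region.
    keep_start = max(read_ref_start, region_start)
    keep_end = min(read_ref_end, region_end)
    kept = aligned[keep_start - read_ref_start: keep_end - read_ref_start + 1]

    # Pad out to the region bounds, then across the margin columns, where
    # only the reference sequence carries bases.
    left = (keep_start - region_start) + margin
    right = (region_end - keep_end) + margin
    return "-" * left + kept + "-" * right
```

    For a read covering reference positions 3 to 10 and a region of [5, 12],
    the first two bases are trimmed and two gap characters are appended; a
    margin of 2 adds two further gap characters on each side.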
    
    The '-wrappingWidth' parameter defines the maximum number of alignment
    characters to allow per line in the formatted alignment output. In FASTA 
    output only, the special value 0 (zero) may be given to indicate no 
    wrapping.  If no value is supplied, then the default value of 50 will be 
    used.  ACE and SAM/BAM ignore this option. 
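
    As a rough illustration of the wrapping rule for FASTA output, the
    following Python sketch splits an aligned sequence into lines.  The
    function name is hypothetical and the code is not taken from AVA; it only
    mirrors the behavior described above (default width 50, 0 meaning no
    wrapping).

```python
def wrap_alignment(sequence, width=50):
    """Split an aligned sequence into lines of at most `width` characters.

    A width of 0 means "do not wrap", as described for FASTA output.
    """
    if width < 0:
        raise ValueError("wrapping width must be zero or positive")
    if width == 0:
        return [sequence]
    return [sequence[i:i + width] for i in range(0, len(sequence), width)]
```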
    
    WRITING ALIGNMENT TO STANDARD OUTPUT:
 
    If no wildcard ('*') specifiers are used for either the '-sample' or 
    '-reference' and no '-outputFile' parameter value is supplied (or one is
    supplied, but it is the special value '-'), then the alignment
    will be written to the standard output of the interpreter.
    
    WRITING ALIGNMENT(S) TO FILE(S):
 
    Alignment output may be written to files using a combination of the 
    '-outputDirectory' parameter and other parameters that depend on whether
    or not a wildcard ('*') specification was provided for either of the 
    '-sample' or '-reference' parameters.
 
    The '-outputDirectory' is optional, but can be used as a convenience to
    factor out the specification of a containing directory from the remainder
    of the output file path specification.   The value given for 
    '-outputDirectory' follows all the rules as explained for specifying
    paths in 'help general filePaths' and, in particular, allows the use of
    path shortcuts like %homeDir at the beginning of the path specification.
    
    When wildcard ('*') specifications for '-sample' and '-reference' are not
    used, the '-outputFile' parameter may be used to specify a single file for
    the alignment output. The file is placed under the path specified
    by the '-outputDirectory' parameter, if given.  If '-outputDirectory' is
    not specified, then the file specified by '-outputFile' will be written 
    under the current directory unless the '-outputFile' itself contains some
    additional, prefixed relative or absolute path specification as explained
    in 'help general filePaths'.
    
    When a wildcard ('*') specification for either '-sample' or '-reference' is
    used,  the output file for a given sample / reference combination is a file
    in the directory:
    
        outputDirectory/filteredSampleName/filteredReferenceName
        
    where the outputDirectory is the current directory if '-outputDirectory'
    is not specified.  The filteredSampleName and filteredReferenceName are the
    original sample and reference names from the project, possibly changed
    according to the value of the '-fileFilter' parameter, which is explained 
    below.
    
    Within that directory structure, the alignment is written to a file with
    the automatically generated name:
    
    outputPrefix + 
        filteredSampleName + "_vs_" + filteredReferenceName + outputSuffix
       
    where "+" indicates concatenation of the values.  The outputPrefix value
    can be specified with the '-outputPrefix' parameter and defaults to the
    empty string if not supplied.  The outputSuffix may be specified with the
    '-outputSuffix' parameter to provide a filename extension; when unspecified 
    it defaults to the filename-extension associated with the type given in 
    -outputFormat, i.e., 'fasta'=".fna", 'clustal'=".aln", 'ace'=".ace".
    Note that the "." that separates the file extension from the rest of the 
    file name is explicitly supplied as part of the outputSuffix itself, and 
    so the extension can be effectively eliminated by supplying an empty string 
    ("") for the '-outputSuffix' parameter value.
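
    The naming convention above can be expressed as a short Python sketch,
    which may help when a script needs to predict where a wildcard report
    will be written.  The function and variable names are hypothetical, the
    sample and reference names are assumed to be already filtered, and only
    the three default suffixes listed above are included.

```python
import posixpath

# Default suffixes as listed in this help text; defaults for other formats
# are not stated here, so they are deliberately left out.
DEFAULT_SUFFIX = {"fasta": ".fna", "clustal": ".aln", "ace": ".ace"}


def wildcard_output_path(sample, reference, output_format="fasta",
                         output_dir=".", prefix="", suffix=None):
    """Build outputDirectory/sample/reference/prefix+sample_vs_reference+suffix.

    `sample` and `reference` are assumed to be the filtered names.
    """
    if suffix is None:
        # Raises KeyError for formats whose default suffix is not listed.
        suffix = DEFAULT_SUFFIX[output_format]
    file_name = prefix + sample + "_vs_" + reference + suffix
    return posixpath.join(output_dir, sample, reference, file_name)
```

    Passing suffix="" drops the extension altogether, matching the note above
    that the "." is part of the suffix itself.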
    
    When wildcards are used, the automatically generated filenames and the
    directory structure that contains the alignment output are based on the
    names of the samples and reference sequences.  It is possible that these
    names contain characters that are not allowed in filenames according to
    the operating system where the files are initially created or may
    eventually be viewed (if the files were copied to another machine).
    Consequently, these names must be filtered to be compatible with file
    naming conventions of the intended operating systems.
    
    Filename filtering is controlled by the '-fileFilter' parameter, which
    ensures that the automatically generated output filenames and paths use
    legal file system characters.  If this parameter is not supplied, then its
    value defaults to "all", which provides the strictest filtering and should
    produce filenames that are compatible across all major operating systems.
    Illegal characters are replaced with a hyphen and an index that uniquely
    encodes the characters (unique within the one invocation of the report
    alignment command).  Less general, OS-specific filename filtering may be
    selected by setting this parameter to "linux", "windows" or "mac".  Note
    that this setting does not filter the file path given by '-outputFile'
    when wildcards are not used, where the user is in complete control of the
    filename.
 
    When wildcards are used, the '-mappingFile' parameter may optionally
    designate the name of the file that should be created by the report
    alignment command in the outputDirectory.  This file will contain a row of
    data for each sample/reference name pair and specify the relative path to 
    the corresponding alignment output file for that pair.  Using this file,
    a user, or automated process, can determine the alignment output file based
    on the original sample and reference names, prior to any
    filesystem-specific filename filtering.  The mapping file will be in comma
    separated format if specified with a ".csv" extension, and will be
    tab-separated otherwise.
    
    When using wildcards, it is possible that the directory specified by
    '-outputDirectory' does not already exist.  The '-makeDirectory' parameter
    may be given to specify what to do in this case.  Providing the value "all"
    will allow all sub-directories in the -outputDirectory path to be created  
    (i.e., if they don't already exist on the disk).  The value "last" will
    allow the last directory on the path to be created, but if any of the
    intermediate parent directories do not exist, the command will fail with
    an error.  When not supplied, the default value is "none", in which case
    the entire '-outputDirectory' path must already exist.  Regardless of this
    value, the subdirectories based on the filtered sample and reference names
    will automatically be created below the '-outputDirectory' location, and 
    do not have to pre-exist.
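
    The three '-makeDirectory' values behave much like the following Python
    sketch, which is offered only as an analogy: the function name is
    hypothetical and the actual error reporting of the command is not
    reproduced here.

```python
import os


def prepare_output_directory(path, make_directory="none"):
    """Mimic the "all" / "last" / "none" behavior for an output directory."""
    if os.path.isdir(path):
        return  # nothing to create
    if make_directory == "all":
        # Create every missing directory along the path.
        os.makedirs(path, exist_ok=True)
    elif make_directory == "last":
        # Create only the final directory; os.mkdir raises FileNotFoundError
        # when an intermediate parent directory is missing.
        os.mkdir(path)
    elif make_directory == "none":
        raise FileNotFoundError("output directory does not exist: " + path)
    else:
        raise ValueError("expected 'all', 'last' or 'none'")
```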
    
    When not using wildcards, the '-makeDirectory' parameter is also available,
    but is applied to the full directory path derived from the combination of
    the values of the '-outputDirectory' and '-outputFile' parameters, rather 
    than just to the '-outputDirectory' value itself.
    
    When writing to files, pre-existing files may be overwritten.  Run 
    'help set outputFileOverwritePolicy' to learn how to be alerted to, or
    prevent, such file overwrites.
 
 
    SUPPLEMENTAL ANNOTATION FILES
    
    The '-annotationFileSuffix' parameter may only be used in conjunction with
    '-outputFormat clustal' or '-outputFormat ace' to generate two files: the
    primary (i.e., clustal or ace) and the secondary, an 'annotation file' in
    'table' format.  The secondary file has the same name as the primary output
    file plus the given annotation suffix.  If the suffix ends with '.csv', the
    annotation file will be in comma-separated value format; otherwise it will
    be in tab-separated value format.  NOTE: annotation files cannot be sent to
    standard output, only to files.
    
    
    BASIC EXAMPLES:
 
    report alignment -sample Sample1 -reference EGFR_Exon_19
 
        Reports the consensus read alignment (default) for all amplicons in the 
        EGFR_Exon_19 reference to the standard output of the command
        interpreter in FASTA format.  Default wrapping width of 50 characters
        is used.
 
    report align -sam Sample1 -ref EGFR_Exon_19 -readType individual \
           -wrapping 0 -outputFile rpts/out.fna
 
        Reports the alignment of individual reads with no line wrapping and
        output going to the file: 
           %currDir/rpts/out.fna 
        
    report align -sam Sample2 -ref HLA_Long_Amps -readType consensus \
           -wrappingWidth 60 -margin 15 
 
        Reports, to standard output, the alignment of the consensus reads with
        a margin of 15 bases from the reference sequence added to both ends and
        then line wrapped on every 60th character.  Note: it is not necessary
        to use '-readType consensus' as this is the default report output.
        
        
    AMPLICON FILTERING EXAMPLES:
    
    report align -sam Sample1 -ref HLA_Long_Amps GA9 DE15
    
        Reports the consensus alignment for the amplicons GA9 and DE15 
        in the reference to the standard output of the command interpreter in 
        FASTA format.
        
    report align -sam Sample1 -ref HLA_Long_Amps DD14 DE15 \
           -start 50 -end 350
    
        Reports the consensus alignment for the amplicons DD14 and DE15, 
        clipping output to the given reference sequence positions 
        [50, 350], inclusive.
    
    
    WILDCARD SAMPLE AND REFERENCE EXAMPLES:
    
    report align -sam * -ref *
    
        Reports the consensus alignment for all valid sample and reference
        pairs to a collection of files located in the current directory.
    
    report align -sam Sample1 -ref * -outputDir dirA -makeDir last \
           -fileFilter linux -mappingFile map.tsv
    
        Reports the consensus alignment for all valid Sample1 and reference
        pairs to files (whose auto-generated names are Linux compliant) in
        the %currDir/dirA directory, creating the 'dirA' directory if
        necessary, and creating a mapping file called "map.tsv" in the dirA
        directory as well.
    
    
    FASTA ALIGNMENT OUTPUT FORMAT
    
    The FASTA alignment output begins with an entry for the reference
    sequence as trimmed according to the '-start', '-end', amplicon list, and
    '-margin' parameter values.  Subsequent entries are either the individual
    or consensus reads (depending on the '-readType' parameter) that comprise 
    the alignment, padded as necessary with '-' gap characters.  Each entry
    consists of a definition line prefixed with a '>' followed by the aligned
    sequence data, wrapped according to the '-wrappingWidth' parameter.  The
    definition line specifies the name of the reference sequence or read, as
    applicable, followed by a set of keyword/value pairs that annotate the
    sequence.  The general form of the definition line is:
    
        >name keyword1=value1 keyword2=value2 ...
    
    The particular keyword/value pairs that appear on the definition line
    depend on whether the entry corresponds to the reference sequence, a
    consensus read, or an individual read.  The keywords are as follows,
    depending on the sequence type.
    
        KEYWORD          |R|C|I| DESCRIPTION OF CORRESPONDING VALUE
        -----------------+-+-+-+----------------------------------------------
        sample           |x| | | name of the sample that is the read source
        amplicon         | |x|x| name of the amplicon that is the read source
        consensusLabel   | | |x| consensus read containing the individual read
        strand           |x|x|x| + = forward, - = reverse
        forwardCount     | |!| | # of + strand reads in consensus
        reverseCount     | |!| | # of - strand reads in consensus
        refStart         |x|x|x| start alignment position relative to reference
        refEnd           |x|x|x| end alignment position relative to reference
        readStart        | |~|x| position of base within read at alignment start  
        readEnd          | |~|x| position of base within read at alignment end
        alignedReadBases | |x|x| number of aligned read bases
 
    NOTE: R(x) = key is shown for the Reference Sequence (first output line).
          I(x) = key is shown for Individual alignment reads.
          C(x) = key is shown for Consensus alignment reads.
          C(!) = key is shown for Consensus alignment reads only if 
                 value is non-zero.
          C(~) = key is shown for Consensus alignment reads but positions 
                 are synthesized as [1..alignedReadBases].
          
    For a given alignment output, all the reads will be derived from the same
    sample and so, for brevity, the sample keyword is only present on the
    definition line of the reference sequence that appears at the start of the
    output.  All reported positions are given using a 1-based positioning
    system (i.e., the first base is base #1).  For reads with a strand of '-',
    the readStart and readEnd are given relative to the original read
    orientation, and so in this case readStart will be greater than the
    readEnd.    
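
    A downstream script could read the definition lines with something like
    the following Python sketch.  It assumes that names and values contain no
    whitespace and that each annotation has the form keyword=value; the
    function name and the example line in the usage note are made up for
    illustration.

```python
# Keywords from the table above whose values are positions or counts.
INTEGER_KEYWORDS = {
    "forwardCount", "reverseCount", "refStart", "refEnd",
    "readStart", "readEnd", "alignedReadBases",
}


def parse_definition_line(line):
    """Split '>name keyword1=value1 ...' into (name, annotations dict)."""
    if not line.startswith(">"):
        raise ValueError("definition lines start with '>'")
    fields = line[1:].split()
    if not fields:
        raise ValueError("definition line has no name")
    name = fields[0]
    annotations = {}
    for pair in fields[1:]:
        key, _, value = pair.partition("=")
        annotations[key] = int(value) if key in INTEGER_KEYWORDS else value
    return name, annotations
```

    For a hypothetical line such as
    ">read0001 amplicon=GA9 strand=- refStart=50 refEnd=120 readStart=71
    readEnd=1 alignedReadBases=71" (written on one line), the sketch returns
    the name "read0001" and a dictionary in which readStart is greater than
    readEnd, as expected for a '-' strand read.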
 
 
    TABLE OUTPUT FORMAT
    
    The Table format is a tab or comma separated value table whose column
    headers are identical to FASTA's keywords, but with the first letter of
    each keyword in upper case (e.g., the "readEnd" values of the FASTA output
    would appear in a column labeled "ReadEnd").  Two additional columns of
    data are also included, 'Accno' and 'Alignment', specifying the identifier
    of a sequence and its (gapped) sequence alignment, respectively.  The first
    row after the column labels contains data for the reference sequence and
    subsequent rows contain the data for the consensus or individual reads
    (depending on the value of the -readType parameter). 
    
    The '-tableOutputFormat' option controls the format of the table.
    If 'tsv' is specified, a tab-delimited format is used.  Alternatively, if
    'csv' is given, then a comma-delimited format is used.  If not specified,
    the table will be tab-delimited, unless an output file is given
    (or is wildcard generated) with a ".csv" extension.
    
      Example: 
      
        report alignment -sample Sample1 -reference EGFR_Exon_19 \
               -outputFormat table -outputFile S1_E19.dat \
               -tableOutputFormat csv
               
        Reports the consensus read alignment (default) for all amplicons in the 
        EGFR_Exon_19 reference to the file S1_E19.dat in a Table format, with
        data separated by commas.
    
    The Table format can also, optionally, be used to supplement the Clustal
    and Ace output formats to compensate for sequence annotations that are not
    fully supported by those formats.  When used in this manner, the Alignment
    column of data is not included in the output (see Clustal Output Format
    documentation for an example).
 
 
    CLUSTAL OUTPUT FORMAT
 
    The Clustal output format is provided as another way to export AVA 
    nucleotide sequence alignments.  Output produced in this format is
    from the AVA alignments, and should not be misconstrued as being output
    from an actual Clustal-based alignment implementation.
    
    For more information on specifics of the Clustal output format, and the
    basis of the AVA implementation of that format, see:
    
          http://mcast.sdsc.edu/doc/clustalw-format.html

    All 'report align' options used with CLUSTAL have similar effects as
    described for FASTA.  One exception is -wrappingWidth, which for CLUSTAL
    is limited to a range of [1..60] and defaults to 50 if left unspecified.

    Clustal format does not include space for key information, such as the
    forwardCount or reverseCount of reads contained within consensus reads or
    the true refStart and refEnd position of the Reference sequence and the
    readStart and readEnd positions of the reads in the type of local
    alignments performed by AVA (post primer trimming).  A Table format
    output containing this additional information to annotate the Clustal
    formatted output can be generated along with the Clustal output by
    specifying a value for the '-annotationFileSuffix' option.
 	
      Example: 
        
        report align -sam * -ref * -outputFormat clustal \
                -annotationFileSuffix _annot.csv
      	  
    In the above example, the wildcard expansion will generate file names
    based on the Sample and Reference names in the usual manner, and each
    file will contain alignments in Clustal format.   For each such output file
    named X, an additional file named X_annot.csv will be generated in the
    Table format (see Table Output Format above) and contains the supplemental
    annotations.
     
    NOTE: if '-annotationFileSuffix' is used, the report output cannot be
    directed to the console's standard output.
      
     
    ACE OUTPUT FORMAT
    
    Using the option '-outputFormat ace', alignments are output in Ace format.
    Alignments in this format are still those of the AVA alignment algorithm
    and shouldn't be misconstrued as being output based on the Phrap
    assembly/alignment algorithm.   
    
    For more information on specifics of the Ace output format, see:
    
          http://www.phrap.org/consed/distributions/README.16.0.txt
    
    In the current implementation, the "BQ" tagged quality score values are not
    truly output (the constant value 30 is output for each base).
    
    All 'report align' options used with ACE have similar effects as described 
    for FASTA. One exception is -wrappingWidth, which is ignored for ACE 
    because the width is fixed at 50.
    
    The -annotationFileSuffix option may be used with the Ace format
    (see Clustal Output Format for an example) to generate separate file(s)
    containing supplemental annotation information for each aligned sequence
    in tabular form.
    
    
    SAM / BAM OUTPUT FORMAT (Sequence Alignment/Map Format)
    
    Using the option '-outputFormat sam', alignments are output in SAM format
    as described in the v0.1.2 draft specification:
    
          http://samtools.sourceforge.net/SAM1.pdf
            
    Using the option '-outputFormat bam', alignments are output in a 
    compressed binary format.
    
    Currently the reference sequence is added as the first sequence in the
    output file.  Writing BAM output to the console is not advised.
    
    All 'report align' options used with SAM/BAM have similar effects as 
    described for FASTA. One exception is -wrappingWidth, which is ignored.
    
     
    READ ORDER IN ALIGNMENT:
    
    Every alignment begins with an entry for the reference sequence.  Depending
    on the specified '-readType', the consensus or individual reads that follow
    are ordered as follows:
    
    For the "consensus" reads:
        1.  Reads are grouped by amplicon, and the amplicon-based groups are
            ordered so that amplicons with smaller target start values appear
            first, and shorter (nested) amplicons with the same target start
            appear before the longer (containing) amplicons: i.e., reads from
            amplicons closest to the 5' end of the reference sequence appear
            before reads from amplicons that are closer to the 3' end.
        2.  Within an amplicon-based group, the consensus reads are ordered by:
            1.  Constituent read count: consensus reads with the largest
                forwardCount and reverseCount values appear first.
            2.  And if tied, then ordered by refStart: reads with fewer leading
                gaps appear first.
            3.  And if tied, then ordered by the aligned nucleotide sequence:
                these are sorted by their natural ASCII lexicographic order 
                (i.e., - < A < C < G < N < T).
            4.  And if tied, then ordered by the strand: forward reads appear
                before reverse reads.
            5.  And finally, if necessary, ordered by the consensus read name.
            
    For the "individual" reads:
        1.  Reads are first ordered by the refStart: reads with fewer leading
            gaps appear first.
        2.  And if tied, then ordered by the aligned nucleotide sequence:
            these are sorted by their natural ASCII lexicographic order 
            (i.e., - < A < C < G < N < T).
        3.  And if tied, then ordered by the strand: forward reads appear
            before reverse reads.
        4.  And if tied, then ordered by the read identifier (i.e., as taken
            from the SFF file). 
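
    The tie-breaking rules above map naturally onto tuple sort keys.  The
    following Python sketch is an illustration under stated assumptions, not
    the AVA code: reads are represented as dictionaries using the FASTA
    keywords plus hypothetical "name" and "alignment" entries, the constituent
    read count is taken to be forwardCount + reverseCount, and the grouping
    of consensus reads by amplicon is left out.

```python
def _strand_rank(strand):
    # Forward reads sort before reverse reads.
    return 0 if strand == "+" else 1


def individual_sort_key(read):
    """refStart, then aligned sequence (ASCII order), strand, identifier."""
    return (read["refStart"], read["alignment"],
            _strand_rank(read["strand"]), read["name"])


def consensus_sort_key(read):
    """Ordering of consensus reads within one amplicon-based group."""
    # forwardCount / reverseCount may be absent when zero, so default to 0.
    count = read.get("forwardCount", 0) + read.get("reverseCount", 0)
    # Negate the count so that larger counts come first.
    return (-count, read["refStart"], read["alignment"],
            _strand_rank(read["strand"]), read["name"])
```

    Python compares strings by code point, which gives the same
    '-' < A < C < G < N < T ordering described above for these characters.
    A list of reads can then be ordered with
    sorted(reads, key=individual_sort_key).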
        
3.4.12.2
report variantHits
    report variantHits [-outputFile <file>] [-format <table format>]
 
    Reports variant hits.  Variant hits are reported in the form of a table.
    The table has columns for the following.
 
        Reference Name
        Variant Name
        Variant Status
        Variant Pattern
        Sample Name
        Forward Hits
        Forward Denom
        Reverse Hits
        Reverse Denom
        Read Type
 
    Data are provided for a Variant of a given Reference Sequence if there
    are reads of a Sample that span the region of variation as described
    by the Variant Pattern.  The number of forward and reverse reads that
    span the region are reported in the Forward Denom and Reverse Denom
    columns, respectively.  The number of these reads that have the variation
    are given in the Forward Hits and Reverse Hits columns.  The Hit / Denom
    ratio provides an estimate of the Variant frequency in the Sample.
    Two rows of data are given for each Variant based on the Read Type, 
    which is either Consensus or Individual.  
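
    The Hit / Denom estimate can be computed from a table row as in the
    following Python sketch.  The function name is hypothetical, and pooling
    the two strands into a combined figure is an assumption of this sketch
    rather than something the report itself provides.

```python
def variant_frequencies(forward_hits, forward_denom, reverse_hits, reverse_denom):
    """Return Hit / Denom ratios per strand and pooled over both strands.

    A ratio is None when its denominator is zero (no spanning reads).
    """
    def ratio(hits, denom):
        return hits / denom if denom else None

    return {
        "forward": ratio(forward_hits, forward_denom),
        "reverse": ratio(reverse_hits, reverse_denom),
        # Assumption: pool both strands for an overall estimate.
        "combined": ratio(forward_hits + reverse_hits,
                          forward_denom + reverse_denom),
    }
```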
 
    If no '-outputFile' option is given, the table is printed in a
    tab-delimited format to the standard output of the interpreter.  An output
    file of "-" has the same effect.  If an output file is given, the table is
    written to that file.  Run 'help general filePaths' for more information
    about specifying files.
 
    The '-format' option controls the format of the printed table.  If "tsv", a
    tab-delimited format is used.  If "csv", a comma-delimited format is used.
    By default, the tab-delimited format is used, unless an output file is
    given with a ".csv" extension.
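
    The delimiter rule above can be summarized in a short Python sketch.  The
    function name is hypothetical; the sketch assumes that an explicit
    '-format' value takes precedence over the file extension and that the
    extension check is case-sensitive.

```python
def table_delimiter(format_option=None, output_file=None):
    """Pick the column delimiter from '-format' and the output file name."""
    if format_option == "tsv":
        return "\t"
    if format_option == "csv":
        return ","
    if format_option is not None:
        raise ValueError("expected 'tsv' or 'csv'")
    # No explicit format: a ".csv" output file selects commas; standard
    # output ("-" or no file) and any other file name select tabs.
    if output_file and output_file != "-" and output_file.endswith(".csv"):
        return ","
    return "\t"
```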
 
    Here are some examples.
 
    report variantHits
 
        Reports the variant hits table to the standard output of the command
        interpreter in a tab-delimited format.
 
    report variantHits -outputFile /reports/hits.csv
 
        Reports the variant hits table to the /reports/hits.csv file in a
        comma-delimited format.
 
    report variantHits -outputFile -
 
        Reports the variant hits table to the standard output of the command
        interpreter in a tab-delimited format.