|
•
|
The Large or complex genome option should be used when large or complex data sets (e.g., eukaryotic genomes) are being assembled. It invokes algorithms suited to such data sets, allowing successful and faster assembly. This option:
|
|
•
|
The Heterozygotic mode option specifies that the project’s read data come from a diploid or non-inbred organism. The assembler then adjusts its algorithms to account for the higher expected variability in sequence identity.
|
|
•
|
The Expected depth option lets you specify the expected depth of coverage in the assembly. For high-depth assemblies (where the expected coverage of a position in the genome could reach hundreds or thousands of reads), this option can be used to filter out random-chance events that would be considered significant against a lower-depth background.
|
|
•
|
The Minimum read length option changes the minimum tag/read length accepted in the assembly. For projects that use any Paired End data, this option sets the minimum length for reads or tags to be used in the assembly (see the discussion of PE reads in Section 4.8.2); the default is 20 bp and the allowed range is 15-45 bp. In such projects, 454NewblerMetrics.txt reports the value of numberTooShort as 0, since any shotgun read at least as long as the minimum read length is used in the assembly. For projects with only shotgun read data, the minimum read length is fixed at 50 bp and cannot be changed with this option.
|
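The Minimum read length rules above can be summarized in a short Python sketch. It is illustrative only and is not part of the assembler: the function name `effective_min_read_length` and its parameters are hypothetical, and the choice to silently ignore a requested value for shotgun-only projects is an assumption, since the text says only that the 50 bp minimum cannot be changed.

```python
# Values taken from the option description above.
DEFAULT_PE_MIN_READ_LENGTH = 20        # default for projects with Paired End data
PE_MIN_READ_LENGTH_RANGE = (15, 45)    # allowed range for such projects
SHOTGUN_ONLY_MIN_READ_LENGTH = 50      # fixed value for shotgun-only projects


def effective_min_read_length(has_paired_end_data, requested=None):
    """Return the minimum read length (bp) implied by the rules in the text.

    Hypothetical helper: name, signature, and error handling are assumptions.
    """
    if not has_paired_end_data:
        # Shotgun-only projects: the option has no effect (assumed to be ignored).
        return SHOTGUN_ONLY_MIN_READ_LENGTH
    if requested is None:
        return DEFAULT_PE_MIN_READ_LENGTH
    low, high = PE_MIN_READ_LENGTH_RANGE
    if not low <= requested <= high:
        raise ValueError(
            f"minimum read length must be between {low} and {high} bp, got {requested}"
        )
    return requested
```

For example, `effective_min_read_length(True, 30)` returns 30, whereas `effective_min_read_length(False, 30)` returns 50 because the project contains only shotgun reads.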
|
•
|
An optional Trimming database is used to trim cloning vectors, primers, adapters, or other end sequences from the ends of input reads. Specify the path to a FASTA file containing the sequences to be used for this trimming (see Section 4.9).
|
|
•
|
To use the Screening database option, specify the path to a FASTA file of sequences against which the input reads are screened for contaminants. A read that aligns almost completely to a sequence in the screening database is removed and is not used in the computation; if at least 15 bases of a read do not align to the screening sequence, no action is taken (see Section 4.9).
|
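The screening rule above can be illustrated with a minimal Python sketch. It assumes that an aligner has already reported how many bases of a read align to a screening sequence. The function `is_screened_out` and its parameters are hypothetical, and counting unaligned bases as the read length minus the aligned bases is a simplification; the text does not say how the assembler counts them.

```python
MIN_UNALIGNED_BASES_TO_KEEP = 15  # threshold stated in the text


def is_screened_out(read_length, aligned_bases):
    """Return True if a read would be removed under the rule described above.

    Hypothetical helper: 'aligned_bases' is the number of read bases that
    align to a screening sequence, as reported by some external aligner.
    """
    if not 0 <= aligned_bases <= read_length:
        raise ValueError("aligned_bases must be between 0 and read_length")
    if aligned_bases == 0:
        # No hit against the screening database: the read is kept.
        return False
    unaligned_bases = read_length - aligned_bases
    # At least 15 unaligned bases means no action is taken on the read.
    return unaligned_bases < MIN_UNALIGNED_BASES_TO_KEEP
```

Under these assumptions, a 100-base read with 95 aligned bases is removed, while one with 85 aligned bases (15 unaligned) is kept.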
|
•
|