4. GS De Novo Assembler and GS Reference Mapper Appendices
4.12 The -urt option: Use Read Tips
The “-urt” option can be used to improve contigging in low-depth portions of assemblies. Although it was designed primarily to recover more complete representations of rare transcripts in cDNA assemblies, genomic assemblies may also benefit, because the option helps extend contigs to the overhanging tips of the reads at the ends of multiple alignments.
The -urt option performs the following two operations:
• When a single read is found to overhang past the end of a contig, the assembler tries to extend the contig to the “tip” (end) of the read, that is, the portion of the read that extends unaligned beyond the end of the contig (Figure 97 A and B).
• It calls contigs through regions of barely overlapping reads, where read depth is 1 (Figure 97 C).
Figure 97: Effects of the -urt option on assemblies
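The first operation can be pictured with a toy sketch. This is only an illustration of the idea of extending a contig to a read tip, not the assembler's actual algorithm: it assumes an ungapped, exactly matching alignment, and the function name `extend_to_read_tip` and its parameters are hypothetical.

```python
def extend_to_read_tip(contig: str, read: str, read_start: int) -> str:
    """Toy illustration: extend a contig to the tip of one overhanging read.

    Assumptions (not taken from the assembler): the read aligns to the
    contig without gaps or mismatches, and `read_start` is the 0-based
    contig position where the first base of the read aligns.
    """
    if not 0 <= read_start < len(contig):
        raise ValueError("read must start inside the contig")

    # Number of read bases that lie on top of the existing contig.
    aligned_len = len(contig) - read_start
    if aligned_len >= len(read):
        # The read ends inside the contig: there is no overhanging tip.
        return contig

    if contig[read_start:] != read[:aligned_len]:
        raise ValueError("aligned part of the read does not match the contig")

    # Append the unaligned tip of the read to the end of the contig.
    return contig + read[aligned_len:]


# The read overlaps the last 4 bases of the contig and overhangs by 4 bases.
extended = extend_to_read_tip("ACGTACGT", "ACGTTTGA", read_start=4)
print(extended)
```

In this sketch the extended region has a read depth of 1, which is the kind of low-depth sequence that the option adds to the assembly.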
For cDNA assemblies, use of this option usually reduces the number of singletons and increases the number of fully and partially aligned reads. The number of isogroups and isotigs also generally increases, because -urt generates additional contigs from reads that were previously declared to be singletons. For genomes that contain closely packed genes, however, this option can cause isotig and isogroup fusions, so it should be used with caution in such cases. Where coverage is lacking for certain loci, this option can also cause isogroup fragmentation: some of the reads from a low-coverage locus are assembled, but because there are not enough reads to cover the whole locus, the assembly may be broken into multiple isogroups (typically with each isogroup containing a single low-depth contig), thereby increasing overall fragmentation. Another side effect of using this option on data sets with incomplete coverage is a drop in the average contig length and N50 statistics (again, due to the creation of new, shorter, low-depth contigs that cover only portions of certain loci).
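The drop in average contig length and N50 follows directly from how these statistics are defined. The sketch below uses the usual definition of N50 (the length of the contig at which the running total, taken from longest to shortest, first reaches half of the total assembled length). The contig lengths are made-up numbers chosen for illustration, not results from a real assembly.

```python
def n50(lengths):
    """Return the N50 of a list of contig lengths."""
    if not lengths:
        raise ValueError("no contigs")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # First contig at which the running sum reaches half the total.
        if running * 2 >= total:
            return length


def mean(lengths):
    return sum(lengths) / len(lengths)


# Hypothetical contig lengths (bp) without the option.
without_urt = [5000, 3000, 2000]
# Same contigs plus hypothetical short, low-depth contigs added by the option.
with_urt = without_urt + [800, 700, 600, 500]

print(n50(without_urt), round(mean(without_urt)))  # 5000 3333
print(n50(with_urt), round(mean(with_urt)))        # 3000 1800
```

More sequence is assembled in the second case, yet both summary statistics fall, so a lower N50 after enabling the option does not by itself mean that the assembly is worse.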
Note that the -urt option is not recorded in the project’s XML file. It must therefore be specified on the command line, or selected in the GUI (Extend low depth overlaps check box), each time the computation is run.
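Because the option is not persisted, scripted runs need to pass it on every invocation. The following is a minimal sketch of a helper that guarantees this. The helper name `ensure_urt`, the command name `runProject`, and the project directory `my_project` are assumptions used for illustration; substitute the command line you actually use.

```python
from typing import List, Sequence


def ensure_urt(argv: Sequence[str]) -> List[str]:
    """Return a copy of `argv` that is guaranteed to contain -urt.

    The option is placed right after the command name, ahead of any
    other arguments (an assumption about option ordering).
    """
    if not argv:
        raise ValueError("empty command line")
    if "-urt" in argv:
        return list(argv)
    return [argv[0], "-urt", *argv[1:]]


# Hypothetical command and project directory, for illustration only.
cmd = ensure_urt(["runProject", "my_project"])
print(cmd)  # ['runProject', '-urt', 'my_project']
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` from a wrapper script, so that repeated runs of the same project do not silently drop the option.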