Additional information about the Sanger reads can be specified using “name=value” annotation strings on the description line of each sequence in the FASTA file. The data analysis software applications look for and use the following annotation strings:
1. template – the Paired End template string for this read (Paired End reads are matched by having the same template string)
2. dir – values “F”, “R”, “fwd” or “rev” giving the direction of the Paired End read
3. library – the name of the library that generated this Paired End read (all Paired End reads are grouped by library name for the determination of expected pair distance)
4. trim – the trimmed region of the sequence, given as “#-#”
5. scf – the path or “command string” to use to access the SCF file for the read
6. phd – the path or “command string” to use to access the PHD file for the read
>DJS045A03F template=DJS054A03 dir=F library=DJS045 trim=12-543
The data analysis software looks for the six “name=” strings on the description line, and takes the text from the “=” to the next whitespace character as the “value” of the annotation string. Consequently, other text may appear on the description line alongside the annotation strings, but the value of an annotation string cannot contain whitespace.
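The parsing rule described above can be sketched in Python. This is only an illustration and not the software's actual implementation: the function name `parse_description` is hypothetical, and two behaviors are assumptions not stated in the text, namely that a “name=” string is only recognized at the start of a whitespace-delimited token, and that the first occurrence of a repeated name wins.

```python
import re

# The six annotation names listed in this section.
ANNOTATION_NAMES = ("template", "dir", "library", "trim", "scf", "phd")

# A known name, "=", then all text up to the next whitespace character.
# Assumption: the name must start a whitespace-delimited token.
_ANNOTATION_RE = re.compile(r"(?<!\S)(%s)=(\S*)" % "|".join(ANNOTATION_NAMES))


def parse_description(line):
    """Return (read_name, annotations) for one FASTA description line."""
    text = line.lstrip(">").strip()
    fields = text.split(None, 1)
    name = fields[0] if fields else ""
    rest = fields[1] if len(fields) > 1 else ""
    annotations = {}
    for key, value in _ANNOTATION_RE.findall(rest):
        # Assumption: if a name is repeated, keep the first value seen.
        annotations.setdefault(key, value)
    return name, annotations


if __name__ == "__main__":
    header = ">DJS045A03F template=DJS054A03 dir=F library=DJS045 trim=12-543"
    print(parse_description(header))
```

Text on the description line that is not one of the six annotation strings is simply ignored by this sketch, which matches the statement that other text may appear there.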
4.13.3.1
The “template”, “dir”, and “library” Annotations
4.13.3.2
The “trim” Annotation
The trim annotation should combine the results of all of the sources of trimming that may occur (low quality, vector, primer, adapter, linker, etc.), so that the data analysis software is given just the sequence region that represents the bases to be included in the assembly or mapping. The “‑v” option can be used with the runAssembly, runMapping and runProject commands to specify a FASTA file containing sequences to be trimmed: each read is then screened against this database, and the ends of reads that match any of the included sequences are trimmed off.
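One way to combine several trimming results into a single “#-#” value is to intersect the regions that each source would keep. The sketch below is an illustration under stated assumptions: the helper name `combine_trims` is hypothetical, each trimming source is assumed to report one (start, end) region to keep, and positions are assumed to be 1-based and inclusive, which this section does not specify.

```python
def combine_trims(read_length, kept_regions):
    """Intersect the (start, end) regions kept by each trimming source.

    Assumption: positions are 1-based and inclusive.
    Returns a "trim=#-#" string, or None if no bases survive.
    """
    start, end = 1, read_length
    for region_start, region_end in kept_regions:
        start = max(start, region_start)  # latest left boundary
        end = min(end, region_end)        # earliest right boundary
    if start > end:
        return None  # the sources leave no common region
    return "trim=%d-%d" % (start, end)


if __name__ == "__main__":
    # Hypothetical regions kept after quality, vector and primer trimming.
    regions = [(12, 580), (1, 543), (5, 600)]
    print(combine_trims(600, regions))
```

The resulting string can be placed on the description line of the read, so that only the bases passing every trimming source are used.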
4.13.3.3
The “scf” and “phd” Annotations