Three of the 454 Sequencing System data processing applications output DNA sequences: Signal Processing, for the basecalls of individual reads; GS De Novo Assembler, for the de novo-assembled consensus sequence of the sample DNA library; and GS Reference Mapper, for the sample’s consensus sequence mapped to a reference sequence. These sequences are written in the standard FASTA file format (.fna), and each sequence file is always accompanied by a corresponding base quality scores file in the .qual format. Examples are shown in the Output sub-sections of these applications’ descriptions (e.g. region.key.454Reads and 454AllContigs).
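The pairing of a sequence file with its quality scores file can be sketched as below. This is a minimal illustration, not the applications' own code. It assumes the conventional .qual layout (the same ">" header lines as the .fna file, followed by whitespace-separated integer scores, one per base) and that both files list their records in the same order; neither point is spelled out in this section. The function names and the sample data are hypothetical.

```python
import io

def read_records(handle):
    """Yield (header, body_lines) for each '>' record in a FASTA-style file."""
    header, body = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, body
            header, body = line[1:], []
        elif line and header is not None:
            body.append(line)
    if header is not None:
        yield header, body

def pair_sequences_with_qualities(fna_handle, qual_handle):
    """Return a list of (identifier, sequence, scores) tuples.

    Assumes both files hold the same records in the same order.
    """
    paired = []
    for (seq_head, seq_lines), (qual_head, qual_lines) in zip(
        read_records(fna_handle), read_records(qual_handle)
    ):
        seq_id = seq_head.split()[0]
        if seq_id != qual_head.split()[0]:
            raise ValueError(f"record mismatch: {seq_head!r} vs {qual_head!r}")
        sequence = "".join(seq_lines)
        # Scores may wrap over several lines, so join before splitting.
        scores = [int(s) for s in " ".join(qual_lines).split()]
        if len(scores) != len(sequence):
            raise ValueError(f"{seq_id}: {len(sequence)} bases, {len(scores)} scores")
        paired.append((seq_id, sequence, scores))
    return paired

# Made-up data, for illustration only.
fna_text = ">contig00001 length=8 numReads=3\nACGTACGT\n"
qual_text = ">contig00001 length=8 numReads=3\n40 40 38 37\n35 40 40 40\n"
records = pair_sequences_with_qualities(io.StringIO(fna_text), io.StringIO(qual_text))
print(records)
```

Checking that the number of scores equals the number of bases is a cheap way to detect a sequence file and a quality file that do not belong together.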
>rank_x_y length=XXbp uaccno=accession
…where “rank_x_y” is the identifier or accession number of the read (the rank, x and y values are as described in section 2.3.4), XXbp is the length in bases of the read, and accession is the full universal accession number for the read.
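A read header of this form can be split into its fields with a regular expression. The sketch below assumes that the fields are separated by whitespace and that the accession contains no spaces; the function name and the example header are hypothetical.

```python
import re

# ">rank_x_y length=XXbp uaccno=accession"
# Assumption: whitespace-separated fields, no spaces inside the accession.
READ_HEADER = re.compile(
    r"^>(?P<read_id>\S+)\s+length=(?P<length>\d+)bp\s+uaccno=(?P<uaccno>\S+)\s*$"
)

def parse_read_header(line):
    """Return (read_id, length_in_bases, uaccno) for a read header line."""
    match = READ_HEADER.match(line.strip())
    if match is None:
        raise ValueError(f"not a read header: {line!r}")
    return match["read_id"], int(match["length"]), match["uaccno"]

# Made-up header, for illustration only.
example = ">000123_0456_0789 length=104bp uaccno=EXAMPLEACCNO01"
print(parse_read_header(example))
```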
>contigXXXXX length=abc numReads=xyz
…where “contigXXXXX” is the identifier of the contig and “XXXXX” is a sequential numbering of the contigs in the assembly; and where the length and numReads values are the length in bases of the contig and the number of reads that were used in that contig’s multiple alignment.
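An assembly contig header can be parsed in the same way. The sketch below assumes whitespace-separated fields and purely numeric length and numReads values; the function name and the example header are hypothetical.

```python
import re

# ">contigXXXXX length=abc numReads=xyz"
# Assumption: whitespace-separated fields with integer values.
ASSEMBLY_CONTIG_HEADER = re.compile(
    r"^>(?P<contig_id>contig(?P<number>\d+))"
    r"\s+length=(?P<length>\d+)"
    r"\s+numReads=(?P<num_reads>\d+)\s*$"
)

def parse_assembly_contig_header(line):
    """Return a dict with the contig identifier, its number, length and read count."""
    match = ASSEMBLY_CONTIG_HEADER.match(line.strip())
    if match is None:
        raise ValueError(f"not an assembly contig header: {line!r}")
    return {
        "contig_id": match["contig_id"],
        "number": int(match["number"]),        # sequential number in the assembly
        "length": int(match["length"]),        # contig length in bases
        "num_reads": int(match["num_reads"]),  # reads in the multiple alignment
    }

# Made-up header, for illustration only.
print(parse_assembly_contig_header(">contig00001 length=2500 numReads=310"))
```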
>contigXXXXX refaccno, YYY..ZZZ length=abc numReads=xyz
…where “contigXXXXX” is the identifier of the contig and “XXXXX” is a sequential numbering of the contigs along the reference; “refaccno” is the accession of the reference sequence where this contig aligns; “YYY..ZZZ” is the start and end position of the contig on that reference sequence; and the length and numReads values are the length in bases of the contig and the number of reads that were used in that contig’s multiple alignment.
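A mapping contig header carries two extra fields, the reference accession and the position range. The sketch below assumes that the accession is followed by a comma, that the range is written as two integers joined by "..", and that the remaining fields are whitespace-separated; the function name and the example header are hypothetical.

```python
import re

# ">contigXXXXX refaccno, YYY..ZZZ length=abc numReads=xyz"
# Assumption: the accession ends at the comma that precedes "YYY..ZZZ".
MAPPING_CONTIG_HEADER = re.compile(
    r"^>(?P<contig_id>contig(?P<number>\d+))"
    r"\s+(?P<refaccno>.+?),\s*(?P<start>\d+)\.\.(?P<end>\d+)"
    r"\s+length=(?P<length>\d+)"
    r"\s+numReads=(?P<num_reads>\d+)\s*$"
)

def parse_mapping_contig_header(line):
    """Return a dict describing a contig header from a mapping run."""
    match = MAPPING_CONTIG_HEADER.match(line.strip())
    if match is None:
        raise ValueError(f"not a mapping contig header: {line!r}")
    return {
        "contig_id": match["contig_id"],
        "number": int(match["number"]),        # sequential number along the reference
        "refaccno": match["refaccno"],         # reference sequence accession
        "start": int(match["start"]),          # start position on the reference
        "end": int(match["end"]),              # end position on the reference
        "length": int(match["length"]),        # contig length in bases
        "num_reads": int(match["num_reads"]),  # reads in the multiple alignment
    }

# Made-up header, for illustration only.
print(parse_mapping_contig_header(">contig00002 REFSEQ01, 1201..3700 length=2500 numReads=275"))
```

The sketch does not check the contig length against the reference span, since the text above does not state that the two must agree.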