2.3.6 DNA Sequence (FASTA; .fna) and Base Quality Score (.qual) Files
Three of the 454 Sequencing System data processing applications output DNA sequences: Signal Processing, for the basecalls of individual reads; GS De Novo Assembler, for the de novo-assembled consensus sequence of the sample DNA library; and GS Reference Mapper, for the sample’s consensus sequence mapped to a reference sequence. All three write these sequences in the standard FASTA file format (.fna), and each .fna file is always accompanied by a corresponding base quality score file in the .qual format. Examples are shown in the Output sub-sections of these applications’ descriptions (e.g. region.key.454Reads and 454AllContigs).
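As a rough illustration of how a .fna file and its accompanying .qual file could be read together, the sketch below pairs the two by record identifier. It assumes that the .qual file repeats the same ">" description lines and lists one whitespace-separated integer score per base; that layout is an assumption here, since this section does not spell it out. The function names `read_records` and `pair_bases_with_scores` are hypothetical helpers, not part of the 454 software.

```python
from typing import Dict, Iterable, List, Tuple


def read_records(lines: Iterable[str]) -> List[Tuple[str, List[str]]]:
    """Split FASTA-style text into (description, body_lines) pairs."""
    records: List[Tuple[str, List[str]]] = []
    header = None
    body: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, body))
            header, body = line[1:], []
        elif header is not None:
            body.append(line)
    if header is not None:
        records.append((header, body))
    return records


def pair_bases_with_scores(
    fna_lines: Iterable[str], qual_lines: Iterable[str]
) -> Dict[str, Tuple[str, List[int]]]:
    """Map each identifier to (bases, scores).

    Assumption: the .qual body holds whitespace-separated integers,
    one per base of the matching .fna record.
    """
    bases = {
        desc.split()[0]: "".join(body) for desc, body in read_records(fna_lines)
    }
    paired: Dict[str, Tuple[str, List[int]]] = {}
    for desc, body in read_records(qual_lines):
        ident = desc.split()[0]
        if ident not in bases:
            raise ValueError(f"no sequence found for {ident!r}")
        scores = [int(token) for token in " ".join(body).split()]
        if len(scores) != len(bases[ident]):
            raise ValueError(f"score count does not match length of {ident!r}")
        paired[ident] = (bases[ident], scores)
    return paired
```

Passing open file handles for the two files (for example `open("454AllContigs.fna")` and `open("454AllContigs.qual")`) works because both arguments only need to be iterables of lines. A length mismatch raises an error rather than being silently dropped, which seems the safer choice when the two files are expected to correspond exactly.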
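1.
For reads generated by the Signal Processing application, the description lines are formatted as follows:
>rank_x_y length=XXbp uaccno=accession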
…where “rank_x_y” is the identifier or accession number of the read (the rank, x and y values are as described in section 2.3.4), XXbp is the length in bases of the read, and accession is the full universal accession number for the read. 
2.
For contigs generated by the GS De Novo Assembler application, the description lines are formatted as follows:
>contigXXXXX length=abc numReads=xyz
…where “contigXXXXX” is the identifier of the contig and “XXXXX” is a sequential numbering of the contigs in the assembly; and where the length and numReads values are the length in bases of the contig and the number of reads that were used in that contig’s multiple alignment.
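3.
For contigs generated by the GS Reference Mapper application, the description lines are formatted as follows:
>contigXXXXX refaccno, YYY..ZZZ length=abc  numReads=xyz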
…where “contigXXXXX” is the identifier of the contig and “XXXXX” is a sequential numbering of the contigs along the reference; “refaccno” is the accession of the reference sequence where this contig aligns; “YYY..ZZZ” is the start and end position of the contig on that reference sequence; and the length and numReads values are the length in bases of the contig and the number of reads that were used in that contig’s multiple alignment.
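The three description-line templates above can be turned into a small parser. The following is a minimal sketch, not the 454 software's own code: the regular expressions are transcribed from the templates, the "bp" suffix on the read length is treated as optional, and the rank, x and y fields are kept as strings because their exact layout is defined in section 2.3.4 rather than here. The function name `parse_description`, the `kind` labels, and the sample identifiers in the usage note are all hypothetical.

```python
import re
from typing import Dict, Union

Fields = Dict[str, Union[str, int]]

# Read: >rank_x_y length=XXbp uaccno=accession
READ = re.compile(
    r"^>(?P<ident>(?P<rank>\S+)_(?P<x>[^_\s]+)_(?P<y>[^_\s]+))\s+"
    r"length=(?P<length>\d+)(?:bp)?\s+uaccno=(?P<uaccno>\S+)$"
)
# De novo contig: >contigXXXXX length=abc numReads=xyz
DENOVO = re.compile(
    r"^>(?P<ident>contig(?P<number>\d+))\s+"
    r"length=(?P<length>\d+)\s+numReads=(?P<num_reads>\d+)$"
)
# Mapped contig: >contigXXXXX refaccno, YYY..ZZZ length=abc  numReads=xyz
MAPPED = re.compile(
    r"^>(?P<ident>contig(?P<number>\d+))\s+(?P<refaccno>[^\s,]+),\s*"
    r"(?P<start>\d+)\.\.(?P<end>\d+)\s+"
    r"length=(?P<length>\d+)\s+numReads=(?P<num_reads>\d+)$"
)

PATTERNS = (("read", READ), ("denovo_contig", DENOVO), ("mapped_contig", MAPPED))
INT_FIELDS = {"number", "length", "num_reads", "start", "end"}


def parse_description(line: str) -> Fields:
    """Parse one description line into a dict of named fields.

    Raises ValueError if the line matches none of the three templates.
    """
    text = line.strip()
    for kind, pattern in PATTERNS:
        match = pattern.match(text)
        if match is None:
            continue
        fields: Fields = {"kind": kind}
        for name, value in match.groupdict().items():
            # Numeric fields become ints; identifiers stay as strings.
            fields[name] = int(value) if name in INT_FIELDS else value
        return fields
    raise ValueError(f"unrecognized description line: {line!r}")
```

For example, `parse_description(">contig00003 REF001, 101..600 length=500  numReads=17")` would yield a `mapped_contig` entry with `refaccno` set to `"REF001"`, `start` 101 and `end` 600. The accession and numbers in that call are made up for illustration.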