1. Overview of the 454 Sequencing System Software : 1.1 Data Acquisition, Data Processing and Data Analysis
The data acquisition phase is controlled by the GS Sequencer software application on the GS FLX+ Instrument, or by the GS Junior Sequencer application on the GS Junior Instrument (this is the only application that has a different implementation on each of the two systems). The raw data consists of a series of digital images captured by the camera. Each image represents the surface of the PicoTiterPlate device over which the sequencing reactions take place, and corresponds to one nucleotide flow over that surface. If the sample DNA fragment present in a given well of the PicoTiterPlate device is extended during a nucleotide flow, light is emitted from the well and captured on the image corresponding to that flow. The amount of light emitted is proportional to the number of nucleotides extended.
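The proportionality between light and the number of nucleotides extended can be illustrated with a small sketch. It is not part of the GS software: the function name, the cyclic flow order `TACG`, and the noise-free unit signal are assumptions made for illustration, and the sequence is written as the strand being synthesized.

```python
FLOW_ORDER = "TACG"  # assumed cyclic nucleotide flow order (illustrative only)


def ideal_flowgram(sequence, n_flows, flow_order=FLOW_ORDER):
    """Return the noise-free signal expected in one well for each nucleotide flow.

    `sequence` is the strand being synthesized, 5' to 3'. Each flow extends
    every consecutive base that matches the flowed nucleotide, and the signal
    is taken to be exactly equal to the number of bases extended.
    """
    signals = []
    pos = 0  # index of the next base still to be extended
    for flow in range(n_flows):
        nucleotide = flow_order[flow % len(flow_order)]
        extended = 0
        while pos < len(sequence) and sequence[pos] == nucleotide:
            extended += 1
            pos += 1
        signals.append(float(extended))  # light proportional to bases extended
    return signals


print(ideal_flowgram("TTAGGGC", 8))
# [2.0, 1.0, 0.0, 3.0, 0.0, 0.0, 1.0, 0.0]
```

In this idealized model a flow that does not match the next base gives a signal of 0, and a homopolymer of length n gives a signal of n in a single flow.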
The data processing phase is carried out by the GS Run Processor application, which converts raw image data to base-called results suitable for use by downstream data analysis applications. Data processing is done in two main steps: image processing and signal processing. The software first identifies well locations and measures the amount of light emitted at each well location during each flow. It then uses this information to determine the sequence of the DNA fragment located in each well.
1. The first step of the GS Run Processor application, image processing, performs initial pixel-level calculations, and then groups pixels from the image set into a representation of the PicoTiterPlate wells where sequencing reactions were detected.
2. The second step of the GS Run Processor application, signal processing, performs well-level calculations across the whole series of images to generate well “flowgrams” and the basecalls (“reads”) of the DNA fragments being sequenced in all the active wells of the PicoTiterPlate device.
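The two steps above can be sketched in a few lines. This is a deliberately simplified illustration, not the GS Run Processor algorithm: the pixel-to-well assignment is given rather than detected, the single-base calibration value and the `TACG` flow order are assumptions, and basecalling is reduced to rounding each normalized signal to the nearest homopolymer length.

```python
FLOW_ORDER = "TACG"  # assumed cyclic nucleotide flow order (illustrative only)


def well_signals(images, well_pixels):
    """Image processing (simplified): sum one well's pixels in every flow image.

    `images` is a list of 2-D intensity grids, one per nucleotide flow;
    `well_pixels` is the list of (row, column) pixels assigned to the well.
    """
    return [sum(image[r][c] for r, c in well_pixels) for image in images]


def normalize(signals, single_base_signal):
    """Scale raw signals so that a one-base extension is about 1.0."""
    return [s / single_base_signal for s in signals]


def call_bases(flowgram, flow_order=FLOW_ORDER):
    """Signal processing (simplified): round each flow signal to a base count."""
    read = []
    for flow, signal in enumerate(flowgram):
        n_bases = max(0, int(signal + 0.5))  # nearest non-negative integer
        read.append(flow_order[flow % len(flow_order)] * n_bases)
    return "".join(read)


# Four tiny 2x2 "images" (flows T, A, C, G); the well occupies the top row.
images = [
    [[100, 104], [3, 2]],
    [[52, 49], [1, 4]],
    [[2, 3], [0, 1]],
    [[148, 155], [2, 2]],
]
well = [(0, 0), (0, 1)]

raw = well_signals(images, well)                     # [204, 101, 5, 303]
flowgram = normalize(raw, single_base_signal=100.0)  # assumed calibration
print(call_bases(flowgram))                          # TTAGGG
```

The well-level list of normalized signals plays the role of the flowgram, and the string returned by `call_bases` plays the role of the read.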
Table 1 lists the inputs and outputs of the three main early components of data handling, from data acquisition through data processing, as well as the individual functions carried out by each application.
Components, in order: GS Junior Sequencer or GS Sequencer; GS Run Processor (image processing step); GS Run Processor (signal processing step).
Table 1: The 3 main early components of data handling, from data acquisition through data processing, in the 454 Sequencing System, with their inputs, outputs, and main processing steps. They are performed in succession, in the order indicated; the SFF files output by the signal processing step of the GS Run Processor application are used as input to the data analysis applications (see Table 2). For a description of the data processing pipeline options, see section 1.2. For a full description of the GS Junior Sequencer or GS Sequencer application, see Part A of this manual (Section 2 in the GS Junior System version or Section 3 in the GS FLX+ System version); and for the GS Run Processor application, see Part B, Section 1.
The data analysis phase offers a choice of several downstream analysis paths to generate the desired final output: a consensus sequence of the DNA sample generated by the assembly of reads into contigs and scaffolds (GS De Novo Assembler); a consensus sequence along with a list of high-confidence differences obtained by mapping the reads to a known reference sequence (GS Reference Mapper); or the identification and quantitation of sequence variants by the ultra deep sequencing of amplicons (GS Amplicon Variant Analyzer). All data analysis outputs also include base-per-base quality scores (Phred-equivalent) and other specific metric files.
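For readers unfamiliar with Phred-type scores, the standard Phred definition relates a quality value Q to an estimated per-base error probability p by Q = -10 log10(p). The helper names below are illustrative; how the analysis applications estimate p internally is not described here.

```python
import math


def phred_quality(error_probability):
    """Convert a per-base error probability into a Phred quality score."""
    return -10.0 * math.log10(error_probability)


def error_probability(quality):
    """Convert a Phred quality score back into an error probability."""
    return 10.0 ** (-quality / 10.0)


# An error probability of 1 in 1000 corresponds to Q30.
print(round(phred_quality(0.001)))  # 30
```

Under this definition, Q20 corresponds to an expected error rate of 1 in 100 bases and Q40 to 1 in 10,000.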
1. The GS De Novo Assembler application generates a consensus sequence of the whole DNA sample by assembling the reads into contigs (de novo shotgun assembly). Optionally, one or more sequencing Runs performed on a Paired End library (of any type, or even a combination of Paired End library types) prepared from the same DNA sample can be analyzed together with the Shotgun sequencing Run(s) to help order and orient the resulting contigs into scaffolds. (Paired End reads do not necessarily need to be analyzed together with Shotgun reads.)
2. The GS Reference Mapper application generates the consensus DNA sequence by mapping, or aligning, the reads to a reference sequence. It also produces a list of high-confidence differences (individual bases or blocks of bases that differ between the consensus DNA sequence of the sample and the reference sequence). Robust cDNA analysis is also available.
3. The GS Amplicon Variant Analyzer application compares reads from an Amplicon library to corresponding reference sequences, and allows the user to detect, identify and quantitate the prevalence of sequence variants.
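As a rough illustration of what "quantitating the prevalence of sequence variants" means, the sketch below counts substitutions in amplicon reads relative to a reference and reports each as a fraction of the read depth. It is not the GS Amplicon Variant Analyzer method: the function name is hypothetical, and the reads are assumed to be already aligned to the reference, gap-free and of equal length, so insertions and deletions are ignored.

```python
from collections import Counter


def variant_frequencies(reference, aligned_reads, min_frequency=0.0):
    """Return {(position, ref_base, alt_base): frequency} for substitutions.

    Positions are 1-based. Reads are assumed to be pre-aligned, gap-free
    and the same length as the reference (a simplification).
    """
    counts = Counter()
    for read in aligned_reads:
        for i, (ref_base, read_base) in enumerate(zip(reference, read)):
            if read_base != ref_base:
                counts[(i + 1, ref_base, read_base)] += 1
    depth = len(aligned_reads)
    return {
        variant: n / depth
        for variant, n in counts.items()
        if n / depth >= min_frequency
    }


reference = "ACGTAC"
reads = [
    "ACGTAC", "ACGTAC", "ACTTAC", "ACTTAC",
    "ACGTAC", "ACGTAA", "ACGTAC", "ACGTAC",
]
print(variant_frequencies(reference, reads))
# {(3, 'G', 'T'): 0.25, (6, 'C', 'A'): 0.125}
```

Raising `min_frequency` (for example to 0.2) keeps only the G-to-T substitution at position 3, which is one simple way to separate recurrent variants from isolated read errors.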
The data analysis applications use the fully processed and “trimmed” read basecalls of a sequencing Run, or of a pool of Runs, to produce initial alignments to the reference sequence (or read-to-read overlaps for the GS De Novo Assembler). They then use a combination of nucleotide and flowgram information for consensus-calling of the contigs and determination of quality values for the contig sequences. Contig consensus-calling is carried out in “flowspace” (i.e. it operates directly on the processed signals measured from the wells), followed by basecalling to produce a consensus sequence for the sample. Table 2 lists the specific outputs of the 3 data analysis applications as well as the individual functions carried out by each one.
GS De Novo Assembler
Output: Sample consensus sequence, assembled de novo (and scaffold information, with Paired End option)
Function: Construct multiple alignments of reads that tile together (i.e. form contigs), based on the pairwise overlaps
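The idea of consensus-calling in flowspace, described above, can be sketched as follows: the flow signals of overlapping reads are combined first, and basecalling is applied to the combined signal. This is only an illustration under strong assumptions. The reads are taken to be already aligned flow-by-flow, the combination is a plain mean, the flow order is assumed to be `TACG`, and basecalling is simple rounding; the actual procedure used by the analysis applications is not specified here.

```python
FLOW_ORDER = "TACG"  # assumed cyclic nucleotide flow order (illustrative only)


def call_bases(flowgram, flow_order=FLOW_ORDER):
    """Round each flow signal to the nearest non-negative homopolymer length."""
    read = []
    for flow, signal in enumerate(flowgram):
        n_bases = max(0, int(signal + 0.5))
        read.append(flow_order[flow % len(flow_order)] * n_bases)
    return "".join(read)


def flowspace_consensus(aligned_flowgrams):
    """Average the signals flow-by-flow, then basecall the averaged flowgram."""
    n_reads = len(aligned_flowgrams)
    mean_flowgram = [sum(column) / n_reads for column in zip(*aligned_flowgrams)]
    return call_bases(mean_flowgram)


flowgrams = [
    [1.02, 0.05, 2.45, 0.98],  # called alone: TCCG (C homopolymer undercalled)
    [0.97, 0.10, 2.62, 1.03],  # called alone: TCCCG
    [1.05, 0.02, 2.70, 0.95],  # called alone: TCCCG
]
print([call_bases(f) for f in flowgrams])  # ['TCCG', 'TCCCG', 'TCCCG']
print(flowspace_consensus(flowgrams))      # TCCCG
```

Working on the signals rather than on the individual basecalls lets a borderline homopolymer measurement in one read be outweighed by the other reads before any rounding takes place.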
The software package described in this manual also includes a variety of applications that are used primarily or exclusively off-instrument (on a DataRig or GS Junior Attendant PC). The GS Reporter and the GS Run Browser applications are used to view and troubleshoot the results of a completed sequencing Run; the GS Support Tool is used to package sequencing Run data to send to Roche Customer Support for further help and troubleshooting; and the SFF Tools are a set of commands used to create, manipulate and access sequencing trace data from SFF files. However, these applications and commands are not required steps of data processing and analysis.