Glossary
Accession number; A unique identifier given to a DNA or protein sequence record utilizing a universal accession naming convention created by the UniProt (SwissProt) Knowledgebase.
ATP flow; In the pyrosequencing reaction, one molecule of Adenosine-5'-triphosphate (ATP) is synthesized for each unit nucleotide incorporation causing a flash of light (signal) whose intensity is proportional to the number of nucleotides incorporated. The initial ATP flow (GS Junior Titanium and GS FLX Titanium Chemistry) causes a simultaneous flash of light from all enzyme active wells and is used to define the PTP Device loading regions and the shape of the background across the plate.
Basecalling; Use of the relative signal intensity generated during an individual nucleotide "flow" (incorporation) step to generate a quantitative representation of nucleotide incorporation (singlet or homopolymer stretch).
Background; The non-specific or non-resolvable signal intensity generated during the sequencing Run, which is corrected or subtracted to reduce noise (improving the signal-to-background ratio) and to improve the information content of the signal flowgrams.
Carry forward; Occurs when a trace amount of nucleotide remains in a well after the apyrase wash, perpetuating premature nucleotide incorporations for specific sequence combinations on some DNA strands during the next base flow, and subsequent sequencing out-of-phase with respect to the rest of the strands on the bead.
Composite well format (*.cwf files); Contain the uncorrected flowgrams from the image processing step or the corrected flowgrams from the signal processing step.
Control DNA; Fragments of DNA with a known sequence used in each sequencing Run to determine the accuracy of the sequencing reaction signal intensity translation into basecalled sequence information.
Control DNA tab; GS Run Browser application tab, reports accuracy results for the Control DNA beads in terms of the % of reads that match their reference sequence at 95%, 98% and 100% accuracy, per PTP region and total, via a histogram plot. The results can be viewed across all or for individual Control DNA sequences, for a given base pair length (Base Pair selection) over which the match is calculated.
Data processing; The algorithmic processing of sequencing Run raw images (*.png files) to produce high quality basecalled reads in composite well format (*.cwf files) and standard flowgram format (*.sff files) for further data analysis. Data processing is carried out in two steps; image processing and signal processing.
image processing; Process of converting image data into raw flow signals for each active well of the PTP Device (a well containing a DNA fragment that produced light due to base incorporations during the sequencing Run).
signal processing; Process of applying filters, corrections and trimming of the raw flow signals to produce high quality sequence information.
Environmental variable; A dynamically defined relationship, usually a [key, value] pair, used by a computer operating system to affect the way running processes will behave on a computer.
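As a small illustration of the [key, value] idea, the sketch below sets and reads an environment variable from Python. The variable name EXAMPLE_SETTING is hypothetical and is not one used by the 454 Sequencing System software.

```python
import os

# EXAMPLE_SETTING is a made-up name used only for illustration.
os.environ["EXAMPLE_SETTING"] = "on"

# A process reads the value by key and falls back to a default if it is unset.
setting = os.environ.get("EXAMPLE_SETTING", "off")
missing = os.environ.get("EXAMPLE_SETTING_NOT_DEFINED", "off")

print(setting)
print(missing)
```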
Filters tab; GS Run Browser application tab, reports statistics on the read quality filters used to process the signals into high quality (HQ) reads for library and control wells. Includes a histogram of the number of key pass wells, per PTP region and total, for the % wells that passed all filters and the % wells that failed the Dot (null – not sufficient signal from DNA fragment on the bead) and/or Mixed (more than one DNA fragment on a bead) filters.
Flow; During a sequencing Run, nucleotides are flowed sequentially across the PTP Device, one at a time, in the cyclical order ‘TACG’, as controlled by the Run script. When the flowed nucleotide is complementary to the next nucleotide (or homopolymer) on the DNA template in any given well, the polymerase extends the nascent DNA strand in that well. Addition of one or more nucleotide(s) releases a corresponding number of pyrophosphate (PPi) molecules. One molecule of ATP is synthesized for each PPi released, causing a flash of light (signal) whose intensity is proportional to the number of nucleotides incorporated.
Flow order; the order of nucleotides flowed during each cycle of the sequencing Run, generally ‘TACG’.
Key flows; the first few cycles of nucleotide flows needed to sequence the library and control sequence keys. For the flow order ‘TACG’ and the key ‘TCAG’, the key flows would be T-A-C-G-T-A-C-G, with incorporations at the first T, first C, second A and second G flows, and consist of two complete flow cycles.
Negative flow; a well-specific attribute denoting a nucleotide flow where no signal is detected and thus no nucleotide incorporation is assumed.
Positive flow; a well-specific attribute denoting a nucleotide flow where signal is detected and the intensity of the signal is related to the number of nucleotides incorporated.
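The relationship between flow order, key flows, and positive and negative flows can be sketched in a few lines of Python. This is a minimal, noise-free illustration, not the instrument's algorithm. The function name ideal_flowgram is hypothetical, and the input is the sequence as it would be read out (the nascent strand), not the complementary template.

```python
def ideal_flowgram(read_sequence, flow_order="TACG", n_flows=8):
    """Return the ideal number of bases incorporated at each flow.

    A value of 0 corresponds to a negative flow; a value >= 1 corresponds
    to a positive flow (1 = single base, 2+ = homopolymer).
    """
    counts = []
    pos = 0
    for flow_index in range(n_flows):
        base = flow_order[flow_index % len(flow_order)]
        count = 0
        # Extend through every consecutive copy of the flowed base.
        while pos < len(read_sequence) and read_sequence[pos] == base:
            count += 1
            pos += 1
        counts.append(count)
    return counts


# The library key 'TCAG' needs two complete 'TACG' cycles (eight flows).
print(ideal_flowgram("TCAG"))  # [1, 0, 1, 0, 0, 1, 0, 1]
```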
Flowgram; Data processing extracts information about the signal intensity in each well, over all flows. The signal intensity for each flow is plotted as a function of flow order, yielding a flowgram for the well. The signal intensity is proportional to the number of bases added (linear relationship); if no nucleotide is incorporated in that well during a flow, the signal will be very low (background); if one nucleotide is added, the signal will be similar in intensity to the key signal; if more than one nucleotide is added, the height of the signal will be correspondingly higher.
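Because signal intensity scales linearly with the number of bases added, a flowgram can be turned into a sequence by rounding each flow signal to the nearest whole number of bases. The sketch below assumes the signals have already been normalized so that a single incorporation is close to 1.0. The function name call_bases is hypothetical, and the actual basecalling in the 454 software involves further corrections and quality scoring that are not shown here.

```python
def call_bases(flow_signals, flow_order="TACG"):
    """Translate normalized flow signals into a base sequence.

    Each signal is rounded to the nearest non-negative integer, which is
    taken as the homopolymer length for the nucleotide flowed at that step.
    """
    called = []
    for flow_index, signal in enumerate(flow_signals):
        n_bases = max(0, int(signal + 0.5))  # round half up, clamp at zero
        base = flow_order[flow_index % len(flow_order)]
        called.append(base * n_bases)
    return "".join(called)


# Key flows (TCAG) followed by a 2-mer signal on the next T flow.
signals = [1.02, 0.05, 0.97, 0.10, 0.03, 1.10, 0.08, 0.95, 2.10]
print(call_bases(signals))  # TCAGTT
```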
Homopolymer; Nucleotide sequence of varying length consisting of one uninterrupted nucleotide type, e.g. A (1-mer), A-A (2-mer), A-A-A (3-mer).
Inter-well crosstalk correction; Corrects individual wells for the additional signal intensity conveyed by neighboring high intensity signal wells.
Incomplete extension; Occurs when some DNA strands on a bead fail to incorporate during the appropriate base flow. This can be due to reactivity differences (it happens more often for ‘T’ flows, for example) and/or reagent and substrate local concentrations (it happens more for wells on the far end of the plate relative to the reagent flow direction). The strands that fail to incorporate must wait another flow cycle to continue sequencing and thus those strands will incorporate out-of-phase with the rest of the strands.
Key; The sequencing key is a known sequence of four nucleotides located immediately downstream from the sequencing primer and, therefore, the first to be sequenced in each well.
library key; The sequencing key used for the DNA library being sequenced. ‘TCAG’ or ‘GACT’
control key; The sequencing key used for the Control DNA used in the sequencing Run. ‘CATG’ or ‘ATGC’ (see Section 7.1 for details)
Key sequence; The first bases of a sequencing key, used for matching the initial signal flow information from a well to a well categorization; Key Pass – matches a key, Fail – does not match a key, Library – matches the library DNA key, Control – matches the Control DNA key. Only the first three nucleotides are used because the fourth base to be incorporated may incorporate as a homopolymer instead of a single nucleotide, depending on the DNA fragment sequence, thus complicating the matching algorithm.
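The categorization described above can be sketched as a prefix match on the first three nucleotides. This is a simplified, base-space illustration using the key values listed in this glossary. The function name classify_well is hypothetical, and the real software matches on flow signals, not on called bases.

```python
LIBRARY_KEYS = ("TCAG", "GACT")
CONTROL_KEYS = ("CATG", "ATGC")


def classify_well(read_start, match_len=3):
    """Classify a well from the first bases of its read.

    Only the first `match_len` bases are compared, because the fourth key
    base may merge into a homopolymer with the start of the DNA fragment.
    """
    prefix = read_start[:match_len]
    if any(key[:match_len] == prefix for key in LIBRARY_KEYS):
        return "Library"
    if any(key[:match_len] == prefix for key in CONTROL_KEYS):
        return "Control"
    return "Fail"


print(classify_well("TCAGGGTA"))  # Library (fourth base extends into a G homopolymer)
print(classify_well("ATGCCTTA"))  # Control
print(classify_well("GGTTACAA"))  # Fail
```

Library and Control wells are both Key Pass wells in the terminology above; Fail wells match neither key.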
Legacy files, legacy formats; Files and file formats generated by previous 454 Sequencing System software versions. Conversion of current formats to legacy formats and from legacy formats to current formats is enabled in some cases. See the GS Reporter and SFF Tools Sections of the 454 Sequencing System Software Manual for more details.
Library; A library is a collection of DNA fragments representative of the entire DNA sample to be sequenced. Each library is created from user-supplied purified DNA.
Normalization flow; The initial normalization flow (ATP in the GS Junior Titanium and GS FLX Titanium chemistry and PPi in the GS FLX standard chemistry) causes a simultaneous flash of light from all enzyme-active wells and is used to define the PTP Device loading regions and the shape of the background across the plate.
Nucleotide normalization; A signal processing correction that normalizes the signal strengths of different base incorporations.
Nucleotide incorporation; Polymerase extension of the nascent DNA strand in a well by a complementary nucleotide flowed across the PTP Device.
Out-of-phase error corrections (CAFIE - CArry Forward & Incomplete Extension); Corrections for chemical/system sequencing events that cause some DNA strands on a bead to incorporate nucleotides out-of-phase with respect to the rest of the strands and so contribute to signal ‘noise’.
Overview tab; GS Run Browser application tab, contains summary data of the sequencing Run, summary data of the processing results, if carried out, and the GS Run Processor Manager used to launch data processing or reprocessing jobs for the currently selected Run data set.
PHRED; A software program that reads DNA sequencer trace data, calls bases, assigns quality values to the bases, and writes the basecalls and quality values to output files.
PHRED score; A quality score logarithmically linked to the error probabilities for basecalling sequences derived from signal data for known sequences. Calculated based on several parameters related to peak shape and peak resolution at each base. Methodology described by Ewing and Green (Genome Research, 8: 186-194, 1998)
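The logarithmic link between a PHRED score Q and the basecall error probability P is the standard relation Q = -10 log10(P), so Q20 corresponds to a 1% error probability and Q30 to 0.1%. A minimal sketch of the conversion in both directions follows; the function names are illustrative only.

```python
import math


def phred_to_error_prob(q):
    """Error probability implied by a PHRED score."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """PHRED score implied by an error probability (0 < p <= 1)."""
    return -10 * math.log10(p)


print(phred_to_error_prob(20))     # about 0.01, i.e. 99% accuracy
print(error_prob_to_phred(0.001))  # about 30
```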
PicoTiterPlate Device; A plate containing the DNA being sequenced. The PTP Device is the interface between the fluidics and optics subsystems of the GS Junior and GS FLX+ System. The side of the PTP Device that is in contact with the fluidics subsystem, which delivers the reaction reagents, contains ~1.8 million microscopic (18.5 picoliter) wells in which the sequencing reactions take place. Each well is designed to contain a single, unique library bead carrying a clonally amplified DNA fragment. The bottom of each well is made of an optical fiber, which transmits light produced by the sequencing reaction across the thickness of the PTP Device to a camera (the optics subsystem), which captures the raw images of the PTP Device during each nucleotide flow. Each well wall is lined with a metalized finish to reduce well-to-well crosstalk and signal interference. The wells are organized into regions (or loading gaskets) of different configurations (2-16 regions), allowing flexibility in the depth and breadth of information captured in any single sequencing Run.
Primer; Short preexisting polynucleotide chain to which new deoxyribonucleotides can be added by DNA polymerase.
Primer sequence; GS Junior and GS FLX+ System Adaptor sequence.
Processing pipeline; A data processing pipeline specifies the options of how to carry out the data processing with respect to when to process (during or post sequencing Run), where to process (on-instrument or on an external DataRig), which processing steps to carry out (no processing, image processing, signal processing, full processing), and what to process (standard shotgun, Paired End library, cDNA library, Amplicon library).
Processing launch script commands; These commands provide a user command line interface to the gsRunProcessor executable; runImagePipe, runAnalysisPipe, runAnalysisPipePairedEnd, runAnalysisPipeAmplicon, runAnalysisFilter.
Processing pipeline script; XML-based text files located in /etc/gsRunProcessor that specify a series of commands to be sent to the gsRunProcessor.
Pyrosequencing; 'Sequencing by synthesis' - sequencing of a single-stranded DNA by synthesis of the complementary strand one base at a time. Each added nucleotide is detected and coded.
Quality score; A PHRED-like binned score associated with a basecall, calculated for each nucleotide or homopolymer translation from signal space, based on the specific and local signal properties of the called base relative to a pre-calibrated 454 control DNA training set.
Quality filter; Any of a series of filters used to assess and retain only high quality reads for further data analysis; KeyPass, Dots, Mixed, signal intensity, primer, trimBack valley.
Q20 test; Average base quality score > 20.
Q20 Read Length; Read length at which the bases are 99% accurate or higher for all preceding bases.
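One possible reading of the Q20 Read Length definition is the longest read prefix whose cumulative expected accuracy, estimated from the per-base quality scores, is at least 99%. The sketch below implements that interpretation only. It is an assumption made for illustration, not the calculation used by the 454 software, and the function name q20_read_length is hypothetical.

```python
def q20_read_length(quality_scores, max_error_rate=0.01):
    """Longest prefix length whose mean expected error rate is <= 1%.

    Each PHRED-like score q is converted to an error probability
    10 ** (-q / 10); the running mean of these is compared to the limit.
    """
    best_length = 0
    expected_errors = 0.0
    for length, q in enumerate(quality_scores, start=1):
        expected_errors += 10 ** (-q / 10)
        if expected_errors / length <= max_error_rate:
            best_length = length
    return best_length


# Three high-quality bases followed by two low-quality ones.
print(q20_read_length([30, 30, 30, 10, 10]))  # 3
```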
Raw intensity; Uncorrected signal intensity from a nucleotide flow for a read in a well of the PTP Device.
Raw flow signals; Uncorrected signal intensities from a sequencing Run for a read in a well of the PTP Device.
Raw image files (*.png); Images of the PTP Device taken during each nucleotide flow of a sequencing Run, capturing the light emitted from each active well due to nucleotide incorporation sequencing reactions.
Read; The sequence trace data derived from the flow signal of a DNA template.
Raw read; The uncorrected sequence trace data derived from the flow signal of a DNA template.
High quality read; The corrected sequence trace data derived from the flow signal. Corrections are applied for known chemical, biological and system artifacts and trimming is applied to retain the high quality signal intensity portion of the read.
Trimmed read; A read that has had a portion of its 3’ end (distal from the sequencing primer) trimmed to retain the relevant and/or high quality signal portion of the DNA template.
Unrecognized read; Reads which begin with the Control DNA sequencing key (CATG or ATGC; see Section 7.1 for details) but do not match any of the corresponding Control DNA reference sequences.
Read trace information; The linear sequence of a DNA template derived from signal data obtained during a sequencing experiment.
Read length; The length in nucleotides of a read.
Read rejecting filters; Applied as pass/fail tests to quickly discard no-information or low-information active wells; KeyPass, Dots, Mixed.
Read trimming filters; Applied to retain the high information content portion of a read; signal intensity, primer, trimBack valley.
Reads tab; GS Run Browser application tab, contains statistics on the distribution of the read length and read quality for library and control wells and summary statistics per PTP region and total, on the number of raw wells, key pass wells, passed filter wells, total bases, read length average, standard deviation, longest read length, shortest read length, and median read length.
Reagent flow event balancer; Corrects anomalous signal spikes due to reagent valve events.
Instrument Procedure Wizard; A software application on the GS Junior and GS FLX+ Instruments used to set up and launch a sequencing Run.
Signal droop correction; Corrects for signal reduction during the eight-hour sequencing Run exposure.
Signal flowgram; The linear sequence of a DNA template in a well of the PTP Device during a sequencing Run, inferred from the signal intensity for each nucleotide flow, plotted as a function of flow order.
Standard flowgram format file (*.sff); Contains data on all the reads resulting from the data processing including the quality and primer trimming positions, flowgram information, called bases and their quality scores. This is a binary file starting with a common header section, followed by a header and data section for each read.
Signal per base; The signal intensity calculated for a single nucleotide incorporation during a sequencing Run, averaged over all homopolymer nucleotide incorporation signals in all Control DNA wells. It can be used to remove ghost wells, active wells with signal below 1-mer incorporations, and to estimate the number of copies of DNA template per bead in each well.
Signal processing; Process of applying filters, corrections and trimming of the raw flow signals to produce high quality sequence information.
Signals tab; GS Run Browser application tab, contains statistics on the distribution of well intensity, filtered well intensity and N-mer (number of bases) signals recorded for each flow for control and library wells.
Titration; A process of determining or achieving the correct ratio of chemical reactants. For DNA sequencing Runs, titration can be used to determine the optimal amount of DNA library to use in emPCR amplification.
Valley; Off-peak signal intensity relative to the nearly quantized signal intensity per nucleotide homopolymer incorporation.
Well; Conceptually, a well is a location on the raw image of the PTP Device at which signal was observed during a nucleotide flow of a sequencing reaction. Physically, a well is an octagonal 18.5 picoliter compartment on the PTP Device in which the sequencing reactions take place. Each well is designed to contain a single, unique library bead carrying a clonally amplified DNA fragment. The bottom of each well is made of an optical fiber, which transmits light produced by the sequencing reaction across the thickness of the PTP Device, to the camera (optics subsystem) and each well wall is lined with a metalized finish to reduce well-to-well crosstalk and signal interference.
Raw wells; Wells identified as having measurable signal intensity during a sequencing Run but that have not been filtered or corrected for sequence information content relevance.
Key Pass wells; Wells that have signal intensity matches of the initial nucleotide flows to known DNA Adaptor sequences used in the sequencing Run.
Well density; A property estimating the closeness of active wells across the PTP Device, based on the calculated signal per base.
Well density correction; Calculates the signal per base to filter out ghost wells.
Well flowgram; A plot of the linear sequence of a DNA template inferred from the signal intensity observed for each nucleotide flow, plotted as a function of flow order. Accessible from the Wells tabs of the 454 Sequencing System software applications.
Consensus flowgram; The flowgram of a Control DNA sequence constructed by averaging, for each nucleotide flow, the read flowgram signals of the reads identified for that reference sequence. Accessible from the Control DNA tabs of the 454 Sequencing System software applications.
Location (raw) flowgram; A raw flowgram constructed by computing, for each image in the sequencing Run, the average raw (non-corrected) signal intensity for the 9 pixels surrounding the selected location, and plotting these averages against the succession of reagent flows. (Note that this calculation does not give any consideration to the notion of “wells”.) Accessible from the Wells tabs of the 454 Sequencing System software applications.
Subtraction (raw) flowgram; A plot of the subtraction of any two location flowgrams, flow by flow. Created by the use of subtraction pins in location flowgram plots, accessible from the Wells tabs of the 454 Sequencing System software applications.
Triflowgram; A plot of the subtraction of an idealized or consensus flowgram from an observed flowgram for a Control DNA reference sequence, flow by flow. Accessible from the Control DNA tabs of the 454 Sequencing System software applications.
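The consensus and subtraction-style flowgrams described above reduce to simple per-flow arithmetic: an average across reads, and a flow-by-flow difference between two flowgrams. The sketch below illustrates both under the assumption that all flowgrams have the same number of flows. The function names are hypothetical and the numbers are made up for the example.

```python
def consensus_flowgram(read_flowgrams):
    """Average the signal of several read flowgrams, flow by flow."""
    n_reads = len(read_flowgrams)
    return [sum(flow_values) / n_reads for flow_values in zip(*read_flowgrams)]


def subtract_flowgrams(observed, reference):
    """Flow-by-flow difference between two flowgrams of equal length."""
    return [obs - ref for obs, ref in zip(observed, reference)]


# Two illustrative reads of the same Control DNA reference (three flows each).
reads = [[1.0, 0.0, 2.0], [1.2, 0.2, 1.8]]
consensus = consensus_flowgram(reads)

# Difference between one observed read and an idealized flowgram.
ideal = [1.0, 0.0, 2.0]
difference = subtract_flowgrams(reads[1], ideal)

print(consensus)
print(difference)
```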
Well status; A categorization of the signal intensity of a well based on a variety of criteria. Listed in several output files and viewable in several GUIs of the 454 Sequencing System software applications. Passed Filter – Library read that passed all quality filters; No Key – Identified as a well (generates signal), but not one with recognizable data; Failed – Library read that failed any of the quality filters; Control DNA – Control DNA read; Key Pass – matches a sequencing key, library or control.
Wells tab; GS Run Browser application tab, contains raw images of the PTP Device which can be displayed for various well categories and the selected base flow. Also reports summary statistics of the average well density of the PTP Device, per region and total, for raw wells and key pass wells.