1.6.1 GS Reads, FASTA and FASTQ Reads Sub-Tabs
To add GS read data sets to a project, click the GS Reads sub-tab, then click the Add button to open the “Select GS Read Data Files” window (Figure 6). This dialog window lets you navigate the file system and view the available read files. Read files already in the project are visible but grayed out. To select multiple files, hold the Ctrl or Shift key while clicking with the mouse.
Once the desired file(s) are selected, the “Set GS Read Data Attributes” dialog window opens (Figure 7). In this window, use the Read Type Specification dropdown either to specify the GS read file(s) explicitly as Paired End or non-Paired End, or to invoke auto-detection. When the read type is known, specify it directly rather than relying on auto-detection, because auto-detection may, on rare occasions, fail to detect a Paired End file.
You can add any number of Read Data files to a project by clicking the Add button, navigating to a folder, and selecting the files needed from that folder. You can even add multiple files that share the same name to a project, as long as they have different path locations in the file system (e.g. “/dir1/path1/reads.sff” and “/dir2/otherpath2/reads.sff”). In such a case, both files are added to the Read Data list and displayed using the same name. To see the path to a file listed in the Read Data area of the main window (and the file’s last modification date), hover the mouse over the filename of interest; a tooltip containing this information is displayed. Files that have failed validation are marked with a red X to the left of the file name. Pause the mouse cursor over the red X to bring up a tooltip explaining the problem encountered.
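The same-name behavior described above can be pictured with a short Python sketch. It is only an illustration, not the application's actual code: the function names `add_read_files` and `display_names` are hypothetical, and it assumes that files are identified by their full path while only the base name is shown in the list.

```python
import os

def add_read_files(project_files, new_paths):
    """Add files to the project list, keyed by full path (hypothetical helper).

    A path already in the project is skipped, mirroring the grayed-out
    entries in the file selection window. Returns the paths actually added.
    """
    known = {os.path.abspath(p) for p in project_files}
    added = []
    for path in new_paths:
        full = os.path.abspath(path)
        if full not in known:
            known.add(full)
            project_files.append(full)
            added.append(full)
    return added

def display_names(project_files):
    """Return the names as they would appear in the Read Data list."""
    return [os.path.basename(p) for p in project_files]

project = []
add_read_files(project, ["/dir1/path1/reads.sff", "/dir2/otherpath2/reads.sff"])
print(display_names(project))  # both entries are listed under the same name
```

Because the two entries look identical in the list, the full path (here, the stored list element) is what tells them apart, which is the role the tooltip plays in the application.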
The FASTA and FASTQ Reads sub-tab is functionally similar to the GS Reads sub-tab for all project types. The only difference is that when a directory containing FASTA/FASTQ files is selected, the software examines the files in that directory to determine which files are FASTA/FASTQ files. This can take some time if there are many files in the directory, so a progress bar is displayed to show the progress of the search. See the Overview Manual, Section 2.2.2 for a discussion of the FASTQ format.
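To get a feel for why this directory scan can take time, the following Python sketch performs a comparable check outside the application. It is not the software's own detection logic: the function names are hypothetical, and it assumes a simple heuristic in which the first non-blank line of a FASTA file starts with `>` and that of a FASTQ file starts with `@`.

```python
from pathlib import Path

def sniff_read_format(path):
    """Guess 'fasta' or 'fastq' from the first non-blank line, else None."""
    try:
        with open(path, "r", errors="replace") as handle:
            for line in handle:
                stripped = line.strip()
                if not stripped:
                    continue  # skip leading blank lines
                if stripped.startswith(">"):
                    return "fasta"
                if stripped.startswith("@"):
                    return "fastq"
                return None  # first content line matches neither format
    except OSError:
        return None  # unreadable file: treat as not a read file
    return None  # empty file

def scan_directory(directory):
    """Map each file name in the directory to its guessed read format."""
    found = {}
    for entry in sorted(Path(directory).iterdir()):
        if entry.is_file():
            fmt = sniff_read_format(entry)
            if fmt is not None:
                found[entry.name] = fmt
    return found
```

Every file in the directory has to be opened, so the cost grows with the number of files. This heuristic is deliberately loose (any text file beginning with `@` would pass); a stricter check would also validate the record structure.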
Order of addition of Read data may affect assembly results: In general, the reads should be added to an incremental assembly in the following order to achieve the best contigging and scaffolding results:
Except for the Name column and the Multiplex column (which contains comma-delimited lists of the MIDs associated with the file; see Section 1.6.3, below), all columns with Run statistics are initially filled with dashes. These data are updated in the table each time the project runs to completion. For a project that has already been through an assembly computation, summary statistics relating to the usage of the reads in the assembly process are also listed, as shown in Figure 17. On the left-hand side of the reads table is a data export button that can be used to write the reads table data in CSV, tab-delimited or plain text format.
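If you post-process an exported table, note that a comma-delimited Multiplex value has to be quoted in CSV output so that it stays in one column. The Python sketch below illustrates the idea under stated assumptions: the function `export_reads_table`, the column name “Number of Reads” and the MID labels are hypothetical, and the exact layout written by the application's export button may differ.

```python
import csv
import io

def export_reads_table(rows, columns, fmt="csv"):
    """Serialize table rows as CSV or tab-delimited text (hypothetical helper).

    Missing values are written as a dash, as in a table that has not
    yet been filled in by a completed run.
    """
    delimiter = {"csv": ",", "tab": "\t"}[fmt]
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    writer.writerow(columns)
    for row in rows:
        writer.writerow([row.get(col, "-") for col in columns])
    return buffer.getvalue()

# Illustrative data only: the file name and MID labels are made up.
rows = [{"Name": "reads.sff", "Multiplex": "MID1,MID2"}]
columns = ["Name", "Multiplex", "Number of Reads"]
print(export_reads_table(rows, columns, fmt="csv"))
print(export_reads_table(rows, columns, fmt="tab"))
```

In the CSV form the Multiplex value is wrapped in double quotes because it contains the delimiter; in the tab-delimited form no quoting is needed.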