-----------------------------------------------------------------------------------
NOTE:

For naming consistency, the files below now correspond to the following files
downloadable with the Thaps3/Thaps3_bd annotation releases:

Thaps3_chromosomes_assembly_chromosomes.fasta   chromosome.fasta      Chromosome fasta files
Thaps3_assembly_organelle.fasta                 organelle.fasta       Mitochondria and Chloroplast fasta files
Thaps3_bd_unmapped_assembly_scaffolds.fasta     Bottom_Drawer.fasta   Bottom_Drawer fasta files

Chromosome names in the Thaps3_assembly_chromosomes.fasta file have also been
standardized to chr_* format, e.g. chr_1, chr_2, ....

Scaffold names in the Thaps3_bd_assembly_scaffolds.fasta file have also been
standardized to bd_* format. Note that the bd_* names may be changed at the
time that JGI submits gene model data to GenBank, in order to comply with
GenBank requirements.

FYI:
* The Bottom_Drawer (i.e. 'unmapped') fasta file contains sequences from the
  finished assembly that were not placed into chromosomes or organelles.
* The Thaps3_bd release consists of an annotation of the sequences in the
  'BOTTOMDRAWER' file that came with the Thaps3 assembly; after the original
  Thaps3 (i.e. 'chromosomes') annotation release, the additional sequences in
  the 'BOTTOMDRAWER' were determined to contain important Thalassiosira
  pseudonana genes. For that reason, the Thaps3_bd annotation was performed.

--Robert P. Otillar
  Thaps3 Lead Annotator
  DOE Joint Genome Institute
  RPOtillar@lbl.gov
  2008 10 14
-----------------------------------------------------------------------------------

README Thalassiosira_pseudonana_v3.031306

Chromosomes were numbered based on the Optical Map and Science 306, 79-86

This release is composed of 3 fasta files:

chromosome.fasta      Chromosome fasta files
organelle.fasta       Mitochondria and Chloroplast fasta files
Bottom_Drawer.fasta   Bottom_Drawer fasta files

Gaps are represented by Ns. The number of Ns is an estimate of the gap size.
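
As an illustration of the gap convention above, the following Python sketch
lists each run of Ns in a sequence string together with its length, i.e. the
estimated gap size. The function name find_gaps and the example sequence are
hypothetical and are not part of this release; positions are reported as
1-based inclusive coordinates.

```python
import re

def find_gaps(sequence):
    """Return (start, end, length) for each run of N/n (1-based, inclusive)."""
    gaps = []
    for match in re.finditer(r"[Nn]+", sequence):
        start = match.start() + 1  # convert 0-based offset to 1-based position
        end = match.end()          # 0-based exclusive end equals 1-based inclusive end
        gaps.append((start, end, end - start + 1))
    return gaps

# Made-up example sequence, not taken from the assembly.
example = "ACGTNNNNNACGTACGTNNACG"
for start, end, length in find_gaps(example):
    print(start, end, length)
```

For real use, each fasta record would be read and concatenated into one string
before being passed to find_gaps. Because the number of Ns is only an estimate,
the reported lengths should be treated as approximate gap sizes.
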
When contigs are linked, they are orientated relative to one another. When
contigs are not linked they are represented as different scaffolds (a, b, c,
etc.) and unless there is some other guide (i.e. a telomere on one end of a
contig) they are not necessarily orientated correctly.

Some scaffolds could not be placed on the genome. This could mean either that
they fall inside unresolved gaps or that they are polymorphic representations
of areas already captured on a chromosome. Some of these scaffolds, however,
did contain genes based on EST evidence. These scaffolds have been included in
this release in the 'Bottom Drawer'.

Telomere signature sequence:

ACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAAC
CCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCC
TAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCCTAACCCT
AACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAA
CCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACC
CTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCT

Due to the polymorphic nature of this genome, the strategy for improvement was
based on targets, a target being a low quality region or gap as identified by
the JAZZ assembly. Improvement was carried out on these targets only.
Therefore the areas of the draft assembly that were considered high quality by
the JAZZ assembly were not necessarily verified. The assembly is a mosaic of
the 2 haplotypes.

Chromosome    SHGC_PID       JGI_Scaffold/s   x draft coverage
1             5050           1                12.64
2             5052           2                12.10
3             5060           6,22             12.29, 12.65
4             5056           4                12.70
5             5054           3                12.76
6             5058           5                12.55
7             5061           7,12             12.58, 12.51
8             5069           21,15            15.19, 12.05
9             5077           40,23,27,34      13.19, 13.07, 12.22, 11.59
10            5062           8                13.17
11            5071           44,17,36         13.38, 12.40, 11.58
12            5080           25,26,37         12.08, 12.34, 12.64
13            5063           9                13.02
14            5064           10               17.41
15            5065           11               12.02
16            5074           20               12.09
17            5070           16               12.78
18            5067           13               12.52
19            5073           19,31,29         7.90, 11.87, 12.25
20            5068           14               15.52
21                           19,24            ***Duplicated not included in this release***
22            5072           28,18            13.28, 12.27
23            5078           24               11.85
24            5084           30               12.42
mitochondria  from GenBank   ?
chloroplast   5090           ?

Chromosome   SIZE (bp)    #Scaffolds   # Captured Gaps   Telomere(s)?
1            3042585      1            1                 1
2            2707195      1            1                 2
3            2440052      1            0                 1
4            2402323      1            2                 1
5            2305972      1            0                 1
6            2071480      1            0                 1
7            1992434      1            0                 2
8            1267198      1            0                 1
9            1191060      1            0                 1
10           1105668      1            0                 1
11a          806142       1            1                 1
11b          82843        1            0                 1
12           1128382      1            5                 2
13           1052196      1            0                 1
14           998643       1            0                 0
15           931268       1            1                 2
16a          501076       1            0                 1
16b          173712       1            0                 1
17           659924       1            0                 1
18           827053       1            0                 2
19a          607239       1            3                 1
19b          151677       1            0                 0
19c          291194       1            1                 0
20           800234       1            1                 2
21
22           1057565      1            3                 1
23           454954       1            0                 0
24           297359       1            0                 1

TOTAL        31,347,428 bp   27        19                29

Mitochondria    43827 bp
Chloroplast     128814 bp
Bottom Drawer   1,176,344 bp

NOTES:

Chromosome
1    Repeat gap spanned by TDO21-H02 and TDO49-G24. Missing one telomere.
2    Repeat gap spanned by TDO13-I20 and TDO89-B04.
3    Complete
4    Repeat gaps spanned by TDO89-I15. Missing one telomere.
5    Missing one telomere
6    Missing one telomere
7    Complete
8    Missing one telomere
9    Missing one telomere
10   Missing one telomere
11   Scaffolds 11a and 11b are separated by sequence that appears to converge
     with Chromosome 22. Due to the polymorphisms we were not able to tell
     definitively which part of Chromosome 22 was involved in the duplication.
12   Repetitive chromosome. All 5 gaps are repeat gaps which may be longer
     than fosmid lengths.
13   Missing one telomere
14   Missing both telomeres. NOTE: Transposon sequencing failed to verify
     492100-493000. The number of repeat copies in 492100-493000 is uncertain.
15   Repeat gap spanned by TDO65-L04 and TDO13-C17.
16   2 scaffolds, separated by repeats larger than a fosmid length.
17   Missing one telomere
18   Complete
19   Very confused. Still in 3 scaffolds with no evidence of any linking
     between the scaffolds. Missing one telomere.
20   Repeat gap spanned by TDO21-B13 and PQJ03-C10.
21   Duplicated chromosome from Optical map. Not included in this release.
22   Repetitive chromosome. All 3 gaps are repeat gaps. Missing one telomere.
23   Missing both telomeres
24   Missing one telomere
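
The TOTAL row of the chromosome size table can be cross-checked by summing the
per-scaffold columns. The Python sketch below does this with values copied
from that table; the variable names are illustrative only and nothing here is
part of the release itself.

```python
# (scaffold name, size in bp, captured gaps, telomeres), copied from the size table.
# Every listed row is a single scaffold; chromosome 21 is absent because it is
# not included in this release.
ROWS = [
    ("1", 3042585, 1, 1), ("2", 2707195, 1, 2), ("3", 2440052, 0, 1),
    ("4", 2402323, 2, 1), ("5", 2305972, 0, 1), ("6", 2071480, 0, 1),
    ("7", 1992434, 0, 2), ("8", 1267198, 0, 1), ("9", 1191060, 0, 1),
    ("10", 1105668, 0, 1), ("11a", 806142, 1, 1), ("11b", 82843, 0, 1),
    ("12", 1128382, 5, 2), ("13", 1052196, 0, 1), ("14", 998643, 0, 0),
    ("15", 931268, 1, 2), ("16a", 501076, 0, 1), ("16b", 173712, 0, 1),
    ("17", 659924, 0, 1), ("18", 827053, 0, 2), ("19a", 607239, 3, 1),
    ("19b", 151677, 0, 0), ("19c", 291194, 1, 0), ("20", 800234, 1, 2),
    ("22", 1057565, 3, 1), ("23", 454954, 0, 0), ("24", 297359, 0, 1),
]

total_size = sum(size for _, size, _, _ in ROWS)
total_scaffolds = len(ROWS)
total_gaps = sum(gaps for _, _, gaps, _ in ROWS)
total_telomeres = sum(tel for _, _, _, tel in ROWS)

print(total_size, total_scaffolds, total_gaps, total_telomeres)
# prints: 31347428 27 19 29
```

These sums agree with the TOTAL row (31,347,428 bp, 27 scaffolds, 19 captured
gaps, 29 telomeres). The organelle and Bottom Drawer sizes are listed
separately and are not part of this total.
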