MMETSP1354-20130828
-------------------
stats.txt: Summary statistics associated with contigs.fa. Includes
the total number of sequences and bases in the contig set, N50,
etc. Q1, Q2, and Q3 are the quartiles of the reported contig
lengths. B1000 and B2000 indicate the percentage of bases in
contigs of at least 1000 bp and 2000 bp, respectively.
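
The sketch below shows one way such statistics could be recomputed
from a FASTA file. It is not the pipeline's own code. It assumes the
common N50 definition (the contig length at which the running total
of bases, longest contig first, reaches half of all bases) and
Python's "inclusive" quartile interpolation; the values in stats.txt
may have been computed with different conventions. All function
names are hypothetical.

```python
from statistics import quantiles


def read_fasta_lengths(lines):
    """Return the length of every sequence in FASTA-formatted lines."""
    lengths = []
    current = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif line and current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Assumed definition: walk contigs from longest to shortest and
    return the length at which half of all bases have been covered."""
    half = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length
    return 0


def percent_bases_at_least(lengths, cutoff):
    """Percentage of all bases found in contigs of length >= cutoff."""
    total = sum(lengths)
    if total == 0:
        return 0.0
    return 100.0 * sum(n for n in lengths if n >= cutoff) / total


def summarize(lengths):
    """Collect the statistics named above (needs at least two contigs)."""
    q1, q2, q3 = quantiles(lengths, n=4, method="inclusive")
    return {
        "sequences": len(lengths),
        "bases": sum(lengths),
        "N50": n50(lengths),
        "Q1": q1,
        "Q2": q2,
        "Q3": q3,
        "B1000": percent_bases_at_least(lengths, 1000),
        "B2000": percent_bases_at_least(lengths, 2000),
    }
```

To apply it to a real file, pass an open file handle for contigs.fa
to read_fasta_lengths() and the resulting list to summarize().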

--
contigs.fa: Contigs from the assembly, minimum 150 bp. May include
UTRs. Sequences contain IUPAC ambiguity codes representing
ambiguous bases; see
http://www.bioinformatics.org/sms/iupac.html.
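
As an illustration of what the ambiguity codes mean in practice,
the following sketch lists the standard IUPAC nucleotide ambiguity
codes and counts how many positions in a sequence use one. The
function names are hypothetical and are not part of any tool used
to produce these files.

```python
# Standard IUPAC nucleotide ambiguity codes and the bases each can stand for.
IUPAC_AMBIGUITY = {
    "R": "AG",
    "Y": "CT",
    "S": "CG",
    "W": "AT",
    "K": "GT",
    "M": "AC",
    "B": "CGT",
    "D": "AGT",
    "H": "ACT",
    "V": "ACG",
    "N": "ACGT",
}


def count_ambiguous(sequence):
    """Count positions that carry an ambiguity code (case-insensitive)."""
    return sum(1 for base in sequence.upper() if base in IUPAC_AMBIGUITY)


def ambiguous_positions(sequence):
    """Return (0-based position, code, possible bases) for each ambiguous base."""
    return [
        (i, base, IUPAC_AMBIGUITY[base])
        for i, base in enumerate(sequence.upper())
        if base in IUPAC_AMBIGUITY
    ]
```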

--
cds.fa: Coding regions associated with contigs, as predicted by
ESTScan, minimum 50 bp. Sequence identifiers for these predicted
CDS are given the suffixes _1, _2, etc., to accommodate multiple
predictions per contig and to indicate association with the
predicted protein products. Sequences contain IUPAC ambiguity
codes representing ambiguous bases; see
http://www.bioinformatics.org/sms/iupac.html.
Note that the total number of predicted CDS might be higher or
lower than the number of contigs. This can be due to the reporting
threshold of 50 bp or to multiple predictions per contig.

--
peptides.fa: Protein products associated with contigs, as
predicted by ESTScan, minimum 30 aa. Sequence identifiers for these
predicted products correspond to the associated nucleotide
sequence in contigs.fa and are given the suffixes _1, _2, etc., to
accommodate multiple predictions. Note that the total number of
predicted peptides might be higher or lower than the number of
contigs. This can be due to the reporting threshold of 30 aa or to
multiple predictions per contig.
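
The suffix convention used in cds.fa and peptides.fa can be undone
with a small helper such as the one sketched here. It assumes that
the suffix is always the final "_<number>" of the identifier and
that everything before it is the contig identifier from contigs.fa;
the identifiers in the test data are made up, and the function
names are hypothetical.

```python
import re
from collections import Counter

# Final "_<digits>" is taken to be the prediction suffix (an assumption).
_SUFFIX = re.compile(r"^(?P<contig>.+)_(?P<index>\d+)$")


def split_prediction_id(seq_id):
    """Split 'contigID_N' into (contigID, N); the contig ID may itself
    contain underscores, so only the last suffix is removed."""
    match = _SUFFIX.match(seq_id)
    if match is None:
        raise ValueError(f"no _N suffix in identifier: {seq_id!r}")
    return match.group("contig"), int(match.group("index"))


def predictions_per_contig(seq_ids):
    """Count how many predicted CDS or peptides each contig has."""
    return Counter(split_prediction_id(seq_id)[0] for seq_id in seq_ids)
```

Contigs that appear in contigs.fa but not in the resulting counter
had no prediction above the reporting threshold, while counts above
one reflect multiple predictions for the same contig.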

--
annot/pfam.gff3, ...: Models matching predicted protein products
(peptides.fa), reported in GFF3 format; based on HMMER3 searches
against the Pfam-A, Superfamily, and TIGRFAMs model sets. Hits are
restricted to a full-sequence E-value <= 1.0e-5, with the top
five hits reported.

Association with InterPro terms is indicated in the Ontology_term
attribute, and is based on the assertions (InterPro -> model,
InterPro -> protein accession) made by InterPro. Currently,
InterPro associations from Superfamily hits are not computed.
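
A minimal sketch of reading the Ontology_term attribute from a GFF3
line follows. It relies only on the general GFF3 conventions (nine
tab-separated columns, ";"-separated tag=value pairs in column 9,
","-separated multiple values, URL-style escaping). The exact
layout of the annot/*.gff3 files is not described here, so treat
the function names and any sample line as hypothetical.

```python
from urllib.parse import unquote


def parse_gff3_attributes(column9):
    """Parse GFF3 column 9 into a dict mapping each tag to a list of values."""
    attributes = {}
    for field in column9.strip().split(";"):
        if not field:
            continue
        key, sep, value = field.partition("=")
        if not sep:
            continue  # skip malformed fields that have no '='
        attributes[unquote(key)] = [unquote(v) for v in value.split(",")]
    return attributes


def ontology_terms(gff3_line):
    """Return the Ontology_term values of one GFF3 feature line, or []."""
    if gff3_line.startswith("#"):
        return []  # comment or directive line
    columns = gff3_line.rstrip("\n").split("\t")
    if len(columns) != 9:
        return []
    return parse_gff3_attributes(columns[8]).get("Ontology_term", [])
```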

annot/swissprot.gff3: Protein sequence accessions matching
predicted protein products (peptides.fa), reported in GFF3 format;
based on NCBI BLASTP searches against SwissProt. Hits are
restricted to the five top-scoring HSPs (by bit score) with
E-value <= 1.0e-20.
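
The reporting rule above can be pictured as the filter sketched
below. This is an interpretation, not the pipeline's code: it
assumes the E-value cutoff is applied first and the five highest
bit scores are then kept per query sequence. The Hit record and the
function name are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Hit(NamedTuple):
    """Hypothetical record for one HSP of a peptide against SwissProt."""
    query: str
    subject: str
    evalue: float
    bitscore: float


def select_hits(hits, max_evalue=1.0e-20, top_n=5):
    """Keep hits with evalue <= max_evalue, then the top_n bit scores per query."""
    by_query = defaultdict(list)
    for hit in hits:
        if hit.evalue <= max_evalue:
            by_query[hit.query].append(hit)
    return {
        query: sorted(kept, key=lambda h: h.bitscore, reverse=True)[:top_n]
        for query, kept in by_query.items()
    }
```

The same shape of filter, with max_evalue=1.0e-5, would correspond
to the rule stated for the HMMER3-based annotation files, again
under the assumption that the cutoff is applied before ranking.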

------------------------------------
National Center for Genome Resources
http://www.ncgr.org











