======================================= Blog posts and additional documentation ======================================= Hashtable and filtering ======================= The basic inexact-matching approach used by the hashtable code is described in this blog post: http://ivory.idyll.org/blog/jul-10/kmer-filtering A test data set (soil metagenomics, 88m reads, 10gb) is here: http://angus.ged.msu.edu.s3.amazonaws.com/88m-reads.fa.gz The `filter-exact `__ script can be used as a starting point. Illumina read abundance profiles ================================ khmer can be used to look at systematic variations in k-mer statistics across Illumina reads; see, for example, this blog post: http://ivory.idyll.org/blog/jul-10/illumina-read-phenomenology The `fasta-to-abundance-hist `__ and `abundance-hist-by-position `__ scripts can be used to generate the k-mer abundance profile data:: %% python scripts/fasta-to-abundance-hist.py %% python scripts/abundance-hist-by-position.py ..hist > out e.g.:: %% python scripts/fasta-to-abundance-hist.py 88m-reads.fa 17 %% python scripts/abundance-hist-by-position.py 88m-reads.fa.17.hist > 17.txt The hashtable method 'dump_kmers_by_abundance' can be used to dump high abundance k-mers, but we don't have a script handy to do that yet. The quality trimming script is in `scripts/quality-trim.py `__:: %% python scripts/quality-trim.py You can assess high/low abundance k-mer distributions with the `hi-lo-abundance-by-position script `__:: %% python scripts/hi-lo-abundance-by-position.py This will produce two output files, .pos.abund=1 and .pos.abund=255.