Program: Data preprocessing for NSMAP
Authors: Zheng Xia
Version: 0.1.0
Date: December, 2010

This package reuses some code from IsoInfer. The TSS and PAS are retrieved from annotated isoforms downloaded from the UCSC known gene table.

The preprocessing consists of the following steps:

1. Extract the gene ranges and the TSS/PAS into the files "GeneRange" and "TSSPAS":

   perl knownGeneExtractor.pl knownGene.txt

2. Get the boundaries from the output of TopHat into the file "Bound":

   python IsoInfer_Bound_Junction_read_from_tophat.py

3. Get the junction information from the output of TopHat. The output file name may need to be changed:

   python IsoInfer_reads_mapping_position.py

4. Run Tophat_extraction to generate the final input for NSMAP. Tophat_extraction uses some code from IsoInfer. For more detail, please refer to IsoInfer at http://www.cs.ucr.edu/~jianxing/IsoInfer0.9.1.html
   The command is:

   ./Tophat_extraction -bound Bound -grange GeneRange -tsspas TSSPAS -intron_exp 3 -min_dup 2 -min_exp 0.1 -read_info read_info -read_len 76 -o nsmap.input

   Here 76 is the read length. The output is nsmap.input, which will be used by NSMAP.

The Tophat_extraction executable has been tested under 32-bit Ubuntu.
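
The four steps above could be chained from a small Python driver. This is only a sketch and is not part of the package: the names build_pipeline and run_pipeline are hypothetical, and it assumes that all scripts and the Tophat_extraction executable sit in the current working directory and that each step writes its files where the next step expects them. The commands and parameter values are copied from the steps listed above.

```python
import subprocess


def build_pipeline(read_len=76, known_gene="knownGene.txt", output="nsmap.input"):
    """Return the four preprocessing commands, in order, as argument lists.

    Hypothetical helper: only the read length, the known gene table and the
    output file name are exposed; the other values are the ones from the README.
    """
    return [
        # Step 1: write the "GeneRange" and "TSSPAS" files.
        ["perl", "knownGeneExtractor.pl", known_gene],
        # Step 2: write the "Bound" file from the TopHat output.
        ["python", "IsoInfer_Bound_Junction_read_from_tophat.py"],
        # Step 3: extract the junction information from the TopHat output.
        ["python", "IsoInfer_reads_mapping_position.py"],
        # Step 4: generate the final input for NSMAP.
        ["./Tophat_extraction",
         "-bound", "Bound", "-grange", "GeneRange", "-tsspas", "TSSPAS",
         "-intron_exp", "3", "-min_dup", "2", "-min_exp", "0.1",
         "-read_info", "read_info", "-read_len", str(read_len),
         "-o", output],
    ]


def run_pipeline(commands):
    """Run the commands in order and stop at the first one that fails."""
    for argv in commands:
        print("Running:", " ".join(argv))
        subprocess.run(argv, check=True)
```

Calling run_pipeline(build_pipeline(read_len=76)) would execute the steps one after another. Building the commands separately from running them makes it possible to print and check them first.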