For details on the algorithm, please see A clustering approach for identification of enriched domains from histone modification ChIP-Seq data, Chongzhi Zang, Dustin E. Schones, Chen Zeng, Kairong Cui, Keji Zhao, and Weiqun Peng, Bioinformatics (2009) ############################################# ############## Installation ################ ############################################# Installation of SICER only requires unpacking the files in the SICER.tgz file. Prerequisites include: I. Install the numpy and scipy packages. More information on this can be found at: http://www.scipy.org/. To check whether numpy and scipy are properly installed, please run $python >>> import numpy >>> import scipy If there is no error message, this step is done. II. Define environment variables. Please open SICER.sh and SICER-rb.sh, replace {PATHTO} in the definition of $SICER with the directory where you want your SICER to be. For alternative approaches, please see the additional notes below. ############################################# ############## Running SICER ############### ############################################# The raw data needs to be in the BED format. See the test.bed file in the ex/ directory for an example. SICER can be run with and without a control library. Examples on running SICER under either condition are included below. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Running SICER with a control library ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Copy the shell script SICER.sh to the directory where test.bed and control.bed is stored. Starting from the two raw bed, to find significant islands with a window size of 200bp and a gap size of 600bp and FDR=1E-3 run: $sh SICER.sh test.bed control.bed 200 600 1E-3 There are a number of other parameters that need to be set, as explained in the shell script SICER.sh. You can also redefine the directory structure so that you don't have to have the code and the data under the same directory,etc. The output file (test-W200-G600-islands-summary-FDR1E-3) has the format: chrom, start, end, ChIP_island_read_count, CONTROL_island_read_count, p_value, fold_change, q_value Note: 1) The significance of all candidate islands are stored in file test-W200-G600-islands-summary. If you want to try a different FDR without changing other parameters, there is no need to run the entire SICER.sh again. only the last substep in SICER.sh needs to be rerun, which can be done by commenting out the previous substeps. 2) Pvalue can also be used to control significance with proper adjustment in the last substep of SICER.sh. Fold change is reported in test-W200-G600-islands-summary as well. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Running SICER without a control library, using a random background ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Copy the shell script SICER-rb.sh to the directory where the raw bed file test.bed is stored. Starting from test.bed, to find significant islands under random background model with window_size of 200 bp and gap size of 400bp and an E-value of 100, run: $sh SICER-rb.sh test.bed 200 400 100 The shell script SICER-rb.sh contains the other parameters that needs to be set, with explanation about what they are and how to change them. You can also redefine the directory structure so that you don't have to have the code and the data under the same directory,etc. The output island file (with appendix .probscoreisland) is also in bed format: chrom start end island_score ############################################# ############## Additional Notes ############ ############################################# 1) Alternative approaches to defining $PYTHONPATH. 1a) You can define $SICER and $PYTHONPATH as global environment variables, so that the modules under /lib are always recognized and the python modules can run on their own without shell script. To do this, Please edit Utility/setup.sh and replace {PATHTO} with the directory under which you will put SICER. Then incorporate the content in setup.sh into the bash configuration file .bash_profile under your home directory. After pasting the content to .bash_profile, please run $source .bash_profile Then all your newly created shells will know $SICER and lib/ . To check, please run $echo $SICER $echo $PYTHONPATH Note setup.sh is applicable only to bash. If you use other shells, contents in setup.sh needs to be modified accordingly. 1b) The above approach depends on the shell used. A shell-independent approach is to insert a sitecustomize.py under ${pythondir}/lib/site-packages/. sitecustomize.py is a special script; Python will try to import it on startup, so any code in it will be run automatically. If sitecustomize.py does not exist, then add it to ${pythondir}/lib/site-packages/. If sitecustomize.py exists under ${pythondir}/lib/site-packages/, then edit it. In sitecustomize.py, please add (if not there already) import sys sys.path.append("{PATHTO}/SICER/lib") 1c) If none of the above works, copy the modules under /lib and /utility to /src, then you are good to go. 2) There are a number of intermediate output files in addition to the final results,see the SICER.sh or SICER-rb.sh for explanation. 3) There are a number of modules under utility/, quite useful for additional analysis: filter_raw_tags_by_islands.py: identify all reads that are on significant islands filter_summary_graphs.py: identify all summary graphs that are on significant islands find_overlapped_islands.py: compare two sets of islands and identify unique and overlapped ones get_windows_histogram.py: generate window read-count statistics islands_statistics_pr.py: generate island score and length statistics slice_raw_bed.py: randomly sample a given number of reads from a raw read library for satuaration analysis. For questions, please email chongzhizang@gmail.com, schonesde@nhlbi.nih.gov, wpeng@gwu.edu