QuEST 2.4

Table of Contents

I. License and Terms of Use
II. Disclosures
III. QuEST Reference and Home Page [renumber subsequent sections]
IV. Changes
V. Software summary
VI. Quick start guide
VII. File formats
VIII. QuEST overview
IX. FAQ

===============================================================
I. Licence and Terms of Use
===============================================================

This code is written by Anton Valouev and is copyrighted under
the Lesser GPL:

Copyright (C) 2008 Anton Valouev

 This program is free software; you can redistribute it and/or modify 
 it under the terms of the GNU Lesser General Public License as       
 published by the Free Software Foundation; version 2.1 or later.    
 This program is distributed in the hope that it will be useful,     
 but WITHOUT ANY WARRANTY; without even the implied warranty of      
 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
 See the GNU Lesser General Public License for more details. 
 You should have received a copy of the GNU Lesser General Public 
 License along with this program; if not, write to the Free Software
 Foundation, Inc., 59 Temple Place - Suite 330, Boston,
 MA 02111-1307, USA.
 
 The author may be contacted via email at: valouev@stanford.edu

==================================================================
II. Disclosures
==================================================================

current version: 2.4
current development status: stable release

#################################################################
#                                                               #
#   If you discover any bugs or have any suggestions,           #
#   please contact Anton Valouev (valouev@stanford.edu).        #
#                                                               #
#################################################################

==================================================================
III. QuEST Reference and Home Page

==================================================================
QuEST is a statistical package for analysis of ChIP-Seq data.

If you use QuEST or any derivatives of its results, or any part of
QuEST for any purpose, please cite the following in your 
publications, patents and code:

 Valouev A, Johnson DS, Sundquist A, Medina C, Anton E, Batzoglou S,
 Myers RM, Sidow A (2008). Genome-wide analysis of transcription
 factor binding sites based on ChIP-Seq data.
 Nat Methods 5, 829-834 (2008)

QuEST Home Page:
http://mendel.stanford.edu/downloads/quest/

==============================================================
IV. Changes
==============================================================
QuEST 2.4
   1. Incorporated region filter. If background data is provided, QuEST
      will scan the background data to detect artifactual regions where
      where enrichment fold of the control data exceeds 100 fold and
      will extend until 5 fold enrichment boundary. Any ChIP regions
      overlapping artifactual control regions will be filtered out
   2. Added eland_extended format for the input data
   3. Added bowtie format for the input data (bowtie alignments generated
      with -z flag are not supported. -z flag will cause bowtie to report
      alignments in non-contiguous order. Currently stick to bowtie
      without -z flag).
   4. Added maq alignment code. Output of maq mapview command can now
      be directly feeded into QuEST.
   5. Added report files for importing alignments (check align_2_QuEST
      report file under module outputs for the import error log)
   6. Added report file for the metrics module (check metrics.ChIP.report.txt
      under the module outputs directory). You can now find out how
      much ChIP/control data false into the called regions. This is a good
      metric for the ChIP quality
   7. Added report file for the peak filter module (check the peak_filter.ChIP.report.txt
      under the module outputs directory)
   8. Added report file for the QuEST parametrization (check the QuEST_parametrization.report.txt
      for the complete set of QuEST parametrization parameters)
      
QuEST 2.3
   1. QuEST ChIP scores (after normalization) now represent fold enrichment
      and are thus more interpretable
   2. QuEST configuration script now operates with fold enrichments rather
      than KDE score cutoffs. Internally QuEST still uses KDE scores, but
      they are now not visible to the user directly and parametrization
      occurs using normalized ChIP scores (fold enrichment)
   3. Eliminated rescue fold, stringency relative to background (input) can
      be set using ChIP_to_background fold enrichment parameter during parametrization
   4. Added a filter to the "TF with motif and a narrow (punctate) binding site"
      option that only reports a single peak per region and removes regions with
      no peaks in the "peak_caller.ChIP.out.accepted" file. All initial regions
      and peaks before filtering can be found in the "peak_caller.ChIP.out.with_metrics"
      file
   5. Added tag collapsing that removes stacks from the input data. 
   6. Removed oversampling metric throughout as it wasn't found to be very useful
   7. Substituted Bonferroni correction with Benjamini-Hochberg correction for p-values
   8. Added -log10_pv field to the output file to provide p-values

QuEST 2.2
   1. added a new data visualization output through the bed graph UCSC genome browser display
   3. improved region boundary detection by applying "ChIP extension threshold". Now
      regions are seeded using "ChIP threshold" and boundaries are defined
      by the "ChIP extension threshold".
      

QuEST 2.0
   Parts of QuEST were rewritten to significantly speed up the execution.
      
   QuEST now incorporates a number of improvements and additional
   features listed below. QuEST now produces bed and wig files 
   that can be viewed on the UCSC genome browser. 

   1. Global peak shift estimation.
      Global peak shift estimation is now completely re-written. In
      the initial pass candidate regions are identified based on the 
      tag count threshold and tag count enrichment threshold compared to control
      experiment. For each candidated region local peak shift estimate
      is given by the distance for which correlation between opposit
      strand profiles is maximized. The global peak shift estimate
      is given by the mode of the peak shift density estimate
      obtained using KDE on local peak shift estimates from candidate
      regions. This approach provides substantial improvement over 
      previously implemented method.

   2. Local peak shift estimate
      Local peak shift estimate is a new metric that can be used 
      to filter out regions where peak shift falls outside the 
      expected range (typically 30-100 bp). Local peak shift estimate
      is given by the shift between opposite strand density profiles
      where correlation between profile values within a window is maximized.

   3. Strand correlation values
      Strand correlation values reported by QuEST is a new metric
      that can be used to filter out regions and peaks. Strand correlation
      measures similarity between density profiles of +/- strands
      after correcting for the local peak shift. Regions with low values
      of correlation (0.5 or less) are likely to be artifacts and not
      real regions of enrichment.

   4. Tag oversampling
      Tag oversampling is a new metric that can be used to filter out regions
      resulting from mapping artifacts and repetetive regions. Oversampling
      measures the ratio of observed mean of tag instances at positions
      with mapped tags within the region compared to expectation of tag instances
      under uniform distribution. High values of oversampling usually indicate
      outliers such as loci with only a few positions where reads start but
      with many reads starting at these few positions. Outliers usually appear
      to have oversampling values of 10 or more. Good ChIP-enrichment regions
      usually have oversampling values of 1-5. Be aware that bottlenecked libraries
      (i.e. where the same template is typically sequenced multiple times) will
      result in high values of oversampling even in the correct regions.

   5. Q-values for regions
      Q-values for regions represent Poisson p-values based on the control
      experiment corrected for the number of tests (genome size). 

   6. Q-values for peaks
      Q-values for peaks represent approximated FDR for the peak intensity.
      Control data profile is first analyzed to build the peak num vs profile
      threshold curve which is than approximated by the exponential curve
      in the form c*exp(-score/mu) for large values of score. This curve is
      then applied with the scale to the ChIP data to approximate FDN for
      any score value and the q-value is then calculated by deviding FDN
      by the total number of peaks identified.


===============================================================
V. Software Summary
===============================================================

Target OS: Unix-type systems

QuEST was tested on Linux ES with kernel 2.6.9, and Mac OS 10.5.
Software requirements: perl, make, g++ compiler
Hardware requirements: 1Gb of RAM, ~ 20 Gb of disc space for 
temporary files. 

===============================================================

What you will need to run QuEST: 

1. ChIP data

   Reads from the ChIP-Seq experiment. 
   
2. RX_noIP data
  
   Reads from the RX_noIP experiment. This is also known
   as total chromatin or input.

The following alignment formats are currently supported in QuEST (see below for the format descriptions): 
	     - Solexa align
	     - Eland align
	     - Eland extended align
	     - Bowtie align (except for generated with -z option)
	     - MAQ align (generated by running MAQ mapview command)	
	     - QuEST align

3. Reference genome or genome table (see below for the format 
   description)

   For the reference genome, there should be one contig per .fa file.
   Alternatively, you can provide a genome table containing chromosome
   names and their sizes (see below for the detailed format 
   description). Supported formats: fasta, genome table. Note that 
   you need either the analysis path or the genome table, but not 
   both, or else the script will not run.

==============================================================
VI. Quick Start Guide
==============================================================

To run QuEST, the following commands should execute on your system:
make, g++, perl. 

1. Configure QuEST to run with your data. 
   This will generate all the necessary configuration files and 
   thresholds:

   command: generate_QuEST_parameters.pl

   flags:   
	    -solexa_align_ChIP <file containing ChIP reads in
			       solexa align format (see below)>

	    -solexa_align_RX_noIP <file containing RX_noIP 
				  (aka input/total chromatin) 
				  reads in solexa align format
				  (see below)>

	    -rp <path to the reference genome containing .fa files>
	    -ap <analysis path where the QuEST results and 
		all files will be located> 

            -silent (this will assume all the default parameters 
		    and will not prompt the user)

   example:

   $/home/agent_smith/QuEST/generate_QuEST_parameters.pl -solexa_align_ChIP /Volumes/Projects/GABP/GABP.align_25.hg18.txt -solexa_align_RX_noIP /Volumes/Projects/background/Jurkat_RX_noIP.align_25.hg18.txt -rp /Volumes/Projects/Genomes/Human/hg18/ -ap /Volumes/Projects/GABP/analysis/

   all flags and help:

   to list all flags execute "generate_QuEST_parameters.pl"

2. Run QuEST:

   command: run_QuEST_with_param_file.pl

   flags: 
          -ap <path to the analysis directory>

   example:

   $/home/agent_smith/QuEST/run_QuEST_with_param_file.pl -ap /Volumes/Projects/GABP/analysis/

3. QuEST will output the following files of interest:

QuEST summary: QuEST.out

QuEST calls:

  QuEST calls for ChIP peaks and regions: analysis_directory/calls/peak_caller.ChIP.out
  QuEST calls with additional metrics: analysis_direcotry/calls/peak_caller.ChIP.out.with_metrics
  QuEST calls that passed the quality filter: analysis_directory/calls/peak_caller.ChIP.out.accepted
  QuEST calls that did not pass the quality filter: analysis_directory/calls/peac_caller.ChIP.out.rejected

  pseudo_ChIP peak calls: peak_caller.pseudo_ChIP.out	

UCSC genome browser tracks (these are good for visualisation of data and analysis):

  QuEST calls bed track: analysis_director/tracks/ChIP_calls.filtered.bed

  ChIP data bed tracks: analysis_directory/tracks/data_bed_files/ChIP.bed.gz
  (these files can sometimes be too large to load into UCSC genome browser,
  but you can use by chromosome files instead)

  ChIP data bed tracks by chormosome: analysis_directory/tracks/data_bed_files/by_chr/ChIP

  RX_noIP data bed tracks: analysis_directory/tracks/data_bed_files/RX_noIP.bed.gz	
  (these files can sometimes be too large to load into UCSC genome browser,	
  but you can use by chromosome files instead)

  RX_noIP data bed tracks by chormosome: analysis_directory/tracks/data_bed_files/by_chr/RX_noIP

  normalized ChIP enrichment profiles: analysis_directory/tracks/wig_profiles/ChIP_normalized.profile.wig.gz
  normalized ChIP enrichment profiles by chromosome: analysis_directory/tracks/wig_profiles/by_chr/ChIP_normalized/

  unnormalized ChIP enriment profiles: analysis_directory/tracks/wig_profiles/ChIP_unnormalized.profile.wig.gz
  unnormalized ChIP enrichment profiles by chromosome: analysis_directory/tracks/wig_profiles/by_chr/ChIP_unnormalized/	

  normalized background enrichment profiles: analysis_directory/tracks/wig_profiles/background_normalized.profile.wig.gz
  normalized background enrichment profiles by chromosome: analysis_directory/tracks/wig_profiles/by_chr/background_normalized/

  unnormalized background enriment profiles: analysis_directory/tracks/wig_profiles/background_unnormalized.profile.wig.gz
  unnormalized background enrichment profiles by chromosome: analysis_directory/tracks/wig_profiles/by_chr/background_unnormalized/	

Report files:
   module_outputs/align_2_QuEST.ChIP.report.txt
   module_outpus/align_2_QuEST.RX_noIP.report.txt
   these two files contain information about problems that QuEST encountered when importing
   alignments

   module_outputs/metrics.ChIP.report.txt
   contains information about what percentage of ChIP and background fell within the called regions

   module_outputs/QuEST_parametrization.report.txt
   contains information about parametrization choices provided for running QuEST

   module_outputs/peak_filter.ChIP.report.txt
   contains information about result of application of QuEST filters on regions and peaks

==============================================================
VII. File Formats
==============================================================
The following input file formats are supported in QuEST:

1.  Read aligment formats
1.a Solexa/Illumina read alignment files (standard Solexa 2007 align file).
    The fields are space delimited (X means QuEST does not care about
    what is in that particular field as long as it is present). Each
    line corresponds to the alignment of one read and the coordinate
    gives the first base in the reference genome to which the read
    aligns. F/R specifies whether the alignment is to the sense or 
    antisense strand of the reference genome. Coordinates begin with 
    position 1.

Definition:
<DNA sequence> X X <chromosome:coordinate> <F/R>

Example:
GACTACACCTCTAACCACTACCATC 1250 1 chr4:72788542 F

1.b Solexa/Illumina eland/vmf alignment files (extended eland alignment
    files are not yet supported). Each line corresponds to a 
    single alignment, and aligment coordinate corresponds to the first
    aligned base relative to the reference genome.
    Fields are tab delimited (X means QuEST does not care about
    the value of a particular field as long as it is non-empty).
    F/R specifies whether the alignment is to the sense or antisense
    strand of the reference genome. Coordinates begin with position 1.

Definition:
X	<DNA sequence>	X	X	X	X	<chrom_name.fa>	<coordinate>	<F/R>

Example:
>HWI-EAS105_4:4:1:1664:1790     GAAAAGAACTGAAGGATCTCAAAAAT      U1      0       1       0       chr1.fa 37883447        F       DD 1C


1.c QuEST align file
    Sequencing read QuEST align files. Fields are space delimited; 
    each line corresponds to alignment of one read. Coordinate gives 
    position of the 5 prime base coordinate of the read relative to 
    the reference genome. +/- specifies whether the alignment is to 
    the sense or antisense strand of the reference genome. 
    Coordinates begin with position 0.

Definition:
<chromosome> <coordinate> <+/->

1.d Eland extended align file.
    will add description

Example:
>HWUSI-EAS270R:5:1:0:372#0/1    NGTCTCACTGGCAAATCCATTCCCTCACTGACATGA    1:1:1   chr4.fa:135597163RA13G11G3TGG1TG,chr8.fa:113733968FG35

1.e Bowtie align file (except for ones generated using -z option)
    will add description

Example: 
18213   +       chrX    19506600        CTGGTGTTAATGTTCAAGCTGTCTCTGGCTGGAGCT    IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII    21

Example:
chr1 23942576

1.e MAQ align file generated using MAQ mapview command
    will add description

Example:
HWUSI-EAS270R:5:59:47:243#0/1   chr10   3000454 -       0       0       0       0       0       0       0       9       18      36      AAGCTGTTTTCCTGCTACTCTCTGAGCCTGCTACCC    _\^`^`WP]\^`]^`\V`Z`\`\`Z```XX``T```

2. Reference genome files
2.a QuEST genome table file. Specifies sizes of the chromosomes
    to which the aligments were made. The entries are organized one
    chromosome per line, fields are space separated. Note that 
    chromosome names of the aligments and genome table file should 
    match or else QuEST will not use those reads.

Definition:
<chromosome> <size in bp>

Example:
chr18 76117153

2.b Reference genome fasta files. QuEST will accept standard fasta
    files with the reference sequence such as those that can be 
    obtained from UCSC genome browser. Chromosome names in fasta 
    files following '>' symbol must match chromosome names in the 
    alignment files or else QuEST will dismiss those reads. Space 
    is not allowed in the chromosome names. Chromosomes can be 
    organized one per file or all in one file, but make sure no two 
    same chromosomes are present in .fa files in the path provided to
    QuEST or else the size of the genome may be overestimated leading
    to improper choice of QuEST parameters. Using fasta files QuEST
    will produce a genome table file (see above) in the analysis path 
    directory which can be used for this genome in future analysis 
    instead of the fasta files. You can check the genome table file 
    for the correct names of the chromosome and presence of any 
    duplicate chromosome entries and edit it to your liking. 

The following output file formats are adopted in QuEST:

1. QuEST ChIP calls. QuEST calls provide information about regions
   of significant enrichment compared to control experiment. (these
   may be slightly outdated, please check your peak call file)

   calls/peak_caller.ChIP.out
   This file contains initial unfiltered results of the ChIP-Seq caller.
   Entries represent significant regions and peaks and are sorted
   according to the enrichment intensity of the strongest peak
   within the region. Regions are separated by the empty line.
   Each region entry is followed by one or more peak entries 
   corresponding to peak within that region.

   Definition:

   R-<region_id> <chromosome> <region start>-<region end> ChIP: <ChIP enrichment intensity of the strongest peak> control: <control enrichment intensity at the position of the strongest ChIP peak> max_pos: <coordinate of the highest peak in the region> ef: <normalized enrichment fold at the position of the strongest peak>
   P-<region_id>-<peak_id> <chromosome> <peak coordinate> ChIP: <ChIP enrichment intensity> control: <control enrichment intensity> region: <region start>-<region end> ef: <normalized_enrichment fold>

   Example:

   R-1 chr8 123640927-123641979 ChIP: 0.58 control: 0.0027 max_pos: 123641348 ef: 262.007
   P-1-1 chr8 123641348 ChIP: 0.58 control: 0.0027 region: 123640927-123641979 ef: 262.007

   R-2 chr8 87322905-87324677 ChIP: 0.533 control: 3.7e-05 max_pos: 87323637 ef: 17570.083
   P-2-1 chr8 87323192 ChIP: 0.431 control: 0 region: 87322905-87324677 ef: inf
   P-2-2 chr8 87323637 ChIP: 0.533 control: 3.7e-05 region: 87322905-87324677 ef: 17570.083
   P-2-3 chr8 87323991 ChIP: 0.474 control: 0.00356 region: 87322905-87324677 ef: 162.397
   P-2-4 chr8 87324464 ChIP: 0.292 control: 0.0069 region: 87322905-87324677 ef: 51.616

   In this example two regions are reported one containing a
   single local peak and another containing 4 local peaks.

   calls/peak_caller.ChIP.out.with_metrics

   This file contains the same entries as the peak_caller.ChIP.out file
   with additional metrics characterizing regions and peaks.

   Definition:

   R-<region_id> <chromosome> <region start>-<region end> ChIP: <ChIP enrichment intensity of the strongest peak> control: <control enrichment intensity at the position of the strongest ChIP peak> max_pos: <coordinate of the highest peak in the region> ef: <normalized enrichment fold at the position of the strongest peak> ChIP_tags: <ChIP reads starting within the region> background_tags: <background reads starting within the region> tag_ef: <normalized tag enrichment fold within the region> ps: <local peak shift estimate> cor: <local strand correlation value> os: <local oversampling value> z_score: <z-value for deviating from the Poisson expectation> n_lg_qv: <negative log_10 of Poisson p-value corrected for the number of tests>
   P-<region_id>-<peak_id> <chromosome> <peak coordinate> ChIP: <ChIP enrichment intensity> control: <control enrichment intensity> region: <region start>-<region end> ef: <normalized_enrichment fold> ChIP_tags: <ChIP reads starting within the region> background_tags: <background reads starting within the region> tag_ef: <normalized tag enrichment fold within the region> ps: <local peak shift estimate> cor: <local strand correlation value> os: <local oversampling value> z_score: NA n_lg_qv: <negative log_10 of approximated FDR>

   Example:

   R-1 chr8 123640927-123641979 ChIP: 0.58 control: 0.0027 max_pos: 123641348 ef: 262.007 ChIP_tags: 660 background_tags: 16 tag_ef: 50.3 ps: 34 cor: 0.824 os: 1.54 z_score: 178.603 n_lg_qv: 40.731
   P-1-1 chr8 123641348 ChIP: 0.58 control: 0.0027 region: 123640927-123641979 ef: 262.007 ChIP_tags: 267 background_tags: 6 tag_ef: 54.3 ps: 34 cor: 0.832 os: 1.54 z_score: NA n_lg_qv: 2.118

   R-2 chr8 87322905-87324677 ChIP: 0.533 control: 3.7e-05 max_pos: 87323637 ef: 17570.083 ChIP_tags: 717 background_tags: 6 tag_ef: 146 ps: 34 cor: 0.754 os: 1.56 z_score: 321.054 n_lg_qv: 40.731
   P-2-1 chr8 87323192 ChIP: 0.431 control: 0 region: 87322905-87324677 ef: inf ChIP_tags: 185 background_tags: 0 tag_ef: inf ps: 45 cor: 0.905 os: 1.68 z_score: NA n_lg_qv: 2.109
   P-2-2 chr8 87323637 ChIP: 0.533 control: 3.7e-05 region: 87322905-87324677 ef: 17570.083 ChIP_tags: 252 background_tags: 0 tag_ef: inf ps: 34 cor: 0.745 os: 1.49 z_score: NA n_lg_qv: 2.115
   P-2-3 chr8 87323991 ChIP: 0.474 control: 0.00356 region: 87322905-87324677 ef: 162.397 ChIP_tags: 221 background_tags: 1 tag_ef: 270 ps: -8 cor: 0.665 os: 1.42 z_score: NA n_lg_qv: 2.111
   P-2-4 chr8 87324464 ChIP: 0.292 control: 0.0069 region: 87322905-87324677 ef: 51.616 ChIP_tags: 127 background_tags: 2 tag_ef: 77.5 ps: 61 cor: 0.768 os: 1.57 z_score: NA n_lg_qv: 2.101

   calls/peak_caller.ChIP.out.accepted, calls/peak_caller.ChIP.out.rejected
   
   These files contain entries in the same format as 
   calls/peak_caller.ChIP.out.with_metrics with additional filters applied
   based on the peak shift and oversampling values. Default filter values
   are os <= 5 and ps>= 0. .accpeted file will contain entries passing the 
   filter and .rejected file will contain regions and peaks for which 
   the filter didn't pass either for a region or for any one peak within the
   region.
   

2. QuEST summary file module_outputs/QuEST.out contains information about
   the number of identified regions and peaks and FDR if applicable.

   Example:

   ChIP peaks: 2510
   ChIP peaks accepted: 1762
   ChIP peaks rejected: 748

   ChIP regions: 2319
   ChIP regions accepted: 1767
   ChIP regions rejected: 552

3. UCSC Genome Browser track files

   tracks/ChIP.filtered.bed
   
   Track file displaying regions and individual peak positions.
   
   tracks/data_bed_files
   
   directory containing bed files for both ChIP and RX_noIP data
   for the entire genome as well as for individual chromosomes
   
   tracks/wig_profiles

   directory containing wig files for both normalized and 
   unnormalized CDP profiles for ChIP and background for the entire
   genome and by chromosome.

==============================================================
VIII. QuEST software overview
==============================================================

QuEST performs analysis of ChIP and control sequencing data to
identify positions in the genome that are enriched in the ChIP data
compared to the control data. If sufficient number of the control
reads is provided, QuEST also performs a false discovery rate (FDR)
for the choice of QuEST parameters.

The QuEST workflow consists of several modules that are implemented as 
stand-alone executables consolidated into the QuEST pipeline using 
wrapper and master scripts.

QuEST starts with the generate_QuEST_parameters.pl script which 
converts the data to usable formats, creates parameter files and 
defines necessary thresholds. Specifically, the sequencing data are
converted into QuEST alignment formats, the genome is formatted into 
a genome table, and the control data is split into background and 
pseudo ChIP data according to user�s needs. Parameter files are
created for individual QuEST modules that contain all the necessary 
input/output definitions and thresholds for the modules to be executed.

The QuEST master script is called run_QuEST_with_param_file.pl. It 
performs the QuEST analysis by executing necessary modules according
to the user specifications.

Modules are implemented by wrappers that run executables by 
specifying parameters from the parameter files. Quick Window Scan
and Calibrate Peak Shift Modules implement the parameter estimation
part of QuEST.  The modules CDP and Peak Caller implement the peak 
finding part of QuEST.  The FDR estimation is implemented by 
run_QuEST_with_param_file.pl based on the output of peak calling for 
ChIP and pseudo ChIP data.

Here is a brief description of how QuEST works:

1. Peak shift estimation. 
To estimate peak shift, QuEST initially selects candidate regions
to perform this estimation. Initial candidates are identified using
Quick Window Scan module, that is a sliding window that finds positions
where a window contains a lot of tags and is enriched compared to control.
Regions are then ranked by the tag count and the top 200 hundred after
some light filtering is used for peak shift estimation. For each region
the peak shift estimate is given as the distance for which forward and
reverse tag density profile show maximum of correlation after being shifted
towards each other by that distance. Global peak shift estimate is then
given by the mode of local peak shift estimates across 200 regions. This mode
is calculated after applying kernel density estimation to the 200 sample
points (bw=10).

2. CDP calculation
Combined density profile represents enrichment in the data. It is calculated
after applying KDE to both strands and adding obtained values after the peak
shift correction. CDP is calculated for both input and ChIP data.

3. Peak calling
Peak caller will find initial "seeds" using seeding threshold for CDP where
ChIP data is enriched compared to control (ChIP to background fold enrichment)
and then extending the seeded regions in both directions until the value
of the CDP becomes lower than the ChIP extension threshold. This gives an enriched
region. Within the region the peaks are idenfied as local maxima of CDP that
meet the seeding threshold and ChIP to background enrichment threshold and for
which a "dip" is observed between the two called peaks within the same region.
If only a single peak exists within the region, it will be reported.

4. Metrics
Metrics are calculated for each region and peak and include local peak shift
estimate, correlation of strands at the global peak shift estimate, Q-values,
fold enrichment, tag fold enrichment, and rank by Q-value.

5. Filter
Filter applies only in some cases (such as punctate peak option) and includes
filter on local peak shift (28 or more) to remove repeat mismapping effects, 
best peak per region, removing regions that don't have any peaks after filtering

==============================================================
VIII. FAQ
==============================================================

1. Q: When trying to load bed/wig files into the genome browser
      the browser just hangs there and never displays the tracks. What
      should I do?

   A: UCSC genome browser currently has a set limit on the number of
      features that can be displayed in the custom track (typically about
      10 M for the bed file). Try using individual chromosome file
      to overcome this limitation.