Goal

  • figure out the right simple approach to peak calling

TODO

  • [x] get the peak calls for sox2
  • [ ] what fraction of the genome do they cover?
  • [ ] write a random genome sampler (discard a sampled region if it overlaps any of the labels)
  • [ ] make 2 histograms: number of counts per window
    • randomly sampled regions
    • peak regions
  • [ ] plot the same things for oct4
  • compute the count statistics for contiguous bins (100bp) and plot the distribution
  • compare the results (by overlap) with the peaks currently called on the smoothed signal
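
A rough sketch of the coverage, sampler, and binning items above, using only the Python standard library. Everything here is an assumption for illustration: peaks are taken to be half-open (start, end) pairs per chromosome, reads are reduced to their start positions, and the names covered_fraction, sample_background, and bin_counts are hypothetical, not part of any existing pipeline.

```python
import random


def merge_intervals(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


def covered_fraction(peaks, chrom_sizes):
    """Fraction of the genome covered by peaks.

    peaks: dict chrom -> list of (start, end)   (assumed layout)
    chrom_sizes: dict chrom -> chromosome length
    """
    covered = sum(e - s for ivs in peaks.values() for s, e in merge_intervals(ivs))
    return covered / sum(chrom_sizes.values())


def overlaps_any(start, end, intervals):
    """True if [start, end) overlaps at least one interval."""
    return any(start < e and s < end for s, e in intervals)


def sample_background(peaks, chrom_sizes, width, n, seed=0, max_tries=100000):
    """Draw n random windows of the given width, rejecting any that overlap a peak."""
    rng = random.Random(seed)
    chroms = [c for c in chrom_sizes if chrom_sizes[c] >= width]
    # Weight chromosomes by the number of valid window start positions.
    weights = [chrom_sizes[c] - width + 1 for c in chroms]
    regions = []
    for _ in range(max_tries):
        if len(regions) == n:
            return regions
        chrom = rng.choices(chroms, weights=weights)[0]
        start = rng.randrange(chrom_sizes[chrom] - width + 1)
        if not overlaps_any(start, start + width, peaks.get(chrom, [])):
            regions.append((chrom, start, start + width))
    if len(regions) == n:
        return regions
    raise RuntimeError("could not find enough peak-free windows")


def bin_counts(positions, chrom_len, bin_size=100):
    """Count read start positions in contiguous bins of bin_size bp."""
    counts = [0] * ((chrom_len + bin_size - 1) // bin_size)
    for p in positions:
        if 0 <= p < chrom_len:  # ignore positions outside the chromosome
            counts[p // bin_size] += 1
    return counts
```

The two histograms can then be built by passing the per-window counts for the sampled regions and for the peak regions to collections.Counter (or to any plotting library). The linear overlap scan is fine for a first pass; an interval tree or sorted-interval bisect would likely be needed for genome-scale peak sets.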

Next

  • [ ] Set up the pipeline for learning from a single bigwig file:
    • pre-compute a set of high-count regions
    • find 'interesting' regions, sample them according to how interesting they are, and update the model
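
One possible shape for the two steps above, as a minimal sketch. It assumes per-bin counts have already been extracted from the bigwig file into a plain list, and that "interesting" is expressed as a non-negative score per region (for example the count itself, or the current model's error on that region). The function names, the min_count threshold, and the proportional-sampling rule are all hypothetical choices, not something fixed by these notes.

```python
import random


def high_count_regions(counts, bin_size=100, min_count=10):
    """Return (start, end, count) for every bin whose count reaches min_count."""
    return [
        (i * bin_size, (i + 1) * bin_size, c)
        for i, c in enumerate(counts)
        if c >= min_count
    ]


def sample_by_interest(regions, scores, k, seed=0):
    """Draw k regions with replacement, with probability proportional to score."""
    if len(regions) != len(scores):
        raise ValueError("need exactly one score per region")
    if any(s < 0 for s in scores) or sum(scores) <= 0:
        raise ValueError("scores must be non-negative with a positive sum")
    rng = random.Random(seed)
    return rng.choices(regions, weights=scores, k=k)


# Example: use the raw count as the interest score (an assumption).
counts = [0, 12, 3, 40]
regions = high_count_regions(counts)
batch = sample_by_interest(regions, [c for _, _, c in regions], k=5)
```

In a training loop the scores would be recomputed after each model update, so regions the model already explains well are drawn less often.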