Goal
- figure out the right approach to simple peak calling
TODO
- [x] get the peak calls for sox2
- [ ] what fraction of the genome do they cover?
- [ ] write a random genome sampler (discard a sampled region if it overlaps any of the labels)
- [ ] make 2 histograms of the number of counts per window:
  - randomly sampled regions
  - peak regions
- [ ] plot the same things for oct4
- compute the count statistics for contiguous bins (100bp) and plot the distribution
- compare the results (overlap) to the currently called peaks with smoothed signal
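
A minimal sketch of the open items above, assuming peaks are held as `(chrom, start, end)` tuples and per-base coverage as a NumPy array. The function names (`covered_fraction`, `sample_background`, `bin_counts`) and the rejection-sampling strategy are illustrative assumptions, not existing code from this project.

```python
import numpy as np


def covered_fraction(peaks, chrom_sizes):
    """Fraction of the genome covered by peaks; overlapping bases count once."""
    by_chrom = {}
    for chrom, start, end in peaks:
        by_chrom.setdefault(chrom, []).append((start, end))
    covered = 0
    for intervals in by_chrom.values():
        intervals.sort()
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end:  # overlapping or adjacent: extend the run
                cur_end = max(cur_end, end)
            else:  # gap: close the current run and start a new one
                covered += cur_end - cur_start
                cur_start, cur_end = start, end
        covered += cur_end - cur_start
    return covered / sum(chrom_sizes.values())


def overlaps_any(chrom, start, end, peaks):
    """True if the half-open interval [start, end) overlaps any labelled peak."""
    return any(c == chrom and start < e and s < end for c, s, e in peaks)


def sample_background(chrom_sizes, peaks, width, n, seed=0, max_tries=100_000):
    """Rejection-sample up to n windows of `width` bp that overlap no peak."""
    rng = np.random.default_rng(seed)
    # Only chromosomes long enough to hold a full window are eligible.
    chroms = [c for c in chrom_sizes if chrom_sizes[c] >= width]
    sizes = np.array([chrom_sizes[c] for c in chroms], dtype=float)
    probs = sizes / sizes.sum()  # pick chromosomes in proportion to length
    regions = []
    for _ in range(max_tries):
        if len(regions) == n:
            break
        chrom = chroms[rng.choice(len(chroms), p=probs)]
        start = int(rng.integers(0, chrom_sizes[chrom] - width + 1))
        if not overlaps_any(chrom, start, start + width, peaks):
            regions.append((chrom, start, start + width))
    return regions


def bin_counts(coverage, bin_size=100):
    """Sum per-base coverage into contiguous bins; a trailing partial bin is dropped."""
    coverage = np.asarray(coverage)
    n_bins = len(coverage) // bin_size
    return coverage[: n_bins * bin_size].reshape(n_bins, bin_size).sum(axis=1)
```

The two histograms could then be built by passing the per-window counts for the sampled regions and for the peak regions to `np.histogram` with a shared set of bin edges, so that the distributions are directly comparable. The linear scan in `overlaps_any` is fine for a quick check but would likely need an interval index for genome-scale peak sets.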
Next
- [ ] Set up the pipeline for learning from a single bigwig file:
  - pre-compute a set of high-count regions
  - find 'interesting' regions; sample regions according to how interesting they are and update the model
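
One possible reading of the two sub-steps above, sketched under assumptions: "high-count" is taken to mean bins at or above a chosen quantile of the binned counts, and "interesting" is represented by a non-negative score per region that is used directly as a sampling weight. Both function names and the quantile default are hypothetical, and the model update itself is left out.

```python
import numpy as np


def high_count_bins(counts, quantile=0.99):
    """Indices of bins whose count is at or above the given quantile (assumed criterion)."""
    counts = np.asarray(counts)
    threshold = np.quantile(counts, quantile)
    return np.flatnonzero(counts >= threshold)


def sample_by_interest(scores, n, seed=0):
    """Draw n region indices with probability proportional to their score."""
    rng = np.random.default_rng(seed)
    scores = np.asarray(scores, dtype=float)
    if np.any(scores < 0) or scores.sum() <= 0:
        raise ValueError("scores must be non-negative with a positive sum")
    probs = scores / scores.sum()
    # Sampling with replacement lets highly scored regions be revisited.
    return rng.choice(len(scores), size=n, replace=True, p=probs)
```

In a training loop, the scores could be recomputed after each model update (for example from the current per-region loss), so that sampling keeps following whatever the model currently finds hardest; that choice of score is an assumption rather than something stated in these notes.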