bigwig-training

Goal

Figure out a simple peak-calling approach that works well enough.

TODO

[x] get the peak calls for sox2
[ ] what fraction of the genome do they cover?
[ ] write a random genome sampler (discard a sampled region if it overlaps any of the labels)
[ ] make 2 histograms of the number of counts per window, one for each of:
- randomly sampled regions
- peak regions
[ ] plot the same things for oct4
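
A minimal sketch of the coverage and sampler items above, in pure Python. It assumes the peak calls are already loaded as a dict mapping chromosome name to half-open `(start, end)` intervals; the function names, the `width` parameter, and the toy chromosome sizes are hypothetical and not taken from any existing code in this project.

```python
import random


def merge_intervals(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]


def covered_fraction(peaks, chrom_sizes):
    """Fraction of the genome covered by peaks (overlaps counted once)."""
    covered = sum(end - start
                  for ivs in peaks.values()
                  for start, end in merge_intervals(ivs))
    return covered / sum(chrom_sizes.values())


def overlaps_any(start, end, intervals):
    """True if [start, end) shares at least one base with any interval."""
    return any(start < e and s < end for s, e in intervals)


def sample_background(peaks, chrom_sizes, width, n, seed=0, max_tries=100000):
    """Rejection-sample up to n windows of `width` bp that overlap no peak."""
    rng = random.Random(seed)
    chroms = [c for c in chrom_sizes if chrom_sizes[c] >= width]
    # Weight chromosomes by their number of valid start positions.
    weights = [chrom_sizes[c] - width + 1 for c in chroms]
    regions, tries = [], 0
    while len(regions) < n and tries < max_tries:
        tries += 1
        chrom = rng.choices(chroms, weights=weights)[0]
        start = rng.randrange(chrom_sizes[chrom] - width + 1)
        if not overlaps_any(start, start + width, peaks.get(chrom, [])):
            regions.append((chrom, start, start + width))
    return regions


# Toy data only (assumption): two overlapping peaks on a 1 kb chromosome.
peaks = {"chr1": [(100, 200), (150, 300)]}
chrom_sizes = {"chr1": 1000, "chr2": 1000}
print(covered_fraction(peaks, chrom_sizes))
print(sample_background(peaks, chrom_sizes, width=100, n=3))
```

The linear scan in `overlaps_any` is fine for a quick look; for genome-scale peak sets a sorted list with `bisect` or an interval tree would likely be needed.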

Compute the count statistics for contiguous 100 bp bins and plot their distribution.
Compare the resulting regions (by overlap) with the peaks currently called from the smoothed signal.
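
A rough sketch of the binning and comparison steps, assuming the signal for one chromosome is available as a per-base list of counts (for example, read out of the bigwig beforehand). The `threshold` used to call a bin "high" and all function names are assumptions for illustration.

```python
from collections import Counter


def bin_counts(coverage, bin_size=100):
    """Sum per-base coverage in contiguous, non-overlapping bins."""
    return [sum(coverage[i:i + bin_size])
            for i in range(0, len(coverage), bin_size)]


def count_distribution(counts):
    """Map each observed bin count to the number of bins with that count."""
    return Counter(counts)


def high_bins_to_intervals(counts, threshold, bin_size=100):
    """Merge runs of adjacent bins with count >= threshold into intervals."""
    intervals, run_start = [], None
    for i, c in enumerate(counts):
        if c >= threshold:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            intervals.append((run_start * bin_size, i * bin_size))
            run_start = None
    if run_start is not None:
        # End is rounded up to a bin edge if the last bin is partial.
        intervals.append((run_start * bin_size, len(counts) * bin_size))
    return intervals


def overlap_bp(a, b):
    """Base pairs shared by two interval lists (each internally non-overlapping)."""
    return sum(max(0, min(e1, e2) - max(s1, s2))
               for s1, e1 in a for s2, e2 in b)


# Toy signal (assumption): 200 bp of coverage 1 flanked by zeros.
coverage = [0] * 100 + [1] * 200 + [0] * 100
counts = bin_counts(coverage)
called = high_bins_to_intervals(counts, threshold=50)
print(count_distribution(counts))
print(called, overlap_bp(called, [(250, 350)]))
```

`count_distribution` gives the raw numbers behind the histogram; the same dict can be computed separately for peak bins and background bins and plotted side by side.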

Next

[ ] Set up the pipeline for learning from a single bigwig file:
- pre-compute a set of high-count regions
- find 'interesting' regions. Sample regions w.r.t. how interesting they are and update your model
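
One possible reading of "sample regions w.r.t. how interesting they are" is weighted sampling with replacement, where each region's weight is some non-negative interest score. The sketch below assumes the score is simply the region's count; that choice, and the function name, are assumptions rather than a settled design.

```python
import random


def sample_by_interest(regions, scores, k, seed=0):
    """Draw k regions with replacement, with probability proportional to score."""
    rng = random.Random(seed)
    return rng.choices(regions, weights=scores, k=k)


# Toy regions (assumption) scored by their total count.
regions = [("chr1", 0, 100), ("chr1", 100, 200), ("chr1", 200, 300)]
scores = [0, 10, 30]
batch = sample_by_interest(regions, scores, k=5)
print(batch)
```

Regions with a score of zero are never drawn, so a small pseudocount may be worth adding if background regions should still appear in training batches.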