%load_ext autoreload
%autoreload 2
from matlas.fimo_hits import make_motif_df, combine_motif_dfs
logdir = "/mnt/lab_data/kundaje/msharmin/mouse_hem/mtf"
outdir = "/mnt/lab_data/kundaje/msharmin/mouse_hem/mtf/specific_hits/task_{}".format(273)
nonz_dfs = combine_motif_dfs(logdir, outdir, zscored=False)
z_dfs = combine_motif_dfs(logdir, outdir, zscored=True)
len(z_dfs)
Only instances that lie inside a MEL peak are considered; all other instances are discarded. In each violin plot, the FIMO and DeepLIFT density curves are drawn from the same number of instances.
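A minimal sketch of the in-peak selection. It assumes plain (chrom, start, end) tuples with half-open, BED-style coordinates and treats "inside" as full containment. Both of these are assumptions, and `instances_in_peaks` is a hypothetical helper, not part of `matlas.fimo_hits`:

```python
from typing import Dict, List, Tuple

# Hypothetical record layouts (assumption): peaks are (chrom, start, end),
# motif instances are (chrom, start, end, motif_name); coordinates are
# half-open [start, end), as in BED files.
Peak = Tuple[str, int, int]
Instance = Tuple[str, int, int, str]


def instances_in_peaks(instances: List[Instance], peaks: List[Peak]) -> List[Instance]:
    """Keep only instances that lie entirely inside at least one peak."""
    # Group peak intervals by chromosome so each instance is only
    # compared against peaks on its own chromosome.
    by_chrom: Dict[str, List[Tuple[int, int]]] = {}
    for chrom, start, end in peaks:
        by_chrom.setdefault(chrom, []).append((start, end))

    kept = []
    for inst in instances:
        chrom, start, end, _ = inst
        # Full containment: the instance must start and end within the peak.
        if any(p_start <= start and end <= p_end
               for p_start, p_end in by_chrom.get(chrom, [])):
            kept.append(inst)
    return kept


peaks = [("chr1", 100, 200), ("chr2", 50, 80)]
instances = [
    ("chr1", 120, 130, "Gata"),  # inside the chr1 peak
    ("chr1", 190, 210, "Klf"),   # runs past the peak end, dropped
    ("chr2", 55, 65, "Ctcf"),    # inside the chr2 peak
    ("chr3", 10, 20, "Sp"),      # no peaks on chr3, dropped
]
print(instances_in_peaks(instances, peaks))
```

The linear scan over peaks is fine for small examples; for genome-wide peak sets an interval tree, or `bedtools intersect -u -f 1.0`, would be the more usual choice.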
from matlas.fimo_hits import plot_violins_alternate
plot_violins_alternate(z_dfs, 'zscore', nonz_dfs, 'raw_score')
After selecting instances inside a MEL peak, no further filtering is applied to FIMO instances. For DeepLIFT-based instances, only those with a high DeepLIFT score are kept. Here, a high DeepLIFT score means that the total importance (the DeepLIFT sum score of the instance) must exceed per_base_importance*motif_length. In our experience, per_base_importance=0.0625 works well. This value is the 75th percentile of all possible sum_scores in the DeepLIFT track, divided by motif_length.
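The thresholding rule can be sketched as follows. This assumes that "all possible sum_scores in the DeepLIFT track" means the importance summed over every motif-length window of a per-base track, and that each instance is a dict with a hypothetical `sum_score` key. The helper names are illustrative, not the `matlas` implementation:

```python
import statistics
from typing import Dict, List, Sequence


def window_sums(track: Sequence[float], width: int) -> List[float]:
    """Sum of per-base importance over every window of `width` consecutive bases."""
    return [sum(track[i:i + width]) for i in range(len(track) - width + 1)]


def per_base_importance_from_track(track: Sequence[float], motif_length: int,
                                   pct: int = 75) -> float:
    """pct-th percentile (1..99) of all window sum scores, divided by motif_length."""
    sums = window_sums(track, motif_length)
    # n=100 yields 99 cut points; method="inclusive" interpolates linearly,
    # matching numpy.percentile's default. Needs at least two windows.
    cuts = statistics.quantiles(sums, n=100, method="inclusive")
    return cuts[pct - 1] / motif_length


def keep_high_deeplift(instances: List[Dict[str, float]], motif_length: int,
                       per_base_importance: float = 0.0625) -> List[Dict[str, float]]:
    """Keep instances whose sum score exceeds per_base_importance * motif_length."""
    threshold = per_base_importance * motif_length
    return [inst for inst in instances if inst["sum_score"] > threshold]


# With motif_length=16 and the default 0.0625 the cutoff is 1.0.
hits = [{"sum_score": 1.5}, {"sum_score": 0.9}, {"sum_score": 1.0}]
print(keep_high_deeplift(hits, motif_length=16))
```

The comparison is strict, so an instance whose sum score equals the cutoff is dropped, matching "must be greater than".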
nonz_dfs2 = combine_motif_dfs(logdir, outdir, zscored=False, filter_low_deeplift=True)
z_dfs2 = combine_motif_dfs(logdir, outdir, zscored=True, filter_low_deeplift=True)
plot_violins_alternate(z_dfs2, 'zscore', nonz_dfs2, 'raw_score')
The number of motif instances retained after DeepLIFT filtering points to the presence of Ar, Bach, Bcl, Ets, Ctcf, Klf, Gata, Sp and Zfx motifs, among others, in MEL.
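Counting retained instances per motif is a plain frequency tally. A sketch, assuming the motif name of each retained instance is available as a string; `motif_counts` is a hypothetical helper, and `plot_motif_counts` may aggregate differently:

```python
from collections import Counter
from typing import Iterable, List, Tuple


def motif_counts(motif_names: Iterable[str]) -> List[Tuple[str, int]]:
    """Return (motif, count) pairs sorted by descending count.

    Ties keep the order in which motifs were first seen.
    """
    return Counter(motif_names).most_common()


retained = ["Gata", "Klf", "Gata", "Ctcf", "Klf", "Gata"]
print(motif_counts(retained))
```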
from matlas.fimo_hits import plot_motif_counts
plot_motif_counts(nonz_dfs2, size=(25, 200))
from matlas.fimo_hits import interactive_plot_motif_counts
interactive_plot_motif_counts(nonz_dfs2, height=400, width=1000)