In [1]:
# Parameters
sample_name = "ES;E14;A;GEO"
modisco_root = "/srv/scratch/msharmin/mouse_hem/with_tfd/full_mouse50/Naive_modisco2019"
mitra_subdir = "report/version2"
task_dir = "task_99-naivegw"
database_name = "CISBP"
perf_file = "/srv/scratch/msharmin/mouse_hem/with_tfd/full_mouse50/fineFactorized/task_99-naivegw/NaiveauPRC.txt"
homer_root = "/srv/scratch/msharmin/mouse_hem/with_tfd/full_mouse50/Naive_scans"
In [2]:
from matlas.modisco_report import modisco_report_pipeline, display_metadata
reportfile = "/mnt/lab_data/kundaje/msharmin/mouse_hem/with_tfd/full_mouse50/filtering samples_MS2.xlsx"
sheetname = "filter23"
load data from labcluster
Using TensorFlow backend.
2019-07-22 20:09:23,582 [WARNING] git-lfs not installed
In [3]:
display_metadata(sample_name, perf_file, reportfile, sheetname)
    Sample Information
    MetaData Name | Description
    Cell type | E14 ES cells
    Cell Group | ES cells and embryonic tissues
    Experiment Name | ATAC
    Experiment Group | GEO
    Pipeline Output
    replicate | Naïve overlap peaks | IDR peaks | TSS enrichment (< 8 is very poor, < 10 is low) | Final number of unique mapping, dup-filtered, chrM filtered reads | Number of reads in called peak regions | Fraction of reads in called peak regions | Number of reads in promoter regions | Fraction of reads in promoter regions | Number of reads in enhancer regions | Fraction of reads in enhancer regions
    rep1 | 194614 | 118824 | 12.5591 | 17916047 | 2615659 | 0.1462 | 1978782 | 0.1106 | 6804941 | 0.3802
    rep2 | 194614 | 118824 | 12.269 | 14528181 | 1573807 | 0.1085 | 1558330 | 0.1074 | 5490298 | 0.3784
    rep3 | 194614 | 118824 | 11.4434 | 13200179 | 1177504 | 0.0893 | 1411794 | 0.1071 | 5002151 | 0.3795
    rep4 | 194614 | 118824 | 9.7165 | 11893788 | 619657 | 0.0522 | 1099351 | 0.0926 | 4343737 | 0.3658
    Modelling Metadata
    Metric | Value
    auPRC | 0.5846
    Calibrated Recall at 50% FDR | 0.233
    Number of Positive Examples in Test Data | 126558
    Number of Negative Examples in Test Data | 7944293
    Imbalance Ratio in Test Data | 0.0157
    Test Chromosomes | chr2, chr3, chr19
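The imbalance ratio in the Modelling Metadata output can be reproduced from the two example counts. The sketch below assumes the ratio is defined as positives over all test examples, which matches the reported 0.0157; the variable names are illustrative and are not part of `matlas`. It also compares the auPRC with the positive fraction, which is roughly what a random classifier would score.

```python
# Values copied from the "Modelling Metadata" output.
n_pos = 126558
n_neg = 7944293
au_prc = 0.5846

# Assumption: "Imbalance Ratio" = positives / (positives + negatives).
imbalance_ratio = n_pos / (n_pos + n_neg)

# The expected auPRC of a random classifier is approximately the positive
# fraction, so this ratio is the fold improvement over that baseline.
fold_over_baseline = au_prc / imbalance_ratio

print(f"imbalance ratio: {imbalance_ratio:.4f}")
print(f"auPRC relative to random baseline: {fold_over_baseline:.1f}x")
```

Under this assumption the model's auPRC is well above the chance level for such a skewed test set, which is why the raw value of 0.5846 should be read against the imbalance ratio rather than against 0.5.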
In [4]:
from matlas.modisco_report import display_comparative_motif_sets
display_comparative_motif_sets(sample_name, homer_root, modisco_root)
TF-MoDISco is using the TensorFlow backend.
Number of CISBP motifs obtained by TF-MoDISco and Homer-denovo
Shared Motifs
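Motif Name | Modisco | Homer
Pou5f1 | [logo] | [logo]
Pbx3 | [logo] | [logo]
Ctcf | [logo] | [logo]
Unique TF-MoDISco Motifs
Motif Name | Modisco | Homer
Lhx2 | [logo] | absent
Sp2 | [logo] | absent
Rfx1 | [logo] | absent
Sox17 | [logo] | absent
Zfp281 | [logo] | absent
Mbtps2 | [logo] | absent
Fos | [logo] | absent
E2f1 | [logo] | absent
Smarcc2 | [logo] | absent
Rest | [logo] | absent
Klf1 | [logo] | absent
Rfx5 | [logo] | absent
Creb3 | [logo] | absent
Erf | [logo] | absent
Tcfeb | [logo] | absent
Sp3 | [logo] | absent
Grhl1 | [logo] | absent
Unique Homer Motifs
Motif Name | Modisco | Homer
Zfp143 | absent | [logo]
E4f1 | absent | [logo]
Rfx8 | absent | [logo]
Mbd1 | absent | [logo]
Gabpa | absent | [logo]
Klf5 | absent | [logo]
Esx1 | absent | [logo]
Sox9 | absent | [logo]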
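The shared and unique motif lists above amount to a partition of two sets of motif names. A minimal sketch, assuming each tool's result has been reduced to a set of CISBP motif names; this is not necessarily how `display_comparative_motif_sets` computes it, and the variable names are illustrative:

```python
# Motif names copied from the output above.
shared_names = {"Pou5f1", "Pbx3", "Ctcf"}

modisco = shared_names | {
    "Lhx2", "Sp2", "Rfx1", "Sox17", "Zfp281", "Mbtps2", "Fos", "E2f1",
    "Smarcc2", "Rest", "Klf1", "Rfx5", "Creb3", "Erf", "Tcfeb", "Sp3", "Grhl1",
}
homer = shared_names | {
    "Zfp143", "E4f1", "Rfx8", "Mbd1", "Gabpa", "Klf5", "Esx1", "Sox9",
}

# Set algebra yields the three groups shown in the report.
shared = modisco & homer        # found by both tools
only_modisco = modisco - homer  # "Unique TF-MoDISco Motifs"
only_homer = homer - modisco    # "Unique Homer Motifs"

print(len(shared), len(only_modisco), len(only_homer))
```

Comparing by name only treats closely related factors (for example Sp2 and Sp3) as distinct motifs, so the unique counts are an upper bound on how different the two motif sets really are.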
In [5]:
modisco_report_pipeline(sample_name, modisco_root, mitra_subdir, task_dir, database_name, 
                        importance=True, render=True)
rsync -t -av /srv/scratch/msharmin/mouse_hem/with_tfd/full_mouse50/Naive_modisco2019/task_99-naivegw/cisbp_tomtomout /srv/www/kundaje/msharmin/report/version2/task_99-naivegw/
chmod -R +755 /srv/www/kundaje/msharmin/report/version2/task_99-naivegw
Displaying motifs which has positive importances for the cell type
metacluster_0, # patterns: 31, # seqlets: 18544, Positive for: ES;E14;A;GEO
  • pattern_0: # seqlets: 2636. Cisbp matches (q-value>=0.01) using Tomtom (Full Report):
    Ctcf, Ctcfl
    [Sequence | Contrib Scores | Hyp_Contrib Scores]
  • pattern_1: # seqlets: 1996. Cisbp matches (q-value>=0.01) using Tomtom (Full Report):
    Sp2, Sp3, Maz, Zbtb7a, Sp5
    [Sequence | Contrib Scores | Hyp_Contrib Scores]
  • pattern_2: # seqlets: 1611. Cisbp matches (q-value>=0.01) using Tomtom (Full Report):
    Zfp281, Zfp148, Sp2, Sp3, Klf5
    [Sequence | Contrib Scores | Hyp_Contrib Scores]
  • pattern_3: # seqlets: 1421. Cisbp matches (q-value>=0.01) using Tomtom (Full Report):
    Pbx3, Foxi1, Nfya, Ybx1
    [Sequence | Contrib Scores | Hyp_Contrib Scores]
  • pattern_4: # seqlets: 1266. Cisbp matches (q-value>=0.01) using Tomtom (Full Report):
    Sp3, Wt1, Sp2, Egr4, Maz
    [Sequence | Contrib Scores | Hyp_Contrib Scores]
  • pattern_5: # seqlets: 1239. Cisbp matches (q-value>=0.01) using Tomtom (Full Report):
    Pou5f1, Sox2, Tbpl2
    [Sequence | Contrib Scores | Hyp_Contrib Scores]
  • pattern_6: # seqlets: 1125. Cisbp matches (q-value>=0.01) using Tomtom (Full Report):
    E2f1, Wt1, Tcfap2d, Zbtb7a, Sp3
    [Sequence | Contrib Scores | Hyp_Contrib Scores]
  • pattern_7: # seqlets: 995. Cisbp matches (q-value>=0.01) using Tomtom (Full Report):
    (none listed)
    [Sequence | Contrib Scores | Hyp_Contrib Scores]
  • pattern_8: # seqlets: 746. Cisbp matches (q-value>=0.01) using Tomtom (Full Report):
    Erf, Etv5, Gabpa, Elk3, Elk1
    [Sequence | Contrib Scores | Hyp_Contrib Scores]
  • pattern_9: # seqlets: 654. Cisbp matches (q-value>=0.01) using Tomtom (Full Report):
    (none listed)
    [Sequence | Contrib Scores | Hyp_Contrib Scores]
  • pattern_10: # seqlets: 637. Cisbp matches (q-value>=0.01) using Tomtom (Full Report):
    Sox17, Sox9, Sox6, Sox3, Sox4
    [Sequence | Contrib Scores | Hyp_Contrib Scores]
  • pattern_11: # seqlets: 585. Cisbp matches (q-value>=0.01) using Tomtom (Full Report):
    Klf1, Klf2, Klf3, Klf8, Klf4
    [Sequence | Contrib Scores | Hyp_Contrib Scores]
  • pattern_12: # seqlets: 482. Cisbp matches (q-value>=0.01) using Tomtom (Full Report):
    Fos, Smarcc1, Fosb, Nfe2l2, Nfe2
    [Sequence | Contrib Scores | Hyp_Contrib Scores]
  • pattern_13: # seqlets: 427. Cisbp matches (q-value>=0.01) using Tomtom (Full Report):
    Mbtps2, Yy1, Sp3, Zfx, Wt1
    [Sequence | Contrib Scores | Hyp_Contrib Scores]
  • pattern_14: # seqlets: 389. Cisbp matches (q-value>=0.01) using Tomtom (Full Report):
    (none listed)
    [Sequence | Contrib Scores | Hyp_Contrib Scores]
  • pattern_15: # seqlets: 311. Cisbp matches (q-value>=0.01) using Tomtom (Full Report):
    Rfx1, Rfx2, Arid2, Rfx4, Rfx7
    [Sequence | Contrib Scores | Hyp_Contrib Scores]
  • pattern_16: # seqlets: 278. Cisbp matches (q-value>=0.01) using Tomtom (Full Report):
    Ctcf
    [Sequence | Contrib Scores | Hyp_Contrib Scores]
  • pattern_17: # seqlets: 270. Cisbp matches (q-value>=0.01) using Tomtom (Full Report):
    Rfx5
    [Sequence | Contrib Scores | Hyp_Contrib Scores]
  • pattern_18: # seqlets: 222. Cisbp matches (q-value>=0.01) using Tomtom (Full Report):
    Tcfeb, Bach2, Tcfec, Mitf, Tcfe3
    [Sequence | Contrib Scores | Hyp_Contrib Scores]
  • pattern_19: # seqlets: 210. Cisbp matches (q-value>=0.01) using Tomtom (Full Report):
    Lhx2, Pax6, Lhx9, Esx1, Mnx1
    [Sequence | Contrib Scores | Hyp_Contrib Scores]
  • pattern_20: # seqlets: 185. Cisbp matches (q-value>=0.01) using Tomtom (Full Report):
    Creb3, Atf7, 9430076C15Rik, Jdp2, Atf2
    [Sequence | Contrib Scores | Hyp_Contrib Scores]
  • pattern_21: # seqlets: 182. Cisbp matches (q-value>=0.01) using Tomtom (Full Report):
    Smarcc2, Tbx2, Zfp143, Zfp523, Six5
    [Sequence | Contrib Scores | Hyp_Contrib Scores]
  • pattern_22: # seqlets: 128. Cisbp matches (q-value>=0.01) using Tomtom (Full Report):
    (none listed)
    [Sequence | Contrib Scores | Hyp_Contrib Scores]
  • pattern_23: # seqlets: 119. Cisbp matches (q-value>=0.01) using Tomtom (Full Report):
    Grhl1, Tcfcp2
    [Sequence | Contrib Scores | Hyp_Contrib Scores]
  • pattern_24: # seqlets: 97. Cisbp matches (q-value>=0.01) using Tomtom (Full Report):
    (none listed)
    [Sequence | Contrib Scores | Hyp_Contrib Scores]
  • pattern_25: # seqlets: 89. Cisbp matches (q-value>=0.01) using Tomtom (Full Report):
    Sp3, E2f1, Wt1, Sp2, Zbtb7a
    [Sequence | Contrib Scores | Hyp_Contrib Scores]
  • pattern_26: # seqlets: 72. Cisbp matches (q-value>=0.01) using Tomtom (Full Report):
    Rest
    [Sequence | Contrib Scores | Hyp_Contrib Scores]
  • pattern_27: # seqlets: 57. Cisbp matches (q-value>=0.01) using Tomtom (Full Report):
    (none listed)
    [Sequence | Contrib Scores | Hyp_Contrib Scores]
  • pattern_28: # seqlets: 46. Cisbp matches (q-value>=0.01) using Tomtom (Full Report):
    (none listed)
    [Sequence | Contrib Scores | Hyp_Contrib Scores]
  • pattern_29: # seqlets: 36. Cisbp matches (q-value>=0.01) using Tomtom (Full Report):
    (none listed)
    [Sequence | Contrib Scores | Hyp_Contrib Scores]
  • pattern_30: # seqlets: 33. Cisbp matches (q-value>=0.01) using Tomtom (Full Report):
    (none listed)
    [Sequence | Contrib Scores | Hyp_Contrib Scores]
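As a consistency check on the listing above, the per-pattern seqlet counts can be totalled and compared with the metacluster header. The sketch below copies the counts by hand from the output rather than reading the TF-MoDISco results file, and the variable names are illustrative:

```python
# Seqlet counts for pattern_0 .. pattern_30, copied from the report output.
seqlet_counts = [
    2636, 1996, 1611, 1421, 1266, 1239, 1125, 995, 746, 654,
    637, 585, 482, 427, 389, 311, 278, 270, 222, 210,
    185, 182, 128, 119, 97, 89, 72, 57, 46, 36, 33,
]

# Patterns for which the report lists no CISBP match.
unmatched = [7, 9, 14, 22, 24, 27, 28, 29, 30]

total = sum(seqlet_counts)
unmatched_total = sum(seqlet_counts[i] for i in unmatched)

# The metacluster_0 header reports 31 patterns and 18544 seqlets.
print(len(seqlet_counts), total)
print(f"seqlets in patterns without a listed match: {unmatched_total} "
      f"({unmatched_total / total:.1%})")
```

The total agrees with the header, so every seqlet in metacluster_0 is assigned to exactly one of the 31 patterns; the patterns with no listed match are all among the smaller ones.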
In [6]:
modisco_report_pipeline(sample_name, modisco_root, mitra_subdir, task_dir, database_name, 
                        importance=False, render=True)
rsync -t -av /srv/scratch/msharmin/mouse_hem/with_tfd/full_mouse50/Naive_modisco2019/task_99-naivegw/cisbp_tomtomout /srv/www/kundaje/msharmin/report/version2/task_99-naivegw/
chmod -R +755 /srv/www/kundaje/msharmin/report/version2/task_99-naivegw
No motifs with negative importance