In [1]:
# Parameters
sample_name = "ES;v6.5;A;GEO"
modisco_root = "/srv/scratch/msharmin/mouse_hem/with_tfd/full_mouse50/Naive_modisco2019"
mitra_subdir = "report/version2"
task_dir = "task_102-naivegw"
database_name = "CISBP"
perf_file = "/srv/scratch/msharmin/mouse_hem/with_tfd/full_mouse50/fineFactorized/task_102-naivegw/NaiveauPRC.txt"
homer_root = "/srv/scratch/msharmin/mouse_hem/with_tfd/full_mouse50/Naive_scans"
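The parameters above appear to combine into the source and destination paths of the rsync command printed further down in this notebook. The sketch below reproduces that composition. The variable `web_root` is an assumption read off the rsync destination, and the helper names `tomtom_src` and `publish_dst` are hypothetical; how `matlas` builds these paths internally is not shown in this notebook.

```python
import posixpath

# Values copied from the parameters cell.
modisco_root = "/srv/scratch/msharmin/mouse_hem/with_tfd/full_mouse50/Naive_modisco2019"
mitra_subdir = "report/version2"
task_dir = "task_102-naivegw"

# Assumption: web root inferred from the rsync destination printed by the pipeline.
web_root = "/srv/www/kundaje/msharmin"

# Hypothetical names for the two paths seen in the rsync command.
tomtom_src = posixpath.join(modisco_root, task_dir, "cisbp_tomtomout")
publish_dst = posixpath.join(web_root, mitra_subdir, task_dir) + "/"

print(tomtom_src)
print(publish_dst)
```

Changing `task_dir` or `mitra_subdir` in the parameters cell would, under this assumption, move both the Tomtom source directory and the published report location together.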
In [2]:
from matlas.modisco_report import modisco_report_pipeline, display_metadata
reportfile = "/mnt/lab_data/kundaje/msharmin/mouse_hem/with_tfd/full_mouse50/filtering samples_MS2.xlsx"
sheetname = "filter23"
load data from labcluster
Using TensorFlow backend.
2019-07-22 20:19:45,166 [WARNING] git-lfs not installed
In [3]:
display_metadata(sample_name, perf_file, reportfile, sheetname)
    Sample Information
    MetaData Name      Description
    Cell type          Embryonic Stem Cells (v6.5)
    Cell Group         ES cells and embryonic tissues
    Experiment Name    ATAC
    Experiment Group   GEO
    Pipeline Output
    replicate                                                            rep1
    Naïve overlap peaks                                                  229299
    IDR peaks                                                            140341
    TSS enrichment (< 8 is very poor, < 10 is low)                       8.9389
    Final number of unique mapping, dup-filtered, chrM filtered reads    79632607
    Number of reads in called peak regions                               15820994
    Fraction of reads in called peak regions                             0.1987
    Number of reads in promoter regions                                  8462226
    Fraction of reads in promoter regions                                0.1063
    Number of reads in enhancer regions                                  33651641
    Fraction of reads in enhancer regions                                0.4227
    Modelling Metadata
    Metric                                      Value
    auPRC                                       0.6322
    Calibrated Recall at 50% FDR                0.216
    Number of Positive Examples in Test Data    124149
    Number of Negative Examples in Test Data    7946702
    Imbalance Ratio in Test Data                0.0154
    Test Chromosomes                            chr2, chr3, chr19
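The imbalance ratio in the table above is consistent with the fraction of positive examples among all test examples. The sketch below checks that arithmetic and compares the reported auPRC against the positive fraction, which is the precision a random ranking is expected to achieve. The formula is inferred from the numbers, not taken from the `matlas` source, and the variable names are illustrative.

```python
# Counts and auPRC copied from the Modelling Metadata table.
n_pos = 124149
n_neg = 7946702
auprc = 0.6322

# Assumption: imbalance ratio = positives / (positives + negatives).
imbalance_ratio = n_pos / (n_pos + n_neg)
print(round(imbalance_ratio, 4))  # 0.0154, matching the table

# A random ranking has expected precision equal to the positive fraction,
# so this ratio gives a rough "fold over chance" for the reported auPRC.
fold_over_chance = auprc / imbalance_ratio
print(round(fold_over_chance, 1))
```

Note that dividing positives by negatives alone would give roughly 0.0156, which does not match the reported 0.0154.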
In [4]:
from matlas.modisco_report import display_comparative_motif_sets
display_comparative_motif_sets(sample_name, homer_root, modisco_root)
TF-MoDISco is using the TensorFlow backend.
Number of CISBP motifs obtained by TF-MoDISco and Homer-denovo
Shared Motifs
Motif Name    Modisco    Homer
Pbx3
Sox9
Grhl1
Rest
Pou5f1
Ctcf
Smarcc1
Creb3
Rfx5
Unique TF-MoDISco Motifs
Motif Name    Modisco    Homer
Smarcc2                  absent
Zfp281                   absent
Mbtps2                   absent
Zic4                     absent
Erf                      absent
Sp3                      absent
Unique Homer Motifs
Motif Name    Modisco    Homer
Sp2           absent
Gabpa         absent
Zfp143        absent
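The three tables above amount to a set comparison between the motif names assigned to TF-MoDISco patterns and those assigned to Homer de novo motifs. A minimal sketch of that comparison, using the names listed in this report, follows. The lists are transcribed from the tables, and the logic is an illustration rather than the actual implementation of `display_comparative_motif_sets`.

```python
# Motif names transcribed from the three tables in this report.
modisco_motifs = [
    "Pbx3", "Sox9", "Grhl1", "Rest", "Pou5f1", "Ctcf", "Smarcc1", "Creb3", "Rfx5",
    "Smarcc2", "Zfp281", "Mbtps2", "Zic4", "Erf", "Sp3",
]
homer_motifs = [
    "Pbx3", "Sox9", "Grhl1", "Rest", "Pou5f1", "Ctcf", "Smarcc1", "Creb3", "Rfx5",
    "Sp2", "Gabpa", "Zfp143",
]

homer_set = set(homer_motifs)
modisco_set = set(modisco_motifs)

# Keep list order so the output mirrors the report tables.
shared = [m for m in modisco_motifs if m in homer_set]
unique_modisco = [m for m in modisco_motifs if m not in homer_set]
unique_homer = [m for m in homer_motifs if m not in modisco_set]

print(len(shared), len(unique_modisco), len(unique_homer))  # 9 6 3
```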
In [5]:
modisco_report_pipeline(sample_name, modisco_root, mitra_subdir, task_dir, database_name, 
                        importance=True, render=True)
rsync -t -av /srv/scratch/msharmin/mouse_hem/with_tfd/full_mouse50/Naive_modisco2019/task_102-naivegw/cisbp_tomtomout /srv/www/kundaje/msharmin/report/version2/task_102-naivegw/
chmod -R +755 /srv/www/kundaje/msharmin/report/version2/task_102-naivegw
Displaying motifs which has positive importances for the cell type
metacluster_1, # patterns: 26, # seqlets: 20814, Positive for: ES;v6.5;A;GEO
Cisbp matches (q-value>=0.01) using Tomtom
[Per-pattern figure panels: Sequence, Contrib Scores, Hyp_Contrib Scores]
  pattern       # seqlets   Cisbp matches
  pattern_0     3545        Ctcf, Ctcfl
  pattern_1     2989        Sp3, Klf16, Sp5, Klf5, Klf14
  pattern_2     2093        Zfp281, Zfp148, Sp2, Klf5, Sp3
  pattern_3     2045        Pou5f1, Sox2, Tbpl2
  pattern_4     1680
  pattern_5     1273        Smarcc1, Fos, Fosb, Nfe2l2, Nfe2
  pattern_6     1001        Pbx3, Foxi1, Nfya, Ybx1
  pattern_7     747         Creb3, 9430076C15Rik, Jdp2, Atf7, Batf3
  pattern_8     739
  pattern_9     695         Rfx5, Rfx1, Rfx4, Rfx7, Arid2
  pattern_10    648         Zic4, Zic3, Zic1
  pattern_11    617         Erf, Gabpa, Etv5, Elk3, Erg
  pattern_12    570         Sp3, E2f1, Klf15
  pattern_13    518         Sox9, Sox4, Sox17, Sox6, Sox5
  pattern_14    282
  pattern_15    264         Smarcc2, Tbx2, Zfp143, Zfp523, Six5
  pattern_16    236         Pou5f1, Sox2, Lhx2, Lhx9, Mnx1
  pattern_17    172         Ctcf
  pattern_18    171         Mbtps2, Sp3, Yy1, Wt1, E2f1
  pattern_19    160         Rest
  pattern_20    76          Grhl1, Tcfcp2
  pattern_21    77
  pattern_22    58
  pattern_23    54
  pattern_24    53          Ctcf, Ctcfl
  pattern_25    51
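The TF-MoDISco motif set reported in the comparison tables earlier (9 shared plus 6 unique names) appears to coincide with the first-listed Cisbp match of each pattern in this positive metacluster. The sketch below checks that observation on the values printed above. It is only a consistency check on this report, and whether `matlas` actually selects motifs this way is an assumption.

```python
# First-listed Cisbp match per pattern (pattern_0 .. pattern_25), transcribed from
# the report above; None marks a pattern with no listed match.
top_matches = [
    "Ctcf", "Sp3", "Zfp281", "Pou5f1", None, "Smarcc1", "Pbx3", "Creb3", None,
    "Rfx5", "Zic4", "Erf", "Sp3", "Sox9", None, "Smarcc2", "Pou5f1", "Ctcf",
    "Mbtps2", "Rest", "Grhl1", None, None, None, "Ctcf", None,
]

# Deduplicate while keeping first-seen order.
unique_top = list(dict.fromkeys(m for m in top_matches if m is not None))

# TF-MoDISco names from the Shared and Unique TF-MoDISco tables.
reported_modisco = {
    "Pbx3", "Sox9", "Grhl1", "Rest", "Pou5f1", "Ctcf", "Smarcc1", "Creb3", "Rfx5",
    "Smarcc2", "Zfp281", "Mbtps2", "Zic4", "Erf", "Sp3",
}

print(len(unique_top))                      # 15
print(set(unique_top) == reported_modisco)  # True
```

Under this reading, names such as Sp2, Gabpa, and Zfp143 count as Homer-only even though they appear as lower-ranked Tomtom matches for some patterns.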
In [6]:
modisco_report_pipeline(sample_name, modisco_root, mitra_subdir, task_dir, database_name, 
                        importance=False, render=True)
rsync -t -av /srv/scratch/msharmin/mouse_hem/with_tfd/full_mouse50/Naive_modisco2019/task_102-naivegw/cisbp_tomtomout /srv/www/kundaje/msharmin/report/version2/task_102-naivegw/
chmod -R +755 /srv/www/kundaje/msharmin/report/version2/task_102-naivegw
Displaying motifs which has negative importances for the cell type
metacluster_0, # patterns: 20, # seqlets: 9460, Negative for: ES;v6.5;A;GEO
Cisbp matches (q-value>=0.01) using Tomtom
[Per-pattern figure panels: Sequence, Contrib Scores, Hyp_Contrib Scores]
  pattern       # seqlets   Cisbp matches
  pattern_0     3940        Phf21a
  pattern_1     1536
  pattern_2     1210        Zeb1
  pattern_3     537
  pattern_4     284         Nfe2, Nfe2l2, Jund, Batf, Fos
  pattern_5     271         Nr5a2
  pattern_6     233         Sp3, Egr1, Egr2, Zfx
  pattern_7     188         Mnt, Zkscan1
  pattern_8     165
  pattern_9     156
  pattern_10    142
  pattern_11    141
  pattern_12    117         Sox9, Sox17, Sox6, Sox2
  pattern_13    100         Klf1, Klf4, Klf3, Klf8, Klf2
  pattern_14    92
  pattern_15    82          Nr5a2, Esrrb, Nr6a1, Nr5a1, Prdm4
  pattern_16    78
  pattern_17    94          Hsf1
  pattern_18    53          Hsf1
  pattern_19    41          Zfx, Egr4
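As a sanity check on the two pattern listings, the per-pattern seqlet counts should add up to the totals given in each metacluster header. The sketch below verifies this with the counts transcribed from this report; the dictionary layout is illustrative and not a `matlas` data structure.

```python
# Per-pattern seqlet counts transcribed from the report, in pattern order.
seqlets = {
    "metacluster_1": [  # positive for ES;v6.5;A;GEO
        3545, 2989, 2093, 2045, 1680, 1273, 1001, 747, 739, 695, 648, 617, 570,
        518, 282, 264, 236, 172, 171, 160, 76, 77, 58, 54, 53, 51,
    ],
    "metacluster_0": [  # negative for ES;v6.5;A;GEO
        3940, 1536, 1210, 537, 284, 271, 233, 188, 165, 156, 142, 141, 117,
        100, 92, 82, 78, 94, 53, 41,
    ],
}

# Totals and pattern counts from the metacluster header lines.
reported = {
    "metacluster_1": {"patterns": 26, "seqlets": 20814},
    "metacluster_0": {"patterns": 20, "seqlets": 9460},
}

for name, counts in seqlets.items():
    ok = (len(counts) == reported[name]["patterns"]
          and sum(counts) == reported[name]["seqlets"])
    print(name, len(counts), sum(counts), "OK" if ok else "MISMATCH")
```

Both metaclusters pass, so the listings above are complete with respect to their headers.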