Fi-NeMo hit calling report

TF-MoDISco seqlet comparisons

The following figures and statistics compare the called hits with the seqlets used by TF-MoDISco to construct each motif.

Hit vs. seqlet counts

This figure plots the number of hits called against the number of TF-MoDISco seqlets identified for each motif. The dashed line is the identity line. When both are computed over a shared set of regions, hit counts should mostly exceed the corresponding seqlet counts, since TF-MoDISco filters seqlets stringently and usually uses a smaller input window.

Hit and seqlet motif comparisons

For each motif, this table examines the consistency between hits and TF-MoDISco seqlets.

The table columns Hits, Restricted Hits, Seqlets, Hit/Seqlet Overlaps, Missed Seqlets, and Additional Restricted Hits report the number of hits, seqlets, and their relationships.

Note that the seqlet counts here may be lower than those shown in the tfmodisco-lite report. The counts shown here are unique, whereas the tfmodisco-lite report does not de-duplicate seqlets and can therefore count the same seqlet more than once where regions overlap.

Note that palindromic motifs may have lower recall due to disagreements on orientation. If seqlet recall is near zero for all motifs, the -W/--modisco-region-width argument is likely incorrect. This value is required to infer genomic coordinates of seqlets from the tfmodisco-lite output H5.

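Motif CWMs (contribution weight matrices) are average contribution scores over a set of regions. The CWMs plotted here are the Hit CWM (FC and RC), the TF-MoDISco CWM (FC and RC), the Missed-Seqlet-Only CWM, and the Additional-Restricted-Hit CWM.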

The plots span the full untrimmed motif, with the trimmed motif shaded.

The hit-seqlet similarity is the cosine similarity between the additional-restricted-hits CWM and the seqlet CWM. This statistic measures the similarity between hits that were missed by TF-MoDISco and the seqlets used to construct the motif.
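
As a minimal sketch of the two ideas above (averaging contribution windows into a CWM, then comparing two CWMs by cosine similarity), the following uses plain Python lists. The function names `average_cwm` and `cosine_similarity`, the toy numbers, and the choice to flatten each matrix into one vector are illustrative assumptions, not Fi-NeMo's actual implementation.

```python
import math

def average_cwm(windows):
    """Average equally shaped (position x base) contribution windows into one CWM."""
    n = len(windows)
    length, bases = len(windows[0]), len(windows[0][0])
    return [[sum(w[i][j] for w in windows) / n for j in range(bases)]
            for i in range(length)]

def cosine_similarity(cwm_a, cwm_b):
    """Cosine similarity of two CWMs, treating each matrix as one flat vector."""
    a = [v for row in cwm_a for v in row]
    b = [v for row in cwm_b for v in row]
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    # An all-zero matrix has no direction, so the similarity is undefined.
    return dot / norm if norm > 0 else float("nan")

# Toy contribution windows: 2 positions x 4 bases (hypothetical values).
seqlet_windows = [
    [[0.0, 1.0, 0.0, 0.0], [0.5, 0.0, 0.0, 0.0]],
    [[0.0, 0.8, 0.0, 0.0], [0.3, 0.0, 0.1, 0.0]],
]
hit_windows = [
    [[0.0, 0.9, 0.0, 0.0], [0.4, 0.0, 0.0, 0.0]],
]

seqlet_cwm = average_cwm(seqlet_windows)
hit_cwm = average_cwm(hit_windows)
similarity = cosine_similarity(hit_cwm, seqlet_cwm)
print(f"hit-seqlet CWM similarity: {similarity:.3f}")
```

Values close to 1 indicate that the two averaged contribution profiles point in nearly the same direction, regardless of their overall magnitude.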

Plot columns (one image per motif): Hit CWM (FC), Hit CWM (RC), TF-MoDISco CWM (FC), TF-MoDISco CWM (RC), Missed-Seqlet-Only CWM, Additional-Restricted-Hit CWM.

Motif Name | Seqlet Recall | Hit-Seqlet CWM Similarity | Hits | Restricted Hits | Seqlets | Hit/Seqlet Overlaps | Missed Seqlets | Additional Restricted Hits
pos_patterns.pattern_0 | 0.005 | 0.982 | 686389 | 465736 | 23308 | 118 | 23190 | 465618
pos_patterns.pattern_1 | 0.001 | 1.000 | 39448 | 32923 | 20220 | 19 | 20201 | 32904
pos_patterns.pattern_2 | 0.001 | 0.999 | 114848 | 79613 | 15294 | 19 | 15275 | 79594
pos_patterns.pattern_3 | 0.001 | 0.998 | 142049 | 91121 | 11489 | 9 | 11480 | 91112
pos_patterns.pattern_4 | 0.001 | 0.990 | 135221 | 92007 | 10962 | 13 | 10949 | 91994
pos_patterns.pattern_5 | 0.000 | 0.995 | 24799 | 17821 | 4110 | 1 | 4109 | 17820
pos_patterns.pattern_6 | 0.001 | 0.995 | 145015 | 100555 | 3747 | 2 | 3745 | 100553
pos_patterns.pattern_7 | 0.001 | 0.992 | 44331 | 28730 | 3683 | 2 | 3681 | 28728
pos_patterns.pattern_8 | 0.000 | 0.999 | 28228 | 18519 | 3648 | 0 | 3648 | 18519
pos_patterns.pattern_9 | 0.000 | 0.981 | 26253 | 19798 | 2274 | 0 | 2274 | 19798
pos_patterns.pattern_10 | 0.000 | 0.998 | 9810 | 6770 | 1969 | 0 | 1969 | 6770
pos_patterns.pattern_11 | 0.000 | 0.996 | 16422 | 13073 | 1851 | 0 | 1851 | 13073
pos_patterns.pattern_12 | 0.000 | 0.976 | 69805 | 47028 | 1517 | 0 | 1517 | 47028
pos_patterns.pattern_13 | 0.000 | 0.990 | 65673 | 44138 | 1251 | 0 | 1251 | 44138
pos_patterns.pattern_14 | 0.000 | 0.953 | 36890 | 23391 | 1047 | 0 | 1047 | 23391
pos_patterns.pattern_15 | 0.000 | 0.994 | 6711 | 3995 | 835 | 0 | 835 | 3995
pos_patterns.pattern_16 | 0.000 | 0.990 | 8632 | 5696 | 814 | 0 | 814 | 5696
pos_patterns.pattern_17 | 0.000 | 1.000 | 1904 | 1536 | 777 | 0 | 777 | 1536
pos_patterns.pattern_18 | 0.000 | 0.990 | 6277 | 4432 | 733 | 0 | 733 | 4432
pos_patterns.pattern_19 | 0.000 | 0.981 | 14538 | 10024 | 611 | 0 | 611 | 10024
pos_patterns.pattern_20 | 0.000 | 0.994 | 14114 | 8622 | 559 | 0 | 559 | 8622
pos_patterns.pattern_21 | 0.000 | 0.991 | 20807 | 13436 | 516 | 0 | 516 | 13436
pos_patterns.pattern_22 | 0.000 | 0.988 | 6112 | 4120 | 464 | 0 | 464 | 4120
pos_patterns.pattern_23 | 0.000 | 0.996 | 4139 | 3027 | 375 | 0 | 375 | 3027
pos_patterns.pattern_24 | 0.000 | 0.995 | 4094 | 2726 | 360 | 0 | 360 | 2726
pos_patterns.pattern_25 | 0.000 | 0.985 | 4369 | 3107 | 345 | 0 | 345 | 3107
pos_patterns.pattern_26 | 0.000 | 0.927 | 10187 | 6777 | 344 | 0 | 344 | 6777
pos_patterns.pattern_27 | 0.000 | 0.991 | 3214 | 2108 | 328 | 0 | 328 | 2108
pos_patterns.pattern_28 | 0.000 | 0.983 | 1323 | 840 | 301 | 0 | 301 | 840
pos_patterns.pattern_29 | 0.000 | 0.877 | 59735 | 42977 | 285 | 0 | 285 | 42977
pos_patterns.pattern_30 | 0.000 | 0.986 | 1727 | 1067 | 145 | 0 | 145 | 1067
pos_patterns.pattern_31 | 0.008 | 0.988 | 2119 | 1411 | 129 | 1 | 128 | 1410
pos_patterns.pattern_32 | 0.000 | 0.987 | 1058 | 788 | 117 | 0 | 117 | 788
pos_patterns.pattern_33 | 0.000 | 0.966 | 3631 | 2630 | 76 | 0 | 76 | 2630
pos_patterns.pattern_34 | 0.000 | 0.973 | 1298 | 1103 | 70 | 0 | 70 | 1103
pos_patterns.pattern_35 | 0.000 | 0.987 | 1447 | 1055 | 49 | 0 | 49 | 1055
pos_patterns.pattern_36 | 0.000 | 0.902 | 33980 | 24916 | 25 | 0 | 25 | 24916
pos_patterns.pattern_37 | 0.000 | 0.969 | 7086 | 4819 | 23 | 0 | 23 | 4819
pos_patterns.pattern_38 | 0.000 | 0.986 | 165 | 112 | 21 | 0 | 21 | 112
neg_patterns.pattern_0 | 0.000 | 0.974 | 1363428 | 721034 | 95 | 0 | 95 | 721034
neg_patterns.pattern_1 | 0.000 | 0.875 | 13 | 8 | 40 | 0 | 40 | 8
neg_patterns.pattern_2 | 0.000 | nan | 0 | 0 | 22 | 0 | 22 | 0
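
The report does not spell out how the count columns relate to each other, but the tabulated values are consistent with the arithmetic sketched below: seqlet recall as overlaps divided by seqlets, missed seqlets as seqlets minus overlaps, and additional restricted hits as restricted hits minus overlaps. These relationships are inferred from the numbers in the table and should be read as an assumption; `summarize_motif` is a hypothetical helper name.

```python
def summarize_motif(restricted_hits, seqlets, overlaps):
    """Derive the summary columns from the three underlying counts.

    Relationships are inferred from the table values, not taken from
    Fi-NeMo's documentation.
    """
    recall = overlaps / seqlets if seqlets else float("nan")
    return {
        "seqlet_recall": round(recall, 3),
        "missed_seqlets": seqlets - overlaps,
        "additional_restricted_hits": restricted_hits - overlaps,
    }

# Counts copied from the pos_patterns.pattern_0 row of the table.
row = summarize_motif(restricted_hits=465736, seqlets=23308, overlaps=118)
print(row)
```

For the pos_patterns.pattern_0 row this reproduces the tabulated 0.005 recall, 23190 missed seqlets, and 465618 additional restricted hits.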

Seqlet-hit confusion matrix

This heatmap shows the prevalence of motifs whose (untrimmed) hits overlap with TF-MoDISco seqlets of other motifs. The vertical axis shows the motif of the seqlet, while the horizontal axis shows the motif of the hit. The color intensity here represents an estimator of the expected number of bases of hit overlap per base of seqlet.
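
One plausible way to compute such a quantity is sketched below: for each (seqlet motif, hit motif) pair, sum the overlapping bases between seqlets and hits in the same region, then divide by the total number of seqlet bases for that seqlet motif. The exact estimator used for the heatmap is not stated here, so this is an assumption; the tuple layout `(motif, region, start, end)` with half-open intervals and the function name are hypothetical.

```python
from collections import defaultdict

def overlap_per_seqlet_base(seqlets, hits):
    """Bases of hit overlap per base of seqlet, for each (seqlet motif, hit motif).

    Both inputs are lists of (motif, region, start, end) with half-open
    [start, end) coordinates. A simple quadratic scan; fine for a small demo.
    """
    seqlet_bases = defaultdict(int)
    overlap_bases = defaultdict(int)
    for s_motif, s_region, s_start, s_end in seqlets:
        seqlet_bases[s_motif] += s_end - s_start
        for h_motif, h_region, h_start, h_end in hits:
            if h_region != s_region:
                continue
            shared = min(s_end, h_end) - max(s_start, h_start)
            if shared > 0:
                overlap_bases[(s_motif, h_motif)] += shared
    return {pair: bases / seqlet_bases[pair[0]]
            for pair, bases in overlap_bases.items()}

# Toy intervals (hypothetical): motif "A" seqlet overlaps an "A" hit and a "B" hit.
seqlets = [("A", 0, 10, 20), ("B", 0, 40, 50)]
hits = [("A", 0, 12, 22), ("B", 0, 15, 25), ("B", 1, 40, 50)]

matrix = overlap_per_seqlet_base(seqlets, hits)
print(matrix)
```

In this toy case the off-diagonal entry for ("A", "B") is nonzero, which is the kind of signal the heatmap is meant to expose: hits of one motif landing on seqlets of another.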

Hit statistic distributions

The following figures visualize the distribution of hit statistics across motifs and regions.

Overall distribution of hit counts per region

This plot shows the distribution of hit counts per region for any motif. The number of regions with no hits should be near zero.
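
A distribution of this kind can be tabulated from a list of per-hit region indices, as in the sketch below. The input layout (one integer region index per hit, regions numbered from 0) and the function name are assumptions made for illustration.

```python
from collections import Counter

def hits_per_region_distribution(hit_region_ids, n_regions):
    """Map each hit count to the number of regions with exactly that many hits."""
    per_region = Counter(hit_region_ids)
    # Regions absent from the hit list are counted explicitly as having zero hits.
    distribution = Counter(per_region.get(r, 0) for r in range(n_regions))
    return dict(sorted(distribution.items()))

# Toy data: six hits spread over five regions; regions 2 and 4 have none.
n_regions = 5
dist = hits_per_region_distribution([0, 0, 1, 3, 3, 3], n_regions)
zero_fraction = dist.get(0, 0) / n_regions
print(dist, zero_fraction)
```

A large `zero_fraction` on real data would be the warning sign described above.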

Per-motif distributions of hit statistics

These plots show the distribution of hit statistics for each motif, specifically hits per region, hit coefficient, hit similarity, and hit importance.

Motif Name | Hits Per Region | Hit Coefficient | Hit Similarity | Hit Importance
[One row of distribution plots per motif: pos_patterns.pattern_0 through pos_patterns.pattern_38, then neg_patterns.pattern_0 through neg_patterns.pattern_2.]

Motif co-occurrence

This heatmap shows the co-occurrence of motifs across regions. The color intensity represents the cosine similarity between the motifs' occurrences across regions, where occurrence is defined as the presence of a hit for a motif in a region.
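
For binary presence vectors, the cosine similarity reduces to the number of regions containing both motifs divided by the geometric mean of the two motifs' region counts. The sketch below illustrates this under the assumption that each region is represented as a set of motif names with at least one hit; the data layout and function name are hypothetical.

```python
import math

def cooccurrence_similarity(motifs_by_region, motif_a, motif_b):
    """Cosine similarity of two motifs' binary presence vectors across regions."""
    n_a = sum(1 for motifs in motifs_by_region if motif_a in motifs)
    n_b = sum(1 for motifs in motifs_by_region if motif_b in motifs)
    both = sum(1 for motifs in motifs_by_region
               if motif_a in motifs and motif_b in motifs)
    if n_a == 0 or n_b == 0:
        # A motif with no hits has an all-zero vector; similarity is undefined.
        return float("nan")
    return both / math.sqrt(n_a * n_b)

# Toy data: the set of motifs with at least one hit in each of five regions.
motifs_by_region = [{"A", "B"}, {"A"}, {"B"}, {"A", "B"}, set()]

sim_ab = cooccurrence_similarity(motifs_by_region, "A", "B")
print(f"A/B co-occurrence: {sim_ab:.3f}")
```

Here each motif occurs in three regions and both occur together in two, giving 2 / sqrt(3 * 3), or about 0.667.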