Fi-NeMo hit calling report

TF-MoDISco seqlet comparisons

The following figures and statistics compare the called hits with the seqlets used by TF-MoDISco to construct each motif.

Hit vs. seqlet counts

This figure shows the number of hits called vs. the number of TF-MoDISco seqlets identified for each motif. The dashed line is the identity line. When hits and seqlets are compared over a shared set of regions, hit counts should mostly exceed the corresponding seqlet counts, because TF-MoDISco filters seqlets stringently and usually uses a smaller input window.

Hit and seqlet motif comparisons

For each motif, this table examines the consistency between hits and TF-MoDISco seqlets.

The following statistics report the number of hits, seqlets, and their relationships:

Note that the seqlet counts here may be lower than those shown in the tfmodisco-lite report, which double-counts seqlets in overlapping regions. The seqlet counts shown here are de-duplicated, while the counts in the tfmodisco-lite report are not.

Note that palindromic motifs may have lower recall due to disagreements on orientation. If seqlet recall is near zero for all motifs, the -W/--modisco-region-width argument is likely incorrect. This value is required to infer genomic coordinates of seqlets from the tfmodisco-lite output H5.
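As a rough illustration of the check described above, the sketch below assumes that seqlet recall is the number of hit/seqlet overlaps divided by the number of seqlets, which is consistent with the values in this report's table. The function name and the near-zero threshold are hypothetical and are not part of Fi-NeMo.

```python
def seqlet_recall(num_overlaps, num_seqlets):
    """Fraction of seqlets that coincide with a called hit (assumed definition)."""
    if num_seqlets == 0:
        return float("nan")
    return num_overlaps / num_seqlets


# (hit/seqlet overlaps, seqlets) copied from three rows of this report's table.
rows = {
    "pos_patterns.pattern_0": (11, 18384),
    "pos_patterns.pattern_1": (35, 11725),
    "pos_patterns.pattern_6": (0, 3467),
}

recalls = {name: seqlet_recall(o, s) for name, (o, s) in rows.items()}

# Hypothetical cutoff for "near zero"; the report does not define one.
NEAR_ZERO = 0.01
all_near_zero = all(r < NEAR_ZERO for r in recalls.values())

for name, recall in recalls.items():
    print(f"{name}\t{recall:.3f}")
if all_near_zero:
    print("Recall is near zero for every motif; check -W/--modisco-region-width.")
```

If every motif falls below the cutoff, as in these three rows, the region width passed to the hit caller is the first thing to verify.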

Motif CWMs (contribution weight matrices) are average contribution scores over a set of regions. The CWMs plotted here are:

The plots span the full untrimmed motif, with the trimmed motif shaded.

The hit-seqlet similarity is the cosine similarity between the additional-restricted-hits CWM and the seqlet CWM. This statistic measures the similarity between hits that were missed by TF-MoDISco and the seqlets used to construct the motif.
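The cosine similarity between two CWMs can be sketched as follows. This is a minimal illustration that assumes both CWMs have the same shape and are compared over every position and nucleotide; the actual computation may differ, for example by restricting to the trimmed motif. The function name and the toy matrices are hypothetical.

```python
import math


def cwm_cosine_similarity(cwm_a, cwm_b):
    """Cosine similarity between two CWMs given as nested lists of equal shape."""
    flat_a = [value for row in cwm_a for value in row]
    flat_b = [value for row in cwm_b for value in row]
    if len(flat_a) != len(flat_b):
        raise ValueError("CWMs must have the same shape")
    dot = sum(a * b for a, b in zip(flat_a, flat_b))
    norm_a = math.sqrt(sum(a * a for a in flat_a))
    norm_b = math.sqrt(sum(b * b for b in flat_b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # similarity is undefined for an all-zero CWM; report 0
    return dot / (norm_a * norm_b)


# Toy CWMs: rows are A, C, G, T; columns are motif positions.
seqlet_cwm = [
    [0.9, 0.0, 0.1],
    [0.0, 0.8, 0.0],
    [0.1, 0.0, 0.7],
    [0.0, 0.1, 0.0],
]
# Same pattern at half the magnitude: cosine similarity ignores overall scale.
additional_hit_cwm = [[0.5 * v for v in row] for row in seqlet_cwm]

similarity = cwm_cosine_similarity(additional_hit_cwm, seqlet_cwm)
print(f"{similarity:.3f}")
```

Because the measure is scale-invariant, a value near 1 indicates that the additional hits share the seqlets' contribution pattern even when their average contribution is weaker.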

Plot columns (images not reproduced here): Hit CWM (FC), Hit CWM (RC), TF-MoDISco CWM (FC), TF-MoDISco CWM (RC), Missed-Seqlet-Only CWM, Additional-Restricted-Hit CWM

Motif Name | Seqlet Recall | Hit-Seqlet CWM Similarity | Hits | Restricted Hits | Seqlets | Hit/Seqlet Overlaps | Missed Seqlets | Additional Restricted Hits
pos_patterns.pattern_0 | 0.001 | 1.000 | 38329 | 32009 | 18384 | 11 | 18373 | 31998
pos_patterns.pattern_1 | 0.003 | 0.992 | 450484 | 302605 | 11725 | 35 | 11690 | 302570
pos_patterns.pattern_2 | 0.001 | 0.998 | 164078 | 115823 | 9893 | 13 | 9880 | 115810
pos_patterns.pattern_3 | 0.001 | 0.998 | 104270 | 64751 | 7933 | 6 | 7927 | 64745
pos_patterns.pattern_4 | 0.001 | 0.961 | 252218 | 153012 | 6362 | 7 | 6355 | 153005
pos_patterns.pattern_5 | 0.001 | 0.994 | 152185 | 103188 | 6204 | 5 | 6199 | 103183
pos_patterns.pattern_6 | 0.000 | 0.995 | 47585 | 30132 | 3467 | 0 | 3467 | 30132
pos_patterns.pattern_7 | 0.001 | 0.998 | 22849 | 16888 | 3320 | 3 | 3317 | 16885
pos_patterns.pattern_8 | 0.000 | 0.963 | 46555 | 33463 | 2225 | 1 | 2224 | 33462
pos_patterns.pattern_9 | 0.001 | 0.992 | 17666 | 12572 | 1498 | 1 | 1497 | 12571
pos_patterns.pattern_10 | 0.000 | 0.939 | 137905 | 86128 | 1465 | 0 | 1465 | 86128
pos_patterns.pattern_11 | 0.000 | 0.999 | 16079 | 11019 | 1327 | 0 | 1327 | 11019
pos_patterns.pattern_12 | 0.000 | 1.000 | 2083 | 1654 | 1087 | 0 | 1087 | 1654
pos_patterns.pattern_13 | 0.000 | 0.998 | 6104 | 3659 | 903 | 0 | 903 | 3659
pos_patterns.pattern_14 | 0.000 | 0.981 | 99036 | 65837 | 866 | 0 | 866 | 65837
pos_patterns.pattern_15 | 0.000 | 0.985 | 57470 | 36443 | 703 | 0 | 703 | 36443
pos_patterns.pattern_16 | 0.000 | 0.993 | 9404 | 6223 | 665 | 0 | 665 | 6223
pos_patterns.pattern_17 | 0.000 | 0.996 | 6380 | 4429 | 621 | 0 | 621 | 4429
pos_patterns.pattern_18 | 0.000 | 0.998 | 9974 | 7104 | 581 | 0 | 581 | 7104
pos_patterns.pattern_19 | 0.000 | 0.987 | 7096 | 4619 | 534 | 0 | 534 | 4619
pos_patterns.pattern_20 | 0.000 | 0.896 | 65435 | 44988 | 325 | 0 | 325 | 44988
pos_patterns.pattern_21 | 0.000 | 0.966 | 4447 | 2745 | 262 | 0 | 262 | 2745
pos_patterns.pattern_22 | 0.000 | 0.989 | 6739 | 4402 | 212 | 0 | 212 | 4402
pos_patterns.pattern_23 | 0.000 | 0.993 | 3428 | 2300 | 188 | 0 | 188 | 2300
pos_patterns.pattern_24 | 0.000 | 0.977 | 7713 | 5828 | 148 | 0 | 148 | 5828
pos_patterns.pattern_25 | 0.000 | 0.962 | 1264 | 913 | 121 | 0 | 121 | 913
pos_patterns.pattern_26 | 0.000 | 0.997 | 676 | 610 | 100 | 0 | 100 | 610
pos_patterns.pattern_27 | 0.000 | 0.973 | 3150 | 2194 | 74 | 0 | 74 | 2194
pos_patterns.pattern_28 | 0.000 | 0.960 | 3269 | 2399 | 62 | 0 | 62 | 2399
pos_patterns.pattern_29 | 0.000 | 0.900 | 1581 | 958 | 21 | 0 | 21 | 958
neg_patterns.pattern_0 | 0.000 | 0.974 | 31438 | 15129 | 38 | 0 | 38 | 15129
neg_patterns.pattern_1 | 0.000 | 0.901 | 2385 | 903 | 29 | 0 | 29 | 903
neg_patterns.pattern_2 | 0.000 | 0.952 | 1220569 | 695892 | 28 | 0 | 28 | 695892

Seqlet-hit confusion matrix

This heatmap shows how often the (untrimmed) hits of one motif overlap TF-MoDISco seqlets of another motif. The vertical axis shows the motif of the seqlet, while the horizontal axis shows the motif of the hit. The color intensity is an estimate of the expected number of bases of hit overlap per base of seqlet.
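One simple way to estimate overlap bases per seqlet base for a single cell of such a heatmap is sketched below. The report does not specify the estimator, so this is only an assumed form: total overlapping bases between one motif's seqlets and another motif's hits, divided by the total seqlet length. The interval representation and function names are hypothetical.

```python
def overlap_bases(interval_a, interval_b):
    """Number of bases shared by two half-open intervals (region, start, end)."""
    region_a, start_a, end_a = interval_a
    region_b, start_b, end_b = interval_b
    if region_a != region_b:
        return 0
    return max(0, min(end_a, end_b) - max(start_a, start_b))


def overlap_per_seqlet_base(seqlets, hits):
    """Total hit overlap in bases divided by total seqlet bases (assumed estimator)."""
    total_seqlet_bases = sum(end - start for _, start, end in seqlets)
    if total_seqlet_bases == 0:
        return 0.0
    total_overlap = sum(overlap_bases(s, h) for s in seqlets for h in hits)
    return total_overlap / total_seqlet_bases


# Toy example: one 30-base seqlet of motif X and two hits of motif Y.
seqlets_x = [("region_1", 10, 40)]
hits_y = [("region_1", 30, 50), ("region_2", 10, 40)]

cell_value = overlap_per_seqlet_base(seqlets_x, hits_y)
print(f"{cell_value:.3f}")
```

Only the first hit shares a region with the seqlet, and it covers 10 of the seqlet's 30 bases. Note that this sketch counts a seqlet base more than once if several hits of the same motif cover it.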

Hit statistic distributions

The following figures visualize the distribution of hit statistics across motifs and regions.

Overall distribution of hit counts per region

This plot shows the distribution of hit counts per region for any motif. The number of regions with no hits should be near zero.
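The distribution described above can be tabulated with a short sketch like the following. It assumes hits are available as a list of integer region indices, one entry per hit, with regions numbered from 0; both the input layout and the function name are hypothetical.

```python
from collections import Counter


def hits_per_region_histogram(hit_region_ids, num_regions):
    """Map each hit count to the number of regions with that many hits."""
    per_region = Counter(hit_region_ids)
    # Counter returns 0 for regions that never appear, so empty regions are counted.
    histogram = Counter(per_region[r] for r in range(num_regions))
    return dict(sorted(histogram.items()))


# Toy input: six hits spread over five regions; regions 2 and 4 have no hits.
hit_region_ids = [0, 0, 1, 3, 3, 3]
num_regions = 5

histogram = hits_per_region_histogram(hit_region_ids, num_regions)
zero_hit_fraction = histogram.get(0, 0) / num_regions

print(histogram)
print(f"fraction of regions without hits: {zero_hit_fraction:.2f}")
```

A large zero-hit fraction in real data would be the signal worth investigating, since the number of regions with no hits is expected to be near zero.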

Per-motif distributions of hit statistics

These plots show the distribution of hit statistics for each motif, specifically:

Motif Name | Hits Per Region | Hit Coefficient | Hit Similarity | Hit Importance
One row of distribution plots per motif (images not reproduced here): pos_patterns.pattern_0 through pos_patterns.pattern_29 and neg_patterns.pattern_0 through neg_patterns.pattern_2.

Motif co-occurrence

This heatmap shows the co-occurrence of motifs across regions. The color intensity represents the cosine similarity between the motifs' occurrence across regions, where occurrence is defined as the presence of a hit for a motif in a region.
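For binary occurrence, the cosine similarity between two motifs reduces to the number of regions containing both, divided by the square root of the product of the two motifs' region counts. The sketch below illustrates this under the assumption that hits are given as (region index, motif name) pairs; the input layout and function names are hypothetical.

```python
import math
from collections import defaultdict


def regions_by_motif(hits):
    """Collect the set of regions in which each motif has at least one hit."""
    regions = defaultdict(set)
    for region_id, motif_name in hits:
        regions[motif_name].add(region_id)
    return regions


def cooccurrence_similarity(regions_a, regions_b):
    """Cosine similarity between two binary occurrence vectors, given as region sets."""
    if not regions_a or not regions_b:
        return 0.0
    shared = len(regions_a & regions_b)
    return shared / math.sqrt(len(regions_a) * len(regions_b))


# Toy hits; the repeated (0, "motif_a") hit does not change presence/absence.
hits = [
    (0, "motif_a"), (1, "motif_a"), (0, "motif_a"),
    (0, "motif_b"),
    (2, "motif_c"), (3, "motif_c"),
]

occurrence = regions_by_motif(hits)
sim_ab = cooccurrence_similarity(occurrence["motif_a"], occurrence["motif_b"])
sim_ac = cooccurrence_similarity(occurrence["motif_a"], occurrence["motif_c"])

print(f"motif_a vs motif_b: {sim_ab:.3f}")
print(f"motif_a vs motif_c: {sim_ac:.3f}")
```

Working with region sets avoids building a full regions-by-motifs matrix, at the cost of computing each pair separately.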