FiNeMo hit calling report

TF-MoDISco seqlet comparisons

The following figures and statistics compare the called hits with the seqlets used by TF-MoDISco to construct each motif.

Hit vs. seqlet counts

This figure shows the number of hits called vs. the number of TF-MoDISco seqlets identified for each motif. Hit counts should generally exceed the corresponding seqlet counts, since TF-MoDISco filters seqlets stringently and usually uses a smaller input window. The dashed line is the identity line, where the hit count equals the seqlet count.

CWMs and seqlet recall

For each motif, this table examines the consistency between hits and TF-MoDISco seqlets.

The following statistics report the number of hits, seqlets, and their relationships:

Hits: The number of hits called by FiNeMo
Restricted Hits: The number of FiNeMo hits within the TF-MoDISco input regions
Seqlets: The number of unique TF-MoDISco seqlets
Hit/Seqlet Overlaps: The number of hits that coincide with TF-MoDISco seqlets
Missed Seqlets: The number of TF-MoDISco seqlets not called as hits
Additional Restricted Hits: The number of hits within the TF-MoDISco input regions that were not identified as seqlets by TF-MoDISco
Seqlet Recall: The fraction of seqlets that are called as hits
Hit-Seqlet Correlation: The Pearson correlation between the additional-restricted-hits CWM and the seqlet CWM
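
As a rough illustration of how the count statistics above relate to one another, the sketch below treats hits and seqlets as sets of hypothetical `(peak_id, start, strand)` keys and counts an overlap only on an exact key match. The key layout, the exact-match rule, and the function name `seqlet_stats` are assumptions made for this example, not FiNeMo's actual implementation. The dictionary keys mirror the field names used in the table template.

```python
def seqlet_stats(hits, restricted_hits, seqlets):
    """Derive the report's count statistics from hit and seqlet keys.

    Each key is a hypothetical (peak_id, start, strand) tuple; two records
    "coincide" only if their keys are identical (an assumption).
    """
    hits = set(hits)
    restricted = set(restricted_hits)  # hits inside the TF-MoDISco input regions
    seqlets = set(seqlets)             # set() de-duplicates repeated seqlets

    overlaps = restricted & seqlets    # seqlets that were also called as hits
    missed = seqlets - restricted      # seqlets not called as hits
    additional = restricted - seqlets  # restricted hits that are not seqlets

    recall = len(overlaps) / len(seqlets) if seqlets else float("nan")
    return {
        "num_hits_total": len(hits),
        "num_hits_restricted": len(restricted),
        "num_seqlets": len(seqlets),
        "num_overlaps": len(overlaps),
        "num_seqlets_only": len(missed),
        "num_hits_restricted_only": len(additional),
        "seqlet_recall": recall,
    }


# Toy data: the hit at position 400 lies outside the (narrower) input window.
all_hits = [("p1", 10, "+"), ("p1", 50, "-"), ("p2", 5, "+"), ("p2", 400, "+")]
restricted_hits = [("p1", 10, "+"), ("p1", 50, "-"), ("p2", 5, "+")]
# The first seqlet appears twice to mimic double-counting before de-duplication.
seqlets = [("p1", 10, "+"), ("p1", 10, "+"), ("p2", 20, "-")]

stats = seqlet_stats(all_hits, restricted_hits, seqlets)
print(stats)
```

In this toy case, one of the two unique seqlets is recovered as a hit, so the seqlet recall is 0.5.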

Note that the seqlet counts here may be lower than those shown in the tfmodisco-lite report. The counts shown here are de-duplicated, whereas the tfmodisco-lite report does not de-duplicate and therefore double-counts seqlets in overlapping regions.

Note that palindromic motifs may have lower recall due to disagreements on orientation. If seqlet recall is near zero for all motifs, the -W/--modisco-region-width argument is likely incorrect.

CWMs (contribution weight matrices) are average contribution scores over a set of regions. The CWMs shown here are:

Hit CWM (FC): The forward-strand CWM of all hits
Hit CWM (RC): The reverse-strand CWM of all hits
Seqlet CWM: The CWM of all TF-MoDISco seqlets
Missed-Seqlet-Only CWM: The CWM of all TF-MoDISco seqlets that were not called as hits
Additional-Restricted-Hit CWM: The CWM of all hits within the TF-MoDISco input regions that were not identified as seqlets by TF-MoDISco

The hit-seqlet correlation is the Pearson correlation between the additional-restricted-hits CWM and the seqlet CWM. This statistic measures the similarity between hits that were missed by TF-MoDISco and the seqlets used to construct the motif.
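
For readers who want to reproduce this kind of comparison by hand, the following is a minimal sketch of a Pearson correlation between two CWMs. It assumes each CWM is a 4 x L nested list (rows A, C, G, T) and that the two matrices are flattened and compared element by element; the function name `cwm_pearson` and the flattening choice are illustrative assumptions, not a description of FiNeMo's internals.

```python
import math


def cwm_pearson(cwm_a, cwm_b):
    """Pearson correlation between two equally shaped CWMs (flattened)."""
    a = [v for row in cwm_a for v in row]
    b = [v for row in cwm_b for v in row]
    if len(a) != len(b):
        raise ValueError("CWMs must have the same shape")

    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return float("nan")  # correlation is undefined for a constant matrix
    return cov / math.sqrt(var_a * var_b)


# Toy 4 x 3 CWMs: the second is a scaled copy of the first.
seqlet_cwm = [
    [0.9, 0.0, 0.1],
    [0.0, 0.8, 0.0],
    [0.1, 0.0, 0.7],
    [0.0, 0.2, 0.2],
]
extra_hit_cwm = [[0.5 * v for v in row] for row in seqlet_cwm]

r = cwm_pearson(seqlet_cwm, extra_hit_cwm)
print(round(r, 3))
```

Because Pearson correlation ignores overall scale, a weaker but identically shaped CWM still correlates perfectly with the seqlet CWM, which is why the statistic is useful for comparing contribution patterns rather than magnitudes.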

Motif Name	Seqlet Recall	Hit-Seqlet Correlation	Hits	Restricted Hits	Seqlets	Hit/Seqlet Overlaps	Missed Seqlets	Additional Restricted Hits	Hit CWM (FC)	Hit CWM (RC)	Seqlet CWM	Missed-Seqlet-Only CWM	Additional-Restricted-Hit CWM
{% for item in seqlet_recall_data %}
`{{ item.motif_name }}`	{{ '%0.3f' | format(item.seqlet_recall | float) }}	{{ '%0.3f' | format(item.cwm_correlation | float) }}	{{ item.num_hits_total }}	{{ item.num_hits_restricted }}	{{ item.num_seqlets }}	{{ item.num_overlaps }}	{{ item.num_seqlets_only }}	{{ item.num_hits_restricted_only }}
{% endfor %}

Hit distributions

The following figures visualize the distribution of hits across motifs and peaks.

Overall distribution of hits per peak

This plot shows the distribution of the total number of hits per peak, counting hits for all motifs together. The number of peaks with no hits should be near zero.
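
The quantity plotted here can be sketched as follows, assuming hits are available as a flat list of peak identifiers (one entry per hit). The names `hits_per_peak_distribution`, `peak_ids`, and `hit_peak_ids` are hypothetical and chosen only for this example.

```python
from collections import Counter


def hits_per_peak_distribution(peak_ids, hit_peak_ids):
    """Map each hit count to the number of peaks having that many hits."""
    per_peak = Counter(hit_peak_ids)  # peak_id -> number of hits
    # Iterate over all peaks so that peaks without any hit are counted as 0.
    return Counter(per_peak[p] for p in peak_ids)


peak_ids = ["p1", "p2", "p3", "p4"]
hit_peak_ids = ["p1", "p1", "p3", "p4"]  # one entry per called hit

dist = hits_per_peak_distribution(peak_ids, hit_peak_ids)
frac_no_hits = dist[0] / len(peak_ids)
print(dict(dist), frac_no_hits)
```

Tracking the fraction of peaks with zero hits alongside the histogram makes it easy to check the expectation that this fraction stays near zero.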

Per-motif distributions of hits per peak

These plots show the distribution of hit counts per peak for each motif.

Motif Name	Hits Per Peak
{% for m in motif_names %}
`{{ m }}`
{% endfor %}

Motif co-occurrence

This heatmap shows the co-occurrence of motifs across peaks. The color intensity represents the Pearson correlation between the motifs' occurrences across peaks, where occurrence is defined as the presence of at least one hit for a motif in a peak.
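
To make the definition concrete, the sketch below builds a binary presence vector per motif (1 if the motif has at least one hit in a peak, 0 otherwise) and correlates every pair of vectors. The input layout of `(peak_id, motif_name)` pairs and the function names are assumptions for illustration only.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined if a motif is in all peaks or in none
    return cov / math.sqrt(var_x * var_y)


def motif_cooccurrence(peak_ids, hits, motif_names):
    """Pairwise correlation of motif presence across peaks."""
    index = {p: i for i, p in enumerate(peak_ids)}
    presence = {m: [0] * len(peak_ids) for m in motif_names}
    for peak_id, motif in hits:
        presence[motif][index[peak_id]] = 1  # repeated hits still count once
    return {
        (a, b): pearson(presence[a], presence[b])
        for a in motif_names
        for b in motif_names
    }


peak_ids = ["p1", "p2", "p3", "p4"]
hits = [
    ("p1", "A"), ("p2", "A"),
    ("p1", "B"), ("p2", "B"), ("p1", "B"),  # second B hit in p1 changes nothing
    ("p3", "C"), ("p4", "C"),
]
corr = motif_cooccurrence(peak_ids, hits, ["A", "B", "C"])
print(corr[("A", "B")], corr[("A", "C")])
```

In this toy example, motifs A and B always appear in the same peaks (correlation 1), while A and C never share a peak (correlation -1).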