Fi-NeMo hit calling report

{% if not use_seqlets %}

Seqlet comparisons are not shown because a TF-MoDISco H5 file with seqlet data is not provided.

{% elif not compute_recall %}

Seqlet recall and other statistics directly comparing hits and seqlets are not computed because the -n/--no-recall argument is set.

{% endif %} {% if use_seqlets %}

TF-MoDISco seqlet comparisons

The following figures and statistics compare the called hits with the seqlets used by TF-MoDISco to construct each motif.

Hit vs. seqlet counts

This figure shows the number of hits called vs. the number of TF-MoDISco seqlets identified for each motif. The dashed line is the identity line. When comparing a shared set of regions, hit counts should generally exceed the corresponding seqlet counts, because TF-MoDISco filters seqlets stringently and usually uses a smaller input window.

{% endif %}

Hit and seqlet motif comparisons

For each motif, this table examines the consistency between hits and TF-MoDISco seqlets.

The following statistics report the number of hits, seqlets, and their relationships:

Hits: The number of hits called by Fi-NeMo
Restricted Hits: The number of Fi-NeMo hits within the TF-MoDISco input regions
Seqlets: The number of unique TF-MoDISco seqlets
Hit/Seqlet Overlaps: The number of hits that coincide with TF-MoDISco seqlets
Missed Seqlets: The number of TF-MoDISco seqlets not called as hits
Additional Restricted Hits: The number of hits within the TF-MoDISco input regions that are not identified as seqlets by TF-MoDISco
Seqlet Recall: The fraction of seqlets that are called as hits
Hit-Seqlet CWM Similarity: The cosine similarity between the hit CWM and the TF-MoDISco CWM

Note that the seqlet counts here may be lower than those shown in the tfmodisco-lite report. The counts shown here are de-duplicated, whereas the tfmodisco-lite report double-counts seqlets that fall in overlapping regions.
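The relationships among the statistics above can be illustrated with a minimal sketch. It assumes that hits and seqlets for one motif are represented as (chromosome, start, end, strand) tuples and that a hit coincides with a seqlet only when all four fields match exactly. Both assumptions, and the function name, are illustrative and may not match how Fi-NeMo compares hits and seqlets internally.

```python
def seqlet_comparison_stats(hits, restricted_hits, seqlets):
    """Count-based statistics for a single motif.

    Each argument is an iterable of hashable records such as
    (chrom, start, end, strand). Exact-match overlap is an assumption
    made for this sketch.
    """
    hits = set(hits)
    restricted = set(restricted_hits)
    seqlets = set(seqlets)  # de-duplicate seqlets, as in the table above

    overlaps = restricted & seqlets      # hits that coincide with seqlets
    missed = seqlets - restricted        # seqlets not called as hits
    additional = restricted - seqlets    # restricted hits that are not seqlets
    recall = len(overlaps) / len(seqlets) if seqlets else float("nan")

    return {
        "num_hits_total": len(hits),
        "num_hits_restricted": len(restricted),
        "num_seqlets": len(seqlets),
        "num_overlaps": len(overlaps),
        "num_seqlets_only": len(missed),
        "num_hits_restricted_only": len(additional),
        "seqlet_recall": recall,
    }


# Toy data: one seqlet is listed twice and is counted once.
seqlets = [("chr1", 100, 110, "+"), ("chr1", 100, 110, "+"), ("chr1", 300, 310, "-")]
restricted = [("chr1", 100, 110, "+"), ("chr1", 500, 510, "+")]
hits = restricted + [("chr2", 50, 60, "+")]
stats = seqlet_comparison_stats(hits, restricted, seqlets)
```

In this toy example, two unique seqlets remain after de-duplication and one of them is recovered as a hit, so the seqlet recall is 0.5.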

{% if compute_recall %}

Note that palindromic motifs may have lower recall due to disagreements on orientation. If seqlet recall is near zero for all motifs, the -W/--modisco-region-width argument is likely incorrect. This value is required to infer genomic coordinates of seqlets from the tfmodisco-lite output H5.
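To see why an incorrect region width can drive recall toward zero, consider a hedged sketch of the coordinate inference. It assumes that each TF-MoDISco input window is centered on the midpoint of its peak and that seqlet offsets are measured from the start of that window. The function and parameter names are hypothetical, and the actual conversion in Fi-NeMo may differ.

```python
def seqlet_genomic_interval(peak_start, peak_end, modisco_region_width,
                            seqlet_start, seqlet_end):
    """Map seqlet offsets within a centered window to genomic coordinates.

    Assumes the TF-MoDISco window of width `modisco_region_width` is
    centered on the peak midpoint (an assumption of this sketch).
    """
    center = (peak_start + peak_end) // 2
    window_start = center - modisco_region_width // 2
    return window_start + seqlet_start, window_start + seqlet_end


# Same seqlet offsets, interpreted with a correct and an incorrect width.
correct = seqlet_genomic_interval(1000, 3000, 400, 150, 165)
wrong = seqlet_genomic_interval(1000, 3000, 1000, 150, 165)
shift = correct[0] - wrong[0]
```

Under these assumptions, the incorrect width shifts every inferred seqlet by the same offset (300 bp here), so almost no seqlet would coincide with a hit.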

{% endif %}

Motif CWMs (contribution weight matrices) are average contribution scores over a set of regions. The CWMs plotted here are:

Hit CWM (FC): The forward-strand CWM of all hits
Hit CWM (RC): The reverse-strand CWM of all hits
TF-MoDISco CWM (FC/RC): The CWM of all TF-MoDISco seqlets
Missed-Seqlet-Only CWM: The CWM of all TF-MoDISco seqlets that were not called as hits
Additional-Restricted-Hit CWM: The CWM of all hits within the TF-MoDISco input regions that were not identified as seqlets by TF-MoDISco

The plots span the full untrimmed motif, with the trimmed motif shaded.

The Hit-Seqlet CWM Similarity is the cosine similarity between the Additional-Restricted-Hit CWM and the TF-MoDISco seqlet CWM. This statistic measures how closely the hits that were missed by TF-MoDISco resemble the seqlets used to construct the motif.
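As an illustration of the similarity statistic, the following sketch computes the cosine similarity between two CWMs. It assumes each CWM is a nested list of equal shape (positions by four bases) and flattens both before comparison. The function name is hypothetical, and this is not necessarily how Fi-NeMo computes the value.

```python
import math


def cwm_cosine_similarity(cwm_a, cwm_b):
    """Cosine similarity between two equal-shaped CWMs (positions x 4 bases)."""
    flat_a = [v for row in cwm_a for v in row]
    flat_b = [v for row in cwm_b for v in row]
    if len(flat_a) != len(flat_b):
        raise ValueError("CWMs must have the same shape")

    dot = sum(a * b for a, b in zip(flat_a, flat_b))
    norm_a = math.sqrt(sum(a * a for a in flat_a))
    norm_b = math.sqrt(sum(b * b for b in flat_b))
    if norm_a == 0 or norm_b == 0:
        return float("nan")  # undefined when either CWM is all zeros
    return dot / (norm_a * norm_b)


seqlet_cwm = [[0.9, 0.0, 0.0, 0.1], [0.0, 0.8, 0.1, 0.0]]
scaled_cwm = [[2 * v for v in row] for row in seqlet_cwm]
unrelated_cwm = [[0.0, 0.5, 0.0, 0.0], [0.5, 0.0, 0.0, 0.0]]

same_shape_score = cwm_cosine_similarity(seqlet_cwm, scaled_cwm)
unrelated_score = cwm_cosine_similarity(seqlet_cwm, unrelated_cwm)
```

Cosine similarity ignores overall magnitude, so a CWM with the same pattern at twice the contribution strength scores approximately 1, while a CWM with no shared nonzero entries scores 0.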

Motif Name	Seqlet Recall	Hit-Seqlet CWM Similarity	Hits	Restricted Hits	Seqlets	Hit/Seqlet Overlaps	Missed Seqlets	Additional Restricted Hits	Hit CWM (FC)	Hit CWM (RC)	TF-MoDISco CWM (FC)	TF-MoDISco CWM (RC)	Missed-Seqlet-Only CWM	Additional-Restricted-Hit CWM
{% for item in report_data %}
`{{ item.motif_name }}`	{{ '%0.3f'|format(item.seqlet_recall|float) }}	{{ '%0.3f'|format(item.cwm_similarity|float) }}	{{ item.num_hits_total }}	{{ item.num_hits_restricted }}	{{ item.num_seqlets }}	{{ item.num_overlaps }}	{{ item.num_seqlets_only }}	{{ item.num_hits_restricted_only }}
{% endfor %}

Hit distributions

The following figures visualize the distribution of hits across motifs and peaks.

Overall distribution of hits per peak

This plot shows the distribution of the total number of hits per peak, counted across all motifs. The number of peaks with no hits should be near zero.
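A minimal sketch of how such a distribution can be tallied is given below. It assumes each hit carries an integer peak index in the range 0 to num_peaks - 1. The function and argument names are hypothetical. Peaks without any hits must be counted explicitly, since they never appear in the hit list.

```python
from collections import Counter


def hits_per_peak_distribution(hit_peak_ids, num_peaks):
    """Return a mapping from hit count to the number of peaks with that count."""
    per_peak = Counter(hit_peak_ids)  # hits observed in each peak
    # Counter returns 0 for peaks that never appear among the hits.
    distribution = Counter(per_peak[p] for p in range(num_peaks))
    return dict(sorted(distribution.items()))


# Six hits spread over five peaks; peaks 2 and 4 have no hits.
distribution = hits_per_peak_distribution([0, 0, 1, 3, 3, 3], num_peaks=5)
```

Here two of the five peaks fall in the zero-hit bin, which is the bin the plot description above says should be nearly empty.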

Per-motif distributions of hits per peak

These plots show the distribution of hit counts per peak for each motif.

Motif Name	Hits Per Peak
{% for m in motif_names %}
`{{ m }}`
{% endfor %}

Motif co-occurrence

This heatmap shows the co-occurrence of motifs across peaks. The color intensity represents the cosine similarity between each pair of motifs' occurrence patterns across peaks, where occurrence is defined as the presence of a hit for a motif in a peak.
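For binary occurrence vectors, the cosine similarity reduces to the number of shared peaks divided by the geometric mean of the two motifs' peak counts. The sketch below illustrates this under the assumption that each motif is summarized by the set of peak indices in which it has a hit. The function name is hypothetical, and returning 0.0 for a motif with no hits is a choice made for this example.

```python
import math


def cooccurrence_similarity(peaks_a, peaks_b):
    """Cosine similarity between two binary peak-occurrence vectors.

    Each argument is a collection of peak indices in which the motif has a hit.
    """
    a, b = set(peaks_a), set(peaks_b)
    if not a or not b:
        return 0.0  # similarity is undefined for an empty vector; 0.0 is a choice
    # Dot product of binary vectors = shared peaks; norms = sqrt of peak counts.
    return len(a & b) / math.sqrt(len(a) * len(b))


nested = cooccurrence_similarity([0, 1, 2, 3], [2, 3])
disjoint = cooccurrence_similarity([0, 1, 2, 3], [5])
identical = cooccurrence_similarity([2, 3], [2, 3])
```

A motif whose peaks are a subset of another's scores below 1 (about 0.71 in this example), motifs that never share a peak score 0, and identical occurrence patterns score 1.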