This report analyzes motif instance calling results from Fi-NeMo, a GPU-accelerated method for identifying transcription factor binding sites from neural network contribution scores. Fi-NeMo uses a competitive optimization approach, solving a sparse linear reconstruction problem to map motif instances comprehensively. The report compares Fi-NeMo hits with TF-MoDISco seqlets (when available) and provides detailed statistics on hit quality and motif discovery performance.
{% if not use_seqlets %}Note: Seqlet comparisons are not shown because a TF-MoDISco H5 file with seqlet data was not provided.{% elif not compute_recall %}
Note: Seqlet recall and other statistics directly comparing hits and seqlets are not computed because the -n/--no-recall argument was specified.
{% endif %}
{% if use_seqlets %}
The following figures and statistics compare the called hits with the seqlets used by TF-MoDISco to construct each motif.
This scatter plot compares the number of motif instances called by Fi-NeMo versus the number of TF-MoDISco seqlets
identified for each motif. The dashed line represents perfect agreement (y = x). Fi-NeMo typically identifies
an order of magnitude more motif instances than TF-MoDISco because: (1) TF-MoDISco applies stringent filtering criteria
during seqlet identification, and (2) TF-MoDISco often analyzes smaller genomic windows than those used for hit calling.
This table provides detailed statistics for each motif, comparing the consistency between Fi-NeMo hits and TF-MoDISco seqlets. The analysis includes hit counts, overlap statistics, and visual comparisons of contribution weight matrices (CWMs).
Statistical measures include:
Important notes:
-W/--modisco-region-width parameter matches the original TF-MoDISco analysis window
Contribution Weight Matrix (CWM) visualizations:
CWMs represent average contribution scores across motif instances and show the functional importance
of each nucleotide position. The following CWMs are displayed for comparison:
All CWM plots span the full untrimmed motif width, with the core trimmed region highlighted by shading. {% if compute_recall %} The hit-seqlet CWM similarity quantifies the overall agreement between Fi-NeMo's discovered instances and TF-MoDISco's original motif definitions. {% endif %}
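As a rough illustration of how the recall and similarity columns in the table below could be derived, the following sketch assumes that seqlet recall is the fraction of seqlets overlapped by a hit (`num_overlaps / num_seqlets`) and that CWM similarity is a cosine similarity over flattened matrices. Both definitions, and the function names `seqlet_recall` and `cwm_cosine_similarity`, are assumptions made for this example; they are not taken from the Fi-NeMo source.

```python
import numpy as np

def seqlet_recall(num_overlaps: int, num_seqlets: int) -> float:
    """Fraction of seqlets overlapped by a hit (assumed definition)."""
    return num_overlaps / num_seqlets if num_seqlets else float("nan")

def cwm_cosine_similarity(cwm_a, cwm_b) -> float:
    """Cosine similarity between two CWMs of equal shape (assumed metric)."""
    a = np.asarray(cwm_a, dtype=float).ravel()
    b = np.asarray(cwm_b, dtype=float).ravel()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else float("nan")

# Toy CWMs with 4 nucleotide rows and 3 positions (made-up values).
hit_cwm = np.array([[0.0, 0.9, 0.0],
                    [0.8, 0.0, 0.1],
                    [0.0, 0.0, 0.7],
                    [0.1, 0.0, 0.0]])
seqlet_cwm = np.array([[0.0, 1.0, 0.0],
                       [0.7, 0.0, 0.0],
                       [0.0, 0.0, 0.8],
                       [0.2, 0.1, 0.0]])

print(seqlet_recall(num_overlaps=80, num_seqlets=100))
print(cwm_cosine_similarity(hit_cwm, seqlet_cwm))
```

Under these assumptions, a similarity close to 1 would mean the average contribution pattern of the hits closely matches that of the seqlets.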
| Motif Name | {% if compute_recall %}Seqlet Recall | {% endif %}Hit-Seqlet CWM Similarity | Hits | {% if compute_recall %}Restricted Hits | {% endif %} {% if use_seqlets %}Seqlets | {% endif %} {% if compute_recall %}Hit/Seqlet Overlaps | Missed Seqlets | Additional Restricted Hits | {% endif %}Hit CWM (FC) | Hit CWM (RC) | TF-MoDISco CWM (FC) | TF-MoDISco CWM (RC) | {% if compute_recall %}Missed-Seqlet-Only CWM | Additional-Restricted-Hit CWM | {% endif %}
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| {{ item.motif_name }} | {% if compute_recall %}{{ '%0.3f'| format(item.seqlet_recall|float) }} | {% endif %}{{ '%0.3f'| format(item.cwm_similarity|float) }} | {{ item.num_hits_total }} | {% if compute_recall %}{{ item.num_hits_restricted }} | {% endif %} {% if use_seqlets %}{{ item.num_seqlets }} | {% endif %} {% if compute_recall %}{{ item.num_overlaps }} | {{ item.num_seqlets_only }} | {{ item.num_hits_restricted_only }} | {% endif %}
This confusion matrix identifies cases where Fi-NeMo hits of one motif type spatially overlap with TF-MoDISco seqlets of different motif types. Such cross-assignments can reveal related motif families, algorithm differences, or cases where similar-looking motifs compete for the same binding sites.
The y-axis represents seqlet motif identity, the x-axis represents hit motif identity, and color intensity indicates the estimated overlap frequency per base of seqlet sequence. High off-diagonal values suggest potential motif ambiguity and/or algorithmic disagreements at groups of putative TF binding sites.
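One way to read "overlap frequency per base of seqlet sequence" is as the number of seqlet bases covered by hits of a given motif, divided by the total seqlet bases for the seqlet's motif. The sketch below follows that reading. The tuple layout `(region, start, end, motif)`, the half-open coordinates, and the function name `overlap_per_seqlet_base` are assumptions for illustration only, not Fi-NeMo's actual implementation.

```python
from collections import defaultdict

def overlap_per_seqlet_base(seqlets, hits):
    """Overlapping bases per seqlet base for each (seqlet motif, hit motif) pair.

    seqlets, hits: iterables of (region, start, end, motif), half-open intervals.
    """
    hits_by_region = defaultdict(list)
    for region, start, end, motif in hits:
        hits_by_region[region].append((start, end, motif))

    overlap_bp = defaultdict(int)  # (seqlet_motif, hit_motif) -> overlapping bases
    seqlet_bp = defaultdict(int)   # seqlet_motif -> total seqlet bases
    for region, s_start, s_end, s_motif in seqlets:
        seqlet_bp[s_motif] += s_end - s_start
        for h_start, h_end, h_motif in hits_by_region[region]:
            shared = min(s_end, h_end) - max(s_start, h_start)
            if shared > 0:
                overlap_bp[(s_motif, h_motif)] += shared

    # Normalize by the total seqlet bases of the seqlet motif (matrix row).
    return {pair: bp / seqlet_bp[pair[0]] for pair, bp in overlap_bp.items()}

# Toy example: motif "A" seqlets are partly covered by motif "B" hits.
seqlets = [("r1", 10, 20, "A"), ("r2", 0, 10, "B")]
hits = [("r1", 12, 22, "A"), ("r1", 15, 25, "B"), ("r2", 0, 10, "B")]
confusion = overlap_per_seqlet_base(seqlets, hits)
print(confusion)
```

In this toy input, the off-diagonal entry for seqlet motif "A" and hit motif "B" is nonzero, which is the kind of cross-assignment the confusion matrix is meant to surface.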
These visualizations examine the quality and distribution of Fi-NeMo hits across genomic regions and motifs, measuring algorithm performance and signal strength.
This histogram shows the distribution of total hit counts per genomic region (across all motifs). A good distribution should show nearly all regions containing at least one hit.
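The quantity behind this histogram can be reproduced with a simple count, sketched below. The sketch assumes each hit carries an integer region index; the function name `hits_per_region` and the input layout are hypothetical and do not describe Fi-NeMo's output format.

```python
import numpy as np

def hits_per_region(hit_region_ids, num_regions):
    """Return per-region hit counts and the fraction of regions with >= 1 hit."""
    ids = np.asarray(hit_region_ids, dtype=int)
    # minlength keeps regions with zero hits in the result.
    counts = np.bincount(ids, minlength=num_regions)
    frac_with_hit = float((counts > 0).mean())
    return counts, frac_with_hit

# Toy example: 6 hits spread over 5 regions; regions 1 and 4 have no hits.
counts, frac = hits_per_region([0, 0, 2, 3, 3, 3], num_regions=5)
print(counts, frac)
```

A fraction well below 1 would correspond to a visible spike at zero in the histogram.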
These distribution plots characterize the quality and prevalence of hits for individual motifs:
| Motif Name | Hits Per Region | Hit Coefficient | Hit Similarity | Hit Importance |
|---|---|---|---|---|
| {{ m }} |
This correlation heatmap reveals which motifs tend to occur together in the same genomic regions, potentially indicating cooperative transcription factor binding or shared regulatory mechanisms. Color intensity represents cosine similarity between motif occurrence patterns, where occurrence is defined as the presence of at least one hit for each motif within individual regions.
High similarity values (dark colors) indicate motifs that frequently co-occur in the same regions. Low values indicate motifs that rarely share regions, which is consistent with independent or mutually exclusive binding patterns.
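The co-occurrence measure described above can be sketched as follows: binarize a regions-by-motifs hit count matrix into occurrence indicators, then take the cosine similarity between motif columns. The input layout and the function name `cooccurrence_cosine` are assumptions for illustration; this is a minimal sketch, not necessarily how the report computes the heatmap.

```python
import numpy as np

def cooccurrence_cosine(hit_counts):
    """Cosine similarity between motif occurrence vectors.

    hit_counts: array of shape (num_regions, num_motifs) with hit counts.
    """
    # Occurrence = at least one hit of the motif in the region.
    occ = (np.asarray(hit_counts) > 0).astype(float)
    norms = np.linalg.norm(occ, axis=0)
    norms[norms == 0] = 1.0  # motifs with no hits get similarity 0, not NaN
    unit = occ / norms
    return unit.T @ unit

# Toy example: 4 regions, 3 motifs.
counts = np.array([[1, 2, 0],
                   [3, 1, 0],
                   [0, 0, 4],
                   [1, 0, 1]])
sim = cooccurrence_cosine(counts)
print(np.round(sim, 3))
```

Because occurrence vectors are non-negative, the values fall between 0 and 1. Motifs 1 and 2 in the toy input never share a region, so their similarity is 0.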