Fi-NeMo Motif Hit Calling Report

This report summarizes motif instance calling results from Fi-NeMo, a GPU-accelerated method for identifying transcription factor binding sites from neural network contribution scores. Fi-NeMo maps motif instances with a competitive optimization approach that solves a sparse linear reconstruction problem. The report compares Fi-NeMo hits with TF-MoDISco seqlets (when available) and provides detailed statistics on hit quality and motif discovery performance.
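To give a feel for what a sparse linear reconstruction looks like, the toy sketch below explains a one-dimensional track as a sum of shifted copies of a single template, with a penalty that discourages unnecessary placements. It is an illustration only and not Fi-NeMo's implementation: the function `sparse_reconstruct`, the `penalty` value, the plain proximal-gradient solver, and the example data are all assumptions made for this sketch.

```python
# Toy sketch of sparse, non-negative linear reconstruction (NOT Fi-NeMo's code).
# A 1-D track is explained as a sum of shifted copies of a template; an L1
# penalty keeps only the placements that are actually needed.

def sparse_reconstruct(track, template, penalty=0.6, n_iter=5000):
    n_pos = len(track) - len(template) + 1  # candidate start positions
    # Safe step size: the largest eigenvalue of the Gram matrix is bounded
    # by (sum of |template|)^2, so its reciprocal guarantees convergence.
    step = 1.0 / (sum(abs(t) for t in template) ** 2)
    coef = [0.0] * n_pos
    for _ in range(n_iter):
        # Reconstruct the track from the current coefficients.
        recon = [0.0] * len(track)
        for start, c in enumerate(coef):
            if c:
                for k, t in enumerate(template):
                    recon[start + k] += c * t
        resid = [y - r for y, r in zip(track, recon)]
        # Proximal gradient step: follow the correlation with the residual,
        # subtract the penalty, and clip negative coefficients to zero.
        for start in range(n_pos):
            corr = sum(t * resid[start + k] for k, t in enumerate(template))
            coef[start] = max(0.0, coef[start] + step * (corr - penalty))
    return coef

# Hypothetical track containing one scaled copy of the template at position 2.
track = [0.0, 0.0, 1.5, 3.0, 1.5, 0.0, 0.0, 0.0]
coef = sparse_reconstruct(track, [1.0, 2.0, 1.0])
hits = [pos for pos, c in enumerate(coef) if c > 1e-6]
```

In this toy case the overlapping placements at neighboring positions compete for the same signal, and the penalty leaves a single non-zero coefficient at the true position.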

{% if not use_seqlets %}

Note: Seqlet comparisons are not shown because a TF-MoDISco H5 file with seqlet data was not provided.

{% elif not compute_recall %}

Note: Seqlet recall and other statistics directly comparing hits and seqlets are not computed because the -n/--no-recall argument was specified.

{% endif %} {% if use_seqlets %}

TF-MoDISco seqlet comparisons

The following figures and statistics compare the called hits with the seqlets used by TF-MoDISco to construct each motif.

Hit vs. seqlet counts

This scatter plot compares the number of motif instances called by Fi-NeMo versus the number of TF-MoDISco seqlets identified for each motif. The dashed line represents perfect agreement (y = x). Fi-NeMo typically identifies an order of magnitude more motif instances than TF-MoDISco because: (1) TF-MoDISco applies stringent filtering criteria during seqlet identification, and (2) TF-MoDISco often analyzes smaller genomic windows than those used for hit calling.

{% endif %}

Motif-specific hit and seqlet analysis

This table provides detailed statistics for each motif and assesses the consistency between Fi-NeMo hits and TF-MoDISco seqlets. The analysis includes hit counts, overlap statistics, and visual comparisons of contribution weight matrices (CWMs).

Statistical measures include:

Hits: Total number of motif instances called by Fi-NeMo across all genomic regions
Restricted Hits: Fi-NeMo hits overlapping with TF-MoDISco input regions (enables direct comparison)
Seqlets: Unique TF-MoDISco seqlets used to construct this motif pattern
Hit/Seqlet Overlaps: Fi-NeMo hits that spatially coincide with TF-MoDISco seqlets (successful recovery)
Missed Seqlets: TF-MoDISco seqlets not identified as hits by Fi-NeMo (potential false negatives)
Additional Restricted Hits: Fi-NeMo hits not identified as seqlets by TF-MoDISco (potential new discoveries)
Seqlet Recall: Fraction of TF-MoDISco seqlets successfully recovered as Fi-NeMo hits
Hit-Seqlet CWM Similarity: Cosine similarity between average contribution scores of hits vs. seqlets
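
The count-based measures above relate to each other through simple set operations. The sketch below is a minimal illustration, assuming that a hit and a seqlet "coincide" when they share the same region, start position, and strand; the record layout and the function `overlap_stats` are hypothetical and may differ from how Fi-NeMo matches instances.

```python
# Hypothetical records: (region_id, start, strand) tuples for one motif.
# Assumption: a hit and a seqlet coincide when these tuples are identical.

def overlap_stats(restricted_hits, seqlets):
    hits = set(restricted_hits)
    seqs = set(seqlets)  # duplicate seqlets collapse into one entry
    overlaps = hits & seqs
    n_seq = len(seqs)
    return {
        "restricted_hits": len(hits),
        "seqlets": n_seq,
        "overlaps": len(overlaps),
        "missed_seqlets": len(seqs - hits),
        "additional_restricted_hits": len(hits - seqs),
        "seqlet_recall": len(overlaps) / n_seq if n_seq else float("nan"),
    }

restricted_hits = [("r1", 10, "+"), ("r1", 40, "-"), ("r2", 5, "+")]
seqlets = [("r1", 10, "+"), ("r2", 70, "+"), ("r1", 10, "+")]  # one duplicate
stats = overlap_stats(restricted_hits, seqlets)
```

Here one of the two unique seqlets is recovered, so the recall is 0.5, with one missed seqlet and two additional restricted hits.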

Important notes:

Seqlet counts may appear lower than in TF-MoDISco-lite reports due to removal of duplicate seqlets
Palindromic motifs may show reduced recall due to strand orientation ambiguity
If seqlet recall is near zero across all motifs, verify that the -W/--modisco-region-width parameter matches the original TF-MoDISco analysis window
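
The note on palindromic motifs can be made concrete: a motif is palindromic when its CWM equals its own reverse complement, so a hit on either strand describes the same site. The sketch below checks this for a CWM stored as rows of positions with columns ordered A, C, G, T. The helper names and the exact-match tolerance are assumptions for illustration; real CWMs are rarely exactly symmetric and would likely need a similarity threshold instead.

```python
# CWM layout assumed here: one row per position, columns ordered A, C, G, T.

def reverse_complement(cwm):
    # Reversing the position order and the base order (A<->T, C<->G)
    # yields the motif as read from the opposite strand.
    return [row[::-1] for row in cwm[::-1]]

def is_palindromic(cwm, tol=1e-6):
    rc = reverse_complement(cwm)
    return all(
        abs(a - b) <= tol
        for row, rc_row in zip(cwm, rc)
        for a, b in zip(row, rc_row)
    )

A, C, G, T = [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]
gatc = [G, A, T, C]  # reads the same on both strands
gata = [G, A, T, A]  # reverse complement is TATC

palindromic_flags = (is_palindromic(gatc), is_palindromic(gata))
```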

Contribution Weight Matrix (CWM) visualizations:
CWMs represent average contribution scores across motif instances and show the functional importance of each nucleotide position. The following CWMs are displayed for comparison:

Hit CWM (FC/RC): Average contribution patterns from Fi-NeMo hits on forward/reverse strands
TF-MoDISco CWM (FC/RC): Average contribution patterns from TF-MoDISco seqlets on forward/reverse strands
Missed-Seqlet-Only CWM: Contribution patterns from seqlets not recovered by Fi-NeMo (identifies potential algorithmic disagreements)
Additional-Restricted-Hit CWM: Contribution patterns from Fi-NeMo hits not identified by TF-MoDISco

All CWM plots span the full untrimmed motif width, with the core trimmed region highlighted by shading. {% if compute_recall %} The hit-seqlet CWM similarity quantifies the overall agreement between Fi-NeMo's discovered instances and TF-MoDISco's original motif definitions. {% endif %}
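
As a minimal sketch of how a CWM and the hit-seqlet CWM similarity could be computed, the example below averages per-instance contribution scores and takes the cosine similarity of the flattened matrices. The function names and the toy data are hypothetical, and the report's own computation may differ in details such as trimming or strand handling.

```python
import math

def average_cwm(instances):
    # instances: list of [position][base] contribution arrays of equal width
    n = len(instances)
    width = len(instances[0])
    return [
        [sum(inst[pos][base] for inst in instances) / n for base in range(4)]
        for pos in range(width)
    ]

def cosine_similarity(cwm_a, cwm_b):
    a = [v for row in cwm_a for v in row]
    b = [v for row in cwm_b for v in row]
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Toy two-position instances (columns A, C, G, T).
hit_instances = [
    [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 2.0, 0.0]],
    [[3.0, 0.0, 0.0, 0.0], [0.0, 0.0, 2.0, 0.0]],
]
seqlet_instances = [[[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]]

hit_cwm = average_cwm(hit_instances)
seqlet_cwm = average_cwm(seqlet_instances)
similarity = cosine_similarity(hit_cwm, seqlet_cwm)
```

Because cosine similarity ignores overall scale, the two toy CWMs above score close to 1 even though the hit CWM has twice the magnitude.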

Motif Name	Seqlet Recall	Hit-Seqlet CWM Similarity	Hits	Restricted Hits	Seqlets	Hit/Seqlet Overlaps	Missed Seqlets	Additional Restricted Hits	Hit CWM (FC)	Hit CWM (RC)	TF-MoDISco CWM (FC)	TF-MoDISco CWM (RC)	Missed-Seqlet-Only CWM	Additional-Restricted-Hit CWM
{% for item in report_data %}
`{{ item.motif_name }}`	{{ '%0.3f' | format(item.seqlet_recall | float) }}	{{ '%0.3f' | format(item.cwm_similarity | float) }}	{{ item.num_hits_total }}	{{ item.num_hits_restricted }}	{{ item.num_seqlets }}	{{ item.num_overlaps }}	{{ item.num_seqlets_only }}	{{ item.num_hits_restricted_only }}
{% endfor %}

{% if compute_recall %}

Motif cross-assignment analysis

This confusion matrix identifies cases where Fi-NeMo hits of one motif type spatially overlap with TF-MoDISco seqlets of different motif types. Such cross-assignments can reveal related motif families, algorithm differences, or cases where similar-looking motifs compete for the same binding sites.

The y-axis represents seqlet motif identity, the x-axis represents hit motif identity, and color intensity indicates the estimated overlap frequency per base of seqlet sequence. High off-diagonal values suggest potential motif ambiguity and/or algorithmic disagreements at groups of putative TF binding sites.
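
One way such a matrix could be built is sketched below: for each pair of seqlet motif and hit motif, sum the bases shared by overlapping intervals in the same region and divide by the total seqlet bases for that seqlet motif. This normalization is an assumption about what "overlap frequency per base of seqlet sequence" means, and the function `cross_assignment`, the record layout, and the motif names are hypothetical.

```python
from collections import defaultdict

def cross_assignment(hits, seqlets):
    # hits, seqlets: lists of (motif, region_id, start, end), end exclusive
    overlap_bases = defaultdict(int)
    seqlet_bases = defaultdict(int)
    for s_motif, s_region, s_start, s_end in seqlets:
        seqlet_bases[s_motif] += s_end - s_start
        # Quadratic scan for clarity; real data would need an interval index.
        for h_motif, h_region, h_start, h_end in hits:
            if h_region != s_region:
                continue
            shared = min(s_end, h_end) - max(s_start, h_start)
            if shared > 0:
                overlap_bases[(s_motif, h_motif)] += shared
    # Keys are (seqlet motif, hit motif), matching the (y, x) axes above.
    return {pair: n / seqlet_bases[pair[0]] for pair, n in overlap_bases.items()}

seqlets = [("GATA", "r1", 10, 20), ("SOX", "r1", 50, 60)]
hits = [("GATA", "r1", 10, 20), ("GATA", "r1", 52, 60), ("SOX", "r2", 50, 60)]
matrix = cross_assignment(hits, seqlets)
```

In this toy input the off-diagonal entry for SOX seqlets and GATA hits is 0.8, the kind of value that would stand out in the heatmap.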

{% endif %}

Hit quality and distribution analysis

These visualizations examine the quality and distribution of Fi-NeMo hits across genomic regions and motifs, measuring algorithm performance and signal strength.

Genome-wide hit density

This histogram shows the distribution of total hit counts per genomic region (across all motifs). In a well-behaved run, nearly all regions should contain at least one hit.
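
The histogram can be reproduced from a list of hits with a two-level count, as in the sketch below. The function name and region identifiers are hypothetical; the key point is that regions with zero hits must be counted explicitly, since they never appear in the hit list.

```python
from collections import Counter

def hits_per_region_histogram(hit_regions, all_regions):
    # hit_regions: one region id per called hit (all motifs pooled)
    per_region = Counter(hit_regions)
    # Map "number of hits" -> "number of regions with that many hits",
    # including regions that have no hits at all.
    return Counter(per_region.get(region, 0) for region in all_regions)

hit_regions = ["r1", "r1", "r2", "r1", "r4"]
all_regions = ["r1", "r2", "r3", "r4"]

histogram = hits_per_region_histogram(hit_regions, all_regions)
fraction_with_hits = 1 - histogram[0] / len(all_regions)
```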

Motif-specific hit quality metrics

These distribution plots characterize the quality and prevalence of hits for individual motifs:

Hits Per Region: Frequency of motif occurrence across genomic regions (higher values suggest more prevalent motifs)
Hit Coefficient: Strength of motif instance assignment by the optimization algorithm (higher values indicate stronger matches)
Hit Similarity: Cosine similarity between individual hits and the motif CWM (higher values indicate closer pattern matching)
Hit Importance: Total contribution score magnitude within hit regions (reflects functional significance from the neural network model)
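
Two of these metrics can be illustrated for a single hit. In the sketch below, hit importance is taken as the sum of absolute contribution scores in the hit window, and hit similarity as the cosine similarity between the hit's contribution scores and the motif CWM. Both definitions follow the descriptions above, but the exact formulas, function names, and toy values are assumptions.

```python
import math

def hit_importance(contribs):
    # Assumed definition: total absolute contribution within the hit window.
    return sum(abs(v) for row in contribs for v in row)

def hit_similarity(contribs, cwm):
    a = [v for row in contribs for v in row]
    b = [v for row in cwm for v in row]
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Toy two-position hit and CWM (columns A, C, G, T).
cwm = [[0.0, 0.0, 1.0, 0.0], [2.0, 0.0, 0.0, 0.0]]
hit = [[0.0, -0.5, 0.5, 0.0], [1.0, 0.0, 0.0, 0.0]]

importance = hit_importance(hit)
similarity = hit_similarity(hit, cwm)
```

The negative score at the first position adds to the importance but lowers the similarity, since the CWM has no weight on that base.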

Motif Name	Hits Per Region	Hit Coefficient	Hit Similarity	Hit Importance
{% for m in motif_names %}
`{{ m }}`
{% endfor %}

Motif co-occurrence analysis

This heatmap shows which motifs tend to occur together in the same genomic regions, potentially indicating cooperative transcription factor binding or shared regulatory mechanisms. Color intensity represents the cosine similarity between motif occurrence patterns, where a motif occurs in a region if that region contains at least one hit for it.

High similarity values (dark colors) indicate motifs that frequently co-occur. Low values suggest independent or mutually exclusive binding patterns.
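
For binary occurrence patterns, cosine similarity reduces to the number of shared regions divided by the geometric mean of the two region counts. The sketch below computes it from sets of region identifiers; the function name, motif names, and data are hypothetical.

```python
import math

def cooccurrence_similarity(regions_by_motif):
    # regions_by_motif: motif name -> set of region ids with at least one hit
    sim = {}
    for a, regions_a in regions_by_motif.items():
        for b, regions_b in regions_by_motif.items():
            denom = math.sqrt(len(regions_a) * len(regions_b))
            # For 0/1 vectors the dot product is the number of shared regions.
            sim[(a, b)] = len(regions_a & regions_b) / denom if denom else 0.0
    return sim

regions_by_motif = {
    "GATA": {"r1", "r2", "r3", "r4"},
    "SOX": {"r1"},
    "CTCF": {"r5"},
}
sim = cooccurrence_similarity(regions_by_motif)
```

Note that the value depends on prevalence as well as overlap: SOX occurs only in a region that also has GATA, yet the pair scores 0.5 because GATA occurs in many regions without SOX.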