Fi-NeMo Motif Hit Calling Report

This report analyzes motif instance calls from Fi-NeMo, a GPU-accelerated method for identifying transcription factor binding sites from neural network contribution scores. Fi-NeMo uses a competitive optimization approach, solving a sparse linear reconstruction problem to map motif instances comprehensively. The report compares Fi-NeMo hits with TF-MoDISco seqlets (when available) and provides detailed statistics on hit quality and motif discovery performance.
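The reconstruction idea can be illustrated with a toy NumPy sketch. This is not Fi-NeMo's implementation. It assumes a dense design matrix whose columns are CWMs placed at every offset of a flattened (4, L) contribution track. It then solves a non-negative, L1-penalized least-squares problem with projected ISTA (proximal gradient descent). Every motif at every offset competes to explain the same contribution scores, so the penalty drives most coefficients to zero. The helper names `build_design` and `sparse_nonneg_fit` and the penalty weight `lam` are illustrative. Fi-NeMo's actual objective, normalization, and solver may differ.

```python
import numpy as np


def build_design(cwms: np.ndarray, length: int) -> np.ndarray:
    """Dense design matrix with one column per (motif, start position).

    cwms has shape (K, 4, w). Each column is one CWM placed at one offset
    within a (4, length) contribution track, flattened row-major.
    """
    num_motifs, num_bases, width = cwms.shape
    cols = []
    for k in range(num_motifs):
        for p in range(length - width + 1):
            placed = np.zeros((num_bases, length))
            placed[:, p:p + width] = cwms[k]
            cols.append(placed.ravel())
    return np.stack(cols, axis=1)


def sparse_nonneg_fit(x: np.ndarray, cwms: np.ndarray, lam: float = 0.1,
                      n_iter: int = 500) -> np.ndarray:
    """Projected ISTA for  min_c 0.5 * ||x - A c||^2 + lam * sum(c)  s.t. c >= 0."""
    A = build_design(cwms, x.shape[1])
    b = x.ravel()
    step = 1.0 / np.linalg.norm(A, 2) ** 2  # 1 / Lipschitz constant of the gradient
    c = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ c - b)
        # Gradient step, then the prox of lam*sum(c) on c >= 0: shift down and clip at 0.
        c = np.maximum(c - step * (grad + lam), 0.0)
    return c.reshape(cwms.shape[0], -1)  # (K, number of start positions)


# Toy data: two random "CWMs" planted at known offsets in a length-20 track.
rng = np.random.default_rng(0)
cwms = rng.normal(size=(2, 4, 5))
true_coefs = np.zeros((2, 16))
true_coefs[0, 3] = 1.0
true_coefs[1, 10] = 0.8
A = build_design(cwms, 20)
x = (A @ true_coefs.ravel()).reshape(4, 20)
coefs = sparse_nonneg_fit(x, cwms, lam=0.01, n_iter=5000)
```

A dense design matrix is only practical for toy inputs. A realistic implementation would apply the CWMs as convolutions rather than materializing one column per position.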

{% if not use_seqlets %}
Note: Seqlet comparisons are not shown because a TF-MoDISco H5 file with seqlet data was not provided.
{% elif not compute_recall %}
Note: Seqlet recall and other statistics directly comparing hits and seqlets are not computed because the -n/--no-recall argument was specified.
{% endif %} {% if use_seqlets %}

TF-MoDISco seqlet comparisons

The following figures and statistics compare the called hits with the seqlets used by TF-MoDISco to construct each motif.

Hit vs. seqlet counts

This scatter plot compares the number of motif instances called by Fi-NeMo versus the number of TF-MoDISco seqlets identified for each motif. The dashed line represents perfect agreement (y = x). Fi-NeMo typically identifies an order of magnitude more motif instances than TF-MoDISco because: (1) TF-MoDISco applies stringent filtering criteria during seqlet identification, and (2) TF-MoDISco often analyzes smaller genomic windows than those used for hit calling.
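The same comparison can be summarized per motif as a log10 ratio of hit count to seqlet count. A ratio near 1 corresponds to the order-of-magnitude gap described above. The sketch assumes hits and seqlets are available as flat lists of motif names, which is a hypothetical input format. `count_comparison` is an illustrative helper, not part of Fi-NeMo.

```python
import math
from collections import Counter


def count_comparison(hit_motifs, seqlet_motifs):
    """Return {motif: (num_hits, num_seqlets, log10(num_hits / num_seqlets))}.

    Inputs are flat lists of motif names with one entry per hit or per seqlet
    (a hypothetical format). The ratio is NaN when either count is zero.
    """
    hits = Counter(hit_motifs)
    seqlets = Counter(seqlet_motifs)
    table = {}
    for motif in sorted(set(hits) | set(seqlets)):
        n_hits, n_seqlets = hits[motif], seqlets[motif]
        if n_hits and n_seqlets:
            ratio = math.log10(n_hits / n_seqlets)
        else:
            ratio = float("nan")
        table[motif] = (n_hits, n_seqlets, ratio)
    return table


# Made-up counts for placeholder motifs.
table = count_comparison(["motif_A"] * 1200 + ["motif_B"] * 450,
                         ["motif_A"] * 110 + ["motif_B"] * 60 + ["motif_C"] * 5)
```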

{% endif %}

Motif-specific hit and seqlet analysis

This table reports per-motif statistics on the agreement between Fi-NeMo hits and TF-MoDISco seqlets. It includes hit counts, overlap statistics, and visual comparisons of contribution weight matrices (CWMs).

Statistical measures include:

Important notes:

Contribution Weight Matrix (CWM) visualizations:
A CWM is the average of contribution scores across aligned motif instances. It shows how much each nucleotide at each position contributes to the model's predictions. The following CWMs are displayed for comparison:

All CWM plots span the full untrimmed motif width, with the core trimmed region highlighted by shading. {% if compute_recall %} The hit-seqlet CWM similarity quantifies the overall agreement between Fi-NeMo's discovered instances and TF-MoDISco's original motif definitions. {% endif %}
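CWM construction and comparison can be sketched as follows. The sketch makes three assumptions:

1. Each instance has been extracted as a (4, w) window of contribution scores.
2. The rows are in ACGT order.
3. The similarity is a cosine similarity of flattened CWMs.

The report does not state the exact metric. `reverse_complement`, `cwm_from_instances`, and `cwm_cosine` are hypothetical helpers. With ACGT ordering, reverse-complementing reverses both the base axis (A with T, C with G) and the position axis. The same operation turns an FC CWM into its RC counterpart.

```python
import numpy as np


def reverse_complement(cwm: np.ndarray) -> np.ndarray:
    """RC of a (4, w) matrix in ACGT row order: flip bases (A<->T, C<->G) and positions."""
    return cwm[::-1, ::-1]


def cwm_from_instances(windows, strands) -> np.ndarray:
    """Average (N, 4, w) contribution windows into a CWM, aligning '-' strand instances."""
    windows = np.asarray(windows, dtype=float)
    minus = np.asarray(strands) == "-"
    aligned = windows.copy()
    aligned[minus] = windows[minus][:, ::-1, ::-1]  # reverse-complement minus-strand windows
    return aligned.mean(axis=0)


def cwm_cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two CWMs, treated as flat vectors."""
    a, b = np.ravel(a), np.ravel(b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Toy check: a plus-strand copy and a minus-strand copy of the same pattern.
rng = np.random.default_rng(1)
pattern = rng.normal(size=(4, 6))
cwm = cwm_from_instances(np.stack([pattern, reverse_complement(pattern)]), ["+", "-"])
similarity = cwm_cosine(cwm, pattern)
```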

{% if compute_recall %} {% endif %} {% if compute_recall %} {% endif %} {% if use_seqlets %} {% endif %} {% if compute_recall %} {% endif %} {% if compute_recall %} {% endif %} {% for item in report_data %} {% if compute_recall %} {% endif %} {% if compute_recall %} {% endif %} {% if use_seqlets %} {% endif %} {% if compute_recall %} {% endif %} {% if compute_recall %} {% endif %} {% endfor %}
Motif Name | Seqlet Recall | Hit-Seqlet CWM Similarity | Hits | Restricted Hits | Seqlets | Hit/Seqlet Overlaps | Missed Seqlets | Additional Restricted Hits | Hit CWM (FC) | Hit CWM (RC) | TF-MoDISco CWM (FC) | TF-MoDISco CWM (RC) | Missed-Seqlet-Only CWM | Additional-Restricted-Hit CWM
{{ item.motif_name }} | {{ '%0.3f'| format(item.seqlet_recall|float) }} | {{ '%0.3f'| format(item.cwm_similarity|float) }} | {{ item.num_hits_total }} | {{ item.num_hits_restricted }} | {{ item.num_seqlets }} | {{ item.num_overlaps }} | {{ item.num_seqlets_only }} | {{ item.num_hits_restricted_only }}
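The overlap columns can be illustrated with a small sketch. It rests on three hypothetical assumptions:

1. Hits and seqlets for one motif are (region, start, end) half-open intervals.
2. A seqlet counts as recovered if any restricted hit overlaps it.
3. Seqlet recall is the recovered fraction of seqlets.

Fi-NeMo's actual matching rule may differ, for example by requiring identical coordinates or strand. The dictionary keys mirror the report's template fields, but `intervals_overlap` and `recall_stats` are illustrative.

```python
def intervals_overlap(a, b) -> bool:
    """(region, start, end) intervals, half-open; overlap means same region and shared bases."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def recall_stats(restricted_hits, seqlets) -> dict:
    """Per-motif hit/seqlet agreement under an assumed 'any overlap' matching rule."""
    matched = [s for s in seqlets if any(intervals_overlap(s, h) for h in restricted_hits)]
    extra_hits = [h for h in restricted_hits
                  if not any(intervals_overlap(h, s) for s in seqlets)]
    n = len(seqlets)
    return {
        "num_overlaps": len(matched),                # seqlets matched by >= 1 hit
        "num_seqlets_only": n - len(matched),         # missed seqlets
        "num_hits_restricted_only": len(extra_hits),  # additional restricted hits
        "seqlet_recall": len(matched) / n if n else float("nan"),
    }


hits = [(0, 10, 20), (0, 50, 60), (1, 5, 15)]
seqlets = [(0, 12, 22), (1, 100, 110)]
stats = recall_stats(hits, seqlets)
```

The pairwise scan is quadratic in the number of intervals. Sorting intervals by region and start position would scale better on genome-wide inputs.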
{% if compute_recall %}

Motif cross-assignment analysis

This confusion matrix identifies cases where Fi-NeMo hits of one motif type spatially overlap with TF-MoDISco seqlets of different motif types. Such cross-assignments can reveal related motif families, algorithm differences, or cases where similar-looking motifs compete for the same binding sites.

The y-axis represents seqlet motif identity, the x-axis represents hit motif identity, and color intensity indicates the estimated overlap frequency per base of seqlet sequence. High off-diagonal values suggest potential motif ambiguity and/or algorithmic disagreements at groups of putative TF binding sites.
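The per-base normalization can be sketched under an assumption: entry (i, j) is the number of bases in motif-i seqlets covered by motif-j hits, divided by the total length of motif-i seqlets. The (motif, region, start, end) tuple format and `cross_assignment` are hypothetical. Overlapping hits are counted separately, so in this sketch a value can exceed 1.

```python
import numpy as np


def cross_assignment(seqlets, hits, motifs) -> np.ndarray:
    """Rows: seqlet motif; columns: hit motif; entries: overlap bases per seqlet base.

    seqlets and hits are lists of (motif, region, start, end) with half-open coordinates.
    """
    index = {m: i for i, m in enumerate(motifs)}
    overlap_bases = np.zeros((len(motifs), len(motifs)))
    seqlet_bases = np.zeros(len(motifs))
    for s_motif, s_region, s_start, s_end in seqlets:
        i = index[s_motif]
        seqlet_bases[i] += s_end - s_start
        for h_motif, h_region, h_start, h_end in hits:
            if h_region != s_region:
                continue
            shared = min(s_end, h_end) - max(s_start, h_start)
            if shared > 0:
                overlap_bases[i, index[h_motif]] += shared  # overlapping hits are double-counted
    with np.errstate(divide="ignore", invalid="ignore"):
        return overlap_bases / seqlet_bases[:, None]  # NaN rows for motifs without seqlets


motifs = ["motif_A", "motif_B"]
seqlets = [("motif_A", 0, 0, 10), ("motif_B", 0, 20, 30)]
hits = [("motif_A", 0, 0, 10), ("motif_B", 0, 5, 25)]
matrix = cross_assignment(seqlets, hits, motifs)
```

In this toy input, the off-diagonal entry (0, 1) is 0.5 because motif_B hits cover half of the motif_A seqlet's bases.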

{% endif %}

Hit quality and distribution analysis

These visualizations examine the quality and distribution of Fi-NeMo hits across genomic regions and motifs. They serve as indicators of hit-calling performance and contribution signal strength.

Genome-wide hit density

This histogram shows the distribution of total hit counts per genomic region, summed across all motifs. Ideally, nearly all regions contain at least one hit.
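The histogram's input can be computed with `np.bincount`, assuming each hit is tagged with an integer region index from 0 to `num_regions - 1`. That input format and the helper name `hits_per_region` are hypothetical. `minlength` keeps regions with zero hits in the counts, which is exactly what the "at least one hit" check needs.

```python
import numpy as np


def hits_per_region(hit_region_ids, num_regions: int) -> np.ndarray:
    """Total hits in each region across all motifs; regions without hits get 0."""
    return np.bincount(np.asarray(hit_region_ids, dtype=int), minlength=num_regions)


counts = hits_per_region([0, 0, 2, 3, 3, 3], num_regions=5)  # one region index per hit
histogram = np.bincount(counts)          # histogram[n] = number of regions with n hits
frac_with_hit = float((counts > 0).mean())
```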

Motif-specific hit quality metrics

These distribution plots characterize the quality and prevalence of hits for individual motifs:

{% for m in motif_names %} {% endfor %}
Motif Name | Hits Per Region | Hit Coefficient | Hit Similarity | Hit Importance
{{ m }}
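Per-motif hits-per-region data can be gathered into a motif-by-region count matrix. The sketch assumes integer motif and region indices for each hit, which is a hypothetical format. Row m is then the input for motif m's "Hits Per Region" distribution. The sketch includes zero-count regions; whether the report's plots include them is not stated here. `np.add.at` is used because plain fancy-index assignment would not accumulate repeated (motif, region) pairs.

```python
import numpy as np


def motif_region_counts(hit_motif_ids, hit_region_ids, num_motifs, num_regions):
    """(num_motifs, num_regions) matrix of hit counts; row m is motif m's hits-per-region data."""
    counts = np.zeros((num_motifs, num_regions), dtype=int)
    # np.add.at accumulates repeated (motif, region) pairs instead of overwriting them.
    np.add.at(counts, (np.asarray(hit_motif_ids), np.asarray(hit_region_ids)), 1)
    return counts


counts = motif_region_counts([0, 0, 1, 1, 1], [0, 0, 0, 2, 2], num_motifs=2, num_regions=3)
regions_with_motif = counts > 0  # per-motif occurrence: at least one hit in the region
```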

Motif co-occurrence analysis

This heatmap shows which motifs tend to occur together in the same genomic regions, which may indicate cooperative transcription factor binding or shared regulatory mechanisms. Color intensity represents the cosine similarity between binary motif occurrence vectors. A motif counts as occurring in a region if the region contains at least one hit for that motif.

High similarity values (dark colors) suggest motifs that frequently co-occur. Low values suggest independent or mutually exclusive binding patterns.
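The cosine similarity described above can be computed directly in NumPy. The sketch assumes a boolean motif-by-region occurrence matrix, a hypothetical input since the report does not say how occurrence is stored. For 0/1 vectors, the cosine similarity equals the number of shared regions divided by the geometric mean of each motif's region count. Because the vectors are non-negative, values lie between 0 and 1.

```python
import numpy as np


def cooccurrence_cosine(occurrence) -> np.ndarray:
    """Pairwise cosine similarity of binary motif occurrence vectors.

    occurrence: (num_motifs, num_regions) booleans, True where the region has at least
    one hit of that motif. For 0/1 vectors this equals |A & B| / sqrt(|A| * |B|).
    """
    occ = np.asarray(occurrence, dtype=float)
    norms = np.linalg.norm(occ, axis=1)
    with np.errstate(divide="ignore", invalid="ignore"):
        return (occ @ occ.T) / np.outer(norms, norms)  # NaN for motifs with no hits


occurrence = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 0, 1, 0],
], dtype=bool)
similarity = cooccurrence_cosine(occurrence)
```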