---
Thu Dec  4 10:00:00 PM PST 2025

Updated file discovery functions to support 3-level hierarchy.

## Changes to varbook/write/pdf.py

### Updated discover_variant_ids() (lines 41-97):
Modified to traverse the full hierarchy structure:
```
{variant_dataset}/
  {model_dataset}/           (e.g., "Fetal Brain")
    heatmap/                 (skip)
    {cluster_name}/          (e.g., "microglia-specific cluster (#3)")
      {variant_id}/          (e.g., "chr10:123:A:G")
  profiles/                  (skip - centralized storage)
```

Now correctly:
- Iterates through model_dataset directories (skipping "profiles")
- Within each model_dataset, iterates through cluster directories (skipping "heatmap")
- Within each cluster, finds variant_id directories (identified by ':' in name)
- Returns unique variant IDs from across all model_datasets/clusters
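
The traversal above can be sketched roughly as follows. This is a minimal illustration and not the code in pdf.py: the function name `discover_variant_ids_sketch` and its `root` parameter are assumptions, and the real function may take different arguments.

```python
from pathlib import Path

SKIP_TOP_LEVEL = {"profiles"}   # centralized storage, not a model_dataset
SKIP_MODEL_LEVEL = {"heatmap"}  # dataset-level plots, not a cluster


def discover_variant_ids_sketch(root: Path) -> list[str]:
    """Collect unique variant IDs under root/{model_dataset}/{cluster_name}/.

    Hypothetical helper: `root` stands for the {variant_dataset} directory.
    """
    variant_ids: set[str] = set()
    if not root.is_dir():
        return []
    for model_dir in root.iterdir():
        if not model_dir.is_dir() or model_dir.name in SKIP_TOP_LEVEL:
            continue
        for cluster_dir in model_dir.iterdir():
            if not cluster_dir.is_dir() or cluster_dir.name in SKIP_MODEL_LEVEL:
                continue
            for variant_dir in cluster_dir.iterdir():
                # Variant IDs look like "chr10:123:A:G", so require a colon.
                if variant_dir.is_dir() and ":" in variant_dir.name:
                    variant_ids.add(variant_dir.name)
    # A set removes duplicates across model_datasets/clusters.
    return sorted(variant_ids)
```

Collecting into a set is what makes the result unique when the same variant appears under several clusters.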

### Updated discover_plot_files() (lines 100-166):
Modified to find plot files within the hierarchy:

For dataset-level plots (variant_id is empty):
- Searches: {variant_dataset}/{model_dataset}/{plot_type}/

For variant-level plots:
- Searches: {variant_dataset}/{model_dataset}/{cluster_name}/{variant_id}/{plot_type}/

Collects files from all matching paths across the hierarchy.
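
As a rough sketch of the two search patterns in this entry, the helper below lists the directories that would be searched. The name `candidate_plot_dirs` and its signature are hypothetical and only illustrate the path layout; they are not taken from pdf.py.

```python
from pathlib import Path


def candidate_plot_dirs(root: Path, plot_type: str, variant_id: str = "") -> list[Path]:
    """Return existing directories that may hold plots of `plot_type`.

    Hypothetical helper: `root` stands for the {variant_dataset} directory.
    """
    candidates: list[Path] = []
    if not root.is_dir():
        return candidates
    for model_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        if model_dir.name == "profiles":
            continue  # centralized storage, not a model_dataset
        if not variant_id:
            # Dataset-level: {variant_dataset}/{model_dataset}/{plot_type}/
            plot_dir = model_dir / plot_type
            if plot_dir.is_dir():
                candidates.append(plot_dir)
            continue
        for cluster_dir in sorted(p for p in model_dir.iterdir() if p.is_dir()):
            if cluster_dir.name == "heatmap":
                continue
            # Variant-level: .../{cluster_name}/{variant_id}/{plot_type}/
            plot_dir = cluster_dir / variant_id / plot_type
            if plot_dir.is_dir():
                candidates.append(plot_dir)
    return candidates
```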

### Updated get_before_after_files() (lines 169-231):
Modified to find .before.md and .after.md files within the hierarchy:

- Traverses same hierarchy as discover_plot_files()
- Returns first match found
- Handles both dataset-level and variant-level files
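
A minimal sketch of the "first match" behavior, assuming the caller already has an ordered list of directories to search. The function name `first_before_after`, its parameters, and the filename matching rule are assumptions for illustration, not the signature of get_before_after_files().

```python
from pathlib import Path
from typing import Optional


def first_before_after(
    search_dirs: list[Path], plot_type: str
) -> tuple[Optional[Path], Optional[Path]]:
    """Return (before, after) from the first directory that has either file."""
    before: Optional[Path] = None
    after: Optional[Path] = None
    for directory in search_dirs:
        if not directory.is_dir():
            continue
        for path in sorted(directory.iterdir()):
            if plot_type not in path.name:
                continue
            if before is None and path.name.endswith(".before.md"):
                before = path
            elif after is None and path.name.endswith(".after.md"):
                after = path
        if before or after:
            break  # first directory with a match wins
    return before, after
```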

These changes ensure the auto-discovery system works correctly with the
3-level hierarchical structure defined in the Snakefile.

---
Thu Dec  4 10:58:22 PM PST 2025

Fixed file discovery to match actual Snakefile output structure where plots are directly in variant directories.

## Root Cause:
The `discover_plot_files()` function in pdf.py was looking for plots in subdirectories like `{variant_id}/{plot_type}/file.png`, but the Snakefile generates plots directly in the variant directory as `{variant_id}/01-model-specificity-barplot.png`, `{variant_id}/02-model-scatterplot.html`, `{variant_id}/03-profile-{model}.png`.

## Actual File Structure (from Snakefile):
```
varbook_gen/Broad neurological disorders/Fetal Brain/microglia-specific cluster (#3)/
  chr10:114282405:A:T/
    00-intro.md
    01-model-specificity-barplot.png
    01-model-specificity-barplot.before.md
    01-model-specificity-barplot.after.md
    02-model-scatterplot.html
    03-profile-KUN_FB_microglia.png  (symlink to ../../../profiles/chr10:114282405:A:T/KUN_FB_microglia.png)
```

## Changes Made:

### varbook/write/pdf.py discover_plot_files() (lines 148-166):
Updated the direct-level check to look for plots directly in the variant directory and to match them by plot_type in the filename:

```python
if variant_id:
    direct_variant_dir = dataset_dir / variant_id
    if direct_variant_dir.exists() and direct_variant_dir.is_dir():
        for plot_file in direct_variant_dir.iterdir():
            if plot_file.is_file() and plot_file.suffix.lower() in {'.md', '.html', '.htm', '.png', '.svg', '.pdf'}:
                # Skip before/after files
                if plot_file.name.endswith(('.before.md', '.after.md')):
                    continue
                # Match files that contain the plot_type in their name
                # e.g., "model-scatterplot" matches "02-model-scatterplot.html"
                if plot_type in plot_file.name:
                    files.append(plot_file)
        # If we found files at this level, we're done
        if files:
            return sorted(files)
```

This correctly handles:
- Files directly in variant directory (not in subdirectories)
- Matching by plot_type substring in filename
- Skipping .before.md and .after.md files
- Symlinked profile files (no special handling needed)

## Results:
- File discovery now works when variant_dataset = "Broad neurological disorders:Fetal Brain:microglia-specific cluster (#3)"
- Finds all plot files including symlinked profiles
- HTML expandable rows will have content to display


## Additional Fix - Profile Filename Matching:
The profile files are named with singular "profile" (e.g., `03-profile-KUN_FB_microglia.png`) but the plot_type parameter is "profiles" (plural). Added singular form matching to handle this:

```python
# Drop exactly one trailing 's'; rstrip('s') would strip every trailing 's'
plot_type_singular = plot_type[:-1] if plot_type.endswith('s') else plot_type
if plot_type in plot_file.name or plot_type_singular in plot_file.name:
    files.append(plot_file)
```
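
The filename rules from this entry (allowed suffixes, skipping before/after files, singular/plural plot types) can be combined into one predicate. The sketch below is illustrative only; `matches_plot_type` and `PLOT_SUFFIXES` are hypothetical names, not identifiers from pdf.py.

```python
from pathlib import Path

PLOT_SUFFIXES = {".md", ".html", ".htm", ".png", ".svg", ".pdf"}


def matches_plot_type(filename: str, plot_type: str) -> bool:
    """Return True if `filename` looks like a plot file for `plot_type`."""
    if filename.endswith((".before.md", ".after.md")):
        return False  # companion text, not a plot
    if Path(filename).suffix.lower() not in PLOT_SUFFIXES:
        return False
    # Remove a single trailing "s" so "profiles" also matches "03-profile-...".
    singular = plot_type[:-1] if plot_type.endswith("s") else plot_type
    # The singular form is a prefix of the plural, so this covers both.
    return singular in filename
```

For example, `matches_plot_type("03-profile-KUN_FB_microglia.png", "profiles")` is true, while the `.before.md` and `.after.md` companions of a plot are rejected.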

## Test Results (from snakemake/ directory):
- ✅ Found 226 variants
- ✅ model-scatterplot: 2 files (html + md)  
- ✅ model-specificity-barplot: 3 files (md + png + svg)
- ✅ profiles: 1+ files (symlinked PNG files)

File discovery now works correctly for all plot types when called from the snakemake directory with the full hierarchy path.

---
$(date)

Fixed heatmap not showing in HTML report after hierarchy restructuring.

## Root Cause:
Model-dataset level files (like 00-intro.md, 01-heatmap.png) were being detected and skipped with a `continue` statement in the variant processing loop, with a comment saying they're "already in html_sections". However, they were never actually being added to html_sections anywhere, so they were simply being ignored.

The code structure had:
1. A loop processing all files
2. Detection logic to skip model-level files (assuming they're already processed)
3. No actual processing of model-level files before the skip

## Solution:
Restructured the file processing into two passes:

### First pass (lines 2549-2589):
- Separate model-level files from variant-level files
- Process model-level files (00-intro, 01-heatmap, etc.) and add directly to html_sections
- These files are identified by the second ` - `-separated part of section_name starting with a digit and containing no colon (variant IDs such as `chr10:114282405:A:T` contain colons)

### Second pass (lines 2591+):
- Process only variant-level files
- Removed redundant detection and skip logic for model-level files

This ensures heatmaps and other model-dataset level files are included in the HTML output before variant processing begins.

### varbook/varbook/write/html.py:2549-2589
```python
# First pass: Process model-dataset level files
model_level_files = []
variant_level_files = []

for file_path, section_name, source_path in dataset_files:
    parts = section_name.split(' - ')
    if len(parts) >= 2:
        potential_id = parts[1]
        if potential_id and potential_id[0].isdigit() and ':' not in potential_id:
            model_level_files.append((file_path, section_name, source_path))
        else:
            variant_level_files.append((file_path, section_name, source_path))

# Process model-level files and add to html_sections
for file_path, section_name, source_path in model_level_files:
    ...  # process and add to html_sections

# Second pass: Process variant-level files
for file_path, section_name, source_path in variant_level_files:
    ...  # process variants
```
```
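
The classification step can also be written as a small standalone function, which makes it easy to test in isolation. This is a sketch under assumptions: the names `is_model_level` and `partition_dataset_files` are hypothetical, and the section names in the usage example are made up. Unlike the loop shown above, which places entries without a ` - ` separator in neither list, this sketch routes them to the variant-level list; that choice is an assumption, not documented behavior.

```python
def is_model_level(section_name: str) -> bool:
    """True when the second ' - ' part starts with a digit and has no colon."""
    parts = section_name.split(" - ")
    if len(parts) < 2:
        return False
    candidate = parts[1]
    return bool(candidate) and candidate[0].isdigit() and ":" not in candidate


def partition_dataset_files(dataset_files):
    """Split (file_path, section_name, source_path) tuples into two lists."""
    model_level, variant_level = [], []
    for entry in dataset_files:
        section_name = entry[1]
        if is_model_level(section_name):
            model_level.append(entry)
        else:
            variant_level.append(entry)
    return model_level, variant_level


# Usage with made-up section names:
files = [
    ("01-heatmap.png", "Fetal Brain - 01-heatmap", "src"),
    ("barplot.png", "Fetal Brain - chr10:114282405:A:T", "src"),
]
model_level, variant_level = partition_dataset_files(files)
```

The colon check matters for variant IDs that begin with a digit (for example `1:123:A:G`), which would otherwise be mistaken for a numerically prefixed model-level file.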