---
Thu Dec 4 10:00:00 PM PST 2025

Updated file discovery functions to support the 3-level hierarchy.

## Changes to varbook/write/pdf.py

### Updated discover_variant_ids() (lines 41-97):

Modified to traverse the full hierarchy structure:

```
{variant_dataset}/
  {model_dataset}/        (e.g., "Fetal Brain")
    heatmap/              (skip)
    {cluster_name}/       (e.g., "microglia-specific cluster (#3)")
      {variant_id}/       (e.g., "chr10:123:A:G")
  profiles/               (skip - centralized storage)
```

Now correctly:
- Iterates through model_dataset directories (skipping "profiles")
- Within each model_dataset, iterates through cluster directories (skipping "heatmap")
- Within each cluster, finds variant_id directories (identified by ':' in name)
- Returns unique variant IDs from across all model_datasets/clusters

### Updated discover_plot_files() (lines 100-166):

Modified to find plot files within the hierarchy:

For dataset-level plots (variant_id is empty):
- Searches: {variant_dataset}/{model_dataset}/{plot_type}/

For variant-level plots:
- Searches: {variant_dataset}/{model_dataset}/{cluster_name}/{variant_id}/{plot_type}/

Collects files from all matching paths across the hierarchy.

### Updated get_before_after_files() (lines 169-231):

Modified to find .before.md and .after.md files within the hierarchy:
- Traverses the same hierarchy as discover_plot_files()
- Returns the first match found
- Handles both dataset-level and variant-level files

These changes ensure the auto-discovery system works correctly with the 3-level hierarchical structure defined in the Snakefile.

---
Thu Dec 4 10:58:22 PM PST 2025

Fixed file discovery to match the actual Snakefile output structure, where plots are written directly into variant directories.

## Root Cause:

The `discover_plot_files()` function in pdf.py was looking for plots in subdirectories like `{variant_id}/{plot_type}/file.png`, but the Snakefile generates plots directly in the variant directory as `{variant_id}/01-model-specificity-barplot.png`, `{variant_id}/02-model-scatterplot.html`, `{variant_id}/03-profile-{model}.png`.

## Actual File Structure (from Snakefile):

```
varbook_gen/Broad neurological disorders/Fetal Brain/microglia-specific cluster (#3)/
  chr10:114282405:A:T/
    00-intro.md
    01-model-specificity-barplot.png
    01-model-specificity-barplot.before.md
    01-model-specificity-barplot.after.md
    02-model-scatterplot.html
    03-profile-KUN_FB_microglia.png (symlink to ../../../profiles/chr10:114282405:A:T/KUN_FB_microglia.png)
```

## Changes Made:

### varbook/write/pdf.py discover_plot_files() (lines 148-166):

Updated the direct-level check to look for plots directly in the variant directory and to match by plot_type in the filename:

```python
if variant_id:
    direct_variant_dir = dataset_dir / variant_id
    if direct_variant_dir.exists() and direct_variant_dir.is_dir():
        for plot_file in direct_variant_dir.iterdir():
            if plot_file.is_file() and plot_file.suffix.lower() in ['.md', '.html', '.htm', '.png', '.svg', '.pdf']:
                # Skip before/after files
                if plot_file.name.endswith('.before.md') or plot_file.name.endswith('.after.md'):
                    continue
                # Match files that contain the plot_type in their name
                # e.g., "model-scatterplot" matches "02-model-scatterplot.html"
                if plot_type in plot_file.name:
                    files.append(plot_file)
    # If we found files at this level, we're done
    if files:
        return sorted(files)
```

This correctly handles:
- Files directly in the variant directory (not in subdirectories)
- Matching by plot_type substring in the filename
- Skipping .before.md and .after.md files
- Symlinked profile files (no special handling needed)

## Results:

- File discovery now works when variant_dataset = "Broad neurological disorders:Fetal Brain:microglia-specific cluster (#3)"
- Finds all plot files, including symlinked profiles
- HTML expandable rows will have content to display

## Additional Fix - Profile Filename Matching:

The profile files are named with the singular "profile" (e.g., `03-profile-KUN_FB_microglia.png`), but the plot_type parameter is "profiles" (plural). Added singular-form matching to handle this:

```python
# rstrip('s') removes every trailing 's'; for "profiles" this yields "profile"
plot_type_singular = plot_type.rstrip('s') if plot_type.endswith('s') else plot_type
if plot_type in plot_file.name or plot_type_singular in plot_file.name:
    files.append(plot_file)
```

## Test Results (from snakemake/ directory):

- ✅ Found 226 variants
- ✅ model-scatterplot: 2 files (html + md)
- ✅ model-specificity-barplot: 3 files (md + png + svg)
- ✅ profiles: 1+ files (symlinked PNG files)

File discovery now works correctly for all plot types when called from the snakemake directory with the full hierarchy path.

---
$(date)

Fixed heatmap not showing in the HTML report after hierarchy restructuring.

## Root Cause:

Model-dataset level files (like 00-intro.md, 01-heatmap.png) were detected and skipped with a `continue` statement in the variant processing loop, with a comment saying they were "already in html_sections". However, nothing ever added them to html_sections, so they were silently ignored.

The code structure had:
1. A loop processing all files
2. Detection logic to skip model-level files (assuming they had already been processed)
3. No actual processing of model-level files before the skip

## Solution:

Restructured the file processing into two passes:

### First pass (lines 2549-2589):

- Separate model-level files from variant-level files
- Process model-level files (00-intro, 01-heatmap, etc.) and add them directly to html_sections
- These files are identified by a numeric prefix and no colons in the filename

### Second pass (lines 2591+):

- Process only variant-level files
- Removed the redundant detection and skip logic for model-level files

This ensures heatmaps and other model-dataset level files are included in the HTML output before variant processing begins.

### varbook/varbook/write/html.py:2549-2589

```python
# First pass: Process model-dataset level files
model_level_files = []
variant_level_files = []

for file_path, section_name, source_path in dataset_files:
    parts = section_name.split(' - ')
    if len(parts) >= 2:
        potential_id = parts[1]
        # Model-level IDs start with a digit and contain no colon (e.g. "01-heatmap")
        if potential_id and potential_id[0].isdigit() and ':' not in potential_id:
            model_level_files.append((file_path, section_name, source_path))
        else:
            variant_level_files.append((file_path, section_name, source_path))

# Process model-level files and add to html_sections
for file_path, section_name, source_path in model_level_files:
    # ... process and add to html_sections
    ...

# Second pass: Process variant-level files
for file_path, section_name, source_path in variant_level_files:
    # ... process variants
    ...
```
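
The first-pass rule can be checked in isolation. Below is a minimal sketch, not the code in html.py: `is_model_level()` and `partition_files()` are hypothetical helper names, and the sample section names only assume the `<model_dataset> - <id>` shape implied by the `split(' - ')` call. It also assumes that a section name with no ` - ` separator should be treated as variant-level.

```python
from pathlib import Path
from typing import List, Tuple

# (file_path, section_name, source_path), the same tuple shape as dataset_files
SectionFile = Tuple[Path, str, Path]


def is_model_level(section_name: str) -> bool:
    """Hypothetical helper: True when the second ' - ' field looks like '01-heatmap'.

    The rule is a leading digit and no colon in that field.
    """
    parts = section_name.split(" - ")
    if len(parts) < 2:
        # Assumption: names without a separator are not model-level files.
        return False
    candidate = parts[1]
    return bool(candidate) and candidate[0].isdigit() and ":" not in candidate


def partition_files(
    dataset_files: List[SectionFile],
) -> Tuple[List[SectionFile], List[SectionFile]]:
    """Split entries into (model_level, variant_level), preserving input order."""
    model_level: List[SectionFile] = []
    variant_level: List[SectionFile] = []
    for entry in dataset_files:
        target = model_level if is_model_level(entry[1]) else variant_level
        target.append(entry)
    return model_level, variant_level


# Illustrative input only; these paths and section names are made up.
sample: List[SectionFile] = [
    (Path("00-intro.md"), "Fetal Brain - 00-intro", Path("src")),
    (Path("01-heatmap.png"), "Fetal Brain - 01-heatmap", Path("src")),
    (Path("01-barplot.png"), "Fetal Brain - chr10:114282405:A:T - 01-barplot", Path("src")),
    (Path("02-scatter.html"), "Fetal Brain - 10:123:A:G - 02-scatter", Path("src")),
]
model_files, variant_files = partition_files(sample)
```

The colon test matters for the last sample entry: a variant ID that begins with a digit still lands in the variant-level list because it contains `:`.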