---
Tue Nov 25 01:50:07 PM PST 2025

Fixed ToC generation in auto-discovery mode for HTML reports. The include_toc flag was being ignored when variant_datasets were provided. Updated html.py to collect headers during auto-discovery processing and insert ToC at the beginning of sections. Regenerated test HTML successfully with full table of contents showing all variant sections and subsections with functional anchor links.
---
Tue Nov 25 01:56:47 PM PST 2025

Implemented hierarchical HTML structure with variant dataset headers and collapsible ToC.

Key changes to html.py:
1. Added variant dataset section headers with gradient styling (purple gradient background)
2. Implemented collapsible ToC with <details>/<summary> elements for variant datasets and variants
3. Added proper hierarchy indentation in ToC (0px, 15px, 30px margins for H1, H2, H3)
4. Reorganized file processing to group by dataset, then by variant
5. Added CSS classes: .dataset-header, .toc-dataset, .toc-variant, .toc-dataset-list, .toc-variant-list
6. ToC structure: Dataset (collapsible) > Variants (collapsible) > Headers (with indent)
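
The nesting in item 6 can be sketched roughly as follows. This is a minimal illustration, not the code in html.py: the function name `render_toc` and the shape of its input are assumptions, while the CSS class names and the 0/15/30px margins come from the list above.

```python
from html import escape

# Indentation per header level, as listed above (H1, H2, H3).
INDENT_PX = {1: 0, 2: 15, 3: 30}


def render_toc(datasets):
    """Render a nested, collapsible ToC as an HTML string.

    `datasets` maps dataset name -> {variant name -> [(level, title, anchor), ...]}.
    This input shape is an assumption made for the sketch.
    """
    parts = []
    for dataset, variants in datasets.items():
        parts.append('<details class="toc-dataset" open>')
        parts.append(f"<summary>{escape(dataset)}</summary>")
        parts.append('<ul class="toc-dataset-list">')
        for variant, headers in variants.items():
            parts.append('<li><details class="toc-variant">')
            parts.append(f"<summary>{escape(variant)}</summary>")
            parts.append('<ul class="toc-variant-list">')
            for level, title, anchor in headers:
                margin = INDENT_PX.get(level, 30)  # deeper levels reuse the H3 indent
                parts.append(
                    f'<li style="margin-left: {margin}px">'
                    f'<a href="#{escape(anchor)}">{escape(title)}</a></li>'
                )
            parts.append("</ul></details></li>")
        parts.append("</ul></details>")
    return "\n".join(parts)
```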

The HTML now follows the hierarchical structure described in README.md with proper visual organization and navigation.
---
Tue Nov 25 02:12:15 PM PST 2025

Updated README.md to document model-scatterplot label column functionality.

Added comprehensive documentation including:
- Command format with all new arguments (--label-cols, --label-names, --label-colors)
- Basic example without labels
- Advanced example with custom labels showing prioritization categories
- Key features explanation including:
  - Model selection options
  - Interactive vs static plots
  - Auto-discovery path generation
  - Custom labeling system with column format specification
  - Marker shape assignments
- Output file descriptions

Clarified that all points with data are shown by default; they are color-coded by label category (gray for "Other" if no label matches).
---
Mon Dec 2 01:37:00 AM PST 2025

Completed implementation of 4 major enhancements to varbook HTML generation:

## Enhancement 1: Subdataset Support
- Created varbook/utils/subdataset.py module with SubdatasetSpec class
- Implemented parsing for "DATASET:SUBDATASET" syntax (e.g., "KUN_FB:cluster_3")
- Added support for multiple filter types:
  * Cluster-based: "cluster_3" filters on kmeans_cluster column
  * Boolean columns: "is_peak", "is_promoter"
  * Categorical: "genomic_region:promoter"
  * Numeric comparisons: "logfc>0.5", "aaq<0.1"
- Integrated subdataset filtering into all plotting functions:
  * plot variant model-scatterplot
  * plot variant profiles
  * plot models heatmap
- Updated HTML generation to support hierarchical output paths with subdatasets
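
A rough sketch of how the "DATASET:SUBDATASET" syntax and the four filter types could be parsed. `ParsedSpec` and `parse_spec` are hypothetical names for illustration; the real SubdatasetSpec class in varbook/utils/subdataset.py may be organized quite differently. Rows are assumed to be dicts with already-typed values.

```python
import operator
import re
from dataclasses import dataclass
from typing import Callable, Optional

# Matches numeric comparisons such as "logfc>0.5" or "aaq<0.1".
_NUMERIC = re.compile(r"^(\w+)([<>])(-?\d+(?:\.\d+)?)$")
_OPS = {">": operator.gt, "<": operator.lt}


@dataclass
class ParsedSpec:
    """Hypothetical stand-in for SubdatasetSpec; the real class may differ."""
    dataset: str
    subdataset: Optional[str]
    predicate: Callable[[dict], bool]


def parse_spec(text: str) -> ParsedSpec:
    dataset, _, sub = text.partition(":")  # split on the first ':' only
    if not sub:
        return ParsedSpec(dataset, None, lambda row: True)

    cluster = re.fullmatch(r"cluster_(\d+)", sub)
    numeric = _NUMERIC.match(sub)
    if cluster:
        n = int(cluster.group(1))
        pred = lambda row: int(row["kmeans_cluster"]) == n
    elif numeric:
        col, op, value = numeric.group(1), _OPS[numeric.group(2)], float(numeric.group(3))
        pred = lambda row: op(float(row[col]), value)
    elif ":" in sub:
        col, _, wanted = sub.partition(":")  # categorical, e.g. "genomic_region:promoter"
        pred = lambda row: str(row[col]) == wanted
    else:
        pred = lambda row: bool(row[sub])  # boolean column such as "is_peak"
    return ParsedSpec(dataset, sub, pred)
```

Splitting on the first ':' only keeps categorical filters such as "KUN_FB:genomic_region:promoter" intact.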

## Enhancement 2: Model Dataset Named Groups
- Added --model-dataset argument to plot commands
- Creates named hierarchy level in output paths (e.g., "KUN_FB Models")
- Allows organizing models into meaningful groups for reports
- Integrated into variant_dataset path structure
- Updated ToC generation to show model dataset groups

## Enhancement 3: Variants Table with DataTables
- Implemented interactive variants table using DataTables jQuery plugin
- Features:
  * Sorting/filtering on all columns
  * Pagination with customizable page sizes
  * User-editable dropdowns for "Relevance to HPOs" and "Confidence"
  * Expandable rows showing full variant analyses
  * State persistence (sorting/filtering remembered across page loads)
  * localStorage persistence for dropdown values
- Integrated with variants_tsv data for metadata columns
- Added proper CSS styling matching report theme
- Created metadata_config module for column display configuration

## Enhancement 4: Hierarchy Restructure
- Restructured path hierarchy to match 3-level system:
  * variant_dataset (top) > variant_subdataset (optional) > model_dataset (named group)
- Updated all output path generation functions
- Modified ToC structure to reflect hierarchy
- Added proper section nesting in HTML generation
- Created hierarchy visualizations in variants table expandable rows

All changes tested and working correctly with Snakemake workflow.
---
Wed Dec 3 01:36:50 PM PST 2025

Successfully added user-editable Summary column to variants table in varbook write html/html-live output.

## Implementation Details

### HTML Structure Changes (in _generate_variants_table_html):
1. Added "Summary" column header after "Confidence" column in table header
2. Added contenteditable div for each variant row:
   - Class: summary-input
   - Attributes: contenteditable="true", data-variant-id, placeholder="Add summary..."
   - Positioned as 4th column (after expand control, relevance, confidence)

### CSS Styling:
Added comprehensive styling for .summary-input:
- Base styling: padding, border, border-radius, min-width (200px), min/max height
- Placeholder styling: :empty:before pseudo-element with italic gray text
- Interactive states: hover (blue border), focus (blue shadow + light background)
- Text handling: pre-wrap, word-wrap, overflow-y auto for scrolling
- Font inheritance from table for consistency

### JavaScript Functionality:
1. **loadSummaryValues()**: Loads saved summaries from the localStorage key 'variant_summary' on page load
2. **saveSummaryValue(variantId, value)**: Saves/deletes summary text to localStorage
3. **Event Listeners**:
   - blur event: Saves summary when user clicks away from field
   - input event: Auto-saves with 1-second debounce timer during typing
4. **DataTables Integration**:
   - Updated columnDefs to mark Summary column (target: 3) as non-sortable
   - Updated default sort to column 4 (first metadata column after new Summary)

### User Experience Features:
- Contenteditable div allows rich text editing in-place
- Auto-save prevents data loss (saves after 1 second of inactivity)
- Placeholder text provides clear user guidance
- localStorage persistence survives page refreshes
- Visual feedback on hover/focus states
- Scrollable content for long summaries (max 150px height)

### Storage Structure:
localStorage key: 'variant_summary'
Format: JSON object mapping variant_id -> summary_text
Example: {"chr10:52171043:A:G": "Interesting variant in promoter region", "chr10:3277486:T:C": "High confidence pathogenic"}

### Future Enhancement:
The user asked for an option to export summaries as markdown files in the variant folders. This could be implemented as an export feature that:
1. Reads variant_summary from localStorage
2. Creates/updates summary.md files in varbook_gen/{variant_dataset}/{variant_subdataset}/{model_dataset}/{variant_id}/
3. Is triggered via an export button or automatic sync

This enhancement is not yet implemented but the data structure supports it.
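
For illustration only, the server-side half of such an export might look like the sketch below. `export_summaries` is a hypothetical function, and it assumes the browser has already sent the JSON stored under 'variant_summary' and that `base_dir` points at varbook_gen/{variant_dataset}/{variant_subdataset}/{model_dataset}.

```python
import json
from pathlib import Path


def export_summaries(summaries_json: str, base_dir: Path) -> list[Path]:
    """Write one summary.md per variant under base_dir/{variant_id}/.

    Hypothetical helper: `summaries_json` is the JSON object mapping
    variant_id -> summary_text, as described under "Storage Structure".
    """
    written = []
    for variant_id, text in json.loads(summaries_json).items():
        if not text.strip():
            continue  # skip empty summaries rather than writing blank files
        target = base_dir / variant_id / "summary.md"
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(text.rstrip() + "\n", encoding="utf-8")
        written.append(target)
    return written
```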
---
Thu Dec 4 08:26:22 PM PST 2025

Modified Snakefile to use live HTML server instead of static HTML generation.

## Changes to snakemake/Snakefile

### Updated rule generate_html:
Changed from `varbook write html` to `varbook write html-live` command.

Key differences:
1. Command changed from `{VARBOOK_CMD} write html` to `{VARBOOK_CMD} write html-live`
2. Removed `--editable` flag (always enabled in html-live mode)
3. Added `--port 8765` flag to specify server port
4. Server will automatically start after HTML generation completes

### Expected Behavior:
When the Snakemake workflow completes and generates variant_report.html:
1. HTML file is generated with all plots and variant data
2. Flask server starts on http://localhost:8765
3. Browser automatically opens to display the HTML report (unless --no-auto-open is specified)
4. Users can edit content directly in browser
5. Edits auto-save to both HTML file and source markdown files
6. Server runs until manually stopped with Ctrl+C

### Implementation Details:
The html-live command (from varbook/__main__.py:851-878):
- Calls html.generate_html() with live_mode=True, editable=True
- Starts html_server.start_server() with specified port
- Server provides API endpoints for saving edited sections
- Updates both HTML report and source .md files on edit

This change enables live editing of the variant report with auto-save functionality,
making it easier to annotate variants and update descriptions without manually
editing markdown files.

---
Thu Dec 4 08:35:15 PM PST 2025

Added comprehensive variants TSV generation for HTML live-server with metadata merging.

## Changes to snakemake/Snakefile

### Added rule merge_comprehensive_variants (line 715):
New rule to create a comprehensive TSV file for HTML report generation by merging:
- {variant_dataset}.general.tsv - General variant information (coordinates, alleles, etc.)
- {variant_dataset}.patient_hpo_expanded.tsv - Patient HPO annotations
- {variant_dataset}.closest_elements.tsv - Closest genomic elements

Uses merge-columns with:
- --merge-column variant_id (merge key)
- --join-type outer (preserves all variants from all files)
- Output: data/{variant_dataset}.comprehensive.tsv
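
To make the effect of the outer join concrete, here is a standard-library sketch of the same idea. It is not the merge-columns implementation; `read_tsv` and `outer_merge` are illustrative names, and the choice that later tables win on overlapping non-key columns is an assumption.

```python
import csv
import io


def read_tsv(text):
    """Parse TSV text into a list of row dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def outer_merge(tables, key="variant_id"):
    """Outer-join row-dict tables on `key`, keeping every key value seen."""
    merged = {}      # key value -> combined row (dicts keep first-seen order)
    columns = [key]  # output column order: key first, then columns as encountered
    for rows in tables:
        for row in rows:
            merged.setdefault(row[key], {key: row[key]}).update(row)
            for col in row:
                if col not in columns:
                    columns.append(col)
    # Columns a variant never received are filled with an empty string.
    return [{col: row.get(col, "") for col in columns} for row in merged.values()]
```

A variant that appears in only one input file still gets a row, with the columns from the other files left empty.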

### Added helper function get_comprehensive_variants_tsvs() (line 538):
Generates list of comprehensive TSV file paths for all variant datasets in VARIANT_DATASET_CONFIGS.
Returns paths like: ["data/Broad neurological disorders.comprehensive.tsv"]

### Updated rule generate_html (line 1556):
Modified inputs and parameters:
1. Removed: variants_tsv = VARIANTS_TSV (static cluster_3 file)
2. Added: comprehensive_tsvs = get_comprehensive_variants_tsvs() (dynamic dependency)
3. Updated params.variant_datasets: Added quotes around dataset names for proper shell handling
4. Updated params.variants_tsv: Uses lambda to extract first comprehensive TSV from inputs

The comprehensive TSV includes all metadata needed for the interactive variants table in HTML:
- Variant coordinates and alleles
- Patient HPO terms and meanings
- Genomic context (closest elements)
- All other metadata columns for display and filtering

This ensures the HTML live-server has access to complete variant information for the
interactive DataTables display and user annotations.

---
Thu Dec 4 10:21:17 PM PST 2025

Fixed file discovery system to properly handle variant IDs and centralized profiles.

## Issues Found:
1. In html.py (line 2344): variant_ids_list was populated by iterating directly over the variant_dataset directory, so model_dataset directories ("Fetal Brain", "profiles") were treated as variant IDs instead of the actual variant IDs (chr:pos:ref:alt).

2. In pdf.py discover_plot_files() and get_before_after_files(): Neither function handled the centralized profiles directory, so profile plots were not discovered.

## Changes Made:

### varbook/write/html.py (lines 2336-2342):
Changed from manual directory iteration to using discover_variant_ids():
```python
# Before: Incorrectly iterated dataset_path.iterdir()
# After: Use discover_variant_ids to properly traverse hierarchy
for variant_dataset in variant_datasets:
    variant_ids = discover_variant_ids(output_dir, variant_dataset)
    variant_ids_list.extend(variant_ids)
```

### varbook/write/pdf.py discover_plot_files() (lines 135-143):
Added special handling for centralized profiles directory:
```python
# Special handling for centralized profiles directory
if plot_type == 'profiles' and variant_id:
    profiles_dir = dataset_dir / 'profiles' / variant_id
    if profiles_dir.exists():
        # Collect profile files directly from centralized location
        return sorted(files)
```

### varbook/write/pdf.py get_before_after_files() (lines 222-232):
Added similar special handling for profiles .before.md/.after.md files:
```python
# Special handling for centralized profiles directory
if plot_type == 'profiles' and variant_id:
    profiles_dir = dataset_dir / 'profiles' / variant_id
    # Look for before/after files in centralized location
```

## Structure Clarification:
Actual file structure shows:
- Variant directories under model_dataset are EMPTY placeholders
- Actual profile files are in centralized location: {variant_dataset}/profiles/{variant_id}/
- Model-dataset hierarchy: {variant_dataset}/{model_dataset}/[{cluster_name}/]{variant_id}/

The discovery functions now correctly:
- Find variant IDs from placeholder directories in model_dataset hierarchy
- Find profile plot files from centralized profiles directory
- Skip "profiles" and "heatmap" directories when traversing model_dataset level
- Handle the optional cluster level by checking for ':' in directory names (variant IDs have the form chr:pos:ref:alt)
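
The traversal described above might look roughly like this. It is a sketch under assumptions, not the actual discover_variant_ids(): the name `find_variant_ids` and the exact skip set are illustrative, and any directory without ':' below a model dataset is assumed to be a cluster.

```python
from pathlib import Path

# Directories at the model_dataset level that never contain variant placeholders.
SKIP_DIRS = {"profiles", "heatmap"}


def find_variant_ids(output_dir: Path, variant_dataset: str) -> list[str]:
    """Collect variant IDs below {variant_dataset}/{model_dataset}/[{cluster}/]."""
    dataset_dir = output_dir / variant_dataset
    if not dataset_dir.is_dir():
        return []
    found = set()
    for model_dir in dataset_dir.iterdir():
        if not model_dir.is_dir() or model_dir.name in SKIP_DIRS:
            continue
        for child in model_dir.iterdir():
            if not child.is_dir():
                continue
            if ":" in child.name:
                found.add(child.name)  # variant directory directly under the model dataset
            else:
                # No ':' in the name: treat it as a cluster and look one level deeper.
                found.update(p.name for p in child.iterdir() if p.is_dir() and ":" in p.name)
    return sorted(found)
```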

---
Thu Dec  4 10:46:08 PM PST 2025

Fixed HTML hierarchy to match actual directory structure with clusters.

## Root Cause:
The Snakefile was passing only base variant dataset names (e.g., "Broad neurological disorders") to html-live, but the actual files are organized in a 3-level hierarchy:
`{variant_dataset}/{model_dataset}/{cluster}/{variant_id}/`

The HTML generator couldn't find files because it was looking in the wrong location.

## Actual Directory Structure:
- Centralized profiles: `varbook_gen/profiles/{variant_id}/{model}.png`
- Symlinked profiles: `varbook_gen/Broad neurological disorders/Fetal Brain/microglia-specific cluster (#3)/{variant_id}/03-profile-{model}.png` -> `../../../profiles/{variant_id}/{model}.png`
- Other plots: `varbook_gen/Broad neurological disorders/Fetal Brain/microglia-specific cluster (#3)/{variant_id}/01-model-specificity-barplot.png`

## Solution:
Created `get_variant_dataset_paths()` function in Snakefile to build full hierarchy paths for HTML generation.

### snakemake/Snakefile (lines 1558-1587):
```python
def get_variant_dataset_paths():
    """Build list of variant_dataset paths with full hierarchy."""
    paths = []
    for variant_dataset, model_dataset_configs in VARIANT_DATASET_CONFIGS.items():
        for model_dataset_config in model_dataset_configs:
            model_dataset_name = model_dataset_config['name']
            clusters = model_dataset_config.get('clusters', [])

            if clusters:
                for cluster in clusters:
                    cluster_name = cluster.get('name', cluster.get('id'))
                    # Build full path: variant_dataset/model_dataset/cluster
                    path = f"{variant_dataset}/{model_dataset_name}/{cluster_name}"
                    paths.append(path)
            else:
                path = f"{variant_dataset}/{model_dataset_name}"
                paths.append(path)
    return paths
```

### snakemake/Snakefile (line 1599):
```python
params:
    variant_datasets = " ".join(get_variant_dataset_paths()),
```
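
Because these paths contain spaces and parentheses, a plain space-join only survives the shell if quoting is applied somewhere along the way. One possible way to do that on the Python side is sketched below; `quote_paths` is a hypothetical helper and is not part of the Snakefile as described here.

```python
import shlex


def quote_paths(paths):
    """Join paths into a single shell-safe argument string."""
    return " ".join(shlex.quote(p) for p in paths)
```

Round-tripping the result through `shlex.split` returns the original list, which is a quick way to check that each path arrives as a single argument.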

## Results:
- The HTML generator now receives: `"Broad neurological disorders/Fetal Brain/microglia-specific cluster (#3)"`
- File discovery will find files in the correct clustered directory
- ToC hierarchy will properly show: Broad neurological disorders > Fetal Brain > microglia-specific cluster (#3) > Variants Table

---
Fri Dec  5 01:28:00 AM PST 2025

CHANGE: Filter out .before.md and .after.md files from HTML generation
=======================================================================

## Modification
Added filter to build_hierarchical_file_list() in varbook/write/pdf.py (lines 460-465)
to exclude .before.md and .after.md files from the final file list.

## Implementation
```python
# Filter out .before.md and .after.md files
file_list = [
    (file, label, source)
    for file, label, source in file_list
    if not file.name.endswith(('.before.md', '.after.md'))
]
```

## Rationale
After moving headers from .after.md to main .md files, the .before.md and .after.md 
files now only contain a single space character. Processing them as separate entries:
- Creates unnecessary empty sections in HTML
- Adds processing overhead
- Can cause layout issues

Since all meaningful content (headers and images) is now in the main .md files,
we can safely skip the .before.md and .after.md files entirely.

## Impact
- HTML will only process main .md files (e.g., 01-heatmap.md, 01-model-specificity-barplot.md)
- Headers from main .md files will be displayed correctly
- No extra whitespace or empty sections
- Cleaner HTML output

## Files Modified
- varbook/write/pdf.py: Added filter at lines 460-465

---
$(date)

Fixed markdown section saving after hierarchy restructuring.

## Root Cause:
After restructuring datasets to fit the 3-level hierarchy (variant_dataset/model_dataset/cluster), section IDs changed because they were generated using the full hierarchical path instead of just the base dataset name.

Before: section_id = "Broad_neurological_disorders_chr1_123_A_G_01-model-specificity-barplot"
After: section_id = "Broad_neurological_disorders_Fetal_Brain_microglia-specific_cluster___3__chr1_123_A_G_01-model-specificity-barplot"

This caused a mismatch between section IDs in the HTML and what the browser was sending to save, so markdown edits couldn't be saved.

## Solution:
Modified html.py line 2656 to use `dataset_name` (extracted from SubdatasetSpec) instead of the full `variant_dataset` path when generating section IDs.

### varbook/varbook/write/html.py:2656
```python
# Before:
section_id = f"{variant_dataset}_{current_variant}_{file_path.stem}".replace(':', '_').replace('/', '_').replace(' ', '_')

# After:
section_id = f"{dataset_name}_{current_variant}_{file_path.stem}".replace(':', '_').replace('/', '_').replace(' ', '_')
```

The `dataset_name` variable is already extracted at line 2470 using `spec.dataset`, which gives us just the base dataset name (e.g., "Broad neurological disorders") without the model_dataset and cluster path components.

This ensures section IDs remain stable regardless of the hierarchical structure used to organize files.
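
A small standalone sketch of why the base dataset name keeps IDs stable. `make_section_id` mirrors the replace chain shown above; `base_dataset` is a hypothetical stand-in for `spec.dataset` that assumes the base name is the first path component.

```python
def base_dataset(variant_dataset: str) -> str:
    """Hypothetical stand-in for spec.dataset: keep only the first path component."""
    return variant_dataset.split("/", 1)[0]


def make_section_id(dataset_name: str, variant_id: str, file_stem: str) -> str:
    """Build a section ID using the same character replacements as above."""
    raw = f"{dataset_name}_{variant_id}_{file_stem}"
    return raw.replace(":", "_").replace("/", "_").replace(" ", "_")


full_path = "Broad neurological disorders/Fetal Brain/microglia-specific cluster (#3)"
section_id = make_section_id(
    base_dataset(full_path), "chr1:123:A:G", "01-model-specificity-barplot"
)
```

With the base name, the ID is the same whether the caller passes the full hierarchical path or only the dataset name.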