Data Dictionary Update
The data dictionary at docs/data-dictionary.md must be kept in sync with the codebase.
This skill is invoked automatically by Claude whenever relevant files are modified.
Trigger Files
Invoke this skill whenever any of these files are modified:
- R/data_loading.R: loading functions, column names, return values
- R/schemas.R: Arrow schemas, factor levels, validation rules
- R/cache.R: caching behavior changes
- workflow/scripts/generate_variant_parquet.py: variant table output columns
- workflow/scripts/count_variants_by_genomic_context.py: variant count output columns
- workflow/scripts/compute_bed_metrics.py: metrics output columns
- Any workflow/rules/*.smk file that changes output file formats
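
As a rough illustration, the trigger list above can be expressed as a path filter. This is a minimal sketch, not part of the skill itself: the function name triggers_update is hypothetical, and the patterns are copied from the list. Note that a match on a .smk file is only a first filter, since a rule file counts as a trigger only if it changes output file formats.

```python
from fnmatch import fnmatchcase

# Patterns copied from the trigger list; paths are relative to the repo root.
TRIGGER_PATTERNS = [
    "R/data_loading.R",
    "R/schemas.R",
    "R/cache.R",
    "workflow/scripts/generate_variant_parquet.py",
    "workflow/scripts/count_variants_by_genomic_context.py",
    "workflow/scripts/compute_bed_metrics.py",
    "workflow/rules/*.smk",  # still needs a manual check for output format changes
]


def triggers_update(modified_paths):
    """Return the modified paths that match any trigger pattern (hypothetical helper)."""
    return [
        path
        for path in modified_paths
        if any(fnmatchcase(path, pattern) for pattern in TRIGGER_PATTERNS)
    ]
```

Because `*` in fnmatch-style patterns also matches `/`, this sketch would accept .smk files in subdirectories of workflow/rules/ as well.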
What to Check
- New columns: Added to a loading function or pipeline script → document in relevant section
- Removed columns: Dropped from a function → remove from data dictionary
- Renamed columns: Renamed anywhere → update the data dictionary and verify that no other code still uses the old name
- Factor levels changed: Added/removed values in schemas.R → update documented valid values
- New cached datasets: New load_* function added → add full dataset documentation section
- New pipeline outputs: New rule output format → add to pipeline outputs section
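
The column checks above amount to comparing two sets of names. The sketch below assumes the documented and actual column names have already been extracted by some other means; diff_columns and the example column names are hypothetical, and nothing here parses docs/data-dictionary.md or the R and Python sources.

```python
def diff_columns(documented, actual):
    """Compare documented column names with those the code produces.

    Hypothetical helper: both arguments are plain iterables of column names.
    """
    documented, actual = set(documented), set(actual)
    return {
        # Produced by the code but missing from the dictionary: likely new columns.
        "undocumented": sorted(actual - documented),
        # In the dictionary but no longer produced: likely removed columns.
        "stale": sorted(documented - actual),
    }
```

A rename cannot be distinguished from a removal plus an addition this way; it shows up as one stale name and one undocumented name, so the modified code still has to be read to tell the two cases apart.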
Process
- Read docs/data-dictionary.md to understand current state
- Read the modified file(s) to understand what changed
- Identify which sections of the data dictionary are affected
- Propose specific updates — show exact text changes
- Apply updates after user confirmation (or directly if changes are unambiguous)
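
One way to show exact text changes when proposing updates is a unified diff. The sketch below uses Python's standard difflib module; the function name propose_changes is hypothetical, and the a/ and b/ path prefixes are only a display convention borrowed from git.

```python
import difflib


def propose_changes(current_text, updated_text, path="docs/data-dictionary.md"):
    """Return a unified diff between the current and proposed dictionary text."""
    diff = difflib.unified_diff(
        current_text.splitlines(keepends=True),
        updated_text.splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
    )
    # Empty string when the two texts are identical.
    return "".join(diff)
```

An empty result means there is nothing to propose, which is a convenient signal that the dictionary is already in sync for the sections that were checked.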