curate-h5ad — an AI agent skill for Claude Code (clevercanary/hca-validation-tools)

v1.0.0

About this skill

Best suited for: AI agents that need to curate the h5ad file at the absolute path given by $ARGUMENTS. Summary: Curate H5AD File — curate the h5ad file at absolute path: $ARGUMENTS. Pass an absolute path to the .h5ad file. This AI agent skill supports Claude Code, Cursor, and Windsurf workflows.

Features

Curates the h5ad file at the absolute path $ARGUMENTS against the target schemas. Every fix is grounded in validator and evaluator output: the skill runs both before proposing anything, then works through the staged steps described below.

clevercanary

Updated: 4/21/2026

Killer-Skills Review

Decision support comes first. Repository text comes second.

Reference-Only Page Review Score: 8/11

This page remains useful for operators, but Killer-Skills treats it as reference material instead of a primary organic landing page.

  • Original recommendation layer
  • Concrete use-case guidance
  • Explicit limitations and cautions
Review Score: 8/11
Quality Score: 45
Canonical Locale: en
Detected Body Locale: en

Why use this skill

Recommended because curate-h5ad helps agents curate the h5ad file at the absolute path given by $ARGUMENTS. Pass an absolute path to the .h5ad file.

Possible use cases for curate-h5ad

  • Use case: curating an h5ad file at an absolute path ($ARGUMENTS)
  • Use case: aligning the file with the target schemas
  • Use case: applying the skill's curation rules (do not break)

Security and limitations

  • Limitation: the skill's rules must not be broken.
  • Limitation: ground every fix in validator + evaluator output. Run both before proposing anything — don't assume what's wrong.
  • Limitation: do NOT guess a default for missing metadata values.

Why this page is reference-only

  • The current locale does not satisfy the locale-governance contract.
  • The underlying skill quality score is below the review floor.

Source Boundary

The section below is imported from the upstream repository and should be treated as secondary evidence. Use the Killer-Skills review above as the primary layer for fit, risk, and installation decisions.

After the review

Decide the next action before you keep reading repository material.

Killer-Skills should not stop at opening the repository instructions. It should help you decide whether to install this skill, when to cross-check it against trusted collections, and when to move into workflow rollout.

FAQ & Installation Steps

Frequently Asked Questions

What is curate-h5ad?

curate-h5ad is an AI agent skill that curates the h5ad file at the absolute path given by $ARGUMENTS. Pass an absolute path to the .h5ad file. It supports Claude Code, Cursor, and Windsurf workflows.

How do I install curate-h5ad?

Run the command: npx killer-skills add clevercanary/hca-validation-tools. It works with Cursor, Windsurf, VS Code, Claude Code, and 19+ other IDEs.

What are the use cases for curate-h5ad?

Key use cases include curating an h5ad file at an absolute path ($ARGUMENTS), aligning it with the target schemas, and applying the skill's curation rules.

Which IDEs are compatible with curate-h5ad?

This skill is compatible with Cursor, Windsurf, VS Code, Trae, Claude Code, OpenClaw, Aider, Codex, OpenCode, Goose, Cline, Roo Code, Kiro, Augment Code, Continue, GitHub Copilot, Sourcegraph Cody, and Amazon Q Developer. Use the Killer-Skills CLI for universal one-command installation.

Are there any limitations for curate-h5ad?

Limitations: the skill's rules must not be broken; every fix must be grounded in validator + evaluator output (run both before proposing anything — don't assume what's wrong); and the skill never guesses a default for missing metadata values.

How To Install

  1. Open your terminal

    Open the terminal or command line in your project directory.

  2. Run the install command

    Run: npx killer-skills add clevercanary/hca-validation-tools. The CLI will automatically detect your IDE or AI agent and configure the skill.

  3. Start using the skill

    The skill is now active. Your AI agent can use curate-h5ad immediately in the current project.

Reference-Only Mode

This page remains useful for installation and reference, but Killer-Skills no longer treats it as a primary indexable landing page. Read the review above before relying on the upstream repository instructions.

Upstream Repository Material

Upstream source: curate-h5ad, from SKILL.md (read-only).

Curate H5AD File

Curate the h5ad file at absolute path: $ARGUMENTS

Pass an absolute path to the .h5ad file. Relative paths are resolved against the MCP server's working directory, which may not match the user's.
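Since relative paths resolve against the MCP server's working directory, a caller-side guard can reject them up front. A minimal sketch (not part of the skill's tooling):

```python
import os

def require_absolute(path: str) -> str:
    # A relative path would resolve against the MCP server's working
    # directory, which may not match the user's, so reject it up front.
    if not os.path.isabs(path):
        raise ValueError(f"pass an absolute path to the .h5ad file, got {path!r}")
    return path
```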

/evaluate-h5ad identifies problems. /curate-h5ad applies the safe, mechanical fixes and hands back a punch list of everything still needing a curator's decision or upstream data.

The target schemas are:

Rules (do not break)

  1. Never invent metadata values. If a required value isn't already in the file or derivable from it, emit a todo asking the wrangler for it. Do NOT guess a default. Examples of fields that always need wrangler input: description, ambient_count_correction, doublet_detection, default_embedding.
  2. Ground every fix in validator + evaluator output. Run both before proposing anything — don't assume what's wrong.
  3. replace_placeholder_values is restricted to library_preparation_batch and library_sequencing_run. Never run it on other columns. Other placeholder-looking values (e.g. "unknown" in author_cell_type) need curator-reviewed mappings, not a blanket NaN conversion.
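Rule 3's column restriction can be pictured with a small pandas sketch. The placeholder set here is assumed for illustration; the real tool defines its own:

```python
import numpy as np
import pandas as pd

ALLOWED = {"library_preparation_batch", "library_sequencing_run"}
PLACEHOLDERS = {"unknown", "na", "none"}  # assumed placeholder set, for illustration

def replace_placeholder_values(obs: pd.DataFrame, column: str) -> int:
    # Refuse any column outside the two whitelisted ones (rule 3).
    if column not in ALLOWED:
        raise ValueError(f"replace_placeholder_values is restricted to {sorted(ALLOWED)}")
    mask = obs[column].astype(str).str.lower().isin(PLACEHOLDERS)
    obs.loc[mask, column] = np.nan
    return int(mask.sum())  # number of cells converted to NaN
```

Note that a placeholder-looking "unknown" in author_cell_type never reaches the NaN conversion: the whitelist check raises before any values are touched.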

Step 1 — Gather findings

Start with the evaluator, then gate the HCA validator on the schema it reports:

  • Run /evaluate-h5ad $ARGUMENTS — produces the structured overview report (schema type, X verdict, metadata, storage, embeddings, CAP, edit history, summary). This already calls check_schema_type and check_x_normalization, so their verdicts are available for Step 2 gating without a separate tool call.
  • If the evaluator reports schema: "hca", run validate_schema $ARGUMENTS — the HCA schema validator (is_valid, full errors and warnings lists). These are the authoritative blocking/advisory signals for Bucket A decisions. Feature-ID warnings are ordered last; summarize repeated shapes in the punch list rather than pasting thousands of lines verbatim.
  • If the evaluator reports schema: "cellxgene", do not run validate_schema yet — the HCA validator would report a large, mostly irrelevant error list. convert_cellxgene_to_hca moves into Bucket A; after it runs, re-enter Step 1 on the converted file to get the accurate HCA findings.
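The gating above reduces to a small decision function. An illustrative sketch; the real flow is driven by the evaluator's structured report:

```python
def next_validation_step(schema_type: str) -> str:
    # Gate the HCA validator on the schema the evaluator reports.
    if schema_type == "hca":
        return "validate_schema"            # authoritative blocking/advisory signals
    if schema_type == "cellxgene":
        return "convert_cellxgene_to_hca"   # convert first, then re-enter Step 1
    raise ValueError(f"unexpected schema type: {schema_type!r}")
```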

Step 2 — Classify every finding into one bucket

Bucket A — Mechanical (safe to run after approval)

Only these are in Bucket A. Nothing else. A row belongs in A only when its preconditions are already satisfied at punch-list time — don't pre-list rows whose inputs depend on an unanswered B question (e.g. set_uns('default_embedding', …) belongs in B2 until the wrangler picks a value, then gets promoted to A per Step 3).

  • convert_cellxgene_to_hca — when check_schema_type reports schema: "cellxgene". Must run first: it reshapes the file into HCA layout before any other fix makes sense, and the other tools (including validate_schema) assume HCA layout. After conversion, re-enter Step 1 on the converted file to get an accurate Bucket A/B/C list.
  • normalize_raw — when check_x_normalization reports verdict: "raw_counts" and has_raw_x: false. Deterministic: moves X→raw.X, normalizes X with normalize_total(target_sum=10000) + log1p.
  • replace_placeholder_values on library_preparation_batch — only if the column actually contains placeholder values flagged by the validator.
  • replace_placeholder_values on library_sequencing_run — same condition.
  • label_h5ad — always eligible once the file is in HCA layout (and any prior Bucket A items have run). Populates var['feature_name'] + feature_reference / feature_biotype / feature_length / feature_type from Ensembl IDs via vendored GENCODE (mirrored to raw.var when present), writes the eight obs ontology labels (tissue, cell_type, assay, disease, sex, organism, development_stage, self_reported_ethnicity) from their _ontology_term_id counterparts, and writes obs['observation_joinid']. Unconditionally overwrites any producer-provided values in those controlled columns — call it out in the report so the wrangler sees what changed. Unknown Ensembl IDs yield NaN across the five feature_* columns for that row (not an error). Preflight refuses to run when the file carries uns['schema_version'] / uns['schema_reference'] or any non-human obs['organism_ontology_term_id'] — those go to Bucket C below.
  • copy_cap_annotations — only if the wrangler provided a CAP source file in Step 3. Copies annotation sets + cellannotation_schema_version + cellannotation_metadata from the source into the target. Partial overlap is allowed: the source and target obs indexes only need to match at ≥95% in both directions (target-covered and source-covered); target rows absent from source get NaN in the new CAP columns. If the overlap is below 95% the tool aborts — treat that as a Bucket B item and bring it back to the wrangler (usually the CAP source is stale or wrong).
  • compress_h5ad — when get_storage_info shows no HDF5 filter on X's underlying dataset (X.data.compression for sparse X, X.compression for dense X). If the file is already compressed, the tool safely returns {skipped: true, reason: ...} rather than rewriting. Pure compression, no data change.
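The normalize_raw arithmetic, sketched in plain NumPy for a dense matrix (the real tool also moves X into raw.X and handles sparse storage):

```python
import numpy as np

def normalize_counts(X: np.ndarray, target_sum: float = 10000.0) -> np.ndarray:
    # Scale each cell (row) to target_sum total counts, then log1p:
    # the normalize_total(target_sum=10000) + log1p recipe named above.
    totals = X.sum(axis=1, keepdims=True)
    return np.log1p(X / totals * target_sum)
```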

Bucket B — Needs wrangler input (todo — stop and ask)

Split these into two classes so the wrangler sees which items actually block validation vs. which are recommended-but-optional. The primary blocking signal is validate_schema — any error it reports (on obs, var, or uns) blocks. Use list_uns_fields as a secondary signal for missing uns fields specifically: required: true fields that are unset are blocking; required: false fields that are unset are recommended at most.
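One way to picture the B1/B2 split for unset uns fields. The record shape here is assumed; list_uns_fields' actual layout may differ:

```python
def classify_unset_uns_field(field):
    # field: one assumed record from list_uns_fields,
    # e.g. {"name": "study_pi", "required": True, "is_set": False}.
    if field["is_set"]:
        return None                       # nothing to ask
    return "B1" if field["required"] else "B2 at most"
```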

For each item, write a concrete question. For B1 items, do not include a suggested answer — ask only for the missing required value. For B2 items, if there's an obvious single valid option (e.g. only one 2D embedding exists), you may phrase it as a confirmation question ("X_umap — confirm?") rather than silently deciding.

B1 — Blocking (validator errors or unset required: true fields)

  • Missing required uns fields (e.g. study_pi) — ask for the value(s).
  • No CAP annotation set present — the file must ship with at least one CAP annotation set (see the HCA Cell Annotation schema). Ask the wrangler to provide a local path to a CAP-exported version of this file (same cells, with CAP annotation sets populated) — copy_cap_annotations reads the source via AnnData/h5py so a URL must be downloaded locally first. If supplied, copy_cap_annotations becomes a mechanical fix for Step 4.
  • Any other uns field the validator flags as missing.

B2 — Recommended (optional fields the wrangler may want to set)

Only the fields explicitly named below belong in B2. Do not scan list_uns_fields for other unset optional fields and invent questions about them — a field being optional-and-unset is not itself a reason to ask. The skill's scope is the explicit tool list (convert_cellxgene_to_hca, normalize_raw, replace_placeholder_values, label_h5ad, copy_cap_annotations, set_uns on the named fields here, compress_h5ad); everything else is the wrangler's call, unprompted.

  • default_embedding — list the obsm keys and ask which one. Optional per schema, but a file shipped without it will display in CELLxGENE Explorer with no default scatter. Must name a 2D embedding to actually plot; 30D latents (e.g. X_scVI) are valid per schema but won't display. If only one 2D embedding exists, surface that — the wrangler will almost certainly pick it.

If the wrangler answers a B2 item during the session, that answer becomes a set_uns mechanical fix (promoted to Bucket A) for Step 4.

Bucket C — Upstream / curator judgment (out of scope for this skill)

Report these but don't attempt to fix:

  • High NaN rates on non-allowed columns (e.g. library_id) — needs real values from source.
  • Sparse or missing ambient_count_correction / doublet_detection obs columns — per-cell values must come from the upstream source (each source dataset's processing record). Do not broadcast a single value. Report fill rate and move on.
  • Delimited-list values in single-identifier columns (e.g. library_preparation_batch containing "lib1; lib2; lib3") — needs per-cell resolution, not placeholder replacement.
  • Gene IDs missing from the current GENCODE — needs annotation-version decision.
  • Inconsistent author_cell_type variants — needs a curator mapping.
  • (CAP annotations are handled in Bucket B above — the wrangler provides a CAP source file and copy_cap_annotations runs mechanically.)
  • Cells whose labels don't match the atlas focus (e.g. non-myeloid labels in a myeloid atlas) — needs a curator decision on keep/drop.
  • File carries uns['schema_version'] or uns['schema_reference'] — signals it has already been through cellxgene-schema add-labels. label_h5ad refuses to run; upstream needs to re-emit without those keys. Do not strip them here.
  • Any obs['organism_ontology_term_id'] value other than NCBITaxon:9606 — label_h5ad is human-only. Supporting another organism is a code change, not a curation fix.

Step 3 — Present the punch list

Show these sections: A (will run), B1 (blocking — needs your answer), B2 (recommended — optional), C (still to do, out of scope). Then stop and wait for explicit approval before running anything.

If the wrangler answers any Bucket B items (B1 or B2), promote those to Bucket A as the appropriate mechanical action: set_uns for answered uns values (e.g. default_embedding, study_pi), copy_cap_annotations when the answer is a CAP source file path.

Step 4 — Run the mechanical fixes

Order:

  1. convert_cellxgene_to_hca first if applicable — then stop, re-run Steps 1–3 on the converted file before continuing (conversion changes the layout enough that the prior punch list is stale).
  2. Content edits, in this order: normalize_raw, each replace_placeholder_values, label_h5ad, copy_cap_annotations (if a source was supplied), and any set_uns approved in Step 3. label_h5ad must run before copy_cap_annotations — copy_cap_annotations calls validate_marker_genes, which reads var['feature_name']; running the labeler first gives marker-gene validation canonical gene symbols to match against.
  3. compress_h5ad last.

Each tool writes a new timestamped file. For most subsequent calls, passing either the original path or the latest works — resolve_latest picks up the newest variant automatically. Two exceptions: convert_cellxgene_to_hca does not auto-resolve (call it with the exact path you want to convert), and copy_cap_annotations only auto-resolves its target_path (the source_path is used verbatim).
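The timestamped-variant resolution described above can be sketched as follows. The file-naming pattern is assumed; the real resolve_latest defines its own:

```python
import glob
import os

def resolve_latest(path: str) -> str:
    # Tools write new timestamped variants next to the original; pick the
    # newest by modification time, falling back to the given path.
    stem, ext = os.path.splitext(path)
    candidates = glob.glob(glob.escape(stem) + "*" + ext)
    return max(candidates, key=os.path.getmtime) if candidates else path
```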

Step 5 — Report

Re-run view_edit_log on the final file, then produce a structured report with these sections in order. Also re-run validate_schema — but only if check_schema_type reports hca on the final file. If the file is still CellxGENE (e.g. conversion wasn't approved), skip the validator rerun and note why under "Validator delta" instead of pasting a misleading error list. Use markdown tables; skip any section with no content.

For the Provenance line below, re-run get_summary on the final file to fetch its obs columns, then run get_descriptive_stats with columns set to the intersection of ["donor_id", "sample_id", "library_id", "dataset_id"] and the final file's obs column names (extract name from each {name, dtype} object in get_summary.obs_columns — it's a list of objects, not plain strings). The intersection avoids erroring on absent columns.
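The intersection step can be sketched as follows, assuming get_summary.obs_columns is a list of {name, dtype} dicts as described above:

```python
def provenance_columns(obs_columns):
    # obs_columns: list of {name, dtype} objects from get_summary;
    # extract each name rather than treating entries as plain strings.
    wanted = ["donor_id", "sample_id", "library_id", "dataset_id"]
    present = {col["name"] for col in obs_columns}
    return [name for name in wanted if name in present]
```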

One short paragraph or bullet block with: final file path, shape (n_obs × n_vars), title from uns, schema type (include version only when schema is CellxGENE — HCA is unversioned), X verdict + raw.X presence, compression status, obsm keys present. Add a Provenance line: N donors · M samples · K libraries · D source datasets — pulled from get_descriptive_stats.columns[<col>].unique for each column (the stats are nested under a columns dict keyed by column name). Omit any metric whose column is absent OR whose column is present but unpopulated (columns[<col>].unique == 0, equivalently columns[<col>].n_nan == n_rows) so an all-NaN column doesn't render as "0 libraries". dataset_id is not a schema field (optional integrator convention); absent is normal.
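The omission rule reduces to a small formatter; the stats shape here is assumed from the description above:

```python
def provenance_line(stats_columns):
    # stats_columns: the "columns" dict from get_descriptive_stats, keyed by
    # column name, each value carrying a "unique" count.
    nouns = [("donor_id", "donors"), ("sample_id", "samples"),
             ("library_id", "libraries"), ("dataset_id", "source datasets")]
    parts = []
    for col, noun in nouns:
        info = stats_columns.get(col)
        if info is None or info["unique"] == 0:
            continue  # absent or all-NaN column: omit, don't print "0 libraries"
        parts.append(f"{info['unique']} {noun}")
    return " · ".join(parts)
```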

Mechanical fixes applied

| # | Operation | Effect |
|---|-----------|--------|
| 1 | normalize_raw | e.g. "Moved raw counts → raw.X; normalized X with normalize_total(target_sum=10000) + log1p" |
| 2 | replace_placeholder_values (library_preparation_batch) | e.g. "N cells: 'unknown' → NaN" |
| 3 | label_h5ad | e.g. "Populated var['feature_name'] for 34,505/35,574 rows (1,069 NaN); wrote 8 obs ontology labels; overwrote pre-existing tissue, disease". Fill the overwrite clause from the tool result's obs_label_cols_overwritten / var_feature_name_overwritten; omit the clause when both are empty. |
| 4 | copy_cap_annotations | name the CAP source file |
| 5 | compress_h5ad | e.g. "Skipped — already gzipped" or "Rewrote X with gzip level 4" |

Only include the rows for tools that actually ran this session.

Validator delta

| | Before | After |
|---|---|---|
| Errors | N | M |
| Non-feature-ID warnings | N | M |
| CAP zero-observation warnings | N | M |
| Named warnings resolved | e.g. "raw.X absent", "unknown placeholder in library_preparation_batch" | |

Count CAP "zero observations" warnings (text: contains a category '...' with zero observations) separately from other warnings. These are expected after copy_cap_annotations: CAP declares a closed vocabulary per annotation set that spans all lineages, and a per-lineage file only realizes a subset — unused vocabulary terms are intentional schema information, not a defect. Report the count and move on; don't prune them. The validator's --add-labels remediation note comes from vendored CellxGENE code and does not apply to HCA.
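Counting the two warning classes separately can be sketched by matching on the warning text quoted above:

```python
def split_validator_warnings(warnings):
    # CAP "zero observations" warnings carry a distinctive phrase; count them
    # apart from other warnings rather than pruning them.
    phrase = "with zero observations"
    cap = [w for w in warnings if phrase in w]
    other = [w for w in warnings if phrase not in w]
    return cap, other
```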

Also list the specific error/warning kinds that disappeared or newly appeared, one line each.

CAP overlap (only if copy_cap_annotations ran this session, or a prior import_cap_annotations entry is in the edit log)

Pull from the latest import_cap_annotations entry's details:

| Metric | Value |
|---|---|
| CAP source file | cap_source_file |
| source_n_obs | |
| target_n_obs | |
| matched_n_obs | |
| match_fraction_of_source | as % |
| match_fraction_of_target | as % |
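The overlap metrics follow directly from the two obs indexes. A minimal sketch; the real tool enforces the 95% gate before copying anything:

```python
def cap_overlap(source_index, target_index):
    # Bidirectional coverage: both fractions must reach 0.95 for the copy to run.
    matched = len(set(source_index) & set(target_index))
    return {
        "source_n_obs": len(source_index),
        "target_n_obs": len(target_index),
        "matched_n_obs": matched,
        "match_fraction_of_source": matched / len(source_index),
        "match_fraction_of_target": matched / len(target_index),
    }
```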

CAP marker validation (only if copy_cap_annotations ran this session, or a prior import_cap_annotations entry is in the edit log)

Source the numbers from the copy_cap_annotations tool result's marker_gene_validation field if it ran this session. If only a prior import_cap_annotations entry exists, call validate_marker_genes on the final file to get fresh numbers — a marker list that matched against var.index before label_h5ad populated var['feature_name'] will look very different now.

Marker symbols are resolved against the target's var gene-name source: var['feature_name'] is preferred, else var['gene_name'], else var.index (the Ensembl IDs) as a last resort. Post-label_h5ad files always have feature_name; files that skipped labeling fall back to whatever the producer shipped.

| Metric | Value |
|---|---|
| Total unique markers | |
| Found in var gene-name source | |
| Missing | |

For each missing marker, list it with its classification exactly as returned by the tool — not_in_gencode (marker symbol doesn't resolve to any GENCODE entry — typo, glob pattern, or deprecated rename), missing_from_var (valid symbol but not present in this file's gene set), or known_rename (submitted marker is a deprecated symbol; the tool provides the current target in var_name, plus ensembl_id when available):

| Marker | Classification | Var name | Ensembl ID |
|---|---|---|---|

Leave Var name / Ensembl ID blank for not_in_gencode and missing_from_var rows — those fields are only populated on known_rename. If all markers hit, say so in one line instead of an empty table. not_in_gencode entries point at CAP-side fixes (ask the CAP curator); missing_from_var points at target-side gaps (different gene set than the one CAP was authored against); known_rename entries should report the rename target from var_name so the mismatch is explicit.
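Rendering a missing-marker row per the rules above; the record field names are assumed from the description:

```python
def marker_row(rec):
    # rec: one missing-marker record from validate_marker_genes; var_name and
    # ensembl_id are only populated for known_rename entries.
    if rec["classification"] == "known_rename":
        return (rec["marker"], "known_rename",
                rec["var_name"], rec.get("ensembl_id", ""))
    return (rec["marker"], rec["classification"], "", "")
```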

Still to do

Bucket B1 — blocking (validator errors or unset required: true fields)

| Field | Question |
|---|---|
| study_pi | who are the PI(s)? e.g. ["Teichmann,Sarah,A."] |

Bucket B2 — recommended (optional)

| Field | Question |
|---|---|
| default_embedding | X_umap (only 2D option) — confirm? |

Bucket C — upstream / curator

| Issue | Detail |
|---|---|
| library_id NaN (validator error) | Needs real values from source |

Only surface items that are still open — don't re-list anything resolved this session. Omit any of the three sub-tables that have no entries.
