curate-h5ad — for Claude Code (clevercanary/hca-validation-tools, community skill)

v1.0.0

About This Skill

When to use: Ideal for AI agents that need to curate the h5ad file at the absolute path passed as $ARGUMENTS. Skill summary: Curate H5AD File — pass an absolute path to the .h5ad file. This AI agent skill supports Claude Code, Cursor, and Windsurf workflows.

Features

  • Curate the h5ad file at absolute path: $ARGUMENTS
  • Target schema detection and conversion
  • Rules (do not break)
  • Ground every fix in validator + evaluator output — run both before proposing anything
  • Step 1 — Gather findings

Core Topics

clevercanary

Updated: 4/21/2026

Killer-Skills Review

Decision support comes first. Repository text comes second.

Reference-Only Page Review Score: 8/11

This page remains useful for operators, but Killer-Skills treats it as reference material instead of a primary organic landing page.

  • Original recommendation layer
  • Concrete use-case guidance
  • Explicit limitations and caution
Review Score
8/11
Quality Score
45
Canonical Locale
en
Detected Body Locale
en


Core Value

Recommendation: curate-h5ad helps agents curate the h5ad file at the absolute path passed as $ARGUMENTS. Pass an absolute path to the .h5ad file.

Supported Agent Types

When to use: Ideal for AI agents that need to curate the h5ad file at the absolute path passed as $ARGUMENTS.

Key Capabilities · curate-h5ad

  • Curating the h5ad file at the absolute path passed as $ARGUMENTS
  • Applying the target schemas
  • Following the non-negotiable rules ("Rules (do not break)")

! Limitations and Prerequisites

  • Rules (do not break)
  • Ground every fix in validator + evaluator output. Run both before proposing anything — don't assume what's wrong.
  • Do NOT guess a default for missing required values.

Why this page is reference-only

  • Current locale does not satisfy the locale-governance contract.
  • The underlying skill quality score is below the review floor.

Source Boundary

The section below is imported from the upstream repository and should be treated as secondary evidence. Use the Killer-Skills review above as the primary layer for fit, risk, and installation decisions.

Next Steps After Review

Decide on an action before continuing to the upstream repository material.

Killer-Skills' primary value should not stop at "opening the repository README for you." It should first help you judge whether this skill is worth installing, whether it should go back to your trusted set for re-review, and whether it is already at the workflow-adoption stage.

Lab Demo

Experience this agent in a zero-setup browser sandbox powered by WebContainers. No installation required.

FAQ and Installation Steps

The questions and steps below mirror the page's structured data, so search engines can understand the page content.

FAQ

What is curate-h5ad?

curate-h5ad curates the h5ad file at the absolute path passed as $ARGUMENTS. Pass an absolute path to the .h5ad file. This AI agent skill supports Claude Code, Cursor, and Windsurf workflows.

How do I install curate-h5ad?

Run: npx killer-skills add clevercanary/hca-validation-tools/curate-h5ad. Supports 19+ IDEs/agents, including Cursor, Windsurf, VS Code, and Claude Code.

Which scenarios does curate-h5ad suit?

Typical tasks: curating the h5ad file at the absolute path passed as $ARGUMENTS, applying the target schemas, and following the skill's non-negotiable rules.

Which IDEs or agents does curate-h5ad support?

This skill is compatible with Cursor, Windsurf, VS Code, Trae, Claude Code, OpenClaw, Aider, Codex, OpenCode, Goose, Cline, Roo Code, Kiro, Augment Code, Continue, GitHub Copilot, Sourcegraph Cody, and Amazon Q Developer. Install universally with a single Killer-Skills CLI command.

What are curate-h5ad's limitations?

The skill's rules must not be broken: ground every fix in validator + evaluator output, run both before proposing anything — don't assume what's wrong — and do NOT guess a default for missing required values.

Installation Steps

  1. Open a terminal

    Open a terminal or command line in your project directory.

  2. Run the install command

    Run: npx killer-skills add clevercanary/hca-validation-tools/curate-h5ad. The CLI auto-detects your IDE or AI agent and completes the configuration.

  3. Start using the skill

    curate-h5ad is now enabled and can be invoked immediately in the current project.

! Reference-Page Mode

This page remains usable as an installation and lookup reference, but Killer-Skills no longer treats it as a primary indexable landing page. Read the review conclusions above first, then decide whether to continue to the upstream repository material.

Upstream Repository Material

The section below is imported from the upstream repository and should be treated as secondary evidence. Use the Killer-Skills review above as the primary layer for fit, risk, and installation decisions.

Upstream Source

curate-h5ad

Curate the h5ad file at absolute path: $ARGUMENTS. Pass an absolute path to the .h5ad file. This AI agent skill supports Claude Code, Cursor, and Windsurf workflows.

SKILL.md
Readonly
Supporting Evidence

Curate H5AD File

Curate the h5ad file at absolute path: $ARGUMENTS

Pass an absolute path to the .h5ad file. Relative paths are resolved against the MCP server's working directory, which may not match the user's.
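This contract can be enforced with a minimal guard; a sketch, where the helper name is hypothetical and only the absolute-path requirement comes from the skill:

```python
import os

def require_absolute(path: str) -> str:
    """Reject relative paths up front: the MCP server's working directory
    may not match the user's, so a relative path resolves unpredictably."""
    if not os.path.isabs(path):
        raise ValueError(f"pass an absolute path to the .h5ad file, got {path!r}")
    return path
```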

/evaluate-h5ad identifies problems. /curate-h5ad applies the safe, mechanical fixes and hands back a punch list of everything still needing a curator's decision or upstream data.

The target schemas are:

Rules (do not break)

  1. Never invent metadata values. If a required value isn't already in the file or derivable from it, emit a todo asking the wrangler for it. Do NOT guess a default. Examples of fields that always need wrangler input: description, ambient_count_correction, doublet_detection, default_embedding.
  2. Ground every fix in validator + evaluator output. Run both before proposing anything — don't assume what's wrong.
  3. replace_placeholder_values is restricted to library_preparation_batch and library_sequencing_run. Never run it on other columns. Other placeholder-looking values (e.g. "unknown" in author_cell_type) need curator-reviewed mappings, not a blanket NaN conversion.
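Rule 3 reduces to an allow-list check; a sketch (the function name is illustrative, the two column names come straight from the rule):

```python
# Only these two columns may ever receive a blanket placeholder -> NaN conversion.
ALLOWED_PLACEHOLDER_COLUMNS = frozenset(
    {"library_preparation_batch", "library_sequencing_run"}
)

def may_replace_placeholders(column: str) -> bool:
    """Gate replace_placeholder_values: other placeholder-looking values
    (e.g. "unknown" in author_cell_type) need curator-reviewed mappings."""
    return column in ALLOWED_PLACEHOLDER_COLUMNS
```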

Step 1 — Gather findings

Start with the evaluator, then gate the HCA validator on the schema it reports:

  • Run /evaluate-h5ad $ARGUMENTS — produces the structured overview report (schema type, X verdict, metadata, storage, embeddings, CAP, edit history, summary). This already calls check_schema_type and check_x_normalization, so their verdicts are available for Step 2 gating without a separate tool call.
  • If the evaluator reports schema: "hca", run validate_schema $ARGUMENTS — the HCA schema validator (is_valid, full errors and warnings lists). These are the authoritative blocking/advisory signals for Bucket A decisions. Feature-ID warnings are ordered last; summarize repeated shapes in the punch list rather than pasting thousands of lines verbatim.
  • If the evaluator reports schema: "cellxgene", do not run validate_schema yet — the HCA validator would report a large, mostly irrelevant error list. convert_cellxgene_to_hca moves into Bucket A; after it runs, re-enter Step 1 on the converted file to get the accurate HCA findings.
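The gating above reduces to a small dispatcher; a sketch, assuming the evaluator's schema verdict is available as a plain string:

```python
def step1_followup(schema_type: str) -> list[str]:
    """Decide what to run after /evaluate-h5ad, per the schema it reports."""
    if schema_type == "hca":
        # Authoritative blocking/advisory signals for Bucket A decisions.
        return ["validate_schema"]
    if schema_type == "cellxgene":
        # Running validate_schema now would yield a large, mostly irrelevant
        # error list; convert first, then re-enter Step 1 on the converted file.
        return ["convert_cellxgene_to_hca", "re-enter Step 1"]
    raise ValueError(f"unexpected schema type: {schema_type!r}")
```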

Step 2 — Classify every finding into one bucket

Bucket A — Mechanical (safe to run after approval)

Only these are in Bucket A. Nothing else. A row belongs in A only when its preconditions are already satisfied at punch-list time — don't pre-list rows whose inputs depend on an unanswered B question (e.g. set_uns('default_embedding', …) belongs in B2 until the wrangler picks a value, then gets promoted to A per Step 3).

  • convert_cellxgene_to_hca — when check_schema_type reports schema: "cellxgene". Must run first: it reshapes the file into HCA layout before any other fix makes sense, and the other tools (including validate_schema) assume HCA layout. After conversion, re-enter Step 1 on the converted file to get an accurate Bucket A/B/C list.
  • normalize_raw — when check_x_normalization reports verdict: "raw_counts" and has_raw_x: false. Deterministic: moves X→raw.X, normalizes X with normalize_total(target_sum=10000) + log1p.
  • replace_placeholder_values on library_preparation_batch — only if the column actually contains placeholder values flagged by the validator.
  • replace_placeholder_values on library_sequencing_run — same condition.
  • label_h5ad — always eligible once the file is in HCA layout (and any prior Bucket A items have run). Populates var['feature_name'] + feature_reference / feature_biotype / feature_length / feature_type from Ensembl IDs via vendored GENCODE (mirrored to raw.var when present), writes the eight obs ontology labels (tissue, cell_type, assay, disease, sex, organism, development_stage, self_reported_ethnicity) from their _ontology_term_id counterparts, and writes obs['observation_joinid']. Unconditionally overwrites any producer-provided values in those controlled columns — call it out in the report so the wrangler sees what changed. Unknown Ensembl IDs yield NaN across the five feature_* columns for that row (not an error). Preflight refuses to run when the file carries uns['schema_version'] / uns['schema_reference'] or any non-human obs['organism_ontology_term_id'] — those go to Bucket C below.
  • copy_cap_annotations — only if the wrangler provided a CAP source file in Step 3. Copies annotation sets + cellannotation_schema_version + cellannotation_metadata from the source into the target. Partial overlap is allowed: the source and target obs indexes only need to match at ≥95% in both directions (target-covered and source-covered); target rows absent from source get NaN in the new CAP columns. If the overlap is below 95% the tool aborts — treat that as a Bucket B item and bring it back to the wrangler (usually the CAP source is stale or wrong).
  • compress_h5ad — when get_storage_info shows no HDF5 filter on X's underlying dataset (X.data.compression for sparse X, X.compression for dense X). If the file is already compressed, the tool safely returns {skipped: true, reason: ...} rather than rewriting. Pure compression, no data change.
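Taken together, Bucket A membership is a pure function of the Step 1 findings. A sketch, assuming the findings have been flattened into a dict (the key names are illustrative; the preconditions and ordering come from the bullets above):

```python
def bucket_a(findings: dict) -> list[str]:
    """Assemble the Bucket A run list; nothing outside this list qualifies."""
    if findings.get("schema") == "cellxgene":
        # Conversion must run first and stales every other decision.
        return ["convert_cellxgene_to_hca"]
    plan = []
    if findings.get("x_verdict") == "raw_counts" and not findings.get("has_raw_x"):
        plan.append("normalize_raw")
    for col in ("library_preparation_batch", "library_sequencing_run"):
        if col in findings.get("placeholder_columns", ()):
            plan.append(f"replace_placeholder_values({col})")
    plan.append("label_h5ad")  # always eligible once in HCA layout
    if findings.get("cap_source_path"):
        plan.append("copy_cap_annotations")
    if not findings.get("x_compressed", False):
        plan.append("compress_h5ad")
    return plan
```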

Bucket B — Needs wrangler input (todo — stop and ask)

Split these into two classes so the wrangler sees which items actually block validation vs. which are recommended-but-optional. The primary blocking signal is validate_schema — any error it reports (on obs, var, or uns) blocks. Use list_uns_fields as a secondary signal for missing uns fields specifically: required: true fields that are unset are blocking; required: false fields that are unset are recommended at most.

For each item, write a concrete question. For B1 items, do not include a suggested answer — ask only for the missing required value. For B2 items, if there's an obvious single valid option (e.g. only one 2D embedding exists), you may phrase it as a confirmation question ("X_umap — confirm?") rather than silently deciding.
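The blocking/recommended split can be sketched as follows; the list_uns_fields record shape ({name, required, is_set}) is an assumption, while the classification rule is from the text above:

```python
def classify_uns(uns_fields: list[dict], validator_errors: list[str]) -> dict:
    """Split open questions into B1 (blocking) and B2 (recommended)."""
    b1 = list(validator_errors)  # any validate_schema error blocks outright
    b1 += [f["name"] for f in uns_fields if f["required"] and not f["is_set"]]
    # Only explicitly named optional fields ever land in B2 — never scan
    # list_uns_fields for other unset optional fields.
    b2 = [f["name"] for f in uns_fields
          if f["name"] == "default_embedding" and not f["is_set"]]
    return {"B1": b1, "B2": b2}
```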

B1 — Blocking (validator errors or unset required: true fields)

  • Missing required uns fields (e.g. study_pi) — ask for the value(s).
  • No CAP annotation set present — the file must ship with at least one CAP annotation set (see the HCA Cell Annotation schema). Ask the wrangler to provide a local path to a CAP-exported version of this file (same cells, with CAP annotation sets populated) — copy_cap_annotations reads the source via AnnData/h5py so a URL must be downloaded locally first. If supplied, copy_cap_annotations becomes a mechanical fix for Step 4.
  • Any other uns field the validator flags as missing.

B2 — Recommended (optional fields the wrangler may want to set)

Only the fields explicitly named below belong in B2. Do not scan list_uns_fields for other unset optional fields and invent questions about them — a field being optional-and-unset is not itself a reason to ask. The skill's scope is the explicit tool list (convert_cellxgene_to_hca, normalize_raw, replace_placeholder_values, label_h5ad, copy_cap_annotations, set_uns on the named fields here, compress_h5ad); everything else is the wrangler's call, unprompted.

  • default_embedding — list the obsm keys and ask which one. Optional per schema, but a file shipped without it will display in CELLxGENE Explorer with no default scatter. Must name a 2D embedding to actually plot; 30D latents (e.g. X_scVI) are valid per schema but won't display. If only one 2D embedding exists, surface that — the wrangler will almost certainly pick it.

If the wrangler answers a B2 item during the session, that answer becomes a set_uns mechanical fix (promoted to Bucket A) for Step 4.

Bucket C — Upstream / curator judgment (out of scope for this skill)

Report these but don't attempt to fix:

  • High NaN rates on non-allowed columns (e.g. library_id) — needs real values from source.
  • Sparse or missing ambient_count_correction / doublet_detection obs columns — per-cell values must come from the upstream source (each source dataset's processing record). Do not broadcast a single value. Report fill rate and move on.
  • Delimited-list values in single-identifier columns (e.g. library_preparation_batch containing "lib1; lib2; lib3") — needs per-cell resolution, not placeholder replacement.
  • Gene IDs missing from the current GENCODE — needs annotation-version decision.
  • Inconsistent author_cell_type variants — needs a curator mapping.
  • (CAP annotations are handled in Bucket B above — the wrangler provides a CAP source file and copy_cap_annotations runs mechanically.)
  • Cells whose labels don't match the atlas focus (e.g. non-myeloid labels in a myeloid atlas) — needs a curator decision on keep/drop.
  • File carries uns['schema_version'] or uns['schema_reference'] — signals it has already been through cellxgene-schema add-labels. label_h5ad refuses to run; upstream needs to re-emit without those keys. Do not strip them here.
  • Any obs['organism_ontology_term_id'] value other than NCBITaxon:9606 — label_h5ad is human-only. Supporting another organism is a code change, not a curation fix.
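The two label_h5ad refusal conditions in this bucket can be sketched as a preflight check (the input shapes are assumptions; the conditions come from the bullets above):

```python
def label_h5ad_preflight(uns_keys: set, organism_terms: set) -> list[str]:
    """Return the Bucket C reasons label_h5ad would refuse to run."""
    reasons = []
    if uns_keys & {"schema_version", "schema_reference"}:
        reasons.append("file already carries cellxgene-schema add-labels keys; "
                       "upstream must re-emit without them (do not strip here)")
    if organism_terms - {"NCBITaxon:9606"}:
        reasons.append("non-human organism_ontology_term_id present; "
                       "label_h5ad is human-only")
    return reasons
```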

Step 3 — Present the punch list

Show these sections: A (will run), B1 (blocking — needs your answer), B2 (recommended — optional), C (still to do, out of scope). Then stop and wait for explicit approval before running anything.

If the wrangler answers any Bucket B items (B1 or B2), promote those to Bucket A as the appropriate mechanical action: set_uns for answered uns values (e.g. default_embedding, study_pi), copy_cap_annotations when the answer is a CAP source file path.

Step 4 — Run the mechanical fixes

Order:

  1. convert_cellxgene_to_hca first if applicable — then stop, re-run Steps 1–3 on the converted file before continuing (conversion changes the layout enough that the prior punch list is stale).
  2. Content edits, in this order: normalize_raw, each replace_placeholder_values, label_h5ad, copy_cap_annotations (if a source was supplied), and any set_uns approved in Step 3. label_h5ad must run before copy_cap_annotations — copy_cap_annotations calls validate_marker_genes, which reads var['feature_name']; running the labeler first gives marker-gene validation canonical gene symbols to match against.
  3. compress_h5ad last.

Each tool writes a new timestamped file. For most subsequent calls, passing either the original path or the latest works — resolve_latest picks up the newest variant automatically. Two exceptions: convert_cellxgene_to_hca does not auto-resolve (call it with the exact path you want to convert), and copy_cap_annotations only auto-resolves its target_path (the source_path is used verbatim).
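The auto-resolution behavior might be approximated like this; the timestamp suffix convention used here (name.YYYYMMDDHHMMSS.h5ad) is purely illustrative — the real tools define their own naming:

```python
import os
import re

def resolve_latest(path: str) -> str:
    """Return the newest timestamped variant of `path`, or `path` itself if
    no variants exist. Lexicographic sort works because the assumed
    timestamps are fixed-width and zero-padded."""
    directory = os.path.dirname(path) or "."
    stem = os.path.basename(path).removesuffix(".h5ad")
    pattern = re.compile(re.escape(stem) + r"\.\d{14}\.h5ad")
    variants = sorted(f for f in os.listdir(directory) if pattern.fullmatch(f))
    return os.path.join(directory, variants[-1]) if variants else path
```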

Step 5 — Report

Re-run view_edit_log on the final file, then produce a structured report with these sections in order. Also re-run validate_schema — but only if check_schema_type reports hca on the final file. If the file is still CellxGENE (e.g. conversion wasn't approved), skip the validator rerun and note why under "Validator delta" instead of pasting a misleading error list. Use markdown tables; skip any section with no content.

For the Provenance line below, re-run get_summary on the final file to fetch its obs columns, then run get_descriptive_stats with columns set to the intersection of ["donor_id", "sample_id", "library_id", "dataset_id"] and the final file's obs column names (extract name from each {name, dtype} object in get_summary.obs_columns — it's a list of objects, not plain strings). The intersection avoids erroring on absent columns.

One short paragraph or bullet block with: final file path, shape (n_obs × n_vars), title from uns, schema type (include version only when schema is CellxGENE — HCA is unversioned), X verdict + raw.X presence, compression status, obsm keys present. Add a Provenance line: N donors · M samples · K libraries · D source datasets — pulled from get_descriptive_stats.columns[<col>].unique for each column (the stats are nested under a columns dict keyed by column name). Omit any metric whose column is absent OR whose column is present but unpopulated (columns[<col>].unique == 0, equivalently columns[<col>].n_nan == n_rows) so an all-NaN column doesn't render as "0 libraries". dataset_id is not a schema field (optional integrator convention); absent is normal.
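The column-intersection and omission rules above can be sketched as one helper; the dict shapes mirror the tool outputs as described, but treat them as assumptions:

```python
def provenance_line(obs_columns: list[dict], stats: dict) -> str:
    """Build 'N donors · M samples · …' from get_summary / get_descriptive_stats
    shapes, skipping absent or all-NaN columns so they never render as '0 x'."""
    labels = [("donor_id", "donors"), ("sample_id", "samples"),
              ("library_id", "libraries"), ("dataset_id", "source datasets")]
    present = {c["name"] for c in obs_columns}  # obs_columns holds {name, dtype} objects
    parts = []
    for col, label in labels:
        info = stats.get("columns", {}).get(col)
        # Omit when the column is absent, or present but unpopulated (unique == 0).
        if col in present and info and info.get("unique", 0) > 0:
            parts.append(f"{info['unique']} {label}")
    return " · ".join(parts)
```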

Mechanical fixes applied

| # | Operation | Effect |
| --- | --- | --- |
| 1 | normalize_raw | e.g. "Moved raw counts → raw.X; normalized X with normalize_total(target_sum=10000) + log1p" |
| 2 | replace_placeholder_values (library_preparation_batch) | e.g. "N cells: 'unknown' → NaN" |
| 3 | label_h5ad | e.g. "Populated var['feature_name'] for 34,505/35,574 rows (1,069 NaN); wrote 8 obs ontology labels; overwrote pre-existing tissue, disease". Fill the overwrite clause from the tool result's obs_label_cols_overwritten / var_feature_name_overwritten; omit the clause when both are empty. |
| 4 | copy_cap_annotations | name the CAP source file |
| 5 | compress_h5ad | e.g. "Skipped — already gzipped" or "Rewrote X with gzip level 4" |

Only include the rows for tools that actually ran this session.

Validator delta

| | Before | After |
| --- | --- | --- |
| Errors | N | M |
| Non-feature-ID warnings | N | M |
| CAP zero-observation warnings | N | M |
| Named warnings resolved | e.g. "raw.X absent", "unknown placeholder in library_preparation_batch" | |

Count CAP "zero observations" warnings (text: contains a category '...' with zero observations) separately from other warnings. These are expected after copy_cap_annotations: CAP declares a closed vocabulary per annotation set that spans all lineages, and a per-lineage file only realizes a subset — unused vocabulary terms are intentional schema information, not a defect. Report the count and move on; don't prune them. The validator's --add-labels remediation note comes from vendored CellxGENE code and does not apply to HCA.
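Splitting out the CAP zero-observation warnings is a substring check; the matched phrase below is taken from the text above, though the validator's exact wording should be verified:

```python
def split_warnings(warnings: list[str]) -> tuple[int, int]:
    """Return (cap_zero_observation_count, other_warning_count)."""
    cap = sum(1 for w in warnings if "with zero observations" in w)
    return cap, len(warnings) - cap
```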

Also list the specific error/warning kinds that disappeared or newly appeared, one line each.

CAP overlap (only if copy_cap_annotations ran this session, or a prior import_cap_annotations entry is in the edit log)

Pull from the latest import_cap_annotations entry's details:

| Metric | Value |
| --- | --- |
| CAP source file | cap_source_file |
| source_n_obs | |
| target_n_obs | |
| matched_n_obs | |
| match_fraction_of_source | as % |
| match_fraction_of_target | as % |

CAP marker validation (only if copy_cap_annotations ran this session, or a prior import_cap_annotations entry is in the edit log)

Source the numbers from the copy_cap_annotations tool result's marker_gene_validation field if it ran this session. If only a prior import_cap_annotations entry exists, call validate_marker_genes on the final file to get fresh numbers — a marker list that matched against var.index before label_h5ad populated var['feature_name'] will look very different now.

Marker symbols are resolved against the target's var gene-name source: var['feature_name'] is preferred, else var['gene_name'], else var.index (the Ensembl IDs) as a last resort. Post-label_h5ad files always have feature_name; files that skipped labeling fall back to whatever the producer shipped.

| Metric | Value |
| --- | --- |
| Total unique markers | |
| Found in var gene-name source | |
| Missing | |

For each missing marker, list it with its classification exactly as returned by the tool — not_in_gencode (marker symbol doesn't resolve to any GENCODE entry — typo, glob pattern, or deprecated rename), missing_from_var (valid symbol but not present in this file's gene set), or known_rename (submitted marker is a deprecated symbol; the tool provides the current target in var_name, plus ensembl_id when available):

| Marker | Classification | Var name | Ensembl ID |
| --- | --- | --- | --- |

Leave Var name / Ensembl ID blank for not_in_gencode and missing_from_var rows — those fields are only populated on known_rename. If all markers hit, say so in one line instead of an empty table. not_in_gencode entries point at CAP-side fixes (ask the CAP curator); missing_from_var points at target-side gaps (different gene set than the one CAP was authored against); known_rename entries should report the rename target from var_name so the mismatch is explicit.
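Rendering the missing-marker rows per those rules might look like this sketch; the record shape ({marker, classification, var_name, ensembl_id}) is an assumption based on the fields named above, and the test data is illustrative:

```python
def marker_rows(missing: list[dict]) -> list[tuple[str, str, str, str]]:
    """One report row per missing marker; Var name / Ensembl ID stay blank
    except on known_rename, where the tool supplies the rename target."""
    rows = []
    for m in missing:
        if m["classification"] == "known_rename":
            rows.append((m["marker"], "known_rename",
                         m.get("var_name", ""), m.get("ensembl_id", "")))
        else:
            rows.append((m["marker"], m["classification"], "", ""))
    return rows
```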

Still to do

Bucket B1 — blocking (validator errors or unset required: true fields)

FieldQuestion
study_piwho are the PI(s)? e.g. ["Teichmann,Sarah,A."]

Bucket B2 — recommended (optional)

FieldQuestion
default_embeddingX_umap (only 2D option) — confirm?

Bucket C — upstream / curator

IssueDetail
library_id NaN (validator error)Needs real values from source

Only surface items that are still open — don't re-list anything resolved this session. Omit any of the three sub-tables that have no entries.

Related Skills

Looking for alternatives to curate-h5ad, or similar community skills to use alongside it? Explore these related open-source skills.

View all

openclaw-release-maintainer

openclaw

Your own personal AI assistant. Any OS. Any Platform. The lobster way. 🦞

333.8k · AI

widget-generator

f

Generate customizable widget plugins for the prompts.chat feed system

149.6k · AI

flags

vercel

The React Framework

138.4k · Browser

pr-review

pytorch

Tensors and Dynamic neural networks in Python with strong GPU acceleration

98.6k · Developer Tools