ontology-resolver 是什么？

The multimodal biomedical data atlas builder

如何安装 ontology-resolver？

运行命令：npx killer-skills add epiblastai/lancell/ontology-resolver。支持 Cursor、Windsurf、VS Code、Claude Code 等 19+ IDE/Agent。

ontology-resolver 支持哪些 IDE 或 Agent？

该技能兼容 Cursor, Windsurf, VS Code, Trae, Claude Code, OpenClaw, Aider, Codex, OpenCode, Goose, Cline, Roo Code, Kiro, Augment Code, Continue, GitHub Copilot, Sourcegraph Cody, and Amazon Q Developer。可使用 Killer-Skills CLI 一条命令通用安装。

Ontology Resolver

Name: ontology-resolver
Availability: InStock
Author: epiblastai

Resolve free-text biological metadata values to canonical ontology terms with CELLxGENE-compatible IDs. Covers 9 entity types across 8 ontologies:

Entity	Ontology	Prefix
cell_type	Cell Ontology	CL
tissue	UBERON	UBERON
disease	MONDO	MONDO
organism	NCBITaxon	NCBITaxon
assay	EFO	EFO
development_stage	HsapDv / MmusDv	HsapDv, MmusDv
ethnicity	HANCESTRO	HANCESTRO
sex	PATO	PATO
cell_line	CLO	CLO

Interface

This resolver operates in Phase B (per-experiment obs resolution). It reads raw obs CSVs from an experiment directory and writes a fragment CSV with resolved ontology columns.

Input:

Raw obs CSV ({fs}_raw_obs.csv) — read-only, do not modify this file
A mapping of column names to OntologyEntity types (provided by the caller or derived from schema field classification)
Organism context (required for development_stage, helpful for others)
Output fragment path ({fs}_fragment_ontology_obs.csv)

Output:

A new fragment CSV file ({fs}_fragment_ontology_obs.csv) indexed by the same index as the raw obs, containing:
- One column per schema field the resolver is responsible for, using the exact schema field name (e.g., cell_type, tissue, disease) — resolved to canonical ontology term names
- ontology_resolved boolean indicating whether all ontology fields resolved successfully
A markdown report at resolver_reports/ontology-resolver.md in the working directory summarizing the experiment, fields resolved, unresolved values, and any omitted output columns with reasons.

Column naming: Output columns must exactly match the schema field names the resolver is told to fill. No _ontology_id suffixes, no validated_ prefixes. For organism fields, output the resolved scientific name (e.g., "Homo sapiens", "Mus musculus") — do not convert to common names. Ontology CURIEs are recorded in the markdown report for debugging but do not appear in the fragment CSV.

Reporting

Each run must write a markdown report to resolver_reports/ in the working directory.

Create the directory if it does not exist.
Default report path: resolver_reports/ontology-resolver.md
Overwrite the report for the current run unless the caller asks for a different naming scheme.
Include:
- experiment directory and input file path
- output fragment path
- ontology fields attempted
- resolved/unresolved counts per field
- correction mappings or manual normalizations used
- any omitted or blank output fields, with reasons

Imports

python
1from homeobox.standardization import (
2    OntologyEntity,
3    resolve_ontology_terms,
4    is_control_label,
5    detect_control_labels,
6    detect_negative_control_type,
7)
8from homeobox.standardization.types import OntologyResolution, ResolutionReport

Scripts

`scripts/resolve_ontology.py`

Handles the standard ontology resolution workflow: control detection, resolution via resolve_ontology_terms, column writing, and report generation.

python .claude/skills/ontology-resolver/scripts/resolve_ontology.py \
    <input_csv> <output_csv> \
    --field <obs_col>:<schema_field>:<entity> [...] \
    [--organism human] \
    [--corrections corrections.json] \
    [--report-dir resolver_reports]

Argument	Description
`input_csv`	Path to `{fs}_raw_obs.csv` (must have `index_col=0`)
`output_csv`	Path to fragment output (`{fs}_fragment_ontology_obs.csv`)
`--field`	Repeatable. Format: `obs_column:schema_field:ENTITY_TYPE`. Entity is the `OntologyEntity` enum name (e.g., `CELL_TYPE`, `TISSUE`, `DISEASE`).
`--organism`	Organism for development_stage resolution (default: `human`)
`--corrections`	JSON file with correction mappings (see below)
`--report-dir`	Output directory for markdown report (default: `resolver_reports` in input dir)

The script writes the fragment CSV with one column per schema field (exact name match) plus the ontology_resolved boolean. No _ontology_id suffix columns are written — ontology CURIEs are included in the markdown report only. It prints per-field resolution stats and unresolved values to stdout.

Corrections format

A JSON file mapping obs column names to {original: corrected} dictionaries:

json
1{
2    "cell_type_annotation": {
3        "T-cell": "T cell",
4        "B-cell": "B cell",
5        "Monocyte/Macrophage": "monocyte"
6    }
7}

Corrections are applied before resolution. Build these after reviewing unresolved values from a first pass.

`finalize_features.py` — Schema finalization with type coercion (shared)

Uses the shared gene-resolver/scripts/finalize_features.py script if the ontology fragment needs to be written to parquet for direct LanceDB ingestion.

bash
1python .claude/skills/gene-resolver/scripts/finalize_features.py \
2    <resolved_csv> <output_parquet> <schema_module> <schema_class> \
3    [--column KEY=VALUE ...]

Agent Workflow

1. Inspect raw obs and determine field mappings

Read the raw obs CSV. Identify which columns contain ontology-resolvable metadata and map each to a schema field name and OntologyEntity:

python
1import pandas as pd
2raw_obs = pd.read_csv("<experiment_dir>/<fs>_raw_obs.csv", index_col=0)
3for col in raw_obs.columns:
4    unique = raw_obs[col].dropna().unique()
5    print(f"{col}: {len(unique)} unique — {list(unique[:5])}")

Not every dataset has every ontology field. If a column is absent or entirely null, skip it.

2. Determine organism context

Identify the organism from the dataset metadata or a dedicated organism column. Pass it to --organism for correct development_stage dispatch (HsapDv for human, MmusDv for mouse).

3. Run the script

bash
1python .claude/skills/ontology-resolver/scripts/resolve_ontology.py \
2    /path/to/{fs}_raw_obs.csv \
3    /path/to/{fs}_fragment_ontology_obs.csv \
4    --field cell_type_annotation:cell_type:CELL_TYPE \
5    --field donor_tissue:tissue:TISSUE \
6    --field diagnosis:disease:DISEASE \
7    --organism human

4. Review unresolved values

The script prints unresolved values to stdout. For each:

Case/whitespace issues — already handled by resolve_ontology_terms, but check for unusual Unicode
Abbreviations — "T cell" vs "T-cell" vs "T lymphocyte". Build corrections.
Concatenated annotations — "CD4+ T cell (activated)" may need qualifier stripping
Dataset-specific labels — "Cluster_5", "Unknown", "Other" are not ontology terms; accept as unresolved
Near-misses — use hierarchy navigation to find the correct term:

python
1from homeobox.standardization import get_ontology_descendants
2descendants = get_ontology_descendants("CL:0000084", OntologyEntity.CELL_TYPE, max_depth=2)

5. Re-run with corrections (if needed)

Build a corrections JSON and re-run:

bash
1python .claude/skills/ontology-resolver/scripts/resolve_ontology.py \
2    /path/to/{fs}_raw_obs.csv \
3    /path/to/{fs}_fragment_ontology_obs.csv \
4    --field cell_type_annotation:cell_type:CELL_TYPE \
5    --corrections /path/to/corrections.json \
6    --organism human

6. Verify fragment output

Confirm the fragment has the expected columns and the ontology_resolved counts are acceptable.

Entity-Specific Notes

organism — Common values: "Homo sapiens", "Mus musculus". Use the resolved scientific name directly — do not convert to common names.

development_stage — Pass --organism to select the correct ontology (HsapDv for human, MmusDv for mouse). Without it, both are searched and may produce wrong matches.

sex — Only 3 canonical values: "female" (PATO:0000383), "male" (PATO:0000384), "unknown" (PATO:0000461). "other" maps to "unknown sex". Resolution is hardcoded (not from the local DB).

cell_line — Uses Cellosaurus local DB with FTS fuzzy search fallback. Exact matches get confidence 1.0; fuzzy matches get confidence 0.7. Flag fuzzy matches for user review.

assay — Free-text assay names (e.g., "10x 3' v3", "Smart-seq2") must match EFO terms. Many GEO assay descriptions don't match EFO exactly — investigate failures carefully.

Rules

Read-only input. Never modify the _raw_obs.csv file. Write all output to the fragment file.
Exact schema field names only. Output columns must match schema field names exactly. No _ontology_id suffixes, no validated_ prefixes.
Organism as scientific name. Do not convert to common names.
Use resolve_ontology_terms() for all entities. It handles dispatch to DB lookup, OLS4, or hardcoded tables internally.
Pass organism for development_stage. Without it, wrong matches are possible.
Do not derive control fields from ontology columns. is_negative_control and negative_control_type are perturbation-level concepts populated by perturbation resolvers.
Use detect_control_labels() for control detection. Do not hardcode control label sets.
Never set output columns to NaN for failed resolution. Keep the original value in the schema field.
Always write ontology_resolved boolean. Named ontology_resolved to avoid collision during assembly.
Save after each column pair to prevent losing work.
Never modify h5ad files. All validated data goes into the fragment CSV only.
Investigate failures before giving up. Build correction mappings for close-but-not-exact values.
Dataset-specific labels are acceptable as unresolved. Cluster IDs, "Unknown", "Other" are not ontology terms.
Ask before guessing. If a column's entity type is ambiguous, ask the user.

Resolution Strategy

Resolution succeeds — use the canonical ontology name in the schema field. Row is resolved. CURIE is recorded in the report.
Resolution fails — keep the original value in the schema field. Row is unresolved.
NaN only when no value exists — schema field is NaN.
Control labels → None — Control values map to None in the schema field.

ontology-resolver — multimodal ontology-resolver, lancell, community, multimodal, ide skills, single-cell, Claude Code, Cursor, Windsurf

关于此技能

# 核心主题

Killer-Skills Review

核心价值

适用 Agent 类型

↓ 赋予的主要能力 · ontology-resolver

! 使用限制与门槛

Why this page is reference-only

Source Boundary

Browser Sandbox Environment

⚡️ Ready to unleash?

常见问题与安装步骤

? FAQ

ontology-resolver 是什么？

如何安装 ontology-resolver？

ontology-resolver 支持哪些 IDE 或 Agent？

↓ 安装步骤

! 参考页模式

Imported Repository Instructions

ontology-resolver

Ontology Resolver

Interface

Reporting

Imports

Scripts

scripts/resolve_ontology.py

Corrections format

finalize_features.py — Schema finalization with type coercion (shared)

Agent Workflow

1. Inspect raw obs and determine field mappings

2. Determine organism context

3. Run the script

4. Review unresolved values

5. Re-run with corrections (if needed)

6. Verify fragment output

Entity-Specific Notes

Rules

Resolution Strategy

相关技能

寻找 ontology-resolver 的替代方案 (Alternative) 或可搭配使用的同类 community Skill？探索以下相关开源技能。

openclaw-release-maintainer

widget-generator

flags

pr-review

`scripts/resolve_ontology.py`

`finalize_features.py` — Schema finalization with type coercion (shared)