vllm-ascend-model-adapter — for Claude Code

v1.0.0

About This Skill

Use case: ideal for AI agents that need to adapt models to vllm-ascend. Skill summary: works for both already-supported models and new architectures not yet registered in vLLM, and supports Claude Code, Cursor, and Windsurf workflows.

Features

vLLM Ascend Model Adapter
Start with references/workflow-checklist.md.
Read references/multimodal-ep-aclgraph-lessons.md (feature-first checklist).
If startup/inference fails, read references/troubleshooting.md.
If checkpoint is fp8-on-NPU, read references/fp8-on-npu-lessons.md.

Core Topics

intellistream
Updated: 4/2/2026

Killer-Skills Review

Decision support comes first. Repository text comes second.

Reference-Only Page Review Score: 8/11

This page remains useful for teams, but Killer-Skills treats it as reference material instead of a primary organic landing page.

Review layers: original recommendation, concrete use-case guidance, explicit limitations and cautions.

Review Score: 8/11
Quality Score: 42
Canonical Locale: en
Detected Body Locale: en


Core Value

Recommendation: vllm-ascend-model-adapter helps agents adapt Hugging Face or local models to vllm-ascend. It works for both already-supported models and new architectures not yet registered in vLLM, and supports Claude Code, Cursor, and Windsurf workflows.

Applicable Agent Types

Ideal for AI agents that need to adapt models to vllm-ascend.

Key Capabilities · vllm-ascend-model-adapter

Applying the vLLM Ascend Model Adapter workflow
Starting with references/workflow-checklist.md
Reading references/multimodal-ep-aclgraph-lessons.md (feature-first checklist)

! Limitations and Prerequisites

  • --enable-expert-parallel and flashcomm1 checks are MoE-only; for non-MoE models, mark them as not-applicable with evidence.
  • Do not rely on PYTHONPATH=<modified-src>:$PYTHONPATH unless a debugging fallback is strictly needed.
  • The final deliverable must be one single signed commit in the current working repo (git commit -sm ...).

Why this page is reference-only

  • Current locale does not satisfy the locale-governance contract.
  • The underlying skill quality score is below the review floor.

Source Boundary

The section below is imported from the upstream repository and should be treated as secondary evidence. Use the Killer-Skills review above as the primary layer for fit, risk, and installation decisions.

Next Steps After Review

Decide on an action first, then continue to the upstream repository material.

Killer-Skills' primary value should not stop at opening the repository README for you. It should first help you judge whether this skill is worth installing, whether it should go back to the trusted set for re-review, and whether it is ready to be integrated into your workflow.


FAQ and Installation

FAQ

What is vllm-ascend-model-adapter?

A skill for adapting models to vllm-ascend. It works for both already-supported models and new architectures not yet registered in vLLM, and supports Claude Code, Cursor, and Windsurf workflows.

How do I install vllm-ascend-model-adapter?

Run: npx killer-skills add intellistream/vllm-ascend-hust/vllm-ascend-model-adapter. The command supports 19+ IDEs and agents, including Cursor, Windsurf, VS Code, and Claude Code.

Which scenarios does vllm-ascend-model-adapter fit?

Typical tasks include applying the vLLM Ascend Model Adapter workflow, starting with references/workflow-checklist.md, and reading references/multimodal-ep-aclgraph-lessons.md (feature-first checklist).

Which IDEs or agents does vllm-ascend-model-adapter support?

The skill is compatible with Cursor, Windsurf, VS Code, Trae, Claude Code, OpenClaw, Aider, Codex, OpenCode, Goose, Cline, Roo Code, Kiro, Augment Code, Continue, GitHub Copilot, Sourcegraph Cody, and Amazon Q Developer. A single Killer-Skills CLI command installs it for any of them.

What are the limitations of vllm-ascend-model-adapter?

--enable-expert-parallel and flashcomm1 checks are MoE-only (mark them as not-applicable with evidence for non-MoE models); do not rely on PYTHONPATH=<modified-src>:$PYTHONPATH unless a debugging fallback is strictly needed; the final deliverable must be one single signed commit in the current working repo (git commit -sm ...).

Installation Steps

  1. Open a terminal

    Open a terminal or command line in your project directory.

  2. Run the install command

    Run: npx killer-skills add intellistream/vllm-ascend-hust/vllm-ascend-model-adapter. The CLI detects your IDE or AI agent and completes configuration automatically.

  3. Start using the skill

    vllm-ascend-model-adapter is now enabled and can be invoked in the current project immediately.

! Reference-Page Mode

This page remains usable as an installation and lookup reference, but Killer-Skills no longer treats it as a primary indexable landing page. Read the review conclusions above first, then decide whether to continue to the upstream repository material.

Upstream Repository Material

vllm-ascend-model-adapter

This skill is for both already-supported models and new architectures not yet registered in vLLM. This AI agent skill supports Claude Code, Cursor, and Windsurf workflows.

SKILL.md

vLLM Ascend Model Adapter

Overview

Adapt Hugging Face or local models to run on vllm-ascend with minimal changes, deterministic validation, and single-commit delivery. This skill is for both already-supported models and new architectures not yet registered in vLLM.

Read order

  1. Start with references/workflow-checklist.md.
  2. Read references/multimodal-ep-aclgraph-lessons.md (feature-first checklist).
  3. If startup/inference fails, read references/troubleshooting.md.
  4. If checkpoint is fp8-on-NPU, read references/fp8-on-npu-lessons.md.
  5. Before handoff, read references/deliverables.md.

Hard constraints

  • Never upgrade transformers.
  • Primary implementation roots are fixed by Dockerfile:
    • /vllm-workspace/vllm
    • /vllm-workspace/vllm-ascend
  • Start vllm serve from /workspace with direct command by default.
  • Default API port is 8000 unless user explicitly asks otherwise.
  • Feature-first default: try best to validate ACLGraph / EP / flashcomm1 / MTP / multimodal out-of-box.
  • --enable-expert-parallel and flashcomm1 checks are MoE-only; for non-MoE models mark as not-applicable with evidence.
  • If any feature cannot be enabled, keep evidence and explain reason in final report.
  • Do not rely on PYTHONPATH=<modified-src>:$PYTHONPATH unless debugging fallback is strictly needed.
  • Keep code changes minimal and focused on the target model.
  • Final deliverable commit must be one single signed commit in the current working repo (git commit -sm ...).
  • Keep final docs in Chinese and compact.
  • Dummy-first is encouraged for speed, but dummy is NOT fully equivalent to real weights.
  • Never sign off adaptation using dummy-only evidence; real-weight gate is mandatory.
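The MoE-only rule above is mechanical enough to sketch as a helper. This is a minimal sketch, not part of the skill itself: the config keys used to detect MoE (num_experts, num_local_experts, n_routed_experts) are assumptions, since expert-count field names vary by architecture.

```python
# Hypothetical helper: classify features per the MoE-only constraint.
MOE_ONLY_FEATURES = {"expert-parallel", "flashcomm1"}

def is_moe(config: dict) -> bool:
    """Heuristic: expert-count keys in config.json indicate an MoE model."""
    return any(k in config for k in ("num_experts", "num_local_experts", "n_routed_experts"))

def feature_status(config: dict, features: list[str]) -> dict[str, str]:
    """Mark MoE-only features as not-applicable for dense (non-MoE) models."""
    moe = is_moe(config)
    return {
        f: "pending-validation" if moe or f not in MOE_ONLY_FEATURES else "not-applicable"
        for f in features
    }
```

For a dense model this marks expert-parallel and flashcomm1 as not-applicable up front, which is exactly the evidence the final report needs.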

Execution playbook

1) Collect context

  • Confirm model path (default /models/<model-name>; if environment differs, confirm with user explicitly).
  • Confirm implementation roots (/vllm-workspace/vllm, /vllm-workspace/vllm-ascend).
  • Confirm delivery root (the current git repo where the final commit is expected).
  • Confirm runtime import path points to /vllm-workspace/* install.
  • Use default expected feature set: ACLGraph + EP + flashcomm1 + MTP + multimodal (if model has VL capability).
  • User requirements extend this baseline, not replace it.

2) Analyze model first

  • Inspect config.json, processor files, modeling files, tokenizer files.
  • Identify architecture class, attention variant, quantization type, and multimodal requirements.
  • Check state-dict key prefixes (and safetensors index) to infer mapping needs.
  • Decide whether support already exists in vllm/model_executor/models/registry.py.
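The inspection steps above can be sketched as a small script, assuming the standard Hugging Face checkpoint layout (config.json plus model.safetensors.index.json); actual file names and key shapes vary per checkpoint, so treat this as a starting point only.

```python
import json
from collections import Counter
from pathlib import Path

def analyze_model(model_dir: str) -> dict:
    """Collect architecture, quantization, and weight-key prefixes from a checkpoint dir."""
    root = Path(model_dir)
    config = json.loads((root / "config.json").read_text())
    info = {
        "architectures": config.get("architectures", []),
        "quantization": (config.get("quantization_config") or {}).get("quant_method"),
    }
    index_file = root / "model.safetensors.index.json"
    if index_file.exists():
        weight_map = json.loads(index_file.read_text())["weight_map"]
        # Top-level key prefixes hint at the remapping rules the loader will need.
        info["key_prefixes"] = dict(Counter(k.split(".", 1)[0] for k in weight_map))
    return info
```

The architectures list is what you then look up in vllm/model_executor/models/registry.py to decide between reuse and native implementation.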

3) Choose adaptation strategy (new-model capable)

  • Reuse existing vLLM architecture if compatible.
  • If architecture is missing or incompatible, implement native support:
    • add model adapter under vllm/model_executor/models/;
    • add processor under vllm/transformers_utils/processors/ when needed;
    • register architecture in vllm/model_executor/models/registry.py;
    • implement explicit weight loading/remap rules (including fp8 scale pairing, KV/QK norm sharding, rope variants).
  • If remote code needs newer transformers symbols, do not upgrade dependency.
  • If unavoidable, copy required modeling files from sibling transformers source and keep scope explicit.
  • If failure is backend-specific (kernel/op/platform), patch minimal required code in /vllm-workspace/vllm-ascend.

4) Implement minimal code changes (in implementation roots)

  • Touch only files required for this model adaptation.
  • Keep weight mapping explicit and auditable.
  • Avoid unrelated refactors.

5) Two-stage validation on Ascend (direct run)

Stage A: dummy-weight fast pass

  • Run from /workspace with --load-format dummy.
  • Goal: fast validate architecture path / operator path / API path.
  • Do not treat Application startup complete as pass by itself; request smoke is mandatory.
  • Require at least:
    • startup readiness (/v1/models 200),
    • one text request 200,
    • if VL model, one text+image request 200,
    • ACLGraph evidence where expected.

Stage B: real-weight mandatory gate (must pass before sign-off)

  • Remove --load-format dummy and validate with real checkpoint.
  • Goal: validate real-only risks:
    • weight key mapping,
    • fp8/fp4 dequantization path,
    • KV/QK norm sharding with real tensor shapes,
    • load-time/runtime stability.
  • Require HTTP 200 and non-empty output before declaring success.
  • Do not pass Stage B on startup-only evidence.
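The Stage B sign-off rule can be written down as a predicate. The evidence field names here are illustrative, not a fixed schema; the logic mirrors the bullets above.

```python
def stage_b_passes(evidence: dict) -> bool:
    """Real-weight gate: dummy weights or startup-only evidence never pass."""
    return (
        not evidence.get("dummy_weights", False)           # --load-format dummy removed
        and evidence.get("http_status") == 200             # a real request answered
        and bool(str(evidence.get("output", "")).strip())  # non-empty completion
    )
```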

6) Validate inference and features

  • Send GET /v1/models first.
  • Send at least one OpenAI-compatible text request.
  • For multimodal models, require at least one text+image request.
  • Validate architecture registration and loader path with logs (no unresolved architecture, no fatal missing-key errors).
  • Try feature-first validation: EP + ACLGraph path first; eager path as fallback/isolation.
  • If startup succeeds but first request crashes (false-ready), treat as runtime failure and continue root-cause isolation.
  • For torch._dynamo + interpolate + NPU contiguous failures on VL paths, try TORCHDYNAMO_DISABLE=1 as diagnostic/stability fallback.
  • For multimodal processor API mismatch (for example skip_tensor_conversion signature mismatch), use text-only isolation (--limit-mm-per-prompt set image/video/audio to 0) to separate processor issues from core weight loading issues.
  • Capacity baseline by default (single machine): max-model-len=128k + max-num-seqs=16.
  • Then expand concurrency (e.g., 32/64) if requested or feasible.
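The capacity baseline above maps directly onto two serve flags. A sketch of the default argument set follows; the flag names are standard vLLM CLI options and the port follows the default-8000 constraint, but the exact command line should be taken from the runbook for the specific model.

```python
def serve_command(model_path: str, served_name: str) -> list[str]:
    """Default single-machine baseline: 128k context, 16 concurrent sequences."""
    return [
        "vllm", "serve", model_path,
        "--served-model-name", served_name,
        "--max-model-len", str(128 * 1024),  # 128k baseline
        "--max-num-seqs", "16",              # expand to 32/64 only after this passes
        "--port", "8000",                    # default API port per the constraints
    ]
```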

7) Backport, generate artifacts, and commit in delivery repo

  • If implementation happened in /vllm-workspace/*, backport minimal final diff to current working repo.
  • Generate test config YAML at tests/e2e/models/configs/<ModelName>.yaml following the schema of existing configs (must include model_name, hardware, tasks with accuracy metrics, and num_fewshot). Use accuracy results from evaluation to populate metric values.
  • Generate tutorial markdown at docs/source/tutorials/models/<ModelName>.md following the standard template (Introduction, Supported Features, Environment Preparation with docker tabs, Deployment with serve script, Functional Verification with curl example, Accuracy Evaluation, Performance). Fill in model-specific details: HF path, hardware requirements, TP size, max-model-len, served-model-name, sample curl, and accuracy table.
  • Update docs/source/tutorials/models/index.md to include the new tutorial.
  • Confirm test config YAML and tutorial doc are included in the staged files.
  • Commit code changes once (single signed commit).
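The test-config step above can be sketched as a builder for the structure named in this step. The exact schema must be copied from an existing file under tests/e2e/models/configs/; the field layout here is an assumption based only on the fields this step lists (model_name, hardware, tasks with accuracy metrics, num_fewshot).

```python
def build_test_config(model_name: str, hardware: str,
                      task_metrics: dict[str, float], num_fewshot: int) -> dict:
    """Assemble a test-config dict; serialize to YAML to match existing configs."""
    return {
        "model_name": model_name,
        "hardware": hardware,
        "num_fewshot": num_fewshot,
        "tasks": [
            # Metric name "acc" is a placeholder; use the metric the evaluation emits.
            {"name": task, "metrics": [{"name": "acc", "value": value}]}
            for task, value in task_metrics.items()
        ],
    }
```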

8) Prepare handoff artifacts

  • Write comprehensive Chinese analysis report.
  • Write compact Chinese runbook for server startup and validation commands.
  • Include feature status matrix (supported / unsupported / checkpoint-missing / not-applicable).
  • Include dummy-vs-real validation matrix and explicit non-equivalence notes.
  • Include changed-file list, key logs, and final commit hash.
  • Post the SKILL.md content (or a link to it) as a comment on the originating GitHub issue to document the AI-assisted workflow.

Quality gate before final answer

  • Service starts successfully from /workspace with direct command.
  • OpenAI-compatible inference request succeeds (not startup-only).
  • Key feature set is attempted and reported: ACLGraph / EP / flashcomm1 / MTP / multimodal.
  • Capacity baseline (128k + bs16) result is reported, or explicit reason why not feasible.
  • Dummy stage evidence is present (if used), and real-weight stage evidence is present (mandatory).
  • Test config YAML exists at tests/e2e/models/configs/<ModelName>.yaml and follows the established schema (model_name, hardware, tasks, num_fewshot).
  • Tutorial doc exists at docs/source/tutorials/models/<ModelName>.md and follows the standard template (Introduction, Supported Features, Environment Preparation, Deployment, Functional Verification, Accuracy Evaluation, Performance).
  • Tutorial index at docs/source/tutorials/models/index.md includes the new model entry.
  • Exactly one signed commit contains all code changes in current working repo.
  • Final response includes commit hash, file paths, key commands, known limits, and failure reasons where applicable.
