vllm-ascend-serving — a Claude Code skill from the vllm-ascend-workspace community for managing single-node colocated vLLM Ascend serving, with SSH escaping and remote execution handled internally and machine-readable JSON output.

v1.0.0

About This Skill

This skill manages the lifecycle of a single-node colocated vLLM Ascend online service. It handles SSH escaping and remote execution internally and returns machine-readable JSON, making it well suited to AI agents that need seamless integration with vLLM Ascend services.

Features

Start services with an automatic remote-code-parity sync
Restart services with changed flags or environment variables
Check service status using JSON output
Stop running services securely
Manage services on macOS, Linux, WSL, and Windows platforms

Core Topics

maoxx241
Updated: 4/10/2026

Killer-Skills Review

Decision support comes first. Repository text comes second.

Reference-Only Page Review Score: 8/11

This page remains useful for operators, but Killer-Skills treats it as reference material instead of a primary organic landing page.

  • Original recommendation layer
  • Concrete use-case guidance
  • Explicit limitations and caution

Review Score: 8/11
Quality Score: 36
Canonical Locale: en
Detected Body Locale: en


Core Value

Empowers agents to manage the lifecycle of single-node colocated vLLM Ascend online services over SSH, returning machine-readable JSON and supporting features such as remote-code-parity syncing and the wrap-script mechanism, driven by Python 3 scripts.

Suitable Agent Types

Well suited to AI agents that need seamless integration with vLLM Ascend services; SSH escaping and remote execution are handled internally.

Key Capabilities · vllm-ascend-serving

Automating vLLM Ascend service launches with structured parameters
Restarting services with changed flags or environment variables
Checking the status of running services for aliveness and readiness
Stopping running services securely
Integrating with other skills like ascend-memory-profiling for advanced functionality

! Usage Limitations and Requirements

  • Requires a ready remote container and a managed machine
  • Dependent on remote-code-parity for start operations
  • Limited to online service management, excluding tasks like machine management, code syncing, and offline inference

Why this page is reference-only

  • The current locale does not satisfy the locale-governance contract.
  • The underlying skill quality score is below the review floor.

Source Boundary

The section below is supporting source material from the upstream repository. Use the Killer-Skills review above as the primary decision layer.


FAQ and Installation Steps

The questions and steps below match the page's structured data so search engines can understand the page content.

? FAQ

What is vllm-ascend-serving?

It manages the lifecycle of a single-node colocated vLLM Ascend online service, handling SSH escaping and remote execution internally and returning machine-readable JSON for seamless agent integration.

How do I install vllm-ascend-serving?

Run: npx killer-skills add maoxx241/vllm-ascend-workspace/vllm-ascend-serving. It supports 19+ IDEs and agents, including Cursor, Windsurf, VS Code, and Claude Code.

Which scenarios suit vllm-ascend-serving?

Typical scenarios include: automating vLLM Ascend service launches with structured parameters; restarting services with changed flags or environment variables; checking running services for aliveness and readiness; stopping running services securely; and integrating with other skills such as ascend-memory-profiling for advanced functionality.

Which IDEs or agents does vllm-ascend-serving support?

The skill is compatible with Cursor, Windsurf, VS Code, Trae, Claude Code, OpenClaw, Aider, Codex, OpenCode, Goose, Cline, Roo Code, Kiro, Augment Code, Continue, GitHub Copilot, Sourcegraph Cody, and Amazon Q Developer. The Killer-Skills CLI installs it anywhere with a single command.

What are vllm-ascend-serving's limitations?

It requires a ready remote container and a managed machine; start operations depend on remote-code-parity; and it is limited to online service management, excluding machine management, code syncing, and offline inference.

Installation Steps

  1. Open a terminal

    Open a terminal or command line in your project directory.

  2. Run the install command

    Run: npx killer-skills add maoxx241/vllm-ascend-workspace/vllm-ascend-serving. The CLI detects your IDE or AI agent automatically and completes the configuration.

  3. Start using the skill

    vllm-ascend-serving is now enabled and can be invoked in the current project immediately.

! Reference-Page Mode

This page remains usable as an installation and lookup reference, but Killer-Skills no longer treats it as a primary indexable landing page. Read the review conclusions above before deciding whether to consult the upstream repository documentation.

Imported Repository Instructions

The section below is supporting source material from the upstream repository. Use the Killer-Skills review above as the primary decision layer.

Supporting Evidence

vllm-ascend-serving

Manage vLLM Ascend services with this AI agent skill, designed for developers to streamline service lifecycle management and improve productivity.

SKILL.md

vLLM Ascend Serving

Manage the lifecycle of a single-node colocated vllm-ascend online service on a workspace-managed ready remote container.

This skill takes structured parameters, handles all SSH escaping and remote execution internally, and returns machine-readable JSON. The agent never needs to construct raw shell commands for service management.

Use this skill when

  • the user asks to start / launch / pull up a vllm-ascend service on a managed machine
  • the user asks to restart or relaunch a service (possibly with changed flags or env)
  • the user asks to check if a running service is alive / ready
  • the user asks to stop a running service
  • another skill needs to start a service (e.g. ascend-memory-profiling)

Do not use this skill when

  • the task is adding, verifying, repairing, or removing a machine (use machine-management)
  • the task is syncing code to the remote container (use remote-code-parity)
  • the task is running benchmarks (a separate skill's responsibility)
  • the task is offline inference
  • the machine is not yet ready in inventory

Critical rules

  • start automatically runs remote-code-parity before launching. If parity fails, start is blocked.
  • status and stop do not require parity.
  • All remote execution goes through the scripts — never construct raw SSH commands for serving.
  • Keep local runtime state only under .vaws-local/serving/.
  • Progress on stderr as __VAWS_SERVING_PROGRESS__=<json>, final result on stdout as JSON.
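The stream protocol above can be consumed with a few lines of Python. This is a minimal sketch; the payload field (`stage`) is illustrative, not part of the documented contract:

```python
import json

PREFIX = "__VAWS_SERVING_PROGRESS__="

def parse_streams(stderr_text: str, stdout_text: str):
    """Split progress events (stderr) from the final JSON result (stdout)."""
    events = []
    for line in stderr_text.splitlines():
        if line.startswith(PREFIX):
            # Everything after the marker is a JSON payload.
            events.append(json.loads(line[len(PREFIX):]))
    result = json.loads(stdout_text)  # final machine-readable result
    return events, result
```

An agent can surface `events` as live status while waiting for `result`.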

Cross-platform launcher rule

  • macOS / Linux / WSL: python3 ...
  • Windows: py -3 ...

Public entry points

Start a service

```bash
python3 .agents/skills/vllm-ascend-serving/scripts/serve_start.py \
  --machine <alias-or-ip> \
  --model <remote-weight-path> \
  [--served-model-name <name>] \
  [--tp <N>] [--dp <N>] \
  [--devices <0,1,2,3>] \
  [--extra-env KEY=VALUE ...] \
  [--port <N>] \
  [--health-timeout <seconds>] \
  [--wrap-script <remote-path>] \
  [--skip-parity] \
  [-- <extra vllm serve args>]
```

Launch wrapping (--wrap-script)

The serving skill supports a generic --wrap-script mechanism. When provided, the vLLM launch command is written as _serve.sh in the runtime directory, and the wrapper script is called with two arguments: $1 = serve script path, $2 = runtime directory.

This is used by other skills (e.g. ascend-memory-profiling) to wrap the service launch process without the serving skill needing to know the wrapping details. The serving skill is agnostic to what the wrapper does.

The wrap_script path is recorded in the serving state so downstream skills can detect it.
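A wrapper might be sketched as follows. This is hypothetical, not the mechanism ascend-memory-profiling actually uses; only the two-argument contract ($1 = serve script, $2 = runtime directory) comes from the doc, and the `wrapper.log` name is invented:

```python
import os

def build_wrapped_launch(serve_script: str, runtime_dir: str):
    """Record the wrap inside the runtime directory, then return the
    command a real wrapper would hand off to, e.g. via os.execvp(cmd[0], cmd)."""
    with open(os.path.join(runtime_dir, "wrapper.log"), "a") as f:
        f.write(f"wrapping {serve_script}\n")
    return ["bash", serve_script]
```

Because the serving skill is agnostic to the wrapper's behavior, anything from profiler injection to environment setup can happen before the hand-off.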

Relaunch with previous config

```bash
# Exact same config
python3 .agents/skills/vllm-ascend-serving/scripts/serve_start.py \
  --machine <alias> --relaunch

# Add a debug env
python3 .agents/skills/vllm-ascend-serving/scripts/serve_start.py \
  --machine <alias> --relaunch --extra-env VLLM_LOGGING_LEVEL=DEBUG

# Remove an env from previous config
python3 .agents/skills/vllm-ascend-serving/scripts/serve_start.py \
  --machine <alias> --relaunch --unset-env MY_DEBUG_FLAG

# Remove a vllm arg from previous config (use = to avoid argparse ambiguity)
python3 .agents/skills/vllm-ascend-serving/scripts/serve_start.py \
  --machine <alias> --relaunch --unset-args=--enforce-eager

# Relaunch with a different model
python3 .agents/skills/vllm-ascend-serving/scripts/serve_start.py \
  --machine <alias> --relaunch --model /data/models/OtherModel
```

Probe NPU device availability

```bash
python3 .agents/skills/vllm-ascend-serving/scripts/serve_probe_npus.py \
  --machine <alias-or-ip>
```

Returns which NPU devices are free and which are busy (with PID and HBM details), probed on the bare-metal host for cross-container visibility.

Check status

```bash
python3 .agents/skills/vllm-ascend-serving/scripts/serve_status.py \
  --machine <alias-or-ip>
```

Stop

```bash
python3 .agents/skills/vllm-ascend-serving/scripts/serve_stop.py \
  --machine <alias-or-ip> [--force]
```

Local state

Per-machine launch state is stored under .vaws-local/serving/<alias>.json.

This file records the last successful launch parameters (model, tp, devices, env, extra args, port, pid, log paths, runtime_dir, wrap_script). It is the basis for --relaunch and is read by other skills (e.g. ascend-memory-profiling) in attach mode.

Workflow

1. Resolve the target machine

The --machine argument is looked up in the local machine inventory. The machine must already be managed and ready.

2. Stop any existing service

If a previous service is recorded for this machine, it is stopped before launching a new one.

3. Run remote-code-parity (start only)

Unless --skip-parity is passed, parity_sync.py is called to ensure the container has the current local code. If parity fails, start is blocked.

4. Probe NPUs

NPU availability is checked via npu-smi info on the bare-metal host (not the container). Host-level probing sees processes from all containers, bypassing PID namespace isolation. Devices with HBM usage above 4 GB are also marked busy to catch cross-container occupancy:

  • If --devices is specified, those devices are verified to be free. If any are busy, start is blocked with the conflict details.
  • If --devices is not specified but --tp is given, the first N free devices are automatically selected, where N = TP × DP (defaults to TP when DP is not set).
  • If NPU probe fails (e.g. driver issue), it is treated as a non-fatal warning and launch continues with user-specified devices.
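The device-selection rules above can be sketched as a small function. This illustrates the documented behavior only and is not the skill's implementation:

```python
def select_devices(free, tp, dp=None, requested=None):
    """Pick NPU devices per the probe rules.

    free: device ids the host probe reported free.
    requested: explicit --devices list, or None for auto-selection.
    """
    if requested is not None:
        busy = [d for d in requested if d not in free]
        if busy:
            # Start is blocked with the conflict details.
            raise RuntimeError(f"requested devices busy: {busy}")
        return list(requested)
    n = tp * (dp or 1)  # N = TP x DP; N defaults to TP when DP is not set
    if len(free) < n:
        raise RuntimeError(f"need {n} free devices, only {len(free)} available")
    return sorted(free)[:n]  # first N free devices
```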

5. Validate and launch

  • Model path is checked for existence on the remote container.
  • A free port is auto-detected (or the explicit --port is used).
  • A bash launch script is built internally with proper escaping — the agent never sees or edits this script.
  • The process is started via nohup + disown and detached from the SSH session.
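Free-port auto-detection is commonly done by binding to port 0 and letting the kernel choose. A local sketch of the technique (the real skill performs this step on the remote container, not locally):

```python
import socket

def find_free_port() -> int:
    """Ask the OS for a currently unused TCP port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("", 0))  # port 0 = kernel picks an ephemeral port
        return s.getsockname()[1]
```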

6. Wait for readiness

The script polls /health and /v1/models until both return success or the timeout expires.
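The readiness loop can be sketched with an injectable probe, keeping the transport (urllib, curl over SSH, ...) out of the picture; this is illustrative, not the script's actual code:

```python
import time

def wait_ready(probe, timeout=600.0, interval=2.0,
               clock=time.monotonic, sleep=time.sleep):
    """Poll until both endpoints answer, or the timeout expires.

    probe(path) -> True on HTTP success for that path; clock/sleep are
    injectable so the loop can be tested without real waiting."""
    deadline = clock() + timeout
    while clock() < deadline:
        if probe("/health") and probe("/v1/models"):
            return True
        sleep(interval)
    return False
```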

6a. Diagnose launch failure before any code change

If the service fails during engine initialization or health check timeout:

  • Read both stdout.log and stderr.log from the remote runtime directory — vllm often logs the actual Python exception to stdout, not stderr.
  • Identify the actual exception type and message before hypothesizing a cause.
  • Do not modify source code to work around a launch failure until the root cause is confirmed from logs.
  • If the root cause is unclear, try the simplest launch configuration first (e.g. tp-only, no speculative decoding, no graph mode) and incrementally add features to isolate the failing component.

7. Return structured JSON

On success:

```json
{
  "status": "ready",
  "machine": "blue-a",
  "base_url": "http://10.0.0.8:38721",
  "port": 38721,
  "pid": 12345,
  "served_model_name": "Qwen3-32B",
  "model": "/data/models/Qwen3-32B",
  "log_stdout": "/vllm-workspace/.vaws-runtime/serving/.../stdout.log",
  "log_stderr": "/vllm-workspace/.vaws-runtime/serving/.../stderr.log"
}
```

On failure, includes stderr_tail for diagnosis.
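An agent consuming that result might summarize it like this sketch; the success fields match the JSON above, while the failure side assumes only that `status` is not "ready" and that `stderr_tail` may be present:

```python
import json

def summarize_result(raw: str) -> str:
    """Turn the skill's stdout JSON into a one-line agent-facing summary."""
    r = json.loads(raw)
    if r.get("status") == "ready":
        return f'{r["served_model_name"]} ready at {r["base_url"]} (pid {r["pid"]})'
    # On failure the skill includes stderr_tail for diagnosis.
    tail = r.get("stderr_tail", "")
    return f'launch failed: {tail.splitlines()[-1] if tail else "no stderr captured"}'
```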

Reference files

  • .agents/skills/vllm-ascend-serving/references/behavior.md
  • .agents/skills/vllm-ascend-serving/references/command-recipes.md
  • .agents/skills/vllm-ascend-serving/references/acceptance.md

Related Skills

Looking for an alternative to vllm-ascend-serving, or similar community skills to pair with it? Explore the related open-source skills below.

View All

openclaw-release-maintainer (openclaw) — Your own personal AI assistant. Any OS. Any platform. The lobster way. 🦞

widget-generator (f) — Generates customizable plugin widgets for the prompts.chat feedback system.

flags (vercel) — React framework.

pr-review (pytorch) — Tensors and dynamic neural networks in Python with strong GPU acceleration.