vllm-ascend-serving — for Claude Code

Tags: vllm-ascend-serving, vllm-ascend-workspace, community, ide skills, vllm-ascend, ascend-memory-profiling, machine-management, remote-code-parity, status

v1.0.0

About this skill

Best fit: ideal for AI agents that need vLLM Ascend serving. Summary: manage the lifecycle of a single-node colocated vllm-ascend online service on a workspace-managed, ready remote container. Supports Claude Code, Cursor, and Windsurf workflows.

Features

vLLM Ascend Serving

Use this skill when:
  • the user asks to start / launch / pull up a vllm-ascend service on a managed machine
  • the user asks to restart or relaunch a service (possibly with changed flags or env)
  • the user asks to check if a running service is alive / ready

Author: maoxx241
Updated: 4/10/2026

Skill Overview

Start with fit, limitations, and setup before diving into the repository.


Why use this skill

vllm-ascend-serving helps agents manage the lifecycle of a single-node colocated vllm-ascend online service on a workspace-managed, ready remote container.

Best suited for

AI agents that need vLLM Ascend serving on managed remote machines.

Actionable use cases for vllm-ascend-serving

  • Start / launch / pull up a vllm-ascend service on a managed machine.
  • Restart or relaunch a service (possibly with changed flags or env).
  • Check whether a running service is alive / ready.

Safety & limitations

  • Do not use this skill for machine management, code syncing, benchmarking, or offline inference; those tasks belong to other skills.
  • Other skills (e.g. ascend-memory-profiling) may invoke this one when they need to start a service.
  • status and stop do not require code parity.


FAQ and installation steps

These questions and steps mirror the structured data on this page.

Frequently asked questions

What is vllm-ascend-serving?

vllm-ascend-serving manages the lifecycle of a single-node colocated vllm-ascend online service on a workspace-managed, ready remote container. It supports Claude Code, Cursor, and Windsurf workflows.

How do I install vllm-ascend-serving?

Run: npx killer-skills add maoxx241/vllm-ascend-workspace. It works with Cursor, Windsurf, VS Code, Claude Code, and more than 19 other IDEs.

What can I use vllm-ascend-serving for?

Key use cases: starting / launching a vllm-ascend service on a managed machine, restarting or relaunching a service (possibly with changed flags or env), and checking whether a running service is alive / ready.

Which IDEs are compatible with vllm-ascend-serving?

This skill is compatible with Cursor, Windsurf, VS Code, Trae, Claude Code, OpenClaw, Aider, Codex, OpenCode, Goose, Cline, Roo Code, Kiro, Augment Code, Continue, GitHub Copilot, Sourcegraph Cody, and Amazon Q Developer. Use the killer-skills CLI for a uniform installation.

Are there limitations with vllm-ascend-serving?

Yes. Do not use it for machine management, code syncing, benchmarking, or offline inference; other skills (e.g. ascend-memory-profiling) may invoke it to start a service; and status and stop do not require code parity.

How to install the skill

  1. Open a terminal

    Open your terminal or command line in the project directory.

  2. Run the install command

    Run: npx killer-skills add maoxx241/vllm-ascend-workspace. The CLI detects your IDE or agent automatically and sets up the skill.

  3. Use the skill

    The skill is now active. Your AI agent can use vllm-ascend-serving immediately in the current project.
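
For convenience, step 2 as a single shell command, copied from the steps above:

bash
# Installs the skill; the CLI auto-detects your IDE or agent.
npx killer-skills add maoxx241/vllm-ascend-workspace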

Source Notes

This page remains useful for installation and source reference. Before relying on it, review the fit, limitations, and upstream repository notes above.

Upstream Repository Material

The section below is adapted from the upstream repository. Use it as supporting material alongside the fit, use-case, and installation summary on this page.

Upstream Source

vllm-ascend-serving


SKILL.md (read-only)

vLLM Ascend Serving

Manage the lifecycle of a single-node colocated vllm-ascend online service on a workspace-managed, ready remote container.

This skill takes structured parameters, handles all SSH escaping and remote execution internally, and returns machine-readable JSON. The agent never needs to construct raw shell commands for service management.

Use this skill when

  • the user asks to start / launch / pull up a vllm-ascend service on a managed machine
  • the user asks to restart or relaunch a service (possibly with changed flags or env)
  • the user asks to check if a running service is alive / ready
  • the user asks to stop a running service
  • another skill needs to start a service (e.g. ascend-memory-profiling)

Do not use this skill when

  • the task is adding, verifying, repairing, or removing a machine (use machine-management)
  • the task is syncing code to the remote container (use remote-code-parity)
  • the task is running benchmarks (a separate skill's responsibility)
  • the task is offline inference
  • the machine is not yet ready in inventory

Critical rules

  • start automatically runs remote-code-parity before launching. If parity fails, start is blocked.
  • status and stop do not require parity.
  • All remote execution goes through the scripts — never construct raw SSH commands for serving.
  • Keep local runtime state only under .vaws-local/serving/.
  • Progress on stderr as __VAWS_SERVING_PROGRESS__=<json>, final result on stdout as JSON.
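
A minimal sketch of how a caller might separate the two streams; the machine alias, model path, and output filenames here are illustrative, not part of the skill:

bash
# Final JSON result goes to stdout, progress events to stderr.
python3 .agents/skills/vllm-ascend-serving/scripts/serve_start.py \
  --machine blue-a --model /data/models/Qwen3-32B \
  > result.json 2> progress.log

# Progress events are stderr lines prefixed with __VAWS_SERVING_PROGRESS__=;
# strip the prefix to recover the JSON payloads.
grep '^__VAWS_SERVING_PROGRESS__=' progress.log | sed 's/^[^=]*=//'

# The final result is a single JSON object on stdout.
python3 -c 'import json; print(json.load(open("result.json"))["status"])'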

Cross-platform launcher rule

  • macOS / Linux / WSL: python3 ...
  • Windows: py -3 ...

Public entry points

Start a service

bash
python3 .agents/skills/vllm-ascend-serving/scripts/serve_start.py \
  --machine <alias-or-ip> \
  --model <remote-weight-path> \
  [--served-model-name <name>] \
  [--tp <N>] [--dp <N>] \
  [--devices <0,1,2,3>] \
  [--extra-env KEY=VALUE ...] \
  [--port <N>] \
  [--health-timeout <seconds>] \
  [--wrap-script <remote-path>] \
  [--skip-parity] \
  [-- <extra vllm serve args>]

Launch wrapping (--wrap-script)

The serving skill supports a generic --wrap-script mechanism. When provided, the vLLM launch command is written as _serve.sh in the runtime directory, and the wrapper script is called with two arguments: $1 = serve script path, $2 = runtime directory.

This is used by other skills (e.g. ascend-memory-profiling) to wrap the service launch process without the serving skill needing to know the wrapping details. The serving skill is agnostic to what the wrapper does.

The wrap_script path is recorded in the serving state so downstream skills can detect it.
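
A minimal wrapper sketch under the contract above; the logging line is illustrative, and a real wrapper (e.g. ascend-memory-profiling) would add its own instrumentation instead:

bash
#!/usr/bin/env bash
# Called by the serving skill as: wrapper.sh <serve-script> <runtime-dir>
serve_script="$1"   # $1 = path to the generated _serve.sh
serve_dir="$2"      # $2 = runtime directory

# Illustrative pre-launch hook; profiling or env setup would go here.
echo "wrapping launch of ${serve_script}" >> "${serve_dir}/wrapper.log"

# Hand control to the actual vLLM launch command.
exec bash "${serve_script}"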

Relaunch with previous config

bash
# Exact same config
python3 .agents/skills/vllm-ascend-serving/scripts/serve_start.py \
  --machine <alias> --relaunch

# Add a debug env
python3 .agents/skills/vllm-ascend-serving/scripts/serve_start.py \
  --machine <alias> --relaunch --extra-env VLLM_LOGGING_LEVEL=DEBUG

# Remove an env from previous config
python3 .agents/skills/vllm-ascend-serving/scripts/serve_start.py \
  --machine <alias> --relaunch --unset-env MY_DEBUG_FLAG

# Remove a vllm arg from previous config (use = to avoid argparse ambiguity)
python3 .agents/skills/vllm-ascend-serving/scripts/serve_start.py \
  --machine <alias> --relaunch --unset-args=--enforce-eager

# Relaunch with a different model
python3 .agents/skills/vllm-ascend-serving/scripts/serve_start.py \
  --machine <alias> --relaunch --model /data/models/OtherModel

Probe NPU device availability

bash
python3 .agents/skills/vllm-ascend-serving/scripts/serve_probe_npus.py \
  --machine <alias-or-ip>

Returns which NPU devices are free and which are busy (with PID and HBM details); probing happens on the bare-metal host for cross-container visibility.

Check status

bash
python3 .agents/skills/vllm-ascend-serving/scripts/serve_status.py \
  --machine <alias-or-ip>

Stop

bash
python3 .agents/skills/vllm-ascend-serving/scripts/serve_stop.py \
  --machine <alias-or-ip> [--force]

Local state

Per-machine launch state is stored under .vaws-local/serving/<alias>.json.

This file records the last successful launch parameters (model, tp, devices, env, extra args, port, pid, log paths, runtime_dir, wrap_script). It is the basis for --relaunch and is read by other skills (e.g. ascend-memory-profiling) in attach mode.
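
A sketch of what such a state file might contain; the field list above comes from the source, but the exact key names and values here are assumptions for illustration:

json
{
  "model": "/data/models/Qwen3-32B",
  "served_model_name": "Qwen3-32B",
  "tp": 4,
  "devices": "0,1,2,3",
  "env": {"VLLM_LOGGING_LEVEL": "DEBUG"},
  "extra_args": ["--enforce-eager"],
  "port": 38721,
  "pid": 12345,
  "log_stdout": "/vllm-workspace/.vaws-runtime/serving/.../stdout.log",
  "log_stderr": "/vllm-workspace/.vaws-runtime/serving/.../stderr.log",
  "runtime_dir": "/vllm-workspace/.vaws-runtime/serving/...",
  "wrap_script": null
}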

Workflow

1. Resolve the target machine

The --machine argument is looked up in the local machine inventory. The machine must already be managed and ready.

2. Stop any existing service

If a previous service is recorded for this machine, it is stopped before launching a new one.

3. Run remote-code-parity (start only)

Unless --skip-parity is passed, parity_sync.py is called to ensure the container has the current local code. If parity fails, start is blocked.

4. Probe NPUs

NPU availability is checked via npu-smi info on the bare-metal host (not the container). Host-level probing sees processes from all containers, bypassing PID namespace isolation. Devices with HBM usage above 4 GB are also marked busy to catch cross-container occupancy:

  • If --devices is specified, those devices are verified to be free. If any are busy, start is blocked with the conflict details.
  • If --devices is not specified but --tp is given, the first N free devices are automatically selected, where N = TP × DP (defaults to TP when DP is not set).
  • If NPU probe fails (e.g. driver issue), it is treated as a non-fatal warning and launch continues with user-specified devices.
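
For example, with --tp 4 --dp 2 the skill auto-selects the first 4 × 2 = 8 free devices; the alias and model path below are illustrative:

bash
# No --devices given: the first TP × DP = 8 free NPUs are chosen automatically.
python3 .agents/skills/vllm-ascend-serving/scripts/serve_start.py \
  --machine blue-a --model /data/models/Qwen3-32B --tp 4 --dp 2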

5. Validate and launch

  • Model path is checked for existence on the remote container.
  • A free port is auto-detected (or the explicit --port is used).
  • A bash launch script is built internally with proper escaping — the agent never sees or edits this script.
  • The process is started via nohup + disown and detached from the SSH session.
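
The detach step has roughly the following shape; this is only a sketch, since the real launch script is generated internally and never seen or edited by the agent:

bash
# Launch the generated serve script detached from the SSH session.
nohup bash _serve.sh > stdout.log 2> stderr.log < /dev/null &
disown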

6. Wait for readiness

The script polls /health and /v1/models until both return success or the timeout expires.
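
An equivalent manual readiness probe; the base URL and port are taken from the example result below and are illustrative:

bash
# Both endpoints must succeed before the service is considered ready.
curl -sf http://10.0.0.8:38721/health && \
  curl -sf http://10.0.0.8:38721/v1/models && echo ready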

6a. Diagnose launch failure before any code change

If the service fails during engine initialization or health check timeout:

  • Read both stdout.log and stderr.log from the remote runtime directory — vllm often logs the actual Python exception to stdout, not stderr.
  • Identify the actual exception type and message before hypothesizing a cause.
  • Do not modify source code to work around a launch failure until the root cause is confirmed from logs.
  • If the root cause is unclear, try the simplest launch configuration first (e.g. tp-only, no speculative decoding, no graph mode) and incrementally add features to isolate the failing component.
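
From a shell inside the remote container, the first diagnostic step might look like this; the "..." segment stands for the run-specific runtime directory and is left elided as in the upstream paths:

bash
# vllm often logs the actual Python exception to stdout.log, not stderr.log.
tail -n 200 /vllm-workspace/.vaws-runtime/serving/.../stdout.log
tail -n 200 /vllm-workspace/.vaws-runtime/serving/.../stderr.log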

7. Return structured JSON

On success:

json
{
  "status": "ready",
  "machine": "blue-a",
  "base_url": "http://10.0.0.8:38721",
  "port": 38721,
  "pid": 12345,
  "served_model_name": "Qwen3-32B",
  "model": "/data/models/Qwen3-32B",
  "log_stdout": "/vllm-workspace/.vaws-runtime/serving/.../stdout.log",
  "log_stderr": "/vllm-workspace/.vaws-runtime/serving/.../stderr.log"
}

On failure, includes stderr_tail for diagnosis.
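
A hedged sketch of the failure shape; only the stderr_tail field is confirmed above, and the other fields and values are assumptions:

json
{
  "status": "failed",
  "machine": "blue-a",
  "stderr_tail": "...last lines of stderr.log for diagnosis..."
}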

Reference files

  • .agents/skills/vllm-ascend-serving/references/behavior.md
  • .agents/skills/vllm-ascend-serving/references/command-recipes.md
  • .agents/skills/vllm-ascend-serving/references/acceptance.md

Related skills

Looking for an alternative to vllm-ascend-serving or another community skill for your workflow? Explore these related open-source skills.

openclaw-release-maintainer (openclaw)

Use this skill for release and publish-time workflows. Covers ai, assistant, and crustacean workflows. Supports Claude Code, Cursor, and Windsurf.

widget-generator (f)

Generates customizable widget plugins for the prompts.chat feed system. Covers ai, artificial-intelligence, and awesome-list workflows. Supports Claude Code.

flags (vercel)

Use this skill when adding or changing framework feature flags in Next.js internals. Covers blog, browser, and compiler workflows. Supports Claude Code, Cursor, and Windsurf.

pr-review (pytorch)

Usage modes: if the user invokes /pr-review with no arguments, no review is performed. Covers autograd, deep-learning, and gpu workflows. Supports Claude Code, Cursor, and Windsurf.