dev-tpu-ray

v1.0.0

About This Skill

Open-source framework for the research and development of foundation models.

marin-community
Updated: 4/11/2026

Killer-Skills Review

Decision support comes first. Repository text comes second.

Reference-Only Page Review Score: 1/11

This page remains useful for operators, but Killer-Skills treats it as reference material instead of a primary organic landing page.

  • Review Score: 1/11
  • Quality Score: 38
  • Canonical Locale: en
  • Detected Body Locale: en


Core Value

Open-source framework for the research and development of foundation models.

Suitable Agent Types

Suitable for operator workflows that need explicit guardrails before installation and execution.

Key Capabilities · dev-tpu-ray

! Usage Limits and Requirements

Why this page is reference-only

  • Current locale does not satisfy the locale-governance contract.
  • The page lacks a strong recommendation layer.
  • The page lacks concrete use-case guidance.
  • The page lacks explicit limitations or caution signals.
  • The underlying skill quality score is below the review floor.

Source Boundary

The section below is supporting source material from the upstream repository. Use the Killer-Skills review above as the primary decision layer.

Lab Demo

Browser Sandbox Environment


Experience this Agent in a zero-setup browser environment powered by WebContainers. No installation required.


FAQ and Installation Steps

The questions and steps below match the page's structured data, making the content easier for search engines to understand.

? FAQ

What is dev-tpu-ray?

Open-source framework for the research and development of foundation models.

How do I install dev-tpu-ray?

Run: npx killer-skills add marin-community/marin/dev-tpu-ray. Supports 19+ IDEs/Agents, including Cursor, Windsurf, VS Code, and Claude Code.

Which IDEs or Agents does dev-tpu-ray support?

This skill is compatible with Cursor, Windsurf, VS Code, Trae, Claude Code, OpenClaw, Aider, Codex, OpenCode, Goose, Cline, Roo Code, Kiro, Augment Code, Continue, GitHub Copilot, Sourcegraph Cody, and Amazon Q Developer. It can be installed universally with a single Killer-Skills CLI command.

Installation Steps

  1. Open a terminal

    Open a terminal or command line in your project directory.

  2. Run the install command

    Run: npx killer-skills add marin-community/marin/dev-tpu-ray. The CLI detects your IDE or AI Agent automatically and completes the configuration.

  3. Start using the skill

    dev-tpu-ray is now enabled and can be invoked immediately in the current project.

! Reference-Page Mode

This page can still serve as an installation and lookup reference, but Killer-Skills no longer treats it as a primary indexable landing page. Read the review conclusions above first, then decide whether to continue with the upstream repository notes.

Imported Repository Instructions


Supporting Evidence

dev-tpu-ray

Install dev-tpu-ray, an AI Agent Skill for AI agent workflows and automation. Supports Claude Code, Cursor, and Windsurf with one-click installation.

SKILL.md

Skill: Legacy Ray Dev TPU

Use this skill only when you specifically need the legacy Ray-backed dev TPU workflow. Prefer .agents/skills/dev-tpu/SKILL.md for the current Iris-backed path.

scripts/ray/dev_tpu.py can reserve a temporary TPU VM, sync the repo, and run commands remotely. It is good for:

  • quick test and benchmark loops,
  • memory debugging,
  • profiling and trace capture,
  • short experiments where you want direct shell access.

It is a bad fit for long unattended experiments or many concurrent TPU commands.

Critical concurrency rule

Run at most one TPU job at a time on a given dev TPU VM. Do not launch concurrent TPU commands from separate shells, tmux panes, or background jobs against the same dev TPU.
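One way to enforce this rule from scripts is to route every TPU command through a single local lock, so a second invocation blocks until the first finishes. This is a minimal sketch, not part of dev_tpu.py; the `run_on_dev_tpu` name and the lock path are illustrative, and it assumes `flock` from util-linux is available:

```bash
#!/usr/bin/env bash
# Serialize TPU commands: flock blocks until the previous holder releases the
# lock, so two shells using this wrapper never hit the dev TPU concurrently.
run_on_dev_tpu() {
  flock "/tmp/dev-tpu-${USER:-agent}.lock" -c "$*"
}

run_on_dev_tpu 'echo "job 1 done"'
run_on_dev_tpu 'echo "job 2 done"'
```

Background jobs and tmux panes that go through the same wrapper queue up instead of colliding.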

Commands

  • allocate: reserve a TPU VM and keep it alive while the command runs. This also writes an SSH alias into ~/.ssh/config.
  • connect: open an interactive shell on the TPU.
  • execute: sync local files to remote ~/marin/ unless --no-sync, then run one command.
  • watch: rsync + restart on local file changes.

Prerequisites

  1. Authenticate to GCP and set up the Marin development environment.

    ```bash
    gcloud auth login
    gcloud config set project hai-gcp-models
    gcloud auth application-default login
    make dev_setup
    ```

  2. Ensure your SSH public key is in project metadata: https://console.cloud.google.com/compute/metadata?resourceTab=sshkeys&project=hai-gcp-models&scopeTab=projectMetadata

Quick Start

Allocate:

```bash
RAY_AUTH_MODE=token uv run scripts/ray/dev_tpu.py \
  --config infra/marin-us-east5-a.yaml \
  allocate
```

Connect interactively:

```bash
RAY_AUTH_MODE=token uv run scripts/ray/dev_tpu.py \
  --config infra/marin-us-east5-a.yaml \
  connect
```

Run one command with sync:

```bash
RAY_AUTH_MODE=token uv run scripts/ray/dev_tpu.py \
  --config infra/marin-us-east5-a.yaml \
  execute -- uv run --package levanter --group test pytest lib/levanter/tests/kernels/test_pallas_fused_cross_entropy_loss.py
```

dev_tpu.py creates an alias for a TPU VM monitored by Ray. By default it uses your username and the config cluster_name to create a name like dev-<cluster_name>-<user>.
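The default naming can be reconstructed locally; this sketch hard-codes a cluster_name that would normally be read from the `--config` YAML, so the value shown is illustrative:

```bash
# Reconstruct the default dev TPU name: dev-<cluster_name>-<user>.
cluster_name="marin-us-east5-a"   # assumed; normally taken from the config file
user="${USER:-alice}"             # falls back to a placeholder if USER is unset
echo "dev-${cluster_name}-${user}"
```

Knowing the derived name is handy when looking for the SSH alias that allocate writes into ~/.ssh/config.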

Stop allocation by pressing Ctrl-C in the terminal that is running allocate.

Agent Usage

Always pass --tpu-name to avoid collisions with other agents.

```bash
export TPU_NAME="${USER}-$(git rev-parse --abbrev-ref HEAD | tr '/' '-')-$(date +%H%M%S)"
```

Then reuse that name for allocate, connect, and execute.
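A slightly hardened variant of the recipe above, sketched under the assumption that the shell may not be inside a git checkout (the `nobranch` fallback and `agent` default are illustrative choices, not project conventions):

```bash
# Build a collision-resistant TPU name: user, sanitized branch name, timestamp.
branch="$(git rev-parse --abbrev-ref HEAD 2>/dev/null || echo nobranch)"
export TPU_NAME="${USER:-agent}-$(printf '%s' "$branch" | tr '/' '-')-$(date +%H%M%S)"
echo "$TPU_NAME"
```

The `tr '/' '-'` step matters because branch names like feature/foo would otherwise produce an invalid or ambiguous TPU name.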

Practical Patterns

Extra environment variables

Use repeatable -e KEY=VALUE with execute:

```bash
RAY_AUTH_MODE=token uv run scripts/ray/dev_tpu.py \
  --config infra/marin-us-east5-a.yaml \
  --tpu-name "$TPU_NAME" \
  execute -e LIBTPU_INIT_ARGS="--xla_tpu_scoped_vmem_limit_kib=50000" -- \
  uv run --package levanter --extra tpu lib/levanter/scripts/bench/bench_moe_hillclimb.py
```

Notes:

  • .levanter.yaml, .marin.yaml, and .config environment entries are injected automatically.
  • execute already wraps the command in bash -c; do not pass your own bash -c.

Fast inner loop

Skip sync with --no-sync when the remote checkout is already current:

```bash
RAY_AUTH_MODE=token uv run scripts/ray/dev_tpu.py \
  --config infra/marin-us-east5-a.yaml \
  --tpu-name "$TPU_NAME" \
  execute --no-sync -- uv run --package levanter --group test pytest lib/levanter/tests/kernels/test_pallas_fused_cross_entropy_loss.py
```

Or SSH directly:

```bash
ssh "dev-tpu-${TPU_NAME}"
cd ~/marin
source ~/.local/bin/env
```

Run remote TPU commands sequentially.

Copy remote artifacts

```bash
scp "dev-tpu-${TPU_NAME}:~/marin/<remote-path>" "<local-path>"
```

Common examples include profiles, traces, logs, and checkpoints. For example:

```bash
mkdir -p ".profiles/${TPU_NAME}"
scp "dev-tpu-${TPU_NAME}:~/marin/.profiles/<run_name>/plugins/profile/*/*" ".profiles/${TPU_NAME}/"
```

Multiple clusters

When using multiple clusters at once, always pass explicit --config and --tpu-name.

Example naming:

  • infra/marin-us-central1.yaml with --tpu-name "${USER}-central1"
  • infra/marin-us-east5-a.yaml with --tpu-name "${USER}-east5"
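Those conventions can be wrapped in a small helper. This dry-run sketch only prints the command it would run; the `dev_tpu` function name is hypothetical, and it derives the TPU name from the full region suffix rather than the shortened names shown above:

```bash
# Print the dev_tpu.py invocation for a given cluster short-name (dry run).
dev_tpu() {
  local region="$1"; shift
  echo RAY_AUTH_MODE=token uv run scripts/ray/dev_tpu.py \
    --config "infra/marin-${region}.yaml" \
    --tpu-name "${USER:-agent}-${region}" "$@"
}

dev_tpu us-central1 allocate
dev_tpu us-east5-a connect
```

Dropping the leading `echo` would turn the dry run into a real wrapper, keeping config and name pinned together so commands never land on the wrong cluster.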

Troubleshooting

Could not infer TPU type from config

Pass --tpu-type explicitly:

```bash
uv run scripts/ray/dev_tpu.py --config <config> allocate --tpu-type v5p-8
```

SSH configuration ... not found

Run allocate first for that --tpu-name, then retry connect or execute.

Verify cleanup after allocate

After finishing work, stop allocation with Ctrl-C in the terminal running allocate.

Recommended verification:

  1. Confirm the allocator exited cleanly.
  2. Confirm no local allocate process is still running for that TPU name.
  3. Confirm the local alias state is cleaned up:

    ```bash
    RAY_AUTH_MODE=token uv run scripts/ray/dev_tpu.py \
      --config <config> \
      --tpu-name <name> execute --no-sync -- /bin/bash -lc 'echo ok'
    ```

Expected result after cleanup: it should fail with SSH configuration ... not found.
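Step 2 of the verification can also be scripted. This is a minimal sketch that assumes the allocate process's command line contains both the script name and the TPU name; the bracketed `[.]` in the pattern is a common trick so the check does not match its own command line:

```bash
# Look for a lingering local allocate process for this TPU name.
name="${TPU_NAME:-example}"
if pgrep -f "dev_tpu[.]py.*${name}" >/dev/null; then
  echo "allocate still running for ${name}"
else
  echo "no allocate process for ${name}"
fi
```

If the first branch fires after you pressed Ctrl-C, the allocator did not exit cleanly and should be investigated before reallocating under the same name.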

TPU busy or stale lockfile

If TPU init fails due to lock contention:

```bash
sudo rm -f /tmp/libtpu_lockfile
sudo lsof -t /dev/vfio/* | xargs -r sudo kill -9
```

Then rerun the command.

execute feels slow

It syncs with rsync before each run by default. Use --no-sync or direct SSH for repeated runs.

Reference Examples

Run tests:

```bash
RAY_AUTH_MODE=token uv run scripts/ray/dev_tpu.py \
  --config infra/marin-us-east5-a.yaml \
  --tpu-name "$TPU_NAME" \
  execute -- uv run --package levanter --group test pytest lib/levanter/tests/kernels/test_pallas_fused_cross_entropy_loss.py
```

Run a benchmark:

```bash
RAY_AUTH_MODE=token uv run scripts/ray/dev_tpu.py \
  --config infra/marin-us-east5-a.yaml \
  --tpu-name "$TPU_NAME" \
  execute -e LIBTPU_INIT_ARGS="--xla_tpu_scoped_vmem_limit_kib=50000" -- \
  uv run --package levanter --extra tpu lib/levanter/scripts/bench/bench_moe_mlp_profile.py
```
