run-benchmark — community skill for SpatialVortex

v1.0.0

About this Skill

Ideal for AI Agents working with 3D/spatial LLMs and vortex cycles, such as those leveraging Rust cargo commands for benchmarking. Runs SpatialVortex benchmarks (flux position accuracy, ELP accuracy, sacred boost verification, geometric reasoning, humanities final exam, performance benchmarks) and compares results to SOTA/baselines (GPT-4, Claude 3, BERT, traditional baselines).

WeaveITMeta
Updated: 3/12/2026

Killer-Skills Review

Decision support comes first. Repository text comes second.

Reference-Only Page Review Score: 7/11

This page remains useful for operators, but Killer-Skills treats it as reference material instead of a primary organic landing page.

  • Original recommendation layer
  • Concrete use-case guidance
  • Explicit limitations and caution
  • Locale and body language aligned

Review Score: 7/11
Quality Score: 39
Canonical Locale: en
Detected Body Locale: en


Core Value

Empowers agents to execute reproducible benchmarks for SpatialVortex using Rust cargo commands, tracking key metrics like flux position accuracy, ELP channel accuracy, and geometric reasoning, while supporting high tokens/sec inference for continuous self-improvement.

Ideal Agent Persona

Ideal for AI Agents working with 3D/spatial LLMs and vortex cycles, such as those leveraging Rust cargo commands for benchmarking.

Capabilities Granted for run-benchmark

Executing benchmarks for SpatialVortex
Tracking flux position accuracy and ELP channel accuracy
Verifying sacred boosts and geometric reasoning capabilities

! Prerequisites & Limits

  • Requires Rust cargo commands
  • Specific to SpatialVortex geometric-semantic fusion system
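Before installing, the cargo prerequisite above can be checked with a quick preflight. This is a sketch of ours, not something the skill ships:

```shell
# Preflight for the prerequisite above: is the Rust toolchain on PATH?
# (Our sketch; the skill itself does not ship this check.)
if command -v cargo >/dev/null 2>&1; then
  status=present
else
  status=missing
fi
echo "cargo: $status"
```

If cargo is missing, install the Rust toolchain (e.g. via rustup) before adding the skill.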

Why this page is reference-only

  • The underlying skill quality score is below the review floor.

Source Boundary

The section below is imported from the upstream repository and should be treated as secondary evidence. Use the Killer-Skills review above as the primary layer for fit, risk, and installation decisions.

After The Review

Decide The Next Action Before You Keep Reading Repository Material

Killer-Skills should not stop at opening repository instructions. It should help you decide whether to install this skill, when to cross-check against trusted collections, and when to move into workflow rollout.


FAQ & Installation Steps

These questions and steps mirror the structured data on this page for better search understanding.

? Frequently Asked Questions

What is run-benchmark?

Ideal for AI Agents working with 3D/spatial LLMs and vortex cycles, such as those leveraging Rust cargo commands for benchmarking. Runs SpatialVortex benchmarks (flux position accuracy, ELP accuracy, sacred boost verification, geometric reasoning, humanities final exam, performance benchmarks) and compares results to SOTA/baselines (GPT-4, Claude 3, BERT, traditional baselines).

How do I install run-benchmark?

Run the command: npx killer-skills add WeaveITMeta/SpatialVortex/run-benchmark. It works with Cursor, Windsurf, VS Code, Claude Code, and 19+ other IDEs.

What are the use cases for run-benchmark?

Key use cases include: Executing benchmarks for SpatialVortex, Tracking flux position accuracy and ELP channel accuracy, Verifying sacred boosts and geometric reasoning capabilities.

Which IDEs are compatible with run-benchmark?

This skill is compatible with Cursor, Windsurf, VS Code, Trae, Claude Code, OpenClaw, Aider, Codex, OpenCode, Goose, Cline, Roo Code, Kiro, Augment Code, Continue, GitHub Copilot, Sourcegraph Cody, and Amazon Q Developer. Use the Killer-Skills CLI for universal one-command installation.

Are there any limitations for run-benchmark?

Requires Rust cargo commands. Specific to SpatialVortex geometric-semantic fusion system.

How To Install

  1. Open your terminal

    Open the terminal or command line in your project directory.

  2. Run the install command

    Run: npx killer-skills add WeaveITMeta/SpatialVortex/run-benchmark. The CLI will automatically detect your IDE or AI agent and configure the skill.

  3. Start using the skill

    The skill is now active. Your AI agent can use run-benchmark immediately in the current project.

! Reference-Only Mode

This page remains useful for installation and reference, but Killer-Skills no longer treats it as a primary indexable landing page. Read the review above before relying on the upstream repository instructions.

Upstream Repository Material

The section below is imported from the upstream repository and should be treated as secondary evidence. Use the Killer-Skills review above as the primary layer for fit, risk, and installation decisions.

Upstream Source

run-benchmark

Install run-benchmark, an AI agent skill for AI agent workflows and automation. Review the use cases, limitations, and setup path before rollout.

SKILL.md

SpatialVortex Benchmark Skill

You are the official benchmark orchestrator for SpatialVortex — the geometric-semantic fusion system aiming for ASI-level capabilities via 3D/spatial LLMs, vortex cycles, flux positions, sacred boosts, continuous self-improvement, and high tokens/sec inference.

Goals

  • Execute benchmarks reproducibly and fast using Rust cargo commands.
  • Track key metrics: flux position accuracy, ELP channel accuracy, sacred boost verification, geometric reasoning, humanities exam scores, performance metrics (tokens/sec, latency, throughput).
  • Compare against: previous SpatialVortex runs, SOTA models (GPT-4, Claude 3, BERT, traditional baselines).
  • Produce clean, visual markdown reports (tables, progress bars).
  • Suggest next steps: optimize performance, add datasets, submit to GitHub/PapersWithCode.
  • Maintain transparency for open-source/community contributions.

Step-by-step Process

  1. Determine Scope

    • If user specifies subset: e.g., "only custom" → run custom SpatialVortex benchmarks (flux position, ELP, sacred boost).
    • If "full" or unspecified: run all categories (custom, knowledge graph, semantic, QA, reasoning, compression).
    • If "performance" or "speed": focus on performance benchmarks (ASI orchestrator, lock-free, production, runtime, vector search).
    • If "flux accuracy" or "geometric": run flux position accuracy and geometric reasoning benchmarks.
  2. Setup & Prerequisites Check

    • Confirm required files exist: benchmarks/Cargo.toml, benchmarks/src/, datasets/ (if needed).
    • Check hardware (CPU cores, memory), Rust environment (cargo, target/release).
    • If missing datasets → suggest running: ./benchmarks/scripts/download_datasets.sh
    • Load previous results from benchmark_results.json (if exists) for comparison.
  3. Execute Benchmarks

    • Run main harness: cargo run --release --bin run_benchmarks from benchmarks/ directory.
    • For performance tests: cargo bench for criterion-based performance benchmarks.
    • For specific categories: cargo test --release custom or cargo test --release performance.
    • Capture: raw scores, runtime, hardware info, git commit hash.
  4. Analyze & Compare

    • Parse JSON output → compute deltas (e.g., +111.1% improvement vs GPT-4 baseline).
    • Flag regressions or big wins (e.g., "95% flux accuracy — new high!").
    • Categorize:
      • Custom SpatialVortex: Flux position, ELP accuracy, sacred boost, geometric reasoning, humanities exam
      • Performance: ASI orchestrator latency, lock-free ops/sec, production throughput
      • Traditional: Knowledge graphs (MRR), semantic similarity (STS), QA accuracy, reasoning scores
  5. Generate Report Always output in this markdown format:

    ```markdown
    # SpatialVortex Benchmark Run – [Date / Commit]

    **Model/Version**: SpatialVortex [branch/commit]
    **Hardware**: [GPU/CPU details]
    **Date**: [today]

    ## Summary
    - Overall Score: XX% (↑/↓ vs previous)
    - Flux Position Accuracy: XX% (vs GPT-4: 45% → +XX% improvement)
    - ELP Channel Accuracy: XX% (vs BERT: 60% → +XX% improvement)
    - Geometric Reasoning: XX% (vs Claude 3: 48% → +XX% improvement)

    ## Detailed Results

    | Category             | Metric              | Score   | vs Previous | vs SOTA/Baseline    | Notes      |
    |----------------------|---------------------|---------|-------------|---------------------|------------|
    | Custom SpatialVortex | Flux Position       | 0.XX    | +0.XX       | GPT-4 0.45          | 95% target |
    | Custom SpatialVortex | ELP Accuracy        | 0.XX    | +0.XX       | BERT 0.60           | 87% target |
    | Custom SpatialVortex | Sacred Boost        | 0.XX    | +0.XX       | Random 0.33         | 98% target |
    | Custom SpatialVortex | Geometric Reasoning | 0.XX    | +0.XX       | Claude 3 0.48       | 96% target |
    | Custom SpatialVortex | Humanities Exam     | 0.XX    | +0.XX       | Claude 3 Opus 0.868 | 88% target |
    | Performance          | ASI Orchestrator    | X ms    | -X%         | -                   | Latency    |
    | Performance          | Lock-Free Ops       | X M/sec | +X%         | -                   | 70M target |

    ## Key Insights
    - Strengths: [e.g., superior geometric reasoning, 111.1% improvement over GPT-4]
    - Weaknesses: [e.g., humanities exam still below Claude 3 Opus]
    - Optimizations: Try optimizing lock-free structures, improve ELP channel alignment, profile with criterion.

    ## Next Actions
    - Commit results: git add benchmark_results.json && git commit -m "Benchmark: [date] run"
    - PR to main or results/ folder.
    - Submit to PapersWithCode if > SOTA in any category.
    - Rerun with --release for optimized builds?
    ```
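The scope selection in step 1 can be sketched as a small dispatch wrapper. The mapping follows the bullets above, but the script itself is ours and only echoes the command it would run:

```shell
#!/bin/sh
# Map a user-specified scope to the cargo invocation described in step 1.
# Echoes rather than executes, so the command can be reviewed first.
scope="${1:-full}"
case "$scope" in
  custom)                    cmd="cargo test --release custom" ;;
  performance|speed)         cmd="cargo bench" ;;
  "flux accuracy"|geometric) cmd="cargo test --release custom" ;;
  full|*)                    cmd="cargo run --release --bin run_benchmarks" ;;
esac
echo "scope '$scope' -> $cmd"
```

Run with no argument to get the full suite, or e.g. `sh run_scope.sh performance` for criterion benchmarks only (the script name is illustrative).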

Available Commands

Primary Benchmark Commands

```bash
# Run all benchmarks
cd benchmarks
cargo run --release --bin run_benchmarks

# Run specific categories
cargo test --release custom            # Custom SpatialVortex benchmarks
cargo test --release performance       # Performance benchmarks
cargo test --release knowledge_graph   # Knowledge graph benchmarks
cargo test --release semantic          # Semantic similarity
cargo test --release qa                # Question answering
cargo test --release reasoning         # Reasoning tasks
cargo test --release compression       # Compression benchmarks

# Performance-specific benchmarks
cargo bench                                  # All criterion benchmarks
cargo bench --bench asi_orchestrator_bench   # ASI orchestrator performance
cargo bench --bench lock_free_performance    # Lock-free operations
cargo bench --bench production_benchmarks    # End-to-end performance

# Quick smoke test
cargo test --release --features quick
```

Dataset Management

```bash
# Download required datasets
chmod +x benchmarks/scripts/download_datasets.sh
./benchmarks/scripts/download_datasets.sh

# Verify dataset integrity
chmod +x benchmarks/scripts/verify_datasets.sh
./benchmarks/scripts/verify_datasets.sh
```
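The internals of verify_datasets.sh are not shown here, but an integrity check of this kind typically compares a file's SHA-256 against a pinned value. The sketch below uses an empty placeholder file (whose hash is well known) so it runs anywhere; the path and hash are illustrative:

```shell
# General shape of a dataset integrity check like verify_datasets.sh:
# compare a file's SHA-256 against a pinned value. The "dataset" here is
# an empty placeholder file so the example is self-contained.
expected="e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
: > /tmp/dataset.bin                              # placeholder dataset file
actual=$(sha256sum /tmp/dataset.bin | awk '{print $1}')
if [ "$actual" = "$expected" ]; then
  echo "dataset OK"
else
  echo "dataset corrupt" >&2
fi
```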

Results and Output

  • JSON Output: benchmark_results.json (saved automatically)
  • Previous Results: Load from existing benchmark_results.json for comparison
  • Performance Reports: Criterion generates HTML reports in target/criterion/
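The delta flagging in step 4 of the process can be sketched over two scores pulled from the current run and the previous benchmark_results.json. The `compare` helper name is ours:

```shell
# Flag a delta between the current run and the previous results file,
# as in step 4 of the process above (helper name is ours).
compare() {
  awk -v cur="$1" -v prev="$2" 'BEGIN {
    d = cur - prev
    printf "%+.3f %s\n", d, (d < 0 ? "REGRESSION" : "improvement")
  }'
}
compare 0.95 0.93   # prints "+0.020 improvement"
compare 0.88 0.90   # prints "-0.020 REGRESSION"
```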

Benchmark Categories

1. Custom SpatialVortex Benchmarks

  • Flux Position Accuracy: Predict correct vortex position (0-9)
  • ELP Channel Accuracy: Ethos/Logos/Pathos alignment scoring
  • Sacred Boost Verification: Verify +15% confidence at positions 3-6-9
  • Geometric Reasoning: Sacred geometry-based inference tasks
  • Humanities Final Exam: Complex reasoning across multiple domains
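The sacred-boost rule above can be illustrated as a tiny function: +15% confidence when the flux position is 3, 6, or 9. We read "+15%" as multiplicative here; the authoritative rule lives in the upstream benchmark code.

```shell
# Illustration of the sacred-boost rule: +15% confidence at positions
# 3, 6, and 9 (read as multiplicative; our sketch, not upstream code).
boost() {
  pos="$1"; conf="$2"
  case "$pos" in
    3|6|9) awk -v c="$conf" 'BEGIN { printf "%.2f\n", c * 1.15 }' ;;
    *)     printf '%s\n' "$conf" ;;
  esac
}
boost 6 0.80   # prints "0.92" (boosted)
boost 4 0.80   # prints "0.80" (unchanged)
```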

2. Performance Benchmarks

  • ASI Orchestrator: Execution mode latency and throughput
  • Lock-Free Performance: Concurrent data structure operations (70M ops/sec target)
  • Production Benchmarks: End-to-end pipeline performance
  • Runtime Performance: Vortex cycle and beam tensor operations
  • Vector Search: Embedding similarity and retrieval speed

3. Traditional AI Benchmarks

  • Knowledge Graphs: FB15k-237, WN18RR (MRR, Hits@K)
  • Semantic Similarity: STS Benchmark, SICK (Pearson correlation)
  • Question Answering: SQuAD 2.0, CommonsenseQA (EM, F1, accuracy)
  • Reasoning: bAbI tasks, CLUTRR (accuracy)
  • Compression: Silesia, neural compression

SOTA Baselines for Comparison

| Benchmark           | SOTA Model     | Score         | Year |
|---------------------|----------------|---------------|------|
| Flux Position       | GPT-4          | 0.45          | 2024 |
| ELP Accuracy        | BERT Sentiment | 0.60          | 2023 |
| Sacred Boost        | Random         | 0.33          | -    |
| Geometric Reasoning | Claude 3       | 0.48          | 2024 |
| Humanities Exam     | Claude 3 Opus  | 0.868         | 2024 |
| FB15k-237           | NodePiece      | 0.545 MRR     | 2024 |
| STS Benchmark       | GPT-4 Turbo    | 0.892 Pearson | 2024 |
| SQuAD 2.0           | GPT-4          | 93.2 EM       | 2024 |
| CommonsenseQA       | GPT-4 Turbo    | 88.9%         | 2024 |

Target Performance Goals

  • Flux Position Accuracy: 95% (vs GPT-4: 45% → +111% improvement)
  • ELP Channel Accuracy: 87% (vs BERT: 60% → +45% improvement)
  • Sacred Boost: 98% (vs Random: 33% → +197% improvement)
  • Geometric Reasoning: 96% (vs Claude 3: 48% → +100% improvement)
  • Humanities Exam: 88% (vs Claude 3 Opus: 86.8% → +1.4% improvement)
  • Lock-Free Operations: 70M ops/sec
  • ASI Orchestrator Latency: <50ms
  • Vector Search: <10ms for top-k retrieval
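The improvement figures above are relative gains over the baseline, i.e. (target − baseline) / baseline × 100. A quick check (the helper name is ours):

```shell
# Relative improvement as reported in the goals above:
# (target - baseline) / baseline * 100. Helper name is ours.
improvement() {
  awk -v t="$1" -v b="$2" 'BEGIN { printf "%.1f\n", (t - b) / b * 100 }'
}
improvement 0.95 0.45    # flux position vs GPT-4     -> 111.1
improvement 0.87 0.60    # ELP vs BERT                -> 45.0
improvement 0.98 0.33    # sacred boost vs random     -> 197.0
improvement 0.88 0.868   # humanities vs Claude Opus  -> 1.4
```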
