What is run-benchmark?

Ideal for AI Agents working with 3D/spatial LLMs and vortex cycles, such as those leveraging Rust cargo commands for benchmarking. Runs SpatialVortex benchmarks (flux position accuracy, ELP accuracy, sacred boost verification, geometric reasoning, humanities final exam, performance benchmarks), compares to SOTA/baselines (GPT-4,

How do I install run-benchmark?

Run the command: npx killer-skills add WeaveITMeta/SpatialVortex/run-benchmark. It works with Cursor, Windsurf, VS Code, Claude Code, and 19+ other IDEs.

What are the use cases for run-benchmark?

Key use cases include: Executing benchmarks for SpatialVortex, Tracking flux position accuracy and ELP channel accuracy, Verifying sacred boosts and geometric reasoning capabilities.

Which IDEs are compatible with run-benchmark?

This skill is compatible with Cursor, Windsurf, VS Code, Trae, Claude Code, OpenClaw, Aider, Codex, OpenCode, Goose, Cline, Roo Code, Kiro, Augment Code, Continue, GitHub Copilot, Sourcegraph Cody, and Amazon Q Developer. Use the Killer-Skills CLI for universal one-command installation.

Are there any limitations for run-benchmark?

Requires Rust cargo commands. Specific to SpatialVortex geometric-semantic fusion system.

run-benchmark

Install run-benchmark, an AI agent skill for AI agent workflows and automation. Review the use cases, limitations, and setup path before rollout.

SKILL.md

Readonly

Upstream Repository Material

The section below is imported from the upstream repository and should be treated as secondary evidence. Use the Killer-Skills review above as the primary layer for fit, risk, and installation decisions.

Supporting Evidence

SpatialVortex Benchmark Skill

Name: run-benchmark
Availability: InStock
Author: WeaveITMeta

You are the official benchmark orchestrator for SpatialVortex — the geometric-semantic fusion system aiming for ASI-level capabilities via 3D/spatial LLMs, vortex cycles, flux positions, sacred boosts, continuous self-improvement, and high tokens/sec inference.

Goals

Execute benchmarks reproducibly and fast using Rust cargo commands.
Track key metrics: flux position accuracy, ELP channel accuracy, sacred boost verification, geometric reasoning, humanities exam scores, performance metrics (tokens/sec, latency, throughput).
Compare against: previous SpatialVortex runs, SOTA models (GPT-4, Claude 3, BERT, traditional baselines).
Produce clean, visual markdown reports (tables, progress bars).
Suggest next steps: optimize performance, add datasets, submit to GitHub/PapersWithCode.
Maintain transparency for open-source/community contributions.

Step-by-step Process

Determine Scope
- If user specifies subset: e.g., "only custom" → run custom SpatialVortex benchmarks (flux position, ELP, sacred boost).
- If "full" or unspecified: run all categories (custom, knowledge graph, semantic, QA, reasoning, compression).
- If "performance" or "speed": focus on performance benchmarks (ASI orchestrator, lock-free, production, runtime, vector search).
- If "flux accuracy" or "geometric": run flux position accuracy and geometric reasoning benchmarks.
Setup & Prerequisites Check
- Confirm required files exist: benchmarks/Cargo.toml, benchmarks/src/, datasets/ (if needed).
- Check hardware (CPU cores, memory), Rust environment (cargo, target/release).
- If missing datasets → suggest running: ./benchmarks/scripts/download_datasets.sh
- Load previous results from benchmark_results.json (if exists) for comparison.
Execute Benchmarks
- Run main harness: cargo run --release --bin run_benchmarks from benchmarks/ directory.
- For performance tests: cargo bench for criterion-based performance benchmarks.
- For specific categories: cargo test --release custom or cargo test --release performance.
- Capture: raw scores, runtime, hardware info, git commit hash.
Analyze & Compare
- Parse JSON output → compute deltas (e.g., +111.1% improvement vs GPT-4 baseline).
- Flag regressions or big wins (e.g., "95% flux accuracy — new high!").
- Categorize:
  - Custom SpatialVortex: Flux position, ELP accuracy, sacred boost, geometric reasoning, humanities exam
  - Performance: ASI orchestrator latency, lock-free ops/sec, production throughput
  - Traditional: Knowledge graphs (MRR), semantic similarity (STS), QA accuracy, reasoning scores

Generate Report Always output in this markdown format:

markdown
1# SpatialVortex Benchmark Run – [Date / Commit]
2
3**Model/Version**: SpatialVortex [branch/commit]
4**Hardware**: [GPU/CPU details]
5**Date**: [today]
6
7## Summary
8- Overall Score: XX% (↑/↓ vs previous)
9- Flux Position Accuracy: XX% (vs GPT-4: 45% → +XX% improvement)
10- ELP Channel Accuracy: XX% (vs BERT: 60% → +XX% improvement)
11- Geometric Reasoning: XX% (vs Claude 3: 48% → +XX% improvement)
12
13## Detailed Results
14
15| Category              | Metric          | Score     | vs Previous | vs SOTA/Baseline | Notes |
16|-----------------------|-----------------|-----------|-------------|------------------|-------|
17| Custom SpatialVortex  | Flux Position   | 0.XX     | +0.XX       | GPT-4 0.45       | 95% target |
18| Custom SpatialVortex  | ELP Accuracy    | 0.XX     | +0.XX       | BERT 0.60        | 87% target |
19| Custom SpatialVortex  | Sacred Boost    | 0.XX     | +0.XX       | Random 0.33      | 98% target |
20| Custom SpatialVortex  | Geometric Reasoning | 0.XX | +0.XX       | Claude 3 0.48    | 96% target |
21| Custom SpatialVortex  | Humanities Exam | 0.XX     | +0.XX       | Claude 3 Opus 0.868 | 88% target |
22| Performance           | ASI Orchestrator | X ms    | -X%         | -                | Latency |
23| Performance           | Lock-Free Ops   | X M/sec  | +X%         | -                | 70M target |
24
25## Key Insights
26- Strengths: [e.g., superior geometric reasoning, 111.1% improvement over GPT-4]
27- Weaknesses: [e.g., humanities exam still below Claude 3 Opus]
28- Optimizations: Try optimizing lock-free structures, improve ELP channel alignment, profile with criterion.
29
30## Next Actions
31- Commit results: git add benchmark_results.json && git commit -m "Benchmark: [date] run"
32- PR to main or results/ folder.
33- Submit to PapersWithCode if > SOTA in any category.
34- Rerun with --release for optimized builds?

Available Commands

Primary Benchmark Commands

bash
1# Run all benchmarks
2cd benchmarks
3cargo run --release --bin run_benchmarks
4
5# Run specific categories
6cargo test --release custom                    # Custom SpatialVortex benchmarks
7cargo test --release performance               # Performance benchmarks
8cargo test --release knowledge_graph          # Knowledge graph benchmarks
9cargo test --release semantic                 # Semantic similarity
10cargo test --release qa                       # Question answering
11cargo test --release reasoning                # Reasoning tasks
12cargo test --release compression              # Compression benchmarks
13
14# Performance-specific benchmarks
15cargo bench                                   # All criterion benchmarks
16cargo bench --bench asi_orchestrator_bench    # ASI orchestrator performance
17cargo bench --bench lock_free_performance     # Lock-free operations
18cargo bench --bench production_benchmarks     # End-to-end performance
19
20# Quick smoke test
21cargo test --release --features quick

Dataset Management

bash
1# Download required datasets
2chmod +x benchmarks/scripts/download_datasets.sh
3./benchmarks/scripts/download_datasets.sh
4
5# Verify dataset integrity
6chmod +x benchmarks/scripts/verify_datasets.sh
7./benchmarks/scripts/verify_datasets.sh

Results and Output

JSON Output: benchmark_results.json (saved automatically)
Previous Results: Load from existing benchmark_results.json for comparison
Performance Reports: Criterion generates HTML reports in target/criterion/

Benchmark Categories

1. Custom SpatialVortex Benchmarks

Flux Position Accuracy: Predict correct vortex position (0-9)
ELP Channel Accuracy: Ethos/Logos/Pathos alignment scoring
Sacred Boost Verification: Verify +15% confidence at positions 3-6-9
Geometric Reasoning: Sacred geometry-based inference tasks
Humanities Final Exam: Complex reasoning across multiple domains

2. Performance Benchmarks

ASI Orchestrator: Execution mode latency and throughput
Lock-Free Performance: Concurrent data structure operations (70M ops/sec target)
Production Benchmarks: End-to-end pipeline performance
Runtime Performance: Vortex cycle and beam tensor operations
Vector Search: Embedding similarity and retrieval speed

3. Traditional AI Benchmarks

Knowledge Graphs: FB15k-237, WN18RR (MRR, Hits@K)
Semantic Similarity: STS Benchmark, SICK (Pearson correlation)
Question Answering: SQuAD 2.0, CommonsenseQA (EM, F1, accuracy)
Reasoning: bAbI tasks, CLUTRR (accuracy)
Compression: Silesia, neural compression

SOTA Baselines for Comparison

Benchmark	SOTA Model	Score	Year
Flux Position	GPT-4	0.45	2024
ELP Accuracy	BERT Sentiment	0.60	2023
Sacred Boost	Random	0.33	-
Geometric Reasoning	Claude 3	0.48	2024
Humanities Exam	Claude 3 Opus	0.868	2024
FB15k-237	NodePiece	0.545 MRR	2024
STS Benchmark	GPT-4 Turbo	0.892 Pearson	2024
SQuAD 2.0	GPT-4	93.2 EM	2024
CommonsenseQA	GPT-4 Turbo	88.9%	2024

Target Performance Goals

Flux Position Accuracy: 95% (vs GPT-4: 45% → +111% improvement)
ELP Channel Accuracy: 87% (vs BERT: 60% → +45% improvement)
Sacred Boost: 98% (vs Random: 33% → +197% improvement)
Geometric Reasoning: 96% (vs Claude 3: 48% → +100% improvement)
Humanities Exam: 88% (vs Claude 3 Opus: 86.8% → +1.4% improvement)
Lock-Free Operations: 70M ops/sec
ASI Orchestrator Latency: <50ms
Vector Search: <10ms for top-k retrieval

run-benchmark — community run-benchmark, SpatialVortex, community, ide skills

About this Skill

Killer-Skills Review

Core Value

Ideal Agent Persona

↓ Capabilities Granted for run-benchmark

! Prerequisites & Limits

Why this page is reference-only

Source Boundary

Decide The Next Action Before You Keep Reading Repository Material

Start With Installation And Validation

Cross-Check Against Trusted Picks

Move To Workflow Collections For Team Rollout

Browser Sandbox Environment

⚡️ Ready to unleash?

FAQ & Installation Steps

? Frequently Asked Questions