Priority Review
Refresh the priority signals table for the groomed pool (Backlog + Near Term),
manage active epics, compute the Near Term baseline, and display the comparison
table. The table and Near Term composition are the artifacts; the skill is the
refresh workflow that keeps them current.
Modes
- Default (no flag): Full refresh — all card types, Impact + Severity scoring.
- `--bugs`: Bug cards only. Filters to `story_type: bug` at classification, display, scoring, and ranking. Uses a bug-tuned classification prompt that adds `blast_radius` and `workaround_exists` fields. Only Severity scoring (no Impact). Typically fits in a single delegate batch.
Constraints
- Classification model: Use `claude-sonnet-4-6` for agent classification. Bake-off showed haiku gets counts right but misclassifies `evidence_type`; sonnet and opus are equivalent on judgment, and sonnet is cheaper.
- Subagents are classifiers, not summarizers. They return structured JSON. Review `evidence_type` assignments if any look wrong; sonnet is good but not perfect.
- Spot-check at least 3 classifications before importing. Verify `intercom_conversations` counts against actual card descriptions. If any error is found, double the sample size before merging, and repeat until a full sample passes clean (see the sketch at the end of this constraints list).
- Incremental by default. Only re-classify cards whose `updated_at` has changed since the last classification. Use `--force` flags when a full refresh is needed.
- No mutations (default mode). This skill reads Shortcut cards and writes to the local PostgreSQL `priority_signals` table. In `--bugs` mode, Step 6 also sets the Shortcut Severity custom field (routed through `execute_approved`). The Near Term refresh step (Step 7) moves cards between Backlog and Near Term (also routed through `execute_approved`).
- Batch files go in the project directory (`.tmp-classify/`), not `/tmp`. Subagents cannot access `/tmp`.
- Data scope: The skill refreshes BOTH Backlog and Near Term states, since
the Near Term sort needs DIC scores for the full groomed pool.
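The spot-check procedure above is mechanical enough to sketch. A minimal sketch, assuming classifications are a list of dicts and `verify` is a hypothetical callback you implement against the actual card descriptions:

```python
import random

def spot_check(classifications: list[dict], verify, start_n: int = 3) -> None:
    """Sample classifications and verify each against its card description.

    Any error doubles the sample size; repeat until a full sample passes
    clean. `verify` is a hypothetical check of one classification.
    """
    n = start_n
    while True:
        sample = random.sample(classifications, min(n, len(classifications)))
        if all(verify(c) for c in sample):
            return  # full sample passed clean; safe to import
        n *= 2      # error found: fix the bad rows, double the sample, retry
```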
Steps
0. Active Epics Review
Phase boundary — emit TodoWrite before starting:
TodoWrite: content="[priority-review:step-0] Active epics review + preflight", status="in_progress"
The `[priority-review:step-0]` prefix in `content` is the phase signal. The model-switching hook extracts it via regex and runs the preflight/classification phase (steps 0–5) on a lighter model.
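The phase signal is plain text, so the hook's extraction is simple. A minimal sketch, assuming the hook receives the TodoWrite `content` string (the hook's real implementation may differ):

```python
import re

# Matches prefixes like "[priority-review:step-0]" or "[priority-review:step-6]"
PHASE_RE = re.compile(r"\[priority-review:step-(\w+)\]")

def extract_phase(content: str) -> str | None:
    m = PHASE_RE.search(content)
    return m.group(1) if m else None

assert extract_phase('[priority-review:step-0] Active epics review') == '0'
```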
Runs first, before any classification or scoring, since epic membership is the
primary Near Term criterion and changes affect everything downstream.
- Display current active epics:

```bash
python3 box/active-epics.py list
```
- Fetch all epics from the Shortcut API and show any not in the active set that have cards in progress or were created in the last 30 days:

```bash
SHORTCUT_API_TOKEN=$(grep '^SHORTCUT_API_TOKEN=' /Users/paulyokota/Dev/FeedForward/.env | cut -d= -f2-)
curl -s -H "Shortcut-Token: $SHORTCUT_API_TOKEN" \
  "https://api.app.shortcut.com/api/v3/epics" | python3 -c "
import json, sys
from datetime import datetime, timedelta, timezone
cutoff = datetime.now(timezone.utc) - timedelta(days=30)
for e in json.load(sys.stdin):
    if e.get('archived'): continue
    created = datetime.fromisoformat(e['created_at'].replace('Z', '+00:00'))
    if e.get('state') == 'in progress' or created > cutoff:
        print(f'ID: {e[\"id\"]} | {e[\"name\"]} | State: {e[\"state\"]} | Created: {e[\"created_at\"][:10]}')
"
```
- Ask: "Any epics to add or remove?" Wait for explicit confirmation before proceeding.
- Apply changes:

```bash
python3 box/active-epics.py add <ID> "Epic Name"
python3 box/active-epics.py remove <ID>
```
0a. Preflight: reconcile live board with DB
Before refreshing anything, check for cards in the DB that have moved out of
the groomed states since the last run. Run for BOTH states:
- Fetch live card IDs from Shortcut for each groomed state:

```bash
python3 box/shortcut-cards.py --state "Backlog" --summary
python3 box/shortcut-cards.py --state "Near Term" --summary
```
- Query the DB for card IDs currently stored with groomed states:

```bash
psql postgresql://localhost:5432/feedforward -t -A \
  -c "SELECT card_id, state FROM priority_signals WHERE state IN ('Backlog', 'Near Term')"
```
- Diff the two sets (see the sketch at the end of this step). Cards in the DB but not on the live board have moved. For each moved card, re-extract to update its state in the DB:

```bash
python3 box/priority-signals.py extract --id <comma-separated-moved-ids>
```

Report what moved and where (the extract updates the state field).
- Cards on the live board but not in the DB are new; they'll be picked up by the extract in Step 1.
This prevents stale cards from appearing in the signals table or framework ranking. Skip this step only if running with `--force` on a fresh DB.
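A minimal sketch of the diff itself, with illustrative IDs standing in for the parsed outputs of the two commands above:

```python
live_ids = {101, 102, 103}   # parsed from shortcut-cards.py --summary (both states)
db_ids = {101, 102, 104}     # parsed from the psql query

moved = db_ids - live_ids    # {104}: left the groomed states; re-extract
new = live_ids - db_ids      # {103}: new card; Step 1's extract picks it up
if moved:
    print("python3 box/priority-signals.py extract --id " +
          ",".join(map(str, sorted(moved))))
```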
1. Extract priority signals
Run for BOTH groomed states (extract upserts by `card_id`, no conflicts):

```bash
python3 box/priority-signals.py extract --state "Backlog"
python3 box/priority-signals.py extract --state "Near Term"
```

This updates board position, state, product area, feature area, users, and dates. Incremental: it skips cards whose `updated_at` hasn't changed.
2. Check what needs classification
```bash
python3 box/priority-signals.py classify --state "Backlog" > /tmp/ff-$AGENTERMINAL_SESSION_ID/classify-backlog.json
python3 box/priority-signals.py classify --state "Near Term" > /tmp/ff-$AGENTERMINAL_SESSION_ID/classify-nearterm.json
```
Merge the two outputs into a single classify file (see the merge sketch after this list). The classify command outputs card descriptions as JSON for cards that either:
- Have never been classified
- Have been updated in Shortcut since last classification
If the count is 0, skip to Step 5.
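A minimal merge sketch, assuming each classify output has a top-level `cards` list (as the `--bugs` filter below implies) and an illustrative session path in place of the real `/tmp/ff-$AGENTERMINAL_SESSION_ID` directory:

```python
import json

paths = ["/tmp/ff-SESSION/classify-backlog.json",
         "/tmp/ff-SESSION/classify-nearterm.json"]  # substitute the real session dir
merged = {"cards": []}
for p in paths:
    with open(p) as f:
        merged["cards"].extend(json.load(f)["cards"])  # states don't overlap

with open("/tmp/ff-SESSION/classify-merged.json", "w") as f:
    json.dump(merged, f)
```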
`--bugs` mode: Filter the classify output to bug cards only before delegating:

```python
bugs = [c for c in data['cards'] if c.get('story_type') == 'bug']
```
3. Delegate classification to sonnet agents
Split the classify output into batches of ~16 cards (see the sketch below). Delegate each batch to `agenterminal.delegate` with `model: "claude-sonnet-4-6"` and `timeout_ms: 600000` (10 minutes). Classification delegates timed out at 300s in practice; 600s is the tested floor.
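A minimal batching sketch, assuming the merged classify file from Step 2; batch files go in `.tmp-classify/` per the Constraints (file names are illustrative):

```python
import json
from pathlib import Path

BATCH_SIZE = 16
batch_dir = Path(".tmp-classify")
batch_dir.mkdir(exist_ok=True)

cards = json.loads(Path("/tmp/ff-SESSION/classify-merged.json").read_text())["cards"]
for i in range(0, len(cards), BATCH_SIZE):
    batch_file = batch_dir / f"batch-{i // BATCH_SIZE}.json"
    batch_file.write_text(json.dumps({"cards": cards[i:i + BATCH_SIZE]}))
    # one agenterminal.delegate call per batch_file, referenced as {BATCH_FILE}
```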
The classification prompt for each batch:
You are classifying Shortcut card descriptions for priority signal extraction.
Read each card description and return structured classification data.
The cards are in {BATCH_FILE}. Read this file, then for each card extract:
1. intercom_conversations: Total count of Intercom conversations referenced.
Count BOTH linked conversations AND stated totals (e.g. "31 conversations").
Use the stated total when available. Return 0 if no Intercom evidence.
2. failure_volume_weekly: For bug cards, the headline weekly failure volume.
If sub-categories are broken down, report the headline total, NOT the sum
of sub-breakdowns. Return null for non-bug cards.
3. has_revenue_signal: true if evidence mentions cancellations, refunds, users
leaving, declining to subscribe. Polite requests without leaving signals = false.
4. evidence_type: One of:
- "direct_customer_pain" — Intercom conversations show real users affected
- "internal_metric" — evidence from internal dashboards, not customer reports
- "speculative" — no evidence of current customer impact
- "mixed" — combination of customer and internal evidence
- "implementation_step" — sub-task of a larger initiative
5. sentiment: "high" | "medium" | "low" | "none"
6. notes: One sentence on the most important thing the numbers don't convey.
Return ONLY valid JSON:
{"classifications": [{"card_id": N, ...}, ...]}
Dispatch all batches in parallel and collect the results.
`--bugs` mode classification prompt (replaces the above):
You are classifying Shortcut bug card descriptions for priority signal extraction.
Read the file {BATCH_FILE}, then for each card extract:
1. intercom_conversations: Total count of Intercom conversations referenced.
Count BOTH linked conversations AND stated totals (e.g. "31 conversations").
Use the stated total when available. Return 0 if no Intercom evidence.
2. failure_volume_weekly: The headline weekly failure volume from the card.
If sub-categories are broken down, report the headline total, NOT the sum
of sub-breakdowns. Return null if no failure volume is stated.
3. has_revenue_signal: true if evidence mentions cancellations, refunds, users
leaving, declining to subscribe. Polite requests without leaving signals = false.
4. evidence_type: One of:
- "direct_customer_pain" — Intercom conversations show real users affected
- "internal_metric" — evidence from internal dashboards, not customer reports
- "speculative" — no evidence of current customer impact
- "mixed" — combination of customer and internal evidence
5. blast_radius: One sentence. Who is affected and how broadly? Include any
stated user counts, percentages, or frequency data from the card.
6. workaround_exists: true | false | "partial". Is there a user-accessible
workaround described or implied?
7. notes: One sentence on the most important thing the numbers don't convey.
Return ONLY valid JSON:
{"classifications": [{"card_id": N, "intercom_conversations": N, "failure_volume_weekly": N|null, "has_revenue_signal": bool, "evidence_type": "...", "blast_radius": "...", "workaround_exists": "...", "notes": "..."}, ...]}
Bug cards are typically fewer than 20 — use a single delegate unless the
count exceeds 16.
4. Import classification results
Merge all batch results into a single JSON file (see the sketch below), then:

```bash
python3 box/priority-signals.py import-classifications /tmp/ff-$AGENTERMINAL_SESSION_ID/results.json
```
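A minimal sketch of the result merge, assuming each delegate's JSON reply was saved to its own file (the `results-*.json` names are illustrative) with the `classifications` shape from the prompt:

```python
import json
from pathlib import Path

merged = {"classifications": []}
for path in sorted(Path(".tmp-classify").glob("results-*.json")):
    merged["classifications"].extend(json.loads(path.read_text())["classifications"])

# substitute the real session path for results.json
Path("/tmp/ff-SESSION/results.json").write_text(json.dumps(merged))
```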
5. Display the table
```bash
python3 box/priority-signals.py show --state "Near Term"
python3 box/priority-signals.py show --state "Backlog"
```
The Near Term table is the primary deliverable; the Backlog table is shown for context. Both are sorted by board position and show all priority signals in a scannable comparison format.
`--bugs` mode: Filter the output to bug rows only:

```bash
python3 box/priority-signals.py show --state "Near Term" 2>&1 | grep -E "bug|Rank|----"
```
6. Score unscored cards (default, skippable)
Phase boundary — emit TodoWrite before starting:
TodoWrite: content="[priority-review:step-6] Scoring unscored cards", status="in_progress"
The `[priority-review:step-6]` prefix in `content` is the phase signal. The model-switching hook extracts it via regex and upgrades to the higher model for judgment-heavy scoring and ranking work.
Run `gaps` to find cards missing judgment scores:

```bash
python3 box/framework-rank.py gaps
```
If there are gaps, propose scores and apply them. If the user says to skip
this step, go directly to Step 7 — mispriority will run on scored cards only.
Scoring rules:
- Features need an Impact score (1-10). "How much would this move the
needle if shipped?" Informed by revenue potential, retention effect,
strategic value. This is product judgment, not mechanical.
- Bugs need a Severity score (1-5):
- 5 = Financial harm / data loss / security
- 4 = Feature broken / workflow blocked
- 3 = Degraded experience / workaround exists
- 2 = Cosmetic / misleading display
- 1 = Edge case / negligible
Process:
- Read the classification data (D*C partial scores, `evidence_type`, notes) from the `gaps` output.
- For each unscored card, propose an Impact or Severity score with a
one-line rationale. Use the card description and classification notes —
don't infer from titles alone.
- Present the full batch of proposed scores to the user for review.
The user may adjust individual scores or approve the batch.
- Apply approved scores:
```bash
python3 box/framework-rank.py score SC-NNN --impact N    # features
python3 box/framework-rank.py score SC-NNN --severity N  # bugs
```
If the batch is large (>10 cards), split into groups by product area for easier review. Don't score infra-track cards (chores, `implementation_step`); they are capacity-allocated, not DIC-ranked.
`--bugs` mode: Skip Impact scoring entirely. Only propose Severity scores for unscored bugs:

```bash
python3 box/framework-rank.py gaps 2>&1 | grep -A 100 "Bugs needing Severity"
```
`--bugs` mode: Shortcut Severity field sync.
After applying DIC severity scores, sync the Shortcut Severity custom field for any bugs that were just scored. The mapping is Shortcut Sev = 5 - DIC Sev (see reference/severity-framework.md for level definitions and field IDs); a sketch of the mapping appears at the end of this step.
- For each bug that was just scored, include the proposed Shortcut Severity
value in the same approval table (e.g., "DIC 4 -> Shortcut Sev 1 (Blocked)").
- After approval, set the Shortcut Severity field via the API. Route through
execute_approved (production mutation).
- Verify ALL updated cards via independent GET, not just a sample. When N mutations go through a single `execute_approved`, read back all N. Compose the read-back loop alongside the mutation script so both go through approval together.
This keeps the two representations in sync: one assessment, applied to both
framework-rank.py (DIC Sev) and the Shortcut custom field (Sev 0-4).
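The mapping is small but easy to invert by accident; a minimal sketch:

```python
def shortcut_severity(dic_sev: int) -> int:
    """Shortcut Sev = 5 - DIC Sev. DIC runs 1-5 (higher = worse);
    the Shortcut field runs 0-4 (lower = worse)."""
    if not 1 <= dic_sev <= 5:
        raise ValueError(f"DIC Severity out of range: {dic_sev}")
    return 5 - dic_sev

assert shortcut_severity(4) == 1  # "DIC 4 -> Shortcut Sev 1 (Blocked)"
```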
7. Near Term Refresh
Compute the baseline Near Term set and move cards between Backlog and Near
Term as needed. This is the step that keeps the Near Term pool reflecting
the active epics + top-DIC-ranked standalone cards.
7a. Reconcile manual changes:
Fetch current Near Term card IDs from Shortcut. Load the last computed baseline from the `near_term_baseline` table (most recent `run_id` batch).
- First run: If no prior baseline exists (empty table), skip reconciliation.
All current Near Term cards are treated as the initial computed set.
- Cards in Near Term but NOT in last baseline and NOT already pinned ->
candidate manual pin. Present to the user for confirmation.
- Cards in last baseline but now in Backlog (not Build/Test/Released) and
NOT already excluded -> candidate manual exclude. Present for confirmation.
- Cards that moved out of the groomed pool entirely are ignored (not manual
removals).
- Auto-clear any override whose card is no longer in the groomed pool (archived, moved to Build/Test/Released). Set `cleared_at=NOW(), cleared_by='auto'`. Report auto-cleared overrides for awareness.
Write confirmed overrides to `near_term_overrides`. A set-logic sketch of the reconciliation follows.
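A minimal sketch with illustrative IDs standing in for the fetched sets:

```python
near_term = {201, 202, 203}      # live Near Term card IDs
backlog = {204, 206}             # live Backlog card IDs
baseline = {202, 203, 204}       # last computed baseline (near_term_baseline)
pinned, excluded = set(), set()  # active overrides (near_term_overrides)

# In Near Term, not in last baseline, not already pinned -> candidate pin
candidate_pins = near_term - baseline - pinned           # {201}
# In last baseline, now in Backlog, not already excluded -> candidate exclude
candidate_excludes = (baseline & backlog) - excluded     # {204}
# Baseline cards gone from the groomed pool entirely -> ignored; overrides
# on such cards are what the auto-clear rule above removes
left_pool = baseline - near_term - backlog               # set()
```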
7b. Compute new baseline:
```bash
python3 box/near-term-sort.py
```
The sort applies this precedence per card (sketched after the baseline rules below):
- Card in active epic -> Near Term (overrides excludes)
- Card has active exclude override -> Backlog
- Card has active pin override -> Near Term
- Card qualifies by baseline rules (top-N DIC) -> Near Term
- Otherwise -> Backlog
Baseline rules for standalone cards (not in active epics, not blocked):
- Top 5 DIC-ranked bugs
- Top 1 DIC-ranked bug per Shortcut severity level (critical/high/medium/low)
- Top 5 DIC-ranked chores
- Top 5 DIC-ranked features
- Tie-break: board position (lower = higher), then card age (older first)
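A minimal sketch of the precedence as a first-match-wins function; the upper-case sets and `qualifies_by_baseline` are hypothetical stand-ins for the real lookups:

```python
ACTIVE_EPICS: set[int] = {42}        # from active-epics.py
PIN_OVERRIDES: set[int] = set()      # active pin overrides
EXCLUDE_OVERRIDES: set[int] = set()  # active exclude overrides

def qualifies_by_baseline(card: dict) -> bool:
    return card.get("dic_rank", 99) <= 5  # stand-in for the top-N rules

def target_state(card: dict) -> str:
    if card.get("epic_id") in ACTIVE_EPICS:  # active epic overrides excludes
        return "Near Term"
    if card["id"] in EXCLUDE_OVERRIDES:
        return "Backlog"
    if card["id"] in PIN_OVERRIDES:
        return "Near Term"
    if qualifies_by_baseline(card):
        return "Near Term"
    return "Backlog"
```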
7c. Show proposed moves:
Present:
- Cards entering Near Term (with reason: epic, top-5 bug, severity rep, etc.)
- Cards leaving Near Term (with reason: no longer qualifies, not pinned)
- Current active overrides for review/cleanup
Wait for user approval before executing moves.
7d. Execute moves via `execute_approved`:
Move approved cards between Backlog and Near Term.
7e. Write new baseline:
Insert the computed baseline into `near_term_baseline` with a new `run_id`. Old baselines remain for audit; only the most recent is used for reconciliation.
7f. Report final Near Term composition.
8. Framework ranking comparison
```bash
python3 box/framework-rank.py mispriority -n 20
```
Compares framework rank (by DIC score) against live board position in Near Term. Positive gap = the board has the card lower than the framework thinks it should be (underprioritized); see the sign-convention sketch below. Only includes scored cards: unscored cards from a skipped Step 6 will not appear.
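A one-line sketch of the sign convention, as read from the description above:

```python
def mispriority_gap(board_rank: int, framework_rank: int) -> int:
    """Positive = the board has the card lower (bigger rank number) than
    its DIC-based framework rank says it should be: underprioritized."""
    return board_rank - framework_rank

assert mispriority_gap(board_rank=9, framework_rank=2) == 7  # underprioritized
```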
`--bugs` mode: Use the filtered rank output instead:

```bash
python3 box/framework-rank.py rank --state "Near Term" 2>&1 | grep "BUG"
```
9. Confirm adequacy before cleanup
Before cleaning up temp files or declaring the run complete, ask the human whether the output is useful and whether any flags need investigation. The mispriority table may have analytical gaps (e.g., blocked cards inflating underprioritized flags) that only the human can identify.
Adapting to other states
The default targets the groomed pool (Backlog + Near Term). The classification and scoring workflow works for any single state by replacing the `--state` flags. The Near Term refresh step (Step 7) only runs in default mode.