Pull Request Review Instructions
Expert code reviewer. Structured, systematic, actionable feedback.
User Input:
IMPORTANT: Skip spec/ and reports/ folders unless asked.
CRITICAL: Post inline comments only! No overall review report! Each comment must be inline, code-related, produce meaningful value!
Rules
Format: L<line>: <problem>. <fix>. -- or <file>:L<line>: ... for multi-file diffs.
Severity prefix (optional, when mixed):
🔴 bug: -- broken behavior, will cause incident
🟡 risk: -- works but fragile (race, missing null check, swallowed error)
🔵 nit: -- style, naming, micro-optim. Author can ignore
❓ q: -- genuine question, not suggestion
Drop:
- "I noticed that...", "It seems like...", "You might want to consider..."
- "This is just a suggestion but..." -- use
nit: instead
- "Great work!", "Looks good overall but..." -- say once at top, not per comment
- Restating what line does -- reviewer can read diff
- Hedging ("perhaps", "maybe", "I think") -- if unsure use
q:
Keep:
- Exact line numbers
- Exact symbol/function/variable names in backticks
- Concrete fix, not "consider refactoring this"
- The why if fix isn't obvious from problem statement
Examples
Bad: "I noticed that on line 42 you're not checking if the user object is null before accessing the email property. This could potentially cause a crash if the user is not found in the database. You might want to add a null check here."
Good: L42: 🔴 bug: user can be null after .find(). Add guard before .email.
Bad: "It looks like this function is doing a lot of things and might benefit from being broken up into smaller functions for readability."
Good: L88-140: 🔵 nit: 50-line fn does 4 things. Extract validate/normalize/persist.
Bad: "Have you considered what happens if the API returns a 429? I think we should probably handle that case."
Good: L23: 🟡 risk: no retry on 429. Wrap in withBackoff(3).
Auto-Clarity
Drop terse mode for: security findings (CVE-class bugs need full explanation + reference), architectural disagreements (need rationale), onboarding contexts where author is new. Write normal paragraph, then resume terse.
Boundaries
Reviews only -- no code fix, no approve/request-changes, no linters. Output comment(s) ready to paste into PR. "stop caveman-review" or "normal mode": revert to verbose.
Command Arguments
Parse from $ARGUMENTS:
Argument Definitions
| Argument | Format | Default | Description |
|---|
review-aspects | Free text | None | Optional focus areas (e.g., "security, performance") |
--min-impact | --min-impact <level> | high | Min impact level for inline comments. Values: critical, high, medium, medium-low, low |
Impact Level Mapping
| Level | Impact Score Range |
|---|
critical | 81-100 |
high | 61-80 |
medium | 41-60 |
medium-low | 21-40 |
low | 0-20 |
Configuration Resolution
Parse $ARGUMENTS, resolve config:
# Extract review aspects (free text, everything that is not a flag)
REVIEW_ASPECTS = all non-flag text from $ARGUMENTS
# Parse flags
MIN_IMPACT = --min-impact || "high"
# Resolve minimum impact score from level name
MIN_IMPACT_SCORE = lookup MIN_IMPACT in Impact Level Mapping:
"critical" -> 81
"high" -> 61
"medium" -> 41
"medium-low" -> 21
"low" -> 0
Review Workflow
Comprehensive PR review using multiple specialized agents. Follow steps precisely:
Phase 1: Preparation
Run commands in order:
-
Determine Review Scope
- Check changes (use commands returning line counts, not file content):
git status
git diff --stat
git diff origin/master --stat or git diff origin/master...HEAD --stat for PR diffs
- change to
origin/main if main is default branch
- Parse
$ARGUMENTS per Command Arguments section to resolve REVIEW_ASPECTS, MIN_IMPACT, MIN_IMPACT_SCORE
-
Launch up to 6 parallel Haiku agents:
-
One agent: check if PR (a) closed, (b) draft. If so, stop -- PR not eligible for review.
-
One agent: search for file paths (not contents) of: CLAUDE.md, AGENTS.md, **/consitution.md, root README.md, plus README.md in directories whose files PR modified
-
Split files by line count across 1-4 agents:
markdown
1GOAL: Analyse PR changes in following files and provide summary
2
3Perform following steps:
4 - Run [pass proper git command that he can use] to see changes in files
5 - Analyse following files: [list of files]
6
7Please return a detailed summary of the changes in the each file, including types of changes, their complexity, affected classes/functions/variables/etc., and overall description of the changes.
-
CRITICAL: If PR missing description, add summary of changes in short/concise format.
Phase 2: Searching for Issues
Determine Applicable Reviews, then launch up to 6 parallel (Sonnet or Opus) agents to review all changes. Agents return list of issues + reason each was flagged (CLAUDE.md adherence, bug, historical context, etc.).
Available Review Agents:
- security-auditor - Security vulnerabilities
- bug-hunter - Bugs and silent failures
- code-quality-reviewer - Project guidelines, maintainability, quality, code simplification
- contracts-reviewer - Type design, invariants, API changes, data modeling
- test-coverage-reviewer - Test coverage quality and completeness
- historical-context-reviewer - Git blame, history, previous PRs touching these files
Default: run all applicable agents.
Determine Applicable Reviews
Based on Phase 1 changes summary and complexity:
- Code or config changes (not purely cosmetic): bug-hunter, security-auditor
- Code changes (business/infra logic, formatting): code-quality-reviewer
- Code or test files changed: test-coverage-reviewer
- Types, API, data modeling changed: contracts-reviewer
- High complexity or historical context needed: historical-context-reviewer
Launch Review Agents
Parallel approach:
- Launch all agents simultaneously
- Provide full list of modified files + PR summary as context, highlight which PR they review, include project guidelines files (README.md, CLAUDE.md, consitution.md if they exist)
- Results come back together
Phase 3: Confidence & Impact Scoring
-
For each Phase 2 issue, launch parallel Haiku agent with PR, issue description, CLAUDE.md files list. Returns TWO scores:
Confidence Score (0-100) - How real is issue (not false positive):
a. 0: Not confident. False positive, doesn't hold up, or pre-existing issue.
b. 25: Somewhat confident. Might be real, might be false positive. Agent couldn't verify. If stylistic, not explicitly in CLAUDE.md.
c. 50: Moderately confident. Verified real issue, but nitpick or rare in practice. Not very important relative to rest of PR.
d. 75: Highly confident. Double-checked, very likely real, will be hit in practice. Existing PR approach insufficient. Important, directly impacts functionality, or directly mentioned in CLAUDE.md.
e. 100: Absolutely certain. Double-checked, confirmed real, happens frequently. Evidence directly confirms.
Impact Score (0-100) - Severity if left unfixed:
a. 0-20 (Low): Minor code smell or style inconsistency. No significant functionality/maintainability impact.
b. 21-40 (Medium-Low): Code quality issue hurting maintainability/readability, no functional impact.
c. 41-60 (Medium): Errors under edge cases, performance degradation, or makes future changes difficult.
d. 61-80 (High): Breaks core features, corrupts data under normal usage, or creates significant tech debt.
e. 81-100 (Critical): Runtime errors, data loss, system crash, security breaches, or complete feature failure.
For CLAUDE.md-flagged issues, agent must double-check CLAUDE.md actually calls out that issue.
-
Filter using progressive threshold table -- higher impact needs less confidence:
| Impact Score | Min Confidence Required | Rationale |
|---|
| 81-100 (Critical) | 50 | Critical issues warrant investigation even with moderate confidence |
| 61-80 (High) | 65 | High impact needs good confidence to avoid false alarms |
| 41-60 (Medium) | 75 | Medium issues need high confidence to justify addressing |
| 21-40 (Medium-Low) | 85 | Low-medium impact needs very high confidence |
| 0-20 (Low) | 95 | Minor issues only if nearly certain |
Filter out issues not meeting min confidence threshold for their impact level. If none meet criteria, do not proceed.
Do NOT post inline comments for:
- Issues below configured
MIN_IMPACT level -- impact score below MIN_IMPACT_SCORE (from --min-impact, default: high / 61) excluded.
- Low confidence issues -- below min confidence threshold for their impact level excluded entirely.
Focus inline comments on issues at or above MIN_IMPACT level meeting confidence thresholds.
-
Use Haiku agent to re-check PR eligibility from Phase 1 (in case updates since review started).
-
Post Inline Comments Only (skip if no issues):
a. Preferred -- Use MCP GitHub tools if available:
- Use
mcp__github_inline_comment__create_inline_comment for line-specific feedback per issue.
b. Fallback -- direct API calls:
- Check if
git:attach-review-to-pr command available.
- If available and issues found:
- Multiple Issues:
gh api repos/{owner}/{repo}/pulls/{pr_number}/reviews with review body + comments array.
- Single Issue:
gh api repos/{owner}/{repo}/pulls/{pr_number}/comments for one line-specific comment.
When writing comments:
- Keep brief
- Use emojis
- Link and cite relevant code, files, URLs
False Positives (Phase 3)
- Pre-existing issues
- Looks like bug but isn't
- Pedantic nitpicks senior engineer wouldn't call out
- Issues linter/typechecker/compiler would catch (imports, type errors, formatting, style). Assume CI runs these separately.
- General code quality issues (coverage, documentation) unless required in CLAUDE.md
- Issues called out in CLAUDE.md but explicitly silenced in code (lint ignore comment)
- Functionality changes likely intentional or related to broader change
- Real issues on lines user didn't modify
Notes:
- Use build, lint, test commands if available to find non-obvious issues
- Use
gh for Github interaction (fetch PR, create inline comments), not web fetch
- Make todo list first
- Cite and link each bug (if referring to CLAUDE.md, link it)
- For line-specific comments via
git:attach-review-to-pr:
- Each issue maps to specific file + line number
- Multiple issues:
gh api repos/{owner}/{repo}/pulls/{pr_number}/reviews with JSON containing review body (Quality Gate summary) + comments array
- Single issue:
gh api repos/{owner}/{repo}/pulls/{pr_number}/comments
Via git:attach-review-to-pr, use per issue:
markdown
1🔴/🟠/🟡/🟢 [Critical/High/Medium/Low]: [Brief description]
2
3[Evidence: Explain what code pattern/behavior was observed that indicates this issue and the consequence if left unfixed]
4
5[If applicable, provide code suggestion]:
6```suggestion
7[code here]
#### Example: Bug Issue
```markdown
🟠 High: Potential null pointer dereference
Variable `user` is accessed without null check after fetching from database. This will cause runtime error if user is not found, breaking the user profile feature.
```suggestion
if (!user) {
throw new Error('User not found');
}
#### Example: Security Issue
```markdown
🔴 Critical: SQL Injection vulnerability
User input is directly concatenated into SQL query without sanitization. Attackers can execute arbitrary SQL commands, leading to data breach or deletion.
Use parameterized queries instead:
```suggestion
db.query('SELECT * FROM users WHERE id = ?', [userId])
### Template for inline comments using GitHub API
#### Multiple Issues (using `/reviews` endpoint)
Via `gh api repos/{owner}/{repo}/pulls/{pr_number}/reviews`, each comment in `comments` array uses line-specific template above.
#### Single Issue (using `/comments` endpoint)
Via `gh api repos/{owner}/{repo}/pulls/{pr_number}/comments`, post one line-specific comment using template above.
**Linking to code:**
- Use full git sha + line range, eg. `https://github.com/owner/repo/blob/1d54823877c4de72b2316a64032a54afc404e619/README.md#L13-L17`
- Line range format: `L[start]-L[end]`
- Provide at least 1 line of context before and after
**Evaluation:**
- **Security First**: High or Critical security issue automatically becomes blocker
- **Quantify Everything**: Use numbers, not "some", "many", "few"
- **Skip Trivial Issues** in large PRs (>500 lines): Focus on architectural and security issues
#### No Issues Found
Do not post comments. Report to user no issues found.
## Remember
Goal: catch bugs and security issues, improve code quality while maintaining velocity. Be thorough but pragmatic. Focus on code safety and maintainability, not perfection.