The hope is the bug
A review instruction sitting in a sibling file does nothing if the turn ends before anyone runs it. And an agent that just produced a wall of confident work is the worst possible judge of whether that work holds up. Packs that ship a reviewer as prose and hope the agent reads it are betting on exactly the wrong reader.
reasonable-agents is a Claude Code plugin that moves the review out of the agent's good intentions and into the harness. Two hooks do it.
Two hooks
*-expert subagent runs
↓
H4 · pairing
paired *-reviewer hasn't run this session
→ inject a blocking directive to run it
↓
reviewer runs
↓
H6 · output validation
output vs per-persona schema: required tables + a verdict keyword
miss → re-invoke directive naming the specific missing element
(capped by a per-session circuit breaker)
↓
review on the record (both hooks fail open) H4 — pairing
After an *-expert subagent runs, if its paired *-reviewer
exists and hasn't run this session, H4 injects a directive to run it before the turn
proceeds. The dedup matches the structured subagent_type field, not a
name mentioned in prose — so a reviewer merely talked about doesn't count as run.
H6 — output validation
After a reviewer runs, H6 checks its output against a per-persona schema: the required tables plus a verdict keyword. Miss one and it injects a re-invoke directive that names the specific thing missing, capped by a per-session circuit breaker so the loop can't run away.
Both hooks fail open. A missing schema, empty output, a parse error, a missing dependency: exit cleanly, stay out of the way. Enforcement defaults on — a switch that fails to propagate can't silently turn it off.
The skeptic caught its own bugs, three times
Not a thought experiment. The review earned its keep on the plugin's own construction — three times, on bugs the code itself never revealed:
- Bare-string dedup. The first dedup matched a reviewer's name as a quoted
string anywhere in the transcript, so a reviewer merely mentioned in prose
counted as "already run" and silently suppressed the directive. Fixed to match the
structured
subagent_typefield. - Kill-switch cascade. An early kill-switch used a positional default that could read as "off" on an empty value — enforcement disabled by a propagation glitch. Fixed to normalize and explicit-set-match, default-enforce.
- A vacuous schema. Keying a reviewer's compatibility table on the word "Breaking" would have passed vacuously, because that word shows up in a neighboring table's cells. The schema keys on "Change" now, with a test proving the empty case still blocks.
None got caught by writing the code. All three got caught by running the skeptic against the plan. That is the product, run on itself.
Honest by design
The one line the whole project is built on:
Does the hook force the review?
No — and the package says so. Compliance with an injected directive is model-mediated, not mechanical. The hook makes the review fire; it can't make the model obey. So reasonable-agents publishes a measured compliance rate instead of calling the review 'forced' — the live pairing demo ran 10 times, the expert engaged in 9, and the model ran the reviewer in all 9.
That honesty is the point, not a disclaimer bolted on. On a tool whose whole pitch is trustworthy review, saying exactly what it can and can't guarantee is the credibility.
The starter pack
An empty harness can't be fairly evaluated, so the plugin ships six born-clean expert + reviewer + validator triples — authored fresh from public domain knowledge, not scrubbed from anyone's config:
- software-architecture-review
- agent-and-prompt-design
- api-contract-design
- security-threat-modeling
- test-strategy-design
- boss-fight-design
Each was dogfooded through the plugin's own chain before shipping: a clean headless session runs the expert, H4 fires, the reviewer runs, and H6 validates its natural output against its own schema. It passed first try, every time. What that proves is process — briefing-grounded, adversarially paired, schema-checked — not that the expert is correct.
Install
Requires Claude Code 2.1.109 or newer.
/plugin marketplace add ReasonEquals/reasonable-agents
/plugin install reasonable-agents@reasonable-agents
/plugin list
Turn it off any time with the REASONABLE_AGENTS_DISABLE environment
variable. You can also add it from a local clone — the full quickstart, fixture
demo, and per-hook kill-switches are in the
README.
Status & scope
Community marketplace: submitted
This release is the enforcement chain (v0.1), the starter pack, and the receipts. For now, install from the repo above; the community-marketplace listing is pending.
You can author your own expert + reviewer + validator triad today — that's what the enforcement chain is for, and it's how all six starter triples were built. What's a later phase is automated minting (generating a triad from a researched briefing on demand), plus the eval scaffolds — both gated on whether this release earns real use.
What "validated" means, and what it doesn't. Validated here is
process-validated — briefing-grounded, adversarially paired, schema-checked. It is
not a domain-accuracy benchmark and not a hard gate. Reviewer lookup is user-scope
only (~/.claude/agents/), dedup is best-effort across sessions, and the
hooks fail open by design — they reduce missed reviews, they don't guarantee them.
All of it is written down in the repo's
CLAIMS.md
and
SECURITY.md.