A balanced fixture
The checked-in governance suite contains 1,200 unique actions: 400 expected allow, 400 expected warn, and 400 expected block.
Run the suite from the command line; add --json for JSON output:
termyte bench
termyte bench --json
The current checked-in result is 1,200 correct decisions, zero false-safe results, and zero overblocks. Per-decision precision and recall are reported alongside a confusion matrix and category coverage.
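To make the reported metrics concrete, here is a minimal sketch of how per-decision precision and recall can be derived from a confusion matrix of (expected, actual) pairs. It is not Termyte's implementation. The function names are hypothetical, and the definitions of "false-safe" (a decision less strict than expected) and "overblock" (a decision stricter than expected) are assumptions based on an allow < warn < block ordering, which the text above does not spell out.

```python
from collections import Counter

DECISIONS = ("allow", "warn", "block")

# Assumption: decisions are ordered by strictness, allow < warn < block.
STRICTNESS = {"allow": 0, "warn": 1, "block": 2}


def confusion_matrix(pairs):
    """Count (expected, actual) pairs; missing cells read as 0."""
    return Counter(pairs)


def precision_recall(matrix, decision):
    """Return (precision, recall) for one decision class."""
    true_positive = matrix[(decision, decision)]
    predicted = sum(matrix[(expected, decision)] for expected in DECISIONS)
    labeled = sum(matrix[(decision, actual)] for actual in DECISIONS)
    precision = true_positive / predicted if predicted else 0.0
    recall = true_positive / labeled if labeled else 0.0
    return precision, recall


def false_safe_and_overblock(matrix):
    """Count results that are less strict / stricter than expected (assumed definitions)."""
    false_safe = sum(
        count for (expected, actual), count in matrix.items()
        if STRICTNESS[actual] < STRICTNESS[expected]
    )
    overblock = sum(
        count for (expected, actual), count in matrix.items()
        if STRICTNESS[actual] > STRICTNESS[expected]
    )
    return false_safe, overblock


# A perfect run over the balanced fixture: 400 correct results per decision.
perfect = confusion_matrix([(d, d) for d in DECISIONS for _ in range(400)])
```

On a perfect run over the balanced fixture, every diagonal cell holds 400, so precision and recall are 1.0 for each decision and both error counts are zero.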
What it validates
The suite measures the stable, non-executing policy/check path against labeled fixtures. It covers read-only actions, tests, publishing, destructive Git operations, secret access, destructive SQL, and broad filesystem deletion.
What it cannot prove
It does not prove complete command coverage, sandbox isolation, guaranteed interception, or governance of commands that bypass Termyte. Re-run the benchmark against the installed version instead of treating a checked-in result as a permanent guarantee.
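Re-running against the installed version can be scripted. The sketch below assumes that termyte bench --json writes a single JSON document to standard output. The report schema is not described here, so the code parses the output without reading any specific fields. The run_bench helper is hypothetical.

```python
import json
import subprocess


def run_bench(command=("termyte", "bench", "--json")):
    """Run the benchmark command and return (exit code, parsed JSON report)."""
    completed = subprocess.run(list(command), capture_output=True, text=True)
    # Assumption: stdout holds exactly one JSON document.
    report = json.loads(completed.stdout)
    return completed.returncode, report
```

Calling run_bench() with no arguments invokes whichever termyte is on the PATH, so the result reflects the installed version, not the checked-in one.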