agent-guardian suite runs many independent scans in parallel from a single
YAML file, then aggregates one cross-scan summary and collects a report per
workload. It’s built for the jobs a single scan doesn’t cover: a nightly sweep
across every agent you ship, a benchmark of several models against the same
target, or a regression suite that scores a fleet on every release.
It is an orchestration layer only. Each workload is launched as a separate
agent-guardian scan subprocess, so a suite run of one workload is identical to
running that one scan command — the suite never changes how a scan behaves.
Every workload key maps 1:1 onto a real scan flag (see the CLI reference).
The suite adds orchestration (parallelism, isolation, aggregation) on top of
the unchanged scan engine.
Quickstart
Write a suite file
Start from the committed reference,
examples/suite.yaml, and edit the
workloads. Each workload needs a name and exactly one target
(endpoint / target / system_prompt / framework / contract).
Example suite.yaml: see examples/suite.yaml in the repository.
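The "name plus exactly one target" rule is easy to check before a run. The following is a minimal sketch, assuming the workload has already been parsed from YAML into a plain dict. The function name check_workload is hypothetical and not part of agent-guardian; only the five target keys come from the text above.

```python
# Hypothetical pre-flight check; not part of agent-guardian itself.
# The five target keys are the ones listed in the documentation above.
TARGET_KEYS = ("endpoint", "target", "system_prompt", "framework", "contract")


def check_workload(workload: dict) -> list[str]:
    """Return a list of problems; an empty list means the workload looks valid."""
    problems = []
    if not workload.get("name"):
        problems.append("missing name")
    # Collect every target key that is present with a non-empty value.
    present = [key for key in TARGET_KEYS if workload.get(key)]
    if len(present) != 1:
        problems.append(
            f"expected exactly one target key, found {len(present)}: {present}"
        )
    return problems
```

Running a check like this over every workload before launching anything catches a malformed file without spawning a single scan.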
Isolation — one workload never affects another
Each workload runs in its own OS process under its own HOME, so its
entire ~/.agentguardian tree — the scan directory and the adversarial
winning-seeds database — is private to that scan. Concurrent workloads can never
collide on disk or cross-pollinate seeds, and the scan engine is never imported
into the runner. The practical guarantee:
A workload in a suite produces the same result as running that one
agent-guardian scan command by hand.
concurrency bounds how many scans run at once — it does not touch each
scan’s own internal agent parallelism. Children always run headless (no
dashboard server, no TTY UI, no plan prompt).
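The isolation model described above can be approximated in a few lines. This is a minimal sketch, not agent-guardian's actual runner: it assumes a POSIX-style HOME variable, uses a throwaway temporary directory as each child's home, and substitutes a tiny Python child for the real agent-guardian scan command. The names run_isolated and run_all are hypothetical.

```python
import os
import subprocess
import sys
import tempfile
from concurrent.futures import ThreadPoolExecutor


def run_isolated(name: str, argv: list[str]) -> tuple[str, int, str]:
    """Run one child under a private HOME; return (name, exit code, stdout)."""
    # The temporary home is deleted on exit. A real runner would first keep
    # whatever it needs from the child's private tree.
    with tempfile.TemporaryDirectory(prefix=f"suite-{name}-") as home:
        env = dict(os.environ, HOME=home)  # the child sees only its own home
        proc = subprocess.run(argv, env=env, capture_output=True, text=True)
        return name, proc.returncode, proc.stdout.strip()


def run_all(workloads: dict[str, list[str]], concurrency: int):
    """Run every workload with at most `concurrency` children alive at once."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(lambda item: run_isolated(*item), workloads.items()))


# Stand-in child that prints the HOME it was given. A real runner would pass
# an `agent-guardian scan ...` argv here instead.
child = [sys.executable, "-c", "import os; print(os.environ['HOME'])"]
results = run_all({"a": child, "b": child}, concurrency=2)
```

The thread pool only caps how many children run at once. Each child is still a separate OS process, so nothing from the scan engine is loaded into the parent.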
What you get
After every workload finishes, suite run prints a summary table and writes
machine-readable output:
| Output | Where | What it is |
|---|---|---|
| Summary table | stdout | One row per workload + totals + scan-log-folder list. |
| summary.json | <out_dir>/summary.json | The same rows, machine-readable for CI. |
| Reports | <out_dir>/reports/<name>.<ext> | Each workload’s report in every requested format. |
| Console logs | <out_dir>/<name>.console.log | Each child’s stdout/stderr. |
The summary is trust-aware. A scan the engine marks non-authoritative
(stub model, high attacker-refusal rate, or coverage below the mode’s
threshold) is flagged CAVEAT so it is never silently ranked next to a clean
full-coverage scan. The numeric AIVSS is still shown for transparency.
Multiple formats, one scan
A workload’s formats can list several report types (json, sarif, pdf,
md, junit, gitlab). The scan runs once and writes report.json
natively; each extra format is rendered from that same finished scan via
agent-guardian report <id> --output <fmt> — never a second scan.
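The "scan once, render many" step can be pictured as building one report command per extra format. This is a minimal sketch: the helper name render_commands and the scan id used in the usage line are hypothetical, and only the agent-guardian report <id> --output <fmt> shape comes from the text above.

```python
def render_commands(scan_id: str, formats: list[str]) -> list[list[str]]:
    """Build one `agent-guardian report` argv per extra format.

    json is skipped because the scan already writes report.json natively.
    Duplicate formats are rendered only once.
    """
    seen = set()
    commands = []
    for fmt in formats:
        if fmt == "json" or fmt in seen:
            continue
        seen.add(fmt)
        commands.append(["agent-guardian", "report", scan_id, "--output", fmt])
    return commands


# "abc123" is a placeholder scan id, used for illustration only.
extra = render_commands("abc123", ["json", "sarif", "pdf", "sarif"])
```

Each returned list can be handed to a process launcher as-is. None of them starts a scan.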
Dashboard
With register_scans: true (the default), each finished scan is moved into your
real ~/.agentguardian/scans/<id>/. Because every scan has a unique id, they
never collide — so agent-guardian serve and agent-guardian report <id> find
each workload by its own scan id. Set serve: true to open one dashboard
automatically after the whole fleet finishes.
Gates & exit codes
Per-workload gates (fail_under, max_critical, max_high, …) work exactly as
they do for a single scan — a breach makes that child exit non-zero. The suite
rolls those up into its own exit code via suite.exit_code:
| exit_code | Suite exits non-zero when… |
|---|---|
| any-gate-fail (default) | any workload failed or tripped a gate. |
| all-pass | any workload is not a clean, authoritative pass. |
| always-zero | never (reporting-only runs). |
A workload that exceeds its time limit (timeout_seconds) becomes an error row
and never aborts its siblings (fail_fast: false, the default).
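The three exit_code policies reduce to a small decision function. This is a minimal sketch under stated assumptions: the Outcome record and its field names are hypothetical, and the non-zero value 1 is an arbitrary choice, since the text does not say which non-zero code the suite actually returns.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    # Hypothetical per-workload record; field names are illustrative only.
    name: str
    child_exit: int       # non-zero on error, timeout, or gate breach
    authoritative: bool   # False when the scan is flagged CAVEAT


def suite_exit_code(policy: str, outcomes: list[Outcome]) -> int:
    """Roll per-workload outcomes up into one suite exit code (0 or 1)."""
    if policy == "always-zero":
        return 0
    failed = any(o.child_exit != 0 for o in outcomes)
    if policy == "any-gate-fail":
        return 1 if failed else 0
    if policy == "all-pass":
        # Stricter: a caveated scan also counts against the suite.
        caveated = any(not o.authoritative for o in outcomes)
        return 1 if failed or caveated else 0
    raise ValueError(f"unknown exit_code policy: {policy!r}")
```

The only difference between any-gate-fail and all-pass in this sketch is how a non-authoritative scan with a zero child exit is treated: the first lets it through, the second does not.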
Commands
--dry-run prints the exact resolved agent-guardian scan … command for every
workload and spawns nothing — the fastest way to confirm a file before a real
run. suite summary OUT_DIR re-prints the table from a prior run’s
summary.json.
Keeping endpoints private
Commit a generic examples/suite.yaml with placeholder endpoints, and keep the
copy wired to your real target as *.local.yaml — the repository’s .gitignore
excludes examples/*.local.yaml and suite-out/ so neither a live endpoint nor
scan output is ever pushed.