This is the fastest way to see a real scan, and no agent of your own is
required. The AgentGuardian Testbench is a hosted Cloud Run service that
runs five demo agents: one clean control and four planted with
deliberate OWASP Top 10 for LLM Applications vulnerabilities. You point
the scanner at the FinBot banking assistant and watch the swarm peel it
open.
The testbench targets are owned and operated by the AgentGuardian
project specifically so the community can red-team them. Never run
AgentGuardian against a system you do not own or have written
authorisation to test. Doing so may violate computer-misuse laws in
your jurisdiction.
You will attack finbot (a fictional banking assistant for “CineFlow
Productions”) in the next step.
Set your LLM API key
The swarm needs an LLM provider to drive the Commander, Attacker, and
Evaluator roles. Gemini 2.5 Flash is the cheapest option: a --mode fast
scan costs roughly $0.01.
export GEMINI_API_KEY=your_key_here
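If you script your scans, a guard can stop a run before it starts when the key is missing. This is a minimal POSIX shell sketch; the function name require_gemini_key is hypothetical and not part of AgentGuardian.

```shell
# Hypothetical helper, not part of AgentGuardian: returns non-zero when
# GEMINI_API_KEY is unset or empty, so a script can stop before scanning.
require_gemini_key() {
  if [ -z "${GEMINI_API_KEY:-}" ]; then
    echo "GEMINI_API_KEY is not set: export it, or use --model stub" >&2
    return 1
  fi
  # Print only the key length so the secret itself never lands in logs.
  echo "GEMINI_API_KEY is set (${#GEMINI_API_KEY} characters)"
}
```

In a script, call require_gemini_key || exit 1 just before your scan command.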
No API key handy? Replace --model gemini:gemini-2.5-flash with
--model stub in the scan command. The swarm structure still runs end to
end, but the AIVSS comes back as band=not_evaluated because the stub
evaluator is not a real LLM. Use it to learn the flow, then re-run with
a real model for an authoritative score.
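A small sketch of that fallback, assuming you want the choice made automatically: it prints gemini:gemini-2.5-flash when GEMINI_API_KEY is exported and stub otherwise. The helper name pick_model is hypothetical; pass its output to the --model flag of your scan command.

```shell
# Hypothetical helper: choose one of the two --model values from this guide
# based on whether a Gemini key is available in the environment.
pick_model() {
  if [ -n "${GEMINI_API_KEY:-}" ]; then
    echo "gemini:gemini-2.5-flash"
  else
    # Structure-only run: the AIVSS comes back as band=not_evaluated.
    echo "stub"
  fi
}

MODEL="$(pick_model)"
echo "Using --model $MODEL"
```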
The exact AIVSS, finding count, and per-agent spend vary from run to run
(LLM non-determinism), but the band has stayed CRITICAL on every
fast-mode run we have benchmarked: FinBot’s planted vulnerabilities are
not subtle.
Now point the same scan at clean_control, a control agent built with no
planted vulnerabilities, to verify that the scanner is not generating
false positives.
The control answers basic questions about a fictional library catalogue
and refuses every prompt-injection, secret-extraction, and tool-abuse
attempt the swarm throws at it. Zero findings on the control against 14
on FinBot is the credibility evidence: AgentGuardian found real
vulnerabilities, not phantoms.
You have now run AgentGuardian against both a vulnerable agent and a
clean control. The 73-point AIVSS gap (96 for FinBot versus 23 for the
control) is the scanner doing its job.
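To make the comparison repeatable, the two runs can be reduced to a pass/fail check. This is a hypothetical helper, not an AgentGuardian feature: you copy the finding counts and AIVSS scores by hand from each scan's output, and it flags a noisy control or a silent target, then reports the AIVSS gap. The numbers in the example call are the ones quoted on this page; your own runs will differ.

```shell
# Hypothetical check, not part of AgentGuardian. Arguments, copied by hand
# from the two scan reports:
#   $1 control findings   $2 target findings
#   $3 control AIVSS      $4 target AIVSS   (whole numbers)
compare_runs() {
  control_findings=$1 target_findings=$2
  control_aivss=$3 target_aivss=$4
  # A clean control should report nothing; otherwise suspect false positives.
  if [ "$control_findings" -ne 0 ]; then
    echo "control reported $control_findings findings: check for false positives" >&2
    return 1
  fi
  # The planted target should produce at least one finding.
  if [ "$target_findings" -le 0 ]; then
    echo "target reported no findings" >&2
    return 1
  fi
  echo "AIVSS gap: $((target_aivss - control_aivss)) points"
}

compare_runs 0 14 23 96
```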