AgentGuardian is an open-source, Apache-2.0, local-first red-teaming toolkit for AI agents. Point it at a LangGraph, CrewAI, MCP server, RAG app, or REST-API agent — it deploys a swarm of fourteen specialist attackers, produces a deterministic 0–100 AIVSS score mapped to OWASP ASI 2026, MITRE ATLAS v5.4, and the CSA Agentic AI Red Teaming Guide, and emits SARIF / JSON / JUnit / Markdown / PDF reports your CI can gate on.
Run pip install agent-guardian to try it. The 90-second demo is on YouTube.
Why we built this
Most LLM-security tools were designed in the chatbot era. They send one prompt, read one reply, and grade the reply with a string match or a second LLM. That model worked when the system under test was a single completion call. Production agents are not single completion calls. They:
- compose tools (search_kb, send_email, query_db, exec_python),
- hold memory across sessions,
- talk to other agents over A2A or MCP channels,
- and run real code in real environments.
What ships in v1.1
The v1.1 line is the first “production” line. The headline pieces:
- The swarm. Fourteen attacker specialists (ten OWASP ASI 2026 categories plus four OWASP LLM Top 10 categories: fuzzing, secret extraction, denial of wallet, detection evasion) run concurrently against your agent, coordinated by a Swarm Commander LLM. The Commander reads the recon fingerprint, decomposes the attack surface, and re-tasks idle agents until either the budget is exhausted or the AIVSS variance has stabilised.
- The AIVSS scorer. Every finding contributes to a deterministic 0–100 score. Higher is safer. The formula is open — see reports/aivss-score. The same agent + the same probes + the same model produce the same score; LLM stochasticity is bounded by the swarm structure, not absorbed into the score.
- PoV-as-oracle. Every finding ships with a proof-of-vulnerability script. Before the score is computed the runner replays each PoV N times against the live target and drops findings that don’t reproduce. Score inflation from one-off LLM hallucinations is gated out.
- Three scan modes. --mode fast for CI gates (~45 seconds, ~0.03), --mode full for the authoritative pre-release scan (~5 minutes, ~$0.06). Numbers measured against Gemini 2.5 Flash on a tools-only target.
- Standards-anchored taxonomy. Every finding is triple-tagged with OWASP ASI 2026, MITRE ATLAS v5.4, and CSA Agentic-RT categories. SARIF output drops into GitHub’s Security tab. JUnit drops into any CI dashboard. PDF goes to your auditor.
- Local-first, no telemetry. Reports stay on your disk. The package does not phone home. The Enterprise tier is a separate product on top of the same engine — these docs cover only the OSS layer.
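The PoV-as-oracle gate described above can be sketched in a few lines. This is a minimal illustration, not the project's implementation: the Finding record, the replay_gate function, and the "every replay must succeed" policy are all assumptions made for the example, and the PoV is modelled as a plain callable rather than a script run against a live target.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Finding:
    """Hypothetical finding record; the real schema may differ."""
    finding_id: str
    pov: Callable[[], bool]  # returns True when the exploit reproduces


def replay_gate(findings: List[Finding], n_replays: int = 3,
                min_successes: int = 3) -> List[Finding]:
    """Keep only findings whose PoV reproduces at least min_successes times."""
    kept = []
    for finding in findings:
        # Replay the PoV n_replays times and count how often it reproduces.
        successes = sum(1 for _ in range(n_replays) if finding.pov())
        if successes >= min_successes:
            kept.append(finding)
    return kept


# A PoV that succeeds once and then never again stands in for a one-off win.
flaky_outcomes = iter([True, False, False])
findings = [
    Finding("stable-finding", pov=lambda: True),
    Finding("one-off-finding", pov=lambda: next(flaky_outcomes)),
]
confirmed = replay_gate(findings)
print([f.finding_id for f in confirmed])  # ['stable-finding']
```

Only the findings returned by the gate would be handed to the scorer, which is how a flaky result is kept out of the final number.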
The five-minute path from pip install to a real report
The fast scan finishes in about ninety seconds, costs about a cent of Gemini Flash, and lands an AIVSS in the CRITICAL band against the FinBot demo agent, the testbench’s planted banking-assistant target. The full walkthrough lives at Try the demo agent.
If you want to scan your own agent, the three most common shapes are one-liners:
How the swarm differs from a single-prompt tester
A single-prompt tester runs attack -> reply -> grade. The bug it solves is “did this single response cross a line?”
AgentGuardian’s swarm runs recon -> decompose -> parallel attack -> evaluate -> finalise. The bug it solves is “across a realistic adversarial session, did the agent’s tools, memory, or sub-agents do something the system prompt does not permit?”
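The five stages can be pictured as a small asyncio pipeline. The sketch below is only a toy model of the stage order: every function name, the fingerprint shape, and the "candidate" rule are invented for illustration, and no real target or LLM is contacted.

```python
import asyncio
from typing import Dict, List


async def recon(target: str) -> Dict[str, List[str]]:
    """Stand-in fingerprint; a real recon step would query the target."""
    return {"tools": ["search_kb", "send_email"]}


def decompose(fingerprint: Dict[str, List[str]]) -> List[str]:
    """Turn the fingerprint into one attack task per declared tool."""
    return [f"probe:{tool}" for tool in fingerprint["tools"]]


async def attack(task: str) -> Dict[str, object]:
    """Pretend specialist; flags send_email as a candidate finding."""
    await asyncio.sleep(0)  # yield control, as a network call would
    return {"task": task, "candidate": task.endswith("send_email")}


def evaluate(results: List[Dict[str, object]]) -> List[Dict[str, object]]:
    """Keep only the results that were flagged as candidates."""
    return [r for r in results if r["candidate"]]


def finalise(confirmed: List[Dict[str, object]]) -> Dict[str, object]:
    """Bundle the confirmed findings into a report-like dict."""
    return {"finding_count": len(confirmed), "findings": confirmed}


async def run_swarm(target: str) -> Dict[str, object]:
    fingerprint = await recon(target)                           # recon
    tasks = decompose(fingerprint)                              # decompose
    results = await asyncio.gather(*(attack(t) for t in tasks)) # parallel attack
    return finalise(evaluate(list(results)))                    # evaluate, finalise


report = asyncio.run(run_swarm("demo-agent"))
print(report["finding_count"])  # 1
```

The point of the shape is that the attack tasks are derived from the fingerprint rather than fixed in advance, and that they run concurrently before a single evaluation pass.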
The difference is not philosophical. Concretely:
- Recon first. A RecognitionAgent fingerprints the target: declared tools, declared memory keys, response shape, refusal posture. Every downstream attacker conditions its payloads on the fingerprint instead of throwing a generic ASI corpus at the wall. A tool-exfil payload aimed at an MCP server that exposes search_kb looks nothing like one aimed at a REST agent with transfer_funds; the swarm produces both correctly.
- Parallel specialists. Each OWASP ASI category has its own specialist agent (asi01..asi10), driven by its own system prompt, its own probe slice from src/agent_guardian/probes/, and its own LLM-as-judge rubric. The four OWASP LLM specialists (LLM05 fuzzing, LLM07 secret extraction, LLM10 denial of wallet, detection-evasion coverage) opt in via --owasp-llm.
- Shared adversarial memory. A finding by the goal-hijack agent updates a shared VectorMemory that the tool-abuse agent reads on its next turn. Multi-hop attacks (inject -> persist -> chain into a tool call) are first-class, not retrofitted.
- PoV-as-oracle gate. Every candidate finding is replayed N times against the live target before it goes into the report. Unreproducible findings are dropped before AIVSS scoring, so a flaky win can’t inflate the score.
- Budget-aware scheduling. A USD-denominated budget ledger reserves and commits spend per agent. The Commander early-stops at 90% of the cap. Scans are bounded by what you said you’d spend, not by how aggressive the attackers feel.
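The reserve-and-commit budgeting in the last bullet can be illustrated with a small ledger. This is a sketch under stated assumptions: BudgetLedger and its method names are hypothetical, not the project's API, and only the USD cap and the 90% early-stop threshold come from the description above.

```python
from typing import Dict


class BudgetLedger:
    """Hypothetical reserve/commit ledger in USD; not the project's class."""

    def __init__(self, cap_usd: float, stop_fraction: float = 0.90):
        self.cap_usd = cap_usd
        self.stop_fraction = stop_fraction
        self.committed = 0.0
        self.reserved: Dict[str, float] = {}  # agent name -> reserved USD

    def reserve(self, agent: str, amount: float) -> bool:
        """Hold funds for an agent's next call; refuse if the cap would be exceeded."""
        outstanding = self.committed + sum(self.reserved.values())
        if outstanding + amount > self.cap_usd:
            return False
        self.reserved[agent] = self.reserved.get(agent, 0.0) + amount
        return True

    def commit(self, agent: str, actual: float) -> None:
        """Release the agent's reservation and record what was actually spent."""
        self.reserved.pop(agent, None)
        self.committed += actual

    def should_stop(self) -> bool:
        """True once committed spend reaches the early-stop fraction of the cap."""
        return self.committed >= self.stop_fraction * self.cap_usd


ledger = BudgetLedger(cap_usd=1.00)
ledger.reserve("asi01", 0.50)
ledger.commit("asi01", 0.50)
print(ledger.should_stop())           # False: 0.50 is below 90% of 1.00
ledger.reserve("asi02", 0.50)
ledger.commit("asi02", 0.45)
print(ledger.should_stop())           # True: 0.95 is past the 90% mark
print(ledger.reserve("asi03", 0.10))  # False: would exceed the cap
```

Reserving before a call and committing the actual cost afterwards means concurrent agents cannot jointly overshoot the cap, even when each one individually looks affordable.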
Standards alignment
Every finding carries three classifiers:
- OWASP ASI 2026: the ten Agentic Security Initiative categories. Each specialist owns exactly one.
- MITRE ATLAS v5.4: the technique enum that SARIF emitters expose so GitHub Code Scanning and SOC tooling can pivot.
- CSA Agentic AI Red Teaming Guide / CSA AI Controls Matrix: the governance-facing taxonomy GRC tooling already speaks.
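One way to picture the triple tagging is a record that carries all three classifiers and is flattened into the tags of a SARIF result. The sketch below is an assumption about shape only: the Classifiers type, the tag prefixes, and the identifier values are placeholders, and the tool's real SARIF layout may differ. The fields used (ruleId, level, message.text, properties.tags) are standard SARIF 2.1.0 result fields.

```python
import json
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Classifiers:
    """Hypothetical container for the three taxonomy tags on a finding."""
    owasp_asi: str
    mitre_atlas: str
    csa: str


def to_sarif_result(message: str, tags: Classifiers) -> Dict[str, object]:
    """Build a SARIF-style result dict that exposes all three classifiers."""
    return {
        "ruleId": tags.owasp_asi,
        "level": "error",
        "message": {"text": message},
        "properties": {
            "tags": [
                f"owasp-asi:{tags.owasp_asi}",
                f"mitre-atlas:{tags.mitre_atlas}",
                f"csa:{tags.csa}",
            ]
        },
    }


# Placeholder identifiers; real findings would carry the tool's own values.
tags = Classifiers(
    owasp_asi="ASI01",
    mitre_atlas="ATLAS-TECHNIQUE-ID",
    csa="CSA-CATEGORY-ID",
)
result = to_sarif_result("Example finding message.", tags)
print(json.dumps(result, indent=2))
```

Keeping the three identifiers side by side on every result is what lets different consumers filter the same report by the taxonomy they already use.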
What ships next
The roadmap is canonical in ROADMAP.md. The themes for the v1.1.x stream:
- Probe corpus growth. ASI04 supply-chain probes for the MCP-registry ecosystem, ASI10 long-horizon drift probes that span multiple scan windows, ASI09 trust-exploitation probes for new output channels.
- Adapter coverage. Stable adapters for ADK, AutoGen, Strands, and a documented MCP-server target shape.
- Report ergonomics. PDF cover page redesign, SARIF rule-help URLs, an HTML report with collapsible evidence trees.
- CI ergonomics. A first-class GitHub Action (glacien-technologies/agent-guardian@v1) and a Docker image on GHCR.
ROADMAP.md is a public commitment. We ship from PRs, not from blog posts.
How to help
The fastest contributions are:
- A new probe under src/agent_guardian/probes/asi0X/. The probe schema is at reference/probe-schema.
- A new adapter under src/agent_guardian/targets/. The contract is small: adapt your agent to a TargetTransport and the swarm picks it up.
- A reproduction against a deliberately-vulnerable agent you wrote. PR it as a walkthrough into docs/blog/; we’ll review it for the next release notes.
Commits must be signed off (git commit -s). See CONTRIBUTING.md for the full rules.
Pointers
- Quickstart — /quickstart. Five minutes from install to first AIVSS.
- How AgentGuardian works — /concepts/how-agentguardian-works. The four-step mental model.
- Comparison — /concepts/agent-guardian-vs. What we chose and why.
- Attack library — /attacks/overview. All ninety-six probes across the ten OWASP ASI categories.
- GitHub — glacien-technologies/agent-guardian. Star it if it’s useful; file an issue if it isn’t.