The adversarial swarm is the engine. It takes the
TargetFingerprint
produced by recon, picks the specialists that apply, runs them
concurrently, and writes their findings into a single
SharedMemory. This page describes the swarm shape — for the four-phase
flow it sits inside, see How AgentGuardian works.
Three roles, never collapsed
| Role | What it does | Source-of-truth |
|---|---|---|
| Commander | Reads --target-goal, emits a SwarmBrief (per-agent sub-goals, hypotheses, priorities). Skipped when no goal is supplied. | src/agent_guardian/core/swarm.py |
| Attacker | Each specialist agent owns one OWASP ASI category and synthesises category-specific attack prompts. | src/agent_guardian/agents/base.py::AsiAgent |
| Evaluator | A separate LLM-as-judge labels each (prompt, response) pair against a category-specific rubric. | src/agent_guardian/agents/base.py::Judge |
The roles can run on different models (--commander-model, --attacker-model, --evaluator-model). They are
always logically separate — the same model instance is never asked to
both attack and grade its own attack.
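As a rough illustration of the Commander row above, the sketch below models a SwarmBrief as a plain dataclass and returns nothing when no goal is supplied. The field names, the build_brief helper, and the placeholder sub-goal text are assumptions made for this example, not the contents of swarm.py; a real Commander would derive sub-goals with an LLM.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AgentBrief:
    """Per-agent slice of the brief (hypothetical field names)."""
    sub_goal: str
    hypotheses: list[str] = field(default_factory=list)
    priority: int = 0


@dataclass
class SwarmBrief:
    target_goal: str
    per_agent: dict[str, AgentBrief] = field(default_factory=dict)


def build_brief(target_goal: Optional[str], categories: list[str]) -> Optional[SwarmBrief]:
    """Return None when no goal is supplied, mirroring the 'skipped' behaviour."""
    if not target_goal:
        return None
    brief = SwarmBrief(target_goal=target_goal)
    for rank, category in enumerate(categories):
        # Placeholder text: a real Commander would generate this with an LLM.
        brief.per_agent[category] = AgentBrief(
            sub_goal=f"{category}: work towards '{target_goal}'",
            priority=rank,
        )
    return brief
```

Calling build_brief(None, ["ASI01"]) returns None, so downstream specialists would need to handle a missing brief.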
The fourteen specialists
Ten run by default, one per OWASP Agentic Security Initiative (ASI) category. Four additional OWASP-LLM specialists run when you pass --include-m2-agents.
| ASI | Specialist | Source file |
|---|---|---|
| ASI01 | Goal Hijack | agents/goal_hijack.py |
| ASI02 | Tool Abuse | agents/tool_abuse.py |
| ASI03 | Privilege Escalation | agents/privilege.py |
| ASI04 | Drift | agents/drift.py |
| ASI05 | Cascade Failure | agents/cascade.py |
| ASI06 | Memory Poisoning | agents/memory_poison.py |
| ASI07 | Identity Leak | agents/identity_leak.py |
| ASI08 | Code Execution | agents/code_exec.py |
| ASI09 | Supply Chain | agents/supply_chain.py |
| ASI10 | Trust Exploit | agents/trust_exploit.py |
| OWASP-LLM | Fuzzing | agents/fuzzing_agent.py |
| OWASP-LLM | Detection Evasion | agents/detection_evasion_agent.py |
| OWASP-LLM | Secret Extraction | agents/secret_extraction_agent.py |
| OWASP-LLM | Denial-of-Wallet | agents/denial_of_wallet_agent.py |
agent-guardian list-agents prints the same table from the CLI.
Applicability filter
Not every specialist runs against every target. After recon, each agent is asked AsiAgent.is_applicable(fingerprint). The filter is conservative:
a specialist that cannot possibly land an attack is skipped so its budget
slice goes to specialists that can:
- A tool-less target skips Tool Abuse (ASI02) and Code Execution (ASI08).
- A memory-less target skips Memory Poisoning (ASI06).
- A single-agent target skips Cascade Failure (ASI05) and Trust Exploit (ASI10).
- A target with no exposed PII surface deprioritises Identity Leak (ASI07) but does not skip it.
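The rules above can be sketched as a small lookup table. This is a minimal illustration, not the real AsiAgent.is_applicable: the Fingerprint fields, the REQUIRES table, and the choice to model "deprioritises" as "runs last" are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fingerprint:
    """Stand-in for TargetFingerprint; these field names are assumptions."""
    has_tools: bool
    has_memory: bool
    multi_agent: bool
    exposes_pii: bool


# Capability each category needs before it can land an attack (hard skip).
REQUIRES = {
    "ASI02": "has_tools",
    "ASI08": "has_tools",
    "ASI06": "has_memory",
    "ASI05": "multi_agent",
    "ASI10": "multi_agent",
}


def is_applicable(category: str, fp: Fingerprint) -> bool:
    needed = REQUIRES.get(category)
    # Categories with no listed requirement always apply.
    return True if needed is None else getattr(fp, needed)


def plan(categories: list[str], fp: Fingerprint) -> list[str]:
    """Drop inapplicable categories; move ASI07 last when no PII is exposed."""
    kept = [c for c in categories if is_applicable(c, fp)]
    if not fp.exposes_pii:
        # Stable sort: only ASI07 has key True, so it alone sinks to the end.
        kept.sort(key=lambda c: c == "ASI07")
    return kept
```

For a tool-less, single-agent target with memory and no exposed PII, plan keeps ASI01, ASI03, ASI04, ASI06, and ASI09 in order and schedules ASI07 last.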
Parallelism
Phase 3 is the only place concurrency happens. Up to max_parallel_agents (default 10) specialists run under one
asyncio.TaskGroup. Each specialist has its own attack loop —
generate prompt → call target via adapter → evaluator judges → write
finding on verdict="fail" — and terminates when it hits any of:
- target_findings reached for that ASI category,
- per-agent turn cap exhausted,
- shared swarm budget exhausted (tokens, wall-clock, or USD),
- target refused N consecutive turns,
- the global scan window closed.
Specialists share one SharedMemory (the TargetFingerprint, the
SwarmBrief, and the running Finding list) but they do not share
strategy state. A bug in one specialist cannot corrupt another’s findings.
Strategies, not just probes
Each specialist composes one or more Strategy instances from
src/agent_guardian/strategies/. A strategy is the attack pattern (e.g.
single-turn probe, multi-turn jailbreak, indirect injection via tool
output, adversarial role-play). The probe corpus under
src/agent_guardian/probes/asi01..asi10/ provides the seed prompts the
strategy mutates and re-issues.
This decoupling is why two specialists in the same ASI category can use
different attack styles, and why a researcher can add a new strategy
without touching the agent that uses it.
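A minimal sketch of that decoupling, assuming a hypothetical Strategy interface with a single prompts method; the real classes under src/agent_guardian/strategies/ may look quite different. The specialist only iterates over whatever strategies it is given, so a new strategy plugs in without edits to the specialist.

```python
from typing import Iterator, Protocol


class Strategy(Protocol):
    """Hypothetical interface: turn one seed prompt into attack prompts."""
    def prompts(self, seed: str) -> Iterator[str]: ...


class SingleTurnProbe:
    def prompts(self, seed: str) -> Iterator[str]:
        yield seed  # issue the seed unchanged


class RolePlay:
    def __init__(self, persona: str) -> None:
        self.persona = persona

    def prompts(self, seed: str) -> Iterator[str]:
        # Wrap the seed in a persona framing before issuing it.
        yield f"You are {self.persona}. {seed}"


class Specialist:
    """Composes strategies; adding a new Strategy needs no change here."""
    def __init__(self, seeds: list[str], strategies: list[Strategy]) -> None:
        self.seeds = seeds
        self.strategies = strategies

    def attack_prompts(self) -> list[str]:
        # Every strategy is applied to every seed from the probe corpus.
        return [
            prompt
            for strategy in self.strategies
            for seed in self.seeds
            for prompt in strategy.prompts(seed)
        ]
```

Two specialists in the same category could then be built from the same seeds with different strategy lists.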
Early-stop checkpoint
A concurrent checkpoint task samples the provisional AIVSS score every 30 seconds. In --mode smart and --mode fast it can vote EARLY_STOP when the
score has stabilised. In --mode full (the default) the checkpoint
records but does not vote — every probe runs against every applicable
agent so CI gets reproducible coverage.
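One way to picture the checkpoint is a sliding window over recent score samples. The sketch below is an assumption throughout: the window size, the tolerance, the "RECORD" and "CONTINUE" labels, and the max-minus-min stability test are illustrative choices, and the source does not say how "stabilised" is actually defined. The 30-second cadence would be driven by the caller.

```python
from collections import deque


class EarlyStopCheckpoint:
    """Votes EARLY_STOP once recent score samples stop moving.

    Window size, tolerance, and the non-EARLY_STOP labels are assumptions.
    """

    def __init__(self, mode: str, window: int = 3, tolerance: float = 0.1) -> None:
        self.mode = mode
        self.tolerance = tolerance
        self.samples: deque[float] = deque(maxlen=window)

    def sample(self, provisional_score: float) -> str:
        self.samples.append(provisional_score)
        if self.mode == "full":
            return "RECORD"  # full mode records but never votes
        window_full = len(self.samples) == self.samples.maxlen
        spread = max(self.samples) - min(self.samples)
        stable = window_full and spread <= self.tolerance
        return "EARLY_STOP" if stable else "CONTINUE"
```

With the defaults, a smart-mode checkpoint votes EARLY_STOP only after three consecutive samples fall within 0.1 of each other, while a full-mode checkpoint keeps recording regardless of the scores.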
Where to go next
- Target adapters — the adapter contract and the fingerprint produced by recon.
- Evaluators — the LLM-as-judge rubric and the heuristic + RoE-blocklist layers around it.
- How AgentGuardian works — the end-to-end flow, including the recon, decompose, attack, finalise phases.
- Research foundation — MAD-MAX, TAP, Co-RedTeam, and the swarm-design citations.