The adversarial swarm is the engine. It takes the TargetFingerprint produced by recon, picks the specialists that apply, runs them concurrently, and writes their findings into a single SharedMemory. This page describes the swarm shape — for the four-phase flow it sits inside, see How AgentGuardian works.

Three roles, never collapsed

| Role | What it does | Source-of-truth |
| --- | --- | --- |
| Commander | Reads --target-goal, emits a SwarmBrief (per-agent sub-goals, hypotheses, priorities). Skipped when no goal is supplied. | src/agent_guardian/core/swarm.py |
| Attacker | Each specialist agent owns one OWASP ASI category and synthesises category-specific attack prompts. | src/agent_guardian/agents/base.py::AsiAgent |
| Evaluator | A separate LLM-as-judge labels each (prompt, response) pair against a category-specific rubric. | src/agent_guardian/agents/base.py::Judge |

The three roles can be served by the same LLM or by three different LLMs (see --commander-model, --attacker-model, --evaluator-model). Even when they share a model, they stay logically separate: no single model instance is asked both to attack and to grade its own attack.
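The role separation can be pictured as one private session per role. The following is a minimal sketch, not the project's code: RoleSession and build_roles are hypothetical names, and the only behaviour taken from this page is that the Commander is skipped without a goal and that the Attacker and Evaluator never share an instance.

```python
from dataclasses import dataclass, field


@dataclass
class RoleSession:
    """One role's private conversation with one model (hypothetical type)."""
    role: str
    model: str
    transcript: list = field(default_factory=list)


def build_roles(commander_model, attacker_model, evaluator_model, target_goal=None):
    """Create one session per role, even when the model names are identical."""
    roles = {
        "attacker": RoleSession("attacker", attacker_model),
        "evaluator": RoleSession("evaluator", evaluator_model),
    }
    # The Commander only exists when a target goal was supplied.
    if target_goal is not None:
        roles["commander"] = RoleSession("commander", commander_model)
    return roles
```

Calling build_roles with the same model name three times still yields distinct sessions, so the grader never sees the attacker's context.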

The fourteen specialists

Ten run by default, one per OWASP Agentic Security Initiative (ASI) category. Four additional OWASP-LLM specialists run when you pass --include-m2-agents.

| ASI | Specialist | Source file |
| --- | --- | --- |
| ASI01 | Goal Hijack | agents/goal_hijack.py |
| ASI02 | Tool Abuse | agents/tool_abuse.py |
| ASI03 | Privilege Escalation | agents/privilege.py |
| ASI04 | Drift | agents/drift.py |
| ASI05 | Cascade Failure | agents/cascade.py |
| ASI06 | Memory Poisoning | agents/memory_poison.py |
| ASI07 | Identity Leak | agents/identity_leak.py |
| ASI08 | Code Execution | agents/code_exec.py |
| ASI09 | Supply Chain | agents/supply_chain.py |
| ASI10 | Trust Exploit | agents/trust_exploit.py |
| OWASP-LLM | Fuzzing | agents/fuzzing_agent.py |
| OWASP-LLM | Detection Evasion | agents/detection_evasion_agent.py |
| OWASP-LLM | Secret Extraction | agents/secret_extraction_agent.py |
| OWASP-LLM | Denial-of-Wallet | agents/denial_of_wallet_agent.py |

agent-guardian list-agents prints the same table from the CLI.

Applicability filter

Not every specialist runs against every target. After recon, the swarm calls AsiAgent.is_applicable(fingerprint) on each agent. The filter is conservative. It skips only a specialist that cannot possibly land an attack, so that its budget slice goes to specialists that can:
  • A tool-less target skips Tool Abuse (ASI02) and Code Execution (ASI08).
  • A memory-less target skips Memory Poisoning (ASI06).
  • A single-agent target skips Cascade Failure (ASI05) and Trust Exploit (ASI10).
  • A target with no PII surface found during recon deprioritises Identity Leak (ASI07) but does not skip it.
The global token / wall-clock / request budget is then sliced across whichever specialists survive the filter.
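The filter and the budget slicing described above can be sketched as follows. This is an illustration under stated assumptions: Fingerprint and its four boolean fields are hypothetical stand-ins for TargetFingerprint, only the token budget is modelled, and the 0.5 weight used for a deprioritised specialist is invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fingerprint:
    """Hypothetical stand-in for TargetFingerprint."""
    has_tools: bool
    has_memory: bool
    is_multi_agent: bool
    has_pii_surface: bool


# Capability each category needs; categories not listed always apply.
REQUIRES = {
    "ASI02": "has_tools",       # Tool Abuse
    "ASI08": "has_tools",       # Code Execution
    "ASI06": "has_memory",      # Memory Poisoning
    "ASI05": "is_multi_agent",  # Cascade Failure
    "ASI10": "is_multi_agent",  # Trust Exploit
}
ALL_CATEGORIES = [f"ASI{i:02d}" for i in range(1, 11)]


def is_applicable(category, fp):
    """Skip a category only when the capability it needs is absent."""
    attr = REQUIRES.get(category)
    return attr is None or getattr(fp, attr)


def slice_budget(total_tokens, fp, deprioritised_weight=0.5):
    """Split the token budget across the specialists that survive the filter."""
    weights = {c: 1.0 for c in ALL_CATEGORIES if is_applicable(c, fp)}
    # Identity Leak is kept but given a smaller share (assumed weighting).
    if not fp.has_pii_surface:
        weights["ASI07"] = deprioritised_weight
    total_weight = sum(weights.values())
    return {c: int(total_tokens * w / total_weight) for c, w in weights.items()}
```

For a tool-less, single-agent target with memory and no PII surface, six specialists survive, and Identity Leak receives half the share of each of the others.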

Parallelism

Phase 3 is the only place concurrency happens. Up to max_parallel_agents (default 10) specialists run under one asyncio.TaskGroup. Each specialist has its own attack loop — generate prompt → call target via adapter → evaluator judges → write finding on verdict="fail" — and terminates when it hits any of:
  • target_findings reached for that ASI category,
  • per-agent turn cap exhausted,
  • shared swarm budget exhausted (tokens, wall-clock, or USD),
  • target refused N consecutive turns,
  • the global scan window closed.
Specialists share SharedMemory (the TargetFingerprint, the SwarmBrief, and the running Finding list) but they do not share strategy state. A bug in one specialist cannot corrupt another’s findings.

Strategies, not just probes

Each specialist composes one or more Strategy instances from src/agent_guardian/strategies/. A strategy is the attack pattern (e.g. single-turn probe, multi-turn jailbreak, indirect injection via tool output, adversarial role-play). The probe corpus under src/agent_guardian/probes/asi01..asi10/ provides the seed prompts the strategy mutates and re-issues. This decoupling is why two specialists in the same ASI category can use different attack styles, and why a researcher can add a new strategy without touching the agent that uses it.
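The decoupling between specialists, strategies, and seed prompts might look like the sketch below. The Strategy protocol, its prompts method, and the two concrete strategy classes are hypothetical illustrations, not the interfaces under src/agent_guardian/strategies/.

```python
from typing import Iterable, Iterator, Protocol


class Strategy(Protocol):
    """Hypothetical interface: turn seed prompts into attack prompts."""
    def prompts(self, seeds: Iterable[str]) -> Iterator[str]: ...


class SingleTurnProbe:
    """Re-issue each seed unchanged."""
    def prompts(self, seeds):
        yield from seeds


class RolePlay:
    """Wrap each seed in an adversarial persona."""
    def __init__(self, persona):
        self.persona = persona

    def prompts(self, seeds):
        for seed in seeds:
            yield f"You are {self.persona}. Stay in character. {seed}"


class Specialist:
    """Owns a category; attack style comes entirely from its strategies."""
    def __init__(self, category, strategies):
        self.category = category
        self.strategies = list(strategies)

    def attack_prompts(self, seeds):
        seeds = list(seeds)  # allow several strategies to reuse the seeds
        for strategy in self.strategies:
            yield from strategy.prompts(seeds)
```

Two specialists for the same category can then differ only in the strategies passed to them, and a new strategy class needs no change to Specialist.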

Early-stop checkpoint

A concurrent checkpoint task samples provisional AIVSS every 30 seconds. In --mode smart and --mode fast it can vote EARLY_STOP when the score has stabilised. In --mode full (the default) the checkpoint records but does not vote — every probe runs against every applicable agent so CI gets reproducible coverage.
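The voting rule can be sketched as below. The page does not define "stabilised", so this example assumes it means the last few samples fall within a small tolerance of each other. The Checkpoint class, the window and tolerance parameters, and the "CONTINUE" vote name are hypothetical, and the 30-second timer is left out: samples are fed in by hand.

```python
from collections import deque


class Checkpoint:
    """Records provisional scores; votes only outside full mode."""

    def __init__(self, mode, window=3, tolerance=0.1):
        self.mode = mode
        self.window = window
        self.tolerance = tolerance
        self.history = deque()

    def sample(self, score):
        self.history.append(score)  # every mode records the sample
        if self.mode == "full":
            return "CONTINUE"  # full mode never votes to stop
        if len(self.history) < self.window:
            return "CONTINUE"  # not enough samples to judge stability
        recent = list(self.history)[-self.window:]
        stable = max(recent) - min(recent) <= self.tolerance
        return "EARLY_STOP" if stable else "CONTINUE"
```

With the defaults, a smart-mode checkpoint fed 4.0, 6.0, 6.5, 6.5, 6.55 votes EARLY_STOP on the fifth sample, while a full-mode checkpoint records all five and keeps going.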

Where to go next

  • Target adapters — the adapter contract and the fingerprint produced by recon.
  • Evaluators — the LLM-as-judge rubric and the heuristic + RoE-blocklist layers around it.
  • How AgentGuardian works — the end-to-end flow, including the recon, decompose, attack, and finalise phases.
  • Research foundation — MAD-MAX, TAP, Co-RedTeam, and the swarm-design citations.