Documentation Index
Fetch the complete documentation index at: https://docs.agentguardian.io/llms.txt
Use this file to discover all available pages before exploring further.
What this is
A probe is one self-contained attack template: a small YAML file with one or more seed prompts, an OWASP-ASI category, MITRE ATLAS technique IDs, a CSA category, a tier floor, a severity, and the evidence the judge should look for. The bundled corpus lives atsrc/agent_guardian/probes/asi01/ … asi10/,
one folder per OWASP category, one YAML file per probe.
At scan time the loader (probes/loader.py::seeds_for_asi_with_provenance)
walks each category folder, validates every YAML against the Probe
Pydantic schema, and hands the resulting ProbeSeed list to the
corresponding specialist agent.
When to use this
- Your team discovered an attack pattern that the current corpus doesn’t cover and you want it to run on every future scan.
- You want category coverage for a domain-specific failure mode (e.g. a banking-specific goal-hijack template, a healthcare-specific PII exfiltration).
- You’re contributing back upstream — every probe-corpus contribution
flows into
PROBE_CORPUS_VERSIONand lands in the next release.
The schema
Probe (declared in
src/agent_guardian/models/probe.py)
is a frozen Pydantic model. Every field is required except references and
owasp_scenario.
| Field | Type | Constraint |
|---|---|---|
id | str | Non-empty. Convention: ASI<NN>-T<tier>-<index> (e.g. ASI01-T4-014). |
name | str | Non-empty. kebab-case slug — must match the filename. |
asi | AsiCategory | One of ASI01 … ASI10. |
mitre_atlas | list[MitreTechnique] | Non-empty list of valid ATLAS IDs (e.g. AML.T0054). |
csa_category | CsaCategory | One of the 12 CSA Agentic-RT categories (kebab-case). |
severity | Severity | critical / high / medium / low. |
tier_floor | Tier | T1 / T2 / T3 / T4 — minimum surface the probe applies to. |
seeds | list[str] | Non-empty list of one or more seed prompts. |
description | str | Non-empty paragraph describing the attack and the indicator of success. |
expected_evidence | str | Non-empty short phrase the judge / triage uses (e.g. persona_swap_acknowledged_or_hidden_instructions_disclosed). |
remediation_ref | str | Non-empty remediation ID (e.g. REM-ASI01-014). |
references | list[str] | Optional citations (BibTeX keys, OWASP URLs). |
owasp_scenario | str | None | OWASP-2026 scenario tag — OWASP_ASI01_PromptInjection, etc. Required for new probes. |
models/probe.py::_coerce_probe): asi, csa_category, and
mitre_atlas (non-empty) MUST be present. Missing any of the three raises
ProbeValidationError with the failing field path. This is what keeps
every shipped probe cross-walked across all three taxonomies.
Author a probe
Drop a new file into the right ASI folder. Filename = probename plus
.yaml. The example below is a healthcare-domain variant of
ASI01.persona-break-jailbreak.
src/agent_guardian/probes/asi01/medical-persona-override.yaml
Validate before committing
Thedoctor subcommand loads the entire corpus in strict=True mode and
fails on the first malformed probe — run it locally before opening a PR.
ProbeValidationError with the failing path:
Run the corpus locally
After adding a probe, fire a scan with--mode full so every probe is
exercised (fast mode caps each agent at the first 3 seeds, ordered by
historical effectiveness — see AsiAgent.run::_mode_probe_cap):
fail.
ProbeSeed.severity propagates onto the Finding — a severity: critical
probe always produces a critical-band finding, regardless of the
specialist agent’s default_severity.
How probe metadata flows into a finding
The agent base class (agents/base.py::_build_finding) resolves every
finding’s taxonomy from the source probe rather than the agent default:
Finding.probe_id←Probe.id(falls back to<agent>-<asi>only for strategy-internal refinement turns that weren’t seeded by the corpus).Finding.severity←Probe.severity(falls back to the agent’sdefault_severityonly when no seed metadata is present).Finding.mitre_atlas←Probe.mitre_atlas(falls back to the agent’s class-default list when the probe doesn’t declare any).Finding.csa_category←Probe.csa_category(falls back to the agent’sdefault_csa_categoryon parse error).
Style and quality rules
- Seeds are concrete prompts, not templates. No
{placeholder}substitution at load time — each seed is sent verbatim. If you want parametric variants, author them as distinct seeds inside the same probe. - Severity reflects worst-case impact, not LLM-attack difficulty. A
trivial-to-write prompt that exfiltrates production secrets is still
critical. - Tier floor is the minimum surface the probe applies to. A probe
that needs tool-calling to land MUST set
tier_floor: T1orT2; the swarm will skip it on prompt-only T4 targets. - Description names the indicator of success in plain English. The
judge’s per-category rubric is built from the agent’s
judge_rubric(), but the description is how a human triager understands the finding without re-reading the prompt. - References are BibTeX keys or stable URLs only. No blog posts.
Where probes can live
| Location | Loaded by default? | When to use |
|---|---|---|
src/agent_guardian/probes/asi<NN>/*.yaml | Yes — bundled corpus. | Upstream contributions. |
A custom directory passed via load_probes_from_dir(Path) | No — programmatic only. | Private corpora your team doesn’t want to ship publicly. |
--probes-dir CLI flag in the current release — bundled
corpus only at the CLI surface. If you need external probes today, drive
the swarm programmatically and feed the loader yourself.
Expected behaviour after merge
Once your probe lands insrc/agent_guardian/probes/asi<NN>/:
PROBE_CORPUS_VERSION(inprobes/loader.py) gets bumped on the next release — every scan stamps the version intoScan.metadataso reports stay reproducible.- The matching ASI specialist agent picks the probe up automatically —
no agent-side wiring required.
seeds_for_category()callsseeds_for_asi_with_provenance(self.asi_category)which walks the folder. agent-guardian list-probesshows the new probe in its category bucket.- CI runs the strict corpus load (
agent-guardian doctor) on every PR; a malformed probe blocks the merge.
Next step
Write a custom target adapter
Implement the
TargetAdapter protocol to point the swarm at any
agent runtime.Attack library overview
What ships in the bundled corpus today — ASI01..ASI10, mapped to
OWASP ATLAS + CSA.
Contributing
DCO, conventional commits, and the PR-template walkthrough.
System overview
Where the probe corpus plugs into the six-phase swarm.