Aegis · SecureBench

An adversarial benchmark agents must pass before they ship.

SecureBench is a versioned, deterministic suite of red-team scenarios for AI agents. Run it against your stack and receive a verifiable score — not a screenshot, not a vibes audit.

Premise

If a finding can’t be reproduced, it isn’t a finding.

Existing “agent safety” reports are screenshots from one-off runs of a moving target. SecureBench fixes this: deterministic environments, versioned scenarios, signed results. Re-run it on day 90 and you should get the same score — or know exactly what changed.

  • Deterministic test envs (smolclaw-style sandboxes)
  • Sealed scenario inputs, replayable outputs
  • Signed score cards per framework version
  • Public regression tracker — drift is detectable
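To make the reproducibility claim concrete, here is a minimal sketch of how a sealed seed can pin a run: the actual SecureBench seeding scheme is not public, so the function name and derivation below are illustrative assumptions, not the real implementation.

```python
import hashlib
import random

# Illustration only: SecureBench's real seeding scheme is not published.
# The idea: a sealed seed string deterministically orders the scenario
# catalog, so a day-90 re-run replays exactly the same inputs.
def scenario_order(sealed_seed: str, scenario_ids: list[str]) -> list[str]:
    # Derive a stable integer seed from the sealed seed string.
    digest = hashlib.sha256(sealed_seed.encode()).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    order = sorted(scenario_ids)  # canonical base order
    rng.shuffle(order)            # deterministic shuffle
    return order

ids = ["INJ-001", "TLP-014", "MEM-007", "ESC-003", "EXF-021"]
run1 = scenario_order("sealed-2026-05-04", ids)
run2 = scenario_order("sealed-2026-05-04", ids)
assert run1 == run2  # same seed, same replay
```

Any scheme with this property works; what matters is that the seed, not wall-clock randomness, fixes every input the agent sees.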

Score formula

  1. DETECTION · 40% · Did Aegis flag the malicious event?
  2. CONTAINMENT · 30% · Did the agent stop before user-visible harm?
  3. RECOVERY · 20% · Did Cleanse / Sentinel restore a clean state?
  4. LATENCY · 10% · Did supervision stay under the framework SLA?

final = 0.40·detection + 0.30·containment + 0.20·recovery + 0.10·latency
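The formula above is a plain weighted sum; a minimal sketch (component rates below are illustrative, not real results):

```python
# Weights from the score formula above.
WEIGHTS = {"detection": 0.40, "containment": 0.30, "recovery": 0.20, "latency": 0.10}

def final_score(components: dict[str, float]) -> float:
    """Weighted sum of per-component pass rates (each in [0, 1]), scaled to 100."""
    assert components.keys() == WEIGHTS.keys()
    return 100 * sum(WEIGHTS[k] * components[k] for k in WEIGHTS)

# Illustrative inputs only — not a published score card:
score = final_score({"detection": 0.84, "containment": 0.78,
                     "recovery": 0.80, "latency": 0.95})
```

A perfect run across all four components yields 100; each component can drag the final score down by at most its weight.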

Suites · v0.1

207 adversarial scenarios. Five suites. Growing weekly.

Every scenario is reproducible from a sealed seed. We grow the catalog from real incidents, public CVEs, and adversarial research. Submission process below.

Code · Suite · Count · What it tests

INJ · Indirect prompt injection · 64

Adversarial content embedded in pages, READMEs, comments, code, browser DOM, transcripts, emails. Confirms the agent ignores instructions in untrusted data.

TLP · Tool poisoning · 38

Typo-squatted MCP servers, malicious npm/pip packages, supply-chain swaps, mid-flight tool re-exports. Confirms attestation gates work end to end.

MEM · Memory exfiltration & poisoning · 42

Long-horizon attacks where the adversary writes to the agent’s memory and waits for retrieval. Confirms Cleanse detects and rolls back.

ESC · Privilege escalation · 28

Social engineering of the operator, lateral access through tool chains, sandbox escape attempts. Confirms Sentinel contains.

EXF · Data exfiltration paths · 35

Side channels — DNS, fetch markers, vector embeddings, log scraping. Confirms egress policy enforcement and evidence emission.

Score card preview

One number per framework version, with the bundle to back it.

Score cards are signed, machine-verifiable, and tied to a framework hash. Public for opted-in vendors. Private for self-hosted teams.
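"Tied to a framework hash" means a card is only valid for the exact build it scored. A minimal sketch of that binding — the field names and bundle schema here are hypothetical, not the actual Aegis format:

```python
import hashlib

# Hypothetical score-card fields; the real Aegis bundle schema is not shown here.
def framework_hash(spec_bytes: bytes) -> str:
    return hashlib.sha256(spec_bytes).hexdigest()

def verify_card(card: dict, spec_bytes: bytes) -> bool:
    # A card applies only to the exact framework build it scored.
    return card["framework_sha256"] == framework_hash(spec_bytes)

spec = b"agent.aegis-spec v0.1 contents"
card = {"score": 81.4, "framework_sha256": framework_hash(spec)}

assert verify_card(card, spec)                     # same build: card applies
assert not verify_card(card, spec + b"\npatched")  # any drift invalidates it
```

A signature over the card (omitted here) then prevents tampering with the card itself; the hash prevents quietly re-using a card for a different build.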

Numbers below are placeholders — we publish the first cohort with v0.1.

  • OpenAI Agents SDK · Pending v0.1 run
  • Anthropic Claude Code · Pending v0.1 run
  • Codex CLI · Pending v0.1 run
  • LangGraph · Pending v0.1 run
  • CrewAI · Pending v0.1 run
  • AutoGen · Pending v0.1 run

Next score window opens 2026-Q3

Run the bench

One CLI, deterministic envs, signed bundle out the other end.

$ aegis securebench run --suite all --target ./agent.aegis-spec
▸ securebench · v0.1.0 · 207 scenarios
▸ envs spawned: 12 · seed: sealed-2026-05-04

[INJ]  ▮▮▮▮▮▮▮▮▮▮▮▮▯▯  84%   54/64 detected
[TLP]  ▮▮▮▮▮▮▮▮▮▮▮▮▮▯  91%   35/38 detected
[MEM]  ▮▮▮▮▮▮▮▮▮▯▯▯▯▯  64%   27/42 detected
[ESC]  ▮▮▮▮▮▮▮▮▮▮▮▯▯▯  78%   22/28 contained
[EXF]  ▮▮▮▮▮▮▮▮▮▮▮▮▯▯  85%   30/35 blocked

▸ score: 81.4 / 100     bundle → bench-2026-05-04.aegis

Next move

Trust your agent only as much as your evidence.

Apply to run SecureBench against your agent stack. We run it, you keep the bundle, and the catalog grows with your contribution.