Security & Evidence

What gets captured, what gets redacted, and what Replay's evidence does and does not prove in the replay() pipeline.


Capture redaction

Every capture passes through SecurityGate redaction before being stored or transmitted. The following patterns are automatically scrubbed:

| Pattern | Example | Redacted to |
| --- | --- | --- |
| OpenAI API keys | `sk-proj-abc123...` | `[REDACTED:openai_key]` |
| Anthropic API keys | `sk-ant-abc123...` | `[REDACTED:anthropic_key]` |
| Vesanor API keys | `vsn_abc123...` | `[REDACTED:vesanor_key]` |
| Bearer tokens | `Bearer eyJ...` | `[REDACTED:bearer]` |
| Email addresses | `user@example.com` | `[REDACTED:email]` |
| PEM keys | `-----BEGIN PRIVATE KEY-----` | `[REDACTED:pem_key]` |
| Connection strings | `postgresql://user:pass@host` | `[REDACTED:connection_string]` |
| API key headers | `x-api-key: abc123` | `[REDACTED:api_key_header]` |

Redaction happens before storage — secrets never reach the capture buffer, the network, or the server.

SDK-side redaction

The SDK has its own redaction implementation that runs in-process before captures are buffered. It uses the same shared pattern manifest as the server-side SecurityGate, so the two stay consistent.
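The approach amounts to a pattern table applied in order. A minimal sketch, assuming simplified regexes — the real SecurityGate manifest is more thorough, and these patterns are illustrative approximations only:

```typescript
// Illustrative redaction pass modeled on the pattern table above.
// These regexes are simplified assumptions, not the SDK's actual manifest.
const PATTERNS: Array<[RegExp, string]> = [
  [/sk-ant-[A-Za-z0-9_-]+/g, "anthropic_key"],
  [/sk-proj-[A-Za-z0-9_-]+/g, "openai_key"],
  [/vsn_[A-Za-z0-9_-]+/g, "vesanor_key"],
  [/Bearer\s+[A-Za-z0-9._-]+/g, "bearer"],
  [/[\w.+-]+@[\w-]+\.[\w.]+/g, "email"],
];

function redact(text: string): string {
  // Apply each pattern in order, replacing matches with a typed marker.
  return PATTERNS.reduce(
    (out, [re, tag]) => out.replace(re, `[REDACTED:${tag}]`),
    text,
  );
}
```

Because the replacement marker carries the pattern name, downstream tooling can still tell *what kind* of secret was present without ever seeing its value.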

What about tool arguments?

Tool call arguments are included in captures, but they pass through the same redaction. If your tool arguments contain secrets (API keys, passwords, tokens), SecurityGate catches them.

Best practice: Don't pass secrets as tool arguments. Use environment variables or secure credential stores in your tool executors instead.
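One way to follow this practice is to resolve credentials inside the tool executor rather than accepting them as arguments. A hypothetical sketch — `PAYMENTS_API_KEY` and the executor shape are illustrative names, not part of the SDK:

```typescript
// Hypothetical tool executor: the credential comes from the process
// environment, so it never appears in tool-call arguments or captures.
async function issueRefundExecutor(args: { orderId: string; amount: number }) {
  const apiKey = process.env.PAYMENTS_API_KEY; // secret stays server-side
  if (!apiKey) throw new Error("PAYMENTS_API_KEY is not configured");
  // ...call the payments API with apiKey; args carry only business data
  return { ok: true, orderId: args.orderId };
}
```

The LLM only ever proposes `orderId` and `amount`; the secret never enters the model context or the capture pipeline at all.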


Principal redaction

If you supply a principal identity to replay(), it's used internally for policy evaluation but never exposed externally:

```typescript
const session = replay(client, {
  principal: {
    user_id: "agent-001",
    department: "finance",
    secret_token: "sk-proj-REAL_SECRET", // Will be redacted
  },
});

// getState() redacts the principal
const state = session.getState();
console.log(state.principal); // null — always null in public snapshots
```

Why null? getState() returns a redacted snapshot safe to log, serialize, or display. The principal may contain sensitive identity data. Internally, policy evaluation uses the original value.
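The snapshot rule can be modeled in a few lines. The types and field names below are illustrative, not the SDK's published interfaces:

```typescript
// Sketch of the snapshot rule: public snapshots always carry
// principal: null, while the original principal stays internal.
interface PublicState {
  principal: null;
  state_version: number;
}

function toPublicSnapshot(internal: {
  principal: unknown;
  state_version: number;
}): PublicState {
  // The principal is dropped unconditionally, not conditionally masked.
  return { principal: null, state_version: internal.state_version };
}
```

Dropping the field unconditionally (rather than masking it case by case) means a snapshot can never leak identity data through a missed branch.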


Evidence classes

Each tool contract declares its evidence requirements — how much proof is needed before the enforcement pipeline considers a step authoritative.

evidence_class

Describes the kind of evidence available:

| Class | Meaning | Example |
| --- | --- | --- |
| `local_transaction` | Tool executes locally with observable results | Database query, file read |
| `ack_only` | Execution is acknowledged but not independently verified | API call to external service |
| `unverifiable` | No way to verify execution happened | Fire-and-forget webhook |

commit_requirement

When authoritative session state advances:

| Requirement | When state advances |
| --- | --- |
| `acknowledged` | After an execution receipt is recorded in the governed pipeline |
| `none` | Tool is recorded but doesn't advance authoritative state |

The distinction matters in Govern mode. A tool with commit_requirement: acknowledged only advances session state after the server records a governed execution receipt. A tool with commit_requirement: none is tracked for audit purposes but doesn't affect the authoritative session record.
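Putting the two fields together, a contract declaration might look like the sketch below. The field names follow the tables above, but the surrounding shape is an assumption, not the published contract schema:

```typescript
// Illustrative contract declarations combining evidence_class and
// commit_requirement. The wrapper type is assumed for this sketch.
type EvidenceClass = "local_transaction" | "ack_only" | "unverifiable";
type CommitRequirement = "acknowledged" | "none";

interface ToolContract {
  tool: string;
  evidence_class: EvidenceClass;
  commit_requirement: CommitRequirement;
}

const contracts: ToolContract[] = [
  { tool: "query_db", evidence_class: "local_transaction", commit_requirement: "acknowledged" },
  { tool: "notify_webhook", evidence_class: "unverifiable", commit_requirement: "none" },
];

// Only "acknowledged" tools advance the authoritative session record.
function advancesState(c: ToolContract): boolean {
  return c.commit_requirement === "acknowledged";
}
```

Here `notify_webhook` is still captured for audit, but its execution can never move the authoritative state forward — exactly the Govern-mode distinction described above.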


What Replay evidence proves

On the governed path, Replay can durably show that:

  • A wrapped request was evaluated by Replay's policy engine
  • The governed session was in a specific state version at decision time
  • Replay allowed, blocked, or paused the call for a specific reason
  • A wrapped tool path produced a governed execution receipt

This is useful for workflow review, debugging, approvals, and reconstructing what Replay believed happened.

What Replay evidence does not prove

Replay evidence is not the same thing as an independent external audit log. By itself it does not prove that:

  • The application never bypassed the wrapper
  • The external system accepted or completed the operation
  • The final state of the external system matches Replay's session state
  • Replay can substitute for IAM, sandboxing, or API-level business-rule enforcement

What gets captured

Every call through session.client produces a capture record:

```typescript
{
  // Standard capture fields (same as observe())
  provider: "openai",
  model: "gpt-4o-mini",
  request: { /* redacted request */ },
  response: { /* redacted response */ },

  // Replay-specific fields
  replay: {
    session_id: "sess_abc123",
    step_index: 2,
    mode: "enforce",
    decision: { action: "allow", tool_calls: [...] },

    // Governance context
    contract_hashes: ["sha256:abc..."],
    state_version: 3,
    commit_tier: "strong", // "strong" | "compat"
    phase: "customer_identified",
    phase_transition: "eligibility_checked",

    // What was prevented and why
    counterfactual: {
      tools_removed: [
        { tool: "issue_refund", reason: "wrong_phase" },
        { tool: "delete_all", reason: "no_contract" }
      ],
      calls_blocked: []
    },

    // Performance
    guard_overhead_ms: 0.4,

    // Narrowing details
    narrowing: {
      allowed: [{ name: "check_eligibility" }],
      removed: [{ tool: "issue_refund", reason: "wrong_phase" }]
    },

    // Shadow mode only
    shadow_delta: null // ShadowDelta | undefined
  }
}
```

Counterfactual capture

The counterfactual field records what was prevented and why — not just what happened. This is unique to Vesanor:

  • tools_removed — tools the LLM never saw (narrowing)
  • calls_blocked — tool calls the LLM proposed that were blocked (gating)

Workflow value: Counterfactual capture shows what Replay removed or blocked and why. It is useful for review, debugging, and explaining governed workflow decisions, but it is not by itself proof of final real-world side effects.
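For review tooling, a counterfactual record can be flattened into human-readable lines. A minimal sketch, using the record shape from the capture example above (the `explain` helper itself is hypothetical):

```typescript
// Turn a counterfactual record into review-friendly lines.
interface Counterfactual {
  tools_removed: { tool: string; reason: string }[];
  calls_blocked: { tool: string; reason: string }[];
}

function explain(cf: Counterfactual): string[] {
  return [
    ...cf.tools_removed.map((t) => `narrowed out ${t.tool} (${t.reason})`),
    ...cf.calls_blocked.map((t) => `blocked call to ${t.tool} (${t.reason})`),
  ];
}
```

Feeding it the counterfactual from the capture example yields one line per prevented action, which is often all an approver needs to see.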


Bypass detection

In Govern mode, the SDK detects direct calls to the original client:

```typescript
// This triggers bypass detection
const response = await originalClient.chat.completions.create({
  model: "gpt-4o-mini",
  messages,
  tools,
});
```

When detected:

  1. A replay_bypass_detected diagnostic event is emitted
  2. The session is marked compromised on the server (via the reportBypass endpoint)
  3. Future authoritative writes are rejected for this session
  4. The call still goes through (JavaScript can't revoke object references)

Bypass is authority revocation, not prevention. The SDK can't prevent you from using the original client. What it can do is mark the session as compromised so it can no longer make authoritative claims.
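The "revocation, not prevention" semantics reduce to a small state machine. This is a behavioral sketch, not the SDK's implementation; the class and method names are illustrative:

```typescript
// Minimal model of authority revocation: once a bypass is reported,
// authoritative writes are refused, but nothing stops the caller from
// keeping and using the original client.
class SessionAuthority {
  private compromised = false;

  reportBypass(): void {
    this.compromised = true; // one-way transition; no un-compromising
  }

  commitAuthoritative(_stepIndex: number): boolean {
    if (this.compromised) return false; // write rejected
    return true;
  }
}
```

Note the transition is one-way: after a single detected bypass, every subsequent authoritative commit fails, so the session's evidence trail is cleanly bounded at the moment trust was lost.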


Compliance relevance

| Requirement | Vesanor feature |
| --- | --- |
| EU AI Act Article 14 — human oversight during use | session.kill() intervention capability |
| EU AI Act Article 12 — automatic event recording | Capture pipeline with full decision context |
| EU AI Act Article 19 — 6-month minimum log retention | Server-side capture storage with configurable retention |
| GDPR Article 22 — meaningful information about automated decisions | Counterfactual capture (what was prevented and why) |
| SOC 2 — privileged actions attributable to individuals | principal identity + governed execution receipts |
| NIST AI Agent Standards — runtime policy enforcement | Deterministic contract evaluation, not LLM-based |

These are ways Replay artifacts can support a broader control environment. They are not a claim that Replay alone satisfies any compliance regime.


Next steps