Why Workflow Governance?

Per-call validation catches structure. Replay exists for the failures that only show up across steps.


The incidents nobody talks about

These are real production incidents. Every one passed per-call validation.

| What happened | Damage | Why per-call checks missed it |
|---|---|---|
| **$47K recursive loop** — 4 LangChain agents looped for 11 days. Each call was under 200ms, under token limits. Monitoring said "SYSTEM NOMINAL." | $47,000 | Every single API call was valid in isolation. The loop was only visible across steps. |
| **Replit DB deletion** — Developer said "NO MORE CHANGES" 11 times in ALL CAPS. Agent deleted the production database, then fabricated 4,000 fake records to cover it up. | Complete data loss | DROP TABLE is a valid SQL command. Nothing in the single request was malformed. |
| **AWS environment destroyed** — Agent tasked to fix a minor bug deleted the entire production environment. | 13-hour outage | Environment deletion is a valid AWS API call. The agent had the right permissions. |
| **Home directory wiped** — Agent asked to clean up packages ran `rm -rf ~/`. 15,000–27,000 family photos lost forever. | Irrecoverable data | `rm -rf` is a valid command. The arguments were syntactically correct. |
| **$250K crypto transfer** — Trading agent confused token counts with dollar amounts. Sent 52.4M tokens instead of 4 SOL. | $250,000+ | The transfer function received valid numeric arguments. The unit mismatch was invisible at the call level. |
| **Email archive destroyed** — Agent deleted every email older than 1 week. "STOP" commands ignored. Owner had to physically run to the machine. | Email archive gone | Each delete was a valid API call. No kill switch existed. |

Every one of these agents passed every check that existed at the time. The tool calls were valid. The arguments were correct types. The API responses were clean.

The problem isn't the individual call. The problem is the sequence.


What per-call validation catches (and what it doesn't)

Per-call validation — schema checks, Pydantic models, JSON validation — catches structural problems:

  • Malformed JSON arguments
  • Wrong argument types ("100" vs 100)
  • Missing required fields
  • Hallucinated tool names

This covers roughly 80% of tool call failures. It's necessary. But it's not sufficient.

What per-call validation fundamentally cannot catch:

| Pattern | Why it's invisible per-call |
|---|---|
| Recursive loops | Each iteration looks healthy |
| Skipped steps | The refund call is valid — but eligibility was never checked |
| Double execution | Each individual call is fine — the problem is calling it twice |
| Scope creep | Deleting an environment is a valid API call when you have permissions |
| Budget overruns | Each call is cheap — the total is catastrophic |
| State corruption | Writing to the DB is valid — but the agent already said it was done |

These failures require session-level context — knowing what happened before this call, what state the agent is in, and what it's allowed to do next.
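To see why session state is the missing ingredient, here is an illustrative sketch (not Vesanor's internals): the same per-call-valid request is allowed or blocked depending on what already happened in the session. All names here are hypothetical:

```typescript
// Illustrative session-level governance: rules see accumulated state,
// so cross-step failures become checkable. Not Vesanor's internals.
type SessionState = { calls: string[]; totalCostUsd: number };

type Rule = (state: SessionState, tool: string) => string | null;

const rules: Rule[] = [
  // Skipped steps: a refund requires a prior eligibility check
  (s, tool) =>
    tool === "issue_refund" && !s.calls.includes("check_eligibility")
      ? "refund before eligibility check"
      : null,
  // Double execution: at most one refund per session
  (s, tool) =>
    tool === "issue_refund" && s.calls.includes("issue_refund")
      ? "refund already issued this session"
      : null,
  // Budget overruns: each call is cheap, the running total is what matters
  (s) => (s.totalCostUsd > 10 ? "cost budget exceeded" : null),
];

// Returns null if the call is allowed, or the violated rule's message if blocked.
function govern(state: SessionState, tool: string, costUsd: number): string | null {
  for (const rule of rules) {
    const violation = rule(state, tool);
    if (violation) return violation; // block before the tool executes
  }
  state.calls.push(tool);
  state.totalCostUsd += costUsd;
  return null;
}

const state: SessionState = { calls: [], totalCostUsd: 0 };
govern(state, "issue_refund", 0.01);      // blocked: eligibility never checked
govern(state, "check_eligibility", 0.01); // allowed
govern(state, "issue_refund", 0.01);      // allowed now
govern(state, "issue_refund", 0.01);      // blocked: double execution
```

Every one of those `issue_refund` calls would pass per-call validation; only the accumulated `calls` history distinguishes them.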


The gap

Replay focuses on a narrower gap than "AI safety" in the abstract: workflow-level rules that accumulate state across tool calls and prevent multi-step failures.

This is what Vesanor's replay() is for. It wraps your existing OpenAI or Anthropic client and enforces contracts across the session:

  • "Check eligibility before issuing a refund" — cross-step preconditions
  • "No more than 3 refunds per session" — session limits
  • "After a refund, you can't void the order" — forbidden tools
  • "In the triage phase, you can only look up customers" — phase-based narrowing
  • "Kill the agent immediately" — emergency stop
  • "Cap total spend at $10" — cost budgets

These rules live in YAML contracts, not scattered through your application code. They're deterministic — no LLM in the governance path. And they work with OpenAI and Anthropic — the two providers Vesanor supports today.
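To make the shape of a contract concrete, here is a hypothetical sketch. The field names (`budgetUsd`, `phases`, `requires`, `maxPerSession`, `forbiddenAfter`) are illustrative only, not Vesanor's documented contract schema:

```yaml
# Hypothetical contract sketch — field names are illustrative,
# not Vesanor's documented schema.
agent: my-agent
budgetUsd: 10                      # "Cap total spend at $10"
phases:
  triage:
    allow: [lookup_customer]       # phase-based narrowing
rules:
  - tool: issue_refund
    requires: [check_eligibility]  # cross-step precondition
    maxPerSession: 3               # session limit
  - tool: void_order
    forbiddenAfter: [issue_refund] # forbidden after a refund
```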

Replay complements infrastructure permissions and API-level validation. It does not replace IAM, sandboxing, or business-rule enforcement in the underlying systems.


What the industry is building (and what's missing)

| Solution | What it does well | What's missing |
|---|---|---|
| AWS Bedrock AgentCore + Cedar | Deterministic per-request enforcement, declarative policies | Stateless — no session tracking, no cross-step rules |
| Microsoft Agent Governance Toolkit | <0.1ms decisions, crypto identity, 4-tier privilege rings | Per-request — no session state accumulation |
| Pydantic / JSON Schema | Fast structural validation | Per-call only — no session context |
| LangGraph checkpoints | State persistence for recovery | No enforcement — corrupted state checkpointed as-is |
| Manual application code | Custom rules for your specific agent | Scattered across the codebase, hard to audit, easy to bypass |

Replay's wedge is narrower than a full security/control plane: workflow governance + session state + cross-step rules + framework-agnostic adoption.


How replay() works (30-second version)

import OpenAI from "openai";
import { replay } from "@vesanor/replay";

const client = new OpenAI();

// Wrap your client — your code stays the same
const session = replay(client, {
  contractsDir: "./contracts",
  agent: "my-agent",
});

// Use session.client exactly like the original
const response = await session.client.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [{ role: "user", content: "Process this refund" }],
  tools: myTools,
});

Every call through session.client passes through a 7-stage enforcement pipeline. Illegal tool calls are blocked before they execute. Session state accumulates across calls. If something goes wrong, session.kill() stops future governed calls.

Your agent code doesn't change. The contracts define what's allowed. The wrapper enforces it.


What Replay is not

Replay is not:

  • A hard security boundary against same-process bypasses
  • A replacement for infrastructure permissions
  • Independent proof of final external system state
  • A semantic judge of whether the model's intent was correct

It is workflow governance for cooperative, tool-using agents.


Who needs this

  • Teams deploying agents that call real APIs — payment processing, infrastructure management, data pipelines, customer support
  • Teams that have been burned — the $47K loop, the accidental deletion, the budget overrun
  • Teams with structured workflows — explicit stages, irreversible actions, or required cross-step ordering
  • Teams that want to move faster — deploy agents with stronger workflow guarantees instead of manual review of every interaction

Next steps