Replay — Workflow Governance for Agent Reliability — Vesanor
replay() has two entry points today:
- Zero-config governance review when you pass only an API key
- Contract-driven workflow governance when you pass local contracts
Replay is built to make cooperative agent workflows more predictable, reviewable, and testable. If your LLM agent calls issue_refund when it shouldn't, the contract-driven replay() path can block it before execution. The zero-config path learns and reviews governance server-side, but it does not yet enforce the approved compiled_session locally in the SDK.
Quick start — zero-config review
No local contracts needed. One line of code starts the governance learning and review flow:
npm install @vesanor/replay
import OpenAI from "openai";
import { replay } from "@vesanor/replay";
const client = new OpenAI();
const session = replay(client, {
apiKey: process.env.VESANOR_API_KEY,
});
// Use session.client exactly like the original
const response = await session.client.chat.completions.create({
model: "gpt-4o-mini",
messages: [{ role: "user", content: "Help me with my order" }],
tools: myToolDefinitions,
});
Run your tests. Open the dashboard. Review the draft governance plan. Click Approve to freeze the current server-side snapshot.
Today this path is pass-through capture plus server-side review. It does not yet block tool calls locally.
Anthropic works the same way. Pass an Anthropic client instead of OpenAI — the SDK detects the provider automatically.
See Zero-Config Governance for the full walkthrough.
What replay() does in each mode
| Setup | What happens today |
|---|---|
| replay(client, { apiKey }) | Pass-through capture, server-side governance inference, dashboard review flow |
| replay(client, { contractsDir, mode: "enforce" }) | Local workflow governance with phases, preconditions, limits, and gating |
| replay(client, { contractsDir, apiKey, tools }) | Server-backed Govern mode with durable state and stronger evidence on the governed path |
When you use local contracts, every call goes through the enforcement pipeline:
- Narrow — remove tools the LLM should not see in the current phase
- Pre-check — enforce session limits before the LLM call
- Validate — check invariants, preconditions, forbidden tools, and phase transitions
- Gate — block or strip illegal tool calls from the response
- Finalize — update session state, advance phase, and record evidence
- Capture — emit a redacted capture for audit and observability
Runtime enforcement today (manual contracts)
If you need runtime blocking today for a structured workflow, write YAML contracts and pass contractsDir:
npm install @vesanor/replay @vesanor/contracts-core
const session = replay(client, {
contractsDir: "./contracts",
agent: "my-agent",
apiKey: process.env.VESANOR_API_KEY,
});
Each tool gets a YAML contract defining invariants, preconditions, phase restrictions, and more:
# contracts/lookup_customer.yaml
tool: lookup_customer
side_effect: read
evidence_class: local_transaction
commit_requirement: acknowledged
timeouts: { total_ms: 30000 }
retries: { max_attempts: 1, retry_on: [] }
rate_limits: { on_429: { respect_retry_after: true, max_sleep_seconds: 60 } }
assertions: { input_invariants: [], output_invariants: [] }
golden_cases: []
allowed_errors: []
# Phase restrictions (v3)
transitions:
valid_in_phases: [triage]
advances_to: customer_identified
# Argument validation (v2)
argument_value_invariants:
- path: "$.customer_email"
regex: "^.+@.+"
See Writing Tests for the full contract reference.
These contracts are best when your workflow has explicit stages, irreversible actions, or cross-step invariants that are hard to express safely in ad hoc application code.
Replay-specific contract fields
These fields are only used by the replay() enforcement pipeline:
| Field | Type | Description |
|---|---|---|
| side_effect | read \| write \| destructive \| admin \| financial | Tool risk classification |
| evidence_class | local_transaction \| ack_only \| unverifiable | Evidence strength required |
| commit_requirement | acknowledged \| none | When to commit the step |
| transitions | { valid_in_phases, advances_to } | Phase machine rules |
| preconditions | [{ requires_prior_tool, with_output }] | Cross-step dependencies |
| forbids_after | string[] | Tools that become illegal after this one executes |
| argument_value_invariants | [{ path, gte, lte, regex, ... }] | Runtime argument checks |
| gate | allow \| block | Override risk_defaults for this tool |
session.yaml
The session contract defines the phase machine, limits, and risk defaults for the entire agent session.
# contracts/session.yaml
schema_version: "1.0"
agent: refund-support-bot
phases:
- name: triage
initial: true
- name: customer_identified
- name: eligibility_checked
- name: refund_issued
- name: completed
terminal: true
transitions:
triage: [customer_identified]
customer_identified: [eligibility_checked]
eligibility_checked: [refund_issued]
refund_issued: [completed]
session_limits:
max_steps: 10
max_tool_calls: 8
max_cost_per_session: 1.00
risk_defaults:
destructive: block
write: block
provider_constraints:
openai:
warn_incompatible: [no_streaming_enforcement]
Phases
Phases are a state machine. Each tool declares which phases it's valid in (valid_in_phases) and which phase it advances to (advances_to). The SDK automatically:
- Narrows tools — the LLM only sees tools valid in the current phase
- Validates transitions — blocks tool calls that would cause an illegal phase transition
- Advances state — moves to the next phase after a tool executes
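The three behaviors above can be pictured with a small state machine. This is a minimal sketch, not the SDK's implementation: the ToolRule type and the narrowTools and applyToolCall helpers are hypothetical, and the phase rules for check_eligibility and issue_refund are assumed for illustration (only lookup_customer's contract appears in this document).

```typescript
// Hypothetical model of phase rules; field names mirror the contract fields.
type ToolRule = { name: string; validInPhases: string[]; advancesTo?: string };

// Allowed phase transitions, copied from the session.yaml example.
const transitions: Record<string, string[]> = {
  triage: ["customer_identified"],
  customer_identified: ["eligibility_checked"],
  eligibility_checked: ["refund_issued"],
  refund_issued: ["completed"],
};

// Only lookup_customer's rule comes from the document; the other two are assumed.
const toolRules: ToolRule[] = [
  { name: "lookup_customer", validInPhases: ["triage"], advancesTo: "customer_identified" },
  { name: "check_eligibility", validInPhases: ["customer_identified"], advancesTo: "eligibility_checked" },
  { name: "issue_refund", validInPhases: ["eligibility_checked"], advancesTo: "refund_issued" },
];

// Narrow: keep only the tools that are valid in the current phase.
function narrowTools(phase: string): string[] {
  return toolRules
    .filter((rule) => rule.validInPhases.some((p) => p === phase))
    .map((rule) => rule.name);
}

// Validate and advance: return the next phase, or throw on an illegal call.
function applyToolCall(phase: string, tool: string): string {
  const rule = toolRules.filter((r) => r.name === tool)[0];
  if (!rule || !rule.validInPhases.some((p) => p === phase)) {
    throw new Error(`${tool} is not valid in phase ${phase}`);
  }
  const next = rule.advancesTo;
  if (next === undefined) return phase; // this tool does not advance the phase
  const allowed = transitions[phase] ?? [];
  if (!allowed.some((p) => p === next)) {
    throw new Error(`illegal transition ${phase} -> ${next}`);
  }
  return next;
}
```

In this sketch, narrowTools("triage") leaves only lookup_customer visible, and calling issue_refund while still in triage throws instead of advancing.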
Session limits
| Field | Description |
|---|---|
| max_steps | Maximum LLM calls in the session |
| max_tool_calls | Maximum total tool calls across all steps |
| max_cost_per_session | Cost cap in dollars (computed from token usage) |
| max_calls_per_tool | Per-tool call limits: { tool_name: max } |
| loop_detection | { window, threshold }: block repeated identical calls |
| circuit_breaker | { consecutive_blocks, consecutive_errors }: auto-kill after N failures |
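The table does not spell out how loop_detection counts repeats, so the following is only one plausible reading, sketched under stated assumptions: a call counts as a loop when at least threshold identical calls (same tool name, same serialized arguments) already appear among the last window calls. The ToolCall type and the isLoop helper are hypothetical and not part of the SDK.

```typescript
type LoopDetection = { window: number; threshold: number };
type ToolCall = { name: string; args: unknown };

// Assumed identity rule: same tool name and same JSON-serialized arguments.
// Note that JSON.stringify is sensitive to key order, so { a: 1, b: 2 } and
// { b: 2, a: 1 } are treated as different calls in this sketch.
function callKey(call: ToolCall): string {
  return call.name + ":" + JSON.stringify(call.args);
}

// Returns true when `next` repeats at least `threshold` of the last `window` calls.
function isLoop(history: ToolCall[], next: ToolCall, cfg: LoopDetection): boolean {
  if (cfg.window <= 0) return false; // nothing to inspect
  const recent = history.slice(-cfg.window);
  const target = callKey(next);
  const repeats = recent.filter((call) => callKey(call) === target).length;
  return repeats >= cfg.threshold;
}
```

With { window: 5, threshold: 2 }, a third identical lookup in a row would be flagged under this reading, while the same tool called with different arguments would not.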
Risk defaults
risk_defaults maps side effect classifications to gate behavior:
risk_defaults:
write: block # tools with side_effect: write get effectiveGate: block
destructive: block # tools with side_effect: destructive get effectiveGate: block
When a tool has effectiveGate: block (from risk_defaults or an explicit gate: block on the contract), invariant failures on that tool are escalated — the tool is blocked even if the gate mode would otherwise allow it through. Tools that pass all invariants are allowed regardless of their effectiveGate setting.
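A short sketch of how effectiveGate could be resolved, assuming the precedence described above: an explicit gate on the contract wins, otherwise risk_defaults applies. The fallback to "allow" when neither is set, and the effectiveGate and isEscalated helper names, are assumptions of this example rather than documented SDK behavior.

```typescript
type SideEffect = "read" | "write" | "destructive" | "admin" | "financial";
type Gate = "allow" | "block";
type ToolContract = { tool: string; side_effect: SideEffect; gate?: Gate };
type RiskDefaults = Partial<Record<SideEffect, Gate>>;

// Explicit contract gate first, then risk_defaults, then an assumed "allow".
function effectiveGate(contract: ToolContract, defaults: RiskDefaults): Gate {
  return contract.gate ?? defaults[contract.side_effect] ?? "allow";
}

// Escalation applies only when invariants failed AND the effective gate is "block".
// A tool with zero invariant failures is never escalated.
function isEscalated(contract: ToolContract, defaults: RiskDefaults, failureCount: number): boolean {
  return failureCount > 0 && effectiveGate(contract, defaults) === "block";
}

const riskDefaults: RiskDefaults = { write: "block", destructive: "block" };
const lookup: ToolContract = { tool: "lookup_customer", side_effect: "read" };
const refund: ToolContract = { tool: "issue_refund", side_effect: "write" };
```

Here refund resolves to block through risk_defaults, so a failed invariant on it is escalated, while lookup stays at the assumed allow default.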
replay() options
const session = replay(client, {
// Required for contract-driven enforcement (omit for zero-config review)
contractsDir: "./contracts", // Path to contract YAML files + session.yaml
// Identity
agent: "my-agent", // Agent name for session tracking
sessionId: "custom-id", // Optional: override auto-generated session ID
principal: { role: "admin" }, // Optional: caller identity for policy evaluation
// Mode
mode: "enforce", // "enforce" | "shadow" | "log-only"
gate: "reject_all", // "reject_all" | "strip_partial" | "strip_blocked"
onError: "block", // "block" | "allow"
unmatchedPolicy: "block", // "block" | "allow" — for tools without a contract
// Tool executors (enables govern mode)
tools: {
lookup_customer: (args) => db.findCustomer(args.customer_email),
issue_refund: (args) => billing.refund(args.amount),
},
// Server connection (enables durable state)
apiKey: process.env.VESANOR_API_KEY,
runtimeUrl: "https://app.vesanor.com",
// State persistence
store: myCustomStore, // Optional: durable store for state + captures
// Callbacks
onBlock: (decision) => log.warn("blocked", decision),
onNarrow: (result) => log.info("narrowed", result.removed),
diagnostics: (event) => log.debug(event),
// Advanced
maxRetries: 2, // Retry blocked calls (max 5)
maxUnguardedCalls: 3, // Auto-kill after N unguarded errors
compatEnforcement: "protective", // "protective" | "advisory" (compat tier only)
captureLevel: "redacted", // Capture privacy tier
});
Concurrent sessions: Each replay() call wraps the provided client. If you need multiple concurrent sessions, create a separate client instance for each:
const sessionA = replay(new OpenAI(), { contractsDir: "./contracts", ... });
const sessionB = replay(new OpenAI(), { contractsDir: "./contracts", ... });
Session API
The replay() return value is a ReplaySession:
const session = replay(client, options);
// The wrapped client — use this for all LLM calls
session.client
// Flush pending captures to the server
await session.flush();
// Kill the session — all subsequent calls throw ReplayKillError
session.kill();
// Restore the original client (remove enforcement wrapper)
session.restore();
// Session state snapshot (redacted — safe to log)
const state = session.getState();
// Health check
const health = session.getHealth();
// Shadow delta (shadow mode only)
const delta = session.getLastShadowDelta();
// Manual tool filtering (v3)
session.narrow(["lookup_customer", "check_eligibility"]); // restrict
session.widen(); // restore
// Wrapped tool executors (when tools option provided)
const result = await session.tools.lookup_customer({ customer_email: "user@test" });
// result: { result: <executor output>, constraint_verdict: { passed, failures } }
getState()
Returns a redacted, JSON-serializable snapshot:
| Field | Type | Description |
|---|---|---|
| sessionId | string | Session identifier |
| agent | string \| null | Agent name from options |
| currentPhase | string \| null | Current phase in the state machine |
| stateVersion | number | Monotonically increasing state version |
| controlRevision | number | Increments on narrow/widen operations |
| totalStepCount | number | Total LLM calls made |
| totalToolCalls | number | Total tool calls across all steps |
| totalCost | number | Cost of committed steps only |
| actualCost | number | All LLM calls including blocked/retried (used by limits) |
| toolCallCounts | Record<string, number> | Per-tool call counts |
| forbiddenTools | string[] | Tools blocked by forbids_after |
| satisfiedPreconditions | Record<string, {}> | Preconditions met (values redacted) |
| killed | boolean | Whether kill() was called |
| totalBlockCount | number | Total enforcement blocks |
| consecutiveBlockCount | number | Consecutive blocks (resets on success) |
| totalUnguardedCalls | number | Calls that bypassed enforcement (degraded mode) |
| lastStep | CompletedStepSnapshot \| null | Details of the most recent step |
| lastNarrowing | NarrowingSnapshot \| null | Last tool narrowing from most recent create() call |
| startedAt | Date | Session creation timestamp |
| principal | null | Always null (redacted for safety) |
Principal is always redacted in getState(). If you passed principal: { role: "admin", secret: "..." }, the state returns principal: null. This prevents accidental leakage of identity data in logs or error reports.
getHealth()
| Field | Type | Description |
|---|---|---|
| status | healthy \| degraded \| inactive | Overall health |
| durability | server \| degraded-local \| inactive | Where state is persisted |
| authorityState | active \| advisory \| compromised \| recovering \| killed \| inactive | Server authority |
| protectionLevel | govern \| protect \| monitor | Current enforcement level |
| tier | strong \| compat | Enforcement tier |
| compatEnforcement | protective \| advisory | Compat tier behavior |
| killed | boolean | Session killed |
| bypass_detected | boolean | Direct calls on original client detected |
| cluster_detected | boolean | Whether clustering was detected in session |
| totalSteps | number | Total steps completed |
| totalBlocks | number | Total enforcement blocks |
| totalErrors | number | Total errors encountered |
| shadowEvaluations | number | Shadow-mode evaluations performed (always 0 in enforce/log-only) |
getLastShadowDelta()
Returns the most recent ShadowDelta when in shadow mode. Returns null in enforce/log-only modes or before the first call.
const session = replay(client, { mode: "shadow", ... });
await session.client.chat.completions.create({ ... });
const delta = session.getLastShadowDelta();
// delta.would_have_blocked — tool calls that would have been blocked
// delta.would_have_narrowed — tools that would have been removed by narrowing
Modes
Enforce (default)
Tool calls that violate contracts are blocked. The gate mode controls how:
| Gate | Behavior |
|---|---|
| reject_all | Any violation throws ReplayContractError |
| strip_partial | Remove blocked calls, throw if all blocked |
| strip_blocked | Remove blocked calls, return text-only if all blocked |
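The difference between the three gate modes is easiest to see side by side. The sketch below is a toy model of the table, not SDK code: the Verdict type and applyGate helper are hypothetical, a plain Error stands in for ReplayContractError, and an empty result stands in for a text-only response.

```typescript
type GateMode = "reject_all" | "strip_partial" | "strip_blocked";
type Verdict = { tool: string; blocked: boolean };

// Returns the names of the tool calls that survive gating, or throws.
function applyGate(mode: GateMode, verdicts: Verdict[]): string[] {
  const allowed = verdicts.filter((v) => !v.blocked).map((v) => v.tool);
  const anyBlocked = allowed.length < verdicts.length;

  if (!anyBlocked) return allowed; // nothing violated: every mode passes calls through
  if (mode === "reject_all") {
    throw new Error("contract violation"); // stand-in for ReplayContractError
  }
  if (mode === "strip_partial" && allowed.length === 0) {
    throw new Error("all tool calls blocked");
  }
  // strip_partial with survivors, or strip_blocked in any case.
  // An empty array here models the text-only response of strip_blocked.
  return allowed;
}
```

Under this model, one blocked call out of two is fatal for reject_all, is silently removed by both strip modes, and the modes only diverge again when every call in the response is blocked.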
Shadow
Evaluates all rules but never blocks. The response is returned unmodified. State is not mutated — totalStepCount stays at 0. A ShadowDelta is captured showing what would have happened:
const session = replay(client, {
contractsDir: "./contracts",
mode: "shadow",
tools: executors,
});
// Call succeeds even if it violates contracts
const response = await session.client.chat.completions.create({ ... });
// Check what would have happened in enforce mode
const delta = session.getLastShadowDelta();
console.log(delta?.would_have_blocked); // tool calls that would be blocked
console.log(delta?.would_have_narrowed); // tools that would be removed
Log-only
Captures tool calls for observability but performs no validation.
Protection levels
The SDK determines the protection level from your configuration:
| Level | Requirements | Server state | Local enforcement |
|---|---|---|---|
| Govern | mode: "enforce" + apiKey + all state-bearing tools in tools map | Durable (server round-trip per call) | Full |
| Protect | mode: "enforce" + tools map (no apiKey, or not all tools wrapped) | None | Full |
| Monitor | mode: "shadow" or mode: "log-only" | None | Evaluate only |
A tool is state-bearing if it has commit_requirement, transitions, execution_constraints, or forbids_after in its contract.
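As a rough illustration of the table above, the level can be derived from three inputs: the mode, whether an API key is present, and whether every state-bearing tool has an executor. The LevelInputs type and protectionLevel helper are hypothetical. The table does not say what happens in enforce mode with no tools map at all; this sketch assumes that case falls back to Protect.

```typescript
type Mode = "enforce" | "shadow" | "log-only";
type Level = "govern" | "protect" | "monitor";

interface LevelInputs {
  mode: Mode;
  apiKey?: string;
  wrappedTools: string[];      // keys of the tools map passed to replay()
  stateBearingTools: string[]; // tools whose contracts carry state-bearing fields
}

function protectionLevel(cfg: LevelInputs): Level {
  // shadow and log-only never enforce, so they are always Monitor.
  if (cfg.mode !== "enforce") return "monitor";

  // Govern needs an API key and an executor for every state-bearing tool.
  const allWrapped = cfg.stateBearingTools.every(
    (tool) => cfg.wrappedTools.indexOf(tool) !== -1
  );
  const hasApiKey = cfg.apiKey !== undefined && cfg.apiKey.length > 0;
  return hasApiKey && allWrapped ? "govern" : "protect";
}
```

The practical consequence is that forgetting a single executor for a state-bearing tool quietly drops the session from Govern to Protect, which is worth confirming via getHealth().protectionLevel.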
Govern mode — server round-trip
In govern mode, every LLM call goes through a durable server round-trip:
- Preflight — register the request with the server before calling the LLM
- Proposal — submit the LLM response for server-side evaluation
- Receipt — record execution evidence after tool execution
This creates a durable governed record with database-backed state for the wrapped path.
Graceful degradation
If the server becomes unreachable, govern mode degrades to local enforcement automatically. No crashes, no data loss — enforcement continues locally until the server recovers.
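Because degradation is silent by design, it can help to check for it explicitly. The helper below is a small sketch that relies only on the status and durability fields listed under getHealth(); the HealthSubset type and the isRunningDegraded name are hypothetical.

```typescript
// Subset of the getHealth() fields listed in this document.
type HealthSubset = {
  status: "healthy" | "degraded" | "inactive";
  durability: "server" | "degraded-local" | "inactive";
};

// True when the session reports degraded status or has fallen back to local state.
function isRunningDegraded(health: HealthSubset): boolean {
  return health.status === "degraded" || health.durability === "degraded-local";
}

// Hypothetical usage with a live session:
//   if (isRunningDegraded(session.getHealth())) {
//     console.warn("Vesanor server unreachable; enforcing locally");
//   }
```

A check like this could run after each step, or on a timer, to surface the fallback in your own logs or alerts.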
Error types
import { ReplayContractError, ReplayKillError } from "@vesanor/replay";
try {
await session.client.chat.completions.create({ ... });
} catch (err) {
if (err instanceof ReplayKillError) {
// Session was killed — cannot make more calls
console.log(err.sessionId, err.killedAt);
}
if (err instanceof ReplayContractError) {
// Tool call blocked by enforcement
console.log(err.decision); // ReplayDecision with blocked tool details
console.log(err.failures); // ContractFailure[] with specific violations
}
}
Preconditions
Preconditions enforce ordering between tools:
# contracts/issue_refund.yaml
tool: issue_refund
preconditions:
- requires_prior_tool: check_eligibility
with_output:
- path: "$.eligible"
equals: true
forbids_after: [issue_refund] # Prevent double refund
- requires_prior_tool: this tool can only be called after the named tool has executed
- with_output: assertions on the prior tool's output (path + equals)
- forbids_after: tools that become illegal after this tool executes
The SDK tracks precondition satisfaction across steps. The with_output mechanism extracts values from tool result messages in the conversation and stores them for later assertion.
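To make the with_output check concrete, here is a minimal sketch of evaluating preconditions against recorded tool outputs. It is not the SDK's evaluator: resolvePath and preconditionsMet are hypothetical helpers, only simple dot paths such as $.eligible are supported, and equality is a strict === comparison, so it only suits primitive values.

```typescript
type OutputAssertion = { path: string; equals: unknown };
type Precondition = { requires_prior_tool: string; with_output?: OutputAssertion[] };

// Resolve a simple "$.a.b" path. Array indexes and wildcards are not handled.
function resolvePath(value: unknown, path: string): unknown {
  const keys = path.replace(/^\$\.?/, "").split(".").filter((k) => k.length > 0);
  let current: unknown = value;
  for (const key of keys) {
    if (typeof current !== "object" || current === null) return undefined;
    current = (current as Record<string, unknown>)[key];
  }
  return current;
}

// priorOutputs maps a tool name to the last recorded output of that tool.
function preconditionsMet(
  preconditions: Precondition[],
  priorOutputs: Record<string, unknown>
): boolean {
  return preconditions.every((pre) => {
    // The prior tool must have executed at all.
    if (!(pre.requires_prior_tool in priorOutputs)) return false;
    const output = priorOutputs[pre.requires_prior_tool];
    // Every with_output assertion must match the recorded output.
    return (pre.with_output ?? []).every(
      (assertion) => resolvePath(output, assertion.path) === assertion.equals
    );
  });
}

// Mirrors the issue_refund contract above.
const refundPreconditions: Precondition[] = [
  { requires_prior_tool: "check_eligibility", with_output: [{ path: "$.eligible", equals: true }] },
];
```

In this model issue_refund stays blocked both when check_eligibility never ran and when it ran but returned eligible: false.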
Captures and redaction
Every enforcement decision is captured together with its enforcement metadata. Before storage, each capture passes through the SDK-side redaction pipeline:
- API keys (sk-proj-..., sk-ant-...) are replaced with [REDACTED]
- Bearer tokens, PEM keys, and connection strings are redacted
- Email addresses in captures are redacted
If redaction fails, the capture is dropped entirely (fail-closed).
Captures do not include raw request messages. Only tool names, tool call arguments, and enforcement metadata are captured. Your prompt content stays local.
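The fail-closed rule can be sketched in a few lines. The regular expressions below are illustrative guesses at what such patterns might look like, not the SDK's actual rules, and they cover only API keys, bearer tokens, and email addresses. The redact and prepareCapture helpers are hypothetical.

```typescript
// Illustrative patterns only; the SDK's real redaction rules are not shown in this document.
const REDACTION_PATTERNS: RegExp[] = [
  /sk-(?:proj|ant)-[A-Za-z0-9_-]+/g,     // API keys with the prefixes named above
  /Bearer\s+[A-Za-z0-9._~+\/-]+=*/g,      // bearer tokens
  /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+/g,   // email addresses
];

// Replace every match of every pattern with a fixed marker.
function redact(text: string): string {
  return REDACTION_PATTERNS.reduce(
    (acc, pattern) => acc.replace(pattern, "[REDACTED]"),
    text
  );
}

// Fail-closed: if serialization or redaction throws, return null so the
// capture is dropped instead of being stored unredacted.
function prepareCapture(capture: unknown): string | null {
  try {
    return redact(JSON.stringify(capture));
  } catch (e) {
    return null;
  }
}
```

The key design choice is in prepareCapture: an error never results in the raw capture being forwarded, at the cost of occasionally losing an audit record.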
Full example — multi-phase agent
import OpenAI from "openai";
import { replay } from "@vesanor/replay";
const client = new OpenAI();
// Tool executors — your real business logic
const tools = {
lookup_customer: async (args) => {
const customer = await db.customers.findByEmail(args.customer_email);
return { customer_id: customer.id, name: customer.name };
},
check_eligibility: async (args) => {
const order = await db.orders.find(args.order_id);
return { eligible: order.status === "delivered", reason: order.status };
},
issue_refund: async (args) => {
const refund = await billing.createRefund(args.customer_id, args.amount);
return { refund_id: refund.id, status: "processed" };
},
};
const session = replay(client, {
contractsDir: "./contracts",
agent: "refund-bot",
apiKey: process.env.VESANOR_API_KEY,
tools,
principal: { role: "support", team: "tier-1" },
});
// Step 1: triage phase — only lookup_customer is visible
const r1 = await session.client.chat.completions.create({
model: "gpt-4o-mini",
messages: [{ role: "user", content: "Look up customer user1@test" }],
tools: allToolDefs,
tool_choice: "required",
});
// SDK narrowed tools to [lookup_customer], phase advances to customer_identified
// Execute the tool
const tc1 = r1.choices[0].message.tool_calls[0];
const result1 = await session.tools.lookup_customer(JSON.parse(tc1.function.arguments));
// Step 2: customer_identified phase — only check_eligibility is visible
const r2 = await session.client.chat.completions.create({
model: "gpt-4o-mini",
messages: [
{ role: "user", content: "Look up customer user1@test" },
{ role: "assistant", tool_calls: r1.choices[0].message.tool_calls },
{ role: "tool", tool_call_id: tc1.id, content: JSON.stringify(result1.result) },
{ role: "user", content: "Check eligibility for order ORD-123" },
],
tools: allToolDefs,
tool_choice: "required",
});
// Precondition satisfied: lookup_customer was called
// Phase advances to eligibility_checked
// Check state at any time
console.log(session.getState().currentPhase); // "eligibility_checked"
console.log(session.getState().toolCallCounts); // { lookup_customer: 1, check_eligibility: 1 }
console.log(session.getState().forbiddenTools); // []
// When done
await session.flush();
session.restore();
Next steps
- Replay Quickstart: get replay() running in 5 minutes
- Protection Levels: Monitor, Protect, Govern explained
- Contract Cookbook: every contract field with examples
- Phases & Transitions: design your agent's state machine
- Workflow Governance: multi-session coordination
- API Reference: complete SDK types and options
- Writing Tests: full contract YAML reference for CI
- Promoting Contracts: observe, promote, enforce workflow