Replay — Workflow Governance for Agent Reliability — Vesanor

replay() has two entry points today:

  • Zero-config governance review when you pass only an API key
  • Contract-driven workflow governance when you pass local contracts

Replay is built to make cooperative agent workflows more predictable, reviewable, and testable. If your LLM agent calls issue_refund when it shouldn't, the contract-driven replay() path can block it before execution. The zero-config path learns and reviews governance server-side, but it does not yet enforce the approved compiled_session locally in the SDK.


Quick start — zero-config review

No local contracts needed. One line of code starts the governance learning and review flow:

npm install @vesanor/replay

import OpenAI from "openai";
import { replay } from "@vesanor/replay";

const client = new OpenAI();

const session = replay(client, {
  apiKey: process.env.VESANOR_API_KEY,
});

// Use session.client exactly like the original
const response = await session.client.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [{ role: "user", content: "Help me with my order" }],
  tools: myToolDefinitions,
});

Run your tests. Open the dashboard. Review the draft governance plan. Click Approve to freeze the current server-side snapshot.

Today this path is pass-through capture plus server-side review. It does not yet block tool calls locally.

Anthropic works the same way. Pass an Anthropic client instead of OpenAI — the SDK detects the provider automatically.

See Zero-Config Governance for the full walkthrough.


What replay() does in each mode

Setup                                              What happens today
replay(client, { apiKey })                         Pass-through capture, server-side governance inference, dashboard review flow
replay(client, { contractsDir, mode: "enforce" })  Local workflow governance with phases, preconditions, limits, and gating
replay(client, { contractsDir, apiKey, tools })    Server-backed Govern mode with durable state and stronger evidence on the governed path

When you use local contracts, every call goes through the enforcement pipeline:

  1. Narrow — remove tools the LLM should not see in the current phase
  2. Pre-check — enforce session limits before the LLM call
  3. Validate — check invariants, preconditions, forbidden tools, and phase transitions
  4. Gate — block or strip illegal tool calls from the response
  5. Finalize — update session state, advance phase, and record evidence
  6. Capture — emit a redacted capture for audit and observability

Runtime enforcement today (manual contracts)

If you need runtime blocking today for a structured workflow, write YAML contracts and pass contractsDir:

npm install @vesanor/replay @vesanor/contracts-core

const session = replay(client, {
  contractsDir: "./contracts",
  agent: "my-agent",
  apiKey: process.env.VESANOR_API_KEY,
});

Each tool gets a YAML contract defining invariants, preconditions, phase restrictions, and more:

# contracts/lookup_customer.yaml
tool: lookup_customer
side_effect: read
evidence_class: local_transaction
commit_requirement: acknowledged

timeouts: { total_ms: 30000 }
retries: { max_attempts: 1, retry_on: [] }
rate_limits: { on_429: { respect_retry_after: true, max_sleep_seconds: 60 } }
assertions: { input_invariants: [], output_invariants: [] }
golden_cases: []
allowed_errors: []

# Phase restrictions (v3)
transitions:
  valid_in_phases: [triage]
  advances_to: customer_identified

# Argument validation (v2)
argument_value_invariants:
  - path: "$.customer_email"
    regex: "^.+@.+"

See Writing Tests for the full contract reference.

These contracts are best when your workflow has explicit stages, irreversible actions, or cross-step invariants that are hard to express safely in ad hoc application code.
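As a rough illustration of what an argument_value_invariants rule checks, the sketch below evaluates regex, gte, and lte rules against a tool call's arguments. It is not the SDK's implementation: checkInvariants, the ArgInvariant type, and the top-level-only handling of "$.field" paths are assumptions made for this example.

```typescript
// Illustrative only: a simplified evaluator for argument_value_invariants.
// Assumption: paths are top-level "$.field" references.
type ArgInvariant = { path: string; regex?: string; gte?: number; lte?: number };

function checkInvariants(args: Record<string, unknown>, invariants: ArgInvariant[]): string[] {
  const failures: string[] = [];
  for (const inv of invariants) {
    const key = inv.path.replace(/^\$\./, ""); // "$.customer_email" -> "customer_email"
    const value = args[key];
    if (inv.regex !== undefined && !(typeof value === "string" && new RegExp(inv.regex).test(value))) {
      failures.push(`${inv.path} does not match ${inv.regex}`);
    }
    if (inv.gte !== undefined && !(typeof value === "number" && value >= inv.gte)) {
      failures.push(`${inv.path} must be >= ${inv.gte}`);
    }
    if (inv.lte !== undefined && !(typeof value === "number" && value <= inv.lte)) {
      failures.push(`${inv.path} must be <= ${inv.lte}`);
    }
  }
  return failures;
}

// The rule from contracts/lookup_customer.yaml above.
const emailInvariant: ArgInvariant[] = [{ path: "$.customer_email", regex: "^.+@.+" }];

const okFailures = checkInvariants({ customer_email: "user1@test" }, emailInvariant);
const badFailures = checkInvariants({ customer_email: "not-an-email" }, emailInvariant);
```

Here okFailures is empty, while badFailures holds one message because the value contains no "@". Returning a list of failures rather than a boolean mirrors the way the document describes violations being reported alongside a blocked call.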

Replay-specific contract fields

These fields are only used by the replay() enforcement pipeline:

Field                      Type                                            Description
side_effect                read | write | destructive | admin | financial  Tool risk classification
evidence_class             local_transaction | ack_only | unverifiable     Evidence strength required
commit_requirement         acknowledged | none                             When to commit the step
transitions                { valid_in_phases, advances_to }                Phase machine rules
preconditions              [{ requires_prior_tool, with_output }]          Cross-step dependencies
forbids_after              string[]                                        Tools that become illegal after this one executes
argument_value_invariants  [{ path, gte, lte, regex, ... }]                Runtime argument checks
gate                       allow | block                                   Override risk_defaults for this tool

session.yaml

The session contract defines the phase machine, limits, and risk defaults for the entire agent session.

# contracts/session.yaml
schema_version: "1.0"
agent: refund-support-bot

phases:
  - name: triage
    initial: true
  - name: customer_identified
  - name: eligibility_checked
  - name: refund_issued
  - name: completed
    terminal: true

transitions:
  triage: [customer_identified]
  customer_identified: [eligibility_checked]
  eligibility_checked: [refund_issued]
  refund_issued: [completed]

session_limits:
  max_steps: 10
  max_tool_calls: 8
  max_cost_per_session: 1.00

risk_defaults:
  destructive: block
  write: block

provider_constraints:
  openai:
    warn_incompatible: [no_streaming_enforcement]

Phases

Phases are a state machine. Each tool declares which phases it's valid in (valid_in_phases) and which phase it advances to (advances_to). The SDK automatically:

  • Narrows tools — the LLM only sees tools valid in the current phase
  • Validates transitions — blocks tool calls that would cause an illegal phase transition
  • Advances state — moves to the next phase after a tool executes
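The three behaviors above can be modeled in a few lines. The following is a minimal sketch of the idea, not the SDK's internals: narrowTools, advance, and the rules for check_eligibility and issue_refund are hypothetical (only lookup_customer's transitions appear in the contract shown earlier).

```typescript
// Illustrative model of the phase machine; names and structure are assumptions.
type ToolPhaseRule = { valid_in_phases: string[]; advances_to?: string };

// Allowed phase transitions, as in session.yaml above.
const sessionTransitions: Record<string, string[]> = {
  triage: ["customer_identified"],
  customer_identified: ["eligibility_checked"],
  eligibility_checked: ["refund_issued"],
  refund_issued: ["completed"],
};

// Hypothetical per-tool rules (only lookup_customer's are taken from the document).
const toolRules: Record<string, ToolPhaseRule> = {
  lookup_customer: { valid_in_phases: ["triage"], advances_to: "customer_identified" },
  check_eligibility: { valid_in_phases: ["customer_identified"], advances_to: "eligibility_checked" },
  issue_refund: { valid_in_phases: ["eligibility_checked"], advances_to: "refund_issued" },
};

// Narrow: keep only the tools that are valid in the current phase.
function narrowTools(phase: string): string[] {
  return Object.entries(toolRules)
    .filter(([, rule]) => rule.valid_in_phases.includes(phase))
    .map(([name]) => name);
}

// Validate and advance: return the next phase, or throw on an illegal call.
function advance(phase: string, tool: string): string {
  const rule: ToolPhaseRule | undefined = toolRules[tool];
  if (rule === undefined || !rule.valid_in_phases.includes(phase)) {
    throw new Error(`${tool} is not valid in phase ${phase}`);
  }
  const next = rule.advances_to;
  if (next === undefined) return phase; // tool does not move the phase
  if (!(sessionTransitions[phase] ?? []).includes(next)) {
    throw new Error(`illegal transition ${phase} -> ${next}`);
  }
  return next;
}

const visibleInTriage = narrowTools("triage");
const afterLookup = advance("triage", "lookup_customer");
```

With these rules, visibleInTriage contains only lookup_customer and afterLookup is "customer_identified". Calling advance("triage", "issue_refund") throws, which corresponds to the blocked-call case.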

Session limits

Field                 Description
max_steps             Maximum LLM calls in the session
max_tool_calls        Maximum total tool calls across all steps
max_cost_per_session  Cost cap in dollars (computed from token usage)
max_calls_per_tool    Per-tool call limits: { tool_name: max }
loop_detection        { window, threshold }: block repeated identical calls
circuit_breaker       { consecutive_blocks, consecutive_errors }: auto-kill after N failures
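To show how such limits might be checked before an LLM call, here is a small sketch. It is an illustration rather than the SDK's code: exhaustedLimits and SessionCounters are hypothetical names, and treating a limit as hit once the counter reaches it (>=) is an assumption.

```typescript
// Illustrative pre-check run before each LLM call; not the SDK's implementation.
interface SessionLimits {
  max_steps?: number;
  max_tool_calls?: number;
  max_cost_per_session?: number; // dollars
  max_calls_per_tool?: Record<string, number>;
}

interface SessionCounters {
  steps: number;
  toolCalls: number;
  cost: number; // dollars spent so far
  perTool: Record<string, number>;
}

// Returns the name of every limit that is already exhausted.
// Assumption: a limit counts as hit once the counter reaches it (>=).
function exhaustedLimits(limits: SessionLimits, c: SessionCounters): string[] {
  const hit: string[] = [];
  if (limits.max_steps !== undefined && c.steps >= limits.max_steps) hit.push("max_steps");
  if (limits.max_tool_calls !== undefined && c.toolCalls >= limits.max_tool_calls) {
    hit.push("max_tool_calls");
  }
  if (limits.max_cost_per_session !== undefined && c.cost >= limits.max_cost_per_session) {
    hit.push("max_cost_per_session");
  }
  for (const [tool, max] of Object.entries(limits.max_calls_per_tool ?? {})) {
    if ((c.perTool[tool] ?? 0) >= max) hit.push(`max_calls_per_tool.${tool}`);
  }
  return hit;
}

const limits: SessionLimits = {
  max_steps: 10,
  max_tool_calls: 8,
  max_cost_per_session: 1.0,
  max_calls_per_tool: { issue_refund: 1 },
};
const counters: SessionCounters = { steps: 3, toolCalls: 8, cost: 0.4, perTool: { issue_refund: 1 } };

const exhausted = exhaustedLimits(limits, counters);
```

In this example exhausted lists max_tool_calls and max_calls_per_tool.issue_refund, so a pre-check would stop the session before another LLM call is made.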

Risk defaults

risk_defaults maps side effect classifications to gate behavior:

risk_defaults:
  write: block        # tools with side_effect: write get effectiveGate: block
  destructive: block  # tools with side_effect: destructive get effectiveGate: block

When a tool has effectiveGate: block (from risk_defaults or an explicit gate: block on the contract), invariant failures on that tool are escalated — the tool is blocked even if the gate mode would otherwise allow it through. Tools that pass all invariants are allowed regardless of their effectiveGate setting.
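The resolution order described above (an explicit gate on the contract overrides risk_defaults) can be sketched as follows. This is a hedged illustration: the effectiveGate function, the fallback to "allow" when no default is configured, and the tool names update_order and add_note are assumptions, not documented SDK behavior.

```typescript
// Illustrative resolution of a tool's effective gate; not the SDK's implementation.
type Gate = "allow" | "block";
type SideEffect = "read" | "write" | "destructive" | "admin" | "financial";

interface ToolContract {
  tool: string;
  side_effect: SideEffect;
  gate?: Gate; // explicit per-tool override
}

// An explicit gate on the contract wins; otherwise fall back to risk_defaults.
// Assumption: side effects with no risk_defaults entry resolve to "allow".
function effectiveGate(
  contract: ToolContract,
  riskDefaults: Partial<Record<SideEffect, Gate>>,
): Gate {
  return contract.gate ?? riskDefaults[contract.side_effect] ?? "allow";
}

// The risk_defaults from session.yaml above.
const riskDefaults: Partial<Record<SideEffect, Gate>> = { destructive: "block", write: "block" };

const readGate = effectiveGate({ tool: "lookup_customer", side_effect: "read" }, riskDefaults);
const writeGate = effectiveGate({ tool: "update_order", side_effect: "write" }, riskDefaults);
const overriddenGate = effectiveGate(
  { tool: "add_note", side_effect: "write", gate: "allow" },
  riskDefaults,
);
```

Under these assumptions readGate is "allow", writeGate is "block", and overriddenGate is "allow" because the contract-level gate takes precedence over the session default.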


replay() options

const session = replay(client, {
  // Required for contract-driven enforcement (omit for zero-config review)
  contractsDir: "./contracts", // Path to contract YAML files + session.yaml

  // Identity
  agent: "my-agent", // Agent name for session tracking
  sessionId: "custom-id", // Optional: override auto-generated session ID
  principal: { role: "admin" }, // Optional: caller identity for policy evaluation

  // Mode
  mode: "enforce", // "enforce" | "shadow" | "log-only"
  gate: "reject_all", // "reject_all" | "strip_partial" | "strip_blocked"
  onError: "block", // "block" | "allow"
  unmatchedPolicy: "block", // "block" | "allow", applies to tools without a contract

  // Tool executors (enables govern mode)
  tools: {
    lookup_customer: (args) => db.findCustomer(args.customer_email),
    issue_refund: (args) => billing.refund(args.amount),
  },

  // Server connection (enables durable state)
  apiKey: process.env.VESANOR_API_KEY,
  runtimeUrl: "https://app.vesanor.com",

  // State persistence
  store: myCustomStore, // Optional: durable store for state + captures

  // Callbacks
  onBlock: (decision) => log.warn("blocked", decision),
  onNarrow: (result) => log.info("narrowed", result.removed),
  diagnostics: (event) => log.debug(event),

  // Advanced
  maxRetries: 2, // Retry blocked calls (max 5)
  maxUnguardedCalls: 3, // Auto-kill after N unguarded errors
  compatEnforcement: "protective", // "protective" | "advisory" (compat tier only)
  captureLevel: "redacted", // Capture privacy tier
});

Concurrent sessions: Each replay() call wraps the provided client. If you need multiple concurrent sessions, create a separate client instance for each:

const sessionA = replay(new OpenAI(), { contractsDir: "./contracts", ... });
const sessionB = replay(new OpenAI(), { contractsDir: "./contracts", ... });

Session API

The replay() return value is a ReplaySession:

const session = replay(client, options);

// The wrapped client — use this for all LLM calls
session.client

// Flush pending captures to the server
await session.flush();

// Kill the session — all subsequent calls throw ReplayKillError
session.kill();

// Restore the original client (remove enforcement wrapper)
session.restore();

// Session state snapshot (redacted — safe to log)
const state = session.getState();

// Health check
const health = session.getHealth();

// Shadow delta (shadow mode only)
const delta = session.getLastShadowDelta();

// Manual tool filtering (v3)
session.narrow(["lookup_customer", "check_eligibility"]); // restrict
session.widen(); // restore

// Wrapped tool executors (when tools option provided)
const result = await session.tools.lookup_customer({ customer_email: "user@test" });
// result: { result: <executor output>, constraint_verdict: { passed, failures } }

getState()

Returns a redacted, JSON-serializable snapshot:

Field                   Type                          Description
sessionId               string                        Session identifier
agent                   string | null                 Agent name from options
currentPhase            string | null                 Current phase in the state machine
stateVersion            number                        Monotonically increasing state version
controlRevision         number                        Increments on narrow/widen operations
totalStepCount          number                        Total LLM calls made
totalToolCalls          number                        Total tool calls across all steps
totalCost               number                        Cost of committed steps only
actualCost              number                        All LLM calls including blocked/retried (used by limits)
toolCallCounts          Record<string, number>        Per-tool call counts
forbiddenTools          string[]                      Tools blocked by forbids_after
satisfiedPreconditions  Record<string, {}>            Preconditions met (values redacted)
killed                  boolean                       Whether kill() was called
totalBlockCount         number                        Total enforcement blocks
consecutiveBlockCount   number                        Consecutive blocks (resets on success)
totalUnguardedCalls     number                        Calls that bypassed enforcement (degraded mode)
lastStep                CompletedStepSnapshot | null  Details of the most recent step
lastNarrowing           NarrowingSnapshot | null      Last tool narrowing from most recent create() call
startedAt               Date                          Session creation timestamp
principal               null                          Always null (redacted for safety)

Principal is always redacted in getState(). If you passed principal: { role: "admin", secret: "..." }, the state returns principal: null. This prevents accidental leakage of identity data in logs or error reports.

getHealth()

Field              Type                                                              Description
status             healthy | degraded | inactive                                     Overall health
durability         server | degraded-local | inactive                                Where state is persisted
authorityState     active | advisory | compromised | recovering | killed | inactive  Server authority
protectionLevel    govern | protect | monitor                                        Current enforcement level
tier               strong | compat                                                   Enforcement tier
compatEnforcement  protective | advisory                                             Compat tier behavior
killed             boolean                                                           Session killed
bypass_detected    boolean                                                           Direct calls on original client detected
cluster_detected   boolean                                                           Whether clustering was detected in session
totalSteps         number                                                            Total steps completed
totalBlocks        number                                                            Total enforcement blocks
totalErrors        number                                                            Total errors encountered
shadowEvaluations  number                                                            Shadow-mode evaluations performed (always 0 in enforce/log-only)

getLastShadowDelta()

Returns the most recent ShadowDelta when in shadow mode. Returns null in enforce/log-only modes or before the first call.

const session = replay(client, { mode: "shadow", ... });
await session.client.chat.completions.create({ ... });

const delta = session.getLastShadowDelta();
// delta.would_have_blocked — tool calls that would have been blocked
// delta.would_have_narrowed — tools that would have been removed by narrowing

Modes

Enforce (default)

Tool calls that violate contracts are blocked. The gate mode controls how:

Gate           Behavior
reject_all     Any violation throws ReplayContractError
strip_partial  Remove blocked calls, throw if all blocked
strip_blocked  Remove blocked calls, return text-only if all blocked
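The difference between the three gate modes is easiest to see side by side. The sketch below is a simplified model, not the SDK's code: applyGate, EvaluatedCall, and ContractViolation (a stand-in for ReplayContractError) are hypothetical, and an empty result list is used here to represent a text-only response.

```typescript
// Illustrative model of the three gate modes; names are assumptions.
type GateMode = "reject_all" | "strip_partial" | "strip_blocked";

interface EvaluatedCall {
  name: string;
  blocked: boolean; // true if validation found a contract violation
}

// Stand-in for ReplayContractError in this sketch.
class ContractViolation extends Error {}

function applyGate(mode: GateMode, calls: EvaluatedCall[]): EvaluatedCall[] {
  const allowed = calls.filter((call) => !call.blocked);
  if (allowed.length === calls.length) return calls; // nothing violated
  if (mode === "reject_all") {
    throw new ContractViolation("at least one tool call violated a contract");
  }
  if (mode === "strip_partial" && allowed.length === 0) {
    throw new ContractViolation("every tool call was blocked");
  }
  // strip_partial with survivors, or strip_blocked
  // (an empty list models a text-only response).
  return allowed;
}

const proposed: EvaluatedCall[] = [
  { name: "lookup_customer", blocked: false },
  { name: "issue_refund", blocked: true },
];

const survivors = applyGate("strip_partial", proposed);
```

For the mixed response above, strip_partial and strip_blocked both return only lookup_customer, while reject_all throws. The two strip modes differ only when every call is blocked.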

Shadow

Evaluates all rules but never blocks. The response is returned unmodified. State is not mutated — totalStepCount stays at 0. A ShadowDelta is captured showing what would have happened:

const session = replay(client, {
  contractsDir: "./contracts",
  mode: "shadow",
  tools: executors,
});

// Call succeeds even if it violates contracts
const response = await session.client.chat.completions.create({ ... });

// Check what would have happened in enforce mode
const delta = session.getLastShadowDelta();
console.log(delta?.would_have_blocked); // tool calls that would be blocked
console.log(delta?.would_have_narrowed); // tools that would be removed

Log-only

Captures tool calls for observability but performs no validation.


Protection levels

The SDK determines the protection level from your configuration:

Level    Requirements                                                       Server state                          Local enforcement
Govern   mode: "enforce" + apiKey + all state-bearing tools in tools map    Durable (server round-trip per call)  Full
Protect  mode: "enforce" + tools map (no apiKey, or not all tools wrapped)  None                                  Full
Monitor  mode: "shadow" or mode: "log-only"                                 None                                  Evaluate only

A tool is state-bearing if it has commit_requirement, transitions, execution_constraints, or forbids_after in its contract.
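Read together, the table and the state-bearing definition amount to a small decision procedure. The sketch below is one possible reading, offered as an illustration only: protectionLevel and isStateBearing are hypothetical helpers, get_time is a made-up tool, and returning "protect" for every enforce-mode configuration that does not qualify for Govern is an assumption.

```typescript
// Illustrative decision procedure for the protection level; not the SDK's code.
type Mode = "enforce" | "shadow" | "log-only";
type ProtectionLevel = "govern" | "protect" | "monitor";

// Contract fields that make a tool state-bearing, per the definition above.
const STATE_FIELDS = ["commit_requirement", "transitions", "execution_constraints", "forbids_after"];

function isStateBearing(contract: Record<string, unknown>): boolean {
  return STATE_FIELDS.some((field) => contract[field] !== undefined);
}

function protectionLevel(
  mode: Mode,
  hasApiKey: boolean,
  contracts: Record<string, Record<string, unknown>>,
  wrappedTools: string[],
): ProtectionLevel {
  if (mode !== "enforce") return "monitor";
  // Govern requires every state-bearing tool to have an executor in the tools map.
  const allWrapped = Object.entries(contracts)
    .filter(([, contract]) => isStateBearing(contract))
    .every(([name]) => wrappedTools.includes(name));
  return hasApiKey && allWrapped ? "govern" : "protect";
}

const contracts: Record<string, Record<string, unknown>> = {
  lookup_customer: { transitions: { valid_in_phases: ["triage"] } },
  issue_refund: { forbids_after: ["issue_refund"] },
  get_time: {}, // hypothetical tool with no state-bearing fields
};

const fullyWrapped = protectionLevel("enforce", true, contracts, ["lookup_customer", "issue_refund"]);
const partlyWrapped = protectionLevel("enforce", true, contracts, ["lookup_customer"]);
const shadowLevel = protectionLevel("shadow", true, contracts, ["lookup_customer", "issue_refund"]);
```

Here fullyWrapped is "govern", partlyWrapped is "protect" because issue_refund has no executor, and shadowLevel is "monitor" regardless of the other settings.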

Govern mode — server round-trip

In govern mode, every LLM call goes through a durable server round-trip:

  1. Preflight — register the request with the server before calling the LLM
  2. Proposal — submit the LLM response for server-side evaluation
  3. Receipt — record execution evidence after tool execution

This creates a durable governed record with database-backed state for the wrapped path.

Graceful degradation

If the server becomes unreachable, govern mode degrades to local enforcement automatically. No crashes, no data loss — enforcement continues locally until the server recovers.


Error types

import { ReplayContractError, ReplayKillError } from "@vesanor/replay";

try {
  await session.client.chat.completions.create({ ... });
} catch (err) {
  if (err instanceof ReplayKillError) {
    // Session was killed, so no further calls are possible
    console.log(err.sessionId, err.killedAt);
  } else if (err instanceof ReplayContractError) {
    // Tool call blocked by enforcement
    console.log(err.decision); // ReplayDecision with blocked tool details
    console.log(err.failures); // ContractFailure[] with specific violations
  } else {
    throw err; // Not a Replay error: let it propagate
  }
}

Preconditions

Preconditions enforce ordering between tools:

# contracts/issue_refund.yaml
tool: issue_refund
preconditions:
  - requires_prior_tool: check_eligibility
    with_output:
      - path: "$.eligible"
        equals: true
forbids_after: [issue_refund] # Prevent double refund

  • requires_prior_tool: this tool can only be called after the named tool has executed
  • with_output: assertions on the prior tool's output (path + equals)
  • forbids_after: tools that become illegal after this tool executes

The SDK tracks precondition satisfaction across steps. The with_output mechanism extracts values from tool result messages in the conversation and stores them for later assertion.
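A precondition check of this kind can be approximated in a few lines. The following sketch is illustrative and makes several assumptions: unmetPreconditions is a hypothetical helper, prior tool outputs are kept in a simple Map, and only top-level "$.field" paths with strict equality are supported.

```typescript
// Illustrative precondition check; not the SDK's implementation.
interface OutputAssertion {
  path: string;
  equals: unknown;
}

interface Precondition {
  requires_prior_tool: string;
  with_output?: OutputAssertion[];
}

type PriorOutputs = Map<string, Record<string, unknown>>;

// priorOutputs maps a tool name to the output of its most recent execution.
function unmetPreconditions(preconditions: Precondition[], priorOutputs: PriorOutputs): string[] {
  const unmet: string[] = [];
  for (const pre of preconditions) {
    const output = priorOutputs.get(pre.requires_prior_tool);
    if (output === undefined) {
      unmet.push(`${pre.requires_prior_tool} has not executed yet`);
      continue;
    }
    for (const assertion of pre.with_output ?? []) {
      const key = assertion.path.replace(/^\$\./, ""); // "$.eligible" -> "eligible"
      if (output[key] !== assertion.equals) {
        unmet.push(`${pre.requires_prior_tool}: ${assertion.path} is not ${JSON.stringify(assertion.equals)}`);
      }
    }
  }
  return unmet;
}

// The precondition from contracts/issue_refund.yaml above.
const refundPreconditions: Precondition[] = [
  { requires_prior_tool: "check_eligibility", with_output: [{ path: "$.eligible", equals: true }] },
];

const outputsWith = (eligible: boolean): PriorOutputs =>
  new Map<string, Record<string, unknown>>([["check_eligibility", { eligible }]]);

const beforeCheck = unmetPreconditions(refundPreconditions, new Map());
const notEligible = unmetPreconditions(refundPreconditions, outputsWith(false));
const eligible = unmetPreconditions(refundPreconditions, outputsWith(true));
```

In this model issue_refund is allowed only in the last case: beforeCheck and notEligible each contain one unmet entry, while eligible is empty.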


Captures and redaction

Every enforcement decision is captured with its enforcement context (tool names, tool call arguments, and decision metadata). Before storage, captures pass through the SDK-side redaction pipeline:

  • API keys (sk-proj-..., sk-ant-...) are replaced with [REDACTED]
  • Bearer tokens, PEM keys, connection strings are redacted
  • Email addresses in captures are redacted

If redaction fails, the capture is dropped entirely (fail-closed).

Captures do not include raw request messages. Only tool names, tool call arguments, and enforcement metadata are captured. Your prompt content stays local.
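To make the fail-closed behavior concrete, here is a minimal sketch of a redaction step that drops a capture when it cannot be processed. The patterns and the redactCapture helper are assumptions for illustration; the SDK's actual pattern list is not shown in this document and is likely broader.

```typescript
// Illustrative redaction with fail-closed behavior; patterns are assumptions.
const SECRET_PATTERNS: RegExp[] = [
  /sk-(?:proj|ant)-[A-Za-z0-9_-]+/g, // API keys such as sk-proj-... or sk-ant-...
  /Bearer\s+[A-Za-z0-9._~+\/-]+=*/g, // bearer tokens
  /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g, // email addresses
];

function redact(text: string): string {
  return SECRET_PATTERNS.reduce((acc, pattern) => acc.replace(pattern, "[REDACTED]"), text);
}

// Fail-closed: if serialization or redaction throws, drop the capture
// (return null) instead of storing it unredacted.
function redactCapture(capture: { tool: string; args: unknown }): string | null {
  try {
    return redact(JSON.stringify(capture));
  } catch {
    return null;
  }
}

const stored = redactCapture({
  tool: "lookup_customer",
  args: { customer_email: "user1@example.com", note: "key sk-proj-abc123" },
});

// A circular structure cannot be serialized, so the capture is dropped.
const circular: Record<string, unknown> = {};
circular.self = circular;
const dropped = redactCapture({ tool: "lookup_customer", args: circular });
```

The stored string keeps the tool name but replaces the email address and the key with [REDACTED], and dropped is null. Returning null rather than the raw input is the design choice that keeps unredacted data out of storage when something goes wrong.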


Full example — multi-phase agent

import OpenAI from "openai";
import { replay } from "@vesanor/replay";

const client = new OpenAI();

// Tool executors — your real business logic
const tools = {
  lookup_customer: async (args) => {
    const customer = await db.customers.findByEmail(args.customer_email);
    return { customer_id: customer.id, name: customer.name };
  },
  check_eligibility: async (args) => {
    const order = await db.orders.find(args.order_id);
    return { eligible: order.status === "delivered", reason: order.status };
  },
  issue_refund: async (args) => {
    const refund = await billing.createRefund(args.customer_id, args.amount);
    return { refund_id: refund.id, status: "processed" };
  },
};

const session = replay(client, {
  contractsDir: "./contracts",
  agent: "refund-bot",
  apiKey: process.env.VESANOR_API_KEY,
  tools,
  principal: { role: "support", team: "tier-1" },
});

// Step 1: triage phase — only lookup_customer is visible
const r1 = await session.client.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [{ role: "user", content: "Look up customer user1@test" }],
  tools: allToolDefs,
  tool_choice: "required",
});
// SDK narrowed tools to [lookup_customer], phase advances to customer_identified

// Execute the tool
const tc1 = r1.choices[0].message.tool_calls[0];
const result1 = await session.tools.lookup_customer(JSON.parse(tc1.function.arguments));

// Step 2: customer_identified phase — only check_eligibility is visible
const r2 = await session.client.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [
    { role: "user", content: "Look up customer user1@test" },
    { role: "assistant", tool_calls: r1.choices[0].message.tool_calls },
    { role: "tool", tool_call_id: tc1.id, content: JSON.stringify(result1.result) },
    { role: "user", content: "Check eligibility for order ORD-123" },
  ],
  tools: allToolDefs,
  tool_choice: "required",
});
// Precondition satisfied: lookup_customer was called
// Phase advances to eligibility_checked

// Check state at any time
console.log(session.getState().currentPhase); // "eligibility_checked"
console.log(session.getState().toolCallCounts); // { lookup_customer: 1, check_eligibility: 1 }
console.log(session.getState().forbiddenTools); // []

// When done
await session.flush();
session.restore();

Next steps