
Troubleshooting

Common issues with replay() and how to fix them.


Configuration errors

"ReplayConfigError: policy block exists but no principal supplied"

A contract or session.yaml has a policy block, but you didn't pass a principal to replay().

Fix: Add a principal to your options:

const session = replay(client, {
  principal: { role: "agent", id: "my-agent" },
  // ...
});

Or remove the policy block from your contracts if you don't need authorization.


"ReplayConfigError: execution_constraints declared but tool not wrapped"

A contract has execution_constraints but the tool isn't in the tools map.

Fix: Add the tool to your tools option:

const session = replay(client, {
  tools: {
    delete_file: myDeleteFunction, // Must be present
  },
});

"ReplayConfigError: circular transitions detected"

Your session.yaml has a transition cycle with no way to reach a terminal phase.

Fix: Check your transitions map. Every path must eventually reach a terminal: true phase.
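One rough way to check this rule offline is to walk the transitions map backwards from the terminal phases. The sketch below assumes a simplified, hypothetical in-memory shape (a list of phases with a terminal flag, plus a map from phase name to next phases). It is not the SDK's compiler, and the real session.yaml schema may differ.

```typescript
// Hypothetical, simplified shapes: not the real session.yaml schema.
interface Phase {
  name: string;
  terminal?: boolean;
}
type Transitions = Record<string, string[]>;

// Returns the phases from which no terminal phase can be reached.
function phasesWithoutExit(phases: Phase[], transitions: Transitions): string[] {
  // Start from the terminal phases and grow the set of phases that can reach one.
  const canExit = new Set(phases.filter((p) => p.terminal).map((p) => p.name));
  let changed = true;
  while (changed) {
    changed = false;
    for (const phase of phases) {
      if (canExit.has(phase.name)) continue;
      const next = transitions[phase.name] ?? [];
      if (next.some((n) => canExit.has(n))) {
        canExit.add(phase.name);
        changed = true;
      }
    }
  }
  return phases.filter((p) => !canExit.has(p.name)).map((p) => p.name);
}
```

Any phase name this returns sits on a path that never ends, so it is a candidate for a missing exit transition.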


"ReplayConfigError: unreachable phases"

A declared phase can't be reached from the initial phase.

Fix: Either add a transition that leads to the unreachable phase, or remove it from phases.
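To find which phases are unreachable, a breadth-first walk from the initial phase is usually enough. This is a minimal sketch over a hypothetical, simplified transitions map (phase name to next phase names), not the SDK's own check.

```typescript
// Hypothetical, simplified shape: phase name -> phases it can transition to.
type Transitions = Record<string, string[]>;

// Returns declared phases that cannot be reached from the initial phase.
function unreachablePhases(
  phaseNames: string[],
  initial: string,
  transitions: Transitions,
): string[] {
  const seen = new Set<string>([initial]);
  const queue: string[] = [initial];
  while (queue.length > 0) {
    const current = queue.shift() as string;
    for (const next of transitions[current] ?? []) {
      if (!seen.has(next)) {
        seen.add(next);
        queue.push(next);
      }
    }
  }
  return phaseNames.filter((name) => !seen.has(name));
}
```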


"ReplayConfigError: observe() already active"

You called observe() on this client before calling replay(). They can't coexist on the same client.

Fix: Remove observe() and use replay() with mode: "log-only" for equivalent capture behavior:

// Before
const { client } = observe(openai, { apiKey });

// After
const session = replay(openai, { mode: "log-only", apiKey });
const client = session.client;

"ReplayConfigError: provider_incompatible"

A provider_constraints.block_incompatible rule in your session.yaml matched the current request.

Fix: Check your provider_constraints in session.yaml and either remove the blocking rule or adjust your request to avoid the incompatibility.


Enforcement issues

"My tool calls are being blocked"

Check these in order:

  1. Phase restriction — Is the tool valid in the current phase?

    const state = session.getState();
    console.log("Phase:", state.currentPhase);
  2. Precondition — Was the required prior tool called?

    console.log("Preconditions:", state.satisfiedPreconditions);
  3. Forbidden — Was the tool added to forbiddenTools by a prior step?

    console.log("Forbidden:", state.forbiddenTools);
  4. Session limit — Has the session exceeded a limit?

    console.log("Steps:", state.totalStepCount);
    console.log("Cost:", state.actualCost);
  5. No contract — Does the tool have a matching contract file? Tools without contracts are blocked by default (unmatchedPolicy: "block").

Use the onNarrow callback to see exactly what's being removed and why:

const session = replay(client, {
  onNarrow: (n) => {
    for (const r of n.removed) {
      console.log(`Removed: ${r.tool} (${r.reason})`);
    }
  },
});
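When many tools are removed at once, grouping the removals by reason can make the pattern easier to spot. The helper below is a sketch: the tool and reason field names come from the onNarrow example above, but their string types are an assumption.

```typescript
// Assumed shape of one entry in n.removed (string types are an assumption).
interface Removal {
  tool: string;
  reason: string;
}

// Groups removed tools by reason so repeated causes stand out.
function groupByReason(removed: Removal[]): Map<string, string[]> {
  const groups = new Map<string, string[]>();
  for (const { tool, reason } of removed) {
    const tools = groups.get(reason) ?? [];
    tools.push(tool);
    groups.set(reason, tools);
  }
  return groups;
}
```

You could call groupByReason(n.removed) inside the onNarrow callback and log the result instead of logging each removal separately.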

"ReplayContractError thrown but the tool call looks valid"

The default gate (reject_all) throws if any tool call in the response is blocked — even if others are valid.

Options:

  • Fix the contract that's too strict
  • Switch to gate: "strip_partial" to allow valid calls while removing blocked ones
  • Use mode: "shadow" to test without blocking
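The difference between the two gates can be pictured with a small standalone function. This is an illustration of the behaviour described above, not the SDK's code: it throws a plain Error where the SDK throws ReplayContractError, and what the SDK does when every call in a response is blocked is not covered here.

```typescript
type Gate = "reject_all" | "strip_partial";

interface ToolCall {
  name: string;
}

// Illustrates the two gate behaviours; isBlocked stands in for contract checks.
function applyGate(
  calls: ToolCall[],
  isBlocked: (call: ToolCall) => boolean,
  gate: Gate,
): ToolCall[] {
  const blocked = calls.filter(isBlocked);
  if (blocked.length === 0) return calls;
  if (gate === "reject_all") {
    // One blocked call is enough to fail the whole response.
    throw new Error(`blocked: ${blocked.map((c) => c.name).join(", ")}`);
  }
  // strip_partial: keep the valid calls and drop the blocked ones.
  return calls.filter((c) => !isBlocked(c));
}
```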

"Tool calls pass validation but session state isn't advancing"

Check commit_requirement in your contract. If set to none, the tool is recorded but doesn't advance authoritative state.

# This tool advances state:
commit_requirement: acknowledged

# This tool does NOT advance state:
commit_requirement: none

Server connection issues

"Session health shows durability: degraded-local"

The SDK can't reach the Vesanor server. Check:

  1. API key — is VESANOR_API_KEY set?
  2. Server URL — is the server running? Default: https://app.vesanor.com
  3. Network — can you reach the server from your environment?

With onError: "block" (default), server failures block calls. With onError: "allow", calls proceed locally.


"Session health shows authorityState: compromised"

The SDK detected a bypass — the original client was used directly instead of session.client.

Fix: Use session.client for all LLM calls:

// Wrong
const response = await client.chat.completions.create({ ... });

// Right
const response = await session.client.chat.completions.create({ ... });

Once compromised, the session can't recover. Kill it and create a new session.


"Session health shows authorityState: advisory"

The session doesn't have full Govern mode. Common causes:

  • No apiKey provided (Protect mode)
  • No tools provided (no governed execution boundary)
  • Server is unreachable (degraded)

Check health.protectionLevel and health.tier for specifics.


"protectionLevel shows 'govern' but no server connection"

If health.protectionLevel says "govern" but health.durability says "inactive" and health.authorityState says "advisory", the session is not actually governing — it's running as Protect with compat tier. The real indicators are durability and authorityState, not protectionLevel alone.

Govern mode requires both a tools map covering all state-bearing contracts and a valid apiKey connected to the server.
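If you want to alert on this condition, a small guard over the health object may help. This is a sketch: the field names and values come from this page, but how you obtain the health object from your session is not shown here. Treating either indicator as sufficient is a deliberately strict choice on our part, stricter than the both-at-once case described above.

```typescript
// Field names are taken from this page; the string types are an assumption.
interface SessionHealth {
  protectionLevel: string;
  durability: string;
  authorityState: string;
}

// True when protectionLevel claims "govern" but the session is not governing.
function isGovernInNameOnly(health: SessionHealth): boolean {
  return (
    health.protectionLevel === "govern" &&
    (health.durability === "inactive" || health.authorityState === "advisory")
  );
}
```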


Phases, limits, and loop detection not working

"currentPhase is always null"

If your session.yaml defines phases but session.getState().currentPhase is always null, the session.yaml failed to compile. Check your diagnostics callback for replay_compile_error events.

Common causes:

  • No phase is marked initial: true
  • No phase is marked terminal: true
  • transitions referencing a phase name not listed in phases
  • A non-terminal phase that can't be reached from the initial phase

When session.yaml compilation fails, the session is blocked — all create() calls throw ReplayConfigError. Fix the compilation error and restart.

"max_calls_per_tool / loop_detection not enforced"

Session limits defined in session.yaml require successful compilation. If there's a compile error anywhere in session.yaml (not just in session_limits — any section), the entire file fails and all features from it are lost, including limits, phases, and loop detection.

Check: Look for replay_compile_error in your diagnostics. Fix the root compilation error and all features will activate together.
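One way to make compile errors hard to miss is to collect them in the diagnostics callback rather than only logging them. The sketch below assumes each event carries a type field, as in the diagnostics example on this page; any other fields on the event, and the exact moment the event fires, are not assumed.

```typescript
// Assumed event shape: only `type` is taken from this page.
interface DiagnosticEvent {
  type: string;
  [key: string]: unknown;
}

// Builds a diagnostics callback that keeps replay_compile_error events.
function createCompileErrorCollector() {
  const errors: DiagnosticEvent[] = [];
  const diagnostics = (event: DiagnosticEvent): void => {
    if (event.type === "replay_compile_error") errors.push(event);
  };
  return { diagnostics, errors };
}
```

Pass collector.diagnostics as the diagnostics option to replay(), then inspect collector.errors before relying on phases or limits.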


Kill and recovery

"ReplayKillError thrown unexpectedly"

The session was killed — either manually or by the circuit breaker.

Check circuit breaker:

const state = session.getState();
console.log("Consecutive blocks:", state.consecutiveBlockCount);
console.log("Consecutive errors:", state.consecutiveErrorCount);

Recover:

session.restore();  // Release wrapper
const newSession = replay(client, { ... }); // Start fresh

"Can't create new session — another replay already attached"

Only one replay() session can be active on a given client at a time.

Fix: Call session.restore() on the old session before creating a new one.
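A tiny helper can keep the restore-then-recreate order in one place. This is a sketch: Restorable is a hypothetical minimal interface that only assumes the restore() method shown above, and create stands in for your own replay(client, options) call.

```typescript
// Minimal assumed surface: restore() is the only method this helper needs.
interface Restorable {
  restore(): void;
}

// Releases the old session's wrapper, then builds the new session.
function replaceSession<T extends Restorable>(
  old: T | undefined,
  create: () => T,
): T {
  if (old) old.restore(); // must run before the new replay() call
  return create();
}
```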


Performance

"Enforcement adds latency"

Contract evaluation typically takes under 1 ms (reported as guard_overhead_ms in captures). If you see higher values:

  • Check contract complexity (many preconditions with output checks)
  • In Govern mode, server round-trips add network latency (preflight + proposal + receipt)
  • Consider Protect mode for latency-sensitive use cases

Layer 2-4 enforcement issues

"binding_not_found" / "ref_mismatch"

binding_not_found — A ref operator in an argument value invariant references a binding slot that hasn't been set yet. The producing tool hasn't been called.

Fix: Ensure the producing tool (with binds) is called before the consuming tool (with ref). Consider adding a requires_prior_tool precondition to enforce the ordering.

ref_mismatch — The current argument value doesn't match the bound value from the producing tool.

Fix: Check that the LLM is propagating values correctly. The failure detail includes both expected (bound) and actual values.
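The two failures can be pictured as lookups against a per-session slot store. The class below is an illustration only, not the SDK's implementation; the slot naming and the use of strict equality for comparison are assumptions.

```typescript
type RefResult =
  | { ok: true }
  | { ok: false; failure: "binding_not_found" }
  | { ok: false; failure: "ref_mismatch"; expected: unknown; actual: unknown };

// Illustrative session-scoped store of bound values.
class BindingSlots {
  private slots = new Map<string, unknown>();

  // Called when a producing tool (with binds) completes.
  bind(slot: string, value: unknown): void {
    this.slots.set(slot, value);
  }

  // Called when a consuming tool (with ref) is validated.
  checkRef(slot: string, actual: unknown): RefResult {
    if (!this.slots.has(slot)) {
      return { ok: false, failure: "binding_not_found" };
    }
    const expected = this.slots.get(slot);
    if (expected !== actual) {
      return { ok: false, failure: "ref_mismatch", expected, actual };
    }
    return { ok: true };
  }
}
```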


"aggregate_limit_exceeded" / "aggregate_path_missing"

aggregate_limit_exceeded — A session aggregate (sum, count, max, min, count_distinct) would breach its bound if this call were allowed.

Fix: Check your aggregates in session.yaml. The failure detail shows current (before this call) and speculative (what it would be). Consider whether the bound is too tight or the session is legitimately exceeding limits.

aggregate_path_missing — A tool call matched an aggregate's tool filter but the specified JSONPath was missing from the arguments. This is fail-closed by design.

Fix: Ensure the tool always includes the aggregated field, or exclude the tool from the aggregate.
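For a sum aggregate, both failures reduce to a speculative check before the call is allowed. This is a minimal sketch, not the SDK's evaluator: it assumes an inclusive upper bound and treats a non-numeric value the same as a missing path, neither of which is confirmed by this page.

```typescript
type AggregateResult =
  | { ok: true; next: number }
  | { ok: false; failure: "aggregate_path_missing" }
  | {
      ok: false;
      failure: "aggregate_limit_exceeded";
      current: number;
      speculative: number;
    };

// Checks a running sum against an upper bound before allowing a call.
function checkSumAggregate(
  current: number,
  value: unknown,
  bound: number,
): AggregateResult {
  // Fail closed: a missing (or, by assumption, non-numeric) field blocks the call.
  if (typeof value !== "number" || Number.isNaN(value)) {
    return { ok: false, failure: "aggregate_path_missing" };
  }
  const speculative = current + value;
  if (speculative > bound) {
    return { ok: false, failure: "aggregate_limit_exceeded", current, speculative };
  }
  return { ok: true, next: speculative };
}
```

With a running total of 80 and a bound of 100, a call carrying 30 is rejected with current 80 and speculative 110, while a call carrying 20 is allowed.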


"envelope_not_established" / "envelope_violation"

envelope_not_established — A constrained value was checked before the reference tool (ceiling/floor/anchor/initial) was called. This is a sequencing error.

Fix: Ensure the reference tool runs before the constrained tool. Consider adding preconditions. The error message will suggest this.

envelope_violation — The constrained value violates the envelope's directional constraint (exceeded ceiling, dropped below floor, outside band, or broke monotonic direction).

Fix: Check the failure detail for the constraint type and the actual vs. expected values. Common cause: the LLM is increasing a value that should only decrease.
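For the ceiling case, the two failures can be sketched as a single check. This is an illustration under assumptions, not the SDK's logic: the ceiling is modelled as a number that stays undefined until the reference tool has run, and the ceiling itself is treated as an allowed value.

```typescript
type EnvelopeResult =
  | { ok: true }
  | { ok: false; failure: "envelope_not_established" | "envelope_violation" };

// Checks a value against a ceiling set by an earlier reference tool call.
function checkCeiling(ceiling: number | undefined, value: number): EnvelopeResult {
  if (ceiling === undefined) {
    // The reference tool has not run yet: a sequencing error.
    return { ok: false, failure: "envelope_not_established" };
  }
  if (value > ceiling) {
    return { ok: false, failure: "envelope_violation" };
  }
  return { ok: true };
}
```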


"checkpoint_timeout" / "checkpoint_denied" / "checkpoint_budget_exceeded"

checkpoint_timeout — No human approved or denied the checkpoint within timeout_seconds. Default behavior: deny.

Fix: Ensure approval routing is working (Slack, dashboard, webhook). Check timeout_seconds is long enough for your approval workflow. If you want calls to proceed on timeout, set on_timeout: allow (not recommended for high-stakes operations).

checkpoint_denied — A human explicitly denied the checkpoint.

Fix: This is intentional — the approver decided the call should not proceed. Check the denial reason in the trace.
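The timeout and denial outcomes can be summarized as one decision function. This is a sketch of the behaviour described above, not SDK code: the "deny" literal for the timeout policy is an assumption (this page only names on_timeout: allow and says the default is to deny).

```typescript
type Decision = "approved" | "denied";
type OnTimeout = "deny" | "allow";

// Returns true if the call may proceed.
// `decision` is undefined when nobody responded within timeout_seconds.
function resolveCheckpoint(
  decision: Decision | undefined,
  onTimeout: OnTimeout = "deny",
): boolean {
  if (decision !== undefined) return decision === "approved";
  return onTimeout === "allow";
}
```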

checkpoint_budget_exceeded — More than 10 checkpoints triggered in this session. All subsequent checkpoints auto-deny.

Fix: Narrow your when conditions so checkpoints trigger less frequently, or investigate why the session is hitting so many checkpoints.


"label_gate" removal

A tool was removed during narrowing because an active session label triggered a label_gates rule.

Fix: Check session.getState() for active labels. Labels are immutable — once set, they cannot be removed within the session. If the label is no longer relevant, start a new session without it.

const state = session.getState();
console.log(state); // Inspect the state for active labels
// Also check the onNarrow callback for label_gate removals

Schema-derived invariant blocks

A schema-derived invariant blocked a tool call. The failure detail includes [schema-derived] and source: "schema_derived" to distinguish it from manual invariants.

Fix options:

  • If the tool schema has incorrect bounds, exclude the field: schema_derived_exclude: [$.field_path]
  • If you want full control over a tool's invariants, disable per-tool: schema_derived: false
  • If you want to disable globally, set schema_derived: false in session.yaml

Contract graph analysis diagnostics

These diagnostics appear at compile time (session creation or vesanor validate):

| Diagnostic | Meaning | Fix |
| --- | --- | --- |
| DEAD_TOOL | Tool has max_calls: 0 | Set max_calls >= 1 or remove the tool |
| UNREACHABLE_PRECONDITION | Tool A requires tool B, but B is dead or phase-gated out | Fix the required tool's availability |
| DEAD_PHASE | Phase exists but no tools are valid in it | Add tools to the phase or remove it |
| CIRCULAR_DEADLOCK | Inescapable phase cycle (error severity) | Add an exit transition from the cycle |

Suppress intentional diagnostics in session.yaml:

graph_analysis:
  suppress:
    - check: dead_tool
      tool: sentinel_tool
      reason: "Intentional — should never be called"

Debugging tips

Enable verbose diagnostics

const session = replay(client, {
  diagnostics: (event) => {
    console.log(`[replay] ${event.type}`, event);
  },
});

Check the capture store

import { MemoryStore } from "@vesanor/replay";

const store = new MemoryStore();
const session = replay(client, { store });

// After some calls...
const captures = store.getCapturedCalls();
console.log(JSON.stringify(captures, null, 2));

Shadow mode for testing

Run mode: "shadow" to see enforcement decisions without blocking:

const session = replay(client, {
  mode: "shadow",
  diagnostics: (e) => console.log(e),
});
