# Shadow Mode
Shadow mode runs the full enforcement pipeline without blocking anything. Your agent behaves normally while `replay()` records what it would have done — which tools would have been removed, which calls would have been blocked, and why.
## Why shadow mode
You've written contracts. You think they're right. But enabling enforcement on a live agent is risky — what if a legitimate tool call gets blocked?
Shadow mode answers: "What would enforcement do to my real traffic?" without any risk.
## How to use it

```ts
const session = replay(client, {
  contractsDir: "./contracts",
  agent: "my-agent",
  mode: "shadow", // Compute but don't apply
  apiKey: process.env.VESANOR_API_KEY, // Send captures to dashboard
});

// Agent runs normally — nothing blocked, nothing modified
const response = await session.client.chat.completions.create({
  model: "gpt-4o-mini",
  messages,
  tools,
});
// Response is returned unmodified — even if contracts would block it
```
## What shadow captures

Every call produces a `shadow_delta` — a record of what enforcement would have done:

### `would_have_narrowed`

Tools that would have been removed before the LLM saw them:

```json
{
  "would_have_narrowed": [
    { "tool": "issue_refund", "reason": "wrong_phase" },
    { "tool": "delete_record", "reason": "forbidden_in_state" },
    { "tool": "admin_reset", "reason": "no_contract" }
  ]
}
```
### `would_have_blocked`

Tool calls that would have been blocked after the LLM responded:

```json
{
  "would_have_blocked": [
    {
      "tool_name": "issue_refund",
      "reason": "precondition_not_met",
      "detail": "Required prior tool: check_eligibility"
    }
  ]
}
```
### Phase context

Where the session is in the phase machine:

```json
{
  "current_phase": "customer_identified",
  "legal_next_phases": ["eligibility_checked", "escalated"]
}
```
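Put together, the capture fields above suggest a delta shape like the following. This is a sketch inferred from the JSON examples on this page, not a published type; field names match the examples, and the `summarize` helper is hypothetical.

```typescript
// Sketch of the shadow_delta shape, inferred from the examples above.
// Field names match the JSON; the types are assumptions, not a published API.
interface ShadowDelta {
  would_have_narrowed: { tool: string; reason: string }[];
  would_have_blocked: { tool_name: string; reason: string; detail?: string }[];
  current_phase: string;
  legal_next_phases: string[];
}

// Hypothetical helper: summarize a delta for quick logging.
function summarize(delta: ShadowDelta): string {
  return (
    `${delta.would_have_blocked.length} blocked, ` +
    `${delta.would_have_narrowed.length} narrowed, ` +
    `phase=${delta.current_phase}`
  );
}

const example: ShadowDelta = {
  would_have_narrowed: [{ tool: "issue_refund", reason: "wrong_phase" }],
  would_have_blocked: [],
  current_phase: "customer_identified",
  legal_next_phases: ["eligibility_checked", "escalated"],
};
console.log(summarize(example)); // "0 blocked, 1 narrowed, phase=customer_identified"
```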
## The safe rollout path

### Step 1: Shadow mode

Deploy with `mode: "shadow"`. Monitor the dashboard for false positives.

```ts
const session = replay(client, {
  mode: "shadow",
  apiKey: process.env.VESANOR_API_KEY,
});
```
Look for:
- Tool calls that shadow says it would block — are they actually bad?
- Tools that would be narrowed — should they be available in that phase?
- Session limit projections — are limits too tight?
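A minimal triage sketch, assuming the delta shape from the JSON examples above: collect deltas from your shadow runs and count would-be blocks by reason, so the most frequent reasons can be reviewed first. The `blocksByReason` helper and the sample data are hypothetical.

```typescript
// Hypothetical triage helper: count would-be blocks by reason across
// collected shadow deltas. The delta shape is assumed from the examples
// on this page, not taken from a published type.
type Blocked = { tool_name: string; reason: string; detail?: string };
type Delta = { would_have_blocked: Blocked[] };

function blocksByReason(deltas: Delta[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const d of deltas) {
    for (const b of d.would_have_blocked) {
      counts.set(b.reason, (counts.get(b.reason) ?? 0) + 1);
    }
  }
  return counts;
}

const deltas: Delta[] = [
  { would_have_blocked: [{ tool_name: "issue_refund", reason: "precondition_not_met" }] },
  { would_have_blocked: [{ tool_name: "issue_refund", reason: "precondition_not_met" }] },
  { would_have_blocked: [] },
];
console.log(blocksByReason(deltas)); // one reason, precondition_not_met, seen twice
```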
### Step 2: Fix contracts

Adjust contracts based on shadow data:

- Too many false blocks? Relax preconditions or add phases.
- Missing blocks? Add `forbids_after` or tighten argument invariants.
- Phase too restrictive? Allow more tools in that phase.
### Step 3: Enable enforcement

Switch to `mode: "enforce"` when shadow shows zero false positives:

```ts
const session = replay(client, {
  mode: "enforce",
  gate: "reject_all",
});
```
### Step 4: Add server backing (optional)

For production with audit needs, add an API key and tool wrappers:

```ts
const session = replay(client, {
  mode: "enforce",
  apiKey: process.env.VESANOR_API_KEY,
  tools: { issue_refund: myRefundFunction },
});
```
## Counterfactual capture
Shadow mode captures the counterfactual — what was prevented and why. This is valuable for:
- Debugging — "Why would this call have been blocked?"
- Compliance — GDPR requires "meaningful information about the logic involved" in automated decisions. Counterfactual capture satisfies this directly.
- Tuning — Compare shadow results across model versions to see which model triggers more enforcement.
## Important limitation
After shadow mode allows a call that enforce mode would block, the model is on a different execution path. It received feedback it wouldn't have received in enforce mode. All subsequent shadow projections are approximations, not exact counterfactuals.
Shadow mode tells you what enforcement would do on your real traffic. It does not guarantee what enforcement will do — because enforcement changes the model's behavior.
You can access the last shadow delta programmatically:

```ts
const delta = session.getLastShadowDelta();
if (delta) {
  console.log("Would have blocked:", delta.would_have_blocked.length);
  console.log("Would have narrowed:", delta.would_have_narrowed.length);
  console.log("Current phase:", delta.current_phase);
  console.log("Legal next phases:", delta.legal_next_phases);
}
```
## Shadow coverage tracking
Shadow mode can only validate tool calls the shadow LLM actually makes. If a tool is never attempted, it's invisible to shadow analysis. Shadow coverage tracking measures these blind spots.
### What it tracks
After each shadow run, a coverage record is emitted:
- Tools available — tools in the request's tool set
- Tools observed — tools the shadow LLM actually called
- Tool pairs — which tools were called together in the same session
Over time, these accumulate into a coverage ledger per agent and model pair.
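Assuming a coverage record carries the available and observed tool names for one run, the ledger aggregation could look like the sketch below. The `CoverageRecord` shape and the `coverageLedger` helper are illustrative, not the actual implementation.

```typescript
// Sketch: fold per-run coverage records into a per-tool ledger and compute
// coverage percentages (observed runs / available runs). The record shape
// is an assumption based on the fields this page describes.
type CoverageRecord = { available: string[]; observed: string[] };
type LedgerRow = { available: number; observed: number; pct: number };

function coverageLedger(records: CoverageRecord[]): Map<string, LedgerRow> {
  const ledger = new Map<string, LedgerRow>();
  for (const rec of records) {
    for (const tool of rec.available) {
      const row = ledger.get(tool) ?? { available: 0, observed: 0, pct: 0 };
      row.available += 1;
      if (rec.observed.includes(tool)) row.observed += 1;
      ledger.set(tool, row);
    }
  }
  for (const row of ledger.values()) {
    row.pct = row.available === 0 ? 0 : (100 * row.observed) / row.available;
  }
  return ledger;
}

const runs: CoverageRecord[] = [
  { available: ["get_market_data", "cancel_order"], observed: ["get_market_data"] },
  { available: ["get_market_data", "cancel_order"], observed: ["get_market_data"] },
];
// get_market_data: observed in 2 of 2 runs (100%); cancel_order: 0 of 2 (0%)
```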
### Coverage report

```text
Shadow Coverage — agent: trading-agent, models: gpt-4o -> gpt-4o-mini
Period: 2026-03-01 to 2026-03-24 (142 runs)

Tool                 Available   Observed   Coverage
────────────────────────────────────────────────────
get_market_data           142        138      97.2%
approve_risk_check        142         89      62.7%
submit_live_order         142         11       7.7%   <- LOW
cancel_order              142          0       0.0%   <- ZERO
```
### Classification
| Coverage | Classification | Meaning |
|---|---|---|
| 0% | zero | Never observed. Complete blind spot. |
| 1-25% | low | Rarely observed. Shadow testing is thin. |
| 26-75% | partial | Sometimes observed. May miss edge cases. |
| 76-100% | good | Frequently observed. Reasonable confidence. |
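The bands in the table can be expressed as a small function. This is a sketch of the mapping only; rounding to the nearest whole percent before banding is an assumption.

```typescript
// Sketch: map a coverage percentage to the classification bands in the
// table above. Band boundaries follow the table; rounding to the nearest
// whole percent first is an assumption, not documented behavior.
type Classification = "zero" | "low" | "partial" | "good";

function classify(pct: number): Classification {
  const rounded = Math.round(pct);
  if (rounded === 0) return "zero";
  if (rounded <= 25) return "low";
  if (rounded <= 75) return "partial";
  return "good";
}

console.log(classify(0));    // "zero"
console.log(classify(7.7));  // "low"
console.log(classify(62.7)); // "partial"
console.log(classify(97.2)); // "good"
```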
### Important limitations
- Coverage measures breadth, not depth — a tool called once with trivial args counts as "observed"
- Coverage is per-session, not cross-session — it tracks what shadow runs have tested, not what's possible
- 100% coverage doesn't mean all argument combinations have been tested
### Accessing coverage data

Coverage data is available through:

- Dashboard — Shadow page includes a coverage table
- CLI — `vesanor doctor` includes shadow coverage in its health report
- API — `GET /api/dashboard/shadow/coverage?agent=<name>`
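For the API route, here is a sketch of building the documented endpoint URL. The base URL is a placeholder, and the bearer-token header in the commented fetch is an assumption about authentication.

```typescript
// Sketch: build the documented coverage endpoint URL for a given agent.
// "https://example.com" is a placeholder; substitute your Vesanor host.
function coverageUrl(baseUrl: string, agent: string): string {
  const url = new URL("/api/dashboard/shadow/coverage", baseUrl);
  url.searchParams.set("agent", agent);
  return url.toString();
}

console.log(coverageUrl("https://example.com", "trading-agent"));
// "https://example.com/api/dashboard/shadow/coverage?agent=trading-agent"

// A request against it might look like this (auth scheme is an assumption):
// const res = await fetch(coverageUrl(base, "trading-agent"), {
//   headers: { Authorization: `Bearer ${process.env.VESANOR_API_KEY}` },
// });
```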
## Checkpoint behavior in shadow mode
Checkpoints do not trigger in shadow mode. Pausing for human approval defeats the purpose of observational testing. Instead, shadow mode logs a diagnostic: "checkpoint would have triggered".
## Shadow vs log-only

| Mode | Enforcement computed? | Captures sent? | Blocks calls? |
|---|---|---|---|
| `shadow` | Yes — full pipeline | Yes | No |
| `log-only` | No — just captures | Yes | No |
Use `shadow` when you want to evaluate contracts. Use `log-only` when you only want capture/observability with no enforcement computation.
## log-only mode

`log-only` is the lightest mode. No enforcement pipeline runs — no narrowing, no validation, no shadow deltas. Calls pass through directly to the provider and captures are recorded for observability.

```ts
const session = replay(client, {
  contractsDir: "./contracts",
  agent: "my-agent",
  mode: "log-only",
  apiKey: process.env.VESANOR_API_KEY,
});
```
Use for:

- Pure observability with zero overhead
- Recovery sessions after a kill (capture what the recovery agent does)
- Migrating from `observe()` — `log-only` is equivalent to `observe()` with the `replay()` API
## Next steps
- Protection Levels — understand all three levels
- Govern Mode — enable server-backed enforcement
- Phases & Transitions — design the contracts shadow will evaluate