SDK Integration — Vesanor

The Vesanor SDK lets you validate tool calls and capture observations directly from your application code. Instead of running tests separately with the CLI, you embed validation into your LLM pipeline.

Two packages:

  • @vesanor/replay — observe tool calls passively and validate responses against contracts
  • @vesanor/contracts-core — shared contract types and evaluation engine (used internally by @vesanor/replay)

Install

npm install @vesanor/replay

This pulls in @vesanor/contracts-core automatically. You also need your LLM provider SDK:

# OpenAI
npm install openai

# Anthropic
npm install @anthropic-ai/sdk

Both provider SDKs are optional peer dependencies — install whichever you use.


Observe — passive capture

observe() wraps your LLM client and captures every tool call in the background. Your existing call sites need no changes.

import OpenAI from "openai";
import { observe } from "@vesanor/replay";

const openai = new OpenAI();

// Start observing — captures all tool calls automatically
const handle = observe(openai, {
  apiKey: process.env.VESANOR_API_KEY,
  agent: "my-agent",
});

// Use your client normally — observe is transparent
const response = await openai.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [{ role: "user", content: "What's the weather in SF?" }],
  tools: [{ type: "function", function: { name: "get_weather", ... } }],
});

// When done, restore the original client
handle.restore();

How it works

  1. observe() patches client.chat.completions.create (OpenAI) or client.messages.create (Anthropic)
  2. Each call is intercepted, the response captured, and the original response returned untouched
  3. Captures are batched and sent to POST /api/v1/captures asynchronously
  4. The server auto-generates contracts from captured calls (visible in the dashboard Contracts page)
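
The interception in steps 1 and 2 can be pictured as ordinary method patching. The following is a minimal sketch of that general technique, not the SDK's actual implementation; `patchMethod`, `Capture`, `onCapture`, and the stand-in client are hypothetical names used only for illustration.

```typescript
// Sketch only: generic method patching with a restore handle.
// All names here are hypothetical and not part of @vesanor/replay.
type AsyncFn = (...args: any[]) => Promise<any>;

interface Capture {
  request: unknown;
  response: unknown;
}

interface PatchHandle {
  restore(): void;
}

function patchMethod(
  target: Record<string, any>,
  key: string,
  onCapture: (capture: Capture) => void,
): PatchHandle {
  const original: AsyncFn = target[key];

  target[key] = async function (this: unknown, ...args: any[]) {
    // Call through to the original and hand its result back unchanged.
    const response = await original.apply(this, args);
    try {
      onCapture({ request: args[0], response });
    } catch {
      // A capture failure must never surface to the caller.
    }
    return response;
  };

  return {
    restore() {
      target[key] = original;
    },
  };
}

// Stand-in client (not a real provider SDK).
const fakeCompletions = {
  async create(params: { model: string }) {
    return { id: "resp_1", model: params.model };
  },
};

const captured: Capture[] = [];
const patchHandle = patchMethod(fakeCompletions, "create", (c) => {
  captured.push(c);
});
```

Calling `patchHandle.restore()` puts the original method back, which mirrors what `handle.restore()` does in the example above.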

Provider detection

The SDK auto-detects your provider from the client shape:

  • OpenAI — patches client.chat.completions.create
  • Anthropic — patches client.messages.create

No configuration needed. If detection fails, observe() returns a no-op handle (your app continues normally).
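
Detection "from the client shape" amounts to duck typing: checking which method exists on the object. A minimal sketch, assuming the SDK looks only at the two methods listed above (`detectProvider` and the `Provider` type are hypothetical names):

```typescript
// Sketch only: duck-typed provider detection.
// `detectProvider` is a hypothetical helper, not an SDK export.
type Provider = "openai" | "anthropic" | "unknown";

function detectProvider(client: unknown): Provider {
  if (typeof client !== "object" || client === null) return "unknown";
  const c = client as Record<string, any>;
  if (typeof c.chat?.completions?.create === "function") return "openai";
  if (typeof c.messages?.create === "function") return "anthropic";
  // Unrecognized shape: the caller would fall back to a no-op handle.
  return "unknown";
}
```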

observe() requires an API key

Without an API key (passed as apiKey or set through the VESANOR_API_KEY environment variable), observe() logs one warning and becomes a no-op: no captures are recorded, nothing is sent to the server, and nothing is stored locally. Your application continues to work normally, but observation is inactive. If you need local-only validation without a server, use replay() instead.

Options

observe(client, {
  // Required
  apiKey: "vsn_...",           // or set VESANOR_API_KEY env var

  // Optional
  agent: "my-agent",           // agent name for grouping (default: "default")
  captureLevel: "redacted",    // privacy tier (see below)
  endpoint: "https://...",     // custom API endpoint
  maxBuffer: 100,              // max buffered items (default: 100, max: 1000)
  flushMs: 5000,               // flush interval in ms (default: 5000)
  timeoutMs: 5000,             // API timeout in ms (default: 5000, max: 10000)
  disabled: false,             // disable capture entirely
  diagnostics: (event) => {},  // callback for diagnostic events
});
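
To make the defaults and maximums concrete, here is a sketch of how the numeric options could be resolved. The default and maximum values come from the comments above; clamping out-of-range values (rather than rejecting them) and the lower bound of 1 are assumptions, and `resolveTuning` is a hypothetical helper.

```typescript
// Sketch only: applying the documented defaults and maximums.
// Clamping behavior and the lower bound of 1 are assumptions.
interface TuningOptions {
  maxBuffer?: number;
  flushMs?: number;
  timeoutMs?: number;
}

const clamp = (value: number, min: number, max: number): number =>
  Math.min(Math.max(value, min), max);

function resolveTuning(options: TuningOptions = {}): Required<TuningOptions> {
  return {
    maxBuffer: clamp(options.maxBuffer ?? 100, 1, 1000),
    flushMs: options.flushMs ?? 5000,
    timeoutMs: clamp(options.timeoutMs ?? 5000, 1, 10000),
  };
}
```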

Privacy tiers

Control what gets captured with captureLevel:

Tier                Tool names  Arguments  Messages  Content
metadata            Yes         No         No        No
redacted (default)  Yes         Yes        No        No
full                Yes         Yes        Yes       Yes
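
The tiers can be read as a filter over what a capture contains. The sketch below mirrors the table; the record shape (`RawCapture`, `StoredCapture`) and the `applyCaptureLevel` helper are hypothetical, and the real capture payload may differ.

```typescript
// Sketch only: the privacy-tier table expressed as a filter.
// Field names and shapes are assumptions, not the SDK's payload format.
type CaptureLevel = "metadata" | "redacted" | "full";

interface RawCapture {
  toolCalls: { name: string; arguments: string }[];
  messages: { role: string; content: string }[];
  content: string;
}

interface StoredCapture {
  toolCalls: { name: string; arguments?: string }[];
  messages?: { role: string; content: string }[];
  content?: string;
}

function applyCaptureLevel(raw: RawCapture, level: CaptureLevel): StoredCapture {
  switch (level) {
    case "metadata":
      // Tool names only.
      return { toolCalls: raw.toolCalls.map((t) => ({ name: t.name })) };
    case "redacted":
      // Tool names and arguments, no messages or content.
      return { toolCalls: raw.toolCalls.map((t) => ({ ...t })) };
    case "full":
      return {
        toolCalls: raw.toolCalls.map((t) => ({ ...t })),
        messages: raw.messages,
        content: raw.content,
      };
  }
}
```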

Streaming support

observe() handles streaming responses automatically. When your code uses stream: true, the SDK collects chunks and captures the complete response after the stream finishes.
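
One way to picture this is an async generator that passes each chunk through while keeping a copy. This is a sketch of the general pattern, not the SDK's internals; `captureStream` and `onComplete` are hypothetical names.

```typescript
// Sketch only: pass chunks through unchanged and report them once the stream ends.
async function* captureStream<T>(
  stream: AsyncIterable<T>,
  onComplete: (chunks: T[]) => void,
): AsyncGenerator<T> {
  const chunks: T[] = [];
  for await (const chunk of stream) {
    chunks.push(chunk);
    yield chunk; // the consumer sees each chunk as it arrives
  }
  // Reached only when the source stream is fully consumed.
  onComplete(chunks);
}
```

In this sketch, a consumer that stops iterating early never triggers `onComplete`; how the SDK treats abandoned streams is not covered here.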

Circuit breaker

If the capture API fails 5 times in a row, the SDK auto-disables for 10 minutes. Your application is never affected by capture failures.
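
The behavior described above matches a standard circuit breaker. A minimal sketch using the documented threshold (5 consecutive failures) and cooldown (10 minutes); the class, its method names, and the injectable clock are hypothetical.

```typescript
// Sketch only: consecutive-failure circuit breaker.
// Threshold and cooldown come from the text; everything else is illustrative.
class CircuitBreaker {
  private consecutiveFailures = 0;
  private openUntil = 0;

  constructor(
    private readonly threshold: number = 5,
    private readonly cooldownMs: number = 10 * 60 * 1000,
    private readonly now: () => number = () => Date.now(),
  ) {}

  // True while sending is allowed.
  canSend(): boolean {
    return this.now() >= this.openUntil;
  }

  recordSuccess(): void {
    this.consecutiveFailures = 0;
  }

  recordFailure(): void {
    this.consecutiveFailures += 1;
    if (this.consecutiveFailures >= this.threshold) {
      // Trip the breaker: block sends until the cooldown has elapsed.
      this.openUntil = this.now() + this.cooldownMs;
      this.consecutiveFailures = 0;
    }
  }
}
```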

Disable at runtime

# Disable via environment variable
export VESANOR_DISABLE=true

Or pass disabled: true in options.


Validate — in-code contract checks

validate() checks an LLM response against your contracts synchronously. Use this to catch contract violations at runtime.

import OpenAI from "openai";
import { prepareContracts, validate } from "@vesanor/replay";

const openai = new OpenAI();
const contracts = prepareContracts("./packs/my-pack");

const response = await openai.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [{ role: "user", content: "What's the weather?" }],
  tools: [...],
});

const result = validate(response, { contracts });

if (!result.pass) {
  console.error("Contract violations:", result.failures);
  // [{ path: "$.tool_calls[0].name", operator: "equals",
  //    expected: "get_weather", found: "search", message: "..." }]
}

Loading contracts

prepareContracts() accepts multiple input formats:

// From a pack directory (loads all contracts)
const contracts = prepareContracts("./packs/my-pack");

// From specific files
const contracts = prepareContracts([
  "./contracts/weather.yaml",
  "./contracts/search.yaml",
]);

// From contract objects directly
const contracts = prepareContracts({
  tool: "get_weather",
  assertions: {
    output_invariants: [
      { path: "$.tool_calls[0].name", equals: "get_weather" },
    ],
  },
});
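
To illustrate what an output invariant such as `{ path: "$.tool_calls[0].name", equals: "get_weather" }` expresses, here is a small sketch that resolves a path and compares the value. It supports only `$`, `.key`, and `[index]` segments and is not the evaluation engine in @vesanor/contracts-core; `resolvePath` and `checkEquals` are hypothetical helpers.

```typescript
// Sketch only: a tiny path resolver for "$", ".key", and "[index]" segments.
function resolvePath(root: unknown, path: string): unknown {
  const tokens = path.replace(/^\$/, "").match(/\.[^.\[\]]+|\[\d+\]/g) ?? [];
  let current: any = root;
  for (const token of tokens) {
    if (current === null || current === undefined) return undefined;
    current = token.startsWith("[")
      ? current[Number(token.slice(1, -1))] // "[0]" -> index 0
      : current[token.slice(1)];            // ".name" -> key "name"
  }
  return current;
}

// Hypothetical check for an `equals` invariant (strict equality on primitives).
function checkEquals(
  response: unknown,
  invariant: { path: string; equals: unknown },
): boolean {
  return resolvePath(response, invariant.path) === invariant.equals;
}
```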

Validation result

type ValidationResult = {
  pass: boolean;                // true if all contracts pass
  failures: ContractFailure[];  // list of violations
  matched_contracts: number;    // how many contracts matched
  unmatched_tools: string[];    // tool calls with no matching contract
  evaluation_ms: number;        // how long validation took
};

type ContractFailure = {
  path: string;            // JSON path that failed (e.g. "$.tool_calls[0].name")
  operator: string;        // which check failed (e.g. "equals", "type", "exists")
  expected: unknown;       // what the contract expected
  found: unknown;          // what the response contained
  message?: string;        // human-readable description
  contract_file?: string;  // which contract file triggered this
};
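
A result in this shape is easy to turn into log lines. The sketch below repeats the two types so it is self-contained; `describeFailures` is a hypothetical helper, not an SDK export.

```typescript
// Types copied from the definitions above so the sketch stands alone.
type ContractFailure = {
  path: string;
  operator: string;
  expected: unknown;
  found: unknown;
  message?: string;
  contract_file?: string;
};

type ValidationResult = {
  pass: boolean;
  failures: ContractFailure[];
  matched_contracts: number;
  unmatched_tools: string[];
  evaluation_ms: number;
};

// Hypothetical helper: one readable line per failure.
function describeFailures(result: ValidationResult): string[] {
  return result.failures.map((f) => {
    const where = f.contract_file ? ` (${f.contract_file})` : "";
    return (
      `${f.path} ${f.operator}: expected ${JSON.stringify(f.expected)}, ` +
      `found ${JSON.stringify(f.found)}${where}`
    );
  });
}
```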

Unmatched tool policy

By default, tool calls with no matching contract cause a failure. Change this with unmatchedPolicy:

// Fail if any tool call has no contract (default)
validate(response, { contracts, unmatchedPolicy: "deny" });

// Ignore tool calls without contracts
validate(response, { contracts, unmatchedPolicy: "allow" });
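
The effect of the policy on the overall result can be sketched as follows. This is an illustration of the rule stated above, not the SDK's code; both helper names are hypothetical.

```typescript
// Sketch only: how an unmatched-tool policy could feed into the pass/fail decision.
type UnmatchedPolicy = "deny" | "allow";

// Tool names that were called but have no contract.
function findUnmatchedTools(calledTools: string[], contractTools: string[]): string[] {
  const known = new Set(contractTools);
  return calledTools.filter((name) => !known.has(name));
}

function overallPass(
  failureCount: number,
  unmatchedTools: string[],
  policy: UnmatchedPolicy = "deny",
): boolean {
  if (failureCount > 0) return false;
  // Under "deny", any tool call without a contract fails the result.
  return policy === "allow" || unmatchedTools.length === 0;
}
```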

Provider response formats

validate() handles responses from both OpenAI and Anthropic natively. It auto-detects the format and normalizes tool calls for evaluation.

// OpenAI response — works directly
const openaiResponse = await openai.chat.completions.create({ ... });
validate(openaiResponse, { contracts });

// Anthropic response — works directly
const anthropicResponse = await anthropic.messages.create({ ... });
validate(anthropicResponse, { contracts });

// Pre-normalized response — also works
validate({
  tool_calls: [{ id: "1", name: "get_weather", arguments: '{"location":"SF"}' }],
}, { contracts });
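
The normalization step can be sketched from the public response shapes of each provider: OpenAI chat completions carry tool calls under `choices[].message.tool_calls[].function`, while Anthropic messages carry them as `tool_use` blocks in `content`. The helper below is an illustration under two assumptions (only the first OpenAI choice is inspected, and Anthropic `input` objects are serialized to a JSON string); `normalizeToolCalls` is a hypothetical name.

```typescript
// Sketch only: normalize provider responses into the pre-normalized shape shown above.
interface NormalizedToolCall {
  id: string;
  name: string;
  arguments: string; // JSON-encoded arguments
}

function normalizeToolCalls(response: any): NormalizedToolCall[] {
  // OpenAI chat completion: first choice only (assumption).
  const openaiCalls = response?.choices?.[0]?.message?.tool_calls;
  if (Array.isArray(openaiCalls)) {
    return openaiCalls.map((tc: any) => ({
      id: tc.id,
      name: tc.function.name,
      arguments: tc.function.arguments,
    }));
  }
  // Anthropic message: tool_use content blocks carry a parsed `input` object.
  if (Array.isArray(response?.content)) {
    return response.content
      .filter((block: any) => block.type === "tool_use")
      .map((block: any) => ({
        id: block.id,
        name: block.name,
        arguments: JSON.stringify(block.input),
      }));
  }
  // Already normalized.
  if (Array.isArray(response?.tool_calls)) return response.tool_calls;
  return [];
}
```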

Multi-turn agents

The CLI runner evaluates one LLM response per contract — it does not chain tool results back into follow-up turns. For agents that call multiple tools across a conversation loop, use validate() inside your agent's own loop to check each turn independently.

import OpenAI from "openai";
import { observe, prepareContracts, validate } from "@vesanor/replay";

const openai = new OpenAI();
const contracts = prepareContracts("./packs/my-agent");

// Start observing all turns
const handle = observe(openai, {
  apiKey: process.env.VESANOR_API_KEY,
  agent: "export-compliance",
});

// Your agent's conversation loop
const messages: OpenAI.Chat.ChatCompletionMessageParam[] = [
  { role: "system", content: "You are an export compliance agent..." },
  { role: "user", content: "Classify this shipment to Canada..." },
];

// `tools`, `maxTurns`, and `executeToolLocally` are defined by your application
const turnResults = [];

for (let turn = 0; turn < maxTurns; turn++) {
  const response = await openai.chat.completions.create({
    model: "gpt-4o-mini",
    messages,
    tools,
  });

  // Validate THIS turn's tool calls against contracts
  const result = validate(response, { contracts, unmatchedPolicy: "allow" });
  turnResults.push({ turn, ...result });

  if (!result.pass) {
    console.error(`Turn ${turn} contract violation:`, result.failures);
    // Decide: retry, fallback, or abort
  }

  // Extract tool calls and feed results back for next turn
  const toolCalls = response.choices[0].message.tool_calls;
  if (!toolCalls || toolCalls.length === 0) break;

  messages.push(response.choices[0].message);
  for (const tc of toolCalls) {
    // executeToolLocally must return the tool output as a string
    const toolResult = await executeToolLocally(tc);
    messages.push({ role: "tool", tool_call_id: tc.id, content: toolResult });
  }
}

// Summary: did every turn pass?
const allPassed = turnResults.every(r => r.pass);
console.log(`Agent finished: ${turnResults.length} turns, all passed: ${allPassed}`);

handle.restore();

Why the CLI can't do this

Each CLI contract maps to one fixture (one request/response pair). A 7-tool agent that calls tools across 7 turns will only trigger 1 tool call per CLI run — the other 6 happen in subsequent turns that the CLI doesn't execute. This isn't a bug; the CLI tests individual contract compliance, not full agent trajectories.

Use the SDK for multi-turn validation, and the CLI for single-turn regression testing in CI.


Observe + Validate together

The most common pattern uses both: observe captures calls for the dashboard, validate catches violations in real-time.

import OpenAI from "openai";
import { observe, prepareContracts, validate } from "@vesanor/replay";

const openai = new OpenAI();
const contracts = prepareContracts("./packs/my-pack");

// Start observing
const handle = observe(openai, {
  apiKey: process.env.VESANOR_API_KEY,
  agent: "weather-agent",
});

// Make the call
const response = await openai.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [{ role: "user", content: "What's the weather?" }],
  tools: [...],
});

// Validate locally
const result = validate(response, { contracts });
if (!result.pass) {
  // Handle violation: log, retry, fallback, etc.
}

// Clean up when done
handle.restore();

Diagnostics

Pass a diagnostics callback to observe() to trace what the SDK is doing:

observe(openai, {
  apiKey: process.env.VESANOR_API_KEY,
  diagnostics: (event) => {
    switch (event.type) {
      case "double_wrap":
        // observe() called twice on same client
        console.warn("Client already observed");
        break;
      case "unsupported_client":
        // Client shape not recognized
        console.warn("Unsupported client:", event.detail);
        break;
      case "buffer_overflow":
        // Too many captures buffered
        console.warn(`Dropped ${event.dropped} captures`);
        break;
    }
  },
});
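
The buffer_overflow event relates to the maxBuffer option. As a rough illustration of where a dropped count could come from, here is a sketch of a bounded buffer. Dropping the newest item when full is an assumption (the SDK may instead evict the oldest), and `CaptureBuffer` is a hypothetical class.

```typescript
// Sketch only: a bounded buffer that counts what it drops.
// The drop-newest policy is an assumption.
class CaptureBuffer<T> {
  private items: T[] = [];
  private dropped = 0;

  constructor(private readonly maxBuffer: number = 100) {}

  // Returns false when the item was dropped because the buffer is full.
  push(item: T): boolean {
    if (this.items.length >= this.maxBuffer) {
      this.dropped += 1;
      return false;
    }
    this.items.push(item);
    return true;
  }

  // Hands back everything buffered so far and resets the counters.
  drain(): { batch: T[]; dropped: number } {
    const result = { batch: this.items, dropped: this.dropped };
    this.items = [];
    this.dropped = 0;
    return result;
  }
}
```

A flush timer would call `drain()` every flushMs and report a non-zero `dropped` count through the diagnostics callback.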

Environment variables

Variable          Description
VESANOR_API_KEY   API key for capture ingestion (fallback if not passed in options)
VESANOR_DISABLE   Set to true to disable all capture (1, yes, on also work)
VESANOR_API_URL   Custom API endpoint (default: https://app.vesanor.com)
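
The accepted values for VESANOR_DISABLE can be sketched as a small parser. The four values come from the table above; trimming whitespace and ignoring case are assumptions, and `isCaptureDisabled` is a hypothetical helper.

```typescript
// Sketch only: interpreting VESANOR_DISABLE.
// Accepted values come from the table; case-insensitivity and trimming are assumptions.
const DISABLE_VALUES = new Set(["true", "1", "yes", "on"]);

function isCaptureDisabled(value: string | undefined): boolean {
  if (value === undefined) return false;
  return DISABLE_VALUES.has(value.trim().toLowerCase());
}

// In Node.js this would be called as: isCaptureDisabled(process.env.VESANOR_DISABLE)
```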

What happens after capture

Once observe() sends captures to the server:

  1. Contracts auto-generated — the server infers contracts from observed tool calls (structure, types, schema bounds)
  2. Confidence scoring — contracts gain confidence as more samples arrive (low < 5, medium 5–9, high ≥ 10)
  3. Dashboard visibility — captured tools appear on the Contracts page with coverage analysis
  4. Guard evaluation — the Guard page shows pass rates and failure patterns across all captured calls
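
The confidence thresholds in step 2 map directly to a small function. The thresholds are taken from the list above; `confidenceFor` is a hypothetical name and the server's actual scoring may consider more than the sample count.

```typescript
// Sketch only: confidence level from sample count, using the thresholds listed above.
type Confidence = "low" | "medium" | "high";

function confidenceFor(sampleCount: number): Confidence {
  if (sampleCount >= 10) return "high";
  if (sampleCount >= 5) return "medium";
  return "low";
}
```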

See Dashboard Guide for how to review and promote auto-generated contracts.


Next steps

Runtime enforcement (recommended):

CI-time validation: