SDK Integration — Vesanor

The Vesanor SDK lets you validate tool calls and capture observations directly from your application code. Instead of running tests separately with the CLI, you embed validation into your LLM pipeline.

Two packages:

@vesanor/replay — observe tool calls passively and validate responses against contracts
@vesanor/contracts-core — shared contract types and evaluation engine (used internally by @vesanor/replay)

Install

npm install @vesanor/replay

This pulls in @vesanor/contracts-core automatically. You also need your LLM provider SDK:

# OpenAI
npm install openai

# Anthropic
npm install @anthropic-ai/sdk

Both provider SDKs are optional peer dependencies — install whichever you use.

Observe — passive capture

observe() patches your LLM client in place and captures every tool call in the background. Your existing call sites need no changes.

import OpenAI from "openai";
import { observe } from "@vesanor/replay";

const openai = new OpenAI();

// Start observing — captures all tool calls automatically
const handle = observe(openai, {
  apiKey: process.env.VESANOR_API_KEY,
  agent: "my-agent",
});

// Use your client normally — observe is transparent
const response = await openai.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [{ role: "user", content: "What's the weather in SF?" }],
  tools: [{ type: "function", function: { name: "get_weather", ... } }],
});

// When done, restore the original client
handle.restore();

How it works

observe() patches client.chat.completions.create (OpenAI) or client.messages.create (Anthropic)
Each call is intercepted, the response captured, and the original response returned untouched
Captures are batched and sent to POST /api/v1/captures asynchronously
The server auto-generates contracts from captured calls (visible in the dashboard Contracts page)
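
The patch-and-restore steps above can be sketched in a few lines. This is an illustration of the general technique, not the SDK's source: `patchCreate`, `FakeClient`, and the in-memory `captured` array are hypothetical names, and the real SDK batches and uploads captures instead of keeping them in an array.

```typescript
// Hypothetical sketch of patch-and-restore; not the actual @vesanor/replay code.
type CreateFn = (params: unknown) => Promise<unknown>;

interface FakeClient {
  chat: { completions: { create: CreateFn } };
}

interface PatchHandle {
  restore(): void;
}

function patchCreate(client: FakeClient, captured: unknown[]): PatchHandle {
  const target = client.chat.completions;
  const original = target.create;

  // Replace the method with a wrapper that forwards the call unchanged.
  target.create = async function (this: unknown, params: unknown): Promise<unknown> {
    const response = await original.call(this, params);
    captured.push(response); // record the response for a later batch upload
    return response;         // the caller receives the original response object
  };

  return {
    restore() {
      target.create = original; // put the original method back
    },
  };
}

// A stand-in client so the sketch runs without a provider SDK installed.
const fakeClient: FakeClient = {
  chat: { completions: { create: async (params) => ({ echoed: params }) } },
};
const originalCreate = fakeClient.chat.completions.create;
const captured: unknown[] = [];

const patchHandle = patchCreate(fakeClient, captured);
const isPatched = fakeClient.chat.completions.create !== originalCreate;

patchHandle.restore();
const isRestored = fakeClient.chat.completions.create === originalCreate;
```

Because the wrapper returns the same response object it received, calling code cannot tell whether the client is observed.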

Provider detection

The SDK auto-detects your provider from the client shape:

OpenAI — patches client.chat.completions.create
Anthropic — patches client.messages.create

No configuration needed. If detection fails, observe() returns a no-op handle (your app continues normally).
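
Shape-based detection of this kind is usually a duck-typing check. The sketch below shows one way it could work; `detectProvider` and `isObject` are hypothetical helpers, and the SDK's real detection logic may differ.

```typescript
// Hypothetical duck-typing check; the SDK's real detection may differ.
type Provider = "openai" | "anthropic" | "unknown";

function isObject(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null;
}

function detectProvider(client: unknown): Provider {
  if (!isObject(client)) return "unknown";

  // OpenAI clients expose client.chat.completions.create
  const chat = client.chat;
  if (
    isObject(chat) &&
    isObject(chat.completions) &&
    typeof chat.completions.create === "function"
  ) {
    return "openai";
  }

  // Anthropic clients expose client.messages.create
  const messages = client.messages;
  if (isObject(messages) && typeof messages.create === "function") {
    return "anthropic";
  }

  return "unknown"; // a caller would fall back to a no-op handle here
}
```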

observe() requires an API key

Without an API key (neither the apiKey option nor the VESANOR_API_KEY environment variable is set), observe() logs one warning and then becomes a no-op: no captures are recorded, nothing is sent to the server, and nothing is stored locally. Your application continues to work normally, but observation is completely inactive. Always provide an API key for observe() to function. If you need local-only validation without a server, use replay() instead.

Options

observe(client, {
  // Required
  apiKey: "vsn_...",       // or set VESANOR_API_KEY env var

  // Optional
  agent: "my-agent",           // agent name for grouping (default: "default")
  captureLevel: "redacted",    // privacy tier (see below)
  endpoint: "https://...",     // custom API endpoint
  maxBuffer: 100,              // max buffered items (default: 100, max: 1000)
  flushMs: 5000,               // flush interval in ms (default: 5000)
  timeoutMs: 5000,             // API timeout in ms (default: 5000, max: 10000)
  disabled: false,             // disable capture entirely
  diagnostics: (event) => {},  // callback for diagnostic events
});
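
To make the defaults and limits above concrete, here is a sketch of how the numeric options could be resolved. Whether the SDK clamps out-of-range values or rejects them is not stated here, so clamping (and the lower bound of 1) is an assumption; `resolveBufferOptions` and `clamp` are hypothetical names.

```typescript
// Hypothetical option normalization. Clamping out-of-range values is an
// assumption of this sketch; the SDK may handle them differently.
interface BufferOptions {
  maxBuffer?: number;
  flushMs?: number;
  timeoutMs?: number;
}

interface ResolvedBufferOptions {
  maxBuffer: number;
  flushMs: number;
  timeoutMs: number;
}

function clamp(value: number, min: number, max: number): number {
  return Math.min(Math.max(value, min), max);
}

function resolveBufferOptions(options: BufferOptions = {}): ResolvedBufferOptions {
  return {
    maxBuffer: clamp(options.maxBuffer ?? 100, 1, 1000),    // default 100, max 1000
    flushMs: options.flushMs ?? 5000,                       // default 5000
    timeoutMs: clamp(options.timeoutMs ?? 5000, 1, 10000),  // default 5000, max 10000
  };
}
```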

Privacy tiers

Control what gets captured with captureLevel:

Tier	Tool names	Arguments	Messages	Content
`metadata`	Yes	No	No	No
`redacted` (default)	Yes	Yes	No	No
`full`	Yes	Yes	Yes	Yes
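
The table can be read as a filter applied to each capture before it leaves your process. The sketch below illustrates that reading; the `RawCapture` and `StoredCapture` shapes and the `applyCaptureLevel` function are hypothetical and do not reflect the SDK's actual wire format.

```typescript
// Hypothetical capture shapes; the SDK's real payload format is not shown here.
type CaptureLevel = "metadata" | "redacted" | "full";

interface RawCapture {
  toolCalls: { name: string; arguments: string }[];
  messages: { role: string; content: string }[]; // request messages
  content: string | null;                        // assistant text content
}

interface StoredCapture {
  toolCalls: { name: string; arguments?: string }[];
  messages?: { role: string; content: string }[];
  content?: string | null;
}

function applyCaptureLevel(raw: RawCapture, level: CaptureLevel): StoredCapture {
  if (level === "metadata") {
    // Tool names only
    return { toolCalls: raw.toolCalls.map((tc) => ({ name: tc.name })) };
  }
  if (level === "redacted") {
    // Tool names and arguments, no messages or content
    return { toolCalls: raw.toolCalls.map((tc) => ({ ...tc })) };
  }
  // "full": everything
  return {
    toolCalls: raw.toolCalls.map((tc) => ({ ...tc })),
    messages: raw.messages,
    content: raw.content,
  };
}
```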

Streaming support

observe() handles streaming responses automatically. When your code uses stream: true, the SDK collects chunks and captures the complete response after the stream finishes.
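
For OpenAI-style streams, tool calls arrive as fragments keyed by an index, with the arguments string split across chunks. The sketch below shows how such fragments can be reassembled into complete tool calls. The chunk shape is a reduced version of the OpenAI streaming delta format; `assembleToolCalls` is a hypothetical helper, not an SDK export.

```typescript
// Reduced OpenAI-style streaming shapes; assembleToolCalls is hypothetical.
interface ToolCallDelta {
  index: number;
  id?: string;
  function?: { name?: string; arguments?: string };
}

interface StreamChunk {
  choices: { delta: { tool_calls?: ToolCallDelta[] } }[];
}

interface AssembledToolCall {
  id: string;
  name: string;
  arguments: string;
}

function assembleToolCalls(chunks: StreamChunk[]): AssembledToolCall[] {
  const byIndex = new Map<number, AssembledToolCall>();

  for (const chunk of chunks) {
    for (const choice of chunk.choices) {
      for (const delta of choice.delta.tool_calls ?? []) {
        let call = byIndex.get(delta.index);
        if (!call) {
          call = { id: "", name: "", arguments: "" };
          byIndex.set(delta.index, call);
        }
        if (delta.id) call.id = delta.id;
        if (delta.function?.name) call.name += delta.function.name;
        // Argument JSON arrives in pieces; concatenate in arrival order.
        if (delta.function?.arguments) call.arguments += delta.function.arguments;
      }
    }
  }

  return Array.from(byIndex.entries())
    .sort((a, b) => a[0] - b[0])
    .map((entry) => entry[1]);
}
```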

Circuit breaker

If the capture API fails 5 times in a row, the SDK auto-disables for 10 minutes. Your application is never affected by capture failures.
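
A circuit breaker with these thresholds can be sketched as follows. The class name, method names, and the injectable clock are hypothetical; only the numbers (5 consecutive failures, 10 minutes) come from the description above.

```typescript
// Hypothetical circuit breaker using the documented thresholds.
class CaptureCircuitBreaker {
  private consecutiveFailures = 0;
  private openUntil = 0; // timestamp (ms) until which sending is paused
  private readonly failureThreshold: number;
  private readonly cooldownMs: number;
  private readonly now: () => number;

  constructor(
    failureThreshold = 5,
    cooldownMs = 10 * 60 * 1000,
    now: () => number = () => Date.now(),
  ) {
    this.failureThreshold = failureThreshold;
    this.cooldownMs = cooldownMs;
    this.now = now;
  }

  // True when captures may be sent to the API.
  canSend(): boolean {
    return this.now() >= this.openUntil;
  }

  // Any success resets the failure streak.
  recordSuccess(): void {
    this.consecutiveFailures = 0;
  }

  // The fifth consecutive failure pauses sending for the cooldown period.
  recordFailure(): void {
    this.consecutiveFailures += 1;
    if (this.consecutiveFailures >= this.failureThreshold) {
      this.openUntil = this.now() + this.cooldownMs;
      this.consecutiveFailures = 0;
    }
  }
}
```

Passing the clock in as a function keeps the breaker testable without waiting ten real minutes.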

Disable at runtime

# Disable via environment variable
export VESANOR_DISABLE=true

Or pass disabled: true in options.

Validate — in-code contract checks

validate() checks an LLM response against your contracts synchronously. Use this to catch contract violations at runtime.

import OpenAI from "openai";
import { prepareContracts, validate } from "@vesanor/replay";

const openai = new OpenAI();
const contracts = prepareContracts("./packs/my-pack");

const response = await openai.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [{ role: "user", content: "What's the weather?" }],
  tools: [...],
});

const result = validate(response, { contracts });

if (!result.pass) {
  console.error("Contract violations:", result.failures);
  // [{ path: "$.tool_calls[0].name", operator: "equals",
  //    expected: "get_weather", found: "search", message: "..." }]
}

Loading contracts

prepareContracts() accepts multiple input formats:

// From a pack directory (loads all contracts)
const packContracts = prepareContracts("./packs/my-pack");

// From specific files
const fileContracts = prepareContracts([
  "./contracts/weather.yaml",
  "./contracts/search.yaml",
]);

// From contract objects directly
const inlineContracts = prepareContracts({
  tool: "get_weather",
  assertions: {
    output_invariants: [
      { path: "$.tool_calls[0].name", equals: "get_weather" },
    ],
  },
});

Validation result

type ValidationResult = {
  pass: boolean;               // true if all contracts pass
  failures: ContractFailure[]; // list of violations
  matched_contracts: number;   // how many contracts matched
  unmatched_tools: string[];   // tool calls with no matching contract
  evaluation_ms: number;       // how long validation took
};

type ContractFailure = {
  path: string;          // JSON path that failed (e.g. "$.tool_calls[0].name")
  operator: string;      // which check failed (e.g. "equals", "type", "exists")
  expected: unknown;     // what the contract expected
  found: unknown;        // what the response contained
  message?: string;      // human-readable description
  contract_file?: string; // which contract file triggered this
};
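
In practice you will often want to turn a result into log lines. The sketch below restates the two types so it runs on its own and adds a hypothetical `summarizeResult` helper; the output format is an illustration, not something the SDK produces.

```typescript
// Types restated from above so the sketch is self-contained.
type ContractFailure = {
  path: string;
  operator: string;
  expected: unknown;
  found: unknown;
  message?: string;
  contract_file?: string;
};

type ValidationResult = {
  pass: boolean;
  failures: ContractFailure[];
  matched_contracts: number;
  unmatched_tools: string[];
  evaluation_ms: number;
};

// Hypothetical helper: one human-readable line per failure or unmatched tool.
function summarizeResult(result: ValidationResult): string[] {
  const lines = result.failures.map((f) => {
    const source = f.contract_file ? ` (${f.contract_file})` : "";
    return (
      `${f.path} ${f.operator}: expected ${JSON.stringify(f.expected)}, ` +
      `found ${JSON.stringify(f.found)}${source}`
    );
  });
  for (const tool of result.unmatched_tools) {
    lines.push(`no contract matched tool "${tool}"`);
  }
  return lines;
}
```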

Unmatched tool policy

By default, tool calls with no matching contract cause a failure. Change this with unmatchedPolicy:

// Fail if any tool call has no contract (default)
validate(response, { contracts, unmatchedPolicy: "deny" });

// Ignore tool calls without contracts
validate(response, { contracts, unmatchedPolicy: "allow" });
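
The effect of the two policies can be sketched as a small pure function. `checkUnmatched` is a hypothetical helper, and reporting unmatched tools even under "allow" is an assumption of this sketch.

```typescript
// Hypothetical illustration of how the policy could affect the outcome.
type UnmatchedPolicy = "deny" | "allow";

interface UnmatchedCheck {
  unmatched_tools: string[];
  pass: boolean;
}

function checkUnmatched(
  calledTools: string[],
  contractedTools: string[],
  policy: UnmatchedPolicy = "deny",
): UnmatchedCheck {
  const known = new Set(contractedTools);
  const unmatched_tools = calledTools.filter((tool) => !known.has(tool));
  // "deny" fails on any uncovered tool call; "allow" ignores them.
  const pass = policy === "allow" || unmatched_tools.length === 0;
  return { unmatched_tools, pass };
}
```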

Provider response formats

validate() handles responses from both OpenAI and Anthropic natively. It auto-detects the format and normalizes tool calls for evaluation.

// OpenAI response — works directly
const openaiResponse = await openai.chat.completions.create({ ... });
validate(openaiResponse, { contracts });

// Anthropic response — works directly
const anthropicResponse = await anthropic.messages.create({ ... });
validate(anthropicResponse, { contracts });

// Pre-normalized response — also works
validate({
  tool_calls: [{ id: "1", name: "get_weather", arguments: '{"location":"SF"}' }],
}, { contracts });
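
The normalization step can be pictured as follows. The two input interfaces are reduced versions of the providers' response shapes (OpenAI nests tool calls under choices[].message.tool_calls with string arguments; Anthropic returns tool_use content blocks with an object input). `normalizeToolCalls` is a hypothetical function, and collecting calls from every choice is an assumption.

```typescript
// Reduced provider response shapes; normalizeToolCalls is hypothetical.
interface NormalizedToolCall {
  id: string;
  name: string;
  arguments: string; // JSON-encoded arguments
}

interface OpenAIResponseLike {
  choices: {
    message: {
      tool_calls?: { id: string; function: { name: string; arguments: string } }[] | null;
    };
  }[];
}

type AnthropicBlock =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string; name: string; input: unknown };

interface AnthropicResponseLike {
  content: AnthropicBlock[];
}

function normalizeToolCalls(
  response: OpenAIResponseLike | AnthropicResponseLike,
): NormalizedToolCall[] {
  const calls: NormalizedToolCall[] = [];

  if ("choices" in response) {
    // OpenAI: arguments are already a JSON string
    for (const choice of response.choices) {
      for (const tc of choice.message.tool_calls ?? []) {
        calls.push({ id: tc.id, name: tc.function.name, arguments: tc.function.arguments });
      }
    }
    return calls;
  }

  // Anthropic: tool input is an object, so serialize it
  for (const block of response.content) {
    if (block.type === "tool_use") {
      calls.push({ id: block.id, name: block.name, arguments: JSON.stringify(block.input) });
    }
  }
  return calls;
}
```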

Multi-turn agents

The CLI runner evaluates one LLM response per contract — it does not chain tool results back into follow-up turns. For agents that call multiple tools across a conversation loop, use validate() inside your agent's own loop to check each turn independently.

import OpenAI from "openai";
import { observe, prepareContracts, validate } from "@vesanor/replay";

const openai = new OpenAI();
const contracts = prepareContracts("./packs/my-agent");

// Start observing all turns
const handle = observe(openai, {
  apiKey: process.env.VESANOR_API_KEY,
  agent: "export-compliance",
});

// Your agent's conversation loop
const messages: OpenAI.Chat.ChatCompletionMessageParam[] = [
  { role: "system", content: "You are an export compliance agent..." },
  { role: "user", content: "Classify this shipment to Canada..." },
];

// `tools`, `maxTurns`, and `executeToolLocally()` are defined by your application.
const turnResults = [];

for (let turn = 0; turn < maxTurns; turn++) {
  const response = await openai.chat.completions.create({
    model: "gpt-4o-mini",
    messages,
    tools,
  });

  // Validate THIS turn's tool calls against contracts
  const result = validate(response, { contracts, unmatchedPolicy: "allow" });
  turnResults.push({ turn, ...result });

  if (!result.pass) {
    console.error(`Turn ${turn} contract violation:`, result.failures);
    // Decide: retry, fallback, or abort
  }

  // Extract tool calls and feed results back for next turn
  const toolCalls = response.choices[0].message.tool_calls;
  if (!toolCalls || toolCalls.length === 0) break;

  messages.push(response.choices[0].message);
  for (const tc of toolCalls) {
    const toolResult = await executeToolLocally(tc);
    messages.push({ role: "tool", tool_call_id: tc.id, content: toolResult });
  }
}

// Summary: did every turn pass?
const allPassed = turnResults.every(r => r.pass);
console.log(`Agent finished: ${turnResults.length} turns, all passed: ${allPassed}`);

handle.restore();

Why the CLI can't do this

Each CLI contract maps to one fixture (one request/response pair). A 7-tool agent that calls tools across 7 turns will only trigger 1 tool call per CLI run — the other 6 happen in subsequent turns that the CLI doesn't execute. This isn't a bug; the CLI tests individual contract compliance, not full agent trajectories.

Use the SDK for multi-turn validation, and the CLI for single-turn regression testing in CI.

Observe + Validate together

The most common pattern uses both: observe() captures calls for the dashboard, and validate() catches violations in real time.

import OpenAI from "openai";
import { observe, prepareContracts, validate } from "@vesanor/replay";

const openai = new OpenAI();
const contracts = prepareContracts("./packs/my-pack");

// Start observing
const handle = observe(openai, {
  apiKey: process.env.VESANOR_API_KEY,
  agent: "weather-agent",
});

// Make the call
const response = await openai.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [{ role: "user", content: "What's the weather?" }],
  tools: [...],
});

// Validate locally
const result = validate(response, { contracts });
if (!result.pass) {
  // Handle violation — log, retry, fallback, etc.
}

// Clean up when done
handle.restore();

Diagnostics

Pass a diagnostics callback to observe() to trace what the SDK is doing:

observe(openai, {
  apiKey: process.env.VESANOR_API_KEY,
  diagnostics: (event) => {
    switch (event.type) {
      case "double_wrap":
        // observe() called twice on same client
        console.warn("Client already observed");
        break;
      case "unsupported_client":
        // Client shape not recognized
        console.warn("Unsupported client:", event.detail);
        break;
      case "buffer_overflow":
        // Too many captures buffered
        console.warn(`Dropped ${event.dropped} captures`);
        break;
    }
  },
});
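
To show where a buffer_overflow event could come from, here is a sketch of a bounded capture buffer. The `CaptureBuffer` class is hypothetical, and dropping the oldest item when full is an assumption; the SDK may drop the newest item or report drops in batches.

```typescript
// Hypothetical bounded buffer. Dropping the oldest item is an assumption.
type OverflowEvent = { type: "buffer_overflow"; dropped: number };

class CaptureBuffer<T> {
  private items: T[] = [];
  private readonly maxBuffer: number;
  private readonly diagnostics: (event: OverflowEvent) => void;

  constructor(maxBuffer: number, diagnostics: (event: OverflowEvent) => void) {
    this.maxBuffer = maxBuffer;
    this.diagnostics = diagnostics;
  }

  push(item: T): void {
    if (this.items.length >= this.maxBuffer) {
      this.items.shift(); // make room by discarding the oldest capture
      this.diagnostics({ type: "buffer_overflow", dropped: 1 });
    }
    this.items.push(item);
  }

  // Hand the current batch to the sender and start a fresh buffer.
  drain(): T[] {
    const batch = this.items;
    this.items = [];
    return batch;
  }
}
```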

Environment variables

Variable	Description
`VESANOR_API_KEY`	API key for capture ingestion (fallback if not passed in options)
`VESANOR_DISABLE`	Set to `true` to disable all capture (`1`, `yes`, `on` also work)
`VESANOR_API_URL`	Custom API endpoint (default: `https://app.vesanor.com`)
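
The fallback behavior in the table can be sketched as a small resolver. `resolveConfig` and `isTruthyFlag` are hypothetical names. The sketch assumes that explicit options take precedence over environment variables for the endpoint as well as the API key, and that flag values are matched case-insensitively.

```typescript
// Hypothetical config resolution; precedence for endpoint and
// case-insensitive flag matching are assumptions of this sketch.
type Env = Record<string, string | undefined>;

interface ObserveConfigInput {
  apiKey?: string;
  endpoint?: string;
  disabled?: boolean;
}

interface ResolvedConfig {
  apiKey: string | undefined;
  endpoint: string;
  disabled: boolean;
}

const TRUTHY_FLAGS = new Set(["true", "1", "yes", "on"]);

function isTruthyFlag(value: string | undefined): boolean {
  return value !== undefined && TRUTHY_FLAGS.has(value.trim().toLowerCase());
}

function resolveConfig(options: ObserveConfigInput, env: Env): ResolvedConfig {
  return {
    apiKey: options.apiKey ?? env.VESANOR_API_KEY,
    endpoint: options.endpoint ?? env.VESANOR_API_URL ?? "https://app.vesanor.com",
    disabled: options.disabled === true || isTruthyFlag(env.VESANOR_DISABLE),
  };
}
```

In a Node.js application you would pass process.env as the second argument.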

What happens after capture

Once observe() sends captures to the server:

Contracts auto-generated — the server infers contracts from observed tool calls (structure, types, schema bounds)
Confidence scoring — contracts gain confidence as more samples arrive (low < 5, medium 5–9, high ≥ 10)
Dashboard visibility — captured tools appear on the Contracts page with coverage analysis
Guard evaluation — the Guard page shows pass rates and failure patterns across all captured calls
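
The confidence thresholds above map to a simple function of sample count. This is a client-side restatement of the documented bands for illustration; `confidenceFor` is a hypothetical name, and the scoring itself happens on the server.

```typescript
// Restates the documented bands: low < 5, medium 5 to 9, high >= 10.
type Confidence = "low" | "medium" | "high";

function confidenceFor(sampleCount: number): Confidence {
  if (sampleCount >= 10) return "high";
  if (sampleCount >= 5) return "medium";
  return "low";
}
```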

See Dashboard Guide for how to review and promote auto-generated contracts.

Next steps

Runtime enforcement (recommended):

Replay Quickstart — upgrade from observe/validate to workflow governance in 5 minutes
Migrating from observe() — step-by-step migration guide
Why Runtime Governance? — why per-call validation isn't enough

CI-time validation:

Write contracts manually: See Writing Tests for the full YAML format
Auto-generate with CLI: See Observe Guide for CLI-based contract generation
Review in dashboard: See Dashboard Guide for the contract review workspace
Set up CI: See CI Integration for automated testing
