What is Vesanor?

Vesanor is an LLM function-calling reliability platform. It validates, monitors, and governs the tool calls your AI agents make — across providers like OpenAI and Anthropic — so that when a model silently changes behavior, your app doesn't break.


The problem

LLM-powered agents work by calling tools: looking up customers, issuing refunds, deploying services, querying databases. These tool calls are your agent's actions in the real world — and they're the most dangerous part of your AI stack.

The problem is that models change without warning:

  • Silent model updates — your provider ships a new version and your agent starts skipping a required tool, or calling tools in the wrong order
  • Cross-provider drift — you switch from OpenAI to Anthropic and discover they format arguments differently, or make different tool choices for the same prompt
  • No regression tests — traditional unit tests don't cover LLM behavior. You can't write expect(model.response).toBe(...) because responses are non-deterministic
  • Runtime surprises — your agent calls delete_account when it should have called suspend_account, and you only find out from a customer complaint

There's no way to know if your agent's tool-calling behavior is correct, consistent, or safe — unless you have a system that checks every call against a contract.


What Vesanor does

Vesanor gives you contracts — declarative YAML rules that define what your agent should do. Then it enforces them at two levels:

CI-time validation

Catch regressions before they ship:

  • Auto-generate contracts from real traffic — wrap your client with observe() and Vesanor builds contracts from what the model actually does
  • Run contracts in CI — deterministic, offline, free. Recorded fixtures replay in milliseconds with no API calls
  • Two-lane CI — Lane A blocks merges (recorded fixtures, deterministic). Lane B runs live models for advisory evidence
  • Track every failure — each failure gets a fingerprint that stays the same when the same issue recurs. See trends, not noise
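The Lane A idea above can be sketched in a few lines: replay a recorded fixture and check it against a contract, deterministically and offline. The fixture shape, contract fields, and `check_fixture` helper here are illustrative assumptions, not Vesanor's actual schema.

```python
# Minimal sketch of a merge-blocking Lane A check (illustrative, not
# Vesanor's real schema): a recorded fixture is replayed against a contract
# with no API calls, so the result is deterministic and free.

# A recorded fixture: the tool calls the model made in an approved run.
fixture = [
    {"tool": "lookup_customer", "args": {"customer_id": "c_123"}},
    {"tool": "check_eligibility", "args": {"customer_id": "c_123"}},
    {"tool": "issue_refund", "args": {"customer_id": "c_123", "amount": 25.0}},
]

# A contract: the tool sequence the agent is required to follow.
contract = {"required_sequence": ["lookup_customer", "check_eligibility", "issue_refund"]}

def check_fixture(fixture, contract):
    """Deterministic, offline check: does the recorded run satisfy the contract?"""
    called = [call["tool"] for call in fixture]
    return called == contract["required_sequence"]

assert check_fixture(fixture, contract)  # sequence matches the contract
```

Because the check only compares recorded data against declared rules, it runs in milliseconds and never depends on a provider being up.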

Runtime workflow governance

Control what your agent can do on governed workflow paths in production:

  • Block illegal tool calls before they execute — not after
  • Phase-based state machine — the model only sees tools valid in its current workflow phase
  • Preconditions — "you must call verify_identity before issue_refund"
  • Session limits — cost caps, call limits, loop detection, circuit breakers
  • Kill switch — immediately halt any session
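To make the precondition and kill-switch ideas concrete, here is a minimal sketch of a runtime gate. The `GovernanceGate` class and its method names are assumptions for illustration; Vesanor's real `replay()` wrapper has its own API.

```python
# Illustrative sketch of runtime governance (not Vesanor's actual API):
# a gate that blocks a tool call before it executes when its precondition
# has not been satisfied, plus a kill switch that halts the session.

class BlockedCall(Exception):
    """Raised when a tool call would violate the contract."""

class GovernanceGate:
    def __init__(self, preconditions):
        # e.g. {"issue_refund": "verify_identity"} means verify_identity
        # must be called before issue_refund in the same session.
        self.preconditions = preconditions
        self.seen = set()
        self.killed = False

    def kill(self):
        """Kill switch: immediately halt the session."""
        self.killed = True

    def allow(self, tool):
        """Check a proposed call; raise BlockedCall instead of executing it."""
        if self.killed:
            raise BlockedCall("session halted by kill switch")
        required = self.preconditions.get(tool)
        if required and required not in self.seen:
            raise BlockedCall(f"{tool} requires {required} first")
        self.seen.add(tool)

gate = GovernanceGate({"issue_refund": "verify_identity"})
gate.allow("verify_identity")
gate.allow("issue_refund")  # allowed: precondition satisfied
```

The key property is that the check happens before the tool runs: an illegal `issue_refund` raises instead of executing.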

Who is Vesanor for?

Engineering teams shipping LLM agents to production. If your AI calls tools — APIs, databases, external services — and you need confidence that it does the right thing reliably, Vesanor is for you.

Common use cases:

  • Support agents that look up customers, check eligibility, and issue refunds
  • DevOps agents that deploy services, check health, and create incident tickets
  • Compliance agents that classify documents, run checks, and generate reports
  • Security agents that triage alerts, scan systems, and isolate threats

If your agent's tool calls have real-world consequences and follow repeatable workflows, you need contracts.


What makes Vesanor different

Contracts, not vibes

Most LLM testing is qualitative — "does this response look right?" Vesanor tests are deterministic and structural: did the model call the right tools, with the right arguments, in the right order? Contracts are YAML files with precise assertions. No fuzzy matching, no subjective scoring.
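As a rough illustration of what such a contract might look like, here is a hypothetical YAML sketch. The field names and structure are invented for this example and are not Vesanor's actual contract schema:

```yaml
# Hypothetical contract sketch — field names are illustrative,
# not Vesanor's real schema.
contract: refund_flow
assertions:
  tool_sequence:          # tools must be called in this order
    - lookup_customer
    - check_eligibility
    - issue_refund
  arguments:
    issue_refund:
      amount: { max: 100 }   # structural check, not fuzzy matching
```

Every assertion is a deterministic pass/fail check on tool names, order, and arguments.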

Observe, then enforce

You don't have to write contracts from scratch. observe() wraps your existing client, captures real tool calls, and auto-generates contracts on the dashboard. Review what was generated, tighten the rules, and promote to enforcement. The loop is: observe → promote → enforce.
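The "observe" half of that loop can be sketched as a passive wrapper: it records every tool call without changing behavior. The wrapper below and the tool-dispatcher shape are assumptions for illustration; Vesanor's real `observe()` wraps your LLM client and has its own signature.

```python
# Minimal sketch of the "observe" idea (illustrative, not Vesanor's API):
# wrap a tool-calling function so every call is captured for later
# contract generation, while behavior stays unchanged.

def observe(call_tool, log):
    """Return a wrapper that records each call in `log`, then delegates."""
    def wrapped(tool, **args):
        log.append({"tool": tool, "args": args})  # passive capture
        return call_tool(tool, **args)            # behavior unchanged
    return wrapped

# A stand-in for your real tool dispatcher.
def call_tool(tool, **args):
    return f"ran {tool}"

log = []
call_tool = observe(call_tool, log)
call_tool("lookup_customer", customer_id="c_123")
# log now holds the captured call, ready to feed contract generation
```

Because the wrapper only appends to a log before delegating, it is safe to add to existing code: nothing about the agent's behavior changes during the observe phase.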

Provider-agnostic

Write contracts once, run them against OpenAI, Anthropic, or any future provider. The same YAML, the same assertions, the same golden fixtures. Vesanor normalizes provider-specific response formats so your tests are portable.
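The normalization step can be sketched concretely. OpenAI returns tool calls with a JSON-string `arguments` field, while Anthropic returns `tool_use` content blocks with a dict `input`; those payload shapes follow each provider's public API, but the `normalize()` helper itself is an illustrative assumption, not Vesanor's internal code.

```python
# Sketch of provider normalization: map OpenAI- and Anthropic-style
# tool-call payloads to one common shape, so the same contract assertions
# apply to both. normalize() is illustrative, not Vesanor's internals.
import json

def normalize(provider, response):
    """Return tool calls as [{'name': ..., 'arguments': {...}}]."""
    if provider == "openai":
        # OpenAI: arguments arrive as a JSON-encoded string
        return [
            {"name": c["function"]["name"],
             "arguments": json.loads(c["function"]["arguments"])}
            for c in response["tool_calls"]
        ]
    if provider == "anthropic":
        # Anthropic: tool_use content blocks carry a dict input
        return [
            {"name": b["name"], "arguments": b["input"]}
            for b in response["content"] if b["type"] == "tool_use"
        ]
    raise ValueError(f"unknown provider: {provider}")

openai_resp = {"tool_calls": [
    {"function": {"name": "issue_refund", "arguments": '{"amount": 25}'}}]}
anthropic_resp = {"content": [
    {"type": "tool_use", "name": "issue_refund", "input": {"amount": 25}}]}

# Different wire formats, identical normalized form.
assert normalize("openai", openai_resp) == normalize("anthropic", anthropic_resp)
```

Once both providers reduce to the same shape, one set of assertions and one set of golden fixtures covers both.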

Two-lane CI

Your merge gate runs against recorded fixtures — deterministic, offline, zero API cost. Live model testing runs in a separate advisory lane that never blocks merges. You get safety and evidence without coupling your CI to third-party API availability.

Runtime workflow governance

CI catches regressions. replay() catches runtime surprises. It wraps your LLM client and enforces contracts on every call — blocking illegal tool calls, enforcing workflow phases, and providing a kill switch for runaway agents. Three protection levels: Monitor (observe), Protect (enforce locally), Govern (durable server-backed state and evidence).

Fingerprinted failures

Every failure gets a short hash (fingerprint) that stays stable when the same issue recurs. Instead of a wall of logs, you see unique failure patterns with trend lines. Regressions are immediately obvious.
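A stable fingerprint like this is typically built by hashing only the stable parts of a failure and ignoring volatile ones. The field choices below are an assumption for illustration, not Vesanor's actual fingerprinting scheme.

```python
# Sketch of failure fingerprinting (illustrative field choices): hash the
# stable parts of a failure (contract, assertion, tool) and ignore volatile
# parts (timestamps, run ids), so the same issue always maps to the same
# short hash and can be trended over time.
import hashlib

def fingerprint(failure):
    stable = f"{failure['contract']}|{failure['assertion']}|{failure['tool']}"
    return hashlib.sha256(stable.encode()).hexdigest()[:12]

monday = {"contract": "refund_flow", "assertion": "tool_sequence",
          "tool": "issue_refund", "timestamp": "2024-05-06T09:00:00Z"}
friday = {"contract": "refund_flow", "assertion": "tool_sequence",
          "tool": "issue_refund", "timestamp": "2024-05-10T17:30:00Z"}

# Same underlying issue, same fingerprint, despite different timestamps.
assert fingerprint(monday) == fingerprint(friday)
```

Grouping by fingerprint is what turns a wall of individual failures into a handful of distinct patterns with trend lines.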


How the pieces fit together

                     Your LLM Agent
                           |
               +-----------+-----------+
               |                       |
           observe()               replay()
      (passive capture)    (workflow governance)
               |                       |
               v                       v
        Auto-generate          Enforce contracts
          contracts             on every call
               |                       |
               +----------+------------+
                          |
                      Dashboard
           (review, promote, monitor)
                          |
               +----------+------------+
               |                       |
            CI Gate               Baselines
      (recorded fixtures,    (drift detection,
        merge-blocking)        fingerprints)

  1. Observe — wrap your client with observe() or run vesanor observe to capture tool calls and auto-generate contracts
  2. Review — inspect auto-generated contracts on the dashboard, tweak assertions, check coverage
  3. Promote — turn draft contracts into enforced truth contracts
  4. Test in CI — run contracts against recorded fixtures (Lane A, merge-blocking) and live models (Lane B, advisory)
  5. Govern at runtime — wrap your client with replay() to block illegal tool calls before they execute
  6. Monitor — track baselines, detect drift, see failure trends on the dashboard

Next steps

Get started fast:

  • Quickstart — first test in 60 seconds, or wrap your real app in 5 minutes
  • How Vesanor Works — key concepts before diving into code

Go deeper: