CI Integration — Vesanor

Add Vesanor to your CI pipeline to catch tool-call regressions before they reach production. This guide covers GitHub Actions, GitLab CI, and general CI setup.


Quick setup

The simplest CI integration is one line:

npx vesanor --provider recorded

This runs your contracts against recorded fixtures — no API keys needed, no network calls, deterministic results. If any contract fails, the command exits with a non-zero code and your CI pipeline fails.


GitHub Actions

Basic workflow

# .github/workflows/vesanor.yml
name: Vesanor Gate

on:
  pull_request:
    branches: [main]
  push:
    branches: [main]

jobs:
  vesanor:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-node@v4
        with:
          node-version: '20'

      - run: npm ci

      - name: Run Vesanor contracts
        run: npx vesanor --provider recorded

That's it. Contracts run against recorded fixtures, and results appear in your PR checks.

With dashboard push

To see results in the Vesanor dashboard, add your API key:

      - name: Run Vesanor contracts
        env:
          VESANOR_API_KEY: ${{ secrets.VESANOR_API_KEY }}
        run: npx vesanor --provider recorded

Results are automatically pushed to app.vesanor.com where you can see run history, failure trends, and fingerprint tracking. Push is non-blocking — if the dashboard is unreachable, your CI still passes or fails based on contract results alone.

With live provider testing

For advisory live-provider testing alongside your recorded gate:

jobs:
  # Hard gate — recorded fixtures, merge-blocking
  vesanor-gate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
      - run: npm ci
      - name: Recorded gate
        run: npx vesanor --provider recorded

  # Advisory — live provider, non-blocking
  vesanor-live:
    runs-on: ubuntu-latest
    if: github.event_name == 'pull_request'
    continue-on-error: true # never blocks merge
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
      - run: npm ci
      - name: Live provider check
        env:
          VESANOR_PROVIDER_KEY: ${{ secrets.OPENAI_API_KEY }}
          VESANOR_API_KEY: ${{ secrets.VESANOR_API_KEY }}
        run: npx vesanor --provider openai --model gpt-4o-mini

This follows the two-lane pattern: the recorded gate blocks merges (Lane A), while the live check provides advisory evidence (Lane B).


GitLab CI

# .gitlab-ci.yml
vesanor:
  image: node:20
  stage: test
  script:
    - npm ci
    - npx vesanor --provider recorded
  rules:
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"
    - if: $CI_COMMIT_BRANCH == "main"

With dashboard push:

vesanor:
  image: node:20
  stage: test
  variables:
    VESANOR_API_KEY: $VESANOR_API_KEY
  script:
    - npm ci
    - npx vesanor --provider recorded
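For an advisory live-provider lane in GitLab, `allow_failure` plays the role of `continue-on-error`. A sketch mirroring the GitHub Actions advisory job above (the variable names and MR-only rule are choices, not requirements):

```yaml
vesanor-live:
  image: node:20
  stage: test
  allow_failure: true   # advisory, never blocks the merge
  rules:
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"
  variables:
    VESANOR_PROVIDER_KEY: $OPENAI_API_KEY
    VESANOR_API_KEY: $VESANOR_API_KEY
  script:
    - npm ci
    - npx vesanor --provider openai --model gpt-4o-mini
```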

General CI setup

For any CI system, the pattern is the same:

  1. Install dependencies (npm ci)
  2. Run npx vesanor --provider recorded
  3. Check the exit code

Exit codes

Code  Meaning                                           CI behavior
0     All contracts passed                              Pipeline passes
1     One or more contracts failed, or a runtime error  Pipeline fails
2     Drift or unknown-rate gate failure                Pipeline fails

JSON output in CI

In non-TTY environments (CI pipelines, piped output), Vesanor automatically outputs JSON instead of pretty-printed text. You can also force it with --json:

npx vesanor --provider recorded --json | jq '.provider_run.steps[] | select(.status == "Fail")'

The JSON output includes everything you need for CI integration:

{
  "pack": "packs/starter",
  "contracts_count": 1,
  "provider_run": {
    "provider": "recorded",
    "model": "recorded",
    "steps": [
      {
        "contract_path": "packs/starter/contracts/incident_response.yaml",
        "status": "Pass",
        "fingerprint": "71ac81a7..."
      }
    ]
  }
}

Parsing results

Extract specific information with jq:

# Count failures
npx vesanor --json | jq '[.provider_run.steps[] | select(.status == "Fail")] | length'

# Get failure fingerprints
npx vesanor --json | jq '.provider_run.steps[] | select(.status == "Fail") | .fingerprint'

# Check if determinism proof passed
npx vesanor --repeat 3 --json | jq '.determinism_proof.proven'
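If jq isn't available on your runner, the same checks can be scripted in Python. A minimal sketch, assuming the --json shape shown above; the report payload is inlined here for illustration, where a real CI step would pipe `npx vesanor --json` into the script and read stdin:

```python
import json

# Inlined sample matching the documented --json shape.
# In CI you would instead do: report = json.load(sys.stdin)
sample = """
{
  "pack": "packs/starter",
  "contracts_count": 1,
  "provider_run": {
    "provider": "recorded",
    "model": "recorded",
    "steps": [
      {"contract_path": "packs/starter/contracts/incident_response.yaml",
       "status": "Pass",
       "fingerprint": "71ac81a7"}
    ]
  }
}
"""

report = json.loads(sample)

# Collect failing steps, mirroring the jq select(.status == "Fail") filter.
failures = [s for s in report["provider_run"]["steps"] if s["status"] == "Fail"]

print(f"{len(failures)} failing contract(s)")
for step in failures:
    print(f"  {step['contract_path']} fingerprint={step['fingerprint']}")

# A real script would end with sys.exit(exit_code) to fail the pipeline.
exit_code = 1 if failures else 0
```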

The two-lane CI model

Vesanor is designed around a two-lane CI architecture that separates deterministic safety from live advisory testing.

Lane A — Hard gate (merge-blocking)

  • Runs against recorded fixtures only (no live API calls)
  • Deterministic — same input always produces the same output
  • Fast — no network latency
  • Free — no API costs
  • Blocks merges on failure

Lane A catches regressions in your contract definitions, fixture files, and runner logic. If a recorded test that used to pass now fails, something changed in your code.

What blocks a merge:

  • A contract assertion fails against a recorded fixture
  • A new failure fingerprint appears that wasn't in the previous baseline
  • The unknown classification rate exceeds 20%
  • A NonReproducible result on the deterministic corpus (without an allowlist entry)

Lane B — Evidence lane (advisory)

  • Runs against live providers (OpenAI, Anthropic, etc.)
  • Non-deterministic — model responses can vary
  • Advisory only — never blocks merges
  • Results feed the dashboard for trending and comparison

Lane B answers questions like: "Does gpt-4o-mini still call my tools correctly?" and "How does Anthropic compare to OpenAI on my contracts?"
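One way to run the same contracts against several providers in Lane B is a GitHub Actions matrix. A sketch only: the anthropic provider id, the model name, and the secret names are assumptions, so substitute whatever providers your Vesanor setup supports:

```yaml
  vesanor-live:
    runs-on: ubuntu-latest
    continue-on-error: true
    strategy:
      matrix:
        include:
          - provider: openai
            model: gpt-4o-mini
          - provider: anthropic       # hypothetical provider id
            model: claude-3-5-haiku   # hypothetical model name
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
      - run: npm ci
      - env:
          VESANOR_PROVIDER_KEY: ${{ secrets.PROVIDER_API_KEY }}
          VESANOR_API_KEY: ${{ secrets.VESANOR_API_KEY }}
        run: npx vesanor --provider ${{ matrix.provider }} --model ${{ matrix.model }}
```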

Setting up both lanes

The two-lane model maps naturally to CI jobs:

# Lane A: merge-blocking
vesanor-gate:
  script: npx vesanor --provider recorded

# Lane B: advisory
vesanor-live:
  allow_failure: true
  script: npx vesanor --provider openai --model gpt-4o-mini

Determinism proof in CI

Verify that your provider returns consistent results by running contracts multiple times:

npx vesanor --provider openai --model gpt-4o-mini --repeat 3

This runs every contract 3 times and compares fingerprints. If all runs produce identical fingerprints for each step, the proof passes.

The JSON output includes a determinism_proof field:

{
  "determinism_proof": {
    "proven": true,
    "total_runs": 3,
    "per_step": [
      {
        "contract_path": "packs/starter/contracts/incident_response.yaml",
        "deterministic": true,
        "run_count": 3
      }
    ]
  }
}

Use this to build confidence before promoting a model version or adding new contracts.
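To gate on the proof in a script without jq, a small Python sketch can check both the overall flag and the per-step results. This assumes the determinism_proof shape above; the payload is inlined for illustration, where a real CI step would read `npx vesanor --repeat 3 --json` from stdin:

```python
import json

# Inlined sample matching the documented determinism_proof shape.
sample = """
{
  "determinism_proof": {
    "proven": true,
    "total_runs": 3,
    "per_step": [
      {"contract_path": "packs/starter/contracts/incident_response.yaml",
       "deterministic": true,
       "run_count": 3}
    ]
  }
}
"""

proof = json.loads(sample)["determinism_proof"]

# List any steps whose fingerprints varied across the repeated runs.
flaky = [s["contract_path"] for s in proof["per_step"] if not s["deterministic"]]

if proof["proven"]:
    print(f"determinism proven across {proof['total_runs']} runs")
else:
    print("non-deterministic steps:", flaky)
```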


Keeping recordings up to date

When you change your tool definitions or message prompts, your recorded fixtures become stale. The recorded provider detects this via boundary hash validation and flags it as NonReproducible.

To refresh your recordings:

# Re-capture from live provider
npx vesanor --provider openai --model gpt-4o-mini --capture-recordings

# Verify the new recordings pass
npx vesanor --provider recorded

# Commit the updated recordings
git add packs/*/recordings/
git commit -m "Update recorded fixtures"

A good workflow is to update recordings in a dedicated PR, separate from feature changes. This keeps your CI gate stable and makes recording changes easy to review.
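One way to make that dedicated PR routine is a scheduled refresh job. A sketch for GitHub Actions, assuming the third-party peter-evans/create-pull-request action and a weekly cadence (both are choices, not requirements):

```yaml
name: Refresh recordings
on:
  schedule:
    - cron: '0 6 * * 1'   # weekly, Monday 06:00 UTC
  workflow_dispatch: {}

jobs:
  refresh:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
      - run: npm ci
      - name: Re-capture recordings
        env:
          VESANOR_PROVIDER_KEY: ${{ secrets.OPENAI_API_KEY }}
        run: npx vesanor --provider openai --model gpt-4o-mini --capture-recordings
      - name: Verify the new recordings
        run: npx vesanor --provider recorded
      - uses: peter-evans/create-pull-request@v6
        with:
          title: Update recorded fixtures
          commit-message: Update recorded fixtures
          branch: chore/refresh-recordings
```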


Environment variables reference

Variable              Purpose                     Required in CI?
VESANOR_API_KEY       Push results to dashboard   No (optional)
VESANOR_PROVIDER_KEY  API key for live providers  Only for Lane B
VESANOR_API_URL       Override dashboard URL      No (default: https://app.vesanor.com)

For recorded-only CI (Lane A), no environment variables are needed.


Next steps

  • Write contracts: See Writing Tests for the full YAML format
  • Understand providers: See Providers for how provider abstraction works
  • Debug failures: See Troubleshooting for common CI issues