# CI Integration — Vesanor
Add Vesanor to your CI pipeline to catch tool-call regressions before they reach production. This guide covers GitHub Actions, GitLab CI, and general CI setup.
## Quick setup

The simplest CI integration is one line:

```bash
npx vesanor --provider recorded
```
This runs your contracts against recorded fixtures — no API keys needed, no network calls, deterministic results. If any contract fails, the command exits with a non-zero code and your CI pipeline fails.
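The exit-code behavior is what makes the gate work in any CI system. A minimal sketch of how a shell step reacts to it (hedged: `run_contracts` is a stand-in for the real `npx vesanor --provider recorded` invocation, simulated here as a failing run):

```shell
# How a CI step reacts to the exit code.
# `run_contracts` stands in for `npx vesanor --provider recorded`;
# here it simulates a run with one failing contract (exit code 1).
run_contracts() { return 1; }

if run_contracts; then
  echo "contracts passed"
else
  echo "contracts failed"
fi
```

Because CI runners fail a step on any non-zero exit, no extra wiring is needed beyond running the command.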
## GitHub Actions

### Basic workflow

```yaml
# .github/workflows/vesanor.yml
name: Vesanor Gate

on:
  pull_request:
    branches: [main]
  push:
    branches: [main]

jobs:
  vesanor:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
      - run: npm ci
      - name: Run Vesanor contracts
        run: npx vesanor --provider recorded
```
That's it. Contracts run against recorded fixtures, and results appear in your PR checks.
### With dashboard push

To see results in the Vesanor dashboard, add your API key:

```yaml
      - name: Run Vesanor contracts
        env:
          VESANOR_API_KEY: ${{ secrets.VESANOR_API_KEY }}
        run: npx vesanor --provider recorded
```
Results are automatically pushed to app.vesanor.com where you can see run history, failure trends, and fingerprint tracking. Push is non-blocking — if the dashboard is unreachable, your CI still passes or fails based on contract results alone.
### With live provider testing

For advisory live-provider testing alongside your recorded gate:

```yaml
jobs:
  # Hard gate — recorded fixtures, merge-blocking
  vesanor-gate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
      - run: npm ci
      - name: Recorded gate
        run: npx vesanor --provider recorded

  # Advisory — live provider, non-blocking
  vesanor-live:
    runs-on: ubuntu-latest
    if: github.event_name == 'pull_request'
    continue-on-error: true  # never blocks merge
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
      - run: npm ci
      - name: Live provider check
        env:
          VESANOR_PROVIDER_KEY: ${{ secrets.OPENAI_API_KEY }}
          VESANOR_API_KEY: ${{ secrets.VESANOR_API_KEY }}
        run: npx vesanor --provider openai --model gpt-4o-mini
```
This follows the two-lane pattern: the recorded gate blocks merges (Lane A), while the live check provides advisory evidence (Lane B).
## GitLab CI

```yaml
# .gitlab-ci.yml
vesanor:
  image: node:20
  stage: test
  script:
    - npm ci
    - npx vesanor --provider recorded
  rules:
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"
    - if: $CI_COMMIT_BRANCH == "main"
```
With dashboard push:

```yaml
vesanor:
  image: node:20
  stage: test
  variables:
    VESANOR_API_KEY: $VESANOR_API_KEY
  script:
    - npm ci
    - npx vesanor --provider recorded
```
## General CI setup

For any CI system, the pattern is the same:

- Install dependencies (`npm ci`)
- Run `npx vesanor --provider recorded`
- Check the exit code
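On CI systems without first-class runner support (Jenkins, CircleCI, a bare cron job), the three steps reduce to a short script. A mocked sketch — `install_deps` and `run_contracts` stand in for the real `npm ci` and `npx vesanor --provider recorded` commands:

```shell
# Generic three-step CI job (mocked sketch).
# install_deps / run_contracts stand in for `npm ci` and
# `npx vesanor --provider recorded` respectively.
install_deps()  { echo "dependencies installed"; }
run_contracts() { echo "contracts run"; return 0; }

install_deps
run_contracts
status=$?
if [ "$status" -ne 0 ]; then
  echo "contracts failed (exit $status)"
  exit "$status"
fi
echo "contracts passed"
```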
### Exit codes

| Code | Meaning | CI behavior |
|---|---|---|
| `0` | All contracts passed | Pipeline passes |
| `1` | One or more contracts failed, or a runtime error | Pipeline fails |
| `2` | Drift or unknown-rate gate failure | Pipeline fails |
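If you want distinct CI messages per exit code rather than a bare pass/fail, a `case` dispatch works. A sketch with `vesanor_mock` standing in for `npx vesanor` (here it simulates a drift-gate failure, exit code 2):

```shell
# Mapping exit codes to CI messages (sketch).
# `vesanor_mock` stands in for `npx vesanor --provider recorded`;
# here it simulates a drift-gate failure (exit code 2).
vesanor_mock() { return 2; }

# `&& code=0 ||` keeps the script safe under `set -e`.
vesanor_mock && code=0 || code=$?
case "$code" in
  0) msg="all contracts passed" ;;
  1) msg="contract failure or runtime error" ;;
  2) msg="drift or unknown-rate gate failure" ;;
  *) msg="unexpected exit code $code" ;;
esac
echo "$msg"
```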
### JSON output in CI

In non-TTY environments (CI pipelines, piped output), Vesanor automatically outputs JSON instead of pretty-printed text. You can also force it with `--json`:

```bash
npx vesanor --provider recorded --json | jq '.provider_run.steps[] | select(.status == "Fail")'
```
The JSON output includes everything you need for CI integration:

```json
{
  "pack": "packs/starter",
  "contracts_count": 1,
  "provider_run": {
    "provider": "recorded",
    "model": "recorded",
    "steps": [
      {
        "contract_path": "packs/starter/contracts/incident_response.yaml",
        "status": "Pass",
        "fingerprint": "71ac81a7..."
      }
    ]
  }
}
```
### Parsing results

Extract specific information with `jq`:

```bash
# Count failures
npx vesanor --json | jq '[.provider_run.steps[] | select(.status == "Fail")] | length'

# Get failure fingerprints
npx vesanor --json | jq '.provider_run.steps[] | select(.status == "Fail") | .fingerprint'

# Check whether the determinism proof passed
npx vesanor --repeat 3 --json | jq '.determinism_proof.proven'
```
## The two-lane CI model
Vesanor is designed around a two-lane CI architecture that separates deterministic safety from live advisory testing.
### Lane A — Hard gate (merge-blocking)
- Runs against recorded fixtures only (no live API calls)
- Deterministic — same input always produces the same output
- Fast — no network latency
- Free — no API costs
- Blocks merges on failure
Lane A catches regressions in your contract definitions, fixture files, and runner logic. If a recorded test that used to pass now fails, something changed in your code.
What blocks a merge:
- A contract assertion fails against a recorded fixture
- A new failure fingerprint appears that wasn't in the previous baseline
- The unknown classification rate exceeds 20%
- A `NonReproducible` result on the deterministic corpus (without an allowlist entry)
### Lane B — Evidence lane (advisory)
- Runs against live providers (OpenAI, Anthropic, etc.)
- Non-deterministic — model responses can vary
- Advisory only — never blocks merges
- Results feed the dashboard for trending and comparison
Lane B answers questions like: "Does gpt-4o-mini still call my tools correctly?" and "How does Anthropic compare to OpenAI on my contracts?"
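The cross-provider comparison can be scripted from the JSON output. A mocked sketch — in a real Lane B job, each JSON blob would come from `npx vesanor --provider <name> --json`, and the field layout assumed here matches the sample output shown in this guide:

```shell
# Comparing pass counts across providers (mocked sketch).
# In a real Lane B job each blob comes from
# `npx vesanor --provider <name> --json`.
openai_json='{"provider_run":{"steps":[{"status":"Pass"},{"status":"Fail"}]}}'
anthropic_json='{"provider_run":{"steps":[{"status":"Pass"},{"status":"Pass"}]}}'

count_passes() {
  printf '%s' "$1" | jq '[.provider_run.steps[] | select(.status == "Pass")] | length'
}

echo "openai:    $(count_passes "$openai_json")"
echo "anthropic: $(count_passes "$anthropic_json")"
```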
### Setting up both lanes

The two-lane model maps naturally to CI jobs:

```yaml
# Lane A: merge-blocking
vesanor-gate:
  script: npx vesanor --provider recorded

# Lane B: advisory
vesanor-live:
  allow_failure: true
  script: npx vesanor --provider openai --model gpt-4o-mini
```
## Determinism proof in CI

Verify that your provider returns consistent results by running contracts multiple times:

```bash
npx vesanor --provider openai --model gpt-4o-mini --repeat 3
```
This runs every contract 3 times and compares fingerprints. If all runs produce identical fingerprints for each step, the proof passes.
The JSON output includes a `determinism_proof` field:

```json
{
  "determinism_proof": {
    "proven": true,
    "total_runs": 3,
    "per_step": [
      {
        "contract_path": "packs/starter/contracts/incident_response.yaml",
        "deterministic": true,
        "run_count": 3
      }
    ]
  }
}
```
Use this to build confidence before promoting a model version or adding new contracts.
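Gating a promotion step on the proof is a one-field jq check. A sketch with the JSON inlined — in a real job it would come from `npx vesanor --repeat 3 --json` as shown above:

```shell
# Gating a CI step on the determinism proof (sketch).
# The JSON is inlined here; in a real job it comes from
# `npx vesanor --provider openai --model gpt-4o-mini --repeat 3 --json`.
result='{"determinism_proof":{"proven":true,"total_runs":3}}'

proven=$(printf '%s' "$result" | jq -r '.determinism_proof.proven')
if [ "$proven" = "true" ]; then
  echo "determinism proven"
else
  echo "determinism not proven"
  exit 1
fi
```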
## Keeping recordings up to date

When you change your tool definitions or message prompts, your recorded fixtures become stale. The recorded provider detects this via boundary hash validation and flags the run as `NonReproducible`.

To refresh your recordings:
```bash
# Re-capture from the live provider
npx vesanor --provider openai --model gpt-4o-mini --capture-recordings

# Verify the new recordings pass
npx vesanor --provider recorded

# Commit the updated recordings
git add packs/*/recordings/
git commit -m "Update recorded fixtures"
```
A good workflow is to update recordings in a dedicated PR, separate from feature changes. This keeps your CI gate stable and makes recording changes easy to review.
## Environment variables reference

| Variable | Purpose | Required in CI? |
|---|---|---|
| `VESANOR_API_KEY` | Push results to dashboard | No (optional) |
| `VESANOR_PROVIDER_KEY` | API key for live providers | Only for Lane B |
| `VESANOR_API_URL` | Override dashboard URL | No (default: `https://app.vesanor.com`) |
For recorded-only CI (Lane A), no environment variables are needed.
## Next steps
- Write contracts: See Writing Tests for the full YAML format
- Understand providers: See Providers for how provider abstraction works
- Debug failures: See Troubleshooting for common CI issues