
Workflow Governance

When your system uses multiple agents that hand off work to each other, single-session governance isn't enough. Workflow governance coordinates multiple sessions under one durable workflow_id with explicit handoffs, shared resource protection, and cross-session budget limits.


When you need workflow governance

Single-session replay() is sufficient when:

  • One agent handles the entire task
  • No delegation to other agents
  • No shared resources between concurrent processes

Workflow governance is needed when:

  • An orchestrator delegates tasks to specialist agents
  • Multiple agents might act on the same resource (same order, same deployment)
  • You need to kill an entire agent tree at once
  • Cross-agent budgets matter (total cost across all agents)

How it works

The model

workflow.yaml                        ← defines roles, handoffs, limits, shared resources
├── orchestrator/
│   └── session.yaml + contracts/    ← root session (v1-v3 contracts)
├── code-scanner/
│   └── session.yaml + contracts/    ← child session
├── risk-analyst/
│   └── session.yaml + contracts/    ← child session
└── release-manager/
    └── session.yaml + contracts/    ← child session

Each role has its own session.yaml and per-tool contracts — standard v1-v3 enforcement. The workflow.yaml adds coordination above sessions.

Key principle: The session remains the unit of authority. Each session is individually correct. The workflow governs coordination between them — it does not merge mutable state across agents.


workflow.yaml

schema_version: "1.0"
workflow: code-review-pipeline

roles:
  - name: orchestrator
    session_contract: packs/orchestrator/session.yaml
  - name: code-scanner
    session_contract: packs/code-scanner/session.yaml
  - name: risk-analyst
    session_contract: packs/risk-analyst/session.yaml
  - name: release-manager
    session_contract: packs/release-manager/session.yaml

handoffs:
  - from: orchestrator
    to: code-scanner
  - from: orchestrator
    to: risk-analyst
  - from: code-scanner
    to: release-manager
  - from: risk-analyst
    to: release-manager

workflow_limits:
  max_sessions: 8
  max_active_sessions: 4
  max_total_steps: 100
  max_total_cost: 25.00
  max_open_handoffs: 6

shared_resources:
  - alias: change_request
    mode: single_writer
  - alias: service_env
    mode: exclusive_pending
  - alias: tenant_migration
    mode: serial_only

cancellation:
  subtree_kill: true
  workflow_kill: operator_only
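
Because handoffs refer to roles by name, a typo in a role name is easy to miss. The following is a minimal sketch of a pre-deployment check, assuming that every handoff must name a declared role. The `WorkflowConfig` types and `validateHandoffs` are hypothetical helpers written for illustration; they are not part of the library.

```typescript
// Hypothetical types mirroring the roles and handoffs fields of workflow.yaml.
interface Role { name: string; session_contract: string; }
interface HandoffEdge { from: string; to: string; }
interface WorkflowConfig { roles: Role[]; handoffs: HandoffEdge[]; }

// Returns one message per handoff endpoint that names an undeclared role.
function validateHandoffs(config: WorkflowConfig): string[] {
  const declared = config.roles.map((role) => role.name);
  const errors: string[] = [];
  for (const edge of config.handoffs) {
    if (declared.indexOf(edge.from) === -1) errors.push(`unknown "from" role: ${edge.from}`);
    if (declared.indexOf(edge.to) === -1) errors.push(`unknown "to" role: ${edge.to}`);
  }
  return errors;
}

const pipeline: WorkflowConfig = {
  roles: [
    { name: "orchestrator", session_contract: "packs/orchestrator/session.yaml" },
    { name: "code-scanner", session_contract: "packs/code-scanner/session.yaml" },
  ],
  handoffs: [
    { from: "orchestrator", to: "code-scanner" },
    { from: "code-scanner", to: "release-manager" }, // role not declared above
  ],
};

const handoffErrors = validateHandoffs(pipeline);
```

Here `handoffErrors` holds a single entry for the undeclared `release-manager` role.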

Roles and handoffs

Creating a workflow (root session)

The orchestrator creates the workflow by starting a root session:

const session = replay(client, {
  contractsDir: "packs/orchestrator/contracts",
  agent: "orchestrator",
  mode: "enforce",
  apiKey: process.env.VESANOR_API_KEY,
  workflow: {
    type: "root",
    role: "orchestrator",
    // workflowId auto-generated if omitted
  },
});

Offering a handoff

After completing some work, the orchestrator offers a handoff to a child role:

const ticket = await session.handoff({
  toRole: "code-scanner",
  handoffId: "handoff-pr42-scan",
  summary: { task: "Review PR #42", priority: "high" },
});
// ticket.handoffId — use this to attach the child

Child session claims the handoff

A new agent process creates a child session that claims the handoff:

const childSession = replay(childClient, {
  contractsDir: "packs/code-scanner/contracts",
  agent: "code-scanner",
  mode: "enforce",
  apiKey: process.env.VESANOR_API_KEY,
  workflow: {
    type: "child",
    workflowId: ticket.workflowId,
    role: "code-scanner",
    parentSessionId: ticket.parentSessionId,
    handoffId: ticket.handoffId,
  },
});

Single-claim semantics: Once one child claims a handoff, competing claims fail. No accidental parallel execution.
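
To make the single-claim rule concrete, here is a small in-memory model of it. It is a sketch only: `HandoffClaims` and `ClaimResult` are hypothetical names, and the real enforcement presumably happens durably on the service side rather than in process memory.

```typescript
// Hypothetical in-memory model of single-claim semantics.
type ClaimResult = { ok: true } | { ok: false; reason: string };

class HandoffClaims {
  // Null-prototype object so arbitrary ids never collide with built-in keys.
  private owners: Record<string, string> = Object.create(null);

  // The first caller wins; any later claim on the same handoffId is rejected.
  claim(handoffId: string, sessionId: string): ClaimResult {
    const owner = this.owners[handoffId];
    if (owner !== undefined) {
      return { ok: false, reason: `already claimed by ${owner}` };
    }
    this.owners[handoffId] = sessionId;
    return { ok: true };
  }
}

const claims = new HandoffClaims();
const firstClaim = claims.claim("handoff-pr42-scan", "session-a");  // succeeds
const secondClaim = claims.claim("handoff-pr42-scan", "session-b"); // rejected
```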


Handoff lifecycle

offered → claimed → in_progress → completed

Status        Meaning
offered       Parent offered the handoff, waiting for a child to claim it
claimed       A child session attached and took ownership
in_progress   Child produced its first authoritative committed step
completed     Child session finished and all conditions met

Reclaim

If a child claims a handoff but doesn't make progress (idle or stuck), the handoff can be reclaimed:

offered → claimed → [no progress] → offered (generation bumped)

Reclaim fails after progress. Once the child has made authoritative progress (in_progress), the handoff can't be reclaimed — the child owns it.
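
The lifecycle and the reclaim rule can be read as a small state machine. The sketch below models it with hypothetical `advance` and `reclaim` functions; the names and the `HandoffState` shape are assumptions for illustration, not the library's API.

```typescript
type HandoffStatus = "offered" | "claimed" | "in_progress" | "completed";

interface HandoffState {
  status: HandoffStatus;
  generation: number; // bumped each time the handoff is re-offered
}

// Forward transitions follow offered → claimed → in_progress → completed.
const nextStatus: Record<HandoffStatus, HandoffStatus | null> = {
  offered: "claimed",
  claimed: "in_progress",
  in_progress: "completed",
  completed: null,
};

function advance(h: HandoffState): HandoffState {
  const next = nextStatus[h.status];
  if (next === null) throw new Error("handoff is already completed");
  return { status: next, generation: h.generation };
}

// Reclaim is only legal while the child has claimed but not yet progressed.
function reclaim(h: HandoffState): HandoffState {
  if (h.status !== "claimed") {
    throw new Error(`cannot reclaim a handoff in status "${h.status}"`);
  }
  return { status: "offered", generation: h.generation + 1 };
}

const claimedHandoff = advance({ status: "offered", generation: 1 });
const reoffered = reclaim(claimedHandoff); // back to offered, generation 2
```

Calling `reclaim` on a handoff that has reached `in_progress` throws, which mirrors the rule that the child owns the handoff once it has made authoritative progress.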


Shared resources

Shared resources prevent conflicts when multiple sessions act on the same entity (same order, same deployment environment, same database migration).

exclusive_pending

While one session has an unresolved pending step on a resource, no other session can open a conflicting step on the same resource.

shared_resources:
  - alias: service_env
    mode: exclusive_pending

Example: Release manager starts deploying to staging. While that deployment is pending, no other session can start a deployment to the same staging environment. Once the deployment resolves (succeeds, fails, or is discarded), the lock is released automatically.

single_writer

At most one session can be the mutating owner of a resource value at a time.

shared_resources:
  - alias: change_request
    mode: single_writer

Example: Two release managers try to stage the same change request. The first proposal succeeds. The second is rejected with WORKFLOW_RESOURCE_CONFLICT — only one session can write to that change request.

serial_only

Multiple sessions may act on a resource, but only one authoritative step can commit at a time. Re-checked at commit time under concurrency control.

shared_resources:
  - alias: tenant_migration
    mode: serial_only

Example: Two database migrators work on the same migration. Both can plan and prepare. But only one can commit at a time — the second must wait until the first's step is fully resolved.
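
The three modes differ mainly in when contention is checked. The sketch below is a deliberately simplified in-memory model, not the service's implementation. `ResourceTable` is hypothetical, it treats exclusive_pending and single_writer identically at proposal time, and it reuses WORKFLOW_RESOURCE_CONFLICT for every rejection, which the text above only confirms for single_writer.

```typescript
type Mode = "exclusive_pending" | "single_writer" | "serial_only";
type Phase = "propose" | "commit";
type Decision =
  | { allowed: true }
  | { allowed: false; code: "WORKFLOW_RESOURCE_CONFLICT" };

class ResourceTable {
  private modes: Record<string, Mode>;
  // Maps "alias=value" to the session currently holding that resource.
  private holders: Record<string, string> = Object.create(null);

  constructor(modes: Record<string, Mode>) {
    this.modes = modes;
  }

  check(alias: string, value: string, sessionId: string, phase: Phase): Decision {
    // serial_only lets every session plan; contention is checked at commit time.
    if (this.modes[alias] === "serial_only" && phase === "propose") {
      return { allowed: true };
    }
    const key = `${alias}=${value}`;
    const holder = this.holders[key];
    if (holder !== undefined && holder !== sessionId) {
      return { allowed: false, code: "WORKFLOW_RESOURCE_CONFLICT" };
    }
    this.holders[key] = sessionId;
    return { allowed: true };
  }

  // Called when the holder's step resolves (succeeds, fails, or is discarded).
  release(alias: string, value: string, sessionId: string): void {
    const key = `${alias}=${value}`;
    if (this.holders[key] === sessionId) delete this.holders[key];
  }
}

const table = new ResourceTable({
  service_env: "exclusive_pending",
  tenant_migration: "serial_only",
});
const deployA = table.check("service_env", "staging", "session-a", "propose");
const deployB = table.check("service_env", "staging", "session-b", "propose");
```

In this model `deployA` is allowed and `deployB` is rejected until `session-a` releases the staging environment.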

How resources are matched

Resources are matched by alias + normalized value. The value is extracted from tool call arguments using resource definitions in the session contract:

# In session.yaml for release-manager role
resources:
  change_request:
    type: change_request
    extract_from:
      path: "$.change_id"
  service_env:
    type: environment
    extract_from:
      path: "$.environment"

When a tool call proposes { change_id: "CHG-123", environment: "staging" }, the pipeline extracts change_request=CHG-123 and service_env=staging, then checks for conflicts with other sessions in the workflow.
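
The extraction step can be sketched as follows. This is an illustration under stated assumptions: `extractValue` and `resourceKeys` are hypothetical helpers, only simple `$.a.b` paths are resolved, and normalization is assumed to be whitespace trimming because the exact normalization rules are not described here.

```typescript
// Hypothetical shape mirroring one entry under resources: in session.yaml.
interface ResourceDef { type: string; extract_from: { path: string }; }

// Resolves only simple "$.a.b" paths; full JSONPath is out of scope here.
function extractValue(args: Record<string, unknown>, path: string): string | undefined {
  if (path.indexOf("$.") !== 0) return undefined;
  let current: unknown = args;
  for (const key of path.slice(2).split(".")) {
    if (typeof current !== "object" || current === null) return undefined;
    current = (current as Record<string, unknown>)[key];
  }
  // Assumed normalization: trim surrounding whitespace, keep case as-is.
  return typeof current === "string" ? current.trim() : undefined;
}

// Builds "alias=value" keys for every resource found in the tool arguments.
function resourceKeys(
  defs: Record<string, ResourceDef>,
  args: Record<string, unknown>,
): string[] {
  const keys: string[] = [];
  for (const alias of Object.keys(defs)) {
    const value = extractValue(args, defs[alias].extract_from.path);
    if (value !== undefined) keys.push(`${alias}=${value}`);
  }
  return keys;
}

const resourceDefs: Record<string, ResourceDef> = {
  change_request: { type: "change_request", extract_from: { path: "$.change_id" } },
  service_env: { type: "environment", extract_from: { path: "$.environment" } },
};
const extracted = resourceKeys(resourceDefs, { change_id: "CHG-123", environment: "staging" });
```

For the arguments above, `extracted` contains `change_request=CHG-123` and `service_env=staging`, which are the keys that would be compared against other sessions in the workflow.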


Workflow limits

Budgets that apply across all sessions in the workflow:

workflow_limits:
  max_sessions: 8           # Total sessions ever created
  max_active_sessions: 4    # Concurrent active sessions
  max_total_steps: 100      # Steps across all sessions
  max_total_cost: 25.00     # Cost across all sessions
  max_open_handoffs: 6      # Unresolved handoffs at any time

When a limit is exceeded, preflight is blocked with WORKFLOW_BUDGET_EXCEEDED — before the LLM call.
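
A preflight budget check of this kind might look like the sketch below. `preflightBudget` and the `WorkflowUsage` shape are hypothetical, and the sketch assumes the check receives projected usage (current totals plus the step about to run) and blocks only when a projected total is strictly greater than its limit.

```typescript
interface WorkflowLimits {
  max_sessions: number;
  max_active_sessions: number;
  max_total_steps: number;
  max_total_cost: number;
  max_open_handoffs: number;
}

// Projected usage: current totals plus the step or session about to start.
interface WorkflowUsage {
  sessions: number;
  activeSessions: number;
  totalSteps: number;
  totalCost: number;
  openHandoffs: number;
}

type BudgetResult =
  | { ok: true }
  | { ok: false; code: "WORKFLOW_BUDGET_EXCEEDED"; exceeded: string[] };

function preflightBudget(limits: WorkflowLimits, usage: WorkflowUsage): BudgetResult {
  const checks: Array<[string, number, number]> = [
    ["max_sessions", usage.sessions, limits.max_sessions],
    ["max_active_sessions", usage.activeSessions, limits.max_active_sessions],
    ["max_total_steps", usage.totalSteps, limits.max_total_steps],
    ["max_total_cost", usage.totalCost, limits.max_total_cost],
    ["max_open_handoffs", usage.openHandoffs, limits.max_open_handoffs],
  ];
  const exceeded = checks.filter(([, used, max]) => used > max).map(([name]) => name);
  if (exceeded.length === 0) return { ok: true };
  return { ok: false, code: "WORKFLOW_BUDGET_EXCEEDED", exceeded };
}

const limits: WorkflowLimits = {
  max_sessions: 8,
  max_active_sessions: 4,
  max_total_steps: 100,
  max_total_cost: 25.0,
  max_open_handoffs: 6,
};
const budget = preflightBudget(limits, {
  sessions: 3, activeSessions: 2, totalSteps: 101, totalCost: 12.5, openHandoffs: 1,
});
```

Here `budget` is a rejection that lists `max_total_steps`, so the caller can stop before issuing the LLM call.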


Kill cascade

Session kill

Standard v1-v3 behavior — kills one session only.

Subtree kill

Kills a session and all its descendants:

orchestrator → code-scanner → release-manager
↑ killed (and all below)

Workflow kill

Kills every active session in the workflow. Rejects all future handoff claims.

Kill is durable. It's a control-plane event, not a best-effort signal. Killed sessions can't advance authoritative state. Future preflight/proposal/receipt requests are rejected. Workflow resource claims in the killed scope are released.
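
The subtree case amounts to a traversal of the parent/child session tree. A minimal sketch, assuming each session records a single parent; `SessionNode` and `killSubtree` are hypothetical and only model the cascade, not durability or resource release.

```typescript
interface SessionNode {
  id: string;
  parentId: string | null; // null for the root session
  killed: boolean;
}

// Marks the target session and all of its descendants as killed.
// Returns the ids that were newly killed, in breadth-first order.
function killSubtree(sessions: SessionNode[], rootId: string): string[] {
  const killed: string[] = [];
  const queue: string[] = [rootId];
  while (queue.length > 0) {
    const id = queue.shift() as string;
    for (const s of sessions) {
      if (s.id === id && !s.killed) {
        s.killed = true;
        killed.push(s.id);
      }
      if (s.parentId === id) queue.push(s.id);
    }
  }
  return killed;
}

const tree: SessionNode[] = [
  { id: "orchestrator", parentId: null, killed: false },
  { id: "code-scanner", parentId: "orchestrator", killed: false },
  { id: "risk-analyst", parentId: "orchestrator", killed: false },
  { id: "release-manager", parentId: "code-scanner", killed: false },
];
const killedIds = killSubtree(tree, "code-scanner");
```

In this example `killedIds` contains `code-scanner` and `release-manager`, while `orchestrator` and `risk-analyst` stay active. A workflow kill corresponds to running the same traversal from the root session.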


What child sessions inherit (and don't)

A child session starts fresh:

  • Empty session state (no steps, no forbidden tools)
  • Its own phase machine (from its own session.yaml)
  • Its own contracts

A child session does not inherit:

  • Parent's phase or step history
  • Parent's forbidden tools
  • Parent's loop counters or cost budget
  • Parent's session limits

Dependencies between parent and child are explicit — through handoff summaries, artifact references, or workflow-scoped resource bindings. Nothing is inherited implicitly.


Stale worker detection

If a workflow branch is reassigned (handoff reclaimed and re-offered), the original worker's session becomes stale. Any attempt by the stale worker to prepare requests, register resource claims, or commit steps fails with a stale-worker error.

This is tracked through generation numbers that increment on reassignment. Stale workers have an older generation than the current workflow state.
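
The generation check itself is a simple comparison. The sketch below uses hypothetical names (`WorkerLease`, `isStale`, `guardCommit`) to show the idea; the actual error type and message returned by the service are not specified here.

```typescript
// Hypothetical record of what a worker believes it owns.
interface WorkerLease {
  handoffId: string;
  generation: number; // generation at the time the worker claimed the handoff
}

// A worker is stale when the handoff has been re-offered since it claimed.
function isStale(lease: WorkerLease, currentGeneration: number): boolean {
  return lease.generation < currentGeneration;
}

// Guard to run before preparing requests, claiming resources, or committing.
function guardCommit(lease: WorkerLease, currentGeneration: number): void {
  if (isStale(lease, currentGeneration)) {
    throw new Error(
      `stale worker: lease generation ${lease.generation} < current ${currentGeneration}`,
    );
  }
}

const originalLease: WorkerLease = { handoffId: "handoff-pr42-scan", generation: 1 };
const staleAfterReclaim = isStale(originalLease, 2); // handoff was re-offered once
```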

