
Session Limits

Session limits protect you from runaway agents. They cap total steps, cost, tool calls, and detect loops — catching the $47K recursive loop problem before it starts.


The problem

Four LangChain agents looped for 11 days. Each individual call completed in under 200ms and stayed within token limits. Monitoring said "SYSTEM NOMINAL." Total cost: $47,000.

Per-call validation can't catch loops. Each call is valid. The problem is the cumulative count — and no one was counting.


Defining session limits

Add session_limits to your session.yaml:

# session.yaml
schema_version: "1.0"
agent: my-agent

session_limits:
  max_steps: 20 # Max LLM calls
  max_tool_calls: 50 # Max tool calls across all steps
  max_cost_per_session: 10.00 # Dollar cap
  max_calls_per_tool:
    issue_refund: 1 # Only 1 refund per session
    send_email: 5 # Max 5 emails
  loop_detection:
    window: 5 # Look at last 5 steps
    threshold: 3 # Same tool+args 3x = blocked
  circuit_breaker:
    consecutive_blocks: 5 # Auto-kill after 5 consecutive blocks
    consecutive_errors: 3 # Auto-kill after 3 consecutive errors

session.yaml must compile successfully

Session limits defined in session.yaml take effect only if the file compiles successfully. If your session.yaml has a compilation error (e.g., invalid phases or undeclared transition targets), the limits are not silently ignored: the session is blocked and create() throws ReplayConfigError.

If you see replay_compile_error in your diagnostics, fix the compilation error first. Limits, phases, loop detection, and policy all depend on successful compilation.


Step and tool call limits

max_steps

Maximum number of LLM calls in the session. Checked before each call (Stage 2). If exceeded, the call is blocked with session_limit_exceeded.

session_limits:
  max_steps: 20

Counting: Uses totalStepCount which is monotonic — it counts committed steps and never decreases, even if older steps are evicted from memory for long sessions.
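A minimal sketch of why a monotonic counter matters, assuming a bounded in-memory buffer of recent steps. The StepLog class, its maxRetained parameter, and stepLimitReached are hypothetical names for illustration, not the engine's internals; only totalStepCount and the >= comparison come from the text above.

```typescript
// Hypothetical sketch: a bounded step buffer whose total count never decreases.
interface Step {
  id: number;
}

class StepLog {
  totalStepCount = 0; // monotonic: incremented on commit, never decremented
  private recent: Step[] = [];
  private maxRetained: number;

  constructor(maxRetained: number) {
    this.maxRetained = maxRetained;
  }

  commit(step: Step): void {
    this.totalStepCount += 1;
    this.recent.push(step);
    // Evict the oldest steps from memory; the counter is unaffected.
    while (this.recent.length > this.maxRetained) {
      this.recent.shift();
    }
  }

  get retained(): number {
    return this.recent.length;
  }
}

// The limit compares against the counter, not the number of steps still in memory.
function stepLimitReached(log: StepLog, maxSteps: number): boolean {
  return log.totalStepCount >= maxSteps;
}

const stepLog = new StepLog(3);
for (let i = 0; i < 20; i++) {
  stepLog.commit({ id: i });
}
const reached = stepLimitReached(stepLog, 20); // true, although only 3 steps are retained
```

If the limit were checked against the retained buffer instead, a long session could evict its way back under max_steps.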

max_tool_calls

Maximum total tool calls across all steps. A single LLM response can contain multiple tool calls — this counts each one.

session_limits:
  max_tool_calls: 50

max_tool_calls_mode

Controls what happens when max_tool_calls is reached. Default is "block".

| Mode | Behavior |
| --- | --- |
| block | Hard-block the session. No more LLM calls. (default) |
| narrow | Narrow the tool set to only tools with remaining max_calls_per_tool budget. If none remain, block. |

session_limits:
  max_tool_calls: 15
  max_tool_calls_mode: narrow
  max_calls_per_tool:
    collect_forensic_image: 3
    containment_scan: 2

When to use narrow: Multi-phase workflows where the LLM fires multiple tools per turn, consuming the global budget faster than expected. Without narrow, tools reserved for later phases (like forensic collection) become unreachable once the global cap is hit — even if they've never been called.

How it works:

  • When totalToolCalls >= max_tool_calls, instead of blocking, the engine filters the visible tool set to only tools that have an explicit max_calls_per_tool entry with remaining budget
  • Tools without a max_calls_per_tool entry are excluded (no explicit budget = not reachable past the cap)
  • Stage 3 per-tool limits still apply after the LLM responds
  • If max_steps or max_cost_per_session is also exceeded, the session blocks regardless of mode

max_tool_calls becomes a soft cap in narrow mode

In narrow mode, total tool calls can exceed max_tool_calls by the sum of remaining per-tool budgets. In the example above, the real ceiling is 15 + 3 + 2 = 20 in the worst case. Set max_tool_calls with this overshoot in mind.
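The narrowing rule and the overshoot arithmetic can be sketched as below. This is an illustration under assumptions, not the engine's code: narrowTools, worstCaseCeiling, and the search_logs tool are hypothetical names, and the per-tool counts are passed in as plain objects.

```typescript
type Budgets = { [tool: string]: number | undefined };

// Tools still visible once the global cap is hit in narrow mode:
// only tools with an explicit per-tool entry and budget left.
function narrowTools(tools: string[], maxCallsPerTool: Budgets, toolCallCounts: Budgets): string[] {
  return tools.filter((name) => {
    const limit = maxCallsPerTool[name];
    if (limit === undefined) return false; // no explicit budget: excluded past the cap
    return (toolCallCounts[name] ?? 0) < limit;
  });
}

// Worst-case total: the cap plus every per-tool budget still unspent at the cap.
function worstCaseCeiling(maxToolCalls: number, maxCallsPerTool: Budgets, toolCallCounts: Budgets = {}): number {
  let remaining = 0;
  for (const name of Object.keys(maxCallsPerTool)) {
    const limit = maxCallsPerTool[name] ?? 0;
    remaining += Math.max(0, limit - (toolCallCounts[name] ?? 0));
  }
  return maxToolCalls + remaining;
}

const perTool = { collect_forensic_image: 3, containment_scan: 2 };

// containment_scan has used its budget; search_logs (hypothetical) has no entry.
const visible = narrowTools(
  ["search_logs", "collect_forensic_image", "containment_scan"],
  perTool,
  { containment_scan: 2 },
);
// visible is ["collect_forensic_image"]

const ceiling = worstCaseCeiling(15, perTool); // 15 + 3 + 2 = 20
```

If narrowTools returns an empty list, the text above says the session blocks, so the ceiling is always finite.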

max_calls_per_tool

Per-tool call limits. Different tools can have different limits.

session_limits:
  max_calls_per_tool:
    issue_refund: 1 # At most 1 refund allowed
    send_email: 5 # Max 5 emails
    search_orders: 10 # Max 10 searches

Tools not listed have no per-tool limit. The issue_refund: 1 limit is a common pattern for non-idempotent operations that must run at most once per session (complements forbids_after).
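As a rough sketch of a per-tool check, the function below reports which tools in one LLM response would go over their limit. The name overLimitTools, the lookup_customer tool, and the choice to count calls within the same response are assumptions for illustration; the source only states that unlisted tools are unlimited and that per-tool limits are checked after the LLM responds.

```typescript
type Counts = { [tool: string]: number | undefined };

// Returns the tools whose per-tool limit would be exceeded if every call in
// `proposed` (the tool names in one LLM response) were executed.
function overLimitTools(proposed: string[], maxCallsPerTool: Counts, toolCallCounts: Counts): string[] {
  const pending: Counts = {};
  const over: string[] = [];
  for (const name of proposed) {
    const limit = maxCallsPerTool[name];
    if (limit === undefined) continue; // unlisted tools have no per-tool limit
    const used = (toolCallCounts[name] ?? 0) + (pending[name] ?? 0) + 1;
    pending[name] = (pending[name] ?? 0) + 1;
    if (used > limit && over.indexOf(name) === -1) over.push(name);
  }
  return over;
}

const toolLimits = { issue_refund: 1, send_email: 5, search_orders: 10 };

const firstRefund = overLimitTools(["issue_refund"], toolLimits, {}); // []
const secondRefund = overLimitTools(["issue_refund"], toolLimits, { issue_refund: 1 }); // ["issue_refund"]
const unlisted = overLimitTools(["lookup_customer"], toolLimits, {}); // [] (hypothetical unlisted tool)
```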


Cost limits

max_cost_per_session

Dollar cap for the entire session. Computed from token usage reported by the provider.

session_limits:
  max_cost_per_session: 10.00

How it works:

  • Cost is tracked in actualCost — updated immediately after every LLM call, including blocked and retried calls
  • This is a soft cap — the call that pushes past the threshold already ran (it was billed). The next call is blocked.
  • actualCost includes all calls. totalCost only includes committed steps. Limits check actualCost.

Check it at runtime:

const state = session.getState();
console.log(`Cost so far: $${state.actualCost.toFixed(4)}`);
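The difference between actualCost and totalCost, and why the cap is soft, can be sketched with a small tracker. CostTracker, record, and costLimitReached are hypothetical names; the dollar amounts are made up for the example.

```typescript
class CostTracker {
  actualCost = 0; // every LLM call, including blocked and retried ones
  totalCost = 0; // committed steps only

  // Record a finished LLM call. `committed` is false for blocked or retried calls.
  record(costUsd: number, committed: boolean): void {
    this.actualCost += costUsd;
    if (committed) this.totalCost += costUsd;
  }

  // Pre-check for the next call: blocks once actualCost has reached the cap.
  costLimitReached(maxCostPerSession: number): boolean {
    return this.actualCost >= maxCostPerSession;
  }
}

const tracker = new CostTracker();
tracker.record(6, true); // committed step
tracker.record(3, false); // blocked call, still billed
const beforeOvershoot = tracker.costLimitReached(10); // false: 9 < 10, so the next call runs
tracker.record(2.5, true); // this call pushes actualCost to 11.5 and has already been billed
const afterOvershoot = tracker.costLimitReached(10); // true: the following call is blocked
```

Here the session ends with actualCost at 11.5 against a cap of 10, which is the overshoot a soft cap allows.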

Loop detection

Catches repeated identical calls that individually look healthy.

session_limits:
  loop_detection:
    window: 5 # Look at the last 5 steps
    threshold: 3 # If same (tool, arguments) appears 3 times → block

How it works:

  1. After each LLM call, extract (tool_name, arguments_hash) tuples
  2. Look at the last window steps
  3. Count occurrences of each tuple
  4. If any tuple appears threshold or more times → block with loop_detected

Example: Agent calls search_orders({ query: "pending" }) three times in a row. Each call is valid. But the third call is blocked because the same tool with the same arguments appeared 3 times in the last 5 steps.
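The four steps above can be sketched as follows. This is an illustration, not the engine's implementation: detectLoop and callKey are hypothetical names, and a JSON string stands in for arguments_hash, which assumes equal arguments always serialize with the same key order. The sketch also assumes window is at least 1.

```typescript
interface ToolCall {
  name: string;
  args: unknown;
}

// Stand-in for (tool_name, arguments_hash): the tool name plus its JSON-encoded arguments.
function callKey(call: ToolCall): string {
  return call.name + ":" + JSON.stringify(call.args);
}

// steps[i] holds the tool calls made in step i, oldest first.
// Returns the repeated key if a loop is detected, otherwise null.
function detectLoop(steps: ToolCall[][], window: number, threshold: number): string | null {
  const counts = new Map<string, number>();
  for (const step of steps.slice(-window)) {
    for (const call of step) {
      const key = callKey(call);
      const n = (counts.get(key) ?? 0) + 1;
      if (n >= threshold) return key; // would be blocked with loop_detected
      counts.set(key, n);
    }
  }
  return null;
}

const pending: ToolCall = { name: "search_orders", args: { query: "pending" } };
const shipped: ToolCall = { name: "search_orders", args: { query: "shipped" } };

const looping = detectLoop([[pending], [pending], [pending]], 5, 3); // non-null: third identical call
const varied = detectLoop([[pending], [shipped], [pending]], 5, 3); // null: arguments differ
```

The second call shows the first limitation listed below: changing the arguments changes the key, so the repeat goes uncounted.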

What it catches:

  • Recursive retry loops (the $47K incident)
  • Stuck agents repeating the same action
  • Models that ignore "no results found" and keep searching

What it doesn't catch:

  • Loops with slightly different arguments (different hash)
  • Loops with different tool names (different tuple)
  • Slow loops outside the window

Circuit breaker

Auto-kills the session after too many consecutive failures. Prevents cascading failure when something is fundamentally wrong.

session_limits:
  circuit_breaker:
    consecutive_blocks: 5 # 5 blocked calls in a row → auto-kill
    consecutive_errors: 3 # 3 internal errors in a row → auto-kill

How it works:

  • consecutiveBlockCount increments on each blocked call, resets to 0 on any non-block
  • consecutiveErrorCount increments on each internal error, resets to 0 on any non-error
  • When either threshold is hit, session.kill() is called automatically
  • After auto-kill, all subsequent calls throw ReplayKillError

Why this matters: Without a circuit breaker, a misconfigured agent can burn through your LLM budget retrying blocked calls forever. The circuit breaker stops it.
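A minimal sketch of the counter logic, assuming each call ends in one of three outcomes. The CircuitBreaker class and the Outcome type are hypothetical; the killed flag stands in for the automatic session.kill() described above.

```typescript
type Outcome = "allow" | "block" | "error";

class CircuitBreaker {
  consecutiveBlockCount = 0;
  consecutiveErrorCount = 0;
  killed = false;
  private maxBlocks: number;
  private maxErrors: number;

  constructor(maxBlocks: number, maxErrors: number) {
    this.maxBlocks = maxBlocks;
    this.maxErrors = maxErrors;
  }

  record(outcome: Outcome): void {
    // Each counter grows on its own outcome and resets to 0 on anything else.
    this.consecutiveBlockCount = outcome === "block" ? this.consecutiveBlockCount + 1 : 0;
    this.consecutiveErrorCount = outcome === "error" ? this.consecutiveErrorCount + 1 : 0;
    if (this.consecutiveBlockCount >= this.maxBlocks || this.consecutiveErrorCount >= this.maxErrors) {
      this.killed = true; // stands in for the automatic session.kill()
    }
  }
}

const breaker = new CircuitBreaker(5, 3);
for (let i = 0; i < 4; i++) breaker.record("block");
breaker.record("allow"); // resets the block streak
for (let i = 0; i < 4; i++) breaker.record("block");
const killedAfterFour = breaker.killed; // false: the streak is 4, not 5
breaker.record("block");
const killedAfterFive = breaker.killed; // true: 5 consecutive blocks
```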


How limits interact

Limits are checked in this order. Checks 1 to 4 run during Stage 2 (Pre-check), before the LLM call; checks 5 to 7 run later, as noted:

  1. Kill check: is the session already killed?
  2. Step limit: totalStepCount >= max_steps?
  3. Tool call limit: totalToolCalls >= max_tool_calls? (if max_tool_calls_mode: narrow, narrows instead of blocking)
  4. Cost limit: actualCost >= max_cost_per_session?
  5. Per-tool limit: checked after the LLM response, before the gate
  6. Loop detection: checked after the LLM response, before the gate
  7. Circuit breaker: checked after each decision outcome

If any check fails, the call is blocked with session_limit_exceeded (or loop_detected). The exception is max_tool_calls_mode: narrow, which narrows the tool set instead of blocking (see max_tool_calls_mode).
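The pre-call portion of that ordering (kill, step, tool call, and cost checks) might look like the sketch below. The preCheck function, its return values, and the Limits and SessionState shapes are assumptions for illustration; the field names mirror the ones used on this page.

```typescript
interface Limits {
  max_steps?: number;
  max_tool_calls?: number;
  max_tool_calls_mode?: "block" | "narrow";
  max_cost_per_session?: number;
}

interface SessionState {
  killed: boolean;
  totalStepCount: number;
  totalToolCalls: number;
  actualCost: number;
}

type PreCheckResult = "killed" | "session_limit_exceeded" | "narrow" | "allow";

function preCheck(state: SessionState, limits: Limits): PreCheckResult {
  if (state.killed) return "killed";
  if (limits.max_steps !== undefined && state.totalStepCount >= limits.max_steps) {
    return "session_limit_exceeded";
  }
  let narrowed = false;
  if (limits.max_tool_calls !== undefined && state.totalToolCalls >= limits.max_tool_calls) {
    if (limits.max_tool_calls_mode !== "narrow") return "session_limit_exceeded";
    narrowed = true; // keep checking: the cost limit still blocks regardless of mode
  }
  if (limits.max_cost_per_session !== undefined && state.actualCost >= limits.max_cost_per_session) {
    return "session_limit_exceeded";
  }
  return narrowed ? "narrow" : "allow";
}

const narrowLimits: Limits = { max_steps: 20, max_tool_calls: 15, max_tool_calls_mode: "narrow", max_cost_per_session: 10 };
const atToolCap: SessionState = { killed: false, totalStepCount: 3, totalToolCalls: 15, actualCost: 0.4 };

const narrowedResult = preCheck(atToolCap, narrowLimits); // "narrow"
const overCostResult = preCheck({ ...atToolCap, actualCost: 10 }, narrowLimits); // "session_limit_exceeded"
```

Per-tool limits, loop detection, and the circuit breaker are left out because they run after the LLM responds.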


Checking limits at runtime

const state = session.getState();

console.log("Steps:", state.totalStepCount); // Committed steps
console.log("Tool calls:", state.totalToolCalls); // Total tool calls
console.log("Cost:", state.actualCost); // All LLM calls (including blocked)
console.log("Per-tool:", state.toolCallCounts); // { issue_refund: 1, search: 3 }
console.log("Blocks:", state.totalBlockCount); // Total blocked calls
console.log("Consecutive blocks:", state.consecutiveBlockCount);
console.log("Killed:", state.killed); // true if auto-killed

Recommended starting limits for most agents:

session_limits:
  max_steps: 20
  max_tool_calls: 50
  max_cost_per_session: 5.00
  loop_detection:
    window: 5
    threshold: 3
  circuit_breaker:
    consecutive_blocks: 5
    consecutive_errors: 3

For high-risk agents (payments, infrastructure):

session_limits:
  max_steps: 10
  max_tool_calls: 15
  max_cost_per_session: 1.00
  max_calls_per_tool:
    process_payment: 1
    delete_resource: 1
  loop_detection:
    window: 3
    threshold: 2
  circuit_breaker:
    consecutive_blocks: 3
    consecutive_errors: 2
