2026-05-29

designing the cost cap

A runaway agent that ignores its budget is a P0. Here's how we make sure it can't happen.

[Figure: a gauge with a hard spending limit and a watchdog. Current spend is $0.42 against a $2.00 max cost, metered in real time, with safe, warning, and stop zones. Caption: a per-task spend limit you set.]

Autonomous agents have one failure mode that scares people more than bad code: spend. An agent stuck in a loop can burn real money fast. So before we shipped anything autonomous, we shipped the brake.

Every task has a hard cap

Each task carries a maxCostUsd. As the task runs, we track spend against it in real time. When the next operation would push spend past that cap, the task is killed immediately: no PR is opened, and the platform fee for that task isn't charged. You see the failure log and the last state.

This isn't a soft nudge or an after-the-fact email. It's a watchdog that stops the work. A runaway agent that bypasses its cap is, by our own definition, a P0 incident.

  • Spend cap: hard
  • Soft warning: 80%
  • Grace period at cap: 0ms
  • Platform fee on kill: refunded

What happens to partial work

If a task hits its cap mid-edit, the sandbox is frozen and you get a complete log of everything the agent did up to that point. You can review the diff, decide what is usable, and open a PR yourself — or restart the task with a higher cap if the work was genuinely heading in the right direction.

Nothing is lost. Nothing auto-merges. The boundary is clean.

how the watchdog meters spend

  1. Task starts. A maxCostUsd is bound to the task before any inference begins.
  2. Meter. Every inference call, sandbox second, and storage op adds to the running total.
  3. Check. After each operation, the total is checked against the cap.
  4. Block. If the next op would exceed the cap, it is rejected and the agent receives a stop signal.
  5. Report. You get the full log, the last diff, and the reason for termination.
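The five steps above can be sketched in a few lines of Python. This is a minimal illustration, not the platform's implementation: the `TaskMeter` class, its `charge` and `report` methods, and the example operation costs are all hypothetical, and the "last diff" part of the report is left out.

```python
from dataclasses import dataclass, field
from decimal import Decimal
from typing import List, Optional, Tuple


@dataclass
class TaskMeter:
    """Illustrative per-task meter; all names here are assumptions."""

    max_cost_usd: Decimal  # step 1: bound before any inference begins
    spent_usd: Decimal = Decimal("0")
    log: List[Tuple[str, Decimal]] = field(default_factory=list)
    stop_reason: Optional[str] = None

    def charge(self, kind: str, cost_usd: Decimal) -> bool:
        """Steps 2-4: meter one operation, or reject it at the cap."""
        if self.stop_reason is not None:
            return False  # the task has already been stopped
        if self.spent_usd + cost_usd > self.max_cost_usd:
            # Step 4: the op that would exceed the cap is rejected.
            self.stop_reason = f"cap reached: {kind} costing ${cost_usd} rejected"
            return False
        self.spent_usd += cost_usd  # step 2: add to the running total
        self.log.append((kind, cost_usd))
        return True

    def report(self) -> dict:
        """Step 5: the log and the reason for termination."""
        return {
            "spent_usd": self.spent_usd,
            "max_cost_usd": self.max_cost_usd,
            "log": list(self.log),
            "stopped": self.stop_reason is not None,
            "reason": self.stop_reason,
        }


# Hypothetical run: the third operation would take spend to $2.03.
meter = TaskMeter(max_cost_usd=Decimal("2.00"))
operations = [
    ("inference", Decimal("0.42")),
    ("sandbox_second", Decimal("0.01")),
    ("inference", Decimal("1.60")),
    ("storage", Decimal("0.01")),
]
for kind, cost in operations:
    if not meter.charge(kind, cost):
        break  # stop signal: no further work is attempted

report = meter.report()
```

In this run the first two operations are metered, the third is rejected, and recorded spend stays at $0.43, under the $2.00 cap. `Decimal` is used so that small per-operation costs add up exactly.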

Two layers, not one

  • Per-task cap — the hard stop above. Defaults vary by plan; you can raise it for a specific task.
  • Per-month plan budget — a soft warning at 80%, a hard limit at 100%. On bundled plans the hard limit pauses bundled inference for the rest of the cycle; drop in a BYOK key to keep going.

Two layers means a single weird task can't drain your month, and a busy month can't surprise you on the last day.

two layers of protection

| per-task cap | per-month budget |
| --- | --- |
| Single task exceeds maxCostUsd | Monthly spend hits 100% |
| Task killed immediately | Bundled inference paused |
| Platform fee refunded | BYOK key keeps you running |
| Stop one runaway task | Stop a busy month from surprising you |
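The monthly layer can be sketched as a small state function. This is an illustration under stated assumptions: the names `BudgetState`, `monthly_state`, and `can_run_inference` are hypothetical, amounts are held in integer cents, and treating the 80% and 100% thresholds as inclusive is a guess rather than documented behavior.

```python
from enum import Enum


class BudgetState(Enum):
    OK = "ok"
    WARNING = "warning"  # soft warning at 80% of the monthly budget
    PAUSED = "paused"    # hard limit at 100%


def monthly_state(spent_cents: int, budget_cents: int) -> BudgetState:
    """Classify monthly spend; integer math avoids float rounding at the edges."""
    if spent_cents >= budget_cents:
        return BudgetState.PAUSED
    if spent_cents * 100 >= budget_cents * 80:
        return BudgetState.WARNING
    return BudgetState.OK


def can_run_inference(spent_cents: int, budget_cents: int, has_byok_key: bool) -> bool:
    """Bundled inference pauses at the hard limit; a BYOK key keeps tasks running."""
    if has_byok_key:
        return True
    return monthly_state(spent_cents, budget_cents) is not BudgetState.PAUSED
```

For a hypothetical $100.00 budget, `monthly_state(7_999, 10_000)` is `OK`, `monthly_state(8_000, 10_000)` is `WARNING`, and `monthly_state(10_000, 10_000)` is `PAUSED`. The per-task cap would still apply to each individual task regardless of this state.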

How the watchdog works

Spend is metered on every inference call, every sandbox second, and every storage operation. The running total is checked after each operation. If the next operation would push the task over its cap, the operation is rejected and the agent receives a stop signal.

There is no race condition where two expensive calls slip through. The cap is serialized and blocking.

serialized and blocking

The spend counter is held in a single serialized check. Two expensive operations cannot race past the cap simultaneously — one will be rejected, the agent will stop, and the task will end.
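One way to get a serialized, blocking check is to put the compare-and-add behind a single lock. The sketch below assumes an in-process counter guarded by `threading.Lock`; the `SerializedCap` class and `try_spend` method are hypothetical names, and a real multi-machine service would need an equivalent atomic operation in whatever store holds the counter.

```python
import threading


class SerializedCap:
    """Illustrative spend counter: check and increment happen under one lock."""

    def __init__(self, max_cost_cents: int) -> None:
        self._max_cost_cents = max_cost_cents
        self._spent_cents = 0
        self._stopped = False
        self._lock = threading.Lock()

    def try_spend(self, cost_cents: int) -> bool:
        """Atomically accept the cost, or reject it and stop the task."""
        with self._lock:  # callers block here, so checks run one at a time
            if self._stopped or self._spent_cents + cost_cents > self._max_cost_cents:
                self._stopped = True  # one rejection ends the task
                return False
            self._spent_cents += cost_cents
            return True

    @property
    def spent_cents(self) -> int:
        with self._lock:
            return self._spent_cents


# Eight concurrent calls of 150 cents each against a 200-cent cap.
cap = SerializedCap(max_cost_cents=200)
results = []


def worker() -> None:
    results.append(cap.try_spend(150))


threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

accepted = results.count(True)
```

Because the comparison and the increment sit inside the same critical section, only one of the eight calls can be accepted, whichever thread reaches the lock first. Checking the total outside the lock and incrementing afterwards would reopen the race.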

Why caps make the rest possible

Predictable spend is what lets us offer flat-rate plans at all. If a single task could cost anything, pricing would have to assume the worst. The cap is what turns "an agent ran for a while" into a bounded, reviewable line item — which is the whole point of a hosted runtime over a script you babysit.
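The bound is simple arithmetic: with a hard cap, the worst case for any number of tasks is the task count times the cap. The sketch below uses hypothetical figures (100 tasks at a $2.00 cap), and the function name is illustrative.

```python
from decimal import Decimal


def worst_case_spend_usd(task_count: int, max_cost_usd: Decimal) -> Decimal:
    """Upper bound on total spend when every task hits its hard cap."""
    return task_count * max_cost_usd


# Hypothetical month: 100 tasks, each capped at $2.00.
bound = worst_case_spend_usd(100, Decimal("2.00"))
```

Here `bound` comes to $200.00. Without a cap there is no finite value to put in place of `max_cost_usd`, which is why uncapped tasks would force pricing to assume the worst.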

Bounded, reviewable, predictable. If a task's cost is a random variable, flat-rate pricing is a lie. The cap is what makes the promise real.

how we think about pricing · product principle

Details, including what counts as a task and what's never metered, are in costs and limits.