Autonomous agents have one failure mode that scares people more than bad code: spend. An agent stuck in a loop can burn real money fast. So before we shipped anything autonomous, we shipped the brake.
Every task has a hard cap
Each task carries a maxCostUsd, and we track spend against it in real time as the task runs. The moment the next operation would cross that line, the task is killed: no PR is opened automatically, and the platform fee for that task isn't charged. You see the failure log and the last state.
This isn't a soft nudge or an after-the-fact email. It's a watchdog that stops the work. A runaway agent that bypasses its cap is, by our own definition, a P0 incident.
- Spend cap: hard
- Soft warning: 80%
- Grace period at cap: 0 ms
- Platform fee on kill: refunded
What happens to partial work
If a task hits its cap mid-edit, the sandbox is frozen and you get a complete log of everything the agent did up to that point. You can review the diff, decide what is usable, and open a PR yourself — or restart the task with a higher cap if the work was genuinely heading in the right direction.
Nothing is lost. Nothing auto-merges. The boundary is clean.
How the watchdog meters spend
- 01 Task starts: A maxCostUsd is bound to the task before any inference begins.
- 02 Meter: Every inference call, sandbox second, and storage op adds to the running total.
- 03 Check: After each operation, the total is checked against the cap.
- 04 Block: If the next op would exceed the cap, it is rejected and the agent receives a stop signal.
- 05 Report: You get the full log, the last diff, and the reason for termination.
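The five steps above can be sketched as a small pre-check meter. This is only an illustration under stated assumptions, not the platform's actual implementation: `SpendWatchdog`, `CapExceeded`, `charge`, `run_task`, and the per-operation costs are all hypothetical. Only `maxCostUsd` comes from the text, written `max_cost_usd` here to follow Python naming.

```python
from dataclasses import dataclass, field
from decimal import Decimal


class CapExceeded(Exception):
    """Raised when the next operation would push spend past the cap."""


@dataclass
class SpendWatchdog:
    # Hypothetical model of the watchdog; field names are illustrative.
    max_cost_usd: Decimal                     # step 01: bound before any work starts
    spent_usd: Decimal = Decimal("0")
    log: list = field(default_factory=list)   # step 05: kept for the final report

    def charge(self, op: str, cost_usd: Decimal) -> None:
        # Steps 03-04: check first, and reject the op that would cross the cap.
        if self.spent_usd + cost_usd > self.max_cost_usd:
            self.log.append((op, cost_usd, "rejected"))
            raise CapExceeded(f"{op} would exceed cap of ${self.max_cost_usd}")
        # Step 02: meter the accepted operation into the running total.
        self.spent_usd += cost_usd
        self.log.append((op, cost_usd, "ok"))


def run_task(watchdog, ops):
    """Run ops in order until they finish or the watchdog stops the task."""
    try:
        for op, cost in ops:
            watchdog.charge(op, cost)
    except CapExceeded as stop:
        return {"status": "killed", "reason": str(stop), "log": watchdog.log}
    return {"status": "done", "reason": None, "log": watchdog.log}


watchdog = SpendWatchdog(max_cost_usd=Decimal("1.00"))
result = run_task(watchdog, [
    ("inference", Decimal("0.60")),
    ("sandbox_second", Decimal("0.30")),
    ("inference", Decimal("0.60")),   # would bring the total to 1.50, so it is rejected
])
```

In this sketch the third operation is refused before it runs, so recorded spend stays at or below the cap and the log still shows which operation triggered the stop.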
Two layers, not one
- Per-task cap — the hard stop above. Defaults vary by plan; you can raise it for a specific task.
- Per-month plan budget — a soft warning at 80%, a hard limit at 100%. On bundled plans the hard limit pauses bundled inference for the rest of the cycle; drop in a BYOK key to keep going.
Two layers means a single weird task can't drain your month, and a busy month can't surprise you on the last day.
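The monthly layer can be sketched the same way. The function names, the state labels, and the choice to trigger the warning at exactly 80% (rather than just above it) are assumptions made for illustration; the text only states a soft warning at 80%, a hard limit at 100%, and that a BYOK key lets work continue.

```python
from decimal import Decimal

SOFT_WARNING_RATIO = Decimal("0.80")   # soft warning at 80% of the plan budget


def budget_state(spent_usd: Decimal, monthly_budget_usd: Decimal) -> str:
    """Classify month-to-date spend as 'ok', 'warn', or 'paused' (hypothetical labels)."""
    if spent_usd >= monthly_budget_usd:
        return "paused"   # hard limit at 100%
    if spent_usd >= monthly_budget_usd * SOFT_WARNING_RATIO:
        return "warn"
    return "ok"


def may_start_task(spent_usd, monthly_budget_usd, has_byok_key=False):
    """Bundled inference pauses at the hard limit; a BYOK key keeps tasks running."""
    return budget_state(spent_usd, monthly_budget_usd) != "paused" or has_byok_key
```

A per-task check like the watchdog would still apply to every task that `may_start_task` lets through, which is what keeps the two layers independent.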
How the watchdog works
Spend is metered on every inference call, every sandbox second, and every storage operation. The running total is checked after each operation. If the next operation would push the task over its cap, the operation is rejected and the agent receives a stop signal.
There is no race condition where two expensive calls slip through. The spend counter is held behind a single serialized, blocking check, so two expensive operations cannot race past the cap at the same time: one is rejected, the agent stops, and the task ends.
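One common way to get a serialized check is to perform the comparison and the update inside a single lock. The sketch below assumes an in-process `threading.Lock`; the real system may serialize differently (for example in a database or a single-writer service), and `SerializedMeter` and `try_charge` are hypothetical names.

```python
import threading
from decimal import Decimal


class SerializedMeter:
    """Hypothetical meter: check-and-add happens under one lock."""

    def __init__(self, max_cost_usd: Decimal):
        self.max_cost_usd = max_cost_usd
        self.spent_usd = Decimal("0")
        self._lock = threading.Lock()

    def try_charge(self, cost_usd: Decimal) -> bool:
        # The check and the update form one critical section, so a second
        # caller can never act on a stale total between them.
        with self._lock:
            if self.spent_usd + cost_usd > self.max_cost_usd:
                return False
            self.spent_usd += cost_usd
            return True


meter = SerializedMeter(max_cost_usd=Decimal("1.00"))
results = []


def expensive_call():
    # Each call costs 0.70, so only one of the two can fit under a 1.00 cap.
    results.append(meter.try_charge(Decimal("0.70")))


threads = [threading.Thread(target=expensive_call) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Whichever thread takes the lock first is accepted; the other sees the updated total and is rejected, so the final spend is 0.70 rather than 1.40.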
Why caps make the rest possible
Predictable spend is what lets us offer flat-rate plans at all. If a single task could cost anything, pricing would have to assume the worst. The cap is what turns "an agent ran for a while" into a bounded, reviewable line item — which is the whole point of a hosted runtime over a script you babysit.
Bounded, reviewable, predictable. If a task's cost is a random variable, flat-rate pricing is a lie. The cap is what makes the promise real.
Details, including what counts as a task and what's never metered, are in costs and limits.