Designing the cost cap

Autonomous agents have one failure mode that scares people more than bad code: spend. An agent stuck in a loop can burn real money fast. So before we shipped anything autonomous, we shipped the brake.

Every task has a hard cap

Each task carries a maxCostUsd. As the task runs, we track spend against it in real time. Cross the line and the task is killed immediately — no PR, and the platform fee for that task isn't charged. You see the failure log and the last state.

This isn't a soft nudge or an after-the-fact email. It's a watchdog that stops the work. A runaway agent that bypasses its cap is, by our own definition, a P0 incident.

At a glance: hard spend cap; 80% soft warning; 0 ms grace period at cap; platform fee refunded on kill.

What happens to partial work

If a task hits its cap mid-edit, the sandbox is frozen and you get a complete log of everything the agent did up to that point. You can review the diff, decide what is usable, and open a PR yourself — or restart the task with a higher cap if the work was genuinely heading in the right direction.

Nothing is lost. Nothing auto-merges. The boundary is clean.

How the watchdog meters spend

01 Task starts: A maxCostUsd is bound to the task before any inference begins.
02 Meter: Every inference call, sandbox second, and storage op adds to the running total.
03 Check: After each operation, the total is checked against the cap.
04 Block: If the next op would exceed the cap, it is rejected and the agent receives a stop signal.
05 Report: You get the full log, the last diff, and the reason for termination.
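The five steps above can be sketched in a few lines of TypeScript. This is a minimal illustration, not the production watchdog: only `maxCostUsd` comes from the text, while `TaskMeter`, `Op`, `Report`, and the choice to count integer micro-dollars (to avoid floating-point drift) are assumptions made for the example.

```typescript
// Hypothetical sketch; only maxCostUsd is a name taken from the article.
type OpKind = "inference" | "sandbox_second" | "storage_op";

interface Op {
  kind: OpKind;
  costMicroUsd: number; // integer micro-dollars (assumption, avoids float drift)
}

interface Report {
  reason: string;
  spentMicroUsd: number;
  log: string[];
}

class TaskMeter {
  private spentMicroUsd = 0;
  private stopped = false;
  private readonly capMicroUsd: number;
  private readonly log: string[] = [];

  // Step 01: the cap is bound before anything is metered.
  constructor(maxCostUsd: number) {
    this.capMicroUsd = Math.round(maxCostUsd * 1000000);
  }

  // Steps 02-04: meter the op, or reject it if it would exceed the cap.
  charge(op: Op): boolean {
    if (this.stopped) return false;
    if (this.spentMicroUsd + op.costMicroUsd > this.capMicroUsd) {
      this.stopped = true; // stand-in for the stop signal sent to the agent
      this.log.push(`rejected ${op.kind}: would exceed cap`);
      return false;
    }
    this.spentMicroUsd += op.costMicroUsd;
    this.log.push(`charged ${op.kind}: ${op.costMicroUsd}`);
    return true;
  }

  // Step 05: once stopped, expose the log and the reason for termination.
  report(): Report | null {
    if (!this.stopped) return null;
    return {
      reason: "cap_exceeded",
      spentMicroUsd: this.spentMicroUsd,
      log: [...this.log],
    };
  }
}

const demoMeter = new TaskMeter(0.5); // cap of 500000 micro-dollars
demoMeter.charge({ kind: "inference", costMicroUsd: 300000 }); // true
demoMeter.charge({ kind: "inference", costMicroUsd: 300000 }); // false, would reach 600000
```

The check happens before the total is incremented, so in this sketch a rejected operation never shows up in the recorded spend.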

Two layers, not one

Per-task cap — the hard stop above. Defaults vary by plan; you can raise it for a specific task.
Per-month plan budget — a soft warning at 80%, a hard limit at 100%. On bundled plans the hard limit pauses bundled inference for the rest of the cycle; drop in a BYOK key to keep going.

Two layers means a single weird task can't drain your month, and a busy month can't surprise you on the last day.
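The monthly layer reduces to a small state function. The sketch below assumes spend is tracked in integer cents and that both thresholds are inclusive (exactly 80% warns, exactly 100% pauses); `monthlyBudgetState`, `canRunInference`, and `BudgetState` are hypothetical names, not part of any published API.

```typescript
// Hypothetical sketch of the per-month budget layer.
type BudgetState = "ok" | "warning" | "paused";

// Soft warning at 80% of the budget, hard limit at 100%.
function monthlyBudgetState(spentCents: number, budgetCents: number): BudgetState {
  if (spentCents >= budgetCents) return "paused";
  // Integer form of spent / budget >= 0.8, to avoid float rounding.
  if (spentCents * 10 >= budgetCents * 8) return "warning";
  return "ok";
}

// At the hard limit, bundled inference is paused; a BYOK key keeps work going.
function canRunInference(state: BudgetState, hasByokKey: boolean): boolean {
  return state !== "paused" || hasByokKey;
}

const demoState = monthlyBudgetState(8500, 10000); // "warning"
canRunInference(demoState, false); // true, the warning is only a soft signal
```

Keeping the monthly check separate from the per-task meter mirrors the two-layer split: each layer can trip without consulting the other.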

Two layers of protection:

Per-task cap. Trigger: a single task exceeds maxCostUsd. Effect: the task is killed immediately and the platform fee is refunded. Purpose: stop one runaway task.
Per-month budget. Trigger: monthly spend hits 100%. Effect: bundled inference is paused; a BYOK key keeps you running. Purpose: stop a busy month from surprising you.

How the watchdog works

Spend is metered on every inference call, every sandbox second, and every storage operation. The running total is checked after each operation. If the next operation would push the task over its cap, the operation is rejected and the agent receives a stop signal.

There is no race condition where two expensive calls slip through. The spend counter sits behind a single serialized, blocking check, so two expensive operations cannot race past the cap simultaneously: one is rejected, the agent stops, and the task ends.
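One way to get that property in an async runtime is to queue every reservation behind the previous one, so the read, the comparison, and the write never interleave with another caller. The sketch below is an assumption about how such a check could look, not a description of the actual implementation; `SerializedCap`, `reserve`, and `readSpent` are hypothetical names, and `readSpent` only simulates an asynchronous counter store.

```typescript
// Hypothetical sketch of a serialized, blocking cap check.
class SerializedCap {
  private spentCents = 0;
  private readonly capCents: number;
  // Tail of the queue: each reservation chains onto the previous one.
  private tail: Promise<void> = Promise.resolve();

  constructor(capCents: number) {
    this.capCents = capCents;
  }

  // Resolves to true if the cost was reserved, false if it would exceed the cap.
  reserve(costCents: number): Promise<boolean> {
    const result = this.tail.then(async () => {
      const current = await this.readSpent();
      if (current + costCents > this.capCents) return false;
      this.spentCents = current + costCents;
      return true;
    });
    // The next caller waits for this one to settle, whatever the outcome.
    this.tail = result.then(() => {}, () => {});
    return result;
  }

  // Simulated async read; the await is where unserialized callers could interleave.
  private async readSpent(): Promise<number> {
    await Promise.resolve();
    return this.spentCents;
  }

  get spent(): number {
    return this.spentCents;
  }
}
```

With a cap of 100 cents, two concurrent `reserve(60)` calls resolve to `true` and `false` in call order, and the recorded spend stays at 60. Without the queue, both callers could read a total of 0 before either one writes.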

Why caps make the rest possible

Predictable spend is what lets us offer flat-rate plans at all. If a single task could cost anything, pricing would have to assume the worst. The cap is what turns "an agent ran for a while" into a bounded, reviewable line item — which is the whole point of a hosted runtime over a script you babysit.

Bounded, reviewable, predictable. If a task's cost is a random variable, flat-rate pricing is a lie. The cap is what makes the promise real.
How we think about pricing (product principle)

Details, including what counts as a task and what's never metered, are in costs and limits.
