Cost Guard#

Stacked daily / weekly / monthly USD caps and an optional sessions-per-day rate cap. Optional. Off by default. Override per-run.

For educational research and paper trading. This is not investment advice.

What it is#

Cost Guard is a quota layer that sits in front of the live LLM debate. Before every paid-API session, the engine estimates the worst-case cost of the debate and checks it against the caps you've set. If the estimated cost would push any enabled cap over its limit, the request is blocked and a modal opens where you can either Cancel or Override and run anyway. The override is per-run; it does not raise the cap.

The four cap dimensions:

Dimension	What it limits
Daily	USD spent across all debates that started today (UTC)
Weekly	USD spent across the rolling 7-day window
Monthly	USD spent across the rolling 30-day window
Sessions per day	Count of debates started today, regardless of cost

A cap of 0 means disabled. You can enable any subset — e.g. monthly USD only, or sessions-per-day only.
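To make the "0 means disabled" rule concrete, here is a minimal sketch of how a check across the four dimensions could look. The `Caps` and `Totals` classes and the `first_exceeded` function are hypothetical names for illustration, not the engine's actual types, and the strict "greater than" comparison is an assumption based on the wording "push any cap over the limit".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Caps:
    """Hypothetical cap settings; a value of 0 disables that dimension."""
    daily_usd: float = 0.0
    weekly_usd: float = 0.0
    monthly_usd: float = 0.0
    sessions_per_day: int = 0


@dataclass(frozen=True)
class Totals:
    """Spend already recorded in each window (hypothetical shape)."""
    daily_usd: float = 0.0
    weekly_usd: float = 0.0
    monthly_usd: float = 0.0
    sessions_today: int = 0


def first_exceeded(caps: Caps, totals: Totals, estimate_usd: float) -> Optional[str]:
    """Return the first dimension the new run would push over its cap, else None."""
    usd_checks = [
        ("daily", caps.daily_usd, totals.daily_usd),
        ("weekly", caps.weekly_usd, totals.weekly_usd),
        ("monthly", caps.monthly_usd, totals.monthly_usd),
    ]
    for name, cap, spent in usd_checks:
        # A cap of 0 is disabled and never blocks.
        if cap > 0 and spent + estimate_usd > cap:
            return name
    # The rate cap counts sessions, not dollars, so it applies to free runs too.
    if caps.sessions_per_day > 0 and totals.sessions_today + 1 > caps.sessions_per_day:
        return "rate"
    return None
```

With only a monthly cap and a rate cap set, the daily and weekly dimensions are skipped entirely, which matches the "enable any subset" behavior described above.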

Why we built it#

LLM costs scale with model choice and token usage in ways that are easy to underestimate. A single debate with gpt-5 or Claude Opus 4.7 across 12 agents can cost far more than a quick mental estimate suggests, because each agent reads a transcript that keeps growing. Cost Guard turns "did I just spend $20 on three debates?" into a deliberate decision rather than a billing surprise.

It also protects against accidental loops, runaway scripts, and the occasional UI glitch that fires multiple debates. The rate cap (sessions-per-day) catches what the USD cap might miss.

Where to go in the UI#

Settings → Cost Guard.

Top of the page: a status row showing your current spend totals (daily / weekly / monthly USD, sessions today) with color bars indicating how close you are to each cap. Green = below 50%, amber = 50-89%, red = 90% or more.

Below that: an Enable / Disable toggle and a form with four numeric inputs (one per cap dimension). Saving sends the new caps via PUT /cost-guard/config. Settings can be changed at any time, including mid-session.

What counts as spend?#

The engine computes a per-session cost using a static cost rate table (engine/llm_providers.py). Rates are in USD per million input/output tokens and are refreshed manually when providers change their pricing. Conservative numbers are preferred.

Provider / Auth	Cost behavior
API-key on any per-token-billed provider	Real token usage × rate table. Logged on `session.complete`.
OpenAI OAuth (ChatGPT subscription)	$0. Subscription billing happens outside our app; we record 0 so the cost ledger only reflects per-token API spend.
Local LLM (Ollama / LM Studio)	$0. Local sessions cost nothing in real dollars.
OpenRouter	Recorded as `0.0` since model rates vary by underlying model and we don't track them. The rate cap (sessions-per-day) is what you'd use to discipline OpenRouter usage.

OAuth and Local sessions skip the three USD caps but do count against the sessions-per-day rate cap — even free runs benefit from quota discipline on runaway debate counts.
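As a rough illustration of "real token usage × rate table", the sketch below computes a session cost from per-million-token rates. The table contents, the model names, the auth-kind strings, and the function name are all placeholders assumed for this example; the real table lives in engine/llm_providers.py and may be shaped differently.

```python
# Hypothetical rate table: USD per million tokens as (input, output).
# These numbers are placeholders, not real provider prices.
RATES_PER_MTOK = {
    "example-small": (0.15, 0.60),
    "example-large": (15.00, 75.00),
}

# Auth kinds assumed to be recorded at $0 (string values are illustrative).
ZERO_COST_AUTH = {"oauth", "local"}


def session_cost_usd(model: str, auth_kind: str, input_tokens: int, output_tokens: int) -> float:
    """Return the recorded cost of one session in USD."""
    if auth_kind in ZERO_COST_AUTH:
        # Subscription and local runs are logged at 0 so the ledger
        # only reflects per-token API spend.
        return 0.0
    in_rate, out_rate = RATES_PER_MTOK[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000
```

Input and output tokens are priced separately because providers usually charge more for output, which is why the worst-case estimate cares about each agent's output budget.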

The pre-debate reservation flow#

Before a live debate kicks off, the renderer:

Estimates worst-case cost for the session (assumes every agent maxes out its output budget, transcript grows triangularly across agents).
POSTs /cost-guard/reserve with {model, auth_kind, max_tokens}.
If the reservation succeeds → the debate proceeds with a reservation_id threaded on the WebSocket start frame.
If the reservation fails with CostGuardBlocked → the Cost Guard modal opens.
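The worst-case estimate in the first step can be sketched as follows. This assumes a simple model in which agents speak one after another, each agent reads a fixed base prompt plus every earlier agent's full output, and each agent writes exactly `max_tokens`. The function names, the `base_prompt_tokens` parameter, and the rate arguments are hypothetical; the engine's actual estimator may differ.

```python
def worst_case_tokens(n_agents: int, max_tokens: int, base_prompt_tokens: int = 0) -> tuple[int, int]:
    """Return (input_tokens, output_tokens) under the worst-case assumptions."""
    # Every agent is assumed to use its whole output budget.
    output_tokens = n_agents * max_tokens
    # Agent k reads the k earlier outputs, so the transcript read in total is
    # 0 + 1 + ... + (n - 1) = n * (n - 1) / 2 full turns (the "triangular" growth).
    prior_turns = n_agents * (n_agents - 1) // 2
    input_tokens = n_agents * base_prompt_tokens + prior_turns * max_tokens
    return input_tokens, output_tokens


def worst_case_cost_usd(
    n_agents: int,
    max_tokens: int,
    in_rate_per_mtok: float,
    out_rate_per_mtok: float,
    base_prompt_tokens: int = 0,
) -> float:
    """Price the worst-case token counts with USD-per-million-token rates."""
    input_tokens, output_tokens = worst_case_tokens(n_agents, max_tokens, base_prompt_tokens)
    return (input_tokens * in_rate_per_mtok + output_tokens * out_rate_per_mtok) / 1_000_000
```

For 12 agents with a 1,000-token output budget and a 500-token base prompt, this model yields 72,000 input tokens and 12,000 output tokens. Input volume grows quadratically with the number of agents, which is why large debates cost more than intuition suggests.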

The modal shows:

Which dimension would be exceeded (daily / weekly / monthly / rate)
Your current spend in that dimension
The configured cap
The estimated cost of the requested run

You then choose:

Cancel — the debate aborts. No tokens spent.
Override and run anyway — the renderer re-reserves with override=true. The session runs, and the cost is added to the rolling totals just like any other run. The cap itself does not change.

A three-second anti-tamper countdown keeps the Override button disabled when the modal first appears, so you can't accidentally double-click through it. The Cancel button is always enabled.
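The Cancel / Override decision can be summarized in a few lines of control flow. In this sketch, `reserve` and `confirm_override` are injected callables standing in for the HTTP call and the modal; both names, and the `CostGuardBlocked` exception class with its fields, are assumptions made for illustration.

```python
from typing import Callable, Optional


class CostGuardBlocked(Exception):
    """Raised by the (hypothetical) reserve call when a cap would be exceeded."""

    def __init__(self, dimension: str, spent: float, cap: float, estimate: float):
        super().__init__(f"{dimension} cap would be exceeded")
        self.dimension = dimension
        self.spent = spent
        self.cap = cap
        self.estimate = estimate


def reserve_with_prompt(
    reserve: Callable[[bool], str],
    confirm_override: Callable[[CostGuardBlocked], bool],
) -> Optional[str]:
    """Try to reserve; on a block, ask the user, then retry with override=True.

    Returns a reservation id, or None if the user cancelled.
    """
    try:
        return reserve(False)
    except CostGuardBlocked as blocked:
        if not confirm_override(blocked):
            return None  # Cancel: the debate aborts and no tokens are spent.
        # Override: re-reserve for this run only; the cap itself is unchanged.
        return reserve(True)
```

Because the override is passed per call rather than stored, the next run goes through the normal check again.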

The TOCTOU-safe reservation#

Cost Guard uses a time-of-check-to-time-of-use safe reservation pattern: the database insert of the reservation row is the atomic check. Two simultaneous debates cannot both squeeze under the same cap — the first to insert wins; the second sees the bumped totals and gets blocked.

Reservations have a 15-minute TTL. If a debate ends cleanly, the reservation is finalized with real token usage replacing the worst-case estimate. If the renderer crashes mid-debate, the reservation expires and the slot frees up. Expired reservations are swept on every state read.
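A minimal sketch of the check-and-insert pattern, using SQLite from the Python standard library. It is deliberately simplified: it tracks a single USD cap, counts only live reservations, and leaves out finalization. The table layout and function names are hypothetical and not taken from the engine's schema.

```python
import sqlite3
import time
import uuid
from typing import Optional

TTL_SECONDS = 15 * 60  # reservation lifetime described above


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    # isolation_level=None leaves transaction control to explicit BEGIN/COMMIT.
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS reservations ("
        "id TEXT PRIMARY KEY, amount_usd REAL NOT NULL, expires_at REAL NOT NULL)"
    )
    return conn


def try_reserve(
    conn: sqlite3.Connection,
    estimate_usd: float,
    cap_usd: float,
    now: Optional[float] = None,
) -> Optional[str]:
    """Reserve estimate_usd under cap_usd, or return None if it would not fit."""
    now = time.time() if now is None else now
    # BEGIN IMMEDIATE takes the write lock before totals are read, so a second
    # caller cannot read the same totals and also insert.
    conn.execute("BEGIN IMMEDIATE")
    try:
        # Sweep expired reservations so crashed runs free their slot.
        conn.execute("DELETE FROM reservations WHERE expires_at <= ?", (now,))
        (held,) = conn.execute(
            "SELECT COALESCE(SUM(amount_usd), 0) FROM reservations"
        ).fetchone()
        if held + estimate_usd > cap_usd:
            reservation_id = None
        else:
            reservation_id = uuid.uuid4().hex
            conn.execute(
                "INSERT INTO reservations VALUES (?, ?, ?)",
                (reservation_id, estimate_usd, now + TTL_SECONDS),
            )
        conn.execute("COMMIT")
        return reservation_id
    except Exception:
        conn.execute("ROLLBACK")
        raise
```

The check and the insert share one write transaction, so there is no gap between "time of check" and "time of use" for a competing request to slip through.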

When to use which cap#

Tune to your risk tolerance:

Strict daily cap (e.g. $1/day): pairs well with gpt-4o-mini/claude-haiku-4-5 and frequent experimentation. That works out to roughly 200 debates/day if each one costs about half a cent.
Monthly cap only (e.g. $30/month): pairs with mixed-model use; lets you have expensive days as long as the month averages out.
Sessions-per-day cap (e.g. 20): disciplines OAuth + Local users where USD caps don't fire. Also a sane secondary cap alongside USD limits.
All four enabled: paranoid mode. Useful for long-running unattended scenarios.

What does NOT count as spend#

Stub debates (no provider configured) — never charged. They run for free against the canned content.
OAuth debates — subscription billing is out-of-band. We record $0.
Local LLM debates — $0.
Failed sessions — if the LLM provider errors before any tokens are spent, the reservation is finalized at 0.

Reading the spend bars#

The status row at the top of the Cost Guard tab shows each cap dimension as a bar. The bar color reflects "how close are you to this cap":

Color	% of cap used	What it means
🟢 Green	0-49%	Plenty of headroom.
🟡 Amber	50-89%	You're past halfway. Next runs may push you near the limit.
🔴 Red	90% or more	You're close to, at, or over the cap. The next live debate is likely to be blocked.

For unset caps (value = 0 / disabled), the bar shows the raw spend with no fill — just a number, no progress visualization.
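The thresholds in the table map onto a small function. This is a sketch only; `bar_color` is a hypothetical name, and returning `None` for a disabled cap is an assumption standing in for "no progress visualization".

```python
from typing import Optional


def bar_color(spent: float, cap: float) -> Optional[str]:
    """Map spend against a cap to a bar color, or None when the cap is disabled."""
    if cap <= 0:
        return None  # Disabled cap: show the raw number with no fill.
    pct = 100.0 * spent / cap
    if pct < 50:
        return "green"
    if pct < 90:
        return "amber"
    return "red"  # 90% and above, including spend pushed past 100% by overrides.
```

Spend can exceed 100% of a cap after an override, so the red branch has no upper bound.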

Resetting spend#

Spend totals are time-windowed, not user-resettable values. They roll forward automatically as time passes:

Daily totals reset at 00:00 UTC.
Weekly totals shed entries older than 7 days each midnight.
Monthly totals shed entries older than 30 days each midnight.
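One way to read these three rules is sketched below. It assumes that the weekly and monthly cutoffs are measured back from the most recent UTC midnight, which is one interpretation of "each midnight", and the function name and entry shape are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, Tuple


def window_totals(entries: Iterable[Tuple[datetime, float]], now: datetime) -> dict:
    """Sum spend per window.

    `entries` holds (started_at, cost_usd) pairs with timezone-aware datetimes.
    """
    # The daily window starts at the most recent 00:00 UTC.
    day_start = now.astimezone(timezone.utc).replace(hour=0, minute=0, second=0, microsecond=0)
    # Assumption: older entries are shed at midnight, so cutoffs hang off day_start.
    week_cutoff = day_start - timedelta(days=7)
    month_cutoff = day_start - timedelta(days=30)
    totals = {"daily_usd": 0.0, "weekly_usd": 0.0, "monthly_usd": 0.0, "sessions_today": 0}
    for started_at, cost in entries:
        if started_at >= month_cutoff:
            totals["monthly_usd"] += cost
        if started_at >= week_cutoff:
            totals["weekly_usd"] += cost
        if started_at >= day_start:
            totals["daily_usd"] += cost
            totals["sessions_today"] += 1
    return totals
```

Nothing here is ever written back or cleared; the totals are recomputed from session records, which is consistent with there being no manual reset.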

There is no manual reset. This is by design, so spend history can't be wiped to dodge a cap. If you need to forget historical data entirely, you can delete <userData>/data/sessions.db (or just the sessions table). This is destructive and not recommended, because debate history lives in the same database.

Tracking historical spend#

The History page shows per-session estimated_cost_usd next to each entry. Sort by date or by cost to see what your expensive runs were. This is independent of the Cost Guard windows — History never expires.

Engine API surface#

For developers integrating against the engine:

Endpoint	Purpose
`GET /cost-guard/state`	Current spend + config
`PUT /cost-guard/config`	Update caps + enable/disable
`POST /cost-guard/check`	Dry-run: would this debate be blocked?
`POST /cost-guard/reserve`	Atomic check + reservation; returns `reservation_id` or 402 `CostGuardBlocked`

Full request/response shapes in docs/api.md.
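As a starting point, a reserve call could be made with the Python standard library roughly like this. The base URL, the placement of `override` in the JSON body, and the assumption that both success and 402 responses carry JSON are guesses for illustration; check docs/api.md for the real shapes.

```python
import json
import urllib.error
import urllib.request


def build_reserve_request(
    base_url: str,
    model: str,
    auth_kind: str,
    max_tokens: int,
    override: bool = False,
) -> urllib.request.Request:
    """Build a POST /cost-guard/reserve request (body fields from the flow above)."""
    body = {"model": model, "auth_kind": auth_kind, "max_tokens": max_tokens}
    if override:
        # Assumption: the override flag travels in the JSON body.
        body["override"] = True
    return urllib.request.Request(
        f"{base_url}/cost-guard/reserve",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def reserve(base_url: str, **kwargs):
    """Return (ok, payload); ok is False when the engine answers 402."""
    try:
        with urllib.request.urlopen(build_reserve_request(base_url, **kwargs)) as resp:
            return True, json.load(resp)
    except urllib.error.HTTPError as err:
        if err.code == 402:
            # Assumption: the 402 body is JSON describing the CostGuardBlocked details.
            return False, json.load(err)
        raise
```

If you only want to know whether a run would be blocked, without holding a slot, use the dry-run `POST /cost-guard/check` endpoint instead.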