AI Agent Session-Cost Calculator
A chat call costs what one prompt costs. An agent costs what a whole loop costs — and that's the part finance never sees coming. Every reasoning step resends the system prompt, the tool schema and the entire history so far, so token usage compounds with each iteration. This calculator models that accumulation step by step, shows where the money actually goes, and exposes your runaway-loop worst case so you can size step limits and kill-switches before a stuck agent runs up the bill.
- Agent cost scales ~quadratically with steps because history is resent every iteration.
- The fixed system prompt + tool schema is paid for on every step — caching targets exactly this.
- A runaway loop that hits its step cap can cost many times the average session.
- True cost = accumulated input + per-step output + retries + fixed overhead.
Model your agent's cost per session
| Model | Input $/1M | Output $/1M |
|---|---|---|
Estimated agent cost
| Model | Tokens/session | Cost/session | Monthly | Annual TCO |
|---|---|---|---|---|
How agent cost is calculated
A single completion is one input and one output. An agent loop is different: to decide its next action the model must see everything that happened before it, so on every step the runtime resends the fixed system prompt and tool schema, the original request, and the full running transcript of prior outputs and tool observations. Early steps are cheap; later steps drag a long history behind them. Sum the input across all steps and the growth is quadratic — this calculator computes it as the fixed prompt paid once per step plus an accumulating block that grows by your output and observation sizes each iteration.
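Here is the accumulation written out. It uses a simplifying assumption: every step produces o output tokens and receives an observation of b tokens. P is the fixed prompt (system prompt, tool schema and original request), N is the number of steps, and I_k is the input sent on step k. These symbols are illustrative.

```latex
\begin{aligned}
I_k &= P + (k-1)(o+b), \qquad k = 1,\dots,N \\
I_{\text{total}} &= \sum_{k=1}^{N} I_k = NP + (o+b)\,\frac{N(N-1)}{2} \\
O_{\text{total}} &= N\,o
\end{aligned}
```

For example, P = 3,000, o + b = 2,000 and N = 20 give 60,000 + 380,000 = 440,000 input tokens. The N(N-1)/2 term is the quadratic part. Once history outweighs the fixed prompt, doubling N roughly quadruples it.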
Three adjustments turn the raw token count into a planning figure. Prompt caching bills the repeated prefix (your system prompt and the stable head of the transcript) at a fraction of the normal input rate. That is why the calculator's breakdown flags sessions where most of the spend is re-sent context. The retry buffer inflates the total to cover failed steps and fallback hops. The runaway figure recomputes a session at your maximum step cap instead of the average, showing the worst case a stuck agent can reach before a kill-switch intervenes. Together they turn a per-call price into a defensible per-session and monthly cost for agent FinOps planning.
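The sketch below shows one way the three adjustments could combine. AgentProfile, session_cost and every number in it are hypothetical illustrations, not the calculator's actual code or any provider's prices. It makes several simplifying assumptions:

- Only the fixed prompt is cached, on every step after the first.
- The growing history is billed at the full input rate.
- Retries add a flat fraction of the token cost.
- Overhead is a flat amount added once per session.

```python
from dataclasses import dataclass


@dataclass
class AgentProfile:
    """Average token sizes for one session (illustrative, not measured)."""
    fixed_prompt: int          # system prompt + tool schema + user request
    output_per_step: int       # model output tokens per loop step
    observation_per_step: int  # tool-result tokens appended per step


def session_cost(profile: AgentProfile, steps: int,
                 input_price: float, output_price: float,
                 cache_rate: float = 1.0, retry_buffer: float = 0.0,
                 overhead_usd: float = 0.0) -> float:
    """Estimated USD cost of one agent session (a sketch, not provider billing).

    Prices are USD per 1M tokens. Assumed: the fixed prompt is billed at
    cache_rate x the input price on every step after the first; history is
    billed at the full input rate; retries add retry_buffer x the token cost;
    overhead_usd is added once per session.
    """
    growth = profile.output_per_step + profile.observation_per_step
    total_input = 0.0
    for k in range(1, steps + 1):
        # Step 1 pays the full rate to populate the cache (assumed, no write surcharge).
        fixed_rate = 1.0 if k == 1 else cache_rate
        total_input += profile.fixed_prompt * fixed_rate + (k - 1) * growth
    total_output = steps * profile.output_per_step
    token_cost = (total_input * input_price
                  + total_output * output_price) / 1_000_000
    return token_cost * (1 + retry_buffer) + overhead_usd


profile = AgentProfile(fixed_prompt=3_000, output_per_step=500,
                       observation_per_step=1_500)
settings = dict(input_price=3.0, output_price=15.0, cache_rate=0.1,
                retry_buffer=0.15, overhead_usd=0.002)
avg = session_cost(profile, steps=8, **settings)       # typical session
runaway = session_cost(profile, steps=40, **settings)  # stuck loop at its cap
print(f"average ${avg:.4f}, runaway ${runaway:.4f}, {runaway / avg:.1f}x")
# average $0.2818, runaway $5.7797, 20.5x
```

In this example, five times the steps costs about twenty times as much. In the runaway session the accumulated history, not the cached fixed prompt, dominates the bill.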
Frequently Asked Questions (FAQ)
An agent completes one task over many loop steps, and each step resends the system prompt, tool schema and the entire growing history. Because context accumulates, total input tokens grow roughly with the square of the number of steps rather than linearly.
Every reasoning step appends the model's output and the tool result to the conversation, and that longer history is resent as input on the next step. Late steps therefore carry far more input tokens than early ones, which is the main hidden cost driver.
Add up the fixed system prompt and tool schema, the user request, and the running history of prior outputs and observations. Measure a few real sessions with your provider's tokenizer, then enter the average system-prompt, observation and output token sizes into the calculator.
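If your provider reports input and output token counts for each call, you can also back the averages out of logged sessions instead of re-tokenizing transcripts. The sketch below assumes a hypothetical log format: one (input_tokens, output_tokens) pair per step. It also assumes the append-only history described above. Caching discounts, history trimming or summarisation would break that assumption. estimate_profile is an illustrative name.

```python
from statistics import mean


def estimate_profile(sessions: list[list[tuple[int, int]]]) -> dict[str, float]:
    """Average token sizes from logged sessions (hypothetical log format).

    Each session is a list of (input_tokens, output_tokens) pairs, one per
    step. Assumes history is append-only: a step's input equals the previous
    step's input plus that step's output plus the observation fed back.
    Needs at least one session with two or more steps.
    """
    fixed, outputs, observations = [], [], []
    for steps in sessions:
        fixed.append(steps[0][0])  # step 1 sends only the fixed prompt
        for (prev_in, prev_out), (cur_in, _) in zip(steps, steps[1:]):
            outputs.append(prev_out)
            # Whatever was appended besides the model's output is the observation.
            observations.append(cur_in - prev_in - prev_out)
        outputs.append(steps[-1][1])  # the final step's output has no successor
    return {
        "fixed_prompt": mean(fixed),
        "output_per_step": mean(outputs),
        "observation_per_step": mean(observations),
    }


logged = [
    [(3_000, 400), (5_000, 600), (7_100, 500)],
    [(3_200, 500), (5_200, 500)],
]
print(estimate_profile(logged))
```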
Caching pays off more for agents than for single calls because the system prompt and conversation prefix repeat on every step. Depending on the provider, cached input can be billed at a steep discount, in some cases around ten percent of the normal input rate. That discount can cut a large share of total cost, most of all in long sessions with big tool schemas.
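The idealized sketch below shows why long sessions benefit most. It assumes that every token already sent on the previous step is a cache hit billed at 10% of the input rate. It ignores cache-write surcharges, cache expiry and output tokens, so it measures the share of *input* cost saved. caching_savings and the token sizes are hypothetical.

```python
def caching_savings(fixed_prompt: int, growth_per_step: int, steps: int,
                    cache_read_rate: float = 0.1) -> float:
    """Share of input cost saved by prefix caching (idealized sketch).

    Assumes the whole previous step's input is a cached prefix of the current
    step's input (history is append-only), billed at cache_read_rate x the
    input price. Ignores cache-write surcharges and cache expiry.
    """
    total_input = 0
    cached = 0
    previous_input = 0
    for k in range(1, steps + 1):
        step_input = fixed_prompt + (k - 1) * growth_per_step
        total_input += step_input
        cached += previous_input  # prefix already sent on the previous step
        previous_input = step_input
    billed = (total_input - cached) + cached * cache_read_rate
    return 1 - billed / total_input


for steps in (3, 10, 30):
    print(steps, f"{caching_savings(4_000, 2_000, steps):.0%}")
```

With these numbers, the input-cost saving climbs from 50% at 3 steps to about 75% at 10 steps and 84% at 30 steps.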
A runaway loop is an agent that keeps iterating without converging, hitting its maximum step limit. Because cost grows quadratically with steps, a session that runs to the cap can cost many times the average, which is why step limits and kill-switches protect your budget.
Set a hard maximum on loop steps and on total tokens, and add a budget kill-switch that stops a session once it exceeds a cost threshold. Trim or summarise history so context cannot grow without bound. The runaway figure here shows the worst case a session can reach at its step cap, before any budget kill-switch intervenes.
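A budget kill-switch can be as small as a counter checked between steps. SessionGuard, its limits and the prices below are hypothetical. The simulated loop stands in for a stuck agent whose history keeps growing. The check runs after each step is billed, so the session can overshoot the budget by up to one step's cost. A stricter guard could estimate the next step's cost before making the call.

```python
class SessionGuard:
    """Hypothetical per-session guard: a hard step cap plus a USD budget."""

    def __init__(self, max_steps: int, max_cost_usd: float,
                 input_price: float, output_price: float) -> None:
        self.max_steps = max_steps
        self.max_cost_usd = max_cost_usd
        self.input_price = input_price    # USD per 1M input tokens
        self.output_price = output_price  # USD per 1M output tokens
        self.steps = 0
        self.spent = 0.0

    def charge(self, input_tokens: int, output_tokens: int) -> None:
        """Record the cost of one completed step."""
        self.steps += 1
        self.spent += (input_tokens * self.input_price
                       + output_tokens * self.output_price) / 1_000_000

    def can_continue(self) -> bool:
        """False once either the step cap or the budget has been reached."""
        return self.steps < self.max_steps and self.spent < self.max_cost_usd


guard = SessionGuard(max_steps=50, max_cost_usd=0.50,
                     input_price=3.0, output_price=15.0)
history = 3_000  # fixed prompt tokens; grows as the stuck agent loops
while guard.can_continue():
    guard.charge(input_tokens=history, output_tokens=500)
    history += 500 + 1_500  # output plus observation appended every step
print(f"stopped after {guard.steps} steps at ${guard.spent:.4f}")
# stopped after 11 steps at $0.5115
```

Here the $0.50 budget ends the session at step 11, long before the 50-step cap.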
Yes. Tool results are fed back into the model as observation tokens and then persist in the history for every later step. Large tool outputs, such as full API responses or documents, inflate accumulation quickly, so summarising observations before reinsertion saves money.
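An observation is resent on every later step, so shrinking it early pays off repeatedly. The sketch below compares accumulated input for a six-step session with raw observations against the same session with each observation reduced to at most 800 tokens. accumulated_input and the token sizes are hypothetical. min() merely stands in for a real summarisation step, whose own model call would add some cost that is not counted here.

```python
def accumulated_input(fixed_prompt: int, output_per_step: int,
                      observations: list[int]) -> int:
    """Total input tokens when step k's observation is appended after step k."""
    total, history = 0, 0
    for obs in observations:
        total += fixed_prompt + history   # this step resends everything so far
        history += output_per_step + obs  # its output and observation then persist
    return total


raw = [6_000, 300, 4_500, 250, 8_000, 400]  # e.g. full API responses on some steps
capped = [min(obs, 800) for obs in raw]     # stand-in for summarising to <= 800 tokens
print(accumulated_input(3_000, 500, raw), accumulated_input(3_000, 500, capped))
# 78700 34400
```

Trimming the three large observations cuts accumulated input by more than half, from 78,700 to 34,400 tokens.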
Mixing models often helps: a smaller, cheaper model can handle routine steps while a frontier model handles hard reasoning. The trade-off is that weaker models may take more steps or retries, so compare total session cost here rather than the per-token rate alone.
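The sketch below compares whole-session cost across three model plans, not per-token rates. It assumes the full history is resent to whichever model runs each step. The prices, step counts and the function name routed_session_cost are made up for illustration; they are not any vendor's price list.

```python
def routed_session_cost(plan: list[tuple[float, float]], fixed_prompt: int,
                        output_per_step: int, observation_per_step: int) -> float:
    """USD cost when each step may run on a different model.

    plan holds one (input_price, output_price) pair per step, in USD per
    1M tokens; the full history is resent to whichever model runs the step.
    """
    cost, history = 0.0, 0
    for input_price, output_price in plan:
        step_input = fixed_prompt + history
        cost += (step_input * input_price
                 + output_per_step * output_price) / 1_000_000
        history += output_per_step + observation_per_step
    return cost


FRONTIER = (3.00, 15.00)  # illustrative prices only
SMALL = (0.25, 1.25)

scenarios = {
    "frontier, 8 steps": [FRONTIER] * 8,
    "small, 40 steps": [SMALL] * 40,  # weaker model assumed to need 5x the steps
    "mixed, 10 steps": [FRONTIER] + [SMALL] * 8 + [FRONTIER],
}
for name, plan in scenarios.items():
    print(f"{name}: ${routed_session_cost(plan, 3_000, 500, 1_500):.4f}")
```

Even at a twelfth of the per-token price, the small model needs five times as many steps here. It ends up costing more than the frontier model: about $0.45 versus $0.30 per session. The mixed plan is cheapest at about $0.12.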
It treats each step's input as the fixed prompt plus all accumulated history, sums input across every step, adds output per step, then applies caching, a retry buffer and fixed overhead. The result is a realistic per-session and monthly cost rather than a single-call estimate.
No. All calculation runs in your browser and your inputs are saved only in local storage so the tool remembers them next time. Nothing is transmitted to a server, and the reset button clears everything instantly.