18 June 2026 · Alphabench
What an AI coding agent actually costs per task
You buy an AI coding agent by the month, but you use it by the task. The subscription tells you the monthly bill and nothing else: not what fixing this bug cost, not whether you left the tool idle on a Tuesday because you were watching the meter. We find it more useful to track cost per finished job, the bug closed or the endpoint shipped, because that is the number you can actually act on.
Tasks have a predictable token shape. The agent loads your code, your tests, and its own earlier steps, then writes a fairly small diff at the end. Across our own usage it reads roughly four or five tokens for every one it writes, and the tokens it reads are the cheap ones. So a task's cost comes down mostly to how much the agent had to read, times the per-token price of whatever model did the reading.
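To make that arithmetic concrete, here is a minimal sketch in Python. The token counts and the two per-million prices are hypothetical placeholders chosen only to sit in the four-or-five-to-one range described above, not measured figures or anyone's published rates, and the function name `task_cost` is our own for illustration.

```python
def task_cost(read_tokens, written_tokens, read_price_per_m, write_price_per_m):
    """Return (read_cost, write_cost, total) in dollars for one task.

    Prices are dollars per million tokens. All inputs are supplied by the
    caller; nothing here is tied to a specific model or vendor.
    """
    read_cost = read_tokens / 1_000_000 * read_price_per_m
    write_cost = written_tokens / 1_000_000 * write_price_per_m
    return read_cost, write_cost, read_cost + write_cost


# Hypothetical task: 450k tokens read, 100k tokens written (4.5 to 1).
# Hypothetical prices: $1.00 per million read, $2.00 per million written.
read_cost, write_cost, total = task_cost(450_000, 100_000, 1.00, 2.00)
read_share = read_cost / total

print(f"read ${read_cost:.2f}, write ${write_cost:.2f}, total ${total:.2f}")
print(f"reading is {read_share:.0%} of the task cost")
```

How large the reading share comes out depends on the gap between read and write prices in your own rates, so substitute the real numbers before drawing conclusions.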
That second factor is where the bill moves. Run the same task on a frontier model at around nine dollars per million blended tokens and on the Sarvam model Pier uses at roughly seven cents per million, and you get the same reading, the same diff, and a bill that differs by about two orders of magnitude (roughly 130 times). For routine work, such as fixing a test, answering a question about a repo, or adding a small feature, the cheaper model gets it done, and seven cents versus nine dollars per million tokens is not a rounding error.
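The comparison can be sketched the same way. The two blended rates below are the approximate figures quoted in this post; the task size of 550,000 tokens is a hypothetical number picked for illustration, and the names are ours.

```python
# Approximate blended rates from this post, in dollars per million tokens.
FRONTIER_RATE_PER_M = 9.00
CHEAP_RATE_PER_M = 0.07


def blended_cost(total_tokens, rate_per_m):
    """Dollar cost of a task billed at a single blended per-million rate."""
    return total_tokens / 1_000_000 * rate_per_m


# Hypothetical task size: reading plus writing, in tokens.
task_tokens = 550_000

frontier = blended_cost(task_tokens, FRONTIER_RATE_PER_M)
cheap = blended_cost(task_tokens, CHEAP_RATE_PER_M)
ratio = frontier / cheap

print(f"frontier: ${frontier:.2f}  cheap: ${cheap:.4f}  ratio: {ratio:.0f}x")
```

The ratio does not depend on the task size, since the token count cancels out: it is just nine dollars divided by seven cents, a little under 130.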
Rather than ask you to take that on faith, we publish the math. The cost per task page works through representative jobs from start to finish, and the pricing page lists the per-million-token rates so you can run the numbers against your own workload.
We are not claiming the expensive model is never worth it. The very top models still pull ahead on the gnarliest whole-repo problems, and some days call for one. The narrower point is that paying premium rates for ordinary work is usually a habit nobody chose on purpose. Use the cheap model by default, reach for the dear one when a task earns it, and the per-task cost stops being something you flinch at.
If you take one thing from this: count jobs, not months, and watch the reading as much as the writing, because that is where the money goes. The use cases show several of those everyday jobs run end to end on the cheaper default.