Rein In Your Out-of-Control AI Spending with Arthur

June 4, 2026 · 4 min read

The mood around enterprise AI shifted almost overnight. For two years, the prevailing instinct was to spend aggressively or risk falling behind. Now the headlines read differently. Companies are publicly pulling back, renegotiating, and in some cases canceling AI contracts because consumption is outpacing any clear sense of return.

Microsoft reportedly canceled most of its Claude Code licenses, a decision tied in part to cost. Days later, Uber's COO said AI costs have become "harder to justify." One company reportedly spent $500 million in a single month on Claude after failing to set usage limits. The pattern is consistent: spending climbed faster than anyone's ability to measure what it was actually buying.

This is not the end of enterprise AI. It's the start of its disciplined phase. The companies that come through this correction in good shape won't be the ones that spent the most or the ones that cut the most. They'll be the ones that can see what they're spending on, prove what's working, and retire what isn't.

Why AI Spend Spiraled in the First Place

Before fixing the problem, it helps to understand the mechanics behind it. A few recurring patterns explain how budgets got away from teams.

License sprawl. Many organizations rolled out AI tools with a "let a thousand flowers bloom" mentality: hand out licenses broadly, see what sticks. Without measurement, that approach produces a large recurring bill and very little signal about which seats generate value.

Token overconsumption. Enterprise plans are rarely all-you-can-eat in practice. When employees burn tokens on trivial queries, the cost compounds quietly across thousands of users. Usage runs ahead of value, and no single person sees the full picture until the invoice arrives.

Wrong use cases. Teams often point AI at the tasks they personally find annoying rather than the tasks that drive revenue. The work gets automated, but the return doesn't show up in any metric leadership cares about.

Data starvation. Agents are frequently kept away from the proprietary data that would make them genuinely useful. Starved of context, they underperform, which makes the spend even harder to justify.

Each of these is a measurement problem before it is a spending problem.

The Real Problem Isn't Spend, It's Visibility

You can't manage what you can't measure. Most enterprises have no reliable way to see which models, prompts, agents, or users are driving cost versus value. Finance sees a large monthly bill. Engineering sees a vendor invoice. Almost no one sees the line-by-line picture connecting the two.

That gap is exactly what makes the current pullback feel so blunt. When you can't attribute cost to outcomes, your only levers are "spend more" or "shut it down." Neither is a strategy.

Closing that visibility gap is what the Arthur platform was built to do.

How Arthur Helps You Take Back Control

Arthur addresses each of the cost drivers above with capabilities grounded in real production deployments. The foundation is observability; everything else builds on it.

Per-call cost attribution. Arthur's platform is built on the OpenInference semantic conventions, which capture token counts, cost, and model metadata on every LLM span. Cost stops being a single monthly invoice and becomes attributable down to the individual call, tool invocation, user session, application, and team. When finance asks what a number represents, you can answer it line by line.
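To make the idea concrete, here is a minimal sketch of rolling per-span token counts up into cost by team or user. It is not Arthur's API. The span dictionaries borrow OpenInference-style keys (llm.model_name, llm.token_count.prompt, llm.token_count.completion, user.id), while the "team" key, the model names, and the price table are assumptions made up for illustration.

```python
from collections import defaultdict

# Hypothetical prices in USD per 1M tokens; real prices depend on vendor and contract.
PRICE_PER_MTOK = {
    "large-model": {"prompt": 10.0, "completion": 30.0},
    "small-model": {"prompt": 0.5, "completion": 1.5},
}

def span_cost(span):
    """Cost of one LLM span, derived from its token counts and model name."""
    price = PRICE_PER_MTOK[span["llm.model_name"]]
    prompt_cost = span["llm.token_count.prompt"] * price["prompt"]
    completion_cost = span["llm.token_count.completion"] * price["completion"]
    return (prompt_cost + completion_cost) / 1_000_000

def cost_by(spans, key):
    """Sum span cost grouped by any metadata key (team, user, application)."""
    totals = defaultdict(float)
    for span in spans:
        totals[span.get(key, "unattributed")] += span_cost(span)
    return dict(totals)

# Illustrative spans; "team" is a hypothetical metadata attribute.
SPANS = [
    {"llm.model_name": "large-model", "llm.token_count.prompt": 1000,
     "llm.token_count.completion": 500, "team": "support", "user.id": "u1"},
    {"llm.model_name": "small-model", "llm.token_count.prompt": 2000,
     "llm.token_count.completion": 1000, "team": "support", "user.id": "u2"},
    {"llm.model_name": "large-model", "llm.token_count.prompt": 4000,
     "llm.token_count.completion": 1000, "team": "sales", "user.id": "u3"},
]

if __name__ == "__main__":
    print(cost_by(SPANS, "team"))
    print(cost_by(SPANS, "user.id"))
```

The same grouping function answers the finance question at whatever grain the metadata supports, which is why attaching team and user identifiers to every span matters more than the arithmetic itself.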

Evaluation-driven model selection. Once traces are tied to evals, teams can test whether a smaller, cheaper model performs comparably before swapping it into production. Experiments hold the dataset and evals fixed and change only the model, so you stop overpaying for frontier models on workloads that don't need them.
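The experiment described above can be sketched roughly as follows: the dataset and the evaluator stay fixed, only the model changes, and the cheapest candidate that scores within a tolerance of the baseline wins. Everything named here is hypothetical. The three "models" are deterministic stand-in functions rather than real LLM calls, and the per-call costs and the 0.02 tolerance are invented for the example.

```python
DATASET = [
    {"input": "status of order 1", "expected": "refunded"},
    {"input": "status of order 2", "expected": "shipped"},
    {"input": "status of order 3", "expected": "cancelled"},
    {"input": "status of order 4", "expected": "pending"},
]
ANSWERS = {ex["input"]: ex["expected"] for ex in DATASET}

# Stand-in models (assumptions): real experiments would call actual LLMs here.
def frontier(prompt):
    return ANSWERS[prompt]

def small(prompt):
    return ANSWERS[prompt].upper()  # right answer, different casing

def tiny(prompt):
    return ANSWERS[prompt] if prompt.endswith(("1", "2")) else "unknown"

def exact_match(expected, actual):
    """Fixed evaluator: case-insensitive exact match."""
    return expected.strip().lower() == actual.strip().lower()

def run_experiment(model_fn, dataset, evaluator):
    """Score one model on the fixed dataset with the fixed evaluator."""
    passed = sum(evaluator(ex["expected"], model_fn(ex["input"])) for ex in dataset)
    return passed / len(dataset)

def cheapest_comparable(candidates, dataset, evaluator, baseline, tolerance=0.02):
    """Pick the cheapest model scoring within `tolerance` of the baseline."""
    scores = {name: run_experiment(c["fn"], dataset, evaluator)
              for name, c in candidates.items()}
    floor = scores[baseline] - tolerance
    eligible = [name for name in candidates if scores[name] >= floor]
    best = min(eligible, key=lambda name: candidates[name]["cost_per_call"])
    return best, scores

CANDIDATES = {
    "frontier": {"fn": frontier, "cost_per_call": 0.03},
    "small": {"fn": small, "cost_per_call": 0.002},
    "tiny": {"fn": tiny, "cost_per_call": 0.0005},
}

if __name__ == "__main__":
    print(cheapest_comparable(CANDIDATES, DATASET, exact_match, baseline="frontier"))
```

In this toy setup the small model matches the frontier model on every example and is selected, while the tiny model is cheaper but fails the quality floor. Holding the dataset and evaluator constant is what makes that comparison meaningful.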

Guardrails against wasteful usage. Real-time pre- and post-LLM guardrails can block trivial, off-policy, or runaway calls before they run up the bill, with every intervention emitted as telemetry. Instead of discovering a runaway workload after the fact, you set controls in the execution path and watch their effect in real time.
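As an illustration of a control placed in the execution path, the sketch below shows a pre-LLM check that blocks trivially short prompts and calls that would exceed a per-user token budget, and records every decision as an event. This is an assumed design, not Arthur's guardrail API: the class name, the thresholds, and the event fields are all hypothetical, and an in-memory list stands in for a real telemetry pipeline.

```python
from dataclasses import dataclass, field

@dataclass
class BudgetGuardrail:
    """Hypothetical pre-LLM guardrail: budget and triviality checks."""
    daily_token_budget: int
    min_prompt_chars: int = 8
    used: dict = field(default_factory=dict)    # tokens consumed per user
    events: list = field(default_factory=list)  # stand-in for emitted telemetry

    def check(self, user_id, prompt, estimated_tokens):
        """Return True if the call may proceed; record the decision either way."""
        if len(prompt.strip()) < self.min_prompt_chars:
            return self._record(user_id, "blocked", "trivial_prompt")
        spent = self.used.get(user_id, 0)
        if spent + estimated_tokens > self.daily_token_budget:
            return self._record(user_id, "blocked", "budget_exceeded")
        self.used[user_id] = spent + estimated_tokens
        return self._record(user_id, "allowed", None)

    def _record(self, user_id, decision, reason):
        self.events.append({"user_id": user_id, "decision": decision, "reason": reason})
        return decision == "allowed"

if __name__ == "__main__":
    guard = BudgetGuardrail(daily_token_budget=1000)
    print(guard.check("u1", "Summarize this contract", 600))
    print(guard.check("u1", "Summarize another contract", 600))
    print(guard.check("u1", "hi", 10))
    print(guard.events)
```

Because the check runs before the model call, a blocked request costs nothing, and the recorded events show how often each rule fires so the thresholds can be tuned rather than guessed.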

Continuous evaluation and ROI measurement. Continuous evals run against production traffic to confirm an initiative is actually performing, not just running. Tied to the metadata that connects spend to users and outcomes, this lets leaders defend what works and retire what doesn't, which is the heart of governance in Arthur's Agent Development Lifecycle (ADLC).
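One simple way to connect spend to outcomes is to compute, per initiative, the eval pass rate and the cost per passing trace. The sketch below assumes each production trace already carries an application name, a cost, and an eval verdict. The field names (app, cost_usd, eval_passed), the application names, and the sample numbers are hypothetical.

```python
def summarize(traces):
    """Per-application pass rate and cost per passing trace."""
    by_app = {}
    for trace in traces:
        stats = by_app.setdefault(trace["app"], {"cost": 0.0, "passed": 0, "total": 0})
        stats["cost"] += trace["cost_usd"]
        stats["total"] += 1
        stats["passed"] += int(trace["eval_passed"])
    report = {}
    for app, stats in by_app.items():
        report[app] = {
            "pass_rate": stats["passed"] / stats["total"],
            # With no passing traces, the spend bought nothing measurable.
            "cost_per_pass": (stats["cost"] / stats["passed"]
                              if stats["passed"] else float("inf")),
        }
    return report

# Illustrative traces only; not real production data.
TRACES = (
    [{"app": "triage", "cost_usd": 0.02, "eval_passed": True}] * 3
    + [{"app": "triage", "cost_usd": 0.02, "eval_passed": False}]
    + [{"app": "drafts", "cost_usd": 0.10, "eval_passed": False}] * 2
)

if __name__ == "__main__":
    for app, row in summarize(TRACES).items():
        print(app, row)
```

An initiative with a healthy pass rate and a low cost per pass is one to defend. One with a high or unbounded cost per pass is a candidate to fix or retire.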

From Sticker Shock to Strategic Spend

The pullback making headlines this week is a correction, not a collapse. Enterprise AI is moving from its experimental phase into its mature one, where spend has to be defensible.

Companies that install observability and governance now won't have to choose between spending recklessly and shutting everything down. They'll have a third option: spend deliberately, measure continuously, and let the evidence guide every dollar. Arthur turns AI from an unbounded line item into a measured, defensible investment.

See Your AI Spend Clearly

If you can't yet attribute your AI costs to the models, agents, and users driving them, that's the place to start. Book a demo with an Arthur AI expert to see how observability, evals, and guardrails give you visibility into your AI spend and the controls to act on it.
