The AI token shortage begins

The subsidised era of AI is ending. The flat monthly fee that let your staff run unlimited queries through an AI assistant was never a sustainable business model. It was a land-grab, and the land has now been grabbed. Usage-based billing is arriving across the major platforms, and for a financial advice firm that has quietly woven AI tools into daily operations, the implications are worth understanding now rather than at the next invoice.

What is actually changing, and why it matters to your firm

The shift is structural. AI providers are moving from flat-rate subscriptions, where heavy users were effectively subsidised by light ones, to token-based billing, where you pay per million tokens consumed. ^[1] Google cut its Ultra tier from $250 to $200 per month in May 2026, but added usage-based billing for heavy consumption. ^[1] GitHub Copilot made a similar move, triggering real dissatisfaction among developers who had budgeted on the old model. ^[2]

The driver is not greed. It is compute. The volume of tokens being processed has grown nearly sevenfold, from roughly 480 trillion to 3.2 quadrillion monthly. ^[1] The infrastructure cost of running frontier models at that scale cannot be absorbed indefinitely into a flat fee. Something had to give, and what is giving is the assumption that AI usage is a fixed overhead.

For a financial advice firm, this matters in two connected ways. First, your AI costs are about to become variable, and possibly opaque, unless you build governance around usage now. Second, the tools your staff use most heavily, including AI drafting assistants, meeting summarisers, and research tools, may have hidden token footprints that are larger than you expect.

The firms that build usage governance in 2026, before token costs become a surprise on the P&L, are in a materially better position than those that wait.

What a token actually costs, and where the cost lands

A token is roughly three-quarters of a word. Processing a long client document through a frontier model like GPT-4o or Claude Opus costs in the region of $5 per million input tokens and $25 per million output tokens. ^[3] That sounds trivial until you consider that a request which fills a million-token context window, useful for analysing a complete client portfolio or a lengthy regulatory document, costs around $5 in input tokens alone at those rates, ^[4] and your firm may be running dozens of those a day without anyone tracking the total.
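To make that arithmetic concrete, here is a minimal sketch in Python. The prices are the frontier-tier figures quoted above; the request size and the daily volume are assumptions I have picked for illustration, not measurements from any firm.

```python
# Prices in USD per million tokens, taken from the frontier-tier figures above.
INPUT_PRICE_PER_M = 5.00
OUTPUT_PRICE_PER_M = 25.00


def request_cost(input_tokens: int, output_tokens: int,
                 input_price: float = INPUT_PRICE_PER_M,
                 output_price: float = OUTPUT_PRICE_PER_M) -> float:
    """Return the USD cost of one request at per-million-token prices."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Assumption: one request fills a million-token context and returns 2,000 tokens.
per_request = request_cost(1_000_000, 2_000)

# Assumption: 30 such requests a day over 21 working days.
monthly = per_request * 30 * 21

print(f"per request: ${per_request:.2f}, per month: ${monthly:,.2f}")
```

Under those assumptions a single request costs just over $5 and the month runs to a little over $3,000. Change the volume to match your own usage data and the total moves in direct proportion.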

Newer, smaller models present a different picture. Some are now priced at $0.50 per million input tokens and $2.50 per million output tokens, ^[3] and independent benchmarks suggest certain models in this tier deliver 80 to 90% of frontier model capability at around 10% of the cost. ^[3] I am not suggesting you route your most sensitive work through the cheapest available model. I am suggesting that a default of always using the most powerful, most expensive model for every task, including routine ones, is a budget decision worth revisiting with evidence in hand rather than habit.
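As a rough illustration of what that budget decision looks like in numbers, the sketch below prices the same monthly workload two ways: everything on the frontier tier, or routine work on the smaller tier. The two price points come from the figures above. The workload split and the token volumes are hypothetical, and the sketch says nothing about output quality, which you would need to test separately.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    input_price: float   # USD per million input tokens
    output_price: float  # USD per million output tokens

    def cost(self, input_tokens: int, output_tokens: int) -> float:
        return (input_tokens * self.input_price
                + output_tokens * self.output_price) / 1_000_000


FRONTIER = Tier("frontier", 5.00, 25.00)
SMALLER = Tier("smaller", 0.50, 2.50)

# Hypothetical monthly workload: (input tokens, output tokens) per kind of task.
WORKLOAD = {
    "routine": (40_000_000, 8_000_000),
    "sensitive": (10_000_000, 2_000_000),
}


def monthly_cost(routing: dict[str, Tier]) -> float:
    """Total monthly cost when each kind of task is sent to the given tier."""
    return sum(routing[task].cost(i, o) for task, (i, o) in WORKLOAD.items())


all_frontier = monthly_cost({"routine": FRONTIER, "sensitive": FRONTIER})
mixed = monthly_cost({"routine": SMALLER, "sensitive": FRONTIER})

print(f"everything on frontier: ${all_frontier:,.2f}")
print(f"routine on smaller tier: ${mixed:,.2f}")
```

With these made-up volumes the bill falls from $500 to $140 a month, and the sensitive work never leaves the frontier model. The point is the shape of the calculation, not the specific totals.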

How token-based pricing intersects with your regulatory position

If your firm is processing client data through AI tools, and the billing model for those tools is now usage-based, you have a new set of questions to answer for governance purposes. Which tools are processing which data? Who authorised that use? Is the volume of data flowing through each system being monitored? These are not abstract audit questions. They are the kinds of questions a Senior Manager under the Senior Managers and Certification Regime (SMCR) could be expected to answer if something goes wrong.

Usage-based billing, counterintuitively, can support your governance efforts. A flat fee gives you no visibility into what is actually being processed. A per-token bill, if you read it carefully, is a rough audit trail of AI activity across the firm. That is useful information if you are trying to build a map of your AI use, which is increasingly something firms need to do.
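One way to read a bill as an audit trail is to total it by tool and compare the result with the list of tools the firm has approved. The sketch below assumes a CSV export with `tool`, `user`, `input_tokens` and `output_tokens` columns. That layout, the tool names and the approved list are all hypothetical; real vendor exports differ and would need mapping first.

```python
import csv
import io
from collections import defaultdict

# Hypothetical bill export; real exports will have different columns.
BILL_CSV = """tool,user,input_tokens,output_tokens
drafting-assistant,adviser_a,1200000,300000
meeting-summariser,adviser_b,800000,100000
drafting-assistant,paraplanner_c,400000,90000
research-plugin,adviser_a,2500000,50000
"""

# Hypothetical list of tools the firm has formally approved.
APPROVED_TOOLS = {"drafting-assistant", "meeting-summariser"}


def summarise(bill_text: str) -> dict:
    """Total tokens and distinct users per tool from a CSV bill export."""
    totals = defaultdict(lambda: {"tokens": 0, "users": set()})
    for row in csv.DictReader(io.StringIO(bill_text)):
        entry = totals[row["tool"]]
        entry["tokens"] += int(row["input_tokens"]) + int(row["output_tokens"])
        entry["users"].add(row["user"])
    return dict(totals)


summary = summarise(BILL_CSV)
unapproved = sorted(set(summary) - APPROVED_TOOLS)

for tool, entry in sorted(summary.items()):
    flag = "  <-- not on approved list" if tool in unapproved else ""
    print(f"{tool}: {entry['tokens']:,} tokens, {len(entry['users'])} user(s){flag}")
```

In this invented example the largest token consumer is a tool nobody approved, which is exactly the kind of finding a flat fee would have hidden.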

What to do before the billing model changes under your feet

First, map what you are actually running. List every AI tool in use across the firm, including the ones staff found for themselves and put on expenses rather than requesting through IT. Note whether each is on a flat-rate or usage-based model, and whether that is likely to change.

Second, estimate your token footprint. Most AI platforms now provide usage dashboards. Pull the numbers for the last 30 days. If you do not have access to that data, that is itself a governance gap worth closing.

Third, assess whether you are using the right model for each task. Routing a routine email draft through a frontier model when a smaller, cheaper model would do the job adequately is a cost decision you should make deliberately rather than by default. It is also, in some cases, a data minimisation question: sending more data than a task needs to a powerful external model is not always the right call for a regulated firm.

Fourth, treat vendor viability as a due diligence question. Several application-layer AI vendors are under margin pressure as compute costs rise. A tool your operations team relies on today may pivot upmarket, raise prices sharply, or close. Document your dependencies and have a contingency position for the tools that would hurt most to lose suddenly.
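The four steps above can be sketched as a simple inventory with a gap check. Everything here is illustrative: the `AITool` fields, the tool names, the figures and the routing rule are assumptions of mine, and a real register would carry far more detail, such as data categories and the person who authorised each tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AITool:
    name: str
    billing: str                        # "flat" or "usage" (step one)
    approved_by_it: bool                # False for tools staff expensed themselves
    tokens_last_30_days: Optional[int]  # None when no usage data exists (step two)
    critical: bool                      # would losing it suddenly hurt? (step four)
    has_contingency: bool


# Hypothetical inventory for illustration only.
INVENTORY = [
    AITool("drafting-assistant", "usage", True, 42_000_000, True, False),
    AITool("meeting-summariser", "flat", True, None, False, False),
    AITool("research-plugin", "usage", False, 9_500_000, False, False),
]


def governance_gaps(tool: AITool) -> list[str]:
    """List the gaps this sketch knows how to detect for one tool."""
    gaps = []
    if not tool.approved_by_it:
        gaps.append("not approved by IT")
    if tool.tokens_last_30_days is None:
        gaps.append("no 30-day usage data")
    if tool.critical and not tool.has_contingency:
        gaps.append("critical with no contingency")
    return gaps


# Step three: a hypothetical routing rule that keeps routine work off the frontier tier.
ROUTINE_TASKS = {"email_draft", "meeting_summary"}


def choose_tier(task_type: str) -> str:
    return "smaller" if task_type in ROUTINE_TASKS else "frontier"


report = {tool.name: governance_gaps(tool) for tool in INVENTORY}
for name, gaps in report.items():
    print(f"{name}: {', '.join(gaps) if gaps else 'no gaps found'}")
```

A spreadsheet would do the same job. What matters is that each tool has an owner, a billing model, a usage figure and a fallback written down somewhere.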

The model-agnostic argument, and why it applies here

One framing I find useful for regulated firms: the goal is not to be committed to a specific AI vendor. It is to control what the AI is allowed to do and see, regardless of which model sits underneath. Chamath Palihapitiya put it plainly in a May 2026 discussion: “controlling the tokens is controlling the leverage.” ^[5] For a financial advice firm, that means the grounding layer, the constraints on what the model can access and output, matters more than which frontier model you use. If you have built your AI workflows on top of a specific vendor’s product without an abstraction layer, switching when pricing shifts is painful. If you have built around your own data and process constraints first, the underlying model is a procurement decision, not an operational dependency.

This is the same principle that makes grounded AI, where the model is constrained to respond only from firm-owned knowledge bases, the more defensible choice for regulated use cases. It keeps hallucination rates low, keeps data handling auditable, and makes the firm’s knowledge, not the vendor’s model, the thing that matters.
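In code, the abstraction layer and the grounding constraint might look something like the sketch below. The class names, the redaction rule and the stub backend are hypothetical. A real backend would wrap a vendor's client library, and real redaction needs far more than one regular expression. The sketch only shows where the firm's controls sit relative to the model.

```python
import re
from typing import Protocol


class ModelBackend(Protocol):
    """Anything that can turn a prompt into text; one adapter per vendor."""
    def complete(self, prompt: str) -> str: ...


class StubBackend:
    """Stand-in for a vendor client so the sketch runs without network access."""
    def __init__(self, label: str) -> None:
        self.label = label
        self.last_prompt = ""

    def complete(self, prompt: str) -> str:
        self.last_prompt = prompt
        return f"[{self.label}] answered from {len(prompt)} characters of grounded prompt"


def redact_account_numbers(text: str) -> str:
    # Assumption: account numbers are standalone 8-digit strings.
    return re.sub(r"\b\d{8}\b", "[REDACTED]", text)


class GovernedAssistant:
    """The firm's controls live here; the model underneath is swappable."""
    def __init__(self, backend: ModelBackend, knowledge_base: dict[str, str]) -> None:
        self.backend = backend
        self.knowledge_base = knowledge_base

    def ask(self, question: str, document_id: str) -> str:
        if document_id not in self.knowledge_base:
            raise PermissionError(f"{document_id!r} is not in the firm's knowledge base")
        source = redact_account_numbers(self.knowledge_base[document_id])
        prompt = f"Answer only from this text:\n{source}\n\nQuestion: {question}"
        return self.backend.complete(prompt)


KNOWLEDGE_BASE = {"fees-policy": "Account 12345678 is charged an annual review fee."}

vendor_a = StubBackend("vendor-a")
assistant = GovernedAssistant(vendor_a, KNOWLEDGE_BASE)
first = assistant.ask("What fee applies?", "fees-policy")

# Switching vendor is a one-line change; the constraints stay where they were.
assistant.backend = StubBackend("vendor-b")
second = assistant.ask("What fee applies?", "fees-policy")
```

The design choice is that the access check and the redaction run before any vendor sees the text, so changing the backend does not change what the model is allowed to see.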

The shift to token-based billing is not a crisis, but it is a prompt. The firms that treat AI as a managed cost with governance around it, rather than a flat overhead that runs in the background, are going to find the next phase of this market significantly easier to navigate. If you want to think through what that looks like for your firm specifically, a conversation with Cordrey Consulting is a reasonable place to start.

This article is for informational purposes only and does not constitute regulated financial advice or a compliance opinion. Consult a qualified compliance professional for advice specific to your firm.

Sources

^[1] Google, Google One AI Premium pricing update, May 2026. Supports the price reduction and usage-based billing shift for heavy users. Vendor-sourced.
^[2] GitHub, GitHub Copilot billing model announcement, May 2026. Supports the transition to token-based billing and developer response. Vendor-sourced.
^[3] Anthropic / OpenAI, Published API pricing pages, May 2026. Supports the $0.50/$2.50 vs $5/$25 per million token figures and the 80 to 90% capability at 10% cost claim. Vendor-sourced.
^[4] Various, AI provider context window documentation, 2026. Supports the dollars-per-request cost of million-token windows.
^[5] Chamath Palihapitiya, All-In Podcast, May 2026. Attribution of the “controlling the tokens” framing.
