In June 2026, the US government restricted access to Anthropic’s Fable model[1], a system independently assessed as delivering superior performance on the high-complexity analytical tasks that financial services and professional services firms rely on most[2]. Anthropic’s own position was that the identified vulnerability was comparable to capabilities present in other models already in circulation[3]. That dispute is almost beside the point for a firm that had built workflows around Fable. The workflows stopped working. There was no notice, no transition period, and no appeal process that helped in the short term.
This is no longer a once-in-a-decade edge case. It is a structural risk of cloud-dependent AI.
Why cloud-only AI has become a continuity risk
Cloud-based AI means your workflows depend on a provider’s infrastructure, a government’s export controls, and a commercial relationship that can change on terms you do not set.
Three separate developments in 2026 make this clearer than it was a year ago. First, the Fable intervention showed that regulators can intervene in deployed commercial AI without notice[3]. Second, Apple’s Siri AI features have been withheld from EU markets under Digital Markets Act compliance requirements, demonstrating that regulatory fragmentation can create a two-speed AI landscape in which the same tool is available in one jurisdiction and unavailable in another. Third, AI startups at the application layer are under significant margin pressure, so the vendors behind the tools that regulated firms rely on face a realistic risk of pivoting, repricing, or failing within twelve to eighteen months.
For a firm with ten to two hundred advisers, any one of these events could interrupt a process that clients or regulators depend on.
When a government can remove a model from commercial deployment overnight, “we use cloud AI” is no longer a complete answer to your operational resilience question.
What local AI deployment actually means
Local deployment sits on a spectrum. It does not necessarily mean a server rack in your office. A useful way to think about it is a four-level strategy[4]:
Level 1 (Routing). You use a cloud model for general tasks but route sensitive or regulated work to a different system. No local infrastructure, but reduced exposure for your highest-risk data.
Level 2 (Private cloud). Your AI runs in a cloud environment your firm controls, segregated from shared infrastructure. The provider cannot retrain on your data, and a model-level intervention by a third-party government is less likely to affect you directly.
Level 3 (Self-hosted cloud). You run an open model on infrastructure you rent but fully control. Meta’s Llama series is the most widely used option here, specifically because it gives engineers full control over model weights and deployment[5]. You are responsible for updates and security, but no external decision about a model’s availability touches your workflows.
Level 4 (Fully local). The model runs on hardware you own, on your premises or in a data centre under your direct control. Maximum data sovereignty, maximum operational overhead.
Most financial services firms in the ten-to-two-hundred-adviser range will find the right answer sits at Level 2 or Level 3, not Level 4. Full local deployment requires specialised hardware investment and ongoing engineering support that is hard to justify unless you have unusual data sensitivity requirements or are already running a substantial internal technology function.
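To make Level 1 concrete, the routing rule can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the sensitivity categories, the Task type, and the two backend labels are all hypothetical names introduced for this example, and a real deployment would replace the labels with whatever endpoints your firm actually operates.

```python
from dataclasses import dataclass
from enum import Enum


class Sensitivity(Enum):
    LOW = "low"    # e.g. internal meeting notes
    HIGH = "high"  # e.g. client data, suitability work


@dataclass(frozen=True)
class Task:
    name: str
    sensitivity: Sensitivity


# Hypothetical backend labels (assumptions for illustration only).
CLOUD_BACKEND = "general-cloud-model"
CONTROLLED_BACKEND = "firm-controlled-model"


def route(task: Task) -> str:
    """Return the backend label a task should be sent to.

    Only tasks explicitly marked LOW go to the general cloud model;
    everything else defaults to the firm-controlled system.
    """
    if task.sensitivity is Sensitivity.LOW:
        return CLOUD_BACKEND
    return CONTROLLED_BACKEND


if __name__ == "__main__":
    for task in (
        Task("internal meeting notes", Sensitivity.LOW),
        Task("suitability assessment support", Sensitivity.HIGH),
    ):
        print(task.name, "->", route(task))
```

The design choice worth noting is the default: anything not explicitly classified as low-sensitivity goes to the controlled system, so a classification mistake errs toward less exposure.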
What you need to get this right
Moving toward local or sovereign AI is not a simple procurement decision. There are five layers to get right[6]:
Hardware. Running a capable model locally requires a GPU with sufficient VRAM. Consumer-grade laptops will not run a model with the performance of a frontier cloud system. For a private cloud or self-hosted approach, you are typically looking at cloud GPU instances rather than physical hardware, which keeps the cost manageable.
Model choice. Open models vary significantly in performance for financial services tasks. Llama 3 series models are well-suited to customisation and local deployment[5]. Evaluate on your actual tasks, not benchmark scores.
Serving layer. The software that sits between the model and your applications. Tools like LM Studio handle this for smaller deployments and can be accessed from mobile devices via a private mesh VPN[7], which is worth knowing if your advisers work across locations.
Agent harness. If you are running any agentic or multi-step workflows (research summaries, document drafting, data extraction), you need an orchestration layer. This is where most implementations get complicated, and where human oversight checkpoints need to be deliberately designed in. Any agentic workflow in a regulated context requires a defined human review step before output enters a client-facing process.
Interface. How your advisers and staff actually interact with the system. This is often the layer that gets the least attention and causes the most adoption problems.
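For the hardware layer, a rough sizing calculation helps explain why a consumer laptop falls short. The sketch below estimates the memory needed to hold a model’s weights as parameter count multiplied by bytes per weight. The 20% overhead allowance for runtime memory is an assumption chosen for illustration, not a measured figure, and the function name is hypothetical. Treat the output as a first-pass estimate to check against real testing.

```python
def estimate_vram_gb(
    params_billions: float,
    bits_per_weight: int,
    overhead: float = 0.2,
) -> float:
    """Rough VRAM estimate in gigabytes for serving a model.

    params_billions: model size in billions of parameters.
    bits_per_weight: 16 for half precision, 8 or 4 for quantised weights.
    overhead: assumed fractional allowance for runtime memory such as
        the attention cache (an illustrative assumption, not a measurement).
    """
    # One billion parameters at one byte each is roughly one gigabyte.
    weights_gb = params_billions * bits_per_weight / 8
    return round(weights_gb * (1 + overhead), 1)


if __name__ == "__main__":
    # Illustrative inputs: an 8-billion and a 70-billion parameter model.
    for size, bits in ((8, 16), (8, 4), (70, 16), (70, 4)):
        print(f"{size}B at {bits}-bit: ~{estimate_vram_gb(size, bits)} GB")
```

Under these assumptions, a 70-billion parameter model at 16-bit precision needs well over 100 GB, which is why the private cloud and self-hosted routes typically rely on rented GPU instances rather than office hardware.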
The data governance question you need to answer first
Before you decide where to run your AI, you need a clear map of what data it will process. This matters for two reasons.
First, GDPR obligations for personal data depend on where processing takes place, under what contractual arrangements, and how the system is configured and used. Hosting a model on your own infrastructure does not automatically satisfy those requirements. Whether local deployment supports your GDPR position depends on factors including the jurisdiction of the infrastructure, the adequacy status of that jurisdiction, your data processing agreements, and how the system is actually operated. Running a model on infrastructure in a jurisdiction without an adequacy decision or appropriate safeguards does not resolve those obligations simply by virtue of being self-hosted.
Second, the case for local deployment is strongest when the AI is processing client data, internal financial models, or anything that you would not want to send to a third-party server under any circumstances. For genuinely sensitive work, local or private-cloud deployment removes a category of risk entirely.
What to do if your firm currently runs cloud-only AI
This does not require an immediate infrastructure overhaul. It requires an honest assessment.
First, map your AI dependencies. List every AI tool or workflow your firm currently uses and ask: what happens to this process if that service becomes unavailable tomorrow? How long before a client or a regulatory obligation is affected? This does not need to be a formal project, but the answers need to be candid.
Second, separate high-risk from low-risk workflows. Not everything carries the same exposure. A tool that drafts internal meeting notes is different from a tool that supports your suitability assessment process. Focus your continuity planning on the second category.
Third, identify your sovereign AI options. For most regulated SMEs, this means evaluating private cloud hosting for your highest-sensitivity workloads, and testing an open model (Llama 3 is a reasonable starting point[5]) against the tasks that matter most. This is a weeks-long exercise, not a multi-month programme.
Fourth, build the human review step explicitly. Whatever architecture you choose, the governance principle does not change: AI outputs in a regulated process require human review before they influence a client decision or a regulated document. The architecture makes that easier or harder to enforce. Make it easier.
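The first two steps above can be captured in something as simple as a structured list. The following Python sketch is one possible shape for that inventory, offered as an assumption rather than a prescribed method: the field names, the example workflows, and the vendor labels are all hypothetical. It filters to workflows that touch a regulated process, then ranks them so that those with no fallback and the shortest time to impact come first.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Dependency:
    workflow: str
    provider: str
    regulated: bool             # touches a client decision or regulated document
    hours_until_impact: int     # time before a client or obligation is affected
    fallback: Optional[str]     # None means no alternative exists today


def continuity_priorities(deps: List[Dependency]) -> List[Dependency]:
    """Return regulated dependencies, most urgent first.

    Workflows with no fallback sort ahead of those with one;
    within each group, shorter time to impact sorts first.
    """
    exposed = [d for d in deps if d.regulated]
    return sorted(
        exposed,
        key=lambda d: (d.fallback is not None, d.hours_until_impact),
    )


if __name__ == "__main__":
    # Hypothetical inventory for illustration only.
    inventory = [
        Dependency("meeting notes", "vendor-a", False, 72, None),
        Dependency("suitability assessment support", "vendor-b", True, 4, None),
        Dependency("client report drafting", "vendor-c", True, 2, "manual template"),
        Dependency("data extraction", "vendor-b", True, 24, None),
    ]
    for dep in continuity_priorities(inventory):
        print(dep.workflow, "| fallback:", dep.fallback or "none")
```

A spreadsheet with the same columns would serve equally well. The point is that each workflow has a named provider, a time to impact, and a stated fallback before anything is taken away.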
The financial services firms that will handle the next Fable-style intervention well are not necessarily the ones with the most sophisticated local AI. They are the ones that knew what they depended on before it was taken away.
If this is a conversation worth having for your firm, a discovery call with Cordrey Consulting is a straightforward starting point.
This article reflects the EU AI Act as understood at the date of publication. Implementation timelines have been subject to amendment. Verify current requirements against primary EU sources and take qualified legal advice for your specific circumstances.
This article is for informational purposes only and does not constitute regulated financial advice or a compliance opinion. Consult a qualified compliance professional for advice specific to your firm.
This article does not constitute legal advice. Data protection obligations vary by circumstance and jurisdiction. Consult a qualified solicitor or data protection adviser for advice specific to your firm.
This article does not constitute legal or regulatory advice. DORA obligations apply to regulated financial entities and their ICT third-party providers. Consult a qualified adviser for your firm’s specific requirements.
Sources
[1] Anthropic, ‘Claude Fable 5 and Mythos 5: Access and availability’, Anthropic, June 2026. Available at: https://www.anthropic.com/news/fable-mythos-access
[2] Hebbia (2026) ‘Claude Fable 5 performance in finance and professional services tasks’, cited in Anthropic, ‘Claude Fable 5 and Mythos 5’, Anthropic. Available at: https://www.anthropic.com/news/claude-fable-5-mythos-5
[3] Anthropic, ‘Claude Fable 5 and Mythos 5: Access and availability’, Anthropic, June 2026. Available at: https://www.anthropic.com/news/fable-mythos-access
[4] The 4-Level Deployment Strategy framework (Level 1: Routing; Level 2: Private Cloud; Level 3: Self-Hosted Cloud; Level 4: Fully Local) as outlined in AI governance literature, June 2026.
[5] van Riel, Z. (2026) ‘7 best large language models for AI engineers’, Zen van Riel, AI Engineer Blog. Available at: https://zenvanriel.com/ai-engineer-blog/7-best-large-language-models-for-ai-engineers
[6] The 5-Layer AI Stack (Hardware, Model, Serving Layer, Agent Harness, Interface) as outlined in AI infrastructure literature, June 2026.
[7] Digital Applied (2026) ‘LM Studio locally: LM Link iPhone local LLM’, Digital Applied. Available at: https://www.digitalapplied.com/blog/lm-studio-locally-lm-link-iphone-local-llm-2026