AI for financial services: what compliance actually approves.
Most AI demos for financial firms die in the first compliance review. Not because the tech is bad, but because the architecture is wrong for the audit. Here are the architecture patterns we use when scoping AI for trading desks, asset managers, and regulated brokerages, and why they pass.
What compliance is actually checking.
Risk officers are not engineers. They want answers to a small set of questions:
- Where does the data go? If any sensitive data leaves the firm's tenancy, the conversation stops.
- Can every action be audited? Specifically: every model input, every model output, every retrieval, every tool call, with timestamps and identity.
- Is the model deterministic enough to evaluate? Or is it a roulette wheel they can never sign off on?
- How fast can we kill it? Kill switch granularity matters. "Turn off the whole platform" is bad. "Turn off this tool for this user" is good.
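For concreteness, here's a minimal sketch of that audit trail and that kill-switch granularity in one place. Every name in it (KILL_SWITCHES, AuditEvent, the scope strings) is an illustrative assumption, not a prescribed design:

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Hypothetical flag store: keys are (feature, scope) pairs.
# A real deployment would back this with a database or config service.
KILL_SWITCHES: dict[tuple[str, str], bool] = {
    ("position_lookup", "user:jdoe"): False,   # this tool, this user: off
    ("position_lookup", "*"): True,            # this tool, everyone else: on
}

def tool_enabled(feature: str, user_id: str) -> bool:
    """Check the narrowest kill switch first, then fall back to the global one."""
    for scope in (f"user:{user_id}", "*"):
        if (feature, scope) in KILL_SWITCHES:
            return KILL_SWITCHES[(feature, scope)]
    return False  # unknown tools default to off

@dataclass
class AuditEvent:
    """One record per model input, model output, retrieval, or tool call."""
    timestamp: str      # UTC, ISO 8601
    actor: str          # authenticated identity, never inferred by the model
    event_type: str     # "model_input" | "model_output" | "retrieval" | "tool_call"
    payload: str

def audit(actor: str, event_type: str, payload: str) -> AuditEvent:
    return AuditEvent(datetime.now(timezone.utc).isoformat(), actor, event_type, payload)
```

The point is the lookup order: the narrowest scope wins, so "this tool for this user" can be off while everything else stays up.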
The patterns that pass.
1. RAG over fine-tuning, every time.
A grounded retrieval pipeline with citations passes review faster than any fine-tuned model. The reason is auditability: you can show the source of every claim. A fine-tune can't; the knowledge is baked into weights with no provenance trail.
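As a sketch of what "citations or nothing" looks like in code, assuming a hypothetical retrieve/generate pair (neither name comes from a real SDK):

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Citation:
    doc_id: str    # stable identifier in the firm's document store
    excerpt: str   # the retrieved passage, logged verbatim

@dataclass
class GroundedAnswer:
    text: str
    citations: list[Citation]

def answer(question: str,
           retrieve: Callable[[str], list[dict]],
           generate: Callable[[str, list[dict]], str]) -> GroundedAnswer:
    """No passages, no answer: an ungrounded reply never reaches the user."""
    passages = retrieve(question)           # each logged as a "retrieval" event
    if not passages:
        return GroundedAnswer("No grounded answer available.", [])
    text = generate(question, passages)     # logged as model input and output
    return GroundedAnswer(text, [Citation(p["doc_id"], p["text"]) for p in passages])
```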
2. Read-only by default.
AI tools can read every position on the desk; none can modify them. Writes require a human in the loop, scoped to the analyst's session, and logged. This is the boundary that gets a build through review.
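A minimal dispatch gate along those lines, with hypothetical tool names and a stubbed execution layer; the shape matters more than the names:

```python
READ_TOOLS = {"get_positions", "get_orders", "get_pnl"}   # model may call freely
WRITE_TOOLS = {"amend_order"}                             # model may only propose

def run(tool: str, args: dict, approved_by: str | None = None) -> dict:
    """Stub for the real execution layer; every call is also audit-logged."""
    return {"tool": tool, "args": args, "approved_by": approved_by}

def dispatch(tool: str, args: dict, request_approval) -> dict:
    """request_approval is the human-in-the-loop hook, scoped to the
    analyst's session; it returns (granted: bool, approver: str)."""
    if tool in READ_TOOLS:
        return run(tool, args)
    if tool in WRITE_TOOLS:
        granted, approver = request_approval(tool, args)
        if granted:
            return run(tool, args, approved_by=approver)
        return {"status": "rejected", "by": approver}
    raise PermissionError(f"unknown tool: {tool}")  # default deny
```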
3. Single-region, single-tenancy.
If the firm is in EU/UK, infrastructure stays there. Even sub-processor relationships matter. Document them.
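One way to make that documentation reviewable is to keep it as a machine-checked manifest. A sketch, with placeholder entries:

```python
# Hypothetical deployment manifest: the answers compliance asks for, in one
# reviewable place. Entries and field names are placeholders.
DEPLOYMENT = {
    "region": "eu-west-2",      # all compute and storage pinned here
    "tenancy": "single",        # no infrastructure shared across firms
    "sub_processors": [
        {"name": "<model provider>", "data": "prompts, completions", "region": "eu-west-2"},
        {"name": "<log storage>",    "data": "audit events",         "region": "eu-west-2"},
    ],
}

# A one-line check that no sub-processor quietly leaves the firm's region.
assert all(sp["region"] == DEPLOYMENT["region"] for sp in DEPLOYMENT["sub_processors"])
```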
4. Customer-managed audit storage.
Audit logs write to the firm's S3, not yours. They control retention, discovery, deletion. You're the platform; they're the system of record.
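With AWS as an example, the write path can assume a role the firm grants into their bucket, so retention and deletion stay on their side. A sketch using boto3; the role ARN, bucket, and key layout are assumptions:

```python
import json
import boto3

def audit_writer(customer_role_arn: str, customer_bucket: str):
    """Write audit events into the firm's bucket via a role they grant us.
    They own retention, discovery, and deletion; we only hold put access."""
    creds = boto3.client("sts").assume_role(
        RoleArn=customer_role_arn, RoleSessionName="audit-writer"
    )["Credentials"]
    s3 = boto3.client(
        "s3",
        aws_access_key_id=creds["AccessKeyId"],
        aws_secret_access_key=creds["SecretAccessKey"],
        aws_session_token=creds["SessionToken"],
    )
    def write(event: dict) -> None:
        key = f"audit/{event['timestamp']}-{event['event_type']}.json"
        s3.put_object(Bucket=customer_bucket, Key=key, Body=json.dumps(event))
    return write
```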
5. Model selection in the firm's hands.
Many firms have approved-model lists. Let them pick: Anthropic, Azure OpenAI, AWS Bedrock, self-hosted. Don't lock the model.
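In code, "don't lock the model" reduces to a registry that the firm's approved list keys into. A sketch; the Protocol, registry, and config keys are all illustrative:

```python
from typing import Callable, Protocol

class ChatModel(Protocol):
    def complete(self, prompt: str) -> str: ...

# Each supported provider registers a factory under its name;
# real entries would wrap the vendor SDKs.
MODEL_REGISTRY: dict[str, Callable[[], ChatModel]] = {}

def register(name: str):
    def wrap(factory: Callable[[], ChatModel]) -> Callable[[], ChatModel]:
        MODEL_REGISTRY[name] = factory
        return factory
    return wrap

def model_for(firm_config: dict) -> ChatModel:
    """The firm picks from its own approved-model list; nothing is hard-coded."""
    name = firm_config["approved_model"]
    if name not in MODEL_REGISTRY:
        raise ValueError(f"{name} is not on this deployment's supported list")
    return MODEL_REGISTRY[name]()
```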
The patterns that don't pass.
- Calling external APIs without enumeration. Compliance wants the full list of outbound calls and their data classifications. "Various LLM APIs" is a non-answer. (See the sketch after this list.)
- "The model handles authentication." No, it doesn't. Auth lives outside the LLM, always.
- Prompt-injection-as-a-feature. Letting users stuff arbitrary text into the system prompt is a vulnerability. Constrain the surface.
- Black-box vendor pipelines. If you can't show what happens at each step, the answer is no.
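The enumeration fix for that first item is boring on purpose: an explicit allowlist of outbound hosts with data classifications, checked before anything leaves the box. A sketch with placeholder entries:

```python
# Hypothetical egress manifest: every outbound call the platform can make,
# enumerated with its data classification. Hosts here are placeholders.
EGRESS_ALLOWLIST = {
    "https://api.anthropic.com": {"data": "prompts, completions", "classification": "confidential"},
    "https://<firm-audit-bucket>.s3.eu-west-2.amazonaws.com": {"data": "audit events", "classification": "restricted"},
}

def check_egress(url: str) -> None:
    """Called before any outbound request; anything off the list is refused."""
    if not any(url.startswith(host) for host in EGRESS_ALLOWLIST):
        raise PermissionError(f"outbound call to {url} is not on the enumerated list")
```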
The language that helps.
Talk to risk in their words. "Tenant-scoped." "Inference logging to customer-managed storage." "Model-agnostic orchestration." "Read-default with explicit write authorization." "Per-feature kill switches."
Not "we use the latest LLM." Not "it's secure." Not "AI-native." Those phrases are bingo squares for what doesn't pass review.
The shortest version.
Auditability beats accuracy. Tenant isolation beats throughput. Constrain the tool surface. Read by default. Customer-managed audit storage. Bring your own model. Most pilots fail compliance because they optimize for the wrong axis. The firms aren't asking "is this smart?" They're asking "if something goes wrong, can I find it, prove it, and stop it?" Architect for that.
If you're shipping AI into a regulated buyer, file an intent — we'll talk about the architecture before we talk about the build.