Building AI agents with guardrails for regulated industries.
In 2026 every B2B AI pitch involves "agents." Most of them fail the first compliance review. Here's what we've learned shipping voice and tool-calling agents into regulated buyers — the seven controls a risk officer actually wants, and the architectural patterns that keep agents in scope.
An "agent" in 2026 means an LLM-powered system that takes actions: calling APIs, modifying data, sending messages, executing tools. The closer that loop runs to a customer or a regulated record, the more carefully it has to be bounded. Demo-ware skips this entirely. Production cannot.
The seven controls.
1. Scope: define the agent's tool surface explicitly.
An agent is a list of tools plus a model that picks among them. Make the list short and named. "The reservation agent has access to checkAvailability, quotePrice, and holdReservation" — and nothing else. Don't give it shell access. Don't give it a generic HTTP-fetch. Constrained surfaces pass review.
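To make "short and named" concrete, here is a minimal sketch of a closed tool registry for the reservation agent above. The handler bodies and the dispatch helper are illustrative assumptions, not any framework's API:

```typescript
// The reservation agent's entire tool surface, enumerated in one place.
// A name that isn't a key here simply cannot be invoked.
type ToolHandler = (args: Record<string, unknown>) => Promise<unknown>;

const reservationTools = {
  checkAvailability: async (_args: Record<string, unknown>) => ({ available: true }), // read-only
  quotePrice:        async (_args: Record<string, unknown>) => ({ totalUsd: 120 }),   // read-only
  holdReservation:   async (_args: Record<string, unknown>) => ({ holdId: "h_123" }), // the one write
} satisfies Record<string, ToolHandler>;

// Dispatch refuses anything outside the enumerated surface.
async function dispatch(name: string, args: Record<string, unknown>): Promise<unknown> {
  const registry: Record<string, ToolHandler> = reservationTools;
  const tool: ToolHandler | undefined = registry[name];
  if (tool === undefined) throw new Error(`tool "${name}" is out of scope for this agent`);
  return tool(args);
}
```

The model can ask for anything; the runtime can only ever do these three things.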
2. Read/write asymmetry by default.
Most "agent failure" incidents come from write actions. Default every tool to read-only. Writes require an explicit, scoped, time-limited token issued per session, not held in the agent's config. The pattern: read by default, write requires a deliberate toggle the operator flips, and every flip is logged.
3. Per-tenant isolation, top to bottom.
Tool credentials, retrieval indices, prompt caches, audit logs — all per-tenant. The breach pattern in 2026 isn't model leakage; it's cross-tenant context contamination. Build isolation at the queue level, not just the API key.
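As a sketch of what "top to bottom" looks like, everything the runtime touches can be resolved through the tenant id. The stores and the naming scheme below are assumptions for illustration:

```typescript
// Every resource the agent touches is derived from the tenant id, including
// the work queue itself; two tenants never share a queue, index, or secret.
interface TenantContext {
  queueName: string;                        // dedicated queue per tenant
  retrievalIndex: string;                   // per-tenant retrieval index
  toolCredentials: Record<string, string>;  // scoped secrets, fetched per tenant
  auditBucket: string;                      // the customer's own storage (control 4)
}

// Stubbed stores so the shape is visible; real ones would be a secrets
// manager and a config service.
const secretsStore = { get: async (_path: string): Promise<Record<string, string>> => ({}) };
const configStore = { get: async (_path: string): Promise<string> => "" };

async function resolveTenant(tenantId: string): Promise<TenantContext> {
  return {
    queueName: `agent-jobs-${tenantId}`,
    retrievalIndex: `idx-${tenantId}`,
    toolCredentials: await secretsStore.get(`tenants/${tenantId}/tools`),
    auditBucket: await configStore.get(`tenants/${tenantId}/auditBucket`),
  };
}
```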
4. Audit completeness, customer-side.
Every turn — input, model, tool calls, tool responses, latency, tokens — written to the customer's storage, not yours. Enterprise buyers want the audit log in their S3 bucket, on their retention policy, under their compliance review. Holding the record yourself is a tax you'll pay forever.
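A sketch of the per-turn record landing in the customer's bucket, using the AWS SDK v3 S3 client; the record shape and key layout are our assumptions:

```typescript
import { S3Client, PutObjectCommand } from "@aws-sdk/client-s3";

// One record per turn: everything a compliance reviewer would ask for.
interface AuditRecord {
  tenantId: string;
  sessionId: string;
  turn: number;
  input: string;
  model: string;
  toolCalls: unknown[];
  toolResponses: unknown[];
  latencyMs: number;
  tokens: { in: number; out: number };
  ts: string; // ISO timestamp
}

const s3 = new S3Client({});

// The bucket is customer-owned, on their retention policy; we keep no copy.
async function writeAudit(customerBucket: string, rec: AuditRecord): Promise<void> {
  await s3.send(new PutObjectCommand({
    Bucket: customerBucket,
    Key: `agent-audit/${rec.sessionId}/${String(rec.turn).padStart(6, "0")}.json`,
    Body: JSON.stringify(rec),
    ContentType: "application/json",
  }));
}
```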
5. Guardrail layers, not a single filter.
One PII filter or content classifier is not a guardrail; it's a fig leaf. Real guardrails are layered, as sketched after this list:
- Pre-LLM: redact secrets, validate input shape, enforce auth/scope.
- Tool-call: validate every tool argument against a schema; reject anything off-spec without invoking the tool.
- Post-tool: sanity-check tool outputs before they re-enter the LLM context (rate limits, anomaly checks).
- Pre-response: refusal patterns, format enforcement, hallucination checks against retrieval.
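A minimal sketch of those four layers as one middleware chain. Using zod for the tool-argument schema is our choice here, and the stage bodies are placeholders:

```typescript
import { z } from "zod";

// A turn carries everything the guards need to see at each stage.
type Turn = {
  input: string;
  toolName?: string;
  toolArgs?: unknown;
  toolResult?: unknown;
  draft?: string;
};
type Guard = (t: Turn) => Turn; // a guard throws to block the turn

const preLlm: Guard = (t) => {
  // redact secrets, validate input shape, enforce auth/scope
  return t;
};

// Tool-call layer: every argument validated against a schema before invocation.
const holdReservationArgs = z.object({
  date: z.string(),
  partySize: z.number().int().positive().max(20),
});
const toolCall: Guard = (t) => {
  if (t.toolName === "holdReservation") holdReservationArgs.parse(t.toolArgs); // throws if off-spec
  return t;
};

const postTool: Guard = (t) => {
  // rate limits and anomaly checks on tool output before it re-enters context
  return t;
};

const preResponse: Guard = (t) => {
  // refusal patterns, format enforcement, checks against retrieval
  return t;
};

// A turn passes through all four layers or it doesn't go out at all.
const pipeline: Guard[] = [preLlm, toolCall, postTool, preResponse];
const runGuards = (t: Turn): Turn => pipeline.reduce((acc, g) => g(acc), t);
```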
6. Deterministic refusals, observable.
An agent that refuses based on model judgment is unreliable. Refusals you trust are rule-based: this user role can't invoke this tool, this dollar amount exceeds the approval threshold, this PII pattern triggers a human escalation. The model never gets to override the rule, and every refusal is logged with the rule that fired, so a reviewer can see exactly why the agent said no.
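A sketch of rule-based refusals as plain predicates; the role name, dollar threshold, and PII pattern below are illustrative values:

```typescript
// Rules are deterministic predicates evaluated before any action executes.
// The model can propose an action; it can never overrule a rule that fires.
interface Action { tool: string; role: string; amountUsd?: number; text?: string }
type Rule = { name: string; refuse: (a: Action) => boolean };

const SSN = /\b\d{3}-\d{2}-\d{4}\b/; // example PII pattern (US SSN shape)
const auditLog = (event: string, data: object) => console.log(event, data); // stand-in sink

const rules: Rule[] = [
  { name: "role_scope",     refuse: (a) => a.tool === "holdReservation" && a.role !== "operator" },
  { name: "approval_limit", refuse: (a) => (a.amountUsd ?? 0) > 500 },
  { name: "pii_escalation", refuse: (a) => SSN.test(a.text ?? "") },
];

// Every refusal is logged with the rule that fired, so it is observable.
function checkAction(a: Action): { allowed: boolean; rule?: string } {
  for (const r of rules) {
    if (r.refuse(a)) {
      auditLog("refusal", { rule: r.name, tool: a.tool, role: a.role });
      return { allowed: false, rule: r.name };
    }
  }
  return { allowed: true };
}
```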
7. Bring-your-own-model for enterprise.
The fastest enterprise sales unlock is letting the customer pick the model. OpenAI, Anthropic, Azure OpenAI, AWS Bedrock, self-hosted. Build the orchestration; let them rent the inference. This is what gets a platform through risk review when every locked-model platform stalls.
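A sketch of the pluggable provider seam. The interface is deliberately narrow; the self-hosted adapter's endpoint path and response shape are invented for illustration:

```typescript
// The runtime depends on one narrow interface; each provider adapts to it.
// Swapping OpenAI, Anthropic, Azure OpenAI, Bedrock, or self-hosted is config.
type ChatMessage = { role: "user" | "assistant"; content: string };

interface ChatProvider {
  complete(opts: { system: string; messages: ChatMessage[] }): Promise<string>;
}

// One adapter shown; hosted providers get the same treatment behind their SDKs.
class SelfHostedProvider implements ChatProvider {
  constructor(private baseUrl: string) {}
  async complete(opts: { system: string; messages: ChatMessage[] }): Promise<string> {
    const res = await fetch(`${this.baseUrl}/v1/chat`, { // endpoint path is illustrative
      method: "POST",
      headers: { "content-type": "application/json" },
      body: JSON.stringify(opts),
    });
    const body = (await res.json()) as { text: string };
    return body.text;
  }
}

// The customer's choice is resolved at tenant setup, never hardcoded.
function providerForTenant(cfg: { provider: "self-hosted"; endpoint: string }): ChatProvider {
  return new SelfHostedProvider(cfg.endpoint); // plus a branch per hosted provider
}
```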
The architecture pattern that works.
Stripped to essentials (a wiring sketch follows the list):
- API gateway → tenant-scoped agent runtime
- Agent runtime → typed tool registry (per tenant)
- Tool calls → scoped credentials, validated by schema
- All inputs/outputs → customer-tenancy audit log
- Guardrail middleware at four layers (above)
- Model behind a pluggable provider interface
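Wired together, one turn through that stack looks roughly like this. Every dependency is stubbed as a plain type so the shape stands alone; none of the names come from a real framework:

```typescript
// The essentials from the list above, composed for a single turn.
type Tool = { readOnly: boolean; run: (args: unknown) => Promise<unknown> };

interface Runtime {
  tenantId: string;
  tools: Record<string, Tool>;                                        // typed registry, per tenant
  guard: (stage: string, payload: unknown) => void;                   // throws to block the turn
  model: (input: string) => Promise<{ tool: string; args: unknown }>; // pluggable provider
  audit: (record: object) => Promise<void>;                           // customer-tenancy sink
}

async function runTurn(rt: Runtime, input: string): Promise<unknown> {
  rt.guard("pre-llm", input);                   // layer 1
  const call = await rt.model(input);           // model proposes a tool call
  const tool = rt.tools[call.tool];
  if (!tool) throw new Error(`out-of-scope tool: ${call.tool}`); // closed surface
  rt.guard("tool-call", call.args);             // layer 2: schema validation
  const result = await tool.run(call.args);
  rt.guard("post-tool", result);                // layer 3
  await rt.audit({ tenant: rt.tenantId, input, call, result });
  return result;
}
```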
What this is not.
This is not "fully autonomous agents." Multi-step planning agents that act unsupervised over hours are not, in our experience, ready for regulated production in 2026 — not because the models can't, but because the audit, isolation, and rollback story isn't there yet. Constrained, single-step or small-loop agents with human-in-the-loop are. That's what passes review, and that's what ships.
If your roadmap requires unattended multi-step automation, design for the workflow the agent participates in, not the agent itself. Humans approve actions. Agents propose them.
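A sketch of that propose/approve loop; the in-memory proposal store and the function names are illustrative:

```typescript
// The agent proposes; nothing executes until a human approves.
interface ProposedAction {
  id: string;
  tool: string;
  args: Record<string, unknown>;
  status: "pending" | "approved" | "rejected";
}

const proposals = new Map<string, ProposedAction>();

// Called by the agent loop: record the intent, surface it to the operator queue.
function propose(tool: string, args: Record<string, unknown>): ProposedAction {
  const p: ProposedAction = { id: crypto.randomUUID(), tool, args, status: "pending" };
  proposals.set(p.id, p);
  return p;
}

// Called by the operator UI: the human's decision is the execution trigger.
async function approve(
  id: string,
  execute: (tool: string, args: Record<string, unknown>) => Promise<unknown>,
): Promise<unknown> {
  const p = proposals.get(id);
  if (!p || p.status !== "pending") throw new Error(`no pending proposal ${id}`);
  p.status = "approved";
  return execute(p.tool, p.args);
}
```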
The shortest version.
Constrain the tool surface. Default to read. Audit to the customer. Layer guardrails. Make refusals deterministic. Let the customer pick the model. The model is the easy part; the architecture around it is the work.
Oviompt builds production AI agents for regulated buyers. File an intent if you're sizing an agent build — references on request after the first conversation.