Your organization has three AI agents in production: a sepsis screener reading lab results, a scheduling agent booking follow-ups, and a documentation agent generating discharge summaries. Each one was built by a different team. Each one authenticates to the EHR independently. Each one logs PHI access in its own format. And when the CISO asks "which agents accessed patient 12345's data last Tuesday?", nobody can answer without querying three separate log stores.
This is the multi-agent access control problem, and it compounds with every new agent you deploy. The solution is a pattern borrowed from API management and adapted for clinical AI: the agent gateway. One centralized layer between your EHR and all your AI agents that handles authentication, scope enforcement, rate limiting, PHI access logging, and circuit breaking. Think Kong or Envoy, but purpose-built for healthcare.
The Problem: Direct Agent-to-EHR Access Does Not Scale
When you deploy your first AI agent, direct EHR access works fine. The agent has a SMART on FHIR client credential, a set of scopes, and its own logging. Simple.
By agent five, you have these problems:
- Credential sprawl: Five sets of client IDs, secrets, and scope definitions. Rotating credentials requires coordinating with five different teams.
- No aggregate rate limiting: Each agent rate-limits independently, but the EHR has a global API rate cap. Five agents each thinking they have 100 requests/second can collectively exceed the EHR limit, causing cascading failures.
- Fragmented audit trail: Five different log formats, five different storage systems, five different retention policies. A HIPAA audit requires manually correlating logs across all five systems.
- Scope creep: Agent A was authorized for patient/Observation.read but over time also started reading Conditions and Encounters. Without centralized enforcement, scope violations are invisible until an audit.
- No circuit breaking: When the EHR goes into maintenance mode, all five agents hammer the unavailable endpoint simultaneously, generating thousands of errors.
The compliance team nightmare scenario: a breach investigation requires reconstructing exactly which agents accessed which patient data over a 90-day window. With direct access, this takes 2 weeks of forensic log analysis. With an agent gateway, it takes 2 hours of querying a single audit store.
The Agent Gateway: Architecture Overview
The agent gateway sits between all AI agents and the EHR. Every request from any agent passes through it. The gateway handles seven core responsibilities:
1. Authentication and Identity Verification
Every agent request includes a JWT that identifies the agent. The gateway validates the token, confirms the agent is registered in the agent registry, and extracts the agent identity for downstream logging.
# gateway_auth.py - Agent authentication middleware
import os
from dataclasses import dataclass
from typing import Optional

import jwt  # PyJWT

# Assumption: the PEM-encoded public key used to verify agent tokens is
# supplied via the environment; load it however your deployment manages keys.
GATEWAY_PUBLIC_KEY = os.environ.get("GATEWAY_PUBLIC_KEY", "")


@dataclass
class AgentIdentity:
    agent_id: str
    agent_name: str
    agent_version: str
    owner_team: str
    tier: int        # Bounded autonomy tier (1-4)
    scopes: list     # Authorized FHIR scopes
    rate_limit: int  # Requests per minute


AGENT_REGISTRY = {
    "sepsis-screener-v3": AgentIdentity(
        agent_id="sepsis-screener-v3",
        agent_name="Sepsis Risk Screener",
        agent_version="3.2.1",
        owner_team="clinical-ai",
        tier=2,
        scopes=["patient/Observation.read", "patient/Condition.read",
                "patient/Encounter.read"],
        rate_limit=300,
    ),
    "discharge-planner-v2": AgentIdentity(
        agent_id="discharge-planner-v2",
        agent_name="Discharge Planning Agent",
        agent_version="2.1.0",
        owner_team="care-transitions",
        tier=3,
        scopes=["patient/Observation.read", "patient/Encounter.read",
                "patient/CarePlan.write", "patient/Appointment.write"],
        rate_limit=120,
    ),
    "scheduler-v1": AgentIdentity(
        agent_id="scheduler-v1",
        agent_name="Appointment Scheduler",
        agent_version="1.4.0",
        owner_team="operations-ai",
        tier=4,
        scopes=["patient/Appointment.write", "patient/Patient.read"],
        rate_limit=200,
    ),
}


def authenticate_agent(token: str) -> Optional[AgentIdentity]:
    """Return the registered identity for a valid token, or None."""
    try:
        payload = jwt.decode(token, GATEWAY_PUBLIC_KEY, algorithms=["RS256"])
    except jwt.InvalidTokenError:
        return None
    # An unregistered agent_id yields None, so the caller rejects the request.
    return AGENT_REGISTRY.get(payload.get("agent_id"))

2. Scope Enforcement
The gateway enforces SMART on FHIR scopes at the request level. If the sepsis screener tries to write to CarePlan (not in its authorized scopes), the gateway rejects the request before it reaches the EHR. This is zero-trust enforcement.
# gateway_scope_enforcement.py
def enforce_scope(agent, method, resource_type):
    """Check if agent scopes permit this request."""
    interaction_map = {
        "GET": "read", "POST": "write",
        "PUT": "write", "PATCH": "write", "DELETE": "write",
    }
    interaction = interaction_map.get(method.upper())
    if interaction is None:
        return False  # Unknown method: fail closed instead of treating it as a read
    required = f"patient/{resource_type}.{interaction}"
    if required in agent.scopes:
        return True
    # Check wildcard scopes
    if f"patient/*.{interaction}" in agent.scopes:
        return True
    return False  # Reject - scope violation

3. Rate Limiting and Priority Queuing
The gateway implements two levels of rate limiting:
- Per-agent limits: Each agent has a requests-per-minute budget. The sepsis screener gets 300 RPM (it processes high-volume lab streams). The scheduler gets 200 RPM.
- Global EHR limit: The total rate across all agents cannot exceed the EHR API capacity. If five agents are active and the EHR supports 500 RPM total, the gateway distributes capacity based on agent priority.
Priority queuing ensures clinical agents (Tier 2 sepsis screener) get priority over administrative agents (Tier 4 scheduler) when the global limit is approached. The bounded autonomy tier maps directly to request priority.
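The two-level check can be sketched roughly as follows. This is a minimal in-process sketch, not a production limiter: `TokenBucket`, `admit`, and `reserve_fraction` are hypothetical names, time is passed in explicitly so the behavior is easy to follow, and the rule that tiers 1-2 may spend the whole global budget while tiers 3-4 must leave a reserve is one assumed way of mapping tier to priority.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    """Budget of `rpm` requests per minute, refilled continuously."""
    rpm: float
    tokens: float
    last: float = 0.0  # time of the last refill, in seconds

    def refill(self, now: float) -> None:
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.rpm, self.tokens + elapsed * self.rpm / 60.0)
        self.last = now


def admit(agent: TokenBucket, ehr: TokenBucket, tier: int, now: float,
          reserve_fraction: float = 0.2) -> bool:
    """Admit one request only if both the per-agent and global budgets allow it.

    Assumption: tiers 1-2 count as clinical priority and may spend the whole
    global budget, while tiers 3-4 must leave `reserve_fraction` of it unused.
    """
    agent.refill(now)
    ehr.refill(now)
    floor = 0.0 if tier <= 2 else ehr.rpm * reserve_fraction
    if agent.tokens < 1.0 or ehr.tokens - 1.0 < floor:
        return False
    agent.tokens -= 1.0
    ehr.tokens -= 1.0
    return True


# Example: a 10 RPM global cap shared by the scheduler and the sepsis screener.
ehr = TokenBucket(rpm=10, tokens=10)
scheduler = TokenBucket(rpm=200, tokens=200)
sepsis = TokenBucket(rpm=300, tokens=300)
scheduler_admitted = sum(admit(scheduler, ehr, 4, now=0.0) for _ in range(20))
sepsis_admitted = sum(admit(sepsis, ehr, 2, now=0.0) for _ in range(20))
print(scheduler_admitted, sepsis_admitted)  # 8 2
```

The Tier 4 scheduler is cut off once the global bucket reaches its reserve, and the remaining capacity is still available to the Tier 2 screener. In a multi-node gateway the counters would need to live in shared state rather than in process memory.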
4. PHI Access Logging
Every request through the gateway generates a standardized audit record:
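# gateway_audit.py - Standardized PHI access logging
import hashlib
import hmac
import os
import uuid
from datetime import datetime, timezone

# Assumptions: the node ID and the patient-hash key come from the environment.
GATEWAY_NODE_ID = os.environ.get("GATEWAY_NODE_ID", "gateway-node-unknown")
PATIENT_HASH_KEY = os.environ.get("PATIENT_HASH_KEY", "").encode()


def hash_patient_id(patient_id):
    """Keyed hash (one possible choice) so raw patient IDs stay out of the log."""
    return hmac.new(PATIENT_HASH_KEY, str(patient_id).encode(),
                    hashlib.sha256).hexdigest()


def create_audit_record(agent, request, response):
    return {
        "event_id": str(uuid.uuid4()),
        "timestamp": datetime.now(timezone.utc).isoformat().replace("+00:00", "Z"),
        "agent_id": agent.agent_id,
        "agent_version": agent.agent_version,
        "agent_tier": agent.tier,
        "owner_team": agent.owner_team,
        "method": request.method,
        "resource_type": request.resource_type,
        "resource_id": request.resource_id,
        "patient_id_hash": hash_patient_id(request.patient_id),
        "fhir_scopes_used": [request.scope],
        "response_status": response.status_code,
        "response_time_ms": response.elapsed_ms,
        # AgentIdentity as defined earlier has no purpose field, so fall back safely.
        "purpose_of_use": getattr(agent, "purpose", None) or "treatment",
        "gateway_node": GATEWAY_NODE_ID,
        "compliance_flags": generate_flags(agent, request),
    }


def generate_flags(agent, request):
    flags = ["HIPAA-access-logged"]
    if request.method in ("POST", "PUT", "PATCH", "DELETE"):
        flags.append("write-operation")
    if agent.tier >= 3:
        flags.append("autonomous-or-approval-tier")
    return flags

This single audit format replaces the five different log formats from direct access. A HIPAA audit query becomes a lookup against one store instead of weeks of manual log correlation.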
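With one record format, the CISO's question from the introduction reduces to a filter over the audit store. The sketch below assumes the records are available as a list of dictionaries with the `timestamp`, `agent_id`, and `patient_id_hash` fields shown above; `agents_for_patient` is a hypothetical helper, and a real deployment would run the equivalent query in its log store.

```python
from collections import defaultdict
from datetime import date, datetime


def agents_for_patient(records, patient_id_hash, day):
    """Return {agent_id: request_count} for one patient hash on one UTC day."""
    counts = defaultdict(int)
    for rec in records:
        # Timestamps are ISO 8601 with a trailing "Z", as in the audit record.
        ts = datetime.fromisoformat(rec["timestamp"].replace("Z", "+00:00"))
        if rec["patient_id_hash"] == patient_id_hash and ts.date() == day:
            counts[rec["agent_id"]] += 1
    return dict(counts)


# Illustrative records only; the hash values are placeholders.
records = [
    {"timestamp": "2025-01-07T10:00:00Z", "agent_id": "sepsis-screener-v3",
     "patient_id_hash": "abc"},
    {"timestamp": "2025-01-07T11:30:00.123456Z", "agent_id": "sepsis-screener-v3",
     "patient_id_hash": "abc"},
    {"timestamp": "2025-01-07T12:00:00Z", "agent_id": "scheduler-v1",
     "patient_id_hash": "abc"},
    {"timestamp": "2025-01-08T12:00:00Z", "agent_id": "scheduler-v1",
     "patient_id_hash": "abc"},
    {"timestamp": "2025-01-07T12:00:00Z", "agent_id": "discharge-planner-v2",
     "patient_id_hash": "zzz"},
]
result = agents_for_patient(records, "abc", date(2025, 1, 7))
print(result)  # {'sepsis-screener-v3': 2, 'scheduler-v1': 1}
```

Because the log stores a hash rather than the raw patient ID, the investigator hashes the ID with the same function the gateway uses before running the query.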
5. Circuit Breaking
When the EHR becomes slow or unavailable, the gateway protects both the EHR and the agents:
- Degraded circuit: When EHR response times exceed 2x the normal p95, the gateway reduces the request rate by 50% and queues non-critical requests.
- Open circuit: When the EHR returns 5xx errors for 30+ seconds, the gateway stops forwarding requests entirely. Agents receive a "service unavailable" response with retry-after headers.
- Recovery (half-open): While the circuit is open, the gateway sends probe requests every 5 seconds. Once the EHR responds normally to 3 consecutive probes, the circuit closes and normal traffic resumes.
Without circuit breaking, a 60-second EHR maintenance window generates thousands of agent errors, fills up error logs, and can trigger false clinical alerts. With the gateway, agents gracefully degrade and resume when the EHR recovers.
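The state transitions described above can be sketched as a small state machine. This is a simplified illustration under stated assumptions: `CircuitBreaker` and its method names are hypothetical, the throttled state is named `DEGRADED` here, a single slow response stands in for a proper p95 measurement, and the caller is expected to schedule the 5-second probes.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class State(Enum):
    CLOSED = "closed"      # normal traffic
    DEGRADED = "degraded"  # throttled to 50%
    OPEN = "open"          # nothing forwarded, probes only


@dataclass
class CircuitBreaker:
    baseline_p95_ms: float
    open_after_s: float = 30.0
    probes_to_close: int = 3
    state: State = State.CLOSED
    first_error_at: Optional[float] = None
    good_probes: int = 0

    def on_response(self, now: float, status: int, latency_ms: float) -> None:
        """Update the state from the outcome of a forwarded request."""
        if status >= 500:
            if self.first_error_at is None:
                self.first_error_at = now
            if now - self.first_error_at >= self.open_after_s:
                self.state = State.OPEN
                self.good_probes = 0
            return
        self.first_error_at = None  # a success ends the error streak
        if self.state is not State.OPEN:
            slow = latency_ms > 2 * self.baseline_p95_ms
            self.state = State.DEGRADED if slow else State.CLOSED

    def on_probe(self, ok: bool) -> None:
        """Count consecutive successful probes while the circuit is open."""
        if self.state is not State.OPEN:
            return
        self.good_probes = self.good_probes + 1 if ok else 0
        if self.good_probes >= self.probes_to_close:
            self.state = State.CLOSED
            self.first_error_at = None

    def allowed_fraction(self) -> float:
        """Share of normal traffic the gateway should forward right now."""
        return {State.CLOSED: 1.0, State.DEGRADED: 0.5, State.OPEN: 0.0}[self.state]
```

A failed probe resets the counter, so the circuit only closes after three successes in a row, matching the recovery rule above.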
6. Request Routing
The gateway routes agent requests to the appropriate backend based on the request type:
- FHIR requests: Route to the FHIR server (HAPI, Google Healthcare API, Azure Health Data Services)
- HL7v2 queries: Route to the v2 message interface for agents that work with native v2
- Bulk data requests: Route to the bulk export endpoint with separate rate limiting
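As a rough illustration of the routing decision, the sketch below classifies a request by its path and content type. The backend addresses are placeholders, `classify` and `route` are hypothetical names, and the `application/hl7-v2` content-type check is an assumption about how v2 messages arrive at the gateway. The `$export` suffix is the operation name used by the FHIR Bulk Data specification.

```python
# Hypothetical backend addresses; substitute your own endpoints.
BACKENDS = {
    "fhir": "https://fhir.example.internal/r4",
    "hl7v2": "mllp://hl7v2.example.internal:2575",
    "bulk": "https://bulk.example.internal/r4",
}


def classify(path: str, content_type: str = "") -> str:
    """Pick a backend kind from the request path and content type."""
    resource_path = path.split("?", 1)[0].rstrip("/")
    if resource_path.endswith("/$export"):
        return "bulk"  # FHIR Bulk Data export, rate-limited separately
    if content_type.lower().startswith("application/hl7-v2"):
        return "hl7v2"
    return "fhir"


def route(path: str, content_type: str = ""):
    """Return the backend kind and the address the request is forwarded to."""
    kind = classify(path, content_type)
    return kind, BACKENDS[kind]
```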
7. Response Caching
Multiple agents often request the same patient data within seconds of each other. The gateway caches FHIR read responses for a configurable TTL (typically 5-30 seconds for clinical data), which reduces EHR load while bounding staleness to the TTL window. Any write to the same resource through the gateway invalidates the cached entry; writes that bypass the gateway are only picked up when the TTL expires.
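A minimal sketch of such a cache, assuming entries are keyed by resource type and ID and that the caller passes in the current time. `ReadCache` is a hypothetical name, and a clustered gateway would need a shared cache rather than a per-process dictionary.

```python
class ReadCache:
    """TTL cache for FHIR read responses with invalidation on write."""

    def __init__(self, ttl_s: float = 10.0):
        self.ttl_s = ttl_s
        self._entries = {}  # (resource_type, resource_id) -> (expires_at, body)

    def get(self, key, now: float):
        entry = self._entries.get(key)
        if entry is None or now >= entry[0]:
            self._entries.pop(key, None)  # drop expired entries lazily
            return None
        return entry[1]

    def put(self, key, body, now: float) -> None:
        self._entries[key] = (now + self.ttl_s, body)

    def invalidate(self, key) -> None:
        """Call on any write (POST, PUT, PATCH, DELETE) to the same resource."""
        self._entries.pop(key, None)


cache = ReadCache(ttl_s=10.0)
key = ("Observation", "obs-1")
cache.put(key, {"value": 7.2}, now=0.0)
hit = cache.get(key, now=5.0)       # within the TTL
expired = cache.get(key, now=10.0)  # TTL elapsed
```

Scope enforcement still has to run before the cache lookup, so that a cached response is never returned to an agent that is not authorized for that resource type.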
The Request Lifecycle: End to End
Every request through the agent gateway follows five phases:
- Authenticate (2ms): Verify agent JWT, look up agent in registry, extract identity and scopes.
- Authorize (5ms): Enforce FHIR scopes, check patient consent status, and validate that the request matches the agent's bounded autonomy tier.
- Rate Limit (1ms): Check per-agent token bucket, check global EHR rate cap, apply priority queuing if needed.
- Route and Execute (50-200ms): Check the cache, forward the request to the appropriate backend on a miss, and receive the response.
- Audit and Log (3ms): Generate standardized audit record, write to log store, update metrics.
Total gateway overhead: 11ms. For a FHIR read that takes 80ms at the EHR, the total round-trip is 91ms. The 11ms investment buys centralized auth, scope enforcement, rate limiting, audit logging, and circuit breaking.
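The five phases can be sketched as a short pipeline in which any check may reject the request, and the audit step runs regardless of the outcome. `run_pipeline` and the callables passed to it are hypothetical; the status codes in the example are illustrative.

```python
from typing import Callable, Optional

# A check returns an HTTP status to reject the request, or None to continue.
Check = Callable[[dict], Optional[int]]


def run_pipeline(request: dict, authenticate: Check, authorize: Check,
                 rate_limit: Check, execute: Callable[[dict], int],
                 audit: Callable[[dict, Optional[int]], None]) -> int:
    status: Optional[int] = None
    try:
        for check in (authenticate, authorize, rate_limit):
            status = check(request)
            if status is not None:
                return status  # rejected before reaching the EHR
        status = execute(request)
        return status
    finally:
        # Rejected requests are audited too; status is None if execute raised.
        audit(request, status)


audit_log = []
status = run_pipeline(
    {"agent_id": "sepsis-screener-v3", "method": "POST", "resource_type": "CarePlan"},
    authenticate=lambda req: None,
    authorize=lambda req: 403,  # scope violation
    rate_limit=lambda req: None,
    execute=lambda req: 200,
    audit=lambda req, st: audit_log.append((req["agent_id"], st)),
)
print(status, audit_log)  # 403 [('sepsis-screener-v3', 403)]
```

Running the audit step in a `finally` block is one way to make sure scope violations and rate-limit rejections appear in the same audit trail as successful reads.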
Implementation Options: Build vs Buy
Three implementation paths, each with different trade-offs:
Option 1: Kong Gateway + Custom Plugins
Kong provides the API gateway foundation with built-in rate limiting, authentication, and logging. Add custom plugins for FHIR scope enforcement, agent registry integration, and bounded autonomy tier-based priority. Kong's AI Gateway offering also advertises MCP support and PII redaction; confirm which of these features your version and license tier include.
Best for: Organizations that want a managed solution with minimal custom development.
Option 2: Envoy Proxy + Custom Filters
Envoy is a CNCF-graduated proxy that is widely used in Kubernetes-native deployments. Write custom Wasm or Lua filters for healthcare-specific logic. Envoy handles rate limiting, circuit breaking, and routing natively.
Best for: Cloud-native organizations running on Kubernetes with strong platform engineering capabilities.
Option 3: Custom Middleware
Build the gateway as a custom service (Go, Rust, or Python with async frameworks). Full control over every aspect of the pipeline. Higher development cost but maximum flexibility.
Best for: Organizations with unique compliance requirements that commercial gateways cannot satisfy.
Regardless of implementation choice, the gateway must support these non-negotiable healthcare requirements:
- TLS 1.2+ for all connections (HIPAA encryption in transit)
- Audit log immutability (write-once, no modification or deletion)
- Sub-15ms gateway overhead at p95 (clinical latency requirements)
- 99.99% availability (the gateway is on the critical path for all agent operations)
- Zero-downtime deployments (agents cannot tolerate gateway maintenance windows)
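To make the availability target concrete, the downtime it allows can be computed directly. `downtime_budget_minutes` is a hypothetical helper used only for this arithmetic.

```python
def downtime_budget_minutes(availability: float, days: int) -> float:
    """Minutes of allowed downtime over `days` at the given availability."""
    return (1.0 - availability) * days * 24 * 60


monthly = round(downtime_budget_minutes(0.9999, 30), 2)
yearly = round(downtime_budget_minutes(0.9999, 365), 2)
print(monthly, yearly)  # 4.32 52.56
```

At 99.99%, the gateway has roughly 4.3 minutes of downtime per 30 days, which is why maintenance windows are ruled out and deployments have to be zero-downtime.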
Deployment Architecture
The agent gateway should be deployed as a highly available cluster, not a single instance:
- Minimum 3 nodes across availability zones for fault tolerance
- Shared state in Redis for rate limiting counters and session data
- Audit logs to immutable storage (S3 with Object Lock, or a dedicated HIPAA-compliant log store)
- Health checks every 5 seconds with automatic node removal on failure
- Blue-green deployments for gateway updates without agent disruption
Monitor the gateway itself with OpenTelemetry: request rate, error rate, latency percentiles, circuit breaker state, rate limit utilization, and audit log write throughput. Set SLOs on gateway availability and latency separate from the EHR SLOs.
Migration Path: From Direct Access to Gateway
You do not migrate all agents at once. The recommended path:
- Deploy gateway in shadow mode (2 weeks): Route a copy of all agent traffic through the gateway without enforcement. Compare gateway logs with agent-side logs to validate accuracy.
- Enable audit logging (2 weeks): Switch agents to route through the gateway for logging only. Authentication and scope enforcement remain at the agent level.
- Enable auth and scope enforcement (1 week per agent): Turn on gateway authentication and scope enforcement. Migrate one agent at a time, starting with the lowest-risk agent.
- Enable rate limiting and circuit breaking (1 week): Once all agents route through the gateway, enable global rate limiting and circuit breaking.
- Decommission agent-side auth (1 week per agent): Remove direct EHR credentials from individual agents. The gateway is now the single point of authentication.
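Total migration: 6-10 weeks for a typical 5-agent deployment, assuming the per-agent steps run in parallel across teams. Run strictly one agent at a time, the durations above add up to about 15 weeks (2 + 2 + 5 + 1 + 5). Each step is reversible.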
Getting Started with Nirmitee
At Nirmitee, we design and deploy agent gateway infrastructure for healthcare organizations scaling their clinical AI platforms. Our teams build gateways on Kong, Envoy, or custom middleware depending on your infrastructure and compliance requirements. Every gateway implementation includes bounded autonomy enforcement, HIPAA-compliant audit logging, and SMART scope enforcement.
Whether you have 3 agents or 30, the gateway pattern is the architecture investment that makes scaling possible without scaling your compliance risk.
Talk to our platform engineering team about deploying an agent gateway for your healthcare AI infrastructure.

