Healthcare has been a batch-processing industry for decades. Lab results sit in queues for hours. ADT messages crawl through overnight feeds. Critical value alerts arrive after the clinical window has closed. The architecture that underpins most hospital data exchange today was designed for an era when nightly file transfers were considered state-of-the-art.
That era is ending. FHIR Subscriptions, combined with modern event-driven architecture patterns, are enabling a fundamental shift from "check for updates periodically" to "get notified the instant something changes." The difference is not incremental -- it is the difference between a pharmacist catching a dangerous drug interaction in 4 hours versus 4 seconds.
This guide breaks down how FHIR Subscriptions work across R4 and R5, how to architect real-time clinical pipelines using event buses and cloud-native services, and how to build production-grade subscription services that handle the reliability demands healthcare requires.
Why Healthcare Needs Event-Driven Architecture
Most healthcare integration today follows one of two patterns: batch file transfers (HL7v2 flat files processed on schedules) or polling-based queries (applications repeatedly asking "anything new?"). Both patterns were designed around the limitations of 1990s-era technology, and both impose latency that has real clinical consequences.
The Cost of Batch Latency
Consider what happens in a typical hospital when a lab result comes back critical. In a batch-processing architecture, the flow looks like this:
- Lab analyzer produces result at 2:14 PM
- LIS batches results every 30 minutes -- result enters queue at 2:30 PM
- HL7v2 interface engine picks up batch at 2:35 PM
- Message transformation and routing takes 2-5 minutes
- EHR ingests result and triggers alert at 2:42 PM
- Notification system batches alerts for provider -- delivered at 3:00 PM
That is 46 minutes from result to alert for a critical value -- a potassium of 6.8 mEq/L that requires immediate intervention. Research published in the Journal of Clinical Pathology has shown that delays in critical value notification beyond 30 minutes are associated with increased adverse events. A 2024 HIMSS analytics survey found that organizations still relying primarily on batch-based HL7v2 feeds experience average notification latencies of 15 to 60 minutes for critical results, compared to under 60 seconds for organizations using event-driven architectures.
The numbers extend beyond lab results. Batch-processed ADT (Admit-Discharge-Transfer) notifications mean that care coordination teams may not learn about a high-risk patient's emergency department admission until the next morning. Pharmacy systems that poll for new orders every 15 minutes create windows where drug interactions go undetected. Over 90% of Health Information Exchange (HIE) networks still rely on HL7v2 ADT messages as their primary data exchange mechanism, meaning the batch-latency problem affects the vast majority of U.S. healthcare infrastructure.
What Event-Driven Architecture Changes
Event-driven architecture (EDA) inverts the communication model. Instead of consumers asking producers "do you have anything for me?", producers announce events as they happen, and interested consumers react immediately. In healthcare terms:
- Lab result available -- event fires the instant the LIS validates the result
- Patient admitted -- event fires when the ADT registration is saved
- Medication prescribed -- event fires when the order is signed
- Vital sign recorded -- event fires when the device observation is stored
Studies on streaming and event-driven architectures in clinical settings have documented latency reductions of approximately 70% compared to batch-based workflows, surfacing clinical risk signals 3 to 5 hours earlier in some care coordination scenarios. For a hospital processing 50,000 ADT events per day, the aggregate clinical impact of eliminating batch delays is substantial.
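The inversion is easy to see in miniature. The sketch below uses a plain Node.js EventEmitter purely to illustrate the shape of the pattern -- a real pipeline substitutes a FHIR server for the producer and a message broker for the in-process emitter:

```typescript
import { EventEmitter } from "node:events";

// Producer side: the LIS announces a result the moment it is validated.
const labEvents = new EventEmitter();

// Consumer side: react the instant the event fires -- no polling loop,
// no "anything new?" queries, no high-water-mark bookkeeping.
const received: string[] = [];
labEvents.on("result-available", (obs: { code: string; value: number }) => {
  received.push(`${obs.code}=${obs.value}`);
});

// The moment the producer emits, every registered consumer runs.
labEvents.emit("result-available", { code: "2823-3", value: 6.8 });
```

The producer never knows who is listening, which is exactly the decoupling that lets new consumers (alerting, analytics, AI) be added without touching the source system.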
If you are working with existing integration infrastructure, our guide on healthcare integration architecture with Mirth and Kafka covers how to layer event-driven patterns onto traditional interface engines.
FHIR Subscriptions Explained
FHIR Subscriptions provide a standardized mechanism for servers to notify clients when clinical data changes. Rather than inventing proprietary webhook systems, FHIR Subscriptions define a specification-compliant way to say "tell me when a new critical lab result is created" or "notify me when an encounter status changes to discharged."
The subscription model has evolved significantly between FHIR R4 and R5, and understanding both is essential since most production systems today run R4 while the R5 model represents the future direction.
FHIR R4: Criteria-Based Subscriptions
In FHIR R4, subscriptions use a simple criteria string -- essentially a FHIR search query that defines what resources to watch. When a resource matching that criteria is created or updated, the server sends a notification.
{
"resourceType": "Subscription",
"id": "critical-lab-alerts",
"status": "requested",
"reason": "Monitor critical lab results for immediate clinical alerting",
"criteria": "Observation?category=laboratory&value-quantity=gt|7.0||mmol/L&code=http://loinc.org|2823-3",
"channel": {
"type": "rest-hook",
"endpoint": "https://alerts.hospital.org/fhir/subscription-hook",
"payload": "application/fhir+json",
"header": [
"Authorization: Bearer eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9..."
]
},
"end": "2026-12-31T23:59:59Z",
"contact": [
{
"system": "email",
"value": "integration-team@hospital.org"
}
]
}
The R4 model is straightforward: the criteria field contains a search query (Observation?category=laboratory&code=2823-3), the channel defines where and how to deliver notifications, and the status tracks the subscription lifecycle. When a matching Observation resource is created on the server, it sends an HTTP POST to the specified endpoint with the resource payload.
R4 Limitations: The criteria-based model has well-documented shortcomings. There is no standard way for servers to advertise which subscription criteria they support. Clients cannot filter on resource state transitions (e.g., "only when status changes from active to completed"). Delete operations cannot be subscribed to. And there is no standard way to define cross-resource triggers (e.g., "when an Encounter is completed AND has associated DiagnosticReports").
FHIR R5: Topic-Based Subscriptions
FHIR R5 introduced a completely rearchitected subscription framework that addresses every R4 limitation. The key innovation is separating what can be subscribed to (SubscriptionTopic) from who is subscribing and how they want to be notified (Subscription).
First, the server defines a SubscriptionTopic -- a canonical resource that declares an event type, its triggers, and what filters subscribers can apply:
{
"resourceType": "SubscriptionTopic",
"id": "encounter-complete",
"url": "http://hospital.org/fhir/SubscriptionTopic/encounter-complete",
"version": "1.0.0",
"title": "Encounter Completed",
"status": "active",
"date": "2026-01-15",
"description": "Triggers when an encounter transitions to a completed or discharged status",
"resourceTrigger": [
{
"description": "Encounter resource transitions to finished status",
"resource": "http://hl7.org/fhir/StructureDefinition/Encounter",
"supportedInteraction": ["update"],
"queryCriteria": {
"previous": "status:not=finished",
"resultForCreate": "test-passes",
"current": "status=finished",
"resultForDelete": "test-fails",
"requireBoth": true
}
}
],
"canFilterBy": [
{
"description": "Filter by encounter service type",
"resource": "http://hl7.org/fhir/StructureDefinition/Encounter",
"filterParameter": "service-type",
"modifier": ["=", "in"]
},
{
"description": "Filter by patient",
"resource": "http://hl7.org/fhir/StructureDefinition/Encounter",
"filterParameter": "patient",
"modifier": ["=", "eq"]
},
{
"description": "Filter by location",
"resource": "http://hl7.org/fhir/StructureDefinition/Encounter",
"filterParameter": "location",
"modifier": ["="]
}
],
"notificationShape": [
{
"resource": "http://hl7.org/fhir/StructureDefinition/Encounter",
"include": [
"Encounter:patient",
"Encounter:practitioner"
]
}
]
}
Then, a client creates a Subscription that references the topic and applies filters:
{
"resourceType": "Subscription",
"id": "cardiology-discharge-alerts",
"status": "requested",
"topic": "http://hospital.org/fhir/SubscriptionTopic/encounter-complete",
"reason": "Notify cardiology care coordination team when cardiac patients are discharged",
"filterBy": [
{
"resourceType": "Encounter",
"filterParameter": "service-type",
"value": "http://terminology.hl7.org/CodeSystem/service-type|305"
},
{
"resourceType": "Encounter",
"filterParameter": "location",
"value": "Location/cardiology-ward-3b"
}
],
"channelType": {
"system": "http://terminology.hl7.org/CodeSystem/subscription-channel-type",
"code": "rest-hook"
},
"endpoint": "https://cardiology-app.hospital.org/webhooks/discharge",
"heartbeatPeriod": 120,
"timeout": 30,
"contentType": "application/fhir+json",
"content": "full-resource",
"maxCount": 10
}
The Subscription Lifecycle
Both R4 and R5 subscriptions follow a defined state machine:
- Requested -- Client submits a Subscription resource with status "requested." The server validates the criteria (R4) or topic/filters (R5) and either activates it or returns an error.
- Active -- Server has accepted the subscription and is actively monitoring for matching events. The server updates the Subscription status to "active."
- Error -- Delivery failures have exceeded the server's retry threshold. The subscription is suspended and the server sets status to "error" with details in the error field.
- Off -- Client or server has deactivated the subscription. Can be reactivated by updating status back to "requested."
R5 adds a critical capability: the handshake. When a subscription is first activated, the server sends an empty notification to the endpoint to verify it is reachable and returns the expected response. This prevents "phantom subscriptions" where the server thinks it is delivering notifications but the endpoint is unreachable.
R5 Backport Implementation Guide
Recognizing that R5 adoption takes time, HL7 published the Subscriptions R5 Backport Implementation Guide, which allows R4 and R4B servers to support topic-based subscriptions using extensions. This is the recommended path for organizations that want R5 subscription capabilities without waiting for a full FHIR version upgrade. Servers like HAPI FHIR, Smile CDR, and Aidbox already support the backport specification.
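Under the backport IG, an R4 Subscription points at a topic through its criteria element, with filters and payload hints carried as cross-version extensions. A representative shape (illustrative; the topic URL and filter values are placeholders, the extension URLs follow the backport IG):

```json
{
  "resourceType": "Subscription",
  "meta": {
    "profile": ["http://hl7.org/fhir/uv/subscriptions-backport/StructureDefinition/backport-subscription"]
  },
  "status": "requested",
  "reason": "Topic-based subscription on an R4 server",
  "criteria": "http://hospital.org/fhir/SubscriptionTopic/encounter-complete",
  "_criteria": {
    "extension": [{
      "url": "http://hl7.org/fhir/uv/subscriptions-backport/StructureDefinition/backport-filter-criteria",
      "valueString": "Encounter?patient=Patient/123"
    }]
  },
  "channel": {
    "type": "rest-hook",
    "endpoint": "https://alerts.hospital.org/fhir/subscription-hook",
    "payload": "application/fhir+json",
    "_payload": {
      "extension": [{
        "url": "http://hl7.org/fhir/uv/subscriptions-backport/StructureDefinition/backport-payload-content",
        "valueCode": "full-resource"
      }]
    }
  }
}
```

The key move is that criteria holds the canonical topic URL rather than a search string; a backport-aware server recognizes this and applies topic semantics, while the extension on criteria carries the subscriber's filter.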
Architecture Pattern: FHIR Server to Event Bus to Consumers
FHIR Subscriptions define how to get events out of a FHIR server. But a production clinical pipeline needs more than point-to-point webhooks. You need fan-out (one event, many consumers), guaranteed delivery, replay capability, and backpressure handling. That is where an event bus architecture comes in.
The Three-Layer Architecture
Layer 1: FHIR Server (Event Source)
The FHIR server (HAPI FHIR, Smile CDR, Azure Health Data Services, Google Cloud Healthcare API) acts as the system of record and event producer. It evaluates subscription criteria against incoming resource operations and emits notifications via rest-hook channels.
Layer 2: Event Gateway + Message Broker
A lightweight gateway service receives webhook notifications from the FHIR server, validates the payload, enriches the event with metadata (timestamp, source system, correlation ID), and publishes to a message broker. Apache Kafka and RabbitMQ are the two dominant choices here, each with different trade-offs:
- Apache Kafka -- Log-based, ordered, replayable. Best for high-throughput clinical data pipelines where you need event replay, stream processing, and long-term retention. Typical healthcare deployments handle 10,000 to 100,000+ events per minute.
- RabbitMQ -- Queue-based, flexible routing, lower operational complexity. Best for point-to-point notification patterns with moderate throughput. Easier to operate for teams without dedicated platform engineering.
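If you go the Kafka route, topic configuration deserves as much thought as topic naming. A sketch of a helper that derives per-domain settings -- the topic names, retention windows, and partition counts here are illustrative assumptions, not prescriptions, and the resulting objects plug into kafkajs `admin.createTopics`:

```typescript
// Illustrative per-domain Kafka topic settings; tune retention and
// partition counts to your own event volumes and retention policies.
interface TopicSpec {
  topic: string;
  numPartitions: number;
  configEntries: { name: string; value: string }[];
}

function clinicalTopicSpec(domain: "alerts" | "adt" | "analytics"): TopicSpec {
  const retentionMs: Record<string, number> = {
    alerts: 7 * 24 * 3600 * 1000,      // 7 days: short-lived notifications
    adt: 30 * 24 * 3600 * 1000,        // 30 days: care-coordination replay window
    analytics: 365 * 24 * 3600 * 1000, // 1 year: long-term stream reprocessing
  };
  const names: Record<string, string> = {
    alerts: "clinical.alerts.critical-values",
    adt: "clinical.encounters.adt",
    analytics: "clinical.analytics.events",
  };
  return {
    topic: names[domain],
    // Higher partition count where consumer parallelism matters most
    numPartitions: domain === "analytics" ? 12 : 6,
    configEntries: [
      { name: "retention.ms", value: String(retentionMs[domain]) },
      { name: "min.insync.replicas", value: "2" }, // durability for clinical data
    ],
  };
}
```

Keeping this mapping in one place means a new consumer team cannot accidentally create an alerting topic with analytics-grade retention, or vice versa.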
Layer 3: Consumer Services
Independent microservices subscribe to relevant topics/queues and process events. Each consumer has its own processing logic, failure handling, and scaling characteristics:
- Clinical Alerting Service -- Evaluates critical value rules, generates provider notifications (pager, SMS, in-app). Latency target: under 5 seconds.
- Care Coordination Service -- Processes ADT events, updates patient panels, triggers transition-of-care workflows. Latency target: under 30 seconds.
- Analytics Pipeline -- Streams events to a data warehouse for population health analytics and operational dashboards. Latency target: under 5 minutes.
- AI/CDS Service -- Feeds clinical events to machine learning models for real-time clinical decision support. Latency target: under 10 seconds.
This architecture decouples producers from consumers entirely. The FHIR server does not need to know or care how many systems consume its events. New consumers can be added without modifying any existing component. For teams building AI-powered clinical workflows on this foundation, our article on FHIR-first AI healthcare strategy covers how interoperability infrastructure enables agentic AI systems.
Polling vs Push: Why Subscriptions Win
Some teams question whether FHIR Subscriptions are worth the architectural complexity when polling-based approaches "work fine." Here is the honest comparison:
| Factor | Polling (FHIR Search) | FHIR Subscriptions (Push) |
|---|---|---|
| Latency | Polling interval / 2 average (e.g., 5-min poll = 2.5-min avg delay) | Sub-second to seconds. Events delivered as they occur. |
| Server Load | Constant. N clients x polling frequency = continuous query load even when nothing changes. | Proportional to change rate. Zero load when nothing changes. |
| Scalability | Degrades linearly. 100 polling clients at 1-min intervals = 100 queries/min regardless of data changes. | Scales with event volume, not subscriber count. 100 subscribers add near-zero marginal server load. |
| Missed Events | Possible if resource is created and updated between poll intervals. Requires _lastUpdated tracking with no gaps. | Server guarantees delivery (with retry). No gap-tracking needed. |
| Network Efficiency | High bandwidth waste. Most poll responses return "no changes" (HTTP 200 with empty Bundle). | Efficient. Network traffic only on actual events. |
| Implementation Complexity | Low. Simple HTTP GET on a timer. | Moderate. Requires webhook endpoint, subscription management, error handling. |
| State Management | Client must track high-water marks, handle pagination, manage deduplication. | Server manages state. Client processes events as they arrive. |
| Real-Time Suitability | Poor. Aggressive polling (every 5s) is resource-prohibitive at scale. | Excellent. Designed for real-time use cases. |
When polling is acceptable: Low-frequency data sync (daily patient roster updates), systems where the FHIR server does not support subscriptions, and development/testing environments.
When subscriptions are essential: Critical value alerting, real-time care coordination, clinical decision support triggers, bed management, and any workflow where minutes of latency have clinical consequences.
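To make the "State Management" row concrete, here is the bookkeeping a polling client has to own just to avoid gaps -- a sketch using standard FHIR search parameters; the helper names are ours:

```typescript
// What polling forces the client to maintain: a high-water mark that must
// never skip a record, even when several resources share one timestamp.
function buildPollQuery(baseUrl: string, since: string): string {
  // ge (not gt) so same-instant updates are not silently skipped; the
  // client must then deduplicate anything it has already processed.
  const params = new URLSearchParams({
    category: "laboratory",
    _lastUpdated: `ge${since}`,
    _sort: "_lastUpdated",
    _count: "100",
  });
  return `${baseUrl}/Observation?${params.toString()}`;
}

function advanceHighWaterMark(current: string, bundleTimestamps: string[]): string {
  // Only move forward; clock skew or an out-of-order page must never rewind it.
  return bundleTimestamps.reduce((max, t) => (t > max ? t : max), current);
}
```

None of this code exists in a push-based consumer -- the server tracks what has been delivered, and the client simply processes each notification as it arrives.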
Real-World Use Cases
FHIR Subscriptions are not a theoretical improvement. Here are five production use cases where event-driven architecture delivers measurable clinical and operational value.
1. Lab Critical Value Alerts
Subscription Topic: Observation resources with critical flag or values outside reference ranges.
Clinical Impact: A potassium result of 6.8 mEq/L triggers an immediate notification to the ordering physician, charge nurse, and pharmacist within seconds of LIS validation. The College of American Pathologists (CAP) requires critical value notification within 30 minutes; event-driven systems achieve it in under 60 seconds.
Architecture: FHIR server detects new Observation with interpretation=critical, publishes to Kafka topic clinical.alerts.critical-values, alert service consumes event, applies escalation rules, and dispatches via the most appropriate channel (pager for inpatient, SMS for outpatient). For teams setting up alerting infrastructure, our guide on alerting in healthcare systems with PagerDuty and FHIR covers the operational runbook side.
2. ADT Notifications for Care Coordination
Subscription Topic: Encounter resources with status transitions (admitted, transferred, discharged).
Clinical Impact: CMS Interoperability and Prior Authorization final rule (CMS-0057-F) requires real-time ADT notifications. Event-driven ADT processing enables same-hour care transition outreach instead of next-day follow-up, reducing 30-day readmissions by 10-20% according to multiple health system case studies.
Architecture: SubscriptionTopic monitors Encounter status changes, filters by care team assignment, notifies care managers, primary care practices, and health plan care coordinators simultaneously via fan-out.
3. Prescription Alerts for Pharmacy
Subscription Topic: MedicationRequest resources created or modified.
Clinical Impact: New prescriptions immediately trigger drug-drug interaction checks, formulary verification, and prior authorization workflows. Eliminates the 15-30 minute polling delay that exists in most pharmacy information systems today.
Architecture: FHIR subscription triggers on new MedicationRequest, event enriched with patient's current medication list, published to pharmacy processing pipeline. CDS engine evaluates interactions in real time before the prescription reaches the pharmacy queue.
4. Real-Time Bed Management
Subscription Topic: Location or Encounter resources reflecting bed status changes.
Operational Impact: Discharge events immediately update bed availability dashboards. Environmental services are notified for room turnover. Admitting can assign incoming patients to clean beds without manual status checking. Hospitals with real-time bed management report 15-25% improvement in bed turnover time.
5. AI and Clinical Decision Support Triggers
Subscription Topic: Configurable based on CDS rule requirements -- vital signs, lab results, medication orders, or composite events.
Clinical Impact: Real-time event streams feed machine learning models for sepsis prediction, deterioration detection, and clinical pathway adherence. A sepsis prediction model that receives vital signs 30 minutes earlier can improve detection sensitivity by 15-20%. This is the foundation for the AI-powered clinical workflows we describe in our FHIR-first AI strategy guide.
Building a FHIR Subscription Service
Let us build a production-grade FHIR Subscription service in TypeScript/Node.js. This service registers subscriptions with a FHIR server, receives webhook notifications, processes clinical events, and publishes to an event bus.
Step 1: Register a Subscription
import axios, { AxiosInstance } from "axios";
interface SubscriptionConfig {
fhirBaseUrl: string;
webhookEndpoint: string;
bearerToken: string;
criteria: string; // R4 criteria string
topic?: string; // R5 topic canonical URL
filterBy?: Array<{
resourceType: string;
filterParameter: string;
value: string;
}>;
}
class FHIRSubscriptionManager {
private client: AxiosInstance;
private config: SubscriptionConfig;
constructor(config: SubscriptionConfig) {
this.config = config;
this.client = axios.create({
baseURL: config.fhirBaseUrl,
headers: {
"Content-Type": "application/fhir+json",
Authorization: `Bearer ${config.bearerToken}`,
},
});
}
// Register an R4 criteria-based subscription
async registerR4Subscription(): Promise<string> {
const subscription = {
resourceType: "Subscription",
status: "requested",
reason: "Real-time clinical event monitoring",
criteria: this.config.criteria,
channel: {
type: "rest-hook",
endpoint: this.config.webhookEndpoint,
payload: "application/fhir+json",
header: [`Authorization: Bearer ${this.config.bearerToken}`],
},
};
const response = await this.client.post("/Subscription", subscription);
// The Location header is a full URL (e.g. .../Subscription/{id}/_history/1);
// extract the logical id so subsequent status reads resolve correctly.
const location = response.headers["location"] as string | undefined;
const subscriptionId =
  location?.match(/Subscription\/([^/]+)/)?.[1] || response.data.id;
console.log(`Subscription registered: ${subscriptionId}`);
// Poll until active (server validates and activates)
await this.waitForActivation(subscriptionId);
return subscriptionId;
}
// Register an R5 topic-based subscription
async registerR5Subscription(): Promise<string> {
const subscription = {
resourceType: "Subscription",
status: "requested",
topic: this.config.topic,
reason: "Real-time clinical event monitoring",
filterBy: this.config.filterBy || [],
channelType: {
system: "http://terminology.hl7.org/CodeSystem/subscription-channel-type",
code: "rest-hook",
},
endpoint: this.config.webhookEndpoint,
heartbeatPeriod: 120,
timeout: 30,
contentType: "application/fhir+json",
content: "full-resource",
};
const response = await this.client.post("/Subscription", subscription);
return response.data.id;
}
// Wait for server to activate the subscription
private async waitForActivation(
subscriptionId: string,
maxAttempts: number = 10
): Promise<void> {
for (let i = 0; i < maxAttempts; i++) {
const response = await this.client.get(
`/Subscription/${subscriptionId}`
);
if (response.data.status === "active") {
console.log(`Subscription ${subscriptionId} is active`);
return;
}
if (response.data.status === "error") {
throw new Error(
`Subscription failed: ${response.data.error || "Unknown error"}`
);
}
await new Promise((resolve) => setTimeout(resolve, 1000));
}
throw new Error("Subscription activation timed out");
}
// Check subscription health
async getSubscriptionStatus(subscriptionId: string): Promise<string> {
const response = await this.client.get(
`/Subscription/${subscriptionId}`
);
return response.data.status;
}
// Deactivate subscription gracefully
async deactivateSubscription(subscriptionId: string): Promise<void> {
await this.client.patch(
`/Subscription/${subscriptionId}`,
[{ op: "replace", path: "/status", value: "off" }],
// JSON Patch requires its own media type, not application/fhir+json
{ headers: { "Content-Type": "application/json-patch+json" } }
);
console.log(`Subscription ${subscriptionId} deactivated`);
}
}
Step 2: Webhook Receiver and Event Processing
import express, { Request, Response } from "express";
import { Kafka, Producer } from "kafkajs";
import crypto from "crypto";
// --- Kafka Producer Setup ---
const kafka = new Kafka({
clientId: "fhir-subscription-gateway",
brokers: [process.env.KAFKA_BROKERS || "localhost:9092"],
ssl: true,
sasl: {
mechanism: "scram-sha-256",
username: process.env.KAFKA_USERNAME || "",
password: process.env.KAFKA_PASSWORD || "",
},
});
const producer: Producer = kafka.producer({
idempotent: true, // No duplicate writes on producer retry
maxInFlightRequests: 5,
transactionalId: "fhir-events", // Enable transactions
});
// --- Idempotency Store (Redis-backed in production) ---
const processedEvents = new Set<string>();
function generateEventId(bundle: any): string {
const timestamp = bundle.meta?.lastUpdated || new Date().toISOString();
const content = JSON.stringify(bundle.entry?.[0]?.resource || bundle);
return crypto.createHash("sha256").update(`${timestamp}:${content}`).digest("hex");
}
// --- Event Classification ---
interface ClinicalEvent {
eventId: string;
eventType: string;
resourceType: string;
resourceId: string;
topic: string; // Kafka topic
priority: "critical" | "high" | "normal" | "low";
timestamp: string;
payload: any;
metadata: {
subscriptionId: string;
fhirServer: string;
notificationType: string;
};
}
function classifyEvent(bundle: any, subscriptionId: string): ClinicalEvent {
const entry = bundle.entry?.[1]; // First entry after SubscriptionStatus
const resource = entry?.resource;
const resourceType = resource?.resourceType || "Unknown";
const resourceId = resource?.id || "unknown";
// Determine clinical priority
let priority: ClinicalEvent["priority"] = "normal";
if (resourceType === "Observation") {
const interpretation = resource.interpretation?.[0]?.coding?.[0]?.code;
if (["critical", "HH", "LL", "AA"].includes(interpretation)) {
priority = "critical";
}
}
// Map to Kafka topics based on resource type and priority
const topicMap: Record<string, string> = {
Observation: priority === "critical"
? "clinical.alerts.critical-values"
: "clinical.observations",
Encounter: "clinical.encounters.adt",
MedicationRequest: "clinical.pharmacy.orders",
DiagnosticReport: "clinical.diagnostics.reports",
Condition: "clinical.conditions",
};
return {
eventId: generateEventId(bundle),
eventType: `${resourceType}.${entry?.request?.method || "update"}`,
resourceType,
resourceId,
topic: topicMap[resourceType] || "clinical.events.unclassified",
priority,
timestamp: new Date().toISOString(),
payload: resource,
metadata: {
subscriptionId,
fhirServer: bundle.meta?.source || "unknown",
notificationType: bundle.type || "subscription-notification",
},
};
}
// --- Express Webhook Server ---
const app = express();
app.use(express.json({ limit: "1mb", type: "application/fhir+json" }));
// Liveness probe; the R5 handshake itself arrives as a POSTed Bundle (handled below)
app.get("/webhooks/fhir/:subscriptionId", (req: Request, res: Response) => {
console.log(`Handshake received for subscription: ${req.params.subscriptionId}`);
res.status(200).send("OK");
});
// Main webhook handler
app.post("/webhooks/fhir/:subscriptionId", async (req: Request, res: Response) => {
const startTime = Date.now();
const subscriptionId = req.params.subscriptionId;
const bundle = req.body;
try {
// 1. Validate notification bundle
if (bundle.resourceType !== "Bundle") {
res.status(400).json({ error: "Expected a FHIR Bundle" });
return;
}
// 2. Check for handshake notification (empty notification)
const subscriptionStatus = bundle.entry?.[0]?.resource;
if (subscriptionStatus?.type === "handshake") {
console.log(`Handshake confirmed for ${subscriptionId}`);
res.status(200).send("OK");
return;
}
// 3. Idempotency check
const eventId = generateEventId(bundle);
if (processedEvents.has(eventId)) {
console.log(`Duplicate event skipped: ${eventId}`);
res.status(200).send("OK"); // Return 200 to prevent server retry
return;
}
// 4. Classify and enrich event
const event = classifyEvent(bundle, subscriptionId);
// 5. Publish to Kafka with transactional guarantees
const transaction = await producer.transaction();
try {
await transaction.send({
topic: event.topic,
messages: [
{
key: `${event.resourceType}/${event.resourceId}`,
value: JSON.stringify(event),
headers: {
"event-id": event.eventId,
"event-type": event.eventType,
priority: event.priority,
"content-type": "application/json",
},
},
],
});
await transaction.commit();
} catch (kafkaError) {
await transaction.abort();
throw kafkaError;
}
// 6. Mark as processed
processedEvents.add(eventId);
const processingTime = Date.now() - startTime;
console.log(
`Event processed: ${event.eventType} [${event.priority}] in ${processingTime}ms`
);
// Return 200 to acknowledge receipt
res.status(200).json({
eventId: event.eventId,
processingTimeMs: processingTime,
});
} catch (error) {
console.error(`Webhook processing failed for ${subscriptionId}:`, error);
// Return 500 so the FHIR server retries delivery
res.status(500).json({ error: "Processing failed, will retry" });
}
});
// --- Start Server ---
async function start() {
await producer.connect();
app.listen(3000, () => {
console.log("FHIR Subscription webhook server running on :3000");
});
}
start().catch(console.error);
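One production caveat on the gateway above: the in-memory `processedEvents` Set loses state on restart and cannot be shared across replicas. In production you would back the idempotency check with a shared store. A sketch against a generic key-value interface -- the ioredis wiring shown in the comment is an assumption about your infrastructure, and the in-memory implementation is for tests and single-node development:

```typescript
// Shared idempotency store: first writer wins, duplicates are rejected.
interface KeyValueStore {
  // Returns true if the key was newly set, false if it already existed.
  setIfAbsent(key: string, ttlSeconds: number): Promise<boolean>;
}

// In-memory implementation with expiry (tests / single-node dev only).
class MemoryStore implements KeyValueStore {
  private keys = new Map<string, number>();
  async setIfAbsent(key: string, ttlSeconds: number): Promise<boolean> {
    const now = Date.now();
    const expiry = this.keys.get(key);
    if (expiry !== undefined && expiry > now) return false;
    this.keys.set(key, now + ttlSeconds * 1000);
    return true;
  }
}

// A Redis-backed implementation would wrap SET ... EX ttl NX, e.g. with ioredis:
//   const ok = await redis.set(`fhir-event:${key}`, "1", "EX", ttlSeconds, "NX");
//   return ok === "OK";

async function isFirstDelivery(store: KeyValueStore, eventId: string): Promise<boolean> {
  // 24h TTL: long enough to outlast the FHIR server's full retry window.
  return store.setIfAbsent(`fhir-event:${eventId}`, 24 * 3600);
}
```

The TTL matters: the store only needs to remember an event for as long as the FHIR server might redeliver it, so keys can expire instead of growing without bound.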
Step 3: Consumer Service Example
import { Kafka, Consumer, EachMessagePayload } from "kafkajs";
const kafka = new Kafka({
clientId: "critical-alert-consumer",
brokers: [process.env.KAFKA_BROKERS || "localhost:9092"],
});
const consumer: Consumer = kafka.consumer({
groupId: "clinical-alerting-service",
sessionTimeout: 30000,
heartbeatInterval: 3000,
});
interface AlertChannel {
type: "pager" | "sms" | "secure-message" | "dashboard";
recipient: string;
}
async function processAlertEvent(event: any): Promise<void> {
const resource = event.payload;
const resourceType = event.resourceType;
if (resourceType === "Observation" && event.priority === "critical") {
// Extract critical value details
const value = resource.valueQuantity?.value;
const unit = resource.valueQuantity?.unit;
const code = resource.code?.coding?.[0]?.display || resource.code?.text;
const patientRef = resource.subject?.reference;
// Determine alert channels based on clinical context
const channels: AlertChannel[] = [
{ type: "pager", recipient: resource.performer?.[0]?.reference || "on-call" },
{ type: "dashboard", recipient: "charge-nurse-station" },
];
// Dispatch alerts
for (const channel of channels) {
await dispatchAlert({
severity: "CRITICAL",
message: `Critical ${code}: ${value} ${unit} for patient ${patientRef}`,
channel,
eventId: event.eventId,
timestamp: event.timestamp,
});
}
console.log(
`Critical alert dispatched: ${code} = ${value} ${unit} via ${channels.length} channels`
);
}
}
async function dispatchAlert(alert: any): Promise<void> {
// Integration with PagerDuty, Twilio, or internal messaging
// Implementation depends on your alerting infrastructure
console.log(`ALERT [${alert.severity}]: ${alert.message} -> ${alert.channel.type}`);
}
async function startConsumer(): Promise<void> {
await consumer.connect();
await consumer.subscribe({
topic: "clinical.alerts.critical-values",
fromBeginning: false,
});
await consumer.run({
eachMessage: async ({ topic, partition, message }: EachMessagePayload) => {
const event = JSON.parse(message.value?.toString() || "{}");
try {
await processAlertEvent(event);
} catch (error) {
console.error(`Failed to process alert event ${event.eventId}:`, error);
// Dead letter queue handling would go here
}
},
});
console.log("Critical alert consumer running");
}
startConsumer().catch(console.error);
For teams building monitoring dashboards around these pipelines, our guide on healthcare integration dashboard metrics covers the seven key metrics every interface team should track.
Cloud-Native Event Architecture
Running your own Kafka cluster is powerful but operationally demanding. Cloud-native event services can simplify the broker layer while adding healthcare-specific capabilities like HIPAA-eligible infrastructure, managed encryption, and built-in audit logging.
| Capability | AWS EventBridge | Azure Event Grid | GCP Pub/Sub |
|---|---|---|---|
| Architecture Model | Serverless event bus with rules-based routing | Topic/subscription with push delivery | Topic/subscription with push and pull delivery |
| HIPAA Eligible | Yes (with BAA) | Yes (with BAA) | Yes (with BAA) |
| Max Message Size | 256 KB | 1 MB (Cloud Events schema) | 10 MB |
| Throughput | Soft-limited; default PutEvents quotas are in the thousands of events/sec per region (raisable) | Advertised up to 10 million events/sec per region | Effectively unlimited with auto-scaling |
| Message Retention | 24 hours (archive for replay) | 24 hours | 31 days (configurable) |
| Ordering Guarantees | No native ordering | No ordering guarantee | Ordered delivery with ordering keys |
| Dead Letter Support | Yes (DLQ with SQS) | Yes (built-in dead-lettering) | Yes (dead-letter topics) |
| Native FHIR Integration | Via AWS HealthLake events | Via Azure Health Data Services | Via Cloud Healthcare API Pub/Sub notifications |
| Encryption at Rest | AWS KMS | Microsoft-managed or CMK | Google-managed or CMEK |
| Pricing Model | $1.00 per million events published | $0.60 per million operations | $40 per TiB ingested + $20 per TiB delivered (at volume) |
| Best For | AWS-native healthcare stacks with HealthLake | Microsoft/Epic healthcare shops using Azure Health Data Services | High-throughput clinical data pipelines, multi-cloud |
Healthcare-Specific Considerations
Google Cloud Healthcare API has the tightest native FHIR integration. It can emit Pub/Sub notifications directly when FHIR resources are created, updated, or deleted -- no separate subscription management needed. You configure a FHIR store to publish to a Pub/Sub topic, and every resource operation automatically generates an event.
Azure Health Data Services integrates with Event Grid through its FHIR service, enabling event-driven workflows that trigger Logic Apps, Azure Functions, or custom webhook endpoints when FHIR data changes. This is particularly compelling for health systems already invested in the Microsoft ecosystem (Epic on Azure, Teams for clinical communication).
AWS HealthLake supports change data capture that can feed into EventBridge rules, enabling routing to Lambda functions, Step Functions, or SQS queues based on resource type and content. For teams already using Mirth Connect as their integration engine, our guide on change data capture for healthcare data warehouses covers how CDC patterns complement event-driven architectures.
Error Handling and Reliability
In healthcare, a lost event can mean a missed critical alert. Reliability engineering for clinical event pipelines requires a different standard than typical web application event systems.
Retry Strategies
When a webhook delivery fails (network timeout, 5xx response, connection refused), the FHIR server should retry with exponential backoff:
```typescript
// Retry configuration for healthcare-grade reliability
const retryConfig = {
  maxRetries: 8,
  baseDelayMs: 1000, // 1 second initial delay
  maxDelayMs: 300000, // 5-minute maximum delay
  backoffMultiplier: 2, // Exponential: 1s, 2s, 4s, 8s, 16s, 32s, 64s, 128s
  jitterFactor: 0.1, // +/- 10% random jitter to prevent thundering herd

  // Calculate delay for attempt N
  getDelay(attempt: number): number {
    const exponentialDelay = this.baseDelayMs * Math.pow(this.backoffMultiplier, attempt);
    const clampedDelay = Math.min(exponentialDelay, this.maxDelayMs);
    const jitter = clampedDelay * this.jitterFactor * (Math.random() * 2 - 1);
    return Math.round(clampedDelay + jitter);
  },

  // Total retry window: approximately 4.25 minutes before DLQ
  getTotalWindow(): number {
    let total = 0;
    for (let i = 0; i < this.maxRetries; i++) {
      total += this.baseDelayMs * Math.pow(this.backoffMultiplier, i);
    }
    return total;
  },
};
```
Dead Letter Queues
Events that exhaust all retry attempts must go somewhere -- never silently dropped. A dead letter queue (DLQ) captures failed events for manual investigation and replay:
```typescript
interface DeadLetterEntry {
  originalEvent: ClinicalEvent;
  failureReason: string;
  failureTimestamp: string;
  retryCount: number;
  lastAttemptError: string;
  requiresManualReview: boolean;
}

async function sendToDeadLetterQueue(
  event: ClinicalEvent,
  error: Error,
  retryCount: number
): Promise<void> {
  const dlqEntry: DeadLetterEntry = {
    originalEvent: event,
    failureReason: error.message,
    failureTimestamp: new Date().toISOString(),
    retryCount,
    lastAttemptError: error.stack || error.message,
    requiresManualReview: event.priority === "critical",
  };

  // Publish to DLQ topic
  await producer.send({
    topic: "clinical.events.dead-letter",
    messages: [
      {
        key: event.eventId,
        value: JSON.stringify(dlqEntry),
        headers: {
          "original-topic": event.topic,
          priority: event.priority,
          "failure-reason": error.message,
        },
      },
    ],
  });

  // CRITICAL: Alert on-call if this is a clinical alert that failed delivery
  if (event.priority === "critical") {
    await triggerPagerDutyIncident({
      severity: "high",
      summary: `Critical clinical event delivery failed: ${event.eventType}`,
      details: dlqEntry,
      service: "fhir-subscription-gateway",
    });
  }
}
```
Idempotency
FHIR servers may deliver the same notification multiple times (at-least-once delivery). Every consumer must handle duplicates gracefully:
- Use the event ID (derived from resource version + timestamp) as a deduplication key
- Store processed event IDs in Redis with a TTL matching the subscription's maximum retry window
- Design all downstream operations to be idempotent -- sending the same alert twice should not page a physician twice
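The dedup check described above can be sketched as a small guard class. This in-memory version shows the contract; in production the map would live in Redis (`SET key NX EX ttl`) so that all consumer instances share one view. The class and method names are illustrative:

```typescript
// Sketch: at-least-once dedup guard for webhook consumers.
// Production version would back this with Redis SET NX + TTL.
class DedupGuard {
  private seen = new Map<string, number>(); // eventId -> expiry (epoch ms)

  constructor(private ttlMs: number) {}

  // Returns true exactly once per eventId within the TTL window.
  firstDelivery(eventId: string, now = Date.now()): boolean {
    const expiry = this.seen.get(eventId);
    if (expiry !== undefined && expiry > now) return false; // duplicate
    this.seen.set(eventId, now + this.ttlMs);
    return true;
  }
}
```

A consumer then wraps every downstream side effect in `if (guard.firstDelivery(event.eventId)) { ... }`, so redelivered notifications fall through harmlessly.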
What Happens When an Alert Is Lost?
This is the question that keeps healthcare integration engineers awake at night. The mitigation strategy is defense in depth:
- Primary path: FHIR Subscription delivers event in real time
- Secondary path: Polling-based reconciliation job runs every 5 minutes, catches any events missed by the subscription
- Tertiary path: End-of-shift report compares critical results generated vs. critical alerts delivered, flags any gaps
- Monitoring: Alerting on subscription error rate, delivery latency percentiles, and DLQ depth triggers on-call escalation within minutes of a delivery failure pattern
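The secondary path's core comparison is simple set arithmetic: resources the FHIR server says changed in the window, minus events the pipeline actually processed. A minimal sketch (the FHIR search in the comment, e.g. `_lastUpdated=gt...`, is a standard R4 search parameter; the function name is illustrative):

```typescript
// Sketch: reconciliation gap check. `serverIds` would come from a FHIR
// search such as GET /Observation?_lastUpdated=gt<windowStart>;
// `processedIds` comes from the consumer's dedup store.
function findMissedEvents(
  serverIds: Iterable<string>,
  processedIds: Set<string>
): string[] {
  const missed: string[] = [];
  for (const id of serverIds) {
    if (!processedIds.has(id)) missed.push(id);
  }
  return missed;
}
```

A non-empty return value feeds both the `reconciliation.gap_count` metric and a replay job that re-emits the missed resources through the normal pipeline.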
Monitoring and Observability
An event-driven system you cannot observe is an event-driven system you cannot trust. Healthcare-grade observability requires monitoring at every layer of the pipeline.
Subscription Health Metrics
| Metric | What It Measures | Alert Threshold | Why It Matters |
|---|---|---|---|
| `subscription.status` | Current status of each subscription (active/error/off) | Any status != active | An errored subscription means events are being silently dropped |
| `webhook.delivery.latency_p99` | 99th percentile webhook delivery time | > 5 seconds | Delivery latency directly impacts clinical alert timeliness |
| `webhook.delivery.failure_rate` | Percentage of webhook deliveries returning non-2xx | > 1% over 5 minutes | Rising failure rate indicates endpoint degradation |
| `event.processing.throughput` | Events processed per second across all topics | Drop > 50% from baseline | Throughput collapse may indicate upstream FHIR server issues |
| `kafka.consumer.lag` | Messages waiting to be consumed per consumer group | > 1000 messages for critical topics | Growing lag means consumers cannot keep up with event volume |
| `dlq.depth` | Number of events in dead letter queue | > 0 for critical-priority events | Any critical event in DLQ requires immediate investigation |
| `event.end_to_end.latency` | Time from FHIR resource save to consumer processing complete | > 30 seconds for critical alerts | End-to-end latency is the metric that maps to clinical outcomes |
| `reconciliation.gap_count` | Events detected by reconciliation but missed by subscription | > 0 | Non-zero gap count indicates subscription delivery problems |
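Encoding these thresholds as data keeps the alerting logic in one place regardless of where the metrics come from. A sketch under that assumption (the metric source, Prometheus, CloudWatch, or otherwise, is out of scope, and the names mirror the table above):

```typescript
// Sketch: the alert thresholds from the table, expressed as data so a
// single evaluator covers every metric. Values are assumed to arrive
// as a flat snapshot keyed by metric name.
type Check = { metric: string; breached: (value: number) => boolean };

const checks: Check[] = [
  { metric: "webhook.delivery.latency_p99", breached: (v) => v > 5000 }, // ms
  { metric: "webhook.delivery.failure_rate", breached: (v) => v > 0.01 },
  { metric: "kafka.consumer.lag", breached: (v) => v > 1000 },
  { metric: "dlq.depth.critical", breached: (v) => v > 0 },
  { metric: "reconciliation.gap_count", breached: (v) => v > 0 },
];

function breachedMetrics(snapshot: Record<string, number>): string[] {
  return checks
    .filter((c) => snapshot[c.metric] !== undefined && c.breached(snapshot[c.metric]))
    .map((c) => c.metric);
}
```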
Structured Logging for Event Tracing
```typescript
import pino from "pino";

const logger = pino({
  level: "info",
  formatters: {
    level: (label) => ({ level: label }),
  },
  serializers: {
    event: (event: ClinicalEvent) => ({
      eventId: event.eventId,
      eventType: event.eventType,
      resourceType: event.resourceType,
      resourceId: event.resourceId,
      priority: event.priority,
      topic: event.topic,
      // Never log PHI in plain text -- reference IDs only
      patientRef: event.payload?.subject?.reference || "unknown",
    }),
  },
});

// Usage in webhook handler
logger.info(
  {
    event: classifiedEvent,
    subscriptionId,
    processingTimeMs: Date.now() - startTime,
    kafkaPartition: result.partition,
    kafkaOffset: result.offset,
  },
  "FHIR subscription event processed successfully"
);
```
The logging pattern above deliberately avoids recording any Protected Health Information (PHI) in log entries. Only resource references (e.g., Patient/12345) appear in logs, never names, dates of birth, or clinical values. This is critical for HIPAA compliance in your logging and observability infrastructure.
For a deeper look at production monitoring patterns for healthcare integration systems, see our guide on what reliable monitoring looks like in production.
Production Deployment Checklist
Before going live with a FHIR Subscription pipeline in a clinical environment, validate every item on this checklist:
- Endpoint Security: Webhook endpoint uses TLS 1.2+ with a valid certificate. The authentication header carries either a non-expiring service credential or a token whose refresh is automated.
- Subscription Recovery: If the FHIR server restarts, subscriptions auto-recover to active status. Test this explicitly -- some servers require re-registration after restart.
- Backpressure Handling: If the webhook endpoint is slow or unavailable, the FHIR server queues notifications rather than dropping them. Verify the server's queue depth and retention policy.
- Idempotency Verified: Replay the same notification 3 times and confirm the consumer produces exactly one downstream effect.
- DLQ Monitoring: Dead letter queue has alerting configured. On-call team has a documented runbook for DLQ triage and event replay.
- Reconciliation Job Active: Polling-based reconciliation runs on schedule and alerts on detected gaps.
- PHI Logging Audit: Review all log statements to confirm no PHI appears in plain text. Resource references only.
- Load Testing Completed: Tested at 2x expected peak throughput. Confirmed no message loss under sustained load.
- Failover Tested: Simulated Kafka broker failure, webhook endpoint outage, and FHIR server restart. Confirmed zero event loss across all failure scenarios.
- Clinical Stakeholder Sign-off: Clinical informatics team has validated that alert routing rules match clinical expectations and escalation policies.
Frequently Asked Questions
Do all FHIR servers support Subscriptions?
No. Subscription support varies significantly across FHIR server implementations. HAPI FHIR (open source), Smile CDR, IBM FHIR Server, and Google Cloud Healthcare API support R4 Subscriptions. For R5 topic-based subscriptions, HAPI FHIR 6.x+, Smile CDR, and Aidbox have implemented the specification. Microsoft Azure Health Data Services supports FHIR event notifications through its own eventing mechanism rather than native FHIR Subscriptions. Always check your server's CapabilityStatement resource (GET /metadata) for the Subscription resource in rest.resource to confirm support.
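That CapabilityStatement check can be automated. A minimal sketch (the interface types only the fields this check needs; the real R4 resource is much larger):

```typescript
// Sketch: check a CapabilityStatement (from GET [base]/metadata) for
// Subscription support. Shape follows FHIR R4; only the fields needed
// for this check are typed.
interface CapabilityStatement {
  rest?: Array<{ resource?: Array<{ type: string }> }>;
}

function supportsSubscriptions(cap: CapabilityStatement): boolean {
  return (cap.rest ?? []).some((r) =>
    (r.resource ?? []).some((res) => res.type === "Subscription")
  );
}
```

In practice you would fetch `${baseUrl}/metadata` with an `Accept: application/fhir+json` header, parse the body, and run this check before attempting to register any subscriptions.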
Can I use FHIR Subscriptions with an existing HL7v2 interface engine like Mirth Connect?
Yes. The typical pattern is to place a FHIR facade or integration layer between your HL7v2 systems and a FHIR server. The FHIR server handles subscription management and notification delivery. Mirth Connect can act as the webhook consumer, receiving FHIR notifications and transforming them into HL7v2 messages for downstream systems that have not migrated to FHIR. This hybrid approach lets you adopt event-driven architecture incrementally without replacing your entire integration stack.
How do FHIR Subscriptions handle high-availability and failover?
At the FHIR server level, subscription state is typically persisted in the server's database, so it survives server restarts. For the consumer side, running multiple webhook endpoint instances behind a load balancer provides availability. The idempotency mechanisms described in this guide ensure that if both instances receive a notification (due to load balancer behavior), the event is processed exactly once. For the event bus layer, Kafka's replication factor and consumer group rebalancing provide automatic failover.
What is the maximum throughput for FHIR Subscriptions?
Throughput depends entirely on the FHIR server implementation and the webhook delivery mechanism. HAPI FHIR can handle thousands of subscription evaluations per second on modest hardware. The bottleneck is typically not the subscription evaluation but the webhook delivery -- HTTP POST calls to external endpoints are inherently limited by network latency and endpoint processing speed. Using an event bus architecture (FHIR server posts to a local gateway, gateway publishes to Kafka) removes the webhook delivery bottleneck from the critical path.
Are FHIR Subscriptions HIPAA compliant?
FHIR Subscriptions are a specification, not a product, so HIPAA compliance depends on your implementation. Key requirements: webhook endpoints must use TLS encryption in transit, the FHIR server and all consumer services must be covered by Business Associate Agreements (BAAs), notification payloads containing PHI must be encrypted at rest in any message broker or queue, audit logs must capture all subscription creation, modification, and notification delivery events, and access to subscription management must be restricted to authorized systems. Using content: "id-only" in your subscription (rather than "full-resource") minimizes PHI exposure in transit by sending only resource references rather than full clinical data.
Should I use R4 or R5 Subscriptions for a new project starting in 2026?
If your FHIR server supports R5 or the R5 Backport IG, use the topic-based model. The architecture is cleaner, the server advertises available topics (so clients can discover what is subscribable), and the filter mechanism is more powerful. If your server only supports R4, use R4 criteria-based subscriptions but structure your consumer architecture as described in this guide -- the event bus pattern is the same regardless of the subscription model, so migrating from R4 to R5 subscriptions later only requires changing the subscription registration code, not rearchitecting your pipeline.
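To make the "only the registration code changes" point concrete, here is a sketch of an R4 registration body builder. The resource structure follows the R4 Subscription specification; the `criteria` string and endpoint shown in comments are illustrative:

```typescript
// Sketch: build an R4 Subscription resource for a rest-hook channel.
// A real deployment would also set channel.header entries for
// authenticating the webhook endpoint.
function buildR4Subscription(criteria: string, endpoint: string) {
  return {
    resourceType: "Subscription",
    status: "requested", // the server flips this to "active" once verified
    reason: "Critical lab result notifications",
    criteria, // e.g. "Observation?status=final" -- illustrative
    channel: {
      type: "rest-hook",
      endpoint,
      payload: "application/fhir+json",
    },
  };
}
```

Under R5 or the Backport IG, this builder is the piece you swap: the body references a SubscriptionTopic and filter criteria instead of a raw search string, while everything downstream of the webhook stays the same.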
Moving From Batch to Real-Time
The shift from batch-processing to event-driven architecture in healthcare is not a technology upgrade -- it is a clinical capability upgrade. When a critical potassium result triggers an alert in 4 seconds instead of 46 minutes, the downstream impact is not measured in system metrics but in patient outcomes. When an ADT notification reaches a care coordinator in real time instead of the next morning, the window for preventing a readmission opens wide enough to actually act.
FHIR Subscriptions provide the standardized mechanism. Event bus architectures provide the reliability and scale. Cloud-native services reduce the operational burden. And the patterns described in this guide -- idempotent processing, dead letter queues, reconciliation safety nets, defense-in-depth reliability -- provide the healthcare-grade guarantees that clinical workflows demand.
The batch-processing era served healthcare for decades. But the clinical workflows of 2026 and beyond -- real-time CDS, AI-driven deterioration detection, same-hour care transitions -- require an architecture that moves at the speed of clinical need.
At Nirmitee, we build event-driven healthcare integration systems that connect FHIR servers, EHRs, and clinical applications with the real-time reliability that patient care demands. If your organization is ready to move beyond batch processing, let's talk about your architecture.