Why Polling Is Killing Your EHR Integration
Most healthcare integrations today rely on polling—periodically querying a FHIR server or database for changes. A downstream system asks "anything new?" every 30 seconds, 5 minutes, or 15 minutes. This approach has three fundamental problems: it wastes resources when nothing has changed, it introduces latency proportional to the polling interval, and it does not scale when dozens of consumers all poll the same source.
Event-driven architecture solves all three. Instead of consumers asking for changes, the source system publishes events when data changes occur. Consumers subscribe to the events they care about and receive notifications in real time. For healthcare systems where a missed lab result or delayed medication alert can affect patient safety, the difference between 15-minute polling lag and sub-second event delivery is clinically significant.
This guide covers a production architecture that combines FHIR Subscriptions as the event source, Apache Kafka as the event backbone, and multiple consumer patterns for clinical dashboards, analytics pipelines, AI agents, and notification services. We include Kafka topic design, consumer group patterns, and exactly-once delivery semantics for clinical events.
Architecture Overview: FHIR Subscriptions + Kafka
The architecture has three layers: event sources that detect and publish changes, an event backbone that routes and stores events durably, and event consumers that process events for specific use cases.
Event Sources
Two primary mechanisms detect changes in healthcare data:
- FHIR Subscriptions (R5 Topic-Based): The FHIR server itself detects resource changes and sends notifications to subscribers. FHIR R5 introduced topic-based subscriptions, which replace the R4 model of per-subscription search criteria with a more scalable approach. Each SubscriptionTopic defines what triggers a notification (resource type, interaction, filter criteria), and each Subscription expresses a consumer's interest in a specific topic.
- Change Data Capture (CDC): For systems where the FHIR server does not support R5 subscriptions, CDC captures changes directly from the database. Tools like Debezium monitor the database transaction log and emit events for every insert, update, or delete, bypassing the FHIR API entirely.
Event Backbone: Apache Kafka
Kafka serves as the central nervous system of the event-driven architecture. Every clinical event flows through Kafka, which provides durable storage, ordered delivery, and the ability for multiple consumers to process the same event independently.
| Kafka Feature | Healthcare Benefit |
|---|---|
| Durable storage | Clinical events are never lost, even if consumers are temporarily down |
| Ordered delivery | Events for the same patient arrive in the order they occurred |
| Consumer groups | Multiple services process the same events independently |
| Replay capability | Re-process historical events when deploying new analytics or fixing bugs |
| Partitioning | Scale horizontally by partitioning events by patient ID |
Event Consumers
Downstream consumers subscribe to Kafka topics and process events for specific use cases:
- Clinical dashboards: Real-time updates when new lab results, vitals, or orders arrive.
- Notification services: Push alerts to clinicians when critical values are detected.
- Analytics pipelines: Stream events to data warehouses for population health analytics.
- AI/ML agents: Feed clinical events to inference services for sepsis prediction, readmission risk, and drug interaction alerts.
- Audit logging: Record every data access and modification for compliance.
Kafka Topic Design for Healthcare
Topic design is one of the most important architectural decisions. In healthcare, the natural topic structure maps to FHIR resource types, with patient ID as the partition key.
Recommended Topic Structure
# Topic naming convention
fhir.events.{resource-type}
# Examples:
fhir.events.observation # Lab results, vitals, assessments
fhir.events.condition # Diagnoses, problems
fhir.events.medicationrequest # Medication orders
fhir.events.encounter # Admissions, discharges, transfers
fhir.events.procedure # Completed procedures
fhir.events.diagnosticreport # Radiology, pathology reports
fhir.events.allergyintolerance # Allergy updates
fhir.events.servicerequest # Orders, referrals
# Partition key: Patient ID (ensures ordering per patient)
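# Retention: 30 days for operational events, 7 years for audit events
Partitioning by patient ID guarantees that all events for the same patient within a topic land in a single partition and are consumed in order. This is critical for clinical correctness: if an order is placed and then cancelled, the cancellation event must be processed after the order event, never before. Ordering holds per topic, so events for one patient in different topics (for example an Observation and an Encounter) are not ordered relative to each other.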
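The effect of keying by patient ID can be illustrated with a small sketch. It is not Kafka's actual partitioner: the Java client's default partitioner hashes the key with murmur2, while this stand-in uses CRC32, and the partition count of 12 and the helper names `topic_for` and `partition_for` are assumptions made for illustration. The property it shows is the one that matters here: the same key always maps to the same partition.

```python
import zlib

NUM_PARTITIONS = 12  # assumed partition count, for illustration only


def topic_for(resource_type: str) -> str:
    """Map a FHIR resource type to the fhir.events.{resource-type} convention."""
    return f"fhir.events.{resource_type.lower()}"


def partition_for(patient_id: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Pick a partition from the message key.

    Stand-in hash: Kafka's Java client uses murmur2 rather than CRC32,
    but both are deterministic, so one key always lands on one partition.
    """
    return zlib.crc32(patient_id.encode("utf-8")) % num_partitions


# An order and its cancellation for the same patient share topic and partition,
# so a consumer reads them in the order they were written.
order = ("MedicationRequest", "patient-456", "created")
cancel = ("MedicationRequest", "patient-456", "cancelled")

for resource_type, patient_id, action in (order, cancel):
    print(topic_for(resource_type), partition_for(patient_id), action)
```

Changing the number of partitions later changes the key-to-partition mapping, which is one reason to size partition counts generously up front.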
Event Schema with FHIR Resources
{
"eventId": "evt-2026-03-16-001",
"eventType": "resource.created",
"timestamp": "2026-03-16T14:30:00.123Z",
"source": "ehr-fhir-server",
"subject": "Patient/patient-456",
"resource": {
"resourceType": "Observation",
"id": "lab-result-789",
"status": "final",
"category": [{
"coding": [{
"system": "http://terminology.hl7.org/CodeSystem/observation-category",
"code": "laboratory"
}]
}],
"code": {
"coding": [{
"system": "http://loinc.org",
"code": "2345-7",
"display": "Glucose [Mass/volume] in Serum or Plasma"
}]
},
"valueQuantity": {
"value": 250,
"unit": "mg/dL",
"system": "http://unitsofmeasure.org"
},
"referenceRange": [{
"low": {"value": 70, "unit": "mg/dL"},
"high": {"value": 100, "unit": "mg/dL"}
}]
}
}
The event envelope wraps the FHIR resource with metadata: a unique event ID for deduplication, the event type (created, updated, deleted), a timestamp, the source system, and the patient reference as the subject. This envelope enables Kafka consumers to filter and route events without parsing the FHIR resource payload.
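A minimal sketch of how a producer might build this envelope, and how a consumer might route on it, is shown here. The helper names `build_envelope` and `matches` are hypothetical, and the sketch assumes the patient reference can be read from the resource's `subject` element, which the sample Observation above does not include.

```python
import uuid
from datetime import datetime, timezone
from typing import Optional


def build_envelope(resource: dict, event_type: str, source: str) -> dict:
    """Wrap a FHIR resource in envelope metadata (hypothetical helper)."""
    return {
        "eventId": f"evt-{uuid.uuid4()}",  # unique ID, usable as a dedup key
        "eventType": event_type,
        "timestamp": datetime.now(timezone.utc).isoformat(timespec="milliseconds"),
        "source": source,
        # Assumption: the patient reference is taken from resource.subject.
        "subject": resource.get("subject", {}).get("reference"),
        "resource": resource,
    }


def matches(envelope: dict, event_types: set, subject: Optional[str] = None) -> bool:
    """Route on envelope fields only; envelope["resource"] is never read."""
    if envelope["eventType"] not in event_types:
        return False
    return subject is None or envelope["subject"] == subject


observation = {
    "resourceType": "Observation",
    "id": "lab-result-789",
    "status": "final",
    "subject": {"reference": "Patient/patient-456"},
}
event = build_envelope(observation, "resource.created", "ehr-fhir-server")
print(matches(event, {"resource.created", "resource.updated"}))
```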
FHIR R5 Subscription Configuration
FHIR R5 subscriptions use a topic-based model. First, you define a SubscriptionTopic that specifies what triggers notifications. Then, consumers create Subscription resources to express their interest in specific topics.
{
"resourceType": "SubscriptionTopic",
"id": "critical-lab-results",
"url": "https://ehr.example.com/SubscriptionTopic/critical-lab-results",
"status": "active",
"title": "Critical Lab Results",
"resourceTrigger": [{
"resource": "Observation",
"supportedInteraction": ["create", "update"],
"queryCriteria": {
"current": "category=laboratory&status=final"
}
}]
}
// Consumer subscription:
{
"resourceType": "Subscription",
"status": "active",
"topic": "https://ehr.example.com/SubscriptionTopic/critical-lab-results",
"reason": "Critical lab result notifications for ICU dashboard",
"channelType": {
"system": "http://terminology.hl7.org/CodeSystem/subscription-channel-type",
"code": "rest-hook"
},
"endpoint": "https://kafka-bridge.example.com/fhir-events",
"content": "full-resource",
"filterBy": [{
"filterParameter": "category",
"value": "laboratory"
}]
}
The subscription sends notifications to a Kafka bridge service that receives the REST webhook and publishes the event to the appropriate Kafka topic. This bridge decouples the FHIR server from Kafka, allowing the FHIR server to use its native notification mechanism while events flow through Kafka for downstream processing. This architectural pattern builds on the FHIR interoperability standards that define the subscription specification.
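The core of such a bridge can be sketched as a pure function that turns a notification into Kafka records. This is an illustration under assumptions, not a reference implementation: it assumes the R5 notification arrives as a Bundle whose first entry is a SubscriptionStatus followed by the full resources, the function name `records_from_notification` is hypothetical, and the actual HTTP endpoint and Kafka producer call are left out.

```python
import json


def records_from_notification(bundle: dict) -> list:
    """Turn a notification Bundle into (topic, key, value) records."""
    records = []
    for entry in bundle.get("entry", []):
        resource = entry.get("resource", {})
        resource_type = resource.get("resourceType")
        if resource_type in (None, "SubscriptionStatus"):
            continue  # the status entry carries metadata, not a clinical event
        # Assumption: the patient reference lives in resource.subject.
        key = resource.get("subject", {}).get("reference") or None
        topic = f"fhir.events.{resource_type.lower()}"
        records.append((topic, key, json.dumps(resource)))
    return records


notification = {
    "resourceType": "Bundle",
    "type": "subscription-notification",
    "entry": [
        {"resource": {"resourceType": "SubscriptionStatus", "type": "event-notification"}},
        {"resource": {
            "resourceType": "Observation",
            "id": "lab-result-789",
            "subject": {"reference": "Patient/patient-456"},
        }},
    ],
}

for topic, key, value in records_from_notification(notification):
    print(topic, key)
```

Handshake and heartbeat notifications contain only the status entry, so they yield no records. A real bridge would hand each record to a Kafka producer and acknowledge the webhook only after the broker confirms the write.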
Change Data Capture as Alternative Event Source
When the FHIR server does not support R5 subscriptions—or when you need to capture changes from legacy systems that lack FHIR APIs—Change Data Capture (CDC) provides an alternative event source. Tools like Debezium read the database transaction log and emit events for every data change.
# Debezium CDC connector configuration for PostgreSQL
{
"name": "ehr-fhir-cdc",
"config": {
"connector.class": "io.debezium.connector.postgresql.PostgresConnector",
"database.hostname": "fhir-db.internal",
"database.port": "5432",
"database.user": "cdc_reader",
"database.dbname": "fhir_store",
"database.server.name": "ehr-fhir",
"table.include.list": "fhir.observation,fhir.condition,fhir.encounter,fhir.medicationrequest",
"plugin.name": "pgoutput",
"transforms": "route",
"transforms.route.type": "org.apache.kafka.connect.transforms.RegexRouter",
"transforms.route.regex": "ehr-fhir.fhir.(.*)",
"transforms.route.replacement": "fhir.events.$1"
}
}
CDC captures every database change regardless of whether it came through the FHIR API, an HL7 v2 interface, or direct database writes. This makes it valuable for organizations using HL7 interface engines like Mirth Connect alongside FHIR APIs.
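The topic renaming performed by the RegexRouter transform can be checked offline with a short sketch. It mirrors the routing rule from the connector configuration, with the dots escaped, and assumes Debezium names its topics `<server name>.<schema>.<table>`. Kafka Connect uses Java regex syntax, where `$1` corresponds to `\1` in Python.

```python
import re

# Mirrors transforms.route.regex / transforms.route.replacement, dots escaped.
ROUTE_REGEX = re.compile(r"ehr-fhir\.fhir\.(.*)")
ROUTE_REPLACEMENT = r"fhir.events.\1"


def route(topic: str) -> str:
    """Rename a topic if the whole name matches, otherwise leave it unchanged."""
    match = ROUTE_REGEX.fullmatch(topic)
    return match.expand(ROUTE_REPLACEMENT) if match else topic


for source_topic in ("ehr-fhir.fhir.observation", "ehr-fhir.fhir.encounter", "unrelated.topic"):
    print(source_topic, "->", route(source_topic))
```

Running candidate table names through a check like this before deploying the connector helps confirm that CDC events land on the same topics as subscription-sourced events.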
Exactly-Once Delivery for Clinical Events
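In healthcare, duplicate event processing can have clinical consequences: a duplicate medication order event could trigger a double-dose alert, or a duplicate admission event could create phantom encounters. Kafka's exactly-once semantics (EOS) prevent duplicates within Kafka itself, that is, for read-process-write flows whose inputs and outputs are Kafka topics. Side effects outside Kafka, such as sending a page or writing to a database, still need idempotent handling.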
Implementing Exactly-Once
- Idempotent producers: Configure Kafka producers with enable.idempotence=true to prevent duplicate messages from retries.
- Transactional processing: Use Kafka transactions, enabled by setting a transactional.id on the producer, to atomically write output records and commit the consumed offsets. If processing fails, the transaction is aborted, and consumers running with isolation.level=read_committed never see its output.
- Deduplication at the consumer: Even with EOS, design consumers to be idempotent. Use the event ID as a deduplication key and check it before processing.
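Consumer-side deduplication can be sketched as a thin wrapper around the real handler. The class name `IdempotentHandler` is hypothetical, and the in-memory set is an assumption made to keep the example short; a production consumer would need a durable store (for example a database table keyed by event ID) so that deduplication survives restarts.

```python
class IdempotentHandler:
    """Skip events whose eventId has already been processed (illustrative)."""

    def __init__(self, handler):
        self._handler = handler
        self._seen = set()  # assumption: in-memory; use a durable store in production

    def handle(self, envelope: dict) -> bool:
        """Return True if the event was processed, False if it was a duplicate."""
        event_id = envelope["eventId"]
        if event_id in self._seen:
            return False
        self._handler(envelope)
        self._seen.add(event_id)  # record only after the handler succeeds
        return True


alerts = []
handler = IdempotentHandler(alerts.append)
event = {"eventId": "evt-2026-03-16-001", "eventType": "resource.created"}

print(handler.handle(event))  # first delivery is processed
print(handler.handle(event))  # redelivery is skipped
```

Recording the ID only after the handler succeeds means a crash mid-processing leads to a retry rather than a silently dropped event.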
Consumer Group Patterns for Healthcare
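Kafka consumer groups enable multiple services to process the same events independently. Within a group, each partition is assigned to one consumer instance at a time, so a service sees every event without its instances competing for the same records. Delivery is at-least-once by default, which is why consumer-side deduplication still matters.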
| Consumer Group | Kafka Topic | Purpose | Processing |
|---|---|---|---|
| clinical-dashboard | fhir.events.observation | Real-time vitals display | Low latency, last-value wins |
| alert-engine | fhir.events.observation | Critical value alerts | Rule evaluation, notification |
| analytics-pipeline | fhir.events.* | Data warehouse loading | Micro-batches, dedup |
| ai-inference | fhir.events.observation, fhir.events.encounter | Sepsis prediction, risk scoring | ML model inference, real-time |
| audit-service | fhir.events.* | Compliance audit trail | Append-only, long retention |
Each consumer group maintains its own offset position in each topic partition. The clinical dashboard group and the alert engine group both read from fhir.events.observation, but they process events independently and at their own pace. If the analytics pipeline falls behind during a high-volume period, it does not affect the real-time dashboard or alerts.
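The independence of consumer groups comes down to each group keeping its own offset. The toy model below illustrates that idea for a single partition; `PartitionLog` is a hypothetical class, not a Kafka API, and it ignores rebalancing, commits, and retention.

```python
class PartitionLog:
    """Toy model of one topic partition: an append-only list plus per-group offsets."""

    def __init__(self):
        self.events = []
        self.offsets = {}  # consumer group name -> next offset to read

    def append(self, event) -> None:
        self.events.append(event)

    def poll(self, group: str, max_events: int) -> list:
        """Return the next batch for a group and advance only that group's offset."""
        start = self.offsets.get(group, 0)
        batch = self.events[start:start + max_events]
        self.offsets[group] = start + len(batch)
        return batch

    def lag(self, group: str) -> int:
        """Number of events the group has not read yet."""
        return len(self.events) - self.offsets.get(group, 0)


log = PartitionLog()
for i in range(5):
    log.append(f"observation-{i}")

log.poll("clinical-dashboard", max_events=5)   # fast consumer reads everything
log.poll("analytics-pipeline", max_events=2)   # slow consumer reads two events

print(log.lag("clinical-dashboard"), log.lag("analytics-pipeline"))
```

The slow group's lag grows without changing what the fast group sees, which is the behavior described for the dashboard and the analytics pipeline.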
Schema Registry for FHIR Events
A schema registry ensures that all event producers and consumers agree on the event format. For FHIR events, the schema should validate both the event envelope and the embedded FHIR resource.
# Schema Registry configuration
# Register FHIR event schema
curl -X POST https://schema-registry.internal/subjects/fhir.events.observation-value/versions \
-H 'Content-Type: application/json' \
-d '{"schema": "{...avro or json schema...}", "schemaType": "JSON"}'
Use JSON Schema rather than Avro for FHIR events: FHIR resources are natively JSON, and the FHIR specification publishes JSON Schema definitions that validation can build on. When configured for backward compatibility, the schema registry rejects schema changes that would break existing consumers as the event format evolves. This is especially important when choosing the right technology stack for healthcare event processing.
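One detail of the registration request is easy to get wrong: the `schema` field is a string containing the serialized schema, not a nested JSON object. The sketch below builds such a request body for an envelope schema. The schema itself is an assumption for illustration (in particular the list of allowed `eventType` values), and `registration_body` is a hypothetical helper.

```python
import json

# Illustrative envelope schema; field names follow the event example earlier
# in this guide, while the eventType values are assumed.
ENVELOPE_SCHEMA = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "type": "object",
    "required": ["eventId", "eventType", "timestamp", "source", "subject", "resource"],
    "properties": {
        "eventId": {"type": "string"},
        "eventType": {"enum": ["resource.created", "resource.updated", "resource.deleted"]},
        "timestamp": {"type": "string"},
        "source": {"type": "string"},
        "subject": {"type": "string"},
        "resource": {"type": "object", "required": ["resourceType"]},
    },
}


def registration_body(schema: dict) -> str:
    """Build the request body: the schema is embedded as an escaped JSON string."""
    return json.dumps({"schema": json.dumps(schema), "schemaType": "JSON"})


print(registration_body(ENVELOPE_SCHEMA)[:80])
```

The resulting string can be passed to the `-d` argument of the curl command or sent with any HTTP client.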
Monitoring and Observability
An event-driven healthcare system requires comprehensive monitoring. Unlike synchronous API calls where failures are immediately visible, event processing failures can be silent—a consumer that stops processing events may not trigger any errors until a clinician notices missing data on their dashboard hours later.
Key Metrics to Track
| Metric | Threshold | Impact if Breached |
|---|---|---|
| Consumer lag (events behind) | < 100 events | Stale clinical data on dashboards |
| Event processing latency (p99) | < 2 seconds | Delayed critical alerts |
| Failed event rate | < 0.01% | Missing clinical events |
| Dead letter queue depth | 0 (alert on any) | Unprocessable clinical data |
| Kafka broker disk usage | < 80% | Event loss risk |
Every consumer should publish processing metrics to a monitoring system (Prometheus, Datadog, or CloudWatch). Set up alerting on consumer lag—if the alert engine consumer falls more than 100 events behind, pages should fire immediately. For AI-driven clinical decision support consumers, latency monitoring is critical because delayed inference results lose their clinical relevance.
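Consumer lag is the difference between the latest offset in each partition and the offset the group has committed. The sketch below shows that calculation and the paging rule with a threshold of 100 events; the offset values are made up, and in practice they would come from the Kafka admin API or a metrics exporter rather than hard-coded dictionaries.

```python
LAG_THRESHOLD = 100  # events, matching the threshold in the table above


def consumer_lag(end_offsets: dict, committed: dict) -> dict:
    """Per-partition lag: latest offset minus the group's committed offset."""
    return {p: end - committed.get(p, 0) for p, end in end_offsets.items()}


def should_page(lag_by_partition: dict, threshold: int = LAG_THRESHOLD) -> bool:
    """Page when the group's total lag exceeds the threshold."""
    return sum(lag_by_partition.values()) > threshold


# Hypothetical offsets for the alert-engine group on a two-partition topic.
end_offsets = {0: 1500, 1: 980}
committed = {0: 1490, 1: 870}

lag = consumer_lag(end_offsets, committed)
print(lag, should_page(lag))
```

Summing across partitions is one possible rule; alerting on the worst single partition instead catches a stuck partition that a healthy total would hide.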
Saga Patterns for Cross-System Consistency
Healthcare workflows often span multiple systems—a medication order involves the EHR, pharmacy system, medication dispensing cabinet, and billing system. The saga pattern coordinates these multi-system workflows through events rather than distributed transactions.
Example: Medication Order Saga
- EHR publishes a MedicationRequest.created event to Kafka.
- Pharmacy service consumes the event, validates the order, and publishes MedicationDispense.prepared.
- Dispensing cabinet consumes the event, releases the medication, and publishes MedicationAdministration.completed.
- Billing service consumes the administration event and creates the charge.
If any step fails, a compensating event is published. For example, if the pharmacy detects a drug interaction, it publishes MedicationRequest.rejected, which triggers the EHR to cancel the order and notify the prescriber. Each service is responsible only for its own step, and the saga coordinator (often implemented as a separate Kafka Streams application) monitors the overall workflow.
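The coordinator's job can be sketched as a lookup from each observed event to the next step or to a compensation. This is a simplified illustration in plain Python rather than a Kafka Streams application; the `SagaTracker` class, the step descriptions, and the order ID are hypothetical.

```python
# Event name -> the step it triggers in the happy path.
NEXT_STEP = {
    "MedicationRequest.created": "pharmacy: validate order",
    "MedicationDispense.prepared": "dispensing cabinet: release medication",
    "MedicationAdministration.completed": "billing: create charge",
}

# Event name -> the compensating action it triggers.
COMPENSATION = {
    "MedicationRequest.rejected": "ehr: cancel order and notify prescriber",
}


class SagaTracker:
    """Record the events seen per order and report what should happen next."""

    def __init__(self):
        self.history = {}  # order ID -> list of event names, in arrival order

    def on_event(self, order_id: str, event_name: str) -> str:
        self.history.setdefault(order_id, []).append(event_name)
        if event_name in COMPENSATION:
            return COMPENSATION[event_name]
        return NEXT_STEP.get(event_name, "unknown event: route to dead letter queue")


tracker = SagaTracker()
print(tracker.on_event("order-1", "MedicationRequest.created"))
print(tracker.on_event("order-1", "MedicationRequest.rejected"))
```

Keeping the per-order history also lets the coordinator detect workflows that have stalled, for example an order with no dispense event after a timeout.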
Migration Strategy: Polling to Event-Driven
Moving from polling-based integrations to event-driven architecture does not require a big-bang migration. A practical approach uses the strangler fig pattern: run both polling and event-driven consumers in parallel, gradually shifting consumers to the event-driven path as confidence builds.
- Phase 1 (Shadow mode): Deploy Kafka and event producers alongside existing polling integrations. Both systems run simultaneously, with the event-driven path in shadow mode. Compare results to verify correctness.
- Phase 2 (Incremental migration): Migrate one consumer at a time to the event-driven path. Start with non-critical consumers like analytics pipelines, then move to dashboards, and finally to clinical alert systems.
- Phase 3 (Decommission polling): Once all consumers are verified on the event-driven path, disable polling integrations. Retain the polling code as a fallback for disaster recovery.
This phased approach reduces risk and allows the team to build operational expertise with Kafka before it becomes the primary clinical data delivery mechanism. Organizations that have adopted both HL7 and FHIR standards can apply the same strangler pattern to migrate HL7 v2 ADT feeds to FHIR-based event streams.
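During shadow mode, correctness can be checked by comparing what each path delivered over the same time window. The sketch below compares resource IDs only, which is an assumption made for brevity; a fuller check would also compare resource versions or content hashes. The function name `compare_paths` and the sample IDs are hypothetical.

```python
def compare_paths(polled_ids, event_ids) -> dict:
    """Compare resource IDs seen by the polling path and the event-driven path."""
    polled, evented = set(polled_ids), set(event_ids)
    return {
        "missing_from_events": sorted(polled - evented),  # the event path dropped these
        "only_in_events": sorted(evented - polled),       # polling has not caught up yet
        "match": polled == evented,
    }


# Hypothetical IDs collected from both paths for the same window.
polled = ["lab-1", "lab-2", "lab-3"]
evented = ["lab-1", "lab-3", "lab-4"]

print(compare_paths(polled, evented))
```

IDs that appear only on the event path are often just resources the next poll has not picked up yet, so the comparison is most meaningful on windows that ended at least one polling interval ago.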
Frequently Asked Questions
Why use Kafka instead of direct webhooks for FHIR events?
Direct webhooks are fragile when the consumer is temporarily unavailable: unless the sender buffers and retries, the event is lost. Kafka provides durable storage, replay capability, and the ability for multiple independent consumers to process the same events. For clinical systems where event loss can affect patient safety, Kafka's durability guarantees are essential.
What is the difference between FHIR Subscriptions and CDC?
FHIR Subscriptions are application-level—the FHIR server detects changes and notifies subscribers. CDC is database-level—it reads the transaction log and captures every change regardless of how it entered the database. Use FHIR Subscriptions when the FHIR server supports R5 topics. Use CDC when you need to capture changes from non-FHIR sources or legacy databases.
How do you handle event ordering for the same patient?
Partition Kafka topics by patient ID. All events for the same patient flow through the same partition, and Kafka guarantees ordering within a partition. This ensures that clinical events for a given patient are processed in the correct chronological sequence.
What retention period should clinical events have?
Operational events (dashboards, alerts) need 7 to 30 days of retention. Analytics events need 90 days to 1 year. Audit events are commonly retained for 7 years; HIPAA itself sets a six-year minimum for required documentation, and state law or organizational policy may require longer. Use tiered storage: hot storage on Kafka brokers for recent events, cold storage on S3 for long-term audit retention.
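Kafka expresses topic retention in milliseconds through the `retention.ms` topic setting, so these periods need converting. A small sketch of the arithmetic follows; the tier names and the choice of 365-day years are assumptions for illustration.

```python
MS_PER_DAY = 24 * 60 * 60 * 1000


def retention_ms(days: int) -> int:
    """Convert a retention period in days to a retention.ms value."""
    return days * MS_PER_DAY


# Assumed tiers; a year is approximated as 365 days.
tiers = {"operational": 30, "analytics": 365, "audit": 7 * 365}

for name, days in tiers.items():
    print(f"{name}: retention.ms={retention_ms(days)}")
```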
Can this architecture work with existing HL7 v2 interfaces?
Yes. HL7 v2 messages can be published to Kafka topics using a Mirth Connect channel or a custom adapter. The event consumer normalizes HL7 v2 messages into FHIR resources before processing. This hybrid approach allows gradual migration from HL7 v2 to FHIR without disrupting existing integrations.
How do you handle schema evolution for FHIR events?
Use a schema registry with backward compatibility mode. When the FHIR event schema changes (e.g., adding new fields to the event envelope), the registry validates that existing consumers can still parse the new format. FHIR resources themselves are forward-compatible by design—consumers should ignore unknown elements rather than failing.
What happens when a Kafka broker goes down?
Kafka replicates data across multiple brokers. With a replication factor of 3 (standard for production), a partition's data survives the loss of up to two brokers, provided the surviving replica was in sync. With the common setting min.insync.replicas=2, producers using acks=all can keep writing only while at most one broker is down. Producers and consumers automatically fail over to the remaining brokers. For healthcare deployments, run Kafka across three availability zones to protect against infrastructure failures.




