Event-Driven EHR Architecture: Moving from Polling to Real-Time with FHIR Subscriptions and Kafka


March 16, 2026

14 min read

Why Polling Is Killing Your EHR Integration

Most healthcare integrations today rely on polling—periodically querying a FHIR server or database for changes. A downstream system asks "anything new?" every 30 seconds, 5 minutes, or 15 minutes. This approach has three fundamental problems: it wastes resources when nothing has changed, it introduces latency proportional to the polling interval, and it does not scale when dozens of consumers all poll the same source.

Event-driven architecture solves all three. Instead of consumers asking for changes, the source system publishes events when data changes occur. Consumers subscribe to the events they care about and receive notifications in real time. For healthcare systems where a missed lab result or delayed medication alert can affect patient safety, the difference between 15-minute polling lag and sub-second event delivery is clinically significant.

This guide covers a production architecture that combines FHIR Subscriptions as the event source, Apache Kafka as the event backbone, and multiple consumer patterns for clinical dashboards, analytics pipelines, AI agents, and notification services. We include Kafka topic design, consumer group patterns, and exactly-once delivery semantics for clinical events.

Architecture Overview: FHIR Subscriptions + Kafka

The architecture has three layers: event sources that detect and publish changes, an event backbone that routes and stores events durably, and event consumers that process events for specific use cases.

Event Sources

Two primary mechanisms detect changes in healthcare data:

FHIR Subscriptions (R5 Topic-Based): The FHIR server itself detects resource changes and sends notifications to subscribers. FHIR R5 introduced topic-based subscriptions, replacing the R4 model (in which each Subscription carried its own search criteria) with a more scalable approach. Each SubscriptionTopic defines what triggers a notification (resource type, interaction, filter criteria), and each Subscription expresses a consumer's interest in specific topics. For R4 servers, the Subscriptions R5 Backport implementation guide provides the same topic-based model.
Change Data Capture (CDC): For systems where the FHIR server does not support R5 subscriptions, CDC captures changes directly from the database. Tools like Debezium monitor the database transaction log and emit events for every insert, update, or delete—bypassing the FHIR API entirely.

Event Backbone: Apache Kafka

Kafka serves as the central nervous system of the event-driven architecture. Every clinical event flows through Kafka, which provides durable storage, ordered delivery within each partition, and the ability for multiple consumers to process the same event independently.

Kafka Feature	Healthcare Benefit
Durable storage	Clinical events are retained for the configured retention period, even if consumers are temporarily down
Ordered delivery	Events for the same patient (same partition key) are consumed in the order they were published
Consumer groups	Multiple services process the same events independently
Replay capability	Re-process historical events when deploying new analytics or fixing bugs
Partitioning	Scale horizontally by partitioning events by patient ID

Event Consumers

Downstream consumers subscribe to Kafka topics and process events for specific use cases:

Clinical dashboards: Real-time updates when new lab results, vitals, or orders arrive.
Notification services: Push alerts to clinicians when critical values are detected.
Analytics pipelines: Stream events to data warehouses for population health analytics.
AI/ML agents: Feed clinical events to inference services for sepsis prediction, readmission risk, and drug interaction alerts.
Audit logging: Record every data access and modification for compliance.

Kafka Topic Design for Healthcare

Topic design is one of the most important architectural decisions. In healthcare, the natural topic structure maps to FHIR resource types, with patient ID as the partition key.
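The effect of keying by patient ID can be illustrated with a small Python simulation. This is not Kafka client code: `partition_for` and `route` are hypothetical helpers, the partition count of 12 is an assumption, and CRC32 stands in for the real partitioner (Kafka's Java client hashes keys with murmur2, and other clients may use different defaults). What carries over is the property: the same key always maps to the same partition, so one patient's events keep their publish order.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 12  # hypothetical partition count for one topic


def partition_for(patient_id: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a patient ID to a partition with a stable hash.

    CRC32 is used because, unlike Python's built-in hash() for strings,
    it returns the same value in every process. Real Kafka clients use
    their own partitioner (murmur2 in the Java client).
    """
    return zlib.crc32(patient_id.encode("utf-8")) % num_partitions


def route(events: list[dict]) -> dict[int, list[str]]:
    """Append each event ID to its partition's log in publish order."""
    logs: dict[int, list[str]] = defaultdict(list)
    for event in events:
        patient_id = event["subject"].removeprefix("Patient/")
        logs[partition_for(patient_id)].append(event["eventId"])
    return logs


events = [
    {"eventId": "evt-1", "subject": "Patient/patient-456"},
    {"eventId": "evt-2", "subject": "Patient/patient-123"},
    {"eventId": "evt-3", "subject": "Patient/patient-456"},
]
logs = route(events)
log = logs[partition_for("patient-456")]
print(log.index("evt-1") < log.index("evt-3"))  # True: patient-456's events stay in order
```

One consequence of hash partitioning: adding partitions to an existing topic changes the key-to-partition mapping, so it pays to size the partition count up front.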

Event Schema with FHIR Resources

{
  "eventId": "evt-2026-03-16-001",
  "eventType": "resource.created",
  "timestamp": "2026-03-16T14:30:00.123Z",
  "source": "ehr-fhir-server",
  "subject": "Patient/patient-456",
  "resource": {
    "resourceType": "Observation",
    "id": "lab-result-789",
    "status": "final",
    "category": [{
      "coding": [{
        "system": "http://terminology.hl7.org/CodeSystem/observation-category",
        "code": "laboratory"
      }]
    }],
    "code": {
      "coding": [{
        "system": "http://loinc.org",
        "code": "2345-7",
        "display": "Glucose [Mass/volume] in Serum or Plasma"
      }]
    },
    "valueQuantity": {
      "value": 250,
      "unit": "mg/dL",
      "system": "http://unitsofmeasure.org"
    },
    "referenceRange": [{
      "low": {"value": 70, "unit": "mg/dL"},
      "high": {"value": 100, "unit": "mg/dL"}
    }]
  }
}

The event envelope wraps the FHIR resource with metadata: a unique event ID for deduplication, the event type (created, updated, deleted), a timestamp, the source system, and the patient reference as the subject. This envelope enables Kafka consumers to filter and route events without parsing the FHIR resource payload.
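A minimal Python sketch of both sides of this envelope. `wrap_resource` and `is_relevant` are hypothetical helper names; deriving the topic name from `resourceType` (for example `fhir.events.observation`, matching the topic names used in the consumer group table) and using the subject reference as the message key are assumptions of the sketch. The consumer-side filter reads only envelope fields, never the embedded resource.

```python
import json
import uuid
from datetime import datetime, timezone

EVENT_TYPES = {"resource.created", "resource.updated", "resource.deleted"}


def wrap_resource(resource: dict, event_type: str, source: str,
                  subject: str) -> tuple[str, str, dict]:
    """Wrap a FHIR resource in an event envelope.

    Returns (topic, key, envelope). The key is the patient reference so
    that all events for one patient land in the same partition.
    """
    if event_type not in EVENT_TYPES:
        raise ValueError(f"unsupported event type: {event_type}")
    timestamp = (datetime.now(timezone.utc)
                 .isoformat(timespec="milliseconds")
                 .replace("+00:00", "Z"))
    envelope = {
        "eventId": f"evt-{uuid.uuid4()}",  # unique ID used for deduplication
        "eventType": event_type,
        "timestamp": timestamp,
        "source": source,
        "subject": subject,
        "resource": resource,
    }
    topic = "fhir.events." + resource["resourceType"].lower()
    return topic, subject, envelope


def is_relevant(envelope: dict, wanted_types: set[str]) -> bool:
    """Consumer-side filter that inspects envelope metadata only."""
    return envelope["eventType"] in wanted_types


observation = {"resourceType": "Observation", "id": "lab-result-789", "status": "final"}
topic, key, envelope = wrap_resource(
    observation, "resource.created", "ehr-fhir-server", "Patient/patient-456")
payload = json.dumps(envelope)  # what a producer would send as the record value
print(topic, key)  # fhir.events.observation Patient/patient-456
```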

FHIR R5 Subscription Configuration

FHIR R5 subscriptions use a topic-based model. First, you define a SubscriptionTopic that specifies what triggers notifications. Then, consumers create Subscription resources to express their interest in specific topics.

{
  "resourceType": "SubscriptionTopic",
  "id": "critical-lab-results",
  "url": "https://ehr.example.com/SubscriptionTopic/critical-lab-results",
  "status": "active",
  "title": "Critical Lab Results",
  "resourceTrigger": [{
    "resource": "Observation",
    "supportedInteraction": ["create", "update"],
    "queryCriteria": {
      "current": "category=laboratory&status=final"
    }
  }],
  "canFilterBy": [{
    "resource": "Observation",
    "filterParameter": "category"
  }]
}

// Consumer subscription:
{
  "resourceType": "Subscription",
  "status": "active",
  "topic": "https://ehr.example.com/SubscriptionTopic/critical-lab-results",
  "reason": "Critical lab result notifications for ICU dashboard",
  "channelType": {
    "system": "http://terminology.hl7.org/CodeSystem/subscription-channel-type",
    "code": "rest-hook"
  },
  "endpoint": "https://kafka-bridge.example.com/fhir-events",
  "content": "full-resource",
  "filterBy": [{
    "filterParameter": "category",
    "value": "laboratory"
  }]
}

The subscription sends notifications to a Kafka bridge service that receives the REST webhook and publishes the event to the appropriate Kafka topic. This bridge decouples the FHIR server from Kafka, allowing the FHIR server to use its native notification mechanism while events flow through Kafka for downstream processing. The bridge should acknowledge the webhook (return a 2xx response) only after Kafka has confirmed the write, so that a failed publish surfaces as a failed notification rather than a silently dropped event.
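The bridge's core job, turning one rest-hook notification into Kafka records, can be sketched as a pure function. This assumes the R5 notification shape with `full-resource` content: a `Bundle` of type `subscription-notification` whose first entry is a `SubscriptionStatus`, followed by the changed resources. `bundle_to_records` is a hypothetical name, the HTTP server and Kafka producer are left out, and the sketch assumes the server populates `meta.versionId`. Building the event ID from resource type, id, and version is a design choice that makes redelivered notifications deduplicate naturally.

```python
def bundle_to_records(bundle: dict, source: str) -> list[tuple[str, str, dict]]:
    """Turn one R5 subscription-notification Bundle into (topic, key, envelope) records."""
    if bundle.get("resourceType") != "Bundle" or bundle.get("type") != "subscription-notification":
        raise ValueError("expected a subscription-notification Bundle")
    entries = bundle.get("entry", [])
    status = entries[0].get("resource", {}) if entries else {}
    if status.get("resourceType") != "SubscriptionStatus":
        raise ValueError("first entry must be a SubscriptionStatus")
    if status.get("type") != "event-notification":
        return []  # handshakes and heartbeats carry no clinical data

    records = []
    for entry in entries[1:]:
        resource = entry.get("resource")
        if resource is None:
            continue  # id-only notifications have no payload to forward
        meta = resource.get("meta", {})
        version = meta["versionId"]  # assumes the server populates meta.versionId
        # Assumption: a create yields versionId "1"; common, but not guaranteed.
        event_type = "resource.created" if version == "1" else "resource.updated"
        if resource["resourceType"] == "Patient":
            subject = f"Patient/{resource['id']}"
        else:
            subject = resource.get("subject", {}).get("reference", "")
        envelope = {
            # Same resource version gives the same event ID, so redeliveries deduplicate.
            "eventId": f"{resource['resourceType']}/{resource['id']}/_history/{version}",
            "eventType": event_type,
            "timestamp": meta.get("lastUpdated"),
            "source": source,
            "subject": subject,
            "resource": resource,
        }
        records.append(("fhir.events." + resource["resourceType"].lower(), subject, envelope))
    return records
```

In the bridge's HTTP handler, each record would be produced to Kafka, and the 2xx response returned only once every send has been acknowledged.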

Change Data Capture as Alternative Event Source

When the FHIR server does not support R5 subscriptions—or when you need to capture changes from legacy systems that lack FHIR APIs—Change Data Capture (CDC) provides an alternative event source. Tools like Debezium read the database transaction log and emit events for every data change.

# Debezium CDC connector configuration for PostgreSQL (Debezium 2.x property names)
# The password placeholder assumes a FileConfigProvider on the Connect worker; the path is hypothetical.
{
  "name": "ehr-fhir-cdc",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "database.hostname": "fhir-db.internal",
    "database.port": "5432",
    "database.user": "cdc_reader",
    "database.password": "${file:/etc/kafka-connect/secrets/cdc.properties:password}",
    "database.dbname": "fhir_store",
    "topic.prefix": "ehr-fhir",
    "table.include.list": "fhir.observation,fhir.condition,fhir.encounter,fhir.medicationrequest",
    "plugin.name": "pgoutput",
    "transforms": "route",
    "transforms.route.type": "org.apache.kafka.connect.transforms.RegexRouter",
    "transforms.route.regex": "ehr-fhir\\.fhir\\.(.*)",
    "transforms.route.replacement": "fhir.events.$1"
  }
}

CDC captures every database change regardless of whether it came through the FHIR API, an HL7 v2 interface, or direct database writes. This makes it valuable for organizations using HL7 interface engines like Mirth Connect alongside FHIR APIs.
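CDC output arrives in Debezium's change-event format rather than as FHIR events, so a consumer or stream processor typically maps it into the same envelope. The sketch assumes Debezium's JSON value layout (`op`, `before`, `after`, `source`, `ts_ms`, wrapped in `payload` when schemas are enabled) and a hypothetical `res_json` column holding the serialized FHIR resource; `to_envelope` is an illustrative name, and building the event ID from table name and log sequence number (LSN) is an assumption meant to keep redelivered changes deduplicable.

```python
import json
from datetime import datetime, timezone
from typing import Optional

# Debezium "op" codes mapped to the envelope's event types.
# Treating snapshot reads ("r") as creations is an assumption of this sketch.
OP_TO_EVENT_TYPE = {
    "c": "resource.created",
    "r": "resource.created",
    "u": "resource.updated",
    "d": "resource.deleted",
}


def to_envelope(change: Optional[dict], resource_column: str = "res_json") -> Optional[dict]:
    """Map one Debezium change-event value to an envelope, or None to skip it."""
    if change is None:
        return None  # tombstone record that follows a delete
    payload = change.get("payload", change)  # with or without schemas.enable
    event_type = OP_TO_EVENT_TYPE.get(payload.get("op"))
    if event_type is None:
        return None  # e.g. truncate events are not mapped here
    # Deletes carry the old row in "before"; other operations use "after".
    row = payload.get("before") if payload["op"] == "d" else payload.get("after")
    raw = (row or {}).get(resource_column)
    resource = json.loads(raw) if raw else None
    source = payload.get("source", {})
    ts_ms = payload.get("ts_ms")
    timestamp = None
    if ts_ms is not None:
        timestamp = (datetime.fromtimestamp(ts_ms / 1000, tz=timezone.utc)
                     .isoformat(timespec="milliseconds").replace("+00:00", "Z"))
    return {
        "eventId": f"cdc-{source.get('table')}-{source.get('lsn')}",
        "eventType": event_type,
        "timestamp": timestamp,
        "source": "debezium-cdc",
        "subject": ((resource or {}).get("subject") or {}).get("reference"),
        "resource": resource,
    }
```

With PostgreSQL's default replica identity, the `before` image of a delete contains only the primary key, so `resource` and `subject` are `None` for deletes unless the table uses `REPLICA IDENTITY FULL`.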

Exactly-Once Delivery for Clinical Events

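In healthcare, duplicate event processing can have clinical consequences: a duplicate medication order event could trigger a double-dose alert, or a duplicate admission event could create phantom encounters. Kafka's exactly-once semantics (EOS) prevent duplicates within Kafka read-process-write pipelines, but side effects outside Kafka, such as database writes or pages to clinicians, still need consumer-side deduplication.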

Implementing Exactly-Once

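Idempotent producers: Configure Kafka producers with enable.idempotence=true (the default since Kafka 3.0) to prevent duplicate messages from retries.
Transactional read-process-write: Use a transactional producer (transactional.id) to write output records and commit the consumer's input offsets in a single transaction. If processing fails, the transaction is aborted, and downstream consumers configured with isolation.level=read_committed never see the aborted output.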
Deduplication at the consumer: Even with EOS, design consumers to be idempotent. Use the event ID as a deduplication key and check before processing.
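Consumer-side deduplication is most robust when the event ID is recorded in the same database transaction as the consumer's own write, so a crash cannot leave one without the other. A minimal sketch using Python's standard-library `sqlite3`; the `processed_events` and `alerts` tables and the `IdempotentAlertWriter` class are hypothetical, and a production service would apply the same unique-key idea in its own database.

```python
import sqlite3


class IdempotentAlertWriter:
    """Writes at most one alert row per eventId, even if Kafka redelivers."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        # The primary key on event_id is what enforces deduplication.
        conn.execute("CREATE TABLE IF NOT EXISTS processed_events (event_id TEXT PRIMARY KEY)")
        conn.execute("CREATE TABLE IF NOT EXISTS alerts (event_id TEXT, patient TEXT, message TEXT)")

    def handle(self, envelope: dict) -> bool:
        """Return True if processed, False if the event ID was already seen."""
        try:
            with self.conn:  # commits on success, rolls back on any exception
                self.conn.execute(
                    "INSERT INTO processed_events (event_id) VALUES (?)",
                    (envelope["eventId"],))
                self.conn.execute(
                    "INSERT INTO alerts (event_id, patient, message) VALUES (?, ?, ?)",
                    (envelope["eventId"], envelope["subject"], "critical lab result"))
        except sqlite3.IntegrityError:
            return False  # duplicate delivery: an earlier transaction already ran
        return True


writer = IdempotentAlertWriter(sqlite3.connect(":memory:"))
event = {"eventId": "evt-2026-03-16-001", "subject": "Patient/patient-456"}
print(writer.handle(event), writer.handle(event))  # True False
```

The Kafka offset is committed only after `handle` returns, so a crash between the database commit and the offset commit causes a redelivery that the primary key absorbs.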

Consumer Group Patterns for Healthcare

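Kafka consumer groups let multiple services process the same events independently. Each service uses its own group, and within a group each partition is assigned to exactly one consumer instance, so a service can scale out without two of its instances handling the same partition.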

Consumer Group	Kafka Topic	Purpose	Processing
clinical-dashboard	fhir.events.observation	Real-time vitals display	Low latency, last-value wins
alert-engine	fhir.events.observation	Critical value alerts	Rule evaluation, notification
analytics-pipeline	fhir.events.*	Data warehouse loading	Micro-batches, dedup
ai-inference	fhir.events.observation, fhir.events.encounter	Sepsis prediction, risk scoring	ML model inference, real-time
audit-service	fhir.events.*	Compliance audit trail	Append-only, long retention

Each consumer group maintains its own offset position in each topic partition. The clinical dashboard group and the alert engine group both read from fhir.events.observation, but they process events independently and at their own pace. If the analytics pipeline falls behind during a high-volume period, it does not affect the real-time dashboard or alerts.
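The independence described above comes down to each group keeping its own committed offset. A toy in-memory model makes this concrete; `TopicPartition` and its methods are hypothetical and stand in for a single Kafka partition, not for the Kafka client API.

```python
from collections import defaultdict


class TopicPartition:
    """Toy model of one Kafka partition: an append-only log plus one
    committed offset per consumer group."""

    def __init__(self) -> None:
        self.log: list[str] = []
        self.committed: dict[str, int] = defaultdict(int)  # group -> next offset to read

    def append(self, event_id: str) -> None:
        self.log.append(event_id)

    def poll(self, group: str, max_records: int) -> list[str]:
        """Return the group's next batch and commit past it."""
        start = self.committed[group]
        batch = self.log[start:start + max_records]
        self.committed[group] = start + len(batch)
        return batch


partition = TopicPartition()
for i in range(5):
    partition.append(f"evt-{i}")

partition.poll("clinical-dashboard", max_records=10)  # keeps up: reads all five
partition.poll("analytics-pipeline", max_records=2)   # falls behind: reads two
print(partition.committed["clinical-dashboard"], partition.committed["analytics-pipeline"])  # 5 2
```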

Schema Registry for FHIR Events

A schema registry ensures that all event producers and consumers agree on the event format. For FHIR events, the schema should validate both the event envelope and the embedded FHIR resource.

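# Register the FHIR event JSON Schema for the observation topic's value subject
curl -X POST https://schema-registry.internal/subjects/fhir.events.observation-value/versions \
  -H 'Content-Type: application/json' \
  -d '{"schema": "{...JSON Schema, serialized as a string...}", "schemaType": "JSON"}'

# Require compatibility in both directions for this subject (FULL = BACKWARD + FORWARD)
curl -X PUT https://schema-registry.internal/config/fhir.events.observation-value \
  -H 'Content-Type: application/json' \
  -d '{"compatibility": "FULL"}'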

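Use JSON Schema rather than Avro for FHIR events: FHIR defines a native JSON representation, and HL7 publishes a JSON Schema for FHIR resources, generated from the FHIR StructureDefinitions, that can serve as a starting point. Choose the registry's compatibility mode deliberately. BACKWARD (the Confluent Schema Registry default) guarantees that consumers using the new schema can read older events, FORWARD guarantees that existing consumers can read events written with the new schema, and FULL guarantees both. If producers may be upgraded before consumers, FORWARD or FULL is the mode that keeps a producer change from breaking consumers.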

Monitoring and Observability

An event-driven healthcare system requires comprehensive monitoring. Unlike synchronous API calls where failures are immediately visible, event processing failures can be silent—a consumer that stops processing events may not trigger any errors until a clinician notices missing data on their dashboard hours later.

Key Metrics to Track

Metric	Threshold	Impact if Breached
Consumer lag (events behind)	< 100 events	Stale clinical data on dashboards
Event processing latency (p99)	< 2 seconds	Delayed critical alerts
Failed event rate	< 0.01%	Missing clinical events
Dead letter queue depth	0 (alert on any)	Unprocessable clinical data
Kafka broker disk usage	< 80%	Event loss risk

Every consumer should publish processing metrics to a monitoring system (Prometheus, Datadog, or CloudWatch). Set up alerting on consumer lag—if the alert engine consumer falls more than 100 events behind, pages should fire immediately. For AI-driven clinical decision support consumers, latency monitoring is critical because delayed inference results lose their clinical relevance.
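Consumer lag is the difference between each partition's latest offset and the group's committed offset, summed across partitions. A sketch of the paging rule described above, assuming the offsets have already been fetched (for example through the Kafka admin API or a metrics exporter); `total_lag`, `should_page`, and the sample offsets are hypothetical, and the threshold of 100 comes from the metrics table above.

```python
LAG_THRESHOLD = 100  # events, from the consumer-lag threshold in the table above


def total_lag(end_offsets: dict[int, int], committed: dict[int, int]) -> int:
    """Sum of (latest offset - committed offset) over all partitions.

    A partition with no committed offset counts its whole log as lag,
    which assumes the group would start from offset 0.
    """
    return sum(end - committed.get(p, 0) for p, end in end_offsets.items())


def should_page(end_offsets: dict[int, int], committed: dict[int, int],
                threshold: int = LAG_THRESHOLD) -> bool:
    """Page when the group is more than `threshold` events behind."""
    return total_lag(end_offsets, committed) > threshold


end_offsets = {0: 10_500, 1: 9_800, 2: 11_200}  # latest offset per partition
committed = {0: 10_480, 1: 9_700, 2: 11_195}    # alert-engine group's commits
print(total_lag(end_offsets, committed), should_page(end_offsets, committed))  # 125 True
```

A sum can hide a single stuck partition, so alerting on the maximum per-partition lag as well catches a consumer that has stalled on one slice of patients.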

Saga Patterns for Cross-System Consistency

Healthcare workflows often span multiple systems—a medication order involves the EHR, pharmacy system, medication dispensing cabinet, and billing system. The saga pattern coordinates these multi-system workflows through events rather than distributed transactions.

Example: Medication Order Saga

EHR publishes MedicationRequest.created event to Kafka.
Pharmacy service consumes the event, validates the order, publishes MedicationDispense.prepared.
Dispensing cabinet consumes the event, releases the medication, publishes MedicationAdministration.completed.
Billing service consumes the administration event, creates the charge.

If any step fails, a compensating event is published. For example, if the pharmacy detects a drug interaction, it publishes MedicationRequest.rejected, which triggers the EHR to cancel the order and notify the prescriber. Each service is responsible only for its own step, and the saga coordinator (often implemented as a separate Kafka Streams application) monitors the overall workflow.
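The coordinator's monitoring role can be sketched as a small state machine keyed by order ID that also reports sagas that have stopped moving. Event names follow the example above except `ChargeItem.created`, a hypothetical name for the billing step; the `MedicationOrderSaga` class and its state names are illustrative, and a plain dictionary replaces the Kafka Streams state store that would hold this in production.

```python
import time
from dataclasses import dataclass, field
from typing import Optional

# Happy-path transitions: (current state, incoming event) -> next state.
HAPPY_PATH = {
    ("started", "MedicationDispense.prepared"): "prepared",
    ("prepared", "MedicationAdministration.completed"): "administered",
    ("administered", "ChargeItem.created"): "completed",  # hypothetical billing event
}
TERMINAL = {"completed", "rejected"}


@dataclass
class MedicationOrderSaga:
    """Tracks each order's progress and reports sagas that have stalled."""
    state: dict[str, str] = field(default_factory=dict)
    updated_at: dict[str, float] = field(default_factory=dict)

    def on_event(self, order_id: str, event_type: str, now: Optional[float] = None) -> str:
        if event_type == "MedicationRequest.created":
            new_state = "started"
        elif event_type == "MedicationRequest.rejected":
            new_state = "rejected"  # the EHR consumes this event and cancels the order
        else:
            current = self.state.get(order_id)
            new_state = HAPPY_PATH.get((current, event_type))
            if new_state is None:
                raise ValueError(f"{event_type} not expected for {order_id} in state {current}")
        self.state[order_id] = new_state
        self.updated_at[order_id] = time.time() if now is None else now
        return new_state

    def stalled(self, now: float, timeout_s: float) -> list[str]:
        """Unfinished orders that have not advanced for more than timeout_s seconds."""
        return [oid for oid, s in self.state.items()
                if s not in TERMINAL and now - self.updated_at[oid] > timeout_s]


saga = MedicationOrderSaga()
saga.on_event("order-1", "MedicationRequest.created", now=0)
saga.on_event("order-1", "MedicationDispense.prepared", now=60)
saga.on_event("order-2", "MedicationRequest.created", now=0)
saga.on_event("order-2", "MedicationRequest.rejected", now=30)
print(saga.stalled(now=4000, timeout_s=3600))  # ['order-1']
```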

Migration Strategy: Polling to Event-Driven

Moving from polling-based integrations to event-driven architecture does not require a big-bang migration. A practical approach uses the strangler fig pattern: run both polling and event-driven consumers in parallel, gradually shifting consumers to the event-driven path as confidence builds.

Phase 1 (Shadow mode): Deploy Kafka and event producers alongside existing polling integrations. Both systems run simultaneously, with the event-driven path in shadow mode. Compare results to verify correctness.
Phase 2 (Incremental migration): Migrate one consumer at a time to the event-driven path. Start with non-critical consumers like analytics pipelines, then move to dashboards, and finally to clinical alert systems.
Phase 3 (Decommission polling): Once all consumers are verified on the event-driven path, disable polling integrations. Retain the polling code as a fallback for disaster recovery.

This phased approach reduces risk and allows the team to build operational expertise with Kafka before it becomes the primary clinical data delivery mechanism. Organizations that have adopted both HL7 and FHIR standards can apply the same strangler pattern to migrate HL7 v2 ADT feeds to FHIR-based event streams.
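The comparison step in Phase 1 can be as simple as diffing what each path delivered over the same time window. A sketch assuming both paths can be reduced to sets of resource version identifiers (for example `Observation/lab-result-789/_history/1`); `compare_paths` and `ShadowReport` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShadowReport:
    missing_from_events: frozenset[str]  # seen by polling, never delivered as an event
    extra_in_events: frozenset[str]      # delivered as events, not seen by polling
    matched: int

    @property
    def ok(self) -> bool:
        # Only missing events count as a defect (see the note below).
        return not self.missing_from_events


def compare_paths(polled: set[str], evented: set[str]) -> ShadowReport:
    return ShadowReport(
        missing_from_events=frozenset(polled - evented),
        extra_in_events=frozenset(evented - polled),
        matched=len(polled & evented),
    )


polled = {"Observation/a/_history/2", "Observation/b/_history/1"}
evented = {"Observation/a/_history/1", "Observation/a/_history/2", "Observation/b/_history/1"}
report = compare_paths(polled, evented)
print(report.ok, report.matched, sorted(report.extra_in_events))
# True 2 ['Observation/a/_history/1']
```

Extra versions on the event path are expected: a resource updated twice between polls shows up once in the polling results but twice as events. Versions that polling saw and the event path did not are the discrepancies worth investigating.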

Frequently Asked Questions

Why use Kafka instead of direct webhooks for FHIR events?

Direct webhooks lose events when the consumer is unavailable and the sender does not retry, or gives up before the consumer recovers. Kafka provides durable storage, replay capability, and the ability for multiple independent consumers to process the same events. For clinical systems where event loss can affect patient safety, Kafka's durability guarantees are essential.

What is the difference between FHIR Subscriptions and CDC?

FHIR Subscriptions are application-level—the FHIR server detects changes and notifies subscribers. CDC is database-level—it reads the transaction log and captures every change regardless of how it entered the database. Use FHIR Subscriptions when the FHIR server supports R5 topics. Use CDC when you need to capture changes from non-FHIR sources or legacy databases.

How do you handle event ordering for the same patient?

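Partition Kafka topics by patient ID. All events for the same patient flow through the same partition, and Kafka guarantees ordering within a partition, so a patient's events are consumed in the order they were published. Publish order is not always clinical order: if several producers write events for the same patient, or results can arrive late, consumers should also compare resource timestamps or versions. Adding partitions to an existing topic also changes which partition a given patient ID maps to, so plan the partition count up front.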

What retention period should clinical events have?

Operational events (dashboards, alerts) typically need 7-30 days of retention, and analytics events 90 days to 1 year. Audit events are commonly kept for six to seven years: HIPAA's documentation-retention rule requires six years, and state law or organizational policy often extends that. Use tiered storage: hot storage on Kafka brokers for recent events and cold storage on S3 for long-term audit retention.
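The retention tiers translate into per-topic `retention.ms` settings. A sketch that computes them, assuming each tier is written to its own topic (the topic names are hypothetical), 30, 365, and 7 × 365 days as the upper end of each tier, and Apache Kafka's tiered storage topic settings `remote.storage.enable` and `local.retention.ms` for the audit topic. Tiered storage must also be enabled on the brokers, and offloading to S3 requires a remote storage plugin.

```python
DAY_MS = 24 * 60 * 60 * 1000  # 86,400,000 milliseconds per day


def topic_configs() -> dict[str, dict[str, str]]:
    """Per-topic retention settings for the three tiers.

    Values are strings, the form in which Kafka topic configs are usually passed.
    """
    return {
        "fhir.events.observation": {"retention.ms": str(30 * DAY_MS)},
        "fhir.analytics.observation": {"retention.ms": str(365 * DAY_MS)},
        "fhir.audit.events": {
            "retention.ms": str(7 * 365 * DAY_MS),  # ignores leap days
            "remote.storage.enable": "true",        # offload older segments to remote storage
            "local.retention.ms": str(7 * DAY_MS),  # keep only about a week on broker disks
        },
    }


print(topic_configs()["fhir.audit.events"]["retention.ms"])  # 220752000000
```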

Can this architecture work with existing HL7 v2 interfaces?

Yes. HL7 v2 messages can be published to Kafka topics using a Mirth Connect channel or a custom adapter. The event consumer normalizes HL7 v2 messages into FHIR resources before processing. This hybrid approach allows gradual migration from HL7 v2 to FHIR without disrupting existing integrations.
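A minimal sketch of the normalization step for one segment, mapping an HL7 v2 PID segment to a FHIR Patient. It assumes the default delimiters (`|` for fields, `~` for repetitions, `^` for components) and handles only PID-3 (first identifier), PID-5 (first name), PID-7 (birth date), and PID-8 (administrative sex). `pid_to_patient` is a hypothetical name; escape sequences and the identifier system are ignored, and a production adapter would rely on a maintained HL7 v2 parser or the interface engine's own mapping.

```python
def pid_to_patient(pid_segment: str) -> dict:
    """Map a few PID fields to a FHIR Patient (default delimiters only)."""
    fields = pid_segment.split("|")
    if fields[0] != "PID":
        raise ValueError("not a PID segment")

    def get(n: int) -> str:
        # PID-n sits at index n because index 0 holds the segment name.
        return fields[n] if n < len(fields) else ""

    patient: dict = {"resourceType": "Patient"}
    # PID-3: first repetition, first component (the ID number).
    mrn = get(3).split("~")[0].split("^")[0]
    if mrn:
        patient["identifier"] = [{"value": mrn}]
    # PID-5: first repetition, family^given.
    family, _, rest = get(5).split("~")[0].partition("^")
    if family:
        name = {"family": family}
        given = rest.split("^")[0]
        if given:
            name["given"] = [given]
        patient["name"] = [name]
    # PID-7: YYYYMMDD[...] becomes a FHIR date.
    dob = get(7)
    if len(dob) >= 8:
        patient["birthDate"] = f"{dob[:4]}-{dob[4:6]}-{dob[6:8]}"
    # PID-8: HL7 table 0001 codes; anything unmapped becomes "unknown".
    patient["gender"] = {"M": "male", "F": "female", "O": "other"}.get(get(8), "unknown")
    return patient


print(pid_to_patient("PID|1||12345^^^HOSP^MR||Doe^Jane||19800101|F"))
```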

How do you handle schema evolution for FHIR events?

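Use a schema registry with a compatibility mode that protects existing consumers (FORWARD or FULL in Confluent Schema Registry terms). When the FHIR event schema changes (e.g., adding new fields to the event envelope), the registry checks that existing consumers can still parse the new format before accepting it. Consumers of the embedded FHIR resources should ignore unknown elements rather than fail, with one exception: FHIR requires that a modifierExtension not be silently ignored, so a consumer that does not understand one should reject the resource or route it for review.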
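A tolerant-reader sketch for consumers: required envelope fields are checked, unknown envelope fields added by newer producers are dropped, and unrecognized top-level `modifierExtension` entries cause rejection, since FHIR does not allow modifier extensions to be silently ignored. `EventEnvelope` and `parse` are hypothetical names, and a full implementation would also check modifier extensions nested inside backbone elements.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class EventEnvelope:
    eventId: str
    eventType: str
    timestamp: str
    source: str
    subject: str
    resource: dict

    @classmethod
    def parse(cls, data: dict,
              understood_modifiers: frozenset[str] = frozenset()) -> "EventEnvelope":
        known = {f.name for f in fields(cls)}
        missing = known - set(data)
        if missing:
            raise ValueError(f"missing envelope fields: {sorted(missing)}")
        # Modifier extensions change the meaning of the resource, so unknown
        # ones are rejected rather than ignored (top level only in this sketch).
        for ext in data["resource"].get("modifierExtension", []):
            if ext.get("url") not in understood_modifiers:
                raise ValueError(f"unsupported modifierExtension: {ext.get('url')}")
        # Any envelope fields this consumer does not know about are dropped.
        return cls(**{name: data[name] for name in known})
```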

What happens when a Kafka broker goes down?