Streaming Healthcare Data with Kafka and FHIR: Real-Time Pipelines for ADT, Labs, and Clinical Alerts

Q: How does Kafka compare to a healthcare integration engine like Mirth Connect?

They serve different purposes and work well together. Mirth Connect excels at protocol translation (HL7v2 to FHIR, CSV to HL7v2) and message transformation. Kafka excels at high-throughput event distribution, replay, and stream processing. The recommended architecture: Mirth Connect as the protocol translator on the edges, Kafka as the central event backbone. Mirth produces to Kafka topics after translating messages; downstream consumers read from Kafk

Q: What Kafka cluster size do I need for a hospital?

A 500-bed hospital generating ~5,000 events per minute needs a 3-broker cluster minimum (for replication factor 3). Each broker should have 32GB RAM, 8+ cores, and NVMe SSDs. For health systems with multiple hospitals, scale to 5-7 brokers. The primary bottleneck is usually disk I/O for retention, not CPU or network. Budget $15-25K/year for a self-managed cluster or $30-50K/year for Confluent Clou

Q: Is Kafka HIPAA-compliant?

Kafka itself is infrastructure software, so HIPAA compliance depends on your deployment. Requirements: encrypt data in transit (TLS), encrypt data at rest (disk encryption), implement ACLs for topic access, enable audit logging, and sign a BAA with your infrastructure provider. Confluent Cloud offers HIPAA-eligible deployments. Self-managed deployments must configure encryption, authentication (SASL/SCRAM or mTLS), and authorization (Kafka ACLs) manuall

Q: How do I handle message ordering across multiple topics?

Kafka guarantees ordering within a partition, not across topics. If you need a patient admission to be processed before their medication order, and they are on different topics, use Kafka Streams with a windowed join: buffer events from both topics in a time window, then process them in the correct order. Alternatively, use a single topic with event type headers and partition by patient I

Q: What happens if Kafka goes down?

With replication factor 3 and min.insync.replicas=2, Kafka tolerates one broker failure with zero data loss. Producers fail over to available brokers automatically. Consumers rebalance and resume from committed offsets. For multi-datacenter resilience, use Kafka MirrorMaker 2 to replicate across data centers. The key metric: monitor under-replicated partitions and alert immediately when any appea

Q: Should I use Avro or Protobuf for healthcare event schemas?

Both work well. Avro has better Schema Registry integration and is the Confluent ecosystem default. Protobuf has stronger cross-language support and more explicit typing. For healthcare teams primarily using Java/Python with the Confluent platform, choose Avro. For teams with Go/Rust services or gRPC APIs, choose Protobuf. The critical point is to use a schema at all; untyped JSON events are the source of most healthcare integration failures.

April 6, 2026

15 min read

Data EngineeringInteroperabilityArchitecture

Every hospital generates thousands of clinical events per minute: patient admissions, lab results, medication administrations, vital sign readings, and discharge notifications. Traditional batch processing handles these events with hourly or daily ETL jobs, meaning a critical lab result might sit in a queue for hours before triggering an alert. In an industry where delayed responses cost lives, this latency is unacceptable.

Apache Kafka provides the event streaming backbone that healthcare organizations need: real-time delivery of clinical events with exactly-once guarantees, millisecond latency, and the ability to replay events for debugging or reprocessing. Combined with FHIR as the data standard, Kafka enables a unified streaming architecture where every system, from the EHR to the clinical dashboard to the AI agent, speaks the same language in real time.

Kafka as Healthcare Event Backbone - Real-time Clinical Event Streaming

Why Kafka for Healthcare Event Streaming

Healthcare integration has traditionally relied on point-to-point HL7v2 interfaces. A 500-bed hospital typically maintains 200-400 individual interfaces, each a brittle connection between two systems. When one system changes, every connected interface must be updated. According to HIMSS research, hospitals spend 30-40% of their IT budget maintaining these interfaces.

Kafka replaces this point-to-point spaghetti with a centralized event bus. Producers publish events to topics, consumers subscribe to the topics they need, and neither side needs to know about the other. Adding a new consumer (a clinical dashboard, an analytics pipeline, an AI agent) requires zero changes to existing systems.

Kafka vs Traditional Healthcare Messaging

Capability	HL7v2 Point-to-Point	Mirth Connect Hub	Apache Kafka
Delivery guarantee	Best-effort TCP	At-least-once (with DLQ)	Exactly-once semantics
Latency	~100ms	~500ms (transform overhead)	<10ms (producer to consumer)
Replay capability	None	Channel history (limited)	Full replay from any offset
Throughput	~1K msgs/sec per interface	~10K msgs/sec	~1M msgs/sec per cluster
Adding new consumer	New interface required	New channel + destination	New consumer group (zero changes)
Schema enforcement	HL7v2 spec (loosely followed)	Custom validation filters	Schema Registry (strict)

For organizations already using Mirth Connect for HL7v2 integration, Kafka complements rather than replaces it. Mirth handles protocol translation (HL7v2 to FHIR), Kafka handles event distribution. See our guide on building a FHIR facade over legacy HL7v2 systems for the pattern.

Healthcare Kafka Topic Design - Namespace Hierarchy and Partitioning

Topic Design for Healthcare Events

Topic design is the most consequential architectural decision in a healthcare Kafka deployment. Get it wrong, and you end up with ordering problems, consumer bottlenecks, and compliance headaches. The key principles: partition by patient ID for ordering guarantees, separate topics by event domain for independent scaling, and set retention based on regulatory requirements.

Partitioning Strategy: Why Patient ID

In healthcare, event ordering matters at the patient level. An admit event must be processed before a discharge event for the same patient. A lab order must be processed before its result. By using patient_id as the partition key, Kafka guarantees that all events for a single patient are processed in order by the same consumer instance.

from confluent_kafka import Producer
import json
import hashlib

def create_healthcare_producer():
    """Create Kafka producer with healthcare-grade settings."""
    return Producer({
        "bootstrap.servers": "kafka-1:9092,kafka-2:9092,kafka-3:9092",
        # Exactly-once delivery
        "enable.idempotence": True,
        "acks": "all",
        "max.in.flight.requests.per.connection": 5,
        # Transactional producer for exactly-once
        "transactional.id": "clinical-producer-001",
        # Compression for FHIR JSON payloads
        "compression.type": "lz4",
        # Retry settings
        "retries": 2147483647,
        "retry.backoff.ms": 100,
        "delivery.timeout.ms": 120000,
    })

def publish_fhir_event(producer, resource_type, resource, event_type="create"):
    """Publish a FHIR resource event to the appropriate topic."""
    topic_map = {
        "Patient": "healthcare.adt.events",
        "Encounter": "healthcare.adt.events",
        "Observation": "healthcare.labs.results",
        "MedicationRequest": "healthcare.medications.orders",
        "DiagnosticReport": "healthcare.labs.results",
    }

    topic = topic_map.get(resource_type, "healthcare.fhir.other")
    patient_id = extract_patient_id(resource)

    event = {
        "event_type": event_type,
        "resource_type": resource_type,
        "resource_id": resource.get("id"),
        "patient_id": patient_id,
        "timestamp": resource.get("meta", {}).get("lastUpdated"),
        "fhir_resource": resource
    }

    # Partition by patient_id for per-patient ordering
    producer.produce(
        topic=topic,
        key=patient_id.encode("utf-8") if patient_id else None,
        value=json.dumps(event).encode("utf-8"),
        callback=delivery_callback
    )
    producer.flush()

def extract_patient_id(resource):
    """Extract patient reference from any FHIR resource."""
    if resource.get("resourceType") == "Patient":
        return resource.get("id")
    subject = resource.get("subject", {})
    ref = subject.get("reference", "")
    if ref.startswith("Patient/"):
        return ref.replace("Patient/", "")
    return None

def delivery_callback(err, msg):
    if err:
        print(f"DELIVERY FAILED: {err}")
        # In production: alert + retry queue
    else:
        print(f"Delivered to {msg.topic()}[{msg.partition()}] @ {msg.offset()}")

FHIR Subscription to Kafka Pipeline - Event Bridge

FHIR Subscriptions as Kafka Event Sources

FHIR R4 Subscriptions provide a standard mechanism for EHR systems to notify external consumers when resources change. By connecting FHIR Subscriptions to Kafka producers, you get a standards-based CDC pipeline that works with any FHIR-compliant EHR. For deep coverage of FHIR Subscription architecture, see our guide on moving from polling to real-time with FHIR Subscriptions.

FHIR Subscription to Kafka Bridge

from flask import Flask, request, jsonify
from confluent_kafka import Producer
import json
import logging

app = Flask(__name__)
logger = logging.getLogger("fhir-kafka-bridge")

producer = create_healthcare_producer()

@app.route("/fhir-webhook", methods=["POST"])
def handle_fhir_subscription():
    """Receive FHIR Subscription notification, forward to Kafka."""
    bundle = request.get_json()

    if bundle.get("resourceType") != "Bundle":
        return jsonify({"error": "Expected Bundle"}), 400

    entries = bundle.get("entry", [])
    published = 0

    for entry in entries:
        resource = entry.get("resource", {})
        resource_type = resource.get("resourceType")

        if not resource_type:
            continue

        # Determine event type from Bundle.entry.request
        req = entry.get("request", {})
        method = req.get("method", "PUT")
        event_type = {
            "POST": "create",
            "PUT": "update",
            "DELETE": "delete"
        }.get(method, "update")

        publish_fhir_event(
            producer, resource_type, resource, event_type
        )
        published += 1

    logger.info(
        f"Published {published} events from "
        f"FHIR Subscription notification"
    )
    return jsonify({"published": published}), 200

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)

Kafka Connect FHIR Source Connector

For systems that do not support FHIR Subscriptions, Kafka Connect can poll FHIR APIs at scheduled intervals. This is less efficient than push-based subscriptions but works with any FHIR server:

{
  "name": "fhir-source-connector",
  "config": {
    "connector.class": "io.confluent.connect.http.HttpSourceConnector",
    "tasks.max": "3",
    "http.api.url": "https://ehr.hospital.org/fhir",
    "http.request.method": "GET",
    "http.request.headers": "Authorization: Bearer ${FHIR_TOKEN}",
    "http.topic.name.pattern": "healthcare.fhir.${resource_type}",
    "poll.interval.ms": "30000",
    "http.response.json.path": "$.entry[*].resource",
    "key.converter": "org.apache.kafka.connect.storage.StringConverter",
    "value.converter": "io.confluent.connect.avro.AvroConverter",
    "value.converter.schema.registry.url": "http://schema-registry:8081",
    "transforms": "extractPatientId",
    "transforms.extractPatientId.type": "org.apache.kafka.connect.transforms.ExtractField$Key",
    "transforms.extractPatientId.field": "id"
  }
}

Kafka Streams for Clinical Alerting - Real-time Lab Value Processing

Kafka Streams for Real-Time Clinical Logic

Kafka Streams enables you to process clinical events in real time with exactly-once semantics, no separate cluster required. The most compelling healthcare use case: real-time alerting on critical lab values.

Critical Lab Value Alert Pipeline

import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.kstream.*;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

public class CriticalLabAlertProcessor {

    // Critical value thresholds (LOINC code -> range)
    private static final Map<String, double[]> CRITICAL_RANGES = Map.of(
        "2345-7",  new double[]{40, 500},   // Glucose mg/dL
        "2160-0",  new double[]{0.5, 10.0}, // Creatinine mg/dL
        "6298-4",  new double[]{3.5, 5.5},  // Potassium mEq/L
        "2951-2",  new double[]{136, 145},   // Sodium mEq/L
        "718-7",   new double[]{7.0, 18.0}  // Hemoglobin g/dL
    );

    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();
        ObjectMapper mapper = new ObjectMapper();

        KStream<String, String> labResults =
            builder.stream("healthcare.labs.results");

        // Branch into critical and normal streams
        KStream<String, String>[] branches = labResults
            .filter((key, value) -> {
                JsonNode event = mapper.readTree(value);
                return "final".equals(
                    event.at("/fhir_resource/status").asText()
                );
            })
            .branch(
                (key, value) -> isCriticalValue(
                    mapper.readTree(value)
                ),
                (key, value) -> true  // default: normal
            );

        // Critical path: enrich and alert
        branches[0]
            .mapValues(value -> enrichWithPatientContext(
                mapper.readTree(value)
            ))
            .to("healthcare.alerts.critical");

        // Normal path: send to processed topic
        branches[1].to("healthcare.labs.processed");

        KafkaStreams streams = new KafkaStreams(
            builder.build(), getStreamsConfig()
        );
        streams.start();
    }

    private static boolean isCriticalValue(JsonNode event) {
        String loincCode = event
            .at("/fhir_resource/code/coding/0/code")
            .asText();
        double value = event
            .at("/fhir_resource/valueQuantity/value")
            .asDouble();

        double[] range = CRITICAL_RANGES.get(loincCode);
        if (range == null) return false;
        return value < range[0] || value > range[1];
    }
}

This pipeline processes every lab result in under 100 milliseconds. A critical potassium level triggers an alert on the nurse station dashboard before the physician has finished reviewing the order in the EHR. For organizations building clinical AI agents on top of these streams, see our guide on building healthcare AI agent data pipelines.

Schema Registry for FHIR Events - Avro and Protobuf

Schema Registry for FHIR Event Schemas

Without schema enforcement, producers can send malformed events that break consumers. The Confluent Schema Registry enforces contracts between producers and consumers using Avro, Protobuf, or JSON Schema. For healthcare, this means a producer cannot send a lab result without a patient reference, a LOINC code, and a value, because the schema rejects it at write time.

Avro Schema for FHIR Observation Events

{
  "type": "record",
  "name": "FhirObservationEvent",
  "namespace": "io.nirmitee.healthcare.events",
  "fields": [
    {"name": "event_id", "type": "string"},
    {"name": "event_type", "type": {
      "type": "enum",
      "name": "EventType",
      "symbols": ["CREATE", "UPDATE", "DELETE"]
    }},
    {"name": "timestamp", "type": "long",
     "logicalType": "timestamp-millis"},
    {"name": "patient_id", "type": "string"},
    {"name": "encounter_id", "type": ["null", "string"],
     "default": null},
    {"name": "observation", "type": {
      "type": "record",
      "name": "Observation",
      "fields": [
        {"name": "id", "type": "string"},
        {"name": "status", "type": "string"},
        {"name": "code_system", "type": "string"},
        {"name": "code", "type": "string"},
        {"name": "code_display", "type": "string"},
        {"name": "value", "type": ["null", "double"],
         "default": null},
        {"name": "value_string", "type": ["null", "string"],
         "default": null},
        {"name": "unit", "type": ["null", "string"],
         "default": null},
        {"name": "effective_datetime", "type": "string"},
        {"name": "reference_range_low",
         "type": ["null", "double"], "default": null},
        {"name": "reference_range_high",
         "type": ["null", "double"], "default": null}
      ]
    }}
  ]
}

Schema Compatibility Rules for Healthcare

Compatibility	Allowed Changes	Healthcare Use Case
BACKWARD	Add optional fields, remove fields with defaults	Adding new FHIR extensions (safe for existing consumers)
FORWARD	Remove optional fields, add fields with defaults	Deprecated fields (safe for existing producers)
FULL	Add/remove optional fields with defaults only	Recommended for clinical events (maximum safety)
NONE	Any change	Development only (never for production clinical data)

Consumer Patterns for Healthcare Kafka - Dashboard, Analytics, AI, Compliance

Consumer Patterns for Healthcare Data

Different consumers have different requirements. A clinical dashboard needs sub-second latency. An analytics pipeline can tolerate a lag of minutes. An AI agent needs an enriched context. A compliance system needs guaranteed delivery. Kafka consumer groups handle all of these patterns simultaneously.

Clinical Dashboard Consumer (Real-Time)

from confluent_kafka import Consumer
import json
import asyncio
import websockets

class ClinicalDashboardConsumer:
    def __init__(self):
        self.consumer = Consumer({
            "bootstrap.servers": "kafka:9092",
            "group.id": "clinical-dashboard-v1",
            "auto.offset.reset": "latest",
            "enable.auto.commit": False,
            "max.poll.interval.ms": 30000,
            "fetch.min.bytes": 1,
            "fetch.wait.max.ms": 100,
        })
        self.consumer.subscribe([
            "healthcare.adt.events",
            "healthcare.labs.results",
            "healthcare.alerts.critical"
        ])
        self.websocket_clients = set()

    async def stream_to_dashboard(self):
        """Stream clinical events to dashboard via WebSocket."""
        while True:
            msg = self.consumer.poll(timeout=0.1)
            if msg is None:
                continue
            if msg.error():
                continue

            event = json.loads(msg.value())
            dashboard_event = self.transform_for_dashboard(event)

            # Push to all connected dashboard clients
            if self.websocket_clients:
                await asyncio.gather(*[
                    ws.send(json.dumps(dashboard_event))
                    for ws in self.websocket_clients
                ])

            self.consumer.commit(asynchronous=False)

    def transform_for_dashboard(self, event):
        """Transform FHIR event to dashboard-friendly format."""
        return {
            "type": event["event_type"],
            "resource": event["resource_type"],
            "patient_id": event["patient_id"],
            "timestamp": event["timestamp"],
            "summary": self.build_summary(event),
            "priority": "critical"
                if event.get("is_critical") else "normal"
        }

Analytics Pipeline Consumer (Micro-Batch)

class AnalyticsPipelineConsumer:
    """Micro-batch consumer that writes to Delta Lake."""

    def __init__(self, batch_size=1000, flush_interval_sec=60):
        self.consumer = Consumer({
            "bootstrap.servers": "kafka:9092",
            "group.id": "analytics-pipeline-v1",
            "auto.offset.reset": "earliest",
            "enable.auto.commit": False,
            "max.poll.records": batch_size,
            "isolation.level": "read_committed",
        })
        self.consumer.subscribe([
            "healthcare.adt.events",
            "healthcare.labs.results",
            "healthcare.medications.orders"
        ])
        self.buffer = []
        self.batch_size = batch_size

    def consume_and_load(self):
        """Consume events in micro-batches, load to Delta Lake."""
        while True:
            msg = self.consumer.poll(timeout=1.0)
            if msg and not msg.error():
                self.buffer.append(
                    json.loads(msg.value())
                )

            if len(self.buffer) >= self.batch_size:
                self.flush_to_delta_lake()

    def flush_to_delta_lake(self):
        """Write buffered events to Delta Lake bronze layer."""
        if not self.buffer:
            return

        df = spark.createDataFrame(self.buffer)
        df.write.format("delta") \
            .mode("append") \
            .option("mergeSchema", "true") \
            .save("/data/healthcare/bronze/kafka_events")

        self.consumer.commit(asynchronous=False)
        self.buffer.clear()

For the Delta Lake side of this pipeline, see our detailed guide on Delta Lake for healthcare with ACID transactions and time travel.

Exactly-Once Delivery Guarantees for Clinical Events

Exactly-Once Delivery for Clinical Events

For clinical data, "at-least-once" delivery means a medication order might be recorded twice. "At-most-once" means it might be lost entirely. Neither is acceptable. Kafka exactly-once semantics (EOS) guarantee that each clinical event is processed exactly one time, even in the face of producer retries, consumer crashes, or broker failures.

Configuring Exactly-Once End-to-End

# Producer: transactional exactly-once
producer_config = {
    "bootstrap.servers": "kafka:9092",
    "enable.idempotence": True,
    "acks": "all",
    "transactional.id": "clinical-ehr-producer-001",
    "max.in.flight.requests.per.connection": 5,
}

producer = Producer(producer_config)
producer.init_transactions()

# Wrap clinical writes in a transaction
try:
    producer.begin_transaction()

    # Publish patient admission and associated resources
    # atomically — all succeed or all fail
    producer.produce("healthcare.adt.events",
        key=patient_id, value=admit_event)
    producer.produce("healthcare.adt.events",
        key=patient_id, value=encounter_event)
    producer.produce("healthcare.medications.orders",
        key=patient_id, value=med_order_event)

    producer.commit_transaction()
except Exception as e:
    producer.abort_transaction()
    raise

# Consumer: read_committed isolation
consumer_config = {
    "bootstrap.servers": "kafka:9092",
    "group.id": "clinical-consumer-v1",
    "isolation.level": "read_committed",
    "enable.auto.commit": False,
}

Exactly-Once Checklist for Healthcare Kafka

Component	Setting	Why It Matters
Producer	`enable.idempotence=true`	Prevents duplicate messages from retries
Producer	`acks=all`	Ensures all replicas acknowledge before confirming
Producer	`transactional.id` set	Enables atomic multi-topic writes
Broker	`min.insync.replicas=2`	Prevents data loss if one broker fails
Consumer	`isolation.level=read_committed`	Only reads committed transaction results
Consumer	`enable.auto.commit=false`	Manual commit after processing confirmed

Complete Healthcare Streaming Architecture - End to End

Production Deployment Considerations

Monitoring Healthcare Kafka Clusters

# Prometheus alerting rules for healthcare Kafka
groups:
  - name: healthcare-kafka-alerts
    rules:
      # Alert if consumer lag exceeds 5 minutes
      - alert: ClinicalConsumerLagHigh
        expr: |
          kafka_consumer_group_lag_sum{
            group=~"clinical-.*"
          } > 30000
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Clinical consumer lag > 5 min"
          description: "Consumer group {{ $labels.group }}
            has {{ $value }} messages lag"

      # Alert if under-replicated partitions
      - alert: HealthcareUnderReplicated
        expr: |
          kafka_server_replica_manager_under_replicated_partitions > 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Under-replicated partitions detected"

      # Alert if producer error rate spikes
      - alert: ClinicalProducerErrors
        expr: |
          rate(kafka_producer_record_error_total{
            client_id=~"clinical-.*"
          }[5m]) > 0.01
        for: 3m
        labels:
          severity: warning

For comprehensive monitoring strategies, see our guide on 7 metrics every healthcare integration team should track and our article on alerting for healthcare systems with PagerDuty runbooks.

Frequently Asked Questions

How does Kafka compare to a healthcare integration engine like Mirth Connect?

They serve different purposes and work well together. Mirth Connect excels at protocol translation (HL7v2 to FHIR, CSV to HL7v2) and message transformation. Kafka excels at high-throughput event distribution, replay, and stream processing. The recommended architecture: Mirth Connect as the protocol translator on the edges, Kafka as the central event backbone. Mirth produces to Kafka topics after translating messages; downstream consumers read from Kafka.

What Kafka cluster size do I need for a hospital?

A 500-bed hospital generating ~5,000 events per minute needs a 3-broker cluster minimum (for replication factor 3). Each broker should have 32GB RAM, 8+ cores, and NVMe SSDs. For health systems with multiple hospitals, scale to 5-7 brokers. The primary bottleneck is usually disk I/O for retention, not CPU or network. Budget $15-25K/year for a self-managed cluster or $30-50K/year for Confluent Cloud.

Is Kafka HIPAA-compliant?

Kafka itself is infrastructure software, so HIPAA compliance depends on your deployment. Requirements: encrypt data in transit (TLS), encrypt data at rest (disk encryption), implement ACLs for topic access, enable audit logging, and sign a BAA with your infrastructure provider. Confluent Cloud offers HIPAA-eligible deployments. Self-managed deployments must configure encryption, authentication (SASL/SCRAM or mTLS), and authorization (Kafka ACLs) manually.

How do I handle message ordering across multiple topics?

Kafka guarantees ordering within a partition, not across topics. If you need a patient admission to be processed before their medication order, and they are on different topics, use Kafka Streams with a windowed join: buffer events from both topics in a time window, then process them in the correct order. Alternatively, use a single topic with event type headers and partition by patient ID.

What happens if Kafka goes down?

With replication factor 3 and min.insync.replicas=2, Kafka tolerates one broker failure with zero data loss. Producers fail over to available brokers automatically. Consumers rebalance and resume from committed offsets. For multi-datacenter resilience, use Kafka MirrorMaker 2 to replicate across data centers. The key metric: monitor under-replicated partitions and alert immediately when any appear.

Should I use Avro or Protobuf for healthcare event schemas?