Nirmitee.io
Kubernetes Observability for Healthcare Workloads: Monitoring Pods, Services, and GPU Nodes

May 3, 2026

Why Healthcare Kubernetes Needs Different Observability

Kubernetes observability for a SaaS startup and Kubernetes observability for a healthcare platform are fundamentally different problems. When your pods run FHIR servers that clinicians depend on for patient data, Mirth Connect channels that process lab orders, and GPU-accelerated ML models that generate clinical decision support, the stakes of a missed monitoring signal are clinical, not just commercial. A pod restart that goes unnoticed could mean lost HL7 messages. A GPU memory leak could cause an AI diagnostic model to return garbage predictions. A network policy misconfiguration could expose PHI across namespace boundaries.

This guide covers healthcare-specific Kubernetes monitoring patterns with production configurations for Prometheus, Grafana, DCGM, and OpenTelemetry. We focus on the three workload categories that define healthcare K8s deployments: FHIR servers, integration engines, and ML inference pods -- plus the infrastructure-level concerns (GPU monitoring, PHI namespace isolation, cross-service tracing) that are unique to healthcare.

FHIR Server Pod Monitoring

FHIR servers like HAPI FHIR are JVM-based applications with specific performance characteristics. Monitoring a FHIR server pod requires tracking both HTTP-level metrics (request latency, error rates, throughput) and JVM-level metrics (heap usage, garbage collection pauses, thread pool exhaustion). Here are the critical metrics and their clinical significance:

| Metric | Warning Threshold | Critical Threshold | Clinical Impact |
| --- | --- | --- | --- |
| Request latency (p99) | > 2 seconds | > 5 seconds | Clinician workflow delays, timeout errors in EHR |
| Error rate (5xx) | > 1% | > 5% | Failed patient lookups, broken clinical workflows |
| JVM heap usage | > 80% | > 90% | Imminent OOM kill, pod restart, request failures |
| GC pause duration | > 500ms | > 2 seconds | Request timeouts during stop-the-world GC |
| Connection pool usage | > 75% | > 90% | Database connection exhaustion, cascading failures |
| Pod restarts (1hr window) | > 2 | > 5 | Recurring crashes, potential data loss in flight |

Prometheus ServiceMonitor for HAPI FHIR

# servicemonitor-hapi-fhir.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: hapi-fhir-monitor
  namespace: phi-workloads
  labels:
    app: hapi-fhir
    team: platform
spec:
  selector:
    matchLabels:
      app: hapi-fhir
  endpoints:
    - port: metrics
      path: /actuator/prometheus
      interval: 15s
      scrapeTimeout: 10s
      metricRelabelings:
        # Keep only essential metrics to control cardinality
        - sourceLabels: [__name__]
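          # jvm_memory_max_bytes must be kept: the heap alert divides used bytes by max bytes
          regex: "(http_server_requests_seconds.*|jvm_memory_used_bytes|jvm_memory_max_bytes|jvm_gc_pause_seconds.*|hikaricp_connections.*|process_cpu_usage|jvm_threads_current)"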
          action: keep
  namespaceSelector:
    matchNames:
      - phi-workloads
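
A ServiceMonitor selects Services, not pods, and its `port: metrics` entry refers to a Service port by name. The sketch below shows a Service that the ServiceMonitor above would match. The Service name, the `http` port, and the port numbers are assumptions for illustration; adjust them to your deployment.

```yaml
# service-hapi-fhir.yaml (illustrative; names and port numbers are assumptions)
apiVersion: v1
kind: Service
metadata:
  name: hapi-fhir
  namespace: phi-workloads
  labels:
    app: hapi-fhir          # matched by spec.selector.matchLabels in the ServiceMonitor
spec:
  selector:
    app: hapi-fhir          # selects the FHIR server pods
  ports:
    - name: http            # FHIR API traffic
      port: 80
      targetPort: 8080
    - name: metrics         # referenced by "port: metrics" in the ServiceMonitor
      port: 8080
      targetPort: 8080
```

If no Service carries both the `app: hapi-fhir` label and a port named `metrics`, the ServiceMonitor produces no scrape targets and does so without raising an error, so it is worth checking the Prometheus targets page after applying it.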

FHIR-Specific Prometheus Recording Rules

# prometheus-rules-fhir.yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: fhir-server-rules
  namespace: monitoring
spec:
  groups:
    - name: fhir-server-performance
      interval: 30s
      rules:
        # FHIR request latency by resource type
        - record: fhir:request_duration:p99
          expr: |
            histogram_quantile(0.99,
              sum(rate(http_server_requests_seconds_bucket{
                namespace="phi-workloads",
                uri=~"/fhir/.*"
              }[5m])) by (le, uri)
            )

        # FHIR error rate by resource type
        - record: fhir:error_rate:5m
          expr: |
            sum(rate(http_server_requests_seconds_count{
              namespace="phi-workloads",
              status=~"5.."
            }[5m])) by (uri)
            /
            sum(rate(http_server_requests_seconds_count{
              namespace="phi-workloads"
            }[5m])) by (uri)

        # Connection pool saturation
        - record: fhir:db_pool_utilization
          expr: |
            hikaricp_connections_active{
              namespace="phi-workloads"
            }
            /
            hikaricp_connections_max{
              namespace="phi-workloads"
            }

    - name: fhir-server-alerts
      rules:
        - alert: FHIRServerHighLatency
          expr: fhir:request_duration:p99 > 2
          for: 5m
          labels:
            severity: warning
            team: platform
          annotations:
            summary: "FHIR server p99 latency > 2s"
            description: "{{ $labels.uri }} p99 latency is {{ $value }}s"

        - alert: FHIRServerHighErrorRate
          expr: fhir:error_rate:5m > 0.05
          for: 2m
          labels:
            severity: critical
            team: platform
          annotations:
            summary: "FHIR server error rate > 5%"

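        - alert: FHIRServerOOMRisk
          # Heap is reported per memory pool, so sum the pools for each pod.
          # Pools that report max = -1 (no defined limit) are excluded from the
          # denominator. Requires jvm_memory_max_bytes to be scraped.
          expr: |
            sum by (pod) (jvm_memory_used_bytes{area="heap", namespace="phi-workloads"})
            /
            sum by (pod) (jvm_memory_max_bytes{area="heap", namespace="phi-workloads"} > 0)
            > 0.9
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "FHIR server JVM heap > 90% on {{ $labels.pod }} - OutOfMemoryError risk"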
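
The rules above cover only part of the threshold table. As a sketch, the remaining thresholds could be expressed as follows. This reuses the recording rules defined above and assumes kube-state-metrics is installed (for `kube_pod_container_status_restarts_total`) and that Micrometer exposes `jvm_gc_pause_seconds_max`; the alert names are illustrative.

```yaml
# prometheus-rules-fhir-thresholds.yaml (sketch; alert names are illustrative)
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: fhir-server-threshold-alerts
  namespace: monitoring
spec:
  groups:
    - name: fhir-server-threshold-alerts
      rules:
        - alert: FHIRServerCriticalLatency
          expr: fhir:request_duration:p99 > 5
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "FHIR server p99 latency > 5s on {{ $labels.uri }}"

        - alert: FHIRServerElevatedErrorRate
          expr: fhir:error_rate:5m > 0.01
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "FHIR server error rate > 1%"

        - alert: FHIRServerDBPoolSaturation
          expr: fhir:db_pool_utilization > 0.9
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "FHIR server connection pool > 90% in use"

        - alert: FHIRServerLongGCPause
          # Longest GC pause Micrometer observed in its recent reporting window
          expr: jvm_gc_pause_seconds_max{namespace="phi-workloads"} > 2
          labels:
            severity: critical
          annotations:
            summary: "FHIR server GC pause > 2s"

        - alert: FHIRServerPodRestarts
          expr: increase(kube_pod_container_status_restarts_total{namespace="phi-workloads"}[1h]) > 5
          labels:
            severity: critical
          annotations:
            summary: "{{ $labels.pod }} restarted more than 5 times in the last hour"
```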

Mirth Connect Pod Monitoring

Mirth Connect (NextGen Connect) is one of the most widely used healthcare integration engines. When running in Kubernetes, monitoring channel status, message throughput, and queue depth is critical for ensuring lab orders, results, and ADT messages flow without interruption:

# servicemonitor-mirth.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: mirth-connect-monitor
  namespace: phi-workloads
spec:
  selector:
    matchLabels:
      app: mirth-connect
  endpoints:
    # Scrapes the mirth-prometheus-exporter sidecar, not Mirth itself
    - port: metrics
      path: /metrics
      interval: 30s

Since Mirth does not expose Prometheus metrics natively, deploy a sidecar exporter that queries the Mirth REST API and exposes metrics in Prometheus format. Key metrics to expose include channel state (started/stopped/paused), messages sent/received per channel, queue depth per channel, and processing error counts.
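
The sidecar pattern described above might look like the following sketch. Everything specific to the exporter is a placeholder: the image names, the `MIRTH_API_*` environment variables, the Secret name, port 9091, and the `mirth_channel_state` / `mirth_channel_queue_depth` metric names are hypothetical and depend on the exporter you build or adopt. The queue-depth threshold is illustrative.

```yaml
# mirth-connect-with-exporter.yaml
# Sketch only: exporter image, env variable names, and mirth_* metric names
# are hypothetical placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mirth-connect
  namespace: phi-workloads
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mirth-connect
  template:
    metadata:
      labels:
        app: mirth-connect
        monitoring: enabled
    spec:
      containers:
        - name: mirth-connect
          image: registry.example.com/mirth-connect:placeholder
          ports:
            - name: api
              containerPort: 8443
        - name: mirth-exporter
          image: registry.example.com/mirth-prometheus-exporter:placeholder
          ports:
            - name: metrics
              containerPort: 9091
          env:
            # The sidecar shares the pod network, so it reaches Mirth on localhost
            - name: MIRTH_API_URL
              value: "https://localhost:8443/api"
            - name: MIRTH_API_USERNAME
              valueFrom:
                secretKeyRef:
                  name: mirth-exporter-credentials
                  key: username
            - name: MIRTH_API_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: mirth-exporter-credentials
                  key: password
---
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: mirth-channel-alerts
  namespace: monitoring
spec:
  groups:
    - name: mirth-channels
      rules:
        - alert: MirthChannelNotStarted
          # Assumes the exporter reports 1 for a started channel and 0 otherwise
          expr: mirth_channel_state{namespace="phi-workloads"} == 0
          for: 2m
          labels:
            severity: critical
          annotations:
            summary: "Mirth channel {{ $labels.channel }} is not started"
        - alert: MirthQueueBacklog
          expr: mirth_channel_queue_depth{namespace="phi-workloads"} > 1000
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "Mirth channel {{ $labels.channel }} queue depth > 1000"
```

Give the exporter a dedicated read-only Mirth account rather than an administrator login, since its credentials live in the same namespace as the PHI workloads.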

GPU Monitoring with DCGM Exporter

Healthcare organizations deploying ML models for clinical decision support, radiology AI, or natural language processing on Kubernetes GPU nodes need specialized GPU monitoring. The NVIDIA Data Center GPU Manager (DCGM) exporter provides Prometheus metrics for GPU utilization, memory, temperature, power draw, and ECC errors.

DCGM Exporter DaemonSet

# dcgm-exporter-daemonset.yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: dcgm-exporter
  namespace: monitoring
  labels:
    app: dcgm-exporter
spec:
  selector:
    matchLabels:
      app: dcgm-exporter
  template:
    metadata:
      labels:
        app: dcgm-exporter
    spec:
      nodeSelector:
        nvidia.com/gpu.present: "true"
      tolerations:
        - key: nvidia.com/gpu
          operator: Exists
          effect: NoSchedule
      containers:
        - name: dcgm-exporter
          image: nvcr.io/nvidia/k8s/dcgm-exporter:3.3.5-3.4.1-ubuntu22.04
          ports:
            - name: metrics
              containerPort: 9400
          env:
            - name: DCGM_EXPORTER_KUBERNETES
              value: "true"
            - name: DCGM_EXPORTER_LISTEN
              value: ":9400"
          securityContext:
            runAsNonRoot: false
            runAsUser: 0
            capabilities:
              add: ["SYS_ADMIN"]
          volumeMounts:
            - name: device-plugin
              mountPath: /var/lib/kubelet/device-plugins
      volumes:
        - name: device-plugin
          hostPath:
            path: /var/lib/kubelet/device-plugins
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: dcgm-exporter
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: dcgm-exporter
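  endpoints:
    # Requires a Service labeled app: dcgm-exporter with a port named "metrics" (not shown)
    - port: metrics
      interval: 15s
      relabelings:
        # Attach the Kubernetes node name as "node"; the GPU alert annotations use this label
        - sourceLabels: [__meta_kubernetes_pod_node_name]
          targetLabel: node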

GPU Alert Rules

# prometheus-rules-gpu.yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: gpu-monitoring-rules
  namespace: monitoring
spec:
  groups:
    - name: gpu-health
      rules:
        - alert: GPUHighTemperature
          expr: DCGM_FI_DEV_GPU_TEMP > 80
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "GPU temperature > 80C on {{ $labels.node }}"
            description: "GPU {{ $labels.gpu }} temperature is {{ $value }}C"

        - alert: GPUMemoryExhaustion
          expr: |
            DCGM_FI_DEV_FB_USED
            / (DCGM_FI_DEV_FB_USED + DCGM_FI_DEV_FB_FREE)
            > 0.95
          for: 2m
          labels:
            severity: critical
          annotations:
            summary: "GPU VRAM > 95% on {{ $labels.node }}"

        - alert: GPUECCErrors
          expr: rate(DCGM_FI_DEV_ECC_DBE_VOL_TOTAL[1h]) > 0
          labels:
            severity: critical
          annotations:
            summary: "GPU ECC double-bit errors detected"
            description: "GPU {{ $labels.gpu }} on {{ $labels.node }} has uncorrectable memory errors - model predictions may be corrupted"

        - alert: GPUUnderutilized
          expr: DCGM_FI_DEV_GPU_UTIL < 5
          for: 30m
          labels:
            severity: info
          annotations:
            summary: "GPU underutilized - consider rightsizing"

The ECC error alert is particularly important for healthcare ML workloads. A GPU with uncorrectable memory errors can produce silently corrupted model outputs -- a radiology AI model that returns incorrect predictions due to faulty GPU memory is a patient safety issue, not just an infrastructure concern.
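
The ECC alert only works if the exporter actually emits `DCGM_FI_DEV_ECC_DBE_VOL_TOTAL`. Depending on the exporter version, ECC fields may be absent from the default counter list, so it is worth confirming against your deployment. A minimal sketch of a custom counter file follows, assuming the exporter's `DCGM_EXPORTER_COLLECTORS` setting is used to point at it. The ConfigMap name and mount path are illustrative, and the help texts are paraphrased.

```yaml
# dcgm-exporter-counters.yaml (sketch; ConfigMap name and mount path are assumptions)
apiVersion: v1
kind: ConfigMap
metadata:
  name: dcgm-exporter-counters
  namespace: monitoring
data:
  counters.csv: |
    # Format: DCGM field, Prometheus metric type, help text
    DCGM_FI_DEV_GPU_UTIL, gauge, GPU utilization (in %).
    DCGM_FI_DEV_GPU_TEMP, gauge, GPU temperature (in C).
    DCGM_FI_DEV_FB_USED, gauge, Framebuffer memory used (in MiB).
    DCGM_FI_DEV_FB_FREE, gauge, Framebuffer memory free (in MiB).
    DCGM_FI_DEV_ECC_DBE_VOL_TOTAL, counter, Total number of double-bit volatile ECC errors.

# In the DaemonSet, mount this ConfigMap as a volume and point the exporter at it, e.g.:
#   env:
#     - name: DCGM_EXPORTER_COLLECTORS
#       value: /etc/dcgm-exporter/custom/counters.csv
```

The file lists only the fields the alert rules in this guide reference; add back any others your dashboards need.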

Cross-Service Request Tracing

Healthcare workflows often span multiple services: a clinician searches for a patient in the EHR, which hits the FHIR server, which queries the database, which may trigger an ML model for risk scoring. Distributed tracing with OpenTelemetry gives you visibility into these cross-service flows.

# otel-collector-config.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: otel-collector-config
  namespace: monitoring
data:
  config.yaml: |
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
          http:
            endpoint: 0.0.0.0:4318

    processors:
      batch:
        timeout: 5s
        send_batch_size: 1024

      # Remove PHI from trace attributes
      attributes:
        actions:
          - key: patient.name
            action: delete
          - key: patient.mrn
            action: delete
          - key: patient.dob
            action: delete
          - key: http.request.body
            action: delete
          - key: http.response.body
            action: delete
          - key: db.statement
            action: hash  # Hash SQL queries to prevent PHI exposure

    exporters:
      otlp:
        endpoint: tempo.monitoring.svc:4317
        tls:
          insecure: false
          cert_file: /etc/otel/tls/client.crt
          key_file: /etc/otel/tls/client.key

    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: [attributes, batch]
          exporters: [otlp]

Notice the attributes processor that removes PHI from trace spans. This is essential: without it, trace data stored in Tempo or Jaeger could contain patient names, MRNs, or query parameters with PHI. The db.statement attribute is hashed rather than deleted, so identical statements can still be grouped and counted during debugging while the SQL text, and any PHI in its parameters, stays out of the trace backend. The attribute keys shown depend on your instrumentation; audit real spans to confirm which keys carry PHI.
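
FHIR search requests often carry PHI in the URL query string (for example a patient name or birth date), and the configuration above does not touch URL attributes. One possible way to handle that is sketched below, assuming the collector is the contrib distribution, which includes the transform processor. The processor name `transform/scrub_urls` is illustrative, and the attribute keys depend on which semantic-convention version your instrumentation emits.

```yaml
# Fragment of the collector's config.yaml (sketch; assumes the contrib distribution)
processors:
  transform/scrub_urls:
    error_mode: ignore
    trace_statements:
      - context: span
        statements:
          # Replace everything from "?" onward so search parameters are not stored
          - replace_pattern(attributes["http.target"], "[?].*", "?redacted")
          - replace_pattern(attributes["http.url"], "[?].*", "?redacted")
          - replace_pattern(attributes["url.full"], "[?].*", "?redacted")
          - delete_key(attributes, "url.query")

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [attributes, transform/scrub_urls, batch]
      exporters: [otlp]
```

This only handles query strings. Resource IDs in the path (such as `/fhir/Patient/<id>`) remain in the span and need a separate decision about whether they count as identifiers in your environment.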

Namespace Isolation for PHI Workloads

HIPAA requires access controls on systems containing PHI. In Kubernetes, this translates to namespace isolation with NetworkPolicies that restrict which pods can communicate with PHI-handling services.

NetworkPolicy for PHI Namespace

# network-policy-phi-namespace.yaml
# Default deny all ingress to PHI namespace
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: phi-workloads
spec:
  podSelector: {}
  policyTypes:
    - Ingress
---
# Allow specific ingress from API gateway
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-api-gateway
  namespace: phi-workloads
spec:
  podSelector:
    matchLabels:
      app: hapi-fhir
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: api-gateway  # label Kubernetes sets on every namespace
          podSelector:
            matchLabels:
              app: api-gateway
      ports:
        - protocol: TCP
          port: 8080
---
# Allow monitoring namespace to scrape metrics
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-prometheus-scrape
  namespace: phi-workloads
spec:
  podSelector:
    matchLabels:
      monitoring: enabled
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: monitoring  # label Kubernetes sets on every namespace
          podSelector:
            matchLabels:
              app: prometheus
      ports:
        - protocol: TCP
          port: 9090
        - protocol: TCP
          port: 8080  # actuator/metrics; also the FHIR API port unless the actuator uses a separate management port

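The monitoring exception is important: Prometheus needs to scrape metrics from pods in the PHI namespace, but the NetworkPolicy should restrict this access to only the Prometheus server pods and only on the metrics port. That restriction protects the FHIR API only if metrics are served on a separate port. In the policy above, port 8080 carries both the actuator endpoint and the FHIR API, so a compromised Prometheus pod could still reach the API; moving the actuator to a dedicated management port closes that gap. Either way, other pods in the monitoring namespace remain blocked.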
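
To make "only on the metrics port" hold in practice, the actuator can be moved off the API port. The sketch below assumes the FHIR server is a Spring Boot application (as the HAPI FHIR starter is) and uses port 8081 as an arbitrary choice; the policy name is illustrative. The first document is application configuration, the second a Kubernetes manifest.

```yaml
# application.yaml (Spring Boot settings for the FHIR server; port 8081 is an assumption)
management:
  server:
    port: 8081                  # serve actuator endpoints on their own port
  endpoints:
    web:
      exposure:
        include: health,prometheus
---
# network-policy-prometheus-metrics-only.yaml (sketch)
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-prometheus-scrape-metrics-only
  namespace: phi-workloads
spec:
  podSelector:
    matchLabels:
      app: hapi-fhir
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: monitoring
          podSelector:
            matchLabels:
              app: prometheus
      ports:
        - protocol: TCP
          port: 8081            # management port only; the FHIR API on 8080 stays closed to monitoring
```

With this split, the Service port named `metrics` has to target 8081 so the ServiceMonitor keeps working.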

Grafana Dashboard for Healthcare Workloads

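A well-designed Grafana dashboard gives your platform team a single pane of glass for all healthcare workloads. Here is a dashboard JSON model that puts FHIR server latency, GPU utilization, and pod restart counts in one view; Mirth channel panels can be added the same way once the sidecar exporter's metrics are available: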

{
  "dashboard": {
    "title": "Healthcare K8s Workloads",
    "uid": "healthcare-k8s",
    "panels": [
      {
        "title": "FHIR Server - Request Latency (p99)",
        "type": "timeseries",
        "gridPos": {"h": 8, "w": 12, "x": 0, "y": 0},
        "targets": [{
          "expr": "fhir:request_duration:p99",
          "legendFormat": "{{ uri }}"
        }],
        "fieldConfig": {
          "defaults": {
            "unit": "s",
            "thresholds": {
              "steps": [
                {"value": 0, "color": "green"},
                {"value": 2, "color": "orange"},
                {"value": 5, "color": "red"}
              ]
            }
          }
        }
      },
      {
        "title": "GPU Utilization by Node",
        "type": "gauge",
        "gridPos": {"h": 8, "w": 12, "x": 12, "y": 0},
        "targets": [{
          "expr": "DCGM_FI_DEV_GPU_UTIL",
          "legendFormat": "{{ node }}/gpu{{ gpu }}"
        }],
        "fieldConfig": {
          "defaults": {
            "unit": "percent",
            "max": 100,
            "thresholds": {
              "steps": [
                {"value": 0, "color": "blue"},
                {"value": 70, "color": "green"},
                {"value": 90, "color": "orange"},
                {"value": 95, "color": "red"}
              ]
            }
          }
        }
      },
      {
        "title": "Pod Restart Count (24h)",
        "type": "stat",
        "gridPos": {"h": 4, "w": 6, "x": 0, "y": 8},
        "targets": [{
          "expr": "sum(increase(kube_pod_container_status_restarts_total{namespace=\"phi-workloads\"}[24h])) by (pod)"
        }]
      }
    ]
  }
}

Dependency Health Monitoring

Healthcare Kubernetes workloads have critical dependencies: PostgreSQL databases, Redis caches, message queues (RabbitMQ, Kafka), and external services (payer APIs, lab systems). Monitor each dependency as a first-class citizen:

| Dependency | Key Metrics | Alert Condition | Impact if Unhealthy |
| --- | --- | --- | --- |
| PostgreSQL | Connection count, replication lag, query latency | Replication lag > 30s, connections > 80% max | FHIR reads return stale data, writes fail |
| Redis | Memory usage, eviction rate, hit ratio | Eviction rate > 0, hit ratio < 90% | Session loss, cache miss storms |
| RabbitMQ | Queue depth, consumer count, message rate | Queue depth > 1000, consumers = 0 | HL7 messages not processed |
| External APIs | Response time, error rate, circuit breaker state | Error rate > 10%, circuit open | Payer lookups fail, lab orders rejected |
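
The alert conditions in the table could be written as Prometheus rules along these lines. This is a sketch: the metric names assume the commonly used community exporters (postgres_exporter, redis_exporter, and RabbitMQ's Prometheus plugin with per-queue metrics enabled) and vary between exporter versions, so verify them against what your exporters actually expose. External API rules are omitted because those metrics are application-specific.

```yaml
# prometheus-rules-dependencies.yaml (sketch; metric names depend on your exporters)
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: dependency-health-rules
  namespace: monitoring
spec:
  groups:
    - name: dependency-health
      rules:
        - alert: PostgresReplicationLag
          expr: pg_replication_lag_seconds > 30
          for: 2m
          labels:
            severity: critical
          annotations:
            summary: "PostgreSQL replication lag > 30s"

        - alert: PostgresConnectionsHigh
          expr: |
            sum by (instance) (pg_stat_activity_count)
            / on (instance) pg_settings_max_connections
            > 0.8
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "PostgreSQL connections > 80% of max"

        - alert: RedisEvictingKeys
          expr: rate(redis_evicted_keys_total[5m]) > 0
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "Redis is evicting keys"

        - alert: RedisLowHitRatio
          expr: |
            rate(redis_keyspace_hits_total[5m])
            / (rate(redis_keyspace_hits_total[5m]) + rate(redis_keyspace_misses_total[5m]))
            < 0.9
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "Redis cache hit ratio < 90%"

        - alert: RabbitMQQueueBacklog
          expr: rabbitmq_queue_messages > 1000
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "RabbitMQ queue {{ $labels.queue }} depth > 1000"

        - alert: RabbitMQNoConsumers
          expr: rabbitmq_queue_consumers == 0
          for: 2m
          labels:
            severity: critical
          annotations:
            summary: "RabbitMQ queue {{ $labels.queue }} has no consumers"
```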

Frequently Asked Questions

Do I need a BAA with my monitoring vendor for Kubernetes observability?

If your monitoring system collects any data from pods handling PHI -- even just metrics and logs, not raw PHI -- you should have a BAA in place. Grafana Cloud, Datadog, and New Relic all offer HIPAA-eligible tiers. Self-hosted monitoring (Prometheus + Grafana on your own infrastructure) avoids a separate BAA with a monitoring vendor but increases operational burden; if that infrastructure runs in a public cloud, the cloud provider's BAA still has to cover it. Make sure your log pipeline sanitizes PHI before it reaches any monitoring backend.

How do I monitor Kubernetes workloads without exposing PHI in metrics?

Design your metrics with PHI-free labels. Never use patient IDs, names, or MRNs as Prometheus label values -- this would create high-cardinality metrics and expose PHI in your monitoring system. Use aggregate metrics (e.g., "total patient searches per minute") rather than per-patient metrics. For traces, configure the OpenTelemetry collector to strip PHI attributes before exporting, as shown in the configuration above.
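
As a backstop for PHI-free labels, a ServiceMonitor can drop suspect labels at scrape time. The fragment below is a sketch to merge into an existing ServiceMonitor endpoint; the label names `patient_id`, `patient_name`, and `mrn` are hypothetical examples of what an instrumentation mistake might introduce.

```yaml
# Fragment of a ServiceMonitor spec (sketch; label names are hypothetical)
spec:
  endpoints:
    - port: metrics
      path: /actuator/prometheus
      metricRelabelings:
        # Remove any label whose name matches, before samples are stored
        - action: labeldrop
          regex: "(patient_id|patient_name|mrn)"
```

Treat this as a safety net rather than the fix. Dropping a label can leave several series with identical label sets, which Prometheus then rejects as duplicates, so the application should stop emitting the label in the first place.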

What is the recommended GPU monitoring stack for healthcare ML?

NVIDIA DCGM Exporter + Prometheus + Grafana is the standard stack. DCGM provides comprehensive GPU metrics including utilization, memory, temperature, power draw, ECC errors, and PCIe throughput. For healthcare-specific concerns, pay special attention to ECC errors (corrupted model outputs) and GPU memory usage (OOM kills that restart inference pods). If you are running clinical decision support models, consider adding model-level metrics (inference latency, prediction confidence scores) alongside infrastructure metrics.

How do NetworkPolicies affect Prometheus scraping?

NetworkPolicies in a PHI namespace will block Prometheus from scraping metrics unless you explicitly allow it. Create a NetworkPolicy that permits ingress from the monitoring namespace to the metrics port (usually 9090 or 8080 for actuator endpoints) on pods labeled for monitoring. This allows metric collection while maintaining the default-deny posture for all other traffic. Test your NetworkPolicies in a staging environment before deploying to production -- a misconfigured policy can silently break monitoring.

Should I use managed Kubernetes (EKS/GKE/AKS) or self-hosted for healthcare?

Managed Kubernetes is strongly recommended for healthcare workloads. AWS EKS, Google GKE, and Azure AKS all support HIPAA compliance with BAA coverage. The operational burden of self-hosting Kubernetes -- patching, upgrading, securing the control plane -- diverts resources from building healthcare features. Use managed Kubernetes and focus your engineering effort on the application and observability layers described in this guide.

Conclusion

Healthcare Kubernetes observability requires going beyond standard pod and node metrics. You need FHIR-specific latency tracking, integration engine channel monitoring, GPU health metrics with ECC error detection, PHI-aware namespace isolation, and cross-service tracing that sanitizes sensitive data. The configurations in this guide provide a production-ready foundation -- deploy the ServiceMonitors, alert rules, and NetworkPolicies, then customize the thresholds and dashboards for your specific workload characteristics. The goal is not just knowing when something breaks, but understanding the clinical impact of every infrastructure event.

Frequently Asked Questions

Why does Kubernetes observability differ for healthcare workloads?

Because the stakes of a missed monitoring signal are clinical, not just commercial. Healthcare clusters run FHIR servers that clinicians depend on for patient data, Mirth Connect channels processing lab orders, and GPU-accelerated ML models powering clinical decision support. An unnoticed pod restart can mean lost HL7 messages, a GPU memory leak can make a diagnostic model return garbage predictions, and a network policy misconfiguration can expose PHI across namespace boundaries.

What metrics should you monitor on a FHIR server pod?

Track both HTTP-level and JVM-level metrics, each with clinical significance. Key thresholds: p99 request latency warns above 2 seconds and is critical above 5, where clinician workflows start timing out; 5xx error rate warns at 1% and is critical at 5%; JVM heap usage above 90% signals an imminent OOM kill; GC pauses above 2 seconds cause request timeouts; and more than 5 pod restarts in an hour indicates recurring crashes with potential in-flight data loss.

How do you monitor Mirth Connect in Kubernetes?

Mirth Connect does not expose Prometheus metrics natively, so the standard pattern is a sidecar exporter that queries the Mirth REST API and exposes metrics in Prometheus format. The key metrics are channel state (started, stopped, or paused), messages sent and received per channel, queue depth per channel, and processing error counts, which together tell you whether lab orders, results, and ADT messages are flowing without interruption.

How do you monitor GPU nodes running clinical ML models on Kubernetes?

Use the NVIDIA Data Center GPU Manager (DCGM) exporter, deployed as a DaemonSet on GPU nodes, which provides Prometheus metrics for GPU utilization, memory, temperature, power draw, and ECC errors. This matters for healthcare organizations running radiology AI, clinical decision support, or NLP inference on GPU nodes, where a memory leak or thermal issue can silently degrade model output quality before anyone notices.

How should a healthtech team set up Kubernetes monitoring for PHI workloads?

Start with Prometheus ServiceMonitors scoped to your PHI namespaces, recording rules for FHIR latency, error rate, and database pool saturation, and alerts wired to clinically meaningful thresholds like p99 latency over 2 seconds or heap above 90%. Add a Mirth sidecar exporter for integration channels and DCGM for GPU nodes.