Your HAPI FHIR server handles 100 queries per second. Your clinical applications need 10,000. The gap between these numbers is not a hardware problem -- it is a configuration, indexing, and architecture problem that most healthcare engineering teams solve incorrectly.
We have tuned HAPI FHIR deployments across health systems processing millions of FHIR resources daily. The pattern is remarkably consistent: teams deploy HAPI with default settings, hit performance walls at 200-500 QPS, and assume they need bigger servers. They do not. They need smarter configuration.
This guide walks through every optimization layer -- from PostgreSQL indexing strategies that turn 2-second queries into 12-millisecond queries, to caching architectures that eliminate 85% of database hits, to horizontal scaling patterns that distribute load across read replicas. Each section includes specific HAPI FHIR configuration properties, PostgreSQL tuning parameters, and measurable before/after benchmarks.

Layer 1: PostgreSQL Indexing for FHIR Search Parameters
The single highest-impact optimization for any HAPI FHIR deployment is proper database indexing. Out of the box, HAPI FHIR creates basic indexes on its internal tables, but these indexes are designed for correctness, not performance. For production workloads with complex search queries, custom indexes are essential.

Understanding HAPI FHIR's Storage Model
HAPI FHIR JPA stores resources in a normalized relational schema. All resource types share a common set of tables (HFJ_RESOURCE for resource metadata, HFJ_RES_VER for versioned resource bodies), and search parameters are extracted into dedicated index tables:
- HFJ_SPIDX_STRING -- String search parameters (name, address, city)
- HFJ_SPIDX_TOKEN -- Token search parameters (code, identifier, status)
- HFJ_SPIDX_DATE -- Date search parameters (birthdate, period, authored)
- HFJ_SPIDX_REFERENCE -- Reference search parameters (subject, encounter, patient)
- HFJ_SPIDX_QUANTITY -- Quantity search parameters (value-quantity)
- HFJ_SPIDX_URI -- URI search parameters (url, system)
The default indexes on these tables cover the primary key and basic lookups, but they miss the compound query patterns that clinical applications actually execute.
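To see why these index tables matter, consider the shape of the SQL that a simple Patient name search turns into. This is a simplified sketch -- the query HAPI actually generates filters on hashed lookup columns and includes additional joins -- but the join pattern is the same:

```sql
-- Illustrative shape of the SQL behind GET /fhir/Patient?name=smith
SELECT res.RES_ID
FROM HFJ_RESOURCE res
JOIN HFJ_SPIDX_STRING s ON s.RES_ID = res.RES_ID
WHERE s.RES_TYPE = 'Patient'
  AND s.SP_NAME = 'name'
  AND s.SP_VALUE_NORMALIZED LIKE 'SMITH%';
```

Every search parameter in the request adds another join against one of these index tables, so the indexes on them determine whether the whole query is an index scan or a sequential scan.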
Critical Custom Indexes
These are the indexes that deliver the largest performance improvements based on real-world FHIR query patterns:
```sql
-- Patient search by name (most common clinical query)
-- Default: sequential scan on HFJ_SPIDX_STRING (~2,400ms for 10M rows)
-- With index: index scan (~12ms)
CREATE INDEX idx_spidx_string_name_hash
ON HFJ_SPIDX_STRING (HASH_NORM_PREFIX, SP_VALUE_NORMALIZED)
WHERE RES_TYPE = 'Patient' AND SP_NAME = 'name';

-- Observation lookup by code (clinical chart review)
-- Accelerates the token (code) filter; pairs with the date and reference
-- indexes below to cover the common patient + code + date query
CREATE INDEX idx_spidx_token_obs_code
ON HFJ_SPIDX_TOKEN (RES_TYPE, HASH_SYS_AND_VALUE)
WHERE RES_TYPE = 'Observation';

-- Date range queries (encounter search, observation period)
-- BRIN index is ideal for date columns that correlate with insertion order
CREATE INDEX idx_spidx_date_range
ON HFJ_SPIDX_DATE USING BRIN (SP_VALUE_LOW, SP_VALUE_HIGH)
WHERE RES_TYPE IN ('Encounter', 'Observation', 'Condition');

-- Reference lookups (Patient compartment queries)
-- Most FHIR queries filter by patient reference
CREATE INDEX idx_spidx_ref_patient
ON HFJ_SPIDX_REFERENCE (TARGET_RESOURCE_ID, RES_TYPE)
WHERE SP_NAME = 'patient' OR SP_NAME = 'subject';
```
PostgreSQL Configuration for FHIR Workloads
Beyond custom indexes, PostgreSQL itself needs tuning for the FHIR query profile -- which is predominantly read-heavy with occasional batch writes:
```ini
# postgresql.conf optimizations for FHIR workloads

# Memory -- allocate 25% of RAM to shared_buffers
shared_buffers = 8GB                # For a 32GB server
effective_cache_size = 24GB         # 75% of total RAM
work_mem = 256MB                    # Per-query sort/hash memory
maintenance_work_mem = 2GB          # For VACUUM and CREATE INDEX

# Query planner -- favor index scans for FHIR's selective queries
random_page_cost = 1.1              # SSD storage (default 4.0 is for HDD)
effective_io_concurrency = 200      # SSD parallelism
default_statistics_target = 500     # Better cardinality estimates

# WAL -- optimize for write batches (resource creates/updates)
wal_buffers = 64MB
checkpoint_completion_target = 0.9
max_wal_size = 4GB

# Parallel queries -- leverage multi-core for large search results
max_parallel_workers_per_gather = 4
max_parallel_workers = 8
parallel_tuple_cost = 0.01
```
The random_page_cost setting alone can improve query plan selection dramatically. PostgreSQL defaults to 4.0, which assumes spinning disks and biases the planner toward sequential scans. On SSD storage (which every production FHIR server should use), setting this to 1.1 tells the planner that random I/O is nearly as fast as sequential I/O, resulting in proper index utilization.
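To confirm the planner actually switches from sequential scans to index scans after these changes, compare plans with EXPLAIN ANALYZE. A minimal sketch against the string index defined above -- the hash constant is illustrative, since HAPI computes it internally per search parameter:

```sql
EXPLAIN (ANALYZE, BUFFERS)
SELECT RES_ID
FROM HFJ_SPIDX_STRING
WHERE HASH_NORM_PREFIX = 123456789  -- illustrative hash for Patient/name
  AND SP_VALUE_NORMALIZED LIKE 'SMITH%';

-- Before tuning: Seq Scan on hfj_spidx_string (actual time in seconds)
-- After tuning:  Index Scan using idx_spidx_string_name_hash (milliseconds)
```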
Layer 2: The _include/_revinclude Query Problem
After indexing, the most common performance issue we see in HAPI FHIR deployments is unbounded _include and _revinclude queries. These FHIR search modifiers are powerful but dangerous when used without proper constraints.

The N+1 Resource Explosion
Consider a seemingly innocent query that a clinical application might execute:
```http
GET /fhir/Patient?_id=patient-123
  &_revinclude=Observation:subject
  &_revinclude=Condition:subject
  &_revinclude=MedicationRequest:subject
  &_revinclude=Encounter:subject
```
For a patient with 5 years of clinical data, this single query might return:
- 1 Patient resource
- 15,000 Observation resources (labs, vitals, assessments)
- 200 Condition resources
- 800 MedicationRequest resources
- 300 Encounter resources
That is 16,301 resources in a single HTTP response -- a JSON payload exceeding 50MB. The database query takes 45 seconds, the server allocates gigabytes of heap memory to serialize the bundle, and the client application chokes trying to parse the response.
The Fix: Bounded Queries with Pagination
```http
# Instead of unbounded _revinclude, use targeted queries with _count

# Step 1: Get the patient
GET /fhir/Patient/patient-123

# Step 2: Get recent observations with pagination
GET /fhir/Observation?subject=patient-123
  &_sort=-date
  &_count=50
  &_elements=code,value,effectiveDateTime,status

# Step 3: Get active conditions only
GET /fhir/Condition?subject=patient-123
  &clinical-status=active
  &_count=50

# Step 4: Get recent encounters
GET /fhir/Encounter?subject=patient-123
  &_sort=-date
  &_count=20
```
Each of these queries returns in under 50ms with predictable memory usage. The total wall-clock time for all four queries (executed in parallel) is under 100ms -- compared to 45 seconds for the single unbounded query.
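What "executed in parallel" can look like from a Java client is sketched below, using the HAPI FHIR client API with CompletableFuture. The base URL and patient ID are placeholders, and error handling is omitted:

```java
import java.util.concurrent.CompletableFuture;

import ca.uhn.fhir.context.FhirContext;
import ca.uhn.fhir.rest.client.api.IGenericClient;
import org.hl7.fhir.r4.model.Bundle;

public class BoundedPatientFetch {
    public static void main(String[] args) {
        FhirContext ctx = FhirContext.forR4();
        IGenericClient client = ctx.newRestfulGenericClient("http://localhost:8080/fhir");

        // Fire the bounded searches concurrently; each is a small, indexed query
        CompletableFuture<Bundle> observations = CompletableFuture.supplyAsync(() ->
                client.search()
                      .byUrl("Observation?subject=patient-123&_sort=-date&_count=50")
                      .returnBundle(Bundle.class).execute());
        CompletableFuture<Bundle> conditions = CompletableFuture.supplyAsync(() ->
                client.search()
                      .byUrl("Condition?subject=patient-123&clinical-status=active&_count=50")
                      .returnBundle(Bundle.class).execute());
        CompletableFuture<Bundle> encounters = CompletableFuture.supplyAsync(() ->
                client.search()
                      .byUrl("Encounter?subject=patient-123&_sort=-date&_count=20")
                      .returnBundle(Bundle.class).execute());

        // Wall-clock time is bounded by the slowest query, not the sum
        CompletableFuture.allOf(observations, conditions, encounters).join();
        System.out.println("Observations returned: " + observations.join().getEntry().size());
    }
}
```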
HAPI FHIR Configuration for _include Safety
```yaml
# application.yaml -- HAPI FHIR server configuration
hapi:
  fhir:
    # Limit maximum resources per page
    default_page_size: 20
    max_page_size: 200
    # Cap the number of _include results loaded per page
    max_includes_per_page: 100
    # Avoid COUNT(*) on large tables
    search_total_mode: ESTIMATED
    # Expire search results after 1 hour
    search_result_cache_duration_in_minutes: 60
    # Defer search-parameter indexing for large CodeSystems
    defer_indexing_for_codesystems_of_size: 100
```
Layer 3: Connection Pooling with HikariCP
HAPI FHIR uses HikariCP as its default connection pool. The default configuration works for development but creates bottlenecks in production. The most common symptom is intermittent ConnectionTimeoutException under load, where the pool is exhausted and incoming requests queue up waiting for a database connection.

Optimal Pool Sizing
The HikariCP team has a well-known formula for connection pool sizing that counterintuitively argues for smaller pools:

```
optimal_pool_size = (core_count * 2) + effective_spindle_count
```

For a 4-core server with SSD storage (effective spindle count = 0):

```
optimal_pool_size = (4 * 2) + 0 = 8
```

This seems low, but PostgreSQL performs best with a limited number of connections. Each connection consumes approximately 10MB of RAM for work_mem and associated buffers. With 50 connections, that is 500MB dedicated to connection overhead alone -- memory that would be better used for shared_buffers and OS page cache.
Production HikariCP Configuration
```yaml
# application.yaml -- HikariCP settings for HAPI FHIR
spring:
  datasource:
    hikari:
      # Pool sizing
      maximum-pool-size: 20             # 4 cores * 2 + headroom
      minimum-idle: 5                   # Keep 5 warm connections
      # Timeouts
      connection-timeout: 30000         # 30s to acquire connection
      idle-timeout: 600000              # 10min idle before eviction
      max-lifetime: 1800000             # 30min max connection age
      # Leak detection
      leak-detection-threshold: 60000   # Log if connection held > 60s
      # Validation
      validation-timeout: 5000
      connection-test-query: SELECT 1
      # Metrics
      register-mbeans: true             # Expose JMX metrics
      pool-name: HapiFhirPool           # Named pool for monitoring
```
Monitoring Pool Health
The three metrics that matter for connection pool health:
| Metric | Healthy Range | Alert Threshold | What It Means |
|---|---|---|---|
| Pool Wait Time (p95) | < 20ms | > 100ms | Time threads wait for a connection |
| Active Connections | < 80% of max | > 90% of max | Connections currently in use |
| Connection Creation Rate | < 1/min | > 10/min | New connections being created (churn) |
If your pool wait time p95 exceeds 100ms, the solution is usually not to increase the pool size. Instead, investigate slow queries that are holding connections longer than necessary. A query that takes 5 seconds holds a connection for 5 seconds -- fixing the query is 100x more effective than adding more connections.
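With register-mbeans enabled as above, these numbers are also reachable in code through HikariCP's pool MXBean. A minimal monitoring sketch -- the warning threshold mirrors the table above and is illustrative:

```java
import com.zaxxer.hikari.HikariDataSource;
import com.zaxxer.hikari.HikariPoolMXBean;

public final class PoolHealth {

    /** Logs the three pool metrics discussed above. */
    public static void report(HikariDataSource dataSource) {
        HikariPoolMXBean pool = dataSource.getHikariPoolMXBean();
        int active = pool.getActiveConnections();
        int total = pool.getTotalConnections();
        int waiting = pool.getThreadsAwaitingConnection();

        System.out.printf("active=%d/%d idle=%d waiting=%d%n",
                active, total, pool.getIdleConnections(), waiting);

        // >90% utilization or queued threads maps to the alert thresholds above
        if (active > 0.9 * dataSource.getMaximumPoolSize() || waiting > 0) {
            System.err.println("WARN: connection pool under pressure");
        }
    }
}
```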
Layer 4: Caching Strategy
Caching is where FHIR server performance tuning delivers its most dramatic improvements. A properly configured cache eliminates 85% of database queries for typical clinical workloads, where the same patient data is accessed repeatedly by multiple applications within a short time window.

Three-Tier Cache Architecture
L1: In-Process Resource Cache (Caffeine)
HAPI FHIR supports an in-process cache for frequently accessed resources. This is the fastest layer -- sub-millisecond response times with no network overhead:
```yaml
# application.yaml -- L1 in-process caching
hapi:
  fhir:
    # Resource cache
    resource_cache_enabled: true
    resource_cache_max_entries: 10000
    resource_cache_expire_after_write_seconds: 300  # 5 min TTL
    # Terminology cache (CodeSystem, ValueSet lookups)
    terminology_cache_enabled: true
    terminology_cache_max_entries: 5000
```
The terminology cache is particularly impactful. ValueSet expansion and CodeSystem lookup are expensive operations that most applications repeat frequently (e.g., validating observation codes against LOINC). Caching these eliminates repeated database queries for static reference data.
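This layer is built on Caffeine, so the semantics configured above map directly onto the Caffeine API. A standalone sketch of the same policy (10,000 entries, 5-minute TTL) for application-level caching -- the loader method is a placeholder:

```java
import java.time.Duration;

import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;
import org.hl7.fhir.r4.model.Patient;

public class L1CacheSketch {

    // Mirrors resource_cache_max_entries / expire_after_write_seconds above
    private final Cache<String, Patient> patientCache = Caffeine.newBuilder()
            .maximumSize(10_000)
            .expireAfterWrite(Duration.ofMinutes(5))
            .build();

    public Patient getPatient(String id) {
        // Hits return in microseconds; misses fall through to the loader
        return patientCache.get(id, this::loadFromStore);
    }

    private Patient loadFromStore(String id) {
        return new Patient(); // placeholder for a database or FHIR client read
    }
}
```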
L2: Distributed Search Cache (Redis)
For multi-node deployments, a distributed cache ensures that search results computed on one node are available to all nodes:
```yaml
# Redis configuration for FHIR search caching
# docker-compose.yml
services:
  redis:
    image: redis:7-alpine
    command: >
      redis-server
      --maxmemory 2gb
      --maxmemory-policy allkeys-lru
      --save ""
      --appendonly no
    ports:
      - "6379:6379"
```

```yaml
# Spring Boot Redis cache configuration
spring:
  cache:
    type: redis
    redis:
      time-to-live: 900000      # 15 minute TTL
      cache-null-values: false
  redis:
    host: redis
    port: 6379
    timeout: 3000
```
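With spring.cache.type set to redis as above, any Spring service method can publish its results to the shared cache via @Cacheable. A hedged sketch -- the service and cache-region names are illustrative, and cached FHIR resources need a serializer configured for the Redis cache, since Bundle is not serializable out of the box:

```java
import org.hl7.fhir.r4.model.Bundle;
import org.springframework.cache.annotation.Cacheable;
import org.springframework.stereotype.Service;

@Service
public class PatientSearchService {

    // Results land in Redis under the "fhir-search" region, so a search
    // computed on one HAPI node is reusable by every other node
    @Cacheable(cacheNames = "fhir-search", key = "#name + ':' + #count")
    public Bundle searchPatientsByName(String name, int count) {
        return executeSearch(name, count); // placeholder for the real query
    }

    private Bundle executeSearch(String name, int count) {
        return new Bundle(); // illustrative stub
    }
}
```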
L3: Database Query Optimization
Even with L1 and L2 caching, database queries for cache misses must be fast. This is where the PostgreSQL indexing from Layer 1 pays off. The combination of proper indexes and query plan optimization ensures that cache misses are served in 15-25ms rather than 200-500ms.
Cache Invalidation Strategy
Cache invalidation in healthcare is particularly sensitive. Stale data in a clinical context can have patient safety implications. The recommended approach:
- Resource cache: Short TTL (5 minutes) with write-through invalidation. When a resource is updated, immediately evict it from cache (see the interceptor sketch after this list).
- Search cache: Medium TTL (15 minutes) with eventual consistency. Search results can tolerate slight staleness for most use cases.
- Terminology cache: Long TTL (24 hours) or until JVM restart. CodeSystem and ValueSet data changes infrequently (quarterly at most for LOINC/SNOMED updates).
- CapabilityStatement cache: Cache indefinitely until server restart. This resource describes the server's capabilities and only changes on deployment.
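For the write-through invalidation on resource updates, HAPI's interceptor framework exposes a storage pointcut that fires before the updating transaction commits. A minimal sketch, assuming a Caffeine L1 cache keyed by versionless resource ID (register the interceptor with the server's interceptor registry at startup):

```java
import ca.uhn.fhir.interceptor.api.Hook;
import ca.uhn.fhir.interceptor.api.Interceptor;
import ca.uhn.fhir.interceptor.api.Pointcut;
import com.github.benmanes.caffeine.cache.Cache;
import org.hl7.fhir.instance.model.api.IBaseResource;

@Interceptor
public class CacheEvictionInterceptor {

    private final Cache<String, IBaseResource> resourceCache;

    public CacheEvictionInterceptor(Cache<String, IBaseResource> resourceCache) {
        this.resourceCache = resourceCache;
    }

    // Fires just before the transaction that updates the resource commits,
    // so the cache never serves the stale pre-update version
    @Hook(Pointcut.STORAGE_PRECOMMIT_RESOURCE_UPDATED)
    public void onResourceUpdated(IBaseResource oldResource, IBaseResource newResource) {
        String key = newResource.getIdElement().toUnqualifiedVersionless().getValue();
        resourceCache.invalidate(key);
    }
}
```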
Layer 5: HAPI FHIR Server Configuration Tuning
Beyond the database and cache layers, HAPI FHIR itself has configuration properties that significantly affect performance. These are often overlooked because they require understanding HAPI's internal architecture.
Inline Resource Storage
By default, HAPI FHIR stores the full JSON/XML representation of each resource in the HFJ_RES_VER table alongside the extracted search index entries. This means every resource read requires joining the resource version table. Enabling inline resource storage keeps the resource body in the same row as the current version, eliminating a JOIN:
```yaml
hapi:
  fhir:
    # Store resource body inline with current version
    inline_resource_storage_enabled: true
    # Use JSONB column for PostgreSQL (enables native JSON queries)
    resource_encoding: JSONB
```
This optimization reduces resource read latency by 20-30% for single resource fetches (GET /fhir/Patient/123).
Partition Mode
For multi-tenant FHIR servers or servers with distinct data domains, HAPI FHIR's partition mode can dramatically improve query performance by physically separating data:
```yaml
hapi:
  fhir:
    partitioning:
      enabled: true
      partition_mode: PATIENT  # Partition by patient
      # Or use REQUEST_TENANT for multi-tenant deployments
      # partition_mode: REQUEST_TENANT
      # Cross-partition reference resolution
      cross_partition_reference_enabled: true
      # Default partition for shared resources (CodeSystem, ValueSet)
      default_partition_id: 0
```
Patient-based partitioning ensures that all resources for a given patient are co-located on the same database partition. When a clinical application queries for a patient's data, all the relevant indexes and data pages are in the same partition -- eliminating cross-partition scatter reads.
Search Result Caching
```yaml
hapi:
  fhir:
    # Search result caching
    search_result_caching_enabled: true
    search_cache_duration_in_minutes: 60
    # Deferred search count (avoid COUNT(*) on initial search)
    search_total_mode: ESTIMATED
    # Prefetch thresholds (number of results to load ahead)
    search_prefetch_thresholds:
      - 50    # Load 50 results immediately
      - 200   # Load up to 200 on first page request
      - -1    # Load remaining on subsequent page requests
    # Maximum search result cache entries
    search_cache_max_results: 50000
```
The search_total_mode: ESTIMATED setting is one of the highest-impact single-line changes you can make. By default, HAPI FHIR executes a COUNT(*) query to determine the total number of matching resources. On tables with millions of rows, this COUNT can take 5-10 seconds -- longer than the actual search. Setting it to ESTIMATED uses PostgreSQL's table statistics for an approximate count, returning instantly.
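Clients that genuinely need an exact count can still request one per query with the standard FHIR _total parameter, so the ESTIMATED default costs nothing in capability:

```http
# Fast, approximate total from table statistics (server default above)
GET /fhir/Observation?subject=patient-123&_count=50

# Exact count on demand -- pays the COUNT(*) cost for this request only
GET /fhir/Observation?subject=patient-123&_count=50&_total=accurate
```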
Layer 6: Horizontal Scaling with Read Replicas
When single-server optimization reaches its limits (typically around 3,000-5,000 QPS on well-tuned hardware), horizontal scaling with read replicas is the path to 10,000+ QPS.

Architecture Pattern
FHIR workloads are overwhelmingly read-heavy. In a typical clinical deployment, 90-95% of operations are reads (search, read, vread) and only 5-10% are writes (create, update). This makes read replicas extremely effective:
- Primary database: Handles all write operations (resource create, update, delete)
- Read replicas (2-4): Handle all search and read operations
- Load balancer: Routes requests based on HTTP method (GET to replicas, POST/PUT/DELETE to primary)
PostgreSQL Streaming Replication Setup
```ini
# Primary server: postgresql.conf
wal_level = replica
max_wal_senders = 10
wal_keep_size = 1GB
synchronous_commit = on                  # For clinical data integrity
synchronous_standby_names = 'replica1'   # At least one sync replica

# Replica server (PostgreSQL 12+): postgresql.conf plus an empty standby.signal file
primary_conninfo = 'host=primary-db port=5432 user=replicator password=xxx'
hot_standby = on
hot_standby_feedback = on                # Prevent query cancellation on replica
```
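Once the replica is attached, verify that streaming is healthy and measure replication lag (both views are standard PostgreSQL):

```sql
-- On the primary: one row per connected replica, with sync state
SELECT application_name, state, sync_state,
       pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_lag_bytes
FROM pg_stat_replication;

-- On the replica: how far behind the last replayed transaction is
SELECT now() - pg_last_xact_replay_timestamp() AS replication_lag;
```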
Spring Boot Multi-DataSource Configuration
```yaml
# application.yaml -- Route reads to replicas
spring:
  datasource:
    primary:
      url: jdbc:postgresql://primary-db:5432/hapi
      username: hapi
      password: ${DB_PASSWORD}
      hikari:
        maximum-pool-size: 10   # Fewer connections for writes
    replica:
      url: jdbc:postgresql://replica-lb:5432/hapi
      username: hapi_readonly
      password: ${DB_READONLY_PASSWORD}
      hikari:
        maximum-pool-size: 30   # More connections for reads
        read-only: true
```
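Spring Boot does not split traffic across two datasources by itself. A common pattern is a routing datasource keyed on whether the current transaction is read-only -- a hedged sketch, with the bean wiring for the two HikariCP pools assumed elsewhere:

```java
import java.util.HashMap;
import java.util.Map;
import javax.sql.DataSource;

import org.springframework.jdbc.datasource.lookup.AbstractRoutingDataSource;
import org.springframework.transaction.support.TransactionSynchronizationManager;

public class ReadWriteRoutingDataSource extends AbstractRoutingDataSource {

    @Override
    protected Object determineCurrentLookupKey() {
        // Read-only transactions (search, read, vread) go to the replica pool;
        // everything else goes to the primary
        return TransactionSynchronizationManager.isCurrentTransactionReadOnly()
                ? "replica" : "primary";
    }

    /** Wires the two pools defined in application.yaml above. */
    public static ReadWriteRoutingDataSource of(DataSource primary, DataSource replica) {
        ReadWriteRoutingDataSource routing = new ReadWriteRoutingDataSource();
        Map<Object, Object> targets = new HashMap<>();
        targets.put("primary", primary);
        targets.put("replica", replica);
        routing.setTargetDataSources(targets);
        routing.setDefaultTargetDataSource(primary);
        routing.afterPropertiesSet();
        return routing;
    }
}
```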
Load Balancer Configuration (HAProxy)
```
# haproxy.cfg -- Route FHIR operations by HTTP method
frontend fhir_frontend
    bind *:443 ssl crt /etc/ssl/fhir.pem
    # Route writes to primary HAPI nodes
    acl is_write method POST PUT DELETE PATCH
    use_backend hapi_write if is_write
    # Route reads to read-optimized HAPI nodes
    default_backend hapi_read

backend hapi_write
    balance roundrobin
    option httpchk GET /fhir/metadata
    server hapi-w1 10.0.1.10:8080 check
    server hapi-w2 10.0.1.11:8080 check backup

backend hapi_read
    balance leastconn
    option httpchk GET /fhir/metadata
    server hapi-r1 10.0.2.10:8080 check
    server hapi-r2 10.0.2.11:8080 check
    server hapi-r3 10.0.2.12:8080 check
```
Layer 7: Load Testing and Benchmarking
You cannot improve what you do not measure. Before and after each optimization, run structured load tests to quantify the impact. Two tools cover FHIR-specific load testing well: k6 for ramp-and-peak profiles, and Gatling for sustained stability runs.

k6 Load Testing Script for FHIR
```javascript
// k6-fhir-load-test.js
import http from 'k6/http';
import { check, sleep } from 'k6';
import { Rate, Trend } from 'k6/metrics';

const errorRate = new Rate('errors');
const searchLatency = new Trend('fhir_search_latency');
const readLatency = new Trend('fhir_read_latency');

export const options = {
  stages: [
    { duration: '2m', target: 50 },    // Ramp to 50 VUs
    { duration: '5m', target: 200 },   // Ramp to 200 VUs
    { duration: '10m', target: 500 },  // Sustained 500 VUs
    { duration: '5m', target: 1000 },  // Peak at 1000 VUs
    { duration: '2m', target: 0 },     // Ramp down
  ],
  thresholds: {
    'fhir_search_latency': ['p(95)<200', 'p(99)<500'],
    'fhir_read_latency': ['p(95)<50', 'p(99)<100'],
    'errors': ['rate<0.01'],
  },
};

const BASE_URL = __ENV.FHIR_URL || 'http://localhost:8080/fhir';
const PATIENT_IDS = ['patient-001', 'patient-002', 'patient-003'];

export default function () {
  const patientId = PATIENT_IDS[Math.floor(Math.random() * PATIENT_IDS.length)];
  // Draw once so the branches form a true 30/40/30 split
  const r = Math.random();

  // Scenario 1: Patient read (30% of traffic)
  if (r < 0.3) {
    const start = Date.now();
    const res = http.get(BASE_URL + '/Patient/' + patientId);
    readLatency.add(Date.now() - start);
    check(res, { 'patient read 200': (resp) => resp.status === 200 });
    errorRate.add(res.status !== 200);
  }
  // Scenario 2: Observation search (40% of traffic)
  else if (r < 0.7) {
    const start = Date.now();
    const res = http.get(BASE_URL + '/Observation?subject=' + patientId
      + '&_sort=-date&_count=50');
    searchLatency.add(Date.now() - start);
    check(res, { 'obs search 200': (resp) => resp.status === 200 });
    errorRate.add(res.status !== 200);
  }
  // Scenario 3: Patient search by name (30% of traffic)
  else {
    const start = Date.now();
    const res = http.get(BASE_URL + '/Patient?name=Smith&_count=20');
    searchLatency.add(Date.now() - start);
    check(res, { 'patient search 200': (resp) => resp.status === 200 });
    errorRate.add(res.status !== 200);
  }

  sleep(0.1);
}
```
Gatling Simulation for Sustained Load
For long-running stability tests (30-60 minutes), Gatling provides better memory efficiency and more detailed reporting than k6:
```scala
// FhirLoadSimulation.scala
import io.gatling.core.Predef._
import io.gatling.http.Predef._
import scala.concurrent.duration._

class FhirLoadSimulation extends Simulation {

  val httpProtocol = http
    .baseUrl("http://localhost:8080/fhir")
    .acceptHeader("application/fhir+json")
    .contentTypeHeader("application/fhir+json")

  val patientSearch = scenario("Patient Search")
    .exec(http("Search by name")
      .get("/Patient?name=Smith&_count=20")
      .check(status.is(200))
      .check(jsonPath("$.total").saveAs("total")))

  val observationSearch = scenario("Observation Search")
    .exec(http("Search by patient")
      .get("/Observation?subject=patient-001&_sort=-date&_count=50")
      .check(status.is(200)))

  setUp(
    patientSearch.inject(
      rampUsersPerSec(10).to(500).during(5.minutes),
      constantUsersPerSec(500).during(20.minutes)
    ),
    observationSearch.inject(
      rampUsersPerSec(10).to(300).during(5.minutes),
      constantUsersPerSec(300).during(20.minutes)
    )
  ).protocols(httpProtocol)
    .assertions(
      global.responseTime.percentile3.lt(500),
      global.successfulRequests.percent.gt(99)
    )
}
```
Benchmarking Methodology
Follow this structured approach for each optimization round:
- Baseline: Run the k6 test against the unoptimized server. Record p50, p95, p99 latencies and maximum throughput (QPS at <1% error rate).
- Apply one change: Make a single optimization (e.g., add a custom index, change a pool size, enable a cache).
- Re-test: Run the identical k6 test. Compare metrics.
- Document: Record the change, the before/after metrics, and any side effects.
- Repeat: Apply the next optimization layer.
This one-change-at-a-time approach ensures you understand exactly which optimization produced which improvement. Applying multiple changes simultaneously makes it impossible to attribute gains and can mask regressions.
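A typical invocation of the k6 script above for one round of this methodology -- the staging URL is a placeholder, and the exported summary gives you the before/after numbers to record in step 4:

```bash
# Point the script at the server under test and save a summary per run
k6 run --env FHIR_URL=https://fhir-staging.example.org/fhir \
       --summary-export=baseline-run.json \
       k6-fhir-load-test.js
```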
Putting It All Together: The Performance Tuning Checklist
Here is the complete optimization checklist in priority order, with expected impact at each stage:
| Stage | Optimization | Expected QPS Improvement | Effort |
|---|---|---|---|
| 1 | Custom PostgreSQL indexes for FHIR search params | 100 → 800 QPS | 4 hours |
| 2 | PostgreSQL tuning (shared_buffers, random_page_cost) | 800 → 1,500 QPS | 2 hours |
| 3 | Bound _include/_revinclude queries at application level | 1,500 → 2,000 QPS | 8 hours |
| 4 | HikariCP connection pool optimization | 2,000 → 2,500 QPS | 1 hour |
| 5 | Enable resource + terminology caching | 2,500 → 5,000 QPS | 2 hours |
| 6 | search_total_mode: ESTIMATED | 5,000 → 6,000 QPS | 5 minutes |
| 7 | Inline resource storage + JSONB encoding | 6,000 → 7,000 QPS | 1 hour |
| 8 | Read replicas (3x) with HAProxy routing | 7,000 → 10,000+ QPS | 1-2 days |
The total effort to go from 100 to 10,000 QPS is approximately 3-4 days of focused engineering work. The cost is minimal -- mostly configuration changes and a few custom indexes. The alternative, vertical scaling with larger servers, would cost 10-20x more in infrastructure and still hit a ceiling around 3,000-5,000 QPS.
FAQ
How do I know if my FHIR server has a performance problem?
The clearest indicators are: p95 response time exceeding 500ms for search queries, p99 exceeding 2 seconds for any operation, or throughput plateauing below your expected clinical user load. A health system with 500 concurrent clinicians typically needs 2,000-5,000 QPS to maintain responsive clinical applications. If your CapabilityStatement endpoint takes more than 50ms, your server is under-optimized.
Should I use HAPI FHIR's built-in Elasticsearch integration for search?
Only if you have specific requirements for full-text clinical narrative search or complex aggregation queries. For standard FHIR search parameters (token, string, date, reference), properly indexed PostgreSQL outperforms Elasticsearch because it avoids the synchronization overhead between the relational store and the search index. Elasticsearch adds operational complexity (cluster management, index mapping maintenance) that is only justified for advanced search scenarios.
What is the impact of enabling resource validation on performance?
Full FHIR profile validation on every write operation adds 50-200ms per resource create/update, depending on profile complexity. For production write-heavy workloads, validate at the API gateway layer and disable per-resource validation in HAPI FHIR. For read-heavy workloads (the common case), write validation has minimal impact on overall throughput since writes are a small percentage of total operations.
How do I handle the trade-off between search_total_mode ACCURATE and ESTIMATED?
Use ESTIMATED as the default and expose an optional _total=accurate parameter for specific API consumers that genuinely need exact counts (e.g., reporting dashboards). Most clinical applications do not display total result counts -- they display paginated lists. For the rare case where an exact count is needed, the client can explicitly request it, accepting the performance cost.
What PostgreSQL version should I use for HAPI FHIR?
PostgreSQL 15 or 16. Version 15 introduced significant improvements to BRIN index performance and sort operations that directly benefit FHIR date-range queries. Version 16 added parallel query improvements for complex joins. Avoid PostgreSQL 13 or earlier -- the query planner improvements in 14+ are substantial for the FHIR query profile.
From architecture to production, our Healthcare Software Product Development team builds healthcare platforms that perform at scale. We also offer specialized Healthcare Interoperability Solutions services. Talk to our team to get started.


