A large health system's data science team was spending 80% of their time cleaning data and 20% building models. Their sepsis prediction model had an AUROC of 0.72 — mediocre for a problem where every percentage point translates to lives saved. The bottleneck was not the model. It was the data: raw HL7v2 messages from six source systems, inconsistent patient identifiers, lab results carrying local codes instead of LOINC, and duplicate records that inflated training sets with near-identical examples.
We implemented a medallion architecture — Bronze, Silver, Gold layers — that transformed their raw clinical data into ML-ready datasets. After migration, the same sepsis model retrained on clean Gold-layer data achieved an AUROC of 0.91. The data scientists now spend 20% of their time on data prep and 80% on modeling. This case study covers the architecture, FHIR-native implementation, and the impact on AI model performance.
Why Medallion Architecture for Healthcare Data
The medallion architecture (popularized by Databricks) organizes data into three progressive quality layers. For healthcare, each layer addresses specific data challenges:
Bronze Layer: Raw Data Preservation
Every message, every file, every API response lands in Bronze exactly as received. HL7v2 ADT messages with their pipe-delimited segments. FHIR Bundles from API calls. X12 835/837 claims files. CSV exports from legacy systems. Nothing is transformed, nothing is dropped. The Bronze layer is your audit trail and your ability to reprocess data when Silver-layer logic changes.
Implementation: Delta Lake tables partitioned by source system and date. Each record includes the raw payload, source system identifier, ingestion timestamp, and a unique message ID. Storage cost is low ($23/TB/month on S3) — store everything.
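As an illustrative sketch of that record envelope (the field names and the HL7 message below are invented for illustration, not the production schema), each Bronze record wraps the payload byte-for-byte and adds only provenance metadata:

```python
import hashlib
from datetime import datetime, timezone

def to_bronze_record(raw_payload: str, source_system: str) -> dict:
    """Wrap a raw message in a Bronze envelope without altering the payload."""
    return {
        # Content-derived ID doubles as a cheap exact-duplicate check
        "message_id": hashlib.sha256(raw_payload.encode("utf-8")).hexdigest(),
        "source_system": source_system,
        "ingested_at": datetime.now(timezone.utc).isoformat(),
        "raw_payload": raw_payload,  # stored exactly as received
    }

# A made-up HL7v2 ADT message header lands untouched in Bronze
hl7_msg = "MSH|^~\\&|EHR_A|HOSP_A|HUB|HOSP_A|20240115083000||ADT^A01|MSG0001|P|2.5"
record = to_bronze_record(hl7_msg, source_system="hosp_a_adt")
```

In the actual pipeline the same envelope is written to a Delta Lake table partitioned by `source_system` and ingestion date, which is what makes later reprocessing cheap.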
Silver Layer: Standardized and Linked
The Silver layer is where the real engineering happens. Raw data from multiple sources is: deduplicated (same patient from 3 systems becomes one record), standardized (local lab codes mapped to LOINC, SNOMED CT, RxNorm), linked (patient matching across systems using probabilistic MPI), and structured into FHIR R4 resources.
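To make the standardization step concrete, here is a minimal sketch of a local-to-LOINC crosswalk. The local codes and the mapping table are invented for illustration; in a real pipeline the mappings come from a curated terminology service, and LOINC codes should be verified against the official LOINC release:

```python
from typing import Optional

# Hypothetical crosswalk: (source system, local code) -> LOINC
LOCAL_TO_LOINC = {
    ("hosp_a", "CREAT"):  "2160-0",  # Creatinine [Mass/volume] in Serum or Plasma
    ("hosp_b", "LAB_CR"): "2160-0",  # same analyte, different local code
    ("hosp_a", "WBC"):    "6690-2",  # Leukocytes [#/volume] in Blood
}

def standardize_lab_code(source: str, local_code: str) -> Optional[str]:
    """Map a source-local lab code to LOINC; None flags the code for curation."""
    return LOCAL_TO_LOINC.get((source, local_code.upper()))
```

Two different local codes resolving to the same LOINC code is exactly what lets the Silver layer treat a creatinine from hospital A and hospital B as the same feature downstream.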
The FHIR-native approach is critical for healthcare. Instead of inventing a custom data model, we normalize everything into FHIR Patient, Observation, Condition, MedicationRequest, and Encounter resources. This means downstream consumers (ML models, analytics dashboards, research queries) all speak the same language.
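A sketch of what that normalization produces for a single lab result, using a hypothetical helper that builds a minimal FHIR R4 Observation as a plain dict (a production pipeline would validate the output against FHIR R4 profiles rather than hand-assemble resources):

```python
def to_fhir_observation(patient_id, loinc_code, display, value, unit, effective):
    """Shape one standardized lab result as a minimal FHIR R4 Observation."""
    return {
        "resourceType": "Observation",
        "status": "final",
        "code": {
            "coding": [{
                "system": "http://loinc.org",
                "code": loinc_code,
                "display": display,
            }]
        },
        "subject": {"reference": f"Patient/{patient_id}"},  # MPI-resolved ID
        "effectiveDateTime": effective,
        "valueQuantity": {
            "value": value,
            "unit": unit,
            "system": "http://unitsofmeasure.org",  # UCUM units
            "code": unit,
        },
    }

obs = to_fhir_observation(
    "mpi-12345", "2160-0", "Creatinine", 1.3, "mg/dL", "2024-01-15T08:30:00Z"
)
```

Because every source system's lab results end up in this one shape, a Gold-layer feature query never needs to know which EHR a value came from.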
Gold Layer: Analytics and ML-Ready
Gold layer tables are purpose-built for specific use cases. A sepsis prediction feature table includes: vital sign trends (6-hour rolling averages), lab result trajectories (creatinine, lactate, WBC), medication administration history, and demographic risk factors — all pre-computed, pre-joined, and ready for model training with no additional data preparation required.
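The vital-sign trend part can be sketched with pandas; the column name, sample values, and timestamps below are illustrative, not the real feature table:

```python
import pandas as pd

# Hypothetical heart-rate stream for one encounter, already standardized
# by the Silver layer; timestamps are made up for the example.
vitals = pd.DataFrame(
    {"heart_rate": [88, 95, 102, 110, 118]},
    index=pd.to_datetime([
        "2024-01-15 00:00", "2024-01-15 02:00", "2024-01-15 04:00",
        "2024-01-15 06:00", "2024-01-15 08:00",
    ]),
)

# 6-hour rolling average, computed once in Gold instead of per-model
vitals["hr_6h_avg"] = vitals["heart_rate"].rolling("6h").mean()
```

Pre-computing these windows once, at write time, is the design choice that moves data-prep cost out of every individual training run.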
Impact on AI Model Performance
| Metric | Before (Raw Data) | After (Gold Layer) |
|---|---|---|
| Sepsis prediction AUROC | 0.72 | 0.91 |
| AKI prediction AUROC | 0.68 | 0.87 |
| Readmission risk AUROC | 0.71 | 0.82 |
| Data scientist time on data prep | 80% | 20% |
| Time to train new model | 6-8 weeks | 1-2 weeks |
| Duplicate patient records | 23% | 0.3% |
| Code standardization (LOINC/SNOMED) | 41% | 98% |
The AUROC improvement from 0.72 to 0.91 for sepsis prediction is not a model improvement — it is a data quality improvement. The same XGBoost model architecture, the same hyperparameters, the same feature set. The only change was feeding it clean, deduplicated, standardized data instead of raw, messy, inconsistent data.
At Nirmitee, we build healthcare data infrastructure with FHIR-native data pipelines and HL7-to-FHIR migration. If you are building ML-ready clinical data infrastructure, talk to our team.