Migrating from a Custom EHR Schema to openEHR: The Database Migration Nobody Documents

April 13, 2026

14 min read

openEHRHealthcareMigration

The Migration Nobody Warns You About

You have built a custom EHR system. Maybe it started as a handful of PostgreSQL tables tracking patients, encounters, and vitals. Over the years it grew into a sprawling schema with hundreds of tables, custom fields clinicians demanded, and business logic baked into stored procedures. Now you need to migrate to openEHR.

Perhaps your organization needs to participate in a national health data exchange that mandates openEHR. Perhaps you are hitting the ceiling of your custom schema's ability to represent evolving clinical models. Perhaps you have realized that maintaining your own clinical data model is a full-time job that has nothing to do with your core product.

Whatever the reason, this migration is one of the most complex technical projects in health IT. It is not a simple ETL job. You are fundamentally changing how clinical data is modeled, stored, queried, and validated. This guide covers what actually happens when you do it, based on real migrations we have supported at Nirmitee.

Custom EHR to openEHR Migration Overview - 5 stage pipeline from Schema Analysis through Validation and Cutover spanning 12-19 weeks

Phase 1: Schema Analysis and Archetype Mapping

Auditing Your Existing Schema

Before touching any openEHR tooling, you need a complete inventory of your current data model. This is not just a list of tables. You need to understand:

Active vs. dead tables — In any system older than three years, at least 15-20% of tables are no longer written to. Migrating dead data wastes months.
Data volumes per table — A table with 50 million rows of vital signs requires a different migration strategy than a lookup table with 200 rows.
Foreign key relationships — Custom schemas often have implicit relationships not enforced by constraints. Interview the developers who built them.
Business logic in the database — Stored procedures, triggers, views, and materialized views that transform data on read. This logic must be replicated or replaced.
Custom fields and extensions — The columns clinicians asked for that do not map to any standard. These are the hardest part of the migration.

Run a comprehensive schema audit. Here is a SQL query that generates a useful starting inventory on PostgreSQL:

-- Schema audit: table sizes, row counts, and column inventory
SELECT 
    t.table_schema,
    t.table_name,
    pg_size_pretty(pg_total_relation_size(quote_ident(t.table_schema) || '.' || quote_ident(t.table_name))) as total_size,
    (SELECT count(*) FROM information_schema.columns c WHERE c.table_name = t.table_name AND c.table_schema = t.table_schema) as column_count,
    pg_stat_get_live_tuples(quote_ident(t.table_schema) || '.' || quote_ident(t.table_name))::regclass) as estimated_rows
FROM information_schema.tables t
WHERE t.table_schema NOT IN ('pg_catalog', 'information_schema')
ORDER BY pg_total_relation_size(quote_ident(t.table_schema) || '.' || quote_ident(t.table_name)) DESC;

Mapping Tables to Archetypes

This is the core intellectual work of the migration. Every table or group of related tables in your custom schema must be mapped to one or more openEHR archetypes. An archetype is a formal, version-controlled clinical data model that defines the structure, constraints, and terminology bindings for a specific clinical concept.

Relational Schema to openEHR Archetype Mapping showing SQL tables mapped to openEHR archetypes with direct and partial mapping arrows

The openEHR Clinical Knowledge Manager (CKM) hosts the international archetype library. For most common clinical concepts, an archetype already exists. Here are the typical mappings:

Custom EHR Table	openEHR Archetype	Mapping Complexity
patients / demographics	openEHR-EHR-ADMIN_ENTRY.demographic	Low — direct field mapping
vitals / observations	openEHR-EHR-OBSERVATION.blood_pressure, .body_temperature, .pulse	Medium — one table splits into multiple archetypes
lab_results	openEHR-EHR-OBSERVATION.laboratory_test_result	Medium — needs CLUSTER.laboratory_test_analyte for each analyte
allergies	openEHR-EHR-EVALUATION.adverse_reaction_risk	Low — well-defined archetype
medications	openEHR-EHR-INSTRUCTION.medication_order + ACTION.medication	High — order vs. administration split
diagnoses / problems	openEHR-EHR-EVALUATION.problem_diagnosis	Medium — terminology mapping needed (ICD-10 to SNOMED CT)
clinical_notes	openEHR-EHR-EVALUATION.clinical_synopsis or OBSERVATION.progress_note	High — unstructured text, limited archetype fit
custom_assessments	May require custom archetype or GENERIC_ENTRY	Very High — no standard archetype exists

The critical insight: your custom schema is entity-oriented (one table per concept), while openEHR is composition-oriented (clinical documents containing multiple archetypes). A single patient encounter in your system might pull from 8-10 tables. In openEHR, that same encounter becomes a single Composition containing multiple Observations, Evaluations, and Instructions.

When No Archetype Exists

For roughly 10-20% of your data, you will not find a suitable archetype in CKM. You have three options:

Create a custom archetype — Follow the openEHR Archetype Design Guidelines. Submit it to CKM for peer review. This takes 4-8 weeks for review and acceptance.
Use GENERIC_ENTRY — A catch-all archetype for data that does not fit established patterns. Acceptable for migration, but should be replaced with proper archetypes over time.
Archive without migration — Some data (UI preferences, temporary processing fields, deprecated columns) does not belong in a clinical data repository. Archive it to cold storage and document the decision.

Phase 2: Building the Data Transformation Pipeline

With mappings defined, you need to build the pipeline that actually transforms data from your relational schema into openEHR Compositions and posts them to the Clinical Data Repository (CDR).

Data Transformation Pipeline showing 4 stages: SQL Query and Extract, Map to Flat JSON, Build openEHR Composition, POST to CDR with error queue and audit log branches

Step 1: Extract from Custom Schema

Write SQL queries that extract data grouped by patient encounter, not by table. This is critical because openEHR Compositions are encounter-centric. A single composition might need data from your vitals, lab_results, medications, and allergies tables for one encounter.

-- Extract encounter-grouped data for transformation
SELECT 
    e.encounter_id,
    e.patient_id,
    e.encounter_date,
    e.encounter_type,
    -- Vitals (pivoted from rows to columns)
    MAX(CASE WHEN v.vital_type = 'BP_SYSTOLIC' THEN v.value END) as bp_systolic,
    MAX(CASE WHEN v.vital_type = 'BP_DIASTOLIC' THEN v.value END) as bp_diastolic,
    MAX(CASE WHEN v.vital_type = 'HEART_RATE' THEN v.value END) as heart_rate,
    MAX(CASE WHEN v.vital_type = 'TEMPERATURE' THEN v.value END) as temperature,
    -- Allergies (aggregated as JSON array)
    COALESCE(
        json_agg(DISTINCT jsonb_build_object(
            'substance', a.substance,
            'reaction', a.reaction,
            'severity', a.severity
        )) FILTER (WHERE a.id IS NOT NULL),
        '[]'::json
    ) as allergies
FROM encounters e
LEFT JOIN vitals v ON v.encounter_id = e.encounter_id
LEFT JOIN patient_allergies a ON a.patient_id = e.patient_id
WHERE e.encounter_date >= '2020-01-01'
GROUP BY e.encounter_id, e.patient_id, e.encounter_date, e.encounter_type
ORDER BY e.encounter_date;

Step 2: Map to Flat JSON (Web Template Format)

openEHR CDRs like EHRbase and Better Platform accept data in a flat JSON format derived from the Operational Template (OPT). Each field is addressed by its AQL path. Here is what a blood pressure mapping looks like:

{
  "ctx/language": "en",
  "ctx/territory": "US",
  "ctx/composer_name": "Migration Script",
  "encounter/blood_pressure/any_event/systolic|magnitude": 120,
  "encounter/blood_pressure/any_event/systolic|unit": "mm[Hg]",
  "encounter/blood_pressure/any_event/diastolic|magnitude": 80,
  "encounter/blood_pressure/any_event/diastolic|unit": "mm[Hg]",
  "encounter/blood_pressure/any_event/time": "2024-06-15T10:30:00Z",
  "encounter/blood_pressure/position": "at1000",
  "encounter/body_temperature/any_event/temperature|magnitude": 37.2,
  "encounter/body_temperature/any_event/temperature|unit": "Cel",
  "encounter/pulse_heart_beat/any_event/rate|magnitude": 72,
  "encounter/pulse_heart_beat/any_event/rate|unit": "/min"
}

Step 3: POST Compositions to the CDR

Each transformed record becomes a POST request to the CDR's composition endpoint. The CDR validates the composition against the archetype constraints before storing it.

# POST a flat composition to EHRbase
curl -X POST "https://ehrbase.example.com/ehrbase/rest/openehr/v1/ehr/${EHR_ID}/composition"   -H "Content-Type: application/json"   -H "Accept: application/json"   -H "Prefer: return=representation"   -d @composition.json

# Expected response: 201 Created with composition UID
# {"_type": "COMPOSITION", "uid": {"value": "8f3a...::ehrbase::1"}}

Key considerations for the POST pipeline:

Rate limiting — Most CDRs handle 50-200 compositions per second. Plan your batch sizes accordingly.
Idempotency — Track which records have been migrated. If the pipeline fails mid-batch, you need to resume without duplicating data.
Validation errors — The CDR will reject compositions that violate archetype constraints. Route failures to an error queue for manual review, not into a retry loop.
EHR creation — Each patient needs an EHR object created in the CDR before compositions can be posted. Batch-create EHRs first using the /ehr endpoint with the patient's external ID as the subject.

Phase 3: Handling Unmappable Data

Every custom EHR has data that does not cleanly map to openEHR archetypes. Based on migrations we have worked on, expect this distribution:

Three columns showing data mapping categories: Direct Mapping (60-70%), Partial Mapping (20-25%), and Unmappable (5-10%)

Free-Text Clinical Notes

Unstructured clinical notes are the biggest challenge. A typical EHR has millions of free-text notes that contain clinically valuable information but cannot be decomposed into structured archetype fields without NLP. Options:

Store as-is in openEHR-EHR-EVALUATION.clinical_synopsis — preserves the text but gains no structured querying benefit.
NLP extraction — Use clinical NLP (cTAKES, MetaMap, or a fine-tuned LLM) to extract coded concepts from notes and store both the original text and extracted codes. This is expensive and error-prone.
Hybrid approach — Store the original text in a synopsis archetype, flag notes for future NLP processing. This is the pragmatic choice for most migrations.

Custom Fields and Proprietary Extensions

Fields like workflow_status, billing_flag, custom_risk_score, or department-specific fields created at clinician request. For each one, decide:

Is this clinical data that should exist in the CDR? If yes, find or create an archetype.
Is this operational data (workflow, billing, audit)? If yes, it belongs in a separate operational database, not the CDR.
Is this dead data nobody uses? If yes, archive and document.

Phase 4: Running Dual-Write During Transition

Cutting over from one data store to another on a specific date is a recipe for disaster in healthcare. The safe approach is a dual-write transition where both the old and new systems receive writes simultaneously.

Dual-Write Architecture showing application layer writing to both Custom EHR Database and openEHR CDR simultaneously with a Reconciliation Engine comparing outputs

Implementation Architecture

The dual-write layer sits between your application and both databases. Every write operation is duplicated:

class DualWriteService:
    def __init__(self, custom_db, openehr_cdr, reconciler):
        self.custom_db = custom_db
        self.openehr_cdr = openehr_cdr
        self.reconciler = reconciler
        self.phase = "PHASE_1"  # Read Custom, Write Both
    
    def save_encounter(self, encounter_data):
        # Always write to custom DB (existing path)
        custom_result = self.custom_db.save(encounter_data)
        
        # Transform and write to openEHR CDR
        try:
            composition = self.transform_to_composition(encounter_data)
            openehr_result = self.openehr_cdr.post_composition(composition)
            
            # Queue reconciliation check
            self.reconciler.queue_check(
                custom_id=custom_result.id,
                composition_uid=openehr_result.uid,
                timestamp=datetime.utcnow()
            )
        except OpenEHRValidationError as e:
            # Log but do not fail the primary write
            logger.error(f"Dual-write openEHR failure: {e}")
            self.reconciler.queue_failure(custom_result.id, str(e))
        
        # Read from the appropriate source based on phase
        if self.phase == "PHASE_1":
            return custom_result
        elif self.phase == "PHASE_2":
            return self.openehr_cdr.get_composition(openehr_result.uid)
        else:  # PHASE_3
            return self.openehr_cdr.get_composition(openehr_result.uid)

The Three Phases

Phase 1: Read from Custom, Write to Both — Your application continues reading from the custom database. The openEHR CDR receives shadow writes. The reconciliation engine compares both stores nightly, flagging discrepancies. Duration: 2-4 weeks.

Phase 2: Read from openEHR, Write to Both — Flip the read path to openEHR. The custom database is now the shadow. If openEHR reads fail or return unexpected results, you can instantly revert to Phase 1. Duration: 2-4 weeks.

Phase 3: Read and Write openEHR Only — Decommission the custom database write path. The custom database becomes read-only for historical queries during a sunset period. Duration: permanent after validation.

Reconciliation Engine

The reconciliation engine is the safety net that makes dual-write viable. It runs nightly batch comparisons:

# Reconciliation query: compare record counts by date
-- Custom DB side
SELECT DATE(encounter_date) as dt, COUNT(*) as custom_count
FROM encounters
WHERE encounter_date >= CURRENT_DATE - INTERVAL '7 days'
GROUP BY DATE(encounter_date);

-- openEHR CDR side (via AQL)
SELECT c/context/start_time/value as dt, COUNT(c) as openehr_count
FROM EHR e CONTAINS COMPOSITION c
WHERE c/context/start_time/value >= '2024-06-01'
GROUP BY c/context/start_time/value

Key metrics to reconcile: record counts per day, field-level checksums on a sample, coded value distribution (are the same SNOMED codes appearing?), and query result equivalence for key clinical reports.

Phase 5: Validation and Go-Live

Validation and Reconciliation Checklist covering Data Completeness, Clinical Accuracy, Query Equivalence, and Performance Baseline

Clinical Validation

Technical validation (record counts, checksums) is necessary but not sufficient. You also need clinical validation from domain experts:

Chart review — Have clinicians compare patient records in both systems for a sample of 50-100 patients. They check that clinical narratives, medication lists, allergy information, and problem lists are complete and accurate.
Clinical decision support — If your system has CDS rules (drug interaction checks, allergy alerts, screening reminders), verify they produce identical results against the openEHR data.
Reporting equivalence — Run all regulatory reports (quality measures, adverse event reports, public health reporting) against both data stores and compare outputs.

Performance Testing

openEHR CDRs have different performance characteristics than custom relational databases. Key benchmarks to establish:

Operation	Target	Notes
Single composition POST	Under 100ms	EHRbase: ~50-80ms typical
AQL query (single patient)	Under 200ms	Varies by query complexity
AQL query (population)	Under 5s	Full-scan queries will be slower than indexed SQL
Batch composition POST	200+ per second	Use async HTTP clients and connection pooling
Template upload	Under 2s	One-time operation, not performance-critical

If AQL query performance does not meet your requirements, consider:

Materialized views — Some CDRs support materialized AQL views that pre-compute frequently used queries.
FHIR facade — Expose frequently queried data via a FHIR API backed by a read-optimized secondary store.
Denormalized read models — CQRS pattern where writes go to the CDR and reads come from a denormalized PostgreSQL or Elasticsearch index.

Timeline, Staffing, and Budget

Gantt chart showing migration timeline with workstreams for Schema Analysis, ETL Development, Dual-Write, Validation, and Cutover spanning 20 weeks

Realistic Timeline

Based on migrations involving 50-200 table custom schemas with 1-10 million patient records:

Schema analysis and archetype mapping: 3-5 weeks. This is slower than expected because it requires clinical informatics expertise, not just database skills.
ETL pipeline development: 6-10 weeks. Each archetype mapping requires its own transformation logic, unit tests, and edge case handling.
Dual-write implementation: 2-4 weeks. The application layer changes are straightforward; the reconciliation engine takes longer.
Validation and testing: 4-6 weeks. Clinical validation cannot be rushed. Chart reviews and report comparisons take time.
Cutover and go-live: 1-2 weeks. The actual switch is fast if dual-write has been running successfully.

Total: 16-27 weeks for a medium-complexity migration. Smaller systems (under 50 tables) can be done in 12-16 weeks. Large enterprise EHRs with thousands of tables and complex stored procedure logic can take 6-12 months.

Team Composition

You need a cross-functional team, not just backend engineers:

Clinical Informaticist (1, full-time) — The single most important role. This person understands both the clinical domain and openEHR modeling. They drive archetype selection, review mappings for clinical accuracy, and lead chart reviews. Without this role, engineers will make incorrect clinical assumptions.
Backend Engineers (2-3, full-time) — Build the ETL pipeline, dual-write layer, and reconciliation engine. Need experience with your existing tech stack plus openEHR REST APIs.
DBA / Data Engineer (1, part-time) — Manages the schema audit, optimizes extraction queries, handles data quality issues in the source system.
QA / Test Engineer (1, part-time becoming full-time) — Builds validation test suites, runs clinical report comparisons, manages the testing phase.

Common Pitfalls

Underestimating archetype mapping time — Engineers think this is a simple column-to-field mapping. It is not. openEHR archetypes have constraints, terminology bindings, and cardinality rules that your custom schema does not. Budget 2-3x more time than your initial estimate.
Ignoring terminology mapping — Your custom schema might use local codes (1 = Active, 2 = Inactive). openEHR expects standard terminologies (SNOMED CT, LOINC, ICD-10). Building a terminology crosswalk is a project within the project.
Skipping dual-write — Going directly from old to new on a cutover date. In healthcare, this is unacceptable risk. Dual-write adds 2-4 weeks but eliminates the possibility of data loss during transition.
Treating it as a pure IT project — Without clinical informaticist involvement, you will migrate the data but lose clinical meaning. A blood pressure recorded without proper context (body position, cuff size, measurement method) is less useful than the original custom field.

After Migration: What Changes

Once running on openEHR, your relationship with clinical data changes fundamentally:

Schema evolution becomes governed — New clinical concepts are added by selecting or creating archetypes, not by adding columns. This sounds slower, but it prevents the schema sprawl that made migration necessary in the first place.
Querying moves to AQL — The Archetype Query Language replaces SQL for clinical queries. AQL is powerful but different. Your team needs training. The openEHR AQL specification is the definitive reference.
Interoperability becomes native — openEHR data can be exposed via FHIR without complex mapping layers. Most CDRs include built-in FHIR facades.
Vendor independence — Your data model is now based on an open standard. You can switch CDR vendors (EHRbase, Better, Nedap) without re-modeling your data.

Need expert help with healthcare data integration? Explore our Healthcare Interoperability Solutions to see how we connect systems seamlessly. Talk to our team to get started.
Frequently Asked Questions

Can we migrate incrementally instead of all at once?

Yes, and you should. Start with the simplest, highest-volume data (vitals, lab results) to build pipeline confidence, then tackle complex concepts (medications, clinical notes) in later phases. Each data domain can follow its own timeline through the five phases described above.

Do we need to stop using our custom EHR during migration?

No. The dual-write architecture allows your existing application to continue operating normally throughout the migration. End users should not notice any change until you are ready for the final read-path cutover.

What happens to our existing reports and analytics?

Reports built on SQL queries need to be rewritten in AQL or fed by a CQRS read model. Budget time for report migration as a separate workstream. Many organizations maintain a SQL-based reporting database alongside the CDR for complex analytical queries.

How do we handle historical data that predates our current schema?

Legacy data imported from even older systems is often the lowest quality. Consider migrating only data from the last 5-7 years at full fidelity, with older data archived in a simplified format or kept in the original system as read-only.

Is openEHR the right choice, or should we consider FHIR as our primary store?

FHIR and openEHR serve different purposes. FHIR is an interoperability standard optimized for data exchange. openEHR is a clinical data persistence standard optimized for storage, querying, and clinical modeling. Most modern health IT architectures use both: openEHR as the system of record, FHIR as the integration layer. See our guide on running SMART apps against an openEHR backend for more detail.

Conclusion

Migrating from a custom EHR schema to openEHR is a significant undertaking, but it is a well-understood problem. The critical success factors are: invest heavily in archetype mapping with clinical expertise, build a robust transformation pipeline with proper error handling, use dual-write to eliminate cutover risk, and validate clinically, not just technically.

The payoff is substantial. You move from maintaining a bespoke clinical data model — a full-time job that grows more expensive every year — to an open, governed, interoperable standard backed by a global community. Your data becomes portable, queryable in standard ways, and ready for the next generation of health IT integrations.

If you are evaluating this migration and need guidance on the archetype mapping phase, reach out to our team. We have supported multiple organizations through this transition and can help you avoid the common pitfalls.

Was this article helpful?

Your feedback helps us improve our content.

USA Office - Elintex Technologies Inc.

India Office - Elintex Technologies Pvt. Ltd.

Migrating from a Custom EHR Schema to openEHR: The Database Migration Nobody Documents

The Migration Nobody Warns You About

Phase 1: Schema Analysis and Archetype Mapping

Auditing Your Existing Schema

Mapping Tables to Archetypes

When No Archetype Exists

Phase 2: Building the Data Transformation Pipeline

Step 1: Extract from Custom Schema

Step 2: Map to Flat JSON (Web Template Format)

Step 3: POST Compositions to the CDR

Phase 3: Handling Unmappable Data

Free-Text Clinical Notes

Custom Fields and Proprietary Extensions

Phase 4: Running Dual-Write During Transition

Implementation Architecture

The Three Phases

Reconciliation Engine

Phase 5: Validation and Go-Live

Clinical Validation

Performance Testing

Timeline, Staffing, and Budget

Realistic Timeline

Team Composition

Common Pitfalls

After Migration: What Changes

Need expert help with healthcare data integration? Explore our Healthcare Interoperability Solutions to see how we connect systems seamlessly. Talk to our team to get started.
Frequently Asked Questions

Can we migrate incrementally instead of all at once?

Do we need to stop using our custom EHR during migration?

What happens to our existing reports and analytics?

How do we handle historical data that predates our current schema?

Is openEHR the right choice, or should we consider FHIR as our primary store?

Conclusion

Related Posts

Healthcare Data Quality: The Pre-Requisite Nobody Checks Before Building AI or Analytics

AI Agents for Claims Denial Management: Building the System That Appeals 4x Faster Than Your Staff

Healthcare ML Metrics That Matter: Beyond Accuracy to Clinical Utility

Migrating from a Custom EHR Schema to openEHR: The Database Migration Nobody Documents

The Migration Nobody Warns You About

Phase 1: Schema Analysis and Archetype Mapping

Auditing Your Existing Schema

Mapping Tables to Archetypes

When No Archetype Exists

Phase 2: Building the Data Transformation Pipeline

Step 1: Extract from Custom Schema

Step 2: Map to Flat JSON (Web Template Format)

Step 3: POST Compositions to the CDR

Phase 3: Handling Unmappable Data

Free-Text Clinical Notes

Custom Fields and Proprietary Extensions

Phase 4: Running Dual-Write During Transition

Implementation Architecture

The Three Phases

Reconciliation Engine

Phase 5: Validation and Go-Live

Clinical Validation

Performance Testing

Timeline, Staffing, and Budget

Realistic Timeline

Team Composition

Common Pitfalls

After Migration: What Changes

Need expert help with healthcare data integration? Explore our Healthcare Interoperability Solutions to see how we connect systems seamlessly. Talk to our team to get started. Frequently Asked Questions

Can we migrate incrementally instead of all at once?

Do we need to stop using our custom EHR during migration?

What happens to our existing reports and analytics?

How do we handle historical data that predates our current schema?

Is openEHR the right choice, or should we consider FHIR as our primary store?

Conclusion

Related Posts

Healthcare Data Quality: The Pre-Requisite Nobody Checks Before Building AI or Analytics

AI Agents for Claims Denial Management: Building the System That Appeals 4x Faster Than Your Staff

Healthcare ML Metrics That Matter: Beyond Accuracy to Clinical Utility

Need expert help with healthcare data integration? Explore our Healthcare Interoperability Solutions to see how we connect systems seamlessly. Talk to our team to get started.
Frequently Asked Questions