The Convergence of Clinical Research and EHR Data
Pharmaceutical companies and health systems are increasingly partnering to leverage real-world data (RWD) from electronic health records for clinical research. The FDA now accepts real-world evidence (RWE) derived from EHR data for regulatory decisions, and FHIR provides the standardized API framework to make this data exchange practical.
FHIR defines four core resources for clinical research: ResearchStudy (trial metadata and protocol), ResearchSubject (participant enrollment and status), PlanDefinition (study protocol as computable logic), and ActivityDefinition (individual study activities). Together, these resources create a digital representation of a clinical trial that can integrate directly with EHR systems.
This guide covers how to model clinical trials in FHIR, extract research cohorts using Bulk Data operations, match eligible patients to trials, handle IRB requirements and de-identification, and build the data pipeline from EHR to research platform. We include working FHIR examples and architecture patterns drawn from real implementations.
FHIR Resources for Clinical Research
ResearchStudy: Trial Metadata
The ResearchStudy resource captures everything about a clinical trial except the participants: title, phase, conditions being studied, sponsor, study sites, study arms, and protocol references. It also links to ClinicalTrials.gov via the identifier element.
{
"resourceType": "ResearchStudy",
"id": "trial-diabetes-mgmt-2026",
"identifier": [{
"system": "https://clinicaltrials.gov",
"value": "NCT06789012"
}],
"title": "Continuous Glucose Monitoring Impact on HbA1c in Type 2 Diabetes",
"status": "active",
"phase": {
"coding": [{
"system": "http://terminology.hl7.org/CodeSystem/research-study-phase",
"code": "phase-3",
"display": "Phase 3"
}]
},
"category": [{
"text": "Interventional"
}],
"condition": [{
"coding": [{
"system": "http://snomed.info/sct",
"code": "44054006",
"display": "Type 2 diabetes mellitus"
}]
}],
"enrollment": [{
"reference": "Group/eligible-patients-cgm-trial"
}],
"period": {
"start": "2026-01-01",
"end": "2028-06-30"
},
"sponsor": {
"reference": "Organization/pharma-corp"
},
"principalInvestigator": {
"reference": "Practitioner/dr-research-lead"
},
"site": [{
"reference": "Location/metro-hospital-research-center"
}],
"arm": [
{
"name": "CGM Intervention",
"type": {
"coding": [{
"system": "http://terminology.hl7.org/CodeSystem/research-study-arm-type",
"code": "experimental"
}]
},
"description": "Continuous glucose monitoring with real-time alerts"
},
{
"name": "Standard Care",
"type": {
"coding": [{
"system": "http://terminology.hl7.org/CodeSystem/research-study-arm-type",
"code": "active-comparator"
}]
},
"description": "Standard fingerstick glucose monitoring"
}
]
}
The enrollment element references a Group resource that defines the eligible patient population. The arm element describes the study arms, which are critical for randomized controlled trials. Linking to ClinicalTrials.gov via the identifier enables automated synchronization with the federal registry.
ResearchSubject: Participant Enrollment
The ResearchSubject resource tracks an individual patient's participation in a study. It links the patient to the study, records their assigned arm, and tracks their enrollment status through the study lifecycle.
{
"resourceType": "ResearchSubject",
"id": "subject-patient-456",
"status": "on-study",
"study": {
"reference": "ResearchStudy/trial-diabetes-mgmt-2026"
},
"individual": {
"reference": "Patient/patient-456"
},
"assignedArm": "CGM Intervention",
"actualArm": "CGM Intervention",
"period": {
"start": "2026-02-15"
},
"consent": {
"reference": "Consent/research-consent-patient-456"
}
}
The status field tracks the participant through enrollment states such as candidate, eligible, on-study, on-study-intervention, on-study-observation, follow-up, withdrawn, and off-study (FHIR R4 also defines screening, ineligible, not-registered, potential-candidate, and pending-on-study). The consent reference links to a FHIR Consent resource capturing the patient's research consent, connecting directly to the consent management patterns covered in our previous guide.
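A pipeline that ingests ResearchSubject resources may want to sanity-check them before use. The following is a minimal sketch, assuming resources are handled as plain dictionaries with R4 element names. The function name `check_research_subject` and the rule about which statuses should carry a consent reference are illustrative assumptions, not part of the FHIR specification.

```python
from typing import List

# ResearchSubject.status codes defined in FHIR R4.
R4_SUBJECT_STATUSES = {
    "candidate", "eligible", "follow-up", "ineligible", "not-registered",
    "off-study", "on-study", "on-study-intervention", "on-study-observation",
    "pending-on-study", "potential-candidate", "screening", "withdrawn",
}

# Assumption: statuses at which this pipeline expects a consent reference.
CONSENT_EXPECTED = {
    "on-study", "on-study-intervention", "on-study-observation", "follow-up",
}


def check_research_subject(subject: dict) -> List[str]:
    """Return human-readable problems; an empty list means none were found."""
    problems = []
    status = subject.get("status")
    if status not in R4_SUBJECT_STATUSES:
        problems.append(f"unknown status: {status!r}")
    # Both the study and the enrolled individual must be referenced.
    for element in ("study", "individual"):
        if not subject.get(element, {}).get("reference"):
            problems.append(f"missing {element} reference")
    if status in CONSENT_EXPECTED and not subject.get("consent", {}).get("reference"):
        problems.append("participant is on study but has no consent reference")
    return problems


example_subject = {
    "resourceType": "ResearchSubject",
    "status": "on-study",
    "study": {"reference": "ResearchStudy/trial-diabetes-mgmt-2026"},
    "individual": {"reference": "Patient/patient-456"},
    "consent": {"reference": "Consent/research-consent-patient-456"},
}
print(check_research_subject(example_subject))
```

Returning a list of problems rather than raising on the first one lets a nightly job log every issue for a record in a single pass.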
PlanDefinition and ActivityDefinition: Computable Protocols
The PlanDefinition resource represents the study protocol as machine-readable logic. It defines inclusion and exclusion criteria, study activities, timing, and decision points. The ActivityDefinition resource defines individual activities within the protocol—lab tests, questionnaires, drug administrations, follow-up visits.
| Resource | Purpose | Example |
|---|---|---|
| PlanDefinition | Overall protocol logic | Inclusion: HbA1c 7.0-10.0%, Age 18-75, Type 2 DM diagnosis |
| ActivityDefinition | Individual study activities | HbA1c lab test every 3 months, weekly CGM data download |
| Group | Eligible patient cohort | Patients matching inclusion criteria |
| Consent | Research participation consent | IRB-approved informed consent |
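To make the inclusion example in the table concrete, here is a hedged sketch of how those three criteria might be evaluated once they have been pulled out of a protocol. Real PlanDefinition criteria are usually expressed as CQL or FHIRPath expressions. The `Criteria` dataclass and `is_eligible` function below are a simplified, hypothetical stand-in for that logic, not a PlanDefinition parser.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Set


@dataclass
class Criteria:
    """Simplified inclusion criteria (hypothetical structure, not a FHIR type)."""
    condition_code: str
    min_age: int
    max_age: int
    min_hba1c: float
    max_hba1c: float


def age_on(birth_date: date, as_of: date) -> int:
    """Age in whole years on the given date."""
    years = as_of.year - birth_date.year
    if (as_of.month, as_of.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years


def is_eligible(criteria: Criteria, birth_date: date, condition_codes: Set[str],
                latest_hba1c: Optional[float], as_of: date) -> bool:
    """True when diagnosis, age, and the latest HbA1c all fall inside the criteria."""
    age = age_on(birth_date, as_of)
    return (
        criteria.condition_code in condition_codes
        and criteria.min_age <= age <= criteria.max_age
        and latest_hba1c is not None  # a missing lab value cannot satisfy the range
        and criteria.min_hba1c <= latest_hba1c <= criteria.max_hba1c
    )


cgm_criteria = Criteria("44054006", min_age=18, max_age=75, min_hba1c=7.0, max_hba1c=10.0)
print(is_eligible(cgm_criteria, date(1970, 5, 1), {"44054006"}, 8.2, date(2026, 3, 1)))
```

Passing `as_of` explicitly keeps the check reproducible: the same patient data evaluated against the same screening date always gives the same answer.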
Extracting Research Cohorts with FHIR Bulk Data
One of the most powerful applications of FHIR in clinical research is cohort extraction. Instead of manually querying the EHR database, researchers can use the FHIR Bulk Data $export operation to extract patient populations that match study criteria.
Cohort Identification Workflow
- Define eligibility criteria in a PlanDefinition with structured inclusion/exclusion rules.
- Query the FHIR server for patients matching the criteria using search parameters.
- Create a Group resource containing the matched patient references.
- Run Bulk Data Export on the Group to extract all relevant clinical data.
- De-identify the exported data before making it available to the research team.
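# Step 1: Search for eligible patients (Type 2 DM, HbA1c 7.0-10.0%, age 18-75)
# Note: each _has parameter is evaluated independently, so the value filters are not
# guaranteed to apply to the HbA1c Observation itself; confirm matches in a follow-up query.
GET /Patient?_has:Condition:patient:code=http://snomed.info/sct|44054006
&_has:Observation:patient:code=http://loinc.org|4548-4
&_has:Observation:patient:value-quantity=ge7.0|http://unitsofmeasure.org|%
&_has:Observation:patient:value-quantity=le10.0|http://unitsofmeasure.org|%
&birthdate=le2008-01-01&birthdate=ge1951-01-01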
# Step 2: Create Group from matched patients
POST /Group
{
"resourceType": "Group",
"type": "person",
"actual": true,
"name": "CGM Trial Eligible Cohort - March 2026",
"member": [
{"entity": {"reference": "Patient/patient-101"}},
{"entity": {"reference": "Patient/patient-102"}},
{"entity": {"reference": "Patient/patient-103"}}
]
}
# Step 3: Bulk export the cohort data (use the Group id the server returned in Step 2)
GET /Group/cgm-trial-cohort/$export
?_type=Patient,Condition,Observation,MedicationRequest,Procedure
&_since=2024-01-01T00:00:00Z
Accept: application/fhir+json
Prefer: respond-async
This approach leverages the same FHIR infrastructure used for healthcare interoperability, reducing the need to build separate research data extraction pipelines.
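The kick-off request in Step 3 and the handling of the completed export can be sketched in Python as follows. This is a minimal illustration with no network calls. The helper names `build_export_request` and `files_by_type` and the example base URL are assumptions, while the `output` entries with `type` and `url` follow the Bulk Data completion manifest.

```python
from typing import Dict, List, Tuple
from urllib.parse import urlencode


def build_export_request(base_url: str, group_id: str, resource_types: List[str],
                         since: str) -> Tuple[str, Dict[str, str]]:
    """Build the URL and headers for a Group-level $export kick-off request."""
    query = urlencode({"_type": ",".join(resource_types), "_since": since})
    url = f"{base_url.rstrip('/')}/Group/{group_id}/$export?{query}"
    headers = {"Accept": "application/fhir+json", "Prefer": "respond-async"}
    return url, headers


def files_by_type(manifest: dict) -> Dict[str, List[str]]:
    """Group the NDJSON file URLs in a completed export manifest by resource type."""
    grouped: Dict[str, List[str]] = {}
    for entry in manifest.get("output", []):
        grouped.setdefault(entry["type"], []).append(entry["url"])
    return grouped


url, headers = build_export_request(
    "https://ehr.example.org/fhir",  # hypothetical server base URL
    "cgm-trial-cohort",
    ["Patient", "Observation"],
    "2024-01-01T00:00:00Z",
)
print(url)

# Shape of a completion manifest, with made-up file URLs for illustration.
sample_manifest = {
    "transactionTime": "2026-03-01T12:00:00Z",
    "requiresAccessToken": True,
    "output": [
        {"type": "Patient", "url": "https://ehr.example.org/files/patient_1.ndjson"},
        {"type": "Observation", "url": "https://ehr.example.org/files/obs_1.ndjson"},
        {"type": "Observation", "url": "https://ehr.example.org/files/obs_2.ndjson"},
    ],
    "error": [],
}
print(files_by_type(sample_manifest))
```

In a real client, the kick-off response carries a `Content-Location` header that is polled until the server returns the manifest. That polling loop is omitted here to keep the sketch self-contained.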
Real-World Evidence from EHR Data
Real-world evidence (RWE) uses data collected during routine clinical care—rather than in controlled trial settings—to generate evidence about treatment effectiveness, safety, and outcomes. FHIR makes RWE generation practical by providing standardized access to the underlying data.
RWE Data Sources
| Source | FHIR Resources | Data Quality |
|---|---|---|
| EHR Clinical Notes | DocumentReference, DiagnosticReport | Requires NLP extraction |
| Structured EHR Data | Condition, Observation, MedicationRequest, Procedure | High quality, coded |
| Claims Data | ExplanationOfBenefit, Coverage | Complete for billed services |
| Patient Registries | ResearchStudy, ResearchSubject | Purpose-built, curated |
| Lab Systems (LIS) | Observation, DiagnosticReport | Quantitative, time-series |
| Wearables/RPM | Observation (vital signs) | High frequency, variable quality |
Challenges in RWE Generation
While FHIR standardizes data access, several challenges remain in generating credible real-world evidence:
- Data completeness: EHR data reflects what clinicians documented, not necessarily what happened. Missing data—especially for patients who seek care at multiple institutions—creates gaps that can bias results.
- Coding variability: Different clinicians and institutions code the same condition differently. SNOMED CT, ICD-10, and local codes may all represent the same diagnosis with varying specificity.
- Confounding variables: Unlike randomized trials, RWE studies cannot control for confounders. Propensity score matching and other statistical methods are required to draw valid conclusions.
- Temporal accuracy: The effectiveDateTime on an Observation may reflect when data was entered, not when the measurement was taken. Build validation checks for temporal consistency.
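One possible temporal consistency check compares an Observation's effectiveDateTime with its issued timestamp and with the current time. The sketch below assumes Observations are plain dictionaries and treats timestamps without a timezone as UTC. Both the function names and that timezone assumption are illustrative choices rather than rules from the FHIR specification.

```python
from datetime import datetime, timezone
from typing import List, Optional


def parse_fhir_datetime(value: str) -> Optional[datetime]:
    """Parse a full date or date-time string; return None if it cannot be parsed."""
    try:
        parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    except ValueError:
        return None
    if parsed.tzinfo is None:
        # Assumption: values without an offset are treated as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def temporal_issues(observation: dict, now: datetime) -> List[str]:
    """Return a list of temporal problems found on one Observation."""
    raw = observation.get("effectiveDateTime")
    if raw is None:
        return ["missing effectiveDateTime"]
    effective = parse_fhir_datetime(raw)
    if effective is None:
        return [f"unparseable effectiveDateTime: {raw!r}"]
    issues = []
    if effective > now:
        issues.append("effectiveDateTime is in the future")
    issued = parse_fhir_datetime(observation.get("issued", ""))
    # A result cannot be released before the measurement it reports.
    if issued is not None and effective > issued:
        issues.append("effectiveDateTime is later than issued")
    return issues


now = datetime(2026, 3, 1, tzinfo=timezone.utc)
suspect = {"effectiveDateTime": "2026-02-20T09:30:00Z", "issued": "2026-02-15T11:00:00Z"}
print(temporal_issues(suspect, now))
```

Checks like this flag records for review rather than proving them wrong, since a late-entered value with a correct clinical time looks the same as a correctly timed one.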
IRB Requirements and De-Identification
Every research use of EHR data requires Institutional Review Board (IRB) approval. FHIR-based research data pipelines must build in IRB compliance from the start. The IRB reviews the research protocol, data access controls, de-identification methodology, and informed consent process before any patient data can be accessed for research purposes. For prospective studies where patients are actively enrolled, a full informed consent process is required and is captured as a FHIR Consent resource with scope: research. For retrospective studies using only de-identified data, the IRB may grant a waiver of informed consent under 45 CFR 46.116(f) (46.116(d) before the 2018 Common Rule revision), but the de-identification process itself must be documented and validated.
Many academic medical centers now maintain a Research Data Warehouse (RDW) that receives nightly de-identified extracts from the clinical FHIR server. Researchers submit study protocols to the IRB, and upon approval, receive access to the relevant subset of the RDW. This model separates clinical operations from research access, reducing the risk of inadvertent PHI exposure while maintaining data freshness for time-sensitive studies.
HIPAA De-Identification Methods
HIPAA provides two methods for de-identification:
- Safe Harbor (45 CFR 164.514(b)(2)): Remove 18 specified identifiers including names, all date elements except year, geographic subdivisions smaller than a state (the first three ZIP code digits may be kept when the area contains more than 20,000 people), phone numbers, SSN, MRN, and any other unique identifying number or code. Ages over 89, and any date elements that reveal such an age, must be aggregated into a single category of 90 or older.
- Expert Determination (45 CFR 164.514(b)(1)): A qualified statistical expert determines and documents that the risk of re-identification is very small. This method allows retaining more data elements (exact dates, partial zip codes) when the expert can demonstrate low re-identification risk.
When building a FHIR-based de-identification pipeline, apply transformations at the resource level: strip Patient.name, Patient.address, Patient.telecom, generalize Patient.birthDate to year, replace Patient.id with a research-specific pseudonym, and remove all references to other patients in RelatedPerson resources.
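The Patient-level transformations described above might look like the following sketch. It assumes resources are plain dictionaries, uses a randomly generated pseudonym kept in a lookup table rather than one derived from the original id, and drops birthDate entirely when the patient could be 90 or older. The function name, the `rs-` prefix, and the list of stripped elements are assumptions for illustration, and a real pipeline would also need to handle free text, extensions, and other resource types.

```python
import copy
import secrets
from typing import Dict

# Assumption: top-level Patient elements treated as direct identifiers here.
DIRECT_IDENTIFIER_ELEMENTS = (
    "name", "address", "telecom", "identifier", "photo", "contact", "text",
)


def deidentify_patient(patient: dict, pseudonyms: Dict[str, str], as_of_year: int) -> dict:
    """Return a de-identified copy of a Patient; the input is left unchanged."""
    out = copy.deepcopy(patient)
    for element in DIRECT_IDENTIFIER_ELEMENTS:
        out.pop(element, None)

    # Random pseudonym, stored in a lookup table so the same patient maps consistently.
    original_id = patient["id"]
    if original_id not in pseudonyms:
        pseudonyms[original_id] = "rs-" + secrets.token_hex(8)
    out["id"] = pseudonyms[original_id]

    # Generalize birthDate to the year; drop it when the patient may be 90 or older.
    birth_date = out.pop("birthDate", None)
    if birth_date:
        birth_year = int(birth_date[:4])
        if as_of_year - birth_year < 90:  # conservative: year difference only
            out["birthDate"] = str(birth_year)
    return out


pseudonym_table: Dict[str, str] = {}
source_patient = {
    "resourceType": "Patient",
    "id": "patient-456",
    "name": [{"family": "Example", "given": ["Pat"]}],
    "telecom": [{"system": "phone", "value": "555-0100"}],
    "gender": "female",
    "birthDate": "1970-05-01",
}
print(deidentify_patient(source_patient, pseudonym_table, as_of_year=2026))
```

The pseudonym table itself is re-identifying information, so it should stay on the clinical side of the boundary and never travel with the research extract.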
Building the Research Data Pipeline
A production research data pipeline connects the EHR's FHIR API to a research data platform through five stages:
- Cohort identification: query the FHIR server for patients matching study criteria.
- Data extraction: use Bulk Data Export to pull all relevant resource types for the identified cohort.
- De-identification: apply HIPAA Safe Harbor or Expert Determination rules to strip protected health information.
- Data quality validation: check for completeness, coding consistency, and temporal accuracy.
- Loading: load the de-identified data into the research data warehouse in a format suitable for statistical analysis.
Each stage must be auditable. The research data pipeline should generate provenance records (FHIR Provenance resources) documenting what data was extracted, when, by whom, under what IRB protocol, and what transformations were applied. This audit trail is essential for regulatory submissions—the FDA expects organizations to demonstrate data lineage from source EHR to final analysis dataset. Organizations already building healthcare integration platforms can extend their existing FHIR infrastructure to support research use cases, rather than building separate research data systems from scratch.
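A provenance record for one pipeline stage could be assembled roughly as shown below. This is a sketch under stated assumptions: the function name and all reference values are hypothetical, and placing the IRB protocol in Provenance.policy is one reasonable modeling choice rather than a requirement.

```python
from datetime import datetime, timezone
from typing import List


def build_provenance(target_refs: List[str], agent_ref: str, source_ref: str,
                     irb_protocol_uri: str, activity: str, recorded: datetime) -> dict:
    """Build a FHIR R4 Provenance resource describing one pipeline stage."""
    if recorded.tzinfo is None:
        raise ValueError("recorded must be timezone-aware (FHIR instant)")
    return {
        "resourceType": "Provenance",
        # What was produced by this stage.
        "target": [{"reference": ref} for ref in target_refs],
        # When the record was made, normalized to UTC.
        "recorded": recorded.astimezone(timezone.utc).isoformat(timespec="seconds"),
        # Assumption: the IRB protocol is identified by a URI and carried in policy.
        "policy": [irb_protocol_uri],
        "activity": {"text": activity},
        # Who (or which system) performed the stage.
        "agent": [{"who": {"reference": agent_ref}}],
        # The input the output was derived from.
        "entity": [{"role": "source", "what": {"reference": source_ref}}],
    }


record = build_provenance(
    target_refs=["DocumentReference/deidentified-extract-2026-03"],  # hypothetical
    agent_ref="Device/research-pipeline",                            # hypothetical
    source_ref="Group/cgm-trial-cohort",
    irb_protocol_uri="https://irb.example.org/protocols/2026-0042",   # hypothetical
    activity="HIPAA Safe Harbor de-identification",
    recorded=datetime(2026, 3, 1, 12, 0, tzinfo=timezone.utc),
)
print(record["recorded"])
```

Emitting one Provenance resource per stage, each pointing at the previous stage's output as its source entity, yields a chain that can be walked from the analysis dataset back to the original export.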
ClinicalTrials.gov Integration
The FHIR ResearchStudy resource maps naturally to ClinicalTrials.gov registry entries. By maintaining study metadata in FHIR format, organizations can automate the registration and update process.
| ClinicalTrials.gov Field | FHIR ResearchStudy Element |
|---|---|
| NCT Number | identifier |
| Brief Title | title |
| Overall Status | status |
| Study Phase | phase |
| Conditions | condition |
| Interventions | arm.description |
| Sponsor | sponsor |
| Principal Investigator | principalInvestigator |
| Enrollment | enrollment |
| Start/End Date | period |
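The mapping table can be expressed as a small transformation. The sketch below flattens a ResearchStudy dictionary into a record keyed by the left-hand column of the table. The function names are hypothetical, and the output is an illustration of the mapping only, not the schema that ClinicalTrials.gov expects for submissions.

```python
from typing import Optional

CTGOV_SYSTEM = "https://clinicaltrials.gov"  # identifier system used in this guide


def coding_display(concept: Optional[dict]) -> Optional[str]:
    """Return the first coding display of a CodeableConcept, falling back to its text."""
    concept = concept or {}
    for coding in concept.get("coding", []):
        if coding.get("display"):
            return coding["display"]
    return concept.get("text")


def to_registry_record(study: dict) -> dict:
    """Flatten a ResearchStudy into a record following the mapping table above."""
    nct = next(
        (i.get("value") for i in study.get("identifier", [])
         if i.get("system") == CTGOV_SYSTEM),
        None,
    )
    period = study.get("period", {})
    return {
        "NCT Number": nct,
        "Brief Title": study.get("title"),
        "Overall Status": study.get("status"),
        "Study Phase": coding_display(study.get("phase")),
        "Conditions": [coding_display(c) for c in study.get("condition", [])],
        "Interventions": [arm.get("description") for arm in study.get("arm", [])],
        "Sponsor": study.get("sponsor", {}).get("reference"),
        "Principal Investigator": study.get("principalInvestigator", {}).get("reference"),
        "Enrollment": [e.get("reference") for e in study.get("enrollment", [])],
        "Start Date": period.get("start"),
        "End Date": period.get("end"),
    }


sample_study = {
    "resourceType": "ResearchStudy",
    "identifier": [{"system": CTGOV_SYSTEM, "value": "NCT06789012"}],
    "title": "Continuous Glucose Monitoring Impact on HbA1c in Type 2 Diabetes",
    "status": "active",
    "phase": {"coding": [{"code": "phase-3", "display": "Phase 3"}]},
    "condition": [{"coding": [{"code": "44054006", "display": "Type 2 diabetes mellitus"}]}],
    "arm": [{"name": "CGM Intervention",
             "description": "Continuous glucose monitoring with real-time alerts"}],
    "period": {"start": "2026-01-01", "end": "2028-06-30"},
}
print(to_registry_record(sample_study))
```

Sponsor and investigator are left as references here. Resolving them to display names would require reading the referenced Organization and Practitioner resources.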
Patient-Trial Matching
One of the most impactful applications of FHIR in clinical research is automated patient-trial matching. Today, fewer than 5% of adult cancer patients participate in clinical trials—largely because physicians are unaware of available trials for their patients. Automated matching changes this by continuously screening the EHR patient population against active trial eligibility criteria and surfacing matches during clinical encounters. The matching system queries active ResearchStudy resources, retrieves the associated PlanDefinition with inclusion/exclusion criteria, and evaluates each patient's clinical data against those criteria in real time.
# Patient-Trial Matching Service (Python pseudocode)
# fhir_search, fhir_read, get_plan_definition, and the evaluate_*/calculate_* helpers
# are placeholders for your FHIR client and criteria engine.
def match_patient_to_trials(patient_id):
    """Find eligible trials for a patient based on their clinical data."""
    # 1. Gather patient clinical profile
    conditions = fhir_search('Condition', {'patient': patient_id, 'clinical-status': 'active'})
    labs = fhir_search('Observation', {'patient': patient_id, 'category': 'laboratory', '_sort': '-date', '_count': 50})
    meds = fhir_search('MedicationRequest', {'patient': patient_id, 'status': 'active'})
    patient = fhir_read('Patient', patient_id)
    # 2. Query active trials matching patient conditions
    patient_conditions = [c.code.coding[0].code for c in conditions if c.code and c.code.coding]
    if not patient_conditions:
        return []  # no coded conditions, so there is nothing to match on
    trials = fhir_search('ResearchStudy', {'status': 'active', 'condition': ','.join(patient_conditions)})
    # 3. Keep trials where inclusion criteria pass and no exclusion criterion applies
    matches = []
    for trial in trials:
        plan = get_plan_definition(trial)
        included = evaluate_inclusion_criteria(plan, patient, conditions, labs, meds)
        if included and not evaluate_exclusion_criteria(plan, patient, conditions, labs, meds):
            matches.append({
                'trial': trial,
                'match_score': calculate_match_score(plan, patient)
            })
    # Highest-scoring trials first
    return sorted(matches, key=lambda m: m['match_score'], reverse=True)
This type of clinical decision support can significantly increase trial enrollment rates. Integrating trial matching into the EHR workflow, as a CDS Hooks service or SMART on FHIR app, ensures that clinicians see relevant trials at the point of care, not in a separate research portal they rarely visit. The matching service should rank results by match confidence, highlight which criteria the patient satisfies, and flag any exclusion criteria that are close to being triggered, giving the clinician actionable context for discussing trial participation with the patient.
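If the matching service is exposed through CDS Hooks, each match is returned to the EHR as a card. The sketch below shows one way to turn a match into a card with the satisfied criteria and near-exclusion flags described above. The function name, the wording of the card text, and the choice to raise the indicator to "warning" when an exclusion is close are assumptions, while the `summary`, `indicator`, `detail`, `source`, and `links` fields are standard CDS Hooks card fields.

```python
from typing import List


def build_trial_card(title: str, nct_id: str, match_score: float,
                     satisfied: List[str], near_exclusions: List[str]) -> dict:
    """Build a CDS Hooks card summarizing one trial match."""
    # CDS Hooks asks for a summary shorter than 140 characters.
    summary = f"Possible trial match: {title}"
    if len(summary) >= 140:
        summary = summary[:136] + "..."

    lines = [f"Match score: {match_score:.2f}", "", "Criteria satisfied:"]
    lines += [f"- {item}" for item in satisfied]
    if near_exclusions:
        lines += ["", "Close to exclusion thresholds:"]
        lines += [f"- {item}" for item in near_exclusions]

    return {
        "summary": summary,
        # Assumption: near-exclusion findings warrant a more prominent indicator.
        "indicator": "warning" if near_exclusions else "info",
        "detail": "\n".join(lines),
        "source": {"label": "Trial matching service"},
        "links": [{
            "label": f"ClinicalTrials.gov {nct_id}",
            "url": f"https://clinicaltrials.gov/study/{nct_id}",
            "type": "absolute",
        }],
    }


card = build_trial_card(
    "Continuous Glucose Monitoring Impact on HbA1c in Type 2 Diabetes",
    "NCT06789012",
    0.87,
    satisfied=["Type 2 diabetes diagnosis", "HbA1c 8.2% (range 7.0-10.0%)"],
    near_exclusions=[],
)
print(card["summary"])
```

A CDS Hooks service would wrap one or more such cards in a response of the form `{"cards": [...]}`.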
Regulatory Landscape for FHIR-Based Research
The FDA's 2018 Framework for Real-World Evidence established that RWE can support regulatory decisions, including new indications for approved drugs. Since then, the FDA has issued multiple guidance documents on using electronic health record data for clinical evidence generation. The 21st Century Cures Act of 2016, which directed the FDA to create that framework, also mandated interoperable health data exchange, and ONC's 2020 final rule implementing the Act adopted FHIR Release 4 as the standard for certified health IT APIs. For organizations building FHIR-based research capabilities, this regulatory alignment means that investments in FHIR infrastructure serve both clinical operations and research needs. A single FHIR API can support patient care, payer data exchange, quality reporting, and clinical research, maximizing the return on infrastructure investment.
Frequently Asked Questions
What FHIR resources are used for clinical trials?
The four core resources are ResearchStudy (trial metadata), ResearchSubject (participant enrollment), PlanDefinition (study protocol), and ActivityDefinition (study activities). Supporting resources include Group (eligible cohorts), Consent (research consent), and standard clinical resources like Condition, Observation, and MedicationRequest for outcome data.
How does FHIR Bulk Data help clinical research?
FHIR Bulk Data Export ($export) enables researchers to extract large patient cohorts from EHR systems in standardized NDJSON format. Researchers define a Group of eligible patients and export all relevant clinical data types in a single asynchronous operation, replacing manual chart review or custom database queries.
What is real-world evidence in the context of FHIR?
Real-world evidence (RWE) is clinical evidence derived from data collected during routine patient care rather than controlled trials. FHIR provides standardized access to EHR data, making it practical to extract, de-identify, and analyze real-world data at scale. The FDA accepts RWE for post-market safety monitoring, label expansion studies, and certain regulatory submissions.
Is IRB approval required for FHIR-based research?
Yes. Any use of patient data for research purposes requires IRB review, even when using de-identified data. The IRB evaluates the de-identification methodology, data security controls, and research protocol. Some IRBs offer expedited review for studies using only de-identified FHIR Bulk Data exports, but approval is still required.
How do you de-identify FHIR resources for research?
Apply HIPAA Safe Harbor by removing 18 specified identifiers from Patient and RelatedPerson resources, generalizing dates to year only, aggregating ages over 89 into a single 90-or-older category, replacing identifiers with research pseudonyms, and stripping geographic data below state level. Expert Determination allows retaining more detail if a qualified statistician certifies low re-identification risk.



