Nirmitee.io
HL7's AI Office: What the New Standards Body Means for Every Healthcare Developer


March 17, 2026
14 min read
Written by
Jitendra Choudhary

CTO & Co-Founder

A technology leader with deep expertise in AI/ML, software architecture, and scalable digital systems.

In July 2025, HL7 International made the most consequential organizational move in its 38-year history: it launched a dedicated AI Office with a mandate to build the standards infrastructure for healthcare AI. Daniel Vreeman, HL7's chief standards development officer, was appointed as the organization's first-ever chief AI officer. Six months later, the AI Transparency on FHIR implementation guide entered its first ballot cycle. If you build healthcare software, this changes your roadmap.

This is not a committee that publishes white papers. The AI Office is shipping specifications that will define how AI-generated clinical data is tagged, traced, and trusted across every EHR, payer system, and clinical decision support tool in production. Here is what they are building, what the first standards look like, and exactly how to prepare your systems.

[Figure: HL7 AI Office organizational structure and four strategic workstreams]

Why HL7 Launched an AI Office Now

The timing is not accidental. Three converging forces made 2025 the inflection point:

  • Regulatory pressure: The ONC HTI-1 final rule adds algorithm transparency requirements for decision support interventions in certified EHR technology. CMS's 2026 interoperability rules reference AI-generated content. The EU AI Act classifies most AI-based medical devices as high-risk. Standards bodies need to provide the technical specifications regulators reference.
  • Production reality: Over 80% of health care executives expect agentic AI to deliver moderate-to-significant value in 2026, per Deloitte's State of AI survey. AI agents are already writing clinical notes, generating prior authorization requests, and flagging sepsis risk from remote monitoring data. Without standards, every vendor implements transparency differently.
  • The FHIR foundation: FHIR R4, the first release with normative content, is widely deployed, and FHIR R6 is on the horizon. The Provenance and AuditEvent resources, together with FHIR's extension mechanism, provide the technical building blocks to represent AI metadata. HL7 realized it could build AI transparency on top of FHIR rather than creating a parallel standard.

The result: four strategic workstreams that cover everything from specification development to global policy alignment.

The Four Strategic Workstreams

[Figure: HL7 AI Office four strategic workstreams: Standards, Global Leadership, Innovation Lab, Community Excellence]

The AI Office operates through four parallel tracks, each with distinct deliverables:

1. Standards Development: The AI-Ready Interoperability Stack

This is the core technical track. Two specifications are in active development:

  • AI Transparency on FHIR Implementation Guide (IG): Entered Draft Standard ballot in January 2026. Defines how to document when AI has created, modified, or influenced a FHIR resource. Based on FHIR 4.0.1. Package ID: hl7.fhir.uv.aitransparency#1.0.0-ballot.
  • Agent Tool Specifications: Early-stage work on defining standardized interfaces for AI agents that interact with clinical systems. Think of it as CDS Hooks for the agentic era.

The standards track also includes extensions to the FHIR Provenance resource for AI lineage tracking and model card profiles that capture algorithm metadata in a FHIR-native format.

2. Global Leadership and Partnerships

HL7 is convening cross-SDO alignment with IHE, DICOM (for AI in imaging), and national health IT bodies. The goal: prevent fragmentation where every country builds its own AI transparency standard. The EU AI Act, US ONC rules, and WHO guidelines should all reference the same underlying FHIR profiles.

3. AI Innovation Lab

This is the proving ground. The Innovation Lab runs connectathons where implementers test AI specifications against real systems. The January 2026 FHIR Connectathon included the first-ever AI Transparency track, where teams validated the IG against production-like scenarios.

4. Community Excellence

Implementation guides, best practice libraries, and developer training. This track ensures that when the standards are published, the community has the tools to adopt them without a two-year learning curve.

AI Transparency on FHIR: What the Specification Actually Says

[Figure: Four levels of AI observability in the AI Transparency on FHIR specification]

The AI Transparency on FHIR IG is the first concrete deliverable. It establishes four levels of observability for AI-influenced health data:

Level 1: Tagging

The minimum requirement. A binary flag on any FHIR resource that has been created or modified by AI. Implemented via Resource.meta.tag with a standardized code from the IG's value set. This answers the question: Was AI involved in producing this data?

{
  "resourceType": "Observation",
  "meta": {
    "tag": [
      {
        "system": "http://hl7.org/fhir/uv/aitransparency/CodeSystem/ai-involvement",
        "code": "ai-generated",
        "display": "AI Generated"
      }
    ]
  },
  "status": "preliminary",
  "code": {
    "coding": [{ "system": "http://loinc.org", "code": "59408-5", "display": "Oxygen saturation" }]
  },
  "valueQuantity": { "value": 94, "unit": "%", "system": "http://unitsofmeasure.org", "code": "%" }
}
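A small helper can make Level 1 tagging idempotent, so repeated writes never stack duplicate tags. This is a minimal sketch that treats FHIR resources as plain Python dicts; the tag system and code are copied from the example above and should be checked against the published IG value set, and the function names are hypothetical.

```python
import copy

# Tag values taken from the Level 1 example; verify against the published IG.
AI_TAG_SYSTEM = "http://hl7.org/fhir/uv/aitransparency/CodeSystem/ai-involvement"
AI_TAG_CODE = "ai-generated"


def is_ai_generated(resource: dict) -> bool:
    """True if the resource carries the AI involvement tag in meta.tag."""
    return any(
        t.get("system") == AI_TAG_SYSTEM and t.get("code") == AI_TAG_CODE
        for t in resource.get("meta", {}).get("tag", [])
    )


def tag_as_ai_generated(resource: dict) -> dict:
    """Return a copy of the resource with the AI tag added exactly once."""
    tagged = copy.deepcopy(resource)  # never mutate the caller's resource
    if not is_ai_generated(tagged):
        tags = tagged.setdefault("meta", {}).setdefault("tag", [])
        tags.append({"system": AI_TAG_SYSTEM, "code": AI_TAG_CODE,
                     "display": "AI Generated"})
    return tagged


obs = {"resourceType": "Observation", "status": "preliminary"}
tagged = tag_as_ai_generated(tag_as_ai_generated(obs))  # applied twice on purpose
print(len(tagged["meta"]["tag"]), is_ai_generated(tagged), is_ai_generated(obs))
# 1 True False
```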

Level 2: Provenance

Which AI model produced or influenced the data? What version? What was the confidence score? This uses the FHIR Provenance resource with extensions for model identification.

{
  "resourceType": "Provenance",
  "target": [{ "reference": "Observation/ai-spo2-reading" }],
  "recorded": "2026-03-15T14:30:00Z",
  "agent": [
    {
      "type": {
        "coding": [{
          "system": "http://hl7.org/fhir/uv/aitransparency/CodeSystem/agent-type",
          "code": "ai-algorithm"
        }]
      },
      "who": { "display": "Sepsis Risk Model v3.2" }
    }
  ],
  "extension": [
    {
      "url": "http://hl7.org/fhir/uv/aitransparency/StructureDefinition/ai-confidence",
      "valueDecimal": 0.94
    },
    {
      "url": "http://hl7.org/fhir/uv/aitransparency/StructureDefinition/ai-algorithm-type",
      "valueCode": "non-deterministic"
    }
  ]
}
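Systems that consume Level 2 data will need to read those extensions back. The sketch below pulls the model, confidence, and algorithm type out of a Provenance dict; the extension URLs and agent-type code are the ones used in the example above, and the helper names are hypothetical.

```python
CONFIDENCE_URL = "http://hl7.org/fhir/uv/aitransparency/StructureDefinition/ai-confidence"
ALGO_TYPE_URL = "http://hl7.org/fhir/uv/aitransparency/StructureDefinition/ai-algorithm-type"
AGENT_TYPE_SYSTEM = "http://hl7.org/fhir/uv/aitransparency/CodeSystem/agent-type"


def get_extension_value(resource, url, value_key):
    """Return the value of the first extension with this URL, or None."""
    for ext in resource.get("extension", []):
        if ext.get("url") == url:
            return ext.get(value_key)
    return None


def find_ai_agent(provenance):
    """Return the first agent whose type is the 'ai-algorithm' code, or None."""
    for agent in provenance.get("agent", []):
        for coding in agent.get("type", {}).get("coding", []):
            if coding.get("system") == AGENT_TYPE_SYSTEM and coding.get("code") == "ai-algorithm":
                return agent
    return None


def summarize_ai_provenance(provenance):
    """Flatten the AI-relevant parts of a Provenance into a small dict."""
    agent = find_ai_agent(provenance)
    return {
        "targets": [t.get("reference") for t in provenance.get("target", [])],
        "model": agent.get("who", {}).get("display") if agent else None,
        "confidence": get_extension_value(provenance, CONFIDENCE_URL, "valueDecimal"),
        "algorithm_type": get_extension_value(provenance, ALGO_TYPE_URL, "valueCode"),
    }


example = {
    "resourceType": "Provenance",
    "target": [{"reference": "Observation/ai-spo2-reading"}],
    "agent": [{"type": {"coding": [{"system": AGENT_TYPE_SYSTEM, "code": "ai-algorithm"}]},
               "who": {"display": "Sepsis Risk Model v3.2"}}],
    "extension": [
        {"url": CONFIDENCE_URL, "valueDecimal": 0.94},
        {"url": ALGO_TYPE_URL, "valueCode": "non-deterministic"},
    ],
}
print(summarize_ai_provenance(example))
```

When parsing server responses, `json.loads(text, parse_float=decimal.Decimal)` keeps FHIR `decimal` values such as 0.94 from being converted to binary floats.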

Level 3: Reasoning

The chain of thought. What evidence did the AI consider? What was the decision path? This is critical for clinical decision support, where a physician needs to evaluate why the AI reached a conclusion. The IG provides guidance on representing reasoning chains through linked Provenance resources and the Provenance.entity element.
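The IG's exact modeling for reasoning chains is not reproduced here, so this is a sketch using only core FHIR R4 Provenance elements: each piece of evidence becomes a Provenance.entity with the core R4 role "source", and an optional free-text rationale goes in Provenance.reason. The function name, model name, and evidence references are hypothetical.

```python
from datetime import datetime, timezone


def build_reasoning_provenance(target_ref, model_display, evidence_refs, rationale=None):
    """Provenance recording which resources an AI consulted to produce target_ref."""
    prov = {
        "resourceType": "Provenance",
        "target": [{"reference": target_ref}],
        "recorded": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "agent": [{
            "type": {"coding": [{
                "system": "http://hl7.org/fhir/uv/aitransparency/CodeSystem/agent-type",
                "code": "ai-algorithm",
            }]},
            "who": {"display": model_display},
        }],
        # One entity per evidence resource, using the core R4 entity role "source".
        "entity": [{"role": "source", "what": {"reference": ref}} for ref in evidence_refs],
    }
    if rationale:
        # Provenance.reason is a list of CodeableConcept in R4; text-only is allowed.
        prov["reason"] = [{"text": rationale}]
    return prov


prov = build_reasoning_provenance(
    "RiskAssessment/sepsis-risk-123",
    "Sepsis Risk Model v3.2",
    ["Observation/lactate-1", "Observation/heart-rate-1", "Observation/temp-1"],
    rationale="Elevated lactate with tachycardia",
)
print([e["what"]["reference"] for e in prov["entity"]])
```

Chaining works the same way: a later step's Provenance can list an earlier step's output as an entity, so a reviewer can walk back from the final recommendation to the raw inputs.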

Level 4: Full Audit

Complete input/output capture, model card documentation, and validation results. This level is aimed at high-risk clinical applications, including software regulated by the FDA as Software as a Medical Device (SaMD). The IG defines profiles for model cards that capture training data characteristics, performance metrics, known limitations, and intended use populations.

Algorithm Classification

The specification distinguishes three algorithm types, each with different transparency requirements:

Algorithm Type | Description | Minimum Observability | Example
--- | --- | --- | ---
Deterministic | Same input always produces the same output | Level 1 (Tagging) | Rule-based sepsis screening
Non-deterministic | Output may vary; includes LLMs and neural networks | Level 2 (Provenance) | GPT-based clinical note generation
Hybrid | Deterministic logic with non-deterministic components | Level 2 (Provenance) | Rules engine + ML risk scorer
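The classification table translates directly into a lookup that a deployment pipeline could enforce. A minimal sketch: the enum and function names are hypothetical, and the minimum levels are copied from the table above.

```python
from enum import IntEnum


class ObservabilityLevel(IntEnum):
    TAGGING = 1
    PROVENANCE = 2
    REASONING = 3
    FULL_AUDIT = 4


# Minimum levels per algorithm type, taken from the classification table.
MINIMUM_LEVEL = {
    "deterministic": ObservabilityLevel.TAGGING,
    "non-deterministic": ObservabilityLevel.PROVENANCE,
    "hybrid": ObservabilityLevel.PROVENANCE,
}


def meets_minimum(algorithm_type: str, implemented: ObservabilityLevel) -> bool:
    """True if the implemented level satisfies the minimum for this algorithm type."""
    try:
        required = MINIMUM_LEVEL[algorithm_type]
    except KeyError:
        raise ValueError(f"unknown algorithm type: {algorithm_type!r}") from None
    return implemented >= required


print(meets_minimum("deterministic", ObservabilityLevel.TAGGING))  # True
print(meets_minimum("hybrid", ObservabilityLevel.TAGGING))         # False
```

Using an IntEnum keeps the levels ordered, so "Level 3 also satisfies a Level 2 minimum" falls out of a plain comparison.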

The HL7 Global AI Challenge: What Won and Why It Matters

[Figure: HL7 Global AI Challenge 2025 winners across six award categories]

In parallel with the AI Office launch, HL7 ran its first-ever Global AI Challenge. Thirty entries from every inhabited continent. Winners announced at the 39th Annual Plenary in Pittsburgh, September 2025. The results signal where the standards community sees the highest-value applications:

  • Clinical Data Quality: Health Samurai's Aidbox Forms — an AI assistant for FHIR SDC (Structured Data Capture) and analytics. This directly addresses the data quality problem that undermines every downstream AI model.
  • AI Transparency and Trust: Trisotech — demonstrating that transparency tooling is a first-class category, not an afterthought.
  • Interoperability Leadership: Whitefox FHIR Converter — automating the translation layer between legacy formats and FHIR, which remains the largest bottleneck for AI adoption in hospitals running HL7v2 interfaces.
  • Pioneer Innovation: Ignyte Group and Appian, for agentic AI that automates clinical workflows. The win validates agents built on FHIR as the emerging architecture pattern.

The pattern across winners: standards-based AI that is transparent, interoperable, and integrated into existing clinical workflows. None of the winners were standalone AI models. Every one was an AI system built on open health data standards.

Agent Tool Specifications: CDS Hooks for the Agentic Era

The least-discussed but potentially most impactful work: HL7 is developing standardized interfaces for AI agents that interact with clinical systems. Today, every agent framework defines its own tool contracts. One agent uses custom REST endpoints. Another uses MCP (Model Context Protocol). A third wraps FHIR operations in proprietary tool definitions.

HL7's agent tool specifications aim to standardize:

  • Tool discovery: How an agent discovers what actions it can perform against a clinical system (analogous to FHIR CapabilityStatement for agents)
  • Input/output contracts: Standardized schemas for tool inputs and outputs, built on FHIR resource types
  • Authorization scoping: How SMART on FHIR scopes map to agent tool permissions
  • Audit requirements: What must be logged when an agent invokes a tool, linked to the AI Transparency IG
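Because the agent tool specifications are still early-stage, there is no published schema to code against. The sketch below is entirely hypothetical: it shows one way a tool descriptor could combine the four concerns listed above (discovery, FHIR-typed inputs and outputs, SMART scopes, and an audit flag), with discovery filtering tools by the scopes an agent holds. Every class, field, and tool name is illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentTool:
    """Hypothetical tool descriptor; field names are not from any HL7 spec."""
    name: str
    description: str
    input_type: str         # FHIR resource type the tool reads
    output_type: str        # FHIR resource type the tool produces
    required_scopes: tuple  # SMART scopes the calling agent must hold
    audit_required: bool = True


REGISTRY = [
    AgentTool("summarize-encounter", "Draft an encounter summary",
              "Encounter", "DocumentReference",
              ("patient/Encounter.rs", "patient/DocumentReference.c")),
    AgentTool("score-sepsis-risk", "Estimate sepsis risk from vitals",
              "Observation", "RiskAssessment",
              ("patient/Observation.rs", "patient/RiskAssessment.c")),
]


def discover_tools(granted_scopes):
    """Discovery: list only tools whose required scopes are all granted."""
    granted = set(granted_scopes)
    return [t.name for t in REGISTRY if set(t.required_scopes) <= granted]


print(discover_tools(["patient/Observation.rs", "patient/RiskAssessment.c"]))
# ['score-sepsis-risk']
```

Scope matching here is exact string comparison, which is a simplification of real SMART scope semantics.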

This is early-stage work; expect a first trial-use draft (what HL7 now calls an STU, formerly a DSTU) in late 2026 or early 2027. But the direction is clear: the same organization that standardized CDS Hooks for point-of-care decision support is building the equivalent for autonomous clinical agents.

The Standards Timeline: What Ships When

[Figure: Healthcare AI standards timeline from FHIR R4 through 2027 normative AI standards]

Here is the current timeline based on HL7 publications and working group schedules:

Date | Milestone | Impact
--- | --- | ---
July 2025 | AI Office launched | Organizational commitment; chief AI officer appointed
September 2025 | AI Challenge winners announced | Community validation of the standards-based AI approach
January 2026 | AI Transparency on FHIR: first ballot | Specification available for implementer testing
January 2026 | First AI Transparency Connectathon track | Validation against production-like scenarios
Mid-2026 | Ballot reconciliation and STU publication | Standard for Trial Use; production implementation begins
Late 2026 | Agent Tool Specifications: initial draft | Standardized agent-to-EHR interfaces
2027+ | Normative status (if adoption warrants) | Candidate for citation in certified EHR technology criteria

The window between now and mid-2026 is the preparation period. Organizations that implement early will shape the final standard through ballot comments and connectathon feedback.

How to Prepare Your Systems: A Developer Checklist

[Figure: Developer preparation checklist for HL7 AI Transparency standards]

Whether you are building AI agents, maintaining an EHR, or integrating clinical decision support, here are the concrete steps to take now:

1. Instrument AI Provenance on Every Write

Every time your AI creates or modifies a FHIR resource, record that fact. Start with Level 1 tagging (the meta.tag approach), then attach a Level 2 Provenance resource as the IG stabilizes. Both are backward-compatible: FHIR consumers that ignore the tag or never query Provenance keep working unchanged.

# Python: Adding AI provenance to a FHIR resource write
import json
from datetime import datetime, timezone

def create_ai_provenance(target_ref, model_name, version, confidence):
    """Build a Provenance resource (as a dict) for an AI-created FHIR resource."""
    return {
        "resourceType": "Provenance",
        "target": [{"reference": target_ref}],
        # FHIR instant in UTC; avoids datetime.utcnow(), deprecated since Python 3.12
        "recorded": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "activity": {
            "coding": [{
                "system": "http://terminology.hl7.org/CodeSystem/v3-DataOperation",
                "code": "CREATE",
                "display": "Create"
            }]
        },
        # The model is recorded as an agent typed with the IG's "ai-algorithm" code
        "agent": [{
            "type": {
                "coding": [{
                    "system": "http://hl7.org/fhir/uv/aitransparency/CodeSystem/agent-type",
                    "code": "ai-algorithm"
                }]
            },
            "who": {"display": f"{model_name} v{version}"}
        }],
        # Level 2 metadata: the model's confidence for this output
        "extension": [{
            "url": "http://hl7.org/fhir/uv/aitransparency/StructureDefinition/ai-confidence",
            "valueDecimal": confidence
        }]
    }

# Usage
provenance = create_ai_provenance(
    "Observation/sepsis-risk-123", "SepsisRiskModel", "3.2", 0.94
)
# POST to /fhir/Provenance alongside the Observation
print(json.dumps(provenance, indent=2))
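Rather than two independent POSTs, one way to write the Observation and its Provenance together is a FHIR transaction Bundle: the server commits both or neither, and resolves the urn:uuid placeholder in Provenance.target to the Observation's newly assigned id. This sketch assumes your FHIR server supports transactions; the function name and the minimal resources are illustrative.

```python
import json
import uuid


def build_ai_write_bundle(observation: dict, provenance: dict) -> dict:
    """Transaction Bundle that creates an Observation and its Provenance atomically."""
    obs_full_url = f"urn:uuid:{uuid.uuid4()}"
    prov = dict(provenance)  # shallow copy; only "target" is replaced below
    # Point the Provenance at the Observation's temporary fullUrl; the server
    # rewrites this reference to the real Observation id when it commits.
    prov["target"] = [{"reference": obs_full_url}]
    return {
        "resourceType": "Bundle",
        "type": "transaction",
        "entry": [
            {"fullUrl": obs_full_url, "resource": observation,
             "request": {"method": "POST", "url": "Observation"}},
            {"fullUrl": f"urn:uuid:{uuid.uuid4()}", "resource": prov,
             "request": {"method": "POST", "url": "Provenance"}},
        ],
    }


observation = {"resourceType": "Observation", "status": "preliminary",
               "code": {"text": "Sepsis risk"}}
provenance = {"resourceType": "Provenance", "recorded": "2026-03-15T14:30:00Z",
              "agent": [{"who": {"display": "SepsisRiskModel v3.2"}}]}

bundle = build_ai_write_bundle(observation, provenance)
print(json.dumps(bundle, indent=2))  # POST this to the server's base URL
```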

2. Build Model Cards as FHIR Resources

Document every AI model your system uses in a structured format. The IG's model card profile captures: algorithm type (deterministic / non-deterministic / hybrid), training data characteristics, performance metrics (sensitivity, specificity, AUROC), known limitations, and intended use populations. Start building this documentation now — it will map directly to the IG profiles.
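Until the IG's model card profile is final, a plain data structure can hold the same categories of information and flag documentation gaps. This is a hypothetical sketch: the class and field names are illustrative, not the IG's element names.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class ModelCard:
    """Model card fields mirroring the categories listed above (names are illustrative)."""
    name: str
    version: str
    algorithm_type: str                           # deterministic | non-deterministic | hybrid
    training_data: str = ""
    metrics: dict = field(default_factory=dict)   # e.g. sensitivity, specificity, AUROC
    known_limitations: list = field(default_factory=list)
    intended_use_population: str = ""

    def missing_fields(self) -> list:
        """Names of documentation fields that are still empty."""
        return [k for k, v in asdict(self).items() if v in ("", [], {})]


card = ModelCard(
    name="SepsisRiskModel",
    version="3.2",
    algorithm_type="non-deterministic",
    training_data="Describe data sources, date ranges, and cohort here",
)
print(card.missing_fields())
# ['metrics', 'known_limitations', 'intended_use_population']
```

Running a check like this in CI keeps undocumented models from reaching production while the FHIR profile is still settling.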

3. Implement Audit Infrastructure

Every AI inference that touches patient data should generate an AuditEvent resource. This is not just good practice — it is the foundation for Level 4 compliance. Use the patterns from our HIPAA-compliant logging guide and extend them with AI-specific fields.
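A minimal FHIR R4 AuditEvent for one AI inference could look like the sketch below. It fills in the elements R4 requires (type, recorded, agent with requestor, source.observer); the type code system is a hypothetical placeholder, since no AI-specific audit code is named here, and AI-specific fields would be added as extensions. The function and display names are illustrative.

```python
from datetime import datetime, timezone


def build_ai_audit_event(model_display, patient_ref, system_display):
    """Minimal R4 AuditEvent recording one AI inference over a patient's data."""
    return {
        "resourceType": "AuditEvent",
        # Hypothetical local code system; substitute the code your program adopts.
        "type": {"system": "http://example.org/fhir/CodeSystem/ai-audit-event",
                 "code": "ai-inference"},
        "action": "E",         # E = execute (R4 AuditEventAction)
        "recorded": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "outcome": "0",        # 0 = success (R4 AuditEventOutcome)
        "agent": [{
            "who": {"display": model_display},
            "requestor": False,  # the model ran on request; it did not initiate
        }],
        "source": {"observer": {"display": system_display}},
        "entity": [{"what": {"reference": patient_ref}}],
    }


event = build_ai_audit_event("Sepsis Risk Model v3.2", "Patient/123", "Example CDS service")
print(event["action"], event["entity"][0]["what"]["reference"])
```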

4. Test Against the IG

The AI Transparency on FHIR IG is available at build.fhir.org. Validate your resources against its profiles using the FHIR Validator. Better yet, participate in the next HL7 Connectathon. Use Inferno for automated conformance testing once test suites are published.
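The FHIR Validator is distributed as a Java JAR (validator_cli.jar) that takes a resource file, a FHIR version, and IG packages to load. A small helper can assemble that command for a CI job; the JAR location is an assumption, and the default package ID is the ballot version quoted earlier in this article, so update it as new versions are published.

```python
import shlex


def validator_command(resource_path, jar="validator_cli.jar",
                      ig="hl7.fhir.uv.aitransparency#1.0.0-ballot"):
    """Argument list for validating one resource file against FHIR 4.0.1 plus the IG."""
    return ["java", "-jar", jar, resource_path, "-version", "4.0.1", "-ig", ig]


cmd = validator_command("provenance.json")
print(shlex.join(cmd))  # shell-safe string, e.g. for a CI script
# To run it: subprocess.run(cmd, capture_output=True, text=True)
```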

5. Map Your SMART Scopes to Agent Permissions

If you are building FHIR-based agents, review your SMART App Launch v2 scope definitions. The agent tool specifications will likely build on SMART scopes for authorization. Agents that already use granular FHIR scopes will have less refactoring when the specs land.
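SMART App Launch v2 resource scopes take the form context/ResourceType.permissions, where the permissions are a subset of "cruds" (create, read, update, delete, search) in that order, for example patient/Observation.rs. The sketch below checks whether an agent's granted scopes cover one operation. It is a simplification: it deliberately rejects query-restricted scopes (?category=...) and SMART v1 .read/.write scopes rather than interpreting them, and the function names are hypothetical.

```python
import re

# <context>/<ResourceType or *>.<ordered subset of c,r,u,d,s>[?query]
SCOPE_RE = re.compile(r"^(patient|user|system)/([A-Za-z]+|\*)\.(c?r?u?d?s?)(\?.*)?$")


def scope_allows(granted_scope, context, resource_type, permission):
    """True if one SMART v2 scope grants `permission` (c/r/u/d/s) on resource_type."""
    m = SCOPE_RE.match(granted_scope)
    if not m or not m.group(3):
        return False  # malformed, v1-style, or empty permission set
    ctx, rtype, perms, query = m.groups()
    if query:
        return False  # query-restricted scopes need per-request checks; deny here
    return ctx == context and rtype in (resource_type, "*") and permission in perms


def agent_may(granted_scopes, context, resource_type, permission):
    """True if any granted scope allows the operation."""
    return any(scope_allows(s, context, resource_type, permission) for s in granted_scopes)


print(agent_may(["patient/Observation.rs"], "patient", "Observation", "r"))  # True
print(agent_may(["patient/Observation.rs"], "patient", "Observation", "c"))  # False
```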

6. Monitor for Model Drift

The AI Transparency IG implies a need for ongoing monitoring: if your Provenance records report a confidence of 0.94 but the model has drifted and its real-world accuracy has fallen to 0.71, the provenance data is misleading. Implement the drift detection patterns we covered previously, and update provenance metadata when model performance changes.
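One simple signal for this problem is a calibration gap: compare the average confidence your Provenance records have been reporting with the accuracy you actually observe once outcomes are known. This is a sketch, not a full drift detector; it assumes you can label recent predictions as correct or incorrect, the 0.05 tolerance is an arbitrary placeholder, and the sample window is illustrative.

```python
from statistics import mean


def calibration_gap(window):
    """window: list of (reported_confidence, was_correct) pairs from recent cases.

    Returns mean reported confidence minus observed accuracy. A large positive
    value means the confidence written into Provenance overstates performance.
    """
    if not window:
        raise ValueError("empty window")
    accuracy = sum(1 for _, correct in window if correct) / len(window)
    return mean(conf for conf, _ in window) - accuracy


def needs_provenance_review(window, tolerance=0.05):
    """Flag the model when reported confidence exceeds accuracy by more than tolerance."""
    return calibration_gap(window) > tolerance


window = [(0.94, True), (0.94, False), (0.94, True), (0.94, False)]
print(round(calibration_gap(window), 2))  # 0.44
print(needs_provenance_review(window))    # True
```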

What This Means for EHR Vendors

For Epic, Oracle Health, athenahealth, and other major EHR platforms, the AI Transparency IG is a strong candidate to become a certification requirement. ONC's HTI-1 rule already references algorithm transparency. If the IG reaches normative status, expect it to be considered for citation in the certification criteria.

The likely practical impact:

  • Resource tagging: EHRs must support meta.tag values from the AI Transparency value set on stored resources
  • Provenance storage: FHIR servers must accept and return Provenance resources linked to AI-generated content
  • API surface: FHIR search must support filtering by AI involvement tags
  • Display requirements: Clinical UIs must visually distinguish AI-generated content from human-authored content
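FHIR's standard _tag search parameter already supports filtering by tag using system|code token syntax, so the API-surface item above can build on existing search. A sketch of constructing such a query; the base URL is a placeholder, the tag value comes from the Level 1 example earlier in this article, and the function name is hypothetical.

```python
from urllib.parse import urlencode

AI_TAG = "http://hl7.org/fhir/uv/aitransparency/CodeSystem/ai-involvement|ai-generated"


def ai_generated_search_url(base_url, resource_type, **params):
    """FHIR search URL returning only resources carrying the AI involvement tag."""
    query = {"_tag": AI_TAG, **params}
    return f"{base_url.rstrip('/')}/{resource_type}?{urlencode(query)}"


print(ai_generated_search_url("https://fhir.example.org/r4", "Observation", patient="123"))
```

urlencode percent-encodes the ":", "/", and "|" characters in the token, which servers decode before matching.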

Vendors that build this infrastructure now will be ahead of the certification timeline. Those that wait will face compressed implementation schedules when the rule drops.

What This Means for Agent Builders

If you are building AI agents that interact with clinical data — whether through CDS Hooks, direct FHIR API access, or event-driven architectures — the agent tool specifications will define how your agents are discovered, authorized, and audited.

Start building with these patterns now:

  • Declare capabilities: Every agent should expose a machine-readable description of what it does, what data it needs, and what actions it can take.
  • Use SMART scopes: Authorize agents via SMART App Launch v2 with the most granular scopes possible.
  • Log everything: Every agent action, every data access, every clinical recommendation. Build the logging infrastructure now.
  • Tag your outputs: Every resource your agent creates or modifies gets the AI Transparency tag. Non-negotiable.
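Three of these patterns (scope checks, logging, and output tagging) can live in one wrapper around every agent action. A hypothetical sketch: the function name and logger name are illustrative, the tag values come from the Level 1 example, and the scope check is an exact string match, which is simpler than full SMART scope semantics.

```python
import logging
from datetime import datetime, timezone

log = logging.getLogger("agent-audit")

AI_TAG = {"system": "http://hl7.org/fhir/uv/aitransparency/CodeSystem/ai-involvement",
          "code": "ai-generated"}


def run_agent_action(action_name, required_scope, granted_scopes, produce):
    """Enforce a scope, run produce(), tag the FHIR output once, and log the action."""
    # Use SMART scopes: refuse before touching any data if the scope is missing.
    if required_scope not in granted_scopes:
        log.warning("denied %s: missing scope %s", action_name, required_scope)
        raise PermissionError(f"{action_name} requires {required_scope}")
    # Run the agent step that creates or modifies a FHIR resource (a dict).
    resource = produce()
    # Tag your outputs: add the AI involvement tag if it is not already present.
    tags = resource.setdefault("meta", {}).setdefault("tag", [])
    if AI_TAG not in tags:
        tags.append(dict(AI_TAG))
    # Log everything: what ran, when, and what it produced.
    log.info("%s action=%s created=%s", datetime.now(timezone.utc).isoformat(),
             action_name, resource.get("resourceType"))
    return resource


logging.basicConfig(level=logging.INFO)
note = run_agent_action(
    "draft-note", "patient/DocumentReference.c", {"patient/DocumentReference.c"},
    lambda: {"resourceType": "DocumentReference", "status": "current"},
)
print(note["meta"]["tag"])
```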

The Bigger Picture: Standards as Competitive Advantage

The organizations that shaped early FHIR adoption — through connectathon participation, early implementation, and ballot feedback — are the ones that dominate health IT interoperability today. The same dynamic is playing out with AI standards.

Participating in the AI Transparency on FHIR ballot process, testing at connectathons, and implementing the draft specifications gives you three advantages:

  1. Influence: Ballot comments directly shape the final specification. Your production edge cases become the spec's test scenarios.
  2. Speed: When standards become mandatory, you are already compliant. No scramble, no retrofit.
  3. Trust: Healthcare buyers increasingly require evidence of standards compliance. Early adoption signals maturity and reliability — the qualities that win RFPs and pass security reviews.

The window for early-mover advantage is roughly 15 to 18 months: from early 2026 through mid-2027, the earliest point at which normative status is plausible. After that, compliance becomes table stakes.

Getting Started with Nirmitee

At Nirmitee, we build healthcare AI systems that are standards-compliant from day one. Our teams implement FHIR-native agent architectures with full AI provenance tracking, SMART-scoped authorization, and audit infrastructure that satisfies both the emerging HL7 standards and current HIPAA requirements.

Whether you need to instrument AI transparency in an existing system or design a greenfield agent architecture aligned with the HL7 AI Office specifications, our healthcare engineering team has the deep standards expertise to get it right.

Talk to our healthcare AI team about building standards-ready clinical AI systems.

Frequently Asked Questions

What is the HL7 AI Office?

The HL7 AI Office is a dedicated organizational unit launched in July 2025 to develop global standards for healthcare AI. Led by Daniel Vreeman as HL7's first chief AI officer, it operates through four workstreams: standards development (including the AI Transparency on FHIR IG), global partnerships, an AI Innovation Lab for connectathon testing, and community excellence for implementation guidance.

What does the AI Transparency on FHIR specification require?

The AI Transparency on FHIR IG defines four levels of observability for AI-influenced health data: Level 1 (Tagging) marks resources as AI-generated, Level 2 (Provenance) documents which model produced the data and its confidence score, Level 3 (Reasoning) captures the decision chain, and Level 4 (Full Audit) requires complete input/output capture and model documentation.

When will HL7 AI standards become mandatory?

The AI Transparency on FHIR IG entered its first Draft Standard ballot in January 2026, with STU publication expected mid-2026. Normative status is projected for 2027 or later. The preparation window is approximately 18 months from early 2026.

How should healthcare developers prepare for HL7 AI standards?

Start by adding AI provenance tags to every FHIR resource your AI creates or modifies. Build Provenance resources that document model identity, version, and confidence scores. Implement HIPAA-compliant audit logging for all AI inferences. Use granular SMART on FHIR scopes for agent authorization. Validate against the draft IG profiles at build.fhir.org.

What are HL7 Agent Tool Specifications?

Agent tool specifications are an early-stage HL7 initiative to standardize how AI agents interact with clinical systems. They aim to define tool discovery, input/output contracts based on FHIR resource types, authorization scoping via SMART on FHIR, and audit requirements. A first trial-use (STU) draft is expected in late 2026 or early 2027.
