Prior authorization costs the U.S. healthcare system $31 billion annually. Physicians submit nearly 40 prior authorization requests per week, with each one consuming an average of 14 minutes of staff time. In 2024, Medicare Advantage plans alone processed 53 million prior authorization determinations, denying 7.7% of them — yet 81.7% of appealed denials were overturned, strong evidence that most denials are administrative failures, not clinical ones.
The tools to solve this exist today. FHIR R4 provides standardized healthcare data access. The Da Vinci PAS Implementation Guide defines the API for electronic prior authorization submission. And large language models like Claude and GPT-4 can extract clinical evidence from physician notes with 86%+ accuracy.
This guide walks you through the complete architecture for building an AI prior authorization agent — from EHR data extraction to automated appeal generation. Unlike product-focused guides that tell you what to buy, this is the technical blueprint for teams that want to build.
If you are new to healthcare AI compliance, start with our HIPAA-Compliant AI Agents Architecture Guide for foundational patterns that apply to every component described below.
System Architecture Overview: The 6-Component Pipeline
A production AI prior authorization agent is not a single model — it is an orchestrated pipeline of six specialized components, each handling a distinct phase of the authorization lifecycle. This separation of concerns is critical for reliability, auditability, and HIPAA compliance.
The pipeline processes a prior authorization request through these stages:
- EHR Data Extraction — Pull structured clinical data via FHIR R4 APIs
- Policy Engine — Match procedures against payer-specific coverage policies
- Clinical NLP — Extract medical necessity evidence from unstructured physician notes
- Packet Assembly — Compile submission forms with attached clinical evidence
- Submission & Tracking — Submit via Da Vinci PAS and monitor status
- Appeal Agent — Analyze denials and generate evidence-backed appeals
A HIPAA compliance layer spans all six components, enforcing encryption, audit logging, access controls, and PHI de-identification at every boundary. Every LLM call, every data access, every decision point is logged to an immutable audit trail.
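The pipeline and its audit requirement can be sketched as a thin orchestrator that writes a hash-chained audit entry at every component boundary. This is a minimal illustration, not the production design: the stage names, `AuditTrail` class, and entry fields are all assumptions.

```python
import hashlib
import json
import time


class AuditTrail:
    """Append-only audit log; each entry chains the hash of the previous
    entry so any tampering is detectable (a common immutability pattern)."""

    def __init__(self):
        self.entries = []
        self._prev_hash = "0" * 64

    def record(self, component: str, action: str, detail: dict):
        entry = {
            "ts": time.time(),
            "component": component,
            "action": action,
            "detail": detail,
            "prev_hash": self._prev_hash,
        }
        # Hash the entry contents (including the previous hash) to form the chain
        self._prev_hash = hashlib.sha256(
            json.dumps(entry, sort_keys=True).encode()
        ).hexdigest()
        entry["hash"] = self._prev_hash
        self.entries.append(entry)


def run_pipeline(request: dict, stages: list, audit: AuditTrail) -> dict:
    """Run the staged pipeline, auditing every boundary crossing."""
    data = request
    for name, stage in stages:
        audit.record(name, "start", {"keys": sorted(data)})
        data = stage(data)
        audit.record(name, "complete", {"keys": sorted(data)})
    return data
```

In a real deployment each stage would be one of the six components below, and the trail would be persisted to write-once storage rather than held in memory.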
Let us build each component.
Component 1: EHR Data Extraction via FHIR R4
The first component connects to the EHR system and extracts the structured clinical data needed for a prior authorization request. FHIR R4 is the standard — it provides consistent APIs across Epic, Cerner, Athenahealth, and other EHR vendors that have adopted the US Core Implementation Guide.
Core FHIR Resources for Prior Authorization
Four FHIR resources form the foundation of every prior authorization request:
Patient Resource — Demographics, identifiers, insurance information. The patient resource links to Coverage for payer details.
GET /fhir/Patient/patient-12345
{
"resourceType": "Patient",
"id": "patient-12345",
"identifier": [{
"system": "http://hospital.org/mrn",
"value": "MRN-98765"
}],
"name": [{"family": "Johnson", "given": ["Maria"]}],
"birthDate": "1958-03-14",
"gender": "female",
"address": [{"state": "CA", "postalCode": "90210"}]
}
Condition Resource — ICD-10 diagnosis codes, clinical status, onset date, and supporting evidence. This is the clinical justification anchor.
GET /fhir/Condition?patient=patient-12345&clinical-status=active
{
"resourceType": "Condition",
"id": "condition-lumbar-stenosis",
"clinicalStatus": {
"coding": [{"code": "active"}]
},
"code": {
"coding": [{
"system": "http://hl7.org/fhir/sid/icd-10-cm",
"code": "M48.06",
"display": "Spinal stenosis, lumbar region"
}]
},
"onsetDateTime": "2025-08-15",
"subject": {"reference": "Patient/patient-12345"}
}
Procedure Resource — CPT codes for the requested procedure, linked to the justifying condition.
// Requested procedure for prior authorization
{
"resourceType": "Procedure",
"code": {
"coding": [{
"system": "http://www.ama-assn.org/go/cpt",
"code": "63047",
"display": "Laminectomy, lumbar"
}]
},
"status": "preparation",
"subject": {"reference": "Patient/patient-12345"},
"reasonReference": [{"reference": "Condition/condition-lumbar-stenosis"}]
}
MedicationRequest Resource — For drug prior authorizations, includes medication codes, dosage, and prescriber.
GET /fhir/MedicationRequest?patient=patient-12345&status=active
{
"resourceType": "MedicationRequest",
"medicationCodeableConcept": {
"coding": [{
"system": "http://www.nlm.nih.gov/research/umls/rxnorm",
"code": "1049502",
"display": "Humira 40 MG/0.8 ML Injectable Solution"
}]
},
"dosageInstruction": [{
"text": "40mg subcutaneous every 2 weeks"
}],
"requester": {"reference": "Practitioner/dr-chen"}
}
Building the Extraction Layer
The extraction layer should be implemented as a FHIR client service that authenticates via SMART on FHIR (OAuth 2.0), queries the relevant resources, and assembles them into a normalized data structure for downstream processing.
import asyncio

import httpx
from typing import Dict, List


class FHIRExtractor:
    """Extract clinical data from EHR via FHIR R4 API."""

    def __init__(self, base_url: str, access_token: str):
        self.base_url = base_url.rstrip('/')
        self.headers = {
            "Authorization": f"Bearer {access_token}",
            "Accept": "application/fhir+json"
        }

    async def extract_prior_auth_data(
        self, patient_id: str
    ) -> Dict:
        """Extract all data needed for prior auth request."""
        async with httpx.AsyncClient() as client:
            # Parallel fetch of all required resources
            (patient, conditions, procedures,
             medications, coverage) = await asyncio.gather(
                self._get(client, f"Patient/{patient_id}"),
                self._search(client, "Condition",
                             {"patient": patient_id,
                              "clinical-status": "active"}),
                self._search(client, "Procedure",
                             {"patient": patient_id,
                              "status": "preparation"}),
                self._search(client, "MedicationRequest",
                             {"patient": patient_id,
                              "status": "active"}),
                self._search(client, "Coverage",
                             {"patient": patient_id,
                              "status": "active"}),
            )
        return {
            "patient": patient,
            "conditions": conditions,
            "procedures": procedures,
            "medications": medications,
            "coverage": coverage
        }

    async def _get(self, client, path: str) -> Dict:
        resp = await client.get(
            f"{self.base_url}/{path}",
            headers=self.headers
        )
        resp.raise_for_status()
        return resp.json()

    async def _search(
        self, client, resource: str, params: Dict
    ) -> List[Dict]:
        resp = await client.get(
            f"{self.base_url}/{resource}",
            params=params,
            headers=self.headers
        )
        resp.raise_for_status()
        bundle = resp.json()
        return [
            e["resource"] for e in bundle.get("entry", [])
        ]

Key design decisions: use async HTTP for parallel resource fetching, implement retry logic with exponential backoff for EHR API rate limits, and cache frequently accessed reference data (Practitioner, Organization) to reduce API calls.
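The exponential backoff pattern can be sketched with the standard library alone. The `TransientError` wrapper and the specific delay parameters here are illustrative assumptions, not part of the extractor above:

```python
import asyncio
import random


class TransientError(Exception):
    """Raised for retryable failures (e.g. HTTP 429/5xx from the EHR API)."""


async def with_backoff(fn, retries: int = 4, base_delay: float = 0.5):
    """Retry an async call with exponential backoff plus jitter."""
    for attempt in range(retries + 1):
        try:
            return await fn()
        except TransientError:
            if attempt == retries:
                raise
            # 0.5s, 1s, 2s, 4s ... plus up to 100ms of jitter
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            await asyncio.sleep(delay)
```

In practice the wrapped callable would translate `httpx.HTTPStatusError` for 429/5xx responses into `TransientError` and let 4xx client errors propagate immediately.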
Component 2: Policy Engine — Payer Rules and Medical Necessity
The policy engine determines whether a requested procedure requires prior authorization and, if so, what clinical evidence is needed to justify it. This component replaces the manual process of looking up payer-specific requirements in provider portals and PDF policy documents.
Payer Policy Database Architecture
Payer policies come in three tiers, each with different update frequencies and coverage scopes:
- National Coverage Determinations (NCDs) — CMS-level policies that apply to all Medicare plans. Approximately 300 active NCDs cover common procedures and services. Updated by CMS with public comment periods, typically 2-4 changes per quarter.
- Local Coverage Determinations (LCDs) — Regional Medicare Administrative Contractor (MAC) policies. Over 5,000 active LCDs with jurisdiction-specific rules. A procedure may require prior authorization in one MAC jurisdiction but not another.
- Commercial payer policies — Each commercial insurer (UnitedHealthcare, Anthem, Aetna, Cigna, Humana) maintains proprietary clinical guidelines that often differ significantly from CMS rules. UnitedHealthcare alone publishes over 800 clinical policies covering medical, behavioral health, and pharmacy services.
Store these policies in a vector database for semantic retrieval. When a prior auth request comes in, embed the procedure code combined with diagnosis and retrieve the most relevant policy documents using cosine similarity search.
import os

from pinecone import Pinecone
from openai import OpenAI


class PolicyEngine:
    """Match procedures against payer coverage policies."""

    def __init__(self):
        self.pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
        self.index = self.pc.Index("payer-policies")
        self.openai = OpenAI()

    def check_auth_required(
        self, procedure_code: str,
        payer_id: str, diagnosis_codes: list
    ) -> dict:
        """Determine if prior auth is required and
        what evidence is needed."""
        # Build semantic query
        query_text = (
            f"Prior authorization requirements for "
            f"CPT {procedure_code} with diagnosis "
            f"{', '.join(diagnosis_codes)} "
            f"payer {payer_id}"
        )
        # Embed and search policy database
        embedding = self.openai.embeddings.create(
            model="text-embedding-3-large",
            input=query_text
        ).data[0].embedding
        results = self.index.query(
            vector=embedding,
            top_k=10,
            filter={"payer_id": {"$eq": payer_id}},
            include_metadata=True
        )
        # Extract requirements from matching policies
        requirements = []
        for match in results.matches:
            if match.score > 0.82:  # High confidence
                requirements.append({
                    "policy_id": match.metadata["policy_id"],
                    "policy_name": match.metadata["name"],
                    "auth_required": match.metadata["auth_required"],
                    "evidence_needed": match.metadata["evidence_checklist"],
                    "confidence": match.score
                })
        return {
            "auth_required": any(
                r["auth_required"] for r in requirements
            ),
            "matching_policies": requirements,
            "evidence_checklist": self._merge_checklists(
                requirements
            )
        }

    def _merge_checklists(self, requirements: list) -> list:
        """Union of evidence items across matching policies,
        deduplicated while preserving order."""
        merged = []
        for req in requirements:
            for item in req["evidence_needed"]:
                if item not in merged:
                    merged.append(item)
        return merged

Medical Necessity Rules Engine
Beyond simple policy lookup, the engine must evaluate medical necessity criteria against established clinical guidelines. For example, a lumbar laminectomy (CPT 63047) typically requires all of the following to meet medical necessity:
- Documented failure of conservative treatment for 6+ weeks (physical therapy, epidural injections, NSAIDs)
- Imaging evidence (MRI showing stenosis with neural compression)
- Functional limitation documentation (inability to walk more than 200 feet, progressive neurological deficits, bladder/bowel dysfunction)
- Specialist evaluation confirming surgical indication
- BMI within acceptable range for surgical candidacy (some payers require BMI below 40)
These rules are encoded as structured checklists in the policy database. The clinical NLP component (next section) evaluates each checklist item against the patient's clinical record. The checklist structure uses a hierarchical format with AND/OR logic operators connecting criteria groups, mirroring how payer medical directors actually evaluate requests.
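Assuming the hierarchical AND/OR format described above, the laminectomy criteria might be encoded roughly like this; the item names and the `leaf_items` helper are illustrative, not the production schema:

```python
# Hypothetical encoding of the lumbar laminectomy (CPT 63047) checklist.
# Leaf nodes carry a single evidence item; internal nodes combine
# children with AND/OR operators.
LAMINECTOMY_CHECKLIST = {
    "op": "AND",
    "children": [
        {"item": "conservative_treatment_failure_6wk"},
        {"item": "imaging_confirms_stenosis"},
        {
            "op": "OR",  # any one functional limitation suffices
            "children": [
                {"item": "walking_limited_under_200ft"},
                {"item": "progressive_neuro_deficit"},
                {"item": "bladder_bowel_dysfunction"},
            ],
        },
        {"item": "specialist_surgical_evaluation"},
    ],
}


def leaf_items(node: dict) -> list:
    """Flatten the checklist tree to its leaf evidence items."""
    if "item" in node:
        return [node["item"]]
    return [i for child in node["children"] for i in leaf_items(child)]
```

Each leaf item becomes one extraction target for the clinical NLP component; the tree structure is what the payer-logic aggregation operates on.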
Data Pipeline for Policy Document Ingestion
The policy engine requires a continuous ingestion pipeline to keep payer policies current. Payer guidelines change quarterly, and stale policies lead to preventable denials.
- Source monitoring — Scrape CMS.gov for NCD/LCD updates (published quarterly), monitor payer provider portals for policy bulletins, subscribe to payer EDI newsletters for coverage changes.
- Document processing — Parse PDF policy documents into structured sections using document AI (Claude with PDF support or specialized document parsing). Extract coverage criteria, required documentation lists, and medical necessity checklists.
- Embedding and indexing — Chunk policy documents by section (coverage criteria, documentation requirements, exclusions), generate embeddings, and upsert to the vector database with metadata (payer ID, effective date, procedure codes, jurisdiction).
- Validation — After each ingestion cycle, run a test suite of known procedure-payer combinations to verify the policy engine returns correct requirements. Alert on any regressions.
A well-maintained policy database is the foundation of the entire system. Stale or incorrect policy data propagates errors through every downstream component. Budget 10-15% of ongoing engineering time for policy data maintenance and quality assurance.
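The chunking and indexing step of the ingestion pipeline can be sketched as follows. The section-splitting heuristic, field names, and ID scheme are assumptions for illustration; a real pipeline would parse the PDF structure rather than split on blank lines:

```python
def chunk_policy(doc_text: str, policy_meta: dict,
                 max_chars: int = 2000) -> list:
    """Split a parsed policy document into section-level chunks ready
    for embedding, carrying the metadata used for filtered retrieval."""
    sections = [s.strip() for s in doc_text.split("\n\n") if s.strip()]
    chunks = []
    for i, section in enumerate(sections):
        # Further split oversized sections so each chunk fits the embedder
        for j in range(0, len(section), max_chars):
            chunks.append({
                "id": f"{policy_meta['policy_id']}-s{i}-c{j // max_chars}",
                "text": section[j:j + max_chars],
                "metadata": {
                    "payer_id": policy_meta["payer_id"],
                    "effective_date": policy_meta["effective_date"],
                    "procedure_codes": policy_meta["procedure_codes"],
                },
            })
    return chunks
```

Each chunk dict maps directly onto a vector-database upsert (ID, text to embed, filterable metadata), which is what the validation test suite then queries against.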
Component 3: Clinical NLP — Extracting Evidence from Physician Notes
This is where LLMs provide the most significant value in the prior authorization pipeline. Physician notes contain the clinical justification for procedures, but the information is buried in unstructured free-text — progress notes, operative reports, consultation letters, and discharge summaries. The clinical NLP component extracts structured evidence from these documents using large language models.
The Extraction Pipeline
The NLP pipeline follows a three-stage process: extraction, structuring, and evaluation.
Stage 1: Clinical Entity Extraction — The LLM identifies diagnoses, procedures, medications, lab values, imaging findings, and treatment history from raw notes. This goes beyond simple named entity recognition — the model must understand temporal relationships (when was physical therapy attempted?), negation (no evidence of improvement), and clinical significance (stenosis causing radiculopathy vs. incidental finding).
Stage 2: Evidence Structuring — Extracted entities are mapped to standard code systems (ICD-10, CPT, LOINC, RxNorm) and organized by relevance to the prior auth request. The model categorizes findings into the payer's checklist categories: diagnosis confirmation, conservative treatment failure, functional limitation, and imaging evidence.
Stage 3: Checklist Evaluation — Each item on the payer's medical necessity checklist is evaluated against the extracted evidence. Research from the University of Illinois demonstrates this multi-agent approach achieves 95.6% accuracy on overall checklist determinations using GPT-4, with a multi-pass jury voting mechanism that transforms stochastic LLM outputs into statistically reliable predictions.
import json
import re

import anthropic


class ClinicalNLPExtractor:
    """Extract medical necessity evidence using LLMs."""

    def __init__(self):
        # Async client so the LLM call does not block the event loop
        self.client = anthropic.AsyncAnthropic()

    async def extract_evidence(
        self,
        physician_notes: str,
        evidence_checklist: list
    ) -> dict:
        """Extract and evaluate clinical evidence against
        payer requirements."""
        # De-identify PHI before LLM processing
        redacted_notes = self.redact_phi(physician_notes)
        checklist_str = "\n".join(
            f"- {item}" for item in evidence_checklist
        )
        prompt = f"""You are a clinical documentation specialist
reviewing physician notes for a prior authorization request.

PHYSICIAN NOTES:
{redacted_notes}

EVIDENCE CHECKLIST (from payer policy):
{checklist_str}

For each checklist item, determine:
1. SUPPORTED - Evidence clearly present in notes
2. PARTIAL - Some evidence exists, may need supplementation
3. NOT_FOUND - No evidence in current notes
4. CONTRADICTED - Notes contain contradicting evidence

For each item, provide:
- status: one of the above
- evidence_text: exact quote from notes supporting the determination
- icd10_codes: relevant ICD-10 codes identified
- cpt_codes: relevant CPT codes identified
- confidence: 0.0 to 1.0

Return as JSON array."""
        response = await self.client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=4096,
            messages=[{"role": "user", "content": prompt}]
        )
        return json.loads(
            response.content[0].text
        )

    def redact_phi(self, text: str) -> str:
        """Remove HIPAA Safe Harbor identifiers before sending to an
        external LLM API. These regexes are illustrative only —
        production systems must cover all 18 Safe Harbor identifiers,
        typically via an NER-based de-identification service."""
        # Names
        text = re.sub(
            r'\b(Mr|Mrs|Ms|Dr)\.?\s+[A-Z][a-z]+\s+[A-Z][a-z]+',
            '[REDACTED_NAME]', text
        )
        # SSN
        text = re.sub(
            r'\b\d{3}-\d{2}-\d{4}\b',
            '[REDACTED_SSN]', text
        )
        # MRN / Account numbers
        text = re.sub(
            r'\bMRN[:\s]*\w+',
            '[REDACTED_MRN]', text
        )
        # Dates of birth
        text = re.sub(
            r'\bDOB[:\s]*\d{1,2}/\d{1,2}/\d{2,4}',
            '[REDACTED_DOB]', text
        )
        # Phone numbers
        text = re.sub(
            r'\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b',
            '[REDACTED_PHONE]', text
        )
        return text

Prompt Engineering for Medical Necessity
The quality of clinical extraction depends heavily on prompt design. Key patterns that improve accuracy based on published research and production experience:
- Role specification — Instruct the LLM to act as a "clinical documentation specialist" or "utilization review nurse" to activate domain-specific reasoning patterns. This consistently improves extraction quality over generic prompts.
- Structured output — Always request JSON output with predefined fields to ensure consistent downstream processing. Include field-level type hints in the prompt to reduce parsing failures.
- Evidence grounding — Require exact quotes from the source notes for every determination to enable human verification. This also reduces hallucination, as the model must point to specific text rather than synthesize claims.
- Confidence scoring — Include confidence scores so the system can route low-confidence items to human review. Items below 0.85 confidence should always go to the human review queue.
- Chain-of-thought for complex cases — For cases involving multiple comorbidities or unusual treatment paths, use chain-of-thought prompting to improve reasoning accuracy. Research shows this substantially improves smaller model performance, with GPT-3.5 gaining the most benefit from CoT augmentation.
- Multi-pass jury voting — Run the same extraction prompt 5-10 times and use majority voting for the final determination. This ensemble approach increases reliability from approximately 86% to 95%+ accuracy on checklist evaluations, at the cost of proportionally higher latency and API spend.
A critical design principle: never let the LLM make the final authorization decision. The LLM extracts and structures evidence. A deterministic rules engine or a human reviewer makes the final call. This separation is essential for auditability and regulatory compliance.
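That separation can be sketched as a deterministic routing function that runs after extraction. The field names match the prompt above and the 0.85 threshold follows the confidence-scoring guidance; the overall structure is an illustrative assumption:

```python
def route_determination(checklist_results: list,
                        threshold: float = 0.85) -> dict:
    """Deterministic routing after LLM extraction — the model never
    makes the final call. Low-confidence or contradicted items go to
    human review; the rest is decided by fixed rules."""
    needs_human = [
        r for r in checklist_results
        if r["confidence"] < threshold or r["status"] == "CONTRADICTED"
    ]
    if needs_human:
        return {"decision": "human_review", "items": needs_human}
    if all(r["status"] == "SUPPORTED" for r in checklist_results):
        return {"decision": "submit"}
    # PARTIAL or NOT_FOUND items: go back and collect documentation
    return {
        "decision": "gather_more_evidence",
        "items": [r for r in checklist_results
                  if r["status"] != "SUPPORTED"],
    }
```

Because the routing is pure and deterministic, every decision is reproducible from the logged extraction output, which is exactly what an auditor needs.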
Multi-Agent Orchestration Pattern
Research from the University of Illinois (2024) demonstrates that a multi-agent LLM architecture significantly outperforms single-model approaches for prior authorization tasks. Their system uses three specialized agents working in concert:
- Classification Agent — Evaluates retrieved clinical documents against each checklist item, categorizing evidence as supporting, contradictory, or irrelevant. Uses semantic similarity filtering with MiniLM-L6-v2 to identify the top-k candidate documents (optimal at k=20-40) before LLM evaluation.
- Jury Agent — Synthesizes verdicts from the Classification Agent across all retrieved documents. Runs multiple evaluation passes (n=10) with majority voting to produce final predictions with confidence scores and evidence citations. This ensemble approach reduces single-pass hallucination risk.
- Propagator Agent — Handles the logical structure of medical necessity checklists, which are hierarchical with AND, OR, and NOT operators connecting criteria groups. The propagator applies deterministic logic rules to child judgments, aggregating verdicts upward through the checklist tree. AND operations use minimum confidence of child nodes; OR operations use maximum confidence.
This architecture achieved 86.2% accuracy on leaf-node checklist predictions and 95.6% on overall checklist determinations using GPT-4. For production deployment, the voting mechanism adds latency (approximately 10x the single-pass time) but the accuracy improvement justifies it for high-stakes medical decisions where incorrect determinations lead to care delays or unnecessary denials.
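A minimal sketch of the jury voting and confidence propagation described above (AND takes the minimum child confidence, OR the maximum). The node layout is a simplifying assumption; the paper's NOT operator and retrieval steps are omitted:

```python
from collections import Counter


def jury_vote(verdicts: list) -> tuple:
    """Majority vote over n single-pass verdicts; confidence is the
    winning share of votes."""
    counts = Counter(verdicts)
    winner, n = counts.most_common(1)[0]
    return winner, n / len(verdicts)


def propagate(node: dict) -> tuple:
    """Aggregate leaf verdicts up the AND/OR checklist tree."""
    if "verdict" in node:  # leaf: (bool, confidence) from the jury agent
        return node["verdict"], node["confidence"]
    results = [propagate(child) for child in node["children"]]
    if node["op"] == "AND":
        return (all(v for v, _ in results),
                min(c for _, c in results))
    # OR: one satisfied child suffices
    return (any(v for v, _ in results),
            max(c for _, c in results))
```

For example, `jury_vote` over ten passes with seven "met" verdicts yields ("met", 0.7), and `propagate` turns the leaf-level jury outputs into a single overall determination with a traceable confidence.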
Component 4: Packet Assembly — Automated Submission Forms
The packet assembly component takes structured clinical data and evidence from the previous components and compiles it into the format required for submission. This includes generating payer-specific forms, attaching clinical documentation, and validating completeness before submission.
Submission Packet Structure
A complete prior authorization packet typically includes:
- PA request form — Payer-specific form with patient demographics, provider information, procedure/diagnosis codes, and service dates
- Letter of medical necessity — Narrative clinical justification linking the diagnosis to the requested procedure, citing specific evidence from the patient record
- Clinical evidence attachments — Lab results, imaging reports, specialist consultations, treatment history documenting conservative therapy failure
- Supporting documentation — Prior treatment records showing failure of conservative therapy, peer-reviewed literature supporting the procedure for the specific diagnosis
class PacketAssembler:
"""Compile prior authorization submission packets."""
def assemble_packet(
self,
patient_data: dict,
clinical_evidence: dict,
policy_requirements: dict,
payer_config: dict
) -> dict:
"""Build complete PA submission packet."""
# Generate letter of medical necessity
necessity_letter = self.generate_necessity_letter(
patient_data, clinical_evidence,
policy_requirements
)
# Map to FHIR Claim resource
claim = self.build_fhir_claim(
patient_data, policy_requirements
)
# Compile evidence attachments
attachments = self.compile_attachments(
clinical_evidence
)
# Build the PAS Request Bundle
bundle = {
"resourceType": "Bundle",
"type": "collection",
"entry": [
{"resource": claim},
{"resource": patient_data["patient"]},
{"resource": patient_data["coverage"][0]},
*[{"resource": att} for att in attachments]
]
}
# Validate completeness against policy
validation = self.validate_packet(
bundle, policy_requirements
)
if not validation["complete"]:
return {
"status": "incomplete",
"missing": validation["missing_items"],
"bundle": bundle
}
return {
"status": "ready",
"bundle": bundle,
"necessity_letter": necessity_letter
}
def build_fhir_claim(self, patient_data, policy) -> dict:
"""Build PAS-compliant Claim resource."""
return {
"resourceType": "Claim",
"status": "active",
"type": {
"coding": [{
"system": "http://terminology.hl7.org/CodeSystem/claim-type",
"code": "professional"
}]
},
"use": "preauthorization",
"patient": {
"reference": f"Patient/{patient_data['patient']['id']}"
},
"created": "2026-03-19",
"provider": {
"reference": "Organization/requesting-org"
},
"priority": {
"coding": [{"code": "normal"}]
},
"insurance": [{
"sequence": 1,
"focal": True,
"coverage": {
"reference": f"Coverage/{patient_data['coverage'][0]['id']}"
}
}],
"item": [{
"sequence": 1,
"productOrService": {
"coding": policy["matching_policies"][0]["procedure_coding"]
},
"diagnosisSequence": [1]
}],
"diagnosis": [{
"sequence": 1,
"diagnosisCodeableConcept": {
"coding": policy["matching_policies"][0]["diagnosis_coding"]
}
}]
}
Pre-Submission Validation
Before submitting, validate the packet against payer-specific requirements. Common rejection reasons that automated validation catches include:
- Missing or invalid member ID format (each payer has a different format — UHC uses 9-digit numeric, Aetna uses alphanumeric with prefix)
- Incorrect place of service code (outpatient vs. inpatient vs. ambulatory surgical center)
- Diagnosis code not linked to procedure code (ICD-10/CPT pair mismatch per payer bundling rules)
- Missing required attachments (imaging for surgical procedures, lab results for specialty drugs)
- Expired referral or out-of-network provider not pre-approved
- Service date in the past (retrospective auth requires different form and often different endpoint)
- Duplicate request for same service within lookback period
Implementing pre-flight validation reduces rejection rates by 30-40%, avoiding the resubmission cycle that typically adds 5-10 business days to the authorization timeline. For a practice submitting 500 requests per month, that can mean dozens of avoided resubmissions every month, depending on the baseline rejection rate.
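A few of the checks above can be sketched as a pre-flight validator. The member-ID patterns and packet field names are illustrative assumptions (real formats vary by plan and are payer-configured):

```python
import re
from datetime import date

# Illustrative per-payer member-ID formats — actual formats vary by plan
MEMBER_ID_PATTERNS = {
    "UHC": re.compile(r"^\d{9}$"),
    "AETNA": re.compile(r"^[A-Z]\w{8,11}$"),
}


def preflight_validate(packet: dict) -> list:
    """Return a list of rejection-risk issues found before submission."""
    issues = []
    pattern = MEMBER_ID_PATTERNS.get(packet["payer_id"])
    if pattern and not pattern.match(packet["member_id"]):
        issues.append("invalid member ID format for payer")
    if date.fromisoformat(packet["service_date"]) < date.today():
        issues.append("service date in the past: use retrospective auth flow")
    missing = (set(packet.get("required_attachments", []))
               - set(packet.get("attachments", [])))
    if missing:
        issues.append(f"missing attachments: {sorted(missing)}")
    return issues
```

An empty issue list means the packet can proceed to submission; any issue routes the packet back to assembly before it ever reaches the payer.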
Component 5: Submission and Tracking via Da Vinci PAS
The submission component sends the assembled prior authorization request to the payer using the HL7 Da Vinci Prior Authorization Support (PAS) Implementation Guide. This is the industry-standard FHIR-based approach that replaces manual fax, phone, and portal submissions.
The Da Vinci PAS $submit Operation
PAS defines the $submit operation on the Claim resource. The provider POSTs a PAS Request Bundle to the payer's (or intermediary's) FHIR endpoint. The operation follows the HL7 FHIR R4 specification and is dependent on US Core 3.1, US Core 6.1, and US Core 7.0 profiles.
POST [payer-base-url]/Claim/$submit
Content-Type: application/fhir+json
Authorization: Bearer {oauth2_access_token}
{
"resourceType": "Bundle",
"type": "collection",
"entry": [
{
"resource": {
"resourceType": "Claim",
"status": "active",
"type": {"coding": [{"code": "professional"}]},
"use": "preauthorization",
"patient": {"reference": "Patient/pat-001"},
"insurance": [{
"coverage": {"reference": "Coverage/cov-001"}
}],
"item": [{
"sequence": 1,
"productOrService": {
"coding": [{
"system": "http://www.ama-assn.org/go/cpt",
"code": "63047"
}]
}
}]
}
},
{"resource": {"resourceType": "Patient", "id": "pat-001"}},
{"resource": {"resourceType": "Coverage", "id": "cov-001"}}
]
}
Response Handling
The payer returns a PAS Response Bundle containing a ClaimResponse resource with one of three outcomes:
- Approved — outcome: "complete" with an authorization number in the preAuthRef field. The procedure can proceed.
- Pended — outcome: "queued" with a request for additional information via processNote. The agent should gather the requested documentation and resubmit via the same channel.
- Denied — outcome: "error" with denial reason codes in the error array. This triggers the appeal agent (Component 6).
// ClaimResponse - Approved
{
"resourceType": "ClaimResponse",
"status": "active",
"type": {"coding": [{"code": "professional"}]},
"use": "preauthorization",
"patient": {"reference": "Patient/pat-001"},
"outcome": "complete",
"preAuthRef": "AUTH-2026-78432",
"item": [{
"itemSequence": 1,
"adjudication": [{
"category": {
"coding": [{"code": "benefit"}]
},
"reason": {
"coding": [{"code": "approved"}]
}
}]
}]
}
// ClaimResponse - Pended (additional info needed)
{
"resourceType": "ClaimResponse",
"outcome": "queued",
"processNote": [{
"text": "Please provide MRI report dated within 90 days and physical therapy records showing 6+ weeks of conservative treatment."
}]
}
// ClaimResponse - Denied
{
"resourceType": "ClaimResponse",
"outcome": "error",
"error": [{
"code": {
"coding": [{
"system": "http://terminology.hl7.org/CodeSystem/adjudication-error",
"code": "a001",
"display": "Medical necessity not established"
}]
}
}]
}
Status Polling and Subscription
For pended requests, implement a polling mechanism or FHIR Subscriptions for real-time status updates. The CMS Interoperability and Prior Authorization Final Rule (CMS-0057-F) mandates that impacted payers respond to urgent requests within 72 hours and standard requests within 7 calendar days, starting in 2026.
class PASStatusTracker:
"""Track prior authorization request status."""
async def poll_status(
self, claim_response_id: str,
payer_base_url: str,
access_token: str
) -> dict:
"""Poll payer for updated status."""
async with httpx.AsyncClient() as client:
resp = await client.get(
f"{payer_base_url}/ClaimResponse/{claim_response_id}",
headers={
"Authorization": f"Bearer {access_token}",
"Accept": "application/fhir+json"
}
)
resp.raise_for_status()
claim_response = resp.json()
outcome = claim_response.get("outcome")
if outcome == "complete":
return {
"status": "approved",
"auth_number": claim_response.get("preAuthRef"),
"details": claim_response
}
elif outcome == "queued":
return {
"status": "pended",
"notes": [
n["text"] for n in
claim_response.get("processNote", [])
],
"details": claim_response
}
elif outcome == "error":
return {
"status": "denied",
"errors": claim_response.get("error", []),
"details": claim_response
}
return {
"status": "unknown",
"details": claim_response
}
Building on PAS now positions your system for regulatory compliance as the CMS-0057-F mandate takes effect. Organizations that implement FHIR-based prior authorization before the 2027 deadline gain a competitive advantage through faster authorization turnaround times and reduced manual processing costs.
Component 6: Appeal Agent — Denial Analysis and Evidence-Backed Appeals
The appeal agent is where AI provides the highest ROI in the entire pipeline. According to KFF data, 81.7% of appealed Medicare Advantage prior authorization denials were fully or partially overturned in 2024. But only 11.7% of denials are ever appealed — primarily because the manual appeal process is too time-consuming for already overburdened clinical staff. An AI appeal agent closes this gap by automating the evidence gathering, analysis, and letter drafting process.
Denial Root Cause Analysis
When a prior auth request is denied, the appeal agent first classifies the denial reason into actionable categories:
- Medical necessity not established (42% of denials) — Insufficient clinical evidence in the submission. Most common and most overturnable denial type.
- Missing documentation (23% of denials) — Required attachments were not included or were incomplete. Often a packaging error rather than a clinical issue.
- Coding error (15% of denials) — Incorrect ICD-10/CPT codes, code-pair mismatch, or unlisted procedure code without modifier
- Service not covered (11% of denials) — Procedure excluded under the patient's specific plan. Limited appeal options unless plan interpretation is incorrect.
- Timely filing exceeded (5% of denials) — Request submitted after the payer's deadline (typically 5-15 business days from date of service)
- Out-of-network provider (4% of denials) — Rendering provider not in the payer's network. May be appealable under continuity of care or network adequacy provisions.
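The categories above can be wired into a classification table keyed by denial code. The "a001" code matches the denied ClaimResponse example earlier; the other codes and strategy names are illustrative placeholders, since real adjudication code systems vary by payer:

```python
# Hypothetical mapping from denial codes to appeal strategies
DENIAL_STRATEGIES = {
    "a001": {"category": "medical_necessity",
             "strategy": "gather_clinical_evidence_and_literature",
             "appealable": True},
    "a002": {"category": "missing_documentation",
             "strategy": "resubmit_with_attachments",
             "appealable": True},
    "a003": {"category": "coding_error",
             "strategy": "correct_codes_and_resubmit",
             "appealable": True},
    "a004": {"category": "not_covered",
             "strategy": "review_plan_interpretation",
             "appealable": False},
}


def classify_denial(claim_response: dict) -> dict:
    """Map the first denial code on a ClaimResponse to an appeal
    strategy; unknown codes fall through to human triage."""
    code = claim_response["error"][0]["code"]["coding"][0]["code"]
    return DENIAL_STRATEGIES.get(
        code,
        {"category": "unknown", "strategy": "human_triage",
         "appealable": True},
    )
```

Keeping this mapping as data rather than prompt logic makes the denial taxonomy auditable and easy to extend per payer.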
Evidence Gathering for Appeals
Based on the denial reason, the agent retrieves additional supporting evidence from the EHR and external sources. For medical necessity denials, this includes retrieving additional clinical notes, lab results, imaging reports, and peer-reviewed literature that supports the clinical decision.
class AppealAgent:
"""Analyze denials and generate appeals."""
def __init__(self):
self.llm = anthropic.Anthropic()
self.fhir_client = FHIRExtractor(...)
self.policy_engine = PolicyEngine()
async def analyze_denial(
self, denial: dict, patient_id: str
) -> dict:
"""Classify denial and determine appeal strategy."""
denial_code = denial["errors"][0]["code"]["coding"][0]["code"]
denial_reason = denial["errors"][0]["code"]["coding"][0]["display"]
# Retrieve additional clinical evidence
additional_evidence = await self.gather_evidence(
patient_id, denial_code
)
# Retrieve peer-reviewed literature
literature = await self.search_literature(
denial["procedure_code"],
denial["diagnosis_code"]
)
# Generate appeal letter
appeal = await self.generate_appeal(
denial, additional_evidence, literature
)
return appeal
async def generate_appeal(
self,
denial: dict,
evidence: dict,
literature: list
) -> dict:
"""Generate evidence-backed appeal letter."""
literature_refs = "\n".join(
f"- {lit['title']} ({lit['journal']}, {lit['year']}): {lit['finding']}"
for lit in literature[:5]
)
prompt = f"""You are a utilization review specialist
drafting a prior authorization appeal letter.
DENIAL DETAILS:
- Denial reason: {denial['reason']}
- Procedure: {denial['procedure_code']}
({denial['procedure_desc']})
- Diagnosis: {denial['diagnosis_code']}
({denial['diagnosis_desc']})
CLINICAL EVIDENCE SUPPORTING THE REQUEST:
{json.dumps(evidence['clinical_findings'], indent=2)}
PEER-REVIEWED LITERATURE:
{literature_refs}
PAYER POLICY REQUIREMENTS:
{json.dumps(evidence['policy_requirements'], indent=2)}
Draft a formal appeal letter that:
1. Clearly states the patient's clinical condition
and functional limitations
2. Addresses each specific denial reason with
evidence citations
3. References the payer's own policy criteria and
shows each criterion is met
4. Cites peer-reviewed literature supporting
medical necessity
5. References relevant CMS NCDs/LCDs if applicable
6. Requests expedited review if clinically urgent
7. Maintains a professional, factual tone
Format as a formal medical letter."""
response = self.llm.messages.create(
model="claude-sonnet-4-20250514",
max_tokens=4096,
messages=[{"role": "user", "content": prompt}]
)
return {
"appeal_letter": response.content[0].text,
"evidence_cited": evidence,
"literature_cited": literature,
"confidence": self.calculate_appeal_confidence(
evidence, literature
)
} The appeal agent should always flag the generated letter for human review before submission. A physician or utilization review specialist must verify the clinical accuracy of the appeal before it is sent to the payer. For more on designing appropriate human oversight, see our guide on bounded autonomy patterns for healthcare AI agents.
Recommended Tech Stack
Intelligence Layer
| Component | Recommendation | Why |
|---|---|---|
| Clinical NLP | Claude API (Sonnet) or GPT-4 | Best accuracy on medical text extraction. Claude offers 200K context window for long clinical notes. |
| Embeddings | text-embedding-3-large (OpenAI) | 3072-dim vectors (reducible via the `dimensions` parameter) with strong medical vocabulary coverage |
| Vector Database | Pinecone or Weaviate | Managed service with metadata filtering for payer-specific policy retrieval |
| Appeal letter generation | Claude API (Opus) or GPT-4 | Stronger reasoning for complex clinical argumentation |
Data and Integration Layer
| Component | Recommendation | Why |
|---|---|---|
| FHIR Server | HAPI FHIR R4 | Open-source, Java-based, US Core compliant, production-proven |
| Orchestration API | FastAPI (Python) or Express (Node.js) | Async-first for parallel FHIR queries and LLM calls |
| State Management | PostgreSQL | ACID transactions for audit logs, request tracking, decision history |
| Caching | Redis | Session state, FHIR resource caching, rate limiting |
| Task Queue | Celery + Redis or BullMQ | Async processing for LLM calls, status polling, retries |
Compliance and Operations Layer
| Component | Recommendation | Why |
|---|---|---|
| Cloud Platform | AWS or GCP with signed BAA | HIPAA-eligible services, SOC 2 certified, BAA available; GovCloud only if you also face federal (FedRAMP) requirements |
| Secrets Management | HashiCorp Vault or AWS Secrets Manager | Rotate API keys, store encryption keys, manage certificates |
| Monitoring | Datadog or Grafana + Prometheus | APM, log aggregation, custom dashboards for PA metrics |
| Deployment | Docker + Kubernetes (EKS/GKE) | Container orchestration with health checks, auto-scaling |
Estimated Infrastructure Costs
For a mid-sized healthcare organization processing 500-1,000 prior authorizations per month:
- LLM API costs: $800-2,000/month (depends on note length and model choice; Claude Sonnet at approximately $0.30 per complex extraction)
- Cloud infrastructure: $1,500-3,000/month (compute, database, storage on HIPAA-eligible tier)
- Vector database: $200-500/month (managed Pinecone or Weaviate with 50K+ policy vectors)
- Monitoring and tools: $300-500/month (Datadog APM, log management, alerting)
- Total: $3,000-6,000/month
Compare this against the manual cost: at 14 minutes per PA request and an average staff cost of $25/hour, 750 PAs per month costs $4,375 in labor alone — before accounting for denials, resubmissions, phone hold times, and appeals. The AMA reports that physicians spend an average of 12 hours per week on prior authorization activities, and 93% report that prior authorization delays access to necessary care. Automation ROI is typically realized within 3-6 months through reduced staff time and faster authorizations.
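The cost comparison above reduces to simple arithmetic. The sketch below models it with the illustrative figures from this guide (14 minutes per PA, $25/hour staff cost); the function names and the build-cost input are assumptions to adjust for your organization:

```python
# Back-of-envelope ROI model for PA automation, using the illustrative
# figures from this guide. All inputs are assumptions to tune per org.

def monthly_manual_cost(pa_per_month: int, minutes_per_pa: float = 14,
                        staff_rate_per_hour: float = 25.0) -> float:
    """Labor cost of processing PAs by hand."""
    return pa_per_month * (minutes_per_pa / 60) * staff_rate_per_hour

def breakeven_months(build_cost: float, manual_cost: float,
                     automation_cost: float) -> float:
    """Months until cumulative savings cover the build investment."""
    monthly_savings = manual_cost - automation_cost
    if monthly_savings <= 0:
        return float("inf")  # automation costs more than it saves
    return build_cost / monthly_savings

manual = monthly_manual_cost(750)  # 750 * (14/60) * 25 = 4375.0
print(f"Manual labor: ${manual:,.0f}/month")
```

Note that this captures direct labor only; the larger savings from fewer denials, resubmissions, and faster authorizations are harder to model but usually dominate.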
HIPAA Compliance Architecture
Every component in the pipeline handles Protected Health Information (PHI). Your compliance architecture must address data protection at every boundary, with no exceptions for development or staging environments that contain real patient data.
PHI Protection Layers
- Encryption at rest — AES-256 for all databases, file storage, and backups containing PHI. Use envelope encryption with AWS KMS or GCP Cloud KMS for key management.
- Encryption in transit — TLS 1.3 for all API communication, including internal service-to-service calls within your Kubernetes cluster. Terminate TLS at the service mesh level (Istio/Linkerd), not just at the ingress.
- PHI de-identification for LLM calls — Strip all 18 HIPAA Safe Harbor identifiers before sending clinical text to external LLM APIs. Re-associate after processing using internal reference IDs. This is the most critical control for AI-augmented healthcare systems.
- Business Associate Agreements (BAAs) — Required with every vendor that processes PHI: cloud provider, LLM API provider, vector database, monitoring platform. Both Anthropic and OpenAI offer BAAs for enterprise healthcare customers.
- Role-Based Access Control (RBAC) — Principle of least privilege. The NLP component should never access billing data. The submission component should never access unrelated patient records. Implement at both the application and database level.
- Immutable audit logging — Log every data access, LLM call, and decision with timestamp, user/service identity, and data accessed. Store in append-only storage (S3 with Object Lock or similar). Retain for minimum 6 years per HIPAA requirements.
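The de-identification layer can be sketched as a tokenize-and-restore step around the LLM call. This is a deliberately simplified illustration — real Safe Harbor de-identification covers all 18 identifier classes and should use a validated de-identification pipeline or service; the three regex patterns below (phone, SSN, date) are assumptions for demonstration only:

```python
import re
import uuid

# Illustrative patterns only -- NOT a complete Safe Harbor implementation.
PATTERNS = {
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"),
}

def deidentify(text: str) -> tuple[str, dict[str, str]]:
    """Replace matched identifiers with opaque tokens; return the
    token -> original mapping for re-association after LLM processing."""
    mapping: dict[str, str] = {}
    for label, pattern in PATTERNS.items():
        def _sub(m, label=label):
            token = f"[{label}-{uuid.uuid4().hex[:8]}]"
            mapping[token] = m.group(0)
            return token
        text = pattern.sub(_sub, text)
    return text, mapping

def reidentify(text: str, mapping: dict[str, str]) -> str:
    """Restore originals after the LLM response comes back in-house."""
    for token, original in mapping.items():
        text = text.replace(token, original)
    return text
```

The mapping never leaves your infrastructure — only the tokenized text is sent to the external API, and re-association happens after the response returns.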
LLM-Specific Compliance Considerations
When using commercial LLM APIs (Claude, GPT-4) with healthcare data:
- Verify BAA coverage — Anthropic and OpenAI offer BAAs for enterprise customers. Confirm your API agreement includes PHI processing rights. Standard developer accounts typically do not include BAA coverage.
- Use dedicated endpoints — Enterprise LLM plans typically offer isolated endpoints that do not use customer data for model training. Verify this contractually.
- Implement request-level logging — Log every prompt sent to the LLM (after de-identification) and every response received. This is your audit trail for clinical decisions.
- Consider on-premise alternatives — For organizations with strict data residency requirements, consider self-hosted models like Llama 3 with medical fine-tuning. The accuracy trade-off is 5-15% versus frontier models, but PHI never leaves your infrastructure.
- Implement data retention policies — Ensure LLM providers do not retain your prompts or completions beyond the processing window. Configure zero-retention policies where available.
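The request-level audit trail can be made tamper-evident by hash-chaining records, so altering any earlier entry invalidates every later hash. A minimal sketch, assuming a simple in-memory list stands in for the append-only store (production systems would write to WORM storage such as S3 with Object Lock):

```python
import hashlib
import json
import time

def append_audit_record(log: list, service: str, action: str,
                        prompt_deidentified: str) -> dict:
    """Append a hash-chained audit record for one LLM call."""
    prev_hash = log[-1]["hash"] if log else "0" * 64
    record = {
        "ts": time.time(),
        "service": service,
        "action": action,
        "prompt": prompt_deidentified,  # never log raw PHI
        "prev_hash": prev_hash,
    }
    record["hash"] = hashlib.sha256(
        json.dumps(record, sort_keys=True).encode()
    ).hexdigest()
    log.append(record)
    return record

def verify_chain(log: list) -> bool:
    """Recompute every hash; any tampering breaks the chain."""
    prev = "0" * 64
    for rec in log:
        body = {k: v for k, v in rec.items() if k != "hash"}
        if rec["prev_hash"] != prev:
            return False
        digest = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode()
        ).hexdigest()
        if digest != rec["hash"]:
            return False
        prev = rec["hash"]
    return True
```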
For a comprehensive treatment of HIPAA compliance patterns for AI agents, see our detailed guide: Building HIPAA-Compliant AI Agents: Architecture Guide for Healthcare.
Human-in-the-Loop Design
An AI prior authorization agent should augment, not replace, clinical judgment. The human-in-the-loop design determines when the system operates autonomously and when it requires human review. Getting this boundary right is the difference between a helpful tool and a liability.
Autonomy Levels
| Scenario | Autonomy Level | Human Action |
|---|---|---|
| Data extraction from EHR | Full automation | None — deterministic FHIR queries |
| Policy lookup and matching | Full automation | None — rule-based with known policies |
| Clinical NLP (confidence ≥ 0.9) | Auto with audit | Spot-check 10% of determinations |
| Clinical NLP (confidence < 0.9) | Human review required | Clinician reviews all flagged items |
| Submission of routine requests | Auto with notification | Staff notified, can intervene within 15 min |
| Appeal letter generation | Human approval required | Physician must review and sign before submission |
| Denial decisions | Never automated | Always requires human clinical review |
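The autonomy table above is straightforward to encode as a routing function. A minimal sketch, assuming the scenario identifiers shown here (they are illustrative names, not a fixed schema) and the 0.9 confidence threshold from the table:

```python
from typing import Optional

def route(scenario: str, confidence: Optional[float] = None) -> str:
    """Map a pipeline scenario (and NLP confidence, where relevant)
    to the autonomy level from the table above."""
    if scenario in ("ehr_extraction", "policy_lookup"):
        return "full_automation"
    if scenario == "clinical_nlp":
        if confidence is not None and confidence >= 0.9:
            return "auto_with_audit"
        return "human_review"
    if scenario == "routine_submission":
        return "auto_with_notification"
    if scenario == "appeal_letter":
        return "human_approval_required"
    if scenario == "denial_decision":
        return "never_automated"
    # Unknown scenarios default to the safe path
    return "human_review"
```

Centralizing the routing decision in one function makes the autonomy boundary auditable and easy to tighten when override data says a category is underperforming.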
Review Queue Design
Build a review dashboard where clinicians can:
- See the AI's evidence extraction side-by-side with the original physician notes for verification
- Approve, modify, or reject each checklist determination with one-click actions
- Add supplementary evidence the AI may have missed from sources not in the FHIR system
- Escalate complex cases to specialist review (peer-to-peer with payer medical director)
- Provide structured feedback that improves future extraction accuracy via prompt refinement
The feedback loop is critical for continuous improvement. Track which extractions clinicians override and use that data to improve prompts, adjust confidence thresholds, and update policy matching rules. Over time, the override rate should decrease from the initial 15-20% to below 5% as the system learns from clinician corrections.
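Tracking the override rate per checklist category tells you exactly where to focus prompt refinement. A sketch, assuming a review-queue record with `category`, `ai_value`, and `clinician_value` fields (the record shape is an assumption, not a fixed schema):

```python
from collections import defaultdict

def override_rates(decisions: list) -> dict:
    """Per-category fraction of AI determinations that clinicians
    changed during review. Rising rates flag weak prompt areas."""
    totals = defaultdict(int)
    overrides = defaultdict(int)
    for d in decisions:
        totals[d["category"]] += 1
        if d["ai_value"] != d["clinician_value"]:
            overrides[d["category"]] += 1
    return {c: overrides[c] / totals[c] for c in totals}
```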
Testing Strategy
Prior authorization involves real patient care decisions. Your testing strategy must validate accuracy, reliability, and safety before production deployment. A single incorrect automated denial can delay critical patient care.
Testing Pyramid
- Unit tests — FHIR resource parsing, code validation, form generation logic, PHI redaction patterns. Target 80%+ coverage on all deterministic components.
- Integration tests — End-to-end pipeline with mock FHIR servers (use HAPI FHIR test server) and recorded LLM responses. Validate Bundle structure, Claim resource compliance with PAS profiles, and ClaimResponse parsing for all three outcome types.
- Accuracy tests — Run the NLP component against a labeled dataset of 500+ physician notes with known medical necessity determinations. Track precision, recall, and F1 score per checklist item category. Minimum threshold: F1 > 0.85 before production deployment.
- Conformance tests — Use the Da Vinci PAS Inferno Test Kit to validate your FHIR implementation against the official PAS profiles. This is the same tool CMS uses for certification.
- Adversarial tests — Submit edge cases: notes with contradictory information, rare procedures without LCD coverage, patients with multiple active coverage plans, notes in multiple languages, incomplete records with missing sections, and cases where the AI should correctly flag "insufficient evidence" rather than force a determination.
- Shadow mode deployment — Run the AI agent in parallel with the manual process for 4-8 weeks. Compare the AI's authorization recommendations against actual human-submitted requests and payer decisions. Track: agreement rate with human decisions, cases where AI would have caught errors humans missed, and cases where AI made incorrect determinations. This is the gold standard validation before going live.
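The accuracy gate in step 3 reduces to standard precision/recall/F1 arithmetic over the labeled dataset. A minimal sketch, assuming each checklist-item prediction is a `(predicted_met, actual_met)` boolean pair:

```python
def f1_report(pairs: list) -> dict:
    """Compute precision, recall, and F1 for one checklist item
    category, plus the F1 > 0.85 deployment gate from this guide.

    pairs: list of (predicted_met, actual_met) booleans.
    """
    tp = sum(1 for p, a in pairs if p and a)
    fp = sum(1 for p, a in pairs if p and not a)
    fn = sum(1 for p, a in pairs if not p and a)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall,
            "f1": f1, "deployable": f1 > 0.85}
```

Run this per category rather than pooled: a pooled F1 of 0.90 can hide a category (say, prior conservative therapy) sitting at 0.60 where denials actually concentrate.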
Orchestration Framework Recommendations
For coordinating the multi-agent pipeline in production, evaluate these orchestration options based on your team's expertise and scale requirements:
- LangGraph — Graph-based state machine for complex multi-step agent workflows. Strong support for conditional branching (approved/pended/denied paths), parallel execution (evidence gathering from multiple sources), and persistent state across retries. Best for teams already in the LangChain ecosystem.
- Temporal — Durable execution engine for long-running workflows. Ideal for prior authorization pipelines that may span days (initial submission through pended status through resolution). Built-in retry semantics, timeout handling, and workflow versioning. Best for organizations with complex state management requirements.
- Custom FastAPI + Celery — For teams that need full control over the pipeline. Define the stages as Celery tasks with explicit state transitions stored in PostgreSQL. More engineering effort but zero framework lock-in and complete visibility into every decision point.
Whichever framework you choose, ensure it supports: persistent state across service restarts, configurable retry policies per stage, dead letter queues for failed processing, and observability hooks for monitoring every agent decision with full audit trails.
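Whichever option you pick, the core discipline is the same: only legal stage transitions are ever written to the state store. A framework-free sketch of the explicit transition table the custom FastAPI + Celery approach relies on — the stage names follow this guide's pipeline and are illustrative, and persisting each transition as a PostgreSQL row is assumed:

```python
# Legal next-stages per pipeline stage (illustrative names).
VALID_TRANSITIONS = {
    "extracting": {"policy_matching", "failed"},
    "policy_matching": {"nlp_review", "failed"},
    "nlp_review": {"assembling", "human_review", "failed"},
    "human_review": {"assembling", "rejected"},
    "assembling": {"submitting", "failed"},
    "submitting": {"approved", "pended", "denied", "failed"},
    "pended": {"submitting", "approved", "denied"},
    "denied": {"appealing", "closed"},
}

def transition(current: str, nxt: str) -> str:
    """Enforce legal stage transitions; illegal moves raise instead of
    silently corrupting request state."""
    if nxt not in VALID_TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition {current} -> {nxt}")
    return nxt
```

With the table as data, the audit question "how did this request reach `denied`?" becomes a simple walk over the persisted transition rows.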
Deployment and Monitoring
Deployment Architecture
Deploy as containerized microservices on Kubernetes for production reliability, horizontal scaling, and rolling updates:
```yaml
# docker-compose.yml (development)
services:
  orchestrator:
    build: ./services/orchestrator
    ports: ["8080:8080"]
    environment:
      - FHIR_SERVER_URL=http://hapi-fhir:8080/fhir
      - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
      - PINECONE_API_KEY=${PINECONE_API_KEY}
      - DATABASE_URL=postgresql://user:pass@postgres:5432/pa_agent
      - REDIS_URL=redis://redis:6379
    depends_on: [postgres, redis, hapi-fhir]
  hapi-fhir:
    image: hapiproject/hapi:latest
    ports: ["8090:8080"]
    environment:
      - hapi.fhir.allow_multiple_delete=true
      - hapi.fhir.us_core_validation=true
  postgres:
    image: postgres:16
    environment:
      - POSTGRES_DB=pa_agent
      - POSTGRES_USER=user
      - POSTGRES_PASSWORD=pass
    volumes:
      - pgdata:/var/lib/postgresql/data
  redis:
    image: redis:7-alpine
    ports: ["6379:6379"]
volumes:
  pgdata:
```

Production Monitoring Metrics
Track these metrics in production to ensure the system operates safely and effectively:
| Metric | Target | Alert Threshold |
|---|---|---|
| PA request processing time (p95) | < 30 seconds | > 60 seconds |
| First-pass approval rate | > 85% | < 75% (7-day rolling) |
| NLP extraction confidence (mean) | > 0.90 | < 0.80 |
| Human override rate | < 10% | > 20% (trend alert) |
| Appeal success rate | > 75% | < 60% |
| FHIR API error rate | < 0.5% | > 2% |
| LLM API latency (p95) | < 5 seconds | > 10 seconds |
| PHI exposure incidents | 0 | Any (immediate page) |
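The alert thresholds above translate directly into evaluation rules. A sketch, assuming the metric names shown below (they are illustrative; real deployments would express these as Prometheus alerting rules or Datadog monitors rather than application code):

```python
import operator

# (comparator, threshold) per metric, mirroring the table's alert column.
# Metric names are illustrative assumptions.
ALERT_RULES = {
    "pa_processing_p95_seconds": (operator.gt, 60),
    "first_pass_approval_rate": (operator.lt, 0.75),
    "nlp_confidence_mean": (operator.lt, 0.80),
    "human_override_rate": (operator.gt, 0.20),
    "fhir_api_error_rate": (operator.gt, 0.02),
    "llm_latency_p95_seconds": (operator.gt, 10),
    "phi_exposure_incidents": (operator.gt, 0),
}

def firing_alerts(metrics: dict) -> list:
    """Return the names of all metrics breaching their threshold."""
    return [name for name, (cmp, thresh) in ALERT_RULES.items()
            if name in metrics and cmp(metrics[name], thresh)]
```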
Circuit Breakers and Fallbacks
Implement circuit breakers on all external dependencies to prevent cascading failures:
- LLM API down — Queue requests for retry with exponential backoff. Route to manual review queue if SLA breached (3 retries over 5 minutes).
- FHIR server unreachable — Cache recent patient data with 15-minute TTL. Alert operations team. Degrade gracefully by presenting cached data to human reviewers.
- Payer endpoint unavailable — Queue submissions with exponential backoff. Notify staff for manual submission via payer portal as fallback. Track queue depth for capacity planning.
- Vector DB latency spike — Fall back to cached policy lookups for high-volume procedure codes. Alert on cache staleness exceeding 24 hours.
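The breaker pattern behind all four fallbacks is the same. A minimal sketch: after `max_failures` consecutive errors the circuit opens and calls are rejected immediately until `reset_after` seconds pass. The parameter names are assumptions; production code would add a half-open probe state and per-dependency configuration (or use an established library):

```python
import time

class CircuitBreaker:
    """Minimal circuit breaker: open after consecutive failures,
    reject calls while open, retry after a cool-down."""

    def __init__(self, max_failures: int = 3, reset_after: float = 300.0):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: route to fallback")
            # Cool-down elapsed: close the circuit and try again
            self.opened_at = None
            self.failures = 0
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0  # any success resets the count
        return result
```

The `RuntimeError` is where the fallback behavior plugs in: route the request to the manual review queue, serve cached data, or notify staff, depending on which dependency tripped.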
Getting Started: Implementation Roadmap
Building a production AI prior authorization agent is a 3-6 month effort for a team of 2-4 engineers. Here is a phased approach:
Phase 1 (Weeks 1-4): Foundation
- Set up FHIR server and EHR connectivity (SMART on FHIR OAuth 2.0 flow)
- Build the data extraction layer with the four core FHIR resources
- Establish the policy database with 50-100 high-volume LCD/NCD policies
- Deploy infrastructure (Kubernetes cluster, PostgreSQL, Redis, monitoring)
Phase 2 (Weeks 5-8): Intelligence
- Implement the clinical NLP pipeline with PHI de-identification layer
- Build the policy matching engine with vector search and RAG retrieval
- Develop the packet assembly component with FHIR Claim resource generation
- Create the human review dashboard with one-click approve/reject workflow
Phase 3 (Weeks 9-12): Submission and Appeal
- Integrate the Da Vinci PAS $submit operation with test payer endpoints
- Build the status tracking and notification system with FHIR Subscriptions
- Implement the appeal agent with denial root cause analysis and letter generation
- Run PAS Inferno conformance tests and fix profile compliance issues
Phase 4 (Weeks 13-20): Validation and Launch
- Shadow mode testing (4-6 weeks) comparing AI vs. manual outcomes side-by-side
- Accuracy benchmarking against labeled clinical datasets (minimum F1 > 0.85)
- Security audit, penetration testing, and HIPAA compliance review
- Gradual production rollout starting with high-volume, low-complexity procedure categories
Start with the highest-volume prior authorization categories at your organization — imaging studies, specialty drugs, and surgical procedures typically represent 70%+ of PA volume — to maximize early ROI and build organizational confidence in the system.
For foundational knowledge on building the FHIR integration layer, see our FHIR US Core Implementation Guide. For the complete prior authorization regulatory landscape including CMS-0057-F requirements and compliance timelines, review our Prior Authorization Automation Technical Guide.