An AI Medication Suggestion Gone Wrong Is Not a Hallucination. It Is a Patient Safety Event.
In January 2025, a hospital in the Midwest ran a pilot program: an LLM-powered clinical decision support tool that suggested medications based on patient history and chief complaint. Within three weeks, the tool made three errors that ended the pilot immediately.
The first: it suggested amoxicillin for a patient with a documented penicillin allergy. The allergy was recorded in the EHR, but the agent processed the note as free text and missed it. The second: it recommended an adult dose of metformin for a 9-year-old with Type 2 diabetes — 2,000mg instead of the weight-adjusted 500mg. The third: it suggested adding ciprofloxacin to a patient already on warfarin, a well-known interaction that increases bleeding risk by 2-3x (Baillargeon et al., 2012).
None of these were exotic edge cases. They were the kind of mistakes a second-year pharmacy student would catch. But the LLM had no guardrails — it generated free-text suggestions, and the only check was a busy clinician glancing at the output between patients.
If you are building clinical-facing AI agents that touch medication decisions, this article is your engineering blueprint. We will walk through a six-layer guardrail architecture that makes it structurally difficult for your agent to recommend a dangerous medication. Not impossible — nothing is impossible to break — but difficult enough that failures become auditable, catchable, and rare.
Layer 1: Output Schema Enforcement — Kill Free Text
The single most impactful change you can make is eliminating free-text medication output entirely. When your agent returns "I'd suggest starting amoxicillin, maybe around 500mg three times a day," you have lost the ability to programmatically validate anything. "Maybe around 500mg" is not a dosage — it is a liability.
Force every medication suggestion through a strict schema. If the LLM cannot fill every required field, the recommendation fails safely — it does not reach the clinician.
from pydantic import BaseModel, Field, field_validator
from enum import Enum
from typing import Optional
class RouteOfAdministration(str, Enum):
ORAL = "oral"
IV = "intravenous"
IM = "intramuscular"
SC = "subcutaneous"
TOPICAL = "topical"
INHALED = "inhaled"
RECTAL = "rectal"
SUBLINGUAL = "sublingual"
class FrequencyCode(str, Enum):
QD = "QD" # Once daily
BID = "BID" # Twice daily
TID = "TID" # Three times daily
QID = "QID" # Four times daily
Q4H = "Q4H" # Every 4 hours
Q6H = "Q6H" # Every 6 hours
Q8H = "Q8H" # Every 8 hours
Q12H = "Q12H" # Every 12 hours
PRN = "PRN" # As needed
ONCE = "ONCE" # Single dose
class MedicationRecommendation(BaseModel):
"""Structured medication recommendation."""
drug_name: str = Field(..., description="Generic drug name")
rxnorm_cui: str = Field(..., pattern=r"^\d{4,7}$",
description="RxNorm Concept Unique Identifier")
dose_value: float = Field(..., gt=0, description="Numeric dose value")
dose_unit: str = Field(..., pattern=r"^(mg|mcg|g|mL|units|mEq)$")
route: RouteOfAdministration
frequency: FrequencyCode
indication: str = Field(..., min_length=10,
description="Clinical indication for this medication")
duration_days: Optional[int] = Field(None, gt=0, le=365)
requires_renal_adjustment: bool = False
requires_hepatic_adjustment: bool = False
confidence_score: float = Field(..., ge=0.0, le=1.0)
    @field_validator("rxnorm_cui")
    @classmethod
    def validate_rxnorm_format(cls, v: str) -> str:
        # Defense in depth: re-check the field's pattern constraint
        if not v.isdigit() or not (4 <= len(v) <= 7):
            raise ValueError(f"Invalid RxNorm CUI format: {v}")
        return v

This is not optional polish. This is the foundation that makes every subsequent layer possible. Without a structured CUI, you cannot check drug interactions. Without a numeric dose, you cannot validate ranges. Without a coded route, you cannot catch "oral methotrexate prescribed as IV" errors.
Use your LLM framework's structured output mode — OpenAI's response_format, Anthropic's tool use, or LangChain's PydanticOutputParser — to enforce this at the generation layer, not as a post-hoc parse.
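As one concrete illustration, here is a minimal sketch using the OpenAI SDK's structured-output parse helper. The model name and the patient_context prompt are placeholders, and the equivalent pattern exists in the other frameworks mentioned above:

from openai import OpenAI

client = OpenAI()

def generate_recommendation(patient_context: str) -> MedicationRecommendation:
    """Ask the model for a recommendation that must satisfy the schema.

    Sketch only: model name and prompt assembly are placeholders.
    """
    completion = client.beta.chat.completions.parse(
        model="gpt-4o-2024-08-06",
        messages=[
            {"role": "system",
             "content": "You are a clinical decision support assistant."},
            {"role": "user", "content": patient_context},
        ],
        response_format=MedicationRecommendation,
    )
    # If the model cannot produce a schema-valid object, parsing fails
    # here — free text never reaches the clinician.
    return completion.choices[0].message.parsed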
Layer 2: Drug-Drug Interaction Checking via RxNorm
Once you have a structured RxNorm CUI, you can check it against the patient's current medication list. The code below follows the request pattern of the NLM Drug Interaction API. One caveat: NLM retired that endpoint in early 2024, so a production system should point the same pattern at a licensed interaction source (First Databank, Medi-Span, or DrugBank, for example). The principle is unchanged — once you have coded CUIs, interaction checking is a single API call, and there is no reason to skip this step.
import httpx
from dataclasses import dataclass
from typing import Optional
@dataclass
class DrugInteraction:
severity: str # "high", "moderate", "low"
description: str
source: str
drug_a: str
drug_b: str
async def check_drug_interactions(
new_drug_cui: str,
current_med_cuis: list[str]
) -> list[DrugInteraction]:
"""
Check a proposed medication against the patient's current
medication list using the NLM Drug Interaction API.
"""
interactions = []
all_cuis = [new_drug_cui] + current_med_cuis
cui_param = "+".join(all_cuis)
url = (
"https://rxnav.nlm.nih.gov/REST/interaction/"
f"list.json?rxcuis={cui_param}"
)
async with httpx.AsyncClient(timeout=10.0) as client:
resp = await client.get(url)
resp.raise_for_status()
data = resp.json()
full_interactions = data.get("fullInteractionTypeGroup", [])
for group in full_interactions:
source = group.get("sourceName", "unknown")
for itype in group.get("fullInteractionType", []):
for pair in itype.get("interactionPair", []):
severity = pair.get("severity", "N/A").lower()
description = pair.get("description", "")
concepts = pair.get("interactionConcept", [])
drug_names = [
c.get("minConceptItem", {}).get("name", "unknown")
for c in concepts
]
interactions.append(DrugInteraction(
severity=severity,
description=description,
source=source,
drug_a=drug_names[0] if drug_names else "unknown",
drug_b=drug_names[1] if len(drug_names) > 1 else "unknown",
))
return interactions
def evaluate_interaction_risk(
interactions: list[DrugInteraction]
) -> tuple[str, list[DrugInteraction]]:
"""
Returns a decision: 'block', 'warn', or 'pass',
plus the relevant interactions.
"""
high = [i for i in interactions if i.severity == "high"]
moderate = [i for i in interactions if i.severity == "moderate"]
if high:
return "block", high
elif moderate:
return "warn", moderate
else:
return "pass", [] A critical implementation detail: the RxNorm API uses specific CUI formats. "Warfarin" is RxCUI 11289, but "Warfarin Sodium 5mg Oral Tablet" is RxCUI 855332. Use the RxNorm name lookup endpoint to resolve the correct CUI for your agent's output, and check interactions at both the ingredient level and the specific product level.
Layer 3: Dosage Range Validation
LLMs are notoriously bad at numbers. A model that correctly identifies metformin as the right drug for Type 2 diabetes may suggest 5,000mg daily — a dose that risks severe lactic acidosis. Dosage validation is non-negotiable.
Build a dosage reference table from DailyMed (FDA-approved labeling data) and flag anything outside 0.5x to 2x the standard adult range. For pediatric patients, require weight-based dosing calculations.
from dataclasses import dataclass
from typing import Optional
@dataclass
class DosageRange:
drug_name: str
rxnorm_cui: str
min_dose_mg: float # Minimum single dose
max_dose_mg: float # Maximum single dose
max_daily_mg: float # Maximum total daily dose
pediatric_mg_per_kg: Optional[float] = None
requires_renal_adjustment: bool = False
requires_weight_based: bool = False
source: str = "DailyMed FDA Label"
# Reference table (production: load from database)
DOSAGE_RANGES: dict[str, DosageRange] = {
"311670": DosageRange( # Metformin 500mg
drug_name="Metformin", rxnorm_cui="311670",
min_dose_mg=500, max_dose_mg=1000, max_daily_mg=2550,
pediatric_mg_per_kg=None,
requires_renal_adjustment=True,
),
"308182": DosageRange( # Amoxicillin 500mg
drug_name="Amoxicillin", rxnorm_cui="308182",
min_dose_mg=250, max_dose_mg=500, max_daily_mg=3000,
pediatric_mg_per_kg=25.0, # 25-45 mg/kg/day
requires_weight_based=True,
),
"855332": DosageRange( # Warfarin 5mg
drug_name="Warfarin", rxnorm_cui="855332",
min_dose_mg=1, max_dose_mg=10, max_daily_mg=10,
),
}
@dataclass
class DosageValidationResult:
is_valid: bool
decision: str # "pass", "warn", "block"
reason: Optional[str] = None
suggested_range: Optional[str] = None
def validate_dosage(
rxnorm_cui: str,
proposed_dose_mg: float,
frequency: str,
patient_weight_kg: Optional[float] = None,
patient_age_years: Optional[float] = None,
egfr: Optional[float] = None,
) -> DosageValidationResult:
"""Validate proposed dosage against known safe ranges."""
ref = DOSAGE_RANGES.get(rxnorm_cui)
if not ref:
return DosageValidationResult(
is_valid=False, decision="warn",
reason=f"No dosage reference for RxCUI {rxnorm_cui}. "
f"Requires manual pharmacist review.",
)
    # Doses per day implied by each frequency code.
    # PRN conservatively assumes the worst case of 4 doses/day.
    freq_multipliers = {
        "QD": 1, "BID": 2, "TID": 3, "QID": 4,
        "Q4H": 6, "Q6H": 4, "Q8H": 3, "Q12H": 2,
        "ONCE": 1, "PRN": 4,
    }
multiplier = freq_multipliers.get(frequency, 1)
daily_dose = proposed_dose_mg * multiplier
# Single dose exceeds 2x maximum
if proposed_dose_mg > ref.max_dose_mg * 2:
return DosageValidationResult(
is_valid=False, decision="block",
reason=f"Single dose {proposed_dose_mg}mg exceeds 2x max "
f"({ref.max_dose_mg}mg) for {ref.drug_name}",
suggested_range=f"{ref.min_dose_mg}-{ref.max_dose_mg}mg",
)
# Daily dose exceeds maximum
if daily_dose > ref.max_daily_mg:
return DosageValidationResult(
is_valid=False, decision="block",
reason=f"Daily dose {daily_dose}mg exceeds max daily "
f"({ref.max_daily_mg}mg) for {ref.drug_name}",
)
# Pediatric: require weight-based dosing
    # "is not None" so an age of 0 still triggers the pediatric path
    if patient_age_years is not None and patient_age_years < 18:
if ref.requires_weight_based and not patient_weight_kg:
return DosageValidationResult(
is_valid=False, decision="block",
reason=f"Pediatric patient requires weight-based "
f"dosing for {ref.drug_name}.",
)
if ref.pediatric_mg_per_kg and patient_weight_kg:
max_ped = ref.pediatric_mg_per_kg * patient_weight_kg
if daily_dose > max_ped * 1.5:
return DosageValidationResult(
is_valid=False, decision="block",
reason=f"Pediatric daily dose {daily_dose}mg "
f"exceeds weight-based max "
f"({max_ped:.0f}mg for "
f"{patient_weight_kg}kg patient)",
)
# Renal adjustment required
    if ref.requires_renal_adjustment and egfr is not None and egfr < 30:
return DosageValidationResult(
is_valid=False, decision="block",
reason=f"{ref.drug_name} requires renal dose adjustment. "
f"eGFR is {egfr} (< 30).",
)
    return DosageValidationResult(is_valid=True, decision="pass")

In production, load your dosage reference from DailyMed's structured product label (SPL) data or the FDA Drugs@FDA database. The hardcoded table above is for illustration — a production system needs thousands of entries, updated quarterly.
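As a hedged starting point, DailyMed exposes its SPL index through a public v2 web service. A sketch of the lookup step follows (verify parameter names against the current DailyMed API documentation); the hard part — parsing dosage sections out of the label XML behind each set ID — is elided:

import httpx

DAILYMED = "https://dailymed.nlm.nih.gov/dailymed/services/v2"

async def find_spl_setids(drug_name: str) -> list[str]:
    """Look up Structured Product Label set IDs for a drug.

    Sketch only — extracting dosage ranges from the SPL XML behind
    each setid is the real work and is not shown here.
    """
    async with httpx.AsyncClient(timeout=15.0) as client:
        resp = await client.get(f"{DAILYMED}/spls.json",
                                params={"drug_name": drug_name})
        resp.raise_for_status()
        return [item["setid"] for item in resp.json().get("data", [])]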
Layer 4: Allergy Cross-Reference via FHIR AllergyIntolerance
Allergy checking seems simple until you realize patients rarely have allergies coded to specific RxNorm CUIs. A patient's allergy list might say "penicillin" (the drug class), not "amoxicillin" (the specific drug). Your system must handle three levels of matching: exact drug match, drug class match, and known cross-reactivity.
import httpx
from dataclasses import dataclass
# Drug class membership: RxNorm ingredient CUIs to classes.
# Illustrative subset — in production, derive membership from the
# NLM RxClass API rather than hand-maintaining these lists.
DRUG_CLASS_MAP: dict[str, list[str]] = {
"penicillins": ["723", "733", "7984", "1043553"],
"cephalosporins": ["2231", "2348", "20481", "2193"],
"fluoroquinolones": ["2551", "82122", "140587"],
"sulfonamides": ["10831", "36437"],
"nsaids": ["5640", "161", "7646"],
"statins": ["36567", "301542", "83367"],
}
# Known cross-reactivity between drug classes
CROSS_REACTIVITY: dict[str, list[str]] = {
"penicillins": ["cephalosporins"], # ~2% cross-reactivity
"cephalosporins": ["penicillins"],
"sulfonamides": [],
}
@dataclass
class AllergyCheckResult:
decision: str # "block", "warn", "pass"
match_type: str # "exact", "class", "cross_reactivity", "none"
details: str
allergy_source: str
def get_drug_classes(rxnorm_cui: str) -> list[str]:
"""Find which drug classes a given CUI belongs to."""
return [cls for cls, members in DRUG_CLASS_MAP.items()
if rxnorm_cui in members]
async def fetch_patient_allergies(
fhir_base: str, patient_id: str, auth_token: str,
) -> list[dict]:
"""Fetch active AllergyIntolerance resources from FHIR."""
url = (
f"{fhir_base}/AllergyIntolerance"
f"?patient={patient_id}&clinical-status=active"
)
async with httpx.AsyncClient(timeout=10.0) as client:
resp = await client.get(
url, headers={"Authorization": f"Bearer {auth_token}",
"Accept": "application/fhir+json"}
)
resp.raise_for_status()
bundle = resp.json()
allergies = []
for entry in bundle.get("entry", []):
resource = entry.get("resource", {})
        # Guard against resources with an empty or missing coding list
        coding = (resource.get("code", {}).get("coding") or [{}])[0]
allergies.append({
"display": coding.get("display", ""),
"code": coding.get("code", ""),
"system": coding.get("system", ""),
"text": resource.get("code", {}).get("text", ""),
})
return allergies
def check_allergy_match(
drug_rxnorm_cui: str,
drug_name: str,
patient_allergies: list[dict],
) -> AllergyCheckResult:
"""
Three-level allergy check:
1. Exact match (specific drug CUI)
2. Drug class match (e.g., penicillin class)
3. Cross-reactivity (e.g., cephalosporin + penicillin allergy)
"""
drug_classes = get_drug_classes(drug_rxnorm_cui)
for allergy in patient_allergies:
allergy_code = allergy.get("code", "")
allergy_text = (
allergy.get("display", "") or allergy.get("text", "")
).lower()
# Level 1: Exact CUI match
if allergy_code == drug_rxnorm_cui:
return AllergyCheckResult(
decision="block", match_type="exact",
details=f"Patient has documented allergy to "
f"{drug_name} (exact RxNorm match)",
allergy_source=allergy_text,
)
# Level 2: Drug class match
allergy_classes = get_drug_classes(allergy_code)
for cls in DRUG_CLASS_MAP:
if cls in allergy_text:
allergy_classes.append(cls)
overlap = set(drug_classes) & set(allergy_classes)
if overlap:
return AllergyCheckResult(
decision="block", match_type="class",
details=f"{drug_name} belongs to class "
f"{list(overlap)} — patient allergic "
f"to {allergy_text}",
allergy_source=allergy_text,
)
# Level 3: Cross-reactivity
for allergy_cls in allergy_classes:
cross = CROSS_REACTIVITY.get(allergy_cls, [])
cross_overlap = set(drug_classes) & set(cross)
if cross_overlap:
return AllergyCheckResult(
decision="warn",
match_type="cross_reactivity",
details=f"{drug_name} has known "
f"cross-reactivity with "
f"{allergy_cls} (~2% rate). "
f"Patient allergic to {allergy_text}.",
allergy_source=allergy_text,
)
return AllergyCheckResult(
decision="pass", match_type="none",
details="No allergy matches found",
allergy_source="",
    )

The FHIR AllergyIntolerance resource is your source of truth here. If your health system uses a different EHR data model, map it to FHIR first — this standardization is what makes the class-level matching possible. For a deeper dive on building FHIR-based clinical data pipelines, see our guide on FHIR for AI/ML pipelines.
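A minimal sketch of that mapping, assuming a hypothetical legacy allergy table with drug_code and drug_name columns (both names are illustrative, not a real EHR schema):

def legacy_allergy_to_fhir(row: dict, patient_id: str) -> dict:
    """Map a hypothetical legacy EHR allergy row to a minimal FHIR
    AllergyIntolerance resource the checks above can consume.

    Assumes the legacy row carries an RxNorm code; if it carries a
    proprietary code, translate it to RxNorm first.
    """
    return {
        "resourceType": "AllergyIntolerance",
        "clinicalStatus": {"coding": [{
            "system": ("http://terminology.hl7.org/CodeSystem/"
                       "allergyintolerance-clinical"),
            "code": "active",
        }]},
        "patient": {"reference": f"Patient/{patient_id}"},
        "code": {
            "coding": [{
                "system": "http://www.nlm.nih.gov/research/umls/rxnorm",
                "code": row["drug_code"],     # hypothetical column
                "display": row["drug_name"],  # hypothetical column
            }],
            "text": row["drug_name"],
        },
    }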
Layer 5: The "Never Do" List — Hardcoded Rules Outside the LLM
Some rules should never be subject to LLM judgment. These are deterministic, non-negotiable safety rules that exist in code, not in prompts. No amount of prompt engineering makes these safe to delegate to a language model.
from dataclasses import dataclass
from typing import Optional
@dataclass
class HardRuleResult:
blocked: bool
rule_name: str
reason: str
requires_specialist: Optional[str] = None
requires_human_review: bool = True
# DEA Schedule II controlled substances (partial, illustrative list —
# verify CUIs against current RxNorm data and build the full set from
# DEA scheduling references in production)
SCHEDULE_II_CUIS = {
    "7052",   # Morphine
    "5489",   # Hydrocodone
    "7804",   # Oxycodone
    "3423",   # Hydromorphone
    "4337",   # Fentanyl
    "6813",   # Methadone
    "725",    # Amphetamine
}
INSULIN_CUIS = {
"5856", "253182", "261542", "274783", "400008",
}
ANTICOAGULANT_CUIS = {
"11289", "1364430", "1114195", "1232082", "67108",
}
CHEMOTHERAPY_CUIS = {
"3002", "12574", "5101", "51499", "224905",
}
def apply_hard_rules(
rxnorm_cui: str,
drug_name: str,
patient_age_years: Optional[float] = None,
patient_weight_kg: Optional[float] = None,
is_off_label: bool = False,
action: str = "prescribe",
) -> Optional[HardRuleResult]:
"""Non-negotiable safety rules in application code."""
if rxnorm_cui in SCHEDULE_II_CUIS:
return HardRuleResult(
blocked=True,
rule_name="controlled_substance_gate",
reason=f"{drug_name} is Schedule II. Requires "
f"physician review and DEA authorization.",
requires_specialist="attending_physician",
)
if rxnorm_cui in INSULIN_CUIS and action == "modify":
return HardRuleResult(
blocked=True,
rule_name="insulin_modification_gate",
reason=f"Insulin dosing modification for {drug_name} "
f"requires endocrinologist review.",
requires_specialist="endocrinologist",
)
if rxnorm_cui in ANTICOAGULANT_CUIS and action == "discontinue":
return HardRuleResult(
blocked=True,
rule_name="anticoagulant_discontinuation_block",
reason=f"Cannot recommend discontinuing {drug_name}. "
f"Requires hematology/cardiology review.",
requires_specialist="hematologist",
)
    # "is not None" so an age of 0 still triggers the pediatric gate
    if (patient_age_years is not None and patient_age_years < 18
            and not patient_weight_kg):
return HardRuleResult(
blocked=True,
rule_name="pediatric_weight_required",
reason=f"Pediatric patient ({patient_age_years}y) "
f"requires documented weight for dosing.",
)
if is_off_label:
return HardRuleResult(
blocked=True,
rule_name="off_label_disclosure_required",
reason=f"{drug_name} off-label use requires "
f"disclosure and physician authorization.",
requires_specialist="attending_physician",
)
if rxnorm_cui in CHEMOTHERAPY_CUIS:
return HardRuleResult(
blocked=True,
rule_name="chemotherapy_block",
reason=f"{drug_name} is a chemotherapy agent. "
f"Requires oncology protocol and BSA calc.",
requires_specialist="oncologist",
)
    return None  # No hard rules triggered

These rules are not in the prompt. They are not "guidelines" the LLM is asked to follow. They are if statements in your application code that run after every single recommendation. The LLM does not get a vote.
This pattern is directly relevant to why human-in-the-loop is the entire point of healthcare AI — certain decisions should never be fully automated, regardless of model accuracy.
Layer 6: Confidence Gating — When in Doubt, Route to a Human
Even after all five layers pass, the agent's own confidence matters. If the LLM is uncertain about its recommendation — low token-level logprobs, hedging language in its reasoning chain, or a clinical scenario outside its training distribution — the recommendation should not reach the clinician as a suggestion. It should be routed to the pharmacy team for manual review. One caution: a self-reported confidence_score is poorly calibrated on its own, so anchor it in measurable signals (logprobs, retrieval hit rates, agreement across sampled generations) rather than trusting the model's self-assessment.
from dataclasses import dataclass
from typing import Optional

@dataclass
class ConfidenceGateResult:
passed: bool
confidence: float
action: str # "show", "flag", "suppress"
reason: Optional[str] = None
def apply_confidence_gate(
recommendation: MedicationRecommendation,
show_threshold: float = 0.85,
flag_threshold: float = 0.70,
) -> ConfidenceGateResult:
"""
Three-tier confidence gating:
>= 0.85: Show recommendation to clinician
0.70-0.84: Show with prominent uncertainty flag
< 0.70: Suppress and route to pharmacist review
"""
score = recommendation.confidence_score
if score >= show_threshold:
return ConfidenceGateResult(
passed=True, confidence=score, action="show"
)
elif score >= flag_threshold:
return ConfidenceGateResult(
passed=True, confidence=score, action="flag",
reason=f"Confidence {score:.0%} below standard threshold."
)
else:
return ConfidenceGateResult(
passed=False, confidence=score, action="suppress",
reason=f"Confidence {score:.0%} below minimum "
f"({flag_threshold:.0%}). Routing to pharmacist."
        )

Putting It All Together: The Guardrail Chain
Here is the complete pipeline that chains all six layers. Every medication recommendation runs through this before it reaches a clinician's screen.
from dataclasses import dataclass, field
from typing import Optional
@dataclass
class GuardrailResult:
safe: bool
decision: str # "approved", "flagged", "blocked"
blocked_at_layer: Optional[str] = None
details: list[str] = field(default_factory=list)
requires_human_review: bool = False
specialist_required: Optional[str] = None
async def run_medication_guardrails(
recommendation: MedicationRecommendation,
patient_id: str,
current_med_cuis: list[str],
patient_allergies: list[dict],
patient_age_years: Optional[float] = None,
patient_weight_kg: Optional[float] = None,
egfr: Optional[float] = None,
) -> GuardrailResult:
"""Six-layer medication safety guardrail chain."""
result = GuardrailResult(safe=True, decision="approved")
# Layer 1: Schema (enforced by Pydantic at parse time)
result.details.append(
f"L1 PASS: {recommendation.drug_name} "
f"{recommendation.dose_value}{recommendation.dose_unit} "
f"{recommendation.frequency.value} "
f"(RxCUI: {recommendation.rxnorm_cui})"
)
# Layer 2: Drug-drug interactions
interactions = await check_drug_interactions(
recommendation.rxnorm_cui, current_med_cuis
)
decision, flagged = evaluate_interaction_risk(interactions)
    if decision == "block":
        result.safe = False
        result.decision = "blocked"
        result.blocked_at_layer = "drug_interaction"
        result.details.append(
            f"L2 BLOCK: {flagged[0].description}"
        )
        return result
    if decision == "warn":
        # Moderate interactions pass through — but never silently
        result.details.append(f"L2 WARN: {flagged[0].description}")
        result.requires_human_review = True
    else:
        result.details.append(
            f"L2 PASS: {len(interactions)} potential interactions reviewed"
        )
# Layer 3: Dosage validation
dosage = validate_dosage(
recommendation.rxnorm_cui,
recommendation.dose_value,
recommendation.frequency.value,
patient_weight_kg=patient_weight_kg,
patient_age_years=patient_age_years,
egfr=egfr,
)
    if dosage.decision == "block":
        result.safe = False
        result.decision = "blocked"
        result.blocked_at_layer = "dosage_validation"
        result.details.append(f"L3 BLOCK: {dosage.reason}")
        return result
    if dosage.decision == "warn":
        # e.g., no dosage reference for this CUI — needs pharmacist eyes
        result.details.append(f"L3 WARN: {dosage.reason}")
        result.requires_human_review = True
    else:
        result.details.append("L3 PASS: Dosage within range")
# Layer 4: Allergy cross-reference
allergy = check_allergy_match(
recommendation.rxnorm_cui,
recommendation.drug_name,
patient_allergies,
)
if allergy.decision == "block":
result.safe = False
result.decision = "blocked"
result.blocked_at_layer = "allergy_check"
result.details.append(f"L4 BLOCK: {allergy.details}")
return result
if allergy.decision == "warn":
result.details.append(f"L4 WARN: {allergy.details}")
result.requires_human_review = True
else:
result.details.append("L4 PASS: No allergy matches")
# Layer 5: Hard rules
hard = apply_hard_rules(
recommendation.rxnorm_cui,
recommendation.drug_name,
patient_age_years=patient_age_years,
patient_weight_kg=patient_weight_kg,
)
if hard:
result.safe = False
result.decision = "blocked"
result.blocked_at_layer = hard.rule_name
result.details.append(f"L5 BLOCK: {hard.reason}")
result.specialist_required = hard.requires_specialist
return result
result.details.append("L5 PASS: No hard rules triggered")
# Layer 6: Confidence gate
conf = apply_confidence_gate(recommendation)
if not conf.passed:
result.safe = False
result.decision = "blocked"
result.blocked_at_layer = "confidence_gate"
result.details.append(f"L6 BLOCK: {conf.reason}")
return result
if conf.action == "flag":
result.details.append(f"L6 FLAG: {conf.reason}")
result.requires_human_review = True
else:
result.details.append(
f"L6 PASS: Confidence {conf.confidence:.0%}"
)
if result.requires_human_review:
result.decision = "flagged"
    return result

Deployment Considerations
This guardrail chain adds latency — roughly 200-400ms per recommendation, dominated by the external interaction-check and FHIR calls. In a clinical workflow where a physician is waiting for suggestions, this is acceptable. Medication safety is not a use case where you optimize for speed over correctness.
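If you do need to claw some of that latency back, the external fetches in Layers 2 and 4 are independent and can run concurrently. A sketch using the functions defined above:

import asyncio

async def prefetch_safety_context(
    recommendation: MedicationRecommendation,
    current_med_cuis: list[str],
    fhir_base: str, patient_id: str, auth_token: str,
):
    """Run the independent external lookups (Layer 2 interactions,
    Layer 4 allergies) concurrently instead of sequentially."""
    interactions, allergies = await asyncio.gather(
        check_drug_interactions(recommendation.rxnorm_cui, current_med_cuis),
        fetch_patient_allergies(fhir_base, patient_id, auth_token),
    )
    return interactions, allergies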
Three additional production concerns worth addressing:
Logging and Auditability
Every guardrail evaluation must be logged with the full chain result — which layers passed, which warned, which blocked, and why. This is not optional. The HIPAA Security Rule already requires audit controls, and proposed updates to it raise expectations for AI-assisted clinical decisions. Build structured logging from day one, not as a compliance afterthought. For a deeper discussion of how agent audit trails simplify HIPAA compliance, see our analysis of agent audit trails for HIPAA.
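A minimal sketch of that structured record, built from the GuardrailResult and MedicationRecommendation types defined above (field names are our own — shape the record to your compliance requirements):

import json
import logging
from datetime import datetime, timezone

audit_logger = logging.getLogger("medication_guardrails.audit")

def log_guardrail_evaluation(
    result: GuardrailResult,
    recommendation: MedicationRecommendation,
    patient_id: str,
) -> None:
    """Emit one structured audit record per guardrail evaluation."""
    audit_logger.info(json.dumps({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "patient_id": patient_id,  # or a de-identified surrogate key
        "rxnorm_cui": recommendation.rxnorm_cui,
        "drug_name": recommendation.drug_name,
        "decision": result.decision,
        "blocked_at_layer": result.blocked_at_layer,
        "requires_human_review": result.requires_human_review,
        "layer_details": result.details,
    }))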
Keeping Reference Data Current
RxNorm publishes a full release monthly, with weekly incremental updates in between. Your dosage reference table must track FDA label changes. Your drug class membership maps need updating when new drugs are approved. Build at minimum a quarterly refresh cycle for all reference data, and subscribe to the RxNorm release notifications.
Testing with Adversarial Cases
Your test suite should include every failure mode described in this article: a drug the patient is allergic to, a pediatric overdose, a contraindicated combination, a controlled substance, a low-confidence recommendation. If your guardrails pass all of these without any blocks, your test suite is too weak. Real clinical data will always surface edge cases you did not anticipate — your test suite should be growing continuously.
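A sketch of what one such adversarial case looks like as a pytest test, reusing the functions above — the penicillin-allergy scenario from the opening anecdote:

import pytest

# Amoxicillin ingredient CUI, as used in DRUG_CLASS_MAP above
AMOXICILLIN_CUI = "723"

def test_penicillin_allergy_blocks_amoxicillin():
    """The opening-anecdote failure: amoxicillin suggested to a
    patient with a documented penicillin allergy must be blocked."""
    patient_allergies = [{
        "display": "Penicillins",
        "code": "",  # class-level allergy, no specific drug CUI
        "system": "",
        "text": "penicillins",
    }]
    result = check_allergy_match(
        AMOXICILLIN_CUI, "Amoxicillin", patient_allergies
    )
    assert result.decision == "block"
    assert result.match_type == "class"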
Building Clinical AI That Earns Trust
The six-layer architecture described here is not theoretical. It is the minimum viable safety layer for any AI agent that touches medication decisions. Every layer can be implemented with open, freely available tools — RxNorm, DailyMed, FHIR AllergyIntolerance — and none of them require a PhD in machine learning. They require software engineering discipline applied to a domain where errors have consequences measured in patient harm, not conversion rate drops.
At Nirmitee, we build healthcare software where agent orchestration and clinical safety are engineering problems, not afterthoughts. If you are building clinical-facing AI and need the kind of guardrail infrastructure described here — integrated with FHIR, tested against real clinical scenarios, and designed for regulatory compliance — we should talk.
Ready to deploy AI agents in your healthcare workflows? Explore our Agentic AI for Healthcare services to see what autonomous automation can do. We also offer specialized Healthcare Software Product Development services. Talk to our team to get started.