Two AI agents can use the same model, framework, and toolchain and still behave completely differently because one runs in real time and the other asynchronously. The choice between real-time and async is one of the most consequential architectural decisions in healthcare AI agent design, and one of the least discussed in vendor literature.
This post walks through the difference, where each fits in healthcare, and how to decide between them. For broader context on agent design, see our pillar guide on AI Agents in Healthcare.
What "Real-Time" and "Asynchronous" Actually Mean
Real-time agents sit in the human's attention loop. The user is waiting. The latency budget is typically 500 ms to 5 seconds. The agent must produce predictable, fast responses, which means fewer tool calls per turn, shallower reasoning, and aggressive caching.
Asynchronous agents work in the background. The user submits a task and comes back later. The latency budget is minutes to hours. The agent can do deep reasoning, retry failed tool calls, escalate to humans, and re-plan. The trade-off is that the user doesn't see anything happen, which has its own UX implications.
Healthcare Workflows That Are Real-Time by Nature
Some workflows have to be real-time because a human is waiting:
- Clinical decision support — physician in the chart, expecting alerts inline.
- Ambient documentation — agent keeps up with the live visit.
- Patient-facing chat — symptom triage, intake, FAQs.
- Front-desk eligibility check — patient at the counter, staff needs an answer now.
- Provider inbox suggestions — suggestion has to surface inline.
For these, design for latency first: smaller models on the critical path, pre-warmed tool calls, and a clear contract for what the agent won't attempt in real time. When a request falls outside that contract, the agent escalates to async and returns "I'll get back to you" instead of making the user wait.
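As a rough illustration of that contract, the sketch below wraps a handler in a latency budget and falls back to a deferral message when the budget runs out. The names `answer_within_budget`, `fast_lookup`, `slow_lookup`, and the `deferred` list are hypothetical stand-ins, and the budgets are shortened so the demo runs quickly; a production system would enqueue a real async job instead of appending to a list.

```python
import asyncio

FALLBACK = "I'll get back to you"

async def answer_within_budget(handler, question, budget_s, deferred):
    """Run handler(question), giving up once the latency budget is spent.

    On timeout the question is recorded in `deferred` (a stand-in for an
    async job queue) and the user gets the fallback message immediately.
    """
    try:
        return await asyncio.wait_for(handler(question), timeout=budget_s)
    except asyncio.TimeoutError:
        deferred.append(question)
        return FALLBACK

# Hypothetical handlers: one fits the budget, one does not.
async def fast_lookup(question):
    await asyncio.sleep(0.01)
    return f"answer: {question}"

async def slow_lookup(question):
    await asyncio.sleep(1.0)
    return f"answer: {question}"

async def demo():
    deferred = []
    fast = await answer_within_budget(fast_lookup, "copay?", 0.5, deferred)
    slow = await answer_within_budget(slow_lookup, "prior auth status?", 0.05, deferred)
    return fast, slow, deferred

if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The point of the design is that the slow path never blocks the user: the timeout cancels the in-flight call, and the deferred question becomes the async side's problem.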
Healthcare Workflows That Are Async by Nature
Other workflows have no human waiting:
- Prior authorization end-to-end — submission, follow-up, appeal.
- Claims denial work — read the denial, draft the appeal.
- Referral processing — parse the inbound document, identify the patient, schedule.
- Risk adjustment coding suggestions — overnight panel processing.
- Population-health outreach — identify gaps in care, draft messages.
For these, optimise for thoroughness and reliability over latency. The agent can run for minutes, do dozens of tool calls, retry failures, and escalate when needed. Cost per task is higher; cost per outcome is lower. See top use cases of AI agents in healthcare for more.
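The retry-then-escalate behaviour can be sketched in a few lines. This is a minimal illustration, not a production worker: `TransientError` and `run_with_retries` are hypothetical names, and the backoff values are placeholders.

```python
import time

class TransientError(Exception):
    """Hypothetical error type for failures worth retrying (timeouts, 5xx)."""

def run_with_retries(task, max_attempts=3, base_delay_s=1.0, sleep=time.sleep):
    """Call task() until it succeeds or the attempts run out.

    Returns ("done", result) on success, or ("escalated", message) so that
    a human work queue can pick the case up.
    """
    last_error = None
    for attempt in range(max_attempts):
        try:
            return ("done", task())
        except TransientError as exc:
            last_error = exc
            if attempt < max_attempts - 1:
                # Exponential backoff between attempts: 1x, 2x, 4x, ...
                sleep(base_delay_s * (2 ** attempt))
    return ("escalated", str(last_error))
```

Because nobody is waiting on the result, the worker can afford the backoff delays; the only hard requirement is that an exhausted task ends up in front of a human rather than silently failing.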
The Hybrid Pattern
The interesting agents — and most production ones — are hybrids. Real-time front end handles the human-facing interaction. Async back end does the heavy lifting.
Patient intake is the canonical example. Real-time: the agent talks to the patient and collects answers. Async: the agent verifies insurance, checks for prior visits, drafts the chart note, queues the visit — all without the patient waiting. The real-time agent says "thanks, we're getting everything ready." The async agent does the work. The multi-agent architecture pattern formalises how these layers coordinate.
Architectural Implications
The choice ripples through every layer:
- Model choice — real-time leans cheaper/faster; async can afford frontier models.
- Tool design — real-time tools need predictable latency budgets and timeout behaviour. Async tools can be long-running and retried.
- Memory access — real-time can't tolerate slow memory lookups. Async can re-hydrate aggressively.
- Failure handling — real-time degrades gracefully ("I'll get back to you"). Async can pause, escalate, retry.
- Observability — real-time wants p95 latency dashboards. Async wants completion-rate and time-to-resolution.
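To make the observability split concrete, here is a small sketch of the two headline metrics. The function names are illustrative, the percentile uses the simple nearest-rank method, and the "done" status label is an assumption about how jobs are recorded.

```python
def p95(latencies_ms):
    """Nearest-rank 95th percentile of a non-empty list of latencies."""
    ordered = sorted(latencies_ms)
    # Integer form of ceil(0.95 * n), avoiding floating-point rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]

def completion_rate(job_statuses):
    """Share of async jobs that finished without being escalated."""
    done = sum(1 for status in job_statuses if status == "done")
    return done / len(job_statuses)
```

A real-time dashboard would track `p95` against the latency budget, while an async dashboard would track `completion_rate` alongside time-to-resolution.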
The Cost Conversation
Real-time agents cost less per call but more per outcome — they often punt to a human or a second async agent. Async agents cost more per call but less per outcome — they can chase a problem to completion. The right comparison is per-outcome, not per-call.
This is why "use cheaper models everywhere" rarely lands. The cheap model in real-time may require an expensive async cleanup. The frontier model in async may save five expensive human-handled exceptions. Architecture beats unit economics.
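A back-of-the-envelope model shows why the per-outcome view changes the answer. The formula below is a simplifying assumption (every unresolved task falls back to a fixed-cost human or second agent), and all dollar figures and resolution rates are invented for illustration.

```python
def cost_per_outcome(cost_per_call, resolution_rate, fallback_cost):
    """Expected cost to resolve one task.

    Every task pays for the agent call; tasks the agent does not
    resolve also pay the fallback (a human or a second agent).
    """
    return cost_per_call + (1 - resolution_rate) * fallback_cost

# Illustrative numbers only, not measured data.
cheap_real_time = cost_per_outcome(0.02, resolution_rate=0.60, fallback_cost=5.00)
frontier_async = cost_per_outcome(0.50, resolution_rate=0.90, fallback_cost=5.00)
```

With these made-up inputs the cheap agent lands at roughly $2.02 per resolved task and the expensive one at roughly $1.00, even though the expensive agent costs 25 times more per call.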
How to Decide
Two questions, asked early:
- Is there a human waiting? Yes → real-time. No → async.
- How bad is "I'll get back to you" for UX? Acceptable → async with real-time acknowledgement. Unacceptable → real-time architecture.
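One way to read the two questions together is as a small decision function. This is an interpretation of the checklist, not a formal rule, and the `Mode` enum and `choose_mode` names are hypothetical.

```python
from enum import Enum

class Mode(Enum):
    REAL_TIME = "real-time"
    ASYNC_WITH_ACK = "async with real-time acknowledgement"
    ASYNC = "async"

def choose_mode(human_waiting: bool, deferral_acceptable: bool) -> Mode:
    """Map the two questions onto an architecture choice."""
    if not human_waiting:
        return Mode.ASYNC            # nobody is waiting: run in the background
    if deferral_acceptable:
        return Mode.ASYNC_WITH_ACK   # acknowledge now, finish later
    return Mode.REAL_TIME            # the answer has to arrive inline
```

For example, a front-desk eligibility check would come out as `choose_mode(True, False)`, while overnight risk adjustment coding would be `choose_mode(False, True)`.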
Real-World Example
Publicly described ambient documentation deployments from vendors such as Nabla, Suki, Abridge, and DeepScribe operate in real time: transcription and clinical entity extraction have to keep pace with the live visit. In parallel, prior-authorization automation, the space in which vendors such as Cohere Health and Olive AI have operated, is predominantly async, with cases running for minutes or hours. The same underlying agentic patterns apply; the architecture differs because the workflow demands it.
Common Pitfalls in Real-Time vs Async Design
Three mistakes show up most often in production:
- Making everything real-time by default. Teams over-index on responsiveness because real-time feels modern. The result is a slow, expensive agent that times out under network pressure and burns inference budget on tasks that didn't need to be synchronous. Default to async unless the workflow demands real-time.
- Making everything async by default. The opposite trap. UX feels broken because nothing happens immediately. Users abandon. Even when the actual work is async, the user-facing layer needs a real-time acknowledgement — "got it, processing, we'll notify you in 2 minutes."
- Not designing the hand-off. Hybrid architectures have a real-time front and an async back. The hand-off between them is where most production bugs hide. Without an explicit contract — what data crosses the boundary, what guarantees each side makes — debugging is painful and the user experience suffers.
Designing the Boundary
The cleanest pattern we ship: the real-time agent collects everything it needs to hand off in one structured envelope, returns a confirmation to the user, and emits a job to the async pipeline. The async pipeline owns retries, escalations, and completion notifications. No shared state. No "real-time peeking into async progress." Clean boundary in, clean boundary out.
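The envelope-and-job pattern described above might look something like the following. Treat it as a sketch under stated assumptions: `HandoffEnvelope`, its fields, `hand_off`, and `async_worker_step` are hypothetical names, the patient reference is a made-up FHIR-style string, and an in-process `queue.Queue` stands in for whatever durable job queue a real deployment would use.

```python
import json
import queue
import uuid
from dataclasses import dataclass, asdict, field

@dataclass(frozen=True)
class HandoffEnvelope:
    """Everything the async side needs, captured at hand-off time."""
    task_type: str
    patient_ref: str   # assumed FHIR-style reference string
    payload: dict
    job_id: str = field(default_factory=lambda: str(uuid.uuid4()))

def hand_off(envelope, job_queue):
    """Real-time side: emit the job, then confirm to the user."""
    # Serialising to JSON means the two sides share a message, not state.
    job_queue.put(json.dumps(asdict(envelope)))
    return f"Got it. We're processing your request (ref {envelope.job_id[:8]})."

def async_worker_step(job_queue):
    """Async side: take one job off the queue and return its contents."""
    return json.loads(job_queue.get_nowait())

if __name__ == "__main__":
    jobs = queue.Queue()
    envelope = HandoffEnvelope(
        "patient_intake", "Patient/example-123", {"reason": "annual physical"}
    )
    print(hand_off(envelope, jobs))
    print(async_worker_step(jobs))
```

The real-time side never reads from the queue and the async side never calls back into the conversation, which keeps the contract down to one message schema.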
Key Takeaways
- Real-time agents respond while a human waits (500 ms–5 s budget). Async agents run in the background (minutes–hours).
- Real-time fits clinical decision support, ambient documentation, patient chat, eligibility checks, inbox suggestions.
- Async fits prior auth, claims denial, referral processing, risk adjustment, population health outreach.
- Most production healthcare agents are hybrid — thin real-time layer for human-facing UI, async heavy lifting underneath.
- The cost comparison that matters is per-outcome, not per-call.
Call to Action
This post is one piece of a larger picture. For the full overview, read the pillar guide: What Are AI Agents in Healthcare and How Are They Transforming Care Delivery.
Want to build or evaluate an AI agent for your healthcare product? Get in touch with Nirmitee — we ship FHIR-native, HIPAA-compliant AI agents for US healthtech teams and global hospitals.



