A cardiologist at a 200-bed regional hospital was spending 3 hours per day on clinical documentation. Not because she was slow — because cardiology notes require detailed hemodynamic data, medication titration rationale, imaging findings, and procedure-specific documentation that templates cannot capture. She was considering reducing her patient load from 22 to 16 per day to manage the documentation burden. At her billing rate, that reduction would cost the hospital $480,000 per year in lost revenue.
We deployed a voice-to-clinical-note AI platform that listens to patient-provider conversations, extracts clinical entities in real time, and generates specialty-specific SOAP notes mapped to ICD-10 and CPT codes. After 8 weeks in production across 3 specialties (cardiology, orthopedics, primary care), documentation time dropped by 66%, and the cardiologist went back to seeing 22 patients per day.
This case study covers the architecture, specialty-specific challenges, integration with the hospital's EHR via FHIR, and the results across 14,200 patient encounters.
The Documentation Burden Problem
The numbers are well-documented and staggering. A 2017 Annals of Internal Medicine study found that physicians spend 2 hours on documentation for every 1 hour of direct patient care. A 2024 AMA survey identified documentation burden as the #1 contributor to physician burnout, ahead of administrative tasks and prior authorization.
The financial impact is equally severe. When physicians spend time documenting instead of seeing patients, the lost revenue ranges from $150-500 per hour depending on specialty. For a 10-physician practice, that translates to $750K-2.5M per year in opportunity cost.
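The range above follows from straightforward arithmetic. A minimal sketch, assuming roughly 2 displaced hours per clinic day and ~250 clinic days per year (our assumptions for illustration, not figures from a specific practice):

```python
def annual_opportunity_cost(physicians: int, doc_hours_per_day: float,
                            rate_per_hour: float, clinic_days: int = 250):
    """Revenue foregone when documentation hours displace patient care."""
    return physicians * doc_hours_per_day * clinic_days * rate_per_hour

low = annual_opportunity_cost(10, 2, 150)   # 750_000
high = annual_opportunity_cost(10, 2, 500)  # 2_500_000
```

The low and high ends reproduce the $750K and $2.5M figures for a 10-physician practice at $150 and $500 per hour respectively.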
Previous solutions — scribes (human or offshore), speech-to-text dictation, template-based documentation — each addressed part of the problem but created others. Human scribes cost $30-50K per year per physician and have turnover problems. Dictation still requires physician editing time. Templates produce generic notes that miss specialty-specific nuance and frequently trigger coding denials.
Platform Architecture
The platform processes the patient-provider conversation through four stages:
Stage 1: Speech-to-Text (ASR)
We evaluated Whisper (OpenAI), Deepgram, and AWS Transcribe Medical for real-time clinical speech recognition. Deepgram Nova-2 won on three criteria: medical vocabulary accuracy (94.2% on clinical terminology vs 91.8% for Whisper), real-time streaming latency (under 300ms), and speaker diarization (distinguishing provider from patient). The medical terminology gap matters — "metoprolol tartrate 25mg BID" is a common cardiology phrase that generic ASR models frequently mangle.
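Downstream stages consume the diarized stream as speaker-labeled turns rather than raw words. A minimal sketch of that merge step, using a hypothetical word-event shape rather than Deepgram's actual SDK types:

```python
from dataclasses import dataclass

@dataclass
class Word:
    text: str
    speaker: int   # diarization label; 0 = provider, 1 = patient (assumed mapping)
    start: float   # seconds from session start

def merge_turns(words: list[Word]) -> list[tuple[int, str]]:
    """Collapse word-level ASR events into alternating speaker turns."""
    turns: list[tuple[int, str]] = []
    for w in words:
        if turns and turns[-1][0] == w.speaker:
            # Same speaker as the previous word: extend the current turn
            turns[-1] = (w.speaker, turns[-1][1] + " " + w.text)
        else:
            turns.append((w.speaker, w.text))
    return turns

words = [
    Word("any", 0, 0.0), Word("chest", 0, 0.3), Word("pain?", 0, 0.6),
    Word("only", 1, 1.4), Word("on", 1, 1.7), Word("stairs", 1, 1.9),
]
# merge_turns(words) -> [(0, "any chest pain?"), (1, "only on stairs")]
```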
Stage 2: Clinical NLP
The raw transcript is processed through a clinical NLP pipeline that extracts: chief complaint, history of present illness elements, relevant past medical history mentions, medications discussed (with dosage changes), physical exam findings mentioned verbally, and assessment/plan decisions. This stage uses a fine-tuned clinical NER (Named Entity Recognition) model trained on 50,000 annotated clinical transcripts, supplemented by RxNorm and SNOMED CT lookup for medication and diagnosis normalization.
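To illustrate the medication-extraction slice of this pipeline, here is a deliberately simplified sketch: a tiny in-memory drug lexicon stands in for the NER model, and a production system would resolve the matched string to an RxCUI via the RxNorm API or a local terminology service rather than stopping at the surface form.

```python
import re

# Stand-in lexicon for illustration only; the real pipeline uses a fine-tuned
# clinical NER model plus RxNorm/SNOMED CT lookup for normalization.
DRUG_LEXICON = ["metoprolol tartrate", "lisinopril", "apixaban"]

def extract_medication(utterance: str):
    """Find a known drug name followed by an mg dose in a transcript span."""
    text = utterance.lower()
    for name in DRUG_LEXICON:
        m = re.search(rf"{re.escape(name)}\s+(\d+)\s*mg", text)
        if m:
            return {"drug": name, "dose_mg": int(m.group(1))}
    return None  # no known medication mentioned
```

For example, `extract_medication("We'll keep metoprolol tartrate 25mg BID")` yields `{"drug": "metoprolol tartrate", "dose_mg": 25}`, which the normalization stage would then map to a canonical code.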
Stage 3: Note Generation
A specialty-tuned LLM (Claude Sonnet 4) assembles the extracted entities into a structured SOAP note. The key innovation: specialty-specific note templates are loaded as system context, not hard-coded. Cardiology notes emphasize hemodynamic parameters, medication titration rationale, and rhythm analysis. Orthopedic notes emphasize ROM measurements, surgical planning details, and functional status. Primary care notes follow standard SOAP format with preventive care gaps highlighted.
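The "templates as context, not code" idea can be sketched as follows. The template registry and section names here are illustrative assumptions, not the platform's actual configuration:

```python
# Hypothetical specialty template registry; in production these would be
# loaded from configuration so clinical leads can edit them without a deploy.
SPECIALTY_TEMPLATES = {
    "cardiology": ["Hemodynamic Parameters", "Medication Titration Rationale",
                   "Rhythm Analysis"],
    "orthopedics": ["ROM Measurements", "Surgical Planning",
                    "Functional Status"],
    "primary care": ["Preventive Care Gaps"],
}

def build_system_prompt(specialty: str) -> str:
    """Assemble a specialty-specific system prompt for the note-generation LLM."""
    sections = SPECIALTY_TEMPLATES[specialty]
    return (
        "You generate structured SOAP notes from extracted clinical entities. "
        f"In addition to the standard S/O/A/P sections, a {specialty} note "
        "must explicitly address: " + ", ".join(sections) + "."
    )
```

Adding a new specialty then means adding a template entry, not changing the generation code.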
Stage 4: EHR Integration
The generated note is written to the EHR as a FHIR DocumentReference resource, linked to the current Encounter. ICD-10 and CPT codes are suggested (not auto-applied) and appear in the physician's review interface. The physician reviews the note, makes any corrections, and signs — a process that takes 1-3 minutes instead of the previous 8-15 minutes of dictation and editing.
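A minimal sketch of the FHIR R4 payload this step produces. The field shapes follow the base R4 `DocumentReference` resource (LOINC 11506-3 is "Progress note"), but any real deployment must conform to the target EHR's profile, so treat this structure as illustrative:

```python
import base64

def document_reference(note_text: str, encounter_id: str, patient_id: str) -> dict:
    """Build a minimal FHIR R4 DocumentReference linked to the current Encounter."""
    return {
        "resourceType": "DocumentReference",
        "status": "current",
        "docStatus": "preliminary",  # flipped to "final" after physician sign-off
        "type": {"coding": [{"system": "http://loinc.org", "code": "11506-3",
                             "display": "Progress note"}]},
        "subject": {"reference": f"Patient/{patient_id}"},
        "context": {"encounter": [{"reference": f"Encounter/{encounter_id}"}]},
        "content": [{
            "attachment": {
                "contentType": "text/plain",
                # FHIR attachments carry inline data base64-encoded
                "data": base64.b64encode(note_text.encode("utf-8")).decode("ascii"),
            }
        }],
    }
```

The `docStatus` of `preliminary` is what keeps the physician in the loop: the note is visible in the review interface but not part of the legal record until signed.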
Results: 14,200 Encounters Over 8 Weeks
| Metric | Before | After | Change |
|---|---|---|---|
| Documentation time per encounter | 12.4 min | 4.2 min | -66% |
| Note accuracy (physician-validated) | N/A (physician-authored) | 94.1% | Baseline established |
| Coding accuracy (ICD-10) | 87% (manual coding) | 91% (AI-suggested) | +4 pts |
| RVU capture rate | Baseline | +12% | +12% revenue |
| Physician NPS (documentation satisfaction) | 23 | 54 | +31 points |
| Patient encounters per day per physician | 18.4 | 21.2 | +15% |
The 12% RVU capture improvement was unexpected. The AI system consistently identified documentable elements that physicians were omitting due to time pressure — counseling time, care coordination activities, and complexity modifiers. These are legitimate billing elements that were under-documented, not upcoding.
At Nirmitee, we build healthcare AI systems with the EHR integration and HIPAA compliance infrastructure built in. If you are building clinical documentation AI, talk to our team.


