Nirmitee.io

AI Medical Scribing: How We Built a Platform That Captures 320 Clinical Fields from Doctor-Patient Conversations

March 10, 2026
20 min read
Written by
Gulshan Prajapati

Software Development Expert

Writes about software development, scalable architecture, and practical problem-solving across modern digital products. Focuses on turning complex technical ideas into clear, real-world solutions.


Executive Summary

A US-based health tech startup was building the next generation of clinical documentation: a system where doctors don't type, don't dictate into a recorder, and don't hire scribes. Instead, a smartphone mounted on the physician's arm captures the entire patient encounter, and an AI engine extracts 320+ clinical fields in real time, generating a complete SOAP note ready for physician review and EHR submission.

We built the complete platform: the mobile capture app (React Native), the clinical NLP engine, the documentation review interface (React), and two EHR integration pathways (an RPN bridge for legacy systems and a direct FHIR push for modern EHRs). The result: documentation time dropped from 12 minutes to 2.3 minutes per encounter, with 96.4% AI accuracy across all captured fields.

Learn about our healthcare software product development capabilities.

The Problem: Doctors Spend More Time Typing Than Treating

Clinical documentation is consistently cited as a leading driver of physician burnout in the United States. Studies have found that for every hour of direct patient care, physicians spend nearly 2 hours on documentation and administrative tasks. The average primary care physician documents 20-30 encounters per day; at roughly 12 minutes per note, that's 4-6 hours of typing, clicking, and navigating EHR screens.

The Documentation Tax

  • 12 minutes per encounter: average time to document a standard outpatient visit — chief complaint, history, exam findings, assessment, plan, prescriptions, referrals, follow-up
  • 2+ hours of after-hours charting: most physicians finish documentation at home ("pajama time"), cutting into family time and personal recovery
  • 15% error rate: manual documentation introduces errors — wrong codes, missing diagnoses, incomplete medication lists, copy-paste artifacts from prior notes
  • $150,000+ annual cost per scribe: human scribes are effective but expensive, and good ones are hard to find and retain
  • Revenue loss: undercoding due to documentation gaps leaves an estimated 10-15% of legitimate revenue uncaptured

Why Existing Solutions Fell Short

  • Dragon Medical (Nuance): speech-to-text, but physicians still need to dictate in a structured format and edit the output. It converts speech to text — it doesn't understand clinical context.
  • Human scribes: effective, but they cost $150K+/year, are in limited supply, take time to train, and still can't be in every exam room at once.
  • Template-based EHR tools: click-heavy, rigid workflows that force physicians to document in the EHR's structure rather than their natural clinical workflow.

Our client wanted something fundamentally different: the doctor practices medicine naturally, and the AI handles the documentation.

The Clinical Documentation Interface

After each encounter, the physician sees a split-screen interface: video playback on the left (with the ability to jump to any moment in the encounter) and the AI-generated documentation on the right. Every field is pre-populated — the physician reviews, makes any corrections, and signs. Average review time: 2.3 minutes.

Key features of the documentation interface:

  • Section-by-section review: Chief Complaint, HPI, ROS, Physical Exam, Assessment, Plan — each section expandable with AI-generated content
  • Confidence indicators: green dot (high confidence), yellow dot (needs review) on each auto-populated field — the physician knows exactly where to focus attention
  • One-click corrections: tap any field to edit. The system learns from corrections to improve future accuracy for that physician's documentation style
  • ICD-10 suggestion: assessment section suggests diagnosis codes based on captured clinical data — reducing coding errors and undercoding
  • Approve & Sign: single button to finalize documentation and push to EHR
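
The confidence indicators can be illustrated with a minimal sketch. The 0.90 cutoff, the `FieldPrediction` type, and the field names are assumptions made for illustration; the article does not state where green ends and yellow begins.

```python
from dataclasses import dataclass

# Hypothetical cutoff: the article does not give the real threshold.
REVIEW_THRESHOLD = 0.90


@dataclass
class FieldPrediction:
    name: str
    value: str
    confidence: float  # model score in [0, 1]


def indicator(pred: FieldPrediction, threshold: float = REVIEW_THRESHOLD) -> str:
    """Return the dot colour shown next to an auto-populated field."""
    return "green" if pred.confidence >= threshold else "yellow"


def review_queue(preds: list[FieldPrediction]) -> list[FieldPrediction]:
    """Fields that need physician attention, least confident first."""
    flagged = [p for p in preds if indicator(p) == "yellow"]
    return sorted(flagged, key=lambda p: p.confidence)


preds = [
    FieldPrediction("systolic_bp", "128", 0.99),
    FieldPrediction("hpi_onset", "3 days ago", 0.72),
    FieldPrediction("allergy_reaction", "rash", 0.85),
]
print([p.name for p in review_queue(preds)])  # ['hpi_onset', 'allergy_reaction']
```

Sorting the flagged fields by ascending confidence is one way to point the reviewer at the least certain values first.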

Technical Architecture

Capture Layer

The physician wears a smartphone mounted on their upper arm using a medical-grade armband. The phone captures:

  • Audio: continuous recording of the doctor-patient conversation using the phone's microphone
  • Video: optional — captures visual context (physical exam maneuvers, skin lesions, range of motion assessments)
  • Ambient context: encounter duration, patient identification (via QR code scan at start), physician identification (biometric)

Audio Processing Pipeline

The audio goes through a 7-stage processing pipeline:

  1. Raw Audio Capture: 16 kHz, 16-bit PCM audio stream from the device microphone
  2. Noise Reduction: ambient noise filtering (other patients, hallway sounds, medical devices beeping) using spectral gating
  3. Speaker Diarization: separating the doctor's voice from the patient's voice. This is critical for understanding who said what: "I have chest pain" means something different coming from the doctor than from the patient
  4. Medical ASR (Automatic Speech Recognition): speech-to-text optimized for medical terminology — drug names, anatomical terms, procedure names, abbreviations (q.i.d., b.i.d., PRN)
  5. Clinical Named Entity Recognition: extracting medical entities from the transcript — medications (blue), conditions (red), procedures (green), anatomical sites (purple)
  6. Field Mapping: mapping extracted entities to the 320 clinical field schema — connecting "blood pressure is 128 over 82" to the Systolic BP and Diastolic BP fields
  7. SOAP Note Generation: assembling all mapped fields into a structured clinical document in SOAP format (Subjective, Objective, Assessment, Plan)
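
Stage 6 (field mapping) can be sketched for the blood pressure example. This is a simplified, rule-based illustration only: the regular expression, the function name, and the `systolic_bp` / `diastolic_bp` keys are hypothetical, and the production mapper described here relies on NER models rather than a single pattern.

```python
import re

# Hypothetical pattern for spoken blood pressure readings. A real mapper
# would have to cover many more phrasings and the rest of the schema.
BP_PATTERN = re.compile(
    r"blood pressure (?:is|was|of)?\s*(\d{2,3})\s*(?:over|/)\s*(\d{2,3})",
    re.IGNORECASE,
)


def map_blood_pressure(utterance: str) -> dict[str, int]:
    """Map a transcript utterance to the Systolic BP / Diastolic BP fields."""
    match = BP_PATTERN.search(utterance)
    if match is None:
        return {}  # nothing to map; leave both fields empty
    return {
        "systolic_bp": int(match.group(1)),
        "diastolic_bp": int(match.group(2)),
    }


print(map_blood_pressure("Your blood pressure is 128 over 82 today."))
# {'systolic_bp': 128, 'diastolic_bp': 82}
```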

Encounter Summary: The SOAP Note

The AI generates a complete SOAP note organized in four columns:

  • Subjective: chief complaint, history of present illness (onset, duration, severity, modifying factors), past medical history, current medications, allergies, social history
  • Objective: vital signs (auto-extracted from verbal report or integrated device data), physical examination findings organized by body system
  • Assessment: numbered diagnoses with suggested ICD-10 codes, each with an AI confidence score
  • Plan: treatment orders, prescriptions (drug, dose, route, frequency), referrals, follow-up scheduling, patient education points
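
One way to picture the four-section note is as a small data structure. The class and attribute names below are hypothetical, and the plain-text rendering is an assumption; the article does not describe the platform's internal note representation.

```python
from dataclasses import dataclass, field


@dataclass
class Diagnosis:
    description: str
    icd10_code: str    # suggested code, pending physician review
    confidence: float  # AI confidence score in [0, 1]


@dataclass
class SoapNote:
    subjective: dict[str, str] = field(default_factory=dict)
    objective: dict[str, str] = field(default_factory=dict)
    assessment: list[Diagnosis] = field(default_factory=list)
    plan: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Render the note as plain text, one SOAP section after another."""
        lines = ["SUBJECTIVE"]
        lines += [f"  {k}: {v}" for k, v in self.subjective.items()]
        lines.append("OBJECTIVE")
        lines += [f"  {k}: {v}" for k, v in self.objective.items()]
        lines.append("ASSESSMENT")
        lines += [
            f"  {i}. {d.description} ({d.icd10_code}, confidence {d.confidence:.0%})"
            for i, d in enumerate(self.assessment, start=1)
        ]
        lines.append("PLAN")
        lines += [f"  - {item}" for item in self.plan]
        return "\n".join(lines)


note = SoapNote(
    subjective={"Chief complaint": "Follow-up for elevated blood pressure"},
    objective={"Blood pressure": "148/94 mmHg"},
    assessment=[Diagnosis("Essential hypertension", "I10", 0.93)],
    plan=["Recheck blood pressure in 2 weeks"],
)
print(note.render())
```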

320 Clinical Fields

The system captures and maps 320 discrete clinical fields across 13 categories:

Category | Fields | Examples
Demographics | 12 | Age, gender, ethnicity, preferred language, emergency contact
Vital Signs | 8 | BP systolic/diastolic, HR, temp, SpO2, RR, weight, BMI
Chief Complaint | 5 | Primary complaint, duration, severity, location, onset type
HPI | 22 | Onset, location, duration, character, severity, timing, modifying factors, associated symptoms
Past Medical History | 18 | Conditions, surgeries, hospitalizations, psychiatric history
Family History | 15 | Cancer, cardiac, diabetes, mental health (per family member)
Social History | 14 | Smoking, alcohol, drugs, exercise, occupation, living situation
Medications | 20 | Drug name, dose, route, frequency, indication, prescriber, start date
Allergies | 8 | Allergen, reaction type, severity, onset date
Review of Systems | 112 | 14 body systems × 8 avg symptoms each (constitutional, HEENT, cardiovascular, respiratory, GI, GU, MSK, neuro, skin, psych, endo, heme/lymph, immunologic, eye)
Physical Exam | 45 | General appearance, HEENT findings, cardiac, lungs, abdomen, extremities, neurological, skin, MSK
Assessment | 15 | Diagnoses (up to 5), ICD-10 codes, severity, chronicity, clinical reasoning
Plan | 26 | Medications prescribed, labs ordered, imaging ordered, referrals, procedures scheduled, follow-up, patient education
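
As a quick consistency check, the per-category counts in the table above do add up to 320 fields across 13 categories. The dictionary below simply restates the table; only the variable names are new.

```python
# Field counts per category, copied from the table above.
FIELDS_PER_CATEGORY = {
    "Demographics": 12,
    "Vital Signs": 8,
    "Chief Complaint": 5,
    "HPI": 22,
    "Past Medical History": 18,
    "Family History": 15,
    "Social History": 14,
    "Medications": 20,
    "Allergies": 8,
    "Review of Systems": 14 * 8,  # 14 body systems x 8 symptoms on average
    "Physical Exam": 45,
    "Assessment": 15,
    "Plan": 26,
}

total_fields = sum(FIELDS_PER_CATEGORY.values())
print(len(FIELDS_PER_CATEGORY), total_fields)  # 13 320
```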

EHR Integration: Two Pathways

Documentation needs to flow into the physician's EHR seamlessly. We built two integration pathways to cover both modern and legacy EHR systems:

Path A: RPN Bridge (Legacy EHRs)

For EHR systems without robust FHIR APIs (older Epic installations, legacy Allscripts, proprietary systems), we use a Robotic Process Navigation (RPN) bridge. The RPN agent navigates the EHR's UI programmatically — opening the patient chart, clicking into each documentation section, and populating fields as if a human user were typing. This works with any EHR that has a desktop client, regardless of API availability.

Path B: Direct FHIR Push (Modern EHRs)

For EHRs with FHIR R4 APIs (Epic with Open.Epic, Cerner Ignite, athenahealth), we push structured documentation directly via FHIR resources: DocumentReference (clinical note), Encounter (visit context), Condition (diagnoses), MedicationRequest (prescriptions), ServiceRequest (orders). This is faster, more reliable, and maintains data structure end-to-end.
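
To make the FHIR pathway concrete, here is a minimal sketch of the clinical-note payload as a FHIR R4 DocumentReference. The function name, the patient and encounter IDs, and the choice of LOINC code 11506-3 (Progress note) are assumptions for illustration. Sending the resource to an EHR (an authenticated POST to the vendor's FHIR endpoint) is deliberately left out.

```python
import base64
import json
from datetime import datetime, timezone


def build_document_reference(note_text: str, patient_id: str, encounter_id: str) -> dict:
    """Wrap a signed note in a FHIR R4 DocumentReference resource."""
    # FHIR attachments carry inline content as base64.
    encoded = base64.b64encode(note_text.encode("utf-8")).decode("ascii")
    return {
        "resourceType": "DocumentReference",
        "status": "current",
        "type": {
            "coding": [{
                "system": "http://loinc.org",
                "code": "11506-3",  # assumed document type: Progress note
                "display": "Progress note",
            }]
        },
        "subject": {"reference": f"Patient/{patient_id}"},
        "date": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "content": [{
            "attachment": {"contentType": "text/plain", "data": encoded}
        }],
        # In R4, context.encounter is a list of references.
        "context": {"encounter": [{"reference": f"Encounter/{encounter_id}"}]},
    }


# Hypothetical IDs; real ones come from the EHR.
doc = build_document_reference("S: ...\nO: ...\nA: ...\nP: ...", "example-patient", "example-encounter")
print(json.dumps(doc, indent=2))
```

Diagnoses, prescriptions, and orders would travel as separate Condition, MedicationRequest, and ServiceRequest resources that reference the same patient and encounter.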

Feature | RPN Bridge | FHIR Direct Push
Latency | 30-60 seconds | 2-5 seconds
Reliability | 95% (UI changes can break navigation) | 99.9% (API-based)
Supported EHRs | Any with desktop client | Epic, Cerner, athena, NextGen
Data structure | Loses some structure (flat text) | Full FHIR structure preserved
Setup time | 1-2 weeks per EHR | 2-4 weeks per EHR (API registration)

Provider Analytics Dashboard

Each physician has a personal analytics dashboard showing their documentation efficiency:

  • Encounters documented today with average time per note
  • AI accuracy trend: overall and by section (Vitals 99%, Medications 97%, History 95%, Physical Exam 94%, Assessment 93%)
  • Time saved: cumulative hours reclaimed from manual documentation
  • Learning curve: as the AI learns each physician's documentation patterns, accuracy improves over time — typically reaching peak accuracy within 2-3 weeks

Results and Impact

Metric | Before | After | Impact
Documentation time per encounter | 12 minutes | 2.3 minutes | 81% reduction
After-hours charting | 2+ hours/day | Zero | Eliminated pajama time
Documentation error rate | 15% | 3.6% | 76% fewer errors
Clinical fields captured | ~180 (manual average) | 320 (AI captures all) | 78% more complete notes
ICD-10 coding accuracy | 82% | 94.7% | 15% improvement → revenue uplift
Physician satisfaction (burnout index) | 3.2/5 (moderate burnout) | 4.4/5 (engaged) | Dramatic quality of life improvement
Revenue per encounter | $142 avg | $158 avg | $16/encounter uplift from better coding

Financial Impact

For a 5-physician primary care practice seeing 25 patients each per day:

  • Time saved: 5 physicians × 25 encounters × 9.7 min saved = 20+ hours/day reclaimed
  • Revenue uplift from coding: 125 encounters/day × $16/encounter = $2,000/day → $500,000/year (assuming about 250 working days per year)
  • Scribe cost avoided: $150,000/year per scribe × 5 = $750,000/year not spent
  • Error reduction: fewer denied claims from documentation errors = est. $80,000/year
  • Total annual value: $1.33M for a 5-physician practice
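
The arithmetic behind these figures can be reproduced in a few lines. The inputs come from the numbers above; the 250 working days per year is an assumption inferred from the step from $2,000/day to $500,000/year.

```python
PHYSICIANS = 5
ENCOUNTERS_PER_PHYSICIAN = 25
MINUTES_SAVED = 12 - 2.3          # per encounter, from the results table
UPLIFT_PER_ENCOUNTER = 158 - 142  # dollars, from the results table
WORKING_DAYS = 250                # assumption implied by $2,000/day -> $500,000/year
SCRIBE_COST = 150_000             # dollars per scribe per year
DENIAL_SAVINGS = 80_000           # estimated dollars per year

encounters_per_day = PHYSICIANS * ENCOUNTERS_PER_PHYSICIAN
hours_saved_per_day = encounters_per_day * MINUTES_SAVED / 60
coding_uplift = encounters_per_day * UPLIFT_PER_ENCOUNTER * WORKING_DAYS
scribe_savings = SCRIBE_COST * PHYSICIANS
total = coding_uplift + scribe_savings + DENIAL_SAVINGS

print(f"{hours_saved_per_day:.1f} hours/day reclaimed")  # 20.2 hours/day reclaimed
print(f"${coding_uplift:,} coding uplift")                # $500,000 coding uplift
print(f"${total:,} total annual value")                   # $1,330,000 total annual value
```

Note that the scribe line assumes one scribe per physician, which is what the $750,000 figure implies.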

Technology Stack

Layer | Technology | Purpose
Mobile App | React Native | Audio/video capture, patient encounter management
Web Dashboard | React + TypeScript | Documentation review, analytics, admin
Backend | Node.js + Python | API gateway (Node), NLP pipeline (Python)
ASR | Custom medical ASR model | Speech-to-text optimized for medical terminology
NLP | spaCy + custom NER models | Clinical entity extraction, field mapping
Database | PostgreSQL + Redis | Encounter data, field mappings, real-time processing
EHR Integration | FHIR R4 + RPN bridge | Modern API + legacy UI automation
Video | WebRTC + HLS | Real-time capture, on-demand playback
Infrastructure | AWS (HIPAA) | GPU instances for NLP, encrypted storage

Compliance

  • HIPAA: all audio/video encrypted at rest (AES-256) and in transit (TLS 1.3). Audio retained per client policy (typically 30-90 days), then securely deleted. Access audit logs for every encounter.
  • Patient Consent: verbal consent captured at the start of each encounter. Consent flag stored with the encounter record. Patients can request audio deletion at any time.
  • De-identification: all AI model training uses de-identified transcripts. No PHI in training data.
  • State Recording Laws: configurable consent requirements per state (one-party vs. two-party consent states).
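
Per-state consent configuration could look something like the sketch below. The enum, the lookup table, and the function name are hypothetical, and the four states listed are an illustrative subset; a deployment would load a complete, legally reviewed table from configuration.

```python
from enum import Enum


class ConsentMode(Enum):
    ONE_PARTY = "one_party"
    ALL_PARTY = "all_party"  # often called "two-party" consent


# Illustrative subset only, not a complete or authoritative list.
STATE_CONSENT = {
    "CA": ConsentMode.ALL_PARTY,
    "FL": ConsentMode.ALL_PARTY,
    "NY": ConsentMode.ONE_PARTY,
    "TX": ConsentMode.ONE_PARTY,
}


def requires_all_party_consent(state: str) -> bool:
    """True if every participant must consent before recording starts."""
    # Unknown states fall back to the stricter rule.
    mode = STATE_CONSENT.get(state.upper(), ConsentMode.ALL_PARTY)
    return mode is ConsentMode.ALL_PARTY


print(requires_all_party_consent("ca"))  # True
print(requires_all_party_consent("NY"))  # False
```

Defaulting unknown states to the stricter mode keeps a missing configuration entry from silently relaxing the consent requirement.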

Project Timeline

Phase | Duration | Deliverables
Phase 1 | 4 weeks | Mobile capture app, audio pipeline, basic ASR integration
Phase 2 | 6 weeks | Clinical NLP engine, 320-field schema, speaker diarization, SOAP note generation
Phase 3 | 4 weeks | Documentation review UI, EHR integration (FHIR + RPN), provider analytics dashboard
Phase 4 | 4 weeks | Accuracy tuning, physician onboarding with 3 pilot practices, compliance audit, production hardening

Total: 18 weeks (4.5 months at four weeks per month) with a team of 4 engineers + 1 NLP specialist.

Lessons Learned

  • Speaker diarization is the hardest problem in the pipeline. Distinguishing the doctor from the patient in a noisy exam room with overlapping speech was our biggest technical challenge. We achieved 94% diarization accuracy, which is high enough for clinical use but still requires physician review of edge cases.
  • Physicians don't want perfection; they want speed. 96% accuracy with a 2-minute review is vastly preferred over 99% accuracy with a 5-minute review. The time savings matter more than the last few percentage points of accuracy.
  • The 320-field schema was designed with physicians. We didn't pick 320 fields arbitrarily. We worked with 15 physicians across primary care, orthopedics, and cardiology to identify every field they document. The schema covers 98% of outpatient documentation needs.
  • RPN is a bridge, not a destination. The RPN integration works for legacy EHRs today, but it's fragile — EHR UI updates break the navigation. FHIR is the long-term path. We built both because providers need solutions now, not after their EHR vendor ships a FHIR API.
  • Revenue uplift sells the platform. Physicians love the time savings. Practice managers love the revenue uplift from better coding. Together, the ROI case is overwhelming.

Building interoperable healthcare systems is complex. Our Healthcare Interoperability Solutions team has deep experience shipping production integrations. We also offer specialized Agentic AI for Healthcare services. Talk to our team to get started.
