Cloud / Azure — Python · OCR · Healthcare AI

Azure Health Auditor
Handwritten notes to billing data, automatically.

A cloud pipeline built on Microsoft Azure that reads handwritten clinical notes, extracts the medical acts performed, matches them to official RAMQ billing codes, audits them for clinical consistency, and generates a structured billing report — all without manual data entry.

Azure OCR · GPT-4o Vision · Healthcare · RAMQ · Pydantic · Streamlit · Docker
📄

Reads handwritten PDFs

Azure Document Intelligence scans each word and assigns a confidence score. Low-confidence words are re-analyzed visually by GPT-4o — no manual transcription needed.

🏥

Extracts medical acts

A specialized medical NLP service identifies every clinical element — procedures, medications, measurements, symptoms — and verifies nothing was invented or omitted.

💰

Matches billing codes

Each act is matched to its official RAMQ tariff code using vector similarity search. An automatic fallback activates if the primary engine is unavailable.

🛡️

Audits for clinical logic

An AI auditor cross-checks each billed act against the full clinical context of the patient file. Billing an analgesic for a patient with zero pain, for example, is flagged as an anomaly.

📊

Generates billing reports

Validated acts are exported as structured JSON and CSV with the total billable amount — only acts that passed the audit are included.

🖥️

Manager dashboard

A Streamlit interface lets clinic managers approve clean dossiers in one click or manually correct flagged billing codes before archiving.

01

OCR Reading

The PDF is sent to Azure Document Intelligence. Each word receives a confidence score — words below the threshold are sent to GPT-4o Vision for visual correction. The pipeline stops entirely if too many critical words are unreadable.
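The confidence gate described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the `OcrWord` type, the `critical` flag, the 0.80 threshold, and the limit of 3 unreadable critical words are all assumptions made for the example. The source does not say whether the stop condition is evaluated before or after visual correction; here it is checked before.

```python
from dataclasses import dataclass


@dataclass
class OcrWord:
    text: str
    confidence: float       # 0.0 to 1.0, as reported by the OCR service
    critical: bool = False  # hypothetical flag: the word matters for billing


class UnreadableDocumentError(Exception):
    """Raised when too many critical words fall below the threshold."""


def triage_words(words, threshold=0.80, max_unreadable_critical=3):
    """Split words into accepted ones and ones needing visual correction.

    Both default values are illustrative assumptions.
    """
    accepted = [w for w in words if w.confidence >= threshold]
    to_review = [w for w in words if w.confidence < threshold]

    # Stop the pipeline early if too many billing-critical words are doubtful.
    unreadable_critical = sum(1 for w in to_review if w.critical)
    if unreadable_critical > max_unreadable_critical:
        raise UnreadableDocumentError(
            f"{unreadable_critical} critical words below {threshold:.2f}"
        )
    return accepted, to_review
```

Words returned in `to_review` would be the ones handed to the vision model for a second reading.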

02

Medical Extraction

Azure Text Analytics for Health identifies all clinical entities. The system then verifies mathematically that the LLM produced exactly what the medical service found — no hallucinations, no omissions.
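One plausible way to implement such a check is a multiset comparison between the entities returned by the medical service and those present in the LLM output. The sketch below assumes both sides can be reduced to lists of strings; the `normalize` and `reconcile` helpers are hypothetical names, and the real comparison may work differently.

```python
from collections import Counter


def normalize(term: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not count."""
    return " ".join(term.lower().split())


def reconcile(service_entities, llm_entities):
    """Compare two entity lists as multisets.

    Returns which entities the LLM dropped (omitted) and which it
    produced without support from the medical service (invented).
    """
    expected = Counter(normalize(e) for e in service_entities)
    produced = Counter(normalize(e) for e in llm_entities)
    omitted = sorted((expected - produced).elements())
    invented = sorted((produced - expected).elements())
    return {
        "ok": not omitted and not invented,
        "omitted": omitted,
        "invented": invented,
    }
```

Using `Counter` rather than `set` means an entity mentioned twice in the note must also appear twice in the LLM output.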

03

RAMQ Code Matching

Each act is compared to the official RAMQ tariff database using optimized vector search (NumPy dot product with L2-normalized OpenAI embeddings). A secondary LLM fallback ensures no act goes unmatched.
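With L2-normalized vectors, cosine similarity reduces to a plain dot product, so the whole tariff table can be scored in a single matrix-vector multiplication. The sketch below shows that idea with toy 3-dimensional vectors. The function names are hypothetical, the codes are placeholders rather than real RAMQ codes, and real embeddings would come from the embedding API instead of being typed in by hand.

```python
import numpy as np


def l2_normalize(matrix: np.ndarray) -> np.ndarray:
    """Scale each vector (last axis) to unit length."""
    norms = np.linalg.norm(matrix, axis=-1, keepdims=True)
    return matrix / np.clip(norms, 1e-12, None)  # guard against zero vectors


def top_k_codes(query_vec, code_matrix, codes, k=3):
    """Return the k codes whose embeddings are most similar to the query.

    code_matrix has one row per code. After normalization, the dot
    product of two vectors equals their cosine similarity.
    """
    q = l2_normalize(np.asarray(query_vec, dtype=np.float64))
    m = l2_normalize(np.asarray(code_matrix, dtype=np.float64))
    scores = m @ q                         # shape: (number of codes,)
    best = np.argsort(scores)[::-1][:k]    # indices of the highest scores
    return [(codes[i], float(scores[i])) for i in best]
```

In practice the tariff matrix would be normalized once at load time, so each lookup costs only the normalization of the query and one multiplication.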

04

Consistency Audit

An LLM playing the role of medical auditor verifies that each billed act is clinically justified within the full context of the patient file. Anomalies are flagged for human review rather than silently passed.
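The "flag rather than silently pass" behaviour can be made explicit when parsing the auditor's reply. The sketch below assumes the LLM is asked to answer with JSON of the form `{"justified": bool, "reason": str}`; that schema, the `AuditVerdict` type, and the `parse_verdict` helper are assumptions for illustration. Any reply that cannot be parsed is treated as an anomaly, so a malformed answer never counts as approval.

```python
import json
from dataclasses import dataclass


@dataclass
class AuditVerdict:
    act: str
    justified: bool
    reason: str

    @property
    def needs_review(self) -> bool:
        """Anything not explicitly justified goes to a human."""
        return not self.justified


def parse_verdict(act: str, raw_reply: str) -> AuditVerdict:
    """Parse the auditor's JSON reply, failing closed on bad input."""
    try:
        data = json.loads(raw_reply)
        justified = data["justified"]
        if not isinstance(justified, bool):
            raise ValueError("'justified' must be a boolean")
        return AuditVerdict(act, justified, str(data.get("reason", "")))
    except (ValueError, KeyError, TypeError) as exc:
        # json.JSONDecodeError is a ValueError; a missing key or a
        # non-object reply is caught by KeyError / TypeError.
        return AuditVerdict(act, False, f"unparseable auditor reply: {exc}")
```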

05

Billing Report

Validated acts are consolidated into a structured JSON file and a CSV export, with the total billable amount calculated. The manager dashboard allows one-click approval or granular correction.
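A minimal sketch of the export step, using only the standard library. The field names (`description`, `code`, `amount`, `passed_audit`) and the `build_report` function are assumptions, and amounts are handled as `Decimal` values built from strings to avoid floating-point rounding on currency.

```python
import csv
import io
import json
from decimal import Decimal

FIELDS = ["description", "code", "amount"]


def build_report(acts):
    """Build JSON and CSV text from acts that passed the audit.

    Each act is a dict with 'description', 'code', 'amount' (a decimal
    string such as "42.50") and 'passed_audit' (bool).
    """
    validated = [a for a in acts if a["passed_audit"]]
    total = sum((Decimal(a["amount"]) for a in validated), Decimal("0"))

    rows = [{k: a[k] for k in FIELDS} for a in validated]
    report = {"acts": rows, "total_billable": str(total)}
    json_text = json.dumps(report, indent=2, ensure_ascii=False)

    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return json_text, buffer.getvalue()
```

Acts that failed the audit never reach either output, which matches the rule that only validated acts are billed.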

[Figure: Azure Health Auditor pipeline architecture. End-to-end pipeline, from PDF upload to validated billing report.]

OCR: Azure Document Intelligence
Visual Correction: GPT-4o Vision
Medical NLP: Azure Text Analytics for Health
Code Matching: Vector Search (NumPy + OpenAI)
Data Validation: Pydantic V2
Interface: Streamlit
Deployment: Docker · Azure Container Apps
CI / CD: GitHub Actions · Ruff · MyPy
