Taming LLM Hallucinations in Clinical Triage Routing
We built an automated clinical routing system for a setting where mistakes are catastrophic. This article shares the architectural blueprint of our deterministic guardrail layer and the RAG pipeline that strictly enforces the medical ontology.
Category
Applied AI
Read Time
7 min
Published
Nov 2025
Stack
6 technologies
An NHS trust wanted to automate the first-pass triage routing of clinical referral letters — a task currently performed by specialist nurses. The potential efficiency gain was significant. The risk was also significant: a misrouted referral could delay urgent cancer treatment by weeks. Standard LLM output was non-deterministic and occasionally hallucinated clinical codes that did not exist in the ICD-11 taxonomy. We needed determinism the model couldn't provide on its own.
01. LLMs produced plausible but invalid ICD-11 codes at a rate of approximately 2.3%, which is catastrophic in a clinical context
02. Referral letters mixed structured data, clinical shorthand, and free text with no consistent schema
03. The system needed to escalate ambiguous cases to human review without disrupting routing throughput
04. NHS data governance required all processing to remain on-premises, with no data leaving the trust boundary
We built a two-stage architecture: a RAG pipeline grounded in the full ICD-11 taxonomy and the trust's clinical protocol documents, followed by a deterministic validation layer that rejected any output not resolvable to a verified code. Ambiguity scoring routed uncertain cases to a human review queue with pre-populated suggested codes. The entire stack ran on-premises using a fine-tuned Mistral 7B model, with no external API calls.
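The core of the guardrail is simple: nothing the model emits is trusted until it resolves to an entry in an authoritative, locally held code set. A minimal sketch of that validation layer, assuming the ICD-11 taxonomy has been loaded into a lookup set (the codes and loader shown here are illustrative, not the trust's actual data):

```python
# Hypothetical deterministic validation layer. In production the set would be
# built from the full ICD-11 taxonomy; these entries are illustrative only.
VALID_ICD11_CODES: set[str] = {"2C25.0", "DA42.Z", "CA40.1"}

def validate_codes(model_output: list[str]) -> tuple[list[str], list[str]]:
    """Split model-proposed codes into verified and rejected lists.

    Anything not resolvable to a known code is rejected outright rather
    than passed downstream -- the model never gets the benefit of the doubt.
    """
    verified = [c for c in model_output if c in VALID_ICD11_CODES]
    rejected = [c for c in model_output if c not in VALID_ICD11_CODES]
    return verified, rejected
```

A referral whose output contains any rejected code never auto-routes; it falls through to the human review queue described below.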
01. Never trust LLM output directly in clinical or safety-critical contexts: always enforce downstream validation against authoritative data
02. RAG grounded in your specific domain ontology reduces the hallucination rate by an order of magnitude compared to base models
03. Escalating ambiguity to humans is not a failure state: design it as a first-class feature from day one
04. On-premises fine-tuned open-weight models are production-viable, and often the only compliant option in regulated environments
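Treating escalation as a first-class feature means the routing function has two legitimate outcomes, not one outcome and an error path. A sketch of how ambiguity scoring might drive that decision, assuming scored candidate codes from the RAG stage (the threshold, names, and margin heuristic are illustrative assumptions, not the production scoring logic):

```python
from dataclasses import dataclass

REVIEW_QUEUE = "human_review"
AMBIGUITY_THRESHOLD = 0.15  # assumed margin below which we escalate

@dataclass
class RoutingDecision:
    destination: str            # a clinical pathway code, or the review queue
    suggested_codes: list[str]  # pre-populated suggestions for the reviewer

def route(candidates: list[tuple[str, float]]) -> RoutingDecision:
    """Auto-route only when the top candidate clearly dominates.

    Otherwise return a review-queue decision carrying the top suggestions,
    so the human reviewer starts from a ranked shortlist, not a blank form.
    """
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    top_code, top_score = ranked[0]
    runner_up_score = ranked[1][1] if len(ranked) > 1 else 0.0
    if top_score - runner_up_score >= AMBIGUITY_THRESHOLD:
        return RoutingDecision(destination=top_code, suggested_codes=[top_code])
    return RoutingDecision(destination=REVIEW_QUEUE,
                           suggested_codes=[code for code, _ in ranked[:3]])
```

Because the review queue is an ordinary destination, throughput metrics and audit logging cover escalated cases the same way they cover automatic routes.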