Taming LLM Hallucinations in Clinical Triage Routing
We built an automated clinical routing system for a setting where mistakes are catastrophic. This article shares the architectural blueprint of our deterministic guardrail layer and the RAG pipeline that strictly enforces the medical ontology.
Category
Applied AI
Read Time
7 min
Published
Nov 2025
Stack
6 technologies
An NHS trust wanted to automate the first-pass triage routing of clinical referral letters — a task currently performed by specialist nurses. The potential efficiency gain was significant. The risk was also significant: a misrouted referral could delay urgent cancer treatment by weeks. Standard LLM output was non-deterministic and occasionally hallucinated clinical codes that did not exist in the ICD-11 taxonomy. We needed determinism the model couldn't provide on its own.
01. LLMs produced plausible but invalid ICD-11 codes at a rate of approximately 2.3%, which is catastrophic in a clinical context
02. Referral letters mixed structured data, clinical shorthand, and free text with no consistent schema
03. The system needed to escalate ambiguous cases to human review without disrupting routing throughput
04. NHS data governance required all processing to remain on-premises, with no data leaving the trust boundary
We built a two-stage architecture: a RAG pipeline grounded in the full ICD-11 taxonomy and the trust's clinical protocol documents, followed by a deterministic validation layer that rejected any output not resolvable to a verified code. Ambiguity scoring routed uncertain cases to a human review queue with pre-populated suggested codes. The entire stack ran on-premises using a fine-tuned Mistral 7B model, with no external API calls.
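The core of the guardrail is simple: nothing the model emits is trusted until it resolves to an entry in an authoritative, locally held code set. A minimal sketch of that validation layer, assuming the ICD-11 taxonomy has been loaded into a lookup set (the codes and loader shown here are illustrative, not the trust's actual data):

```python
# Hypothetical deterministic validation layer. In production the set would be
# built from the full ICD-11 taxonomy; these entries are illustrative only.
VALID_ICD11_CODES: set[str] = {"2C25.0", "DA42.Z", "CA40.1"}

def validate_codes(model_output: list[str]) -> tuple[list[str], list[str]]:
    """Split model-proposed codes into verified and rejected lists.

    Anything not resolvable to a known code is rejected outright rather
    than passed downstream -- the model never gets the benefit of the doubt.
    """
    verified = [c for c in model_output if c in VALID_ICD11_CODES]
    rejected = [c for c in model_output if c not in VALID_ICD11_CODES]
    return verified, rejected
```

A referral whose output contains any rejected code never auto-routes; it falls through to the human review queue described below.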
01. Never trust LLM output directly in clinical or safety-critical contexts: always enforce downstream validation against authoritative data
02. RAG grounded in your specific domain ontology reduces the hallucination rate by an order of magnitude compared to base models
03. Escalating ambiguity to humans is not a failure state: design it as a first-class feature from day one
04. On-premises fine-tuned open-weight models are production-viable, and often the only compliant option in regulated environments
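Treating escalation as a first-class feature means the routing function has two legitimate outcomes, not one outcome and an error path. A sketch of how ambiguity scoring might drive that decision, assuming scored candidate codes from the RAG stage (the threshold, names, and margin heuristic are illustrative assumptions, not the production scoring logic):

```python
from dataclasses import dataclass

REVIEW_QUEUE = "human_review"
AMBIGUITY_THRESHOLD = 0.15  # assumed margin below which we escalate

@dataclass
class RoutingDecision:
    destination: str            # a clinical pathway code, or the review queue
    suggested_codes: list[str]  # pre-populated suggestions for the reviewer

def route(candidates: list[tuple[str, float]]) -> RoutingDecision:
    """Auto-route only when the top candidate clearly dominates.

    Otherwise return a review-queue decision carrying the top suggestions,
    so the human reviewer starts from a ranked shortlist, not a blank form.
    """
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    top_code, top_score = ranked[0]
    runner_up_score = ranked[1][1] if len(ranked) > 1 else 0.0
    if top_score - runner_up_score >= AMBIGUITY_THRESHOLD:
        return RoutingDecision(destination=top_code, suggested_codes=[top_code])
    return RoutingDecision(destination=REVIEW_QUEUE,
                           suggested_codes=[code for code, _ in ranked[:3]])
```

Because the review queue is an ordinary destination, throughput metrics and audit logging cover escalated cases the same way they cover automatic routes.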