Bigstrum
Applied AI · 7 min read · Nov 2025
Taming LLM Hallucinations in Clinical Triage Routing

Building an automated clinical routing system where mistakes are catastrophic. We share the architectural blueprint of our deterministic guardrail layer and RAG pipeline that strictly enforces medical ontology.

Overview

An NHS trust wanted to automate the first-pass triage routing of clinical referral letters — a task currently performed by specialist nurses. The potential efficiency gain was significant. The risk was also significant: a misrouted referral could delay urgent cancer treatment by weeks. Standard LLM output was non-deterministic and occasionally hallucinated clinical codes that did not exist in the ICD-11 taxonomy. We needed determinism the model couldn't provide on its own.

The Problem
  • 01

    LLMs produced plausible but invalid ICD-11 codes at a rate of approximately 2.3% — catastrophic in a clinical context

  • 02

    Referral letters mixed structured data, clinical shorthand, and free text with no consistent schema

  • 03

    The system needed to escalate ambiguous cases to human review without disrupting routing throughput

  • 04

    NHS data governance required all processing to remain on-premises with no data leaving the trust boundary

Our Approach

We built a two-stage architecture: a RAG pipeline grounded in the full ICD-11 taxonomy and the trust's clinical protocol documents, followed by a deterministic validation layer that rejected any output not resolvable to a verified code. Ambiguity scoring routed uncertain cases to a human review queue with pre-populated suggested codes. The entire stack ran on-premises using a fine-tuned Mistral 7B model, with no external API calls.

Key Takeaways
  • 01

    Never trust LLM output directly in clinical or safety-critical contexts — always enforce downstream validation against authoritative data

  • 02

    RAG grounded in your specific domain ontology reduces hallucination rate by an order of magnitude compared to base models

  • 03

    Ambiguity escalation to humans is not a failure state — design it as a first-class feature from day one

  • 04

    On-premises fine-tuned open-weight models are production-viable and often the only compliant option in regulated environments

Tech Stack

Mistral 7B · RAG · ICD-11 · Python · LangChain · On-Premises
