Anti-hallucination middleware

Catch what your LLM gets wrong, before it ships.

A drop-in safety layer that flags fabricated facts, marketing hype, and broken code in LLM output. Bring your own key. Verifiable. $0 marginal cost.

  • 0.86 F1 on real-world test
  • 93% precision
  • ~7× hallucination reduction
  • 12 verifiers, 5 languages

The problem

LLMs sound right. Often, they aren't.

Models confidently invent dates, tickers, statistics, and APIs. Generic guardrails catch tone, not truth.

Fabricated facts

Wrong founding years, made-up tickers, off-by-billion population numbers — all stated with full confidence.

Marketing hype

“Risk-free”, “100% uptime”, “guaranteed ROI”. Compliance and brand teams need these flagged at draft time.

Broken code

Imports from removed Python stdlib, deprecated APIs, Python-2 syntax. Compiles in the model's head, not on your machine.

How it works

Three detectors. One verdict.

Each LLM output is decomposed into claims, routed to the right verifier, and returned with a calibrated severity.
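
In sketch form, with every name illustrative rather than haluguard's actual API:

# Toy pipeline: decompose output into claims, route each to a verifier,
# return a severity per claim. Real decomposition and routing are richer.
from dataclasses import dataclass

@dataclass
class Claim:
    text: str
    kind: str = "pattern"       # "data", "pattern", or "code"
    severity: str = "clear"

def split_into_claims(output: str) -> list[Claim]:
    # Stub: sentence-level split standing in for real claim extraction.
    return [Claim(text=s.strip()) for s in output.split(".") if s.strip()]

def verify(claim: Claim) -> Claim:
    # Stub verifier: a real one dispatches on claim.kind to the data,
    # pattern, or code detector and returns a calibrated tier.
    if "guaranteed" in claim.text.lower():
        claim.severity = "suspicious"
    return claim

def check(output: str) -> list[Claim]:
    return [verify(c) for c in split_into_claims(output)]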

Data Detector

Cross-checks factual claims against authoritative sources; a lookup sketch follows this list.

  • Wikidata structured facts
  • SEC EDGAR (tickers, filings)
  • World Bank country stats
  • Frankfurter ECB exchange rates
  • NLI, math, and comparative-claim checks
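
A hedged example of one such lookup, using Wikidata's public SPARQL endpoint. The function and QID below are for illustration, not haluguard's internals:

# Cross-check a claimed founding year against Wikidata (P571, "inception").
import requests

def wikidata_inception_year(entity_qid: str) -> int | None:
    query = f"SELECT ?d WHERE {{ wd:{entity_qid} wdt:P571 ?d . }}"
    resp = requests.get(
        "https://query.wikidata.org/sparql",
        params={"query": query, "format": "json"},
        headers={"User-Agent": "fact-check-demo/0.1"},
        timeout=10,
    )
    rows = resp.json()["results"]["bindings"]
    return int(rows[0]["d"]["value"][:4]) if rows else None

claimed, qid = 2004, "Q95"          # Q95 = Google on Wikidata
actual = wikidata_inception_year(qid)
if actual is not None and actual != claimed:
    print(f"flag: claimed {claimed}, Wikidata records {actual}")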

Pattern Detector

Catches stylistic tells of fabrication and hype; a toy version is sketched after this list.

  • Speculative numbers without source
  • Risk denial / future certainty
  • Competitive overstatement
  • Multilingual (EN/ES/FR/DE/PT)
  • Negation-aware (no false flips)
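
A toy version of the hype check, including the negation guard. Patterns are trimmed for brevity; the shipped detector covers five languages:

# Flag hype phrases, but not their negations ("this is not risk-free").
import re

HYPE = re.compile(r"\b(risk[- ]free|100% uptime|guaranteed (?:ROI|returns?))\b",
                  re.IGNORECASE)
NEGATION = re.compile(r"\b(not|never|no)\b[^.]{0,40}$", re.IGNORECASE)

def flag_hype(sentence: str) -> bool:
    m = HYPE.search(sentence)
    # Only flag if no negation word appears shortly before the match.
    return bool(m) and not NEGATION.search(sentence[:m.start()])

print(flag_hype("Our plan delivers guaranteed ROI."))   # True
print(flag_hype("No investment is truly risk-free."))   # False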

Code Detector

Reads code blocks like a strict linter, plus model-specific traps; one such check is sketched after this list.

  • Removed Python stdlib
  • Deprecated collections.abc moves (e.g. collections.Mapping)
  • Python 2 holdovers
  • Async deprecations
  • Path / file-existence checks
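
One such check, sketched: catching imports of modules that no longer exist in the stdlib. The module list here follows PEP 594 and PEP 632; haluguard's actual tables are larger:

# Walk the AST and flag imports removed from recent Python releases.
import ast

REMOVED_STDLIB = {"imp", "distutils", "asynchat", "asyncore",
                  "smtpd", "telnetlib", "cgi", "cgitb"}

def removed_imports(code: str) -> set[str]:
    found = set()
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Import):
            found |= {alias.name.split(".")[0] for alias in node.names}
        elif isinstance(node, ast.ImportFrom) and node.module:
            found.add(node.module.split(".")[0])
    return found & REMOVED_STDLIB

print(sorted(removed_imports("import imp\nfrom distutils.core import setup")))
# ['distutils', 'imp']
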
Validated

Honest numbers, not benchmark theatre.

Held-out evaluation on FEVER, marketing copy, code snippets, and an unbiased, LLM-generated test set we couldn't tune for.

Test set                                   F1      Precision   Recall   AUROC
Unbiased real-world (LLM-gen, 41 cases)    0.86    0.86        0.86     0.954
FEVER factual                              0.83    0.93        0.75     0.852
Marketing copy                             0.81    0.84        0.79     0.813
Code snippets                              0.95    0.96        0.94     0.949

In context

Public hallucination detectors (Galileo, the semantic-entropy method published in Nature, generic NLI baselines) report 0.75–0.85 F1 in comparable settings. haluguard sits at the top of that band while staying explainable and BYOK.

What it means

A 30% baseline hallucination rate drops to roughly 4% on the output that passes through unflagged, about a 7× reduction without throwing away clean responses.
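
The back-of-envelope arithmetic, assuming the 0.86 recall from the unbiased set above:

# Residual hallucination rate after removing flagged responses.
baseline = 0.30                       # hallucination rate before filtering
recall = 0.86                         # share of hallucinations that get flagged
residual = baseline * (1 - recall)
print(round(residual, 3))             # 0.042, i.e. roughly 4%
print(round(baseline / residual, 1))  # 7.1, i.e. about a 7× reduction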

Get started

Three lines. Your key. Your data.

Wrap any LLM call. haluguard returns a verdict, severity, and per-claim evidence trail.

# pip install haluguard
from haluguard import Guard

guard = Guard(api_key="YOUR_LLM_KEY")      # bring your own LLM key
report = guard.check(prompt, llm_output)   # wrap your existing LLM call

# Send anything the verifiers didn't pass as clean to human review.
if report.severity_tier != "clear":
    log_for_review(report)
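
The report also carries the per-claim evidence trail. The field names below are assumptions for illustration; check the API reference for the exact schema:

# Hypothetical field names; consult the documented schema before relying on them.
for claim in report.claims:
    if claim.verdict != "supported":
        print(claim.text, claim.verdict, claim.evidence)
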
Start — $10/mo · See pricing

Pricing

One simple plan to start.

Bring your own LLM key. We charge for the verification layer, not the LLM. Cancel anytime.

Hobby
$10 / month
  • 10,000 verifications / month
  • All 12 verifiers + 5 languages
  • API + MCP server (Claude Code, Cursor, …)
  • Bring your own LLM key — $0 extra
  • Cancel anytime
Subscribe — $10/mo

Need more volume or on-prem deploy? Email us.