Anti-hallucination middleware

Catch what your LLM gets wrong, before it ships.

A drop-in safety layer that flags fabricated facts, marketing hype, and broken code in LLM output. Plugs into Claude Code, Cursor, ChatGPT Desktop. Bring your own key. Unlimited use, $5/mo.

Start — $5/mo See the numbers

95%

Recall (hallucinations caught)

0.91

83%

Precision

Verifiers, 5 languages

The problem

LLMs sound right. Often, they aren't.

Models confidently invent dates, tickers, statistics, and APIs. Generic guardrails catch tone, not truth.

Fabricated facts

Wrong founding years, made-up tickers, off-by-billion population numbers — all stated with full confidence.

Marketing hype

“Risk-free”, “100% uptime”, “guaranteed ROI”. Compliance and brand teams need these flagged at draft time.

Broken code

Imports from removed Python stdlib, deprecated APIs, Python-2 syntax. Compiles in the model's head, not yours.

How it works

Three detectors. One verdict.

Each LLM output is decomposed into claims, routed to the right verifier, and returned with a calibrated severity.

Data Detector

Cross-checks factual claims against authoritative sources.

Wikidata structured facts
SEC EDGAR (tickers, filings)
World Bank country stats
Frankfurter ECB exchange rates
NLI + math + comparative

Pattern Detector

Catches stylistic tells of fabrication and hype.

Speculative numbers without source
Risk denial / future certainty
Competitive overstatement
Multilingual (EN/ES/FR/DE/PT)
Negation-aware (no false flips)

Code Detector

Reads code blocks like a strict linter, plus model-specific traps.

Removed Python stdlib
Deprecated ABC moves
Python 2 holdovers
Async deprecations
Path / file-existence checks

Validated

Honest numbers, not benchmark theatre.

Held-out evaluation on FEVER, marketing copy, code snippets, and an LLM-generated unbiased test set we couldn't tune for.

Test set	F1	Precision	Recall	AUROC	vs industry
Real-world test	0.91	0.83	0.95	—	+0.16
Data detector	0.83	0.93	0.75	0.852	+0.08
Pattern detector	0.81	0.84	0.79	0.813	+0.11
Code detector	0.95	0.96	0.94	0.949	+0.10

Industry baseline: public hallucination detectors (Galileo ~0.75 F1, Nature semantic-entropy 0.79 AUROC, generic NLI 0.72) sit in the 0.70–0.85 range on comparable workloads. haluguard delivers the same precision while remaining explainable and BYOK.

What it means

A 30% baseline hallucination rate drops to roughly 4% on flagged-only output — about a 7× reduction without throwing away clean responses.

Honest framing

We don't claim to be the absolute best — but we are verifiable. Every flagged claim comes with a source URL you can audit.

Get started

Three lines. Your key. Your data.

Wrap any LLM call. haluguard returns a verdict, severity, and per-claim evidence trail.

# pip install haluguard
from haluguard import Guard

guard = Guard(api_key=YOUR_LLM_KEY)
report = guard.check(prompt, llm_output)

if report.severity_tier in ("clear", "suspicious"):
    log_for_review(report)

Start — $5/mo See pricing

Pricing

One simple plan to start.

For individual developers using Claude Code, Cursor, ChatGPT Desktop or any MCP-compatible host. Bring your own LLM key — we charge only for the verification layer.

For developers

haluguard

$5 / month

Unlimited verifications
All 12 verifiers + 5 languages
MCP server for Claude Code, Cursor, ChatGPT Desktop
Direct API for personal scripts / CI
Bring your own LLM key — $0 marginal cost
Cancel anytime

Subscribe — $5/mo

Built for individual developers shipping with LLMs.

For teams

haluguard Enterprise

Custom pricing

Shared team workspaces + SSO
Audit logs (compliance-grade)
SLA + dedicated support
On-prem deployment available
Volume discounts on verification quota
Direct API for production traffic

Contact sales

Coming soon. Email us to be notified or for custom integrations today.