Fixing False Positives in AI Compliance Scanning

Damir Andrijanic

Most AI compliance scanners are far less deterministic than users think. Paste a URL, wait 30 seconds, get a report. The workflow looks simple; the behavior in production is not.

While improving ComplianceRadar.dev, we repeatedly hit one of the most common and most dangerous issues in automated compliance tooling: false positives in cookie consent detection.

Sites that clearly had consent banners were still flagged as missing a consent mechanism. That sounds like a small bug. In compliance systems, it is a trust problem.

Why This Happens

The naive implementation is everywhere: fetch HTML, send text to an LLM, and ask whether a consent banner exists. It works in demos and fails on modern websites.

Today's sites render consent UI dynamically, inject CMP components asynchronously, localize wording, isolate interfaces in iframes or shadow DOM, and change behavior by region and browser context. Static HTML alone often misses the real UI, and without deterministic evidence the model is left to guess.

The Distinction Many Scanners Miss

Two different conditions are frequently conflated:

  • No consent UI exists.
  • Consent UI exists, but tracking starts before consent.

These are not equivalent findings. A site can present a visible banner, offer reject controls, and still load analytics or marketing scripts before the visitor has made a choice. Good compliance scanning must preserve that nuance.
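One way to keep the two conditions apart is to give each its own finding type instead of a single pass/fail flag. The sketch below is illustrative only: the `Finding` enum, its labels, and the `classify` function are hypothetical names, not part of any real scanner.

```python
from enum import Enum


class Finding(Enum):
    # Hypothetical labels, chosen for illustration only.
    NO_CONSENT_UI = "no consent UI detected"
    PRE_CONSENT_TRACKING = "consent UI present, tracking before consent"
    NO_ISSUE_OBSERVED = "consent UI present, no pre-consent tracking observed"


def classify(banner_seen: bool, tracking_before_consent: bool) -> Finding:
    """Keep 'no banner' and 'banner, but early tracking' as separate findings."""
    if not banner_seen:
        return Finding.NO_CONSENT_UI
    if tracking_before_consent:
        return Finding.PRE_CONSENT_TRACKING
    return Finding.NO_ISSUE_OBSERVED
```

A site with a visible banner and early analytics then reports `PRE_CONSENT_TRACKING` rather than being collapsed into "no consent mechanism".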

Moving Beyond LLM Guessing

To reduce false positives, we changed the scanning pipeline by introducing a lightweight browser evidence layer.

The layer gathers three kinds of signal:

  • UI evidence: banner detection, reject controls, CMP hints.
  • Runtime signals: script timing, cookie activity, request behavior.
  • Language coverage: multilingual consent snippets and vendor variants.

The system now collects structured technical signals during the normal scan path, not only in fallback rendering. That shifts behavior from inference-first to evidence-first analysis.
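To make "structured technical signals" concrete, here is a minimal sketch of what an evidence record could look like. The `ScanEvidence` class and every field name are assumptions made for illustration; the actual schema is not described in this article. It assumes Python 3.9 or later.

```python
from dataclasses import dataclass, field


@dataclass
class ScanEvidence:
    """Hypothetical evidence record; field names are illustrative assumptions."""

    # UI evidence
    banner_detected: bool = False
    reject_control_found: bool = False
    cmp_hints: list[str] = field(default_factory=list)
    # Runtime signals
    pre_consent_requests: list[str] = field(default_factory=list)
    cookies_before_consent: list[str] = field(default_factory=list)
    # Language coverage
    matched_languages: list[str] = field(default_factory=list)

    def has_consent_ui_evidence(self) -> bool:
        # Any single UI signal is enough to rule out a hard "no UI" claim.
        return self.banner_detected or self.reject_control_found or bool(self.cmp_hints)
```

Passing a record like this to the model alongside the page text gives it observed facts to reason from, which is the shift from inference-first to evidence-first.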

The Hard Part Nobody Talks About

Detecting that a banner exists is comparatively easy. Determining whether tracking respects consent state is much harder.

  • Is analytics loaded immediately?
  • Do ad scripts fire before interaction?
  • Are third-party cookies written before consent?
  • Does reject behavior differ materially from accept?

Those are runtime questions. They require instrumentation, request observation, timing awareness, and strict timeout controls.
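The first two questions reduce to comparing request timestamps against the moment of consent interaction. The sketch below assumes the browser probe has already produced a request log. `ObservedRequest`, `pre_consent_trackers`, and the deliberately tiny `TRACKER_HOSTS` set are hypothetical; a real scanner would need a maintained tracker list.

```python
from dataclasses import dataclass
from urllib.parse import urlparse

# Illustrative, far from complete; an assumption for this example only.
TRACKER_HOSTS = {
    "www.google-analytics.com",
    "www.googletagmanager.com",
    "connect.facebook.net",
}


@dataclass(frozen=True)
class ObservedRequest:
    url: str
    t_ms: int  # milliseconds since navigation start


def pre_consent_trackers(requests, consent_at_ms):
    """Return tracker hosts requested before the consent interaction.

    consent_at_ms=None means no interaction happened during the probe,
    so every tracker request counts as pre-consent.
    """
    hits = []
    for req in requests:
        host = urlparse(req.url).hostname or ""
        before = consent_at_ms is None or req.t_ms < consent_at_ms
        if host in TRACKER_HOSTS and before and host not in hits:
            hits.append(host)
    return hits
```

A non-empty result is a signal worth reporting as possible pre-consent tracking, not proof of a violation: the request may be a consent-aware loader that defers actual data collection.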

Why We Added Calibration Guardrails

We also observed occasional model overconfidence. Even with stronger evidence, outputs could still overstate conclusions, for example by asserting that no consent mechanism exists at all.

To prevent this, we added a calibration layer after model parsing. If the scanner sees consent UI evidence, CMP hints, or reject controls, hard "no mechanism exists" language is downgraded to findings such as possible pre-consent tracking, incomplete enforcement, or insufficient evidence.
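A guardrail of this kind can be sketched as a small post-processing function. This is an illustration under stated assumptions, not the production rule set: the `calibrate` function, the `ABSOLUTE_ABSENCE` pattern, and the replacement wording are all hypothetical.

```python
import re

# Phrases treated as hard "nothing exists" claims (illustrative, not exhaustive).
ABSOLUTE_ABSENCE = re.compile(
    r"\bno (cookie )?consent (mechanism|banner)\b|\bmissing (a )?consent mechanism\b",
    re.IGNORECASE,
)


def calibrate(finding: str, consent_ui_evidence: bool, pre_consent_tracking: bool) -> str:
    """Downgrade absolute-absence claims when browser evidence contradicts them."""
    if not (consent_ui_evidence and ABSOLUTE_ABSENCE.search(finding)):
        return finding  # nothing to correct
    if pre_consent_tracking:
        return "Consent UI detected; possible pre-consent tracking."
    return "Consent UI detected; insufficient evidence of a missing consent mechanism."
```

The function only rewrites a finding when both conditions hold, so unrelated findings and genuinely banner-less sites pass through unchanged.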

We explicitly model uncertainty now. It is less flashy than confident claims, but materially better for long-term trust.

Multilingual Compliance Detection Is Underrated

Europe is multilingual, and consent systems reflect that. English-only detection misses valid interfaces in Croatian, German, French, Italian, Dutch, and mixed-language deployments.

We expanded multilingual consent patterns and added regression cases for delayed rendering, vendor-specific flows, and ambiguous evidence scenarios.
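As a rough illustration of multilingual matching, the sketch below checks visible page text against a small phrase table. The phrases are common button labels chosen for the example, and `CONSENT_PHRASES` and `matched_languages` are hypothetical names. A real pattern set would be much larger and would cover vendor-specific wording.

```python
import unicodedata

# Illustrative phrases only; real coverage needs many more variants per language.
CONSENT_PHRASES = {
    "en": ["accept all", "reject all"],
    "de": ["alle akzeptieren", "alle ablehnen"],
    "fr": ["tout accepter", "tout refuser"],
    "it": ["accetta tutti", "rifiuta tutti"],
    "nl": ["alles accepteren", "alles weigeren"],
    "hr": ["prihvati sve", "odbij sve"],
}


def _normalize(text: str) -> str:
    # Unify Unicode forms, ignore case, and collapse runs of whitespace.
    return " ".join(unicodedata.normalize("NFKC", text).casefold().split())


def matched_languages(visible_text: str) -> list[str]:
    """Return sorted language codes whose consent phrases appear in the text."""
    haystack = _normalize(visible_text)
    return sorted(
        lang
        for lang, phrases in CONSENT_PHRASES.items()
        if any(phrase in haystack for phrase in phrases)
    )
```

Plain substring matching is a deliberate simplification here. It can over-match, which is one reason regression cases for ambiguous evidence are worth keeping.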

Operational Reality Matters Too

Better evidence is not free. Browser-level checks add latency, cost, timeout complexity, and concurrency pressure.

We settled on lightweight probes, strict timeout budgets, best-effort cancellation, caps on the amount of structured evidence collected, and scan deduplication to reduce repeated load under real traffic.

What We Learned

Compliance scanning is not a binary classification problem. It is a probabilistic evidence problem.

Deterministic signals matter. Calibration matters. Uncertainty handling matters. Operational engineering matters.

The goal is not to claim guaranteed compliance. The goal is to surface meaningful risk signals earlier, with higher consistency and lower false confidence.

This article is informational and does not constitute legal advice.