Swiss regulators are asking for evidence that AI systems perform reliably. Traditional testing can't provide it. Here's why statistical approaches are the answer.
When a bank deploys an AI model to assess credit risk, or a cantonal authority uses machine learning to triage applications, a natural question follows: how do you know it works?
For traditional software, the answer is straightforward. You write tests. Each test checks that a specific input produces a specific output. If all tests pass, you have confidence in the system.
AI doesn’t work this way.
The non-determinism problem
AI systems — particularly those built on large language models or statistical learning — are inherently non-deterministic. Ask the same question twice and you may get different answers. This isn’t a bug; it’s a fundamental characteristic of how these systems operate.
This creates a problem for compliance. FINMA expects model validation. ISO 42001 demands performance measurement. Auditors want evidence. But evidence of what, exactly, when the system’s output varies by design?
From pass/fail to pass rates
The answer lies in shifting from binary testing (did it work?) to statistical testing (how often does it work, and is that often enough?).
Instead of asserting that a function returns a specific value, you run it many times and ask whether the observed success rate meets a defined threshold at a given confidence level. This is the foundation of probabilistic testing.
For a business manager, the analogy is quality control in manufacturing. You don’t test every single item — you test a sample and make a statistical judgement about the batch. The same principle applies to AI.
What this means in practice
Probabilistic testing produces structured, quantitative evidence: success rates, confidence intervals, latency distributions, trend data. This is precisely the kind of evidence that regulatory frameworks demand.
It transforms AI compliance from a subjective judgement (“we think it works”) into a measurable, auditable claim (“we have demonstrated, with 95% confidence, that this system’s success rate exceeds 99.5%”).
For technical implementation details, visit javai.org.
