One verify block. Four tools. Two verdicts.
A 30-line example: aver verify passes, aver verify --hostile fails, and both proof backends reject it on identical source — tracing the path from sampled effects through adversarial profiles to universal proof.
The block doesn't change between runs. Only the question being asked changes: "does this hold for the stub I wrote?" versus "does this hold under adversarial profiles of the effects involved?" versus "does this hold for every implementation of this signature?". The first is sample testing. The second is hostile-mode boundary checking. The third is universal proof. Aver lets you ask all three from one source — and tells you exactly which one disagrees.
Three roles, one frame: given is the world you chose — the stub or value list the law was demonstrated in. Hostile substitutes other worlds — frozen clocks, empty disks, rolls stuck at the bound — and asks "what if your world was wrong?". when is the filter that declares which worlds the law assumes (e.g. when clock(root, 1) > clock(root, 0) reads "monotonic clocks only"); hostile honours it and skips profiles that violate the assumption.
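The frame can be sketched in a few lines of Python. Everything here is invented for illustration (a clock law rather than the randomness law below, and made-up world names); it models the three roles, not Aver's implementation:

```python
# Toy model of given / hostile / when. A "world" is a function
# (path, counter) -> value, standing in for a classified effect.

def elapsed_is_nonneg(clock):
    # The law under test: time measured between two clock reads
    # is never negative.
    return clock("root", 1) - clock("root", 0) >= 0

def monotonic(clock):
    # The `when` filter: this law only claims anything for clocks
    # that actually advance between the two reads.
    return clock("root", 1) > clock("root", 0)

# `given`: the world the author chose to demonstrate the law in.
given_world = lambda path, n: float(n)          # an advancing stub

# `hostile`: other worlds substituted behind the same law.
hostile_worlds = {
    "frozen":    lambda path, n: 42.0,          # clock never moves
    "backwards": lambda path, n: float(-n),     # clock runs in reverse
}

assert elapsed_is_nonneg(given_world)           # the sample passes
for name, world in hostile_worlds.items():
    if not monotonic(world):
        continue                                # `when` skips this profile
    assert elapsed_is_nonneg(world)             # nothing left to fail
```

Without the `when` filter, the backwards world falsifies the law; with it, both hostile worlds are skipped because neither satisfies the stated assumption.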
The code
A function samples two random floats and returns whether they were different. The verify block declares that, under a stub distinctStub returning n + 1, the function holds => true.
```
fn distinctStub(path: BranchPath, n: Int) -> Float
  ? "Stub returning n+1 as a Float — different value per call counter."
  Float.fromInt(n) + 1.0

fn twoFloatsDistinct() -> Bool
  ? "Sample two random floats; return whether they were different."
  ! [Random.float]
  a = Random.float()
  b = Random.float()
  a != b

verify twoFloatsDistinct
  law alwaysDistinct
    given rnd: Random.float = [distinctStub]
    twoFloatsDistinct() => true
```
Run it four ways
Same file. Four commands. The first runs the verify block as a sample under the stub. The second multiplies it through adversarial profiles of Random.float — midrange (always 0.5), always_zero, always_one — and asks "does the law still hold?". The other two export the same block as a universal theorem and ask a real proof backend to discharge it.
```
Verify: randomness_paradox.av
  ✓ verify distinctStub       2/2
  ✓ verify twoFloatsDistinct  1/1

Summary: 3/3 cases passed
```
```
Verify: randomness_paradox.av
  ✓ verify distinctStub       2/2
  ✗ verify twoFloatsDistinct  1/4 (1/1 declared, 0/3 hostile)

    fail[verify-hostile-mismatch]:
      origin:   hostile effect profile: Random.float/midrange
      expected: true
      actual:   false

Summary: 3/6 passed | 3 failed
```
```
✖ Building RandomnessParadox
error: unsolved goals
  rnd : RandomFloatInUnit
  ⊢ (rnd.val BranchPath.Root 0 != rnd.val BranchPath.Root 1) = true
error: build failed
```
```
randomness_paradox.dfy(262,0): Error: a postcondition could not be proved on this return path
  ensures twoFloatsDistinct(BranchPath_Root, rnd) == true

finished with 11 verified, 1 error
```
How the three readings work
aver verify evaluates the law as a finite sample. The cartesian product of given domains is enumerated and run against the law's RHS. With distinctStub returning n + 1, the two calls produce 1.0 and 2.0; 1.0 != 2.0 holds; the sample case passes. This is the familiar testing path: assert against a chosen implementation, run, check equality.
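The sample reading can be modeled in a few lines of Python, assuming the invented signature rnd(path, counter) for the lifted effect (a sketch of the semantics, not Aver's code):

```python
from itertools import product

def distinct_stub(path, n):
    # Mirrors distinctStub: n + 1 as a float, so a different value
    # per call counter.
    return float(n) + 1.0

def two_floats_distinct(rnd):
    # Mirrors twoFloatsDistinct: call counters 0 and 1 on the root branch.
    a = rnd("root", 0)
    b = rnd("root", 1)
    return a != b

# `given` domains; here one domain with one stub, so the cartesian
# product is a single case.
given_domains = {"rnd": [distinct_stub]}

for case in product(*given_domains.values()):
    (rnd,) = case
    assert two_floats_distinct(rnd) is True   # 1.0 != 2.0, passes
```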
aver verify --hostile keeps the same machinery but multiplies the law through every adversarial profile Aver ships for the classified effects involved. Random.float gets midrange (always 0.5), always_zero, and always_one. Each profile breaks the implicit assumption that the two calls return distinct values: with midrange, both calls return 0.5; 0.5 != 0.5 is false; the law fails. Hostile mode catches the bug your stub hid before you ever reach formal proof. Add a when clause to declare the assumption explicitly (when rnd(root, 0) != rnd(root, 1)) and hostile skips the profiles that violate it — the law is exercised only where its assumption holds.
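The hostile reading, in the same hypothetical Python model (the three profile names come from the article; the evaluation loop is illustrative):

```python
def two_floats_distinct(rnd):
    return rnd("root", 0) != rnd("root", 1)

# The adversarial profiles Aver ships for Random.float.
profiles = {
    "midrange":    lambda path, n: 0.5,
    "always_zero": lambda path, n: 0.0,
    "always_one":  lambda path, n: 1.0,
}

# Without a `when` clause: every constant profile falsifies the law.
failures = [name for name, rnd in profiles.items()
            if not two_floats_distinct(rnd)]
print(failures)   # all three constant profiles break the law

# With the `when` clause rnd(root, 0) != rnd(root, 1): the filter
# rejects every constant profile, so none is exercised.
def when(rnd):
    return rnd("root", 0) != rnd("root", 1)

exercised = [name for name, rnd in profiles.items() if when(rnd)]
print(exercised)  # empty: every profile violates the assumption
```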
aver proof takes the same block and lifts it. Every classified effect call (here Random.float) becomes an explicit function parameter, typed as the bounded subtype RandomFloatInUnit — the runtime invariant 0.0 ≤ x ≤ 1.0 is structurally encoded, no axiom required. The exported theorem says the law's RHS holds for every possible such function in the unit interval — not just the stub you supplied, not just the adversarial profiles, but every function. That's a strictly stronger claim. Lean and Dafny independently reject the export for this reason.
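Why no proof can exist is visible in the same toy model: the exported theorem quantifies over every function into [0, 1], and any constant function is a counterexample. This sketch only exhibits the counterexample; the backends reject the goal structurally, not by search:

```python
def two_floats_distinct(rnd):
    return rnd("root", 0) != rnd("root", 1)

# A constant function is a perfectly valid inhabitant of the bounded
# subtype (every value lies in the unit interval)...
const_half = lambda path, n: 0.5
assert 0.0 <= const_half("root", 0) <= 1.0

# ...and it refutes the universal claim, so the theorem is unprovable.
assert two_floats_distinct(const_half) is False
```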
What conventional Z3 verification doesn't reach
Most contract-verification systems built on Z3 prove pre/post conditions on pure functions, then stop at the effect boundary. Random results are unconstrained for the solver — there are no useful axioms about them beyond range. Contracts that depend on specific random values fall to runtime checking, not static rejection. Same shape for HTTP responses, disk reads, time, network: the SMT model has no symbolic handle on them, so the proof obligation evaporates.
Aver's Oracle export does the extra step. Classified effects — Random, Http, Disk read-side, Tcp.send, Time.now, Console.readLine, non-modal Terminal, Args, Env read-side — are lifted into the proof artifact as explicit function parameters with structured call addressing (BranchPath + per-branch counter). The exported theorem quantifies over them. Universal claims about effectful code become provable, and rejectable when they're false. The randomness paradox is what that gap looks like, in 30 lines.
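The structured call addressing can be modeled in Python (EffectTrace and its fields are invented names; the (path, counter) scheme mirrors the description above):

```python
from collections import defaultdict

class EffectTrace:
    """Toy model: route every classified effect call through the lifted
    function parameter, addressed by (branch path, per-branch counter)."""

    def __init__(self, oracle):
        self.oracle = oracle                 # the lifted effect parameter
        self.counters = defaultdict(int)     # per-branch call counter

    def call(self, path):
        n = self.counters[path]
        self.counters[path] += 1
        return self.oracle(path, n)

# The same site called twice gets two distinct addresses, so a theorem
# can speak about "the first call" and "the second call" separately.
trace = EffectTrace(lambda path, n: float(n) + 1.0)
a = trace.call("root")   # address (root, 0) -> 1.0
b = trace.call("root")   # address (root, 1) -> 2.0
print(a, b)              # 1.0 2.0
```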
Run it yourself
```
cargo install aver-lang
git clone https://github.com/jasisz/aver
cd aver

# 1. Sample passes
aver verify examples/formal/randomness_paradox.av
# ✓ 3/3 cases passed

# 2. Hostile fails — adversarial profiles break the implicit assumption
aver verify --hostile examples/formal/randomness_paradox.av
# ✖ 3/6 passed | 3 failed (Random.float midrange / always_zero / always_one)

# 3. Lean rejects
aver proof examples/formal/randomness_paradox.av --backend lean --output out
cd out && lake build
# ✖ unsolved goals
cd ..

# 4. Dafny rejects
aver proof examples/formal/randomness_paradox.av --backend dafny --output out
dafny verify out/*.dfy
# ✖ postcondition could not be proved
```