Zulip Chat Archive

Stream: general

Topic: Does anyone use Comparator?

Jason Rute (Dec 27 2025 at 23:43):

The Lean FRO, specifically @Henrik Böving, made a tool called Comparator (https://github.com/leanprover/comparator, #lean4 > FRO's new verifier) which is for checking the correctness of “challenge” problems. These are exactly the sort of problems we see now in AI applications and vibe coding, where one has a target theorem they want to prove and a human or AI supplies the proof. I think we absolutely need tools like this, and I want to encourage people to use them (if they are ready). I know the current version is, at best, a beta version, but I assume stress testing it will make it better. So does anyone actually use it?

@George Tsoukalas Do you use Comparator to check PutnamBench submissions?
@Koishi Chan and @Lawrence Wu (llllvvuu), I’ve noticed you two have been using AI to solve Erdos problems. Do you use Comparator to check the results?
@Zeyu Zheng, for Seed Prover 1.5, did you use Comparitor to check your benchmark results?
@Henrik Böving Have you been using Comparator to check any of the above results?

Jason Rute (Dec 27 2025 at 23:44):

Sorry for the annoying post. I assume the answer is no, that no one uses it. If the answer is yes,

Is it user-friendly?
Does it have problems with false negatives as mentioned by @Eric Wieser in #Machine Learning for Theorem Proving > Tool to verify a solution solves the actual problem??
Does it fit common use cases, or does it have too many restrictions?
Is it usable for benchmark problems? For problems in the formal conjectures project? For blueprint projects?

Jason Rute (Dec 29 2025 at 15:57):

(deleted)

Boris Alexeev (Dec 29 2025 at 19:11):

I have done a bunch of Erdős problem+AI stuff, and I would generally like to use a tool like this. However, to answer the question: I have not used Comparator.

Boris Alexeev (Dec 29 2025 at 19:12):

I do verify all of the proofs I upload, but I either do it manually or with my own script (which is definitely easily exploitable).

Boris Alexeev (Dec 29 2025 at 19:17):

I think I could try using Comparator, and it might even be productive for my use case, but there is one thing that catches my attention:

The site lists this as an assumption:

You have not previously tried to compile the Solution file (as that might compromise your Challenge file to make it seem like you are looking for a different proof than you actually are)

But that doesn't match my use case at all. I'm often working with parts of the solution file for a long time before it's done, so I've probably previously tried compiling it.

I assume that for my non-adversarial use case, it would still be okay to use Comparator? But still, it gives me pause.

Jason Rute (Dec 29 2025 at 22:40):

@Boris Alexeev I personally think that Comparator is making a bunch of assumptions that are probably not realistic in practice. I think they assume you have a Challenge and Solution file, where some independent entity came up with the Challenge file (say that Erdos problem was formalized as part of the formal conjecture project). Then you make a matching solution file (which you can compile locally all you want, as long as the challenge file lives somewhere safe like in a github repo). Then ideally if someone wants the purest check possible, they would combine the challenge and solution file together on a fresh machine, never running the solution file in that machine (except in a sandboxed environment) which is what Comparator does) to make sure the Solution files don't do anything crazy (like hack into the Challenge file and change it.

But I think we shouldn't let absolute perfect security be the enemy of reasonable security. For modern concerns, I think there is still a lot of value to you taking your solution, abstracting out a challenge file after the fact, and running Comparator to check the results. That way, you have a Challenge file which contains an easy-to-inspect version of your theorem statement, and a tripled checked Solution proof. Comparator, even if you violate the compiling requirements, will still be the best possible checker out their for Lean. It will check for axioms, for full correctness, and that your challenge and solution match at the term level. The do-not-compile requirements are to prevent very sophisticated hacking that I think is beyond current AI and beyond almost all current Lean users. (To be clear, this is my opinion, and I suspect @Henrik Böving may disagree with me on this.)

Jason Rute (Dec 29 2025 at 22:44):

So I decided I should actually try to use Comparator myself. I'll report back what I find. (The landrun requirement is quite onerous, because nothing else in the Lean ecosystem requires golang, or requires pure Linux, but I think I have figured that out.)

Chris Henson (Dec 29 2025 at 22:59):

I don't understand the hesitancy around having not previously compiled. To comply with this, it seems you can just delete .lake or in a CI setting have this running in a fresh clone, right?

Jason Rute (Dec 29 2025 at 23:24):

I think @Henrik Böving is worried about the most sophisticated level of attack where a tactic or IO code modifies files on the file system including the challenge file. But I see no reason the person writing the code can’t compile it and also run Comparator.

Henrik Böving (Dec 29 2025 at 23:25):

I wouldn't say that's the most sophisticated level of attack I can think of but that is what this advice is guarding against yes.

James E Hanson (Jan 02 2026 at 09:25):

Is there an example of the setup you need to use Comparator? The readme on the Github repo mentions modules. Are these the kind of modules you need to set experimental.module to true in order to use?

Henrik Böving (Jan 02 2026 at 10:45):

No, module is just the name for a singular Lean file and has always been. The module system naming confusion is a bit unfortunate here.

James E Hanson (Jan 02 2026 at 18:25):

Thank you, but how are the files supposed to be arranged? I've tried putting the challenge and solution files both in a separate Lean project and in the Comparator folder itself and lake env .lake/build/bin/comparator comparator.json keeps failing with

Building Challenge
error: unknown target `Challenge`
uncaught exception: Child exited with 1

Henrik Böving (Jan 02 2026 at 20:19):

λ ls
Challenge.lean  config.json  lakefile.toml  lake-manifest.json  lean-toolchain  Notacheater  Notacheater.lean  README.md  Solution.lean
λ cat lakefile.toml
name = "notacheater"
version = "0.1.0"
defaultTargets = ["Notacheater"]

[[lean_lib]]
name = "Solution"

[[lean_lib]]
name = "Challenge"
λ cat Challenge.lean
theorem mp : ∀ (P Q : Prop), P → (P → Q) → Q := sorry
λ cat Solution.lean
theorem mp : ∀ (P Q : Prop), P → (P → Q) → Q :=
  fun _ _ hp hpq => hpq hp

λ cat config.json
{
    "challenge_module": "Challenge",
    "solution_module": "Solution",
    "theorem_names": ["mp"],
    "permitted_axioms": ["propext", "Quot.sound", "Classical.choice"]
}

James E Hanson (Jan 02 2026 at 21:40):

Thank you but it's now complaining that Challenge.olean has an 'incompatible header'

Building Challenge
⚠ [2/3] Replayed Challenge
warning: Challenge.lean:2:8: declaration uses 'sorry'
Build completed successfully (3 jobs).
Exporting #[mp, propext, Quot.sound, Classical.choice] from Challenge
uncaught exception: process 'landrun' exited with code 1
stderr:
uncaught exception: failed to read file '/home/james/Notacheater/.lake/build/lib/lean/Challenge.olean', incompatible header

James E Hanson (Jan 02 2026 at 21:41):

Could this be because I have Notacheater and Comparator on different versions of the Lean toolchain?

James E Hanson (Jan 02 2026 at 21:46):

No, that doesn't seem to be the issue. I adjusted them to be the same and I'm getting the same error.

Henrik Böving (Jan 02 2026 at 21:53):

lean4export needs to be built with the same toolchain as well

James E Hanson (Jan 02 2026 at 22:02):

I think I have all three on v4.27.0-rc1, but I'm still getting the same error. Is it possible to make the errors more verbose?

James E Hanson (Jan 02 2026 at 22:39):

Okay I was able to get it to work by going back to v4.26.0-rc2. Perhaps this was because lean4export is using something from Mathlib?

Thomas Zhu (Jan 04 2026 at 20:38):

We did not use Comparator or SafeVerify for Seed-Prover, but we may look into it in the future. I think one thing that would lower the barrier for people to use it (assuming it is already production ready) is to have tagged releases for every toolchain version.

For our benchmark results, we use a crude check of "no sorry warnings + no errors + only the 3 standard axioms" first, and for the final submission result we also check that the type of the proved theorem is defeq to the original sorried theorem. The latter gave some false negatives in very limited cases (eg adding "open Classical" as a standard header before the theorem statement might change the statement in a non-definitionally-equal way) that we inspect manually are not hacks.

Kim Morrison (Jan 05 2026 at 01:01):

Thomas Zhu said:

I think one thing that would lower the barrier for people to use it (assuming it is already production ready) is to have tagged releases for every toolchain version.

I have created a v4.27.0-rc1 tag for both lean4export and comparator, and lean#11899 will ensure that all future version releases also create tagged versions of these repositories with the latest toolchain.

Last updated: Feb 28 2026 at 14:05 UTC