Zulip Chat Archive

Stream: general

Topic: Understanding output of Lean web interface


Bogdan Grechuk (Dec 03 2025 at 09:16):

I asked Aristotle to formalize my proof in Lean. After about 30 hours of thinking, it claimed to do the job, and returned the attached Lean code.
aris6-output.lean

I then copied the code into Lean web interface https://live.lean-lang.org/
It returned "No goal" but then 36 messages (attached)
aris6_Lean_output.txt
The messages look like hints, not errors, but I am not sure.

My question is: how from the very long output of Lean web interface extract Yes/No information - the Lean code compiles in Lean or not? Is "No goal" sufficient for this?

The second question: if the code compiles, what other conditions I need to check before claiming that the result is 100% correct is Lean verified? I checked that:

  • the theorem statement in Lean corresponds to my intended theorem
  • the code contains no "sorry"
  • the code contains no "admit"

Is this sufficient, or there is something else needs to be checked? I am asking because earlier I did not know about admit and just checked for sorry. Maybe there is something else I do not know about?

TongKe Xue (Dec 03 2025 at 09:27):

Disclaimer:

  1. I'm still learning Lean myself.
  2. I've never used aesop

However, these lines seem to suggest some goals were not proved. Maybe aesop on Aristotle servers had running time and the web lean interface (probably running on someone else's server?) has very low time limits ?

MathlibDemo.lean:233:32
aesop: failed to prove the goal after exhaustive search.
MathlibDemo.lean:233:32
aesop: failed to prove the goal after exhaustive search.
MathlibDemo.lean:234:39
aesop: failed to prove the goal after exhaustive search.
MathlibDemo.lean:237:48
aesop: failed to prove the goal after exhaustive search.
MathlibDemo.lean:244:47
aesop: failed to prove the goal after exhaustive search.
MathlibDemo.lean:247:46
aesop: failed to prove the goal after exhaustive search.
MathlibDemo.lean:248:28
aesop: failed to prove the goal after exhaustive search.
MathlibDemo.lean:258:4
aesop: failed to prove the goal after exhaustive search.
MathlibDemo.lean:294:10
Try this:
  [apply] ring_nf

  The `ring` tactic failed to close the goal. Use `ring_nf` to obtain a normal form.

  Note that `ring` works primarily in *commutative* rings. If you have a noncommutative ring, abelian group or module, consider using `noncomm_ring`, `abel` or `module` instead.
MathlibDemo.lean:296:165

Bogdan Grechuk (Dec 03 2025 at 09:53):

I have read a message on Aristiotle website that specifically says that "Lean4
aesop: failed to prove the goal after exhaustive search." is not an error but a warning and can be ignored - it just means that the code first tried aesop, and, when it failed, solved the goal by other methods. But 1) they say nothing about the ring tactic message, and 2) anyway, I do not want to trust Aristiotle website, I want to know how to independently verify their claim using Lean web interface.

Yaël Dillies (Dec 03 2025 at 10:39):

We currently do not have a "gold standard" for checking that purported proofs of a theorem actually are proofs of said theorem. There are at least a few I can recommend:

  1. Check the statement and all definitions leading up to it
  2. No sorry, admit, stop
  3. No weird meta code, notation, etc
  4. The theorem actually exists in the environment (by doing eg #check name_of_thm after the AI-generated lemma name_of_thm : ... := ...). There was previously a bug where a declaration would silently error and not be added to the environment, so if you just looked at the infoview, you couldn't spot that the proof was wrong

Yaël Dillies (Dec 03 2025 at 10:41):

There are some more time-taking extra steps you could do, but the above ones are very easy to implement just by hand-inspecting the code and will catch basically all possible exploits.

Bogdan Grechuk (Dec 03 2025 at 11:05):

Ups, this is very unpleasant. One of the main problems of using AI to assist with proofs is that they can cheat and intentionally hide difficult-to-discover error inside the proof. Lean is widely advertised as an ideal solution - it is impossible to cheat, proofs are verified down to axioms of math, and are 100% correct. The user only need to check that (i) the Lean formulation corresponds to what you wanted to prove and (ii) that the proof compiles correctly. Now you say that this is not the case: the AI can cheat by adding sorry, admit or stop (ok, this is easy to check). It also can cheat by adding "weird meta code, notation, etc", or by making "a declaration would silently error and not be added to the environment". But even after checking all this, it is still no warranty that the proof is valid, because "there are some more time-taking extra steps you could do". In conclusion, it seems to be that not having an easy way for "checking that purported proofs of a theorem actually are proofs of said theorem" invalidates one of the main purposes to use Lean...

Bogdan Grechuk (Dec 03 2025 at 11:11):

Would it be difficult to automate this process and create a separate web interface, where you can enter theorem formulation, claimed proof, and just get a clear yes/no output whether the proof compiles or not? Here, the Yes answer should imply that the formulated theorem is 100% correct, the proof is verified down to axiom of math, and there is no way at all to cheat. Any "sorry", "admit", "stop" in the proof, or any "weird meta code, notation, etc", or anything else that can invalidate the proof - should result in "No" output.

Bogdan Grechuk (Dec 03 2025 at 11:14):

If there are currently no clear and easy to use method "for checking that purported proofs of a theorem actually are proofs of said theorem", may I at least have this question resolved in my specific case: can some experts tell me whether the provided Lean code indeed proves that y^2 + x^3 * y + z^3 + 3 ≠ 0 for all integers x,y,z ?

Yan Yablonovskiy 🇺🇦 (Dec 03 2025 at 11:20):

Bogdan Grechuk said:

If there are currently no clear and easy to use method "for checking that purported proofs of a theorem actually are proofs of said theorem", may I at least have this question resolved in my specific case: can some experts tell me whether the provided Lean code indeed proves that y^2 + x^3 * y + z^3 + 3 ≠ 0 for all integers x,y,z ?

I believe you make two further assumptions, which I haven't looked into, such as:

/-
Property: If $m$ is odd and its prime factors satisfy the condition, then $m \neq 4u^2+2uv+7v^2$.
-/
def Buell3Property : Prop :=
   (m : ), Odd m  ( p, Nat.Prime p  p  m  ( u v : , (p : ) = u^2 + 27 * v^2)  3  Nat.factorization m p) 
  ¬( u v : , (m : ) = 4 * u^2 + 2 * u * v + 7 * v^2)

Kim Morrison (Dec 03 2025 at 11:24):

Bogdan Grechuk said:

Now you say that this is not the case: the AI can cheat by adding sorry, admit or stop (ok, this is easy to check). It also can cheat by adding "weird meta code, notation, etc", or by making "a declaration would silently error and not be added to the environment". But even after checking all this, it is still no warranty that the proof is valid, because "there are some more time-taking extra steps you could do". In conclusion, it seems to be that not having an easy way for "checking that purported proofs of a theorem actually are proofs of said theorem" invalidates one of the main purposes to use Lean...

Please see lean4checker and comparator. If you run a result through these I think you can be very confident. But yes, if you are interested in checking adversial proofs, you need to learn these tools.

Kim Morrison (Dec 03 2025 at 11:30):

I have some suggestions about how we should ask people to publish AI generated proofs (or generally, anything there's reason to be skeptical, or act adversially). The key is to have the statement of your claim (without any proof), in a file with no notation, no imports outside of Mathlib, and ideally few or no definitions. So this will contain something like

def StatementOfMyAmazingTheorem : Prop := ...

(note we are defining a proposition here, not giving a proof of it)

Then, in separate files, you can do whatever you like, and then finally write

theorem proofOfMyAmazingTheorem : StatementOfMyAmazingTheorem := by ai_magic

This structure makes adversarial verification much easier: ideally as a community we would publish a document insisting that "big claims" always adhere to this structure.

(With the new module system, it's even better: you can demand that nothing is publicly exported by the project except the def (and its body) and theorem above, but none of the "implementation".)

Bogdan Grechuk (Dec 03 2025 at 11:31):

Yan Yablonovskiy, ah, yes, I mean is the result is 100% true as stated, with the listed assumptions?

theorem main_theorem_final_v3 (h_cubic : CubicResidueProperty) (h_buell : Buell3Property) (x y z : ℤ) :
y^2 + x^3 * y + z^3 + 3 ≠ 0

Kim Morrison (Dec 03 2025 at 11:31):

(And if you claim you need new definitions before you can even state your amazing theorem --- which is perfectly reasonable --- then you need to participate in the social process of making the PR of those definitions to Mathlib, and undergoing peer review!)

Bogdan Grechuk (Dec 03 2025 at 11:36):

Kim Morrison -- this I understand and this is perfectly reasonable. I need to be able to state the theorem, understand the statement, and check that the formal statement is what I actually want to prove. What worry me is that even if the statement is correct and perfectly understood, I still cannot easily check the correctness of the proof in web interface. Thank you for letting me know that this is at least possible with lean4checker and comparator - I will have a look.

Kevin Buzzard (Dec 03 2025 at 11:36):

@Bogdan Grechuk #backticks for posting Lean code, and it's not possible to judge the theorem as you posted it without seeing what CubicResidueProperty and Buell3Property are (for example if Buell3Property is the assumption that 2 + 2 = 5 then it's very easy to prove anything). Note: you're asking about checking the proof but independent of that one has to check the statement (which is what I'm talking about).

Bogdan Grechuk (Dec 03 2025 at 11:39):

Kevin Buzzard - the definition of Buell3Property is inside the provided Lean code, and I had a look at it. I understand that this is my job to check the statement. I am only asking for help to checking the claimed proof. Does the provided proof compiles in Lean and confirms that the statement is correct - whatever it means?

Sébastien Gouëzel (Dec 03 2025 at 11:56):

The statement of your theorem admits CubicResidueProperty and Buell3Property. So in a sense it is not a complete proof of the theorem, because a complete proof would also prove these two properties instead of admitting them.

Malvin Gattinger (Dec 03 2025 at 14:47):

Sorry if this is offtopic, but a general issue here seems to me that newcomers (not to blame anyone!) often only know and only use the web version of Lean which can only work on a single file. Rigorous checks like the separation of the trusted statement and untrusted proof into two files or running lean4checker seem to be out of reach then, right?

And I would hope that anyone providing an API like Aristotle also tells their users how to make sense of the output and provide a way to verify it as easily as generating it.


Last updated: Dec 20 2025 at 21:32 UTC