Zulip Chat Archive
Stream: lean4
Topic: apply? might suppress other warnings/errors
Thomas Zhu (May 02 2025 at 05:35):
In Lean v4.9.0, the following
import Mathlib.Tactic.TypeStar
theorem f : False := by
have := Type*
apply?
shouldn't work. It should raise the error invalid occurrence of universe level 'u_1' at 'f', it does not occur at the declaration type, nor it is explicit universe level provided by the user, occurring at expression, because f has no universe parameter. In fact, I think this error is indeed reached, and f isn't even added to the environment. However, the error interacts weirdly with apply?, which seems to suppress both the error and its own warning declaration uses 'sorry'.
But:
- In VS Code, there are no errors or warnings in InfoView. The error/warning counter on the bottom-left is 0/0.
- lake build --wfail succeeds with code 0 (so CI passes).
- lake env /path/to/lean4checker@4.9.0/.lake/build/bin/lean4checker False returns code 0 (probably because f is not in the environment?).
This is not the same issue as, but is related to, #lean4 > #print axioms does not report sorry, in that apply? suppresses all warnings and errors when the particular error from the linked thread occurs.
This is also not the same issue as the 500 limit of squiggly lines in VS Code. Note that apply? on False gives only 23 suggestions here.
All in all, on the surface this really looks like a convincing proof, though of course it is not a "real" issue because f is never actually added.
Reproduce the error: https://github.com/hanwenzhu/lean-demo-20250502
#xy: this is minimized from 3 of DeepSeek-Prover-V2's (wrong) solutions, where the prover apparently exploited this issue. (DeepSeek would have caught this if they checked the theorem they wanted to prove is indeed in the environment after the proof.)
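The kind of check mentioned in the parenthetical can be done inside Lean itself. Here is a minimal sketch (illustrative only; it assumes a toolchain where #eval accepts CommandElabM actions, and it queries the environment for the name f from the example above):

```lean
-- Sketch: after the (apparent) proof of `f`, confirm that `f` was actually
-- added to the environment. On a toolchain exhibiting the bug, this would
-- report that the declaration is missing, exposing the silently dropped proof.
open Lean Elab Command in
#eval show CommandElabM Unit from do
  unless (← getEnv).contains `f do
    throwError "declaration 'f' was not added to the environment"
```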
I would really appreciate someone more familiar with Lean core to determine if this is a trivial thing or indeed a serious bug. Thank you!
Jeremy Tan (May 02 2025 at 05:37):
Why are you using v4.9.0? Everyone in the public sphere has moved on from it
Thomas Zhu (May 02 2025 at 05:40):
- As far as I know this issue was not reported before, so without knowing what caused the error, a variant of the MWE might work on newest Lean. This particular example doesn't. (and I really hope that this issue was somehow fixed between 4.9.0 to 4.19.0, but I need a much more knowledgeable person to help confirm this)
- DeepSeek-Prover-V2 used v4.9.0, and wrongly claimed they proved some theorems, because of this issue.
Kim Morrison (May 02 2025 at 10:18):
Here's a slightly better minimization, which doesn't need Mathlib, and clearly shows both that no error message appeared on v4.9.0 and that no proof of False was ever added to the environment:
-- Uh oh, no error messages generated on this apparent proof of `False`.
#guard_msgs (drop info) in
theorem f : False := by
have := Type _
apply?
-- It's not so bad: it was never added to the environment, so you can't do anything with it.
/-- error: unknown identifier 'f' -/
#guard_msgs in
example : False := f
As of v4.13.0, error messages are generated as expected.
Kim Morrison (May 02 2025 at 10:23):
Thomas Zhu said:
I would really appreciate someone more familiar with Lean core to determine if this is a trivial thing or indeed a serious bug. Thank you!
This is a user interface bug (a dropped error message) that was present up to v4.12.0, but not a soundness bug, because you could not use the apparent proof of False to do anything else.
It was certainly a bad user interface bug, because a piece of text that apparently proves False was accepted!
But it is the sort of thing that any AI pipeline or autograder for students should take responsibility for: verify that the declaration you're claiming to prove is actually in the environment.
As a point of comparison, one could easily write a macro that overrides the theorem syntax, turning it into a no-op. Then you can write theorem false : False := trivial, and have the file be accepted. Of course nothing will have been added to the environment!
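Such a theorem-shadowing macro could be sketched like this (an untested illustration; the syntax pattern only covers the simplest theorem form, and the name bogus is made up):

```lean
-- Hypothetical malicious macro: shadow the simplest `theorem` form with a
-- high-priority command macro that elaborates to a harmless `#check` instead
-- of declaring anything.
macro (priority := high) "theorem" _name:ident " : " _ty:term " := " _prf:term : command =>
  `(#check True)

theorem bogus : False := trivial  -- accepted by the elaborator...
-- ...but `bogus` was never added, so referring to it later would fail with
-- "unknown identifier 'bogus'".
```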
Jason Rute (May 02 2025 at 10:27):
I claim Lean needs to provide robust and easy-to-use auto-grading solutions. Otherwise people will be lazy and not use them. In Lean 4.9 what would have been the right solution to catch this? (I thought it was lean4checker, but in that case that wasn’t enough, right?) How do we make the correct solution more well known?
Jason Rute (May 02 2025 at 10:28):
(Actually maybe that should be its own topic so it is easy to find.)
Jason Rute (May 02 2025 at 10:30):
For this topic, are there tests catching this user interface bug so it doesn’t happen again?
Kim Morrison (May 02 2025 at 10:31):
The standard autograder solution today is Rob Lewis' https://github.com/robertylewis/lean4-autograder-main. We should investigate whether this would have been sufficient for DeepSeek's needs!
Kim Morrison (May 02 2025 at 11:11):
There is a minimal example of setting up the autograder for circumstances like this at https://github.com/kim-em/autograder_false.
git clone https://github.com/kim-em/autograder_false
cd autograder_false
./autograder.sh
should print the output
Submission compilation output:
Try this: refine Nat.not_succ_le_zero ?_ ?_
Try this: refine false_of_true_eq_false ?_
Try this: refine false_of_ne fun a => ?_
Try this: refine (ne_self_iff_false ?_).mp fun a => ?_
Try this: refine Ne.elim (fun a => ?_) ?_
Try this: refine true_iff_false.mp ?_
Try this: refine false_iff_true.mp ?_
Try this: refine (iff_false_intro fun a => ?_).mp ?_
Try this: refine (not_iff_false_intro ?_).mp ?_
Try this: refine false_of_true_iff_false ?_
Try this: refine Ne.irrefl fun a => ?_
Try this: refine not_true.mp ?_
Try this: refine neZero_zero_iff_false.mp ?_
Try this: refine (and_not_self_iff ?_).mp ?_
Try this: refine (not_and_self_iff ?_).mp ?_
Try this: refine (Bool.eq_true_and_eq_false_self ?_).mp ?_
Try this: refine (Bool.eq_not_self ?_).mp ?_
Try this: refine (Bool.not_eq_self ?_).mp ?_
Try this: refine (Bool.eq_false_and_eq_true_self ?_).mp ?_
Try this: refine (Int.negSucc_not_nonneg ?_).mp ?_
Try this: refine (Int.negSucc_not_pos ?_).mp ?_
Try this: refine Lean.Omega.Int.le_lt_asymm ?_ ?_
Try this: refine Lean.Omega.Int.lt_le_asymm ?_ ?_
Try this: refine (List.mem_nil_iff ?_).mp ?_
autograder/submission.lean:1:8: warning: declaration uses 'sorry'
false:
failed (0 points)
Proof contains sorry
Kim Morrison (May 02 2025 at 11:13):
Note that this autograder example is running on v4.15.0, where the declaration is being added, and the autograder is detecting the sorry.
I haven't quite worked out what the invocation for the autograder is back on a v4.9.0-compatible version, but hopefully @Rob Lewis will be able to help me with this. I'm pretty confident that once we work out how to invoke it, the autograder will successfully notice that on v4.9.0 the declaration was simply not added at all, and similarly "fail the student".
Kim Morrison (May 02 2025 at 11:14):
(If anyone wants to experiment in the meantime, you can:
- change the lean-toolchain in that repository to point to v4.9.0
- change the lakefile.toml to use a version of autograder compatible with v4.9.0 (see comment in the file)
- change the attribute from autogradedProof to autograded in autograder/solutions.lean
- try to work out how to modify autograder.sh to make the old version run!)
Thomas Zhu (May 02 2025 at 13:16):
Kim Morrison said:
Thomas Zhu said:
I would really appreciate someone more familiar with Lean core to determine if this is a trivial thing or indeed a serious bug. Thank you!
This is a user interface bug (a dropped error message) that was present up to v4.12.0, but not a soundness bug, because you could not use the apparent proof of False to do anything else.
Was this UI bug known before? What happened during v4.13.0 that apparently fixed the bug? I think DeepSeek-Prover (through very large-scale random generation) found and then exploited this bug, because it tried to use Cardinal.ofNat, which also triggers this bug. Of course the proof is wrong, but would it be fair for DeepSeek to now claim their model actually found and exploited a bug in Lean's UI?
Jason Rute (May 02 2025 at 13:24):
Although we now know from an author that they used the REPL (see the DeepSeek-Prover v2 thread), so even if the AI could have exploited the UI, in practice it only exploited the REPL. (Of course it isn’t fully clear yet what they should have done, and what all AI researchers going forward should do to catch this stuff.)
Thomas Zhu (May 02 2025 at 13:48):
But the root cause of the REPL exploit is just this Lean UI bug, which doesn't have anything to do with the REPL
Thomas Zhu (May 02 2025 at 13:53):
I like the idea of using Rob's autograder, if it actually catches the error above. The repo https://github.com/BrownCS1951x/fpv2024 uses autograder @ bac0a6c9b83279e903d89c56621dbe89fd2b27f3 on Lean v4.10.0, and someone should test this setup against the bug above (I don't have time currently to do that)
Jason Rute (May 02 2025 at 13:55):
Good point! (Personally I feel this issue deserves a full post-mortem paper. I don’t think many of us suspected that using standard tactics inside a proof could trigger such a weird UI bug, one that causes so many of the standard ways to check a Lean proof for errors to fail, even if only for dumb reasons like the theorem never being added to the environment and none of the standard methods checking for that.)
Thomas Zhu (May 02 2025 at 13:58):
I agree. I appreciate that from the point of view of the kernel, this error is not really different from a person maliciously changing the elaboration of the "theorem" command to a no-op, but on the surface it really looks like an innocuous proof that passes CI testing (for example, see this non-proof: https://github.com/hanwenzhu/lean-demo-20250502/blob/main/putnam_2005_a4.lean).
Rob Lewis (May 02 2025 at 14:40):
Kim Morrison said:
(If anyone wants to experiment in the meantime, you can:
- change the lean-toolchain in that repository to point to v4.9.0
- change the lakefile.toml to use a version of autograder compatible with v4.9.0 (see comment in the file)
- change the attribute from autogradedProof to autograded in autograder/solutions.lean
- try to work out how to modify autograder.sh to make the old version run!)
@Kim Morrison
The v4.10 autograder also compiles in v4.9, but it's pretty hard-coded to the Gradescope directory layout. Since then we've generalized a lot. I quickly hacked up v4.10 to match the layout of your repo instead (without touching any of the actual checking code). https://github.com/robertylewis/autograder_false is your repo, depending on this hacked version. You can run it with lake exe autograder and indeed see that it flags the sorry in this proof.
Rob Lewis (May 02 2025 at 14:44):
There's a 50% chance I've flipped the "assignment" and "solution" but it doesn't matter for this example. One of them is the reference file from the "instructor" and one is the "student" file being checked. But here they're identical, hardcoded to be autograder/{Assignment, Solution}.lean
Kim Morrison (May 02 2025 at 15:09):
Fantastic, thank you Rob.
Thomas Zhu (May 02 2025 at 15:37):
@Kim Morrison @Rob Lewis I think there is a mistake in your repos: the line should be have := Type _, not have := Type
Kim Morrison (May 02 2025 at 15:39):
Yes, fixed now.
Kim Morrison (May 02 2025 at 15:40):
(Note that on later versions have := Type _ is not even allowed.)
Kim Morrison (May 02 2025 at 15:42):
Okay, here are the clean examples:
git clone https://github.com/kim-em/autograder_false
cd autograder_false
git checkout v4.15.0
./autograder.sh # reports "Proof contains sorry"
git checkout v4.9.0
./autograder.sh # reports "Declaration not found in submission"
Matthew Ballard (May 02 2025 at 20:33):
@Rob Lewis I did not anticipate you grading the robots. This is great!
Rob Lewis (May 02 2025 at 20:44):
Neither did I! I hope our autograder is up for it.
Rob Lewis (May 02 2025 at 20:45):
We were already planning to make a lot of improvements this summer. If there are requests from the robot operators that don't conflict with its main purpose (grading non-robots), please let me know.
Thomas Zhu (May 02 2025 at 21:46):
The exact mechanism of the error is very simple: it is a synthetic sorry that also doesn't log an error. The following also doesn't report any errors (v4.9.0):
theorem f : False :=
have := Type _
sorryAx False (synthetic := true)
apply? calls the function admitGoal without logging errors, breaking the invariant that creators of synthetic sorries should log errors first.
There should have been a big warning in the docstring of admitGoal to say the sorryAx is synthetic in v4.9.0, and direct users to Term.reportUnsolvedGoals or closeUsingOrAdmit instead.
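For contrast, a tactic that admits its goal while respecting that invariant might look like this (an illustrative sketch against the Elab API; admitGoal, getMainGoal, and logError are real names, but the tactic itself is made up and untested on any particular toolchain):

```lean
import Lean
open Lean Elab Tactic

-- Illustrative tactic: close the goal with a synthetic sorry, but log an
-- error first, so the sorryAx is always accompanied by a diagnostic --
-- the invariant that apply? broke in v4.9.0 by calling admitGoal directly.
elab "admit_with_error" : tactic => do
  let goal ← getMainGoal
  logError m!"goal admitted: {← goal.getType}"
  admitGoal goal
```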
Then elaboration of the theorem tries to throw the expected universe-parameter error. This error is not actually surfaced, because the message data body contains a synthetic sorry. It still stops elaboration: within Term.elabMutualDef (see the v4.9.0 source), checkForHiddenUnivLevels raises this error, so the next step addPreDefinitions (which calls addDecl, which would have raised the warning declaration uses 'sorry') is never reached.
(Edit: added details)
Thomas Zhu (May 02 2025 at 21:59):
There was probably something in the v4.13.0 update that changed some of the parts above. Perhaps a similar known issue led to the fix. However, I don't currently have time to go through the git blame (see below)
Aaron Liu (May 02 2025 at 22:01):
I can't get the theorem f to print a warning even with a natural sorry
Thomas Zhu (May 03 2025 at 01:21):
Aaron Liu said:
I can't get the theorem f to print a warning even with a natural sorry
Do you have a working example? Changing true to false in the above makes the error pop up.
Aaron Liu (May 03 2025 at 02:05):
Maybe it's the version I'm on
Thomas Zhu (May 03 2025 at 04:39):
I finally had some time to dig through the code changes. The exact commit that fixed this error was lean4#5402, which deferred the checkForHiddenUnivLevels logic (the hidden-levels check) until later and added some error-recovery mechanisms. This allowed elaboration to continue even when the "hidden levels" error is present, so f is finally added and the warning declaration uses 'sorry' is reached.
Thomas Zhu (May 03 2025 at 05:10):
I found a version of this error that still exists on Lean stable & nightly:
partial def g :
have : False := by apply?
False := g
Expected: error invalid use of 'partial', 'g' is not a function and warning declaration uses 'sorry'
Actual: no errors, no warnings
Thomas Zhu (May 03 2025 at 05:37):
Since it is clear now it is not just a Lean ≤ v4.12.0 issue, I created lean#8212. Also changed the title of this topic.
Kim Morrison (May 05 2025 at 17:22):
This is now fixed in lean#8231, which should shortly arrive as part of v4.20.0-rc3.
Last updated: Dec 20 2025 at 21:32 UTC