Zulip Chat Archive

Stream: Machine Learning for Theorem Proving

Topic: Guarding against exploits in AI code


Joseph Myers (Jul 29 2025 at 10:22):

I think it's clear we could do with a better approach provided as standard in Lean + Mathlib for marking up problems to be solved and then verifying a provided solution. That is, a standard tool to check that a file with a solution (after exporting the proof from Lean and reading back into a separate checker) does indeed provide a properly typed term, depending only on the standard three axioms (not Lean.ofReduceBool), whose type matches (is defeq to) that in the trusted file with the problem. Along with that, there should be standard ways to mark up "this is the problem to be solved" and "this is where you need to substitute an answer of this type". And then the tools for checking the solution should also be able to print the provided answer using only trusted delaborators for human verification that it's a legitimate answer to the problem. Optionally, but probably useful for multiple repositories, Compfiles-style markup for indicating which parts of a file with both problem and solution are the problem statement and answer to be given, and which are the solution.

(This might not actually be used in Lean + Mathlib, but it seems sufficiently generally relevant to be worth having as a standard tool there, or failing that in some leanprover-community repository, and arguably parts of it would be appropriate to use in Archive/Imo to indicate what's the original problem and what's the solution to it / answer the problem asked to be found.)

Jason Rute (Jul 29 2025 at 10:37):

Joseph Myers said:

I think it's clear we could do with a better approach provided as standard in Lean + Mathlib for marking up problems to be solved and then verifying a provided solution.

(If I understand you correctly,) I've been advocating for something like this since the DeepSeek v2 bug was found. The best thing I've found so far is SafeVerify by @GasStationManager. I think I've successfully convinced @George Tsoukalas to use it to check the PutnamBench solutions, and I've advocated for it to be made into something more official, similar to lean4checker (although it mostly relies on official components like lean4checker's replay and the code Lean uses to check axioms). @Mario Carneiro seemed to think it was pretty good. It also has support for solutions (although they would still need to be checked manually for being in the right form). @Rob Lewis's autograder is the other alternative.

Chris Hughes (Jul 29 2025 at 10:39):

One of the issues with asking an AI to fill in the answer is that sometimes the question might ask you to find a set of solutions to an equation, and this implicitly means find a simple description of this set, but in Lean you could just write down the equations as the definition of the set. In fact on all problems you could use Classical.epsilon (property given in the question) as the answer. So you need to be precise about what find the solution means and this isn't that easy.

Joseph Myers (Jul 29 2025 at 10:42):

Jason Rute said:

Joseph Myers said:

I think it's clear we could do with a better approach provided as standard in Lean + Mathlib for marking up problems to be solved and then verifying a provided solution.

(If I understand you correctly,) I've been advocating for something like this since the DeepSeek v2 bug was found. The best thing I've found so far is SafeVerify by GasStationManager. I think I've successfully convinced George Tsoukalas to use it to check the PutnamBench solutions, and I've advocated for it to be made into something more official, similar to lean4checker (although it mostly relies on official components like lean4checker's replay and the code Lean uses to check axioms). Mario Carneiro seemed to think it was pretty good. It also has support for solutions (although they would still need to be checked manually for being in the right form). Rob Lewis's autograder is the other alternative.

Yes, that's the sort of thing I want, but as something standard with Lean. So the reference manual would say something like"To check a claimed solution to a problem, run (some command) in a sandbox to process and export the untrusted proof, then run (some command) to check the exported file does provide a valid proof of the type requested in your trusted file with the problem.". (Together with answer markup.)

Joseph Myers (Jul 29 2025 at 10:45):

Chris Hughes said:

One of the issues with asking an AI to fill in the answer is that sometimes the question might ask you to find a set of solutions to an equation, and this implicitly means find a simple description of this set, but in Lean you could just write down the equations as the definition of the set. In fact on all problems you could use Classical.epsilon (property given in the question) as the answer. So you need to be precise about what find the solution means and this isn't that easy.

This is where human verification of the answer comes in. (I referred above to using only trusted delaborators to display the answer for review since an adversarial file could in principle have an answer term that just restates the problem but disguise it so that in the Lean source it looks like a valid answer, until you notice the meta code elsewhere in that file being used to cheat.)

Jason Rute (Jul 29 2025 at 10:46):

I don't think you need fancy markup. But if you do, then yes, it should be standard in Lean or at least mathlib. I think you just need a template file and a filled-in version. One reason for needing the template anyway is to be sure that the solution didn't change the meaning of the problem. I've found lots of ways you can do this in Lean, especially if you are allowed to add code before the theorem statement. (It doesn't even have to be meta code. A simple local instance could be enough.)

Jason Rute (Jul 29 2025 at 11:29):

By the way, I was referring above to local instance changing the meaning of a theorem statement, but I see @Joseph Myers's point about how it can also change the meaning of a solution. I present to you the following:

import Mathlib
-- pay no attention to this local instance behind the curtain
local instance : LT Nat := fun x y => ( m, (m  x) -> (m  y))

def solution : Set (Nat × Nat) := { x | x.1 < x.2 }  -- this looks reasonable, right?

theorem foo : { x : Nat × Nat | ( m, (m  x.1) -> (m  x.2)) } = solution := by
trivial

Jason Rute (Jul 29 2025 at 11:34):

And no, SafeVerify wouldn't be able to spot that easily (nor likely any current tool) .

(deleted) (Jul 29 2025 at 11:42):

The FRO should create an officially maintained tool for checking Lean proofs against a specification.

Alex Meiburg (Jul 29 2025 at 11:54):

Well, there's a separate issue here that I think Lean should issue a warning when defining an instance where one can already be synthesized, at least if they aren't defeq... :) But obviously there's other ways to make a sneaky definition

GasStationManager (Jul 29 2025 at 16:33):

It's not just the native_decide in those p3 and p4 proofs, which could be checked with additional manual effort; it is also the knowledge that this prover model will always try to use native_decide (since it's been trained to do so) in its future proofs, and then how much you trust these future proofs will boil down to how much you trust native_decide. (I'd say the same for Kimina Prover and its use of native_decide.)

Right now, SafeVerify (and I assume lean4checker) would not be able to process a proof containing native_decide. Hypothetically, there could be a utility that replaces each native_decidewith sorry, then pass it to be checked by lean4checker or SafeVerify, and then prints out the goal state for each sorry (ideally from inside the replayed environment) to show what were the goals resolved by native_decide. But Harmonic (and/or the Kimina team) should really be the one doing that.

(deleted) (Jul 29 2025 at 16:51):

Data contributors like me and others usually use native_decide to reduce the time needed to write proofs. Looks like its use is controversial within the larger Lean community.

Jason Rute (Jul 29 2025 at 18:18):

That is yet another danger, data contributors like @Huỳnh Trần Khanh, especially if paid by the theorem, could use native_decide in an inconsistent manner to supply data even faster. :smiling_devil:.

Niels Voss (Jul 29 2025 at 18:18):

I don't think there's a problem with using native_decide to help you write proofs, in the same way that there's no problem with using sorry to help write proofs; only when it is left in the finished proof is it controversial. I suspect that with the number of times @[extern] and @[implemented_by] are used in Lean core, Lean.ofReduceBool is already inconsistent.

Jason Rute (Jul 29 2025 at 18:19):

@Niels Voss You don't have to suspect: #general > Using `native_decide` to prove False?

Niels Voss (Jul 29 2025 at 18:20):

Oh I remember this thread now.

Yiwei (Jul 29 2025 at 19:07):

Jason Rute said:

I don't think you need fancy markup. But if you do, then yes, it should be standard in Lean or at least mathlib. I think you just need a template file and a filled-in version. One reason for needing the template anyway is to be sure that the solution didn't change the meaning of the problem. I've found lots of ways you can do this in Lean, especially if you are allowed to add code before the theorem statement. (It doesn't even have to be meta code. A simple local instance could be enough.)

Is this vulnerability limited to changing the file before the theorem statement? i.e. is lean safe from prompt injection style manipulation after the theorem statement?

Jason Rute (Jul 29 2025 at 19:18):

@Yiwei If you allow native_decide (more specifically if you allow its axioms), then you can put in an inconsistent proof without modifying anything above the proof. See the example in the above link. If you only allow the three standard axioms, it’s more subtle. I know of no “attack” which survives #print axioms and works without putting code above the theorem. And if there was such an exploit, I doubt it would survive SafeVerify (unless of course the kernel was unsound or the axiom checker had a bug).

Niels Voss (Jul 29 2025 at 19:21):

One such attack is that you can redefine #print axioms to always show the standard three. That being said, these super adversarial attacks seem very unlikely in the case of Harmonic's solutions, given that they published the outputs and made them easy to check by hand. However, I think they would be a more serious concern if LLMs started generating much longer proofs.

Jason Rute (Jul 29 2025 at 19:22):

If you don’t do #print axioms (which also checks that the proof was added to the environment), DeepSeek-Prover v2 found an exploit (since fixed) which silently prevented the theorem from being added to the environment. The model used it a few times to give bad proofs of theorems.

Jason Rute (Jul 29 2025 at 19:23):

@Niels Voss Oh, I see. If you assume print axioms will be run after your code, then just add meta code redifining it to the end of your proof! I like/hate it!

Niels Voss (Jul 29 2025 at 19:31):

Personally, I believe that in an adversarial context, nothing short of generating .olean files, moving those .olean files to another computer (in case the original code contains malware), running an external verifier on the .oleans, and having a human manually check the theorem statement and every single definition and typeclass instance in use would suffice. But that's a topic for another thread.

Jason Rute (Jul 29 2025 at 19:46):

I agree about moving this to another thread. But it’s happened once (albeit on a naive level) and it will probably happen again. It isn’t just academic. We already have good partial solutions like SafeVerify. Such tools not only help AI research but could assist large blueprint projects and industrial consulting projects. I worry throwing our hands up in the air and saying it it too hard will delay adoption and further development of such tools.

Mario Carneiro (Jul 29 2025 at 19:48):

Niels Voss said:

Personally, I believe that in an adversarial context, nothing short of generating .olean files, moving those .olean files to another computer (in case the original code contains malware), running an external verifier on the .oleans, and having a human manually check the theorem statement and every single definition and typeclass instance in use would suffice. But that's a topic for another thread.

I believe SafeVerify is doing basically this (minus the separate computer part)

Niels Voss (Jul 29 2025 at 19:50):

I worry throwing our hands up in the air and saying it it too hard will delay adoption and further development of such tools.

Oh, I'm not trying to argue that reliable and complete verification is virtually unattainable. On the contrary, I think it's very much within reach and should be something we should aim for. I was just trying to provide a counterexample for your statement that #print axioms suffices to check a proof.

Notification Bot (Jul 29 2025 at 19:51):

26 messages were moved here from #Machine Learning for Theorem Proving > Harmonic: IMO Livestream by Jason Rute.

Jason Rute (Jul 29 2025 at 19:52):

Sorry it was hard to make a clean break from the previous conversation.

Niels Voss (Jul 29 2025 at 19:54):

I believe SafeVerify is doing basically this (minus the separate computer part)

From SafeVerify's README.md, it seems that it actually allows you to do the separate computer part if you want? In which case, it satisfies my requirements

Jason Rute (Jul 29 2025 at 20:06):

I was just trying to provide a counterexample for your statement that #print axioms suffices to check a proof.

I would love for more people to at least start with #print axioms. That would have caught the DeepSeek bug. And IIRC, we don't currently have a good way to make sure only the three axioms are used in Mathlib, and I don't think there is any sort of check for this in the Mathlib CI. But certainly, I agree this isn't sufficient and I apologize for not making that point more clearly.

Jason Rute (Jul 29 2025 at 20:08):

(Ok, we do have a good manual way: code review.)

Joseph Myers (Jul 30 2025 at 01:48):

I wouldn't say native_decide is controversial; rather, different subcommunities have different conventions. I think it's fairly clear in the mathlib and mathlib-adjacent communities that ofReduceBool is not a legitimate axiom to depend on for formally verified mathematics, exactly the same as that using axiom is not a proper way to set up a mathematical theory even though some beginners get the idea that it is (understandably, given informal descriptions of mathematical structures based on axioms they satisfy); only the three standard axioms are legitimate. Whereas it seems some people working in software verification are more likely to wish to use bv_decide and get a dependency on ofReduceBool that way. There may be certain similar niche cases in mathematics (verifying a terabyte-sized certificate generated by an external SAT solver, for example), but in those cases I'd expect the person claiming the proof with ofReduceBool is legitimate to be writing a paper about the project involving computations that can't reasonably be handled in Lean without ofReduceBool, which can discuss in detail exactly how the dependency on ofReduceBool is kept to a minimum.

Joseph Myers (Jul 30 2025 at 02:45):

Jason Rute said:

I agree about moving this to another thread. But it’s happened once (albeit on a naive level) and it will probably happen again. It isn’t just academic. We already have good partial solutions like SafeVerify. Such tools not only help AI research but could assist large blueprint projects and industrial consulting projects. I worry throwing our hands up in the air and saying it it too hard will delay adoption and further development of such tools.

Right now, the people we see claiming AI solutions to unsolved problems, and claiming that the AI gave a formal proof in Lean, are producing AI slop that doesn't even compile, or proves a statement that obviously has no relation to the actual unsolved problem they're claiming to have solved.

I expect more plausible-looking claims to appear comparatively soon, simply because occasionally an unsolved problem does turn out to have a fairly straightforward solution using known ideas, and would like us to have an official verification solution, clearly documented, when such claims start appearing. (That still won't help with the people who claim to have solved a major open problem but whose formal statement is something different and much easier.)

Jason Rute (Jul 30 2025 at 02:52):

Besides the usual stuff (run it in Lean, make sure it type checks with 0 errors) and the stuff that should be usual (print the axioms, run it in lean4checker), it then gets a bit complicated if someone is formalizing and solving a problem themselves. Of course, if the person is a crank, it is going to be painful. But if they are sincere and well-meaning, that can go a long way. The interaction can be about teaching and not adversarial. And for this latter camp, I agree the Lean community could develop good protocols that would reduce the burden.

Jason Rute (Jul 30 2025 at 02:57):

A simple one would be to make two files, one with just the theorem statement and a sorried proof and the other with everything. Then run it through SafeVerify. That will make it easier to check the theorem statement.

Jason Rute (Jul 30 2025 at 02:59):

Another would be to emphasize that for problems in formal conjectures (or similar), just because you solved a problem as written, doesn't mean you solved the actual open problem. We need to check that the problem was formalized correctly. There could be mistakes. (Of course, this is unpleasant for everyone if this happens, but it is good to get this out in the open right away.)

Niels Voss (Jul 30 2025 at 06:13):

But if they are sincere and well-meaning, that can go a long way. The interaction can be about teaching and not adversarial. And for this latter camp, I agree the Lean community could develop good protocols that would reduce the burden.

I definitely agree that most Lean code is not written in an adversarial way. The reason I suggested treating this in an adversarial context is that

  1. RL is somewhat similar to an adversarial context in the sense that the AI will just try random things and if one of them happens to succeed, it will cause the AI to always exploit it. In this case, it would be in the AI developers own interest to make sure the code is adverserally secure.
  2. As Lean becomes more widely used, people will likely start demanding increasing levels of security. It's better in the long run if everyone always uses the most secure option. Kind of like how we prefer sha256 over md5 even for non-security sensitive applications.

I think SafeVerify is a good approach right now. I also agree that developing protocols and social convention to use those protocols is just as important as developing tools.

Yiwei (Aug 02 2025 at 00:07):

Jason Rute said:

Yiwei If you allow native_decide (more specifically if you allow its axioms), then you can put in an inconsistent proof without modifying anything above the proof. See the example in the above link. If you only allow the three standard axioms, it’s more subtle. I know of no “attack” which survives #print axioms and works without putting code above the theorem. And if there was such an exploit, I doubt it would survive SafeVerify (unless of course the kernel was unsound or the axiom checker had a bug).

I'm trying to check my understanding:

The native_decide/implemented_by gap seems to be a different type of problem than the compiler bugs. There could be a subtler proofs of False within the compiler + runtime that get greenlit by native_decide (a longer "false loop") that are harder to detect.

csimp has the additional issue that it doesn't correctly output the axioms it uses, which would make it invisible to some of the proposed fixes.

Appreciate any corrections to the my interpretation.


Last updated: Dec 20 2025 at 21:32 UTC