Zulip Chat Archive

Stream: lean4

Topic: Cheating at proofs using multiple instances

Mitchell Lee (Jan 01 2025 at 04:05):

Here's an easy proof that if a b c n are natural numbers with 0 < a and 0 < b and 2 < n, then a ^ n + b ^ n ≠ c ^ n.

instance : Pow Nat Nat where
  pow a n := 1

@[simp]
theorem Nat.pow_eq_one (a n : Nat) : a ^ n = 1 := rfl

example (a b c n : Nat) (ha : 0 < a) (hb : 0 < b) (hn : 2 < n) :
    a ^ n + b ^ n ≠ c ^ n := by
  simp

This is, of course, cheating: I've redefined a ^ n to be 1, thus making the theorem trivial. However, the cheating would be much more difficult to detect in a long file with more separation between the Pow Nat instance and the theorem statement. I could even imagine doing something like this accidentally.

It is worth noting that I cannot use the same method to prove the following much more difficult theorem, because Nat.pow a n, unlike a ^ n, can't be redefined.

example (a b c n : Nat) (ha : Nat.lt 0 a) (hb : Nat.lt 0 b) (hn : Nat.lt 2 n) :
    Nat.add (Nat.pow a n) (Nat.pow b n) ≠ Nat.pow c n := by sorry

Is there any automatic way to prevent such deceitful and underhanded proof strategies?

Yakov Pechersky (Jan 01 2025 at 04:35):

If in the same library you have a proof that there are infinite positive a b c such that a^2 + b^2 = c^2, one could trust it a little bit more (unless your adversarial Pow instance does the right thing for an exponent of 2 and the evil thing for other exponents).

Matt Diamond (Jan 01 2025 at 04:44):

If you override the OfNat Nat instance you can prove some silly (or just silly-looking) things:

import Mathlib

instance : OfNat Nat 0 where
  ofNat := 1

-- a million lines of code later:

example : (1 : Nat) = 0 := rfl

Mitchell Lee (Jan 01 2025 at 05:15):

Wow, that is silly indeed.

This exploit is, of course, related to the well-known and unsolvable issue with proof assistants that they do not check whether the terms in the theorem statement are defined correctly. However, it is different from proving a theorem by not having the right definition. a ^ n is defined correctly; the problem is that it is also defined incorrectly.

There is already one way around this, which is to use set_option trace.Meta.synthInstance truein important theorems to verify that none of the instances are unreasonable. However, it would be nice if there was an easier way, because it can be difficult to actually find the instance declarations.

To be clear, I don't think this is a huge problem and I would totally understand if the answer was "there is nothing much that can be done about this", but I thought I would bring it up in case someone can, in fact, think of something that can be done about it.

Ruben Van de Velde (Jan 01 2025 at 09:00):

Not much more than "prove things about your definition" - for example that 2 ^ 2 is 4 rather than 1

Kevin Buzzard (Jan 01 2025 at 10:58):

For FLT one could just prove the Peano axioms (succ_ne_zero etc) defining facts about + (add_zero, add_succ) and ^ and <= and 3 (three_eq_succ_succ_succ_zero) and \ne and \not and you should be fine

Daniel Weber (Jan 01 2025 at 12:08):

You could still redefine pow after proving stuff about it

Number Eighteen (Jan 01 2025 at 12:32):

My question and worry is about how stuff like this interact with LLMs. If an LLM dumps a gargantuan, inscrutable proof of a crucial result and verifiers verify it, how can we be reasonably confident that the LLM has not sneaked in a trick like the above? And if this proof is part of a self training loop, how can we avoid such a result within the billions of iterations (which could cascade into training an expert scammer)?

It seems that either we must require that the network "does not cheat" by limiting a priori the kind of output it generates, or we must build a cheat-resistant verifier. The former seems to me impossible, but I am far from an expert.

Ruben Van de Velde (Jan 01 2025 at 12:56):

You can define the conclusion outside of the purported proof (see e.g. docs#FermatLastTheorem), so it won't be affected by the bad instance and then verify the proof produces a term of that type

Johan Commelin (Jan 02 2025 at 09:42):

Note that you need to make sure you get the imports right. If you import the purported proof into the file that defines docs#FermatLastTheorem, then that statement might change its meaning.
So you have to import the statement into the file that has the proof. (Or import both in a new file.)

Last updated: May 02 2025 at 03:31 UTC