Zulip Chat Archive
Stream: general
Topic: Fermat Last theorem by omega
Anton Mellit (Sep 20 2024 at 07:07):
Here is a fun example demonstrating that misunderstanding of the way Nat behaves may lead to "proofs". Is there a way to make sure that something like this does not happen deep deep deep in libraries?
```lean
theorem Fermat (x y z n : ℕ) (h1 : 2 - n < 0) : (x * y * z ≠ 0) → x ^ n + y ^ n ≠ z ^ n := by
  omega
```
Henrik Böving (Sep 20 2024 at 07:09):
h1 is just assuming a proof of False. Unless you are in an inconsistent situation you'll never be able to provide that argument.
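For concreteness, a minimal check (an illustration added here, not part of the original message) that this hypothesis can never be supplied: subtraction on ℕ truncates at zero, so 2 - n < 0 is false for every n.

```lean
import Mathlib

-- ℕ-subtraction is truncated, so `2 - n` is never negative;
-- `2 - n < 0` is false for every `n`, hence `h1` can never be provided.
example (n : ℕ) : ¬ (2 - n < 0) := Nat.not_lt_zero _
example : (2 - 5 : ℕ) = 0 := rfl  -- truncated subtraction
```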
Daniel Weber (Sep 20 2024 at 07:13):
The problem here is mostly with the definition; something general that works for these cases is checking specific examples and proving things using the definition.
Mario Carneiro (Sep 20 2024 at 07:30):
in other words, while this is a concern for the main statement of the theorem, it's not an issue deep deep deep in the middle of a proof, because anything which is assumed has to be proved later, so if it were trivially true or trivially false the proof would not hold together
Mario Carneiro (Sep 20 2024 at 07:31):
it is a concern for large libraries in the sense that they are large collections of theorems, not necessarily all related to each other, so there is a possibility that one of those theorems is accidentally trivialized and no one has noticed because it is not being used
Josha Dekker (Sep 20 2024 at 07:32):
So would it be worth it to lint against such things? To check if a theorem assumes a hypothesis which reduces to False?
Mario Carneiro (Sep 20 2024 at 07:32):
We have a linter synTaut for catching refl lemmas that are literally proving a = a (and are not Eq.rfl itself)
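For illustration (an added example, not from the thread), the kind of lemma synTaut is meant to flag looks like the following; a linter pass over a library containing it is expected to report it.

```lean
import Mathlib

-- A made-up lemma of the shape the `synTaut` linter catches: both sides are
-- syntactically identical, so the statement is a tautology proved by `rfl`.
theorem add_comm_trivial (a b : ℕ) : a + b = a + b := rfl
```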
Anton Mellit (Sep 20 2024 at 07:34):
Yeah, trying with n = 3 will uncover the problem. I am thinking of a situation where the assumptions are convoluted and some straightforward cases are fine: for instance, I would have an actual correct proof for n < 100000, while for n > 100000 the assumptions would be accidentally inconsistent.
Mario Carneiro (Sep 20 2024 at 07:35):
I think a more general version of that is nitpick, which is an Isabelle linter which tells you when the theorem you have stated is false. Theorems that are true but trivial are a bit harder because there are genuine reasons you might want to state a trivial theorem (as evidenced by the fact that the most popular proof in Mathlib by a large margin is rfl).
Mario Carneiro (Sep 20 2024 at 07:36):
@Anton Mellit in that case the assumptions are not inconsistent, they just imply n <= 100000
Mario Carneiro (Sep 20 2024 at 07:36):
which might be an actual part of the proof
Daniel Weber (Sep 20 2024 at 07:37):
Mario Carneiro said:
I think a more general version of that is nitpick, which is an Isabelle linter which tells you when the theorem you have stated is false. [...]

Is that like slim_check?
Mario Carneiro (Sep 20 2024 at 07:37):
yes, except it runs as a linter instead of being called explicitly
Henrik Böving (Sep 20 2024 at 07:45):
Daniel Weber said:
Mario Carneiro said:
I think a more general version of that is nitpick, which is an Isabelle linter which tells you when the theorem you have stated is false. [...]
Is that like slim_check?

Nitpick is much further developed than slim_check: it uses a SAT solver to perform model finding rather than just random exploration of the search space. That said, I've been thinking for a while about pulling slim_check out of Mathlib and just making it its own library so everyone :tm: can benefit from it. It would be even cooler if we could integrate it with the by keyword in core, but I'm not sure if that would scope creep a little.
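As a point of comparison (an added sketch, not from the original message), this is roughly what slim_check offers today: it randomly samples values and evaluates the decidable statement, so the false claim below is expected to be rejected with a concrete counterexample such as n = 0.

```lean
import Mathlib

-- The statement is false; `slim_check` samples random values of `n` and is
-- expected to report a counterexample (e.g. `n := 0`) as an error.
example (n : ℕ) : n + 1 ≤ 2 * n := by
  slim_check
```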
Mario Carneiro (Sep 20 2024 at 07:52):
I think it needs a rewrite; it should not be implemented through the typeclass system
Henrik Böving (Sep 20 2024 at 07:53):
I think if we had a deriving handler for it, it would be quite fine
Mario Carneiro (Sep 20 2024 at 07:53):
Henrik Böving said:
It would be even cooler if we could integrate it with the by keyword in core, but I'm not sure if that would scope creep a little.

I thought that "scope creep" was the subtitle of Lean core
Mario Carneiro (Sep 20 2024 at 07:55):
There are things that can't be implemented in slim_check because the enumeration happens at the object level instead of the meta level. E.g. you can't use it to check theorems about real numbers because things aren't Decidable even if linarith can find a trivial contradiction.
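A small illustration of that limitation (added here, not from the original message): the hypotheses below are contradictory and linarith closes the goal immediately, but an object-level tester cannot even evaluate comparisons on an arbitrary real x, since they carry no Decidable instance.

```lean
import Mathlib

-- `linarith` refutes the contradictory hypotheses instantly, but an
-- object-level enumerator cannot test statements like `x < 0` for a real `x`
-- because there is no `Decidable` instance to evaluate them.
example (x : ℝ) (h : x < 0) (h' : 0 < x) : False := by linarith
```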
Henrik Böving (Sep 20 2024 at 08:11):
Mario Carneiro said:
Henrik Böving said:
It would be even cooler if we could integrate it with the by keyword in core, but I'm not sure if that would scope creep a little.
I thought that "scope creep" was the subtitle of Lean core

Not really, no.

Mario Carneiro said:
There are things that can't be implemented in slim_check because the enumeration happens at the object level instead of the meta level. E.g. you can't use it to check theorems about real numbers because things aren't Decidable even if linarith can find a trivial contradiction.
Right
Asei Inoue (Sep 20 2024 at 13:17):
Is it possible to port nitpick to Lean? It sounds like a great thing to have.
Henrik Böving (Sep 20 2024 at 13:18):
Given the incompatibility between the logics of Isabelle and Lean, a direct port would not be possible, no. IIRC, getting SAT-based model finders compatible with HOL was already research work at the time; whether it is trivial to extend that work to DTT, I don't know. Either way, it would not be a trivial endeavour to write up in Lean, though a sufficiently skilled Lean developer who is willing to dig into the matter could probably get there.
Tyler Josephson ⚛️ (Sep 23 2024 at 12:10):
When I was just getting into theorem proving, I tried translating a science derivation into logic (I was using the automated theorem prover KeYmaera X) and would often make mistakes when writing down the theorem. The logic expert on our team explained the principle of explosion to me; our context had several hypotheses, and ensuring they were all consistent wasn't always simple. Eventually (because the theorem prover was automated) I adjusted my workflow so that before trying to prove "the thing I wanted," I'd change the conjecture to "1=0," then run the prover, and if it immediately returned True, I'd use that signal to go back and check my assumptions.
So, I think building automated consistency checking would be helpful. It would be great if VS Code could check, after a theorem is stated, whether the hypotheses are consistent. Maybe in some cases the compute required would be unreasonable and the checking should be turned off, but I think scientists and engineers would find this useful. I have some proposals out for automatically constructing theories using LLMs and checking them with a theorem prover, and they propose some controls like this for ensuring consistency.
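The analogous check in Lean might look like this (a sketch added for illustration, reusing the ℕ example from the top of the thread): before proving the intended statement, try to derive something absurd from the hypotheses alone; if this compiles, the assumptions are inconsistent.

```lean
import Mathlib

-- Consistency smoke test: if `1 = 0` follows from the hypotheses, they are
-- inconsistent and any theorem stated under them is vacuous. Here `omega`
-- succeeds because `2 - n < 0` is impossible over ℕ.
example (n : ℕ) (h1 : 2 - n < 0) : (1 : ℕ) = 0 := by omega
```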
Michael Rothgang (Sep 23 2024 at 12:17):
#13999 is an in-progress PR adding a linter against e.g. natural number subtraction. As I understand the linter, it would have caught this example.
Jireh Loreaux (Sep 23 2024 at 14:36):
@Tyler Josephson ⚛️ it would certainly be nice to have tooling like in #13999 to find user error automatically. However, do you realize that what you're asking for is not possible in general? That is, by Gödel's incompleteness theorems, it is not possible for Lean (or anything of equivalent proving power) to tell you "your context is consistent" (even an empty context, because Lean has its own axioms). It can only tell you "your context is inconsistent because I can prove False", and even then it has to find a proof of False, which may be arbitrarily hard.

Again, this doesn't preclude things like slim_check or similar tools from helping to avoid simple mistakes.
Tyler Josephson ⚛️ (Sep 23 2024 at 15:13):
@Jireh Loreaux Yes, indeed. I bet some amount of decision-making would be needed to decide which cases are worth catching automatically. If there was movement on this, I could check with my group and see what kinds of cases we'd all find useful. For example, catching simple equality / inequality relationships among naturals, rationals, and reals would go pretty far. Here's one example (from here) - making a mistake when writing these down can lead to an inconsistency.
Screenshot-2024-09-23-at-11.11.09AM.png
Jon Eugster (Sep 23 2024 at 18:42):
I wonder how much of your use case one could already cover with a simple macro which essentially runs exfalso; aesop or another strong finisher tactic that can find contradictions:

```lean
import Mathlib

example (h : 0 = 1) : Nat.Prime 4 := by
  fail_if_success exfalso; aesop -- failure: tactic succeeded
```
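A minimal sketch of such a macro (the name check_consistency is made up for this illustration): it errors when exfalso; aesop can already close the goal from the hypotheses, and otherwise leaves the goal untouched.

```lean
import Mathlib

-- Hypothetical helper: fail loudly if the hypotheses are already inconsistent.
macro "check_consistency" : tactic =>
  `(tactic| fail_if_success (exfalso; aesop))

example (n : ℕ) (h : 0 < n) : 0 ≤ n := by
  check_consistency  -- passes silently: no contradiction in the context
  omega

-- example (h : 0 = 1) : Nat.Prime 4 := by
--   check_consistency  -- would error here, flagging the inconsistent hypothesis
```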
Jonas Bayer (Sep 24 2024 at 13:54):
Some weeks ago I brought up the topic of automatic counterexample finding in a different thread.
I'd generally be interested in working on this topic and have started reading about Nitpick. It translates the Isabelle problem to Kodkod, but I'm not sure whether that is still state of the art or whether there are now better tools for model finding.
There is also the design question of whether to have a lightweight tool that runs in the background or a more powerful counterexample generator that needs to be run explicitly. Ideally, one could have a single tool that can be run with different settings to achieve both goals, but this might be even harder to come up with...
Kim Morrison (Sep 24 2024 at 22:14):
I'd encourage anyone exploring this to aim for lightweight, extensible, and automatic.
Kim Morrison (Sep 24 2024 at 22:16):
Although it is worth considering that for counterexample finding no proofs are necessary, so an extensible mechanism for handing problems to an unverified SMT solver is perhaps viable. However, that would then require users to have an external tool, which would limit adoption.
Eric Wieser (Sep 25 2024 at 00:48):
If your counterexample doesn't come with a proof, it might not be a counterexample after all. Any statement in Lean with junk values is particularly vulnerable to this, as the SMT solver might not model them correctly.
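A concrete instance of the junk-value issue (an added illustration, not from the original message): Mathlib defines division by zero to be zero, so the statement below is a theorem, while a solver that models / as a partial operation could disagree and report a spurious counterexample for statements relying on it.

```lean
import Mathlib

-- In Mathlib, `x / 0 = 0` by definition (a "junk value"); an external solver
-- with a different model of division may judge such statements differently.
example : (1 : ℝ) / 0 = 0 := div_zero 1
```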
Jonas Bayer (Sep 25 2024 at 16:23):
In Isabelle, nitpick can return "found a potentially spurious counterexample" when the example cannot be verified.