Zulip Chat Archive

Stream: general

Topic: Strategies for testing theorem statements are correct

Derrik Petrin (Nov 02 2024 at 22:02):

Hi all,

I've been wondering what strategies people use to make sure that the theorem statements they write define what they think they do. For example, here is an incorrect statement of Fermat's last theorem (the last conjunction should be an implication):

def Fermat's_last_theorem : Prop :=
  ∀a : Nat, ∀b : Nat, ∀c: Nat, ∀n :Nat, (a > 0) ∧ (b > 0) ∧ (c > 0) ∧ (n > 2) ∧ ¬(a^n + b^n = c^n)

And here is one way that I would go about verifying the theorem statement:

example : ¬Fermat's_last_theorem :=
  λ h : Fermat's_last_theorem =>
    have : 0 < 0 := (h 0 0 0 0).left
    show False from (Nat.not_lt_zero _) this

But (coming from a software engineering background), I'm interested in a more test-driven development style of writing these assertions. That is, writing a proof of a proposition that my intended theorem implies, before I have defined the proposition, and then once I have a correct statement of the theorem (according to this particular implication), the proof goes through. One idea I had is something like this:

example (h : Fermat's_last_theorem) : ∀ n, 1^n + 1^n ≠ 1^n := by
  intro
  | 0 | 1 | 2 => simp
  | n + 3 => first | (apply h <;> simp) | sorry

This way, the proof will pass with a sorry if the theorem is not stated correctly, and once it is, the first tactic will stop hitting the sorry.

But this still seems fragile/not robust enough, since for example it wouldn't work for applying this statement of Fermat's last theorem, but if you restated the theorem using implications in place of all conjunctions, then it would.

Michael Rothgang (Nov 02 2024 at 22:25):

As far as I know, the current best answer to your question involves two parts:

Ask an experienced Lean user to verify your Lean formalisation
There are some pitfalls of Lean (for example, relating to natural number subtraction as opposed to integer subtraction), which take time to learn. For some of them, there exists a prototype linter.
Prove some interesting theorem about your statement.
This will both week out false statements (you'll have trouble proving them :-)) and will test if your formalisation is workable: not all ways to state your given result will be equally convenient to work with, and working with it will reveal this.

In general, finding (the right/usable) definitions and statements is generally "the hard part" of formalisation, much harder than "just proving it". Many definitions go through several versions before (or after) landing in mathlib.

Michael Rothgang (Nov 02 2024 at 22:27):

I haven't thought much about using TDD practices. That's a nice idea (a sophisticated version: use plausible, formerly known as slim_check, to try to find a counterexample to your statement). I suspect it will not find all issues - some combination of the above still seems needed.

Derrik Petrin (Nov 02 2024 at 22:45):

Where is plausible defined? I don't see it in the version of mathlib I have, but I do see slim_check. slim_check definitely looks closer to what I'm thinking! It sounds kind of like the spirit of the Hypothesis Python testing framework.

Ruben Van de Velde (Nov 02 2024 at 22:56):

Yeah, slim_check moved into a separate library called Plausible in the last week

Violeta Hernández (Nov 03 2024 at 00:39):

There's some mathematical objects where you can verify the correctness of a statement just by proving a certain result. To give an example I've been working with, docs#ONote.opow is a quite complicated algorithm for exponentiating ordinals less than epsilon-0, but docs#ONote.repr_opow and docs#ONote.nf_opow serve as proofs of its correctness regardless.

Violeta Hernández (Nov 03 2024 at 00:41):

Generally I agree the best way to weed out bad definitions is by proving things about them. If you have a wrong definition then some of the basic results will almost inevitably break. If you have a bad theorem statement then you'll have trouble showing the implications that it should entail.

Violeta Hernández (Nov 03 2024 at 00:43):

Though sometimes a "bad definition" can be relative. Another example I've come across is docs#StrictOrder.cof. This isn't the correct definition of cofinality for partial orders, but since the current API just uses it for well-orders it hasn't caused much issue as of yet.

Last updated: May 02 2025 at 03:31 UTC