Zulip Chat Archive
Stream: general
Topic: bug in linarith
Scott Morrison (Nov 02 2022 at 05:22):
While writing tests for the port of linarith
to Lean 4, I think I've encountered a bug in the implementation of Fourier-Motzkin elimination in Lean 3, but haven't yet identified what is wrong.
Here's the evidence:
If you add
def mathlib3ex : list comp :=
[
⟨ineq.le, [(2, 1), (1, -1)]⟩,
⟨ineq.le, [(3, 1), (1, -1)]⟩,
⟨ineq.le, [(4, 1), (1, -1)]⟩,
⟨ineq.le, [(5, -1), (1, 1)]⟩,
⟨ineq.le, [(6, 1), (4, -1)]⟩,
⟨ineq.le, [(7, 1), (5, -1), (2, -1), (1, 1)]⟩,
⟨ineq.le, [(8, 1), (5, -1), (3, -1), (1, 1)]⟩,
⟨ineq.le, [(9, 1), (8, -1)]⟩,
⟨ineq.le, [(10, 1), (7, -1)]⟩,
⟨ineq.le, [(11, 1), (6, -1)]⟩,
⟨ineq.lt, [(0, -1)]⟩,
⟨ineq.lt, [(11, -1), (10, -1), (9, -1), (5, 3)]⟩
]
def mathlib4ex : list comp :=
[
⟨ineq.le, [(5, -1), (2, 1)]⟩,
⟨ineq.le, [(6, -1), (3, 1)]⟩,
⟨ineq.le, [(7, -1), (1, 1)]⟩,
⟨ineq.le, [(9, -1), (8, 1)]⟩,
⟨ineq.le, [(9, 1), (4, -1)]⟩,
⟨ineq.le, [(9, 1), (8, -1), (7, 1), (4, -1)]⟩,
⟨ineq.le, [(10, -1), (9, 1), (6, 1), (4, -1)]⟩,
⟨ineq.le, [(10, 1), (9, -1)]⟩,
⟨ineq.le, [(11, -1), (5, 1)]⟩,
⟨ineq.le, [(11, 1), (9, -1)]⟩,
⟨ineq.lt, [(0, -1)]⟩,
⟨ineq.lt, [(4, 3), (3, -1), (2, -1), (1, -1)]⟩
]
#eval fourier_motzkin.produce_certificate mathlib3ex 11 -- succeeds, i.e. found a contradiction
#eval fourier_motzkin.produce_certificate mathlib4ex 11 -- fails, i.e. no contradiction found
to the bottom of src/tactic/linarith/elimination.lean
(this is in mathlib3), you'll see that we find a contradiction from the first set of inequalities, but not from the second.
(This is available in the branch#linarith_bug.)
However these sets of inequalities are the same, up to renaming variables (and then re-sorting the linear expressions so the terms are in descending order again), and hence should be equisatisfiable!
In pcomp.maybe_minimal
there is an implementation of Imbert's first theorem. If we remove this check we successfully find a contradiction in both cases, leading me to suspect that this check has been mis-implemented, but so far I don't see what could be wrong.
(I encountered this problem in the mathlib4 implementation, which is meant to be identical except that for unimportant reasons the variable orders are different; a test was failing that succeeded in mathlib3, and I could minimise to this.)
If anyone, e.g. @Rob Lewis (who implemented linarith
in the first place!) or @Reid Barton (who I recall has an implementation of FM elsewhere) has any ideas about what's going on here, that would be very helpful.
Reid Barton (Nov 02 2022 at 05:39):
Does that mean that the test in mathlib3 that was succeeding is actually incorrect?
Mario Carneiro (Nov 02 2022 at 05:48):
What is the contradiction that was found? Is it actually a contradiction?
Mario Carneiro (Nov 02 2022 at 05:51):
#eval do
coeff ← fourier_motzkin.produce_certificate mathlib3ex 11,
tactic.trace coeff.to_list
-- [(11, 1), (9, 1), (8, 1), (7, 1), (6, 1), (5, 1), (4, 1), (3, 1), (2, 1), (1, 1), (0, 1)]
Mario Carneiro (Nov 02 2022 at 05:57):
def mathlib3ex : list comp :=
[
⟨ineq.le, [(2, 1), (1, -1)]⟩,
⟨ineq.le, [(3, 1), (1, -1)]⟩,
⟨ineq.le, [(4, 1), (1, -1)]⟩,
⟨ineq.le, [(5, -1), (1, 1)]⟩,
⟨ineq.le, [(6, 1), (4, -1)]⟩,
⟨ineq.le, [(7, 1), (5, -1), (2, -1), (1, 1)]⟩,
⟨ineq.le, [(8, 1), (5, -1), (3, -1), (1, 1)]⟩,
⟨ineq.le, [(9, 1), (8, -1)]⟩,
⟨ineq.le, [(10, 1), (7, -1)]⟩,
⟨ineq.le, [(11, 1), (6, -1)]⟩,
⟨ineq.lt, [(11, -1), (10, -1), (9, -1), (5, 3)]⟩
]
#eval tactic.trace (mathlib3ex.foldl comp.add ⟨ineq.eq, []⟩)
-- []<0
The contradiction seems to be correct
Mario Carneiro (Nov 02 2022 at 06:26):
Here's a minimization of the example:
def mathlib4ex : list comp :=
[⟨ineq.le, [(3, -1), (2, 1)]⟩,
⟨ineq.le, [(3, 1), (1, -1)]⟩,
⟨ineq.le, [(3, -1)]⟩,
⟨ineq.lt, [(3, 1), (2, -1), (1, 1)]⟩]
#eval fourier_motzkin.produce_certificate (mathlib4ex.map (λ x, ⟨x.1, list.merge_sort (λ a b, a.1 > b.1) x.2⟩)) 3
changing 3
to 0
, or rearranging any of the coefficients (preserving the sum) makes the failure go away
Mario Carneiro (Nov 02 2022 at 08:00):
Okay, here's an explanation of what happens in the code. We start out with the initial equations E0, ..., E3:
step 0:
E0: -x2 <= 0: history = {E0}, explicit = {}, implicit = {}
E1: x2 - x0 <= 0: history = {E1}, explicit = {}, implicit = {}
E2: -x2 + x1 <= 0: history = {E2}, explicit = {}, implicit = {}
E3: x2 - x1 + x0 < 0: history = {E3}, explicit = {}, implicit = {}
We eliminate x2
by adding E0 + E1 and E2 + E3 (and also two others that won't matter):
step 1:
E01 = E0 + E1: -x0 <= 0: history = {E0, E1}, explicit = {x2}, implicit = {}
E23 = E2 + E3: x0 < 0: history = {E2, E3}, explicit = {x2}, implicit = {x1}
The explicit
set indicates that x2
was eliminated from each equation "on purpose", but x1
was also eliminated from E23
"on accident" and so it shows up in the implicit
list for E23
.
We then eliminate x1
, which does nothing (except for deleting the extra unnecessary equations) because the equations already don't have x1
present.
We then eliminate x0
:
step 2:
E0123 = E01 + E23: 0 < 0: history = {E0, E1, E2, E3}, explicit = {x0, x2}, implicit = {}
x0
has been added to the explicit
list because this is a new equation constructed by elimination of x0
. x1
is not in the implicit list because neither E01
nor E23
contains an occurrence of x1
.
We apply the theorem now and conclude that since |history| = 4 > 3 = |explicit ∪ implicit| + 1
, the newly generated equation E0123
is redundant, so we discard it and get a consistent state, leading to failure of the algorithm.
Mario Carneiro (Nov 02 2022 at 08:32):
Here's a standalone test case:
example (a b c z : ℚ) (_ : a ≤ z) (E0 : b ≤ c) (E1 : c ≤ a) (E2 : 0 ≤ c) : b ≤ a + c := by linarith -- fail
Scott Morrison (Nov 02 2022 at 09:46):
Thanks for the minimization! So... the conclusion we've reached here is that either
- our interpretation (i.e. as implemented in mathlib3) of the theorem must be incorrect, because it is incorrect to discard E01234, or
- our interpretation is correct and the theorem is false?
Mario Carneiro (Nov 02 2022 at 09:50):
I think the implementation of the theorem is incorrect in the interpretation of x1
as not being an implicitly eliminated variable of E0123
. Wikipedia says that it should include all variables from equations in the history which are not in the explicit list, and x1
is in E2
which is in the history of E0123
. I think the fix is to union the vars
fields when you merge equations instead of recalculating it from the combined equation.
Scott Morrison (Nov 02 2022 at 10:06):
Scott Morrison (Nov 02 2022 at 10:16):
Actually, I better rewrite some doc-strings.
Scott Morrison (Nov 02 2022 at 10:27):
Okay, I've updated the doc-strings.
However over on the mathlib4 side, while this fix does fix the failure I initially reported in this thread, it doesn't appear to fix the failure I originally identified (and semi-minimised to put in this thread). So perhaps let's not hurry to merge this fix until I have that sorted out too.
Kevin Buzzard (Nov 02 2022 at 11:53):
Thank you so much Scott and Mario for this work. There's one thing I'm confused about -- if the lean 3 implementation has a bug then why is the output of the lean 3 test not incorrect?
Yaël Dillies (Nov 02 2022 at 13:32):
What do you mean by "incorrect"? It doesn't output anything incorrect but rather thinks it can't solve cases that should be able to.
Eric Rodriguez (Nov 02 2022 at 13:58):
I wonder how many proofs could be golfed now because of this bugfix...
Scott Morrison (Nov 03 2022 at 03:47):
I'm still seeing problems in the mathlib4 implementation (i.e. cases where linarith fails, but should succeed), but where if I ask it to reject fewer constraints according to Imbert's theorem, it succeeds. This suggests I still have the implementation of Imbert's theorem incorrect. :-(
Scott Morrison (Nov 03 2022 at 03:48):
Does someone have access to https://doi.org/10.1016/B978-0-444-88771-9.50019-2, and could send me a copy?
Scott Morrison (Nov 03 2022 at 03:48):
I think I'd prefer to read it at the source than try to parse too many masters theses trying to restate the result. :-)
Scott Morrison (Nov 03 2022 at 03:52):
Or if someone wants to compare the wikipedia description and the implementation, you can look either in #17307, or mathlib4#526. The mathlib4 PR has the advantage that it has a failing test, that succeeds if you change the 1 +
in maybeMinimal
to 2 +
, so can be used to test alternative interpretations...
Scott Morrison (Nov 03 2022 at 06:17):
Okay, I think I have this sorted out.
Sadly, the performance of Fourier-Motzkin elimination in larger examples is highly sensitive to the variable/equation ordering, and it's easy to have things either work fine or crash by adding and removing .reverse
...
Scott Morrison (Nov 03 2022 at 07:02):
Okay, I've got this sorted out, and updated the PR.
Scott Morrison (Nov 03 2022 at 07:03):
The new interpretation of "implicitly eliminated variable" now results in all the tests (including the new one!) working in both mathlib3 and mathlib4, and follows the text of the original paper. (However, it doesn't seem to agree with a formula the original paper uses later, so I left a note for future readers... The original paper also has some confusing typos in a critical example... :-)
Kevin Buzzard (Nov 03 2022 at 07:06):
Re the typos: clearly the solution to that is to formalise the paper.
Reid Barton (Nov 03 2022 at 07:07):
I was only able to access the author's later 1993 paper, but I found the description of the theorem and associated algorithm there very unclear and I'm not surprised that it might have been interpreted incorrectly.
Moritz Doll (Nov 03 2022 at 09:12):
Kevin Buzzard said:
Re the typos: clearly the solution to that is to formalise the paper.
Well if we formalize every math paper that has typos in it, we'd be here forever :big_smile:
Mario Carneiro (Nov 03 2022 at 09:13):
the ones without typos are also helpful
Moritz Doll (Nov 03 2022 at 09:13):
I don't know any papers without typos..
Ruben Van de Velde (Nov 03 2022 at 09:18):
Reid Barton said:
I was only able to access the author's later 1993 paper, but I found the description of the theorem and associated algorithm there very unclear and I'm not surprised that it might have been interpreted incorrectly.
Maybe you could find a master's student to clarify it in their thesis
Eric Wieser (Nov 21 2022 at 12:06):
The fixed version seems much more likely to timeout than the original; for instance on
example (a b : ℤ)
(h₁ : a < b)
(ha : 1 * a * a - 3 * a - 2 = 0)
(hb : b * b - 3 * b - 2 = 0) :
(1 * a * a + b * b) - 2 * a * b = 17 :=
begin
nlinarith only [h₁, ha, hb],
end
Perhaps it's a coincidence that nlinarith
was able to solve this in the first place; but it also seems plausible that we're now stuck in an infinite loop somewhere.
Scott Morrison (Nov 22 2022 at 23:44):
Yes, the bug was causing it to drop constraints, so it's no particular surprise if some big problems now timeout. The cost of correctness. :-)
Last updated: Dec 20 2023 at 11:08 UTC