Zulip Chat Archive

Stream: mathlib4

Topic: To automate or not to automate?


David Loeffler (Mar 24 2024 at 13:11):

I'm curious if there's a community consensus on the following issue:

If a goal can be proved either by a sophisticated one-shot tactic (nlinarith, field_simp, continuity, etc.), or by a longer hand-tailored argument (explicitly invoking a chain of low-level lemmas), which should one use for library code?

The "big hammer" tactics tend to be significantly slower than hand-written arguments (some more than others); but on the other hand they frequently result in shorter and more human-readable proofs, and big-hammer proofs are probably more robust and maintainable in the long run than fiddly hand-tailored arguments. So it's a tradeoff – how much do we value CPU power vs. human brainpower?

(This came up because of a PR I was reviewing, which had four nlinariths and four field_simps all within a single ~20-line proof. I proposed a far faster hand-rolled argument – which almost immediately broke, because one of the lemmas it used got renamed overnight. Perhaps one should live with slowness for the sake of easier maintainability?)

Eric Wieser (Mar 24 2024 at 13:22):

I think breakage due to lemma renaming is something that we should try to resolve with better renaming tools, rather than by trying to avoid ever mentioning lemma names

Yaël Dillies (Mar 24 2024 at 13:22):

What lemma was it?

Eric Wieser (Mar 24 2024 at 13:23):

I think there are two sides to easy maintainability here: the chance of being broken by changes elsewhere in the library, vs the ease of future refactoring (lemma reordering, typeclass generalizations) due to the argument being more explicit

Eric Wieser (Mar 24 2024 at 13:25):

The CPU vs human brainpower tradeoff is supposed to be solved by says, which lets the CPU cost be spent upfront rather than for every editor of the file; though this comes with its own maintenance burdens.
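
For example, a sketch of the usage (the lemma name here is just illustrative):

example (a b : ℕ) : (a + 0) * b = a * b := by
  simp says simp only [add_zero]

Later readers pay only for the cheap simp only call, and CI can be configured to re-run simp and check it still produces that output.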

David Loeffler (Mar 24 2024 at 13:30):

@Eric Wieser says is specifically for simp, right? I don't think one can do linarith says x.

Eric Wieser (Mar 24 2024 at 13:31):

I think it supports a few other tactics, but indeed not linarith

David Loeffler (Mar 24 2024 at 13:35):

Yaël Dillies said:

What lemma was it?

As you probably guessed it was one of the renames from #11530, specifically add_sub_cancel'. Having deprecation tools which tell you what changes need to be made is already a big advance; but the current tooling doesn't cover many other kinds of change, like adjusting the arguments to an existing lemma, or swapping two pre-existing names.

David Loeffler (Mar 24 2024 at 14:11):

Eric Wieser said:

I think it supports a few other tactics, but indeed not linarith

Could one perhaps make this functionality available for arbitrary tactics by adapting the existing show_term command somehow? That will usually result in impossibly long proof terms, but perhaps the resulting term can be compressed or stored externally, and then regenerated if it stops working.
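
For instance (a sketch; for anything nontrivial the printed term is enormous):

example {a b : ℝ} (h : a < b) : 1 + a < 1 + b := by
  show_term linarith
-- prints "Try this: exact ..." followed by the full underlying proof term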

Ruben Van de Velde (Mar 24 2024 at 14:15):

Would linarith be amenable to generating certificates like polyrith?

David Loeffler (Mar 24 2024 at 14:28):

Ruben Van de Velde said:

Would linarith be amenable to generating certificates like polyrith?

I've not played with polyrith much, but I'm sceptical whether generating and/or verifying certificates for linarith can be done any quicker than re-running the algorithm.

I'm thinking here of really easy cases of linarith, like proving 1 + a < 1 + b when a < b is known – of course this is immediate from add_lt_add_left, but using linarith instead saves you (and later readers) the trouble of memorising a whole catalogue of lemma names.
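
That is, compare:

example {a b : ℝ} (h : a < b) : 1 + a < 1 + b := add_lt_add_left h 1
example {a b : ℝ} (h : a < b) : 1 + a < 1 + b := by linarith

The first is essentially free for the elaborator, but you need to know that add_lt_add_left exists and what its argument order is.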

Eric Wieser (Mar 24 2024 at 14:42):

I think probably gcongr is more appropriate as a middle ground here
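
e.g. (a sketch):

example {a b : ℝ} (h : a < b) : 1 + a < 1 + b := by gcongr

gcongr matches the goal against its library of tagged monotonicity lemmas and discharges the side goal a < b from the context, so it is cheaper than a full linarith search but doesn't require remembering a lemma name.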

Kim Morrison (Mar 24 2024 at 18:41):

Generally I'm in favor of more automated proofs.

Kim Morrison (Mar 24 2024 at 18:41):

Some of the tactics mentioned above are in dire need of a rewrite.

Kim Morrison (Mar 24 2024 at 18:43):

field_simp in particular, which badly abuses the discharger mechanism.

Kim Morrison (Mar 24 2024 at 18:45):

Also I suspect that linarith could adopt the main shortcut omega uses to terminate early. (Index both upper and lower bounds together, according to the coefficients, and stop if they contradict even before you finish eliminating variables.)

Generally, a rewrite of linarith with a non-Mathlib specific typeclass would be great.

Kevin Buzzard (Mar 24 2024 at 18:50):

This is the kind of task which is very unlikely to happen organically, I would imagine. You're more likely to get more mathlib theorems formalised by people working on projects than you are to get tactics appearing out of nowhere -- right now it seems that mathematicians are easily able to find projects which keep them occupied, but I'm not sure it's the same with tactics. Could this sort of rewrite be done by a CS masters student or are you better off just paying someone to do it?

Kim Morrison (Mar 24 2024 at 18:54):

linarith is currently only used in Mathlib, and omega covers most of the programming applications of this corner of tactic space, so indeed linarith may be orphaned.

David Loeffler (Mar 24 2024 at 19:05):

Scott Morrison said:

Generally I'm in favor of more automated proofs.

So in my example – showing 1 + a < 1 + b with a < b in the context – would you favour linarith, or an appeal to the specific lemma? For the sake of argument, let's keep this to linarith as it currently exists, not a hypothetical optimised rewrite.

Kevin Buzzard (Mar 24 2024 at 19:06):

For that one I'd prefer gcongr

David Loeffler (Mar 24 2024 at 19:08):

OK OK, let's say something that can't be gcongr'd, like 2 * a + b < a + 2 * b given a < b.

Kevin Buzzard (Mar 24 2024 at 19:08):

Then I'd personally prefer linarith because it's easier to write and easier to read
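
For comparison, a sketch of the two styles:

example {a b : ℝ} (h : a < b) : 2 * a + b < a + 2 * b := by linarith

example {a b : ℝ} (h : a < b) : 2 * a + b < a + 2 * b :=
  calc 2 * a + b = a + (a + b) := by ring
    _ < b + (a + b) := add_lt_add_right h _
    _ = a + 2 * b := by ring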

Kim Morrison (Mar 24 2024 at 19:16):

We really really need to get the overhead (de Bruijn) factor down if we want to make serious progress with maths in Lean. Just because we have already suffered through writing maths without enough automation doesn't mean we should impose this on others. :-)

Damiano Testa (Mar 24 2024 at 19:26):

Personally, I think that the only reason not to use automation is "painful slow-downs" and this should only be an incentive to make automation more efficient, not to write more detailed proofs.

David Loeffler (Mar 24 2024 at 19:53):

@Scott Morrison "If"?!

Eric Wieser (Mar 24 2024 at 19:54):

Scott Morrison said:

Generally, a rewrite of linarith with a non-Mathlib specific typeclass would be great.

Upstreaming LinearOrderedSemiring out of mathlib sounds like it will create a lot of work

Yaël Dillies (Mar 24 2024 at 19:55):

My work untangling dependencies (eg #11633) will help for that, FWIW

Mario Carneiro (Mar 24 2024 at 20:01):

Scott Morrison said:

linarith is only used in Mathlib currently, and omega covers most of the programming applications of this corner of tactic space, so indeed linarith may be orphaned.

Keep in mind that omega is not a replacement for linarith: omega works for nat and int, while linarith works on linearly ordered rings. You can't use omega on most of the goals in mathlib that currently use linarith, for reasons that run deeper than the names of typeclasses.
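
For instance:

example {x : ℝ} (h : 0 < x) : 0 < x / 2 := by linarith -- omega does not apply over ℝ
example (n : ℤ) (h : 0 < n) : 1 ≤ n := by omega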

Yaël Dillies (Mar 24 2024 at 20:03):

I think Scott rather meant that nobody has a natural incentive to make linarith better because the people using it (= mathematicians) are also the ones who are the least capable of doing so

Kim Morrison (Mar 24 2024 at 20:04):

And in particular the shortcuts that omega can take over Int (rounding coefficients in the favorable direction) play into its ability to terminate early. When you're working over Rat you don't get lucky with early contradictions as often.

David Loeffler (Mar 24 2024 at 20:04):

Mario Carneiro said:

Keep in mind that omega is not a replacement for linarith: Omega works for nat and int, while linarith works on linearly ordered rings. You can't use omega on most of the goals in mathlib that use linarith currently, for more than skin-deep reasons like the names of typeclasses.

I think that was exactly the point: that there are lots of things which linarith can do but omega can't, but those are mostly things which mathematicians care about and computer scientists don't, so the people with the skillset to make linarith better don't have the incentive to do so.

Eric Wieser (Mar 24 2024 at 20:05):

Scott Morrison said:

We really really need to get the overhead (de Bruijn) factor down if we want to make serious progress with maths in Lean. Just because we have already suffered through writing maths without enough automation doesn't mean we should impose this on others. :-)

I claim that reducing the de Bruijn factor is super important for users, so we need the tools to exist; but amenability to refactors is more important within mathlib, so that doesn't necessarily mean we should use those tools.

Eric Wieser (Mar 24 2024 at 20:06):

(for instance, using norm_num in a proof about division currently prevents it being generalized to semifields)

Kim Morrison (Mar 24 2024 at 20:06):

But often higher automation makes refactoring and maintenance easier! It's not a one way street.

David Loeffler (Mar 24 2024 at 20:07):

Scott Morrison said:

But often higher automation makes refactoring and maintenance easier! It's not a one way street.

Indeed, that was exactly the point I raised at the start of this conversation :-)

Matthew Ballard (Mar 24 2024 at 20:09):

Scott Morrison said:

But often higher automation makes refactoring and maintenance easier! It's not a one way street.

I would say robust automation, which I don't see as the same thing

Kim Morrison (Mar 24 2024 at 20:09):

I think it is reasonably plausible that ring will get a non-Mathlib implementation in the medium term. And we plan to completely reimplement omega in the medium term, and expanding scope to include Rat (or maybe even more) would be plausible. That would then eat even more of the current linarith's scope.

David Loeffler (Mar 24 2024 at 20:14):

Eric Wieser said:

I claim that reducing the de Bruijn factor is super important for users, so we need the tools to exist; but amenability to refactors is more important within mathlib, so that doesn't necessarily mean we should use those tools.

I feel this is a bit of a false distinction, because in some sense "mathlib has no users, only contributors". I learned pretty early on that there was absolutely no point in writing code that uses mathlib unless I contributed that code to mathlib, because otherwise it would bitrot in a matter of days! Unless the mathlib development model changes very radically indeed, there's really no point in distinguishing between what is important for "us" (=contributors) and what is important for "users".

Eric Wieser (Mar 24 2024 at 20:15):

Maybe a better framing would be "towards the core" vs "towards the leaves"

Yaël Dillies (Mar 24 2024 at 20:16):

Certainly I am also a mathlib contributor, but I have a very different contribution profile to you, David

David Loeffler (Mar 24 2024 at 20:28):

Yes, well, I guess I'm a pretty leafy sort of contributor. :leaves:

It looks like there's a consensus here that at least in the outer foliage of mathlib it is preferable to use the automated tactics whenever possible, unless they're causing really serious slowdowns (maybe more than a 2x factor). Thanks everyone for the interesting remarks!

Kim Morrison (Mar 24 2024 at 20:42):

I'm hoping within the next year or so we may regain theorem-level parallelism, which should take some of the pain out of slower tactics, too.

Jannis Limperg (Mar 25 2024 at 12:47):

David Loeffler said:

really serious slowdowns (maybe more than a 2x factor)

This is a very conservative notion of serious slowdown. It implies that Aesop, for example, can do hardly any search since its performance is dominated by simp, which has roughly constant cost per goal. So Aesop would usually be used as aesop? says .... Other search-based automation, such as duper, will probably also struggle to meet this performance requirement, and there it's much harder to output a certificate that can be checked quickly. Also, what about decision procedures such as omega; are they fast enough by this metric?

On the wider question, my impression is that Coq and Isabelle users tolerate much bigger slowdowns. I also believe that, in the long term, more automation is required to broaden the appeal of ITP beyond a few academic niches (and within those niches). So I think it might be a strategic error to choke off automation development by putting strict performance limits on it. (Obviously, these points are somewhat self-serving.)

Btw, as Scott says, Lean currently makes the user experience of slow tactics much worse than it needs to be. If tactic scripts were elaborated incrementally, rather than all at once, that would help a lot. Additionally, Lean could parallelise tactic scripts in certain cases, e.g. suffices T by simp; tacs could execute simp and tacs in parallel. Changes like these would substantially boost interactivity. I believe the FRO is already working in this direction, so hopefully in future slow tactics will be more tolerable.

In any case, discussions like these are very helpful for tool developers like me, so thank you very much!

David Loeffler (Mar 25 2024 at 13:45):

Jannis Limperg said:

David Loeffler said:

really serious slowdowns (maybe more than a 2x factor)

This is a very conservative notion of serious slowdown. [...] Other search-based automation, such as duper, will probably also struggle to meet this performance requirement, and there it's much harder to output a certificate that can be checked quickly. Also, what about decision procedures such as omega; are they fast enough by this metric?

I really hope nobody thought I was suggesting that automated tactics were somehow "useless" if they didn't match some arbitrary speed requirement! I was just looking for a rough style guideline for when they should/shouldn't be used in contributions to the mathlib library.

I'd also like to clarify that my "> 2x speedup" rule of thumb for eliminating automatic tactics was meant to mean a 2x speedup in the entire proof of the theorem – not just in that single step of the proof. The example at hand was one_add_mul_self_lt_rpow_one_add, which takes about 4-5 seconds to compile, and which can be got down to 0.5s by using (short) hand-written proofs for the subgoals currently handled by field_simp and nlinarith.


It implies that Aesop, for example, can do hardly any search since its performance is dominated by simp, which has roughly constant cost per goal. So Aesop would usually be used as aesop? says ....

This is actually not so far from the way that simp currently gets used in (some parts of) mathlib, where it's routine practice to squeeze all simp's into simp only's before anything gets merged into the library.
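
For example, simp? suggests the squeezed call to paste in:

example (a : ℕ) : a + 0 = a := by
  simp? -- Try this: simp only [add_zero]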

I have the impression that you see it as a bad thing if aesop (or other search-based tactics) are getting used as foo says ... – why? To me this sounds like getting the best of both worlds (ease of use + compilation speed).

Josha Dekker (Mar 25 2024 at 13:59):

A thought that came to my mind, though I'm not entirely sure whether it is feasible in practice, is a meta kind of tactic – let's call it tactic_q (as I have a terrible lack of naming inspiration at this point) – which takes two arguments: a proof a and a tactic t. (Please bear with me...)

If the proof a closes the goal, tactic_q succeeds with this proof. If not, but the tactic t closes the goal, tactic_q succeeds as well, using t. Now the idea would be to have automation at some step (e.g. CI, but I'm not sure if that will be too slow), such that whenever a fails but t succeeds, we replace (in the code!) a by a proof that we extract from t.

Intuitively, the point of tactic_q is to tell Mathlib that we have a "fallback tactic" t that can solve the goal (and is robust to some changes in naming etc) but might be slow. So, Mathlib extracts from this a proof a that it will use, unless it fails, in which case it uses t and overwrites a with the proof obtained from t. I'm still working through FPIL and need to go over meta-programming afterwards, so I have no idea if this will work, but I feel that this may be a nice way to increase speed on proofs that take a lot of time, but are vulnerable to breakage.
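
Something like this, perhaps (a very rough sketch; the name tactic_q and its syntax are made up, and the CI machinery for regenerating a is not shown):

-- try the stored fast proof `a` first; if it no longer elaborates, run the slow tactic `t`
macro "tactic_q " a:term " fallback " t:tactic : tactic =>
  `(tactic| first | exact $a | $t)

example {a b : ℕ} (h : a < b) : a ≤ b := by
  tactic_q (le_of_lt h) fallback omega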

Yaël Dillies (Mar 25 2024 at 14:01):

Is that not basically says?

Josha Dekker (Mar 25 2024 at 14:07):

You're right, I forgot about that one!

David Loeffler (Mar 25 2024 at 14:13):

(As has been remarked above, says is currently only available for certain specific tactics, and it would be great if it could be generalised.)

Jannis Limperg (Mar 25 2024 at 14:43):

David Loeffler said:

I'd also like to clarify that my "> 2x speedup" rule of thumb for eliminating automatic tactics was meant to mean a 2x speedup in the entire proof of the theorem – not just in that single step of the proof.

Oh okay, I didn't get that. That obviously makes a big difference. (And fwiw I agree that this particular refactor is good, given the current state of things.)

I have the impression that you see it as a bad thing if aesop (or other search-based tactics) are getting used as foo says ... – why? To me this sounds like getting the best of both worlds (ease of use + compilation speed).

I agree that this can be a good compromise, but:

  • The says solution is less easily applicable to auto params, which account for a good amount of Aesop use in parts of Mathlib.
  • The says scripts clutter the proof, though this could be hidden with better editor support.
  • It's annoying busywork to update foo? says after a refactor of foo? or of the library, when foo would have solved the goal before and after, though in an ideal world this could be automated.
  • There are other automation methods where it's less easy to generate reasonable certificates that can be used for says. We can always fall back to proof terms, but these are large and very brittle.

Kim Morrison (Mar 25 2024 at 14:53):

I think it's a mistake to focus on the slowdown factor. David's example above is excellent:

The example at hand was one_add_mul_self_lt_rpow_one_add, which takes about 4-5 seconds to compile, and which can be got down to 0.5s by using (short) hand-written proofs for the subgoals currently handled by field_simp and nlinarith.

because it points at the real culprit: field_simp is badly broken.

No one should be removing uses of field_simp because they are slow. We need to fix field_simp!

Kim Morrison (Mar 25 2024 at 14:55):

Re: says, I don't think (despite being an early advocate) that it has been particularly successful. Our implementation carries an annoying maintenance burden (because you need to turn on special settings to see the failures). I'm not saying we should get rid of it, but it deserves a rethink before anyone tries rolling it out really widely.

David Loeffler (Mar 25 2024 at 15:59):

Scott Morrison said:

because it points at the real culprit: field_simp is badly broken.

I'm actually very fond of field_simp and use it quite heavily in my projects, but only for goals exceeding a certain degree of complication. IIRC, the field_simps in the example were proving things like 1 / (1 + x - 1) = 1 / x – stuff that ring could also have handled – and field_simp really doesn't excel at ticking off the easy stuff quickly.
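
For instance, a sketch of the kind of goal in question:

example {x : ℝ} : 1 / (1 + x - 1) = 1 / x := by ring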

Patrick Massot (Mar 25 2024 at 16:31):

Scott is not saying field_simp is not needed, he wants a better one.

Eric Rodriguez (Mar 25 2024 at 17:09):

Yaël Dillies said:

Is that not basically says?

No, because says always reruns.

Heather Macbeth (Mar 25 2024 at 18:15):

Ruben Van de Velde said:

Would linarith be amenable to generating certificates like polyrith?

Yes, it would. I think this is on a lot of people's mid-term to-do list, including @Rob Lewis' and mine (indeed anyone who wants a mid-level tactic project could volunteer to do this). The idea would be to generalize linear_combination to handle inequalities and then provide a linarith? which outputs an appropriate linear_combination (with the coefficients obtained by the backend work linarith is already doing).

Heather Macbeth (Mar 25 2024 at 18:17):

In many cases a user could see by eye what the right linear_combination is, so it wouldn't even be necessary to go through the linarith? process to construct it.

variable {a b c : ℝ}

example (h : 0 < a) : 0 < a/2 := by linear_combination h/2
example (h : a < b) : 0 < b - a := by linear_combination h
example (h1 : a < b) (h2 : b < c) : a < c := by linear_combination h1 + h2

Heather Macbeth (Mar 25 2024 at 18:19):

The above examples are ones where I always feel a bit guilty writing a linarith call, since it seems so much like overkill -- so it would be good to have more lightweight automation to perform it.

Heather Macbeth (Mar 25 2024 at 18:22):

Here's how this linear_combination tactic would work under the hood:

example (h1 : a < b) (h2 : b < c) : a < c := by
  -- linear_combination h1 + h2
  rw [← sub_neg] at *
  convert add_neg h1 h2 using 1
  ring

Heather Macbeth (Mar 25 2024 at 18:39):

David Loeffler said:

OK OK, let's say something that can't be gcongr'd, like 2 * a + b < a + 2 * b given a < b.

This example would also very much be in scope:

example (h : a < b) : 2 * a + b < a + 2 * b := by linear_combination h

Richard Osborn (Mar 25 2024 at 18:46):

It might be useful to have a performance wish list for tactics, i.e. goals which we expect a tactic to solve quickly but which it currently doesn't. Rewriting field_simp without a target to aim for is perhaps a bit daunting to anyone who isn't already very familiar with all the ways it is used within mathlib.

Heather Macbeth (Mar 25 2024 at 18:57):

field_simp currently has a quick'n'dirty implementation. As I understand it, the issue isn't just with speed -- the implementation also doesn't cover some situations which ideally would be in scope.

import Mathlib.Tactic

example {a b : ℝ} (hb : b ≠ 0) :
    b / a * 2 / b = 1 / a * 2 := by
  field_simp
  ring1 -- fails

example {a b : ℝ} (hb : b ≠ 0) :
    b / a * 2 / b = 1 / a * 2 := by
  rw [mul_comm, mul_div_assoc, div_div, mul_comm a b, ← div_div, div_self hb]
  ring1 -- works

David Loeffler (Mar 25 2024 at 19:07):

As I understand it, the issue isn't just with speed

I completely agree. Is it really field_simp's job to deal with easy cases quickly? To me it's much more important that it deal with hard cases eventually; I'm more than happy to use other, more lightweight tactics for the easier cases.

My #1 wish with field_simp would be for it to give some warning about what it's missing: "I'd be able to get further if you gave me a proof that a ≠ 0", or something. Sometimes I end up using a string of sorry's to chuck a whole bunch of vaguely-relevant-looking non-vanishing statements into the context until field_simp does what I want, then deleting them one by one until I get a minimal set and filling in the sorry's; automating that step away would be very welcome indeed.

Michael Stoll (Mar 25 2024 at 19:57):

See also here ...


Last updated: May 02 2025 at 03:31 UTC