Zulip Chat Archive

Stream: lean4

Topic: Aesop update


Joachim Breitner (Mar 01 2022 at 15:30):

My meta-programming experience was thinking “hey, wouldn’t it be nice to have open inductive predicates and types”, finding a master student to play around with that, and now he’s a fan of SML, but I haven't learned much more :-)

Joachim Breitner (Mar 01 2022 at 15:32):

But auto is pretty good indeed, with the automatic application of intro and elim rules in addition to simp, which is itself already pretty good compared to Lean’s simp (as far as I can tell), with the ability to do rewriting with congruences. I am not sure whether my comments can be actually useful though :-)

Sebastian Ullrich (May 17 2022 at 09:28):

@Jannis Limperg So I gave Aesop a try on a non-trivial proof from our lab course today and it got further than I expected! https://gist.github.com/Kha/5babc9f0ac38d4f8f6094fb5cf3a69d9#file-exercise5-lean-L55
I had to help it out a bit with the filter case analysis, without the extra simp_all the conditional filter_cons_true/false simp theorems didn't seem to get applied. I also tried to apply a custom elim/split theorem elem_filterE, but I don't think that's quite the shape Aesop expects right now (see commented invocation). If it did, it would seem feasible to contract the entire proof to

constructor <;> induction as <;> aesop ...

Sebastian Ullrich (May 17 2022 at 10:37):

but I don't think that's quite the shape Aesop expects right now

Ah, I suppose I confused your elim rules (transformation of a hypothesis, called destruct in Isabelle) with what Isabelle calls elim (transformation of the goal triggered by a hypothesis)

Jannis Limperg (May 17 2022 at 19:46):

Indeed that's what it is. Maybe I should rename the builder for consistency with Isabelle.

I have a collaborator visiting this week, so might not have time to look into this, but I'll get to it next week at the latest. I recently weakened the builtin simp to the equivalent of simp at *, which allows me to cache across goals. But this doesn't seem to do very much (surprisingly), so maybe I'll go back to simp_all. In any case, I'm very happy to hear that you found the tactic (close to) useful.

Sebastian Ullrich (Jun 10 2022 at 15:52):

Sebastian Ullrich said:

This might already be the case for Aesop, but I should mention that perhaps auto's best feature is that despite its heuristics, it reports sensible intermediate goals on failure that usually give you a pretty good idea on what lemma to prove & include next

I've continued experimenting with Aesop by throwing it at the same lab course final project that was previously golfed to death in Isabelle, and I'm really excited about how close we can already get to the auto proofs in some cases. However, I realized that in the quote above I only told half the story: in reality, auto does not error on an unclosed proof at all but returns the unrpoved goal(s), much like simp. Both of them should only be used in terminal position in a clean proof of course, but for experimentation it is in fact crucial that you can continue after half-solving a proof with auto, then try a few manual steps, and finally reintegrate them back into the auto call. Afaict this is the only reliable way of composing a proof from multiple "unsafe" steps, as neither auto nor aesop will (or should) show you the remaining goals after heuristically applying unsafe steps without delving into their enormous traces.

Jannis Limperg (Jun 10 2022 at 18:22):

Okay, I can add this as well. Should (tm) be a relatively small modification to the proof extraction procedure.

Sebastien Gouezel (Jun 10 2022 at 18:38):

I should say that one of the features I like a lot with auto in Isabelle is that you can tell it to perform some unsafe steps. For instance, auto intro: foo will try to apply foo if it leads to completion, but auto intro!: foo will apply foo even if it does not lead to a complete proof. This may make the goal unsolvable, but when you know it's the way to go then it is very efficient. For instance, it is possible that a function f + g is continuous without f and g being continuous, but in most cases it is the way to go to try to prove the continuity of both f and g. Then auto intro!: continuous_add would leave you with the two goals that f and g are continuous.

Jannis Limperg (Jun 10 2022 at 18:50):

Aesop can already do this: if you call it with aesop (add safe foo), it'll use foo as a 'safe' rule that is applied without backtracking. The unsolved goals which Aesop reports to the user are exactly those which are left after all safe rules have been applied (as often as possible).

Aesop does not currently remove an unsafe foo rule if one is already registered, but that hopefully doesn't make much of a difference.

Sebastian Ullrich (Jun 18 2022 at 08:15):

Demo of Aesop already doing a job as least as good/slightly better than CoqHammer's internal solver (i.e. the non-hammer part): https://gist.github.com/Kha/96d67c8b947b48f8786aea90857fbb5c

James Gallicchio (Jun 18 2022 at 18:10):

That's incredible. How fast is aesop on that example?

James Gallicchio (Jun 18 2022 at 18:12):

also, is it possible yet for aesop to emit the tactic steps it finds, sorta like simp? behavior?

Sebastian Ullrich (Jun 18 2022 at 18:54):

I'll have to do a more thorough benchmark when I'm back at that machine, but there was no noticeable delay in the editor at all

Jannis Limperg (Jun 20 2022 at 09:21):

James Gallicchio said:

also, is it possible yet for aesop to emit the tactic steps it finds, sorta like simp? behavior?

Not yet and this requires a fair bit of engineering. (Basically, successful rules will have to construct a Syntax representing the corresponding tactic.) But I want to add this at some point.

For benchmarking, you can use set_option trace.aesop.profile true. This also gives some details about which rules were slow etc.

Sebastian Ullrich (Jun 22 2022 at 11:23):

Quick performance comparison before/after changing most of the proofs in our theorem lab's final project to Aesop one-liners:

cumulative profiling times:
    compilation 61.1ms
    elaboration 789ms
    import 15.1ms
    initialization 17.9ms
    interpretation 12.4ms
    parsing 8.61ms
    simp 1.04s
    typeclass inference 78.3ms
2.15user 0.04system 0:02.20elapsed 100%CPU (0avgtext+0avgdata 226124maxresident)k
cumulative profiling times:
    compilation 65ms
    elaboration 581ms
    import 116ms
    initialization 17.7ms
    interpretation 1.1s
    parsing 10.2ms
    simp 1.81s
    typeclass inference 92.8ms
3.91user 0.06system 0:03.98elapsed 100%CPU (0avgtext+0avgdata 445548maxresident)k

Sebastian Ullrich (Jun 22 2022 at 11:28):

Mind you, this is using my simplistic simpAll rule. I'm sure @Jannis Limperg can come up with a much more performant implementation.

Sebastian Ullrich (Jun 22 2022 at 13:39):

I presented Aesop to the students of our Lean 4 lab course yesterday. We'll see if they find it helpful for the final project (which has already started), but here are my slides if anyone else is interested: https://github.com/IPDSnelting/tba-2022/blob/master/slides/lecture9.pdf

Sebastian Ullrich (Jun 23 2022 at 08:35):

Once more, with feeling --load-dynlib to eliminate the interpretation overhead in Aesop:

cumulative profiling times:
    compilation 64.4ms
    elaboration 897ms
    import 119ms
    initialization 17.5ms
    interpretation 184ms
    parsing 10.6ms
    simp 1.81s
    typeclass inference 93.6ms
Command exited with non-zero status 1
3.36user 0.08system 0:03.44elapsed 100%CPU (0avgtext+0avgdata 444632maxresident)k

That's not bad.

Kevin Buzzard (Jun 23 2022 at 09:34):

Command exited with non-zero status 1 is usually bad, right?

Henrik Böving (Jun 23 2022 at 09:38):

Yes, UNIX systems have a thing called an exit code, 0 means fine, everything \neq 0 means, something bad happened. If the program is sticking to established standards it will return one of the so called errno (https://www.man7.org/linux/man-pages/man3/errno.3.html) constants. You can translate a number back to a name from this manpage with errno <number>:

λ errno 1
EPERM 1 Operation not permitted

Sebastian Ullrich (Jun 23 2022 at 09:42):

Hah, forgot to edit that out... it's from a proof that should be finished with simpAll, but my current implementation of it fails if the goal did not change (to prevent looping), only the local context did

(unprovable)
...
a : Γ x¹ = some τ
fwd : Γ x¹ = some (Val.ty v)
 Γ  const v : τ

Since I'm convinced I have "morally" completed the proof, I'm simply waiting for Aesop to get better instead of finding a different proof :halo: .

Henrik Böving (Jun 23 2022 at 09:42):

"Proof by morality"

Jannis Limperg (Jun 23 2022 at 13:04):

@Sebastian Ullrich I've now added most of your extensions as builtin rules: simp_all during normalisation, automatic splitting of the target and of hypotheses, automatic substitution of equations whose LHS or RHS is a variable. (My test suite actually got faster, presumably because more goals now get solved by simp.)

Once more, with --load-dynlib to eliminate the interpretation overhead in Aesop:

Which incantation did you use to test this?

Regarding performance in general: while I've been careful not to introduce unnecessary overhead, I haven't looked into any bigger optimisations yet. Aesop tends to call rules many times on similar goals, so there are certainly caching opportunities.

Sebastian Ullrich (Jun 23 2022 at 13:58):

Awesome, I'll give it a try!

Which incantation did you use to test this?

It was something like

(cd lean_packages/aesop; lake build :sharedLib)

and

lake env lean Foo.lean --profile --load-dynlib=./lean_packages/aesop/build/lib/Aesop.so

Jannis Limperg (Jun 23 2022 at 14:00):

Thanks! I'll add this to the testsuite.

Jannis Limperg (Jun 23 2022 at 14:32):

Hmm, when I run these commands (with a lake clean thrown in for good measure), I get Error: index out of bounds for every file. But the unused variable linter still complains, so the files seem to get typechecked.

Sebastian Ullrich (Jun 23 2022 at 14:35):

Yes, I didn't investigate that yet

Mac (Jun 24 2022 at 22:19):

Sebastian Ullrich said:

It was something like

(cd lean_packages/aesop; lake build :sharedLib)

Fyi, lake build Aesop:sharedLib should work as well. :big_smile: (If it doesn't, that's a bug).

Sebastian Ullrich (Jun 28 2022 at 14:16):

I also tested accelerating Aesop using precompileModules := true in its lakefile.lean with Lake master, and performance seems to be equal to the manually loaded single shared lib above. @Mac This might be worth a Lake bump in lean4 if you're not already planning one?

Sebastian Ullrich (Jun 28 2022 at 14:17):

For what it's worth, we are already at 2905 bytes of --load-dynlib flags :big_smile:

Mac (Jun 28 2022 at 17:08):

Sebastian Ullrich said:

Mac This might be worth a Lake bump in lean4 if you're not already planning one?

My plan is to have a bump by the end of the week. :smile:


Last updated: Dec 20 2023 at 11:08 UTC