Zulip Chat Archive

Stream: lean4

Topic: code generator does not support recursor


Jad Ghalayini (Mar 17 2022 at 02:43):

So; I want to generate functions using tactics, and the easiest way to do that is to use induction. Unfortunately; induction lowers to usage of the recursor, and therefore I get "code generator does not support recursor" errors; I do want my functions to be computable, so this is irritating. Is there anything in the way of the code generator supporting recursors for at least simple inductive types? I mean, it should be quite trivial to lower it to the appropriate recursive function, no?

Arthur Paulino (Mar 17 2022 at 02:46):

Posting some code might be helpful on this one

Chris B (Mar 17 2022 at 04:58):

I think the error message for this is "does not support recursor yet", I'm sure there's a reason why they haven't gotten around to this. You can use match in tactics which should be able to get you pretty close.

Jad Ghalayini (Mar 17 2022 at 13:23):

match in tactics doesn't let me automate my cases properly, so that doesn't work. As for source code, something as simple as

inductive MyTree: Type
| leaf: MyTree
| branch: MyTree -> MyTree -> MyTree

def count_leaves (m: MyTree): Nat := by {
  induction m with
  | leaf => exact 1
  | branch l r Il Ir => exact Il + Ir
}

gives the error

code generator does not support recursor 'MyTree.rec' yet, consider using 'match ... with' and/or structural recursion

Manual use of MyTree.rec yields the same error; but the recursor should be extremely easy to generate code for, no?

Jad Ghalayini (Mar 17 2022 at 13:27):

Note that this works if I stick to noncomputable def, which is fine for now but irritating long-term, especially since it's infectious. Ofc I could manually implement this with partial recursion, but the whole reason why I want to use induction is to avoid writing out a massive 30-branch induction

Arthur Paulino (Mar 17 2022 at 13:41):

Kind of #xy but would something like this work?

inductive MyTree: Type
| leaf:   MyTree
| branch: MyTree  MyTree  MyTree

open MyTree

def MyTree.countLeaves : MyTree  Nat
  | leaf       => 1
  | branch l r => l.countLeaves + r.countLeaves

def treeEx : MyTree :=
  branch leaf leaf

#eval treeEx.countLeaves -- 2

Arthur Paulino (Mar 17 2022 at 13:53):

I'm a new Lean user, but, to me, the strength of tactic mode is to provide an interface that helps you generate a term that type checks. For computations in which the generated term matters, I would just stick to functions

Arthur Paulino (Mar 17 2022 at 14:02):

Do you have a motivation to use tactics for your computation?

Kevin Buzzard (Mar 17 2022 at 14:17):

In Lean 3 the big problem with using tactics to generate data is that it can become very difficult to prove anything about the data, because in general tactics are just designed to "get that proof term, no matter how" so it doesn't matter if they produce horrible code because it's all erased after. With data you don't know what you're getting with the induction tactic, whereas the equation compiler is specifically designed for this purpose.

Mario Carneiro (Mar 17 2022 at 15:23):

In lean 4 it is somewhat the other way around: using the induction tactic is the simplest / most reliable way to get a foo.rec term with no extra fluff (you can't easily write foo.rec directly because the motive is not inferred correctly because @[elab_as_eliminator] doesn't exist)

Mario Carneiro (Mar 17 2022 at 15:24):

I recently had to deal with this when writing an expression that needed to compute in the kernel. I did this:

def hasEdge' : List Nat  List Nat  Bool  Bool  Bool
| a::l, b::r, found, n => if a = b then hasEdge' l r found n else
  bif found || (b - a == 2 || a - b == 2) then
    n || hasEdge' l r true true
  else
    hasEdge' l r false true
| a, b, c, d => false

@[implementedBy hasEdge'] def hasEdge : List Nat  List Nat  Bool  Bool  Bool := by
  intro l; induction l with
  | nil => exact fun _ _ _ => false
  | cons a l ih => intro r; induction r with
    | nil => exact fun _ _ => false
    | cons b r => exact
      bif a == b then ih r else fun found =>
        bif found || (b - a == 2 || a - b == 2) then
          fun n => n || ih r true true
        else
          fun _ => ih r false true

Mario Carneiro (Mar 17 2022 at 15:26):

the first term is optimized for the compiler and the second one is optimized for the kernel

Mario Carneiro (Mar 17 2022 at 15:31):

The simple solution here is indeed to just implement foo.rec (and foo.casesOn) in the compiler. It should not be too hard, and I'm even thinking of trying my hand at it although the compiler is definitely in @Leonardo de Moura 's jurisdiction

Leonardo de Moura (Mar 17 2022 at 16:24):

@Mario Carneiro The compiler already has support for casesOn. The support for rec will only happen after we port the parts of the compiler still written in C++ to Lean. Unfortunately, this port will not happen any time soon. It is a tough decision, we need to port this C++ code to Lean and fix many problems we have there, but we also have many higher priority tasks.
BTW, I view the support for rec as a super low priority and it will benefit very few users.

Mario Carneiro (Mar 17 2022 at 16:28):

It is something I have run into a half dozen times so far, I'm working up the courage to do it myself since I'm aware it's a low priority

Leonardo de Moura (Mar 17 2022 at 16:29):

@Mario Carneiro Are they all the same use-case? Could you explain it?

Mario Carneiro (Mar 17 2022 at 16:33):

The use case that appeared above was that I wanted a term that computes as efficiently as possible in the kernel because it is part of a large proof by reflection; I don't know how to get casesOn applications using cases but induction produces rec applications

Mario Carneiro (Mar 17 2022 at 16:35):

and then I wanted such a term to be sanity-checkable using the compiler

Mario Carneiro (Mar 17 2022 at 16:35):

as you can see above my workaround is to rewrite the whole term in compiler-friendly format

Leonardo de Moura (Mar 17 2022 at 16:37):

Mario Carneiro said:

The use case that appeared above was that I wanted a term that computes as efficiently as possible in the kernel because it is part of a large proof by reflection; I don't know how to get casesOn applications using cases but induction produces rec applications

We are planning to improve the reduction engine in the kernel, and have better support for proofs by reflection.
You will not need to use the trick above after we implement these improvements. I think you should avoid these tricks for now and try to survive with the current performance overhead. Is it bigger than 2x?

Mario Carneiro (Mar 17 2022 at 16:42):

I suspect it would be possible to write the foo.rec generator entirely in lean: you just have to call compileDecl on foo.rec with a term that looks like

inductive Foo : Type
| one : Nat  Foo
| two : Foo  Foo

def Foo.rec'.{u} {motive : Foo  Sort u}
  (h1 : (a : Nat)  motive (Foo.one a))
  (h2 : (a : Foo)  motive a  motive (Foo.two a)) : (t : Foo)  motive t
| one a => h1 a
| two a => h2 a (rec' h1 h2 a)

Mario Carneiro (Mar 17 2022 at 16:49):

@Leonardo de Moura Not 2x, but measurable. I have a proof of (the computational part of) the Keller conjecture counterexample in dimension 8, and without using bare terms the proof by reflection (essentially an enumeration of 2*16^2 possibilities) takes 10.65 s, with the optimizations it is 7.9 s. I can send you the file as a test case

Leonardo de Moura (Mar 17 2022 at 16:49):

If I recall correctly, we used this hack in Lean 3, but it didn't work well: poor performance, unexpected behavior because Lean is a strict language.
In Lean 4, we also have extra complexity: mutual and nested inductives.

Mario Carneiro (Mar 17 2022 at 16:50):

One thing that I would like to see improved with this kind of stuff is better debugging support for kernel reduction. It is hard to tell whether a reduction is being cached or redone, for example

Leonardo de Moura (Mar 17 2022 at 16:50):

Mario Carneiro said:

Leonardo de Moura Not 2x, but measurable. I have a proof of (the computational part of) the Keller conjecture counterexample in dimension 8, and without using bare terms the proof by reflection (essentially an enumeration of 2*16^2 possibilities) takes 10.65 s, with the optimizations it is 7.9 s. I can send you the file as a test case

It would be great if you can survive with this overhead. The new reduction engine will make it way faster, and make the trick above obsolete.
Yes, it would be great to have the file.

Leonardo de Moura (Mar 17 2022 at 16:52):

Mario Carneiro said:

One thing that I would like to see improved with this kind of stuff is better debugging support for kernel reduction. It is hard to tell whether a reduction is being cached or redone, for example

Yes, it would be great to have better profiling tools in the kernel as we embrace proofs-by-reflection more and more.

Mario Carneiro (Mar 17 2022 at 16:52):

Keller.lean

Leonardo de Moura (Mar 17 2022 at 16:52):

Thanks.

Mario Carneiro (Mar 17 2022 at 16:52):

it has some minor dependencies on mathlib for the List.Pairwise decision procedure and run_cmd

Leonardo de Moura (Mar 17 2022 at 16:55):

Mario Carneiro said:

it has some minor dependencies on mathlib

Could you please remove them? :grinning_face_with_smiling_eyes:
It would be great to have them on our benchmark suite. @Sebastian Ullrich has a system for tracking progress over time. We have found many performance regression issues with this system.

Mario Carneiro (Mar 17 2022 at 16:55):

run_cmd should totally be part of lean 4 anyway :upside_down:

Sebastian Ullrich (Mar 17 2022 at 16:57):

Something I brought up recently is to add timing information to the traces. Together with lazy traces (because tracing the whole reduction would be counterproductive for benchmarking), that should give you a pretty quick idea on what parts are cached/exprensive, even without specific profiling support for reductions.

Sebastian Ullrich (Mar 17 2022 at 16:58):

Lazy traces are not really necessary for that as long as the trace classes are sufficiently granular

Leonardo de Moura (Mar 17 2022 at 16:59):

@Sebastian Ullrich We don't have trace messages in the kernel :(

Leonardo de Moura (Mar 17 2022 at 16:59):

Mario Carneiro said:

run_cmd should totally be part of lean 4 anyway :upside_down:

Sure, do you want to submit a PR?

Sebastian Ullrich (Mar 17 2022 at 16:59):

Oops, I was purely thinking of reduction in the elaborator :)

Joachim Breitner (Mar 17 2022 at 17:01):

@Sebastian Ullrich has a system for tracking progress over time. We have found many performance regression issues with this system.

Now I am curious; what system are you using?

Sebastian Ullrich (Mar 17 2022 at 17:02):

Yet another PSE project of course :) http://speedcenter.informatik.kit.edu/velcom/home

Mario Carneiro (Mar 17 2022 at 17:05):

mathlib-free via copious inlining: Keller.lean

Leonardo de Moura (Mar 17 2022 at 17:05):

Thanks.

Joachim Breitner (Mar 17 2022 at 17:05):

Uh, pretty. rm -rf gipeda :-)

Joachim Breitner (Mar 17 2022 at 17:08):

Oh, Java. borg extract gipeda/.

Mario Carneiro (Mar 17 2022 at 17:10):

@Leonardo de Moura said:

Mario Carneiro said:

run_cmd should totally be part of lean 4 anyway :upside_down:

Sure, do you want to submit a PR?

In the process of inlining I found that it depends on two other, also useful features:

unsafe def evalExpr (α) (expectedType : Expr) (value : Expr) (safety := DefinitionSafety.safe) : MetaM α
unsafe def evalTerm (α) (type : Expr) (value : Syntax) (safety := DefinitionSafety.safe) : TermElabM α

Are you interested in receiving these as well, or should I extract just run_cmd?

Mario Carneiro (Mar 17 2022 at 17:16):

By the way, this is not the original proof I had for Keller dim 8. The first version used a naive clique check which is 256^2 tests; this is easy enough with #eval / nativeDecide but took longer than I had patience to wait for in by decide. The version here does a lot more reasoning to get the computational part to a feasible level, but perhaps if the reduction engine gets good enough the simpler version will work as a benchmark

Leonardo de Moura (Mar 17 2022 at 17:18):

Are you interested in receiving these as well, or should I extract just run_cmd?

Sure.

Leonardo de Moura (Mar 17 2022 at 17:20):

The version here does a lot more reasoning to get the computational part to a feasible level, but perhaps if the reduction engine gets good enough the simpler version will work as a benchmark

Whoa, we can try to make it fast, not sure we will be able to handle this one in the near future.

Mario Carneiro (Mar 17 2022 at 17:21):

Heh, there's always a bigger mountain. You don't want to see how much worse Keller dim 7 is

Jad Ghalayini (Mar 18 2022 at 22:05):

I'm confused; why is it difficult for the code-generator to support recursors? Isn't it possible to simply (hackily) lower them to partial recursive functions for now?

Jad Ghalayini (Mar 21 2022 at 02:09):

Leonardo de Moura said:

If I recall correctly, we used this hack in Lean 3, but it didn't work well: poor performance, unexpected behavior because Lean is a strict language.
In Lean 4, we also have extra complexity: mutual and nested inductives.

@Leonardo de Moura would it be possible to temporarily have this hack and later swap it out for something better? I believe that mutual/nested inductives could be taken care of very easily by just generating partial functions as the implementation. Would this be a good issue for a beginner to tackle?

Mario Carneiro (Mar 21 2022 at 05:10):

@Leonardo de Moura said:

If I recall correctly, we used this hack in Lean 3, but it didn't work well: poor performance, unexpected behavior because Lean is a strict language.
In Lean 4, we also have extra complexity: mutual and nested inductives.

I missed this comment earlier. In what sense is this a hack? This is literally the meaning of foo.rec as a function. I'm not saying that we should only use these recursor functions, but when you use it directly then this is the behavior I would expect.

What alternatives are there? The fact that nat.rec will eagerly evaluate motive 0 is not great, I agree, but it's hard to see what else to do. I suppose you could make that match definition macroInline which could recover some of the laziness...?

Leonardo de Moura (Mar 21 2022 at 11:36):

@Mario Carneiro

I missed this comment earlier. In what sense is this a hack? This is literally the meaning of foo.rec as a function. I'm not saying that we should only use these recursor functions, but when you use it directly then this is the behavior I would expect.

This is not the behavior users expect. It is also not consistent with the approach we use for casesOn where we use lazy semantics.
BTW, in previous versions of Lean, we did generate a rec function, and users were confused when they realized it was not lazy.

What alternatives are there? The fact that nat.rec will eagerly evaluate motive 0 is not great, I agree, but it's hard to see what else to do. I suppose you could make that match definition macroInline which could recover some of the laziness...?

One possibility is to have special support for rec in the compiler, similar to what we have for casesOn.

I suppose you could make that match definition macroInline which could recover some of the laziness...?

Not sure how far you can go using just macroInline, but some code transformation that replaces rec with recLazy should also. For example, Nat.recLazy would be

def Nat.recLazy (motive : Nat  Sort u) (hz : Unit  motive 0) (hs : (n : Nat)  (Unit  motive n)  motive (Nat.succ n)) (n : Nat) : motive n :=
  match n with
  | 0   => hz ()
  | n+1 => hs n (fun _ => recLazy motive hz hs n)

Now, whenever we find code such as

  @Nat.rec (fun _ => Nat)
    (dbg_trace "hz"; 10)
    (fun n ih => dbg_trace s!"hs: {n}"; if n > 10 then 2* ih else 1)
    20

We replace it with

  Nat.recLazy (fun _ => Nat)
    (fun _    => dbg_trace "hz"; 10)
    (fun n ih => dbg_trace s!"hs: {n}"; if n > 10 then 2* (ih ()) else 1)
    20

Mario Carneiro (Mar 21 2022 at 11:41):

Do you think that recLazy should actually exist and be used by users etc, or is this just a compiler internal thing that isn't in the environment?

Leonardo de Moura (Mar 21 2022 at 11:45):

Jad Ghalayini said:

Leonardo de Moura said:

If I recall correctly, we used this hack in Lean 3, but it didn't work well: poor performance, unexpected behavior because Lean is a strict language.
In Lean 4, we also have extra complexity: mutual and nested inductives.

Leonardo de Moura would it be possible to temporarily have this hack and later swap it out for something better? I believe that mutual/nested inductives could be taken care of very easily by just generating partial functions as the implementation. Would this be a good issue for a beginner to tackle?

This kind of temporary hack is a headache for us because the proper solution is not a priority right now.
Suppose we add the hack, then I am very confident another user will hit the unexpected behavior I described above and will file a bug report. Now, we have a bug report for a feature that we did not even consider important, and is usually misused by users.
We do use hacks sometimes, but there are hacks for features we consider important.

Would this be a good issue for a beginner to tackle?

I would suggest you manually define your own recLazy function for the types you care about. Note that the tactic induction has the using modifier that allows us to specify a different elimination principle. That is, you can use induction x using Nat.recLazy

Mario Carneiro (Mar 21 2022 at 11:48):

Perhaps a target that might be more accessible for newcomers interested in usefully contributing to this issue is to write a program to generate recLazy for an arbitrary inductive type

Mario Carneiro (Mar 21 2022 at 11:49):

and defer the question of how to integrate this with the compiler

Leonardo de Moura (Mar 21 2022 at 11:55):

Mario Carneiro said:

Do you think that recLazy should actually exist and be used by users etc, or is this just a compiler internal thing that isn't in the environment?

I don't have a preference here. To be honest, I would prefer to have really good documentation explaining why we have rec, and why it should not be used directly in programs. Make it clear to the user that rec is just one of the tools we have to show termination. We could also explain Nat.recLazy, and why even this approach would produce suboptimal code.

Leonardo de Moura (Mar 21 2022 at 11:55):

Mario Carneiro said:

Perhaps a target that might be more accessible for newcomers interested in usefully contributing to this issue is to write a program to generate recLazy for an arbitrary inductive type

Yes, this can be a fun project for newcomers.

Mario Carneiro (Mar 21 2022 at 11:57):

I think there is a tension with the fact that rec is a primitive of dependent type theory (and lean 4 doesn't change this). It feels like it should be the most efficient thing from one perspective, but for the compiler something like match is more primitive. Writing code that works well in both the compiler and the kernel is therefore tricky

Mario Carneiro (Mar 21 2022 at 11:59):

As a library writer, I want to write code that is good in all the ways, and that means reaching for implementedBy which is the best tool we have at the moment for this kind of thing

Leonardo de Moura (Mar 21 2022 at 12:12):

@Mario Carneiro I agree the "kernel vs compiled" issue is annoying, and it will be fixed in the future. We will have better reduction engines in the kernel in the future, and you will not need to be concerned about this anymore. Recall the messages we exchanged in the other thread about the workaround you were using for this issue.

Mario Carneiro (Mar 21 2022 at 12:14):

Right, sorry to rehash the issue. External verifiers will probably still use the original reduction algorithm though, so I doubt it will ever become completely irrelevant

Mario Carneiro (Mar 21 2022 at 12:15):

Anyway I think we're on the same page for this issue. There are plans for this but they are all rather long term

Leonardo de Moura (Mar 21 2022 at 12:18):

Mario Carneiro said:

Right, sorry to rehash the issue. External verifiers will probably still use the original reduction algorithm though, so I doubt it will ever become completely irrelevant

Yes, I care about external verifiers too. We want to make sure our developments can still be checked by relatively simple external verifiers.
I believe one of the optimizations we are going to use is easy to implement in external verifiers.

James Gallicchio (Mar 21 2022 at 17:13):

Leonardo de Moura said:

I don't have a preference here. To be honest, I would prefer to have really good documentation explaining why we have rec, and why it should not be used directly in programs. Make it clear to the user that rec is just one of the tools we have to show termination. We could also explain Nat.recLazy, and why even this approach would produce suboptimal code.

Just seconding this -- I'd encountered the compiler error before, but had no idea why rec didn't compile until reading this thread (and now it makes sense). I'd be happy to help write something up if nobody else is taking it on.

The compiler error could even provide a link to the discussion, kinda like rustc does with common errors.

Leonardo de Moura (Mar 21 2022 at 22:11):

The compiler error could even provide a link to the discussion, kinda like rustc does with common errors

This is a great suggestion.
@Gabriel Ebner You once told me that the Zulip links are not very stable and may change if, for example, someone marks the topic as resolved. Should we use links to https://leanprover-community.github.io/archive/ in error messages instead?

Kevin Buzzard (Mar 22 2022 at 08:20):

There have been more recent changes to the way Zulip can be accessed by people who are not part of the community. They're still in beta though so perhaps they might change again.

Gabriel Ebner (Mar 22 2022 at 08:22):

The archive was mostly intended as a stop-gap solution until it was implemented in Zulip itself. As Kevin said, Zulip can now be accessed without an account. Once it can be indexed by search engines as well, we might even retire the archive.

Gabriel Ebner (Mar 22 2022 at 08:25):

You once told me that the Zulip links are not very stable and may change

The Zulip team is slowly working on that. Apparently links to messages are stable now, see the discussion at https://github.com/zulip/zulip/issues/21505 (not sure if the change referenced in that issue still needs to be deployed).


Last updated: Dec 20 2023 at 11:08 UTC