Zulip Chat Archive

Stream: lean4

Topic: Segfault due to mutual recursion?


Siddharth Bhat (Sep 29 2021 at 02:03):

I'm on the LEAN4 nightly, and I've encountered a segfault when implementing a custom Parser monad to write a hand written recursive-descent parser. Here's a full gist with the repro. I'm on LEAN4 commit f4759c9a223f

I've pasted the program here for ease-of-use:

inductive Maybe (a : Type) : Type where
| ok: a -> Maybe a
| err: Maybe a

structure P (a: Type) where
   runP: String -> Maybe (String × a)

def ppure {a: Type} (v: a): P a := { runP :=  λ s => Maybe.ok (s, v) }

def pbind {a b: Type} (pa: P a) (a2pb : a -> P b): P b :=
   { runP := λs => match pa.runP s with
            | Maybe.ok (s', a) => (a2pb a).runP  s'
            | Maybe.err => Maybe.err
   }

instance : Monad P := { pure := ppure, bind := pbind }

def pfail : P a := { runP := λ _  => Maybe.err }

instance : Inhabited (P a) where default := pfail

mutual
partial def pregion : P Unit :=  do
  pblock

partial def pop : P Unit := do
  pregion

partial def pblock : P Unit := do
   pop
end  -- end mutual

def main: IO Unit := do
  return ()

Am I violating some contract that causes this program to crash? I currently suspect that the mutual block I have somehow violates something. However, it's dead code (main doesn't invoke it!) so I'm not sure what the actual problem is.

Mac (Sep 29 2021 at 02:06):

I would suspect the problem is the combination of mutual and partial. You should be using one or the other, not both.

Siddharth Bhat (Sep 29 2021 at 02:07):

What if I do really need both? (mutually recursive functions whose termination is too complicated to prove, and are hence partial for practical purposes.)

Also, is this a bug?

Mac (Sep 29 2021 at 02:10):

@Siddharth Bhat partial definitions can already be written in a mutual recursive manner but just unpacking them. The partial definition:

partial def foo := bar

is essentially equivalent to:

unsafe def fooImpl : T := bar
@[implementedBy fooImpl] constant foo : T

Siddharth Bhat (Sep 29 2021 at 02:13):

@Mac I don't understand how to extend this to a definition of bar that depends on foo. I naively tried:

unsafe def fooImpl : Unit := bar
unsafe def barImpl : Unit := foo
@[implementedBy fooImpl] constant foo : Unit
@[implementedBy barImpl] constant bar : Unit

which doesn't work. Could you please show me how? :)

Mac (Sep 29 2021 at 02:15):

This should work:

constant foo : Unit
constant bar : Unit

unsafe def fooImpl : Unit := bar
unsafe def barImpl : Unit := foo

attribute [implementedBy fooImpl] foo
attribute [implementedBy barImpl] bar

Mac (Sep 29 2021 at 02:18):

Siddharth Bhat said:

Also, is this a bug?

Yes. Lean should not be segfaulting. My point was that you are doing something that is likely very unexpected (combining mutual and partial) and thus it was probably not well tested (resulting in said segfault).

Siddharth Bhat (Sep 29 2021 at 02:21):

Interesting, the suggested change fixes the segfault :) Thanks @Mac , I'll file a bug report and use the workaround for now.

Siddharth Bhat (Sep 29 2021 at 02:28):

@Mac I filed an issue: https://github.com/leanprover/lean4/issues/697

Mario Carneiro (Sep 29 2021 at 04:04):

Mac said:

Siddharth Bhat said:

Also, is this a bug?

Yes. Lean should not be segfaulting. My point was that you are doing something that is likely very unexpected (combining mutual and partial) and thus it was probably not well tested (resulting in said segfault).

I have used partial and mutual together many times without issue. All definitions in the block must be all-partial or all not, but otherwise it works just as you would expect.

Mario Carneiro (Sep 29 2021 at 04:07):

The fact that the recursion is completely unguarded here might be an issue though (mutual functions are usually functions with at least one argument), and my guess is that it is hitting an internal evaluation limit when constructing the closed terms pregion, pop, pblock

Mac (Sep 29 2021 at 04:38):

Mario Carneiro said:

I have used partial and mutual together many times without issue. All definitions in the block must be all-partial or all not, but otherwise it works just as you would expect.

In that case, would it make more sense for parital to be a modifier to mutual in such cases rather than part of each def?

Mario Carneiro (Sep 29 2021 at 04:40):

There are several features of the declarations of the mutual block that could arguably be moved to the header. That was the lean 3 design. My understanding is that the lean 4 design is intended to make mutual defs read just like regular defs, and enable more flexibility in the way the individual definitions are constructed

Mac (Sep 29 2021 at 04:43):

My reason for saying this is that mutual tends to imply a lot of complex kernel type checking to make sure the recursion is valid. With all partial definitions, all that is skipped. The only thing the block is doing at that point is allowing for forward reference. Right?

Mac (Sep 29 2021 at 04:44):

The heavy distinction there feels like it should be clearer.

Mario Carneiro (Sep 29 2021 at 04:44):

The elaboration of partial and non-partial mutual defs is the same

Mac (Sep 29 2021 at 04:44):

@Mario Carneiro What do you mean by that?

Mario Carneiro (Sep 29 2021 at 04:45):

The definitions are typechecked in the same context, with the same things in scope

Mario Carneiro (Sep 29 2021 at 04:45):

the only difference is in the backend compilation in the kernel

Mario Carneiro (Sep 29 2021 at 04:46):

they also produce the same compiled code

Mac (Sep 29 2021 at 04:46):

Mario Carneiro said:

the only difference is in the backend compilation in the kernel

which is why I defined the difference as "complex kernel type checking"

Mario Carneiro (Sep 29 2021 at 04:47):

I'm not sure I understand your suggestion

Mario Carneiro (Sep 29 2021 at 04:48):

Do you think that unsafe should be divorced from the line with the def as well?

Mario Carneiro (Sep 29 2021 at 04:48):

If it's a large mutual block, it might not be obvious that the def is in a block at all, in which case the displaced partial could be confusing

Mac (Sep 29 2021 at 04:50):

That is a weird argument, if its a large mutual block that one has forgotten is mutual, it might be just as confusing where the recursive functions come from (as one normally assume they are defined earlier in the file rather than later).

Mac (Sep 29 2021 at 04:52):

My view was simply that as the termination checking occurs w.r.t. the entire mutual block and partial essentially means 'don't do termination checking', it seems most logica; (to me) to apply that to the mutual which is doing said checking as opposed to the def which is not.

Mario Carneiro (Sep 29 2021 at 04:54):

The application of partial, or unsafe, is restricted to apply to a block at a time, which is why all defs in a block have to have the same partial/unsafe setting

Mac (Sep 29 2021 at 04:54):

For instance, the new termination_by statement for wf recursion will apply to the end of the mutual block instead of at each def -- so for consistency it makes sense to apply the partial (its opposite, essentially) to the block as well.

Mario Carneiro (Sep 29 2021 at 04:55):

The fact that it is written on a def is only for convenience / style

Mac (Sep 29 2021 at 04:55):

@Mario Carneiro I think it would be hard to argue that repeating partial multiple times is more 'convenient' than doing it once.

Mario Carneiro (Sep 29 2021 at 04:56):

Perhaps you might not feel the same way if you have used lean 3 mutual defs

Mac (Sep 29 2021 at 04:57):

how so?

Mac (Sep 29 2021 at 04:58):

It also violates a common adage of CS: don't repeat yourself (DRY).

Mac (Sep 29 2021 at 04:59):

Also, Lean is already quite fond of this in other places -- i.e., DRY is what things like universe and variable are for (which also apply to the whole block).

Mario Carneiro (Sep 29 2021 at 05:00):

Like I said, lean 3 mutual defs don't repeat anything: meta/noncomputable shows up only once, in the mutual header, as well as parameters that aren't participating in the recursion. It is followed by a list of with defs that give the body of the definitions. In actual use, with has an unfortunate status as not quite a def; it means that you have to learn a different syntax for defs and you can't use other commands in the block

Mario Carneiro (Sep 29 2021 at 05:01):

Keep in mind, when proposing to change this, that this is literally what lean 4 changed away from

Mac (Sep 29 2021 at 05:01):

Mario Carneiro said:

as well as parameters that aren't participating in the recursion.

I can definitely see why this would be annoying

Mac (Sep 29 2021 at 05:02):

@Mario Carneiro Also, fyi, I did use Lean 3 myself. :wink: I don't remember having a problem with mutual there.

Mario Carneiro (Sep 29 2021 at 05:02):

Lean 4 also does the parameter thing differently when you call a function recursively, even without mutual - you need to repeat parameters in the function call

Mario Carneiro (Sep 29 2021 at 05:02):

the lean 3 way is DRY but also confusing

Mac (Sep 29 2021 at 05:02):

@Mario Carneiro Yeah I am quite with that. That always tripped me up in Lean 3.

Mac (Sep 29 2021 at 05:03):

Mario Carneiro said:

the lean 3 way is DRY but also confusing

There is also the fact that it just wouldn't work with partial defs at all.

Mario Carneiro (Sep 29 2021 at 05:03):

how so?

Mac (Sep 29 2021 at 05:04):

A partial def has no delineation between recursive and non-recursive parameters so the idea wouldn't really apply there, right?

Mario Carneiro (Sep 29 2021 at 05:04):

meta def foo (a : ) :   
| 0 := foo a
| (n+1) := foo n

Mac (Sep 29 2021 at 05:05):

How does meta know that I don't want to recur on both arguments?

Mario Carneiro (Sep 29 2021 at 05:05):

because only one is right of the colon

Mario Carneiro (Sep 29 2021 at 05:06):

that's how the rule goes in lean 3

Mac (Sep 29 2021 at 05:06):

Yeah, it ties pattern matching with recursion, which I think is the bigger problem there.

Mario Carneiro (Sep 29 2021 at 05:07):

you don't have to do any actual pattern matching though

Mac (Sep 29 2021 at 05:08):

Yes but it ties the pattern-matching syntax with recursion, which is weird. And it would make writing most partial functions (which tend to be structure in normal CS style) very weird.

Mac (Sep 29 2021 at 05:09):

As the boilerplate of partial def foo : a -> b -> c | a, b => would be quite common.

Mac (Sep 29 2021 at 05:10):

Furthermore, def alternates in Lean 4 are equivalent to an internal match (which also makes me quite happy), this style of delineating recursion wouldn't even be logical anymore in Lean 4.

Mac (Sep 29 2021 at 05:12):

My point being that the parameter part of old mutual is much more than just DRY it also encapsulates/demands more information than one might wish to provide (and, as you said, breaks consistency).

Mac (Sep 29 2021 at 05:13):

On the other hand partial is a modifier, and some modifiers already only work in certain circumstances, so there is much less consistency to break (if any).

Mario Carneiro (Sep 29 2021 at 05:14):

Note that you can still have "parameters" in lean 4 style recursion, by using variable. I haven't decided whether this is a good thing or inconsistent yet

Mario Carneiro (Sep 29 2021 at 05:16):

partial def foo (a : Nat) : Nat  Nat
| 0 => foo a a
| n+1 => foo a n

variable (a : Nat)
partial def foo' : Nat  Nat
| 0 => foo' a
| n+1 => foo' n

Mac (Sep 29 2021 at 05:16):

@Mario Carneiro Personally, if I were writing a style guide, I would ban explicit binders in variable, as it makes the parameters one needs to specify of a def (and there order) hard to figure out.

Mac (Sep 29 2021 at 05:18):

Especially if the def is far removed from the variable.

Mario Carneiro (Sep 29 2021 at 05:19):

looking at lean 4 itself it seems like about 1/3 of variable lines violate your style guide

Mac (Sep 29 2021 at 05:19):

I am aware. :(

Mac (Sep 29 2021 at 05:19):

It has caused me much pain.

Scott Morrison (Sep 29 2021 at 05:20):

And in mathlib3 there are at least 1400 occurrences. :-)

Mac (Sep 29 2021 at 05:20):

My only exception to this rule would be the the binder meant to be used with dot notation (which I would demand be named self).

Mario Carneiro (Sep 29 2021 at 05:21):

Regarding determining the parameters of a def, I use #print for that

Mac (Sep 29 2021 at 05:24):

@Mario Carneiro while I like Lean's interactivity, having to go right a #print statement while deep in a function is not fun.

Mac (Sep 29 2021 at 05:24):

And is also impossible when I am viewing Lean code in a non-interactive manner.

Mario Carneiro (Sep 29 2021 at 05:25):

well, one day we will have #print at expression/tactic scope

Mac (Sep 29 2021 at 05:25):

Honestly, I would much prefer to see it at an editor level through a command.

Mario Carneiro (Sep 29 2021 at 05:25):

hover already fills that niche to a great extent

Mac (Sep 29 2021 at 05:26):

@Mario Carneiro not when there are elaboration errors in your code. :(

Mac (Sep 29 2021 at 05:26):

which is quite common when writing a function.

Mac (Sep 29 2021 at 05:27):

and even more common when said error is the thing you are trying to fix by verifying types/defs. XD

Mario Carneiro (Sep 29 2021 at 05:28):

usually if lean has enough understanding to give you a type error at a function application, it will be able to give you a hover on the function

Mac (Sep 29 2021 at 05:30):

yeah, but such types are generally less than useful as they are some intermediate output of the inference engine (i.e., with metavariables and dependent types) and thus unreadable to my unexpert eyes.

Mac (Sep 29 2021 at 05:30):

also, if the problem is syntactic hovers may not even exist.

Mario Carneiro (Sep 29 2021 at 05:32):

when I hover on a constant name, the output looks like the original type of the constant, before unification

Mario Carneiro (Sep 29 2021 at 05:33):

although I would really like hovering on the name of a definition to also work and act just like hovering on a use of that definition

Mario Carneiro (Sep 29 2021 at 05:33):

that would solve your parameters issue pretty well

Mac (Sep 29 2021 at 05:33):

Oh, you mean the actual function itself, in that case, yeah, but that is rarely my problem.

Mario Carneiro (Sep 29 2021 at 05:34):

as in, if you are looking at some def foo that picks up some unknown set of variables above, you can hover on foo to find out the actual type

Mac (Sep 29 2021 at 05:34):

Mario Carneiro said:

that would solve your parameters issue pretty well

I think you greatly underestimate how much Lean code I read on GitHub.

Mac (Sep 29 2021 at 05:34):

Also, I have been having major problems getting Lean to properly find definitions when browsing its source at all.

Mario Carneiro (Sep 29 2021 at 05:34):

I gave up on reading proof assistants from raw code long ago

Mac (Sep 29 2021 at 05:34):

Most of my go-to-defs find nothing.

Mario Carneiro (Sep 29 2021 at 05:35):

can you #mwe?

Mac (Sep 29 2021 at 05:35):

Note that this is only when browsing the cloned repo, not in the distribution acquired through elan.

Mac (Sep 29 2021 at 05:36):

@Mario Carneiro How does one #mwe an interactivity issue?

Mario Carneiro (Sep 29 2021 at 05:36):

there are interactive tests

Mac (Sep 29 2021 at 05:36):

I suspect this may be due to the fact I often update the code without rebuilding it. Since rebuilding still takes a good 15-30mins on my machine.

Mario Carneiro (Sep 29 2021 at 05:36):

but the usual way to do it here is to set up something reproducible and describe what you are doing and what you see

Mac (Sep 29 2021 at 05:40):

Okay, just to demonstrate how this works for me (and how I don't understand interactivity #mwe):

  • I open up my local copy of the lean4 repo in VSCode
  • I open up src/Lean/Parser.lean
  • Wait for elaboration to finish
  • Hover over Parenthesizer in ident.parenthesizer
  • Right-click 'Go to Definition'
  • Get "no definition found for 'Parenthesize'"

Mario Carneiro (Sep 29 2021 at 05:45):

That works for me (go-to-definition takes me to Parenthesizer); I have vscode, just updated to master, never built, vscode-lean4 v0.0.33

Mac (Sep 29 2021 at 05:46):

Does it work if you have an older build with the proper settings for elan? That is my case.

Mac (Sep 29 2021 at 05:47):

I have my elan set up in the folder as described here: https://leanprover.github.io/lean4/doc/dev/index.html#dev-setup-using-elan

Mario Carneiro (Sep 29 2021 at 05:48):

I just ran elan update leanprover/lean4:nightly, still works

Mac (Sep 29 2021 at 05:48):

I meant having overrides set for the repo root and src dirs

Mario Carneiro (Sep 29 2021 at 05:48):

I have elan set up to use nightly, not master

Mac (Sep 29 2021 at 05:49):

The dev setup for lean4 is suppose to look like the following (as described here):

# in the Lean rootdir
elan toolchain link lean4 build/release/stage1
elan toolchain link lean4-stage0 build/release/stage0
# make `lean` etc. point to stage1 in the rootdir and subdirs
elan override set lean4
cd src
# make `lean` etc. point to stage0 anywhere inside `src`
elan override set lean4-stage0

Mario Carneiro (Sep 29 2021 at 05:50):

I haven't actually built lean 4, the stage 0 stuff scares me off

Johan Commelin (Sep 29 2021 at 05:50):

Mac said:

Mario Carneiro Personally, if I were writing a style guide, I would ban explicit binders in variable, as it makes the parameters one needs to specify of a def (and there order) hard to figure out.

I think there are many places where it explicit binders in variables can be reasonably defended.

Example:

variables {X : Type} [base_class1 X] [base_class2 X] [base_class3 X] [mixin1 X] [mixin2 X] [mixin3 X] (x : X)

lemma API_lemma_1 : blabla x

lemma API_lemma_2 : blabla x

lemma API_lemma_3 : blabla x

lemma API_lemma_4 : blabla x

variables (X) -- make `X` explicit (but don't repeat all the typeclass assumptions, DRY)

def cool_construction [mixin4 X] : foobar X := somestuff involving  x : X, and API_lemma_4 from above.

variables {X} -- from now on `X` can be implicit again

Johan Commelin (Sep 29 2021 at 05:51):

This pattern is pretty common in mathlib, I think.

Mac (Sep 29 2021 at 05:52):

@Johan Commelin wouldn't the following just be better? (if it was supported)

variables {X : Type} [base_class1 X] [base_class2 X] [base_class3 X] [mixin1 X] [mixin2 X] [mixin3 X] (x : X)

lemma API_lemma_1 : blabla x

lemma API_lemma_2 : blabla x

lemma API_lemma_3 : blabla x

lemma API_lemma_4 : blabla x

-- modify the binder in the def for just this def
def cool_construction (X) [mixin4 X] : foobar X := somestuff involving  x : X, and API_lemma_4 from above.

Johan Commelin (Sep 29 2021 at 05:54):

If that were supported, that would be quite useful. But somehow it has to work together with introducing fresh variables that can typically shadow names that were introduced by variables before. I don't have a good sense for how that would work. (Because I don't know anything about language design at all.)

And nevertheless. Sometimes you want to make X explicit for 8 declarations in a row, so you put those in a section and add variables (X) to that section.

Johan Commelin (Sep 29 2021 at 05:55):

Note that there are also very reasonable situations where you want to make write variables (R A B M N) and make 5 variables explicit for the next 7 declarations.

Mac (Sep 29 2021 at 05:57):

Johan Commelin said:

hat can typically shadow names that were introduced by variables before.

I know that is how it currently but I think being able to shadow a binder from variable is just more confusing. I view the idea behind variable being that, within, this section x is X. If that is not the case, I don't really feel like the def should be in the same section.

Mac (Sep 29 2021 at 06:00):

Johan Commelin said:

write variables (R A B M N) and make 5 variables explicit for the next 7 declarations.

Where? Having multiple functions with the same 5 explicit variables (and 5 explicit arguments in the first place) is a bit concerning. Generally, if a function is taking that many explicit parameters, you should probably be passing a struct with said fields instead.

Though I guess one reasonable exception / use case I can imagine would be explicit binders with defaults that are meant to be used only by named argument syntax. In such cases, the order is unimportant and you can still use the function without specifying them (so shares more in common with an inferred binder than with other explicit binders) . However, unfortunately, since variable as its arguments at the start this is usually not possible.

Even if it were, though, I don't wouldn't the fact that you can't tell from the definition which arguments are to be specify to it and which it infers or otherwise inherits from the surrounding scope. But I would be less opinionated on such cases.

Scott Morrison (Sep 29 2021 at 06:05):

@Mac, for context, there are many examples in mathlib where we change many variables binders at once, e.g. https://github.com/leanprover-community/mathlib/blob/master/src/algebra/lie/tensor_product.lean#L77

Scott Morrison (Sep 29 2021 at 06:07):

(That one appears to be a record, but there are lots where we change 3 or 4 at a time.)

Mac (Sep 29 2021 at 06:07):

The expression lift R L M N P f (m ⊗ₜ n) just hurts me.

Mac (Sep 29 2021 at 06:09):

However, I suspect this has more to do with the nature of mathlib. I suspect that you rarely actually do apply those arguments explicitly and instead apply them through notation.

Scott Morrison (Sep 29 2021 at 06:10):

Three modules, a ring, a lie algebra, a linear map, and two vectors! Just maths. :-)

Johan Commelin (Sep 29 2021 at 06:10):

you should probably be passing a struct with said fields instead.

but without eta for records, this can cause defeq issues down the road, rigth? (just one objection)
Also, this might be nice while setting up the API for that particular theory, but when applying it, users would constantly have to bundled up random types into a record, instead of just passing them directly to the functions.
In 50% of the applications, unification can figure out the types, so you just write an _. But for those other 50%, we're really happy that the argument has explicit binding.

Johan Commelin (Sep 29 2021 at 06:11):

Mac said:

However, I suspect this has more to do with the nature of mathlib. I suspect that you rarely actually do apply those arguments explicitly and instead apply them through notation.

I'm not so sure. Things like lift R L M N P f or very similar things are certainly not an uncommon thing in mathlib

Mac (Sep 29 2021 at 06:12):

Johan Commelin said:

In 50% of the applications, unification can figure out the types, so you just write an _. But for those other 50%, we're really happy that the argument has explicit binding.

In that case, wouldn't those arguments work better as implicit arguments (that are some times specified by name)? Since Lean 3 lacked named arguments, maybe that is why such formulations were so popular?

Johan Commelin (Sep 29 2021 at 06:13):

I agree that this is now a new feature that is quite exciting. And we'll have to do experiments to find out what the right balance is in using it.

Johan Commelin (Sep 29 2021 at 06:13):

_ is really short, so it makes "a bit too much explicit variables" really cheap.

Johan Commelin (Sep 29 2021 at 06:14):

If something is implicit, and 50% of the time you have to write (base_ring := R), this will quickly make me wish that R was explicit.

Johan Commelin (Sep 29 2021 at 06:15):

But certainly, named variables provide a new potential!

Mac (Sep 29 2021 at 06:16):

@Johan Commelin Also, I think my rules are more geared towards CS than theorem proving. In theorem proving you are likely to have a lot of parameters, whereas in CS, you tend to have relatively few.

Johan Commelin (Sep 29 2021 at 06:17):

Sure, I guess we can different variations in style guide for different areas

Mac (Sep 29 2021 at 06:18):

For instance, there are many places variable is used for defs that have one or two parameters, and thus it makes it hard to distinguish them from defs with none (and the same would be true in end-user code).

Johan Commelin (Sep 29 2021 at 06:19):

Just for the record: here is how you say "let MM be a manifold with corners" in mathlib-speak:

variables {ι : Type }
{E : Type uE} [normed_group E] [normed_space  E] [finite_dimensional  E]
{H : Type uH} [topological_space H] {I : model_with_corners  E H}
{M : Type uM} [topological_space M] [charted_space H M] [smooth_manifold_with_corners I M]

Mac (Sep 29 2021 at 06:20):

This is why I feel such explicit binders are not really explicit binders in theorem proving. They are things that on a paper proof would be implicit, but are made explicit mostly due to the limitations / requirements of the formalization.

Mac (Sep 29 2021 at 06:21):

Thus it makes much more since for them to be tossed in a variable header (as that is how it would appear in the paper proof).

Mac (Sep 29 2021 at 06:25):

In CS, on the other hand, the explicit binders are what you are expecting to see in the compiled functions signature. Thus, I think it is reasonable to expect to see them in signature of the def being compiled. The implicit / synthetic binders, however, are abstract fluff that is (usually) compiled away and therefore not part of the signature. Thus, it makes sense to stick them in some variable header (as they have to do with the abstraction that section is dealing with).


Last updated: Dec 20 2023 at 11:08 UTC