Zulip Chat Archive

Stream: lean4

Topic: Nested-inductive-safe types


Mario Carneiro (Apr 17 2024 at 16:00):

Inspired by @Markus Himmel's PR std4#748, I'd like to talk about how we should approach the issue of types being used in nested inductives, in the manner of:

inductive Foo
  | mk : MyDataStructure Foo -> Foo

Let's say that we are making MyDataStructure and want users to be able to write the above. In this case:

  • MyDataStructure must be an inductive type or definition wrapping one. (In particular it cannot be opaque)
  • it must not contain any proofs, subtypes, or well formedness assumptions in it, because these use the value negatively

It is possible to encode a number of well formedness properties using only inductive types. I once saw a Coq talk on it. But my takeaway is that this is an extraordinarily painful road to take, and lean has been steadily evolving in the direction of more abstraction hiding and less reliance on definitional equality (e.g. using proofs by well founded recursion).

Nested inductives, as they are currently designed, are fundamentally abstraction breaking. If you write an inductive such as the one above, the Foo.rec theorem will write out all the details of every inductive type involved in the definition of MyDataStructure. This does not have to be the case; the way inductives are handled in Isabelle, adapted to lean with the QPF package, uses something closer to typeclasses asserting that MyDataStructure is suitably functorial, and does not involve unwrapping any definitions.

It is undeniably a use case that people will want these kind of inductive types, but I am not sure following that thread will take us in a good direction. I'm curious how other people think we should approach this issue in the standard library and elsewhere.

Joachim Breitner (Apr 17 2024 at 16:18):

I had serious trouble recently because I could not nest under IO, even though if unrolled completely, it would be a positive occurrence, I believe, and had to work around it. So more flexibility here might have helped me.

Mario Carneiro (Apr 17 2024 at 16:23):

ah, so actually my "or definition wrapping one" is not true

inductive Foo -- ok
  | mk : List Foo  Foo

abbrev List' := List
inductive Foo' -- fail
  | mk : List' Foo'  Foo'

Mario Carneiro (Apr 17 2024 at 16:30):

Joachim Breitner said:

I had serious trouble recently because I could not nest under IO, even though if unrolled completely, it would be a positive occurrence, I believe, and had to work around it. So more flexibility here might have helped me.

This is another case where if we had gone this way it would end up with backward incentives, since if we make IO nested-inductive-safe (and promise this for the future), then we can't also make it opaque

Markus Himmel (Apr 18 2024 at 10:12):

@Mario Carneiro For something like HashMap, I would want to consider the following things to be the public API:

  • That there is a type HashMap that is not necessarily safe to use in nested inductives
  • That there is a type HashMap.Imp (perhaps deserving to be renamed) that is safe to use in nested inductives
  • All of the operations on HashMap and HashMap.Imp like insert, findEntry?, etc.
  • That there is a predicate HashMap.Imp.WF that is preserved by the operations that modify the hash map
  • All of the (mostly yet-to-be-developed) lemmas for HashMap.Imp and HashMap, with the HashMap.Imp lemmas taking an extra HashMap.Imp.WF
  • (Maybe) an equivalence between HashMap and { m : HashMap.Imp // m.WF } that is compatible with all of the operations

There are other things that I would not consider to be part of the public API, for example

  • The definitions of HashMap.Imp and HashMap
  • Things like HashMap.rec and HashMap.Imp.rec
  • Helper functions like reinsertAux

I agree that making sure that HashMap.Imp is usable in nested inductives now and in the future is somewhat annoying given what is currently possible in Lean. I also agree with you that today it is not possible to prevent users from relying on the things that I do not consider to be part of the public API on a technical level. I do however think that as long as we clearly communicate our promises and non-promises in the documentation, it's acceptable to be abstraction-breaking on a technical level, and if users choose to ignore that, they will have to adapt their developments. I also think that it is worth the effort to support usability of our datastructures in nested inductives starting now, and that this does not prevent us from taking advantage of better technical solutions like the ones you referenced, should they become available in the future.

Timo Carlin-Burns (Apr 18 2024 at 16:35):

Wouldn't it be preferable if HashMap was safe to use in nested inductives and HashMap.Imp could remain an implementation detail?

Mario Carneiro (Apr 18 2024 at 17:04):

Markus Himmel said:

There are other things that I would not consider to be part of the public API, for example

  • Things like HashMap.rec and HashMap.Imp.rec

Note that I didn't say MyDataStructure.rec in the above description, I said Foo.rec. When someone writes a nested inductive involving HashMap.Imp, the definition of HashMap.Imp is made visible as part of Foo.rec, and I regard this as a major privacy violation given that we can't reasonably ask people not to use Foo.rec (how else would they make use of the nested inductive they just defined?), but it still contains basically the same information as HashMap.Imp.rec (which we want to be private).

Mario Carneiro (Apr 18 2024 at 17:10):

I also think that it is worth the effort to support usability of our datastructures in nested inductives starting now, and that this does not prevent us from taking advantage of better technical solutions like the ones you referenced, should they become available in the future.

When this means things like "definitions are not allowed", I just can't bring myself to agree with this. Nested inductives are a poorly supported feature in lean already, and they have some structural problems. Bending over backwards to support them in our data structures is going to actively make them worse along other axes and also will take more work to revert once we have a better mechanism that doesn't leak these details and will just allow people to write inductive Foo | mk: HashMap A Foo -> Foo which is what they want anyway.

Markus Himmel (Apr 19 2024 at 07:26):

Mario Carneiro said:

When someone writes a nested inductive involving HashMap.Imp, the definition of HashMap.Imp is made visible as part of Foo.rec, and I regard this as a major privacy violation given that we can't reasonably ask people not to use Foo.rec (how else would they make use of the nested inductive they just defined?), but it still contains basically the same information as HashMap.Imp.rec (which we want to be private).

This is a good point, however I would expect that almost everyone uses the equation compiler/pattern matching instead of invoking Foo.rec directly, and with that you only get exposed to the internals of the data structure if you actively force it. So I think this is a non-issue in almost all practical applications.

Markus Himmel (Apr 19 2024 at 07:33):

Mario Carneiro said:

I also think that it is worth the effort to support usability of our datastructures in nested inductives starting now, and that this does not prevent us from taking advantage of better technical solutions like the ones you referenced, should they become available in the future.

When this means things like "definitions are not allowed", I just can't bring myself to agree with this. Nested inductives are a poorly supported feature in lean already, and they have some structural problems. Bending over backwards to support them in our data structures is going to actively make them worse along other axes and also will take more work to revert once we have a better mechanism that doesn't leak these details and will just allow people to write inductive Foo | mk: HashMap A Foo -> Foo which is what they want anyway.

I agree that the current state of what's possible in Lean is not great, and I think that it is a very good idea to explore ways to make this better. However, at least for the map-like data structures in Std, it's really not that bad: docs#Std.RBNode already works in nested inductive types without any changes, and the changes in std4#748 seem innocous enough to me.

Mario Carneiro (Apr 19 2024 at 09:38):

Markus Himmel said:

Mario Carneiro said:

When someone writes a nested inductive involving HashMap.Imp, the definition of HashMap.Imp is made visible as part of Foo.rec, and I regard this as a major privacy violation given that we can't reasonably ask people not to use Foo.rec (how else would they make use of the nested inductive they just defined?), but it still contains basically the same information as HashMap.Imp.rec (which we want to be private).

This is a good point, however I would expect that almost everyone uses the equation compiler/pattern matching instead of invoking Foo.rec directly, and with that you only get exposed to the internals of the data structure if you actively force it. So I think this is a non-issue in almost all practical applications.

I agree to an extent, but it is frequently the case that this is insufficient; we are still missing a lot of tools needed to be able to make definitions over such nested inductives such that well founded recursion works correctly. As long as it's all partial functions there isn't too much issue though.


Last updated: May 02 2025 at 03:31 UTC