Zulip Chat Archive

Stream: lean4

Topic: Elaboration performance large inductive types


Christian Pehle (Sep 05 2021 at 19:32):

I'm autogenerating bindings for libclang and ran into a performance cliff for elaboration of inductive types: Elaboration of a inductive datatype with ~150 constructors takes > 30 seconds. The CXCursorKind enum of libclang has 271 entries alone. Is that a known problem?

Sebastian Ullrich (Sep 05 2021 at 20:45):

Try with set_option genInjectivity false

Leonardo de Moura (Sep 06 2021 at 00:10):

I added the command enum for declaring big enumeration types. It is just a macro based on Fin.

enum Foo where
  | a | b | c

def f : Foo  Nat
  | Foo.a => 10
  | Foo.b => 20
  | Foo.c => 35

#check Foo.a == Foo.b -- false

#eval Foo.b -- Foo.b

Foo in the example above is defined as

structure Foo when
  val : Fin 3

@Christian Pehle Your example with 271 entries can be elaborated in 0.3s on my machine. Most of the time is spent on the Repr instance that is automatically generated. I am currently generating it eagerly because deriving instance Repr for Foo would generate an instance that produces values such as { val := 0 } instead of Foo.a.

Mac (Sep 06 2021 at 00:49):

@Leonardo de Moura since enum is a wrapper around Fin, doesn't that mean the resulting type is no longer unboxed?

Mac (Sep 06 2021 at 00:53):

@Christian Pehle this may be important to your use case, as you may no longer be able to directly pass the enum type directly to the Clang bindings (if you were doing so).

Leonardo de Moura (Sep 06 2021 at 00:58):

@Mac Yes, the compiler currently compiles Fin n as Nat even when n is known at compilation time and small. However, this is not an obstacle for invoking the clang API. @Christian Pehle can use lean_unbox to convert the enum into a C scalar value.

Mac (Sep 06 2021 at 01:04):

@Leonardo de Moura true! Just wanted to point out that potential pitfall for @Christian Pehle.

Leonardo de Moura (Sep 06 2021 at 01:08):

@Mac If this becomes too annoying in practice, I can modify the enum macro, and use

structure Foo when
  val : UInt32
  h : val < 3

Mac (Sep 06 2021 at 01:34):

@Leonardo de Moura I wonder if such an approach would also allow for the numeric mapping to be specified in the the enum. A lot of enums in C/C++ libraries have specific encodings for their enum fields (e.g., CallingConv in LLVM), which would be nice to reflect in Lean. That was a key feature of the enum macro I had previously written for Papyrus (though that one was not type safe).

Leonardo de Moura (Sep 06 2021 at 02:32):

@Mac It seems your macro doesn't cover exhaustiveness. That is, it will fail at

enum Foo where
  | a | b | c

def f : Foo  Nat
  | Foo.a => 10
  | Foo.b => 20
  | Foo.c => 35

It is feasible to extend your macro to support exhaustiveness, but this is the kind of work I am not looking forward to. For example, a big disjuction would be too inefficient. I think your macro is useful for interfacing with C APIs, we could call it c-enum or something like that.

Mac (Sep 06 2021 at 02:47):

@Leonardo de Moura yeah, that was sort of what I was alluding to when I said the macro was "not type safe". And, admittedly, making it support exhaustivity was something I was not looking forward to either :laughing:.

In fact, it is why I have only used it for CallingConvention and not many of the other LLVM enums with explicit values (e.g., EngineKind), because unlike them, CallingConvention is not actually an exhaustive enum in LLVM (extensions can define their own additional calling conventions within its range). Thus, in the particular case if CallingConvetion, the lack of exhaustivity was a feature not a bug.

Christian Pehle (Sep 06 2021 at 09:31):

This is great @Leonardo de Moura! For automatic code generation the use of Nat is not much of a problem, I think. It might make sense to let the user optionally decide the encoding type and even values of the enum alternatives, just like in C++. Together with the unbox trivial unparameterised structure feature this would eliminate some of the boilerplate on the C++ side. (Edit: I just saw that this is what @Mac has done).


Last updated: Dec 20 2023 at 11:08 UTC