Zulip Chat Archive

Stream: lean4

Topic: lake: generating FFI bindings


György Kurucz (Oct 14 2023 at 15:20):

I haven't looked too deep into lake's documentation/source code yet, just wondering whether there is already an example configuration for this somewhere that I could reference.

I would like to have a two-step build process, where:

  • Some lean code gets compiled.
  • It gets run, and it outputs a .lean and a .c file.
  • These are then compiled and linked to get the final binary.

My use-case is to generate FFI bindings for a somewhat large API.

Henrik Böving (Oct 14 2023 at 15:23):

https://github.com/tydeu/lean4-alloy this exists and I made this with it: https://github.com/hargoniX/socket.lean.

Generating API bindings could get a little harder but in theory :tm: alloy is capable of parsing C code right. So maybe it's possible to write some other clever stuff around it to use it for generating FFI with it. Idk if that is a use case @Mac Malone intended?

Mac Malone (Oct 14 2023 at 18:01):

Henrik Böving said:

Generating API bindings could get a little harder but in theory :tm: alloy is capable of parsing C code right.

I am not sure by generate if @György Kurucz's meant to parse the C code as that is not part of the lists steps. I think they may have meant generate in the sense of some form of Lean templating since they suggest outputting both a .lean and a .c file. On that point, Lean rarely needs templating because the metaprogramming facilities are strong enough that you can dynamically generate the relevant definitions in step 1 (when the Lean code is compiled). Alloy is designed to the other half of step 2 (output a .c file and compile and link it with the Lean output).

If you do want to parse C code, Alloy is capable of doing so (at least to some extent -- e.g., it can verifiable parse lean.h), but I have no knowledge of this kind of binding generation being tested with Alloy or anything else.

György Kurucz (Oct 14 2023 at 18:19):

Thanks for the pointers!

If you do want to parse C code

no parsing of C code is involved, I am in fact generating everything from an XML file...

Lean rarely needs templating

True, and I just recently learned that IO can be used during metaprogramming, so I could just read my XML file and generate the relevant definitions

Alloy is designed to the other half of step 2

Is there a way to feed Alloy my generated C functions?

Henrik Böving (Oct 14 2023 at 18:21):

Generally speaking you can always construct Syntax objects for Alloy commands and elabCommand them from your elaborator in order to get what you want.

György Kurucz (Oct 14 2023 at 18:33):

Hm does Alloy help with marshaling C types to Lean types beyond what Lean provides by default? A large part of my custom generated C shim is just for unpacking/packing Lean structures to C structs, so if I could simplify that using Alloy that would be very interesting for me.

György Kurucz (Oct 14 2023 at 18:35):

Generally speaking you can always construct Syntax objects for Alloy commands

I would rather avoid round-tripping my C String to Alloy's AST representation and then back again to a string, unless Alloy is helping me with something.

Mac Malone (Oct 14 2023 at 23:55):

György Kurucz said:

Hm does Alloy help with marshaling C types to Lean types beyond what Lean provides by default?

Yes, Alloy provides alloy c enum and alloy c extern_type as compact ways to declare Lean types backed by C values. Though it is hard to know how much they would help in your particular use case without more information about it.

György Kurucz said:

I would rather avoid round-tripping my C String to Alloy's AST representation and then back again to a string, unless Alloy is helping me with something.

This roundtrip does allow Alloy C to add metadata relevant to editor features (e.g., go-to-def, etc..) for the generated C code. However, that may or may not be of use in your case. You can also add code directly to Alloy's C shim like so:

import Alloy.C
open Alloy Lean Elab Command

def addCode  (code : String) : CommandElabM Unit := do
  modifyEnv fun env => C.shimExt.modifyState env fun shim =>
    {shim with text := FileMap.ofString <| shim.text.source ++ code}

However, I have not tested whether this breaks assumptions Alloy makes about the structure of the C code (but I don't think it should).

Mac Malone (Oct 15 2023 at 00:01):

Like Henrik, I would personally suggest constructing Syntax rather the a string for C code. It at least two major advantages that I can think of at the moment:

  1. It ensures the C snippets you write are well-formed syntax (no hard to debug errors due to typos).
  2. You can use Lean's metaprogramming features to trace, debug, and otherwise analyze the C code being produced.

György Kurucz (Oct 15 2023 at 07:26):

Yes, Alloy provides alloy c enum and alloy c extern_type

I want to recursively pack/unpack C structs and arrays, so unfortunately I don't think these help much. For example:

typedef void *some_opaque_pointer;
struct X {
  uint32_t a;
  float b; // yes, not double unfortunately
  uint32_t len_c;
  char **c;
  some_opaque_pointer p;
  struct Y *y;
};
struct Y {
  uint32_t x, y;
};

Should be translated to:

opaque SomeOpaquePointer : Type := Unit
structure Y where
  x : UInt32
  y : UInt32
structure X where
  a : UInt32
  b : Float
  c : Array String
  p : SomeOpaquePointer
  y : Option Y

You can also add code directly to Alloy's C shim like so

Thanks, this should be useful for some quick-and-dirty testing to see whether Alloy works at all for my use-case, but I understand that you recommend generating Syntax as a proper solution.

Anyway, in a week or two I might post my results here on Zulip once I have a minimum viable example working, maybe then people can chime in with any better ideas for how large C API bindings should be done.

Utensil Song (Oct 15 2023 at 08:17):

Alloy provides a C to C solution, i.e. write C in Lean then generate IR in C.

Reading the topic, it seems @György Kurucz needs to create a binding to something, but I can't really see whether it's FFI or reverse FFI yet, probably the former.

But I'm also thinking about a Lean to C++ solution for bindings, i.e. write Lean and then generate some glue code in C++ body with extern C interface. I just don't know if the native ability of Lean of generating IR fits in the picture, as Lean only generates C not C++.

I'm also OK with a C++ to C++ solution, i.e. write (a subset of) C++ in Lean and then emit the C++ body as is then surround it with generate C code, then send to C++ compiler.

Utensil Song (Oct 15 2023 at 08:20):

To be exact, I'm thinking about a Lean equivalent of Unsafe Rust.

Utensil Song (Oct 15 2023 at 08:59):

Ah, I haven't been reading @György Kurucz 's C struct to Lean structure example carefully enough, I think they are exactly the missing pieces in alloy but they belong to alloy.

György Kurucz (Oct 15 2023 at 10:55):

Hm @Mac Malone is this on the longer term roadmap for Alloy? (Though handling all the ambiguous cases is far from trivial, e.g. when to translate a pointer to Option, or how to handle the lengths of arrays.)

Mac Malone (Oct 15 2023 at 19:31):

@György Kurucz Looking at your example above, I am unclear on how you wish to "pack/unpack C structs ". Since Lean objects are reference counted and follow a strict ABI, a pointer for e.g. the C struct X cannot be directly casted into the Lean structure X and vice-versa. Are you expecting to deallocate the C struct and allocate the Lean struct and vice versa each time they pass through the FFI interface (which seems like a significant overhead)? Or do you have something else in mind?

György Kurucz (Oct 15 2023 at 20:04):

@Mac Malone

Are you expecting to deallocate the C struct and allocate the Lean struct and vice versa each time they pass through the FFI

Yes exactly, I don't see any other way. (For my use-case I don't expect this to be a bottleneck. And if it ever becomes one, you can always just push more of the logic to the C side until the FFI boundary is no longer the hotspot.)

Continuing the above example, this is roughly what I would want (and also the reverse):

#include <lean/lean.h>
#include <stdlib.h>

typedef void *some_opaque_pointer;
struct X {
  uint32_t a;
  float b;
  uint32_t len_c;
  const char **c;
  some_opaque_pointer p;
  struct Y *y;
};
struct Y {
  uint32_t x, y;
};

extern void f(struct X);

/* X -> IO Unit */
LEAN_EXPORT lean_obj_res glue_f(b_lean_obj_arg x, b_lean_obj_arg w) {
  lean_object *c = lean_ctor_get(x, 0);
  uint32_t len_c = lean_array_size(c);
  const char **unmarshaled_c = calloc(sizeof(char *), len_c); // TODO: who owns?
  for (uint32_t i = 0; i < len_c; ++i) {
    // TODO: copy or borrow?
    unmarshaled_c[i] = lean_string_cstr(lean_array_get_core(c, i));
  }

  // TODO: not the only way to store pointers
  some_opaque_pointer unmarshaled_p =
      (some_opaque_pointer)lean_unbox(lean_ctor_get(x, 1));

  lean_object *y = lean_ctor_get(x, 2);
  bool y_is_some = !lean_is_scalar(y);
  struct Y unmarshaled_y;
  if (y_is_some) {
    unmarshaled_y.x = lean_ctor_get_uint32(y, 0);
    unmarshaled_y.y = lean_ctor_get_uint32(y, 4);
  }

  struct X unmarshaled_x = {
      .a = lean_ctor_get_uint32(x, 0 + 3 * sizeof(lean_object *)),
      .b = (float)lean_ctor_get_float(x, 4 + 3 * sizeof(lean_object *)),
      .len_c = len_c,
      .c = unmarshaled_c,
      .p = unmarshaled_p,
      .y = y_is_some ? &unmarshaled_y : NULL,
  };

  f(unmarshaled_x);

  free(unmarshaled_c);

  return lean_io_result_mk_ok(lean_box(0));
}

(There surely is a bug in this somewhere, writing this by hand is quite tricky, hence why I am looking for an alternate solution.)

Mac Malone (Oct 15 2023 at 21:01):

@György Kurucz In that case, I can say that this kind of thing is on the roadmap for Alloy. In fact, this discussion has given me a number of ideas, so it is likely on the near-term roadmap. That is, there was something I was already working on for Alloy, but this would be my next endeavor (and I switch to this sooner given some difficulties I am having with the current thing I was working on). However, when I will have time to work on this is different question that I do not know the answer for (given that Alloy is not one of things I am currently being paid to work on). Nonetheless, I do not expect it to be too far off in the future.

Alloy also already has some tools for this. For example, there is the notion of a "translator" (alloy c translator) which registers of_lean and to_lean definitions for marshalling and unmarshalling a C type to and from Lean (which was inspired by @Henrik Böving's work on socket.lean). However, what Alloy is missing is the metaprogramming utilities to automatically generate such definitions for a structure.

Mac Malone (Oct 15 2023 at 21:03):

Also, with a smart generator it may also be possible follow the pattern of Float or Array in core to embedded the C pointer in the Lean structure without having to marshal and unmarshall every field eagerly.

Mac Malone (Oct 15 2023 at 22:53):

Here is a rather long-winded example of what I mean by an optimized representation (and which could be cleaned up and shorten with some polish, a Lean fix, and additional Alloy utilities):

import Alloy.C
open scoped Alloy.C

alloy c include <lean/lean.h>

alloy c section
typedef struct {
  uint32_t n, m;
} Y;

static lean_external_class * g_class_raw_Y = NULL;

static inline void Y_finalize(void* ptr) {
  free((Y*)ptr);
}

static inline void noop_foreach(void *mod, b_lean_obj_arg fn) {}

static inline lean_obj_res Y_to_Lean(Y* y) {
  if (g_class_raw_Y == NULL) {
    g_class_raw_Y = lean_register_external_class(Y_finalize, noop_foreach);
  }
  return lean_alloc_external(g_class_raw_Y, y);
}

static inline Y* Y_of_lean(lean_obj_arg y) {
  return (Y*)(lean_get_external_data(y));
}
end

structure PureY where
  n : UInt32
  m : UInt32

structure RawY where
  data : PureY
  -- make the structure non-trivial
  -- this is unsafe hack around lean4#2292 for testing purposes
  unit : Unit

alloy c translator RawY := {
  toLean := `Y_to_Lean
  ofLean := `Y_of_lean
}

alloy c section
LEAN_EXPORT lean_obj_res raw_Y_mk(lean_obj_arg y, lean_obj_arg u) {
  Y* rawY = malloc(sizeof(Y));
  rawY->n = lean_ctor_get_uint32(y, 0);
  rawY->m = lean_ctor_get_uint32(y, sizeof(uint32_t));
  return to_lean<RawY>(rawY);
}

LEAN_EXPORT uint32_t raw_Y_n(lean_obj_arg y) {
  return of_lean<RawY>(y)->n;
}
LEAN_EXPORT uint32_t raw_Y_m(lean_obj_arg y) {
  return of_lean<RawY>(y)->m;
}
end

attribute [extern "raw_Y_mk"] RawY.mk

@[extern "raw_Y_n"] def RawY.n (y : RawY) := y.data.n
@[extern "raw_Y_m"] def RawY.m (y : RawY) := y.data.m

def RawY.dataImpl (y : RawY) : PureY :=
  {n := y.n, m := y.m}

theorem RawY.dataImpl_eq_data (y : RawY) : y.data = y.dataImpl := rfl

attribute [implemented_by RawY.dataImpl] RawY.data

set_option trace.compiler.ir.result true in
def addRawY (y : RawY) :=
  y.n + y.m

set_option trace.compiler.ir.result true in
def addPureY (y : PureY) :=
  addRawY <| .mk y () -- due to lean4#2292, will optimize away `mk` w/o Unit

def test' :=
  addPureY {n := 4, m := 2}

Utensil Song (Oct 16 2023 at 02:35):

It would be even better if the LEAN_EXPORT glue code can have a safe default implementation for most of the parameter and return value combinations of primitive types, the opaque types, the structure types (that can be accessed and proven but passes like the opaque types in and out of bindings).

Mac Malone (Oct 16 2023 at 03:11):

@Utensil Song Yeah, the eventual goal would be to have Alloy generate all of this automatically.

Utensil Song (Oct 16 2023 at 03:17):

Would alloy support the generated body containing C++ code? Is it belong to alloy or do you see it a potential project that builds on alloy? Is it technically compatible? I haven't figured out whether I can use Lean to generate C++ code yet.

Utensil Song (Oct 16 2023 at 03:23):

I did some experiments on parsing C++ headers using libclang (with Python), but it seems that I should use Lean to do it instead (except maybe I need to first bootstrap a Lean libclang binding with Python then all following work can be done in Lean).

Mac Malone (Oct 16 2023 at 03:30):

@Utensil Song A C++ grammar DSL is in the roadmap for Alloy. But someone else may get to writing one (i.e., a Lean C++ grammar) first.

Mac Malone (Oct 16 2023 at 03:32):

My thought process is that is a good idea to stabilize all the features for C bindings before duplicating/generalizing them for C++.

Utensil Song (Oct 16 2023 at 03:52):

Mac Malone said:

My thought process is that is a good idea to stabilize all the features for C bindings before duplicating/generalizing them for C++.

Sure, this is a better and pragmatic approach.

Mac Malone said:

Utensil Song A C++ grammar DSL is in the roadmap for Alloy. But someone else may get to writing one (i.e., a Lean C++ grammar) first.

Nice, then I'll start experimenting the (quite small) subset I need (for bindings) with alloy.

A full set is a completely different story and I can't imagine the use case yet (except for parsing the header, but most stuff in the body can be safely ignored for this purpose, there is only so much grammar on the data type and API surface).

Utensil Song (Oct 16 2023 at 05:40):

It turns out easier than I thought, here is the quick-and-dirty POC that now builds with lake -R build Test -v, I added only new, delete, extern "C" and made a simple alloy cpp extern (sorry for the ad hoc intrusive modifications).

György Kurucz (Oct 16 2023 at 13:05):

A full set is a completely different story

Yeah just fair warning the C++ grammar is Turing-complete, you probably don't want to implement the whole thing, just stick to a well-defined subset. (Though you might not be able to, e.g. if you want to bind some library that has non-trivial template metaprograms.)

Utensil Song (Oct 16 2023 at 14:20):

I have more thoughts on this here. In my imagination, the end-user code might look like this (conceptually, not necessarily syntactically):

alloy cpp section

with "xyz"
    from <string> import xyz.string as Xyz.String
    open Xyz

def reverse (str : String) : String :=
    let s := String.mk str
    s.reverse

end

where "xyz" is the library name (and linkage), <xyz/string> is the header file, xyz is the namespace, string is the class name, String is the Lean type name, reverse is the method name, s.reverse is the call site where the end-user can be prompted by code completion to call non-existing Lean String.reverse then everything is automatically generated. When we finish the line, we have called into the C++ xyz::string::reverse method.

Utensil Song (Oct 16 2023 at 14:25):

György Kurucz said:

A full set is a completely different story

Yeah just fair warning the C++ grammar is Turing-complete, you probably don't want to implement the whole thing, just stick to a well-defined subset. (Though you might not be able to, e.g. if you want to bind some library that has non-trivial template metaprograms.)

Yes, I do have one such header only library in mind, which uses C++20 concept, constexpr etc. But I'm not touching it (from Lean) anytime soon.

Utensil Song (Oct 17 2023 at 10:59):

It seems to me that using alloy to create a binding still has much boilerplate, even compared to raw Lean FFI, compare my libclang binding POC using alloy to the GiNaC binding POC( lean part, C++ part, which follows EigenLean to use C++ template to abstract most of the boilerplate away, and this part is preferable to be done on the Lean side), significant amount of boilerplate have to be written for each type, a lot of the copy-paste of the types name are involved.

Utensil Song (Oct 17 2023 at 11:16):

For a binding, usually there would be dozens of types involved, naturally one wishes to treat the type opaque at first, with the least of hassles, then gradually add Lean-side fields to it, and these fields are not necessarily a 1:1 mapping of the corresponding C/C++ type, might be just a selected few of interested fields and some extra Prop fields. For now, alloy requires declaring the Lean type (e.g. Index in the libclang example) in C struct, which is the opposite of what I'm wishing for: I wish to omit writing the C struct completely, and declare everything in Lean.

Sebastian Ullrich (Oct 17 2023 at 12:32):

We want to improve the built-in FFI in the future. An issue collecting use cases, especially with comparisons to other tools and languages, would be very useful I think.

Utensil Song (Oct 17 2023 at 15:25):

Most of the improvements I mentioned can actually be implemented in alloy, the language capabilities are already there, one exception is lean4#2292 .

Sorry if I sound a bit too critical in this, that's only because I'm seeing so much potential inspired by the code generation design of Lean and alloy, and I'm interested in making some of these changes happen if they are aligned with the roadmap because I need them and implementing them is a better use of time than work around them.

My motivating libraries are:

  • GiNaC, C++, for symbolic calculation
  • Libclang, C, for generating C++ binding for GiNaC

They both have massive API surface, and initial surveys show potential to make binding them in Lean more effective.

György Kurucz (Oct 31 2023 at 09:48):

FYI if anyone in this thread is interested, I just posted about my use-case that prompted my questions: #general > Lean Vulkan


Last updated: Dec 20 2023 at 11:08 UTC