Zulip Chat Archive

Stream: lean4

Topic: C FFI usage


Anders Christiansen Sørby (Oct 06 2021 at 11:10):

I'm trying to link the BLAKE3 library in C to Lean FFI with extern. My attempt so far is:

typedef struct {
  // private fields
} blake3_hasher;

which I tried in Lean to represent as an opaque type :

@[extern "blake3_hasher"]
constant Blake3Hasher : Type

Which seems to work.
And the initialization function:

void blake3_hasher_init(
  blake3_hasher *self);

in Lean I tried this

@[extern "blake3_hasher_init"]
constant initHasher : IO Blake3Hasher

But when I try to write this:

@[extern "blake3_version"]
constant version : String

For this c header:

const char *blake3_version(void);

I get this error:

/nix/store/9abmq17hg5gia3r2h748ps8dm8m3r884-LeanPlay.LeanVersion-c/LeanPlay/LeanVersion.c:820:21: error: called object type 'lean_object *' is not a function or function pointer
x_1 = blake3_version();
      ~~~~~~~~~~~~~~^
1 error generated.

How do I link this correctly?

Sebastian Ullrich (Oct 06 2021 at 11:17):

There is no automatic conversion between char * and String, so you will need to implement a wrapper function lean_object * lean_blake3_version(lean_object * unit)/constant version : Unit -> String anyway

Sebastian Ullrich (Oct 06 2021 at 11:23):

Anders Christiansen Sørby said:

which I tried in Lean to represent as an opaque type :

@[extern "blake3_hasher"]
constant Blake3Hasher : Type

Which seems to work.

This will break when the runtime tries to increment or decrement the RC of such an object. See lean_external_object, incl. some prior discussions in this stream, for wrapping external resources in Lean objects.

Xubai Wang (Oct 06 2021 at 11:50):

FYI: two steps of using lean_external_object

  1. Initialization
static lean_external_class *g_blake3_hasher_external_class = NULL;

static void blake3_hasher_finalizer(void *ptr)
{
    // cleanup code such as
    free(ptr);
}

inline static void blake3_hasher_foreach(void *mod, b_lean_obj_arg fn) {
    // used for `for in`
}

lean_obj_res blake3_initialize()
{
    // must be called beforehand
    // Often I wrap this in an external function and call it with `builtin_initialize` from Lean.
    // There is another way of calling this each time in `lean4-papyrus`.
    g_blake3_hasher_external_class = lean_register_external_class(blake3_hasher_finalizer, blake3_hasher_foreach);
    return lean_io_result_mk_ok(lean_box(0));
}
@[extern "blake3_initialize"] constant blake3Init : IO Unit

builtin_initialize blake3Init
  1. Creation and Usage (usually wrapped into some box and unbox convenience function)
lean_object *blake3_hasher_box(blake3_hasher *o)
{
    return lean_alloc_external(g_blake3_hasher_external_class, o);
}

blake3_hasher *blake3_hasher_unbox(lean_object *o) {
    return (blake3_hasher *)(lean_get_external_data(o));
 }

And note that blake3_hasher_box should be used on heap allocated blake3_hasher *.

Xubai Wang (Oct 06 2021 at 11:56):

You can also refer to the IO.FS.Handle example in Lean source code here.

Xubai Wang (Oct 06 2021 at 12:19):

@Sebastian Ullrich Creating an external class for each C struct is annoying when there're a lot of them. Although [unbox] is noop now, will it be possible in the future that we can directly unbox a Lean structure into C struct by marking it with attribute [unbox]?

Sebastian Ullrich (Oct 06 2021 at 12:28):

Perhaps, but how to then pass it as a pointer is also unclear

Xubai Wang (Oct 06 2021 at 12:36):

I have considered several ways of using external struct.

  1. The first is to simply use the lean_external_class and constant. This is easy to use but it makes the struct completely opaque. We cannot have some proof on it. And since constant cannot be Inhabited, every thing returned have to be IO or Option or something else even if we just want to access the struct field.
  2. The second one is to use a plain Lean inductive. We can use @[extern] def to access the struct field, but this creates a big overhead when we need to pass the struct back and forth (e.g. epoll or IOCP).
  3. The third way is to define our own lean_objects, like the standard lean_sarray_object, lean_string_object. But this seems impossible because each of them use a different tag.

None of them is satisfactory enough, and the most elegant and least overhead approach I can come up with is to directly use struct internally on FFI structures.

Sebastian Ullrich (Oct 06 2021 at 12:40):

Can the overhead in 2. be eliminated via LTO?

Sebastian Ullrich (Oct 06 2021 at 12:42):

Runtime representation and Lean representation (constant/inductive) should be mostly orthogonal design decision

Xubai Wang (Oct 06 2021 at 13:00):

I think the overhead is mostly in "translating" . Can LTO optimize this away?

Take poll (or WSAPoll) as example.

int poll(struct pollfd fds[], nfds_t nfds, int timeout);

struct pollfd {
     int    fd;       /* file descriptor */
     short  events;   /* events to look for */
     short  revents;  /* events returned */
};

The Lean side can be:

structure PollFd where
  fd : FileDesc
  events : Array PollEvent  -- where PollEvent is a enum inductive
  revents : Array PollEvent

@[extern "lean_poll"] constant poll : (@& (IO.Ref PollFd)) -> @& USize -> @& USize -> IO Unit

The lean_poll function has to convert PollFd to pollfd to feed and then convert it back to return (overhead: the creation of pollfd and modifying the Array). Also, each events and revents has to be anded out to turn them into array of PollEvents instead of checking bit needed only.

Sebastian Ullrich (Oct 06 2021 at 15:47):

Ah, I assumed you meant to make the constructor and all field accessors @[extern], like we e.g. do with String. Then the Lean type doesn't really exist at runtime. But the C type still has to be wrapped in a lean_object or somehow be transferred unboxed.

Sebastian Ullrich (Oct 06 2021 at 15:49):

I think for structs behind pointers or arrays the most realistic solution for now is to write a metaprogram that translates a C structure declaration to raw accesses of a ByteArray

Sebastian Ullrich (Oct 06 2021 at 15:50):

and/or an of an unsafe Pointer type, which we do not have yet

Christian Pehle (Oct 06 2021 at 17:22):

I considered whether

structure SomeClassRef where
        addr : UInt64

@[extern "lean_create_class_ref"]
constant SomeClassRef.create : IO SomeClassRef

could work, then each function could just cast the address to a struct pointer and operate on this (as this is already unboxed by the compiler). But it would require to track the lifetime manually and in say IO, which isn't very elegant. I like the approach in lean-papyrus by @Mac , which distinguishes between OwnerPtr and WeakPtr properly, but such an implementation can be only done in a sufficiently powerful language like C++ / Rust and not in a C.

Anders Christiansen Sørby (Oct 07 2021 at 14:30):

Using a call to buildLeanPackage:

      blake3Mod = leanPkgs.buildLeanPackage {
        name = "Blake3";
        src = ./src;
        debug = true;
        linkFlags = [ "-v -l${blake3-c}" ];
        staticLibDeps = [ blake3-c ];
      };

The first few lines of the generated C code looks like this:

// Lean compiler output
// Module: Blake3
// Imports: Init
#include <lean/lean.h>
#if defined(__clang__)
#pragma clang diagnostic ignored "-Wunused-parameter"
#pragma clang diagnostic ignored "-Wunused-label"
#elif defined(__GNUC__) && !defined(__CLANG__)
#pragma GCC diagnostic ignored "-Wunused-parameter"
#pragma GCC diagnostic ignored "-Wunused-label"
#pragma GCC diagnostic ignored "-Wunused-but-set-variable"
#endif
#ifdef __cplusplus
extern "C" {
#endif
LEAN_EXPORT lean_object* l_Blake3_hasherFinalize___boxed(lean_object*, lean_object*, lean_object*);
LEAN_EXPORT lean_object* l_Blake3_hasherUpdate___boxed(lean_object*, lean_object*, lean_object*);
LEAN_EXPORT lean_object* l_Blake3_version;
lean_object* blake3_hasher_init_derive_key(lean_object*, lean_object*);
LEAN_EXPORT lean_object* l_Blake3_initHasher___boxed(lean_object*);
lean_object* blake3_version(lean_object*);
LEAN_EXPORT uint16_t l_Blake3_BLAKE3__BLOCK__LEN;
LEAN_EXPORT lean_object* l_Blake3_initHasherDeriveKey___boxed(lean_object*, lean_object*);
LEAN_EXPORT lean_object* l_Blake3_hasherFinalizeSeek___boxed(lean_object*, lean_object*, lean_object*, lean_object*);
LEAN_EXPORT lean_object* l_Blake3_initHasherKeyed___boxed(lean_object*, lean_object*);
lean_object* blake3_hasher_update(lean_object*, size_t, lean_object*);
lean_object* blake3_hasher_finalize_seek(uint64_t, lean_object*, size_t, lean_object*);
LEAN_EXPORT uint16_t l_Blake3_BLAKE3__MAX__DEPTH;
LEAN_EXPORT uint16_t l_Blake3_BLAKE3__KEY__LEN;
LEAN_EXPORT uint16_t l_Blake3_BLAKE3__OUT__LEN;
lean_object* blake3_hasher_init(lean_object*);
static lean_object* l_Blake3_version___closed__1;
lean_object* blake3_hasher_finalize(lean_object*, size_t, lean_object*);
LEAN_EXPORT lean_object* l_Blake3_internalVersion___boxed(lean_object*);
lean_object* blake3_hasher_init_derive_key_raw(lean_object*, size_t, lean_object*);
lean_object* blake3_hasher_init_keyed(lean_object*, lean_object*);
LEAN_EXPORT lean_object* l_Blake3_initHasherDeriveKeyRaw___boxed(lean_object*, lean_object*, lean_object*);
LEAN_EXPORT uint16_t l_Blake3_BLAKE3__CHUNK__LEN;
// ....
LEAN_EXPORT lean_object* l_Blake3_internalVersion___boxed(lean_object* x_1) {
_start:
{
lean_object* x_2;
x_2 = blake3_version(x_1);
return x_2;
}

Full example code here: https://github.com/Anderssorby/LeanPlay
But this causes the errors:

/nix/store/hy3lz2vfv9qq2v5jz9nzlx6mmiaq79rj-binutils-2.35.1/bin/ld: /nix/store/nh3mbs8n9iarlbjx32fmd2b771bqx5v4-Blake3-cc/Blake3.o: in function `l_Blake3_internalVersion___boxed':
/nix/store/dckyffc4s94gg47nb7xrr69xn4l441fc-Blake3-c/Blake3.c:83: undefined reference to `blake3_version'
/nix/store/hy3lz2vfv9qq2v5jz9nzlx6mmiaq79rj-binutils-2.35.1/bin/ld: /nix/store/nh3mbs8n9iarlbjx32fmd2b771bqx5v4-Blake3-cc/Blake3.o: in function `l_Blake3_initHasher___boxed':
/nix/store/dckyffc4s94gg47nb7xrr69xn4l441fc-Blake3-c/Blake3.c:108: undefined reference to `blake3_hasher_init'
/nix/store/hy3lz2vfv9qq2v5jz9nzlx6mmiaq79rj-binutils-2.35.1/bin/ld: /nix/store/nh3mbs8n9iarlbjx32fmd2b771bqx5v4-Blake3-cc/Blake3.o: in function `l_Blake3_initHasherKeyed___boxed':
/nix/store/dckyffc4s94gg47nb7xrr69xn4l441fc-Blake3-c/Blake3.c:116: undefined reference to `blake3_hasher_init_keyed'
/nix/store/hy3lz2vfv9qq2v5jz9nzlx6mmiaq79rj-binutils-2.35.1/bin/ld: /nix/store/nh3mbs8n9iarlbjx32fmd2b771bqx5v4-Blake3-cc/Blake3.o: in function `l_Blake3_initHasherDeriveKey___boxed':
/nix/store/dckyffc4s94gg47nb7xrr69xn4l441fc-Blake3-c/Blake3.c:124: undefined reference to `blake3_hasher_init_derive_key'
/nix/store/hy3lz2vfv9qq2v5jz9nzlx6mmiaq79rj-binutils-2.35.1/bin/ld: /nix/store/nh3mbs8n9iarlbjx32fmd2b771bqx5v4-Blake3-cc/Blake3.o: in function `l_Blake3_initHasherDeriveKeyRaw___boxed':
/nix/store/dckyffc4s94gg47nb7xrr69xn4l441fc-Blake3-c/Blake3.c:134: undefined reference to `blake3_hasher_init_derive_key_raw'
/nix/store/hy3lz2vfv9qq2v5jz9nzlx6mmiaq79rj-binutils-2.35.1/bin/ld: /nix/store/nh3mbs8n9iarlbjx32fmd2b771bqx5v4-Blake3-cc/Blake3.o: in function `l_Blake3_hasherUpdate___boxed':
/nix/store/dckyffc4s94gg47nb7xrr69xn4l441fc-Blake3-c/Blake3.c:144: undefined reference to `blake3_hasher_update'
/nix/store/hy3lz2vfv9qq2v5jz9nzlx6mmiaq79rj-binutils-2.35.1/bin/ld: /nix/store/nh3mbs8n9iarlbjx32fmd2b771bqx5v4-Blake3-cc/Blake3.o: in function `l_Blake3_hasherFinalize___boxed':
/nix/store/dckyffc4s94gg47nb7xrr69xn4l441fc-Blake3-c/Blake3.c:154: undefined reference to `blake3_hasher_finalize'
/nix/store/hy3lz2vfv9qq2v5jz9nzlx6mmiaq79rj-binutils-2.35.1/bin/ld: /nix/store/nh3mbs8n9iarlbjx32fmd2b771bqx5v4-Blake3-cc/Blake3.o: in function `l_Blake3_hasherFinalizeSeek___boxed':
/nix/store/dckyffc4s94gg47nb7xrr69xn4l441fc-Blake3-c/Blake3.c:166: undefined reference to `blake3_hasher_finalize_seek'
/nix/store/hy3lz2vfv9qq2v5jz9nzlx6mmiaq79rj-binutils-2.35.1/bin/ld: /nix/store/nh3mbs8n9iarlbjx32fmd2b771bqx5v4-Blake3-cc/Blake3.o: in function `_init_l_Blake3_version___closed__1':
/nix/store/dckyffc4s94gg47nb7xrr69xn4l441fc-Blake3-c/Blake3.c:92: undefined reference to `blake3_version'

What am I missing? Should there be a header file inclusion at the top of this file somehow?

Sebastian Ullrich (Oct 07 2021 at 14:36):

That's from the linker, so it's not header files. Can you post the linker cmdline? Should it be linkFlags = [ "-v" "-l..." ];?

Sebastian Ullrich (Oct 07 2021 at 14:39):

I forgot about staticLibDeps btw. If blake3-c directly contains a .a file, you shouldn't need the -l argument.

Anders Christiansen Sørby (Oct 08 2021 at 00:07):

Linker cmdline:

 "/nix/store/ri0cphpgg739axdzgiwl3grawj6nlykg-ccache-links-wrapper-/bin/ld" --eh-frame-hdr -m elf_x86_64 -o /nix/store/d7q4n09gwnmyan3b1hxjjzswqkg4sn69-leanplay/bin/leanplay /nix/store/cvr0kjg2q7z2wwhjblx6c73rv422k8cm-glibc-2.33-47/lib/crt1.o /nix/store/cvr0kjg2q7z2wwhjblx6c73rv422k8cm-glibc-2.33-47/lib/crti.o /nix/store/cj8kp411lblbdm4qj94s14h99dx1pv4b-gcc-10.3.0/lib/gcc/x86_64-unknown-linux-gnu/10.3.0/crtbegin.o -L/nix/store/fx58y0jdzd2xggdy0pqag2ffj72l7w33-gmp-6.2.1/lib -L/nix/store/12j5q4qa7f1h5qcxy47hycl18m6ych4f-Init-lib -L/nix/store/v7a28x16z8afha1nyrx5wi28v8mynccd-Std-lib -L/nix/store/60fckhkp33hnb35da605spn3m28h1vd3-Lean-lib -L/nix/store/fmam7m6307bwyjqx351y13j6js204xgq-leancpp/lib/lean -L/nix/store/hbqyp5xq6yi4sw5lm22i22sqyffvzi0n-leanshared -L/nix/store/1nzq03jahnla4gibxc8jmhbqi5j808ib-leanc/lib -L/nix/store/1nzq03jahnla4gibxc8jmhbqi5j808ib-leanc/lib/lean -L/nix/store/cvr0kjg2q7z2wwhjblx6c73rv422k8cm-glibc-2.33-47/lib -L/nix/store/cj8kp411lblbdm4qj94s14h99dx1pv4b-gcc-10.3.0/lib/gcc/x86_64-unknown-linux-gnu/10.3.0 -L/nix/store/lg104nh0szci8slz5z6494m457jm5y3p-gcc-10.3.0-lib/x86_64-unknown-linux-gnu/lib -L/nix/store/d3r0hhgljsp0nk5n4ra2k2ykvjyah96z-clang-12.0.1-lib/lib -L/nix/store/cvr0kjg2q7z2wwhjblx6c73rv422k8cm-glibc-2.33-47/lib -L/nix/store/cj8kp411lblbdm4qj94s14h99dx1pv4b-gcc-10.3.0/lib/gcc/x86_64-unknown-linux-gnu/10.3.0 -L/nix/store/lg104nh0szci8slz5z6494m457jm5y3p-gcc-10.3.0-lib/x86_64-unknown-linux-gnu/lib -L/nix/store/d3r0hhgljsp0nk5n4ra2k2ykvjyah96z-clang-12.0.1-lib/lib -L/nix/store/cj8kp411lblbdm4qj94s14h99dx1pv4b-gcc-10.3.0/lib64/gcc/x86_64-unknown-linux-gnu/10.3.0 -L/nix/store/cj8kp411lblbdm4qj94s14h99dx1pv4b-gcc-10.3.0/lib64/gcc/x86_64-unknown-linux-gnu/10.3.0/../../../../lib64 -L/nix/store/cj8kp411lblbdm4qj94s14h99dx1pv4b-gcc-10.3.0/lib64/gcc/x86_64-unknown-linux-gnu/10.3.0/../../.. -dynamic-linker=/nix/store/cvr0kjg2q7z2wwhjblx6c73rv422k8cm-glibc-2.33-47/lib/ld-linux-x86-64.so.2 -dynamic-linker=/nix/store/cvr0kjg2q7z2wwhjblx6c73rv422k8cm-glibc-2.33-47/lib/ld-linux-x86-64.so.2 /nix/store/2cwkzhb0k8k1mn206kzh91kcsqc3rcgz-LeanPlay-lib/libLeanPlay.a /nix/store/1b3sx4l56s2sw5clwzca2ab9z69xfhnj-Blake3-lib/libBlake3.a /nix/store/12j5q4qa7f1h5qcxy47hycl18m6ych4f-Init-lib/libInit.a /nix/store/50waard4c0nfjrrmiwwv0nd2l0xpcm29-Lake-lib/libLake.a /nix/store/60fckhkp33hnb35da605spn3m28h1vd3-Lean-lib/libLean.a /nix/store/yz4r96wavknfijbbnif9kc0zq85zfxp8-Leanpkg-lib/libLeanpkg.a /nix/store/v7a28x16z8afha1nyrx5wi28v8mynccd-Std-lib/libStd.a --start-group -lleancpp -lLean --end-group --start-group -lInit -lStd -lleanrt --end-group -lstdc++ -lm -lgmp -ldl -lgcc --as-needed -lgcc_s --no-as-needed -lpthread -lc -lgcc --as-needed -lgcc_s --no-as-needed /nix/store/cj8kp411lblbdm4qj94s14h99dx1pv4b-gcc-10.3.0/lib/gcc/x86_64-unknown-linux-gnu/10.3.0/crtend.o /nix/store/cvr0kjg2q7z2wwhjblx6c73rv422k8cm-glibc-2.33-47/lib/crtn.o

Anders Christiansen Sørby (Oct 08 2021 at 00:09):

The blake3-c is a .so file I think. I had troubles compiling to a .a static file.

Sebastian Ullrich (Oct 08 2021 at 14:10):

I only see a /nix/store/1b3sx4l56s2sw5clwzca2ab9z69xfhnj-Blake3-lib/libBlake3.a in that cmdline. Does nm says that that file contains blake3_version?

Sebastian Ullrich (Oct 08 2021 at 16:29):

linkFlags = [ blake3-c ]; works. I have to say though that putting a .so, or really any file, directly into Nix' $out instead of making that a folder is highly unconventional :) .

Sebastian Ullrich (Oct 08 2021 at 16:30):

Now if only all bug reports were as easily reproducible as nix build .#executable :cry: ...

Anders Christiansen Sørby (Oct 15 2021 at 12:34):

I'm getting a segmentation fault when running nix build .#tests for my blake3 bindings.
https://github.com/yatima-inc/lean-blake3
Full output:

nix build .#tests -vL
building '/nix/store/p6prlan0h9mlhkb9wbsg5cx8gmfxpda8-Tests.HashString.drv'...
Tests.HashString> bash: line 7:     7 Segmentation fault      (core dumped) lean -o $out/$oleanPath -c $c/$cPath $leanPath $leanFlags $leanPluginFlags
error: builder for '/nix/store/p6prlan0h9mlhkb9wbsg5cx8gmfxpda8-Tests.HashString.drv' failed with exit code 139;
       last 1 log lines:
       > bash: line 7:     7 Segmentation fault      (core dumped) lean -o $out/$oleanPath -c $c/$cPath $leanPath $leanFlags $leanPluginFlags
       For full logs, run 'nix log /nix/store/p6prlan0h9mlhkb9wbsg5cx8gmfxpda8-Tests.HashString.drv'.
error: 1 dependencies of derivation '/nix/store/waq526gdyni9bim6cqgzgj5j78dq6v25-Tests-depRoot.drv' failed to build
error: 1 dependencies of derivation '/nix/store/rxwqmvssr2di81i17bvk5f2nn7c2wk5c-Tests-depRoot.drv' failed to build

Gabriel Ebner (Oct 15 2021 at 12:38):

The type of the function doesn't seem to match the C function:

@[extern "blake3_hasher_update"]
constant hasherUpdate : (hasher : Hasher)  (input : ByteArray)  (length : USize)  Hasher
void blake3_hasher_update(blake3_hasher *self, const void *input,
                          size_t input_len);

I'm not surprised that you get a segfault when converting void to blake3_hasher *.

Anders Christiansen Sørby (Oct 15 2021 at 12:42):

True, but how do I correctly translate this mutable function?

Anders Christiansen Sørby (Oct 15 2021 at 12:43):

I was hoping lean would give me a warning or somehow figure out how to link this signature correctly. How can I manage the c code generation to behave as I want?

Mario Carneiro (Oct 15 2021 at 12:44):

It would be great if there was a lean-bindgen

Gabriel Ebner (Oct 15 2021 at 12:44):

I would do the usual implementedBy

constant HasherPointed : PointedType
def Hasher : Type := HasherPointed.type
instance : Inhabited Hasher := HasherPointed.val

@[extern "blake3_hasher_update"]
constant hasherUpdateExtern : (hasher : Hasher)  (input : ByteArray)  (length : USize)  IO Unit

unsafe def unsafeIO' [Inhabited α] (k : IO α) : α :=
  match unsafeIO k with
  | Except.ok a => a
  | Except.error e => panic e.toString

unsafe def hasherUpdateImpl (hasher : Hasher) (input : ByteArray) (length : USize) : Hasher :=
  unsafeIO' do
    hasherUpdateExtern hasher input length
    hasher

@[implementedBy hasherUpdateImpl]
constant hasherUpdate (hasher : Hasher) (input : ByteArray) (length : USize) : Hasher

Gabriel Ebner (Oct 15 2021 at 12:45):

There is probably an easier way to launder IO functions, but I can't remember it right now.

Mario Carneiro (Oct 15 2021 at 12:45):

it shouldn't be hard to write something to read the lean type and produce a plausible looking C header

Gabriel Ebner (Oct 15 2021 at 12:45):

If we're already generating C code, then we could just do inline-c.lean

Gabriel Ebner (Oct 15 2021 at 12:46):

https://hackage.haskell.org/package/inline-c

Gabriel Ebner (Oct 15 2021 at 12:46):

An unsafe term-mode macro would also be convenient.

Mario Carneiro (Oct 15 2021 at 12:46):

Is there a way to write entire C functions in lean? I recall seeing some way to do this with little snippets like (float)(x + y) but I would rather not do a whole function like that

Mario Carneiro (Oct 15 2021 at 12:47):

but it would be nice to be able to write the shim from lean ABI to a normal looking C function in a bit of lean code

Gabriel Ebner (Oct 15 2021 at 12:55):

recall seeing some way to do this with little snippets like (float)(x + y) but I would rather not do a whole function like that

There's no law that requires you to write that string manually. :smile: If you're willing to use gcc, then you can even define functions in that string:

(float) ({
  float my_function() {
    return 42;
  }
  my_function();
})

And then we could make Lean macros that expand into that attribute:

def frob (x : Float) : Float :=
  inlineC
    float x = $x;
    for (int i = 0; i < 10) {
      x = cosf(x);
    }
    return x;

Other options include generating a C source file as a side effect, or adding support to lean for extra C source code output.

Mario Carneiro (Oct 15 2021 at 12:58):

Yeah, I thought about using ({ ... }) here, but it would be nice to have something more "official" from the compiler

Mario Carneiro (Oct 15 2021 at 13:00):

Generating a C source file sounds like it would be hell for lake

Gabriel Ebner (Oct 15 2021 at 13:01):

The other interesting question is how to add #include <...>, I'm not sure if that's possible atm.

Mario Carneiro (Oct 15 2021 at 13:02):

I think the ideal low level interface would just be something like Compiler.addIncludeFile "windows.h" or Compiler.addCFunction "float foo() { return 42; }" which just gets added to the list of functions to emit for the current file

Mac (Oct 15 2021 at 13:03):

This also depends heavily on which direction one wants Lean to go. PR #694 is all about removing inline C code from Lean (so it can support other backends -- such as LLVM). I think adding an inline C dialect would be moving further in a direction that Lean probably doesn't want to go.

Mac (Oct 15 2021 at 13:07):

Tough, if the idea is that such inline C would generate a separate C file that is compiled and linked into the package separately rather than be part of the current C emitter that would probably still be supportable by alternative backends. Thus, such an approach may still be feasible.

Anders Christiansen Sørby (Oct 15 2021 at 13:08):

This procedure with an IO binding and an unsafeIO' to convert it into a pure function could be made into a general bind_gen-like macro.

Mario Carneiro (Oct 15 2021 at 13:09):

Well, hasherUpdate in that snippet isn't really safe; it modifies the hasher in-place, so it's not referentially transparent

Mario Carneiro (Oct 15 2021 at 13:10):

A correct implementation should check the refcount and make a copy if it is shared first

Mac (Oct 15 2021 at 13:10):

Gabriel Ebner said:

I would do the usual implementedBy

I was also under the impression using unsafeIO was heavily discouraged. Is that right, @Sebastian Ullrich ?

Anders Christiansen Sørby (Oct 15 2021 at 13:12):

Ok, so the lean compiler can't detect such problems?

Mac (Oct 15 2021 at 13:12):

no

Mario Carneiro (Oct 15 2021 at 13:13):

The lean compiler is barely involved. The shim would normally be the one to handle this

Mario Carneiro (Oct 15 2021 at 13:13):

It is weird to me that your extern function doesn't have more lean_object * in it

Mario Carneiro (Oct 15 2021 at 13:18):

Correct me if I'm wrong, but I think the right companion to:

@[extern "blake3_hasher_update"]
constant hasherUpdate : (hasher : Hasher)  (input : @& ByteArray)  (length : USize)  Hasher

is

lean_obj_res blake3_hasher_update(lean_obj_arg self, b_lean_obj_arg input, size_t input_len);

Mario Carneiro (Oct 15 2021 at 13:23):

@Mac Thinking about it some more, I don't think we can get away from usage of C in conjunction with lean. The whole lean API is written in C; how are you going to deal with lean_unbox, lean_inc etc otherwise?

Mario Carneiro (Oct 15 2021 at 13:24):

For writing code that works with lean FFI, you either have to write C or something about as complicated that binds to all these functions

Mac (Oct 15 2021 at 13:33):

Mario Carneiro said:

Mac Thinking about it some more, I don't think we can get away from usage of C in conjunction with lean. The whole lean API is written in C; how are you going to deal with lean_unbox, lean_inc etc otherwise?

But the C API is just linked in. It's source language is irrelevant. Lean can emit whatever it wants without that mattering.

Anders Christiansen Sørby (Oct 15 2021 at 13:33):

But that is not the signature I have and I don't want to change stuff in the BLAKE3 c implementation.

Mario Carneiro (Oct 15 2021 at 13:33):

that's why there is a shim

Mac (Oct 15 2021 at 13:34):

@Anders Christiansen Sørby you need to write glue code to translate between the two

Mario Carneiro (Oct 15 2021 at 13:34):

and @Mac what language should the shim be written in?

Mario Carneiro (Oct 15 2021 at 13:35):

Theoretically it can be anything that understands the C ABI, but I don't think we want to be writing LLVM

Mac (Oct 15 2021 at 13:35):

@Mario Carneiro C is fine. But so is C++/C#/Rust/whatever. The shim is also linked in, it source language is relevant.

Anders Christiansen Sørby (Oct 15 2021 at 13:36):

@Mac Is there no way around? Some functionality in lean for specifying this perhaps?

Mario Carneiro (Oct 15 2021 at 13:36):

I agree with you, but you have to make a choice somehow, and ideally one that doesn't make building it a nightmare

Mac (Oct 15 2021 at 13:36):

@Mario Carneiro It is important that this not just a me issue, there are multiple other people writing alternative backends for Lean. After all, PR #694 wasn't mine.

Mario Carneiro (Oct 15 2021 at 13:37):

Separate linking is a fine technical solution, but the build tool needs to understand what's going on

Mac (Oct 15 2021 at 13:37):

Anders Christiansen Sørby said:

Mac Is there no way around? Some functionality in lean for specifying this perhaps?

Sadly, at the moment, the answer to this is a big no. Lean unfortunately currently supports very few external types (directly). Just uint8/16/32/64 and double -- that's it.

Mario Carneiro (Oct 15 2021 at 13:38):

Gabriel's solution with inline C seems plausible for the present

Mac (Oct 15 2021 at 13:38):

Mario Carneiro said:

Separate linking is a fine technical solution, but the build tool needs to understand what's going on

Yeah, and lake already does.

Mario Carneiro (Oct 15 2021 at 13:39):

How does lake handle this now?

Mac (Oct 15 2021 at 13:39):

https://github.com/leanprover/lake/tree/master/examples/ffi

Mario Carneiro (Oct 15 2021 at 13:41):

I kind of want to try Lean + Rust FFI now

Anders Christiansen Sørby (Oct 15 2021 at 13:43):

@Mac So the solution of @Gabriel Ebner would not work? I'm wondering how to proceed with this now.

Mac (Oct 15 2021 at 13:49):

Mario Carneiro said:

Gabriel's solution with inline C seems plausible for the present

I think an inline C solution that produced a separate C shim that was compiled and linked separately could be quite desirable . For one, It would keep the C source and Lean bindings tied together in the source. For example:

extern c def frob (x : Float) : Float :=
  for (int i = 0; i < 10) {
    x = cosf(x);
  }
  return x;

would compile to

@[extern "frob"]
def frob (x : Float) : Float

and

double frob(double x) {
  for (int i = 0; i < 10) {
    x = cosf(x);
  }
  return x;
}

Mario Carneiro (Oct 15 2021 at 13:50):

@Anders Christiansen Sørby I think Mac's solution is easier to work from since it's a complete worked example and I imagine you will have a decent amount of shim code, not to mention linking with the blake headers

Anders Christiansen Sørby (Oct 15 2021 at 13:51):

Is this valid source?
I see in the generated C code that it generates lean_object * for all non-trivial types.

Mario Carneiro (Oct 15 2021 at 13:52):

lean_obj_arg is used for all types except "unboxed" scalar types, which means basically USize, UInt64, UInt32, Float and maybe a few others

Mac (Oct 15 2021 at 13:54):

and b_lean_obj_arg is used when the object is "borrowed" i.e., defined with (foo : @& Foo) -- which means the caller does not increment the reference counter.

Anders Christiansen Sørby (Oct 15 2021 at 13:55):

But I don't see entirely how to use this solution. Do I have to create an entire intermediary c-lib that includes lean.h?

Mario Carneiro (Oct 15 2021 at 13:57):

yes

Mario Carneiro (Oct 15 2021 at 13:57):

well, a file at least

Mac (Oct 15 2021 at 13:59):

it should probably be compiled at least to a static library as each dependent of your library will also need to link it in.

Mario Carneiro (Oct 15 2021 at 14:07):

Where is lean.h anyway? The one in the source depends on config.h that doesn't exist, and I can't find any file called lean.h in the nightlies

Gabriel Ebner (Oct 15 2021 at 14:07):

[whole discussion about "could we have API interfaces without C?"]

I don't think that we be a big problem as long as we have enough "intrinsics", i.e., IO functions that operate on pointers, that can access refcounts, etc. And also a way to link to libraries and call exported functions. ctypes works quite well in python.

Anders Christiansen Sørby (Oct 15 2021 at 14:10):

https://github.com/leanprover/lean4/blob/master/src/include/lean/lean.h

Mario Carneiro (Oct 15 2021 at 14:11):

I know about that, I'm asking how to make a C compiler approved lean.h

Mario Carneiro (Oct 15 2021 at 14:12):

lean.h contains #include <lean/config.h> and there is no file called config.h in the distribution

Mario Carneiro (Oct 15 2021 at 14:12):

there is a config.h.in that is apparently built into config.h but I don't want to build lean from source

Sebastian Ullrich (Oct 15 2021 at 14:13):

Looks fine to me

$ ls lean-4.0.0-nightly-2021-10-13-linux/include/lean/
config.h  lean.h  lean_gmp.h  version.h

Mario Carneiro (Oct 15 2021 at 14:14):

that's odd:

$ ls -al /home/mario/.elan/toolchains/leanprover--lean4---nightly/lib/lean/src/include/lean/
total 8
drwxr-xr-x 2 mario mario 4096 Sep 29 01:47 ./
drwxr-xr-x 3 mario mario 4096 Sep 29 01:47 ../

Sebastian Ullrich (Oct 15 2021 at 14:15):

Latest as well?

Mario Carneiro (Oct 15 2021 at 14:15):

yes

Anders Christiansen Sørby (Oct 15 2021 at 14:17):

Do I have to fill in the lean_object* struct by hand? Or is there an easier way of doing this?

Mario Carneiro (Oct 15 2021 at 14:18):

use lean_alloc_external to create one, and lean_is_exclusive to do the copy on write stuff

Mac (Oct 15 2021 at 14:18):

@Anders Christiansen Sørby no, you use one of the API methods to create objects of lean_object* -- which you use depends on what object you are trying to construct.

Sebastian Ullrich (Oct 15 2021 at 14:19):

@Mario Carneiro Not sure what to say...

  leanprover/lean4:nightly unchanged - Lean (version 4.0.0-nightly-2021-10-13, commit 66fcfcce3716, Release)

$ ls -l .elan/toolchains/leanprover--lean4---nightly/include/lean/
.rw-r--r-- 259 sebastian  1 Jan  1970 config.h
.rw-r--r-- 73k sebastian  1 Jan  1970 lean.h
.rw-r--r-- 551 sebastian  1 Jan  1970 lean_gmp.h
.rw-r--r-- 405 sebastian  1 Jan  1970 version.h

Mario Carneiro (Oct 15 2021 at 14:21):

Oh, that's not the same folder

Mario Carneiro (Oct 15 2021 at 14:21):

I was in lib/lean/src/include/lean/

Sebastian Ullrich (Oct 15 2021 at 14:21):

Oh :)

Anders Christiansen Sørby (Oct 15 2021 at 14:30):

I've tried this, but how do i go from char * to lean_object *?

LEAN_SHARED lean_obj_res lean_string_mk(lean_obj_arg cs);

lean_object* l_blake3_version() {
  const char * v = blake3_version;
  lean_obj_res r = lean_string_mk(v);
  return r;
}

I can't find any api function for doing this conversion.

Mario Carneiro (Oct 15 2021 at 14:32):

I think this is what the shim should look like:

#include <lean/lean.h>

// replace with #include <blake3.h>
typedef struct blake3_hasher blake3_hasher;
extern blake3_hasher * blake3_hasher_copy(blake3_hasher *self);
extern void blake3_hasher_update(blake3_hasher *self, const void *input, size_t input_len);
extern void blake3_hasher_free(blake3_hasher *self);

static lean_external_class *g_blake3_hasher_external_class = NULL;

static void blake3_hasher_finalizer(void *ptr) {
  blake3_hasher_free(ptr);
}

inline static void blake3_hasher_foreach(void *mod, b_lean_obj_arg fn) {}

lean_obj_res blake3_initialize() {
  g_blake3_hasher_external_class = lean_register_external_class(blake3_hasher_finalizer, blake3_hasher_foreach);
  return lean_io_result_mk_ok(lean_box(0));
}

static inline lean_obj_res lean_ensure_exclusive_blake3_hasher(lean_obj_arg a) {
    if (lean_is_exclusive(a)) return a;
    return lean_alloc_external(g_blake3_hasher_external_class, blake3_hasher_copy(a));
}

lean_obj_res lean_blake3_hasher_update(lean_obj_arg self, b_lean_obj_arg input, size_t input_len) {
  lean_object* a = lean_ensure_exclusive_blake3_hasher(a);
  blake3_hasher_update(lean_get_external_data(a), lean_sarray_cptr(input), input_len);
  return a;
}

Mario Carneiro (Oct 15 2021 at 14:35):

@Anders Christiansen Sørby You should use lean_mk_string not lean_string_mk

Mario Carneiro (Oct 15 2021 at 14:36):

There are some lean examples that just inline it:

/-- Additional version description like "nightly-2018-03-11" -/
@[extern c inline "lean_mk_string(LEAN_SPECIAL_VERSION_DESC)"]
constant version.getSpecialDesc (u : Unit) : String

Xubai Wang (Oct 15 2021 at 16:10):

(deleted)

Anders Christiansen Sørby (Oct 16 2021 at 10:43):

@Sebastian Ullrich Where do I find the header files in the nix build?

Sebastian Ullrich (Oct 16 2021 at 11:13):

@Anders Christiansen Sørby lean-bin-tools-unwrapped, https://github.com/leanprover/lean4/blob/8d10197164830e3e8cbc6ed321e5fe920299c162/nix/bootstrap.nix#L108

Sebastian Ullrich (Oct 16 2021 at 11:15):

There's also lean-all that is mostly equivalent to the release tarballs, but that is intended more for interactive use than as a build dependency

Anders Christiansen Sørby (Oct 19 2021 at 16:59):

I still have some issues with building. Why doesn't the extern detect the declaration in the c-shim?
The full example can be found at
https://github.com/yatima-inc/lean-blake3
and by running nix build .#tests -L

Tests.HashString> could not find native implementation of external declaration 'Blake3.internalVersion' (symbols 'l_Blake3_internalVersion___boxed' or 'l_Blake3_internalVersion')

Do I need to do some additional linking? Why is there no warning that the linking failed when building just the module?

Xubai Wang (Oct 19 2021 at 17:04):

Maybe it's that l_blake3_version declared with static? Static functions are not visible to other files.

Anders Christiansen Sørby (Oct 19 2021 at 17:28):

Didn't work now either.

Anders Christiansen Sørby (Oct 19 2021 at 17:33):

nix build .#modRoot does work though

Anders Christiansen Sørby (Oct 19 2021 at 17:37):

But this doesn't require any linking apparently, because it runs without including the static library.

Mac (Oct 19 2021 at 18:11):

@Anders Christiansen Sørby what is the lean command being used to run these tests?

Anders Christiansen Sørby (Oct 19 2021 at 18:12):

It is essentially leanc and lean using the buildLeanPackage nix function

Mac (Oct 19 2021 at 18:13):

is there any way to get the exact lean inovaction?

Mac (Oct 19 2021 at 18:16):

if you want to use C bindings in at interpretation time you need to pass the package (i.e., Blake3) as a shared library built with leanc -shared to lean as a --plugin

Anders Christiansen Sørby (Oct 19 2021 at 19:25):

Thanks that lead me on the right track. I need to use the pluginDeps option to get libraries passed to --plugin. I was using staticLibDeps. This could use a lot more docs.

Anders Christiansen Sørby (Oct 20 2021 at 12:37):

I've added freeing and copy functions to blake3_hasher* even though this is not supported in the original API. How do I wrap this inside a lean_object* so that it can be used within the language?

Mario Carneiro (Oct 20 2021 at 12:38):

I think it would look like my long post above

Anders Christiansen Sørby (Oct 20 2021 at 12:42):

Yes, that was very helpful, but how do I extract the data from the lean_external_object* so that I can do a copy or free data?

Mario Carneiro (Oct 20 2021 at 12:42):

Note that both copy and free are called in the code there

Mario Carneiro (Oct 20 2021 at 12:44):

the free function is called in the finalizer that gets passed to lean_register_external_class, and the copy function is called by lean_ensure_exclusive_blake3_hasher which is used in any mutating function

Anders Christiansen Sørby (Oct 20 2021 at 12:49):

Yes, but those functions needs to be implemented and they are also working on the wrong type. I need to access the external data wrapped inside the lean_object* created by the lean_alloc_external. Is is safe to just do l->data?

Anders Christiansen Sørby (Oct 20 2021 at 12:50):

I see there is a function for that.

Mario Carneiro (Oct 20 2021 at 12:52):

I don't follow. Both blake3_hasher_copy and blake3_hasher_free should have natural types there, those are the functions you are supposed to implement

Anders Christiansen Sørby (Oct 20 2021 at 12:53):

Is it necessary to assert the type of the lean_object* btw?

Mario Carneiro (Oct 20 2021 at 12:54):

dunno, are you ok with UB if it's not true?

Anders Christiansen Sørby (Oct 20 2021 at 12:54):

In your snippet you used blake3_hasher* but their types are lean_object* I just need to transform form that

Mario Carneiro (Oct 20 2021 at 12:55):

Could you show your code or a snippet thereof? The code I posted is type correct and passes a C compiler

Anders Christiansen Sørby (Oct 20 2021 at 13:01):

I got a lot of warnings when running that.
My attempt so far:

/*
 * Copy the contents of the hasher to a new memory location.
 */
static inline lean_object * blake3_hasher_copy(lean_object *self) {
  blake3_hasher* a = (blake3_hasher*) lean_get_external_data(self);
  blake3_hasher * copy = malloc(sizeof(blake3_hasher));
  *copy = *a;

  return lean_alloc_external(g_blake3_hasher_external_class, copy);
}

/*
 * Free the memory for this hasher. This makes all other references to this address invalid.
 */
static inline void blake3_hasher_free(lean_object *self) {
  blake3_hasher* a = (blake3_hasher*) lean_get_external_data(self);
  // Mark memory as available
  free(a);
  a = NULL;
}

Anders Christiansen Sørby (Oct 20 2021 at 13:03):

I guess I could add this assert at the start of those functions, but is it really necessary?

assert(lean_get_external_class(self) == g_blake3_hasher_external_class);

Anders Christiansen Sørby (Oct 20 2021 at 13:16):

Now it compiles without warning, but I wonder how to use the lean command correctly (from nix). When building the package I get:

Blake3> terminate called after throwing an instance of 'lean::exception'
Blake3>   what():  error loading plugin, /nix/store/p7pjngnra1z3vr59rxnqgh4kfa3nfbf7-libblake3.so/libblake3.so: undefined symbol: blake3_version
Blake3> bash: line 7:     7 Aborted                 (core dumped) lean -o $out/$oleanPath -c $c/$cPath $leanPath $leanFlags $leanPluginFlags

What sort of --plugin arguments are necessary to link this correctly?

Mario Carneiro (Oct 20 2021 at 14:52):

Who is calling those functions? Are these functions exposed directly to lean?

Mario Carneiro (Oct 20 2021 at 14:53):

blake3_hasher_free definitely looks like a not safe thing to do

Mac (Oct 20 2021 at 15:22):

@Anders Christiansen Sørby

Some notes:

  1. It would be wiser to use a function like the following to access the external class:
static lean_external_class* get_blake3_hasher_class() {
  // Use static to make this thread safe by static initialization rules.
  static lean_external_class* c =
    lean_register_external_class(&blake3_hasher_finalizer, &blake3_hasher_foreach);
  return c;
}

This ensure the class is always properly initialized before use.

  1. Looking at your code in blake3-shim.c, blake3_hasher_finalizer calls blake3_hashe_free which is just wrong. The finalizer is past the external pointer (i.e., the blake3_hasher) not the lean_object*.

Anders Christiansen Sørby (Oct 20 2021 at 19:42):

@Mac I don't understand. Is the finalizer not supposed to free the memory? I noticed there was a type error there which I have fixed.

Anders Christiansen Sørby (Oct 20 2021 at 19:46):

This doesn't solve my other problem of getting lean to correctly link the plugin.

Mac (Oct 20 2021 at 19:50):

Anders Christiansen Sørby said:

Mac I don't understand. Is the finalizer not supposed to free the memory? I noticed there was a type error there which I have fixed.

No, the finalizer is suppose to free memory. My point was simply that the pointer it receives is a the blake3_hasher directly and not its the lean_object that wraps it.

Mac (Oct 20 2021 at 19:51):

Anders Christiansen Sørby said:

This doesn't solve my other problem of getting lean to correctly link the plugin.

The problem here is that the plugin you are passing is a shared library compiled from the shim itself rather than from the entire Lean module linked with the shim. That is, the plugin is suppose to be the entire Blake3 Lean package not just the shim (and thus the package itself doesn't need the plugin).

Anders Christiansen Sørby (Oct 20 2021 at 19:59):

Ok, that is what I suspected. But how do I compile the Blake3 module together with the shim?

Mac (Oct 20 2021 at 20:57):

@Anders Christiansen Sørby just compile the shim into a static library and pass it an additional static library o the lean package build.

Anders Christiansen Sørby (Oct 21 2021 at 08:36):

That is what I was doing before and it didn't work. Including the staticLib as a dependency to the build command. I'll experiement some more. Where can I find docs and code for the lean command?

Using get_blake3_hasher_class leads to the error: initializer element is not constant. Is there another way to do this?

Anders Christiansen Sørby (Oct 21 2021 at 08:59):

This is not how the lean source does it. It uses Mario's solution. Is this some new C/C++ feature?

Sebastian Ullrich (Oct 21 2021 at 09:02):

It's C++, but I suspect it won't work anyway because the Lean data structures are not yet initialized at that point. It's a classic case of the static initialization order fiasco (except that Lean doesn't use static initialization for this very reason).

Anders Christiansen Sørby (Oct 21 2021 at 14:34):

This is more of a nix question. How do I get this to build correctly?

        blake3-shim = import ./c/default.nix {
          inherit system pkgs blake3-c lean;
        };
        project = leanPkgs.buildLeanPackage {
          inherit name;
          src = ./src;
          debug = true;
          # linkFlags = [ blake3-shim.staticLib ];
          # staticLibDeps = [ blake3-shim.staticLib ];
          # pluginDeps = [ blake3-shim.dynamicLib ];
        };
        tests = leanPkgs.buildLeanPackage {
          name = "Tests";
          src = ./tests;
          debug = true;
          deps = [ project ];
          pluginDeps = [ project.sharedLib ];
          staticLibDeps = [ blake3-shim project.staticLib ];
        };

The blake3-shim build works fine, but building project.executablegives:

/nix/store/xn1f3mbdnfymy1cw109axj1jnm85fksy-Blake3-c/Blake3.c:156:20: error: called object type 'lean_object *' is not a function or function pointer
x_1 = blake3_hasher();
      ~~~~~~~~~~~~~^
/nix/store/xn1f3mbdnfymy1cw109axj1jnm85fksy-Blake3-c/Blake3.c:261:29: error: called object type 'lean_object *' is not a function or function pointer
x_1 = lean_blake3_initialize();
      ~~~~~~~~~~~~~~~~~~~~~~^
/nix/store/xn1f3mbdnfymy1cw109axj1jnm85fksy-Blake3-c/Blake3.c:342:1: error: use of undeclared identifier 'l_Blake3_HasherPointed'
l_Blake3_HasherPointed = _init_l_Blake3_HasherPointed();
^
/nix/store/xn1f3mbdnfymy1cw109axj1jnm85fksy-Blake3-c/Blake3.c:342:26: warning: implicit declaration of function '_init_l_Blake3_HasherPointed' is invalid in C99 [-Wimplicit-function-declaration]
l_Blake3_HasherPointed = _init_l_Blake3_HasherPointed();
                         ^
/nix/store/xn1f3mbdnfymy1cw109axj1jnm85fksy-Blake3-c/Blake3.c:343:22: error: use of undeclared identifier 'l_Blake3_HasherPointed'; did you mean '_init_l_Blake3_HasherPointed'?
lean_mark_persistent(l_Blake3_HasherPointed);
                     ^~~~~~~~~~~~~~~~~~~~~~
                     _init_l_Blake3_HasherPointed
/nix/store/xn1f3mbdnfymy1cw109axj1jnm85fksy-Blake3-c/Blake3.c:342:26: note: '_init_l_Blake3_HasherPointed' declared here
l_Blake3_HasherPointed = _init_l_Blake3_HasherPointed();
                         ^
/nix/store/xn1f3mbdnfymy1cw109axj1jnm85fksy-Blake3-c/Blake3.c:352:1: error: use of undeclared identifier 'l_Blake3_initHasher'
l_Blake3_initHasher = _init_l_Blake3_initHasher();
^
/nix/store/xn1f3mbdnfymy1cw109axj1jnm85fksy-Blake3-c/Blake3.c:352:23: warning: implicit declaration of function '_init_l_Blake3_initHasher' is invalid in C99 [-Wimplicit-function-declaration]
l_Blake3_initHasher = _init_l_Blake3_initHasher();
                      ^
/nix/store/xn1f3mbdnfymy1cw109axj1jnm85fksy-Blake3-c/Blake3.c:353:22: error: use of undeclared identifier 'l_Blake3_initHasher'; did you mean '_init_l_Blake3_initHasher'?
lean_mark_persistent(l_Blake3_initHasher);
                     ^~~~~~~~~~~~~~~~~~~
                     _init_l_Blake3_initHasher
/nix/store/xn1f3mbdnfymy1cw109axj1jnm85fksy-Blake3-c/Blake3.c:352:23: note: '_init_l_Blake3_initHasher' declared here
l_Blake3_initHasher = _init_l_Blake3_initHasher();

It seems like PointedType assumes that the extern value is a function?

Anders Christiansen Sørby (Oct 21 2021 at 14:38):

In the generated code:

extern lean_object* blake3_hasher;
// ....
static lean_object* _init_l_Blake3_instInhabitedHasher___closed__1() {
_start:
{
lean_object* x_1;
x_1 = blake3_hasher();
return x_1;
}
}

Anders Christiansen Sørby (Oct 21 2021 at 14:40):

I must probably implement a lean_object in the c-shim to solve this. Any tips on a reference impl? io.cpp perhaps? But that is so big and complicated.

Mario Carneiro (Oct 21 2021 at 14:42):

It looks like it wants to call a function to initialize a new hasher. Why not just write one?

Mario Carneiro (Oct 21 2021 at 14:48):

You declared the hasher like this:

@[extern "blake3_hasher"]
constant HasherPointed : PointedType
def Hasher : Type := HasherPointed.type
instance : Inhabited Hasher := HasherPointed.val

I can't find any precedent in the lean 4 code base for putting an @[extern] on the pointed type itself. My interpretation of the error is that you should be putting the name of the initialization function in this extern declaration, because it will be called as a function. Given your shim code, I think this should be lean_blake3_initialize

Mario Carneiro (Oct 21 2021 at 14:51):

I don't think lean cares at all what the name of the C type is (blake3_hasher in this case). It only ever deals with lean_object* values wrapping a void*

Mac (Oct 21 2021 at 14:52):

Yeah, as @Mario Carneiro said, you should not be putting @[extern\ attributes on types.

Mac (Oct 21 2021 at 14:54):

Anders Christiansen Sørby said:

This is not how the lean source does it. It uses Mario's solution. Is this some new C/C++ feature?

The difference is that the Lean source has an initializer function that is called by the Lean runtime at startup that sets the global variable. Shims generally don't.

Anders Christiansen Sørby (Oct 21 2021 at 15:11):

Ok, I removed the extern and it helped. I also changed the signature of the constructor to

@[extern "lean_blake3_initialize"]
constant initHasher : Unit  Hasher

Which led it to be interpreted as a no argument function not a constant, but now I get this error:
For building the #executable

blake3> /nix/store/fp2h123793bxln4cjxy0mz08sf4nx2m3-binutils-2.35.1/bin/ld: read in flex scanner failed
blake3> clang-13: error: linker command failed with exit code 1 (use -v to see invocation)

and for building the #tests

Tests.HashString> lean: symbol lookup error: /nix/store/asz64i34j9hx9sj9ihgmqjy3qzxbf99r-Blake3.so/Blake3.so: undefined symbol: lean_blake3_version

but

$ nm -D /nix/store/asz64i34j9hx9sj9ihgmqjy3qzxbf99r-Blake3.so/Blake3.so | grep blake3
                 U l_blake3_hasher_finalize
                 U l_blake3_hasher_update
                 U lean_blake3_initialize
                 U lean_blake3_version

So what is going on?

Sebastian Ullrich (Oct 21 2021 at 15:14):

U means "undefined"

Sebastian Ullrich (Oct 21 2021 at 15:15):

I've never seen the first error before; I would guess that one of the inputs is not a valid object file/archive/...

Anders Christiansen Sørby (Oct 21 2021 at 16:05):

I had put the wrong input as a staticLib.
Now I get

> could not find native implementation of external declaration 'Blake3.internalVersion' (symbols 'l_Blake3_internalVersion___boxed' or 'l_Blake3_internalVersion')

for

@[extern "lean_blake3_version"]
constant internalVersion : Unit  String

This function symbol is defined in Blake3.so but the symbol from the BLAKE3 blake3_version is not.
What am I missing?

Mac (Oct 21 2021 at 16:07):

@Anders Christiansen Sørby looking at your flakes.nix, your project and tests setup seems suspect. I would expect them to look more like this:

project = leanPkgs.buildLeanPackage {
  inherit name;
  src = ./src;
  debug = true;
  staticLibDeps = [ blake3-shim, blake3-c ];
};
tests = leanPkgs.buildLeanPackage {
  name = "Tests";
  src = ./tests;
  debug = true;
  deps = [ project ];
  pluginDeps = [ project ];
};

Anders Christiansen Sørby (Oct 21 2021 at 16:08):

What is blake3 in this context?

Mac (Oct 21 2021 at 16:09):

@Anders Christiansen Sørby good point, fixed (it would be project)

Anders Christiansen Sørby (Oct 21 2021 at 16:18):

It didn't help. I've tried various variations of this:

        tests = leanPkgs.buildLeanPackage {
          name = "Tests";
          src = ./tests;
          debug = true;
          deps = [ project ];
          pluginDeps = [ project.sharedLib ];
          staticLibDeps = [ blake3-c.staticLib blake3-shim.staticLib project.staticLib ];
        };

Mac (Oct 21 2021 at 16:20):

@Anders Christiansen Sørby what errors are you getting?

Anders Christiansen Sørby (Oct 21 2021 at 16:20):

The same as above

Mac (Oct 21 2021 at 16:21):

Personally, I would have suggested building manually and getting that to work before translating it to Nix.

Mac (Oct 21 2021 at 16:21):

so that you have a better understanding of what is suppose to be happening

Anders Christiansen Sørby (Oct 21 2021 at 17:00):

A manual invocation like this:

LEAN_PATH=/nix/store/4wrg252zqyr33ayx7f2j2sdy793zh5hx-Tests.HashString-depRoot lean --plugin=/nix/store/s4la9xg9da5yixcxlhdnbflqzygkripl-Blake3.so/Blake3.so tests/Tests/HashString.lean

Produces the same error.

Sebastian Ullrich (Oct 21 2021 at 17:05):

And nm -g on the .so shows that symbol? Could you post the output?

Anders Christiansen Sørby (Oct 21 2021 at 17:15):

I guess not:

                 U blake3_hasher_update
                 U blake3_version
00000000000602e0 T initialize_Blake3
0000000000210648 B l_Blake3_BLAKE3__BLOCK__LEN
0000000000210650 B l_Blake3_BLAKE3__CHUNK__LEN
0000000000210640 B l_Blake3_BLAKE3__KEY__LEN
0000000000210658 B l_Blake3_BLAKE3__MAX__DEPTH
0000000000210628 B l_Blake3_BLAKE3__OUT__LEN
00000000000601f0 T l_Blake3_hash
                 U l_blake3_hasher_finalize
0000000000060160 T l_Blake3_hasherFinalize___boxed
0000000000210688 B l_Blake3_HasherPointed
                 U l_blake3_hasher_update
00000000000600f0 T l_Blake3_hasherUpdate___boxed
00000000000600c0 T l_Blake3_initHasher___boxed
0000000000210670 B l_Blake3_instBlake3HashToString
0000000000210610 B l_Blake3_instInhabitedBlake3Hash
0000000000210690 B l_Blake3_instInhabitedHasher
0000000000060090 T l_Blake3_internalVersion___boxed
000000000005ff90 T l_Blake3_unsafeIO_x27
000000000005fd80 T l_Blake3_unsafeIO_x27___rarg
00000000002106a0 B l_Blake3_version
00000000000613a0 T lean_blake3_hasher_update
0000000000061360 T lean_blake3_initialize
0000000000061340 T lean_blake3_version
00000000000601b0 T l_panic___at_Blake3_hash___spec__1

Mac (Oct 21 2021 at 17:33):

@Anders Christiansen Sørby I just managed to manually build and test your package. However, I need to be afk for an hour or less, so I can can't give the full details. A quick note though:

  • initHasher/hasherFinalize shoudl be externing lean_hasher_initialize and lean_hasher_finalize. And those need to be properly implemented.

Anders Christiansen Sørby (Oct 21 2021 at 17:44):

Hm. I need more details to do replicate that. Can you post your entire command and changes?

Mac (Oct 21 2021 at 18:33):

@Anders Christiansen Sørby my code is now available here: https://github.com/tydeu/lean-blake3

Anders Christiansen Sørby (Oct 22 2021 at 00:15):

Thanks for the help, but I still can't see what is missing in my nix build compared to your Makefile. Can you see it?

Anders Christiansen Sørby (Oct 25 2021 at 12:19):

I've included all the C code as static libs to the lean command where the Blake3 module is compiled and linked.
This leads to no errors, but the essential functions like blake3_version is not included (undefined) in either the shared or static lib. In the nix setup I see that this is included as a -Wl,--no-whole-archive option which is fine I guess, but I would want to be able to include this in the lib.

When I then try to compile the whole library as an executable and include the static libs as an input to the lean command I get this error:

       > lean: symbol lookup error: /nix/store/bbfzi1v8nb14q1kax5ig6h50wnkpsvx5-Blake3.so/Blake3.so: undefined symbol: blake3_version

Why doesn't it look for the symbol in the provided static libs (libblake3.a)?
The symbols from the shim are included however so why not libblake3.a?
How is this not a bug?

Sebastian Ullrich (Oct 25 2021 at 12:57):

I can't make sense of your description. lean doesn't even take static libraries as input, do you mean leanc? Do you have a flake target to reproduce?

Sebastian Ullrich (Oct 25 2021 at 13:29):

As for making native code available to the interpreter, --plugin is not the correct way to do that: https://leanprover.zulipchat.com/#narrow/stream/270676-lean4/topic/mwe.3A.20plugin.3F/near/252821607

Anders Christiansen Sørby (Oct 25 2021 at 13:33):

Yes, sorry, I mean leanc to generate an executable.It might be that I'm missing something obvious, but I haven't found it so far.
Build .#tests on https://github.com/yatima-inc/lean-blake3 to see the error.

Anders Christiansen Sørby (Oct 25 2021 at 14:26):

So I can fix this by adding the library to LD_PRELOAD? I guess I need to change the nix source a bit to do this.

Sebastian Ullrich (Oct 25 2021 at 14:44):

Yes, it doesn't have any support for that yet. But I was only able to reproduce the "could not find native implementation" error using nix build .#tests fwiw.

Anders Christiansen Sørby (Oct 25 2021 at 17:01):

Strange. Are you on the newest version?
Adding LD_PRELOAD doesn't seem to fix the issue so far.

Arthur Paulino (Nov 22 2021 at 02:49):

Is there a FFI documentation?

Mac (Nov 22 2021 at 04:17):

@Arthur Paulino sadly, no, not really.

Xubai Wang (Nov 22 2021 at 06:13):

Arthur Paulino said:

Is there a FFI documentation?

There is a simple one in the lean4 repo <https://github.com/leanprover/lean4/blob/master/doc/dev/ffi.md>

Arthur Paulino (Nov 22 2021 at 11:56):

Thanks! I've been using your socket package as a reference but it is a bit more time consuming trying to reverse engineer and deduct what's going on

Arthur Paulino (Nov 22 2021 at 22:06):

@Xubai Wang Why do you add the lean_obj_arg w parameter in your functions headers even though you never use w?
Example: https://github.com/xubaiw/lean4-socket/blob/4b95d1c912852db7e39c6d0f7715927fff6dfa0c/native/native.c#L288

Henrik Böving (Nov 22 2021 at 22:47):

The IO monad implicitly passes a so called world parameter into the function so the computation is technically speaking pure since an IO function takes a world and produces a new one where its side effect has happened so lean will pass this world argument to the C function as well. However the C function can essentially fully ignore it since it's ...well C and not a pure functional programming language.

Henrik Böving (Nov 22 2021 at 22:48):

@Arthur Paulino

Mac (Nov 23 2021 at 02:43):

It should also be noted that now that BaseIO exists, you should use it instead of IO in the Lean signature of your FFI binding if the native function does not produce IO errors.

Arthur Paulino (Nov 23 2021 at 03:03):

Do I need to update lake?

Mac (Nov 23 2021 at 05:01):

@Arthur Paulino for what?

Arthur Paulino (Nov 23 2021 at 12:33):

To use BaseIO

Arthur Paulino (Nov 23 2021 at 12:38):

I tried using Lean 4 nightly 2021-11-23 and it gave me access to BaseIO. But I could no longer compile my code because it couldn't link to MySQL lib with -lmysqlclient. Maybe it's a bug?

Mac (Nov 23 2021 at 12:51):

@Arthur Paulino what version of Lean were you using before?

Arthur Paulino (Nov 23 2021 at 13:00):

nightly 2021-10-20

Sebastian Ullrich (Nov 23 2021 at 13:02):

Now that the linker is bundled with Lean, it will not automatically search arbitrary system paths for libraries anymore. You can add them explicitly instead, e.g. -L /usr/lib/x86_64-linux-gnu/

Arthur Paulino (Nov 23 2021 at 13:17):

Sebastian Ullrich said:

Now that the linker is bundled with Lean, it will not automatically search arbitrary system paths for libraries anymore. You can add them explicitly instead, e.g. -L /usr/lib/x86_64-linux-gnu/

Is this what you mean?

def mySQLLinkArg := "-L /usr/lib/x86_64-linux-gnu/"

package LeanMySQL (pkgDir) {
  srcDir := leanSoureDir
  libRoots := #[`Ffi]
  moreLibTargets := #[cLibTarget pkgDir]
  moreLinkArgs := #[mySQLLinkArg]
}

This approach still doesn't work

Sebastian Ullrich (Nov 23 2021 at 13:19):

Where's your -lmysqlclient?

Arthur Paulino (Nov 23 2021 at 13:19):

$ find /usr/ -name 'libmysqlclient.a'
/usr/lib/x86_64-linux-gnu/libmysqlclient.a

Arthur Paulino (Nov 23 2021 at 13:21):

With nightly 2021-10-20 I was setting def mySQLLinkArg := "-lmysqlclient"

Sebastian Ullrich (Nov 23 2021 at 13:22):

You need both -l and -L

Arthur Paulino (Nov 23 2021 at 13:24):

Do you mean

package LeanMySQL (pkgDir) {
  srcDir := leanSoureDir
  libRoots := #[`Ffi]
  moreLibTargets := #[cLibTarget pkgDir]
  moreLinkArgs := #["-L /usr/lib/x86_64-linux-gnu/", "-lmysqlclient"]
}

?

Arthur Paulino (Nov 23 2021 at 13:25):

It says

error: ld.lld: error: unable to find library -lmysqlclient
clang: error: linker command failed with exit code 1 (use -v to see invocation)

Sebastian Ullrich (Nov 23 2021 at 13:26):

Does it work without the space after -L? If not, what's the output with "-v" added to that list?

Arthur Paulino (Nov 23 2021 at 13:38):

Sebastian Ullrich said:

Does it work without the space after -L? If not, what's the output with "-v" added to that list?

It worked. It also worked like this: moreLinkArgs := #["-L", "/usr/lib/x86_64-linux-gnu/", "-lmysqlclient"]

Arthur Paulino (Nov 23 2021 at 13:40):

Thanks!

Sebastian Ullrich (Nov 23 2021 at 13:42):

Great! I probably don't have to mention that we still need to find out and decide on the best practices in these regards :) . Explicitly specifying the search path might look like a chore, but it is also an explicit reminder that you now depend on something outside of Lean's/Lake's control, so it might not be all bad.

Gabriel Ebner (Nov 23 2021 at 13:48):

so it might not be all bad.

Putting -L /usr/lib/x86_64-linux-gnu/ into the linker arguments seems like the most non-portable solution though. It literally won't work on anything except ubuntu on amd64? AFAICT none of fedora/archlinux/nixos have that directory. Not to speak of windows or mac.

I've said it before, but I think we should use the system linker, at least when we're linking to system libraries.

Arthur Paulino (Nov 23 2021 at 13:54):

Idea:

package LeanMySQL (pkgDir) {
  srcDir := leanSoureDir
  libRoots := #[`Ffi]
  moreLibTargets := #[cLibTarget pkgDir]
  moreLinkLibs := #["/usr/lib/x86_64-linux-gnu/libmysqlclient.a"]
}

Arthur Paulino (Nov 23 2021 at 13:56):

It requires me to be precise and explicit about the libs I'm using. Then the linking can be done according to the system I'm using, but this would be abstracted (invisible) to me

Gabriel Ebner (Nov 23 2021 at 14:10):

On Linux the proper way to figure out include and linking flags is to use pkg-config. This is what pkg-config prints on two different systems (ubuntu and nixos):

$ pkg-config --libs --cflags mysqlclient
-I/usr/include/mysql -lmysqlclient
$ pkg-config --libs --cflags mysqlclient
-I/nix/store/3mzvn84d0wq5bvrq0rp4jxwlmprpqkbx-mariadb-connector-c-3.2.3-dev/include/mariadb/ -L/nix/store/3ca2s5k41m9w6mmk8h68ba0g2b2h1mzl-mariadb-connector-c-3.2.3/lib/mariadb/ -lmariadb

Note that it doesn't contain enough information to figure out the directory /usr/lib/x86_64-linux-gnu on ubuntu. I don't know if there's a portable way to do that. (And none of that would be an issue if we just used the system linker.)

Sebastian Ullrich (Nov 23 2021 at 14:43):

Hah, I did hope pkg-config would print out the full path. If we use the system linker, should we completely ignore $LEAN_SYSROOT/lib and take all dependencies from the system?

Gabriel Ebner (Nov 23 2021 at 14:46):

I think that's the best option. All libraries there should be available on every system (certainly once we've switched away from gmp). We still need to link from $LEAN_SYSROOT/lib/lean though.

Sebastian Ullrich (Nov 23 2021 at 14:47):

Except for libc++ probably :grimacing:

Gabriel Ebner (Nov 23 2021 at 14:48):

Oh!

Gabriel Ebner (Nov 23 2021 at 14:48):

I guess that's an argument for removing the C++ dependency. ducks

Sebastian Ullrich (Nov 23 2021 at 14:48):

Would make some things easier for sure

Sebastian Ullrich (Nov 23 2021 at 14:52):

Note that it's not an issue when linking dynamically, libc++ is completely hidden inside libleanshared.so in that case. Now if there was a convenient way to merge static archives...

Sebastian Ullrich (Nov 23 2021 at 14:58):

Alternatively, since this libc++.a is special-purposed for Lean, we put it in lib/lean and call it a day...

Mac (Nov 23 2021 at 17:29):

I have to disagree with @Gabriel Ebner, I do not think that using the system linking is the best idea here. The primary reason for this is that Lake wants to know the exact path of the library. The proper way of adding a native library to a package is (using @Arthur Paulino's package as an example) :

package LeanMySQL (pkgDir) {
  srcDir := leanSoureDir
  libRoots := #[`Ffi]
  moreLibTargets := #[cLibTarget pkgDir, inputFileTarget "path/to/libmysqlclient.a"]
}

This way Lake generates a trace for the library, allowing it to rebuilding the project if the version of the library (e.g., MySQL) changes and supporting the result being used in content-addressed cloud builds.

Gabriel Ebner (Nov 23 2021 at 17:32):

That makes sense if lake builds the library, but if you link against system libraries (like mysqlclient) then you typically 1) only have shared libraries, 2) you don't know the path the library, and 3) there's also the headers which need to be tracked as well.

Mac (Nov 23 2021 at 17:41):

@Gabriel Ebner It should also be true if Lake did not build the library (and in many cases, ideally, lake would be building the library if possible). Lake, like Nix, is designed (and expects) for its traces to properly represent the change of dependencies (all dependencies). To your specific points:

  1. So? You can trace shared libraries as well.
  2. Hence my point for avoiding system libraries. Use a library you know the path to. Or use tools to figure it out -- like llvm-config for LLVM or more general (but OS specific tools) like brew.
  3. Yes, include directories should also be traced. I am planning on expanding the current oFileTarget to also take an includeDirs argument that uses targets for include directories. However, in some cases (like with Lean) there may be a better proxy for this trace than just the hash of the whole include directory (e.g., Lean's githash).

Arthur Paulino (Nov 23 2021 at 17:50):

There's also the fact that if lake requires specific paths (to headers and to their respective libs), at least it's completely predictable and clear. It also gives the developer more freedom (with the cost of having to know where things are).

Arthur Paulino (Nov 23 2021 at 17:57):

Hm, one problem with this approach of not relying on default system parameters is that the package config would end up being specific to my machine. One wouldn't always be able to clone my repo and just compile it out of the box without making sure that every path is correct in the configuration file. I suppose this is a big con?

Gabriel Ebner (Nov 23 2021 at 17:58):

One wouldn't always be able to clone my repo and just compile it out of the box without making sure that every path is correct in the configuration file. I suppose this is a big con?

Yes, this is why I'm arguing against hardcoding the /usr/lib/x86_64-linux-gnu path and in favor of using pkg-config.

Kyle Miller (Nov 23 2021 at 18:06):

How do other build systems solve this problem? If it were like how you build software on Linux, there could be a "./configure" that creates a module containing variables with system-specific paths that the Lakefile then makes use of. That configure script could use pkg-config if it wants.

Gabriel Ebner (Nov 23 2021 at 18:08):

in many cases, ideally, lake would be building the library if possible

Emphasis on many. Some stuff like libzstd e.g. can reasonably be built by lake. Other libraries must absolutely never be vendored, like e.g. fontconfig (the rust ecosystem gets this horribly wrong, resulting in silently broken builds if the vendored fontconfig is used: https://github.com/servo/libfontconfig/issues/58).

Gabriel Ebner (Nov 23 2021 at 18:12):

How do other build systems solve this problem?

Typically by just calling pkg-config during build on linux on mac (I think), and hoping for the best on windows. You can (and probably should) do the same in your lakefile. The issue we have here in Lean is that since two days ago, Lean uses a bundled linker which doesn't find system libraries.

Sebastian Ullrich (Nov 23 2021 at 18:15):

Mac said:

You can trace shared libraries as well.

To reliably trace shared libraries, you would have to call, or implement the equivalent of, ldd to find transitive dependencies, which I hope we're not planning to do. That together with the path setup problem, which does not seem to have a sane solution on Linux except for Nix, strongly suggests to me that we should not worry about tracing system files.

Kyle Miller (Nov 23 2021 at 18:15):

Gabriel Ebner said:

How do other build systems solve this problem?

Typically by just calling pkg-config during build on linux on mac (I think)

If you're using autoconf, I think traditionally pkg-config is used in the ./configure script, which generates the Makefile loaded with the system-specific paths. Lean packages could rely on something similar, where you have to run a script to create either the Lakefile or some file that the Lakefile includes (if that's possible).

Sebastian Ullrich (Nov 23 2021 at 18:16):

Btw, it should be possible to "unbundle" the bundled linker using -Wl,--sysroot=/ (untested)

Gabriel Ebner (Nov 23 2021 at 18:16):

You can trace shared libraries as well. [..] include directories should also be traced.

This seems like a reasonable idea. My complaint was about passing the filename to the linker, not rebuilding when dependencies change. Even more important than changes in the system library headers are imho changes in headers in the project itself. It might make sense to call cc -M to figure these out.

Gabriel Ebner (Nov 23 2021 at 18:26):

As a point of reference, bazel doesn't track system libraries either: https://github.com/bazelbuild/bazel/issues/4558 (I don't know what the rationale was.)

Gabriel Ebner (Nov 23 2021 at 18:38):

supporting the result being used in content-addressed cloud builds.

The following is a bit biased, and I am putting my mathlib hat on. It's perfectly okay if using system libraries disables cloud caches (for those targets). Other systems are going to have different builds of these libraries, and so the cache probably misses anyhow.

(Note that most ffi stuff will probably depend on libc headers, which are system-specific as well.)

On the other hand it is absolutely crucial that we can be impure and use caches that are built on a different architecture. That is, cache oleans and c files built on Linux/amd64 and use them on Darwin/amd64, Windows/amd64, Linux/aarch64, etc. (and maybe even Unknown/wasm)

If this doesn't work with lake, we can continue to use the current leanproject approach of "upload a tarball with the oleans for every revision". But it would be cool to have more fine-grained caching.

Reid Barton (Nov 23 2021 at 19:07):

What was the motivation for bundling a linker?

Mac (Nov 23 2021 at 19:30):

Gabriel Ebner said:

supporting the result being used in content-addressed cloud builds.

The following is a bit biased, and I am putting my mathlib hat on. It's perfectly okay if using system libraries disables cloud caches (for those targets). Other systems are going to have different builds of these libraries, and so the cache probably misses anyhow.

(Note that most ffi stuff will probably depend on libc headers, which are system-specific as well.)

My point here is that, by tracing the system libraries, the resulting trace would be different for each system configuration so the cloud store can be used -- different system would end up pulling the cached artifact that depends on their system-specific files. Without such components in trace, the trace for a given dependent would appear the same across platforms resulting in invalid cache fetches. This makes cloud storage infeasible for such builds.

Gabriel Ebner said:

On the other hand it is absolutely crucial that we can be impure and use caches that are built on a different architecture. That is, cache oleans and c files built on Linux/amd64 and use them on Darwin/amd64, Windows/amd64, Linux/aarch64, etc. (and maybe even Unknown/wasm)

If this doesn't work with lake, we can continue to use the current leanproject approach of "upload a tarball with the oleans for every revision". But it would be cool to have more fine-grained caching.

This is already posisble with Lake because olean traces are just based on the Lean version and the hash of the Lean input files.

Arthur Paulino (Nov 24 2021 at 03:57):

Another issue with Lean 4 nightly 2021-11-23: I'm not able to use the string lib.

#include <string>
using namespace std;
string e = "qqq";

Causes error: ld.lld: error: undefined symbol: std::allocator<char>::allocator()

PS: It works on nightly 2021-10-20

Arthur Paulino (Nov 24 2021 at 04:12):

Probably another lib I need to link against, but I don't know which one

Sebastian Ullrich (Nov 24 2021 at 11:05):

C++ code should always be linked using the C++ compiler, i.e. not leanc. You should also link to Lean dynamically in this case (https://github.com/leanprover/lean4/blob/f7decd2d464ab679cb73212fb9674aeb92bfe184/tests/compiler/foreign/Makefile#L14-L15).

Sebastian Ullrich (Nov 24 2021 at 11:08):

Reid Barton said:

What was the motivation for bundling a linker?

See https://github.com/leanprover/lean4/pull/733#issuecomment-948426757

Arthur Paulino (Nov 24 2021 at 13:06):

@Sebastian Ullrich Sorry I didn't get it. What should I change in the lakefile.lean file?

Arthur Paulino (Nov 24 2021 at 13:44):

I will be using the 2021-10-20 version in the meantime

Sebastian Ullrich (Nov 24 2021 at 13:58):

Yeah, this thread is about figuring out what Lake should be doing with foreign dependencies :) . That leanc used to be able to correctly compile & link C++ code should be regarded as an unplanned coincidence.

Arthur Paulino (Nov 24 2021 at 14:21):

Is there an example of a lakefile that successfully compiles and links C++ code?

Mac (Nov 24 2021 at 15:36):

@Arthur Paulino Lake has an FFI example in its repository that does exactly that. :smile:

Anders Christiansen Sørby (Nov 24 2021 at 15:59):

@Arthur Paulino You could also try to use nix for building and distributing your software. I find this the most universal and plug and play solution for building anything although unorthodox. This is especially true for FFI and needing to compile several languages or ensure build stability.
As an example you can use this https://github.com/yatima-inc/lean-blake3 which currently points to a PR fork of lean4, but which hopefully will be merged soon.

Arthur Paulino (Nov 24 2021 at 16:22):

@Mac I'm using that as a reference but it doesn't work. Here's the lakefile: https://github.com/arthurpaulino/LeanMySQL/blob/master/lakefile.lean

Mac (Nov 24 2021 at 16:23):

@Arthur Paulino can you elaborate as to the error you are encountering?

Arthur Paulino (Nov 24 2021 at 16:27):

In short, everything was working fine until I started to make use of string in my C++ code. Now lake build reports a lot of errors like this:

error: ld.lld: error: undefined symbol: std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::data() const
>>> referenced by ffi.cpp
>>>               ffi.o:(query_all(mysql*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)) in archive ./build/cpp/libffi.a

ld.lld: error: undefined symbol: std::allocator<char>::allocator()
>>> referenced by ffi.cpp
>>>               ffi.o:(lean_mysql_manage_db(lean_object*, lean_object*, char const*)) in archive ./build/cpp/libffi.a

ld.lld: error: undefined symbol: std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(char const*, std::allocator<char> const&)
>>> referenced by ffi.cpp
>>>               ffi.o:(lean_mysql_manage_db(lean_object*, lean_object*, char const*)) in archive ./build/cpp/libffi.a

Mac (Nov 24 2021 at 16:30):

@Arthur Paulino my guess would be that you need to also link your executable to the C++ standard library then (e.g., via -lstdc++ or something like that I believe).

Arthur Paulino (Nov 24 2021 at 17:32):

Mac said:

Arthur Paulino my guess would be that you need to also link your executable to the C++ standard library then (e.g., via -lstdc++ or something like that I believe).

Do you mean include "-lstdc++" in the moreLinkArgs array? I tried it but it still doesn't work

Arthur Paulino (Nov 24 2021 at 19:39):

Nevermind, I dropped the C++ code snippets

Sebastian Ullrich (Nov 27 2021 at 11:37):

So, to consolidate the discussion up to this point, the primary issues with the new leanc seem to be:

  • system include paths. If oFileTarget is supposed to be used for FFI wrapper C code, it should use the system compiler (as it already does) but also add the Lean include path (or getCFlags, the only additional flag there is -fPIC)
  • system library paths. We could switch to the system linker (via cc) either implicitly when moreLinkArgs is specified or explicitly via some new flags like useSystemLibraries := true. For consistency I'd prefer using leanc with -Wl,--sysroot=/ though, if that works. Unclear what that should look like on Windows, but that is also true for the "system linker".

Does that sound about right?

Sebastian Ullrich (Nov 27 2021 at 11:40):

  • C++ is complicated in general and should probably just be avoided for simple FFI wrapper code

Arthur Paulino (Nov 27 2021 at 15:17):

I agree although I think making C++ more accessible in the future can attract more devs

Arthur Paulino (Nov 27 2021 at 15:25):

I'm trying to parse a string (?) that was returned from my C code:

private constant parse (s : IO String) : Table := do
  let ss : String := s
  ⟨[],[]⟩

But I'm getting the error:

type mismatch
  s
has type
  IO String : Type
but is expected to have type
  String : Type

Which makes sense. But how can I get the string itself from s?

Tomas Skrivan (Nov 27 2021 at 15:47):

Does let ss : String <- s work?

Arthur Paulino (Nov 27 2021 at 15:48):

No, it says

`←` and `<-` are not allowed in pure `do` blocks, i.e., blocks where Lean implicitly used the `Id` monad

Tomas Skrivan (Nov 27 2021 at 15:50):

Ohh I missed that the return type is not IO Table. I don't think you can extract the value then.

Arthur Paulino (Nov 27 2021 at 15:51):

I tried making the return IO Table but it says

application type mismatch
  IO Table
argument
  Table
has type
  Type 1 : Type 2
but is expected to have type
  Type : Type 1

Tomas Skrivan (Nov 27 2021 at 15:55):

Hmm interested that IO is restricted to Type and is not defined for Type u. I have very little understanding of universes, so I'm can't help here.

Tomas Skrivan (Nov 27 2021 at 15:56):

Is Table a type you have defined? Can't you make it Type instead of Type 1?

Arthur Paulino (Nov 27 2021 at 15:57):

This is Table:

structure Table where
  types : List Type
  rows : List Row
  deriving Inhabited

Arthur Paulino (Nov 27 2021 at 15:58):

Tomas Skrivan said:

Is Table a type you have defined? Can't you make it Type instead of Type 1?

How can I do that?

Henrik Böving (Nov 27 2021 at 16:00):

You cannot in this case, your Table contains a List of Type so it has to be one universe higher.

Tomas Skrivan (Nov 27 2021 at 16:01):

What about?

structure Table (types : List Type) where
  rows : List Row
  deriving Inhabited

But that might make all manipulations very difficult.

Henrik Böving (Nov 27 2021 at 16:01):

That would work yes.

Tomas Skrivan (Nov 27 2021 at 16:04):

Or fetching types on the fly with typeclasses

structure Table where
  rows : List Row
  deriving Inhabited

class TableTypes (t : Table) where
   types : List Type

I have no clue how are you using Table so both options might not be feasible/useful.

Reid Barton (Nov 27 2021 at 16:05):

Or if Table is encoding some external database, then you probably want types to encode the external "types" of its fields, which are not Lean types anyways

Tomas Skrivan (Nov 27 2021 at 16:06):

Very good point! You probably do not have a database filled with arbitrary lean types.

Tomas Skrivan (Nov 27 2021 at 16:10):

So something like

inductive DatabaseType where
 | tagged : String -> DatabaseType
 | indexed : Nat -> DatabaseType
 | ...

-- To get Lean Type equivalent
def DatabaseType.toType : DatabaseType -> Type :=
 | tagged str => ...
 | indexed id => ...
 | ...

structure Table where
  types : List DatabaseType
  rows : List Row
  deriving Inhabited

Henrik Böving (Nov 27 2021 at 16:14):

Also regarding this parsing function, you:

  • Probably don't need it to be a constant, a def should be fine
  • Should probably not take an IO String but instead just a string and then have a function that fetches the string via C and applies the parser to it, this way you don't have to bother with all the IO monad stuff in the parser function
  • If the parser might fail probably want to use the Except type as a return value.

Arthur Paulino (Dec 07 2021 at 19:37):

I'm trying to build this function that returns an Array Float to Lean. In C, I have this arr->data, which is an array of double. The following code already returns an empty array, but how can I add data to it?
P.S.: more precisely, data from arr->data?

typedef struct nl_array {
    double*    data = NULL;
    uint64_t length = -1;
} nl_array;

extern "C" LEAN_EXPORT lean_obj_res lean_nl_array_to_lean_array(b_lean_obj_arg arr_) {
    nl_array* arr = (nl_array*) nl_array_unbox(arr_);
    lean_obj_res res = lean_alloc_sarray(sizeof(double), 0, arr->length);
    return res;
}

Henrik Böving (Dec 07 2021 at 19:43):

Maybe you could do a memcpy from your data pointer to the return value of lean_sarray_cptr? Although it returns a uint8_t* so maybe it is intended to be used another way...

Arthur Paulino (Dec 07 2021 at 19:46):

Right, I've been looking at these, trying to figure out how to use them:

LEAN_SHARED lean_obj_res lean_float_array_push(lean_obj_arg a, double d);

static inline lean_obj_res lean_float_array_uset(lean_obj_arg a, size_t i, double d) {
    lean_obj_res r;
    if (lean_is_exclusive(a)) r = a;
    else r = lean_copy_float_array(a);
    double * it = lean_float_array_cptr(r) + i;
    *it = d;
    return r;
}

static inline lean_obj_res lean_float_array_fset(lean_obj_arg a, b_lean_obj_arg i, double d) {
    return lean_float_array_uset(a, lean_unbox(i), d);
}

static inline lean_obj_res lean_float_array_set(lean_obj_arg a, b_lean_obj_arg i, double d) {
    if (!lean_is_scalar(i)) {
        return a;
    } else {
        size_t idx = lean_unbox(i);
        if (idx >= lean_sarray_size(a)) {
            return a;
        } else {
            return lean_float_array_uset(a, idx, d);
        }
    }
}

But all tricks I've tried have failed so far

Arthur Paulino (Dec 07 2021 at 19:52):

I've tried, for instance, return lean_float_array_push(ret, 1.0);, but it's causing a segmentation fault error (I was expecting a #[1.0] in Lean)

Arthur Paulino (Dec 07 2021 at 20:33):

In the worst case scenario I could return a string and parse it in Lean, but I really want to avoid this approach this time :sweat_smile:

Sebastian Ullrich (Dec 07 2021 at 20:35):

Henrik Böving said:

Maybe you could do a memcpy from your data pointer to the return value of lean_sarray_cptr? Although it returns a uint8_t* so maybe it is intended to be used another way...

There's lean_float_array_cptr, as shown in the snippet below

Arthur Paulino (Dec 07 2021 at 23:42):

Still causing a segmentation fault

#define external extern "C" LEAN_EXPORT
#define l_arg b_lean_obj_arg
#define l_res lean_obj_res
#define l_obj lean_object

external l_res lean_nl_array_to_lean_array(l_arg arr_) {
    nl_array* arr = (nl_array*) nl_array_unbox(arr_);
    l_res ret = lean_alloc_sarray(sizeof(double), arr->length, arr->length);
    double* destination = lean_float_array_cptr(ret);
    memcpy(destination, arr->data, arr->length * sizeof(double));
    return ret;
}

Can someone spot what I'm doing wrong?

Sebastian Ullrich (Dec 08 2021 at 08:50):

That part looks good at a first glance. With gdb on a executable with debug symbols, it's usually not too hard to pin-point the source of a segfault (at the very least, the stack trace).

Arthur Paulino (Dec 08 2021 at 11:19):

I've changed my approach but I've encountered a similar issue and the problem was that I had to encapsulate my return with lean_io_result_mk_ok because of the expected return on the Lean side

Arthur Paulino (Dec 08 2021 at 18:09):

Ah, I tried it again and it didn't work. But I'm noticing something strange:

/* Scalar arrays */
typedef struct {
    lean_object   m_header;
    size_t        m_size;
    size_t        m_capacity;
    uint8_t       m_data[0];
} lean_sarray_object;

m_data is a pointer to a 8 bytes memory cell. This function indeed returns such a pointer:

static inline uint8_t* lean_sarray_cptr(lean_object * o) {
    return lean_to_sarray(o)->m_data;
}

But this is the puzzling part: it's casting a uint8_t* into a double*:

static inline double * lean_float_array_cptr(b_lean_obj_arg a) {
    return (double*)(lean_sarray_cptr(a)); // NOLINT
}

Maybe it can cause some trouble?

Arthur Paulino (Dec 08 2021 at 18:29):

I think the memcpy from double* to (actually) uint8_t* is trampling those bytes with nonsense

Anders Christiansen Sørby (Jan 21 2022 at 23:40):

I did a hack to be able to return a tuple of values from FFI. Maybe this is useful to more people:

/*
SDL.pollEvent : IO $ Prod Bool SDL_Event
*/
lean_obj_res lean_sdl_poll_event() {
  SDL_Event *event;
  int b = SDL_PollEvent(event);
  lean_object* e = lean_alloc_external(get_sdl_event_class(), event);
  // Constructs a (Bool, SDL_Event) tuple
  lean_object* tuple = lean_alloc_ctor(0, 2, 0);
  lean_ctor_set(tuple, 0, lean_box(b));
  lean_ctor_set(tuple, 1, e);
  return lean_io_result_mk_ok(tuple);
}

I guess you could use lean_alloc_ctor to create arbitrary (type erased and unsafe) inductive values given that you know how the underlying structure and order of the constructors of the type you wish to return.

Wojciech Nawrocki (Jan 21 2022 at 23:55):

This is how the runtime does things also, so I would say this is "the right way" rather than a "hack" in the sense of a workaround.

Mac (Jan 22 2022 at 00:00):

Wojciech is correct, using lean_alloc_ctor/lean_ctor_set is exactly how you are suppose to create values of (boxed, non-primitive) inductive types using the Lean C API.

Anders Christiansen Sørby (Jan 28 2022 at 15:46):

I've started making several utility C functions for manipulating and creating erased/primitive lean_objects*. Is there any interest in adding these to the lean4 repo?

#include <lean/lean.h>

/**
 * Unwrap an Option of a lean_object* as data for some
 * or NULL (0) for none. Unsafe.
 */
static inline lean_object *lean_option_unwrap(b_lean_obj_arg a) {
  if (lean_is_scalar(a)) {
    return 0;
  } else {
    lean_object *some_val = lean_ctor_get(a, 0);
    return some_val;
  }
}

inline static void foreach_noop(void *mod, b_lean_obj_arg fn) {}

/**
 * Option.some a
 */
static inline lean_object * lean_mk_option_some(lean_object * a) {
  lean_object* tuple = lean_alloc_ctor(1, 1, 0);
  lean_ctor_set(tuple, 0, a);
  return tuple;
}

/**
 * Option.none.
 * Note that this is the same value for Unit and other constant constructors of inductives.
 */
static inline lean_object * lean_mk_option_none() {
  return lean_box(0);
}

static inline lean_object * lean_mk_tuple2(lean_object * a, lean_object * b) {
  lean_object* tuple = lean_alloc_ctor(0, 2, 0);
  lean_ctor_set(tuple, 0, a);
  lean_ctor_set(tuple, 1, b);
  return tuple;
}

z battleman (May 21 2022 at 23:43):

Does anyone know how I can point Clang in the direction of lean/lean.h so I can use it as a language server?

Henrik Böving (May 21 2022 at 23:57):

When working with lean C code you usually want to work with the leanc wrapper around clang that does all of this for you, you can use leanc --print-cflags for it to tell you the flags it uses which does include the include paths for what you're looking for.In my case:

λ leanc --print-cflags
-I /home/nix/.elan/toolchains/leanprover--lean4---nightly/include -fPIC -fvisibility=hidden

z battleman (May 22 2022 at 00:09):

Henrik Böving said:

When working with lean C code you usually want to work with the leanc wrapper around clang that does all of this for you, you can use leanc --print-cflags for it to tell you the flags it uses which does include the include paths for what you're looking for.In my case:

λ leanc --print-cflags
-I /home/nix/.elan/toolchains/leanprover--lean4---nightly/include -fPIC -fvisibility=hidden

Ah thanks so much!

z battleman (May 22 2022 at 15:31):

in general is there any documentation? All these functions and structs and I'm struggling to figure out how it all fits together :sweat_smile:

Arthur Paulino (May 22 2022 at 20:55):

@z battleman not yet, I believe. The current focus is on porting mathlib to Lean 4. In the meantime, we use each other's ideas and projects as references. If we put together what everyone has already been able to learn and achieve so far, we have a decent library of shared knowledge. So asking things on Zulip is still the best way to figure things out about FFI for now.

James Gallicchio (May 23 2022 at 04:49):

Should someone PR an update to the manual, just adding more detail about the ABI? I'd be happy to do it this week if nobody else can

James Gallicchio (May 23 2022 at 04:51):

(And I was working with @z battleman earlier, so I can maybe pull from that conversation plus the threads on here to figure out what information to include)

Arthur Paulino (Sep 20 2022 at 20:15):

Is there some up to date repo that uses a somewhat recent toolchain (less than 1 month old) and has the setup to compile C code and use the FFI? Please share :pray:

James Gallicchio (Sep 20 2022 at 20:16):

LeanColls is pretty up to date :)

James Gallicchio (Sep 20 2022 at 20:17):

I think doc-gen4 doesn't work on newer versions of Lean because Lake is more than a month out of date, but everything else is working okay

Henrik Böving (Sep 20 2022 at 21:05):

Its broken for 2 reasons, a) I'm focusing more attention on fixing the code generator right now and b) And after checking it out briefly just now indeed that the lake in leanprover/lake seems to be out of date? I see that the leanprover/lean4 repo has some commits that seem to address this? But I don't see where those are stored? According to github: "This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository." So unless someone could happen to point me at a working Lake that I don't have to swap out again in a few days/weeks because we switched back to the main repo yet again I guess doc-gen will remain broken for now :(

James Gallicchio (Sep 20 2022 at 21:13):

Yeah, I was looking into what it would take to update doc-gen4 and realized Lake was blocking it

James Gallicchio (Sep 20 2022 at 21:14):

There's a PR that fixes what is needed to update lake

James Gallicchio (Sep 20 2022 at 21:14):

https://github.com/leanprover/lake/pull/123 this one

James Gallicchio (Sep 20 2022 at 21:15):

anyways, this probably should be in a different thread :p

Mario Carneiro (Sep 20 2022 at 21:16):

It's blocked on some style bikeshedding in https://github.com/leanprover/lake/pull/122

Mac (Sep 20 2022 at 22:51):

The maintenance PRs have been merged. In general, if something in Lake is causing a road block. @ mention or PM me about it and I will prioritize getting it fixed. I am currently deep in work on my PhD thesis, so I don't read the Zulip as much as I used to, but I do check my email (which notifies me of @ mentions or PMs) and can usually find the time fix things relatively quickly when necessary.

James Gallicchio (Sep 20 2022 at 22:56):

Will do!

Arthur Paulino (Sep 21 2022 at 00:08):

Hi! I'm trying to make lake build succeed on this branch: https://github.com/yatima-inc/OpenSSL.lean/tree/ap/update-toolchain

I'm getting the following error:

~/dev/lean/OpenSSL.lean λ lake build
Linking libffi.so
error: > /home/arthur/.elan/toolchains/leanprover--lean4---nightly-2022-09-11/bin/leanc -shared -o ./build/lib/libffi.so -Wl,--whole-archive ./build/lib/libffi.a -Wl,--no-whole-archive -L -lssl -fPIC
error: stderr:
ld.lld: error: relocation R_X86_64_PC32 cannot be used against symbol 'stderr'; recompile with -fPIC
>>> defined in /home/arthur/.elan/toolchains/leanprover--lean4---nightly-2022-09-11/lib/glibc/libc.so
>>> referenced by native.c
>>>               ffi.o:(handle_error) in archive ./build/lib/libffi.a

ld.lld: error: relocation R_X86_64_PC32 cannot be used against symbol 'stderr'; recompile with -fPIC
>>> defined in /home/arthur/.elan/toolchains/leanprover--lean4---nightly-2022-09-11/lib/glibc/libc.so
>>> referenced by native.c
>>>               ffi.o:(handle_error) in archive ./build/lib/libffi.a
clang: error: linker command failed with exit code 1 (use -v to see invocation)
error: external command `/home/arthur/.elan/toolchains/leanprover--lean4---nightly-2022-09-11/bin/leanc` exited with code 1

Help is appeciated!

Mauricio Collares (Sep 21 2022 at 00:14):

I think you're passing -fPIC to the linker instead of passing it to the compiler

Arthur Paulino (Sep 21 2022 at 00:18):

How can I pass it to the compiler?

Mauricio Collares (Sep 21 2022 at 00:19):

There's a moreLeancArgs option, but I haven't tested it

Arthur Paulino (Sep 21 2022 at 00:19):

I've already tried moreLeancArgs := #["-fPIC"] and it gets to the same error :sad:

Mac (Sep 21 2022 at 00:58):

@Arthur Paulino you need to add "-fPIC" to the flags of your ffi.o target.

Arthur Paulino (Sep 21 2022 at 01:00):

Oh wow, I had tried that but it didn't work because I hadn't called lake clean first. Doing lake clean and then lake build worked

Sebastian Ullrich (Sep 21 2022 at 08:27):

Please file an issue on that!

Arthur Paulino (Sep 21 2022 at 11:08):

@Sebastian Ullrich do you mean an issue in the Lake repo?

Sebastian Ullrich (Sep 21 2022 at 11:10):

Yes. This is not desirable behavior, is it?

Sebastian Ullrich (Sep 21 2022 at 11:10):

Not rebuilding the library on flag changes, that is

Arthur Paulino (Sep 21 2022 at 13:01):

https://github.com/leanprover/lake/issues/126 :+1:

James Gallicchio (Nov 02 2022 at 08:26):

Arthur Paulino said:

In short, everything was working fine until I started to make use of string in my C++ code. Now lake build reports a lot of errors like this:

error: ld.lld: error: undefined symbol: std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::data() const
>>> referenced by ffi.cpp
>>>               ffi.o:(query_all(mysql*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)) in archive ./build/cpp/libffi.a

ld.lld: error: undefined symbol: std::allocator<char>::allocator()
>>> referenced by ffi.cpp
>>>               ffi.o:(lean_mysql_manage_db(lean_object*, lean_object*, char const*)) in archive ./build/cpp/libffi.a

ld.lld: error: undefined symbol: std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(char const*, std::allocator<char> const&)
>>> referenced by ffi.cpp
>>>               ffi.o:(lean_mysql_manage_db(lean_object*, lean_object*, char const*)) in archive ./build/cpp/libffi.a

was a resolution to this ever found? i'm currently trying to link against CaDiCaL, which is written in c++, but am getting a bunch of linking errors that look like they're from libstdc++ :(

James Gallicchio (Nov 02 2022 at 08:37):

actually, okay, switching the compiler to g++ and including an explicit -lstdc++ linker flag pushes the errors down to linker complaints about -fPIC:

[james] eternity2 $ lake build --verbose
info: Found dependency `std`
Using `iterators` at `43bbbd68a617a0689d4576b8c26f64209f59cc95`
Compiling cadical_ffi.c
> g++ -c -o ./build/c/leancadical.o ./ffi/cadical_ffi.c -I /home/james/.elan/toolchains/leanprover--lean4---nightly-2022-10-27/include -I /home/james/Projects/cadical/src -O3 -fPIC
Creating libleancadical.a
> /home/james/.elan/toolchains/leanprover--lean4---nightly-2022-10-27/bin/llvm-ar rcs ./build/lib/libleancadical.a ./build/c/leancadical.o
Linking libleancadical.so
error: > /home/james/.elan/toolchains/leanprover--lean4---nightly-2022-10-27/bin/leanc -shared -o ./build/lib/libleancadical.so -Wl,--whole-archive ./build/lib/libleancadical.a -Wl,--no-whole-archive -L /home/james/Projects/cadical/build -l cadical -l stdc++
error: stderr:
ld.lld: error: relocation R_X86_64_PC32 cannot be used against symbol 'stderr'; recompile with -fPIC
>>> defined in /home/james/.elan/toolchains/leanprover--lean4---nightly-2022-10-27/lib/glibc/libc.so
>>> referenced by solver.cpp
>>>               solver.o:(CaDiCaL::Solver::~Solver()) in archive /home/james/Projects/cadical/build/libcadical.a
...

James Gallicchio (Nov 02 2022 at 08:43):

lakefile: https://github.com/JamesGallicchio/eternity2/blob/cadical-ffi/lakefile.lean

James Gallicchio (Nov 02 2022 at 08:43):

if anyone has suggestions for potential fixes, let me know!

James Gallicchio (Nov 02 2022 at 08:48):

Oh I just realized Arthur was posting about the same issue

James Gallicchio (Nov 02 2022 at 09:00):

I now understand that the issue was compiling cadical without -fPIC :big_smile: seems to work now

James Gallicchio (Nov 02 2022 at 09:20):

Okay, now I'm confused again... For some reason my lean file with the extern'd functions is not linking against the shared library :thinking:

James Gallicchio (Nov 02 2022 at 17:02):

Okay, having slept, it seems like the shim file's function names are being mangled by g++. I'm not sure why it's mangling the names at all, given that the file is a .c file, but... :thinking: investigating

James Gallicchio (Nov 02 2022 at 21:16):

After playing around with more combinations of settings I have somehow gotten stuck in an even more cryptic linking error:

info: Found dependency `std`
Using `iterators` at `43bbbd68a617a0689d4576b8c26f64209f59cc95`
Compiling cadical_ffi.c
> cc -c -o ./build/c/leancadical.o ./ffi/cadical_ffi.c -I /home/james/.elan/toolchains/leanprover--lean4---nightly-2022-10-27/include -I /home/james/Projects/cadical/src -O3 -fPIC
Creating libleancadical.a
> /home/james/.elan/toolchains/leanprover--lean4---nightly-2022-10-27/bin/llvm-ar rcs ./build/lib/libleancadical.a ./build/c/leancadical.o
Linking libleancadical.so
> /home/james/.elan/toolchains/leanprover--lean4---nightly-2022-10-27/bin/leanc -shared -o ./build/lib/libleancadical.so -Wl,--whole-archive ./build/lib/libleancadical.a -Wl,--no-whole-archive -L /home/james/Projects/cadical/build -l cadical
Building Eternity2.AuxDefs
error: > LEAN_PATH=./build/lib:./lean_packages/std/build/lib LD_LIBRARY_PATH=/home/james/.elan/toolchains/leanprover--lean4---nightly-2022-10-27/lib:./build/lib:./build/lib /home/james/.elan/toolchains/leanprover--lean4---nightly-2022-10-27/bin/lean ./././Eternity2/AuxDefs.lean -R ././. -o ./build/lib/Eternity2/AuxDefs.olean -i ./build/lib/Eternity2/AuxDefs.ilean -c ./build/ir/Eternity2/AuxDefs.c --load-dynlib=libleancadical.so
error: stderr:
libc++abi: terminating due to uncaught exception of type lean::exception: error loading library, ./build/lib/libleancadical.so: undefined symbol: __gxx_personality_v0
error: external command `/home/james/.elan/toolchains/leanprover--lean4---nightly-2022-10-27/bin/lean` exited with code 134

Has anyone seen this error before? It's being raised by Lean itself D:

Sebastian Ullrich (Nov 02 2022 at 21:30):

You could try linking the dynamic library using -Wl,--no-undefined, which might give a better error message if the symbol is already undefined at that stage

James Gallicchio (Nov 03 2022 at 04:24):

That was very helpful, actually! I got cadical building with clang, which is a step in the right direction, and now the linker error seems to be just to do with Lean stuff

Compiling cadical_ffi.c
> clang -c -o ./build/c/leancadical.o ./ffi/cadical_ffi.c -I /home/james/.elan/toolchains/leanprover--lean4---nightly-2022-10-27/include -I /home/james/Projects/cadical/src -O3 -fPIC
Creating libleancadical.a
> /home/james/.elan/toolchains/leanprover--lean4---nightly-2022-10-27/bin/llvm-ar rcs ./build/lib/libleancadical.a ./build/c/leancadical.o
Linking libleancadical.so
error: > /home/james/.elan/toolchains/leanprover--lean4---nightly-2022-10-27/bin/leanc -shared -o ./build/lib/libleancadical.so -Wl,--whole-archive ./build/lib/libleancadical.a -Wl,--no-whole-archive -L/home/james/Projects/cadical/build -lcadical --no-undefined
error: stderr:
ld.lld: error: undefined symbol: lean_register_external_class
>>> referenced by cadical_ffi.c
>>>               leancadical.o:(leancadical_initialize) in archive ./build/lib/libleancadical.a

I'm assuming leancadds the proper search paths to the clang call to find the lean objects, no?

Sebastian Ullrich (Nov 03 2022 at 09:24):

Ah, right. If that is the only remaining error, you can drop the --no-undefined again. https://github.com/leanprover/lean4/blob/5249611d75c18e449c19cb8eaa071d00c3347e44/src/CMakeLists.txt#L330-L332

James Gallicchio (Nov 03 2022 at 17:11):

Hrm, it actually was not the only remaining error... D: my lack of C++ experience is unfortunate here. what flags does the leanc wrapper pass to clang automatically? (if any)

James Gallicchio (Nov 03 2022 at 17:15):

I'm gonna see if I can write a makefile to compile the lean generated c files with the shim/dependency, and then I'll work backwards to figure out what the right setup is in a lakefile

Sebastian Ullrich (Nov 03 2022 at 17:15):

You can use leanc -v ... to see the flags

James Gallicchio (Nov 05 2022 at 16:47):

I think after a week of trying to get this to compile, my friends and I have thrown in the towel :joy: just gonna use an equivalent library written in C

We're still not entirely sure why we can't get the program to link successfully, but I committed the closest attempt we had here:
https://github.com/JamesGallicchio/eternity2/tree/cadical-ffi-test
lake run setup clones the required c++ dependency, and lake build leads to linker error :pensive: maybe someone will find the info there useful in the future

James Gallicchio (Nov 06 2022 at 05:13):

@Parth Shastri is amazing and figured out how to get it running (still not entirely sure why, but it is working)

Patrick Massot (Nov 06 2022 at 10:13):

It would be nice to contribute everything you figured out in some kind of documentation.

Matej Penciak (Dec 15 2022 at 01:26):

We're attempting a Rust <-> C++ <-> Lean FFI setup, and as expected with such a complicated setup, we're running into some tricky compiler issues we were hoping someone may have more knowledge about.

An example of our attempt can be found here: https://github.com/yatima-inc/FFI.lean

To summarize the attempt, we're trying to bridge Rust <-> C++ by using the CXX crate to generate some .cc files. Those, together with a .cpp "shim", are pointed to in the lakefile as external libraries to link to a very simple function: https://github.com/yatima-inc/FFI.lean/blob/d657ecc025c44bb835a9eecafd40e97ab9accaf3/lean/FFI.lean#L1

The strange thing is that this fragile setup actually works for one of the teammates @Sam Burnham ! For others, we're all generally running into the same linker error:

error: stderr:
libc++abi: terminating due to uncaught exception of type lean::exception: error loading library, ./build/src/libffi.so: undefined symbol: _ZNSt8ios_base4InitD1Ev
error: external command `/home/matej/.elan/toolchains/leanprover--lean4---nightly-2022-12-08/bin/lean` exited with code 134

The only discernible difference between our environments is that if I call leanc -v I get some stuff about clang, whereas when Sam runs the same command he sees the same message with clang replaced by gcc. It's just a shot in the dark, but could this difference in compilers be causing the issue?

If so, is there a way to force leanc to point to a particular C compiler?

Sebastian Ullrich (Dec 15 2022 at 09:05):

Yeah, looks like a C++ stdlib conflict similar to what was reported above, so hopefully something like this should work for you as well: https://github.com/JamesGallicchio/eternity2/blob/cadical-ffi-test/lakefile.lean#L13

Arthur Paulino (Dec 15 2022 at 13:30):

Sebastian Ullrich said:

Yeah, looks like a C++ stdlib conflict similar to what was reported above, so hopefully something like this should work for you as well: https://github.com/JamesGallicchio/eternity2/blob/cadical-ffi-test/lakefile.lean#L13

That works for Sam, but for some reason it doesn't seem to make a difference for me and Matej

Sebastian Ullrich (Dec 15 2022 at 13:52):

Sebastian Ullrich said:

You could try linking the dynamic library using -Wl,--no-undefined, which might give a better error message if the symbol is already undefined at that stage

Arthur Paulino (Dec 15 2022 at 15:26):

Oh, that shows a lot more info:

Linking libffi.so
error: > /home/arthur/.elan/toolchains/leanprover--lean4---nightly-2022-12-08/bin/leanc -shared -o ./build/src/libffi.so -Wl,--whole-archive ./build/src/libffi.a -Wl,--no-whole-archive -Wl --no-undefined -L ./target/debug -l ffi -lstdc++
error: stderr:
ld.lld: error: undefined symbol: std::allocator<char>::allocator()
>>> referenced by cxx.cc:9 (src/cxx.cc:9)
>>>               cxx.o:(cxxbridge1$cxx_string$init) in archive ./target/debug/libffi.a
>>> referenced by cxx.cc:204 (src/cxx.cc:204)
>>>               cxx.o:(_ZNK4rust10cxxbridge16StringcvNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEEv) in archive ./target/debug/libffi.a
>>> referenced by cxx.cc:317 (src/cxx.cc:317)
>>>               cxx.o:(_ZNK4rust10cxxbridge13StrcvNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEEv) in archive ./target/debug/libffi.a

There are a lot more errors like this

Arthur Paulino (Dec 15 2022 at 15:29):

But now this is looking like an error that I had in the past that I couldn't figure out: linking C++ code

Sebastian Ullrich (Dec 15 2022 at 15:36):

If you can first link the C++ code into a shared library and then link that with the Lean code, that should be a much easier and robust approach

Arthur Paulino (Dec 15 2022 at 20:56):

Sorry, is there some working example of a Lean 4 program that uses C++ code and that's compiled with Lake?

James Gallicchio (Dec 19 2022 at 10:41):

Yes-- let me get the link......

James Gallicchio (Dec 19 2022 at 10:42):

I meant to document this at some point but the end of the semester stepped on those plans :big_smile:

James Gallicchio (Dec 19 2022 at 10:43):

https://github.com/JamesGallicchio/eternity2/blob/main/lean/lakefile.lean

^ there are lake scripts here that download/build the c++ source for cadical, along with a custom C FFI shim, and patch everything together

James Gallicchio (Dec 19 2022 at 10:51):

Some notes:

  • I'm not entirely sure why, but libstdc++.so.6 needs to be on the linker path for non-mac users, not the generic -lstdc++ argument. I'm not sure if this will work for libraries that aren't CaDiCaL
  • Using gcc did not work for us, which is why we are compiling cadical+our C shim explicitly with clang. I have tested this on clang 14, but nothing newer/older
  • The c++ library's compiled .a needs to be in the build/lib folder when compiling the FFI shim. you can't add it to the path elsewhere. This is a lake restriction (unless Mac has changed buildStaticLib in the last month)
  • precompileModules := true seems to break the build process for some systems? I did not investigate this much but if you run into weird issues that might be the culprit

James Gallicchio (Dec 19 2022 at 10:53):

Also, this build stack is incredibly fragile and I am scared to change any of it. :big_smile:

Arthur Paulino (Jan 05 2023 at 01:46):

Hi all, how can I access a structure attribute in C?
I have this Lean structure:

structure ByteVector (n : Nat) where
  data  : ByteArray
  valid : data.size = n

And this function:

@[extern "lean_byte_vector_to_uint16"]
opaque toUInt16 : @& ByteVector 2  UInt16

If my input were a simple ByteArray, I would do this:

extern l_res lean_byte_vector_to_uint16(l_arg a) {
    return *((uint16_t*)lean_to_sarray(a)->m_data);
}

But this will obviously fail for ByteVectors

Gabriel Ebner (Jan 05 2023 at 01:54):

You're lucky: ByteVector n has the same runtime representation as ByteArray.

Arthur Paulino (Jan 05 2023 at 01:55):

How come?

Hanting Zhang (Jan 05 2023 at 01:55):

@Arthur Paulino To explain more: Lean's runtime will analyze structure fields.

Hanting Zhang (Jan 05 2023 at 01:55):

In particular, if a structure is "trivial" in the sense that it only has one computationally relevant field, then the runtime of that structure will be unboxed to the relevant field

Arthur Paulino (Jan 05 2023 at 01:56):

Does this mean that my C code will just work?

Hanting Zhang (Jan 05 2023 at 01:56):

Here, valid has no computational value so the runtime replaces all ByteVectors with ByteArray

Arthur Paulino (Jan 05 2023 at 01:56):

I see, that makes sense

Hanting Zhang (Jan 05 2023 at 01:56):

Arthur Paulino said:

Does this mean that my C code will just work?

Yes (btw this is also why Chars are UInt32s in Lean's runtime)

Hanting Zhang (Jan 05 2023 at 01:58):

This is also why in core you'll see a lot of stuff like

structure FVarId where
  fvarId : Nat

where it seemingly unnecessarily wraps Nat. But in fact there's no overhead since FVarId is trivial and will unwrap to Nat

James Gallicchio (Jan 05 2023 at 02:00):

If it weren't unwrapped (ie. the struct had multiple fields), you'd use the lean_ctor_get function :)

James Gallicchio (Jan 05 2023 at 02:02):

(BTW, if I wanted to flesh out the FFI API documentation in the manual, what would the process be for that?)

Arthur Paulino (Jan 06 2023 at 14:21):

How can I use lean_nat_to_size_t from runtime/object.cpp on a C function I am implementing?

Henrik Böving (Jan 06 2023 at 14:25):

Since it is not exposed via C FFI I would say you can't in a nice way. I guess you could just pray that the linker finds the symbol during compilation but in general it does not seem to be made for external use as of now.

Arthur Paulino (Jan 06 2023 at 14:26):

Hmm, so handling Nats is impractical, right?

Arthur Paulino (Jan 06 2023 at 14:46):

This will do for now

size_t nat_to_size_t(lean_obj_arg n) {
    if (lean_is_scalar(n)) return lean_unbox(n);
    else lean_internal_panic_out_of_memory();
}

I'm not interested in arbitrarily large values

James Gallicchio (Jan 06 2023 at 16:30):

Arthur Paulino said:

Hmm, so handling Nats is impractical, right?

Handling unbounded Nats in C generally is hard :joy:

Arthur Paulino (Jan 06 2023 at 16:48):

Yeah, but I meant in terms of reusing the infra that's already set

James Gallicchio (Jan 06 2023 at 16:52):

Yeah, we might need a repo for common utility functions and stuff... Out of curiosity, what code are you building an interface to right now?

Arthur Paulino (Jan 06 2023 at 16:56):

For now:

Arthur Paulino (Jan 06 2023 at 17:00):

The duplicated interfaces are temporary. That's because we haven't migrated our test suites to lspecIO yet. So we are doing tests with the #lspec command for now

Arthur Paulino (Jan 06 2023 at 17:04):

Ultimately we want to interface with https://docs.rs/neptune/latest/neptune/


Last updated: Dec 20 2023 at 11:08 UTC