Zulip Chat Archive

Stream: lean4

Topic: wasm nix emscripten build


Anders Christiansen Sørby (Jan 11 2022 at 23:14):

I've started looking into working on the wasm emscripten build support (using nix).
It seems like it is currently broken as it is descirbed in emscripten.md

mkdir -p build/emscripten
cd build/emscripten
emccmake cmake ../../src -DCMAKE_BUILD_TYPE=Emscripten
make

Sebastian Ullrich (Jan 12 2022 at 08:05):

That could very well be the case since it's not tested by CI right now

Sebastian Ullrich (Jan 12 2022 at 08:11):

Also fair warning, the last time I tried this, the wasm libc++ cached by Nixpkgs was missing exception support, and trying to add it lead to some inscrutable error message afair https://github.com/Kha/nixpkgs/commit/6c586acad642bf57f65a531d765f08af18da3fe6

Anders Christiansen Sørby (Jan 12 2022 at 10:56):

Ok. Thanks for the heads up. I think I'm just going to try setting LEAN_CC to emcc and use buildLeanPackage

Anders Christiansen Sørby (Jan 12 2022 at 19:42):

Is it strictly necessary to link with gmp? I'm getting this error:

wasm-ld: error: unknown file type: /nix/store/lqd2xr20f8qmmvss3nrbph22kln8a0p5-gmp-6.2.1/lib/libgmp.so

Henrik Böving (Jan 12 2022 at 20:07):

As far as I know gmp is used for efficient computation on Nat internally? so I'd say yes it is although there might be a way to get rid off it at the cost of Nat efficiency.

Sebastian Ullrich (Jan 12 2022 at 20:26):

See cmake option USE_GMP and https://github.com/leanprover/lean4/issues/827. For your use case it should be fine to require it.

Anders Christiansen Sørby (Jan 17 2022 at 21:18):

So I'm trying to use emcc directly inside buildLeanPackage and it has been surprisingly successful (After setting LEANC_GMP=" "), but wasm only has experimental multithreading support currently so I wanted to disable it by passing

  leancpp-single-thread = leancpp.overrideAttrs (old: {
    cmakeFlags = old.cmakeFlags ++ [ "-ULEAN_MULTI_THREAD" "-UMULTI_THREAD" "-UUSE_GMP" ];
  });

Although this does not change the C macros inside the build. Do you know why?

Sebastian Ullrich (Jan 17 2022 at 21:32):

I don't think you want to use -U. Use -DUSE_GMP=OFF etc.

Anders Christiansen Sørby (Jan 20 2022 at 17:07):

I've managed to build most of lean with emscripten, but I still need to compile the kernel and runtime to statically link.

wasm-ld: error: unknown file type: object.cpp.o

I'm running into this error when compiling Lean with emcc and I suspect this is because object.cpp.o was not compiled with emscripten. So I'm trying to compile leancpp with emscripten.

  leancpp-emscripten = leancpp-single-thread.overrideAttrs (old: {
    name = "leancpp-emscripten";
    nativeBuildInputs = [ cmake emscripten ];
    cmakeFlags = old.cmakeFlags ++ [ "-DCMAKE_BUILD_TYPE=Emscripten" ];
    preConfigure = ''
      # ignore absence of submodule
      sed -i 's!lake/Lake.lean!!' CMakeLists.txt
    '';
    configurePhase = ''
      sh -c "$preConfigure"
      mkdir build
      cd build
      emcmake cmake .. $cmakeFlags
      patchShebangs .
    '';
  });

gives this error

leancpp-emscripten> unpacking sources
leancpp-emscripten> unpacking source archive /nix/store/zzbyl3jkby9dgdkc25idhl76pb937064-source
leancpp-emscripten> source root is source
leancpp-emscripten> patching sources
leancpp-emscripten> configuring
leancpp-emscripten> configure: cmake .. -DSTAGE=1 -DPREV_STAGE=./faux-prev-stage -DUSE_GITHASH=OFF -DMULTI_THREAD=OFF -DUSE_GMP=OFF -DCMAKE_BUILD_TYPE=Emscripten -DCMAKE_TOOLCHAIN_FILE=/nix/store/jrjpc9b1b3kjvj46ylx748awn0i20gp8-emscripten-2.0.27/share/emscripten/cmake/Modules/Platform/Emscripten.cmake -DCMAKE_CROSSCOMPILING_EMULATOR=/nix/store/p27ckm6j4i1g4v5k16rqc0xhp167ympw-nodejs-14.18.1/bin/node
leancpp-emscripten> -- 32-bit machine detected
leancpp-emscripten> -- Enabled multi-thread support
leancpp-emscripten> CMake Warning at CMakeLists.txt:253 (message):
leancpp-emscripten>   Failed to find ccache, prepare for longer and redundant builds...
leancpp-emscripten>
leancpp-emscripten> -- Could NOT find PythonInterp (missing: PYTHON_EXECUTABLE)
leancpp-emscripten> -- Configuring done
leancpp-emscripten> -- Generating done
leancpp-emscripten> -- Build files have been written to: /build/source/build
leancpp-emscripten> building
leancpp-emscripten> build flags: SHELL=/nix/store/l0wlqpbsvh1pgvhcdhw7qkka3d31si7k-bash-5.1-p8/bin/bash leancpp leanrt leanrt_initial-exec shell
leancpp-emscripten> [  1%] Building CXX object initialize/CMakeFiles/initialize.dir/init.cpp.o
leancpp-emscripten> unable to find python in $PATH
leancpp-emscripten> make[3]: *** [initialize/CMakeFiles/initialize.dir/build.make:77: initialize/CMakeFiles/initialize.dir/init.cpp.o] Error 1
leancpp-emscripten> make[2]: *** [CMakeFiles/Makefile2:1394: initialize/CMakeFiles/initialize.dir/all] Error 2
leancpp-emscripten> make[1]: *** [CMakeFiles/Makefile2:1193: CMakeFiles/leancpp.dir/rule] Error 2
leancpp-emscripten> make: *** [Makefile:654: leancpp] Error 2

What can be causing this error?

Anders Christiansen Sørby (Jan 20 2022 at 18:03):

It fails here

    #if defined(LEAN_EMSCRIPTEN)
        // `Crypto.getRandomValues` documents `dest` should be at most 65536 bytes.
        size_t read_sz = std::min(remain, 65536);
    #else
        size_t read_sz = remain;
    #endif
/home/anderscs/src/lean4/src/runtime/io.cpp:402:26: error: no matching function for call to 'min'
        size_t read_sz = std::min(remain, 65536);
                         ^~~~~~~~
/nix/store/jrjpc9b1b3kjvj46ylx748awn0i20gp8-emscripten-2.0.27/share/emscripten/cache/sysroot/include/c++/v1/algorithm:2594:1: note: candidate template ignored: deduced conflicting types for parameter '_Tp' ('unsigned long' vs. 'int')
min(const _Tp& __a, const _Tp& __b)
^
/nix/store/jrjpc9b1b3kjvj46ylx748awn0i20gp8-emscripten-2.0.27/share/emscripten/cache/sysroot/include/c++/v1/algorithm:2605:1: note: candidate template ignored: could not match 'initializer_list<type-parameter-0-0>' against 'unsigned long'
min(initializer_list<_Tp> __t, _Compare __comp)
^
/nix/store/jrjpc9b1b3kjvj46ylx748awn0i20gp8-emscripten-2.0.27/share/emscripten/cache/sysroot/include/c++/v1/algorithm:2614:1: note: candidate function template not viable: requires single argument '__t', but 2 arguments were provided
min(initializer_list<_Tp> __t)
^
/nix/store/jrjpc9b1b3kjvj46ylx748awn0i20gp8-emscripten-2.0.27/share/emscripten/cache/sysroot/include/c++/v1/algorithm:2585:1: note: candidate function template not viable: requires 3 arguments, but 2 were provided
min(const _Tp& __a, const _Tp& __b, _Compare __comp)
^

Did they change the api?

Sebastian Ullrich (Jan 20 2022 at 18:08):

Try 65536ul

Anders Christiansen Sørby (Jan 20 2022 at 18:33):

(size_t) 65536 seems to work

Anders Christiansen Sørby (Jan 20 2022 at 19:13):

Any Idea why it fails here?

Putting child 0x461420 (runtime/CMakeFiles/leanrt_initial-exec.dir/debug.cpp.o) PID 45 on the chain.
leancpp-emscripten> Live child 0x461420 (runtime/CMakeFiles/leanrt_initial-exec.dir/debug.cpp.o) PID 45
leancpp-emscripten> [  1%] Building CXX object runtime/CMakeFiles/leanrt_initial-exec.dir/debug.cpp.o
leancpp-emscripten> Reaping winning child 0x461420 PID 45
leancpp-emscripten> cd /build/source/build/runtime && /nix/store/jrjpc9b1b3kjvj46ylx748awn0i20gp8-emscripten-2.0.27/share/emscripten/em++  @CMakeFiles/leanrt_initial-exec.dir/includes_CXX.rsp -Wall -Wextra -std=c++14  -D LEAN_EMSCRIPTEN -s ALLOW_MEMORY_GROWTH=1 -s DISABLE_EXCEPTION_CATCHING=0 -s MAIN_MODULE=1 -fexceptions -DLEAN_BUILD_TYPE="Emscripten" -DLEAN_EXPORTING -D__CLANG__ -fvisibility=hidden -fvisibility-inlines-hidden -MD -MT runtime/CMakeFiles/leanrt_initial-exec.dir/debug.cpp.o -MF CMakeFiles/leanrt_initial-exec.dir/debug.cpp.o.d -o CMakeFiles/leanrt_initial-exec.dir/debug.cpp.o -c /build/source/runtime/debug.cpp
leancpp-emscripten> Live child 0x461420 (runtime/CMakeFiles/leanrt_initial-exec.dir/debug.cpp.o) PID 46
leancpp-emscripten> unable to find python in $PATH
leancpp-emscripten> Reaping losing child 0x461420 PID 46
leancpp-emscripten> make[2]: *** [runtime/CMakeFiles/leanrt_initial-exec.dir/build.make:77: runtime/CMakeFiles/leanrt_initial-exec.dir/debug.cpp.o] Error 1
leancpp-emscripten> Removing child 0x461420 PID 46 from chain.
leancpp-emscripten> make[2]: Leaving directory '/build/source/build'
leancpp-emscripten> Reaping losing child 0x47ab80 PID 44
leancpp-emscripten> make[1]: *** [CMakeFiles/Makefile2:1238: runtime/CMakeFiles/leanrt_initial-exec.dir/all] Error 2
leancpp-emscripten> Removing child 0x47ab80 PID 44 from chain.
leancpp-emscripten> make[1]: Leaving directory '/build/source/build'
leancpp-emscripten> Reaping losing child 0x4756f0 PID 41
leancpp-emscripten> make: *** [Makefile:166: all] Error 2
leancpp-emscripten> Removing child 0x4756f0 PID 41 from chain.

Is this because of the python error?

Sebastian Ullrich (Jan 20 2022 at 19:17):

It wouldn't fail at a specific C++ file then

Sebastian Ullrich (Jan 20 2022 at 19:17):

It's weird that there is no compiler error, how did you find it before?

Anders Christiansen Sørby (Jan 20 2022 at 19:17):

I've made a draft PR of this here https://github.com/leanprover/lean4/pull/966

Anders Christiansen Sørby (Jan 20 2022 at 19:18):

I ran nix develop before. I don't know why the error is not showing

Sebastian Ullrich (Jan 20 2022 at 19:19):

Fixing it in nix develop first is definitely a good idea given how big the derivation is

Anders Christiansen Sørby (Jan 20 2022 at 19:24):

What's ironic is that in nix develop the build actually works for some reason

Anders Christiansen Sørby (Feb 03 2022 at 14:14):

@Mac Do you think it would be easier to build to wasm with lake? I've gotten stuck on compiling leancpp with emcmake and I generally don't like working with cmake. I guess I could replace the cmake in the nix build and just use lake to build? This would be ideal.

Mac (Feb 03 2022 at 17:21):

@Anders Christiansen Sørby do you mean WASM for the Lean core or WASM for Lean libraries? Lake can probably soon be used to build the later, the former seems much more unlikely (at least until Lake supports arbitrary build targets).

Anders Christiansen Sørby (Feb 03 2022 at 17:34):

I mean the lean core c++ code, runtime, kernel, utils etc. This is currently compiled with CMake.

Mac (Feb 03 2022 at 18:51):

In that case, I don't see that being easily replaceable by Lake anytime soon.

Sebastian Ullrich (Feb 03 2022 at 20:33):

By the way, I took a look at the original PR today and we did have a working CI config, it just wasn't really feasible to turn it on. Adding cross-architecture .olean compilation would be one way to fix that.

Mac (Feb 03 2022 at 20:57):

@Sebastian Ullrich maybe it would be possible to run the CI build as scheduled workflow (maybe as part of the nightly release) instead of part of the regular on push workflow? That would make the 2h build time less of a concern.

Mac (Feb 03 2022 at 21:01):

Also, since the nightly release adds the artifacts as they are built (if they succeed), the long WASM build would not delay or prevent the release of the other artifacts.

Sebastian Ullrich (Feb 04 2022 at 07:44):

Mac said:

Also, since the nightly release adds the artifacts as they are built (if they succeed), the long WASM build would not delay or prevent the release of the other artifacts.

Which in itself is not ideal, and I started to fix that on some branch that I just have to find again... but we could always make an exception for the Emscripten build if necessary

Sebastian Ullrich (Feb 08 2022 at 11:28):

Sebastian Ullrich said:

Adding cross-architecture .olean compilation would be one way to fix that.

Doing this in the current compact.cpp implementation would be quite a challenge since it is based on the platform-specific declarations and functions in lean.h. It should be much simpler to make @Gabriel Ebner's oleanparser platform-generic, just abstract over the (all?) read64LE calls. We would have to fix the TODOs first and write a corresponding serializer, but seem quite desirable anyway. /cc @Leonardo de Moura

Sebastian Ullrich (Feb 08 2022 at 11:29):

Not trivial either, but that you cannot easily add a conditional platform to a GitHub Actions job is reason enough for me to solve the Emscripten build that way instead...

Gabriel Ebner (Feb 08 2022 at 11:33):

It should be much simpler to make @Gabriel Ebner's oleanparser platform-generic, just abstract over the (all?) read64LE calls.

That seems like a good plan. We'd also need to abstract over alignment, object headers, and bignum representation.

Gabriel Ebner (Feb 08 2022 at 11:35):

We should also add flags for these variants in the olean file so that we can pick the right parser.

Sebastian Ullrich (Feb 08 2022 at 12:30):

Yes, we probably should if we want to avoid user reports about weird segfaults (...which may happen at any point thanks to mmap :) ). The object layout should also be sufficiently stable that we can add a version number for it, even if that still doesn't guarantee that Lean can load .oleans of the same version because the declarations of the serialized Lean types may have changed.

Anders Christiansen Sørby (Feb 08 2022 at 17:09):

Maybe you should use content addressed versioning for the olean format?

Sebastian Ullrich (Feb 08 2022 at 17:32):

What does that mean?

Anders Christiansen Sørby (Feb 08 2022 at 18:08):

I was thinking to ensure that the format is versioned stably you can include a hash of the serialized Lean types in the version string.

Anders Christiansen Sørby (Feb 08 2022 at 18:09):

Also I made this https://github.com/gebner/oleanparser/pull/3

Anders Christiansen Sørby (Feb 08 2022 at 21:48):

@Gabriel Ebner What are the TODOs on the oleanparser?

Gabriel Ebner (Feb 09 2022 at 09:53):

Most of the TODOs are additional assertions that I wanted to write. The olean format in Lean 4 is highly redundant (because it is identical to the memory representation). For example, arrays have two size fields (which should be equal), thunks have a 64-bit closure field (which is always zero), etc.

Gabriel Ebner (Feb 09 2022 at 09:56):

Another TODO is checking that strings are valid UTF-8, but I don't know if the runtime even requires that. (There's certainly lots of places where fromUTF8Unchecked is called on input that could be invalid UTF-8.)

Gabriel Ebner (Feb 09 2022 at 09:58):

And then there's this one, where I don't remember what the TODO was for:

    let (size, sign) := -- TODO: implement Int32

(AFAICT, Int32 is represented as scalar values and there's no way to distinguish it from regular scalar values.)

Gabriel Ebner (Feb 09 2022 at 10:08):

In case you were asking about the general roadmap: basically what Sebastian said. Serialization, different layouts (32-bit mainly). I'd also like to make the output type generic; right now there's this Obj data structure but I'd like to add a FromObj α type class (with scalar : Nat → α etc. members) so that the parser could directly create the Lean VM objects (with a different instance).

Anders Christiansen Sørby (Feb 09 2022 at 10:36):

Using dependent types as a form of safe (to be proven) runtime loading, "zero" cost serializations, could be really cool.


Last updated: Dec 20 2023 at 11:08 UTC