Zulip Chat Archive

Stream: lean4

Topic: The Lean C/C++ API


Mac (Sep 11 2021 at 20:29):

So I was in the process of updating Papyrus to the latest Lean nightly and noticed a number of breaking changes had been made in the past few days to the Lean C/C++ API. For one, it appears that the C++ interface has been completely removed from the distribution. The manner of linking to Lean also appears to have changed. As such, I am a little at a loss on how to update Papyrus.

Lean 4 devs (e.g., @Sebastian Ullrich and Leo) is there any advice you can give on how to migrate? Also are their any guidelines on how C/C++ bindings should be written going forward to make them more future proof?

Mac (Sep 11 2021 at 20:43):

Two particularly large point pain points are the absence of mpz and utf8 (and mpz_object from object), which I needed to convert between Lean Ints and Nats and LLVM APInts and between Lean Strings and LLVM StringRefs/Twines.

Mac (Sep 11 2021 at 21:09):

Also the absence of the std::string to lean_object converters hurts.

Leonardo de Moura (Sep 11 2021 at 21:18):

@Mac Sorry about that, we assumed nobody was using the C++ .h files in the Lean distribution. These files were originally written to support the Lean implementation. This was not a random decision. It was done to simplify https://github.com/leanprover/lean4/pull/659
This PR is critical because it will allow us to have a Lean distribution that does not depend on a C compiler in the user machine. We believe Lake + this PR will dramatically improve the user experience.
I think Papyrus is also an important project. Could you "survive" with the C API? If not, do you want to volunteer to maintain a C++ API for Lean.
BTW, we do hope one day Lean runtime will be pure C, and everything else will be written in Lean.

Mac (Sep 11 2021 at 21:36):

@Leonardo de Moura Ah, makes sense. I think I can largely do without the C++ API. As you said, most of it used to implement Lean and very little needs to be used by third parties. Most of the useful C++ helpers can also be rewritten relatively easily.

The only critical things I am missing are utf8_strlen and mpz. The former is currently needed to needed to convert things like std::string (i.e., potentially non-null terminated strings with a known size) into a lean_object. However, it could likely be replaced with an alternative C lean_mk_string that takes both a pointer and a size.

For mpz, I was actually hacking around the existing Lean C++ API to get at the C mpz_t, so if there was C API lean_mpz_value which returned the mpz_t, that would be sufficient. Similar methods could also exist for mpq and mpfp to get at their mpq_t and mpfp_t. They could probably be located in a separate .h that includes gmp so that gmp is only a dependency for those who need such an API.

Leonardo de Moura (Sep 11 2021 at 21:38):

The only critical things I am missing are are utf8_strlen

I will this one to the C API.

Leonardo de Moura (Sep 11 2021 at 21:39):

Now, the mpz, do you need big numbers?

Mac (Sep 11 2021 at 21:39):

Yes, for LLVM's APInt.

Leonardo de Moura (Sep 11 2021 at 21:41):

Would a function that returns the Lean Nat/Int as a string be good enough?
How do you create APIntobjects in LLVM? Is it based on GMP too?

Mac (Sep 11 2021 at 21:43):

LLVM has its own numeric representation for APInt which is also an array of limbs, so I was using mpz_import/mpz_export to convert between the two efficiently (without having to construct and parse an intermediary string).

Mario Carneiro (Sep 11 2021 at 21:46):

how about an interface to get the nth limb (with a radix of 2^8, 2^32 or 2^64 as convenient)? That would be relatively efficient for this kind of conversion

Mac (Sep 11 2021 at 21:47):

I would think exposing the already existingmpz_t would be the simplest and most powerful solution.

Mario Carneiro (Sep 11 2021 at 21:47):

That would also work, but imposes a dependency on gmp, and I'm guessing they are considering alternative backends

Mac (Sep 11 2021 at 21:48):

Hence why I said splitting it into a separate header. I am more than happy to rewrite my conversion code if the backend representation changes.

Mac (Sep 11 2021 at 21:49):

And its not like C interface doesn't expose implementation details for other types of objects (e.g., arrays, strings, etc.)

Mario Carneiro (Sep 11 2021 at 21:50):

those are all exposed via structs in lean.h though

Mario Carneiro (Sep 11 2021 at 21:51):

I don't know if something similar is possible for mpz unless you leave it as a void *

Mac (Sep 11 2021 at 21:51):

My ideal interface would be lean_object* lean_alloc_mpz(mpz_t) and mpz_t* lean_mpz_value(lean_object*)

Leonardo de Moura (Sep 11 2021 at 21:54):

The motivation to eliminate the GMP dependency was PR https://github.com/leanprover/lean4/pull/659
@Sebastian Ullrich told me Zig was having trouble finding gmp.h. We don't want to re-distribute gmp.h with Lean. It would be great to just avoid gmp.h in the header, and as @Mario Carneiro suggested, it would allow us to use different bignum libraries in the future without affecting user code.

Mac (Sep 11 2021 at 21:58):

@Leonardo de Moura my confusion is thus. Couldn't a separate header providing these methods #include <gmp.h> without Lean actually distributing gmp.h? That way it would not bloat Lean and would only be usable to libraries that need to be manually built with a separate C/C++ compiler anyway.

Mac (Sep 11 2021 at 22:03):

However, the limb indexing method @Mario Carneiro suggested would probably work too. But that would need at multiple API methods. That is, allocating a number of a given bit width, setting limbs, getting the bit width of a number and getting limbs.

Mac (Sep 11 2021 at 22:04):

Also, in that cast, it would probably be helpful to mention that LLVM uses a 64-bit limb size, so that would be the most pertinent size of limb to set (at least for me).

Leonardo de Moura (Sep 11 2021 at 22:05):

Yes, @Mario Carneiro 's solution is the best long-term solution, but just adding an extra lean_gmp.h is an easier fix for now.

Mario Carneiro (Sep 11 2021 at 22:06):

I wouldn't be surprised if GMP uses 32 bit limbs on 32 bit and 64 bit limbs on 64 bit

Mac (Sep 11 2021 at 22:06):

@Mario Carneiro that's a good point, I did not check if LLVM's is also architecture dependent.

Mac (Sep 11 2021 at 22:07):

On closer examination, it does not appear to be .

Mario Carneiro (Sep 11 2021 at 22:13):

gmp's mp_limb_t is apparently one of unsigned int, unsigned long int and unsigned long long int, which I think puts it somewhere between 8 and 64 bits

Mac (Sep 11 2021 at 22:21):

Ah

Mac (Sep 11 2021 at 22:22):

@Leonardo de Moura Just to clarify, the relevant utf8_strlen for me is the one that takes both a pointer and a size (though the one that takes a pointer may be useful to have for other users of the API).

Mac (Sep 11 2021 at 22:30):

@Leonardo de Moura One other utility that seems to be missing from the C API that feels like it would fit is decode_io_error (as, for one, it is using C errnums).

Leonardo de Moura (Sep 11 2021 at 22:44):

@Mac Pushed the missing APIs.

Mac (Sep 11 2021 at 23:14):

@Leonardo de Moura Thank you so much!

Mac (Sep 11 2021 at 23:48):

@Leonardo de Moura Two final notes I have from porting the Papyrus C/C++ lib to the new API:

  1. You might want to expose mk_io_user_error somewhere now that the convenience methods for constructing an IO from string error messages are gone (i.e., expose it through a lean_mk_io_user_error).
  2. It also might be worth renaming lean_mpz_value to something likelean_extract_mpz_value due to its signature being very different from other lean_*_value functions.

Mac (Sep 12 2021 at 00:35):

@Leonardo de Moura Oh, and on a related note, today's nightly (09-12) is missing a Windows binary because the CI build failed with timeout on one of the tests.

Mac (Sep 12 2021 at 00:36):

(it is the editor completion test which seems to hang non-deterministically on Windows leading to some nightlies missing a Windows binary.)

Christian Pehle (Sep 13 2021 at 18:30):

I personally found the C++ API super useful for quickly interfacing with C++ libraries, which in turn I think is one of the fastest ways to add useful non-trivial library functionality. I was able to modify the C++ bindings I'm working on to not use it, but using List, Tuples now is harder. Regarding Zig integration, the zig compiler contains a 1:1 copy of the clang driver, could one not "just" distribute clang with Lean, or is that unreasonable.

Christian Pehle (Sep 13 2021 at 18:36):

Compare https://github.com/ziglang/zig/blob/master/src/zig_clang_driver.cpp, with https://github.com/llvm/llvm-project/blob/main/clang/tools/driver/driver.cpp

Leonardo de Moura (Sep 15 2021 at 13:49):

@Christian Pehle See https://andrewkelley.me/post/zig-cc-powerful-drop-in-replacement-gcc-clang.html for why Zig is a great choice for what C compiler to ship with a language.

Regarding the C++ API, we didn't really have one. They were just a bunch of helper files we used in the Lean implementation. Moreover, one of our ambitions is to have a Lean repo that does not use C++. The runtime will be written in C and everything else in Lean. I can see the C++ API may be useful for some users, but we don't have cycles to maintain a solid C++ API for Lean.


Last updated: Dec 20 2023 at 11:08 UTC