Zulip Chat Archive
Stream: lean4
Topic: The Lean C/C++ API
Mac (Sep 11 2021 at 20:29):
So I was in the process of updating Papyrus to the latest Lean nightly and noticed a number of breaking changes had been made in the past few days to the Lean C/C++ API. For one, it appears that the C++ interface has been completely removed from the distribution. The manner of linking to Lean also appears to have changed. As such, I am a little at a loss on how to update Papyrus.
Lean 4 devs (e.g., @Sebastian Ullrich and Leo) is there any advice you can give on how to migrate? Also are their any guidelines on how C/C++ bindings should be written going forward to make them more future proof?
Mac (Sep 11 2021 at 20:43):
Two particularly large point pain points are the absence of mpz
and utf8
(and mpz_object
from object
), which I needed to convert between Lean Int
s and Nat
s and LLVM APInt
s and between Lean String
s and LLVM StringRef
s/Twine
s.
Mac (Sep 11 2021 at 21:09):
Also the absence of the std::string
to lean_object
converters hurts.
Leonardo de Moura (Sep 11 2021 at 21:18):
@Mac Sorry about that, we assumed nobody was using the C++ .h files in the Lean distribution. These files were originally written to support the Lean implementation. This was not a random decision. It was done to simplify https://github.com/leanprover/lean4/pull/659
This PR is critical because it will allow us to have a Lean distribution that does not depend on a C compiler in the user machine. We believe Lake + this PR will dramatically improve the user experience.
I think Papyrus is also an important project. Could you "survive" with the C API? If not, do you want to volunteer to maintain a C++ API for Lean.
BTW, we do hope one day Lean runtime will be pure C, and everything else will be written in Lean.
Mac (Sep 11 2021 at 21:36):
@Leonardo de Moura Ah, makes sense. I think I can largely do without the C++ API. As you said, most of it used to implement Lean and very little needs to be used by third parties. Most of the useful C++ helpers can also be rewritten relatively easily.
The only critical things I am missing are utf8_strlen
and mpz
. The former is currently needed to needed to convert things like std::string
(i.e., potentially non-null terminated strings with a known size) into a lean_object
. However, it could likely be replaced with an alternative C lean_mk_string
that takes both a pointer and a size.
For mpz
, I was actually hacking around the existing Lean C++ API to get at the C mpz_t
, so if there was C API lean_mpz_value
which returned the mpz_t
, that would be sufficient. Similar methods could also exist for mpq
and mpfp
to get at their mpq_t
and mpfp_t
. They could probably be located in a separate .h
that includes gmp
so that gmp
is only a dependency for those who need such an API.
Leonardo de Moura (Sep 11 2021 at 21:38):
The only critical things I am missing are are utf8_strlen
I will this one to the C API.
Leonardo de Moura (Sep 11 2021 at 21:39):
Now, the mpz
, do you need big numbers?
Mac (Sep 11 2021 at 21:39):
Yes, for LLVM's APInt
.
Leonardo de Moura (Sep 11 2021 at 21:41):
Would a function that returns the Lean Nat/Int as a string be good enough?
How do you create APInt
objects in LLVM? Is it based on GMP too?
Mac (Sep 11 2021 at 21:43):
LLVM has its own numeric representation for APInt
which is also an array of limbs, so I was using mpz_import
/mpz_export
to convert between the two efficiently (without having to construct and parse an intermediary string).
Mario Carneiro (Sep 11 2021 at 21:46):
how about an interface to get the nth limb (with a radix of 2^8, 2^32 or 2^64 as convenient)? That would be relatively efficient for this kind of conversion
Mac (Sep 11 2021 at 21:47):
I would think exposing the already existingmpz_t
would be the simplest and most powerful solution.
Mario Carneiro (Sep 11 2021 at 21:47):
That would also work, but imposes a dependency on gmp
, and I'm guessing they are considering alternative backends
Mac (Sep 11 2021 at 21:48):
Hence why I said splitting it into a separate header. I am more than happy to rewrite my conversion code if the backend representation changes.
Mac (Sep 11 2021 at 21:49):
And its not like C interface doesn't expose implementation details for other types of objects (e.g., arrays, strings, etc.)
Mario Carneiro (Sep 11 2021 at 21:50):
those are all exposed via structs in lean.h though
Mario Carneiro (Sep 11 2021 at 21:51):
I don't know if something similar is possible for mpz unless you leave it as a void *
Mac (Sep 11 2021 at 21:51):
My ideal interface would be lean_object* lean_alloc_mpz(mpz_t)
and mpz_t* lean_mpz_value(lean_object*)
Leonardo de Moura (Sep 11 2021 at 21:54):
The motivation to eliminate the GMP dependency was PR https://github.com/leanprover/lean4/pull/659
@Sebastian Ullrich told me Zig was having trouble finding gmp.h
. We don't want to re-distribute gmp.h
with Lean. It would be great to just avoid gmp.h
in the header, and as @Mario Carneiro suggested, it would allow us to use different bignum libraries in the future without affecting user code.
Mac (Sep 11 2021 at 21:58):
@Leonardo de Moura my confusion is thus. Couldn't a separate header providing these methods #include <gmp.h>
without Lean actually distributing gmp.h
? That way it would not bloat Lean and would only be usable to libraries that need to be manually built with a separate C/C++ compiler anyway.
Mac (Sep 11 2021 at 22:03):
However, the limb indexing method @Mario Carneiro suggested would probably work too. But that would need at multiple API methods. That is, allocating a number of a given bit width, setting limbs, getting the bit width of a number and getting limbs.
Mac (Sep 11 2021 at 22:04):
Also, in that cast, it would probably be helpful to mention that LLVM uses a 64-bit limb size, so that would be the most pertinent size of limb to set (at least for me).
Leonardo de Moura (Sep 11 2021 at 22:05):
Yes, @Mario Carneiro 's solution is the best long-term solution, but just adding an extra lean_gmp.h
is an easier fix for now.
Mario Carneiro (Sep 11 2021 at 22:06):
I wouldn't be surprised if GMP uses 32 bit limbs on 32 bit and 64 bit limbs on 64 bit
Mac (Sep 11 2021 at 22:06):
@Mario Carneiro that's a good point, I did not check if LLVM's is also architecture dependent.
Mac (Sep 11 2021 at 22:07):
On closer examination, it does not appear to be .
Mario Carneiro (Sep 11 2021 at 22:13):
gmp's mp_limb_t
is apparently one of unsigned int
, unsigned long int
and unsigned long long int
, which I think puts it somewhere between 8 and 64 bits
Mac (Sep 11 2021 at 22:21):
Ah
Mac (Sep 11 2021 at 22:22):
@Leonardo de Moura Just to clarify, the relevant utf8_strlen
for me is the one that takes both a pointer and a size (though the one that takes a pointer may be useful to have for other users of the API).
Mac (Sep 11 2021 at 22:30):
@Leonardo de Moura One other utility that seems to be missing from the C API that feels like it would fit is decode_io_error
(as, for one, it is using C errnums).
Leonardo de Moura (Sep 11 2021 at 22:44):
@Mac Pushed the missing APIs.
Mac (Sep 11 2021 at 23:14):
@Leonardo de Moura Thank you so much!
Mac (Sep 11 2021 at 23:48):
@Leonardo de Moura Two final notes I have from porting the Papyrus C/C++ lib to the new API:
- You might want to expose
mk_io_user_error
somewhere now that the convenience methods for constructing anIO
from string error messages are gone (i.e., expose it through alean_mk_io_user_error
). - It also might be worth renaming
lean_mpz_value
to something likelean_extract_mpz_value
due to its signature being very different from otherlean_*_value
functions.
Mac (Sep 12 2021 at 00:35):
@Leonardo de Moura Oh, and on a related note, today's nightly (09-12) is missing a Windows binary because the CI build failed with timeout on one of the tests.
Mac (Sep 12 2021 at 00:36):
(it is the editor completion test which seems to hang non-deterministically on Windows leading to some nightlies missing a Windows binary.)
Christian Pehle (Sep 13 2021 at 18:30):
I personally found the C++ API super useful for quickly interfacing with C++ libraries, which in turn I think is one of the fastest ways to add useful non-trivial library functionality. I was able to modify the C++ bindings I'm working on to not use it, but using List, Tuples now is harder. Regarding Zig integration, the zig compiler contains a 1:1 copy of the clang driver, could one not "just" distribute clang with Lean, or is that unreasonable.
Christian Pehle (Sep 13 2021 at 18:36):
Compare https://github.com/ziglang/zig/blob/master/src/zig_clang_driver.cpp, with https://github.com/llvm/llvm-project/blob/main/clang/tools/driver/driver.cpp
Leonardo de Moura (Sep 15 2021 at 13:49):
@Christian Pehle See https://andrewkelley.me/post/zig-cc-powerful-drop-in-replacement-gcc-clang.html for why Zig is a great choice for what C compiler to ship with a language.
Regarding the C++ API, we didn't really have one. They were just a bunch of helper files we used in the Lean implementation. Moreover, one of our ambitions is to have a Lean repo that does not use C++. The runtime will be written in C and everything else in Lean. I can see the C++ API may be useful for some users, but we don't have cycles to maintain a solid C++ API for Lean.
Last updated: Dec 20 2023 at 11:08 UTC