Zulip Chat Archive
Stream: lean4 dev
Topic: Lean LLVM/MLIR
Tobias Grosser (Jan 04 2023 at 11:21):
I actually would be super interested in getting lean faster than C on such codes and have thought about this quite a bit. A lot of the vectorization/performance tricks that we apply in https://dl.acm.org/doi/10.1145/3428263 and https://dl.acm.org/doi/10.1145/3485539 could potentially work better in lean.
Henrik Böving (Jan 04 2023 at 12:33):
Tobias Grosser said:
I actually would be super interested in getting lean faster than C on such codes and have thought about this quite a bit. A lot of the vectorization/performance tricks that we apply in https://dl.acm.org/doi/10.1145/3428263 and https://dl.acm.org/doi/10.1145/3485539 could potentially work better in lean.
Why do you believe they could work better in Lean?
Tobias Grosser (Jan 04 2023 at 15:09):
In C++ we face issues with (a) pointer aliasing, (b) partial evaluating the type-specific variants - the templates become crazy, (c) it is hard to supply proofs that certain low-precision types are sufficient, (d) and we cannot synthesis vector code effectively from C++. Lean's pure nature makes pointer aliasing simple to work with and lean is more modular and flexible such that b&d can be addressed, and (c) is what lean is anyway good at.
Tobias Grosser (Jan 04 2023 at 15:10):
I feel we could build e..g something like https://terralang.org/ in lean with an LLVM/MLIR backend.
Tobias Grosser (Jan 04 2023 at 15:11):
Also, the macro system in lean will make many of these things a lot easier, and the interactive proof mode could probably be used to make this process more approachable from a human perspective.
Tobias Grosser (Jan 04 2023 at 15:13):
Also, if you look at how the simplex algorithm is implemented, it is really rewriting matrices in-place. Hence, the initial algorithm can be nicely written in a purely functional style yet the generated code could be fully in place. If we then find a good way to (a) specialize the code according to types and (b) switch between those variants with little overhead we are good.
Tobias Grosser (Jan 04 2023 at 15:14):
Given that we now have an LLVM backend we could implement sth like the clang/gcc vector extensions: https://clang.llvm.org/docs/LanguageExtensions.html#vectors-and-extended-vectors
Tobias Grosser (Jan 04 2023 at 15:14):
Which expose fast SIMD code to the programmers.
Tobias Grosser (Jan 04 2023 at 15:15):
Compilers such as GHC do not have a good SIMD backend, so they have a hard time implementing these.
Tobias Grosser (Jan 04 2023 at 15:16):
LLVM also has excellent support for Checked arithmetic builtins and even SIMD versions of those, which again is sth that we could use.
Tobias Grosser (Jan 04 2023 at 15:18):
Finally, and now we go crazy (but I like that) we could even encode rationals as floats/doubles. ideas_fast-ap-integers.md-at-main-opencompl_ideas.pdf
Tobias Grosser (Jan 04 2023 at 15:19):
And again, to decide when a float is precise and when we need to fall-back to arbitrary precision rationals we certainly would benefit from lean as a language.
Andrés Goens (Jan 04 2023 at 16:55):
Tobias Grosser said:
I feel we could build e..g something like https://terralang.org/ in lean with an LLVM/MLIR backend.
that's pretty interesting :thinking: @Tobias Grosser do you know this well? I'm curious how this differs from Low*, or is it very similar?
Tobias Grosser (Jan 04 2023 at 16:58):
I guess it's similar. What's nice about Terra is that it integrates with llvm which means you get well optimized code that is JIT compiled without going through C and you can also have access of llvms vector abstractions.
Tobias Grosser (Jan 04 2023 at 16:59):
What makes Terra special is that you can use Lua variables from terras low level code and Lua can access Terra variables. It's very dynamic yet fast.
Tobias Grosser (Jan 04 2023 at 17:00):
I feel in lean we could do the same, but maybe even nicer with dynamic syntax extensions, eventually exposing mlir abstractions,...
Tobias Grosser (Jan 04 2023 at 17:01):
If this goes well we could give the lean users access to very low level abstractions, yet allowing for fast code to be generated. This will also help scilean and such.
Andrés Goens (Jan 04 2023 at 17:02):
It probably would require some kind of compiler support to treat these low-level abstractions specially in code generation, right?
Tobias Grosser (Jan 04 2023 at 17:02):
Lean already allows for annotations for using c.
Tobias Grosser (Jan 04 2023 at 17:03):
We could add annotations for using custom llvm and MLIR as implementation of a lean function.
Tobias Grosser (Jan 04 2023 at 17:03):
And then some macro magic to provide a lean api to end-users.
Tobias Grosser (Jan 04 2023 at 17:04):
Coincidentally we already have an llvm backend, so what's missing is mostly macro magic.
Andrés Goens (Jan 04 2023 at 17:06):
I guess also the right abstractions if we'd want to reason about that code
Tobias Grosser (Jan 04 2023 at 17:06):
Do you know anybody who could be interested in collaborating on the macro aspects of this?
Henrik Böving (Jan 04 2023 at 17:08):
Tobias Grosser said:
We could add annotations for using custom llvm and MLIR as implementation of a lean function.
I mean this would just be an attribute with special support in the LLVMEmitter in the Lean code generator no?
Tobias Grosser said:
And then some macro magic to provide a lean api to end-users.
If we have Lean functions tagged as implement by LLVM/MLIR what additional macro magic do we need?
Tobias Grosser (Jan 04 2023 at 17:09):
Andrés Goens said:
I guess also the right abstractions if we'd want to reason about that code
Maybe. I feel that this could be orthogonal. It would be nice if this is verified, but I am unsure finding canonical semantics for everything will be easy. The current c annotations also do not come with semantics and are already very useful. Offering a way to add verification seems to be beneficial as an opt in.
Tobias Grosser (Jan 04 2023 at 17:12):
Henrik Böving said:
Tobias Grosser said:
We could add annotations for using custom llvm and MLIR as implementation of a lean function.
I mean this would just be an attribute with special support in the LLVMEmitter in the Lean code generator no?
Tobias Grosser said:And then some macro magic to provide a lean api to end-users.
If we have Lean functions tagged as implement by LLVM/MLIR what additional macro magic do we need?
In theory we could supply a string of llvm-it that the llvm emitter than translates. This is easy and maybe just a minor change. I feel it would be nice if we would actually parse the LLVM-IR into lean data structure to be able to check certain properties early on and also to be able to eventually attach semantics. But maybe that is indeed step two?
Henrik Böving (Jan 04 2023 at 17:17):
I mean the C stuff is just parsed as strings right now as well :D there was a time where we even had full C expressions instead of just function support.
What kind of semantics are you thinking about here?
Andrés Goens (Jan 04 2023 at 17:19):
I guess in the end it boils down to how superficial the Lean data structure is that it represents it. A data structure that represents the LLVM/MLIR syntax is surely an improvement over a string, but I'd argue not a terribly big one. If we'd want some sort of semantic representation of the LLVM/MLIR code to be able to e.g. typecheck some of the code, which I think we'd need for your points (a) and (b) perhaps above, although I don't fully understand those. For your point (c) a semantic embedding sounds indispensable, whereas for (d) this probably would already be very useful just to be able to code generate something opaque.
Tobias Grosser (Jan 04 2023 at 17:23):
https://github.com/opencompl/lean-mlir shows our experiments on a lean-based parser for MLIR as well as ideas to build modular semantics. It's very much work-in-progress, so many of these questions are unanswered. However, I am sure @Siddharth Bhat and @Sébastien Michelland and others have ideas to share. I am happy to just start with a string-to-llvm annotation and feel there could be some very low-hanging fruits, e.g. a (type-generic) SIMD implementation for lean?
Tobias Grosser (Jan 04 2023 at 17:23):
That allow for manual SIMDization.
Tobias Grosser (Jan 04 2023 at 17:24):
I feel adding a string-only LLVM annotation as a first step could be a nice start and we could improve on this incrementally. I would guess such a patch is < 100 LOC.
Reid Barton (Jan 04 2023 at 17:26):
(maybe this conversation could be split off into a new topic?)
Tobias Grosser (Jan 04 2023 at 17:27):
(yes, I don't know how to. This belongs into lean4 dev or at the least into a separate thread. I don't want to experiment to not break this long thread. Anybody knows how to split threads?)
James Gallicchio (Jan 04 2023 at 18:31):
I think you need admin permissions... @Sebastian Ullrich @Gabriel Ebner ? (saw you online; this discussion might also be of interest to the dev team)
Notification Bot (Jan 04 2023 at 20:34):
35 messages were moved here from #std4 > SAT core defs in Std? by Sebastian Ullrich.
Tobias Grosser (Jan 04 2023 at 21:07):
Thank you!
Last updated: Dec 20 2023 at 11:08 UTC