Zulip Chat Archive

Stream: lean4

Topic: numpy/jax etc?


Tom (Jul 27 2022 at 02:48):

I just tried asking on the "Is there code for X" stream and was referred here instead:

Are there any high-performance computing libraries focused on numerical N-D array and matrix computations, similar to e.g. numpy? Or even something more sophisticated like jax?

Hanting Zhang (Jul 27 2022 at 03:31):

@Arthur Paulino had NumLean going a while back. There's also SciLean by @Tomas Skrivan, and some more I'm probably missing.

Arthur Paulino (Jul 27 2022 at 03:33):

Yeah, although NumLean was more of an exercise to play with the FFI. If you're going to invest some time in learning, definitely go for SciLean

Hanting Zhang (Jul 27 2022 at 03:34):

Also you can find all the lean4 projects here and see if there's anything for you: https://github.com/topics/lean4

Tomas Skrivan (Jul 27 2022 at 06:36):

I also have a proof of concept for wrapping the Eigen C++ library, https://github.com/lecopivo/EigenLean . But in its current form it is just an FFI exercise.

SciLean is definitely not high performance right now. The current aim is to get powerful automatic/symbolic differentiation. It also has linear map transposition, called adjoint in SciLean. But there is nothing like vmap or pmap at the moment. Also, the interface for N-D tensors is still very rough; I can't get it right and keep rewriting it.

Tom (Jul 27 2022 at 21:11):

I would prefer to avoid Eigen for optimization, TBH. Expression templates in C++ - as clever as they are - are just a poor man's version of AST rewriting (and temporary elimination). I also personally prefer the interfaces of some of the other C++ numerical libs (e.g. blaze or xtensor) but will concede that Eigen is popular, battle-tested and it's partially down to personal taste.

That aside, I think it would make more sense to try and push all the heavy lifting/expression rewriting etc into Lean (this partially motivated my recent questions about Lean metaprogramming). I'd also love to be able to do some other things:

1) See if it's possible to leverage mathlib to help with proofs/simplifications in the expression rewriter (I've no experience with it yet but am learning quick!)
2) Write the library to be polymorphic at its core (e.g over the element type) so it could still be usable with N-D arrays over non-floats: Fields (e.g. rationals) and other non-float types (e.g. Unums?), and even just rings/modules. I'm still getting bootstrapped with Lean and the FFI but it seems that we could then specialize/optimize the implementation for specific types (i.e. Floats) down to CPU/GPU while keeping the general versions implemented in Lean code.
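A minimal sketch of what point 2 could look like, with a hand-rolled AddMulZero class standing in for a proper algebraic hierarchy (in practice one would reuse mathlib's Ring/Module classes); all names here are made up for illustration:

```lean
-- Hypothetical sketch: a generic kernel written against an ad-hoc algebraic
-- interface. Mathlib's Ring/Module classes would be the real choice.
class AddMulZero (α : Type) where
  add  : α → α → α
  mul  : α → α → α
  zero : α

-- Dot product over any element type with +, *, 0 -- works for Float,
-- rationals, or any ring-like type that provides an instance.
def dot [AddMulZero α] (xs ys : Array α) : α :=
  (xs.zip ys).foldl
    (fun acc (x, y) => AddMulZero.add acc (AddMulZero.mul x y))
    AddMulZero.zero

instance : AddMulZero Float := ⟨(· + ·), (· * ·), 0.0⟩

#eval dot #[1.0, 2.0, 3.0] #[4.0, 5.0, 6.0]  -- 1*4 + 2*5 + 3*6 = 32
```

The generic version then stays as the reference semantics, while a Float-specific implementation (e.g. backed by FFI/SIMD) could be swapped in with `@[implemented_by]`.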

Since I'm still learning the language and the ecosystem, I have a very limited view of what's available. I'm sure others much smarter than me have thought of this so any additional pointers to any prior art would be welcome!

Of course, eventually the ideas of a) chunked arrays and b) some sort of distributed runtime will inevitably come up, but I'm trying to just get my feet wet :smile:

Tomas Skrivan (Jul 28 2022 at 02:45):

Yes, these are some of the goals I'm trying to / will address with SciLean.

I have a binding to Eigen only because I'm familiar with it, and I'm effectively using it only as a wrapper around linear solvers. Expression templates aren't available anyway, as they work only at C++ compile time. I started using Lean exactly because I wanted to do more complicated optimizations/rewritings than expression templates allow, plus doing anything complicated with C++ templates makes you question your life.

When it comes to N-D arrays, my current approach revolves around types that are isomorphic to the type Index -> Value. For example, a : FloatArray of size n*m is isomorphic to a function of type Fin n × Fin m -> Float. At its core this is enabled by the isomorphism between Fin (n*m) and Fin n × Fin m. This generalizes to an arbitrarily complicated index type Index that you know how to flatten, i.e. that has an isomorphism with Fin n.
Nothing stops you from using the same approach with Array Value for an arbitrary type Value, or with ByteArray and a Value type that can be read from/written to bytes.
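That idea can be sketched in a few lines (hypothetical names, not SciLean's actual API): a flat array indexed through the flattening map (i, j) ↦ i*m + j and its inverse k ↦ (k / m, k % m):

```lean
-- Hypothetical sketch (not SciLean's API): a 2-D tensor as a flat array
-- viewed through the isomorphism Fin (n*m) ≃ Fin n × Fin m.
structure Tensor2 (n m : Nat) (α : Type) where
  data : Array α   -- invariant (not enforced here): data.size = n * m

-- Read entry (i, j) via the flattened index; i*m + j < n*m since i < n, j < m.
def Tensor2.get [Inhabited α] (t : Tensor2 n m α) (i : Fin n) (j : Fin m) : α :=
  t.data.get! (i.val * m + j.val)

-- Materialize a function of two indices into a flat array, using the
-- inverse map k ↦ (k / m, k % m) on the linear index.
def Tensor2.ofFn (n m : Nat) (f : Nat → Nat → α) : Tensor2 n m α :=
  ⟨Array.ofFn (n := n * m) fun k => f (k.val / m) (k.val % m)⟩

#eval (Tensor2.ofFn 2 3 fun i j => i * 10 + j).get 1 2  -- 12
```

The same skeleton works with FloatArray or ByteArray as the backing store; only `get`/`ofFn` change.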

Tomas Skrivan (Jul 28 2022 at 02:58):

Also with @Arthur Paulino we were working on a library, https://github.com/lecopivo/lean4-karray , that would allow you to take Lean code and turn it to C code such that you can generate array kernels executable on CPU or GPU.

But we both had priorities elsewhere so the project is on hold right now.

Mac (Jul 28 2022 at 06:19):

Tomas Skrivan said:

Also with Arthur Paulino we were working on a library, https://github.com/lecopivo/lean4-karray , that would allow you to take Lean code and turn it to C code such that you can generate array kernels executable on CPU or GPU.

But we both had priorities elsewhere so the project is on hold right now.

A shameless self plug, but have you and/or Arthur considered using Alloy to generate the C code for the KArray project?

Tomas Skrivan (Jul 28 2022 at 06:56):

It is definitely something I want to look into, but in some sense I don't see a reason why I should write the C code myself. I want to write Lean code and generate C code from it. In fact, even that Lean code will likely be generated from non-computable Lean code.
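A toy illustration of that direction (entirely made up, not KArray's actual design): a tiny pointwise expression type in Lean, compiled to the source of a C kernel by string assembly:

```lean
-- Toy example (not KArray's design): a pointwise expression turned into C.
inductive PExpr where
  | var                  -- the input element x[i]
  | lit (v : Float)
  | add (a b : PExpr)
  | mul (a b : PExpr)

def PExpr.toC : PExpr → String
  | .var     => "x[i]"
  | .lit v   => toString v
  | .add a b => s!"({a.toC} + {b.toC})"
  | .mul a b => s!"({a.toC} * {b.toC})"

-- Emit a full C function applying the expression to every element.
def emitKernel (name : String) (e : PExpr) : String :=
  s!"void {name}(const double* x, double* out, size_t n) \{\n" ++
  s!"  for (size_t i = 0; i < n; i++) out[i] = {e.toC};\n" ++
  "}\n"

-- e.g. x*x + 1.5
#eval IO.println <| emitKernel "sq_plus" (.add (.mul .var .var) (.lit 1.5))
```

Gluing C together as strings like this is fragile, which is where structured generation (e.g. via Alloy, as discussed below in the thread) becomes attractive.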

Tomas Skrivan (Jul 28 2022 at 07:09):

But I can definitely see some uses of Alloy inside KArray; for example, writeBytes and readBytes would be functions that you have to implement manually in C.

Mac (Jul 28 2022 at 08:09):

Tomas Skrivan said:

I want to write Lean code and generate C code based on that.

One thing I find neat about having the C embedded in Lean is that you can use Lean's metaprogramming facilities to generate C code, which can often be easier than piecing it together with strings.

Tomas Skrivan (Jul 28 2022 at 08:28):

Totally believe that, I will probably bug you about it once I get back to this :)


Last updated: Dec 20 2023 at 11:08 UTC