Zulip Chat Archive

Stream: lean4

Topic: Extern code depending on function argument


Tomas Skrivan (Nov 29 2021 at 15:15):

Is is possible to have external C definition of a function that depend on a function argument?
I'm imagining something like:

class CName (A : Type) where (cname : String)
instance : CName Float :=  "double"
open CName (cname)

@[extern s!"lean_inline_add_{cname A}", externMacro "LEAN_INLINE_ADD"]
def inline_add {A : Type} (a b : A) [Add A] : A := a + b

where LEAN_INLINE_ADD is C macro like:

#define LEAN_INLINE_ADD(TYPE)    lean_inline_add_ ## TYPE (TYPE a, TYPE b){ return a + b; }

This would require to potentially compile and link code during runtime. Is that possible?

Henrik Böving (Nov 29 2021 at 15:20):

I don't feel like this is that good of an idea even if it can work (which I don't believe it can? but feel free to convince me of the opposite)

Is there a reason you cannot solve this by having an add function of a specific instance of Add pointing at an extern macro? Regarding the inlining...I would probably rely on the C compiler to be smart enough to inline when it deems this correct for stuff like this.

Henrik Böving (Nov 29 2021 at 15:23):

Or do you actually have a more complex use case for this? @Tomas Skrivan

Tomas Skrivan (Nov 29 2021 at 15:29):

I want to create functions that can be executed on GPU.

The idea is to represent kernel code with some inductive type that will be created with something like Reflected. Then turn this kernel code representation to actual code.

Tomas Skrivan (Nov 29 2021 at 15:36):

A mock up of saxpy operation:

inductive GPUExpr where
  | input : Nat  GPUExpr
  | const : Float  GPUExpr
  | add : GPUExpr  GPUExpr  GPUExpr
  | mul : GPUExpr  GPUExpr  GPUExpr

@[extern s!"lean_gpu_map_{cname kernel}", externMacro "LEAN_GPU_MAP"]
constant map₂ (kernel : GPUExpr) : GPUArray  GPUArray  GPUArray

def saxpy_kernel (a : Float) := add (mul (const a) (input 0)) (input 1)

def saxpy (a : Float) (x y : GPUExpr) : GPUExpr := map₂ saxpy_kernel x y

Gabriel Ebner (Nov 29 2021 at 15:38):

You should make a Map2 type class and implement it for every kernel. (The implementation can then refer to a concrete lean_gpu_map_* function).

Gabriel Ebner (Nov 29 2021 at 15:39):

Lean doesn't monomorphize*, so there is no way the cname kernel can work.

Gabriel Ebner (Nov 29 2021 at 15:39):

(*) Well, it does. But that's only an optimization, and not guaranteed.

Tomas Skrivan (Nov 29 2021 at 15:40):

so there is no way the cname kernel can work

I do not understand this, cname just produces a string representation of the kernel/an unique name for it.

Tomas Skrivan (Nov 29 2021 at 15:42):

You should make a Map2 type class and implement it for every kernel. (The implementation can then refer to a concrete lean_gpu_map_* function).

Yes, I will fallback to do this if JIT compilation is not possible.

Reid Barton (Nov 29 2021 at 15:43):

I don't know much about writing code for GPUs but don't they basically have their own kind of phase distinction anyways? I mean it is not like Lean targets GPUs right?

Sebastian Ullrich (Nov 29 2021 at 15:44):

Personally I would prefer a Lean macro over a C macro here even with JIT :)

Tomas Skrivan (Nov 29 2021 at 15:45):

Yes it doesn't, that is why I have GPUExpr which is in some sense a subset of Lean.Expr.

Tomas Skrivan (Nov 29 2021 at 15:45):

Personally I would prefer a Lean macro over a C macro here even with JIT :)

Totally ok with that :)

Tomas Skrivan (Nov 29 2021 at 15:45):

And probably a much better idea.

Reid Barton (Nov 29 2021 at 15:46):

How does FFI to a C macro work? Does Lean wrap the macro in a function and link against that? Or emit the macro directly in generated C code?

Tomas Skrivan (Nov 29 2021 at 15:46):

Gabriel Ebner said:

Lean doesn't monomorphize*, so there is no way the cname kernel can work.

I think the mock up code is too simplistic and confusing. In the example, the type ofGPUExpr is always float. I would generate different kernels for int double etc. So GPUExpr should be already "monomorphized"

Tomas Skrivan (Nov 29 2021 at 15:58):

Ohh can macro invoke a compiler? Then I can write

map₂ (λ x y : Float => a * x + y) array1 array2

where map₂ is a macro that turns λ x y : Float => a * x + y into GPUExpr and defines appropriate instance for Map2(as @Gabriel Ebner suggested)

Gabriel Ebner (Nov 29 2021 at 15:59):

Macros can't, but you can with a custom elaborator which is almost the same thing.

Mac (Nov 29 2021 at 21:16):

Tomas Skrivan said:

so there is no way the cname kernel can work

I do not understand this, cname just produces a string representation of the kernel/an unique name for it.

The problem is that there is no guarantee that the value of kernel is known at compile time (which is when it needs to know what external function to dispatch to).

Tomas Skrivan (Nov 30 2021 at 11:04):

Mac said:

The problem is that there is no guarantee that the value of kernel is known at compile time (which is when it needs to know what external function to dispatch to).

I was thinking Lean might be able to compile and link at runtime. It looks like it is not the case, I'm ok with the custom elaborator solution for now(once I figure out how to do it :D).

Sometime ago, I wrote a plugin for one program so you can use C++ as a "scripting language", write snippet of C++ code, compile it, link it at runtime and run it without restarting the host application. I can imagine doing something similar with lean. Writing a function taking GPUExpr spitting out GPUCompiledExpr, that is implemented with lean_external_object that holds a pointer to the compiled kernel code. Then I can have a map function map : GPUCompiledExpr -> GPUArray -> GPUArray that can have a single external definition.


Last updated: Dec 20 2023 at 11:08 UTC