Zulip Chat Archive

Stream: lean4

Topic: a numerical lib for Lean 4


Arthur Paulino (Dec 08 2021 at 22:58):

Hi everyone! I'm building a lib for Lean 4 called NumLean which allows low-level matrix operations. Right now, it's doing this:

import NumLean

def main : IO Unit := do
  let id ← NLMatrix.id 5
  -- let ones ← NLMatrix.new 5 5 1
  let t : Tensor ← Tensor.new id >>= plusF 4 >>= plusF 6.0
  let m' : NLMatrix ← t.compute
  IO.println $ ← m'.toString
-- 11.0 10.0 10.0 10.0 10.0
-- 10.0 10.0  6.0  6.0  6.0 ← these 6's are bugs
-- 10.0 10.0 11.0 10.0 10.0
-- 10.0 10.0 10.0 11.0 10.0
-- 10.0 10.0 10.0 10.0 11.0

The problem is that some memory issue is happening and I can't spot it. The commented output shows some unintended 6's. Would appreciate some help on this one. This is the repo link for those who are interested: https://github.com/arthurpaulino/NumLean

Henrik Böving (Dec 08 2021 at 23:24):

Are you aware of the other efforts around tensor and matrix operations at the moment? @Arthur Paulino

Arthur Paulino (Dec 09 2021 at 00:05):

Henrik Böving said:

Are you aware of the other efforts around tensor and matrix operations at the moment? Arthur Paulino

Not quite. I'm treating this more as a practice exercise/experiment, as I'm really stretching my brain thinking in terms of usage design in Lean. I'm also getting to know Lean's FFI better in the process, and this knowledge can be useful for integrating more powerful stuff like CUDA someday.

I know @Tomas Skrivan has been working on something more high level, but he said that he's not worrying about performance/backend at the moment.

Do you know other projects around this subject?

Arthur Paulino (Dec 09 2021 at 00:15):

If I search on GitHub for "Lean" and "matrix" then filter results related to the Lean language, it shows two results:
https://github.com/search?l=Lean&q=matrix+lean&type=Repositories

Henrik Böving (Dec 09 2021 at 00:17):

Ah, the container thing from @Tomas Skrivan was indeed what I had in mind, yes. It would be very cool if the more performant API ended up supporting their abstraction, though.

Arthur Paulino (Dec 09 2021 at 00:20):

It can definitely be done, but for now I'm struggling with this (supposedly) simple task because I don't seem to be able to add floats (doubles, actually) to a matrix properly :smiley:

Arthur Paulino (Dec 09 2021 at 02:33):

Okay this is really strange. Probably a bug.

def main : IO Unit := do
  let id ← NLMatrix.id 5
  IO.println $ ← id.toString
  IO.println "-------------------------------"
  -- let ones ← NLMatrix.new 5 5 1
  let t : Tensor ← Tensor.new id >>= plusF 4 >>= plusF 6.0
  let m' : NLMatrix ← t.compute
  IO.println $ ← m'.toString

Produces:

1.0 0.0 0.0 0.0 0.0
0.0 1.0 0.0 0.0 0.0
0.0 0.0 1.0 0.0 0.0
0.0 0.0 0.0 1.0 0.0
0.0 0.0 0.0 0.0 1.0
-------------------------------
11.0 10.0 10.0 10.0 10.0
10.0 11.0 10.0 10.0 10.0
10.0 10.0 11.0 10.0 10.0
10.0 10.0 10.0 11.0 10.0
10.0 10.0 10.0 10.0 11.0

Which is correct. But the following code:

def main : IO Unit := do
  let id ← NLMatrix.id 5
  -- IO.println $ ← id.toString -- this commented line is the only difference
  IO.println "-------------------------------"
  -- let ones ← NLMatrix.new 5 5 1
  let t : Tensor ← Tensor.new id >>= plusF 4 >>= plusF 6.0
  let m' : NLMatrix ← t.compute
  IO.println $ ← m'.toString

Produces:

-------------------------------
11.0 10.0 10.0 10.0 10.0
10.0 11.0 10.0 10.0 10.0
10.0 10.0 11.0 10.0 10.0
10.0 10.0 10.0 11.0 10.0
10.0 10.0 10.0  6.0  6.0

Which is incorrect. Any clue what might be happening here?

Mario Carneiro (Dec 09 2021 at 03:34):

Your pointer arithmetic is wrong here:

internal void set_val_(nl_matrix* m, uint32_t i, double v) {
    *(m->data + i * sizeof(double)) = v; // this product may overflow
}

m->data is already a double* pointer, so advancing by i already skips i doubles. So advancing by i * sizeof(double) is double counting.

Mario Carneiro (Dec 09 2021 at 03:35):

protip: bounds check your pointer accesses

Arthur Paulino (Dec 09 2021 at 03:36):

Thank you so much! I'm gonna take a look at it tomorrow :pray:

Mario Carneiro (Dec 09 2021 at 03:37):

is there a reason you aren't just writing m->data[i] = v?

Arthur Paulino (Dec 09 2021 at 03:40):

Not at all. Just rusty pointer arithmetic skills :D

Arthur Paulino (Dec 09 2021 at 03:40):

No idea why I thought that regular indexing wouldn't work

Tomas Skrivan (Dec 09 2021 at 07:27):

Great! I would be really happy if someone else gets low level API for matrices working, so I don't have to do it :)

I have also started thinking about this a little bit. My current plan for how to approach it:

  1. on CPU just use ByteArray, on GPU write a custom GPUByteArray (that is just a chunk of memory sitting in GPU memory)
  2. Define classes CReflected (X : Type) and GPUReflected (X : Type) that provide functions to read/write X from/to a ByteArray. These functions will be mainly used for consistency and not for performance.
  3. Define CArray (X : Type) and GPUArray (X : Type), just wrappers around ByteArrays making sure the buffer size is a multiple of sizeof X (see the sketch after this list).
  4. Implement something like CArray.mapIdx (kernel : CExpr (UInt64 -> X -> Y)) (arr : CArray X) : CArray Y, where CExpr T is a subtype of Lean.Expr that is compilable to the specific target, CPU or GPU. I'm still not sure how to do this; maybe you have to define a new mapIdx for every kernel: write a custom elaborator that takes syntax, gets an expression, checks that it can be compiled to the specific target, generates code, compiles it, and then defines an appropriate version of mapIdx with a specific 'extern' attribute.
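
For concreteness, a minimal sketch of what points 2 and 3 could look like on the CPU side (illustrative names only, not code from any existing repo; the GPU variants would mirror it):

class CReflected (X : Type) where
  byteSize   : Nat                             -- sizeof X on the target
  readBytes  : ByteArray → (offset : Nat) → X  -- decode an X at a byte offset
  writeBytes : ByteArray → (offset : Nat) → X → ByteArray

-- wrapper around ByteArray whose size is a multiple of sizeof X
structure CArray (X : Type) [inst : CReflected X] where
  data   : ByteArray
  sizeOk : data.size % inst.byteSize = 0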

Tomas Skrivan (Dec 09 2021 at 13:14):

For reference, in this thread I asked if the extern code can depend on function arguments. In this case the code would depend on kernel. It can't, and the conclusion is to write a custom elaborator that would generate a new mapIdx for every kernel.

Tomas Skrivan (Dec 09 2021 at 13:25):

Also I have some half-baked code that defines a structure Kernel specifying basic elements for a target you want to compile down to: C, CUDA, OpenCL ... These core elements of Kernel are:

  1. ByteArray type
  2. internal types like float, double, int, ... and pointers to const data like float const*
  3. internal n-ary functions like float fadd(float a, float b), or 0-ary functions like nullptr

Tomas Skrivan (Dec 09 2021 at 13:25):

The definition of Kernel:

structure Kernel where
  KByteArray : Type
  byteArraySize : KByteArray → Nat
  byteArrayRead : (arr : KByteArray) → (i : Fin (byteArraySize arr)) → UInt8
  malloc : (size : Nat) → KByteArray
  malloc_size : ∀ n, byteArraySize (malloc n) = n

  KType : Type
  void : KType
  ptr : KType → KType       -- internal way to talk about buffers
  typeDec : DecidableEq KType
  typeName : KType → String
  typeSize : KType → Nat    -- size in bytes

  KFun : Type
  funDec : DecidableEq KFun
  funName : KFun → String
  funArgTypes : KFun → Array KType
  funOutType  : KFun → KType

  null : KFun
  null_is_ptr   : funOutType null = ptr void
  null_is_const : (funArgTypes null).size = 0

  -- Execute a function
  -- Stack all inputs in the given order into a byte array and produce the output
  -- This form is mainly used to prove stuff and provide a basic runtime
  -- However it is not designed for speed!
  execute : KFun → KByteArray → KByteArray
  -- Just to make sure the output has the right size
  execute_output : ∀ f input, byteArraySize (execute f input) = typeSize (funOutType f)

Tomas Skrivan (Dec 09 2021 at 13:29):

Then an expression for a specific kernel, KExpr, is just a subset of Lean.Expr. It is assumed that the target is not a functional programming language, so there is no lambda abstraction. However, each expression can have bound variables so that, as a whole, it can be treated as a function.

  inductive KExpr (k : Kernel) where
    | bvar  : (i : Nat) → KExpr k
    | const : (f : KFun k) → KExpr k
    | app   : (f : KExpr k) → (arg : KExpr k) → KExpr k

Tomas Skrivan (Dec 09 2021 at 13:32):

To communicate between Lean types and KType there is a class ReflectedType

  class ReflectedType (k : Kernel) (α : Type) extends Inhabited α where
    t : KType k
    readBytes  : (KBuffer k t.sizeof) → α
    writeBytes : (KBuffer k t.sizeof) → α → KByteArray k
    --- and some compatibility statements between read and write

that allows you to read and write Lean types from/to a KByteArray; (KBuffer k t.sizeof) is just a KByteArray that has at least t.sizeof bytes.

Tomas Skrivan (Dec 09 2021 at 13:36):

Now I would be interested in how to write a custom elaborator that, for example, transforms fun x : Float => 3*x + sin(x) to the appropriate kernel : KExpr and generates code for CArray.map kernel.
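
Just to make the target concrete, here is roughly the KExpr such an elaborator would need to produce for fun x : Float => 3*x + sin(x), building on the Kernel and KExpr definitions above (fadd, fmul, fsin and lit3 are hypothetical kernel constants, not real definitions):

variable (k : Kernel) (fadd fmul fsin lit3 : k.KFun)

-- `bvar 0` is the bound variable x; application is curried, as in Lean.Expr
example : KExpr k :=
  .app (.app (.const fadd)
             (.app (.app (.const fmul) (.const lit3)) (.bvar 0)))
       (.app (.const fsin) (.bvar 0))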

Arthur Paulino (Dec 09 2021 at 14:27):

@Mario Carneiro You really spotted the root cause of the issues. It works nicely now! :tada:

Tomas Skrivan (Dec 09 2021 at 14:32):

Also certigrad did some bindings to Eigen. It is in Lean 3, but maybe some inspiration can be taken from there. I have never studied the source code though.

Arthur Paulino (Dec 09 2021 at 14:44):

Hm, let me explain what I'm trying to do and then you can tell me the flaws and holes in my approach. I think this will help me better understand what you're aiming for.

I've created a package that can perform matrix operations in C from Lean 4: matrix instantiation, transposing, addition/multiplication with scalars and other matrices. In this package, I also coded a Tensor, which is an abstraction for concatenating matrix operations. A Tensor has a matrix at its head and a list of operations. Then, when you call Tensor.compute, it checks dimension consistency across the operations and, if everything is fine, performs them, returning an NLMatrix.
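
For reference, a rough sketch of that shape (hypothetical names, not the actual NumLean definitions; NLMatrix is declared with the usual opaque/NonemptyType FFI pattern here only so the sketch stands alone):

-- stand-in for the C-backed matrix type
opaque NLMatrixPointed : NonemptyType
def NLMatrix : Type := NLMatrixPointed.type

-- queued operations (illustrative)
inductive MatOp
  | plusF  (f : Float)     -- add a scalar to every entry
  | timesF (f : Float)     -- scale every entry
  | plusM  (m : NLMatrix)  -- add another matrix
  | timesM (m : NLMatrix)  -- multiply by another matrix

structure Tensor where
  head : NLMatrix    -- the starting matrix
  ops  : List MatOp  -- operations that `compute` validates and then runs in C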

Arthur Paulino (Dec 09 2021 at 14:46):

It happens here

Tomas Skrivan (Dec 09 2021 at 14:55):

I think your approach is fine, it just involves too much custom C code for my liking. I want to do as little work in C as possible. Also, I want to take the "expression template" approach to get as much speed as possible. This requires compiling a new C function for each expression, which is infeasible to do by hand, so some automation is required.

Arthur Paulino (Dec 09 2021 at 14:57):

Can you show me some exemplary use cases for the API you want to provide?

Tomas Skrivan (Dec 09 2021 at 15:08):

Maybe something like this:

def saxpy (s : Float) (x y : Vector) := Vector.map2Idx (compile (fun xi i y' => s*xi + y'[i])) x y

where Vector is just a wrapper around ByteArray on a specific target (e.g. CPU or GPU). The (compile expr) has to be a macro/custom elaborator that compiles an expression to some kernel function that holds a pointer to a linked shared library. The function Vector.map2Idx has signature Vector.map2Idx (k : CompiledKernel) (vectorToMutate : Vector) (vectorToRefer : Vector) : Vector

Then I would also write a custom macro optimize that would turn optimize (s*x+y) to Vector.map2Idx (compile (fun xi i y' => s*xi + y'[i])) x y

Tomas Skrivan (Dec 09 2021 at 15:15):

My high-level tensor interface already turns saxpy s x y into intro (λ i => s*x[i] + y[i]) (code). It is halfway there; I just do not know how to decide whether x or y should be mutated in place, and turn intro (λ i => s*x[i] + y[i]) into x.mapIdx (λ xi i => s*xi + y[i]) or y.mapIdx (λ yi i => s*x[i] + yi). You don't really have access to the reference counts of x or y to make this decision automatically, so I'm not sure what to do.

Tomas Skrivan (Dec 09 2021 at 15:17):

@Arthur Paulino Why are your operations on matrices inside of IO monad?

Arthur Paulino (Dec 09 2021 at 15:21):

I've just updated the code, let's use this version for reference

Tomas Skrivan (Dec 09 2021 at 15:23):

Ohh, I think your Tensor is what I call an "expression template". It is just some representation of operations on matrices. I find the name Tensor super confusing.

Arthur Paulino (Dec 09 2021 at 15:25):

I put everything inside the IO monad because, in previous versions, the code could break in C. But now that I'm checking for consistency before running the computations, I guess I don't need to do that anymore.

Arthur Paulino (Dec 09 2021 at 15:25):

Because now I can assume that, for instance, matrices A and B have the same dimensions when I sum them in C.

Tomas Skrivan (Dec 09 2021 at 15:27):

The whole beauty of Lean and dependent types is exactly that you can have dimensions of matrices in types, so you can't accidentally add or multiply matrices with incorrect dimensions.

Tomas Skrivan (Dec 09 2021 at 15:28):

Also, the type NLMatrix just can't form a vector space. You need dimensions in the type. And if NLMatrix can't be a vector space, then I don't see the point in using Lean.

Tomas Skrivan (Dec 09 2021 at 15:29):

But of course, you can just create a subtype of NLMatrix that has a proof about dimensions :)

Arthur Paulino (Dec 09 2021 at 15:32):

Right, my code is currently limited to . First, I wanted to make sure I would be able to make the data pipeline work back and forth (Lean -> C -> Lean)

Arthur Paulino (Dec 09 2021 at 15:33):

And maaan, did I learn in the process :smiley:

Tomas Skrivan (Dec 09 2021 at 15:34):

And huge thanks for figuring it out! I was scared of that and didn't attempt doing so yet :grinning_face_with_smiling_eyes:

Tomas Skrivan (Dec 09 2021 at 15:34):

I will be stealing some parts of that setup for sure :)

Arthur Paulino (Dec 09 2021 at 15:37):

The way that I setup my design is such that Lean has no idea about the dimensions of matrices and always has to ask C for such (although always in )

Arthur Paulino (Dec 09 2021 at 15:37):

So the low level implementation is the real owner of the data

Arthur Paulino (Dec 09 2021 at 15:39):

Your approach is better and more mature: Lean dictates the rules and then the low-level implementation has to accommodate what's being requested by the user.

Arthur Paulino (Dec 09 2021 at 15:48):

BTW your setup is pushing ~ files to GitHub like this and this, just in case you haven't noticed

Tomas Skrivan (Dec 09 2021 at 15:49):

I feel like taking your approach can get really ugly really fast. Do you pick row major or column major matrices? How are you going to deal with symmetric, anti-symmetric, diagonal, block matrices? Do you want to write custom C code for every type? And custom code for every pair-wise interaction?

Tomas Skrivan (Dec 09 2021 at 15:50):

Arthur Paulino said:

BTW your setup is pushing ~ files to GitHub like this and this, just in case you haven't noticed

Yeah, sometimes I accidentally do that :( I would add *~ to gitignore

Tomas Skrivan (Dec 09 2021 at 15:51):

fixed :wink:

Tomas Skrivan (Dec 09 2021 at 15:53):

Anyway, I do not want to completely deter you from your approach. I would actually be quite interested in how far you can get: what will work well and what won't.

Arthur Paulino (Dec 09 2021 at 15:58):

The raw data is just an array of doubles (in contrast to arrays of arrays of doubles). I started this simple because I thought it would be easier to plug in CUDA or some other lib

Tomas Skrivan (Dec 09 2021 at 16:00):

But this (m->data[j + i * m->n_cols]) makes it a row-major matrix.

Arthur Paulino (Dec 09 2021 at 16:03):

Does it though? It's just a computation to get the real index. It doesn't impose serious restrictions like when you have to do indexing of indexing

Tomas Skrivan (Dec 09 2021 at 16:05):

Well, you are using it in matrix multiplication, and you will be using matrix multiplication in other functions, and that way you will more or less commit to row-major matrices.

Arthur Paulino (Dec 09 2021 at 16:12):

Ah, I see where the row-major commitment is now. It's in the fact that the data is laid out in the array as the sequence of rows unfolds.

Tomas Skrivan (Dec 09 2021 at 16:13):

Yup :)

Tomas Skrivan (Dec 09 2021 at 16:15):

That is why I want to implement only the ByteArrays externally, in C/C++/CUDA/OpenCL, and do all of the other logic at the Lean level.

Arthur Paulino (Dec 09 2021 at 16:29):

I'm gonna read your full code (now that I know where it is :D) and try to understand your reasoning

Arthur Paulino (Dec 09 2021 at 18:03):

@Tomas Skrivan do you have this line of thought organized/documented somewhere?

Tomas Skrivan (Dec 09 2021 at 18:11):

Let me clean it up and I will set up a repo.

Arthur Paulino (Dec 09 2021 at 18:13):

I am very interested in it and I want to understand more details about it. For instance, what is the role of ByteArrays in your approach? Wouldn't it cause a translation-overhead slowdown (as opposed to communicating with C directly via instances of Array Float)?

Arthur Paulino (Dec 09 2021 at 18:14):

However, I still haven't figured out how to translate an array of double in C to an instance of Array Float in Lean. But I am quite sure it can be done.

Mario Carneiro (Dec 09 2021 at 18:17):

What you have done is basically reimplement ByteArray, or rather DoubleArray

Mario Carneiro (Dec 09 2021 at 18:17):

Lean has a few unboxed array types that look like this

Mario Carneiro (Dec 09 2021 at 18:18):

Array Float is a boxed array type, you won't get C-like performance with this data structure

Arthur Paulino (Dec 09 2021 at 18:20):

Doesn't this function return the pointer to the double array's head?
static inline double * lean_float_array_cptr(b_lean_obj_arg a)
Maybe that's why I am facing these issues?

Arthur Paulino (Dec 09 2021 at 18:35):

Eh, never mind. I just tried to access data from an Array Float in C and it didn't work the way I expected, so ByteArrays do seem to be the way to go.

Mac (Dec 09 2021 at 19:02):

@Arthur Paulino there is FloatArray though for unboxed C arrays of double.
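
For example, the Lean side of a binding that takes a FloatArray could look like this (sumFloats and the C symbol nl_sum_floats are made up; on the C side you would grab the raw double* with lean_float_array_cptr, as mentioned above):

@[extern "nl_sum_floats"]
opaque sumFloats (xs : @& FloatArray) : Float  -- @& passes the array as a borrowed reference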

Arthur Paulino (Dec 09 2021 at 19:07):

Ah!! Thanks! I got confused and thought I'd have to use Float Array in Lean, but it's FloatArray!

Tomas Skrivan (Dec 09 2021 at 19:44):

OK, I have put my sketch of the code into a repo: https://github.com/lecopivo/lean4-karray

The idea is to provide a generic array of unboxed values of type α, called KArray k α, where k : Kernel specifies whether it is CPU/GPU or whatever other memory.

Tomas Skrivan (Dec 09 2021 at 19:49):

I guess start by reading KArray.lean and look up definitions in Kernel.lean as they come up.

Arthur Paulino (Dec 09 2021 at 19:51):

Do you prefer PRs from a fork or from non-protected branches (if you invite me)?

Tomas Skrivan (Dec 09 2021 at 19:52):

I don't know :D I'll just invite you and give you full access.

Arthur Paulino (Dec 09 2021 at 19:53):

Sure, just add a license first and I will be more comfortable

Tomas Skrivan (Dec 09 2021 at 19:54):

License suggestion? MIT?

Arthur Paulino (Dec 09 2021 at 19:54):

MIT or Apache 2.0 should be fine

Tomas Skrivan (Dec 09 2021 at 19:59):

I'm a fan of beerware :D OK, jokes aside, MIT it is then.

Kevin Buzzard (Dec 09 2021 at 23:19):

I think that in general in the community we've licensed projects using the same license as Lean itself (i.e. Apache), but I don't know the legal reasons for this.

Anne Baanen (Dec 10 2021 at 09:52):

Using the same license as Lean itself makes it easy to move code from/to core Lean. Big corporations with lots of lawyers (read: Microsoft) tend to prefer Apache over MIT since it's more precise, and addresses other forms of intellectual property including patents. So I would advocate Apache for your project if it's not too late :)

Tomas Skrivan (Dec 10 2021 at 10:06):

I'm ok switching it to Apache @Arthur Paulino ok with you too?

Arthur Paulino (Dec 10 2021 at 11:52):

Sure

Anders Christiansen Sørby (Dec 10 2021 at 12:24):

Cool project. I added a Nix flake build: https://github.com/arthurpaulino/NumLean/pull/1

Anders Christiansen Sørby (Dec 10 2021 at 12:26):

I've wanted to do something like this too, but Lean itself is quite fast, so much of the code can be written in Lean as well.

Anne Baanen (Dec 10 2021 at 12:36):

Arthur Paulino said:

Sure

Thank you! :heart:

Anders Christiansen Sørby (Dec 10 2021 at 12:56):

@Tomas Skrivan I added one for KArray too if you want it :smile: https://github.com/lecopivo/lean4-karray/pull/2

Anders Christiansen Sørby (Dec 10 2021 at 12:58):

Having a GPU library for Lean would be gold :heart_of_gold:

Chris Lovett (Jan 07 2022 at 03:31):

Here's another simple Matrix C++ implementation; it also has vector and tensor types and supports different layouts. It would be interesting to compare. This implementation can also be optimized with an OpenBLAS backend and can be code-generated to optimized bitcode using LLVM. One could imagine a Lean implementation of such things that is compiled by an LLVM-based Lean compiler to achieve the same thing, but with all the proofs you want to ensure the code is correct.

Joe Hendrix (Sep 08 2022 at 17:15):

I was curious about the status of this (and any other linear algebra projects) in Lean. Are any numerical libraries still being worked on?

I have some primitives (BitVecs, ByteVec, polymorphic vectors and polymorphic matrices) in my crypto side project.
Mathlib seems to define matrices as functions. I think this is great for proving, but for computation we may want a variety of largely isomorphic representations of transforms.

I think it'd be great to support a variety of different approaches with different tradeoffs (in trust, performance, hardware requirements), but allow theorems to be soundly shared.

Thoughts on whether convergence to a single representation is feasible or practices to support reuse across implementations?

Mario Carneiro (Sep 08 2022 at 17:46):

My general sense on this topic is that it's fairly difficult to construct a computational type without a specific application in mind. The mathlib definition should be viewed as the noncomputable "ideal" definition, while specific applications will have some other bespoke type with the desired properties, and then you can prove isomorphism to the ideal definition and transfer all the theorems across that isomorphism

Tomas Skrivan (Sep 08 2022 at 17:50):

(deleted)

Mario Carneiro (Sep 08 2022 at 17:51):

We could have a few specific classes of computational matrices (e.g. a 2D Array A, a ByteArray encoding a bitvector, a FloatArray, though that one isn't a ring) in a general library, but it's easy to miss the mark if you don't know the characteristics of the application.

Kevin Buzzard (Sep 08 2022 at 17:51):

For matrices doesn't one typically have several implementations for computing, depending for example on whether you know your matrix is sparse or not? I'm a bit unclear about how one is supposed to handle this kind of thing in Lean 4.

I'm also particularly interested in this kind of question at the minute, because in a few days I'll be (virtually) attending a conference at AIM on "Computational mathematics in computer assisted proofs" and someone pointed out to me that "Lean 4" might be a potential answer to a lot of questions that people have. Here's the participant list

http://admin.aimath.org/resources/compproofsv/participantlist/

and as you can see from the statements of interest, several people attending would probably want to hear what's going on in the Lean 4 community in this regard.

Tomas Skrivan (Sep 08 2022 at 18:09):

Yes I'm working on this on and off. I went through multiple iterations and currently the main interface is done through the relatively new GetElem typeclass.


Interface

For example, to get an n × m matrix I would provide this instance:

  instance {n m} : GetElem {a : FloatArray // n * m = a.size} (Fin n × Fin m) Float (λ _ _ => True)
    := λ a ij _ => a.1[(Enumtype.toFin ij).1]!

where Enumtype is a typeclass that provides an isomorphism between a finite type and Fin _. In this case, Fin n × Fin m ≈ Fin (n*m). The type Fin n × Fin m has lexicographical ordering, so you naturally get a row-major matrix.
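
As a sanity check on the row-major claim, the flattening induced by the lexicographic ordering is just the usual offset computation (hypothetical helper, not part of the actual library):

def rowMajorIndex {n m : Nat} (i : Fin n) (j : Fin m) : Nat :=
  i.val * m + j.val

#eval rowMajorIndex (n := 2) (m := 3) 1 2  -- 5, the row-major offset of entry (1, 2)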

I have a couple of other typeclasses for setting and modifying elements and for introducing a new container:

class SetElem (Cont : Type u) (Idx : Type v) (Elem : outParam (Type w)) where
  setElem : (x : Cont) → (i : Idx) → (xi : Elem) → Cont

class ModifyElem (Cont : Type u) (Idx : Type v) (Elem : outParam (Type w)) where
  modifyElem : (x : Cont) → (i : Idx) → (Elem → Elem) → Cont

class HasIntro {X Y} (T : Type) [FunType T X Y] where
  intro : (X → Y) → T
  toFun_intro : ∀ f x, (intro f)[x] = f x

Maybe SetElem should also have a (dom : outParam (cont → idx → Prop)) parameter. For example, for a diagonal matrix you can modify only the diagonal elements. I have to think about it a bit more, as it should also support sparse matrices.

You can see the code here.
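
To illustrate the intended shape, hypothetical instances of these classes for a flat FloatArray could look like this (not the actual library code):

instance : SetElem FloatArray Nat Float where
  setElem x i xi := x.set! i xi

instance : ModifyElem FloatArray Nat Float where
  modifyElem x i f := x.set! i (f (x.get! i))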

To get notation like x[i] += ... in do blocks, you can do:

syntax atomic(Term.ident) noWs "[" term "]" " += " term : doElem
macro_rules
| `(doElem| $x:ident[ $i:term ] += $xi) => `(doElem| $x:ident := modifyElem ($x:ident) $i (λ val => val + $xi))

Carrier

What should the Cont type be? I mostly use FloatArray. I also have a wrapper for C++ Eigen matrices, but it is just a proof of concept and I haven't touched it in a long time, as I'm mostly focusing on symbolic computations.

Tomas Skrivan (Sep 08 2022 at 18:11):

My main worry with this approach is that the index type for matrices Fin n × Fin m is using polymorphic product which boxes its elements, so the final code can be slow if the boxing is not eliminated.

Mario Carneiro (Sep 08 2022 at 18:17):

Tomas Skrivan said:

My main worry with this approach is that the index type for matrices Fin n × Fin m is using polymorphic product which boxes its elements, so the final code can be slow if the boxing is not eliminated.

That shouldn't be a problem as long as everything in sight is marked @[inline]; Lean can remove stuff like projections on a pair, so as long as the user code passes arr[(i, j)] I would expect it to be untupled in the generated code.
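
For instance, a helper along these lines (hypothetical, not from either library) is the kind of thing that should untuple cleanly once inlined:

@[inline]
def getIJ (a : FloatArray) (m : Nat) (ij : Nat × Nat) : Float :=
  a.get! (ij.1 * m + ij.2)  -- row-major lookup; the pair can be erased after inlining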

Tomas Skrivan (Sep 08 2022 at 18:21):

Yes, that is my hope: that it can be eliminated with sufficient inlining. Also, there should be some kind of compile-time simplifier that might help with this too.

Tomas Skrivan (Sep 08 2022 at 19:39):

Joe Hendrix said:

Thoughts on whether convergence to a single representation is feasible or practices to support reuse across implementations?

We definitely need multiple representations. Just for matrices we need row-major/column-major, triangular, block, and diagonal matrices; also, is it for CPU, GPU, or distributed across machines?
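
Purely as an illustration of that variety (made-up constructors, not a design proposal):

inductive MatrixRepr where
  | denseRowMajor   (data : FloatArray)
  | denseColMajor   (data : FloatArray)
  | diagonal        (diag : FloatArray)            -- square case
  | upperTriangular (packed : FloatArray)          -- packed triangular storage
  | block           (blocks : Array (Nat × Nat × FloatArray))  -- (rows, cols, data) per block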

James Gallicchio (Sep 08 2022 at 21:08):

Are there any well-established libraries for matrix operations that we can shamelessly grift an API from?

James Gallicchio (Sep 08 2022 at 21:10):

I don't know that we need to care about the efficiency of stuff like GetElem-esque operations, since performance-sensitive matrix code should be exclusively using large-step operations like matrix mult/add/sub, factorization, etc...

Joe Hendrix (Sep 08 2022 at 21:22):

@Tomas Skrivan I like the typeclasses. It seems like the ModifyElem/SetElem typeclasses would be worth moving into Lean or Mathlib. For my current purposes, it'd also be great to have comprehensions (including parallel comprehensions) generalized to support other types such as vectors/matrices.

I think the approach I'll take is to show my implementation types and definitions are isomorphic to the Mathlib definitions.

Tomas Skrivan (Sep 08 2022 at 21:37):

James Gallicchio said:

Are there any well-established libraries for matrix operations that we can shamelessly grift an API from?

That is exactly what I want to do by wrapping the C++ Eigen library. It is a fairly popular linear algebra library in C++ with a nice collection of dense and sparse solvers. Someone just needs to sit down and write all the wrapper functions. So far I have written only a single wrapper for LDLT to demonstrate how to do it.

Tomas Skrivan (Sep 08 2022 at 21:40):

James Gallicchio said:

I don't know that we need to care about the efficiency of stuff like GetElem-esque operations, since performance-sensitive matrix code should be exclusively using large-step operations like matrix mult/add/sub, factorization, etc...

My hope is that if you effectively write C code with Lean, then equivalent C code will be generated, maybe with some special annotations to the compiler or something. But it's definitely not something I'm too worried about in the near future.


Last updated: Dec 20 2023 at 11:08 UTC