Zulip Chat Archive
Stream: general
Topic: SciLean: scientific computing with Lean
Tomas Skrivan (Jan 15 2022 at 17:10):
For quite some time I have been working on a library that would allow writing scientific computing code in Lean. As Lean can formalize mathematics, I think it is an ideal programming language for math-heavy code such as physics simulation or machine learning: either as much more powerful glue code, like Python; or leveraging its metaprogramming capabilities, like Julia; or, since it already understands lots of mathematics, why not turn Lean into a computer algebra system, like Mathematica?
I have finally arrived at a point where it makes sense to share my code.
Currently, I have only one working example worth mentioning, a simulation of the wave equation; full code:
The wave equation can be defined through its Hamiltonian:
def H (m k : ℝ) (x p : ℝ^n) :=
  let Δx := (1 : ℝ)/(n : ℝ)
  (Δx/(2*m)) * ∥p∥² + (Δx * k/2) * (∑ i, ∥x[i] - x[i-1]∥²)
(it is already discretized in space)
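For reference, simulating a Hamiltonian system means integrating Hamilton's equations (standard notation, not SciLean syntax), which is presumably what HamiltonianSystem (H n m k) below packages up as the right-hand side handed to the ODE solver:
ẋ = ∂H/∂p,   ṗ = −∂H/∂x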
The simulation of such a Hamiltonian system with Runge-Kutta 4 can be implemented with the following code:
def solver (m k : ℝ) (steps : Nat) : Impl (ode_solve (HamiltonianSystem (H n m k))) :=
by
  -- Unfold Hamiltonian definition and compute gradients
  simp[HamiltonianSystem, H]
  autograd
  autograd
  -- Apply RK4 method
  rw [ode_solve_fixed_dt runge_kutta4_step]
  lift_limit steps "Number of ODE solver steps."; admit; simp
  finish_impl
Roughly speaking:
- simp[HamiltonianSystem, H]; autograd; autograd expands definitions and performs symbolic differentiation
- rw [ode_solve_fixed_dt runge_kutta4_step] replaces the noncomputable function ode_solve with the Runge-Kutta 4 integration scheme (a sketch of a single RK4 step is given below)
- lift_limit steps "Number of ODE solver steps."; admit; simp says we are computing the specification only in an approximate sense and we choose steps integration steps
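For readers unfamiliar with the scheme, here is a minimal sketch of what a single RK4 step computes, written for a scalar state over plain Float; the name rk4Step is made up for illustration and this is not SciLean's runge_kutta4_step:
-- one classical Runge-Kutta 4 step for x' = f t x with step size dt (toy scalar version)
def rk4Step (f : Float → Float → Float) (t x dt : Float) : Float :=
  let k1 := f t x
  let k2 := f (t + dt/2) (x + dt/2 * k1)
  let k3 := f (t + dt/2) (x + dt/2 * k2)
  let k4 := f (t + dt)   (x + dt * k3)
  x + dt/6 * (k1 + 2*k2 + 2*k3 + k4)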
This then produces this animation
wave.gif
See the readme file for a more detailed explanation. (It assumes you do not know Lean, so some of it might sound silly.)
The current state of the library is that I have a solid base which is capable of automatically proving that a function is smooth, linear, or has an adjoint, and it can also automatically compute differentials, adjoints and gradients. To a very limited degree it can also automatically invert functions, but that is currently only used in computing adjoints of expressions involving sums, as you sometimes need to reindex sums.
There are tons of directions I would like to explore:
- Variational calculus: The above example shows the wave equation already discretized in space. I want to start with the continuous version of the Hamiltonian and choose the space discretization in a similar fashion to how I picked the time discretization with RK4. I have some rudimentary results with the functional derivative. For this purpose I have a smooth/linear lambda abstraction, i.e. you can write λ (x : X) (r : ℝ) ⊸ r*x, where the ⊸ indicates it is a linear function in every argument. Writing λ (r : ℝ) ⊸ r*r produces an error as the function is not linear. example code
- Numerical Algebra: I have already shared some of the results here; it mainly provides a high-level interface for array/tensor-like objects. Together with @Arthur Paulino we are working on a library that would allow somewhat automatically generating C code for unboxed arrays like FloatArray. Hopefully this will also extend to GPUs. Wrapping existing linear solvers is also highly desirable.
- Symbolic manipulation with polynomials: I want to do some computations with polynomials, differential forms and tensor products. One big result I want is a computational isomorphism between polynomials in n+1 variables and polynomials in one variable with values in polynomials in n variables, i.e. 𝓟[ℝⁿ⁺¹, ℝ] ≅ 𝓟[ℝ, 𝓟[ℝⁿ, ℝ]]. This way, I can easily get the Horner form of a polynomial (crucial for fast evaluation) in several variables just from the Horner form of a polynomial in one variable. Some incoherent experimentation in this direction is here.
- (semi)Prismatic sets: In physics simulation, meshes are everywhere, most commonly triangular or simplicial meshes. I want to have a unified treatment of these and want to include meshes with quads, hexes, prisms and pyramids. I call these prisms and they can be generated by a point and two operations: cone and prod. I have an inductive definition here; for example, a line segment is cone point and a square is the product of two segments, prod (cone point) (cone point). The inclusion map of one Prism into another is represented by the inductive type Prism.Face and composition is Face.ofFace. This forms a direct analog of the simplex category (containing only inclusions). So I would like to define (semi)prismatic sets as presheaves over the Prism category. This should be the core tool for dealing with meshes, on top of which I can start building finite element/difference/volume code. (A toy sketch of the inductive definition is given after this list.)
- Machine learning: I have defined the code for forward and reverse mode differential operators and their core simplification identities (forward and reverse). These identities express nothing more than the functoriality of the (co)tangent map between manifolds. The task is to generalize the identities I have for the normal differential to these two differential operators. Then we can hopefully unleash it on my mockup of VGG and hopefully get back-propagation.
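To make the prism idea above concrete, here is a toy inductive type written purely for illustration; it is not SciLean's actual Prism (which also carries Prism.Face, Face.ofFace, etc.):
-- toy version: prisms are generated from a point by coning and taking products
inductive Prism where
  | point : Prism
  | cone  : Prism → Prism
  | prod  : Prism → Prism → Prism

def segment  : Prism := .cone .point          -- a line segment is cone point
def triangle : Prism := .cone segment         -- a triangle is a cone over a segment
def square   : Prism := .prod segment segment -- a square is a product of two segments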
I'm happy to hear any comments or suggestions, and if anyone is interested in any of these topics I'm happy for any helping hand :)
However, the thing I should currently focus on is probably optimizing the order/priority of all the simp lemmas and type class instances, as timeouts are sometimes really bad and I'm constantly running into infinite loops.
Yaël Dillies (Jan 15 2022 at 17:15):
Wow, that's great! This is part of my Cambridge computing project, so I'll refrain from looking at the code :smile:
Eric Wieser (Jan 15 2022 at 17:30):
Is this the first "real" project depending on Mathlib4?
Arthur Paulino (Jan 15 2022 at 17:46):
That's some decent amount of work :open_mouth:
The usability of a machine learning lib is highly dependent on the existence of model deployment tools or at least a nice interactive REPL environment, so I think it would be a lower priority target at the moment
Tomas Skrivan (Jan 15 2022 at 20:34):
Another big direction I want to take is to use diffeological spaces to work with manifolds. Currently, I'm working only with convenient vector spaces and smooth maps between them. Therefore, I can't, for example, say what a smooth map between spaces of all invertible functions is. At some point I want to formulate fluid simulation as a geodesic path on the space of all volume-preserving maps.
(I'm not actually fully formalizing all the math, not even providing full definitions; I'm just peeking into books and making sure it can be fully formalized at some point. Right now, I'm focusing on the computational aspect of it.)
I have a very limited understanding of diffeology, so having someone who understands it would be nice. My current idea is maybe to work only with diffeological spaces that are quotients of vector spaces, and a smooth map would be defined as a quotient of a smooth map. I'm not sure if this is completely valid, but it should preserve the computational aspect reasonably well. It also naturally reflects how we usually work with manifolds computationally. For example, a point on a sphere is modeled as (x,y,z) with the condition |(x,y,z)| = 1. From time to time you have to project the point back onto the sphere. This projection exactly defines the relation with which you take the quotient. Computationally, this is way better than covering the sphere with some maps and working with those.
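To make the sphere example a bit more concrete (this is just my reading of the idea, not code from the library):
p(v) = v/|v| on ℝ³ \ {0},   v ~ w ⇔ p(v) = p(w),   S² ≅ (ℝ³ \ {0}) / ~
and a smooth map out of the sphere would then be represented by a smooth map on ℝ³ \ {0} that respects ~.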
Also, I know there are some problems with tangent spaces of general diffeological spaces that I do not fully understand. This probably gives some condition on the relation with which I would take the quotient. Maybe defining the relation based on a function satisfying the implicit function theorem?
Patrick Massot (Jan 15 2022 at 21:50):
I think it's really great that someone tries to do numerical analysis in Lean! I had no idea that diffeological spaces could be useful in numerical maths. I only very vaguely know them as an esoteric corner of differential geometry. I wasn't aware of any link with anything else than diffeological spaces applied to diffeological spaces.
Tomas Skrivan (Jan 15 2022 at 22:06):
For example this paper on differentiable programming: 𝜆ₛ: computable semantics for differentiable programming with higher-order functions and datatypes uses diffeological spaces as a tool for soundness of their method.
I'm mostly following the first chapter of The Convenient Setting of Global Analysis. I especially like the part of the introduction:
An eminent mathematician once said that for infinite dimensional calculus each serious application needs its own foundation. By a serious application one obviously means some application of a hard inverse function theorem. These theorems can be proved, if by assuming enough a priori estimates one creates enough Banach space situation for some modified iteration procedure to converge. Many authors try to build their platonic idea of an a priori estimate into their differential calculus. We think that this makes the calculus inapplicable and hides the origin of the a priori estimates. We believe that the calculus itself should be as easy to use as possible, and that all further assumptions (which most often come from ellipticity of some nonlinear partial differential equation of geometric origin) should be treated separately, in a setting depending on the specific problem.
This totally aligns with my experience from my studies, mostly analysis and PDEs, and with the total disconnect in how mathematicians and physicists treat similar problems.
All transformations in SciLean can be (hopefully will be, one day) formally argued using convenient calculus (or diffeological spaces). To actually prove any kind of convergence of the generated algorithms you then have to go and manually provide some a priori estimates, choose the correct Banach/Sobolev space, etc. However, this is not a task I'm particularly interested in, and it is super hard (yes, looking at you, Navier-Stokes).
Tomas Skrivan (Jan 15 2022 at 22:10):
My hope is to automatically prove consistency, i.e. if the algorithm converges then it actually solves the specified problem.
However, the overriding principle is to minimize the human time it takes to go from a formal specification of the problem to actually executable code.
Patrick Massot (Jan 16 2022 at 11:03):
This is really surprising to me. It would be really super nice if this were actually useful. But I fear you'll have a very hard time finding experts here. Maybe you should email the authors of this paper and try to get them interested?
Tomas Skrivan (Jan 16 2022 at 11:47):
I will try that, but maybe I should read that paper properly first :)
Joe Hendrix (Jan 16 2022 at 16:54):
@Tomas Skrivan I'll mention that Ben Sherman (the first author of the paper) knows Lean -- he was an intern at Galois when we were working on formalizing a distributed protocol.
Tomas Skrivan (Jan 16 2022 at 16:55):
Cool! Thanks a lot.
Tomas Skrivan (Jan 19 2022 at 20:49):
ASCII art is cool but fully raytraced animations are cooler. I wrote a plugin for Houdini, a visual effects software similar to Blender, and now I can use Lean as a scripting language in it. Visualization of the wave simulation: wave.mp4
Tomas Skrivan (May 08 2023 at 22:30):
I have a cool new example to show.
The task is to shoot a target with a cannon, i.e. find the initial velocity for ballistic motion such that it hits a specified target.
I wrote a widget that aims at your mouse :)
Peek-2023-05-08-17-59.gif
Here is a description of how it is done:
Ballistic motion is a solution to these differential equations:
ẋ = v,   v̇ = g − f(v)·v
where f is some function describing air drag (in the code below, f(v) = 5 + ‖v‖). In Lean this system is defined with the function balisticMotion:
def balisticMotion (x v : ℝ×ℝ) := (v, g - (5 + ‖v‖) • v)
The function aimToTarget approximately finds the initial velocity needed to hit target at time T. The specification of this function is:
approx aimToTarget (v₀ : ℝ×ℝ) (optimizationRate : ℝ) :=
  λ (T : ℝ) (target : ℝ×ℝ) =>
    let shoot := λ (t : ℝ) (v : ℝ×ℝ) =>
      odeSolve (t₀ := 0) (x₀ := ((0:ℝ×ℝ),v)) (t := t)
        (f := λ (t : ℝ) (x,v) => balisticMotion x v)
    (λ v => (shoot T v).1)⁻¹ target
Here we define a local function shoot that returns the position and velocity of the projectile at time t when shot with initial velocity v. So if we want to hit the target at time T, we are interested in inverting the function shoot, i.e. (λ v => (shoot T v).1)⁻¹ target.
The arguments v₀ and optimizationRate are later used as initial parameters for gradient descent.
This specification is then turned into executable code using tactics. The steps are roughly as follows (the first one is spelled out below):
- Reformulate the inverse as a minimization problem and apply gradient descent
- Run automatic differentiation - this will formulate the adjoint problem, because we are differentiating through the solution of ODEs
- Choose ODE discretizations for the forward and backward pass of the adjoint formulation
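To spell out the first step (a standard reformulation in my notation, not necessarily the exact form the tactics produce): the inverse in the specification is replaced by a least-squares problem which gradient descent then solves,
(λ v => (shoot T v).1)⁻¹ target   ⇝   argmin over v of ‖(shoot T v).1 − target‖²
vₖ₊₁ = vₖ − optimizationRate • ∇ᵥ ‖(shoot T vₖ).1 − target‖²,   starting from v₀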
Unfortunately, my automatic differentiation is still not working completely smoothly, so there is no neat tactic called autodiff but rather a bunch of calls to fun_trans, which is a generic tactic for doing function transformations.
If you want to see more details, here is the code.
Jeremy Avigad (May 09 2023 at 13:10):
Does this mean that we can now apply for DARPA grants for widgets?
Tomas Skrivan (Mar 01 2024 at 14:40):
I have a cool new picture to show :grinning_face_with_smiling_eyes: I can solve the Laplace equation using Lean. Here is the solution of the Laplace equation in a disk with a prescribed boundary condition.
harmonic.png
You might be wondering why it is so bumpy; the thing is that I'm using a probabilistic algorithm. Let me explain how it works.
The starting point is the mean value property of harmonic functions (functions that satisfy the Laplace equation). It says that the value at a point x is the average of the values over any sphere of radius r around x (here in 3D):
u(x) = (4π)⁻¹ ∫_{ω ∈ S²} u(x + r·ω) dω
Inspired by this property I define this recursive function:
noncomputable
def harmonicRec (n : ℕ) (φ : Vec3 → Float) (g : Vec3 → Y) (x : Vec3) : Y :=
  match n with
  | 0 => g x
  | m+1 => (4*π)⁻¹ • ∫ (x' : sphere 0 1), harmonicRec m φ g (x + φ x • x'.1)
This recursively computes averages over the sphere at the point x with radius φ x, where φ is the distance function to the boundary and g defines the boundary conditions.
The hope is that harmonicRec n φ g converges (as n → ∞) to the solution u of the Laplace equation on {x | 0 < φ x} with u x = g x for φ x = 0.
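In PDE terms, the target problem is the boundary value problem on Ω = {x | 0 < φ x}:
Δu = 0 on Ω,   u = g on ∂Ω = {x | φ x = 0}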
As you may have noticed, harmonicRec is a noncomputable function, so let's apply a Monte Carlo method to evaluate all those integrals. The resulting algorithm is called walk on spheres. Here is the code turning harmonicRec into a probabilistic program:
def walkOnSpheres (φ : Vec3 → Float) (g : Vec3 → Y) (n : ℕ) (x : Vec3) : Rand Y := do
  let f : Rand (Vec3 → Y) :=
    derive_random_approx
      (fun x => harmonicRec n φ g x)
    by
      induction n n' prev h
      . simp[harmonicRec]
      . simp[harmonicRec,h]
      simp only [smul_push,cintegral.arg_f.push_lambda]
      rw[integral_as_uniform_E Float]
      rw[pull_E_nat_recOn (x₀:=_) (r:=_) (hf:=by fun_prop)]
      simp (config:={zeta:=false})
  let f' ← f
  pure (f' x)
The idea is that rw[integral_as_uniform_E Float] rewrites those integrals into expectations over a uniform random variable on a sphere. Then rw[pull_E_nat_recOn ...] pulls all those expectations out of the recursor and you end up with an expectation of some recursively defined random variable.
The resulting probabilistic program is:
Nat.rec
  (pure fun (x : Vec3) => g x)
  (fun (n : ℕ) (u : Rand (Vec3 → Y)) => do
    let u' ← u
    let y' ← uniform ↑(sphere 0 1.0)
    pure (fun (x : Vec3) => u' (x + φ x • ↑y')))
  n
which recursively defines a random function u : Rand (Vec3 → Y). At each step it samples this function as u', samples a random point y' on the sphere, and then returns fun x => u' (x + φ x • y').
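Concretely, sampling this program N times and averaging gives the usual walk-on-spheres estimator (my summary of what the code above computes, not code from the repository): each sample performs n hops xₖ₊₁ = xₖ + φ(xₖ)·ωₖ starting from x₀ = x, with each ωₖ uniform on the unit sphere, and evaluates g at the final point, so
u(x) ≈ (1/N) ∑ᵢ g(xₙ⁽ⁱ⁾)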
Tomas Skrivan (Mar 01 2024 at 14:50):
Or here is a piggy with a very cold nose but very hot ears. We can check the heat distribution in its head :)
harmonicPig.png
Tomas Skrivan (Mar 04 2025 at 14:13):
After a long time and lots of mostly invisible work (making sure things scale) I have new exciting results. Sam Estep at CMU is working on a tool, GradBench, to compare different automatic differentiation tools, so I have preliminary results for SciLean.
The benchmark I implemented is the Gaussian Mixture Model (i.e. fitting data with a collection of Gaussians).
Evaluation times for the primal value, i.e. evaluation of the loss function:
Not very encouraging, SciLean is dead last here.
(The plot is a bit crowded; you can go here and click on gmm at the top left to see an interactive version of the plot.)
Evaluation times of the derivative:
Encouraging, not last anymore.
However, what measures the quality of automatic differentiation is the overhead introduced by computing the derivative. If we look at the ratio derivative/primal then SciLean does fairly well, for example beating Enzyme here.
One of the next steps I should probably focus on is learning how to use a profiler and investigating why the evaluation of the primal value is so slow. I suspect that adding a few @[inline] attributes and fixing places where I make accidental copies might take me quite far.
Tomas Skrivan (Mar 17 2025 at 14:15):
I have done some profiling and implemented a new benchmark.
The benchmark is k-means clustering, which groups a set of points into k clusters such that this function is minimized:
def objective {n k d : ℕ} (points : Float^[d]^[n]) (centroids : Float^[d]^[k]) :=
  ∑ᴵ i, minᴵ j, ∑ᴵ l, (points[i,l] - centroids[j,l])^2
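For the derivative comparison below, note that away from ties in the min the gradient of this objective with respect to the centroids has a simple closed form (standard calculus, not code from the repository). Writing c(i) for the index of the centroid closest to point i:
∂ objective / ∂ centroids[j,l] = ∑ over {i | c(i) = j} of 2·(centroids[j,l] − points[i,l])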
After some optimizations, this function compiles down to a program equivalent to the corresponding C code and is just as fast! This is a really nice assurance that it is in fact possible to set up all the abstractions such that they get compiled away.
The computation of the derivative is not as fast as a direct implementation in C, but it is getting closer. Here is a comparison to other tools:
Alex Meiburg (Mar 17 2025 at 14:30):
I would be very curious to know what kind of optimizations those are! :) Maybe even just dropping that in a blogpost or something, I'm sure it would be a useful guide for others too... as someone who has very little sense about when Lean is slow or fast, it would be super nice to learn about a real "in-practice" example like this!
Tomas Skrivan (Mar 17 2025 at 15:47):
Most of the magic is hidden in ∑ᴵ vs Mathlib's ∑. To get to C level I had to do the following things:
- custom Fintype: Mathlib's Fintype or FinEnum are really bad performance-wise. SciLean has IndexType, which is similar to FinEnum but with a careful implementation. Operations with the superscript ᴵ, like ∑ᴵ, are over IndexType rather than Fintype.
- Nat vs USize: When writing raw for loops, like summing all the numbers in an array, there is a noticeable performance hit for using Nat compared to fixed-size numbers like USize. Thus I have Idx n instead of Fin n, which is based on USize, and IndexType I n provides an equivalence I ≃ Idx n. I'm likely inconsistent, as I probably somewhere assume that Idx m × Idx n ≃ Idx (m*n). This can be solved by making Idx n opaque and stating that its semantics is that of Fin n. Lean is doing something similar already, try running Array.mkArray USize.size 0 :)
- external vs internal iterators: IndexType used to have internal iterators, i.e. it extended Stream, for writing loops over finite types. This is slow. Now I have external iterators, i.e. a class IndexType.Fold I providing a fold/forIn function over I. (A toy version of this pattern is sketched after this list.)
- inline "everything": I had to inline lots of functions. An example: for x : Float^[m,n] the expression x[i,j], which is syntax sugar for getElem x (i,j) (...), should translate to x.toByteArray.ugetFloat (i*n+j) (...). Also lots of the higher-order functions, like sum : (I → X) → X, should be marked with @[specialize]. I still do not know when exactly I should use @[inline] or @[specialize], so I end up adding both.
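Here is the toy version of the external-iterator pattern promised in the list above. The class Fold, the Fin n instance and sumOver are all made up for illustration; SciLean's actual IndexType.Fold will look different:
-- toy external iterator: a type I that knows how to fold over all of its elements
class Fold (I : Type) where
  fold : {β : Type} → (β → I → β) → β → β

-- naive instance for Fin n, good enough for illustration
instance (n : Nat) : Fold (Fin n) where
  fold f init :=
    (List.range n).foldl (fun acc i =>
      if h : i < n then f acc ⟨i, h⟩ else acc) init

-- summing a function over a finite index type via the fold
def sumOver (I : Type) [Fold I] (f : I → Float) : Float :=
  Fold.fold (I := I) (fun (acc : Float) i => acc + f i) 0.0

#eval sumOver (Fin 10) (fun i => Float.ofNat i.val)  -- 45.0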
I still need to do more profiling and optimization on the derivative computation; my best Lean implementation is still 75% slower than the identical C implementation. There will be some overhead due to the fact that Lean has to do a reference count check before each mutation, but I'm not getting any substantial speedup even if I use an unsafe mutation function that does not do that check. There is something I still do not understand.
Tomas Skrivan (Mar 17 2025 at 16:07):
I think I have also encountered a performance bug. I have these structures:
structure DataArray (X : Type*) [pd : PlainDataType X] where
  data : ByteArray
  h_size : data.size % pd.bytes = 0

structure DataArrayN (X I : Type*) {n} [PlainDataType X] [IndexType I n] where
  data : DataArray X
  h_size : data.size = n

structure Float64Array where
  data : ByteArray
  h_size : data.size % 8 = 0
and functions to convert between DataArrayN and Float64Array:
def toDataArrayN (x : Float64Array) : DataArrayN Float (Idx.size x) := ⟨⟨x.1, ...⟩, ...⟩
def toFloat64Array (x : DataArrayN Float I) : Float64Array := ⟨x.1.1, ...⟩
These should compile to the identity function but don't seem to. I didn't check the IR, but if I implement them with cast sorry x I get faster code.
Henrik Böving (Mar 17 2025 at 17:17):
(deleted)
Robert Joseph (Mar 18 2025 at 05:59):
This is so cool @Tomas Skrivan! Will you also be releasing some of these profiling/benchmark tools with SciLean, or is it just using GradBench?
Tomas Skrivan (Mar 18 2025 at 08:28):
You can find the implementation of the GradBench benchmarks in the Test/GradBench directory. Those tests are only there to ensure that I still compute the same derivative. You can't really run them easily, but you can have a look at the generated code for the derivative.
In this directory I have tests comparing different implementations of parts of those benchmarks. They produce output like this:
k-means profile test
k := 1000, n := 10000, d := 16
reference c impl time := 40.716340ms loss := 7202.578688
best lean impl time := 44.375879ms loss := 7202.578688
scilean no BLAS time := 79.685435ms loss := 7202.578688
target scilean impl time := 1280.020544ms loss := 7202.578688
Tomas Skrivan (Mar 18 2025 at 08:30):
(deleted)
Arthur Paulino (Mar 18 2025 at 11:06):
Is there a chance it's behind because of garbage collection?
Tomas Skrivan (Mar 18 2025 at 12:13):
I don't think so, as I would see something like lean_free_object with perf.
Last updated: May 02 2025 at 03:31 UTC