Zulip Chat Archive

Stream: mathlib4

Topic: Help with understanding native_decide performance

Toby Cathcart Burn (Jul 18 2025 at 14:13):

Could anyone help me understand why

import Mathlib

def k := 500
def qs : List (ℚ) := (List.range k).map (fun n => (1/n))
def bigMat : Matrix qs.toFinset (List.range k).toFinset ℚ  := Matrix.of (fun _ _ => 0)
def sz : ℕ := (Finset.card (qs.toFinset))
def v2  := Matrix.vecMul (fun e => e.val) bigMat
theorem t : Matrix.vecMul (fun e => e.val) bigMat = v2
  := by native_decide

takes 5 seconds to run,

import Mathlib

def k := 500
def qs : List (ℚ) := (List.range k).map (fun n => (1/n))
def bigMat : Matrix qs.toFinset (List.range k).toFinset ℚ  := Matrix.of (fun _ _ => 0)
def v2  := Matrix.vecMul (fun e => e.val) bigMat
def sz : ℕ := (Finset.card (qs.toFinset))
theorem t : Matrix.vecMul (fun e => e.val) bigMat = v2
  := by native_decide

(definitions of v2 and sz have swapped places) takes 30 seconds, and

import Mathlib

def k := 500
def qs : List (ℚ) := (List.range k).map (fun n => (1/n))
def bigMat : Matrix qs.toFinset (List.range k).toFinset ℚ  := Matrix.of (fun _ _ => 0)
theorem t : Matrix.vecMul (fun e => e.val) bigMat = Matrix.vecMul (fun e => e.val) bigMat
  := by native_decide

takes 5 seconds under lean v4.20-rc5 and 30 seconds using lean v4.22-rc3?

The motivation behind this question is a result that takes 5 minutes to verify under lean v4.20, but doesn't finish if left overnight under lean v4.22.

Jovan Gerbscheid (Jul 18 2025 at 20:07):

native_decide uses Lean's compiler to evaluate whether the result holds, instead of Lean's kernel (which is why people prefer not to use it). It turns out that the compiler generates different code depending on the order in which v2 and sz have been defined (which I find very surprising). However, mathlib's mathematical functions like Matrix.vecMul are not intended to be compiled, and hence they aren't optimized to be compiled efficiently. So basically you shouldn't rely on native_decide being efficient.

Consider proving your theorem in a different way!

Toby Cathcart Burn (Jul 18 2025 at 22:09):

Thanks. The theorem I've proved is that the area of a shape is a specific rational number with over 100 digits in the numerator and denominator, which makes me think it's unlikely that it can be proved without a calculation comparable to matrix multiplication. I'm aware of ways I could have done it much more efficiently, but they would all take longer to write proofs for, and I've seen enough stack overflow errors that I don't trust lean's kernel to handle the task sensibly without native_decide. I have 2 follow up questions:

Are there any resources for learning how to prove results involving large-ish data using Lean/Mathlib (e.g. what data structures are sensible for the task)?
Should anyone be notified about the performance regression in the 3rd code block I posted? The runtime appears to have changed from O(k^2) to O(k^3). If this is due to a change in Mathlib, I can understand performance not being a concern, but if it's due to a change in Lean, then the developers might want to know.

Kyle Miller (Jul 18 2025 at 22:20):

For computations, Matrix has the issue that it's a function, and the operations don't memoize any results.

I would probably implement some dense matrix data structure, or at least make a function that creates a dense matrix from a Matrix and returns a Matrix that simply reads values out of the dense matrix structure — that way you can slip it in after each vecMul/etc. operation

Toby Cathcart Burn (Jul 18 2025 at 22:40):

Since a single vecMul is the only operation directly involving the Matrix that I'm verifying computationally, and it only needs to view each entry once, I don't think memoization would help much (except for the List.toFinset in the type). Also the matrix I'm working with is sparse, so I should be able get something that runs in a few seconds, and I'll probably implement it that way now that I've realized I just need a sort rather than a mutable Map.
Do you know anything about how I should feed the data to lean? I've been using a list of short lists I generated, but the time the parser takes to process it appears to be quadratic in the length of the list, so I've split it into 100 parts which feels a bit silly.

Kyle Miller (Jul 18 2025 at 22:43):

It's likely not the parser inducing quadratic complexity, but elaboration. There's a known problem where numeric lists can be slow to elaborate; default instances can be one problem, thought it can also be from the compilation step.

Do you have a small excerpt of the list you can share?

Kyle Miller (Jul 18 2025 at 22:49):

A few ideas for ingesting large amounts of data into Lean:

use a String with your own custom data format and make a parser
use a Nat that encodes the data, again a parser
for a matrix of natural numbers, you can use List Nat and then for each Nat interpret it in some base, giving you a row of numbers (I used this trick once)
a string, but you load it from a file rather than include it in the source

Toby Cathcart Burn (Jul 18 2025 at 22:52):

Here's the first 3 elements (of just under 4000). The third is most representative. (a list of length <112, 4 lists of length <112, a rational number with large numerator and denominator). the arguments to Piece.treePiece have the types Fin 7, Fin 4 and ZMod 4.

import Mathlib
import Proof.Pieces
import Proof.Eqns

-- vol' e.1 =sum(vol' e.2)/4; e.3 is the coefficient of this eqn for linear_combination
def part0: List (List Piece × List (List Piece) × ℚ) :=[
  ([Piece.treePiece 0 0 0], [[], [Piece.treePiece 0 3 3], [], [Piece.treePiece 1 3 3]], 1/4) ,
  ([Piece.treePiece 0 0 3], [[], [Piece.treePiece 1 3 3], [Piece.treePiece 0 3 3], []], 21330466264694411118705366274969148868597023488809482950268001379227926734844653002742531656783041090887107469343925961194056902636554948928340001407335855138766815143504103340041316806185810931015033900383375938862019/154866390708323530670381178291540808500932175557095479800438653104944580267698405240939599604178028046983621932505732389308889591544694394643480685766119607291179630329136764331873408772421837889405020894961858799002472) ,
  ([Piece.treePiece 0 0 3, Piece.treePiece 0 1 1, Piece.treePiece 0 3 3, Piece.treePiece 1 2 0, Piece.treePiece 2 2 2, Piece.treePiece 3 1 1, Piece.treePiece 3 3 3, Piece.treePiece 4 2 0, Piece.treePiece 5 2 2, Piece.treePiece 6 0 3, Piece.treePiece 6 1 1], [[Piece.fullPiece], [Piece.treePiece 1 0 3, Piece.treePiece 1 1 0, Piece.treePiece 1 3 3, Piece.treePiece 2 0 2, Piece.treePiece 2 1 1, Piece.treePiece 2 3 2, Piece.treePiece 4 0 3, Piece.treePiece 4 1 0, Piece.treePiece 4 3 3, Piece.treePiece 5 0 2, Piece.treePiece 5 1 1, Piece.treePiece 5 3 2, Piece.trianglePiece 3], [Piece.fullPiece], [Piece.treePiece 1 0 3, Piece.treePiece 1 1 0, Piece.treePiece 1 3 3, Piece.treePiece 2 0 2, Piece.treePiece 2 1 1, Piece.treePiece 2 3 2, Piece.treePiece 4 0 3, Piece.treePiece 4 1 0, Piece.treePiece 4 3 3, Piece.treePiece 5 0 2, Piece.treePiece 5 1 1, Piece.treePiece 5 3 2, Piece.trianglePiece 3]], 4543266721356550064243107471444960050283001566395774737049476090769189190178861929218647835457199588650983572629663499168361114217338184992257720653768009396300288218933687996433023103186765454982462654090345294819007/88583575485161059543458033982761342462533204418658614445850909576028299913123487797817450973589832042874631745393278926684684846363565193736070952258220415370554748548266229197831589817825291272739671951918183233029413984) ,

The full file is 4MB in size and can be seen here: https://raw.githubusercontent.com/penteract/pythagTreeProof/refs/heads/master/Proof/Cert.lean

Toby Cathcart Burn (Jul 18 2025 at 22:53):

Thanks for your help, those are useful ideas.

Kyle Miller (Jul 18 2025 at 22:58):

I'm not too familiar with the code, but it might be worth looking into bv_decide, which consumes large proof certificates from CaDiCaL.

By the way, for the ingestion, you could have some elaborator where you write something like eqn% "the very large string", and then it builds the Expr from the string.

Instead of List, you likely want Array, which stores it all in a contiguous memory block at runtime.

Toby Cathcart Burn (Jul 18 2025 at 23:06):

Does Array have as much proved about it as List does? I don't see an Array.toFinset (indexing matrices by representations of equations was quite fun and convenient, although perhaps unreasonable).

Kyle Miller (Jul 18 2025 at 23:10):

It's a fairly well-supported core type, and for theory you can always reason about Array.toList of it.

By the way, Finset has a number of O(n) and O(n^2) algorithms that could be O(1) or O(n ln n) with a different type. Maybe it's not worrying about if you're working with things that only took 5s. (It's not fast, but it's not long enough to deserve a coffee break, and if you put it into its own module you can use the result without re-running it all the time.)

Kyle Miller (Jul 18 2025 at 23:12):

(@Cameron Zwarich Pinging you about a regression; not sure if we ever expected it to be fast in the first place, but maybe it's worth looking at?)

Toby Cathcart Burn (Jul 18 2025 at 23:27):

The current version takes around 250 seconds, justifying quite a few breaks, but I've recently thought of a moderately easy way to get it down to the order of 5 seconds (like the 2 other native_decides that I'm using), so I probably won't be optimizing it much further than that. I have already been taking advantage of splitting both the raw data and the proofs that need to look at the data into their own files (my proof is long enough that I'd split it up into multiple files before getting to the bit that needed serious computation).

Last updated: Feb 28 2026 at 14:05 UTC