Zulip Chat Archive

Stream: lean4

Topic: Reasoning about big sets of compile-time data


František Silváši 🦉 (Mar 24 2023 at 11:13):

The Lean 4 compiler (say, the 2023-03-20 nightly) fails very promptly with a stack overflow on large structures at compile time, say:

def largeThing : Array Nat := #[1, 2, 3, ..., 50000]

I understand it's possible to use ulimit -s unlimited (together with set_option maxHeartbeats <big number>) and try it that way, but this is insufficient for my use case. I have furthermore found that a common workaround is to load this data at runtime, which normally works just fine, except that I need to reason about these big structures, so I need the information available 'at compile time', so to speak.

I have also tried some desperado attempts, none of which have worked, for various stack-overflow-related and other reasons;
e.g. I would turn #[1, 2, 3, ..., k] into def foo (n : Nat) : Option Nat := if n == 0 then .some 1 else ... else .none, or I'd try the same with match, etc. To no good effect.

Can anyone give me some pointers?

Jason Rute (Mar 24 2023 at 12:38):

Is this data you are reasoning about real data, which is random and messy, or data which is easily described, like the range list you gave above? If the former, how do you expect to "reason" about it? If the latter, I think there should be a way around this. For example, one could state a theorem about the list largeThing as the range list from 1 to 50000. You could even prove there is only one such list without computing it. I'm sure you could also work with other data structures, like streams (which are lazy lists that don't compute the full list).
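
Roughly like this sketch (names here are made up for illustration):

```lean
-- characterize the data by a closed form instead of enumerating it
def isMyData (l : List Nat) : Prop :=
  l = (List.range 50000).map (· + 1)

-- uniqueness is immediate from the characterization; nothing is computed
theorem isMyData_unique (l l' : List Nat)
    (h : isMyData l) (h' : isMyData l') : l = l' :=
  h.trans h'.symm
```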

Jason Rute (Mar 24 2023 at 12:52):

I guess another option, which would work on data loaded at run time, is to have two things: You could describe your data by a type which says that this data is of some form, e.g. the type Vector Nat 50000, which essentially says it is a list of 50000 elements (see docs4#Vector). Then at run time you load your data as a List. The trick is that you need to convert it to a Vector Nat 50000. But you can write a function foo (rawList : List Nat) : Option (Vector Nat 50000) which returns some v, where v : Vector Nat 50000, if the list is of the right size. Of course, you can replace Vector with your own custom type asserting whatever properties you need about your data. As long as they are decidable properties that can be checked at run time, then you know this approach will work.
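
A rough sketch of that conversion function (the Mathlib import path and the exact shape of Vector are from memory, so they may need adjusting):

```lean
import Mathlib.Data.Vector

-- hypothetical importer: succeeds only when the runtime data
-- actually has the length the type promises
def importData (raw : List Nat) : Option (Vector Nat 50000) :=
  if h : raw.length = 50000 then
    some ⟨raw, h⟩  -- Vector α n is { l : List α // l.length = n }
  else
    none
```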

Jason Rute (Mar 24 2023 at 12:57):

Also, even in the case where your data is well structured, you can load it at runtime. For example, you can write a function foo (length : Nat) which returns the array #[1, ..., length] without ever computing it out at compile time. You can prove that (foo 50000).size = 50000 and all sorts of other properties you need.
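
For example (a sketch; the simp lemma names may differ between versions):

```lean
-- build the data from its description rather than a literal
def foo (length : Nat) : Array Nat :=
  (Array.range length).map (· + 1)

-- reason about it without ever evaluating foo 50000
theorem foo_size (n : Nat) : (foo n).size = n := by
  simp [foo, Array.size_map, Array.size_range]
```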

František Silváši 🦉 (Mar 24 2023 at 13:13):

The data in question is a huge set of polynomial constraints that cannot be expressed any better than by enumerating them. I want to reason about when these (and/or their subsets) are satisfied. I'll need to think about your answer, thank you kindly!

In some sense, one could view this data as a big sequence of instructions forming a program, about whose semantics I want to reason.

Mario Carneiro (Mar 24 2023 at 13:21):

Note that large lists/arrays are a particularly bad case for the compiler. You can still represent large data as long as you break it up into smaller chunks and sub-definitions.
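
For instance (a sketch; chunk sizes and names are arbitrary):

```lean
-- split the literal into smaller sub-definitions ...
def chunk1 : Array Nat := #[1, 2, 3]  -- first batch of entries
def chunk2 : Array Nat := #[4, 5, 6]  -- next batch, and so on

-- ... and glue them back together
def largeThing : Array Nat := chunk1 ++ chunk2
```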

Mario Carneiro (Mar 24 2023 at 13:23):

Even just using a macro to generate the expression under the hood can help, because the #[x1, ..., xn] macro creates a lot of extra temporaries when n is large (> 32 or so).

František Silváši 🦉 (Mar 24 2023 at 13:25):

I will give all these approaches a try; if something succeeds, I'll make sure to report back; thanks, everyone.

James Gallicchio (Mar 25 2023 at 20:48):

is this also potentially an application of native_decide?
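
Something like this, assuming the property in question is decidable:

```lean
-- native_decide compiles the decision procedure and runs it natively,
-- avoiding the kernel evaluation that overflows the stack
theorem demo : ((List.range 50000).map (· + 1)).length = 50000 := by
  native_decide
```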


Last updated: Dec 20 2023 at 11:08 UTC