Zulip Chat Archive

Stream: lean4

Topic: Most efficient way to compile a string


Riyaz Ahuja (Nov 12 2024 at 22:48):

I have a ton of strings of lean code (each string is the contents of a full lean file) and I need to 1) compile these strings as if they were actual files to get out their error messages, and 2) (more important) extract out all the intermediate proof states and other theorem information.

As of right now, in my project I'm creating temp files for this code via python, and running lake exe function_that_extracts_states_and_msgs tmp_name via the shell from python as well. This works, but when I have a lot of these that I want to run sequentially (i.e. run the script on a string and get error messages, use that to fix error message, repeat), it gets really really slow. Additionally, the script is outputting json of the data that is extracted, which I made a bunch of python objects to hold it nicely, but there's a lot of back-and-forth between these python objects and lean objects.

Although this all technically works, I think its really slow and inefficient. I've looked into a couple ways to reduce the overhead of doing lake exe ... every time, but I'm a bit lost on what's the best way to do this as well as how best to implement it.

For example, I've looked into doing a python reverse-FFI, but it seems quite difficult to getting it to work as the function that extracts the proof state info has dependencies to mathlib, as well as a lot of monad stuff. Additionally, perhaps the overhead comes from running lake exe, so the solution is to have lean drive the process by creating the tempfiles and compiling them by itself without needing to call lake exe again, so whatever object that outputs is still in lean, and I can just have my extraction function take that as an input.

Any advice or ideas would be really helpful! Also, for reference, the "extraction function" I'm taking about is scripts/training_data.lean, so I essentially run lake exe training_data FILE for each tempfile. It currently works by using compileModule and getInvocationTrees and moduleMessages.

Adam Topaz (Nov 12 2024 at 22:56):

you could use the Lean4 repl in "file mode" to accomplish this.


Last updated: May 02 2025 at 03:31 UTC