Zulip Chat Archive

Stream: general

Topic: Extract Latex and Lean from Blueprints


Albert Jiang (Sep 23 2025 at 14:54):

Hi sorry if this has been already posted/addressed somewhere else, but is there a clean way of extracting all the corresponding Latex and Lean statements/proofs/definitions from a blueprint project? Ideally given a repository that has a blueprint folder, I can get the information about all the nodes in the dependency graph.

If there is, I'd really appreciate some pointers, and if not I'm happy to work on it if someone has a clear idea about how best to achieve this goal. This is part of the goal I'm trying to achieve at Oberwolfach for the "Autoformalization correctness metric" project.

Albert Jiang (Sep 23 2025 at 14:57):

I guess @Patrick Massot @Pietro Monticone since you're the main contributors to Blueprint. Sorry about the pinging and again, happy to work on this if it does not already exist.

Eric Wieser (Sep 23 2025 at 15:03):

I think the leanblueprint python package has API for this

Albert Jiang (Sep 23 2025 at 15:27):

Eric Wieser said:

I think the leanblueprint python package has API for this

Thanks! Do you mean leanblueprint checkdecls? Looks like it could be doing the thing I wanted it to do. Checking atm.

Thomas Zhu (Sep 23 2025 at 19:19):

For extracting from Lean, you can check https://github.com/cmu-l3/ntp-toolkit or https://github.com/lean-dojo/LeanDojo. The output should be a list of defs/theorems with their Lean source and identifiers.

For extracting blueprint LaTeX, you can check https://github.com/hanwenzhu/blueprint-gen/tree/main/scripts/convert (either parse_latex.py for parsing using regex or parse_latex_plastex.py for processing using plasTeX + leanblueprint). The scripts are for my own purpose and you will need to take the core functions and modify them. The output is a list of nodes with their LaTeX description and Lean identifiers.

(For context, the latter is part of a proof of concept I am currently working on at CMU to generate blueprints directly from Lean source, where I need to parse existing blueprints to convert to the new format.)

Albert Jiang (Sep 24 2025 at 08:45):

Thomas Zhu said:

For extracting from Lean, you can check https://github.com/cmu-l3/ntp-toolkit or https://github.com/lean-dojo/LeanDojo. The output should be a list of defs/theorems with their Lean source and identifiers.

For extracting blueprint LaTeX, you can check https://github.com/hanwenzhu/blueprint-gen/tree/main/scripts/convert (either parse_latex.py for parsing using regex or parse_latex_plastex.py for processing using plasTeX + leanblueprint). The scripts are for my own purpose and you will need to take the core functions and modify them. The output is a list of nodes with their LaTeX description and Lean identifiers.

(For context, the latter is part of a proof of concept I am currently working on at CMU to generate blueprints directly from Lean source, where I need to parse existing blueprints to convert to the new format.)

Thank you! Super useful!

Thomas Zhu (Sep 24 2025 at 18:48):

By the way, @Auguste Poiroux just informed me of his project which looks like exactly what you want: https://github.com/augustepoiroux/LeanBlueprintExtractor

Auguste Poiroux (Sep 25 2025 at 13:34):

Thanks @Thomas Zhu for mentioning my project ^^ LeanBlueprintExtractor is indeed designed to extract and align latex and Lean statements in one command lean-blueprint-extract-local --project-dir /path/to/lean/project. @Albert Jiang I would be happy to help if you encounter issues

Auguste Poiroux (Sep 25 2025 at 13:36):

It is backed by LeanInteract, so it is compatible with a large range of Lean versions out of the box. I will soon update LeanBlueprintExtractor to add compatibility with the latest Lean versions.

Albert Jiang (Sep 26 2025 at 07:48):

Thanks both!!! LeanBlueprintExtractor really hits the nail on its head and the ntp toolkit is incredibly useful as well.

Albert Jiang (Sep 26 2025 at 07:49):

one nitpick for LeanBlueprintExtractor: it seems to depend on LeanInteract, and more specifically on a branch of it. I figured this out with the help of @Simon Sorg but it would be great to make this clear or let LeanBlueprintExtractor to depend on a release version of LeanInteract.

Auguste Poiroux (Sep 26 2025 at 08:28):

Indeed, I am currently working on making an official release version (LeanInteract v0.9.0 is coming soon). I was working on LeanBlueprintExtractor <-> LeanInteract <-> Lean REPL at the same time in a tight loop to develop the necessary features and to iterate fast ;)


Last updated: Dec 20 2025 at 21:32 UTC