Zulip Chat Archive
Stream: metaprogramming / tactics
Topic: Prototyping a programming language with Lean4's elaborators
Spanky (Jan 01 2025 at 12:04):
I've been working on the prototype of a programming language (currently it's mainly written in Rust, with a Tree-Sitter parser), and am interested in trying to re-implement the prototype in Lean4.
What is attracting me to Lean, is that in theory it should allow me to reuse a ton of complex components (parser, typechecker, cogedenner etc) and would let me focus on the interesting bits. I could transpile my program's syntax into Lean's expressions using elaborators and have all of Lean's power at my disposal.
The first step would be to redefine my grammar using Lean's syntax
declaration, in order to model my language as a DSL. But I'm already seeing many big obstacles...
- It looks like I cannot
import
files written in my language, right? Either I write my program within a.lean
file, or otherwise I have to read the file manually and parse it... -
The point above implies that I can't take advantage of Lean's IDE support for my programs (unless they live within
.lean
files), correct?
In any case, I've tried to play with the "IMP" DSL shown in "Metaprogramming in Lean 4"and noticed that the IDE support is poor within IMP expressions (no color highlighting, no jump-to-definition etc): so I guess I wouldn't gain much anyways? -
My language's syntax is indentation independent (it uses
{}
and;
) and uses tab characters for indentation. These points mean that I couldn't use Lean's parser even if there were a solution for the points above, right?
At this point I have two questions:
- Is it a good idea to try to implement my programming language within Lean? Or am I going to run into similar problems at every stage and am better off by writing my own compiler from scratch? My goal language is something similar to Koka: a strongly-typed language with algebraic effect handlers. I believe I should be able to map its whole semantics onto Lean objects.
- Where do I start with parsing? Is there any good parser (or a combinator library) available in Lean I can use for my indentation-insensitive grammar? Otherwise should I try to use Tree-Sitter via FFI or what else?
Kyle Miller (Jan 01 2025 at 15:53):
Regarding obstacles:
- You could take a look at Verso, which uses Lean files for markup. Each file starts with a header, and then there's a Lean command to switch to the DSL. Example. I think Verso uses a mix of Lean parsers (and syntax defined with
syntax
) and custom parsers. - Yes, likely. Re IMP, it must not be setting metadata necessary to drive the Lean LSP, like TermInfo, etc. Verso has gone through the work to support these things, and I believe that it supports advanced features like incremental elaboration as well. I'd expect you'd use custom environment extensions to collect DSL-specific information needed during compilation; that's how you can get additional information passed between different modules.
- Indentation-independent languages should have no problem using Lean's parsers. Any reason you think it would be a problem?
Your questions:
- If you understand compilers and you've written a compiler before, I don't see why not. Eventually you could write your own frontend to control the file format completely (i.e., no more Lean headers), but I don't know if there's any way to use the Lean VS Code extension with custom languages.
- Re 3 above, I'm not sure why indentation-insensitive grammars would cause any issues. So long as the language is parseable with Lean's
syntax
declarations, it's probably doable. I know there is some limitation in the keywords system, but I don't remember the details.
Spanky (Jan 01 2025 at 17:16):
About point 3:
- If any tab character (
\t
) is used to indent code anywhere in a Lean file, the Lean's parser bails out: https://github.com/leanprover/lean4/blob/master/src/Lean/Parser/Basic.lean#L560 - While experimenting I had some issues with newlines too (either broke some expression in points where Lean didn't like it, or indented the new lines in forbidden ways), but I'm unable to reproduce that right now.
About point 1, I don't like the idea of having to rename all my source files to .lean
and having to use some lean commands (an import
and an open
) at the top of the file. That would make my existing source files incompatible with the other (partial) implementation of the language, the one written in Rust.
My current plan is to write my custom importMyLang filePath
command, which uses IO to read filePath
into a string, then replaces the \t
and \n
characters with
and parses the result with the Lean's parser.
I'll need to figure out how to use the parser wisely: either figure out how pass some of the local syntax
/macro
/elab
rules to the parser or alternatively how to manually use the ParserFn
s from Lean.Parser
to parse my code and then feed it into some elaborators.
EDIT: could anyone direct me to some sort of guide about the Lean's parser? I've been exploring its sources for a while, but I coludn't figure out yet how the parser gets invoked to process a new file, or how the syntax
command is implemented, or even how the ParserFn
s work exactly...
Kyle Miller (Jan 01 2025 at 17:20):
I forgot that whitespace
disallowed tab characters. (That's the only issue though — I thought you might have been suggesting that somehow Lean is only suitable for indentation-sensitive grammars.)
Kyle Miller (Jan 01 2025 at 17:21):
You could look at Lean/Elab/Frontend.lean for an example Lean frontend. This one isn't actively used by Lean itself, but it's still around because there are some applications using it. You can see Parser.parseCommand
being called in processCommand
.
Spanky (Jan 01 2025 at 17:22):
Kyle Miller said:
I thought you might have been suggesting that somehow Lean is only suitable for indentation-sensitive grammars.)
I was indeed thinking that, but it sounds like I was mistaken.
Kyle Miller (Jan 01 2025 at 17:28):
I don't think you need to replace \n
with
, just \t
, if you want to make use of builtin whitespace
parser.
Note that Lean.Parser.mkInputContext
(used in the frontend) already does \r\n
-> \n
normalization. You could make your own mkInputContext
function that inserts your additional normalization.
Frédéric Dupuis (Jan 01 2025 at 17:35):
Spanky said:
About point 1, I don't like the idea of having to rename all my source files to
.lean
and having to use some lean commands (animport
and anopen
) at the top of the file. That would make my existing source files incompatible with the other (partial) implementation of the language, the one written in Rust.
If you just want to parse external files, I think something like lean4-parser is what you're looking for, you'll have a much easier time than if you try to use Lean's parser for this.
Patrick Massot (Jan 01 2025 at 18:10):
Spanky said:
EDIT: could anyone direct me to some sort of guide about the Lean's parser? I've been exploring its sources for a while, but I coludn't figure out yet how the parser gets invoked to process a new file, or how the
syntax
command is implemented, or even how theParserFn
s work exactly...
Have you read https://lean-lang.org/doc/reference/latest/Elaboration-and-Compilation/#parser already?
Spanky (Jan 01 2025 at 18:35):
Frédéric Dupuis said:
If you just want to parse external files, I think something like lean4-parser is what you're looking for, you'll have a much easier time than if you try to use Lean's parser for this.
It seems way simpler (and possibly more versatile for a language completely different from Lean?) than the Lean's parser.
On the other hand, it doesn't seem to come with anything to parse expressions, nor with any convenient abstractions like the syntax
command.
I guess I'll keep juggling between this one and the Lean's parser until I manage to find my way.
Patrick Massot said:
Have you read https://lean-lang.org/doc/reference/latest/Elaboration-and-Compilation/#parser already?
I haven't. Thanks for the link, reading it right now.
Last updated: May 02 2025 at 03:31 UTC