Zulip Chat Archive

Stream: general

Topic: tree-sitter


Julian Berman (Dec 31 2020 at 02:35):

In case it's not already on the radar... tree-sitter (https://tree-sitter.github.io/tree-sitter/ -- a nice incremental parsing library with lots of nice editor integrations) looks like it has some folks who put together an Agda parser: https://github.com/tree-sitter/tree-sitter-agda -- perhaps that's interesting to have some day for lean (4)?

Alex J. Best (Dec 31 2020 at 02:39):

One cool reason to do this is that tree-sitter grammars are used for https://github.com/github/semantic which is what gives the cool jump to definition/references feature you can see on github python repos. Having this for lean would be awesome, but of course a long way away

Chris B (Dec 31 2020 at 03:27):

I wrote tree-sitter grammars for mm0 and mm1 to try some stuff out; it's not too bad if you have something like an EBNF spec for the grammar.

Chris B (Dec 31 2020 at 03:33):

I think you kind of need the people doing the editor support on board though. If they already have their own model/abstractions for building and working on the AST, they probably won't want to graft the tree-sitter one on as a totally separate thing.

Julian Berman (Dec 31 2020 at 03:35):

That's what re-reminded me to mention tree-sitter yeah (I was about to fix a syntax issue in lean.vim and wished I didn't have to :D)

Bryan Gin-ge Chen (Dec 31 2020 at 03:40):

The in-progress Lean 4 server seems to be here.

Johan Commelin (Dec 31 2020 at 05:23):

Isn't this going to be hard/impossible for Lean 4, because it has such an extremely flexible syntax? You basically need to be Lean 4 to be able to parse the code.

Chris B (Dec 31 2020 at 15:40):

I'm not sure how v4 actually handles user-declared syntax extensions. If they're initially parsed generically as something like "user item" according to a set of rules and elaborated later then it should be possible for outside parsers to produce a usable AST. If users are able to arbitrarily add entirely new rules to the grammar then you're probably right.

Sebastian Ullrich (Dec 31 2020 at 15:47):

Johan Commelin said:

You basically need to be Lean 4 to be able to parse the code.

This has already been true with Lean 3, custom syntax will just become more prevalent with Lean 4. The best you could probably do is to generate a tree-sitter grammar for a fixed set of syntax extensions. Whitespace sensitivity sounds like fun though.

Julian Berman (Feb 28 2021 at 14:59):

I stubbornly tried to start doing this. What I have so far lives here: https://github.com/Julian/tree-sitter-lean

It's got many flaws, just trying to get a decent amount of coverage before doing any of the real difficult parts, but I can already use it for better syntax highlighting in my editor than what I had before.


Last updated: Dec 20 2023 at 11:08 UTC