Zulip Chat Archive

Stream: lean4

Topic: Syntactic discrepancies on comma-separated lists


Adrien Champion (May 06 2022 at 14:00):

Lean 4 is inconsistent in the way it parses comma-separated lists in commands and terms. My
understanding is that the nominal shape uses manyIndent as in structure-instance parsing
which allows the following styles:

structure Pair (α : Type _) (β : Type _) where
  fst: α
  snd: β

def noSep : Pair Nat String := {
  fst := 7
  snd := "seven"
}
def commaSep : Pair Nat String := {
  fst := 7,
  snd := "seven"
}
def commaSepTrail : Pair Nat String := {
  fst := 7,
  snd := "seven",
}

The first and last styles have the nice property that all lines are homogeneous which arguably
simplifies edits.

But not all comma-separated lists work this way. In class/structure definitions for instance,
extends takes a list of classes/types which has to be comma-separated with no trailing comma.

class Class1
class Class2

class Class3
extends
  Class1,
  Class2

-- does not compile
class Class3
extends
  Class1
  Class2
-- does not compile
class Class3
extends
  Class1,
  Class2,

This is not the only place this happens. The parser rule for extends
uses sepBy1, looking for that in src/Lean/Parser/Command.lean yields the following

sepBy1 is used in other src/Lean/Parser/* files but I'm not knowledgeable enough to (easily)
discern whether having the more flexible list parser would make sense.

I was wondering whether there is interest in fixing (some of) these discrepancies. I would be glad
to contribute on this matter if there is interest, although it's not clear to me how involved such a
contribution would get and how deep an understanding of Lean's internals I would need.

Leonardo de Moura (May 06 2022 at 20:13):

It would be great to get feedback on this proposal. Some of the examples you provided are great targets for a whitespace sensitivity, but others are not (e.g., universe parameters .{ <ids> }).
Regarding the implementation itself, I think this is not a good match for first-time contributors because of staging issues. The person making the change must understand really well the staging mechanism we use to bootstrap Lean.

Mario Carneiro (May 06 2022 at 23:44):

While I'm not sure about adding whitespace sensitivity (which needs individual review), I think that adding an optional final separator to ~everywhere sepBy1 is used is a very nice quality of life improvement. That is, rw [a, b] works so rw [a, b,] should also work

Arthur Paulino (May 06 2022 at 23:53):

Mario Carneiro said:

That is, rw [a, b] works so rw [a, b,] should also work

I have mixed feeling about it. Seems like another degree of freedom for us to tame new contributors with a formatting rule in mathlib. How do you see it improving UX?

Mario Carneiro (May 07 2022 at 00:07):

It is quite annoying for edit distance stuff. If I have something like

rw [
  foo, bar, baz,
  qux, gla
]

then if I delete gla I have to remove the comma before it too, and if I delete qux then I have to also remove the comma on the previous line. Since it's a syntax error otherwise, I'm not getting any compiler advice in the meantime, so I can't easily remove qux and undo to see the difference between the goal states (this is more important for simp thanrw since simp doesn't get mid-tactic goal states). This also increases the number of changed lines for git purposes, which can make reviewing harder (did they also change foo to foo' or something? No, it's just the comma at the end that changed).

Mario Carneiro (May 07 2022 at 00:07):

Rust does this pretty uniformly and I have found it to be an unmitigated good thing

Mario Carneiro (May 07 2022 at 00:11):

it's also one of the differences between JSON and the JSON dialect that vscode uses in its config files (or actual JS objects) that you will notice if you work with it enough

Arthur Paulino (May 07 2022 at 00:12):

That's a good point. And now that I think about commas, some proofs have methodical commas after every } or tactic and some don't (when the goal is closed). And both are fine

Mario Carneiro (May 07 2022 at 00:15):

We already have tons and tons of syntactic variations. A formatter will one day lay down the law, but I don't see any reason not to be permissive in the language if it doesn't have negative parsing side effects

Mario Carneiro (May 07 2022 at 00:16):

in any case it doesn't prevent us from having a style guide

Mario Carneiro (May 07 2022 at 00:19):

This is one reason I believe people use the

{ a
, b
, c
}

notation in haskell but I have to say I find this style abhorrent

Damiano Testa (May 07 2022 at 06:45):

I also feel that allowing "unneeded" commas is helpful.

Besides the rw comments made by Mario, I also find it hard (in Lean 3) to have a comma after every command except the last one when debugging meta code. For instance, if I want to find which line is problematic in

meta def [...] :=
do command1,
   command2,
   command3,
   command4

I would like to be able to comment the commands from the bottom up, but I also have to get rid of the previous comma each time.

I'm not sure if this is the same issue and is probably also affected by my inexperience.

Mario Carneiro (May 07 2022 at 06:58):

that one is not a problem in lean 4 since you don't need the commas anymore

Damiano Testa (May 07 2022 at 07:04):

One more reason to seriously start learning Lean4!

Adrien Champion (May 07 2022 at 09:46):

@Mario Carneiro 's argument for optional trailing commas is the exact reason I opened this topic, when I first said basically the same thing to @Sebastian Ullrich privately. (I'm originally a Rustacean myself.)

Adrien Champion (May 07 2022 at 09:48):

It seems like the whole opt-trail-comma + whitespace thing needs discussion and a more expert contributor to take care of it. But I quite like @Mario Carneiro 's idea of just modifying sepBy1 (and potentially others I missed) with an opt-trail-comma, sounds like something I might be able to do

Mario Carneiro (May 07 2022 at 09:49):

my recommendation would be to have a variant of sepBy1 that has the optional trailing separator; every use needs to be reviewed

Adrien Champion (May 07 2022 at 09:50):

Makes sense

Adrien Champion (May 07 2022 at 09:52):

If I understand the contribution workflow correctly, the next step would be to create a detailed RFC issue. Let me know if I should do that

Arthur Paulino (May 07 2022 at 10:43):

What if the user writes rw [h hx] when they didn't mean rw [h, hx]? That is, the user wants to apply hx to h but it might be parsed as two different arguments for rw.

In other words, maybe simply adding an optional comma after the rw/simp arguments and before the ] solves this issue

This would make rw [,] legal but :shrug:

Mario Carneiro (May 07 2022 at 11:44):

Internal commas are not optional. The only way I think that would be okay is if it is using newlines and whitespace sensitivity as separator instead

Mario Carneiro (May 07 2022 at 11:45):

The grammar I am thinking of is term,* ","?

Mario Carneiro (May 07 2022 at 11:46):

although that syntax allows , alone, as you say

Mario Carneiro (May 07 2022 at 11:50):

A BNF description would be:

sepByT(A, B) ::= A? | A B sepByT(A, B)
sepBy1T(A, B) ::= A | A B sepBy1T(A, B)?

This also works with newline separation if B is "," | "\n", although indentation matching doesn't quite work like that

Adrien Champion (May 07 2022 at 12:08):

I just noticed that sepBy1 has an optional argument for optional trailing separator

Adrien Champion (May 07 2022 at 12:23):

@Mario Carneiro I'm struggling to understand what sepBy1 precisely does (looked at sepByFnAux, but I don't have a lot of time at this precise moment), but as far as I can tell it parses lists of at least one element, right?

If that's the case then the grammar is term ("," term)* ","?, so it does not allow , alone. Of course what we're saying here also applies to potentially empty lists, but it seems there's interesting things to improve just by considering calls to sepBy1. Could we maybe take care of these first? It seems to me like a relatively low hanging fruit suited for a first issue

Mario Carneiro (May 07 2022 at 12:25):

Yes, the BNF I just wrote above has two variants, for lists of at least 1 (sepBy1T) and for length 0 or more (sepByT)

Mario Carneiro (May 07 2022 at 12:27):

The first one can also be written sepByT(A, B) ::= (A B)* A? and the second one is sepBy1T(A, B) ::= A (B A)* B? like you say

Adrien Champion (May 07 2022 at 12:29):

Right my bad, replied too fast

Mario Carneiro (May 07 2022 at 12:32):

It looks like sepByFn already supports trailing separators

Mario Carneiro (May 07 2022 at 12:36):

In fact, it looks like there is even short syntax for it: term,*,? is what I was calling sepByT

Adrien Champion (May 07 2022 at 12:37):

Mario Carneiro said:

It looks like sepByFn already supports trailing separators

So do sepBy and sepBy1 as I pointed out

Mario Carneiro (May 07 2022 at 12:37):

the slightly longer syntax for that is sepBy(term, ",", ", ", allowTrailingSep)

Mario Carneiro (May 07 2022 at 12:39):

so it seems like there is nothing to do here besides reviewing existing uses of ,* to see whether ,*,? is better

Adrien Champion (May 07 2022 at 12:40):

Yeah that's the low hanging fruit I was thinking of


Last updated: Dec 20 2023 at 11:08 UTC