Zulip Chat Archive

Stream: lean4

Topic: What are the changes to identifier validity in Lean4?


Eric Wieser (Dec 31 2023 at 10:23):

I know that in Lean 4, we can now use ! and ? in (or at least, at the end of?) identifiers. Are there any other changes to the rules, or are all the rest copied from Lean 3?

Eric Wieser (Dec 31 2023 at 10:24):

(context: I want to ensure the regex used by pygments is correct)

Mario Carneiro (Dec 31 2023 at 14:40):

I think that's the only change

Mario Carneiro (Dec 31 2023 at 14:41):

! and ? can be used anywhere after the first character (they are in the "identRest" group)

Mario Carneiro (Dec 31 2023 at 14:47):

there is a BNF at https://github.com/leanprover/lean4/blob/master/doc/lexical_structure.md#identifiers but it was (temporarily?) removed from the manual for some reason

Mario Carneiro (Dec 31 2023 at 14:54):

there are also two implemementations of the function in lean, one in Lean and one in C++... and they subtly differ! The lean version doesn't consider 0x207f which is superscript-n to be a letterlike symbol and hence an identRest.

Eric Wieser (Dec 31 2023 at 15:03):

Is that a bug?

Mario Carneiro (Dec 31 2023 at 15:06):

I have also confirmed that the BNF description matches the C++ one (it includes superscript-n in the right place), except it is missing the capture group [𝒜-𝖟], described as "Latin letters, Script, Double-struck, Fractur" in the C++ source, in the letterlike group, and it also does not have ! and ? in the identRest group

Mario Carneiro (Dec 31 2023 at 15:10):

Lean 3 only has one implementation of this function, in C++, and unsurprisingly it almost exactly matches the lean 4 C++ code since the lean 4 code derives from the lean 3. The only difference is the inclusion of ! and ? in the identRest group

Mario Carneiro (Dec 31 2023 at 15:11):

I think the lean 4 C++ definition is the correct one and all the other divergences are bugs

Mario Carneiro (Dec 31 2023 at 15:15):

actually maybe not, there is a kevin.lean test (cc: @Kevin Buzzard presumably) containing

macro:10000 x:term "ⁿ" : term => `($x ^ $(mkIdent `n))
#check fun (n : Nat) => nⁿ

which would only work if is not an identifier character. So possibly it was removed from the lean version and the other versions were not updated to match

Eric Wieser (Dec 31 2023 at 15:18):

Does your superscript parser supersede whatever that might have been for?

Eric Wieser (Dec 31 2023 at 15:20):

Mario Carneiro said:

! and ? can be used anywhere after the first character (they are in the "identRest" group)

I've updated pygments/pygments#2618 accordingly

Mario Carneiro (Dec 31 2023 at 15:22):

most likely, but it will have the same issue regarding identifier parsing, you have to put some whitespace if you want a₀ to be treated as a subscript operator rather than one identifier

Mario Carneiro (Dec 31 2023 at 15:23):

is the only superscript allowed by the C++ lexer as an identifier, so it is seemingly an outlier (possibly relating to its unusually early appearance in unicode next to the numeric subscripts)

Mario Carneiro (Dec 31 2023 at 15:26):

I wanted to show an actual bug caused by the difference in the C++ lexer compared to the lean one, but actually it seems that the code is not reachable at all. It is only used by name::escape and this function is unused. The more common name::to_string just always skips escaping, which seems wrong but this function tends to be used only for advisory printing so it's probably not a big deal

Mario Carneiro (Dec 31 2023 at 15:31):

the C++ name::to_string function can be seen in such examples as

import Lean

#eval toString (Lean.Expr.const foo.bar».baz []) -- implemented in C++
-- "foo.bar.baz"
#eval toString foo.bar».baz -- implemented in lean
-- "«foo.bar».baz"

Note that the C++ code knows how to do escaping, it just is disabled for some reason


Last updated: May 02 2025 at 03:31 UTC