Zulip Chat Archive
Stream: lean4
Topic: What are the changes to identifier validity in Lean4?
Eric Wieser (Dec 31 2023 at 10:23):
I know that in Lean 4, we can now use !
and ?
in (or at least, at the end of?) identifiers. Are there any other changes to the rules, or are all the rest copied from Lean 3?
Eric Wieser (Dec 31 2023 at 10:24):
(context: I want to ensure the regex used by pygments is correct)
Mario Carneiro (Dec 31 2023 at 14:40):
I think that's the only change
Mario Carneiro (Dec 31 2023 at 14:41):
!
and ?
can be used anywhere after the first character (they are in the "identRest" group)
Mario Carneiro (Dec 31 2023 at 14:47):
there is a BNF at https://github.com/leanprover/lean4/blob/master/doc/lexical_structure.md#identifiers but it was (temporarily?) removed from the manual for some reason
Mario Carneiro (Dec 31 2023 at 14:54):
there are also two implemementations of the function in lean, one in Lean and one in C++... and they subtly differ! The lean version doesn't consider 0x207f
which is superscript-n ⁿ
to be a letterlike symbol and hence an identRest.
Eric Wieser (Dec 31 2023 at 15:03):
Is that a bug?
Mario Carneiro (Dec 31 2023 at 15:06):
I have also confirmed that the BNF description matches the C++ one (it includes superscript-n in the right place), except it is missing the capture group [𝒜-𝖟]
, described as "Latin letters, Script, Double-struck, Fractur" in the C++ source, in the letterlike
group, and it also does not have !
and ?
in the identRest group
Mario Carneiro (Dec 31 2023 at 15:10):
Lean 3 only has one implementation of this function, in C++, and unsurprisingly it almost exactly matches the lean 4 C++ code since the lean 4 code derives from the lean 3. The only difference is the inclusion of !
and ?
in the identRest group
Mario Carneiro (Dec 31 2023 at 15:11):
I think the lean 4 C++ definition is the correct one and all the other divergences are bugs
Mario Carneiro (Dec 31 2023 at 15:15):
actually maybe not, there is a kevin.lean
test (cc: @Kevin Buzzard presumably) containing
macro:10000 x:term "ⁿ" : term => `($x ^ $(mkIdent `n))
#check fun (n : Nat) => nⁿ
which would only work if ⁿ
is not an identifier character. So possibly it was removed from the lean version and the other versions were not updated to match
Eric Wieser (Dec 31 2023 at 15:18):
Does your superscript parser supersede whatever that might have been for?
Eric Wieser (Dec 31 2023 at 15:20):
Mario Carneiro said:
!
and?
can be used anywhere after the first character (they are in the "identRest" group)
I've updated pygments/pygments#2618 accordingly
Mario Carneiro (Dec 31 2023 at 15:22):
most likely, but it will have the same issue regarding identifier parsing, you have to put some whitespace if you want a₀
to be treated as a subscript operator rather than one identifier
Mario Carneiro (Dec 31 2023 at 15:23):
ⁿ
is the only superscript allowed by the C++ lexer as an identifier, so it is seemingly an outlier (possibly relating to its unusually early appearance in unicode next to the numeric subscripts)
Mario Carneiro (Dec 31 2023 at 15:26):
I wanted to show an actual bug caused by the difference in the C++ lexer compared to the lean one, but actually it seems that the code is not reachable at all. It is only used by name::escape
and this function is unused. The more common name::to_string
just always skips escaping, which seems wrong but this function tends to be used only for advisory printing so it's probably not a big deal
Mario Carneiro (Dec 31 2023 at 15:31):
the C++ name::to_string
function can be seen in such examples as
import Lean
#eval toString (Lean.Expr.const `«foo.bar».baz []) -- implemented in C++
-- "foo.bar.baz"
#eval toString `«foo.bar».baz -- implemented in lean
-- "«foo.bar».baz"
Note that the C++ code knows how to do escaping, it just is disabled for some reason
Last updated: May 02 2025 at 03:31 UTC