Zulip Chat Archive
Stream: mathlib4
Topic: Tactic casing convention
Mario Carneiro (Jul 29 2021 at 15:27):
Does anyone have opinions about the tactic casing convention? Lean 4 seems to be indecisive about whether tactics should be snake case (simp_all
) or camel case (inferInstance
, refineLift'
), while I went for all camel case in mathport and I think @Aurélien Saue is following my syntax descriptions in mathlib4#27. There are also things like erewrite
that are just uncased, should they be eRewrite
? eRw
seems weird.
Mario Carneiro (Jul 29 2021 at 15:28):
casesM
? introV
? exFalso
? existsI
?
Mario Carneiro (Jul 29 2021 at 15:29):
cc: @Daniel Selsam @Gabriel Ebner @Sebastian Ullrich
Daniel Selsam (Jul 29 2021 at 15:31):
I vote camelCase everywhere (possibly with special cases for weird ones like eRw
)
Gabriel Ebner (Jul 29 2021 at 15:32):
There is some precedence for snake case: most commands like set_option
, elab_rules
, etc. are snake case
Daniel Selsam (Jul 29 2021 at 15:34):
the TacticM
counterparts of the tactic syntax are regular Lean programs, definitely camelCased, and the tactic syntax feels to me at least like sugar for calling them
Daniel Selsam (Jul 29 2021 at 15:35):
But I haven't done much interactive proving yet so my intuitions may be out of line
Mario Carneiro (Jul 29 2021 at 15:39):
Gabriel Ebner said:
There is some precedence for snake case: most commands like
set_option
,elab_rules
, etc. are snake case
Also set_option
is in the term grammar; not sure there are many other multi-word keywords in the term grammar
Gabriel Ebner (Jul 29 2021 at 15:39):
With snake case?
Mario Carneiro (Jul 29 2021 at 15:39):
yes
Mario Carneiro (Jul 29 2021 at 15:40):
def bla := set_option foo 37 in 2 + 2
Mario Carneiro (Jul 29 2021 at 15:42):
I think there is a strong argument for maintaining the parallel between term constructions and tactics for things like have
/ let
/ show
that maybe generalizes to set_option
?
Gabriel Ebner (Jul 29 2021 at 15:43):
Yes, set_option
should obviously be the same in command, term, and tactic mode.
Mario Carneiro (Jul 29 2021 at 15:43):
If set_option
becomes a tactic (and why wouldn't it), it would be confusing if it was spelled differently
Mario Carneiro (Jul 29 2021 at 15:43):
that said, you could just as easily turn it around and argue that commands should be camel cased
Mario Carneiro (Jul 29 2021 at 15:49):
For reference, in rust keywords are mostly one-word but there are typeof
and macro_rules
which weakly suggest a snake casing convention; haskell somehow manages to get away with keywords being actual english words (or symbols), with space separated compound keywords like deriving instance
or type family
; and C++ keywords are snake cased like co_await
or mashed lowercase like alignas
or sizeof
Gabriel Ebner (Jul 29 2021 at 15:52):
Mmh, by infer instance
?
Eric Rodriguez (Jul 29 2021 at 15:53):
Why are we mixing conventions now? Nat/Basic.lean hints that defs/lemmas are going to be snake_case in future; is it trying to be a way to easily tell apart tactics vs defs?
Mario Carneiro (Jul 29 2021 at 15:54):
tactics vs defs are already syntactically distinguishable, so they can overlap or not as we prefer
Mario Carneiro (Jul 29 2021 at 15:55):
unless we want to introduce a shorthand where e
means exact e
, which seems like a bad idea since it will probably cause lots of clashes
Eric Rodriguez (Jul 29 2021 at 15:56):
Eric Rodriguez (Jul 29 2021 at 15:56):
but I can't imagine it being a good idea
Mario Carneiro (Jul 29 2021 at 15:57):
lol it's using a zero width keyword
Mario Carneiro (Jul 29 2021 at 15:58):
strictly speaking that's a user notation, not a tactic, so it lets you skip the by
as well as exact
Mac (Jul 29 2021 at 20:23):
Mario Carneiro said:
For reference, in rust keywords are mostly one-word but there are
typeof
andmacro_rules
which weakly suggest a snake casing convention;
From what I understood from the discussion of when natLit!
became nat_lit
, I believe that snake case is now the standard in the Lean 4 core for any and all multi-word syntax keywords. This is why set_option
, macro_rules
and the like are snake case. Things like typeof
, erw
, are special exceptions because mish-mashes of short words are often though of as a single (new) word in CS (ex. segfault or exfalso
).
The question, I think, is whether tactics are to be thought of as keywords (and thus snake case) or function-like identifiers (and thus camel case). Given that that tactics don't create new keywords, I would go for the latter.
Mario Carneiro (Jul 29 2021 at 20:25):
they are basically contextual keywords though. If you use a misspelled tactic it doesn't even parse
Mac (Jul 29 2021 at 20:26):
@Mario Carneiro true, to the parser, they are symbols and thus functionally keywords, but I would consider that an implementation detail.
Mario Carneiro (Jul 29 2021 at 20:29):
I mean, unlike a regular function where you can still understand what comes next without the name making reference to anything, a tactic is a new piece of syntax with its own parsing rules, so if you don't recognize the tactic nothing after it makes sense
Mac (Jul 29 2021 at 20:29):
Also, the Lean 4 core seems to lean in this direction as every multi-word tactic is in camel case outside simp_all
(which I suspect may be an oversight).
Mac (Jul 29 2021 at 20:37):
Mario Carneiro Well, yeah, I certainly don't consider tactics to be regular functions. However, I do consider them to be like functions in that they are are pieces of notation identified by a single name with a singular function. Keywords, however, I view as common parser tokens that may appear in many different pieces of syntax (ex. where
, do
, by
, let
, etc.).
Mario Carneiro (Jul 29 2021 at 20:38):
Currently, the fact that most tactics start with a keyword seems to be a mere convention inherited from lean 3, where this practice is required. I don't know if we're going to see more infix and mixfix tactics in lean 4 going forward
Mario Carneiro (Jul 29 2021 at 20:40):
I wonder whether ssreflect has any good ideas for tactic operators
Mac (Jul 29 2021 at 20:41):
@Mario Carneiro well the fact that tactics start with a symbol (identifier) is rather key to its parsing, as the tactic parser only makes the lead symbol not a keyword (and the whole 'leading behavior' of parser categories seems to have been specifically designed for this purpose). A symbol in any other location would create real keywords.
Mario Carneiro (Jul 29 2021 at 20:41):
You can have non-keyword symbols anywhere using &"foo"
Mario Carneiro (Jul 29 2021 at 20:42):
I believe that the auto-non-keyword thing is a mere convenience
Mac (Jul 29 2021 at 20:42):
@Mario Carneiro that is entirely separate concept from the kind of pseudo-keyword a tactic is.
Mario Carneiro (Jul 29 2021 at 20:43):
I don't see how
Mario Carneiro (Jul 29 2021 at 20:44):
for example, simp
uses an optional (config = ...)
argument, and the word config
is written as &"config"
so that you can use config as a variable or function name elsewhere
Mac (Jul 29 2021 at 20:47):
they are parsed very differently? the leading behavior parser uses the leading identifier to disambiguate between parsers to run. Without the special leading behavior and just using a non-reserved symbol (i.e., &"foo"
), every tactic would look like it begin with ident
to the parser. Thus, no disambiguation would occur and every parser would be run (it wouldn't even stop on the first success due to the longest-match behavior of the parser). Thus the former is O(log n) in parsers (as TokenMap
us an RBMap
) whereas the later is O(n).
Mario Carneiro (Jul 29 2021 at 20:51):
that sounds like an implementation issue; non-reserved symbols shouldn't make it any slower in principle
Mac (Jul 29 2021 at 20:58):
I disagree. In my view, there is a major conceptual difference between non-reserved symbols and parser leading behavior.
Mac (Jul 29 2021 at 20:58):
But maybe one of Lean 4 developers (e.g., @Sebastian Ullrich) would have better insight than me. Especially since he, as I understand it, was largely responsible for the Lean 4 parser.
Mac (Jul 29 2021 at 21:00):
However, I should note, as I am not an expert in this matter, I may be understanding this all wrong.
Mario Carneiro (Jul 29 2021 at 21:12):
That is, if you have n different parsers that differ only in a single place where they each have different non-reserved symbols, it should be possible to distinguish them in O(log n) using a discrimination tree
Mario Carneiro (Jul 29 2021 at 21:13):
It's possible that this has been special-cased in the case of the initial symbol in a tactic, but it works generally
Mac (Jul 29 2021 at 23:11):
I would think that to get that to work generally would require knowing/storing much more information about the parsers than is usually feasible (especially in Lean where parsers can be arbitrary functions without any clear grammar).
Mario Carneiro (Jul 29 2021 at 23:35):
Why? &"foo"
is a very declarative way to say "match an ident that is equal to `foo
"
Mario Carneiro (Jul 29 2021 at 23:36):
it's not essentially different from "foo"
which says "match a token that is equal to "foo"
"
Mac (Jul 30 2021 at 01:09):
At the syntax
level, yes, you could probably do such inferences. However, at the Parser
/ParserFn
level it is no longer so declarative. Furthermore, syntax
can actually reference arbitrary parsers so any syntax
doing so would also be unable to be analyzed in such a manner.
Mac (Jul 30 2021 at 01:14):
I am also a little confused, @Mario Carneiro . I know that you have fiddled with parsers in-depth before and realized that they keep very little information about their grammar (i.e., when you wanted to print all the grammars of in a parser category and discovered that wasn't really feasible). So, as I feel like you probably already know some of the points I am making, I suspect I may be misunderstanding whatever you are trying to convey.
Mario Carneiro (Jul 30 2021 at 01:16):
I'm talking about the parsers as they are expressed at the syntax
level. In order to preserve the distinction to the compiled level, one would need to retain that TokenMap
like structure in the parser tables
Mac (Jul 30 2021 at 01:25):
I'm still confused as to what you are proposing. Maybe a concrete example will help.
For something like syntax foo &"and" bar : term
where foo
and bar
are parser aliases (ex., almost any built-in Lean syntax kind), how are you proposing this discriminates from other parsers with a different token at "and"?
Mario Carneiro (Jul 30 2021 at 01:37):
That syntax gets compiled down to a bunch of transitions in a state machine (a token trie), and so there will be an initial state corresponding to term
, followed by an edge along foo
(if foo
is a terminal or nonterminal; if it is an abbreviation for something then it might be multiple edges) to a state s1
, which has a &"and"
edge to s2
which has a bar
edge to s3
which is an accepting state (marked with the reduction rule "apply term_and_
").
The important part in this case is what an &"and"
edge is. This is a terminal, so it directs the tokenizer to parse a token, and then assert that the parsed token is "and"
. This is exactly the same as what would happen with "and"
; the only difference is that one considers Syntax.atom
tokens and the other considers Syntax.ident
Mario Carneiro (Jul 30 2021 at 01:38):
When multiple parsers are added that all pass through state s1
, it will acquire many out-edges like &"and"
, &"or"
and so on. These can be stored in a TokenMap
in order to get O(log n) lookup
Mario Carneiro (Jul 30 2021 at 01:40):
AFAIK this is exactly what is already happening for regular (reserved) tokens, but it's possible that it is not being done for non-reserved tokens
Mac (Jul 30 2021 at 01:54):
Mario Carneiro said:
When multiple parsers are added that all pass through state
s1
, it will acquire many out-edges like&"and"
,&"or"
and so on. These can be stored in aTokenMap
in order to get O(log n) lookup
Note that, afaik, in the current Lean 4 parser implementation TokenMap
lookups only occur for the leading token of a category parser. A normal parser does not do any such lookups. That is, if I have the syntax "foo" "bar"
"foo" "baz"
and "foo" "bam"
there is no lookup on the second token. Furthermore, parsers are not guaranteed to even have first token information much less 2nd/3rd/4th information. Parsers are not guaranteed to parse tokens either, they can just parse the raw input stream (nor are they guaranteed to be deterministic or terminate).
Mario Carneiro (Jul 30 2021 at 01:54):
Looking at the code some more, I think the lean 4 parser setup is a bit flatter than this. What I described is how lean 3 notations are compiled, but in lean 4 it appears that it only looks one token deep, and special cases the leading parser and trailing parser cases (corresponding to the initial state and the state after a term
edge) to have proper token trees, and all other states are simply stuck in one big list. This means that while it can efficiently handle things like:
syntax "lead1" term : term
...
syntax "leadn" term : term
syntax term "trail1" term : term
...
syntax term "trailn" term : term
it will have linear time performance on
syntax "a" "lead1" term : term
...
syntax "a" "leadn" term : term
syntax "a" term "trail1" term : term
...
syntax "a" term "trailn" term : term
Mario Carneiro (Jul 30 2021 at 01:56):
It would still be possible to implement this even in the presence of arbitrary parsers; you would just have a token tree for the ones generated by syntax and an "other" list for all the manually implemented parsers
Mac (Jul 30 2021 at 01:58):
Yep, you more-or-less got to what I was saying before I finished saying it. :P
Mario Carneiro (Jul 30 2021 at 01:58):
That would handle things like this:
syntax "properly" "organized" "tokens" thenCallMyArbitraryParser : term
Mac (Jul 30 2021 at 02:02):
Note that what you are suggesting is essentially turning every parser into a (closed?) parser category. You could thus already simulate this in that manner.
Mario Carneiro (Jul 30 2021 at 02:03):
Not closed, it would be added to whenever you add to the parent category
Mario Carneiro (Jul 30 2021 at 02:03):
although maybe you can't directly add to it through syntax
Mac (Jul 30 2021 at 02:04):
For your example, this would look like:
declare_syntax_cat properly
syntax "properly" properly : term
declare_syntax_cat organized
syntax "organized" organized : properly
declare_syntax_cat tokens
syntax "tokens" tokens : organized
syntax thenCallMyArbitraryParser : tokens
Mario Carneiro (Jul 30 2021 at 02:05):
Similarly:
declare_syntax_cat termA
syntax "a" termA : term
syntax "lead1" term : termA
...
syntax "leadn" term : termA
syntax term "trail1" term : termA
...
syntax term "trailn" term : termA
Mario Carneiro (Jul 30 2021 at 02:06):
Although I guess you have to declare in advance whether you expect many "foo"
parsers or many &"foo"
parsers by setting the LeadingIdentBehavior
appropriately
Mac (Jul 30 2021 at 02:10):
I suspect that this approach may result in some less-than-ideal performance overhead (either in memory or runtime) if used universally.
Last updated: Dec 20 2023 at 11:08 UTC