Zulip Chat Archive

Stream: mathlib4

Topic: Tactic casing convention


Mario Carneiro (Jul 29 2021 at 15:27):

Does anyone have opinions about the tactic casing convention? Lean 4 seems to be indecisive about whether tactics should be snake case (simp_all) or camel case (inferInstance, refineLift'), while I went for all camel case in mathport and I think @Aurélien Saue is following my syntax descriptions in mathlib4#27. There are also things like erewrite that are just uncased, should they be eRewrite? eRw seems weird.

Mario Carneiro (Jul 29 2021 at 15:28):

casesM? introV? exFalso? existsI?

Mario Carneiro (Jul 29 2021 at 15:29):

cc: @Daniel Selsam @Gabriel Ebner @Sebastian Ullrich

Daniel Selsam (Jul 29 2021 at 15:31):

I vote camelCase everywhere (possibly with special cases for weird ones like eRw)

Gabriel Ebner (Jul 29 2021 at 15:32):

There is some precedence for snake case: most commands like set_option, elab_rules, etc. are snake case

Daniel Selsam (Jul 29 2021 at 15:34):

the TacticM counterparts of the tactic syntax are regular Lean programs, definitely camelCased, and the tactic syntax feels to me at least like sugar for calling them

Daniel Selsam (Jul 29 2021 at 15:35):

But I haven't done much interactive proving yet so my intuitions may be out of line

Mario Carneiro (Jul 29 2021 at 15:39):

Gabriel Ebner said:

There is some precedence for snake case: most commands like set_option, elab_rules, etc. are snake case

Also set_option is in the term grammar; not sure there are many other multi-word keywords in the term grammar

Gabriel Ebner (Jul 29 2021 at 15:39):

With snake case?

Mario Carneiro (Jul 29 2021 at 15:39):

yes

Mario Carneiro (Jul 29 2021 at 15:40):

def bla := set_option foo 37 in 2 + 2

Mario Carneiro (Jul 29 2021 at 15:42):

I think there is a strong argument for maintaining the parallel between term constructions and tactics for things like have / let / show that maybe generalizes to set_option?

Gabriel Ebner (Jul 29 2021 at 15:43):

Yes, set_option should obviously be the same in command, term, and tactic mode.

Mario Carneiro (Jul 29 2021 at 15:43):

If set_option becomes a tactic (and why wouldn't it), it would be confusing if it was spelled differently

Mario Carneiro (Jul 29 2021 at 15:43):

that said, you could just as easily turn it around and argue that commands should be camel cased

Mario Carneiro (Jul 29 2021 at 15:49):

For reference, in rust keywords are mostly one-word but there are typeof and macro_rules which weakly suggest a snake casing convention; haskell somehow manages to get away with keywords being actual english words (or symbols), with space separated compound keywords like deriving instance or type family; and C++ keywords are snake cased like co_await or mashed lowercase like alignas or sizeof

Gabriel Ebner (Jul 29 2021 at 15:52):

Mmh, by infer instance?

Eric Rodriguez (Jul 29 2021 at 15:53):

Why are we mixing conventions now? Nat/Basic.lean hints that defs/lemmas are going to be snake_case in future; is it trying to be a way to easily tell apart tactics vs defs?

Mario Carneiro (Jul 29 2021 at 15:54):

tactics vs defs are already syntactically distinguishable, so they can overlap or not as we prefer

Mario Carneiro (Jul 29 2021 at 15:55):

unless we want to introduce a shorthand where e means exact e, which seems like a bad idea since it will probably cause lots of clashes

Eric Rodriguez (Jul 29 2021 at 15:56):

it isn't unheard of tbf

Eric Rodriguez (Jul 29 2021 at 15:56):

but I can't imagine it being a good idea

Mario Carneiro (Jul 29 2021 at 15:57):

lol it's using a zero width keyword

Mario Carneiro (Jul 29 2021 at 15:58):

strictly speaking that's a user notation, not a tactic, so it lets you skip the by as well as exact

Mac (Jul 29 2021 at 20:23):

Mario Carneiro said:

For reference, in rust keywords are mostly one-word but there are typeof and macro_rules which weakly suggest a snake casing convention;

From what I understood from the discussion of when natLit! became nat_lit, I believe that snake case is now the standard in the Lean 4 core for any and all multi-word syntax keywords. This is why set_option, macro_rules and the like are snake case. Things like typeof, erw, are special exceptions because mish-mashes of short words are often though of as a single (new) word in CS (ex. segfault or exfalso).

The question, I think, is whether tactics are to be thought of as keywords (and thus snake case) or function-like identifiers (and thus camel case). Given that that tactics don't create new keywords, I would go for the latter.

Mario Carneiro (Jul 29 2021 at 20:25):

they are basically contextual keywords though. If you use a misspelled tactic it doesn't even parse

Mac (Jul 29 2021 at 20:26):

@Mario Carneiro true, to the parser, they are symbols and thus functionally keywords, but I would consider that an implementation detail.

Mario Carneiro (Jul 29 2021 at 20:29):

I mean, unlike a regular function where you can still understand what comes next without the name making reference to anything, a tactic is a new piece of syntax with its own parsing rules, so if you don't recognize the tactic nothing after it makes sense

Mac (Jul 29 2021 at 20:29):

Also, the Lean 4 core seems to lean in this direction as every multi-word tactic is in camel case outside simp_all (which I suspect may be an oversight).

Mac (Jul 29 2021 at 20:37):

Mario Carneiro Well, yeah, I certainly don't consider tactics to be regular functions. However, I do consider them to be like functions in that they are are pieces of notation identified by a single name with a singular function. Keywords, however, I view as common parser tokens that may appear in many different pieces of syntax (ex. where, do, by, let, etc.).

Mario Carneiro (Jul 29 2021 at 20:38):

Currently, the fact that most tactics start with a keyword seems to be a mere convention inherited from lean 3, where this practice is required. I don't know if we're going to see more infix and mixfix tactics in lean 4 going forward

Mario Carneiro (Jul 29 2021 at 20:40):

I wonder whether ssreflect has any good ideas for tactic operators

Mac (Jul 29 2021 at 20:41):

@Mario Carneiro well the fact that tactics start with a symbol (identifier) is rather key to its parsing, as the tactic parser only makes the lead symbol not a keyword (and the whole 'leading behavior' of parser categories seems to have been specifically designed for this purpose). A symbol in any other location would create real keywords.

Mario Carneiro (Jul 29 2021 at 20:41):

You can have non-keyword symbols anywhere using &"foo"

Mario Carneiro (Jul 29 2021 at 20:42):

I believe that the auto-non-keyword thing is a mere convenience

Mac (Jul 29 2021 at 20:42):

@Mario Carneiro that is entirely separate concept from the kind of pseudo-keyword a tactic is.

Mario Carneiro (Jul 29 2021 at 20:43):

I don't see how

Mario Carneiro (Jul 29 2021 at 20:44):

for example, simp uses an optional (config = ...) argument, and the word config is written as &"config" so that you can use config as a variable or function name elsewhere

Mac (Jul 29 2021 at 20:47):

they are parsed very differently? the leading behavior parser uses the leading identifier to disambiguate between parsers to run. Without the special leading behavior and just using a non-reserved symbol (i.e., &"foo"), every tactic would look like it begin with ident to the parser. Thus, no disambiguation would occur and every parser would be run (it wouldn't even stop on the first success due to the longest-match behavior of the parser). Thus the former is O(log n) in parsers (as TokenMap us an RBMap) whereas the later is O(n).

Mario Carneiro (Jul 29 2021 at 20:51):

that sounds like an implementation issue; non-reserved symbols shouldn't make it any slower in principle

Mac (Jul 29 2021 at 20:58):

I disagree. In my view, there is a major conceptual difference between non-reserved symbols and parser leading behavior.

Mac (Jul 29 2021 at 20:58):

But maybe one of Lean 4 developers (e.g., @Sebastian Ullrich) would have better insight than me. Especially since he, as I understand it, was largely responsible for the Lean 4 parser.

Mac (Jul 29 2021 at 21:00):

However, I should note, as I am not an expert in this matter, I may be understanding this all wrong.

Mario Carneiro (Jul 29 2021 at 21:12):

That is, if you have n different parsers that differ only in a single place where they each have different non-reserved symbols, it should be possible to distinguish them in O(log n) using a discrimination tree

Mario Carneiro (Jul 29 2021 at 21:13):

It's possible that this has been special-cased in the case of the initial symbol in a tactic, but it works generally

Mac (Jul 29 2021 at 23:11):

I would think that to get that to work generally would require knowing/storing much more information about the parsers than is usually feasible (especially in Lean where parsers can be arbitrary functions without any clear grammar).

Mario Carneiro (Jul 29 2021 at 23:35):

Why? &"foo" is a very declarative way to say "match an ident that is equal to `foo"

Mario Carneiro (Jul 29 2021 at 23:36):

it's not essentially different from "foo" which says "match a token that is equal to "foo""

Mac (Jul 30 2021 at 01:09):

At the syntax level, yes, you could probably do such inferences. However, at the Parser/ParserFn level it is no longer so declarative. Furthermore, syntax can actually reference arbitrary parsers so any syntax doing so would also be unable to be analyzed in such a manner.

Mac (Jul 30 2021 at 01:14):

I am also a little confused, @Mario Carneiro . I know that you have fiddled with parsers in-depth before and realized that they keep very little information about their grammar (i.e., when you wanted to print all the grammars of in a parser category and discovered that wasn't really feasible). So, as I feel like you probably already know some of the points I am making, I suspect I may be misunderstanding whatever you are trying to convey.

Mario Carneiro (Jul 30 2021 at 01:16):

I'm talking about the parsers as they are expressed at the syntax level. In order to preserve the distinction to the compiled level, one would need to retain that TokenMap like structure in the parser tables

Mac (Jul 30 2021 at 01:25):

I'm still confused as to what you are proposing. Maybe a concrete example will help.
For something like syntax foo &"and" bar : term where foo and bar are parser aliases (ex., almost any built-in Lean syntax kind), how are you proposing this discriminates from other parsers with a different token at "and"?

Mario Carneiro (Jul 30 2021 at 01:37):

That syntax gets compiled down to a bunch of transitions in a state machine (a token trie), and so there will be an initial state corresponding to term , followed by an edge along foo (if foo is a terminal or nonterminal; if it is an abbreviation for something then it might be multiple edges) to a state s1, which has a &"and" edge to s2 which has a bar edge to s3 which is an accepting state (marked with the reduction rule "apply term_and_").

The important part in this case is what an &"and" edge is. This is a terminal, so it directs the tokenizer to parse a token, and then assert that the parsed token is "and". This is exactly the same as what would happen with "and"; the only difference is that one considers Syntax.atom tokens and the other considers Syntax.ident

Mario Carneiro (Jul 30 2021 at 01:38):

When multiple parsers are added that all pass through state s1, it will acquire many out-edges like &"and", &"or" and so on. These can be stored in a TokenMap in order to get O(log n) lookup

Mario Carneiro (Jul 30 2021 at 01:40):

AFAIK this is exactly what is already happening for regular (reserved) tokens, but it's possible that it is not being done for non-reserved tokens

Mac (Jul 30 2021 at 01:54):

Mario Carneiro said:

When multiple parsers are added that all pass through state s1, it will acquire many out-edges like &"and", &"or" and so on. These can be stored in a TokenMap in order to get O(log n) lookup

Note that, afaik, in the current Lean 4 parser implementation TokenMap lookups only occur for the leading token of a category parser. A normal parser does not do any such lookups. That is, if I have the syntax "foo" "bar" "foo" "baz" and "foo" "bam" there is no lookup on the second token. Furthermore, parsers are not guaranteed to even have first token information much less 2nd/3rd/4th information. Parsers are not guaranteed to parse tokens either, they can just parse the raw input stream (nor are they guaranteed to be deterministic or terminate).

Mario Carneiro (Jul 30 2021 at 01:54):

Looking at the code some more, I think the lean 4 parser setup is a bit flatter than this. What I described is how lean 3 notations are compiled, but in lean 4 it appears that it only looks one token deep, and special cases the leading parser and trailing parser cases (corresponding to the initial state and the state after a term edge) to have proper token trees, and all other states are simply stuck in one big list. This means that while it can efficiently handle things like:

syntax "lead1" term : term
...
syntax "leadn" term : term

syntax term "trail1" term : term
...
syntax term "trailn" term : term

it will have linear time performance on

syntax "a" "lead1" term : term
...
syntax "a" "leadn" term : term

syntax "a" term "trail1" term : term
...
syntax "a" term "trailn" term : term

Mario Carneiro (Jul 30 2021 at 01:56):

It would still be possible to implement this even in the presence of arbitrary parsers; you would just have a token tree for the ones generated by syntax and an "other" list for all the manually implemented parsers

Mac (Jul 30 2021 at 01:58):

Yep, you more-or-less got to what I was saying before I finished saying it. :P

Mario Carneiro (Jul 30 2021 at 01:58):

That would handle things like this:

syntax "properly" "organized" "tokens" thenCallMyArbitraryParser : term

Mac (Jul 30 2021 at 02:02):

Note that what you are suggesting is essentially turning every parser into a (closed?) parser category. You could thus already simulate this in that manner.

Mario Carneiro (Jul 30 2021 at 02:03):

Not closed, it would be added to whenever you add to the parent category

Mario Carneiro (Jul 30 2021 at 02:03):

although maybe you can't directly add to it through syntax

Mac (Jul 30 2021 at 02:04):

For your example, this would look like:

declare_syntax_cat properly
syntax "properly" properly : term
declare_syntax_cat organized
syntax "organized" organized : properly
declare_syntax_cat tokens
syntax "tokens" tokens : organized
syntax thenCallMyArbitraryParser : tokens

Mario Carneiro (Jul 30 2021 at 02:05):

Similarly:

declare_syntax_cat termA
syntax "a" termA : term
syntax "lead1" term : termA
...
syntax "leadn" term : termA

syntax term "trail1" term : termA
...
syntax term "trailn" term : termA

Mario Carneiro (Jul 30 2021 at 02:06):

Although I guess you have to declare in advance whether you expect many "foo" parsers or many &"foo" parsers by setting the LeadingIdentBehavior appropriately

Mac (Jul 30 2021 at 02:10):

I suspect that this approach may result in some less-than-ideal performance overhead (either in memory or runtime) if used universally.


Last updated: Dec 20 2023 at 11:08 UTC