Zulip Chat Archive

Stream: lean4

Topic: End of proofs


Patrick Massot (May 02 2023 at 19:55):

Are we really really sure we don't want to bring back the good old battle-tested begin/end? I just spent at least 10 minutes fighting a more sophisticated version of

example : 1 + 1 = 2 := by
  calc 1 + 1 = 1 + 1 := by rfl
  _ = 2 := by rfl [h]

example (P : Prop) (h : P) : P := by
  exact h

What do you expect as an error message here? The answer is an error on line 5 saying expected ':='. In my real example I used a tactic that was not rfl and giving it a list of arguments was plausible.

A much simpler version not involving calc is:

example (P : Prop) (h : P) : P := by
  assumption P

Say I mixed up exact and assumption. Guess the error message? Of course it is expected command, and I understand why. The proof is over after assumption and Lean tries to process whatever comes after this proof.

In both cases there is an error but the error message and/or its location are extremely confusing because nobody told Lean: here the proof should be over, and whatever comes next isn't part of it.

This will be a nightmare for teaching but honestly I think we should also not do this in mathlib.

Heather Macbeth (May 02 2023 at 20:10):

I agree ... I opened lean4#1971 a while ago for a similar thing, do you want to add some comments there with these examples?

Heather Macbeth (May 02 2023 at 20:11):

Patrick Massot said:

This will be a nightmare for teaching

and yes ... it does arise regularly in teaching.

Patrick Massot (May 02 2023 at 20:18):

Done.

Eric Wieser (May 02 2023 at 20:19):

Can't Lean use the indent-based heuristics of python here?

Eric Wieser (May 02 2023 at 20:20):

We've already made the language indentation-sensitive, it seems like the worst of both worlds if you still can't deduce the program structure from the indentation alone

Kevin Buzzard (May 02 2023 at 21:06):

Patrick Massot said:

Done.

Yeah I've been using done too. It's a great substitute for end!

Shreyas Srinivas (May 02 2023 at 23:16):

Eric Wieser said:

We've already made the language indentation-sensitive, it seems like the worst of both worlds if you still can't deduce the program structure from the indentation alone

If lean can't deduce structure by indentation, wouldn't this problem just reappear in the form of confusion between section endings, namespace endings, and proof endings (for example which end should be followed by a namespace name)? I had this experience with vhdl, which uses a bunch of different "begins" and "ends" with all sorts of variations in what is required after which sort of "end" like lean with namespaces.

Scott Morrison (May 03 2023 at 00:14):

At the very least we should give projects the option of enabling begin ... end (and disabling by ...?)

Shreyas Srinivas (May 03 2023 at 00:20):

If you add begin...end you would have an error on end, like it happens now with done right?

Shreyas Srinivas (May 03 2023 at 01:15):

@Patrick Massot I found two ways

  1. Using semicolons to delimit the end of a tactic:
example : 1 + 1 = 2 := by
  calc
    1 + 1 = 1 + 1 :=
      by rfl;
    _ = 2 := by rfl [h]; --  error shows at the semicolon

example (P : Prop) (h : P) : P := by
  exact h;
  1. Using done at the end as @Kevin suggests.
example : 1 + 1 = 2 := by
  calc
    1 + 1 = 1 + 1 :=
      by rfl
    _ = 2 := by rfl [h]
  done                   -- Error shows here

example (P : Prop) (h : P) : P := by
  exact h
  done  -- Error shows here

Shreyas Srinivas (May 03 2023 at 01:24):

Further as the following example shows, the issue is not with detecting the end of the proof itself. It is about detecting the end of the tactic line. Lean just puts the error at the beginning of the next line when the previous tactic line ends with a newline character. This issue won't be solved by adding begin and end. The solution is to use semicolons

example : 1 + 1 = 2 := by
  calc
    1 + 1 = 1 + 1 := by
      rfl [h] -- changed here
    _ = 2 := by -- error shows at the beginning of this line
      rfl


example (P : Prop) (h : P) : P := by
  assumption

Semicolon solution:

example : 1 + 1 = 2 := by
  calc
    1 + 1 = 1 + 1 := by
      rfl [h]; - - error shows here now
    _ = 2 := by
      rfl;


example (P : Prop) (h : P) : P := by
  assumption

Shreyas Srinivas (May 03 2023 at 01:25):

Somehow even though tpil says that tactics can be separated by either semicolons or newlines, maybe because of the way newlines are parsed, the errors show at the end of the last newline in the newline sequence.

Patrick Massot (May 03 2023 at 07:30):

Shreyas, of course you are right that the core issue is detecting the end of a tactic. My claim is that it becomes really super confusing when this issue infects the next declaration. I'm mostly thinking of my teaching where I give students a Lean file full of sorry that need to be replaced with a proof. In the current setup it's clear many students will think they solved an exercise because they see no error there and move to the next one and be utterly confused.

Yaël Dillies (May 03 2023 at 07:32):

Stupid idea: What about you insert example : True := trivial after each exercise? Then at least the n+1n+1-st exercise after is protected from errors propagating from the nn-th one.

Patrick Massot (May 03 2023 at 07:33):

And it's true that semicolons fix a lot of things, but spending hours running around the computer labs to tell students they should put semicolons at the end of lines if no fun at all. I know that by insisting Lean shouldn't need semi-colons and still never get confused is basically asking Lean to read students' minds (or discard student input and automatically write the correct answer). But somehow python manages to do that. I remember that when we switch from Maple to Sage I feared that running around telling students to add semicolons would be replaced by running around telling student to fix their indentation, but this didn't happen. And I still run to tell students to add colons at the end of if and for lines...

Shreyas Srinivas (May 03 2023 at 08:20):

Does done also break it for teaching? I have found it a great replacement for goals accomplished. Plus it fills the role of writing QED at the end of the proof. The error shows on the done line.

I can't imagine it gets any better with begin...end because we will still have the problem of detecting the end of tactics inside the block. This would only be fully fixed a la lean 3 if the commas (equivalent to semicolons) are also reintroduced. Until then the error will appear at end instead. Same as it does with done now.

In case this is actually adopted , I would hope that we at least don't make it essential. Syntax-wise, the current syntax is consistent with the rest of the language. I intend to continue using it with done. Changing it everywhere will be very tedious. Further the parsing of the tactic block is done by the parser at present. Changing this will mean making changes at the level of the parser. I just dug it up last night. So I don't know how well it works to offer an option

Kyle Miller (May 03 2023 at 08:30):

If we're worried at all about the multiple uses of end and Lean being happy to incorrectly use end as a synchronization point, we could choose new keywords for tactic blocks, like say proof ... qed.

I wonder if we could also make a much stricter parser for proof ... qed blocks that only accepts reasonably-indented input and outputs reasonable errors if it's not indented correctly. I think it'd be fair to say that there must be a newline after proof, and then until the qed the way it could work is it reads in blocks of lines, broken up according to indentation, and then each of these blocks is parsed as a single complete tactic.

Kyle Miller (May 03 2023 at 08:34):

Basically, if proof ... qed blocks took the indentation as a very strong hint for resynchonization when there are parser errors (and for parsing too! a line break without extra indentation should end the tactic!), I think this would greatly reduce confusion for students writing tactic proofs.

Shreyas Srinivas (May 03 2023 at 08:35):

The issue comes from somewhere around this line : https://github.com/leanprover/lean4/blob/f9da1d8b55ca6989297fb952985b7d8d6c77e14b/src/Lean/Parser/Term.lean#L74

Shreyas Srinivas (May 03 2023 at 08:35):

Kyle Miller said:

Basically, if proof ... qed blocks took the indentation as a very strong hint for resynchonization when there are parser errors (and for parsing too! a line break should end the tactic!), I think this would greatly reduce confusion for students writing tactic proofs.

That would also solve the original problem without the block

Kyle Miller (May 03 2023 at 08:40):

Part of the original problem is that there's not a good place for the error that the proof is not yet completed. Right now, it's a tiny red squiggle under "by", at the beginning of the proof, far away from your focus of attention when you're trying to finish a proof.

I know you suggested done, but I'm not sure you can rely on anyone including something that's optional.

Shreyas Srinivas (May 03 2023 at 08:40):

Kyle Miller said:

Basically, if proof ... qed blocks took the indentation as a very strong hint for resynchonization when there are parser errors (and for parsing too! a line break without extra indentation should end the tactic!), I think this would greatly reduce confusion for students writing tactic proofs.

also, currently if you use done, by...done already does what I suspect would be the behaviour of begin...end (some speculation on my part here).

Shreyas Srinivas (May 03 2023 at 08:41):

okay, so making done or qed a requirement to be the last tactic in a tactic block might do the trick perhaps?

Kyle Miller (May 03 2023 at 08:41):

I'm also not sure of what I'm saying is a good general solution for how to parse multi-line constructs. I just think it would be good for safe parsing to keep people from getting confused. Or maybe people would like it as scaffolding when writing proofs, then they take the scaffolding down at the end.

Kyle Miller (May 03 2023 at 08:44):

Shreyas Srinivas said:

also, currently if you use done, by...done already does what I suspect would be the behaviour of begin...end (some speculation on my part here).

I think you need more parser support for done to be reliable. Tactics that accept a list of subsequent tactics can gobble this up if you get the indentation wrong.

Kyle Miller (May 03 2023 at 08:46):

Something that needs more thought still with proof ... qed is making sure that nested proofs also parse reliably. The bullet for focusing on a subgoal has a similar issue that you don't see the error that you're not done at the end of the sub-block (since there's nothing at the end).

Shreyas Srinivas (May 03 2023 at 08:47):

Kyle Miller said:

Shreyas Srinivas said:

also, currently if you use done, by...done already does what I suspect would be the behaviour of begin...end (some speculation on my part here).

I think you need more parser support for done to be reliable. Tactics that accept a list of subsequent tactics can gobble this up if you get the indentation wrong.

"If you get the indentation wrong" => I think that's an issue with any indentation based language. python. haskell. ocaml.
The support might be relatively simple to add here : https://github.com/leanprover/lean4/blob/f9da1d8b55ca6989297fb952985b7d8d6c77e14b/src/Lean/Parser/Term.lean#L64

Shreyas Srinivas (May 03 2023 at 08:48):

Kyle Miller said:

Something that needs more thought still with proof ... qed is making sure that nested proofs also parse reliably. The bullet for focusing on a subgoal has a similar issue that you don't see the error that you're not done at the end of the sub-block (since there's nothing at the end).

Currently, if you do change to something like proof...qed for the tactic block, wouldn't it also be used for all have and let proof blocks?

Kyle Miller (May 03 2023 at 08:49):

I think that's an issue with any indentation based language. python. haskell. ocaml.

I should have given an example:

by
  classical
  done

The classical tactic is actually the start of a tactic block. If you don't indent after a tactic that starts a tactic block, then all the tactics after it are given to it.

It's basically a tautology that in an indentation-based language if you get indentation wrong you have a problem, but this can be a fairly confusing way to get it wrong.

Shreyas Srinivas (May 03 2023 at 08:51):

Kyle Miller said:

I think that's an issue with any indentation based language. python. haskell. ocaml.

I should have given an example:

by
  classical
  done

The classical tactic is actually the start of a tactic block. If you don't indent after a tactic that starts a tactic block, then all the tactics after it are given to it.

It's basically a tautology that in an indentation-based language if you get indentation wrong you have a problem, but this can be a fairly confusing way to get it wrong.

This seems like an incorrect way to get tactics into classical. Ideally the tactics that go with classical as a block should have an extra indent.

Kyle Miller (May 03 2023 at 08:51):

wouldn't it also be used for all have and let proof blocks?

Sure, ideally, but I think you might need the proof ... qed parser to be aware of proof ... qed blocks to get the parsing right, and to be robust against accidental mistakes. For example, the qed probably would be lined up with proof, so the basic rule I suggested earlier wouldn't support sub-blocks.

Shreyas Srinivas (May 03 2023 at 08:52):

by
classical
classical1
classical2
...
done
done

Kyle Miller (May 03 2023 at 08:53):

Shreyas Srinivas said:

This seems like an incorrect way to get tactics into classical. Ideally the tactics that go with classical as a block should have an extra indent.

Why is that the ideal? In a classical proof, generally you want to switch to classical mode and give a proof, so why should the entire proof have extra indentation on account of it using classical reasoning? (I actually prefer it without the indent.)

Kyle Miller (May 03 2023 at 08:54):

Here's another kind of example that shows up in mathlib:

by
  cases x with | ctor a b =>
  cases y with | ctor c d =>
  exact h a b c d

Kyle Miller (May 03 2023 at 08:55):

Maybe we'll decide this is bad style eventually, but it can be thought of as being a simple destructuring assignment, so why should succeeding lines need to be intented?

Kyle Miller (May 03 2023 at 08:56):

Anyway, the point is that some people like this style, and they're examples where any done at the end gets consumed by a tactic.

Kyle Miller (May 03 2023 at 08:56):

For parsing synchronization's sake, having a special ending keyword that the beginning of a tactic block knows about would help avoid needing to worry about this issue.

Shreyas Srinivas (May 03 2023 at 08:56):

Kyle Miller said:

Why is that the ideal? In a classical proof, generally you want to switch to classical mode and give a proof, so why should the entire proof have extra indentation on account of it using classical reasoning? (I actually prefer it without the indent.)

Because of the way the tactic actually behaves. If all it did was change some mode or add some hypothesis or something this would be fine. But it is reading the entire block of tactics that follows, as some argument(s), and executing them within it. This needs to be clear by syntax. At least that is my opinion

Mario Carneiro (May 03 2023 at 09:00):

Arguably the fact that classical is a block scoping tactic in the first place is an implementation detail. It did not used to be

Shreyas Srinivas (May 03 2023 at 09:00):

Kyle Miller said:

For parsing synchronization's sake, having a special ending keyword that the beginning of a tactic block knows about would help avoid needing to worry about this issue.

As far as I can see, that is already the case with by. If you add a requirement for done or qed there will be an end-marker too. That is why the request for begin...end confuses me. At present it wouldn't do anything substantially different from by.... done. Maybe you have a more complex idea in mind

Shreyas Srinivas (May 03 2023 at 09:01):

To really make the error message precise like in lean 3 you would need a tactic delimiter that is not whitespace. Like ;. This also exists.

Mario Carneiro (May 03 2023 at 09:04):

I don't know that it is required to have a non-whitespace token after the proof to report errors there. You could report errors at the end of the last line, or the start of the next line, or the start of the line + indent if there is any

Shreyas Srinivas (May 03 2023 at 09:04):

Mario Carneiro said:

I don't know that it is required to have a non-whitespace token after the proof to report errors there. You could report errors at the end of the last line, or the start of the next line, or the start of the line + indent if there is any

Right now, it seems to happening at after the line, at the end of the first separator. Hence my suspicion at line 78 above. Perhaps someone who knows the parser better than me could help.

Mario Carneiro (May 03 2023 at 09:06):

line 78 of what?

Mario Carneiro (May 03 2023 at 09:09):

but also I'm not talking about the current parser, I am aware it doesn't do this now but it could, at least when there is no parse error. The parse error examples are harder; I don't think lean has a good way to specify parser fallback behavior so it is not so configurable

Shreyas Srinivas (May 03 2023 at 09:15):

This line : https://github.com/leanprover/lean4/blob/f9da1d8b55ca6989297fb952985b7d8d6c77e14b/src/Lean/Parser/Term.lean#L64

The issue is explained in the example below. With the newline separator, we have the issue described where error appears at the next line on which there is code. With the semicolon separator, the error appears on the semicolon after the line containing the error.
Shreyas Srinivas said:

Further as the following example shows, the issue is not with detecting the end of the proof itself. It is about detecting the end of the tactic line. Lean just puts the error at the beginning of the next line when the previous tactic line ends with a newline character. This issue won't be solved by adding begin and end. The solution is to use semicolons

example : 1 + 1 = 2 := by
  calc
    1 + 1 = 1 + 1 := by
      rfl [h] -- changed here
    _ = 2 := by -- error shows at the beginning of this line
      rfl


example (P : Prop) (h : P) : P := by
  assumption

Semicolon solution:

example : 1 + 1 = 2 := by
  calc
    1 + 1 = 1 + 1 := by
      rfl [h]; - - error shows here now
    _ = 2 := by
      rfl;


example (P : Prop) (h : P) : P := by
  assumption

Shreyas Srinivas (May 03 2023 at 09:18):

So what's happening here is that lean is ignoring (or bunching together) any extra linebreaks. and then reporting the error after the end of one tactic in the tactic sequence. At the end of the proof this happens when the next def or theorem line begins. But you can see this even within a proof. Take the first example above. But insert a few new lines after the error.

example : 1 + 1 = 2 := by
  calc
    1 + 1 = 1 + 1 := by
      rfl [h] -- changed here



    _ = 2 := by -- error shows at the beginning of this line
      rfl

Shreyas Srinivas (May 03 2023 at 09:18):

This does not happen with the semicolon example

Shreyas Srinivas (May 03 2023 at 09:20):

This is why I think what is happening is that here :https://github.com/leanprover/lean4/blob/f9da1d8b55ca6989297fb952985b7d8d6c77e14b/src/Lean/Parser/Term.lean#L74
somehow checkNewLineBreak doesn't care about one or more linebreaks. just that the last character was a line break.

Shreyas Srinivas (May 03 2023 at 09:21):

this checkLineBreak function is defined here : https://github.com/leanprover/lean4/blob/f9da1d8b55ca6989297fb952985b7d8d6c77e14b/src/Lean/Parser/Basic.lean#L951

Kyle Miller (May 03 2023 at 09:39):

Shreyas Srinivas said:

At present it wouldn't do anything substantially different from by.... done. Maybe you have a more complex idea in mind

The algorithm I outlined in one of my first messages is substantially different from how by ... done parses. You could think of it as a change of the precedence level of certain kinds of whitespace.

Think about how Python works for a moment: there is the statement level, which has knowledge of line indentation rules and a little bit about parentheses to handle line continuations (to avoid needing to always put \ at the end of a line continuation), and then there is the expression level. Roughly speaking, you can parse Python by first breaking everything up according to the statement level, and then you can parse the expressions later. This causes statement-breaking newlines to have a sort of change to the precedence level so that it tends toward being an "operator" between statements. Lean 4 is very different from Python in that it tries to solve the embedding-statements-in-expressions problem; Python has intentionally avoided this, and they just call code that would want this feature to be "unpythonic" (no multi-line statements inside lambdas for you).

My suggestion is essentially to have a construct that switches to statement-level parsing rules.

I want each "statement" in the block to be parsed individually after the block is segmented into statements. This prevents Lean from being creative in finding a way to continue parsing. (But it also allows the parser to continue to check whether succeeding statements parse, even if a preceding one has a parse error.)

Consider this:

example (n : Nat) : 2 = 2 := by
  cases (
  exact rfl

I want an error to be reported right after the (. It should be "unexpected end of tactic; expected ')', ',' or ':'". Right now, if this is at the end of a file, it reports the error after rfl and says "unexpected end of input; expected ')', ',' or ':'".

Kyle Miller (May 03 2023 at 09:42):

I don't think tweaking how whitespace is parsed could make this happen. It needs something more substantial to get the parser to commit to parsing individual statements.

Kyle Miller (May 03 2023 at 09:51):

This wouldn't be exactly how I would implement it (since it's somewhat leaky), but one could imagine that the parsing is done as if

example (n : Nat) : 2 = 2 :=
proof
  cases (
  exact rfl
qed

is written as

example (n : Nat) : 2 = 2 :=
by
  { cases ( }
  { exact rfl }

but imagine for the moment that { ... } doesn't focus and doesn't require that the goal be solved.

Shreyas Srinivas (May 03 2023 at 09:55):

Kyle Miller said:

My suggestion is essentially to have a construct that switches to statement-level parsing rules. I want each "statement" in the block to be parsed individually after the block is segmented into statements.

As far as I know, the current parser is a top down recursive descent parser. so if what you are suggesting is that the statements be parsed after they are all separated. I am not sure how this can be implemented in a parsec style parser.

This prevents Lean from being creative in finding a way to continue parsing. (But it also allows the parser to continue to check whether succeeding statements parse, even if a preceding one has a parse error.)

Consider this:

example (n : Nat) : 2 = 2 := by
  cases (
  exact rfl

I want an error to be reported right after the (. It should be "unexpected end of tactic; expected ')', ',' or ':'". Right now, if this is at the end of a file, it reports the error after rfl and says "unexpected end of input; expected ')', ',' or ':'".

Semicolon does this already. The problem is the way the newline tactic separator seems to work. It seems to read until the next line with code begins and puts an error just before that. The semicolon parser is much simpler. It essentially says "look there's a semicolon. this tactic line is over"

Kyle Miller (May 03 2023 at 10:00):

Shreyas Srinivas said:

As far as I know, the current parser is a top down recursive descent parser. so if what you are suggesting is that the statements be parsed after they are all separated. I am not sure how this can be implemented in a parsec style parser.

You make a special proof ... qed parser that is proof...qed-aware, sort of like how nested comments are parsed, then you split it up, then run the whole tactic parser on each segmented statement. This causes the parser to commit. In principle this is doable, which is why I was suggesting it.

Kyle Miller (May 03 2023 at 10:01):

Shreyas Srinivas said:

Semicolon does this already.

Don't forget one of the design requirements: Patrick doesn't want to run around the computer lab reminding people to insert semicolons :smile:

Shreyas Srinivas (May 03 2023 at 10:03):

The fix for this is to change where the error is reported (I think elaborator). If the errors were to be reported at the beginning of the line in which an error is found instead of right after it, or just before the tactic separator rather than after, that might fix the issue as @Mario says.

Kyle Miller (May 03 2023 at 10:05):

I don't think that sort of thing is a full fix though -- I believe we want parser errors to be completely localized to individual tactics. Making changes to preceding lines in a tactic block shouldn't affect whether later lines parse correctly.

Shreyas Srinivas (May 03 2023 at 10:06):

I have never tried to mix two different kinds of parsers before. It sounds very risky. Maybe the parsec experts can shed more light on this.

Kyle Miller (May 03 2023 at 10:08):

I'm not sure what the issue is here -- you can definitely do this with a Parsec-style parser.

Kyle Miller (May 03 2023 at 10:10):

If you're worried about memoization in the packrat parser, that's not an issue since this would be getting it to commit to a particular high-level parse.

Shreyas Srinivas (May 03 2023 at 10:11):

Kyle Miller said:

If you're worried about memoization in the packrat parser, that's not an issue since this would be getting it to commit to a particular high-level parse.

how would you do the high level parse with a recursive descent parser without going to lower levels?

Shreyas Srinivas (May 03 2023 at 10:13):

would it look like "parse until you meet this character and call it a tactic"? what happens if this character appears unstructurally inside the tactic line (say inside some string literal or part of some other operator). More generally, how does the demarcation of where to use the high level parser and where to use the low level parser work?

Shreyas Srinivas (May 03 2023 at 10:27):

When does the high level parser know that it has to call the low level parser?
Either

  1. It doesn't run inside the tactic blocks because that's the lower level parser's job. It can only tell apart these blocks by lexemes.
  2. It does parse the tactic block just to identify it and then calls the lower level parser for tactic blocks on this same block. So it wastes effort and doesn't solve our problem because it misidentifies the end of the block, which is the problem we started with.

Shreyas Srinivas (May 03 2023 at 10:46):

Further, if as Patrick says it is plausible for a list of arguments to exist in the non-simplified version of his example, then how would you detect the end of a tactic without explicit separation characters like the semicolon if you are not looking inside the tactic lines anyway.

Mario Carneiro (May 03 2023 at 10:50):

Kyle Miller said:

Consider this:

example (n : Nat) : 2 = 2 := by
  cases (
  exact rfl

I want an error to be reported right after the (. It should be "unexpected end of tactic; expected ')', ',' or ':'". Right now, if this is at the end of a file, it reports the error after rfl and says "unexpected end of input; expected ')', ',' or ':'".

AFAIK this is a deliberate design choice. When you open a parenthesis it clears all the state regarding whitespace sensitivity so that it is legal to write e.g.

example (n : Nat) : 2 = 2 := by
  exact (by
 exact rfl
  )

Mario Carneiro (May 03 2023 at 10:51):

you don't know that the exact rfl is part of the outer by block in your example, and the ( is a strong signal that it isn't

Mario Carneiro (May 03 2023 at 10:52):

and for examples like the unindented classical it really isn't

Mario Carneiro (May 03 2023 at 10:52):

so this behavior can only be considered as a fallback behavior, because it will give the wrong answers for successful parses

Alex Keizer (May 03 2023 at 10:55):

Kyle Miller said:

Shreyas Srinivas said:

Semicolon does this already.

Don't forget one of the design requirements: Patrick doesn't want to run around the computer lab reminding people to insert semicolons :smile:

So why don't we just make the semicolons mandatory? Then the compiler will do the job of reminding people to insert semicolons.

This discussion reminds me a bit of javascript, which does automatic insertion of semicolons in some cases, but the consensus seems to be that the errors are just too confusing when it guesses wrong, so people advocate for just always using a semicolon to end a statement. JS can't go back to mandating semicolons for backwards compatibility reasons, but we still can. And I'd argue that the benefit of clearer errors far outweighs the extra effort/noise in having to write some explicit tactic terminator (whether that is a comma or a semicolon).

Shreyas Srinivas (May 03 2023 at 11:00):

Alex Keizer said:

JS can't go back to mandating semicolons for backwards compatibility reasons

At this point we would also have "backward compatibility" problems since mathlib4 already has some thousand odd files, and then there is Aesop, Std, and all the docs and tutorial materials.

Shreyas Srinivas (May 03 2023 at 11:01):

Adding a compiler warning might be a nice tradeoff, if the respective CIs can be configured to ignore it

Mario Carneiro (May 03 2023 at 11:04):

Shreyas Srinivas said:

Alex Keizer said:

JS can't go back to mandating semicolons for backwards compatibility reasons

At this point we would also have "backward compatibility" problems since mathlib4 already has some thousand odd files, and then there is Aesop, Std, and all the docs and tutorial materials.

Nope, those are quite explicitly not blockers to syntax changes. We promised as much to the devs. We will need to make conversion scripts or something if it is a really invasive change, but I don't think this is a major concern.

Mario Carneiro (May 03 2023 at 11:06):

(that is not to say that I agree with a syntax change in this area)

Mario Carneiro (May 03 2023 at 11:07):

I would definitely prefer this to be limited to an error reporting change, or a change to the parsing of expressions that already contain a syntax error. I think that for completed code the current syntax of lean 4 is quite good

Mario Carneiro (May 03 2023 at 11:12):

Kyle's example is difficult because

example (n : Nat) : 2 = 2 := by
  cases (
  exact rfl
  )

is actually correct code (well it's not a syntax error at least). So you need to make it all the way to the end, get a parse error somewhere, then backtrack to the ( and auto-close it, report a syntax error there, and then continue parsing the rest of the block

Shreyas Srinivas (May 03 2023 at 11:16):

Mario Carneiro said:

I would definitely prefer this to be limited to an error reporting change, or a change to the parsing of expressions that already contain a syntax error. I think that for completed code the current syntax of lean 4 is quite good

Does this mean that you support keeping the current by syntax for tactic blocks, at least as one way to write tactics?

Mario Carneiro (May 03 2023 at 11:21):

yes I haven't seen any issue where by is the root cause rather than one component in an issue caused by the interaction of several features

Alex Keizer (May 03 2023 at 11:24):

Of course, we should keep existing code in mind and not break it needlessly. We could introduce an autoSemicolon option to control this behaviour. Then existing code can opt-out of strict semicolons, while new projects can benefit from it. Or, we keep the default as-is, but teaching projects can opt-in to the stricter syntax so that students can avoid the confusing errors.

Max Nowak 🐉 (May 03 2023 at 11:25):

From a pure "code aesthetics" perspective, I'd find a python-like approach (i.e. without semicolons) better, since it's less clutter. We already have whitespace-sensitivity, so let's go with that. Also if I read exact h x; and don't know much about Lean syntax, my eyes might parse it as exact (h) (x;) which is obviously wrong. But I don't know enough nitty-gritty details to weigh in more than this.

Alex Keizer (May 03 2023 at 11:27):

I'll freely admit that I'm more of a fan of explicit delimiters and having a formatter figure out indentation for me. But this is of course mostly a matter of taste / whatever someone is used to.

Johan Commelin (May 03 2023 at 11:31):

Mario Carneiro said:

Kyle's example is difficult because

example (n : Nat) : 2 = 2 := by
  cases (
  exact rfl
  )

is actually correct code (well it's not a syntax error at least). So you need to make it all the way to the end, get a parse error somewhere, then backtrack to the ( and auto-close it, report a syntax error there, and then continue parsing the rest of the block

My understanding of Kyle's proposal is that

example (n : Nat) : 2 = 2 :=
proof
  cases (
  exact rfl
  )
qed

should be incorrect code.

But maybe I completely misunderstood...

Kyle Miller (May 03 2023 at 11:36):

Yes, it'd be incorrect code because it would parse roughly as if it were

example (n : Nat) : 2 = 2 :=
by
  focus { cases ( }
  focus { exact rfl }
  focus { ) }

where again focus { ... } means focus ... rather than using { ... }.

If you want the other parse, I'm suggesting that you have to write

example (n : Nat) : 2 = 2 :=
proof
  cases (
    exact rfl
    )
qed

instead, where indentation is what is used to segment into statements.

Mario Carneiro (May 03 2023 at 11:37):

yes, I would be opposed to that. I want to be able to turn off whitespace sensitivity locally by some syntax, because it might not always be reasonable to just keep indenting

Kyle Miller (May 03 2023 at 11:38):

(Not a response to mario) I'm suggesting something that's somewhat different from Python: it would not be aware of ( and ), and it would rely on indentation. This seems like a reasonable compromise since in Python you use f(x,y,z) notation for functions rather than f x y z, so it avoids needing end-of-line \'s.

Mario Carneiro (May 03 2023 at 11:40):

it avoids needing end-of-line \'s.

I'm not sure it does, because in your setup the line reigns supreme regarding how the sequence parses, so if you need to indicate that two lines are to be treated as one...?

Kyle Miller (May 03 2023 at 11:42):

Use more identation for continuation lines. It's not perfect, but for the purpose of making a sane environment for teaching, it seems easy to explain and understand.

Kyle Miller (May 03 2023 at 11:42):

I'm not suggesting that this design ever make it into mathlib code, to be clear.

Mario Carneiro (May 03 2023 at 11:42):

for example, let's say you want to refer to an expression containing some long identifiers and stay within line limits. In that case you might use parentheses and newlines, and quite possibly break indentation

Mario Carneiro (May 03 2023 at 11:43):

I know there are some examples of string literals that break indentation in mathlib

Kyle Miller (May 03 2023 at 11:43):

Then, sure, there could be \'s to help with that. Not every such problem can be solved with parentheses, so these would be needed at some point to solve that problem.

Kyle Miller (May 03 2023 at 11:45):

(Re string literals: I wish we had something like the Haskell-style string continuation syntax:

    "I am a\
     \ string."
-->
    "I am a string."

All the whitespace between \ and \ is deleted.)

Mario Carneiro (May 03 2023 at 11:45):

yeah this seems like an oversight

Johan Commelin (May 03 2023 at 11:45):

I'm very much in favour of creating a fool-proof proof-environment for teaching purposes. It doesn't need to be power-user-friendly.

Mario Carneiro (May 03 2023 at 11:48):

I agree that Kyle's proposal would work well in that setting, since complex tactics generally don't arise

Mario Carneiro (May 03 2023 at 11:48):

I don't know whether it is possible to literally segment the text and give fragments to the inner parsers so that they don't bleed into each other

Shreyas Srinivas (May 03 2023 at 11:50):

Mario Carneiro said:

I don't know whether it is possible to literally segment the text and give fragments to the inner parsers so that they don't bleed into each other

How would it be done reliably?

Shreyas Srinivas (May 03 2023 at 11:53):

If you have to do it with raw text then how do you distinguish real segmenting characters from spurious occurrences. If you do use the outer parser, and then all the issues in the outer-parser could be a problem (i.e. bleed into the inner parser) in the segmentation.

Mario Carneiro (May 03 2023 at 11:54):

Ah, here's what you could do:

  • First, pre-parse the text using the outer parser (line / indent segmentation) to get regions for the inner parser
  • Then use the inner parser (tactic item parser) to parse the text
    • If it returns a parse error, but the position of the error is outside its region, adjust the location of the error to the end of the region
    • If it succeeds but the end position is outside the region, report an error at the end of the region
    • If it succeeds inside the region, proceed

Shreyas Srinivas (May 03 2023 at 11:54):

Shreyas Srinivas said:

When does the high level parser know that it has to call the low level parser?
Either

  1. It doesn't run inside the tactic blocks because that's the lower level parser's job. It can only tell apart these blocks by lexemes.
  2. It does parse the tactic block just to identify it and then calls the lower level parser for tactic blocks on this same block. So it wastes effort and doesn't solve our problem because it misidentifies the end of the block, which is the problem we started with.

you mean point 2 here?

Mario Carneiro (May 03 2023 at 11:55):

The bad part is that you get subpar error messages if you have to stop the parse early. If you could modify the input text then you could call the inner parser where the end of the region is actually EOF

Mario Carneiro (May 03 2023 at 11:56):

The outer parser is not calling the inner parser during the pre-parse stage. It is just looking at lines and indentation

Mario Carneiro (May 03 2023 at 11:57):

it can't rely on the inner parser because the line could have a typo like assumption h

Shreyas Srinivas (May 03 2023 at 11:57):

Mario Carneiro said:

The outer parser is not calling the inner parser during the pre-parse stage. It is just looking at lines and indentation

how does this outer parser / inner parser thing work when you have nested proof blocks? such as Kyle's classical block?

Kyle Miller (May 03 2023 at 11:57):

Simple algorithm that works even without Lean having a lexer:

  1. The proof keyword triggers the proof block parser. For the following, assume comments count as whitespace.
  2. Ensure that there is only whitespace for the rest of the line after proof.
  3. Read lines one at a time, ensuring that they all have at least as much indentation as proof. Use indentation and \'s to combine lines into statements. Rule: all succesive lines with more indentation than the current line get combined into the statement. If any of these lines start with the keyword qed, then fail.
  4. If a line starts with the proof keyword, then recurse.
  5. Parse a qed and make sure it has the same indentation level as proof.

Now that these are segmented, give the statement text (and only the statement text) to the tactic parser. Maybe Mario's tricks might be needed, but ideally the errors can be reported as-is, or at least their positions can be translated to a position in the original source file.

Mario Carneiro (May 03 2023 at 11:59):

If any of these lines start with the keyword qed, then fail.

Not sure about this one, it means you can't have nested proof blocks at all. I would rather say that the qed has to have the same indent as the proof

Kyle Miller (May 03 2023 at 12:01):

This is meant to be covered by "recurse" in step 4. Step 5 is to find that qed. Maybe my steps aren't perfectly accurate as written, but it sounds like you know what I'm trying to say.

Shreyas Srinivas (May 03 2023 at 12:01):

Kyle Miller said:

Simple algorithm that works even without Lean having a lexer:

  1. The proof keyword triggers the proof block parser. For the following, assume comments count as whitespace.

So if I write a comment outside the proof block that contains the word proof, I am doomed.

  1. Ensure that there is only whitespace for the rest of the line after proof.

A single careless whitespace by a student wold trip them up and they would have trouble finding it. Further how would this work with nested statements and haves and let`s

  1. Read lines one at a time, ensuring that they all have at least as much indentation as proof. Use indentation and \'s to combine lines into statements. Rule: all succesive lines with more indentation than the current line get combined into the statement. If any of these lines start with the keyword qed, then fail.

What @Mario Carneiro says, and proof blocks.

  1. Parse a qed and make sure it has th esame indentation level as proof.
  2. Now that these are segmented, give the statement text (and only the statement text) to the tactic parser. Maybe Mario's tricks might be needed, but ideally the errors can be reported as-is, or at least their positions can be translated to a position in the original source file.

Mario Carneiro (May 03 2023 at 12:02):

No I mean in step 3 you will see an inner qed and fail while segmenting the outer proof

Mario Carneiro (May 03 2023 at 12:03):

I don't think it is necessary to reinvent the indentation parsing, and TBH it would go against lean's architecture a bit. It is easier just to let the lexer do its work

Mario Carneiro (May 03 2023 at 12:04):

the only tricky part, as I mentioned, is that you have to actually modify the input text when calling the inner parser, because nothing short of EOF will stop these parsers for sure

Kyle Miller (May 03 2023 at 12:04):

@Shreyas Srinivas What's your high-level argument here? I have to admit I'm baffled why you seem to be so opposed to this design. The only thing that I can really think of is that you think by ... done and requiring semicolons should be sufficient, so proving that this design could never work is what you think you need to do.

Mario Carneiro (May 03 2023 at 12:05):

also if you do just pull out a bit of text then the reported positions will be wrong, although that seems fixable

Shreyas Srinivas (May 03 2023 at 12:06):

Kyle Miller said:

Shreyas Srinivas What's your high-level argument here? I have to admit I'm baffled why you seem to be so opposed to this design. The only thing that I can really think of is that you think by ... done and requiring semicolons should be sufficient, so proving that this design could never work is what you think you need to do.

I am opposed because switching between two different parsers based on segmentation is a highly error prone way to do things. Especially when you start including nested proof blocks, statements, some term language stuff and so on, so you have to keep track of state and switch outer-parser -> inner-parser -> outer-parser -> inner-parser and so on. And since this would be in the parser, there could be a huge number of downstream errors this causes.

Mario Carneiro (May 03 2023 at 12:09):

Kyle's description should probably be paired with mine, there is no switching of parsers here. He described the behavior of the outer pre-parser (my step 1) and I described how it interacts with the inner parser

Mario Carneiro (May 03 2023 at 12:09):

Actually I would leave off Kyle's step 4, the pre-parser isn't doing any recursion

Mario Carneiro (May 03 2023 at 12:10):

the recursion would happen when the inner parser is called on a block that happens to call into the proof parser

Kyle Miller (May 03 2023 at 12:12):

I think there are two possible implementations here. I was thinking the proof parser could keep track of a tree of proof ... qed blocks, so that way the qed is allowed to violate the indentation continuation rule, but I don't really care about what the specific implementation is so long as you get localized errors.

Johan Commelin (May 03 2023 at 12:12):

@Shreyas Srinivas But Lean is designed to have all sorts of DSL's. I don't see why this is so different...

Shreyas Srinivas (May 03 2023 at 12:15):

Johan Commelin said:

Shreyas Srinivas But Lean is designed to have all sorts of DSL's. I don't see why this is so different...

As I understand it, Kyle proposes to change the parser and not write a DSL on top of the existing language

Kyle Miller (May 03 2023 at 12:15):

Shreyas Srinivas said:

So if I write a comment outside the proof block that contains the word proof, I am doomed.

No, this is impossible with the design I've been describing.

switching between two different parsers based on segmentation is a highly error prone way to do things

That may be your opinion, but I'm not sure what to say, this is roughly how Python works. :shrug:

Shreyas Srinivas (May 03 2023 at 12:21):

Kyle Miller said:

switching between two different parsers based on segmentation is a highly error prone way to do things

That may be your opinion, but I'm not sure what to say, this is roughly how Python works. :shrug:

I don't get that from here : https://devguide.python.org/internals/parser/
Python's syntax is defined by its grammar and the parser generator generates the parser. So if you propose a syntax change, it would have to be in the grammar, not in the parser internals.

Mario Carneiro (May 03 2023 at 12:31):

this is not a change to "parser internals"

Mario Carneiro (May 03 2023 at 12:31):

this is just a fancy plugin parser like Johan says

Shreyas Srinivas (May 03 2023 at 12:45):

okay... then does it in anyway change the validity of existing proofs in the current syntax? Will the current syntax remain usable? If it is a parser plug in, I can imagine this plugin being triggered only if the relevant identifiers are used, while everything else that worked before works as it is.

Mario Carneiro (May 03 2023 at 13:35):

Some tweaks still necessary, but here's a prototype:

import Lean

namespace Lean.Parser.Term
open PrettyPrinter

partial def proofQedFn (p : ParserFn) : ParserFn := fun c s =>
  let start := s.pos
  let proofCol := (c.fileMap.toPosition start).column
  let iniSz := s.stackSize
  let s := symbolFn "proof" c s
  let prev := s.stxStack.back
  if !checkTailLinebreak prev then s.mkErrorAt "newline" start else
  let line, bodyCol := c.fileMap.toPosition s.pos
  let s := if bodyCol > proofCol then
    let rec readLines s line :=
      let iniSz  := s.stackSize
      let iniPos := s.pos
      let rec readLine s line :=
        let s := tokenFn [] c s
        if s.hasError then s else
        let pos := c.fileMap.toPosition s.pos
        if pos.line  line && pos.column  bodyCol then s else
        readLine s pos.line
      let s := readLine s line
      if s.hasError then s else
      let endPos := s.pos
      let s := s.restore iniSz iniPos
      let s := adaptUncacheableContextFn
        (fun c => { c with input := c.input.extract iniPos endPos }) p c { s with pos := 0 }
      let s := { s with pos := s.pos + iniPos }
      if let some e := s.errorMsg then
        if e.unexpected == "unexpected end of input" then
          { s with errorMsg := some { e with unexpected := "unexpected newline" } }
        else
          s
      else
        if s.pos  endPos then s.mkErrorAt "newline" s.pos else
        let pos := c.fileMap.toPosition s.pos
        if pos.column < bodyCol then s else
        readLines s pos.line
    readLines s line
  else
    s
  if s.hasError then s else
  let s := s.mkNode nullKind (iniSz + 1)
  let pos := c.fileMap.toPosition s.pos
  let s := symbolFn "qed" c s
  if s.hasError then s else
  if pos.column  proofCol then s.mkErrorAt "the 'qed' should line up with 'proof'" s.pos else
  s.mkNode `Lean.Parser.Term.proofQed iniSz

def proofQedG (p : Parser) : Parser where
  info := nodeInfo `Lean.Parser.Term.proofQed <|
    andthenInfo (symbol "proof").info (andthenInfo p.info (symbol "qed").info)
  fn := proofQedFn p.fn

@[combinator_formatter proofQedG] def proofQedG.formatter (p : Formatter) : Formatter := p
@[combinator_parenthesizer proofQedG] def proofQedG.parenthesizer (p : Parenthesizer) : Parenthesizer := p

@[term_parser] def proofQed : Parser := proofQedG tacticParser

end Parser.Term

open Elab Term TSyntax.Compat
@[term_elab Parser.Term.proofQed] def elabProofQed : TermElab := fun stx expectedType? => do
  let tk := stx[0]
  let qed' := stx[2]
  let tacs : Array Syntax := stx[1].getArgs
  elabTerm ( `(by%$tk {%$tk $tacs* }%$qed')) expectedType?

example : True :=
proof
  skip
  trivial
qed

Mario Carneiro (May 03 2023 at 13:39):

Shreyas Srinivas said:

okay... then does it in anyway change the validity of existing proofs in the current syntax? Will the current syntax remain usable? If it is a parser plug in, I can imagine this plugin being triggered only if the relevant identifiers are used, while everything else that worked before works as it is.

It's just a file that you use to get the syntax, like any other. The only effect of this parser on unrelated code is that proof and qed are now keywords

Shreyas Srinivas (May 03 2023 at 16:29):

Mario Carneiro said:

Some tweaks still necessary, but here's a prototype:

Should we use this thread to debug this code? I have a simple example where something strange happens

Shreyas Srinivas (May 03 2023 at 16:33):

For now I will post the example here. It can be moved later to another thread if required. This is the corrected version of Patrick's original example

theorem T1 : 1 + 1 = 2 := by
  calc
    1 + 1 = 1 + 1 := by rfl;
    _ = 2 := by rfl;
  done;
-- No errors above. Also works without any semicolons

theorem T2 : 1 + 1 = 2 :=
proof
  calc
    1 + 1 = 1 + 1 := by rfl
    _ = 2 := by rfl -- error : expected '{' or indented tactic sequence
qed

Shreyas Srinivas (May 03 2023 at 17:06):

A small variation:

example : 1 + 1 = 2 :=
proof
  calc
    1 + 1 = 1 + 1 := by rfl
    _ = 2 :=
    proof
      rfl -- error : expected 'qed'
    qed
qed

Notification Bot (May 04 2023 at 00:33):

Ruben Van de Velde has marked this topic as resolved.

Sebastian Ullrich (Aug 08 2023 at 13:44):

I have proposed a UI fix for the specific "incompleteness of current declaration reported at next declaration" issue at lean4#2393

Sebastian Ullrich (Aug 08 2023 at 13:44):

Though now I see that this thread has been marked as resolved so I guess there was no need!

Shreyas Srinivas (Aug 08 2023 at 13:51):

Sebastian Ullrich said:

Though now I see that this thread has been marked as resolved so I guess there was no need!

I don't think it was quite resolved.

Shreyas Srinivas (Aug 08 2023 at 13:53):

it will include all preceding whitespace in the error message range as the expected token could be inserted at any of these places to fix the error.

What happens if there is no whitespace, for instance in the example for f3, def f3 :=def f4?

Sebastian Ullrich (Aug 08 2023 at 13:56):

What is your concern about this case? I don't see it being problematic even now

Shreyas Srinivas (Aug 08 2023 at 13:56):

Sebastian Ullrich said:

What is your concern about this case? I don't see it being problematic even now

Not concern. Trying to understand how the error highlighting will work

Shreyas Srinivas (Aug 08 2023 at 13:57):

specifically where the red squiggly line will appear in the event of no whitespace.

Sebastian Ullrich (Aug 08 2023 at 13:59):

If there is no whitespace then logically there is no change

Shreyas Srinivas (Aug 08 2023 at 14:02):

okay. thanks :)

Sebastian Ullrich (Sep 06 2023 at 09:19):

Sebastian Ullrich said:

I have proposed a UI fix for the specific "incompleteness of current declaration reported at next declaration" issue at lean4#2393

If you care about this issue, it would be great if you could give the finished PR a try: https://github.com/leanprover/lean4/pull/2393#issuecomment-1706058179
leanprover/lean4-pr-releases:pr-release-2393

Notification Bot (Sep 06 2023 at 09:19):

Sebastian Ullrich has marked this topic as unresolved.

Patrick Massot (Sep 08 2023 at 15:36):

When I tried leanprover/lean4-pr-releases:pr-release-2393, the first test I did was

theorem foo (h : P) : P := by



theorem bar (h : P) : P := h

which seems to work as expected: it puts red squiggles on all blank lines between the theorems. But if I move to

theorem foo (h : P) : P := by
  show P


theorem bar (h : P) : P := h

then only th by gets squiggled. Is that a bug @Sebastian Ullrich?

Sebastian Ullrich (Sep 08 2023 at 15:43):

Yes That is not covered by the PR, because this is a syntactically valid tactic proof. "Where exactly should 'unsolved goals' be displayed" is strictly speaking a separate issue.

Sebastian Ullrich (Sep 08 2023 at 15:44):

So let's solve the parser one first and continue from there

Patrick Massot (Sep 08 2023 at 15:45):

I'm sorry, I don't understand this answer. I asked "Is that a bug?" and you answer Yes but then the following explanation seems to say this is intended behavior.

Patrick Massot (Sep 08 2023 at 15:47):

I need to go teaching (teaching Lean 4 of course!).

Sebastian Ullrich (Sep 08 2023 at 15:49):

Then I wouldn't call it an issue! It was intentional at one point but I can see that it's not optimal, so let's continue that discussion in a separate thread/GH issue

Sebastian Ullrich (Sep 08 2023 at 15:50):

I did misread your question though, oops!

Sven Nilsen (Sep 08 2023 at 16:25):

(deleted)

Patrick Massot (Sep 08 2023 at 22:18):

I feel like I no longer understand what is the topic of this discussion :sad:

Sebastian Ullrich (Sep 11 2023 at 16:00):

@Patrick Massot Sorry for the confusion. I see now that various related issues have been discussed in this thread that will need to be addressed in the implementation in different ways. My PR aims to address the specific issue of the error message of a syntactically incomplete proof (or other declaration) only being shown at the subsequent declaration, i.e. the main issue of lean4#1971

Patrick Massot (Sep 11 2023 at 17:51):

Thanks Sebastian. It is indeed already a nice improvement.

Mac Malone (Sep 11 2023 at 18:16):

@Patrick Massot Your specific example of

example (P : Prop) (h : P) : P := by
  assumption P

could be resolved by simply checking for a linebreak after or at the end of a tacticSeq1Indented. To demonstrate:

import Lean.Parser.Extra

@[run_parser_attribute_hooks] def byEnd := Lean.Parser.checkLinebreakBefore
  "line break; `by` block is finished"

macro "by'" ts:sepBy1IndentSemicolon(tactic) byEnd : term =>
  `(by $(⟨ts.raw.getArgs⟩)*)

example (P : Prop) (h : P) : P := by'
  assumption P -- expected line break; `by` block is finished

Mac Malone (Sep 11 2023 at 18:22):

Here is a complete adaption of the by syntax:

import Lean.Parser.Tactic

open Lean Parser

def tacticSeqEnd := checkLinebreakBefore
  "line break; tactic sequence is over; did you mean to separate tactics with a `;`?"

open Tactic in
def tacticSeqIndentGt := withAntiquot (mkAntiquot "tacticSeq" ``tacticSeq) <| node ``tacticSeq <|
  tacticSeqBracketed <|> (checkColGt "indented tactic sequence" >> tacticSeq1Indented >> tacticSeqEnd)

macro:lead ppAllowUngrouped "by' " ts:tacticSeqIndentGt : term =>
  `(by $(⟨ts.raw.getArgs⟩)*)

example (P : Prop) (h : P) : P := by'
  assumption P
          -- ^ expected line break; tactic sequence is over; did you mean to separate tactics with a `;`?

Patrick Massot (Sep 11 2023 at 18:24):

Indeed this looks nice in this example. It remains to test using it on many examples.

Patrick Massot (Sep 11 2023 at 18:24):

For casual readers: this is all about a different kind of failure, specifically the first example in https://github.com/leanprover/lean4/issues/1971#issuecomment-1532094266

Sebastian Ullrich (Sep 11 2023 at 18:25):

Unfortunately I don't think this works in practice as (by assumption) is a perfectly fine subterm

Patrick Massot (Sep 11 2023 at 18:25):

Indeed.

Mac Malone (Sep 11 2023 at 18:26):

Oh, good point. This is why Patrick is correct that many examples would be needed to test this.

Mac Malone (Sep 11 2023 at 18:28):

However, I do think some like this is possible to give a cleaner error.

Mac Malone (Sep 11 2023 at 18:33):

A quick fix for the ) case would be:

import Lean.Parser.Tactic

open Lean Parser

def tacticSeqEnd := lookahead ")" <|> checkLinebreakBefore

open Tactic in
def tacticSeqIndentGt := withAntiquot (mkAntiquot "tacticSeq" ``tacticSeq) <| node ``tacticSeq <|
  tacticSeqBracketed <|> (checkColGt "indented tactic sequence" >> tacticSeq1Indented >> tacticSeqEnd)

macro:lead ppAllowUngrouped "by' " ts:tacticSeqIndentGt : term =>
  `(by $(⟨ts.raw.getArgs⟩)*)

example (P : Prop) (h : P) : P := (by' assumption) -- works

example (P : Prop) (h : P) : P := by'
  assumption P
          -- ^ expected ')' or line break

Mac Malone (Sep 11 2023 at 18:33):

I do not believe there is a builtin parser combinator to overwrite the tacticSeqEnd error with something smarter, but it could be easily written.

Mac Malone (Sep 11 2023 at 18:41):

Like this:

import Lean.Parser.Tactic

open Lean Parser

def withErrorFn (errorMsg : String) (p : ParserFn) : ParserFn := fun c s =>
  if p c s |>.hasError then s.mkUnexpectedError errorMsg else s

def withError (errorMsg : String) : Parser  Parser :=
  withFn <| withErrorFn errorMsg

def finisher :=
  "⟩" <|>  ")"  <|> "}" <|> "]" <|> "," <|> checkLinebreakBefore

def tacticSeqEnd := lookahead finisher |>
  withError "tactic sequence is over; did you mean to separate tactics with a `;`?"

open Tactic in
def tacticSeqIndentGt := withAntiquot (mkAntiquot "tacticSeq" ``tacticSeq) <| node ``tacticSeq <|
  tacticSeqBracketed <|> (checkColGt "indented tactic sequence" >> tacticSeq1Indented >> tacticSeqEnd)

macro:lead ppAllowUngrouped "by' " ts:tacticSeqIndentGt : term =>
  `(by $(⟨ts.raw.getArgs⟩)*)

example (P : Prop) (h : P) : P := (by' assumption) -- works

example (P : Prop) (h : P) : List (PLift P) :=
  [.up <| by' assumption, .up <| by' assumption] -- works

example (P : Prop) (h : P) : PProd P P :=
  by' assumption, by' assumption -- works

example (P : Prop) (h : P) : PProd P P :=
  {fst := by' assumption, snd := by' assumption} -- works

example (P : Prop) (h : P) : P :=  (by' assumption P)
                                                -- ^ tactic sequence is over...

example (P : Prop) (h : P) : P := by'
  assumption P
          -- ^ tactic sequence is over; did you mean to separate tactics with a `;`?

The parentheses are probably enough since any other finisher can be avoided by wrapping the by in parentheses, but this is more convenient.

Mac Malone (Sep 11 2023 at 19:05):

Another more complex but general solution would be to have some kind of "conditional error" that sets the error message that will be thrown if the next parser encounters an error.

Mac Malone (Sep 11 2023 at 19:12):

It could also be solved by changing tacticSeq1Indent to behave differently based on whether the by state covers more than one line.

Sebastian Ullrich (Sep 12 2023 at 09:57):

Patrick Massot said:

example (P : Prop) (h : P) : P := by
  assumption P

Say I mixed up exact and assumption. Guess the error message? Of course it is expected command, and I understand why. The proof is over after assumption and Lean tries to process whatever comes after this proof.

Note that now that the PR is merged, the error message changed to "unexpected identifier; expected command". I would argue this is at least a slight improvement: it first complains about something unexpected, i.e. hints that maybe you have simply given too much input.

Sebastian Ullrich (Sep 12 2023 at 12:43):

Patrick Massot said:

Are we really really sure we don't want to bring back the good old battle-tested begin/end? I just spent at least 10 minutes fighting a more sophisticated version of

example : 1 + 1 = 2 := by
  calc 1 + 1 = 1 + 1 := by rfl
  _ = 2 := by rfl [h]

example (P : Prop) (h : P) : P := by
  exact h

What do you expect as an error message here? The answer is an error on line 5 saying expected ':='. In my real example I used a tactic that was not rfl and giving it a list of arguments was plausible.

lean4#2532 fixes the error position. The message now is "no worse" than with assumption P above


Last updated: Dec 20 2023 at 11:08 UTC