Zulip Chat Archive
Stream: lean4
Topic: infinite parsing loops
Eric Paul (Jan 05 2026 at 08:24):
Adding the following syntax declaration leads to an infinite loop in parsing:
syntax "test",* : command
a
As does using many instead of sepBy.
The reason is that the added syntax declaration is the best parser to run. So it runs it and then creates an error. But since it is optional to have a "test", it restores the state (i.e. removes the error and resets the position). Therefore, the result of attempting to parse a command is the same starting position with no error message. And so it loops.
There is already a check in parseCommand to prevent looping by consuming input if the position hasn't changed and there is an error. It seems to me that a reasonable fix would be to move this check to also occur in the case where this no error.
Eric Paul (Jan 05 2026 at 22:18):
Essentially the same issue can cause an infinite loop in parsing in other syntax categories, meaning that the fix I'd mentioned above is not actually sufficient. (Really the change is not that it is in a different syntax category but that the infinite loop happens in a different place.)
For example, the following also gets stuck in an infinite loop:
declare_syntax_cat testing
syntax testing "test",* : testing
syntax "lead" : testing
syntax "[[" testing "]]" : command
[[ lead ]]
The infinite loop in this example is inside of trailingLoop. The trailing parser involving sepBy is chosen as it is the best choice. As before, it errors and then restores the state. Since there is no longer an error, trailingLoop is called again and repeats.
Perhaps the trailingLoop should check to see if the successful parse did not consume input and if so, exit the loop.
Robin Arnez (Jan 06 2026 at 15:35):
Maybe it would also be reasonable to reject these syntax declaration, i.e. error that "test",* accepts the empty string?
Robin Arnez (Jan 06 2026 at 15:35):
After all, "test",+ works and does what you expect in both examples
Eric Paul (Jan 08 2026 at 00:37):
That makes sense. I worry that it might be hard to determine if a syntax declaration accepts the empty string in the right sense. For example, syntax "test",* lookahead("hi") : command will infinitely loop on the input hi and technically does not accept the empty string.
I also personally like the idea that anything I can express with syntax is a valid addition.
Last updated: Feb 28 2026 at 14:05 UTC