Zulip Chat Archive
Stream: lean4
Topic: Parser for digits and uppercase letters
Damiano Testa (Aug 28 2024 at 20:32):
This question is in the context of Stacks Project tags.
It is also my first attempt at writing a parser and something has not gone the way it should. Below is what I have, but I am clearly missing something... but I do not know what! What should I do to make the code below work?
The intention is to get a parser that accepts any sequence of consecutive digits or uppercase letters.
import Lean.Elab.Command
open Lean Parser
/-- `stacksTag` is the node kind of Stacks Project Tags: a sequence of digits and
uppercase letters. -/
abbrev stacksTagKind : SyntaxNodeKind := `stacksTag
def stacksTagFn : ParserFn := fun c s =>
let i := s.pos
let st := takeWhileFn (fun c => c.isDigit || c.isUpper) c s
mkNodeToken stacksTagKind i c st
def stacksTagNoAntiquot : Parser := {
fn := stacksTagFn
info := mkAtomicInfo "stacksTag"
}
-- the commented attribute was present in the code that I pasted
--@[run_builtin_parser_attribute_hooks]
def stacksTag : Parser :=
withAntiquot (mkAntiquot "stacksTag" stacksTagKind) stacksTagNoAntiquot
-- this "works" in the sense that the parsing does what I expect it to
run_cmd
let stx ← `(stacksTag| 50B11D1S)
dbg_trace stx
-- however, I must be missing some sort of "initialization"
/-
don't know how to generate formatter for non-definition '{ info := mkAtomicInfo "stacksTag", fn := stacksTagFn }'
-/
/-- The `stacks` attribute. -/
syntax (name := stacks) "stacks " stacksTag (ppSpace str)? : attr
Damiano Testa (Aug 28 2024 at 20:37):
If you feel like educating me, I would also welcome any comment explaining what some of the functions defined above do. They are there, since they were the pieces that made ident
work that I took from core and I do not really understand how they fit together, or whether they are necessary.
Matthew Ballard (Aug 28 2024 at 20:58):
open Lean.Parser
namespace Lean.PrettyPrinter.Formatter
@[combinator_formatter stacksTagNoAntiquot] def stacksTagNoAntiquot.formatter :=
Lean.PrettyPrinter.Formatter.visitAtom stacksTagKind
end Lean.PrettyPrinter.Formatter
namespace Lean.PrettyPrinter.Parenthesizer
@[combinator_parenthesizer stacksTagNoAntiquot] def stacksTagAntiquot.parenthesizer := visitToken
end Lean.PrettyPrinter.Parenthesizer
Does this fix the error?
Damiano Testa (Aug 28 2024 at 21:02):
It appears to do, thanks!
Damiano Testa (Aug 28 2024 at 21:02):
I'll report back, in case I run into issues later on!
Matthew Ballard (Aug 28 2024 at 21:04):
I suspect anything with that attribute that is type correct silences the error. So that actual implementation (taken from scientific literals) here should be dubious
Damiano Testa (Aug 28 2024 at 21:05):
Ok, is it expected that logInfo
does not know how to print the new syntax? (Not that I would want to do that, I was just curious.)
Matthew Ballard (Aug 28 2024 at 21:17):
What monad are you in?
Damiano Testa (Aug 28 2024 at 21:17):
Where? Part of this happens in AttrM
which I think is just an abbrev for CoreM
.
Damiano Testa (Aug 28 2024 at 21:18):
Ah, the logInfo I was running it in a run_cmd
, so CommandElabM
, I guess.
Damiano Testa (Aug 28 2024 at 21:20):
Btw, while your fix works, the error messages for typing the wrong input are really harsh! You just get an "unexpected identifier", rather than a friendlier "Please, could you use digits and uppercase letters?".
Damiano Testa (Aug 28 2024 at 21:21):
(I do not think that the "bad" errors are a problem with your fix, but with me not really knowing how to implement this correctly.)
Damiano Testa (Aug 28 2024 at 21:27):
I am tempted to make the parser accept alphanum
and then let the elaborator handle the non-uppercase-letter for a better user experience.
Matthew Ballard (Aug 28 2024 at 21:37):
Damiano Testa said:
Ah, the logInfo I was running it in a
run_cmd
, soCommandElabM
, I guess.
I guess you need to elaborate the stacks tag syntax then
Matthew Ballard (Aug 28 2024 at 21:38):
Damiano Testa said:
Btw, while your fix works, the error messages for typing the wrong input are really harsh! You just get an "unexpected identifier", rather than a friendlier "Please, could you use digits and uppercase letters?".
I think it the ParserFn
is too permissive. So nothing matching is cool with it. Then moves on to looking for a string literal or closed brace
Damiano Testa (Aug 28 2024 at 21:41):
Well, the issue is that stacks 04sQ
underlines 04
in red and says unexpected token; expected ']'
.
Damiano Testa (Aug 28 2024 at 21:42):
I think that a better message is to underline 04sQ
in yellow and say "Please, use only digits and uppercase letters".
Damiano Testa (Aug 28 2024 at 21:43):
But for this, I suspect that the parser will have to accept alphanum
and then the elaborator looks at it and says, actually, not all alphanum
are good.
Damiano Testa (Aug 28 2024 at 22:00):
I went ahead and opened #16230.
Last updated: May 02 2025 at 03:31 UTC