Zulip Chat Archive

Stream: lean4

Topic: parenthesize error: unknown constant 'null'


Mario Carneiro (Aug 20 2021 at 00:41):

MWE:

import Lean
open Lean PrettyPrinter Parenthesizer

instance : MonadQuotation Id where
  getRef              := pure Syntax.missing
  withRef             := fun _ => id
  getCurrMacroScope   := pure 0
  getMainModule       := pure `_fakeMod
  withFreshMacroScope := id


declare_syntax_cat foo
syntax "_" : foo
syntax "(" foo (" | " foo)* ")" : foo

#eval show CoreM _ from do
  let stx : Syntax := do `(foo| (_ | _))
  parenthesize (categoryParser.parenthesizer `foo 0) stx

There seems to be an issue with parenthesizing the (" | " foo)* part of the notation. (The desired result is for this to do nothing to the syntax btw.)

Mac (Aug 20 2021 at 02:54):

If you inspect the output of your eval function you will get the following:

def ex : CoreM Syntax :=
  let stx : Syntax := do `(foo| (_ | _))
  parenthesize (categoryParser.parenthesizer `foo 0) stx

#print ex
/-
def ex : CoreM Syntax :=
let stx : Syntax :=
  do
    let info ← MonadRef.mkInfoFromRefPos
    let _ ← getCurrMacroScope
    let _ ← getMainModule
    pure
        (Syntax.node `«foo(_|_)»
          #[Syntax.atom info "(", Syntax.node `foo_ #[Syntax.atom info "_"],
            Syntax.node `null #[Syntax.node `null #[Syntax.atom info "|", Syntax.node `foo_ #[Syntax.atom info "_"]]],
            Syntax.atom info ")"]);
parenthesize (categoryParser.parenthesizer `foo 0) stx
-/

Mac (Aug 20 2021 at 02:55):

The problem here is that there is no parenthesizer for the `null nodes (as null is not a real syntax kind).

Mac (Aug 20 2021 at 02:58):

From experience, I would heavily advise against using parentheticals in syntax (like (" | " foo)) and instead create a separate syntax for such elements. After all, parentheticals are just null nodes instead of named nodes.

Mac (Aug 20 2021 at 03:02):

Also, is there any particular reason why you are not using sepBy1? For example, the following works:

declare_syntax_cat foo
syntax "_" : foo
syntax "(" sepBy1(foo, "|") ")" : foo

#eval show CoreM _ from do
  let stx : Syntax := do `(foo| (_ | _))
  parenthesize (categoryParser.parenthesizer `foo 0) stx

Mario Carneiro (Aug 20 2021 at 04:02):

I wasn't able to figure out the syntax for sepBy1; I'm using that now. Regarding overuse of null nodes, what are the rules for when the parenthesizer / formatter works on them? Shouldn't the parenthesizer for «foo(_|_)» decompose the null nodes directly, rather than dispatching to a (nonexistent) null parenthesizer?

Mac (Aug 20 2021 at 05:02):

Mario Carneiro said:

I'm using that now. Regarding overuse of null nodes, what are the rules for when the parenthesizer / formatter works on them? Shouldn't the parenthesizer for «foo(_|_)» decompose the null nodes directly, rather than dispatching to a (nonexistent) null parenthesizer?

I believe the answer to the form is never (unless the parent node parenthesizer special cases it). The problem is that there is no "right" behavior -- in some cases a null node should be parenthesized in other cases they shouldn't be.

I think it erroring out is intended to be a middle ground, which says essentially: "I give up, there is no well-defined way to parenthesized this expression". This is because, unless I am mistaken, the parenthesizer erroring out its not a fatal condition for the pretty printer. I assume it has some default for cases where parenthesizing fails. However, this is all just speculation -- I have not really delved into the parenthesizer code. Honestly, you probably know more than me about how it works considered you wrote this example.

Mac (Aug 20 2021 at 05:03):

Mario Carneiro said:

I wasn't able to figure out the syntax for sepBy1

Did you try go-to-declaration on it? ;) EDIT: Apparently that goes to syntax rather than sepBy1. :(

Mac (Aug 20 2021 at 05:07):

I tend to go directly to the parser source code. Which, as sepBy1 is Syntax its definition is in Lean.Parser.Syntax.

Mario Carneiro (Aug 20 2021 at 05:55):

I think it erroring out is intended to be a middle ground, which says essentially: "I give up, there is no well-defined way to parenthesized this expression". This is because, unless I am mistaken, the parenthesizer erroring out its not a fatal condition for the pretty printer. I assume it has some default for cases where parenthesizing fails. However, this is all just speculation -- I have not really delved into the parenthesizer code. Honestly, you probably know more than me about how it works considered you wrote this example.

I'm not sure whether the official conduits handle this better, but if the parenthesizer fails then my entire program halts (or rather, I recover very far away and lose a lot of output quality), so that's not an option. If the parenthesizer fails inside a subterm, it will cause the entire term to not be parenthesized because the parenthesizer itself has no error recovery for this condition.

I believe the answer to the form is never (unless the parent node parenthesizer special cases it). The problem is that there is no "right" behavior -- in some cases a null node should be parenthesized in other cases they shouldn't be.

The reason I find this argument suspicious is that if I add an extra named node there, it works:

syntax "_" : foo
syntax bar := " | " foo
syntax "(" foo bar* ")" : foo

This is not adding any additional information that would change the task of parenthesizing. Of course having a named node there helps with looking up what to do at a bar node, but the bar parenthesizer could also just be inlined into the foo parenthesizer, and the observable effect would be the same. And with the bar parenthesizer inlined, you no longer need the node to be named, because you have the context to use the correct parenthesizer now rather than looking one up.

Mario Carneiro (Aug 20 2021 at 07:22):

I think I found the bug. The many combinator interacts a bit strangely with andthen in syntax declarations:

declare_syntax_cat foo

syntax ("a" "b")* : foo
#eval println!"{(`(foo| a b a b) : Id _)}"
-- (Lean.Elab.Term.fooAB [["a" "b"] ["a" "b"]])

syntax ("c")* : foo
#eval println!"{(`(foo| c c) : Id _)}"
-- (Lean.Elab.Term.fooC ["c" "c"])

Note that it creates an extra nullKind level in the first case but not the second. This makes sense from a conserving-nodes perspective, there is no point in a null node with only one child, but it means that manyNoAntiquot.parenthesizer is incorrect for the case where the nested parser is an andthen of stuff because it needs an extra layer of visitArgs to traverse the extra layer of nodes.

Sebastian Ullrich (Aug 20 2021 at 07:37):

Yes :( . Any combinator that uses many stx should put the result in a new group if there is more than one parser. This is not done consistently yet.

Mario Carneiro (Aug 20 2021 at 07:44):

I feel like it should be possible to fix the syntax traversal to be able to handle this correctly. An arbitrary ParserDescr could either be describing a parser that returns a Syntax or a parser that returns an Array Syntax, and it seems to conflate the two at some points. I think, in the present case, you could address the problem by having many.parenthesizer set a flag in the parenthesizer state saying "we are expecting a node parser, not an array parser", and then if the flag is set you call visitArgs and reset the flag in any combinator that is supposed to produce more than one node (like andthen)

Sebastian Ullrich (Aug 20 2021 at 07:51):

No, that many even accepts a parser of arity > 1 was a hack that should be reverted. We should not conflate andthen with any other combinators.

Mario Carneiro (Aug 20 2021 at 07:57):

What's the proposed alternative? Is the arity of a parser even a static quantity (since orelse makes things complicated)?

Sebastian Ullrich (Aug 20 2021 at 07:57):

Really what we should ideally do is to track the "arity" (# of returned nodes) of a ParserDescr in its type. I've already tried this before with Parser, but it breaks the coercion from String.

Sebastian Ullrich (Aug 20 2021 at 07:58):

Sebastian Ullrich said:

Any combinator that uses many stx should put the result in a new group if there is more than one parser.

This is the simple solution

Mario Carneiro (Aug 20 2021 at 07:59):

Would it help if there was an explicit node combinator?

Sebastian Ullrich (Aug 20 2021 at 07:59):

The two sides of an orelse should definitely have the same arity. We had to fix that in our grammar before.

Mario Carneiro (Aug 20 2021 at 07:59):

i.e. (node "a" "b")*

Sebastian Ullrich (Aug 20 2021 at 08:01):

If that produces an anonymous node, is there an advantage over doing so implicitly?

Mario Carneiro (Aug 20 2021 at 08:01):

unclear

Mario Carneiro (Aug 20 2021 at 08:01):

it might solve some coercion issues

Sebastian Ullrich (Aug 20 2021 at 08:01):

Btw, wrapping each many stx in a unique kind would solve the orelse issue I mentioned on GH

Mario Carneiro (Aug 20 2021 at 08:03):

As long as you have the design such that it can support compile time known arities, you don't necessarily need it to be part of the type, it can be a function on ParserDescr with some compile time validation for things like orelse

Sebastian Ullrich (Aug 20 2021 at 08:05):

Yes, that would be nice to have for Parser. For ParserDescr I don't see an issue with tracking it in the type.

Mario Carneiro (Aug 20 2021 at 08:07):

do you care about the exact arity or just 0, 1, 2+?

Mario Carneiro (Aug 20 2021 at 08:08):

I just found this code in my syntax specification, is this violating the arity requirement?

syntax notation3Item := strLit <|> bindersItem <|> (ident (identScope)?)

Sebastian Ullrich (Aug 20 2021 at 08:12):

Should be fine since the parentheses induce a node?

Sebastian Ullrich (Aug 20 2021 at 08:15):

On top of arity (0,1,2+ might be fine, not sure if it's necessary though), we'd also like to statically know the possible returned kind(s) in case of arity=1. orelse should not be allowed to be used if the sides' kinds are not disjoint. This would eliminate the terrible backtracking the pretty printer has to do for each orelse.

Mario Carneiro (Aug 20 2021 at 08:18):

Sebastian Ullrich said:

Should be fine since the parentheses induce a node?

They don't? At least (("a" "b") ("c" "d")) produces a flat parse ["a" "b" "c" "d"]

Mario Carneiro (Aug 20 2021 at 08:19):

not the reaction I expected from the author of the syntax system :P

Sebastian Ullrich (Aug 20 2021 at 08:20):

I might not have written that exact part :grinning_face_with_smiling_eyes:

Mario Carneiro (Aug 20 2021 at 08:21):

Sebastian Ullrich said:

On top of arity (0,1,2+ might be fine, not sure if it's necessary though), we'd also like to statically know the possible returned kind(s) in case of arity=1. orelse should not be allowed to be used if the sides' kinds are not disjoint. This would eliminate the terrible backtracking the pretty printer has to do for each orelse.

That sounds tough to put in the type system

Mario Carneiro (Aug 20 2021 at 08:22):

I mean you could but I would feel bad for the poor kernel

Sebastian Ullrich (Aug 20 2021 at 08:23):

While Parser is pretty low-level, for stx we really want to make sure that there is no way to misuse it. I think this is only achievable if, on top of grouping every many stx, we also give each of them a unique kind as mentioned above. That would introduce a few unnecessary kinds, but I don't think that would be an issue in practice.

Mario Carneiro (Aug 20 2021 at 08:25):

What I mean is, you can do some of this as post hoc validation rather than dependent types

Sebastian Ullrich (Aug 20 2021 at 08:26):

sure

Mario Carneiro (Aug 20 2021 at 08:26):

for example you can do the checking in the syntax command while it's constructing the ParserDescr

Sebastian Ullrich (Aug 20 2021 at 16:07):

Mario Carneiro said:

Would it help if there was an explicit node combinator?

There is group("a" "b") btw


Last updated: Dec 20 2023 at 11:08 UTC