Zulip Chat Archive

Stream: general

Topic: precendence for do style dsls

Frederick Pu (Mar 11 2025 at 03:06):

I'm trying to make a dsl that piggy backs off of lean's do notation. I'm wondering how I can give it precendence just like do notation. Namely you can write Id.run do .... instead of Id.run (do ...). Here is the code I have so far:

import Lean

open Lean Elab Term
-- Declare syntax categories for cSeq and cItem

declare_syntax_cat cSeq

declare_syntax_cat cItem
-- Syntax for C-style integer declaration (creates a mutable variable)

syntax "int" ident "=" term ";" : cItem

syntax doSeq : cItem -- for debugging purposes only (shouldn't need this once dsl is fully defined)

syntax cItem : cSeq
syntax cItem cSeq : cSeq
syntax "c_do" cSeq : term

macro_rules
| `(c_do int $x:ident = $val; ) => `(do let mut $x := $val)
| `(c_do int $x:ident = $val; $seq:cSeq ) => `(do let mut $x := $val; c_do $seq)
| `(c_do $t:doSeq) => `(do $t)

def womp : Nat := Id.run (c_do int x = 0; return 0)

I want to be able to write Id.run c_do int x = 0; return 0

Derek Rhodes (Mar 11 2025 at 03:15):

If you're willing to use notation, then I think this may do what you want:

notation:10 "RUN " r:11 => Id.run $ r

def womp' : Nat := RUN c_do int x = 0; return 0

Frederick Pu (Mar 11 2025 at 03:15):

how does do do it?

Derek Rhodes (Mar 11 2025 at 03:16):

(the $ probably isn't necessary in the notation)

how does do do it?

Oh that's a good question, I have no idea! :)

Frederick Pu (Mar 11 2025 at 03:16):

mb we need a wiz like @Kyle Miller

Frederick Pu (Mar 11 2025 at 04:22):

actually ive switched more this approach

syntax "int" ident "=" term ";" : doElem
syntax "if" "(" term ")" "{" doSeq "}" : doElem
syntax ident "=" term ";" : doElem

macro_rules
  | `(doElem| int $x:ident = $y;) => `(doElem| let mut $x := $y)
  | `(doElem| if ($t:term) {$seq:doSeq}) => `(doElem| if $t then $seq)
  -- unexpected token '='; expected ')'
  | `(doElem| $x:ident = $y;) => `(doElem | $x := $y)

can some one pls explain why $x := $y doesnt work as a doElem?

Mario Carneiro (Mar 11 2025 at 14:55):

You should have precedences on the syntax declaration, just like in notation

Mario Carneiro (Mar 11 2025 at 14:58):

the invocation that will give you the same precedence as lean's do is

syntax:arg "c_do" cSeq : term

Mario Carneiro (Mar 11 2025 at 15:00):

Frederick Pu said:

can some one pls explain why $x := $y doesnt work as a doElem?

you need to give it a "type", $x:ident, because it's inferring $x:doElem which doesn't permit a := after it

Frederick Pu (Mar 11 2025 at 15:11):

but it's alrady $x:ident in match statement. What's the reason for this?

Patrick Massot (Mar 11 2025 at 15:16):

That’s the golden rule of antiquotations: if Lean is unhappy, add a type annotation even if it looks obvious.

Patrick Massot (Mar 11 2025 at 15:17):

They really don’t work like type annotations in ordinary Lean code.

Mario Carneiro (Mar 11 2025 at 16:11):

Frederick Pu said:

but it's alrady $x:ident in match statement. What's the reason for this?

The parser runs before typechecking, so it does not have the ability to propagate the type information while still parsing the syntax inside the quotation

Frederick Pu (Mar 11 2025 at 16:12):

but ident is a syntax category right?

Mario Carneiro (Mar 11 2025 at 16:12):

yes but the fact that x is a binding for an ident in that context is typing information

Mario Carneiro (Mar 11 2025 at 16:13):

specifically, you can see that x has type TSyntax ident in the context

Mario Carneiro (Mar 11 2025 at 16:14):

when you use the syntax $x, this is ambiguous to the parser regarding what syntax class it is supposed to be acting like

Mario Carneiro (Mar 11 2025 at 16:14):

and it would need to know the type of x to get the answer right in all situations

Mario Carneiro (Mar 11 2025 at 16:15):

so instead it just has a heuristic which is wrong a good chunk of the time

Mario Carneiro (Mar 11 2025 at 16:15):

in this case that heuristic guesses that $x is a doElem

Mario Carneiro (Mar 11 2025 at 16:15):

you override the heuristic by using the form $x:cat instead

Mario Carneiro (Mar 11 2025 at 16:17):

As Patrick says, it's not really a type annotation in the same sense as in regular lean code, because rather than using inference if you don't supply the type it uses a local heuristic (basically it looks at the first syntax category the current parser would consume at this point)

Frederick Pu (Mar 11 2025 at 16:34):

now im trying to make a macro for for loops. Any way to enforce that the first and second i is the same?

syntax "int" ident "=" term ";" : doElem
syntax "if" "(" term ")" "{" doSeq "}" : doElem
syntax ident "=" term ";" : doElem
syntax "for" "(" "int" ident "=" term ";" ident "<" term ";" ident "++" ")" "{" doSeq "}" : doElem

macro_rules
  | `(doElem| int $x:ident = $y;) => `(doElem| let mut $x := $y)
  | `(doElem| $x:ident = $y;) => `(doElem| $x:ident := $y)
  | `(doElem| if ($t:term) {$seq:doSeq}) => `(doElem| if $t then $seq)
  | `(doElem | for (int $i:ident = $t1:term; $i:ident < $t2:term; $i++){$seq}) => `(doElem| for $i in List.range' $t1 ($t2 - $t1) do $seq)

Frederick Pu (Mar 11 2025 at 17:38):

i ended up fixing it like this:

syntax "int" ident "=" term ";" : doElem
syntax "if" "(" term ")" "{" doSeq "}" : doElem
syntax ident "=" term ";" : doElem
syntax "for" "(" "int" ident "=" term ";" ident "<" term ";" ident "++" ")" "{" doSeq "}" : doElem

macro_rules
  | `(doElem| int $x:ident = $y;) => `(doElem| let mut $x := $y)
  | `(doElem| $x:ident = $y;) => `(doElem| $x:ident := $y)
  | `(doElem| if ($t:term) {$seq:doSeq}) => `(doElem| if $t then $seq)
  | `(doElem | for (int $i:ident = $t1:term; $i1:ident < $t2:term; $i2++){$seq}) => do
    if i.getId ≠ i1.getId ∨ i1.getId ≠ i2.getId then
      Macro.throwError "i's must match"
    `(doElem| for $i:ident in List.range' $t1 ($t2 - $t1) do $seq)
def womp : Nat := Id.run do
  int x = 10;
  x = x + 1;
  if (x = 30){
    for (int j = 0; j < x; j++){
      x = x + 1;
    }
    return 0;
  }
  return 10
#print womp

how do i make the linter error for unused variable apply all of the j's that occur in the for declaration?
image.png

Mario Carneiro (Mar 11 2025 at 18:40):

it's easier to do that in an elab rather than a macro. In a macro you can fake it by having some spurious syntax that contains a use of the variable but elaborates to nothing, like haveI := j; e stuck before some subexpression

Frederick Pu (Mar 11 2025 at 18:41):

can u give an example?

Mario Carneiro (Mar 11 2025 at 18:42):

something like

    `(doElem| for $i:ident in List.range' $t1 ($t2 - $t1) do let _ := ($i1, $i2); $seq)

Frederick Pu (Mar 11 2025 at 18:45):

macro_rules
  | `(doElem| int $x:ident = $y;) => `(doElem| let mut $x := $y)
  | `(doElem| $x:ident = $y;) => `(doElem| $x:ident := $y)
  | `(doElem| if ($t:term) {$seq:doSeq}) => `(doElem| if $t then $seq)
  | `(doElem | for (int $i:ident = $t1:term; $i1:ident < $t2:term; $i2++){$seq}) => do
    if i.getId ≠ i1.getId ∨ i1.getId ≠ i2.getId then
      Macro.throwError "i's must match"
    /-
      application type mismatch
        seq.raw
      argument
        seq
      has type
        TSyntax `Lean.Parser.Term.doSeq : Type
      but is expected to have type
        TSyntax `term : Type
      All Messages (4)
    -/
    `(doElem| for $i:ident in List.range' $t1 ($t2 - $t1) do let _ := ($i1, $i2) $seq)

Mario Carneiro (Mar 11 2025 at 18:46):

semicolon?

Mario Carneiro (Mar 11 2025 at 18:46):

possibly also nested do

Mario Carneiro (Mar 11 2025 at 18:46):

like ... do let := ($i1, $i2); do $seq

Frederick Pu (Mar 11 2025 at 18:47):

what does the second do do?

Mario Carneiro (Mar 11 2025 at 18:47):

it makes the doSeq into a doElem

Frederick Pu (Mar 11 2025 at 18:48):

now i get no linter errors at all

Frederick Pu (Mar 11 2025 at 18:51):

this works better:

macro_rules
  | `(doElem| int $x:ident = $y;) => `(doElem| let mut $x := $y)
  | `(doElem| $x:ident = $y;) => `(doElem| $x:ident := $y)
  | `(doElem| if ($t:term) {$seq:doSeq}) => `(doElem| if $t then $seq)
  | `(doElem | for (int $i:ident = $t1:term; $i1:ident < $t2:term; $i2++){$seq}) => do
    if i.getId ≠ i1.getId ∨ i1.getId ≠ i2.getId then
      Macro.throwError "i's must match"
    `(doElem| for $i:ident in List.range' $t1 ($t2 - $t1) do let $i1 := 0; let $i2 := 0; do $seq)

Frederick Pu (Mar 11 2025 at 18:52):

i think the issue before is since all three ident are identical so the ident is actually getting used

Frederick Pu (Mar 11 2025 at 18:55):

image.png

Frederick Pu (Mar 11 2025 at 18:56):

but that breaks the semantics tho

Mario Carneiro (Mar 11 2025 at 19:01):

oh, I thought you wanted to get no linter error because it is actually used twice

Frederick Pu (Mar 11 2025 at 19:02):

hmm i never thought about it like that

Frederick Pu (Mar 11 2025 at 19:02):

what about this

macro_rules
  | `(doElem| int $x:ident = $y;) => `(doElem| let mut $x := $y)
  | `(doElem| $x:ident = $y;) => `(doElem| $x:ident := $y)
  | `(doElem| if ($t:term) {$seq:doSeq}) => `(doElem| if $t then $seq)
  | `(doElem | for (int $i:ident = $t1:term; $i1:ident < $t2:term; $i2++){$seq}) => do
    if i.getId ≠ i1.getId ∨ i1.getId ≠ i2.getId then
      Macro.throwError "i's must match"
    `(doElem| for x in List.range' $t1 ($t2 - $t1) do let $i := x; let $i1 := x; let $i2 := x; do $seq)

Frederick Pu (Mar 11 2025 at 19:02):

is the x that i use guaranteed to be a new variable?

Frederick Pu (Mar 11 2025 at 19:03):

actually it doesnt work cause one of the idents gets shadowed

Frederick Pu (Mar 11 2025 at 19:04):

image.png

Frederick Pu (Mar 11 2025 at 19:06):

wait if the idents are considered different then why does your original fix get rid of the linter errors?

Frederick Pu (Mar 11 2025 at 19:07):

or does the additional let just trick the linter?

Mario Carneiro (Mar 11 2025 at 20:47):

no, it's not about the idents being different but about what they resolve to. The first i is used as the variable binder, so that's the definition site, and the other two are passed to a let that uses the variable, so those two act as uses of the variable (and e.g. you can ctrl-click on them and jump to the binder use)

Mario Carneiro (Mar 11 2025 at 20:47):

Frederick Pu said:

is the x that i use guaranteed to be a new variable?

Yes

Mario Carneiro (Mar 11 2025 at 20:49):

When you use i1 and i2 like that second example, they are not uses of the first variable but fresh binders, so ctrl-clicking on them will do nothing because they are all actually distinct binders, and uses inside the body will resolve to the last one since let $i2 will shadow the let $i and let $i1 binders

Mario Carneiro (Mar 11 2025 at 20:50):

I'm not sure why the last j is white in the photo though, which indicates that it isn't being used as either a variable binding or a use site

Mario Carneiro (Mar 11 2025 at 20:53):

There isn't that much magic going on in the first example. It's really just macro expanding to something like

for i in List.range a (b - a) do
  let __x := (i, i)
  ...

and if you were to write that directly, the linter would validly say that the first i is not unused, and the two later occurrences of i would resolve to the first one. The macro is doing exactly the same thing, it's just that the literal spans of the three i identifiers is slightly different in the text

Frederick Pu (Mar 11 2025 at 21:08):

i think the third one being white was cause i deleted one of the lets

Frederick Pu (Mar 11 2025 at 21:09):

image.png

Frederick Pu (Mar 11 2025 at 21:20):

also im not sure what u mean by binder

Mario Carneiro (Mar 11 2025 at 21:44):

a binder is something that creates a new local variable which has meaning within some scope

Mario Carneiro (Mar 11 2025 at 21:44):

In your for loop syntax, from the user's perspective, one would expect the first occurrence of j to be a binder and the other two to be a use

Mario Carneiro (Mar 11 2025 at 21:46):

another way to illustrate the difference between binders and uses is that a binder will not be an error if you put a completely new word that has not been seen in the text before like foobar1, while if you use a new word in a use site it will be an unknown variable error

Frederick Pu (Mar 12 2025 at 05:04):

so how come the two additional uses of the identifier in the un elaborated macro are associated with the two uses in elaborated result?

Mario Carneiro (Mar 12 2025 at 09:55):

that's what happens when you use i1 in the macro and embed it in the result. i1 is a Syntax object, and it contains data inside it about where in the source it came from, so that squiggles and hovers generated by the underlying expression eventually get back to the right span of text

Mario Carneiro (Mar 12 2025 at 09:58):

if you used ($i1, $i1) in the macro expansion instead of ($i1, $i2), then the source idenfifier corresponding to i1 in the text (the j in j < n) would get two type hovers which are the same, and the source identifier corresponding to i2 (the j in j++) would not be colored like a variable and have no type hovers

Frederick Pu (Mar 12 2025 at 19:17):

also this is kinda unrelated but what is Syntax. and how does the macro elaboration pipeline look like. Are all the syntax parsers parsed first and then whole thing is elaborated?

Mario Carneiro (Mar 12 2025 at 19:24):

A macro is basically a Syntax -> Syntax function

Mario Carneiro (Mar 12 2025 at 19:24):

an elab is a Syntax -> Expr function

Mario Carneiro (Mar 12 2025 at 19:25):

Syntax is the type that the parser produces from the source text, it has information in it about which grammatical constructs are used and where in the file each thing came from

Frederick Pu (Mar 12 2025 at 19:27):

so what order are all of the macros applied

Mario Carneiro (Mar 12 2025 at 19:32):

At the top level you have a command syntax object, corresponding to something like a definition or theorem or inductive type declaration. The parser parsed the text expecting a command, and this results in an AST object, that is, Syntax. The head node of that syntax is a grammar constructor produced from a parser such as Lean.Parser.Command.declaration (as a name), and this is the key which is used to look up in a list of elabs and macros. If there are any macros registered for the head symbol then that macro is run and it repeats with the result. If it's an elab then that function is run (in the case of commands it is just executed for effects).

An elab will be passed the input syntax, and will generally pattern match out parts of it and call elabTerm or other things on those AST subnodes in order to get e.g. the type and body of a declaration as Expr. This function in turn checks the head node, runs any macros, and otherwise runs an elab registered for that, which elaborates its subnodes and so on

Mario Carneiro (Mar 12 2025 at 19:32):

so the TLDR is that macros are run from the outside in

Mario Carneiro (Mar 12 2025 at 19:32):

and macros take precedence over elabs (but usually there is no overlap)

Mario Carneiro (Mar 12 2025 at 19:34):

this is also described in the other MIL book

Frederick Pu (Mar 12 2025 at 19:46):

so during the parser of the command syntax object does it call the parsers for other syntax categories. Or is the entire command syntax object parsed first?

Mario Carneiro (Mar 12 2025 at 19:47):

the entire command is parsed before any macros/elabs are run

Mario Carneiro (Mar 12 2025 at 19:47):

but parsing is indeed recursive, a parser for defs will call the parser for terms

Mario Carneiro (Mar 12 2025 at 19:49):

parser declarations tell you both the head node of the generated syntax tree as well as being functions you call to produce the syntax when some other parser decides it wants to call this one

Mario Carneiro (Mar 12 2025 at 19:50):

src#Lean.Parser.Command.declaration is a function which calls a bunch of other parsers in order to parse a node which is labeled `Lean.Parser.Command.declaration

Frederick Pu (Mar 12 2025 at 19:53):

will a parser for term call parser for doElem?

Mario Carneiro (Mar 12 2025 at 19:57):

term is an extensible grammar category. These work by having an attribute which allow you to register new things into that grammar category and all of them are tried (by some longest match rule) to parse a term. One of the term parsers is src#Lean.Parser.Term.do, which parses do followed by a doSeq, which calls doSeqIndent which calls doSeqItem which calls doElemParser which is the extensible parser for doElems.

David Thrane Christiansen (Mar 12 2025 at 20:44):

@Frederick Pu You might find it useful to read the manual section on extending Lean

Frederick Pu (Mar 14 2025 at 17:01):

thx ill take a look

Last updated: May 02 2025 at 03:31 UTC