Zulip Chat Archive
Stream: general
Topic: precendence for do style dsls
Frederick Pu (Mar 11 2025 at 03:06):
I'm trying to make a dsl that piggy backs off of lean's do notation. I'm wondering how I can give it precendence just like do notation. Namely you can write Id.run do ....
instead of Id.run (do ...)
. Here is the code I have so far:
import Lean
open Lean Elab Term
-- Declare syntax categories for cSeq and cItem
declare_syntax_cat cSeq
declare_syntax_cat cItem
-- Syntax for C-style integer declaration (creates a mutable variable)
syntax "int" ident "=" term ";" : cItem
syntax doSeq : cItem -- for debugging purposes only (shouldn't need this once dsl is fully defined)
syntax cItem : cSeq
syntax cItem cSeq : cSeq
syntax "c_do" cSeq : term
macro_rules
| `(c_do int $x:ident = $val; ) => `(do let mut $x := $val)
| `(c_do int $x:ident = $val; $seq:cSeq ) => `(do let mut $x := $val; c_do $seq)
| `(c_do $t:doSeq) => `(do $t)
def womp : Nat := Id.run (c_do int x = 0; return 0)
I want to be able to write Id.run c_do int x = 0; return 0
Derek Rhodes (Mar 11 2025 at 03:15):
If you're willing to use notation
, then I think this may do what you want:
notation:10 "RUN " r:11 => Id.run $ r
def womp' : Nat := RUN c_do int x = 0; return 0
Frederick Pu (Mar 11 2025 at 03:15):
how does do
do it?
Derek Rhodes (Mar 11 2025 at 03:16):
(the $ probably isn't necessary in the notation)
how does
do
do it?
Oh that's a good question, I have no idea! :)
Frederick Pu (Mar 11 2025 at 03:16):
mb we need a wiz like @Kyle Miller
Frederick Pu (Mar 11 2025 at 04:22):
actually ive switched more this approach
syntax "int" ident "=" term ";" : doElem
syntax "if" "(" term ")" "{" doSeq "}" : doElem
syntax ident "=" term ";" : doElem
macro_rules
| `(doElem| int $x:ident = $y;) => `(doElem| let mut $x := $y)
| `(doElem| if ($t:term) {$seq:doSeq}) => `(doElem| if $t then $seq)
-- unexpected token '='; expected ')'
| `(doElem| $x:ident = $y;) => `(doElem | $x := $y)
can some one pls explain why $x := $y
doesnt work as a doElem?
Mario Carneiro (Mar 11 2025 at 14:55):
You should have precedences on the syntax
declaration, just like in notation
Mario Carneiro (Mar 11 2025 at 14:58):
the invocation that will give you the same precedence as lean's do
is
syntax:arg "c_do" cSeq : term
Mario Carneiro (Mar 11 2025 at 15:00):
Frederick Pu said:
can some one pls explain why
$x := $y
doesnt work as a doElem?
you need to give it a "type", $x:ident
, because it's inferring $x:doElem
which doesn't permit a :=
after it
Frederick Pu (Mar 11 2025 at 15:11):
but it's alrady $x:ident
in match statement. What's the reason for this?
Patrick Massot (Mar 11 2025 at 15:16):
That’s the golden rule of antiquotations: if Lean is unhappy, add a type annotation even if it looks obvious.
Patrick Massot (Mar 11 2025 at 15:17):
They really don’t work like type annotations in ordinary Lean code.
Mario Carneiro (Mar 11 2025 at 16:11):
Frederick Pu said:
but it's alrady
$x:ident
in match statement. What's the reason for this?
The parser runs before typechecking, so it does not have the ability to propagate the type information while still parsing the syntax inside the quotation
Frederick Pu (Mar 11 2025 at 16:12):
but ident is a syntax category right?
Mario Carneiro (Mar 11 2025 at 16:12):
yes but the fact that x
is a binding for an ident in that context is typing information
Mario Carneiro (Mar 11 2025 at 16:13):
specifically, you can see that x
has type TSyntax ident
in the context
Mario Carneiro (Mar 11 2025 at 16:14):
when you use the syntax $x
, this is ambiguous to the parser regarding what syntax class it is supposed to be acting like
Mario Carneiro (Mar 11 2025 at 16:14):
and it would need to know the type of x
to get the answer right in all situations
Mario Carneiro (Mar 11 2025 at 16:15):
so instead it just has a heuristic which is wrong a good chunk of the time
Mario Carneiro (Mar 11 2025 at 16:15):
in this case that heuristic guesses that $x
is a doElem
Mario Carneiro (Mar 11 2025 at 16:15):
you override the heuristic by using the form $x:cat
instead
Mario Carneiro (Mar 11 2025 at 16:17):
As Patrick says, it's not really a type annotation in the same sense as in regular lean code, because rather than using inference if you don't supply the type it uses a local heuristic (basically it looks at the first syntax category the current parser would consume at this point)
Frederick Pu (Mar 11 2025 at 16:34):
now im trying to make a macro for for loops. Any way to enforce that the first and second i is the same?
syntax "int" ident "=" term ";" : doElem
syntax "if" "(" term ")" "{" doSeq "}" : doElem
syntax ident "=" term ";" : doElem
syntax "for" "(" "int" ident "=" term ";" ident "<" term ";" ident "++" ")" "{" doSeq "}" : doElem
macro_rules
| `(doElem| int $x:ident = $y;) => `(doElem| let mut $x := $y)
| `(doElem| $x:ident = $y;) => `(doElem| $x:ident := $y)
| `(doElem| if ($t:term) {$seq:doSeq}) => `(doElem| if $t then $seq)
| `(doElem | for (int $i:ident = $t1:term; $i:ident < $t2:term; $i++){$seq}) => `(doElem| for $i in List.range' $t1 ($t2 - $t1) do $seq)
Frederick Pu (Mar 11 2025 at 17:38):
i ended up fixing it like this:
syntax "int" ident "=" term ";" : doElem
syntax "if" "(" term ")" "{" doSeq "}" : doElem
syntax ident "=" term ";" : doElem
syntax "for" "(" "int" ident "=" term ";" ident "<" term ";" ident "++" ")" "{" doSeq "}" : doElem
macro_rules
| `(doElem| int $x:ident = $y;) => `(doElem| let mut $x := $y)
| `(doElem| $x:ident = $y;) => `(doElem| $x:ident := $y)
| `(doElem| if ($t:term) {$seq:doSeq}) => `(doElem| if $t then $seq)
| `(doElem | for (int $i:ident = $t1:term; $i1:ident < $t2:term; $i2++){$seq}) => do
if i.getId ≠ i1.getId ∨ i1.getId ≠ i2.getId then
Macro.throwError "i's must match"
`(doElem| for $i:ident in List.range' $t1 ($t2 - $t1) do $seq)
def womp : Nat := Id.run do
int x = 10;
x = x + 1;
if (x = 30){
for (int j = 0; j < x; j++){
x = x + 1;
}
return 0;
}
return 10
#print womp
how do i make the linter error for unused variable apply all of the j's that occur in the for declaration?
image.png
Mario Carneiro (Mar 11 2025 at 18:40):
it's easier to do that in an elab rather than a macro. In a macro you can fake it by having some spurious syntax that contains a use of the variable but elaborates to nothing, like haveI := j; e
stuck before some subexpression
Frederick Pu (Mar 11 2025 at 18:41):
can u give an example?
Mario Carneiro (Mar 11 2025 at 18:42):
something like
`(doElem| for $i:ident in List.range' $t1 ($t2 - $t1) do let _ := ($i1, $i2); $seq)
Frederick Pu (Mar 11 2025 at 18:45):
macro_rules
| `(doElem| int $x:ident = $y;) => `(doElem| let mut $x := $y)
| `(doElem| $x:ident = $y;) => `(doElem| $x:ident := $y)
| `(doElem| if ($t:term) {$seq:doSeq}) => `(doElem| if $t then $seq)
| `(doElem | for (int $i:ident = $t1:term; $i1:ident < $t2:term; $i2++){$seq}) => do
if i.getId ≠ i1.getId ∨ i1.getId ≠ i2.getId then
Macro.throwError "i's must match"
/-
application type mismatch
seq.raw
argument
seq
has type
TSyntax `Lean.Parser.Term.doSeq : Type
but is expected to have type
TSyntax `term : Type
All Messages (4)
-/
`(doElem| for $i:ident in List.range' $t1 ($t2 - $t1) do let _ := ($i1, $i2) $seq)
Mario Carneiro (Mar 11 2025 at 18:46):
semicolon?
Mario Carneiro (Mar 11 2025 at 18:46):
possibly also nested do
Mario Carneiro (Mar 11 2025 at 18:46):
like ... do let := ($i1, $i2); do $seq
Frederick Pu (Mar 11 2025 at 18:47):
what does the second do
do?
Mario Carneiro (Mar 11 2025 at 18:47):
it makes the doSeq into a doElem
Frederick Pu (Mar 11 2025 at 18:48):
now i get no linter errors at all
Frederick Pu (Mar 11 2025 at 18:51):
this works better:
macro_rules
| `(doElem| int $x:ident = $y;) => `(doElem| let mut $x := $y)
| `(doElem| $x:ident = $y;) => `(doElem| $x:ident := $y)
| `(doElem| if ($t:term) {$seq:doSeq}) => `(doElem| if $t then $seq)
| `(doElem | for (int $i:ident = $t1:term; $i1:ident < $t2:term; $i2++){$seq}) => do
if i.getId ≠ i1.getId ∨ i1.getId ≠ i2.getId then
Macro.throwError "i's must match"
`(doElem| for $i:ident in List.range' $t1 ($t2 - $t1) do let $i1 := 0; let $i2 := 0; do $seq)
Frederick Pu (Mar 11 2025 at 18:52):
i think the issue before is since all three ident
are identical so the ident
is actually getting used
Frederick Pu (Mar 11 2025 at 18:55):
Frederick Pu (Mar 11 2025 at 18:56):
but that breaks the semantics tho
Mario Carneiro (Mar 11 2025 at 19:01):
oh, I thought you wanted to get no linter error because it is actually used twice
Frederick Pu (Mar 11 2025 at 19:02):
hmm i never thought about it like that
Frederick Pu (Mar 11 2025 at 19:02):
what about this
macro_rules
| `(doElem| int $x:ident = $y;) => `(doElem| let mut $x := $y)
| `(doElem| $x:ident = $y;) => `(doElem| $x:ident := $y)
| `(doElem| if ($t:term) {$seq:doSeq}) => `(doElem| if $t then $seq)
| `(doElem | for (int $i:ident = $t1:term; $i1:ident < $t2:term; $i2++){$seq}) => do
if i.getId ≠ i1.getId ∨ i1.getId ≠ i2.getId then
Macro.throwError "i's must match"
`(doElem| for x in List.range' $t1 ($t2 - $t1) do let $i := x; let $i1 := x; let $i2 := x; do $seq)
Frederick Pu (Mar 11 2025 at 19:02):
is the x
that i use guaranteed to be a new variable?
Frederick Pu (Mar 11 2025 at 19:03):
actually it doesnt work cause one of the idents gets shadowed
Frederick Pu (Mar 11 2025 at 19:04):
Frederick Pu (Mar 11 2025 at 19:06):
wait if the idents are considered different then why does your original fix get rid of the linter errors?
Frederick Pu (Mar 11 2025 at 19:07):
or does the additional let
just trick the linter?
Mario Carneiro (Mar 11 2025 at 20:47):
no, it's not about the idents being different but about what they resolve to. The first i
is used as the variable binder, so that's the definition site, and the other two are passed to a let that uses the variable, so those two act as uses of the variable (and e.g. you can ctrl-click on them and jump to the binder use)
Mario Carneiro (Mar 11 2025 at 20:47):
Frederick Pu said:
is the
x
that i use guaranteed to be a new variable?
Yes
Mario Carneiro (Mar 11 2025 at 20:49):
When you use i1
and i2
like that second example, they are not uses of the first variable but fresh binders, so ctrl-clicking on them will do nothing because they are all actually distinct binders, and uses inside the body will resolve to the last one since let $i2
will shadow the let $i
and let $i1
binders
Mario Carneiro (Mar 11 2025 at 20:50):
I'm not sure why the last j
is white in the photo though, which indicates that it isn't being used as either a variable binding or a use site
Mario Carneiro (Mar 11 2025 at 20:53):
There isn't that much magic going on in the first example. It's really just macro expanding to something like
for i in List.range a (b - a) do
let __x := (i, i)
...
and if you were to write that directly, the linter would validly say that the first i
is not unused, and the two later occurrences of i
would resolve to the first one. The macro is doing exactly the same thing, it's just that the literal spans of the three i
identifiers is slightly different in the text
Frederick Pu (Mar 11 2025 at 21:08):
i think the third one being white was cause i deleted one of the lets
Frederick Pu (Mar 11 2025 at 21:09):
Frederick Pu (Mar 11 2025 at 21:20):
also im not sure what u mean by binder
Mario Carneiro (Mar 11 2025 at 21:44):
a binder is something that creates a new local variable which has meaning within some scope
Mario Carneiro (Mar 11 2025 at 21:44):
In your for loop syntax, from the user's perspective, one would expect the first occurrence of j
to be a binder and the other two to be a use
Mario Carneiro (Mar 11 2025 at 21:46):
another way to illustrate the difference between binders and uses is that a binder will not be an error if you put a completely new word that has not been seen in the text before like foobar1
, while if you use a new word in a use site it will be an unknown variable error
Frederick Pu (Mar 12 2025 at 05:04):
so how come the two additional uses of the identifier in the un elaborated macro are associated with the two uses in elaborated result?
Mario Carneiro (Mar 12 2025 at 09:55):
that's what happens when you use i1
in the macro and embed it in the result. i1
is a Syntax
object, and it contains data inside it about where in the source it came from, so that squiggles and hovers generated by the underlying expression eventually get back to the right span of text
Mario Carneiro (Mar 12 2025 at 09:58):
if you used ($i1, $i1)
in the macro expansion instead of ($i1, $i2)
, then the source idenfifier corresponding to i1
in the text (the j
in j < n
) would get two type hovers which are the same, and the source identifier corresponding to i2
(the j
in j++
) would not be colored like a variable and have no type hovers
Frederick Pu (Mar 12 2025 at 19:17):
also this is kinda unrelated but what is Syntax
. and how does the macro elaboration pipeline look like. Are all the syntax parsers parsed first and then whole thing is elaborated?
Mario Carneiro (Mar 12 2025 at 19:24):
A macro is basically a Syntax -> Syntax
function
Mario Carneiro (Mar 12 2025 at 19:24):
an elab is a Syntax -> Expr
function
Mario Carneiro (Mar 12 2025 at 19:25):
Syntax
is the type that the parser produces from the source text, it has information in it about which grammatical constructs are used and where in the file each thing came from
Frederick Pu (Mar 12 2025 at 19:27):
so what order are all of the macros applied
Mario Carneiro (Mar 12 2025 at 19:32):
At the top level you have a command
syntax object, corresponding to something like a definition or theorem or inductive type declaration. The parser parsed the text expecting a command, and this results in an AST object, that is, Syntax
. The head node of that syntax is a grammar constructor produced from a parser such as Lean.Parser.Command.declaration
(as a name), and this is the key which is used to look up in a list of elabs and macros. If there are any macros registered for the head symbol then that macro is run and it repeats with the result. If it's an elab then that function is run (in the case of commands it is just executed for effects).
An elab will be passed the input syntax, and will generally pattern match out parts of it and call elabTerm
or other things on those AST subnodes in order to get e.g. the type and body of a declaration as Expr
. This function in turn checks the head node, runs any macros, and otherwise runs an elab registered for that, which elaborates its subnodes and so on
Mario Carneiro (Mar 12 2025 at 19:32):
so the TLDR is that macros are run from the outside in
Mario Carneiro (Mar 12 2025 at 19:32):
and macros take precedence over elabs (but usually there is no overlap)
Mario Carneiro (Mar 12 2025 at 19:34):
this is also described in the other MIL book
Frederick Pu (Mar 12 2025 at 19:46):
so during the parser of the command syntax object does it call the parsers for other syntax categories. Or is the entire command syntax object parsed first?
Mario Carneiro (Mar 12 2025 at 19:47):
the entire command is parsed before any macros/elabs are run
Mario Carneiro (Mar 12 2025 at 19:47):
but parsing is indeed recursive, a parser for defs will call the parser for terms
Mario Carneiro (Mar 12 2025 at 19:49):
parser declarations tell you both the head node of the generated syntax tree as well as being functions you call to produce the syntax when some other parser decides it wants to call this one
Mario Carneiro (Mar 12 2025 at 19:50):
src#Lean.Parser.Command.declaration is a function which calls a bunch of other parsers in order to parse a node which is labeled `Lean.Parser.Command.declaration
Frederick Pu (Mar 12 2025 at 19:53):
will a parser for term call parser for doElem
?
Mario Carneiro (Mar 12 2025 at 19:57):
term
is an extensible grammar category. These work by having an attribute which allow you to register new things into that grammar category and all of them are tried (by some longest match rule) to parse a term. One of the term parsers is src#Lean.Parser.Term.do, which parses do
followed by a doSeq
, which calls doSeqIndent
which calls doSeqItem
which calls doElemParser
which is the extensible parser for doElem
s.
David Thrane Christiansen (Mar 12 2025 at 20:44):
@Frederick Pu You might find it useful to read the manual section on extending Lean
Frederick Pu (Mar 14 2025 at 17:01):
thx ill take a look
Last updated: May 02 2025 at 03:31 UTC