Zulip Chat Archive
Stream: lean4
Topic: Java AST in Lean4
ohhaimark (Sep 01 2021 at 04:12):
What's the laziest way I can get a Java AST in Lean4?
Mario Carneiro (Sep 01 2021 at 04:22):
Have a java parser and print out the ast in s expressions or json or something
Mario Carneiro (Sep 01 2021 at 04:25):
Or write a java parser yourself in lean 4
Andrew Kent (Sep 01 2021 at 16:01):
I'm not sure what the laziest way is per se, but if you are wanting to define some types to represent the AST, the Java Language Specification isn't the worst document to work from at least.
If you're wanting a parser... these ANTLR Java grammars might be a helpful starting point. I don't recall if in Java you have to interleave parsing with type checking and other analyses (like you do for C or C++)... hopefully not... since that is an absolute nightmare to get right.
ohhaimark (Sep 01 2021 at 16:43):
I'm trying out the lean4 syntax
stuff, but I'm not sure how to solve something:
I have the following setup:
syntax "java" "{" java_file "}" : term
syntax (pkg_decl)? import_decl* ty_decl* : java_file
syntax ident ("." ident)* : q_name
syntax "package" q_name ";" : pkg_decl
syntax "import" (ident ".")+ ("*" <|> ident) ";" : import_decl
syntax "public" <|> "protected" <|> "private" <|> "abstract" <|> "static" <|> "final" : modifier
syntax modifier* "class" ident noWs (ty_params)? (ty_super)? class_body : ty_decl
syntax "<" (ident (ty_super)?),* ">" : ty_params
syntax "<" (ty_ref),* ">" : ty_args
syntax ("extends" ty_ref)? ("implements" ty_ref,*)? : ty_super
syntax q_name (ty_args)? : ty_ref
syntax "{" class_body_decl* "}" : class_body
syntax modifier* (field_decl <|> ctor_or_method_decl) : class_body_decl
syntax ty ident ("=" expr)? ";" : field_decl
syntax (ty_params)? (ty_ret)? ident params (ty_throws)? : ctor_or_method_decl
syntax "void" <|> ty : ty_ret
syntax "(" param,* ")" : params
syntax ty ident : param
syntax "throws" ty_ref,+ : ty_throws
def test := java {
package org.stnreu001.csc2002s;
import java.util.*;
public class HelloWorld<A> extends Comparator<A> {
public int x = 0;
public HelloWorld() {
}
}
}
But I don't know how to get the parser to distinguish between methods and fields.
Sebastian Ullrich (Sep 01 2021 at 16:49):
<|>
does not backtrack if the LHS has consumed input, so you'll want to split
syntax modifier* (field_decl <|> ctor_or_method_decl) : class_body_decl
into two rules:
syntax modifier* field_decl : class_body_decl
syntax modifier* ctor_or_method_decl : class_body_decl
optionally factoring out the common modifier*
prefix using yet another syntax category
ohhaimark (Sep 01 2021 at 17:05):
Is there a syntax category for number literals in lean? Where can I discover what syntax categories already exist?
Mario Carneiro (Sep 01 2021 at 17:06):
num
Mario Carneiro (Sep 01 2021 at 17:07):
It's not actually a syntax category, it's a parser (alias)
Mario Carneiro (Sep 01 2021 at 17:08):
They are sort of scattered all over, but that one is declared in a block with several others at the top of Lean/Parser.lean
ohhaimark (Sep 01 2021 at 18:03):
How do parsers deal with the parsing of Function<A,Function<B,C>>
so you don't have to do c++ style `Function<A,Function<B,C> >
?
Mario Carneiro (Sep 01 2021 at 19:27):
You put a hack in so that >>
counts as two >
when needed
Mario Carneiro (Sep 01 2021 at 19:28):
and I guess, >>>
counts as three >
(since java has >>>
too IIRC)
Mario Carneiro (Sep 01 2021 at 19:31):
For a lean 4 parser, I would suggest replacing ">"
in the ty_args
and ty_params
rules with a custom parser that consumes a >
character and doesn't use tokens
Mac (Sep 01 2021 at 23:58):
Mario Carneiro said:
For a lean 4 parser, I would suggest replacing
">"
in thety_args
andty_params
rules with a custom parser that consumes a>
character and doesn't use tokens
This is actually much more difficult to do than it sounds since parsers to run for syntax categories are selected based on their tokens and their is no way (that I know of to so) to say declare that a parser can eat any token that begins with a '>' . This means that if another parser that consumes the full token exists, the parser will likely choose that first. You may be able to get around this by having your parser having no declared firstToken
and a very high priority.
ohhaimark (Sep 02 2021 at 02:53):
Yay, I caused a stack overflow. It happened when I added ty_array
.
declare_syntax_cat java_file
syntax "java" "{" java_file "}" : term
declare_syntax_cat q_name
syntax ident ("." ident)* : q_name
declare_syntax_cat pkg_decl
syntax "package" q_name ";" : pkg_decl
declare_syntax_cat import_decl
syntax "import" (ident ".")+ ("*" <|> ident) ";" : import_decl
declare_syntax_cat ty_ref
declare_syntax_cat ty_args
syntax ident (ty_args)? : ty_ref
syntax "<" (ty_ref),* ">" : ty_args
declare_syntax_cat ty_prim
syntax "int" <|> "long" <|> "float" <|> "double" <|> "boolean" : ty_prim
declare_syntax_cat ty
declare_syntax_cat ty_array
syntax ty "[" "]" : ty_array
syntax ty_array : ty
syntax ty_prim : ty
syntax ty_ref : ty
declare_syntax_cat ty_ret
syntax "void" : ty_ret
syntax ty : ty_ret
syntax q_name (ty_args)? : ty_ref
declare_syntax_cat ty_super
syntax ("extends" ty_ref)? ("implements" ty_ref,*)? : ty_super
declare_syntax_cat mod_access
syntax "public" <|> "protected" <|> "private" : mod_access
declare_syntax_cat mod_class_method
syntax (mod_access)? "abstract"? "final"? : mod_class_method
declare_syntax_cat mod_static_method
syntax (mod_access)? "static" "final"? : mod_static_method
declare_syntax_cat mod_method
syntax mod_class_method : mod_method
syntax mod_static_method : mod_method
declare_syntax_cat mod_field
syntax (mod_access)? "static"? "final"? : mod_field
declare_syntax_cat mod_local_var
syntax "final"? : mod_field
declare_syntax_cat ty_params
syntax "<" (ident (ty_super)?),* ">" : ty_params
declare_syntax_cat lit
syntax num : lit
declare_syntax_cat expr
syntax lit : expr
syntax ident : expr
declare_syntax_cat var_decl
syntax ty ident ("=" expr)? ";" : var_decl
declare_syntax_cat field_decl
syntax mod_field var_decl : field_decl
declare_syntax_cat class_body_decl
syntax field_decl : class_body_decl
declare_syntax_cat class_body
syntax "{" class_body_decl* "}" : class_body
declare_syntax_cat param
syntax ty ident : param
declare_syntax_cat params
syntax "(" param,* ")" : params
declare_syntax_cat ty_throws
syntax "throws" ty_ref,+ : ty_throws
declare_syntax_cat local_var_decl
syntax mod_local_var var_decl : local_var_decl
declare_syntax_cat assign
syntax expr "=" expr ";" : assign
declare_syntax_cat stmt
declare_syntax_cat block
syntax "{" stmt* "}" : block
syntax block : stmt
syntax var_decl : stmt
syntax local_var_decl : stmt
syntax assign : stmt
declare_syntax_cat method_body
syntax ";" : method_body
syntax block : method_body
declare_syntax_cat method_decl
syntax mod_method (ty_params)? ty_ret ident params (ty_throws)? method_body : method_decl
declare_syntax_cat ctor_decl
syntax mod_class_method (ty_params)? (ty_ret)? ident params (ty_throws)? method_body : ctor_decl
declare_syntax_cat ty_decl
syntax mod_method "class" ident (ty_params)? (ty_super)? class_body : ty_decl
syntax (pkg_decl)? import_decl* ty_decl* : java_file
syntax field_decl : class_body_decl
syntax ctor_decl : class_body_decl
syntax method_decl : class_body_decl
syntax num : lit
def test := java {
package org.stnreu001.csc2002s;
import java.util.*;
public class HelloWorld<A extends Object implements List<B> > extends Comparator<A>{
public static final int x = 0;
public static void main(String[] args) {
}
public void HelloWorld() throws Exception {
int y = 3;
y = 4;
}
}
}
ohhaimark (Sep 02 2021 at 04:09):
Ah, fixed
declare_syntax_cat ty_no_array
syntax ty_prim : ty_no_array
syntax ty_ref : ty_no_array
declare_syntax_cat ty
syntax ty_no_array ("[" "]")* : ty
ohhaimark (Sep 02 2021 at 04:30):
I don't have to interleave parsing and typechecking at least, but I need to be able to differentiate type names from expression names. Would it be possible to reuse Lean4's existing binders and unification to resolve names in context and propagate information when a name is used like a type?
Mario Carneiro (Sep 02 2021 at 07:26):
Mac said:
Mario Carneiro said:
For a lean 4 parser, I would suggest replacing
">"
in thety_args
andty_params
rules with a custom parser that consumes a>
character and doesn't use tokensThis is actually much more difficult to do than it sounds since parsers to run for syntax categories are selected based on their tokens and their is no way (that I know of to so) to say declare that a parser can eat any token that begins with a '>' . This means that if another parser that consumes the full token exists, the parser will likely choose that first. You may be able to get around this by having your parser having no declared
firstToken
and a very high priority.
You can say that your parser will eat tokens >
, >>
, >>>
Mario Carneiro (Sep 02 2021 at 07:30):
@ohhaimark Where in the grammar do you need to distinguish type names from expression names? I thought java didn't inherit this cursed feature from C++
ohhaimark (Sep 02 2021 at 07:35):
In distinguishing method calls from static method calls.
typenamenotfollowingjavaconventions.staticcall()
localvariableinscope.membercall()
ohhaimark (Sep 02 2021 at 07:38):
The java spec talks about having different levels of knowledge about a name in its grammar. AmbigiousName, TypeName, PackageOrTypeName, etc.
Mario Carneiro (Sep 02 2021 at 07:45):
It seems like you can still parse that without knowing which is which, just have an ambiguous production and determine the difference at typechecking time
Sebastian Ullrich (Sep 02 2021 at 08:12):
Mac said:
Mario Carneiro said:
For a lean 4 parser, I would suggest replacing
">"
in thety_args
andty_params
rules with a custom parser that consumes a>
character and doesn't use tokensThis is actually much more difficult to do than it sounds since parsers to run for syntax categories are selected based on their tokens and their is no way (that I know of to so) to say declare that a parser can eat any token that begins with a '>' .
I don't see the issue, there are no category parsers starting with >
here
Sebastian Ullrich (Sep 02 2021 at 08:41):
@ohhaimark Btw, instead of a single-parser syntax category, you can also use
syntax ty := ty_no_array ("[" "]")*
ohhaimark (Sep 02 2021 at 09:15):
Yay
syntax q_name := ident ("." ident)*
syntax pkg_decl := "package" q_name ";"
syntax import_decl := "import" (ident ".")+ ("*" <|> ident) ";"
declare_syntax_cat ty_ref
declare_syntax_cat ty_args
syntax ident (ty_args)? : ty_ref
syntax "<" (ty_ref),* ">" : ty_args
syntax ty_prim := "int" <|> "long" <|> "float" <|> "double" <|> "boolean"
declare_syntax_cat ty_no_array
syntax ty_prim : ty_no_array
syntax ty_ref : ty_no_array
syntax ty := ty_no_array ("[" "]")*
declare_syntax_cat ty_ret
syntax "void" : ty_ret
syntax ty : ty_ret
syntax q_name (ty_args)? : ty_ref
syntax ty_super := ("extends" ty_ref)? ("implements" ty_ref,*)?
syntax mod_access := "public" <|> "protected" <|> "private"
syntax mod_class_method := (mod_access)? "abstract"? "final"?
syntax mod_static_method := (mod_access)? "static" "final"?
declare_syntax_cat mod_method
syntax mod_class_method : mod_method
syntax mod_static_method : mod_method
syntax mod_field := (mod_access)? "static"? "final"?
syntax mod_local_var := "final"?
syntax ty_params := "<" (ident (ty_super)?),* ">"
declare_syntax_cat lit
syntax num : lit
syntax param := ty ident
syntax params := "(" param,* ")"
declare_syntax_cat expr
syntax args := "(" expr,* ")"
syntax ident_or_invoke := (ty_args)? ident (args)?
syntax dot := ident_or_invoke ("." ident_or_invoke)*
syntax lit : expr
syntax dot : expr
syntax var_decl := ty ident ("=" expr)? ";"
syntax field_decl := mod_field var_decl
declare_syntax_cat class_body_decl
syntax field_decl : class_body_decl
declare_syntax_cat ty_throws
syntax "throws" ty_ref,+ : ty_throws
declare_syntax_cat method_body
declare_syntax_cat ctor_decl
syntax ctor_decl : class_body_decl
syntax mod_class_method (ty_params)? ident params (ty_throws)? method_body : ctor_decl
declare_syntax_cat method_decl
syntax method_decl : class_body_decl
syntax mod_method (ty_params)? ty_ret ident params (ty_throws)? method_body : method_decl
syntax class_body := "{" class_body_decl* "}"
syntax local_var_decl := mod_local_var var_decl
syntax assign := expr "=" expr ";"
declare_syntax_cat stmt
syntax block := "{" stmt* "}"
syntax block : stmt
syntax var_decl : stmt
syntax local_var_decl : stmt
syntax assign : stmt
syntax ";" : method_body
syntax block : method_body
syntax ty_decl := mod_method "class" ident (ty_params)? (ty_super)? class_body
syntax num : lit
syntax java_compilation_unit := (pkg_decl)? import_decl* ty_decl*
syntax "java" "{" java_compilation_unit "}" : term
I should probably start putting this in a Gist or something.
ohhaimark (Sep 02 2021 at 09:16):
def test := java {
package org.stnreu001.csc2002s;
import java.util.*;
public class Test <A extends Object implements List<B> > extends Comparator<A>{
public static final int x = 0;
public static void main(String[] args) {
List<Map<String,Double> > listOfMaps;
}
public void HelloWorld() throws Exception {
final int y = 3;
y = 4;
y.x = f(y.z);
}
}
}
ohhaimark (Sep 02 2021 at 09:28):
Can precedence help a production rule that's left recursive? e.g.
...
syntax ident_or_invoke := (ty_args)? ident (args)?
declare_syntax_cat expr
syntax dot := expr "." ident_or_invoke
syntax dot : expr
syntax ident_or_invoke : expr
Sebastian Ullrich (Sep 02 2021 at 10:17):
No. A syntax rule must be syntactically left-recursive to be handled correctly.
syntax expr "." ident_or_invoke : expr
ohhaimark (Sep 02 2021 at 10:46):
Can macro_rules for translating to AST give you pretty printing?
Sebastian Ullrich (Sep 02 2021 at 11:22):
Not directly, if I understood your question correctly, but the parser itself can contain hints for Lean's built-in pretty printer https://github.com/leanprover/lean4/blob/aba0a479ec221ad2e31de09c6e71ce82a705ec38/src/Lean/Parser/Extra.lean#L63-L78
https://github.com/leanprover/lean4/blob/aba0a479ec221ad2e31de09c6e71ce82a705ec38/src/Init/Notation.lean#L143
ohhaimark (Sep 02 2021 at 11:56):
Whats happening here
syntax "lolol" : term
inductive testlol := | lol
macro_rules
| `(lolol) => ``(testlol.lol)
def yee : testlol := lolol
#eval yee
failed to synthesize
Lean.MetaEval testlol
ohhaimark (Sep 02 2021 at 11:58):
Oh lol, has nothing to do with the macro stuff.
ohhaimark (Sep 02 2021 at 11:58):
inductive testlol := | lol
#eval (testlol.lol)
ohhaimark (Sep 02 2021 at 12:07):
Also with https://github.com/leanprover/lean4/blob/aba0a479ec221ad2e31de09c6e71ce82a705ec38/src/Init/Notation.lean#L410-L414, I see that the trivial tactic can macro expand into various other tactics. Does that mean the macro expander is backtracking?
Mario Carneiro (Sep 02 2021 at 13:18):
Sebastian Ullrich said:
ohhaimark Btw, instead of a single-parser syntax category, you can also use
syntax ty := ty_no_array ("[" "]")*
I thought you said that ("a" "b")*
was bad practice?
ohhaimark (Sep 02 2021 at 14:56):
@Mario Carneiro Why is there no MetaEval instance for the inductive definitions I make?
Mario Carneiro (Sep 02 2021 at 14:56):
because one hasn't been defined
Mario Carneiro (Sep 02 2021 at 14:57):
What is it you want to get out of evaluating the expression?
Mario Carneiro (Sep 02 2021 at 14:57):
If you want to print it then use deriving Repr
and then #eval repr testlol.lol
ohhaimark (Sep 02 2021 at 14:58):
Is there a deriving mechanism for it?
I expect inductive definitions to evaluate to themselves.
Oh, so it's also the printed output?
Can you select arbitrary reductions to be used?
Mario Carneiro (Sep 02 2021 at 14:59):
You can evaluate expressions of type testlol
, but let _ := testlol.lol; ()
isn't very useful, you presumably want to print it at the end or something
ohhaimark (Sep 02 2021 at 15:00):
So it's just defining how to pretty print?
Mario Carneiro (Sep 02 2021 at 15:00):
and for that you need deriving Repr
or ToString
(I forget if that has a deriving instance)
Mario Carneiro (Sep 02 2021 at 15:00):
yes
Mario Carneiro (Sep 02 2021 at 15:00):
You can define a printer yourself too
Mario Carneiro (Sep 02 2021 at 15:01):
Or just use the inductive as an intermediate type and construct some base type at the end like strings or tuples or something
ohhaimark (Sep 02 2021 at 15:04):
Can you have deriving for mutual inductive types?
ohhaimark (Sep 02 2021 at 15:06):
Nvm, I just didn't put enough deriving Reprs.
ohhaimark (Sep 02 2021 at 15:19):
How would I write a syntax extension that takes
def Flags := List Flag
deriving Repr Flags
and produces something like
def Flags := List Flag
instance : Repr Flags := inferInstanceAs (Repr (List Flag))
or find something which doesn't require something like this in the first place.
Mario Carneiro (Sep 03 2021 at 02:27):
cc: @Sebastian Ullrich , can we get this in lean core? Mathlib wants this too
Mac (Sep 03 2021 at 03:38):
@Mario Carneiro Isn't tis the same problem as discussed in the previous inferInstance
Zulip thread where @Sebastian Ullrich said that "Ah, well, the current API doesn't quite allow for that yet".
Mario Carneiro (Sep 03 2021 at 03:38):
yes, this is a syntax extension
Mario Carneiro (Sep 03 2021 at 03:39):
The API issue might have to do with needing to use terms instead of class names
Mario Carneiro (Sep 03 2021 at 03:40):
but for this basic example it could just be
def Flags := List Flag
deriving Repr
Leonardo de Moura (Sep 03 2021 at 18:37):
@Mario Carneiro I added basic support for deriving
at def
. We can now write
def Boo := List (String × String)
deriving BEq, Repr, DecidableEq
def M := ReaderT String (StateT Nat IO)
deriving Monad, MonadState, MonadReader
It is very simple for now. It only supports
- Type "aliases".
- Classes with a prefix of outParams followed by one parameter.
Mario Carneiro (Sep 03 2021 at 23:49):
@Leonardo de Moura Thanks!
Alcides Fonseca (Sep 08 2021 at 12:59):
A completely different take is to use an existing Java parser (https://javaparser.org or https://spoon.gforge.inria.fr) and create a tool that translates the parsed instances of the Java AST to a similar Java AST in Lean.
Last updated: Dec 20 2023 at 11:08 UTC