Zulip Chat Archive
Stream: general
Topic: parsing strings into inductive types
Kayla Thomas (Aug 18 2023 at 05:34):
I am looking to parse strings representing formulas in first order logic into an inductive type. I found a couple of libraries related to parsing, Lean.Data.Parsec
and Lean.Parser.Basic
. It seems that the latter may be designed for the syntax of Lean itself and therefore may not be what I am looking for? Is there information on how to use Lean.Data.Parsec
? Are there others? It would be nice to eventually have a proof that the parsing is unique, that is, if two strings parse to the same instance of the inductive type, then the strings are identical, so I don't know if that means that I should try to write something custom? Is there a specific algorithm that would give the best running time for this grammar? Any advice is welcome.
This is the grammar:
space = ' '
left_paren = '('
right_paren = ')'
name = ('a' | ... | 'z' | 'A' | ... | 'Z' | '0' | ... | '9' | '_' | ''')+
formula_ = pred | eq | true | not | imp | forall
pred = name (left_paren name (space name)* right_paren)?
eq = left_paren name space 'eq' space name right_paren
true = 'true'
not = left_paren 'not' space formula right_paren
imp = left_paren formula space 'imp' space formula right_paren
forall = left_paren 'forall' space name space formula right_paren
This is the inductive type:
inductive Formula : Type
| pred_ : String → List String → Formula
| eq_ : String → String → Formula
| true_ : Formula
| not_ : Formula → Formula
| imp_ : Formula → Formula → Formula
| forall_ : String → Formula → Formula
deriving Inhabited, DecidableEq
Andrés Goens (Aug 18 2023 at 07:35):
Hey @Kayla Thomas! Probably the easiest way to use those libraries and parse into an inductive type like that is to use them in Lean directly as syntax extensions and macros. A great resource there is the metaprogramming book. You can also call the parsing functions directly but I don't know that that's very documented, although the code is fairly readable, so you'd be more on your own there. Also, @Mac Malone also has a tool that would allow you to do that from Lean syntax extensions: You can see this thread. I haven't played around with it yet so I can't comment how easy that would be (I really want to find some time soon to do so!) but it might be worth a try for your use case!
Kayla Thomas (Aug 18 2023 at 16:28):
@Andrés Goens Thank you!
Mac Malone (Aug 18 2023 at 16:33):
@Kayla Thomas / @Andrés Goens Unfortunately, I am not sure my tool is sufficiently customizable at the moment to generate an inductive from Lean syntax. However, I have been looking for more potential use cases to adapt the tool to, so I would be happy to take a look at supporting this one! I can probably spend some of this weekend and this coming Monday trying this out.
Kayla Thomas (Aug 18 2023 at 16:46):
Thank you. I'm not certain I fully understand everything mentioned here, so I will expand the details of my use case, and ask if it still fits. I am working on writing a simple proof checker in Lean that will read in text files containing formulas in this syntax as part of a list of proof steps. The hope is to compile the Lean code and run it as a standalone executable.
Mac Malone (Aug 18 2023 at 16:50):
@Kayla Thomas One factor in what is the best solution here is how closely you wish to follow this grammar. For instance, does name
need to follow that pattern or would a Lean ident
be sufficient? if you are willing, to adjust the grammar a little, it could be easily set up as a Lean DSL. For example,
declare_syntax_cat formula
syntax ident ("(" ident+ ")")? : formula
syntax "(" ident " eq " ident ")" : formula
syntax "true" : formula
syntax "(" "not " formula ")" : formula
syntax "(" formula " imp " formula ")" : formula
syntax "(" "forall " ident formula ")" : formula
Mac Malone (Aug 18 2023 at 16:51):
Sorry, posted that before reading your additional context.
Mac Malone (Aug 18 2023 at 16:51):
In that case, a grammar like the one above plus Partax should be able to help.
Mac Malone (Aug 18 2023 at 16:56):
@Kayla Thomas Are you planning to support Windows with your standalone executable? If so, you will need a fix in Lean, because currently Lean executable do run outside a MSYS2 environment on Windows. My apologies, I confused a Lean built from the source with the release Lean.
Kayla Thomas (Aug 18 2023 at 16:59):
No, I don't think so, at least I'm not worried about it at this point.
Kayla Thomas (Aug 18 2023 at 17:00):
I am guessing that my name
grammar is probably a subset of a Lean ident
?
Andrés Goens (Aug 18 2023 at 17:02):
Kayla Thomas said:
I am guessing that my
name
grammar is probably a subset of a Leanident
?
not really, for example 0_1
would be a name
in your grammar above, and it's not a Lean ident
(but maybe you don't want it to be either)?
Kayla Thomas (Aug 18 2023 at 17:02):
I would be alright with disallowing it.
Andrés Goens (Aug 18 2023 at 17:03):
for what it's worth, I at least would find that use case very useful @Mac Malone! I probably have some time to look into it next week as well, would be happy to do so
Mac Malone (Aug 18 2023 at 17:05):
Kayla Thomas said:
No, I don't think so, at least I'm not worried about it at this point.
In that case, the second question is how much do you can about whitespace? You mentioned that you want the parsing to be unique. So I guess you probably want to be strict about that?
Kayla Thomas (Aug 18 2023 at 17:10):
I am on the fence about that. If it is not too difficult to have the option of being strict about white space, that would be ideal maybe.
Kayla Thomas (Aug 19 2023 at 04:36):
@Mac Malone Does being strict about the amount of whitespace make it harder? Where does it break down? Does the Lean syntax not support that?
Mac Malone (Aug 19 2023 at 19:15):
@Kayla Thomas Lean does support it, it just requires more annotations. For example:
declare_syntax_cat formula
syntax ident ("(" many1WsIdent ")")? : formula
syntax "(" noWs ident ws " eq " ws ident noWs ")" : formula
syntax "true" : formula
syntax "(" noWs "not " ws formula noWs ")" : formula
syntax "(" noWs formula ws " imp " ws formula noWs ")" : formula
syntax "(" noWs "forall " ws ident ws formula noWs ")" : formula
where many1WsIdent
is a custom parser:
import Lean.Parser.Extra
open Lean.Parser in
def many1Ws (p : Parser) : Parser := sepBy1 p " " checkWsBefore
def many1WsIdent := many1Ws Lean.Parser.ident
Kayla Thomas (Aug 19 2023 at 19:21):
Cool. Is there a "one or more" option? That would be preferable in place of the many1Space(ident)
.
Kayla Thomas (Aug 19 2023 at 19:22):
Or is that what that is?
Mac Malone (Aug 19 2023 at 19:27):
@Kayla Thomas I updated to explain that that is a custom parser. Still, even this formulation isn't strict on the amount of whitespace. If you want exactly 1 space it requires another custom parser.
Kayla Thomas (Aug 19 2023 at 19:28):
Thank you.
Kayla Thomas (Aug 19 2023 at 19:30):
This looks promising.
Kayla Thomas (Aug 19 2023 at 19:32):
Are ws
and noWs
custom too?
Mac Malone (Aug 19 2023 at 19:34):
@Kayla Thomas no they are builtin, but ws
accepts any amount of whitespace.
Kayla Thomas (Aug 19 2023 at 19:39):
Hmm. I can't find them in the online api doc, but if I click on them in Visual Code they take me to checkWsBefore
and checkNoWsBefore
, but I'm not sure where the abbreviation or notation is defined.
Mac Malone (Aug 19 2023 at 19:47):
@Kayla Thomas it is defined through a register_parser_alias
in Lean.Parser
Mac Malone (Aug 19 2023 at 19:48):
@Kayla Thomas Here is the full example with only 1 space accepted:
import Lean.Parser.Extra
section
open Lean Parser PrettyPrinter
def checkTailSpace (prev : Syntax) : Bool :=
match prev.getTailInfo with
| .original _ _ trailing _ => trailing == " ".toSubstring
| _ => false
def checkSpaceBeforeFn (errorMsg : String) : ParserFn := fun _ s =>
let prev := s.stxStack.back
if checkTailSpace prev then s else s.mkError errorMsg
def checkSpaceBefore (errorMsg : String := "space before") : Parser := {
info := epsilonInfo
fn := checkSpaceBeforeFn errorMsg
}
@[combinator_formatter checkSpaceBefore] def checkSpaceBefore.formatter : Formatter := pure ()
@[combinator_parenthesizer checkSpaceBefore] def checkSpaceBefore.parenthesizer : Parenthesizer := pure ()
def space := checkSpaceBefore
def many1Space (p : Parser) : Parser := sepBy1 p " " space
def many1SpaceIdent := many1Space Lean.Parser.ident
end
declare_syntax_cat formula
syntax ident ("(" many1SpaceIdent ")")? : formula
syntax "(" noWs ident space " eq " space ident noWs ")" : formula
syntax "true" : formula
syntax "(" noWs "not " space formula noWs ")" : formula
syntax "(" noWs formula space " imp " space formula noWs ")" : formula
syntax "(" noWs "forall " space ident space formula noWs ")" : formula
Kayla Thomas (Aug 19 2023 at 19:53):
@Mac Malone Thank you! I think I will need to go through the metaprogramming book to understand it.
Mac Malone (Aug 19 2023 at 20:03):
@Kayla Thomas And here is a final versions that produces Formula
:
import Lean.Parser.Extra
section
open Lean Parser PrettyPrinter
def checkTailSpace (prev : Syntax) : Bool :=
match prev.getTailInfo with
| .original _ _ trailing _ => trailing == " ".toSubstring
| _ => false
def checkSpaceBeforeFn (errorMsg : String) : ParserFn := fun _ s =>
let prev := s.stxStack.back
if checkTailSpace prev then s else s.mkError errorMsg
def checkSpaceBefore (errorMsg : String := "space before") : Parser := {
info := epsilonInfo
fn := checkSpaceBeforeFn errorMsg
}
@[combinator_formatter checkSpaceBefore] def checkSpaceBefore.formatter : Formatter := pure ()
@[combinator_parenthesizer checkSpaceBefore] def checkSpaceBefore.parenthesizer : Parenthesizer := pure ()
def space := checkSpaceBefore
def many1Space (p : Parser) : Parser := sepBy1 p " " space
def many1SpaceIdent := many1Space ident
end
declare_syntax_cat formula
syntax ident ("(" many1SpaceIdent ")")? : formula
syntax "(" noWs ident space " eq " space ident noWs ")" : formula
syntax "true" : formula
syntax "(" noWs "not " space formula noWs ")" : formula
syntax "(" noWs formula space " imp " space formula noWs ")" : formula
syntax "(" noWs "forall " space ident space formula noWs ")" : formula
open Lean (Name)
inductive Formula : Type
| pred : Name → List Name → Formula
| eq : Name → Name → Formula
| true : Formula
| not : Formula → Formula
| imp : Formula → Formula → Formula
| forall : Name → Formula → Formula
deriving Inhabited, DecidableEq, Repr
syntax "formula% " formula : term
open Lean in macro_rules | `(formula% $x) => match x with
| `(formula| $f:ident($xs*)) =>
let xs : Array (TSyntax `term) := xs.elemsAndSeps.map (quote ·.getId)
``(Formula.pred $(quote f.getId) [$xs,*])
| `(formula| ($a eq $b)) => ``(Formula.eq $(quote a.getId) $(quote b.getId))
| `(formula| true) => ``(Formula.true)
| `(formula| (not $a)) => ``(Formula.not (formula% $a))
| `(formula| ($a imp $b)) => ``(Formula.imp (formula% $a) (formula% $b))
| `(formula| (forall $a $x)) => ``(Formula.forall $(quote a.getId) (formula% $x))
| _ => Macro.throwUnsupported
#eval formula% (forall x f(a b c)) -- #eval formula% (forall x f(a b c)) -- Formula.forall `x (Formula.pred `f [`a, `b, `c])
Kayla Thomas (Aug 19 2023 at 20:10):
@Mac Malone Wow! Thank you!
Andrés Goens (Aug 21 2023 at 18:40):
an issue with that is that it works within Lean (with this formal%
syntax) but you wouldn't be able to simply call this on a file you read from IO (unless you Link with Lean and build this into the environment), or maybe use Partax? @Mac Malone would Partax allow @Kayla Thomas to parse this from file? do these custom parsers make it more complicated?
Mac Malone (Aug 21 2023 at 21:43):
@Andrés Goens Yes, that is the use case Partax was originally designed for. Here is the full example using Partax:
import Partax
import Lean.Parser.Extra
open Lean
section
open Parser PrettyPrinter
def checkTailSpace (prev : Syntax) : Bool :=
match prev.getTailInfo with
| .original _ _ trailing _ => trailing == " ".toSubstring
| _ => false
def checkSpaceBeforeFn (errorMsg : String) : ParserFn := fun _ s =>
let prev := s.stxStack.back
if checkTailSpace prev then s else s.mkError errorMsg
def checkSpaceBefore (errorMsg : String := "space before") : Parser := {
info := epsilonInfo
fn := checkSpaceBeforeFn errorMsg
}
@[combinator_formatter checkSpaceBefore]
def checkSpaceBefore.formatter : Formatter := pure ()
@[combinator_parenthesizer checkSpaceBefore]
def checkSpaceBefore.parenthesizer : Parenthesizer := pure ()
def space := checkSpaceBefore
def many1Space (p : Parser) : Parser := sepBy1 p " " space
def many1SpaceIdent := many1Space ident
end
declare_syntax_cat formula
syntax ident ("(" many1SpaceIdent ")")? : formula
syntax "(" noWs ident space " eq " space ident noWs ")" : formula
syntax "true" : formula
syntax "(" noWs "not " space formula noWs ")" : formula
syntax "(" noWs formula space " imp " space formula noWs ")" : formula
syntax "(" noWs "forall " space ident space formula noWs ")" : formula
inductive Formula : Type
| pred : Name → List Name → Formula
| eq : Name → Name → Formula
| true : Formula
| not : Formula → Formula
| imp : Formula → Formula → Formula
| forall : Name → Formula → Formula
deriving Inhabited, DecidableEq, Repr
syntax "formula% " formula : term
open Lean in macro_rules | `(formula% $x) => match x with
| `(formula| $f:ident($xs*)) =>
let xs : Array (TSyntax `term) := xs.elemsAndSeps.map (quote ·.getId)
``(Formula.pred $(quote f.getId) [$xs,*])
| `(formula| ($a eq $b)) => ``(Formula.eq $(quote a.getId) $(quote b.getId))
| `(formula| true) => ``(Formula.true)
| `(formula| (not $a)) => ``(Formula.not (formula% $a))
| `(formula| ($a imp $b)) => ``(Formula.imp (formula% $a) (formula% $b))
| `(formula| (forall $a $x)) => ``(Formula.forall $(quote a.getId) (formula% $x))
| _ => Macro.throwUnsupported
#eval formula% (forall x f(a b c))
-- Formula.forall `x (Formula.pred `f [`a, `b, `c])
namespace Partax.LParse
def checkSpaceBefore (errorMsg : String := "space before") : LParseM Unit := do
unless (← getIgnoredBefore) == " ".toSubstring do LParse.error errorMsg
partial def many1Space (p : LParseM Syntax) : LParseM Syntax := do
go #[]
where
go args := do
let args := args.push (← p)
if (← getIgnoredBefore) == " ".toSubstring then
go args
else
return Lean.mkNullNode args
end Partax.LParse
open Partax in
compile_parser_category formula with {CompileConfig.lParse with
parserAliases := CompileConfig.lParse.parserAliases
|>.insert ``checkSpaceBefore ``LParse.checkSpaceBefore
|>.insert ``many1Space ``LParse.many1Space
}
partial def Formula.ofSyntax? : Syntax → Except String Formula
| `(formula| $f:ident($xs*)) =>
let xs := xs.elemsAndSeps.map (·.getId)
return Formula.pred f.getId xs.toList
| `(formula| ($a eq $b)) =>
return Formula.eq a.getId b.getId
| `(formula| true) =>
return Formula.true
| `(formula| (not $a)) =>
return Formula.not (← ofSyntax? a)
| `(formula| ($a imp $b)) =>
return Formula.imp (← ofSyntax? a) (← ofSyntax? b)
| `(formula| (forall $a $x)) =>
return Formula.forall a.getId (← ofSyntax? x)
| _ =>
throw "ill-formed syntax"
#eval formula.run' "(forall x f(a b c))" >>= (Formula.ofSyntax? ·)
-- Except.ok (Formula.forall `x (Formula.pred `f [`a, `b, `c]))
Mac Malone (Aug 21 2023 at 21:45):
However, if you just wish to parse strings outside a Lean metaprogramming context, it might just be better to write the parser yourself using the raw parser combinators provided by something like Partax.
Mac Malone (Aug 21 2023 at 21:49):
This is especially true when you are already using custom parsers in the syntax itself. One of the future plans for Partax is to provide a new syntax
-like command that has better support for defining custom tokens/parsers, so that you can still use the pretty sugar, but that does not exist yet.
Last updated: Dec 20 2023 at 11:08 UTC