Zulip Chat Archive

Stream: general

Topic: parsing strings into inductive types


Kayla Thomas (Aug 18 2023 at 05:34):

I am looking to parse strings representing formulas in first order logic into an inductive type. I found a couple of libraries related to parsing, Lean.Data.Parsec and Lean.Parser.Basic. It seems that the latter may be designed for the syntax of Lean itself and therefore may not be what I am looking for? Is there information on how to use Lean.Data.Parsec? Are there others? It would be nice to eventually have a proof that the parsing is unique, that is, if two strings parse to the same instance of the inductive type, then the strings are identical, so I don't know if that means that I should try to write something custom? Is there a specific algorithm that would give the best running time for this grammar? Any advice is welcome.

This is the grammar:

space = ' '
left_paren = '('
right_paren = ')'
name = ('a' | ... | 'z' | 'A' | ... | 'Z' | '0' | ... | '9' | '_' | ''')+
formula_ = pred | eq | true | not | imp | forall
pred = name (left_paren name (space name)* right_paren)?
eq = left_paren name space 'eq' space name right_paren
true = 'true'
not = left_paren 'not' space formula right_paren
imp = left_paren formula space 'imp' space formula right_paren
forall = left_paren 'forall' space name space formula right_paren

This is the inductive type:

inductive Formula : Type
  | pred_ : String  List String  Formula
  | eq_ : String  String  Formula
  | true_ : Formula
  | not_ : Formula  Formula
  | imp_ : Formula  Formula  Formula
  | forall_ : String  Formula  Formula
  deriving Inhabited, DecidableEq

Andrés Goens (Aug 18 2023 at 07:35):

Hey @Kayla Thomas! Probably the easiest way to use those libraries and parse into an inductive type like that is to use them in Lean directly as syntax extensions and macros. A great resource there is the metaprogramming book. You can also call the parsing functions directly but I don't know that that's very documented, although the code is fairly readable, so you'd be more on your own there. Also, @Mac Malone also has a tool that would allow you to do that from Lean syntax extensions: You can see this thread. I haven't played around with it yet so I can't comment how easy that would be (I really want to find some time soon to do so!) but it might be worth a try for your use case!

Kayla Thomas (Aug 18 2023 at 16:28):

@Andrés Goens Thank you!

Mac Malone (Aug 18 2023 at 16:33):

@Kayla Thomas / @Andrés Goens Unfortunately, I am not sure my tool is sufficiently customizable at the moment to generate an inductive from Lean syntax. However, I have been looking for more potential use cases to adapt the tool to, so I would be happy to take a look at supporting this one! I can probably spend some of this weekend and this coming Monday trying this out.

Kayla Thomas (Aug 18 2023 at 16:46):

Thank you. I'm not certain I fully understand everything mentioned here, so I will expand the details of my use case, and ask if it still fits. I am working on writing a simple proof checker in Lean that will read in text files containing formulas in this syntax as part of a list of proof steps. The hope is to compile the Lean code and run it as a standalone executable.

Mac Malone (Aug 18 2023 at 16:50):

@Kayla Thomas One factor in what is the best solution here is how closely you wish to follow this grammar. For instance, does name need to follow that pattern or would a Lean ident be sufficient? if you are willing, to adjust the grammar a little, it could be easily set up as a Lean DSL. For example,

declare_syntax_cat formula
syntax ident ("(" ident+ ")")? : formula
syntax "(" ident " eq " ident ")" : formula
syntax "true" : formula
syntax "(" "not " formula ")" : formula
syntax "(" formula " imp " formula  ")" : formula
syntax "(" "forall " ident formula  ")" : formula

Mac Malone (Aug 18 2023 at 16:51):

Sorry, posted that before reading your additional context.

Mac Malone (Aug 18 2023 at 16:51):

In that case, a grammar like the one above plus Partax should be able to help.

Mac Malone (Aug 18 2023 at 16:56):

@Kayla Thomas Are you planning to support Windows with your standalone executable? If so, you will need a fix in Lean, because currently Lean executable do run outside a MSYS2 environment on Windows. My apologies, I confused a Lean built from the source with the release Lean.

Kayla Thomas (Aug 18 2023 at 16:59):

No, I don't think so, at least I'm not worried about it at this point.

Kayla Thomas (Aug 18 2023 at 17:00):

I am guessing that my name grammar is probably a subset of a Lean ident?

Andrés Goens (Aug 18 2023 at 17:02):

Kayla Thomas said:

I am guessing that my name grammar is probably a subset of a Lean ident?

not really, for example 0_1 would be a name in your grammar above, and it's not a Lean ident (but maybe you don't want it to be either)?

Kayla Thomas (Aug 18 2023 at 17:02):

I would be alright with disallowing it.

Andrés Goens (Aug 18 2023 at 17:03):

for what it's worth, I at least would find that use case very useful @Mac Malone! I probably have some time to look into it next week as well, would be happy to do so

Mac Malone (Aug 18 2023 at 17:05):

Kayla Thomas said:

No, I don't think so, at least I'm not worried about it at this point.

In that case, the second question is how much do you can about whitespace? You mentioned that you want the parsing to be unique. So I guess you probably want to be strict about that?

Kayla Thomas (Aug 18 2023 at 17:10):

I am on the fence about that. If it is not too difficult to have the option of being strict about white space, that would be ideal maybe.

Kayla Thomas (Aug 19 2023 at 04:36):

@Mac Malone Does being strict about the amount of whitespace make it harder? Where does it break down? Does the Lean syntax not support that?

Mac Malone (Aug 19 2023 at 19:15):

@Kayla Thomas Lean does support it, it just requires more annotations. For example:

declare_syntax_cat formula
syntax ident ("(" many1WsIdent ")")? : formula
syntax "(" noWs ident ws  " eq " ws ident noWs ")" : formula
syntax "true" : formula
syntax "(" noWs "not " ws formula noWs ")" : formula
syntax "(" noWs formula ws " imp " ws formula noWs  ")" : formula
syntax "(" noWs "forall " ws ident ws formula  noWs ")" : formula

where many1WsIdent is a custom parser:

import Lean.Parser.Extra

open Lean.Parser in
def many1Ws (p : Parser) : Parser := sepBy1 p " " checkWsBefore
def many1WsIdent := many1Ws Lean.Parser.ident

Kayla Thomas (Aug 19 2023 at 19:21):

Cool. Is there a "one or more" option? That would be preferable in place of the many1Space(ident).

Kayla Thomas (Aug 19 2023 at 19:22):

Or is that what that is?

Mac Malone (Aug 19 2023 at 19:27):

@Kayla Thomas I updated to explain that that is a custom parser. Still, even this formulation isn't strict on the amount of whitespace. If you want exactly 1 space it requires another custom parser.

Kayla Thomas (Aug 19 2023 at 19:28):

Thank you.

Kayla Thomas (Aug 19 2023 at 19:30):

This looks promising.

Kayla Thomas (Aug 19 2023 at 19:32):

Are ws and noWs custom too?

Mac Malone (Aug 19 2023 at 19:34):

@Kayla Thomas no they are builtin, but ws accepts any amount of whitespace.

Kayla Thomas (Aug 19 2023 at 19:39):

Hmm. I can't find them in the online api doc, but if I click on them in Visual Code they take me to checkWsBefore and checkNoWsBefore, but I'm not sure where the abbreviation or notation is defined.

Mac Malone (Aug 19 2023 at 19:47):

@Kayla Thomas it is defined through a register_parser_alias in Lean.Parser

Mac Malone (Aug 19 2023 at 19:48):

@Kayla Thomas Here is the full example with only 1 space accepted:

import Lean.Parser.Extra

section
open Lean Parser PrettyPrinter

def checkTailSpace (prev : Syntax) : Bool :=
  match prev.getTailInfo with
  | .original _ _ trailing _ => trailing == " ".toSubstring
  | _                        => false

def checkSpaceBeforeFn (errorMsg : String) : ParserFn := fun _ s =>
  let prev := s.stxStack.back
  if checkTailSpace prev then s else s.mkError errorMsg

def checkSpaceBefore (errorMsg : String := "space before") : Parser := {
  info := epsilonInfo
  fn   := checkSpaceBeforeFn errorMsg
}

@[combinator_formatter checkSpaceBefore] def checkSpaceBefore.formatter : Formatter := pure ()
@[combinator_parenthesizer checkSpaceBefore] def checkSpaceBefore.parenthesizer : Parenthesizer := pure ()

def space := checkSpaceBefore

def many1Space (p : Parser) : Parser := sepBy1 p " " space
def many1SpaceIdent := many1Space Lean.Parser.ident
end

declare_syntax_cat formula
syntax ident ("(" many1SpaceIdent ")")? : formula
syntax "(" noWs ident space  " eq " space ident noWs ")" : formula
syntax "true" : formula
syntax "(" noWs "not " space formula noWs ")" : formula
syntax "(" noWs formula space " imp " space formula noWs  ")" : formula
syntax "(" noWs "forall " space ident space formula  noWs ")" : formula

Kayla Thomas (Aug 19 2023 at 19:53):

@Mac Malone Thank you! I think I will need to go through the metaprogramming book to understand it.

Mac Malone (Aug 19 2023 at 20:03):

@Kayla Thomas And here is a final versions that produces Formula:

import Lean.Parser.Extra

section
open Lean Parser PrettyPrinter

def checkTailSpace (prev : Syntax) : Bool :=
  match prev.getTailInfo with
  | .original _ _ trailing _ => trailing == " ".toSubstring
  | _                        => false

def checkSpaceBeforeFn (errorMsg : String) : ParserFn := fun _ s =>
  let prev := s.stxStack.back
  if checkTailSpace prev then s else s.mkError errorMsg

def checkSpaceBefore (errorMsg : String := "space before") : Parser := {
  info := epsilonInfo
  fn   := checkSpaceBeforeFn errorMsg
}

@[combinator_formatter checkSpaceBefore] def checkSpaceBefore.formatter : Formatter := pure ()
@[combinator_parenthesizer checkSpaceBefore] def checkSpaceBefore.parenthesizer : Parenthesizer := pure ()

def space := checkSpaceBefore

def many1Space (p : Parser) : Parser := sepBy1 p " " space
def many1SpaceIdent := many1Space ident
end

declare_syntax_cat formula
syntax ident ("(" many1SpaceIdent ")")? : formula
syntax "(" noWs ident space  " eq " space ident noWs ")" : formula
syntax "true" : formula
syntax "(" noWs "not " space formula noWs ")" : formula
syntax "(" noWs formula space " imp " space formula noWs  ")" : formula
syntax "(" noWs "forall " space ident space formula  noWs ")" : formula

open Lean (Name)

inductive Formula : Type
| pred : Name  List Name  Formula
| eq : Name  Name  Formula
| true : Formula
| not : Formula  Formula
| imp : Formula  Formula  Formula
| forall : Name  Formula  Formula
deriving Inhabited, DecidableEq, Repr

syntax "formula% " formula  : term
open Lean in macro_rules  | `(formula% $x) => match x with
  | `(formula| $f:ident($xs*)) =>
    let xs : Array (TSyntax `term) := xs.elemsAndSeps.map (quote ·.getId)
    ``(Formula.pred $(quote f.getId) [$xs,*])
  | `(formula| ($a eq $b)) => ``(Formula.eq $(quote a.getId) $(quote b.getId))
  | `(formula| true) => ``(Formula.true)
  | `(formula| (not $a)) => ``(Formula.not (formula% $a))
  | `(formula| ($a imp $b)) => ``(Formula.imp (formula% $a) (formula% $b))
  | `(formula| (forall $a $x)) => ``(Formula.forall $(quote a.getId) (formula% $x))
  | _ => Macro.throwUnsupported

#eval formula% (forall x f(a b c)) -- #eval formula% (forall x f(a b c)) -- Formula.forall `x (Formula.pred `f [`a, `b, `c])

Kayla Thomas (Aug 19 2023 at 20:10):

@Mac Malone Wow! Thank you!

Andrés Goens (Aug 21 2023 at 18:40):

an issue with that is that it works within Lean (with this formal% syntax) but you wouldn't be able to simply call this on a file you read from IO (unless you Link with Lean and build this into the environment), or maybe use Partax? @Mac Malone would Partax allow @Kayla Thomas to parse this from file? do these custom parsers make it more complicated?

Mac Malone (Aug 21 2023 at 21:43):

@Andrés Goens Yes, that is the use case Partax was originally designed for. Here is the full example using Partax:

import Partax
import Lean.Parser.Extra

open Lean

section
open Parser PrettyPrinter

def checkTailSpace (prev : Syntax) : Bool :=
  match prev.getTailInfo with
  | .original _ _ trailing _ => trailing == " ".toSubstring
  | _                        => false

def checkSpaceBeforeFn (errorMsg : String) : ParserFn := fun _ s =>
  let prev := s.stxStack.back
  if checkTailSpace prev then s else s.mkError errorMsg

def checkSpaceBefore (errorMsg : String := "space before") : Parser := {
  info := epsilonInfo
  fn   := checkSpaceBeforeFn errorMsg
}

@[combinator_formatter checkSpaceBefore]
def checkSpaceBefore.formatter : Formatter := pure ()

@[combinator_parenthesizer checkSpaceBefore]
def checkSpaceBefore.parenthesizer : Parenthesizer := pure ()

def space := checkSpaceBefore

def many1Space (p : Parser) : Parser := sepBy1 p " " space
def many1SpaceIdent := many1Space ident
end

declare_syntax_cat formula
syntax ident ("(" many1SpaceIdent ")")? : formula
syntax "(" noWs ident space  " eq " space ident noWs ")" : formula
syntax "true" : formula
syntax "(" noWs "not " space formula noWs ")" : formula
syntax "(" noWs formula space " imp " space formula noWs  ")" : formula
syntax "(" noWs "forall " space ident space formula  noWs ")" : formula

inductive Formula : Type
| pred : Name  List Name  Formula
| eq : Name  Name  Formula
| true : Formula
| not : Formula  Formula
| imp : Formula  Formula  Formula
| forall : Name  Formula  Formula
deriving Inhabited, DecidableEq, Repr

syntax "formula% " formula  : term
open Lean in macro_rules  | `(formula% $x) => match x with
  | `(formula| $f:ident($xs*)) =>
    let xs : Array (TSyntax `term) := xs.elemsAndSeps.map (quote ·.getId)
    ``(Formula.pred $(quote f.getId) [$xs,*])
  | `(formula| ($a eq $b)) => ``(Formula.eq $(quote a.getId) $(quote b.getId))
  | `(formula| true) => ``(Formula.true)
  | `(formula| (not $a)) => ``(Formula.not (formula% $a))
  | `(formula| ($a imp $b)) => ``(Formula.imp (formula% $a) (formula% $b))
  | `(formula| (forall $a $x)) => ``(Formula.forall $(quote a.getId) (formula% $x))
  | _ => Macro.throwUnsupported

#eval formula% (forall x f(a b c))
  -- Formula.forall `x (Formula.pred `f [`a, `b, `c])

namespace Partax.LParse

def checkSpaceBefore (errorMsg : String := "space before") : LParseM Unit := do
  unless ( getIgnoredBefore) == " ".toSubstring do LParse.error errorMsg

partial def many1Space (p : LParseM Syntax) : LParseM Syntax := do
  go #[]
where
  go args := do
    let args := args.push ( p)
    if ( getIgnoredBefore) == " ".toSubstring then
      go args
    else
      return Lean.mkNullNode args

end Partax.LParse

open Partax in
compile_parser_category formula with {CompileConfig.lParse with
  parserAliases := CompileConfig.lParse.parserAliases
    |>.insert ``checkSpaceBefore ``LParse.checkSpaceBefore
    |>.insert ``many1Space ``LParse.many1Space
}

partial def Formula.ofSyntax? : Syntax  Except String Formula
| `(formula| $f:ident($xs*)) =>
  let xs := xs.elemsAndSeps.map (·.getId)
  return Formula.pred f.getId xs.toList
| `(formula| ($a eq $b)) =>
  return Formula.eq a.getId b.getId
| `(formula| true) =>
  return Formula.true
| `(formula| (not $a)) =>
  return Formula.not ( ofSyntax? a)
| `(formula| ($a imp $b)) =>
  return Formula.imp ( ofSyntax? a) ( ofSyntax? b)
| `(formula| (forall $a $x)) =>
  return Formula.forall a.getId ( ofSyntax? x)
| _ =>
  throw "ill-formed syntax"

#eval formula.run' "(forall x f(a b c))" >>= (Formula.ofSyntax? ·)
  -- Except.ok (Formula.forall `x (Formula.pred `f [`a, `b, `c]))

Mac Malone (Aug 21 2023 at 21:45):

However, if you just wish to parse strings outside a Lean metaprogramming context, it might just be better to write the parser yourself using the raw parser combinators provided by something like Partax.

Mac Malone (Aug 21 2023 at 21:49):

This is especially true when you are already using custom parsers in the syntax itself. One of the future plans for Partax is to provide a new syntax-like command that has better support for defining custom tokens/parsers, so that you can still use the pretty sugar, but that does not exist yet.


Last updated: Dec 20 2023 at 11:08 UTC