Zulip Chat Archive

Stream: general

Topic: Proving termination for Lean.Syntax serializiation function

Rohan Ganapavarapu (Sep 18 2024 at 03:34):

Hi,

I am currently working on writing code that serializes Lean.Syntax in the compiler (if someone else has done this, please let me know), I am new to proof assistants but have prior fp experience. I am trying to figure out how to prove termination for the following function, I am honestly completely lost:

def serializeSyntax (stx : Syntax) : Json :=
  match stx with
  | Syntax.missing =>
    Json.null
  | Syntax.atom info val =>
    Json.mkObj [
      ("type", Json.str "atom"),
      ("value", Json.str val),
      ("info", serializeSourceInfo info)
    ]
  | Syntax.ident info rawVal val preresolved =>
    Json.mkObj [
      ("type", Json.str "ident"),
      ("value", Json.str val.toString),
      ("rawValue", Json.str rawVal.toString),
      ("info", serializeSourceInfo info),
      ("preresolved", Json.arr (preresolved.toArray.map serializePreresolved))
    ]
  | Syntax.node info kind args =>
    let serializedArgs := args.map serializeSyntax
    Json.mkObj [
      ("type", Json.str "node"),
      ("kind", Json.str kind.toString),
      ("args", Json.arr serializedArgs),
      ("info", serializeSourceInfo info)
    ]
termination_by sizeOf stx
decreasing_by
  simp_wf
  _ -- i have no idea how to prove this

Here is the whole file for those who are interested:

import Lean.Data.Json
import Lean.Data.Json.FromToJson
import Lean.Syntax
open Lean

def serializeSourceInfo (info : SourceInfo) : Json :=
  match info with
  | SourceInfo.none =>
    Json.null
  | SourceInfo.original leading pos tailPos trailing =>
    Json.mkObj [
      ("leading", Json.str leading.toString),
      ("pos", Json.num pos.byteIdx),
      ("tailPos", Json.str tailPos.toString),
      ("trailing", Json.num trailing.byteIdx)
    ]
  | SourceInfo.synthetic pos tailPos canonical =>
    Json.mkObj [
      ("pos", Json.num pos.byteIdx),
      ("tailPos", Json.num tailPos.byteIdx),
      ("canonical", Json.bool canonical)
    ]

def serializePreresolved (pr : Syntax.Preresolved) : Json :=
  match pr with
  | Syntax.Preresolved.namespace ns =>
    Json.mkObj [("namespace", Json.str ns.toString)]
  | Syntax.Preresolved.decl n fields =>
    Json.mkObj [
      ("decl", Json.str n.toString),
      ("fields", Json.arr (fields.toArray.map Json.str))
    ]

def serializeSyntax (stx : Syntax) : Json :=
  match stx with
  | Syntax.missing =>
    Json.null
  | Syntax.atom info val =>
    Json.mkObj [
      ("type", Json.str "atom"),
      ("value", Json.str val),
      ("info", serializeSourceInfo info)
    ]
  | Syntax.ident info rawVal val preresolved =>
    Json.mkObj [
      ("type", Json.str "ident"),
      ("value", Json.str val.toString),
      ("rawValue", Json.str rawVal.toString),
      ("info", serializeSourceInfo info),
      ("preresolved", Json.arr (preresolved.toArray.map serializePreresolved))
    ]
  | Syntax.node info kind args =>
    let serializedArgs := args.map serializeSyntax
    Json.mkObj [
      ("type", Json.str "node"),
      ("kind", Json.str kind.toString),
      ("args", Json.arr serializedArgs),
      ("info", serializeSourceInfo info)
    ]
termination_by sizeOf stx
decreasing_by
  simp_wf
  _ -- i have no idea how to prove this

Joachim Breitner (Sep 18 2024 at 09:29):

In your application, do you need to reason about that program? Else just slap partial on it (partial def …) on it and be done with it :-)

Thomas Murrills (Sep 18 2024 at 21:24):

Also, just curious, does deriving instance Lean.ToJson, Lean.FromJson for Lean.Syntax.Preresolved, String.Pos, Substring, Lean.SourceInfo, Lean.Syntax, Lean.TSyntax do what you need, or do you need to use custom json?

Also, to #xy, would serializing directly to oleans work for you, or do you need the interchange capabilities of json? If the former, you could simply use pickle and withUnpickle (docs), and this would probably be more robust than handling json. :) (I'm pretty sure Lean.Syntax doesn't contain any closures, but I haven't checked.)

Thomas Murrills (Sep 18 2024 at 21:36):

Hmm, actually, the derived json instances don't behave the way I'd hoped. While the ToJson instance seems alright (at least at first glance), it doesn't behave nicely with the corresponding FromJson instance; for example,

import Lean

deriving instance Lean.ToJson, Lean.FromJson for Lean.Syntax.Preresolved, String.Pos, Substring, Lean.SourceInfo, Lean.Syntax, Lean.TSyntax

run_meta do -- error: no inductive constructor matched
  let x ← `(tactic|simp [Bool.true])
  let json := Lean.ToJson.toJson x.raw
  match Lean.FromJson.fromJson? (α := Lean.Syntax) json with
  | .ok x' => Lean.logInfo m!"{x' == x.raw}"
  | .error s => throwError m!"{s}\n\n{json}"

Rohan Ganapavarapu (Sep 18 2024 at 21:41):

Joachim Breitner said:

In your application, do you need to reason about that program? Else just slap partial on it (partial def …) on it and be done with it :-)

Thanks. I have no idea how I didn't know about that... :man_facepalming:

Rohan Ganapavarapu (Sep 18 2024 at 21:48):

Thomas Murrills said:

Also, just curious, does deriving instance Lean.ToJson, Lean.FromJson for Lean.Syntax.Preresolved, String.Pos, Substring, Lean.SourceInfo, Lean.Syntax, Lean.TSyntax do what you need, or do you need to use custom json?

Also, to #xy, would serializing directly to oleans work for you, or do you need the interchange capabilities of json? If the former, you could simply use pickle and withUnpickle (docs), and this would probably be more robust than handling json. :) (I'm pretty sure Lean.Syntax doesn't contain any closures, but I haven't checked.)

I am trying to write a tokenizer for lean for use in a transformer model (ML), I don't think a binary format would be preferable.

I didn't know about Lean.ToJson and FromJson, sort of happy it doesn't work so my code doesn't go to waste.

Last updated: May 02 2025 at 03:31 UTC