Zulip Chat Archive

Stream: lean4

Topic: json parser and serializer


Notification Bot (Nov 26 2022 at 22:08):

This topic was moved here from #lean4 > json parse and serializer by Alexandre Rademaker.

Marc Huisinga (Nov 26 2022 at 22:19):

import Lean.Data.Json

open Lean

structure Foo where
  version : Nat := 1
  module  : String
  deriving FromJson, ToJson

def main : IO Unit :=
  IO.println <| toString <| toJson {module := "test" : Foo}

#eval main -- {"version": 1, "module": "test"}

Please post an #mwe next time, not screenshots of your code.

Alexandre Rademaker (Nov 27 2022 at 03:30):

Thank you and sorry for the screenshot, I was trying to make clear the errors.

Sebastian Ullrich (Nov 27 2022 at 09:48):

You should have unfolded "All Messages" then :)

Alexandre Rademaker (Nov 28 2022 at 11:04):

How? Do you mean in the Lean Infoview panel? In Emacs I would probably open the Flycheck errors and copy from there.

Marc Huisinga (Nov 28 2022 at 11:06):

Click on "All messages" in the infoview panel.

Alexandre Rademaker (Nov 28 2022 at 11:28):

Thank you, sorry for the mess. Anyway, back to the main problem... now trying to go one step further, write the json in a file and read json objects one-by-one from a text file (one object per line, json-lines format).... directions for read/write files primitives and parsing ?!

I was able to parse with the Json.parse, nice. So now Just need to get to read/write files. I am trying to understand the Lean implementation comparing it to Aeson. If I got it right, the type Json correspond to the Value type from Aeson, right?

#check (Json.parse "{\"name\" : \"alexandre\", \"telefone\": 12}")

Andrés Goens (Nov 28 2022 at 14:14):

Alexandre Rademaker said:

How? Do you mean in the Lean Infoview panel? In Emacs I would probably open the Flycheck errors and copy from there.

there's also an Emacs mode for Lean (although you wouldn't use Flycheck errors there but you get a Lean Goal buffer that has those)

Andrés Goens (Nov 28 2022 at 14:18):

to do I/O and read from or write to files you can use the IO monad, also just like in Haskell.

Andrés Goens (Nov 28 2022 at 14:20):

you can't do it as simply with #check like that, it'd be easier to compile your main function (with lake) and test it there, but if you want to do it with the infoview too you can call IO from #eval a (metaprogramming) command. I wouldn't recommend going there first though, just use lake

Sebastian Ullrich (Nov 28 2022 at 14:22):

#eval does IO just fine

Alexandre Rademaker (Dec 15 2022 at 21:25):

How to report a bug? Not sure if a bug or an expected behavior. Haskell Aeson parse empty strings, but Lean reports error.

λ> decode (B.fromStrict $ C.pack "") :: Maybe Value
Nothing

Now Lean reports an error in the second case, it seems that Json.parse can't deal with empty string:

#eval Json.parse ""

shows error: offset 0: unexpected end of input

Reid Barton (Dec 15 2022 at 21:28):

It seems to me that both Aeson and Lean are telling you the empty string is not valid JSON (which I believe is correct).

Eric Wieser (Dec 15 2022 at 21:31):

Yes, the json for an empty string would be the two characters "" which to write in lean is "\"\""

Alexandre Rademaker (Dec 15 2022 at 21:32):

But the Haskell decode gives me a Maybe value. Not an error. Lean gives me an error.

Mario Carneiro (Dec 15 2022 at 21:33):

the Maybe is the error

Mario Carneiro (Dec 15 2022 at 21:33):

Aeson is just less specific about the kind of error than lean

Mario Carneiro (Dec 15 2022 at 21:34):

(honestly a parser that only yields Maybe T is pretty barebones and I am surprised a production library would do that)

Alexandre Rademaker (Dec 15 2022 at 21:35):

I see your point, they could have used Either, right?

Mario Carneiro (Dec 15 2022 at 21:36):

Looks like Aeson does provide an Either-returning variant but you have to call eitherDecode instead of decode

Alexandre Rademaker (Dec 15 2022 at 21:40):

Hum, that is quite interesting! Using Aeson for a while but I didn't realize that problem! Lean just educated me to be more careful. Reproducing Lean behaviors in Haskell:

λ> eitherDecode (B.fromStrict $ C.pack "\"\"")  :: (Either String Value)
Right (String "")
λ> eitherDecode (B.fromStrict $ C.pack "")  :: (Either String Value)
Left "Error in $: not enough input"

Marc Huisinga (Dec 15 2022 at 22:50):

This post made me revisit whether the one issue I found using a JSON test suite is fixed now, namely that readFile filters out NUL. Looks like this tiny issue is still there, as parsing n_multidigit_number_then_00.json still succeeds :grinning_face_with_smiling_eyes:

Marc Huisinga (Dec 15 2022 at 22:58):

I don't think I ever ran the transform tests, but there seems to be some weirdness with invalid codepoints that might be worth investigating:

test_transform\string_1_escaped_invalid_codepoint.json: ok: ["㠀"]
test_transform\string_1_invalid_codepoint.json: ok: ["A"]
test_transform\string_2_escaped_invalid_codepoints.json: ok: ["㠀㠀"]
test_transform\string_2_invalid_codepoints.json: ok: ["AA"]
test_transform\string_3_escaped_invalid_codepoints.json: ok: ["㠀㠀㠀"]
test_transform\string_3_invalid_codepoints.json: ok: ["AAA"]

Marc Huisinga (Dec 15 2022 at 23:01):

(This is just

import Lean.Data.Json

open System.FilePath
open IO.FS

def main : IO UInt32 := do
  let filePaths  walkDir "test_transform"
  for p in filePaths do
    let c  readFile p
    IO.println s!"{p}: {Lean.Json.parse c}"
  return 0

#eval main

btw)

Alexandre Rademaker (Jan 03 2023 at 13:21):

I ended up implementing in Haskell what I tried for 1-2 days in Lean. The Haskell code is https://gist.github.com/arademaker/35c9025f61bb5499eef65649bf0a8a65. But I would like to understand what can or still can't be ported to Lean.

The idea is simple. Read a JSON-lines file with a single JSON object per line, apply a simple transformation on each object and serialize a new JSON-lines file with simpler JSON objects on each line. I used eitherDecode, but once I confirmed that my data didn't raise any exception, I simplified the code to use only decode. I had to deal with Data.ByteString.Lazy and Data.ByteString.Char8 basically because I want to use functions like lines and putStrLn.

The Sentence and Token types were created to facilitate the parsing of JSON. Once I make them instance of FromJSON and ToJSON Aeson creates code for parsing and serializing. Is it possible in Lean?

The ugly part is that I needed the simplified types SToken and SSentence and the direct encoding approach to serialize the result data into JSON again. What are the limitations and possible improvements in the Lean implementation of JSON parser/serializer? In my simple use-case, what I could have done better in Lean? In my Lean version, I ended up in a ugly nesting of match expressions like below just to get the Tokens out of the Sentence object:

def getTokens (obj : Except String Json) :=
 match obj with
  | (Except.ok r) => Array.data $ match Json.getArr? $ r.getObjValD "tokens" with
                                  | Except.ok r => r
                                  | Except.error m => Array.mk []
  | (Except.error m) => List.nil

def test := do
  let a  IO.FS.readFile "/Users/ar/work/wn/glosstag/data/sentences/own-00.jl"
  pure (getTokens $ List.head! $ List.map Json.parse $ lines a)

The JSON-lines files can be found here. I understand that this is not a MWE... So maybe I need to refine my question further.. anyway, feedbacks will be welcome.

James Gallicchio (Jan 03 2023 at 18:22):

(deleted)


Last updated: Dec 20 2023 at 11:08 UTC