Zulip Chat Archive

Stream: lean4 dev

Topic: Regarding the API for IO.FS.Handle


Cayden Codel (Sep 17 2024 at 17:32):

Hello Lean developers. I am feeling lost/dissatisfied over the limited API for file IO in Lean, and wanted to learn more about available tools, ways the API could improve, and general strategies to effectively use Lean's IO.

Currently, I am trying to improve a research tool that I've written in Lean for SAT proof checking. One of the improvements/features I want to implement is proof streaming. This means that the Lean program will need to read a proof file as it's produced by a SAT solver (through a pipe). As a result, typical IO operations like .readToEnd will not suffice, as proofs can, in the worst case, be gigabytes or terabytes in size.

Looking over the file IO data structures and APIs in Lean, I discovered FS.Handle, which does most of what I want. However, the available operations on handles are somewhat limited: a program can request a block of bytes with read : USize -> IO ByteArray, or a program can request a line of text with getLine : IO String. Given my use case, these operations raise a few issues/questions:

  1. To use read, the program is somewhat responsible for "buffering" the input, in order to catch cases where tokens span across read calls. It's not too hard to solve this problem by engineering a BufferedHandle on top of Handle and implement C-style getc or (extremely limited) fscanf functionality. Is this approach reasonable, and/or does an implementation already exist?
  2. The getLine function returns a Lean String. Lean's strings are UTF-8 encoded. However, the data I'm working with is purely ASCII, so it would be more performant(?) to work with an iterator on a ByteArray instead. Yet I don't know of an easy way to O(1) convert String to ByteArray (unsafeCast doesn't work). Would adding a getLineBytes function to Handle's API be a wanted feature, and if so, would I submit a PR for it? Or is there a good way to treat Strings as ByteArrays?
  3. More generally, are there plans to implement C-style printf/scanf functions? Are there libraries that do this?
  4. Anything else to add to this discussion?

Cayden Codel (Sep 17 2024 at 17:34):

A quick search on Zulip shows that there has been some recent work on Lean's file IO (e.g., see this PR, or the Socket.lean repository). But I wanted to get a fuller picture of things, and so I felt a post here would be a good starting point.

Henrik Böving (Sep 17 2024 at 21:08):

We would eventually want to get a nice file IO API at the level of rust's std::io but that's on the (rather long) TODO list.

Mac Malone (Oct 01 2024 at 13:36):

Cayden Codel said:

I don't know of an easy way to O(1) convert String to ByteArray (unsafeCast doesn't work). [...] Or is there a good way to treat Strings as ByteArrays?**

There is docs4#String.toUTF8 to covert a String to a ByteArray. It does perform a memory copy, though.


Last updated: May 02 2025 at 03:31 UTC