Zulip Chat Archive

Stream: general

Topic: Lean copy file IO


Ioannis Konstantoulas (Oct 05 2023 at 14:34):

I see Lean has an IO.FS.rename function for moving files; does it have a similar function for copying files to other directories?

Adam Topaz (Oct 05 2023 at 14:37):

you can always write the contents of one file to a new file with docs#IO.FS.writeBinFile or something like that.

David Renshaw (Oct 05 2023 at 14:54):

IO.FS.writeBinFile outPath (IO.FS.readBinFile inPath)

Adam Topaz (Oct 05 2023 at 15:00):

You could also stream the file(s) if you want to handle large files.

David Renshaw (Oct 05 2023 at 15:04):

Perhaps we should add a streaming copyFile function to Std.

Schrodinger ZHU Yifan (Oct 05 2023 at 15:12):

https://man7.org/linux/man-pages/man2/splice.2.html

Schrodinger ZHU Yifan (Oct 05 2023 at 15:13):

EIO from ocaml actually dispatch syscalls depending on stream types

Schrodinger ZHU Yifan (Oct 05 2023 at 15:13):

For example, splice is used to copy between fds

Ioannis Konstantoulas (Oct 05 2023 at 16:58):

Is Adam's code a reasonably fast/reliable way to copy files? I am unfamiliar with the implementation of these operations, and would think they would be delegated to the OS via some system calls.

Schrodinger ZHU Yifan (Oct 05 2023 at 17:10):

I don't think there is any specialization going on. Streams in general are wrappers around lean_external_object that set a FILE* as its field.

That is being said, I think a benchmark is needed to decide if higher performance can be achieved.

Adam Topaz (Oct 06 2023 at 16:33):

FWIW, here's a short function with a steaming copyFile:

def copyFile (inFile outFile : System.FilePath) (bufSize : USize := 1024): IO Unit :=
  IO.FS.withFile outFile .write fun outHandle =>
  IO.FS.withFile inFile .read fun inHandle => do
    let mut buf  inHandle.read bufSize
    while !buf.isEmpty do
      outHandle.write buf
      buf  inHandle.read bufSize

Utensil Song (Oct 14 2023 at 05:49):

I do hope for API consistency in this case. Copy and move should be implemented in the neighbor in the same way (external or lean) so the users can make reasonable guess to find it, and there won't be subtle implementation discrepancies.

Utensil Song (Oct 14 2023 at 05:57):

Furthermore, when it comes to something this basic, it's better to choose an API design from another language/library instead of going over redesign, it improves API predictability and saves design bike shedding to focus on what really matters. An API design taken from Rust or Haskell or Python could do.

Obviously people expect a native implementation in this case, making the decision doesn't require evaluating the performance, it's just not exactly a productive to reinventing the wheel here. Rewriting a lot of basic functionality to pure Lean is another more time-consuming project, and the priority should be making APIs available in the standard libraries of other languages available to Lean as soon as possible, as predictable as possible.

Utensil Song (Oct 14 2023 at 06:03):

Many similar APIs are scattered, per my observation, it seems to be just where it's handy. Now it could be in Lean, Lake, Std, Mathlib, personal project, where ever it's needed, which is a serious issue for future. I can see some of the technical reasons like platform-dependent stuff should not be in Lean core etc. but these reasons can not beat API usability, and there are solutions to handle situations like this, for example, Lean core maintains the minimal API it needs and accepts, but Std will provide a complete set, and internally it calls or at least match Lean core to ensure API and behavior consistency, and Std is where people to expect APIs similar to other languages.


Last updated: Dec 20 2023 at 11:08 UTC