Zulip Chat Archive

Stream: lean4

Topic: scripted vs compiled code

Arthur Paulino (May 15 2022 at 14:48):

Running tests with compiled binary files like this allowed us to find a bug in the String.toUTF8 function.

def main : IO Unit :=
  IO.println $ "\x00yes mani !".toUTF8

The code above, if run with lean --run ⋯ outputs [0, 121, 101, 115, 32, 109, 97, 110, 105, 32, 33], which is expected. However, if you compile it and then run the binary, it outputs [].

I found it on a previous Lean 4 version but it still happens on leanprover/lean4:nightly-2022-05-14

Mauricio Collares (May 15 2022 at 15:55):

There might be extra costs to fixing this bug (by not mapping Lean strings to C-style null-terminated strings), especially if the compile code doesn't already store the string size separately

Arthur Paulino (May 16 2022 at 01:47):

More on this: the issue seems to be prior to running toUTF8. Somehow, "\x00yes mani !" gets parsed as an empty string in the compiled code

Sebastian Ullrich (May 16 2022 at 11:25):

https://github.com/leanprover/lean4/pull/1153

Arthur Paulino (May 16 2022 at 11:48):

@Sebastian Ullrich sorry to ask, but how confident can we be that if the code gets parsed properly then there should be no difference in semantics between scripted and compiled code? And why? Thanks in advance!

Sebastian Ullrich (May 16 2022 at 11:54):

I don't know how to answer that. Any divergence is a clear bug, of which we have found very few so far

Arthur Paulino (May 16 2022 at 12:13):

I'm asking this because I thought about it while implementing Fxy. In my current solution it's actually an interpreter, so Fxy code is parsed and then run as a Lean program. And when I considered compiling Fxy code, I thought about translating it into C and then building it. So it would open up many windows for bugs.

Is this different from Lean 4's take on compiled/scripted code? My guess is that it's different because, in my intuition, even scripted Lean 4 code runs code transpiled to C (same C code that would run if the Lean code were compiled). But I could be wrong so that's why I'm asking

Sebastian Ullrich (May 16 2022 at 13:26):

Uh, no, we did not implement a C interpreter :) . The interpreter works on the intermediate representation level, there's a bit of documentation at https://github.com/leanprover/lean4/blob/master/src/library/compiler/ir_interpreter.cpp.

James Gallicchio (May 17 2022 at 19:54):

Arthur Paulino said:

Sebastian Ullrich sorry to ask, but how confident can we be that if the code gets parsed properly then there should be no difference in semantics between scripted and compiled code? And why? Thanks in advance!

(I think "certifying the Lean compiler" basically involves showing the semantics are the same between the IR interpreter and the compiled C code, so in a few years maybe we'll have a proof that they're the same? :D)

Last updated: May 02 2025 at 03:31 UTC