Zulip Chat Archive

Stream: lean4

Topic: Ensuring naive purity


Ioannis Konstantoulas (Oct 11 2023 at 11:36):

This question is out of pure curiosity.

One misunderstanding about "pure" functions is that if I have f : Nat -> Nat and at a point evaluate f 1 to 1, I guarantee f 1 will always evaluate to 1, because "functions only depend on their arguments". Of course I can write a function of signature Nat -> Nat with completely non-deterministic behavior by running IO actions inside it.

Is there a way to ensure in Lean that I really have a deterministic function only through its signature or some attribute? E.g. can I forbid running IO inside the function body somehow?

Utensil Song (Oct 11 2023 at 11:44):

Of course I can write a function of signature Nat -> Nat with completely non-deterministic behavior by running IO actions inside it.

This is not true in a functional programming language like Lean 4. You'll need a monad to run IO actions, if you try, you'll have errors indicating type mismatch, that's how it's ensured to be forbidden. Chapter Monads in #fpil may have answers you need.

Ioannis Konstantoulas (Oct 11 2023 at 11:47):

Utensil Song said:

Of course I can write a function of signature Nat -> Nat with completely non-deterministic behavior by running IO actions inside it.

This is not true in a functional programming language like Lean 4. You'll need a monad to run IO actions, if you try, you'll have errors indicating type mismatch, that's how it's ensured to be forbidden. Chapter Monads in #fpil may have answers you need.

I am confused; what about

def natFun : Nat  Nat
  | k => match (IO.getRandomBytes 8 |>.run' ()) with
    | some bytes => bytes.toUInt64BE!.toNat % 256
    | none => k

#eval natFun 4
#eval natFun 4
#eval natFun 4

Tomas Skrivan (Oct 11 2023 at 12:01):

This code is abusing that IO monad is effectively StateM IO.RealWorld and IO.RealWorld is just Unit. In the most recent version of lean, IO.RealWorld is turned into opaque and you will no longer be able to write () to create an element of IO.RealWorld. Furthermore, soon IO will be opaque too so you won't be able to write the |>.run'

Tomas Skrivan (Oct 11 2023 at 12:03):

You are effectively exploiting a consistency bug.

Ioannis Konstantoulas (Oct 11 2023 at 12:05):

Nice; that takes care of IO; are there any other similar traps that we know of? For the record, I had no idea this was considered a bug; I just freely used it in real code that I am writing.

Eric Wieser (Oct 11 2023 at 12:06):

These all fall into the category of "inconsistent implemented_by / extern"s, which are only relevant for execution and not proofs

Ioannis Konstantoulas (Oct 11 2023 at 12:07):

Eric Wieser said:

These all fall into the category of "inconsistent implemented_by / extern"s, which are only relevant for execution and not proofs

Yes, I am not talking about proofs at all; I am porting some CLI utilities I wrote in sh to Lean as programming practice, and have been writing this kind of code a lot.

Utensil Song (Oct 11 2023 at 12:08):

Do we have

main = do
    -- Save the current state of the universe
    world_as_we_know_it <- getWorld

    -- Cause serious international side effects
    launchMissiles

    -- After realizing that was a terrible, terrible mistake, restore the
    -- pre-war state of the universe.
    putWorld world_as_we_know_it

in Lean 4? (ref)

Eric Wieser (Oct 11 2023 at 12:09):

I think you can write that but it doesn't work

Tomas Skrivan (Oct 11 2023 at 12:10):

I think you can prove that you didn't delete your hard drive but in fact you actually did.

Eric Wieser (Oct 11 2023 at 12:11):

def ohNo : IO Unit := do
  let s  EStateM.get
  IO.println "Oh no"
  EStateM.set s
#eval ohNo

Ioannis Konstantoulas (Oct 11 2023 at 12:11):

Tomas Skrivan said:

I think you can prove that you didn't delete your hard drive but in fact you actually did.

"Beware of bugs in the above code; I have only proved it correct, not tried it." -- Donald Knuth

Eric Wieser (Oct 11 2023 at 12:14):

EStateM.set does work for the random number generator though:

def anyway : IO Unit := do
  let s  EStateM.get
  let x :=  (·.toUInt64BE!.toNat % 256) <$> IO.getRandomBytes 8
  EStateM.set s
  let y :=  (·.toUInt64BE!.toNat % 256) <$> IO.getRandomBytes 8
  IO.println f!"{x} {y}"
#eval anyway

Utensil Song (Oct 11 2023 at 12:15):

Actually, I mean is it possible to write it as a pure function like @Ioannis Konstantoulas tried:

image.png

Eric Wieser (Oct 11 2023 at 12:17):

You can't run IO from the Id monad without cheating

Utensil Song (Oct 11 2023 at 12:18):

the question is asked after witnessing cheating :joy:

Eric Wieser (Oct 11 2023 at 12:18):

ohNo.run' () lets you cheat

Utensil Song (Oct 11 2023 at 12:19):

Eric Wieser said:

EStateM.set does work for the random number generator though:

def anyway : IO Unit := do
  let s  EStateM.get
  let x :=  (·.toUInt64BE!.toNat % 256) <$> IO.getRandomBytes 8
  EStateM.set s
  let y :=  (·.toUInt64BE!.toNat % 256) <$> IO.getRandomBytes 8
  IO.println f!"{x} {y}"
#eval anyway

Nice, it now consistently generates the same "random" number.

Chris Wong (Oct 11 2023 at 12:19):

Does the Lean interpreter do CSE or something?

Ioannis Konstantoulas (Oct 11 2023 at 12:20):

The fact that two evals return different results is not a problem, right?

def anyway : IO Unit := do
  let s  EStateM.get
  let x :=  (·.toUInt64BE!.toNat % 256) <$> IO.getRandomBytes 8
  EStateM.set s
  let y :=  (·.toUInt64BE!.toNat % 256) <$> IO.getRandomBytes 8
  IO.println f!"{x} {y}"
#eval anyway
#eval anyway

Eric Wieser (Oct 11 2023 at 12:21):

Correct, because the real world changed between the two evals

Ioannis Konstantoulas (Oct 11 2023 at 12:22):

Chris Wong said:

Does the Lean interpreter do CSE or something?

What is CSE?

Ruben Van de Velde (Oct 11 2023 at 12:23):

Common subexpression elimination, maybe

Chris Wong (Oct 11 2023 at 12:23):

Ioannis Konstantoulas said:

Chris Wong said:

Does the Lean interpreter do CSE or something?

What is CSE?

Common subexpression elimination – after inlining, both getRandomBytes calls refer to the same RealWorld, so it would be a "valid" optimization to unify them

Chris Wong (Oct 11 2023 at 12:25):

Though I wonder why Lean is happy to turn two calls into one, but not one into zero :thinking:

Tomas Skrivan (Oct 11 2023 at 12:30):

Good catch, it is exactly what is happening:

def randNat' : IO Nat :=
  dbg_trace "calling randNat"
  (·.toUInt64BE!.toNat % 256) <$> IO.getRandomBytes 8

def anyway : IO Unit := do
  let s  EStateM.get
  let x :=  randNat'
  EStateM.set s
  let y :=  randNat'
  IO.println f!"{x} {y}"

#eval anyway

prints calling randNat only once

James Gallicchio (Oct 11 2023 at 16:31):

once the linearity checker is implemented, the realworld parameter is gonna be marked linear, you won't be able to get the state, and none of these (wrong) optimizations will occur :p

Sebastian Ullrich (Oct 11 2023 at 16:40):

There are some stark misconceptions in this thread. Deduplicating an IO value does not affect the number of executions. dbg_trace breaks precisely because it does not live in IO, thus the dbg_ prefix. There are no issues with compiler optimizations and IO.

Tomas Skrivan (Oct 11 2023 at 19:09):

Wait so why is it producing two same "random" numbers?

Mario Carneiro (Oct 11 2023 at 19:10):

because the random state was reset?

Sebastian Ullrich (Oct 11 2023 at 19:16):

Ah, by breaking open the IO abstraction, you allow for CSE, yes. But the compiler would be free to deduplicate the dbg_trace even without that.

Mario Carneiro (Oct 11 2023 at 19:16):

I tend to think that even exposing that IO A is a function type is a bad idea. Because if it is, then it is equal to its eta expansion, and you can do funny things exploiting that to duplicate effects

Mario Carneiro (Oct 11 2023 at 19:22):

def hello : IO Unit := IO.println "hi"

@[noinline] def blackBox {α} (x : α) : α := x
def foo {α β γ} (x : α) (f : β  γ) (a : β) : γ :=
  (blackBox (x, f a)).2

def dup (x : IO Unit) : IO Unit :=
  fun w => foo (x w) x w

example (x : IO Unit) : dup x = x := rfl

#eval hello
-- hi

#eval dup hello
-- hi
-- hi

Mario Carneiro (Oct 11 2023 at 19:24):

I'm pretty sure I can cause another native_decide contradiction using this and ST

Mario Carneiro (Oct 11 2023 at 19:29):

@[noinline] def blackBox {α} (x : α) : α := x
def foo {α β γ} (x : α) (f : β  γ) (a : β) : γ :=
  (blackBox (x, f a)).2
def dup {ω} (x : ST ω Unit) : ST ω Unit :=
  fun w => foo (x w) x w

def flipBool {ω} (r : ST.Ref ω Bool) : ST ω Unit := do r.set !( r.get)

def ex1 : Bool := runST fun _ => do
  let r  ST.mkRef false
  flipBool r
  r.get

def ex2 : Bool := runST fun _ => do
  let r  ST.mkRef false
  dup (flipBool r)
  r.get

example : False :=
  have : ex1  ex2 := by native_decide
  this rfl

Jireh Loreaux (Oct 11 2023 at 19:31):

I'm confused. I thought we weren't supposed to trust native_decide anyway?

Mario Carneiro (Oct 11 2023 at 19:32):

it's not at the same level of correctness, but we still want the compiler not to be able to prove false things, because if it can then safe lean code can cause UB

Joachim Breitner (Oct 11 2023 at 19:33):

Trust is always a choice :-)
If you trust native_decide, you trust much more code, and we know that this allows you to prove false. But that doesn’t mean we want that. I assume we don’t, and will fix these bugs as we find them.

Mario Carneiro (Oct 11 2023 at 19:33):

We do not want native_decide to be inconsistent

Mario Carneiro (Oct 11 2023 at 19:34):

or at least, it should be the fault of the user (e.g. some implemented_by) if something goes wrong

Mario Carneiro (Oct 11 2023 at 19:37):

def bla : Bool :=
  have : ex1  ex2 := by native_decide
  if h : ex1 = ex2 then blackBox (this h).elim else (h rfl).elim

#eval bla -- boom

Tomas Skrivan (Oct 12 2023 at 00:51):

Mario Carneiro said:

because the random state was reset?

Yes, ideally that would happen but the function lean_io_get_random_bytes is not using the IO.RealWorld at all. IO.RealWorld is not holding any information, so you can't reset the state of the random generator.

Tomas Skrivan (Oct 12 2023 at 00:54):

(deleted)

Scott Morrison (Oct 12 2023 at 03:50):

I did make a start on making EIO an opaque definition (i.e. so you can't see at all that it is really an EStateM). However there is a fair bit of API to collect, and some design decisions about how to expose "enough" of the fact that it is a function type, so I didn't get there. I may try again later, but if someone would like to make an RFC or PR for this I would happily review it!


Last updated: Dec 20 2023 at 11:08 UTC