Zulip Chat Archive

Stream: general

Topic: rand_bool


Yaël Dillies (Jan 12 2022 at 12:38):

docs#rand_bool is broken

import system.random

open list

def sample (N : ) : list bool := (range N).map $ λ k, (rand_bool (mk_std_gen k)).1

#eval sample 3737
-- [tt, tt, tt, tt, tt, tt, tt, tt, tt, tt, tt, tt, tt, tt, tt, ..., tt]

It always return true! :rofl:

Kevin Buzzard (Jan 12 2022 at 12:40):

It might just be chance! https://dilbert.com/strip/2001-10-25

Yaël Dillies (Jan 12 2022 at 12:41):

More generally, I noticed that docs#rand_nat really wasn't random when lo and hi are too close together. rand_bool being defined from rand_nat 0 1, the above isn't surprising.

Yaël Dillies (Jan 12 2022 at 12:51):

Leonardo de Moura was the one writing this. Should I bother him?

Gabriel Ebner (Jan 12 2022 at 12:53):

If the bug is still there in Lean 4, then I'd say yes.

Gabriel Ebner (Jan 12 2022 at 12:53):

But the code is also pretty outdated style-wise, so maybe removing/rewriting it would also be an option.

Yaël Dillies (Jan 12 2022 at 12:54):

I'm writing my computational project using Lean and Lean 3 isn't fast enough to keep up with the level of precision required, so I'll have to port my 20 lines or so to Lean 4 and then I'll see whether the bug still exists.

Gabriel Ebner (Jan 12 2022 at 12:55):

To be fair, you're also using it in a slightly unexpected way. I believe this is how it is intended to be used:

import system.random

def sample (N : ) : state std_gen (list bool) :=
  (list.range N).mmap $ λ k, rand_bool

#eval ((sample 3737).run mk_std_gen).1  -- ``random''

Yaël Dillies (Jan 12 2022 at 12:56):

Oh, this sounds like my problem is coming from the / 2147483562 in

def mk_std_gen (s : nat := 0) : std_gen :=
let q  := s / 2147483562,
    s1 := s % 2147483562,
    s2 := q % 2147483398 in
s1 + 1, s2 + 1

Gabriel Ebner (Jan 12 2022 at 12:57):

(deleted)

Yaël Dillies (Jan 12 2022 at 12:58):

I see. I was using a new seed every time rather than passing it on using std_next where arguably the randomness happens.

Gabriel Ebner (Jan 12 2022 at 12:59):

I just checked and if you run my sample 10 function on various seeds then it also returns different results. It's apparently just the first boolean that is not so random.

Gabriel Ebner (Jan 12 2022 at 13:00):

just the first boolean that is not so random.

Which is still worth fixing.

Yaël Dillies (Jan 12 2022 at 13:03):

In my real code, I'm generating a bunch of experiments (and sample is meant to collect them) but each experiment needs several coin tosses. To me it seems that the mmap will cause a problem because the first coin toss of the first experiment will have the same std_gen as the second experiment.

Yaël Dillies (Jan 12 2022 at 13:06):

Schematically,

1st exp  2nd exp  3rd exp
1st gen   2nd gen   3rd gen
                      
1st toss  1st toss  1st toss
2nd gen   3rd gen   4th gen
                      
2st toss  2st toss  2st toss
3nd gen   4rd gen   5th gen

Am I right?

Yaël Dillies (Jan 12 2022 at 13:08):

My idea was then to create a new seed every time

1st exp  2nd exp            3rd exp
1st seed  2nd seed            3rd seed
                                
1st toss  1st toss            1st toss
1nd gen   1st unrelated gen   1st yet unrelated gen
                                
2st toss  2st toss            2st toss
2nd gen   2nd unrelated gen   2nd yet unrelated gen

Reid Barton (Jan 12 2022 at 13:09):

If you write your program as a single computation in the state std_gen monad, then you will automatically use the RNG in the correct way. You should only ever generate one seed.

Yaël Dillies (Jan 12 2022 at 13:11):

Oh okay. I really wish this was better documented...

Reid Barton (Jan 12 2022 at 13:11):

You can also use std_split to turn a seed into two "independent" seeds, if writing the program as a single computation is too onerous for some reason (e.g. if you needed parallelism). Though I wouldn't necessarily trust the statistical properties of std_split (the one in GHC had some fun bugs for a while).

Yaël Dillies (Jan 12 2022 at 13:13):

So sample uses state. But how should I write the auxiliary functions?

Johan Commelin (Jan 12 2022 at 13:38):

Wouldn't the experiment be a pure function that you feed a random input?

Johan Commelin (Jan 12 2022 at 13:39):

Typically, I would guess that you can do almost everything outside the random monad, and run the actual experiment in ~10 lines in the random monad.

Gabriel Ebner (Jan 12 2022 at 13:47):

Maybe slightly counterintuitively, but you should run all experiments in the "same" monad.

Gabriel Ebner (Jan 12 2022 at 13:47):

That is, something like experiment : state std_gen experiment_result and then:
(list.range 10).mmap $ λ _, experiment if you want to run 10 experiments.

Yaël Dillies (Jan 12 2022 at 14:00):

Johan Commelin said:

Wouldn't the experiment be a pure function that you feed a random input?

But how does the experiment itself manage randomness? It has to call several times the coin toss function

Bhavik Mehta (Jan 12 2022 at 14:08):

I'm pretty sure this file is a port of Haskell's version, so you might find that documentation helpful https://hackage.haskell.org/package/random-1.2.1/docs/System-Random.html

Yaël Dillies (Jan 12 2022 at 14:10):

Btw, why is docs#state_t not fully universe polymorphic?

structure state_t' (σ : Type u) (m : Type (max u w)  Type v) (α : Type w) : Type (max u v) :=
(run : σ  m (α × σ))

Yaël Dillies (Jan 12 2022 at 14:28):

Gabriel Ebner said:

Maybe slightly counterintuitively, but you should run all experiments in the "same" monad.

Thanks a lot! It works great!

Mario Carneiro (Jan 13 2022 at 04:33):

@Yaël Dillies Probably that (m : Type (max u w) → Type v) constraint will lead to unification issues


Last updated: Dec 20 2023 at 11:08 UTC