Zulip Chat Archive
Stream: general
Topic: rand_bool
Yaël Dillies (Jan 12 2022 at 12:38):
docs#rand_bool is broken
import system.random
open list
def sample (N : ℕ) : list bool := (range N).map $ λ k, (rand_bool (mk_std_gen k)).1
#eval sample 3737
-- [tt, tt, tt, tt, tt, tt, tt, tt, tt, tt, tt, tt, tt, tt, tt, ..., tt]
It always return true! :rofl:
Kevin Buzzard (Jan 12 2022 at 12:40):
It might just be chance! https://dilbert.com/strip/2001-10-25
Yaël Dillies (Jan 12 2022 at 12:41):
More generally, I noticed that docs#rand_nat really wasn't random when lo
and hi
are too close together. rand_bool
being defined from rand_nat 0 1
, the above isn't surprising.
Yaël Dillies (Jan 12 2022 at 12:51):
Leonardo de Moura was the one writing this. Should I bother him?
Gabriel Ebner (Jan 12 2022 at 12:53):
If the bug is still there in Lean 4, then I'd say yes.
Gabriel Ebner (Jan 12 2022 at 12:53):
But the code is also pretty outdated style-wise, so maybe removing/rewriting it would also be an option.
Yaël Dillies (Jan 12 2022 at 12:54):
I'm writing my computational project using Lean and Lean 3 isn't fast enough to keep up with the level of precision required, so I'll have to port my 20 lines or so to Lean 4 and then I'll see whether the bug still exists.
Gabriel Ebner (Jan 12 2022 at 12:55):
To be fair, you're also using it in a slightly unexpected way. I believe this is how it is intended to be used:
import system.random
def sample (N : ℕ) : state std_gen (list bool) :=
(list.range N).mmap $ λ k, ⟨rand_bool⟩
#eval ((sample 3737).run mk_std_gen).1 -- ``random''
Yaël Dillies (Jan 12 2022 at 12:56):
Oh, this sounds like my problem is coming from the / 2147483562
in
def mk_std_gen (s : nat := 0) : std_gen :=
let q := s / 2147483562,
s1 := s % 2147483562,
s2 := q % 2147483398 in
⟨s1 + 1, s2 + 1⟩
Gabriel Ebner (Jan 12 2022 at 12:57):
(deleted)
Yaël Dillies (Jan 12 2022 at 12:58):
I see. I was using a new seed every time rather than passing it on using std_next
where arguably the randomness happens.
Gabriel Ebner (Jan 12 2022 at 12:59):
I just checked and if you run my sample 10
function on various seeds then it also returns different results. It's apparently just the first boolean that is not so random.
Gabriel Ebner (Jan 12 2022 at 13:00):
just the first boolean that is not so random.
Which is still worth fixing.
Yaël Dillies (Jan 12 2022 at 13:03):
In my real code, I'm generating a bunch of experiments (and sample
is meant to collect them) but each experiment needs several coin tosses. To me it seems that the mmap
will cause a problem because the first coin toss of the first experiment will have the same std_gen
as the second experiment.
Yaël Dillies (Jan 12 2022 at 13:06):
Schematically,
1st exp → 2nd exp → 3rd exp
1st gen 2nd gen 3rd gen
↓ ↓ ↓
1st toss 1st toss 1st toss
2nd gen 3rd gen 4th gen
↓ ↓ ↓
2st toss 2st toss 2st toss
3nd gen 4rd gen 5th gen
Am I right?
Yaël Dillies (Jan 12 2022 at 13:08):
My idea was then to create a new seed every time
1st exp → 2nd exp → 3rd exp
1st seed 2nd seed 3rd seed
↓ ↓ ↓
1st toss 1st toss 1st toss
1nd gen 1st unrelated gen 1st yet unrelated gen
↓ ↓ ↓
2st toss 2st toss 2st toss
2nd gen 2nd unrelated gen 2nd yet unrelated gen
Reid Barton (Jan 12 2022 at 13:09):
If you write your program as a single computation in the state std_gen
monad, then you will automatically use the RNG in the correct way. You should only ever generate one seed.
Yaël Dillies (Jan 12 2022 at 13:11):
Oh okay. I really wish this was better documented...
Reid Barton (Jan 12 2022 at 13:11):
You can also use std_split
to turn a seed into two "independent" seeds, if writing the program as a single computation is too onerous for some reason (e.g. if you needed parallelism). Though I wouldn't necessarily trust the statistical properties of std_split
(the one in GHC had some fun bugs for a while).
Yaël Dillies (Jan 12 2022 at 13:13):
So sample
uses state
. But how should I write the auxiliary functions?
Johan Commelin (Jan 12 2022 at 13:38):
Wouldn't the experiment be a pure function that you feed a random input?
Johan Commelin (Jan 12 2022 at 13:39):
Typically, I would guess that you can do almost everything outside the random monad, and run the actual experiment in ~10 lines in the random monad.
Gabriel Ebner (Jan 12 2022 at 13:47):
Maybe slightly counterintuitively, but you should run all experiments in the "same" monad.
Gabriel Ebner (Jan 12 2022 at 13:47):
That is, something like experiment : state std_gen experiment_result
and then:
(list.range 10).mmap $ λ _, experiment
if you want to run 10 experiments.
Yaël Dillies (Jan 12 2022 at 14:00):
Johan Commelin said:
Wouldn't the experiment be a pure function that you feed a random input?
But how does the experiment itself manage randomness? It has to call several times the coin toss function
Bhavik Mehta (Jan 12 2022 at 14:08):
I'm pretty sure this file is a port of Haskell's version, so you might find that documentation helpful https://hackage.haskell.org/package/random-1.2.1/docs/System-Random.html
Yaël Dillies (Jan 12 2022 at 14:10):
Btw, why is docs#state_t not fully universe polymorphic?
structure state_t' (σ : Type u) (m : Type (max u w) → Type v) (α : Type w) : Type (max u v) :=
(run : σ → m (α × σ))
Yaël Dillies (Jan 12 2022 at 14:28):
Gabriel Ebner said:
Maybe slightly counterintuitively, but you should run all experiments in the "same" monad.
Thanks a lot! It works great!
Mario Carneiro (Jan 13 2022 at 04:33):
@Yaël Dillies Probably that (m : Type (max u w) → Type v)
constraint will lead to unification issues
Last updated: Dec 20 2023 at 11:08 UTC