Zulip Chat Archive

Stream: lean4

Topic: MLList crash


Jovan Gerbscheid (Nov 29 2023 at 01:41):

In my DiscrTree project, I was told to use MLList as a list monad transformer, but when I tried this, it turned out to be terribly slow. So I decided to do a little test of its speed. But the following simple program made my lean crash:

import Std

open Lean Meta

def g (n : Nat) : MLList MetaM Nat := do
  for i in [:n] do
    if true then
      continue
  return 4

#eval MLList.force $ (g 10000)

telling me that
Message: Server process for file:///(.......)/lake-packages/std/Std/Data/MLList/test.lean crashed, likely due to a stack overflow or a bug. Code: -32902
What's going on here?

Scott Morrison (Nov 29 2023 at 01:49):

Yeah, force is bad. asArray works fine.

Jovan Gerbscheid (Nov 29 2023 at 01:50):

I replaced force by asArray and get the same error.

Scott Morrison (Nov 29 2023 at 01:53):

Strange. I get #[4].

Scott Morrison (Nov 29 2023 at 01:53):

Have you restarted the server?

Jovan Gerbscheid (Nov 29 2023 at 01:53):

for me it works with 2000 and not with 3000

Jovan Gerbscheid (Nov 29 2023 at 01:55):

I just reloaded my VSCode, and get the same result

Scott Morrison (Nov 29 2023 at 01:56):

Yeah, okay, me too!

Scott Morrison (Nov 29 2023 at 02:01):

(As in, I was being confused by the error messages, and indeed mine also crashes at 3000, for either force or toArray.)

Mario Carneiro (Nov 29 2023 at 02:16):

Looks like removing @[specialize] from casesM makes the issue go away

Jovan Gerbscheid (Nov 29 2023 at 02:18):

I did this and ran the same test in the MLList/Basic.lean file, but I still get the error

Scott Morrison (Nov 29 2023 at 02:19):

Yeah, removing @[specialize] is not working for me.

Mario Carneiro (Nov 29 2023 at 02:27):

here's a MWE:

import Lean

unsafe inductive MLList (m : Type u  Type u) (α : Type u) : Type u
  | nil : MLList m α
  | cons : α  MLList m α  MLList m α
  | thunk : Thunk (MLList m α)  MLList m α
  | squash : (Unit  m (MLList m α))  MLList m α

namespace MLList

unsafe def uncons {m : Type u  Type u} [Monad m] :
    MLList m α  m (Option (α × MLList m α))
  | .nil => pure none
  | .thunk t => uncons t.get
  | .squash t => t () >>= uncons
  | .cons x xs => return (x, xs)

@[specialize]
unsafe def casesM [Monad m] (xs : MLList m α)
    (hnil : Unit  m (MLList m β)) (hcons : α  MLList m α  m (MLList m β)) : MLList m β :=
  squash fun _ => do
    match  xs.uncons with
    | none => hnil ()
    | some (x, xs) => hcons x xs

unsafe def cases [Monad m] (xs : MLList m α)
    (hnil : Unit  MLList m β) (hcons : α  MLList m α  MLList m β) : MLList m β :=
  xs.casesM (fun _ => return hnil ()) (fun x xs => return hcons x xs)

unsafe def append [Monad m] (xs : MLList m α) (ys : Unit  MLList m α) : MLList m α :=
  xs.cases ys fun x xs => cons x (append xs ys)

unsafe def bind [Monad m] (xs : MLList m α) (f : α  MLList m β) : MLList m β :=
  xs.cases (fun _ => nil) fun x xs => append (f x) (fun _ => bind xs f)

unsafe instance [Monad m] : Monad (MLList m) where
  pure a := cons a nil
  bind := bind

open Lean Meta

set_option trace.compiler.ir.result true
unsafe def g (n : Nat) : MLList IO Nat := do
  for i in [:n] do
    if true then
      continue
  return 4

open Lean Meta

#eval show IO _ from do
  let x : MLList IO Nat := g 10000
  if let .squash m := x then
    let x  m ()

Mario Carneiro (Nov 29 2023 at 02:33):

the stack overflow occurs in MLList.casesM._elambda_1._at.MLList.g._spec_6 which looks like this:

def MLList.casesM._elambda_1._at.MLList.g._spec_6 (x_1 : obj) (x_2 : obj) (x_3 : obj) (x_4 : obj) (x_5 : @& obj) (x_6 : obj) : obj :=
  let x_7 : obj := proj[1] x_3;
  inc x_7;
  let x_8 : obj := MLList.uncons._rarg x_3 x_4;
  let x_9 : obj := pap MLList.casesM._elambda_1._at.MLList.g._spec_6._lambda_1 x_1 x_2;
  let x_10 : obj := app x_7   x_8 x_9 x_6;
  ret x_10

This doesn't look recursive but I believe the app call at x_10 is loading a function which is itself set up to call MLList.casesM._elambda_1._at.MLList.g._spec_6

Mario Carneiro (Nov 29 2023 at 04:25):

minimized some more:

set_option trace.compiler.extract_closed false
set_option trace.compiler.ir true

set_option genSizeOfSpec false in
inductive MLList (α : Type u) : Type u
  | cons : α  MLList α
  | squash : (Unit  Option (MLList α))  MLList α
  deriving Inhabited

namespace MLList

partial def uncons : MLList α  Option α
  | .squash t => t () >>= uncons
  | .cons x => return x

def cases (xs : MLList α) (hcons : α  MLList β) : MLList β :=
  squash fun _ => do
    let x  xs.uncons
    return hcons x

def bind (xs : MLList α) (f : α  MLList β) : MLList β :=
  xs.cases fun x => (f x).cases cons

def loop : Nat  MLList Nat
  | 0 => cons 4
  | n+1 => bind (cons ()) fun _ => loop n

#eval show Option _ from do
  let x : MLList Nat := bind (loop 10000) cons
  if let .squash m := x then
    let x  m ()

Mario Carneiro (Nov 29 2023 at 04:50):

I wish I knew a better way to do this other than by-hand evaluation, but we can work out the evaluation sequence for this definition:

· bind (loop 10000) cons
· bind (bind (cons ()) fun _ => loop 9999) cons
· bind ((cons ()).cases fun _ => (loop 9999).cases cons) cons
· bind (squash fun _ => do
    let x  uncons (cons ())
    return (loop 9999).cases cons) cons
· cases
    (squash fun _ => do
      let x  uncons (cons ())
      return (loop 9999).cases cons)
    (fun x => (cons x).cases cons)
· squash fun _ => do
    let x  uncons (squash fun _ => do
      let x  uncons (cons ())
      return (loop 9999).cases cons)
    return (cons x).cases cons

It stops here, but after pattern matching on .squash m and evaluating m () it is run some more:

· uncons (squash fun _ =>
    uncons (cons ())
    >>= fun x => return (loop 9999).cases cons)
  >>= fun x => return (cons x).cases cons
· uncons (cons ())
  >>= fun x => return (loop 9999).cases cons
  >>= uncons
  >>= fun x => return (cons x).cases cons
· pure x
  >>= fun x => return (loop 9999).cases cons
  >>= uncons
  >>= fun x => return (cons x).cases cons
· pure ((squash fun _ =>
    uncons (cons ())
    >>= fun x => return (loop 9998).cases cons).cases cons)
  >>= uncons
  >>= fun x => return (cons x).cases cons
· pure (squash fun _ =>
    uncons (squash fun _ =>
      uncons (cons ())
      >>= fun x => return (loop 9998).cases cons)
    >>= fun x => return cons x)
  >>= uncons
  >>= fun x => return (cons x).cases cons
· uncons (squash fun _ =>
    uncons (squash fun _ =>
      uncons (cons ())
      >>= fun x => return (loop 9998).cases cons)
    >>= fun x => return cons x)
  >>= fun x => return (cons x).cases cons
· uncons (squash fun _ =>
    uncons (cons ())
    >>= fun x => return (loop 9998).cases cons)
  >>= fun x => return cons x
  >>= uncons
  >>= fun x => return (cons x).cases cons
· uncons (cons ())
  >>= fun x => return (loop 9998).cases cons
  >>= uncons
  >>= fun x => return cons x
  >>= uncons
  >>= fun x => return (cons x).cases cons
...

At this point the repeating pattern is clear, this will keep accumulating >>= uncons lines 10000 deep and run out of stack space

Mario Carneiro (Nov 29 2023 at 05:24):

which can be summarized as:

· uncons ((loop (n+1)).cases cons)
===>
· uncons (uncons ((loop n).cases cons) >>= fun x => return cons x)

In other words, the MLList monad does not preserve tail-recursiveness

Scott Morrison (Nov 29 2023 at 06:12):

Where does this leave us?

  • MLList is just intrinsically limited (and we should put a big warning in the module doc)?
  • Do you think we can add an extra operation inside the implementation of MLList that "supports tail recursion"...? Not really sure what that would mean.

Mario Carneiro (Nov 29 2023 at 06:13):

I have a fix, at least it fixes the OP example

Mario Carneiro (Nov 29 2023 at 06:14):

I added a MLList.uncons? operation which returns none in the thunk and squash cases, and use it in cases and bind to avoid building up stacks of thunks if it's just bind (pure a) ...

Mario Carneiro (Nov 29 2023 at 06:17):

std#407

Jovan Gerbscheid (Nov 29 2023 at 14:17):

I played around a bit more and found that even on a normal list monad, this makes lean crash. I really don't understand why:

instance : Monad List where
  pure := List.pure
  bind := List.bind

def g (n : Nat) : List Nat := do
  for _ in [:n] do
    if true then
      continue
  return 4

#eval ((g 10000))

Jovan Gerbscheid (Nov 30 2023 at 01:29):

Here's a more minimal example:

def myBind : Option α  (α  Option β)  Option β := Option.bind

instance : Monad Option where
  pure := some
  bind := myBind

#eval Nat.foldM (fun _ _ => some ()) () 10000

The only difference between myBind and the default Option.bind is that I have removed the @[inline] tag. If the @[inline] is present, it runs fine, but if it's not, then Lean crashes.

My question is: is this stack overflow expected?

Mario Carneiro (Nov 30 2023 at 03:46):

yes, this is basically the same issue as for MLList

Mario Carneiro (Nov 30 2023 at 03:46):

The evaluation sequence I showed above also applies in the list case, it stacks up a bunch of .. |>.append [] |>.append [] operations and so the evaluation uses up all the stack

Jovan Gerbscheid (Dec 19 2023 at 10:59):

I found a way to solve this problem for List, by handling lists of length 1 separately:

@[inline]
def myListBind (a : List α) (b : α  List β) : List β :=
  match a with
  | [a] => b a
  | _ => List.bind a b

instance : Monad List where
  pure := List.pure
  bind := myListBind

#eval Nat.foldM (fun _ _ => List.pure ()) () 1000000

Is such a fix something you would want to have in Lean?

Mario Carneiro (Dec 19 2023 at 11:30):

well, core lean doesn't even have a Monad List instance, so there is nothing to fix


Last updated: Dec 20 2023 at 11:08 UTC