Zulip Chat Archive

Stream: lean4

Topic: Option do notation regression?


Zygimantas Straznickas (Mar 22 2021 at 23:40):

-- works
def test1 : IO Nat := do
  let z  IO.getNumHeartbeats
  return z + 1

-- no longer works
def test2 : Option Nat := do
  let x  (some 5)
  return x + 1 -- failed to synthesize instance Add (Option Nat)

Marc Huisinga (Mar 22 2021 at 23:44):

Option is no longer a monad since https://github.com/leanprover/lean4/commit/9a5f2395134a2668ae778f32262b3105c277483a

Marc Huisinga (Mar 22 2021 at 23:45):

Code that used Option as a monad before now seems to use OptionM.run: https://github.com/leanprover/lean4/commit/04e3f21783e343a58eb245bcb9488dab59cb01a4

Mario Carneiro (Mar 23 2021 at 00:19):

Whaat? Why isn't Option a monad? That's the classic example

Yakov Pechersky (Mar 23 2021 at 00:22):

Motivation: Option is data, OptionM is control.
https://github.com/leanprover/lean4/commit/04e3f21783e343a58eb245bcb9488dab59cb01a4

Yakov Pechersky (Mar 23 2021 at 00:24):

I guess similar to how Haskell prefers to supply a StateT or state over State?

Mario Carneiro (Mar 23 2021 at 00:25):

okay, but there are still uses of do notation for constructing Option data

Mario Carneiro (Mar 23 2021 at 00:26):

I mean, not all uses of do notation are for control

Yakov Pechersky (Mar 23 2021 at 00:26):

:shrug:

Mario Carneiro (Mar 23 2021 at 00:27):

mathlib has plenty of other monads like the giry monad or filters or pfun and they can only very vaguely be interpreted as control structures

Yakov Pechersky (Mar 23 2021 at 00:30):

Perhaps this will make things like with_bot or with_top cleaner?

Mario Carneiro (Mar 23 2021 at 00:33):

I'm not sure what you mean. Those can be monads too

Joe Hendrix (Mar 23 2021 at 00:39):

Coming from Haskell, I also think it's reasonable for Option to be a monad, and am not sure about the data Option vs control OptionM distinction. List is perfectly reasonable a s a monad in Haskell.

I'd understand though if there is a compilation or performance reason for distinguishing the two, but don't see that referenced in the commit.

Yakov Pechersky (Mar 23 2021 at 00:43):

I mean that it might make with_bot T = OptionM ??? T with something that isn't just Id for ??? ? Not sure.

Sebastian Ullrich (Mar 23 2021 at 07:58):

Joe Hendrix said:

List is perfectly reasonable a s a monad in Haskell.

This is exactly the problem: Option and List are reasonable monads in Haskell, but in Lean are performance traps because of strictness. People automatically think of monads as computation and don't think twice about chaining them with *> or <|>, which is why it's a bad idea to make a strict data type a monad.

Gabriel Ebner (Mar 23 2021 at 09:58):

It's a surprising design choice. All strict languages that I can think of and which can express a monad type have a monad instance for the list type: e.g. Lean 3, scala (cats), or purescript off the top of my head.

Gabriel Ebner (Mar 23 2021 at 10:10):

While I can somewhat follow the argumentation of removing Monad List (Lean does not implement stream fusion, so let's remove nice syntax instead..), I am completely baffled as to why Monad Option is supposed to be problematic. All operations on Option are constant-time, and you don't need stream fusion to simplify f <$> g <$> x to (f ∘ g) <$> x (I would expect both expressions to be optimized to the same result after inlining <$>).

Sebastian Ullrich (Mar 23 2021 at 10:33):

It's not about the overhead of the operations themselves, but about that of eagerly evaluating an argument that shouldn't have been. If you do bigSearch1 <|> bigSearch2 on Option and the optimizer decides it's over the inlining budget, you're screwed.

Sebastian Ullrich (Mar 23 2021 at 10:35):

A different, Lean 4-specific issue is

def f (xs : List Nat) : Option Nat := do
  for x in xs do
    if x > 10 then
      return some x
  return none
3:2:
type mismatch, 'for' has type
  Option Nat : Type
but is expected to have type
  Nat : Type
6:2:
application type mismatch
  pure none
argument
  none
has type
  Option ?m.551 : Type ?u.550
but is expected to have type
  Nat : Type

The user thought of Option as data, but do assumed it was computation

Gabriel Ebner (Mar 23 2021 at 10:38):

Sebastian Ullrich said:

It's not about the overhead of the operations themselves, but about that of eagerly evaluating an argument that shouldn't have been. If you do bigSearch1 <|> bigSearch2 on Option and the optimizer decides it's over the inlining budget, you're screwed.

This would be a reasonable argument, but the Alternative (Option α) instance is still there. And then there's also fast && slow, etc.

Gabriel Ebner (Mar 23 2021 at 10:39):

A different, Lean 4-specific issue is [...]

Oh, this is disappointing but also a good explanation. Thanks!

Sebastian Ullrich (Mar 23 2021 at 10:47):

Alternative Option is gone, but the OrElseinstance is still there. I'm not completely sure what @Leonardo de Moura's reasoning there was, but at least it makes it harder to unknowingly use the operator on Option in a combinator, since those usually use Alternative I'd guess. If we wanted to be strict (heh) about it, we could rename the function Option.or and add a separate Option.orElse using a thunk, both without special syntax.

Leonardo de Moura (Mar 23 2021 at 14:42):

Sebastian Ullrich said:

Alternative Option is gone, but the OrElseinstance is still there. I'm not completely sure what Leonardo de Moura's reasoning there was, but at least it makes it harder to unknowingly use the operator on Option in a combinator, since those usually use Alternative I'd guess. If we wanted to be strict (heh) about it, we could rename the function Option.or and add a separate Option.orElse using a thunk, both without special syntax.

There is the effort of fixing stdlib, and finding a nice alternative for the fix.

Leonardo de Moura (Mar 23 2021 at 14:53):

Regarding the change, the goal is to improve the user experience. See do example that Sebastian posted above.
We are exploring the design space and trying to figure out what works and what doesn't.
We didn't wake up one day and decide to remove the instance. @Sebastian Ullrich and I discussed the Monad Option issue a few times.
There are many other changes we want to make. For example, we want to address x *> let a := <expensive>; b
Finally, if mathtlib users think Monad Option and Monad List are useful, they can define them in the Mathlib "core" files.

Gabriel Ebner (Mar 23 2021 at 15:13):

Yes, the do example was convincing. Although it makes me slightly uncomfortable that adding a type class instance can break do-notation that used to work before. That is, if we add instance : Monad FreeGroup, it will break a function def foo (xs : List α) : FreeGroup α := do ....
There is often no way around Monad. For example if you want to use lemmas that require LawfulMonad, you need to provide a Monad instance.

Gabriel Ebner (Mar 23 2021 at 15:13):

Leonardo de Moura said:

Finally, if mathtlib users think Monad Option and Monad List are useful, they can define them in the Mathlib "core" files.

I don't think anybody will add these instances if they break do-notation.

Sebastian Ullrich (Mar 23 2021 at 15:20):

I'm sure there will be users that would want do to use Option as the monad, so it might not be broken in their opinion. But prefixing the block with OptionM.run seems like the best compromise to please both camps.

Gabriel Ebner (Mar 23 2021 at 15:23):

If you add the Monad Option instance, is there a way to get back and tell Lean that def f (xs : List Nat) : Option Nat := do should not use Option as the functor for the do-notation?

Sebastian Ullrich (Mar 23 2021 at 15:26):

Yes, you have to wrap the type with Id

Andrew Kent (Mar 23 2021 at 15:42):

Sebastian Ullrich said:

It's not about the overhead of the operations themselves, but about that of eagerly evaluating an argument that shouldn't have been. If you do bigSearch1 <|> bigSearch2 on Option and the optimizer decides it's over the inlining budget, you're screwed.

There may be some obvious reasons I haven't thought of yet why this suggestion is a silly one... but this has me curious if <|> wouldn't be better off as a macro that expands into control-flow primitives that only evaluate bigSearch2 if bigSearch1 "fails" (a la or being a macro and not a function/primitive in Scheme et al).

Sebastian Ullrich (Mar 23 2021 at 15:44):

<|> is from a typeclass, it cannot be a macro

Andrew Kent (Mar 23 2021 at 15:45):

Yes, I am aware it is currently a function from a typeclass, but that just reflects the current design, no? It's not a fundamental reason why <|> as infix syntax must be a function and not a macro.

Andrew Kent (Mar 23 2021 at 15:47):

I'm not necessarily arguing it "should change", but just thinking out loud a bit. If <|> has a strong association in enough peoples' minds with not always evaluating the right-hand side, maybe that's a reason to consider it a syntactic abstraction that does that instead of it being a function like it is in Haskell :shrug:

Sebastian Ullrich (Mar 23 2021 at 16:00):

<|> is the control flow primitive. If you want something else to happen for all monads/alternatives, you'll need to find a new primitive monadic combinator and abstract over it in a typeclass. We considered making the RHS of <|> a thunk, but that could be confusing in proofs/contexts where execution does not matter. If all but two interesting monads are already inherently lazy, the correct solution is to make those two lazy as well, not to change a fundamental combinator. Especially given that they are problematic in other ways as described above.

Andrew Kent (Mar 23 2021 at 16:04):

I don't necessarily disagree with you, although I think I was trying to talk more purely about the syntax <|> and nothing more. I.e., what should the syntax <|> mean when a user reaches for it and not arguing that the underlying tower of typeclasses should change.

Sebastian Ullrich (Mar 23 2021 at 16:06):

Then I do not understand. There is nothing the syntax can do here if the underlying combinator's signature doesn't change.

Andrew Kent (Mar 23 2021 at 16:20):

I think I'm imagining a more creative reinterpretation of the syntax <|> than perhaps I'm letting on then which is focused on the control-flow aspects....? (And perhaps for this reason it would be too different and not a good idea, as it would not work in all the places <|> is valid with its current meaning at present). In this alternative reality where <|> as syntax means something very different , you might imagine a class of values where you morally get the following elaborations:

for Option:

mA <|> mB
-- expands to
match mA with
| none => mB
| some a => pure a

for Except:

mA <|> mB
-- expands to
match mA with
| Except.error _ => mB
| Except.ok a => pure a

etc

And this interpretation of <|> may end up being very divorced from the current typeclasses it is associated with and work with only a subset or different set of values, etc, but you could still imagine there being such a syntactic abstraction that for some things it ends up computatinoally behaving more like <|> does in a call-by-need language (in at least delaying the RHS).

At any rate... I worry this is a bit too speculative and distracting for continued debate in this thread xD

Sebastian Ullrich (Mar 23 2021 at 16:24):

If the notation was not based on a typeclass, you would not be able to use it in programs or combinators generic over the monad except by making the whole function a macro, which is not tenable.

Andrew Kent (Mar 23 2021 at 16:27):

Yes, if you think of this like a more generic "or", you would need a typeclass to tell you what values were "falsy" to know when to return the thing. And then of course you wouldn't be able to use it in all the places <|> can be used now. It is quite a radical reimagining of the syntax =)

Joe Hendrix (Mar 23 2021 at 17:10):

@Sebastian Ullrich Thanks for explanation. In hindsight strictness seems like an obvious problem, and I think although bind would be ok, many of the the other operations like >> and <|> would not.

Joe Hendrix (Mar 23 2021 at 17:22):

@Andrew Kent @Sebastian Ullrich Another direction would be to make strategy annotations part of the type (ABI) of a function. Maude has had this for a while, but is a first order language. This would allow making operations like if then else and short-circuit operators definable in the language. For Lean, you could just mark certain arguments as eager (call-by-value) and others as lazy (call-by-need).

I'm guessing this would be a distraction from current goals, but I thought I'd mention it as the approach is fairly well known in term rewriting.

Mario Carneiro (Mar 23 2021 at 22:47):

Sebastian Ullrich said:

Yes, you have to wrap the type with Id

In your earlier example, with my lean-3-tinted eyes I don't see why the displayed behavior is regarded as unexpected. If you want to use return some x and return none then you have to use do in a monad m (Option Nat); if the target type is Option Nat then I expect that the monad is Option and the inner type is Nat. Indeed, if the target type is just A without an obvious monad application, I think it is acceptable to error out rather than guessing the Id monad. The concrete workaround for that example is to change the type to def f ... : id (Option Nat) := do ... or def f ... : Option Nat := show id (Option Nat) from do ....

Mario Carneiro (Mar 23 2021 at 22:54):

Regarding eager function applications like o1 <|> o2 or fast && slow, I think this has always been a problem and lean 3's solution of using @[inline] is not good because it is sensitive to inlining decisions. Ideally the execution order should not be changed by inlining; even though lean is a pure language this is still a performance foot-gun. I agree with Joe Hendrix that a more robust solution is to explicitly indicate laziness in the types; some of this can be accomplished by using thunk A but this is not the same type as A so it can cause problems for mathematics and the kernel (e.g. if the type of Bool.and was Bool -> (Unit -> Bool) -> Bool that would be rather annoying to work with), so I think this should be either an identity wrapper / annotation or an attribute.

Gabriel Ebner (Mar 24 2021 at 08:35):

I think it is acceptable to error out rather than guessing the Id monad.

So this is what's happening here? I honestly did not make that connection at all.

Sebastian Ullrich (Mar 24 2021 at 09:09):

Mario Carneiro said:

with my lean-3-tinted eyes

Using do blocks for pure code (without further annotations) has by now become second nature for us, it is super convenient. @Marc Huisinga told me he also ran into some Option confusion issues before the change, maybe he can mention some practical examples from someone not involved in designing the feature :) .

Zygimantas Straznickas (Mar 24 2021 at 16:21):

Just wanted to mention I’ve also run into issues that Sebastian described. As someone who frequently uses Option comprehension I dealt with them by sprinkling extra type annotations around in my for loops etc, but I understand that’s not good user experience for most users so the change seems like a net positive (as long as users can opt-in by providing their own Monad Option instance).

Mario Carneiro (Mar 24 2021 at 22:09):

What about a macro like pure_do or similar to make it explicit that this is a do block in the identity monad? I think this is useful as documentation as well, because the version you posted reads weirdly otherwise

Mario Carneiro (Mar 24 2021 at 22:11):

this default fallback behavior seems difficult to control; there should be a simpler way to do it without cumbersome annotations if it is as useful as you say

Marius Melzer (Dec 19 2021 at 12:46):

Sebastian Ullrich said:

A different, Lean 4-specific issue is

def f (xs : List Nat) : Option Nat := do
  for x in xs do
    if x > 10 then
      return some x
  return none
3:2:
type mismatch, 'for' has type
  Option Nat : Type
but is expected to have type
  Nat : Type
6:2:
application type mismatch
  pure none
argument
  none
has type
  Option ?m.551 : Type ?u.550
but is expected to have type
  Nat : Type

The user thought of Option as data, but do assumed it was computation

Might be a stupid question, but could somebody explain to me what the problem is here? And what's the difference between viewing a do block as data or as computation?

Horatiu Cheval (Dec 19 2021 at 12:57):

Have you seen the discussion here: https://leanprover.zulipchat.com/#narrow/stream/270676-lean4/topic/.E2.9C.94.20Has.20the.20do.20notation.20changed.20recently.3F ? I think it's related

Horatiu Cheval (Dec 19 2021 at 12:59):

What version are you on? On 2021-12-18 I get a nicer error message:

invalid 'do' notation, expected type is not a monad application
  Option Nat
You can use the `do` notation in pure code by writing `Id.run do` instead of `do`, where `Id` is the identity monad.

Marc Huisinga (Dec 19 2021 at 13:07):

If Option is a monad, then return is analogous to pure in the sense that it'll take a value of type Nat and embed it in Option Nat (i.e. return x is some x in this case, with it additionally affecting control flow in Lean 4). So in the code snippet above, if Option is a monad, it'll error because it expects Nat for the argument to return, not Option Nat. Instead, you'd have to write it like so:

def f (xs : List Nat) : Option Nat := OptionM.run do
  for x in xs do
    if x > 10 then
      return x
  none

This behaviour makes sense for most side effects. If you use IO for instance, you think of the type Nat in IO Nat as the real return value, not IO Nat. If you use return x in IO, you very likely want to pass an integer, not an IO.
But Option is somehow different. It's not always used for side effects, sometimes you just want to use it as a value. In this case, the default above can be unintuitive. So the choice made here was to have OptionM be the monad, Option be a value and OptionM.run to go from the former to the latter, so that you can still use the monadic version of Option in functions that return Option if you want to.

Edward Ayers (Mar 18 2022 at 20:47):

What's a terse way of doing x >>= f for x : Option α, f : α → Option β?

Edward Ayers (Mar 18 2022 at 20:52):

I guess Option.bind x f

Mario Carneiro (Mar 19 2022 at 00:29):

x.bind f


Last updated: Dec 20 2023 at 11:08 UTC