Zulip Chat Archive

Stream: lean4

Topic: Heterogenous operation and literals


Tomas Skrivan (May 06 2022 at 13:52):

I'm having a hard time using heterogeneous operations together with number literals. For example, writing 2*x always gives me an error:

failed to synthesize instance
  HMul X X ?m.1180
failed to synthesize instance
  OfNat X 2

i.e. Lean tries to use the homogeneous operation and cast 2 to X. What I have to do is manually specify the type of the two, (2:ℝ)*x.
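
A minimal sketch of the multiplication case (the Vec type and its HMul instance here are illustrative only, not from the thread):

structure Vec where
  x : Float
  y : Float

instance : HMul Float Vec Vec := ⟨fun s v => ⟨s * v.x, s * v.y⟩⟩

variable (v : Vec)
#check (2 : Float) * v -- works once the literal's type is given
#check 2 * v           -- errors as described above: Lean tries `OfNat Vec 2`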

Here is a variant of this problem with addition:

structure Idx (n : USize) where
  val : USize
  property : (val < n)

instance : ToString (Idx n) := ⟨λ i => toString i.1⟩
instance : OfNat (Idx n) i := ⟨⟨i.toUSize%n, sorry⟩⟩
instance : HAdd (Idx n) USize (Idx n) := ⟨λ i j => ⟨(i.1 + j)%n, sorry⟩⟩

#eval (6 : Idx 10) + (5 : USize) -- this works
#eval (6 : Idx 10) + 5  -- this does not work

Arthur Paulino (May 06 2022 at 14:05):

I don't know how to tell Lean to automatically look for an instance of a heterogeneous operation when using +.
But this works:

structure Idx (n : USize) where
  val : USize
  property : (val < n)

instance : ToString (Idx n) := ⟨λ i => toString i.1⟩
instance : OfNat (Idx n) i := ⟨⟨i.toUSize%n, sorry⟩⟩
instance : HAdd (Idx n) USize (Idx n) := ⟨λ i j => ⟨(i.1 + j)%n, sorry⟩⟩

#eval (6 : Idx 10) + (5 : USize) -- this works
#eval (6 : Idx 10) + 5  -- this does not work

def mySum (n : Idx α) (s : USize) :=
  n + s

notation n " ⊹ " s => mySum n s

#eval (6 : Idx 10) ⊹ 5 -- this works

Tomas Skrivan (May 06 2022 at 14:07):

Haha cute :laughter_tears: but no, I really want to use the normal + sign.

Arthur Paulino (May 06 2022 at 14:12):

#xy: isn't it a bit weird though? I mean, using + to symbolize a sum of things with different types

Tomas Skrivan (May 06 2022 at 14:19):

Yeah in this instance you are probably right. But with scalar multiplication I really want to write 2*x for x : X where X is a vector space.

Arthur Paulino (May 06 2022 at 14:28):

I would take a look at how it's done in mathlib, but I'm skeptical that the common multiplication symbol * is used in these cases.

(not too related!) I remember Julian Berman posted a link to an amazing post where Andrej Bauer explained the layers of inferences needed for a computer to understand what we mean when we (informally) multiply a scalar number and a vector

@Julian Berman sorry to ping, but do you still have that link? I want to read it again. Found it!

Tomas Skrivan (May 06 2022 at 15:14):

Mathlib4 does not have a definition of modules/group_action yet. It has rings, and there:

variable (A : Type) [Ring A] (a : A)

#check (4 * a)    -- 4 has type A
#check ((4:Nat) * a)   -- output: ↑4 * a : A    i.e. (4:Nat) gets cast to A and then multiplied

Also mathlib4 does not seem to even define HAdd Nat A A.

Mathlib3 uses • as it does not have heterogeneous operations.

Kyle Miller (May 06 2022 at 15:24):

I've been using HMul for something that's essentially a vector space, and it seemed like there was no way around the fact that you have to specify the types of numeric literals when used as scalars.

Kyle Miller (May 06 2022 at 15:30):

Is there a special elaboration rule so that when you have 2 * x it will try to use the type of x to synthesize an OfNat instance for the 2? I know that there's a default instance to get an HMul from a Mul, but that doesn't seem to be what's applying here.

@[defaultInstance]
instance [Mul α] : HMul α α α where
  hMul a b := Mul.mul a b

Tomas Skrivan (May 06 2022 at 15:31):

I was playing around with defaultInstance and was unable to get anything out of it.

Tomas Skrivan (May 06 2022 at 15:31):

I do not understand elaboration at all, so I can't answer that.

Tomas Skrivan (May 06 2022 at 15:34):

For example, this hack works, i.e. define a new type for each natural number, NatLit n, and a notation # n that creates a value of that type:

inductive NatLit (n : Nat) | val

instance : HAdd (Idx n) (NatLit m) (Idx n) := ⟨λ i _ => ⟨(i.1 + m.toUSize)%n, sorry⟩⟩
macro "#" n:term : term => `(NatLit.val (n:= $n))

#check (6 : Idx 10) + #5

Kyle Miller (May 06 2022 at 15:40):

If you want that to feel less like a hack, you could think about NatLit as being a type of rank-0 arrays (using J terminology), and then you could require that broadcasting rules should only apply to actual arrays.

Kyle Miller (May 06 2022 at 15:41):

Oh, I didn't read what Idx actually was. Never mind. I somehow thought it was a length-n array of indices.

Tomas Skrivan (May 06 2022 at 15:45):

Yeah, if you are working with arrays you can do something like that. Or, when I want to multiply ℝ -> ℝ by 2, I can cast 2 to fun _ => (2:ℝ). But when working with a generic vector space X I can't meaningfully assign the type X to 2.
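
A sketch of that cast, using Float in place of ℝ; the pointwise Mul instance and the coercion below are illustrative only:

instance : Mul (Float → Float) := ⟨fun f g x => f x * g x⟩
instance : Coe Float (Float → Float) := ⟨fun r _ => r⟩

def f : Float → Float := fun x => x + 1
#check ((2 : Float) : Float → Float) * f -- the literal used as the constant function fun _ => (2 : Float)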

Tomas Skrivan (May 06 2022 at 16:12):

Hijacking all numerical literals with a macro like this does not work either:

macro(priority := high) n:Lean.Parser.numLit : term => `(NatLit.val (n:= $n))

Full code:

import Lean.Parser

structure Idx (n : USize) where
  val : USize
  property : (val < n)

instance : ToString (Idx n) := ⟨λ i => toString i.1⟩
instance : HAdd (Idx n) USize (Idx n) := ⟨λ i j => ⟨(i.1 + j)%n, sorry⟩⟩ --

---

inductive NatLit (n : Nat) | val
macro(priority := high) n:Lean.Parser.numLit : term => `(NatLit.val (n:= $n))

instance : Coe (NatLit i) (Idx n) := ⟨λ _ => ⟨i.toUSize%n, sorry⟩⟩
instance : Coe (NatLit i) USize := ⟨λ _ => i.toUSize⟩

---

instance : HAdd (Idx n) (NatLit m) (Idx n) := ⟨λ i _ => ⟨(i.1 + m.toUSize)%n, sorry⟩⟩

---

#check 5                -- of type NatLit 5
#check (6 : Idx 10)     -- still works
#check (6 : Idx 10) + 5 -- does not work

Arthur Paulino (May 06 2022 at 16:51):

structure Idx (n : USize) where
  val : USize
  property : (val < n)

instance : ToString (Idx n) := ⟨λ i => toString i.1⟩
instance : OfNat (Idx n) i := ⟨⟨i.toUSize%n, sorry⟩⟩
instance : HAdd (Idx n) USize (Idx n) := ⟨λ i j => ⟨(i.1 + j)%n, sorry⟩⟩

macro_rules
  | `(($n:num : Idx $s:num) + $s':num) => `(($n : Idx $s) + ($s' : USize))

#check 1 + 1
#eval (6 : Idx 10) + (5 : USize) -- this works
#eval (6 : Idx 10) + 5  -- works

That works

Tomas Skrivan (May 06 2022 at 16:53):

It is not general enough, it breaks on:

def a : Idx 10 := 6
#eval a + 5

Arthur Paulino (May 06 2022 at 16:56):

I was trying an elaboration like this:

def mySum (n : Idx α) (s : USize) :=
  n + s

open Lean Elab.Term Meta in
elab a:term " + " b:term : term => do
  let a ← elabTerm a none
  let b ← elabTerm b none
  let ta ← inferType a
  let tb ← inferType b
  -- check `ta` and `tb`. if they match the specific types then
  mkAppM ``mySum #[a, b]
  -- otherwise, do the regular sum
  mkAppM ``HAdd #[a, b]

But I couldn't make it work with ease. Someone else might be able to help you in that direction.

Tomas Skrivan (May 06 2022 at 17:14):

Thanks Arthur! I managed to get it working:

import Lean.Parser
import Lean.Elab

structure Idx (n : USize) where
  val : USize
  property : (val < n)

instance : ToString (Idx n) := ⟨λ i => toString i.1⟩
instance : OfNat (Idx n) i := ⟨⟨i.toUSize%n, sorry⟩⟩
instance : HAdd (Idx n) USize (Idx n) := ⟨λ i j => ⟨(i.1 + j)%n, sorry⟩⟩ --

---

inductive NatLit (n : Nat) | val
def natLit (n : Nat) : NatLit n := NatLit.val

---

instance hadd : HAdd (Idx n) (NatLit m) (Idx n) := ⟨λ i _ => ⟨(i.1 + m.toUSize)%n, sorry⟩⟩

open Lean Elab.Term Meta in
elab a:term " + " b:Lean.Parser.numLit : term => do
  let n := b.toNat
  let a ← elabTerm a none
  let b ← elabTerm b none
  let ta ← inferType a
  let tb ← inferType b
  mkAppM ``HAdd.hAdd #[a, (← mkAppM ``natLit #[mkNatLit n])] --(← mkAppM ``natLit #[b])]


#eval 10 + 12
#eval (6 : Idx 10) + (5 : USize)
#eval (6 : Idx 10) + 2

Arthur Paulino (May 06 2022 at 17:19):

That's even simpler. No need for inferType

Leonardo de Moura (May 06 2022 at 18:48):

This is an interesting thread and it exposes limitations in the current elaboration method we use for arithmetic expressions. It is great to see you found a workaround. Note that it is important to set the right precedence, otherwise it will affect the standard + notation. The NatLit trick is not needed for this particular example, but I can see it adds extra flexibility.

import Lean.Parser
import Lean.Elab

structure Idx (n : USize) where
  val : USize
  property : (val < n)

instance : ToString (Idx n) := ⟨λ i => toString i.1⟩
instance : OfNat (Idx n) i := ⟨⟨i.toUSize%n, sorry⟩⟩
instance : HAdd (Idx n) USize (Idx n) := ⟨λ i j => ⟨(i.1 + j)%n, sorry⟩⟩ --

open Lean Elab.Term Meta in
elab:65 a:term " + " b:num : term => do
  let n := b.toNat
  let a ← elabTerm a none
  mkAppM ``HAdd.hAdd #[a, (← mkAppM ``Nat.toUSize #[mkNatLit n])]

Leonardo de Moura (May 06 2022 at 19:09):

I want now to focus on the standard arithmetic elaboration function and gather ideas on how to improve it.
Note that we have

macro_rules | `($x + $y)   => `(binop% HAdd.hAdd $x $y)
macro_rules | `($x - $y)   => `(binop% HSub.hSub $x $y)
macro_rules | `($x * $y)   => `(binop% HMul.hMul $x $y)
macro_rules | `($x / $y)   => `(binop% HDiv.hDiv $x $y)

binop% is an auxiliary notation that comes with its own elaboration function, which is implemented in the file src/Lean/Elab/Extra.lean. This elaboration function builds a tree, analyzes it, and then injects coercions where needed. The analyze function tries to find a "maximal type" using the order A < B if there is a coercion from A to B. It also tracks whether there are incomparable types. Moreover, terms that have unknown types (i.e., the type is a metavariable) are ignored when computing the maximal type. When applying the coercions, the unknown types will become the maximal type due to unification. For example, given (x : _) (y : Int) (z : Nat), the term z + y + 2 + x is elaborated as Int.ofNat z + y + 2 + x; note that the 2 and x are elaborated as Int. The current approach passes a collection of tests mixing Nat, Int, and Rat terms, and addresses counterintuitive behavior found in Lean 3. For example, Lean 3 fails to elaborate z + y + 2 + x.
That being said, a side-effect of this approach is that the numeric literals will always have the maximal type, which is bad for @Tomas Skrivan's example. A simple solution is to add a mechanism for marking types for which numeric literals are skipped during the second step (coercion application). If we do that, then we can mark the instance HAdd (Idx n) USize (Idx n) with priority higher than the instance HAdd α α α, and it should work.
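
A small check of the behaviour described above, using only the built-in Nat-to-Int coercion (this example is an editorial sketch, not from the thread):

example (x : _) (y : Int) (z : Nat) : Int := z + y + 2 + x
-- elaborates with `z` coerced via `Int.ofNat`, and both `2` and `x` given type `Int`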

Leonardo de Moura (May 06 2022 at 19:13):

The drawback of the simple solution above is the extra complexity and the potential for more confusion.

Leonardo de Moura (May 06 2022 at 19:21):

For @Tomas Skrivan's example, it is sufficient to check whether the maximal type implements the homogeneous instance Add, but it will not be sufficient for the vector space example @Kyle Miller mentioned above.

Tomas Skrivan (May 07 2022 at 08:19):

This is a bigger can of worms than I thought.

Using the maximal type might cause an invisible performance issue. When you have an array x and write x + 1, you do not want to initialize a new array full of ones and add it to x. It is even worse if you have component-wise multiplication: writing 2*x might silently create an array full of twos and do component-wise multiplication.

The answer to this might be: do not define the coercion from the array's value type to an array. However, if you want to use the fact that arrays of some comm ring form an algebra, you probably get such a coercion (not sure if mathlib does that). At least you get 0 and 1 defined. So what should x + 2*1 do?

Tomas Skrivan (May 07 2022 at 08:21):

Well it should do scalar multiplication and then broadcasted addition.

Tomas Skrivan (May 07 2022 at 08:24):

From this example I feel that the heuristic should be more like: "use the minimal type for which you can do the heterogeneous operation".

Not sure if this would be good or bad for expressions mixing Nat, Int, Rat, Real ...

Kevin Buzzard (May 07 2022 at 08:36):

Just to comment that in mathlib3 we use different symbols for homogeneous multiplication * : A -> A -> A and heterogeneous multiplication • : A -> B -> B docs#has_scalar.smul, and it's the latter which we use for a field acting on a vector space
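
A rough Lean 4 rendering of that split; the SMul class and Vec2 type below are made up for illustration only (mathlib4 did not define scalar actions at the time of this thread):

class SMul (α β : Type) where
  smul : α → β → β

infixr:73 " • " => SMul.smul

structure Vec2 where
  x : Float
  y : Float

instance : SMul Float Vec2 := ⟨fun s v => ⟨s * v.x, s * v.y⟩⟩

#eval ((2 : Float) • Vec2.mk 1 2).x -- 2.0, while * stays homogeneous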

Tomas Skrivan (May 07 2022 at 11:53):

Right, I forgot that mathlib3 uses different notation for these and mathlib4 does not yet have group action defined.

If I understand it correctly, in Lean 3 it was not possible to use the same symbol. I'm curious what people think about using the same symbol for multiplication and scalar multiplication.

Eric Wieser (May 07 2022 at 11:54):

I think I came up with a bunch of cases where it would behave really badly with respect to instance diamonds, but I don't remember them just yet

Mario Carneiro (May 07 2022 at 11:56):

My opinion is that HMul is more likely to cause headaches due to instance search compared to the cost of having to use different symbols here and getting good type inference

Eric Wieser (May 07 2022 at 12:02):

Here's the case I was thinking of (in lean3 I'm afraid):

class HMul (A B C : Type*) :=
(hmul : A → B → C)

export HMul (hmul)

variables {ι : Type*}

-- hmul transfers elementwise on the right argument
instance pi.right
  {A : Type*} {B C : ι → Type*} [Π i, HMul A (B i) (C i)] : HMul A (Π i, B i) (Π i, C i) :=
⟨λ a b i, hmul a (b i)⟩

-- hmul transfers elementwise on the left argument
instance pi.left
  {B : Type*} {A C : ι → Type*} [Π i, HMul (A i) B (C i)] : HMul (Π i, A i) B (Π i, C i) :=
⟨λ a b i, hmul (a i) b⟩

lemma uh_oh {A B C : Type*} [HMul A B C] :
  (pi.left : HMul (ι → A) (ι → B) (ι → ι → C)) = pi.right :=
rfl -- fails

lemma oh_no {A B C : Type*} [HMul A B C] :
  (pi.left : HMul (ι → A) (ι → B) (ι → ι → C)) = pi.right :=
begin
  ext a b i j,
  dsimp [hmul],
  sorry -- `hmul (a i) (b j) = hmul (a j) (b i)`, which is not going to be true
end

Eric Wieser (May 07 2022 at 12:06):

Note that pi.right is basically docs#pi.has_scalar

Leonardo de Moura (May 07 2022 at 14:01):

@Tomas Skrivan

The answer to this might be: do not define the coercion from the array's value type to an array.

Yes, if you have scalar multiplication for arrays, defining a coercion from values to arrays is going to create problems. For example, your 2*x example would be ambiguous. That being said, I am not concerned about this example because as far as I remember, I have never used a system where there is a coercion from values to arrays.
I am more interested in getting 2*x to work without adding a type ascription on the 2 when we have an instance for scalar multiplication.

However, if you want to use the fact that arrays of some comm ring form an algebra, you probably get such a coercion (not sure if mathlib does that). At least you get 0 and 1 defined.

I think this is not a super common scenario, and we can work around it by having a definition that "arrays of some comm ring form an algebra", but not marking it as an instance, or having it as a scoped instance.

From this example I feel that the heuristic should be more like: "use the minimal type for which you can do the heterogeneous operation".

I am happy you are trying to help, but we cannot afford to spend time trying suggestions that do not come with any evidence that they would work.

Don't get me wrong: we want to improve the current approach and make it better, and we will be grateful for any help we can get. It is very helpful to collect examples that we think should work but don't. Then, using the collected examples to test a new approach before suggesting it is even more useful.

Leonardo de Moura (May 07 2022 at 14:03):

If I understand it correctly, in Lean 3 it was not possible to use the same symbol. I'm curious what people think about using the same symbol for multiplication and scalar multiplication.

Independently of what Mathlib decides to do, we are committed to supporting that in Lean. This can be done in many modern programming languages.

Eric Wieser (May 07 2022 at 14:37):

Tomas Skrivan said:

However, if you want to use the fact that arrays of some comm ring form an algebra, you probably get such a coercion (not sure if mathlib does that). At least you get 0 and 1 defined. So what should x + 2*1 do?

In lean3 + mathlib, if x : fin n -> R, then x + 2*1 is defeq to (fun i, x i + 2 * 1):

import algebra.algebra.basic
variables {R : Type*} {n : ℕ} [semiring R] (x : fin n → R)
example : x + 2*1 = (λ i, x i + 2 * 1) := rfl

I don't know what the VM representation of that form of "array" is though.

François G. Dorais (May 07 2022 at 15:56):

I recently noticed this unexpected behavior with chains of default instances:

class HOp (α β γ) where hOp : α → β → γ

class LOp (α β) where lOp : α → β → β

class Op (α) where op : α → α → α

@[defaultInstance]
instance inst1 [LOp α β] : HOp α β β := ⟨LOp.lOp⟩

instance inst2 [Op α] : LOp α α := ⟨Op.op⟩

infix:75 " ⋆ " => HOp.hOp

section Test
variable (α) [LOp Nat α]
variable (x y z : α) (m n : Nat)

example : n ⋆ x = z := sorry -- TC works

example : 1 ⋆ x = z := sorry -- TC works

attribute [defaultInstance] inst2

example : n ⋆ x = z := sorry -- TC works

example : 1 ⋆ x = z := sorry -- TC fails

end Test

It seems odd that adding a default instance should cause a working TC resolution to fail. If that did not happen, it would help with the issue in this thread. I don't know if this is intended behavior.

Leonardo de Moura (May 07 2022 at 16:29):

@François G. Dorais Yes, this is a current limitation of the default instance mechanism. When Lean finds an applicable default instance, it does not backtrack if the generated subproblems cannot be solved. I am currently trying to fix this issue.

Tomas Skrivan (May 07 2022 at 16:46):

Eric Wieser said:

Here's the case I was thinking of (in lean3 I'm afraid):
...

This is exactly the edge case I ran into when playing with broadcasting. When you have an operation between a square matrix and a vector, you do not know if you should treat it as a column or row vector.

Leonardo de Moura (May 07 2022 at 17:00):

@François G. Dorais Lean now backtracks while trying to apply default instances. Your example now works with the following commit
https://github.com/leanprover/lean4/commit/38baeaf373a19e57130dd7b003a2aaa67b527e6a

Kevin Buzzard (May 07 2022 at 17:05):

This is exactly the edge case I ran into when playing with broadcasting. When you have an operation between a square matrix and a vector, you do not know if you should treat it as a column or row vector.

In maths you would be able to figure this out by looking at whether you're writing M*v (column) or v*M (row).

Leonardo de Moura (May 07 2022 at 17:37):

Improved the binop% elaboration function. The following examples should work now

@[defaultInstance]
instance [Mul α] : HMul α (Array α) (Array α) where
  hMul a as := as.map (a * ·)

instance [Mul α] : Mul (Array α) where
  mul as bs := (as.zip bs).map fun (a, b) => a * b

#eval 2 * #[3, 4, 5]

def f9 [Mul α] (a : α) (as bs : Array α) : Array α :=
  a * as * bs

def f10 (a : Int) (as bs : Array Int) : Array Int :=
  2 * a * as * bs

def f11 (a : Int) (as bs : Array Int) : Array Int :=
  3 * a * as * (2 * bs)

def f12 [Mul α] (as bs : Array α) : Array α :=
  as.foldl (init := bs) fun bs a => a * bs

The commit has comments explaining the changes to binop%
https://github.com/leanprover/lean4/commit/af5e13e5345359a93f24c3831feec97d4619dbf0

Eric Wieser (May 07 2022 at 20:08):

@Kevin Buzzard:

In maths you would be able to figure this out by looking at whether you're writing M*v (column) or v*M (row).

I think @Tomas Skrivan is talking about elementwise products (like * in numpy), not contraction (matrix) products like the ones mathematicians usually care about.

Eric Wieser (May 07 2022 at 20:11):

The problem doesn't appear with numpy "broadcasting" because 2D arrays are first-class objects rather than arrays of arrays

James Gallicchio (May 10 2022 at 08:47):

(I kind of hate numpy broadcasting for precisely that reason... this summer I plan to experiment with other APIs to find a more predictable syntax for a numpy-esque library, if anyone wants to share suggestions :joy:)

Eric Wieser (May 10 2022 at 08:55):

(as an active numpy maintainer in a past life, I'd be very interested in seeing what you come up with!)

Kyle Miller (May 10 2022 at 16:04):

Tomas Skrivan said:

This is exactly the edge case I ran into when playing with broadcasting. When you have an operation between a square matrix and a vector, you do not know if you should treat it as a column or row vector.

The J-style rule for broadcasting is that if you (Hadamard) multiply a matrix and a vector, you end up doing the Hadamard product of the vector with each row of the matrix. That's because the matrix is rank-2 (say with dimensions [a,b]) and the vector is rank-1 (say with dimensions [b]), so the vector gets lifted to a rank-2 array with dimensions [a,b] -- new dimensions are always added to the left. If the matrix were square and you wanted the other behavior, you'd have to lift the vector yourself.

The "vector of" functor is a monad, and one way to interpret this is that broadcasting repeatedly applies pure to one of the arguments, adding dimensions to the left until the arguments are compatible, if possible. You can insert a dimension anywhere in an array using pure, fmap pure, fmap (fmap pure), and so on, if you want to get other behavior.
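
A sketch of the leading-dimension rule in Lean 4, modelling rank-n arrays as functions; all names here are made up for illustration:

abbrev Arr (n : Nat) (α : Type) := Fin n → α

-- `pure` for the `Fin a → ·` functor: repeat a value along a new leading dimension
def broadcastLeft (a : Nat) {β : Type} (v : β) : Arr a β := fun _ => v

-- Hadamard product of an a×b matrix with a length-b vector: the vector is lifted
-- to rank 2 on the left, so each row of M is multiplied by v
def hadamardMV {a b : Nat} (M : Arr a (Arr b Float)) (v : Arr b Float) : Arr a (Arr b Float) :=
  fun i j => M i j * broadcastLeft a v i j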

Eric Wieser (May 10 2022 at 16:11):

The J-style rule for broadcasting is that if you (Hadamard) multiply a matrix and a vector, you end up doing the Hadamard product of the vector with each row of the matrix.

Yes, this is the numpy rule too. The problem is that you end up in a mess when you're working with a matrix over some generic type A, and then someone sets A to the type of 1D arrays

Eric Wieser (May 10 2022 at 16:12):

If you create a specific "nd container" object (like docs#holor?) to represent your matrices / vectors etc then that problem goes away

Kyle Miller (May 10 2022 at 16:36):

Interestingly, while still a problem, it's less of a problem for Lean than Python: if you resolve all the broadcasting inside a function's definition, it won't suddenly change behavior just because you pass in an array -- typeclasses don't respect referential transparency. In Python (and in J), broadcasting is resolved at runtime.

As far as I know, there are two basic solutions to this. The first is what you mention, a special multidimensional array type that guarantees that the dimensions are completely legible. The second is what J does, which is having a way to "box" a value so it won't participate in broadcasting anymore. The multidimensional array type records the span of all the indices itself, whereas the box records the divider between spans of indices.

Tomas Skrivan (May 11 2022 at 04:29):

To answer whether to add dimensions to the left or right, you decide between these two opposing requirements:

  1. If you have a scalar r and a function f, both depending on a parameter i : R, you want (r * f) i = r i * f i to hold even when f's type is R -> R -> S.
  2. If you have a function f depending on a parameter i : R and a function g not depending on it, you want (f * g) i = f i * g.

I think 1. is adding a dimension to the right and 2. is adding a dimension to the left. Maybe it is the other way around, I always get confused.

Also, I ended up with the "boxing" approach. You can define a type synonym and provide different TC instances to modify the default behavior.
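
A sketch of that type-synonym idea; the names Broadcast and Boxed are hypothetical, not from the thread:

def Broadcast (n : Nat) (α : Type) := Fin n → α -- participates in scalar broadcasting
def Boxed (n : Nat) (α : Type) := Fin n → α     -- same data, opts out of broadcasting

-- scalars broadcast into every entry of a Broadcast array
instance [Mul α] : HMul α (Broadcast n α) (Broadcast n α) := ⟨fun a v i => a * v i⟩

-- Boxed arrays only get the homogeneous, elementwise product
instance [Mul α] : Mul (Boxed n α) := ⟨fun v w i => v i * w i⟩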

Tomas Skrivan (May 11 2022 at 04:45):

However, I had some type class issues, so I abandoned my "clever" array indexing and thus didn't discover all the gotchas in my approach. I should revise it, start using it again, and see if it holds up.


Last updated: Dec 20 2023 at 11:08 UTC