Zulip Chat Archive

Stream: lean4

Topic: why allow multiple instances per data type?


Michael Jam (May 10 2021 at 15:46):

What is the motivation for allowing multiple instances for the same type in Lean?
Are there some simple examples showing why it's very beneficial in mathlib or elsewhere?
Having some Haskell experience myself, I think Haskell people would yell at this design choice.
Haskell enforces at most one type class instance per type; I think one good reason is that it is safer in a programming context.
For example, if a programmer writes a different instance for the same type (with a higher priority) in a remote dependency, and the compiler allows this, some instances that your program used might get unintentionally swapped. That sounds bad to me because undesired swaps might result in undefined program behavior and deadly crashes, which is unacceptable in a production environment.
I feel that in the context of mathlib and theorem proving it is not as big a deal, because the worst that can happen is that someone else breaks your proof and you have to fix it by being more explicit about which instances you need. However, if Lean 4 has the ambition of also being a general-purpose programming language, that type class design looks questionable to me.
Please correct me if any of my understanding is wrong; I am very new to Lean.

Kevin Buzzard (May 10 2021 at 16:03):

Our library design is supposed to allow only one instance per class on a type, but how would you enforce that in general? For example, in classical mathematics you absolutely sometimes need to allow all Props to be decidable, so what about the ones which have a constructive instance?

Kevin Buzzard (May 10 2021 at 16:04):

And of course by "multiple instances" you need to say what you mean for two instances to be the same...

Nick Scheel (May 10 2021 at 16:09):

Because types are so much more sophisticated in Lean than in Haskell (namely, they can contain what we think of as “values” in Haskell), it turns out that the question of when two types are equal is correspondingly more subtle. So while the “one instance per type” rule makes a lot of sense in the type language of Haskell, where types are very concrete/tractable, it's much, much harder to enforce in dependent type theories, to the point where the best option is really to drop that restriction. The good news is that dependent type theories also give you tools to alleviate the problems: types can now depend on instances, and you can make use of proofs that two instances are equal if it really comes down to that. But mostly it's just up to good engineering/useful conventions to make sure that instances don't conflict in the common scenarios.

Nick Scheel (May 10 2021 at 16:14):

To be more specific: the most commonly cited example of why canonicity matters in Haskell is with the use of Set a, a datatype which tabulates its elements in order and thus depends on having the same Ord a instance every time to ensure that insertions and other operations make sense. However, in dependent type theories, you can formulate Set in a way that the _type_ depends on the instance: that is, it has the shape Set : forall a, Ord a -> Type, and usages look like Set Nat OrderingOnNat (although the instance OrderingOnNat is usually implicit). So when the consistency of the instance matters, you can actually enforce that restriction at the type level.
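
A minimal Lean 4 sketch of that shape (OrdSet is an invented name, not an actual library type, and the sortedness invariant is left out):

universe u

-- The ordering is part of the type, so two sets built with different
-- `Ord` instances have different, incompatible types.
structure OrdSet (α : Type u) (ord : Ord α) where
  elems : List α  -- imagined invariant: sorted with respect to `ord`

-- `union` only accepts sets that agree on the instance, by construction.
def OrdSet.union {α : Type u} {ord : Ord α} (s t : OrdSet α ord) : OrdSet α ord :=
  ⟨s.elems ++ t.elems⟩  -- placeholder implementation

With this shape, two sets over Nat built from different Ord Nat instances simply fail to unify at the type level, which is exactly the consistency guarantee Haskell gets from canonicity.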

Johan Commelin (May 10 2021 at 16:14):

Michael Jam said:

I feel that in the context of mathlib and theorem proving it is not as big a deal, because the worst that can happen is that someone else breaks your proof and you have to fix it by being more explicit about which instances you need.

It's a big deal for mathlib. We are constantly trying to make sure that there is at most one instance.
Because diamonds = headaches

Nick Scheel (May 10 2021 at 16:17):

(For the mathematicians: of course Set means finite set.)

Note that having different instances inferred should not compromise the strength of any proofs, or any safely-written code. The worst that happens is that the instances genuinely don’t line up and the compiler complains. But sometimes the instances should line up, but the compiler can’t see why! So it’s mostly an annoyance, not a security hole.

Michael Jam (May 10 2021 at 16:44):

OK thanks all for clarifying the situation

Michael Jam (May 10 2021 at 17:06):

So if I write this:

class C (t : Type) where
  n : t
instance : C Nat where n := 0
instance : C Nat where n := 1
def f [i : C Nat] := i.n
#eval f

f is able to fetch a C Nat instance, so the type checker has to somehow know which instances satisfy the constraint...
So why can't it figure out that there are two such instances for Nat and report it?

Kevin Buzzard (May 10 2021 at 17:13):

Because it stops when it finds the first one, which is an arbitrary one because you have written unidiomatic code

Kevin Buzzard (May 10 2021 at 17:14):

I really don't want my type checker saying "ok, let's check that there are no more instances", not least because in the algebra hierarchy there might be literally hundreds of other ways of constructing the same instance

Kevin Buzzard (May 10 2021 at 17:17):

The classes we use in mathlib are classes where "clearly" there is, under normal circumstances, at most one instance. There are times when we do actually need more than one instance of a class, and we do this using type aliases (e.g. dual X := X but with the opposite order), which is going to cause real problems for a system attempting to check that there's at most one instance per class up to definitional equality
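
A rough Lean 4 sketch of the type-alias trick (simplified; mathlib's real construction for orders is order_dual / OrderDual):

universe u

def Dual (α : Type u) : Type u := α

-- `Dual α` is definitionally `α`, but instance search sees a different head
-- symbol, so the alias can carry the reversed order without clashing with
-- the instance on `α` itself.
instance {α : Type u} [inst : LE α] : LE (Dual α) where
  le x y := inst.le y x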

Michael Jam (May 10 2021 at 17:20):

so if you use instance priorities, is it much slower because it will check everything?

Kevin Buzzard (May 10 2021 at 17:25):

Instance priorities are used in mathlib. I don't really understand the question. For example we sometimes switch on "mathematician mode" by adding an instance saying that every Prop is decidable, with low priority; that way if the system wants to check that equality on Nat is decidable, it will find the constructive instance first, but if it wants to check that equality on a random type X is decidable it will end up finding the classical one.
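
In Lean 4 terms this is roughly what core's scoped, low-priority Classical.propDecidable instance provides; a small sketch of how the priorities play out (decideAnyProp is just an illustrative name):

open Classical  -- activates the scoped, low-priority classical instance

-- the classical fallback: any Prop now has a (noncomputable) Decidable instance
noncomputable def decideAnyProp (p : Prop) : Decidable p := inferInstance

-- but a constructive instance still wins where one exists, so this evaluates:
#eval decide (2 + 2 = 4)  -- true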

Michael Jam (May 10 2021 at 17:27):

Well, if I use priorities on my two instances, the checker has to know about both of them to compare their priorities and use the one that's highest. So it does have a way to see that there are two instances for that Nat type, if I'm not mistaken

Michael Jam (May 10 2021 at 17:30):

I thought you guys were kind of saying it's not feasible to enforce at most one instance per type. I was just trying to understand better whether it's for performance reasons or other reasons. But I guess now I need to get more experience with the language

Joe Hendrix (May 10 2021 at 17:31):

Are you wanting this feature? Multiple instances in Lean do not generate incorrect code the way they potentially can in Haskell.

Michael Jam (May 10 2021 at 17:39):

If the meaning of your program is 100% mathematically specified it should not be a problem, you're right.
I think for everyday programming, one can write underspecified, or to-be-specified-later code, and in that case it might be an issue

Kevin Buzzard (May 10 2021 at 18:03):

I like this! "Mathlib -- not your everyday programming". Yes, all I do is prove theorems in this software, I never run #eval or #reduce or anything like that

Mario Carneiro (May 10 2021 at 18:08):

Michael Jam said:

If the meaning of your program is 100% mathematically specified it should not be a problem, you're right.
I think for everyday programming, one can write underspecified, or to-be-specified-later code, and in that case it might be an issue

Worrying that there might be multiple nonequivalent implementations of a typeclass is somewhat similar to worrying that a class doesn't implement an interface "lawfully" in traditional programming. Sure, maybe the object implements hashCode but always returns 0, but that would be a really stupid thing to do, and generally people don't worry about it and use documentation to enforce behavioral constraints without assuming that all programmers are adversaries. In Lean you have a bit more help from the compiler, meaning that if you have a mixup it's pretty likely to cause a compile error somewhere. It's not perfect, especially if you are using Lean as if it were Java, but improved bug detection is already a pretty big value add.

Joe Hendrix (May 10 2021 at 19:10):

@Michael Jam For context, I think being able to have multiple instances of a typeclass is a feature of Lean and not necessarily indicative of a bug. If one wants to ensure a typeclass is used consistently across a data structure, just make the typeclass part of the data structure's type.
Obviously there could be bugs where a developer doesn't do that and should have, but I don't think it's the job of the language to add checks for things that might be a bug. Perhaps one could write a linter to check this sort of thing if it is really important in a specific use case.

Michael Jam (May 10 2021 at 19:25):

I had in mind a very hypothetical future where Lean dominates the world like a widespread Java, and many businesses with many interdependencies run on Lean, albeit with poorly specified code, because good programmers are hard to find and verifying software is costly.
Some malicious competitor could decide to publish an update to a library you depend on, which swaps one of your poorly specified instances with one with different behavior just to crash your business. But yeah, it's more of an issue about people producing low-quality code than about the language itself... Perhaps I worry too much...

Mario Carneiro (May 10 2021 at 19:55):

The obvious solution to that is to put invariants in your structures to make it impossible to smuggle in relevant behavioral changes. That's where Lean wins compared to the other guys: that's just not possible to do (at compile time) unless you have a really strong type system, and if you are using Lean you may as well take advantage of it.
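
A minimal Lean 4 sketch of that idea (EvenNat and twice are invented names): once the invariant is a field of the structure, no dependency update can hand you a value that silently violates it.

structure EvenNat where
  val  : Nat
  even : ∃ k, val = 2 * k  -- the invariant travels with every value

def twice (n : Nat) : EvenNat :=
  ⟨2 * n, n, rfl⟩  -- the proof obligation is discharged at compile time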

Johan Commelin (May 10 2021 at 20:03):

Sure, but Michael was talking about underspecified or to-be-specified-later code...

Mario Carneiro (May 10 2021 at 20:03):

when you don't take precautions, sometimes you get burned

Mario Carneiro (May 10 2021 at 20:05):

The language can't make you do anything. At best it provides the means to do better, ideally such that it is easier to do right than do wrong (the "pit of success"). But you can be foolish in every programming language

Mac (May 10 2021 at 23:23):

Kevin Buzzard said:

Because it stops when it finds the first one, which is an arbitrary one because you have written unidiomatic code

This arbitrary selection of a satisfying instance is, in my opinion, a core feature of Lean's type classes. Type classes, in general, are supposed to enforce an abstract constraint on a type, and a consumer of said type class is supposed not to care which particular satisfying implementation of that constraint it gets (when using a synthesized instance as opposed to an explicit one). A program which does care is breaking this contract and thus should be aware of the potential for undesired behavior. Obviously, this ideal is not always adhered to, but it does explain the logic of allowing multiple instances.

It is also worth noting that type classes in Lean and Haskell are very different mechanically. Both their runtime representations and their instance selection procedures vary wildly and heavily affect the constraints placed on them.

Oliver Nash (May 11 2021 at 09:59):

I agree with the consensus here but I did worry about this a while back and came up with the example here: https://leanprover.zulipchat.com/#narrow/stream/113488-general/topic/multiple.20solutions.20for.20typeclass.20search/near/216002921

With apologies for reusing a Lean 3 example in this Lean 4 stream, you can do some mildly concerning things like this:

class foo := (x : ℕ)

def bar [foo] : ℕ := foo.x

def baz (f : foo) := bar

instance foo_inst : foo := ⟨37⟩

lemma bar_val : bar = 37 := rfl

lemma baz_val (f : foo) : baz f = f.x := rfl

-- Just to hammer home the point: we can prove `baz` and `bar` agree:
lemma bar_eq_baz (f : foo) : baz f = bar := rfl

-- and yet:
#reduce bar -- 37
#reduce baz ⟨73⟩ -- 73

Michael Jam (May 11 2021 at 16:41):

Yes, I think this conversation has so far lacked some examples.
Here I will give a few examples that I randomly thought of:

Assume that you are company A and you have a main program which does some complicated stuff and requires many imports.
Assume one of these imports is from a malicious company B. Assume the main does a bunch of println.
For the sake of simplicity say your program looks like this:

import CompanyB.lib

def companyA_main : IO Unit := IO.println 5
#eval companyA_main

println implicitly depends on toString. The definition of ToString is this:

class ToString (α : Type u) where
  toString : α → String

The informal meaning of an expression like "toString x", as I understand it, is "a string that represents x of type α in a human readable way"
"human readable way" is not something that can be mathematically formalized. So the most information that can be specified in the type system about it is "String". Because of that I believe this function is underspecified and unsafe to use if we allow multiple instances for a data type.

A malicious company B could decide to add this to CompanyB.lib:

instance : ToString Nat where toString x := "0"

And then someday, you will update your dependencies because it's cool to stay up to date, and your program will behave badly. (in that case companyA_main will display "0" instead of "5")

Now I can even do bad things if your typeclass is completely formalized:
If I have this class:

class Double (α : Type u) [Add α] where
  double (x : α) : {y : α // y = x + x}

Then I can write different instances like

instance : Double Nat where double x := ⟨x + x, rfl⟩
instance : Double Nat where double x := ⟨x * 2, by rw [Nat.mul_succ, Nat.mul_one]⟩

Here you think that you are safe because double is specified to always return the double, so even if you write different implementations, they should do the same thing.
Well I argue that you are not completely safe because the type information does not give any information about the computation time.
Company B can write an instance that will compute the right result in a very very long time, for example by inserting a system sleep, or by adding a useless very long calculation on purpose.

Company A's build system will not report any problem; it will say it's all correct and verified, but when company A actually launches their program, it will freeze almost forever, which could result in damage.

I think in Haskell this whole situation is not possible, because the compiler can detect if there are two conflicting instances in scope, so you never have to consider this risk. I don't really remember how other languages deal with that.

I think those kinds of security risks might have to be considered if Lean ever aims to reach a bigger, more programming-focused crowd.
I'm happy to hear if you have objections to my assumptions or reasoning.

Mac (May 11 2021 at 18:27):

Michael Jam said:

The definition of ToString is this:

class ToString (α : Type u) where
  toString : α → String

The informal meaning of an expression like "toString x", as I understand it, is "a string that represents x of type α in a human readable way"
"human readable way" is not something that can be mathematically formalized. So the most information that can be specified in the type system about it is "String". Because of that I believe this function is underspecified and unsafe to use if we allow multiple instances for a data type.

A malicious company B could decide to add this to CompanyB.lib:

instance : ToString Nat where toString x := "0"

And then someday, you will update your dependencies because it's cool to stay up to date, and your program will behave badly. (in that case companyA_main will display "0" instead of "5")

Haskell at least partially solves the "human-readable" ToString problem by having two classes, Show := (show : A -> String) and Read := (read : String -> A), that are supposed to be inverses of each other (i.e. read (show a) = a : A and show (read a : A) = a). For many examples of ToString, such a specification works. However, there are some cases where producing the inverse read is not feasible/desirable, but then the idea of the class is very under-specified, so it makes sense to me that relying on it could produce nonsense results like "0" for some complex type; a sketch of how such a round-trip law might be stated in Lean follows below. To quote Mario Carneiro:

Mario Carneiro said:

when you don't take precautions, sometimes you get burned

Though I imagine this is a matter of preference.
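
A minimal Lean 4 sketch of such a round-trip law, stated as a Prop-valued class (Parse and LawfulParse are invented names, not library classes):

universe u

class Parse (α : Type u) where
  parse : String → Option α

-- the law, analogous to Haskell's `read (show a) = a`
class LawfulParse (α : Type u) [ToString α] [Parse α] : Prop where
  parse_toString : ∀ a : α, Parse.parse (toString a) = some a

A rogue ToString Nat instance that prints "0" for everything could then no longer be paired with a lawful Parse Nat instance.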

Mario Carneiro (May 11 2021 at 18:30):

Actually, I think where things go wrong is here:

Assume that you are company A and you have a main program which does some complicated stuff and requires many imports.
Assume one of these imports is from a malicious company B.

I'm sorry, but if you can't trust your dependencies you are already hosed. Lean code can do arbitrary bad things

Mario Carneiro (May 11 2021 at 18:32):

See the infamous leftpad debacle

Mac (May 11 2021 at 18:51):

Michael Jam said:

Well I argue that you are not completely safe because the type information does not give any information about the computation time.
Company B can write an instance that will compute the right result in a very very long time, for example by inserting a system sleep, or by adding a useless very long calculation on purpose.

This is actually kind of the point, but in reverse. By allowing multiple instances, additional instances can be added (with higher priority) that may compute more efficiently. This does, though, in turn, mean it can be used for the reverse (forcing a more inefficient instance instead). You could further specify the instance by requiring a proof that the algorithm is more efficient than some baseline -- Lean could actually potentially do something like this -- but that seems like overkill.
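
For instance, here is a rough Lean 4 sketch of that pattern (Sum' is a made-up class, and the "fast" instance is merely tail-recursive, standing in for a genuinely faster algorithm):

class Sum' (α : Type) where
  sum : List α → α

-- a generic, naive default instance
instance {α : Type} [Add α] [OfNat α 0] : Sum' α where
  sum xs := xs.foldr (· + ·) 0

-- a higher-priority override for `Nat` that could use a better implementation
instance (priority := high) : Sum' Nat where
  sum xs := xs.foldl (· + ·) 0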

It is also worth noting that the same is true for most underspecified programs in general. If I am using some library which has an under-specified ordinary def, its author could maliciously change that def in a future update to mess with my use case. For example:

-- In Lib 1.0
def greet : IO Unit := IO.println "Hello World"

-- My Library
def main : IO Unit := greet

-- In Lib 1.666
def greet : IO Unit := launchNukes

Now my program launches the nukes! Oops! (In keeping with the comparisons to Haskell, launchNukes is a common demonstration of potential unsafety in Haskell.) Thus, I feel like the problem here is depending on malicious libraries, not really Lean itself.

Mario Carneiro (May 11 2021 at 18:55):

You could further specify the instance by requiring a proof that the algorithm is more efficient than some baseline -- Lean could actually potentially do something like this -- but that seems like overkill.

As far as I know there is no way in Lean to express the running time of an algorithm directly as the inhabitant of a type. This is a fundamental and in some senses deliberate limitation of dependent type theory.

Chris B (May 11 2021 at 18:56):

Michael Jam said:

I think those kinds of security risks might have to be considered if Lean ever aims to reach a bigger, more programming-focused crowd.
I'm happy to hear if you have objections to my assumptions or reasoning.

I think both of the examples you gave also point to the fact that you can't rely on behavior that you haven't specified, let alone proved. Yes, if a dependency unexpectedly changes the ToString implementation for some type I might be screwed, but if I was relying on it working a certain way because I eyeballed it once a few versions back, it's hard to say that this is an issue with Lean.

Mac (May 11 2021 at 19:00):

Mario Carneiro said:

You could further specify the instance by requiring a proof that the algorithm is more efficient than some baseline -- Lean could actually potentially do something like this -- but that seems like overkill.

As far as I know there is no way in Lean to express the running time of an algorithm directly as the inhabitant of a type. This is a fundamental and in some senses deliberate limitation of dependent type theory.

As Lean currently stands, yes. But you could prove things about the Lean code generator, which you could then use to prove things about type representations and functions, which would then allow you to prove constraints on runtime. It would require an immense undertaking, though.

Mac (May 11 2021 at 19:04):

Mario Carneiro said:

I'm sorry, but if you can't trust your dependencies you are already hosed. Lean code can do arbitrary bad things

This is actually a much greater concern in Lean than it is in compiled languages like Haskell and C++. Lean's compiler is also an interpreter. Thus, any code can produce arbitrary effects at compile time. For example:

-- In Lib 1.666
#eval launchNukes

Would be perfectly valid Lean that would launch the nukes at compile time for any application that imported Lib 1.666.

Mac (May 11 2021 at 19:08):

Oliver Nash said:

I agree with the consensus here but I did worry about this a while back and came up with the example here: https://leanprover.zulipchat.com/#narrow/stream/113488-general/topic/multiple.20solutions.20for.20typeclass.20search/near/216002921

Here is the example in Lean 4 (with addendum):

class foo where
  x : Nat

def bar [foo] : Nat := foo.x

def baz (f : foo) := bar

instance foo_inst : foo := ⟨37⟩

theorem bar_val : bar = 37 := rfl

theorem baz_val (f : foo) : baz f = f.x := rfl

-- Just to hammer home the point: we can prove `baz` and `bar` agree:
theorem bar_eq_baz (f : foo) : baz f = bar := rfl

-- and yet:
#reduce bar -- 37
#reduce baz ⟨73⟩ -- 73

-- but note:
#reduce bar_eq_baz ⟨73⟩ -- Eq.refl 73 -- i.e. 73 = 73

The reason bar_eq_baz is provable is that, within its statement, bar picks up the foo instance f provided as an argument. Thus bar_eq_baz really is just f.x = f.x (which is trivially true).

Michael Jam (May 11 2021 at 20:24):

Right, it's more of an assumption issue... in Lean, if you don't trust your dependencies you're kind of dead anyway, because arbitrary code might run on your machine while type checking.
I agree that in general it's better to minimize dependencies as much as possible, or even better, to not have any.
In the real world, I feel like lots of programmers use way too many dependencies because they are brainwashed with "do not reinvent the wheel" or "there's this brand new plugin from the cool guys at Google".
So perhaps you are saying Lean's type class design could incentivize people to change their behavior and be extremely careful with dependencies... Interesting idea...

Kevin Buzzard (May 11 2021 at 21:15):

Seeing as the paranoid crowd are in, I'm assuming that Lean 4 can still clean your hard disk, making most of the suggestions above look rather run-of-the-mill in comparison.

Eric Wieser (May 11 2021 at 21:23):

My reading here of Oliver's example is that you can end up proving something for the wrong typeclass instance without realizing it, and then when you actually use it with a different instance you don't have the guarantees you thought you'd proven

Mac (May 11 2021 at 21:26):

Kevin Buzzard said:

Seeing as the paranoid crowd are in, I'm assuming that Lean 4 can still clean your hard disk, making most of the suggestions above look rather run-of-the-mill in comparison.

For comedy's sake, I must emphatically state that I am worried that you deem launching nukes to be run-of-the-mill in comparison to cleaning your hard disk. XD

Mac (May 11 2021 at 21:27):

Eric Wieser said:

My reading here of Oliver's example is that you can end up proving something for the wrong typeclass instance without realizing it, and then when you actually use it with a different instance you don't have the guarantees you thought you'd proven

True, though I think there are many aspects of Lean that can confuse people initially. I would not say Lean has a particularly low learning curve.


Last updated: Dec 20 2023 at 11:08 UTC