Zulip Chat Archive

Stream: general

Topic: association lists, finmap


Sean Leather (Sep 20 2018 at 06:57):

As some people know, I've been working on association lists and finite maps for mathlib (https://github.com/spl/lean-finmap). Mario has decided he wants to get it into mathlib soon, so he has started rewriting it in his own fashion (https://github.com/leanprover-community/mathlib/tree/finmap). I wanted to open up the discussion to get feedback from anyone else who might be interested.

Sean Leather (Sep 20 2018 at 07:05):

One question is whether there is the need for a structure to wrap a list without duplicate keys. Mario thinks it might be useful:

structure alist (α : Type u) (β : α  Type v) : Type (max u v) :=
(val : list (sigma β))
(nd : val.knodup)

https://github.com/leanprover-community/mathlib/blob/e5ddd1d4dca984f6d7a77a87a1608b414296208f/data/list/alist.lean

I went down this path and found that, given that (a) there are so many useful definitions and theorems regarding lists and (b) the structure is such a thin wrapper, every definition or proof for the structure is simple, it created a large number of really simple statements. Considering that alist is really a list with a property and I believe it is useful to use other list definitions that don't involve this property, I think the alist structure either (a) makes it more difficult to work with the wrapped list or (b) creates the problem of scaling the number of statements about alist to match those already about list.

Sean Leather (Sep 20 2018 at 07:09):

In another way of putting it, you could look at an alist as something unique on its own or you could look at a list with a property that it has no duplicate keys. In the process of developing my library, I found the latter view much more useful.

Simon Hudon (Sep 20 2018 at 07:10):

Why is latter more useful?

Sean Leather (Sep 20 2018 at 07:12):

On a related note, that's also how I ended up with the finmap structure:

structure finmap (α : Type u) (β : α  Type v) : Type (max u v) :=
(val : multiset (sigma β))
(nodupkeys : val.nodupkeys)

https://github.com/spl/lean-finmap/blob/035f1faa218e47b9f411c4a243900955f4714a56/src/data/finmap.lean#L8-L10

You could say that this is a similar situation: a finmap is just a multiset with no duplicate keys. However, I found that it's more useful to look at it as a finmap because I didn't find a lot of the other multiset definitions to be very useful for a finmap.

Sean Leather (Sep 20 2018 at 07:13):

Why is latter more useful?

The most important reason for me is that I care about the structure of the internal list: nil, cons, append, etc.

Sean Leather (Sep 20 2018 at 07:13):

Also, regarding finmap, I don't care about the structure of the internal multiset.

Sean Leather (Sep 20 2018 at 07:15):

... which makes sense, since a multiset is a quotient.

Sean Leather (Sep 20 2018 at 07:17):

@Simon Hudon Thanks for that question, BTW. It gets to a key reason I don't think alist is useful, a reason that hadn't yet come to mind.

Sean Leather (Sep 20 2018 at 07:20):

That is, I found that it was useful to wrap a quotient like multiset in a structure but not that useful to wrap a list in a structure because the structure of the list is important when working with the structure. I ended up creating definitions to wrap nil, cons, append, etc., and all of that already existed for the list, so I came to ask why it was necessary. I couldn't come up with a good answer.

Simon Hudon (Sep 20 2018 at 07:25):

You could say that this is a similar situation: a finmap is just a multiset with no duplicate keys. However, I found that it's more useful to look at it as a finmap because I didn't find a lot of the other multiset definitions to be very useful for a finmap.

I don't get this nuance

Sean Leather (Sep 20 2018 at 07:30):

I don't get this nuance

I think it's similar to how you don't see a lot of the multiset definitions used for finset. There's a particular subset (mostly starting with nd) that are used in finset, but many of them are not.

Sean Leather (Sep 20 2018 at 07:31):

multiset seems to be useful as a substrate for finset and finmap, but it also has a larger API that is oriented towards counting duplicates.

Sean Leather (Sep 20 2018 at 07:32):

Also, the subset of the multiset API used for finset is not useful for finmap.

Mario Carneiro (Sep 20 2018 at 13:16):

I think the comparison is apt, and applicable to alist. A finset is just a multiset with a property, but the API is different (some basic operations are not applicable, like cons, and others are slightly weird and become more useful on the subtype, like ndinsert).

Mario Carneiro (Sep 20 2018 at 13:18):

In the case of alist, there are some operations that don't work anymore, like cons, and others are slightly weird and have been prefixed with k for lists. cons is just not a valid alist operation, insert is.

Sean Leather (Sep 20 2018 at 13:21):

I think the comparison is apt, and applicable to alist.

So does this mean you agree with me, since my comparison is apt, which is to say that alist is not a useful structure?

Sean Leather (Sep 20 2018 at 13:23):

In the case of alist, there are some operations that don't work anymore, like cons, and others are slightly weird and have been prefixed with k for lists. cons is just not a valid alist operation, insert is.

This hints of disagreement. My point is that lists are constructed with nil and cons, and it is useful to have that construction, and therefore more useful to have a bare list with no duplicate keys than to have a structure hide that construction.

Mario Carneiro (Sep 20 2018 at 13:25):

In short: I think finset is useful, and I think alist is useful

Mario Carneiro (Sep 20 2018 at 13:26):

I don't see why alist should provide a nil/cons construction. It's not natural. If you want to do such a construction then you should unpack the alist first

Sean Leather (Sep 20 2018 at 13:26):

That's kind of my point. :wink:

Mario Carneiro (Sep 20 2018 at 13:26):

Instead it has empty/insert

Mario Carneiro (Sep 20 2018 at 13:27):

My point is, why are you doing list things on alists?

Sean Leather (Sep 20 2018 at 13:28):

My point is that I want to construct lists with no duplicate keys as well as use any other existing definitions and theorems on lists.

Mario Carneiro (Sep 20 2018 at 13:30):

then do so, on list. You don't need alist for that

Sean Leather (Sep 20 2018 at 13:30):

Quite right, which is why I don't think alist is necessary.

Mario Carneiro (Sep 20 2018 at 13:31):

alist provides a compositional structure for safety-preserving functions on lists

Mario Carneiro (Sep 20 2018 at 13:32):

as well as use any other existing definitions and theorems on lists.

Most list operations don't preserve the invariant. What do you do with these?

Sean Leather (Sep 20 2018 at 13:33):

Well, if you want it, add it. I'm only trying to save you work.

Mario Carneiro (Sep 20 2018 at 13:33):

This is an honest question. What is missing from the API of alist that you would want, that requires resorting to list?

Mario Carneiro (Sep 20 2018 at 13:35):

Except for union, there's nothing I've missed that comes to mind

Sean Leather (Sep 20 2018 at 13:36):

I'm not sure about alist, but I defined the minimum necessary to use lists with no duplicate keys. For example, has_mem and append don't need to be redefined. You only need some extra properties to use append, for example.

Mario Carneiro (Sep 20 2018 at 13:36):

To answer your question from earlier, alist should also be available as a tool for programmers. "I need a map, I only have equality" -> use alist or finmap

Mario Carneiro (Sep 20 2018 at 13:37):

Note that the has_mem on alist is different from the one on lists

Mario Carneiro (Sep 20 2018 at 13:37):

and append doesn't apply

Sean Leather (Sep 20 2018 at 13:37):

Which I don't think it should be.

Sean Leather (Sep 20 2018 at 13:37):

As you'll see in my library.

Mario Carneiro (Sep 20 2018 at 13:39):

I think in common programming parlance, x \in m where m is a dictionary / map / associative array, means x is a key contained in the map m, not x is a key-value pair in the map

Mario Carneiro (Sep 20 2018 at 13:40):

The idiom for the other interpretation is <x, y> \in m.val, where val is the operation sometimes called entries in other libraries

Mario Carneiro (Sep 20 2018 at 13:40):

(indeed, maybe it should be renamed to that)

Mario Carneiro (Sep 20 2018 at 13:45):

Note also that if it meant membership in the underlying list, then it wouldn't be decidable unless you assume decidability of the values, which is not needed for anything else

Sean Leather (Sep 20 2018 at 13:46):

Fair enough. With a : alist, you can use a.val/a.entries.

Mario Carneiro (Sep 20 2018 at 13:47):

right, I think this helps to view them as different things (a finite map vs a list of pairs)

Mario Carneiro (Sep 20 2018 at 13:48):

You pointed out that it is possible to define operations like append if you assume side conditions (i.e. the key sets are disjoint). Does this come up often?

Sean Leather (Sep 20 2018 at 13:49):

I don't know about “often,” but it has come up.

Sean Leather (Sep 20 2018 at 13:51):

I'm now using finmap instead of association lists directly, and you have the same thing with union there.

Sean Leather (Sep 20 2018 at 13:51):

I'm assuming you'll use has_union for alist instead of has_append?

Mario Carneiro (Sep 20 2018 at 13:51):

yes

Mario Carneiro (Sep 20 2018 at 13:52):

since it's union, not append :P

Mario Carneiro (Sep 20 2018 at 13:52):

it's not even possible to give an append instance that uses list.append

Sean Leather (Sep 20 2018 at 13:52):

Well, it's debatable that it's not a form of append.

Mario Carneiro (Sep 20 2018 at 13:52):

I prefer set terminology when duplicates are being dropped

Sean Leather (Sep 20 2018 at 13:53):

It's still appending. :slight_smile:

Sean Leather (Sep 20 2018 at 13:53):

When it comes to the finmap, you can forget that there is any appending. But with alist, you can't.

Mario Carneiro (Sep 20 2018 at 13:54):

Not sure what you mean. The underlying operation is not an append

Mario Carneiro (Sep 20 2018 at 13:55):

It is true that union would be order-respecting on alist

Mario Carneiro (Sep 20 2018 at 13:55):

but then again, so is union on list

Sean Leather (Sep 20 2018 at 13:56):

def append : list α  list α  list α
| []       l := l
| (h :: s) t := h :: (append s t)

def kunion : list (sigma β)  list (sigma β)  list (sigma β)
| []         l := l
| (hd :: tl) l := hd :: kunion tl (kerase hd.1 l)

Sean Leather (Sep 20 2018 at 13:56):

Are you doing it differently?

Mario Carneiro (Sep 20 2018 at 13:57):

I think the erase makes that a rather different kind of operation. But also I think we need two functions here - that is unionl

Mario Carneiro (Sep 20 2018 at 13:58):

unionr would be

def kunionr : list (sigma β) → list (sigma β) → list (sigma β)
| []         l := l
| (hd :: tl) l := kinsert hd.1 hd.2 (kunionr tl l)

Mario Carneiro (Sep 20 2018 at 13:58):

I'm not even sure if it should get a has_union notation

Sean Leather (Sep 20 2018 at 13:59):

I'm not sure if you need it. I don't foresee people asking for it.

Mario Carneiro (Sep 20 2018 at 14:00):

You need left and right preference. For finmap you can ignore it since the order of parameters gives the preference, but on alist you have order and preference separately

Sean Leather (Sep 20 2018 at 14:02):

Why do you need it? Just because it's possible doesn't make it a need.

Mario Carneiro (Sep 20 2018 at 14:03):

I'm okay with just assuming right preference (which is what list.union does), but you want left preference?

Sean Leather (Sep 20 2018 at 14:04):

I stuck with the Haskell Data.Map approach because it seemed to the most intuitive to me.

Mario Carneiro (Sep 20 2018 at 14:05):

Keep in mind that haskell has laziness to contend with, making your definition more natural for them

Sean Leather (Sep 20 2018 at 14:05):

Yep.

Sean Leather (Sep 20 2018 at 14:06):

is infixl, so could that also indicate left-biased is preferred?

Mario Carneiro (Sep 20 2018 at 14:06):

It seems like the difference between foldl and foldr

Mario Carneiro (Sep 20 2018 at 14:06):

and we have both of those

Mario Carneiro (Sep 20 2018 at 14:07):

I'd rather not commit to one camp here

Mario Carneiro (Sep 20 2018 at 14:08):

particularly since there are also performance characteristics to worry about, and the lean 4 compiler may change this as well

Sean Leather (Sep 20 2018 at 14:08):

You have to commit to one for at the very least.

Sean Leather (Sep 20 2018 at 14:09):

I guess if you're going with the list.union precedent, you would go with the right-biased option.

Mario Carneiro (Sep 20 2018 at 14:10):

I think alist shouldn't have a notation, finmap can use left preference by default since the API properties take precedence there

Mario Carneiro (Sep 20 2018 at 14:11):

I should look at other languages like java or python and see what preference is preferred

Sean Leather (Sep 20 2018 at 14:12):

I would consider Haskell, ML, OCaml, Coq, Agda, Idris, etc. before looking at Java or Python.

Mario Carneiro (Sep 20 2018 at 14:12):

hm, python has a mutation operation which makes the precedence obvious

Mario Carneiro (Sep 20 2018 at 14:14):

Same with java. OCaml takes a merge function that resolves differences

Mario Carneiro (Sep 20 2018 at 14:18):

Haskell has both union (left bias) and unionWith that takes a merge function

Mario Carneiro (Sep 20 2018 at 14:22):

I couldn't find anything for Coq. Idris uses merge left bias and mergeWith

Mario Carneiro (Sep 20 2018 at 14:22):

ML is roll your own dict

Sean Leather (Sep 21 2018 at 09:35):

I'm coming to realize that what would be better for me than a finmap union that updates by insert or erase is something like an internal disjoint union. At least that's what I've gathered it would be called. [1, 2] For finmap, perhaps the simplest definition would be:

def djunion (f g : finmap α β) : finmap α β := if disjoint f.keys g.keys then f  g else 

But a more efficient version would probably use list.append instead of list.kunion as does.

I'm only dealing with unions of disjoint finmaps, and it would simplify some things if I didn't have to track all of the disjoint f.keys g.keys.

Sean Leather (Sep 21 2018 at 09:57):

Likewise, a disjoint insert would be good. Conceptually:

def djinsert (a : α) (b : β a) (f : finmap α β) : finmap α β := if a  f.keys then insert a b f else 

But cons would be better than kinsert.

Sean Leather (Oct 08 2018 at 14:16):

@Mario Carneiro What is your plan with https://github.com/leanprover-community/mathlib/tree/finmap ? I'd like to see something get into mathlib. However, given the somewhat large difference between your rewrite and my library, I'm a bit reluctant at this point to invest the time to port more definitions and theorems over to that branch without knowing what's going to happen to it. Do you plan to continue working on that branch?

Mario Carneiro (Oct 08 2018 at 14:25):

would you like me to merge it?

Sean Leather (Oct 08 2018 at 14:27):

would you like me to merge it?

“It” being the leanprover-community/mathlib branch? Sure, if you're happy with it in its current state. I would feel more comfortable extending it at that point.

Mario Carneiro (Oct 08 2018 at 14:28):

I noticed you renamed some things. Not a fan of knodup?

Sean Leather (Oct 08 2018 at 14:29):

I noticed you renamed some things. Not a fan of knodup?

:slight_smile: No. I think nodupkeys is better.

Mario Carneiro (Oct 08 2018 at 14:31):

when I look at it I think "nine letters is long for a name segment"

Mario Carneiro (Oct 08 2018 at 14:31):

You could have nodup_keys but then it confuses with nodup of keys

Mario Carneiro (Oct 08 2018 at 14:31):

what about nodup_fst?

Sean Leather (Oct 08 2018 at 14:31):

You could have nodup_keys but then it confuses with nodup of keys

Exactly.

Sean Leather (Oct 08 2018 at 14:32):

what about nodup_fst?

How is that better than nodupkeys since nodup and keys already exist?

Sean Leather (Oct 08 2018 at 14:32):

... exist in the association list/finmap API that is.

Mario Carneiro (Oct 08 2018 at 14:33):

I mean for the list version, before the "map" interpretation

Sean Leather (Oct 08 2018 at 14:37):

I'd prefer nodupkeys for both for consistency's sake. I think the name is clear in that it links nodup and keys and as short as it's needed to be.

Sean Leather (Oct 08 2018 at 14:38):

Anyway, I have to go now. If you want to discuss any other issues, I'll pick it up tomorrow.


Last updated: Dec 20 2023 at 11:08 UTC