Zulip Chat Archive

Stream: new members

Topic: Defining a list type with a no duplicates constraint


Sam (Aug 10 2021 at 16:28):

I'm a programmer without a strong mathematical background having fun playing around in Lean, but I'm a complete beginner at this. I've completed up to proposition world in the natural numbers game and read through most of the documentation but it's not all intuitive for me yet.

I'm interested in proving some simple theorems on a type I'll call unique_array. This is a fixed length ordered list in which duplicates are not allowed. I'm particularly interested in using lemmas surrounding rotation of elements, as well as swaps.

The list type exists and has rotation defined, but does not have the fixed length or uniqueness constraints. The array type has the fixed length constraint but not uniqueness, and does not have rotation defined. The set type has uniqueness but nothing else that I need.

Is there a way to create a unique_array type that combines the features of the above? For example, create an alias of the list type with the additional no duplicates and fixed length constraints? I'd like to make use of the many lemmas already defined for list. I should then be able to define additional lemmas like "for all unique_array a with length > 1, (a.rotate 1) != a" which relies on uniqueness.

I don't know how to go about defining an alias or derivation with additional constraints, nor how to go about constraining uniqueness. Perhaps there's something that I can do with structure inheritance, but I've not got my head around it well enough to work this out yet.

Some pointers would be welcome please! Thanks.

Mario Carneiro (Aug 10 2021 at 16:32):

You should look at the definition of src#finset, which does basically this, but with multiset as the base instead of list or array

Sam (Aug 10 2021 at 16:33):

That's interesting. I'll take a look, thanks!

Mario Carneiro (Aug 10 2021 at 16:34):

That is, I think you are describing a type like this:

structure unique_array (α : Type*) (n : ) :=
(val : list α)
(nodup : nodup val)
(len : val.length = n)

Mario Carneiro (Aug 10 2021 at 16:35):

Note that you will still have to reprove all the theorems, because they don't all carry over, although most of them are just redirecting to the corresponding lemma on the representing type. See the rest of that finset file for examples

Yakov Pechersky (Aug 10 2021 at 16:36):

Can one subtype on vector?

Mario Carneiro (Aug 10 2021 at 16:37):

that's also an option, although vector doesn't have nodup so you would have to write val.1.nodup for the other field

Yakov Pechersky (Aug 10 2021 at 16:37):

But if you're thinking about swaps and rotation of elements over a type with a fixed number of elements, just work on functions over fin n. That way you can use equiv.perm (fin n), function.injective, etc

Mario Carneiro (Aug 10 2021 at 16:38):

If the base type has size n, you can just use permutations instead of defining a new type

Sam (Aug 10 2021 at 16:40):

Ah. I had considered a composition approach like this but had dismissed it because it doesn't inherit all of the theorems already proved on list. There's a lot of stuff already defined on list that looks useful to me. Intuitively, a fixed length list with a constraint on the elements is a subtype of list and I hoped to exploit that without the busywork of re-implementing the theorems.

Mario Carneiro (Aug 10 2021 at 16:40):

it depends on what you want. perm and equiv also have a lot of theorems

Mario Carneiro (Aug 10 2021 at 16:41):

but indexing is a little annoying on lists

Sam (Aug 10 2021 at 16:44):

I'm looking at those now. I am actually dealing with permutations quite a bit so that sounds useful. I'd only come across list.perm and hadn't realised that perm is defined as its own type in the group theory namespace.

Sam (Aug 10 2021 at 16:45):

The catch is that when I said that I don't have a strong mathematical background I meant it. I'm not actually familiar with group theory yet, so I'll have to see if that's an obstacle or just an opportunity to learn.

Sam (Aug 10 2021 at 17:13):

Is there a way to accomplish something like this using subtype? The Inductive Types documentation mentions it briefly but I've not fully got my head around it yet. Does this represent the type of a list of naturals with no duplicates?

#check { x : list nat // list.nodup x}

Mario Carneiro (Aug 10 2021 at 17:16):

yes, that's right

Mario Carneiro (Aug 10 2021 at 17:17):

Another way to write the type from above is:

def unique_array (α : Type*) (n : ) := { val : list α // nodup val  val.length = n }

Sam (Aug 10 2021 at 17:30):

Ah, that looks very promising, thanks!

I'm a little surprised that that still seems to be creating a new type with a constrained list as a field. unique_array itself is not a list here, right? I'm a little confused by why this is called subtype, because I would think of this result as being a composed type rather than a subtype.

Eric Wieser (Aug 10 2021 at 17:49):

It's a subtype because it's the elements of list α which satisfy nodup val ∧ val.length = n - that makes it a "smaller" type with fewer elements (unless the condition is just true!)

Mario Carneiro (Aug 10 2021 at 17:51):

But from a syntactic point of view, it's true that an element of unique_array is not a list, it is a tuple of a list and two proofs. The proofs are erased, so in the VM it is represented as just a list

Sam (Aug 10 2021 at 17:51):

My confusion stems from the fact that it looks like a structure with a field called val of that subtype, rather than the subtype itself.

Mario Carneiro (Aug 10 2021 at 17:52):

what is "it"?

Mario Carneiro (Aug 10 2021 at 17:52):

the def or the structure example?

Sam (Aug 10 2021 at 17:53):

The unique_array type def. You're right, I shouldn't have called it a "structure" I think.

Mario Carneiro (Aug 10 2021 at 17:54):

That is a typedef for a certain instantiation of the subtype type, which is defined as a structure with two fields (the first of which is actually called val, unrelated to the fact that I also called it val in the binder syntax)

Mario Carneiro (Aug 10 2021 at 17:55):

so your "confusion" is actually the correct interpretation of the situation, this is a structure with two things

Mario Carneiro (Aug 10 2021 at 17:55):

the reason it is "smaller" is because the second thing is sometimes only a singleton and sometimes impossible

Sam (Aug 10 2021 at 17:55):

Oh! I thought that val was the field name you defined in the def.

Mario Carneiro (Aug 10 2021 at 17:56):

If I wrote

def unique_array (α : Type*) (n : ) := { a : list α // nodup a  a.length = n }

it would still define the same type and the field projection to get the list would still be called subtype.val

Eric Wieser (Aug 10 2021 at 17:57):

src#subtype might clear things up. There's nothing magic going on other than perhaps the notation

Sam (Aug 10 2021 at 17:57):

I've got the following very simple test that I'm playing with to understand this. I've just found that same thing about using a different name.

def unique_array (α : Type*) (n : ) := { bar : list α // list.nodup bar  bar.length = n }

lemma foo (a : unique_array nat 10) : a.val.length = 10 :=
begin
  cases a,
  tauto,
end

Eric Wieser (Aug 10 2021 at 17:57):

The shorter proof is a.prop.2 (docs#subtype.prop)

Mario Carneiro (Aug 10 2021 at 17:58):

or a.2.2 since you can refer to structure fields either by name or index

Sam (Aug 10 2021 at 17:58):

Oh very cool!

Mario Carneiro (Aug 10 2021 at 17:59):

with the structure def it would be either a.3 or a.len

Sam (Aug 10 2021 at 18:06):

To clarify, what subtype and the def actually create is a tuple, containing a value and a constraint upon that value?

What I intuitively expected was for the result of subtype to itself actually be a constrainedlist, rather than a grouping of a plain list and a constraint upon it. This is fine, it's just not what I expected from the name of the feature.

Mario Carneiro (Aug 10 2021 at 18:10):

In lean, a value can only have one type up to definitional equality. So if val : list A is a list with no duplicates, it still has type list A, it is not possible to prove that it has a different, more constrained type. This is important for type inference, among other things. If you want to show that it has a more constrained type, what you actually do is construct a new value that bundles val with its proof, and that's what subtype does. There are coercions to make this process fairly transparent in most cases

Sam (Aug 10 2021 at 18:11):

I see, thanks! I think I'm getting the hang of this now.

Sam (Aug 10 2021 at 18:17):

One thing that seems a little inconvenient about this is the lack of dot calls. Is there a way to make a.length etc automatically forward to a.val.length?

Yakov Pechersky (Aug 10 2021 at 18:28):

On the subtype? Not automatically. Dot notation l.length is just shorthand for list.length l. Since the subtype isn't of the type list but of subtype ..., there isn't a way to infer that a function subtype.length {x : list _ / ...} ...

Sam (Aug 10 2021 at 18:33):

I see. I obviously don't know what I'm talking about yet, but I wonder whether that would make sense to add in the future. It seems like it might be convenient if it could forward to val when unambiguous, but maybe there's a good reason for not doing that.

Eric Wieser (Aug 10 2021 at 18:57):

Note that you can still write list.length a and the coercion will kick in

Mario Carneiro (Aug 10 2021 at 19:11):

Also, this forwarding does happen for structure extends:

structure foo (A : Type*) := (val : A)

structure bar (A : Type*) extends foo A :=
(stuff : true)

variable (x : bar nat)
#check x.val -- x.to_foo.val

You can't make a structure extend a non-structure though, so it doesn't work for list

Sam (Aug 10 2021 at 19:20):

Ah yes I see. I think that difference between structure extends and subtype was part of where my original confusion came from.

Sam (Aug 10 2021 at 19:22):

Extended structures feel like my intuitive notion of subtyping more than subtype actually does, but that's only a matter of naming and probably stems from me thinking in more typical programming terms rather than type theory terms.

Sam (Aug 10 2021 at 19:22):

Extended structures feel like my intuitive notion of subtyping more than subtype actually does, but that's only a matter of naming and probably stems from me thinking in more typical programming terms rather than type theory terms.

Mario Carneiro (Aug 10 2021 at 19:31):

Keep in mind that extends is also just sugar for an extra field like this:

structure bar (A : Type*) :=
(to_foo : foo A)
(stuff : true)

Mario Carneiro (Aug 10 2021 at 19:32):

combined with a bit of added magic like the dot forwarding

Sam (Aug 11 2021 at 22:00):

Thanks for all of the help yesterday guys! I said in my post that

I should then be able to define additional lemmas like "for all unique_array a with length > 1, (a.rotate 1) != a"

Formally (and slightly more generally for all rotation distances) that's

theorem nodup_rotate (α : Type*) (l : list α) (hl₁ : l.length > 1) (hl₂ : l.nodup) (n : nat)
: ((n % l.length) = 0)  l = (l.rotate n)

I've now proved that theorem, which is my first Lean proof at a difficulty level higher than the Natural Numbers Game. It was quite a slog and it took longer than it should have with lots of things to learn along the way, but I got there. In the end I proved this on plain list not the unique_array type - I can use this proof to trivially prove it for unique_array later if I want that.

The proof ended up being quite long and complicated, and I expect that I've done lots of things very inefficiently. Any feedback on how I could improve it would be very welcome! I've tried to name things well but in some places readability is quite poor. I have a fear that somebody may come along and do it in one line...

One thing that I've found is that it's actually quite annoyingly slow to evaluate in VSCode - like 10 seconds. Are there some tricks to make compilation faster?

import data.list.basic
import data.list.rotate
import data.list.nodup
import tactic.linarith


theorem nodup_rotate (α : Type*) (l : list α) (hl₁ : l.length > 1) (hl₂ : l.nodup) (n : nat)
  : ((n % l.length) = 0)  l = (l.rotate n) :=
begin
  norm_cast at *,

  -- Break up the iff into parts
  fconstructor,

  -- Show that n % l.length = 0 → l = l.rotate n
  {
    intro h,
    have : (l.rotate n) = l.rotate (n % l.length) := (list.rotate_mod l n).symm,
    rw this,
    rw h,
    exact (list.rotate_zero l).symm,
  },

  -- Show that l = l.rotate n → n % l.length = 0
  -- Prove by contradiction, showing that n % l.length ≠ 0 → l ≠ l.rotate n
  {
    intro h₁,
    by_contradiction h₂,

    -- We have two different indices
    let k₁ := 0,
    let k₂ := (k₁ + n) % l.length,
    have h_k₁_ne_k₂ : k₁  k₂ := by finish,

    -- Utility hypothesese required for list indexing - indices must be in range
    have h_k₁_le_len : k₁ < l.length := pos_of_gt hl₁,
    have h_k₂_le_len : k₂ < l.length := (k₁ + n).mod_lt h_k₁_le_len,
    have h_k₁_le_rlen : k₁ < (l.rotate n).length := by finish,

    -- Show that the two different indices have the same value
    have h_vk₁_eq_vk₂: (l.nth_le k₁ h_k₁_le_len) = (l.nth_le k₂ h_k₂_le_len) :=
    begin
      have : (l.rotate n).nth_le k₁ h_k₁_le_rlen = l.nth_le k₂ h_k₂_le_len := list.nth_le_rotate l n k₁ h_k₁_le_rlen,
      finish,
    end,

    -- Different indices having the same value implies that the list is not nodup
    -- Our list is nodup by definition
    -- This is our contradiction
    have : ¬l.nodup := list.nth_le_eq_of_ne_imp_not_nodup l k₁ k₂ h_k₁_le_len h_k₂_le_len h_vk₁_eq_vk₂ h_k₁_ne_k₂,
    finish,
  }
end

Thanks again for all of the help!

Eric Wieser (Aug 11 2021 at 22:10):

Regarding the speed, finish is often a slow tactic

Yakov Pechersky (Aug 11 2021 at 22:11):

src#list.rotate_eq_iff

Yakov Pechersky (Aug 11 2021 at 22:11):

That's how that lemma is proven in more generality in mathlib (by yours truly)

Eric Wieser (Aug 11 2021 at 22:13):

That doesn't look like a comparable statement to me

Yakov Pechersky (Aug 11 2021 at 22:13):

Your norm_cast isn't doing much. Instead of fconstructor, which is a very low-level tactic, it is more customary to use split.

Yakov Pechersky (Aug 11 2021 at 22:14):

Ah I must have misread it. I see.

Eric Wieser (Aug 11 2021 at 22:15):

I don't think the l.length > 1 hypothesis is needed for it to be true?

Sam (Aug 11 2021 at 22:41):

Oh I think that you're right about l.length > 1 not being needed! That's a holdover from a previous version of a slightly different lemma. I do use it in the proof to verify that the indices are in range, but I suppose I should just special case length 0 and length 1.

Sam (Aug 11 2021 at 22:42):

finish being slow is interesting. Those mostly appeared out of me lazily doing hint so I'll try to replace those with something faster.

Sam (Aug 11 2021 at 22:45):

I'm quite surprised that nobody said that anything more drastic was wrong! Would this be considered a reasonably proof? If hypothetically this theorem were being put into mathlib, what would need to change to bring it up to standard?

Sam (Aug 11 2021 at 22:46):

I don't know much about the policy for what gets added to mathlib. I just did this for fun and learning, but is it something that it might be worth turning into a PR or would mathlib not want it?

Sam (Aug 11 2021 at 22:48):

I'll look at that fconstructor / split thing now! Thanks! I got fconstructor from hint, which didn't offer me split. Is there a reason for that?

Sam (Aug 11 2021 at 23:34):

I've been able to turn the final finish into a nice exact h₂ (false.rec (n % list.length l = 0) (this hl₂)), by first using solve_by_elim instead, then using show_term and inlining it.

The other three finishs are giving me a bit more trouble, and show_term on them gives me some 20 line monstrosity. I'm leaving them as finish for now, but if anyone has any tips on how to tidy them up into something that runs faster without being verbose or ugly I'd appreciate it!

Yakov Pechersky (Aug 12 2021 at 04:07):

Here's how I would phrase this lemma:

import data.list.rotate

namespace list

theorem nodup.rotate_eq_self_iff {α : Type*} {l : list α} (hl : l.nodup) (n : ) :
  l.rotate n = l  n % l.length = 0  l = [] :=
begin
  split,
  { intro h,
    cases l.length.zero_le.eq_or_lt with hl' hl',
    { simp [length_eq_zero, hl'] },
    left,
    rw nodup_iff_nth_le_inj at hl,
    refine hl _ _ (nat.mod_lt _ hl') hl' _,
    rw nth_le_rotate' _ n,
    simp_rw [h, nat.sub_add_cancel (nat.mod_lt _ hl').le, nat.mod_self] },
  { rintro (h|h),
    { rw [rotate_mod, h],
      exact (rotate_zero l) },
    { simp [h] } }
end

end list

Sam (Aug 12 2021 at 06:54):

Wow thank you @Yakov Pechersky! That is so much cleaner than mine! I won't be able to properly study what you've done until later, but this is an incredibly valuable learning resource.

Sam (Aug 12 2021 at 06:56):

In runs in a fraction of a second compared to 10 seconds for mine, too. I guess those tactics really were doing me in!


Last updated: Dec 20 2023 at 11:08 UTC