Zulip Chat Archive

Stream: Program verification

Topic: Lists Without Duplicates

Henning Dieterichs (Dec 30 2020 at 20:24):

I want to state that (f x) ++ (g x): List α has no duplicates and that ((f x) ++ (g x).to_finset = h x (maybe there is some notion of a disjoint_finset?).
Later, I would like to argue that if g x = y::ys, then y \not \in ys ++ f x.
What is the best way to formalize that? I thought about stating (f x ++ g x).length = (h x).card. Is there a better way?

Kevin Buzzard (Dec 30 2020 at 20:27):

This is a question about lists, right? Nothing to do with finsets. The way you prove the theorem will depend on how you are saying that L++M has no dupes.

Kevin Buzzard (Dec 30 2020 at 20:30):

Then you just need theorems like nodupes(L++M) iff nodupes(M++L) and nodupes(a::L) -> a not in L.

Henning Dieterichs (Dec 30 2020 at 20:30):

I would like to optimize the formalization for later arguments, not for provability of the theorem itself (I guess it will be easy to prove, but I have to use this argument many times). Well, it is about lists, but there is a connection to finsets. h x is a finset, but I need the order in f x.

Kevin Buzzard (Dec 30 2020 at 20:31):

Sure but h x never occurred in your question

Yakov Pechersky (Dec 30 2020 at 20:31):

Use can use nodup for the first constraint.

Yakov Pechersky (Dec 30 2020 at 20:32):

And there a lot of lemmas about how nodup interfaces with append, concatenate, and mem.

Kevin Buzzard (Dec 30 2020 at 20:32):

Great, if there's a docs#list.nodup that would be perfect

Henning Dieterichs (Dec 30 2020 at 20:33):

oh, nice thanks! Sorry that I missed that.

Kevin Buzzard (Dec 30 2020 at 20:35):

You could look for the lemmas I'm suggesting with library_search and then prove nodup (L++(m::M)) implies m isn't in M++L, although I guess I'm a bit surprised that you have swapped L and M.

Yakov Pechersky (Dec 30 2020 at 20:38):

I haven't had good luck finding the right mem lemmas via library_search, but have found them via suggest.

Henning Dieterichs (Dec 30 2020 at 20:46):

In my case, library_searched worked great!

Henning Dieterichs (Dec 30 2020 at 20:48):

I guess this is the difference between math and program verification:
image.png :D

Kevin Buzzard (Dec 30 2020 at 20:53):

We get really long goals like that, it's just that we don't notice they're long because everything is folded up into typeclasses.

Henning Dieterichs (Dec 30 2020 at 20:57):

How do you stay concentrated when looking at such goals? I tend to get lost in the sheer amount of identifiers. If I'm not very careful, I lose the idea of the proof and it gets very ugly.

Marc Huisinga (Dec 30 2020 at 21:33):

@Henning Dieterichs i ran into a similar problem during my bsc thesis (which was also cs-oriented), also struggling to keep up with the goals.
i found that two things help:

when defining some inductive/definition, start by proving "obvious" facts about the inductive/definition, regardless of whether you think you will need it down the line.
work through longer proofs with pen and paper and denote intuitive subgoals in your paper proof as lemmas, proving them separately. then, write out the big proof in dependence of all these sorry'd lemmas.

your pen and paper proofs should be precise so that you don't need to do so much work for the formalized proof.
in my experience, without such a pen and paper proof, you can get lost very easily, especially when you start applying theorems and simplification like a human pattern matcher without looking at the bigger picture.
meanwhile, the first bullet point means that you won't have to unfold as many layers, leading to more concise goals.

Marc Huisinga (Dec 30 2020 at 21:34):

in the end, i didn't follow those tips nearly well enough myself, which is part of the reason why i couldn't conclude the proof in its entirety.

Henning Dieterichs (Dec 30 2020 at 21:43):

thanks for your advice! Except for the pen&paper proof, I'm already trying to do that. I'm very optimistic to be able to close all my goals, but I'm sure many of my proofs are very bloated and look like brute force.

Yakov Pechersky (Dec 30 2020 at 21:43):

Also, let ... in ... don't simplify cleanly often, so I just avoid using them in definitions often

Yakov Pechersky (Dec 30 2020 at 21:45):

Here, you have a big goal because it's not simplifying the defeq structure record accession

Reid Barton (Dec 30 2020 at 23:00):

This screenshot looks like either not enough abstraction (e.g. some append operation on whatever the structures are) or too much unfolding rather than applying lemmas

Eric Wieser (Dec 31 2020 at 09:56):

You have a lot of subexpressions there of the form {field1 := f, ...}.field1, which simplify to f (perhaps via dsimp only)

Henning Dieterichs (Dec 31 2020 at 15:21):

I struggled to solve this goal with the definition of nodup:

disjoint: disjoint ant_tr1.leaves ant_tr2.leaves,
ant_ih_tr1 :
  ((R ant_tr1).acc ++ (R ant_tr1).inacc ++ (R ant_tr1).red).to_finset = ant_tr1.leaves ∧
    ((R ant_tr1).acc ++ (R ant_tr1).inacc ++ (R ant_tr1).red).nodup,
ant_ih_tr2 :
  ((R ant_tr2).acc ++ (R ant_tr2).inacc ++ (R ant_tr2).red).to_finset = ant_tr2.leaves ∧
    ((R ant_tr2).acc ++ (R ant_tr2).inacc ++ (R ant_tr2).red).nodup
⊢ ((R ant_tr1).acc ++ (R ant_tr2).acc ++ ((R ant_tr1).inacc ++ (R ant_tr2).inacc) ++
     ((R ant_tr1).red ++ (R ant_tr2).red)).nodup

There is list.nodup_append_comm, but the simplifier couldn't make use of it. Manually applying the lemmas to rearrange the terms was too hard for me.

This custom theorem helped a lot: theorem nodup_iff_card { α: Type } [decidable_eq α] (a: list α): a.nodup ↔ a.length = a.to_finset.card. Maybe it's a good candidate for mathlib? There is list.to_finset_card_of_nodup, but it is one direction only.
It was much easier to show (a ++ b ++ c ++ ...).length = (a ++ b ++ c ++ ...).to_finset.card, since nat.add and finset.union are commutative and the simplifier can solve goals where only assoc and comm rules are needed.

Henning Dieterichs (Dec 31 2020 at 15:21):

@Eric Wieser dsimp only helps! I just used unfold before.

Henning Dieterichs (Dec 31 2020 at 15:29):

Reid Barton said:

This screenshot looks like either not enough abstraction (e.g. some append operation on whatever the structures are) or too much unfolding rather than applying lemmas

I felt the same (not only here but for many of my proofs). However, I have no clue how to improve that. At that point, I didn't unfold much and I don't know what new lemmas could help me to get more insight.

This is an except of what I'm working on:

structure LeafPartition := mk :: (acc : list Leaf) (inacc : list Leaf) (red : list Leaf)

inductive Ant (α: Type)
| leaf (a: α) (leaf: Leaf): Ant
| branch (tr1: Ant) (tr2: Ant): Ant
| diverge (a: α) (tr: Ant): Ant

def R : Ant bool → LeafPartition
| (Ant.leaf can_prove_empty n) := if can_prove_empty then ⟨ [], [], [n] ⟩ else ⟨ [n], [], [] ⟩
| (Ant.diverge can_prove_empty tr) :=
    match R tr, can_prove_empty with
    | ⟨ [], [], m :: ms ⟩, ff := ⟨ [], [m], ms ⟩
    | r, _ := r
    end
| (Ant.branch tr1 tr2) :=
    let r1 := R tr1, r2 := R tr2 in
        ⟨ r1.acc ++ r2.acc, r1.inacc ++ r2.inacc, r1.red ++ r2.red ⟩

I want to prove that the result is indeed a partition of all leaves, if the leaves of every branch in ant are disjoint.
The goal above is for the branch case.

Yakov Pechersky (Dec 31 2020 at 15:30):

What does using finset give you? That two lists are the same if you remove duplicates and forget about order?

Yakov Pechersky (Dec 31 2020 at 15:38):

What's Leaf defined as?

Henning Dieterichs (Dec 31 2020 at 15:38):

Leaf is an arbitrary Type

Yakov Pechersky (Dec 31 2020 at 15:40):

The code above currently gives me

don't know how to synthesize placeholder
context:
⊢ Ant bool → Type

in the definition of R

Henning Dieterichs (Dec 31 2020 at 15:41):

Sorry, it is not a MWE... Creating a MWE would be hard. The code was just to give context.
Maybe adding [decidable_eq Leaf] to R fixes this, I have this globally.

Henning Dieterichs (Dec 31 2020 at 15:41):

What does using finset give you? That two lists are the same if you remove duplicates and forget about order?

I know that as ++ bs ++ cs ++ ds has no duplicates. I want to show that ds ++ bs ++ cs ++ as has no duplicates (it's just a permutation - that should be trivial to show).
However, this is difficult, as I cannot automatically rearrange the terms - after all, permutating the lists changes them. (This is a step of my proof in the branch case)

BUT, it is easy to prove with lean that (ds ++ bs ++ cs ++ as).length = (as ++ bs ++ cs ++ ds).length and that (ds ++ bs ++ cs ++ as).to_finset = (as ++ bs ++ cs ++ ds).to_finset! (Since length and to_finset is some kind of homomorphism into a commutative monoid)

Yakov Pechersky (Dec 31 2020 at 15:45):

docs#list.perm.nodup_iff

Henning Dieterichs (Dec 31 2020 at 15:47):

How difficult would it be to convince lean that as ++ bs ++ cs ++ ds is a permutation of ds ++ bs ++ cs ++ as?

example { α: Type } (as bs cs ds: list α): (as ++ bs ++ cs ++ ds) ~ (ds ++ bs ++ cs ++ as) :=
begin
    sorry,
end

Yakov Pechersky (Dec 31 2020 at 15:48):

One min, checking that

Henning Dieterichs (Dec 31 2020 at 15:50):

I think lean's automation really likes equalities. In my limited experience, definitions that are made out of equalities are much easier to reason about than definitions that involve recursively defined predicates

Eric Wieser (Dec 31 2020 at 16:06):

I assume you've already found docs#list.perm_append_comm

Henning Dieterichs (Dec 31 2020 at 16:07):

there is also comm for nodup directly, but is hard to apply manually

Henning Dieterichs (Dec 31 2020 at 16:12):

I think the key to this kind of problem is that

a~a' -> b ~ b' -> (a ++ b) ~ (a' ++ b')

(otherwise you could not really apply perm_append_comm recursively)
...which states that ++ is well-defined on the equivalence class of permutations. But this just yields union on multisets.
And if you remove duplicates, you get finsets, so maybe it might be easier to work with finsets directly?

Eric Wieser (Dec 31 2020 at 16:12):

Working with finsets directly is definitely the way to go here

Eric Wieser (Dec 31 2020 at 16:13):

That lemma is docs#list.perm.append

Yakov Pechersky (Dec 31 2020 at 16:15):

Yeah but that lemma won't work because of how the terms are associated.

Yakov Pechersky (Dec 31 2020 at 16:15):

and there's a missing @[trans] lemma for perms.

Kevin Buzzard (Dec 31 2020 at 16:18):

I would just use the comm and assoc lemmas and swap stuff by hand. You could even train the simplifier to do this

Eric Wieser (Dec 31 2020 at 16:19):

Here's a full proof using those:

example { α: Type } (as bs cs ds: list α) : (as ++ bs ++ cs ++ ds) ~ (ds ++ bs ++ cs ++ as) :=
calc (as ++ bs ++ cs ++ ds) ~ as ++ (bs ++ cs ++ ds) : by simp only [list.append_assoc]
                        ... ~ (bs ++ cs ++ ds) ++ as : list.perm_append_comm
                        ... ~ (bs ++ (cs ++ ds)) ++ as : by simp only [list.append_assoc]
                        ... ~ (bs ++ (ds ++ cs)) ++ as : list.perm.append_right _ $ list.perm.append_left _ list.perm_append_comm
                        ... ~ (bs ++ ds ++ cs) ++ as : by simp only [list.append_assoc]
                        ... ~ (ds ++ bs ++ cs ++ as) : list.perm.append_right _ $ list.perm.append_right _ list.perm_append_comm

Kevin Buzzard (Dec 31 2020 at 16:19):

Now get the simplifier to do it!

Eric Wieser (Dec 31 2020 at 16:20):

But the simplifier can't solve associativity / commutativity goals?

Henning Dieterichs (Dec 31 2020 at 16:20):

In my case though, I have 6 variables (⟨ r1.acc ++ r2.acc, r1.inacc ++ r2.inacc, r1.red ++ r2.red ⟩) :D So without automation it gets really hard

Eric Wieser (Dec 31 2020 at 16:20):

That's what tactic#ac_refl is for, right?

Eric Wieser (Dec 31 2020 at 16:20):

Except that only works for equality

Kevin Buzzard (Dec 31 2020 at 16:20):

I would definitely try the simplifier.

Yakov Pechersky (Dec 31 2020 at 16:20):

It's just key to prove

lemma perm_foldl_append_iff {b : list α} {l l' : list (list α)} :
  foldl append b l ~ foldl append b l' ↔ l ~ l' :=

Eric Wieser (Dec 31 2020 at 16:20):

You can take the quotients of lists under permutations to turn your ~ into an =

Eric Wieser (Dec 31 2020 at 16:20):

But then you have multisets

Kevin Buzzard (Dec 31 2020 at 16:21):

You tell it comm and assoc and left_comm and it will then just put things into its favourite order. I don't know if it'll work but it's worth a try.

Kevin Buzzard (Dec 31 2020 at 16:23):

Failing that, abel should be generalised to all operations, not just addition, and perhaps it should do this too

Marc Huisinga (Dec 31 2020 at 16:25):

not sure if relevant, but i've found docs#list.perm_iff_count very useful for working with perm

Eric Wieser (Dec 31 2020 at 16:27):

Here's how to use abel:

example { α: Type } (as bs cs ds: list α) : (as ++ bs ++ cs ++ ds) ~ (ds ++ bs ++ cs ++ as) :=
begin
  rw ←multiset.coe_eq_coe,
  simp only [←multiset.coe_add],
  abel,
end

Eric Wieser (Dec 31 2020 at 16:30):

I wonder if docs#multiset.coe_add simplifies in the wrong direction

Eric Wieser (Dec 31 2020 at 16:30):

It seems like it's much easier to work with multisets than permutations of lists

Mario Carneiro (Dec 31 2020 at 16:36):

example {α} (as bs cs ds : list α) : (as ++ bs ++ cs ++ ds).nodup ↔ (ds ++ bs ++ cs ++ as).nodup :=
begin
  apply list.perm.nodup_iff,
  rw ← multiset.coe_eq_coe,
  simp only [← multiset.coe_add, add_left_comm, add_comm],
end

Mario Carneiro (Dec 31 2020 at 16:36):

oops, too late

Mario Carneiro (Dec 31 2020 at 16:39):

I don't think coe_add is the wrong direction. Usually you would use the simplifier on a multiset goal after rcases to make it about lists and it will turn everything into lists

Mario Carneiro (Dec 31 2020 at 16:40):

this problem is a little unusual in that you want to add an additional wrapping layer rather than remove one

Henning Dieterichs (Dec 31 2020 at 16:41):

Mario Carneiro said:

example {α} (as bs cs ds : list α) : (as ++ bs ++ cs ++ ds).nodup ↔ (ds ++ bs ++ cs ++ as).nodup :=
begin
  apply list.perm.nodup_iff,
  rw ← multiset.coe_eq_coe,
  simp only [← multiset.coe_add, add_left_comm, add_comm],
end

Which is almost the same as this:

theorem nodup_iff_card { α: Type } [decidable_eq α] (a: list α): a.nodup ↔ a.length = a.to_finset.card :=
begin
    sorry,
end

example {α} [decidable_eq α] (as bs cs ds : list α) : (as ++ bs ++ cs ++ ds).nodup ↔ (ds ++ bs ++ cs ++ as).nodup :=
begin
  rw nodup_iff_card,
  rw nodup_iff_card,
  simp [finset.union_comm, finset.union_left_comm, nat.add_comm, nat.add_left_comm],
end

Mario Carneiro (Dec 31 2020 at 16:43):

except it has one fewer sorry :)

Henning Dieterichs (Dec 31 2020 at 16:45):

I'm calling it a day now. Thank you for all your help in 2020 and see you in 2021! I wish you a happy new year ;)

Kevin Buzzard (Dec 31 2020 at 16:50):

You won't be doing Lean to summon in the new year??

Henning Dieterichs (Dec 31 2020 at 16:50):

sorry,

Yakov Pechersky (Dec 31 2020 at 17:28):

@Henning Dieterichs
These are the sorts of definitional lemmas I would make:

import data.list.basic

variables {Leaf : Type} [decidable_eq Leaf]

structure LeafPartition (α : Type) :=
(acc : list α := [])
(inacc : list α := [])
(red : list α := [])

inductive Ant (α β : Type)
| leaf (a : α) (β : Leaf) : Ant
| branch (tr1 : Ant) (tr2 : Ant) : Ant
| diverge (a : α) (tr : Ant) : Ant

open Ant

def R : Ant bool Leaf → LeafPartition Leaf
| (leaf bb l) := ite bb {red := [l], ..} {acc := [l], ..}
| (diverge bb tr) := ite bb (R tr) (
    match R tr with
      | ⟨[], [], m :: ms⟩ := {inacc := [m], red := ms, ..}
      | _ := R tr
    end)
| (branch x y) := {
    acc := (R x).acc ++ (R y).acc,
    inacc := (R x).inacc ++ (R y).inacc,
    red := (R x).red ++ (R y).red
  }

lemma R_leaf_true (l : Leaf) : R (leaf tt l) = {red := [l], ..} := rfl
lemma R_leaf_false (l : Leaf) : R (leaf ff l) = {acc := [l], ..} := rfl
lemma R_branch (x y : Ant bool Leaf) : R (branch x y) =
  { acc := (R x).acc ++ (R y).acc, inacc := (R x).inacc ++ (R y).inacc,
    red := (R x).red ++ (R y).red } := rfl
lemma R_diverge_true (tr : Ant bool Leaf) : R (diverge tt tr) = R tr := rfl
lemma R_diverge_false_cons {tr : Ant bool Leaf} {m : Leaf} {ms : list Leaf}
  (h : R tr = {red := m :: ms, ..}) :
  R (diverge ff tr) = {inacc := [m], red := ms, ..} :=
by simp [R, h]
lemma R_diverge_false_not {tr : Ant bool Leaf} {m : Leaf} {ms : list Leaf}
  (h : (R tr).red = [] ∨ (R tr).acc ≠ [] ∨ (R tr).inacc ≠ []) :
  R (diverge ff  tr) = R tr :=
begin
  rcases hr : (R tr) with ⟨_ | ⟨hac, tlac⟩, _ | ⟨hin, tlin⟩, _ | ⟨hre, tlre⟩⟩;
  simpa [hr] using h <|> simp [R, hr],
end

Mario Carneiro (Dec 31 2020 at 17:49):

you should use tt and ff instead of true and false

Yakov Pechersky (Dec 31 2020 at 17:52):

Fixed. In general, there'd likely be some way to have it be over Prop and not bool, right?

Mario Carneiro (Dec 31 2020 at 17:53):

Well R itself is over bool so the lemmas should be too

Mario Carneiro (Dec 31 2020 at 17:53):

if it used Prop though your cases should use p -> ... and \neg p -> ... instead of true and false

Yakov Pechersky (Dec 31 2020 at 17:59):

Right, I meant that R itself might be over Prop. Since the original example had the bools named as can_prove_empty

Mario Carneiro (Dec 31 2020 at 18:00):

Ant has three type arguments btw, that seems odd

Mario Carneiro (Dec 31 2020 at 18:00):

the beta argument is unused

Yakov Pechersky (Dec 31 2020 at 18:11):

Ah, I meant

inductive Ant (α β : Type)
| leaf (a : α) (b : β) : Ant
| branch (tr1 : Ant) (tr2 : Ant) : Ant
| diverge (a : α) (tr : Ant) : Ant

Yakov Pechersky (Dec 31 2020 at 18:12):

Anyway, it was an attempt to make the example shared by the other poster above compile, just to show examples of the structure default values and the definitional lemmas.

Mario Carneiro (Dec 31 2020 at 18:12):

Here's a proof of something that I think roughly approximates the goal:

import data.multiset.basic

variables {Leaf : Type}

structure LeafPartition (α : Type) :=
(acc : list α := [])
(inacc : list α := [])
(red : list α := [])

inductive Ant (α β : Type)
| leaf (a : α) (b : β) : Ant
| branch (tr1 : Ant) (tr2 : Ant) : Ant
| diverge (a : α) (tr : Ant) : Ant

open Ant

def R : Ant bool Leaf → LeafPartition Leaf
| (leaf bb l) := ite bb {red := [l], ..} {acc := [l], ..}
| (diverge bb tr) := ite bb (R tr) (
    match R tr with
      | ⟨[], [], m :: ms⟩ := {inacc := [m], red := ms, ..}
      | _ := R tr
    end)
| (branch x y) := {
    acc := (R x).acc ++ (R y).acc,
    inacc := (R x).inacc ++ (R y).inacc,
    red := (R x).red ++ (R y).red
  }

def leaves : Ant bool Leaf → list Leaf
| (leaf bb l) := [l]
| (diverge bb tr) := leaves tr
| (branch x y) := leaves x ++ leaves y

section
local attribute [simp] leaves R add_left_comm add_comm
theorem R_perm (a : Ant bool Leaf) : leaves a ~ (R a).acc ++ (R a).inacc ++ (R a).red :=
begin
  rw ← multiset.coe_eq_coe, simp [← multiset.coe_add],
  induction a,
  { simp; split_ifs; simp },
  { simp [← multiset.coe_add, *] },
  { simp [-multiset.coe_eq_coe], split_ifs; simp [← multiset.coe_add, *],
    rcases hr : (R a_tr) with ⟨_ | ⟨hac, tlac⟩, _ | ⟨hin, tlin⟩, _ | ⟨hre, tlre⟩⟩;
    refl }
end
end

Mario Carneiro (Dec 31 2020 at 18:13):

the proof is a bit bombastic, it just cases on everything

Henning Dieterichs (Jan 01 2021 at 11:25):

Mario Carneiro said:

Here's a proof of something that I think roughly approximates the goal:

...
theorem R_perm (a : Ant bool Leaf) : leaves a ~ (R a).acc ++ (R a).inacc ++ (R a).red :=
begin
  rw ← multiset.coe_eq_coe, simp [← multiset.coe_add],
  induction a,
  { simp; split_ifs; simp },
  { simp [← multiset.coe_add, *] },
  { simp [-multiset.coe_eq_coe], split_ifs; simp [← multiset.coe_add, *],
    rcases hr : (R a_tr) with ⟨_ | ⟨hac, tlac⟩, _ | ⟨hin, tlin⟩, _ | ⟨hre, tlre⟩⟩;
    refl }
end
...

Nice :flushed: I didn't think such a short proof could do it.
I wonder though, I thought simp should, in general, be only used for closing goals?

Marc Huisinga (Jan 01 2021 at 14:04):

Henning Dieterichs said:

I wonder though, I thought simp should, in general, be only used for closing goals?

if you want to use simp in the middle of a proof (e.g. because the subgoal is extremely verbose), one way to make simp less brittle is to run squeeze_simp instead and then paste the output for a simp only call. this can also greatly boost performance (or at least it used to).

Henning Dieterichs (Jan 01 2021 at 18:33):

Thanks for pointing that out. I often wondered which lemmas simp used.

Bryan Gin-ge Chen (Jan 01 2021 at 18:50):

For more low-level output you can also turn on set_option trace.simplify true, see the start of the community page on the simplifier.

Last updated: May 02 2025 at 03:31 UTC