Stream: new members

Topic: Generalization of map_diff

Parker J. Rule (Apr 26 2022 at 00:50):

Hi all! I'm a Lean newbie primarily interested in formal verification of software. I'd like to show the following:

lemma list_map_diff_subset [decidable_eq α] [decidable_eq β] :
∀ (f : α → β) (l₁ : list α) (l₂ : list α),
((list.map f l₁).diff (list.map f l₂)) ⊆ (list.map f (l₁.diff l₂))


This statement is a generalization of map_diff in mathlib (which critically depends on map_foldl_erase) to the case where f is not necessarily injective. (Clearly, the two sides must only be equal when f is injective; consider the case of a constant f.) For my purposes, I'd also just be happy to show

lemma list_map_diff_length [decidable_eq α] [decidable_eq β] :
∀ (f : α → β) (l₁ : list α) (l₂ : list α),
((list.map f l₁).diff (list.map f l₂)).length ≤ (l₁.diff l₂).length


I'm not sure how to approach this—the definition of diff is slightly hairy compared to list concatenation, etc., which make it difficult to naively apply induction. Any tips on where to start? Are there any theorems in mathlibthat look roughly like this? Is it more natural to consider sublists instead of subsets? Thanks! :)

Patrick Johnson (Apr 26 2022 at 14:59):

That's an interesting lemma and, as far as I can tell, it's not in mathlib. I proved your first lemma, but the proof is very long. The proof can be found here.

Parker J. Rule (Apr 26 2022 at 19:29):

Wow, thank you so much for the quick response! I look forward to going through the proof in more detail later. Would it be alright with you if I used this (with attribution, of course) in an open-source project related to the formalization of differential privacy mechanisms? (It might also be worth getting this into mathlib...)

Kevin Buzzard (Apr 26 2022 at 19:46):

I looked at Patrick's proof and was surprised about how long it was, but when I tried this myself I realised I would probably end up writing something just as long. The only thing which occurred to me about a possible different approach is to change the question into a question about multisets. The question at stake doesn't care about the order of the lists so it's really a question about the associated multisets and I idly wondered whether they'd be easier to work with.

Patrick Johnson (Apr 26 2022 at 20:37):

Parker J. Rule said:

Would it be alright with you if I used this (with attribution, of course) in an open-source project

Feel free to use it, no need for attribution :)

Junyan Xu (Apr 26 2022 at 22:15):

multiset version is indeed easy:

import data.multiset.basic
open multiset
variables {α β : Type*} [decidable_eq α] [decidable_eq β] (f : α → β)

lemma multiset.map_diff_subset (s₁ : multiset α) (s₂ : multiset α) :
s₁.map f - s₂.map f ≤ (s₁ - s₂).map f :=
begin
rw [tsub_le_iff_right, le_iff_count],
exact card_le_of_le (filter_le_filter _ \$ tsub_le_iff_right.1 le_rfl),
end

lemma list.map_diff_subset (l₁ : list α) (l₂ : list α) :
(l₁.map f).diff (l₂.map f) ⊆ (l₁.diff l₂).map f :=
begin
simp only [← coe_subset, ← coe_sub, ← coe_map],
exact subset_of_le (multiset.map_diff_subset f _ _),
end


Kevin Buzzard (Apr 26 2022 at 22:19):

Oh nice! Ha ha i stopped when sub_le_iff_le_add didn't work, I should have thought to look in the multiset namespace!

Junyan Xu (Apr 26 2022 at 22:22):

updated with list version

Oh very nice

Eric Wieser (Apr 26 2022 at 22:24):

Is a version of this true with docs#list.sublist?

Eric Wieser (Apr 26 2022 at 22:25):

I would guess that version (assuming it's true) requires the much longer proof we started with, since it doesn't follow from the multiset version

Junyan Xu (Apr 26 2022 at 22:26):

yes, probably needs induction.

Eric Wieser (Apr 26 2022 at 22:26):

I'd kind of expect you can get there by just modifying the existing map_diff proof and all the lemmas it uses along the way

Eric Wieser (Apr 26 2022 at 22:27):

Which is the right approach anyway, because eventually someone will come along and ask for the weaker version of those lemmas too

Junyan Xu (Apr 26 2022 at 22:28):

Kevin Buzzard said:

Oh nice! Ha ha i stopped when sub_le_iff_le_add didn't work, I should have thought to look in the multiset namespace!

Yes, it's weird thatsub_le_iff_le_addcan't be directly used to rewrite; it should be enabled by the instance docs#multiset.has_ordered_sub. On the other hand multiset.sub_le_iff_le_add is protected and is not supposed to be used directly?

Junyan Xu (Apr 26 2022 at 22:33):

Hmm, tsub_le_iff_right is enabled by the instance, not sub_le_iff_le_add. Rewriting by the former works, so everything works as intended.

Junyan Xu (Apr 26 2022 at 23:18):

@Eric Wieser Here's a counterexample to the sublist version:

def l₁ := [0,1,2]
def l₂ := [2]
def f : ℕ → ℕ
| 1 := 1
| _ := 0

#eval (l₁.map f).diff (l₂.map f) /- [1, 0] -/
#eval (l₁.diff l₂).map f /- [0, 1] -/
#eval ((l₁.map f).diff (l₂.map f) <+ (l₁.diff l₂).map f : bool) /- ff -/


Yaël Dillies (Apr 27 2022 at 06:49):

Junyan Xu said:

Hmm, tsub_le_iff_right is enabled by the instance, not sub_le_iff_le_add. Rewriting by the former works, so everything works as intended.

Indeed, sub_le_iff_le_add is a lemma about add_group. tsub_le_iff_le_right is a lemma about has_ordered_sub, truncated subtraction, as the name indicates.

Yaël Dillies (Apr 27 2022 at 06:51):

Note that tsub_le_iff_right is strictly more general than sub_le_iff_le_add but we decided to duplicate the API nonetheless to have a uniform interface for tsub. That could change.

Junyan Xu (Apr 27 2022 at 14:29):

Yeah, I didn't know what protected does to a declaration, and only checked the docs last night to find out it means you have to prepend the namespace when calling that declaration even if you opened that namespace, so that's why I had to do multiset.sub_le_iff_le_add.

Parker J. Rule (Apr 27 2022 at 15:05):

Junyan Xu said:

multiset version is indeed easy:

Thanks so much, all! This makes a lot of sense. Indeed, my problem was originally in terms of multisets, but the induction tactic steered me toward formulating it in terms of lists. I was able to generalize  multiset.map_diff_subset to a notion of symmetric difference:

/- Junyan's snippet here... -/

@[simp] def sym_diff (x : multiset α) (y : multiset α) : multiset α
:= (x - y) + (y - x)
infix  Δ :51 := sym_diff

theorem multiset.map_sym_diff_subset (s₁ s₂ : multiset α) :
(s₁.map f) Δ (s₂.map f) ≤ (s₁ Δ s₂).map f :=
begin
simp only [sym_diff],
have h₁ : s₁.map f - s₂.map f ≤ (s₁ - s₂).map f
:= by { exact multiset.map_diff_subset _ _ _ },
have h₂ : s₂.map f - s₁.map f ≤ (s₂ - s₁).map f
:= by { exact multiset.map_diff_subset _ _ _ },
have h₃ : (s₁.map f - s₂.map f) + (s₂.map f - s₁.map f)
≤ (s₁ - s₂).map f + (s₂ - s₁).map f
have h₄ : (s₁ - s₂).map f + (s₂ - s₁).map f
≤ ((s₁ - s₂) + (s₂ - s₁)).map f
:= by { simp },
exact le_trans h₃ h₄
end


...which was my ultimate goal. Any tips on cleaning this proof up? :) It seems overly calculation-heavy...

Ruben Van de Velde (Apr 27 2022 at 15:13):

Somewhat less explicitly:

theorem multiset.map_sym_diff_subset (s₁ s₂ : multiset α) :
(s₁.map f) Δ (s₂.map f) ≤ (s₁ Δ s₂).map f :=
begin
simp only [sym_diff],
transitivity (s₁ - s₂).map f + (s₂ - s₁).map f,

You could use the calc blocks, if you've seen them
also := by { exact [xyz] } can just be replaced with := [xyz] (term mode)