Zulip Chat Archive

Stream: general

Topic: int-valued fincard

Kevin Buzzard (Mar 19 2021 at 15:43):

how about fincard is integer-valued and all infinite sets get sent to -1?

Sebastien Gouezel (Mar 19 2021 at 15:46):

If you want fincard (A x B) = fincard A * fincard B, you better send infinite sets to 0. There are often better choices for junk values than just random choices (and most of the time the good choice is 0, unless it is 37).

Kevin Buzzard (Mar 19 2021 at 15:50):

Aah yes in fact I already knew that :-/ Somewhere I even have a proof of fincard.prod, it's quite unpleasant, you need to do pretty much all nine cases empty, finite and non-empty, infinite. However maybe we don't want fincard.prod to apply to all sets? People will only use it when both sets are finite anyway.

Peter Nelson (Mar 19 2021 at 15:52):

I do like the -1 proposal. In my API, infinite sets go to zero, and this forces finiteness hypotheses in lots of places, such as the following:

lemma size_le_one_iff_mem_unique (hs : s.finite) :
  size s ≤ 1 ↔ ∀ e f ∈ s, e = f :=  sorry

I could live with products being a bit of a pain in exchange for size zero really meaning empty. I would also be very happy if it could be int-valued.

Peter Nelson (Mar 19 2021 at 15:54):

Just realized that infinite sets having size -1 also breaks the above lemma, as well as the lemma that size is nonnegative. I think I still prefer it, though.

Peter Nelson (Mar 19 2021 at 15:55):

Mainly int-valued is what I want

Peter Nelson (Mar 19 2021 at 15:56):

Maybe one could argue that int fits better than nat with the noncomputable setting where fincard, finsum etc live.

Kevin Buzzard (Mar 19 2021 at 15:57):

I am not really clear why we can't just have a nat version and then you coerce into int yourself. This conversation should be going on elsewhere by the way.

Bryan Gin-ge Chen (Mar 19 2021 at 16:03):

(I've moved the int-valued size / fincard messages to this new thread).

Kevin Buzzard (Mar 19 2021 at 16:19):

OK so I claim that fincard should be nat-valued, and that you should just be coercing it into int and using push_cast

Mario Carneiro (Mar 19 2021 at 16:22):

I can't tell how much of this is people joking, but fwiw: I'm strongly against using -1 for infinite sets or as a substitute for none. What is this, C?

Mario Carneiro (Mar 19 2021 at 16:24):

And size isn't negative, so using int is just silly. If you want int subtraction, use \u (fincard s)

Peter Nelson (Mar 19 2021 at 16:34):

What is the smoothest way to prove the second lemma here from the first if size is in nat?

lemma size_modular (s t : set α) (hs : s.finite) (ht : t.finite) :
  size (s ∪ t) + size (s ∩ t) = size s + size t :=
sorry

lemma size_union (s t : set α) (hs : s.finite) (ht : t.finite) :
  size (s ∪ t) = size s + size t - size (s ∩ t) :=
sorry

If it is in int, then linarith [size_modular s t hs ht] will work.

Jason KY. (Mar 19 2021 at 16:37):

Does zify do anything?

Peter Nelson (Mar 19 2021 at 16:38):

It changes the goal by coercing both sides to int.

Jason KY. (Mar 19 2021 at 16:39):

(deleted)

Peter Nelson (Mar 19 2021 at 16:40):

no - I don't think it can unfold the coercions.

Jason KY. (Mar 19 2021 at 16:40):

Sorry, I didnt see you had a subtraction in the second lemma

Alex J. Best (Mar 19 2021 at 16:40):

constants {a b c d : ℕ}
lemma size_modular :
  a + b = c + d :=
sorry

lemma size_union  :
  a = c + d - b :=
begin
simp [size_modular.symm],
end

Alex J. Best (Mar 19 2021 at 16:41):

I changed it to some random nats as I couldn't work out the right imports, but something like this should work

Kevin Buzzard (Mar 19 2021 at 16:43):

(deleted)

Peter Nelson (Mar 19 2021 at 16:45):

I am now trying to refactor my matroid project to use nat size, to see how bad things get. My current approach uses 'linarith and forget' in lots of places, and I want to see how that pans out with nat and coercions. In the above example, simp came to the rescue, but I'm worried it won't be so easy in larger cases.

Yakov Pechersky (Mar 19 2021 at 16:53):

(deleted)

Peter Nelson (Mar 19 2021 at 16:56):

Another example : I have the following goal.

⊢ size (s ∪ t) = size (s ∪ t) - size (s ∩ t) + size (s ∩ t).

This is true if size is nat-valued because of a fact about sets. This is true if size is int-valued because of a fact about int. In this (and many more complicated examples), I don't want to have to care about the fact about sets, or to turn the proof by ring into something more involved.

Kevin Buzzard (Mar 19 2021 at 16:58):

If you are using nats then you shouldn't be using subtraction at all. How did that subtraction end up in your goal?

Yakov Pechersky (Mar 19 2021 at 17:00):

Isn't it the case that the way size works is because of facts about sets?

Peter Nelson (Mar 19 2021 at 17:00):

Maybe I'm taking the size of the complement of something? Or the difference of the size of the set and one of its subsets? Here it's because I'm doing the latter in proving the identity size (s ∪ t) = size (s \ t) + size (t \ s) + size (s ∩ t).

Yakov Pechersky (Mar 19 2021 at 17:01):

size had better work in a reasonable way given the set-fact s ∩ t ≤ s ∪ t

Kevin Buzzard (Mar 19 2021 at 17:02):

this looks to me like it should follow from size (a union b) = size a + size b if disjoint

Yakov Pechersky (Mar 19 2021 at 17:02):

I think there are two ways of looking at the goal you had, it is either a statement about how size works, given some knowledge about sets, or how union and inter work, given some knowledge about size.

Kevin Buzzard (Mar 19 2021 at 17:02):

This stuff should be in the API for size and it's not unreasonable to ask the person writing the API to struggle a bit with nats, but this certainly looks doable without too much trouble.

Peter Nelson (Mar 19 2021 at 17:04):

Kevin Buzzard said:

this looks to me like it should follow from size (a union b) = size a + size b if disjoint

Absolutely! None of this is hard to prove - it's just that 'rearrange every calculation involving size so that all signs are +` is a constraint imposed upon me by the language that gets in the way of how I think about these things as a mathematician.

Eric Wieser (Mar 19 2021 at 17:06):

Kevin Buzzard said:

this looks to me like it should follow from size (a union b) = size a + size b if disjoint

Right, rewrite s ∪ t = (s ∩ t) ∪ (t \ s) ∪ (s \ t) somehow and apply a lemma about size of disjoint sets.

Kevin Buzzard (Mar 19 2021 at 17:06):

then just coerce to int. But when you're making the actual API it must be made for nat.

Peter Nelson (Mar 19 2021 at 17:06):

Is the answer to the question 'what is the size of the difference of a set and one of its subsets?' really not that it's a subtraction of sizes?

Kevin Buzzard (Mar 19 2021 at 17:07):

You can do what you like with size, I'm just saying that for the API one has to struggle to get it all into nat.

Yakov Pechersky (Mar 19 2021 at 17:07):

variables {m n : ℕ} (hn : m ≤ n)

example : n - m + m = n := nat.sub_add_cancel hn

Yakov Pechersky (Mar 19 2021 at 17:08):

And you know for sure that s ∩ t ≤ s ∪ t and size.mono

Peter Nelson (Mar 19 2021 at 17:09):

Yes, but ring is easier. Of course the nat API can handle all this stuff- I'm just saying it makes a lot of it much uglier.

Kevin Buzzard (Mar 19 2021 at 17:09):

Basically the API shouldn't have any subtraction in it (and probably no subtractions even in the proofs) and then when it comes to the case of subsets then the API gives you the <= and now you can choose whether to struggle more with nats or just coerce

Peter Nelson (Mar 19 2021 at 17:11):

When you say 'just coerce', do you mean 'write (fincard s : int) instead of fincard s everywhere? If so, you need an API for coercions of fincard or spend half your life translating back and forth.

Kevin Buzzard (Mar 19 2021 at 17:11):

The nat API is fine. I remember the days before ring and all you have to do is know (how to find) the gazillion lemmas you need. But the API writer is the one doing the struggling and that doesn't have to be you, you can just sorry stuff

Kevin Buzzard (Mar 19 2021 at 17:11):

The API is already there, it's just the coercion from nat to int.

Kevin Buzzard (Mar 19 2021 at 17:13):

Another thing is that ring works fine as long as you're not doing subtraction. The big mistake is to do a nat subtraction in the first place. If you want a-b and you know b<=a then maybe the variable should not have been a, maybe a-b should have been the other variable

Kevin Buzzard (Mar 19 2021 at 17:16):

I guess my opinion on this is like Mario's opinion on fincard. Mario knows full well that everything you want to do can be done with finset.sum but because you got frustrated making it work we're trying finsum. Now similarly I think everything should be doable with nat because I spent a lot of time wrestling with nat before ring, but it's causing you hassle so I say just coerce to int if that works for you. But when we're making the API we need to stick to nat and I am confident that there will be no problems

Kevin Buzzard (Mar 19 2021 at 17:17):

PS I was waiting for finsum to get merged before fincard

Peter Nelson (Mar 19 2021 at 17:19):

In many cases, I really have a set, I really have one of its subsets, and I really want to look at the cardinality of the difference, and maybe do complicated things with the term it gives me, while not wanting to be keeping track of what has what type. I'm not looking at the wrong variables. The implications of doing what you are suggesting for my project are a lot of mental gymnastics to solve problems that are distinctly unmathematical, and don't currently exist just using int.

Yakov Pechersky (Mar 19 2021 at 17:20):

For reference for the types of coercions Kevin mentioned:

variables (m n : ℕ)

-- one way of coercion, given an existing goal about nats
example (hn : m ≤ n) : n - m + m = n :=
begin
  rw ←int.coe_nat_inj',
  simp, -- still needs a m ≤ n to coerce ↑(n - m) to ↑n - ↑m
  simp [int.coe_nat_sub hn]
end

-- second way of coercion, make the goal about ints initially
example : (n : ℤ) - m + m = n :=
begin
  simp
end

Peter Nelson (Mar 19 2021 at 17:20):

And I figured that about fincard - looking forward to it!

Yakov Pechersky (Mar 19 2021 at 17:23):

Do you have a simp lemma saying

@[simp] lemma size_sub {s t : set α} (h : t ≤ s) : size s - size t = size (s \ t) := sorry

Yakov Pechersky (Mar 19 2021 at 17:23):

Or would you prefer it be reversed in your use case?

Kevin Buzzard (Mar 19 2021 at 17:24):

The cardinality of the difference is precisely the variable you are missing. You don't want to express this as a-b. You want this to be the variable.

Peter Nelson (Mar 19 2021 at 17:25):

But that is the size of the set of mammals that are not elephants. I would rather look at the elephants and look at the mammals.

Kevin Buzzard (Mar 19 2021 at 17:25):

The point is that the difference is a thing, s\t, and its cardinality exists, meaning that you never have to create it by doing the subtraction

Kevin Buzzard (Mar 19 2021 at 17:26):

Every time you do a nat subtraction you're thinking about it in the wrong way because you are implicitly making use of a hypothesis which says that something is <= something else.

Kevin Buzzard (Mar 19 2021 at 17:27):

If you want to throw away that usage then coerce to int and use push_cast, a tactic specifically designed to deal with the coercions

Yakov Pechersky (Mar 19 2021 at 17:28):

I think the difference here is that Kevin is saying you do all of the rearrangements (union, inter, diff, etc) on the sets first, then talk about size, and write lemmas that push operations on size into operations on sets, while Peter is saying you expand out all the set rearrangements into statements about size, and talk about numbers instead.

Kevin Buzzard (Mar 19 2021 at 17:28):

But you have to pay somehow if you want to use subtraction. Our opinions differ as to how you should be paying

Peter Nelson (Mar 19 2021 at 17:28):

That's why I don't want to do nat subtractions. I'm working in integer-land, with many quantities that are both cardinalities of sets and other things, and I want to treat cardinalities just like the other terms they're being combined with.

Peter Nelson (Mar 19 2021 at 17:28):

If we're paying, then your way is far more expensive for me

Kevin Buzzard (Mar 19 2021 at 17:29):

Then just coerce to integers, never use natural subtraction, and sorry all the card results you can't be bothered to prove and I'll deal with them in the fincard PR :-)

Peter Nelson (Mar 19 2021 at 17:29):

still harder than linarith

Peter Nelson (Mar 19 2021 at 17:30):

but I'll give it a try!

Kevin Buzzard (Mar 19 2021 at 17:30):

I just don't think we can justify an integer fincard

Peter Nelson (Mar 19 2021 at 17:31):

yeah, and I'm not really trying to say that one makes sense in mathlib

Kevin Buzzard (Mar 19 2021 at 17:31):

But I think that "push_cast at *; linarith" is not too much to ask

Peter Nelson (Mar 19 2021 at 17:31):

I really think it makes sense for me though - your suggestion of PRing is what put this tension in my head

Kevin Buzzard (Mar 19 2021 at 17:34):

What I am conflicted about here is that in some sense I know you don't need fincard and that judicious use of stuff like decidable equality will get you through. But I tired of this. And now the boot is on the other foot -- I'm claiming that all that is happening that either you don't know how to use nat.sub or you don't know how to use pushcast, however the same logic applies: there could be a string of people behind you who all want everything to be in int just because it's easier for people who didn't go through what I went through with arithmetic in 2018 when there was less automation

Peter Nelson (Mar 19 2021 at 17:38):

Yeah, it feels like something of a similar nature. Where my fight is coming from is the feeling that subtracting sizes is how I would teach certain things to someone, and being forced to never subtract is just a silly constraint imposed on me by the implementation of a proof assistant. Of course I absolutely understand that cardinality is nat-valued.

Peter Nelson (Mar 19 2021 at 17:40):

It is just very helpful to mentally embed nat in int when convenient. Unfortunately you have to pay for this in lean.

Kevin Buzzard (Mar 19 2021 at 17:44):

Whenever you use subtraction you are implicitly invoking the fact that it is "safe" because some set is a subset of another set. In general it's not true that size s/t is size s - size t, whether you're nat or int valued. You used that all elephants were mammals above. The question is where you want to store this information. If you use nat subtraction then basically you're saying that you have access to a proof that the thing you're subtracting is at most the thing you're subtracting it from. But if that proof is hidden away somewhere then you'll have to explicitly feed it to lemmas. I'm suggesting that in most cases your hidden proof is that you're doing a-b with b the size of a subset of the thing a is the size of, and here instead of doing the subtraction you can just use the card of the diff set

Sebastien Gouezel (Mar 19 2021 at 17:44):

A side -note. I had played a lot with distances in Isabelle, especially with Gromov-hyperbolic stuff where you keep on subtracting distances. For this, the fact that distances are taking values in reals, and not nonnegative reals, is extremely helpful, because otherwise you would need to add coercions to reals all the time. At one point, there was a suggestion that, in mathlib, maybe we could have nnreal-valued distances because, hey, distances are nonnegative. I argued strongly against this because of my previous experience, and then we kept the distances to be real-valued. And I never regretted it.

The situation is essentially the same here. I can't see what we gain with a nat-valued fincard compared to an int-valued fincard, but I can see what we lose.

Sebastien Gouezel (Mar 19 2021 at 17:46):

(The Gromov product in a metric space is $(x, y)_a = (d(x, a) + d(y, a) - d(x, y))/2$ . It looks like a crazy definition but in fact there is something deep behind it. And the subtraction always makes sense also in nnreal because d(x, y) \le d(x,a) + d(y, a) by the triangular inequality, but if one had to fight subtraction all the time playing with this would be a nightmare).

Kevin Buzzard (Mar 19 2021 at 17:48):

Well this is a very interesting counterpoint. @Mario Carneiro can you explain again why we're allowed to have metric spaces with a distance function taking values in the reals but we have to have fincard taking values in nat?

Peter Nelson (Mar 19 2021 at 17:50):

This is absolutely on the money!

A related notion in matroid theory is the dual rank function; given a primal rank function r : set T -> int (which is axiomatically required to be nonnegative), the dual rank function r' : set T -> int is defined by r' X = size X + r Xᶜ - r univ. This, as one can prove from the other axioms, is also always nonnegative. But it includes a subtraction. On the one hand, you could say that both functions, being nonnegative, should be nat-valued. On the other hand, I really wouldn't appreciate having to think about that proof of nonnegativity whenever I refer to the dual rank of a set, and constantly be invoking some API lemma to make sure its definition doesn't have any subtraction overflow. So I make the rank function int-valued, and everything is fine!

Sebastien Gouezel (Mar 19 2021 at 17:51):

In fact we also have the nnreal-valued distance, called nndist. But since it has essentially no advantage over the real-valued distance no-one uses it.

Sebastien Gouezel (Mar 19 2021 at 17:52):

For norms, we have norm and nnnorm, and people use one or the other depending on what is more convenient (with normbeing used maybe 80% of the time and nnnorm 20%)

Kevin Buzzard (Mar 19 2021 at 17:53):

This dual rank example is a great example.

Kevin Buzzard (Mar 19 2021 at 17:53):

And I think it's certainly reasonable to argue that card is some kind of norm so both versions should be available

Johan Commelin (Mar 19 2021 at 17:55):

In LTE we don't use nndist, but we use nnnorm a lot

Johan Commelin (Mar 19 2021 at 17:55):

It's really convenient that the type system keeps track of the nonnegativity

Kevin Buzzard (Mar 19 2021 at 17:55):

But all we are doing is chasing around factors about how norms grow under bounded linear maps there

Sebastien Gouezel (Mar 19 2021 at 17:56):

So fincard and nnfincard?

Kevin Buzzard (Mar 19 2021 at 17:56):

I'm coming to this conclusion

Peter Nelson (Mar 19 2021 at 17:56):

Boy, do I have an API ready for you!

Kevin Buzzard (Mar 19 2021 at 17:56):

Lol

Mario Carneiro (Mar 19 2021 at 18:16):

I agree that the parallel with nndist is worth looking at here. I think the difference is that dist (or norm) is actually a field of the structure, so for example with the Gromov product example that expression (which is obviously real and not obviously nonnegative) is going in as the definition of the distance function, so it is convenient for the distance function to be real-valued to receive such an expression. With card this is not the case - cardinality is an operation on sets/finsets and is not coming out of any structure, it's just an emergent property of sets. Here, it's useful to get as tight information as possible into the type system, hence it should be nat valued, because things like \u (card s) aren't operations that cancel, it's just a term in your formula that can be manipulated with the usual tactics like push_cast.

Mario Carneiro (Mar 19 2021 at 18:18):

Plus, nat is a much more versatile and heavily lemma'd type than nnreal

Mathieu Guay-Paquet (Mar 19 2021 at 18:27):

I want to re-emphasize Peter's example of the dual matroid: if you're just dealing with set operations and sizes, it's easy to imagine that every instance of subtraction comes from looking at a pair of nested sets, in which case you can look at the size of the set difference. In that context, nat-valued size makes sense, because being non-negative is really a syntactic property of the expressions you're looking at. But once you deal with matroids and duality, it is absolutely a theorem and not a syntactic fact that the expressions involved are non-negative.

Peter Nelson (Mar 19 2021 at 18:30):

Yeah - I think the case that matroid rank should be int-valued is convincing in a way that surpasses the case for int-valued size. But I am a real fan of these analogies with distance, and they do ring true in how I think about size.

What are the practical advantages of set size being nat-valued? I can see that it makes induction a bit easier. But is this such a compelling argument that an API for int-valued size is inappropriate? In my mind, the practical advantages of that are obvious.

Peter Nelson (Mar 19 2021 at 18:45):

Another example; an object appearing in graphs, matroids and more is a 'connectivity function', which is a certain type of nonnegative function k : set T -> int. There is quite a nice theory of these objects. One way to get one is to take the function defined by k X := M.r X + M.r' X - size X, where the r and r' are the primal and dual rank functions of a matroid M. Like in contexts before, this is nonnegative, but this is a theorem one needs to prove.

Of course one could define this as a coercion, but no mathematician would look at this formula and ask 'why are you subtracting size?'. And taking the approach that 'nonnegative things should be nat' would lead one to conclude that the LHS and three terms on the RHS all should be nat - so what is the subtraction doing?

It seems obvious to me that, in contexts like the above, everything is int.

Mario Carneiro (Mar 19 2021 at 19:02):

By the way, I didn't mention it explicitly but I think that the argument for having int valued matrix rank analogously to the distance function does work, because in that case the rank is a member of the structure so you are dealing with both sides of the equation. I don't see that for the cardinality function, as I said.

Peter Nelson (Mar 19 2021 at 19:04):

Rank doesn't have to be a member of the matroid structure - in most definitions, it isn't. It's something you derive.

Mario Carneiro (Mar 19 2021 at 19:05):

I think it could be nat too; I'm still trying to figure out what you find to be the downside of that though

Mario Carneiro (Mar 19 2021 at 19:05):

equations can be in int even if the definitions are in nat

Peter Nelson (Mar 19 2021 at 19:06):

See my example on the dual rank function. In that, we subtract a rank function to get it, but it always has a nonnegative value. It would be a nightmare (in the same way as the Gromov product above) to constantly have to remember the exact reason it's nonnegative, when you're proving a lemma involving dual rank whose statement doesn't care about the nonnegativity.

Mario Carneiro (Mar 19 2021 at 19:07):

I don't see why you need to do any of that

Peter Nelson (Mar 19 2021 at 19:07):

If you're in the weeds proving something, you add and subtract ranks all over the place

Mario Carneiro (Mar 19 2021 at 19:07):

Okay, so do so

Mario Carneiro (Mar 19 2021 at 19:07):

work over int

Mario Carneiro (Mar 19 2021 at 19:07):

that doesn't mean the definitions have to be in int too

Mario Carneiro (Mar 19 2021 at 19:08):

if you need to subtract, use int.sub

Mario Carneiro (Mar 19 2021 at 19:09):

It's possible to be quite fine grained about where and why to add nonnegativity assumptions and assertions using nat and int

Peter Nelson (Mar 19 2021 at 19:09):

But why should I be fine-grained when it is easier not to be?

Mario Carneiro (Mar 19 2021 at 19:10):

Well you said nonnegativity comes up so I guess you need it at some point

Peter Nelson (Mar 19 2021 at 19:10):

Coercions are one of the most confusing parts of the learning curve for people getting into this stuff. For most parts of the language they are necessary- in this situation they aren't.

Mario Carneiro (Mar 19 2021 at 19:12):

Okay? I'm trying to find the best way to express this, which might use features that newbies don't know. Maybe this will be a reason to learn the feature

Peter Nelson (Mar 19 2021 at 19:12):

yes, I use size_nonneg and rank_nonneg frequently - it is clear in calculations exactly where they're needed. Refactoring to make size (or rank!) nat-valued involves solving a tricky little coercion puzzle in a ton of places.

Mario Carneiro (Mar 19 2021 at 19:12):

I really don't see how it changes anything to replace all occurrences of size_int x with \u (size x)

Mario Carneiro (Mar 19 2021 at 19:14):

I don't agree with some of Kevin's comments earlier about removing all references to - by rearranging equations

Peter Nelson (Mar 19 2021 at 19:14):

it means using push_cast in a lot of places - I'd rather those are all pushed back into an API

Mario Carneiro (Mar 19 2021 at 19:14):

It shouldn't

Mario Carneiro (Mar 19 2021 at 19:15):

Since \u is always appearing on an atomic function, namely size x, there is nothing for push_cast to do

Peter Nelson (Mar 19 2021 at 19:16):

\u size (A U B) = \u (size A + size B) etc

Mario Carneiro (Mar 19 2021 at 19:16):

Then state and prove the theorem \u size (A U B) = \u (size A) + \u (size B)

Mario Carneiro (Mar 19 2021 at 19:16):

that's what you would have done with the int-based size API anyway

Peter Nelson (Mar 19 2021 at 19:16):

ok, and that's what I've done - a whole API worth of it!

Peter Nelson (Mar 19 2021 at 19:16):

so where do all those theorems go?

Peter Nelson (Mar 19 2021 at 19:17):

I've just done this with notation for \u size

Kevin Buzzard (Mar 19 2021 at 19:17):

right, that's the question. We were proposing mathlib, maybe Mario is suggesting that it's your problem?

Aaron Anderson (Mar 19 2021 at 19:18):

Kevin Buzzard said:

PS I was waiting for finsum to get merged before fincard

It's here already: docs#nat.card (although with a minimal API)

Mario Carneiro (Mar 19 2021 at 19:19):

A theorem like that can be stated for most nat functions though, it kind of reeks of duplication. Maybe it is good to have for certain functions where it comes up a lot, maybe fincard is such a function, but for mathlib I would try and see if using simp isn't good enough already (since simp will already normalize \u (size A + size B) to \u (size A) + \u (size B))

Peter Nelson (Mar 19 2021 at 19:22):

yeah - simp will do some stuff - I was looking into this when refactoring. It is really just seamless to have size be int, though, and the approaches being suggested do all at least involve very frequently manipulating coercions in a way that can be very distracting. It absolutely is duplication that I'm talking about.

Peter Nelson (Mar 19 2021 at 19:23):

Is simp, linarith really idiomatic?

Peter Nelson (Mar 19 2021 at 19:23):

because there'd be a lot of that (not a fan of the simp only [100+ characters] alternative either)

Kevin Buzzard (Mar 19 2021 at 19:25):

@Aaron Anderson does it make sense to have it in the nat namespace? If it's in the set namespace then you can write s.card. Or is set.card already taken by some crazy cardinal thing? :-/

Peter Nelson (Mar 19 2021 at 19:25):

I suppose push_cast at *, linarith as Kevin suggested earlier is cleaner.

Aaron Anderson (Mar 19 2021 at 19:26):

Kevin Buzzard said:

Aaron Anderson does it make sense to have it in the nat namespace? If it's in the set namespace then you can write s.card. Or is set.card already taken by some crazy cardinal thing? :-/

There's nat.card and enat.card

Kevin Buzzard (Mar 19 2021 at 19:26):

But I never want to write n.card, so I'm asking why we've put it in the nat namespace.

Kevin Buzzard (Mar 19 2021 at 19:27):

Yakov has turned me on to the power of dot notation.

Yakov Pechersky (Mar 19 2021 at 19:34):

Mario Carneiro said:

A theorem like that can be stated for most nat functions though, it kind of reeks of duplication. Maybe it is good to have for certain functions where it comes up a lot, maybe fincard is such a function, but for mathlib I would try and see if using simp isn't good enough already (since simp will already normalize \u (size A + size B) to \u (size A) + \u (size B))

But simp won't normalize \u (size A - size B), even if you give it the proper proof of the inequality:

import data.int.cast

variables {m n : ℕ}

example (hn : m ≤ n) : (((n - m) : ℕ) : ℤ) + m = n :=
begin
  success_if_fail { simp },
  success_if_fail { simp [hn] },
  success_if_fail { push_cast },
  push_cast [hn],
  simp
end

Julian Berman (Mar 19 2021 at 19:46):

If this isn't useful or true ignore me, but I think the whole reason size is coming up is because it's the (one and only) key dependency for https://github.com/e45lee/lean-matroids/blob/master/src/matroid/axioms.lean (i.e. for the core different definitions of matroids) right? So if anything perhaps an alternate way to have this conversation is for someone to show Peter a good/minimal set of changes that make that file easily addable?

Mario Carneiro (Mar 19 2021 at 19:53):

But simp won't normalize \u (size A - size B), even if you give it the proper proof of the inequality:

No, you should never be writing nat.sub here

Peter Nelson (Mar 19 2021 at 19:54):

Julian Berman said:

If this isn't useful or true ignore me, but I think the whole reason size is coming up is because it's the (one and only) key dependency for https://github.com/e45lee/lean-matroids/blob/master/src/matroid/axioms.lean (i.e. for the core different definitions of matroids) right? So if anything perhaps an alternate way to have this conversation is for someone to show Peter a good/minimal set of changes that make that file easily addable?

In a way that's the problem, but I can certainly figure out how to make that file addable myself - I'm working on a branch in which size is nat. The real problem is that every change I make in refactoring makes simple things more complicated, so I'm not convinced that nat size is actually a sound design decision. Some suggestions that have been made make the changes easier, but things are still shaking out to be more complex than they were.

Mario Carneiro (Mar 19 2021 at 19:54):

I guess the theorem about size of a difference would be something like B \sub A -> \u (size (A \ B)) = \u (size A) - \u (size B)

Peter Nelson (Mar 19 2021 at 19:55):

Yes - that's the exact kind of API lemma I have; I just want to go all-in and give \u size a name.

Mario Carneiro (Mar 19 2021 at 19:55):

most of the time you don't even need to acknowledge the presence of the \u

Mario Carneiro (Mar 19 2021 at 19:57):

If you want to name that combination for the specific use of matroids, that's fine, that's your business. I just don't think this is an argument for general fincard

Peter Nelson (Mar 19 2021 at 20:01):

What is the argument, in practical terms, that fincard should be nat?

Peter Nelson (Mar 19 2021 at 20:05):

I tend to think that generally for extremal combinatorics, it wants to be int.

Bryan Gin-ge Chen (Mar 19 2021 at 20:08):

Would your argument also say that for doing probability we'll want the cardinality to be in rat?

Peter Nelson (Mar 19 2021 at 20:09):

that's a good point

Peter Nelson (Mar 19 2021 at 20:13):

In practice what I want is ring/linarith, and to throw sizes of sets around without worrying about casts.

Bryan Gin-ge Chen (Mar 19 2021 at 20:14):

I'm not sure which way we should go, honestly. From your experience are there some lemmas which are much more painful with nat than with int or is it just the accumulation of boilerplate which is a drag?

Peter Nelson (Mar 19 2021 at 20:27):

It's hard to say definitively because I'm used to the way I've done things, but I just can't see how things are nicer with nat. Having ring and linarith make so many things easy that I want to do (and I think most researchers in combinatorics will want to do) when size is int. Using nat just requires more effort (albeit small) in a lot of places. This is nonzero effort being compared to zero effort.

And I am not sure of the advantages of nat size in practice. It is more theoretically precise in a sense that I can appreciate, and sometimes having nonnegativity for free might be useful, but what makes it so obviously the right way to do things? Induction on cardinality is a mess with fincard anyway because sets with fincard zero might be infinite, so the slight extra mess from int is no big deal.

As Mario says, cardinality is a function and not a field, but so is the function that squares an integer - should it map int to nat?

Sebastien Gouezel (Mar 19 2021 at 20:31):

I'd also be curious to hear the arguments in favor of a nat-valued fincard, other than "it's the right thing". We know that excluding division by zero would be "the right thing", but since it is less practical all theorem provers went the other direction -- it's just a question of being pragmatic. Here, I can see the advantages of an int-valued fincard, so I'd like to see also the advantages of a nat-valued fincard before weighing the ones again the others and making an informed choice.

Aaron Anderson (Mar 19 2021 at 21:46):

Kevin Buzzard said:

But I never want to write n.card, so I'm asking why we've put it in the nat namespace.

My thinking was that one could type open nat or open enat to unlock their preferred cardinality function.

Aaron Anderson (Mar 19 2021 at 21:48):

I didn't provide much API so I'm not very attached to these names. It still seemed better to me than fincard and efincard or something.

Mario Carneiro (Mar 19 2021 at 21:58):

@Sebastien Gouezel The obvious advantage is that you can use size in nat contexts without needing to_nat and a way to supply the proof of nonnegativity

Aaron Anderson (Mar 19 2021 at 22:01):

Peter Nelson said:

Julian Berman said:

If this isn't useful or true ignore me, but I think the whole reason size is coming up is because it's the (one and only) key dependency for https://github.com/e45lee/lean-matroids/blob/master/src/matroid/axioms.lean (i.e. for the core different definitions of matroids) right? So if anything perhaps an alternate way to have this conversation is for someone to show Peter a good/minimal set of changes that make that file easily addable?

In a way that's the problem, but I can certainly figure out how to make that file addable myself - I'm working on a branch in which size is nat. The real problem is that every change I make in refactoring makes simple things more complicated, so I'm not convinced that nat size is actually a sound design decision. Some suggestions that have been made make the changes easier, but things are still shaking out to be more complex than they were.

It looks to me like size is enforcing finiteness in this definition, by forcing the rank of any infinite set to be 0. If you want matroids to have to be finite, that's fine (although I'd eventually want to make an infinite version), but that axiom shouldn't be encoded in your size function

Mario Carneiro (Mar 19 2021 at 22:01):

for example, monoid powers take a nat argument, would you use to_nat in that position?

Mario Carneiro (Mar 19 2021 at 22:03):

@Aaron Anderson I think the question about what to do about infinite sets is a separate one; we have cardinal if you really want to handle the infinite case correctly but the nat case is common enough to warrant special support

Mario Carneiro (Mar 19 2021 at 22:03):

It is worth pointing out though that enat and int don't play well together

Peter Nelson (Mar 19 2021 at 22:05):

Mario Carneiro said:

for example, monoid powers take a nat argument, would you use to_nat in that position?

I've come across this already, and the answer is yes. It's rare that I want the size of the set up in an exponent.

Aaron Anderson (Mar 19 2021 at 22:06):

I totally agree that there should be a nat- or int-valued cardinality function, that comment is perhaps too specific to the matroid context.

Bryan Gin-ge Chen (Mar 19 2021 at 22:10):

Would it make sense to go all the way to rat then for use in probability?

Peter Nelson (Mar 19 2021 at 22:11):

Aaron Anderson said:

Peter Nelson said:

Julian Berman said:

If this isn't useful or true ignore me, but I think the whole reason size is coming up is because it's the (one and only) key dependency for https://github.com/e45lee/lean-matroids/blob/master/src/matroid/axioms.lean (i.e. for the core different definitions of matroids) right? So if anything perhaps an alternate way to have this conversation is for someone to show Peter a good/minimal set of changes that make that file easily addable?

In a way that's the problem, but I can certainly figure out how to make that file addable myself - I'm working on a branch in which size is nat. The real problem is that every change I make in refactoring makes simple things more complicated, so I'm not convinced that nat size is actually a sound design decision. Some suggestions that have been made make the changes easier, but things are still shaking out to be more complex than they were.

It looks to me like size is enforcing finiteness in this definition, by forcing the rank of any infinite set to be 0. If you want matroids to have to be finite, that's fine (although I'd eventually want to make an infinite version), but that axiom shouldn't be encoded in your size function

I've had infinite matroids in the back of my mind for a while - it wouldn't be a major refactor to have them. One could define matroids in terms of an axiom set other than the rank function, then derive the rank axioms for the finite case. The reason I haven't done this already is that you need an annoying extra axiom to define infinite matroids correctly, and I don't care about infinite matroids myself (finite matroids are much more well-studied, to the extend that 'matroid' means 'finite matroid', while infinite matroids get the non-default name 'B-matroid').

This isn't really related to my issue with size, though. Doing this refactor now would certainly allow me to get the axioms file into mathlib, but not any interesting lemmas. The file rankfun.lean, which is really the basic api for matroids, uses size extensively, so it would just be pushing the problems down the road.

Mario Carneiro (Mar 19 2021 at 22:14):

From what I'm hearing, I really think that this int-valued size thing is specific to the matroid context

Mario Carneiro (Mar 19 2021 at 22:15):

What do the matroid axioms look like in the infinite case?

Mario Carneiro (Mar 19 2021 at 22:15):

specifically, what does "size" mean?

Peter Nelson (Mar 19 2021 at 22:23):

It’s not just matroids - that’s just what I’m doing. All the problems I’m talking about will show up when doing extremal combinatorics more generally.

Infinite matroids don’t deal with sizes of sets. It’s quite a different world, and most of the theory matroid theorists are interested in just fails in the infinite setting. It’s a red herring here.

Mario Carneiro (Mar 19 2021 at 22:25):

but it seems like this int size thing might not even be the right abstraction then. As for extremal combinatorics, that still sounds domain specific to me; I don't think we have anything like that in mathlib

Peter Nelson (Mar 19 2021 at 22:27):

Would you like to?

Peter Nelson (Mar 19 2021 at 22:28):

Again, infinite matroids have almost no relationship to any notion of size. Finite matroids (matroids) are what care about sizes of sets.

Mario Carneiro (Mar 19 2021 at 22:28):

I don't see why not, but that doesn't mean that things that are good for the abstractions in extremal combinatorics are also good everywhere. In particular, if it's a choice between using size A and \u (size A) or to_nat (size A) and size A, I'll pick the former every time; \u has much nicer theorems about it than to_nat

Peter Nelson (Mar 19 2021 at 22:29):

Right - why can’t it be a choice others can make differently?

Mario Carneiro (Mar 19 2021 at 22:29):

And size A and int_size A would mean duplicating a lot of API theorems

Peter Nelson (Mar 19 2021 at 22:29):

Yup

Mario Carneiro (Mar 19 2021 at 22:30):

Is there a concrete problem that needs fixing here? We've been talking in abstracts for a while now

Peter Nelson (Mar 19 2021 at 22:32):

Yes - I’d like to PR matroids, and this requires either mathlib being ok with an int size api, or me to systematically refractor my code in a way that makes it more complicated.

Mario Carneiro (Mar 19 2021 at 22:34):

Maybe you should put matroids on a branch and we can look at variations in the review process

Mario Carneiro (Mar 19 2021 at 22:35):

We would be able to quantitatively answer this question by refactoring it to use nat and seeing how many lines are added/removed by the process

Kevin Buzzard (Mar 20 2021 at 00:50):

I don't know why the CS people are so anti code duplication. What's wrong with two definitions? We can watch and see which one is more useful.

Bhavik Mehta (Mar 20 2021 at 04:14):

Peter Nelson said:

It’s not just matroids - that’s just what I’m doing. All the problems I’m talking about will show up when doing extremal combinatorics more generally.

Do you have any reason for claiming this? I've done a good amount of extremal combinatorics in Lean and I haven't had any problems with the things you're talking about: specifically I've had no desire to work with int-valued fincard

Peter Nelson (Mar 20 2021 at 13:23):

Ah, I’m happy to hear that. I’m ok to be told I’m doing it wrong by someone that is working on the same type of math - I just get the impression that what I’m doing is smoother working over the integers and I’m not convinced that changing to nat will make the code better. (I can certainly change to nat - it’s never been a question of whether it’s possible).

Again, this is because when doing calculations with sizes of sets, subtraction is a normal part of the world, and the calculations that come up are easier to do if you know you’re in a ring.

I think coercions are a (small) mental barrier that doesn’t need to be intruding here at all - once you have an api for int size, it’s a tradeoff between zero effort and nonzero.

Mario Carneiro (Mar 20 2021 at 14:09):

Kevin Buzzard said:

I don't know why the CS people are so anti code duplication. What's wrong with two definitions? We can watch and see which one is more useful.

That's actually what I'm suggesting - try it both ways and see which is better. I think it's been pretty firmly established that this is a feasible task, but we're all just hypothesizing and we need data to make an objective statement

Bryan Gin-ge Chen (Mar 20 2021 at 14:12):

Peter's matroids repo currently shows how things look with an int-valued size. I think he said he was working on a branch swapping things over to nat, so when that gets pushed there should be plenty of examples to compare.

Filippo A. E. Nuccio (Apr 08 2021 at 17:01):

Kevin Buzzard said:

Yakov has turned me on to the power of dot notation.

I have seen that in some Discord session Yacov has provided a lesson on dot notation. Has this session been recorded, and if yes, do you have a link to it? Thanks!

Kevin Buzzard (Apr 08 2021 at 17:07):

No, he just carefully explained to me how to use it in the context of set.finite. For example

import data.set.finite

open set

example (X : Type) (s t : set X) (hs : finite s) (ht : finite t) : finite (s ∪ t) := hs.union ht

The idea is that set.union creates the union of the sets, and set.finite.union is the proof that the union of two finite sets is finite. Using dot notation makes it look very natural.

Eric Wieser (Apr 08 2021 at 17:08):

Some boring person made docs#has_union.union only work in Type u not Sort u, so we can't write things like hs ∪ ht

Eric Wieser (Apr 08 2021 at 17:09):

It's almost like they thought that would be a bad idea

Bryan Gin-ge Chen (Apr 08 2021 at 17:14):

Bryan Gin-ge Chen said:

Peter's matroids repo currently shows how things look with an int-valued size. I think he said he was working on a branch swapping things over to nat, so when that gets pushed there should be plenty of examples to compare.

Coming back to the original topic, it looks like the branch with a nat-valued size is now here.

Peter Nelson (Apr 08 2021 at 22:34):

Yes the branch is there, but currently broken.

Peter Nelson (Apr 08 2021 at 22:34):

The fact that finsum is now in mathlib will mean it needs a bit of a refactor

Last updated: May 02 2025 at 03:31 UTC