Zulip Chat Archive

Stream: general

Topic: finsupp refactor


Stuart Presnell (Jul 27 2022 at 06:07):

Just to see what's possible to do with it, I had a go at rearranging the content of data/finsupp/basic (#15699) to divide it up into parts that could potentially be split into separate files. To be clear, this was just an experiment and I don't intend to try making drastic changes without consultation!

On this first pass through the file I was able to arrange it into three major parts:

  • about 1300 lines of more basic material
  • about 400 lines on sum and prod
  • about 1300 lines of further lemmas

There's no grand architectural philosophy behind this particular arrangement. My main aim was just to group together the main lemmas about sums and products (since this seems like a natural candidate to become a separate file), and to move up any sections that can go before those lemmas. I expect there are finer and more useful ways to carve up this file into still smaller coherent pieces. But even if we did no more than to split the file into these three pieces, that might still be an improvement over the unwieldy 2900 line monster we currently have.

Stuart Presnell (Jul 27 2022 at 06:08):

So what do people think? I understand that @Yakov Pechersky and @Wrenna Robson were interested in refactoring this file — did you get anywhere with this?

Wrenna Robson (Jul 27 2022 at 06:09):

I didn't, no, but once I'm awake and up I'll have a look at this.

Wrenna Robson (Jul 27 2022 at 06:09):

Do the further lemmas have any kind of theme?

Stuart Presnell (Jul 27 2022 at 06:16):

Lots of them are sections about ways to make new finsupps from old ones: for example, filter p f or curry (f : (α × β) →₀ M) : α →₀ (β →₀ M). But this is also true of many of the lemmas in the first block; the main difference is whether the sections also contain lemmas involving sum and prod.

Stuart Presnell (Jul 27 2022 at 06:19):

So we could plausibly, for example, move most of the lemmas about filter into the first block and then have lemmas like docs#finsupp.prod_filter_index coming later.

Eric Wieser (Jul 27 2022 at 07:18):

I'd note that however we split finsupp, we should ideally have the dfinsupp file split along similar lines

Stuart Presnell (Jul 27 2022 at 07:36):

I agree, with the caveat that it may be easier to find the right way to split finsupp first, and then use that as a model to divide dfinsupp as a separate project — we shouldn't necessarily be constrained to keep finsupp and dfinsupp in sync every step of the way.

Yaël Dillies (Jul 27 2022 at 09:04):

Note that I started such a split over at #11109, but it ended up conflicting with other stuff I was doing.

Wrenna Robson (Jul 27 2022 at 09:13):

It's a shame that the computability question seems intractable.

Wrenna Robson (Jul 27 2022 at 09:14):

Your spacing looks reasonably sensible to me.

Scott Morrison (Jul 27 2022 at 09:38):

I nearly always approve of splitting files! Anything over 1000 lines is too big in my book.

Stuart Presnell (Jul 27 2022 at 09:50):

I'm aiming to keep this first move quite conservative, just to make some progress on making the files more manageable — there's lots of opportunity to get derailed by trying to work out the exact best way to arrange things, until nothing at all gets done.

On the other hand, I'm also very aware that we don't want to keep making further splits and rearrangements in lots of subsequent PRs, since this introduces a series of merge conflicts for any other projects that touch finsupp.

On the other other hand, if these subsequent tweaks are to the less used parts of the file then their potential for raising merge conflicts is correspondingly less.

Stuart Presnell (Jul 27 2022 at 09:56):

So we might start by splitting into data.finsupp.basic, data.finsupp.prod and data.finsupp.everything_else, where the stuff in the first two parts is the more commonly used material that we shouldn't have to revisit, and the stuff in the third part can be further subdivided later without causing too much pain.

Yaël Dillies (Jul 27 2022 at 09:57):

The material about finsupp.sum and finsupp.prod should go to algebra.big_operators.finsupp.

Yaël Dillies (Jul 27 2022 at 09:58):

The problem with this approach, as I experienced it, is that many finsupp constructions depend on finsupp.sum.

Stuart Presnell (Jul 27 2022 at 10:03):

That's one reason I started by internally rearranging data.finsupp.basic, to more easily see which parts don't depend on sum and prod.

Stuart Presnell (Jul 27 2022 at 14:32):

Ok, I've moved a couple of (seemingly less-used) sections out of block 1 down to block 3 to keep data.finsupp.basic down to about 1000 lines. Then I've split off block 3 into a new data.finsupp.misc (temporary name) and moved the prod and sum lemmas into algebra.big_operators.finsupp. Now let's see what breaks! :smile:

Stuart Presnell (Jul 27 2022 at 20:32):

This passes CI now. Only four files import data.finsupp.misc, so if we want to further refactor that into smaller parts it should be possible to do it in a further PR without too much disruption.

Stuart Presnell (Jul 27 2022 at 20:33):

Could someone suggest a better name for data.finsupp.misc?

Yaël Dillies (Jul 27 2022 at 23:24):

Usually we have .defs and then .basic, rather than .basic and .another_basic.

Stuart Presnell (Jul 28 2022 at 10:08):

I've just renamed the files and set the PR to awaiting-review. Once I'm happy that there's general consensus that this arrangement of material is right then I'll edit the docstrings of the three files and take out the temporary comments splitting up the sections.

Anne Baanen (Aug 16 2022 at 10:22):

I'm going to go ahead and say that the absence of any objection suggests that the split is good. So please go ahead and take it out of temporary state.

Stuart Presnell (Aug 17 2022 at 14:39):

Can someone help me figure out how to resolve the merge conflict on this (#15699)? I first made the edit to data/finsupp/basic, cutting out a load of material and moving it to two other files, about 3 weeks ago. Now there's a merge conflict on data/finsupp/basic amounting to about 700 lines. Some of this is perhaps due to changes that have been made to data/finsupp/basic in the past 3 weeks, such as #15443, so I want to be careful not to accidentally delete anything that was added in those PRs. Is there a straightforward way to make sure I do this right without choring through 700 lines?

Yaël Dillies (Aug 17 2022 at 14:42):

Look at the history of src#data.finsupp.basic

Anne Baanen (Aug 17 2022 at 14:42):

I don't know an easy way if you did more complicated things than just move a big chunk of those lines. So for example, if in the middle of those 700 lines you just cut 500 and pasted in another file, then you can just find the endpoints and try again.

Anne Baanen (Aug 17 2022 at 14:43):

But if a lot of adjacent definitions ended up in different places then you probably have to redo the splitting manually.

Yaël Dillies (Aug 17 2022 at 14:43):

From https://github.com/leanprover-community/mathlib/commits/master/src/data/finsupp/basic.lean, I'd say pay attention to #15443, #15642, #16017.

Stuart Presnell (Aug 17 2022 at 14:47):

Thanks

Junyan Xu (Aug 17 2022 at 14:51):

It's 700 lines but the "incoming change" is empty... I guess you can just accept the "current change".

Stuart Presnell (Aug 17 2022 at 15:19):

No, it's not so straightforward, because the >>>>>>> master side contains material that I've cut out and moved elsewhere, so I don't want to keep it here.

Eric Wieser (Aug 17 2022 at 15:20):

I would maybe recommend splitting one part out at a time rather than all three or so new files at once

Eric Wieser (Aug 17 2022 at 15:20):

That way the conflicts are smaller

Stuart Presnell (Aug 17 2022 at 15:22):

Oh, but I guess I could resolve the conflict by keeping everything, as @Junyan Xu suggests, and then I'll get compilation errors where definition/lemma names are repeated, which will tell me what I should delete from data/finsupp/basic.

Stuart Presnell (Aug 17 2022 at 15:24):

That guarantees I don't delete anything I shouldn't in the merge conflict, and then gets the compiler to do the work of figuring out what I should delete rather than manually checking it all lemma by lemma.

Stuart Presnell (Aug 18 2022 at 19:49):

Ok, the merge conflict is resolved and I've edited the docstrings. But I also moved one more lemma that I'd missed from finsupp/basic to algebra/big_operators/finsupp, so I'm waiting for CI to give it the all clear.

Stuart Presnell (Aug 19 2022 at 18:15):

Could I ask for a pause on changes to data/finsupp/basic until #15699 is merged? I just resolved another merge conflict that arose while I was waiting for the previously fixed version to compile.

Stuart Presnell (Aug 19 2022 at 18:18):

It's queued to build, delegated, and I've given the bors r+ instruction, so hopefully it should merge in the next few hours.

Damiano Testa (Aug 19 2022 at 19:09):

Stuart, it seems that my PR is building with yours! I have not touched finsupp.basic, but am importing it: let's see how it plays out!

Damiano Testa (Aug 19 2022 at 19:10):

(I had merged my PR with yours and everything worked earlier today.)

Stuart Presnell (Aug 20 2022 at 00:51):

... and it's done!

Stuart Presnell (Aug 20 2022 at 01:00):

The next project will be to split dfinsupp along similar lines, as discussed above.

Damiano Testa (Aug 20 2022 at 01:25):

I already opened #16165 to take advantage of the refactor!

Damiano Testa (Aug 20 2022 at 01:26):

Thanks for this: small files are much better!

Stuart Presnell (Aug 20 2022 at 20:19):

I've started a simple refactor of data/dfinsupp/basic by moving ~560 lines about dfinsupp.sum and dfinsupp.prod to a new file, algebra/big_operators/dfinsupp (#16174).

Stuart Presnell (Aug 21 2022 at 11:23):

This now passes CI


Last updated: Dec 20 2023 at 11:08 UTC