Zulip Chat Archive

Stream: general

Topic: unnecessary imports


Reid Barton (Apr 23 2020 at 15:23):

I don't see the point of #2496; what is supposed to be the gain?

Kevin Buzzard (Apr 23 2020 at 15:24):

Perhaps Reid is worried by the fact that there could be plenty of instances where we import X but it's an "over-import", where in fact we only need to import some Y<X; this might now be harder to spot.

Mario Carneiro (Apr 23 2020 at 15:25):

I thought there was a lint for that

Mario Carneiro (Apr 23 2020 at 15:25):

I recall @Simon Hudon working on that exact problem

Reid Barton (Apr 23 2020 at 15:26):

There are two kinds of redundant imports:

  • "syntactically unnecessary" imports: B imports A, and C imports both A and B. Then the import of A in C is syntactically unnecessary. These are the ones removed by #2496; they are easy to detect lexically. For much the same reason, removing them doesn't accomplish anything obvious, since for example changing A means rebuilding C anyways, because of the transitive import.
  • "semantically unnecessary" imports: B imports A, but B would work just as well if it didn't. These are harder to detect (especially given simp and other forms of introspection).

Reid Barton (Apr 23 2020 at 15:28):

Now imagine A defines a, B defines b and imports A even though it doesn't use it, and someone wanting to write a module C wants to use both a and b. The natural thing to do in general is to import both A and B (otherwise you might have to carefully track down transitive imports to see that actually one is redundant).

Reid Barton (Apr 23 2020 at 15:28):

This creates a syntactically unnecessary import of A in C.

Reid Barton (Apr 23 2020 at 15:29):

Now suppose someone later notices that B doesn't actually use A at all, so B may as well not import A.

Reid Barton (Apr 23 2020 at 15:29):

If you still have the import of A in C, then C will continue to work. If you removed it, however, because it was syntactically unnecessary, now you broke C by changing B, and you have to figure out what module C was supposed to be importing that provided a.

Reid Barton (Apr 23 2020 at 15:30):

Of course you could argue that simp means that changing any import anywhere can break anything anywhere else, but less breakage is still better than more breakage.

Reid Barton (Apr 23 2020 at 15:30):

Actually, Lean's design where importing B (which imports A) also gives you A is quite unusual among programming languages, I think.

Mario Carneiro (Apr 23 2020 at 15:31):

I agree it's not great, but for a theorem prover it's unclear what other options there are

Reid Barton (Apr 23 2020 at 15:32):

Anyways I don't want to advocate for any particular course of action, I just want to say that I think this removal of "unnecessary" imports is misguided: it doesn't achieve anything, and potentially costs.

Gabriel Ebner (Apr 23 2020 at 15:35):

I think we can let it slide if this is Scott's way of testing his import-modifying scripts. As long as the promised import-reducing PRs are coming soon, I'm happy with this first step.

Mario Carneiro (Apr 23 2020 at 15:37):

I can imagine a build tool that fixes the semantically unnecessary imports as follows: Suppose C imports (directly) files A and B, and assume that neither one imports the other. (If not, do the transitive reduction first.) For each file in the list, try replacing it with the list of all files that it imports; that is, supposing that A imports A1 and A2, try compiling C using A1 A2 B. If it succeeds, go back to the start, otherwise put A back and try replacing B with all of its imports. Eventually, you get to the stage where removing any single import from the list causes the file to break; C is now minimized. Continue (in forward dependency order) until all files are minimized.

Kevin Buzzard (Apr 23 2020 at 15:37):

will it change compile time?

Mario Carneiro (Apr 23 2020 at 15:38):

My guess is that this is about 10 times as expensive as a regular build

Mario Carneiro (Apr 23 2020 at 15:38):

so maybe once a week is tolerable

Kevin Buzzard (Apr 23 2020 at 15:39):

Maybe one advantage is that if someone is editing a file half way up the import tree, then recompiling all of mathlib afterwards might not hurt quite as much.

Mario Carneiro (Apr 23 2020 at 15:39):

You can short circuit the check if you notice that a proof in the current file uses a theorem declared in a direct import, since that means the file must have been used

Mario Carneiro (Apr 23 2020 at 15:40):

Lean automatically ignores transitive imports, so it doesn't make a difference to lean

Gabriel Ebner (Apr 23 2020 at 15:40):

I think this really should not be part of CI. This takes way too much time and should be executed manually or as a cron job. Note: if you remove an import in data.nat.basic you need to recompile all of mathlib to check whether that's allowed, no heuristic helps you here.

Mario Carneiro (Apr 23 2020 at 15:41):

no, you only need to recompile data.nat.basic itself

Simon Hudon (Apr 23 2020 at 15:41):

@Mario Carneiro: that sounds excessive. I think the best approach would be to add a flag to Lean

Mario Carneiro (Apr 23 2020 at 15:42):

I forgot to mention that when you remove an import, for example replacing A with A1 A2 in file C, you add A to every file that imports C to make sure that they don't break due to transitive imports being lost

Simon Hudon (Apr 23 2020 at 15:42):

and analyze the dependency between declaration (including what the proof scripts call)

Reid Barton (Apr 23 2020 at 15:42):

Kevin Buzzard said:

Maybe one advantage is that if someone is editing a file half way up the import tree, then recompiling all of mathlib afterwards might not hurt quite as much.

Removing semantically unnecessary imports can indeed accomplish this, as well as increasing build parallelism (but not the total amount of work).

Reid Barton (Apr 23 2020 at 15:42):

Although in practice, it probably wouldn't help much, I guess.

Mario Carneiro (Apr 23 2020 at 15:45):

The process I described has you build C without A, and without B, so two compiles of one file (and none of its dependencies). All of C's dependencies are already minimized, so they don't have to be touched. So I do think that this is only as expensive as mathlib compile * average number of imports in a file (weighted by file cost)

Reid Barton (Apr 23 2020 at 15:46):

Mario Carneiro said:

I forgot to mention that when you remove an import, for example replacing A with A1 A2 in file C, you add A to every file that imports C to make sure that they don't break due to transitive imports being lost

It would be an interesting experiment to see how this procedure works out in practice--I can imagine a change near the root of the dependency tree causing many imports to be added/removed near the leaves, but maybe it doesn't ever happen in reality.

Mario Carneiro (Apr 23 2020 at 15:47):

This is just the initial fixup pass, mind you - those files have not yet been processed, and when you get to D that imported C and now imports A and C, you might find that you can remove A again

Mario Carneiro (Apr 23 2020 at 15:52):

@Simon Hudon This is obviously a rather expensive analysis pass, but it's the only thing I think is likely to be 100% reliable (modulo quirks where import order matters). A lean flag or other mechanism for discovering that a file is needed is, like I said, a good way to speed up the process in the majority of cases, but lean files can set attributes, declare def_replacers, and all sorts of other funny things, and I don't think lean can keep track of it all reliably

Mario Carneiro (Apr 23 2020 at 15:53):

the best way to find out if an import is necessary is just to delete it and see if it breaks

Simon Hudon (Apr 23 2020 at 16:03):

That's fair. It might be accurate often enough that you'd want to run it first to select candidate redundant imports and try commenting those out only in the process of using your approach. An analysis of the declaration won't tell you that an import is needed when it's not so the analysis is complete but not sound (it will identify all redundant imports and then some)

Mario Carneiro (Apr 23 2020 at 16:07):

Assuming that this works from an already built mathlib, you can look at the olean files and immediately pick out all the directly used files based on theorems declared in file A and used in C

Mario Carneiro (Apr 23 2020 at 16:08):

and I think this covers 90% or more of imports (not counting those that will be removed)

Mario Carneiro (Apr 23 2020 at 16:08):

the hard case is import tactic, which I think is not uncommon

Mario Carneiro (Apr 23 2020 at 16:09):

although if you do some C++ magic to flag this in lean that will help

Mario Carneiro (Apr 23 2020 at 16:11):

even just the earlier idea of saving all the tactic expressions before by would solve this problem since if they get saved in oleans then you can link def to use

Simon Hudon (Apr 23 2020 at 16:17):

That's one idea. We could also just work on the C++ data structures so that we don't change the olean files that are produced

Scott Morrison (Apr 24 2020 at 02:25):

Sorry about this. I was experimenting last night with actually reducing the imports in a meaninful way. It's easy to build the graph whose vertices are files, and edges are "a declaration in X uses a declaration in Y". However there are many other reasons you might want to import a file. An obvious one is to use tactics or attributes, so I've added the simple mechanism of adding an edge to my graph for every "file X currently imports file Y, and file Y is under tactics/, or a short list of other tactics files".

Scott Morrison (Apr 24 2020 at 02:26):

After this, the error messages and failures are ... harder to diagnose.

Scott Morrison (Apr 24 2020 at 02:26):

I haven't had time yet to look at these failures in detail.

Scott Morrison (Apr 24 2020 at 02:27):

For now I'm just doing testing by replacing the entire import block with a new list (hence my desire for one import per line, to make it easy to decided what the import block even is), where that new list contains every import that appears to be needed.

Scott Morrison (Apr 24 2020 at 02:27):

If I can get that to compile, then I will run the same script that removed transitive imports to get back to a manageable list.

Scott Morrison (Apr 24 2020 at 02:28):

So yes, even though I didn't explain it well, the PR #2496 serves very little purpose except to verify that I know how to remove transitive imports, so I can use that step reliably now.

Scott Morrison (Apr 24 2020 at 02:28):

Sorry for the noise.

Reid Barton (May 17 2020 at 10:07):

Ahhhh, I'm trying to rearrange some imports between category theory and I think this is causing me headaches.

Reid Barton (May 17 2020 at 10:15):

All I wanted to do was write a lemma in one file and use it in another file...

Reid Barton (May 17 2020 at 10:19):

I think category theory might be due for a topology-style reorganization

Bhavik Mehta (May 17 2020 at 14:23):

Out of curiosity which bits are causing problems?

Reid Barton (May 17 2020 at 14:24):

See https://github.com/leanprover-community/mathlib/pull/2712/files#diff-f4371a7996ae088583250e08511b5045R149-R152 for the start of the story. I wanted to shuffle some imports around, but it broke several files that just imported category_theory.eq_to_hom because now they also needed to import the things it used to import.

Reid Barton (May 17 2020 at 14:25):

So, in the end, I gave up on the reorganization for now, which might be a good thing, I suppose.


Last updated: Dec 20 2023 at 11:08 UTC