Zulip Chat Archive

Stream: mathlib4

Topic: mk_all.sh


Mario Carneiro (May 28 2023 at 19:20):

While writing the mathport makefile modifications for archive/counterexamples, I noticed that mk_all.sh is currently broken when running on the archive, as it produces a file like this:

import .100-theorems-list.16_abel_ruffini
import .100-theorems-list.30_ballot_problem
import .100-theorems-list.37_solution_of_cubic
import .100-theorems-list.42_inverse_triangle_sum
import .100-theorems-list.45_partition
import .100-theorems-list.54_konigsberg
import .100-theorems-list.57_herons_formula
import .100-theorems-list.70_perfect_numbers
import .100-theorems-list.73_ascending_descending_sequences
import .100-theorems-list.81_sum_of_prime_reciprocals_diverges
import .100-theorems-list.82_cubing_a_cube
import .100-theorems-list.83_friendship_graphs
import .100-theorems-list.93_birthday_problem
import .100-theorems-list.9_area_of_a_circle
import .arithcc
import .examples.mersenne_primes

Eric Wieser (May 28 2023 at 19:20):

This is the mathlib3 mk-all right?

Mario Carneiro (May 28 2023 at 19:21):

yes

Eric Wieser (May 28 2023 at 19:21):

I made a PR to fix this long ago, but the response was "use leanproject mk-all instead"

Mario Carneiro (May 28 2023 at 19:21):

the shell script, not the leanproject command

Mario Carneiro (May 28 2023 at 19:21):

mathport doesn't depend on leanproject currently

Mario Carneiro (May 28 2023 at 19:21):

if you have a shell script handy that would be swell

Mario Carneiro (May 28 2023 at 19:22):

even just putting french quotes around everything would suffice

Mario Carneiro (May 28 2023 at 19:23):

but this is really not fun to read:

| sed 's,^\./,,;s,^\.\./[^/]*,,;s,/,.,g;s,\.lean$,,;s,^,import ,' \

Eric Wieser (May 28 2023 at 19:23):

#11001

Mario Carneiro (May 28 2023 at 19:23):

I'm sure I just have to put a french quote in there... somewhere...

Eric Wieser (May 28 2023 at 19:24):

Yeah, this is why the "use leanproject" argument is convincing

Mario Carneiro (May 28 2023 at 19:25):

now if only that PR can be revived and merged

Mario Carneiro (May 28 2023 at 19:25):

otherwise I'll be copying it into mathport

Mario Carneiro (May 28 2023 at 19:26):

I don't think there is any good reason to have a script in mathlib with a known bug

Eric Wieser (May 28 2023 at 19:26):

I've re-opened it, feel free to merge

Eric Wieser (May 28 2023 at 19:27):

Alternatively, if you want to drop the gross sed stuff I have some python functions here

Mario Carneiro (May 28 2023 at 19:28):

lean would of course also be an option

Mario Carneiro (May 28 2023 at 19:30):

ah, not quite...

import «».«100-theorems-list».«16_abel_ruffini»

Eric Wieser (May 28 2023 at 19:33):

Ah, I didn't try to fix relative imports

Eric Wieser (May 28 2023 at 19:34):

Or maybe the script is just garbage

Mario Carneiro (May 28 2023 at 19:34):

it's not a relative import, it seems to be a weird interaction with the funny directory traversal stuff

Eric Wieser (May 28 2023 at 19:34):

Should #11001 come back off the queue?

Mario Carneiro (May 28 2023 at 19:34):

it wants me to call ./scripts/mk_all.sh ../archive

Eric Wieser (May 28 2023 at 19:35):

I think you probably want (cd archive && ./scripts/mk_all.sh)

Mario Carneiro (May 28 2023 at 19:35):

but the find returns

../archive/imo/imo1962_q4.lean
../archive/imo/imo2011_q3.lean
../archive/imo/imo1981_q3.lean
../archive/imo/imo2013_q1.lean
../archive/imo/imo2013_q5.lean

and sed doesn't seem to be expecting it

Eric Wieser (May 28 2023 at 19:35):

Are you sure you don't just want to add leanproject as a dependency? For better or for worse, it's the monolith that contains random tools you need to write somewhat sane scripts to mangle lean

Mario Carneiro (May 28 2023 at 19:35):

I think so too, but the first part of the script is baking in some stuff which makes it absolute wrt src/

Damiano Testa (May 28 2023 at 19:35):

sed first replaces a single initial dot, then does not find two initial dots left...

Damiano Testa (May 28 2023 at 19:36):

try removing the first sed instruction.

Mario Carneiro (May 28 2023 at 19:36):

if you do cd archive && ../scripts/mk_all.sh you still get the contents of src

Mario Carneiro (May 28 2023 at 19:38):

I think I will just write the 3-line bash call at the end in the makefile

Damiano Testa (May 28 2023 at 19:40):

Is this what you want?

git ls-files | sed -n '/\.lean/ { s=\.lean$== ; s=/=.=g; s=^src\.==; s=^=import = ; p }'

Mario Carneiro (May 28 2023 at 19:41):

what i have so far is

find . -name \*.lean -not -name archive_all.lean \
  | sed 's,^\./,,;s,^\.\./[^/]*,,;s,/,».«,g;s,\.lean$,»,;s,^,import «,' \
  | sort

Mario Carneiro (May 28 2023 at 19:41):

your script looks to be the same except it is missing the french quote handling

Damiano Testa (May 28 2023 at 19:42):

oh, you need french quotes and src present?

Mario Carneiro (May 28 2023 at 19:42):

no src, I'm running it from archive/ and the result is

import «100-theorems-list».«16_abel_ruffini»
import «100-theorems-list».«30_ballot_problem»
import «100-theorems-list».«37_solution_of_cubic»
import «100-theorems-list».«42_inverse_triangle_sum»
import «100-theorems-list».«45_partition»
import «100-theorems-list».«54_konigsberg»
...
import «miu_language».«decision_suf»
import «oxford_invariants».«2021summer».«week3_p1»
import «sensitivity»

Damiano Testa (May 28 2023 at 19:44):

so, how is this from archive?

git ls-files | sed -n '/\.lean/ { s=\.lean$== ; s=^src/==; s=/=».«=g; s=^=import «= ; s=$=»= ; p }'

Mario Carneiro (May 28 2023 at 19:45):

seems to work

Damiano Testa (May 28 2023 at 19:46):

the src part is obviously not needed if you only run in in archive...

Mario Carneiro (May 28 2023 at 19:48):

I think you are overestimating how much I can parse of that line

Mario Carneiro (May 28 2023 at 19:49):

the spaces are appreciated though

Damiano Testa (May 28 2023 at 19:53):

Semicolons split instructions. The instructions in that case are simple substitutions.

Damiano Testa (May 28 2023 at 19:53):

Anyway, if that works, great! If it misses something, let me know!

Damiano Testa (May 28 2023 at 20:04):

It is quite surprising the contrast between the usefulness of sed and how much its syntax looks like white noise.

Mario Carneiro (May 28 2023 at 20:12):

Woohoo, it (almost) worked! It made it through the whole archive and got caught on Archive::all:

import Archive.Imo.Imo2005Q4 failed, environment already contains 'a' from Archive.Imo.Imo1998Q2

Mario Carneiro (May 28 2023 at 20:13):

I thought this was fixed?

Mario Carneiro (May 28 2023 at 20:16):

Oh I see, imo1998_q2 defines A and imo2005_q4 defines a

Mario Carneiro (May 28 2023 at 20:16):

we need a linter for one-letter global declarations

Kevin Buzzard (May 28 2023 at 20:17):

Yeah, I don't think anything was fixed: the mathlib3 docs are still generating spurious links to things like M and f (see e.g. here ("Let f : M → M'."). The suggestion was to put everything in archive into a namespace.

Mario Carneiro (May 28 2023 at 20:17):

I thought that was already done and dusted or else I wouldn't have written the script to import everything to one top level all.lean file

Scott Morrison (May 28 2023 at 20:36):

I made a PR that properly namespaced all the IMO files, but it was based on top of my "move everything to src" PR. Let me see if I can disentangle.

Mario Carneiro (May 28 2023 at 20:38):

while you are at it, please add *.tlean and *.ast.json to the mathlib .gitignore

Scott Morrison (May 28 2023 at 20:41):

#19122

Eric Wieser (May 28 2023 at 21:20):

The one downside to namespacing everything is all the docs pages about the imo problems become harder to read

Damiano Testa (May 28 2023 at 21:22):

Could the namespace be a non-printing character?

Mario Carneiro (May 28 2023 at 21:34):

Eric Wieser said:

The one downside to namespacing everything is all the docs pages about the imo problems become harder to read

That seems like a peculiarity of our docgen though. Most code documentation tools I can think of display functions in a module without the namespace

Eric Wieser (May 28 2023 at 21:35):

I somewhat agree; doc-gen certainly bears some of the blame here

Mario Carneiro (May 28 2023 at 21:35):

we could implement something like the vscode sticky namespace line

Eric Wieser (May 28 2023 at 21:36):

What code documentation tools are you thinking of?

Mario Carneiro (May 28 2023 at 21:36):

java, python, c / doxygen, rustdoc

Mario Carneiro (May 28 2023 at 21:36):

haskell

Eric Wieser (May 28 2023 at 21:37):

Python has a pretty different namespacing model, and c doen't have one at all!

Eric Wieser (May 28 2023 at 21:38):

Choosing a random doxygen example, https://eigen.tuxfamily.org/dox/group__SparseCore__Module.html seems to show the full Eigen namespace everywhere

Mario Carneiro (May 28 2023 at 21:38):

rustdoc is probably the closest, in that functions are often distributed into "impls" that are roughly the size of namespace scopes in a lean file

Eric Wieser (May 28 2023 at 21:39):

I think my argument here is that the best compromise might be to namespace separate archive files in imo_XXXX and imo_YYYY, but not make the statement even less readable with an archive on the front of everything

Mario Carneiro (May 28 2023 at 21:40):

not everywhere, e.g. here the headline is innerIndexPtr() without the namespace

Mario Carneiro (May 28 2023 at 21:41):

I don't think anyone was suggesting an archive.* on everything

Eric Wieser (May 28 2023 at 21:41):

That's what Scott's #19122 does, so at least one person was!

Mario Carneiro (May 28 2023 at 21:42):

oh, yeah I think imo1234_q17 is plenty specific

Damiano Testa (May 28 2023 at 21:42):

Why can't doc-gen have a first step that strips the namespace on the archive files, before actually producing the docs?

Mario Carneiro (May 28 2023 at 21:42):

because then all uses of a will get linked again

Mario Carneiro (May 28 2023 at 21:43):

but also, it's a misuse of the global namespace

Mario Carneiro (May 28 2023 at 21:43):

this is 100% a display issue

Damiano Testa (May 28 2023 at 21:44):

So the namespace could be <!--archive-->? :lol:

Eric Wieser (May 28 2023 at 21:45):

Mario Carneiro said:

this is 100% a display issue

The archive bit is, but the imo bit is a genuine hygiene problem that can cause archive entries to clash with one another

Eric Wieser (May 28 2023 at 21:45):

I already fixed the existing clashes in mathlib3, but I didn't give every problem its own namespace

Mario Carneiro (May 28 2023 at 21:46):

I mean that they should be namespaced, because that's the right way to build a collection of theorems about this stuff, and the display issue is not showing the namespace everywhere in the docs

Damiano Testa (May 28 2023 at 21:50):

Would it be hard to run a script on the docs with the namespace that removes the namespace for archive only in the visible html, but not on the actual URLs?

Mario Carneiro (May 28 2023 at 21:51):

or we could just fix docgen...?

Eric Wieser (May 28 2023 at 21:51):

It would be easy to add a special case in doc-gen to not display the "archive" prefix

Eric Wieser (May 28 2023 at 21:51):

But that's not a very good solution if other projects not called "archive" want this behavior

Henrik Böving (May 28 2023 at 21:51):

are you talking about the lean 4 doc-gen here? Can i help you?

Mario Carneiro (May 28 2023 at 21:51):

I guess it affects both docgens

Eric Wieser (May 28 2023 at 21:52):

Lean4 doc-gen doesn't yet have the offending lean files that cause the issue; but it will soon!

Damiano Testa (May 28 2023 at 21:54):

There could be a "blacklist" of namespaces, that do not show up on the webpages, but are present in the URLs, right? So, we can initially have archive as blacklisted. And we could add more, if needed.

Yaël Dillies (May 28 2023 at 21:55):

Eric Wieser said:

The one downside to namespacing everything is all the docs pages about the imo problems become harder to read

I see you haven't read much category_theory documentation :stuck_out_tongue_closed_eyes:

Mario Carneiro (May 28 2023 at 21:56):

I do think the situation is just a tiny bit worse when the namespace looks like a hash

Mario Carneiro (May 28 2023 at 22:25):

(In case it wasn't clear, I think that imo2010_q2 is a terrible name for a theorem. Unless you keep a database of IMO problems in your head and keep it up to date, it is way too easy to confuse it with imo2001_q2 even if you aren't dyslexic. And it is completely non-descriptive of what the theorem is actually about.)

Eric Wieser (May 28 2023 at 22:28):

I don't think it's terrible when the end goal is for the archive folder to be that database

Damiano Testa (May 28 2023 at 22:31):

Maybe I am misunderstanding: i find imo2010_q2 a not-so-bad for a namespace, though not for the theorem name.

Johan Commelin (May 29 2023 at 04:55):

Can we merge the PR, and finetune the namespacing solution later?

Yury G. Kudryashov (May 29 2023 at 07:20):

IMHO, facts from archive/imo should not be used outside of their files, thus imo2010_q2 is a good namespace/name for formalization of IMO 2010 Q2. In many cases, the only reason this problem is interesting to somebody is that it was at IMO.

Eric Wieser (May 29 2023 at 07:21):

@Johan Commelin, my impression is that the re-namespacing is not blocking mathport any more

Yury G. Kudryashov (May 29 2023 at 07:22):

If a problem has meaning outside of the IMO context, then we can have an alias with a better name.

Eric Wieser (May 29 2023 at 07:23):

So given we have a majority of people in favor of dropping archive from the namespace and there is no longer any urgency, I think it makes sense to do that before merging.

Patrick Stevens (May 29 2023 at 07:37):

(I think the most principled answer would be to give the theorems descriptive names if possible, and then have archive/imo.yaml which correlates the problems with their proofs, like we do for Freek's list - but also this is the very definition of bikeshedding, in a context where the wrong decision can easily be fixed later)

Johan Commelin (May 29 2023 at 07:41):

Eric Wieser said:

Johan Commelin, my impression is that the re-namespacing is not blocking mathport any more

It might still help with preventing :down:
Mario Carneiro said:

Woohoo, it (almost) worked! It made it through the whole archive and got caught on Archive::all:

import Archive.Imo.Imo2005Q4 failed, environment already contains 'a' from Archive.Imo.Imo1998Q2

Eric Wieser (May 29 2023 at 07:49):

Somehow I'd assumed that Mario had already fixed that and shipped it, but I now see the PR is still open after all

Mario Carneiro (May 29 2023 at 08:07):

I fixed it locally, but seeing as there was an active PR about it I didn't commit it to mathlib3

Scott Morrison (May 30 2023 at 01:18):

Sorry, I won't have a chance to look at the namespacing PR for 24 hours, if someone want to take it over please do.

Damiano Testa (May 30 2023 at 04:55):

I can try to renamespace the files in archive.

Damiano Testa (May 30 2023 at 04:55):

I am going to use the filename for the imo problems as the first layer of namespace, plus whatever they already had. Is everyone happy with this decision? This is very easy to change later on, anyway.

Damiano Testa (May 30 2023 at 04:59):

In fact, for the files contained inside a folder within archive, I would use the folder name. Thus use 100-theorems(-list), imo.

Damiano Testa (May 30 2023 at 05:00):

The files directly in the archive folder arithcc and sensitivity, already had their own namespace, so I am tempted to leave them as they were (namely, remove the archive namespace.

Johan Commelin (May 30 2023 at 06:59):

@Damiano Testa Just to make sure... you are aware of #19122?

Damiano Testa (May 30 2023 at 07:00):

Johan, yes, thanks! I am editing that pr, in fact.

Damiano Testa (May 30 2023 at 07:01):

That PR introduced the archive namespace for everything, while I think that the consensus is to have slightly more bespoke namespaces for the files in archive.

Damiano Testa (May 30 2023 at 07:12):

Namespacing everything causes issues with name resolution: open and namespace do not interact very well.

Damiano Testa (May 30 2023 at 07:14):

Ok, it seems that not that many files are affected: should I fix these issues by changing open ... and possibly adding some explicit namespaces/_root_. to declarations?

Damiano Testa (May 30 2023 at 13:47):

I opened #19129: it is a PR similar to Scott's, but so far only deals with archive. I add the two namespaces

  • imo and
  • theorems_100,

though less aggressively (i.e. not in all files contained in imo or 100-theorems-list).

The PR description gives some more context.

Let me know if this looks ok! On my computer,

lean --make archive

is successful.

Damiano Testa (May 30 2023 at 13:51):

If I did not make mistakes, all the added namespaces are in files where a namespace is wanted. There may be files in which we would like to add a namespace, but this PR has not added it (yet).

Damiano Testa (May 30 2023 at 13:51):

If this is in line with what everyone likes, then I can process similarly also counterexamples.

Mario Carneiro (May 30 2023 at 13:55):

Damiano Testa said:

Let me know if this looks ok! On my computer,

lean --make archive

is successful.

the real test is whether ./scripts/mk_all.sh && lean --make archive is successful

Mario Carneiro (May 30 2023 at 13:55):

i.e. can you import everything in the archive at once

Mario Carneiro (May 30 2023 at 13:56):

(substitute the scripts from up-thread for mk_all.sh)

Damiano Testa (May 30 2023 at 13:58):

Let me try!

Eric Wieser (May 30 2023 at 13:58):

CI already tests this

Eric Wieser (May 30 2023 at 13:58):

(and it already passes on master)

Eric Wieser (May 30 2023 at 13:59):

It just doesn't pass when you let mathport rename two different names to the same name

Damiano Testa (May 30 2023 at 14:00):

Eric, I made the mistake of merging master half-way, so I think that CI has to rebuild (almost) all master, before getting to the tests.

Damiano Testa (May 30 2023 at 14:00):

So, I am going to try locally, to see if there are further issues, before addressing counterexamples.

Eric Wieser (May 30 2023 at 14:01):

I don't think there's any need to do so immediately; I would guess the only clash mathport cares about is in the imo folder

Damiano Testa (May 30 2023 at 14:11):

Ok, I'll wait, since I am not sure that I am testing the right thing on my computer and the real test is whether it passes CI or not!

Eric Wieser (May 30 2023 at 14:15):

(the reason this is already in CI is because doc-gen would crash if we got it wrong, and no one looks at those CI logs)

Damiano Testa (May 30 2023 at 15:30):

Is it the "check archive and counterexample directories have unique identifiers "?

Damiano Testa (May 30 2023 at 15:41):

Ok, good, that check passed , then!

Damiano Testa (May 30 2023 at 16:56):

Looking at the files in counterexamples, I think that adding a blanket namespace is probably better.

Damiano Testa (May 30 2023 at 16:56):

I am going to add namespace counteraxample (singular) to all of them. Let me know if you prefer otherwise!

Damiano Testa (May 30 2023 at 18:12):

#19129 is ready, I hope!

I namespace 54 files, as opposed to the 69 that Scott's PR namespaced. There may still be some of the currently non-namespaced files that might benefit from a namespace, but I'll leave this be for the moment.

Let me know if you have any comments or namespace requests! :smile:

Damiano Testa (Jun 01 2023 at 20:31):

Given Eric's comment about 100-theorems being mostly unported, Scott merged the namespacing PR!

What else is needed to get the files in archive and counterexamples available for porting?

Eric Wieser (Jun 01 2023 at 22:03):

My understanding is the answer is:

  • Mario needs to revive their mathport PR to add the necessary CI scripts
  • I might need to tweak the script that @Johan Commelin runs in order to correctly consume data about these new files

Mario Carneiro (Jun 02 2023 at 04:00):

the mathport PR has the CI scripts already, that's what the PR was about

Damiano Testa (Jun 02 2023 at 05:56):

Ok, great!

Of course, this is not a priority. I also think that porting these files should be mostly fun and straightforward.

Yaël Dillies (Sep 10 2023 at 15:43):

Do we have a mathlib4 mk_all somewhere?

Damiano Testa (Sep 10 2023 at 15:48):

Isn't there already by default Mathlib.lean?

Damiano Testa (Sep 10 2023 at 15:49):

Or do you mean to lake build all?

Yaël Dillies (Sep 10 2023 at 15:54):

Nah, I want a script to generate MyProject.lean.

Damiano Testa (Sep 10 2023 at 15:56):

git ls-files 'Mathlib/*.lean' | sort | sed 's=^.=import &=; s=/=.=g ; s=\.lean$==' > MyPath.lean

This might work...

Damiano Testa (Sep 10 2023 at 16:26):

Running

git ls-files 'Mathlib/*.lean' | sed 's=^.=import &=; s=/=.=g ; s=\.lean$==' | diff - Mathlib.lean

produces no output. So, I think that

git ls-files '*.lean' | sed 's=^.=import &=; s=/=.=g ; s=\.lean$==' > my/path.lean

also has good chances of working!

Eric Wieser (Sep 11 2023 at 08:42):

The exact command to use can be found in the CI script


Last updated: Dec 20 2023 at 11:08 UTC