Zulip Chat Archive
Stream: mathlib4
Topic: mk_all.sh
Mario Carneiro (May 28 2023 at 19:20):
While writing the mathport makefile modifications for archive/counterexamples, I noticed that mk_all.sh
is currently broken when running on the archive, as it produces a file like this:
import .100-theorems-list.16_abel_ruffini
import .100-theorems-list.30_ballot_problem
import .100-theorems-list.37_solution_of_cubic
import .100-theorems-list.42_inverse_triangle_sum
import .100-theorems-list.45_partition
import .100-theorems-list.54_konigsberg
import .100-theorems-list.57_herons_formula
import .100-theorems-list.70_perfect_numbers
import .100-theorems-list.73_ascending_descending_sequences
import .100-theorems-list.81_sum_of_prime_reciprocals_diverges
import .100-theorems-list.82_cubing_a_cube
import .100-theorems-list.83_friendship_graphs
import .100-theorems-list.93_birthday_problem
import .100-theorems-list.9_area_of_a_circle
import .arithcc
import .examples.mersenne_primes
Eric Wieser (May 28 2023 at 19:20):
This is the mathlib3 mk-all right?
Mario Carneiro (May 28 2023 at 19:21):
yes
Eric Wieser (May 28 2023 at 19:21):
I made a PR to fix this long ago, but the response was "use leanproject mk-all
instead"
Mario Carneiro (May 28 2023 at 19:21):
the shell script, not the leanproject command
Mario Carneiro (May 28 2023 at 19:21):
mathport doesn't depend on leanproject currently
Mario Carneiro (May 28 2023 at 19:21):
if you have a shell script handy that would be swell
Mario Carneiro (May 28 2023 at 19:22):
even just putting french quotes around everything would suffice
Mario Carneiro (May 28 2023 at 19:23):
but this is really not fun to read:
| sed 's,^\./,,;s,^\.\./[^/]*,,;s,/,.,g;s,\.lean$,,;s,^,import ,' \
Eric Wieser (May 28 2023 at 19:23):
Mario Carneiro (May 28 2023 at 19:23):
I'm sure I just have to put a french quote in there... somewhere...
Eric Wieser (May 28 2023 at 19:24):
Yeah, this is why the "use leanproject
" argument is convincing
Mario Carneiro (May 28 2023 at 19:25):
now if only that PR can be revived and merged
Mario Carneiro (May 28 2023 at 19:25):
otherwise I'll be copying it into mathport
Mario Carneiro (May 28 2023 at 19:26):
I don't think there is any good reason to have a script in mathlib with a known bug
Eric Wieser (May 28 2023 at 19:26):
I've re-opened it, feel free to merge
Eric Wieser (May 28 2023 at 19:27):
Alternatively, if you want to drop the gross sed
stuff I have some python functions here
Mario Carneiro (May 28 2023 at 19:28):
lean would of course also be an option
Mario Carneiro (May 28 2023 at 19:30):
ah, not quite...
import «».«100-theorems-list».«16_abel_ruffini»
Eric Wieser (May 28 2023 at 19:33):
Ah, I didn't try to fix relative imports
Eric Wieser (May 28 2023 at 19:34):
Or maybe the script is just garbage
Mario Carneiro (May 28 2023 at 19:34):
it's not a relative import, it seems to be a weird interaction with the funny directory traversal stuff
Eric Wieser (May 28 2023 at 19:34):
Should #11001 come back off the queue?
Mario Carneiro (May 28 2023 at 19:34):
it wants me to call ./scripts/mk_all.sh ../archive
Eric Wieser (May 28 2023 at 19:35):
I think you probably want (cd archive && ./scripts/mk_all.sh)
Mario Carneiro (May 28 2023 at 19:35):
but the find
returns
../archive/imo/imo1962_q4.lean
../archive/imo/imo2011_q3.lean
../archive/imo/imo1981_q3.lean
../archive/imo/imo2013_q1.lean
../archive/imo/imo2013_q5.lean
and sed
doesn't seem to be expecting it
Eric Wieser (May 28 2023 at 19:35):
Are you sure you don't just want to add leanproject
as a dependency? For better or for worse, it's the monolith that contains random tools you need to write somewhat sane scripts to mangle lean
Mario Carneiro (May 28 2023 at 19:35):
I think so too, but the first part of the script is baking in some stuff which makes it absolute wrt src/
Damiano Testa (May 28 2023 at 19:35):
sed
first replaces a single initial dot, then does not find two initial dots left...
Damiano Testa (May 28 2023 at 19:36):
try removing the first sed
instruction.
Mario Carneiro (May 28 2023 at 19:36):
if you do cd archive && ../scripts/mk_all.sh
you still get the contents of src
Mario Carneiro (May 28 2023 at 19:38):
I think I will just write the 3-line bash call at the end in the makefile
Damiano Testa (May 28 2023 at 19:40):
Is this what you want?
git ls-files | sed -n '/\.lean/ { s=\.lean$== ; s=/=.=g; s=^src\.==; s=^=import = ; p }'
Mario Carneiro (May 28 2023 at 19:41):
what i have so far is
find . -name \*.lean -not -name archive_all.lean \
| sed 's,^\./,,;s,^\.\./[^/]*,,;s,/,».«,g;s,\.lean$,»,;s,^,import «,' \
| sort
Mario Carneiro (May 28 2023 at 19:41):
your script looks to be the same except it is missing the french quote handling
Damiano Testa (May 28 2023 at 19:42):
oh, you need french quotes and src present?
Mario Carneiro (May 28 2023 at 19:42):
no src, I'm running it from archive/
and the result is
import «100-theorems-list».«16_abel_ruffini»
import «100-theorems-list».«30_ballot_problem»
import «100-theorems-list».«37_solution_of_cubic»
import «100-theorems-list».«42_inverse_triangle_sum»
import «100-theorems-list».«45_partition»
import «100-theorems-list».«54_konigsberg»
...
import «miu_language».«decision_suf»
import «oxford_invariants».«2021summer».«week3_p1»
import «sensitivity»
Damiano Testa (May 28 2023 at 19:44):
so, how is this from archive?
git ls-files | sed -n '/\.lean/ { s=\.lean$== ; s=^src/==; s=/=».«=g; s=^=import «= ; s=$=»= ; p }'
Mario Carneiro (May 28 2023 at 19:45):
seems to work
Damiano Testa (May 28 2023 at 19:46):
the src
part is obviously not needed if you only run in in archive...
Mario Carneiro (May 28 2023 at 19:48):
I think you are overestimating how much I can parse of that line
Mario Carneiro (May 28 2023 at 19:49):
the spaces are appreciated though
Damiano Testa (May 28 2023 at 19:53):
Semicolons split instructions. The instructions in that case are simple substitutions.
Damiano Testa (May 28 2023 at 19:53):
Anyway, if that works, great! If it misses something, let me know!
Damiano Testa (May 28 2023 at 20:04):
It is quite surprising the contrast between the usefulness of sed and how much its syntax looks like white noise.
Mario Carneiro (May 28 2023 at 20:12):
Woohoo, it (almost) worked! It made it through the whole archive and got caught on Archive::all
:
import Archive.Imo.Imo2005Q4 failed, environment already contains 'a' from Archive.Imo.Imo1998Q2
Mario Carneiro (May 28 2023 at 20:13):
I thought this was fixed?
Mario Carneiro (May 28 2023 at 20:16):
Oh I see, imo1998_q2
defines A
and imo2005_q4
defines a
Mario Carneiro (May 28 2023 at 20:16):
we need a linter for one-letter global declarations
Kevin Buzzard (May 28 2023 at 20:17):
Yeah, I don't think anything was fixed: the mathlib3 docs are still generating spurious links to things like M
and f
(see e.g. here ("Let f : M → M'."). The suggestion was to put everything in archive into a namespace.
Mario Carneiro (May 28 2023 at 20:17):
I thought that was already done and dusted or else I wouldn't have written the script to import everything to one top level all.lean
file
Scott Morrison (May 28 2023 at 20:36):
I made a PR that properly namespaced all the IMO files, but it was based on top of my "move everything to src" PR. Let me see if I can disentangle.
Mario Carneiro (May 28 2023 at 20:38):
while you are at it, please add *.tlean
and *.ast.json
to the mathlib .gitignore
Scott Morrison (May 28 2023 at 20:41):
Eric Wieser (May 28 2023 at 21:20):
The one downside to namespacing everything is all the docs pages about the imo problems become harder to read
Damiano Testa (May 28 2023 at 21:22):
Could the namespace be a non-printing character?
Mario Carneiro (May 28 2023 at 21:34):
Eric Wieser said:
The one downside to namespacing everything is all the docs pages about the imo problems become harder to read
That seems like a peculiarity of our docgen though. Most code documentation tools I can think of display functions in a module without the namespace
Eric Wieser (May 28 2023 at 21:35):
I somewhat agree; doc-gen certainly bears some of the blame here
Mario Carneiro (May 28 2023 at 21:35):
we could implement something like the vscode sticky namespace line
Eric Wieser (May 28 2023 at 21:36):
What code documentation tools are you thinking of?
Mario Carneiro (May 28 2023 at 21:36):
java, python, c / doxygen, rustdoc
Mario Carneiro (May 28 2023 at 21:36):
haskell
Eric Wieser (May 28 2023 at 21:37):
Python has a pretty different namespacing model, and c doen't have one at all!
Eric Wieser (May 28 2023 at 21:38):
Choosing a random doxygen example, https://eigen.tuxfamily.org/dox/group__SparseCore__Module.html seems to show the full Eigen
namespace everywhere
Mario Carneiro (May 28 2023 at 21:38):
rustdoc is probably the closest, in that functions are often distributed into "impls" that are roughly the size of namespace scopes in a lean file
Eric Wieser (May 28 2023 at 21:39):
I think my argument here is that the best compromise might be to namespace separate archive files in imo_XXXX
and imo_YYYY
, but not make the statement even less readable with an archive
on the front of everything
Mario Carneiro (May 28 2023 at 21:40):
not everywhere, e.g. here the headline is innerIndexPtr()
without the namespace
Mario Carneiro (May 28 2023 at 21:41):
I don't think anyone was suggesting an archive.*
on everything
Eric Wieser (May 28 2023 at 21:41):
That's what Scott's #19122 does, so at least one person was!
Mario Carneiro (May 28 2023 at 21:42):
oh, yeah I think imo1234_q17
is plenty specific
Damiano Testa (May 28 2023 at 21:42):
Why can't doc-gen have a first step that strips the namespace on the archive files, before actually producing the docs?
Mario Carneiro (May 28 2023 at 21:42):
because then all uses of a
will get linked again
Mario Carneiro (May 28 2023 at 21:43):
but also, it's a misuse of the global namespace
Mario Carneiro (May 28 2023 at 21:43):
this is 100% a display issue
Damiano Testa (May 28 2023 at 21:44):
So the namespace could be <!--archive-->
? :lol:
Eric Wieser (May 28 2023 at 21:45):
Mario Carneiro said:
this is 100% a display issue
The archive
bit is, but the imo
bit is a genuine hygiene problem that can cause archive entries to clash with one another
Eric Wieser (May 28 2023 at 21:45):
I already fixed the existing clashes in mathlib3, but I didn't give every problem its own namespace
Mario Carneiro (May 28 2023 at 21:46):
I mean that they should be namespaced, because that's the right way to build a collection of theorems about this stuff, and the display issue is not showing the namespace everywhere in the docs
Damiano Testa (May 28 2023 at 21:50):
Would it be hard to run a script on the docs with the namespace that removes the namespace for archive only in the visible html, but not on the actual URLs?
Mario Carneiro (May 28 2023 at 21:51):
or we could just fix docgen...?
Eric Wieser (May 28 2023 at 21:51):
It would be easy to add a special case in doc-gen to not display the "archive" prefix
Eric Wieser (May 28 2023 at 21:51):
But that's not a very good solution if other projects not called "archive" want this behavior
Henrik Böving (May 28 2023 at 21:51):
are you talking about the lean 4 doc-gen here? Can i help you?
Mario Carneiro (May 28 2023 at 21:51):
I guess it affects both docgens
Eric Wieser (May 28 2023 at 21:52):
Lean4 doc-gen doesn't yet have the offending lean files that cause the issue; but it will soon!
Damiano Testa (May 28 2023 at 21:54):
There could be a "blacklist" of namespaces, that do not show up on the webpages, but are present in the URLs, right? So, we can initially have archive as blacklisted. And we could add more, if needed.
Yaël Dillies (May 28 2023 at 21:55):
Eric Wieser said:
The one downside to namespacing everything is all the docs pages about the imo problems become harder to read
I see you haven't read much category_theory
documentation :stuck_out_tongue_closed_eyes:
Mario Carneiro (May 28 2023 at 21:56):
I do think the situation is just a tiny bit worse when the namespace looks like a hash
Mario Carneiro (May 28 2023 at 22:25):
(In case it wasn't clear, I think that imo2010_q2
is a terrible name for a theorem. Unless you keep a database of IMO problems in your head and keep it up to date, it is way too easy to confuse it with imo2001_q2
even if you aren't dyslexic. And it is completely non-descriptive of what the theorem is actually about.)
Eric Wieser (May 28 2023 at 22:28):
I don't think it's terrible when the end goal is for the archive
folder to be that database
Damiano Testa (May 28 2023 at 22:31):
Maybe I am misunderstanding: i find imo2010_q2
a not-so-bad for a namespace, though not for the theorem name.
Johan Commelin (May 29 2023 at 04:55):
Can we merge the PR, and finetune the namespacing solution later?
Yury G. Kudryashov (May 29 2023 at 07:20):
IMHO, facts from archive/imo
should not be used outside of their files, thus imo2010_q2
is a good namespace/name for formalization of IMO 2010 Q2. In many cases, the only reason this problem is interesting to somebody is that it was at IMO.
Eric Wieser (May 29 2023 at 07:21):
@Johan Commelin, my impression is that the re-namespacing is not blocking mathport any more
Yury G. Kudryashov (May 29 2023 at 07:22):
If a problem has meaning outside of the IMO context, then we can have an alias with a better name.
Eric Wieser (May 29 2023 at 07:23):
So given we have a majority of people in favor of dropping archive
from the namespace and there is no longer any urgency, I think it makes sense to do that before merging.
Patrick Stevens (May 29 2023 at 07:37):
(I think the most principled answer would be to give the theorems descriptive names if possible, and then have archive/imo.yaml
which correlates the problems with their proofs, like we do for Freek's list - but also this is the very definition of bikeshedding, in a context where the wrong decision can easily be fixed later)
Johan Commelin (May 29 2023 at 07:41):
Eric Wieser said:
Johan Commelin, my impression is that the re-namespacing is not blocking mathport any more
It might still help with preventing :down:
Mario Carneiro said:
Woohoo, it (almost) worked! It made it through the whole archive and got caught on
Archive::all
:import Archive.Imo.Imo2005Q4 failed, environment already contains 'a' from Archive.Imo.Imo1998Q2
Eric Wieser (May 29 2023 at 07:49):
Somehow I'd assumed that Mario had already fixed that and shipped it, but I now see the PR is still open after all
Mario Carneiro (May 29 2023 at 08:07):
I fixed it locally, but seeing as there was an active PR about it I didn't commit it to mathlib3
Scott Morrison (May 30 2023 at 01:18):
Sorry, I won't have a chance to look at the namespacing PR for 24 hours, if someone want to take it over please do.
Damiano Testa (May 30 2023 at 04:55):
I can try to renamespace the files in archive.
Damiano Testa (May 30 2023 at 04:55):
I am going to use the filename for the imo
problems as the first layer of namespace, plus whatever they already had. Is everyone happy with this decision? This is very easy to change later on, anyway.
Damiano Testa (May 30 2023 at 04:59):
In fact, for the files contained inside a folder within archive
, I would use the folder name. Thus use 100-theorems(-list)
, imo
.
Damiano Testa (May 30 2023 at 05:00):
The files directly in the archive folder arithcc
and sensitivity
, already had their own namespace, so I am tempted to leave them as they were (namely, remove the archive
namespace.
Johan Commelin (May 30 2023 at 06:59):
@Damiano Testa Just to make sure... you are aware of #19122?
Damiano Testa (May 30 2023 at 07:00):
Johan, yes, thanks! I am editing that pr, in fact.
Damiano Testa (May 30 2023 at 07:01):
That PR introduced the archive
namespace for everything, while I think that the consensus is to have slightly more bespoke namespaces for the files in archive.
Damiano Testa (May 30 2023 at 07:12):
Namespacing everything causes issues with name resolution: open
and namespace
do not interact very well.
Damiano Testa (May 30 2023 at 07:14):
Ok, it seems that not that many files are affected: should I fix these issues by changing open ...
and possibly adding some explicit namespaces/_root_.
to declarations?
Damiano Testa (May 30 2023 at 13:47):
I opened #19129: it is a PR similar to Scott's, but so far only deals with archive
. I add the two namespaces
imo
andtheorems_100
,
though less aggressively (i.e. not in all files contained in imo
or 100-theorems-list
).
The PR description gives some more context.
Let me know if this looks ok! On my computer,
lean --make archive
is successful.
Damiano Testa (May 30 2023 at 13:51):
If I did not make mistakes, all the added namespaces are in files where a namespace is wanted. There may be files in which we would like to add a namespace, but this PR has not added it (yet).
Damiano Testa (May 30 2023 at 13:51):
If this is in line with what everyone likes, then I can process similarly also counterexamples
.
Mario Carneiro (May 30 2023 at 13:55):
Damiano Testa said:
Let me know if this looks ok! On my computer,
lean --make archive
is successful.
the real test is whether ./scripts/mk_all.sh && lean --make archive
is successful
Mario Carneiro (May 30 2023 at 13:55):
i.e. can you import everything in the archive at once
Mario Carneiro (May 30 2023 at 13:56):
(substitute the scripts from up-thread for mk_all.sh
)
Damiano Testa (May 30 2023 at 13:58):
Let me try!
Eric Wieser (May 30 2023 at 13:58):
CI already tests this
Eric Wieser (May 30 2023 at 13:58):
(and it already passes on master)
Eric Wieser (May 30 2023 at 13:59):
It just doesn't pass when you let mathport rename two different names to the same name
Damiano Testa (May 30 2023 at 14:00):
Eric, I made the mistake of merging master half-way, so I think that CI has to rebuild (almost) all master, before getting to the tests.
Damiano Testa (May 30 2023 at 14:00):
So, I am going to try locally, to see if there are further issues, before addressing counterexamples
.
Eric Wieser (May 30 2023 at 14:01):
I don't think there's any need to do so immediately; I would guess the only clash mathport cares about is in the imo
folder
Damiano Testa (May 30 2023 at 14:11):
Ok, I'll wait, since I am not sure that I am testing the right thing on my computer and the real test is whether it passes CI or not!
Eric Wieser (May 30 2023 at 14:15):
(the reason this is already in CI is because doc-gen would crash if we got it wrong, and no one looks at those CI logs)
Damiano Testa (May 30 2023 at 15:30):
Is it the "check archive and counterexample directories have unique identifiers "?
Damiano Testa (May 30 2023 at 15:41):
Ok, good, that check passed , then!
Damiano Testa (May 30 2023 at 16:56):
Looking at the files in counterexamples
, I think that adding a blanket namespace is probably better.
Damiano Testa (May 30 2023 at 16:56):
I am going to add namespace counteraxample
(singular) to all of them. Let me know if you prefer otherwise!
Damiano Testa (May 30 2023 at 18:12):
#19129 is ready, I hope!
I namespace 54 files, as opposed to the 69 that Scott's PR namespaced. There may still be some of the currently non-namespaced files that might benefit from a namespace, but I'll leave this be for the moment.
Let me know if you have any comments or namespace requests! :smile:
Damiano Testa (Jun 01 2023 at 20:31):
Given Eric's comment about 100-theorems being mostly unported, Scott merged the namespacing PR!
What else is needed to get the files in archive
and counterexamples
available for porting?
Eric Wieser (Jun 01 2023 at 22:03):
My understanding is the answer is:
- Mario needs to revive their mathport PR to add the necessary CI scripts
- I might need to tweak the script that @Johan Commelin runs in order to correctly consume data about these new files
Mario Carneiro (Jun 02 2023 at 04:00):
the mathport PR has the CI scripts already, that's what the PR was about
Damiano Testa (Jun 02 2023 at 05:56):
Ok, great!
Of course, this is not a priority. I also think that porting these files should be mostly fun and straightforward.
Yaël Dillies (Sep 10 2023 at 15:43):
Do we have a mathlib4 mk_all
somewhere?
Damiano Testa (Sep 10 2023 at 15:48):
Isn't there already by default Mathlib.lean
?
Damiano Testa (Sep 10 2023 at 15:49):
Or do you mean to lake build
all?
Yaël Dillies (Sep 10 2023 at 15:54):
Nah, I want a script to generate MyProject.lean
.
Damiano Testa (Sep 10 2023 at 15:56):
git ls-files 'Mathlib/*.lean' | sort | sed 's=^.=import &=; s=/=.=g ; s=\.lean$==' > MyPath.lean
This might work...
Damiano Testa (Sep 10 2023 at 16:26):
Running
git ls-files 'Mathlib/*.lean' | sed 's=^.=import &=; s=/=.=g ; s=\.lean$==' | diff - Mathlib.lean
produces no output. So, I think that
git ls-files '*.lean' | sed 's=^.=import &=; s=/=.=g ; s=\.lean$==' > my/path.lean
also has good chances of working!
Eric Wieser (Sep 11 2023 at 08:42):
The exact command to use can be found in the CI script
Last updated: Dec 20 2023 at 11:08 UTC