Zulip Chat Archive

Stream: mathlib4

Topic: Mathlib weekly change report

github mathlib4 bot (May 19 2024 at 14:46):

Weekly stats (2024-05-12...2024-05-19)

Type	Total	%
Theorems	168,073 (+772 -90)	0.41%
Data	60,131 (+320 -19)	0.50%
Predicates	4,150 (+31 -2)	0.70%
Types	3,325 (+8 -0)	0.24%

895 files changed (+42 -11 ~839), 7725 lines changed (+20466 -12741)

Reference commits: old 3f8be61e1c, new d874bdffd1.

Take also a look at the Mathlib stats page.

Damiano Testa (May 19 2024 at 14:47):

Automatic post, triggered by a label!

Damiano Testa (May 19 2024 at 14:48):

Ok, the next step is to check that I can get this to work on a schedule with a cron job.

Damiano Testa (May 19 2024 at 14:49):

(The bot even created the topic: it was not clear to me that this would happen!)

github mathlib4 bot (May 20 2024 at 10:27):

Weekly stats (2024-05-13...2024-05-20)

Type	Total	%
Theorems	160,386 (+765 -104)	0.41%
Definitions-Data	57,821 (+306 -31)	0.48%
Data	8,121 (+354 -29)	0.62%
Definitions-Props	7,760 (+43 -3)	0.52%
Types	1,264 (+1 -0)	0.00%
Definitions-Types	411 (+1 -0)	0.24%
Predicates	5 (+0 -0)	0.00%
Definitions-Predicates	3 (+0 -0)	0.00%

866 files changed (+43 -10 ~805), 7508 lines changed (+19972 -12464)

Reference commits: old cec3ec0be2, new b2214f7318.

Take also a look at the Mathlib stats page.

Damiano Testa (May 20 2024 at 10:29):

If, like me, you are wondering about those 5+3 predicates, they are

def pred:   FermatLastTheorem
def pred:   UnivLE
def pred:   RiemannHypothesis
other pred: True
other pred: PEmpty
other pred: False
other pred: FirstOrder.Language
other pred: PUnit

Damiano Testa (May 20 2024 at 10:32):

Comparing the two, it seems that most declarations that are in Prop are theorems and most declarations that are data are defs.

Kim Morrison (May 20 2024 at 10:37):

I'm sorry, I still don't understand the row labels at all. :-) Are the - minus signs? Indicating subcategories?

Damiano Testa (May 20 2024 at 10:41):

The - signs indicate numbers of declarations that were lost, and + are new declarations.

Damiano Testa (May 20 2024 at 10:41):

However, "lost" or "new" is just by looking at the Names, so a rename of a declaration would have +1 -1.

Kim Morrison (May 20 2024 at 10:42):

No, I mean in "Definitions-Data"

Damiano Testa (May 20 2024 at 10:42):

It is intended to be the analogue of Git's "insertions/deletions".

Damiano Testa (May 20 2024 at 10:43):

Ah, the split is as follows: for defs on the one hand and for everything else on the other hand, we compute 4 categories: Props, Types, predicates and data.

Kim Morrison (May 20 2024 at 10:44):

Oh, but in one place you write "Theorems" and in the other you write "Props"?

Damiano Testa (May 20 2024 at 10:44):

I.e. there is a function that classifies a declaration into the 4 kinds above and a further "pre"-filter on whether the declaration is a defnInfo or not.

Kim Morrison (May 20 2024 at 10:44):

That is confusing.

Utensil Song (May 20 2024 at 10:44):

I'm confused that Definitions-Predicates and Predicates both appear, can they be merged?

Kim Morrison (May 20 2024 at 10:44):

You should change the - to : in that case.

Damiano Testa (May 20 2024 at 10:44):

Ah, I see, that is left-over from the previous naming!

Damiano Testa (May 20 2024 at 10:45):

Ok, I am going to replace - by : first (and update the post above).

Damiano Testa (May 20 2024 at 10:46):

Oh, I cannot update the post above, since I have not posted it... :man_facepalming:

Damiano Testa (May 20 2024 at 10:46):

Anyway, I will do the replacement on the PR.

Damiano Testa (May 20 2024 at 10:47):

As for Definition-Predicates and Predicates, those are counting slightly different things (though I am tempted to just remove those from the list altogether, since everyone is going to wonder about what they are and the answer is somewhat boring).

Damiano Testa (May 20 2024 at 10:56):

Is anyone opposed to simply omitting Predicates, whether in definitions or elsewhere?

Eric Wieser (May 20 2024 at 11:01):

Damiano Testa said:

Comparing the two, it seems that most declarations that are in Prop are theorems and most declarations that are data are defs.

I don't think counting whether things are declared with def or theorem is interesting at all. What is interesting is whether the declations are "functions" or "proofs"

Eric Wieser (May 20 2024 at 11:04):

Which is to say; I think the one at the top of this thread is much better than the recent one, after replacing "data" with some better word like "definitions" or "operations" or "functions"

Damiano Testa (May 20 2024 at 11:05):

Ok, however I made the change thinking that this was in the direction of what Mario was suggesting in the other thread. I may very well have misunderstood what Mario meant, though.

Utensil Song (May 20 2024 at 12:01):

There could be just Theorems, Definitions, Types, Predicates .

Utensil Song (May 20 2024 at 12:03):

Unless more meaningful categories can be extracted as how people perceive them.

Damiano Testa (May 20 2024 at 12:09):

Ok, I reverted to 4 fields, and I have change Theorems to Propositions and Data to Definitions.

Damiano Testa (May 20 2024 at 12:09):

Weekly stats (2024-05-13...2024-05-20)

Type	Total	%
Propositions	168,073 (+697 -83)	0.37%
Definitions	60,130 (+306 -19)	0.48%
Predicates	4,150 (+27 -2)	0.60%
Types	3,325 (+8 -0)	0.24%

830 files changed (+42 -10 ~775), 7411 lines changed (+18922 -11511)

Reference commits: old 7131ad0791, new c1ec143cbf.

Take also a look at the Mathlib stats page.

Damiano Testa (May 20 2024 at 12:10):

I think that using both Theorems and Definitions may give the impression that those are literally counting the number of declarations introduced by theorem and by def.

Mario Carneiro (May 20 2024 at 19:58):

Damiano Testa said:

Ok, however I made the change thinking that this was in the direction of what Mario was suggesting in the other thread. I may very well have misunderstood what Mario meant, though.

My suggestion was to only split up "definitions" into "def (prop)", "def (type)", "def (other)". I suppose inductive can also be separated into (prop) and (type) classes. Everything else should go in an "other" category. More precisely:

theorems
defs (prop)
defs (type)
defs (other)
inductives (prop)
inductives (type)
structures (prop)
structures (type)
other

I'm not sure I fully understand how your categorization works, but I think it predicates and props are both binned as "prop" in my categorization, and on the other hand the "Data", "Types", "Predicates" categories for non-definitions (which are almost certainly structures) are categorized into inductive and structure categories in my categorization.

Mario Carneiro (May 20 2024 at 20:05):

The actual decision tree for the categories:

thmDecl
defnDecl
- forallTelescope -> is Sort? is Prop?
inductiveDecl
- isStructure
- forallTelescope -> is Sort? is Prop?
other

Damiano Testa (May 20 2024 at 20:35):

Mario, thanks for the detailed description: this is indeed quite different from what I had guessed! I'll try to implement it, though I'm not sure that I'll have time tomorrow.

Also, I will have to see how to get a Declaration out of a ConstantInfo: I am not sure that I ever worked with a Declaration...

Mario Carneiro (May 20 2024 at 20:44):

you don't need to get a declaration from a constantinfo

Mario Carneiro (May 20 2024 at 20:44):

and you can't anyway, Declaration is the input to the kernel, ConstantInfo is the output

Damiano Testa (May 20 2024 at 20:45):

Ok, so is it good enough if I replace Decl with Info in your decision tree?

Mario Carneiro (May 20 2024 at 20:48):

oh, yeah I just forgot the names of the constructors

Damiano Testa (May 20 2024 at 20:52):

I may end up producing it as an additional report, so that both get printed: the coarser one above and then the more refined information.

Damiano Testa (May 20 2024 at 20:52):

The bot could also "randomize" which report to post each week.

Damiano Testa (May 20 2024 at 20:53):

Anyway, the first step is to get the setup working. More statistics can come later.

Mario Carneiro (May 20 2024 at 20:54):

you could also hide the more refined information behind a spoiler tag. But I think it's okay even if this report is a full page, people can still mute the thread if they don't want to hear it

Mario Carneiro (May 20 2024 at 20:54):

if your concern is that the more aggregate statistics are missing you can throw those in too, making it a more hierarchical breakdown

Damiano Testa (May 20 2024 at 20:55):

(Here are the suggestions that came up in this post.)

Damiano Testa (May 20 2024 at 20:55):

Well, I think that which statistics are useful will probably change with time and it is very easy to change anyway, so I am not too concerned.

Mario Carneiro (May 20 2024 at 20:56):

I'd rather have as many stats up front as possible so that we get nice long time series information from the beginning of the report

Damiano Testa (May 20 2024 at 20:57):

I think that having a regular report is useful and interesting, so at the moment I would like it to happen more than what I actually pushing for anything in particular to be displayed at the beginning.

Damiano Testa (May 20 2024 at 20:57):

Mario Carneiro said:

I'd rather have as many stats up front as possible so that we get nice long time series information from the beginning of the report

Oh, I had not thought of the "historical value" of this.

Damiano Testa (May 20 2024 at 20:58):

I can add more statistics, I was simply worried that people would find a long report not too useful, because of the amount of info that it contained.

Damiano Testa (May 20 2024 at 20:59):

(I still think that the PR as is already has a lot of fluff related to setting up the cron job and a little of the infrastructure and it would probably not simplify review if I made it longer still.)

Mario Carneiro (May 20 2024 at 20:59):

I was wondering about commenting on the PR "why not write this in lean"

Damiano Testa (May 20 2024 at 21:00):

Write what in lean? The CI workflow?

Eric Wieser (May 20 2024 at 21:07):

Damiano Testa said:

Mario Carneiro said:

I'd rather have as many stats up front as possible so that we get nice long time series information from the beginning of the report

Oh, I had not thought of the "historical value" of this.

I think we should aim to have historical data elsewhere, like we do with mathlib stats; zulip isn't a great platform for structured time -series data.

Eric Wieser (May 20 2024 at 21:08):

On that note, it would be great to write this script in a way that uses as few Lean features as possible, so that we can run it on old commits without issue

Eric Wieser (May 20 2024 at 21:08):

(which is perhaps an argument for the script living outside of mathlib, so that it can easily be versioned separately)

Mario Carneiro (May 20 2024 at 21:10):

I don't think we need to worry about moving the script elsewhere for now

github mathlib4 bot (May 29 2024 at 23:26):

Weekly stats (2024-05-23...2024-05-29)

Type	Total	%
Theorems	169,367 (+989 -103)	0.52%
Data	60,653 (+269 -56)	0.35%
Predicates	4,182 (+18 -0)	0.43%
Types	3,363 (+12 -3)	0.27%

2197 files changed (+38 -2 ~2141), 7435 lines changed (+26022 -18587)

Reference commits: old 4072ed9348, new 36a9da67e9.

Take also a look at the Mathlib stats page.

Damiano Testa (May 29 2024 at 23:48):

Oh, has the PR been merged? Does the cron job run from a non-merged PR?

Johan Commelin (May 30 2024 at 04:03):

Could the bot tell us how many PRs were merged?

Damiano Testa (May 30 2024 at 04:14):

I can do that. I was also planning to get the statistics that Mario wanted. I was not expecting a post though: I suspect that this post has been set off by... @Michael Rothgang ? (Sorry for flagging you!)

Damiano Testa (May 30 2024 at 04:14):

(the cron job is set for Sunday, I believe, and I think that it should not run on PRs anyway)

Yaël Dillies (May 30 2024 at 05:39):

Oh I'm noticing that it doesn't show the total number of files. I think that's a cool statistic.

Michael Rothgang (May 30 2024 at 07:03):

Damiano Testa said:

I can do that. I was also planning to get the statistics that Mario wanted. I was not expecting a post though: I suspect that this post has been set off by... Michael Rothgang ? (Sorry for flagging you!)

Yes, that was me. Oh no, did I set off the bot? Sorry for that.

Michael Rothgang (May 30 2024 at 07:04):

(This happened in #13359, where I tried to use the "re-run a workflow using a label" trick. I copy-pasted the example you mentioned, but forgot to my local change of that workflow at first.)

Julian Berman (Jun 01 2024 at 12:03):

Maybe also adding some sort of EWMA would be interesting to see for those who only see these numbers occasionally and wonder whether a particular week is abnormal or not

Dean Young (Oct 20 2024 at 17:40):

@Kim Morrison @Damiano Testa @Mario Carneiro this was really cool. I haven't seen any reports since though. Is it going to run again?

Dean Young (Oct 20 2024 at 17:43):

#mathlib4 > Technical Debt Counters
So, did the Mathlib weekly change report evolve into Technical Debt Counters?

Has it become the info at this link?
https://leanprover-community.github.io/mathlib_stats.html

Damiano Testa (Oct 20 2024 at 18:54):

No, the PR with the weekly reports is stuck because I could not find a reasonable "characteristion" of what the statistics of the declarations is that was good for everyone. The PR is #13038.

Dean Young (Oct 21 2024 at 00:06):

That's really too bad, this is cool. I hope this works out eventually though.

Damiano Testa (Oct 21 2024 at 00:09):

I'll try to get back to this, since I also find it informative!

Last updated: May 02 2025 at 03:31 UTC