Zulip Chat Archive

Stream: Infinity-Cosmos

Topic: infinity-cosmos PR feed


Nick Ward (Jan 21 2025 at 17:11):

It would be neat to have somewhere to go to see all the open infinity-cosmos related PRs, as it's easy to miss one or lose track of them in long threads.

I'm imagining a zulip bot that posts every infinity-cosmos PR along with every mathlib PR that somehow mentions infinity-cosmos. A github query like the #queue could also work.

Nick Ward (Jan 21 2025 at 17:13):

I doubt we can justify adding a mathlib PR tag, so we probably would have to add a common string to mathlib PRs for the bot to catch it. Something like "Part of the Infinity Cosmos project", which I occasionally see FLT-related PRs do.

Emily Riehl (Jan 21 2025 at 17:23):

I like this idea because I've been missing a lot of PRs! Any thoughts on what might be feasible?

Damiano Testa (Jan 21 2025 at 17:31):

I personally think that adding a label to ear-mark PRs from the infinity-cosmo project is a good idea: I do not think that there are many costs associated with having a new label, and, even if that is the case, they can always be deleted.

Matthew Ballard (Jan 21 2025 at 17:32):

I added a label. There is already precedent for such labels

Damiano Testa (Jan 21 2025 at 17:32):

It would be trickier to integrate it in the autolabel mechanism, but manually labelling PRs, I would say "go for it!".

Nick Ward (Jan 21 2025 at 17:35):

Thanks @Matthew Ballard!

Nick Ward (Jan 21 2025 at 17:41):

@Emily Riehl this might be all we need. Just ask folks to tag their PRs with the infinity-cosmos label, then we can view them all with a github query like this. A zulip channel with a friendly bot would probably be the next level up if we think it's worth doing.

Kim Morrison (Jan 21 2025 at 22:06):

Yes, we should also work out how to have the auto-labeller bot help, and probably also have a bot that posts status changes on a thread here.

Damiano Testa (Jan 21 2025 at 22:19):

The auto-labeller bot assigns a label depending on the paths of the modified files.

Damiano Testa (Jan 21 2025 at 22:20):

I don't think that there is a clear path that identifies infinity-cosmos filesPRs, but maybe we can think of some heuristic that would be reasonable.

Damiano Testa (Jan 21 2025 at 22:21):

Maybe a combination of author and path?

Johan Commelin (Jan 22 2025 at 07:52):

What would the auto-labeler do now? My guess is that it would add the t-algebraic-topology label to almost all PRs in this project. Would it be good enough to just work with that?

Damiano Testa (Jan 22 2025 at 08:17):

The auto-labeler roughly looks at the root dirs after Mathlib of the modified files: if exactly one such root dir is affected, then it knows which label to add and adds it. If more root dirs are affected, then it knows the labels, but won't post.

In reality, it does something a little more sophisticated, but not much.

Damiano Testa (Jan 22 2025 at 08:18):

So, PRs changing only files in AlgebraicTopology should get a t-algebraic-topology label, the ones for CategoryTheory will get t-category-theory but PRs modifying both will get no auto-label.

Nick Ward (Jan 22 2025 at 18:17):

Is there an existing bot that posts PR status updates on zulip? I would imagine it would be relatively simple to repurpose.

Nick Ward (Jan 22 2025 at 18:22):

Damiano Testa said:

Maybe a combination of author and path?

If adding PR author to the mix is simple to do, that seems promising. Certain sub-dirs in AlgebraicTopology could also be good hints (e.g. Quasicategory or SimplicialSet).

Damiano Testa (Jan 22 2025 at 18:39):

There is no bot that posts PR status updates on Zulip, but maybe the #queueboard already has information that can be used: let's ping @Michael Rothgang (sorry for subscribing you to the channel along the way! :smile: ) who has been working quite a bit on the queueboard.

Damiano Testa (Jan 22 2025 at 18:41):

The bot that adds the labels to the PRs is mostly written in lean. Currently, it only inspects the paths of the changed files, but I am sure that adding support for passing an author and taking some decision based on that would be relatively straightforward.

Nick Ward (Jan 22 2025 at 18:47):

I also thought the more general solution you mentioned elsewhere of keying on a string in the PR description makes good sense.

Damiano Testa (Jan 22 2025 at 18:58):

The "string" solution also has the property that it would not be affected by the same person contributing to different projects in the same folders.

Nick Ward (Jan 22 2025 at 19:05):

Ahh, the dream of one day getting every auto-label on the same PR.

Michael Rothgang (Jan 22 2025 at 19:12):

The queueboard repository has virtually all metadata about github PRs you might want. So almost any query you'll have could be answered. The question is what your question is :-)

Nick Ward (Jan 22 2025 at 19:44):

I think my only queueboard question would be, where does the code live?

Nick Ward (Jan 22 2025 at 19:47):

Is hosting for new zulip bots an easy solve? e.g., does the #rss bot (or another similar bot) live somewhere that I can just contribute / fork it and have it up and running?

Bryan Gin-ge Chen (Jan 22 2025 at 19:50):

I think all of the Zulip bots that react to various events in mathlib4 repo live in the .github/workflows/ directory (with some calling Python or shell scripts in /scripts/).

Nick Ward (Jan 22 2025 at 19:51):

Oh perfect, that's exactly what I was looking for. Thanks @Bryan Gin-ge Chen.

Michael Rothgang (Jan 22 2025 at 22:33):

The queueboard data is in https://github.com/jcommelin/queueboard/tree/master/data; there are also some automatically generated statistics: https://github.com/jcommelin/queueboard/tree/master/processed_data

If you have a particular question, I'm happy to write code that extracts all matching PRs. Or to show you around the data format (that's not explicitly documented at the moment).

Nick Ward (Jan 22 2025 at 22:46):

Thanks @Michael Rothgang! I think our task is fairly simple relative to what you've put together. Others may have a more interesting spec in mind, but I think we just want to post to a zulip channel every time the status of a PR tagged infinity-cosmos changes.

Nick Ward (Jan 22 2025 at 22:52):

If any CI whizzes want to pick this up, please do! Otherwise, I will find some time soon to remember how to write python.

Nick Ward (Jan 22 2025 at 23:10):

(I hear they even have types now)

Edward van de Meent (Jan 23 2025 at 08:32):

(they're not mandatory, and not necessarily checked)

Michael Rothgang (Jan 23 2025 at 08:38):

Nick Ward said:

Thanks Michael Rothgang! I think our task is fairly simple relative to what you've put together. Others may have a more interesting spec in mind, but I think we just want to post to a zulip channel every time the status of a PR tagged infinity-cosmos changes.

That should be very doable; I may have some time on Friday.
Just to clarify: by "status change", do you mean a status change as for the queueboard (e.g., from "awaiting review" to "awaiting author"), or as in Github's sense (e.g., title renaming, reviewer request, just a label added count as changes)?

Michael Rothgang (Jan 23 2025 at 08:38):

(That is: the data extraction part should be easy. I haven't looked at zulip bot integration. Perhaps somebody else would like to do this part?)

Edward van de Meent (Jan 23 2025 at 08:46):

you can look at several pieces of CI for the zulip bot integration... Assuming you want actual messages on zulip, some examples can be found in technical_debt_metrics.yml, nightly_detect_failure.yml and discover-lean-pr-testing.yml.

Edward van de Meent (Jan 23 2025 at 08:48):

that is, assuming you're using a github action as your vehicle for posting a message

Nick Ward (Jan 23 2025 at 16:03):

Michael Rothgang said:

by "status change", do you mean a status change as for the queueboard (e.g., from "awaiting review" to "awaiting author"), or as in Github's sense (e.g., title renaming, reviewer request, just a label added count as changes)?

I actually hadn't thought that far ahead. One idea would be to ping the channel anytime an infinity-cosmos PR is tagged, closed, or moves from "awaiting author" to "awaiting review". Is it more challenging to get the queueboard-style status changes in a github action, though?

Another valid answer would be "whatever is easiest", because I think we really just want an easy way to be notified about relevant new PRs.

Michael Rothgang (Jan 23 2025 at 18:30):

Ask and ye shall receive: the queueboard repository now contains a file "infinity_cosmos_data.json", which contains the current state of every PR labelled "infinity-cosmos". It is updated about every 15 minutes.

Having a bot use this information (to, for example, output a message: "the status of PR #12345 has changed: it is now awaiting review.") shouldn't be difficult.

Michael Rothgang (Jan 23 2025 at 18:32):

About ease of implementation: "posting a message on zulip" means an action would delegate to a python script anyway, and from there, making that parse a json file... doesn't sound like much extra effort.

Bryan Gin-ge Chen (Jan 23 2025 at 19:09):

Going from current snapshot of all the PRs to pinging when the state of a PR changes might be a little tricky since the action will need to keep track of the last state as well, right? The current data file could easily be used in a daily report message bot though.

For PR status changes, I think the simplest thing to do might be to tack on some code to mathlib's emoji reaction bots (see the 3 files starting with zulip_emoji here) which send messages to certain channels when they happen to be triggered by PRs with the infinity-cosmos label. This could also be useful to other formalization projects downstream of mathlib (with other labels and channels, of course).

Nick Ward (Jan 23 2025 at 19:40):

I think @Bryan Gin-ge Chen makes a good point that the zulip bot might need to be more CI-local (assuming we want to host it as a github action). I also quite like the idea of a daily report, though -- along with a long list of other ideas for using @Michael Rothgang's new data feed.

Damiano Testa (Jan 23 2025 at 19:46):

It may be easy to create an action that is triggered by any label that is applied on PRs tagged with Infinity-Cosmos and that reports on Zulip the PR and the label. That would not require any "memory" or "state".

Damiano Testa (Jan 23 2025 at 19:48):

As before, I still think that having a more "string/regex-like" hook to pick these PRs with automation makes the whole process simpler (and less reliant on gh quota), but it can probably work with scanning labels just as well.

Nick Ward (Jan 23 2025 at 19:53):

@Damiano Testa does "gh quota" refer to some cap on CI compute or similar?

Damiano Testa (Jan 23 2025 at 19:54):

Yes, and it is something that I do not understand very well.

Damiano Testa (Jan 23 2025 at 19:54):

What I do know, is that github allows some number of actions to be run per unit of time (maybe 1000/hour or something like this).

Damiano Testa (Jan 23 2025 at 19:55):

The usage that Infinity-Cosmos would do is nowhere near this limit, but I worry that the quota that mathlib uses for its checks exhausts the available actions and the (very little) usage that Infinity-Cosmos would want to use never triggers.

Nick Ward (Jan 23 2025 at 19:55):

Does mathlib frequently run up against this limit?

Damiano Testa (Jan 23 2025 at 19:56):

Nick Ward said:

Does mathlib frequently run up against this limit?

About every half hour/45 mins.

Damiano Testa (Jan 23 2025 at 19:57):

This is mostly due to checking for merge-conflicts: the queue has 1.2k PRs that get checked periodically for conflicts with master...

Damiano Testa (Jan 23 2025 at 19:58):

Bryan understands this better than I do, and I think that simply using different bots for different actions, might mean that each bot gets a new quota.

Damiano Testa (Jan 23 2025 at 19:58):

If this is the case, then we are good: we devote a bot to merge-conflicts and nothing else on that bot.

Damiano Testa (Jan 23 2025 at 19:59):

But, I do not really know how this side of CI works, which is why I always look for alternatives that have not even the need to use the gh quota.

Nick Ward (Jan 23 2025 at 20:00):

Oh wow. I still assume that a github action will be reliable enough for our purposes now, but maybe something to think about if it would ultimately make sense to host zulip bots elsewhere (or if we don't want to take away resources from more important CI actions).

Damiano Testa (Jan 23 2025 at 20:01):

I don't think that your usage is large, by any means. I just worry that it might be obliterated by the usage that mathlib already performs.

Damiano Testa (Jan 23 2025 at 20:02):

So, I was trying to suggest actions that would not depend on the quota, not because you would use too much, but because you may not get any, if the mathlib actions exhaust it before you can use it!

Damiano Testa (Jan 23 2025 at 20:02):

Btw, a github action is not using gh quota, as far as I undestand.

Damiano Testa (Jan 23 2025 at 20:03):

What uses the gh quota is interacting with github via the gh api (and possibly something else).

Damiano Testa (Jan 23 2025 at 20:03):

GitHub actions are, as far as I am aware, not affected by running out of gh quota.

Damiano Testa (Jan 23 2025 at 20:04):

However, if you want to query whether a PR has a specific label, that would use the gh quota (or, at least, I do not know how to do it without using the gh interface).

Damiano Testa (Jan 23 2025 at 20:04):

Whereas, asking for what the title or body of the PR is goes via a different channel and does not need gh.

Nick Ward (Jan 23 2025 at 20:07):

Interesting. I'm afraid this is why all my repos are perpetually broken -- CI is hard work!

Damiano Testa (Jan 23 2025 at 20:08):

CI works because the computers take pity of our attempts.

Michael Rothgang (Jan 23 2025 at 20:09):

Bryan Gin-ge Chen said:

Going from current snapshot of all the PRs to pinging when the state of a PR changes might be a little tricky since the action will need to keep track of the last state as well, right? The current data file could easily be used in a daily report message bot though.

For PR status changes, I think the simplest thing to do might be to tack on some code to mathlib's emoji reaction bots (see the 3 files starting with zulip_emoji here) which send messages to certain channels when they happen to be triggered by PRs with the infinity-cosmos label. This could also be useful to other formalization projects downstream of mathlib (with other labels and channels, of course).

The .json file has a time of the last change (in UTC); you can compare that with the current time. That said: if all you want to do is ping this channel whenever an awaiting-author, blocked or merge-conflict label is changed, using a github workflow is definitely easier.

Michael Rothgang (Jan 23 2025 at 20:12):

If you want to post a daily/weekly report --- say, a summary of open infinity-cosmos PR, with e.g. their diff size, current state and when that state last changed, a .json file could be more useful. Computing "the last real change" requires tracking state, which the queueboard already does for you.

Damiano Testa (Jan 23 2025 at 20:14):

My experience suggests that

  • a daily/weekly post on Zulip works very well with an action;
  • a post/reaction on Zulip when a label is applied also works well with an action;
  • looking at the state probably works well by opening the #queueboard and watching it when you feel like it :smile:

Michael Rothgang (Jan 23 2025 at 20:15):

Point taken about the last one :-)

Damiano Testa (Jan 23 2025 at 20:17):

I mean to say: it contains a wealth of useful information that is difficult to summarize effectively and at any given moment you might prefer to apply one filter or another. This does not work well with automation.

Damiano Testa (Jan 23 2025 at 20:17):

The other points are "blind": a label is applied, I do something. There is introspection, it is just a reaction.

Michael Rothgang (Jan 23 2025 at 20:18):

(There is currently no table of all open PRs on the queueboard; the best approximation is this page. It would be very easy to add, though.)

Bryan Gin-ge Chen (Jan 23 2025 at 20:24):

Damiano Testa said:

Bryan understands this better than I do, and I think that simply using different bots for different actions, might mean that each bot gets a new quota.

The bots which tend to use a lot of github API calls (merge conflicts, dependent issues are the two that come to mind) indeed have associated github accounts with separate quotas. I wouldn't worry too much about the other bots (for now) since I haven't seen any evidence that we're close to the limit.

Damiano Testa (Jan 23 2025 at 20:28):

Bryan, thanks for the reassurance!

Damiano Testa (Jan 23 2025 at 20:31):

Anyway, I'm happy to try to find a good automation for projects downstream of mathlib that intend to upstream parts of their code and Infinity-Cosmos seems like a good starting point!

Michael Rothgang (Jan 23 2025 at 21:09):

Another thought: LeanCamCombi has an upstreaming dashboard: the project has a structured layout (material ready for mathlib goes in a ToMathlib folder, with folder structure mirroring mathlib exactly). This makes it very easy to keep track of which files can be upstreamed already.

The dashboard shows which of these files have open PRs against them. I haven't worked on that project, but imagine that to be potentially quite useful.

Nick Ward (Jan 23 2025 at 21:27):

Oh, that's very neat.

Nick Ward (Jan 23 2025 at 21:28):

Michael Rothgang said:

LeanCamCombi has an upstreaming dashboard

@Emily Riehl have you seen the LeanCamCombi dashboard mentioned above?

Nick Ward (Jan 23 2025 at 21:32):

I think something like that would be great to have when it's time to do maintenance. On the other hand, a zulip bot would be a great tool to passively keep track of what maintenance needs to be done.

Damiano Testa (Jan 23 2025 at 21:39):

I can help with the bot, but my Lean time for this week is essentially over. I could write something next week, though of course others may be able to help sooner!

Damiano Testa (Jan 23 2025 at 21:39):

There are already a couple of bots that post weekly reports to mathlib: modifying either one of them would be easy. The main question is what kind of information you would like to see.

Damiano Testa (Jan 23 2025 at 21:40):

Most things are doable, quite a few of them easily, others with a little more effort! :smile:

Nick Ward (Jan 23 2025 at 21:57):

I will check back in here if I have a chance to dust off the old python interpreter before then.

Bryan Gin-ge Chen (Jan 23 2025 at 22:22):

I might have some time over the weekend to put together something adding steps to our current bots to post Zulip messages on tagged PR status changes; glad to help with any questions if you want to give it a go before then though!

edit (2025/01/28): I've started on this, but I don't have anything worth sharing yet... maybe in a few more days.

Damiano Testa (Jan 26 2025 at 08:46):

Some PRs upstreamed from PFR use From PFR in their commit message. This one, for instance.

This seems like a good compromise: From <project-dependent-string> makes CI aware of the project.

Michael Rothgang (Feb 02 2025 at 15:30):

I'm cleaning up my zulip channels: it appears this discussion has converged on a solution and no help from me is needed.

Michael Rothgang (Feb 02 2025 at 15:32):

There is a dashboard of all PRs now (https://jcommelin.github.io/queueboard/triage.html#all).

Michael Rothgang (Feb 02 2025 at 15:33):

I will unsubscribe from this channel now: if you want to ask me again, please re-add me. Thanks!

Nick Ward (Feb 02 2025 at 16:22):

Thanks for the help!


Last updated: May 02 2025 at 03:31 UTC