Zulip Chat Archive

Stream: general

Topic: Queue histograms

Johan Commelin (Jul 19 2022 at 14:46):

Can we generate histograms about how long PRs are ope?. And/or how long PRs have to wait for a review after being labeled awaiting-review? Because those are metrics we care about. The length of #queue is just a proxy. Any GitHub API experts around who know how to do this?

Yakov Pechersky (Jul 19 2022 at 16:18):

Working on it

Yakov Pechersky (Jul 19 2022 at 16:30):

image.png

Yakov Pechersky (Jul 19 2022 at 16:30):

I don't know about how to track label-change events, so I don't know how to answer your second question

Yakov Pechersky (Jul 19 2022 at 16:31):

gh pr list --state all --limit 5000 --json number,createdAt,closedAt,labels > prs.json
python pres.py prs.json prs.csv

Yakov Pechersky (Jul 19 2022 at 16:31):

import sys
import csv
import json

entries = []
labels = set()
for entry in json.load(open(sys.argv[1])):
  entry["labels"] = [label["name"] for label in entry["labels"]]
  labels.update(entry["labels"])
  entries.append(entry)
for entry in entries:
  for label in labels:
    entry[label] = label in entry["labels"]
  del entry["labels"]

writer = csv.DictWriter(open(sys.argv[2], "w"), fieldnames=["number", "closedAt", "createdAt"] + sorted(labels))

writer.writeheader()
writer.writerows(entries)

Yakov Pechersky (Jul 19 2022 at 16:31):

Excuse the sloppy python code, just trying to crank something out fast

Yakov Pechersky (Jul 19 2022 at 16:32):

This plot indicates that our PR creation flux has been pretty constant over the past 5K PRs
image.png

Yakov Pechersky (Jul 19 2022 at 16:33):

prs.csv prs.json

Yakov Pechersky (Jul 19 2022 at 17:02):

Here's the code to get the label changes -- but I have now hit my GH API limit, and am barred from making more requests for idk how long:

import sys
import csv
import json
import requests
from requests.adapters import HTTPAdapter, Retry


events = []

writer = csv.DictWriter(open(sys.argv[1], "w"),
  fieldnames=["action", "created_at", "number", "label"])
writer.writeheader()

session = requests.Session()
retries = Retry(total=5, backoff_factor=1, status_forcelist=[500, 403])
session.mount("https://", HTTPAdapter(max_retries=retries))

per_page = 100
pages = 100 // per_page
for page in range(pages):
  resp = session.get(
    "https://api.github.com/repos/leanprover-community/mathlib/issues/events",
    headers={"Accept": "application/vnd.github+json"},
    params={"per_page": per_page, "page": page + 1}  # github starts at 1
  )
  print(resp)
  for event in resp.json():
    if event["event"] not in ["unlabeled", "labeled"]:
      continue
    mini_event = {}
    mini_event["action"] = event["event"]
    mini_event["created_at"] = event["created_at"]
    mini_event["number"] = event["issue"]["number"]
    mini_event["label"] = event["label"]["name"]
    events.append(mini_event)
    writer.writerow(mini_event)

Yakov Pechersky (Jul 19 2022 at 17:02):

That's based on https://docs.github.com/en/rest/issues/events

Johan Commelin (Jul 19 2022 at 17:02):

@Yakov Pechersky Thanks! What is the y-axis in that diagram 2 posts up?

Yakov Pechersky (Jul 19 2022 at 17:03):

the tiny bar chart?

Yakov Pechersky (Jul 19 2022 at 17:03):

Just the count of PRs for each bin

Johan Commelin (Jul 19 2022 at 17:04):

Aha. So it doesn't yet tell us how long PRs are open on average.

Yakov Pechersky (Jul 19 2022 at 17:04):

That was the other bar chart

Yakov Pechersky (Jul 19 2022 at 17:04):

image.png

Johan Commelin (Jul 19 2022 at 17:05):

Ooh, I see. And that is "number of days" along the y-axis?

Yakov Pechersky (Jul 19 2022 at 17:06):

The y axis is binning -- there is a hypothesis that the closedAt - createdAt delta has changed over time

Yakov Pechersky (Jul 19 2022 at 17:06):

The y axis is, for each bin, the median of that delta over that bin

Yakov Pechersky (Jul 19 2022 at 17:07):

Separating out PRs labeled with easy (at time of close)

Yakov Pechersky (Jul 19 2022 at 17:11):

Here it is more clearly:
image.png

Yakov Pechersky (Jul 19 2022 at 17:11):

I've hidden the outliers on the top

Johan Commelin (Jul 19 2022 at 17:18):

So what does median mean exactly? Does it mean that in February the median was 81 days for a PR to be merged?

Yakov Pechersky (Jul 19 2022 at 17:21):

Sorry, the most recent plot was in hours

Yakov Pechersky (Jul 19 2022 at 17:22):

That in february, the median time for a PR to go from created to closed was 81 hrs if it is not labeled as easy, and 6 hrs if easy

Yakov Pechersky (Jul 19 2022 at 17:23):

I haven't differentiated "closed due to bors merge" and "closed due to user closing it without merge". My expectation is that the latter are way fewer than former. But that is why I am using median and not mean

Yakov Pechersky (Jul 19 2022 at 17:23):

Because I assume the latter are not distributed in the way the former are

Eric Wieser (Jul 19 2022 at 17:32):

I guess you didn't auth with github in your example? That probably leads to the rate limit being hit sooner

Yakov Pechersky (Jul 19 2022 at 17:33):

I was hitting 200 for a while, then 500 (set up retry), but then started to hit 403. I guess I am not passing the token that is in my shell, you're right

Johan Commelin (Jul 19 2022 at 17:37):

Aah, 81 hrs makes a lot more sense.

Johan Commelin (Jul 19 2022 at 17:39):

I think this is a very useful chart! Thanks for building it!

Yakov Pechersky (Jul 19 2022 at 18:27):

Thanks to Eric's comment, I got the labelling events out. An example of a viz:
image.png

Yakov Pechersky (Jul 19 2022 at 18:28):

labels.csv

Yakov Pechersky (Jul 19 2022 at 18:28):

from datetime import datetime
import sys
import csv
import json
import requests
from requests.adapters import HTTPAdapter, Retry

writer = csv.DictWriter(open(sys.argv[1], "w"),
  fieldnames=["action", "created_at", "number", "label"])
writer.writeheader()

session = requests.Session()
retries = Retry(total=5, backoff_factor=1, status_forcelist=[500])
session.mount("https://", HTTPAdapter(max_retries=retries))

per_page = 100
page = 0
in_range = True
while in_range:
  page += 1
  resp = session.get(
    "https://api.github.com/repos/leanprover-community/mathlib/issues/events",
    headers={"Accept": "application/vnd.github+json", "Authorization": f"token {token}"},
    params={"per_page": per_page, "page": page + 1}  # github starts at 1
  )
  for event in resp.json():
    print(resp)
    if event["event"] not in ["unlabeled", "labeled"]:
      continue
    mini_event = {}
    mini_event["action"] = event["event"]
    mini_event["created_at"] = event["created_at"]
    mini_event["number"] = event["issue"]["number"]
    mini_event["label"] = event["label"]["name"]
    writer.writerow(mini_event)
    if datetime.fromisoformat(mini_event["created_at"][:-1]) < datetime(2022, 1, 1):
      in_range = False

Yakov Pechersky (Jul 19 2022 at 18:44):

Johan, are you sure you want the "when's the earliest awaiting-review"? It's like minutes between PR creating to labeling as awaiting-review

Yakov Pechersky (Jul 19 2022 at 18:48):

As viz'd here, separate bins per month and whether (earliest-awaiting-review-label - PR-creation) < 5 image.png

Kyle Miller (Jul 19 2022 at 19:07):

If you haven't done it already, something that would be nice to see is time between awaiting-review and any change in status (label change or closure). Individual PRs might have multiple of these intervals.

Yakov Pechersky (Jul 19 2022 at 19:08):

Earliest awaiting-review or the latest?

Kyle Miller (Jul 19 2022 at 19:09):

for every awaiting-review, what's the time to the first change in status?

Yakov Pechersky (Jul 19 2022 at 19:18):

Unfortunately, that's all the time I had today for these analyses. Given the csvs I attached earlier, I hope that they're enough to answer

Johan Commelin (Jul 19 2022 at 19:36):

Thanks! That's already very useful!

Kyle Miller (Jul 19 2022 at 21:37):

Here's an attempt at visualizing lengths of review cycles, which is an interval of time when 'awaiting-review' is present, starting after the last 'awaiting-author', 'blocked-by-other-PR', 'awaiting-CI', 'CI', 'merge-conflict', 'help wanted', or 'WIP' was removed, ending when either 'awaiting-review' is removed or 'ready-to-merge' or 'ready-for-bors' are added.

To get a sense of distributions, for each month in Yakov's data (labels.csv) I have the review cycles in sorted order, and the y-axis is in days. (So the x-axis is just the index of a given review cycle in sorted order.) A review cycle is recorded in the month it ends.

image.png

Kyle Miller (Jul 19 2022 at 21:38):

code

import csv
from datetime import date, datetime, timedelta

prs = {}

with open("labels.csv") as fin:
    r = csv.reader(fin)
    header = next(r)
    for entry in r:
        data = dict(zip(header, entry))
        pr = prs.setdefault(data['number'], {'actions' : []})
        pr['actions'].insert(0, (data['action'], data['label'], datetime.fromisoformat(data['created_at'][:-1])))

def yield_reviews(pr):
    start = None
    for action, label, time in pr:
        if action == 'labeled' and label == 'awaiting-review':
            # begin review
            start = time
        elif action == 'unlabeled' and label in ('awaiting-author', 'blocked-by-other-PR', 'awaiting-CI', 'CI', 'merge-conflict', 'help wanted', 'WIP'):
            if start is not None:
                # in review, but this is actual start of review cycle
                start = time
        elif (action == 'unlabeled' and label in ('awaiting-review',)) \
             or (action == 'labeled' and label in ('ready-to-merge', 'ready-for-bors')):
            if start is not None:
                # in review, end of review cycle
                yield time, time - start
                start = None
        elif (action == 'unlabeled' and label in ('ready-to-merge', 'ready-for-bors')) \
             or (action == 'labeled' and label in ('awaiting-author', 'blocked-by-other-PR', 'awaiting-CI', 'CI', 'merge-conflict', 'help wanted', 'WIP')):
            # no change to state
            pass
        elif label in ('RFC','meta','modifies-tactic-syntax','delegated','easy','please-adopt','feature-request','docs','needs-documentation','good-first-project','review-in-progress','maybe-later','not-ready-to-merge','duplicate','lintfix','linter','bug','undergrad','medium','new-import','imo','enhancement','mathport'):
            # ignored labels
            # note: we ignore 'delegated' since it's usually still part of the review
            pass
        else:
            print(action, label)
            raise Exception
    if start is not None:
        # note that if we get here, then this is still an open PR
        pass

review_times = {}

for i, (n, pr) in enumerate(prs.items()):
    for start, length in yield_reviews(pr['actions']):
        if length > timedelta(seconds=30):
            review_times.setdefault((start.year, start.month), []).append(length/timedelta(days=1))

for times in review_times.values(): times.sort()

with open("review_times.csv", "w") as fout:
    w = csv.writer(fout)
    for date in sorted(review_times.keys()):
        w.writerow(("date",) + date)
        for i,t in enumerate(review_times[date]):
            w.writerow((i,t))
        w.writerow(())

The graphics were done with LibreOffice Calc

Eric Wieser (Jul 19 2022 at 21:43):

Can you replot with the same y axis on each graph, and maybe a log y scale?

Kyle Miller (Jul 19 2022 at 21:46):

I just used LibreOffice Calc -- I tried to figure out how to do any of that with it, but didn't find it with some clicking around.

Here's the CSV in a silly format: review_times.csv

Each month starts with "date, 2022, 2" (for example), then there are idx/days pairs, and then a blank line.

Kyle Miller (Jul 19 2022 at 21:50):

Here's a less silly format if you want to do some processing of your own. It has year/month/day of the end of a review cycle and its duration in days. review_times_full.csv

Johan Commelin (Jul 20 2022 at 05:34):

@Kyle Miller Thanks! That clearly shows that review times have been increasing!
I guess the y-axis is still in hours? And has some outliers chopped off?

Julian Berman (Jul 20 2022 at 05:36):

The ratio of PR submitters to number of distinct PR reviewers (and/or reviews per reviewer) is also possibly something interesting to look at over time. Lots of larger projects I think run into cases where the rate of growth of the former outpaces the latter either because submitting is more fun than reviewing or because it takes active encouragement to get people to think they're good enough to review stuff.

Johan Commelin (Jul 20 2022 at 05:37):

Nice idea. If we can autogenerate a bunch of such plots, I think that would be really useful.

Kyle Miller (Jul 20 2022 at 05:57):

@Johan Commelin The y-axes are in days, and no outliers have been removed. What I'm seeing is that reviewing times have gotten longer, but there has simultaneously been fewer review cycles overall. These are only a few months though so it's hard to say what's going on.

Johan Commelin (Jul 20 2022 at 06:04):

Kyle Miller said:

there has simultaneously been fewer review cycles overall

That's an interesting point! Could that be correlated with there being a lot more PRs that are 20 - 50 lines long? Or even < 20 lines, adding just 1 lemma?

Yaël Dillies (Jul 20 2022 at 09:43):

I suspected it was more of a cultural thing. Given that there are more PRs, you cannot spend as much time reviewing each.

Last updated: May 02 2025 at 03:31 UTC