Zulip Chat Archive

Stream: Program verification

Topic: Arena allocation for ITPs

Chris B (Aug 11 2020 at 19:05):

Hi,
I've been working on a rewrite of the Rust version of Lean's kernel for a while; it uses arenas to manage the type checker items (instead of reference counting). All of the ITPs I'm familiar with are either GC'd or are Lean, so I thought this would be an interesting data point for the CS people here who might find themselves writing software in the same vein. I rewrote it this way just to see whether some new insights about how the kernel is laid out could yield a performance benefit, and it ended up being ~3-4x faster than the other checkers while having a smaller memory footprint. The arena API is completely type safe and uses phantom lifetimes to achieve memory safety (arenas can't be dropped if there are outstanding pointers into their contents, so pointers can't deref to bad data).
There's some more information/stats in the readme. https://github.com/ammkrn/nanoda_lib

Simon Hudon (Aug 11 2020 at 19:11):

Nice! When you say "better than other checkers", does that include the C++ implementation?

Chris B (Aug 11 2020 at 19:17):

Hey I only said 'faster' lol. The one that was in the lean3 repo is the only C++ I know of, but yeah those numbers are the same (3-4x), maybe a little bit more depending on how big the export file is. leanchecker, trepplein, and the RC version of the Rust one were all really close in terms of speed. Leanchecker had some issues with memory consumption spikes IIRC. The last time I asked about it lean4 didn't have a free-standing checker and there isn't an export format.

Simon Hudon (Aug 11 2020 at 19:20):

I'm surprised that the speed up would be so consistent across the different checkers. I thought the Haskell checker was much much slower than the C++ one

Chris B (Aug 11 2020 at 19:22):

Oh yeah the Haskell one is way, way slower than the other ones. Do you know why that is?

Chris B (Aug 11 2020 at 19:23):

I assumed it was a Haskell thing about like unoptimized collections in the prelude or something.

Simon Hudon (Aug 11 2020 at 19:30):

I haven't looked closely at it, I can't say

Simon Hudon (Aug 11 2020 at 19:32):

Does the arena guarantee some sort uniqueness property for the term nodes it stores?

Chris B (Aug 11 2020 at 19:36):

Yeah, when you go to insert an item it checks whether it already exists in the collection. The underlying structure is from a rust library called indexmap that's like a hashmap with indices that preserve insertion order.

Chris B (Aug 11 2020 at 19:37):

The actual process is slightly more complicated since you have to check the persistent collection first then the temporary one.

Simon Hudon (Aug 11 2020 at 19:39):

Cool. i like the analysis you make about temporary garbage. I wonder how it will apply to Lean 4 since Lean 4 uses destructive updates whenever possible instead of allocating more memory

Mario Carneiro (Aug 11 2020 at 19:46):

Ah, this will be useful for isabelle proof generation. I wasn't sure how well this strategy would perform so it's good to hear the numbers bear it out

Chris B (Aug 11 2020 at 19:52):

Yeah neither was I lol. The one thing I'm worried about is someone managing to line up just the right tactic proof that produces like 100gb worth of garbage, but we'll see. The largest one a few months ago was 1.6 million nodes, now it's over 10 million.

Chris B (Aug 11 2020 at 19:56):

Simon Hudon said:

I wonder how it will apply to Lean 4 since Lean 4 uses destructive updates whenever possible instead of allocating more memory

Yeah I'm interested to see this as well. If you roll your own RC stuff like they did you can avoid a lot of the small to medium sized performance hang-ups I ran into in the last version, and I know they've done a lot of thinking about how to make the most of their system.

Mario Carneiro (Aug 11 2020 at 19:56):

It might be good to collect statistics on what the largest proofs are from this point of view, because they could just be bad proofs

Mario Carneiro (Aug 11 2020 at 19:57):

That growth in proof size sounds more like a regression than an improvement

Simon Hudon (Aug 11 2020 at 19:59):

Large proofs are clearly a problem but the other thing to consider is non-trivial kernel computations. The proof size could be moderate but you could be running ackerman's function for instance. That would produce a lot of trash that reference counting wouldn't keep around but arenas would

Chris B (Aug 11 2020 at 19:59):

I made another thread earlier today in general to try and get some insight into that; the biggest one by far is polynomial.monic.next_coeff_mul, and the runner up is finset.sum_range_sub_of_monotone

Mario Carneiro (Aug 11 2020 at 20:00):

one man's trash is another man's treasure :P

Chris B (Aug 11 2020 at 20:06):

Yeah, I would like to go back and address that more thoroughly in the future. I had an idea to do it where the arenas were telescoped out, so each arena had an Option<&mut Arena> to a parent whose lifetime was longer, so when you finish one computation the thing's scope ends and you insert the result into the parent, effectively dropping all of the garbage. The problem is that searching for duplicates now forces you to check as many arenas as you have alive instead of just the two.

Chris B (Aug 11 2020 at 20:08):

The RC solution is much more general, so props to Lean.

Kevin Buzzard (Aug 11 2020 at 21:00):

I'm pretty sure there would be people who would be interested in seeing a list of the 100 biggest proofs

Mario Carneiro (Aug 11 2020 at 21:01):

it's another way to play the -T50000 game

Kevin Buzzard (Aug 11 2020 at 21:12):

It's a better way perhaps, because when I've played that game some lemma times out and then lots of other things break and you basically have to deal with the first problem.

Chris B (Aug 11 2020 at 21:53):

Kevin Buzzard said:

I'm pretty sure there would be people who would be interested in seeing a list of the 100 biggest proofs

Mathlib high scores :
https://gist.github.com/ammkrn/dab311e0d2d27b3bcd15ec6c4a2b019d

Chris B (Aug 11 2020 at 21:55):

I'll add a command line flag for it or something.

Mario Carneiro (Aug 11 2020 at 21:55):

oh wow that escalated quickly

Mario Carneiro (Aug 11 2020 at 21:56):

where is src#polynomial.monic.next_coeff_mul

Mario Carneiro (Aug 11 2020 at 21:56):

hm, perhaps it is the omega calls at the end?

Chris B (Aug 11 2020 at 22:00):

omega does seem to be a common thread between no.1 and no.2

Mario Carneiro (Aug 11 2020 at 22:14):

How does the length of this version of the proof stack up:

lemma next_coeff_mul_omega_1 (dp : ℕ) (hp0 : ¬dp = 0) : dp ≠ dp - 1 := by omega

lemma next_coeff_mul_omega_2 (x : ℕ × ℕ) (dp dq : ℕ)
  (hp0 : ¬dp = 0) (hq0 : ¬dq = 0) (hx : x.fst + x.snd = dp + dq - 1)
  (h : x.fst ≤ dp) (hx1 : x.fst = dp - 1) : x.snd = dq := by omega

lemma next_coeff_mul_omega_3 (x : ℕ × ℕ) (dp dq : ℕ)
  (hp0 : ¬dp = 0) (hq0 : ¬dq = 0) (hx : x.fst + x.snd = dp + dq - 1)
  (h : x.fst ≤ dp) (this : x.fst ≠ dp - 1) (hx1 : x.fst = dp) : x.snd = dq - 1 := by omega

lemma next_coeff_mul_omega_4 (x : ℕ × ℕ) (dp dq : ℕ)
  (hp0 : ¬dp = 0) (hq0 : ¬dq = 0) (hx : x.fst + x.snd = dp + dq - 1)
  (hx1 : ¬(x = (dp, dq - 1) ∨ x = (dp - 1, dq)))
  (h : x.fst ≤ dp) (this : x.fst ≠ dp - 1) (this_1 : x.fst ≠ dp) : dq < x.snd := by omega

lemma next_coeff_mul {p q : polynomial R} (hp : monic p) (hq : monic q) :
next_coeff (p * q) = next_coeff p + next_coeff q :=
begin
  classical,
  by_cases h : nontrivial R, swap,
  { rw nontrivial_iff at h, push_neg at h, apply h, },
  haveI := h, clear h,
  have := monic.nat_degree_mul hp hq,
  dsimp only [next_coeff], rw this,
  simp only [hp, hq, degree_eq_zero_iff_eq_one, add_eq_zero_iff], clear this,
  split_ifs; try { tauto <|> simp [h_1, h_2] },
  rename h_1 hp0, rename h_2 hq0, clear h,
  rw ← degree_eq_zero_iff_eq_one at hp0 hq0, assumption',
  -- we've reduced to the case where the degrees dp and dq are nonzero
  set dp := p.nat_degree, set dq := q.nat_degree,
  rw coeff_mul,
  have : {(dp, dq - 1), (dp - 1, dq)} ⊆ nat.antidiagonal (dp + dq - 1),
  { rw insert_subset, split,
    work_on_goal 0 { rw [nat.mem_antidiagonal, nat.add_sub_assoc] },
    work_on_goal 1 { simp only [singleton_subset_iff, nat.mem_antidiagonal],
      apply nat.sub_add_eq_add_sub },
    all_goals { apply nat.succ_le_of_lt, apply nat.pos_of_ne_zero, assumption } },
  rw ← sum_subset this,
  { rw [sum_insert, sum_singleton], iterate 2 { rw coeff_nat_degree }, ring, assumption',
    suffices : dp ≠ dp - 1, { rw mem_singleton, simp [this] }, apply next_coeff_mul_omega_1; assumption }, clear this,
  intros x hx hx1,
  simp only [nat.mem_antidiagonal] at hx, simp only [mem_insert, mem_singleton] at hx1,
  suffices : p.coeff x.fst = 0 ∨ q.coeff x.snd = 0, cases this; simp [this],
  suffices : dp < x.fst ∨ dq < x.snd, cases this,
  { left,  apply coeff_eq_zero_of_nat_degree_lt, assumption },
  { right, apply coeff_eq_zero_of_nat_degree_lt, assumption },
  by_cases h : dp < x.fst, { left, exact h }, push_neg at h, right,
  have : x.fst ≠ dp - 1, { contrapose! hx1, right, ext, assumption, dsimp only, apply next_coeff_mul_omega_2; assumption },
  have : x.fst ≠ dp,     { contrapose! hx1, left,  ext, assumption, dsimp only, apply next_coeff_mul_omega_3; assumption },
  apply next_coeff_mul_omega_4; assumption
end

Mario Carneiro (Aug 11 2020 at 22:15):

(all I did was extract_goal on all of the omega calls)

Chris B (Aug 11 2020 at 22:32):

The new leaders are (1, 2, and 3) are the factored out omegas.

garbage terms
6335962 : polynomial.monic.next_coeff_mul_omega_3
2922613 : polynomial.monic.next_coeff_mul_omega_4
1885162 : polynomial.monic.next_coeff_mul_omega_2
1090313 : composition_as_set_equiv._proof_5


time (milliseconds)
3998 : polynomial.monic.next_coeff_mul_omega_3
1783 : polynomial.monic.next_coeff_mul_omega_4
1091 : polynomial.monic.next_coeff_mul_omega_2
680 : composition_as_set_equiv._proof_5

Kevin Buzzard (Aug 11 2020 at 22:59):

import tactic

lemma next_coeff_mul_omega_3 (x : ℕ × ℕ) (dp dq : ℕ)
  (hp0 : ¬dp = 0) (hq0 : ¬dq = 0) (hx : x.fst + x.snd = dp + dq - 1)
  (h : x.fst ≤ dp) (this : x.fst ≠ dp - 1) (hx1 : x.fst = dp) : x.snd = dq - 1 :=
begin
  cases dq, contradiction,
  subst hx1,
  rwa [nat.succ_eq_add_one, ←add_assoc, nat.add_sub_cancel, add_right_inj] at hx,
end

Mario Carneiro (Aug 11 2020 at 23:04):

oh, are we playing this game?

lemma next_coeff_mul_omega_3 (x : ℕ × ℕ) (dp dq : ℕ)
  (hp0 : ¬dp = 0) (hq0 : ¬dq = 0) (hx : x.fst + x.snd = dp + dq - 1)
  (h : x.fst ≤ dp) (this : x.fst ≠ dp - 1) (hx1 : x.fst = dp) : x.snd = dq - 1 :=
by rwa [hx1, nat.add_sub_assoc (nat.pos_of_ne_zero hq0), add_right_inj] at hx

Kevin Buzzard (Aug 11 2020 at 23:05):

I was just trying to make mathlib compile quicker!

Kevin Buzzard (Aug 11 2020 at 23:06):

How many times has CI compiled that four second proof?

Kevin Buzzard (Aug 11 2020 at 23:07):

More seriously, do we just have a rubbish omega, or is this the actual nature of omega -- I've heard that the algorithm is super-slow.

Mario Carneiro (Aug 11 2020 at 23:08):

probably a bit of both

Mario Carneiro (Aug 11 2020 at 23:08):

I looked at the proof terms and they are seriously bizarre

Mario Carneiro (Aug 11 2020 at 23:09):

                                           (nat.preform.le ((nat.preterm.var 1 6).add (nat.preterm.cst 1))
                                              (nat.preterm.var 1 2))))))))).and
                        ((nat.preform.eq (nat.preterm.var 1 1) ((nat.preterm.cst 1).add (nat.preterm.var 1 4))).or
                           ((nat.preform.le (nat.preterm.var 1 1) (nat.preterm.cst 1)).and
                              (nat.preform.eq (nat.preterm.var 1 4) (nat.preterm.cst 0))))).and
                       ((nat.preform.eq ((nat.preterm.var 1 1).add (nat.preterm.var 1 3))
                           ((nat.preterm.cst 1).add (nat.preterm.var 1 5))).or
                          ((nat.preform.le ((nat.preterm.var 1 1).add (nat.preterm.var 1 3)) (nat.preterm.cst 1)).and
                             (nat.preform.eq (nat.preterm.var 1 5) (nat.preterm.cst 0))))).and
                      ((nat.preform.eq (nat.preterm.var 1 3) ((nat.preterm.cst 1).add (nat.preterm.var 1 6))).or
                         ((nat.preform.le (nat.preterm.var 1 3) (nat.preterm.cst 1)).and
                            (nat.preform.eq (nat.preterm.var 1 6) (nat.preterm.cst 0)))))
                   ⟨⟨⟨⟨⟨trivial, trivial⟩,
                       ⟨trivial,
                        ⟨trivial,
                         ⟨trivial,
                          ⟨⟨trivial, trivial⟩,
                           ⟨⟨trivial, trivial⟩, ⟨trivial, trivial⟩⟩⟩⟩⟩⟩⟩,
                      ⟨trivial, ⟨trivial, trivial⟩⟩⟩,
                     ⟨trivial, ⟨trivial, trivial⟩⟩⟩,
                    ⟨trivial, ⟨trivial, trivial⟩⟩⟩
                   ⟨⟨⟨⟨⟨⟨⟨trivial, trivial⟩, trivial⟩, ⟨⟨trivial, trivial⟩, trivial⟩⟩,
                       ⟨⟨trivial, trivial⟩,
                        ⟨⟨⟨trivial, trivial⟩, ⟨trivial, trivial⟩⟩,
                         ⟨⟨⟨trivial, trivial⟩, trivial⟩,
                          ⟨⟨⟨⟨trivial, trivial⟩, trivial⟩, ⟨⟨trivial, trivial⟩, trivial⟩⟩,
                           ⟨⟨⟨⟨trivial, trivial⟩, trivial⟩, ⟨⟨trivial, trivial⟩, trivial⟩⟩,
                            ⟨⟨⟨trivial, trivial⟩, trivial⟩,
                             ⟨⟨trivial, trivial⟩, trivial⟩⟩⟩⟩⟩⟩⟩⟩,
                      ⟨⟨trivial, ⟨trivial, trivial⟩⟩,
                       ⟨⟨trivial, trivial⟩, ⟨trivial, trivial⟩⟩⟩⟩,
                     ⟨⟨⟨trivial, trivial⟩, ⟨trivial, trivial⟩⟩,
                      ⟨⟨⟨trivial, trivial⟩, trivial⟩, ⟨trivial, trivial⟩⟩⟩⟩,
                    ⟨⟨trivial, ⟨trivial, trivial⟩⟩, ⟨⟨trivial, trivial⟩, ⟨trivial, trivial⟩⟩⟩⟩
                   (clauses.unsat_cons
                      ([(int.of_nat 0, [-[1+ 0], int.of_nat 1]), (int.of_nat 0,
                          [-[1+ 0], int.of_nat 0, -[1+ 0], int.of_nat 0, int.of_nat 0, int.of_nat 1]), (int.of_nat 1,
                          [int.of_nat 0, -[1+ 0], int.of_nat 0, int.of_nat 0, int.of_nat 1]), (int.of_nat 1,
                          [int.of_nat 0, -[1+ 0], int.of_nat 0, -[1+ 0], int.of_nat 0, int.of_nat 1]), (int.of_nat 1,
                          [int.of_nat 0, int.of_nat 0, int.of_nat 0, -[1+ 0], int.of_nat 0, int.of_nat 0, int.of_nat
                              1])],
                       [(int.of_nat 0, [int.of_nat 1]), (int.of_nat 0, [int.of_nat 0, int.of_nat 1]), (int.of_nat 0,
                          [int.of_nat 0, int.of_nat 0, int.of_nat 1]), (int.of_nat 0,
                          [int.of_nat 0, int.of_nat 0, int.of_nat 0, int.of_nat 1]), (int.of_nat 0,
                          [int.of_nat 0, int.of_nat 0, int.of_nat 0, int.of_nat 0, int.of_nat 1]), (int.of_nat 0,
                          [int.of_nat 0, int.of_nat 0, int.of_nat 0, int.of_nat 0, int.of_nat 0, int.of_nat
                              1]), (int.of_nat 0,
                          [int.of_nat 0, int.of_nat 0, int.of_nat 0, int.of_nat 0, int.of_nat 0, int.of_nat
                              0, int.of_nat 1]), (-[1+ 0],
                          [-[1+ 0], int.of_nat 0, int.of_nat 0, int.of_nat 0, int.of_nat 1]), (int.of_nat 0,
                          [-[1+ 0], int.of_nat 1]), (-[1+ 0],
                          [int.of_nat 0, int.of_nat 0, int.of_nat 0, -[1+ 0]]), (-[1+ 0],
                          [int.of_nat 0, -[1+ 0]]), (-[1+ 0],
                          [int.of_nat 0, int.of_nat 0, -[1+ 0], int.of_nat 0, int.of_nat 0, int.of_nat 0, int.of_nat
                              1])])
                      […, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, …, ([…, …, …, …, (int.of_nat
                              1,
                            […, …, …, …, …, int.of_nat 0, int.of_nat 1])],
                         …), ([(int.of_nat 0, [-[1+ 0], int.of_nat 1]), (int.of_nat 0,
                            [-[1+ 0], int.of_nat 0, -[1+ 0], int.of_nat 0, int.of_nat 0, int.of_nat 1]), (int.of_nat 1,
                            [int.of_nat 0, -[1+ 0], int.of_nat 0, int.of_nat 0, int.of_nat 1]), (int.of_nat 1,
                            [int.of_nat 0, -[1+ 0], int.of_nat 0, -[1+ 0], int.of_nat 0, int.of_nat 1]), (int.of_nat 0,

Mario Carneiro (Aug 11 2020 at 23:11):

I think we will have to do something about omega soon. It has no maintainer and the author doesn't seem interested in maintenance

Simon Hudon (Aug 11 2020 at 23:12):

I think Seul really wanted to have part of it running in the kernel

Mario Carneiro (Aug 11 2020 at 23:13):

I'm guessing that's what the array of trivialities are doing

Rob Lewis (Aug 11 2020 at 23:15):

Mario Carneiro said:

I think we will have to do something about omega soon. It has no maintainer and the author doesn't seem interested in maintenance

I'm certain we'll have to do something about omega before the Lean 4 transition, because I can't imagine the current one getting ported.

Kevin Buzzard (Aug 11 2020 at 23:19):

I'm assuming that 213ms : real.pi_lt_31416 is implied by 346ms : real.pi_lt_3141593?

Chris Wong (Aug 12 2020 at 09:55):

How much work would it take to clean up omega? That sounds like something I can help out with, but I don't want to lick the cookie if I can't eat it.

Rob Lewis (Aug 12 2020 at 10:00):

omega is 3k lines of pretty intricate code with limited documentation. It's a good proof of concept for reflexive tactics in Lean but I'm not sure it's the right approach overall. Fixing some of the open bugs would be nice, but for serious improvements it might be easier to start from scratch.

Rob Lewis (Aug 12 2020 at 10:02):

For comparison, cloc says linarith is 711 code lines and 752 comment lines, omega is 2342 code and 367 comment.

Rob Lewis (Aug 12 2020 at 10:04):

A rewrite should probably take some inspiration from the lra/lia interaction in Coq and share some code with linarith.

Last updated: May 02 2025 at 03:31 UTC