Zulip Chat Archive

Stream: general

Topic: Lean Kernel Arena


Joachim Breitner (Jan 19 2026 at 18:34):

Discussion thread for #announce > Lean Kernel Arena

James E Hanson (Jan 19 2026 at 19:29):

Is it necessary to use nix develop to run the tests locally?

Joachim Breitner (Jan 19 2026 at 19:43):

No, you can install the python dependencies any other way.

Or, if your goal is to test your own checkers, you can download the tarball that's linked from the front page. It contains the built tests.

James E Hanson (Jan 19 2026 at 19:47):

I figured out what the issue was. I needed to set perf_event_paranoid lower.

James E Hanson (Jan 19 2026 at 19:50):

It looks like the one false positive being reported for Nanoda is that it accepts non-propositional theorem declarations. Is this really that important from a soundness perspective? My understanding is that at the level of type checking, the difference between a definition and a theorem is essentially cosmetic.
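
For concreteness, the kind of declaration at issue (a sketch for illustration, not one of the actual arena tests) would be something like:

```lean
-- Rejected by the official kernel: a `theorem` whose type lives in
-- `Type`, not `Prop`.
theorem notAProp : Nat := 0

-- The same declaration is accepted when phrased as a `def`.
def asDef : Nat := 0
```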

Joachim Breitner (Jan 19 2026 at 19:54):

It's borderline of course - you cannot prove false with it. On the other hand, it is an invariant checked by the official kernel (and also lean4lean). So in the interest of compatibility, and with not much more to rely on for a specification than the official kernel, I'd say it's probably best if checkers agree on these cosmetic things.

If Chris Bailey has reasons to not implement this check I can also add a test label and opt his checker out, so that it doesn't look red. These things are still settling, hence the “under construction” label. The other red tests are also just temporary things.

Joachim Breitner (Jan 19 2026 at 19:56):

I expect we'll encounter more corner cases where it's unclear if some behavior of the official kernel is just an implementation choice, or part of the checker contract, and hopefully we'll learn something useful from them.

Chris Henson (Jan 19 2026 at 20:05):

I just want to say that I think this is a great effort! I've been interested in writing a checker for a while now and think this will be a helpful and motivating resource. I've also been working a bit on a somewhat "beginner friendly" blog post about the kernel, ZFC equiconsistency, etc. and this is great to point to as a unified location for these external checkers.

Chris Bailey (Jan 19 2026 at 20:19):

I have implemented the non-propositional theorem check. Some other odds and ends I have locally for the JSON move are: an unsafe_permit_all_axioms config option, which was discussed elsewhere as a "nice to have" for this kind of application; a harsher restriction on duplicate names, which James Hanson brought up in another thread; and a config option pp_axioms that defaults to true and pretty-prints the used axioms at the end of type checking.

Jason Rute (Jan 20 2026 at 20:05):

What is the input to these kernels? Lean code (pre-compiled)? oleans? Exported proofs? I ask, because we know some input forms are more susceptible to hacks.

Jason Rute (Jan 20 2026 at 20:12):

Also, as Comparator becomes more mature (and more needed as vibe coded proofs become common), it would be useful to make a version of this site which also focuses on the Comparator interface (with a challenge file). Then one could add SafeVerify to the list, Comparator with different external checkers (when that is available), and lean4checker (to highlight that there are a lot of things that lean4checker doesn't check, like axioms, environment hacks, and changes to the code like removing the theorem you want to prove). Then you could also add other sorts of Lean hacks and bugs to the tests, such as all those used in the Comparator/SafeVerify tests, the ones @James E Hanson recently uncovered (which don't affect Comparator, but do affect SafeVerify), and maybe even hacks which attempt to change files on the file system.

Joachim Breitner (Jan 20 2026 at 20:37):

Jason Rute schrieb:

What is the input to these kernels? Lean code (pre-compiled)? oleans? Exported proofs? I ask, because we know some input forms are more susceptible to hacks.

Exported proofs, in the new JSON based format that safely encodes strings.

Joachim Breitner (Jan 20 2026 at 20:40):

For now the arena will focus on checkers. A nice safe walled garden where all you have to worry about is whether a certain text file with an exported proof is good or bad. It's only one puzzle piece in the overall story, but I think it helps to keep separate concerns separate.

Chris Henson (Jan 20 2026 at 21:14):

I think it is very reasonable to start with exported proofs and external checkers. It seems like it would be very hard to carve out, into a similar benchmark, the trust apportioned to oleans or to the proof-export step, which is taken for granted for all but the most adversarial of inputs.

James E Hanson (Jan 21 2026 at 02:21):

Incidentally, why is Nanoda almost twice as fast as the Lean kernel at checking Mathlib?

Chris Bailey (Jan 21 2026 at 05:41):

James E Hanson said:

Incidentally, why is Nanoda almost twice as fast as the Lean kernel at checking Mathlib?

It only has to do this one thing, so it's able to get away with more spartan data structures and memory management strategies. I think using deBruijn levels instead of unique identifiers for free variables also helps with caching in some spots but that's more speculative.
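
To make the index/level distinction concrete, here is a toy sketch in Python (illustrative only, not Nanoda's actual data structures): with de Bruijn indices a variable counts binders from the innermost one, with levels from the outermost, so a bound variable keeps the same number as you descend under further binders.

```python
from dataclasses import dataclass

# Toy lambda-term representation (illustrative only; a real checker's
# Expr type also has sorts, constants, metadata, and so on).
@dataclass
class Var:
    n: int          # a bound variable, numbered by index or by level

@dataclass
class Lam:
    body: object

@dataclass
class App:
    fn: object
    arg: object

def indices_to_levels(t, depth=0):
    """Convert de Bruijn *indices* (0 = innermost binder) into de
    Bruijn *levels* (0 = outermost binder). At binder depth d the two
    numberings are related by level = d - 1 - index."""
    if isinstance(t, Var):
        return Var(depth - 1 - t.n)
    if isinstance(t, Lam):
        return Lam(indices_to_levels(t.body, depth + 1))
    return App(indices_to_levels(t.fn, depth),
               indices_to_levels(t.arg, depth))

# λx. λy. x y: with indices, x is 1 and y is 0;
# with levels, x is 0 and y is 1, and those numbers stay fixed
# no matter how deep under further binders the occurrence sits.
term = Lam(Lam(App(Var(1), Var(0))))
print(indices_to_levels(term))
```

The possible upshot for caching: with levels, existing variables are not renumbered when a term is viewed under extra binders, so equal subterms stay representationally identical and are easier to share or memoize. Whether that is exactly what Nanoda exploits is, as Chris says, speculative.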

Joachim Breitner (Jan 21 2026 at 07:00):

Or maybe Chris is just a great programmer :-)

Joachim Breitner (Feb 25 2026 at 15:30):

Kernel Arena Update

Allow me to share some updates on the kernel arena at https://arena.lean-lang.org/

The Arena now sports 5 separate checkers:

  • The official kernel (in various versions)
  • @Mario Carneiro’s lean4lean
  • @Chris Bailey’s nanoda
  • A naive kernel implementation in Lean by me, the mini-kernel
  • A WIP kernel in Python by @Julian Berman called rpylean

The first three are serious, complete kernel implementations that correctly pass all the tests on the arena (not all at the same time, though, see below).

The kernel arena also comes with the Kernel Tutorial Tests, a sequence of small test inputs (both good and bad) that each exercise some aspect of a checker. The idea is that if you want to write a checker and don’t know where to start, you just follow these tests in sequence and expand/fix your implementation to get the next one green. I have created a Verso-based rendering of these tests for easy inspection (@Julian Berman may currently be most interested in this).

Writing these tests has already helped uncover and fix issues in the two external checkers.

The mini kernel does not support all of Lean, but for the fragment it supports it should be sound. It probably isn’t, so if you enjoy breaking things, please do find the holes in it, as every hole here is a missing test case in the tutorial. If you find one, I’ll owe you a drink at the next Lean event! (I know that I am not checking for duplicate declarations yet, as that is a bit tricky with the way the test files are generated. Besides that, I made some implementation choices there that are “risky”, and just because I can’t exploit them yet doesn’t mean you cannot; see the README)

Currently the arena runs all checkers on all tests in one go to update the page. That takes 6h (on a rather weak virtual machine). I run it manually when something interesting has changed, not on every commit.

Two issues so far:

  • At some point I had everything on versions that lean4lean liked, but when I bumped some tests to pick up fixes to a proof so that nanoda accepts it, lean4lean started to stumble again (hence the “declined” tests), presumably until that Lean version is released and lean4lean adjusted.

    I expect things to start to settle, though. I hope that once we have a stable release that works with all checkers, we can update at a more sensible pace (e.g. to the next stable release as soon as all checkers had time to adjust, if needed).

  • The specification for external checkers is not clear. Is it simply the official kernel, including all possible incompletenesses or other strange corner-case behavior? How should we treat tests where a different result would also be reasonable? For example, @Arthur Adjedj has proposed some clean-ups to the kernel, and it would be nice to test that on the arena. One option is to somehow label such a test as testing an “alternative world”, and tally/present such tests differently.

Julian Berman (Feb 25 2026 at 15:33):

Oh wow. That Verso page is great! I was indeed getting very tired of using my mental compiler to understand bad_defs already! Thank you for all your work, Joachim, it's been super helpful! (And to @Chris Bailey and everyone else of course too)

Snir Broshi (Feb 25 2026 at 15:49):

Incredible! Question: IIUC the tests are JSONs exported by lean4export, which are the input to the checker. Is it possible to browse these JSONs in the Verso rendering, or do we call lean4export manually, or did I misunderstand something?

Patrick Massot (Feb 25 2026 at 16:44):

Joachim, you only need to use #35732 to get more candidates for your arena!

Chris Henson (Feb 25 2026 at 16:58):

I wanted to mention that it is appreciated that you have recently added CSLib as one of the tests. Space was limited for a short paper format, but I made sure to briefly mention it in https://www.arxiv.org/abs/2602.15078 :)

Chris Henson (Feb 25 2026 at 16:58):

I have started working on my own checker using the new OxCaml compiler, though I think I'll be a bit slow in finding time to get it feature complete. I am using it as positive reinforcement for completing other work I've been procrastinating on :sweat_smile:

Joachim Breitner (Feb 25 2026 at 17:11):

Snir Broshi schrieb:

Is it possible to browse these JSONs in the Verso rendering,

That's what you are looking at: the rendering tool reads these exports.

Joachim Breitner (Feb 25 2026 at 17:13):

Patrick Massot schrieb:

Joachim, you only need to use #35732 to get more candidates for your arena!

Once that's merged and Mathlib bumped on the arena, it'll be kernels checking kernels.

Snir Broshi (Feb 25 2026 at 20:12):

Joachim Breitner said:

Snir Broshi schrieb:

Is it possible to browse these JSONs in the Verso rendering,

That's what you are looking at: the rendering tool reads these exports.

But I want to see the JSONs on the website, it currently shows Lean code only

Joachim Breitner (Feb 25 2026 at 22:22):

Ah, you want the raw JSON! Currently you can download https://arena.lean-lang.org/lean-arena-tests.tar.gz. I guess it wouldn’t be too hard to keep the JSON files in the web output and link them from the various places; not sure how much interest there is – once you have written your parser you likely don’t care much about the JSON any more.

James E Hanson (Feb 25 2026 at 23:41):

Snir Broshi said:

Incredible! Question: IIUC the tests are JSONs exported by lean4export, which are the input to the checker. Is it possible to browse these JSONs in the Verso rendering, or do we call lean4export manually, or did I misunderstand something?

The export JSON format is not very human-readable. Every syntactic node in every expression has its own entry and entries refer to each other by number rather than by name.
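
As a rough picture of what "entries refer to each other by number" means, here is a toy resolver in Python. The entry shape and field names (id, kind, fn, arg) are invented for illustration; the actual schema is the one in the lean4export format spec.

```python
import json

# A made-up numbered-entry export (NOT the real lean4export schema):
# each line is one syntactic node, and children are referenced by the
# numeric id of another entry rather than by name.
raw = """\
{"id": 0, "kind": "const", "name": "Nat"}
{"id": 1, "kind": "const", "name": "Nat.succ"}
{"id": 2, "kind": "app", "fn": 1, "arg": 0}
"""

entries = {e["id"]: e for e in map(json.loads, raw.splitlines())}

def render(i):
    """Chase the numeric references to print a readable term."""
    e = entries[i]
    if e["kind"] == "const":
        return e["name"]
    return f"({render(e['fn'])} {render(e['arg'])})"

print(render(2))  # (Nat.succ Nat)
```

This numbering is what makes the raw files hard to read by eye, but it also gives sharing for free: a subexpression that occurs many times is stored once and referenced by its id.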

James E Hanson (Feb 25 2026 at 23:45):

For example, the exported version of Nat.add_succ is 572 lines long and I believe this essentially just contains the same information as the theorem declaration

@[defeq] theorem Nat.add_succ : ∀ (n m : ℕ), n + m.succ = (n + m).succ :=
fun n m => rfl

and the reduced proof term

fun n m => Eq.refl ((Nat.rec ⟨fun x => x, PUnit.unit⟩ (fun n n_ih => ⟨fun x => (n_ih.1 x).succ, n_ih⟩) m).1 n).succ

Henrik Böving (Feb 25 2026 at 23:45):

And all declarations that are used in the transitive closure of constants

James E Hanson (Feb 25 2026 at 23:47):

Right, but there aren't that many for Nat.add_succ, right?

James E Hanson (Feb 25 2026 at 23:50):

The only user-level things I'm seeing are constants related to Nat, Eq, HAdd, Add, PUnit, and PProd.

James E Hanson (Feb 25 2026 at 23:52):

Oh I guess there's also Nat's Add instance, instAddNat.

Snir Broshi (Feb 26 2026 at 00:46):

Joachim Breitner said:

not sure how much interest there is – once you have written your parser you likely don’t care much about the JSON any more.

Oh, so checkers are meant to include their own parsers, that's what I misunderstood, thanks!
I thought the point was to write minimalistic checkers only, that take in an Expr as JSON and type-check it.
Requiring a parser is raising the bar quite high, no?
Are there tests for just the parser part?

(I agree that technically the parser could be considered as part of the trusted codebase, since it could change the meaning of theorems)

James E Hanson (Feb 26 2026 at 01:57):

Snir Broshi said:

Requiring a parser is raising the bar quite high, no?
Are there tests for just the parser part?

I think Joachim is just talking about parsing the JSON itself.

Snir Broshi (Feb 26 2026 at 02:37):

James E Hanson said:

Snir Broshi said:

Requiring a parser is raising the bar quite high, no?
Are there tests for just the parser part?

I think Joachim is just talking about parsing the JSON itself.

I'm confused, are you sure? They said "once you have written your parser you likely don’t care much about the JSON any more", which doesn't seem to fit unless you consider the in-memory tree structure of JSON to not be JSON, and only the serialized form to be JSON. But then the tests should still display the part you do care about, which is the tree structure, and to display that you have to serialize it back to a JSON string, so we're back to caring about JSON. So you've confused me.

My point is: whatever you have to parse, whether it is Lean or JSON (or whatever you call a "parsed" JSON, even though I think that distinction is impossible), should be on the website. After all, the point of the website is to display tests for the checker.
If the checkers only have to parse JSON and not Lean, then since the JSON isn't super readable, perhaps a side-by-side view of Lean & JSON is a good solution?

Joachim Breitner (Feb 26 2026 at 06:48):

I am assuming you take an off-the-shelf JSON parser which takes the raw bytes and returns a generic structured representation of JSON objects. Then you start from there and parse that into your checker's representation of expressions and declarations. I wager that the raw textual JSON isn't that interesting once this translation is written. And if it is, just get the tarball and look in there.
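
A minimal sketch of that two-stage pipeline in Python (the kind and field names here are invented for illustration; the real ones come from the lean4export format spec):

```python
import json
from dataclasses import dataclass

# Stage 2 target: the checker's own expression representation.
# (Kind and field names are invented for illustration.)
@dataclass
class Sort:
    level: int

@dataclass
class App:
    fn: object
    arg: object

def to_expr(node):
    """Translate a generic parsed-JSON node into domain objects,
    validating the shape as we go."""
    kind = node["kind"]
    if kind == "sort":
        return Sort(node["level"])
    if kind == "app":
        return App(to_expr(node["fn"]), to_expr(node["arg"]))
    raise ValueError(f"unknown kind: {kind!r}")

# Stage 1: an off-the-shelf parser turns raw bytes into generic dicts.
raw = b'{"kind": "app", "fn": {"kind": "sort", "level": 1}, "arg": {"kind": "sort", "level": 0}}'
expr = to_expr(json.loads(raw))
print(expr)  # App(fn=Sort(level=1), arg=Sort(level=0))
```

Once `to_expr` (or its equivalent) exists, every later stage of the checker works on the domain objects, which is the sense in which the raw JSON stops being interesting.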

Snir Broshi (Feb 26 2026 at 12:41):

Joachim Breitner said:

I wager that the raw textual JSON isn't that interesting once this translation is written.

But that's the input to the checker! How do you expect someone to implement a checker if you don't show them the input? How would I even write the translation without seeing the input?
You created this beautiful website but it seems like one still has to download the tarball and browse it to implement a checker, ignoring the website.

Joachim Breitner (Feb 26 2026 at 13:04):

My assumption is that if someone actually writes a checker, they want to run it, so they need the JSON files locally anyways, so viewing them in the browser isn't that useful. Once the parser is done, however, and you start implementing features, you no longer care about the raw JSON but the expressions they encode, and it is very useful to see them in a human-readable format.

Also, someone who is not developing a checker but wants to understand what a test is about likely only cares about the declarations, and not the raw export encoding, so they also benefit from the nice view.

Chris Bailey (Feb 26 2026 at 13:15):

Snir Broshi said:

Joachim Breitner said:

I wager that the raw textual JSON isn't that interesting once this translation is written.

But that's the input to the checker! How do you expect someone to implement a checker if you don't show them the input? How would I even write the translation without seeing the input?

There's a spec: https://github.com/leanprover/lean4export/blob/master/format_ndjson.md

Snir Broshi (Feb 26 2026 at 13:16):

Joachim Breitner said:

My assumption is that if someone actually writes a checker, they want to run it, so they need the JSON files locally anyways, so viewing them in the browser isn't that useful.

I'm considering writing a checker (which is why I'm following this thread), and I find the website pretty but not helpful at the moment. I'm not sure if that contradicts your assumption (since I might realize later that you're right and the JSONs are not useful), but it's a datapoint.

Snir Broshi (Feb 26 2026 at 13:21):

The main advantage of the website vs local files is that it can place the JSON and Lean side-by-side.

Joachim Breitner (Feb 26 2026 at 13:35):

I’m sorry to hear that it is not useful to you.

Julian Berman (Feb 27 2026 at 23:16):

Snir, I think when you start writing one, the parser is all you see and it's hard to get off the ground. I will say my experience does match that: once the parser is written (or in our case, after @Aaron Hill and I have written and rewritten it 4 times now), we stop looking at the JSON at all; it's too long and noisy for a human, and we mostly look at our own domain objects (our re-representations of Lean Expr and things).


Last updated: Feb 28 2026 at 14:05 UTC