Zulip Chat Archive

Stream: general

Topic: Another Lean type checker


Chris B (Aug 21 2019 at 05:43):

I'd like to share something I've been working on for a while in an attempt to learn more about Lean's internals and how the gears of dependent types turn (I think I remember seeing a few Rust fans here too). https://github.com/ammkrn/nanoda.git
Going forward I'll be trying to add features aimed at exposing the internals in an informative way and expanding the documentation until (hopefully) the whole thing is annotated.

Mario Carneiro (Aug 21 2019 at 06:16):

awesome

Mario Carneiro (Aug 21 2019 at 06:18):

I wonder if we can hook this up with olean-rs to get typechecking straight from olean files

Mario Carneiro (Aug 21 2019 at 06:21):

Is there any chance of modifying it to be proof-producing? Specifically, I would like a record of how definitional equalities are proven so that it can be consumed by a much less intelligent verifier

Chris B (Aug 21 2019 at 06:28):

Sure, I can put in an option to output the intermediate states of two terms as they go through reduction/inference during the definitional equality checking if that's what you mean.

Mario Carneiro (Aug 21 2019 at 06:34):

Yes, that's the idea. I've been wanting to translate lean proofs into other languages for a while now, and the main blocker is that lean proofs leave a lot of work unfinished to be cleaned up by the typechecker.

Mario Carneiro (Aug 21 2019 at 06:38):

The other big thing I guess is inductive types, although I'm not sure how useful you can make the trace. Ideally I would want a record of why the typechecker thinks an inductive definition is permitted, and how it built the intro and recursion rules and the computation rule

Chris B (Aug 21 2019 at 06:48):

Construction of the intro/elim/reduction rules for inductive types is fairly procedural and they're built from components that exist separately earlier in the module so that shouldn't be hard. wrt to why the typechecker gives any particular definition the thumbs up you might have to narrow down what parts of the 'history' you need since some of the steps produce really large amounts of intermediate info.

Mario Carneiro (Aug 21 2019 at 07:44):

the short answer is "all of it"

Mario Carneiro (Aug 21 2019 at 07:45):

but it's possible I can reconstruct the rest from a reasonable amount of tracing data

Mario Carneiro (Aug 21 2019 at 07:46):

One way to describe the problem is that I want a trace that does not require knowledge of the rules of dependent type theory to verify

Mario Carneiro (Aug 21 2019 at 07:49):

really large amounts of intermediate info

Could you be more specific about this? If you are talking about repeated information that's not a problem, as it can be backreferenced, but if you are generating huge terms during reduction from small inputs that's actually what I care to capture. I assume that doesn't happen very often in lean proofs though

Chris B (Aug 21 2019 at 08:07):

I guess it depends on whether you plan on using an export file type deal or if you want to connect the tools directly. The terms aren't large in terms of memory consumption since the trees are mostly pointers, but the expressions themselves are frequently large in terms of the number of nodes/leaves and they go through a lot of steps in the checker, so if you try to put all that information somewhere persistent like text, then the history can get really big. When I was first trying to figure out why some things worked the way they do I was just doing naive logging and would get these runaway 300 MB log files.

Mario Carneiro (Aug 21 2019 at 08:30):

Right, the problem with pp.all style printing is it ignores subterm duplication, which can cause exponential blowup at the printing stage

Mario Carneiro (Aug 21 2019 at 08:32):

Any method that explicitly represents subterm duplication should be able to address this problem. The export file format uses numbered backreferences, or in human readable printing you can insert lets, or you can just do a memory dump (although this is tricky to set up reliably)

Mario Carneiro (Aug 21 2019 at 08:35):

I usually find the easiest thing to implement is numbered backreferences. You just keep a hashmap of exprs you've printed so far, and every time you see one you have printed before you print a reference to it rather than printing the whole term again

Mario Carneiro (Aug 21 2019 at 08:36):

That's basically what the export format printer does too

Mario Carneiro (Aug 21 2019 at 08:38):

I have a program I want to pipe this into but it would probably be easiest to communicate using some intermediate file format rather than via RPC (plus the intermediate file format has independent interest as a more-explicit lean export format)

Chris B (Aug 21 2019 at 08:43):

Some back of the envelope numbers, in checking mathlib, infer gets called 44.3 million times, def_eq 4.7 million.

Chris B (Aug 21 2019 at 08:44):

Are you targeting single definitions or like all of mathlib at once deal?

Chris B (Aug 21 2019 at 08:57):

I'll check later how much that goes down by if the caches for inference/definitional equality checks are reused globally.

Mario Carneiro (Aug 21 2019 at 10:38):

I don't think all of mathlib at once is out of the question; those numbers are big but not unreasonable

Mario Carneiro (Aug 21 2019 at 10:40):

I would anticipate that if I can get an importer into metamath working it will check in < 1 minute, possibly a lot less

Mario Carneiro (Aug 21 2019 at 10:40):

the biggest worry is if you hand me a 5 terabyte file

Chris B (Aug 21 2019 at 10:50):

Rad. In that case, let me know what I can do concretely to either get the ball rolling or how I can help you to do so.

Mario Carneiro (Aug 21 2019 at 11:48):

I guess if you can get tracing output from the program in any simple machine readable format (similar to the lean export format but feel free to pick your favorite concrete syntax) I can try to process the output and give more specific feedback on where I need more detailed info.

Mario Carneiro (Aug 21 2019 at 11:49):

I probably won't be able to work on this immediately though

Chris B (Aug 21 2019 at 16:31):

Word.


Last updated: Dec 20 2023 at 11:08 UTC