Zulip Chat Archive

Stream: general

Topic: Arbitrary Metaprograms


MohanadAhmed (Oct 05 2023 at 20:42):

In this thread https://leanprover.zulipchat.com/#narrow/stream/113488-general/topic/PSA.20about.20trusting.20Lean.20proofs Jannis Limperg writes a metaprogram that (as far I understand) can interfere with the environment and cause incorrect proofs to be accepted. On page 17 of Sebastian Ullrich's PhD thesis the last paragraph states:

The global context or environment holds the top-level declarations defined so far as well as further metadata. The environment is implemented in what is commonly called the LCF approach: it is an abstract data type to metaprogramming code that can only be extended with new declarations by an exported kernel function that rejects any input that is not well-typed in the environment before the extension, guaranteeing that the environment is consistent at all times even in the presence of arbitrary metaprograms.

Can someone help me understand: Where does the argument in the paragraph from the thesis breakdown?

Henrik Böving (Oct 05 2023 at 20:48):

In the idea that the environment can only me modified through the function which was not the case as this meta program was being written

Henrik Böving (Oct 05 2023 at 20:48):

that said I belive the PR that make the constructor of Environment private made this much harder to do in a way that is not noticed and in addition the lean4checker basically ensures this property by limiting itself to only using the kernel functions as it validates an Envrionment

MohanadAhmed (Oct 05 2023 at 21:00):

Thanks for the response Henrik.

So the particular statement:

it is an abstract data type to metaprogramming code that can only be extended with new declarations by an exported kernel function that rejects any input that is not well-typed in the environment before the extension,

If I understand you correctly the "bug" (if it is appropriate to call it that) is that the Environment data structure is not a faithful implementation of the LCF approach? In this case we had (before it was made private) the function Environment.mk that allowed addition of false theorems to the environment.

Henrik Böving (Oct 05 2023 at 21:03):

Indeed, although as I mentioned the lean4checker program nullifies this issue as it loads an Environment and shows all declarations to the kernel again so if there are such declarations around they will definitely be caught

MohanadAhmed (Oct 05 2023 at 21:20):

Is the feature you are talking about which shows all declarations to the kernel is the one here in lean4#2617?

Perhaps a rather silly question: the issue described in Jannis Limperg's thread was on the global context/Environment. The other contexts (local, metavariable context, tactic context) which are described in the next paragraphs of the thesis have no such issues because they are not observed by the kernel? Only the terms synthesized are sent to the kernel?

Henrik Böving (Oct 05 2023 at 21:28):

No it is this a separate program: https://github.com/leanprover/lean4checker/

Yes the Environment is the only thing that gets actively observed by the kernel, everything else is just means for meta programs to construct terms.

Mario Carneiro (Oct 05 2023 at 22:23):

MohanadAhmed said:

Can someone help me understand: Where does the argument in the paragraph from the thesis breakdown?

The environment is implemented in what is commonly called the LCF approach: it is an abstract data type to metaprogramming code that can only be extended with new declarations by an exported kernel function that rejects any input that is not well-typed in the environment before the extension, guaranteeing that the environment is consistent at all times even in the presence of arbitrary metaprograms.

This sentence is incorrect. The Environment is not an abstract data type, it is a regular data type which arbitrary metaprograms can modify arbitrarily. The PR to make the constructor private approximates an abstract data type more closely, but lean does not truly have abstract data types, and it is not possible to build LCF style provers in lean as a result - the type system is not airtight enough to hang system soundness on.

Wojciech Nawrocki (Oct 05 2023 at 22:32):

I would also argue that it is not an issue for Lean. Part of the reason why in LCF-style systems a 'truly abstract' (in this case meaning one whose values cannot be constructed by functions outside of the kernel module other than by going through kernel functions) datatype is needed is that proof terms are not recorded. In some systems, the proof datatype is implemented as roughly, unit. So you better make sure that this definition cannot be observed. In contrast, Lean records and stores full proof terms that can be verified by an independent proof checker. From that perspective, Lean is more of a proof construction environment for proofs that can also be checked separately than it is a proof checker. I seem to recall Mario phrasing a similar view on the other thread.

MohanadAhmed (Oct 05 2023 at 22:42):

Mario Carneiro said:

... but lean does not truly have abstract data types ....

Can you explain more on what this means?

MohanadAhmed (Oct 05 2023 at 22:53):

Wojciech Nawrocki said:

In contrast, Lean records and stores full proof terms that can be verified by an independent proof checker. From that perspective, Lean is more of a proof construction environment for proofs that can also be checked separately than it is a proof checker. I seem to recall Mario phrasing a similar view on the other thread.

Is Lean being a proof construction environment an intentional design choice?

My current understanding is that lean exports olean files. An independent proof checker would have to deal with these oleans. The format of oleans changes from time to time.oleans are as Mario describes them "an unstable binary format". Is it possible for the Oleans (or some other export format) to have a specification or are there design considerations that discourage this?

Wojciech Nawrocki (Oct 06 2023 at 01:07):

Is Lean being a proof construction environment an intentional design choice?

This I do not know; you would have to ask the language team :)

Is it possible for the Oleans (or some other export format) to have a specification or are there design considerations that discourage this?

I would say it's entirely possible, but fixing a specification now would make it harder to the developers to iterate on what's stored in oleans. Also if you made the specification reflect the current format, it would probably be a bit weird because the current approach more or less just dumps the in-memory representation (rather a compacted version of it) of an object graph into the olean file, for performance reasons.


Last updated: Dec 20 2023 at 11:08 UTC