Zulip Chat Archive

Stream: general

Topic: how do ITP devs test their systems for correctness


Huỳnh Trần Khanh (Jul 17 2021 at 13:12):

of course an untested program is likely to be incorrect... and handcrafted test cases don't cover all possible execution paths... how do ITP devs test their systems for correctness then, given that they don't have a reference implementation to test their systems against... what strategies do ITP devs employ to ensure that their systems are trustworthy?

Mario Carneiro (Jul 17 2021 at 13:14):

using the system works pretty well

Mario Carneiro (Jul 17 2021 at 13:14):

do proofs, etc

Mario Carneiro (Jul 17 2021 at 13:15):

"they don't have a reference implementation to test their systems against" that depends on the system in question. For metamath there are definitely lots of reference/alternate implementations to test against

Mario Carneiro (Jul 17 2021 at 13:16):

it's not a requirement that every ITP has to invent its own mathematical foundation

Mario Carneiro (Jul 17 2021 at 13:17):

and that's actually a little detrimental for the reasons you say

Mario Carneiro (Jul 17 2021 at 13:18):

Catching soundness bugs is hard; usually they are primarily prevented by lots of care in the relevant parts of the system

Mario Carneiro (Jul 17 2021 at 13:18):

but translation out of your system into some other system is very good at sniffing out soundness issues

Mario Carneiro (Jul 17 2021 at 13:24):

Of course, the gold standard is to have a proof of correctness of the ITP system. Unfortunately there aren't too many people working on making that a reality (cue shameless plug)

Sebastian Ullrich (Jul 17 2021 at 14:29):

I thought they did it by throwing money at the problem

Mario Carneiro (Jul 17 2021 at 14:36):

lol, they went out of business because folks found too many soundness holes in coq/agda

Eric Bond (Jul 17 2021 at 22:32):

Proof assistants based on type theory usually have something called a kernel. The kernel is basically a type checker that checks the correctness of your proof. The goal is to keep the kernel as small as possible which ideally makes it easier to verify. By having a small trusted kernel, developers of a theorem proving system do not have to worry about what tactics and higher level language constructs are doing to the proof object as long as it is checked by the kernel***(ideally).

Since the kernel is a type checker for a type theory, you can prove metatheorems about your type theory by hand or in another proof assistant. Some type theories have large implementations like Coq and its Calculus of Inductive Constructions which make verifying the code implementation harder. (Interestingly a large chunk of that is code for inductive types which are actually derivable within other type theories. See Calculus of Dependent Lambda Eliminations and Cedille)

As for verifying the code implementing your typechecker you could use existing research and development for verifying any software. Lean4 seems to be written in C++ so something like AutoCorres could be applied to kernel code.


Last updated: Dec 20 2023 at 11:08 UTC