Zulip Chat Archive

Stream: general

Topic: mm0

Johan Commelin (Oct 29 2019 at 09:20):

@Mario Carneiro Do you plan on having some Lean → MM0 compiler? Does such a thing even make sense to have?

Mario Carneiro (Oct 29 2019 at 09:21):

Yes it makes sense to have, and it would be valuable too as a way to prove consistency of lean (i.e. formally verifying my thesis)

Mario Carneiro (Oct 29 2019 at 09:22):

it's certainly easier to do than any other DTT based system from sheer familiarity and a precise specification, but DTT is still complicated

Johan Commelin (Oct 29 2019 at 09:22):

Aha, that's about what I expected

Johan Commelin (Oct 29 2019 at 09:23):

Also... is there some Makefile that I can use to compile stuff in your repo?

Johan Commelin (Oct 29 2019 at 09:23):

Which incantations do I need to get mm0-c and friends working?

Mario Carneiro (Oct 29 2019 at 09:23):

no makefile, but most of the things there are one line compile commands

Mario Carneiro (Oct 29 2019 at 09:23):

to compile mm0-c, do gcc main.c

Mario Carneiro (Oct 29 2019 at 09:24):

to compile mm0-hs, stack install mm0-hs from the mm0-hs directory

Johan Commelin (Oct 29 2019 at 09:24):

The last one needs a stable internet connection, I fear

Johan Commelin (Oct 29 2019 at 09:24):

I'm on mobile atm

Johan Commelin (Oct 29 2019 at 09:25):

Ooh, there is actually a make.sh in that directory (-;

Mario Carneiro (Oct 29 2019 at 09:26):

that adds a few more optimizations (profile guided optimization based on the result of verifying peano.mmb)

Johan Commelin (Oct 29 2019 at 09:27):

I see

Mario Carneiro (Oct 29 2019 at 09:28):

it's hard to write C programs without a complicated compiler invocation, but it's so much easier to use when you can

Johan Commelin (Oct 29 2019 at 09:28):

And if I run ./a.out ../example/peano.mmb and there is no output. This means all proofs are verfied?

Mario Carneiro (Oct 29 2019 at 09:28):

yes

Mario Carneiro (Oct 29 2019 at 09:29):

check the error code to be sure, but it should also give you an error dump if something goes wrong

Johan Commelin (Oct 29 2019 at 09:29):

This main.gcda file is containing optimizations for gcc?

Johan Commelin (Oct 29 2019 at 09:30):

Why does it get generated everytime?

Mario Carneiro (Oct 29 2019 at 09:30):

yes, you turn on some profile generation flag in gcc and it instruments the program to produce that file as it runs, then you feed it back into the compiler for a second build

Johan Commelin (Oct 29 2019 at 09:30):

But the second build should leave me with an a.out that doesn't generate those files, right?

Mario Carneiro (Oct 29 2019 at 09:30):

the second build doesn't generate the file

Mario Carneiro (Oct 29 2019 at 09:31):

but it doesn't clean it up either

Johan Commelin (Oct 29 2019 at 09:31):

Doesn't work for me.

Johan Commelin (Oct 29 2019 at 09:32):

I run ./make.sh and then ./a.out ../examples/peano.mmb. Result: command passes without errors, but I do get the gcda file

Johan Commelin (Oct 29 2019 at 09:32):

I should note that ./make.sh gave some errors from the gcc run

Johan Commelin (Oct 29 2019 at 09:32):

Errrr... warnigns

Johan Commelin (Oct 29 2019 at 09:32):

In file included from verifier_types.c:2,
                 from verifier.c:1,
                 from main.c:10:
index.c: In function ‘init_index’:
index.c:23:16: warning: taking address of packed member of ‘struct <anonymous>’ may result in an unaligned pointer value [-Waddress-of-packed-member]
   23 |     gi_sorts = ih->p_sorts;
      |                ^~
In file included from verifier_types.c:2,
                 from verifier.c:1,
                 from main.c:10:
index.c: In function ‘init_index’:
index.c:23:16: warning: taking address of packed member of ‘struct <anonymous>’ may result in an unaligned pointer value [-Waddress-of-packed-member]
   23 |     gi_sorts = ih->p_sorts;
      |

Johan Commelin (Oct 29 2019 at 09:33):

But that doesn't sound very serious

Mario Carneiro (Oct 29 2019 at 09:34):

what version?

Mario Carneiro (Oct 29 2019 at 09:34):

I have gcc 8.3.0

Johan Commelin (Oct 29 2019 at 09:36):

gcc --version
gcc (GCC) 9.2.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Mario Carneiro (Oct 29 2019 at 09:43):

I updated to gcc 9 and fixed the warnings you found, and simplified make.sh to remove the extra options

Johan Commelin (Oct 29 2019 at 09:47):

Nice. Seems to work now

Johan Commelin (Oct 29 2019 at 09:49):

@Mario Carneiro The file types.c... could that also have been called sorts.c?

Mario Carneiro (Oct 29 2019 at 09:49):

that's types in the C sense

Mario Carneiro (Oct 29 2019 at 09:50):

it's basically the definition of all the data structures in the file

Johan Commelin (Oct 29 2019 at 09:50):

Aha :thinking:

Johan Commelin (Oct 29 2019 at 09:52):

What's the purpose of index.c?

Johan Commelin (Oct 29 2019 at 09:53):

This is some sort of special made dictionary? Where you look up mm0 declarations?

Mario Carneiro (Oct 29 2019 at 09:54):

It's a nonessential part of the verifier that finds the names of things so that error reporting isn't completely unreadable

Mario Carneiro (Oct 29 2019 at 09:55):

When you compile with -D BARE it leaves that whole file out

Johan Commelin (Oct 29 2019 at 09:56):

Aha

Johan Commelin (Oct 29 2019 at 10:00):

Why is g_store called that? What does the g stand for?

Mario Carneiro (Oct 29 2019 at 10:00):

global variable

Johan Commelin (Oct 29 2019 at 10:00):

Next: what does HIGHWATER mean?

Johan Commelin (Oct 29 2019 at 10:01):

As a Dutchy it brings scary memories to mind... but that's not what your are meaning there :grinning:

Mario Carneiro (Oct 29 2019 at 10:01):

If the -D HIGHWATER compile option is enabled, it keeps statistics on how much of the various stacks are used, and reports the largest the stack ever got to

Mario Carneiro (Oct 29 2019 at 10:02):

This is because the verifier doesn't reallocate if it runs out of space, it just fails, so it's helpful to know whether the memory limits are reasonable

Mario Carneiro (Oct 29 2019 at 10:03):

I think the term comes from "high-water mark" used on ships

Johan Commelin (Oct 29 2019 at 10:05):

@Mario Carneiro So, roughly speaking, all the juicy bits are in verifier.c?

Mario Carneiro (Oct 29 2019 at 10:05):

yes

Mario Carneiro (Oct 29 2019 at 10:05):

there is a decent amount of comments and high level explanation in types.c though

Johan Commelin (Oct 29 2019 at 10:06):

<600 lines. Well done! I'll try to grasp it. But my train journey is almost over. So I'll have to postpone it for a while

Kevin Buzzard (Oct 29 2019 at 10:11):

Another mm0 question: where do the community envisage the Lean proof of the prime number theorem living? Will it need maintaining by a human? By a machine? Will it go in mathlib somewhere? People might need it.

Mario Carneiro (Oct 29 2019 at 10:19):

I think on another thread I suggested breaking that part of the mm0 repo into its own repo, following standard lean repo practices. The set.mm export has some manually written setup and postprocessing files, with a big autogenerated folder sitting between them; all of this can be stored like a regular lean project repo, and you can build more dependent files in that repo. Basically it would be devoted to set.mm and the lean facts you can derive from it

Kevin Buzzard (Oct 29 2019 at 10:28):

I see. So if I wanted to use PNT in some work I could just make this repo a dependency?

Mario Carneiro (Oct 29 2019 at 10:28):

yes

Mario Carneiro (Oct 29 2019 at 14:00):

@Johan Commelin I pushed a documentation upgrade to verifier.c in case you wanted to read it more carefully

Johan Commelin (Oct 29 2019 at 16:42):

@Mario Carneiro I don't see the push

Johan Commelin (Oct 29 2019 at 16:43):

https://github.com/digama0/mm0/blob/master/mm0-c/verifier.c#L87

#define pop_stack() ({ \
  ENSURE("unify stack underflow", g_stack_top > g_stack); \
  *(--g_stack_top); \
})

should this error message just say "stack underflow", instead of "unify stack"?

Johan Commelin (Oct 31 2019 at 05:16):

@Mario Carneiro Talking of error messages: some of them are capitalized, others not. Certainly a minor issue.

Mario Carneiro (Oct 31 2019 at 05:16):

heh

Mario Carneiro (Oct 31 2019 at 05:16):

where?

Johan Commelin (Oct 31 2019 at 05:16):

Also, thanks a lot for all the comments.

Johan Commelin (Oct 31 2019 at 05:16):

L234 and L235 of verifier.c

Johan Commelin (Oct 31 2019 at 05:17):

@Mario Carneiro grep "\"[A-Z]" verifier.c

Johan Commelin (Oct 31 2019 at 05:18):

You might want to git grep the entire project … — … if you care …

Mario Carneiro (Oct 31 2019 at 05:18):

fixed, I'll push later

Johan Commelin (Oct 31 2019 at 06:00):

@Mario Carneiro I'm confused. https://github.com/digama0/mm0/blob/master/mm0-hs/mm1.md is talking about mmu files, but your paper doesn't mention them, I think.

Johan Commelin (Oct 31 2019 at 06:01):

So the mm0-c folder is all about checking mmb files

Johan Commelin (Oct 31 2019 at 06:01):

And I think mm0-hs is about compiling certain files into mmb files

Johan Commelin (Oct 31 2019 at 06:02):

But I'm not sure if I completely grasp the flow diagram, and which programs are used in which parts

Mario Carneiro (Oct 31 2019 at 06:02):

The mmu format was indeed omitted from the paper. It is basically a text version of mmb available for debugging purposes

Mario Carneiro (Oct 31 2019 at 06:03):

In the flow diagram, the "editor" and "compiler" parts are mm0-hs and the "verifier" part is mm0-c

Johan Commelin (Oct 31 2019 at 06:04):

Right... but the "compiler" is also part of the "trusted core", right?

Mario Carneiro (Oct 31 2019 at 06:04):

Johan Commelin (Oct 31 2019 at 06:04):

Why not?

Mario Carneiro (Oct 31 2019 at 06:05):

it produces an mm0 file (or you provide an mm0 file to check against) and an mmb file, and they stand on their own

Johan Commelin (Oct 31 2019 at 06:05):

But verifier.c only works with mmb files, right?

Mario Carneiro (Oct 31 2019 at 06:05):

You are supposed to be able to read the mm0 file to know what is being proven, and the mmb file is verified to prove the theorems in the mm0 file by the verifier

Mario Carneiro (Oct 31 2019 at 06:06):

Actually mm0-c is incomplete; the mm0 checking part didn't make it in time for the paper

Johan Commelin (Oct 31 2019 at 06:06):

Aah, right

Johan Commelin (Oct 31 2019 at 06:06):

Because for all I know, you just compile Fermat into true, and the proof is trivial

Mario Carneiro (Oct 31 2019 at 06:06):

exactly

Johan Commelin (Oct 31 2019 at 06:07):

No wonder you can get <1s compile times :angry_devil:

Johan Commelin (Oct 31 2019 at 06:07):

Lol, I trust you

Johan Commelin (Oct 31 2019 at 06:07):

But I would like to see the mm0 checking part at some point :grinning:

Mario Carneiro (Oct 31 2019 at 06:07):

Note that the mm0 checking part is proportional to the size of the mm0 file. For a problem like Fermat this will be quite small

Johan Commelin (Oct 31 2019 at 06:08):

The c code will not be proportional to anything

Johan Commelin (Oct 31 2019 at 06:08):

But the running time might be. And I will believe that it is negligable

Mario Carneiro (Oct 31 2019 at 06:09):

mm0-hs has a verifier that checks the mm0 file, but it reads mmu files instead of mmb

Johan Commelin (Oct 31 2019 at 06:09):

Aha

Mario Carneiro (Oct 31 2019 at 06:10):

(that's mm0-hs verify, in the Verifier.hs file if you care to look)

Johan Commelin (Oct 31 2019 at 06:10):

/me takes a break to munch on some breakfast :bread: :glass_of_milk:

Johan Commelin (Oct 31 2019 at 07:21):

@Mario Carneiro Is there machinery to go back and forth between mmu and mmb?

Johan Commelin (Oct 31 2019 at 07:22):

How hard is it going to be to do the mm0 checking part in mm0-c? Sounds like you could do it in less than an afternoon...

Mario Carneiro (Oct 31 2019 at 07:22):

not directly right now; there should be a mm0-hs subcommand to do that

Mario Carneiro (Oct 31 2019 at 07:23):

I think it is not hard; for a while I was planning to add a way for mmb to help with distinctness checking (the map from names to indexes in the file should be injective), but I think that's probably overengineering and I'll just allocate a hashmap instead

Johan Commelin (Oct 31 2019 at 08:56):

Do/will/should mm1 files support unicode?

Mario Carneiro (Oct 31 2019 at 08:56):

Mario Carneiro (Oct 31 2019 at 08:57):

well, mm1 could, I guess, but mm0 doesn't

Johan Commelin (Oct 31 2019 at 08:57):

Right.

Johan Commelin (Oct 31 2019 at 08:57):

But mm1 is the format that you want mathematicians to read, right?

Mario Carneiro (Oct 31 2019 at 08:58):

mm1 is for the proof writer, mm0 is for the auditor

Johan Commelin (Oct 31 2019 at 08:59):

I fear that mm0 is impenetrable for many mathematicians

Mario Carneiro (Oct 31 2019 at 08:59):

My aim was "average for a theorem prover language"

Mario Carneiro (Oct 31 2019 at 09:00):

with as close to zero support infrastructure to allow that as possible

Mario Carneiro (Oct 31 2019 at 09:00):

unicode is above average

Mario Carneiro (Oct 31 2019 at 09:00):

and it does complicate the verifier

Mario Carneiro (Oct 31 2019 at 09:01):

but TBH there is a whole slew of landmines in the unicode spec, and lots of ways to make something misleading

Mario Carneiro (Oct 31 2019 at 09:01):

not good for an audit

Mario Carneiro (Oct 31 2019 at 09:02):

I'm pretty sure we have had a few threads here where we play with unicode and notations to apparently prove false

Johan Commelin (Oct 31 2019 at 09:06):

Yep, that's certainly right

Johan Commelin (Oct 31 2019 at 09:07):

I think I would like to see how 100 lines of group (or monoid) theory look like in mm0

Johan Commelin (Oct 31 2019 at 09:10):

I am currently trying to run stack install mm0-hs

Johan Commelin (Oct 31 2019 at 09:13):

@Mario Carneiro You do allow infix notation. What would be the right way to setup the group operation. Would you have a global g* infix that is the global group operation?

Johan Commelin (Oct 31 2019 at 09:13):

Because I don't think you can do operator overloading, right?

Mario Carneiro (Oct 31 2019 at 09:15):

Currently the notation command is limited to those with a unique constant at the start, but I think I can relax that to "infixy" notations that start with a variable but are followed by a unique constant. Then you could declare a notation x `+g[` G `]` y

Mario Carneiro (Oct 31 2019 at 09:16):

Alternatively, you could just have x +g y but it produces a "group expression" that is then evaluated in a particular group via x =[G] y

Johan Commelin (Oct 31 2019 at 09:17):

I wonder if it is worth it.

Mario Carneiro (Oct 31 2019 at 09:17):

Or you could just axiomatize groups and have x + y

Johan Commelin (Oct 31 2019 at 09:17):

I would rather just write add x (add y z), I guess

Mario Carneiro (Oct 31 2019 at 09:17):

You can have all that in mm1 / your favorite editor

Mario Carneiro (Oct 31 2019 at 09:18):

but mm0 is all about not hiding complicated stuff

Johan Commelin (Oct 31 2019 at 09:18):

Well, I'm interested in the first 100 lines of a reusable group theory library

Johan Commelin (Oct 31 2019 at 09:20):

And whether the group operation would be bundled or not, I don't know. But having it unbundled would mean that you can simply call the group op mul or add and write mul x (mul y z) which is just as readable as x `*g[` G `]` ( `y `*g[` G `]` z)

Mario Carneiro (Oct 31 2019 at 09:20):

Well, one thing that metamath had that mm0 doesn't is the ability to apply a binary operation using the notation (a op b)

Mario Carneiro (Oct 31 2019 at 09:20):

where op is a variable

Mario Carneiro (Oct 31 2019 at 09:21):

So you could just have a variable called + and it would all work nicely

Giovanni Mascellani (Oct 31 2019 at 09:22):

Well, you can have this if you want, can't you? You just have to use a character different than (, because that's already taken to override priority.

Mario Carneiro (Oct 31 2019 at 09:22):

But you could still define @(a op b) or something like that

Johan Commelin (Oct 31 2019 at 09:23):

Do you have namespacing?

Mario Carneiro (Oct 31 2019 at 09:23):

Mario Carneiro (Oct 31 2019 at 09:24):

again, that could be implemented at mm1 level

Mario Carneiro (Oct 31 2019 at 09:24):

I'm taking a lot of cues from C here

Johan Commelin (Oct 31 2019 at 09:24):

Right, but that means that if I claim def @, then all the others will have to deal with it, and can't reuse that "trick"

Johan Commelin (Oct 31 2019 at 09:25):

You can't have a temporary def with a short name, because it pollutes the global library

Mario Carneiro (Oct 31 2019 at 09:25):

An MM0 file is usually built as one "unit", so you have to sort out all the global constraints

Mario Carneiro (Oct 31 2019 at 09:26):

The hope is that this is fairly limited because you only care about a few theorems

Mario Carneiro (Oct 31 2019 at 09:27):

For actual proving work, you want MM1, which can add a bunch of namespacing, unicode, and whatever else, for your local theorems

Johan Commelin (Oct 31 2019 at 09:27):

So you are saying that I shouldn't be interested in group.mm0 but only in group.mm1 and possibly classification_fin_simp_grp.mm0.

Mario Carneiro (Oct 31 2019 at 09:27):

right

Juho Kupiainen (Nov 02 2019 at 19:19):

@Mario Carneiro Are there plans to have a metamath -> mm0 translator?

Bryan Gin-ge Chen (Nov 02 2019 at 19:20):

Isn't that what mm0-hs from-mm does?

Simon Cruanes (Nov 04 2019 at 04:49):

@Mario Carneiro do you have plans / vague ideas for the following?
- multiple file developments (one mm0 importing another mm0, hopefully not with a #include like mechanism, but rather some proper namespacing)
- multiple independent checkers for mm0/mmu/mmb (run automatically in CI, like in metamath, that's super neat)
- multiple interactive frontends (maybe specialized for different logics, like one for CIC, one for HOL, one for TLA… etc.), as it seems to me that interop between systems/teams should be on the mm0 level rather than one more complex, more specialized formats such as mm1

Mario Carneiro (Nov 04 2019 at 04:50):

- multiple file developments (one mm0 importing another mm0, hopefully not with a #include like mechanism, but rather some proper namespacing)

Explicitly no. This is implemented for MM1, but this is a huge complexity increase even there, and it also adds a lot of filesystem stuff to the trusted base

Mario Carneiro (Nov 04 2019 at 04:51):

- multiple independent checkers for mm0/mmu/mmb (run automatically in CI, like in metamath, that's super neat)

Yes. There are already multiple checkers, written as I need them, and the spec is intended to be implementation agnostic as with metamath. Besides checkers written by me, I believe Giovanni Mascellani started working on a python checker but I don't know what the progress is on that

Mario Carneiro (Nov 04 2019 at 04:53):

- multiple interactive frontends (maybe specialized for different logics, like one for CIC, one for HOL, one for TLA… etc.), as it seems to me that interop between systems/teams should be on the mm0 level rather than one more complex, more specialized formats such as mm1

That's right. I have vague plans about this, but this is really my dream for the future - I think this is the best way to get a really high quality front end on a metamath like backend

Mario Carneiro (Nov 04 2019 at 04:55):

I should qualify the MM0 importing thing by saying that I am actually segmenting the mm0 files on the repo using an include mechanism, but this is only recognized by the MM1 compiler and is intended as a pseudo feature so you can assemble MM0 files (by concatenation) for use by real MM0 verifiers

Simon Cruanes (Nov 04 2019 at 05:02):

Interesting. I was thinking of mm0/mmb files as a kind of compiled version with a stable ABI (like a .o file, in a way). Splitting large libraries into many such files (pairs of files, really) would then make incremental checking easy, and also to have different high-level tools tackle different parts of the library (say, a frontend with a omega-like tactic in a part, a frontend by another team with Gröbner basis working on another part). That would avoid relying too much on a single high level tool, imho.

You could still have something that takes all the mm0/mmu files and concatenates them, in topological order, into a single file, for checking/releasing (kind of like single-file libraries in C such as sqlite).

Mario Carneiro (Nov 04 2019 at 05:38):

They are a lot like .o files, but they are still self contained

Mario Carneiro (Nov 04 2019 at 05:39):

You can conceivably combine a number of .mmb files together, where one proves theorem X using axiom Y, and the other proves theorem Y from axiom Z

Mario Carneiro (Nov 04 2019 at 05:41):

@Simon Cruanes There isn't really much advantage to having the format itself note this "splittability" since it is basically a metatheoretic property of proofs that can be implemented by an untrusted transformer (i.e. a "linker" for proof files)

Mario Carneiro (Nov 04 2019 at 05:45):

If you think about how .o files work, they present "unresolved symbols" that are linked with other object files. It is unsafe to present these external symbols without declaring their types (and definitions, if examinable), but if you do that, it's not much different from having axiom and def declarations for these. OpenTheory also does something similar; there is no distinction made between an "axiom" as in something true according to the ambient foundation and an "axiom" meaning a theorem that has been proven elsewhere that will later be linked in

Mario Carneiro (Nov 04 2019 at 05:50):

also to have different high-level tools tackle different parts of the library (say, a frontend with a omega-like tactic in a part, a frontend by another team with Gröbner basis working on another part). That would avoid relying too much on a single high level tool, imho.

Indeed I've been thinking about this a bit. It would be nice to have a heterogeneous proof construction approach, where different front ends are used for building different parts of the library that are later linked together using a common proof substrate. For instance, I'm writing an x86 compiler which is MM0 proof producing, but the front end for that doesn't really have the same style as the MM1 format. It is a lot more automatic, and the user is mostly providing hints along the way (assertion-decorated code), by comparison to regular theorems where you want lots of unification and proof search tactics

Simon Cruanes (Nov 04 2019 at 15:27):

Indeed I've been thinking about this a bit. It would be nice to have a heterogeneous proof construction approach, where different front ends are used for building different parts of the library that are later linked together using a common proof substrate.

Other use cases for having heterogeneous proof construction that immediately come to me:
- custom decision procedures as separate tools (omega.exe? mathematica bridge? :grinning_face_with_smiling_eyes:)
- hammers (calling to Z3, CVC4, E and reconstruct a proof — no need to burden the main interactive prover with that)
- experimenting with alternative tactic languages (as is the case in Coq these days, just how many tactic languages do they have…)

Simon Cruanes (Nov 04 2019 at 15:30):

If you think about how .o files work, they present "unresolved symbols" that are linked with other object files.

Pushing the analogy a bit too far, .o files have big shortcomings too. They don't specify a dependency order, nor do they protect against duplicate symbols.

I think maybe just declaring a module foo; as the first statement of a .mm0 file, and its dependencies just after that (import bar;) without roping in the exact filesystem details on how to import bar, could be a lightweight solution?

Juho Kupiainen (Aug 08 2021 at 10:09):

@Mario Carneiro Do the mm0-c, mm0-hs and mm0-rs verify mmu and mmb proofs in polynomial time?

Johan Commelin (Aug 08 2021 at 10:10):

They verify proofs in constant time. Namely < 0.2 s. :stuck_out_tongue_wink:

Mario Carneiro (Aug 08 2021 at 16:16):

@Juho Kupiainen I work out the asymptotic complexity of an ideal mm0 verifier (and mm0-c is the one that implements all the tricks) in the MM0 paper. It is O(n^2) worst case, but that worst case requires that a single theorem statement take up half of the entire proof content. In the common case where theorem statements are bounded by k, it takes O(nk) time, so roughly linear time.

The other verifiers mm0-rs and mm0-hs do more things and aren't necessarily implemented optimally, in part because they also run tactics, do proof deduplication in order to create mmb files, and generally need a different data structure to do proof construction. Deduplication is the really important optimization, mm0-rs should still do it in most places, but maybe not all, and I know mm0-hs doesn't. That is, you can construct a term with a high duplication like ((x + x) + (x + x)) + ((x + x) + (x + x)) (imagine a complete binary tree like this of depth 60), and this is considered a "small" term because it only has 60 subterms, but processing this along the tree structure would take 2^60 steps.

Metamath does deduplication of proofs, but not of lemma statements: the way it is defined requires that verifiers actually construct the full list of symbols for each step. That means that it is subject to this worst case example, for which metamath takes exponential time. I believe there is a more complex example using another of metamath's features that takes double exponential time, but I forget the details. However, in case it wasn't obvious, this is an extremely contrived counterexample, and for the most part metamath is also linear time and these optimizations give mm0 only a small boost (from 0.9 s -> 0.2 s).

Do you know what the case is with the Metamath verifier? You mention in the mm0 github README that it's possible to prove anything in Metamath because the proof checker doesn't check everything properly. Did I understand that correctly? Is it possible to check for the things that should be checked? Is the problem in the language or in the verifier?

That's not quite what I mean (and I hope that's not what I said). Metamath's set.mm database is sound to the best of my knowledge, but because definitions are added as axioms, they have to adhere to some simple rules to ensure that they meet the usual standard for definition introduction in a logical system, which I coded as a "definition checker" that is currently run as part of CI. An example of a definition that fails the check is liar := ¬ liar. So the definition checker is an external application that is also critical for soundness, and it is this fact that motivated me to migrate it into the kernel in mm0.

Another aspect that is critical for the current proof of soundness but has to be checked externally is the unambiguity of the set.mm grammar. In order to minimize the distance between the textual representation of theorems and the proof checker's understanding of the statement, metamath theorems are represented as strings of tokens, which is an unusual choice for a theorem prover. This means that it is critical that the mapping from strings of tokens to trees has to be injective, which is an undecidable property of the grammar. In the case of set.mm, this property happens to be true, because there is a manual proof of it. That proof was later mechanized into a parser that constructs a parse table for the grammar, whose existence is a witness to the unambiguity of the grammar. This parse table construction is also run in CI. Again, MM0 wants to dodge this complication so it has a precedence parser built in to the verifier and its workings are part of the specification.

Last updated: May 02 2025 at 03:31 UTC