Zulip Chat Archive

Stream: general

Topic: Import of Metamath theorems into Lean


Hunter Monroe (Apr 20 2021 at 22:47):

I have manually imported 78 Metamath theorems regarding tautologies into Lean in the attached file, and would like to translate the remaining over 23K theorems automatically for addition to mathlib. The automated translation might rely on Mario's MM0 tools (see https://leanprover.zulipchat.com/#narrow/stream/113488-general/topic/Prime.20number.20theorem.20in.20lean/near/168472336), modified to include documentation and otherwise make proofs intelligible. You will notice in the attached that the theorems name are cryptic, but the documentation is excellent. Some further work would be needed to show that Lean's and Metamath's natural numbers (for example) are isomorphic, so we can say that Lean has a proof of the Prime Number Theorem (see Mario's work on this https://github.com/digama0/mm0/blob/build/mm0-lean/mm0/set/post.lean). I would appreciate your feedback: (1) would you support inclusion of the attached file with 78 theorems and later the complete file with over 23K theorems; (2) how should I revise the file, for instance to avoid tactics, as a template for the automated translation; and (3) would this be a desirable first step towards importing theorems from other proof assistants such as Isabelle and Mizar (a notoriously difficult task)? set_mm.lean

Eric Wieser (Apr 20 2021 at 23:32):

Without wanting to comment on the idea as a whole as I'm not familiar with Metamath:

The theorem names below are identical to those in set.mm, except for theorem names beginning
with a number (for example 2a1i) or containing a period (for example, pm2.18d) or a theorem name
colliding with a Lean theorem

On the assumption you don't care about readable names, you can use «pm2.18d» and «2a1i» if you want a leading number in your lemma name «» is special syntax for unusual identifiers that has language support.

Mario Carneiro (Apr 20 2021 at 23:45):

Metamath has a very different naming convention than mathlib, so theorems intended to be directly used by mathlib users should be provided under a different name

Mario Carneiro (Apr 20 2021 at 23:46):

theorems ported directly from metamath should probably retain their original names inside a namespace, especially if the port is fully automatic

Mario Carneiro (Apr 20 2021 at 23:54):

(1) would you support inclusion of the attached file with 78 theorems and later the complete file with over 23K theorems;

Those are two very different questions. For the first part, I don't think it is worthwhile to simply copy these theorems and associated documentation without an intended application. For the second part I think it would be best suited as a separate lean project, which can be used as a dependency like mathlib.

(2) how should I revise the file, for instance to avoid tactics, as a template for the automated translation;

Much like the current mathport project, this involves tweaking the heuristics of the translator itself, while continuously using an up to date version of set.mm. If you manually port the file there is no way you will be able to keep up with upstream changes.

(3) would this be a desirable first step towards importing theorems from other proof assistants such as Isabelle and Mizar (a notoriously difficult task)?

Other proof languages have other problems. I have investigated isabelle to some extent but not much on mizar, and certainly nothing like a full port exists. To do this you have to understand the breadth of what the source system can express and how the target system (lean) can model it. Plus there are a bunch of technical issues to getting proofs out of those systems. Metamath is comparatively simple, which is part of why the translation you mention exists.

Johan Commelin (Apr 21 2021 at 05:48):

@Hunter Monroe if you make this port than (for example) you will be creating a new copy of the real numbers, right? And that one will not automatically play nice with the existing copy of the reals in mathlib. This seems like a huge issue to me. One that probably requires manual intervention for almost every definition. (You need to line up the mm copy of nat, int, rat, real, complex, group, ring, vector_space, topological_space, metric_space, continuous, etc... the list goes on and on.)
One of the massive strengths of mathlib is that it is a monolith, with very tight compatibility between almost all theories. Compare this to some of the other systems, that have multiple topology libraries, multiple algebra libraries, etc... and all those specialized libraries don't work together nicely all the time.

Mario Carneiro (Apr 21 2021 at 06:54):

That sounds a bit like "not invented here" mentality though. If our library is so good it should be possible to prove all those isomorphisms you mention

Mario Carneiro (Apr 21 2021 at 06:55):

There's a bunch of math that was written in systems other than lean too

Johan Commelin (Apr 21 2021 at 07:12):

Hmm, I don't think what I'm saying is NIH. Those isomorphisms can be proven. But my point is that adding a huge port to mathlib without proving all those isoms will lead to fragmentation. And I think fragmentation is very bad in the long run. We'll never be able to state the Hodge conjecture if our algebra, topology, analysis, and geometry libraries don't play very well together. Because you need a boatload of stuff from all those areas.

Scott Morrison (Apr 21 2021 at 07:15):

Yes. So presumably the right approach is that there is a big blob of automatically translated stuff, that never gets tweaked by hand, except insofar as modifying a list of alignments that have been made. There's also a bunch of hand-written alignment code. Presumably the part that can be committed to mathlib is the alignments + the fraction of the automatically translated stuff that has been aligned + the scripts that do the automatic translation on the source library, so that it's possible to bump as needed.

Scott Morrison (Apr 21 2021 at 07:16):

(Perhaps those scripts would read some configuration file that specified what is considered aligned by now. They then translate everything, and discard the non-aligned parts.)

Mario Carneiro (Apr 21 2021 at 08:44):

I don't know about discarding the non-aligned parts, since alignment only happens if people write the alignments, which is a lot easier if the unaligned theorem is already available without having to re-run the porting tool

Scott Morrison (Apr 21 2021 at 08:46):

Sure! Maybe it makes sense to try to split into two different directories or something that indicates the boundary between aligned and non-aligned.

Mario Carneiro (Apr 21 2021 at 08:49):

I don't think they come organized like that though. Suppose we have basic logical connectives aligned and nothing else; there are basic logic theorems littered throughout the metamath library (although there is a big concentration at the beginning)

Mario Carneiro (Apr 21 2021 at 08:50):

in particular because the second half of set.mm is a collection of "mathboxes" aka mini-libraries maintained by their authors

Eric Wieser (Apr 21 2021 at 08:50):

What's the motivation behind the single-file structure of set.mm?

Johan Commelin (Apr 21 2021 at 08:51):

That 2 seconds is so small that splitting the file isn't necessary (-;

Eric Wieser (Apr 21 2021 at 08:51):

Is it produced by a build tool that concatenated smaller pieces?

Mario Carneiro (Apr 21 2021 at 08:51):

Norm cites ease of various global file editing operations

Mario Carneiro (Apr 21 2021 at 08:51):

There are markings in the file that let you split and concatenate it if you want

Mario Carneiro (Apr 21 2021 at 08:52):

You need an editor with large file support but otherwise it has never been a big deal

Mario Carneiro (Apr 21 2021 at 08:52):

There is of course plenty of organizational structure in the file, see http://us.metamath.org/mpeuni/mmtheorems.html

Eric Wieser (Apr 21 2021 at 08:53):

Ah, I think I was expecting the level of structure mathlib has within files

Mario Carneiro (Apr 21 2021 at 08:54):

One thing that makes this distinctly different from lean's approach is that set.mm is linearly laid out, it's not a dag

Eric Wieser (Apr 21 2021 at 08:54):

That is, not much at all

Mario Carneiro (Apr 21 2021 at 08:54):

It makes it a lot easier to see what parts of the library are "early" and what order dependencies go

Eric Wieser (Apr 21 2021 at 08:54):

It still behave as a dag presumably?

Eric Wieser (Apr 21 2021 at 08:55):

At a lemma level

Mario Carneiro (Apr 21 2021 at 08:55):

The theorem references form a dag, of course, but they come with a predefined topological sort

Mario Carneiro (Apr 21 2021 at 08:56):

If metamath took a lot longer to compile this might be more annoying

Johan Commelin (Apr 21 2021 at 08:59):

I guess once set.mm starts taking > 10s to compile, everyone will start worrying, and thinking about porting to MM0 :rofl:


Last updated: Dec 20 2023 at 11:08 UTC