Zulip Chat Archive

Stream: general

Topic: Making Lean more accessible with Sophize

Abhishek Chugh (Sep 30 2020 at 16:21):

Hello! I would like to introduce the Lean community to a Mathematics Library that I have been working on for over a year.
Sophize is a non-profit online mathematics library and academic discussion platform. We are trying to create a deeply connected network of mathematical knowledge from a variety of sources - including formal systems such as lean.

As a part of this project, we are incorporating knowledge from formal systems and making them more accessible. The following short video gives an overview of our work with the Metamath language.
https://youtu.be/XNJ2w71Gi50

Abhishek Chugh (Sep 30 2020 at 16:27):

We would like to incorporate the wealth of knowledge you have created into our platform and extend our services to the Lean community. We can help make Lean more accessible to the broader mathematics community. Our broader goals for mathematics are outlined here:
https://www.youtube.com/watch?v=Wb1JbW9Otek&list=PLdfnRS1Iea0219QBYU_jF9D8F9VPUIq1z

I am looking for feedback from the community to help focus on what the community finds most valuable

Abhishek Chugh (Sep 30 2020 at 16:31):

Would you folks be generous to fill-in this short form so that we can better understand the community's views?
https://forms.gle/WSYc44wCadxRUyPe8

Yury G. Kudryashov (Sep 30 2020 at 16:54):

Some people (e.g., me) prefer reading text to watching videos.

Abhishek Chugh (Sep 30 2020 at 16:55):

Ah, I see. Thanks for letting me know.

Abhishek Chugh (Sep 30 2020 at 16:57):

So the basic idea - for Lean - would be to incorporate all the proofs and theorems that have been developed by the community. We provide very simple and intuutive to look up the meanings of every term/proposition in the proof.

Abhishek Chugh (Sep 30 2020 at 16:58):

Secondly, we organize the proofs in the form of a graph (built of individual arguments) where you can keep browsing the proof as much as you like (till you reach the axioms)

Yury G. Kudryashov (Sep 30 2020 at 16:59):

Have you already incorporated theorems from some other library of formal proofs? If yes, could you please post a link to your presentation of the library?

Yury G. Kudryashov (Sep 30 2020 at 17:00):

Do you want to deal with the proof tree only, or with the tactic mode source?

Abhishek Chugh (Sep 30 2020 at 17:00):

Thirdly, we allow you to run proof -generating code on our servers that whose output gets displayed on the fly on the website.

Yury G. Kudryashov (Sep 30 2020 at 17:01):

In either case, do you have some way of separating "important" dependencies from inevitable low-level ones? E.g., every rw or simp injects lots of boilerplate code in the proof tree.

Abhishek Chugh (Sep 30 2020 at 17:02):

Yury G. Kudryashov said:

Have you already incorporated theorems from some other library of formal proofs? If yes, could you please post a link to your presentation of the library?

Yes, we have done so for the Metamath language. The video (<4min) shows what we have done. The link is at https://sophize.org/s/metamath

Abhishek Chugh (Sep 30 2020 at 17:03):

Yury G. Kudryashov said:

In either case, do you have some way of separating "important" dependencies from inevitable low-level ones? E.g., every rw or simp injects lots of boilerplate code in the proof tree.

Since we have full control over the entire process, we can do such things easily. Each language needs special support and we will handle such things

Abhishek Chugh (Sep 30 2020 at 17:05):

Yury G. Kudryashov said:

Do you want to deal with the proof tree only, or with the tactic mode source?

I was thinking about the same and it doesn't look like things are very useful without the tactic mode support. So we are thinking of focusing more on that rather than the unwrapped tree.

Johan Commelin (Sep 30 2020 at 17:06):

But it will be a lot harder to get the data you want, in that case.

Johan Commelin (Sep 30 2020 at 17:07):

You can ask lean for the proof tree, but not for the tactic script that generated it. So you will need to parse the lean source code. That's a non-trivial thing to do.

Yury G. Kudryashov (Sep 30 2020 at 17:07):

Or you can ask the lean server info at each symbol.

Abhishek Chugh (Sep 30 2020 at 17:08):

I am assuming that there Lean already has a parser - in C/C++, am I right?

Abhishek Chugh (Sep 30 2020 at 17:08):

I went through the same process with metamath (although it was probably an order of magnitude simpler I suppose)

Yury G. Kudryashov (Sep 30 2020 at 17:08):

AFAIR, someone posted highlighted sources obtained by asking the lean server about each symbol.

Abhishek Chugh (Sep 30 2020 at 17:09):

That would be pretty great. Would simplify a lot of things

Abhishek Chugh (Sep 30 2020 at 17:15):

So my main focus for this thread is to understand these sorts of things - which are the biggest pain-points/opportunities in terms of

presentation of formal Lean proofs
sharing your proofs with descriptions - with the world to explain what you have done
making people understand how rigorous your work is
other opportunities to make formal mathematics popular/accessible

Abhishek Chugh (Sep 30 2020 at 17:21):

Yury G. Kudryashov said:

AFAIR, someone posted highlighted sources obtained by asking the lean server about each symbol.

Can you give me a hint about where I could find this?

Eric Wieser (Sep 30 2020 at 17:26):

You may be interested in https://www.imo.universite-paris-saclay.fr/~pmassot/lean/, which looks a bit like one of your "smart articles", and is generated from source code with comments

Edit: this is generated with https://leanprover-community.github.io/format_lean/, which is not actively maintained.

Abhishek Chugh (Sep 30 2020 at 17:27):

This is great. They must have gone through a very similar process to create this. Thanks Eric

Pedro Minicz (Sep 30 2020 at 17:29):

Cool. I've stumbled upon your channel yesterday while searching about Metamath.

Abhishek Chugh (Sep 30 2020 at 17:30):

Oh. Great! I guess you could be our first 'organic' viewer

Pedro Minicz (Sep 30 2020 at 17:32):

Abhishek Chugh said:

I went through the same process with metamath (although it was probably an order of magnitude simpler I suppose)

I believe @Mario Carneiro is going to be the best person to access how hard that is going to be. But from what I've seen it seems that the process will be more than an order of magnitude harder in Lean.

Mario Carneiro (Sep 30 2020 at 17:33):

Yep, it's basically never been done before

Mario Carneiro (Sep 30 2020 at 17:33):

The only practical way to get information about parsed lean source is to hack lean directly

Abhishek Chugh (Sep 30 2020 at 17:34):

Hey Mario, great to see you here.

Mario Carneiro (Sep 30 2020 at 17:34):

or find a way to get what you need through the lean API

Mario Carneiro (Sep 30 2020 at 17:34):

also hello again

Abhishek Chugh (Sep 30 2020 at 17:35):

I see. Can you give me a hint why the lean parser would not have the information I am looking for?

Abhishek Chugh (Sep 30 2020 at 17:37):

Is it different from the regular language parsers (some complex stages etc.) or is it that the parsed data is going to be very hard to make sense of?

Abhishek Chugh (Sep 30 2020 at 17:48):

So, what I am getting from here that displaying a tactic based proof - which hides (or reduces visibility) of some of the most repeated tactics would be the best way to present proofs.
And that hacking into the parser would be my best bet to get this information.

Abhishek Chugh (Sep 30 2020 at 17:51):

could I get some feedback on whether actually doing this would be helpful to the community. Or reading/sharing proofs is quite easy for you already?

Yury G. Kudryashov (Sep 30 2020 at 18:11):

Reading proofs is relatively easy for experienced users but it is not that easy for new users.

Pedro Minicz (Sep 30 2020 at 18:12):

I'd also argue that some (many?) proofs on mathlib were not made to be read.

Pedro Minicz (Sep 30 2020 at 18:12):

At least I find small term proofs harder to read than bigger (and slower to check) tactic mode proofs.

Yury G. Kudryashov (Sep 30 2020 at 18:15):

Usually a short term mode proof proves something trivial.

Abhishek Chugh (Sep 30 2020 at 18:15):

I see.

Yury G. Kudryashov (Sep 30 2020 at 18:16):

And you can reconstruct the proof in your mind from the list of theorems used in this line.

Yury G. Kudryashov (Sep 30 2020 at 18:16):

(says a person who often writes proofs of "trivial" results as term mode one-liners)

Bryan Gin-ge Chen (Sep 30 2020 at 18:17):

It really depends what you mean by "read". I don't think it's a good use of time for humans to try to type-check Lean proofs, but certainly it's not bad practice and not too difficult with experience to skim over a proof that Lean accepts and get a sense of whether it's an efficient / "good" proof.

Abhishek Chugh (Sep 30 2020 at 18:20):

Okay. So most of the time is spent creating proofs. Folks don't come back often to look important proofs to admire/learn something from them?

Pedro Minicz (Sep 30 2020 at 18:20):

Well, at least I do sometimes. Following a Lean proof is as convenient, if not more so something, as reading a demonstration from a textbook.

Yury G. Kudryashov (Sep 30 2020 at 18:22):

Sometimes we refactor old proofs, e.g., to get an intermediate lemma out of it, or to slightly generalize the statement, or to make the proof more readable.

Yury G. Kudryashov (Sep 30 2020 at 18:22):

Or if we want to change the definition.

Pedro Minicz (Sep 30 2020 at 18:22):

I agree that most term-mode one-liners are "trivial." But it would be nice to have mathlib's source itself be a tool for learning.

Pedro Minicz (Sep 30 2020 at 18:23):

I believe it already kind of is, it just clearly wasn't made of it.

Mario Carneiro (Sep 30 2020 at 18:37):

Abhishek Chugh said:

Is it different from the regular language parsers (some complex stages etc.) or is it that the parsed data is going to be very hard to make sense of?

The lean parser is very extensible (some might say too extensible), and lean 4 will be even more so. You can't parse lean code without also interpreting lean code, which requires that you also know the whole environment, have a VM implementation, and a typechecker - that is, you must be lean

Abhishek Chugh (Sep 30 2020 at 18:37):

Thanks for the help guys. I think I am getting a feel of things here. One last thing I would like to know:
Has anyone made automatic proof-generators for even simple input statement like "6*9+7=61" or "d (3 sin x)/dx = 3 cos x"?
Seems to me like having a formally verified proofs for things like this could be useful in critical applications (verification/security). And I suppose strongly verified easily accessible knowledge might be one of the goals of projects like Lean?

Mario Carneiro (Sep 30 2020 at 18:39):

For 6*9+7=61, yes, this is the norm_num tactic (which incidentally is almost the same tactic as the one I wrote for metamath)

Mario Carneiro (Sep 30 2020 at 18:39):

although for something that small you can probably just use rfl too

Sebastian Reichelt (Sep 30 2020 at 18:40):

This might be too much of a tangent, but one thing I would really like to see is a service that properly connects different libraries and also informal references. Like OEIS but with a much broader scope, or https://www.cs.ru.nl/~freek/100/ extended to all definitions and theorems. I know there are some projects with this goal, but they tend to get lost trying to solve interesting research problems around translating between provers or between formal and informal math. I'm imagining something much simpler.
(Like OEIS, in the long term it might actually work better in the reverse direction: There is a central service which essentially provides unique IDs, and others - in this case theorem prover libraries - reference these IDs.)

Abhishek Chugh (Sep 30 2020 at 18:40):

Mario Carneiro said:

Abhishek Chugh said:

Is it different from the regular language parsers (some complex stages etc.) or is it that the parsed data is going to be very hard to make sense of?

The lean parser is very extensible (some might say too extensible), and lean 4 will be even more so. You can't parse lean code without also interpreting lean code, which requires that you also know the whole environment, have a VM implementation, and a typechecker - that is, you must be lean

Thanks, Mario. I will start digging into the parser/API/LeanCrawler to understand this.

Mario Carneiro (Sep 30 2020 at 18:41):

import tactic.norm_num

example : 6*9+7=61 := by norm_num
example : 6*9+7=61 := rfl

Mario Carneiro (Sep 30 2020 at 18:41):

For taking derivatives, I think there is a tactic but I haven't been involved in that area. @Yury G. Kudryashov might know

Abhishek Chugh (Sep 30 2020 at 18:43):

That's pretty convenient!

Mario Carneiro (Sep 30 2020 at 18:44):

@Sebastian Reichelt This sounds a lot like what Michael Kohlhase's work on OMDoc and such. I think they want URIs for everything, dunno how not-vaporware it all is though

Bryan Gin-ge Chen (Sep 30 2020 at 18:45):

simp is able to do a bunch of derivatives:

import tactic
import analysis.special_functions.trigonometric
open real
#simp [exp_ne_zero] : (λ x, deriv (λ x, (sin x) / (exp x)) x)
-- λ (x : ℝ), (cos x * exp x - sin x * exp x) / exp x ^ 2

Abhishek Chugh (Sep 30 2020 at 18:49):

Sebastian Reichelt said:

This might be too much of a tangent, but one thing I would really like to see is a service that properly connects different libraries and also informal references. Like OEIS but with a much broader scope, or https://www.cs.ru.nl/~freek/100/ extended to all definitions and theorems. I know there are some projects with this goal, but they tend to get lost trying to solve interesting research problems around translating between provers or between formal and informal math. I'm imagining something much simpler.
(Like OEIS, in the long term it might actually work better in the reverse direction: There is a central service which essentially provides unique IDs, and others - in this case theorem prover libraries - reference these IDs.)

I have thought about this for some time and I came to very similar conclusions as yours. Mathematicians are quite interested in finding the underlying translation mechanisms where theories can be transformed (semi/fully) automatically. But they seem to forget that having simple relationships - even if not perfect - at first would be pretty useful. We have a system that can do exactly that - we have unique IDs for each term/propositions/proofs across various datasets/systems.

Abhishek Chugh (Sep 30 2020 at 18:52):

We have a lot of pretty scalable foundations setup - using technologies that power things like Instagram and stuff. But I am finding it pretty hard to convince people to realize that a well-engineered software platform for Mathematics is worth investing into :neutral:

Abhishek Chugh (Sep 30 2020 at 18:55):

Even though math researchers themselves have published reports saying that there is a need for it (https://arxiv.org/pdf/1404.1905.pdf), the incentive structures in universities make it pretty hard

Abhishek Chugh (Sep 30 2020 at 18:58):

Bryan Gin-ge Chen said:

simp is able to do a bunch of derivatives:

import tactic
import analysis.special_functions.trigonometric
open real
#simp [exp_ne_zero] : (λ x, deriv (λ x, (sin x) / (exp x)) x)
-- λ (x : ℝ), (cos x * exp x - sin x * exp x) / exp x ^ 2

Lean makes this so simple. Getting this in metamath needed so many lines of code.

Mario Carneiro (Sep 30 2020 at 19:06):

that's because the lean library includes tactics (proof producing programs) and annotations to help those programs work effectively, unlike metamath, which only collects a library of formal theorems and leaves automation to the user

Abhishek Chugh (Sep 30 2020 at 19:07):

MM1 is gonna solve that :)

Mario Carneiro (Sep 30 2020 at 19:07):

I have done some minor work to add annotations to set.mm to help mmj2 in some respects, but there is essentially no support for this sort of thing out of the box. MM1 is indeed trying to tackle this problem head on, with a lean inspired tactic framework on a metamath like foundation

Last updated: May 02 2025 at 03:31 UTC