Zulip Chat Archive

Stream: Machine Learning for Theorem Proving

Topic: Lean client for Python

Joe Palermo (S2'17) (Mar 29 2021 at 20:17):

@Jason Rute Thanks for your detailed commentary on my queries here: https://leanprover.zulipchat.com/#narrow/stream/113489-new-members/topic/Inferring.20Theorem.20from.20a.20Proof.20Term.3F

I'd love to try some simple experiments in generating theorems by leveraging the Lean Python client you referenced (https://github.com/leanprover-community/lean-client-python).

Can you please offer some advice on how to set it up? I would imagine one has to startup a Lean server (just run "lean --server"?), then hook the Lean client up to it somehow before one can start sending requests/responses back and forth.

Also I'm happy to help make improvements if I can. Cheers!

Jason Rute (Mar 30 2021 at 12:57):

@Joe Palermo (S2'17) The documentation and examples could surely be improved, but the main idea is this:

Install the trio Lean-Python server as specified using pip (if you like virtual environments, then by all mean use those).
Start with the trio_example.py script. It should run as long as you have a global Lean installation (check by running lean --version from wherever you plan to run the script). You don't need to start the Lean server ahead of time. The script does that for you.
If you are not familiar with async programming in Python, the basic rules are as follows: (1) Every function which is defined with an async needs to be called with an await in front. Every function which has an await in it has to be defined with an async. (This is a simplification, but probably enough to get you started.)
Once you get familiar with the example, then set up a Lean project directory. This way, you have full control of your Lean setup.
Make your own python scripts and play around.
Note that the .lean file you put into full_sync doesn't have to exist. You can add in an optional content keyword, and put in the contents of a nonexistent file. For example: await server.full_sync('dummy.lean', content="#eval 1+1\n#eval 2+2")
To learn more about the interface, look at the commands.py. Although, for most needs all you need is the info command.

Joe Palermo (S2'17) (Mar 30 2021 at 21:20):

Thanks, I'll play with it and see how it goes

Jason Rute (Mar 30 2021 at 23:25):

Don't hesitate to reach out if you have any questions. Also, if you are still having trouble, I could probably come up with a MWE for your use case after I ask you a few more questions.

Joe Palermo (S2'17) (Mar 31 2021 at 19:28):

Thanks Jason - I really appreciate it.

After setting up a conda env and installing with setup.py, I ran python trio_example.py from lean-client-python/examples. I get this exception:

```(lean-client-python) ~/workspace/math/lean/lean-client-python/examples(master):0 python trio_example.py
Traceback (most recent call last):
File "trio_example.py", line 28, in <module>
trio.run(main)
File "/Users/jpalermo/.conda/envs/lean-client-python/lib/python3.7/site-packages/trio/_core/_run.py", line 1932, in run
raise runner.main_task_outcome.error
File "trio_example.py", line 25, in main
nursery.cancel_scope.cancel()
File "/Users/jpalermo/.conda/envs/lean-client-python/lib/python3.7/site-packages/trio/_core/_run.py", line 815, in __aexit__
raise combined_error_from_nursery
File "/Users/jpalermo/workspace/math/lean/lean-client-python/src/lean_client/trio_server.py", line 73, in receiver
resp = parse_response(line.decode())
File "/Users/jpalermo/workspace/math/lean/lean-client-python/src/lean_client/commands.py", line 379, in parse_response
return InfoResponse.from_dict(dic)
File "/Users/jpalermo/workspace/math/lean/lean-client-python/src/lean_client/commands.py", line 199, in from_dict
dic['record'] = InfoRecord.from_dict(dic.pop('record'))
File "/Users/jpalermo/workspace/math/lean/lean-client-python/src/lean_client/commands.py", line 190, in from_dict
return cls(**dic)
TypeError: __init__() got an unexpected keyword argument 'widget'

Julian Berman (Mar 31 2021 at 19:36):

I don't think master works, last I recall. There are PRs I think fixing it, one from me and one more extensive one I think from Jason himself.

Joe Palermo (S2'17) (Mar 31 2021 at 19:37):

Oh I see. Do you know which branch I should try?

Julian Berman (Mar 31 2021 at 19:37):

One second, let me remind myself.

Julian Berman (Mar 31 2021 at 19:39):

so https://github.com/leanprover-community/lean-client-python/pull/18 I think works, which was mine. Maybe I should reopen that considering the more extensive refactor wasn't merged yet, but Jason I presume will chime in with what he plans.

Julian Berman (Mar 31 2021 at 19:40):

But that branch I believe will run, let me confirm...

Julian Berman (Mar 31 2021 at 19:42):

Yeah. That branch runs.

Joe Palermo (S2'17) (Mar 31 2021 at 19:43):

Great! Thanks

Joe Palermo (S2'17) (Mar 31 2021 at 20:49):

I've confirmed it runs.

@Jason Rute I'm curious to know which branch you'd suggest working from going forward

Joe Palermo (S2'17) (Mar 31 2021 at 21:11):

If execution terminates before the end of the file and the final "State After" comes back empty I'm guessing that means there was an error?

For example I tried running this through it:

import data.nat.factorial

def wrapper {α : Sort*} (a : α) : true := trivial

def ex : true :=
begin
  let x : _ := by {apply nat.factorial_pos, exact 7, },
  exact wrapper x,
end

#print ex

Joe Palermo (S2'17) (Mar 31 2021 at 21:14):

it returns

Line 6: begin
State before:
⊢ true

State after:
⊢ true

Line 7:   let x : _ := by {apply nat.factorial_pos, exact 7, },
State before:
⊢ true

State after:


Line 9: end
State before:
⊢ true

State after:

Joe Palermo (S2'17) (Mar 31 2021 at 21:24):

I wonder if perhaps the import is not working correctly

Joe Palermo (S2'17) (Mar 31 2021 at 21:33):

Hmm actually I tried something simpler without imports and also find a similar result

lemma l2 {P : Prop}:
(P ↔ true) → P :=
λ H, iff.dcases_on H (λ H₁ H₂, H₂ trivial)

yields the following logs:

Line 3: λ H, iff.dcases_on H (λ H₁ H₂, H₂ trivial)
State before:
P : Prop
⊢ (P ↔ true) → P

State after:

Bryan Gin-ge Chen (Mar 31 2021 at 21:39):

I haven't played with the Python server, but I think this is expected. If the proof is complete, Lean doesn't return any tactic state info.

Jason Rute (Mar 31 2021 at 23:08):

Julian Berman said:

but Jason I presume will chime in with what he plans.

Ok, I really need to look at where this project is and what it needs to be a working tool (with enough examples and documentation for someone to get started). I'll take a look tonight at where things are. In the meantime @Joe Palermo (S2'17) , if you found a branch that works for you, then use that for now. Also, @Julian Berman, I'm sorry that I scared you away from the project when you expressed interest a few months ago. I'd love to have you help out.

Julian Berman (Mar 31 2021 at 23:12):

@Jason Rute as someone who maintains projects with 20+ stale pull requests I let sit, you have 0 to apologize for, don't worry.

Julian Berman (Mar 31 2021 at 23:13):

I have some evil ideas that may make that Python client way easier to work with

Julian Berman (Mar 31 2021 at 23:13):

Which I can elaborate on at some point later

Julian Berman (Mar 31 2021 at 23:15):

I'd probably recommend starting by looking at the PR I sent that adds CI

Julian Berman (Mar 31 2021 at 23:15):

That at least will help us test if we make changes.

Julian Berman (Mar 31 2021 at 23:16):

(even though it will rightfully fail now on master)

Joe Palermo (S2'17) (Apr 01 2021 at 14:37):

Bryan Gin-ge Chen said:

I haven't played with the Python server, but I think this is expected. If the proof is complete, Lean doesn't return any tactic state info.

Thanks - that makes sense for my second example. But for my first example, it never runs line 8 (exact wrapper x). Perhaps that's not unexpected, but if so I'm not sure why.

Joe Palermo (S2'17) (Apr 01 2021 at 14:38):

@Jason Rute Seems that I don't get back anything if I run:

#check λ {P : Prop} (H : iff P true),
  @iff.dcases_on.{0} P true (λ (_x : iff P true), P) H (λ (H₁ : P → true) (H₂ : true → P), H₂ trivial)

This makes sense as I suppose the client is meant to serve as an interface to the Lean ITP, and there's not any tactic state to report here. What would it take to get this sort of info back (e.g. check, print, etc...)?

Bryan Gin-ge Chen (Apr 01 2021 at 14:43):

Joe Palermo (S2'17) said:

But for my first example, it never runs line 8 (exact wrapper x). Perhaps that's not unexpected, but if so I'm not sure why.

Lean doesn't return any output, but it does run line 8. Otherwise the proof would not be complete. (Maybe I'm misunderstanding you?)

Joe Palermo (S2'17) (Apr 01 2021 at 14:51):

Hmmm in the logs I see Line 7: let x : _ := by {apply nat.factorial_pos, exact 7, },. The part on line 8 (exact wrapper x) is not shown but perhaps it's running that part on line 8 without showing it?

Julian Berman (Apr 01 2021 at 14:56):

@Joe Palermo (S2'17) it may help to show a runnable full example perhaps if it's not too much trouble

Julian Berman (Apr 01 2021 at 14:56):

I'm sure from what you shared it's not hard to assemble one so ignore me if needed :) but yeah if you'd like to share some python code that inlines or runs that lean snippet I'm happy to try to run it as I suspect others may be

Joe Palermo (S2'17) (Apr 01 2021 at 15:02):

@Julian Berman Thanks! Sorry, I'm a little confused which thing you're referring to.

1) My questions about when Lean Python client returns empty tactic state?
2) My attempt to run the #check?

In both cases, I've been running it by pasting the code in a .lean file and then running trio_example.py such that the .lean file gets read in. In fact I didn't make any changes to trio_example.py, I just modified the file content it reads in.

Julian Berman (Apr 01 2021 at 15:05):

ah good! OK yeah that was all I needed (that you were just pasting that in a file and running trio_example.)

Julian Berman (Apr 01 2021 at 15:05):

Let me try.

Julian Berman (Apr 01 2021 at 15:14):

So just on the #check question, you can get the output by doing e.g. print([message.text for message in server.messages])

Julian Berman (Apr 01 2021 at 15:15):

if you want to know how I figured that out, all I did was some print debugging, I put print(data) as the first line in lean_client.commands.parse_result which shows me all the raw responses from the server -- there's probably a better way to do that (get debug info on all messages) but I was lazy

Julian Berman (Apr 01 2021 at 15:16):

let me know if that helps, hopefully it's what you were asking about but if you need any help modifying trio_example.py to do what you're specifically after obviously ask

Julian Berman (Apr 01 2021 at 15:16):

(On the first question almost certainly @Bryan Gin-ge Chen is more in the know)

Bryan Gin-ge Chen (Apr 01 2021 at 15:25):

I still haven't looked at Python server code, but I'm puzzled as to why Line 8 just doesn't show up at all whereas both Line 7 and Line 9 do. Is the script somehow grouping lines 7 and 8 together?

Julian Berman (Apr 01 2021 at 15:28):

So the script only emits output on changes to the state

Bryan Gin-ge Chen (Apr 01 2021 at 15:33):

I see, and I'm guessing it only considers the goal, not any of the context.

Julian Berman (Apr 01 2021 at 15:52):

@Joe Palermo (S2'17) maybe something like this helps you see some possible modifications you're after:

#!/usr/bin/env python
from pathlib import Path
from textwrap import indent
import sys

from lean_client.trio_server import TrioLeanServer
import trio

filename = sys.argv[1]
path = Path(filename)

async def main():
    async with trio.open_nursery() as nursery:
        server = TrioLeanServer(nursery, debug=False)
        await server.start()
        await server.full_sync(filename)

        with path.open() as file:
            for i, line in enumerate(file):
                line = line[:-1]

                before = await server.state(filename, i+1, 0)
                after = await server.state(filename, i+1, len(line))
                if before or after:
                    print(f'Line {i+1}: {line}')
                    print(f'State before:\n{before}\n')
                    print(f'State after:\n{after}\n')

        if server.messages:
            print(
                "Messages:\n\n" + indent(
                    "\n".join(message.text for message in server.messages),
                    prefix=" " * 4,
                ),
            ),

        server.kill()
        nursery.cancel_scope.cancel()

if __name__ == '__main__':
    trio.run(main)

Julian Berman (Apr 01 2021 at 15:52):

put that as a .py file and run python that.py somefile.lean

Julian Berman (Apr 01 2021 at 15:53):

I'm still on a call so hopefully I haven't done something truly dumb there, but when I run that on your example above, I get...

⊙  ../venv/bin/python trio_example.py ~/Development/mathlib/foo.lean                                                                                                              julian@Airm ●
Line 6: begin
State before:
⊢ true

State after:
⊢ true

Line 7:   let x : _ := by {apply nat.factorial_pos, exact 7, },
State before:
⊢ true

State after:


Line 9: end
State before:
⊢ true

State after:


Messages:

    file 'data/nat/factorial' not found in the search path
    Use 'lean --path' to see where lean is looking, or https://leanprover-community.github.io/file-not-found.html for more
    invalid import: data.nat.factorial
    could not resolve import: data.nat.factorial
    unknown identifier 'nat.factorial_pos'
    state:
    ⊢ ?m_1
    state:
    2 goals
    ⊢ true

    ⊢ ?m_1
    def ex : true :=
    ⁇

Julian Berman (Apr 01 2021 at 15:53):

which you'll see is broken but that's for fixable reasons (that now I need to tell lean and/or my python file where to find mathlib)

Julian Berman (Apr 01 2021 at 15:56):

When I run it from a place that lean can find mathlib, now I get...

Line 6: begin
State before:
⊢ true

State after:
⊢ true

Line 7:   let x : _ := by {apply nat.factorial_pos, exact 7, },
State before:
⊢ true

State after:
x : 0 < 7.factorial := 7.factorial_pos
⊢ true

Line 8:   exact wrapper x,
State before:
x : 0 < 7.factorial := 7.factorial_pos
⊢ true

State after:
no goals

Line 9: end
State before:
⊢ true

State after:


Messages:

    def ex : true :=
    let x : 0 < 7.factorial := 7.factorial_pos in wrapper x

which hopefully is what you're expecting

Jason Rute (Apr 01 2021 at 16:01):

Some general notes about the Lean server and python client:

The lean server gives back the same information you find in vs code (and emacs), so if it is available there, it is available from the lean server and if it is not available there, it is probably not available.
The state method in that example, is just a light-weight example of what you can get from the Lean server. Actually, I think we should remove state and just encourage folks to use the actual API. More generally, much of what you need is from running an Info command as follows:

resp = await server.send(
    InfoRequest(file_name='test.lean', line=100, column=0)
)
print(resp)

Indeed if you look at what state does, it is just that, but it only returns one small portion (the tactic state) of the Info response

You should also look at the messages in server.messages after syncing or running an info request to see what is happening.
I would encourage you to try running an info request on every character in your lean file to see how the output changes. (Actually, I one did this for all of mathlib once. See here: https://github.com/jasonrute/annotated_lean )
As I think about it, you may want to try out the branch in this PR, since it has all the changes I intend to make to the interface. The current hold up is that I still need to address the qt server. But if you understand what is going on under the hood, it shouldn't matter too much which branch you are on.

Jason Rute (Apr 01 2021 at 16:08):

Julian Berman said:

if you want to know how I figured that out, all I did was some print debugging, I put print(data) as the first line in lean_client.commands.parse_result which shows me all the raw responses from the server -- there's probably a better way to do that (get debug info on all messages) but I was lazy

When you create the trio lean server there are optional flags, debug and debug_bytes, in the constructor that would print all the communication. This is useful for debugging.

Jason Rute (Apr 01 2021 at 16:16):

Also, if you use the online Lean web editor and click on the :question: icon, you can select "Log server messages to console". Then by using your web browser to inspect the console, you can see all the traffic. I've found this useful in the past to see what is going on when an editor communicates with the server. You may find it helpful.

Bryan Gin-ge Chen (Apr 01 2021 at 16:18):

We should make it easier to do this in the VS Code extension as well. It's currently possible by fooling around a bit with the debugger, but it's a bit of a pain.

Jason Rute (Apr 01 2021 at 16:44):

By the way, I just checked the traffic of the web editor to be sure, and the only place where the output of #check, #eval, etc. is shown is in the messages. You won't be able to get it from the info command for example.

Joe Palermo (S2'17) (Apr 01 2021 at 17:07):

Julian Berman said:

which you'll see is broken but that's for fixable reasons (that now I need to tell lean and/or my python file where to find mathlib)

Cool! How can you tell lean where to find mathlib?

Julian Berman (Apr 01 2021 at 17:10):

So there's a couple of ways to do that in Lean. The main way is to run it from a project, and one that has a leanpkg.path file, so I ran it from my mathlib checkout folder which has one (that says "mathlib's right here")

Julian Berman (Apr 01 2021 at 17:10):

so when I ultimately ran what I shared with you, I did ~/Development/lean-client-python/venv/bin/python ~/Development/lean-client-python/examples/trio_example.py ~/Development/mathlib/foo.lean from sitting in my mathlib working directory

Julian Berman (Apr 01 2021 at 17:11):

there are other ways you can use too if that's inconvenient somehow, and hopefully it's clear from what Jason is sharing that you'll likely want/need to do tweaking for these kinds of things, but do keep asking anywhere you run into trouble

Julian Berman (Apr 01 2021 at 17:16):

Does that help?

Joe Palermo (S2'17) (Apr 01 2021 at 17:18):

Great, thanks. Yeah I'll tweak it as I go.

One thing I think I'd like to do next is to iterate through mathlib and generate a fully elaborated proof term from every theorem/lemma by running #print. Any tips on how I might implement that?

I imagine that I could make a copy of mathlib and programmatically insert #print statements at the bottom of every file. Then I could modify the Lean Python client to iterate through mathlib (loading each file in turn) and extract only the output of the print statements. This seems very hacky though.

Julian Berman (Apr 01 2021 at 18:53):

I suspect for real insight now you need folks who've done similar things, whereas I'm just good enough to poke simply at tooling so far.

Julian Berman (Apr 01 2021 at 18:54):

What you said seems like it's a thing -- probably you don't even need to physically modify the files you could just walk the tree and have the python script be instrumenting those files as you go, but whether there's a more efficient way is yeah beyond what I know (I do know you can probably do lots of that extraction of declarations more efficiently in lean, but you may be like me and more comfortable outside lean than in it so far)

Joe Palermo (S2'17) (Apr 01 2021 at 19:47):

Yeah I figure there must be a better way to do it within lean, but likewise I'm not yet up to speed on programming in lean.

Jason Rute (Apr 01 2021 at 22:20):

Joe Palermo (S2'17) said:

One thing I think I'd like to do next is to iterate through mathlib and generate a fully elaborated proof term from every theorem/lemma by running #print. Any tips on how I might implement that?

I imagine that I could make a copy of mathlib and programmatically insert #print statements at the bottom of every file. Then I could modify the Lean Python client to iterate through mathlib (loading each file in turn) and extract only the output of the print statements. This seems very hacky though.

I agree this is hacky and probably not the best approach, but let me first tell you how to do it with the Lean server and then how to do it with Lean. With the Lean server, you don't have to modify the actual lean file, you just have to call sync with the file contents, so it is easy to run a modified lean file in the Lean server. And as long as your .olean files are up-to-date, then when you run a modified file, you are only checking that file and not recompiling any of its dependencies. This is very similar to what I did in this project, except I did that project before the Lean python server code and I didn't do truly async. In particular, I ran into trouble with incomplete responses from the server, that shouldn't be an issue if one uses a proper async framework like the current client.

Jason Rute (Apr 01 2021 at 22:20):

I did find something like a memory leak where if I keep syncing files it got slower and it was better to restart the server for each lean file. That might still be true, or maybe there is a way to unsync a file. (@Bryan Gin-ge Chen?)

Jason Rute (Apr 01 2021 at 22:21):

Also, to print the fully elaborated terms, you can put set_option pp.all true in the Lean code or just call the server with the command line flag -D pp.all=true. There is a way to specify this in the constructor for the Lean trio server.

Jason Rute (Apr 01 2021 at 22:22):

If you did this, my guess is that it would take 24 hours on a good machine to run.

Jason Rute (Apr 01 2021 at 22:23):

Actually, maybe that is too much, but it would be at least several hours.

Jason Rute (Apr 01 2021 at 22:24):

As for finding the theorem names to print, that would be a parsing challenge. It might not be too bad, but I could see you running into weird edge cases.

Jason Rute (Apr 01 2021 at 22:29):

Having said all this, I think it would be better to write a small Lean metaprogram and run that. It wouldn't be very long and many folks have done it already. See the examples I posted here in #general > ast and #general > Lean code into AST/XML/JSON for parsing/generation for ML? (Someone really needs to write a tutorial for this stuff, since it seems to be a common request.)

Jason Rute (Apr 01 2021 at 22:32):

The data you are looking for (term proofs) are stored very nicely in Lean's environment and there is a tool already for looping over it. The only thing you need to do is filter the information you want and print it.

Jason Rute (Apr 01 2021 at 22:43):

Actually, to make it easier to find, here are two snippets and a repo which do this sort of thing:

Get all theorem names (but easy to modify to get other stuff): https://leanprover.zulipchat.com/#narrow/stream/219941-Machine-Learning.20for.20Theorem.20Proving/topic/ML.20for.20Lean.3A.20How.20to.20do.20it.3F/near/186859952
Get name and type of all declarations (theorems, definitions, etc) in the current file (but can be relaxed to get anything in the environment). https://leanprover.zulipchat.com/#narrow/stream/113489-new-members/topic/printing.20lemma.20statements/near/213518206
A repo for getting all the term proofs in mathlib and processing them for the LeanStep dataset and GPT-f prover: https://github.com/jesse-michael-han/lean-step-public

Jason Rute (Apr 01 2021 at 22:47):

Also, if you want to put all of mathlib into the environment, there is a script in the mathlib directory which creates an all.lean file which can be imported as import all, but note that looping over all declarations in mathlib will probably take a long time (hours I think), so it is best to test with just base lean to start.

Bryan Gin-ge Chen (Apr 01 2021 at 23:00):

@Jason Rute I don't think it's possible to unload files from the server without restarting it, unfortunately.

Joe Palermo (S2'17) (Apr 02 2021 at 20:48):

@Jason Rute Thanks again for the detailed tips! I'll let you know how it goes.

Joe Palermo (S2'17) (Apr 05 2021 at 16:50):

@Jason Rute Are there different meanings to "fully elaborated proof term" within Lean?

I've been looking into the code for the PACT data pipeline (https://github.com/jesse-michael-han/lean-step-public) and ran it this morning. I was looking at the data extracted for the proof term elaboration task and I pasted some of the supposedly fully elaborated terms into the Lean Web Editor, but I found that they were actually missing type information. For example:

{"proof_term": "list.nil.bind f = list.nil", "elab_proof_term": "(@list.nil \u03b1).bind f = @list.nil \u03b2"}

https://leanprover.github.io/live/latest/#code=set_option%20pp.all%20true%0A%0A#check%20(@list.nil%20%CE%B1).bind%20f%20=%20@list.nil%20%CE%B2%0A%0A

I thought fully elaborated proof terms could always stand alone and type check without any other context, so I'm a bit confused what it means to say that the "elab_proof_term" above is fully elaborated.

Jesse Michael Han (Apr 05 2021 at 16:56):

"fully elaborated" is kind of a misnomer, it just means printed extremely verbosely

Joe Palermo (S2'17) (Apr 05 2021 at 17:34):

I see. So these fully elaborated terms (as in the example I shared) only type check in a specific context (i.e. tactic state)?

Mario Carneiro (Apr 05 2021 at 18:10):

Actually they don't necessarily even typecheck in the intended context, due to issues with pp.all printing

Mario Carneiro (Apr 05 2021 at 18:10):

From that example, it looks like dot-notation is being used, which shouldn't be the case with pp.all

Mario Carneiro (Apr 05 2021 at 18:11):

there are additional implicit args in list.bind that are being hidden by the use of dot notation

Jesse Michael Han (Apr 05 2021 at 18:33):

yeah as Mario observed, lean-step does not use pp.all, rather, it uses a slightly less verbose representation, comprising a certain set of pretty-printer options which have been optimized for round-tripping proof terms

Joe Palermo (S2'17) (Apr 05 2021 at 18:55):

Interesting. What does it mean to "round-trip" a proof term?

Joe Palermo (S2'17) (Apr 05 2021 at 18:58):

Also, if I set set_option pp.all true, then would any generated "elab_proof_term" type check without any context?

Jesse Michael Han (Apr 05 2021 at 19:18):

Joe Palermo (S2'17) said:

Interesting. What does it mean to "round-trip" a proof term?

means that Lean will accept it if you paste it as a proof; pp.all proof terms will often not round-trip

Jesse Michael Han (Apr 05 2021 at 19:19):

Joe Palermo (S2'17) said:

Also, if I set set_option pp.all true, then would any generated "elab_proof_term" type check without any context?

some of the top-level ones will, but strictly fewer than if you used the options i hardcoded

Joe Palermo (S2'17) (Apr 05 2021 at 20:32):

@Jason Rute Not sure what you meant by this: "you don't have to modify the actual lean file, you just have to call sync with the file contents, so it is easy to run a modified lean file in the Lean server. " Do you mean that I should be calling the 'sync' function in qt_server.py? Will this automatically print out all the proof terms constructed in the file?

Also. I was looking at some of the other code snippets you linked. I believe @Jesse Michael Han shared this one.

How can I run this? Do I need some imports?

meta def process_thm (d : declaration) : option declaration :=
let n := d.to_name in
  if ¬ d.is_trusted ∨ n.is_internal then none
  else match d with
       | declaration.defn _ _ _ _ _ _ := none
       | t@(declaration.thm n ns e te) := some t
       | declaration.cnst _ _ _ _ := none
       | declaration.ax _ _ _ := none
       end

meta def library_thms : tactic $ list declaration :=
  environment.decl_filter_map <$> get_env <*> return process_thm

meta def list_all_theorems : tactic unit :=
  do library_thms >>= λ x, tactic.trace (declaration.to_name <$> x)

run_cmd list_all_theorems

Thanks :)

Jesse Michael Han (Apr 05 2021 at 20:41):

environment.decl_filter_map is defined in meta.expr (you can see this if you grep your local copy of mathlib)

the fully qualified version of get_env is tactic.get_env, so you just need to prefix the above snippet with

import meta.expr
open tactic

Joe Palermo (S2'17) (Apr 05 2021 at 20:43):

Thanks!

Jason Rute (Apr 06 2021 at 04:33):

Joe Palermo (S2'17) said:

Jason Rute Not sure what you meant by this: "you don't have to modify the actual lean file, you just have to call sync with the file contents, so it is easy to run a modified lean file in the Lean server. " Do you mean that I should be calling the 'sync' function in qt_server.py? Will this automatically print out all the proof terms constructed in the file?

Imagine you want to run all the mathlib lean files, but adding to the end some custom eval statement. You could first run a script to loop over all files in _target/deps/mathlib/src directory (I'm assuming you know how that all works) modifying each file, recompile mathlib, and finally run the lean server as follows (forgive the metacode):

for file in mathlib_lean_files:
  start_lean_server
  sync(file)
  read_server_messages
  kill_lean_server

However, if you are not making changes to the file that effect downstream files, there is an easier way. Without modifying any files you can do the following:

for file in mathlib_lean_files:
  start_lean_server
  file_text = read(file)
  modified_file_text = ...  # add your modification here
  sync(file, contents=modified_file_text)
  read_server_messages
  kill_lean_server

Jason Rute (Apr 06 2021 at 05:22):

As for the broader question about what counts as a fully elaborated proof term, I think there are two considerations:

How exactly does one pretty print the term, but do so in a way that provides the best chance of it being parsable back into Lean? (I know pp.all has issues, which have been addressed in the upcoming Lean 4. I wasn't aware that Jesse was using special settings here, but it isn't surprising.)
What extra information to provide? If you think about your example, (@list.nil α).bind f = @list.nil β should never pass #check even if I added extra type information like (@list.nil (α : Type)).bind (f : α → β) = @list.nil (β : Type). The issue isn't missing types. The issue is that Lean doesn't know what \alpha, \beta, and f are and no amount of missing type info will fix that?

More on issue (2). First, it would be good to understand where this dataset is coming from. @Jesse Michael Han can correct me if I'm mistaken, but we take a full Lean proof term and then walk along the tree until we get to certain subtrees. Those subterms are what you are looking at. Now, the example you gave is a bit confusing since it isn't a proof. (In Lean, a proof has a type p which is a Prop. Here, this "proof" has type Prop so it isn't technically proof, but is another term which appeared in a proof. It doesn't make much since we are working in dependent type theory, but it is conceptually confusing to answer your question.) So let's consider the example from my talk:

theorem of_iff_true : ∀ {a : Prop}, (a ↔ true) → a :=
λ {a : Prop} (h : a ↔ true), iff.mp (iff.symm h) trivial

And here is a more elaborated version (using pp.all):

theorem of_iff_true : ∀ {a : Prop}, iff a true → a :=
λ {a : Prop} (h : iff a true), @iff.mp true a (@iff.symm a true h) trivial

If one checks the theorem statement in Lean it will parse, and if one checks the proof, it will also parse and correctly type check into the correct theorem. (However, as we talked about before, this need not always be the case.) However, in this dataset, we also have subterms which are not proofs of a self contained proposition, but of a proposition within a local context (i.e. a tactic state). The subterm (@iff.symm a true h) is a proof of the goal

a : Prop,
h : iff a true
⊢ iff true a

So this is a long winded way of saying, yes, these terms do require the local context to parse and type check. Also, just in case you are not aware of this, you can find this tactic state by using an _ as follows:

example : ∀ {a : Prop}, iff a true → a :=
λ {a : Prop} (h : iff a true), @iff.mp true a _ trivial

Joe Palermo (S2'17) (Apr 06 2021 at 15:14):

@Jason Rute @Julian Berman I'm having trouble getting it to do what I want with a single file. I figure I better debug it for one file before scaling to all of mathlib. Here's the script I'm running:

from pathlib import Path
from textwrap import indent
from lean_client.trio_server import TrioLeanServer
import trio

filepath = 'mathlib/src/algebra/algebra/basic.lean'
path = Path(filepath)

async def main():

    async with trio.open_nursery() as nursery:

        # start lean server
        server = TrioLeanServer(nursery, debug=False)
        await server.start()

        # read text file and modify it
        with open(filepath) as f:
            file_text = f.read()
        modified_file_text = f'set_option pp.all true\n {file_text}\n #print ring_hom.algebra_map_to_algebra'

        # sync
        await server.full_sync(filepath, content=modified_file_text)

        # read server messages
        if server.messages:
            print(
                "Messages:\n\n" + indent(
                    "\n".join(message.text for message in server.messages),
                    prefix=" " * 4,
                ),
            ),

        # kill lean server
        server.kill()
        nursery.cancel_scope.cancel()

if __name__ == '__main__':
    trio.run(main)

The output contains a lot of errors messsages:

file 'tactic/nth_rewrite' not found in the search path
    Use 'lean --path' to see where lean is looking, or https://leanprover-community.github.io/file-not-found.html for more
    file 'data/matrix/basic' not found in the search path
    Use 'lean --path' to see where lean is looking, or https://leanprover-community.github.io/file-not-found.html for more
    file 'data/equiv/ring_aut' not found in the search path
    Use 'lean --path' to see where lean is looking, or https://leanprover-community.github.io/file-not-found.html for more
    file 'linear_algebra/tensor_product' not found in the search path
    Use 'lean --path' to see where lean is looking, or https://leanprover-community.github.io/file-not-found.html for more
    file 'ring_theory/subring' not found in the search path
    Use 'lean --path' to see where lean is looking, or https://leanprover-community.github.io/file-not-found.html for more
    file 'deprecated/subring' not found in the search path
    Use 'lean --path' to see where lean is looking, or https://leanprover-community.github.io/file-not-found.html for more
    file 'algebra/opposites' not found in the search path
    Use 'lean --path' to see where lean is looking, or https://leanprover-community.github.io/file-not-found.html for more
    invalid import: tactic.nth_rewrite
    could not resolve import: tactic.nth_rewrite
    invalid import: data.matrix.basic
    could not resolve import: data.matrix.basic
    invalid import: data.equiv.ring_aut
    could not resolve import: data.equiv.ring_aut
    invalid import: linear_algebra.tensor_product
    could not resolve import: linear_algebra.tensor_product
    invalid import: ring_theory.subring
    could not resolve import: ring_theory.subring
    invalid import: deprecated.subring
    could not resolve import: deprecated.subring
    invalid import: algebra.opposites
    could not resolve import: algebra.opposites
    unknown attribute [nolint]
    unknown identifier 'comm_semiring'
    unknown identifier 'semiring'
    unknown identifier 'has_scalar'
    invalid 'structure', expression must be a 'parent' structure
    unknown identifier 'comm_semiring'
    unknown identifier 'semiring'
    unknown identifier 'algebra'
    ...

Clearly it's not finding the dependencies it needs. Am I missing a step?

Julian Berman (Apr 06 2021 at 15:47):

I think you're 1 directory too high

Julian Berman (Apr 06 2021 at 15:48):

Your path there is 'mathlib/src/algebra/algebra/basic.lean'

Julian Berman (Apr 06 2021 at 15:48):

so you're in some directory that contains the mathlib directory

Julian Berman (Apr 06 2021 at 15:48):

But you need to be alongside the mathlib leanpkg.path file for it to find all that stuff

Julian Berman (Apr 06 2021 at 15:53):

also my mental lean compiler is not good but I think you're going to need to put that set_option at the end (just before running #print I think should be fine?)

Julian Berman (Apr 06 2021 at 15:53):

otherwise I think Lean is going to complain when it sees an import that it's not at the top of a file

Joe Palermo (S2'17) (Apr 06 2021 at 16:17):

I don't see anything called leanpkg.path in mathlib. Are you referring to this? https://github.com/leanprover/lean/blob/master/leanpkg/leanpkg.path

Joe Palermo (S2'17) (Apr 06 2021 at 16:26):

I tried putting my script into the mathlib directory, and changing the filepath accordingly ("src/algebra/...") but I seem to get the same result.

Jason Rute (Apr 06 2021 at 16:27):

Are you familiar with setting up a lean project with leanpkg.toml and leanpkg.path files?

Jason Rute (Apr 06 2021 at 16:28):

If not, I’d look at the leanproject tool.

Joe Palermo (S2'17) (Apr 06 2021 at 16:29):

no, unfortunately not. Ok will do. Thanks!

Jason Rute (Apr 06 2021 at 16:32):

Or maybe you are running this from within a mathlib repo clone. That is fine too, but then I think you will have to be running your script from the right directory, which I don’t remember. I don’t work in mathlib directly much.

Jason Rute (Apr 06 2021 at 16:33):

You can test if Lean can find the file by just running lean path_to_lean_file. If that works, I think you can run your script from there.

Jason Rute (Apr 06 2021 at 16:35):

Also, after that test it adding one piece at a time, e.g. run the server in the unmodified file. Then add set option. Then you custom print.

Jason Rute (Apr 06 2021 at 16:39):

Try putting your script in src. That seems to be where it is looking, but I could be mistaken.

Jason Rute (Apr 06 2021 at 16:43):

(But when I do this stuff, I use leanproject new to make a custom project when adds mathlib files to a sub directory _target and adds that path to a file called leanpkg.path. If you run lean from inside that project it should find the right dependencies.)

Jason Rute (Apr 06 2021 at 16:44):

Also this has a huge advantage. The mathlib files are already compiled to .olean. I think if you just clone mathlib, you are in for hours of waiting for it to compile, and if you don’t compile ahead of time, running a single file will take forever since it has to run all dependencies.

Joe Palermo (S2'17) (Apr 06 2021 at 18:55):

Thanks for the recommendation. I used leanproject new and it just worked.

Joe Palermo (S2'17) (Apr 06 2021 at 19:15):

I'd appreciate some help interpreting the output of this print.

    @[_refl_lemma]
    theorem ring_hom.algebra_map_to_algebra : ∀ {R : Type u_1} {S : Type u_2} [_inst_1 : comm_semiring.{u_1} R] [_inst_2 : comm_semiring.{u_2} S]
    (i : @ring_hom.{u_1 u_2} R S (@comm_semiring.to_semiring.{u_1} R _inst_1) (@comm_semiring.to_semiring.{u_2} S _inst_2)),
      @eq.{(max (u_1+1) (u_2+1))}
        (@ring_hom.{u_1 u_2} R S (@comm_semiring.to_semiring.{u_1} R _inst_1) (@comm_semiring.to_semiring.{u_2} S _inst_2))
        (@algebra_map.{u_1 u_2} R S _inst_1 (@comm_semiring.to_semiring.{u_2} S _inst_2)
           (@ring_hom.to_algebra.{u_1 u_2} R S _inst_1 _inst_2 i))
        i :=
    λ {R : Type u_1} {S : Type u_2} [_inst_1 : comm_semiring.{u_1} R] [_inst_2 : comm_semiring.{u_2} S]
    (i : @ring_hom.{u_1 u_2} R S (@comm_semiring.to_semiring.{u_1} R _inst_1) (@comm_semiring.to_semiring.{u_2} S _inst_2)),
      @rfl.{(max (u_1+1) (u_2+1))}
        (@ring_hom.{u_1 u_2} R S (@comm_semiring.to_semiring.{u_1} R _inst_1) (@comm_semiring.to_semiring.{u_2} S _inst_2))
        (@algebra_map.{u_1 u_2} R S _inst_1 (@comm_semiring.to_semiring.{u_2} S _inst_2)
           (@ring_hom.to_algebra.{u_1 u_2} R S _inst_1 _inst_2 i))

Would be be fair to say that the template here is as follows?

theorem <theorem_name> : <theorem_statement, i.e. type> := <proof term>

Also every lemma/theorem I print will follow this pattern?

I noticed again that if I try to run #check on the proof term I extracted here, I get a bunch of errors complaining that the universes are unknown. See here for example. I suppose this could be a reflection of the issues with pp.all you mentioned @Jason Rute.

Joe Palermo (S2'17) (Apr 06 2021 at 19:20):

It's making me think that generating proof terms without any context is probably not the best way to go about randomly generating new theorems.

Jason Rute (Apr 06 2021 at 22:29):

Universes unknown errors aren't an issue with pp.all. Instead it is another example of the issue (2) I mentioned above. The universes u_1 and u_2 are not declared. If you put universes u_1 u_2 on the line before your #check it should work.

Jason Rute (Apr 06 2021 at 22:34):

(It is not terribly difficult to write a parser to extract the universes from the term. If you look for Sort ..., Type ... and .{...} that is where the universes show up. Then filter out {, }, (, ), max, imax, +, and digits (also checking parentheses after the Sort and Type to make sure you capture the whole universe term. Of course, this information is also available in the actual proof expression if you ever decide to go the route of meta programming. :smile: )

Jason Rute (Apr 06 2021 at 22:38):

And yes, I think

theorem <theorem_name> : <theorem_statement, i.e. type> := <proof term>

is the correct template for the output of #printing a theorem, but I might be forgetting something. (It is possible, but I don't know if any examples exist, that there is a let ... := ... inside the theorem statement, which would make parsing this harder.)

Joe Palermo (S2'17) (Apr 06 2021 at 22:57):

Hmmm yes, more and more I find myself thinking that meta programming is the way to go :smile:

Joe Palermo (S2'17) (Apr 08 2021 at 14:28):

@Jason Rute I'm curious what you mean by "this information is also available in the actual proof expression". Is there some way to extract the contents of a proof term which is more complete than what we obtain from #print?

Jason Rute (Apr 08 2021 at 16:28):

@_Joe Palermo (S2'17)|337523 said:

Jason Rute I'm curious what you mean by "this information is also available in the actual proof expression". Is there some way to extract the contents of a proof term which is more complete than what we obtain from #print?

Sure. Use metaprogramming to access the expression and print it however you like.

-- this tactic is just a placeholder, but you can implement any
-- custom expr to string formatter you like, e.g. JSON, an S-expr,
-- or any other format as desired.
meta def expr_to_custom_string (e : expr) : tactic string :=
do
  -- TODO: Implement as desired
  return "Not yet implemented"

-- this tactic looks up a declaration by its name and info stuff about it
meta def print_decl_info (nm : name) : tactic unit :=
do  -- enter tactic do block to (1) get access to the environemnt and (2) to trace output

  -- look up declaration by its name
  -- use <- notation since get_declaration has type `tactic expr`
  d <- tactic.get_decl nm,

  -- get type and value of the declaration
  let tp := d.type,
  let v := d.value,

  -- trace type
  tactic.trace "\nDeclaration Type (Pretty Printed)",
  tactic.trace tp, -- normal pp of the expression
  tactic.trace "\nDeclaration Type (Raw Format)",
  tactic.trace tp.to_raw_fmt,  -- a debugging format which shows the full internal expression term
  tactic.trace "\nDeclaration Type (My Own Format)",
  tp_custom_str <- expr_to_custom_string tp,
  tactic.trace tp_custom_str,  -- use a custom output format

  -- trace value
  tactic.trace "\nDeclaration Value (Pretty Printed)",
  tactic.trace v,
  tactic.trace "\nDeclaration Value (Raw Format)",
  tactic.trace v.to_raw_fmt,
  tactic.trace "\nDeclaration Value (My Own Format)",
  tp_custom_str <- expr_to_custom_string tp,
  tactic.trace tp_custom_str,  -- use a custom output format
  return ()

-- run_cmd runs a tactic (enter name argument of print_decl_info using backtick notation)
run_cmd print_decl_info `nat.add_comm

Jason Rute (Apr 08 2021 at 16:45):

You should probably take some time to familiarize yourself with the induction definition of expr in Lean and maybe also declaration. The code isn't terrible to follow. Also, if you are interested, there is a paper on this stuff: https://leanprover.github.io/papers/tactic.pdf

Jason Rute (Apr 08 2021 at 16:48):

Here is one example also of how to recurse over an expression. This prints an express as an S-expr (the let branch still needs to be filled in. Also it is a bit out of date, so it might require a bit of tweaking to compile.

def join (sep : string) : list string → string
| [x]     := x
| []      := ""
| (x::xs) := x ++ sep ++ join xs

def escapec : char → string
| '\"' := "\\\""
| '\t' := "\\t"
| '\n' := "\\n"
| '\\' := "\\\\"
| c    := string.singleton c

def escape (s : string) : string :=
s.fold "" (λ s c, s ++ escapec c)

def wrap_in_parens (ss : list string) : string := "( " ++ (join " " ss) ++ " )"

meta def form_of_string (s : string) : string := "\"" ++ (escape s) ++ "\""

meta def form_of_nat (n : nat) : string := to_string n

meta def form_of_unsigned (n : unsigned) : string := to_string n

meta def form_of_name : name → string
| name.anonymous         := wrap_in_parens ["name.anonymous"]
| (name.mk_string s nm)  := wrap_in_parens ["name.mk_string", form_of_string s, form_of_name nm]
| (name.mk_numeral i nm) := wrap_in_parens ["name.mk_numeral", form_of_unsigned i, form_of_name nm]

meta def form_of_lvl : level → string
| level.zero         := wrap_in_parens ["level.zero"]
| (level.succ l)     := wrap_in_parens ["level.succ", form_of_lvl l]
| (level.max l1 l2)  := wrap_in_parens ["level.max", form_of_lvl l1, form_of_lvl l2]
| (level.imax l1 l2) := wrap_in_parens ["level.imax", form_of_lvl l1, form_of_lvl l2]
| (level.param nm)   := wrap_in_parens ["level.param", form_of_name nm]
| (level.mvar nm)    := wrap_in_parens ["level.mvar", form_of_name nm]

meta def form_of_lvl_list : list level → string
| []       := wrap_in_parens ["list.nil"]
| (h :: t) := wrap_in_parens ["list.cons", form_of_lvl h, form_of_lvl_list t]

meta def form_of_binder_info : binder_info → string
| binder_info.default             := wrap_in_parens ["binder_info.default"]
| binder_info.implicit            := wrap_in_parens ["binder_info.implicit"]
| binder_info.strict_implicit     := wrap_in_parens ["binder_info.strict_implicit"]
| binder_info.inst_implicit       := wrap_in_parens ["binder_info.inst_implicit"]
| other                           := wrap_in_parens ["<other>"]

meta def form_of_expr : expr → string
| (expr.var i)                     := wrap_in_parens ["expr.var", (format.to_string (to_fmt i) options.mk)]
| (expr.sort lvl)                  := wrap_in_parens ["expr.sort", form_of_lvl lvl]
| (expr.const nm lvls)             := wrap_in_parens ["expr.const", form_of_name nm, form_of_lvl_list lvls]
| (expr.mvar nm nm' tp)            := wrap_in_parens ["expr.mvar", form_of_name nm, form_of_name nm',form_of_expr tp]
| (expr.local_const nm ppnm bi tp) := wrap_in_parens ["expr.local_const", form_of_name nm, form_of_name ppnm, form_of_binder_info bi, form_of_expr tp]
| (expr.app f e)                   := wrap_in_parens ["expr.app", form_of_expr f, form_of_expr e]
| (expr.lam nm bi tp bod)          := wrap_in_parens ["expr.lam", form_of_name nm, form_of_binder_info bi, form_of_expr tp, form_of_expr bod]
| (expr.pi nm bi tp bod)           := wrap_in_parens ["expr.pi", form_of_name nm, form_of_binder_info bi, form_of_expr tp, form_of_expr bod]
| (expr.elet nm tp val bod)        := wrap_in_parens ["<expr.elet>"]
| (expr.macro mdf mlst)            := wrap_in_parens ["<expr.macro>"]

Joe Palermo (S2'17) (Apr 08 2021 at 19:31):

Thanks for that script! It makes it clear that declarations can be deconstructed into a type (theorem) and a value (proof), and that these are just expressions (expr) that can be taken apart and converted to any desired format. So #print (which is just an invocation of pretty print, subject to configurations) and also .to_raw_fmt, are just different ways of converting underlying expressions into strings.

Thanks also for sharing the paper and suggesting that I study the definitions of expr and declaration. I'm sure they will be fodder for more questions in the coming days :P

I've revisited my previous example and I've noticed that the raw_fmt also lacks declaration of type universes.

run_cmd print_decl_info `ring_hom.algebra_map_to_algebra

(lam R implicit (sort u_1+1) (lam S implicit (sort u_2+1) (lam _inst_1 inst_implicit (app (const comm_semiring [u_1]) (var 1)) (lam _inst_2 inst_implicit (app (const comm_semiring [u_2]) (var 1)) (lam i default (app (app (app (app (const ring_hom [u_1,
           u_2]) (var 3)) (var 2)) (app (app (const comm_semiring.to_semiring [u_1]) (var 3)) (var 1))) (app (app (const comm_semiring.to_semiring [u_2]) (var 2)) (var 0))) (app (app (const rfl [max
           (u_1+1)
           (u_2+1)]) (app (app (app (app (const ring_hom [u_1,
             u_2]) (var 4)) (var 3)) (app (app (const comm_semiring.to_semiring [u_1]) (var 4)) (var 2))) (app (app (const comm_semiring.to_semiring [u_2]) (var 3)) (var 1)))) (app (app (app (app (app (const algebra_map [u_1,
             u_2]) (var 4)) (var 3)) (var 2)) (app (app (const comm_semiring.to_semiring [u_2]) (var 3)) (var 1))) (app (app (app (app (app (const ring_hom.to_algebra [u_1,
              u_2]) (var 4)) (var 3)) (var 2)) (var 1)) (var 0)))))))))

Pardon the naive question, perhaps this will be obvious when I understand expr more deeply and if so feel free to let me know, but is there a standard format one could apply to an expr that would yield all required declarations in order for the proof term to type check? I ask since you mentioned "this information is also available in the actual proof expression if you ever decide to go the route of meta programming."

Jason Rute (Apr 08 2021 at 22:18):

You can’t in an expression declare a universe (just like you can’t declare a constant). Think of universe variables as free variables, but unlike regular variables you can’t add a universal quantified to the front to make them bound. Instead, the best you can do is just add universes .... before your #check to declare the universe parameters.

What you can however get out of the expression is the names of the universes so you know what universe names to declare. That is what I was referring to. Does that make sense?

(Also it may help to read a bit about universes in Theorem Proving in Lean.)

Jason Rute (Apr 08 2021 at 22:26):

(Here is the section: https://leanprover.github.io/theorem_proving_in_lean/dependent_type_theory.html#types-as-objects)

Joe Palermo (S2'17) (Apr 08 2021 at 23:41):

Got it - you can only declare universe parameters (i.e. u_1, u_2, etc...). Universes are the families of types: Prop, Type 0, Type 1, etc...) and they are pre-defined, not declared by users.

I guess what I'm really wondering is how I can convert proof terms into theorems, i.e. reliably get the type of any given proof term. #check does this, but only if sufficient context is available. So I need to either:

1 - Choose a representation for proof terms which includes all relevant information for #check to work (I'm not sure if it's possible or desirable to do this inline within a proof term), or
2 - Write a procedure which can infer any missing information from a proof term and add any required declarations to the .lean file before invoking the #check command on the proof term

I'm aware now of several different representations for proof terms:

1 - What you get from #print,
2 - What you get from #print if pp.all is set to true
3 - What you get from .to_raw_fmt
4 - S-expr's as described above

#check seems to work best with with the 2nd, but still requires some information to be declared (for example the type universe parameters).

Do you think I'm on the right track here? Thanks!

Jason Rute (Apr 09 2021 at 01:46):

I think so. The universe thing is annoying, but it isn’t really a big deal.

Jason Rute (Apr 09 2021 at 01:48):

Also, if you are interested, I think I could write some lean code which would ack like #check but also fill in the universes for you.

Jason Rute (Apr 09 2021 at 12:58):

Ok, I tried this version of #check, and it was trickier than I thought.

Jason Rute (Apr 09 2021 at 13:08):

However, in your data gathering process, you can get a list of the universes by using tactic.trace v.collect_univ_params. When your model generates an expression, you can either (1) have your model explicitly list the universe params at the end, or (2) use a special token for universe parameters (like <univ> u), or (3) just write a few lines of code to find the universe parameters in the expression. (I sketched an algorithm above.) Then to check that your generated proofs type check, instead of

#check λ (α : Type u), (@rfl (Type u) α)

you would use

universes u
#check λ (α : Type u), (@rfl (Type u) α)

Joe Palermo (S2'17) (Apr 09 2021 at 14:30):

Excellent, v.collect_univ_params is handy.

Here's another class of error I'm running into. I got a pp.all proof term for nat.add_comm, and it type checks, but it type checks to an expression with ?? inside of it.

run_cmd print_decl_info `nat.add_comm

#check λ (n m : nat),
  @nat.brec_on.{0}
    (λ (m : nat),
       ∀ (n : nat), @eq.{1} nat (@has_add.add.{0} nat nat.has_add n m) (@has_add.add.{0} nat nat.has_add m n))
    m
    (λ (m : nat)
     (_F :
       nat.below.{0}
         (λ (m : nat),
            ∀ (n : nat), @eq.{1} nat (@has_add.add.{0} nat nat.has_add n m) (@has_add.add.{0} nat nat.has_add m n))
         m) (n : nat),
       (λ (n m : nat)
        (_F :
          nat.below.{0}
            (λ (m : nat),
               ∀ (n : nat), @eq.{1} nat (@has_add.add.{0} nat nat.has_add n m) (@has_add.add.{0} nat nat.has_add m n))
            m),
          @nat.cases_on.{0}
            (λ (m : nat),
               nat.below.{0}
                 (λ (m : nat),
                    ∀ (n : nat),
                      @eq.{1} nat (@has_add.add.{0} nat nat.has_add n m) (@has_add.add.{0} nat nat.has_add m n))
                 m →
               @eq.{1} nat (@has_add.add.{0} nat nat.has_add n m) (@has_add.add.{0} nat nat.has_add m n))
            m
            (λ
             (_F :
               nat.below.{0}
                 (λ (m : nat),
                    ∀ (n : nat),
                      @eq.{1} nat (@has_add.add.{0} nat nat.has_add n m) (@has_add.add.{0} nat nat.has_add m n))
                 nat.zero),
               id_rhs.{0}
                 (@eq.{1} nat (@has_add.add.{0} nat nat.has_add n (@has_zero.zero.{0} nat nat.has_zero))
                    (@has_add.add.{0} nat nat.has_add (@has_zero.zero.{0} nat nat.has_zero) n))
                 (@eq.symm.{1} nat (@has_add.add.{0} nat nat.has_add (@has_zero.zero.{0} nat nat.has_zero) n)
                    (@has_add.add.{0} nat nat.has_add n (@has_zero.zero.{0} nat nat.has_zero))
                    (nat.zero_add n)))
            (λ (m : nat)
             (_F :
               nat.below.{0}
                 (λ (m : nat),
                    ∀ (n : nat),
                      @eq.{1} nat (@has_add.add.{0} nat nat.has_add n m) (@has_add.add.{0} nat nat.has_add m n))
                 (nat.succ m)),
               id_rhs.{0}
                 ((λ (_x : nat),
                     @eq.{1} nat
                       (@has_add.add.{0} nat nat.has_add n
                          (@has_add.add.{0} .nat .nat.has_add m (@has_one.one.{0} .nat .nat.has_one)))
                       _x)
                    (@has_add.add.{0} nat nat.has_add (nat.succ m) n))
                 ((λ
                   (this :
                     @eq.{1} nat (nat.succ (@has_add.add.{0} nat nat.has_add n m))
                       (nat.succ (@has_add.add.{0} nat nat.has_add m n))),
                     @eq.subst.{1} nat
                       (λ (_x : nat),
                          @eq.{1} nat
                            (@has_add.add.{0} nat nat.has_add n
                               (@has_add.add.{0} .nat .nat.has_add m (@has_one.one.{0} .nat .nat.has_one)))
                            _x)
                       (nat.succ (@has_add.add.{0} nat nat.has_add m n))
                       (@has_add.add.{0} nat nat.has_add (nat.succ m) n)
                       (@eq.symm.{1} nat (@has_add.add.{0} nat nat.has_add (nat.succ m) n)
                          (nat.succ (@has_add.add.{0} nat nat.has_add m n))
                          (nat.succ_add m n))
                       this)
                    (@congr_arg.{1 1} nat nat (@has_add.add.{0} nat nat.has_add n m)
                       (@has_add.add.{0} nat nat.has_add m n)
                       nat.succ
                       (@pprod.fst.{0 1}
                          ((λ (m : nat),
                              ∀ (n : nat),
                                @eq.{1} nat (@has_add.add.{0} nat nat.has_add n m)
                                  (@has_add.add.{0} nat nat.has_add m n))
                             (nat.add m nat.zero))
                          (@nat.rec.{2} (λ (n : nat), Type) punit.{1}
                             (λ (n : nat) (ih : Type),
                                pprod.{1 1}
                                  (pprod.{0 1}
                                     ((λ (m : nat),
                                         ∀ (n : nat),
                                           @eq.{1} nat (@has_add.add.{0} nat nat.has_add n m)
                                             (@has_add.add.{0} nat nat.has_add m n))
                                        n)
                                     ih)
                                  punit.{1})
                             (nat.add m nat.zero))
                          (@pprod.fst.{1 1}
                             (pprod.{0 1}
                                ((λ (m : nat),
                                    ∀ (n : nat),
                                      @eq.{1} nat (@has_add.add.{0} nat nat.has_add n m)
                                        (@has_add.add.{0} nat nat.has_add m n))
                                   (nat.add m nat.zero))
                                (@nat.rec.{2} (λ (n : nat), Type) punit.{1}
                                   (λ (n : nat) (ih : Type),
                                      pprod.{1 1}
                                        (pprod.{0 1}
                                           ((λ (m : nat),
                                               ∀ (n : nat),
                                                 @eq.{1} nat (@has_add.add.{0} nat nat.has_add n m)
                                                   (@has_add.add.{0} nat nat.has_add m n))
                                              n)
                                           ih)
                                        punit.{1})
                                   (nat.add m nat.zero)))
                             punit.{1}
                             _F)
                          n))))
            _F)
         n
         m
         _F)
    n

yields

λ (n m : nat),
  @nat.brec_on.{0}
    (λ (m : nat),
       ∀ (n : nat), @eq.{1} nat (@has_add.add.{0} nat nat.has_add n m) (@has_add.add.{0} nat nat.has_add m n))
    m
    (λ (m : nat)
     (_F :
       nat.below.{0}
         (λ (m : nat),
            ∀ (n : nat), @eq.{1} nat (@has_add.add.{0} nat nat.has_add n m) (@has_add.add.{0} nat nat.has_add m n))
         m) (n : nat),
       (λ (n m : nat)
        (_F :
          nat.below.{0}
            (λ (m : nat),
               ∀ (n : nat), @eq.{1} nat (@has_add.add.{0} nat nat.has_add n m) (@has_add.add.{0} nat nat.has_add m n))
            m), ⁇)
         n
         m
         _F)
    n :
  ∀ (n m : nat), @eq.{1} nat (@has_add.add.{0} nat nat.has_add n m) (@has_add.add.{0} nat nat.has_add m n)

Jason Rute (Apr 10 2021 at 12:49):

You will notice that when you plug this into Lean there is a few .nats in the statement that you posted. Lean correctly points those out as errors. This is definitely a bug in the pretty printer. If you remove the leading . in all those cases, it will type check and remove the ??. I guess there are a few morals here:

pp.all=true is buggy
Even if the #check doesn't fail, you should look for errors in the code being checked. (Somehow #check knew even with the errors, that if this proof was valid code then it would type check to ∀ (n m : ℕ), n + m = m + n. A simple way to reproduce this sort of behavior is #check (foo : ∀ (n m : ℕ), n + m = m + n) where foo is an unknown identifier, but the type checker correctly infers the type: ?? : ∀ (n m : ℕ), n + m = m + n).)

Jason Rute (Apr 10 2021 at 13:14):

This example also reminds me of a few points:

One reason that fully elaborated lean proofs can get so long is the use of type classes. Notice every instance of 0 is expanded to (@has_zero.zero.{0} nat nat.has_zero) and every application of n + m is expanded to (@has_add.add.{0} nat nat.has_add n m).
Sometimes this (exponential?) exposition in term size is so bad that the pretty printer starts to replace terms with …, e.g. #check 10101010101010101010.
I also forgot that Lean has private and protected definitions and theorems. Like type classes, this makes Lean a top shelf programming language, but it also will create problems for you since if you try to check a proof which references a private theorem (which is not in scope) it will error.
I think the preceding examples are relatively rare and you could filter out pretty printed proofs which have errors in them.

Alex J. Best (Apr 10 2021 at 16:09):

Jason Rute said:

Sometimes this (exponential?) exposition in term size is so bad that the pretty printer starts to replace terms with …, e.g. #check 10101010101010101010.

there are a few options for the pretty printer to control some of these cutoffs iirc, not sure what they are or what the defaults are even, but might be worth looking into if this becomes a big issue for anyone?

Mario Carneiro (Apr 10 2021 at 18:59):

That's pp.max_steps

Joe Palermo (S2'17) (Apr 14 2021 at 19:05):

I've been getting my feet wet with metaprogramming and a few more questions came up.

1 - Is there some way within the metaprogramming framework to convert a proof term represented as an expr into a fully elaborated proof term? I'm thinking of something like #print (with pp.all true) that I can call within a tactic, and that would return an expr or a string.
2 - Is there an equivalent of #check that I could call on a proof term represented simply as a string? Alternatively I'm sure there must be some way to attempt to parse a string into an expr so I can just call tactic.infer_type, so I suppose the question I'm really asking is how to use Lean's builtin parsing utilities.

Thanks!

Mario Carneiro (Apr 14 2021 at 19:12):

A proof term represented as an expr should already be fully elaborated, except possibly for the inclusion of metavariables, which may not be resolved until the proof is complete

Mario Carneiro (Apr 14 2021 at 19:13):

To parse expressions you have to be in the parser monad, which is available from user commands and tactic argument parsers, but not from tactics

Mario Carneiro (Apr 14 2021 at 19:14):

from the parser monad you can call tactics but not vice versa

Jason Rute (Apr 14 2021 at 23:05):

@Joe Hendrix When you say "fully elaborated", as @Mario Carneiro pointed out, you probably don't mean what you think you mean. Are you talking about converting an expression to a string in pp.all = true form?

Jason Rute (Apr 14 2021 at 23:14):

To do that, there is a few ways. The simplest is to put set_option pp.all true and trace the expression with tactic.trace. Another approach to setting pp.all as true would be to explicitly toggle the option inside the tactic. Also, if you want to have more control over the way you print or output your expression, you can convert it into a string via something like

do
fmt <- tactic.pp e,
let s := (to_string fmt)

where e is your expr object.

Jason Rute (Apr 14 2021 at 23:15):

Mario Carneiro said:

from the parser monad you can call tactics but not vice versa

That is now longer true. We added a feature where you can create a dummy parser from inside a tactic (using the same environment as the tactic) just for cases like this. Let me look up how to do it.

Joe Palermo (S2'17) (Apr 14 2021 at 23:35):

Jason Rute said:

When you say "fully elaborated", as Mario Carneiro pointed out, you probably don't mean what you think you mean. Are you talking about converting an expression to a string in pp.all = true form?

Yes, that's right. Is there a name for the representation you get when you print an expr like this?

Joe Palermo (S2'17) (Apr 14 2021 at 23:37):

Jason Rute said:

To do that, there is a few ways. The simplest is to put set_option pp.all true and trace the expression with tactic.trace. Another approach to setting pp.all as true would be to explicitly toggle the option inside the tactic. Also, if you want to have more control over the way you print or output your expression, you can convert it into a string via something like
do
fmt <- tactic.pp e,
let s := (to_string fmt)
where e is your expr object.

Hmm I see. I was aware of tactic.trace from your previous examples, but this latter method is what I need because I need to process the string. Thanks!

Joe Palermo (S2'17) (Apr 14 2021 at 23:40):

Jason Rute said:

Mario Carneiro said:

from the parser monad you can call tactics but not vice versa

That is now longer true. We added a feature where you can create a dummy parser from inside a tactic (using the same environment as the tactic) just for cases like this. Let me look up how to do it.

I'm very curious to see this

Jason Rute (Apr 15 2021 at 13:16):

@Mario Carneiro @Joe Palermo (S2'17) See here:

meta def lean_code_to_expr : lean.parser expr :=
do
  t <- interactive.types.texpr,
  e <- tactic.to_expr t, -- this tactic is lifted to a parser
  return e

meta def parse_and_trace_expr (s : string) : tactic unit :=
do
  -- parse string (will fail if parsing error)
  e <- lean.parser.run_with_input lean_code_to_expr s,

  -- get type
  tp <- tactic.infer_type e,

  -- trace
  tactic.trace "Expression:",
  tactic.trace e,

  -- tactic type
  tactic.trace "Type:",
  tactic.trace tp,

  -- set pp.all to true for duration of tactic
  os <- tactic.get_options,
  let os := os.set_bool `pp.all tt,
  tactic.set_options os,

  -- trace with pp.all = true
  tactic.trace "Fully Elaborated Expression:",
  tactic.trace e,

  -- trace type with pp.all = true
  tactic.trace "Fully Elaborated Type:",
  tactic.trace tp,

  return ()

run_cmd parse_and_trace_expr "1+1"
run_cmd parse_and_trace_expr "@rfl nat 1"
run_cmd parse_and_trace_expr "asdfasdf sdfasdf"  -- fails
  ```

Jason Rute (Apr 15 2021 at 13:17):

lean.parser.run_with_input lets you run a parser on a string inside a tactic (using the environment from the tactic state). Similarly, lean.parser.runis the same, but you don't provide it a string (so I think you are parsing an empty string).

Joe Palermo (S2'17) (Apr 15 2021 at 16:39):

(deleted)

Joe Palermo (S2'17) (Apr 15 2021 at 17:29):

Is there a way to declare universes before parsing a string? For example:

meta def parse_pp_string (s : string) : tactic expr :=
  lean.parser.run_with_input lean_code_to_expr s

meta def main : tactic unit := do
  tactic.trace (parse_pp_string "λ {α : Sort u} {a : α} {p : α → Sort v} {b : α} (h₁ : @heq.{u} α a α b), @eq.rec_on.{v u} α a (λ {a : α}, p a) b (@eq_of_heq.{u} α a b h₁)")

This fails only because universe u is not declared.

An alternative would be to simply write the text of the string into a file as:

universes u
#check "<string to parse and typecheck>"

Then run the script and extract the output that way, but this seems hacky.

Jason Rute (Apr 15 2021 at 21:45):

@Joe Palermo (S2'17) I should make clear that using lean.parser.run_with_input is probably only good to use if your plan to get the string s from some io operation from inside Lean. If you are using the Lean server, then you can write a user command (which is in the parser monad) instead of a tactic and enter your expression string (without quotes) as an argument to the command.

Jason Rute (Apr 15 2021 at 21:49):

As for the universe stuff, I think it might be possible, but I might suggest another approach. If you know the universe variables in your string, consider replacing them all with u_1, u_2, u_3, ... and then have universes u_1 u_2 u_3... near the top of your code.

Jason Rute (Apr 15 2021 at 22:05):

As for the user command stuff, here is that same code above as a user command:

@[user_command] meta def parse_and_trace_expr2
  (_ : interactive.parse (lean.parser.tk "parse_and_trace_expr2"))
: lean.parser unit :=
do
  t <- interactive.types.texpr,

  -- remaining code are all tactics which
  -- get lifted to parsers

  e <- tactic.to_expr t,

  -- get type
  tp <- tactic.infer_type e,

  -- trace
  tactic.trace "Expression:",
  tactic.trace e,

  -- tactic type
  tactic.trace "Type:",
  tactic.trace tp,

  -- set pp.all to true
  os <- tactic.get_options,
  let os := os.set_bool `pp.all tt,
  tactic.set_options os,

  -- trace
  tactic.trace "Fully Elaborated Expression:",
  tactic.trace e,

  -- tactic type
  tactic.trace "Fully Elaborated Type:",
  tactic.trace tp,

  return ()
. -- in cases like this a period helps lean seperate commands

parse_and_trace_expr2 1+1
parse_and_trace_expr2 @rfl nat 1
parse_and_trace_expr2 asdfasdf sdfasdf  -- errors

Joe Palermo (S2'17) (Apr 16 2021 at 17:25):

Jason Rute said:

Joe Palermo (S2'17) I should make clear that using lean.parser.run_with_input is probably only good to use if your plan to get the string s from some io operation from inside Lean.

Why is this the case? Just convenience of implementation?

Jason Rute said:

As for the universe stuff, I think it might be possible, but I might suggest another approach. If you know the universe variables in your string, consider replacing them all with u_1, u_2, u_3, ... and then have universes u_1 u_2 u_3... near the top of your code.

In practice, the way I'm currently doing things Lean doesn't use the declared universes when parsing. For example, the following yields an errors message: unknown universe 'u', so I think I'm misunderstanding what you mean.

import meta.expr
import data.list
import tactic
import system.io
open io
open list

universes u

-- omitting most of my code --
-- ...
---------------------------------------

meta def lean_code_to_expr : lean.parser expr := do
  t <- interactive.types.texpr,
  e <- tactic.to_expr t, -- this tactic is lifted to a parser
  return e

meta def parse_pp_string (s : string) : tactic expr :=
  lean.parser.run_with_input lean_code_to_expr s

meta def main : tactic unit := do
  tactic.trace (parse_pp_string "λ {α : Sort u} {a : α} {p : α → Sort v} {b : α} (h₁ : @heq.{u} α a α b), @eq.rec_on.{v u} α a (λ {a : α}, p a) b (@eq_of_heq.{u} α a b h₁)")

Also are there any guides on how to interact with the Lean server directly? Perhaps I'll take a closer look at the Lean TPE and Lean Python client implementations. Tbh I'm not sure how one would issue a user_command to a running Lean server.

Thank you for your help with my incessant stream of questions! My progress would be a lot slower if I had to do this on my own by trial and error, so I really, really appreciate it.

Last updated: May 02 2025 at 03:31 UTC