Zulip Chat Archive

Stream: general

Topic: lean-client-python


view this post on Zulip Patrick Massot (Apr 21 2020 at 20:35):

There have been several people interested in talking to Lean from python. I needed it for format_lean, @Daniel Donnelly needed it for his diagram chasing project, @Jason Rute wants it for AI, and @Frédéric Le Roux wants it for teaching. So I decided to give it a slightly more serious shot. The resulting embryo is at https://github.com/PatrickMassot/lean-client-python. The core part comes straight from the TypeScript client specification of requests and responses. But this is only the trivial part translating back and forth between JSON and python objects. The tricky part is to handle asynchronous communication with Lean. Here the TypeScript client seems to be able to assume a somewhat uniform solution based on promises. I experimented with two opposite setups: Qt's signal/slot call-back hell and trio's async/await hell.

There are demos of both approaches in the repository. You can see the Qt thing in animated gif action in the README. The trio demo does something similar to what format_lean needs, except that it's robust enough to avoid launching the Lean server with -j0 flag (which asks Lean to use only one thread, hence drastically reducing asynchronicity).

I'll probably port format_lean to use this at some point (but there is no hurry for me). I'll push this only if there is interest from the community. I hope this is already a good starting point for interested people (and it's what I promised Frédéric).

view this post on Zulip Kevin Buzzard (Apr 21 2020 at 20:48):

Are there any good factoring algorithms in python?

view this post on Zulip Patrick Massot (Apr 21 2020 at 21:10):

Factoring what?

view this post on Zulip Kevin Buzzard (Apr 21 2020 at 21:11):

positive integers.

view this post on Zulip Kevin Buzzard (Apr 21 2020 at 21:11):

Get Lean to factor positive integers by asking Python what the answer is and then multiplying them together to check

view this post on Zulip Kevin Buzzard (Apr 21 2020 at 21:12):

I get the impression that Rob let us do this in mathematica once, but then mathematica broke his interface :-(

view this post on Zulip Patrick Massot (Apr 21 2020 at 21:12):

This is a job for Rob's bridge.

view this post on Zulip Jason Rute (Apr 21 2020 at 22:29):

@Patrick Massot I'll definitely have to look into this more. Thanks for creating this! As you mentioned I've been experimenting with similar things. Here is a Python notebook which uses a simple interface to talk to Lean via Python. See the LeanServer class. This project, which scrapes all the Lean files for data from the Lean server, uses a slightly modified version of the same LeanServer interface. It would be great to have prebuilt, documented, and tested Python-Lean-Server interface, so I don't have to roll my own. Let me mention some observations from my work in case they help. (This is before I've looked too closely at your project)

view this post on Zulip Jason Rute (Apr 21 2020 at 22:30):

As for performance (with the Lean server in my Python code):

  • My simple Lean-Python interface starts up a Lean server, lean --server, and sends it requests. I block until I get a response for that same request and I save all the other responses in a queue which I process later. If I'm still waiting on information to come from the server, I will just send a dummy request.
  • I was hoping for faster speeds for the "sync" requests, but there seems to be a 0.1 second delay (probably a sleep statement somewhere). That put a bit of a damper on "back-and-forth" applications like sending a tactic to Lean, waiting on the response, and then sending a new tactic to Lean based on that. Each loop takes 0.1 seconds or more even for trivial syncs.
  • I never got parallelism working correctly. If I wanted to sync two (or more) files at the same time, I had wait twice as long (about 0.2 seconds). Also, if I started syncing a lot of separate filenames, then Lean's memory would blow up. I either had to start the Lean server for each request (which has a fixed overhead) or I think I could "zero-out" a file by syncing it with an empty string.

view this post on Zulip Jason Rute (Apr 21 2020 at 22:32):

As for accuracy and robustness (with the Lean server in my Python code):

  • The information coming out of the server is not always accurate/robust. It isn't ever "wrong", but it can be missing information. It is difficult to be sure that "sync" hasn't finished processing (or worse is showing information from a previous "sync" request for the same filename). As for "info" requests, they usually return the right information, but sometimes for large complicated files, they come back empty when they shouldn't or worse they are missing some but not all of the fields. Also, sometimes there are multiple levels of information when tactic blocks are stacked and it isn't consistent on which level it shows. I can give some examples if it helps.
  • The output of the Lean server wasn't always as helpful as I'd like. The pretty-printed state (even if I used lean --server -D pp.all=true) would often not be valid Lean code. I provide a detailed list of problems in the "Current Issues" section of the notebook.

view this post on Zulip Jason Rute (Apr 21 2020 at 22:32):

Also, I should mention that this isn't the only way to for Lean and Python to communicate. Lean has an io monad which lets it communicate with external processes on the command line, even Python scripts, and to read/write from stdin/stdout. I believe, this io monad could even be used (in an unsafe manner) inside a tactic so you could theoretically have a tactic which calls a Python Script. I'm planning on experimenting with this approach as well.

view this post on Zulip Jason Rute (Apr 21 2020 at 22:44):

@Kevin Buzzard I think in the case of factoring, you could have a Python (or whatever) script which generates the factoring witnesses (I don't know what witnesses show that a number is prime) and then writes a Lean proof as a text string which can be pasted into Lean. If I understands Patrick's lean-client, the one additionally thing it would give you is a way to automatically check that your proof is valid in Lean, but you could do that automatically by just writing it to a file and telling Lean to check that file. (Although, I guess having an open server connection would be faster than starting up Lean from scratch.)

view this post on Zulip Jason Rute (Apr 22 2020 at 02:59):

@Patrick Massot two questions:

In your format_lean project, have you encountered times when the response from the lean server was missing data causing incorrect behavior? For example, it doesn't return a state even though it should. If so, how do you correct for it?

In format_lean, where does the async help:

  1. Does it speed up the lean server execution time to be able to do tasks in parallel? (As I said above, that didn't seem to be the case for me. If i sent two sync commands and waited for them both to return messages. It would take the same time as if I sent one, waited, and then sent the other. But I may have only been using one thread and that was the problem.)
  2. Does not blocking save significant wall clock time? I imagine that most of the processing time of format_lean would be in getting data from Lean via the server, so that most of the time would be waiting for Lean, even with async, but maybe I am mistaken.
  3. Is it just that it makes interacting with the server more natural? The Lean server has responses directly tied to a request ("ok" responses) and other responses which will appear at unpredictable times (e.g. "all_messages" responses). As I said above, when I called a request, I would wait for the "ok" response associated with that request and then also record all the other responses into a queue. If I wanted, say an "all_message" response not directly tied to a request, then I would repeatedly send dummy requests (e.g. "info" requests) waiting for the ok response each time until I got the "all_messages" response. This is hacky, and possibly the largest advantage of an async framework is to avoid hacks like this.
  4. Does it prevent Lean from returning missing data. I don't understand how servers like this work, but I noticed that I would get more complete information if I waited 0.1 seconds than say read from stdout immediately. Somehow the lean server new I was trying to read from it. Maybe by using async, I avoid prematurely asking Lean for data where it sends me incomplete data.

view this post on Zulip Patrick Massot (Apr 22 2020 at 10:20):

As I wrote, I haven't ported format_lean to use my brand new lean-client-python. I can still answer a couple of questions. I've played only with sync and info requests, using only the tactic state information in info requests. Info requests do trigger empty responses when Lean is not ready because it's still compiling the file. Those empty responses are CommandResponse containing only the InfoRequest's seq_num and "ok". In particular I couldn't detect a difference between this situation and situations where Lean simply has nothing to declare, for instance because you asked for information about an empty line outside a tactic proof. If you really want information about a file position, you need to wait until Lean is ready. There is no need to send dummy requests for this, you only need to listen carefully and take action when the time is ripe.

view this post on Zulip Patrick Massot (Apr 22 2020 at 10:26):

Also note that SyncRequest trigger a response long before Lean is done. This response only acknowledge that sync has started.

view this post on Zulip Patrick Massot (Apr 22 2020 at 10:26):

You can skim through commands.py, but the two relevant files for you are trio_server.py and trio_example.py. Those are much easier to read than the Qt files, because they are based on async/await. Forget about the async def vs def, it only says a function will be able to use await. The important part is await. There is no call-back here. Each function executes instruction in the order they are written. But whenever an expression is prefixed with await, python will wait for the answer to become available. While waiting other tasks are allowed to run. There is no multi-thread here, and context switching doesn't happen randomly, only while meeting await or in between iterations of a async for loop.

view this post on Zulip Patrick Massot (Apr 22 2020 at 10:29):

The next piece of technology is trio.Event. You can create an event, and then await event.wait() waits for the event to trigger. When it does trigger, you don't fire a call-back, but simply move to the next line.

view this post on Zulip Patrick Massot (Apr 22 2020 at 10:31):

You can already understand the logic in those lines which is completely linear. We always wait until information becomes available. During this waiting, we continue to listen to Lean output, without sending dummy requests.

view this post on Zulip Patrick Massot (Apr 22 2020 at 10:37):

The listening loop is at here. It should be very easy to read once you know that self.is_fully_ready is an Event that you trigger using self.is_fully_ready.set(). After doing that each time a function which is waiting for this signal gets a chance of advancing (because some other task is waiting) it will run past its await self.is_fully_ready.wait(). Similarly every request creates an event in the self.request_events dictionary, and this event fires when the listening loop gets the answer (which is also stored in another dictionary indexed by seq_num).

view this post on Zulip Patrick Massot (Apr 22 2020 at 10:38):

I think this is all you need to understand those two files. But don't hesitate to ask question and PR documentation or other improvements (if there are contribution then this repo will probably quickly move to the lean community organization where things will be easier).

view this post on Zulip Patrick Massot (Apr 22 2020 at 10:47):

And of course I'd be very happy to read @Gabriel Ebner and @Bryan Gin-ge Chen comments about all this (they understand the TypeScript client).

view this post on Zulip Gabriel Ebner (Apr 22 2020 at 12:02):

I think Patrick explained this well. One more gotcha you should be aware of is that you cannot execute two info commands in parallel: the second one will cancel the first.

view this post on Zulip Jason Rute (Apr 22 2020 at 12:18):

@Patrick Massot As for your trio_example.py code, when I run it it returns nothing. After turning on debugging, I get this:

Sending SyncRequest(file_name='test.lean', content=None)
Received CurrentTasksResponse(is_running=False, tasks=[], cur_task=None)
Received CurrentTasksResponse(is_running=False, tasks=[], cur_task=None)
Received CurrentTasksResponse(is_running=False, tasks=[], cur_task=None)
Received CurrentTasksResponse(is_running=False, tasks=[], cur_task=None)
Received CurrentTasksResponse(is_running=False, tasks=[], cur_task=None)
Received CurrentTasksResponse(is_running=False, tasks=[], cur_task=None)
Received CurrentTasksResponse(is_running=False, tasks=[], cur_task=None)
Received CurrentTasksResponse(is_running=False, tasks=[], cur_task=None)
Received CommandResponse(seq_num=1)
Sending InfoRequest(file_name='test.lean', line=1, column=0)
Received CommandResponse(seq_num=2)
Sending InfoRequest(file_name='test.lean', line=1, column=39)
Received CommandResponse(seq_num=3)
Sending InfoRequest(file_name='test.lean', line=2, column=0)
Received CommandResponse(seq_num=4)
Sending InfoRequest(file_name='test.lean', line=2, column=5)
Received CommandResponse(seq_num=5)
Sending InfoRequest(file_name='test.lean', line=3, column=0)
Received CommandResponse(seq_num=6)
Sending InfoRequest(file_name='test.lean', line=3, column=21)
Received CommandResponse(seq_num=7)
Sending InfoRequest(file_name='test.lean', line=4, column=0)
Received CommandResponse(seq_num=8)
Sending InfoRequest(file_name='test.lean', line=4, column=22)
Received CommandResponse(seq_num=9)
Sending InfoRequest(file_name='test.lean', line=5, column=0)
Received CommandResponse(seq_num=10)
Sending InfoRequest(file_name='test.lean', line=5, column=3)
Received CommandResponse(seq_num=11)
Sending InfoRequest(file_name='test.lean', line=6, column=0)
Received CommandResponse(seq_num=12)
Sending InfoRequest(file_name='test.lean', line=6, column=0)
Received CommandResponse(seq_num=13)
Sending InfoRequest(file_name='test.lean', line=7, column=0)
Received CommandResponse(seq_num=14)
Sending InfoRequest(file_name='test.lean', line=7, column=36)
Received CommandResponse(seq_num=15)
Sending InfoRequest(file_name='test.lean', line=8, column=0)
Received CommandResponse(seq_num=16)
Sending InfoRequest(file_name='test.lean', line=8, column=18)
Received CommandResponse(seq_num=17)
Sending InfoRequest(file_name='test.lean', line=9, column=0)
Received CommandResponse(seq_num=18)
Sending InfoRequest(file_name='test.lean', line=9, column=0)
Received CommandResponse(seq_num=19)

view this post on Zulip Jason Rute (Apr 22 2020 at 12:19):

If I sleep for 1 second after the sync, then it seems to work, although I still want to test it on a larger file.

view this post on Zulip Jason Rute (Apr 22 2020 at 12:20):

Also does

before = await server.state('test.lean', i+1, 0)
after = await server.state('test.lean', i+1, len(line))

run the risk of calling the info command in parallel (as @Gabriel Ebner mentioned) thereby cancelling out the "before" info request if it takes too long?

view this post on Zulip Jason Rute (Apr 22 2020 at 13:01):

I tried it with ring_theory/polynomial.lean in mathlib, which is a file which has given me a lot of trouble with stuff like this. With a 1 second delay it returns nothing. With a 10 second delay it misses most of the beginning of the file (the first 250 lines or so) and some lines later on as well which have a state. With a 60 second delay, it starts to go through all the lines correctly, but then stopped processing at line 150 or so and crashed at the command nursery.cancel_scope.cancel() with error:

Traceback (most recent call last):
  File "trio_example.py", line 30, in <module>
    trio.run(main, path)
  File "/Users/jasonrute/Development/lean/test_massot_lean_client/venv/lib/python3.8/site-packages/trio/_core/_run.py", line 1804, in run
    raise runner.main_task_outcome.error
  File "trio_example.py", line 26, in main
    nursery.cancel_scope.cancel()
  File "/Users/jasonrute/Development/lean/test_massot_lean_client/venv/lib/python3.8/site-packages/trio/_core/_run.py", line 730, in __aexit__
    raise combined_error_from_nursery
  File "/Users/jasonrute/Development/lean/test_massot_lean_client/venv/lib/python3.8/site-packages/lean_client/trio_server.py", line 64, in receiver
    resp = parse_response(line)
  File "/Users/jasonrute/Development/lean/test_massot_lean_client/venv/lib/python3.8/site-packages/lean_client/commands.py", line 379, in parse_response
    return InfoResponse.from_dict(dic)
  File "/Users/jasonrute/Development/lean/test_massot_lean_client/venv/lib/python3.8/site-packages/lean_client/commands.py", line 199, in from_dict
    dic['record'] = InfoRecord.from_dict(dic.pop('record'))
  File "/Users/jasonrute/Development/lean/test_massot_lean_client/venv/lib/python3.8/site-packages/lean_client/commands.py", line 189, in from_dict
    dic['source'] = InfoSource(**dic.pop('source'))
TypeError: __init__() missing 1 required positional argument: 'file'

view this post on Zulip Jason Rute (Apr 22 2020 at 13:05):

I'd love to have a way for the lean server to tell me when it is ready so I don't have to deal with arbitrarily long sleep statements and such. I've tried a number of tricks including:

  • putting an eval statement at the end of the file to check when it has gotten that far
  • waiting for the server to say that it is done processing
  • probably others
    However, none of them seem perfect, especially with a file like ring_theory/polynomial.lean.

view this post on Zulip Gabriel Ebner (Apr 22 2020 at 13:08):

Have you tried server.full_sync instead of sleeping? The logic Patrick implemented seems correct to me.

view this post on Zulip Jason Rute (Apr 22 2020 at 13:10):

Here is the exact code I was running: https://gist.github.com/jasonrute/cfa1500e49f2c34724024056b53d8cba It is only a slightly modified version of Patrick's code: https://github.com/PatrickMassot/lean-client-python/blob/master/examples/trio_example.py

view this post on Zulip Jason Rute (Apr 22 2020 at 13:11):

It uses server.full_sync but maybe not correctly.

view this post on Zulip Jason Rute (Apr 22 2020 at 14:28):

If I remove the sleep line, no states are found.

view this post on Zulip Patrick Massot (Apr 22 2020 at 21:25):

I'm sorry I had no time at all this afternoon and this evening. I just tried and I can reproduce the problem with polynomial.lean. We haven't understood all the mysteries of the Lean server protocol yet. I'll investigate.

view this post on Zulip Patrick Massot (Apr 22 2020 at 21:38):

Hopefully fixed in https://github.com/PatrickMassot/lean-client-python/commit/746eb30b98f340296a723da115bd1e5f49d421fd (see commit message for the explanation).

view this post on Zulip Jason Rute (Apr 23 2020 at 03:00):

@Patrick Massot I spent some time today learning about async and await in Python, so I'm much more familiar with how your code works. (For example, my concern about before and after calling the "info" command in parallel is clearly wrong. The after value won't start be computed until before is done being computed.) (For another example, I also found that same error in your code--but you fixed it before I had a chance to tell you.) I tried your new code and it works really well. It seems to be much more robust than my code which is great! I haven't found any cases so far where the server doesn't return results when it should, but I'd like to test it out more. Thanks for making this!

view this post on Zulip Patrick Massot (Apr 23 2020 at 09:09):

Great! This async thing clearly requires some adaptation. I didn't know any of this one week ago, and I spent quite a few hours reading documentation and blog posts about all this. But our problem is really typical of what this async/await stuff is meant to solve. All the examples are always talking about talking to a server over internet, but conceptually there are no difference (although the practical detail differences took me a while to figure out).

view this post on Zulip Jason Rute (Apr 23 2020 at 22:11):

@Patrick Massot I found another error (more subtle this time). Should I make a PR (and if so how?), or should I just tell you the fix? It is short.

view this post on Zulip Florian Dupeyron (Apr 23 2020 at 22:27):

Hi everyone,
@Patrick Massot why using a third-party instead of just using the asyncio library ? I think it should be possible to have a lighter code using the loop.subprocess_shell function for this little application, and implementing a asyncio.SubprocessProtocol to handle incoming data. I am also learning the async paradigm so I maybe missed some point. I will give it a try on my own and maybe send a PR if I get something interesting ?

view this post on Zulip Jason Rute (Apr 24 2020 at 02:59):

Here is a tricky case, try running the trio example on category/core.lean. On line 24, one gets a back an info response with no line or position. (It is some weird effect of the obviously tactic.) Anyway, it causes the current code to crash.

view this post on Zulip Patrick Massot (Apr 24 2020 at 12:48):

@Jason Rute of course you can open PRs!

view this post on Zulip Patrick Massot (Apr 24 2020 at 12:49):

Even more: as I wrote in the very first message of this thread, I'm totally open to moving this repository into the lean-community organization and give write access to other people.

view this post on Zulip Patrick Massot (Apr 24 2020 at 12:50):

I'm currently swamped in PRs here where I'm the only active maintainer, but if you open a small PR in lean-client-python and ping me here then I should be able to handle it.

view this post on Zulip Patrick Massot (Apr 24 2020 at 18:02):

Florian Dupeyron said:

Hi everyone,
@Patrick Massot why using a third-party instead of just using the asyncio library ? I think it should be possible to have a lighter code using the loop.subprocess_shell function for this little application, and implementing a asyncio.SubprocessProtocol to handle incoming data. I am also learning the async paradigm so I maybe missed some point. I will give it a try on my own and maybe send a PR if I get something interesting ?

The honest answer is: as I wrote, I didn't know anything about async one week ago. The documentation of Trio is the first thing I found that was understandable. That being said, it also seems that trio is the standard asyncio library done right. The issue is that asyncio was introduced in a hurry, and now the python core developers need to maintain some backward compatibility with earlier versions which prevent them from really fixing it. Now of course I'm open to PRs. There is nothing relying on lean-client-python yet, so you won't break anything.

view this post on Zulip Patrick Massot (Apr 24 2020 at 18:04):

This is all very experimental. I think there is enough interest to move this into the leanprover-community GitHub organization. So I'll do that soon, unless someone complains.

view this post on Zulip Patrick Massot (Apr 25 2020 at 11:32):

I transferred the project and issued an invitation to Jason with full admin rights. It doesn't mean we shouldn't continue discussing design decisions, but I don't want people to think this is my thing (hopefully I got double negations right). I'm very happy I initiated it, but I don't have any pressing need for it, and I want it to be a community project.

view this post on Zulip Florian Dupeyron (Apr 25 2020 at 12:37):

Patrick Massot said:

Florian Dupeyron said:

Hi everyone,
@Patrick Massot why using a third-party instead of just using the asyncio library ? I think it should be possible to have a lighter code using the loop.subprocess_shell function for this little application, and implementing a asyncio.SubprocessProtocol to handle incoming data. I am also learning the async paradigm so I maybe missed some point. I will give it a try on my own and maybe send a PR if I get something interesting ?

The honest answer is: as I wrote, I didn't know anything about async one week ago. The documentation of Trio is the first thing I found that was understandable. That being said, it also seems that trio is the standard asyncio library done right. The issue is that asyncio was introduced in a hurry, and now the python core developers need to maintain some backward compatibility with earlier versions which prevent them from really fixing it. Now of course I'm open to PRs. There is nothing relying on lean-client-python yet, so you won't break anything.

Hi,
Thank you for your answer. I will have a deeper look at trio when I will have understood the basics of python coroutines. If it can be useful, my proposition originated after looking at the source code from pyserial-asyncio.

Have a nice day ! :D


Last updated: May 17 2021 at 21:12 UTC