Zulip Chat Archive

Stream: general

Topic: lean4_jupyter: A Lean 4 Jupyter kernel via repl


Utensil Song (May 22 2024 at 02:53):

lean4_jupyter is a Lean 4 Jupyter kernel via repl. Its initial release, v0.0.1, has just shipped and covers all the basic features of repl.

What's already working

:fire: See it in action: Tutorial notebook.

The kernel can:

  • execute Lean 4 commands (including definitions, theorems, etc.)
  • execute Lean 4 tactics with a magic like %proof immediately after a sorried theorem
  • backtrack to earlier environment or proof states with magics like %env 1 or %prove 3
  • support magics like %cd or %load (for loading a file)
  • support importing modules from projects and their dependencies, e.g. Mathlib (demo).

Output:

  • In Jupyter Notebook and the like: echoes the input annotated in Alectryon style, at the corresponding line (not column yet), with messages and proof states
  • In Jupyter Console and the like: echoes the input annotated in codespan style, at the corresponding line:column, with messages and proof states
  • Raw repl input/output in JSON format can be inspected by click-to-expand in the web UI.

For what's next, check out TODOs in README.
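As a rough sketch of the workflow described above (the exact cell syntax is shown in the tutorial notebook; the magics and cell layout here are only an illustration of the feature list, not the kernel's verbatim syntax):

```lean
-- Cell 1: a theorem left open with sorry
theorem ex (a b : Nat) : a + b = b + a := by sorry

-- Cell 2: a %proof magic immediately after it would let tactics
-- act on the open goal, e.g.
--   %proof
--   exact Nat.add_comm a b
-- and %env 1 / %prove 3 would backtrack to earlier states.
```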

Utensil Song (May 22 2024 at 02:55):

Rationale

I've always wanted to do literate programming with Lean 4 in Jupyter. The Lean LSP and Infoview in VS Code provide such an immersive, immediate-feedback experience that I could never imagine a better way to interact with Lean 4. Interacting with repl changed that: limitless backtracking is the feature that best accompanies the reproducible interactivity of Alectryon-style annotations.

The idea came to me in an afternoon, and it seemed technically trivial to implement overnight thanks to repl. Working out the UX logistics and polishing the code took a bit longer, but it's fun to see the potential.

Mr Proof (Jun 10 2024 at 23:47):

Utensil Song said:

lean4_jupyter is a Lean 4 Jupyter kernel via repl. Its initial release, v0.0.1, has just shipped and covers all the basic features of repl.

What's already working

:fire: See it in action: Tutorial notebook.

The kernel can:

  • execute Lean 4 commands (including definitions, theorems, etc.)
  • execute Lean 4 tactics with a magic like %proof immediately after a sorried theorem
  • backtrack to earlier environment or proof states with magics like %env 1 or %prove 3
  • support magics like %cd or %load (for loading a file)
  • support importing modules from projects and their dependencies, e.g. Mathlib (demo).

Output:

  • In Jupyter Notebook and the like: echoes the input annotated in Alectryon style, at the corresponding line (not column yet), with messages and proof states
  • In Jupyter Console and the like: echoes the input annotated in codespan style, at the corresponding line:column, with messages and proof states
  • Raw repl input/output in JSON format can be inspected by click-to-expand in the web UI.

For what's next, check out TODOs in README.

Hi, great work. I wonder if you can help me. I'm trying to call repl from C#.

I can spawn a process running "cmd" with the arguments "lake exe repl".

Then I can send data to the stdin such as {"cmd":"#eval 2+3"}\n

Unfortunately it only sends back the replies once the repl has exited, either by closing the shell window or by forcing an error such as sending \n\n. I'm not sure if there is a way to force it to send a reply to stdout as soon as I've sent one command. It seems to hold on to the replies until the end.

Are you able to shed any light on this or point to the place in your code where you send the commands and get the replies?

I am using this version of repl: https://github.com/leanprover-community/repl

Kim Morrison (Jun 10 2024 at 23:58):

Sounds like buffering problems in the way you are calling it. It certainly responds separately to each command.

Utensil Song (Jun 11 2024 at 00:20):

8B47A1CE-3A43-49F8-BC4B-F47643FA2ED1.jpg

https://github.com/leanprover-community/repl/pull/5#discussion_r1595010942

Utensil Song (Jun 11 2024 at 00:26):

@Mr Proof To interact with repl, I was under the working assumption that it's a \r\n\r\n-terminated protocol; otherwise I couldn't make it respond reliably. This is a recurring theme: you always need a way to know when to stop receiving and start processing. I hope this gets explicitly stabilized as the protocol, because the PRed Python implementation relies on the trailing env, which is not guaranteed to be there, and Eric's proposal of using a single line ending as the request/response terminator would also be fragile.
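Under that working assumption, the framing can be sketched in Python. The helper names here are illustrative, not taken from lean4_jupyter, and the blank-line delimiter on responses is the same assumption as above:

```python
import json

def frame_request(payload: dict) -> bytes:
    """Terminate each JSON request with a blank line (\\r\\n\\r\\n),
    per the working assumption described above."""
    return json.dumps(payload).encode("utf-8") + b"\r\n\r\n"

def split_responses(buffer: bytes):
    """Treat a blank line as the end-of-response marker: return the
    complete JSON responses seen so far, plus the unparsed tail to
    keep buffering."""
    parts = buffer.split(b"\r\n\r\n")
    responses = [json.loads(p) for p in parts[:-1] if p.strip()]
    return responses, parts[-1]
```

A reader loop would keep appending received bytes to a buffer and call `split_responses` until a complete response shows up, rather than assuming any particular read returns a whole reply.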

Mr Proof (Jun 11 2024 at 00:35):

Tried and still didn't work :(
I'm doing:

    string text = "{\"cmd\":\"def x:Nat:=3\"}\r\n\r\n";
    writer.Write(text);
    writer.Flush();

So I must be doing something wrong. The only way so far I can get it to return a reply is to force an error, for example by sending three \r\n in a row, or by closing the cmd window. Then it returns the error plus all the other replies. Oh well. Back to the drawing board.

Utensil Song (Jun 11 2024 at 01:06):

1CD45548-F78F-43AF-BC92-AD527F278AC6.jpg

310C37B1-81FB-40C3-94CB-414D2BCDFF45.jpg

This simple code works in Python. I can't tell the difference between it and your code. I suspect it's not that sending is failing but that receiving is: repl has responded, but you didn't finish receiving.

Mr Proof (Jun 11 2024 at 02:02):

Can you point me to the file which sends the replies? Maybe it's just missing a "Flush" command.
It could be my code, but I'd just like to check.

Utensil Song (Jun 11 2024 at 03:05):

https://github.com/utensil/lean4_jupyter/blob/main/lean4_jupyter/repl.py

Mr Proof (Jun 11 2024 at 16:56):

UPDATE FIXED!!!! :grinning_face_with_smiling_eyes: (with a little help from ChatGPT)
I managed to fix it! For anyone else having this problem, here is the solution:

In the Main.lean file in the REPL folder add:

def printFlush [ToString α] (s : α) : IO Unit := do
  let out ← IO.getStdout
  out.putStr (toString s)
  out.flush -- flush the output

Then you can change the line which prints a newline after the output as:

  printFlush "" -- easier to parse the output if there are blank lines

Now it flushes the output appropriately. Next I just have to work out how to send multiline commands.
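On the multiline question: since the wire format is one JSON object per request, a multiline command can be sent as a single request by letting a JSON encoder escape the inner newlines, so the encoded request contains no raw newlines and the blank-line terminator stays unambiguous. A sketch in Python (the same idea applies to any JSON library in C#):

```python
import json

# A Lean command spanning several lines:
cmd = "def x : Nat := 3\n#eval x + 1"

# json.dumps escapes the inner newlines as \n, so the encoded
# request occupies a single line on the wire:
request = json.dumps({"cmd": cmd})
print(request)

# Round-tripping recovers the original multiline command:
assert json.loads(request)["cmd"] == cmd
```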

Next Tasks
The next thing on my list, which I'm not sure how to do, is to find a way to list all the available tactics. Ideally I would like to filter them down to the ones that fit the current goals. Guess that's something I'll have to write in Lean.

Kim Morrison (Jun 12 2024 at 01:44):

@Mr Proof, if you make a PR I'll merge it.

Malvin Gattinger (Jul 13 2024 at 09:48):

It seems repl replies immediately without buffering if its output is a terminal:

$ ~/gits/lean4-repl/.lake/build/bin/repl
{ "cmd" : "import Mathlib.Logic.Lemmas" } # entered by me

{"env": 0}  # appears immediately

{ "cmd" : "import Mathlib.Logic.Lemmas" } # entered by me

{"env": 1}
^C

But when the output is redirected to a file, there is buffering:

$ echo > output.txt
$ tail -f output.txt & (~/gits/lean4-repl/.lake/build/bin/repl > output.txt)
[1] 12818
{ "cmd" : "import Mathlib.Logic.Lemmas" } # entered by me

{ "cmd" : "import Mathlib.Logic.Lemmas" } # entered by me

{ "cmd" : "import Mathlib.Logic.Lemmas" } # entered by me

# I pressed Ctrl+D here, then the remaining lines appear.
{"env": 0}

{"env": 1}

{"env": 2}

Now I'll check if the fix from @Mr Proof helps :fingers_crossed:

Malvin Gattinger (Jul 13 2024 at 09:51):

Indeed it does :tada:

$ tail -f output.txt  & (~/gits/lean4-repl/.lake/build/bin/repl > output.txt)
[1] 13452
{ "cmd" : "import Mathlib.Logic.Lemmas" }

{"env": 0}
{ "cmd" : "import Mathlib.Logic.Lemmas" }

{"env": 1}
{ "cmd" : "import Mathlib.Logic.Lemmas" }

{"env": 2}
^C
$ fg
tail -f output.txt
^C

Malvin Gattinger (Jul 13 2024 at 10:02):

Here is a PR with the code from @Mr Proof above. I am still wondering how to write an automated test for this: the current tests are only for total input/output pairs, right? So they cannot check whether some output already appears before certain other input is given :thinking:

Malvin Gattinger (Jul 13 2024 at 13:37):

With that fix on top of 4.9.0 I was able to build this monster of bash, LaTeX, and Lean: https://github.com/m4lvin/RepLeanTeX :smirk_cat:

Kim Morrison (Jul 16 2024 at 15:44):

Malvin Gattinger said:

Here is a PR with the code from Mr Proof above. I am still wondering how to write an automated test for this: the current tests are only for total input/output pairs, right? So they cannot check whether some output already appears before certain other input is given :thinking:

The PR is failing the test suite. Can you take a look, @Malvin Gattinger?

Malvin Gattinger (Jul 21 2024 at 20:58):

The problem was printFlush "" vs printFlush "\n" :man_facepalming: and it's fixed now. More tests pass when I run them locally, but the test case called pickle_open fails with failed to open file 'test/e.olean', and indeed I do not have any such file. Does that test need some special preparation? See the CI run waiting here

Malvin Gattinger (Jul 26 2024 at 09:05):

@Kim Morrison can you let the CI run for the PR to repl?

Malvin Gattinger (Jul 26 2024 at 09:06):

Related (but probably better in another topic?): I have just found out that locally on my system ./test.sh fails because the order in which for infile in $IN_DIR/*.in; do goes through the test files is locale-dependent, but the pickle_open_2 test only works when pickle_open has run before it to create the e.olean, and similarly for the pickle_open_scoped_2 test and the H20231215_2.in Mathlib test.

Here is another PR to fix the order in which tests are run.
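The underlying pitfall: the shell expands $IN_DIR/*.in in the current locale's collation order, so order-dependent tests can run in a different sequence on different systems. A minimal Python illustration of making the order deterministic by sorting explicitly (the filenames just mirror the test names mentioned above):

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    # Create the two order-dependent test files, deliberately in
    # the "wrong" creation order:
    for name in ["pickle_open_2.in", "pickle_open.in"]:
        open(os.path.join(d, name), "w").close()

    # glob.glob returns files in arbitrary filesystem order; sorted()
    # imposes a deterministic, locale-independent code-point order,
    # which here runs pickle_open before pickle_open_2 as required:
    ordered = [os.path.basename(f)
               for f in sorted(glob.glob(os.path.join(d, "*.in")))]

print(ordered)
```

Python's `sorted` on strings compares by code point, so the result does not depend on the system locale, unlike an unsorted shell glob.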

Kim Morrison (Jul 26 2024 at 13:19):

I've just started CI for both.

Kim Morrison (Jul 26 2024 at 13:22):

and :merge:'d both!

Utensil Song (Nov 19 2024 at 09:55):

One long-missing feature of lean4_jupyter is syntax highlighting for code cells; it's now implemented. It looks like this:

image.png

I have also polished the installation process (for installing from the git main branch), which should be streamlined by the one-liner installation (also used in CI), with some minor side effects (see "note:"). If anyone is interested in trying it out and encounters issues, please report here or file an issue.

Utensil Song (Nov 19 2024 at 09:56):

Side note: it wasn't trivial for me, as it involved creating a frontend JupyterLab extension to add a Lean 4 mode to CodeMirror, and an initial survey showed quite a few breaking changes in JupyterLab and CodeMirror since similar extensions were written. Thanks to @Eric Wieser's previous work on a Pygments lexer for Lean 4, aider, and help from the JupyterLab forum, I managed to translate the Pygments lexer into a CodeMirror Lean 4 mode, spending only ~$5 on claude-3-5-sonnet-20241022 (48 out of 61 commits in the PR are by aider).

Eric Wieser (Nov 19 2024 at 10:02):

If you have a CodeMirror lexer, I think there is a way to contribute it to GitHub so that the "edit" button on GitHub uses it

Utensil Song (Nov 19 2024 at 10:02):

For now the installation etc. are still on Lean 4.11. I haven't bumped it to 4.13, since repl doesn't maintain tags for stable Lean releases, and repl has recently bumped to v4.14.0-rc1 with quite a few fixes, so I'll bump to stable 4.14 later. The support matrix will maintain information about Lean and Python versions etc.; I'm trying to roll both Lean and Python support forward to a relatively recent version from time to time, as both are evolving rapidly.

Utensil Song (Nov 19 2024 at 10:22):

Eric Wieser said:

If you have a CodeMirror lexer, I think there is a way to contribute it to GitHub so that the "edit" button on GitHub uses it

That's an interesting thought. I haven't even tried to edit Lean 4 code using the edit button; TIL it's not syntax highlighted :joy:

It's not fun to find out that every piece of the ecosystem we are used to uses a drastically different mechanism for highlighting. Along the way I've seen you contributed to Pygments (for minted in LaTeX) and to Linguist (for GitHub to recognize the language), and there are the TextMate-like grammar for the VS Code extension, a Tree-sitter-based one for Neovim, and a highlight.js-based one for e.g. Alectryon.

The CodeMirror mode code can be separated from the code for the JupyterLab extension, so it's feasible with no duplication. Since the current translation is written in the legacy mode style, it could be contributed to https://github.com/codemirror/legacy-modes. Or if I could figure out how to port it further to CodeMirror 6, it would become @codemirror/lang-lean.

Eric Wieser (Nov 19 2024 at 10:46):

These lines suggest that GitHub is using CodeMirror 5, but you might want to do more searching to confirm

Utensil Song (Nov 19 2024 at 11:21):

It seems that linguist has no issues/PRs discussing adopting CodeMirror 6, so it would still require the mode to exist at https://github.com/codemirror/codemirror5/tree/master/mode, but CodeMirror 5 no longer accepts new language PRs. Recent PRs to linguist use ace_mode: text like you did. It seems we might need to wait for the maintenance cycle to shift after all.

Patrick Massot (Nov 19 2024 at 12:01):

Utensil Song said:

Tree-sitter based one for Neovim

I don’t think this is accurate.

Utensil Song (Nov 19 2024 at 12:03):

Patrick Massot said:

Utensil Song said:

Tree-sitter based one for Neovim

I don’t think this is accurate.

It seems to use tree-sitter as a first pass, then use information from the LSP for semantic highlighting.

Patrick Massot (Nov 19 2024 at 12:09):

Very interesting, I had no idea! The tree-sitter approach looks so primitive compared to what Lean allows, but I guess it makes sense as a first pass.

Utensil Song (Nov 19 2024 at 12:14):

Yes, for the same reason vscode-lean maintains a TextMate-like lexer, though it's missing some newer features, because the user usually sees the LSP highlighting very soon.

Also, the link above is for the tree-sitter queries; the full parser is at https://github.com/Julian/tree-sitter-lean, which is a bit more sophisticated.


Last updated: May 02 2025 at 03:31 UTC