Zulip Chat Archive

Stream: general

Topic: Ideas for two-way communication between processes


Daniel Fabian (Jul 27 2020 at 00:47):

I'm playing around with z3. Trying to see how far I can push it. Initially, I am just interested in using lean to drive z3 (similar to, say an F# script), later I'll look into doing more fancy stuff.

Because lean doesn't have an FFI and also because it makes is very easy replace z3 with [smtlib-compatible smt solver], I'm not using the z3 api, but rather am communicating with it through pipes in a child process.

Lean seems to be perfectly capable dealing with parsing, AST, etc. but I'm really struggling with IO for all the wrong reasons.

The SMTLib protocol is a two-way communication protocol, not something you send a script to and get a response. You send a few commands, at some point, you may send (check-sat), then z3 is meant to send you sat, unsat or unknown. Depending on what you got, you then have other commands you can execute and deal with. And you can issue some other commands to get back to a different state where you can use other commands again, etc. If used from a REPL, It's really fairly intuitive.

Unfortunately, the interaction really doesn't look at all like any of the IO samples I've seen like reading a file, because it's really interactive. The most natural way for me to represent such a protocol would be a state machine where transitions happen when a message is sent or received.

So far, I haven't found a good way how to implement this. I can't read after each message sent, because I don't know how much data to read and the read is blocking. In a usual language, I'd either do event-based async io, or have a thread used for writing and one for reading. Also the protocol is not line-based, so I can't use that trick either.

Any ideas how to go about it, or have any of you done something similar?

Simon Hudon (Jul 27 2020 at 01:16):

You might need to adapt the I/O API

Daniel Fabian (Jul 27 2020 at 01:18):

as in add API features for async IO?

Simon Hudon (Jul 27 2020 at 01:27):

Either that or read one line of output and nothing more

Jason Rute (Jul 27 2020 at 01:27):

I'm not sure I'm the best person to put an answer here. I don't know anything about z3 or [smtlib-compatible smt solver]. (You might also want to look at what @Gabriel Ebner did with hooking up an SMT solver to Lean.)

However, I can say the following. I've been playing around with using Lean to communicate (in both directions) with python via IO for machine learning.. While your needs are probably different from mine, if you can get a two-way communication going with a Python script, then you can have the script handle the communication with Z3. I think this is similar to what @Rob Lewis and @Minchao Wu did with communicating between Lean and Python (https://leanprover.zulipchat.com/#narrow/stream/113488-general/topic/Calling.20external.20process.20in.20a.20tactic/near/198857265).

As for communicating between Lean and Python, I've found two ways:

  1. Call Lean as a process from within a python script (and have Lean communicate with stdin, stdout)
  2. Have Lean call the Python process as a child processes. (this is probably better for your case).

Also, in my method, you need a communication protocol. Lean doesn't have JSON built in, but I implemented it and implemented some serializers for some of my classes. (I also made a hacky version of JSON, since Lean's string parsing is too slow.

Bryan Gin-ge Chen (Jul 27 2020 at 01:30):

#1083 (a long-abandoned PR that aimed to set up a connection bewteen Lean and Vampire) might be interesting too.

Daniel Fabian (Jul 27 2020 at 01:32):

I'll look at the samples, thanks. As for z3 specifically, I used the old smt2-interface as well as @Simon Hudon 's z3 repo. And they both don't really have a solution just hard-code the interaction. Which, whilst valid, is not very useful for my use-case.

Daniel Fabian (Jul 27 2020 at 01:35):

And as for the idea of using a separate wrapper for the IO, I thought about that too, personally, I wouldn't use python, but F# out of mere preference. But I get the point and it may well be a valid option.

Jason Rute (Jul 27 2020 at 01:35):

Also, as for the interactivity, the application I'm building is quite interactive. Lean keeps tract of the search tree (where every node is a tactic state, the tree allows backtracking), and python (possibly using say TensorFlow) guides Lean's exploration of the search tree. I however don't need async communication. Each call in either direction blocks. It's not really ready for showing yet, but if this interests you and you want a sneak peak, I'd be happy to share my private repo.

Daniel Fabian (Jul 27 2020 at 01:36):

To make it nice, I was thinking of sticking to the protocol, at least more or less, so I don't invent a new protocol.

Daniel Fabian (Jul 27 2020 at 01:37):

well, there's a fairly trivial F# program I could write... One that always responds. That way, every call could be made blocking, because we'd know that there will be a result, eventually.

Daniel Fabian (Jul 27 2020 at 01:38):

plus in such a wrapper, I can do something like base64 encode the payload, so it's always one line.

Mario Carneiro (Jul 27 2020 at 06:38):

Why don't we just add to system.io some kind of io_stream type with operations like open, read, write?

Mario Carneiro (Jul 27 2020 at 06:39):

it should be pretty easy to support that on the C++ side

Mario Carneiro (Jul 27 2020 at 06:43):

Actually, never mind this already seems to exist

Mario Carneiro (Jul 27 2020 at 07:04):

import system.io

#eval do
  child  io.proc.spawn {
    cmd := "tee",
    args := ["/dev/null"],
    stdin := io.process.stdio.piped,
    stdout := io.process.stdio.piped,
    stderr := io.process.stdio.piped},
  io.fs.put_str child.stdin "foo\n",
  io.fs.flush child.stdin,
  s  io.fs.read child.stdout 4,
  io.run_tactic (tactic.trace s.to_string), -- foo
  io.fs.put_str child.stdin "bar\n",
  io.fs.flush child.stdin,
  s  io.fs.read child.stdout 4,
  io.run_tactic (tactic.trace s.to_string), -- bar
  io.fs.close child.stdin,
  exitv  io.proc.wait child,
  guard (exitv = 0)

Mario Carneiro (Jul 27 2020 at 07:05):

It shouldn't be hard to put an SMTLIB wrapper around this interaction

Daniel Fabian (Jul 27 2020 at 08:58):

right, so this is what I have right now. The problem with your line s <- io.fs.read child.stdout 4, is that I don't have 4 and the call is blocking. I can't do a read to end as it's blocking, I can't read a buffer of, say 1024, because it blocks until it either gets that many or reaches EOF, and I can't read-line because some responses are multi-line and I'd have to parse the (partial) response to guess that there should be more data to come.

Apparently how the F* repo is (or at least was, maybe they changed that in the meantime) doing this is by weaving in an (echo "Done!")after every submit they do. Then they know, that the last thing they need to read is a Done! message.

Of all the solutions I've seen so far, this feels like the simplest and moderately clean. Because I can then use just the get_line function over and over, until I read my marker line, knowing that everything I read before would have been the response to the user's request.

And it's safe in the sense, that even in case there is a clash with the string (I'd use a much more cryptic marker to reduce the chances of a conflict), get_line will still not be blocked, but at worst will not return the whole output. So at least the process won't just hang.

Mario Carneiro (Jul 27 2020 at 09:02):

You can tell that SMTLIB output is done with one character lookahead

Mario Carneiro (Jul 27 2020 at 09:02):

so get_char should do the trick

Patrick Massot (Jul 27 2020 at 09:03):

I'm very skeptical that you'll be able to get a robust solutions by fiddling around. Async programming is a topic of its own, and clearly very difficult to get right.

Daniel Fabian (Jul 27 2020 at 09:03):

not really, after all writing to a pipe is not atomic.

Mario Carneiro (Jul 27 2020 at 09:03):

I don't follow

Mario Carneiro (Jul 27 2020 at 09:04):

I assume the SMT prover flushes after each response

Daniel Fabian (Jul 27 2020 at 09:04):

the marker is a pretty clear delimiter. But just a look-ahead can totally read half a message.

Daniel Fabian (Jul 27 2020 at 09:04):

I'd be really, really surprised... a model can be huge.

Daniel Fabian (Jul 27 2020 at 09:04):

so there's a good chance they will write multiple times before the message is over.

Mario Carneiro (Jul 27 2020 at 09:05):

If the response is a keyword like sat\n then the whitespace indicates the end, if it's a composite sexpr like (echo bla bla) then the end paren does the job

Daniel Fabian (Jul 27 2020 at 09:05):

also if the look-ahead is blocking, that's enough to hang the process and if it's non-blocking, you may read nothing because z3 is still computing.

Mario Carneiro (Jul 27 2020 at 09:05):

I looked at some SMTLIB parsers and they just use getchar

Daniel Fabian (Jul 27 2020 at 09:07):

as in call getchar for every single char?

Mario Carneiro (Jul 27 2020 at 09:07):

yeah

Mario Carneiro (Jul 27 2020 at 09:07):

ultimately, that's what the parser does

Daniel Fabian (Jul 27 2020 at 09:07):

that's an awful lot of calls, what's wrong with the marker?

Daniel Fabian (Jul 27 2020 at 09:08):

I was thinking making a function like this: list command -> response

Daniel Fabian (Jul 27 2020 at 09:08):

and after each batch, I'll just weave in the marker.

Mario Carneiro (Jul 27 2020 at 09:08):

Sure, you can do that regardless of how parsing works

Daniel Fabian (Jul 27 2020 at 09:08):

that way we still get good behaviour w.r.t. large batches, because only one message per batch.

Daniel Fabian (Jul 27 2020 at 09:08):

that's not excessive costs.

Daniel Fabian (Jul 27 2020 at 09:09):

and it makes for a very clear delimiter, imo.

Mario Carneiro (Jul 27 2020 at 09:10):

Sure, if the format lets you do that, then that will allow you to use a bigger buffer

Daniel Fabian (Jul 27 2020 at 09:11):

It does and honestly I was fairly re-assured that a big project was using that approach, it suggests that it holds up in practice. Even if feels a bit hacky, it lets us pretend we are doing RPC.

Daniel Fabian (Jul 27 2020 at 09:11):

which is nice from an API point of view.

Mario Carneiro (Jul 27 2020 at 09:12):

At least JSON-RPC uses a Content-Length: n at the start of the response so you don't have to play that game

Daniel Fabian (Jul 27 2020 at 09:13):

right, that's quite nice.

Mario Carneiro (Jul 27 2020 at 09:13):

But this seems like an optimization, I wouldn't bother until the rest of the arch is working first

Mario Carneiro (Jul 27 2020 at 09:14):

Something tells me that repeated getchar calls will be the least of your performance worries

Daniel Fabian (Jul 27 2020 at 09:14):

I can't really change how z3 returns its responses. And in the first phase, I want to just do a z3 binding.

Daniel Fabian (Jul 27 2020 at 09:14):

and ideally not make very strong assumptions on the order of commands, etc.

Mario Carneiro (Jul 27 2020 at 09:14):

and it's pretty easy to code a parser using lots of getchar

Daniel Fabian (Jul 27 2020 at 09:15):

but what do you do if there is nothing to read at all?

Mario Carneiro (Jul 27 2020 at 09:15):

block

Daniel Fabian (Jul 27 2020 at 09:15):

i.e. there just is no response

Daniel Fabian (Jul 27 2020 at 09:15):

well, SMTlib doesn't produce a response for every operation.

Mario Carneiro (Jul 27 2020 at 09:15):

you should know if you expect a response, and if z3 hangs then so should you

Mario Carneiro (Jul 27 2020 at 09:16):

the smtlib protocol determines what you are looking for

Daniel Fabian (Jul 27 2020 at 09:16):

if I do that, then the IO code and command code are very tightly coupled. Because the decision to read or not read depends on the concrete command.

Daniel Fabian (Jul 27 2020 at 09:17):

So I can't abstract that away in a natural way.

Mario Carneiro (Jul 27 2020 at 09:17):

You can have a read_command : io response

Daniel Fabian (Jul 27 2020 at 09:17):

Whereas adding the marker makes every reponse always end in a Done!.

Daniel Fabian (Jul 27 2020 at 09:18):

I was thinking about that one, too, actually

Mario Carneiro (Jul 27 2020 at 09:18):

and then run_smt_cmd : request -> io response will do return response.none on some requests

Mario Carneiro (Jul 27 2020 at 09:18):

and read_command on others

Daniel Fabian (Jul 27 2020 at 09:19):

So you'd code up the whole spec?

Mario Carneiro (Jul 27 2020 at 09:19):

the parts that matter to you, at least

Daniel Fabian (Jul 27 2020 at 09:21):

there's also a config option, though. This lets the user change the behaviour of z3 as to it would write success after every command.

Daniel Fabian (Jul 27 2020 at 09:21):

and that's totally a command the user can send.

Daniel Fabian (Jul 27 2020 at 09:21):

so now the reading of responses even depends on the exact order of commands.

Daniel Fabian (Jul 27 2020 at 09:22):

all of that is possible, but feels like a huge amount of complexity just to avoid a single marker from the sender side.

Daniel Fabian (Jul 27 2020 at 09:22):

do you have any concrete issues with the approach or are you merely offering other ways to achieve the same outcome?

Mario Carneiro (Jul 27 2020 at 09:23):

If you don't want that option, then ignore it

Mario Carneiro (Jul 27 2020 at 09:23):

if you have a grammar of commands to send, you can just skip that one

Daniel Fabian (Jul 27 2020 at 09:24):

I see, so this may become less of an issue once I have more code, then.

Mario Carneiro (Jul 27 2020 at 09:24):

If you actually want to talk in smtlib then one would expect a fully detailed (if incomplete) grammar at some point. Trading sexprs only gets you so far

Daniel Fabian (Jul 27 2020 at 09:24):

At this early stage I'm submitting s-expressions directly

Daniel Fabian (Jul 27 2020 at 09:25):

yes, I definitely agree with that.

Mario Carneiro (Jul 27 2020 at 09:25):

and if you are at the trading sexprs stage then you can just pretend that inconvenient options don't exist and trust the user not to send them

Daniel Fabian (Jul 27 2020 at 09:26):

In the end, though, I think I really prefer the option of adding the marker. Then I can completely hide that inside of the io loop that the user doesn't see and they see much the same thing as they would in a repl.

Daniel Fabian (Jul 27 2020 at 09:27):

and it's simple which is something I like for maintenance.

Daniel Fabian (Jul 27 2020 at 09:27):

and the user can't break things by changing options.

Mario Carneiro (Jul 27 2020 at 09:31):

You should have the parser hooked up to the IO loop anyway (even if the actual mechanism is hidden in some monad transformers), because otherwise you have to buffer more than necessary

Daniel Fabian (Jul 27 2020 at 09:33):

my current thinking was to use get_line repeatedly until the marker and use the parser on the output, possibly per line possibly at the end. And the output of the monad would be a response, not a string.

Daniel Fabian (Jul 27 2020 at 09:35):

it's mostly about avoiding to block accidently without restricting the user's API artificially. So I'm weighing options and trying to compare advantages and issues with any.

Mario Carneiro (Jul 27 2020 at 09:36):

I think that any parse approach can be encompassed in request -> io response, or \Pi r : request, smt_monad (response r) if you want to get fancy

Daniel Fabian (Jul 27 2020 at 09:36):

So far my main concern about the read one char at a time are possibly a huge number of api calls (somewhat minor, I guess) and the complexity to make sure we don't accidently read when there's nothing to read, i.e. the chances of a block.

My main criticism of the "send a marker" approach is that it feels a bit hacky

Mario Carneiro (Jul 27 2020 at 09:37):

I suspect that there are so many buffers between lean and the actual IO that many getchar calls won't actually be a problem

Daniel Fabian (Jul 27 2020 at 09:37):

good to know!

Daniel Fabian (Jul 27 2020 at 09:38):

do you have some further criticism for the marker idea that I need to consider, btw?

Jason Rute (Jul 27 2020 at 11:28):

Mario Carneiro said:

Something tells me that repeated getchar calls will be the least of your performance worries

In my experience the parsing is the slowest part if you do it in pure Lean. I used Lean's parsing library (not to be confused with Lean's parser which is not available inside tactics), and I've found it really slow. I think the slowdown is character-by-character evaluation.

Mario Carneiro (Jul 27 2020 at 11:32):

Even with batched OS calls, you will usually have a character parser to parse that text, and that will be slow. But I don't really see an alternative unless you somehow do the work outside lean

Daniel Fabian (Jul 27 2020 at 11:36):

I don't think it's that bad, actually. Because generally the input is much, much bigger.

Daniel Fabian (Jul 27 2020 at 11:36):

and that's not parsed, but just concatenated.

Daniel Fabian (Jul 27 2020 at 11:37):

Plus, I'm hoping lean4 would make it better in future.

Daniel Fabian (Jul 27 2020 at 11:38):

if parsing becomes the bottle neck, I can always do the parsing in F# and spit out some trivially-parsable thing for lean.

Jason Rute (Jul 27 2020 at 11:39):

@Mario Carneiro That is sort of my point. Parsing is the bottle neck (on the Lean side. Of course, if Lean blocks, Z3 is probably the bottleneck overall). Although, as I think about it, my parsing woes where related to parsing quoted strings. When I changed my protocol so that I didn't have quoted strings (using a bastardized version of JSON), then it was much faster.)

Daniel Fabian (Jul 27 2020 at 11:44):

I mean... technically my parser will deal with quoted strings.

Daniel Fabian (Jul 27 2020 at 11:44):

But i can always put that rule last

Daniel Fabian (Jul 27 2020 at 11:44):

and since in practice, we probably get numbers it may be ok.

Mario Carneiro (Jul 27 2020 at 11:46):

You could structure the parser as a finite (or infinite) state machine

Daniel Fabian (Jul 27 2020 at 11:47):

I read on the repo of SMTlib in F#, that the guy ran 40k test scripts for his parser from the smtlib homepage. Maybe I can do that, too.

Mario Carneiro (Jul 27 2020 at 11:47):

It might be okay to use parser combinators if you liberally sprinkle @[inline] but lean doesn't do nearly as much optimization as haskell to make functional programming style efficient

Daniel Fabian (Jul 27 2020 at 11:48):

that way, it would both be known to work and reasonably efficient.

Daniel Fabian (Jul 27 2020 at 11:50):

anyway, I don't think that it's wasted effort, because either way, we probably want the inductives for commands available in lean. And I can always use an existing F# lib for parsing.

Daniel Fabian (Jul 27 2020 at 11:50):

and then do the trivially-parsable format for lean.

Daniel Fabian (Jul 27 2020 at 11:50):

It would only mean replacing the parser whilst leaving everything else in place.

Mario Carneiro (Jul 27 2020 at 11:51):

Do you have a smtlib sexpr parser?

Daniel Fabian (Jul 27 2020 at 11:52):

I am working on one for lean and there's one for F#.

Daniel Fabian (Jul 27 2020 at 11:53):

the f# one is just an existing lib.

Daniel Fabian (Jul 27 2020 at 11:53):

so at worst, the lean sexpr parser is wasted.

Mario Carneiro (Jul 27 2020 at 11:54):

a sexpr parser is obviously independently useful

Mario Carneiro (Jul 27 2020 at 11:55):

smtlib sexprs are slightly more complicated because they have things like |long atoms| and "weird ""string"" quoting"

Daniel Fabian (Jul 27 2020 at 11:57):

oh I see, then we could make the sexpr parser separately packaged and the SMTlib protocol on top of that.

Daniel Fabian (Jul 27 2020 at 11:57):

as two different things.

Daniel Fabian (Jul 27 2020 at 12:00):

hmm, can you even make an sexpr parser that smtlib uses, then? Don't you have to tokenize differently like the long atoms?

Daniel Fabian (Jul 27 2020 at 12:01):

i mean as n sexpr, (|what does| "this ""tokenize"" as?")

Mario Carneiro (Jul 27 2020 at 12:01):

You probably need a dedicated smtlib style sexpr parser, which can be separated from the smtlib grammar parsing

Daniel Fabian (Jul 27 2020 at 12:01):

right, that makes sense.

Daniel Fabian (Jul 27 2020 at 12:01):

my idea was to make the smtlib sexpr parser first.

Daniel Fabian (Jul 27 2020 at 12:02):

and then just pattern match various s-exprs to build commands.

Mario Carneiro (Jul 27 2020 at 12:02):

you could try to generalize the pure sexpr part, but I think it would make the parser a lot slower

Daniel Fabian (Jul 27 2020 at 12:07):

Ok, so the agreed plan of action for now is:

  • make an smtlib-style sexpr parser
  • run a large test suite
  • if it's good enough, continue, else use some other language of the parsing, simplifying Lean's job
  • after that pattern-match sexprs to build commands

Jason Rute (Jul 27 2020 at 12:20):

If it helps, in that example I sent you, I did something similar, except with JSON instead of sexprs. I had a general purpose JSON parser which read in and parsed JSON to an inductive type json. (Besides the parser, I also built a serializer for the json type.) Then for my API, I made custom structures and inductive types. All serialization and deserialization of the API objects went through the json type. (I even wrote a python script which took a lean structure or inductive type and automatically wrote all the boilerplate to convert to/from json. (I've been spoiled by Google's Protobuf format which automatically turns a protobuf API into whatever language you need, so I wanted to reproduce a little of that magic.)

Daniel Fabian (Jul 27 2020 at 12:32):

right, so I have a "serializer" for the s-exprs that I support.

Daniel Fabian (Jul 27 2020 at 12:32):

but no parser yet.

Daniel Fabian (Jul 27 2020 at 12:32):

It's not really fully-fletched yet, but it's enough to solve sudoku in z3 :P

Daniel Fabian (Jul 27 2020 at 12:33):

in particular, I'm not escaping strings yet, etc.


Last updated: Dec 20 2023 at 11:08 UTC