Zulip Chat Archive

Stream: lean4

Topic: Feasability of creating a compiler in Lean


Jesse Slater (Feb 20 2023 at 20:36):

I am working on a university research project, and we are considering using Lean. Our goal is to create a domain specific programming language which compiles to a program in an unusual model of computation. The specific details are not that important, but we want to be able to prove that the output of our compiler correctly computes the program.

We like the idea of using Lean because we would be able to write the compiler and do the proof in the same language. We could also write the language in lean macros, so that it can be evaluated as a lean program as well as being compiled and run under our computation model. Then our goal would be to prove that running it as a lean program has the same effect as running the compiled code in our computation model.

Our past work in the area has been done with Isabelle, and none of us have worked much with Lean before. The language and compiler we are writing will be very simple. Does Lean seem like a reasonable choice for this project? How feasible would it be to write a compiler and prove that it is correct in Lean? Are there libraries that could help with that task? Are there any important things we should take into consideration in making this decision?

I know this is a very abstract question, but I figured that just asking it directly would be worth a shot. I have been learning Lean on my own for the past couple months, and so far, I really like the idea, but I thought that maybe one of you experts would be able to give some helpful advice.
Thanks!

Sebastian Ullrich (Feb 20 2023 at 21:45):

You might be interested in our "do unchained" paper and especially its supplement for what could reasonably be called a "simple verified compiler for a simple (embedded) language". The setup for a stand-alone compiler for a stand-alone language might look a little different (I'm not aware of such a project in Lean 4 yet, otherwise I would have pointed to it instead), but feasibility in Lean should be comparable. In particular, Aesop should be similarly helpful for automation and provide a reasonable replacement for Isabelle's auto.

Jesse Slater (Feb 20 2023 at 23:05):

Thanks, this is great!

James Gallicchio (Feb 21 2023 at 04:41):

FWIW, I'm currently working with a friend on a provably correct compiler for a C-like language, and it's been mostly smooth so far. Programming in Lean feels very much like programming in ML, which is a joy.

One sticking point has been proving termination of some of the frontend passes, because termination arguments are kinda complicated when compiling an imperative language. But there are workarounds, and there's a good chance it won't be an issue in your case :)

James Gallicchio (Feb 21 2023 at 04:43):

(Also, I think it would be very appreciated to have more people trying to do software verification in Lean, since it becomes easier to invest in building the kind of automation available in other tools)

Andrés Goens (Feb 23 2023 at 07:10):

I can second what @James Gallicchio is saying. We're also using Lean for similar academic projects. On that note, the reactor lean implementation from @Marcus Rossel sounds very similar (compiling an unusual model of computation) and he ended up bypassing a lot of the java-based compiler of the main project and using lean syntax extension's instead. We have a specification of the model too, but they're not yet connected (i.e. the compiler/runtime is not verified) but @Marcus Rossel did prove some properties of the runtime and provided mechanisms to prove properties about individual programs. It sounded like Lean was mostly straightforward about it but I didn't do it so I might be painting the wrong picture. Marcus probably has something more insightful to say about his experience.

Alex Keizer (Feb 23 2023 at 13:06):

James Gallicchio said:

(Also, I think it would be very appreciated to have more people trying to do software verification in Lean, since it becomes easier to invest in building the kind of automation available in other tools)

Could you expand a bit on what kind of automation other tools have that we might want for this kind of software verification?

I'm trying to settle on a topic for a PhD, my supervisor is mostly interested in metaprogramming and developping automation for Lean, I am very interested in software verification, so building that sort of automation seems like it would be a very good topic for me to pursue!

Jannis Limperg (Feb 23 2023 at 13:46):

Look at the big Coq projects: Iris, RustBelt, CompCert, DeepSpec, etc. I'm not super familiar with that area, but my impression is that they all have some tooling component. Just be aware that doing the same things but in Lean (and maybe slightly better) is not a very attractive research proposal.

In general, Coq is leagues ahead in low-level program verification. They already have a separation logic for C-style languages; they already have a verified optimising C compiler; they already have formalised memory models and instruction set semantics and so on. Each of these projects represents a major time investment. I'm not sure there's a strong 'business case' for reproducing all this effort in Lean.

Marcus Rossel (Feb 23 2023 at 15:13):

@Jesse Slater Andrés Goens said:

It sounded like Lean was mostly straightforward about it but I didn't do it so I might be painting the wrong picture. Marcus probably has something more insightful to say about his experience.

Lean was indeed an awesome language for this project. Especially the parsing and code generation was so easy using Lean's syntax and macro infrastructure. And I think it should also be comparatively easy to handle errors / produce good error messages.

Andrés Goens (Feb 23 2023 at 15:55):

Jannis Limperg said:

Look at the big Coq projects: Iris, RustBelt, CompCert, DeepSpec, etc. I'm not super familiar with that area, but my impression is that they all have some tooling component. Just be aware that doing the same things but in Lean (and maybe slightly better) is not a very attractive research proposal.

In general, Coq is leagues ahead in low-level program verification. They already have a separation logic for C-style languages; they already have a verified optimising C compiler; they already have formalised memory models and instruction set semantics and so on. Each of these projects represents a major time investment. I'm not sure there's a strong 'business case' for reproducing all this effort in Lean.

I agree that Coq has a lot more infrastructure, because it's been around for much longer and people have used it for a lot of these kinds of projects. I'd argue, however, that building infrastructure in Lean has two attractive properties that speak for it:

  • Metaprogramming: Ltac is pretty difficult to use and not as expressive. Having to write any reasonable tactics in Ocaml, on the other hand, makes a lot of cognitive overhead of switching and is a bit more awkward. And then you might want to execute your program, you'll need to do extraction. All that is just easier in Lean. And that's not including the syntax extensions, which are more powerful and IMO a bit more understandable (but that might be biased because I've worked more with lean's).
  • Mathlib: having a "library" of formalized mathematics allows you to reason about sophisticated mathematical structure that might be present in your problem. Maybe you'd want to reason about elliptic curve cryptography or quotients modulo some group action, you could use mathlib for those.

Andrés Goens (Feb 23 2023 at 15:58):

But also, isn't your aesop tactic based on Isabelle's heuristics? And there was enough interesting new things to innovate there, right? I'd conjecture that translating ideas from say Coq or Isabelle into Lean would have enough interesting differences to innovate, not just because of the language but because of how ideas have developed in the meantime

Sebastian Ullrich (Feb 23 2023 at 16:00):

Jannis Limperg said:

Just be aware that doing the same things but in Lean (and maybe slightly better) is not a very attractive research proposal.

Obligatory plug: it's quite nice for student research projects though https://pp.ipd.kit.edu/uploads/publikationen/koenig22masterarbeit.pdf

James Gallicchio (Feb 23 2023 at 17:01):

Ditto to what everyone else said. As for interesting research, I'm currently playing around with Dijkstra monads and using them to generate verification conditions for monadic code in Lean -- getting that working would I think be very interesting research

James Gallicchio (Feb 23 2023 at 17:04):

It also seems like there is a good amt of energy to put towards integrating Lean with proof-producing SMT stuff, which will be a great bit of automation

Jannis Limperg (Feb 23 2023 at 18:28):

Andrés Goens said:

But also, isn't your aesop tactic based on Isabelle's heuristics? And there was enough interesting new things to innovate there, right? I'd conjecture that translating ideas from say Coq or Isabelle into Lean would have enough interesting differences to innovate, not just because of the language but because of how ideas have developed in the meantime

Yes, and I had to constantly convince myself (and ultimately the reviewers, though this was not a problem here) that there was enough novelty in there to call it research. I find this unsatisfying and would prefer to work on stuff that is less derivative. Of course, incremental progress is still valuable, but then I think it's a good idea to consider possible improvements from the beginning. For example, what would be the specific advantage of a Lean Iris implementation over the Coq Iris implementation? (I also completely agree that student projects are a great way to get nontrivial but non-novel things done, if the students are very good.)

Wrt the advantages of Lean over Coq, I would agree that the metaprogramming is much nicer, though Coq's Mtac2 is conceptually similar. (How polished it is I don't know.) Mathlib can, of course, be relevant, but this depends a lot on the specific math needed for each project. (For elliptic curves, Coq's library is pushing 10 years.) Do these advantages outweigh Coq's 40-year headstart in many subfields of program verification? I highly doubt it. Of course this doesn't mean that we shouldn't do program verification in Lean, but it does mean, imho, that it would be prudent to focus on niches where Coq is not so far ahead.

Alex Keizer (Feb 24 2023 at 10:38):

And yet, having aesop seems very beneficial to the Lean community, so even if your work is maybe slightly derivative, it (indirectly) contributes the more novel work others might do in Lean using aesop. Now, whether you are satisfied working on valuable, but not necessarily novel, infrastructure and tooling, instead of more traditionally novel research, that's up to you. But, it feels like a shame that the research community as a whole doesn't seem to appreciate the former quite as much (at least, as far as I can tell, feel free to tell me if I'm wrong).

Alex Keizer (Feb 24 2023 at 10:45):

It makes sense that taking some work done in Coq and doing it in Lean is not very exciting, but if we keep sticking with Coq because it has a 40 year headstart and don't develop the infrastructure in other tools, then this headstart will only keep increasing.

This is definitely just me talking about things I don't know a lot about, but I'm reminded of the current situation with Rust increasingly being used to replace old C or C++ code. A big reason this is feasible, is because of Rust's FFI, allowing new Rust code to interface with existing C/C++ code. I wonder if we could make some tool that allows us to have a similar kind of interface between Coq and Lean, but on a high enough level to also support proofs, kind of like a Foreign Proof Interface? This is obviously much, much harder than normal FFIs, even before we consider how that would even work with tactics and other metaprogramming, but still.

Arthur Paulino (Feb 24 2023 at 11:17):

It depends on your goal, really. If the goal is to do research, then indeed replicating knowledge from one tool to another is not interesting to the academy. At least not for masters or PhD level research. But if your goal is to expand and improve the tools available, detached from the academy, then it makes perfect sense.

Andrés Goens (Feb 27 2023 at 12:12):

IMHO novelty is overrated in academia. Sure it's great to have novel developments, but it's not as important as sometimes people make it out to be. Tools (like software) are valuable contributions that sometimes don't have as much novelty, as we've discussed here. But also papers that reinforce an idea with perhaps new data, or give a clearer exposition of a well-known topic. But, of course, a lot of academics don't see it that way.


Last updated: Dec 20 2023 at 11:08 UTC