Zulip Chat Archive

Stream: Machine Learning for Theorem Proving

Topic: ChatGPT can help translate between Coq and Lean


Daniel Windham (Feb 13 2024 at 18:08):

I ran some simple experiments that may be of interest to folks here: https://docs.google.com/document/d/1Dr2FFZMqpPWuhCBRi0nSg_Ji6US_Xm_aGJvQx_5Z2Cc

Cross-posted to Coq discourse: https://coq.discourse.group/t/chatgpt-can-help-translate-between-coq-and-lean/2203

Here's a summary:

Making high-quality proof libraries takes a ton of work. It would be a shame to duplicate all that work for each ITP language, or to get locked out of using a specific ITP because it doesn't have the right libraries. Cheap library reuse across languages would reduce the burden of language silos.

For instance, software verification in Lean could benefit from Coq's advanced libraries for software, while Coq could benefit from Lean's math libraries. Having equivalent proofs implemented in different ITPs also provides clear benchmarks for comparing research results across ITPs.

I ran some simple experiments on using ChatGPT-4 as an assistant to iteratively write and translate between Coq and Lean 4 definitions (not proofs).

These experiments found that ChatGPT significantly reduced the work to (a) write a toy ISA model and example assembly programs in Coq (~400 SLOC), (b) translate these to Lean, (c) compare the Coq vs. Lean programs for semantic differences, and (d) port a Coq bugfix to Lean as an incremental patch. The most surprising result was that ChatGPT accurately detected minor semantic differences between two implementations, one in Coq and the other in Lean, and could patch the Lean code to match the Coq code.

Is this direction interesting to folks here?

Jason Rute (Feb 13 2024 at 19:58):

As a practical tool, yes it is interesting. I’d be interested in if others feel the same way, and if it is as easy to go one direction or the other. (I think ChatGPT often gets Lean 3 and 4 mixed up so I bet it is better having Lean as a source language instead of a target language.)

Jason Rute (Feb 13 2024 at 19:58):

As a research direction, I’ve often wondered about translating between ITPs and their libraries. We can do it the logical way through shallow embeddings, and the probabilistic way though machine learning, but can we combine the two approaches? If one could show a method which accurately and idiomatically translates large chunks from one ITP language to another including the proofs, I think that would be really exciting, especially since proofs can be automatically checked and we can have some guarantees the translation is correct.

Jason Rute (Feb 13 2024 at 19:59):

(But I’m not sure how interesting of a paper it would be to say that ChatGPT is a good assistant for translating the non-proof code. I don’t find this surprising anymore. Useful, yes: surprising, not so much.)

Jason Rute (Feb 13 2024 at 20:06):

Although I guess translating say all of Isabelle’s AFP or Coq’s MathComp to Lean automatically (even without the proofs) would provide a lot of interesting training and testing data for automated theorem proving in the same ways that natural-to-formal auto-formalization papers like ProofNet did.

Kevin Buzzard (Feb 13 2024 at 20:10):

Isn't the issue here that the idiomatic way to do things in two different languages might be very different? So you translate from one prover to another and you may get a bunch of unidiomatic code which is hard to use.

Jason Rute (Feb 13 2024 at 20:10):

I was also about to say I know someone who is really interested in this direction of translating between Lean and Coq. His name is (looks up name in recent messages) @Daniel Windham :laughter_tears:.

Jason Rute (Feb 13 2024 at 20:11):

@Kevin Buzzard I think @Daniel Windham is saying that the ChatGPT translation is very idiomatic.

Jason Rute (Feb 13 2024 at 20:12):

…and correct also, right?

Jason Rute (Feb 13 2024 at 20:13):

I think it would be good to see some side-by-side examples.

Daniel Windham (Feb 13 2024 at 20:20):

I think ChatGPT often gets Lean 3 and 4 mixed up so I bet it is better having Lean as a source language instead of a target language.

In my experience GPT-4 has been reasonably good at Lean 4 now that the training cutoff date is April 2023. There were a couple syntax mistakes it kept making but they were things I could fix with find/replace, and they weren't Lean 3 vs 4 issues.
.

I was also about to say I know someone who is really interested in this direction of translating between Lean and Coq.

Ha! Yes... this was that project.
.

Isn't the issue here that the idiomatic way to do things in two different languages might be very different?

I think it would be good to see some side-by-side examples.

I'm pretty new to ITPs so I'm not the right person to assess what's idiomatic to Coq or Lean. The experiment I ran was equivalent to "Say we want a software engineer to start developing verified software. Can tools like ChatGPT get them to deliver value with just weeks of training?" So it's not a good experiment for studying "idiomatic".

Here are the final programs in Coq and in Lean.

Daniel Windham (Feb 13 2024 at 20:28):

But yes - my takeaway was that the translation was correct, including in iterative workflows where it's important to translate incremental changes that align to existing definitions.

Bas Spitters (Jul 24 2024 at 12:05):

Could these approach help to make the code in PudnamBench more idiomatic?

Daniel Windham (Aug 07 2024 at 13:27):

@Bas Spitters it depends on the gaps, but it would be fast for someone familiar with the PutnamBench problems to try. I'm happy to pair with them if that's of interest. If someone spends more than half an hour on trying this out, they should spend the first half-hour customizing the chatbot to query PutnamBench's existing resources.

I'd expect this approach to be helpful for:

  • Code style refactors that involve common or easy-to-describe syntactic patterns
  • Improving names and naming consistency
  • Translating problem formalizations from one language to another when the user can point out theorems in the target language that match the source language's dependencies

I'd expect it to not help when:

  • Improvements wouldn't be valuable enough to justify manual review the results
  • The logical structure of a formalization needs to be rethought based on the available lemmas or language features

(Sorry for the very delayed reply.)

Bas Spitters (Aug 14 2024 at 14:37):

@George Tsoukalas is this relevant for you?

George Tsoukalas (Aug 14 2024 at 14:54):

Bas Spitters said:

George Tsoukalas is this relevant for you?

I agree with Daniel about when I'd expect it to help. I'd also be curious to know if ChatGPT or any of the big LLMs could reliably catch subtle mistakes in the formalizations - we did not run any experiment like this. I'd be supportive of any efforts to look into this further. However, for the Coq formalizations we are currently revising, we are still planning on doing them manually because it helps build experience in our group, even if it takes longer. Though if there were modifications needed based on simple syntactic changes, I could see this kind of automated fixing to be very useful, especially from a maintenance perspective.

RexWang (Sep 15 2024 at 02:21):

Making high-quality proof libraries takes a ton of work. It would be a shame to duplicate all that work for each ITP language, or to get locked out of using a specific ITP because it doesn't have the right libraries. Cheap library reuse across languages would reduce the burden of language silos.

An idea from CAS: wrapper for ITPs? Since the translation works could become overwhelming as libraries grow.

There are many CAS languages, such as Mathematica, Maple, SymPy, GAP, and more. SageMath builds on top of them and access their combined power through a common interface. One can perform computations on a group in GAP and then transfer the result to Maple for further processing, as long as we know we're dealing with the same abstract objects.

However, the concept of "the same objects" carries more weight in ITPs. Could there be a standard or a wrapper that allows the conversion of proof objects (definitions and propositions) between different ITPs? If we know the objects are equivalent, we could transfer them to another ITP and use its lemmas seamlessly.

It would also be fascinating to explore interactions between ITPs and CAS. CAS excels in computation, while ITPs specialize in proving; the two could complement each other effectively. Currently, LLMs can interface with both, but perhaps this is not the most natural solution. Could there be a more organic integration?

Related papers just found: A Bi-Directional Extensible Interface Between Lean and Mathematica.

Bas Spitters (Sep 15 2024 at 13:14):

One possibility is importing Lean into Coq: https://coq.discourse.group/t/alpha-announcement-coq-is-a-lean-typechecker/581/5

Jonathan Julian Huerta y Munive (Sep 16 2024 at 10:29):

The people working on Dedukti have worked on proof exchange between provers for some time now. If I remember correctly, they are also working on interfacing with Lean. They also know that a big obstacle is maintainability of the approach after it has been completed, and have plans to tackle it.

Mario Carneiro (Sep 16 2024 at 13:08):

For LLM purposes, I think low-level translations between proof systems are useless, and high-level (source-to-source) translations a la mathport qualify as "synthetic data". They are no good for learning how to write idiomatic code


Last updated: May 02 2025 at 03:31 UTC