Zulip Chat Archive

Stream: Machine Learning for Theorem Proving

Topic: DSL for theorem proving?


(deleted) (Aug 05 2025 at 07:17):

https://gist.github.com/huynhtrankhanh/b9b5f0bca44f3d1aa6bd7148ece7f710

I'm trying to make a DSL for theorem proving in a limited domain. However, DeepSeek R1, Gemini 2.5 Pro and Claude Opus are all unable to generate a correct proof. What should I do here? Can I tweak the prompt?

(deleted) (Aug 05 2025 at 07:18):

I would like to apologize in advance for the broken Chinese.

(deleted) (Aug 05 2025 at 08:09):

Do I need Gemini Deep Think? I feel I might need it. It is my only hope. My language is so clearly specified.

(deleted) (Aug 05 2025 at 09:00):

UPDATE: the answer is indeed just use a better reasoning model, and let it think for around 11min.

(deleted) (Aug 05 2025 at 09:01):

The community could steal a page from this. If you have a massive number of theorems to prove, it's possible to make a DSL and make a strong reasoning model prove theorems with the DSL.

GasStationManager (Aug 05 2025 at 15:41):

Looks interesting, thanks for sharing!

I'm trying to understand what is the advantage of this DSL, over doing the proof in Lean. Is it more powerful tactics? Even if that's the case, could have defined those tactics in Lean. Perhaps it is because this DSL has a very limited set of choices (compared to Lean), so if the model tries to do a (more exhaustive) search for proof there is better chance of succeeding?

(deleted) (Aug 05 2025 at 15:42):

Yes, the limited set of choices is the very point. So the model doesn't have to waste time doing RAG and risk degrading performance.

GasStationManager (Aug 05 2025 at 22:01):

Huỳnh Trần Khanh said:

UPDATE: the answer is indeed just use a better reasoning model, and let it think for around 11min.

Does the model run the code during that 11 minute think, or is it purely chain-of-thought?

(deleted) (Aug 06 2025 at 00:53):

It is purely chain of thought. I'm working on an MCP server to validate the code so that less expensive models can be used.

(deleted) (Aug 10 2025 at 15:55):

https://github.com/huynhtrankhanh/ComplexNumberProofAssistant/

(deleted) (Aug 10 2025 at 15:55):

Enjoy the MCP server!

(deleted) (Aug 10 2025 at 15:55):

Moving forward this represents a promising model for automated theorem proving.

(deleted) (Aug 10 2025 at 15:58):

We should make more MCP servers for various domains of mathematics to harness the reasoning power of LLMs.

(deleted) (Aug 10 2025 at 16:27):

Oh sorry... I still have to clean up the LLM generated mess. The fight is not over, and AI hasn't replaced programmers yet.

GasStationManager (Aug 10 2025 at 21:13):

Cool! From what I understand, this produces a proof in the DSL; would it be possible to automatically translate it into a Lean proof script?

(deleted) (Aug 11 2025 at 00:55):

Very simple, I just need to modify the pretty printing function... But I have to get the server to work first. It still doesn't run properly.


Last updated: Dec 20 2025 at 21:32 UTC