Zulip Chat Archive

Stream: Machine Learning for Theorem Proving

Topic: hallucination


Abhishek Anand (Dec 11 2023 at 14:29):

Question for those who have used LLM-based provers in their actual work: how bad has the problem of hallucination been? When I tried to use ChatGPT for Coq proofs, I think the most common reason it failed to complete a proof was that it hallucinated lemma names from the library. Although hallucination is hard to solve in general, partly because it is hard to obtain truly unbiased, authoritative sources for arbitrary questions, in the domain of math such problems may be easier to solve: whether a lemma with a given name exists is easy to check. If the hallucination problem is bad even for non-ChatGPT LLMs for theorem proving, has anyone tried hallucination-reducing approaches like this one? https://arxiv.org/abs/2311.08401
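
For concreteness, a minimal Lean 4 sketch of the kind of check meant here, namely asking whether a name the model produced actually exists. The names below are only illustrative (`Nat.add_comm` is a real core lemma, `Nat.add_commm` stands in for a hallucinated one):

```lean
import Lean
open Lean

-- `#check` already catches hallucinated names: it fails on unknown constants.
#check Nat.add_comm        -- ok, the lemma exists
-- #check Nat.add_commm   -- error: unknown constant

-- Programmatic version, usable to filter a model's suggestions:
#eval show CoreM Bool from do
  let env ← getEnv
  return env.contains `Nat.add_comm   -- `true`; a hallucinated name gives `false`
```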

Siddhartha Gadgil (Dec 12 2023 at 01:40):

At least for autoformalization of statements, a little prompt engineering gets rid of a lot of the hallucination. Specifically, if GPT-4 sees examples with the correct names, it largely uses the correct names.

Also, post-processing can fix the most common error: using Lean 3 names.
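
A sketch of what such a post-processing pass could look like; this is an assumption about the approach, not the actual setup, and the name pairs below are only illustrative:

```lean
-- Hypothetical post-processing sketch: rewrite a few common Lean 3 names
-- to their Lean 4 / Mathlib counterparts in a generated proof or statement.
def lean3ToLean4 : List (String × String) :=
  [("nat.succ_le_iff", "Nat.succ_le_iff"),
   ("list.length_append", "List.length_append")]

def fixNames (s : String) : String :=
  lean3ToLean4.foldl (fun acc (old, new) => acc.replace old new) s

#eval fixNames "exact nat.succ_le_iff.mpr h"
-- "exact Nat.succ_le_iff.mpr h"
```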


Last updated: Dec 20 2023 at 11:08 UTC