Zulip Chat Archive

Stream: lean4

Topic: Duper question: Saturation

Vlad Tsyrklevich (Oct 30 2024 at 17:06):

cc @Josh Clune

I'm using Duper and have a basic question: I was copying and pasting a call to duper with a specific portfolioInstance set and changing the underlying problem it's solving. Duper then returned a message saying that it was saturated. Changing the call to duper? or getting rid of that portfolioInstance lead to a timeout instead.

I only have a faint understanding of the mechanism by which ATPs work, but I had the impression that e.g. using Vampire that hitting saturation means that the underlying goal has been refuted because it's decision procedure is complete. Is this the case for duper as well? I am asking because I believe the underlying goal should actually be provable, and Vampire confirms that thinking, and I am wondering if my translation between TPTP and Lean code is correct. (I would test the TPTP directly with duper, but the resulting TPTP times out so it's inconclusive. Also I have to remove universal quantifiers and I'm not 100% sure if it's actually the same instance in the end.)

Also, more of a comment than a question, but it would be cool if duper? returned a more complete picture of how to reach the final goal. This would be great for optimizing large amounts of calls that may otherwise be slow even with the specific portfolioInstance, but I also note it because I see that calling duper? with some unnecessary hypotheses does not seem to weed them out from the final suggested result for me, though maybe that is because they end up somehow needlessly contributing to the final resolution?

Josh Clune (Oct 30 2024 at 17:39):

It's true that when a complete saturation prover achieves saturation, that indicates the underlying goal has been refuted. I wouldn't claim that Duper is complete (there are a lot of subtle things that can go wrong in practice), though it is based on a calculus that has completeness properties for FOL and HOL.

One thing to note about Duper is that calling duper or duper? without setting a portfolioInstance will cause Duper to attempt the problem with multiple option configurations (setting the portfolioInstance fixes the option configurations). Some of these option configurations are intentionally incomplete because they exclude rules that are necessary for completeness but are expensive in practice.

Without knowing more specifics about the portfolioInstance you're using or the problem you're trying to solve, I suspect the most likely thing that's going on is the portfolioInstance you're using is one of the ones that tends to be faster on most problems because it excludes some rules that are necessary to solve a minority of problems (if you're using portfolioInstance 1 or 2, this is almost certainly what is happening). This would explain why one instance would saturate while others time out.

Another possibility is that you ran into a bug that causes Duper to miss a proof that it theoretically ought to find. This is also totally plausible.

When calling duper or duper? without setting the portfolio instance, Duper can throw a different error Duper failed to solve the goal and determined that it will be unable to do so with the current configuration of options and selection of premises. If this happens, then Duper saturated or failed on all of the option configurations in its portfolio mode. If this happens on a problem that Vampire can prove, then I would expect you probably found a bug.

Josh Clune (Oct 30 2024 at 17:43):

As for the suggestion regarding modifying duper?, I agree that would be cool and I might add an option to make duper? behave more like that in the future. The reason duper? doesn't currently weed out unnecessary hypotheses is that during proof search, it is possible for Duper to use hypothesis A to determine it doesn't need to look at hypotheses B and C, resulting in the presence of hypothesis A improving the proof search even if A doesn't appear in the final proof. It's hard to distinguish this situation from one in which A, B, and C are all totally irrelevant.

It would be possible for duper? to suggest only using premises that appear in the final proof that it finds, but this could result in behavior where duper? solves a goal but the tactic it suggests doesn't. That would be bad and confusing, so although the current behavior doesn't filter irrelevant premises, it is more reliably capable of reproducing whatever proof duper? found.

Vlad Tsyrklevich (Oct 31 2024 at 09:55):

Josh Clune said:

if you're using portfolioInstance 1 or 2, this is almost certainly what is happening

Thank you for the thorough clarification. It was indeed option 2, and I'll be more cautious about not copying and pasting duper calls with portfolio instances attached. Also thanks for noting the error message to look out for saturation on non-portfolioInstance calls, that is useful.

Vlad Tsyrklevich (Oct 31 2024 at 09:58):

Josh Clune said:

it is possible for Duper to use hypothesis A to determine it doesn't need to look at hypotheses B and C, resulting in the presence of hypothesis A improving the proof search even if A doesn't appear in the final proof

Ah, yes. That makes sense why it would require additional work to determine if a premise was actually used in a given path.

Vlad Tsyrklevich (Oct 31 2024 at 10:19):

Having given a little more thought to my original idea, premise minimization is really not the desired end-goal. I think what I would like is to be able to extract proofs out of duper entirely so that subsequent calls are fast (I'm assuming that generally the execution time of the extracted proof is minimal compared to the search.) . This is fundamentally what I was working on that inspired me to look at duper: taking completed Vampire proofs and turning them into Lean code. In the Vampire case, it's search is actually very fast, though when turning off certain features to make the proof conversion easier, even it starts to take hundreds of ms for some problems.

Vlad Tsyrklevich (Oct 31 2024 at 10:21):

I have no idea how feasible that is, nor if others would want the same feature. In my case where I'm generating lots of proofs that aren't meant to be read by humans, the readability of the output is not the highest priority. That may be different in something like mathlib, though even there an editor integration to collapse regions marked as auto-generated could be desirable for the speed-up.

Josh Clune (Oct 31 2024 at 19:11):

I see. I think the usual recommendation for things like this would be to use show_term [tactic you want to generate the term of], but I don't know how much work it would be to get show_term to successfully extract the all proof terms Duper can produce. Out of the box, show_term runs into an immediate problem reprinting the internal names Duper uses for its clauses, and though this could be fixed relatively easily, show_term can run into more fundamental problems on some of the more complicated proofs Duper can produce, the specifics of which I'm not too well-versed in.

Last updated: May 02 2025 at 03:31 UTC