Zulip Chat Archive
Stream: general
Topic: Reap tactic related
Yutong Wang (Aug 27 2025 at 06:31):
Discussion thread for #announce > Reap!: Yet another tactic for theorem-proving and #announce > `reap`: General neural tactic now public available
Jared green (Aug 27 2025 at 07:18):
@Terence Tao this might be something to make a video about
Niels Voss (Aug 27 2025 at 07:21):
This looks really cool! I notice that in the video you provide, you have a reap tactic by itself that closes the proof without offering a replacement. Is this tactic deterministic and stable, and does it rely on an LLM call or internet access? I'm just wondering if when you close a proof with reap how likely it is to fail at some point in the future. (Of course, these concerns don't apply to the reap? tactic)
Yutong Wang (Sep 24 2025 at 04:59):
Niels Voss said:
This looks really cool! I notice that in the video you provide, you have a reap tactic by itself that closes the proof without offering a replacement. Is this tactic deterministic and stable, and does it rely on an LLM call or internet access? I'm just wondering if when you close a proof with reap how likely it is to fail at some point in the future. (Of course, these concerns don't apply to the reap? tactic)
You are right. Currently, reap does suffer from the non-determinism of LLM generation. We are working on a caching system to make it at least more stable. For now, reap? is more transparent and is of course the better one to try.
Malvin Gattinger (Sep 30 2025 at 13:25):
I think the readme and install instructions should clarify what the tactic actually does, and answer questions similar to what Niels asked above:
- what LLM is used, who runs it and where? (I guess it is not local.)
- does it work when I have no internet connection? (I guess no.)
- what API calls does it make?
- what data from my project/machine gets sent to where?
etc.
In general I would expect/like some clarification that these neural tactics are fundamentally different from those like aesop or grind.
Yutong Wang (Sep 30 2025 at 14:01):
Hi Malvin, thanks a lot for raising these questions — they’re very helpful and we’ll make sure the README/installation instructions clarify them. Let me try to summarize here:
- Which LLM is used / who runs it? The backend uses our own trained model called REAL-Prover. The 7B version is open-sourced on HuggingFace (link), while the hosted version we provide runs on our research cluster.
- Does it require internet access? By default, yes: the default endpoint is our hosted cluster. However, you can override this by setting tacticgenerator.llm_endpoint to a locally served instance of REAL-Prover (see the sketch below); in that case, no internet connection is needed.
- What API calls are made? When you call reap to generate the next tactic, it sends the current proof state to the model endpoint you’ve configured (by default, our hosted cluster).
- What data is sent? Only the proof state at the point of the call. No file paths, user identifiers, or other project metadata are transmitted. On our side, this data is processed statelessly by the model. In other words, it remains anonymous to both the model and us.
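For illustration, here is a minimal sketch of pointing reap at a local endpoint. It assumes tacticgenerator.llm_endpoint is exposed as a regular Lean option (an assumption; it might instead be configured elsewhere) and uses a hypothetical local URL for a self-hosted REAL-Prover instance, e.g. behind an OpenAI-compatible server such as vllm:
import Reap

-- Hedged sketch: `tacticgenerator.llm_endpoint` is the option named above;
-- the URL below is a hypothetical locally served REAL-Prover instance, so no
-- proof state leaves the machine.
set_option tacticgenerator.llm_endpoint "http://localhost:8000/v1"

example (n : Nat) : n + 0 = n := by
  reap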
Yutong Wang (Sep 30 2025 at 14:17):
Basically, the reap tactic is composed of neural premise selection and tactic generation:
- The neural premise selector is specially trained to capture semantic similarity (which defs/lemmas are most helpful for the current state), so it can retrieve more accurate and comprehensive premises than classical selectors.
- The neural tactic generator learns to solve states by training on a large corpus of Lean code and can generate more versatile tactics, including term-style steps.
Intuitively, it may produce more human-like tactics and can push through some harder states that classical rule-based tactics cannot.
Floris van Doorn (Sep 30 2025 at 14:24):
I noticed that Reap is not yet in the Lean reservoir.
Could you add the Apache 2.0 license to it (and all its dependencies)?
Yutong Wang (Sep 30 2025 at 14:33):
Floris van Doorn said:
I noticed that Reap is not yet in the Lean reservoir.
Could you add the Apache 2.0 license to it (and all its dependencies)?
Thanks very much. I've added Apache 2.0 license to Reap and its dependencies. Would there be anything else I need to do to add it to reservoir?
Floris van Doorn (Sep 30 2025 at 14:39):
I think not, you should satisfy all other criteria: https://reservoir.lean-lang.org/inclusion-criteria
Floris van Doorn (Sep 30 2025 at 15:00):
I tried to play a little with reap, some observations:
- Enabling reap is super easy and convenient: just add the dependency, import Reap, and it works. This is really nice!
- reap is nondeterministic, so it's useless to keep it in your main proof script. So I think reap should just do what reap? currently does: give suggestions for replacements.
- It seems that reap doesn't do any caching (if I rerun the tactic I get different results). It might be nice to reduce the number of calls on your side by doing some kind of caching (I think polyrith did do some caching, so maybe you can reuse their code?)
- It seems that reap often doesn't use the "most useful" suggestion. When I write
example {α} (s t : Set α) (h : s ⊆ t) (h2 : t ⊆ s) : s = t := by
  reap?
this will usually give 2 suggestions, with the first of them solving the goal. But if I write reap, it will almost always just apply ext (the second suggestion).
Yutong Wang (Sep 30 2025 at 15:22):
reap will just choose the first suggestion that compiles (with no re-ranking; I think this should be refined), while reap! will by design try to choose one that closes the goal within a single step.
Cache support is important and we will also work on this. Beyond reducing the number of API calls, it should make some requests lightning fast.
Non-determinism is an issue. I think reap? and reap + proof search may be the answer to it?
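For instance, reusing Floris's example above (illustrative only, based on the behaviour just described rather than a guaranteed outcome):
import Mathlib
import Reap

-- Illustrative sketch: per the description above, `reap` keeps the first
-- suggestion that compiles (often `ext` here), while `reap!` should prefer a
-- suggestion that closes the goal in a single step.
example {α} (s t : Set α) (h : s ⊆ t) (h2 : t ⊆ s) : s = t := by
  reap!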
Yutong Wang (Oct 12 2025 at 16:23):
Hi guys, here are some updates about reap:
Proof search
Proof search is now available: you can use reap!! to initiate a proof search. This is done by taking advantage of aesop.
import Mathlib.Algebra.Order.Ring.Star
import Mathlib.Analysis.Normed.Ring.Lemmas
import Mathlib.Data.Int.Star
import Reap
theorem aux {m n : ℕ} (h₀ : m ∣ n) (h₁ : 2 ≤ m) (h₂ : m < n) : n / m ∣ n ∧ n / m < n := by
  -- This can be solved by `reap!!`
  reap!!
A scoring mechanism based on log probabilities is implemented for better best-first search, and openai_client has been updated to support this (to capture the logprob struct from any OpenAI-compatible server). I believe this may help someone who is looking for this feature.
Yutong Wang (Oct 12 2025 at 16:27):
Premise selector
reap comes with a powerful premise selection engine named LeanSearch-PS, which is trained on Mathlib and an augmented corpus. You can now access it through the native Lean.PremiseSelection interface:
import Mathlib
import Reap
-- Set `reap` as the default premise selector
set_premise_selector reapSelector
example (φ : G →* H) (S T : Subgroup G) (hST : S ≤ T) : map φ S ≤ map φ T := by
  suggest_premises
I believe one can build on this for other purposes, e.g. feeding the selected premises to grind or other local tactics.
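As a hedged illustration of that idea (not a claim about what the selector actually returns for this goal), a selected premise can be handed straight to an ordinary local tactic; Subgroup.map_mono below just stands in for whichever lemma the selector surfaces:
import Mathlib
import Reap

-- Set `reap` as the default premise selector (as above)
set_premise_selector reapSelector

-- Illustrative only: `Subgroup.map_mono` stands in for a premise the selector
-- might surface here; any returned lemma could likewise be fed to `simp only`,
-- `grind`, or similar.
open Subgroup in
example {G H : Type*} [Group G] [Group H] (φ : G →* H) (S T : Subgroup G)
    (hST : S ≤ T) : map φ S ≤ map φ T := by
  exact Subgroup.map_mono hST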
(deleted) (Oct 12 2025 at 16:28):
logprob isn't extinct? I'm pretty sure OpenAI doesn't return logprob anymore. Do other "OpenAI-compatible" servers still do it?
(deleted) (Oct 12 2025 at 16:28):
In scare quotes, because there has never been a clear OpenAI API standard. And other servers almost always diverge from OpenAI in some way
(deleted) (Oct 12 2025 at 16:32):
I'm asking this question in part because I'd like to know when this feature can be useful. You can direct me to LLM APIs that return logprob
Yutong Wang (Oct 12 2025 at 16:33):
Huỳnh Trần Khanh said:
logprob isn't extinct? I'm pretty sure OpenAI doesn't return logprob anymore. Do other "OpenAI-compatible" servers still do it?
I'm not sure, but it seems that the official OpenAI API still supports this (OpenAI Platform). For my routine use case, vllm basically provides exactly the same response format as the official API.
Yutong Wang (Oct 12 2025 at 16:39):
logprobs are partially helpful in proof search, since you can estimate how confident the tactic generator is about a generated tactic and then use that as guidance during tree search. (Of course, the more ideal way is to have a so-called reward model to evaluate each resulting proof state :slight_smile: )
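To make that concrete, here is a minimal sketch of the scoring idea, not the Reap implementation (Candidate, score, and best? are illustrative names): rank partial proofs by the sum of the per-step log probabilities, i.e. the log of the joint probability the model assigns to the branch, and expand the most confident one first.
structure Candidate where
  tactics  : List String   -- tactic steps generated so far
  logprobs : List Float    -- model log probability of each step

/-- Higher total log probability means the model is more confident in this branch. -/
def score (c : Candidate) : Float :=
  c.logprobs.foldl (· + ·) 0.0

/-- Pop the most promising candidate from the search frontier (best-first step). -/
def best? (frontier : List Candidate) : Option Candidate :=
  frontier.foldl (fun acc c =>
    match acc with
    | none   => some c
    | some b => if score c > score b then some c else some b) none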
Yutong Wang (Oct 12 2025 at 16:48):
By the way, installation changed a little bit: one should always place the reap dependency before that of mathlib in lakefile.toml (see the Readme for lakefile.lean) to avoid breaking the cache system.
[[require]]
name = "reap"
git = "https://github.com/frenzymath/reap.git"
rev = "main"
[[require]]
name = "mathlib"
scope = "leanprover-community"
I am still not sure why the reverse order won't work; it should be related to the newly added aesop dependency.
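For reference, a sketch of the same ordering in lakefile.lean using standard Lake require syntax (the package name is hypothetical; the Reap Readme remains the authoritative source):
import Lake
open Lake DSL

package myProject  -- hypothetical package name

-- `reap` must still come before `mathlib`, mirroring the TOML ordering above
require reap from git "https://github.com/frenzymath/reap.git" @ "main"

require mathlib from git "https://github.com/leanprover-community/mathlib4.git"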
Jannis Limperg (Oct 12 2025 at 16:53):
Your package likely depends on a different Aesop commit (recorded in lake-manifest.json) than Mathlib. Afaik there is currently no good way to handle this situation other than manually synchronising your deps with Mathlib.
Yutong Wang (Oct 12 2025 at 16:56):
Jannis Limperg said:
Your package likely depends on a different Aesop commit (recorded in lake-manifest.json) than Mathlib. Afaik there is currently no good way to handle this situation other than manually synchronising your deps with Mathlib.
I tried to keep exactly
[[require]]
name = "aesop"
scope = "leanprover-community"
rev = "master"
the same as mathlib, but it won't work. Should I directly modify lake-manifest.json to do the manual sync?
Jannis Limperg (Oct 12 2025 at 16:59):
Yeah. Your lakefile fixes only a particular branch, but not a particular commit, whereas lake-manifest freezes both. So if you and Mathlib call lake update aesop at different times, you may get different commits in lake-manifest.
Yutong Wang (Oct 12 2025 at 17:45):
I think this solution is kinda weird if I need to support different mathlib release tags (master, v4.23.0, v4.24.0-rc1, etc.). Would I need to manually match lake-manifest.json for each release tag? This seems to make lake package management more confusing.
Jannis Limperg said:
Yeah. Your lakefile fixes only a particular branch, but not a particular commit, whereas lake-manifest freezes both. So if you and Mathlib call lake update aesop at different times, you may get different commits in lake-manifest.
Jannis Limperg (Oct 12 2025 at 18:13):
Yup. It's not a great UX, but it is what it is right now.
Yutong Wang (Oct 12 2025 at 19:01):
Here is a demo of proof search speed on some random exercises. Sometimes much faster than I expected.
demo
(deleted) (Oct 13 2025 at 03:11):
Man it looks so fast. What model are you using
(deleted) (Oct 13 2025 at 03:13):
It's so extreme.
(deleted) (Oct 13 2025 at 03:14):
Ah the stock Real-Prover?
(deleted) (Oct 13 2025 at 03:15):
You're giving me a good reason to download it. Man... I'm missing out.
(deleted) (Oct 13 2025 at 03:20):
The fastest in the market, and it's freely available too. Can't be grateful enough.
Yutong Wang (Oct 13 2025 at 04:04):
Yeah, it is basically REAL-Prover, but with a bit of a data hotfix and format optimization compared with the vanilla one.
Kim Morrison (Oct 15 2025 at 01:31):
Are any lake exe cache experts or enthusiasts available to help out here?
@Yutong Wang has prepared #29953, which ideally should be preparing the oleans required for the reap branch mentioned above, but is failing in CI.
I think the problem is visible at https://github.com/leanprover-community/mathlib4/actions/runs/18494458606/job/52695367601?pr=29953#step:24:7022, where you can see that lake exe cache put is packaging up all the oleans for Mathlib and its upstream dependencies, but is not doing anything with Reap, and I can't work out why.
Kim Morrison (Oct 23 2025 at 22:55):
I hope that #30818 has now fixed this problem, and we should get oleans for a reap-enabled Mathlib from #29953 shortly.
Asei Inoue (Oct 24 2025 at 01:13):
Mathlib will have reap soon?
Kim Morrison (Oct 24 2025 at 02:34):
Mathlib will have a reap branch soon, with lake exe cache get, in which you can use reap. Whether it ever lands in master is further off.
Kim Morrison (Oct 25 2025 at 05:43):
This is now working. Please use the following steps to get Mathlib with reap installed:
git remote add frenzymath git@github.com:frenzymath/mathlib4.git
git fetch frenzymath
git checkout reap
lake exe cache get
and then try using the reap tactic anywhere in Mathlib.
It would be really helpful to have some "real world" usage reports --- is reap useful as is? I think if we can get a few people to give it a 30 minute try on a current project inside Mathlib, and report back what they found, we may be able to talk about turning it on for everyone.
Snir Broshi (Oct 25 2025 at 16:20):
Kim Morrison said:
I think if we can get a few people to give it a 30 minute try on a current project inside Mathlib, and report back what they found, we may be able to talk about turning it on for everyone.
Tried it now for an hour on some trivial graph theory lemmas, and I'm pretty disappointed.
It keeps suggesting various simp only [...] calls which all contain lemmas already tagged with simp.
Sometimes it closes the goal with an exact, but in those cases exact? suggests the same thing.
So it seems pretty similar to doing simp_all; exact?.
The only "smart" thing I saw it do so far was suggest some cases or induction, which are helpful but obvious suggestions.
image.png
So I think adding cases or induction on inductive structures (such as graph walks in this case) to hammers will get better results than reap.
Specific problems:
- Having both reap and reap? means that whenever I type reap? it performs 2 calls to the model instead of one, because reap is a prefix of reap?.
- When I want to retry reap? because it says "no suggestions" my instinct is to delete the last character and type it again, but this again creates more unnecessary API calls.
- The suggestions only show the first goal a suggestion creates, which is misleading. It looks like this suggestion solved almost all of the cases, but actually it creates 3 goals and is useless.
image.png
Patrick Massot (Oct 26 2025 at 11:42):
Could we have the same kind of effort to have a mathlib with Canonical and LeanHammer instead?
Niels Voss (Oct 26 2025 at 16:10):
In the long term, would it be possible to allow users to install their own tactics like canonical or reap without requiring a new branch of Mathlib to be made and maintained for everyone?
Kim Morrison (Oct 27 2025 at 00:16):
Niels Voss said:
In the long term, would it be possible to allow users to install their own tactics like canonical or reap without requiring a new branch of Mathlib to be made and maintained for everyone?
Yes, it's long been on the agenda, but no near term plans, unfortunately.
Kim Morrison (Oct 27 2025 at 00:22):
Patrick Massot said:
Could we have the same kind of effort to have a mathlib with Canonical and LeanHammer instead?
Sure, happy to have a go. I hadn't appreciated that there was demand for these, as I hadn't seen practically useful results. But I'm happy to hear that others are finding them useful!
Could someone please review/merge #30844? It is on the critical path to providing Canonical on master.
Kim Morrison (Oct 27 2025 at 01:03):
@Patrick Massot, as you'll see on the Canonical thread we're making good progress on that front.
However LeanHammer is not "mathlib ready", mostly due to their difficulties keeping up to date with recent versions. (But it also has much heavier dependencies, including auto and duper, which I am highly skeptical will ever be in a state we could require into Mathlib.) I'm happy to help out with integration when that is resolved.
Kim Morrison (Oct 27 2025 at 01:15):
(Further on LeanHammer, some of those dependencies still require "precompileModules", which doesn't really work at present.)
Robin Carlier (Oct 27 2025 at 10:54):
Kim Morrison said:
Yes, it's long been on the agenda, but no near term plans, unfortunately.
Is there even an RFC somewhere that we can upvote? I am always a bit puzzled by the number of things mathlib "requires", and I'd really wish for a way to let people cherry-pick which "development goodies" (packages that do not contribute to actually checking a single proof, but may help with writing them) they actually want, instead of mathlib coming with a bunch of "you-might-like-it" packages that everyone (also downstream deps) has to download, and that can potentially break at releases, go unmaintained, etc.
Kim Morrison (Oct 28 2025 at 00:06):
It's a big project, sorry.
Yutong Wang (Oct 30 2025 at 09:17):
Snir Broshi said:
Kim Morrison said:
I think if we can get a few people to give it a 30 minute try on a current project inside Mathlib, and report back what they found, we may be able to talk about turning it on for everyone.
Tried it now for an hour on some trivial graph theory lemmas, and I'm pretty disappointed.
It keeps suggesting various simp only [...] calls which all contain lemmas already tagged with simp.
Sometimes it closes the goal with an exact, but in those cases exact? suggests the same thing.
So it seems pretty similar to doing simp_all; exact?.
The only "smart" thing I saw it do so far was suggest some cases or induction, which are helpful but obvious suggestions.
image.png
So I think adding cases or induction on inductive structures (such as graph walks in this case) to hammers will get better results than reap.
Specific problems:
- Having both reap and reap? means that whenever I type reap? it performs 2 calls to the model instead of one, because reap is a prefix of reap?.
- When I want to retry reap? because it says "no suggestions" my instinct is to delete the last character and type it again, but this again creates more unnecessary API calls.
- The suggestions only show the first goal a suggestion creates, which is misleading. It looks like this suggestion solved almost all of the cases, but actually it creates 3 goals and is useless.
image.png
Thanks for trying it out and providing this valuable feedback!
The current reap is mainly tested on algebra-related use cases. I think there are indeed some conceivable performance gaps in certain domains (combinatorics, graph theory, etc). This may arise from:
1) Relatively sparse coverage in mathlib4 on these topics (the current reap version cutoff is v4.16.0), and our data augmentation regime (expert iteration) mainly takes in sources from general algebra-related domains.
2) The tactic generation system currently does not consume anything other than the current proof state, which means it is not aware of the local theorems/premises available in the current environment (project). This sacrifices some of its potential to resolve obvious states.
For 1), since Lean resources about combinatorics are much more abundant than 2 years ago, reap could be much smarter after the next iteration once it has taken in those resources. Meanwhile, could you try reap!! to conduct a proof search and see whether generating multiple tactics is more useful?
For 2), we are working on ways to securely extract information from the user's local environment, but it will involve some trade-off in privacy (relative file paths and comments might be sent to our cluster).
- reap and reap?: I think one of reap and reap? will eventually be removed for simplicity, since reap is not actually useful in proof writing.
- Unnecessary API calls: We are solving this by setting up a client-side and server-side cache system. Perhaps adding a button for retrying generation would also be friendlier in this case?
- Goal display: This is indeed misleading. reap should at least show the types of all unsolved goals, which will not take up much space. I can work on this issue.
Again, this is very good feedback (at least we never had a chance to test reap on graph theory problems before). We can work to get it right.
Last updated: Dec 20 2025 at 21:32 UTC