Zulip Chat Archive
Stream: general
Topic: Kimina Prover 72B Release Discussion Thread
Bolton Bailey (Jul 10 2025 at 13:00):
Here is a discussion thread for the announcement on Kimina's new release. Also feel free to DM me or other members of the team with comments or questions.
Justin Asher (Jul 10 2025 at 13:27):
Great work. I am impressed by the MiniF2F score, seeing how much smaller the model is than DeepSeek-Prover. I suppose the next step is training the model on more general mathematics? Because my impression is a lot of these custom math / Lean models, while good at benchmarks, struggle with, e.g., autoformalizing arithmetic geometry.
Bolton Bailey (Jul 10 2025 at 13:56):
Yes, models that work with math that goes beyond olympiad-type problems would be great to see. I feel it's a bit of an open question how best to approach that with deep learning, because deep learning requires a lot of data, and it is hard to see how we'd assemble a good data source with an appropriate signal for measuring performance. I liked Jason's suggestion of a dataset of bachelor's/master's-thesis-level problems.
I think for me, the most important next step is to make models easier for Lean users to install and use. I am currently in the woods of Tennessee with a poor internet connection, but when I get back I am going to try reuploading the models we released today to Ollama for people who find that an easier way to install them.
Eric Wieser (Jul 10 2025 at 16:41):
Which "widely used miniF2F benchmark" is being used?
Bolton Bailey (Jul 10 2025 at 17:21):
Eric Wieser said:
Which "widely used miniF2F benchmark" is being used?
Here's a link to the MiniF2F we used. It seems to originate from the DeepSeek-Prover-V1.5 version, but with some additional fixes.
Asei Inoue (Jul 12 2025 at 02:26):
Great! I'd like to use Kimina Prover as a model for Copilot—how can I do that?
(deleted) (Jul 12 2025 at 12:32):
Try implementing an MCP server!
David Renshaw (Jul 12 2025 at 13:12):
Thank you for https://demo.projectnumina.ai/! I used it just now to close a sorry in Compfiles: https://github.com/dwrensha/compfiles/commit/d10f7d516a6cf97e6dacb1c5bfcd9d9903d35a96
Fred Rajasekaran (Jul 12 2025 at 21:58):
On the demo site, I asked it to show (a standard trig-substitution integral)
After some attempts, it appeared to figure out a correct approach in natural language via a substitution (on page 9 in the attached file). However, when it tried implementing the computation in Lean, it got completely stuck in a loop. The next 60 or so pages of its output (in the attached file) are the model stuck in that loop. Do you know what's going on here? It also seems like it didn't even try to implement the trig substitution it had just written in natural language.
Kimina Prover Test.pdf
I gave it the same prompt again (without any natural language this time) and it seemed to get further along with the correct trig substitution, but it wasn't able to produce a complete, error-free Lean proof in the end.
Kimina Prover Test 2.pdf
GasStationManager (Jul 13 2025 at 00:06):
Here's a proof generated by Opus 4, with access to MCP tools LeanTool and LeanExplore:
import Mathlib
open Real Set
open scoped BigOperators
theorem my_favorite_theorem : ∫ x in (-2)..2, (1 / (2 * π)) * sqrt (4 - x ^ 2) = 1 := by
  -- First, let's factor out the constant (1 / (2 * π))
  rw [intervalIntegral.integral_const_mul]
  -- First, let's simplify the integrand using algebra
  have eq1 : (fun x => sqrt (4 - x ^ 2)) = (fun x => 2 * sqrt (1 - (x / 2) ^ 2)) := by
    ext x
    rw [show 4 - x ^ 2 = 4 * (1 - (x / 2) ^ 2) by field_simp; ring]
    rw [sqrt_mul (by norm_num : (0 : ℝ) ≤ 4)]
    norm_num
  conv_lhs => arg 2; rw [eq1]
  rw [intervalIntegral.integral_const_mul]
  -- Now use substitution u = x/2
  have eq2 : ∫ x in (-2)..2, sqrt (1 - (x / 2) ^ 2) = 2 * ∫ u in (-1)..1, sqrt (1 - u ^ 2) := by
    rw [intervalIntegral.integral_comp_div (fun u => sqrt (1 - u ^ 2)) two_ne_zero]
    simp
  rw [eq2]
  rw [integral_sqrt_one_sub_sq]
  -- Now we have: (1 / (2 * π)) * (2 * (2 * (π / 2))) = 1
  simp only [mul_comm, mul_assoc, mul_div_assoc]
  field_simp
  ring
You can check out an example setup in this thread.
Bolton Bailey (Jul 13 2025 at 00:14):
Fred Rajasekaran said:
the model being stuck in a loop
Yeah, I have noticed that this sometimes happens. I think it's not just our models that do this kind of thing, it's a general issue with LLMs (see for example this paper for discussion). As a brief anecdote, one of my first experiences with ML4TP was in training a neural net to prove theorems in Coq and it would iterate the same tactics over and over :shrug: I guess a mystery for researchers to ponder in future work.
GasStationManager (Jul 13 2025 at 00:20):
I do have a more constructive point: I think the optimal length of chain-of-thought reasoning differs depending on whether the model will have access to Lean to fix the proof. Kimina Prover seems to prefer very long chains of reasoning, which is optimal when the goal is to produce a correct proof in one shot. But in "Proof Fixing" mode, the feedback from Lean will be very sparse and very delayed. On the other hand, if we know that we will be able to check with Lean and iterate, it becomes better to shorten the chain of thought in order to increase the density and reduce the latency of Lean feedback. It may even make sense to switch to a draft-sketch-prove style, so that you get incremental feedback as you build the proof piece by piece.
Bolton Bailey (Jul 14 2025 at 00:25):
Ok I have now uploaded the 1.7B and 8B distillations to ollama. I'm still having a bit of trouble uploading the 72B model due to its sheer size.
Bolton Bailey (Jul 14 2025 at 00:30):
Asei Inoue said:
Great! I'd like to use Kimina Prover as a model for Copilot—how can I do that?
In response to this, I thought I should mention that Sean Welleck kindly accepted a PR to the llmlean project to make it compatible with models trained to output markdown. So in principle it should be possible to use these models by following the instructions on that repo for using a Kimina model and inserting the Ollama identifier for the model you want to use.
Unfortunately, I am finding out that my new computer does not have the RAM to handle very large models, so be warned that this is an issue you might experience.
Bolton Bailey (Jul 14 2025 at 00:32):
If you were interested in using these models with Lean Copilot or GitHub Copilot, I'm afraid I don't currently have much expertise with those systems, but I could try to find out more.
Asei Inoue (Jul 14 2025 at 02:05):
@Bolton Bailey
Thank you. I understand that there's a tool called llmlean through which Kimina Prover can be used. I'll give it a try.
Bolton Bailey (Jul 14 2025 at 02:06):
Asei Inoue said:
Bolton Bailey
Thank you. I understand that there's a tool called llmlean through which Kimina Prover can be used. I'll give it a try.
Let me know if you have any issues, it would be good to have data about what could be fixed or made better.
(deleted) (Jul 14 2025 at 02:18):
When will it actually tackle IMO problems? :wink:
(deleted) (Jul 14 2025 at 02:22):
In fact Kimina tackling geometry problems would be very nice—AlphaGeometry isn't very rigorous
Asei Inoue (Jul 14 2025 at 02:33):
I don't know the command to pull the Kimina Prover model via Ollama.
I thought I could pull any model on Hugging Face using Ollama, but that doesn't seem to be the case.
Asei Inoue (Jul 14 2025 at 02:38):
@Bolton Bailey
GasStationManager (Jul 14 2025 at 06:18):
@Bolton Bailey I'm trying out the 1.7B distillation from ollama. When running on the command line (ollama run BoltonBailey/Kimina-Prover-Distill-1.7B), the model doesn't seem to be able to stop; it keeps looping even after it seems to have produced a solution. (This doesn't happen for the 72B model on the demo site when given the same prompts.)
Just tried the 8B version with ollama; it was able to stop sometimes but still loops sometimes. (My prompts were all very simple, like "prove that n*2+1 is odd")
Justin Asher (Jul 14 2025 at 06:24):
There’s normally a way to prevent repetition via a penalty IIRC. For instance, vLLM has a repetition penalty. Would be curious how this works here.
(deleted) (Jul 14 2025 at 06:35):
Justin Asher said:
There’s normally a way to prevent repetition via a penalty IIRC. For instance, vLLM has a repetition penalty. Would be curious how this works here.
Doesn't fix the underlying issue. The underlying issue is the LLM has no idea how to solve the problem.
(deleted) (Jul 14 2025 at 06:38):
Oh wait. For these RL models not following exactly the recommended system prompt can also cause infinite looping. These models aren't flexible.
(deleted) (Jul 14 2025 at 06:57):
@GasStationManager Read this Python file for the correct way to prompt the model: https://github.com/MoonshotAI/Kimina-Prover-Preview/blob/master/kimina_prover_demo/error_fix.py
(deleted) (Jul 14 2025 at 06:58):
You aren't allowed to deviate from the template.
Bolton Bailey (Jul 14 2025 at 17:07):
Asei Inoue said:
I don't know the command to pull the Kimina Prover model via Ollama.
I thought I could pull any model on Hugging Face using Ollama, but that doesn't seem to be the case.
Sorry, the command would be ollama pull BoltonBailey/Kimina-Prover-Distill-1.7B or ollama pull BoltonBailey/Kimina-Prover-Distill-8B, and you would also need to replace the analogous parts of the configuration file described in the llmlean Ollama model instructions. I'll make a small PR to update the instructions to use the more up-to-date models.
Ollama is different from Hugging Face; the only reason the models I mentioned above are available on Ollama is that I downloaded them from Hugging Face, reformatted them to the Ollama model format, and reuploaded them to Ollama.
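As a concrete sketch of what that configuration could look like with the Ollama backend: assuming llmlean reads its settings from environment variables as described in its README (the variable names LLMLEAN_API and LLMLEAN_MODEL here are my assumptions; please verify them against the repo), it might be something like:

```shell
# Pull the distilled model first (run once; requires ollama to be installed):
#   ollama pull BoltonBailey/Kimina-Prover-Distill-1.7B
# Then point llmlean at the local Ollama server. The variable names below are
# assumed to match the llmlean README; double-check them there before relying
# on this.
export LLMLEAN_API=ollama
export LLMLEAN_MODEL=BoltonBailey/Kimina-Prover-Distill-1.7B
```

After that, invoking llmlean's tactics from within a Lean file should route requests to the local model.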
Bolton Bailey (Jul 14 2025 at 17:16):
GasStationManager said:
Bolton Bailey I'm trying out the 1.7B distillation from ollama. When running on the command line (ollama run BoltonBailey/Kimina-Prover-Distill-1.7B), the model doesn't seem to be able to stop; it keeps looping even after it seems to have produced a solution. (This doesn't happen for the 72B model on the demo site when given the same prompts.) Just tried the 8B version with ollama; it was able to stop sometimes but still loops sometimes. (My prompts were all very simple, like "prove that n*2+1 is odd")
Maybe this is also relevant to the earlier comment about repetition: Ollama seems to allow a repetition penalty in its Modelfiles. It might be good to experiment with this.
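For anyone who wants to try that, a minimal Modelfile sketch with a repetition penalty might look like the following. The FROM line matches the upload mentioned above; the parameter values are illustrative guesses, not tuned recommendations:

```text
# Modelfile (hypothetical sketch, untested values)
FROM BoltonBailey/Kimina-Prover-Distill-1.7B
# Penalize recently repeated tokens; values a bit above 1.0 discourage loops
PARAMETER repeat_penalty 1.2
# How far back (in tokens) to look when applying the penalty
PARAMETER repeat_last_n 256
```

You could then build and run it with ollama create kimina-rp -f Modelfile followed by ollama run kimina-rp. Too high a penalty can degrade Lean syntax (e.g. by suppressing legitimately repeated tactic names), so some experimentation would be needed.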
suhr (Jul 14 2025 at 17:39):
Bolton Bailey said:
Ok I have now uploaded the 1.7B and 8B distillations to ollama. I'm still having a bit of trouble uploading the 72B model due to its sheer size.
Any plans for a 24B or 32B model? People with a good GPU might appreciate it.
Bolton Bailey (Jul 14 2025 at 18:36):
suhr said:
Any plans for a 24B or 32B models?
Not that I'm aware, sorry!
Asei Inoue (Jul 15 2025 at 06:25):
ollama pull failed.
❯ ollama pull BoltonBailey/Kimina-Prover-Distill-1.7B
pulling manifest
pulling 4276fbcfa23e: 0% ▕ ▏ 1.4 MB/4.1 GB
Error: max retries exceeded: write C:\Users\11325\.ollama\models\blobs\sha256-4276fbcfa23e9104d10114919e2a1f76d1abe4f9ce64e0364dfe3237d700febb-partial: There is not enough space on the disk.
(deleted) (Jul 15 2025 at 06:34):
Read the error message. You might need a beefier machine.
Asei Inoue (Jul 15 2025 at 14:01):
oh...
Alessandro Sosso (Jul 17 2025 at 12:58):
Bolton Bailey I am having some issues using the online demo: it seems that as soon as the model finishes generating the proof and its Lean formalization, the two dropdown containers with them disappear and I get brought back to the beginning. I am seeing this behaviour on Firefox (Chromium seems to work just fine).
Any ideas on what might be the problem?
Bolton Bailey (Jul 18 2025 at 14:50):
Weird, I am seeing this too, let me ask if anything changed recently. (At least the pass@16 button seems to be working fine for me, feel free to use that in general)
Bolton Bailey (Jul 18 2025 at 20:09):
Alessandro Sosso said:
the two dropdown containers with them disappear and I get brought back to the beginning. I am seeing this behaviour on firefox (chromium seems to work just fine).
Any ideas on what might be the problem?
Seems this was some kind of bug due to a stream event being closed. It should be fixed now (though it only stopped manifesting for me after I cleared my cookies and refreshed).
Bolton Bailey (Jul 18 2025 at 20:13):
Perhaps now is also a good time to say thanks to everyone for trying the demo - looks like we've gotten several thousand requests over the last few days, and I was happy to see that people were interested in it.
I wanted to relay that we are happy to accept any feedback about it here, or if you want to give feedback on the demo anonymously, you can click the little paper airplane icon on the site.
GasStationManager (Jul 19 2025 at 20:49):
Slightly OT: Kimi-K2 came out a few days ago and got good reviews. I tried it briefly; it's doing okay in Lean. Has anyone else tried it? Any inside gossip on collaborations? :)
Alessandro Sosso (Jul 21 2025 at 11:28):
Bolton Bailey said:
Seems this was some kind of bug due to a stream event being closed. It should be fixed now (though it only stopped manifesting for me after I cleared my cookies and refreshed).
After removing the cookies everything works like a charm now :)
Thank you very much!
Asei Inoue (Jul 23 2025 at 12:39):
@Bolton Bailey
I was able to use Kimina Prover via LLM Lean.
Thank you! I’ll give it a try.
Alessandro Sosso (Oct 27 2025 at 17:35):
@Bolton Bailey I am again having some issues with the online Kimina Prover demo. The proof fixing feature doesn't seem to be working, giving back Error: 404 The model kimina-prover-preview does not exist.
Bolton Bailey (Oct 27 2025 at 19:50):
I'll ask about it, but I didn't actually realize we had a proof fixing mode now and I don't see how to activate it. Can you provide any more details?
Alessandro Sosso (Oct 29 2025 at 19:14):
@Bolton Bailey I am accessing proof fixing by pressing the 'Try to fix' button that appears whenever the generated proof does not compile.
Bolton Bailey (Nov 03 2025 at 18:10):
I'll send this to someone else on the team and try to see what the issue is (though I doubt kimina will be able to prove this particular theorem even building on this proof :grinning_face_with_smiling_eyes: )
Bas Spitters (Dec 10 2025 at 12:58):
Bolton Bailey Thanks for the nice work on Kimina. Are there plans to support versions of Lean later than 4.15?
Bolton Bailey (Dec 11 2025 at 01:46):
Bas Spitters said:
Bolton Bailey Thanks for the nice work on Kimina. Are there plans on supporting versions of lean later than 4.15 ?
There are no plans to support later Lean versions (at least for our prior releases), sorry!
Last updated: Dec 20 2025 at 21:32 UTC