Zulip Chat Archive

Stream: Machine Learning for Theorem Proving

Topic: Kimina Prover Preview Demo

Mert Ünsal (Apr 23 2025 at 15:49):

As a follow-up on our Kimina-Prover-Preview paper, we are releasing a demo of our 72B model! You can go to https://demo.projectnumina.ai/ to try it out!

This demo supports two main workflows:

Enter an informal math statement to autoformalize it using the Kimina-Autoformalizer-7B model.
Input a formal statement directly to run proof search with 1 or 16 samples. Note that both the informal and formal statement is passed into the model.

One useful trick we found out is that you can enter a tactic state as an input to the autoformalizer and it will still generate somewhat reliable autoformalizations, which is easier than trying to write a formalization of your sublemma.

Limitations:

Please note that this model is specialized in solving competition-style problems so it might not work particularly well on advanced mathematics.
Sometimes autoformalizer can produce formal statements with contradictory assumptions.
The model is trained on toolchain version 4.15.
This is a preview model and it's a work in progress!

If not sure what to try, you can paste the following problem into the demo: "Show that the period of $cos 2x$ is $pi$."

We wanted to get this into the hands of people to see how people make use of this - we are happy to hear your feedback!

Auguste Poiroux (Apr 23 2025 at 16:18):

Are you collecting the data for potential training? Just to know if we can try benchmark data with it :)
Amazing tool, thanks a lot for making the model accessible!

Jia Li (Apr 23 2025 at 16:23):

Auguste Poiroux said:

Are you collecting the data for potential training? Just to know if we can try benchmark data with it :)
Amazing tool, thanks a lot for making the model accessible!

no we don't keep any data. currently the model runs on very few gpus, so it might be difficult for you run benchmark. (you might get connection error when server is full)

Auguste Poiroux (Apr 23 2025 at 16:26):

Thanks! Oh, I don't want to run the full benchmark, just a few samples by hand to see how it performs

Yakov Pechersky (Apr 23 2025 at 17:09):

It got stuck in a loop:
image.png

Jia Li (Apr 23 2025 at 17:16):

Yakov Pechersky said:

It got stuck in a loop:
image.png

This happen quite often unfortunately. We observe this when the model did not manage to get close to a solution of the problem. You can press stop and maybe try it again

Last updated: May 02 2025 at 03:31 UTC