Zulip Chat Archive

Stream: Machine Learning for Theorem Proving

Topic: DeepSeekMath_V2


Deleted User 968128 (Nov 27 2025 at 17:07):

https://github.com/deepseek-ai/DeepSeek-Math-V2/blob/main/DeepSeekMath_V2.pdf

image.png

Deleted User 968128 (Nov 27 2025 at 17:11):

https://github.com/deepseek-ai/DeepSeek-Math-V2/tree/main

image.png

Deleted User 968128 (Nov 27 2025 at 17:27):

(deleted)

Ivan Eric (Nov 27 2025 at 17:30):

https://arxiv.org/abs/2402.03300 is the v1. It introduced the GRPO algorithm that R1 was based on.
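For anyone who hasn't read it: the core of GRPO is to drop the learned value model and use a group-relative baseline instead — sample several completions per prompt, score them, and normalize each reward against the group's mean and standard deviation. A minimal sketch of that advantage computation (function and variable names are mine, not from the paper):

```python
import numpy as np

def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: normalize each sampled completion's reward
    against the mean/std of its own group (all samples for one prompt)."""
    r = np.asarray(rewards, dtype=np.float64)
    return (r - r.mean()) / (r.std() + eps)

# Example: 4 completions sampled for the same prompt, scored by a reward model.
print(group_relative_advantages([1.0, 0.0, 0.5, 0.0]))
```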

Deleted User 968128 (Nov 27 2025 at 17:36):

Yeah, I just saw that now. It looks like Zhihong Shao is the only author carried over from the previous papers? The configs don't have a lot in common either: https://huggingface.co/deepseek-ai/deepseek-math-7b-instruct/blob/main/config.json versus https://huggingface.co/deepseek-ai/DeepSeek-Math-V2/blob/main/config.json

Deleted User 968128 (Nov 27 2025 at 17:51):

TBH, I think this is more of a continuation of DeepSeek-Prover-V2 (which they actually cite in the paper). V1 was rewarded on final answers, while Prover was more about RL on proofs: https://arxiv.org/abs/2504.21801

Interesting

We then train a proof generator using the verifier as the reward model, and incentivize the generator to identify and resolve as many issues as possible in their own proofs before finalizing them. To maintain the generation-verification gap as the generator becomes stronger, we propose to scale verification compute to automatically label new hard-to-verify proofs, creating training data to further improve the verifier.
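My rough reading of that loop, as a sketch (all the names and methods below are placeholders of mine, not the paper's actual code): generate a proof, have the verifier score it and flag issues, and let the generator revise until the verifier is satisfied or a budget runs out.

```python
def self_refine_proof(generator, verifier, problem, max_rounds=4):
    """Hypothetical generator/verifier refinement loop in the spirit of the
    quoted passage: the verifier acts as the reward signal, and the generator
    tries to resolve flagged issues before finalizing the proof."""
    proof = generator.prove(problem)
    for _ in range(max_rounds):
        score, issues = verifier.review(problem, proof)  # reward + critique
        if not issues:  # verifier finds nothing left to fix
            return proof, score
        proof = generator.revise(problem, proof, issues)
    return proof, verifier.review(problem, proof)[0]
```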

So much output, hard to keep track sometimes.

Deleted User 968128 (Nov 27 2025 at 18:02):

Lean is turning into a bit of a bottleneck. More SAT stuff might be a good idea (maybe) https://www.quantamagazine.org/to-have-machines-make-math-proofs-turn-them-into-a-puzzle-20251110/
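For anyone who hasn't played with this: the idea in the article is to encode a combinatorial statement as Boolean clauses and let a SAT solver exhaust the cases. A toy illustration using the python-sat package (the encoding is my own toy example, pigeonhole with 3 pigeons and 2 holes, which the solver correctly reports as unsatisfiable):

```python
from itertools import combinations
from pysat.solvers import Glucose3  # pip install python-sat

def var(p, h):
    # Boolean variable "pigeon p sits in hole h"; pigeons 1..3, holes 1..2.
    return 2 * (p - 1) + h

solver = Glucose3()
# Every pigeon sits in some hole.
for p in range(1, 4):
    solver.add_clause([var(p, 1), var(p, 2)])
# No two pigeons share a hole.
for h in (1, 2):
    for p, q in combinations(range(1, 4), 2):
        solver.add_clause([-var(p, h), -var(q, h)])

print(solver.solve())  # False: 3 pigeons don't fit in 2 holes
```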

Gavin Zhao (Nov 27 2025 at 19:07):

Lean is turning into a bit of a bottleneck.

I heard an interesting point about this from an author of AlphaGeometry. They said that writing Lean proofs requires one to consider many things that are important for logical completeness but irrelevant to the actual proof insights/progress, which is not how humans approach proofs.

So if you're trying to train an LLM to think more like mathematicians, in theory you're likely to get much better results by simulating the human thought process and only considering the important parts of the solution. Basically, outputting Lean proofs distracts the LLM from actually finding the solution.

To remedy this, you either:

  1. Don't use Lean at all and instead use a slightly less robust logical framework that discards the trivial/distracting parts of the proof, which is what AlphaGeometry does. Formally encoding orientations is very cumbersome, so AlphaGeometry relies on diagram checks, which is not completely sound but gets the job done.
  2. Split the task so that a dedicated model writes the Lean proofs, while the model doing the hard work of finding the solution isn't even aware Lean exists and stays focused on solving the problem (see the sketch after this list).
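A minimal sketch of what that split might look like, assuming two chat-style models behind a generic `complete(model, prompt)` call (the model names and the helper are hypothetical, not anything DeepSeek or AlphaGeometry actually ship):

```python
def solve_then_formalize(problem_statement, complete):
    """Two-stage pipeline: an informal 'solver' model that never sees Lean,
    followed by a separate 'formalizer' model that turns the finished
    informal proof into Lean 4. `complete(model, prompt)` is a hypothetical
    text-completion helper; the model names are placeholders."""
    informal_proof = complete(
        "solver-model",
        "Solve the following problem with a rigorous informal proof:\n"
        + problem_statement,
    )
    lean_proof = complete(
        "formalizer-model",
        "Translate this informal proof into a complete Lean 4 proof:\n"
        + informal_proof,
    )
    return informal_proof, lean_proof
```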

Deleted User 968128 (Nov 27 2025 at 19:17):

This seems to be the key idea:

While using informal reasoning to guide formal proof generation has been explored extensively (Jiang et al., 2023), recent reasoning models have dramatically improved informal reasoning quality, making this guidance far more effective

I think what we're seeing right now is that the gains are coming from scaling and from training on the work of mathematicians. The pre-training set did include some synthetic data, though the model that generated it had itself been mostly trained on the work of mathematicians.

An LLM is probably not the best substrate for a pure RLVR approach to math, but it's clearly good at extracting knowledge from humans.

Deleted User 968128 (Nov 27 2025 at 19:22):

I suspect Ilya might be on to something and that the scaling of LLMs is coming to an end. A new continual-learning approach (or at least a significant change in architecture) is going to be required to get to the next set of leaps.

Junyan Xu (Nov 28 2025 at 17:21):

Isn't it a bit weird no one is hosting the model online yet (including DeepSeek themselves)? The architecture isn't different from their flagship models, right? (I asked Perplexity to do the research for me.)

As of yesterday, "the model isn't deployed by any inference providers. we'll integrate it as soon as it becomes available."

Deleted User 968128 (Nov 28 2025 at 20:23):

This model is unlikely to get good support, as it is expensive to host and sees very low traffic.

But if you do want it, here you go:

https://endpoints.huggingface.co/new?repository=deepseek-ai/DeepSeek-Math-V2

I've been able to get this to work for other models. No promises for this one.
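If anyone does spin one up, querying it from Python is straightforward with huggingface_hub. A minimal sketch, assuming a standard text-generation deployment (the endpoint URL and token below are placeholders, and I haven't checked which task type this particular model exposes):

```python
from huggingface_hub import InferenceClient  # pip install huggingface_hub

# Placeholder endpoint URL; replace with the one HF gives you after deployment.
client = InferenceClient(
    model="https://YOUR-ENDPOINT.endpoints.huggingface.cloud",
    token="hf_...",  # your HF access token
)

reply = client.text_generation(
    "Prove that the square root of 2 is irrational.",
    max_new_tokens=1024,
)
print(reply)
```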

Junyan Xu (Nov 28 2025 at 22:07):

Maybe people will make a distilled model for AIMO ...

Anh Nguyễn (Nov 29 2025 at 02:32):

https://x.com/nrehiew_/status/1994038917277077853

Anh Nguyễn (Nov 29 2025 at 02:32):

A concise description


Last updated: Dec 20 2025 at 21:32 UTC