Zulip Chat Archive
Stream: Machine Learning for Theorem Proving
Topic: DeepSeek-Prover-V1.5
Huajian Xin (Aug 16 2024 at 04:03):
We have submitted a new technical report on arXiv introducing DeepSeek-Prover-V1.5, an open-source language model designed for theorem proving in Lean 4. This model demonstrates significant improvements over DeepSeek-Prover-V1, achieving new state-of-the-art results on the miniF2F benchmark (63.5%) and the ProofNet benchmark (25.3%).
DeepSeek-Prover-V1.5 enhances its predecessor by optimizing both training and inference processes. Pre-trained on DeepSeekMath-Base with a specialization in formal mathematical languages, it undergoes supervised fine-tuning using an enhanced formal theorem proving dataset derived from DeepSeek-Prover-V1. Further refinement is achieved through reinforcement learning from proof assistant feedback (RLPAF). Beyond the single-pass whole-proof generation approach of DeepSeek-Prover-V1, we propose RMaxTS, a variant of Monte-Carlo tree search that employs an intrinsic-reward-driven exploration strategy to generate diverse proof paths.
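For readers unfamiliar with intrinsic-reward-driven tree search, the general idea can be sketched in a few lines of Python (an illustrative simplification only, not the actual RMaxTS implementation; generate_step and run_verifier are placeholders standing in for the language model and the Lean verifier):

import math
from collections import defaultdict

def rmax_style_search(root_state, generate_step, run_verifier, iterations=100, c=1.0):
    children = defaultdict(list)   # state -> successor states explored so far
    visits = defaultdict(int)      # state -> visit count
    value = defaultdict(float)     # state -> accumulated intrinsic reward
    seen = {root_state}            # tactic states discovered so far

    for _ in range(iterations):
        # Selection: walk down the tree with a UCT-style rule.
        state, path = root_state, [root_state]
        while children[state]:
            total = sum(visits[s] for s in children[state]) or 1
            state = max(
                children[state],
                key=lambda s: value[s] / (visits[s] + 1)
                + c * math.sqrt(math.log(total + 1) / (visits[s] + 1)),
            )
            path.append(state)

        # Expansion: sample a new proof continuation from the model.
        new_state = generate_step(state)
        children[state].append(new_state)
        path.append(new_state)

        # Intrinsic reward: reaching a previously unseen tactic state pays 1,
        # which drives the search toward diverse proof paths.
        reward = 0.0 if new_state in seen else 1.0
        seen.add(new_state)

        if run_verifier(new_state):  # complete proof accepted by the verifier
            return new_state

        # Backpropagation of the intrinsic reward along the visited path.
        for s in path:
            visits[s] += 1
            value[s] += reward
    return None

The point of the intrinsic reward is that a rollout is rewarded simply for reaching a tactic state that has not been seen before, which pushes the search toward diverse proof paths rather than repeatedly refining the same one.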
The base, SFT, and RL models have been open-sourced on Hugging Face.
Notification Bot (Aug 16 2024 at 04:44):
A message was moved from this topic to #mathlib maintainers > spam by Kim Morrison.
Yufan Zhao (Aug 16 2024 at 07:35):
Huajian Xin said:
We have submitted a new technical report on arXiv introducing DeepSeek-Prover-V1.5, an open-source language model designed for theorem proving in Lean 4. This model demonstrates significant improvements over DeepSeek-Prover-V1, achieving new state-of-the-art results on the miniF2F benchmark (63.5%) and the ProofNet benchmark (25.3%).
DeepSeek-Prover-V1.5 enhances its predecessor by optimizing both training and inference processes. Pre-trained on DeepSeekMath-Base with a specialization in formal mathematical languages, it undergoes supervised fine-tuning using an enhanced formal theorem proving dataset derived from DeepSeek-Prover-V1. Further refinement is achieved through reinforcement learning from proof assistant feedback (RLPAF). Beyond the single-pass whole-proof generation approach of DeepSeek-Prover-V1, we propose RMaxTS, a variant of Monte-Carlo tree search that employs an intrinsic-reward-driven exploration strategy to generate diverse proof paths.
The base, SFT, and RL models have been open-sourced on Hugging Face.
Hi, I just tried running the model, but I'm getting errors regarding "DeepseekConfig" and "DeepseekV2Config".
More specifically, running the huggingface code directly gives the error:
File "./.cache/huggingface/modules/transformers_modules/deepseek-ai/DeepSeek-Prover-V1.5-RL/944fec2f7a2006a2a22ded13b187feb4304b48c2/modeling_deepseek.py", line 57, in <module>
from .configuration_deepseek import DeepseekV2Config
ImportError: cannot import name 'DeepseekV2Config' from 'transformers_modules.deepseek-ai.DeepSeek-Prover-V1.5-RL.944fec2f7a2006a2a22ded13b187feb4304b48c2.configuration_deepseek' (./.cache/huggingface/modules/transformers_modules/deepseek-ai/DeepSeek-Prover-V1.5-RL/944fec2f7a2006a2a22ded13b187feb4304b48c2/configuration_deepseek.py)
Modifying the class name in the configuration_deepseek.py file (from DeepseekConfig to DeepseekV2Config) gives another error essentially saying that "somewhere else is trying to import DeepseekConfig and can't find this attribute".
Are the DeepseekConfig and DeepseekV2Config classes supposed to be the same thing? Would renaming them in the code be a safe fix for these errors?
llllvvuu (Aug 16 2024 at 12:41):
Based on the weight dictionary (particularly the self_attn keys), this seems to be using the DeepSeek MoE architecture, which is not the same as the DeepSeek V2 architecture.
These are both implemented in vLLM:
https://github.com/vllm-project/vllm/blob/main/vllm/model_executor/models/deepseek.py
https://github.com/vllm-project/vllm/blob/main/vllm/model_executor/models/deepseek_v2.py
but only V2 has a WIP PR to huggingface/transformers. People probably forgot about DeepSeek MoE now that DeepSeek V2 is out. If I have time I'll send a PR; otherwise you can raise an issue with the DeepSeek team.
llllvvuu (Aug 16 2024 at 16:54):
For macOS, I have PR'd: https://github.com/ml-explore/mlx-examples/pull/942
I will make a similar PR for huggingface/transformers when I get a chance
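Once that's merged and the weights are converted to MLX format, usage should look roughly like this (an untested sketch; the local checkpoint path is hypothetical):

from mlx_lm import load, generate

# Load a locally converted MLX checkpoint of the prover (path is illustrative).
model, tokenizer = load("DeepSeek-Prover-V1.5-RL-mlx")

prompt = "theorem foo (a b : ℕ) : a + b = b + a := by"
print(generate(model, tokenizer, prompt=prompt, max_tokens=256))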
llllvvuu (Aug 17 2024 at 03:07):
This should work for running with huggingface/transformers:
- Look in the following directories for modeling_deepseek.py
~/.cache/huggingface/hub/models--deepseek-ai--DeepSeek-Prover-V1.5-RL/snapshots
~/.cache/huggingface/modules/transformers_modules/deepseek-ai/DeepSeek-Prover-V1.5-RL
- Replace with the modeling_deepseek.py from DeepSeek MoE V1: https://huggingface.co/deepseek-ai/deepseek-moe-16b-base/blob/main/modeling_deepseek.py
- Try again:
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer, pipeline
model_id = "deepseek-ai/DeepSeek-Prover-V1.5-RL"
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)
streamer = TextStreamer(tokenizer)
pipe = pipeline(
"text-generation", model=model, tokenizer=tokenizer, max_new_tokens=256, streamer=streamer
)
text = """import Mathlib
import Aesop
set_option maxHeartbeats 0
open BigOperators Real Nat Topology Rat
/-- The second and fourth terms of a geometric sequence are $2$ and $6$. Which of the following is a possible first term?
Show that it is $\\frac{2\\sqrt{3}}{3}$.-/
theorem amc12b_2003_p6 (a r : ℝ) (u : ℕ → ℝ) (h₀ : ∀ k, u k = a * r ^ k) (h₁ : u 1 = 2)
(h₂ : u 3 = 6) : u 0 = 2 / Real.sqrt 3 ∨ u 0 = -(2 / Real.sqrt 3) := by"""
outputs = pipe(text, max_new_tokens=256)
response = outputs[0]["generated_text"]
print(response)
Currently working on integration into the main repo: https://github.com/huggingface/transformers/pull/32862
As well as a fix for the HF custom code: https://huggingface.co/deepseek-ai/DeepSeek-Prover-V1.5-RL/discussions/1
It is extremely slow on a MacBook since bitsandbytes quantization only supports CUDA. So those with CUDA will probably get better performance with vLLM, while those with a MacBook will get better performance with mlx_lm.
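For reference, a minimal vLLM invocation looks something like this (a sketch, with example sampling parameters rather than recommended ones):

from vllm import LLM, SamplingParams

# vLLM ships its own DeepSeek (MoE) and DeepSeek-V2 implementations, so the
# hub weights should load directly on a CUDA machine.
llm = LLM(model="deepseek-ai/DeepSeek-Prover-V1.5-RL", trust_remote_code=True, max_model_len=4096)

params = SamplingParams(temperature=1.0, top_p=0.95, max_tokens=256)
outputs = llm.generate(["theorem foo (a b : ℕ) : a + b = b + a := by"], params)
print(outputs[0].outputs[0].text)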
Kevin Ishimwe (Aug 19 2024 at 16:00):
llllvvuu said:
This should work for running with huggingface/transformers:
- Look in the following directories for modeling_deepseek.py
~/.cache/huggingface/hub/models--deepseek-ai--DeepSeek-Prover-V1.5-RL/snapshots
~/.cache/huggingface/modules/transformers_modules/deepseek-ai/DeepSeek-Prover-V1.5-RL
- Replace with the modeling_deepseek.py from DeepSeek MoE V1: https://huggingface.co/deepseek-ai/deepseek-moe-16b-base/blob/main/modeling_deepseek.py
- Try again:
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer, pipeline
model_id = "deepseek-ai/DeepSeek-Prover-V1.5-RL"
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)
streamer = TextStreamer(tokenizer)
pipe = pipeline(
"text-generation", model=model, tokenizer=tokenizer, max_new_tokens=256, streamer=streamer
)
text = """import Mathlib
import Aesop
set_option maxHeartbeats 0
open BigOperators Real Nat Topology Rat
/-- The second and fourth terms of a geometric sequence are $2$ and $6$. Which of the following is a possible first term?
Show that it is $\\frac{2\\sqrt{3}}{3}$.-/
theorem amc12b_2003_p6 (a r : ℝ) (u : ℕ → ℝ) (h₀ : ∀ k, u k = a * r ^ k) (h₁ : u 1 = 2)
(h₂ : u 3 = 6) : u 0 = 2 / Real.sqrt 3 ∨ u 0 = -(2 / Real.sqrt 3) := by"""
outputs = pipe(text, max_new_tokens=256)
response = outputs[0]["generated_text"]
print(response)
Currently working on integration into the main repo: https://github.com/huggingface/transformers/pull/32862
As well as a fix for the HF custom code: https://huggingface.co/deepseek-ai/DeepSeek-Prover-V1.5-RL/discussions/1
It is extremely slow on a MacBook since bitsandbytes quantization only supports CUDA. So those with CUDA will probably get better performance with vLLM, while those with a MacBook will get better performance with mlx_lm.
I tried using the Hugging Face transformers code like in the example above, but I'm getting
$ python3 run_model.py
The repository for deepseek-ai/DeepSeek-Prover-V1.5-RL contains custom code which must be executed to correctly load the model. You can inspect the repository content at https://hf.co/deepseek-ai/DeepSeek-Prover-V1.5-RL.
You can avoid this prompt in future by passing the argument `trust_remote_code=True`.
Do you wish to run the custom code? [y/N] y
The repository for deepseek-ai/DeepSeek-Prover-V1.5-RL contains custom code which must be executed to correctly load the model. You can inspect the repository content at https://hf.co/deepseek-ai/DeepSeek-Prover-V1.5-RL.
You can avoid this prompt in future by passing the argument `trust_remote_code=True`.
Do you wish to run the custom code? [y/N] y
this gives
ImportError: This modeling file requires the following packages that were not found in your environment: flash_attn. Run `pip install flash_attn`
I looked into this, and it says flash_attn requires CUDA to run, so it is not available for Mac.
Any workaround here? Even mlx_lm is not working.
llllvvuu (Aug 21 2024 at 14:04):
Kevin Ishimwe said:
I tried using the Hugging Face transformers code like in the example above, but I'm getting
$ python3 run_model.py
The repository for deepseek-ai/DeepSeek-Prover-V1.5-RL contains custom code which must be executed to correctly load the model. You can inspect the repository content at https://hf.co/deepseek-ai/DeepSeek-Prover-V1.5-RL.
You can avoid this prompt in future by passing the argument `trust_remote_code=True`.
Do you wish to run the custom code? [y/N] y
The repository for deepseek-ai/DeepSeek-Prover-V1.5-RL contains custom code which must be executed to correctly load the model. You can inspect the repository content at https://hf.co/deepseek-ai/DeepSeek-Prover-V1.5-RL.
You can avoid this prompt in future by passing the argument `trust_remote_code=True`.
Do you wish to run the custom code? [y/N] y
this gives
ImportError: This modeling file requires the following packages that were not found in your environment: flash_attn. Run `pip install flash_attn`
I looked into this, and it says flash_attn requires CUDA to run, so it is not available for Mac.
Any workaround here? Even mlx_lm is not working.
I have an improvement to step 1: in both locations, delete configuration_deepseek.py and modeling_deepseek.py (after backing them up somewhere), and replace config.json with:
{
"architectures": [
"LlamaForCausalLM"
],
"attention_bias": false,
"attention_dropout": 0.0,
"bos_token_id": 100000,
"eos_token_id": 100001,
"hidden_act": "silu",
"hidden_size": 4096,
"initializer_range": 0.02,
"intermediate_size": 11008,
"max_position_embeddings": 4096,
"model_type": "llama",
"num_attention_heads": 32,
"num_hidden_layers": 30,
"num_key_value_heads": 32,
"pretraining_tp": 1,
"quantization": {
"group_size": 64,
"bits": 8
},
"rms_norm_eps": 1e-06,
"rope_scaling": null,
"rope_theta": 10000,
"tie_word_embeddings": false,
"torch_dtype": "bfloat16",
"transformers_version": "4.33.1",
"use_cache": true,
"vocab_size": 102400
}
Then try step 2 again. The "trust_remote_code" message should be gone (if not, let me know). mlx_lm.convert should also work at this point.
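If it helps, the conversion can also be driven from Python, roughly like this (a sketch; the keyword names of mlx_lm's convert are my assumption, so double-check them against your mlx_lm version; the 8-bit / group-size-64 values just mirror the quantization block in the config above):

from mlx_lm import convert, load, generate

# Convert the patched checkpoint to MLX format with 8-bit quantization
# (argument names are assumptions; verify against mlx_lm's docs).
convert(
    hf_path="deepseek-ai/DeepSeek-Prover-V1.5-RL",
    mlx_path="DeepSeek-Prover-V1.5-RL-mlx",
    quantize=True,
    q_bits=8,
    q_group_size=64,
)

model, tokenizer = load("DeepSeek-Prover-V1.5-RL-mlx")
print(generate(model, tokenizer, prompt="theorem foo : 1 + 1 = 2 := by", max_tokens=64))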
Seth Ahrenbach (Aug 21 2024 at 19:47):
How can we get the model hosted as a Hugging Face Inference Endpoint?
Hanting Zhang (Jan 29 2025 at 05:09):
(deleted)
Last updated: May 02 2025 at 03:31 UTC