Zulip Chat Archive
Stream: general
Topic: Hardware suggestions?
fzyzcjy (Feb 06 2023 at 09:44):
Hi friends, I wonder what the most cost-effective hardware for running Lean is. More specifically:
- Should I use AMD or Intel CPUs? (Some say AMD is more cost-effective, while some software is less optimized for it.)
- How much memory should I have? (e.g. a CPU:memory ratio of 1:2, 1:4, or 1:8?)
- Any other suggestions about hardware?
- Is hyper-threading useful or useless for Lean? (e.g. a CPU with hyper-threading will report "8 cores, 16 threads"; will Lean make good use of the doubled thread count?)
Some more context, if you are interested: I am reproducing a paper that uses Lean for theorem proving, and thus calls Lean a very large number of times. The original paper even uses 10,000 CPU cores.
Thanks for any ideas!
Jason Rute (Feb 06 2023 at 12:25):
I don’t think this answers your question, but in case you are not aware, there is a stream #Machine Learning for Theorem Proving which discusses papers like (and including) this one. (I think your question, however, is in the correct stream.)
Jason Rute (Feb 06 2023 at 12:26):
I assume this is Lean 3, like in that paper?
fzyzcjy (Feb 06 2023 at 12:31):
@Jason Rute Hi, thanks for the reply! I am in that stream (and have read many of your posts, and have indeed discussed with you there ;) ). Yes, it is Lean 3 (though maybe in a few years we will use Lean 4?)
Jason Rute (Feb 06 2023 at 12:33):
One idea is to reach out to the authors and ask them about some of these details. I don’t know how much they optimized everything, but they have been pretty open about the details of their paper. I also imagine that if you really want to optimize this, a lot of experimentation is needed. Only Meta and OpenAI have done large-scale reinforcement learning with Lean. I think OpenAI just threw compute at the problem; I know Meta did a bit more optimization, such as writing some of the key parts in C++.
Jason Rute (Feb 06 2023 at 12:37):
As for memory, Lean needs a lot of memory to compile mathlib, but I don’t think you will be doing anything like that in your use case, since mathlib will likely already be compiled. You probably will want to load all of mathlib, though, so you should check how much memory that uses.
fzyzcjy (Feb 06 2023 at 12:43):
@Jason Rute Thanks for the insights!
One idea is to reach out to the authors and ask them about some of these details
I have a background in open source (as you can see from my GitHub profile), so I originally expected people to discuss details publicly (e.g. on GitHub). Since they do not release code (or further details), do you suggest I send an email, or is there some other standard way to reach out that I (as a newbie) do not know about?
But you probably will be wanting to load all of mathlib, so you should see how much memory that uses.
Totally agree that I should not need to compile mathlib. But how do I measure "loading all of mathlib"? I am using https://github.com/openai/lean-gym with minif2f and mathlib (with your helpful https://github.com/jasonrute/lean_proof_recording as training data, by the way). The question is: currently I just start the lean-gym command line and use it. Does that by itself load all of mathlib into memory, or do I need to trigger something before lean-gym loads the whole of mathlib?
Johan Commelin (Feb 06 2023 at 12:45):
Loading all of mathlib into RAM requires ~16 GB to run comfortably.
fzyzcjy (Feb 06 2023 at 12:48):
@Johan Commelin Thank you for the numbers! Do you know how many CPU cores a single, full-speed Lean process usually uses? (I have used Lean myself, but I am not sure that is representative, since I am still learning Lean and only use it for very simple examples.)
Background: the https://github.com/openai/lean-gym package I am using is a Lean package that accepts inputs like "execute apply add_comm" from a machine and applies such tactics to goals. It is serial, i.e. it can only process one tactic at a time.
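For reference, lean-gym speaks JSON over stdin/stdout. Based on the commands described in its README (`init_search`, `run_tac`), a session looks roughly like the sketch below; the declaration name and tactic-state text here are illustrative, and exact field names may vary by version:

```text
$ lean --run src/repl.lean
["init_search", ["nat.add_comm", ""]]
{"error":null,"search_id":"0","tactic_state":"⊢ ∀ (n m : ℕ), n + m = m + n","tactic_state_id":"0"}
["run_tac", ["0", "0", "intros"]]
{"error":null,"search_id":"0","tactic_state":"n m : ℕ ⊢ n + m = m + n","tactic_state_id":"1"}
```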
Johan Commelin (Feb 06 2023 at 12:53):
I run with 10 CPUs, and I can easily max them out.
Johan Commelin (Feb 06 2023 at 12:53):
I guess you can probably run with up to 40 CPUs and keep all of them busy a large fraction of the time when compiling mathlib.
fzyzcjy (Feb 06 2023 at 12:55):
@Johan Commelin Thank you!
I will indeed not compile mathlib, but will run proof searches on https://github.com/facebookresearch/miniF2F, such as:
theorem imo_1974_p3 (n : ℕ) :
  ¬ 5 ∣ ∑ k in finset.range (n + 1), (nat.choose (2 * n + 1) (2 * k + 1)) * (2 ^ (3 * k)) :=
begin
  sorry
end
Johan Commelin (Feb 06 2023 at 12:56):
But do you want to run many of them at the same time?
Johan Commelin (Feb 06 2023 at 12:56):
I don't know how those Lean-AI packages work. But I imagine they benefit from big GPUs.
fzyzcjy (Feb 06 2023 at 12:58):
Yes and no. lean-gym currently does not support parallelism at all, so I have to wait for one tactic to finish before trying the next. If that turns out to be a big problem, I may try to hack the code.
Totally agree: the most expensive part is using a ton of big GPUs to run the AI. But the paper also uses Lean, so I need to know Lean's CPU/memory usage as well.
Jason Rute (Feb 06 2023 at 12:59):
To load all of mathlib, search on Zulip for how to make an all.lean file. There is a script in mathlib that will do it for you.
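Concretely, the generated file is just a long list of imports. A hand-written sketch (the module names below are examples; mathlib's generator script lists every module in the library):

```lean
-- all.lean (sketch): importing every module forces all of mathlib
-- to be loaded, which is what you want to measure memory against.
import algebra.group.basic
import data.nat.prime
import topology.basic
-- ... the real generated file has one such line per mathlib source file
```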
Jason Rute (Feb 06 2023 at 13:00):
I think running lean-gym on all theorems with a single call of simp would give a decent baseline benchmark. Or maybe try other variations closer to your workflow.
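Such a baseline could reuse the JSON commands from lean-gym's README, one search per theorem; a hedged sketch (the declaration name is a placeholder, and `clear_search` is the README's command for freeing a finished search):

```text
["init_search", ["mathd_algebra_478", ""]]
["run_tac", ["0", "0", "simp"]]
["clear_search", ["0"]]
```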
fzyzcjy (Feb 06 2023 at 13:01):
@Jason Rute Thanks! The lean-gym setup.sh already makes an all.lean including all imports.
I think running LeanGym on all theorems using one call of simp would give a decent baseline benchmark also.
I see. I will try that.
Jason Rute (Feb 06 2023 at 13:01):
As for contacting the authors, I’d shoot them an email or direct message.
fzyzcjy (Feb 06 2023 at 13:02):
@Jason Rute Thanks! So about "direct message", do you mean Zulip or Twitter or somewhere else?
Johan Commelin (Feb 06 2023 at 13:03):
Most of them are on Zulip.
Jason Rute (Feb 06 2023 at 13:03):
Wherever you can find them, but they are on Zulip, yes.
fzyzcjy (Feb 06 2023 at 13:04):
Got it. Thank you all for the kind replies!
Last updated: Dec 20 2023 at 11:08 UTC