Zulip Chat Archive

Stream: Machine Learning for Theorem Proving

Topic: Performance of models for formalisation

Archie Browne (Sep 02 2025 at 08:16):

Hi all, I am interested in hearing about your experiences in using various language models for formalisation (GPT-5, Opus 4.1, DeepSeek-V3, Kimina, Grok etc...). Which have you found most capable? Are they something you commonly use in your work with formalisation? In particular, if a model has shortcomings, what are they? Are there certain areas of Mathematics that are easier for specific models? Most of all it would be good to hear about their performance on research Mathematics. Any concrete examples would be fantastic. Thanks.

(deleted) (Sep 04 2025 at 03:18):

Rumor has it that GPT-5 with GitHub Copilot is very good

West (Sep 04 2025 at 03:52):

#lean4 > Vibe Proofing

Cross posting my experience.

Last updated: Feb 28 2026 at 14:05 UTC