Zulip Chat Archive
Stream: Machine Learning for Theorem Proving
Topic: EvoGPT-f Lean Benchmark
John Mercer (Feb 28 2024 at 14:41):
Hi Lean Community,
Pleased to announce EvoGPT-f: An Evolutionary GPT Framework for Benchmarking Formal Math Languages
This includes an estimated improvement of machine learnability of Lean 4 over Lean 3. Look forward to your feedback! :raised_hands:
Abstract
Formal mathematics is the discipline of translating mathematics into a programming language in which any statement can be unequivocally checked by a computer. Mathematicians and computer scientists have spent decades of painstaking formalization efforts developing languages such as Coq, HOL, and Lean. Machine learning research has converged on these formal math corpora and given rise to an assortment of methodologies to aid in interactive and automated theorem proving. However, these papers have primarily focused on one method, for one proof task, in one language. This paper introduces EvoGPT-f: a novel evolutionary framework for the first systematic quantitative analysis of the differential machine learnability of five formal math corpora (Lean 3, Lean 4, Coq, HOL 4, HOL Light) using four tokenization methods (character, word-level, Byte Pair Encoding and StarCoder tokenizer). This paper does not put to rest the question of the "best" or "easiest" language to learn. Rather, this framework and preliminary findings begin to illuminate the differential machine learnability of these languages, offering a foundation to forge more systematic quantitative and qualitative comparative research across communities.
Martin Harrison (Feb 28 2024 at 20:50):
this is interesting but I am not sure that I understand the evolutionary aspect of it - the purpose is to assess machine learnability and an evolutionary framework lets you do that?
John Mercer (Feb 28 2024 at 23:59):
Martin Harrison said:
this is interesting but I am not sure that I understand the evolutionary aspect of it - the purpose is to assess machine learnability and an evolutionary framework lets you do that?
Hi Martin,
Yep, "Here, we will treat the respective formal languages as the environment in which an evolutionary process must take place, where natural selection forces populations of Transformers to optimize their training hyperparameters and architectures to best learn the statistical structure of the language."
Notification Bot (Feb 29 2024 at 00:44):
Junyan Xu has marked this topic as resolved.
Notification Bot (Feb 29 2024 at 00:44):
Junyan Xu has marked this topic as unresolved.
Martin Harrison (Feb 29 2024 at 12:45):
is there something about the evolutionary framework that addresses the question of machine learnability in a definitional way or is it one heuristic among others? It seems that you are looking at a bigger family of functions than a single architecture and training configuration would afford, is that the main significance of the evolutionary framework?
John Mercer (Mar 03 2024 at 21:10):
Martin Harrison said:
is there something about the evolutionary framework that addresses the question of machine learnability in a definitional way or is it one heuristic among others? It seems that you are looking at a bigger family of functions than a single architecture and training configuration would afford, is that the main significance of the evolutionary framework?
Hi Martin,
Apologies for the delay! An evolutionary framework isn't the only way to benchmark. I designed it this way because transformers are the gold-standard of language learning, so casting this in an evolutionary method and spanning tokenization methods was a means to distill and normalize the inherent learnability of the language/formalism. Is this wording more helpful?
Martin Harrison (Apr 26 2024 at 17:18):
John Mercer said:
Is this wording more helpful?
Not really. It leaves me wondering what is meant by "inherent learnability", which is what I was wondering to begin with and the reason I asked if it was being addressed in a definitional way. It sounds like there is a property of languages called learnability and that this approach is able to quantify that at least in relative terms.
Last updated: May 02 2025 at 03:31 UTC