Zulip Chat Archive

Stream: new members

Topic: RDF Reasoning with Lean, Theorem Databases & Formalising ML

Jesse Wright (Jan 19 2024 at 12:07):

I recently attended a fantastic talk that @Kevin Buzzard gave at the Oxford CS department; having spent some time mulling it over, Lean seems quite promising for several projects I am involved with - and I wanted to ask about whether there has been any work in the following directions I’m interested in:

Building RDF Reasoning Engines using Lean: A large amount of my work is related to structured reasoning over knowledge graphs. Typically this is done using Description Logics. In particular, I am currently investigating whether I can implement the RDF Surfaces specification (which has a correspondence with first order logic) in Lean. Has there been any work on the implementation of domain specific automated reasoners that include proof of their results that I can be pointed towards?

Mathematics Database generated from theorems proven in Lean: I’ve seen it mentioned in a couple of Kevin’s talks that this is something that Tom Hales has interest in - and was also an idea mentioned in this grant proposal. Since my background is in working with the Semantic Web and Solid; so I would be interested in generating a Knowledge Graph of theorems proven in Lean (including the proofs) so that the theorems themselves can be entities on the Web - and linked to from other domains. Has there been any substantial work on this.

Proofs of Machine Learning theorems: I can also see Lean being quite relevant to colleagues working on foundational ML models. Before I take some of the ideas I have to them I was wondering if there has been any work to formalize ML related proofs - such as the Universal Approximation Theorem in Lean?

Jason Rute (Jan 19 2024 at 13:28):

It's not Lean, but my colleagues at IBM research have some work in Coq formalizing machine learning theory including Divoretsky's approximation theory and the convergence of some classical RL algorithms. https://github.com/IBM/FormalML

Jason Rute (Jan 19 2024 at 13:28):

As for the universal approximation theory, I would assume it would be a simple thing to prove if you already have the standard approximation theorems (such as continuous functions are approximable by polynomials, piecewise affine functions, etc, and integrable/measurable functions are approximable by the same as well as step functions). I assume these are already in mathlib, and indeed I can find various lemmas of this form with moogle.ai.

Jason Rute (Jan 19 2024 at 13:38):

Do you really think theorems would fit into a nice knowledge graph format? If so, what would it look like? I'm all for applying lots of different methods (theorem proving, AI, better cataloging) to unify and explore the mathematical universe, but I think mathematics by its very nature is a bit hard to put into a clean box. (But maybe I'm not understanding the power and expressivity of knowledge graphs, and if there were good knowledge graph representations, that would indeed be interesting.)

Jason Rute (Jan 19 2024 at 13:45):

As for RDF (which I know nothing about), is your goal to make Lean tools that:

Reason over knowledge graphs and output proofs? This sounds like making tactics, although if your proof format is just a formal object deeply embedded into Lean, then outputting a proof could just be outputting some formal object in Lean code.
Develop formally verified RDF algorithms to give correct deductions? If you describe the mathematics of knowledge graphs and whatever else in Lean, you can write algorithms to do stuff and then prove they are correct.
Or is your goal to actually reason about theorems in Lean with RDF? I don't know what this would look like, but again it sounds like a form of a tactic.

Jason Rute (Jan 19 2024 at 13:48):

Also, as for mathematical databases, here is the current Lean one (including Lean core, STD, and Mathlib): https://leanprover-community.github.io/mathlib4_docs/index.html. You can search over it with Loogle or Moogle.

Jesse Wright (Jan 19 2024 at 15:00):

Thankyou for the detailed responses and links - they are very useful starting points!

Jason Rute said:

Do you really think theorems would fit into a nice knowledge graph format? If so, what would it look like?

On expressiveness - a knowledge graph is essentially just a multi-relational directed graph. However, one is also permitted to define arbitrary relations for the graph as you go. This means, for instance, you can could encode any abstract syntax tree in a KG. The function ontology - which is a vocabulary for describing functions and algorithms in knowledge graphs - is a starting point to describe programming functions in a more language agnostic manner.

My impression is if a theorem can be formalised in Lean - then it should be possible to map that formalization to a KG - given that at some point Lean files are going to be parsed into an abstract syntax tree (not that this would be the best KG representation, this is just my reasoning as to why a KG representation exists).

As for RDF (which I know nothing about), is your goal to make Lean tools that:

The goal is to "Reason over knowledge graphs and output proofs" in the sense that we have sets of rules (implications from decidable fragments of first order logic) which we recursively apply to the Knowledge Graph to derive new information. As a basic example using the data :Jesse a :Student . :Student a :Human and the rule ?x a ?c \land ?c subClassOf ?c2 => ?x a ?c2 one derives :Jesse a :Human. The purpose of the proof is to enable one system - that is potentially malicious - to prove/explain to another system how it derived the data.

outputting some formal object in Lean code.

Indeed - this is quite close to what I think I want - except I want to map this proof back in terms of RDF in order to be interoperable with existing RDF Proof checkers (e.g. CWM and EYE) when using rules from fragments of FOL that they are programmed to understand - I'll have a crack at this in the coming weeks and report back.

Last updated: May 02 2025 at 03:31 UTC