Zulip Chat Archive

Stream: general

Topic: SOTA on symbolic algebraic systems written in Lean4


Srivatsa Srinivas (Jan 08 2025 at 23:39):

Hi,

I wanted to know what is the SOTA symbolic algebra system written in Lean compatible with mathlib?

Best,
Vatsa

Jason Rute (Jan 09 2025 at 01:17):

What is one “symbolic algebra system written in Lean compatible with mathlib”?

Jason Rute (Jan 09 2025 at 01:21):

There are tactics that do various degrees of manipulation, but is there anything like Mathematica, Maple, or Sage? Also Mathlib isn’t really about computation. (Like I thought I heard polynomials in Mathlib aren’t even something you can compute with?)

Jason Rute (Jan 09 2025 at 01:26):

I’m not an expert, so feel free to tell me there is something out there that fits these criteria. It would be really cool if there was. A long time ago I saw a paper by Cezary Kaliszyk and @Freek Wiedijk about creating a verified computer algebra system prototype in HOL Light.

Jason Rute (Jan 09 2025 at 01:27):

And @Rob Lewis et al once made a Lean-Mathematica interface

Jason Rute (Jan 09 2025 at 01:33):

I guess if one did make a CAS in Lean, they would need to figure out what data types to use for symbolic expressions? If they used Expr, then you couldn’t prove things about the algorithm, I think, but it would be compatible with tactics and you would enter the expressions as Lean code. If they used custom symbolic expression datatypes, then they could use them for computing and interfacing with Mathlib, but you would enter them in a different way than you normally enter expressions in Lean. (I haven’t really thought this through. Maybe someone else has.)

Srivatsa Srinivas (Jan 09 2025 at 15:58):

@Jason Rute Thank you for the comprehensive answer. It would be nice to have a CAS built in Lean4, because that would reduce the dependency on SageMath, Mathematica, Maple, etc. I was just wondering if anyone had already started such a project, because if not I will apply to the AI in Math grant with a proposal along the lines of writing a CAS in Lean.

Srivatsa Srinivas (Jan 09 2025 at 16:00):

I think the Agda style of programming bridges the gap between computation and proof more clearly, but it is also not nearly as performant. Having read the codebase of SymPy, I think it would be fun to try and see what a "Lean-ish" CAS would look like.

Jason Rute (Jan 09 2025 at 16:30):

Also see https://leanprover.zulipchat.com/#narrow/channel/113488-general/topic/What.20should.20Sage.20look.20like.20in.20Lean.3F

Jason Rute (Jan 09 2025 at 16:44):

What design plans do you have? For example, would it run inside a Lean file, or in a separate application? If inside Lean, how would you implement something like Simplify(x^2 + (x+1)^2)? Would Simplify be a Lean function, a tactic, or a totally different thing? And what would x^2 + (x+1)^2 be? It couldn’t be an object of type Reals -> Reals (since you can’t compute with those, and because of functional extensionality it would just be the identity function). It could be of type Expr. You can compute with that, and that is how tactics work, but would you be returning an expression then? Can one connect expressions like this to Mathlib types? It might be possible (and maybe tactics like ring_nf already do this sort of thing). Or you could have your own intermediate expression type. I would certainly like to see something like a formally verified CAS, and it seems like once one works out the design it could share a lot of code with existing tactics and with Mathlib.
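[Editor's note: a minimal illustration of the functional-extensionality point, assuming only Mathlib's funext and ring tactics. The "unsimplified" and "simplified" functions are already equal as terms of ℝ → ℝ, so a Simplify at that type would have nothing to distinguish.]

import Mathlib

-- The two functions are equal as elements of ℝ → ℝ, by funext plus ring.
example : (fun x : ℝ => x ^ 2 + (x + 1) ^ 2) = (fun x : ℝ => 2 * x ^ 2 + 2 * x + 1) := by
  funext x
  ring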

Srivatsa Srinivas (Jan 10 2025 at 00:10):

I think that type isomorphisms would be the main benefit of writing a CAS in dependent type theory. For example, one could prove that K[x] is isomorphic, via a function \phi, to the elements of \Sigma_{n \in \mathbb{N}_{\geq 0}} \mathrm{Vec}\ n\ (K \times \mathbb{N}_{\geq 0}) that are sorted by the second value in each pair. This isomorphism commutes with equality (as in x = y iff \phi(x) = \phi(y)). We would then perform all the computations on one side of the type isomorphism and then pull the equality back to the other side.

I guess the main benefit would be writing all the CAS algorithms in Lean4, with types that are isomorphic to standard objects in mathematics. This can give performance boosts as the Lean4 virtual machine gets better, reduce errors in implementations of algorithms and provide "Batteries" in order to rigorously perform calculations that appear in research
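[Editor's note: a hedged sketch of such a sorted concrete representation. The names SparsePoly, insertMono, and addPoly are made up for illustration, not Mathlib API: polynomials as exponent/coefficient lists sorted by exponent, with addition computed entirely on the concrete side.]

-- A toy concrete representation: a polynomial over `K` as a list of
-- (exponent, coefficient) pairs kept sorted by exponent.
abbrev SparsePoly (K : Type) := List (Nat × K)

-- Insert one monomial `a * X ^ n`, keeping the list sorted and merging
-- coefficients when exponents collide.
def insertMono {K : Type} [Add K] (n : Nat) (a : K) : SparsePoly K → SparsePoly K
  | [] => [(n, a)]
  | (m, b) :: p =>
    if n < m then (n, a) :: (m, b) :: p
    else if n = m then (n, a + b) :: p
    else (m, b) :: insertMono n a p

-- Addition on the concrete side; the imagined isomorphism φ would let us
-- transport equalities proved here back to K[x].
def addPoly {K : Type} [Add K] (p q : SparsePoly K) : SparsePoly K :=
  List.foldl (fun acc pair => insertMono pair.1 pair.2 acc) q p

#eval addPoly [(0, 1), (2, 3)] [(1, 2), (2, 4)]  -- [(0, 1), (1, 2), (2, 7)]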

Srivatsa Srinivas (Jan 10 2025 at 00:15):

(deleted)

Junyan Xu (Jan 10 2025 at 02:41):

I had some success in using typeclass inference to synthesize a computable representation for a term of type Polynomial _ (constructed using possibly noncomputable operations) here: #new members > Compute roots of polynomials @ 💬. It's not clear whether it's a superior or inferior approach compared to tactics.

You might also be interested in some relatively recent/ongoing work #12611 to improve the performance of certain #eval calls for symmetric groups.

Jason Rute (Jan 10 2025 at 03:13):

@Srivatsa Srinivas by CAS do you just mean computing with mathematical objects? Like a counterpart to Mathlib that focuses on computation?

Srivatsa Srinivas (Jan 10 2025 at 03:29):

@Jason Rute Originally, I thought that it would be nice to create a CAS where the outputs are "refl" objects, for example Simplify(x^2 + (x+1)^2) would return a term of type x^2 + (x+1)^2 = 2x^2 + 2x + 1, but after some deliberation, I think the Lean-ish way of creating a term of the type x^2 + (x+1)^2 = 2x^2 + 2x + 1 : Prop would be the goal. I understand that tactics already do this.

But I would like to write the "backend that does the calculation" in Lean4, as opposed to whatever the implementation is in SageMath. I think that this would be productive, as dependent type theory can express essentially every mathematical computation we care about, and so the "backend" has the potential to be much more rigorous. Eventually, the most pragmatic approach would be to convert every "backend computation" into a proposition, because even defining a data type for sorted vectors, which are equivalent to polynomials, seems awful.
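[Editor's note: for this particular example, the kind of proposition described here can already be produced by Mathlib's ring tactic; the proposal is essentially to grow a library of backends that generate such propositions for harder problems.]

import Mathlib

-- The simplification packaged as a proved proposition.
example (x : ℝ) : x ^ 2 + (x + 1) ^ 2 = 2 * x ^ 2 + 2 * x + 1 := by ring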

Srivatsa Srinivas (Jan 10 2025 at 03:35):

I guess I am just rediscovering what a tactic is. But it would be nice to have a package of standard CAS algorithms such as Berlekamp, Gosper, etc. implemented in Lean4 that generate propositions.

Luigi Massacci (Jan 10 2025 at 10:09):

Note that it is not necessary to do everything in Lean. One of Mathlib’s most successful tactics is polyrith, which calls out to a Sage procedure. You only need to be able to reconstruct the proof in Lean, and that can be done by modifying whatever software is already there so that it provides some intermediate steps instead of just the final result. You might find this approach more practical than doing everything from scratch.

Srivatsa Srinivas (Jan 10 2025 at 17:09):

@Luigi Massacci Great point. Do you think it would not be easier to construct proof certificates from CAS algorithms written in Lean4 than from those written in whatever language SageMath calls? I have no clue, as I'm just starting to get my feet wet with Lean after experience with Haskell.

Srivatsa Srinivas (Jan 10 2025 at 17:11):

I'd imagine having access to terms of type Prop would make it easier, as opposed to creating an ad hoc method for each program in SageMath.

Luigi Massacci (Jan 10 2025 at 19:36):

Well, it depends on how much work it is to write the algorithm from scratch, and whether it is possible to verify the output of a computation without having to verify the computation itself. For example, you mentioned polynomial factorisation in finite fields: when using a tactic like that, you don’t care how the factors have been generated anyway, so you can use other (already existing) software to give you the polynomials, and then have Lean verify only that they are actually factors, which is significantly easier!
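[Editor's note: a tiny instance of this point, assuming Mathlib's Polynomial and ring. If an external tool claims the factors, Lean only checks that the product is right, without knowing how the factors were found.]

import Mathlib
open Polynomial

-- Verifying a claimed factorisation is just a ring identity.
example : (X ^ 2 - 1 : Polynomial ℤ) = (X - 1) * (X + 1) := by ring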

Luigi Massacci (Jan 10 2025 at 19:38):

(polyrith does more or less this for a similar problem)

Srivatsa Srinivas (Jan 10 2025 at 20:55):

Thanks for the explanation. I now see why it might be futile to write a CAS in Lean in order to generate proof certificates for computations whose answer itself contains a proof certificate.

Kim Morrison (Jan 11 2025 at 00:07):

In the long run, sure, let's have an entire new CAS written in Lean, integrated with Mathlib in some form! But that's not the low-hanging fruit, as Luigi has explained.

I do hope to see lots more "interesting mathematical algorithms" implemented in Lean, even without proofs, and even without solving the giant problem of "how to represent polynomials for computation". Can we have LLL or PSLQ implemented in Lean, please? :-)

Srivatsa Srinivas (Jan 11 2025 at 00:24):

That sounds like a good goal!

Luigi is definitely correct that a CAS in Lean will not be an improvement over current technology for the most frequent use cases, but it might be a good academic exercise to construct a CAS in a language with dependent types.

Proofs of algorithms are also very important, as teaching an AI to create new algorithms rigorously would definitely be a good thing.

Unrelated question, @Luigi Massacci do you know if Lean has any strategies for parallel programming?

Luigi Massacci (Jan 11 2025 at 08:37):

I don’t have the faintest idea, you should go ask in #lean4 or #metaprogramming / tactics

Siddhartha Gadgil (Jan 11 2025 at 10:25):

Lean has Task, which allows parallel computing. I have used this, but it's rather low-level as far as I know.
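[Editor's note: a minimal sketch of that low-level Task API; the helper names sumRange and parallelSum are made up. Two independent computations are spawned as tasks and their results combined.]

-- Sum the naturals in [lo, hi).
def sumRange (lo hi : Nat) : Nat :=
  (List.range (hi - lo)).foldl (fun acc i => acc + lo + i) 0

-- Run two halves of the computation as separate tasks and combine them.
def parallelSum : Nat :=
  let t1 := Task.spawn fun _ => sumRange 0 100000
  let t2 := Task.spawn fun _ => sumRange 100000 200000
  t1.get + t2.get

#eval parallelSum  -- same value as sumRange 0 200000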

Srivatsa Srinivas (Jan 11 2025 at 13:43):

Thanks!

Jz Pan (Jan 13 2025 at 06:20):

Srivatsa Srinivas said:

Originally, I thought that it would be nice to create a CAS where the outputs are "refl" objects, for example Simplify(x^2 + (x+1)^2) would return a term of type x^2 + (x+1)^2 = 2x^2 + 2x + 1, but after some deliberation, I think the Lean-ish way of creating a term of the type x^2 + (x+1)^2 = 2x^2 + 2x + 1 : Prop would be the goal. I understand that tactics already do this.

I think CAS in Lean could be:

#cas simplify (x^2 + (x+1)^2)

Then it outputs:

Try this: have : x^2 + (x+1)^2 = 2*x^2 + 2*x + 1 := by ring

And

#cas simplify ((Real.sin x)^2 + (Real.cos y)^2)

Then it outputs:

Try this: have : (Real.sin x)^2 + (Real.cos y)^2 = 1 := Real.sin_sq_add_cos_sq x

(These two examples are a bit too easy, as they are already covered by existing results/tactics in Mathlib, but I couldn't think of a more complex example right now.)

Besides simplify, we could also have factor, etc. etc.
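[Editor's note: a much-reduced sketch in this direction; the tactic name cas_simplify is made up. Instead of discovering the simplified form and emitting a "Try this:" suggestion as the proposed #cas would, this toy macro just packages the resulting have ... := by ring step, with the user supplying the simplified form.]

import Mathlib

-- A toy macro: `cas_simplify e` adds the hypothesis `e`, discharged by `ring`.
-- A real `#cas` would find the simplified right-hand side itself and emit it
-- as a "Try this:" suggestion.
macro "cas_simplify " e:term : tactic => `(tactic| have : $e := by ring)

example (x : ℝ) : True := by
  cas_simplify x ^ 2 + (x + 1) ^ 2 = 2 * x ^ 2 + 2 * x + 1
  trivial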

Kim Morrison (Jan 13 2025 at 11:12):

It doesn't even need #cas. #factor could output a Try this: have .... I guess one could even have a #simp that outputs a Try this.

Anand Rao Tadipatri (Jan 13 2025 at 16:01):

I'd attempted a basic experiment of this sort a while back using PARI/GP: #lean4 > Basic unverified symbolic calculations in Lean4 using PARI @ 💬. My code was more of a toy proof of concept than a serious attempt at integrating symbolic algebra with Lean, but I thought I'd share it anyway since it may be relevant to this thread.

Mitchell Lee (Jan 15 2025 at 07:52):

Suppose that instead of thinking of x^2 + (x+1)^2 as an "expression", we were to simply think of it as a polynomial in x (writing it as Polynomial.X ^ 2 + (Polynomial.X + 1) ^ 2), and then have a computable function that factors a polynomial and returns the multiset of irreducible factors. Would this approach fall short of being a "symbolic algebraic system"?

The reason I ask is that Mathematica's strange handling of expressions is often more of a hindrance than a help. (Though at the same time, I see why it might sometimes be useful to be able to write something like (1 + x) ^ 1000000000 and not have the computer immediately try to evaluate it.)
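[Editor's note: for concreteness, here is that object written as a Mathlib polynomial rather than a symbolic expression. The factoring function Mitchell asks about would then have a signature along the lines of factorize : Polynomial ℤ → Multiset (Polynomial ℤ), a hypothetical name rather than existing API.]

import Mathlib
open Polynomial

-- The polynomial itself, as a term of `Polynomial ℤ`.
example : Polynomial ℤ := X ^ 2 + (X + 1) ^ 2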

Kim Morrison (Jan 15 2025 at 09:02):

It's complicated, and we need some of both approaches:

  • tactics like ring working with untyped Expr (not so far from what Mathematica does, although with lots of clever tricks to avoid typeclass resolution when building new expressions)
  • functions that commit to specific computational representations of mathematical objects (e.g. polynomials) and run algorithms on them (these then come with tactic frontends that can reflect Expr representations into the specific representations, i.e. equipped with denote/eval etc. functions)
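[Editor's note: a minimal sketch of the second bullet; all names are made up, not Mathlib API. A concrete expression type comes with a denote function into an arbitrary commutative ring, which is the reflect/denote pattern a tactic frontend would use.]

import Mathlib

-- A concrete, computable expression type for one-variable expressions
-- with integer constants.
inductive PExpr where
  | const : Int → PExpr
  | var   : PExpr
  | add   : PExpr → PExpr → PExpr
  | mul   : PExpr → PExpr → PExpr

-- Denote a `PExpr` at a point `x` of any commutative ring `R`.
def PExpr.denote {R : Type} [CommRing R] (x : R) : PExpr → R
  | .const n => (n : R)
  | .var => x
  | .add a b => PExpr.denote x a + PExpr.denote x b
  | .mul a b => PExpr.denote x a * PExpr.denote x b

-- Algorithms (normalisation, factoring, ...) would run on `PExpr`;
-- a tactic frontend relates `denote` of the result to the original goal.
#eval (PExpr.mul PExpr.var PExpr.var).denote (3 : Int)  -- 9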

Jason Rute (Jan 15 2025 at 12:14):

Cross posting: https://leanprover.zulipchat.com/#narrow/stream/113488-general/topic/Juniper.20-.20toy.20Rust.20CAS.20with.20Lean.20specifications

Srivatsa Srinivas (Jan 15 2025 at 22:18):

I can think of a problem which does not have a trivial verification procedure given the answer, but which a CAS can compute: the definite integral \int_a^b f(x) \, dx of a function f. The CAS would have to create a proof that the definite integral has the particular answer, basically building the proof by applying, at each step, the rule it used to transform the integral.

Obviously, symbolic differentiation being easy means that indefinite integrals can be proved using just the answer.
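[Editor's note: a small instance of the remark about indefinite integrals, assuming Mathlib's Real.hasDerivAt_sin. The claim that sin is an antiderivative of cos is certified by a differentiation lemma, with no integration performed.]

import Mathlib

-- Checking an antiderivative by differentiating the proposed answer.
example (x : ℝ) : HasDerivAt Real.sin (Real.cos x) x := Real.hasDerivAt_sin x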

Kim Morrison (Jan 15 2025 at 22:39):

The difference between indefinite and definite integrals is "just" giving a proof of an evaluation of a function. This, I think, is the job of norm_num and its successors.
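[Editor's note: in a trivial form, this is the sort of evaluation step norm_num can already discharge.]

import Mathlib

-- Evaluating a function at a point, closed by `norm_num`.
example : (fun x : ℝ => x ^ 2 + 1) 3 = 10 := by norm_num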

Michael Stoll (Jan 15 2025 at 22:40):

There are computable definite integrals in cases where there is no closed form indefinite integral, though.

Mario Carneiro (Feb 03 2025 at 14:54):

I think it would be a good idea to have a stream for CAS in Lean. There are a lot of ideas and implementations floating around, and it would be good to coordinate more.

Kim Morrison (Feb 03 2025 at 23:43):

#Computer algebra


Last updated: May 02 2025 at 03:31 UTC