Zulip Chat Archive

Stream: Is there code for X?

Topic: Can automatic algebraic expansion be written in lean 4?


Mr Proof (Jun 02 2024 at 13:14):

Basically, I want an automatic way to prove things like "xyz=zyx" and "(c+d)(a+b) = ac+ad+bc+b*d" without having to do it manually step by step using rewrite rules.

This involves writing expressions in their "normal form".

Some steps involve sorting the terms alphabetically such as writing "yx" as "xy"

Now, I can think of two ways to achieve this:

(1) Write the simplification code directly in Lean
(2) Create/Use a computer algebra system that can do simplification, but which generates Lean 4 code as a certificate for each step.

The way (2) would work is you could use commands like "Simplify" or "Factorise" and it would use whatever algorithm is best to do this, but would then also prove the step by writing a list of Lean tactics.

So for example, if I had an expression "3x+3y+3z", I could call factorise it to get "3(x+y+z)" and then it would also generate the lean 4 rewrite rules for this.

So I guess the question is: Is Lean 4 powerful enough to write computer algebra simplification code in itself or is the best way to use it as a backend for a computer algebra system that verifies each step? And are there any tips for using Lean as a backend?

Yaël Dillies (Jun 02 2024 at 13:14):

Are you looking for the ring tactic?

Mr Proof (Jun 02 2024 at 13:17):

Yaël Dillies said:

Are you looking for the ring tactic?

Possibly, does that simplify expressions into alphabetical form e.g. "dbac = abcd"? I have looked at the ring tactic and it looks pretty good.

I'm also thinking of a way you could write automatic factorisation using complicated algorithms (as is done in things like Mathematica) but which proves the step using Lean. And whether that is theoertically possible to write in Lean or should be written in a separate program but using Lean as a proof checker.

Yaël Dillies (Jun 02 2024 at 13:19):

It simplifies into "alphabetical form" but where the "alphabet" is some secret (to you) list of all possible "ring atoms" (we don't just want to sort variable names, but also things which should be considered constants from the POV of the language of rings, like exp x, log x, π, etc...)

Yaël Dillies (Jun 02 2024 at 13:20):

Mr Proof said:

I'm also thinking of a way you could write automatic factorisation using complicated algorithms (as is done in things like Mathematica) but which proves the step using Lean. And whether that is theoertically possible to write in Lean or should be written in a separate program but using Lean as a proof checker.

That sounds like you should look at the polyrith tactic

Mr Proof (Jun 02 2024 at 13:33):

This is all looks very powerful. It looks almost you can write a full computer algebra software in Lean. (Which is tempting to try and do just by giving Lean a more computer-algebra-type interface).

I suppose my main concern is, is this necessarily the best thing to do? For example a computer algebra software like Mathematica may use lots of tricks and heuristics to factorise a large polynomial which are a black-box and not really important as long as it arrives at the right answer. But once we get the answer, the rewrite rules are simple to calculate.

Thus I want the power of a computer algebra system with the proof checking of Lean and somehow combine the two systems together.

So I'm basically thinking out-loud whether the best course of action is to re-write computer algebra rules in Lean, or write Lean into a computer algebra system for some subset of operations. Either way would generate Lean checkable code.

Yaël Dillies (Jun 02 2024 at 13:35):

This is more or less what polyrith does. I'm really sorry that I can't explain more now (it's exam season over here), but go read this: https://hrmacbeth.github.io/computations_in_lean/02_Using_Polyrith.html

Adam Topaz (Jun 02 2024 at 13:35):

This is all possible to do and there are people in this community who have been thinking about such things for a while.

Mr Proof (Jun 02 2024 at 13:41):

Thanks, I will check this out. If this is the case, I may have a go at making a front-end to Lean that is more like a familiar computer-algebra system which can use these tactics for simplification. (Unless someone has already done this.)
If it is like you say and can already do things like automatic factorisation of polynomials then this would be very useful for even high school students to use.
I think the other main thing a computer algebra system can do is reduce polynomials like the "x^2 + a x + b" to it's solutions. And also draw graphs of equations or geometric diagrams (but that is just extra fluff which is nice for humans to look at but which doesn't correspond to a proof).

Michael Stoll (Jun 02 2024 at 14:11):

The general idea would be a CAS to provide the factorization (say) and then use Lean to check that it is correct (which the ring tactic should be able to do). This approach should work reasonably well when there is a "certificate" available that can be checked reasonably efficiently (this is what polyrith does: an external Gröbner basis computation produces an explicit linear combination of relations, and the linear combination can be checked by the linear_combination tactic).

Dean Young (Jun 02 2024 at 14:21):

@Mr Proof Have you heard of widgets? You could check out the string diagram generator that Yumo Mizuno made.

Ralf Stephan (Jun 02 2024 at 14:58):

Mr Proof said:

Thanks, I will check this out. If this is the case, I may have a go at making a front-end to Lean that is more like a familiar computer-algebra system which can use these tactics for simplification.

Note that you wouldn't need to do this from to scratch even if there were no polyrith. The Sage CAS (sagemath.org) should have code for this.

Mr Proof (Jun 02 2024 at 17:13):

Michael Stoll said:

polyrith

I see. So I see it calls an external python file to do some calculations outside lean. Interesting. So presumably I could also make a tactic that queries a CAS or some other software. That is interesting. I will have to research more about how the interface works.

Mr Proof (Jun 02 2024 at 18:09):

@Ralf Stephan Thanks, I will check out Sage as well. Last time I used Sagemath, I found it quite bulky and only worked in a virtual environment on Windows but it's probably improved a lot since then. I usually use standalone wxMaxima for simple algebraic manipulation. That gives me a few options to explore.

So I'm thinking maybe create a widget that interfaces with maxima/sage that can for example factor an equation, and then exports that back into Lean together with the rewrite rules or polyrith. Or something like that. Probably not much of a time saver other than not having to change the format of the equations from Lean format to a CAS format.

Mind you I don't want to reinvent the wheel so if this is already been done let me know!

Ralf Stephan (Jun 03 2024 at 11:00):

https://github.com/leanprover-community/mathlib4/blob/master/scripts/polyrith_sage.py

Kim Morrison (Jun 04 2024 at 00:58):

If anyone would like to factor out that scripts into a pure sagecell wrapper, and separately the polyrith specific code, that would be very helpful for everyone interested in further sage connections.

Mr Proof (Jun 05 2024 at 02:28):

AFIK Sage uses maxima for doing algebraic computations. Therefor it seems like, a nicer approach would just be to include all the maxima libraries within Lean so you wouldn't need a separate program. Maxima is <50Mb so not very big.
You can use maxima from the command-line like:

maxima --very-quiet -r "display2d:false$factor(x^2+2xy+y^2);quit();"

Kim Morrison (Jun 05 2024 at 03:29):

That's a very big "just"!

Mr Proof (Jun 05 2024 at 03:58):

Nah

Ralf Stephan (Jun 05 2024 at 06:56):

Mr Proof said:

AFIK Sage uses maxima for doing algebraic computations.

That is no longer true for quite a while. As maintainer of Pynac I took part in decoupling Sage from Maxima, especially wrt algebraic manipulations. It would also simply be impossible to serve Sage cells to customers with a slow Maxima-based engine.

Dean Young (Jun 05 2024 at 22:17):

@Mr Proof here's another simple example of an automatic normal-form using rw [].

-- A category C consists of:
structure category.{u₀,v₀} where
  Obj : Type u₀
  Hom : Obj  Obj  Type v₀
  Idn : (X:Obj)  Hom X X
  Cmp : (X:Obj)  (Y:Obj)  (Z:Obj)  (_:Hom X Y)  (_:Hom Y Z)  Hom X Z
  Id₁ : (X:Obj)  (Y:Obj)  (f:Hom X Y) 
    Cmp X Y Y f (Idn Y) = f
  Id₂ : (X:Obj)  (Y:Obj)  (f:Hom X Y) 
    Cmp X X Y (Idn X) f = f
  Ass : (W:Obj)  (X:Obj)  (Y:Obj)  (Z:Obj)  (f:Hom W X)  (g:Hom X Y)  (h:Hom Y Z) 
    (Cmp W X Z) f (Cmp X Y Z g h) = Cmp W Y Z (Cmp W X Y f g) h


-- Notation for the identity map which infers the category:
def identity_map (C : category) (X : C.Obj) := C.Idn X
notation "𝟙_(" C ")" => identity_map C



-- Notation for composition which infers the category and objects:
def composition (C : category) {X : C.Obj} {Y : C.Obj} {Z : C.Obj} (f : C.Hom X Y) (g : C.Hom Y Z) : C.Hom X Z := C.Cmp X Y Z f g
notation g "∘_(" C ")" f => composition C f g


theorem Id₁ {C : category} {X : C.Obj} { Y : C.Obj} {f : C.Hom X Y} :
  (f ∘_(C) (𝟙_(C) X)) = f := C.Id₂ X Y f

theorem Id₂ {C : category} {X Y : C.Obj} {f : C.Hom X Y} :
  (𝟙_(C) Y ∘_(C) f) = f := C.Id₁ X Y f

theorem Ass {C : category} {W X Y Z : C.Obj} {f : C.Hom W X} {g : C.Hom X Y} {h : C.Hom Y Z} :
  ((h ∘_(C) g) ∘_(C) f) = (h ∘_(C) (g ∘_(C) f)) := (C.Ass W X Y Z f g h)


macro "cat" : tactic => `(tactic| repeat (rw [Id₁]) ; repeat (rw [Id₂]) ; repeat (rw [Ass]))

example {C : category}
          {W : C.Obj}
          {X : C.Obj}
          {Y : C.Obj}
          {Z : C.Obj}
          {f : C.Hom W X}
          {g : C.Hom X Y}
          {h : C.Hom Y Z}
          {i : C.Hom Z W}:
     (i ∘_(C) (h ∘_(C) (g ∘_(C) (f ∘_(C) (𝟙_(C) W))) ))
  = ((i ∘_(C)  h) ∘_(C) ((𝟙_(C) Y) ∘_(C) g)) ∘_(C) (f) := by cat

Dean Young (Jun 05 2024 at 23:09):

In the above the `(tactic| ) combines tactics.

Mr Proof (Jun 06 2024 at 00:32):

Ralf Stephan said:

Mr Proof said:

AFIK Sage uses maxima for doing algebraic computations.

That is no longer true for quite a while. As maintainer of Pynac I took part in decoupling Sage from Maxima, especially wrt algebraic manipulations. It would also simply be impossible to serve Sage cells to customers with a slow Maxima-based engine.

Thanks for the info. Maxima slow though? I use wxmaxima a lot. And at least for basic algebraic manipulation I often find it faster than Mathematica. e.g. I can expand and factor $(x+y+z)^10$ almost instantaneously. Which seems good enough, if you need a tactic to factor something rather than the other way round. (Which maybe you never need to IDK). I'm sure Pynac is just as good though and probably makes more sense to use.

@Dean Young Thanks. Now I'm wondering about how fast code written in Lean is. If it is purely interpreted or if it would use a JIT etc for extra speed.

Jireh Loreaux (Jun 06 2024 at 03:09):

Mr Proof said:

Now I'm wondering about how fast code written in Lean is.

Compiled code can be quite fast, if written correctly. But you must understand that Lean is doing things differently than a CAS, because it has to prove them. So, for example, using the ring tactic to should that (x + y + z) ^ 10 is equal to its expansion (which ring won't compute for you), is much slower than something that simply computes the expansion. This is because ring is generating a proof term on the fly, and a CAS is not burdened by such things.

You could write very fast Lean code that computes polynomial expansions, but simultaneously producing a proof of correctness is much harder.

Michael Stoll (Jun 06 2024 at 08:28):

I guess the way to proceed would be to

  1. write fast code that does some computation;
  2. provide a general proof of correctness of this code,

which can then be used for the proof that any given computation is correct without constructing a separate specialized proof each time (which I think is what the ring tactic is doing).

Kim Morrison (Jun 06 2024 at 13:28):

This is classic proof by reflection. Note that for this to work the "fast code" needs to be fast in the kernel --- merely fast in compiled code is not good enough, although reduceBool and reduceNat can (by adding the compiler to the trust base!) can get around this.

Mr Proof (Jun 06 2024 at 20:29):

I suppose also compiling Lean down into some more efficient machine code means it is no longer a simple kernel so that can be trusted... although for day-to-day work that's probably fine. Then you would just use the slower kernel to check the proof at the end. Though it would be devastating if you thought you'd proved an amazing breakthrough proof and it turned out to just be a bug in the compiler and it wasn't proved at all!


Last updated: May 02 2025 at 03:31 UTC