Zulip Chat Archive

Stream: general

Topic: Computer Algebra Systems and Lean


Kaiyu Yang (Aug 21 2023 at 17:44):

Hi,

I'm looking into stuff related to the integration of computer algebra systems (CAS) and Lean. There are already discussions on Zulip and GitHub (e.g., this issue by @Scott Morrison ) on this topic. I'm wondering if there are people currently working on (or interested in), e.g., implementing a general CAS (similar to sympy or Symbolics.jl) in Lean.

I'm aware of works trying to call external CAS from proof assistants, e.g., [1, 2]. Since existing CAS systems do not provide checkable proofs, their results have to be checked independently by Lean. Therefore, this approach seems to work only for problems that are hard to compute but easy to check, e.g., calculating anti-derivatives. You could also just trust the results from CAS and treat them as axioms in Lean. Still, this approach treats the CAS as a black box and might be suboptimal in terms of flexibility for some applications.

There are also attempts to implement CAS in proof assistants such as HOL and Coq ([3, 4]), but in my impression, they are more like research prototypes rather than fully fledged systems.

[1] Harrison, John, and Laurent Théry. "A skeptic's approach to combining HOL and Maple." Journal of Automated Reasoning 21 (1998): 279-294.
[2] Lewis, Robert Y., and Minchao Wu. "A bi-directional extensible interface between Lean and Mathematica." Journal of Automated Reasoning 66.2 (2022): 215-238.
[3] Kaliszyk, Cezary, and Freek Wiedijk. "Certified computer algebra on top of an interactive theorem prover." International Conference on Mathematical Knowledge Management. Berlin, Heidelberg: Springer Berlin Heidelberg, 2007.
[4] Dénès, Maxime, Anders Mörtberg, and Vincent Siles. "A refinement-based approach to computational algebra in Coq." International Conference on Interactive Theorem Proving. Berlin, Heidelberg: Springer Berlin Heidelberg, 2012.

Some additional context: I want to codify the math knowledge related to the symbolic solution of PDEs (e.g., method of characteristics, Lie symmetries, and other techniques in reference books such as "Handbook of Differential Equations"). Existing CAS such as Maple and Mathematica have implemented some of these techniques in their symbolic PDE solver. However, these solvers are black boxes and make extensive use of heuristics to attempt to solve a given PDE automatically. What I want is a transparent solver backed by formalized mathematics (think of SciLean but focusing on symbolic solutions instead of numerical ones). It does not have to be fully automatic. Human experts can tell the system what solution techniques to try, similar to using tactics to prove theorems in Lean. Therefore, I think Lean might be suitable for building such a solver. However, Lean lacks many functionalities of CAS, e.g., calculating derivatives/anti-derivatives, series expansion, and manipulating polynomials, which are commonly used when solving PDEs symbolically.

Andrés Goens (Aug 21 2023 at 18:08):

As an additional data point here: with @Siddharth Bhat and Max Horn, the mail maintainer of GAP, we discussed an integration between Lean+mathlib and GAP. The main issue is that you more or less have to redo all/most of the work: the proofs of correctness of CAS systems are just another subfield of mathematics and just the same, they need to be formalized. In GAP/computational group theory in particular, a lot of algorithms are probabilistic, which makes it even more difficult. In general this is at odds with Mathlib which is not particularly compute-friendly (many definitions are noncomputable). I'd love to see more integration between the two, although PDE solvers are very far away from my area of expertise, so I can't say much more here.

Tomas Skrivan (Aug 21 2023 at 19:54):

When solving PDEs numerically or analytically, you usually reformulate the problem to something simpler and then apply either an analytic solution or apply a numerical method. I think there is a big overlap between SciLean and your goal.

Currently I'm working on symbolic(and automatic) differentiation which would be very useful for you. Also, I did some experiments with symbolic polynomials and at some point I would like to focus on it again.

I would be interested to see a more concrete plan and find a potential overlap with SciLean

Kaiyu Yang (Aug 21 2023 at 23:30):

@Andrés Goens Thank you for sharing your insights on GAP! By "redo all/most of the work", do you mean re-implementing GAP (and its library of algorithms) in Lean?

Verifying existing CAS in Lean sounds like an infeasible task. If the goal is to build a fully verified CAS, I see only two viable approaches (feel free to add more):

  • Implementing the entire system directly in Lean.
  • The system can be implemented in any language. But in addition to results, it should also generate certificates that can be checked by Lean.

We could also start with an unverified version implemented in Lean, with the assumption that we can incrementally verify parts of the system in the future when needed.

Kaiyu Yang (Aug 22 2023 at 00:11):

@Tomas Skrivan Yes, there is indeed overlap with SciLean. The system I envision looks like this: Starting with a PDE to solve, we can apply existing techniques for solving PDEs symbolically, which are implemented as tactics. Some will just fail since they are not applicable, and some may transform the PDE (hopefully simplify it). In the end, maybe we have found a closed-form solution, or we have reduced the PDE to something simpler, such as lower-order PDE, ODEs, or integrations. For the latter case, we can apply existing numerical solvers to the simplified problem, which may produce better results compared to applying directly to the original PDE. I'm less concerned with whether the numerical solver should be implemented in Lean or outsourced.

For a concrete example, the attached image (Evans' PDE book) is a solution to the transport equation.
transport.png

The system should allow us to formalize this solution in a natural way using tactics (maybe that's already possible in SciLean?). It should also be able to handle a much broader class of PDEs that humans have analyzed analytically, e.g., thousands of PDEs in reference books [5,6,7]. By "formalizing the solution", the focus should be capturing the transformation of math formulas (just as in SciLean), instead of verifying everything rigorously down to the underlying theory (mathlib doesn't have much PDE-related theory anyway).

[5] Zwillinger, Daniel, and Vladimir Dobrushkin. Handbook of differential equations. CRC Press, 2021.
[6] Polyanin, Andrei D., Valentin F. Zaitsev, and Alain Moussiaux. Handbook of first-order partial differential equations. CRC Press, 2001.
[7] Polyanin, Andrei D., and Valentin F. Zaitsev. Handbook of nonlinear partial differential equations: exact solutions, methods, and problems. Chapman and Hall/CRC, 2003.

Tomas Skrivan (Aug 22 2023 at 01:25):

There is not much support for things like this in SciLean yet. However, just today I was thinking about adding support for specifying systems of equations and having tactics that would allow you to eliminate variables and similar transformations.

In the past I have done quite some fluid dynamics, so I will try to write down a solution for a flow in pipe and around sphere. This should be quite a nice target, you have to assume that the solution is in particular form based on the symmetry and use cylindrical or spherical coordinates.

Kaiyu Yang (Aug 23 2023 at 15:37):

@Tomas Skrivan Is this something SciLean would like to have eventually (even though it might not be the first priority right now)? Examples in fluid dynamics sound great. Once I was trying to understand the SciLean codebase but found it too difficult by just reading the code without any high-level documentation of its design :(

Kaiyu Yang (Aug 23 2023 at 15:49):

Also, since I'm new to PDEs, I wonder how useful people would consider such as system. One could argue that it's difficult to further simplify important PDEs such as Navier-Stokes, since mathematicians would have done it if such simplification were easy to find?

Tomas Skrivan (Aug 23 2023 at 15:50):

Yes it is definitely something I would like to have as part of SciLean.

I know the doc is really lacking, but in the past half a year I have redesigned the library twice already ... The latest version is way more aligned with mathlib so hopefully I will not do any major overhauls anymore. Soon(ish) I want to write some documentation and welcome people to contribute either new functionality or correctness proofs.

If you want to understand the code, just let me know and I can walk you through it.

Tomas Skrivan (Aug 23 2023 at 15:55):

Kaiyu Yang said:

Also, since I'm new to PDEs, I wonder how useful people would consider such as system. One could argue that it's difficult to further simplify important PDEs such as Navier-Stokes, since mathematicians would have done it if such simplification were easy to find?

I think such a system would be really useful. From a computational perspective, you always want to use the analytical solution if available.

Of course, you can't simplify general Navier-Stokes equation but usually you are solving it in a particular geometry. Then you can impose symmetry requirements on the solution and reduce it into few ODEs.

Kaiyu Yang (Aug 23 2023 at 16:18):

That's great! I'm looking forward to the next milestone of SciLean. Currently I'm trying to understand more about mathlib but will definitely re-visit SciLean after that.

Tyler Josephson ⚛️ (Aug 24 2023 at 02:48):

@John Velkey ⚛️ @David⚛️ @Max Bobbin Check out this discussion.

Rick de Wolf (Aug 26 2023 at 20:35):

Reading this thread reminded me of an idea I had when I first learned about Lean. How about using the library of proofs in mathlib and the wider lean ecosystem to support CAS systems?

I tried using Symbolics.jl a while ago and was surprised that a feature like symbolic integration wasn't implemented except for linear or polynomial functions. (I don't remember which of these two cases was implemented.) This functionality has been implemented in many different CASs, but the Symbolics.jl devs apparently hadn't had time yet to implement a "basic" feature like symbolic integration.

I'm well aware symbolic integration is a difficult problem that takes a lot of time to implement properly, so after some thought my surprise was gone and I felt like the whole CAS development process could be made a lot more efficient. What if many CASs could rely on a library of mathematical knowledge like Mathlib to provide implementations that can be transformed into code for a specific CAS? The idea is to write the algorithms once in Lean, write a "translation layer" between Lean and a specific CAS once, and that would be most of the code that that specific CAS would have to maintain.

I admit this idea is too abstract still, but I'd love to talk about it with some people.

Kaiyu Yang (Aug 28 2023 at 01:05):

@Rick de Wolf I believe many packages in the Julia eco-system are already trying to separate math knowledge from the code that actually manipulates expressions, though the math knowledge is not formalized in a logic system such as Lean. E.g., ChainRules.jl and DiffRule.jl.

Rick de Wolf (Aug 28 2023 at 10:04):

Ah yes I remember this. One of the libraries of the Symbolics.jl project also has a set of rules, but I'm not sure if it's for a specific area like differentiation, off the top of my head. Having a library of rules is a great start if we want to link a Lean-based library to some software, but it's only the first step.

I want to make a quick prototype that replaces this library with one based on Lean, which I imagine could take quite some time already. Then the next step would be to expand and implement more than just differentiation rules. This is the point where things get more complicated, and to be honest I'm not sure what would be the best way forward.

Utensil Song (Aug 28 2023 at 10:06):

For symbolic integration, the approach is likely to convert rules in the Rubi project as SymPy did, and maintain it like *Rules.jl projects. There are similar efforts in Julia, but it seems that none are actively maintained. The conversion process is mostly technical, and subjects to the errors of parsing and generating. Converting them to Lean and verify them in Lean is interesting but non-trivial and mostly tedious, and can possibly be mechanized.

Utensil Song (Aug 28 2023 at 10:06):

Rules that manipulate expressions in CAS can be defined "at will" for the convenience of engineering applications, it only needs to ensure the soundness of the deductions obtained from these rules. It's entirely up to the maintainers of the rules to ensure the rules are correct, which is usually only corroborated by the source of the reference, like a physical evidence that someone has thought this through and there are others who use it in their work. Assuming the feedback loop is not broken, the rules are "likely" to be correct.

Example projects like DiffRule.jl, they merely collect these rules. Their soundness of the deduction is provided in the underlying algorithm, e.g. DiffRule.jl -> Symbolics.jl -> SymbolicUtils.jl -> Metatheory.jl -> egg(in Rust) -> e-graph rewriting (equality saturation, EqSat) , the papers on the last one are Rewrite Rule Inference Using Equality Saturation (OOPSLA 2021) and egg: Fast and extensible equality saturation (POPL 2021), which has an upgraded algorithm egglog that supports more efficient incremental execution, better abstractions in rules, and a lot more. For some earlier discussion in Zulip, check here, by then I was assessing if Lean has its counterpart.

(This level of soundness is not provided in other CAS projects, such as SymPy, which has a mixture of algorithms that have various theoretical foundations.)

Rick de Wolf (Aug 28 2023 at 10:22):

I like the sound of automating the Rubi -> Lean translation process! (As long as it's more efficient than manual translation)

One thing I haven't mentioned yet, is that I want this project to not be a "simple" library of transformation rules, I think it could be much broader in scope than that. If we could also put descriptions of algorithms used by CAS into the library, and automatically let the any CAS generate code for those algorithms, it would be much more powerful. The transformation rules are the easy part, and the algorithms would be the truly challenging part.

I think bringing soundness to CASs is certainly interesting, but that would require more intimately linking whatever programming language some CAS is written in to Lean. It would have to be repeated for all the languages that CASs are written in. It is worth looking into, but I think a lot of value can be extracted from Mathlib even if we don't focus on the soundness in CASs. As far as I know it's the first time we have access to a machine readable library of mathematics, and that opens up a lot of doors to make tools like this.

Tomas Skrivan (Aug 28 2023 at 16:51):

Rick de Wolf said:

If we could also put descriptions of algorithms used by CAS into the library, and automatically let the any CAS generate code for those algorithms, it would be much more powerful. The transformation rules are the easy part, and the algorithms would be the truly challenging part.

I'm quite curious, do you have any more concrete ideas in this direction?

Tomas Skrivan (Aug 28 2023 at 16:52):

Rick de Wolf said:

The idea is to write the algorithms once in Lean, write a "translation layer" between Lean and a specific CAS once, and that would be most of the code that that specific CAS would have to maintain.

If the algorithm is already implemented in Lean, what would be the role of another CAS? Why would you need it at all?

Rick de Wolf (Aug 30 2023 at 10:39):

So to answer your second question first, I think this approach would make building and maintaining CASs a lot more efficient in the long run if a large part of the CAS ecosystem can put in enough effort just once. The basic idea as I see it is similar to the idea behind the Language Server Protocol (LSP) that has made building and maintaining programming tools a lot more efficient.

So I'll give a quick introduction for those who haven't heard about the LSP yet. When you write software there are lots and lots of programming languages and lots and lots of editors and IDEs in which you can write your code. Many programmers use tools for syntax highlighting, debugging, and much more, but every one of those tools needed to be implemented separately for every possible (programming language, editor) pair before LSP came along.

With LSP, however, you write a tool, e.g. syntax highlighting, only once in Python in a way that's compatible with the Language Server Protocol. At the same time, every editor only needs to implement support for LSP only once, but they'll get access to all the different tools that support LSP. If there's anything more than a handful of editors and programming languages out there, the gains can be substantial because you don't have to repeat your effort every time there is a new editor, a new programming language, or a new type of tool.

Long story short, I think the CAS ecosystem can benefit from a similar approach, by only writing down the mathematical knowledge that CASs are based on just once, with a shared protocol. If many CASs are willing to support this protocol, they will only have to support the protocol but they will get access to all of the knowledge in libraries like Mathlib. This way, when people add their knowledge to Mathlib it can automatically become a part of the CAS ecosystem, allowing many people to use it, without a large amount of effort.

To answer your first question, I don't have many concrete ideas for the design of the protocol. I think the most important part of it, if we're thinking about long-term solutions, is to make it as general as possible. In the sense that we shouldn't base it around a fixed set of objects like real numbers and differential equations. Even if a completely new kind of structure is added to e.g. Mathlib it should become available in the upstream CASs.

Aside from the protocol design, it's also important to focus on how the CAS <-> protocol link is designed. This will likely be significantly different for different CASs, based only on the programming language they are written in. For example, Julia supports meta-programming, which would make it easy to write CAS code even while the CAS is already running. For compiled and interpreted languages without meta-programming support you'd likely have to generate the CAS code before launching the CAS, but that shouldn't be too big of an obstacle.

Sorry for the long-winded message, I'm still getting my head wrapped around the whole idea.

Mario Carneiro (Aug 30 2023 at 10:40):

I'm not sure how CAS proofs can be interpreted as a protocol

Mario Carneiro (Aug 30 2023 at 10:42):

what is even the content of this protocol? Who is saying what to whom

Rick de Wolf (Aug 30 2023 at 10:44):

I don't think the proofs should be part of the protocol, just the outputs of the proofs. So the protocol should be able to state theorems basically.

If I were to build a proof of concept I would only focus on Mathlib, in which case the language of the protocol would just be Lean, in a sense. But if we want support for other maths libraries a proper protocol would be needed down the line.

Rick de Wolf (Aug 30 2023 at 10:53):

I didn't answer your second question completely. With this system, many different CASs would be reading from the same maths library(s), which would homogenize them somewhat. However, the maths knowledge is only a part of a CAS (albeit a big part). CASs could put a lot less effort into implementing the latest research, and instead focus on things like more intuitive interfaces and quality of life aspects. Basically, this system would be more of an "under the hood" operation, while the front-facing part of the CASs is still completely free to be chosen however people want it to be.

Tomas Skrivan (Aug 30 2023 at 11:26):

I'm also a bit confused about what you mean by the protocol. To me it sounds that you are advocating for implementing CAS in Lean and then providing bindings to other languages/CASs. This way you would offload all user interface work to other systems and focus on the core computer algebra algorithms in Lean.

Rick de Wolf (Aug 30 2023 at 11:29):

That's pretty much it :)

Rick de Wolf (Aug 30 2023 at 11:30):

I just borrowed the term protocol because of the comparison to LSP. All I mean when I say "protocol" is to have a standardized way for the contents of Mathlib to be read by some CAS.


Last updated: Dec 20 2023 at 11:08 UTC